Schizophrenia Research Forum - A Catalyst for Creative Thinking

Collective Commons: SNPs Tag at Least One-Quarter of Schizophrenia Risk

21 February 2012. Common variants capture at least 23 percent of liability for schizophrenia, according to a new analysis published February 19 in Nature Genetics. Led by Naomi Wray and Peter Visscher at the University of Queensland in Brisbane, Australia, the study estimates the collective effect of all single nucleotide polymorphisms (SNPs) in the large dataset of the Psychiatric GWAS Consortium on Schizophrenia (PGC-SCZ). The results argue that many common variants with small effect sizes substantially contribute to risk for the disorder.

"It fits with a mutational load kind of model, where many things have to go wrong at the same time to increase risk," Wray told SRF. "To me, the biology of it makes sense in that we have a robust system with redundancy, so that when any one of these things goes wrong, individually it doesn't have much impact."

While ever-larger schizophrenia GWAS have detected a handful of common variants meeting the very high bar for genomewide significance (see SRF related news story), together these variants account for only about 3 percent of the risk, falling short of explaining the rather high heritability of schizophrenia. This "missing heritability" problem is seized upon by some as an argument that common variants make no meaningful contribution to schizophrenia risk, leaving rare variants as the more important culprits (see SRF genetics overview). Amid this debate, the new analysis offers a tangible, quantitative measure of the relative importance of common variants, and argues that some heritability is not missing but hidden among many common variants with very small effect sizes, which lurk below the genomewide significance threshold but would emerge with larger GWAS.

Further analyses showed that this signal was due to common causal variants rather than rare ones, and was enriched for SNPs within central nervous system genes. The findings don't rule out a contribution of rare variants to the as-yet unaccounted-for heritability, however. "I 100 percent expect there to be rare variants," Wray says. "But I think following up the common ones may be more informative for the population as a whole than studying the rare ones."

Common contribution
Using methods first applied to human height GWAS (Yang et al., 2010), first author Sang Hong Lee and colleagues estimated the total contribution of 915,354 SNPs to schizophrenia liability in 9,087 individuals with schizophrenia and 12,171 controls—the first time these methods have been applied to disease.

The researchers estimated how genetically different cases were from controls with a measure of genomic variance based on comparisons between the patterns of the 915,354 SNPs in each individual. This computationally intensive endeavor found that the genomic variance accounted for 23 percent of the phenotypic variance, summarized as liability for schizophrenia, equivalent to 30 percent of its heritability. This estimate was consistent with one Wray previously made on a subset of the sample (see SRF related conference story), and another in a related analysis in the GWAS conducted by the International Schizophrenia Consortium (see SRF related news story).
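The two figures in the paragraph above imply a total heritability, which a few lines of arithmetic make explicit. This is only a sketch built on the rounded numbers quoted in the article, so the result is approximate; the function and variable names are illustrative and do not come from the study.

```python
# Figures quoted in the article (rounded, so the result is approximate).
SNP_VARIANCE = 0.23           # share of liability variance captured by SNPs
SHARE_OF_HERITABILITY = 0.30  # the same quantity expressed as a share of heritability


def implied_heritability(snp_variance: float, share: float) -> float:
    """Return the total heritability implied by the two quoted figures."""
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    return snp_variance / share


h2 = implied_heritability(SNP_VARIANCE, SHARE_OF_HERITABILITY)
not_captured = h2 - SNP_VARIANCE  # heritability not yet tagged by SNPs

print(f"implied heritability: {h2:.2f}")
print(f"heritability not captured by SNPs: {not_captured:.2f}")
```

Read this way, the SNP-based estimate leaves most of the heritability unaccounted for, which is the gap the rest of the article discusses.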

The researchers then set about exploring how much of this reflected a true signal versus artifacts of case-control studies. Genotyping bias, which can lead to seemingly disease-related results when case samples are processed differently from control ones, was ruled out based on an analysis showing that subsets of the data collected by different research groups provided similar results. Population stratification, another potential artifact in GWAS in which different allele frequencies occur between cases and controls because of differences in ancestry rather than disease status, was also deemed to be minimal in the new analysis. When population stratification is driving a GWAS signal, a causal variant on one chromosome could be tagged by a SNP on a different chromosome with the same ancestry. To address this possibility, the researchers partitioned the SNP data by chromosome, and asked what proportion of the variance each contributed. Considering the chromosomes separately and then adding up their individual contributions lets such cross-chromosomal signals be counted more than once, so substantial stratification would inflate the total. The summed estimate was 26 percent, not dramatically higher than the 23 percent found when considering all chromosomes simultaneously. This similarity suggests that the GWAS signal was not an artifact.
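The comparison at the end of the paragraph above comes down to the gap between two numbers. The sketch below uses only the two estimates quoted in the article; the helper name is hypothetical, and the study's own variance-component fitting is not reproduced here.

```python
# Estimates quoted in the article, as proportions of variance in liability.
SUM_OF_SEPARATE_FITS = 0.26  # chromosomes fitted one at a time, then summed
JOINT_FIT = 0.23             # all chromosomes fitted simultaneously


def excess_from_separate_fits(sum_separate: float, joint: float) -> tuple[float, float]:
    """Return the absolute and relative excess of the summed estimate."""
    if joint <= 0:
        raise ValueError("joint estimate must be positive")
    absolute = sum_separate - joint
    return absolute, absolute / joint


absolute_excess, relative_excess = excess_from_separate_fits(
    SUM_OF_SEPARATE_FITS, JOINT_FIT
)
print(f"absolute excess: {absolute_excess:.2f}")
print(f"relative excess: {relative_excess:.0%}")
```

A large excess would point to signal shared across chromosomes, as expected under stratification; the article's reading is that a gap of this size is small.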

The amount of variation attributed to SNPs from each chromosome also correlated with the length of the chromosome (r = 0.89, p = 2.6 × 10^-8), which fits with a polygenic model of schizophrenia. Furthermore, despite clinical differences between men and women with schizophrenia, subdividing the data by gender did not reveal a difference in the variance in liability captured by SNPs, either on the autosomes or on the X chromosome. This suggests that males and females share the same genetic basis for schizophrenia.
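The correlation reported above is a Pearson coefficient between chromosome length and variance explained. As a rough illustration of why a polygenic model predicts a high value, the sketch below computes r for toy data in which variance is exactly proportional to length. The numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# HYPOTHETICAL toy values: five chromosomes whose variance contribution is
# exactly proportional to their length, the idealized polygenic case.
lengths_mb = [250.0, 200.0, 150.0, 100.0, 50.0]
variance_explained = [0.025, 0.020, 0.015, 0.010, 0.005]

r = pearson_r(lengths_mb, variance_explained)
print(f"r = {r:.2f}")
```

Real data are noisier than this idealized case, so an observed r of 0.89 is still consistent with many small effects spread roughly evenly along the genome.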

The common-to-rare spectrum
The researchers also partitioned the variance in liability captured by SNPs by function in order to examine the amount of the variance—the 23 percent figure from above—explained by SNPs within genes highly expressed in the central nervous system (CNS), by SNPs in other genes, and by SNPs not within genes. This revealed a similar proportion of variance in each of these three broad categories; however, the 2,725 CNS genes accounted for more variance (31 percent) than expected, given their length and the number of their SNPs (they represent only 20 percent of the genome). This argues that the genomic variance captured by SNPs includes signals pertaining to the brain.
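The enrichment described above can be expressed as a simple ratio of observed to expected share. The sketch assumes that the 31 percent refers to the share of SNP-captured variance and the 20 percent to the share of the genome covered by the CNS gene set, as the paragraph suggests; the function name is illustrative.

```python
# Figures quoted in the article.
CNS_SHARE_OF_VARIANCE = 0.31  # share of SNP-captured variance in CNS genes
CNS_SHARE_OF_GENOME = 0.20    # share of the genome the CNS genes represent


def fold_enrichment(observed_share: float, expected_share: float) -> float:
    """Ratio of the observed share to the share expected by size alone."""
    if expected_share <= 0:
        raise ValueError("expected share must be positive")
    return observed_share / expected_share


enrichment = fold_enrichment(CNS_SHARE_OF_VARIANCE, CNS_SHARE_OF_GENOME)
print(f"fold enrichment: {enrichment:.2f}")
```

A ratio above 1 means the gene set carries more of the signal than its size alone would predict.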

Finally, the researchers looked at how the variance was distributed across common and less common SNPs to grapple with the type of variant responsible for their signal. Dividing the SNPs by their minor allele frequency (MAF), the researchers found that the least common ones (0.01 ≤ MAF ≤ 0.1, meaning the minor allele made up 1-10 percent of all gene copies in the population) contributed 2 percent to the 23 percent estimate. The others, ranging from 0.1 to 0.5 MAF and hence fitting the definition of a common variant, contributed the rest. The researchers also simulated a rare-variant-only model of disease and turned up a different distribution of how variance was allocated, with little resemblance to what had been observed. These analyses point to common variants as responsible for the genetic susceptibility to schizophrenia captured by SNPs.
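The split by allele frequency can be restated as shares of the SNP-captured signal. The sketch below assumes the "2 percent" means 2 of the 23 percentage points, and it treats the bin boundaries as inclusive at 0.01 and 0.1; both readings are assumptions, and the labels are illustrative.

```python
# Figures quoted in the article, in percentage points of liability variance.
TOTAL_POINTS = 23.0
LESS_COMMON_POINTS = 2.0  # assumed to be 2 of the 23 points


def maf_bin(maf: float) -> str:
    """Assign a SNP to one of the two frequency bins used in the article."""
    if not 0.01 <= maf <= 0.5:
        raise ValueError("MAF outside the range analyzed (0.01 to 0.5)")
    return "less common" if maf <= 0.1 else "common"


common_points = TOTAL_POINTS - LESS_COMMON_POINTS
less_common_share = LESS_COMMON_POINTS / TOTAL_POINTS
common_share = common_points / TOTAL_POINTS

print(maf_bin(0.05), f"{less_common_share:.0%}")
print(maf_bin(0.30), f"{common_share:.0%}")
```

On this reading, roughly nine-tenths of the captured variance sits with SNPs whose minor allele frequency exceeds 0.1.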

What's left? The authors suggest that the remaining missing heritability can be found in causal variants that are not yet tagged consistently by SNPs with current microarray technology. This includes both common and rare variants, and finding the rare ones is made harder by the fact that it is difficult to correlate common SNPs with something that is rare. While convinced that many, many common variants of small effect form a substantial part of the genetic architecture of schizophrenia, the authors recognize the potential contributions of rare variants, concluding: "Hence, causal risk variants for schizophrenia range across the entire allelic frequency spectrum."

By Michele Solis.

Lee SH, DeCandia TR, Ripke S, Yang J, The Schizophrenia Psychiatric Genome-Wide Association Study Consortium (PGC-SCZ), The International Schizophrenia Consortium (ISC), The Molecular Genetics of Schizophrenia Collaboration (MGS), Sullivan PF, Goddard ME, Keller MC, Visscher PM, Wray NR. Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat Genet 2012. Abstract

Comments on News and Primary Papers

Primary Papers: Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs.

Comment by:  Bryan Roth, SRF Advisor
Submitted 24 February 2012
Posted 27 February 2012
  I recommend this paper

This is an interesting analytic paper which tests the hypothesis that common genomic variants are responsible for a substantial proportion of the variance in genomewide association studies of schizophrenia.

As large-scale efforts to fully sequence genomes of individuals with schizophrenia are underway at many centers, it will be interesting to revisit this hypothesis.


Comments on Related News

Related News: Largest GWAS Analysis to Date Offers Only Two New Candidate Genes

Comment by:  Todd Lencz, Anil Malhotra (SRF Advisor)
Submitted 3 July 2009
Posted 3 July 2009

The three companion papers published in Nature provide important new evidence for a role of the MHC complex and common variation across the genome in risk for schizophrenia. These studies have exploited the availability of comprehensive genotyping technologies, coupled with large cohorts of cases and controls, to identify candidate loci for disease susceptibility.

A notable feature of these papers is the clear willingness of each of the groups to share its data, and to provide overlapping presentations of each other's results. The combination of datasets permitted the statistical significance of the MHC findings to emerge, thereby increasing confidence in results. The implication that immune processes may interact with genetic risk to influence schizophrenia risk is consistent with several lines of evidence, including our own small GWAS study (Lencz et al., 2007) implicating cytokine receptors in schizophrenia susceptibility.

Perhaps most intriguing is the finding from the International Schizophrenia Consortium demonstrating that a score test—combining information from many thousands of common variants—can reliably differentiate patients and controls across multiple psychiatric cohorts. These results indicate that hundreds, if not thousands, of genes of small effect may contribute to schizophrenia risk. Moreover, these same genes were shown to contribute to bipolar risk (but not risk for non-psychiatric disorders such as diabetes).

Much more work remains to be done in psychiatric genetics. While the score test accounted for about 3 percent of the observed case-control variance, statistical modeling suggested that common variation could explain as much as one-third or more of the total risk. Nevertheless, there remains a substantial proportion of genetic dark matter (unexplained variance), given the high heritability of a disorder such as schizophrenia. Complementary approaches are needed to further parse the source of the common genetic variance, as well as to identify rare yet highly penetrant mutations. Additional techniques, such as pharmacogenetic studies and endophenotypic research, will help to explicate the functionality and clinical significance of observed risk alleles.


Comment by:  Daniel Weinberger, SRF Advisor
Submitted 3 July 2009
Posted 3 July 2009

The three Nature papers reporting GWAS results in a large sample of cases of schizophrenia and controls from around Western Europe and the U.S. are decidedly disappointing to those expecting this strategy to yield conclusive evidence of common variants predicting risk for schizophrenia. Why has this extensive and very costly effort not produced more impressive results? There are likely to be many explanations for this, involving the usual refrains about clinical and genetic heterogeneity, diagnostic imprecision, and technical limitations in the SNP chips. But the likely, more fundamental problem in psychiatric genetics involves the biologic complexity of the conditions themselves, which renders them especially poorly suited to the standard GWAS strategy. The GWA analytic model assumes fixed, predictable relationships between genetic risk and illness, but simple relationships between genetic risk and complex pathophysiological mechanisms are unlikely. Many biologic functions show non-linear relationships, and depending on the biologic context, more of a potential pathogenic factor can make things worse or it can make them better. Studies of complex phenotypes in model systems illustrate that individual gene effects depend upon non-linear interactions with other genes (Toma et al., 2002; Shao et al., 2008). Similar observations are beginning to emerge in human disorders, e.g., in risk for cancer (Lo et al., 2008) and depression (Pezawas et al., 2008).

The GWA approach also assumes that diagnosis represents a unitary biological entity, but most clinical diagnoses are syndromal and biologically heterogeneous, and this is especially true in psychiatric disorders. Type 2 diabetes is the clinical expression of changes in multiple physiologic processes, including in pancreatic function, in adipose cell function, as well as in eating behavior. Likewise, hypertension results from abnormalities in many biologic processes (e.g., vascular reactivity, kidney function, CNS control of blood pressure, metabolic factors, sodium regulation), and even a large effect on any specific process within a subset of individuals will seem small when measured in large unrelated samples (Newton-Cheh et al., 2009). In the case of the cognitive and emotional problems associated with psychiatric disorders, the biologic pathways to clinical manifestations are probably much more heterogeneous. While the results of GWAS in disorders like type 2 diabetes and hypertension have been more informative than the schizophrenia results so far, they, too, have been disappointing, considering all the fanfare that preceded them. But given the pathophysiologic realities of diabetes, hypertension, or psychiatric disorders, how could the effect of any common genetic variant acting on only one of the diverse pathophysiological mechanisms implicated in these disorders be anything other than small when measured in large pathophysiologically heterogeneous populations? Other approaches, e.g., family studies, studies of smaller but much better characterized samples, and studies of genetic interactions in these samples, will be necessary to understand the variable genetic architectures of such biologically complex and heterogeneous disorders.


Toma DP, White KP, Hirsch J and Greenspan RJ: Identification of genes involved in Drosophila melanogaster geotaxis, a complex behavioral trait. Nature Genetics 2002; 31: 349-353. Abstract

Shao H, Burrage LC, Sinasac DS et al.: Genetic architecture of complex traits: Large phenotypic effects and pervasive epistasis. PNAS 2008; 105: 19910-19914. Abstract

Lo S-W, Chernoff H, Cong L, Ding Y, and Zheng T: Discovering interactions among BRCA1 and other candidate genes associated with sporadic breast cancer. PNAS 2008; 105: 12387-12392. Abstract

Pezawas L, Meyer-Lindenberg A, Goldman AL, et al.: Biologic epistasis between BDNF and SLC6A4 and implications for depression. Mol Psychiatry 2008;13:709-716. Abstract

Newton-Cheh C, Larson MG, Vasan RS: Association of common variants in NPPA and NPPB with circulating natriuretic peptides and blood pressure. Nat Gen 2009; 41: 348-353. Abstract


Comment by:  Irving Gottesman, SRF Advisor
Submitted 3 July 2009
Posted 3 July 2009
  I recommend the Primary Papers

The synthesis and extraction of the essence of the 3 Nature papers by Heimer and Farley represents science reporting at its best. Completion of the task while the ink was still wet shows that SRF is indeed in good hands. Congratulations on being concise, even-handed, non-judgmental, and challenging under the pressure of time.


Comment by:  Christopher Ross, Russell L. Margolis
Submitted 6 July 2009
Posted 6 July 2009

Schizophrenia Genetics: Glass Half Full?
While it may be disappointing that the GWAS described above did not identify more genes, they nevertheless represent a landmark in psychiatric genetics and suggest a dual approach for the future: continued large-scale genetic association studies along with alternative genetic approaches leading to the discovery of new genetic etiologies, and more functional investigations to identify pathways of pathogenesis—which may themselves suggest new etiologies.

The consistent identification of an association with the MHC locus reinforces (without proving, as pointed out in the SRF news story) long-standing interest in the involvement of infectious or immune factors in schizophrenia pathogenesis (Yolken and Torrey, 2008). Epidemiologic and neuropathological studies that include patients selected for the presence or absence of immunologic genetic risk variants could potentially clarify etiology; cell and mouse model studies could clarify pathogenesis (Ayhan et al., 2009). It is striking that a major genetic finding in schizophrenia serves to reinforce the concept of environmental risk factors.

The two specific genes identified by the SGENE consortium, NRGN and TCF4, offer intriguing new leads into schizophrenia. This should foster a number of further genetic and neurobiological studies. Deep resequencing (and CNV analysis) can detect rare causative mutations, as exemplified by TCF4 mutations leading to Pitt-Hopkins syndrome. Neurogranin already has clear connections to interesting signaling pathways related to glutamate transmission. A hope is that further studies of both gene products and their interactions will identify pathogenic pathways.

The ISC used common genetic variants en masse to generate a polygene score from discovery samples of patients; that score was able to predict case status in test populations. The success of this approach provides very strong evidence that a portion of schizophrenia risk status is attributable to common genetic variants acting in concert and that schizophrenia shares genetic factors with bipolar disorder, but not with other diseases. This analysis has multiple practical implications for the direction of research. First, since polygenic factors explain only a portion of the genetic risk, the search for other genetic factors takes on new importance: rare mutations of major effect detectable by deep sequencing, CNVs, variations in tandem repeats (Bruce et al., 2009, in press), and other genomic lesions. Second, a meaningful integration of polygenic factors in a way that facilitates understanding of schizophrenia pathogenesis and the discovery of therapeutic targets will require identification of relevant pathways. Examination of patient-derived material, such as neurons differentiated from induced pluripotent stem cells taken from well-characterized patient populations, may be of great value.

The remarkable overlap between the genetic factors of schizophrenia and bipolar disorder suggests the need for further and more inclusive clinical studies—not just of endophenotypes, but also of the phenotypes themselves, together, rather than in isolation (Potash and Bienvenu, 2009). For instance, it is only within the past few years that the importance of cognitive dysfunction in schizophrenia has been appreciated. Cognition in bipolar disorder is even less well studied.

How much is really known about the longitudinal course of both disorders? Do genetic factors predict disease outcome? It is only recently that studies have focused intensively on the early course of schizophrenia and its prodrome. Much more is still to be learned, and even less is known about bipolar disorder. In conjunction with this greater understanding of clinical phenotype, it will clearly be necessary to refine the approach to phenotype by establishing the biological framework for these diseases and by establishing biomarkers, such as disruption in white matter (Karlsgodt et al., 2009) or abnormalities in functional networks (Demirci et al., 2009), that cut across current nosological categories. In turn, longitudinal study of clinical, imaging, and functional outcomes of schizophrenia and bipolar disorders should facilitate both focused candidate genetic studies and GWAS of large populations.


Yolken RH, Torrey EF. Are some cases of psychosis caused by microbial agents? A review of the evidence. Mol Psychiatry. 2008 May;13(5):470-9. Abstract

Ayhan Y, Sawa A, Ross CA, Pletnikov MV. Animal models of gene-environment interactions in schizophrenia. Behav Brain Res. 2009 Apr 18. Abstract

Potash JB, Bienvenu OJ. Neuropsychiatric disorders: Shared genetics of bipolar disorder and schizophrenia. Nat Rev Neurol. 2009 Jun;5(6):299-300. Abstract

Karlsgodt KH, Niendam TA, Bearden CE, Cannon TD. White matter integrity and prediction of social and role functioning in subjects at ultra-high risk for psychosis. Biol Psychiatry. 2009 May 6. Epub ahead of print. Abstract

Demirci O, Stevens MC, Andreasen NC, Michael A, Liu J, White T, Pearlson GD, Clark VP, Calhoun VD. Investigation of relationships between fMRI brain networks in the spectral domain using ICA and Granger causality reveals distinct differences between schizophrenia patients and healthy controls. Neuroimage. 2009 Jun;46(2):419-31. Abstract

Bruce HA, Sachs NA, Rudnicki DD, Lin SG, Willour VL, Cowell JK, Conroy J, McQuaid D, Rossi M, Gaile DP, Nowak NJ, Holmes SE, Sklar P, Ross CA, DeLisi LE, Margolis RL. Long tandem repeats as a form of genomic copy number variation: structure and length polymorphism of a chromosome 5p repeat in control and schizophrenia populations. Psychiatric Genetics, in press.


Comment by:  David Collier
Submitted 6 July 2009
Posted 6 July 2009
  I recommend the Primary Papers

This report is unnecessarily negative, from my point of view. The three studies show not only that GWAS can identify susceptibility alleles for schizophrenia, but that the majority of risk comes from common variants of small effect. These can be found, but as in other complex traits and diseases, such as obesity and height, considerable power is needed, because effect sizes are small, meaning greater sample sizes. This approach works: there are now almost 60 variants influencing height (Hirschhorn et al., 2009; Soranzo et al., 2009; Sovio et al., 2009). Furthermore, the genes identified so far from traditional mapping, CNV analysis, and GWAS point to two biological pathways, the integrity of the synapse (neurexin 1, neurogranin, etc.) and the Wnt/GSK3β signaling pathway (DISC1, TCF4, etc.), which is involved in functions such as neurogenesis in the brain. The identification of disease pathways for schizophrenia has major implications and should not be underestimated. It would be daft to lose nerve now and turn away from GWAS just as they are bearing fruit.

I would like to correct/expand on my comments to Peter Farley, to say that while statistical significance for some markers may be reached sooner, significance for many of the hundreds if not thousands of common schizophrenia susceptibility alleles of small effect might not emerge until samples of 100,000 cases and more than 100,000 controls have been collected. Another point is that organizations such as the Wellcome Trust are already assembling case samples for schizophrenia as well as control samples.

Also, I would like to clarify that I believe the remainder of genetic variation, after common variation has been taken into account, will come from some combination of rare CNVs, other rare variants such as SNPs and other types of genetic marker such as variable number of tandem repeats (VNTRs) and of course the much neglected contribution from gene-environment interactions, in which main genetic effects may be obscured.


Hirschhorn JN, Lettre G. Progress in genome-wide association studies of human height. Horm Res. 2009 Apr 1; 71 Suppl 2:5-13. Abstract

Soranzo N, Rivadeneira F, Chinappen-Horsley U, Malkina I, Richards JB, Hammond N, Stolk L, Nica A, Inouye M, Hofman A, Stephens J, Wheeler E, Arp P, Gwilliam R, Jhamai PM, Potter S, Chaney A, Ghori MJ, Ravindrarajah R, Ermakov S, Estrada K, Pols HA, Williams FM, McArdle WL, van Meurs JB, Loos RJ, Dermitzakis ET, Ahmadi KR, Hart DJ, Ouwehand WH, Wareham NJ, Barroso I, Sandhu MS, Strachan DP, Livshits G, Spector TD, Uitterlinden AG, Deloukas P. Meta-analysis of genome-wide scans for human adult stature identifies novel Loci and associations with measures of skeletal frame size. PLoS Genet. 2009 Apr 1 ; 5(4):e1000445. Abstract

Sovio U, Bennett AJ, Millwood IY, Molitor J, O'Reilly PF, Timpson NJ, Kaakinen M, Laitinen J, Haukka J, Pillas D, Tzoulaki I, Molitor J, Hoggart C, Coin LJ, Whittaker J, Pouta A, Hartikainen AL, Freimer NB, Widen E, Peltonen L, Elliott P, McCarthy MI, Jarvelin MR. Genetic determinants of height growth assessed longitudinally from infancy to adulthood in the northern Finland birth cohort 1966. PLoS Genet. 2009 Mar 1 ; 5(3):e1000409. Abstract


Comment by:  Michael O'Donovan (SRF Advisor), Nick Craddock, Michael Owen (SRF Advisor)
Submitted 9 July 2009
Posted 9 July 2009

Some commentators in their reflections take a rather negative view on what has been achieved through the application of GWAS technology to schizophrenia and psychiatric disorders more generally. We strongly disagree with this position. Below, we give examples of a number of statements that can be made about the aetiology of schizophrenia and bipolar disorder that could not be made at high levels of confidence even two years ago that are based upon evidence deriving from the application of GWAS.

1. We know with confidence that the role of rare copy number variants in schizophrenia is not limited to 22q11DS (VCFS) (reviewed recently in O'Donovan et al., 2009). We do not yet know how much of a contribution they make, but we know the identity of an increasing number of these. Most span multiple genes, so it may prove problematic, as it has in 22q11DS, to identify the relevant molecular mechanisms. However, for one locus, the CNVs are limited to a single gene: Neurexin1 (Kirov et al., 2008; Rujescu et al., 2009). Genetic findings are merely the start of the journey to a deeper biological understanding, but no doubt many neurobiological researchers have already embarked on that journey in respect of neurexin1.

2. Although we have argued in this forum that some of the major pre-GWAS findings in schizophrenia very likely reflect true susceptibility genes (DTNBP1, NRG1, etc.), we now have at least 4 novel loci where the evidence is more definitive (ZNF804A, MHC, NRGN, TCF4) (O'Donovan et al., 2008a; ISC, 2009; Shi et al., 2009; Stefansson et al., 2009) and two novel loci (Ferreira et al., 2008) in bipolar disorder (ANK3 and CACNA1C), at least one of which (CACNA1C) additionally confers risk of schizophrenia (Green et al., 2009). This is obviously a small part of the picture, but it is certainly better than no picture at all. These findings also offer a much more secure foundation than the earlier findings upon which to build follow-up studies, for example brain imaging and cognitive phenotypes (Esslinger et al., 2009), and even candidate gene studies. We would not regard the first convincing evidence that altered calcium channel function is a primary aetiological event in at least some forms of psychosis as a trivial gain in knowledge.

3. We can say with confidence that common alleles of small effect are abundant in schizophrenia, and that they contribute to a substantial part of the population risk (ISC, 2009). Identifying any one of these at stringent levels of statistical significance may be challenging in terms of sample sizes. As we have pointed out before, merging multiple datasets may indeed obscure some true associations because of sometimes unpredictable relationships between risk alleles and those assayed indirectly in GWAS studies (Moskvina and O'Donovan, 2007). Nevertheless, that many of the same alleles are overrepresented in multiple independent GWAS datasets from different countries (ISC, 2009) means that larger samples offer the prospect of identifying many more of these. This is not to say that large samples are the only approach; genetic heterogeneity may well underpin some aspects of clinical heterogeneity (Craddock et al., 2009a). However, with the exception of individual large pedigrees, it is not yet evident which type of clinical sample one should base a small-scale study on. It should also be self-evident that the analysis of multiple samples, each with a different phenotypic structure, will pose major problems in respect of multiple testing and subsequent replication. Moreover, ascertaining special samples that represent putative subtypes of the clinical (and endophenotypic) spectrum of psychosis will first require large samples to be carefully assessed and the relevant subjects extracted. Subsequently, downstream, evaluation of specific genotype-phenotype relationships will require the remainder of the clinical population to be genotyped in a suitably powered way to show that those effects are specific to some clinical features of the disorder. Increasingly, it is ascertainment and assessment that dominate the cost of GWAS studies, so it is not clear this approach will achieve any economies.

We must also remember that after a GWAS study, there remains the opportunity to look in a controlled manner for relatively specific associations in the context of the heterogeneous clinical picture. For example, we are aware of a number of papers in development that will exploit the sorts of multi-locus tests reported by the ISC to refine diagnostics, and no doubt many other applications of this will emerge in the next year or so.

Critics should bear in mind that the GWAS data are not just there for the headline genome-wide findings, but that the data will be available to mine for years to come. The findings reported to date are based on only the simplest analyses.

4. If it were the case that the thousands of SNPs of small effect were randomly distributed across biological systems, none being of more relevance to pathophysiology than another, identifying them would probably be a pointless endeavour. However, there is no reason to believe this will be the case. We have recently shown that in bipolar disorder, the GWAS signals are enriched in particular biological pathways (Holmans et al., 2009) and we also published strong evidence for a relatively selective involvement of the GABAergic system in schizoaffective disorder (Craddock et al., 2009b). We are aware of an as-yet unpublished independent sample with similar findings. We would not regard the first convincing evidence that altered GABA function is a primary aetiological event in at least some forms of psychosis as a trivial gain in knowledge.

Incidentally, it is a common misconception that the identification of risk alleles of small effect necessarily confers no useful insights into pathogenesis and possible drug targets. For example, common alleles in PPARG and KCNJ11 have been robustly shown to confer risk to Type 2 diabetes (T2D) but with odds ratios in the region of only 1.14 (of similar magnitude to those revealed by GWAS of schizophrenia). PPARG encodes the target for the thiazolidinedione class of drugs used to treat T2D. KCNJ11 encodes part of the target for another class of diabetes drug, the sulphonylureas (Prokopenko et al., 2008). Moreover, studies of novel associated variants identified in T2D GWAS in healthy, non-diabetic populations have demonstrated that for most, the primary effect on T2D susceptibility is mediated through deleterious effects on insulin secretion, rather than insulin action (Prokopenko et al., 2008). Further examples of insights into the biology of common diseases coming from the identification of loci of small effect are the implication of the complement system in age-related macular degeneration and autophagy in Crohn's disease (Hirschhorn, 2009). Already, efforts are under way to translate the new recognition of the role of autophagy in Crohn's disease into new therapeutic leads (Hirschhorn, 2009). Of course, many of the loci identified in GWAS implicate genes whose functions are as yet largely or completely unknown, and determining those functions is a prerequisite of translating those findings. Nevertheless, we believe that the greatest benefits that will accrue from the continued discovery of risk loci through GWAS will come from the assembly of that information into novel disease pathways leading to novel therapeutic targets.

5. We can say with confidence that bipolar disorder and schizophrenia substantially overlap, at least in terms of polygenic risk (ISC, 2009). As clinicians, we do not regard that knowledge as a trivial achievement.

6. We can say with confidence from studies of CNVs that schizophrenia and autism share at least some risk factors in common (O'Donovan et al., 2009). We believe that is also an important insight.

The above are major achievements in what we expect to be a long but accelerating process of picking apart the origins of schizophrenia and other psychotic disorders. We do not think that any other research discipline in psychiatry has done more to advance that knowledge in the past 100 years.

Like that of other common familial diseases, the genetics of schizophrenia and bipolar disorder is a mixed economy of common alleles of small effect and rare alleles of large and small effects, including CNVs. Those who are concerned at the cost of collecting large samples for GWAS must bear in mind that the robust identification of both types of mutation will require similarly large samples; we will just have to get used to that fact if we want to make progress. Collecting samples like this may be expensive, but as clinicians, we know those costs are trivial compared with the human and economic costs of psychotic disorders.

The question of phenotype definition is one which we have repeatedly addressed (Craddock et al., 2009a). Unquestionably, if we knew the true pathophysiological basis of these disorders, we could do better. The fact is that we don't. Given that, it must be extremely encouraging that despite the problems, risk loci can be robustly identified by GWAS using samples defined by current diagnostic criteria. Moreover, armed with GWAS data in these heterogeneous populations, additional risk genes can be identified through strategies aimed at refining the phenotype that are not constrained by the current dichotomous view of the functional psychoses. We have shown at least one way in which this might be achieved without imposing a further burden of multiple testing (Craddock et al., 2009b), and have little doubt that others will emerge. We agree that approaches to phenotyping that more directly index underlying pathophysiology are highly appealing, and will ultimately be necessary for understanding the mechanistic relationships between gene and disorder. However, the two cardinal assumptions upon which the use of endophenotypes is predicated for gene discovery are questionable. First, there is little good evidence that putative endophenotypes are substantially simpler genetically than exophenotypes (Flint and Munafò, 2007). Second, there is rarely good evidence that the current crop of popular putative endophenotypes lie on the disease pathway; indeed, there seems to be substantial pleiotropy in the genetics of complex traits, psychosis included (Prokopenko et al., 2008; O'Donovan et al., 2008b).

Finally, we reiterate that while only small parts of the heritability of any complex disorder have been accounted for, large-scale genetic approaches have been extremely successful in studies of non-psychiatric diseases (Manolio et al., 2008) and have led to substantial advances in our understanding of pathogenesis, even for diseases like Crohn's disease where there was already prior knowledge of pathogenesis from other research methods (Mathew, 2008).

Psychiatry starts from a situation in which there is no robust prior knowledge of pathogenesis for the major phenotypes. Recent findings suggest that mental illness may be the medical field that will actually benefit most over the coming years from application of these powerful molecular genetic technologies.

Craddock N, O'Donovan MC, Owen MJ. (2009a) Psychosis Genetics: Modeling the Relationship between Schizophrenia, Bipolar Disorder, and Mixed (or "Schizoaffective") Psychoses. Schizophrenia Bulletin 35(3):482-490. Abstract

Craddock N, Jones L, Jones IR, Kirov G, Green EK, Grozeva D, Moskvina V, Nikolov I, Hamshere ML, Vukcevic D, Caesar S, Gordon-Smith K, Fraser C, Russell E, Norton N, Breen G, St Clair D, Collier DA, Young AH, Ferrier IN, Farmer A, McGuffin P, Holmans PA, Wellcome Trust Case Control Consortium (WTCCC), Donnelly P, Owen MJ, O'Donovan MC. Strong genetic evidence for a selective influence of GABAA receptors on a component of the bipolar disorder phenotype. Molecular Psychiatry advanced online publication 1 July 2008; doi:10.1038/mp.2008.66. (b) Abstract

Esslinger C, Walter H, Kirsch P, Erk S, Schnell K, Arnold C, Haddad L, Mier D, Opitz von Boberfeld C, Raab K, Witt SH, Rietschel M, Cichon S, Meyer-Lindenberg A. (2009) Neural mechanisms of a genome-wide supported psychosis variant. Science 324(5927):605. Abstract

Ferreira MAR, O'Donovan MC, Meng YA, Jones IR, Ruderfer DM, Jones L, Fan J, Kirov G, Perlis RH, Green EK, Smoller JW, Grozeva D, Stone J, Nikolov I, Chambert K, Hamshere ML, Nimgaonkar V, Moskvina V, Thase ME, Caesar S, Sachs GS, Franklin J, Gordon-Smith K, Ardlie KG, Gabriel SB, Fraser C, Blumenstiel B, Defelice M, Breen G, Gill M, Morris DW, Elkin A, Muir WJ, McGhee KA, Williamson R, MacIntyre DJ, McLean A, St Clair D, VanBeck M, Pereira A, Kandaswamy R, McQuillin A, Collier DA, Bass NJ, Young AH, Lawrence J, Ferrier IN, Anjorin A, Farmer A, Curtis D, Scolnick EM, McGuffin P, Daly MJ, Corvin AP, Holmans PA, Blackwood DH, Wellcome Trust Case Control Consortium (WTCCC), Gurling HM, Owen MJ, Purcell SM, Sklar P and Craddock NJ. (2008) Collaborative genome-wide association analysis of 10,596 individuals supports a role for Ankyrin-G (ANK3) and the alpha-1C subunit of the L-type voltage-gated calcium channel (CACNA1C) in bipolar disorder. Nature Genetics 40:1056-1058. Abstract

Flint J, Munafò MR. (2007) The endophenotype concept in psychiatric genetics. Psychological Medicine 37(2):163-180. Abstract

Green EK, Grozeva D, Jones I, Jones L, Kirov G, Caesar S, Gordon-Smith K, Fraser C, Forty L, Russell E, Hamshere ML, Moskvina V, Nikolov I, Farmer A, McGuffin P, Wellcome Trust Case Control Consortium, Holmans PA, Owen MJ, O'Donovan MC and Craddock N. (2009) Bipolar disorder risk allele at CACNA1C also confers risk to recurrent major depression and to schizophrenia. Molecular Psychiatry (in press).

Hirschhorn JN. (2009) Genomewide association studies--illuminating biologic pathways. New England Journal of Medicine 360(17):1699-1701. Abstract

Holmans P, Green E, Pahwa J, Ferreira M, Purcell S, Sklar P, Owen M, O'Donovan M, Craddock N. Gene ontology analysis of GWAS datasets provide insights into the biology of bipolar disorder. The American Journal of Human Genetics 2009 Jun 17 [Epub ahead of print].

International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 2009 Jul 1 [Epub ahead of print]. Abstract

Kirov G, Gumus D, Chen W, Norton N, Georgieva L, Sari M, O'Donovan MC, Erdogan F, Owen MJ, Ropers HH, Ullmann R. (2008) Comparative genome hybridization suggests a role for NRXN1 and APBA2 in schizophrenia. Human Molecular Genetics 17(3):458-465. Abstract

Manolio TA, Brooks LD, Collins FS. (2008) A HapMap harvest of insights into the genetics of common disease. Journal of Clinical Investigation 118(5):1590-1605. Abstract

Mathew CG. (2008) New links to the pathogenesis of Crohn disease provided by genome-wide association scans. Nature Reviews Genetics 9(1):9-14. Abstract

Moskvina V and O'Donovan MC. (2007) Detailed analysis of the relative power of direct and indirect association studies and the implications for their interpretation. Human Heredity 64(1):63-73. Abstract

O'Donovan MC, Kirov G, Owen MJ. (2008a) Phenotypic variations on the theme of CNVs. Nature Genetics 40(12):1392-1393. Abstract

O'Donovan MC, Craddock N, Norton N, Williams H, Peirce T, Moskvina V, Nikolov I, Hamshere M, Carroll L, Georgieva L, Dwyer S, Holmans P, Marchini JL, Spencer C, Howie B, Leung H-T, Hartmann AM, Möller H-J, Morris DW, Shi Y, Feng G, Hoffmann P, Propping P, Vasilescu C, Maier W, Rietschel M, Zammit S, Schumacher J, Quinn EM, Schulze TG, Williams NM, Giegling I, Iwata N, Ikeda M, Darvasi A, Shifman S, He L, Duan J, Sanders AR, Levinson DF, Gejman P, Molecular Genetics of Schizophrenia Collaboration, Cichon S, Nöthen MM, Gill M, Corvin A, Rujescu D, Kirov G, Owen MJ. (2008b) Identification of novel schizophrenia loci by genome-wide association and follow-up. Nature Genetics 40:1053-1055. Abstract

O'Donovan MC, Craddock N, Owen MJ. Genetics of psychosis; Insights from views across the genome. Human Genetics 2009 Jun 12 [Epub ahead of print]. Abstract

Prokopenko I, McCarthy MI, Lindgren CM. (2008) Type 2 diabetes: new genes, new understanding. Trends in Genetics 24(12):613-621. Abstract

Rujescu D, Ingason A, Cichon S, Pietiläinen OP, Barnes MR, Toulopoulou T, Picchioni M, Vassos E, Ettinger U, Bramon E, Murray R, Ruggeri M, Tosato S, Bonetto C, Steinberg S, Sigurdsson E, Sigmundsson T, Petursson H, Gylfason A, Olason PI, Hardarsson G, Jonsdottir GA, Gustafsson O, Fossdal R, Giegling I, Möller HJ, Hartmann AM, Hoffmann P, Crombie C, Fraser G, Walker N, Lonnqvist J, Suvisaari J, Tuulio-Henriksson A, Djurovic S, Melle I, Andreassen OA, Hansen T, Werge T, Kiemeney LA, Franke B, Veltman J, Buizer-Voskamp JE; GROUP Investigators, Sabatti C, Ophoff RA, Rietschel M, Nöthen MM, Stefansson K, Peltonen L, St Clair D, Stefansson H, Collier DA. (2009) Disruption of the neurexin 1 gene is associated with schizophrenia. Human Molecular Genetics 18(5):988-996. Abstract

Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, Dudbridge F, Holmans PA, Whittemore AS, Mowry BJ, Olincy A, Amin F, Cloninger CR, Silverman JM, Buccola NG, Byerley WF, Black DW, Crowe RR, Oksenberg JR, Mirel DB, Kendler KS, Freedman R & Gejman PV. (2009) Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature doi:10.1038/nature08192. Abstract

Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, Werge T, Pietiläinen OPH, Mors O, Mortensen PB, Sigurdsson E, Gustafsson O, Nyegaard M, Tuulio-Henriksson A, Ingason A, Hansen T, Suvisaari J, Lonnqvist J, Paunio T, Børglum AD, Hartmann A, Fink-Jensen A, Nordentoft M, Hougaard D, Norgaard-Pedersen B, Böttcher Y, Olesen J, Breuer R, Möller H-J, Giegling I, Rasmussen HB, Timm S, Mattheisen M, Bitter I, Réthelyi JM, Magnusdottir BB, Sigmundsson T, Olason P, Masson G, Gulcher JR, Haraldsson M, Fossdal R, Thorgeirsson TE, Thorsteinsdottir U, Ruggeri M, Tosato S, Franke B, Strengman E, Kiemeney LA, GROUP, Melle I, Djurovic S, Abramova L, Kaleda V, Sanjuan J, de Frutos R, Bramon E, Vassos E, Fraser G, Ettinger U, Picchioni M, Walker N, Toulopoulou T, Need AC, Ge D, Yoon JL, Shianna KV, Freimer NB, Cantor RM, Murray R, Kong A, Golimbet V, Carracedo A, Arango C, Costas J, Jönsson EG, Terenius L, Agartz I, Petursson H, Nöthen MM, Rietschel M, Matthews PM, Muglia P, Peltonen L, St Clair D, Goldstein DB, Stefansson K, Collier DA & Genetic Risk and Outcome in Psychosis (GROUP). (2009) Common variants conferring risk of schizophrenia. Nature doi:10.1038/nature08186. Abstract

View all comments by Michael O'Donovan
View all comments by Nick Craddock
View all comments by Michael Owen

Related News: Largest GWAS Analysis to Date Offers Only Two New Candidate Genes

Comment by:  Kevin J. Mitchell
Submitted 9 July 2009
Posted 9 July 2009

GWAS Results: Is the Glass Half Full or 95 Percent Empty?
The publication of the latest schizophrenia GWAS papers represents the culmination of a tremendous amount of work and unprecedented cooperation among a large number of researchers, for which they should be applauded. In addition to the hope of finding new schizophrenia genes, GWAS have been described by some of the researchers involved as, more fundamentally, a stern test of the common variants hypothesis. Based on the meagre haul of common variants dredged up by these three studies and their forerunners, this hypothesis should clearly now be resoundingly rejected—at least in the form that suggests that there is a large, but not enormous, number of such variants, which individually have modest, but not minuscule, effects. There are no common variants of even modest effect.

However, Purcell and colleagues now argue for a model involving vast numbers of variants, each of almost negligible effect alone. The authors show that an aggregate score derived from the top 10-50 percent of a set of 74,000 SNPs from the association results in a discovery sample can predict up to 3 percent of the variance in a target group. Simply put, a set of putative risk alleles can be defined in one sample and shown, collectively, to be very slightly (though highly significantly in a statistical sense) enriched in the test sample, compared to controls. This is consistent across several different schizophrenia samples and even in two bipolar disorder samples. The authors go on to perform a set of control analyses that suggest that these results are not due to obvious population stratification or genotype rate effects (although effects at this level are obviously prone to cryptic artifacts).
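The aggregate score described above can be sketched in a few lines. This is only an illustration of the general idea, not the authors' pipeline: it assumes each SNP is weighted by the log of its discovery-sample odds ratio, and the function name, the odds ratios, and the genotypes are all hypothetical.

```python
import math

def polygenic_score(allele_counts, odds_ratios):
    """Weighted sum of scored-allele counts (0, 1 or 2 per SNP).

    Each SNP is weighted by the log of its discovery-sample odds ratio,
    so alleles with an odds ratio below 1 lower the score.
    """
    return sum(count * math.log(odds_ratio)
               for count, odds_ratio in zip(allele_counts, odds_ratios))

# Hypothetical discovery-sample odds ratios for three SNPs (illustrative only).
discovery_odds_ratios = [1.05, 0.97, 1.10]

# Hypothetical target-sample genotypes: copies of the scored allele per SNP.
person_a = [2, 0, 1]
person_b = [0, 2, 0]

score_a = polygenic_score(person_a, discovery_odds_ratios)
score_b = polygenic_score(person_b, discovery_odds_ratios)
print(f"score A = {score_a:.3f}, score B = {score_b:.3f}")
```

In a real analysis the sum would run over tens of thousands of SNPs, and the question is whether mean scores differ between cases and controls in the independent target sample.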

If taken at face value, what do these results mean? They imply some kind of polygenic effect on risk, but of what magnitude? The answer to that depends on the interpretation of the additional simulations performed by the authors. They argue that the risk allele set inevitably contains very many false positives, which dilute the predictive power of the real positives hidden among them. Based on this logic, if we only knew which were the real variants to look at, then the variance explained in the target group would be much greater.

To try and estimate the magnitude of the effect of the polygenic load of true risk alleles, the authors conducted a series of simulations, varying parameters such as allele frequencies, genotype relative risks, and linkage disequilibrium with genotyped markers. They claim that these analyses converge on a set of models that recapitulate the observed data and that all converge on a true level of variance explained of around 34 percent, demonstrating a large polygenic component to the genetic architecture of schizophrenia.

These simulations adopt a level of statistical abstraction that should induce a healthy level of skepticism or at least reserved judgment on their findings. Most fundamentally, they rely explicitly for their calculations of the true variance on a liability-threshold model of the genetic architecture of schizophrenia. In effect, the test of the model incorporates the assumption that the model is correct.

The liability-threshold model is an elegant statistical abstraction that allows the application of the powerful statistics of normal distributions. Unfortunately, it suffers from the fact that it has no support whatsoever and makes no biological sense. First, there is no justification for assuming a normal distribution of underlying liability, whatever that term is taken to mean. Second, as usual when it is invoked, the nature of this putative threshold is not explained, though it surreptitiously implies some form of very strong epistasis (to explain the difference in risk between someone with x liability alleles and someone else with x+1 alleles). If this model is not correct, then these simulations are fatally flawed.
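For readers unfamiliar with the model under discussion, the following sketch shows what it assumes, without taking a side on whether those assumptions hold. Liability is taken to be standard normal, and anyone above a threshold is affected. The 1 percent prevalence and the size of the liability shifts are illustrative assumptions, not values from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # liability assumed to be N(0, 1) in the population

def liability_threshold(prevalence):
    """Threshold on the liability scale above which a person is affected."""
    return STD_NORMAL.inv_cdf(1.0 - prevalence)

def risk(mean_shift, prevalence):
    """Risk for someone whose expected liability is shifted by mean_shift.

    Simplification: the spread around the shifted mean is kept at 1.
    """
    return 1.0 - STD_NORMAL.cdf(liability_threshold(prevalence) - mean_shift)

prevalence = 0.01  # assumed population prevalence, for illustration
for shift in (0.0, 0.5, 1.0):
    print(f"liability shift {shift:.1f} -> risk {risk(shift, prevalence):.4f}")
```

Because the threshold sits in the tail of the distribution, equal steps in liability produce unequal steps in risk, which is the non-additivity on the risk scale that the paragraph above refers to.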

Even if the model were correct, the calculations are far from convincing. From a starting set of 560 models, the authors arrive at seven that are consistent with the observed degree of prediction in the target samples. According to the authors, the fact that these seven models converge on a small range of values for the underlying variance explained by the markers is evidence that this value (around 34 percent) represents the true situation. What is not highlighted is the fact that the values for the actual additive genetic variance (taking into account incomplete linkage disequilibrium between the markers and the assumed causal variants) across these models range from 34 percent to 98 percent and that the number of SNPs assumed to be having an effect ranges from 4,625 to 74,062. This extreme variation in the derived models hardly inspires confidence in the authors' claim that their data strongly support a polygenic basis to schizophrenia that (1) involves common SNPs, [and] (2) explains at least one-third of the total variation in liability. (italics added)

From a more theoretical perspective, it should be noted that a polygenic model involving thousands of common variants of tiny effect cannot explain and will not contribute to the observed heightened familial relative risks. Such risk can only be explained by a variant of large effect or by an oligogenic model involving at most two to three loci (Bodmer and Bonilla, 2008; Hemminki et al., 2008; Mitchell and Porteous, in preparation). It seems much more likely that the observed predictive power in the target samples represents a modest genetic background effect, which could influence the penetrance and expressivity of rare, causal mutations. However, if the point of GWAS is to find genetic variants that are predictive of risk or that shed light on the pathogenic mechanisms of the disease, then clearly, even if such variants can be found by massively increasing sample sizes, their identification alone would not achieve or even appreciably contribute to either of these goals.


Hemminki K, Försti A, Bermejo JL. The common disease-common variant hypothesis and familial risks. PLoS ONE. 2008 Jun 18;3(6):e2504. Abstract

Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet. 2008 Jun;40(6):695-701. Abstract

View all comments by Kevin J. Mitchell

Related News: Largest GWAS Analysis to Date Offers Only Two New Candidate Genes

Comment by:  David J. Porteous, SRF Advisor
Submitted 9 July 2009
Posted 10 July 2009
  I recommend the Primary Papers

Thumbs up or down on schizophrenia GWAS?
The triumvirate of schizophrenia GWAS studies just published in Nature gives cause for thought, and bears close scrutiny and reflection. To my reading, these three studies individually and collectively lead to an unambiguous conclusion—there is a lot of genetic heterogeneity and not one individual variant of common ancient origin accounts for a significant fraction of the genetic liability. To put it another way, there is no ApoE equivalent for schizophrenia. Strong past claims for ZNF804A and others look to have fallen by the statistical wayside. Putting the results of all three studies together does appear to provide support for a long known, pre-GWAS association with HLA, but otherwise it is hard to give a strong "thumbs up" to any specific result, not least because of the lack of replication between studies. The results are nevertheless important because the common disease, common variant model, on which GWAS are based and the associated cost justified, is strongly rejected as the main contributor to the genetic variance.

The ISC proposes a highly polygenic model with thousands of variants having an additive effect on both schizophrenia and bipolar disorder. I find no fault with their evidence, but its meaning and interpretation remains speculative. Simply consider the fact that SNPs carefully selected to tag half the genome account for about a third of the variance. It follows that the lion's share has gone undetected and will, by design and limitation, remain impervious to the GWAS strategy.

Part of the GWAS appeal is that the genotyping is technically facile and it is easier to collect lots of cases than it is families, but for as long as a diagnosis of schizophrenia or BP depends upon DSM-IV or ICD-10 classification, diagnostic uncertainty will have a major effect on the true power and validity of statistical association, whether positive or negative. Indeed, the longstanding evidence from variable psychopathology amongst related individuals, the recent epidemiology evidence for shared genetic risk for schizophrenia and BP, and the further evidence supporting this from the ISC GWAS, all suggest that we should be returning more to family-based studies as a strategy to reduce genetic heterogeneity and find explanatory genetic variants. Plainly, adding ever more uncertainty through ever larger sample sizes is neither smart nor efficient.

I would certainly give the thumbs up to the full and unencumbered release of the primary data to the community as a whole, as this could usefully recoup some of the GWAS investment. It would facilitate a range of statistical and bioinformatics analyses and, who knows, there might be hidden nuggets of statistical support for independent genetic and biological studies.

View all comments by David J. Porteous

Related News: Largest GWAS Analysis to Date Offers Only Two New Candidate Genes

Comment by:  Sagiv Shifman
Submitted 11 July 2009
Posted 11 July 2009

The main question that arises from the three large genomewide association studies published in Nature is, What should we do next?

One important way forward would be to follow up the association findings in the MHC region. We need to understand the biological mechanism underlying this association. If the association signal is indeed related to infectious diseases, this line of inquiry may lead to the highly desired development of a treatment that might prevent the diseases in some cases.

One possible explanation for the association between schizophrenia and the MHC region (6p22.1) is that infection during pregnancy leads to disturbances of fetal brain development and increases the risk of schizophrenia later in life. A possible test for the theory of infectious diseases as risk factors for schizophrenia would be to study the associated SNPs in 6p22.1 in fathers and mothers of subjects with schizophrenia relative to parents of control subjects. If the 6p22.1 region is related to the tendency of mothers to be infected by viruses during pregnancy, we would expect the SNPs in 6p22.1 to be most strongly associated with being a mother to a subject with schizophrenia.

Another broader and more complicated part of the question is: What would be the best strategy for continued study of the genetic causes of schizophrenia? There shouldn't be only one way to proceed. Testing samples that are 10 times larger seems likely to lead to the identification of more genes, but with much smaller effect size. Testing the association of common variants with schizophrenia is unlikely to lead to the development of genetic diagnostic tools in the near future. If we want to understand the biology of the disease, it might be easier to concentrate our efforts on the identification of rare inherited and non-inherited variants with large effect on the phenotype. Such rare variants are easier to model in animals (relative to common variants with very small functional effect) and might even account for a larger proportion of cases.

View all comments by Sagiv Shifman

Related News: Largest GWAS Analysis to Date Offers Only Two New Candidate Genes

Comment by:  Alan Brown, Paul Patterson
Submitted 17 July 2009
Posted 17 July 2009

The three companion papers in this week's issue of Nature, in our view, support the case for investigating interaction between susceptibility genes and infectious exposures in schizophrenia. We and others have argued previously that genetic studies conducted in isolation from environmental factors, and studies of environmental influences in the absence of genetic data, are necessarily limited. Maternal influenza, rubella, toxoplasmosis, herpes simplex virus, and other infections have each been associated with an increased risk of schizophrenia, with effect sizes ranging from twofold to over fivefold. While these epidemiologic findings clearly require replication in independent cohorts, two new developments provide further support for the hypothesis. First, a growing number of animal studies of maternal immune activation have documented behavioral and brain phenotypes in offspring that are analogous to findings from clinical research in schizophrenia, and these findings are mediated in large part by specific cytokines (Meyer et al., 2009; Patterson, 2008). Second, recent evidence indicates that maternal infection is also related to deficits in executive and other cognitive functions and neuropathology thought to arise from disruptions in brain development (Brown et al., 2009a; Brown et al., 2009b).

While the MHC region contains genes not involved in the immune system, in light of the epidemiologic findings on maternal infection, it is intriguing to see that this region is once more implicated in genetic studies of schizophrenia as the importance of this region in the response to infectious insults cannot be ignored. Although it is heartening to see that the potential implications of these findings for infectious etiologies were raised in the article from the SGENE plus group, an analysis of the frequency of SNPs by season of birth falls well short of the type of research that will yield definitive findings on the relationships between susceptibility genes and infectious insults. Hence, we advocate a strategy aimed at large scale genetic analyses of schizophrenia cases using birth cohorts with infectious exposures documented from prospectively collected biological samples from the prenatal period. If the schizophrenia-related pathogenic mechanisms by which MHC-related genetic variants operate involve interactions with prenatal infection, we would expect that studies of gene-infection interaction will yield larger effect sizes than those found in these new papers. The evidence from these papers and the epidemiologic literature should also facilitate narrowing of the number of candidate genes to be tested for interactions with infectious insults, thereby ameliorating the potential for type I error due to multiple comparisons.


Meyer U, Feldon J, Fatemi SH. In-vivo rodent models for the experimental investigation of prenatal immune activation effects in neurodevelopmental brain disorders. Neurosci Biobehav Rev. 2009 Jul 1;33(7):1061-79. Abstract

Patterson PH. Immune involvement in schizophrenia and autism: Etiology, pathology and animal models. Behav Brain Res. 2008 Dec 24; Abstract

Brown AS, Vinogradov S, Kremen WS, Poole JH, Deicken RF, Penner JD, McKeague IW, Kochetkova A, Kern D, Schaefer CA. Prenatal exposure to maternal infection and executive dysfunction in adult schizophrenia. Am J Psychiatry. 2009a Jun 1;166(6):683-90. Abstract

Brown AS, Deicken RF, Vinogradov S, Kremen WS, Poole JH, Penner JD, Kochetkova A, Kern D, Schaefer CA. Prenatal infection and cavum septum pellucidum in adult schizophrenia. Schizophr Res. 2009b Mar 1;108(1-3):285-7. Abstract

View all comments by Alan Brown
View all comments by Paul Patterson

Related News: Largest GWAS Analysis to Date Offers Only Two New Candidate Genes

Comment by:  Javier Costas
Submitted 17 July 2009
Posted 17 July 2009
  I recommend the Primary Papers

Two hundred years after Darwin's birth and 150 years after the publication of On the Origin of Species, these three papers in Nature show the important role of natural selection in shaping the genetic architecture of schizophrenia susceptibility. If we compare the GWAS results for schizophrenia with those obtained for other diseases, it seems that there are fewer common risk alleles and/or lower effect sizes in schizophrenia than in many other complex diseases (see, for instance, the online catalog of published GWAS at NHGRI). This fact strongly suggests that negative selection limits the spread of susceptibility alleles, as expected due to the decreased fertility of schizophrenic patients.

Interestingly, the MHC region may be an exception. This region represents a classical example of balancing selection, i.e., the presence of several variants at a locus maintained in a population by positive natural selection (Hughes and Nei, 1988). In the case of the MHC, this balancing selection seems to be related to pathogen resistance or MHC-dependent mating choice. Therefore, the presence of common schizophrenia susceptibility alleles at this locus might be explained by antagonistic pleiotropic effects of alleles maintained by natural selection.

If negative selection limits the spread of schizophrenia risk alleles, most of the genetic susceptibility to schizophrenia is likely due to rare variants. Resequencing technologies will allow the identification of many of these variants in the near future. In the meantime, it would be interesting to focus our attention on non-synonymous SNPs at low frequency. Based on human-chimpanzee comparisons and human sequencing data, Kryukov et al. (2007) have shown that a large fraction of de novo missense mutations are mildly deleterious (i.e., they are subject to weak negative selection) and therefore they can still reach detectable frequencies. Assuming that most of these mildly deleterious alleles may be detrimental (i.e., they confer risk for disease), the authors conclude that numerous rare functional SNPs may be major contributors to susceptibility to common diseases (Kryukov et al., 2007). Similar conclusions were obtained by the analysis of the relative frequency distribution of non-synonymous SNPs depending on their probability to alter protein function (Barreiro et al., 2008; Gorlov et al., 2008). As shown by Evans et al. (2008), genomewide scans of non-synonymous SNPs might complement GWAS, being able to identify rare non-synonymous variants of intermediate penetrance not detectable by current GWAS panels.


Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L (2008) Natural selection has driven population differentiation in modern humans. Nat Genet 40: 340-5. Abstract

Evans DM, Barrett JC, Cardon LR (2008) To what extent do scans of non-synonymous SNPs complement denser genome-wide association studies? Eur J Hum Genet 16: 718-23. Abstract

Gorlov IP, Gorlova OY, Sunyaev SR, Spitz MR, Amos CI (2008) Shifting paradigm of association studies: value of rare single-nucleotide polymorphisms. Am J Hum Genet 82: 100-12. Abstract

Hughes AL, Nei M (1988) Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335: 167-70. Abstract

Kryukov GV, Pennacchio LA, Sunyaev SR (2007) Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet 80: 727-39. Abstract

View all comments by Javier Costas

Related News: GWAS Goes Bigger: Large Sample Sizes Uncover New Risk Loci, Additional Overlap in Schizophrenia and Bipolar Disorder

Comment by:  David J. Porteous, SRF Advisor
Submitted 21 September 2011
Posted 21 September 2011

Consorting with GWAS for schizophrenia and bipolar disorder: same message, (some) different genes
On 18 September 2011, Nature Genetics published the results from the Psychiatric Genetics Consortium of two separate, large-scale GWAS analyses, for schizophrenia (Ripke et al., 2011) and for bipolar disorder (Sklar et al., 2011), and a joint analysis of both. By combining forces across several consortia who have previously published separately, we should now have some clarity and definitive answers.

For schizophrenia, the Stage 1 GWAS discovery data came from 9,394 cases and 12,462 controls from 17 studies, imputing 1,252,901 SNPs. The Stage 2 replication sample comprised 8,442 cases and 21,397 controls. Of the 136 SNPs which reached genomewide significance in Stage 1, 129 (95 percent) mapped to the MHC locus, long known to be associated with risk of schizophrenia. Of the remaining seven SNPs, five mapped to previously identified loci. In total, just 10 loci met or exceeded the criteria of genomewide significance of p < 5 x 10^-8 at Stage 1 and/or Stage 2. The 10 "best" SNPs identified eight loci: MIR137, TRIM26, CSMD1, CNNM2, NT5C2 and TCF4 were tagged by intragenic SNPs, while the remaining two were at some distance from a known gene (343 kb from PCGEM1 and 126 kb from CCDC68). More important than the absolute significance levels, the overall odds ratios (with 95 percent confidence intervals) ranged from 1.08 (0.96-1.20) to 1.40 (1.28-1.52). These fractional increases contrast with the ~10-fold increase in risk to the first-degree relative of someone with schizophrenia (Gottesman et al., 2010).
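The figures in the paragraph above can be checked with simple arithmetic. The sketch below assumes the conventional rationale for the genomewide threshold (a Bonferroni correction of 0.05 over roughly one million independent tests); that rationale is an assumption added here for illustration and is not stated in the papers as quoted.

```python
# Conventional rationale for the genomewide significance threshold
# (assumption for illustration: about one million independent tests).
alpha = 0.05
independent_tests = 1_000_000
genomewide_threshold = alpha / independent_tests

# Share of Stage 1 genomewide-significant SNPs that mapped to the MHC locus.
significant_snps = 136
mhc_snps = 129
mhc_percent = 100 * mhc_snps / significant_snps

print(f"threshold = {genomewide_threshold:.1e}")
print(f"MHC share = {mhc_percent:.1f} percent")
print(f"non-MHC SNPs = {significant_snps - mhc_snps}")
```

The seven non-MHC SNPs are the ones the text goes on to discuss.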

Six of these eight loci have been reported previously, but ZNF804A, a past favorite, was noticeably absent from the "top 10" list. The main attention now will surely be on MIR137, a newly discovered locus which encodes a microRNA, mir137, known to regulate neuronal development. The authors remark that 17 predicted MIR137 targets had a SNP with a p < 10⁻⁴, more than twice as many as for the control gene set (p < 0.01), though this relaxed significance cutoff seems somewhat arbitrary and warrants further examination. The result for MIR137 immediately raises the questions: Does the "risk" SNP affect MIR137 function directly or indirectly, and if so, does it affect the expression of any of the putative targets identified here? These are fairly straightforward questions: positive answers are vital to the biological validation of these statistical associations. As has been the case for follow-up studies of ZNF804A, however (reviewed by Donohoe et al., 2010), unequivocal answers from GWAS "hits" can be hard to come by, not least because of the very modest relative risks that they confer. Let us hope that this is not the case for MIR137, but it is of passing note that for two of the eight replication cohorts, the direction of effect for MIR137 was in the opposite direction from the Stage 1 finding. Taken together with the odds ratios reported in the range of 1.11-1.22, the effect size for the end phenotype of schizophrenia may be challenging to validate functionally. Perhaps a relevant intermediate phenotype more proximal to the gene will prove tractable.

For bipolar disorder, Stage 1 comprised 7,481 cases versus 9,250 controls, and identified 34 promising SNPs. These were replicated in Stage 2 in an independent set of 4,496 cases and a whopping 42,422 controls: 18 of the 34 SNPs survived at p <0.05. Taking Stage 1 and 2 together confirmed the previous "hot" finding for CACNA1C (Odds ratio = 1.14) and introduced a new candidate in ODZ4 (Odds ratio = 0.88, i.e., the minor allele is presumably "protective" or under some form of selection). Previous candidates ANK3 and SYNE1 looked promising at Stage 1, but did not replicate at Stage 2.
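To put the replication count in context, a rough back-of-envelope check is sketched here. It asks how often 18 or more of 34 SNPs would pass p < 0.05 in Stage 2 if none of them carried a true signal. The helper name `binom_tail` is hypothetical, and treating the 34 tests as independent is an assumption made only for illustration (SNPs in linkage disequilibrium are not independent), so this is not the consortium's own analysis.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_snps = 34        # SNPs carried forward from Stage 1 (from the text)
n_replicated = 18  # SNPs with p < 0.05 in Stage 2 (from the text)
alpha = 0.05       # replication threshold (from the text)

# Assumption: under the null, each SNP passes independently with probability alpha.
expected_by_chance = n_snps * alpha
tail = binom_tail(n_snps, n_replicated, alpha)

print(f"Expected by chance: {expected_by_chance:.1f} of {n_snps}")
print(f"P(at least {n_replicated} pass by chance) = {tail:.3g}")
```

Under these simplifying assumptions, fewer than two SNPs would be expected to pass by chance, so 18 survivors is very unlikely to be a null result, even though any single SNP among them could still be a false positive.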

Finally, in a combined analysis of schizophrenia plus bipolar disorder versus controls, three of the respective "top 10" loci, CACNA1C, ANK3, and the ITIH3-ITIH4 region, came out as significant overall. This is consistent with the earlier evidence from the ISC for an overlap between the polygenic index for schizophrenia and bipolar disorder (Purcell et al., 2009). It is also consistent with the epidemiological evidence for shared genetic risk between schizophrenia and bipolar disorder (Lichtenstein et al., 2009; Gottesman et al., 2010).

What can we take from these studies? The authorship lists alone speak to the size of the collaborative effort involved and the sheer organizational task. Depending on your point of view, the fact that most of the positive findings were reported previously could be seen as valuable "replication" or as unnecessary duplication of cost and effort. Whichever way you look at it, though, just two new loci for schizophrenia and one for bipolar looks like a modest return for such a gargantuan investment. It raises the question as to whether the GWAS approach is gaining the hoped-for traction on major mental illness. Indeed, the evidence suggests that the technology tide is rapidly turning away from allelic association methods and towards rare mutation detection by copy number variation, exome, and/or whole-genome sequencing (Vacic et al., 2011; Xu et al., 2011).

Family studies are, as ever and always, of critical importance in genetics, not least to distinguish between inherited and de novo mutations. While the emphasis of GWAS has been on the impact of common, ancient allelic variation, it has become ever more obvious from both past linkage studies and from contemporary GWAS and CNV studies just how heterogeneous these conditions are, and how little note individual cases and families take of conventional DSM diagnostic boundaries. Improved genetic and other tools through which to stratify risk, define phenotypes, and predict outcomes are clearly needed. Whether such tools can be derived from GWAS data remains to be seen. It is important to remind ourselves of two things. First, case-control association studies tell us something about the average impact (odds ratio, with confidence interval) of a given allele in the population studied. In these very large GWAS, this measure of impact will approximate the European population average. The odds ratios tell us that the impact per allele is modest. More importantly in some ways, the allele frequencies also tell us that the vast majority of allele carriers are not affected. Likewise, a high proportion of cases are not carriers. In the main, they are subtle risk modifiers rather than causal variants. That said, follow-up studies may define rare, functional genetic variants in MIR137 or CACNA1C or ANK3 that are tagged by the risk allele and that have sufficiently strong effects in a subset of cases for a causal link to be made. With this new GWAS data in hand, these sorts of questions can now be addressed.
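The point about carriers and cases can be made concrete with a small worked calculation. The sketch below is illustrative only: the 1 percent prevalence, the 20 percent carrier frequency, and the carrier-versus-non-carrier odds ratio of 1.2 are assumed values chosen to resemble the range discussed above, and the function name `carrier_risks` is hypothetical. It treats carrier status as a simple yes/no exposure.

```python
def carrier_risks(carrier_freq: float, odds_ratio: float, prevalence: float):
    """Return (risk in carriers, risk in non-carriers) consistent with the
    population prevalence, solved by bisection on the non-carrier risk."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        p0 = (lo + hi) / 2                      # candidate non-carrier risk
        odds1 = odds_ratio * p0 / (1 - p0)      # carrier odds = OR x non-carrier odds
        p1 = odds1 / (1 + odds1)                # carrier risk
        if carrier_freq * p1 + (1 - carrier_freq) * p0 > prevalence:
            hi = p0
        else:
            lo = p0
    return p1, p0


# Assumed illustrative inputs, not figures taken from the PGC papers.
carrier_freq = 0.20
odds_ratio = 1.2
prevalence = 0.01

risk_carrier, risk_noncarrier = carrier_risks(carrier_freq, odds_ratio, prevalence)
share_cases_carrying = carrier_freq * risk_carrier / prevalence

print(f"Carriers who are unaffected: {1 - risk_carrier:.1%}")
print(f"Cases who are not carriers:  {1 - share_cases_carrying:.1%}")
```

With inputs of this kind, nearly all carriers remain unaffected and most cases do not carry the allele, which is the sense in which such variants act as risk modifiers rather than causal variants.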

It should also be said that there is clearly a wealth of potentially valuable information lying below the surface of the most statistically significant findings, but how to sort the true from the false associations? Should the MIR137 finding, and the targets of MIR137, be substantiated by biological analysis, then that would certainly be something well worth knowing and following up on. Network analysis by gene ontology and protein-protein interaction may yield more, but these methods need to be treated with caution when not securely anchored to a biologically validated starting point. Epistasis and pleiotropy are most likely playing a role, but even in these large sample sets, the power to obtain statistical (as opposed to biological) evidence for them is limited. All told, one is left thinking that more incisive findings have come, and will in the future come, from family-based approaches, through structural studies (CNVs and chromosome translocations), and, in the near future, whole-genome sequencing of cases and relatives.


Ripke S, Sanders AR, Kendler KS, Levinson DF, Sklar P, Holmans PA, Lin DY, Duan J, Ophoff RA, Andreassen OA, Scolnick E, Cichon S, St Clair D, Corvin A, Gurling H, Werge T, Rujescu D, Blackwood DH, Pato CN, Malhotra AK, Purcell S, Dudbridge F, Neale BM, Rossin L, Visscher PM, Posthuma D, Ruderfer DM, Fanous A, Stefansson H, Steinberg S, Mowry BJ, Golimbet V, de Hert M, Jönsson EG, Bitter I, Pietiläinen OP, Collier DA, Tosato S, Agartz I, Albus M, Alexander M, Amdur RL, Amin F, Bass N, Bergen SE, Black DW, Børglum AD, Brown MA, Bruggeman R, Buccola NG, Byerley WF, Cahn W, Cantor RM, Carr VJ, Catts SV, Choudhury K, Cloninger CR, Cormican P, Craddock N, Danoy PA, Datta S, de Haan L, Demontis D, Dikeos D, Djurovic S, Donnelly P, Donohoe G, Duong L, Dwyer S, Fink-Jensen A, Freedman R, Freimer NB, Friedl M, Georgieva L, Giegling I, Gill M, Glenthøj B, Godard S, Hamshere M, Hansen M, Hansen T, Hartmann AM, Henskens FA, Hougaard DM, Hultman CM, Ingason A, Jablensky AV, Jakobsen KD, Jay M, Jürgens G, Kahn RS, Keller MC, Kenis G, Kenny E, Kim Y, Kirov GK, Konnerth H, Konte B, Krabbendam L, Krasucki R, Lasseter VK, Laurent C, Lawrence J, Lencz T, Lerer FB, Liang KY, Lichtenstein P, Lieberman JA, Linszen DH, Lönnqvist J, Loughland CM, Maclean AW, Maher BS, Maier W, Mallet J, Malloy P, Mattheisen M, Mattingsdal M, McGhee KA, McGrath JJ, McIntosh A, McLean DE, McQuillin A, Melle I, Michie PT, Milanova V, Morris DW, Mors O, Mortensen PB, Moskvina V, Muglia P, Myin-Germeys I, Nertney DA, Nestadt G, Nielsen J, Nikolov I, Nordentoft M, Norton N, Nöthen MM, O'Dushlaine CT, Olincy A, Olsen L, O'Neill FA, Orntoft TF, Owen MJ, Pantelis C, Papadimitriou G, Pato MT, Peltonen L, Petursson H, Pickard B, Pimm J, Pulver AE, Puri V, Quested D, Quinn EM, Rasmussen HB, Réthelyi JM, Ribble R, Rietschel M, Riley BP, Ruggeri M, Schall U, Schulze TG, Schwab SG, Scott RJ, Shi J, Sigurdsson E, Silverman JM, Spencer CC, Stefansson K, Strange A, Strengman E, Stroup TS, Suvisaari J, Terenius L, Thirumalai S, Thygesen JH, Timm S, Toncheva D, van den Oord E, van Os J, van Winkel R, Veldink J, Walsh D, Wang AG, Wiersma D, Wildenauer DB, Williams HJ, Williams NM, Wormley B, Zammit S, Sullivan PF, O'Donovan MC, Daly MJ, Gejman PV. Genome-wide association study identifies five new schizophrenia loci. Nat Genet. 2011 Sep 18. Abstract

Psychiatric GWAS Consortium Bipolar Disorder Working Group, Sklar P, Ripke S, Scott LJ, Andreassen OA, Cichon S, Craddock N, Edenberg HJ, Nurnberger JI Jr, Rietschel M, Blackwood D, Corvin A, Flickinger M, Guan W, Mattingsdal M, McQuillin A, Kwan P, Wienker TF, Daly M, Dudbridge F, Holmans PA, Lin D, Burmeister M, Greenwood TA, Hamshere ML, Muglia P, Smith EN, Zandi PP, Nievergelt CM, McKinney R, Shilling PD, Schork NJ, Bloss CS, Foroud T, Koller DL, Gershon ES, Liu C, Badner JA, Scheftner WA, Lawson WB, Nwulia EA, Hipolito M, Coryell W, Rice J, Byerley W, McMahon FJ, Schulze TG, Berrettini W, Lohoff FW, Potash JB, Mahon PB, McInnis MG, Zöllner S, Zhang P, Craig DW, Szelinger S, Barrett TB, Breuer R, Meier S, Strohmaier J, Witt SH, Tozzi F, Farmer A, McGuffin P, Strauss J, Xu W, Kennedy JL, Vincent JB, Matthews K, Day R, Ferreira MA, O'Dushlaine C, Perlis R, Raychaudhuri S, Ruderfer D, Hyoun PL, Smoller JW, Li J, Absher D, Thompson RC, Meng FG, Schatzberg AF, Bunney WE, Barchas JD, Jones EG, Watson SJ, Myers RM, Akil H, Boehnke M, Chambert K, Moran J, Scolnick E, Djurovic S, Melle I, Morken G, Gill M, Morris D, Quinn E, Mühleisen TW, Degenhardt FA, Mattheisen M, Schumacher J, Maier W, Steffens M, Propping P, Nöthen MM, Anjorin A, Bass N, Gurling H, Kandaswamy R, Lawrence J, McGhee K, McIntosh A, McLean AW, Muir WJ, Pickard BS, Breen G, St Clair D, Caesar S, Gordon-Smith K, Jones L, Fraser C, Green EK, Grozeva D, Jones IR, Kirov G, Moskvina V, Nikolov I, O'Donovan MC, Owen MJ, Collier DA, Elkin A, Williamson R, Young AH, Ferrier IN, Stefansson K, Stefansson H, Thorgeirsson T, Steinberg S, Gustafsson O, Bergen SE, Nimgaonkar V, Hultman C, Landén M, Lichtenstein P, Sullivan P, Schalling M, Osby U, Backlund L, Frisén L, Langstrom N, Jamain S, Leboyer M, Etain B, Bellivier F, Petursson H, Sigurdsson E, Müller-Mysok B, Lucae S, Schwarz M, Schofield PR, Martin N, Montgomery GW, Lathrop M, Oskarsson H, Bauer M, Wright A, Mitchell PB, Hautzinger M, Reif A, Kelsoe JR, Purcell SM. Large-scale genome-wide association analysis of bipolar disorder reveals a new susceptibility locus near ODZ4. Nat Genet. 2011 Sep 18. Abstract

Lichtenstein P, Yip BH, Björk C, Pawitan Y, Cannon TD, Sullivan PF, Hultman CM. Common genetic determinants of schizophrenia and bipolar disorder in Swedish families: a population-based study. Lancet. 2009 Jan 17;373(9659):234-9. Abstract

Gottesman II, Laursen TM, Bertelsen A, Mortensen PB. Severe mental disorders in offspring with 2 psychiatrically ill parents. Arch Gen Psychiatry. 2010 Mar 1;67(3):252-7. Abstract

Donohoe G, Morris DW, Corvin A. The psychosis susceptibility gene ZNF804A: associations, functions, and phenotypes. Schizophr Bull. 2010 Sep 1;36(5):904-9. Abstract

Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, Sklar P. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009 Aug 6;460(7256):748-52. Abstract

Vacic V, McCarthy S, Malhotra D, Murray F, Chou HH, Peoples A, Makarov V, Yoon S, Bhandari A, Corominas R, Iakoucheva LM, Krastoshevsky O, Krause V, Larach-Walters V, Welsh DK, Craig D, Kelsoe JR, Gershon ES, Leal SM, Dell'Aquila M, Morris DW, Gill M, Corvin A, Insel PA, McClellan J, King MC, Karayiorgou M, Levy DL, DeLisi LE, Sebat J. Duplications of the neuropeptide receptor gene VIPR2 confer significant risk for schizophrenia. Nature. 2011 Mar 24;471(7339):499-503. Abstract

Xu B, Roos JL, Dexheimer P, Boone B, Plummer B, Levy S, Gogos JA, Karayiorgou M. Exome sequencing supports a de novo mutational paradigm for schizophrenia. Nat Genet. 2011 Jan 1;43(9):864-8. Abstract


Comment by:  Patrick Sullivan, SRF Advisor
Submitted 26 September 2011
Posted 26 September 2011
  I recommend the Primary Papers

The two papers appearing online in Nature Genetics last Sunday are truly important additions to our increasing knowledge base for these disorders. The core analyses have been presented multiple times at international meetings in the past two years.

Since then, the available sample sizes for both schizophrenia and bipolar disorder have grown considerably. If the recently published data are any guide, the next round of analyses should be particularly revealing.

The PGC results and almost all of the data that were used in these reports are available by application to the controlled-access repository.

Please see the references for views of this area that contrast with those of Professor Porteous.


Sullivan P. Don't give up on GWAS. Molecular Psychiatry. 2011 Aug 9. Abstract

Kim Y, Zerwas S, Trace SE, Sullivan PF. Schizophrenia genetics: where next? Schizophr Bull. 2011;37:456-63. Abstract


Comment by:  Edward Scolnick
Submitted 28 September 2011
Posted 29 September 2011
  I recommend the Primary Papers

It is clear in human genetics that common variants and rare variants have frequently been detected in the same genes. Numerous examples exist in many diseases. The bashing of GWAS in schizophrenia and bipolar illness indicates a lack of understanding, on the part of those who make such comments, of human genetics and where the field is. When these studies were initiated five years ago, next-generation sequencing was not available. Large samples of populations or trios or quartets did not exist. The international consortia have worked to collect such samples, which are now available for GWAS as well as for detailed sequencing studies. Before these studies began there was virtually nothing known about the etiology of schizophrenia and bipolar illness. The DISC1 gene translocation in the famous family was an important observation in that family. But almost a decade later there are still no convincing data that variants in DISC1 or many of its interacting proteins are involved in the pathogenesis of human schizophrenia or major mental illness.

Sequencing studies touted to be the Occam's razor for the field are beginning, and already, as in the past in this field, preemptive papers are appearing inadequately powered to draw any conclusions with certainty. Samples collected by the consortia will be critical to clarify the role of rare variants. This will take time and care so as not to set the field back into the morass it used to be. GWAS are basically modern public health epidemiology providing important clues to disease etiology. Much work is clearly needed once hits are found, just as it has been in traditional epidemiology. But in many fields, GWAS has already led to important biological insights, and it is certain it will do so in this field as well because the underlying principles of human genetics apply to this field, also. The primary problem in the field is totally inadequate funding by government organizations that consistently look for shortcuts to gain insights and new treatments, and forget how genetics has transformed cancer, immunology, autoimmune and inflammatory diseases, and led to better diagnostics and treatments. The field will never understand the pathogenesis of these illnesses until the genetic architecture is deciphered. The first enzyme discovered in E. coli DNA biochemistry was a repair enzyme—not the enzyme that replicated DNA—and this was discovered through genetics. The progress in this field has been dramatic in the past five years. All doing this work realize that this is only a beginning and that there is a long hard road to full understanding. But to denigrate the beginning, which is clearly solid, makes no sense and indicates a provincialism unbecoming to a true scientist.


Comment by:  Nick Craddock, Michael O'Donovan (SRF Advisor)
Submitted 11 October 2011
Posted 11 October 2011

At the start of the millennium, only two molecular genetic findings could be said with a fair amount of confidence to be etiologically relevant to schizophrenia and bipolar disorder. The first of these was that deletions of chromosome 22q11 that are known to cause velo-cardio-facial syndrome also confer a substantial increase in risk of psychosis. The second was the discovery by David St Clair, Douglas Blackwood, and colleagues (St Clair et al., 1990) that a balanced translocation involving chromosomes 1 and 11, which co-segregates with a range of psychiatric phenotypes in a single large family, was clearly relevant to the etiology of illness in that family (Blackwood et al., 2001). The latter finding has led to the conjecture, based upon a translocation breakpoint analysis reported by Kirsty Millar, David Porteous, and colleagues (Millar et al., 2000), that elevated risk in that family is conferred by altered function of a gene eponymously named DISC1. Just over a decade later, what can we now say with similar degrees of confidence? The relevance of deletions of 22q11 has stood the test of time, and indeed has strengthened, through further investigation (Levinson et al., 2011, being only one example), while the relevance of DISC1 remains conjecture. That the evidence implicating this gene is no stronger than it was all those years ago provides a clear illustration of the difficulties inherent in drawing etiological inferences from extremely rare mutations regardless of their effect size.

However, with the publication of several GWAS and CNV papers, culminating in the two mega-analyses reported by the PGC that are the subject of this commentary, one on schizophrenia, one on bipolar disorder, together reporting a total of six novel loci, very strong evidence has accumulated for approximately 20 new loci in psychosis. The majority of these are defined by SNPs, the remainder by copy number variants, and virtually all (including the rare, relatively high-penetrance CNVs) have emerged through the application of GWAS technology to large case-control samples, not through the study of linkage or families. Have GWAS approaches proven their worth? Clearly, the genetic findings represent the tip of a very deeply submerged iceberg, and it is possible that not all will stand the test of time and additional data, although the current levels of statistical support suggest the majority will do so. Nevertheless, the findings of SNP and CNV associations (including 22q11 deletions) seem to us to provide the first real signs of progress in uncovering strongly supported findings of primary etiological relevance to these disorders. Although SNP effects are small, the experience from other complex phenotypes is that statistically robust genetic associations, even those of very small effect, can highlight biological pathways of etiological (height; Lango Allen et al., 2010) and of possible therapeutic relevance (Alzheimer's disease; Jones et al., 2010). Moreover, it would seem intuitively likely that even if capturing the total heritable component of a disorder is presently a distant goal, the greater the number of associations captured, the better will be the snapshot of the sorts of processes that contribute to a disorder, and that might therefore be manipulated in its treatment. 
Thus, there is evidence that building even a very incomplete picture of the sort of genes that influence risk is an excellent method of informing understanding of pathogenesis of a highly complex disorder (or set of disorders).

As in previous GWAS and CNV endeavors, the PGC studies have required a significant degree of altruism from the hundreds of investigators and clinicians who have shared their data with little hope of significant academic credit. Moreover, where ethical approval permitted, the datasets have been made virtually open source for other investigators who are not part of the study. Sadly, this generosity of spirit is not matched in the rather curmudgeonly commentary provided by David Porteous. Rather than challenging the science or conduct of the study, it appears to us that the commentary takes the easier route of damnation by faint praise, distortion, and even innuendo.

The strongest finding, that being of association to the extended MHC region, is dismissed as "long known to be associated with risk of schizophrenia." How that knowledge was acquired a long time ago is unclear, but it cannot have been based upon data. It is true that weak and inconsistent associations at the MHC locus have been reported, even predating the molecular genetic era (McGuffin et al., 1978), but not until the landmark studies of the International Schizophrenia Consortium (2009), the Molecular Genetics of Schizophrenia Consortium (Shi et al., 2009), and the SGENE+ Consortium (Stefansson et al., 2009) have the findings been strong enough to be described as knowledge. Porteous' dismissive tone continues with the phrase "just 10 loci met ...," the word "just" being a qualifier that seems designed to denigrate rather than challenge the results. Given the paucity of etiological clues, others might consider this a good yield. The observation in which the effect sizes at the detected loci are contrasted "with the ~10-fold increase in risk to the first-degree relative of someone with schizophrenia" is so fatuous it is difficult to believe its function is anything other than to insinuate in the mind of the reader the impression of failure. Yet no one remotely aware of the expectations behind GWAS would expect that the effect sizes of any common risk allele would bear any resemblance to that of family history, the latter reflecting the combined effects of many risk alleles.

Among the most important findings of the PGC schizophrenia group were those of strong evidence for association between a variant in the vicinity of a gene encoding regulatory RNA MIR137, and the subsequent finding that schizophrenia association signals were significantly enriched (P < 0.01) among predicted targets of this regulatory RNA. Of course, like the other findings, there is room for the already very strong data to be further strengthened, but that finding alone opens up a whole new window in potential pathogenic mechanisms. Yet Porteous casually throws four handfuls of mud, dismissing the enrichment p < 0.01 as a "relaxed significance cutoff," which "seems somewhat arbitrary," and that "warrants further examination," and commenting that "it is of passing note that for two of the eight replication cohorts, the direction of effect for MIR137 was in the opposite direction from the Stage 1 finding." If Porteous feels he has the expertise to pronounce on this analysis, it would behoove him to choose his words more carefully. Since when is a P value of < 0.01 "relaxed" when applied to a test of a single hypothesis? Can he really be unaware of the longstanding convention of regarding P < 0.05 as significant in specific hypothesis testing? If he is not unaware of this, why is it generally applicable but "somewhat arbitrary" in the context of the PGC study? As for "further examination being warranted," this is true of any scientific finding, but what does he specifically mean in the context of his commentary? And why is it of "passing note" that not all samples show trends in the same direction? In the context of the well-known issues in GWAS concerning individual small samples and power, what is surprising about that? There may be simple answers to these questions, but we find it difficult to draw any conclusion other than that the choice of language is another attempt to sow seeds of doubt through innuendo rather than analysis.

The remark that "ZNF804A, a past favourite, was noticeably absent" falls well short of the standard one might expect of serious discourse. The choice of language suggests a desire to denigrate rather than analyse, and to insinuate without specific evidence that any interest in this gene should now be over. In fact, the largest study of this gene to date is that of Williams et al. (2011), which actually includes at least two-thirds of the PGC discovery dataset and is based on over 57,000 subjects, a sample almost three times as large as the mega-analysis sample of the PGC.

Porteous' overall conclusion from the two studies is "whichever way you look at it, though, just two new loci for schizophrenia and one for bipolar looks like a modest return for such a gargantuan investment." This appraisal is misleading. The PGC studies were actually relatively small investments, being based on a synthesis of pre-existing data. Since the studies use existing data, there is naturally an expectation that some of the loci identified will have been previously reported as either significant or have otherwise been flagged up as of interest, while some will be new. Overall, the return on the GWAS investment is not just the six novel loci (rather than three); it is the totality of the findings, which, as noted above, currently number about 20 loci. The schizophrenia research community should also be made aware, if they are not already, that the return on these investments is not "one off"; it is cumulative. In the coming years, the component datasets will continue to generate a return in new gene discoveries (including CNVs yet to be reported by the PGC) as they are added (at essentially no cost) to other emerging GWAS datasets being generated largely through charitable support. With the returns in the bank already, one could (and we do) argue that the investment is negligible, particularly given the cost in human and economic terms of continued ignorance about these illnesses that blight so many lives.

It is true that with so little being known compared with what is yet to be known, the biological insights that can be made from the existing data are limited. This is equally true of the common and rare variants identified so far, and we are not aware of any of the "incisive findings" that Porteous claims have already come from alternative approaches, although the emergence of strong evidence for deletions at NRXN1 as a susceptibility variant for schizophrenia through meta-analysis of case-control GWAS data (one of the extra returns on the GWAS data we referred to above) deserves that description (Kirov et al., 2009). But this is not a cause for despair; in contrast to the future promises made on behalf of other as yet unproven designs, for eyes and minds that are open enough to see, the recent papers provide unambiguous evidence for a straightforward route to identifying more genes and pathways involved in the disorder. Even Porteous has partial sight of this, since he notes that "there is clearly a wealth of potentially valuable information lying below the surface of the most statistically significant findings." What he appears unable to see is "how to sort the true from the false associations?" The answer for a large number of loci is simple. Better-powered studies based upon larger sample sizes.

We would like to add a note of caution for those who too readily denigrate case-control approaches in favor of hyping other approaches, none of which is yet as well proven a route to success. We are not against those approaches; indeed, we are actively involved in them. But we are concerned that the hype surrounding sequencing, and the generation of what we think are unrealistic expectations, will make those designs vulnerable to attack from those who seem only too keen to make premature and inaccurate pronouncements of failure, who seem desperate to derive straw from nuggets of gold. If, as we believe is likely, it turns out to be quite a few years more before sequencing studies become sufficiently powered to provide large numbers of robust findings, as for GWAS, the consequence could be withdrawal of substantial government funding before those designs have had a chance to live up to their potential. That such an outcome has already largely been achieved for GWAS in some countries might be a source of rejoicing in some quarters, but it should also send out a warning to all who broadly hold the view that understanding the genetics of these disorders is central to understanding their origins, and to improving their future management.

The recent PGC papers represent an impressive, international collaboration based upon methodologies that have a proven track record in delivering important biological insights into other complex disorders, and now in psychiatry. Given the complexity of psychiatric phenotypes, we believe it is likely that a variety of approaches, paradigms, and ideas will be essential for success, including the approaches espoused by those who believe the evidence is compatible with essentially Mendelian inheritance. Inevitably, there will be sincerely held differences of opinion concerning the best way forward, and, of course, in any area of science, reasoned arguments based upon a fair assessment of the evidence are essential. Nevertheless, given there are sufficient uncertainties about what can be realistically delivered in the short term by the newer technologies, we suggest that the cause of bringing benefit to patients will most likely be better served by humility, realism, and a constructive discussion in which there is no place for belittling real achievements, for arrogance, or for dogmatic posturing.


Blackwood DH, Fordyce A, Walker MT, St Clair DM, Porteous DJ, Muir WJ. Schizophrenia and affective disorders--cosegregation with a translocation at chromosome 1q42 that directly disrupts brain-expressed genes: clinical and P300 findings in a family. Am J Hum Genet. 2001 Aug;69(2):428-33. Abstract

International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009 Aug 6;460(7256):748-52. Abstract

Jones L, Holmans PA, Hamshere ML, Harold D, Moskvina V, Ivanov D, et al. Genetic evidence implicates the immune system and cholesterol metabolism in the etiology of Alzheimer's disease. PLoS One. 2010 Nov 15;5(11):e13950. Erratum in: PLoS One. 2011;6(2). Abstract

Kirov G, Rujescu D, Ingason A, Collier DA, O'Donovan MC, Owen MJ. Neurexin 1 (NRXN1) deletions in schizophrenia. Schizophr Bull. 2009 Sep;35(5):851-4. Epub 2009 Aug 12. Review. Abstract

Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010 Oct 14;467(7317):832-8. Abstract

Levinson DF, Duan J, Oh S, Wang K, Sanders AR, Shi J, et al. Copy number variants in schizophrenia: confirmation of five previous findings and new evidence for 3q29 microdeletions and VIPR2 duplications. Am J Psychiatry. 2011 Mar;168(3):302-16. Abstract

McGuffin P, Farmer AE, Rajah SM. Histocompatability antigens and schizophrenia. Br J Psychiatry. 1978 Feb;132:149-51. Abstract

Millar JK, Wilson-Annan JC, Anderson S, Christie S, Taylor MS, Semple CA, et al. Disruption of two novel genes by a translocation co-segregating with schizophrenia. Hum Mol Genet. 2000 May 22;9(9):1415-23. Abstract

Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, et al. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 2009 Aug 6;460(7256):753-7. Abstract

St Clair D, Blackwood D, Muir W, Carothers A, Walker M, Spowart G, et al. Association within a family of a balanced autosomal translocation with major mental illness. Lancet. 1990 Jul 7;336(8706):13-6. Abstract

Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, et al. Common variants conferring risk of schizophrenia. Nature. 2009 Aug 6;460(7256):744-7. Abstract

The Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium. Genome-wide association study identifies five new schizophrenia loci. Nat Genet. 2011 Sep 18;43(10):969-976. Abstract

Williams HJ, Norton N, Dwyer S, Moskvina V, Nikolov I, Carroll L, et al. Fine mapping of ZNF804A and genome-wide significant evidence for its involvement in schizophrenia and bipolar disorder. Mol Psychiatry. 2011 Apr;16(4):429-41. Abstract


Related News: GWAS Goes Bigger: Large Sample Sizes Uncover New Risk Loci, Additional Overlap in Schizophrenia and Bipolar Disorder

Comment by:  Todd Lencz, Anil Malhotra (SRF Advisor)
Submitted 11 October 2011
Posted 11 October 2011

It is worth re-emphasizing that efforts such as the Psychiatric GWAS Consortium do not rule out potentially important discoveries from alternative strategies such as endophenotypic approaches or examination of rare variants. Indeed, such strategies will be necessary to understand the functional mechanisms implicated by GWAS hits.

Moreover, we note that the two recently published PGC papers were not designed to exclude a role for previously identified candidate loci such as DISC1 (Hodgkinson et al., 2004), or prior GWAS findings such as rs1344706 at ZNF804A (Williams et al., 2011). For both these loci, and many others that have been proposed, meta-analysis of available samples suggests very small effect sizes (OR ~1.1), as might be expected for common variants. As noted in Supplementary Table S12 of the schizophrenia PGC paper (Ripke et al., 2011), the currently available sample size (~9,000 cases/~12,000 controls) of the discovery cohort was still underpowered to detect variants with odds ratios of 1.1, especially if they have a minor allele frequency of 20 percent or below.
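
To make the power argument concrete, here is a minimal Python sketch based on a normal approximation to an allelic (two-proportion) test. It is not the calculation behind Supplementary Table S12. The function names, the per-allele odds-ratio model, and the use of the minor allele frequency as the control allele frequency are assumptions made purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio (assumed model)."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test at significance level alpha.

    Each subject contributes two alleles, so the allele counts are 2 * n.
    The negligible contribution of the opposite tail is ignored.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    se = sqrt(p1 * (1.0 - p1) / (2 * n_cases) + p0 * (1.0 - p0) / (2 * n_controls))
    z_effect = abs(p1 - p0) / se
    z_crit = -NormalDist().inv_cdf(alpha / 2.0)
    return NormalDist().cdf(z_effect - z_crit)


if __name__ == "__main__":
    # Sample sizes and effect size quoted in the comment above.
    power = allelic_test_power(9000, 12000, control_freq=0.20, odds_ratio=1.1)
    print(f"approximate power at genomewide significance: {power:.3f}")
```

Under these assumptions, calling the function with larger sample sizes or a larger odds ratio shows how quickly power rises once either increases.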

An instructive example arises from the field of diabetes genetics. An association of a missense variant (rs1801282, Pro12Ala) in PPARG with type 2 diabetes was first reported in a sample of n = 91 Japanese-American patients (Deeb et al., 1998). Many subsequent studies failed to replicate the effect, and the initial large GWAS meta-analysis (involving >14,000 cases and ~18,000 controls; Zeggini et al., 2007) only detected the association at a p-value that would be considered non-significant by today's standard (p = 1.7 × 10^-6). Interestingly, the authors deemed the association to be confirmed, and the result was widely accepted within that field. Subsequent meta-analysis, involving twice as many subjects (total n = 67,000), finally obtained conventional genomewide levels of significance (p < 5 × 10^-8; Gouda et al., 2010).


Deeb SS, Fajas L, Nemoto M, Pihlajamäki J, Mykkänen L, Kuusisto J, Laakso M, Fujimoto W, Auwerx J. A Pro12Ala substitution in PPARgamma2 associated with decreased receptor activity, lower body mass index and improved insulin sensitivity. Nat Genet. 1998 Nov;20(3):284-7. Abstract

Gouda HN, Sagoo GS, Harding AH, Yates J, Sandhu MS, Higgins JP. The association between the peroxisome proliferator-activated receptor-gamma2 (PPARG2) Pro12Ala gene variant and type 2 diabetes mellitus: a HuGE review and meta-analysis. Am J Epidemiol. 2010 Mar 15;171(6):645-55. Abstract

Hodgkinson CA, Goldman D, Jaeger J, Persaud S, Kane JM, Lipsky RH, Malhotra AK. Disrupted in schizophrenia 1 (DISC1): association with schizophrenia, schizoaffective disorder, and bipolar disorder. Am J Hum Genet. 2004 Nov;75(5):862-72. Abstract

Williams HJ, Norton N, Dwyer S, Moskvina V, Nikolov I, Carroll L, Georgieva L, Williams NM, Morris DW, Quinn EM, Giegling I, Ikeda M, Wood J, Lencz T, Hultman C, Lichtenstein P, Thiselton D, Maher BS; Molecular Genetics of Schizophrenia Collaboration (MGS) International Schizophrenia Consortium (ISC), SGENE-plus, GROUP, Malhotra AK, Riley B, Kendler KS, Gill M, Sullivan P, Sklar P, Purcell S, Nimgaonkar VL, Kirov G, Holmans P, Corvin A, Rujescu D, Craddock N, Owen MJ, O'Donovan MC. Fine mapping of ZNF804A and genome-wide significant evidence for its involvement in schizophrenia and bipolar disorder. Mol Psychiatry. 2011 Apr;16(4):429-41. Abstract

Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, Barrett JC, Shields B, Morris AP, Ellard S, Groves CJ, Harries LW, Marchini JL, Owen KR, Knight B, Cardon LR, Walker M, Hitman GA, Morris AD, Doney AS; Wellcome Trust Case Control Consortium (WTCCC), McCarthy MI, Hattersley AT. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007 Jun 1;316(5829):1336-41. Abstract


Related News: Study Claiming Eight Types of Schizophrenia Called Into Question

Comment by:  Michael O'Donovan, SRF Advisor
Submitted 3 October 2014
Posted 3 October 2014

Comment by Michael O'Donovan, Gerome Breen, Brendan Bulik-Sullivan, Mark Daly, Sarah Medland, Benjamin Neale, Stephan Ripke, Patrick Sullivan, Peter Visscher, Naomi Wray

[Editor's note: Reprinted from PubMed Commons, without changes, under the Creative Commons attribution 3.0 license.]

In this study published on September 15, Arnedo et al. asserted that schizophrenia is a heterogeneous group of disorders underpinned by different genetic networks mapping to differing sets of clinical symptoms. As a result of their analyses, Arnedo et al. have made remarkable and perhaps unprecedented claims regarding their capacity to subtype schizophrenia. This paper has received considerable media attention. One claim features in many media reports, that schizophrenia can be delineated into "8 types". If these claims are replicable and consistent, then the work reported in this paper would constitute an important advance in our knowledge of the etiology of schizophrenia.

Unfortunately, these extraordinary claims are not justified by the data and analyses presented. Their claims are based upon complex (and we believe flawed) analyses that are said to reveal links between clusters of clinical data points and patterns of data generated by looking at millions of genetic data points. Instead of the complexities favored by Arnedo et al., there are far simpler alternative explanations for the patterns they observed. We believe that the authors have not excluded important alternative explanations – if we are correct, then the major conclusions of this paper are invalidated.

Analyses such as these rely on independence in many ways: among variables used in prediction, absence of artifactual relationships between genotypes and clinical variables, and between the methods of assessing significance and replication. Below we identify five specific areas of concern that are not adequately addressed in the manuscript, each of which calls into question the conclusions of this study.

A. Ancestry/population stratification.
Two of the three samples the authors studied (MGS and CATIE) have substantial proportions of subjects of European and African ancestry. The third sample is from southern Europe. Ancestry is an extremely well known confounder in genetic studies with a great capacity to yield false associations. Correct inference from genomic data in samples like these requires exceptional care. In the analyses they present, there is almost no mention of how this known bias was addressed or evaluation of its impact on their results. In the samples they used, their references to sets of SNPs that track together is essentially the definition of uncorrected population structure/stratification. Indeed, a central component of their statistical methodology – nonnegative matrix factorization – has been previously employed as a method for ancestry inference in the population genetics literature.

We were unsuccessful in attempts to obtain the full list of SNPs that Arnedo et al. analyzed. Instead, we evaluated the SNPs listed in Table S3 (448 SNP entries, 245 unique SNPs as SNPs could be present more than once, and 237 SNPs with valid allele frequencies in HapMap3). We computed the absolute value of the difference in allele frequencies between the CEU (northwest European) and YRI (Yorubas from Nigeria) groups for all HapMap3 SNPs passing basic quality control (688K SNPs genotyped using Affymetrix 6.0 arrays to match the MGS sample). We then contrasted the SNPs used by Arnedo et al. with all other affy6 SNPs. The Table S3 SNPs had markedly larger differences between a European and an African group. The mean for the absolute difference in allele frequency was 0.27 for the Table S3 SNPs used by Arnedo et al. versus 0.19 for all other SNPs. These highly significant differences underscore our concerns about population stratification bias.
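
The comparison described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the commenters' actual pipeline: the function names are hypothetical, and the allele frequencies in the usage example are made-up toy values rather than HapMap3 data.

```python
from statistics import mean


def mean_abs_freq_diff(freq_pop_a, freq_pop_b, snp_ids):
    """Mean absolute allele-frequency difference between two populations."""
    return mean(abs(freq_pop_a[s] - freq_pop_b[s]) for s in snp_ids)


def contrast_subset(freq_pop_a, freq_pop_b, subset):
    """Compare a SNP subset with all remaining SNPs present in both populations.

    Returns (mean difference for the subset, mean difference for the rest).
    """
    shared = set(freq_pop_a) & set(freq_pop_b)
    chosen = shared & set(subset)
    others = shared - chosen
    return (
        mean_abs_freq_diff(freq_pop_a, freq_pop_b, chosen),
        mean_abs_freq_diff(freq_pop_a, freq_pop_b, others),
    )


if __name__ == "__main__":
    # Toy frequencies (hypothetical), standing in for CEU and YRI values.
    ceu = {"snp1": 0.10, "snp2": 0.50, "snp3": 0.30, "snp4": 0.80}
    yri = {"snp1": 0.60, "snp2": 0.55, "snp3": 0.35, "snp4": 0.40}
    subset_mean, rest_mean = contrast_subset(ceu, yri, ["snp1", "snp4"])
    print(f"subset: {subset_mean:.2f}  all other SNPs: {rest_mean:.2f}")
```

A markedly larger mean for the subset than for the remaining SNPs is the pattern that raises the stratification concern.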

B. X chromosome (chrX).
We noted that 15 of 237 of the SNPs in Table S3 were on chrX (again, Table S3 contains a fraction of the SNPs used in the modeling). Inclusion of chrX SNPs will partly reflect the sex of participants. Arnedo et al. say in their supplement that they include sex as a covariate in their regressions, but they do not describe how they account for sex in their matrix factorization. For example, since males have only one copy of chrX, genotypes for males will be either 0 or 1 whereas chrX genotypes for females will be either 0, 1 or 2. This difference will be salient to clustering algorithms such as those employed by the authors, so it seems likely that some component of the clusters of individuals identified by Arnedo et al. simply reflects genotype differences between sexes rather than clinical features of schizophrenia. It is well-known in statistical genetics that the sex chromosomes require special handling, but this issue is not addressed by Arnedo et al.
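
A small simulation shows why the coding difference matters to an unsupervised method. The sketch below is hypothetical throughout: it uses simulated allele counts, a simple one-dimensional two-means clustering rather than the matrix factorization used by Arnedo et al., and arbitrary choices for the number of SNPs and the allele frequency.

```python
import random


def simulate_chrx_dosages(n_males, n_females, n_snps, allele_freq, seed=0):
    """Simulate chrX allele counts: males carry one copy (0/1), females two (0/1/2)."""
    rng = random.Random(seed)
    people = []
    for sex, copies in [("M", 1)] * n_males + [("F", 2)] * n_females:
        dosages = [
            sum(rng.random() < allele_freq for _ in range(copies))
            for _ in range(n_snps)
        ]
        people.append((sex, dosages))
    return people


def two_means_1d(values, n_iter=20):
    """Cluster numbers into a low group (label 0) and a high group (label 1)."""
    low, high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        if group0:
            low = sum(group0) / len(group0)
        if group1:
            high = sum(group1) / len(group1)
    return labels


def sex_cluster_agreement(people):
    """Fraction of people whose cluster label matches their sex (high cluster = female)."""
    totals = [sum(dosages) for _, dosages in people]
    labels = two_means_1d(totals)
    hits = sum((sex == "F") == (lab == 1) for (sex, _), lab in zip(people, labels))
    return hits / len(people)


if __name__ == "__main__":
    cohort = simulate_chrx_dosages(100, 100, n_snps=100, allele_freq=0.3)
    print(f"agreement between clusters and sex: {sex_cluster_agreement(cohort):.2f}")
```

No phenotype enters this simulation at all, so any cluster structure it recovers comes from sex alone.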

C. Linkage disequilibrium (LD).
Pairs of SNPs that are physically close in the genome are often correlated due to LD. Furthermore, in samples containing individuals with different ancestry, SNPs on different chromosomes whose allele frequencies differ between populations will appear to be correlated. These are both well-known phenomena from population genetics.
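
The second phenomenon is easy to reproduce with simulated data. In the hedged Python sketch below, two SNPs are generated independently within each of two hypothetical populations, so they are uncorrelated within either group, yet pooling the groups produces a clear correlation. The allele frequencies, sample sizes, and function names are illustrative assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def simulate_population(n, freq_snp1, freq_snp2, rng):
    """Draw genotypes (0/1/2) for two unlinked SNPs, independently of each other."""
    snp1 = [(rng.random() < freq_snp1) + (rng.random() < freq_snp1) for _ in range(n)]
    snp2 = [(rng.random() < freq_snp2) + (rng.random() < freq_snp2) for _ in range(n)]
    return snp1, snp2


def stratification_demo(n_per_pop=1000, seed=0):
    """Return correlations within population A, within population B, and pooled."""
    rng = random.Random(seed)
    a1, a2 = simulate_population(n_per_pop, 0.1, 0.1, rng)  # both alleles rare in A
    b1, b2 = simulate_population(n_per_pop, 0.8, 0.8, rng)  # both alleles common in B
    return pearson(a1, a2), pearson(b1, b2), pearson(a1 + b1, a2 + b2)


if __name__ == "__main__":
    r_a, r_b, r_pooled = stratification_demo()
    print(f"within A: {r_a:.2f}  within B: {r_b:.2f}  pooled: {r_pooled:.2f}")
```

The pooled correlation here arises purely from mixing groups with different allele frequencies, which is why sets of SNPs that "track together" in a mixed-ancestry sample need not reflect any shared biology.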

The typical size of blocks defined by high LD is on the order of 20,000 bases, but LD is far from uniform across the genome. Using a large European sample genotyped with Affymetrix 6.0 arrays, we had previously computed the locations of particularly large blocks of LD (defined using SNPs with r^2 > 0.5). The first step in the statistical methodology described by Arnedo et al. is to identify so-called "SNP sets" – sets of SNPs that travel together – which the authors believe contain some information about clinical subtypes of schizophrenia: "we first identified sets of interacting … SNPs that cluster within subgroups of individuals … regardless of clinical status" (no LD limitations were imposed). Of the 237 SNPs in Table S3 from Arnedo et al., 153 (65%) mapped to exceptionally large LD blocks larger than 100,000 bases (median 275 kb, interquartile range 165-653 kb, maximum 1.2 Mb).
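
For readers unfamiliar with the r^2 measure used to define LD blocks, the following Python sketch computes it for one pair of biallelic SNPs from phased haplotype counts. The function name and the input layout (a dictionary keyed by allele pairs, with "A"/"a" and "B"/"b" as arbitrary allele labels) are assumptions for illustration; in practice such values are computed by standard genetics software across many SNP pairs.

```python
def ld_r_squared(hap_counts):
    """r^2 between two biallelic SNPs from counts of the four haplotypes.

    hap_counts maps ("A", "B"), ("A", "b"), ("a", "B"), ("a", "b") to counts.
    """
    total = sum(hap_counts.values())
    p_ab = hap_counts[("A", "B")] / total
    p_a = (hap_counts[("A", "B")] + hap_counts[("A", "b")]) / total
    p_b = (hap_counts[("A", "B")] + hap_counts[("a", "B")]) / total
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # the disequilibrium coefficient D
    return d * d / denom


if __name__ == "__main__":
    # Toy counts: alleles A and B almost always travel on the same haplotype.
    counts = {("A", "B"): 45, ("A", "b"): 5, ("a", "B"): 5, ("a", "b"): 45}
    print(f"r^2 = {ld_r_squared(counts):.2f}")
```

Pairs with r^2 above a chosen threshold (0.5 in the analysis described above) would be treated as belonging to the same high-LD block.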

Arnedo et al. claim repeatedly that sets of SNPs that travel together are informative about clinical subtypes of schizophrenia. A more parsimonious interpretation of the SNP clusters identified by Arnedo et al. is that these SNPs represent a combination of (1) SNPs in large LD blocks and (2) SNPs whose allele frequencies differ substantially between European and African sample subsets. Indeed, matrix factorization algorithms similar to the methods employed by Arnedo et al. have been used to identify regions with long-range LD.

D. SNP selection.
Arnedo et al. conducted genetic clustering analyses on 2,891 SNPs selected on the basis of in-sample P-values from analysis of association with case-control status and selected from a total of ~700,000 SNPs. It is therefore expected that linear or non-linear combinations of these SNPs will be associated with case-control status in the same sample (their risk statistic); this is true even if the selected SNPs are not truly associated. A permutation test is used to assess the significance of the observed phenotype/genotype clustering. In this permutation test, subjects are randomly allocated to "SNP sets" but, since the SNPs were selected because they differ in allele frequency between cases and controls, this procedure does not generate a valid null distribution. As a result, the reported P-values are incorrect.

The strategy used by Arnedo et al. is an example of estimation and selection of effects in a dataset and then testing (or re-estimating) them in the same data, a common pitfall of prediction analyses. To construct a valid permutation test, the authors should have randomized case-control status in the association analysis step, selected a new set of ~3,000 SNPs and generated a distribution of their coincident test index under a truly null distribution.
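
The difference between the two kinds of permutation test can be demonstrated on purely null data. The Python sketch below is a simplified stand-in, not a reconstruction of the procedure used by Arnedo et al.: the "naive" test here shuffles case-control labels while reusing the SNPs selected on the real labels, the statistic is a simple signed allele-count score, and all names and parameter values are hypothetical.

```python
import random


def mean_diff(values, labels):
    """Mean of values in cases (label 1) minus mean in controls (label 0)."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def select_snps(genotypes, labels, k):
    """Pick the k SNPs with the largest in-sample case-control difference."""
    diffs = [(mean_diff(snp, labels), j) for j, snp in enumerate(genotypes)]
    top = sorted(diffs, key=lambda item: abs(item[0]), reverse=True)[:k]
    return [(j, 1 if d >= 0 else -1) for d, j in top]


def risk_statistic(genotypes, labels, selected):
    """Case-control difference in a signed score built from the selected SNPs."""
    scores = [
        sum(sign * genotypes[j][i] for j, sign in selected)
        for i in range(len(labels))
    ]
    return mean_diff(scores, labels)


def simulate_null(n_subjects, n_snps, seed=1):
    """Genotypes drawn independently of case-control status (no true signal)."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_subjects)]
    genotypes = [
        [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n_subjects)]
        for _ in range(n_snps)
    ]
    return genotypes, labels


def permutation_p_values(genotypes, labels, k, n_perm, seed=0):
    """Return (naive p-value, valid p-value) for the selected-SNP statistic."""
    rng = random.Random(seed)
    selected = select_snps(genotypes, labels, k)
    observed = risk_statistic(genotypes, labels, selected)
    naive_hits = valid_hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        # Naive: keep the SNPs that were chosen using the real labels.
        if risk_statistic(genotypes, shuffled, selected) >= observed:
            naive_hits += 1
        # Valid: repeat the selection step inside every permutation.
        reselected = select_snps(genotypes, shuffled, k)
        if risk_statistic(genotypes, shuffled, reselected) >= observed:
            valid_hits += 1
    return (naive_hits + 1) / (n_perm + 1), (valid_hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    genotypes, labels = simulate_null(n_subjects=60, n_snps=100)
    naive_p, valid_p = permutation_p_values(genotypes, labels, k=5, n_perm=100)
    print(f"naive p = {naive_p:.3f}, valid p = {valid_p:.3f}")
```

Because the selection step already exploits chance differences between cases and controls, the naive test tends to report a small p-value even though the simulated genotypes carry no signal, whereas repeating the selection inside each permutation restores a fair comparison.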

E. Replication.
Replication of results is a well-acknowledged strategy for generating confidence in reported findings. Arnedo et al. state that they replicated their findings in two samples but, upon closer examination, it is unclear precisely what replicated, exactly how this was done, and whether the degree of "replication" deviated from that expected by chance. It was also unclear whether the replication control samples were or were not independent from the discovery sample. Such non-independence is another common pitfall in prediction or validation analysis.

Given the remarkable claims made by Arnedo et al., it is essential that alternative explanations be excluded. Unfortunately, the authors do not provide the necessary evidence. As presented, their methodology is opaque (even to experts), meaning that their results cannot be independently validated. Arnedo et al. do not consider alternative explanations for the phenomena that they observe, such as confounding from ancestry and LD, even though these are well-known issues for the statistical methods that they employ and have been studied extensively in the statistical and population genetics literature. In addition, their multistep analysis approach is subject to multiple issues as noted above.

We believe that it is highly likely that the results of Arnedo et al. are not relevant for schizophrenia. We urge great caution in the interpretation of the results of this study.


Press release from Washington University (St Louis)

Media coverage via Google News

Nonnegative matrix factorization and ancestry inference

Pitfalls of predicting complex traits from SNPs

Using principal components analysis to identify regions with long-range LD


Comment by:  Alexander B. Niculescu
Submitted 6 October 2014
Posted 6 October 2014

Schizophrenia Subtypes: (Some) Right Ideas, (Some) Fuzzy Execution
The recent paper by Arnedo et al. (Arnedo et al., 2014) on "uncovering the hidden risk architecture of the schizophrenias" has three main ideas: 1) empirical discovery of groups of SNPs clustering with groups of schizophrenia subjects; 2) empirical discovery of groups of clinical features (what I have called in the past "phenes"; see Niculescu et al., 2006) clustering with groups of schizophrenia subjects; and 3) trying to put it all together (similar to the PhenoChipping approach put forward by myself and others in the past; see Niculescu et al., 2006).

The fact that groups of SNPs working together in networks can account for the missing heritability is not a new idea. It has been proposed before, as epistasis (Pezawas et al., 2008; Nicodemus et al., 2010) or as more complex combinatoric models integrating the environment (Patel et al., 2010; Ayalew et al., 2012). To people working in the gene expression field, which is closer to biology, it has been a given for many years, from the operon of Jacob and Monod (Jacob et al., 2005) to co-acting gene expression groups (CAGE; Niculescu et al., 2000) or co-expression networks (Zhang and Horvath, 2005; de Jong et al., 2012).

The devil is in the details of the execution, made difficult to judge by some lack of transparency about methodology and how independent the testing cohorts were. From more minor but more obvious caveats, such as SNPs being potentially in LD or potential population stratification, to more major but less obvious caveats, such as that this type of clustering will give you a fit-to-cohort effect that is dependent on the subjects used and the quality of the clinical information available on the subjects (often cursory in the large cohorts used for GWAS), things start to become fuzzy. All in all, it is too early to draw conclusions about how many subtypes of schizophrenia there are.

There are ways to mend this. First, it would be good to see converging lines of evidence scoring such as convergent functional genomics (CFG) used to prioritize SNPs and their associated genes for fit-to-disease first, prior to the clustering, as a way of preventing a fit-to-cohort effect (Niculescu and Le-Niculescu, 2010). Second, the reproducibility in completely independent, non-overlapping cohorts, of the locked panels of markers or "pheno-geno" subtypes, needs to be demonstrated unambiguously, such as was done by others in the past (Ayalew et al., 2012). Third, it is likely that schizophrenia is just one dimension of pathology, albeit a main one, in schizophrenia subjects. Combining also the dimensions of mood and anxiety will provide a better description of the clinical mental landscape (co-morbidities) (Niculescu et al., 2010) present, in fact, in these subjects and may account for some of the "missing reproducibility."


Arnedo J, Svrakic DM, Del Val C, Romero-Zaliz R, Hernandez-Cuervo H, Fanous AH, Pato MT, Pato CN, de Erausquin GA, Cloninger CR, Zwir I. Uncovering the Hidden Risk Architecture of the Schizophrenias: Confirmation in Three Independent Genome-Wide Association Studies. Am J Psychiatry. 2014 Sep 15. Abstract

Niculescu AB, Lulow LL, Ogden CA, Le-Niculescu H, Salomon DR, Schork NJ, Caligiuri MP, Lohr JB. PhenoChipping of psychotic disorders: a novel approach for deconstructing and quantitating psychiatric phenotypes. Am J Med Genet B Neuropsychiatr Genet. 2006 Sep 5; 141B(6):653-62. Abstract

Pezawas L, Meyer-Lindenberg A, Goldman AL, Verchinski BA, Chen G, Kolachana BS, Egan MF, Mattay VS, Hariri AR, Weinberger DR. Evidence of biologic epistasis between BDNF and SLC6A4 and implications for depression. Mol Psychiatry. 2008 Jul; 13(7):709-16. Abstract

Nicodemus KK, Callicott JH, Higier RG, Luna A, Nixon DC, Lipska BK, Vakkalanka R, Giegling I, Rujescu D, St Clair D, Muglia P, Shugart YY, Weinberger DR. Evidence of statistical epistasis between DISC1, CIT and NDEL1 impacting risk for schizophrenia: biological validation with functional neuroimaging. Hum Genet. 2010 Apr; 127(4):441-52. Abstract

Patel SD, Le-Niculescu H, Koller DL, Green SD, Lahiri DK, McMahon FJ, Nurnberger JI, Niculescu AB. Coming to grips with complex disorders: genetic risk prediction in bipolar disorder using panels of genes identified through convergent functional genomics. Am J Med Genet B Neuropsychiatr Genet. 2010 Jun 5; 153B(4):850-77. Abstract

Ayalew M, Le-Niculescu H, Levey DF, Jain N, Changala B, Patel SD, Winiger E, Breier A, Shekhar A, Amdur R, Koller D, Nurnberger JI, Corvin A, Geyer M, Tsuang MT, Salomon D, Schork NJ, Fanous AH, O'Donovan MC, Niculescu AB. Convergent functional genomics of schizophrenia: from comprehensive understanding to genetic risk prediction. Mol Psychiatry. 2012 Sep; 17(9):887-905. Abstract

Jacob F, Perrin D, Sánchez C, Monod J, Edelstein S. [The operon: a group of genes with expression coordinated by an operator. C.R.Acad. Sci. Paris 250 (1960) 1727-1729]. C R Biol. 2005 Jun; 328(6):514-20. Abstract

Niculescu AB, Segal DS, Kuczenski R, Barrett T, Hauger RL, Kelsoe JR. Identifying a series of candidate genes for mania and psychosis: a convergent functional genomics approach. Physiol Genomics. 2000 Nov 9; 4(1):83-91. Abstract

Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005; 4:Article17. Abstract

de Jong S, Boks MP, Fuller TF, Strengman E, Janson E, de Kovel CG, Ori AP, Vi N, Mulder F, Blom JD, Glenthøj B, Schubart CD, Cahn W, Kahn RS, Horvath S, Ophoff RA. A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes. PLoS One. 2012; 7(6):e39498. Abstract

Niculescu AB, Le-Niculescu H. The P-value illusion: how to improve (psychiatric) genetic studies. Am J Med Genet B Neuropsychiatr Genet. 2010 Jun 5; 153B(4):847-9. Abstract


Comment by:  Hakon Heimer
Submitted 8 October 2014
Posted 8 October 2014

[Editor's note: The discussion on this paper continues apace as of October 7, with replies to the critics from several of the authors of the original report by Arnedo et al. at PubMed Commons. They have submitted their original reply to SRF as well (below), and for the remaining replies and any future comments, we direct you to the discussion at PubMed.]


Comment by:  Gabriel de Erausquin
Submitted 9 October 2014
Posted 9 October 2014

On behalf of: C. Robert Cloninger, MD, PhD (Departments of Psychiatry and Genetics, Washington University School of Medicine, St. Louis, MO, USA); Igor Zwir, PhD (Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA; Department of Computer Science and Artificial Intelligence, University of Granada, Spain); Gabriel A. de Erausquin, MD, PhD (Roskamp Laboratory of Brain Development, Modulation and Repair, Department of Psychiatry and Behavioral Neurosciences, University of South Florida, Tampa, FL, USA); Dragan M. Svrakic, MD, PhD (Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA); Coral del Val, PhD (Department of Computer Science and Artificial Intelligence, University of Granada, Spain); Javier Arnedo, M.S. (Department of Computer Science and Artificial Intelligence, University of Granada, Spain); Rocio Romero-Zaliz, PhD (Department of Computer Science and Artificial Intelligence, University of Granada, Spain); Helena Hernandez-Cuervo, MD, BSc (Roskamp Laboratory of Brain Development, Modulation and Repair, Department of Psychiatry and Behavioral Neurosciences, University of South Florida, Tampa, FL, USA)

Two Distinct Perspectives and Methodological Approaches to GWAS
We expected our paper uncovering the hidden risk architecture of the schizophrenias to be controversial because it takes a fundamentally new approach to solve problems that have plagued the field of medical genetics for more than a decade without resolution (Arnedo et al., 2014). We went through rigorous peer review regarding the method with experts in bioinformatics and genetics (Arnedo et al., 2013) and then again regarding the application of our new approach to the schizophrenias (Arnedo et al., 2014). The critical comments of Breen and other colleagues of Sullivan highlight the fact that we have a fundamentally new approach with a distinct perspective and properties from the traditional method they have used for several years. It is important to understand how our novel approach differs from the traditional one in order to appreciate the opportunities it provides for the advancement of science.

First, in our novel approach, common disorders are recognized to have a complex etiology in which multiple genetic and environmental variables interact in complex ways to influence the risk of disease in an individual person. Breen and commentators are experienced in approaches to genome-wide association studies (GWAS) that allow detection of only the average (additive) effects of individual genes in groups of people. We regard the traditional group-wise approach to GWAS as overly restrictive because it is well established that genes typically function in concert with one another, resulting in substantial epistasis in schizophrenia and many other common disorders (Risch, 1990). Fitness, health, and behavior are properties of persons, not genes. Nevertheless, the traditional approach can be useful when its a priori assumptions are satisfied. The traditional and novel approaches to GWAS should be viewed as being complementary perspectives and procedures.

Second, our novel approach allows for the possibility of complex relationships between multilocus genotypes and multifaceted phenotypes. In other words, different sets of genetic polymorphisms can be associated with the same phenotype ("equi-finality" or genetic heterogeneity), and the same set of genetic polymorphisms can be associated with multiple distinct phenotypes ("multifinality" or pleiotropy). Traditional GWAS approaches, by contrast, focus only on heterogeneous groups of cases, neglecting phenotypic variability among cases. It is important to note that our novel approach does not make any a priori assumption that complexity is present, but we do allow it to emerge from the data when present, as occurred clearly in our analysis of multiple independent samples of people with the schizophrenias.

Third, we carry out person-centered analyses that specify genotypic-phenotypic relationships within each individual by using clustering methods in which subjects are one matrix dimension and the other matrix dimension is either genotypic or phenotypic information. In other words, our analyses are informative about each individual, thereby providing a basis for identifying specific causes of illness in each person as a basis for tailoring treatment in a personalized way. In contrast, the traditional GWAS considers only average effects in groups of people, making it unjustified to say anything with confidence about a specific individual. Such traditional methods have failed to produce any reliable genetic test for the diagnosis of any psychiatric disorder in an individual person. In addition, phenomena such as population stratification and linkage disequilibrium may confound interpretation of the group-wise statistics of traditional GWAS, whereas these phenomena are easily evaluated in our person-centered approach.

Fourth, our approach is entirely data-driven using machine learning and data mining procedures that are unbiased and unsupervised (i.e., no a priori assumptions are made). Such data-driven methods have been used successfully in many fields of science but not for GWAS prior to our work. In contrast, the a priori assumptions made in the traditional GWAS approach, as used by Breen and commentators, have produced only weak associations between the average effects of individual genes and the diagnosis of schizophrenia. In fact, Sullivan and others proposed the formation of the Psychiatric Genetics Consortium (PGC) to address the problem of weak and inconsistent associations. Unfortunately, even large samples still produce only weak associations (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). In addition, even large collections of subjects have encountered what is called the "missing heritability problem" in medical genetics: most of the variability in risk for the schizophrenias has remained unexplained. For example, the resemblance of monozygotic co-twins of people with schizophrenia is much greater than can be explained by the average effects of individual genes, indicating that multiple genes act in concert to influence risk (Risch, 1990). Whereas the heritability of schizophrenia is estimated to be 81 percent from twin studies, only about 25 percent of the variability has been explained by traditional GWAS.

In contrast, by applying our novel approach to GWAS we observed that sets of single nucleotide polymorphisms (SNPs) allow identification of individuals at very high risk (70 percent or more) and replicated the findings consistently in three independent samples. Our results can explain much more about disease risk than traditional group-wise approaches, so we are not surprised that such strong findings may come as a shock to those who have become accustomed to the weak associations identified by traditional GWAS. Again, we expected that this would be very controversial, but we're optimistic that this fundamentally new approach will open up many new opportunities for people interested in medical genetics. Our results were so unexpected that peer-reviewers demanded replication in independent samples before acceptance. Even we were delighted with the strong replication: 81 percent of 42 SNP sets associated with 70 to 100 percent risk of schizophrenia replicated almost exactly across three independent samples, and different SNP sets were associated with distinct phenotypic syndromes in which the gene products suggest possible pathways by which the functions and expression of the genes in the brain may explain the different clinical features of individual patients.

In summary, traditional approaches to GWAS focus on the average effects of individual genes in groups of people, whereas our novel approach focuses on the interactive effects of groups of genes in an individual person. Consequently the differences between these two approaches have profound consequences for the way they view and handle phenomena like linkage disequilibrium, population stratification, and X linkage. Unfortunately, Breen and colleagues have not adequately appreciated the profound differences between the traditional methods of GWAS with which they are familiar and the novel approach we have developed. As a result, the criticisms they made reflect their concerns about problems that regularly occur when a traditional group-wise approach is implemented, but these concerns may have minimal pertinence to our person-centered approach and findings.

The Facts About Ancestry and Population Stratification
Breen and commentators expressed concern that our findings may be an artifact. As scientists we are committed to trying to disconfirm findings we have made, no matter how strong the existing evidence may be. Findings about association need to be resolved experimentally at the molecular level not only for our findings but also for the findings of other published GWAS, which we plan to do in the near future. We have previously considered the variables that concern Breen and commentators carefully, but did not report these observations in detail for two reasons. First, their impact was empirically negligible for our approach, as we will describe, and we prioritized space to the variables that were most significant. Second, although the phenomena that concern Breen and commentators are often a serious problem for traditional group-wise approaches to GWAS, they are not problematic for our novel approach because it directly tests the association between genotypic variability and phenotypic variability within individuals after deconstruction of the observed or hidden structure of the population. We will describe the facts about each of these phenomena and explain why these criticisms fall short of explaining our findings.

Breen and commentators expressed concern that we did not take the necessary steps to correct for population stratification bias in the way they would have done using their traditional group-wise statistical approach. They suggest that the clusters we identified simply reflect "SNPs whose allele frequencies differ substantially between European and African sample subsets," and go so far as to claim that the SNP sets we uncovered may not be relevant to schizophrenia at all. However, for that claim to be valid, ethnicity must have a strong influence on the risk and symptoms of schizophrenia, but that requirement is unlikely to be satisfied based on previous observations and was not found in our data.

We did consider both sex and ancestry as covariates in the pre-selection of SNPs with at least loose association with schizophrenia. This pre-selection was performed to reduce the large search space using the logistic association function included in the PLINK software suite (Purcell et al., 2007). Our analysis was performed in this way to be compatible with the supplementary tables reported by Shi et al. (2009) for African Americans (AA), European Americans (EA), and individuals of mixed African and European ancestry (AA-EA). The most important fact about ethnic stratification is that there were multiple examples of SNP sets containing varying mixes of subjects from different subpopulations for each disease subtype in each of our three independent samples of subjects. For example, in the Molecular Genetics of Schizophrenia (MGS) sample, the SNP set 22_11 was represented by 48 percent AA and 52 percent EA, SNP set 21_8 by 55 percent AA and 45 percent EA, SNP set 31_22 by 53 percent AA and 47 percent EA, SNP set 54_51 by 79 percent AA and 21 percent EA, and SNP set 71_55 by 52 percent AA and 48 percent EA.

In fact, all the SNP sets that appeared to be ethnically stratified (i.e., contained mostly AA or EA subjects in the MGS sample, such as 56_30 or 42_37) replicated their association with specific phenotypic indicators of different classes of schizophrenia in subjects of another ethnicity in CATIE or in the Portuguese sample. Although concerns about ethnic stratification may be valid elsewhere, ethnic stratification had little impact on our results and cannot explain the robust association of specific SNP sets with specific phenotypic sets regardless of ethnicity or sample. These observations show the great utility of detailed consideration of phenotypic variability in individual people in our approach, compared to the sensitivity to confounding by population stratification in traditional GWAS when heterogeneous phenotypes are lumped together indiscriminately as cases. The concerns of Breen and commentators about ethnic stratification point out a limitation of traditional group-wise GWAS that is averted by our novel approach. We thank them for drawing attention to another strength of our approach, one that we did not have enough space to report previously. We will discuss population stratification in more detail together with our reply about linkage disequilibrium (see posting 5), and discuss significance testing following our comments on replication in later sections of our reply to Part 2 of their comments (see posting 6).

The Facts About Gender and the X Chromosome
Breen and commentators expressed concern about gender effects in our results. Traditional GWAS focuses on average effects in heterogeneous groups, but our novel approach focuses on uncovering genotypic-phenotypic relationships in individuals regardless of their gender. Breen and commentators were concerned about the possible bias of results from 15 SNPs on the X chromosome among 245 SNPs in high-risk SNP sets. However, a simple test based on the number of chromosomes shows that 15 SNPs cannot substantially confound the results. It is true that three of our 42 high-risk SNP sets have some SNPs on the X chromosome, but when this is considered in context along with the remaining 39 SNP sets, the influence of gender was insignificant by a Kolmogorov test. All SNP sets have consistent associations with distinct phenotypic sets regardless of gender. The effects of gender and location of genetic variants on the X chromosome have a negligible influence on our findings. In fact, the small number of SNPs on the X chromosome in SNP sets at high risk for schizophrenia shows that our person-centered method does not select SNPs that are in LD indiscriminately; the X chromosome has many highly conserved sets of epistatic genes in LD that influence gender and brain function (Graves, 2010), but these are not overrepresented in our SNP sets at high risk for schizophrenia. We thank Breen and commentators for calling attention to another finding that demonstrates that our method for identifying SNP sets is highly selective for particular phenotypes.
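The text above does not spell out the "simple test based on the number of chromosomes." One possible reading, sketched here purely for illustration, is to ask whether 15 of 245 SNPs on the X chromosome is more than would be expected if SNPs fell on each of the 23 chromosome types with equal probability. The uniform 1/23 probability is an assumption made for this sketch, not a figure from the published analysis, and the check addresses over-representation only, not confounding.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_snps = 245   # SNPs in high-risk SNP sets (count quoted in the text)
n_x = 15       # SNPs located on the X chromosome (count quoted in the text)
p_x = 1 / 23   # ASSUMPTION: each of the 23 chromosome types is equally likely

expected_x = n_snps * p_x
p_value = binomial_upper_tail(n_x, n_snps, p_x)
print(f"expected on X: {expected_x:.1f}, observed: {n_x}, "
      f"P(X >= {n_x}) = {p_value:.3f}")
```

A more careful version would weight each chromosome by the number of genotyped SNPs it carries on the array rather than treating chromosomes as equally likely.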

The Challenge of Understanding and Accepting a Change in Perspective
Is our approach to the concurrent analysis of possibly complex genotypic-phenotypic relationships a fruitful new approach without the limiting assumptions of standard GWAS, or are our observations really artifactual in ways that have been overlooked by us and by multiple sets of peer reviewers with relevant expertise about these novel methods? The American Journal of Psychiatry gave us generous space for the published article, including clinical vignettes with associated genotypic information to help people see what we have done even if the technical details of the statistical procedures may seem obscure when you first start looking at complex genotype-phenotype relationships through the illuminating lens of sophisticated machine-learning and data mining procedures. We expected that there would be widespread interest and scrutiny of this new data-driven approach with less restrictive assumptions, so we prepared an extensive online supplement specifying procedures, all components of the sets of SNPs and clinical variables, as well as a detailed analysis of the associated gene products, their functions, and disease associations.

The full list of SNPs used in our analysis is being made available for others to continue to test. We believe in transparency and collegiality as key ingredients in the advance of science because they are essential for the spirit of empiricism that our data-driven method emphasizes. The precise procedures for reproducing the list of SNPs were already detailed in our supplemental information and should be reproducible by experienced investigators. We will continue to consider reasonable requests for assistance from qualified investigators.

Breen and commentators expressed their concern that the "methodology is opaque (even to experts), meaning that their results cannot be independently validated." First of all, the complexity of a method does not invalidate the approach: complex methods may be necessary to deconstruct and understand complex processes. We cannot continue to look for hidden relationships with methods that do not shine light where it is needed. That said, the manuscript was exhaustively evaluated under a strict peer review process, which included a separate report from an independent statistician. Because of their many insightful comments, there is no doubt that the referees understood the method and provided recommendations that we conscientiously addressed in the resubmission process. Moreover, the PGMRA method utilized in this work was also evaluated by expert reviewers in bioinformatics and genetics for the journal Nucleic Acids Research (Arnedo et al., 2013). The method is well-described but does require relevant expertise beyond what is required for traditional GWAS. Fortunately, we have made a web-server application of the method publicly available as a service to the field. PGMRA is applicable to a wide variety of analyses besides GWAS, including brain imaging and related methods for uncovering order in complex hidden relationships, which may help to further characterize the pathway from genotype to phenotype more objectively than can be done by categorical diagnoses or symptom inventories in samples so large that costs become prohibitive for thorough assessment. We know that many are increasingly criticizing overspecialization in the fields of science, but the neglect of strictly data-driven techniques from machine-learning and data-mining that do not require restrictive a priori assumptions may well be precisely what has prevented us from understanding the complex genotypic-phenotypic architecture of common disorders like the schizophrenias.

We understand that our new approach is challenging long-held assumptions and that there may be a desire by some to put the genie back in the bottle, but we feel that looking at the complexity of the schizophrenias is a necessary evolution for the field; it is an evolution whose time has come and is currently transforming other fields of science and genetics. There is overwhelming evidence across multiple disciplines that living systems and psychosocial behavior are simply too complex and interactive to ignore the real underlying complexity. Nonetheless, we were a bit surprised to see this discussion in a public forum that is not peer reviewed. We would rather have thoughtful constructive consideration of the scientific merits of alternative approaches, including their fundamental philosophical differences in perspective and goals, as well as scientific differences in assumptions and procedures. One of the major obstacles to evaluating GWAS is that it can be difficult or impossible for scientists in many fields to evaluate complex technical procedures with which they are unfamiliar. The challenge of changing one's perspective can be great and feel counterintuitive, as physicists experienced more than a century ago when quantum mechanics called into question our more natural inclination to a Newtonian perspective. That is why we feel it would have been more constructive to have neutral review by people with relevant expertise in many aspects of methods that span bioinformatics, statistics, genomics, and phenomics, all of which are needed to adequately judge the strengths and weaknesses of a novel approach like ours. Even people with extensive experience in traditional approaches to GWAS may not be sufficiently knowledgeable about these well-tested, but relatively new, machine-learning and data-mining techniques that have allowed us to develop a new, and, we hope, more generative approach to GWAS.

Nevertheless, as scientists we are dedicated to identifying and learning how to move our fields of inquiry forward in order to better understand the underlying mechanisms of disease and to identify effective personalized treatments for complex disorders. We have found it is crucial to pay balanced attention to both phenotypic variability and genotypic variability if we are ever to describe the complex development of common and complex medical disorders like the schizophrenias. We do not feel that this public forum is the best place to have this discussion with Breen and colleagues, but here too we may have a philosophical difference. That said, because they have chosen this forum to voice their criticism, we feel it is important that we take the time to address the facts and give people a broader context so that they can understand the arguments and our responses to their concerns. We feel that this is more of a misunderstanding and a miscommunication due to a lack of a common scientific and philosophical approach, and we hope that with time we will find more common ground. Ultimately, the data will settle any dispute.

The Facts About Linkage Disequilibrium
Breen and commentators also expressed concern that our SNP sets may be merely artifacts of blocks of markers in linkage disequilibrium (LD). LD is strictly defined as the non-random association of alleles of neighboring polymorphisms derived from single ancestral chromosomes, but some broad measures of LD extend the concept to include co-variation of polymorphisms that are not linked, including even associations among genetic variants on separate chromosomes (Reich et al., 2001). Many variables influence co-variation of polymorphisms, including demographic variables (admixture, population size, migration, ancestral population bottlenecks), selection (including epistasis), and variation in recombination rates in different parts of the genome (Slatkin, 2008; Wiehe and Slatkin, 1998). Consequently LD is a serious problem for the traditional group-wise statistical approach of traditional GWAS, so care is taken to analyze groups with distinct ancestries separately in order to help disentangle different causes of association. However, in our novel person-centered approach, the identification of subpopulations is an intrinsic aspect of identifying the genotypic-phenotypic architecture. We identify sets of variables that naturally cluster within individual subjects as measured by covariance of polymorphisms or phenotypic traits within particular subgroups of individuals in a population. We identify SNP sets and phenotypic sets independently of one another and then test how these independently identified sets fit together like a lock and a key. In a strictly data-driven manner the hidden structure of the overall population is decomposed into subpopulations of subjects to allow valid tests of genotypic-phenotypic association despite admixture in the total population from which SNP sets and phenotypic sets are extracted (Pritchard and Donnelly, 2001).
We allow the possibility that some constituent SNPs of a particular set may be associated (in LD) as adventitious hitchhikers that are closely linked or may be epistatic sets that are functionally adaptive and maintained by selection pressure even though they are unlinked (Koch et al., 2013). However, being linked (co-localized) or in LD is neither a necessary nor a sufficient condition for being a constituent in a SNP set: set membership depends on co-variation of polymorphisms in particular subpopulations of individuals whether the genetic variants are in LD in the total population or not. LD is actually one way that epistatic sets of genetic variants can be maintained in functionally adaptive blocks if the epistatic selection is strong, but most interactive sets of genetic variants are not in LD. Accordingly, we uncover constituents of SNP sets regardless of their LD status as candidates for functionally adaptive epistatic sets. Then we measure their potential functional interaction by testing for their differential association with phenotypic variability. We also consider the known function of the genes and regulatory sites as part of our analysis of the complex pathway from genotypic networks to distinct clinical syndromes. Thus we jointly utilize genotypic, biological, and phenotypic information as part of an integrated systems analysis that allows for observed or hidden stratification in the total population.
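For readers less familiar with how pairwise LD is usually quantified, the standard coefficient D and the squared correlation r² between two biallelic loci can be computed from haplotype and allele frequencies. The following is a minimal sketch of those textbook definitions; the function name and example frequencies are illustrative and are not taken from the published analysis.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b                       # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)  # product of allele variances
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Illustrative frequencies only (not real data).
d_indep, r2_indep = ld_stats(p_ab=0.12, p_a=0.3, p_b=0.4)  # alleles independent
d_full, r2_full = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3)     # alleles always co-occur
print(d_indep, r2_indep, d_full, r2_full)
```

Note that these statistics are defined for any pair of loci, linked or not, which is why broad usage of "LD" can include variants on separate chromosomes.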

In addition to this fundamental difference in conceptual and procedural approach to LD, the concerns of Breen and commentators about our findings are simply unfounded empirically. In total, approximately two-thirds of the SNPs in high-risk sets map to regions that are so far apart in genomic distance (greater than 100,000 base pairs) that they are highly unlikely to be in LD. We found that nine of 42 high-risk SNP sets have some SNPs located on different chromosomes. These facts indicate that the identified SNP sets are not the result of particular genomic constraints such as LD or being within the region of the same gene. In any case, the presence of LD would not explain or invalidate the association of groups of SNPs within a particular SNP set with a particular phenotypic set. For example, one of our SNP sets maps exclusively to SNPs upstream of the NTRK3 gene, which was also found to be strongly associated with schizophrenia by standard GWAS techniques in work published by the authors of the commentary. In addition, SNPs from another SNP set map inside the same gene. Each of these SNP sets involving different components of the NTRK3 gene is associated with different symptoms. Although LD is viewed as a statistical problem for traditional GWAS, in our person-centered approach it is viewed as the result of adaptive mechanisms that can conserve the functional connectivity of epistatic sets of genetic variants, thereby contributing to the differential development of individuals in subpopulations. The functional adaptation facilitated by gene-gene interactions is fundamentally important for healthy development of individuals and for the evolution of populations, as described in Sewall Wright's classical work on complex adaptive systems and evolution (Wright, 1982).
Our concurrent consideration of the functions of gene products and associations between different genotypic networks with specific phenotypic syndromes precludes any suggestion that the highly replicable effects we observed are artifacts.
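The distance argument above can be checked mechanically for any SNP set. The sketch below is an illustration under stated assumptions, not the procedure used in the published analysis: it counts pairs of SNPs that are either on different chromosomes or more than 100,000 base pairs apart, using that figure as the threshold quoted in the text. The pairwise formulation, the function names, and the example coordinates are all hypothetical.

```python
from itertools import combinations

LD_WINDOW_BP = 100_000  # distance threshold quoted in the text


def unlikely_in_ld(snp_a, snp_b, window=LD_WINDOW_BP):
    """True if two SNPs are on different chromosomes or more than `window` bp apart."""
    chrom_a, pos_a = snp_a
    chrom_b, pos_b = snp_b
    return chrom_a != chrom_b or abs(pos_a - pos_b) > window


def summarize_snp_set(snps):
    """Count distant pairs and report whether the set spans several chromosomes."""
    pairs = list(combinations(snps, 2))
    far_pairs = sum(unlikely_in_ld(a, b) for a, b in pairs)
    chromosomes = {chrom for chrom, _ in snps}
    return {
        "n_pairs": len(pairs),
        "far_pairs": far_pairs,
        "spans_chromosomes": len(chromosomes) > 1,
    }


# HYPOTHETICAL SNP set given as (chromosome, base-pair position); not real data.
example_set = [("15", 88_100_000), ("15", 88_150_000),
               ("15", 88_400_000), ("2", 5_000_000)]
summary = summarize_snp_set(example_set)
print(summary)
```

Physical distance is only a proxy: a fuller check would compute r² for each pair within each ancestry group, since long-range LD can exist beyond any fixed window.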

Breen and commentators have also suggested that matrix factorization algorithms similar to the methods employed by us have been used to identify regions with long-range LD. This is certainly true and is not a problem in itself. Long-range LD is an indicator of functional connectivity that is not adequately explained by physical proximity, so it is included in what we want to detect in order to account for gene-gene interactions thoroughly (Wu et al., 2010). Matrix factorization methods like ours have been used for most of the current software applications in data-mining, including a wide variety of biomedical problems (Zwir et al., 2005; Zwir et al., 2005; Romero-Zaliz et al., 2008; Harari et al., 2010), facial recognition (Lee and Seung, 1999), gene expression (Mejia-Roa et al., 2008; Pascual-Montano et al., 2006; Tamayo et al., 2007), and other complex problems (Cichocki, 2009). There is no reason to avoid the use of this powerful method for pattern recognition within fuzzy data sets for uncovering hidden order within the complex association of genotypic and phenotypic variables that characterize complex medical disorders.
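To give a concrete sense of what a matrix factorization does, the sketch below implements non-negative matrix factorization with the multiplicative updates of Lee and Seung (1999), cited above, in plain Python. It is offered as one representative technique on a toy subjects-by-features matrix, and is not the PGMRA implementation; the toy matrix, rank, and iteration count are assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


# TOY matrix (rows = subjects, columns = features) with two hidden blocks.
toy = [[1, 1, 0, 0],
       [2, 2, 0, 0],
       [0, 0, 1, 1],
       [0, 0, 3, 3],
       [1, 1, 1, 1]]
w, h = nmf(toy, rank=2)
print("reconstruction error:", frobenius_error(toy, w, h))
```

Each row of H can be read as a group of co-varying features and each row of W as a subject's loading on those groups, which is the sense in which the factorization exposes hidden sub-structure.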

The Facts About Replication and Significance Testing of SNP Selection
Breen and commentators expressed concern about our replication process. Of course in traditional GWAS, replication has always been a serious problem, which is the basis for the rationale of PGC to carry out meta-analysis of large collections of samples despite their heterogeneity and limited phenotypic description. It was most challenging for us to identify samples with adequate clinical description to apply our novel approach, but the reward was in identifying strong effects that replicated consistently across three samples, including the Portuguese Islands study that used the same diagnostic instrument in a specific ethnic sample. The samples were independently recruited and independently analyzed, as we stated clearly in the published report. SNP sets, phenotypic sets and associations were separately calculated for the three samples to avoid weighted or biased aggregations. Then, we used a well-known co-clustering test based on the hypergeometric distribution to establish the replicability of results from one sample in the other. This test has been used widely in molecular biology (Zwir et al., 2005; Zwir et al., 2005; Tavazoie et al., 1999), and as a general strategy for validating clusters. For example, it has also been implemented in software packages such as TIBCO/Spotfire. Thus we feel that the concerns expressed by Breen and commentators about replication are overstated and empirically unfounded. Again, the strength of this new approach is that it allows us to avoid some of the major problems that plague traditional GWAS approaches.
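The idea behind a hypergeometric co-clustering test can be sketched briefly: given two clusters drawn from a common pool of items, it asks how likely an overlap at least as large as the observed one would be by chance. The code below is a minimal illustration of that tail probability; the function name and the example counts are hypothetical and are not values from the published report.

```python
from math import comb


def hypergeom_upper_tail(overlap, population, size_a, size_b):
    """P(X >= overlap) when cluster B (size_b items) is drawn without
    replacement from `population` items, of which size_a belong to cluster A."""
    total = comb(population, size_b)
    max_k = min(size_a, size_b)
    favourable = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, max_k + 1)
    )
    return favourable / total


# HYPOTHETICAL counts: 50 items in the shared pool, a cluster of 10 found in
# one sample, a cluster of 12 found in another, and 8 items shared by both.
p_overlap = hypergeom_upper_tail(overlap=8, population=50, size_a=10, size_b=12)
print(f"P(overlap >= 8 by chance) = {p_overlap:.2e}")
```

When many cluster pairs are compared, the resulting p-values would normally be corrected for multiple testing.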

Breen and commentators also expressed their concern about the use of a permutation test, claiming that "because SNP sets differ in allele frequency between cases and controls, this procedure does not generate a valid null distribution." The permutation test was used not to establish the significance of the SNP sets, which was evaluated by the SKAT method (Wu et al., 2010), but rather to test the validity (and approximate probability) of the association between SNP sets and symptom sets. Controls were not used in this test at all, as they have no symptoms of psychosis. Moreover, these symptoms were not even evaluated in the reported inventories. The misunderstanding of Breen and commentators is probably due to a lack of familiarity with this new statistical procedure, which highlights the previously discussed difficulties people have when first trying to understand a novel approach.
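A permutation test of association between a SNP set and a symptom set, restricted to cases as described above, can be sketched as follows. This is a generic illustration, assuming that each case is simply flagged as belonging or not belonging to each set and that the test statistic is the number of cases in both; the published procedure may differ in its statistic and details, and all data shown are hypothetical.

```python
import random


def permutation_p_value(in_snp_set, in_symptom_set, n_perm=2000, seed=0):
    """Permutation p-value for the overlap between two membership vectors.

    Symptom-set labels are shuffled across cases, which breaks any link with
    SNP-set membership while preserving the size of both sets.
    """
    rng = random.Random(seed)
    observed = sum(g and s for g, s in zip(in_snp_set, in_symptom_set))
    shuffled = list(in_symptom_set)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        stat = sum(g and s for g, s in zip(in_snp_set, shuffled))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# HYPOTHETICAL data for 40 cases (no controls are involved in this test).
in_snp_set = [True] * 12 + [False] * 28
in_symptom_set = [True] * 10 + [False] * 2 + [True] * 2 + [False] * 26
observed, p_perm = permutation_p_value(in_snp_set, in_symptom_set)
print(f"observed overlap = {observed}, permutation p = {p_perm:.4f}")
```

Because only cases enter the shuffle, allele-frequency differences between cases and controls play no role in the null distribution generated this way.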

We appreciate the opportunity to clarify the fundamental differences between the assumptions and goals of traditional GWAS and our novel approach that addresses the complexity of common disorders with sophisticated and well-validated machine-learning and data-mining methods. We hope that the profound differences between the approaches with which Breen and colleagues are familiar and those developed by us will stimulate greater understanding of the challenges faced by the fields of psychiatric and medical genetics. We recognize that this new approach will cause a period of reexamination of standard methodology in this field, but every major advance in genetics, and in all of science for that matter, has always required flexibility and creative thinking. There are always things that we can improve upon in any method, and we recognize that many incremental improvements are essential for the advance of science.

We have put forth a new data-driven method that allows the uncovering of complex genotypic-phenotypic relations when they are present without imposing this as an a priori assumption. The relationships we uncovered are in fact highly complex, which allowed us to identify individuals at high risk and to associate specific SNP clusters with specific clinical syndromes despite the presence of extensive pleiotropy and heterogeneity. This approach, like all those that have preceded it, is undoubtedly imperfect and will also require refinement and may ultimately give way to yet another approach that will explain more. Such methodological evolution is nothing more than the typical course of advancement in science. We hope that these exciting developments will lead to new ways to push the boundaries of accepted science, and help us to question prior assumptions that restrict our understanding of all the information embedded in data.

If this discussion has shown us nothing else, it is that this process of questioning and reflection has already begun. Ultimately, beyond all of the technical issues, our main goal is to help those in need. With schizophrenia, we know the need is great from the tremendous outpouring of requests for guidance and help that we have received, and we know that there are many people with other diseases who may benefit from our new approach. We can all be comforted knowing that our debate can bring us closer to doing what we are really here to do—that is, helping those suffering from debilitating diseases and finding ways to promote their health and well-being. Whatever path leads us there is worth considering. So let us not permit our philosophical or scientific differences to prevent us from allowing for a sufficient diversity in our tactics, because we never know what path will lead us toward our common goals of improving health and reducing the burden of disease.

Arnedo J, Svrakic DM, Del Val C, Romero-Zaliz R, Hernandez-Cuervo H, Fanous AH, Pato MT, Pato CN, de Erausquin GA, Cloninger CR, Zwir I. Uncovering the Hidden Risk Architecture of the Schizophrenias: Confirmation in Three Independent Genome-Wide Association Studies. Am J Psychiatry. 2014 Sep 15.

Arnedo J, Del Val C, de Erausquin GA, Romero-Zaliz R, Svrakic D, Cloninger CR, Zwir I. PGMRA: a web server for (phenotype x genotype) many-to-many relation analysis in GWAS. Nucleic Acids Res. 2013 Jul; 41(Web Server issue):W142-9.

Risch N. Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet. 1990 Feb; 46(2):222-8.

Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014 Jul 24; 511(7510):421-7.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007 Sep; 81(3):559-75.

Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, Dudbridge F, Holmans PA, Whittemore AS, Mowry BJ, Olincy A, Amin F, Cloninger CR, Silverman JM, Buccola NG, Byerley WF, Black DW, Crowe RR, Oksenberg JR, Mirel DB, Kendler KS, Freedman R, Gejman PV. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 2009 Aug 6; 460(7256):753-7.

Graves JA. Review: Sex chromosome evolution and the expression of sex-specific genes in the placenta. Placenta. 2010 Mar; 31 Suppl:S27-32.

Reich DE, Cargill M, Bolk S, Ireland J, Sabeti PC, Richter DJ, Lavery T, Kouyoumjian R, Farhadian SF, Ward R, Lander ES. Linkage disequilibrium in the human genome. Nature. 2001 May 10; 411(6834):199-204.

Slatkin M. Linkage disequilibrium--understanding the evolutionary past and mapping the medical future. Nat Rev Genet. 2008 Jun; 9(6):477-85.

Wiehe T, Slatkin M. Epistatic selection in a multi-locus Levene model and implications for linkage disequilibrium. Theor Popul Biol. 1998 Feb; 53(1):75-84.

Pritchard JK, Donnelly P. Case-control studies of association in structured or admixed populations. Theor Popul Biol. 2001 Nov; 60(3):227-37.

Koch E, Ristroph M, Kirkpatrick M. Long range linkage disequilibrium across the human genome. PLoS One. 2013; 8(12):e80754.

Wright S. The shifting balance theory and macroevolution. Annu Rev Genet. 1982; 16:1-19.

Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X. Powerful SNP-set analysis for case-control genome-wide association studies. Am J Hum Genet. 2010 Jun 11; 86(6):929-42.

Zwir I, Shin D, Kato A, Nishino K, Latifi T, Solomon F, Hare JM, Huang H, Groisman EA. Dissecting the PhoP regulatory network of Escherichia coli and Salmonella enterica. Proc Natl Acad Sci U S A. 2005 Feb 22; 102(8):2862-7.

Zwir I, Huang H, Groisman EA. Analysis of differentially-regulated genes within a regulatory network by GPS genome navigation. Bioinformatics. 2005 Nov 15; 21(22):4073-83.

Romero-Zaliz R, Del Val C, Cobb JP, Zwir I. Onto-CC: a web server for identifying Gene Ontology conceptual clusters. Nucleic Acids Res. 2008 Jul 1; 36(Web Server issue):W352-7.

Romero-Zaliz R, C. Rubio R, Cordin O, Cobb P, Herrera F, Zwir I. A multi-objective evolutionary conceptual clustering methodology for gene annotation within structural databases: a case of study on the gene ontology database. IEEE Transactions on Evolutionary Computation. 2008; 12(6):679-701.

Harari O, Park SY, Huang H, Groisman EA, Zwir I. Defining the plasticity of transcription factor binding sites by Deconstructing DNA consensus sequences: the PhoP-binding sites among gamma/enterobacteria. PLoS Comput Biol. 2010; 6(7):e1000862.

Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999 Oct 21; 401(6755):788-91.

Mejia-Roa E, Carmona-Saez P, Nogales R, Vicente C, Vazquez M, Yang XY, Garcia C, Tirado F, Pascual-Montano A. bioNMF: a web-based tool for nonnegative matrix factorization in biology. Nucleic Acids Res. 2008 Jul 1; 36(Web Server issue):W523-8.

Pascual-Montano A, Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Marqui RD. bioNMF: a versatile tool for non-negative matrix factorization in biology. BMC Bioinformatics. 2006; 7:366.

Tamayo P, Scanfeld D, Ebert BL, Gillette MA, Roberts CW, Mesirov JP. Metagene projection for cross-platform, cross-species characterization of global transcriptional states. Proc Natl Acad Sci U S A. 2007 Apr 3; 104(14):5959-64.

Cichocki A. Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blinded separation. Chichester, U.K.: John Wiley; 2009.

Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nat Genet. 1999 Jul; 22(3):281-5.

View all comments by Gabriel de Erausquin