Schizophrenia Research Forum - A Catalyst for Creative Thinking

Schizophrenia Symptom GWAS Debuts

21 December 2012. In the first genomewide association study (GWAS) of schizophrenia symptom dimensions, researchers report a polygenic effect on disorganized symptoms, which include formal thought disorder and bizarre behavior. First authors Ayman Fanous of the Veterans Affairs Medical Center in Washington, DC, and Baiyu Zhou of Albert Einstein College of Medicine in Bronx, New York, along with the schizophrenia group of the Psychiatric GWAS Consortium (PGC), published their findings online December 1 in the American Journal of Psychiatry.

An illness characterized almost as well by its heterogeneity as by its common features, schizophrenia comes with a wide range of symptoms and variation in illness course, symptom severity, and outcome. A number of studies have suggested that this heterogeneity is heritable (Cardno et al., 2001; Fanous and Kendler, 2008). In the current study, researchers probed the genetic basis for this clinical heterogeneity using both Molecular Genetics of Schizophrenia (MGS) and PGC study samples. This work was presented earlier this year at the World Congress of Psychiatric Genetics meeting (see SRF related conference story).

Fanous, Zhou, and colleagues first examined data from the MGS sample of 2,454 subjects with schizophrenia or schizoaffective disorder (Shi et al., 2009), all of whom had been assessed with the Lifetime Dimensions of Psychosis Scale. Although the scale consists of 14 items, including delusions, paranoia, hallucinations, blunted affect, and formal thought disorder, factor analysis of the scores on these items in the MGS sample identified three symptom categories: positive, negative/disorganized, and affective.

Even with this simplification, however, none of the 696,491 single nucleotide polymorphisms (SNPs) examined was significantly associated with any of the three symptom factors. This is not particularly surprising, given that sample sizes in the tens of thousands have been needed in past studies to detect schizophrenia-associated SNPs at a genomewide level of significance (see SRF related news story and SRF related conference story). Still, the authors highlighted 18 SNPs with promising p values below 10⁻⁵ and found that these were usually associated with a single symptom factor. Except for a signal in the major histocompatibility complex region of chromosome 6p21, these SNPs did not overlap with the schizophrenia-associated SNPs detected in case-control GWAS, which suggests that the genetic loci modulating symptoms may differ from those involved in schizophrenia vulnerability.

To see whether these hints of association might signal something real, the researchers examined the symptom dimension-related SNPs in the PGC samples using a polygenic score analysis. A polygenic model, in which thousands of alleles each contribute a small effect to schizophrenia risk on a population basis, was advanced by Gottesman and Shields (1967) and has recently received experimental support (Purcell et al., 2009). The researchers sorted the symptom-related SNPs by their level of significance in the MGS sample, then used a polygenic score to evaluate the combined contribution of each set of SNPs in the PGC samples. This revealed that the SNPs nominally associated with negative/disorganized symptoms were overrepresented in PGC cases compared to controls, though they explained only 0.05 percent of the variance in liability for schizophrenia. Subsequent analyses suggested that this effect was mainly due to the disorganized symptoms of formal thought disorder and bizarre behavior rather than to negative symptoms. Polygenic scores based on SNPs related to the positive or affective symptom dimensions did not differ between PGC cases and controls.
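The polygenic score step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the common additive form of the score (each person's count of the scored allele at every selected SNP, multiplied by that SNP's effect estimate from the discovery sample, then summed), and every SNP name, effect size, p value, and genotype in it is invented. Published polygenic analyses typically test the score with a regression model and report variance explained; this sketch only compares mean scores in hypothetical cases and controls at several discovery p-value thresholds.

```python
from typing import Dict, List, Tuple

# Hypothetical discovery statistics (invented numbers): SNP -> (per-allele effect on a
# symptom-factor score, association p value). For a case-control discovery sample the
# effect would instead be a log odds ratio.
DISCOVERY: Dict[str, Tuple[float, float]] = {
    "snp_a": (0.10, 0.001),
    "snp_b": (-0.08, 0.04),
    "snp_c": (0.05, 0.30),
    "snp_d": (0.18, 0.0005),
}

Genotype = Dict[str, int]  # SNP -> count of the scored allele (0, 1, or 2)


def select_weights(stats: Dict[str, Tuple[float, float]], p_threshold: float) -> Dict[str, float]:
    """Keep the effect sizes of SNPs whose discovery p value is below the threshold."""
    return {snp: effect for snp, (effect, p) in stats.items() if p < p_threshold}


def polygenic_score(genotype: Genotype, weights: Dict[str, float]) -> float:
    """Additive score: sum over selected SNPs of allele count times effect size."""
    return sum(effect * genotype.get(snp, 0) for snp, effect in weights.items())


def mean_score(samples: List[Genotype], weights: Dict[str, float]) -> float:
    """Average polygenic score across a group of individuals."""
    return sum(polygenic_score(g, weights) for g in samples) / len(samples)


# Invented target-sample genotypes standing in for independent cases and controls.
CASES: List[Genotype] = [{"snp_a": 2, "snp_b": 0, "snp_d": 1}, {"snp_a": 1, "snp_b": 1, "snp_d": 1}]
CONTROLS: List[Genotype] = [{"snp_a": 0, "snp_b": 1, "snp_d": 0}, {"snp_a": 1, "snp_b": 2, "snp_d": 0}]

if __name__ == "__main__":
    # Score the target sample at several discovery p-value thresholds.
    for threshold in (0.001, 0.05, 0.5):
        weights = select_weights(DISCOVERY, threshold)
        diff = mean_score(CASES, weights) - mean_score(CONTROLS, weights)
        print(f"p < {threshold}: {sorted(weights)} case-control difference {diff:.3f}")
```

A stricter threshold admits fewer but more strongly associated SNPs; a looser one adds more SNPs, and more noise, to each score, which is why such analyses are usually repeated across a range of thresholds.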

Standardizing clinical assessments of symptoms would greatly aid the identification of the relevant genetic factors, and a similar analysis of multiple PGC datasets is ongoing. The authors note that this could shed additional light on whether significant associations can be observed between individual SNPs and symptom dimensions, and on whether the polygenic effect on negative/disorganized symptoms can be replicated and strengthened despite the need to combine different types of rating systems from different studies.

Allison A. Curley.

Fanous AH, Zhou B, Aggen SH, Bergen SE, Amdur RL, Duan J, Sanders AR, Shi J, Mowry BJ, Olincy A, Amin F, Cloninger CR, Silverman JM, Buccola NG, Byerley WF, Black DW, Freedman R, Dudbridge F, Holmans PA, Ripke S, Gejman PV, Kendler KS, Levinson DF. Genomewide association study of clinical dimensions of schizophrenia: polygenic effect on disorganized symptoms. Am J Psychiatry. 2012 Dec 1;169(12):1309-17.

Comments on Related News

Related News: GWAS Goes Bigger: Large Sample Sizes Uncover New Risk Loci, Additional Overlap in Schizophrenia and Bipolar Disorder

Comment by:  David J. Porteous, SRF Advisor
Submitted 21 September 2011
Posted 21 September 2011

Consorting with GWAS for schizophrenia and bipolar disorder: same message, (some) different genes
On 18 September 2011, Nature Genetics published the results from the Psychiatric GWAS Consortium (PGC) of two separate, large-scale GWAS analyses, for schizophrenia (Ripke et al., 2011) and for bipolar disorder (Sklar et al., 2011), along with a joint analysis of both. By combining forces across several consortia that had previously published separately, we should now have some clarity and definitive answers.

For schizophrenia, the Stage 1 GWAS discovery data came from 9,394 cases and 12,462 controls from 17 studies, with 1,252,901 SNPs imputed. The Stage 2 replication sample comprised 8,442 cases and 21,397 controls. Of the 136 SNPs that reached genomewide significance in Stage 1, 129 (95 percent) mapped to the MHC locus, long known to be associated with risk of schizophrenia. Of the remaining seven SNPs, five mapped to previously identified loci. In total, just 10 loci met or exceeded the criterion of genomewide significance (p < 5 × 10⁻⁸) at Stage 1 and/or Stage 2. The 10 "best" SNPs identified eight loci: MIR137, TRIM26, CSMD1, CNNM2, NT5C2, and TCF4 were tagged by intragenic SNPs, while the remaining two lay at some distance from a known gene (343 kb from PCGEM1 and 126 kb from CCDC68). More important than the absolute significance levels, the overall odds ratios (with 95 percent confidence intervals) ranged from 1.08 (0.96-1.20) to 1.40 (1.28-1.52). These fractional increases contrast with the ~10-fold increase in risk to the first-degree relative of someone with schizophrenia (Gottesman et al., 2010).

Six of these eight loci have been reported previously, but ZNF804A, a past favorite, was noticeably absent from the "top 10" list. The main attention now will surely be on MIR137, a newly discovered locus that encodes a microRNA, miR-137, known to regulate neuronal development. The authors remark that 17 predicted MIR137 targets had a SNP with p < 10⁻⁴, more than twice as many as for the control gene set (p <0.01), though this relaxed significance cutoff seems somewhat arbitrary and warrants further examination. The result for MIR137 immediately raises the questions: Does the "risk" SNP affect MIR137 function directly or indirectly, and if so, does it affect the expression of any of the putative targets identified here? These are fairly straightforward questions, and positive answers are vital to the biological validation of these statistical associations. As has been the case for follow-up studies of ZNF804A (reviewed by Donohoe et al., 2010), however, unequivocal answers from GWAS "hits" can be hard to come by, not least because of the very modest relative risks that they confer. Let us hope that this is not the case for MIR137, but it is of passing note that for two of the eight replication cohorts, the direction of effect for MIR137 was in the opposite direction from the Stage 1 finding. Given this, and the reported odds ratios in the range of 1.11-1.22, the effect on the end phenotype of schizophrenia may be challenging to validate functionally. Perhaps a relevant intermediate phenotype more proximal to the gene will prove tractable.

For bipolar disorder, Stage 1 comprised 7,481 cases versus 9,250 controls, and identified 34 promising SNPs. These were replicated in Stage 2 in an independent set of 4,496 cases and a whopping 42,422 controls: 18 of the 34 SNPs survived at p <0.05. Taking Stage 1 and 2 together confirmed the previous "hot" finding for CACNA1C (Odds ratio = 1.14) and introduced a new candidate in ODZ4 (Odds ratio = 0.88, i.e., the minor allele is presumably "protective" or under some form of selection). Previous candidates ANK3 and SYNE1 looked promising at Stage 1, but did not replicate at Stage 2.

Finally, in a combined analysis of schizophrenia plus bipolar disorder cases versus controls, three loci from the respective "top 10" lists, CACNA1C, ANK3, and the ITIH3-ITIH4 region, were significant overall. This is consistent with the earlier evidence from the International Schizophrenia Consortium (ISC) for an overlap between the polygenic indices for schizophrenia and bipolar disorder (Purcell et al., 2009). It is also consistent with the epidemiological evidence for shared genetic risk between schizophrenia and bipolar disorder (Lichtenstein et al., 2009; Gottesman et al., 2010).

What can we take from these studies? The authorship lists alone speak to the size of the collaborative effort involved and the sheer organizational task. Depending on your point of view, the fact that most of the positive findings had been reported previously could be seen as valuable "replication" or as unnecessary duplication of cost and effort. Whichever way you look at it, though, just two new loci for schizophrenia and one for bipolar looks like a modest return for such a gargantuan investment. It raises the question of whether the GWAS approach is gaining the hoped-for traction on major mental illness. Indeed, the evidence suggests that the technology tide is rapidly turning away from allelic association methods and toward rare mutation detection by copy number variation analysis, exome sequencing, and/or whole-genome sequencing (Vacic et al., 2011; Xu et al., 2011).

Family studies are, as ever, of critical importance in genetics, not least to distinguish between inherited and de novo mutations. While the emphasis of GWAS has been on the impact of common, ancient allelic variation, past linkage studies and contemporary GWAS and CNV studies have made it ever more obvious just how heterogeneous these conditions are, and how little note individual cases and families take of conventional DSM diagnostic boundaries. Improved genetic and other tools with which to stratify risk, define phenotypes, and predict outcomes are clearly needed. Whether such tools can be derived from GWAS data remains to be seen. It is important to remind ourselves of two things. First, case-control association studies tell us about the average impact (odds ratio, with confidence interval) of a given allele in the population studied. In these very large GWAS, this measure approximates the European population average, and the odds ratios tell us that the impact per allele is modest. Second, and more important in some ways, the allele frequencies tell us that the vast majority of allele carriers are not affected. Likewise, a high proportion of cases are not carriers. In the main, these alleles are subtle risk modifiers rather than causal variants. That said, follow-up studies may define rare, functional genetic variants in MIR137, CACNA1C, or ANK3 that are tagged by the risk allele and that have sufficiently strong effects in a subset of cases for a causal link to be made. With these new GWAS data in hand, such questions can now be addressed.
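The point that most allele carriers are unaffected, and that many cases are not carriers, follows from simple arithmetic. The sketch below is a minimal illustration, not a calculation from the PGC papers: it assumes a single risk allele, treats the odds ratio as a relative risk (a reasonable approximation for an uncommon disorder), and uses round, hypothetical inputs of 1 percent lifetime prevalence, 50 percent carrier frequency, and a relative risk of 1.2.

```python
from typing import Tuple


def carrier_risks(prevalence: float, carrier_freq: float, relative_risk: float) -> Tuple[float, float, float]:
    """Return (risk in carriers, risk in non-carriers, share of cases who are carriers).

    Assumes one risk allele with a fixed carrier frequency in the population, and treats
    the odds ratio as a relative risk (an approximation that holds for rare disorders).
    """
    # prevalence = carrier_freq * (relative_risk * p0) + (1 - carrier_freq) * p0; solve for p0.
    p0 = prevalence / (carrier_freq * relative_risk + (1.0 - carrier_freq))
    p1 = relative_risk * p0
    share_of_cases_carrying = carrier_freq * p1 / prevalence
    return p1, p0, share_of_cases_carrying


if __name__ == "__main__":
    # Hypothetical round inputs chosen for illustration only.
    p1, p0, share = carrier_risks(prevalence=0.01, carrier_freq=0.5, relative_risk=1.2)
    print(f"risk in carriers: {p1:.4f}")
    print(f"share of cases who are non-carriers: {1 - share:.2f}")
```

With these inputs, roughly 1.1 percent of carriers are affected (so about 99 percent are not), and about 45 percent of cases carry no copy of the risk allele.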

It should also be said that there is clearly a wealth of potentially valuable information lying below the surface of the most statistically significant findings, but how to sort the true from the false associations? Should the MIR137 finding, and the targets of MIR137, be substantiated by biological analysis, that would certainly be well worth knowing and following up. Network analyses based on gene ontology and protein-protein interactions may yield more, but these methods need to be applied with caution when they are not securely anchored to a biologically validated starting point. Epistasis and pleiotropy are most likely playing a role, but even in these large sample sets, obtaining statistical (as opposed to biological) evidence for them is challenging. All told, one is left thinking that more incisive findings have come, and will continue to come, from family-based approaches, from structural studies (CNVs and chromosome translocations), and, in the near future, from whole-genome sequencing of cases and relatives.


Ripke S, Sanders AR, Kendler KS, Levinson DF, Sklar P, Holmans PA, Lin DY, Duan J, Ophoff RA, Andreassen OA, Scolnick E, Cichon S, St Clair D, Corvin A, Gurling H, Werge T, Rujescu D, Blackwood DH, Pato CN, Malhotra AK, Purcell S, Dudbridge F, Neale BM, Rossin L, Visscher PM, Posthuma D, Ruderfer DM, Fanous A, Stefansson H, Steinberg S, Mowry BJ, Golimbet V, de Hert M, Jönsson EG, Bitter I, Pietiläinen OP, Collier DA, Tosato S, Agartz I, Albus M, Alexander M, Amdur RL, Amin F, Bass N, Bergen SE, Black DW, Børglum AD, Brown MA, Bruggeman R, Buccola NG, Byerley WF, Cahn W, Cantor RM, Carr VJ, Catts SV, Choudhury K, Cloninger CR, Cormican P, Craddock N, Danoy PA, Datta S, de Haan L, Demontis D, Dikeos D, Djurovic S, Donnelly P, Donohoe G, Duong L, Dwyer S, Fink-Jensen A, Freedman R, Freimer NB, Friedl M, Georgieva L, Giegling I, Gill M, Glenthøj B, Godard S, Hamshere M, Hansen M, Hansen T, Hartmann AM, Henskens FA, Hougaard DM, Hultman CM, Ingason A, Jablensky AV, Jakobsen KD, Jay M, Jürgens G, Kahn RS, Keller MC, Kenis G, Kenny E, Kim Y, Kirov GK, Konnerth H, Konte B, Krabbendam L, Krasucki R, Lasseter VK, Laurent C, Lawrence J, Lencz T, Lerer FB, Liang KY, Lichtenstein P, Lieberman JA, Linszen DH, Lönnqvist J, Loughland CM, Maclean AW, Maher BS, Maier W, Mallet J, Malloy P, Mattheisen M, Mattingsdal M, McGhee KA, McGrath JJ, McIntosh A, McLean DE, McQuillin A, Melle I, Michie PT, Milanova V, Morris DW, Mors O, Mortensen PB, Moskvina V, Muglia P, Myin-Germeys I, Nertney DA, Nestadt G, Nielsen J, Nikolov I, Nordentoft M, Norton N, Nöthen MM, O'Dushlaine CT, Olincy A, Olsen L, O'Neill FA, Orntoft TF, Owen MJ, Pantelis C, Papadimitriou G, Pato MT, Peltonen L, Petursson H, Pickard B, Pimm J, Pulver AE, Puri V, Quested D, Quinn EM, Rasmussen HB, Réthelyi JM, Ribble R, Rietschel M, Riley BP, Ruggeri M, Schall U, Schulze TG, Schwab SG, Scott RJ, Shi J, Sigurdsson E, Silverman JM, Spencer CC, Stefansson K, Strange A, Strengman E, Stroup TS, Suvisaari J, Terenius L, Thirumalai S, Thygesen JH, Timm S, Toncheva D, van den Oord E, van Os J, van Winkel R, Veldink J, Walsh D, Wang AG, Wiersma D, Wildenauer DB, Williams HJ, Williams NM, Wormley B, Zammit S, Sullivan PF, O'Donovan MC, Daly MJ, Gejman PV. Genome-wide association study identifies five new schizophrenia loci. Nat Genet. 2011 Sep 18.

Psychiatric GWAS Consortium Bipolar Disorder Working Group, Sklar P, Ripke S, Scott LJ, Andreassen OA, Cichon S, Craddock N, Edenberg HJ, Nurnberger JI Jr, Rietschel M, Blackwood D, Corvin A, Flickinger M, Guan W, Mattingsdal M, McQuillin A, Kwan P, Wienker TF, Daly M, Dudbridge F, Holmans PA, Lin D, Burmeister M, Greenwood TA, Hamshere ML, Muglia P, Smith EN, Zandi PP, Nievergelt CM, McKinney R, Shilling PD, Schork NJ, Bloss CS, Foroud T, Koller DL, Gershon ES, Liu C, Badner JA, Scheftner WA, Lawson WB, Nwulia EA, Hipolito M, Coryell W, Rice J, Byerley W, McMahon FJ, Schulze TG, Berrettini W, Lohoff FW, Potash JB, Mahon PB, McInnis MG, Zöllner S, Zhang P, Craig DW, Szelinger S, Barrett TB, Breuer R, Meier S, Strohmaier J, Witt SH, Tozzi F, Farmer A, McGuffin P, Strauss J, Xu W, Kennedy JL, Vincent JB, Matthews K, Day R, Ferreira MA, O'Dushlaine C, Perlis R, Raychaudhuri S, Ruderfer D, Hyoun PL, Smoller JW, Li J, Absher D, Thompson RC, Meng FG, Schatzberg AF, Bunney WE, Barchas JD, Jones EG, Watson SJ, Myers RM, Akil H, Boehnke M, Chambert K, Moran J, Scolnick E, Djurovic S, Melle I, Morken G, Gill M, Morris D, Quinn E, Mühleisen TW, Degenhardt FA, Mattheisen M, Schumacher J, Maier W, Steffens M, Propping P, Nöthen MM, Anjorin A, Bass N, Gurling H, Kandaswamy R, Lawrence J, McGhee K, McIntosh A, McLean AW, Muir WJ, Pickard BS, Breen G, St Clair D, Caesar S, Gordon-Smith K, Jones L, Fraser C, Green EK, Grozeva D, Jones IR, Kirov G, Moskvina V, Nikolov I, O'Donovan MC, Owen MJ, Collier DA, Elkin A, Williamson R, Young AH, Ferrier IN, Stefansson K, Stefansson H, Thorgeirsson T, Steinberg S, Gustafsson O, Bergen SE, Nimgaonkar V, Hultman C, Landén M, Lichtenstein P, Sullivan P, Schalling M, Osby U, Backlund L, Frisén L, Langstrom N, Jamain S, Leboyer M, Etain B, Bellivier F, Petursson H, Sigurdsson E, Müller-Myhsok B, Lucae S, Schwarz M, Schofield PR, Martin N, Montgomery GW, Lathrop M, Oskarsson H, Bauer M, Wright A, Mitchell PB, Hautzinger M, Reif A, Kelsoe JR, Purcell SM. Large-scale genome-wide association analysis of bipolar disorder reveals a new susceptibility locus near ODZ4. Nat Genet. 2011 Sep 18.

Lichtenstein P, Yip BH, Björk C, Pawitan Y, Cannon TD, Sullivan PF, Hultman CM. Common genetic determinants of schizophrenia and bipolar disorder in Swedish families: a population-based study. Lancet. 2009 Jan 17;373(9659):234-9.

Gottesman II, Laursen TM, Bertelsen A, Mortensen PB. Severe mental disorders in offspring with 2 psychiatrically ill parents. Arch Gen Psychiatry. 2010 Mar 1;67(3):252-7.

Donohoe G, Morris DW, Corvin A. The psychosis susceptibility gene ZNF804A: associations, functions, and phenotypes. Schizophr Bull. 2010 Sep 1;36(5):904-9.

Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, Sklar P. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009 Aug 6;460(7256):748-52.

Vacic V, McCarthy S, Malhotra D, Murray F, Chou HH, Peoples A, Makarov V, Yoon S, Bhandari A, Corominas R, Iakoucheva LM, Krastoshevsky O, Krause V, Larach-Walters V, Welsh DK, Craig D, Kelsoe JR, Gershon ES, Leal SM, Dell'Aquila M, Morris DW, Gill M, Corvin A, Insel PA, McClellan J, King MC, Karayiorgou M, Levy DL, DeLisi LE, Sebat J. Duplications of the neuropeptide receptor gene VIPR2 confer significant risk for schizophrenia. Nature. 2011 Mar 24;471(7339):499-503.

Xu B, Roos JL, Dexheimer P, Boone B, Plummer B, Levy S, Gogos JA, Karayiorgou M. Exome sequencing supports a de novo mutational paradigm for schizophrenia. Nat Genet. 2011 Jan 1;43(9):864-8.

Comment by:  Patrick Sullivan, SRF Advisor
Submitted 26 September 2011
Posted 26 September 2011

The two papers appearing online in Nature Genetics last Sunday are truly important additions to our increasing knowledge base for these disorders. The core analyses have been presented multiple times at international meetings in the past two years.

Since then, the available sample sizes for both schizophrenia and bipolar disorder have grown considerably. If the recently published data are any guide, the next round of analyses should be particularly revealing.

The PGC results and almost all of the data that were used in these reports are available by application to the controlled-access repository.

Please see the references for views of this area that contrast with those of Professor Porteous.


Sullivan P. Don't give up on GWAS. Mol Psychiatry. 2011 Aug 9.

Kim Y, Zerwas S, Trace SE, Sullivan PF. Schizophrenia genetics: where next? Schizophr Bull. 2011;37:456-63.

Comment by:  Edward Scolnick
Submitted 28 September 2011
Posted 29 September 2011

It is clear in human genetics that common variants and rare variants have frequently been detected in the same genes. Numerous examples exist in many diseases. The bashing of GWAS in schizophrenia and bipolar illness indicates a lack of understanding, on the part of those who make such comments, of human genetics and of where the field is. When these studies were initiated five years ago, next-generation sequencing was not available, and large samples of populations, trios, or quartets did not exist. The international consortia have worked to collect such samples, which are now available for GWAS as well as for detailed sequencing studies. Before these studies began, virtually nothing was known about the etiology of schizophrenia and bipolar illness. The DISC1 translocation in the famous family was an important observation in that family. But almost a decade later there are still no convincing data that variants in DISC1 or many of its interacting proteins are involved in the pathogenesis of human schizophrenia or major mental illness.

Sequencing studies touted as the Occam's razor for the field are beginning, and already, as in the past in this field, preemptive papers are appearing that are inadequately powered to draw any conclusions with certainty. Samples collected by the consortia will be critical to clarifying the role of rare variants. This will take time and care so as not to set the field back into the morass it used to be in. GWAS are basically modern public health epidemiology, providing important clues to disease etiology. Much work is clearly needed once hits are found, just as it has been in traditional epidemiology. But in many fields, GWAS have already led to important biological insights, and it is certain they will do so in this field as well, because the underlying principles of human genetics apply here too. The primary problem in the field is totally inadequate funding by government organizations that consistently look for shortcuts to insights and new treatments, forgetting how genetics has transformed cancer, immunology, and autoimmune and inflammatory diseases, and has led to better diagnostics and treatments. The field will never understand the pathogenesis of these illnesses until the genetic architecture is deciphered. The first enzyme discovered in E. coli DNA biochemistry was a repair enzyme, not the enzyme that replicates DNA, and this was discovered through genetics. Progress in this field has been dramatic in the past five years. All doing this work realize that this is only a beginning and that there is a long, hard road to full understanding. But to denigrate the beginning, which is clearly solid, makes no sense and indicates a provincialism unbecoming to a true scientist.

Comment by:  Nick Craddock, Michael O'Donovan (SRF Advisor)
Submitted 11 October 2011
Posted 11 October 2011

At the start of the millennium, only two molecular genetic findings could be said with a fair amount of confidence to be etiologically relevant to schizophrenia and bipolar disorder. The first of these was that deletions of chromosome 22q11, which are known to cause velo-cardio-facial syndrome, also confer a substantial increase in risk of psychosis. The second was the discovery by David St Clair, Douglas Blackwood, and colleagues (St Clair et al., 1990) of a balanced translocation involving chromosomes 1 and 11 that co-segregates with a range of psychiatric phenotypes in a single large family and is clearly relevant to the etiology of illness in that family (Blackwood et al., 2001). The latter finding has led to the conjecture, based upon a translocation breakpoint analysis reported by Kirsty Millar, David Porteous, and colleagues (Millar et al., 2000), that elevated risk in that family is conferred by altered function of a gene eponymously named DISC1. Just over a decade later, what can we now say with similar degrees of confidence? The relevance of deletions of 22q11 has stood the test of time, and indeed has strengthened, through further investigation (Levinson et al., 2011, being only one example), while the relevance of DISC1 remains conjecture. That the evidence implicating this gene is no stronger than it was all those years ago provides a clear illustration of the difficulties inherent in drawing etiological inferences from extremely rare mutations regardless of their effect size.

However, with the publication of several GWAS and CNV papers, culminating in the two mega-analyses reported by the PGC that are the subject of this commentary, one on schizophrenia, one on bipolar disorder, together reporting a total of six novel loci, very strong evidence has accumulated for approximately 20 new loci in psychosis. The majority of these are defined by SNPs, the remainder by copy number variants, and virtually all (including the rare, relatively high-penetrance CNVs) have emerged through the application of GWAS technology to large case-control samples, not through the study of linkage or families. Have GWAS approaches proven their worth? Clearly, the genetic findings represent the tip of a very deeply submerged iceberg, and it is possible that not all will stand the test of time and additional data, although the current levels of statistical support suggest the majority will do so. Nevertheless, the findings of SNP and CNV associations (including 22q11 deletions) seem to us to provide the first real signs of progress in uncovering strongly supported findings of primary etiological relevance to these disorders. Although SNP effects are small, the experience from other complex phenotypes is that statistically robust genetic associations, even those of very small effect, can highlight biological pathways of etiological (height; Lango Allen et al., 2010) and of possible therapeutic relevance (Alzheimer's disease; Jones et al., 2010). Moreover, it would seem intuitively likely that even if capturing the total heritable component of a disorder is presently a distant goal, the greater the number of associations captured, the better will be the snapshot of the sorts of processes that contribute to a disorder, and that might therefore be manipulated in its treatment. 
Thus, there is evidence that building even a very incomplete picture of the sort of genes that influence risk is an excellent method of informing understanding of pathogenesis of a highly complex disorder (or set of disorders).

As in previous GWAS and CNV endeavors, the PGC studies have required a significant degree of altruism from the hundreds of investigators and clinicians who have shared their data with little hope of significant academic credit. Moreover, where ethical approval permitted, the datasets have been made virtually open source for other investigators who are not part of the study. Sadly, this generosity of spirit is not matched in the rather curmudgeonly commentary provided by David Porteous. Rather than challenging the science or conduct of the study, it appears to us that the commentary takes the easier route of damnation by faint praise, distortion, and even innuendo.

The strongest finding, that of association to the extended MHC region, is dismissed as "long known to be associated with risk of schizophrenia." How that knowledge was acquired a long time ago is unclear, but it cannot have been based upon data. It is true that weak and inconsistent associations at the MHC locus have been reported, even predating the molecular genetic era (McGuffin et al., 1978), but not until the landmark studies of the International Schizophrenia Consortium (2009), the Molecular Genetics of Schizophrenia Consortium (Shi et al., 2009), and the SGENE+ Consortium (Stefansson et al., 2009) were the findings strong enough to be described as knowledge. Porteous's dismissive tone continues with the phrase "just 10 loci met...," the word "just" being a qualifier that seems designed to denigrate rather than challenge the results. Given the paucity of etiological clues, others might consider this a good yield. The observation contrasting the effect sizes at the detected loci "with the ~10-fold increase in risk to the first-degree relative of someone with schizophrenia" is so fatuous that it is difficult to believe its function is anything other than to insinuate in the mind of the reader the impression of failure. Yet no one remotely aware of the expectations behind GWAS would expect the effect size of any common risk allele to bear any resemblance to that of family history, the latter reflecting the combined effects of many risk alleles.

Among the most important findings of the PGC schizophrenia group were the strong evidence for association with a variant in the vicinity of a gene encoding the regulatory RNA MIR137, and the subsequent finding that schizophrenia association signals were significantly enriched (P <0.01) among predicted targets of this regulatory RNA. Of course, as with the other findings, there is room for the already very strong data to be further strengthened, but that finding alone opens a whole new window onto potential pathogenic mechanisms. Yet Porteous casually throws four handfuls of mud, dismissing the enrichment at p <0.01 as a "relaxed significance cutoff," which "seems somewhat arbitrary," and which "warrants further examination," and commenting that "it is of passing note that for two of the eight replication cohorts, the direction of effect for MIR137 was in the opposite direction from the Stage 1 finding." If Porteous feels he has the expertise to pronounce on this analysis, it would behoove him to choose his words more carefully. Since when is a P value of <0.01 "relaxed" when applied to a test of a single hypothesis? Can he really be unaware of the longstanding convention of regarding P <0.05 as significant in specific hypothesis testing? If he is aware of it, why is that convention generally applicable but "somewhat arbitrary" in the context of the PGC study? As for "further examination being warranted," this is true of any scientific finding, but what does he specifically mean in the context of his commentary? And why is it of "passing note" that not all samples show trends in the same direction? Given the well-known issues of power in individual small samples in GWAS, what is surprising about that? There may be simple answers to these questions, but we find it difficult to conclude that the choice of language is anything other than another attempt to sow seeds of doubt through innuendo rather than analysis.

The remark that "ZNF804A, a past favorite, was noticeably absent" falls well short of the standard one might expect of serious discourse. The choice of language suggests a desire to denigrate rather than analyze, and to insinuate without specific evidence that any interest in this gene should now be over. In fact, the largest study of this gene to date is that of Williams et al. (2010), which includes at least two-thirds of the PGC discovery dataset and is based on over 57,000 subjects, a sample almost three times as large as the PGC mega-analysis sample.

Porteous's overall conclusion from the two studies is "whichever way you look at it, though, just two new loci for schizophrenia and one for bipolar looks like a modest return for such a gargantuan investment." This appraisal is misleading. The PGC studies were actually relatively small investments, being based on a synthesis of pre-existing data. Because the studies use existing data, there is naturally an expectation that some of the loci identified will already have been reported as significant or otherwise flagged as of interest, while others will be new. Overall, the return on the GWAS investment is not just the six novel loci (rather than three); it is the totality of the findings, which, as noted above, currently number about 20 loci. The schizophrenia research community should also be made aware, if it is not already, that the return on these investments is not "one-off"; it is cumulative. In the coming years, the component datasets will continue to generate a return in new gene discoveries (including CNVs yet to be reported by the PGC) as they are added (at essentially no cost) to other emerging GWAS datasets being generated largely through charitable support. With the returns already in the bank, one could (and we do) argue that the investment is negligible, particularly given the human and economic cost of continued ignorance about these illnesses that blight so many lives.

It is true that with so little being known compared with what is yet to be known, the biological insights that can be drawn from the existing data are limited. This is equally true of the common and rare variants identified so far, and we are not aware of any of the "incisive findings" that Porteous claims have already come from alternative approaches, although the emergence of strong evidence for deletions at NRXN1 as a susceptibility variant for schizophrenia through meta-analysis of case-control GWAS data (one of the extra returns on the GWAS data we referred to above) deserves that description (Kirov et al., 2009). But this is not a cause for despair; in contrast to the future promises made on behalf of other as yet unproven designs, for eyes and minds that are open enough to see, the recent papers provide unambiguous evidence for a straightforward route to identifying more genes and pathways involved in the disorder. Even Porteous has partial sight of this, since he notes that "there is clearly a wealth of potentially valuable information lying below the surface of the most statistically significant findings." What he appears unable to see is "how to sort the true from the false associations." The answer for a large number of loci is simple: better-powered studies based upon larger sample sizes.

We would like to add a note of caution for those who too readily denigrate case-control approaches in favor of hyping other approaches, none of which has yet proven so reliable a route to success. We are not against those approaches; indeed, we are actively involved in them. But we are concerned that the hype surrounding sequencing, and the generation of what we think are unrealistic expectations, will make those designs vulnerable to attack from those who seem only too keen to make premature and inaccurate pronouncements of failure, who seem desperate to derive straw from nuggets of gold. If, as we believe is likely, it turns out to be quite a few years more before sequencing studies become sufficiently powered to provide large numbers of robust findings, as for GWAS, the consequence could be withdrawal of substantial government funding before those designs have had a chance to live up to their potential. That such an outcome has already largely been achieved for GWAS in some countries might be a source of rejoicing in some quarters, but it should also send out a warning to all who broadly hold the view that understanding the genetics of these disorders is central to understanding their origins, and to improving their future management.

The recent PGC papers represent an impressive, international collaboration based upon methodologies that have a proven track record in delivering important biological insights into other complex disorders, and now in psychiatry. Given the complexity of psychiatric phenotypes, we believe it is likely that a variety of approaches, paradigms, and ideas will be essential for success, including the approaches espoused by those who believe the evidence is compatible with essentially Mendelian inheritance. Inevitably, there will be sincerely held differences of opinion concerning the best way forward, and, of course, in any area of science, reasoned arguments based upon a fair assessment of the evidence are essential. Nevertheless, given there are sufficient uncertainties about what can be realistically delivered in the short term by the newer technologies, we suggest that the cause of bringing benefit to patients will most likely be better served by humility, realism, and a constructive discussion in which there is no place for belittling real achievements, for arrogance, or for dogmatic posturing.


Blackwood DH, Fordyce A, Walker MT, St Clair DM, Porteous DJ, Muir WJ. Schizophrenia and affective disorders--cosegregation with a translocation at chromosome 1q42 that directly disrupts brain-expressed genes: clinical and P300 findings in a family. Am J Hum Genet. 2001 Aug;69(2):428-33.

International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009 Aug 6;460(7256):748-52.

Jones L, Holmans PA, Hamshere ML, Harold D, Moskvina V, Ivanov D, et al. Genetic evidence implicates the immune system and cholesterol metabolism in the etiology of Alzheimer's disease. PLoS One. 2010 Nov 15;5(11):e13950. Erratum in: PLoS One. 2011;6(2).

Kirov G, Rujescu D, Ingason A, Collier DA, O'Donovan MC, Owen MJ. Neurexin 1 (NRXN1) deletions in schizophrenia. Schizophr Bull. 2009 Sep;35(5):851-4. Epub 2009 Aug 12. Review.

Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010 Oct 14;467(7317):832-8.

Levinson DF, Duan J, Oh S, Wang K, Sanders AR, Shi J, et al. Copy number variants in schizophrenia: confirmation of five previous findings and new evidence for 3q29 microdeletions and VIPR2 duplications. Am J Psychiatry. 2011 Mar;168(3):302-16.

McGuffin P, Farmer AE, Rajah SM. Histocompatability antigens and schizophrenia. Br J Psychiatry. 1978 Feb;132:149-51.

Millar JK, Wilson-Annan JC, Anderson S, Christie S, Taylor MS, Semple CA, et al. Disruption of two novel genes by a translocation co-segregating with schizophrenia. Hum Mol Genet. 2000 May 22;9(9):1415-23.

Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, et al. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 2009 Aug 6;460(7256):753-7.

St Clair D, Blackwood D, Muir W, Carothers A, Walker M, Spowart G, et al. Association within a family of a balanced autosomal translocation with major mental illness. Lancet. 1990 Jul 7;336(8706):13-6.

Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, et al. Common variants conferring risk of schizophrenia. Nature. 2009 Aug 6;460(7256):744-7.

The Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium. Genome-wide association study identifies five new schizophrenia loci. Nat Genet. 2011 Sep 18;43(10):969-76.

Williams HJ, Norton N, Dwyer S, Moskvina V, Nikolov I, Carroll L, et al. Fine mapping of ZNF804A and genome-wide significant evidence for its involvement in schizophrenia and bipolar disorder. Mol Psychiatry. 2011 Apr;16(4):429-41.


Related News: GWAS Goes Bigger: Large Sample Sizes Uncover New Risk Loci, Additional Overlap in Schizophrenia and Bipolar Disorder

Comment by:  Todd Lencz, Anil Malhotra (SRF Advisor)
Submitted 11 October 2011
Posted 11 October 2011

It is worth re-emphasizing that efforts such as the Psychiatric GWAS Consortium do not rule out potentially important discoveries from alternative strategies such as endophenotypic approaches or examination of rare variants. Indeed, such strategies will be necessary to understand the functional mechanisms implicated by GWAS hits.

Moreover, we note that the two recently published PGC papers were not designed to exclude a role for previously identified candidate loci such as DISC1 (Hodgkinson et al., 2004), or prior GWAS findings such as rs1344706 at ZNF804A (Williams et al., 2011). For both these loci, and many others that have been proposed, meta-analysis of available samples suggests very small effect sizes (OR ~1.1), as might be expected for common variants. As noted in Supplementary Table S12 of the schizophrenia PGC paper (Ripke et al., 2011), the discovery cohort, at its currently available sample size (~9,000 cases/~12,000 controls), was still underpowered to detect variants with odds ratios of 1.1, especially if they have a minor allele frequency of 20 percent or below.
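The power shortfall can be checked roughly by hand. The following is a minimal sketch under stated assumptions, not a reproduction of Supplementary Table S12: it assumes a per-allele odds ratio, a risk allele at the stated frequency in controls, Hardy-Weinberg proportions, a two-sided allelic test at the conventional genomewide threshold of 5 × 10^-8, and a normal approximation with unpooled variances. The function name `allelic_power` is hypothetical.

```python
from statistics import NormalDist


def allelic_power(odds_ratio, control_freq, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic case-control test.

    Assumes a per-allele odds ratio, Hardy-Weinberg proportions and a
    normal approximation to the case-control allele-frequency difference.
    """
    # Risk-allele frequency in cases implied by the odds ratio.
    case_odds = odds_ratio * control_freq / (1 - control_freq)
    case_freq = case_odds / (1 + case_odds)
    # Each diploid subject contributes two alleles.
    var = (case_freq * (1 - case_freq) / (2 * n_cases)
           + control_freq * (1 - control_freq) / (2 * n_controls))
    z_effect = (case_freq - control_freq) / var ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The chance of rejecting in the wrong direction is negligible and ignored.
    return NormalDist().cdf(z_effect - z_crit)


for maf in (0.05, 0.10, 0.20, 0.40):
    power = allelic_power(1.1, maf, n_cases=9000, n_controls=12000)
    print(f"MAF {maf:.2f}: power {power:.3f}")
```

Under these simplifying assumptions, power at a minor allele frequency of 0.20 comes out at roughly 6 percent, and it stays well below one-half even at 0.40, consistent with the point that variants with OR ~1.1 are easily missed at this sample size.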

An instructive example arises from the field of diabetes genetics. An association of a missense variant (rs1801282, Pro12Ala) in PPARG to type 2 diabetes was first reported in a sample of n = 91 Japanese-American patients (Deeb et al., 1998). Many subsequent studies failed to replicate the effect, and the initial large GWAS meta-analysis (involving >14,000 cases and ~18,000 controls; Zeggini et al., 2007) only detected the association at a p-value that would be considered non-significant by today's standard (p = 1.7 × 10^-6). Interestingly, the authors deemed the association to be confirmed, and the result was widely accepted within that field. Subsequent meta-analysis, involving twice as many subjects (total n = 67,000), finally obtained conventional genomewide levels of significance (p < 5 × 10^-8; Gouda et al., 2010).


Deeb SS, Fajas L, Nemoto M, Pihlajamäki J, Mykkänen L, Kuusisto J, Laakso M, Fujimoto W, Auwerx J. A Pro12Ala substitution in PPARgamma2 associated with decreased receptor activity, lower body mass index and improved insulin sensitivity. Nat Genet. 1998 Nov;20(3):284-7.

Gouda HN, Sagoo GS, Harding AH, Yates J, Sandhu MS, Higgins JP. The association between the peroxisome proliferator-activated receptor-gamma2 (PPARG2) Pro12Ala gene variant and type 2 diabetes mellitus: a HuGE review and meta-analysis. Am J Epidemiol. 2010 Mar 15;171(6):645-55.

Hodgkinson CA, Goldman D, Jaeger J, Persaud S, Kane JM, Lipsky RH, Malhotra AK. Disrupted in schizophrenia 1 (DISC1): association with schizophrenia, schizoaffective disorder, and bipolar disorder. Am J Hum Genet. 2004 Nov;75(5):862-72.

Williams HJ, Norton N, Dwyer S, Moskvina V, Nikolov I, Carroll L, Georgieva L, Williams NM, Morris DW, Quinn EM, Giegling I, Ikeda M, Wood J, Lencz T, Hultman C, Lichtenstein P, Thiselton D, Maher BS; Molecular Genetics of Schizophrenia Collaboration (MGS) International Schizophrenia Consortium (ISC), SGENE-plus, GROUP, Malhotra AK, Riley B, Kendler KS, Gill M, Sullivan P, Sklar P, Purcell S, Nimgaonkar VL, Kirov G, Holmans P, Corvin A, Rujescu D, Craddock N, Owen MJ, O'Donovan MC. Fine mapping of ZNF804A and genome-wide significant evidence for its involvement in schizophrenia and bipolar disorder. Mol Psychiatry. 2011 Apr;16(4):429-41.

Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, Barrett JC, Shields B, Morris AP, Ellard S, Groves CJ, Harries LW, Marchini JL, Owen KR, Knight B, Cardon LR, Walker M, Hitman GA, Morris AD, Doney AS; Wellcome Trust Case Control Consortium (WTCCC), McCarthy MI, Hattersley AT. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007 Jun 1;316(5829):1336-41.


Related News: Study Claiming Eight Types of Schizophrenia Called Into Question

Comment by:  Michael O'Donovan, SRF Advisor
Submitted 3 October 2014
Posted 3 October 2014

Comment by Michael O'Donovan, Gerome Breen, Brendan Bulik-Sullivan, Mark Daly, Sarah Medland, Benjamin Neale, Stephan Ripke, Patrick Sullivan, Peter Visscher, Naomi Wray

[Editor's note: Reprinted from PubMed Commons, without changes, under the Creative Commons attribution 3.0 license.]

In this study published on September 15, Arnedo et al. asserted that schizophrenia is a heterogeneous group of disorders underpinned by different genetic networks mapping to differing sets of clinical symptoms. As a result of their analyses, Arnedo et al. have made remarkable and perhaps unprecedented claims regarding their capacity to subtype schizophrenia. This paper has received considerable media attention. One claim features in many media reports: that schizophrenia can be delineated into "8 types". If these claims are replicable and consistent, then the work reported in this paper would constitute an important advance in our knowledge of the etiology of schizophrenia.

Unfortunately, these extraordinary claims are not justified by the data and analyses presented. Their claims are based upon complex (and we believe flawed) analyses that are said to reveal links between clusters of clinical data points and patterns of data generated by looking at millions of genetic data points. Instead of the complexities favored by Arnedo et al., there are far simpler alternative explanations for the patterns they observed. We believe that the authors have not excluded important alternative explanations – if we are correct, then the major conclusions of this paper are invalidated.

Analyses such as these rely on independence in many ways: among variables used in prediction, absence of artifactual relationships between genotypes and clinical variables, and between the methods of assessing significance and replication. Below we identify five specific areas of concern that are not adequately addressed in the manuscript, each of which calls into question the conclusions of this study.

A. Ancestry/population stratification.
Two of the three samples the authors studied (MGS and CATIE) have substantial proportions of subjects of European and African ancestry. The third sample is from southern Europe. Ancestry is an extremely well known confounder in genetic studies with a great capacity to yield false associations. Correct inference from genomic data in samples like these requires exceptional care. In the analyses they present, there is almost no mention of how this known bias was addressed, nor any evaluation of its impact on their results. In the samples they used, sets of SNPs that track together are essentially the definition of uncorrected population structure/stratification. Indeed, a central component of their statistical methodology – nonnegative matrix factorization – has been previously employed as a method for ancestry inference in the population genetics literature.
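To see why a matrix factorization of genotypes can end up tracking ancestry, the sketch below runs a generic nonnegative matrix factorization (the classic Lee and Seung multiplicative updates for squared error) on a simulated two-population genotype matrix. It is not the pipeline of Arnedo et al.; the population frequencies, matrix sizes and helper names are hypothetical, and no population labels are given to the factorization.

```python
import random


def simulate_genotypes(n_per_pop, n_snps, seed=1):
    """Simulate 0/1/2 allele counts for two populations with opposite frequencies."""
    rng = random.Random(seed)
    rows, labels = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            row = []
            for j in range(n_snps):
                # First half of SNPs is common in pop 0, second half in pop 1.
                common = (j < n_snps // 2) == (pop == 0)
                p = 0.9 if common else 0.1
                row.append((rng.random() < p) + (rng.random() < p))
            rows.append(row)
            labels.append(pop)
    return rows, labels


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xv - v) ** 2 for xr, wr in zip(x, wh) for xv, v in zip(xr, wr))


def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative updates (Lee and Seung) for min ||X - WH||_F^2 with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


x, labels = simulate_genotypes(n_per_pop=20, n_snps=30)
w, h = nmf(x, rank=2)
# Equivalent to normalizing each row of H to sum to 1 (WH is unchanged),
# which puts the loadings of the two factors on a comparable scale.
scale = [sum(row) for row in h]
w_scaled = [[row[k] * scale[k] for k in range(2)] for row in w]
# Assign each individual to its dominant factor; no labels were used.
clusters = [max(range(2), key=lambda k: row[k]) for row in w_scaled]
print(list(zip(labels, clusters)))
```

With allele frequencies this strongly differentiated, the two factors typically line up with the two simulated populations. NMF solutions are not unique, so this illustrates a tendency rather than a guarantee.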

We were unsuccessful in attempts to obtain the full list of SNPs that Arnedo et al. analyzed. Instead, we evaluated the SNPs listed in Table S3 (448 SNP entries, 245 unique SNPs as SNPs could be present more than once, and 237 SNPs with valid allele frequencies in HapMap3). We computed the absolute value of the difference in allele frequencies between the CEU (northwest European) and YRI (Yorubas from Nigeria) groups for all HapMap3 SNPs passing basic quality control (688K SNPs genotyped using Affymetrix 6.0 arrays to match the MGS sample). We then contrasted the SNPs used by Arnedo et al. with all other affy6 SNPs. The Table S3 SNPs had markedly larger differences between a European and an African group. The mean for the absolute difference in allele frequency was 0.27 for the Table S3 SNPs used by Arnedo et al. versus 0.19 for all other SNPs. These highly significant differences underscore our concerns about population stratification bias.
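The allele-frequency comparison described above amounts to a few lines of code. The sketch below is hypothetical throughout: the frequencies are simulated stand-ins for HapMap3 CEU and YRI data, SNP identifiers and function names are illustrative, and the empirical p-value from same-size random background subsets is one way to quantify such a difference, not necessarily the test the commenters used.

```python
import random
from statistics import mean


def abs_freq_diff(freqs_a, freqs_b):
    """Per-SNP absolute allele-frequency difference between two reference groups."""
    return {snp: abs(freqs_a[snp] - freqs_b[snp]) for snp in freqs_a if snp in freqs_b}


def compare_subset(diffs, subset, n_draws=1000, seed=0):
    """Mean |delta AF| for a SNP subset vs all other SNPs, with an empirical
    p-value from random background subsets of the same size."""
    chosen = [s for s in subset if s in diffs]  # drop SNPs lacking frequencies
    chosen_set = set(chosen)
    rest = [s for s in diffs if s not in chosen_set]
    observed = mean(diffs[s] for s in chosen)
    rng = random.Random(seed)
    hits = sum(mean(diffs[s] for s in rng.sample(rest, len(chosen))) >= observed
               for _ in range(n_draws))
    return observed, mean(diffs[s] for s in rest), (hits + 1) / (n_draws + 1)


# Hypothetical stand-ins for HapMap3 CEU and YRI frequencies.
rng = random.Random(42)
ceu, yri = {}, {}
for i in range(5000):
    p = rng.uniform(0.05, 0.95)
    ceu[f"rs{i}"] = p
    yri[f"rs{i}"] = min(max(p + rng.gauss(0, 0.15), 0.0), 1.0)
selected = [f"rs{i}" for i in range(200)]
for snp in selected:
    # Hypothetical "selected" SNPs with larger between-group differences.
    shift = rng.choice((-1, 1)) * rng.uniform(0.2, 0.5)
    yri[snp] = min(max(ceu[snp] + shift, 0.0), 1.0)

sel_mean, rest_mean, p_emp = compare_subset(abs_freq_diff(ceu, yri), selected)
print(f"selected {sel_mean:.3f} vs rest {rest_mean:.3f}, empirical p = {p_emp:.4f}")
```

With real data, `ceu` and `yri` would hold HapMap3 frequencies for the quality-controlled Affymetrix 6.0 SNPs and `selected` would hold the SNP identifiers from Table S3.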

B. X chromosome (chrX).
We noted that 15 of 237 of the SNPs in Table S3 were on chrX (again, Table S3 contains a fraction of the SNPs used in the modeling). Inclusion of chrX SNPs will partly reflect the sex of participants. Arnedo et al. say in their supplement that they include sex as a covariate in their regressions, but they do not describe how they account for sex in their matrix factorization. For example, since males have only one copy of chrX, genotypes for males will be either 0 or 1 whereas chrX genotypes for females will be either 0, 1 or 2. This difference will be salient to clustering algorithms such as those employed by the authors, so it seems likely that some component of the clusters of individuals identified by Arnedo et al. simply reflects genotype differences between sexes rather than clinical features of schizophrenia. It is well-known in statistical genetics that the sex chromosomes require special handling, but this issue is not addressed by Arnedo et al.
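The copy-number effect can be illustrated with a small simulation. This is a sketch under assumptions: both sexes share the same hypothetical allele frequency (0.3), so any difference in genotype values is purely an artifact of coding, and the 0/2 rescaling of hemizygous males shown at the end is one common convention, assumed here for illustration rather than taken from Arnedo et al.

```python
import random
from statistics import mean


def simulate_chrx(n, freq, male, rng):
    """Allele counts at one chrX SNP: hemizygous males 0/1, diploid females 0/1/2."""
    copies = 1 if male else 2
    return [sum(rng.random() < freq for _ in range(copies)) for _ in range(n)]


rng = random.Random(7)
freq = 0.3  # identical allele frequency in both sexes: no true sex difference
males = simulate_chrx(20000, freq, male=True, rng=rng)
females = simulate_chrx(20000, freq, male=False, rng=rng)

# Raw coding: expected dosage is freq for males but 2 * freq for females,
# so a clustering algorithm sees a sex difference driven only by copy number.
print("raw means:", round(mean(males), 3), round(mean(females), 3))

# One convention (assumed here) rescales hemizygous males to 0/2 so both
# sexes share a 0-2 scale; means then agree but variances still differ.
males_rescaled = [2 * g for g in males]
print("rescaled means:", round(mean(males_rescaled), 3), round(mean(females), 3))
```

Rescaling aligns the means but not the variances (4p(1 - p) for rescaled males versus 2p(1 - p) for females under Hardy-Weinberg proportions), which is one reason sex chromosomes usually need dedicated handling rather than being pooled with autosomes.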

C. Linkage disequilibrium (LD).
Pairs of SNPs that are physically close in the genome are often correlated due to LD. Furthermore, in samples containing individuals with different ancestry, SNPs on different chromosomes whose allele frequencies differ between populations will appear to be correlated. These are both well-known phenomena from population genetics.
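The second phenomenon is easy to reproduce. In the sketch below, two unlinked SNPs are simulated independently within each of two populations whose (hypothetical) allele frequencies differ sharply; correlation is measured as the squared Pearson correlation of allele counts, a common genotype-based approximation to haplotype r². The frequencies and sample sizes are assumptions chosen for illustration.

```python
import random


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def genotype(rng, p):
    """Allele count (0/1/2) for a diploid individual with allele frequency p."""
    return (rng.random() < p) + (rng.random() < p)


rng = random.Random(3)
# Two unlinked SNPs, independent within each population; hypothetical
# frequencies differ sharply between populations A and B.
freqs = {"A": (0.1, 0.1), "B": (0.9, 0.9)}
samples = {pop: [(genotype(rng, p1), genotype(rng, p2)) for _ in range(5000)]
           for pop, (p1, p2) in freqs.items()}

r2_within = {}
for pop, pairs in samples.items():
    g1, g2 = zip(*pairs)
    r2_within[pop] = pearson_r(g1, g2) ** 2

g1, g2 = zip(*(samples["A"] + samples["B"]))
r2_pooled = pearson_r(g1, g2) ** 2
print("within-population r^2:", {k: round(v, 4) for k, v in r2_within.items()})
print("pooled r^2:", round(r2_pooled, 3))
```

Within each population the two SNPs are essentially uncorrelated, yet pooling the populations produces a strong correlation driven entirely by the difference in allele frequencies.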

The typical size of blocks defined by high LD is on the order of 20,000 bases, but LD is far from uniform across the genome. Using a large European sample genotyped with Affymetrix 6.0 arrays, we had previously computed the locations of particularly large blocks of LD (defined using SNPs with r2 > 0.5). The first step in the statistical methodology described by Arnedo et al. is to identify so-called "SNP sets" – sets of SNPs that travel together – which the authors believe contain some information about clinical subtypes of schizophrenia: "we first identified sets of interacting … SNPs that cluster within subgroups of individuals … regardless of clinical status" (no LD limitations were imposed). Of the 237 SNPs in Table S3 from Arnedo et al., 153 (65%) mapped to exceptionally large LD blocks larger than 100,000 bases (median 275 kb, interquartile range 165-653 kb, maximum 1.2 Mb).

Arnedo et al. claim repeatedly that sets of SNPs that travel together are informative about clinical subtypes of schizophrenia. A more parsimonious interpretation of the SNP clusters identified by Arnedo et al. is that these SNPs represent a combination of (1) SNPs in large LD blocks and (2) SNPs whose allele frequencies differ substantially between European and African sample subsets. Indeed, matrix factorization algorithms similar to the methods employed by Arnedo et al. have been used to identify regions with long-range LD.

D. SNP selection.
Arnedo et al. conducted genetic clustering analyses on 2,891 SNPs selected on the basis of in-sample P-values from analysis of association with case-control status and selected from a total of ~700,000 SNPs. It is therefore expected that linear or non-linear combinations of these SNPs will be associated with case-control status in the same sample (their risk statistic); this is true even if the selected SNPs are not truly associated. A permutation test is used to assess the significance of the observed phenotype/genotype clustering. In this permutation test, subjects are randomly allocated to "SNP sets" but, since the SNPs were selected because they differ in allele frequency between cases and controls, this procedure does not generate a valid null distribution. As a result, the reported P-values are incorrect.

The strategy used by Arnedo et al. is an example of estimation and selection of effects in a dataset and then testing (or re-estimating) them in the same data, a common pitfall of prediction analyses. To construct a valid permutation test, the authors should have randomized case-control status in the association analysis step, selected a new set of ~3,000 SNPs and generated a distribution of their coincident test index under a truly null distribution.
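The permutation scheme the commenters recommend can be sketched as follows. Everything here is hypothetical: the genotypes are simulated with no true association, and `top_k_score` (the summed statistic of the k most associated SNPs) merely stands in for the authors' coincident test index. The essential feature is that each permutation re-runs the SNP-selection step on shuffled labels, so the null distribution includes the optimism created by selection.

```python
import random


def snp_stats(genos, status):
    """Absolute case-control difference in mean allele count, one value per SNP."""
    cases = [i for i, s in enumerate(status) if s]
    controls = [i for i, s in enumerate(status) if not s]
    out = []
    for g in genos:  # g holds one SNP's allele counts across all subjects
        case_mean = sum(g[i] for i in cases) / len(cases)
        control_mean = sum(g[i] for i in controls) / len(controls)
        out.append(abs(case_mean - control_mean))
    return out


def top_k_score(genos, status, k):
    """Select the k most associated SNPs in-sample and sum their statistics."""
    return sum(sorted(snp_stats(genos, status), reverse=True)[:k])


rng = random.Random(11)
n_subjects, n_snps, k = 100, 200, 10
status = [i < n_subjects // 2 for i in range(n_subjects)]  # 50 cases, 50 controls
# Null data: every SNP is independent of case-control status.
genos = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n_subjects)]
         for _ in range(n_snps)]

observed = top_k_score(genos, status, k)
all_stats = snp_stats(genos, status)
# Selection alone makes the chosen SNPs look more associated than average.
print("selected mean:", round(observed / k, 3),
      "all-SNP mean:", round(sum(all_stats) / n_snps, 3))

# Valid null: permute case-control labels, then redo the selection each time.
n_perm = 99
null_scores = []
for _ in range(n_perm):
    permuted = status[:]
    rng.shuffle(permuted)
    null_scores.append(top_k_score(genos, permuted, k))
p_value = (1 + sum(s >= observed for s in null_scores)) / (1 + n_perm)
print("permutation p-value with reselection:", p_value)
```

Because every simulated SNP is null, the selected SNPs still look better than average purely through selection; the reselecting permutation accounts for that step, whereas permuting only after the SNPs have been chosen with the real labels does not.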

E. Replication.
Replication of results is a well-acknowledged strategy for generating confidence in reported findings. Arnedo et al. state that they replicated their findings in two samples but, upon closer examination, it is unclear precisely what replicated, exactly how this was done, and whether the degree of "replication" deviated from that expected by chance. It was also unclear whether the replication control samples were or were not independent from the discovery sample. Such non-independence is another common pitfall in prediction or validation analysis.

Given the remarkable claims made by Arnedo et al., it is essential that alternative explanations be excluded. Unfortunately, the authors do not provide the necessary evidence. As presented, their methodology is opaque (even to experts), meaning that their results cannot be independently validated. Arnedo et al. do not consider alternative explanations for the phenomena that they observe, such as confounding from ancestry and LD, even though these are well-known issues for the statistical methods that they employ and have been studied extensively in the statistical and population genetics literature. In addition, their multistep analysis approach is subject to multiple issues as noted above.

We believe that it is highly likely that the results of Arnedo et al. are not relevant for schizophrenia. We urge great caution in the interpretation of the results of this study.


Press release from Washington University (St Louis)

Media coverage via Google News

Nonnegative matrix factorization and ancestry inference

Pitfalls of predicting complex traits from SNPs

Using principal components analysis to identify regions with long-range LD


Comment by:  Alexander B. Niculescu
Submitted 6 October 2014
Posted 6 October 2014

Schizophrenia Subtypes: (Some) Right Ideas, (Some) Fuzzy Execution
The recent paper by Arnedo et al. (Arnedo et al., 2014) on "uncovering the hidden risk architecture of the schizophrenias" has three main ideas: 1) empirical discovery of groups of SNPs clustering with groups of schizophrenia subjects; 2) empirical discovery of groups of clinical features (what I have called in the past "phenes"; see Niculescu et al., 2006) clustering with groups of schizophrenia subjects; and 3) trying to put it all together (similar to the PhenoChipping approach put forward by myself and others in the past; see Niculescu et al., 2006).

The fact that groups of SNPs working together in networks can account for the missing heritability is not a new idea. It has been proposed before, as epistasis (Pezawas et al., 2008; Nicodemus et al., 2010) or as more complex combinatoric models integrating the environment (Patel et al., 2010; Ayalew et al., 2012). To people working in the gene expression field, which is closer to biology, it has been a given for many years, from the operon of Jacob and Monod (Jacob et al., 2005) to co-acting gene expression groups (CAGE; Niculescu et al., 2000) or co-expression networks (Zhang and Horvath, 2005; de Jong et al., 2012).

The devil is in the details of the execution, made difficult to judge by some lack of transparency about methodology and how independent the testing cohorts were. From more minor but more obvious caveats, such as SNPs being potentially in LD or potential population stratification, to more major but less obvious caveats, such as that this type of clustering will give you a fit-to-cohort effect that is dependent on the subjects used and the quality of the clinical information available on the subjects (often cursory in the large cohorts used for GWAS), things start to become fuzzy. All in all, it is too early to draw conclusions about how many subtypes of schizophrenia there are.

There are ways to mend this. First, it would be good to see converging lines of evidence scoring such as convergent functional genomics (CFG) used to prioritize SNPs and their associated genes for fit-to-disease first, prior to the clustering, as a way of preventing a fit-to-cohort effect (Niculescu and Le-Niculescu, 2010). Second, the reproducibility of the locked panels of markers or "pheno-geno" subtypes in completely independent, non-overlapping cohorts needs to be demonstrated unambiguously, as was done by others in the past (Ayalew et al., 2012). Third, it is likely that schizophrenia is just one dimension of pathology, albeit a main one, in schizophrenia subjects. Combining also the dimensions of mood and anxiety will provide a better description of the clinical mental landscape (co-morbidities) (Niculescu et al., 2010) present, in fact, in these subjects and may account for some of the "missing reproducibility."


Arnedo J, Svrakic DM, Del Val C, Romero-Zaliz R, Hernandez-Cuervo H, Fanous AH, Pato MT, Pato CN, de Erausquin GA, Cloninger CR, Zwir I. Uncovering the Hidden Risk Architecture of the Schizophrenias: Confirmation in Three Independent Genome-Wide Association Studies. Am J Psychiatry. 2014 Sep 15.

Niculescu AB, Lulow LL, Ogden CA, Le-Niculescu H, Salomon DR, Schork NJ, Caligiuri MP, Lohr JB. PhenoChipping of psychotic disorders: a novel approach for deconstructing and quantitating psychiatric phenotypes. Am J Med Genet B Neuropsychiatr Genet. 2006 Sep 5; 141B(6):653-62.

Pezawas L, Meyer-Lindenberg A, Goldman AL, Verchinski BA, Chen G, Kolachana BS, Egan MF, Mattay VS, Hariri AR, Weinberger DR. Evidence of biologic epistasis between BDNF and SLC6A4 and implications for depression. Mol Psychiatry. 2008 Jul; 13(7):709-16.

Nicodemus KK, Callicott JH, Higier RG, Luna A, Nixon DC, Lipska BK, Vakkalanka R, Giegling I, Rujescu D, St Clair D, Muglia P, Shugart YY, Weinberger DR. Evidence of statistical epistasis between DISC1, CIT and NDEL1 impacting risk for schizophrenia: biological validation with functional neuroimaging. Hum Genet. 2010 Apr; 127(4):441-52.

Patel SD, Le-Niculescu H, Koller DL, Green SD, Lahiri DK, McMahon FJ, Nurnberger JI, Niculescu AB. Coming to grips with complex disorders: genetic risk prediction in bipolar disorder using panels of genes identified through convergent functional genomics. Am J Med Genet B Neuropsychiatr Genet. 2010 Jun 5; 153B(4):850-77.

Ayalew M, Le-Niculescu H, Levey DF, Jain N, Changala B, Patel SD, Winiger E, Breier A, Shekhar A, Amdur R, Koller D, Nurnberger JI, Corvin A, Geyer M, Tsuang MT, Salomon D, Schork NJ, Fanous AH, O'Donovan MC, Niculescu AB. Convergent functional genomics of schizophrenia: from comprehensive understanding to genetic risk prediction. Mol Psychiatry. 2012 Sep; 17(9):887-905.

Jacob F, Perrin D, Sánchez C, Monod J, Edelstein S. [The operon: a group of genes with expression coordinated by an operator. C.R.Acad. Sci. Paris 250 (1960) 1727-1729]. C R Biol. 2005 Jun; 328(6):514-20.

Niculescu AB, Segal DS, Kuczenski R, Barrett T, Hauger RL, Kelsoe JR. Identifying a series of candidate genes for mania and psychosis: a convergent functional genomics approach. Physiol Genomics. 2000 Nov 9; 4(1):83-91.

Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005; 4:Article17.

de Jong S, Boks MP, Fuller TF, Strengman E, Janson E, de Kovel CG, Ori AP, Vi N, Mulder F, Blom JD, Glenthøj B, Schubart CD, Cahn W, Kahn RS, Horvath S, Ophoff RA. A gene co-expression network in whole blood of schizophrenia patients is independent of antipsychotic-use and enriched for brain-expressed genes. PLoS One. 2012; 7(6):e39498.

Niculescu AB, Le-Niculescu H. The P-value illusion: how to improve (psychiatric) genetic studies. Am J Med Genet B Neuropsychiatr Genet. 2010 Jun 5; 153B(4):847-9.


Comment by:  Hakon Heimer
Submitted 8 October 2014
Posted 8 October 2014

[Editor's note: The discussion on this paper continues apace as of October 7, with replies to the critics from several of the authors of the original report by Arnedo et al. at PubMed Commons. They have submitted their original reply to SRF as well (below), and for the remaining replies and any future comments, we direct you to the discussion at PubMed.]


Comment by:  Gabriel de Erausquin
Submitted 9 October 2014
Posted 9 October 2014

On behalf of: C. Robert Cloninger, MD, PhD (Departments of Psychiatry and Genetics, Washington University School of Medicine, St. Louis, MO, USA); Igor Zwir, PhD (Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA; Department of Computer Science and Artificial Intelligence, University of Granada, Spain); Gabriel A. de Erausquin, MD, PhD (Roskamp Laboratory of Brain Development, Modulation and Repair, Department of Psychiatry and Behavioral Neurosciences, University of South Florida, Tampa, FL, USA); Dragan M. Svrakic, MD, PhD (Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA); Coral del Val, PhD (Department of Computer Science and Artificial Intelligence, University of Granada, Spain); Javier Arnedo, M.S. (Department of Computer Science and Artificial Intelligence, University of Granada, Spain); Rocio Romero-Zaliz, PhD (Department of Computer Science and Artificial Intelligence, University of Granada, Spain); Helena Hernandez-Cuervo, MD, BSc (Roskamp Laboratory of Brain Development, Modulation and Repair, Department of Psychiatry and Behavioral Neurosciences, University of South Florida, Tampa, FL, USA)

Two Distinct Perspectives and Methodological Approaches to GWAS
We expected our paper uncovering the hidden risk architecture of the schizophrenias to be controversial because it takes a fundamentally new approach to solve problems that have plagued the field of medical genetics for more than a decade without resolution (Arnedo et al., 2014). We went through rigorous peer review regarding the method with experts in bioinformatics and genetics (Arnedo et al., 2013) and then again regarding the application of our new approach to the schizophrenias (Arnedo et al., 2014). The critical comments of Breen and other colleagues of Sullivan highlight the fact that we have a fundamentally new approach whose perspective and properties differ from those of the traditional method they have used for several years. It is important to understand how our novel approach differs from the traditional one in order to appreciate the opportunities it provides for the advancement of science.

First, in our novel approach, common disorders are recognized to have a complex etiology in which multiple genetic and environmental variables interact in complex ways to influence the risk of disease in an individual person. Breen and commentators are experienced in approaches to genome-wide association studies (GWAS) that allow detection of only the average (additive) effects of individual genes in groups of people. We regard the traditional group-wise approach to GWAS as overly restrictive because it is well established that genes typically function in concert with one another, resulting in substantial epistasis in schizophrenia and many other common disorders (Risch, 1990). Fitness, health, and behavior are properties of persons, not genes. Nevertheless, the traditional approach can be useful when its a priori assumptions are satisfied. The traditional and novel approaches to GWAS should be viewed as being complementary perspectives and procedures.

Second, our novel approach allows for the possibility of complex relationships between multilocus genotypes and multifaceted phenotypes. In other words, different sets of genetic polymorphisms can be associated with the same phenotype ("equi-finality" or genetic heterogeneity), and the same set of genetic polymorphisms can be associated with multiple distinct phenotypes ("multifinality" or pleiotropy). They focus only on heterogeneous groups of cases, neglecting phenotypic variability among cases. It is important to note that our novel approach does not make any a priori assumption that complexity is present, but we do allow it to emerge from the data when present, as occurred clearly in our analysis of multiple independent samples of people with the schizophrenias.

Third, we carry out person-centered analyses that specify genotypic-phenotypic relationships within each individual by using clustering methods in which subjects are one matrix dimension and the other matrix dimension is either genotypic or phenotypic information. In other words, our analyses are informative about each individual, thereby providing a basis for identifying specific causes of illness in each person as a basis for tailoring treatment in a personalized way. In contrast, the traditional GWAS considers only average effects in groups of people, making it unjustified to say anything with confidence about a specific individual. Such traditional methods have failed to produce any reliable genetic test for the diagnosis of any psychiatric disorder in an individual person. In addition, phenomena such as population stratification and linkage disequilibrium may confound interpretation of the group-wise statistics of traditional GWAS, whereas these phenomena are easily evaluated in our person-centered approach.

Fourth, our approach is entirely data-driven, using machine-learning and data-mining procedures that are unbiased and unsupervised (i.e., no a priori assumptions are made). Such data-driven methods have been used successfully in many fields of science but, before our work, not for GWAS. In contrast, the a priori assumptions of the traditional GWAS approach, as used by Breen and commentators, have produced only weak associations between the average effects of individual genes and the diagnosis of schizophrenia. In fact, Sullivan and others proposed the formation of the Psychiatric Genomics Consortium (PGC) to address the problem of weak and inconsistent associations. Unfortunately, even large samples still produce only weak associations (Schizophrenia Working Group of the Psychiatric Genomics Consortium, 2014). In addition, even large collections of subjects have encountered what is called the "missing heritability problem" in medical genetics: most of the variability in risk for the schizophrenias has remained unexplained. For example, the resemblance of monozygotic co-twins of people with schizophrenia is much greater than can be explained by the average effects of individual genes, indicating that multiple genes act in concert to influence risk (Risch, 1990). Whereas twin studies estimate the heritability of schizophrenia at 81 percent, traditional GWAS have explained only about 25 percent of the variability.

In contrast, by applying our novel approach to GWAS, we observed that sets of single nucleotide polymorphisms (SNPs) allow identification of individuals at very high risk (70 percent or more), and we replicated the findings consistently in three independent samples. Our results can explain much more about disease risk than traditional group-wise approaches, so we are not surprised that such strong findings may come as a shock to those accustomed to the weak associations identified by traditional GWAS. We expected that this would be very controversial, but we are optimistic that this fundamentally new approach will open up many new opportunities for people interested in medical genetics. Our results were so unexpected that peer reviewers demanded replication in independent samples before acceptance. We ourselves were delighted with the strength of the replication: 81 percent of the 42 SNP sets associated with 70 to 100 percent risk of schizophrenia replicated almost exactly across three independent samples. Moreover, different SNP sets were associated with distinct phenotypic syndromes, and the gene products suggest possible pathways by which the function and expression of these genes in the brain may explain the different clinical features of individual patients.

In summary, traditional approaches to GWAS focus on the average effects of individual genes in groups of people, whereas our novel approach focuses on the interactive effects of groups of genes in an individual person. This difference profoundly affects the way each approach views and handles phenomena such as linkage disequilibrium, population stratification, and X linkage. Unfortunately, Breen and colleagues have not adequately appreciated the profound differences between the traditional GWAS methods with which they are familiar and the novel approach we have developed. As a result, their criticisms reflect problems that regularly occur when a traditional group-wise approach is implemented, but these concerns may have minimal pertinence to our person-centered approach and findings.

The Facts About Ancestry and Population Stratification
Breen and commentators expressed concern that our findings may be an artifact. As scientists we are committed to trying to disconfirm our own findings, no matter how strong the existing evidence may be. Associations need to be resolved experimentally at the molecular level, not only for our findings but also for those of other published GWAS, which we plan to do in the near future. We had already carefully considered the variables that concern Breen and commentators, but we did not report these observations in detail for two reasons. First, their impact on our approach was empirically negligible, as we describe below, and we gave the limited space to the variables that were most significant. Second, although the phenomena that concern Breen and commentators are often a serious problem for traditional group-wise approaches to GWAS, they are not problematic for our novel approach, because it directly tests the association between genotypic variability and phenotypic variability within individuals after deconstructing the observed or hidden structure of the population. We describe the facts about each of these phenomena below and explain why these criticisms fall short of explaining our findings.
Breen and commentators also expressed concern that we did not take the necessary steps to correct for population stratification bias in the way they would have done using their traditional group-wise statistical approach. They suggest that the clusters we identified simply reflect "SNPs whose allele frequencies differ substantially between European and African sample subsets," and go so far as to claim that the SNP sets we uncovered may not be relevant to schizophrenia at all. For that claim to be valid, however, ethnicity would have to have a strong influence on the risk and symptoms of schizophrenia; that requirement is unlikely to be satisfied based on previous observations, and it was not met in our data.
We did consider both sex and ancestry as covariates in the pre-selection of SNPs with at least a loose association with schizophrenia. This pre-selection was performed to reduce the large search space, using the logistic regression association test in the PLINK software suite (Purcell et al., 2007). We performed the analysis in this way to be compatible with the supplementary tables reported by Shi et al. (2009) for African Americans (AA), European Americans (EA), and individuals of mixed African and European ancestry (AA-EA). The most important fact about ethnic stratification is that, for each disease subtype in each of our three independent samples, there were multiple SNP sets containing varying mixes of subjects from different subpopulations. For example, in the Molecular Genetics of Schizophrenia (MGS) sample, SNP set 22_11 was represented by 48 percent AA and 52 percent EA, SNP set 21_8 by 55 percent AA and 45 percent EA, SNP set 31_22 by 53 percent AA and 47 percent EA, SNP set 54_51 by 79 percent AA and 21 percent EA, and SNP set 71_55 by 52 percent AA and 48 percent EA.

In fact, all the SNP sets that appeared to be ethnically stratified (i.e., contained mostly AA or EA subjects in MGS sample, such as 56_30 or 42_37) replicated their association with specific phenotypic indicators of different classes of schizophrenia in subjects of another ethnicity in CATIE or in the Portuguese sample. Although concerns about ethnic stratification may be valid elsewhere, ethnic stratification had little impact on our results and cannot explain the robust association of specific SNP sets with specific phenotypic sets regardless of ethnicity or sample. These observations show the great utility of detailed consideration of phenotypic variability in individual people in our approach, compared to the sensitivity to confounding by population stratification in traditional GWAS when heterogeneous phenotypes are lumped together indiscriminately as cases. The concerns of Breen and commentators about ethnic stratification point out a limitation of traditional group-wise GWAS that is averted by our novel approach. We thank them for drawing attention to another strength of our approach, one that we did not have enough space to report previously. We will discuss population stratification in more detail together with our reply about linkage disequilibrium (see posting 5), and discuss significance testing following our comments on replication in later sections of our reply to Part 2 of their comments (see posting 6).

The Facts About Gender and the X Chromosome
Breen and commentators expressed concern about gender effects in our results. Traditional GWAS focuses on average effects in heterogeneous groups, but our novel approach focuses on uncovering genotypic-phenotypic relationships in individuals regardless of their gender. Breen and commentators were concerned about possible bias arising from the 15 SNPs on the X chromosome among the 245 SNPs in high-risk SNP sets. However, a simple test based on the number of chromosomes shows that 15 SNPs cannot substantially confound the results. It is true that three of our 42 high-risk SNP sets include some SNPs on the X chromosome, but when these are considered in context along with the remaining 39 SNP sets, the influence of gender was not significant by a Kolmogorov test. All SNP sets have consistent associations with distinct phenotypic sets regardless of gender. The effects of gender and of the location of genetic variants on the X chromosome therefore have a negligible influence on our findings. In fact, the small number of X-chromosome SNPs in SNP sets at high risk for schizophrenia shows that our person-centered method does not indiscriminately select SNPs that are in LD: the X chromosome has many highly conserved sets of epistatic genes in LD that influence gender and brain function (Graves, 2010), but these are not overrepresented in our SNP sets at high risk for schizophrenia. We thank Breen and commentators for calling attention to another finding that demonstrates that our method for identifying SNP sets is highly selective for particular phenotypes.

The Challenge of Understanding and Accepting a Change in Perspective
Is our concurrent analysis of possibly complex genotypic-phenotypic relationships a fruitful new approach without the limiting assumptions of standard GWAS, or are our observations really artifactual in ways that have been overlooked by us and by multiple sets of peer reviewers with relevant expertise in these novel methods? The American Journal of Psychiatry gave us generous space for the published article, including clinical vignettes with associated genotypic information, to help people see what we have done even if the technical details of the statistical procedures seem obscure to those first looking at complex genotype-phenotype relationships through the illuminating lens of sophisticated machine-learning and data-mining procedures. We expected widespread interest in, and scrutiny of, this new data-driven approach with less restrictive assumptions, so we prepared an extensive online supplement specifying the procedures and all components of the SNP sets and clinical variables, as well as a detailed analysis of the associated gene products, their functions, and disease associations.

The full list of SNPs used in our analysis is being made available for others to continue to test. We believe in transparency and collegiality as key ingredients in the advance of science, because they are essential to the spirit of empiricism that our data-driven method emphasizes. The precise procedures for reproducing the list of SNPs were already detailed in our supplemental information and should be reproducible by experienced investigators. We will continue to consider reasonable requests for assistance from qualified investigators.

Breen and commentators expressed their concern that the "methodology is opaque (even to experts), meaning that their results cannot be independently validated." First of all, the complexity of a method does not invalidate the approach: complex methods may be necessary to deconstruct and understand complex processes. We cannot continue to look for hidden relationships with methods that do not shine light where it is needed. That said, the manuscript was exhaustively evaluated under a strict peer-review process that included a separate report from an independent statistician. The referees' many insightful comments leave no doubt that they understood the method, and we conscientiously addressed their recommendations in the resubmission process. Moreover, the PGMRA method used in this work was also evaluated by expert reviewers in bioinformatics and genetics for the journal Nucleic Acids Research (Arnedo et al., 2013). The method is well described but does require relevant expertise beyond what is required for traditional GWAS. Fortunately, we have made a web-server application of the method publicly available as a service to the field. PGMRA is applicable to a wide variety of analyses besides GWAS, including brain imaging and related methods for uncovering order in complex hidden relationships. It may help to characterize the pathway from genotype to phenotype more objectively than categorical diagnoses or symptom inventories can, particularly in samples so large that the cost of thorough assessment becomes prohibitive. We know that many are increasingly criticizing overspecialization in science, but the neglect of strictly data-driven machine-learning and data-mining techniques that do not require restrictive a priori assumptions may well be precisely what has prevented us from understanding the complex genotypic-phenotypic architecture of common disorders like the schizophrenias.

We understand that our new approach challenges long-held assumptions and that some may wish to put the genie back in the bottle, but we feel that looking at the complexity of the schizophrenias is a necessary evolution for the field; it is an evolution whose time has come, and it is already transforming other fields of science and genetics. There is overwhelming evidence across multiple disciplines that living systems and psychosocial behavior are simply too complex and interactive for their underlying complexity to be ignored. Nonetheless, we were a bit surprised to see this discussion in a public forum that is not peer reviewed. We would rather have a thoughtful, constructive consideration of the scientific merits of alternative approaches, including their fundamental philosophical differences in perspective and goals as well as their scientific differences in assumptions and procedures. One of the major obstacles to evaluating GWAS is that it can be difficult or impossible for scientists in many fields to evaluate complex technical procedures with which they are unfamiliar. The challenge of changing one's perspective can be great and feel counterintuitive, as physicists experienced more than a century ago when quantum mechanics called into question our more natural inclination toward a Newtonian perspective. That is why we feel it would have been more constructive to have a neutral review by people with relevant expertise across the many methods spanning bioinformatics, statistics, genomics, and phenomics, all of which are needed to judge adequately the strengths and weaknesses of a novel approach like ours. Even people with extensive experience in traditional approaches to GWAS may not be sufficiently knowledgeable about the well-tested but relatively new machine-learning and data-mining techniques that have allowed us to develop a new and, we hope, more generative approach to GWAS.

Nevertheless, as scientists we are dedicated to identifying and learning how to move our fields of inquiry forward, in order to better understand the underlying mechanisms of disease and to identify effective personalized treatments for complex disorders. We have found it crucial to pay balanced attention to both phenotypic and genotypic variability if we are ever to describe the complex development of common and complex medical disorders like the schizophrenias. We do not feel that this public forum is the best place to have this discussion with Breen and colleagues, but here too we may have a philosophical difference. That said, because they have chosen this forum to voice their criticism, we feel it is important to take the time to address the facts and give readers a broader context, so that they can understand the arguments and our responses to the concerns raised. Ultimately, we feel that this is more a misunderstanding and miscommunication arising from the lack of a common scientific and philosophical approach, and we hope that with time we will find more common ground. In the end, the data will settle any dispute.

The Facts About Linkage Disequilibrium
Breen and commentators also expressed concern that our SNP sets are likely to be merely artifacts of blocks of markers in linkage disequilibrium (LD). LD is strictly defined as the non-random association of alleles of neighboring polymorphisms derived from single ancestral chromosomes, but some broad measures of LD extend the concept to include co-variation of polymorphisms that are not linked, even associations among genetic variants on separate chromosomes (Reich et al., 2001). Many variables influence the co-variation of polymorphisms, including demographic variables (admixture, population size, migration, ancestral population bottlenecks), selection (including epistatic selection), and variation in recombination rates across the genome (Slatkin, 2008; Wiehe and Slatkin, 1998). Consequently, LD is a serious problem for the group-wise statistical approach of traditional GWAS, so care is taken to analyze groups with distinct ancestries separately in order to help disentangle different causes of association. In our novel person-centered approach, however, the identification of subpopulations is an intrinsic aspect of identifying the genotypic-phenotypic architecture. We identify sets of variables that naturally cluster within individual subjects, as measured by covariance of polymorphisms or phenotypic traits within particular subgroups of individuals in a population. We identify SNP sets and phenotypic sets independently of one another and then test how these independently identified sets fit together like a lock and a key. In a strictly data-driven manner, the hidden structure of the overall population is decomposed into subpopulations of subjects, allowing valid tests of genotypic-phenotypic association despite admixture in the total population from which SNP sets and phenotypic sets are extracted (Pritchard and Donnelly, 2001).
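The strict, pairwise sense of LD described above can be made concrete with the textbook coefficients D and r². The sketch below is a minimal illustration, assuming phased haplotype counts for two biallelic SNPs are already available; the class and function names are hypothetical, and it implements neither the broader multi-locus measures of LD nor our person-centered procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HaplotypeCounts:
    """Counts of phased two-locus haplotypes: SNP 1 alleles A/a, SNP 2 alleles B/b."""
    AB: int
    Ab: int
    aB: int
    ab: int


def pairwise_ld(c: HaplotypeCounts) -> tuple[float, float]:
    """Return (D, r^2) for two biallelic loci.

    D   = p(AB) - p(A) * p(B)
    r^2 = D^2 / (p(A) * (1 - p(A)) * p(B) * (1 - p(B)))
    """
    n = c.AB + c.Ab + c.aB + c.ab
    if n == 0:
        raise ValueError("at least one haplotype is required")
    freq_AB = c.AB / n            # frequency of the A-B haplotype
    freq_A = (c.AB + c.Ab) / n    # frequency of allele A at SNP 1
    freq_B = (c.AB + c.aB) / n    # frequency of allele B at SNP 2
    d = freq_AB - freq_A * freq_B
    denom = freq_A * (1 - freq_A) * freq_B * (1 - freq_B)
    # Convention used here: report r^2 = 0 when either SNP is monomorphic.
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Made-up counts for 100 phased chromosomes.
print(pairwise_ld(HaplotypeCounts(AB=40, Ab=10, aB=10, ab=40)))
```

With equal counts for all four haplotypes, D and r² are both zero, which is the no-LD case; values of r² near 1 mean that one SNP almost perfectly predicts the other.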
We allow the possibility that some constituent SNPs of a particular set may be associated (in LD) as adventitious hitchhikers that are closely linked or may be epistatic sets that are functionally adaptive and maintained by selection pressure even though they are unlinked (Koch et al., 2013). However, being linked (co-localized) or in LD is neither a necessary nor a sufficient condition for being a constituent in a SNP set: set membership depends on co-variation of polymorphisms in particular subpopulations of individuals whether the genetic variants are in LD in the total population or not. LD is actually one way that epistatic sets of genetic variants can be maintained in functionally adaptive blocks if the epistatic selection is strong, but most interactive sets of genetic variants are not in LD. Accordingly, we uncover constituents of SNP sets regardless of their LD status as candidates for functionally adaptive epistatic sets. Then we measure their potential functional interaction by testing for their differential association with phenotypic variability. We also consider the known function of the genes and regulatory sites as part of our analysis of the complex pathway from genotypic networks to distinct clinical syndromes. Thus we jointly utilize genotypic, biological, and phenotypic information as part of an integrated systems analysis that allows for observed or hidden stratification in the total population.

In addition to this fundamental difference in conceptual and procedural approach to LD, the concerns of Breen and commentators about our findings are simply unfounded empirically. In total, approximately two-thirds of the SNPs in high-risk sets map to regions so far apart in the genome (more than 100,000 base pairs) that they are highly unlikely to be in LD. Nine of the 42 high-risk SNP sets include SNPs located on different chromosomes. These facts indicate that the identified SNP sets are not the result of particular genomic constraints such as LD or location within the same gene. In any case, the presence of LD would not explain or invalidate the association of the SNPs within a particular SNP set with a particular phenotypic set. For example, one of our SNP sets consists exclusively of SNPs upstream of the NTRK3 gene, which the authors of the commentary have also reported, using standard GWAS techniques, to be strongly associated with schizophrenia. In addition, the SNPs of another SNP set map inside the same gene. Each of these SNP sets, involving different components of the NTRK3 gene, is associated with different symptoms. Although LD is viewed as a statistical problem in traditional GWAS, in our person-centered approach it is viewed as the result of adaptive mechanisms that can conserve the functional connectivity of epistatic sets of genetic variants, thereby contributing to the differential development of individuals in subpopulations. The functional adaptation facilitated by gene-gene interactions is fundamentally important for the healthy development of individuals and for the evolution of populations, as described in Sewall Wright's classic work on complex adaptive systems and evolution (Wright, 1982).
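One possible way to operationalize the distance criterion quoted above is sketched below, assuming each SNP is described by a chromosome label and a base-pair position from a single genome build. The `Snp` type, the SNP identifiers, and the rule for calling a SNP "isolated" (no other member of its set on the same chromosome within 100,000 base pairs) are illustrative assumptions rather than the published procedure, and physical distance is only a rough proxy for LD.

```python
from typing import NamedTuple

DISTANCE_BP = 100_000  # threshold quoted in the text (100,000 base pairs)


class Snp(NamedTuple):
    snp_id: str  # hypothetical identifier
    chrom: str   # chromosome label, e.g. "15" or "X"
    pos: int     # base-pair position; all SNPs assumed to use one genome build


def is_distant(a: Snp, b: Snp, threshold: int = DISTANCE_BP) -> bool:
    """True if the SNPs lie on different chromosomes or more than `threshold` bp apart."""
    return a.chrom != b.chrom or abs(a.pos - b.pos) > threshold


def fraction_isolated(snp_set: list[Snp], threshold: int = DISTANCE_BP) -> float:
    """Fraction of SNPs with no other set member on the same chromosome within `threshold` bp."""
    if not snp_set:
        return 0.0
    isolated = sum(
        all(is_distant(s, t, threshold) for j, t in enumerate(snp_set) if j != i)
        for i, s in enumerate(snp_set)
    )
    return isolated / len(snp_set)


def spans_multiple_chromosomes(snp_set: list[Snp]) -> bool:
    """True if the set contains SNPs from more than one chromosome."""
    return len({s.chrom for s in snp_set}) > 1


# Made-up SNP set: two nearby SNPs, one distant SNP, one on another chromosome.
example = [
    Snp("snp_a", "15", 1_000_000),
    Snp("snp_b", "15", 1_050_000),
    Snp("snp_c", "15", 2_000_000),
    Snp("snp_d", "7", 500_000),
]
print(fraction_isolated(example))           # 0.5
print(spans_multiple_chromosomes(example))  # True
```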
Our concurrent consideration of the functions of gene products and associations between different genotypic networks with specific phenotypic syndromes precludes any suggestion that the highly replicable effects we observed are artifacts.

Breen and commentators have also pointed out that matrix factorization algorithms similar to the methods we employed have been used to identify regions with long-range LD. This is certainly true and is not a problem in itself. Long-range LD is an indicator of functional connectivity that is not adequately explained by physical proximity, so it is part of what we want to detect in order to account thoroughly for gene-gene interactions (Wu et al., 2010). Matrix factorization methods like ours have been used in most current data-mining software applications, including a wide variety of biomedical problems (Zwir et al., 2005; Zwir et al., 2005; Romero-Zaliz et al., 2008; Harari et al., 2010), facial recognition (Lee and Seung, 1999), gene expression analysis (Mejia-Roa et al., 2008; Pascual-Montano et al., 2006; Tamayo et al., 2007), and other complex problems (Cichocki, 2009). There is no reason to avoid this powerful pattern-recognition method for uncovering hidden order in fuzzy data sets, such as the complex associations of genotypic and phenotypic variables that characterize complex medical disorders.
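For readers unfamiliar with matrix factorization, the sketch below shows basic non-negative matrix factorization (NMF) with the multiplicative updates of Lee and Seung (1999) for the squared-error objective, applied to a small subjects-by-features matrix. It is a generic textbook illustration in plain Python, not PGMRA; the rank `k`, the iteration count, the random seed, and the toy genotype matrix are all arbitrary assumptions.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x p) and b (p x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nmf(v: Matrix, k: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-9) -> tuple[Matrix, Matrix]:
    """Approximate non-negative V (n x m) as W (n x k) times H (k x m).

    Lee-Seung multiplicative updates for the squared Frobenius error:
        H <- H * (W^T V) / (W^T W H)
        W <- W * (V H^T) / (W H H^T)
    `eps` guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def squared_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


# Toy data (made up): 4 subjects (rows) x 4 SNP indicators (columns) in two blocks.
v = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
w, h = nmf(v, k=2)
# Assign each subject to the component with its largest loading.
clusters = [max(range(2), key=lambda c: row[c]) for row in w]
print(clusters, round(squared_error(v, w, h), 4))
```

Each row of `w` gives one subject's loadings on the `k` components, so assigning a subject to its largest loading is one simple way to read off person-level clusters. Multiplicative updates only guarantee a non-increasing error, not a global optimum, which is why practical tools typically run several random restarts.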

The Facts About Replication and Significance Testing of SNP Selection
Breen and commentators expressed concern about our replication process. Of course, replication has always been a serious problem in traditional GWAS, which is the rationale for the PGC's meta-analysis of large collections of samples despite their heterogeneity and limited phenotypic description. It was most challenging for us to identify samples with clinical descriptions adequate for our novel approach, but the reward was the identification of strong effects that replicated consistently across three samples, including the Portuguese Islands study, which used the same diagnostic instrument in a specific ethnic sample. The samples were independently recruited and independently analyzed, as we stated clearly in the published report. SNP sets, phenotypic sets, and associations were calculated separately for the three samples to avoid weighted or biased aggregations. We then used a well-known co-clustering test based on the hypergeometric distribution to establish the replicability of results from one sample in another. This test has been used widely in molecular biology (Zwir et al., 2005; Zwir et al., 2005; Tavazoie et al., 1999) and as a general strategy for validating clusters; it has also been implemented in software packages such as TIBCO Spotfire. We therefore feel that the concerns expressed by Breen and commentators about replication are overstated and empirically unfounded. Again, the strength of this new approach is that it allows us to avoid some of the major problems that plague traditional GWAS approaches.
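The hypergeometric co-clustering test mentioned above asks whether two clusters share more members than expected by chance. A minimal sketch follows, assuming the clusters being compared are drawn from a common universe of items (for example, the same panel of genotyped SNPs in both samples); the text does not specify the exact universe or tail used, so these choices and the function name are illustrative.

```python
from math import comb


def hypergeom_overlap_pvalue(universe: int, size_a: int, size_b: int, overlap: int) -> float:
    """P(X >= overlap) when X ~ Hypergeometric(universe, size_a, size_b).

    X is the number of items shared by a fixed cluster of `size_a` items and a
    cluster of `size_b` items drawn at random, without replacement, from the
    same universe of `universe` items.
    """
    if not (0 <= size_a <= universe and 0 <= size_b <= universe):
        raise ValueError("cluster sizes must lie between 0 and the universe size")
    total = comb(universe, size_b)
    # Sum the upper tail; comb() returns 0 when more items are requested than exist.
    tail = sum(
        comb(size_a, x) * comb(universe - size_a, size_b - x)
        for x in range(max(overlap, 0), min(size_a, size_b) + 1)
    )
    return tail / total


# Hypothetical example: two 20-SNP clusters from a 1,000-SNP panel sharing 15 SNPs.
print(hypergeom_overlap_pvalue(1000, 20, 20, 15))
```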

Breen and commentators also expressed their concern about the use of a permutation test, claiming that "because SNP sets differ in allele frequency between cases and controls, this procedure does not generate a valid null distribution." The permutation test was used not to establish the significance of the SNP sets, which was evaluated by the SKAT method (Wu et al., 2010), but rather to test the validity (and approximate probability) of the association between SNP sets and symptom sets. Controls were not used in this test at all, as they have no symptoms of psychosis; indeed, the reported inventories did not even evaluate these symptoms in controls. The misunderstanding of Breen and commentators is probably due to a lack of familiarity with this new statistical procedure, which highlights the difficulties, discussed above, that people have when first trying to understand a novel approach.
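To illustrate why controls are not needed, the sketch below implements a generic label-permutation test among cases only: the statistic is the number of cases that belong to both a given SNP set and a given symptom set, and the null distribution is built by shuffling symptom-set membership across the same cases. This is a minimal sketch of the general technique under stated assumptions, not the exact procedure of the published analysis; the function names, the statistic, and the add-one p-value correction are illustrative choices.

```python
import random


def joint_membership(in_snp_set: list[bool], in_symptom_set: list[bool]) -> int:
    """Number of cases that carry the SNP set and also show the symptom set."""
    return sum(1 for s, p in zip(in_snp_set, in_symptom_set) if s and p)


def permutation_pvalue(in_snp_set: list[bool], in_symptom_set: list[bool],
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation p-value for excess joint membership among cases.

    Symptom-set labels are shuffled across the same cases, which preserves the
    size of both sets; no control subjects are involved.
    """
    if len(in_snp_set) != len(in_symptom_set):
        raise ValueError("both membership vectors must describe the same cases")
    rng = random.Random(seed)
    observed = joint_membership(in_snp_set, in_symptom_set)
    shuffled = list(in_symptom_set)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if joint_membership(in_snp_set, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Made-up example: 50 cases, 10 in the SNP set, 9 of whom also show the symptom set.
snp = [True] * 10 + [False] * 40
sym = [True] * 9 + [False] + [True] * 2 + [False] * 38
print(permutation_pvalue(snp, sym, n_perm=2_000))
```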

We appreciate the opportunity to clarify the fundamental differences between the assumptions and goals of traditional GWAS and those of our novel approach, which addresses the complexity of common disorders with sophisticated and well-validated machine-learning and data-mining methods. We hope that the profound differences between the approaches with which Breen and colleagues are familiar and those we have developed will stimulate greater understanding of the challenges faced by psychiatric and medical genetics. We recognize that this new approach will prompt a period of reexamination of standard methodology in this field, but every major advance in genetics, and in all of science for that matter, has required flexibility and creative thinking. Any method can be improved, and we recognize that many incremental improvements are essential for the advance of science.

We have put forth a new data-driven method that allows complex genotypic-phenotypic relations to be uncovered when they are present, without imposing them as an a priori assumption. The relationships we uncovered are in fact highly complex, which allowed us to identify individuals at high risk and to associate specific SNP clusters with specific clinical syndromes despite the presence of extensive pleiotropy and heterogeneity. This approach, like all those that have preceded it, is undoubtedly imperfect; it will require refinement and may ultimately give way to yet another approach that explains more. Such methodological evolution is simply the typical course of advancement in science. We hope that these exciting developments will lead to new ways to push the boundaries of accepted science and help us to question prior assumptions that restrict our understanding of all the information embedded in data.

If this discussion has shown us nothing else, it is that this process of questioning and reflection has already begun. Ultimately, beyond all of the technical issues, our main goal is to help those in need. With schizophrenia, we know the need is great from the tremendous outpouring of requests for guidance and help that we have received, and we know that there are many people with other diseases who may benefit from our new approach. We can all be comforted knowing that our debate can bring us closer to doing what we are really here to do: helping those suffering from debilitating diseases and finding ways to promote their health and well-being. Whatever path leads us there is worth considering. So let us not permit our philosophical or scientific differences to prevent a sufficient diversity of tactics, because we never know which path will lead us toward our common goals of improving health and reducing the burden of disease.

Arnedo J, Svrakic DM, Del Val C, Romero-Zaliz R, Hernandez-Cuervo H, Fanous AH, Pato MT, Pato CN, de Erausquin GA, Cloninger CR, Zwir I. Uncovering the Hidden Risk Architecture of the Schizophrenias: Confirmation in Three Independent Genome-Wide Association Studies. Am J Psychiatry. 2014 Sep 15.

Arnedo J, Del Val C, de Erausquin GA, Romero-Zaliz R, Svrakic D, Cloninger CR, Zwir I. PGMRA: a web server for (phenotype x genotype) many-to-many relation analysis in GWAS. Nucleic Acids Res. 2013 Jul; 41(Web Server issue):W142-9.

Risch N. Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet. 1990 Feb; 46(2):222-8.

Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014 Jul 24; 511(7510):421-7.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007 Sep; 81(3):559-75.

Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, Dudbridge F, Holmans PA, Whittemore AS, Mowry BJ, Olincy A, Amin F, Cloninger CR, Silverman JM, Buccola NG, Byerley WF, Black DW, Crowe RR, Oksenberg JR, Mirel DB, Kendler KS, Freedman R, Gejman PV. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 2009 Aug 6; 460(7256):753-7.

Graves JA. Review: Sex chromosome evolution and the expression of sex-specific genes in the placenta. Placenta. 2010 Mar; 31 Suppl:S27-32.

Reich DE, Cargill M, Bolk S, Ireland J, Sabeti PC, Richter DJ, Lavery T, Kouyoumjian R, Farhadian SF, Ward R, Lander ES. Linkage disequilibrium in the human genome. Nature. 2001 May 10; 411(6834):199-204.

Slatkin M. Linkage disequilibrium--understanding the evolutionary past and mapping the medical future. Nat Rev Genet. 2008 Jun; 9(6):477-85.

Wiehe T, Slatkin M. Epistatic selection in a multi-locus Levene model and implications for linkage disequilibrium. Theor Popul Biol. 1998 Feb; 53(1):75-84.

Pritchard JK, Donnelly P. Case-control studies of association in structured or admixed populations. Theor Popul Biol. 2001 Nov; 60(3):227-37.

Koch E, Ristroph M, Kirkpatrick M. Long range linkage disequilibrium across the human genome. PLoS One. 2013; 8(12):e80754.

Wright S. The shifting balance theory and macroevolution. Annu Rev Genet. 1982; 16:1-19.

Wu MC, Kraft P, Epstein MP, Taylor DM, Chanock SJ, Hunter DJ, Lin X. Powerful SNP-set analysis for case-control genome-wide association studies. Am J Hum Genet. 2010 Jun 11; 86(6):929-42.

Zwir I, Shin D, Kato A, Nishino K, Latifi T, Solomon F, Hare JM, Huang H, Groisman EA. Dissecting the PhoP regulatory network of Escherichia coli and Salmonella enterica. Proc Natl Acad Sci U S A. 2005 Feb 22; 102(8):2862-7.

Zwir I, Huang H, Groisman EA. Analysis of differentially-regulated genes within a regulatory network by GPS genome navigation. Bioinformatics. 2005 Nov 15; 21(22):4073-83.

Romero-Zaliz R, Del Val C, Cobb JP, Zwir I. Onto-CC: a web server for identifying Gene Ontology conceptual clusters. Nucleic Acids Res. 2008 Jul 1; 36(Web Server issue):W352-7.

Romero-Zaliz R, C. Rubio R, Cordin O, Cobb P, Herrera F, Zwir I. A multi-objective evolutionary conceptual clustering methodology for gene annotation within structural databases: a case of study on the gene ontology database. IEEE Transactions on Evolutionary Computation. 2008; 12(6):679-701.

Harari O, Park SY, Huang H, Groisman EA, Zwir I. Defining the plasticity of transcription factor binding sites by Deconstructing DNA consensus sequences: the PhoP-binding sites among gamma/enterobacteria. PLoS Comput Biol. 2010; 6(7):e1000862.

Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999 Oct 21; 401(6755):788-91.

Mejia-Roa E, Carmona-Saez P, Nogales R, Vicente C, Vazquez M, Yang XY, Garcia C, Tirado F, Pascual-Montano A. bioNMF: a web-based tool for nonnegative matrix factorization in biology. Nucleic Acids Res. 2008 Jul 1; 36(Web Server issue):W523-8.

Pascual-Montano A, Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Marqui RD. bioNMF: a versatile tool for non-negative matrix factorization in biology. BMC Bioinformatics. 2006; 7:366.

Tamayo P, Scanfeld D, Ebert BL, Gillette MA, Roberts CW, Mesirov JP. Metagene projection for cross-platform, cross-species characterization of global transcriptional states. Proc Natl Acad Sci U S A. 2007 Apr 3; 104(14):5959-64.

Cichocki A. Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. Chichester, U.K.: John Wiley; 2009.

Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nat Genet. 1999 Jul; 22(3):281-5.

View all comments by Gabriel de Erausquin