Comment by: Kevin J. Mitchell
Submitted 9 July 2009
Posted 9 July 2009

GWAS Results: Is the Glass Half Full or 95 Percent Empty?
The publication of the latest schizophrenia GWAS papers represents the culmination of a tremendous amount of work and unprecedented cooperation among a large number of researchers, for which they should be applauded. In addition to the hope of finding new “schizophrenia genes,” GWAS have been described by some of the researchers involved as, more fundamentally, a stern test of the common variants hypothesis. Based on the meagre haul of common variants dredged up by these three studies and their forerunners, this hypothesis should clearly now be resoundingly rejected—at least in the form that suggests that there is a large, but not enormous, number of such variants, which individually have modest, but not minuscule, effects. There are no common variants of even modest effect.

However, Purcell and colleagues now argue for a model involving vast numbers of variants, each of almost negligible effect alone. The authors show that an aggregate score derived from the top 10-50 percent of a set of 74,000 SNPs from the association results in a discovery sample can predict up to 3 percent of the variance in a target group. Simply put, a set of putative “risk alleles” can be defined in one sample and shown, collectively, to be very slightly (though highly significantly in a statistical sense) enriched in cases in the test sample, compared to controls. The effect is consistent across several different schizophrenia samples and is even seen in two bipolar disorder samples. The authors go on to perform a set of control analyses suggesting that these results are not due to obvious population stratification or genotyping-rate effects (although effects at this level are obviously prone to cryptic artifacts).
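
The scoring procedure described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the summary-statistic layout (snp_id, odds ratio for the coded allele, p-value), the function names, and the use of a squared correlation in place of a regression-based R² are all hypothetical simplifications.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical discovery-sample summary statistics: (snp_id, odds_ratio of coded allele, p_value).
SummaryStat = Tuple[str, float, float]


def select_weights(summary_stats: List[SummaryStat], p_threshold: float) -> Dict[str, float]:
    """Keep SNPs with discovery p < p_threshold and weight each by log(odds ratio)."""
    return {snp: math.log(odds_ratio) for snp, odds_ratio, p in summary_stats if p < p_threshold}


def polygenic_score(genotype: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum over selected SNPs of (coded-allele count 0/1/2) * log(OR).

    SNPs missing from the genotype are treated as count 0 (a simplification).
    """
    return sum(w * genotype.get(snp, 0) for snp, w in weights.items())


def r_squared(scores: List[float], status: List[int]) -> float:
    """Squared Pearson correlation between score and case status (1 = case, 0 = control).

    A crude stand-in for the logistic-regression pseudo-R^2 such analyses usually report.
    """
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(status) / n
    cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, status))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_y = sum((y - mean_y) ** 2 for y in status)
    return cov * cov / (var_s * var_y)
```

For example, select_weights([("rs1", 1.2, 0.01), ("rs2", 0.9, 0.3), ("rs3", 1.5, 0.8)], 0.5) keeps rs1 and rs2 and drops rs3; each target-sample individual is then scored with polygenic_score, and the scores are compared between cases and controls.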

If taken at face value, what do these results mean? They imply some kind of polygenic effect on risk, but of what magnitude? The answer to that depends on the interpretation of the additional simulations performed by the authors. They argue that the risk allele set inevitably contains very many false positives, which dilute the predictive power of the real positives hidden among them. Based on this logic, if we only knew which were the real variants to look at, then the variance explained in the target group would be much greater.
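
The dilution argument can be illustrated with a toy simulation. This is a hypothetical sketch, not the authors' simulation: it uses a quantitative liability rather than case/control status, and every parameter (20 causal SNPs, 200 null SNPs, effect size 0.1, allele frequency 0.5) is an arbitrary assumption chosen only to show the direction of the effect.

```python
import random
from typing import List, Tuple


def r_squared(x: List[float], y: List[float]) -> float:
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov * cov / (vx * vy)


def genotype(rng: random.Random, freq: float) -> int:
    """Risk-allele count (0, 1 or 2) for one SNP, drawn under Hardy-Weinberg equilibrium."""
    return int(rng.random() < freq) + int(rng.random() < freq)


def simulate(n_people: int = 2000, n_causal: int = 20, n_null: int = 200,
             beta: float = 0.1, freq: float = 0.5, seed: int = 1) -> Tuple[float, float]:
    """Return (R^2 of a causal-only score, R^2 of the same score diluted by null SNPs).

    Liability = beta * (causal risk-allele count) + N(0, 1) noise. Null SNPs have no
    effect but, like false positives in a "risk allele" set, are added to the score.
    """
    rng = random.Random(seed)
    causal_scores: List[float] = []
    diluted_scores: List[float] = []
    liability: List[float] = []
    for _ in range(n_people):
        g_causal = sum(genotype(rng, freq) for _ in range(n_causal))
        g_null = sum(genotype(rng, freq) for _ in range(n_null))
        liability.append(beta * g_causal + rng.gauss(0.0, 1.0))
        causal_scores.append(g_causal)
        diluted_scores.append(g_causal + g_null)
    return r_squared(causal_scores, liability), r_squared(diluted_scores, liability)
```

With these settings the expected R² is about 0.09 for the causal-only score and about 0.008 once the 200 null SNPs are added, which is the sense in which false positives hide the contribution of the true ones.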

To estimate the magnitude of the effect of the polygenic load of “true risk” alleles, the authors conducted a series of simulations, varying parameters such as allele frequencies, genotype relative risks, and linkage disequilibrium with genotyped markers. They claim that these analyses identify a set of models that recapitulate the observed data and that these models all converge on a true level of variance explained of around 34 percent, demonstrating a large polygenic component to the genetic architecture of schizophrenia.

These simulations adopt a level of statistical abstraction that should induce a healthy level of skepticism or at least reserved judgment on their findings. Most fundamentally, they rely explicitly for their calculations of the true variance on a liability-threshold model of the genetic architecture of schizophrenia. In effect, the “test” of the model incorporates the assumption that the model is correct.

The liability-threshold model is an elegant statistical abstraction that allows the application of the powerful statistics of normal distributions. Unfortunately, it suffers from the fact that it has no support whatsoever and makes no biological sense. First, there is no justification for assuming a normal distribution of “underlying liability,” whatever that term is taken to mean. Second, as usual when it is invoked, the nature of this putative threshold is not explained, though it surreptitiously implies some form of very strong epistasis (to explain the difference in risk between someone with x liability alleles and someone else with x+1 alleles). If this model is not correct, then these simulations are fatally flawed.
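
What the threshold implies can be made concrete with a short sketch. Under an assumed model in which each risk allele adds a fixed amount to a normally distributed liability and disease occurs above a threshold, the risk ratio for one extra allele is not constant: it depends on how many other risk alleles a person already carries, i.e. the model builds interaction between loci into the risk scale. The function names, the 1 percent prevalence, the per-allele effect of 0.1, and the residual SD of 0.9 below are hypothetical illustrative values.

```python
from statistics import NormalDist
from typing import List

STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1


def threshold_for_prevalence(prevalence: float) -> float:
    """Threshold T such that P(L > T) = prevalence when liability L ~ N(0, 1)."""
    return STD_NORMAL.inv_cdf(1.0 - prevalence)


def risk_given_excess_alleles(x: int, per_allele_effect: float,
                              threshold: float, residual_sd: float) -> float:
    """P(L > T) for someone carrying x risk alleles above the population average.

    Assumed model: L = per_allele_effect * x + e, with e ~ N(0, residual_sd**2).
    """
    return 1.0 - STD_NORMAL.cdf((threshold - per_allele_effect * x) / residual_sd)


def risk_ratios(max_x: int, per_allele_effect: float,
                threshold: float, residual_sd: float) -> List[float]:
    """Risk(x + 1) / Risk(x) for x = 0 .. max_x - 1."""
    risks = [risk_given_excess_alleles(x, per_allele_effect, threshold, residual_sd)
             for x in range(max_x + 1)]
    return [risks[i + 1] / risks[i] for i in range(max_x)]


T = threshold_for_prevalence(0.01)  # roughly 2.33 on the standard-normal scale
ratios = risk_ratios(10, 0.1, T, 0.9)
```

Although every allele adds the same 0.1 to liability, the resulting risk ratios shrink steadily as x grows (from roughly 1.37 for the first extra allele to roughly 1.25 for the tenth under these assumptions), so on the risk scale the loci do not act independently.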

Even if the model were correct, the calculations are far from convincing. From a starting set of 560 models, the authors arrive at seven that are consistent with the observed degree of prediction in the target samples. According to the authors, the fact that these seven models converge on a small range of values for the underlying variance explained by the markers is evidence that this value (around 34 percent) represents the true situation. What is not highlighted is the fact that the values for the actual additive genetic variance (taking into account incomplete linkage disequilibrium between the markers and the assumed causal variants) across these models range from 34 percent to 98 percent and that the number of SNPs assumed to have an effect ranges from 4,625 to 74,062. This extreme variation in the derived models hardly inspires confidence in the authors’ claim that their data “strongly support a polygenic basis to schizophrenia that (1) involves common SNPs, [and] (2) explains at least one-third of the total variation in liability.” (italics added)
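
Why a single marker-level figure can sit alongside such different causal-level figures can be seen with simple arithmetic. The sketch assumes, as a simplification that is not necessarily the procedure used in the paper, that markers capture on average a fraction mean_r2 (the linkage-disequilibrium r²) of the causal additive variance; the function name is hypothetical.

```python
def causal_variance(marker_variance: float, mean_r2: float) -> float:
    """Causal additive variance implied by marker-level variance.

    Assumes marker_variance ~= mean_r2 * causal_variance, where mean_r2 is the
    average LD r^2 between markers and causal variants (a simplification).
    """
    if not 0.0 < mean_r2 <= 1.0:
        raise ValueError("mean_r2 must lie in (0, 1]")
    return marker_variance / mean_r2


# Same marker-level variance (0.34), different assumptions about tagging.
for r2 in (1.0, 0.7, 0.5, 0.35):
    print(f"mean r^2 = {r2:.2f} -> causal additive variance = {causal_variance(0.34, r2):.2f}")
```

Holding the marker-level value at 0.34, the implied causal additive variance runs from 0.34 under perfect tagging to about 0.97 when mean r² is 0.35, so agreement on the first number says little about the second.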

From a more theoretical perspective, it should be noted that a polygenic model involving thousands of common variants of tiny effect cannot explain and will not contribute to the observed heightened familial relative risks. Such risk can only be explained by a variant of large effect or by an oligogenic model involving at most two to three loci (Bodmer and Bonilla, 2008; Hemminki et al., 2008; Mitchell and Porteous, in preparation). It seems much more likely that the observed predictive power in the target samples represents a modest “genetic background” effect, which could influence the penetrance and expressivity of rare, causal mutations. However, if the point of GWAS is to find genetic variants that are predictive of risk or that shed light on the pathogenic mechanisms of the disease, then clearly, even if such variants can be found by massively increasing sample sizes, their identification alone would not achieve or even appreciably contribute to either of these goals.


Hemminki K, Försti A, Bermejo JL. The “common disease-common variant” hypothesis and familial risks. PLoS ONE. 2008 Jun 18;3(6):e2504.

Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet. 2008 Jun;40(6):695-701.