Comments on News and Primary Papers
Comment by: Daniel Weinberger, SRF Advisor
Submitted 19 March 2007
Posted 19 March 2007
Sense and Nonsense: General Lessons from Genetic Studies of Autism
The capability to characterize genetic variation across the entire genome in one fell swoop has generated considerable enthusiasm and expectation that the important genes for mental illness will “finally” be found. Whole genome association (WGA) is being touted as the path to genetic success in psychiatry. Is this sensible? Before considering the likely successes and limitations of this new capability, it is worth reminding ourselves of how we got here.
With respect to schizophrenia, over 50 years of studies of twin samples and of infants adopted away at birth have demonstrated that the lion’s share of risk for schizophrenia is determined by genes, to the tune of over 70 percent of the variance in liability (“heritability”). Family segregation studies have shown that the pattern of relative risk across relationships is most consistent with at minimum oligogenic inheritance, and more likely polygenic inheritance (Gottesman, I. I., Schizophrenia Genesis: The Origin of Madness, New York: W.H. Freeman, 1991). After over a decade of linkage studies, it is clear that across diverse family samples, schizophrenia is not related to a common genetic locus, and no locus accounts for more than a fraction of risk for illness. Because we know that schizophrenia is highly heritable, the failure of linkage to reveal a chromosomal locus providing a highly significant LOD score in most samples is not because there are no genetic variations accounting for the heritability, but because, among other reasons, there is just too much locus heterogeneity across samples.
If we accept that schizophrenia is polygenic and genetically heterogeneous, meaning that in any sample under study, some cases will be ill because they have risk genes W, X, Y, and Z, while other cases will be ill because they have risk genes C, D, E, and F, then any common linkage signals will be diluted by this genetic heterogeneity if these genes are spread throughout the genome. In light of this situation, why, then, have some recent linkage studies of schizophrenia revealed significant and replicable linkage regions? Notwithstanding improvement in ascertainment methods and the informativeness of DNA marker sets, it is likely that linkage has worked in some regions of the genome because some of the genetic heterogeneity is concentrated in these areas, meaning that heterogeneity across families does not necessarily dilute the linkage signal at these loci. For example, in the 8p linkage peak, there are at least five genes that have been found to show association with schizophrenia in various samples: NRG1, PCM1, PPP3CC, DRP2, and FZD3, so if 10 percent of the families have risk alleles in NRG1 that contribute to their risk profile, and even if 10 percent have no NRG1 risk alleles but PCM1 alleles, and the same for PPP3CC and so on, this genetic heterogeneity will not dilute the linkage signal and the 8p locus containing these five genes will be positive in these families. Of course, in a subsequent association study, samples will be positive or negative for any one of these individual genes depending on which alleles happen to be enriched in that sample. This is how heterogeneity affects the prospects for positive linkage and association. 
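The arithmetic behind the 8p example can be sketched roughly as follows. This is an illustration only, not an analysis from the comment: the five gene names and the 10 percent figure come from the text, while giving every gene the same 10 percent share, treating the family groups as non-overlapping, and the helper name `linked_fraction` are assumptions made here.

```python
# Genes named in the 8p example above.
GENES_8P = ["NRG1", "PCM1", "PPP3CC", "DRP2", "FZD3"]

# Assumption: every gene accounts for the same 10 percent of families,
# and no family carries risk alleles in more than one of the genes.
SHARE_PER_GENE = 0.10


def linked_fraction(shares):
    """Fraction of families linked to a region, assuming disjoint family groups."""
    total = sum(shares)
    if not 0.0 <= total <= 1.0:
        raise ValueError("disjoint shares must sum to a value between 0 and 1")
    return total


gene_level = {gene: SHARE_PER_GENE for gene in GENES_8P}
locus_level = linked_fraction(gene_level.values())

for gene, share in gene_level.items():
    print(f"{gene}: {share:.0%} of families carry risk alleles in this gene")
print(f"8p region as a whole: {locus_level:.0%} of families are linked")
```

Under these assumptions a linkage analysis sees half of the families at the region, while an association test on any single gene sees only a tenth of them, which is the asymmetry the paragraph describes.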
Many observers of psychiatric genetics who argue against the validity of linkage and association in psychiatry like to talk about multifactorial medical illnesses such as heart disease and schizophrenia being genetically heterogeneous, but they do not like to walk the walk when it comes to acknowledging the implications for finding association, positive or negative.
Heterogeneity has obvious implications for studies that attempt to survey variation in the entire genome and compare allele frequencies across ill and well samples. Heterogeneity in such studies dilutes the statistical effect of any single DNA polymorphism in the entire sample. Because literally hundreds of thousands of variations may be typed at one time, many of which have no prior probability of being related to the phenotype of interest, it is critical to employ some approach to statistical correction for the possibility of random positive associations. If one were to correct for 500,000 tests, the likelihood that any SNP related to a condition like schizophrenia will survive this level of correction, at least to the extent that the illness is polygenic and heterogeneous, is very small. Based on the strength of the existing data, none of the well-supported candidate susceptibility genes for schizophrenia that have been identified to date (e.g., DTNBP1, NRG1, DISC1, etc.) would survive such correction. It has been argued that the solution to this conundrum is the collection of very large datasets. This may increase power and generate impressive p values for a few genes, but the effect size of the association does not change with sample size, only the p value. It is also important to remember that the larger the sample size, the greater the potential for heterogeneity, because the collection of very large samples often requires multiple collection centers, each with its own ascertainment quirks. Thus, this approach runs the risk of a paradoxical reduction in the strength of linkage and association (see Brzustowicz, 2007).
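The two statistical points in this paragraph, the severity of correcting for 500,000 tests and the fact that sample size moves the p value but not the effect size, can be sketched with a few lines of arithmetic. This is a minimal illustration under stated assumptions, not the method of any study discussed here: Bonferroni is only one possible correction, the pooled two-proportion z test stands in for whatever association test a real study would use, and the allele frequencies (0.33 in cases, 0.30 in controls) are hypothetical values chosen for the example.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def odds_ratio(freq_case, freq_control):
    """Allelic odds ratio from risk-allele frequencies in cases and controls."""
    return (freq_case / (1 - freq_case)) / (freq_control / (1 - freq_control))


def two_sided_p(freq_case, freq_control, case_alleles, control_alleles):
    """Two-sided p value from a pooled two-proportion z test."""
    pooled = (freq_case * case_alleles + freq_control * control_alleles) / (
        case_alleles + control_alleles
    )
    se = math.sqrt(pooled * (1 - pooled) * (1 / case_alleles + 1 / control_alleles))
    z = (freq_case - freq_control) / se
    return math.erfc(abs(z) / math.sqrt(2))


# 500,000 tests is the figure used in the text; alpha = 0.05 is an assumption.
threshold = bonferroni_threshold(0.05, 500_000)

# Hypothetical risk-allele frequencies, held fixed while sample size grows.
FREQ_CASE, FREQ_CONTROL = 0.33, 0.30

results = {}
for subjects_per_group in (1_000, 10_000, 100_000):
    alleles = 2 * subjects_per_group  # two alleles per subject
    results[subjects_per_group] = (
        odds_ratio(FREQ_CASE, FREQ_CONTROL),
        two_sided_p(FREQ_CASE, FREQ_CONTROL, alleles, alleles),
    )

print(f"Bonferroni threshold: {threshold:.1e}")
for n, (effect, p_value) in results.items():
    verdict = "survives" if p_value < threshold else "does not survive"
    print(f"n={n:>7} per group  OR={effect:.3f}  p={p_value:.2e}  {verdict} correction")
```

With these hypothetical frequencies the odds ratio is identical at every sample size, while the p value falls steeply as the groups grow. The sketch holds the allele frequencies fixed; it does not model the added heterogeneity that the text warns may accompany very large multi-center samples.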
These considerations have implications for studies of the genetic origins of other neuropsychiatric disorders, such as depression, bipolar disorder, anxiety disorders, and autism. Two recent important papers related to autism illustrate each of these points and offer important lessons for WGA studies that will be emerging soon related to schizophrenia and other psychiatric disorders.
The paper by the Autism Genome Project Consortium (AGPC) reports the largest linkage study of families (over 1,490 families) with children having the autism spectrum syndrome and the most informative set of linkage markers yet reported. This study illustrates in dramatic detail the complications alluded to above. Many areas of the genome show evidence of linkage, i.e., locus heterogeneity, but the individual signals are statistically weak. Indeed, using strict criteria for statistical analysis, no region would have been considered positive, and the region that was closest (11p12-13) was not identified as a promising region in earlier linkage studies.
In a series of exploratory post-hoc reanalyses of the data, trying to create more theoretically homogeneous clinical samples (e.g., gender specific, narrower diagnosis), several linkage signals became slightly more positive, but these again involved regions of the genome not highlighted in earlier linkage studies. Does this failure to find an impressive statistical result in such an impressively large sample mean that this study is negative? Not if we expect autism to be genetically complex in the ways enumerated above. The results are exactly what would be predicted. Indeed, similar results have been reported before (Risch et al., 1999). The AGPC study also discovered regions where evidence of genomic structural changes, so-called sequence copy number variations (CNVs), might be associated with clinical diagnosis. Their data suggest that as many as 253 CNVs were discovered in 196 cases. The CNVs were found in many chromosomal regions (i.e., locus heterogeneity); involved duplications more often than deletions; varied considerably from one family to another; were spontaneous in most cases but inherited in some; and were most often found only in one individual, though recurrences occurred across ill subjects in some instances. It is very difficult to determine from these data how much of the genetic contribution to autism in this sample is explained by these copy number variations. In a few families, where multiple affected individuals had the same deletion, the data look convincing. However, it appears that CNVs were just as frequent, just as large (average 3.4 Mb) and just as likely to be duplications or deletions in the unaffected siblings of the children with autism.
The paper by Sebat and colleagues surveys the genome exclusively for evidence of structural changes related to variable copy numbers of DNA sequences and uses a putatively more sensitive method. They discovered submicroscopic deletions of 17 chromosomal regions in 14 children with autism spectrum disorder (7 percent of their ill sample). By design, all of the deletions described in this report were de novo, or spontaneous, meaning they were not found in the parents of the affected offspring and were thus not inherited. In other words, these deletions do not explain the very substantial heritability of autism, nor did they map to the regions of the genome that have shown up in linkage studies, which look specifically for loci that contribute to heritable risk (including the regions in the AGPC), nor did they highlight genes that have emerged from linkage studies as likely candidates accounting for the heritability of autism. Moreover, with one exception, all of the deletions were private, meaning they occurred in only one individual. As Sebat and colleagues point out, however, the infrequency of these copy number variations does not preclude them from pointing to more generalizable insights about genetic risk factors that operate in other cases. The genes affected by these infrequent structural variations may in other cases show common variations (e.g., SNPs) that contribute more widely to genetic liability. It is not clear how much overlap there is between the findings of these two studies, but clearly there are major differences.
The bottom line here is that genetic heterogeneity appears to be the rule in autism. While most cases are related to a complex set of inherited risk factors, some may be related to spontaneous genetic lesions, with many different lesions producing a similar clinical phenotype. None of this should surprise us, as diverse congenital encephalopathies can manifest the autism syndrome, e.g., fragile X syndrome, Rett syndrome, tuberous sclerosis. From a genetic point of view, autism is a syndrome that can be reached from many directions, along many paths. It is not likely that autism is any more of a discrete disease entity than, say, blindness or mental retardation.
So where does this leave us with respect to the goal of fully defining the genetic origins of mental disorders such as schizophrenia? The current list of promising candidate genes for schizophrenia is growing rapidly, and some already are leading to insights about potential pathophysiologic mechanisms and potential treatment targets (Straub and Weinberger, 2006). Genome variation scans will hopefully uncover many more novel genes that contribute to the risk for schizophrenia, and regardless of their outcome, these types of studies will be very important. It is likely that within the next 5 years we will have a good sense of all the common genetic variants that contribute to schizophrenia across many world samples. It is also likely that some cases will be related to structural variations (e.g., the 22q11 deletion associated with the velocardiofacial syndrome [VCFS]), both spontaneous and inherited. But, a phoenix rising from this newest chapter of investigation is not likely. Rather, as the recent autism studies illustrate, many genetic loci and many genes, again each accounting for only a relatively small percentage of ill subjects, will likely be the legacy of these studies. It is the legacy of all the work up to this point, and it is not likely to be different now that we can do many more of the same SNP assays all at one time. I doubt that genes that are discovered via WGA or related approaches will show greater effect sizes than the current top candidates, but there certainly will be more of them. Schizophrenia, like autism, is almost certainly a disorder that can be reached from many directions, along many paths. This being said, is it likely that a few genes with “highly significant” p values will be observed in a few of the multitude of WGA studies that will hit the press over the next year or two? Of course it is. Will these be the most important genes? Not necessarily. 
The challenge for our using these new data will be to make strategic choices about which of the various signals to pursue further and how to pursue them. The most important genes will be the ones that can be translated into meaningful information about disease mechanisms, therapeutic target identification, and clinical prediction.
Comment by: Paul Patterson
Submitted 21 March 2007
Posted 22 March 2007
Regarding the very high "heritability" of schizophrenia and autism: these values are usually based on twin studies, and there is good reason to be skeptical about these numbers.
For instance, the frequency of schizophrenia in dizygotic twins is twice as high as for siblings, suggesting a role for the fetal environment. Second, the concordance for monozygotic twins is 60 percent if they share a placenta, but only 11 percent if they have separate placentas, again highlighting the importance of the fetal environment. (Two-thirds of monozygotic twins share a placenta.) It is also relevant that roughly two-thirds of schizophrenia subjects do not have a primary or secondary relative with the disorder.
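To see what these placenta-specific figures imply for monozygotic twins as a whole, the numbers quoted above can be combined as a weighted average. This is only back-of-the-envelope arithmetic on the values given in this comment, not a figure taken from the cited papers; the function name `pooled_concordance` is introduced here for illustration.

```python
# Figures quoted in the comment above.
CONCORDANCE_SHARED_PLACENTA = 0.60     # monozygotic twins sharing a placenta
CONCORDANCE_SEPARATE_PLACENTAS = 0.11  # monozygotic twins with separate placentas
FRACTION_SHARED_PLACENTA = 2 / 3       # "two-thirds of monozygotic twins share a placenta"


def pooled_concordance(rate_shared, rate_separate, fraction_shared):
    """Weighted average concordance across the two placentation groups."""
    return fraction_shared * rate_shared + (1 - fraction_shared) * rate_separate


pooled = pooled_concordance(
    CONCORDANCE_SHARED_PLACENTA,
    CONCORDANCE_SEPARATE_PLACENTAS,
    FRACTION_SHARED_PLACENTA,
)
print(f"Implied overall monozygotic concordance: {pooled:.1%}")
```

The point of the calculation is that a single pooled monozygotic figure averages over two groups with identical genomes but very different concordance, which is why the comment reads the gap as evidence for the fetal environment.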
No one questions that genes play a role in the risk for schizophrenia and autism, but twins share a fetal environment as well as genes. The importance of the fetal environment is very well illustrated by the work of Brown and colleagues in their studies of the risk factor, maternal respiratory infection.
Phelps J, Davis J, Schartz K. Nature, Nurture, and Twin Research Strategies. Curr. Directions in Psychol. Sci. 1997;6:117-120.
Brown AS. Prenatal infection as a risk factor for schizophrenia. Schizophr Bull. 2006 Apr;32(2):200-2. Epub 2006 Feb 9.
Brown AS, Susser ES. In utero infection and adult schizophrenia. Ment Retard Dev Disabil Res Rev. 2002;8(1):51-7. Review.
Ryan B, Vandenbergh J. Intrauterine position effects. Neuroscience and Biobehavioral Reviews. 2002;26:665–678.
Comment by: Ben Pickard
Submitted 24 March 2007
Posted 24 March 2007
The Curious Incident of the Gap in the Chromosome
Our bodies are accustomed to a double dose of genes. The cellular ecosystem has been evolutionarily fine-tuned to this baseline of gene expression. Even the exceptions to the rule such as the sex-specific imbalance of X/Y chromosomes or the set of imprinted genes serve to highlight the compensatory mechanisms that have allowed the cell to adapt. Therefore, it is not surprising that chromosomal dosage changes are associated with disease states.
An ever-increasing appreciation of the link between disease and gene copy number has followed closely behind advances in techniques that have enabled the measurement of copy number variation at ever-greater resolution and sensitivity. Starting with Giemsa-stained chromosomes in classical cytogenetics, which identified visible aneuploidies such as trisomy 21, the field has progressed through fluorescence in situ hybridization (FISH) studies which pinpointed finer abnormalities, including those discovered through comparative genomic hybridization and sub-telomeric analysis, to today’s chip-based approaches, which can survey the whole genome at once. (In fact, as an aside, the sensitivity of the current state-of-the-art techniques is only likely to be truly improved upon with the advent of whole-genome sequencing—realistically, that is not likely for a decade or so.)
Despite this progress, the one-off nature and scarcity of many chromosome abnormalities have often led to their dismissal as genetic quirks and not relevant to disease biology at the population level.
Perhaps the tide is now turning in their favor as recent studies of sub-microscopic gene copy number changes have yielded intriguing and provocative discoveries. The two papers summarized on this site asked whether a proportion of autism spectrum disorders are caused by CNVs. The same question could, and doubtless will, be asked of schizophrenia, bipolar disorder, and other psychiatric conditions and so is worthy of discussion in this forum. The answer for autism seems to be a resounding “yes,” and this is likely to precipitate a sea change in autism research, both at the genetic and biological levels.
Sebat et al. (Science, 15 March, 2007) and The Autism Genome Project Consortium (“AGPC,” Nature Genetics, 18 February, 2007) used slightly different variations on the chip theme in their studies: the former had the advantage of a more discrete output for copy number compared to the continuous distribution from the latter approach. This had consequences for the setting of statistical detection thresholds, but both groups were quite thorough in the confirmation of many of their findings using secondary detection approaches.
Understanding the Consequences of Experimental Design:
Choice of Samples and Assessment
The samples chosen for analysis by both research groups focused on nominally family-based collections rather than sporadic cases. Thus, the mutations represented are highly likely to be of higher penetrance and relatively rare. In my opinion, the high level of locus heterogeneity that accompanies such a sample set makes the multiple-family linkage approach unlikely to yield practical dividends—indeed, the linkage component from the AGPC group is the least impressive aspect of their paper. The main linkage peak at 11p12-p13 was not a replication of the typical autism linkage findings (e.g., chromosome 7q, etc.; for review see Klauck, 2006). Additionally, above-threshold LOD scores were not significantly improved when diagnostic boundaries were changed or CNV carriers removed from the data. In fact, one of the most impressive features of the Sebat paper was the enlightened subdivision of the samples based not on phenotype, but rather by the nature of the inheritance patterns of autistic spectrum disorders within the families (the same may be true for the AGPC data, but the information is not explicitly categorized). This stratification into “simplex” (single case within the family) and “multiplex” (more than one affected individual) must be telling us something about the genetic architecture of complex genetic disorders. The results indicate that de novo CNVs were four times more common in the simplex families than multiplex. Let’s examine a hypothetical explanation for this finding. First, the simplex families may not be, or rather may not go on to be, true “families” in the genetic sense—their mutations are of the lower penetrance, “susceptibility altering” class. 
Such CNV mutations would not produce the densely affected families that are so attractive to gene mappers and so will never be collected and categorized as “multiplex.” The fact that three CNV regions (2q37.3, 3p14.2, and 20p13) are independently present twice in the Sebat simplex group adds weight to these CNVs being “common” risk variants—perhaps they are ripe candidates for a case-control association study in a larger simplex/sporadic cohort? The type of CNVs present in the multiplex families are, by definition, of sufficient penetrance for the multiplex classification to become possible: this class of mutations will probably be rarer. One supportive observation for the distinction between the two CNV types rests on the fact that there is no overlap between identified multiplex and simplex CNV regions—will that remain the case as further studies are carried out? Another, from the AGPC paper, is that many of their familial CNVs lie over previously identified linkage hotspots or known balanced chromosomal rearrangements (breakpoints, see below).
However, two mysteries remain. The first is the predominance of CNV deletions in the Sebat paper compared to the stated overrepresentation of duplications in the AGPC paper; whether this is a technical or family sample choice issue remains to be elucidated. The second, and perhaps vaguer, problem is the seldom-addressed nature of the mutations identified in neuropsychiatric disorders. The archetypal mutations we learn about in undergraduate lectures, primarily in the context of neoplasms, include gain-of-function (oncogenes), loss-of-function (tumor suppressors), dominant negative and so on. Chromosome abnormalities in general, and CNVs in particular, seem to suggest that autism spectrum disorder (ASD), schizophrenia, and bipolar disorder are diseases in which gene dosage changes are the only pathological mechanism. Is this a real biological phenomenon or merely a methodological ascertainment bias? If the latter, how might we better adapt our gene hunting strategies to target other forms of mutation?
A Gene in the Hand Is Worth 50 Under a Linkage Peak
In the warm afterglow of an experimental tour-de-force, the biological ramifications can sometimes be sidelined. We can ask which genes these CNVs have affected and what this tells us about the biology of autism spectrum disorder, not forgetting that this work should be considered in the context of the history of other genetic and biological studies on ASD.
The first, and perhaps most impressive, finding is that of a CNV covering the Neurexin 1 (NRXN1) gene. The protein encoded by this gene interacts with a family of receptors called Neuroligins. Interestingly, Neuroligin 3 (NLGN3) and Neuroligin 4 (NLGN4) have been linked to ASD through chromosome abnormalities and mutations detected in rare cases. Moreover, SHANK3 has recently been identified as an ASD candidate through the study of cytogenetic abnormalities and several point mutations. SHANK3 protein has also been demonstrated to bind to neuroligins. This amazing convergence is reminiscent of another recent celebrity pairing in the schizophrenia field: the discovery of DISC1 and PDE4B through independent chromosome abnormalities followed by the discovery that their proteins functionally interact. The identification of these four ASD candidate genes is likely to stimulate much research into this nascent signaling pathway, particularly in the context of its supposed role in synaptogenesis.
Many of the CNVs affect gene clusters, and only by analyzing multiple overlapping deletions or systematically examining the gene candidates individually will the causative ASD genes be identified. This seems to be the case for the genes ZFP42 and PACRG, which have been found both in large CNVs with multiple genes affected and singly in smaller CNVs. Several additional CNVs were identified which were small enough, or within large enough genes (large size seems to be an anecdotally reported feature of genes identified through a variety of cytogenetic approaches) to implicate just that gene. These include SLC4A10, FLJ16237, A2BP1, NFIA, GAB2, PCDH7, PCDH9, CDH8, C18orf58, FHOD3, C2orf10, MAN2A1, CSMD1, and TRPM3 as a conservative selection. Two aspects of biology immediately spring to mind when viewing these genes. Firstly, the three members of the cadherin family identified fall into the same biological role as the neuroligins, namely cell adhesion. A related gene, FAT, has also been implicated in familial bipolar disorder. Secondly, the identification of MAN2A1, encoding a component enzyme in the pathway which post-translationally modifies proteins through glycosylation, adds another gene from this process to a list including ALG9/DIBD1 and MGAT5, both of which have been implicated in psychiatric illness. To the list of genes identified through CNV analysis, one can add USP6, NBEA, ST7, AUTS2, SSBP1, GRPR, and SHANK3, discovered in previous studies of autism spectrum disorder chromosome abnormalities. These candidates (and those identified in the psychoses) provide a wealth of resources for future functional and genetic studies.
However, on the journey to a more rigorous biological definition of ASD, it may be a mistake to attempt to squeeze the functions of these genes into one unifying but unhelpfully vague cellular grouping, e.g., “signal transduction” or “metabolism.” Rather, biological investigations might benefit from trying to place these disparate genes in the context of their roles in the functioning of the brain regions or subsystems in which they are expressed. A hard task undoubtedly, but an endeavor that is likely to provide us with a more holistic understanding of the conditions.