1 November 2005. Early navigators who ventured into the vast unknown were sometimes rewarded with landfall on warm, tropical islands. Modern-day explorers charting diversity in the genetic code have found a few hot spots of their own. In the October 27 Nature, an international crew of investigators called the International HapMap Consortium published their first draft of the human haplotype chart. This navigational aid promises to help researchers plumb the depths of the human genome for hazards, that is, variations that confer susceptibility to a myriad of diseases.
The HapMap project, led by David Altshuler at the Broad Institute of Harvard and MIT and Peter Donnelly at the University of Oxford in England, began in October 2002 with the goal of drafting a haplotype map within 3 years. Almost 3 years to the day later, the consortium, with research input from Canada, China, Japan, Nigeria, the UK, and the US, released a phase I map based on genotypes from 269 DNA samples. The samples were obtained from volunteers in Tokyo, Japan; Ibadan, Nigeria; Beijing, China; and Utah, USA.
Haplotypes are a means of cataloging genetic variation. Though 99.9 percent of the 3 billion or so nucleotides that make up the human genetic code are identical among the world’s 6.5 billion people, it is the 0.1 percent difference that ensures we don’t all look, sound, or think alike. And while that variety may add spice to life, it is also a large part of the reason why some of us succumb to cancer, diabetes, or a disease of the central nervous system. Though some diseases can be blamed on a single letter change in the genetic code, known as a single nucleotide polymorphism (SNP), more complex diseases are thought to result from a number of such changes. This represents an enormous challenge when trying to identify which specific combinations of variants confer susceptibility to disease, or resistance to a drug. If you thought searching for a needle in a haystack was hard, consider trying to find five or ten needles among the 23 haystacks that are the human chromosomes.
This is where the haplotype comes in. A haplotype is a section of DNA that contains many single nucleotide polymorphisms. Because SNPs come and go infrequently, haplotypes are relatively stable. They also tend to be inherited as a whole block because genetic recombination, which could potentially rearrange the haplotype, is also somewhat rare. So if you inherit one haplotype SNP, you most likely inherit all the other associated SNPs, too.
For this reason, haplotype analysis has the potential to significantly reduce the amount of searching researchers need to do to find SNPs that are associated with a disease or other phenotype, such as drug resistance or sensitivity. In short, by finding one needle in the haystack, you can pull out many others along with it. In their paper, the HapMap Consortium reports that this redundancy may be even more extensive than previously thought. Using a pair-wise comparison method to analyze all the SNPs genotyped, they found that identifying a “tag” SNP every 5-10 kilobases of DNA is sufficient to reveal all the common variants in genome samples obtained from the Utah, Chinese, and Japanese volunteers. A slightly denser array of SNPs (one every two to five kilobases) would achieve the same result for the Nigerian samples.
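To make the idea of pair-wise tagging concrete, here is a minimal sketch in Python. It is not the consortium's software. It assumes phased haplotypes coded as 0/1 alleles, measures redundancy between two SNPs with the squared correlation r², and uses a simple greedy pass to pick tags. The SNP names, the toy data, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared correlation (r^2) between two biallelic SNPs.

    a and b are equal-length lists of 0/1 alleles, one entry per haplotype.
    """
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    if pa in (0.0, 1.0) or pb in (0.0, 1.0):
        return 0.0  # a monomorphic SNP carries no information about others
    pab = sum(x & y for x, y in zip(a, b)) / n
    d = pab - pa * pb  # linkage disequilibrium coefficient D
    return d * d / (pa * (1 - pa) * pb * (1 - pb))


def greedy_tags(haplotypes, threshold=0.8):
    """Pick tag SNPs so every SNP is itself a tag or has r^2 >= threshold with one.

    haplotypes maps a SNP name to its list of 0/1 alleles.
    The threshold of 0.8 is an assumed cutoff, not taken from the article.
    """
    snps = list(haplotypes)
    # captured[s] holds every SNP that s could stand in for, including itself.
    captured = {s: {s} for s in snps}
    for s, t in combinations(snps, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            captured[s].add(t)
            captured[t].add(s)

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(snps, key=lambda s: len(captured[s] & untagged))
        tags.append(best)
        untagged -= captured[best]
    return tags


# Hypothetical toy data: five SNPs typed on eight haplotypes.
EXAMPLE = {
    "rsA": [0, 0, 1, 1, 0, 1, 0, 1],
    "rsB": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to rsA
    "rsC": [1, 1, 0, 0, 1, 0, 1, 0],  # mirror image of rsA, still r^2 = 1
    "rsD": [0, 1, 0, 1, 0, 1, 0, 1],
    "rsE": [0, 1, 0, 1, 0, 1, 0, 1],  # identical to rsD
}

if __name__ == "__main__":
    print(greedy_tags(EXAMPLE))  # ['rsA', 'rsD']
```

In this toy example two tags stand in for five SNPs, which is the kind of saving the article describes on a genome-wide scale.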
In practice, this means that in order to identify, with reasonable accuracy, which of the 10 million or so SNPs a person carries, researchers would only have to test about 250,000 tag SNPs. To be 100 percent accurate the number jumps up to about 450,000 tag SNPs (600,000 in the case of the Nigerian population), still less than 10 percent of the total. Hence, identifying haplotypes should not only be faster, but also cheaper than expected. In fact, consortium member Yusuke Nakamura, University of Tokyo, estimates that haplotype mapping could reduce the cost of searching for inherited genetic factors by 10- to 20-fold.
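The proportions behind "less than 10 percent" are easy to check. The short Python sketch below uses only the round figures quoted above; the variable and function names are illustrative.

```python
# Round figures quoted in the article.
TOTAL_SNPS = 10_000_000

TAG_PANELS = {
    "reasonable accuracy": 250_000,
    "full accuracy": 450_000,
    "full accuracy (Nigerian samples)": 600_000,
}


def percent_of_total(tag_count, total=TOTAL_SNPS):
    """Return the tag panel size as a percentage of all SNPs."""
    return 100 * tag_count / total


if __name__ == "__main__":
    for label, count in TAG_PANELS.items():
        print(f"{label}: {percent_of_total(count):.1f}% of all SNPs")
```

Even the densest panel, at 600,000 tags, covers only 6 percent of the estimated 10 million SNPs.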
The point is illustrated by David Goldstein and Gianpiero Cavalleri from Duke University in an accompanying News & Views. Four years ago, these authors launched a project to identify which polymorphisms in the gene SCN1A are responsible for dictating a given patient’s response to an antiepileptic drug. It took the research team two years to identify the common SNPs and appropriate tags. “Today, the same job can be accomplished with simple computer algorithms, in minutes, using the HapMap data,” they write.
The value of the HapMap project is also apparent in an accompanying Nature paper from Vivian Cheung and colleagues at the University of Pennsylvania and The Children’s Hospital, both in Philadelphia. These researchers used the HapMap data to identify SNPs that influence gene expression. For 15 of 27 different genes previously identified as being heavily influenced by genetic variation, the authors found that their HapMap-based study agreed with previous findings: the HapMap analysis pointed to exactly the same cis-regulatory regions in the DNA. For one gene, chitinase 3-like 2 (CHI3L2), Cheung and colleagues were able to identify the exact SNP that regulated expression, a G to T mutation that leads to stronger binding of RNA polymerase II, which makes messenger RNA. “Our findings suggest that association studies with dense SNP maps will identify susceptibility loci or other determinants for some complex traits or diseases,” write the authors.
Though haplotype analysis may increase efficiency, there are concerns that it may do so at a cost—the studies may be weak in terms of statistical power. The second phase of the HapMap project, which is designed to uncover considerably more SNPs, may help in this regard, and in the meantime, methods exist that can be employed to increase the power of the studies. So conclude Altshuler and colleagues in a related Nature Genetics paper published online October 23. Joint first authors Paul de Bakker, Roman Yelensky and colleagues report that there are numerous ways of carrying out the analysis to preserve statistical power. They found, for example, that analyzing all haplotypes for association, not just those that have been linked to known SNPs, can increase the chances of detecting rare polymorphisms that cause disease.—Tom Fagan.
Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly P; International HapMap Consortium. A haplotype map of the human genome. Nature. 2005 Oct 27;437(7063):1299-320.
Goldstein DB, Cavalleri GL. Genomics: understanding human diversity. Nature. 2005 Oct 27;437(7063):1241-2.
Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, Burdick JT. Mapping determinants of human gene expression by regional and genome-wide association. Nature. 2005 Oct 27;437(7063):1365-9.
De Bakker PIW, Yelensky R, Pe’er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and power in genetic association studies. Nat Genet. 2005 Nov;37(11):1217-23. Epub 2005 Oct 23.