Author manuscript; available in PMC: 2009 Jun 8.
Published in final edited form as: Hum Mol Genet. 2007 Jun 27;16(SPEC):R220–R225. doi: 10.1093/hmg/ddm161

Successful design and conduct of genome-wide association studies

Christopher I Amos 1,*
PMCID: PMC2691963  NIHMSID: NIHMS115958  PMID: 17597095

Abstract

Genome-wide association studies are becoming an increasingly effective tool for identifying genetic factors contributing to complex diseases. In this review, I discuss two sets of genome-wide association studies that identified novel genetic factors for age-related macular degeneration and genetic factors for type II diabetes. In reviewing these sets of studies, my goal is to identify factors that contributed to the success of these studies. Design-related factors include the selection of traits that show strong familiality, the selection of clinically homogeneous populations and the selection of cases that have a family history. Ethnic stratification within the study sample can lead to biases, and methods to control for stratification are briefly reviewed. Finally, the impact of single nucleotide polymorphism selection on the power of a study and procedures for improving power by inferring genotypes, by combining data across studies and by performing multistage analyses are discussed. The continuing success of genome-wide association studies depends on careful selection of populations for study and on collaborative analytical approaches.

INTRODUCTION

The genome-wide association study (GWAS) is an increasingly popular approach for identifying genetic factors influencing common, complex diseases. The popularity of this approach reflects major advances in technology for high-throughput genetic analysis, which now offers very low error rates and very low genotyping costs per single nucleotide polymorphism (SNP). Collecting data and samples from individuals for association studies is generally much easier than collecting families. In addition, ongoing epidemiological studies have often already assembled large case–control and cohort populations, which are then available for genetic analysis. Despite the many factors that promote the application of association studies, their costs remain very high because a very large number of subjects usually needs to be studied. Large numbers of subjects are needed because associations between SNPs and causal variants are expected to show low odds ratios, typically below 1.5. In addition, in order to obtain a reliable signal, given the very large number of tests that are required, associations must either show a high level of significance or replicate in multiple studies, or both.

Several approaches have been proposed to control the costs of genome-wide association studies. Methods that can be applied include pooling of case and control populations, application of two-stage and multistage designs and the sharing of controls across populations. Although these approaches can reduce costs, in some cases dramatically, they are only optimal under certain restrictive conditions. On the other hand, careful consideration of epidemiological and genetic characteristics of the study population can increase the power or reduce costs under a variety of conditions. Therefore, in this review, I emphasize features of the study population that increase power. Features that can improve power include selection of cases because of a family history of disease, restriction to genetically homogeneous subsets, the inclusion of quantitative factors and the choice of SNPs.

The application of whole-genome association analysis in the study of large collections of cases and controls can provide novel insights into the etiology of complex diseases. The agnostic, genome-wide approach has played a key role in understanding the genetic basis of many rare genetic diseases. Often prior to the discovery of the underlying genetic factors for a disease, the causal mechanisms were unknown. For example, the discoveries of BRCA1 and BRCA2 (1) as causal factors for familial breast cancer and Merlin (2), the causal factor for NF2, led to fundamental changes in the understanding of biological mechanisms for these diseases, which could not have been predicted a priori. In the study of more complex diseases, it seems likely that aside from the discovery of novel pathways that cause diseases, complex interactions may also be required for disease causation. Large collections of cases and controls are needed to characterize joint effects from multiple loci.

In order to provide a more empirically driven review of the literature, I have chosen to evaluate several recent publications that have successfully applied genome-wide association analyses to identify genetic factors for complex diseases. From these case studies, I have extracted details that provide insights into further design of successful GWAS. Although theoretical population genetic arguments have played a key role in encouraging the use of GWAS, many of the specific pressures from human evolution that affect the power of GWAS, such as effects from recurrent mutations and positive or negative selective pressure on specific loci, are unknown. For some populations such as the admixed populations of the New World, studies can be developed to specifically take advantage of the impact that recent events in human history have had upon the genetic architecture of these populations.

CASE STUDY 1: THE IDENTIFICATION OF COMPLEMENT FACTOR H AS A CAUSAL FACTOR FOR AGE-RELATED MACULAR DEGENERATION

Three reports in 2005 identified complement factor H (CFH) as a causal factor for age-related macular degeneration (AMD) (3-5). Results of these studies clearly indicated that CFH plays a key role in the etiology of AMD, a disease that is influenced by multiple environmental factors such as obesity and smoking, and that otherwise appears complex in etiology. The development of AMD results from the accumulation of drusen, which can be measured in a semi-quantitative fashion as the area of the retina affected. Although the development of drusen in affected subjects appears to reflect in part an inflammatory process, CFH was only one of many potential candidate genes for AMD.

The three manuscripts applied a comprehensive set of approaches to identify CFH as a causal factor for AMD. Klein et al. (3) used an agnostic, GWAS approach. This approach would provide a comprehensive view of the impact that common variants have upon disease risk if all causal loci were in perfect linkage disequilibrium with at least one of the SNPs being genotyped. When linkage disequilibrium is not perfect, power to detect associations can be variable. Linkage disequilibrium indicates the association between alleles at two loci. For diallelic loci, let the first locus have alleles A and a and let the second locus have alleles B and b. A haplotype is the pair of alleles that were derived from a common ancestor, or a pair of alleles that are physically located on the same chromosome. We let $P_{AB}$ be the haplotype frequency for the AB haplotype in a population of interest, and $P_A$ and $P_B$ be the frequencies of the A and B alleles. Then the unstandardized measure of disequilibrium is $D = P_{AB} - P_A P_B$. The range of D depends upon the allele frequencies, so it is usually standardized (6). The standardized measure D' gives insight into the conservation of ancestral haplotypes over time. For association studies, the measure $R^2 = D^2/(P_A P_B P_a P_b)$ is preferred, as it relates directly to the correlation between the alleles at the two loci. If an SNP is used as a surrogate for a causal allele, then the sample size required to detect an association increases proportionally to $1/R^2$. Because causal SNPs are usually unknown and there are more SNPs than can be genotyped with current platforms, GWAS similar to that performed by Klein et al. (3) query SNPs that have been selected to be relatively common (usually with minor allele frequencies > 5%) and that are maximally correlated with SNPs that have been identified through the ongoing efforts of the International HapMap Project (7,8). The actual value of $R^2$ is also influenced by the allele frequencies at the two loci under consideration. Letting A denote the rarer of the two minor alleles (so that $P_A \le P_B$), the maximal value of $R^2$ is $P_A P_b/(P_a P_B)$, attained when D' = 1. As shown in Figure 1, as the allele frequencies deviate from equality, the association between the two loci quickly decreases. The power to detect associations depends strongly upon the allele frequencies at the causal locus and the linked locus.
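The disequilibrium measures defined above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the cited studies; the function names `ld_measures` and `max_r2` are hypothetical, and D' is computed with the common convention |D|/D_max, which is an assumption because the text does not spell out the standardization.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', R^2) for two diallelic loci.

    p_ab is the frequency of the A-B haplotype; p_a and p_b are the
    frequencies of alleles A and B. All frequencies must lie strictly
    between 0 and 1.
    """
    d = p_ab - p_a * p_b
    # Largest |D| compatible with the allele frequencies (assumed convention).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def max_r2(p_a, p_b):
    """Largest R^2 attainable (positive D, D' = 1) for the two allele frequencies."""
    lo, hi = sorted((p_a, p_b))  # lo plays the role of the rarer allele A
    return lo * (1 - hi) / ((1 - lo) * hi)
```

For example, with allele frequencies of 0.1 and 0.3 and every A allele carried on a B haplotype (`p_ab = 0.1`), D' equals 1 but R^2 is only about 0.26, which matches `max_r2(0.1, 0.3)`. When the two allele frequencies are equal, `max_r2` returns 1.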

Figure 1.

Maximal values of $R^2$ between two loci. Lines correspond to minor allele frequencies at one locus (with alleles A and a). The X-axis indicates allele frequencies at the second locus.

The publication of Klein et al. (3) is particularly remarkable given the small sample size used in the study (96 cases and 50 controls) and the limited number of markers that were genotyped (116 204, with 105 980 analyzed). However, there are several contributing factors that helped to assure the success of this study. First, the genetic contribution in the development of AMD is relatively strong for a complex disorder. The increased risk for first-degree relatives of individuals with AMD is 4.2 and this risk is increased to nearly 20 for late-onset AMD appearing in a relative of a case with late-onset AMD (9). The relatively strong genetic contribution to AMD supports the application of genomic studies to identify risk factors.

A second factor that improved the power of this study was the consideration of only the patients with a very carefully delineated clinical disease presentation. The patients were selected from the Age Related Eye Disease Study (10) and had to have both AMD and at least one large drusen > 125 μm in diameter, with drusen in total covering a circle of diameter 1061 μm in one eye. A feature that strongly impacts the power of an association study to detect a causal genetic factor is the presence of genetic heterogeneity. Genetic heterogeneity (h) in the case population, due to either poor phenotype definition of the cases or variation in the allele associated with the disease, increases the sample size requirement by a factor of $1/(1 - h)^2$. Thus the careful phenotype definition used by this study helped to assure adequate power.
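The two sample-size penalties described in this section, 1/R^2 for imperfect linkage disequilibrium and 1/(1 - h)^2 for heterogeneity among the cases, can be turned into a quick calculation. The sketch below is only illustrative: the function name is hypothetical, and multiplying the two factors together is a simplifying assumption rather than a result stated in the text.

```python
def sample_size_inflation(r2=1.0, h=0.0):
    """Approximate factor by which the required sample size grows.

    r2: squared correlation between the genotyped SNP and the causal allele.
    h:  heterogeneity in the case population as used in the text, so the
        heterogeneity penalty is 1 / (1 - h)**2.
    The two penalties are simply multiplied (an assumption of this sketch).
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    if not 0.0 <= h < 1.0:
        raise ValueError("h must be in [0, 1)")
    return (1.0 / r2) * (1.0 / (1.0 - h) ** 2)
```

Under these assumptions, a marker with R^2 = 0.5 doubles the required sample size, heterogeneity of h = 0.5 quadruples it, and the two together call for eight times as many subjects.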

As described earlier, power to detect associations in a GWAS depends critically upon a high level of LD between the causal factor and the SNPs that are queried. After association with rs380390 was noted, data from the International HapMap Project were used to show that the CFH gene lies within a region showing LD to this SNP. Resequencing of the CFH exons identified a causal variant, which was found on 97% of the haplotypes that showed the strongest association with AMD. Thus, this study was fortunate to have included an SNP showing very strong LD with the causal variant so that the power to detect association was only minimally affected by not directly genotyping the causal variant. The current need for resequencing loci during disease discovery studies has been discussed by Taylor et al. (11). Only a limited number of individuals were completely sequenced during the initial SNP discovery phases of the International HapMap Project, and many uncommon and rare SNPs in healthy subjects remain undiscovered. Taylor et al. (11) found that only ∼50% of genes had adequate coverage by SNPs to allow haplotype frequencies to be adequately estimated, based only upon inclusion of SNPs from HapMap projects.

Finally, Klein et al. (3) were able to select from the AREDS control population subjects that ethnically matched the cases. Increasingly, structure within racial groups is being recognized as a potential confounder for genome association studies. In order for ethnic background to create confounding, there must be differences among ethnic groups with respect to both the marker allele frequencies and the causal locus that influences disease risk. Some diseases such as lung cancer (12) and celiac disease (13) vary greatly in incidence across Europe. As an example of the confounding that can occur, Campbell et al. (14) showed a strong spurious association between height and the lactase allele that permits adult catabolism of lactose, both of which show strong North/South European variation. To estimate Northern and Southern European ancestry, Seldin et al. (15) developed panels of 400 ancestry informative markers. Within genome-wide association panels, sufficient numbers of markers are genotyped to permit North/South European ancestry separation. Markers to allow for racial stratification within admixed populations such as African-American (16,17) and Hispanic populations (18) have also been developed. However, studies of admixed populations can search for genomic regions showing an excess of alleles from one of the ancestral populations as an alternative mapping strategy (19,20).

Several approaches have been developed to estimate and then to condition on unknown population stratification. Unobserved stratification in the sample causes the distribution of hypothesis tests for association to follow a non-central $\chi^2$ distribution for which relatively simple adjustments can be applied (21), but these may not be sufficiently sensitive to adjust for confounding effects which may vary among the markers being studied (22). A Bayesian approach was developed to identify population structure (23) and then to condition on it (24) (http://pritch.bsd.uchicago.edu/software.html). This approach provides insight into the latent stratification within a population but benefits greatly from some prior insight concerning the number of strata. Principal components analysis can identify factors that explain the most interindividual variability among the study subjects, thus providing an approach to adjust for latent population structure (25). Finally, latent class analysis is a highly effective approach for jointly estimating stratification membership while concomitantly performing association analysis (26). Purcell et al. (27) developed user-friendly software that performs latent class analysis without respect to case–control status.
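One widely used simple adjustment of the kind mentioned above is often called genomic control: the association statistics are rescaled by an inflation factor estimated from the statistics themselves. The sketch below is a hedged illustration and not the exact procedure of reference 21. It assumes 1-df chi-square statistics, estimates the inflation factor from their median, and never inflates statistics when the estimate falls below 1; the function name is hypothetical.

```python
from statistics import median

# Median of a central chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda, adjusted statistics) for a list of 1-df chi-square tests.

    lambda is the observed median divided by the median expected under the
    null; values below 1 are truncated to 1 so statistics are never inflated.
    """
    lam = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [x / lam for x in chi2_stats]
```

Because a single factor is applied to every marker, a correction of this form cannot account for confounding that differs from marker to marker, which is the limitation noted in the text (22).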

The report by Edwards et al. (4) identifying CFH as a causal factor for AMD used a candidate SNP approach. A genomic region showing evidence for linkage to AMD had previously been described. Therefore, Edwards et al. (4) queried only non-synonymous SNPs within exons of genes that were known to reside within the linkage region. Roeder et al. (28) provided an approach for upweighting genomic regions that support evidence for linkage. Jorgenson and Witte (29) advocated querying primarily coding SNPs in genic regions to increase power because only those SNPs that are most likely to show a causal relationship to disease are queried. Unfortunately, a single most effective approach to correcting for the multiple tests that are required in GWAS has yet to be defined. The frequentist approach would correct for only those tests that have been performed, so that candidate gene studies or restriction of analysis to coding SNPs or to linked regions increases power because a less severe correction for multiple testing is required. A Bonferroni or Šidák correction can be applied, but because this correction does not allow for correlation among the tests, it is often conservative. False discovery rate procedures are not generally conservative but also have difficulty in adjusting for strong correlations among the tests. Permutation testing approaches have been developed (30,31) and they can appropriately allow for correlation among the tests. The HapMap Consortium has proposed that GWAS adjust for the total number of independent tests, which is projected to require $P < 5.6 \times 10^{-8}$ for Caucasians and $P < 2.4 \times 10^{-8}$ for Yorubans.
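To make the effect of the number of tests concrete, the Bonferroni and Šidák per-test thresholds can be computed directly. This is a minimal sketch with hypothetical function names. Both formulas treat the tests as independent, which is why they tend to be conservative for correlated SNPs.

```python
import math


def bonferroni_threshold(alpha, m):
    """Per-test significance level for m tests at family-wise level alpha."""
    return alpha / m


def sidak_threshold(alpha, m):
    """Sidak per-test level, 1 - (1 - alpha)**(1/m), computed stably for large m."""
    return -math.expm1(math.log1p(-alpha) / m)
```

For the 105 980 SNPs analyzed by Klein et al. (3) and a family-wise level of 0.05 (the 0.05 level is an assumption chosen for illustration), both thresholds are a little below 5 × 10^-7, with the Šidák value slightly larger than the Bonferroni value. Restricting the analysis to a few hundred candidate SNPs would relax the threshold by several orders of magnitude, which is the power gain described in the text.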

The final manuscript (5) identifying CFH as causal for AMD used a family-based design for initial discovery, followed by confirmatory analysis using a larger collection of independent cases and controls. Family-based designs provide a valuable tool for association studies because they can condition on familial ethnic background, thus eliminating potential biases from confounding. They also can be used to check for non-faithful segregation of alleles, which reflects effects from copy number variation. Copy number variations have been increasingly associated with some human diseases such as autism (32). Haines et al. (5) also demonstrate the considerable gain in power that can be achieved by sampling through probands that have affected siblings, as was originally noted by Risch and Merikangas (33) and has subsequently been noted in additional studies (34). Although about twice as many independent case–control samples were analyzed as independent sib pairs, the significance level achieved by either group was comparable. In Figure 2, I present a depiction of the power to detect associations using either cases sampled regardless of family history or cases selected to have a family history. One can see a dramatic improvement in power when cases are selected on the basis of a positive family history because there is enrichment for genetic causes of disease.

Figure 2.

Power to detect a 4-fold increased relative risk for a dominant disease with 400 cases and 400 controls; 14% prevalence, 1% significance and 20% allele frequency of the causal allele. The X-axis indicates the SNP marker allele frequency, the Y-axis denotes D' between the marker and the causal allele and the Z-axis denotes power. The left panel indicates the power for cases sampled without reference to family history, whereas the right panel indicates power for cases sampled because they have an affected sibling.
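The parameters in the Figure 2 caption can be used for a rough back-of-the-envelope calculation of how strongly carriers of the causal allele are enriched among cases. The sketch below is not the power calculation behind the figure. It assumes Hardy-Weinberg proportions, direct genotyping of the causal allele and cases sampled without regard to family history, and the function name is hypothetical.

```python
def dominant_carrier_frequencies(prevalence, relative_risk, allele_freq):
    """Carrier frequencies in cases and controls under a dominant model.

    Returns (f0, f1, carriers_in_cases, carriers_in_controls), where f0 and f1
    are the penetrances of non-carriers and carriers of the causal allele.
    Assumes Hardy-Weinberg proportions in the general population.
    """
    q = 1.0 - allele_freq
    non_carrier = q * q
    carrier = 1.0 - non_carrier
    # prevalence = f0 * non_carrier + (relative_risk * f0) * carrier
    f0 = prevalence / (non_carrier + relative_risk * carrier)
    f1 = relative_risk * f0
    if f1 > 1.0:
        raise ValueError("parameters imply a penetrance above 1")
    in_cases = carrier * f1 / prevalence
    in_controls = carrier * (1.0 - f1) / (1.0 - prevalence)
    return f0, f1, in_cases, in_controls
```

With 14% prevalence, a 4-fold relative risk and a 20% causal allele frequency, roughly 69% of cases but only about 31% of controls carry the causal allele under these assumptions. Selecting cases with an affected sibling would be expected to widen this contrast further, which is the enrichment effect the figure depicts.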

CASE STUDY 2: THE IDENTIFICATION OF NOVEL GENETIC FACTORS FOR TYPE II DIABETES

Recently, several large genome-wide association studies have been presented, documenting the strength of these approaches to unravel the etiology of complex diseases. Scott et al. (35) provide a particularly well-conducted study that reflects advances in the design and conduct of GWAS. In this analysis of the Finland–United States Investigation of NIDDM Genetics (FUSION), a multistage design was implemented to improve the power to detect associations by using all of the available data while reducing cost. The initial stage of the GWAS used data from 1215 Finnish Type II diabetics (T2D) who had undergone extensive quantitative phenotyping studies and 1161 Finnish normal glucose tolerant controls. This stage used an Illumina HumanHap300 BeadChip. Alleles from an additional 1.7 million markers that are in LD with those that were directly genotyped were inferred from the tagging SNPs that were genotyped. Sophisticated programs to infer ungenotyped markers have been developed (27,36), and methods described by Scott et al. (35) are further developments. The inference of these markers permitted an integration of the data from this study with a parallel study being conducted by the Wellcome Trust Case Control Consortium (WTCCC) (37), which replicated its findings using data from the Diabetes Genetics Initiative (DGI) and which used the Affymetrix GeneChip Human Mapping 500K Array Set for its first stage. Zeggini et al. (37) genotyped a larger number of SNPs in a second phase, including 5367 SNPs with P-values < $10^{-2}$ to $10^{-5}$. A joint analysis was conducted by Scott et al. (35) using data from the FUSION Stage I and WTCCC studies to identify a panel of 82 markers to be further genotyped using an additional 1215 Finnish cases and 1258 controls. Analyses were conducted using data from both stages of analysis as well as from the analyses conducted by the WTCCC and DGI. Results from the FUSION-only stages significantly implicated only an SNP in TCF7L2, which had previously been identified. Results from the joint analysis of the WTCCC and DGI data significantly implicated FTO and CDKAL1, which are novel findings. However, joint analysis of all of the data showed significant associations with FTO, CDKAL1, HHEX, CDKN2B, IGF2BP2, SLC30A8, TCF7L2 and KCNJ11. For Zeggini et al. (37), the completion of a study of a larger set of markers in the confirmation phase appears to have provided an opportunity for more SNPs to reach significance and indicates that the limited set of markers studied by Scott et al. (35) in their second phase may have led to some significant associations being missed. The stochastic characteristic of SNP detection that requires being more inclusive for second-stage genotyping has been noted by other groups performing GWAS for complex diseases (38). Methods for the design of multistage studies have been described (39,40), and these usually require that a substantial percentage (1−10%) of markers are retained from the first to the second stage of analysis.

Factors that argue for a single stage of genotyping include much higher per-genotype costs in the second stage and the study of multiple traits; otherwise, a two-stage design is cost-efficient.
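The cost trade-off between one-stage and two-stage genotyping can be sketched with a simple calculation. Everything in this example is hypothetical: the function names, the per-genotype costs and the marker counts are placeholders for illustration, not figures from the studies discussed.

```python
def one_stage_cost(n_markers, n_subjects, cost_per_genotype):
    """Cost of genotyping every marker in every subject."""
    return n_markers * n_subjects * cost_per_genotype


def two_stage_cost(n_markers, n_stage1, n_stage2, frac_carried,
                   cost_stage1, cost_stage2):
    """Cost when only a fraction of markers is carried into stage 2.

    frac_carried is the proportion of stage-1 markers genotyped again in the
    stage-2 subjects; cost_stage2 is the (often higher) per-genotype cost of
    the custom stage-2 assay.
    """
    stage1 = n_markers * n_stage1 * cost_stage1
    stage2 = round(n_markers * frac_carried) * n_stage2 * cost_stage2
    return stage1 + stage2
```

With 1000 markers, 100 subjects per stage, 5% of markers carried forward and a stage-2 genotype costing ten times a stage-1 genotype, the two-stage design costs 150 000 units against 200 000 for a single stage. If the stage-2 cost rises to one hundred times the stage-1 cost, the two-stage total becomes 600 000 and the single-stage design is cheaper, which illustrates the first of the factors listed above.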

Results from these combined analyses show that no single SNP is associated with odds ratios exceeding 1.4 so that only by combining data from multiple studies could a reliable signal be obtained. An adverse collection of multiple SNPs can identify a subset of individuals who are at approximately 4-fold increased risk for type II diabetes, but this level of risk has not yet provided a meaningful risk prediction. However, the identification of novel genes involved in the pathophysiology of type II diabetes should provide new insights into the biology of this complex disease. The FUSION study has collected extensive quantitative information on risk factors relating to type II diabetes. As novel risk factors are identified, their impact upon these quantitative factors can be assessed to provide information about the physiological mechanisms.

CONCLUSION

Genome-wide association studies have become an extremely valuable tool for unraveling the etiology of complex diseases. In this review, I have characterized some of the features of published, successful GWAS. Factors that increase the likelihood of success include the study of diseases or more narrowly defined phenotypes that show a strong genetic component of risk, the study of cases with a family history of disease and the study of ethnically homogeneous populations. Studies in the USA and other potentially stratified populations can adjust for the stratification by using appropriate analytical tools. The actual associations that can be detected by GWAS of complex diseases are expected to be relatively weak and these are further attenuated by incomplete linkage disequilibrium between the markers that are queried and the causal factors. If there are recurrent mutations at the same locus which cause the same disease phenotype, then linkage disequilibrium can be greatly attenuated so that it becomes difficult to assure that all loci predisposing for a disease can be identified by an association study (41), and alternative procedures such as linkage analysis or joint linkage and association analysis (42) are useful for such situations. In order for GWAS to be successful, it is anticipated that large sample sizes will generally be required, typically comprising at least several thousand cases and controls for more complex diseases, such as type II diabetes and breast cancer, that show only moderately increased risks to siblings. Despite challenges in managing very large collections of samples and data, the promise of large-scale GWAS is being realized and will present new insights into the biology of complex diseases.

ACKNOWLEDGEMENTS

The author thanks Lynn Carrasco for assisting in the preparation of this manuscript. Partial support is provided by NIH grants ES09912, 2 PO1 CA34936 and N01 AR 8 2232.

Footnotes

Conflict of Interest statement. None declared.

REFERENCES

  • 1. Boulton SJ. Cellular functions of the BRCA tumour-suppressor proteins. Biochem. Soc. Trans. 2006;34:633–645. doi: 10.1042/BST0340633.
  • 2. Okada T, You L, Giancotti FG. Shedding light on Merlin's wizardry. Trends Cell Biol. 2007;17:222–229. doi: 10.1016/j.tcb.2007.03.006.
  • 3. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308:385–389. doi: 10.1126/science.1109557.
  • 4. Edwards AO, Ritter R, III, Abel KJ, Manning A, Panhuysen C, Farrer LA. Complement factor H polymorphism and age-related macular degeneration. Science. 2005;308:421–424. doi: 10.1126/science.1110189.
  • 5. Haines JL, Hauser MA, Schmidt S, Scott WK, Olson LM, Gallins P, Spencer KL, Kwan SY, Noureddine M, Gilbert JR, et al. Complement factor H variant increases the risk of age-related macular degeneration. Science. 2005;308:419–421. doi: 10.1126/science.1110359.
  • 6. Devlin B, Risch N. A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics. 1995;29:311–322. doi: 10.1006/geno.1995.9003.
  • 7. de Bakker PI, Yelensky R, Pe'er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and power in genetic association studies. Nat. Genet. 2005;37:1217–1223. doi: 10.1038/ng1669.
  • 8. Zeggini E, Rayner W, Morris AP, Hattersley AT, Walker M, Hitman GA, Deloukas P, Cardon LR, McCarthy MI. An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets. Nat. Genet. 2005;37:1320–1322. doi: 10.1038/ng1670.
  • 9. Klaver CC, Wolfs RC, Assink JJ, van Duijn CM, Hofman A, de Jong PT. Genetic risk of age-related maculopathy. Population-based familial aggregation study. Arch. Ophthalmol. 1998;116:1646–1651. doi: 10.1001/archopht.116.12.1646.
  • 10. AREDS Research Group. A randomized, placebo-controlled, clinical trial of high-dose supplementation with vitamins C and E, beta carotene, and zinc for age-related macular degeneration and vision loss: AREDS report no. 8. Arch. Ophthalmol. 2001;119:1417. doi: 10.1001/archopht.119.10.1417.
  • 11. Taylor JA, Xu ZL, Kaplan NL, Morris RW. How well do HapMap haplotypes identify common haplotypes of genes? A comparison of 334 genes resequenced in the environmental genome project. Cancer Epidemiol. Biomarkers Prev. 2006;15:133–137. doi: 10.1158/1055-9965.EPI-05-0641.
  • 12. Ferlay J, Autier P, Boniol M, Heanue M, Colombet M, Boyle P. Estimates of the cancer incidence and mortality in Europe in 2006. Ann. Oncol. 2007;18:581–592. doi: 10.1093/annonc/mdl498.
  • 13. Dube C, Rostom A, Sy R, Cranney A, Saloojee N, Garritty C, Sampson M, Zhang L, Yazdi F, Mamaladze V, et al. The prevalence of celiac disease in average-risk and at-risk Western European populations: a systematic review. Gastroenterology. 2005;128(4 Suppl 1):S57–S67. doi: 10.1053/j.gastro.2005.02.014.
  • 14. Campbell CD, Ogburn EL, Lunetta KL, Lyon HN, Freedman ML, Groop LC, Altshuler D, Ardlie KG, Hirschhorn JN. Demonstrating stratification in a European American population. Nat. Genet. 2005;37:868–872. doi: 10.1038/ng1607.
  • 15. Seldin MF, Shigeta R, Villoslada P, Selmi C, Tuomilehto J, Silva G, Belmont JW, Klareskog L, Gregersen PK. European population substructure: clustering of northern and southern populations. PLoS Genet. 2006;2:e143. doi: 10.1371/journal.pgen.0020143.
  • 16. Tian C, Hinds DA, Shigeta R, Kittles R, Ballinger DG, Seldin MF. A genomewide single-nucleotide-polymorphism panel with high ancestry information for African American admixture mapping. Am. J. Hum. Genet. 2006;79:640–649. doi: 10.1086/507954.
  • 17. Smith MW, Patterson N, Lautenberger JA, Truelove AL, McDonald GJ, Waliszewska A, Kessing BD, Malasky MJ, Scafe C, Le E, et al. A high-density admixture map for disease gene discovery in African-Americans. Am. J. Hum. Genet. 2004;74:1001–1013. doi: 10.1086/420856.
  • 18. Price AL, Patterson N, Yu F, Cox DR, Waliszewska A, McDonald GJ, Tandon A, Schirmer C, Neubauer J, Bedoya G, et al. A genomewide admixture map for Latino populations. Am. J. Hum. Genet. 2007;80:1024–1036. doi: 10.1086/518313.
  • 19. Montana G, Pritchard JK. Statistical tests for admixture mapping with case–control and cases-only data. Am. J. Hum. Genet. 2004;75:771–789. doi: 10.1086/425281.
  • 20. Redden DT, Divers J, Vaughan LK, Tiwari HK, Beasley TM, Fernandez JR, Kimberly RP, Feng R, Padilla MA, Liu N, Miller MB, Allison DB. Regional admixture mapping and structured association testing: conceptual unification and an extensible general linear model. PLoS Genet. 2006;2:e137. doi: 10.1371/journal.pgen.0020137. Epub July 18, 2006.
  • 21. Devlin B, Roeder K, Wasserman L. Genomic control for association studies: a semiparametric test to detect excess-haplotype sharing. Biostatistics. 2000;1:369–387. doi: 10.1093/biostatistics/1.4.369.
  • 22. Purcell S, Sham P. Properties of structured association approaches to detecting population stratification. Hum. Hered. 2004;58:93–107. doi: 10.1159/000083030.
  • 23. Pritchard JK, Rosenberg NA. Use of unlinked genetic markers to detect population stratification in association studies. Am. J. Hum. Genet. 1999;65:220–228. doi: 10.1086/302449.
  • 24. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am. J. Hum. Genet. 2000;67:170–181. doi: 10.1086/302959.
  • 25. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847.
  • 26. Epstein MP, Allen AS, Satten GA. A simple and improved correction for population stratification in case–control studies. Am. J. Hum. Genet. 2007;80:921–930. doi: 10.1086/516842.
  • 27. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a toolset for whole genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007. doi: 10.1086/519795. In press.
  • 28. Roeder K, Bacanu SA, Wasserman L, Devlin B. Using linkage genome scans to improve power of association in genome scans. Am. J. Hum. Genet. 2006;78:243–252. doi: 10.1086/500026.
  • 29. Jorgenson E, Witte JS. A gene-centric approach to genome-wide association studies. Nat. Rev. Genet. 2006;7:885–891. doi: 10.1038/nrg1962.
  • 30. Lin DY. An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics. 2005;21:781–787. doi: 10.1093/bioinformatics/bti053.
  • 31. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299. doi: 10.1038/nature04226.
  • 32. Autism Genome Project Consortium, Szatmari P, Paterson AD, Zwaigenbaum L, Roberts W, Brian J, Liu XQ, Vincent JB, Skaug JL, Thompson AP, et al. Mapping autism risk loci using genetic linkage and chromosomal rearrangements. Nat. Genet. 2007;39:319–328. doi: 10.1038/ng1985.
  • 33. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273:1516–1517. doi: 10.1126/science.273.5281.1516.
  • 34. Rudd MF, Webb EL, Matakidou A, Sellick GS, Williams RD, Bridle H, Eisen T, Houlston RS, GELCAPS Consortium. Variants in the GH-IGF axis confer susceptibility to lung cancer. Genome Res. 2006;16:693–701. doi: 10.1101/gr.5120106.
  • 35. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316:1341–1345. doi: 10.1126/science.1142382.
  • 36. Nicolae DL. Testing untyped alleles (TUNA)-applications to genome-wide association studies. Genet. Epidemiol. 2006;30:718–727. doi: 10.1002/gepi.20182.
  • 37. Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, et al. Replication of genome-wide association signals in U.K. samples reveals risk loci for type 2 diabetes. Science. 2007;316:1336–1341. doi: 10.1126/science.1142364.
  • 38. Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger D, Struewing JP, Morrison J, Field H, Luben R, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007. doi: 10.1038/nature05887.
  • 39. Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat. Genet. 2006;38:209–213. doi: 10.1038/ng1706.
  • 40. Wang H, Thomas DC, Pe'er I, Stram DO. Optimal two-stage genotyping designs for genome-wide association scans. Genet. Epidemiol. 2006;30:356–368. doi: 10.1002/gepi.20150.
  • 41. Slager SL, Huang J, Vieland VJ. Effect of allelic heterogeneity on the power of the transmission disequilibrium test. Genet. Epidemiol. 2000;18:143–156. doi: 10.1002/(SICI)1098-2272(200002)18:2<143::AID-GEPI4>3.0.CO;2-5.
  • 42. Li M, Boehnke M, Abecasis GR. Joint modeling of linkage and association: identifying SNPs responsible for a linkage signal. Am. J. Hum. Genet. 2005;76:934–949. doi: 10.1086/430277.
