Abstract
Genome-wide association (GWA) studies have been extremely successful in identifying novel loci contributing effects to a wide range of complex human traits. However, despite this success, the joint marginal effects of these loci account for only a small proportion of the heritability of these traits. Interactions between variants in different loci are not typically modelled in traditional GWA analysis, but may account for some of the missing heritability in humans, as they do in other model organisms. One of the key challenges in performing gene-gene interaction studies is the computational burden of the analysis. We propose a two-stage interaction analysis strategy to address this challenge in the context of both quantitative traits and dichotomous phenotypes. We have performed simulations to demonstrate only a negligible loss in power of this two-stage strategy, while minimizing the computational burden. Application of this interaction strategy to GWA studies of T2D and obesity highlights potential novel signals of association, which warrant follow-up in larger cohorts. Genet. Epidemiol. 2011.© 2011 Wiley Periodicals, Inc.35: 800-808, 2011
Keywords: genome-wide association study, gene-gene interaction, computational efficiency
INTRODUCTION
In the search for novel loci contributing effects to complex human traits, the success of genome-wide association (GWA) studies has been well published [McCarthy et al., 2008]. These studies have improved our understanding of the genetic architecture of complex traits, often implicating pathways that might previously have been overlooked as promising biological candidates. However, despite this success, much of the genetic component of most complex traits is still, as yet, unexplained. For example, despite meta-analysis of GWA studies of many thousands of individuals from closely related populations, not more than 10% of the familial aggregation of type 2 diabetes (T2D) can be attributed to more than 35 established disease loci [Dupuis et al., 2010; Voight et al., 2010].
GWA studies are typically analyzed using single-locus methods, testing for the association of the trait with each SNP, in turn, across the genome. This approach is well powered to detect association with common causal variants with moderate marginal effects on the trait [The Wellcome Trust Case Control Consortium, 2007]. However, there is increasing evidence from model organisms that quantitative traits may be influenced by complex interplay between genes [Flint and MacKay, 2009], with the consequence that the effect of genotypes at one locus is modified, or even masked, by genotypes at other loci [Cordell, 2009]. Within this paradigm, individual causal variants need not exhibit strong marginal effects, but together may contribute to the overall trait variance. If these gene-by-gene (G × G) interactions also exist in humans, they may thus account for some of the “missing heritability” of complex traits, but will not be easily identified through single-SNP analysis of GWA studies, irrespective of sample size.
To date, there have been few examples of significant evidence of G × G interaction in human GWA studies. Recently, “compelling” evidence of interaction between SNPs in HLA-C and ERAP1 has been demonstrated in a GWAS of psoriasis, although both these loci demonstrate strong marginal effects on the disease [The Genetic Analysis of Psoriasis Consortium and the Wellcome Trust Case Control Consortium 2, 2010]. The fact that very few interactions have been observed may not preclude their existence. Rather, this may simply reflect the reluctance of researchers to undertake GWA interaction studies, primarily because of the large number of tests that are required. For example, in a GWA study of 500,000 SNPs, a complete two-locus interaction scan of the genome necessitates 500,000C2 (500,000 “choose” 2) or ∼125 billion tests. As a result, a more stringent significance threshold is required for G × G interactions than for marginal effects in order to allow for the burden of additional multiple testing. Alternative strategies have been proposed to reduce the number of tests required by focusing, for example, only on interactions between variants with at least some marginal evidence of association [Evans et al., 2006]. However, over a wide range of models of G × G interactions, such two-stage strategies lack power compared to a full two-locus scan of the genome, despite the reduced penalty for multiple testing. With such stringent significance levels, much larger sample sizes will be required to detect interaction effects with the same contribution to the phenotypic variance than marginal effects identified through single-locus analysis. Nevertheless, meta-analysis of GWA studies across large-scale international consortia [Barrett et al., 2009; Dupuis et al., 2010; Voight et al., 2010; Lango Allen et al., 2010] may provide sufficient power to detect large G × G interaction effects, provided that an appropriate statistical testing framework is available.
An additional challenge arising from the number of tests in GWA G × G interaction studies is computational, particularly with the large sample sizes that will be required to detect these effects. We typically assess the statistical evidence for interaction effects by means of likelihood-ratio tests in a generalized linear modeling (GLM) framework, which can accommodate both quantitative and dichotomous traits. This framework is extremely flexible and can allow for covariates to adjust for potential environmental risk factors or population structure. However, maximum-likelihood solutions for these models generally require application of numerical optimization algorithms, which are computationally demanding, and thus infeasible for GWA studies using a single processor. To alleviate this problem, Purcell et al. [2007] have implemented a “fast-epistasis” procedure in PLINK to allow rapid-testing of G × G interactions for dichotomous traits. Their approach focuses on a comparison of odds ratios from a 2 × 2 contingency table of allele counts at a pair of SNPs between cases and controls (Table I). They demonstrate that P-values from the “fast-epistasis” procedure approximate those from a GLM, and thus can be used for screening of pairs of SNPs for detailed follow-up with more complex interaction modeling techniques, with minimal computation cost.
Table I.
SNP 2 | |||
---|---|---|---|
SNP 1 | MM | Mm | mm |
Cases | |||
MM | |||
Mm | |||
mm | |||
Controls | |||
MM | |||
Mm | |||
mm |
SNP 2 | |||
---|---|---|---|
SNP 1 | M | m | |
Cases | |||
M | |||
m | |||
Controls | |||
M | |||
m |
In this study, we propose a two-stage strategy for rapid testing of G × G interactions in GWA studies of quantitative traits. In the first stage, all pairs of SNPs are screened for evidence of interaction using a computationally efficient test akin to the “fast-epistasis” approach in PLINK. In the next stage, all those pairs of SNPs achieving a nominal significance threshold are carried forward for more detailed modeling in the GLM framework. We perform simulations to determine an appropriate significance threshold for screening interactions in the rapid-testing stage so as to minimize computation time without substantial loss of power compared to a complete two-locus scan of the genome in the GLM framework. Our testing strategy has been applied to GWA G × G interaction studies of T2D and obesity [The Wellcome Trust Case Control Consortium, 2007]. Our results highlight potential novel signals of association that would not have been identified through traditional single-locus analysis, but which warrant follow-up in larger cohorts.
MODEL AND METHODS
MODEL FORMULATION AND ANALYSIS FRAMEWORK
Consider a sample of N unrelated individuals genotyped for two SNPs. We denote the genotypes of the ith individual at these two SNPs by and , respectively. Genotypes are coded as 0 for the common homozygote, 1 for the heterozygote and 2 for the rare homozygote. Here, we consider binary or quantitative phenotypes, denoted yi for the ith individual. Then, under the assumption of an additive main effect of each SNP on the phenotype, β1 and β2, and an additive-additive interaction effect, β12, it follows that
(1) |
where g is the link function in a GLM framework. Within this framework, we can construct a likelihood ratio test of interaction between the two SNPs, Λ, by comparing the deviance of model (1) when β12 = 0, with that when β12 is unconstrained. The precise form of the test depends on the phenotype under investigation (Appendix), and has an approximate χ2 distribution with one degree of freedom, with a resulting P-value denoted PGLM.
The GLM framework is extremely flexible, and can be adapted to incorporate covariates to allow for nongenetic risk factors or population structure. However, obtaining model parameter estimates and testing for interactions between pairs of SNPs requires numerical algorithms that are too computationally demanding to be applied on a genome-wide scale.
RAPID G × G INTERACTION TESTING: BINARY PHENOTYPES
In the context of a binary phenotype, a computationally efficient approach for detecting G × G interactions is to compare SNP-SNP association between cases and controls. We begin by constructing contingency tables of alleles at the two SNPs in each phenotype group by collapsing the sample genotype data (Table I). A test of interaction between the SNPs is then given by comparing allelic odds ratios in the two groups, given by
and has an approximate χ2 distribution with one degree of freedom, with the resulting P-value denoted PFAST. In this expression,
and
This test has been implemented in PLINK [Purcell et al., 2007], and is invoked using the “fast-epistasis” command. The test statistic, X2, has a closed form, and thus can be applied to G × G interactions on a genome-wide scale. However, by collapsing genotype data to a contingency table of alleles, we are implicitly assuming Hardy-Weinberg equilibrium at each SNP, and may inflate type I error rates if violated. It is therefore recommended that X2 is used as a screening tool. All pairs of SNPs passing a pre-determined significance threshold, PFAST<αFAST, are subsequently tested for interaction in the GLM, which is robust to Hardy-Weinberg disequilibrium. The choice of significance threshold represents a trade off between power to detect interaction effects and computation time: the more stringent αFAST, the fewer pairs of SNPs are tested in the GLM, but consequently the more likely a true interaction effect will be overlooked.
RAPID G × G INTERACTION TESTING: QUANTITATIVE PHENOTYPES
One simple approach to construct a rapid test of interaction between pairs of SNPs in the context of a quantitative trait is to dichotomize the phenotype, and proceed as described above. Some study designs focus on the ascertainment of individuals from the extremes of the trait distribution, and thus naturally form a dichotomy. However, when individuals are selected entirely at random with respect to the quantitative trait, we can define a quasi-case-control phenotype, denoted y for the ith individual, and given by
where yMED is the median of the distribution.
INTERACTION TESTING BETWEEN IMPUTED VARIANTS
The methodology described above can be easily applied to imputed genotypes at variants not typed directly as part of the study, but which are present on high-density reference panels, such as those made available through the international HapMap project [The International HapMap Consortium, 2007]. Typically, imputation generates a distribution of possible genotype calls for the ith individual at the jth SNP, denoted , and , respectively, for the common homozygote, heterozygote and rare homozygote [Marchini et al., 2007; Howie et al., 2009]. In the GLM, we replace the observed genotypes, and , at directly typed SNPs, with their expectations under an allele-dose model, given by
In the rapid G × G interaction test, we replace the contingency table of observed genotype counts with expected counts from the imputation distribution, assuming no LD between the pair of SNPs.
ADJUSTMENT FOR COVARIATES
The GLM framework is extremely flexible and can be adapted to incorporate covariates to allow for non-genetic risk factors or population structure. However, it is not possible to adjust for these covariates directly in the rapid G × G interaction test of binary or quantitative phenotypes outlined above, without a loss in computational efficiency. Clearly, allowing for covariates only in the GLM in the second stage of analysis may lead to misleading results. We thus recommend the use of residuals after adjustment of the phenotype for covariates in both the initial rapid G × G interaction test and the subsequent detailed modeling analysis in the GLM framework.
SOFTWARE
The methodology described above has been implemented, for both quantitative and binary phenotypes, in the open-source IntRapid software, and is freely available for download from the website http://www.well.ox.ac.uk/INTRAPID. The software allows specification of the significance threshold, αFAST, for carrying forward pairs of SNPs from the rapid interaction screening phase, and can be applied to both directly genotyped and imputed variants.
SIMULATION STUDY
We have undertaken simulations to evaluate the power of the proposed rapid G × G interaction test in the context of a quantitative trait compared with that of the more computationally intensive GLM. We consider a range of models of two-locus interaction: additive-additive, pure additive, dominant-dominant and recessive-recessive. Each model is parameterized in terms of causal allele frequency at two SNPs, denoted q1 and q2, and an interaction component, ε. For any given model, this component can be used to obtain population mean trait values, γjk, for each two-locus genotype (Table II).
Table II.
SNP 2 | |||
---|---|---|---|
SNP 1 | MM | Mm | mm |
Additive-additive interaction | |||
MM | γ00 = ε–1 | γ01 = −0.5 | γ02 = −ε |
Mm | γ10 = −0.5 | γ11 = 0 | γ12 = 0.5 |
mm | γ20 = −ε | γ21 = 0.5 | γ22 = 1 + ε |
Pure additive interaction | |||
MM | γ00 = ε | γ01 = 0 | γ02 = −ε |
Mm | γ10 = 0 | γ11 = 0 | γ12 = 0 |
mm | γ20 = −ε | γ21 = 0 | γ22 = ε |
Dominant-dominant interaction | |||
MM | γ00 = 0 | γ01 = 0 | γ02 = 0 |
Mm | γ10 = 0 | γ11 = ε | γ12 = ε |
mm | γ20 = 0 | γ21 = ε | γ22 = ε |
Recessive-recessive interaction | |||
MM | γ00 = 0 | γ01 = 0 | γ02 = 0 |
Mm | γ10 = 0 | γ11 = 0 | γ12 = 0 |
mm | γ20 = 0 | γ21 = 0 | γ22 = ε |
For each model, we generate 1,000 replicates of a sample of 5,000 unrelated individuals. We generate the genotype of each individual at the two SNPs, denoted and , according to the causal allele frequencies, assuming HWE at both variants. The phenotype of the individual is then generated from a Gaussian distribution with mean and unit variance. For each replicate of phenotype-genotype data, we perform: (i) the rapid G × G interaction test to obtain PFAST; and (ii) the GLM interaction test to obtain PGLM. We then consider a range of significance thresholds, αFAST, for carrying forward pairs of SNPs from the rapid interaction screening phase. In particular, we consider αFAST = 1 as our benchmark, since this corresponds to no initial screening step (all pairs of SNPs will be carried forward for testing in the GLM). For each significance threshold, power is estimated by the proportion of replicates for which PFAST<αFAST and PGLM meets a nominal pair-wise genome-wide significance threshold of 10−10.
APPLICATION TO GWA STUDIES OF T2D AND OBESITY
The T2D component of the WTCCC [The Wellcome Trust Case Control Consortium, 2007] consists of 1,999 cases from the Diabetes UK Warren 2 repository, and 3,004 population controls from the 1958 British Birth Cohort (58C) and the UK National Blood Service (NBS). Samples were genotyped using the Affymetrix GeneChip 500K Mapping Array Set that incorporates 500,568 SNPs, genome-wide. The T2D cases were also measured for body mass index (BMI) to assess overall obesity, a phenotype that is typically adjusted for age and sex in downstream analyses.
We utilized the same quality control (QC) filters employed by the WTCCC to exclude samples and SNPs from the analysis, full details of which are presented in the description of the experiment [The Wellcome Trust Case Control Consortium, 2007]. In brief, samples were excluded on the basis of low call rate, outlying genome-wide heterozygosity, discrepancies in WTCCC and external identifying information, non-European ancestry, duplication and apparent relatedness. SNPs were excluded on the basis of low call rate, extreme deviation from HWE, differential allele or genotype frequencies between the two control cohorts, and manual visual inspection of genotype calls in cluster plots. To avoid any problems of sparsity in the two-locus genotype contingency tables (Table I), we restricted our analysis to SNPs with MAF of at least 5%.
For each pair of autosomal SNPs passing QC filters, we tested for: (i) interaction with T2D in WTCCC cases and controls using the rapid approach for binary traits; and (ii) interaction with log10BMI residuals after adjustment for age and sex in T2D cases using the rapid approach for quantitative traits. All pairs of SNPs achieving a screening significance threshold of PFAST<10−4 were then tested for interaction in the GLM.
RESULTS
SIMULATION STUDY
Figure 1 presents the power, at a nominal significance threshold of PGLM<10−10, of the proposed rapid interaction testing strategy for an additive-additive two-locus model of association, with causal allele frequency of 20% at both SNPs. Power is presented as a function of the additive-additive interaction component, ε (Table II), for a range of thresholds, αFAST, for carrying forward pairs of SNPs from the rapid testing stage. There is only a minimal loss in power for a threshold of αFAST = 10−4, compared to the benchmark of αFAST = 1 where all pairs of SNPs are tested for interaction in the computationally intensive GLM framework. However, for more stringent thresholds, the reduction in power is more noticeable.
Figure 2 presents the power, at a nominal significance threshold of PGLM<10−10, of the proposed rapid interaction testing strategy for two-locus models of association with causal allele frequency of 50% at both SNPs: (A) additive-additive; (B) pure additive; (C) dominant-dominant; and (D) recessive-recessive. Power is presented as a function of the additive-additive interaction component, ε (Table II), for a range of thresholds, αFAST, for carrying forward pairs of SNPs from the rapid testing stage. As before, there is only a minimal loss in power for a threshold of αFAST = 10−4, compared to the benchmark of αFAST = 1. The conclusions are consistent across the range of interaction models considered, despite the fact that our analyses assume additive effects only.
APPLICATION TO GWA STUDY OF T2D
A total of 4,862 individuals and 357,775 SNPs with MAF >5% passed QC filters. Using a rapid interaction testing threshold of αFAST = 10−4, a total of 6.17 million pairs of SNPs were carried forward for consideration in the more computationally intensive GLM framework. This analysis took 312 hr of computing time on a dedicated processor, compared to an expected 4,389 hr to test for interaction between all pairs of SNPs in the GLM framework.
No pairs of SNPs met a Bonferroni correction for multiple testing of all pairs of SNPs passing QC filters (P<7.8 × 10–13). Table III presents lead SNPs at each pair of loci demonstrating strong evidence for interaction (P<10−10) in the second-stage GLM, together with P-values from the first stage rapid test. Also presented are P-values for each SNP from a marginal single-locus test of association from a GLM incorporating only an additive effect. None of these SNPs show evidence of marginal association with T2D, and thus would not have been discovered through single-locus GWAS analysis under the assumption of an additive effect in the log-odds ratio.
Table III.
SNP 1 | SNP 2 | Interaction p-value | Single-locus p-value | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | Chromosome | Position | Locus | MAF | ID | Chromosome | Position | Locus | MAF | Rapid | GLM | SNP1 | SNP2 |
rs6421008 | 8 | 135581827 | ZFAT | 0.05 | rs7827545 | 8 | 135635749 | ZFAT | 0.32 | 1.8 × 10−14 | 1.6 × 10−11 | 5.1 × 10−1 | 1.8 × 10−3 |
rs10916293 | 1 | 224738665 | OBSCN | 0.33 | rs9314349 | 8 | 27530121 | CLU | 0.38 | 1.5 × 10−11 | 2.8 × 10−11 | 5.7 × 10−1 | 9.0 × 10−1 |
For each SNP, the nearest gene in the locus is indicated. Rapid and GLM interaction P-values are obtained from the two stages of the IntRapid analysis. Single-locus P-values are obtained from a GLM without interaction effects.
The strongest signal of interaction with T2D was detected between a pair of proximal SNPs in the ZFAT gene. The SNPs are only weakly correlated with each other (r2 = 0.062 and D′ = 1.000 in CEU 1000 Genomes Project pilot data), and this interaction could represent a haplotype effect where the risk alleles have a synergistic effect when in cis, for example. The ZFAT gene encodes a protein that likely functions as a transcriptional regulator involved in apoptosis, but has not been previously implicated in T2D or other metabolic traits. The second strongest signal of interaction with T2D was detected between a pair of SNPs in the OBSC gene and flanking the CLU gene, respectively. These genes have not been previously implicated in T2D. However, both genes are involved in carbohydrate and lipid metabolic process and in apoptosis, and thus might be reasonable candidates for interaction within related functional pathways.
APPLICATION TO GWA STUDY OF OBESITY
A total of 1,903 T2D cases and 375,159 SNPs with MAF>5% passed QC filters. We considered residuals of log10BMI, after adjustment for age and sex, as our phenotype. Using a rapid interaction testing threshold of αFAST = 10−4, a total of 7.14 million pairs of SNPs were carried forward for consideration in the more computationally intensive GLM framework. This analysis took 127 hr of computing time on a dedicated processor, compared to an expected 4,826 hr to test for the interaction between all pairs of SNPs in the GLM framework.
No pairs of SNPs met a Bonferroni correction for multiple testing of all pairs of SNPs passing QC filters (P<7.1 × 10−13). Table IV presents lead SNPs at each pair of loci demonstrating strong evidence for interaction (P<10−10) in the second-stage GLM, together with P-values from the first-stage rapid test. Also presented are P-values for each SNP from a marginal single-locus test of association from a GLM incorporating only an additive main effect. None of these SNPs show evidence of marginal association with BMI (adjusted for age and sex), and thus would not have been discovered through single-locus GWAS analysis under the assumption of an additive effect in the log-odds ratio. The loci implicated in the four strongest signals of pair-wise interaction have not been previously implicated in obesity or other metabolic traits.
Table IV.
SNP 1 | SNP 2 | Interaction P-value | Single-locus P-value | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ID | Chromosome | Position | Locus | MAF | ID | Chromosome | Position | Locus | MAF | Rapid | GLM | SNP1 | SNP2 |
rs9460779 | 6 | 22855116 | HDGFL1 | 0.13 | rs9928199 | 16 | 75649112 | MON1B | 0.09 | 1.2 × 10−5 | 2.5 × 10−11 | 9.9 × 10−1 | 1.5 × 10−2 |
rs12655480 | 5 | 178212877 | ZNF354B | 0.35 | rs2423646 | 20 | 12212116 | 0.39 | 2.3 × 10−8 | 2.5 × 10−11 | 6.9 × 10−1 | 2.9 × 10−1 | |
rs10067788 | 5 | 151313204 | GLRA1 | 0.07 | rs1023446 | 16 | 19321487 | TMC5 | 0.37 | 3.2 × 10−6 | 3.3 × 10−11 | 6.2 × 10−1 | 1.6 × 10−1 |
rs17684830 | 3 | 60910010 | FHIT | 0.17 | rs521446 | 9 | 133647868 | VAV2 | 0.47 | 1.0 × 10−6 | 5.5 × 10−11 | 7.4 × 10−1 | 4.7 × 10−1 |
For each SNP, the nearest gene in the locus is indicated. Rapid and GLM interaction P-values are obtained from the two stages of the IntRapid analysis. Single-locus P-values are obtained from a GLM without interaction effects.
DISCUSSION
Interactions between variants may contribute to the missing heritability of complex traits. However, such interactions have not typically been studied in GWA analyses because of a fear of a lack of power due to the large number of statistical tests, and the burden of computation. Emily et al. [2009] identified four potential interactions in GWA studies of Crohn's disease, bipolar disease, hypertension and rheumatoid arthritis [The Wellcome Trust Case Control Consortium, 2007], although these signals have yet to be followed-up in independent replication cohorts. More recently, investigation of interaction effects between pairs of SNPs within established loci for psoriasis identified the first G × G interactions with “compelling” evidence of association through GWA studies [The Genetic Analysis Of Psoriasis Consortium and The Wellcome Trust Case Control Consortium 2, 2010].
In this study, we have developed a novel GWA G × G interaction testing scheme, applicable to both binary phenotypes and quantitative traits. We employ a two-stage strategy, implemented in the IntRapid software. In the first stage, all pairs of SNPs are tested in a computationally efficient “rapid” interaction test. For a binary phenotype, our rapid interaction test is equivalent to the “fast-epistasis” approach utilized in PLINK for case-control association analysis [Purcell et al., 2007]. For a quantitative trait, we simply dichotomize the phenotype at the median, creating “pseudo-cases” and “pseudo-controls”, which can then be analyzed in the same way. In the second stage, all pairs of SNPs meeting a pre-determined significance threshold are carried forward for detailed analyses in a more computationally demanding GLM framework.
We have undertaken simulations to evaluate the power of the proposed two-stage rapid G × G interaction test in the context of a quantitative trait compared with that of the more computationally intensive GLM. Over a range of models of two-locus interaction, our results suggest that there is only a minimal loss in power compared to testing all pairs of SNPs in the GLM by carrying forward only those pairs meeting a significance threshold of PFAST<10−4 in the rapid testing stage. In this way, the number of regressions performed in the computationally intensive GLM framework is minimized, substantially reducing computation time, but at only minimal cost in terms of reduced power.
We have applied IntRapid to GWA studies of T2D and obesity [The Wellcome Trust Case Control Consortium, 2007] using a rapid-interaction testing significance threshold of PFAST<10−4. The rapid testing strategy reduced computation time by an order of magnitude compared with a full two-locus scan of the genome in a GLM framework. The rapid-interaction testing stage can also easily be parallelized, further reducing computation if a cluster of processors is available for analysis. Although our analysis did not reveal genome-wide significant evidence of interaction (at a Bonferroni corrected threshold), our results highlighted two pairs of loci with strong evidence of interaction (P<10−10) with T2D and four pairs of loci with strong evidence of interaction with BMI (adjusted for age and sex). Our T2D analysis highlighted an interaction between a pair of weakly correlated SNPs in the ZFAT gene, and an interaction between SNPs in genes involved in metabolic process and apoptosis. The biological relevance of the loci identified in the obesity interaction analysis is not so obvious. Further investigation of the potential interactions between pathways in which all these genes act is necessary. Although interesting, caution is advised to avoid over-interpretation of these results, particularly given the relatively small sample size of the GWA study for detecting interaction effects. Follow-up of these results in independent replication cohorts is now required to confirm their relevance to T2D and obesity.
The coming months promise an exciting period of research and development into methodology for the detection of G × G interactions and their application to GWA studies. With ever increasing sample sizes, made possible through meta-analysis of GWA studies across large-scale international consortia, we are perhaps now in a position, for the first time, to detect strong interaction effects for many complex traits, even with the enormous burden of multiple testing. Furthermore, with the availability of computationally efficient software, such as IntRapid, we expect that GWA G × G interaction studies will be a natural addition to traditional single-locus analysis, with the potential to discover many novel loci contributing effects to complex human traits.
Acknowledgments
This study makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the Wellcome Trust Case Control Consortium project was provided by the Wellcome Trust under award 076113.
A.P.M acknowledges financial support from the Wellcome Trust (grant number WT081682/Z/06/Z).
APPENDIX
Within the generalized linear modeling framework, we can construct a likelihood ratio test of interaction between the two SNPs, Λ, by comparing the deviance of model (1) when β12 = 0, with that when β12 is unconstrained. The precise form of the test depends on the phenotype under investigation, and has an approximate χ2 distribution with one degree of freedom.
BINARY PHENOTYPE
In the context of a binary phenotype, the deviance is given by
In this expression, the log-likelihood
is maximized over all unconstrained parameters, where
QUANTITATIVE PHENOTYPE
In the context of a quantitative phenotype, the deviance is given by
In this expression, the log-likelihood
is maximized over all unconstrained parameters, where
REFERENCES
- Barrett JC, Clayton DG, Concannon P, Akolkar B, Cooper JD, Erlich HA, Julier C, Morahan G, Nerup J, Nierras C, Plagnol V, Pociot F, Schuilenburg H, Smyth DJ, Stevens H, Todd JA, Walker NM, Rich SS Type 1 Diabetes Genetics Consortium. Genome-wide association study and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nature. 2009;41:703–707. doi: 10.1038/ng.381. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cordell H. Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet. 2009;10:392–404. doi: 10.1038/nrg2579. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, Jackson AU, Wheeler E, Glazer NL, Bouatia-Naji N, Gloyn AL, Lindgren CM, Mägi R, Morris AP, Randall J, Johnson T, Elliott P, Rybin D, Thorleifsson G, Steinthorsdottir V, Henneman P, Grallert H, Dehghan A, Hottenga JJ, Franklin CS, Navarro P, Song K, Goel A, Perry JR, Egan JM, Lajunen T, Grarup N, Sparsø T, Doney A, Voight BF, Stringham HM, Li M, Kanoni S, Shrader P, Cavalcanti-Proença C, Kumari M, Qi L, Timpson NJ, Gieger C, Zabena C, Rocheleau G, Ingelsson E, An P, O'Connell J, Luan J, Elliott A, McCarroll SA, Payne F, Roccasecca RM, Pattou F, Sethupathy P, Ardlie K, Ariyurek Y, Balkau B, Barter P, Beilby JP, Ben-Shlomo Y, Benediktsson R, Bennett AJ, Bergmann S, Bochud M, Boerwinkle E, Bonnefond A, Bonnycastle LL, Borch-Johnsen K, Böttcher Y, Brunner E, Bumpstead SJ, Charpentier G, Chen YD, Chines P, Clarke R, Coin LJ, Cooper MN, Cornelis M, Crawford G, Crisponi L, Day IN, de Geus EJ, Delplanque J, Dina C, Erdos MR, Fedson AC, Fischer-Rosinsky A, Forouhi NG, Fox CS, Frants R, Franzosi MG, Galan P, Goodarzi MO, Graessler J, Groves CJ, Grundy S, Gwilliam R, Gyllensten U, Hadjadj S, Hallmans G, Hammond N, Han X, Hartikainen AL, Hassanali N, Hayward C, Heath SC, Hercberg S, Herder C, Hicks AA, Hillman DR, Hingorani AD, Hofman A, Hui J, Hung J, Isomaa B, Johnson PR, Jørgensen T, Jula A, Kaakinen M, Kaprio J, Kesaniemi YA, Kivimaki M, Knight B, Koskinen S, Kovacs P, Kyvik KO, Lathrop GM, Lawlor DA, Le Bacquer O, Lecoeur C, Li Y, Lyssenko V, Mahley R, Mangino M, Manning AK, Martínez-Larrad MT, McAteer JB, McCulloch LJ, McPherson R, Meisinger C, Melzer D, Meyre D, Mitchell BD, Morken MA, Mukherjee S, Naitza S, Narisu N, Neville MJ, Oostra BA, Orrö M, Pakyz R, Palmer CN, Paolisso G, Pattaro C, Pearson D, Peden JF, Pedersen NL, Perola M, Pfeiffer AF, Pichler I, Polasek O, Posthuma D, Potter SC, Pouta A, Province MA, Psaty BM, Rathmann W, Rayner NW, Rice K, Ripatti S, Rivadeneira F, Roden M, Rolandsson O, Sandbaek A, Sandhu M, Sanna S, Sayer AA, Scheet P, Scott LJ, Seedorf U, Sharp SJ, Shields B, Sigurethsson G, Sijbrands EJ, Silveira A, Simpson L, Singleton A, Smith NL, Sovio U, Swift A, Syddall H, Syvänen AC, Tanaka T, Thorand B, Tichet J, Tönjes A, Tuomi T, Uitterlinden AG, van Dijk KW, van Hoek M, Varma D, Visvikis-Siest S, Vitart V, Vogelzangs N, Waeber G, Wagner PJ, Walley A, Walters GB, Ward KL, Watkins H, Weedon MN, Wild SH, Willemsen G, Witteman JC, Yarnell JW, Zeggini E, Zelenika D, Zethelius B, Zhai G, Zhao JH, Zillikens MC, DIAGRAM Consortium; GIANT Consortium; Global BPgen Consortium. Borecki IB, Loos RJ, Meneton P, Magnusson PK, Nathan DM, Williams GH, Hattersley AT, Silander K, Salomaa V, Smith GD, Bornstein SR, Schwarz P, Spranger J, Karpe F, Shuldiner AR, Cooper C, Dedoussis GV, Serrano-Ríos M, Morris AD, Lind L, Palmer LJ, Hu FB, Franks PW, Ebrahim S, Marmot M, Kao WH, Pankow JS, Sampson MJ, Kuusisto J, Laakso M, Hansen T, Pedersen O, Pramstaller PP, Wichmann HE, Illig T, Rudan I, Wright AF, Stumvoll M, Campbell H, Wilson JF, Anders Hamsten on behalf of Procardis Consortium; MAGIC investigators. Bergman RN, Buchanan TA, Collins FS, Mohlke KL, Tuomilehto J, Valle TT, Altshuler D, Rotter JI, Siscovick DS, Penninx BW, Boomsma DI, Deloukas P, Spector TD, Frayling TM, Ferrucci L, Kong A, Thorsteinsdottir U, Stefansson K, van Duijn CM, Aulchenko YS, Cao A, Scuteri A, Schlessinger D, Uda M, Ruokonen A, Jarvelin MR, Waterworth DM, Vollenweider P, Peltonen L, Mooser V, Abecasis GR, Wareham NJ, Sladek R, Froguel P, Watanabe RM, Meigs JB, Groop L, Boehnke M, McCarthy MI, Florez JC, Barroso I. New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet. 2010;42:105–116. doi: 10.1038/ng.520. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Emily M, Mailund T, Hein J, Schauser L, Schierup MH. Using biological networks to search for interacting loci in genome-wide association studies. Eur J Hum Genet. 2009;17:1231–1240. doi: 10.1038/ejhg.2009.15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Evans DM, Marchini J, Morris AP, Cardon LR. Two-stage two-locus models in genome-wide association. PLoS Genet. 2006;2:e157. doi: 10.1371/journal.pgen.0020157. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Flint J, MacKay TF. Genetic architecture of quantitative traits in mice, flies and humans. Genome Res. 2009;19:723–733. doi: 10.1101/gr.086660.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009;5:e1000529. doi: 10.1371/journal.pgen.1000529. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, Rivadeneira F, Willer CJ, Jackson AU, Vedantam S, Raychaudhuri S, Ferreira T, Wood AR, Weyant RJ, Segrè AV, Speliotes EK, Wheeler E, Soranzo N, Park JH, Yang J, Gudbjartsson D, Heard-Costa NL, Randall JC, Qi L, Vernon Smith A, Mägi R, Pastinen T, Liang L, Heid IM, Luan J, Thorleifsson G, Winkler TW, Goddard ME, Sin Lo K, Palmer C, Workalemahu T, Aulchenko YS, Johansson A, Carola Zillikens M, Feitosa MF, Esko T, Johnson T, Ketkar S, Kraft P, Mangino M, Prokopenko I, Absher D, Albrecht E, Ernst F, Glazer NL, Hayward C, Hottenga JJ, Jacobs KB, Knowles JW, Kutalik Z, Monda KL, Polasek O, Preuss M, Rayner NW, Robertson NR, Steinthorsdottir V, Tyrer JP, Voight BF, Wiklund F, Xu J, Hua Zhao J, Nyholt DR, Pellikka N, Perola M, Perry JR, Surakka I, Tammesoo ML, Altmaier EL, Amin N, Aspelund T, Bhangale T, Boucher G, Chasman DI, Chen C, Coin L, Cooper MN, Dixon AL, Gibson Q, Grundberg E, Hao K, Juhani Junttila M, Kaplan LM, Kettunen J, König IR, Kwan T, Lawrence RW, Levinson DF, Lorentzon M, McKnight B, Morris AP, Müller M, Suh Ngwa J, Purcell S, Rafelt S, Salem RM, Salvi E, Sanna S, Shi J, Sovio U, Thompson JR, Turchin MC, Vandenput L, Verlaan DJ, Vitart V, White CC, Ziegler A, Almgren P, Balmforth AJ, Campbell H, Citterio L, De Grandi A, Dominiczak A, Duan J, Elliott P, Elosua R, Eriksson JG, Freimer NB, Geus EJ, Glorioso N, Haiqing S, Hartikainen AL, Havulinna AS, Hicks AA, Hui J, Igl W, Illig T, Jula A, Kajantie E, Kilpeläinen TO, Koiranen M, Kolcic I, Koskinen S, Kovacs P, Laitinen J, Liu J, Lokki ML, Marusic A, Maschio A, Meitinger T, Mulas A, Paré G, Parker AN, Peden JF, Petersmann A, Pichler I, Pietiläinen KH, Pouta A, Ridderstråle M, Rotter JI, Sambrook JG, Sanders AR, Oliver Schmidt C, Sinisalo J, Smit JH, Stringham HM, Bragi Walters G, Widen E, Wild SH, Willemsen G, Zagato L, Zgaga L, Zitting P, Alavere H, Farrall M, McArdle WL, Nelis M, Peters MJ, Ripatti S, van Meurs JB, Aben KK, Ardlie KG, Beckmann JS, Beilby JP, Bergman RN, Bergmann S, Collins FS, Cusi D, den Heijer M, Eiriksdottir G, Gejman PV, Hall AS, Hamsten A, Huikuri HV, Iribarren C, Kähönen M, Kaprio J, Kathiresan S, Kiemeney L, Kocher T, Launer LJ, Lehtimäki T, Melander O, Mosley TH, Jr, Musk AW, Nieminen MS, O'Donnell CJ, Ohlsson C, Oostra B, Palmer LJ, Raitakari O, Ridker PM, Rioux JD, Rissanen A, Rivolta C, Schunkert H, Shuldiner AR, Siscovick DS, Stumvoll M, Tönjes A, Tuomilehto J, van Ommen GJ, Viikari J, Heath AC, Martin NG, Montgomery GW, Province MA, Kayser M, Arnold AM, Atwood LD, Boerwinkle E, Chanock SJ, Deloukas P, Gieger C, Grönberg H, Hall P, Hattersley AT, Hengstenberg C, Hoffman W, Mark Lathrop G, Salomaa V, Schreiber S, Uda M, Waterworth D, Wright AF, Assimes TL, Barroso I, Hofman A, Mohlke KL, Boomsma DI, Caulfield MJ, Adrienne Cupples L, Erdmann J, Fox CS, Gudnason V, Gyllensten U, Harris TB, Hayes RB, Jarvelin MR, Mooser V, Munroe PB, Ouwehand WH, Penninx BW, Pramstaller PP, Quertermous T, Rudan I, Samani NJ, Spector TD, Völzke H, Watkins H, Wilson JF, Groop LC, Haritunians T, Hu FB, Kaplan RC, Metspalu A, North KE, Schlessinger D, Wareham NJ, Hunter DJ, O'Connell JR, Strachan DP, Wichmann HE, Borecki IB, van Duijn CM, Schadt EE, Thorsteinsdottir U, Peltonen L, Uitterlinden AG, Visscher PM, Chatterjee N, Loos RJ, Boehnke M, McCarthy MI, Ingelsson E, Lindgren CM, Abecasis GR, Stefansson K, Frayling TM, Hirschhorn JN. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467:832–838. doi: 10.1038/nature09410. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JPA, Hirschhorn JN. Genome-wide association studies for complex traits: concensus, uncertainty and challenges. Nat Rev Genet. 2008;9:356–369. doi: 10.1038/nrg2344. [DOI] [PubMed] [Google Scholar]
- Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies via imputation of genotypes. Nat Genet. 2007;39:906–913. doi: 10.1038/ng2088. [DOI] [PubMed] [Google Scholar]
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 2007;81:559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
- The Genetic Analysis of Psoriasis Consortium, The Wellcome Trust Case Control Consortium 2. A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1. Nat Genet. 2010;42:985–990. doi: 10.1038/ng.694. [DOI] [PMC free article] [PubMed] [Google Scholar]
- The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861. doi: 10.1038/nature06258. [DOI] [PMC free article] [PubMed] [Google Scholar]
- The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Voight BF, Scott LJ, Steinthorsdottir V, Morris AP, Dina C, Welch RP, Zeggini E, Huth C, Aulchenko YS, Thorleifsson G, McCulloch LJ, Ferreira T, Grallert H, Amin N, Wu G, Willer CJ, Raychaudhuri S, McCarroll SA, Langenberg C, Hofmann OM, Dupuis J, Qi L, Segrè AV, van Hoek M, Navarro P, Ardlie K, Balkau B, Benediktsson R, Bennett AJ, Blagieva R, Boerwinkle E, Bonnycastle LL, Bengtsson Boström K, Bravenboer B, Bumpstead S, Burtt NP, Charpentier G, Chines PS, Cornelis M, Couper DJ, Crawford G, Doney AS, Elliott KS, Elliott AL, Erdos MR, Fox CS, Franklin CS, Ganser M, Gieger C, Grarup N, Green T, Griffin S, Groves CJ, Guiducci C, Hadjadj S, Hassanali N, Herder C, Isomaa B, Jackson AU, Johnson PR, Jørgensen T, Kao WH, Klopp N, Kong A, Kraft P, Kuusisto J, Lauritzen T, Li M, Lieverse A, Lindgren CM, Lyssenko V, Marre M, Meitinger T, Midthjell K, Morken MA, Narisu N, Nilsson P, Owen KR, Payne F, Perry JR, Petersen AK, Platou C, Proença C, Prokopenko I, Rathmann W, Rayner NW, Robertson NR, Rocheleau G, Roden M, Sampson MJ, Saxena R, Shields BM, Shrader P, Sigurdsson G, Sparsø T, Strassburger K, Stringham HM, Sun Q, Swift AJ, Thorand B, Tichet J, Tuomi T, van Dam RM, van Haeften TW, van Herpt T, van Vliet-Ostaptchouk JV, Bragi Walters G, Weedon MN, Wijmenga C, Witteman J, Bergman RN, Cauchi S, Collins FS, Gloyn AL, Gyllensten U, Hansen T, Hide WA, Hitman GA, Hofman A, Hunter DJ, Hveem K, Laakso M, Mohlke KL, Morris AD, Palmer CN, Pramstaller PP, Rudan I, Sijbrands E, Stein LD, Tuomilehto J, Uitterlinden A, Walker M, Wareham NJ, Watanabe RM, Abecasis GR, Boehm BO, Campbell H, Daly MJ, Hattersley AT, Hu FB, Meigs JB, Pankow JS, Pedersen O, Wichmann HE, Barroso I, Florez JC, Frayling TM, Groop L, Sladek R, Thorsteinsdottir U, Wilson JF, Illig T, Froguel P, van Duijn CM, Stefansson K, Altshuler D, Boehnke M, McCarthy MI MAGIC investigators; GIANT Consortium. Twelve type 2 diabetes susceptibility loci identified through large-scale association analysis. Nat Genet. 2010;42:579–589. doi: 10.1038/ng.609. [DOI] [PMC free article] [PubMed] [Google Scholar]