Genet Epidemiol. Author manuscript; available in PMC: 2014 Nov 1.
Published in final edited form as: Genet Epidemiol. 2013 Oct 5;37(7):643–657. doi: 10.1002/gepi.21756

Gene-Environment Interactions in Cancer Epidemiology: A National Cancer Institute Think Tank Report

Carolyn M Hutter 1, Leah E Mechanic 2, Nilanjan Chatterjee 3, Peter Kraft 4, Elizabeth M Gillander, on behalf of the NCI Gene-Environment Think Tank 5,*
PMCID: PMC4143122  NIHMSID: NIHMS611711  PMID: 24123198

Abstract

Cancer risk is determined by a complex interplay of genetic and environmental factors. Genome-wide association studies (GWAS) have identified hundreds of common (minor allele frequency [MAF]>0.05) and less common (0.01<MAF<0.05) genetic variants associated with cancer. The marginal effects of most of these variants have been small (odds ratios: 1.1–1.4). Unanswered questions remain about how best to incorporate the joint effects of genes and environment, including gene-environment interactions, into epidemiologic studies of cancer. To help address these questions, and to better inform research priorities and allocation of resources, the National Cancer Institute sponsored a “Gene-Environment Think Tank” on January 10–11, 2012. The objective of the Think Tank was to facilitate discussions on: 1) the state of the science; 2) the goals of gene-environment interaction studies in cancer epidemiology; and 3) opportunities for developing novel study designs and analysis tools. This report summarizes the Think Tank discussion, with a focus on contemporary approaches to the analysis of gene-environment interactions. Selecting the appropriate methods requires first identifying the relevant scientific question and rationale, with an important distinction between analyses that aim to characterize the joint effects of putative or established genetic and environmental factors and analyses that aim to discover novel risk factors or novel interaction effects. Other discussion items included measurement error, statistical power, significance and replication. Additional designs, exposure assessments, and analytical approaches need to be considered as we move from the current small number of success stories to a fuller understanding of the interplay of genetic and environmental factors.

Keywords: Gene-environment interactions, complex phenotypes, genetic epidemiology

Introduction

The study of gene-environment (GxE) interactions in complex diseases has a long history [Haldane 1938; Khoury, et al. 1988; Thomas 2000]. In contrast to simple Mendelian disorders, susceptibility to common complex traits, including cancer, is multi-factorial, involving multiple genetic and environmental risk factors. Over the past decade, the field has progressed from candidate gene and candidate gene-gene (GxG) and GxE interaction studies to genome-wide association studies (GWAS) and gene-environment-wide interaction studies (GEWIS [Khoury and Wacholder 2009] or “GE-Whiz” [Thomas, et al. 2012]). Using the Human Genome Epidemiology (HuGE) Navigator tool [Yu, et al. 2008] to track publications, Dr. Khoury and colleagues identified exponential increases in the published genetic epidemiology literature from 2001 to 2010, including GWAS, substantive epidemiologic studies, methodological studies, meta-analyses, and reviews [Khoury, et al. 2011]. They noted challenges in developing and applying appropriate methods for the analysis and synthesis of GxE interactions. These challenges stem from the complex, evolving, and expanding nature of the genetic and environmental data being collected. The field continues to face new challenges as we move into the “Post-GWAS” era [Aschard, et al. 2012; Dempfle, et al. 2008; Khoury, et al. 2011; Liu, et al. 2012; Thomas 2010].

To address these challenges, the National Cancer Institute (NCI) sponsored a Gene-Environment Think Tank, held on January 10th–11th, 2012. The goal of the meeting was to facilitate discussion of GxE interaction studies in cancer epidemiology, focusing on current progress and recommendations for future research [http://epi.grants.cancer.gov/workshops/thinktank/]. Presentations covered a broad spectrum of topics, including: rationale for GxE studies; state of the science; optimal study designs; emerging approaches for analytic methods; challenges and opportunities in measurement of the environment; and clinical and public health implications.

A key theme that emerged at the Think Tank was that, as with any scientific endeavor, the analytical challenges of GxE studies can only be met by first elucidating the underlying scientific question and rationale. Broadly, examples of scientific rationale for GxE interaction studies in epidemiology include: discovering novel genetic or environmental risk factors; providing etiologic insight; and providing guidance on public health and clinical strategies for cancer prevention, intervention and treatment [Hunter 2005; Thomas 2010]. Throughout the Think Tank discussion a distinction was drawn between the goal of characterizing joint effects of known or putative genetic and environmental risk factors, and the goal of discovering novel genetic loci by leveraging GxE interactions. In a translational epidemiology framework, where the translational pathway is defined on a five-point scale from T0 (scientific discovery research) to T4 (translational research from practice to population health impact) [Khoury, et al. 2010], discovery can be framed within the T0 (scientific discovery research) phase, and characterization within the T1 (translational research from discovery to candidate application) phase.

Despite many years of candidate gene studies testing for GxE in cancer, there are only a few notable, replicated, and widely agreed-upon examples of success (e.g., NAT2, smoking and bladder cancer; ALDH2, alcohol and esophageal cancer) [Brooks, et al. 2009; Garcia-Closas, et al. 2013; Wu, et al. 2012]. Hundreds of studies reporting analyses of GxE interaction in cancer were published before the advent of GWAS, but most suffered from the problems that plagued candidate gene studies of marginal association, including small sample sizes, insufficiently stringent thresholds for statistical significance (stringent thresholds are needed to account for multiple testing and low prior probabilities), incomplete genetic coverage, and publication bias [Hirschhorn and Altshuler 2002; Ioannidis 2005; Wacholder, et al. 2004]. For example, of 407 studies examining GxE interactions in breast cancer published before May 2011, 307 (75%) reported a statistically significant GxE interaction. This is a strikingly high proportion, suggesting that most are false positives.

The Think Tank participants discussed several large studies that have tested for GxE interactions with GWAS-identified loci. Most of these studies have not observed statistically significant interactions [Campa, et al. 2011; Hutter, et al. 2010; Milne, et al. 2010; Nickels, et al. 2013; Travis, et al. 2010], although one study found evidence for a statistical interaction between genetic variants at 8q22.3 and vegetable consumption for risk of colorectal cancer [Hutter, et al. 2012], and another showed evidence for statistical interactions between LSP1 and parity, and between CASP8 and alcohol consumption, for risk of breast cancer [Nickels, et al. 2013]. Generally these studies have focused on the statistical significance of GxE interaction terms rather than on full characterization of joint effects. Concern was raised as to whether these studies adequately model the genetic and environmental factors [Prentice 2011]. Participants also discussed several initial GEWIS of cancer phenotypes with null findings that have yet to be published. While there have been a small number of initial success stories in which consideration of environmental factors or GxE interactions contributed to the discovery of novel genetic loci for cancer and other complex diseases [Cornelis, et al. 2012; Hamza, et al. 2011; Hancock, et al. 2012; Manning, et al. 2012; Wu, et al. 2012], publication bias is of substantive concern. The coming years may be more successful, as increasingly large studies with genome-wide data on rare and common variants incorporate existing environmental data, improved measures of environmental factors, and novel statistical methods.

This report aims to summarize the Think Tank discussions, focusing on contemporary analysis of GxE interactions for cancer and other complex diseases. Specifically, we provide an overview of motivation for performing GxE analysis, present methods that can be applied to existing genetic and exposure data within observational studies to characterize and discover GxE interactions, discuss key considerations for analysis in case-control or nested case-control studies, and comment on interpretation of GxE interactions. We highlight some key unanswered questions (Box 1).

Box 1. Some Considerations and Questions for GxE Interaction Studies.

Considerations for Characterization of GxE

  • What do we mean by GxE in a characterization setting?

  • When is it appropriate to select a SNP or environmental factor for characterization?

  • What are the methods for testing pure interactions?

  • What are the optimal methods for evaluating risk models?

  • How do we interpret an interaction?

Considerations for Discovery of GxE

  • What do we mean by GxE in a discovery setting?

  • What is the optimal method for discovery of GxE in GEWIS studies?

  • How prevalent is GxE correlation in real data sets?

  • How do we interpret an interaction?

Measurement Error

  • What methods should we use to account for misclassification and measurement error in GxE studies?

  • What are the best methods for improving environmental exposure measurement?

  • What methods or designs are most appropriate for time-varying exposures and time-varying interactions?

Significance Testing

  • Is 10⁻¹⁰, or some other p-value threshold, appropriate for GEWIS?

  • How do we best incorporate outside information (e.g., biological information), together with statistical data, to establish “credible” or “real” interactions?

Sample Size and Power

  • How do we address small cell sizes in finite samples? Can we find appropriate alternative tests that do not rely on asymptotic assumptions?

  • What are the best methods for meta-analysis of GxE interactions?

Replication

  • What should be the criteria for selecting GxE for follow up studies?

  • What should be the criteria to define sufficient replication?

  • How should we handle replication with rare exposures or unique populations?

  • How can we best use GxE information to pick SNPs for replication in GEWIS settings?

Other

  • Given that many initial attempts at GxE in characterization and discovery (GEWIS) have had null findings, how do we prioritize publication? How do we ensure dissemination of information? And how do we best capitalize on the use of this information?

Motivation for Assessing GxE Interaction

The analysis of GxE is motivated by interest in either “characterization” of the joint effects of genetic and environmental risk factors or “discovery” of novel risk factors or interaction effects. In either context, it is important to distinguish several categories of interaction: qualitative interaction, where the effect of one exposure is reversed in direction by the other; pure interaction, where the effect of one exposure is present only in the presence of the other; and quantitative interaction, where the effect of one exposure, on some specified scale, differs in magnitude in the presence of the other [De Gonzalez and Cox 2007]. Whereas qualitative interactions cannot be removed by any transformation [Satagopan and Elston 2013], the presence of quantitative interaction depends on the scale on which effects are measured: an additive scale (for example, risk differences), a multiplicative scale (for example, risk ratios), or some other scale [Walter and Holford 1978].
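The scale dependence of quantitative interaction can be seen with a small worked example. The sketch below uses hypothetical risks (not data from any study) for the four combinations of a binary genotype G and a binary exposure E, and computes both the multiplicative interaction ratio RR11/(RR10·RR01) and the relative excess risk due to interaction (RERI = RR11 − RR10 − RR01 + 1), a common additive-scale measure. The function name is illustrative only.

```python
def interaction_on_two_scales(r00, r10, r01, r11):
    """Compare interaction on the multiplicative and additive scales.

    r_ge is the (hypothetical) absolute risk for genotype g and exposure e,
    with r00 (non-carrier, unexposed) as the reference group.
    """
    rr10, rr01, rr11 = r10 / r00, r01 / r00, r11 / r00
    # Ratio of relative risks: 1.0 means no multiplicative interaction.
    multiplicative = rr11 / (rr10 * rr01)
    # Relative excess risk due to interaction: 0.0 means no additive interaction.
    reri = rr11 - rr10 - rr01 + 1.0
    return multiplicative, reri


# Hypothetical risks: G doubles risk, E triples it, and jointly they multiply.
mult, reri = interaction_on_two_scales(r00=0.01, r10=0.02, r01=0.03, r11=0.06)
print(f"multiplicative ratio = {mult:.2f}, RERI = {reri:.2f}")
```

With these numbers the multiplicative ratio is about 1 (no multiplicative interaction) while the RERI is about 2 (positive additive interaction): the same joint risks show interaction on one scale but not on the other.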

Characterization

One goal of GxE studies is to characterize the risks associated with the joint effects of putative or known genetic and environmental factors. In this setting the goal is often estimation rather than statistical testing. In many GxE interaction studies the common practice is simply to model interaction terms, scan p-values, and report “significant” interaction terms, often without considering the direction and interpretation of the full joint effects. Such practice is not ideal from a biological or public health point of view [Knol, et al. 2009; Knol and VanderWeele 2012]. Understanding joint risks may instead be important both for obtaining etiologic insights and for translation to public health applications such as risk-based screening and intervention. In these studies, the joint effects would ideally be estimated empirically using the data within each GxE stratum. Because obtaining adequate sample sizes for each unique combination of risk factors is often not feasible, we often rely on models for a parsimonious description of joint effects.
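A minimal sketch of the empirical, model-free description of joint effects mentioned above: with case and control counts cross-classified by a binary genotype and a binary exposure, each stratum's odds ratio is computed against a single common reference stratum (here G=0, E=0), so the full joint pattern is reported rather than one interaction coefficient. The counts are hypothetical.

```python
def joint_odds_ratios(cases, controls, ref=(0, 0)):
    """Odds ratio for each (g, e) stratum relative to a common reference stratum.

    cases, controls: dicts mapping (g, e) -> number of subjects in that stratum.
    """
    ref_odds = cases[ref] / controls[ref]
    return {
        stratum: (cases[stratum] / controls[stratum]) / ref_odds
        for stratum in cases
    }


# Hypothetical counts for a binary genotype G and a binary exposure E.
cases = {(0, 0): 50, (1, 0): 50, (0, 1): 75, (1, 1): 100}
controls = {(0, 0): 200, (1, 0): 100, (0, 1): 100, (1, 1): 50}
ors = joint_odds_ratios(cases, controls)
for (g, e), value in sorted(ors.items()):
    print(f"G={g}, E={e}: OR = {value:.2f}")
```

These counts give odds ratios of 2.0 for G alone, 3.0 for E alone and 8.0 for joint exposure. With sparse strata these stratum-specific estimates become unstable, which is why parsimonious models are usually needed.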

A primary challenge in characterizing joint effects and GxE to provide biological insights into mechanisms is the lack of explicit links between statistical and biological interactions. This issue has been vigorously debated in the epidemiologic literature for decades [Cordell 2002; Siemiatycki and Thomas 1981; Thompson 1991; VanderWeele 2011; Weinberg 2012b]. While it is generally recognized that the mere existence of a multiplicative interaction between two risk factors does not identify a unique model for the biologic mechanism of interaction, there is considerable debate about whether tests for interaction on alternative scales could be more insightful for this purpose. We present some of these alternatives in the statistical methods section of this paper.

In addition, certain categories of joint effects might more readily provide mechanistic interpretations. The presence of qualitative or pure interaction can highlight components of a complex exposure that might be active in specific conditions. This is illustrated by polymorphisms in genes that metabolize the exposure agent, such as the NAT2 gene, which determines acetylation activity, together with different kinds of aromatic amines, in the etiology of bladder cancer. Studies in the general population consistently show that slow acetylation increases risk of bladder cancer among smokers but has no effect among non-smokers [García-Closas, et al. 2005]. In contrast, a study conducted among subjects highly exposed to benzidine, an exposure that is rare in the general population, showed slow acetylation to be associated with reduced risk of bladder cancer among benzidine-exposed workers [Carreon, et al. 2006]. These studies provided biological insights into the mechanisms of action of different types of aromatic amines in carcinogenesis.

Another motivation for characterizing joint effects and GxE is the potential for translational application of epidemiologic research. Studies of GxE may help public health researchers develop strategies for targeted intervention for risk-factor modification based on individuals’ genetic profiles [Hunter 2005]. Some cancers have strong environmental risk factors, and, when considering the practicality of intervention, the environment is often more easily modified than genetic factors. If an intervention can be applied only to a subset of the population because of ethical issues, risk of side effects, cost, or other practical considerations, then targeting the intervention to high-risk subjects could prevent more cases of disease. In this context, when a GxE interaction is found, the joint effects can be modeled to identify subgroups in which interventions may best be targeted. Further, although tests for statistical significance can be performed based on multiplicative models of relative risks, the magnitude of benefit from targeted intervention cannot be assessed without reference to absolute risks [Garcia-Closas, et al. 2013].
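The dependence of targeting benefit on absolute risk can be sketched with simple arithmetic. The example below is hypothetical throughout: it assumes G and E are independent, that removing exposure returns an exposed person's risk to the unexposed risk for their genotype, and it uses made-up relative risks and population frequencies. It counts expected cases prevented by removing exposure in carriers versus non-carriers, at two baseline risk levels.

```python
def cases_prevented(n, p_carrier, p_exposed, baseline_risk, rr):
    """Expected cases prevented by removing exposure within each genotype group.

    Hypothetical assumptions: G and E independent; removing exposure returns
    risk to the unexposed risk for the same genotype; rr[(g, e)] is the
    relative risk versus the (g=0, e=0) group with absolute risk baseline_risk.
    """
    result = {}
    for g, p_g in ((0, 1.0 - p_carrier), (1, p_carrier)):
        n_targeted = n * p_g * p_exposed                       # exposed people in group g
        risk_diff = baseline_risk * (rr[(g, 1)] - rr[(g, 0)])  # absolute risk removed
        result[g] = {"targeted": n_targeted,
                     "prevented": n_targeted * risk_diff,
                     "per_person": risk_diff}
    return result


rr = {(0, 0): 1.0, (0, 1): 1.5, (1, 0): 1.2, (1, 1): 4.0}  # hypothetical
for baseline in (0.01, 0.001):
    res = cases_prevented(100_000, 0.2, 0.3, baseline, rr)
    print(baseline, {g: round(r["prevented"], 1) for g, r in res.items()})
```

With identical relative risks, the number of cases prevented scales directly with the baseline absolute risk, and the benefit per person targeted is larger among carriers; neither quantity is visible from a multiplicative interaction test alone.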

Discovery

A second common goal for studies of GxE interactions is the identification of novel factors that may contribute to the etiology of disease. This paper will focus on discovery of novel genetic loci, but it is important to consider that GxE can also be used to discover novel environmental factors. Genetic variation that impacts disease through interaction with environmental factors may not be readily detected in traditional GWAS analysis, particularly if marginal effects of the genetic factors are small [Gauderman, et al. 2013; Manolio, et al. 2009; Thomas 2010]. Therefore, there is growing motivation to use GEWIS, or other methods that incorporate environmental information, to identify novel risk loci. Notably, the focus is on discovery of new genetic loci that impact disease risk, more than on identifying novel GxE interactions per se. The expectation is that future studies will characterize any underlying GxE interaction.

There have only been a small number of GEWIS publications to date. These include examples for cancer risk, such as a large study that replicated interactions between alcohol and single nucleotide polymorphisms (SNPs) in relation to esophageal squamous cell carcinoma [Wu, et al. 2012] and a relatively small, unreplicated finding for obesity and colorectal cancer [Siegert, et al. 2013], as well as findings for other complex diseases and outcomes [Cornelis, et al. 2012; Hamza, et al. 2011; Hancock, et al. 2012; Manning, et al. 2012]. Notably, in some of these cases inclusion of an environmental factor helped identify novel genetic loci even though there was not strong evidence for a GxE interaction. The findings to date suggest that the number of novel genetic loci identified by a GEWIS discovery approach may be small and the interaction effects modest. It is therefore important to have large studies incorporating accurate assessment of environmental exposures in well-characterized populations and/or combined analyses of new or existing studies with environmental data harmonized across studies. It will also be important to develop and use powerful analytical methods, as described below.

Statistical Methods for Testing and Estimating GxE Interactions

This paper focuses on methods for SNP × E analysis in case-control and nested case-control studies. The most common approach to investigating multiplicative GxE interactions is to include a product interaction term in a logistic regression model. Some Think Tank participants noted that even when testing for departures from a multiplicative odds ratio model is the primary goal, other approaches may be more powerful than logistic regression [Hein, et al. 2008]. Other participants noted that these methods have their own drawbacks, and that logistic regression has the advantage of widespread familiarity. In this section we outline alternative approaches that merit further consideration, and comment on which may be most appropriate for characterization, discovery, or both. The different approaches are summarized in Table 1, and statistical software resources are summarized in Table 2.
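As a minimal sketch of the standard approach, the code below fits logit P(D=1) = β0 + βG·G + βE·E + βGE·G·E by Newton-Raphson using only NumPy and reports a Wald test for the product-term coefficient βGE. The data are simulated from a hypothetical cohort with made-up coefficients; under case-control sampling the same model applies, with only the intercept affected by the sampling. Real analyses would add adjustment covariates and would normally use an established package rather than this hand-written fitter, whose function names are illustrative.

```python
import math

import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns (coefficients, covariance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))           # fitted probabilities
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])    # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))     # Newton step from the score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    return beta, np.linalg.inv(info)


def gxe_wald_test(g, e, y):
    """Wald test of the G x E product term in logit P(D=1) = b0 + bG*G + bE*E + bGE*G*E."""
    X = np.column_stack([np.ones(len(y)), g, e, g * e]).astype(float)
    beta, cov = fit_logistic(X, y)
    z = beta[3] / math.sqrt(cov[3, 3])
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta[3], z, p_value


# Simulated (hypothetical) data: additive SNP coding, binary exposure.
rng = np.random.default_rng(2013)
n = 50_000
g = rng.binomial(2, 0.3, n)
e = rng.binomial(1, 0.4, n)
logit = -2.0 + 0.1 * g + 0.3 * e + 0.4 * g * e
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
b_ge, z, p = gxe_wald_test(g, e, y)
print(f"interaction log-OR = {b_ge:.3f}, z = {z:.2f}, p = {p:.2e}")
```

The estimated interaction log odds ratio should be close to the simulated value of 0.4.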

Table 1.

Overview of Analytical Methods for Characterization and Discovery of GxE Interactions

Method | Highlights | Reference
Sufficient Component Models
  • Framework where the presence of interaction in the additive scale can be used as evidence of overlap of biologic actions through a common underlying pathway.

  • Useful in characterization motivated by understanding biological mechanisms.

[VanderWeele 2009; VanderWeele and Robins 2007]
Test for Qualitative Interaction
  • Tests for qualitative interactions.

  • Useful in characterization motivated by understanding nature of interaction.

[Gail and Simon 1985]
Goodness of Fit tests
  • Simultaneously test multiple terms including GxG and GxE.

  • Useful when building parsimonious models for risk assessment in public health contexts.

[Hosmer, et al. 1997]
Unconditional Logistic Regression
  • Standard method for analysis.

  • Robust to assumptions about G-E correlation.

[Breslow and Day 1980]
Case-only
  • Efficient method for analysis of multiplicative interaction odds ratio.

  • Exploits, and is highly sensitive to, assumption of G-E independence.

  • Useful for improved power for discovery of GxE interaction.

[Piegorsch, et al. 1994]
Maximum Likelihood Estimation method
  • Exploits GxE independence assumption in the analysis of case-control data.

  • Allows efficient estimation of all parameters from logistic regression model. Useful for both discovery and characterization. For discovery, the method could be used for joint test for genetic effects and GxE interaction.

[Chatterjee and Carroll 2005]
Two-step procedures that screen based on G-E correlation in cases and controls.
  • Filters markers based on G-E correlation in the combined case-control sample and tests the retained markers using standard case-control logistic regression.

  • Sensitive to case:control ratio relative to the population disease prevalence.

  • Efficient method for testing interaction, unless the G-E association is in the opposite direction from the interaction. Useful in GEWIS discovery.

[Murcray, et al. 2011; Murcray, et al. 2009]
Two-step procedures that screen on marginal gene-disease association
  • Filters markers based on gene-disease association and tests using standard case-control logistic regression.

  • Efficient method for testing for pure and quantitative interactions. Useful in GEWIS discovery.

[Kooperberg and Leblanc 2008]
Empirical Bayes
  • Weighs the case-only and conventional case-control tests depending on the degree of G-E association present in data.

  • Useful for both discovery and characterization. Allows efficient estimation of all parameters of a general logistic regression model. Can have inflated type I error in the presence of strong population-level G-E correlations.

[Mukherjee and Chatterjee 2008]
Bayes-Model Averaging
  • Combines case-only model and case-control model.

  • The Bayes model averaging estimator is the expected value of the posterior distribution.

  • Dependent on choice of prior weights for case-only vs. case-control.

[Li and Conti 2009]
Frequentist model averaging (AIC)
  • Combines case-only model and case-control model.

  • Weights are data-driven and depend on the Akaike information criterion (AIC).

[Mukherjee, et al. 2012]
Joint two-degree of freedom test
  • Two-degree of freedom test that simultaneously tests the main effect and the GxE interaction.

  • Tests hypothesis that the genetic factor is associated with risk of disease in any exposure sub-group.

  • Powerful method for discovery of novel loci.

  • G-E independence assumption can be incorporated to improve the power of the test using the MLE or empirical-Bayes method.

[Kraft, et al. 2007]
Joint Meta-Analysis Approach
  • Allows for meta-analysis of the joint two-degree of freedom test.

  • Useful for discovery of novel loci in consortia and other collaborative analysis.

[Aschard, et al. 2010; Manning, et al. 2011]
Cocktail Method
  • Modular approach that incorporates aspects of two-step methods, case-only and empirical-Bayes.

  • Allows for weighted hypothesis testing to account for multiple comparisons.

  • Hedges across several hedge methods for GEWIS discovery.

[Hsu, et al. 2012]
EDGxE Method
  • Two-step procedure.

  • Uses both G-E correlation and marginal gene-disease association to filter markers for testing.

[Gauderman, et al. 2013b]
Multifactor Dimensionality Reduction
  • Data mining technique.

  • Allows for identification of higher-order interactions.

[Ritchie, et al. 2001]
Random Forest Regression
  • Non-parametric method.

  • Useful for selecting subsets of genetic and/or environmental factors for further modeling.

  • May uncover interactions that do not show strong marginal effects.

[Breiman 2001]
Bayesian Networks
  • Allows for multi-level inference and considers multivariable-adjusted associations.

  • Can incorporate prior distributions for SNP inclusion.

[Chen and Thomas 2010; Wilson, et al. 2010]
Entropy-Based Information Gain Approaches
  • Utilizes information theory based on entropy.

  • Can detect non-linear relationships between G and E.

[Fan, et al. 2011]

Table 2.

Programs and software for power calculations and analysis

Program/Macro | Specific Uses | Website | Reference
SAS macro GEmis2
  • Power calculations

  • Addresses misclassification in E

http://www.hsph.harvard.edu/faculty/peter-kraft/software/ [Lindstrom, et al. 2009]
Quanto
  • Power calculations

  • GxE and joint test, case-control, case-only, family-based designs, continuous outcome

http://biostats.usc.edu/software [Gauderman 2002a; Gauderman 2002b]
Power
  • Power calculations

  • Additive Interactions

http://dceg.cancer.gov/tools/design/POWER [García-Closas and Lubin 1999]
Stata program
  • Power calculations

  • GxE test from logistic regression

http://ideas.repec.org/p/boc/asug03/07.html [Katie, et al. 2003]
PLINK
  • Analysis for discovery GEWIS

  • Data handling, GxE test, joint test

http://pngu.mgh.harvard.edu/~purcell/plink/ [Purcell, et al. 2007]
ProbABEL
  • Analysis for discovery GEWIS

  • Computes robust variance-covariance matrix

http://www.genabel.org/packages/ProbABEL [Aulchenko, et al. 2010]
GxEscan
  • Analysis for discovery GEWIS

  • Implements a suite of testing methods for GEWIS data, including efficient 2-step methods

http://biostats.usc.edu/software [Gauderman, et al. 2013a]
Multassoc
  • Analysis for discovery and characterization

  • Tests a group of SNPs, taking interactions with other G and E into account

http://dceg.cancer.gov/tools/analysis/multassoc [Chatterjee, et al. 2006]
METAL
  • Meta-analysis

  • Common package for combining existing GWAS results

http://genome.sph.umich.edu/wiki/Meta_Analysis_of_SNPxEnvironment_Interaction [Willer, et al. 2010]
CGEN
  • Analysis for discovery and characterization

  • R package implementing a suite of testing and estimation methods for GxE studies

http://bioconductor.org/packages/release/bioc/html/CGEN.html [Bhattacharjee, et al. 2012]

Methods for biological interpretation and for testing specific categories of interaction

In the characterization setting, a “sufficient component” causal inference framework has been used extensively by VanderWeele and colleagues to propose new tests for “biologic interactions” and to identify scenarios in which existing tests for statistical interaction may be given a biologic interpretation [VanderWeele 2009; VanderWeele and Robins 2007]. Although these methods provide a consistent philosophical framework for defining and testing for interaction, their practical utility remains to be demonstrated. The biological insight obtained from this framework may be limited, since the underlying mechanism through which the exposures interact could be very broad and may not pinpoint particular biological causes [Clayton 2012]. It is not clear whether and how the presence of such interaction can aid in the design of biological experiments that provide more specific mechanistic insights.
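To make the sufficient-component idea concrete: VanderWeele and Robins give empirical conditions, stated on the risk scale, under which the data imply a sufficient-cause interaction. As we understand them, without assumptions about the direction of the exposure effects, R11 − R10 − R01 > 0 suffices; if both G and E are assumed to have positive monotonic effects, the weaker additive-interaction condition R11 − R10 − R01 + R00 > 0 suffices. Dividing by R00 gives the relative-risk forms used below. This is a point-estimate sketch with illustrative numbers and a hypothetical function name; in practice one would compare a confidence bound, not a point estimate, with the threshold.

```python
def sufficient_cause_interaction(rr10, rr01, rr11, monotonic=False):
    """Check the empirical sufficient-cause interaction condition (point estimates only).

    rr_ge: relative risk for genotype g and exposure e versus the (0, 0) group.
    monotonic=True assumes both G and E have positive monotonic effects.
    """
    if monotonic:
        return rr11 - rr10 - rr01 + 1.0 > 0.0   # RERI > 0
    return rr11 - rr10 - rr01 > 0.0             # RERI > 1, no monotonicity needed


print(sufficient_cause_interaction(2.0, 3.0, 6.0))                  # 6 - 2 - 3 > 0
print(sufficient_cause_interaction(2.0, 3.0, 4.5))                  # 4.5 - 2 - 3 < 0
print(sufficient_cause_interaction(2.0, 3.0, 4.5, monotonic=True))  # 4.5 - 2 - 3 + 1 > 0
```

The last two calls show how the same joint effects can satisfy the condition only when the additional monotonicity assumption is made.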

A classic test proposed in the context of clinical trials [Gail and Simon 1985] can easily be adapted to test for qualitative GxE interaction in observational studies. The statistical methodology for identifying pure interaction, as defined above, is not well developed, and further methodological development is merited. One option may be to formulate the problem as one of model selection, comparing the pure interaction model with a more general model that allows for other types of interaction, using model fit measures such as AIC or BIC or a Bayesian model selection approach.
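A minimal sketch of the Gail and Simon test adapted so that the “subsets” are genotype strata: D_i is the estimated exposure effect (for example, a log odds ratio for E) within genotype stratum i, and s_i is its standard error. The statistic is min(Q+, Q−), where one sum collects (D_i/s_i)² over positive estimates and the other over negative ones, and its p-value is Σ_{h=1}^{I−1} P(χ²_h > q)·Binomial(h; I−1, 1/2). The chi-square tail is written with closed forms for integer degrees of freedom so the block needs only the standard library; a statistics package would normally be used instead. Function names and inputs are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer df > x), via closed forms."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2.0) / k
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series for df >= 3.
    tail = math.erfc(math.sqrt(x / 2.0))
    if df >= 3:
        term = series = 1.0
        for j in range(2, (df - 1) // 2 + 1):
            term *= x / (2 * j - 1)
            series += term
        tail += math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * series
    return tail


def gail_simon_test(estimates, std_errors):
    """Gail-Simon test for qualitative interaction across I strata.

    estimates: effect estimates (e.g. log ORs for E) within each genotype stratum.
    std_errors: their standard errors. Returns (statistic, p-value).
    """
    z2 = [(d / s) ** 2 for d, s in zip(estimates, std_errors)]
    q_pos = sum(v for v, d in zip(z2, estimates) if d > 0)  # evidence for positive effects
    q_neg = sum(v for v, d in zip(z2, estimates) if d < 0)  # evidence for negative effects
    q = min(q_pos, q_neg)
    k = len(estimates)
    # Least favorable null: mixture of chi-squares with Binomial(k - 1, 1/2) weights.
    p = sum(chi2_sf(q, h) * math.comb(k - 1, h) / 2 ** (k - 1) for h in range(1, k))
    return q, p


# Hypothetical stratum-specific exposure effects with opposite signs.
q, p = gail_simon_test([0.40, -0.35], [0.15, 0.16])
print(f"Q = {q:.2f}, p = {p:.3f}")
```

For two strata whose estimates lie 1.645 standard errors on either side of zero, this gives p of about 0.05; when all estimates share a sign the statistic is zero and there is no evidence of qualitative interaction.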

Methods for risk modeling and public health applications

In moving towards a personalized, or stratified, medicine paradigm, it will be important to develop well-calibrated models for the joint effects of all known SNPs and environmental factors. To this end, one might select models based on overall goodness-of-fit tests [Hosmer, et al. 1997] rather than on the significance of individual interaction terms. Think Tank participants discussed whether it might be appropriate to build models that assess the interaction between environmental exposures and genetic susceptibility through categories of polygenic risk scores [Garcia-Closas, et al. 2013]. Participants recognized that such models have limited biological interpretation because they mix SNPs with different biological functions. However, they may adequately capture variation in joint risk.
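A polygenic risk score of the kind discussed here is commonly built as a weighted sum of risk-allele counts, with per-allele log odds ratios as weights, and then cut into quantile categories that can be crossed with an exposure. The sketch below assumes additive (0/1/2) genotype coding and externally estimated weights; the genotypes, weights, and function names are hypothetical.

```python
import numpy as np


def polygenic_score(genotypes, log_ors):
    """Weighted sum of risk-allele counts: genotypes is (n, m) with 0/1/2, log_ors is (m,)."""
    return np.asarray(genotypes, float) @ np.asarray(log_ors, float)


def score_categories(scores, n_cat=5):
    """Assign each score to a quantile category 0..n_cat-1 (quintiles by default)."""
    inner_edges = np.quantile(scores, np.linspace(0.0, 1.0, n_cat + 1)[1:-1])
    return np.searchsorted(inner_edges, scores, side="right")


rng = np.random.default_rng(1)
genotypes = rng.binomial(2, 0.3, size=(1_000, 20))   # hypothetical 20-SNP panel
log_ors = np.log(rng.uniform(1.05, 1.3, size=20))    # hypothetical per-allele ORs
prs = polygenic_score(genotypes, log_ors)
categories = score_categories(prs)
print(np.bincount(categories))
```

The resulting category labels can then be cross-classified with exposure status to estimate joint effects by score category and exposure.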

Once a suitable model for joint effects is determined, various criteria can be used to evaluate its utility for public health applications [Gail and Pfeiffer 2005]. The area under the receiver operating characteristic curve (AUC) is a popular measure of discriminatory ability. However, it is not necessarily a good guide in all applications. In particular, the AUC depends only on the distribution of the risk profile conditional on case-control status, and cannot take into account the baseline risk of a disease, which can be an important determinant of the degree of stratification in absolute risk. For example, a model with modest discriminatory performance, when applied to a relatively common condition such as breast cancer, was shown to provide sufficient stratification of absolute risk to be useful for weighing the risks and benefits of a drug such as tamoxifen [Gail, et al. 1999]. Studies that aim to develop risk models need to consider specific public health applications and use an appropriate criterion for evaluating the utility of the models accordingly.
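The contrast between discrimination and absolute risk stratification can be sketched with two small functions: a Mann-Whitney AUC computed only from the risk scores of cases and of controls, and a translation of fixed relative risks across risk categories into absolute risks, given population weights and an overall average risk. The categories, weights, and the two average-risk levels are hypothetical, chosen only to contrast a common with a rare outcome.

```python
import numpy as np


def auc(case_scores, control_scores):
    """Mann-Whitney estimate of AUC: P(case score > control score), ties count 1/2."""
    c = np.asarray(case_scores, float)[:, None]
    d = np.asarray(control_scores, float)[None, :]
    return float(np.mean((c > d) + 0.5 * (c == d)))


def absolute_risks(relative_risks, weights, mean_risk):
    """Absolute risk per category, given relative risks versus the lowest category,
    population weights (summing to 1), and the overall mean risk."""
    rr = np.asarray(relative_risks, float)
    w = np.asarray(weights, float)
    baseline = mean_risk / np.sum(w * rr)   # risk in the reference category
    return baseline * rr


# Hypothetical risk categories with fixed relative risks.
rr, w = [1.0, 2.0, 4.0], [0.5, 0.3, 0.2]
for mean_risk in (0.12, 0.005):   # a common versus a rare outcome (illustrative)
    print(mean_risk, np.round(absolute_risks(rr, w, mean_risk), 4))
```

The same relative-risk structure yields very different absolute risks for the two outcomes, and it is this absolute scale, not the AUC, that determines whether a category crosses a clinically meaningful risk threshold.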

Methods for efficient analysis of GxE interactions

The case-only method tests the association between the environmental factor and the genetic factor within cases, and has improved power over the traditional logistic regression method [Piegorsch, et al. 1994]; however, the case-only method has an inflated type I error rate if the GxE independence assumption is violated [Albert, et al. 2001]. This method loses power when the interaction odds ratio and the association between the gene and environmental exposure are in opposite directions [Mukherjee and Chatterjee 2008], and is less powerful when the disease is common.
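In its simplest form, for a binary genotype and a binary exposure, the case-only estimate is the odds ratio relating G and E among cases. Under G-E independence in the source population, and for a rare disease, it estimates the multiplicative interaction odds ratio. If G and E are instead associated in the population with odds ratio θ, the case-only estimate tends toward the interaction odds ratio multiplied by θ, which is the source of the type I error inflation noted above. The sketch uses Woolf's log-scale standard error; the counts and function name are hypothetical.

```python
import math


def case_only_interaction(carrier_exposed, carrier_unexposed,
                          noncarrier_exposed, noncarrier_unexposed, z=1.96):
    """Case-only G-E odds ratio with a Woolf (log-scale) 95% confidence interval.

    All four counts are among cases only.
    """
    log_or = math.log((carrier_exposed * noncarrier_unexposed)
                      / (carrier_unexposed * noncarrier_exposed))
    se = math.sqrt(1 / carrier_exposed + 1 / carrier_unexposed
                   + 1 / noncarrier_exposed + 1 / noncarrier_unexposed)
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    return math.exp(log_or), ci


or_co, (lo, hi) = case_only_interaction(120, 180, 200, 500)   # hypothetical counts
print(f"case-only OR = {or_co:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because controls are not used, the precision gain comes entirely from the independence assumption; the same table says nothing about whether that assumption holds.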

There was active discussion at the Think Tank as to whether violations of the GxE independence assumption are commonly observed in real data. Several participants argued that departures from GxE independence may be rare in practice and, therefore, the power gains from case-only approaches may outweigh the potential risk of an increased type I error rate. Empirical support for this view comes from a GEWIS with body mass index as the exposure and diabetes as the outcome [Cornelis, et al. 2012]. However, there are several plausible scenarios in which one would expect G-E association in a population [Weinberg, et al. 2011]. Participants noted that the plausibility of violations of the GxE independence assumption should be assessed in light of previous data, experience, and study characteristics. A recent empirical example in esophageal cancer demonstrates the advantages and disadvantages of the case-only test compared with other methods [Wu, et al. 2013]. Whether case-only analyses are appropriate will depend on the study population, exposure and disease.

Hedge methods have also been proposed; these have improved power and are less susceptible to violations of the GxE independence assumption. They include: (1) two-step procedures that filter on marginal effects, gene-environment correlations in the full sample, or other tests [Dai, et al. 2012a; Gauderman, et al. 2013; Kooperberg and Leblanc 2008; Murcray, et al. 2011; Murcray, et al. 2009]; and (2) data-adaptive methods, such as empirical Bayes, Bayes model averaging, or frequentist model averaging [Li and Conti 2009; Mukherjee, et al. 2012; Mukherjee and Chatterjee 2008]. Several papers have compared these two types of hedge methods in the genome-wide discovery setting [Cornelis, et al. 2012; Gauderman, et al. 2013; Mukherjee, et al. 2012; Murcray, et al. 2011; Thomas, et al. 2012]. These methods all performed comparably and were generally shown to have more power than standard unconditional logistic regression and better control of type I error than case-only approaches [Mukherjee, et al. 2012]. Standard unconditional logistic regression maintained proper type I error control.
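The two-step idea can be sketched in its Kooperberg-LeBlanc form: SNPs whose step-1 screening p-value falls below a threshold α1 are carried forward, and only those are tested for GxE interaction with standard logistic regression, at a Bonferroni level of α divided by the number carried forward. The screening statistic could be a marginal G-disease test or, as in the Murcray procedures, a G-E association test in the combined case-control sample. Valid error control depends on the screening statistic being independent of the step-2 interaction statistic under the null, which has been examined for particular screening choices in the cited work. Thresholds, p-values, and the function name below are hypothetical.

```python
def two_step_hits(step1_p, step2_p, alpha1=1e-3, alpha=0.05):
    """Two-step GxE procedure (sketch).

    step1_p: screening p-values, one per SNP (e.g. marginal or G-E association test).
    step2_p: GxE interaction p-values, one per SNP, from standard logistic regression.
    Returns indices of SNPs that pass screening and are significant at alpha / m,
    where m is the number of SNPs that passed screening.
    """
    passed = [i for i, p in enumerate(step1_p) if p < alpha1]
    if not passed:
        return []
    threshold = alpha / len(passed)
    return [i for i in passed if step2_p[i] < threshold]


step1 = [1e-4, 0.5, 2e-4, 1e-5]    # hypothetical screening p-values
step2 = [0.01, 1e-9, 0.03, 0.02]   # hypothetical interaction p-values
print(two_step_hits(step1, step2))
```

Here the SNP at index 1 has the smallest interaction p-value but is never tested because it fails the screen, while the SNP at index 0 is declared significant at 0.05/3. The gain in power comes from the much smaller step-2 multiple-testing burden.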

The field is rapidly evolving, and additional approaches continue to be developed. For example, the cocktail method takes a module-based approach that combines multiple analytical methods: weighted multiple-testing adjustment, two-step procedures, and case-only, case-control and empirical-Bayes tests [Hsu, et al. 2012]. The EDGxE method has a screening step that uses both marginal-effect and gene-environment correlation information to efficiently filter SNPs for GxE testing [Gauderman, et al. 2013]. An additional approach augments case-only data with exposure data collected from siblings [Weinberg, et al. 2011]. Further approaches are being developed through NIH-funded applications in response to a program announcement on “Methods and Approaches for Detection of Gene-Environment Interactions in Human Disease” [http://grants.nih.gov/grants/guide/pa-files/PAR-11-032.html].
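
In a simplified reading of the EDGxE screen, the 1-df chi-square statistics from the marginal (D-G) test and from the G-E correlation test in the combined sample are added. The sum is then referred to a chi-square distribution with 2 degrees of freedom, relying on the two statistics being approximately independent under the null. The sketch below is a hypothetical illustration of that combination only, assuming standardized z statistics are already available. It is not the published software, and the function names are assumptions.

```python
import math


def combined_screen_p(z_marginal, z_ge_correlation):
    """2-df screening p-value from two approximately independent 1-df z statistics.

    The sum of squared z statistics is compared with a chi-square distribution
    with 2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    statistic = z_marginal ** 2 + z_ge_correlation ** 2
    return math.exp(-statistic / 2.0)


def screen_snps(z_pairs, alpha1=1e-3):
    """Return indices of SNPs whose combined screening p-value is below alpha1."""
    return [i for i, (z_m, z_ge) in enumerate(z_pairs)
            if combined_screen_p(z_m, z_ge) < alpha1]


# Hypothetical z statistics: (marginal D-G, G-E correlation in combined sample).
z_pairs = [(3.0, 2.5), (0.5, 0.2), (1.0, 3.9)]
print(screen_snps(z_pairs))  # [0, 2]
```

SNP 2 passes the screen mainly on the strength of its G-E correlation signal. A marginal-only filter might have missed it.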

As discussed above, characterization studies focus on joint effects, looking at relative and absolute risks rather than narrowly focusing on specific forms of interaction. Nevertheless, several methods described in this section can be extended in ways that make them useful for characterization as well as discovery. The assumption of independence of G and E, exploited in the case-only method for testing multiplicative interaction [Piegorsch, et al. 1994], can also be exploited in a case-control study to make efficient inference about all parameters of a general logistic regression model. Having all of the parameters allows characterization of the full joint effects using a maximum likelihood estimation (MLE) method [Chatterjee and Carroll 2005]. Similarly, the robust empirical-Bayes method can be applied to make inference about all of the parameters of a logistic regression model [Mukherjee and Chatterjee 2008]. Methods have been developed for testing additive interactions in case-control data [Knol, et al. 2011; Rothman 1986]. The assumption of GxE independence can also be exploited to improve power to test for additive interaction in case-control studies [Han, et al. 2012].
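
The distinction between multiplicative and additive interaction can be illustrated with a small calculation. This hypothetical Python sketch takes odds ratios for the three exposed categories of a 2×2 cross-classification, with G=0, E=0 as the reference. It returns two quantities:

- the relative excess risk due to interaction, RERI = OR₁₁ − OR₁₀ − OR₀₁ + 1, a common measure of additive interaction;
- the multiplicative interaction ratio, OR₁₁/(OR₁₀·OR₀₁).

Using odds ratios in place of relative risks assumes the disease is rare. The input values are illustrative.

```python
def interaction_measures(or_g, or_e, or_ge):
    """Return (RERI, multiplicative ratio) from joint odds ratios.

    or_g:  OR for G=1, E=0 versus G=0, E=0
    or_e:  OR for G=0, E=1 versus G=0, E=0
    or_ge: OR for G=1, E=1 versus G=0, E=0
    """
    reri = or_ge - or_g - or_e + 1.0          # departure from additivity
    multiplicative = or_ge / (or_g * or_e)    # departure from multiplicativity
    return reri, multiplicative


# Hypothetical joint effects: exactly multiplicative, but super-additive.
print(interaction_measures(2.0, 1.5, 3.0))  # (0.5, 1.0)
```

Here the joint effects show no interaction on the multiplicative scale but positive interaction on the additive scale. This is why the scale matters when characterizing joint effects.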

Methods for the joint analysis of G and GxE

In the discovery setting, the focus is often on identifying novel genetic loci associated with disease. For GEWIS approaches, there is typically no a priori information as to whether multiplicative or additive models are most appropriate. When environmental factors and genetic loci have only modest effects on disease risk, there will not be large differences between additive and multiplicative tests [Weinberg 2012b]. However, the similarity between the models breaks down when one of the main effects is large or the environmental exposure is continuous. In general, Think Tank participants supported considering approaches that are less dependent on the choice of additive or multiplicative parameterization in the discovery setting. Joint tests consider the hypothesis that a genetic factor is associated with risk of disease in any exposure subgroup. They test the main effect of the genetic factor and the GxE interaction simultaneously in a 2-degree-of-freedom test [Kraft, et al. 2007]. Alternative versions of this test have been proposed [Dai, et al. 2012b]. The GxE independence assumption can also be incorporated into joint tests to improve power, using MLE [Chatterjee and Carroll 2005] or empirical-Bayes methods [Mukherjee and Chatterjee 2008]. These tests are scale-independent in the restricted situation of dichotomous G and dichotomous E, although they are sensitive to the choice of scale for continuous or categorical G or E. Additionally, joint tests can have greater power than traditional marginal and interaction tests, particularly if the genetic effects are modest [Lindstrom, et al. 2009]. After a variant is identified with the joint test, standard practice is to characterize the full joint effects, including separate examination of the marginal and GxE interaction effects. This produces stratum-specific relative risks, with 95% confidence intervals, for each joint cross-classification of G and E.
Some recent studies have used the joint test to identify novel genetic loci that do not show strong evidence for multiplicative interaction when a standard GxE interaction test is performed [Hancock, et al. 2012; Manning, et al. 2012]. In one case, the joint test identified novel genetic loci not because of underlying GxE interactions, but because of the precision gained by including a known risk factor in a linear regression model [Manning, et al. 2012]. However, such a gain in power may not be realized with logistic regression [Mefford and Witte 2012].
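
The 2-degree-of-freedom joint test and the subsequent stratum-specific summary can be sketched in Python for binary G and E. This minimal illustration assumes two inputs have already been obtained from a fitted logistic regression: the log-odds-ratio estimates for G, E and GxE, and the 2×2 covariance matrix of the G and GxE estimates. It uses the Wald form of the joint test. The function names and input values are hypothetical.

```python
import math


def joint_test(b_g, b_gxe, cov):
    """Wald 2-df test of H0: beta_G = beta_GxE = 0.

    cov is the 2x2 covariance matrix [[var_g, cov_g_gxe], [cov_g_gxe, var_gxe]].
    Returns (chi-square statistic, p-value).
    """
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    if v11 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Explicit inverse of a 2x2 matrix.
    inv = [[v22 / det, -v12 / det], [-v21 / det, v11 / det]]
    beta = (b_g, b_gxe)
    stat = sum(beta[i] * inv[i][j] * beta[j] for i in range(2) for j in range(2))
    # The chi-square survival function with 2 df is exp(-x / 2).
    return stat, math.exp(-stat / 2.0)


def joint_effect_ors(b_g, b_e, b_gxe):
    """Odds ratios for each (g, e) cell relative to the G=0, E=0 reference."""
    return {(g, e): math.exp(b_g * g + b_e * e + b_gxe * g * e)
            for g in (0, 1) for e in (0, 1)}


stat, p = joint_test(0.2, 0.4, [[0.01, 0.0], [0.0, 0.04]])
print(round(stat, 3), round(p, 4))
print({cell: round(value, 2) for cell, value in joint_effect_ors(0.2, 0.3, 0.4).items()})
```

For a continuous or categorical E, the same quadratic form applies. However, the result then depends on how E is coded.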

Additional Methods

Some methods propose testing GxE interactions at the gene level, rather than at the level of individual SNPs or genetic variants [Chatterjee, et al. 2006; Lin, et al. 2013]. In these methods, all variants in a gene region are considered as a unit. Although such approaches may focus and reduce the total number of interaction tests performed, they require multiple-degree-of-freedom tests or strong parametric assumptions, which may result in a loss of power. There is also uncertainty in how to define a gene or gene region, and hence in which variants to include in these tests [Djebali, et al. 2012]. Furthermore, gene-based approaches do not account for variation in intergenic regions.

There are also several machine learning approaches that can be considered for exploring GxE interactions (for review see [Cordell 2009; Moore, et al. 2010]). These include multifactor dimensionality reduction (MDR) [Ritchie, et al. 2001], random forest regression [Breiman 2001], and Bayesian network analysis [Baurley, et al. 2010; Chen and Thomas 2010; Wilson, et al. 2010]. Bayesian generalized linear models can be used to simultaneously test the main effects of environmental exposures and multiple genetic variants, along with GxG and GxE interactions [Yi, et al. 2011]. Additional methods use other frameworks, such as the natural and orthogonal interaction framework [Ma, et al. 2012]. Many of these approaches consider more complex models of interaction as alternatives to the additive or multiplicative scales. Furthermore, Bayesian approaches can potentially incorporate prior biological information or pathway data into interaction models [Hung, et al. 2004]. However, pathway definitions and functional annotation are not fully established [Kraft and Raychaudhuri 2009; Mechanic, et al. 2012; Wang, et al. 2010]. Specific considerations for these multilevel methods include incorporating computational and bioinformatic advances, new approaches to interpreting results, and specific challenges in framing and performing replication.
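
The core data-reduction step of MDR can be sketched as follows, here applied to one genotype and one exposure. In this minimal, hypothetical illustration, each (genotype, exposure) cell is labeled high-risk if its case:control ratio exceeds the overall case:control ratio. This collapses the two factors into a single binary attribute. The full method also searches over combinations of factors and uses cross-validation to choose among models; those steps are omitted here.

```python
from collections import Counter


def mdr_high_risk_cells(records):
    """Label each (genotype, exposure) cell as high-risk (True) or low-risk (False).

    records: iterable of (genotype, exposure, is_case) tuples.
    A cell is high-risk if its case:control ratio exceeds the overall ratio.
    """
    cases, controls = Counter(), Counter()
    for genotype, exposure, is_case in records:
        (cases if is_case else controls)[(genotype, exposure)] += 1
    total_controls = sum(controls.values())
    if total_controls == 0:
        raise ValueError("at least one control is required")
    threshold = sum(cases.values()) / total_controls
    labels = {}
    for cell in set(cases) | set(controls):
        n_controls = controls[cell]
        ratio = float("inf") if n_controls == 0 else cases[cell] / n_controls
        labels[cell] = ratio > threshold
    return labels


# Hypothetical records: two cells with contrasting case:control ratios.
records = ([(0, 0, True)] * 10 + [(0, 0, False)] * 20
           + [(1, 1, True)] * 15 + [(1, 1, False)] * 5)
print(mdr_high_risk_cells(records))
```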

Key Considerations for Characterization and Discovery

Measurement Error and Improved Exposure Assessment

Participants at the Think Tank noted that every GxE study must address challenges in measuring, assessing and modeling both genetic and environmental factors. In GWAS, directly genotyped common SNPs are often analyzed as if measured without error. However, closer examination of cluster plots may reveal uncertainty in genotype calls. The assumption of no error is less safe for imputed genotypes [Jiao, et al. 2011; Sinnott and Kraft 2012]. For rare variants and sequencing studies, variant calls are often made with less accuracy and confidence, particularly in low-coverage designs [Li, et al. 2011]. Additionally, error may be introduced through the choice of genetic model (e.g., assuming a log-additive model when the true effect is dominant or recessive [Prentice 2011]). The environment is dynamic: it changes over time and across an individual's lifespan, and it is fraught with measurement error [Spiegelman 2010]. Participants stressed the importance of valid, efficient, computationally feasible methodology for measuring “E” in GxE studies, since misclassification can be a major source of bias and loss of power [Aschard, et al. 2012; Lindstrom, et al. 2009].
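
The effect of exposure misclassification can be illustrated with expected counts. This hypothetical Python sketch applies nondifferential misclassification of a binary exposure to a single 2×2 table, meaning the same sensitivity and specificity apply in cases and controls. In a GxE analysis, the same distortion would apply within each genotype stratum. The counts, sensitivity and specificity are illustrative assumptions.

```python
def misclassify(n_exposed, n_unexposed, sensitivity, specificity):
    """Expected (classified exposed, classified unexposed) counts."""
    classified_exposed = sensitivity * n_exposed + (1 - specificity) * n_unexposed
    classified_unexposed = (1 - sensitivity) * n_exposed + specificity * n_unexposed
    return classified_exposed, classified_unexposed


def exposure_or(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Exposure odds ratio from a 2x2 case-control table."""
    return (case_exposed * control_unexposed) / (case_unexposed * control_exposed)


sens, spec = 0.8, 0.9
true_or = exposure_or(300, 200, 200, 300)
observed_or = exposure_or(*misclassify(300, 200, sens, spec),
                          *misclassify(200, 300, sens, spec))
print(round(true_or, 2), round(observed_or, 2))  # 2.25 1.77
```

With 80% sensitivity and 90% specificity, the observed odds ratio is pulled toward the null. Smaller observed effects translate directly into lost power.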

Think Tank participants discussed methods that correct for bias and loss of power due to exposure measurement error in ways best suited to GxE and GEWIS studies [Cheng 2006; Cheng 2007; Wong, et al. 2004; Zhang, et al. 2008]. The common trade-off between increased sample size and decreased quality of exposure data (or of harmonizable exposure data) can be considered; in some instances, the fully validated design may be optimal [Greenland 1988]. In most cases, main study/validation study designs [Greenland 1988; Holcroft and Spiegelman 1999; Spiegelman 2002] or main study/reliability study designs [Spiegelman D 1998] may be appropriate. The joint test, discussed above for use in discovery settings, has been shown to be less sensitive to bias from measurement error [Lindstrom, et al. 2009]. However, more research is needed to develop and evaluate methods that account for measurement error.
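
When sensitivity and specificity have been estimated in a validation substudy, the simplest correction back-calculates the expected true counts from the observed ones. The sketch below shows only this textbook matrix-method step for a binary exposure, assuming sensitivity and specificity are known exactly. It is not one of the cited correction methods. It also does not propagate uncertainty in the validation estimates into the corrected result. The function name is hypothetical.

```python
def corrected_counts(classified_exposed, classified_unexposed, sensitivity, specificity):
    """Back-calculate expected true (exposed, unexposed) counts.

    Solves classified_exposed = Se * X + (1 - Sp) * (N - X) for X.
    """
    n = classified_exposed + classified_unexposed
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    true_exposed = (classified_exposed - (1.0 - specificity) * n) / denom
    if not 0 <= true_exposed <= n:
        raise ValueError("sensitivity/specificity inconsistent with observed counts")
    return true_exposed, n - true_exposed


# Observed case counts from a hypothetical study with Se = 0.8, Sp = 0.9.
print(corrected_counts(260, 240, 0.8, 0.9))
```

The range check matters in practice. If sensitivity or specificity is misspecified, the back-calculation can produce impossible (negative) counts.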

Recent reports discuss the clear need to develop improved measures of environmental exposures, and they lay out next steps in this area [National Research Council 2012; Balshaw and Kwok 2012]. Although these advances will reduce environmental measurement error and improve tests of GxE, some participants noted that it typically will not be feasible to remove all error. More accurate exposure assessment methods may not be applicable to some current large population-based studies. The data and specimens may already have been collected, the measures may be too expensive, or other factors may intervene.

There is additional uncertainty in environmental data stemming from our lack of knowledge of the timing of effects. For instance, even if we could measure a person’s pack years of smoking without error for a study of breast cancer risk, we might still need to know whether smoking was initiated prior to her first birth, and how long ago she quit smoking. These considerations are compounded for studies of the impact of in utero and early childhood exposures on a disease with late onset, and highlight the need for methods and designs, such as longitudinal studies, that consider interactions over time and time-varying exposures.
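
Timing can be encoded by splitting a cumulative exposure into windows. This hypothetical sketch divides lifetime pack-years at a cutoff age (for example, age at first birth) and computes years since quitting. It assumes constant smoking intensity between the start and stop ages. The function and parameter names are illustrative.

```python
def pack_years_by_window(start_age, stop_age, packs_per_day, cutoff_age):
    """Split pack-years into (before cutoff_age, after cutoff_age).

    Assumes smoking at a constant packs_per_day from start_age to stop_age.
    """
    if stop_age < start_age:
        raise ValueError("stop_age must not precede start_age")
    years_before = max(0.0, min(stop_age, cutoff_age) - start_age)
    years_after = max(0.0, stop_age - max(start_age, cutoff_age))
    return packs_per_day * years_before, packs_per_day * years_after


def years_since_quitting(stop_age, reference_age):
    """Time since cessation at a reference age (e.g., diagnosis or interview)."""
    return max(0.0, reference_age - stop_age)


# Smoked 1.5 packs/day from age 18 to 40; first birth at 28; reference age 60.
print(pack_years_by_window(18, 40, 1.5, 28))  # (15.0, 18.0)
print(years_since_quitting(40, 60))           # 20
```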

Power and Sample Size

Think Tank participants discussed the related issues of power and sample size. Because of small effect sizes, multiple-testing corrections, and the need to model more parameters, studies of GxE interactions require large sample sizes. Table 2 provides a list of software that can be used for power calculations in GxE studies. Lack of power is an important limitation in many GxE studies, particularly GEWIS [Dempfle, et al. 2008; Mukherjee, et al. 2012]. The GWAS community was able to address sample size through meta-analysis, efficiently combining results across multiple studies [de Bakker, et al. 2008]. However, meta-analysis of GxE interactions poses additional challenges (see [Cornelis and Hu 2012] for discussion). Some of these issues overlap with issues in replication, as discussed below.
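
As a rough illustration of why large samples are needed, the sketch below approximates the power of a Wald test for a multiplicative interaction between binary G and E. It assumes the expected counts in all eight disease × G × E cells are supplied. It uses the Woolf-type approximation, in which the variance of the log interaction odds ratio is the sum of the reciprocals of those counts. The function name and the balanced counts are hypothetical, and this approximation is not a substitute for the software listed in Table 2.

```python
import math
from statistics import NormalDist


def interaction_power(cell_counts, interaction_or, alpha=0.05):
    """Approximate power to detect a multiplicative GxE odds ratio.

    cell_counts: expected counts for the 8 cells of the disease x G x E table.
    Var(log interaction OR) is approximated by the sum of reciprocal counts.
    """
    if len(cell_counts) != 8 or any(n <= 0 for n in cell_counts):
        raise ValueError("need 8 positive expected cell counts")
    se = math.sqrt(sum(1.0 / n for n in cell_counts))
    z_crit = -NormalDist().inv_cdf(alpha / 2)   # two-sided critical value
    z_effect = abs(math.log(interaction_or)) / se
    # Ignores the negligible probability of rejecting in the wrong direction.
    return NormalDist().cdf(z_effect - z_crit)


counts = [250] * 8  # hypothetical balanced expected counts
for alpha in (0.05, 5e-8):
    print(alpha, round(interaction_power(counts, 1.5, alpha), 3))
print(round(interaction_power([c * 4 for c in counts], 1.5, 5e-8), 3))
```

Quadrupling every cell count halves the standard error. Even so, power at a genome-wide threshold remains far below the power of the same data at a nominal threshold.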

Studies with sufficiently large samples for analyzing interactions with common exposures may still have sample size issues for a relatively rare exposure or genotype of interest, or when a large sample size is created through a meta-analysis of many small studies. In these situations there may be small cell counts where asymptotics break down, leading to unstable results.
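
The instability from small cells shows up in the Woolf standard error of a log odds ratio, which is dominated by the smallest cell count. The hypothetical sketch below computes that standard error. When a cell is zero, it applies the Haldane-Anscombe correction of adding 0.5 to every cell. This correction is a crude, commonly used device rather than a recommended solution. Exact methods or penalized (e.g., Firth) logistic regression are alternatives for sparse data.

```python
import math


def log_or_and_se(a, b, c, d, correction=0.5):
    """Log odds ratio and Woolf standard error for the 2x2 table [[a, b], [c, d]].

    If any cell is zero, `correction` is added to every cell (Haldane-Anscombe).
    """
    cells = [a, b, c, d]
    if 0 in cells:
        cells = [x + correction for x in cells]
    a, b, c, d = cells
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(sum(1.0 / x for x in cells))
    return log_or, se


for table in [(30, 100, 100, 100), (3, 100, 100, 100), (0, 100, 100, 100)]:
    log_or, se = log_or_and_se(*table)
    print(table, round(math.exp(log_or), 3), round(se, 3))
```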

Assessing Statistical Significance and Evidence for Interaction

Another key issue raised throughout the Think Tank was the choice of statistical significance thresholds for GxE interactions. Even for candidate-gene GxE interactions, a traditional nominal p<0.05 cutoff will not suffice, because multiple comparisons are typically performed. The lack of multiple-testing correction, coupled with publication bias, inflates the rate of published false-positive findings, which obscures true positives. Notably, a recent review of NCI’s external funding portfolio and the literature suggests publication bias resulting from null findings not being published [Ghazarian, et al. 2013].

For characterization studies, the goal is often estimation rather than significance testing. When testing in the characterization setting, there are typically relatively few candidate SNPs and environmental factors, and standard methods are often used to correct for multiple testing. One option is to use permutation methods, which can account for correlation among genetic factors and/or among environmental factors [Buzkova, et al. 2011].
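
A single-step min-p adjustment is one way to turn permutation results into multiplicity-adjusted p-values that respect correlation among tests. The sketch below assumes the null replicates have already been generated, one vector of p-values per permuted or simulated data set. Producing valid null replicates for an interaction test is the delicate part, because naively permuting the outcome also removes the main effects of G and E. Names and values are hypothetical.

```python
def min_p_adjusted(observed_p, null_p_sets):
    """Single-step min-p adjusted p-values.

    observed_p:  observed p-values, one per SNP x E test.
    null_p_sets: one list of p-values per null replicate, same length as observed_p.
    Adjusted p_j = (1 + #{replicates with min p <= observed_p[j]}) / (1 + B).
    """
    if not null_p_sets:
        raise ValueError("at least one null replicate is required")
    if any(len(ps) != len(observed_p) for ps in null_p_sets):
        raise ValueError("each replicate must have one p-value per test")
    null_minima = [min(ps) for ps in null_p_sets]
    b = len(null_minima)
    return [(1 + sum(m <= p for m in null_minima)) / (1 + b) for p in observed_p]


observed = [0.001, 0.2]
null_sets = [[0.01, 0.5], [0.3, 0.002], [0.05, 0.6], [0.0005, 0.9]]
print(min_p_adjusted(observed, null_sets))  # [0.4, 1.0]
```

The +1 terms keep the adjusted p-value above zero when only a finite number of replicates is available.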

For GEWIS studies of SNP × E for discovery, the Think Tank participants discussed setting a threshold for genome-wide GxE significance. Thresholds in the 5×10⁻⁷ to 1×10⁻⁸ range have been an important component of the standards and success of GWAS [Chanock, et al. 2007]. The appropriate significance thresholds for the GWAS setting are still actively debated [Panagiotou, et al. 2012; Wakefield 2012]. Even so, thresholds are viewed as a pragmatic way of determining what is “established” and of curtailing false positives. However, a concern was raised at the Think Tank as to whether it is too early to establish fixed thresholds for GxE. There have not yet been enough large studies with replication to empirically evaluate a threshold standard. There was additional concern that it is not clear what decision a threshold is meant to support. Specifically, is a threshold used to determine when an interaction is real, or more broadly to determine whether an interaction is noteworthy and merits further follow-up? There was also concern that a statistical threshold may be too stringent a standard, particularly if it is the only standard. For example, consider a multi-stage GWAS and replication joint analysis of 10,519 bladder cancer cases and 13,218 controls [Rothman, et al. 2010]. The p-value for interaction between ever/never smoking and the NAT2 tag SNP rs1495741 was only 2.8×10⁻⁴, clearly not meeting any “genome-wide significant” threshold. Nevertheless, this interaction is widely accepted because of the underlying biology and consistent replication. This example is not intended to advocate low thresholds. Rather, it demonstrates that very large sample sizes are needed to detect interactions when only highly stringent significance thresholds are used.

Despite these concerns, some Think Tank participants suggested adjusting GWAS thresholds for the number of exposures tested, as a way to consider an interaction “established” on the basis of statistical evidence in GEWIS. One proposed value was 10⁻¹⁰. This value assumes 100 exposures, and some participants questioned whether that was a realistic assumption or an overestimate leading to too stringent a threshold. Participants felt strongly that, if a threshold is adopted, it should not create a barrier to publication for GxE studies: we should not conflate what is publishable with what is true. As has been noted previously [Ghazarian, et al. 2013; Mechanic, et al. 2012], it is important to publish or report data from well-designed, properly analyzed studies that did not achieve statistical significance. The group supported a web-based clearinghouse or repository for results of putative interactions while more evidence is being collected. There is concern that many initial GEWIS analyses are “sitting in desk drawers”. Additionally, if a threshold is chosen, it should not be viewed as replacing or superseding replication in an independent population. However, as discussed below, replication of GxE interactions poses special challenges.
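
The arithmetic behind these thresholds is simple Bonferroni scaling. The required sample size grows roughly with the square of the sum of two standard normal quantiles: one for the significance level and one for the desired power. The sketch below uses only the Python standard library. It divides a GWAS-style threshold of 10⁻⁸ by 100 exposures. It then compares the approximate sample size needed at that threshold with the size needed at p = 2.8×10⁻⁴, holding the effect size and power fixed. The 80% power and the function names are illustrative assumptions.

```python
from statistics import NormalDist


def bonferroni_threshold(base_alpha, n_exposures):
    """Per-test threshold after dividing by the number of exposures tested."""
    return base_alpha / n_exposures


def z_two_sided(alpha):
    """Critical |z| for a two-sided test at level alpha."""
    return -NormalDist().inv_cdf(alpha / 2)


def relative_sample_size(alpha_strict, alpha_lenient, power=0.8):
    """Ratio of required sample sizes, since n scales with (z_alpha + z_power)^2."""
    z_power = NormalDist().inv_cdf(power)
    return ((z_two_sided(alpha_strict) + z_power)
            / (z_two_sided(alpha_lenient) + z_power)) ** 2


threshold = bonferroni_threshold(1e-8, 100)
print(threshold, round(z_two_sided(threshold), 2))
print(round(relative_sample_size(threshold, 2.8e-4), 2))
```

Under these assumptions, detecting the same effect at the stricter threshold requires a substantially larger sample than reaching the p-value observed in the NAT2 example.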

As an alternative to a set threshold, participants discussed a recently published set of guidelines for a standardized and transparent assessment of the evidence for GxE interactions [Boffetta, et al. 2012]. Under these guidelines, a true GxE interaction is more likely to have strong evidence, replication, protection from bias, and high prior plausibility [Boffetta, et al. 2012]. However, some participants noted that these guidelines do not fully address certain complex issues. One example is joint effects that show no interaction relative to a multiplicative null model but do show interaction relative to an additive null model. Another is exposure-related population stratification. Further, because so much about human physiology is unknown, it is difficult to calibrate the level of a priori knowledge, particularly for agnostic genome-wide approaches.

Replication

Replication has proven to be an essential component of any genetic association study [Chanock, et al. 2007]. A contributing factor to the success of GWAS was the standard requirement for independent replication [Chanock, et al. 2007; Kraft, et al. 2009]. The division of GWAS into discovery and replication stages was also motivated by cost: GWAS were expensive, and follow-up genotyping was more cost-efficient. As we combine and create studies large enough to reach adequate power in the discovery phase, it may not be possible to assemble similarly large studies for replication [Hernan and Savitz 2013]. Current practice often focuses on whether stringent genome-wide significance standards are achieved in a combined analysis of all available data [Eeles, et al. 2013; Michailidou, et al. 2013; Skol, et al. 2006].

Think Tank participants discussed several challenges unique to replication in studies of GxE. One challenge is determining which aspects of the GxE interplay should be replicated. As discussed above, the goal of GxE studies is often to model the joint effects [Weinberg 2012b]; therefore, such joint effects should be the focus of replication. However, testing whether joint effects are similar across studies is not as straightforward as testing a single interaction term. Moreover, without similar patterns of joint effects, replication of the interaction coefficient is not interpretable [Mechanic, et al. 2012].

Another challenge is obtaining an appropriate replication population. Differences in the underlying distribution of environmental exposures, LD structure, and genetic modifiers can reduce the power to detect an interaction in independent studies. An appropriate replication population may not exist in three situations [Mechanic, et al. 2012]: when an investigator is examining a rare disease, genotype or environmental exposure; when exposures are unique to particular populations; or when the initial finding was obtained within a large consortium comprising all known studies of a specific outcome. Differences in the method or timing of environmental exposure assessment, or in the measurement of genetic variants (e.g., genotyping platform coverage), may also reduce the suitability of potential replication populations for a specific GxE interaction.

The Think Tank discussion covered possible ways of addressing challenges in replication. One suggestion was to incorporate supporting biological knowledge. Functional studies may serve as independent support for a true association when an appropriate independent replication population is not available. However, the best approach for such incorporation is not clear. Participants stressed the importance of strong communication between the investigators performing the initial GEWIS and those performing biological assays for follow-up. The value of independent replication in a second population versus biological assays must be weighed against the cost of follow-up. Another possibility is to focus on environmental exposures for which replication datasets are known to be available. These replication-ready datasets could include cohorts with pre-diagnostic data and stored biospecimens, and existing data could be leveraged through enhanced data sharing. Sharing of null results through a clearinghouse, as discussed above, would foster replication of both positive and negative findings. A final consideration relates to the size of the population used for the initial observation. Some interactions are observed in well-designed analyses with suitably large sample sizes (50–100K individuals) drawn from multiple studies. Such findings have stronger credibility, and the requirement for independent replication may therefore be lessened. However, heterogeneity across the component studies may provide insights about effect modifiers. It may also provide evidence that the SNP found to interact with the environmental exposure is a marker rather than the causal variant.
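
Interaction estimates from component studies are often pooled, and checked for heterogeneity, with an inverse-variance fixed-effect meta-analysis. The hypothetical sketch below combines per-study log interaction odds ratios and reports Cochran's Q and the I² statistic. It assumes each study used comparable exposure and genotype coding. As noted above, agreement of interaction coefficients is interpretable only if the joint effects are also similar.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2.

    Returns (pooled estimate, pooled standard error, Q, I^2).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty estimates and standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared


# Hypothetical log interaction ORs and standard errors from three studies.
print(fixed_effect_meta([0.25, 0.40, 0.10], [0.10, 0.15, 0.12]))
```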

DISCUSSION

GWAS have identified several hundred variants for cancer risk, and new variants continue to be reported from large-scale meta-analyses [Hindorff, et al. 2013; Sakoda, et al. 2013]. The “post-GWAS” era is seeing advances in exome and whole-genome sequencing and rare-variant analysis. It is also seeing functional follow-up and epidemiologic examinations through projects such as the NCI-sponsored Genetic Associations and Mechanisms of Oncology (“GAME-ON”) Initiative [http://epi.grants.cancer.gov/gameon/]. GAME-ON has been actively involved in developing large collections of samples for which environmental exposures have been well harmonized, allowing large-scale genome-wide interaction analyses to be conducted effectively. Adopting standards for environmental exposure assessment, such as those presented by the PhenX initiative, will help researchers aggregate sufficiently harmonized studies for joint analysis [Hamilton, et al. 2011]. There have also been advances in measuring exposures and in developing tools and technologies to assess environmental factors [National Research Council 2012].

This paper has focused on contemporary analysis issues for case-control studies, with the goal of guiding the conversation about what investigators should do today with the data that they have in hand. However, the Think Tank participants recognized that there will be many opportunities to advance our understanding of GxE interactions. These opportunities include development of more comprehensive methods for identifying and assessing environmental factors, discussion of novel study designs, and discussion of the need for dissemination and sharing of resources.

In addition to identifying new genetic factors, GxE studies could be used to identify novel environmental risk factors and to improve our understanding of the etiologic role of those factors in disease. The Think Tank participants recognized an imbalance in GxE research. Research on G has proceeded faster and garnered more attention, because technological developments have allowed precise, high-throughput assessment of relevant genetic variants. Participants drew on the analogy of the claws of the male fiddler crab, with G being the oversized chela and E being the smaller claw [Wild 2005]. As with GxE interactions, both claws are important: the larger is used in clashes as part of courtship, while the smaller is used for eating. Moreover, if the large claw is lost, the smaller one will grow larger. Participants raised the challenge of investing in technology for exposure characterization to bring it up to par with genetic data. Environmental exposures are not as static as genetic variants. Even so, part of this challenge may be addressed by developing low-cost approaches for large-scale assessment and characterization over time, including methods development and applications in longitudinal studies. There was discussion of whether to consider discovery studies for novel risk factors using exhaustive, systematic and agnostic analysis across multiple environmental factors [Patel, et al. 2013; Rappaport 2011]. Participants also discussed the potential for including epigenetic components, such as non-coding RNA, in GxE studies. An additional challenge was determining which methods are high priority in an environment of scarce resources.

The Think Tank participants discussed advantages and opportunities for conducting new studies using innovative designs. The focus of this paper has been on case-control or nested case-control study designs. Family-based designs may provide some advantages over case-control studies due to robustness to population stratification, ability to make genetic inferences by comparisons to family members, increased power for assessing GxE interactions [Witte, et al. 1999], and resistance to self-selection of controls [Shi, et al. 2011; Weinberg 2012a]. Extensions to these methods have been developed that complement some of the case-control methods described in this paper [Chatterjee, et al. 2005; Kistner, et al. 2009]. However, population stratification remains an issue in family studies of GxE interactions. Several alternatives such as the tetrad design or sibling augmented case-only (SACO) are being explored to address those robustness issues [Shi, et al. 2011; Weinberg, et al. 2011], and work in this area is merited to facilitate the use of data collected under these family study designs.

Finally, the Think Tank participants discussed the importance of developing collaborative resources. As discussed above, there was support for a clearinghouse to share results from GxE studies, and that this should not be limited to “significant” or publishable findings. There was also an emphasis on providing a resource or clearinghouse for sharing software. Methods and software not only need to be developed and validated, but to be implemented they need to be fast, uncomplicated, well-documented and available to the broader research community. Sharing of this software could follow a model similar to the recently developed Genetic Simulation Resources [Peng, et al. 2013].

Overall, the Think Tank stressed the importance of careful consideration of design and analysis in studies of GxE, both for characterization and for discovery. The design and analysis should be motivated by the underlying scientific question and rationale. Key items, including replication, power, measurement error and credibility, should be considered. As the field moves forward, we will need to consider additional designs and analytical approaches in order to move from the current small number of success stories to a better understanding of the interplay of genetic and environmental factors.

The NCI Think Tank participants are as follows (* indicates speakers, panelists and discussion moderators)

Christian C. Abnet*, National Cancer Institute, Division of Cancer Epidemiology and Genetics, Nutritional Epidemiology Branch, Bethesda, MD, USA; Christopher Amos*, Dartmouth-Hitchcock Norris Cotton Cancer Center, Community and Family Medicine, Hanover, NH, USA; David Balshaw*, National Institute of Environmental Health Sciences, Center for Risk & Integrated Sciences, Research Triangle Park, NC, USA; Heike Bickeböller, University of Göttingen, Department of Genetic Epidemiology, Göttingen, Germany; Laura Jean Bierut*, Washington University, Department of Psychiatry, St. Louis, MO, USA; Paolo Boffetta*, Mount Sinai School of Medicine, Institute for Translational Epidemiology, New York, NY, USA; Melissa Bondy*, Baylor College of Medicine, Department of Pediatrics Hematology and Oncology Division, Houston, TX, USA; Stephen Chanock*, National Cancer Institute, Division of Cancer Epidemiology and Genetics, Bethesda, MD, USA; Nilanjan Chatterjee*, National Cancer Institute, Division of Cancer Epidemiology and Genetics, Biostatistics Branch, Bethesda, MD, USA; Huann-Sheng Chen, National Cancer Institute, Division of Cancer Control and Population Sciences, Surveillance Research Program, Bethesda, MD, USA; Nancy Cox*, University of Chicago, Department of Medicine, Chicago, IL, USA; Immaculata De Vivo, Harvard School of Public Health, Department of Epidemiology, Boston, MA, USA; Rao Divi, National Cancer Institute, Division of Cancer Control and Population Sciences, Epidemiology and Genomics Research Program, Bethesda, MD, USA; Josee Dupuis, Boston University School of Public Health, Department of Biostatistics, Boston, MA, USA; Gary Ellison, National Cancer Institute, Division of Cancer Control and Population Sciences, Epidemiology and Genomics Research Program, Bethesda, MD, USA; Margaret Daniele Fallin*, Johns Hopkins Bloomberg School of Public Health, Department of Epidemiology, Baltimore, MD, USA; W. James Gauderman, Keck School of Medicine of University of Southern California, Department of Preventive Medicine, Los Angeles, CA, USA; Elizabeth Gillanders*, National Cancer Institute, Division of Cancer Control and Population Sciences, Epidemiology and Genomics Research Program, Bethesda, MD, USA; Christopher Haiman, Keck School of Medicine of University of Southern California, Department of Preventive Medicine, Los Angeles, CA, USA; Carolyn Hutter, National Cancer Institute, Division of Cancer Control and Population Sciences, Epidemiology and Genomics Research Program, Bethesda, MD, USA; Naoko Ishibe Simonds, National Cancer Institute, Division of Cancer Control and Population Sciences, Epidemiology and Genomics Research Program, Bethesda, MD, USA; Edwin Iversen, Duke University, Department of Statistical Science, Durham, NC, USA; Muin J. Khoury*, National Cancer Institute, Division of Cancer Control and Population Sciences, Epidemiology and Genomics Research Program, Bethesda, MD, USA; Peter Kraft*, Harvard School of Public Health, Department of Epidemiology and Biostatistics, Boston, MA, USA; Loic Le Marchand, University of Hawaii Cancer Center, Epidemiology Program, Honolulu, HI, USA; Dongxin Lin, Cancer Institute & Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, State Key Laboratory of Molecular Oncology, Beijing, China; Kimberly McAllister*, National Institute of Environmental Health Sciences, Division of Extramural Research and Training, Susceptibility & Population Health Branch, Research Triangle Park, NC, USA; Leah Mechanic*, National Cancer Institute, Division of Cancer Control and Population Sciences, Epidemiology and Genomics Research Program, Bethesda, MD, USA; Ulrike Peters*, Fred Hutchinson Cancer Research Center, Public Health Sciences Division, Seattle, WA, USA; Ross Prentice*, Fred Hutchinson Cancer Research Center, Public Health Sciences Division, Seattle, WA, USA; Timothy Rebbeck, University of Pennsylvania, Department of Biostatistics and Epidemiology, Philadelphia, PA, USA; Jill Reedy, National Cancer Institute, Division of Cancer Control and Population Sciences, Applied Research Program, Bethesda, MD, USA; Nathaniel Rothman*, National Cancer Institute, Division of Cancer Epidemiology and Genetics, Occupational and Environmental Epidemiology Branch, Bethesda, MD, USA; Sheri Schully, National Cancer Institute, Division of Cancer Control and Population Sciences, Epidemiology and Genomics Research Program, Bethesda, MD, USA; Daniela Seminara, National Cancer Institute, Division of Cancer Control and Population Sciences, Epidemiology and Genomics Research Program, Bethesda, MD, USA; Daniel Shaughnessy, National Institute of Environmental Health Sciences, Division of Extramural Research and Training, Susceptibility & Population Health Branch, Research Triangle Park, NC, USA; Sanjay Shete, MD Anderson Cancer Center, Program in Biomathematics and Biostatistics, Houston, TX, USA; Donna Spiegelman*, Harvard School of Public Health, Departments of Epidemiology and Biostatistics, Boston, MA, USA; Daniel O. Stram*, Keck School of Medicine of University of Southern California, Department of Preventive Medicine, Los Angeles, CA, USA; Duncan Thomas*, Keck School of Medicine of University of Southern California, Department of Preventive Medicine, Los Angeles, CA, USA; Molin Wang*, Harvard School of Public Health, Department of Epidemiology and Biostatistics, Boston, MA, USA; Wendy Wang, National Cancer Institute, Division of Cancer Prevention, Bethesda, MD, USA; Clarice Weinberg*, National Institute of Environmental Health Sciences, Intramural Research Division, Biostatistics Branch, Research Triangle Park, NC, USA; Deborah M. Winn, National Cancer Institute, Division of Cancer Control and Population Sciences, Bethesda, MD, USA; John S. Witte*, University of California, San Francisco, Department of Epidemiology and Biostatistics, San Francisco, CA, USA

Contributor Information

Carolyn M. Hutter, Epidemiology and Genetics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.

Leah E. Mechanic, Epidemiology and Genetics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.

Nilanjan Chatterjee, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.

Peter Kraft, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts.

Elizabeth M. Gillanders, Epidemiology and Genetics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.

References

  1. Albert PS, Ratnasinghe D, Tangrea J, Wacholder S. Limitations of the case-only design for identifying gene-environment interactions. American Journal of Epidemiology. 2001;154(8):687–693. doi: 10.1093/aje/154.8.687.
  2. Aschard H, Hancock DB, London SJ, Kraft P. Genome-wide meta-analysis of joint tests for genetic and gene-environment interaction effects. Hum Hered. 2010;70(4):292–300. doi: 10.1159/000323318.
  3. Aschard H, Lutz S, Maus B, Duell EJ, Fingerlin TE, Chatterjee N, Kraft P, Van Steen K. Challenges and opportunities in genome-wide environmental interaction (GWEI) studies. Hum Genet. 2012;131(10):1591–1613. doi: 10.1007/s00439-012-1192-0.
  4. Aulchenko YS, Struchalin MV, van Duijn CM. ProbABEL package for genome-wide association analysis of imputed data. BMC Bioinformatics. 2010;11:134. doi: 10.1186/1471-2105-11-134.
  5. Balshaw DM, Kwok RK. Innovative methods for improving measures of the personal environment. Am J Prev Med. 2012;42(5):558–559. doi: 10.1016/j.amepre.2012.02.002.
  6. Baurley JW, Conti DV, Gauderman WJ, Thomas DC. Discovery of complex pathways from observational data. Stat Med. 2010;29(19):1998–2011. doi: 10.1002/sim.3962.
  7. Bhattacharjee S, Chatterjee N, Han SS, Wheeler W. An R package for analysis of case control studies in genetic epidemiology. 2012.
  8. Boffetta P, Winn DM, Ioannidis JP, Thomas DC, Little J, Smith GD, Cogliano VJ, Hecht SS, Seminara D, Vineis P, et al. Recommendations and proposed guidelines for assessing the cumulative evidence on joint effects of genes and environments on cancer occurrence in humans. International Journal of Epidemiology. 2012;41(3):686–704. doi: 10.1093/ije/dys010.
  9. Breiman L. Random Forests. Mach. Learn. 2001;45(1):5–32.
  10. Breslow NE, Day NE. Statistical Methods in Cancer Research, Volume I: The Analysis of Case-Control Studies. Lyon, France: International Agency for Research on Cancer; 1980. IARC Scientific Publications, No. 32.
  11. Brooks PJ, Enoch MA, Goldman D, Li TK, Yokoyama A. The alcohol flushing response: an unrecognized risk factor for esophageal cancer from alcohol consumption. PLoS Med. 2009;6(3):e50. doi: 10.1371/journal.pmed.1000050.
  12. Buzkova P, Lumley T, Rice K. Permutation and parametric bootstrap tests for gene-gene and gene-environment interactions. Ann Hum Genet. 2011;75(1):36–45. doi: 10.1111/j.1469-1809.2010.00572.x.
  13. Campa D, Kaaks R, Le Marchand L, Haiman CA, Travis RC, Berg CD, Buring JE, Chanock SJ, Diver WR, Dostal L, et al. Interactions Between Genetic Variants and Breast Cancer Risk Factors in the Breast and Prostate Cancer Cohort Consortium. Journal of the National Cancer Institute. 2011;103(16):1252–1263. doi: 10.1093/jnci/djr265.
  14. Carreon T, Ruder AM, Schulte PA, Hayes RB, Rothman N, Waters M, Grant DJ, Boissy R, Bell DA, Kadlubar FF, et al. NAT2 slow acetylation and bladder cancer in workers exposed to benzidine. International Journal of Cancer. 2006;118(1):161–168. doi: 10.1002/ijc.21308.
  15. Chanock SJ, Manolio T, Boehnke M, Boerwinkle E, Hunter DJ, Thomas G, Hirschhorn JN, Abecasis G, Altshuler D, Bailey-Wilson JE, et al. Replicating genotype-phenotype associations. Nature. 2007;447(7145):655–660. doi: 10.1038/447655a.
  16. Chatterjee N, Carroll RJ. Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies. Biometrika. 2005;92(2):399–418.
  17. Chatterjee N, Kalaylioglu Z, Carroll RJ. Exploiting gene-environment independence in family-based case-control studies: Increased power for detecting associations, interactions and joint effects. Genetic Epidemiology. 2005;28(2):138–156. doi: 10.1002/gepi.20049.
  18. Chatterjee N, Kalaylioglu Z, Moslehi R, Peters U, Wacholder S. Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions. Am J Hum Genet. 2006;79(6):1002–1016. doi: 10.1086/509704.
  19. Chen GK, Thomas DC. Using biological knowledge to discover higher order interactions in genetic association studies. Genet Epidemiol. 2010;34(8):863–878. doi: 10.1002/gepi.20542.
  20. Cheng KF. A maximum likelihood method for studying gene-environment interactions under conditional independence of genotype and exposure. Stat Med. 2006;25(18):3093–3109. doi: 10.1002/sim.2506.
  21. Cheng KF. Analysis of case-only studies accounting for genotyping error. Ann Hum Genet. 2007;71(Pt 2):238–248. doi: 10.1111/j.1469-1809.2006.00314.x.
  22. Clayton D. Commentary: reporting and assessing evidence for interaction: why, when and how? Int J Epidemiol. 2012;41(3):707–710. doi: 10.1093/ije/dys069.
  23. Cordell HJ. Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans. Hum Mol Genet. 2002;11(20):2463–2468. doi: 10.1093/hmg/11.20.2463.
  24. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet. 2009;10(6):392–404. doi: 10.1038/nrg2579.
  25. Cornelis MC, Hu FB. Gene-environment interactions in the development of type 2 diabetes: recent progress and continuing challenges. Annu Rev Nutr. 2012;32:245–259. doi: 10.1146/annurev-nutr-071811-150648.
  26. Cornelis MC, Tchetgen Tchetgen EJ, Liang L, Qi L, Chatterjee N, Hu FB, Kraft P. Gene-Environment Interactions in Genome-Wide Association Studies: A Comparative Study of Tests Applied to Empirical Studies of Type 2 Diabetes. American Journal of Epidemiology. 2012;175(3):191–202. doi: 10.1093/aje/kwr368.
  27. Dai JY, Kooperberg C, Leblanc M, Prentice RL. Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction. Biometrika. 2012a;99(4):929–944. doi: 10.1093/biomet/ass044.
  28. Dai JY, Logsdon BA, Huang Y, Hsu L, Reiner AP, Prentice RL, Kooperberg C. Simultaneously testing for marginal genetic association and gene-environment interaction. Am J Epidemiol. 2012b;176(2):164–173. doi: 10.1093/aje/kwr521.
  29. de Bakker PI, Ferreira MA, Jia X, Neale BM, Raychaudhuri S, Voight BF. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet. 2008;17(R2):R122–R128. doi: 10.1093/hmg/ddn288.
  30. De Gonzalez AB, Cox DR. Interpretation of Interaction: A Review. Annals of Applied Statistics. 2007;1(2):371–385.
  31. Dempfle A, Scherag A, Hein R, Beckmann L, Chang-Claude J, Schafer H. Gene-environment interactions for complex traits: definitions, methodological requirements and challenges. Eur J Hum Genet. 2008;16(10):1164–1172. doi: 10.1038/ejhg.2008.106.
  32. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, et al. Landscape of transcription in human cells. Nature. 2012;489(7414):101–108. doi: 10.1038/nature11233.
  33. Eeles RA, Olama AA, Benlloch S, Saunders EJ, Leongamornlert DA, Tymrakiewicz M, Ghoussaini M, Luccarini C, Dennis J, Jugurnauth-Little S, et al. Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array. Nat Genet. 2013;45(4):385–391. doi: 10.1038/ng.2560.
  34. Fan R, Zhong M, Wang S, Zhang Y, Andrew A, Karagas M, Chen H, Amos CI, Xiong M, Moore JH. Entropy-based information gain approaches to detect and to characterize gene-gene and gene-environment interactions/correlations of complex diseases. Genet Epidemiol. 2011;35(7):706–721. doi: 10.1002/gepi.20621.
  35. Gail M, Simon R. Testing for qualitative interactions between treatment effects and patient subsets. Biometrics. 1985;41(2):361–372.
  36. Gail MH, Costantino JP, Bryant J, Croyle R, Freedman L, Helzlsouer K, Vogel V. Weighing the risks and benefits of tamoxifen treatment for preventing breast cancer. J Natl Cancer Inst. 1999;91(21):1829–1846. doi: 10.1093/jnci/91.21.1829.
  37. Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risk. Biostatistics. 2005;6(2):227–239. doi: 10.1093/biostatistics/kxi005.
  38. García-Closas M, Lubin JH. Power and Sample Size Calculations in Case-Control Studies of Gene-Environment Interactions: Comments on Different Approaches. American Journal of Epidemiology. 1999;149(8):689–692. doi: 10.1093/oxfordjournals.aje.a009876.
  39. García-Closas M, Malats N, Silverman D, Dosemeci M, Kogevinas M, Hein DW, Tardón A, Serra C, Carrato A, García-Closas R, et al. NAT2 slow acetylation, GSTM1 null genotype, and risk of bladder cancer: results from the Spanish Bladder Cancer Study and meta-analyses. The Lancet. 2005;366(9486):649–659. doi: 10.1016/S0140-6736(05)67137-1.
  40. Garcia-Closas M, Rothman N, Figueroa J, Prokunina-Olsson L, Han SS, Baris D, Jacobs EJ, Malats N, De Vivo I, Albanes D, et al. Common Genetic Polymorphisms Modify the Effect of Smoking on Absolute Risk of Bladder Cancer. Cancer Research. Published OnlineFirst 2013 Mar 27. doi: 10.1158/0008-5472.CAN-12-2388.
  41. Gauderman WJ. Sample size requirements for association studies of gene-gene interaction. Am J Epidemiol. 2002a;155(5):478–484. doi: 10.1093/aje/155.5.478.
  42. Gauderman WJ. Sample size requirements for matched case-control studies of gene-environment interaction. Stat Med. 2002b;21(1):35–50. doi: 10.1002/sim.973.
  43. Gauderman WJ, Zhang P, Morrison J. A program to detect GxE interactions in a GWAS. Version Beta. University of Southern California, Los Angeles; 2013a.
  44. Gauderman WJ, Zhang P, Morrison JL, Lewinger JP. Finding Novel Genes by Testing G × E Interactions in a Genome-Wide Association Study. Genet Epidemiol. 2013b doi: 10.1002/gepi.21748.
  45. Ghazarian AA, Simonds NI, Bennett K, Pimentel CB, Ellison GL, Gillanders EM, Schully SD, Mechanic LE. A Review of NCI's Extramural Grant Portfolio: Identifying Opportunities for Future Research in Genes and Environment in Cancer. Cancer Epidemiol Biomarkers Prev. 2013 doi: 10.1158/1055-9965.EPI-13-0156.
  46. Greenland S. Statistical uncertainty due to misclassification: implications for validation substudies. Journal of Clinical Epidemiology. 1988;41(12):1167–1174. doi: 10.1016/0895-4356(88)90020-0.
  47. Haldane J. Heredity and politics. New York: W. W. Norton & Company; 1938.
  48. Hamilton CM, Strader LC, Pratt JG, Maiese D, Hendershot T, Kwok RK, Hammond JA, Huggins W, Jackman D, Pan H, et al. The PhenX Toolkit: get the most from your measures. Am J Epidemiol. 2011;174(3):253–260. doi: 10.1093/aje/kwr193.
  49. Hamza TH, Chen H, Hill-Burns EM, Rhodes SL, Montimurro J, Kay DM, Tenesa A, Kusel VI, Sheehan P, Eaaswarkhanth M, et al. Genome-Wide Gene-Environment Study Identifies Glutamate Receptor Gene GRIN2A as a Parkinson's Disease Modifier Gene via Interaction with Coffee. PLoS Genet. 2011;7(8):e1002237. doi: 10.1371/journal.pgen.1002237.
  50. Han SS, Rosenberg PS, Garcia-Closas M, Figueroa JD, Silverman D, Chanock SJ, Rothman N, Chatterjee N. Likelihood Ratio Test for Detecting Gene (G)-Environment (E) Interactions Under an Additive Risk Model Exploiting G-E Independence for Case-Control Data. American Journal of Epidemiology. 2012;176(11):1060–1067. doi: 10.1093/aje/kws166.
  51. Hancock DB, Artigas MS, Gharib SA, Henry A, Manichaikul A, Ramasamy A, Loth DW, Imboden M, Koch B, McArdle WL, et al. Genome-Wide Joint Meta-Analysis of SNP and SNP-by-Smoking Interaction Identifies Novel Loci for Pulmonary Function. PLoS Genet. 2012;8(12):e1003098. doi: 10.1371/journal.pgen.1003098.
  52. Hein R, Beckmann L, Chang-Claude J. Sample size requirements for indirect association studies of gene–environment interactions (G × E). Genetic Epidemiology. 2008;32(3):235–245. doi: 10.1002/gepi.20298.
  53. Hernan MA, Savitz DA. From "big epidemiology" to "colossal epidemiology": when all eggs are in one basket. Epidemiology. 2013;24(3):344–345. doi: 10.1097/EDE.0b013e31828c7694.
  54. Hindorff LA, MacArthur J, Morales J, Junkins HA, Hall PN, Klemm AK, Manolio TA. A Catalog of Published Genome-Wide Association Studies. NHGRI; 2013. Available at: www.genome.gov/gwastudies.
  55. Hirschhorn JN, Altshuler D. Once and again-issues surrounding replication in genetic association studies. J Clin Endocrinol Metab. 2002;87(10):4438–4441. doi: 10.1210/jc.2002-021329.
  56. Holcroft CA, Spiegelman D. Design of validation studies for estimating the odds ratio of exposure-disease relationships when exposure is misclassified. Biometrics. 1999;55(4):1193–1201. doi: 10.1111/j.0006-341x.1999.01193.x.
  57. Hosmer DW, Hosmer T, Le Cessie S, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Stat Med. 1997;16(9):965–980. doi: 10.1002/(sici)1097-0258(19970515)16:9<965::aid-sim509>3.0.co;2-o.
  58. Hsu L, Jiao S, Dai JY, Hutter CM, Peters U, Kooperberg C. Powerful Cocktail Methods for Detecting Genome-wide Gene-Environment Interaction. Genetic Epidemiology. 2012;36(3):183–194. doi: 10.1002/gepi.21610.
  59. Hung RJ, Brennan P, Malaveille C, Porru S, Donato F, Boffetta P, Witte JS. Using hierarchical modeling in genetic association studies with multiple markers: application to a case-control study of bladder cancer. Cancer Epidemiol Biomarkers Prev. 2004;13(6):1013–1021.
  60. Hunter DJ. Gene-environment interactions in human diseases. Nat Rev Genet. 2005;6(4):287–298. doi: 10.1038/nrg1578.
  61. Hutter CM, Chang-Claude J, Slattery ML, Pflugeisen BM, Lin Y, Duggan D, Nan H, Lemire M, Rangrej J, Figueiredo JC, et al. Characterization of gene-environment interactions for colorectal cancer susceptibility loci. Cancer Research. 2012 doi: 10.1158/0008-5472.CAN-11-4067.
  62. Hutter CM, Slattery ML, Duggan DJ, Muehling J, Curtin K, Hsu L, Beresford SA, Rajkovic A, Sarto GE, Marshall JR, et al. Characterization of the association between 8q24 and colon cancer: gene-environment exploration and meta-analysis. BMC Cancer. 2010;10(670):670. doi: 10.1186/1471-2407-10-670.
  63. Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2(8):e124. doi: 10.1371/journal.pmed.0020124.
  64. Jiao S, Hsu L, Hutter CM, Peters U. The use of imputed values in the meta-analysis of genome-wide association studies. Genet Epidemiol. 2011;35(7):597–605. doi: 10.1002/gepi.20608.
  65. Katie S, Tim B, Jenny B. Case-control study power and sample size calculations using Stata. Stata Users Group. 2003.
  66. Khoury MJ, Adams MJ, Jr, Flanders WD. An epidemiologic approach to ecogenetics. Am J Hum Genet. 1988;42(1):89–95.
  67. Khoury MJ, Gwinn M, Clyne M, Yu W. Genetic epidemiology with a capital E, ten years after. Genet Epidemiol. 2011;35(8):845–852. doi: 10.1002/gepi.20634.
  68. Khoury MJ, Gwinn M, Ioannidis JP. The emergence of translational epidemiology: from scientific discovery to population health impact. Am J Epidemiol. 2010;172(5):517–524. doi: 10.1093/aje/kwq211.
  69. Khoury MJ, Wacholder S. Invited Commentary: From Genome-Wide Association Studies to Gene-Environment-Wide Interaction Studies—Challenges and Opportunities. American Journal of Epidemiology. 2009;169(2):227–230. doi: 10.1093/aje/kwn351.
  70. Kistner EO, Shi M, Weinberg CR. Using cases and parents to study multiplicative gene-by-environment interaction. Am J Epidemiol. 2009;170(3):393–400. doi: 10.1093/aje/kwp118.
  71. Knol MJ, Egger M, Scott P, Geerlings MI, Vandenbroucke JP. When one depends on the other: reporting of interaction in case-control and cohort studies. Epidemiology. 2009;20(2):161–166. doi: 10.1097/EDE.0b013e31818f6651.
  72. Knol MJ, VanderWeele TJ. Recommendations for presenting analyses of effect modification and interaction. Int J Epidemiol. 2012;41(2):514–520. doi: 10.1093/ije/dyr218.
  73. Knol MJ, VanderWeele TJ, Groenwold RH, Klungel OH, Rovers MM, Grobbee DE. Estimating measures of interaction on an additive scale for preventive exposures. Eur J Epidemiol. 2011;26(6):433–438. doi: 10.1007/s10654-011-9554-9.
  74. Kooperberg C, Leblanc M. Increasing the power of identifying gene×gene interactions in genome-wide association studies. Genet Epidemiol. 2008;32(3):255–263. doi: 10.1002/gepi.20300.
  75. Kraft P, Raychaudhuri S. Complex diseases, complex genes: keeping pathways on the right track. Epidemiology. 2009;20(4):508–511. doi: 10.1097/EDE.0b013e3181a93b98.
  76. Kraft P, Yen YC, Stram DO, Morrison J, Gauderman WJ. Exploiting gene-environment interaction to detect genetic associations. Hum Hered. 2007;63(2):111–119. doi: 10.1159/000099183.
  77. Kraft P, Zeggini E, Ioannidis JP. Replication in genome-wide association studies. Stat Sci. 2009;24(4):561–573. doi: 10.1214/09-STS290.
  78. Li D, Conti DV. Detecting Gene-Environment Interactions Using a Combined Case-Only and Case-Control Approach. American Journal of Epidemiology. 2009;169(4):497–504. doi: 10.1093/aje/kwn339.
  79. Li Y, Sidore C, Kang HM, Boehnke M, Abecasis GR. Low-coverage sequencing: implications for design of complex trait association studies. Genome Res. 2011;21(6):940–951. doi: 10.1101/gr.117259.110.
  80. Lin X, Lee S, Christiani DC, Lin X. Test for interactions between a genetic marker set and environment in generalized linear models. Biostatistics. 2013 doi: 10.1093/biostatistics/kxt006.
  81. Lindstrom S, Yen YC, Spiegelman D, Kraft P. The impact of gene-environment dependence and misclassification in genetic association studies incorporating gene-environment interactions. Hum Hered. 2009;68(3):171–181. doi: 10.1159/000224637.
  82. Liu C-y, Maity A, Lin X, Wright R, Christiani D. Design and analysis issues in gene and environment studies. Environmental Health. 2012;11(1):93. doi: 10.1186/1476-069X-11-93.
  83. Ma J, Xiao F, Xiong M, Andrew AS, Brenner H, Duell EJ, Haugen A, Hoggart C, Hung RJ, Lazarus P, et al. Natural and orthogonal interaction framework for modeling gene-environment interactions with application to lung cancer. Hum Hered. 2012;73(4):185–194. doi: 10.1159/000339906.
  84. Manning AK, Hivert M-F, Scott RA, Grimsby JL, Bouatia-Naji N, Chen H, Rybin D, Liu C-T, Bielak LF, Prokopenko I, et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nat Genet. 2012;44(6):659–669. doi: 10.1038/ng.2274.
  85. Manning AK, LaValley M, Liu C-T, Rice K, An P, Liu Y, Miljkovic I, Rasmussen-Torvik L, Harris TB, Province MA, et al. Meta-analysis of gene-environment interaction: joint estimation of SNP and SNP × environment regression coefficients. Genetic Epidemiology. 2011;35(1):11–18. doi: 10.1002/gepi.20546.
  86. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, McCarthy MI, Ramos EM, Cardon LR, Chakravarti A, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–753. doi: 10.1038/nature08494.
  87. Mechanic LE, Chen H-S, Amos CI, Chatterjee N, Cox NJ, Divi RL, Fan R, Harris EL, Jacobs K, Kraft P, et al. Next generation analytic tools for large scale genetic epidemiology studies of complex diseases. Genetic Epidemiology. 2012;36(1):22–35. doi: 10.1002/gepi.20652.
  88. Mefford J, Witte JS. The Covariate's Dilemma. PLoS Genet. 2012;8(11):e1003096. doi: 10.1371/journal.pgen.1003096.
  89. Michailidou K, Hall P, Gonzalez-Neira A, Ghoussaini M, Dennis J, Milne RL, Schmidt MK, Chang-Claude J, Bojesen SE, Bolla MK, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet. 2013;45(4):353–361. doi: 10.1038/ng.2563.
  90. Milne R, Gaudet M, Spurdle A, Fasching P, Couch F, Benitez J, Arias Perez JI, Zamora MP, Malats N, dos Santos Silva I, et al. Assessing interactions between the associations of common genetic susceptibility variants, reproductive history and body mass index with breast cancer risk in the breast cancer association consortium: a combined case-control study. Breast Cancer Research. 2010;12(6):R110. doi: 10.1186/bcr2797.
  91. Moore JH, Asselbergs FW, Williams SM. Bioinformatics challenges for genome-wide association studies. Bioinformatics. 2010;26(4):445–455. doi: 10.1093/bioinformatics/btp713.
  92. Mukherjee B, Ahn J, Gruber SB, Chatterjee N. Testing Gene-Environment Interaction in Large-Scale Case-Control Association Studies: Possible Choices and Comparisons. American Journal of Epidemiology. 2012;175(3):177–190. doi: 10.1093/aje/kwr367.
  93. Mukherjee B, Chatterjee N. Exploiting gene-environment independence for analysis of case-control studies: an empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics. 2008;64(3):685–694. doi: 10.1111/j.1541-0420.2007.00953.x.
  94. Murcray CE, Lewinger JP, Conti DV, Thomas DC, Gauderman WJ. Sample size requirements to detect gene-environment interactions in genome-wide association studies. Genetic Epidemiology. 2011;35(3):201–210. doi: 10.1002/gepi.20569.
  95. Murcray CE, Lewinger JP, Gauderman WJ. Gene-Environment Interaction in Genome-Wide Association Studies. American Journal of Epidemiology. 2009;169(2):219–226. doi: 10.1093/aje/kwn353.
  96. National Research Council. Exposure Science in the 21st Century: A Vision and a Strategy. Washington, DC: The National Academies Press; 2012.
  97. Nickels S, Truong T, Hein R, Stevens K, Buck K, Behrens S, Eilber U, Schmidt M, Haberle L, Vrieling A, et al. Evidence of Gene-Environment Interactions between Common Breast Cancer Susceptibility Loci and Established Environmental Risk Factors. PLoS Genet. 2013;9(3):e1003284. doi: 10.1371/journal.pgen.1003284.
  98. Panagiotou OA, Ioannidis JP; Genome-Wide Significance Project. What should the genome-wide significance threshold be? Empirical replication of borderline genetic associations. Int J Epidemiol. 2012;41(1):273–286. doi: 10.1093/ije/dyr178.
  99. Patel CJ, Chen R, Kodama K, Ioannidis JP, Butte AJ. Systematic identification of interaction effects between genome- and environment-wide associations in type 2 diabetes mellitus. Hum Genet. 2013 doi: 10.1007/s00439-012-1258-z.
  100. Peng B, Chen HS, Mechanic LE, Racine B, Clarke J, Clarke L, Gillanders E, Feuer EJ. Genetic Simulation Resources: a website for the registration and discovery of genetic data simulators. Bioinformatics. 2013 doi: 10.1093/bioinformatics/btt094.
  101. Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat Med. 1994;13(2):153–162. doi: 10.1002/sim.4780130206.
  102. Prentice RL. Empirical evaluation of gene and environment interactions: methods and potential. J Natl Cancer Inst. 2011;103(16):1209–1210. doi: 10.1093/jnci/djr279.
  103. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–575. doi: 10.1086/519795.
  104. Rappaport SM. Implications of the exposome for exposure science. J Expo Sci Environ Epidemiol. 2011;21(1):5–9. doi: 10.1038/jes.2010.50.
  105. Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. The American Journal of Human Genetics. 2001;69(1):138–147. doi: 10.1086/321276.
  106. Rothman K. Modern Epidemiology. Boston: Little, Brown; 1986.
  107. Rothman N, Garcia-Closas M, Chatterjee N, Malats N, Wu X, Figueroa JD, Real FX, Van Den Berg D, Matullo G, Baris D, et al. A multi-stage genome-wide association study of bladder cancer identifies multiple susceptibility loci. Nat Genet. 2010;42(11):978–984. doi: 10.1038/ng.687.
  108. Sakoda LC, Jorgenson E, Witte JS. Turning of COGS moves forward findings for hormonally mediated cancers. Nat Genet. 2013;45(4):345–348. doi: 10.1038/ng.2587.
  109. Satagopan JM, Elston RC. Evaluation of removable statistical interaction for binary traits. Stat Med. 2013;32(7):1164–1190. doi: 10.1002/sim.5628.
  110. Shi M, Umbach DM, Weinberg CR. Family-based gene-by-environment interaction studies: revelations and remedies. Epidemiology. 2011;22(3):400–407. doi: 10.1097/EDE.0b013e318212fec6.
  111. Siegert S, Hampe J, Schafmayer C, von Schonfels W, Egberts JH, Forsti A, Chen B, Lascorz J, Hemminki K, Franke A, et al. Genome-wide investigation of gene-environment interactions in colorectal cancer. Hum Genet. 2013;132(2):219–231. doi: 10.1007/s00439-012-1239-2.
  112. Siemiatycki J, Thomas DC. Biological models and statistical interactions: an example from multistage carcinogenesis. Int J Epidemiol. 1981;10(4):383–387. doi: 10.1093/ije/10.4.383.
  113. Sinnott JA, Kraft P. Artifact due to differential error when cases and controls are imputed from different platforms. Hum Genet. 2012;131(1):111–119. doi: 10.1007/s00439-011-1054-1.
  114. Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat Genet. 2006;38(2):209–213. doi: 10.1038/ng1706.
  115. Spiegelman D. Validation Studies. In: El-Shaarawi, Piegorsch WW, editors. Encyclopedia of Environmetrics. England: John Wiley & Sons Ltd; 2002. pp. 2311–2313.
  116. Spiegelman D. Approaches to uncertainty in exposure assessment in environmental epidemiology. Annu Rev Public Health. 2010;31:149–63. doi: 10.1146/annurev.publhealth.012809.103720.
  117. Spiegelman D. Reliability studies. In: Colton T, Armitage P, editors. Encyclopedia of Biostatistics. Sussex, England: Wiley; 1998. pp. 3771–3775.
  118. Thomas D. Gene–environment-wide association studies: emerging approaches. Nat Rev Genet. 2010;11(4):259–272. doi: 10.1038/nrg2764.
  119. Thomas DC. Genetic epidemiology with a capital "E". Genet Epidemiol. 2000;19(4):289–300. doi: 10.1002/1098-2272(200012)19:4<289::AID-GEPI2>3.0.CO;2-P.
  120. Thomas DC, Lewinger JP, Murcray CE, Gauderman WJ. Invited Commentary: GE-Whiz! Ratcheting Gene-Environment Studies up to the Whole Genome and the Whole Exposome. American Journal of Epidemiology. 2012;175(3):203–207. doi: 10.1093/aje/kwr365.
  121. Thompson WD. Effect modification and the limits of biological inference from epidemiologic data. J Clin Epidemiol. 1991;44(3):221–232. doi: 10.1016/0895-4356(91)90033-6.
  122. Travis RC, Reeves GK, Green J, Bull D, Tipper SJ, Baker K, Beral V, Peto R, Bell J, Zelenika D, et al. Gene–environment interactions in 7610 women with breast cancer: prospective evidence from the Million Women Study. The Lancet. 2010;375(9732):2143–2151. doi: 10.1016/S0140-6736(10)60636-8.
  123. VanderWeele TJ. Sufficient Cause Interactions and Statistical Interactions. Epidemiology. 2009;20(1):6–13. doi: 10.1097/EDE.0b013e31818f69e7.
  124. VanderWeele TJ. A Word and That to Which it Once Referred Assessing "Biologic" Interaction. Epidemiology. 2011;22(4):612–613. doi: 10.1097/EDE.0b013e31821db393. [DOI] [PubMed] [Google Scholar]
  125. VanderWeele TJ, Robins JM. The identification of synergism in the sufficient-component-cause framework. Epidemiology. 2007;18(3):329–339. doi: 10.1097/01.ede.0000260218.66432.88. [DOI] [PubMed] [Google Scholar]
  126. Wacholder S, Chanock S, Garcia-Closas M, El Ghormli L, Rothman N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J Natl Cancer Inst. 2004;96(6):434–442. doi: 10.1093/jnci/djh075. [DOI] [PMC free article] [PubMed] [Google Scholar]
  127. Wakefield J. Commentary: Genome-wide significance thresholds via Bayes factors. Int J Epidemiol. 2012;41(1):286–291. doi: 10.1093/ije/dyr241. [DOI] [PMC free article] [PubMed] [Google Scholar]
  128. Walter SD, Holford TR. Additive, Multiplicative, and Other Models for Disease Risks. American Journal of Epidemiology. 1978;108(5):341–346. doi: 10.1093/oxfordjournals.aje.a112629. [DOI] [PubMed] [Google Scholar]
  129. Wang K, Li M, Hakonarson H. Analysing biological pathways in genome-wide association studies. Nat Rev Genet. 2010;11(12):843–854. doi: 10.1038/nrg2884. [DOI] [PubMed] [Google Scholar]
  130. Weinberg CR. Commentary: Thoughts on assessing evidence for gene by environment interaction. International Journal of Epidemiology. 2012a doi: 10.1093/ije/dys048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  131. Weinberg CR. Interaction and Exposure Modification: Are We Asking the Right Questions? American Journal of Epidemiology. 2012b doi: 10.1093/aje/kwr495. [DOI] [PMC free article] [PubMed] [Google Scholar]
  132. Weinberg CR, Shi M, Umbach DM. A Sibling-augmented Case-only Approach for Assessing Multiplicative Gene-Environment Interactions. American Journal of Epidemiology. 2011;174(10):1183–1189. doi: 10.1093/aje/kwr231. [DOI] [PMC free article] [PubMed] [Google Scholar]
  133. Wild CP. Complementing the genome with an "exposome": the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol Biomarkers Prev. 2005;14(8):1847–1850. doi: 10.1158/1055-9965.EPI-05-0456. [DOI] [PubMed] [Google Scholar]
  134. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–2191. doi: 10.1093/bioinformatics/btq340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  135. Wilson MA, Iversen ES, Clyde MA, Schmidler SC, Schildkraut JM. Bayesian Model Search and Multilevel Inference for Snp Association Studies. Ann Appl Stat. 2010;4(3):1342–1364. doi: 10.1214/09-aoas322. [DOI] [PMC free article] [PubMed] [Google Scholar]
  136. Witte JS, Gauderman WJ, Thomas DC. Asymptotic bias and efficiency in case-control studies of candidate genes and gene-environment interactions: basic family designs. Am J Epidemiol. 1999;149(8):693–705. doi: 10.1093/oxfordjournals.aje.a009877. [DOI] [PubMed] [Google Scholar]
  137. Wong MY, Day NE, Luan JA, Wareham NJ. Estimation of magnitude in gene-environment interactions in the presence of measurement error. Stat Med. 2004;23(6):987–998. doi: 10.1002/sim.1662. [DOI] [PubMed] [Google Scholar]
  138. Wu C, Chang J, Ma B, Miao X, Zhou Y, Liu Y, Li Y, Wu T, Hu Z, Shen H, et al. The case-only test for gene-environment interaction is not uniformly powerful: an empirical example. Genet Epidemiol. 2013;37(4):402–407. doi: 10.1002/gepi.21713. [DOI] [PMC free article] [PubMed] [Google Scholar]
  139. Wu C, Kraft P, Zhai K, Chang J, Wang Z, Li Y, Hu Z, He Z, Jia W, Abnet CC, et al. Genome-wide association analyses of esophageal squamous cell carcinoma in Chinese identify multiple susceptibility loci and gene-environment interactions. Nat Genet. 2012;44(10):1090–1097. doi: 10.1038/ng.2411. [DOI] [PubMed] [Google Scholar]
  140. Yi N, Kaklamani VG, Pasche B. Bayesian analysis of genetic interactions in case-control studies, with application to adiponectin genes and colorectal cancer risk. Ann Hum Genet. 2011;75(1):90–104. doi: 10.1111/j.1469-1809.2010.00605.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  141. Yu W, Gwinn M, Clyne M, Yesupriya A, Khoury MJ. A navigator for human genome epidemiology. Nat Genet. 2008;40(2):124–125. doi: 10.1038/ng0208-124. [DOI] [PubMed] [Google Scholar]
  142. Zhang L, Mukherjee B, Ghosh M, Gruber S, Moreno V. Accounting for error due to misclassification of exposures in case-control studies of gene-environment interaction. Stat Med. 2008;27(15):2756–2783. doi: 10.1002/sim.3044. [DOI] [PubMed] [Google Scholar]
