Author manuscript; available in PMC: 2025 Jul 24.
Published in final edited form as: Nat Rev Genet. 2024 May 28;25(11):768–784. doi: 10.1038/s41576-024-00731-z

Gene-environment interactions in human health

Esther Herrera-Luis 1, Kelly Benke 2, Heather Volk 2, Christine Ladd-Acosta 1, Genevieve L Wojcik 1
PMCID: PMC12288441  NIHMSID: NIHMS2078606  PMID: 38806721

Abstract

Gene-environment interactions (G × E), the interplay of genetic variation with environmental factors, have a pivotal impact on human complex traits and diseases. Statistically, G × E can be assessed by determining the deviation from expectation of predictive models based solely on the phenotypic effects of genetics or environmental exposures. Despite the unprecedented, widespread and diverse use of G × E analytical frameworks, heterogeneity in their application and reporting hinders their applicability in public health. In this Review, we discuss study design considerations as well as G × E analytical frameworks to assess polygenic liability dependent on the environment, to identify specific genetic variants exhibiting G × E, and to characterize environmental context for these dynamics. We conclude with recommendations to address the most common challenges and pitfalls in the conceptualization, methodology and reporting of G × E studies, as well as future directions.

Introduction

Genome-wide association studies (GWAS) have contributed to the identification of genetic loci associated with complex traits or diseases far beyond the scope provided by candidate gene association or family linkage studies. However, for most complex traits, broad-sense heritability H2 estimates are significantly higher than the heritability attributable to the additive combination of effects from SNPs, that is, SNP-based narrow-sense heritability h2SNP, a phenomenon known as “missing heritability”1. For example, asthma h2SNP estimates (9–33%) are consistently lower than H2 estimates from family-based studies (55–90%), with larger estimates for childhood-onset than adult-onset asthma2. In addition, despite the high genetic correlation between asthma subtypes (rg = 0.78), genetic effects are greater in childhood-onset than in adult-onset asthma3. Together with differences in disease manifestation and triggers of asthma exacerbations4 across ages of onset, these discrepancies suggest the presence of interactions between genes and age, which are potentially driven by environmental factors, with heterogeneous genetic effects on asthma across the life course.

Recent computational advances and the increasing availability of large-scale genomic datasets for diverse populations, combined with deep phenotype information from accessible electronic medical records and geo-coded environmental data, offer unparalleled opportunities to explore how genetic predispositions interact with environmental exposures, ultimately advancing our understanding of complex traits and diseases. Although other factors may contribute to the missing heritability — such as unaccounted, rare or structural variation, non-linear genetic effects, epigenetic inheritance, horizontal and vertical microbial transmissions or underestimation of the shared environment in twin studies1,5 — gene-environment interactions (G × E) thus merit further exploration.

G × E are statistically defined as a departure from predictive models that consider the phenotypic effect of genetics and environmental exposures alone (Fig. 1). Investigating G × E can lead to the identification of novel genes that do not exhibit marginal effects6, provide insights into underlying biological mechanisms, strengthen causal inference derived from observational studies7, affect public health policy regarding prevalent environmental exposures and refine prevention programmes by building better prognostic models8–10. However, genome-wide screenings of G × E, also known as genome-wide interaction studies (GWIS), face multiple challenges that require consideration8,9,11. For example, these analyses require larger sample sizes to achieve adequate statistical power to detect G × E effects compared with main genetic effects. Additionally, it can be difficult to appropriately model the complexity of G × E dynamics, especially with the statistical and computational burden of multiple testing.

Fig. 1 |. G × E modelling.

A, The interplay between genetic and exposure risk factors on a phenotype can be depicted through traditional epidemiological models of gene-environment interactions (G × E), with variable genetic effects across environments. Model scenarios include situations where the genotype exacerbates the exposure effect on the phenotype, but there is no genetic effect in non-exposed individuals (Aa); the exposure factor exacerbates the genetic effect on the phenotype, but has no effect on the phenotype in the absence of the risk genotype (Ab); both the genotype and exposure factors are required to increase the risk (Ac); and either the genotype or the exposure factor can influence risk, with their combined effect potentially differing from their individual effects (Ad). B, G × E on a phenotype. The model includes the effect of two genotypes (G0, G1) and two environments (E0, E1) on the phenotype. However, genetic and exposure factors may not necessarily interplay (Ba). When interactions occur, non-crossover interaction effects may be present on either or both the additive (Bb) and multiplicative (Bc) scales. Given that additive interaction suggests a differential absolute risk reduction associated with one factor across different levels of another, it can identify subgroups that would benefit the most from public health interventions. Conversely, multiplicative interactions imply that the combined effect of risk factors diverges from the product of their individual effects and provide a better fit for the joint G × E effects78. Although less common, crossover interaction effects (Bd) can alter the genotype rank order based on environmental influences, as seen in flip-flop effects for ABCA1 and ATP8A1 loci and environmental tobacco smoke exposure on bronchial hyper-responsiveness121. C, Mechanisms underlying interaction with environmental exposures when genetic variation, DNA methylation and gene expression levels are considered. G × E models can be expanded to capture regulatory effects of genetic variation on different biological layers, such as DNA methylation and gene expression levels130. Simultaneously, these models can encompass environmental effects across biological layers. This multidimensionality allows for a more comprehensive understanding of how genetic and environmental factors interact and influence various aspects of biological processes. Part A adapted with permission from ref. 148, Elsevier. Part B adapted from ref. 149, Springer Nature Limited.

The molecular mechanisms underlying environmental influences on human health have been recently reviewed elsewhere11–13. Here, we provide guidance for designing G × E studies depending on the research question and available data. Furthermore, we discuss G × E methods for analysing single and polygenic G × E, with a specific focus on elucidating the genetic contributions to G × E and G × E detection and characterization. We also provide an overview of the current landscape of G × E findings for human complex traits and diseases, and highlight considerations for future research.

Designing a G × E study

Environmental, socio-cultural and genetic factors dynamically affect human health across the lifespan by influencing biological responses at the molecular, cellular or systemic levels, shaping behavioural responses and processes. This complex interplay unfolds across multiple levels of socio-ecological dimensions, spanning from the individual level to a broader, global environment (Fig. 2). Thus, the design of a G × E study is an iterative process between choosing an appropriate research question and analytical approach, and the selection, cleaning and harmonization of available genetic data and environmental data.

Fig. 2 |. A framework depicting the joint contribution of genetic susceptibility and environmental and social exposures to human health across the lifespan.

As an extension of the Integrated Socio-Environmental Model of Health and Well-Being (ISEM)150, this model illustrates how intra-individual, community-level, social and physical environmental factors contribute distinctly and collectively to shaping human health. Individuals are exposed to and affected by a diverse range of interconnected physical environmental factors, including natural elements, built environments and global conditions, as well as social-structural factors. This dynamic interplay is further influenced by age, culture and policy. This intricate relationship triggers behavioural and biological responses that affect health across various levels, encompassing behavioural outcomes, system and identity, as well as biological responses at systemic, multi-organ, cellular, subcellular and molecular levels. These responses may occur independently or interact, underscoring the complex interplay between biology, behaviour, environment and society. Adapted from ref. 150, Springer Nature Limited.

Choosing a G × E analytical framework

Understanding how genetics and environmental contexts interact to influence human health hinges on central strategies (Fig. 3) that require specific methodological approaches to address different underlying questions (Table 1). A key question is whether polygenic G × E contribute to a human trait (and by how much) across one or more environmental contexts. For this purpose, genetic variants are considered in aggregate through population-level parameters, which characterize genetic liability and its relationships with environment across a study population (that is, G × E heritability h2GE and G × E correlation rGE), or individual-level factors, which capture genetic risk per individual (that is, the polygenic score (PGS)). Differences in genetic effects per environment can also be leveraged to strengthen causal inference using Mendelian randomization (MR).

Fig. 3 |. General overview of G × E methods.

Data for gene-environment interactions (G × E) studies can be linked through multi-stream sources. To ensure suitability for analysis, rigorous quality control, harmonization and normalization procedures are essential, particularly when data are collected from different study sites. Screening approaches involve one or more of the approaches shown. The genome-wide association study (GWAS) and genome-wide interaction study (GWIS) analysis equations depicted assume binary outcomes (Y) for simplicity. The most common approach models G × E using a generalized linear model for the probability (Pr) of the outcome, where β0 is the intercept, βG is the genetic (G) main effect, βE is the exposure (E) main effect, βG×E is the G × E effect and βC is the covariate(s) (C) effect (Box 1). The model tests whether there is a significant interaction effect between genetics and environment, in which the null hypothesis assumes no interaction between genetics and environment (H0: βG×E = 0) and the alternative hypothesis suggests a non-zero interaction (H1: βG×E ≠ 0). In a case-only design, the equation estimates the log odds of specific genetic factors (G = g) given exposure and covariates, while accounting for the individual being a case (Y = 1). The exposure-stratified design divides the study population into subgroups based on the environmental exposure factor (where E = 0 represents unexposed and E = 1 represents exposed). Within each subgroup, a regression model is fitted to estimate the probability of the outcome given genetic factors and covariates.
γ, effect of the interaction between variables on the outcome; 1DF, one degree of freedom; 2DF, two degrees of freedom; h2GE, narrow-sense heritability attributable to G × E; IGZ, gene-environment interaction; IPGS, interaction polygenic score; M, total number of variants; MR, Mendelian randomization; PGS, polygenic score; PGS × E, polygenic score-environment interaction; rGE, G × E correlation; U, unmeasured confounder; vQTL, variance quantitative trait locus; vPGS, variance polygenic score; Z, interaction covariate.
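
Written out, the models that this caption describes in words take approximately the following form. This is a sketch rather than a transcription of the figure: the symbols α for the case-only coefficients and the stratum superscript (e) are notation assumed here for illustration, and the 2DF null hypothesis follows the description in Table 1 of testing main genetic and G × E effects together.

```latex
\begin{align}
\text{GWAS:}\quad
  & \operatorname{logit}\Pr(Y=1 \mid G, C) = \beta_0 + \beta_G G + \beta_C C \\
\text{GWIS (joint model):}\quad
  & \operatorname{logit}\Pr(Y=1 \mid G, E, C)
    = \beta_0 + \beta_G G + \beta_E E + \beta_{G\times E}\, G E + \beta_C C \\
\text{1DF test:}\quad
  & H_0\colon \beta_{G\times E} = 0
    \quad\text{versus}\quad H_1\colon \beta_{G\times E} \neq 0 \\
\text{2DF test:}\quad
  & H_0\colon \beta_G = \beta_{G\times E} = 0 \\
\text{Case-only:}\quad
  & \operatorname{logit}\Pr(G=g \mid E, C, Y=1)
    = \alpha_0 + \alpha_E E + \alpha_C C \\
\text{Exposure-stratified:}\quad
  & \operatorname{logit}\Pr(Y=1 \mid G, C, E=e)
    = \beta_0^{(e)} + \beta_G^{(e)} G + \beta_C^{(e)} C,
    \qquad e \in \{0, 1\}
\end{align}
```

In the case-only form, the coefficient on E is interpreted as a multiplicative G × E estimate only when genotype and exposure are independent in the source population, which is the assumption listed for this design in Table 1.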

Table 1 | G × E study designs and methods

| Application | Approach | Advantages | Limitations |
| --- | --- | --- | --- |
| Traditional common study designs (a) | | | |
| Screening | Case-control | Modest sample size requirements for rare diseases; can individually match on confounders | Recall bias for environment; difficulties selecting proper controls and higher cost than case-only |
| Screening and characterization | Case-only approach | More cost-efficient, simpler to implement when recruiting suitable controls is challenging and more precise effect size estimation compared with case-control | Requires gene-environment independence; provides estimates for G × E, and not main genetic and environment effects; biased estimates under population stratification |
| Family-based designs (a) | | | |
| Screening | Case-parents | Assessment of main genetic, G × E and joint effects; protects against confounding due to population stratification (by matching genetic background with case-parents); more powerful than case-control studies for G × E | Often infeasible for late-onset diseases; possible bias in G × E if genetics and environment are associated within parental mating types; difficult to enrol complete triads; unsuitable for environment effect estimates; may require additional control for exposure-related genetic stratification |
| | Case-siblings | Protects against confounding due to population stratification (matching genetic background between cases and controls); can be applied in late-onset disease | Enrolment of discordant sibships can be challenging; overmatching for genetic main effects |
| GWIS-like designs (a) | | | |
| Screening | Several designs | Several analytical frameworks available; offer marked versatility for subsequent genetic and functional analyses, and facilitate replication and portability assessment; primary goal is screening, yet summary statistics and genetic data can also be used to characterize G × E liability and for prediction | Biased to detect common variants with modest to small effect size in non-coding regions; requires large sample size for adequate statistical power to detect associations |
| Analytical frameworks | | | |
| Screening | 2DF test | Boosts power to detect signals by testing main genetic and G × E together | Requires 1DF to disentangle whether findings are driven by genetic main effects, G × E effects or both; inflated type I error rates in presence of unaccounted covariate-environment confounding, when compared with meta-regression; potential bias on G × E effects under heterogeneity of main genetic effects and proportion of exposed individuals across studies; potential noisy main genetic effect estimates for continuous exposures |
| | Screening approaches | Computational and statistical burden of multiple testing from genome-wide scans; can be combined with aggregate-level effects, PGS and/or quantitative trait variability | Limited applicability for G × E screening using marginal effects loci; can miss some interactions |
| | Set-based methods | Aggregation of weak individual effects can identify signals undetectable in isolation (indispensable for rare variants); different set-based or hybrid methods boost statistical power to detect signals; multi-exposure and multi-phenotype frameworks available | Statistical power fluctuates based on underlying method in the presence of heterogeneous effects and varying numbers of causal variants |
| Screening and characterization | 1DF test | Discern whether findings are driven by main genetic, G × E and joint effects | Lower power to identify variants with moderate joint effects and low genetic main effects compared with 2DF test |
| | Exposure-stratified framework | Characterizes main genetic effects across environmental strata (concordant, single and opposite effect direction) | |
| | Metaregression | Applicable to quantitative traits; tests main genetic, G × E and joint effects; robust to covariate-environment confounder; accommodates overlapping data and heterogeneity | No widespread application to date |
| | Machine learning | Can outperform regression-based analysis and accommodates non-linear effects | High computational burden for genome-wide analysis; high dimensionality and noise reduction required; limited toolset |
| | Bayesian approaches | Accounts for uncertainty of gene-environment independence assumption and can outperform the case-only approach | High computational burden for genome-wide analysis (in some cases); limited toolset |
| | GEVACO | Tests the joint genetic and dynamic G × E with continuous traits; applicable to a single marker and sets of variants | Currently unavailable for categorical phenotypes |
| | Multilevel modelling | Applicable to longitudinal data; accounts for kinship; yields higher statistical power than linear mixed models and regression analyses; robust to partially missing data | Requires strong assumptions for causal inference; requires sufficient representation across level units within the dataset |
| | Heritability | Focus on polygenic G × E; can be used for G × E screening under varying genetic variance | Potentially limited contribution to complex traits; unclear how environment-partitioned and cell or tissue-partitioned heritability contribute to human health |
| | Correlation | Focus on polygenic G × E; can be used for G × E screening under imperfect genetic correlation and prioritize trait-environment pairs with substantial trait variance for G × E screening | Presence of rGE may act as a confounder when modelling G × E |
| G × E liability and characterization | MR | Applicable to individual-level or summary statistics; novel approaches leveraging environmental data expected to emerge | Limited toolset; software-specific limitations in application for individual-level and summary statistics analyses |
| G × E liability, screening and prediction | PGS × E | Focus on polygenic G × E; can be used for G × E screening under amplification and varying genetic variance | Limited accommodation of antagonistic effects; noisy PGS × E estimates when main effects PGS considered; requires disentangling main genetic and G × E effects, potential confounders and rGE; unclear how environment-partitioned and cell or tissue-partitioned PGS contributes to human health |
| Screening and prediction | Quantitative variance genetic variability | Adaptable for polygenic G × E using vPGS from vQTL; capture outcome plasticity better than traditional PGS; applicable to large datasets with incomplete or lack of environment data; novel approaches leveraging environment data expected | Potentially limited number of vQTL and/or small contribution to complex traits; immature statistical methods; lack of frameworks to discern phenotypic dispersion, epistasis, effects on phenotype distribution and phantom vQTL; lack of frameworks for within-individual variability or diverse populations |

1DF, one degree of freedom; 2DF, two degrees of freedom; G × E, gene-environment interactions; GWIS, genome-wide interaction study; MR, Mendelian randomization; PGS, polygenic score; PGS × E, polygenic score-environment interaction; rGE, G × E correlation; vQTL, variance quantitative trait locus; vPGS, variance polygenic score.

(a) For a comprehensive summary of study designs, we refer readers to Table 1 in ref. 10.

A second avenue of investigation entails identifying the specific genetic variants implicated in G × E and characterizing the dynamics of these genetic variants with environmental factors. Analytical approaches for single-marker and polygenic analysis, including the PGS, can offer insights into the individual-level variability in genetic risk with environment. Novel G × E screening methods have investigated the variance of continuous phenotypes across loci14–16 using variance quantitative trait locus (vQTL) or variance polygenic score (vPGS) analyses in biobanks (Fig. 3).

Data collection and quality control for G × E studies

The advent of microarrays and affordable sequencing, coupled with increased computational capacity to analyse genetic data, has paved the way for GWIS to supersede candidate gene strategies. Testing for interaction of genetic variation across the genome and phenotype(s) of interest, GWIS offer greater versatility for downstream genetic and functional analyses, as well as improved capabilities for replication and portability assessment. Thus, it is now standard practice to conduct multi-study, large-scale genomic analyses to boost statistical power to detect genetic signals. This approach requires appropriate quality control practices to control for potential batch or platform effects through common filters and, if needed, genome-wide imputation using appropriate reference panels17. Although GWIS involve similar steps to GWAS, including data collection, genotyping or sequencing, quality control, imputation (if needed), association testing, meta-analysis and replication, additional considerations are necessary to ensure the integrity and harmonization of genetic, phenotype and environmental data (Fig. 3).

Issues arising from inadequate control of heterogeneity by data collection site are amplified in environmental and outcome data, as well as during biospecimen collection for their measurement, and can affect downstream analyses when combining and comparing data from multi-centre studies or multiple individual studies18,19. The PhenXToolkit20 is an important tool for data harmonization, as it provides protocols and definitions for a range of variables over 30 domains, including demographics, social determinants of health and numerous biological systems, such as respiratory or cardiovascular systems. Missing phenotype or environmental exposure data can be imputed, leveraging the correlation structure of the available phenotypes, environmental data or family structure either from the same study or external sources21–24. When imputation is unviable (for example, insufficient data across strata or multiple missing variables), it has been shown that considering only environmental data that are comprehensively available for unexposed individuals while allowing for missingness in exposed individuals outperforms any other scenario25.

In planning a multi-study analysis, the comparison of questionnaire data and biological assessments across pre-existing studies for harmonization facilitates the linkage of study variables, enabling prioritization of specific variables for meta-analysis18. In addition, power calculations help comprehend the constraints of the research design given available data, identifying prerequisites for new studies and determining appropriate analytical methods. Thus, they should be routinely considered in the conceptualization stage and can be easily performed using available software26–31.
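
To make the power consideration concrete, the following minimal sketch compares the standard error of a main genetic effect with that of an interaction coefficient in a linear model for a quantitative trait. It is not one of the dedicated power tools cited above; the allele frequency, exposure prevalence, sample size and unit residual variance are all assumed values chosen only for illustration.

```python
import numpy as np


def coefficient_se(design, sigma=1.0):
    """Standard errors of OLS coefficients: sigma * sqrt(diag((X'X)^-1))."""
    xtx_inv = np.linalg.inv(design.T @ design)
    return sigma * np.sqrt(np.diag(xtx_inv))


rng = np.random.default_rng(0)
n = 20_000
maf = 0.3            # assumed minor allele frequency
exposed_frac = 0.5   # assumed prevalence of a binary exposure

g = rng.binomial(2, maf, size=n).astype(float)          # genotype dosage 0/1/2
e = rng.binomial(1, exposed_frac, size=n).astype(float)  # binary exposure
ones = np.ones(n)

# Main-effect (GWAS-like) model: Y ~ 1 + G + E; column 1 is G
se_main = coefficient_se(np.column_stack([ones, g, e]))[1]

# Interaction (GWIS-like) model: Y ~ 1 + G + E + G*E; column 3 is G*E
se_int = coefficient_se(np.column_stack([ones, g, e, g * e]))[3]

se_ratio = se_int / se_main
# For a coefficient of the same magnitude, the sample size needed for equal
# precision scales with the squared ratio of standard errors.
sample_size_ratio = se_ratio ** 2
print(f"SE ratio (interaction / main): {se_ratio:.2f}")
print(f"Approximate sample size multiplier: {sample_size_ratio:.1f}")
```

Under these assumptions the interaction coefficient is estimated with roughly twice the standard error of the main effect, which corresponds to needing on the order of four times as many participants to detect an interaction of the same magnitude. Less balanced exposure prevalence or smaller interaction effects would increase this multiplier further.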

Researchers should be aware of the impact of trait normalization and scaling on the interpretation of their findings. The choice of measurement scale, such as raw or log scales, influences the magnitude of interaction effects. Although trait transformation can reduce scaling effects and accommodate non-normal distributions, it can also generate biases on the null hypothesis testing32. Moreover, scale dependency can result in inaccurate quantification of interaction effects, potentially leading to false positives or negatives32. Therefore, when environmental data across studies vary in scale, researchers could consider outlier studies for validation18, across geographic locations, over time or life stages, with careful reporting and interpretation of the findings.

Overall, careful consideration of data selection, cleaning and harmonization must be conducted for G × E analyses, as potential issues with heterogeneity are amplified with non-genetic data, depending on the specific research question and analytical decisions. Moreover, when pooling studies, downstream quality control at the study or post-meta-analysis level is necessary to avoid analytical bias. A common issue for G × E studies arises from insufficient control for covariation between the covariates and the environment, which could be mitigated by assessing model misspecifications before genome-wide scans33 or applying exposure-stratified analysis34. Furthermore, available frameworks can facilitate summary statistics harmonization and assessment of genomic control, potential analytical issues and cross-study heterogeneity17,35. In addition to applying filters based on the minor allele count and imputation quality17, the study-specific choice of approximate degrees of freedom — estimated as the product of these metrics — is essential for eliminating low-informative variants34.
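
The approximate degrees of freedom filter described above can be sketched as follows, taking the product of minor allele count and imputation quality literally. The `VariantSummary` record, the variant identifiers and the threshold of 20 are hypothetical and serve only to illustrate the mechanics; the appropriate cut-off is study-specific.

```python
from dataclasses import dataclass


@dataclass
class VariantSummary:
    """Hypothetical per-variant summary record for one study."""
    variant_id: str
    minor_allele_count: float
    imputation_quality: float  # an r^2-type metric between 0 and 1


def approximate_df(variant: VariantSummary) -> float:
    """Approximate degrees of freedom as minor allele count x imputation quality."""
    return variant.minor_allele_count * variant.imputation_quality


def filter_variants(variants, min_df):
    """Keep variants whose approximate degrees of freedom reach the threshold."""
    return [v for v in variants if approximate_df(v) >= min_df]


variants = [
    VariantSummary("var_a", minor_allele_count=400, imputation_quality=0.95),
    VariantSummary("var_b", minor_allele_count=25, imputation_quality=0.40),
    VariantSummary("var_c", minor_allele_count=60, imputation_quality=0.50),
]

# The threshold below is an illustrative assumption, not a recommended value.
kept = filter_variants(variants, min_df=20.0)
kept_ids = [v.variant_id for v in kept]
print(kept_ids)
```

Here `var_b` is dropped because a low allele count combined with poor imputation leaves little information, even though neither metric alone is necessarily below a conventional filter.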

Statistical G × E methods

Statistical methods are applied to systematically test for interactions between genetic variants and environmental factors. This involves testing each variant (or set of variants)-environment interaction using different statistical approaches while controlling for potential confounders and correcting for multiple testing (Table 1 and Fig. 3). Apart from considering the most adequate analytical framework(s) to approach the research question, thorough consideration of the anticipated genetic model of the trait and identification of potential sources of analytical bias are necessary to ensure a nuanced interpretation of findings. Although statistical interactions can reflect mechanistic interactions, they do not inherently imply biological or functional interaction where two exposures physically interact. The choice of statistical method should align with the study rationale, focusing on either or both the additive or multiplicative scale (Fig. 1), or exploring exposure rate or case status variation within specific genetic subgroups. Therefore, reporting choices and assumptions for a study’s statistical modelling, as well as genetic effects for all tested scenarios, is crucial for contextualization and replication assessment.
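
Because the same data can show interaction on one scale and not on the other, a small numerical example may help. The sketch below computes standard epidemiological contrasts from four risks indexed by genotype and exposure status; the function name and the risk values are hypothetical.

```python
def interaction_contrasts(r00, r10, r01, r11):
    """Interaction contrasts from risks r[G][E] for binary genotype and exposure.

    r00: risk without risk genotype, unexposed (reference)
    r10: risk with risk genotype, unexposed
    r01: risk without risk genotype, exposed
    r11: risk with risk genotype, exposed
    """
    rr10, rr01, rr11 = r10 / r00, r01 / r00, r11 / r00
    return {
        # Additive scale: zero means the risk differences simply add up.
        "additive_contrast": r11 - r10 - r01 + r00,
        # Multiplicative scale: one means the risk ratios simply multiply.
        "multiplicative_ratio": rr11 / (rr10 * rr01),
        # Relative excess risk due to interaction (additive scale, ratio units).
        "reri": rr11 - rr10 - rr01 + 1.0,
    }


# Hypothetical risks: genotype doubles risk, exposure triples it,
# and both together multiply it by six.
example = interaction_contrasts(r00=0.02, r10=0.04, r01=0.06, r11=0.12)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In this constructed example the joint risk ratio equals the product of the individual risk ratios, so there is no interaction on the multiplicative scale, yet the additive contrast is positive. This is why the scale should be chosen and reported explicitly.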

Determining differences in polygenic liability by exposure

A pivotal question in G × E research revolves around quantifying the influence of the environment on genetic liability. The following methodologies prove valuable in causal inference, distinguishing parental genetic and environmental effects, providing insights into trait aetiology and/or prioritizing environmental exposure-outcome pairs for further refinement36,37.

Quantifying differing genetic contributions by environment at the population level.

The extent to which G × E contribute to the proportion of phenotypic variation of human traits h2GE remains unclear and is likely highly dependent on population, context and specificity of trait. Although h2GE is generally considered to be low38–42, it exhibits heterogeneity in estimates based on the analytical approaches38–41,43. However, exposure-partitioned and functionally partitioned heritability analyses suggest variable genetic influence by exposure strata (Box 1). Most analytical tools currently estimate h2GE for a single exposure, but there is growing interest in the overall intersectional contribution of multiple environmental contexts to human health40,42,44 (discussed below). Heritability is often calculated using GREML-based40,41,45,46 methods, which require individual genotypes and yield reliable estimates. Conversely, linkage disequilibrium score regression47,48 gains computational efficiency by relying on summary statistics, and therefore may be more accessible.

Box 1. Novel advances on G × E by the largest multi-ancestry GWIS meta-analysis.

The CHARGE Gene-Lifestyle Interactions Working Group recently leveraged 28 genome-wide interaction study (GWIS) summary statistics from diverse individuals131 to provide methodological advances and characterize gene-lifestyle interactions on lipid levels and blood pressure traits. Their comprehensive gene-environment interactions (G × E) framework showed that the two degrees of freedom (2DF) joint test systematically boosts the statistical power to detect significant loci for ancestry-specific and trans-ancestry analyses compared with the one degree of freedom (1DF) test. However, future methods should consider the low correlation between marginal and interaction genetic effects observed within the identified 2DF loci. Along the same line, they also evidenced the limited applicability of two-step approaches reliant on marginal genetic effects for G × E screening.

Their analysis of the phenotypic variance explained by marginal genetic effects and interaction effects implied that the contribution of interaction effects in addition to marginal genetic effects is relatively limited. Despite this, their work revealed heterogeneity in heritability partitioned by exposure. Further stratification of exposure-stratified heritability by functional and cell-type annotations revealed differential variability between exposed and unexposed strata, suggesting specific biological mechanisms at play. Because these differences in exposure-stratified heritability may arise from unknown biases, further analyses are required to disentangle the contribution of stratified analyses to G × E liability, characterization and prediction.

The CHARGE Gene-Lifestyle Interactions Working Group research also suggested potential ancestry-specific patterns of gene-lifestyle interactions, with differences in phenotypic variability explained by interaction effects, particularly among populations of African descent. This underscores the importance of conducting G × E research on diverse populations.

Genetic variation also shapes our experiences with the environment, such as individual-level lifestyle factors (for example, smoking) or community-level influences (for example, social support). rGE measures the extent to which genetics is linked with environmental context variation. Although rGE cannot be used for risk prediction at the individual level, it can be leveraged to distinguish G × E variants in a case-control setting49 and prioritize trait-environment pairs with substantial trait variance for locus-specific analysis37. The presence of rGE may act as a confounder when modelling G × E, leading to spurious results50 and biased interaction estimates51. Appropriate study designs, such as family-based designs52, and analytical approaches encompassing joint modelling of interaction and correlation53 can help to disentangle the specific contribution of each factor.

Lastly, another methodology to assess differences in genetic contributions to health outcomes by environmental context is MR. Rather than quantifying differential genetic liability by environment, MR uses genetic variants as instrumental variables to investigate the causal effects of a modifiable environmental factor on a health outcome. For instance, the influence of body mass index (BMI) on blood pressure could be investigated through BMI-associated variants, provided essential assumptions are met54. One of the sources of MR estimate biases is the presence of horizontal pleiotropy, which occurs when a genetic variant is associated with a health outcome through biological pathways unrelated to the exposure of interest (for example, BMI-associated variants may also affect blood pressure through additional confounder pathways). Strengthened MR tools55–60 can leverage an environmental characteristic modulating the strength of the association of variants on the exposure to improve the precision of causal inference estimates. For example, MR-G × E54 refined causal estimates for the effect of BMI on systolic blood pressure in the UK Biobank, by accounting for alcohol and physical activity as interaction covariates (represented as IGZ in Fig. 3).
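
The instrumental-variable logic behind MR can be illustrated with a simulated single-variant example. The sketch below implements the basic Wald ratio estimator, not MR-G × E or any of the strengthened tools cited above, and all effect sizes, the allele frequency and the sample size are assumed values. The simulated variant satisfies the MR assumptions by construction (no horizontal pleiotropy).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
true_effect = 0.3  # assumed causal effect of the exposure on the outcome

g = rng.binomial(2, 0.3, size=n).astype(float)  # genetic instrument (dosage)
u = rng.normal(size=n)                          # unmeasured confounder (U)
x = 0.5 * g + u + rng.normal(size=n)            # exposure, e.g. a BMI-like trait
y = true_effect * x + u + rng.normal(size=n)    # outcome, confounded by U


def slope(a, b):
    """OLS slope from regressing b on a (with intercept)."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


# Observational estimate: biased because U affects both exposure and outcome.
naive_estimate = slope(x, y)

# Wald ratio: variant-outcome association divided by variant-exposure association.
wald_ratio = slope(g, y) / slope(g, x)

print(f"True effect:        {true_effect:.2f}")
print(f"Naive regression:   {naive_estimate:.2f}")
print(f"Wald ratio (MR):    {wald_ratio:.2f}")
```

The naive regression is inflated by the shared confounder, whereas the ratio estimate recovers a value close to the simulated causal effect. If the variant also affected the outcome through another pathway, the ratio would be biased, which is the horizontal pleiotropy problem described above.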

Determining how environmental context alters distributions of genetic risk at the individual level.

PGSs are calculated for individuals by aggregating the weighted effects of multiple genetic variants from GWAS summary statistics to improve genetic risk prediction beyond individual genetic variants. Evaluating the interaction between the PGS and the environment enables the characterization of the polygenic G × E relationship and allows for G × E screening. Significant PGS-trait pairs could prioritize traits for both single-marker and further multi-marker analyses. Additionally, they can aid in uncovering the underlying biology of the trait by assessing its association or interaction with different exposures, outcomes or omic data. They could also provide insights into the variation in genetic risk distribution by environmental context. However, the clinical relevance of exposure-stratified PGS estimation remains unclear at present given the differences in exposure-stratified heritability (Box 1). Moreover, GWAS-derived polygenic score-environment interaction (PGS × E) may be susceptible to bias due to small sample sizes, measurement errors in the original GWAS and confounding by rGE if unaccounted for. In addition, PGS × E may fail to capture antagonistic G × E effects and overlook variants with G × E effects but minimal genetic main effects. Despite these limitations, the prevailing approaches in PGS × E development continue to rely on GWAS.
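
As a minimal sketch of these two steps, the code below builds a PGS as a weighted sum of allele dosages and then tests a PGS × E term in a linear model for a quantitative trait. Everything is simulated: the variant weights stand in for GWAS effect sizes, and the number of variants, effect sizes and exposure prevalence are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_individuals, n_variants = 20_000, 50

# Simulated genotype dosages (0/1/2) with variant-specific allele frequencies.
freqs = rng.uniform(0.1, 0.5, size=n_variants)
dosages = rng.binomial(2, freqs, size=(n_individuals, n_variants)).astype(float)

# Stand-in for per-variant GWAS effect sizes (hypothetical weights).
weights = rng.normal(0.0, 0.1, size=n_variants)

# PGS: weighted sum of dosages per individual, then standardized.
pgs = dosages @ weights
pgs = (pgs - pgs.mean()) / pgs.std()

# Simulate a trait with PGS, exposure and PGS x E effects (assumed values).
e = rng.binomial(1, 0.5, size=n_individuals).astype(float)
beta_pgs, beta_e, beta_pgs_e = 0.30, 0.20, 0.15
y = beta_pgs * pgs + beta_e * e + beta_pgs_e * pgs * e + rng.normal(size=n_individuals)

# Fit Y ~ 1 + PGS + E + PGS*E by ordinary least squares.
design = np.column_stack([np.ones(n_individuals), pgs, e, pgs * e])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
est_pgs, est_e, est_interaction = coef[1], coef[2], coef[3]

print(f"Estimated PGS effect:     {est_pgs:.3f}")
print(f"Estimated PGS x E effect: {est_interaction:.3f}")
```

In this simulation the exposure is generated independently of the PGS, so rGE is absent by construction. In real data, correlation between the score and the exposure would need to be addressed before interpreting the interaction coefficient.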

PGS × E estimation can be obtained using a case-only scenario under the assumption of gene-environment independence61, whereas the correlation between the PGS and environment can be discerned by joint modelling of interaction and correlation using a case-control design53. GWIS summary statistics can also be used to derive variant weights based on the main effects for the genetic variant and interaction term for the genetic variant and the environmental exposure, or the interaction term alone, with contradictory findings regarding performance compared with the traditional PGS × E across studies62–64. Given the sample size requirements for G × E, analytical strategies advocate for sample overlap between datasets51,65. For instance, PIGEON51 showed that, under a variance component analytical framework, a genetic correlation analysis of GWIS and GWAS summary statistics provided unbiased G × E estimates compared with PGS × E.

Identification of G × E relationships at variant and set levels

With the rise of next-generation sequencing data, traditional epidemiological study designs10, such as case-only or family-based studies (Table 1), are shifting to GWIS-like approaches in unrelated individuals from diverse populations through consortia or biobanks. Instead of characterizing overall differences in genetic influences on human health by environmental context, these analyses aim to estimate how the environment can influence the association between specific genetic variant(s) and outcome. Here, we outline common strategies for single-marker and multi-marker (or set-based) testing for G × E screening and characterization while ensuring computational tractability.

Variant-level effects assessed through joint or stratified approaches.

A model approach to estimate G × E can take the form of a joint model, in which a single statistical model is constructed that includes both main effects for the genetic variant and the environmental exposure as well as an interaction term for the genetic variant and the environmental exposure. Alternatively, in the stratified approach, the study population is divided into subgroups based on different environmental exposure levels, and genetic effects are estimated separately within each stratum.
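In symbols, the two approaches can be sketched as follows, assuming a link function g, an outcome Y, a genotype G coded as allele dosage, an exposure E and a covariate vector C. The notation is illustrative and is not taken from Fig. 3.

```latex
% Joint model: one regression with main effects and an interaction term
g\big(\mathbb{E}[Y \mid G, E, C]\big)
  = \beta_0 + \beta_G\, G + \beta_E\, E + \beta_{GE}\,(G \times E) + \gamma^{\top} C

% Stratified approach: a separate genetic effect within each exposure stratum e
g\big(\mathbb{E}[Y \mid G, C, E = e]\big)
  = \beta_0^{(e)} + \beta_G^{(e)}\, G + \gamma^{(e)\top} C
```

For a binary exposure with no covariate-by-exposure differences, the difference between the stratum-specific genetic effects corresponds to the interaction coefficient of the joint model.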

The most common approach has G × E modelled using a generalized linear model (Fig. 3), including terms for the main genetic and environmental effects and an interaction term for the genetic and environmental effect. The model tests whether there is a significant interaction effect between genetics and environment in which the null hypothesis assumes no interaction between genetics and environment, and the alternative hypothesis suggests a non-zero interaction. The one degree of freedom (1DF) approach examines differences in genetic effects on an outcome given an exposure, whereas the two degrees of freedom (2DF) approach jointly tests66 whether the combined impact of genetic variation and exposure on outcomes differs from their individual effects (Box 1). In the joint model, a 1DF and 2DF inverse-variance weighted meta-analysis can be used for the interaction term67 and joint main genetic and interaction effects68, respectively. Although the 2DF test may have better power for signal detection66, conducting 1DF testing is essential to clarify whether the signal is driven by main genetic effects, interaction effects or the combination of both (Table 1).
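The 1DF and 2DF tests can be illustrated with Wald statistics computed from the joint-model estimates. This is a minimal sketch assuming that the coefficient estimates and their covariance are already available from a fitted model; the function names and numbers are hypothetical.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def chi2_sf_2df(stat):
    """Upper-tail probability of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

def interaction_test_1df(beta_ge, se_ge):
    """Wald test of the null hypothesis that the interaction coefficient is zero."""
    stat = (beta_ge / se_ge) ** 2
    return stat, chi2_sf_1df(stat)

def joint_test_2df(beta_g, beta_ge, var_g, var_ge, cov_g_ge):
    """Wald test that the main genetic and interaction coefficients are both zero."""
    det = var_g * var_ge - cov_g_ge ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form b' V^{-1} b written out for the 2x2 covariance matrix V.
    stat = (var_ge * beta_g ** 2
            - 2.0 * cov_g_ge * beta_g * beta_ge
            + var_g * beta_ge ** 2) / det
    return stat, chi2_sf_2df(stat)

# Hypothetical joint-model estimates for one variant.
stat_1df, p_1df = interaction_test_1df(beta_ge=0.04, se_ge=0.01)
stat_2df, p_2df = joint_test_2df(beta_g=0.03, beta_ge=0.04,
                                 var_g=1e-4, var_ge=1e-4, cov_g_ge=0.0)
```

Reporting both results side by side shows whether a 2DF signal reflects the main genetic effect, the interaction or both. In practice the covariance between the two estimates is rarely zero and should be taken from the fitted model.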

An alternative approach tests main genetic effects stratified by outcome or exposure strata. Under the assumption of gene-environment independence, the case-only analysis provides unbiased effect size estimates for the interaction term in affected individuals (Fig. 3). Alternatively, the exposure-stratified framework can distinguish opposite effects from concordant or single-stratum effects through stratum-difference69 or stratum-specific scans. To combine the stratified effects, a 1DF meta-analysis can account for between-strata relatedness70. In the absence of stratified data, the J2S framework71 estimates the stratified and marginal effects from joint G × E summary statistics. Overall, the joint 2DF meta-analysis outperforms the stratified framework, particularly for low-frequency variants72. However, it may lead to a higher type I error rate under covariate-environment confounding34. To address this issue in quantitative traits, meta-regression evaluates the main genetic effects from groups stratified according to exposure strata34. By pooling multiple studies, meta-regression has uncovered sex-specific and age-specific effects for anthropometric traits in Europeans73. Moreover, several meta-analysis frameworks have been extended to accommodate random effects, overlapping data or multiple phenotypes74–76.
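A stratum-difference comparison and its pooling across studies can be sketched as below. The sketch assumes two independent exposure strata and independent studies, so it does not implement the relatedness-aware meta-analysis cited above; function names and effect sizes are hypothetical.

```python
import math

def stratum_difference_test(beta_exposed, se_exposed, beta_unexposed, se_unexposed):
    """Z test for a difference in genetic effects between two independent strata."""
    diff = beta_exposed - beta_unexposed
    se_diff = math.sqrt(se_exposed ** 2 + se_unexposed ** 2)
    z = diff / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return diff, se_diff, z, p_value

def inverse_variance_meta(estimates):
    """Fixed-effects inverse-variance weighted pooling of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

# Hypothetical stratum-specific genetic effects from one study.
diff, se_diff, z, p_value = stratum_difference_test(0.08, 0.03, 0.00, 0.04)

# Hypothetical stratum differences from two independent studies, pooled.
pooled_diff, pooled_se = inverse_variance_meta([(0.08, 0.05), (0.04, 0.05)])
```

An effect present in one stratum only and effects of opposite sign can both yield a non-zero difference, so the stratum-specific estimates should be inspected alongside the test.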

Approaches to increase statistical power for G × E.

Analyses investigating the role of both genes and environment on human health often have reduced statistical power compared with standard GWAS approaches. Two contrasting approaches can increase statistical power by either screening loci for follow-up to conduct fewer comparisons or combining variants and analysing them jointly.

Screening approaches prioritizing sets of variants or genes for G × E analysis using data-driven or hypothesis-driven approaches can diminish the computational and statistical burden of multiple testing from genome-wide scans. To increase the statistical power to detect interactions, these strategies leverage aggregate-level effects, the PGS and/or quantitative trait variability. Some methods lacking computational tractability for genome-wide screening are highly valuable for modelling G × E for prioritized variant sets, such as machine learning77 and Bayes approaches78,79 (Table 1). The data-driven strategies often implement two-step approaches, with genome-wide scans for genetic main effects followed by interaction testing78,80–83 (Box 1). By contrast, hypothesis-driven strategies rely on biological and functional annotations to prioritize gene or variant sets for interaction testing. For instance, candidate genes could be selected based on their pharmaceutical targetability or implication in xenobiotic metabolism using the Comparative Toxicogenomics Database84 and the Drug-Gene Interaction Database85.
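A data-driven two-step procedure can be sketched as a filter on marginal genetic effects followed by interaction testing with a multiple-testing threshold based on the retained set only. The thresholds, variant IDs and p values below are hypothetical, and the Bonferroni correction in step 2 is one possible choice rather than the procedure of any cited method.

```python
def two_step_screen(variants, step1_alpha, family_alpha=0.05):
    """Screen on the marginal genetic effect, then test G x E in the retained set only.

    `variants` maps a variant ID to (marginal_p, interaction_p).
    """
    retained = {vid: pvals for vid, pvals in variants.items() if pvals[0] < step1_alpha}
    if not retained:
        return []
    # Bonferroni threshold based on the number of variants that pass step 1.
    step2_threshold = family_alpha / len(retained)
    return sorted(vid for vid, (_, p_int) in retained.items() if p_int < step2_threshold)

# Hypothetical p values: (marginal genetic effect, interaction).
toy_scan = {
    "rs_a": (1e-6, 0.01),   # passes step 1 and step 2
    "rs_b": (0.5, 1e-5),    # strong interaction but filtered out at step 1
    "rs_c": (1e-4, 0.2),    # passes step 1 only
}
hits = two_step_screen(toy_scan, step1_alpha=1e-3)
```

The toy variant rs_b shows the cost of this design: an interaction without a marginal effect is never tested. Such filtering is generally considered valid only when the screening statistic is independent of the interaction test under the null hypothesis.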

A novel screening method for large-scale studies incorporates screening for individual loci or PGS influencing quantitative trait variability, followed by interaction testing for environmental factors. vQTL-environment interaction (vQTL × E) screening has been applied to study interactions on human traits in the UK Biobank, including anthropometric and cardiometabolic traits14–16. For example, the vQTL rs1229984 (ADH1B) showed a positive association between alcohol intake and alanine transaminase levels in CC carriers that was absent in TT carriers15. Despite the enhanced ability of vQTL effects to capture the plasticity of G × E dynamics compared with traditional GWAS variant weights86, their contribution to certain traits is limited16 and analytical challenges remain (Table 1). Spurious vQTLs may arise from phantom effects stemming from untagged causal additive QTL variants, nonlinear trait-environment relationships or scale artefacts87. Mitigating these effects may involve sensitivity analyses and/or replicating effects on both additive and multiplicative scales. In the context of pervasive heterogeneity of pleiotropic effects on the additive scale for the outcome, where outcome heteroscedasticity is probable, some MR methods54,56,59 become bias-prone, prompting consideration of alternative frameworks57.
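The variance-screening step can be illustrated with a test of whether trait variability differs across genotype groups. The sketch below uses the Brown-Forsythe (median-centred Levene) statistic as one common choice; this is an assumption for illustration, as the cited studies may use other tests, and the trait values are hypothetical.

```python
from statistics import mean, median

def brown_forsythe_statistic(groups):
    """Brown-Forsythe statistic for unequal trait variance across genotype groups."""
    # Absolute deviation of each trait value from its genotype-group median.
    deviations = []
    for values in groups:
        centre = median(values)
        deviations.append([abs(y - centre) for y in values])
    group_means = [mean(d) for d in deviations]
    n_total = sum(len(d) for d in deviations)
    k = len(deviations)
    grand_mean = mean([z for d in deviations for z in d])
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(deviations, group_means))
    within = sum((z - m) ** 2 for d, m in zip(deviations, group_means) for z in d)
    # Compare with an F distribution on (k - 1, n_total - k) degrees of freedom.
    return ((n_total - k) / (k - 1)) * between / within

# Hypothetical trait values for three genotype groups with equal means but different spread.
trait_by_genotype = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 3.0, 3.0, 4.0],
    [3.0, 3.0, 3.0, 3.0, 3.0],
]
w_statistic = brown_forsythe_statistic(trait_by_genotype)
```

Because the statistic depends on the measurement scale of the trait, repeating it after transformation is one way to probe the scale artefacts mentioned above.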

In contrast to decreasing the scope of variants investigated, set-based analyses amplify the statistical power for detecting G × E signals by aggregating individual variant effects within genes, regions, pathways or the genome. The aggregated effect of multiple weak signals, possibly undetectable individually, is particularly relevant for rare variant identification. In the presence of synergistic G × E effects, set-based tests outperform other methods88. Several types of set-based G × E have been developed based on the variance component89,90, trait similarity regression91,92, the burden test and sequence kernel association test93–97, optimal weighting98, adaptive combination of Bayes factors80, machine learning99 or a combination of these100–104. Some of these methods accommodate multi-trait analysis of rare variation105, relatedness100 or joint tests100,102, and are available for longitudinal106 or large-scale biobank data89,100,107. Using whole-exome sequencing data from the UK Biobank, a set-based analysis revealed a significant MC4R by sex interaction on BMI, consistent with prior research on sex-specific MC4R effects on brain structure and eating behaviour100. Nevertheless, the landscape of statistical frameworks for meta-analysis of set-based G × E and joint test summary statistics is limited to fixed-effects and random-effects meta-analysis108, additional filtering prior to meta-analysis109, variance component methods110 and linear mixed model-based meta-analysis111.

Considerations for large-scale biobanks.

The inception of large population biobanks will foreseeably lead to the development of numerous computationally tractable G × E statistical frameworks for large-scale genomic analysis. A few are detailed here that account for various potential setbacks when considering large-scale biobanks and their data with respect to ascertainment and availability. Given the outcome-independent recruitment designs, biobanks often have unbalanced case-control ratios, for which saddle-point approximation has been tailored for GWIS112. Moreover, fastGWA-GE conducts genome-wide testing for main genetic, G × E and joint effects while accounting for relatedness either through pedigree information or a sparse genetic relationship matrix113.

Aggregate-level effects can be assessed in large sequencing studies using different methods89,100,101,104,107. The state-of-the-art approximate method, MAGEE100, implements several interaction and joint tests while accommodating relatedness using a genetic relationship matrix. Other methods have been proposed to boost statistical power by modelling multiple factors that may exert antagonistic or synergistic effects, such as StructLMM44 or LEMMA40. StructLMM44 models environmental similarity between individuals as a random effect and adopts a variance component score test for G × E testing40. By contrast, LEMMA is a Bayesian whole-genome approach that jointly models the main genetic and G × E effects considering an environmental score aggregating the effect of multiple environmental exposures40.

Plasticity of G × E dynamics

Unravelling the complexity of G × E (Fig. 2) requires attention to nuanced relationships, with challenges arising from dynamic interactions among multiple genetic and environmental factors. The influence of time becomes vital when understanding G × E, both within the study design process and in the interpretation of results. Genetic effects can exhibit context specificity, whereby the environment can modulate the effect’s magnitude or even invert it based on the individual’s personal environmental context across the lifespan and interactions across the exposures. This plasticity requires multi-exposure, multi-trait or environmental score approaches to account for complex dynamics; such approaches have been popularized as a means of leveraging pleiotropy to identify G × E signals40,44,83,114–116. These strategies offer enhanced robustness and power compared with standard single environmental exposure analysis, potentially revealing novel loci44,88,116,117.

Moreover, the endeavour to model the plasticity of the G × E relationship also stems from the variability of the interaction across contexts, such as differential magnified effects across environmental strata that can be captured by the meta-regression approach34. Alternatively, semi-parametric varying-coefficient joint testing enables the identification of signals missed by the traditional linear effects tests118. As an example, multilevel models using time-varying terms explored the impact of resilience and trauma on genetic effects in war-related psychosocial stress over 48 weeks among Syrian refugee and Jordanian non-refugee youth119. The study revealed significant interactions between resilience and 5-HTTLPR and COMT variants, but no time-related changes were observed.

Family-based studies of inherited effects

Family-based designs for G × E, more robust against population structure than case-control methods, are limited120,121, possibly due to challenges in replication assessment arising from disparities in ascertainment, study variable definitions and environmental exposure distribution. Nevertheless, these designs prove convenient for exploring the causal environmental influences of parental traits on offspring outcomes, as the PGS can capture both genetic and environmental factors. Genetic transmission and genetic nurture on child outcomes have been investigated using parental and offspring PGSs122,123. Another method enriched their structural equation models with information about transmission of parental PGSs to explore the effect of parental genetic and environmental influences on offspring trait variation while considering assortative mating58.

Challenges and future directions

The investigation of G × E faces methodological and conceptual challenges related to statistical power, exposure/omics data measurement and quality control, study-specific confounders, co-linearity of environmental exposures, genetic ancestry and G × E dynamics. Table 2 summarizes our recommendations for the conceptualization, methodology, reporting and sharing of future G × E research, expanding on those from Dunn et al.124.

Table 2 |.

Recommendations for the conceptualization, methodology, reporting and sharing of G × E research

Item | Recommendation
Conceptualization
Aim | Identify strategic opportunities for systematic assessment of environmental and outcome data in pre-existing studies; define the relevant area(s) of research — G × E liability, screening or characterization; define the relevant catalogue of genetic variation (for example, SNPs, CNV and so on) and environmental exposure(s) to assess
Plasticity of G × E | Consider the most appropriate strategies to address the G × E dynamic in your research question based on frequency and duration of exposure; testing of linear, non-linear or non-constant effects; and multi-trait, multi-exposure or ‘exposome’ effects
Sample size and power | Conduct power calculations to quantify potential bias and implement correction measures wherever possible (useful references26–31)
Multi-omics | Define whether integration with other omic data may provide additional relevant insights into your research question
Subgroup analysis | Consider subgroup analysis in the main stage or as a sensitivity analysis, and the suitable subgroup stratification strategies to address the research question
Replication | Aim to assess your findings for replication and to validate previous associations where possible; when the same environmental measurements are not available across studies, consider proxies for replication across geographic locations, over time or life stage
Methods
Study design | Report on the study design (including type of study, power calculations, sample size), advantages and limitations to address the research question
Setting | Describe the locations and periods of recruitment, exposure and follow-up
Sample | Include descriptive statistics of the whole sample and sample subgroups (for example, exposed versus non-exposed) for all the considered variables; report on the assessment of genetic ancestry, the ethnicity/ancestry group categorization, and the advantages and limitations of the selected approach (useful references142,143)
Variables | Define all outcomes, exposures and potential confounders; define cross-study harmonization strategy, if applicable
Data sources or measurement | Report sources of data and methodological details for assessing each variable of interest in the report; report the process of data harmonization, as well as choice and assumptions for data transformation and scaling; describe the comparability of assessment methods if multiple methods or groups were considered; describe how repeated measurements were processed, if considered; describe how missing data were handled
Genetic data | Follow standard profiling and quality control practices and incorporate additional procedures as needed (useful references140,144,145)
Statistical analysis
Reporting of model selection | Report on the reasoning and assumptions for model selection
Confounding | Include all the relevant covariates in the analysis and report how covariates were deemed relevant; evaluate model misspecifications; test and report gene-environment correlation, and main genetic and interaction effects for all scales and statistical models considered (useful reference33)
Quality control of summary statistics | Follow standard recommendations for study-level and meta-analysis summary statistics quality control and consider additional practices when needed for each specific scenario (useful reference17)
Reporting and sharing G × E research
Reporting | Follow STREGA guidelines for genetic association studies140; follow the PRS-RS guidelines for PGS141; follow the GWAS Catalog guidelines for GWAS summary statistics146; report on the model parameters and interaction term(s) as well as the effect estimates and statistical significance of genes, environment and G × E; report G × E results even if not significant146
Data sharing | Share summary statistics to publicly available resources13,147

CNV, copy number variation; G × E, gene–environment interactions; GWAS, genome-wide association study; PGS, polygenic score; PRS-RS, Polygenic Risk Score Reporting Standards; STREGA, STrengthening the REporting of Genetic Association Studies.

G × E frameworks are gaining popularity, likely preceding a wave of new approaches especially tailored for the analysis of large-scale sequencing data. Although h2GE, MR and PGS continue to attract attention, the utilization of case-only and family-based designs has declined, giving way to case-control or exposure-stratified GWIS. Along the same line, methodologies focused on non-linear or non-constant effects, longitudinal data, machine learning, Bayesian approaches or vQTLs remain outliers in current research practice.

Nevertheless, this is bound to change as refined methods surface. The development and implementation of analytical frameworks that accommodate G × E plasticity, multiple environment or omic layers could be achieved using existing simulated large-scale genomic data125 and large-scale studies of diverse participants incorporating standardized protocols for extensive environmental exposure assessment, such as the National Institutes of Health (NIH) All of Us126 and Environmental Influences on Child Health Outcomes (ECHO)19 research programmes.

Multi-omics as a link between genetics and environment

The integration or combination of multiple omics layers could accelerate the discovery of novel genes or the underlying biological mechanisms of the disease (Fig. 1). Gene expression and DNA methylation patterns are, at least partially, genetically controlled, with differences across populations defined by genetic similarity and/or racial or ethnic identity127,128. In turn, DNA methylation patterns are associated with a wide range of environmental exposures even from the prenatal stage and can act as a regulator of gene expression129. A recent study combined summary statistics for gene-lifestyle interaction associations, DNA methylation and gene expression levels to prioritize relevant loci based on the molecular evidence of these interactions130 (Fig. 1C). Another study evaluated changes in allelic gene expression in response to several treatments in pluripotent stem cells from six donors131. Approximately half of their findings had not been previously reported by large-scale allelic expression analyses not considering interactions, highlighting the potential of constraining genomic variance. The successful application of multi-omics for G × E identification, characterization and prediction requires the development of cost-effective data storage and analysis strategies to address challenges related to high dimensionality, batch effects and temporal heterogeneity on the observation of biological effects across omic layers.

Compounded needs for diversity of genetics and environment for G × E

Knowledge of the G × E architecture of human traits remains limited, partially due to the modest number of G × E studies conducted to date (Fig. 4a), representing 1.14% of the studies and 1.87% of the GWAS Catalog signals (Box 2; see Supplementary information). As is the case in genetics and genomics research at large, another ongoing limitation of current G × E studies is the lack of diversity in genetic studies (Fig. 4b). This limitation is noteworthy as historically under-studied groups may encounter environmental health disparities, such as differential exposure to pollutants or nutritional deficits, that could influence the gradient of G × E effect size estimates at causal loci across populations132. Moreover, differences in allele frequency distributions and linkage disequilibrium patterns across populations could also affect the interaction with the environment, leading to spurious G × E signals if local ancestry is not accounted for133. The presence of shared local environment and possible correlations between genetic ancestry and socio-economic/cultural factors may induce ancestry-environment correlation134,135, which needs to be considered when evaluating differences across populations and environments.

Fig. 4 |. GWIS characteristics and findings over time, 2008–2022, as curated by the NHGRI-EBI GWAS Catalog.

a, Maximum and median number of participants of the genome-wide interaction study (GWIS) discovery sample, cumulative number of gene-environment interactions (G × E) signals and GWIS publications with and without full summary statistics per year. b, GWIS individual sample size across the discovery stage per study accession and ‘broad’ ancestry category per year. The barplot summarizes the data between 2008 and 2022. c, Outcome distribution per ‘broad’ ancestry category as annotated by the GWAS Catalog based on the number of publications over time. d, Exposure distribution per ‘broad’ ancestry category as annotated by the GWAS Catalog based on the number of publications over time. GWAS, genome-wide association study; NHGRI, National Human Genome Research Institute.

Box 2. The current landscape of G × E.

As of March 2023, the National Human Genome Research Institute (NHGRI)-EBI GWAS Catalog147 encompasses 228 gene-environment interactions (G × E) publications spanning 806 unique study accessions — trait-specific analyses conducted within each publication — across a total of 649 unique outcome-exposure pairs and 9,240 G × E signals. The number of summary statistics publicly available has not kept pace with the rising sample size and identified G × E signals (Fig. 4a), limiting their downstream use. Although the largest meta-analysis across the discovery and replication phases included 610,410 diverse participants (Box 1), most studies were much smaller (50.9% under the median, ~13,600).

As observed for genome-wide association studies (GWAS)134, the majority contribution to G × E studies derives from individuals of European ancestries in terms of sample size (87.3% and 65.4% in the discovery and replication phases, respectively) and assessed outcomes (Fig. 4b–d). The most frequently investigated outcomes across ancestry groups are lipid/lipoprotein (n=27) and cardiovascular measurements (n=24), as well as biological processes (n=18), excluding general broad ‘other’ categories (Fig. 4c). The most frequently considered exposures include alcohol consumption (n=40), smoking exposure (n=29) and sex (n=22) (Fig. 4d).

Despite the absence of a harmonized database for polygenic G × E, there is a discernible surge in the number of publications on polygenic score-environment interactions (PGS×E) (n=294, from 2008 until 2022) compared with genome-wide interaction studies (GWIS), with a predominant focus on psychiatric (22.7%), neurological (13.4%) and psychological (12.0%) traits (see the figure, parts a,b). Notably, a considerable proportion of publications are authored by researchers affiliated to institutions in the United States and England (72.9%) and funded by the US Department of Health and Human Services (HHS) and UK Research and Innovation (UKRI) institutions (see the figure, parts c,d). Although these trends are consistent with large-scale genomic research in general, this emphasizes a need for better representation across outcomes, environments and geography to fully explore the heterogeneity of genetic effects by diverse environmental contexts.

Number of GWIS and PGS × E publications published per year between 2008 and 2022 (part a). Polar barplots for cumulative percentages per study type by research areas (part b), country of affiliation (part c) or funding source (part d). Top ten items are shown. CIHR, Canadian Institutes of Health Research; EU, European Union; MRC, Medical Research Council; NCI, National Cancer Institute; NIA, National Institute on Aging; NIH, National Institutes of Health; NSFC, National Natural Science Foundation of China.

Beyond the genetic ancestry-environment correlation, appropriate strategies are needed to address environmental geographic heterogeneity. Foremost, excluding such diversity may restrain the reach of public health initiatives, especially for collectives affected by environmental geographic inequalities. Conversely, biased findings may arise if inadequately considered. Homogeneous subgroup stratification could improve signal detection by constraining either the environmental or the genomic variance131, as long as the phenotypic variance is not diminished to the point that it compromises the research question. Although there is no consensus on the strategy for subgroup stratification, exploratory and sensitivity analysis could discern the effects of sample assignment by self-reported ethnicity or major ancestral groups, genomic clusters (for example, estimated identity-by-descent haplotype groups or global ancestry) or environmental/demographic clusters (for example, age groups or school district).

Further international collaborations and initiatives diversifying the ancestry, geography and socio-economic backgrounds of study participants and research teams hold the potential to foster a more robust understanding of the full spectrum of genetic effects across local contexts, which require a nuanced understanding that would not be achieved without local expertise. This strategy could help mitigate the Eurocentric biases in G × E studies (Fig. 4b–d and Box 2) and the limited regional diversity in the research workforce, ultimately shaping the scientific questions posed and funded.

Adherence to FAIR data principles for G × E

The rise in biobank and consortia initiatives has solidified summary statistics as the standard for data sharing, considering the stringent requirements for individual-level data in terms of data storage memory and anonymity protection. Although some of the discussed G × E frameworks use summary statistics to advance our understanding of trait and disease mechanisms, few publications openly shared these statistics through the GWAS Catalog136 (Fig. 4a). The PGS Catalog137, although currently lacking PGS × E, has the potential for their incorporation. Promoting open and consistent data sharing of G × E findings, under the FAIR (Findable, Accessible, Interoperable and Reusable) data principles138, should be a priority for journals and researchers. These practices enhance not only reproducibility, portability and transparency assessments but also dissemination of research results139 and the development of new analytical frameworks.

Given the wide breadth of analytical G × E frameworks and the presence of standardized reporting frameworks for some, such as STrengthening the REporting of Genetic Association Studies (STREGA)140 for GWAS and Polygenic Risk Score Reporting Standards (PRS-RS)141 for the PGS, extensions should be developed to account for current approaches to G × E analyses. In this regard, there is a pressing need for guidelines in constructing, evaluating and reporting PGS × E, with best practices involving untangling potential confounding, interaction and correlation factors. Despite the seemingly modest contribution of G × E to human health38–42, the assessment of variation in heritability across environmental contexts could provide insights into the potential clinical utility of PGS developed considering specific high-risk exposure strata (Box 1).

Conclusions

The aetiology of human diseases is multifaceted and intricate, often involving the interplay of various genetic and environmental factors. The exploration of G × E in this context faces substantial methodological and conceptual challenges. Despite the variety of G × E statistical frameworks for multiple scenarios, few methods accommodate G × E plasticity, and large-scale studies of these questions remain scarce. As the number of G × E studies on human traits and diseases continues to rise, it is crucial to emphasize the need for diversity and inclusion, standardized reporting protocols and data sharing. These practices will foster transparency and facilitate the assessment of reproducibility in G × E research. Lastly, there is a need for frameworks that consider the spectrum of genetic variation types as the field has increasingly moved towards their inclusion. Although G × E may not account for the bulk of phenotypic variability for complex human traits and diseases, the identification of G × E may increase our understanding of the underlying biological mechanisms and inform environmental health decisions and risk assessment for specific genetic subgroups, providing a much-needed additional link between the success of traditional trait-mapping approaches and their translation for precision medicine and public health.

Supplementary Material

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1038/s41576-024-00731-z.

Acknowledgements

The authors’ work was funded by the Combining advances in Genomics and Environmental science to accelerate Actionable Research and practice in ASD (GEARS) Network (R01ES034554). E.H.-L. and G.L.W. received support from the National Institutes of Health (NIH) National Human Genome Research Institute (NHGRI) (R35HG011944-02). No funding agencies had a role in the study design, data collection and analysis, decision to publish or preparation of the manuscript. The views expressed are those of the authors and not necessarily those of any funder.

Glossary

Biological or functional interaction

The scenario whereby the exposures physically interact to produce an outcome

Broad-sense heritability

(H2). The proportion of phenotypic variation attributed to genetic variation, including additive, dominance, epistasis and gene-environment interactions (G × E)

G × E correlation

(rGE). A situation when there are genetic differences in exposure to specific environments. rGE can be classified as passive, active or evocative according to its source

G × E heritability

(h2GE). The proportion of the phenotypic variability attributable to G × E

Genome-wide interaction studies

(GWIS). Analytical approaches to investigate the modifying effect of an exposure variable on a genetic association

Heteroscedasticity

A statistical phenomenon where the variability of the residuals (errors) in a regression model varies across different levels of the predictor variables

Imputation

The statistical process of predicting missing genetic data based on observed data, typically using reference panels of known genetic variants

Instrumental variables

Variables that are associated with an exposure and are not associated with the outcome through any other pathway

Mechanistic interactions

Situations when there are individuals for whom the outcome would occur if both of two exposures were present but not if one or both of the exposures were absent

Mendelian randomization

(MR). An analytical method that uses ‘instrumental variables’ for the predictor to determine the causal influence of that predictor on an outcome. MR is based on three key assumptions: instrumental variables are associated with the exposure; instrumental variables are not associated with any confounder; and instrumental variables do not affect the outcome through pathways other than the exposure (such effects are known as ‘horizontal pleiotropy’)

Multiple testing

Repeated statistical testing of hypotheses within a single study, increasing the likelihood of false positives. Adjustments such as Bonferroni correction help control the family-wise error rate

Polygenic score

(PGS). A single value that aggregates the effect of variants across an individual’s germ-line genome to quantify the genetic predisposition to a trait. Usually calculated as a weighted sum of several trait-associated alleles (dosage), where the per-allele effect sizes (βi) are extracted from the genome-wide association study (GWAS) summary statistics of independent studies

Set-based analyses

Analytical methods that aggregate G × E signals within a gene or genomic region

Statistical interactions

Situations in which the effect of a risk factor on an outcome varies across strata of another factor. An interaction on an additive scale occurs when the combined effect of the risk factors differs from the sum of the individual effects, whereas an interaction on a multiplicative scale occurs when the combined effect of the risk factors differs from the product of the individual effects

Variance quantitative trait locus

(vQTL). A genetic variant associated with the variance of a continuous trait
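
As a minimal sketch of the 'Statistical interactions' entry above, the following Python snippet computes interaction contrasts on both scales from four absolute risks. The function name `interaction_contrasts` and the risk values are hypothetical and chosen only for illustration; they are not taken from any study discussed in this Review.

```python
"""Toy comparison of interaction on the additive and multiplicative scales."""


def interaction_contrasts(r00, r10, r01, r11):
    """Return (additive, multiplicative) interaction contrasts for absolute risks.

    r00: risk with neither factor present
    r10: risk with the genetic factor only
    r01: risk with the environmental exposure only
    r11: risk with both factors present
    """
    # Additive scale: joint excess risk minus the sum of the individual
    # excess risks (0 means no additive interaction).
    additive = (r11 - r00) - ((r10 - r00) + (r01 - r00))
    # Multiplicative scale: joint relative risk divided by the product of the
    # individual relative risks (1 means no multiplicative interaction).
    multiplicative = (r11 / r00) / ((r10 / r00) * (r01 / r00))
    return additive, multiplicative


# Hypothetical risks, assumed purely for illustration.
additive, multiplicative = interaction_contrasts(r00=0.01, r10=0.02, r01=0.03, r11=0.06)
print(f"additive contrast: {additive:.3f}")
print(f"multiplicative ratio: {multiplicative:.3f}")
```

With these illustrative risks, the joint relative risk (6) equals the product of the individual relative risks (2 × 3), so there is no interaction on the multiplicative scale, whereas the joint excess risk (0.05) exceeds the sum of the individual excess risks (0.03), indicating a positive interaction on the additive scale. The same data can therefore support different conclusions depending on the scale that is reported.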

Footnotes

Competing interests

The authors declare no competing interests.

References

  • 1. Brandes N, Weissbrod O. & Linial M. Open problems in human trait genetics. Genome Biol. 23, 131 (2022). This review comprehensively discusses the major issues and challenges moving forward in human genetics.
  • 2. Vicente CT, Revez JA & Ferreira MAR Lessons from ten years of genome-wide association studies of asthma. Clin. Transl. Immunol. 6, e165 (2017).
  • 3. Tsuo K. et al. Multi-ancestry meta-analysis of asthma identifies novel associations and highlights the value of increased power and diversity. Cell Genomics 2, 100212 (2022).
  • 4. Trivedi M. & Denton E. Asthma in children and adults — what are the differences and what can they tell us about asthma? Front. Pediatr. 7, 256 (2019).
  • 5. Sandoval-Motta S, Aldana M, Martinez-Romero E. & Frank A. The human microbiome and the missing heritability problem. Front. Genet. 8, 80 (2017).
  • 6. Manning AK et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nat. Genet. 44, 659–669 (2012).
  • 7. Pingault J-B et al. Using genetic data to strengthen causal inference in observational research. Nat. Rev. Genet. 19, 566–580 (2018).
  • 8. Ritz BR et al. Lessons learned from past gene-environment interaction successes. Am. J. Epidemiol. 186, 778–786 (2017).
  • 9. McAllister K. et al. Current challenges and new opportunities for gene-environment interaction studies of complex diseases. Am. J. Epidemiol. 186, 753–761 (2017).
  • 10. Thomas D. Gene-environment-wide association studies: emerging approaches. Nat. Rev. Genet. 11, 259–272 (2010).
  • 11. Virolainen SJ, VonHandorf A, Viel KCMF, Weirauch MT & Kottyan LC Gene-environment interactions and their impact on human health. Genes. Immun. 24, 1–11 (2023).
  • 12. Wu H, Eckhardt CM & Baccarelli AA Molecular mechanisms of environmental exposures and human disease. Nat. Rev. Genet. 24, 332–344 (2023). This review highlights the impact of environmental factors on molecular mechanisms and the challenges in exposome approaches.
  • 13. Breton CV et al. Exploring the evidence for epigenetic regulation of environmental influences on child health across generations. Commun. Biol. 4, 769 (2021).
  • 14. Marderstein AR et al. Leveraging phenotypic variability to identify genetic interactions in human phenotypes. Am. J. Hum. Genet. 108, 49–67 (2021).
  • 15. Westerman KE et al. Variance-quantitative trait loci enable systematic discovery of gene-environment interactions for cardiometabolic serum biomarkers. Nat. Commun. 13, 3993 (2022).
  • 16. Shi G. Genome-wide variance quantitative trait locus analysis suggests small interaction effects in blood pressure traits. Sci. Rep. 12, 12649 (2022).
  • 17. Winkler TW et al. Quality control and conduct of genome-wide association meta-analyses. Nat. Protoc. 9, 1192–1212 (2014).
  • 18. Ives C. et al. Linking complex disease and exposure data — insights from an environmental and occupational health study. J. Expo. Sci. Environ. Epidemiol. 33, 12–16 (2023).
  • 19. Knapp EA et al. The Environmental Influences on Child Health Outcomes (ECHO)-wide cohort. Am. J. Epidemiol. 10.1093/aje/kwad071 (2023).
  • 20. Hamilton CM et al. The PhenX Toolkit: get the most from your measures. Am. J. Epidemiol. 174, 253–260 (2011).
  • 21. White IR, Royston P. & Wood AM Multiple imputation using chained equations: issues and guidance for practice. Stat. Med. 30, 377–399 (2011).
  • 22. Austin PC, White IR, Lee DS & van Buuren S. Missing data in clinical research: a tutorial on multiple imputation. Can. J. Cardiol. 37, 1322–1331 (2021).
  • 23. Hormozdiari F. et al. Imputing phenotypes for genome-wide association studies. Am. J. Hum. Genet. 99, 89–103 (2016).
  • 24. Dahl A. et al. A multiple-phenotype imputation method for genetic studies. Nat. Genet. 48, 466–472 (2016).
  • 25. Xu H. et al. Lifestyle risk score: handling missingness of individual lifestyle components in meta-analysis of gene-by-lifestyle interactions. Eur. J. Hum. Genet. 29, 839–850 (2021).
  • 26. Moore CM, Jacobson SA & Fingerlin TE Power and sample size calculations for genetic association studies in the presence of genetic model misspecification. Hum. Hered. 84, 256–271 (2019).
  • 27. Kooperberg C. & Hsu L. powerGWASinteraction: power calculations for G × E and G × G interactions for GWAS. R package version 1.1.3 https://CRAN.R-project.org/package=powerGWASinteraction (2015).
  • 28. Gauderman WJ Sample size requirements for matched case-control studies of gene-environment interaction. Stat. Med. 21, 35–50 (2002).
  • 29. Gauderman WJ Candidate gene association analysis for a quantitative trait, using parent-offspring trios. Genet. Epidemiol. 25, 327–338 (2003).
  • 30. Gjerdevik M. et al. Haplin power analysis: a software module for power and sample size calculations in genetic association analyses of family triads and unrelated controls. BMC Bioinforma. 20, 165 (2019).
  • 31. Gaye A, Burton TWY & Burton PR ESPRESSO: taking into account assessment errors on outcome and exposures in power analysis for association studies. Bioinformatics 31, 2691–2696 (2015).
  • 32. McCaw ZR, Lane JM, Saxena R, Redline S. & Lin X. Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics 76, 1262–1272 (2020).
  • 33. Ueki M, Fujii M. & Tamiya G for Alzheimer’s Disease Neuroimaging Initiative and the Alzheimer’s Disease Metabolomics Consortium Quick assessment for systematic test statistic inflation/deflation due to null model misspecifications in genome-wide environment interaction studies. PLoS ONE 14, e0219825 (2019).
  • 34. Shi G. & Nehorai A. Robustness of meta-analyses in finding gene×environment interactions. PLoS ONE 12, e0171446 (2017).
  • 35. Rao DC et al. Multiancestry study of gene-lifestyle interactions for cardiovascular traits in 610,475 individuals from 124 cohorts: design and rationale. Circ. Cardiovasc. Genet. 10, e001649 (2017).
  • 36. Young AI et al. Relatedness disequilibrium regression estimates heritability without environmental bias. Nat. Genet. 50, 1304–1310 (2018).
  • 37. Bernabeu E. et al. Sex differences in genetic architecture in the UK Biobank. Nat. Genet. 53, 1283–1289 (2021).
  • 38. Wang H. et al. Genotype-by-environment interactions inferred from genetic effects on phenotypic variability in the UK Biobank. Sci. Adv. 5, eaaw3538 (2019).
  • 39. Laville V. et al. Gene-lifestyle interactions in the genomics of human complex traits. Eur. J. Hum. Genet. 30, 730–739 (2022).
  • 40. Kerin M. & Marchini J. Inferring gene-by-environment interactions with a Bayesian whole-genome regression model. Am. J. Hum. Genet. 107, 698–713 (2020).
  • 41. Kerin M. & Marchini J. A non-linear regression method for estimation of gene-environment heritability. Bioinformatics 36, 5632–5639 (2021).
  • 42. Dahl A. et al. A robust method uncovers significant context-specific heritability in diverse complex traits. Am. J. Hum. Genet. 106, 71–91 (2020).
  • 43. Sulc J. et al. Quantification of the overall contribution of gene-environment interaction for obesity-related traits. Nat. Commun. 11, 1385 (2020).
  • 44. Moore R. et al. A linear mixed-model approach to study multivariate gene-environment interactions. Nat. Genet. 51, 180–186 (2019).
  • 45. Robinson MR et al. Genotype-covariate interaction effects and the heritability of adult body mass index. Nat. Genet. 49, 1174–1181 (2017).
  • 46. Ni G. et al. Genotype-covariate correlation and interaction disentangled by a whole-genome multivariate reaction norm model. Nat. Commun. 10, 2239 (2019).
  • 47. Bulik-Sullivan BK et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
  • 48. Shin J. & Lee SH GxEsum: a novel approach to estimate the phenotypic variance explained by genome-wide G × E interaction based on GWAS summary statistics for biobank-scale data. Genome Biol. 22, 183 (2021).
  • 49. Murcray CE, Lewinger JP & Gauderman WJ Gene-environment interaction in genome-wide association studies. Am. J. Epidemiol. 169, 219–226 (2008).
  • 50. Dudbridge F. & Fletcher O. Gene-environment dependence creates spurious gene-environment interaction. Am. J. Hum. Genet. 95, 301–307 (2014).
  • 51. Miao J. et al. Reimagining gene-environment interaction analysis for human complex traits. Preprint at bioRxiv 10.1101/2022.12.11.519973 (2022).
  • 52. Warrier V. et al. Gene-environment correlations and causal effects of childhood maltreatment on physical and mental health: a genetically informed approach. Lancet Psychiatry 8, 373–386 (2021).
  • 53. Wang Z, Shi W, Carroll RJ & Chatterjee N. Joint modeling of gene-environment correlations and interactions using polygenic risk scores in case-control studies. Preprint at bioRxiv 10.1101/2023.02.14.528572 (2023).
  • 54. Spiller W, Hartwig FP, Sanderson E, Davey Smith G. & Bowden J. Interaction-based Mendelian randomization with measured and unmeasured gene-by-covariate interactions. PLoS ONE 17, e0271933 (2022).
  • 55. Kim Y, Balbona JV & Keller MC Bias and precision of parameter estimates from models using polygenic scores to estimate environmental and genetic parental influences. Behav. Genet. 51, 279–288 (2021).
  • 56. Tchetgen Tchetgen E, Sun B. & Walter S. The GENIUS approach to robust Mendelian randomization inference. Stat. Sci. 36, 443–464 (2021).
  • 57. Liu Z, Ye T, Sun B, Schooling M. & Tchetgen ET Mendelian randomization mixed-scale treatment effect robust identification and estimation for causal inference. Biometrics 79, 2208–2219 (2022).
  • 58. Balbona JV, Kim Y. & Keller MC Estimation of parental effects using polygenic scores. Behav. Genet. 51, 264–278 (2021). This article describes a model to determine parental effects on child traits using polygenic risk scores while accounting for assortative mating.
  • 59. Spiller W, Slichter D, Bowden J. & Davey Smith G. Detecting and correcting for bias in Mendelian randomization analyses using gene-by-environment interactions. Int. J. Epidemiol. 48, 702–712 (2019).
  • 60. Karageorgiou V, Tyrrell J, Mckinley TJ & Bowden J. Weak and pleiotropy robust sex-stratified Mendelian randomization in the one sample and two sample settings. Genet. Epidemiol. 47, 135–151 (2023).
  • 61. Meisner A, Kundu P. & Chatterjee N. Case-only analysis of gene-environment interactions using polygenic risk scores. Am. J. Epidemiol. 188, 2013–2020 (2019).
  • 62. Tang Y, You D, Yi H, Yang S. & Zhao Y. IPRS: leveraging gene-environment interaction to reconstruct polygenic risk score. Front. Genet. 13, 801397 (2022).
  • 63. Arnau-Soler A. et al. Genome-wide by environment interaction studies of depressive symptoms and psychosocial stress in UK Biobank and Generation Scotland. Transl. Psychiatry 9, 14 (2019).
  • 64. Werme J, van der Sluis S, Posthuma D. & de Leeuw CA Genome-wide gene-environment interactions in neuroticism: an exploratory study across 25 environments. Transl. Psychiatry 11, 180 (2021).
  • 65. Lau M, Kress S, Schikowski T. & Schwender H. Efficient gene-environment interaction testing through bootstrap aggregating. Sci. Rep. 13, 937 (2023).
  • 66. Kraft P, Yen Y-C, Stram DO, Morrison J. & Gauderman WJ Exploiting gene-environment interaction to detect genetic associations. Hum. Hered. 63, 111–119 (2007).
  • 67. Willer CJ, Li Y. & Abecasis GR METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
  • 68. Manning AK et al. Meta-analysis of gene-environment interaction: joint estimation of SNP and SNP×environment regression coefficients. Genet. Epidemiol. 35, 11–18 (2011).
  • 69. Aschard H, Hancock DB, London SJ & Kraft P. Genome-wide meta-analysis of joint tests for genetic and gene-environment interaction effects. Hum. Hered. 70, 292–300 (2010).
  • 70. Randall JC et al. Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLoS Genet. 9, e1003500 (2013).
  • 71. Laville V. et al. Deriving stratified effects from joint models investigating gene-environment interactions. BMC Bioinforma. 21, 251 (2020).
  • 72. Sung YJ et al. An empirical comparison of joint and stratified frameworks for studying G × E interactions: systolic blood pressure and smoking in the CHARGE Gene-Lifestyle Interactions Working Group. Genet. Epidemiol. 40, 404–415 (2016).
  • 73. Winkler TW et al. The influence of age and sex on genetic associations with adult body size and shape: a large-scale genome-wide interaction study. PLoS Genet. 11, e1005378 (2015).
  • 74. Jin Q. & Shi G. Meta-analysis of joint test of SNP and SNP-environment interaction with heterogeneity. Hum. Hered. 86, 1–9 (2021).
  • 75. Jin Q. & Shi G. Meta-analysis of SNP-environment interaction with heterogeneity. Hum. Hered. 84, 117–126 (2019).
  • 76. Yu Y. et al. Subset-based analysis using gene-environment interactions for discovery of genetic associations across multiple studies or phenotypes. Hum. Hered. 83, 283–314 (2018).
  • 77. Wu S, Xu Y, Zhang Q. & Ma S. Gene-environment interaction analysis via deep learning. Genet. Epidemiol. 47, 261–286 (2023).
  • 78. Gauderman WJ et al. Update on the state of the science for analytical methods for gene-environment interactions. Am. J. Epidemiol. 186, 762–770 (2017).
  • 79. Mukherjee B. & Chatterjee N. Exploiting gene-environment independence for analysis of case-control studies: an empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics 64, 685–694 (2008).
  • 80. Lin W-Y, Huang C-C, Liu Y-L, Tsai S-J & Kuo P-H Polygenic approaches to detect gene-environment interactions when external information is unavailable. Brief. Bioinform. 20, 2236–2252 (2019).
  • 81. Gauderman WJ, Zhang P, Morrison JL & Lewinger JP Finding novel genes by testing G × E interactions in a genome-wide association study. Genet. Epidemiol. 37, 603–613 (2013).
  • 82. Hsu L. et al. Powerful cocktail methods for detecting genome-wide gene-environment interaction. Genet. Epidemiol. 36, 183–194 (2012).
  • 83. Majumdar A. et al. A two-step approach to testing overall effect of gene-environment interaction for multiple phenotypes. Bioinformatics 36, 5640–5648 (2021).
  • 84. Davis AP et al. Comparative Toxicogenomics Database (CTD): update 2023. Nucleic Acids Res. 51, D1257–D1262 (2023).
  • 85. Freshour SL et al. Integration of the Drug-Gene Interaction Database (DGIdb 4.0) with open crowdsource efforts. Nucleic Acids Res. 49, D1144–D1151 (2021).
  • 86. Miao J. & Lu Q. in Handbook of Statistical Bioinformatics (eds. Lu HH-S, Schölkopf B, Wells MT & Zhao H) 257–270 (Springer, 2022).
  • 87. Lyon MS, Millard LAC, Smith GD, Gaunt TR & Tilling K. Hypothesis-free detection of gene-interaction effects on biomarker concentration in UK Biobank using variance prioritisation. Preprint at medRxiv 10.1101/2022.01.05.21268406 (2022).
  • 88. Kim J. et al. Joint analysis of multiple interaction parameters in genetic association studies. Genetics 211, 483–494 (2019).
  • 89. Chi JT et al. SEAGLE: a scalable exact algorithm for large-scale set-based gene-environment interaction tests in Biobank data. Front. Genet. 12, 710055 (2021).
  • 90. Lin X, Lee S, Christiani DC & Lin X. Test for interactions between a genetic marker set and environment in generalized linear models. Biostatistics 14, 667–681 (2013).
  • 91. Zhao G, Marceau R, Zhang D. & Tzeng J-Y Assessing gene-environment interactions for common and rare variants with binary traits using gene-trait similarity regression. Genetics 199, 695–710 (2015).
  • 92. Tzeng J-Y et al. Studying gene and gene-environment effects of uncommon and common variants on continuous traits: a marker-set approach using gene-trait similarity regression. Am. J. Hum. Genet. 89, 277–288 (2011).
  • 93. Zhao N, Zhang H, Clark JJ, Maity A. & Wu MC Composite kernel machine regression based on likelihood ratio test for joint testing of genetic and gene-environment interaction effect. Biometrics 75, 625–637 (2019).
  • 94. Price AL et al. Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 832–838 (2010).
  • 95. Wu MC et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).
  • 96. Li B. & Leal SM Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008).
  • 97. Jiao S. et al. SBERIA: set-based gene-environment interaction test for rare and common variants in complex diseases. Genet. Epidemiol. 37, 452–464 (2013).
  • 98. Zhao Z, Zhang J, Sha Q. & Hao H. Testing gene-environment interactions for rare and/or common variants in sequencing association studies. PLoS ONE 15, e0229217 (2020).
  • 99. Zemlianskaia N, Gauderman WJ & Lewinger JP A scalable hierarchical lasso for gene-environment interactions. J. Comput. Graph. Stat. 31, 1091–1103 (2022).
  • 100. Wang X. et al. Efficient gene-environment interaction tests for large biobank-scale sequencing studies. Genet. Epidemiol. 44, 908–923 (2020).
  • 101. Su Y-R, Di C-Z, Hsu L. & Genetics and Epidemiology of Colorectal Cancer Consortium A unified powerful set-based test for sequencing data analysis of G × E interactions. Biostatistics 18, 119–131 (2017).
  • 102. Chen H, Meigs JB & Dupuis J. Incorporating gene-environment interaction in testing for association with rare genetic variants. Hum. Hered. 78, 81–90 (2014).
  • 103. Lim E, Chen H, Dupuis J. & Liu C-T A unified method for rare variant analysis of gene-environment interactions. Stat. Med. 39, 801–813 (2020).
  • 104. Lin X. et al. Test for rare variants by environment interactions in sequencing association studies. Biometrics 72, 156–164 (2016).
  • 105. Zhang J. et al. Test gene-environment interactions for multiple traits in sequencing association studies. Hum. Hered. 84, 170–196 (2019).
  • 106. He Z. et al. Set-based tests for the gene-environment interaction in longitudinal studies. J. Am. Stat. Assoc. 112, 966–978 (2017).
  • 107. Johnsen PV, Riemer-Sørensen S, DeWan AT, Cahill ME & Langaas M. A new method for exploring gene-gene and gene-environment interactions in GWAS with tree ensemble methods and SHAP values. BMC Bioinforma. 22, 230 (2021).
  • 108. Lee S, Teslovich TM, Boehnke M. & Lin X. General framework for meta-analysis of rare variants in sequencing association studies. Am. J. Hum. Genet. 93, 42–53 (2013).
  • 109. Wang J. et al. A meta-analysis approach with filtering for identifying gene-level gene-environment interactions. Genet. Epidemiol. 42, 434–446 (2018).
  • 110. Jin X. & Shi G. Variance-component-based meta-analysis of gene-environment interactions for rare variants. G3 Bethesda 11, jkab203 (2021).
  • 111. Wang X. et al. Genomic summary statistics and meta-analysis for set-based gene-environment interaction tests in large-scale sequencing studies. Preprint at medRxiv 10.1101/2022.05.08.22274819 (2022).
  • 112. Bi W. et al. A fast and accurate method for genome-wide scale phenome-wide G × E analysis and its application to UK Biobank. Am. J. Hum. Genet. 105, 1182–1192 (2019).
  • 113. Zhong W, Chhibber A, Luo L, Mehrotra DV & Shen J. A fast and powerful linear mixed model approach for genotype-environment interaction tests in large-scale GWAS. Brief. Bioinform. 24, bbac547 (2023).
  • 114. Jin X. & Shi G. Kernel-based gene-environment interaction tests for rare variants with multiple quantitative phenotypes. PLoS ONE 17, e0275929 (2022).
  • 115. Hecker J. et al. A robust and adaptive framework for interaction testing in quantitative traits between multiple genetic loci and exposure variables. PLoS Genet. 18, e1010464 (2022).
  • 116. Westerman KE et al. GEM: scalable and flexible gene-environment interaction analysis in millions of samples. Bioinformatics 37, 3514–3520 (2021).
  • 117. Osazuwa-Peters OL et al. Identifying blood pressure loci whose effects are modulated by multiple lifestyle exposures. Genet. Epidemiol. 44, 629–641 (2020).
  • 118. Zhou Z, Ku H-C, Manning SE, Zhang M. & Xing C. A varying coefficient model to jointly test genetic and gene-environment interaction effects. Behav. Genet. 10.1007/s10519-022-10131-w (2023).
  • 119. Mulligan CJ et al. Novel G × E effects and resilience: a case:control longitudinal study of psychosocial stress with war-affected youth. PLoS ONE 17, e0266509 (2022).
  • 120. Zhang W. et al. Detecting gene-environment interaction for maternal exposures using case-parent trios ascertained through a case with non-syndromic orofacial cleft. Front. Cell Dev. Biol. 9, 621018 (2021).
  • 121. Dizier M-H et al. Interactive effect between ATPase-related genes and early-life tobacco smoke exposure on bronchial hyper-responsiveness detected in asthma-ascertained families. Thorax 74, 254–260 (2019).
  • 122. Pingault J-B et al. Genetic nurture versus genetic transmission of risk for ADHD traits in the Norwegian Mother, Father and Child Cohort Study. Mol. Psychiatry 10.1038/s41380-022-01863-6 (2022).
  • 123. Demange PA et al. Estimating effects of parents’ cognitive and non-cognitive skills on offspring education using polygenic scores. Nat. Commun. 13, 4801 (2022).
  • 124. Dunn EC et al. Research review: gene-environment interaction research in youth depression — a systematic review with recommendations for future research. J. Child. Psychol. Psychiatry 52, 1223–1238 (2011).
  • 125. Zhang H. et al. A new method for multiancestry polygenic prediction improves performance across diverse populations. Nat. Genet. 55, 1757–1768 (2023).
  • 126. All of Us Research Program Investigators et al. The ‘All of Us’ Research Program. N. Engl. J. Med. 381, 668–676 (2019).
  • 127. Kachuri L. et al. Gene expression in African Americans, Puerto Ricans and Mexican Americans reveals ancestry-specific patterns of genetic architecture. Nat. Genet. 55, 952–963 (2023).
  • 128. Li B. et al. Incorporating local ancestry improves identification of ancestry-associated methylation signatures and meQTLs in African Americans. Commun. Biol. 5, 401 (2022).
  • 129. Czamara D. et al. Integrated analysis of environmental and genetic influences on cord blood DNA methylation in new-borns. Nat. Commun. 10, 2548 (2019).
  • 130. Majarian TD et al. Multi-omics insights into the biological mechanisms underlying statistical gene-by-lifestyle interactions with smoking and alcohol consumption. Front. Genet. 13, 954713 (2022). This study elucidates how epigenomics, transcriptomics and G × E summary statistics can be combined to provide molecular evidence for G × E statistical interactions.
  • 131. Findley AS et al. Functional dynamic genetic effects on gene regulation are specific to particular cell types and environmental conditions. eLife 10, e67077 (2021).
  • 132. Nagar SD, Napoles AM, Jordan IK & Marino-Ramirez L. Socioeconomic deprivation and genetic ancestry interact to modify type 2 diabetes ethnic disparities in the United Kingdom. EClinicalMedicine 37, 100960 (2021).
  • 133. Park DS et al. An ancestry-based approach for detecting interactions. Genet. Epidemiol. 42, 49–63 (2018).
  • 134. Abdellaoui A, Yengo L, Verweij KJH & Visscher PM 15 years of GWAS discovery: realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023).
  • 135.Peterson RE et al. Genome-wide association studies in ancestrally diverse populations: opportunities, methods, pitfalls, and recommendations. Cell 179, 589–603 (2019).
  • 136.Buniello A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
  • 137.Lambert SA et al. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nat. Genet. 53, 420–425 (2021).
  • 138.Wilkinson MD et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
  • 139.Reales G. & Wallace C. Sharing GWAS summary statistics results in more citations. Commun. Biol. 6, 116 (2023).
  • 140.Little J. et al. STrengthening the REporting of Genetic Association Studies (STREGA): an extension of the STROBE statement. PLoS Med. 6, e22 (2009).
  • 141.Wand H. et al. Improving reporting standards for polygenic scores in risk prediction studies. Nature 591, 211–219 (2021).
  • 142.Khan AT et al. Recommendations on the use and reporting of race, ethnicity, and ancestry in genetic research: experiences from the NHLBI TOPMed program. Cell Genomics 2, 100155 (2022).
  • 143.Committee on the Use of Race, Ethnicity, and Ancestry as Population Descriptors in Genomics Research et al. Using Population Descriptors in Genetics and Genomics Research: A New Framework for an Evolving Field 26902 (National Academies Press, 2023).
  • 144.Wijsman EM Family-based approaches: design, imputation, analysis, and beyond. BMC Genet. 17, 9 (2016).
  • 145.Truong VQ et al. Quality control procedures for genome-wide association studies. Curr. Protoc. 2, e603 (2022).
  • 146.Hayhurst J. et al. A community driven GWAS summary statistics standard. Preprint at bioRxiv 10.1101/2022.07.15.500230 (2022).
  • 147.Sollis E. et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
  • 148.Ottman R. Gene-environment interaction: definitions and study designs. Prev. Med. 25, 764–770 (1996).
  • 149.Wright AF, Carothers AD & Campbell H. Gene-environment interactions — the BioBank UK study. Pharmacogenomics J. 2, 75–82 (2002).
  • 150.Olvera Alvarez HA, Appleton AA, Fuller CH, Belcourt A. & Kubzansky LD An integrated socio-environmental model of health and well-being: a conceptual framework exploring the joint contribution of environmental and social exposures to health and disease over the life span. Curr. Environ. Health Rep. 5, 233–243 (2018).
