American Journal of Human Genetics. 2023 Nov 2;110(11):1875–1887. doi: 10.1016/j.ajhg.2023.10.002

Factorizing polygenic epistasis improves prediction and uncovers biological pathways in complex traits

David Tang 1,2,, Jerome Freudenberg 1,3, Andy Dahl 1,∗∗
PMCID: PMC10645564  PMID: 37922884

Summary

Epistasis is central in many domains of biology, but it has not yet been proven useful for understanding the etiology of complex traits. This is partly because complex-trait epistasis involves polygenic interactions that are poorly captured in current models. To address this gap, we developed a model called Epistasis Factor Analysis (EFA). EFA assumes that polygenic epistasis can be factorized into interactions between a few epistasis factors (EFs), which represent latent polygenic components of the observed complex trait. The statistical goals of EFA are to improve polygenic prediction and to increase power to detect epistasis, while the biological goal is to unravel genetic effects into more-homogeneous units. We mathematically characterize EFA and use simulations to show that EFA outperforms current epistasis models when its assumptions approximately hold. Applied to predicting yeast growth rates, EFA outperforms the additive model for several traits with large epistasis heritability and uniformly outperforms the standard epistasis model. We replicate these prediction improvements in a second dataset. We then apply EFA to four previously characterized traits in the UK Biobank and find statistically significant epistasis in all four, including two that are robust to scale transformation. Moreover, we find that the inferred EFs partly recover pre-defined biological pathways for two of the traits. Our results demonstrate that more realistic models can identify biologically and statistically meaningful epistasis in complex traits, indicating that epistasis has potential for precision medicine and characterizing the biology underlying GWAS results.

Keywords: epistasis, complex trait, genetic architecture


Epistasis has been difficult to study in complex traits due to the statistical challenges of fitting polygenic interactions. Here, we develop a model called Epistasis Factor Analysis and demonstrate its utility for modeling epistasis effects in complex traits.

Introduction

Epistasis refers to interactions between genetic effects on a trait and is central to many domains of biology, including rare human disorders,1,2,3 protein evolution,4,5 natural selection,6,7 and functional genomics.8 Statistical models of epistasis can be useful for characterizing genetic architecture,9,10,11,12,13,14,15,16 improving genomic selection,17,18 and unbiasedly screening for unknown genomic mechanisms19,20,21,22 in model systems. Nonetheless, it remains debated whether epistasis is relevant for studying complex traits.23,24,25 This is primarily because complex-trait biology is poorly understood, which severely limits our ability to study epistatic interactions between causal genetic mechanisms.

Unbiased genome-wide statistical tests for epistasis could quantify and characterize epistasis in complex traits. However, modeling epistasis in complex traits is challenging because they are affected by a large number of genetic variants, i.e., they are polygenic (Figure 1A). Although studies have identified some specific examples of epistasis in complex traits,26,27,28 they have not explained significant missing heritability.29,30 We hypothesize this is partly due to shortcomings in current models of complex-trait epistasis. Specifically, prior studies have assumed that each SNP-SNP interaction is independent of all other additive and interaction effects (Figure 1B).31,32 This model is mathematically simple but biologically unrealistic and statistically under-powered.

Figure 1. Three genetic architectures consistent with a typical GWAS

(A) Additive: Each SNP has an independent effect on the phenotype, Y.

(B) Uncoordinated: Each SNP interacts randomly with all other SNPs.

(C) Coordinated: SNPs can be grouped into interacting factors. EFA uses the $P_1 \times P_2$ interaction to (1) improve statistical power to detect epistasis and (2) sort GWAS loci into factors enriched in distinct biological pathways.

Here, we develop a complex trait epistasis model called Epistasis Factor Analysis (EFA). EFA assumes a “coordinated” form of epistasis where SNP-SNP interactions are structured by interactions between a few latent polygenic components, which we call epistasis factors (EFs) (Figure 1C).33,34 EFA simultaneously partitions genome-wide association study (GWAS) effects into latent EFs and estimates the interactions between the EFs. The inferred EFs can be validated with known genomic annotations and, in principle, be used to discover novel biological pathways underlying GWAS results. A key feature of EFA is leveraging additive effects to improve estimates of epistasis effects, which is important because additive effects are usually much stronger in complex traits.

We illustrate the advantages of EFA over standard models in realistic simulations where heritability is mostly additive. We also show that EFA improves phenotype prediction over the additive model for traits with modest epistasis heritability in two multi-trait yeast datasets. We then apply EFA to the four UK Biobank (UKB) traits characterized by Sinnott-Armstrong et al.35 and find significant epistasis in all four, including two that are robust to phenotype scale. Finally, we find that the inferred EFs partly recover predefined biological pathways. Our results show that structured models of polygenic epistasis can improve statistical performance and biological understanding in complex traits.

Subjects and methods

EFA intuition and overview

Nearly all complex trait studies assume the additive model, and it remains debated whether epistasis matters in complex traits.24,25 However, this debate has largely been predicated on an “uncoordinated” model of epistasis that assumes that epistasis effects are independent of both additive effects and other epistasis effects.30,31,36 Uncoordinated epistasis is neither interpretable, nor realistic, nor statistically supported.30 In contrast, we recently proposed a more realistic “coordinated” model of epistasis where epistasis and additive effects are structured by latent factors (Figure 1C).34 Coordination is motivated by known forms of epistasis in simpler traits, such as genetic modifiers of a Mendelian disease.2

Here, we introduce EFA, a coordinated epistasis model that identifies these latent factors. EFA makes two key assumptions (A1 and A2) on the nature of epistasis in complex traits: (A1) SNP effects are mediated through a few distinct polygenic factors and (A2) these polygenic factors interact.

A1 and A2 are biologically motivated. For example, the factors in A1 could reflect distinct causal tissues,37,38 core genes,39,40 or heritable exposures like smoking.41 In accordance with A2, causal tissues might signal each other, core genes might encode proteins that physically interact, or an exposure’s effect may depend on underlying genetic risk. These assumptions are the same as the assumptions required for the coordinated epistasis model proposed in Sheppard et al.34 (Note S1).

The key idea in EFA is to share information between additive effects and epistasis effects. This connection is central to both of EFA’s goals: (1) leveraging additive signals greatly improves power to detect epistasis, and (2) leveraging epistasis signals provides a way to unbiasedly partition SNPs into distinct factors, which is impossible in the additive model without external data or biological annotations.

Mathematical description of EFA

EFA applies to a quantitative trait measured on N individuals, y, and a matrix of genotypes measured on N individuals at M loci, G. The EFA model is

$$y_i = P_{i1} + P_{i2} + 2\lambda P_{i1}P_{i2} + \varepsilon_i; \qquad P_{ik} := \sum_{j=1}^{M} G_{ij}U_{jk},$$ (Equation 1)

where $P_{ik}$ is the level of EF $k$ for sample $i$, $U_{jk}$ is the weight of SNP $j$ on EF $k$, and $\lambda$ is the interaction between EFs. In practice, we recommend using roughly 10 to 100 linkage disequilibrium (LD)-pruned SNPs in EFA that are chosen based on their additive effects. However, the optimal number will depend on the sample size and SNP effect-size distribution, and EFA can scale up to hundreds of SNPs (Figure S1).
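As a rough illustration of Equation 1, the sketch below computes EF levels and a phenotype for two factors. The function name `efa_mean`, the array sizes, and the Gaussian stand-in for genotypes are assumptions made for the example, not details taken from the paper.

```python
import numpy as np

def efa_mean(G, U, lam):
    """Noise-free EFA phenotype for K = 2 factors (Equation 1)."""
    P = G @ U  # N x 2 matrix of EF levels: P[i, k] = sum_j G[i, j] * U[j, k]
    return P[:, 0] + P[:, 1] + 2.0 * lam * P[:, 0] * P[:, 1]

rng = np.random.default_rng(0)
N, M = 500, 20                      # illustrative sizes only
G = rng.standard_normal((N, M))     # stand-in for centered, scaled genotypes
U = rng.standard_normal((M, 2)) / np.sqrt(M)
lam = 0.3                           # hypothetical factor-level interaction
y = efa_mean(G, U, lam) + rng.standard_normal(N)
```

Setting `lam` to zero collapses the model to the additive one with SNP effects equal to the row sums of `U`.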

While we primarily focus on the EFA model defined in Equation 1, we also present a generalized model in Equation 2 that allows for $K>2$ pathways and quadratic effects of each EF. We refer to this as the generalized EFA model.

$$y = GU\mathbf{1}_K + (GU \bullet GU)\,\mathrm{vec}(\Lambda) + \varepsilon$$ (Equation 2)

In Equation 2, $U$ is an $M \times K$ matrix of SNP effects on each of $K$ factors, $\Lambda$ is a symmetric $K \times K$ matrix of factor-level interactions, $\bullet$ is the face-splitting product ($A \bullet B$ is a matrix where each column is the element-wise product of a column of $A$ and a column of $B$), and $\varepsilon$ is independent and identically distributed (i.i.d.) Gaussian noise. Note that EFA is still a pairwise interaction model; $K$ refers to the number of latent pathways and EFA considers all $K(K+1)/2$ second-order interactions.
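The face-splitting product in Equation 2 can be sketched in a few lines of NumPy. The helper names `face_split` and `efa_general_mean` are hypothetical, and the column ordering is one convention among several; because $\Lambda$ is symmetric, row-major and column-major vectorization give the same result.

```python
import numpy as np

def face_split(A, B):
    """Face-splitting product: column (a, b) of the result is A[:, a] * B[:, b]."""
    n = A.shape[0]
    return (A[:, :, None] * B[:, None, :]).reshape(n, -1)

def efa_general_mean(G, U, Lam):
    """Noise-free mean of the generalized model (Equation 2)."""
    P = G @ U                                   # N x K matrix of EF levels
    return P.sum(axis=1) + face_split(P, P) @ Lam.reshape(-1)

# With K = 2 and no quadratic terms, Equation 2 should reduce to Equation 1.
rng = np.random.default_rng(0)
G = rng.standard_normal((50, 8))
U = rng.standard_normal((8, 2))
lam = 0.4
Lam = np.array([[0.0, lam], [lam, 0.0]])
P = G @ U
y_eq1 = P[:, 0] + P[:, 1] + 2.0 * lam * P[:, 0] * P[:, 1]
y_eq2 = efa_general_mean(G, U, Lam)
```

The factor of 2 in Equation 1 arises because the symmetric matrix contributes both the (1, 2) and (2, 1) entries.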

Intuitively, each column of $U$ represents the SNP weights on a different EF, and $P_{\cdot,k} := GU_{\cdot,k}$ is an individual’s weight on the $k$-th EF. $\Lambda_{kk'}$ represents the factor-level interaction between EFs $k$ and $k'$. When $\Lambda_{kk'}>0$, factors $k$ and $k'$ interact synergistically, meaning that their combined phenotypic effect is greater than the sum of their individual effects. The reverse is true for antagonistic factors, when $\Lambda_{kk'}<0$. When $\Lambda_{kk'}=0$, factors $k$ and $k'$ do not interact. Finally, $\Lambda_{kk}$ refers to the quadratic effect of pathway $k$, which can help identify the pathway weights in $U$ and improve prediction. Nonetheless, our primary interest is in the $\Lambda_{kk'}$ terms where $k \neq k'$, as we are motivated by unraveling interactions across distinct biological factors. We note that the generalized EFA model with quadratic terms is not identified without further assumptions, such as that $U$ is sparse. We prove these facts and comprehensively characterize the theoretical properties of EFA in Note S1.

EFA and generalized EFA are special cases of the standard pairwise polygenic epistasis model for complex traits,

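$$y_i = \sum_{j} G_{ij}\beta_j + \sum_{j,j'} G_{ij}G_{ij'}\omega_{jj'} + \varepsilon_i,$$ (Equation 3)

where $\beta_j$ is the additive effect of SNP $j$ and $\omega_{jj'}$ is the epistatic interaction between SNPs $j$ and $j'$. We can derive these SNP-level effects from EFA post hoc by

$$\beta = \sum_{k} U_{\cdot,k}; \qquad \omega = U\Lambda U^{T}.$$ (Equation 4)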

This implicit, low-rank factorization of the genome-wide epistasis matrix is where EFA derives its name. Moreover, the linked factorization across β and ω gives rise to a certain level of coordination as defined in Sheppard et al.34 (Note S1) and is how EFA leverages additive effects to estimate epistasis effects; the constraint on the EFs summing to β improves power to estimate ω when epistasis is coordinated.

Pragmatically, EFA outputs SNP weights on each EF (U) and the interaction between EFs (λ). These outputs can be used to reconstruct individual-level EFs (P), additive SNP effects (β, Equation 4), and SNP-SNP interaction effects (ω, Equation 4).
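The post hoc reconstruction in Equation 4 can be checked numerically. The sketch below uses made-up values for $U$ and $\Lambda$ (with $K=3$ chosen arbitrarily) and confirms that the SNP-level model in Equation 3 and the factor-level model give the same noise-free phenotype.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 40, 12, 3                      # illustrative sizes only
G = rng.standard_normal((N, M))
U = rng.standard_normal((M, K))
A = rng.standard_normal((K, K))
Lam = (A + A.T) / 2.0                    # symmetric factor-level interactions

# Equation 4: SNP-level effects implied by the factors
beta = U.sum(axis=1)                     # additive effects, length M
omega = U @ Lam @ U.T                    # M x M epistasis matrix of rank at most K

# Equation 3 evaluated with the implied SNP-level effects (no noise)
y_snp = G @ beta + np.einsum("ij,jk,ik->i", G, omega, G)

# Factor-level form: sum of EF levels plus their pairwise interactions
P = G @ U
y_factor = P.sum(axis=1) + np.einsum("ik,kl,il->i", P, Lam, P)
```

The low rank of `omega` is what keeps the number of free epistasis parameters far below the $M^2$ entries of an unconstrained matrix.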

Fitting EFA parameters

We estimate $U$ and $\Lambda$ in Equation 2 using maximum likelihood (Note S1). In general, we fit parameters using gradient descent based on automatic differentiation. We evaluate multiple random restarts, each initialized with $U_{\cdot,k}$ drawn from a neighborhood around 0. We use this approach in simulations and in the yeast analyses. This generic, flexible estimation method can easily accommodate future model extensions, e.g., introducing sparsity in $U$ with $\ell_1$ penalties or “anchoring” EFs to specific SNPs (Note S1).

To scale our method to UKB sample size, we derive a block coordinate descent algorithm applicable only to the special case where we disallow quadratic terms (Note S1). The updates are remarkably simple, requiring only iterative least-squares regressions that scale linearly in sample size and quadratically in SNPs (i.e., $O(NM^2)$). The key to this efficiency is that we only evaluate EF-EF interactions and never explicitly evaluate SNP-SNP interactions, which would cost $O(NM^4)$. With 100 SNPs, as in our UKB analyses, this is a 10,000-fold speedup.
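The exact updates are given in Note S1, which is not reproduced here. The sketch below is one plausible alternating least-squares scheme for Equation 1 and should be read as an assumption rather than the authors' algorithm: holding one EF and $\lambda$ fixed makes the model linear in the other EF's weights, and holding both EFs fixed makes it linear in $\lambda$. The function name, the nonzero starting value for $\lambda$, and the fixed iteration count are all illustrative choices.

```python
import numpy as np

def fit_efa_bcd(G, y, n_iter=50, lam_init=0.1, seed=0):
    """Alternating least squares for y = P1 + P2 + 2*lam*P1*P2 + noise."""
    rng = np.random.default_rng(seed)
    M = G.shape[1]
    U = 0.01 * rng.standard_normal((M, 2))    # start in a neighborhood of 0
    lam = lam_init                             # nonzero so the two EFs can separate
    rss = []
    for _ in range(n_iter):
        for k in (0, 1):
            other = G @ U[:, 1 - k]            # level of the other EF, held fixed
            # y - other = (G * (1 + 2*lam*other)) @ U_k + noise, linear in U_k
            X = G * (1.0 + 2.0 * lam * other)[:, None]
            U[:, k] = np.linalg.lstsq(X, y - other, rcond=None)[0]
        P = G @ U
        x = 2.0 * P[:, 0] * P[:, 1]            # EF-EF interaction regressor
        r = y - P[:, 0] - P[:, 1]
        lam = float(x @ r / max(x @ x, 1e-12)) # one-dimensional least squares
        resid = r - lam * x
        rss.append(float(resid @ resid))
    return U, lam, np.array(rss)

# Small synthetic check with made-up parameters
rng = np.random.default_rng(1)
G = rng.standard_normal((2000, 10))
U_true = rng.standard_normal((10, 2)) / np.sqrt(10)
P_true = G @ U_true
y = P_true.sum(axis=1) + 2 * 0.5 * P_true[:, 0] * P_true[:, 1]
y = y + 0.5 * rng.standard_normal(2000)
U_hat, lam_hat, rss = fit_efa_bcd(G, y)
```

Each block update is an exact least-squares minimization, so the residual sum of squares should not increase across sweeps, although convergence to a global optimum is not guaranteed and random restarts may be needed. Each regression involves an $N \times M$ design matrix, which matches the $O(NM^2)$ cost quoted above.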

Uncoordinated models of polygenic epistasis

We compare EFA with other polygenic models in simulations and real data analyses. All comparison methods are special cases of the overarching polygenic pairwise epistasis model in Equation 3.

First, we compare with the standard additive model, which ignores epistasis by setting $\omega_{jj'}=0$ for all $j,j'$. Because we mostly study scenarios with $M<N$, we primarily fit the additive model with fixed effects (by ordinary least squares). The predictions from this model are then essentially polygenic scores. We also fit the additive model with random effects (assuming each $\beta_j$ is i.i.d. Gaussian, Note S2) in the real yeast analysis, enabling us to fit the full genome-wide data without pruning.

Second, we compare with the standard pairwise epistasis model in Equation 3 fit with fixed effects. Because this model has $O(M^2)$ parameters, jointly fitting all pairwise epistasis effects is often noisy or even impossible. Instead, we fit each pairwise interaction $\omega_{jj'}$ one at a time (for $j \neq j'$) while controlling for additive fixed effects (note that only $\omega_{jj'}+\omega_{j'j}$ is identified).

Third, we compare with the uncoordinated random effect model for pairwise epistasis.31 This model assumes that $\omega_{jj'}$ are drawn i.i.d. Gaussian. This assumption simplifies computation because the $\omega$ can be easily marginalized out of the likelihood function. We fit the variance components (one for $\beta$ and one for $\omega$) with maximum likelihood. We derive the best linear unbiased predictors (BLUPs) for both $\omega$ and for phenotype prediction in Note S2. We note that EFA is more similar to the fixed-effects uncoordinated model than the random-effect model.
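A minimal sketch of the one-at-a-time fixed-effect scan follows. The function name `pairwise_scan`, the inclusion of an intercept, and the even split of the identified sum across the two symmetric entries are assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

def pairwise_scan(G, y):
    """Fit each SNP-SNP interaction separately, controlling for all additive effects."""
    N, M = G.shape
    omega = np.zeros((M, M))
    ones = np.ones((N, 1))
    for j, k in combinations(range(M), 2):
        product = (G[:, j] * G[:, k])[:, None]
        X = np.hstack([ones, G, product])      # intercept, additive terms, one interaction
        coef = np.linalg.lstsq(X, y, rcond=None)[0]
        # Only omega[j, k] + omega[k, j] is identified; split it evenly by convention.
        omega[j, k] = omega[k, j] = coef[-1] / 2.0
    return omega

# Noise-free toy example with a single true interaction between SNPs 0 and 1
rng = np.random.default_rng(0)
G = rng.standard_normal((200, 6))
beta = rng.standard_normal(6)
c = 0.7                                        # hypothetical interaction strength
y = G @ beta + c * G[:, 0] * G[:, 1]
omega_hat = pairwise_scan(G, y)
```

The scan runs $M(M-1)/2$ regressions, each with $M+2$ columns, which is why it becomes noisy and slow as the number of SNPs grows.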

Finally, we also develop an approach to estimate latent pathways from uncoordinated estimates by applying principal component analysis (PCA) to the estimated ω matrix. This is a naive baseline approach that only involves post-processing uncoordinated estimates. In contrast, EFA directly learns pathways, linking the low-dimensional components of ω with the additive effects.
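The PCA baseline can be approximated as below. This sketch uses an eigendecomposition of the symmetrized estimate and keeps the components with the largest absolute eigenvalues; whether the authors center or scale the matrix first is not stated in this section, so those details are assumptions.

```python
import numpy as np

def omega_factors(omega, K=2):
    """Leading K eigenvectors of a symmetrized epistasis matrix."""
    sym = (omega + omega.T) / 2.0
    vals, vecs = np.linalg.eigh(sym)           # eigenvalues in ascending order
    order = np.argsort(-np.abs(vals))[:K]      # keep largest magnitudes, either sign
    return vecs[:, order], vals[order]

# Toy check: an exactly coordinated omega built from two made-up weight vectors
rng = np.random.default_rng(0)
u1 = rng.standard_normal(15)
u2 = rng.standard_normal(15)
omega = 0.5 * (np.outer(u1, u2) + np.outer(u2, u1))
V, top_vals = omega_factors(omega, K=2)
u1_projected = V @ (V.T @ u1)                  # projection of u1 onto the recovered span
```

The eigenvectors recover only the subspace spanned by the factor weights, not the individual factors, which is one reason this post-processing is a weaker baseline than fitting the factors jointly with the additive effects.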

Simulations

We used a polygenic simulation framework that is fully described in Note S3. In brief, we simulated allele frequencies at independent SNPs from a beta(2,2) distribution and diploid genotypes from a binomial distribution parameterized by the randomly chosen allele frequency. We centered and scaled genotypes to remove the ambiguity between variance “attributable to” vs. variance “caused by” additive effects, which can cause confusion.24,25

We then simulated phenotypes under the EFA model (Equation 2) with the simulated genotypes. We simulated the latent pathways, $U_{\cdot,k}$, from i.i.d. Gaussians to assign SNP weights on each EF and then rescaled the EFs to give the appropriate additive heritability (recall that the additive effect is related to the latent pathways by $\beta = \sum_{k} U_{\cdot,k}$). Next, we simulated the upper triangular entries of $\Lambda$ from i.i.d. Gaussians with variance chosen to give the appropriate epistasis heritability. Using these key parameters, we generated the pairwise polygenic epistasis effects by $\omega = \sum_{k,k'} \Lambda_{kk'} U_{\cdot,k} U_{\cdot,k'}^{T}$.

We also simulated partly coordinated epistasis by adding i.i.d. Gaussian random variables to each entry of the ω matrix simulated above from the EFA model. By varying the relative contribution of these two pieces, we can interpolate between the EFA model (coordinated) and the classical epistasis model (uncoordinated). By choosing the relative variances of these different parameters appropriately, we are able to simulate the standard additive model, the standard uncoordinated model, and the coordinated EFA model.

Unless otherwise specified, we simulated $N=1000$ individuals and $M=20$ SNPs. We fixed the broad-sense heritability at $H^2=0.5$ and set both the additive and epistasis heritabilities at $h_{\mathrm{add}}^2=h_{\mathrm{epi}}^2=0.25$ such that the total heritability was evenly partitioned between additive and epistasis effects. As a baseline, we simulated $K=2$ pathways without quadratic terms and perfect coordination between additive and epistasis effects.
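The baseline simulation can be sketched as follows. The full procedure is in Note S3; this version is a simplification that rescales the additive and epistatic components in-sample to hit the target heritabilities (which is equivalent to rescaling $U$ and $\lambda$) and clips allele frequencies away from 0 and 1 to avoid monomorphic SNPs. Both choices, and the function name, are assumptions of the example.

```python
import numpy as np

def simulate_efa(N=1000, M=20, h2_add=0.25, h2_epi=0.25, seed=0):
    """Simulate genotypes and a two-factor EFA phenotype with target heritabilities."""
    rng = np.random.default_rng(seed)
    freq = np.clip(rng.beta(2.0, 2.0, size=M), 0.05, 0.95)  # clipping is an added guard
    G = rng.binomial(2, freq, size=(N, M)).astype(float)    # diploid genotypes
    G = (G - G.mean(axis=0)) / G.std(axis=0)                # center and scale each SNP
    U = rng.standard_normal((M, 2))
    P = G @ U
    add = P.sum(axis=1)
    epi = 2.0 * P[:, 0] * P[:, 1]
    epi = epi - epi.mean()
    add = add * np.sqrt(h2_add / add.var())                 # set additive variance
    epi = epi * np.sqrt(h2_epi / epi.var())                 # set epistatic variance
    noise = rng.standard_normal(N)
    noise = noise * np.sqrt((1.0 - h2_add - h2_epi) / noise.var())
    return {"G": G, "y": add + epi + noise, "add": add, "epi": epi, "noise": noise}

sim = simulate_efa()
```

Sample covariances between the three components are not forced to zero, so the realized phenotypic variance will be close to, but not exactly, 1.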

Finally, we used two alternative simulation frameworks to evaluate the calibration of our bootstrap test with respect to LD and dominance, which are described fully in Note S3.

Yeast data

We downloaded genotype and phenotype data for prototrophic haploid segregants from a cross between a laboratory strain and a wine strain of yeast.13 This dataset consisted of 46 quantitative traits for 1,008 samples genotyped at 11,623 unique markers. We standardized each phenotype and SNP to mean 0 and variance 1. Each phenotype represented growth rate on a different medium as measured by endpoint colony size assays. To account for LD in fixed-effect models, we used PLINK 1.9 (ref. 42) to clump and threshold variants at a threshold of $r^2<0.2$, minimum distance of 250 kb, and marginal additive $p<0.05$. We performed clumping and thresholding separately on each cross-validated fold to avoid bias from overfitting.43

Additionally, we downloaded a second yeast growth trait dataset from the same group with a larger sample size (N=4390).14 This second dataset contained 20/46 of the growth phenotypes in the first dataset, and it included 28,220 unique genotype markers for each sample. We preprocessed this dataset in the same manner as above.

We evaluated phenotype prediction accuracy with 10-fold cross-validation, using the same 10 folds across all methods so that we could test for differential prediction accuracy with paired t tests.
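The shared-fold comparison can be sketched as below. The two competing predictors here (ordinary least squares on all SNPs versus on the first five SNPs) are placeholders chosen only to make the example run; the helper names are hypothetical. The paired t statistic is computed by hand and would be compared against a t distribution with 9 degrees of freedom.

```python
import numpy as np

def make_folds(n, n_folds=10, seed=0):
    """Random partition of sample indices, reused across every method."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), n_folds)

def cv_r2(G, y, fit_predict, folds):
    """Out-of-sample squared correlation for each fold."""
    scores = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        pred = fit_predict(G[train], y[train], G[test_idx])
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1] ** 2)
    return np.array(scores)

def paired_t(a, b):
    """Paired t statistic for per-fold scores a and b."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

def ols_all(G_tr, y_tr, G_te):
    return G_te @ np.linalg.lstsq(G_tr, y_tr, rcond=None)[0]

def ols_first5(G_tr, y_tr, G_te):
    return G_te[:, :5] @ np.linalg.lstsq(G_tr[:, :5], y_tr, rcond=None)[0]

rng = np.random.default_rng(0)
G = rng.standard_normal((300, 10))
y = G @ rng.standard_normal(10) + rng.standard_normal(300)
folds = make_folds(len(y))
scores_all = cv_r2(G, y, ols_all, folds)
scores_sub = cv_r2(G, y, ols_first5, folds)
t_stat = paired_t(scores_all, scores_sub)
```

Reusing one fold assignment is what makes the per-fold differences meaningful; with independently drawn folds the pairing would be lost.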

UKB data

This project used the UKB Resource under application number 30397 and application number 89052. All UKB participants provided informed consent. We studied the same four traits in UKB that were extensively characterized in Sinnott-Armstrong et al.35: urate, insulin-like growth factor 1 (IGF-1), testosterone in males, and testosterone in females. We mainly focused on analyzing unrelated “White British” individuals as defined by UKB. We used the top 100 LD-pruned GWAS hits from Sinnott-Armstrong et al.35 or all 77 GWAS hits for female testosterone. Prior to fitting EFA, we regressed out sex, age, batch, assessment center, and the top 10 genotype PCs from the phenotype.

We assessed statistical significance using 499 bootstrap samples to estimate standard errors and confidence intervals on the pathway-level interaction, $\lambda$. We computed empirical p values as $2\cdot\min(B_{\lambda<0}, B_{\lambda>0})$, where $B_{\lambda<0}$ is the number of bootstrap samples where EFA estimates $\lambda<0$ (and vice versa for $B_{\lambda>0}$). We also report Gaussian p values obtained from Z scores computed as the bootstrap mean divided by the bootstrap standard deviation.
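The two p values can be sketched from a vector of bootstrap estimates. The text above gives the empirical statistic as a count; turning it into a probability requires a normalization that is not specified in this section, so the add-one convention used below is an assumption and may differ from the authors' choice. The bootstrap values in the example are made up.

```python
import math
import numpy as np

def bootstrap_pvalues(lam_boot):
    """Empirical sign-based p value and Gaussian p value from bootstrap estimates."""
    lam_boot = np.asarray(lam_boot, dtype=float)
    B = lam_boot.size
    n_neg = int((lam_boot < 0).sum())
    n_pos = int((lam_boot > 0).sum())
    # Assumed normalization: add-one correction, capped at 1
    p_emp = min(1.0, 2.0 * (min(n_neg, n_pos) + 1) / (B + 1))
    z = lam_boot.mean() / lam_boot.std(ddof=1)   # bootstrap mean over bootstrap SD
    p_gauss = math.erfc(abs(z) / math.sqrt(2.0)) # two-sided normal tail probability
    return p_emp, z, p_gauss

rng = np.random.default_rng(0)
lam_boot = 0.2 + 0.05 * rng.standard_normal(499)  # hypothetical bootstrap draws
p_emp, z, p_gauss = bootstrap_pvalues(lam_boot)
```

The empirical p value is bounded below by the number of replicates, whereas the Gaussian p value can be arbitrarily small when the Z score is large.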

To test the biological significance of our inferred EFs, we used the annotations of GWAS hits to biological pathways provided in Sinnott-Armstrong et al.35 We note that EFA outputs a continuous assignment of SNPs to EFs via the effect sizes in U; discretely mapping SNPs to EFs requires setting thresholds on EF-specific effect sizes.

We assigned SNPs to the epistasis factors by (1) polarizing SNPs to have positive additive effect sizes and (2) choosing the largest 10 effects on each EF after subtracting out additive effects from the EFs. To test robustness, we also evaluated results using the top 15 or 20 SNPs per EF (Tables S3–S7). We calculated p values for enrichment of these top SNPs in biological pathways with a hypergeometric test comparing the top 10 SNPs for each EF with the other 90 SNPs.
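The enrichment test can be sketched with an exact hypergeometric tail computed from binomial coefficients. The one-sided (upper-tail) form and the example counts (20 annotated SNPs, 6 of them among the top 10) are assumptions for illustration, not values from the paper.

```python
from math import comb

def hypergeom_upper_tail(k, n_total, n_annotated, n_top):
    """P(X >= k) when n_top SNPs are drawn without replacement from n_total,
    of which n_annotated belong to the pathway."""
    upper = min(n_annotated, n_top)
    total = comb(n_total, n_top)
    hits = sum(
        comb(n_annotated, x) * comb(n_total - n_annotated, n_top - x)
        for x in range(k, upper + 1)
    )
    return hits / total

# Hypothetical example: 100 GWAS hits, 20 annotated to a pathway,
# and 6 of the top 10 SNPs on one EF carry that annotation.
p_enrich = hypergeom_upper_tail(6, 100, 20, 10)
```

Because the comparison is between the top 10 SNPs and the remaining 90, the population size is the full set of 100 SNPs and the draw size is 10.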

To test the portability of the EFs estimated in White British individuals, we performed ordinary linear regression in other populations by

$$\left(Y_{\mathrm{pop}} - G_{\mathrm{pop}}\hat{U}_1^{(wB)} - G_{\mathrm{pop}}\hat{U}_2^{(wB)}\right) \sim 2\cdot G_{\mathrm{pop}}\hat{U}_1^{(wB)} \times G_{\mathrm{pop}}\hat{U}_2^{(wB)}.$$ (Equation 5)

This yielded estimates and p values for population-specific values of $\hat{\lambda}$ based on the same EFs learned in the White British population, $\hat{U}^{(wB)}$. In this regression, $Y_{\mathrm{pop}}$ and $G_{\mathrm{pop}}$ are the (covariate residualized) phenotypes and genotypes of individuals from a given ancestry and $\hat{U}^{(wB)}$ is the EF effect sizes learned in White British individuals. As with most portability analyses, we expect these population-specific estimates of $\lambda$ to be attenuated relative to the estimate in the White British population because our EFs are learned in that population.
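The portability regression amounts to a single-regressor least-squares fit once the EF weights are held fixed. The sketch below assumes a regression without an intercept and a classical standard error; the function name and the toy data are hypothetical.

```python
import numpy as np

def portability_lambda(Y_pop, G_pop, U_hat):
    """Regress the additive residual on twice the product of the two fixed EFs."""
    P = G_pop @ U_hat                          # EF levels using weights learned elsewhere
    r = Y_pop - P[:, 0] - P[:, 1]              # phenotype minus the two EF main effects
    x = 2.0 * P[:, 0] * P[:, 1]                # interaction regressor
    lam = float(x @ r / (x @ x))
    resid = r - lam * x
    se = float(np.sqrt(resid @ resid / (len(r) - 1) / (x @ x)))
    return lam, se

# Noise-free toy example in which the true interaction is recovered exactly
rng = np.random.default_rng(0)
G_pop = rng.standard_normal((300, 10))
U_hat = rng.standard_normal((10, 2)) / np.sqrt(10)
P = G_pop @ U_hat
Y_pop = P[:, 0] + P[:, 1] + 2.0 * 0.25 * P[:, 0] * P[:, 1]
lam_pop, se_pop = portability_lambda(Y_pop, G_pop, U_hat)
```

Because the weights are not re-estimated in the target population, any mismatch in effect sizes or LD shows up as attenuation of the estimated interaction.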

We defined these populations based on self-reported “ethnic background” (UKB data-field 21000). Specifically, we use two top-level categories (Asian or Asian British, shortened to “Asian,” and Black or Black British, shortened to “Black”). The third population we consider is the “White” individuals after excluding “British,” which we term non-British European. We exclude all White British individuals used to fit EFA and all individuals related to another individual in UKB.

Results

EFA recovers true pathways and improves prediction in simulations

We performed simulations to characterize EFA as a function of additive heritability ($h_{\mathrm{add}}^2$), epistasis heritability ($h_{\mathrm{epi}}^2$), and the fraction of epistasis heritability due to coordinated vs. uncoordinated effects (subjects and methods). We compared EFA with uncoordinated epistasis estimates fit either by fixed or random effects (subjects and methods). For each parameter setting, we ran 100 independent simulations and report the mean estimation accuracy for the pairwise SNP-SNP interactions $\omega_{jj'}$ and for the EFs $U_{jk}$. EFA directly fits EFs and implicitly fits $\omega$ (using Equation 4 in the subjects and methods). Uncoordinated methods directly fit $\omega$, and we derive uncoordinated EFs post hoc using PCs of the estimated $\omega$ matrix (subjects and methods).

First, we varied the relative contribution of additive and epistasis effects ($h_{\mathrm{add}}^2$ vs. $h_{\mathrm{epi}}^2$). We find that EFA outperforms uncoordinated estimates of $\omega$ across the range of $h_{\mathrm{epi}}^2$, which is expected as we simulate from the EFA model (Figure 2A). EFA’s gain is greatest when $h_{\mathrm{add}}^2 \gg h_{\mathrm{epi}}^2$, illustrating how EFA’s epistasis estimates borrow strength from additive effects. Analogously, EFA improves additive effect estimates over standard additive models when $h_{\mathrm{epi}}^2 \gg h_{\mathrm{add}}^2$ (Figure S2), though this scenario is highly unlikely in practice. Additionally, EFA substantially outperforms uncoordinated estimates of EFs, which is expected because EFA specifically targets these low-dimensional factors (Figure 2B). This also shows how EFA leverages additive signals to improve EF estimates: when $h_{\mathrm{add}}^2=0$, EFA actually underperforms the uncoordinated model, whereas EFA remains accurate even when heritability is 90% additive.

Figure 2. Simulations characterize the accuracy of polygenic epistasis estimates

Points show the average across 100 simulation replicates, and the error bars depict the standard errors. Accuracy is quantified by the squared correlation between estimated and true pairwise SNP epistasis effects, $\omega$ (A, C, E), or between estimated and true EFs, $U$ (B, D).

(A and B) Simulations under the EFA model vary the fraction of heritability due to additive vs. epistasis effects.

(C and D) Partial coordination is simulated by combining epistasis effects from the EFA model with uncoordinated epistasis effects.

(E) Model misspecification is evaluated by simulating and fitting EFA with different values of $K$, the number of EFs in the EFA model. In (B) and (D), the EF $R^2$ is nonzero even without any coordinated epistasis because the EFs still capture the additive effects in our simulation framework.

Next, we fixed $h_{\mathrm{add}}^2=h_{\mathrm{epi}}^2=25\%$ and varied the level of coordination by adding in uncoordinated epistasis effects (subjects and methods). EFA’s performance decays as coordination weakens, while the uncoordinated estimates remain stable because the uncoordinated model is agnostic to the structure of epistasis (Figure 2C). Nonetheless, EFA produces accurate estimates of the EFs even at mild levels of correlation (Figure 2D). Moreover, EFA always outperforms the uncoordinated estimates because EFA directly targets the coordinated component of epistasis.

To investigate EFA’s performance when the number of latent factors, $K$, is misspecified, we simulated from the EFA model (subjects and methods) with a true $K$ and fit different EFA models with $K$ being potentially misspecified. As expected, $\omega$ estimates are more accurate when fitting EFA with the true value of $K$, though the base EFA model is robust to modest misspecification of $K$ (Figure 2E). Again, the uncoordinated estimates remain stable as $K$ increases and eventually outperform the EFA estimates. This is unsurprising, as the $K=\infty$ case is equivalent to uncoordinated epistasis34; intuitively, the independent pathways become individually negligible and wash out. We did not consider EF estimation accuracy because EFs are not identified for $K>2$ (Note S1).

We next profiled the computational and statistical scalability of EFA by varying the number of SNPs, $M$, and sample size, $N$, while simulating from the EFA model with $h_{\mathrm{add}}^2=9\%$ and $h_{\mathrm{epi}}^2=1\%$ (Figure S1). Across all $N$ and $M$, EFA always outperforms uncoordinated estimates of $\omega$ and the EFs. Performances for all methods decay as $M$ grows because each individual SNP effect is smaller. This illustrates why modeling epistasis in complex traits is difficult. Likewise, all methods improve as $N$ grows. In terms of run time, EFA is comparable with the uncoordinated model, fitting hundreds of SNPs with a reasonable computational demand. However, we note that fitting EFA requires an iterative algorithm, meaning that EFA’s run times depend on its convergence rate while the uncoordinated model has a steady run time guaranteed by its analytic solution.

Finally, we asked whether EFA can improve phenotype prediction in our baseline EFA simulation. We found that EFA performs well, with out-of-sample prediction $R^2$ near the theoretical limit of $H^2$ (Figure 3A). EFA always outperforms the fixed-effect uncoordinated model as we vary the fraction of heritability that is additive, and EFA outperforms the random-effect uncoordinated model except when epistasis is absent.

Figure 3. Phenotype prediction accuracy in simulations

Simulations and plots were conducted and created as in Figure 2. The y axis depicts the out-of-sample phenotype prediction accuracy as measured by $R^2$. The x axis varies the fraction of total heritability due to additive vs. epistasis effects (A) or the fraction of epistasis heritability due to coordinated vs. uncoordinated epistasis (B). The black dotted line indicates the broad-sense heritability, $H^2$, which is the upper bound for $R^2$, and the pink dotted line indicates the narrow-sense heritability, $h^2$, which is the upper bound for $R^2$ with the additive model.

We then fixed $h_{\mathrm{epi}}^2=h_{\mathrm{add}}^2=25\%$ and varied the fraction of epistasis that is coordinated, as above. Results were similar to $\omega$ estimation accuracy in Figure 2C: when epistasis is mostly coordinated, EFA is optimal, but EFA performs poorly when epistasis is uncoordinated (Figure 3B). Overall, EFA improves estimation of latent pathways in all our tests and improves prediction when epistasis is sufficiently strong and coordinated.

EFA improves prediction in complex yeast traits

We next asked whether EFA could improve phenotype prediction in complex traits. Using data from Bloom et al., we generated genetic predictions of 46 quantitative yeast traits measured on 1,008 samples, where each trait is defined as the yeast's growth rate on a different medium.13 These traits have varying levels of epistasis heritability, as defined by the difference between the phenotypic similarity of clones ($H^2$) and the additive heritability estimated from genotype data ($h^2$). We measured prediction accuracy by the squared correlation between predicted and observed phenotypes using 10-fold cross-validation. For each trait, we clumped+thresholded SNPs, which yielded 30–70 roughly independent SNPs ($r^2<0.2$, subjects and methods).

We first compared EFA with the additive model and found that EFA significantly improved prediction accuracy for 6/46 traits ($p<0.05$, paired t test, subjects and methods, Table S1). These 6 traits all had large epistasis heritability as indicated by differences between $H^2$ and $h^2$ (Figure 4A). More generally, the traits that are furthest to the bottom-right of the diagonal in Figure 4A tended to be the traits for which EFA has the largest improvement over the additive model. Concretely, the difference between $R^2_{\mathrm{EFA}}$ and $R^2_{\mathrm{add}}$ correlated with the epistasis heritability (Spearman’s $\rho=0.48$, $p<0.001$). This is consistent with our simulations showing that EFA can outperform additive predictions when $h_{\mathrm{epi}}^2$ is higher. By comparison, the additive model significantly improved prediction over EFA for 20/46 traits. Finally, EFA predictions were superior to uncoordinated epistasis predictions using fixed effects for all 46 traits (Figure 4B).

Figure 4. Phenotype prediction accuracy across complex yeast traits. Prediction accuracy is measured by $R^2$ between the out-of-sample predicted phenotypes and the true phenotypes in 10-fold cross-validation

Each point is a trait whose x and y coordinates are the broad- and narrow-sense heritability estimates from Bloom et al.13 (A, B) or Bloom et al.14 (C, D). Traits are colored by the difference between EFA prediction $R^2$ and the prediction $R^2$ of the additive model (A, C) or the uncoordinated epistasis model (B, D), where green indicates EFA is superior and purple indicates EFA is inferior.

(A) EFA significantly outperforms the additive model for 6/46 traits from Bloom et al., 2013 data.

(C) EFA significantly outperforms the additive model for 7/20 traits from Bloom et al., 2015 data.

(B and D) EFA always outperforms the uncoordinated fixed-effect epistasis model.

Next, we compared EFA with additive and epistasis random-effects models. Unlike fixed-effect models, including EFA, random-effect models can fit the whole genome without LD pruning, which is often a major advantage for prediction in complex traits.18,44,45 Nonetheless, EFA significantly outperforms both additive and uncoordinated epistasis random-effect models for 7/46 traits (p<0.05, Figure S3). We also compared the 2 random-effect models, and we found that the uncoordinated epistasis model significantly outperformed the additive model for 9/46 traits (p<0.05). Intriguingly, these 9 traits only overlap 1 of the 7 traits where EFA outperforms the additive random-effect model. Together with our simulations in Figure 3B, this suggests that the degree of coordination varies across traits.

Finally, we evaluated replication in a second dataset from the same group with 4,390 samples measured on 20 of the above 46 traits (Bloom et al.,14 Figure S4). We analyze all 20 traits measured in Bloom et al.,14 which are a subset of the 46 traits in Bloom et al. 2013.13 We replicated the superiority of EFA over the additive model for all 3 traits where EFA beat the additive model in the first dataset (p<0.05; the other 3/6 traits were not measured in the second dataset). Further, at this larger sample size, we find 4 additional traits where EFA significantly outperforms the additive fixed-effect model (Figure 4C; Table S2). We also replicate that EFA always outperforms the uncoordinated fixed-effects model (Figure 4D). Overall, EFA improves prediction over the additive model for some complex traits, and the utility of EFA will likely grow with larger sample sizes.

EFA detects epistasis and recovers known pathways in complex human traits

We applied EFA to complex human traits in UKB. We studied the same 4 traits that were extensively characterized in Sinnott-Armstrong et al.:35 IGF-1, urate, testosterone in males, and testosterone in females. We used the top 100 LD-pruned GWAS hits for each trait (or all 77 for female testosterone) and fit EFA using our coordinate descent algorithm on the genotypes and phenotypes of unrelated White British individuals in the UKB (subjects and methods). While recent work has suggested that regions of long-range LD may be enriched for epistasis,46 we chose conservatively to prune our SNPs as LD can generate subtle, non-additive statistical signals.47,48

We first asked whether EFA detects statistically significant epistasis using the bootstrap to test whether λ=0. Specifically, we resample individuals to calculate the sampling distribution of λ̂, the factor-level epistasis, which yields empirical p values and confidence intervals (subjects and methods). We used simulations to confirm that this bootstrap test is calibrated under the null model where SNPs are additive (Figure S5). Additionally, we performed realistic simulations with true epistasis and found that EFA’s λ estimates are unbiased and that EFA’s bootstrap test is powerful. A caveat is that our bootstrap confidence intervals are liberal in the presence of true epistasis, but this does not cause false-positive tests of epistasis.
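The resampling logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the names `percentile`, `ols_slope`, and `bootstrap_test` are hypothetical, the toy estimator is a simple regression slope standing in for a full EFA fit of λ, and the two-sided p value convention (doubling the smaller tail and flooring at 1/(B+1) for B replicates) is an assumption, since the exact formula is specified in subjects and methods rather than in this section.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolation quantile of pre-sorted values, with 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def ols_slope(sample):
    """Toy estimator (stand-in for an EFA fit): slope of y on x."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    sxx = sum((x - mx) ** 2 for x, _ in sample)
    sxy = sum((x - mx) * (y - my) for x, y in sample)
    return sxy / sxx


def bootstrap_test(data, estimator, n_boot=499, seed=0):
    """Resample individuals with replacement; return a 95% CI and a p value."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(estimator(sample))
    reps.sort()
    # Percentile interval from the 0.025 and 0.975 bootstrap quantiles.
    ci = (percentile(reps, 0.025), percentile(reps, 0.975))
    # Assumed convention: double the smaller tail around 0,
    # floor at 1 / (n_boot + 1), and cap at 1.
    below = sum(r <= 0 for r in reps) / n_boot
    above = sum(r >= 0 for r in reps) / n_boot
    p_value = min(1.0, max(2 * min(below, above), 1 / (n_boot + 1)))
    return ci, p_value


# Toy data: y depends on x with slope 0.5 plus Gaussian noise.
toy_rng = random.Random(1)
toy = []
for _ in range(200):
    x = toy_rng.gauss(0, 1)
    toy.append((x, 0.5 * x + toy_rng.gauss(0, 0.5)))

ci, p_value = bootstrap_test(toy, ols_slope)
print("95% CI:", ci, "empirical p:", p_value)
```

Passing the estimator as a function keeps the resampling step independent of the model, so the same routine could in principle wrap any procedure that returns a single λ estimate per resampled dataset.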

The EFA bootstrap test finds significant epistasis for all 4 tested traits (all empirical p<0.05/4; Figures 5E–5H and Table 1). For IGF-1 and male and female testosterone, λ is positive, while λ is negative for urate. Because we use 499 bootstrap replicates, the empirical p values are noisy and cannot be lower than 1/500. If we further assume that our estimate of λ is roughly Gaussian under the additive model, we get much lower and more numerically precise p values (Table 1). Our simulations suggest that this Gaussian approximation is conservative (Figure S6). To test robustness under the null, we repeated our analysis after permuting phenotypes. As expected, our bootstrap test was null (Table 1 and Figure S7).
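To make the two kinds of p values concrete, the sketch below shows the resolution limit of an empirical p value with 499 replicates and a generic two-sided Gaussian p value. It is a hedged illustration only: the function names are hypothetical, and `null_sd` stands for a standard deviation of the λ estimate under the additive model, however it is obtained; this section does not specify that step, so the inputs in the example are invented numbers rather than values from Table 1.

```python
import math


def min_empirical_p(n_boot):
    """Smallest attainable empirical p value with n_boot replicates."""
    return 1 / (n_boot + 1)


def gaussian_two_sided_p(estimate, null_sd):
    """Two-sided p value for H0: parameter = 0 under a normal approximation.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = estimate / null_sd
    return math.erfc(abs(z) / math.sqrt(2))


print(min_empirical_p(499))            # resolution limit of the bootstrap test
print(gaussian_two_sided_p(0.4, 0.1))  # hypothetical estimate and null SD
```

Because the Gaussian tail has no resolution floor, it can distinguish between signals that the empirical test reports identically at its minimum value.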

Figure 5.

EFA identifies statistically and biologically significant epistasis in 4 complex human traits

(A–D) EFA jointly partitions the additive effects of GWAS-significant SNPs (x axis) into two EFs (y axis: difference between the EFs and the additive effects, i.e., U_jk − β_j/2).

(E–H) Bootstrap replicates show that EFA identifies significant epistasis in all 4 traits. Each histogram displays the distribution of λ̂ across bootstrap replicates. Red lines indicate the symmetric 95% confidence intervals calculated empirically from the bootstrap distribution. For visualization, 14 values of λ are excluded across all plots (out of a total of 6,000). See Figure S7 for comparable plots on quantile-normalized and permuted phenotypes.

(I–L) Distribution of estimated interaction between White British EFs in the Black, Asian, and non-British European populations in UK Biobank. Distributions obtained from standard linear regression of phenotypes on the product between EFs estimated in White British UK Biobank participants (see subjects and methods). Gaussian distribution mean and standard deviation given by the regression coefficient point estimate and standard errors, respectively.

Table 1.

EFA finds statistically significant evidence of epistasis in complex human traits

Analysis / statistic   IGF-1           Urate             Testosterone (male)   Testosterone (female)

Standard
  Confidence interval  (0.25, 0.60)    (−0.50, −0.27)    (0.39, 0.70)          (1.19, 1.78)
  Empirical p val      0.002           0.006             0.002                 0.002
  Gaussian p val       3.52e−04        1.16e−07          0.011                 2.75e−04

QNorm
  Confidence interval  (−0.30, 0.37)   (−0.57, −0.36)    (−0.46, 0.42)         (0.63, 1.04)
  Empirical p val      0.570           0.002             0.826                 0.002
  Gaussian p val       0.689           7.67e−15          0.846                 3.72e−13

Permuted
  Confidence interval  (−0.28, 0.32)   (−0.47, 0.00)     (−0.46, 0.41)         (−1.48, 0.75)
  Empirical p val      0.730           0.066             0.618                 0.274
  Gaussian p val       0.749           0.090             0.642                 0.984

Empirical 95% confidence intervals were computed using the 0.025 and 0.975 quantiles of the bootstrap distribution. All p values are two sided.

Certain forms of epistasis can be entirely removed by a non-linear transformation of the phenotype onto a scale in which all effects are additive.49 Therefore, we tested robustness to scale by quantile-normalizing phenotypes (Figure S7). We found that urate and female testosterone remained highly significant after quantile normalization (empirical p=0.002,0.002; Gaussian p=7.7e-15,3.7e-13). Moreover, the sign and magnitude of λ remained stable after this transformation. However, quantile normalization eliminated the EFA signal for male testosterone and IGF-1, which emphasizes the importance of evaluating scale dependence in interaction tests. Overall, our results increase confidence that the EFA signals for urate and female testosterone are not scale dependent, while the discrepancies for IGF-1 and male testosterone could either be false positives on the original scale or false negatives on the transformed scale.34
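As an illustration of the scale transformation discussed above, the following sketch applies a rank-based inverse normal transform, one common way to quantile-normalize a phenotype. It is an assumption that this matches the normalization used here: the function name `quantile_normalize`, the (rank + 0.5)/n offset, and the tie handling are illustrative choices, not details taken from the study.

```python
from statistics import NormalDist


def quantile_normalize(values):
    """Map values to standard-normal quantiles by rank.

    Ties are broken by input order, which is a simplification.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps probabilities strictly inside (0, 1).
        out[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


# A right-skewed toy phenotype becomes symmetric after the transform.
skewed = [0.2, 1.5, 0.7, 30.0, 3.1]
print(quantile_normalize(skewed))
```

The transform is monotone, so it preserves the ordering of individuals while discarding the original scale. This is why an interaction that survives it is less likely to be a pure scale artifact, and also why a genuine interaction may lose power on the transformed scale.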

Next, given the complexities of testing epistasis in complex traits, we performed a new set of simulations to evaluate our bootstrap-based test in several realistic scenarios that cause false positives in many epistasis tests. These simulations use the same broad framework as in Figure 2 in that they simulate from the EFA model, but instead of evaluating estimation accuracy, these simulations measure power and false-positive rates of our bootstrap-based tests.

First, we simulated a dominance model without any epistasis between distinct SNPs (Note S3). Under modest levels of dominance (h²_dom=0.1%),30,50 the bootstrap test is roughly null (Figure S5). Under stronger dominance (h²_dom=1%), however, the bootstrap test becomes significant (Figure S5). Both results are expected: dominance is not technically a false positive under EFA (unlike our prior even-odd test34), but EFA primarily targets inter-SNP epistasis; hence, the EFA test has low power if dominance effects are the only form of epistasis.

Second, we simulated an additive model with unobserved causal variants in LD with measured variants. Our simulation meets the necessary condition for “phantom” epistasis in de Los Campos et al.,47 which can cause severe bias in pairwise SNP-SNP epistasis tests (Note S3). For EFA, however, we do not observe any bias from ungenotyped variants (Figure S5). Intuitively, the phantom epistasis generated by this particular form of unmodeled LD is uncoordinated, so it cancels out at the level of pathway interactions. Nonetheless, in principle, other forms of LD could bias EFA estimates. Overall, the structured and polygenic nature of EFA substantially mitigates the LD biases that can plague other epistasis tests.18,48,51,52,53

We next asked whether EFA captures biologically meaningful epistasis by comparing estimated EFs with the known biological pathways defined in Sinnott-Armstrong et al.35 Specifically, we used a hypergeometric test for enrichment of the top 10 SNPs per EF, defined by EF-specific effect size, in each predefined biological pathway (relative to the other 90 GWAS SNPs, Table 2, subjects and methods).

Table 2.

Epistasis factors are enriched in predefined biological pathways

Trait                  Biological pathway                # SNPs in EF1  # SNPs in EF2  # Top marginal SNPs  # SNPs total
IGF-1                  downstream signaling              1 (p=0.284)    0 (p=1.000)    0 (p=1.000)          3
IGF-1                  growth horm. secr.                0 (p=1.000)    1 (p=0.361)    2 (p=0.053)          4
IGF-1                  IGF-1 secretion                   0 (p=1.000)    1 (p=0.361)    0 (p=1.000)          4
IGF-1                  IGF-1 serum balance               0 (p=1.000)    1 (p=0.549)    3 (p=0.023)          7
IGF-1                  Ras signaling                     0 (p=1.000)    1 (p=0.430)    0 (p=1.000)          5
Urate                  solute transport                  3 (p=0.175)    5 (p=0.007)    7 (p=4.1e-5)         15
Testosterone (male)    steroid synthesis and metabolism  0 (p=1.000)    1 (p=0.478)    2 (p=0.109)          6
Testosterone (male)    HPG signaling                     0 (p=1.000)    0 (p=1.000)    2 (p=0.077)          5
Testosterone (male)    serum homeostasis                 0 (p=1.000)    3 (p=0.001)    1 (p=0.273)          3
Testosterone (female)  steroid synthesis and metabolism  1 (p=0.638)    0 (p=1.000)    2 (p=0.223)          7

We show pathways that have been assigned >2 SNPs used in EFA. p values are from a hypergeometric test of the top 10 EF SNPs in each biological pathway relative to the other 90 SNPs.
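The enrichment p values in Table 2 come from a one-sided hypergeometric tail, which can be sketched as follows. The function name `enrichment_p` and its argument names are hypothetical, and the example calls assume exactly 100 SNPs for male testosterone with the top 10 per EF tested, as described in the text; this is an illustration of the test rather than the code used in the study.

```python
from math import comb


def enrichment_p(n_total, n_pathway, n_top, n_overlap):
    """P(X >= n_overlap) when n_top SNPs are drawn without replacement
    from n_total SNPs, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_top)
    tail = sum(
        comb(n_pathway, k) * comb(n_total - n_pathway, n_top - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_top)


# Male testosterone, "serum homeostasis": all 3 pathway SNPs fall among
# the top 10 of 100, so the tail equals C(10,3) / C(100,3) = 120 / 161700.
print(enrichment_p(100, 3, 10, 3))

# Male testosterone, "steroid synthesis and metabolism": 1 of 6 in the top 10.
print(enrichment_p(100, 6, 10, 1))
```

Under these assumptions the two calls give approximately 0.0007 and 0.478, in line with the rounded entries 0.001 and 0.478 for EF2 in Table 2.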

We find that EF2 for male testosterone is enriched in “serum homeostasis” (p=0.001, Table 2). This enrichment is driven by SNPs assigned to SHBG and SLCO1B3, which lie on different chromosomes (17 and 12). In urate, EF2 is enriched in “solute transport” (p=0.007, Table 2), which is driven solely by a single locus with large effect (SLC2A9). This urate signal could either reflect genuine cis-trans interactions or could merely reflect subtle LD with unmeasured additive effects and/or dominance effects.30 We confirmed these enrichments are robust if we instead use the top 15 or 20 SNPs per EF (Tables S3–S6). The results are also robust if we increase the window size used to assign SNPs to genes (Tables S3–S6). Altogether, EFA is capable of recovering biologically meaningful pathways using the SNPs that contribute to each EF.

For comparison, we next asked whether the top 10/100 additive SNPs are enriched in these pathways compared with the bottom 90/100 using the same hypergeometric test (subjects and methods). We find exactly 1 significant enrichment, which is for the same solute transport pathway in urate that EFA identifies. While these analyses provide context for the EFA enrichments, they are not directly comparable as they evaluate different parameters (additive vs. epistasis enrichments).

Finally, we examined whether EFA is portable across ancestries. Specifically, we tested replication in each of 3 additional populations defined by self-reported ethnicity (subjects and methods): non-British European, Asian, and Black. Because the sample sizes for these populations are much smaller in UKB, we did not attempt to fully replicate our EFA results. Instead, we built EFs from the EFs learned in the White British population, akin to constructing polygenic scores in a test dataset. We then estimate λ by fitting a regression of the phenotype onto the interactions between these EFs in each population (subjects and methods).
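The portability analysis described above can be sketched as a two-step procedure: score individuals in the target population with EF weights learned elsewhere, then regress the phenotype on the product of the two scores. Everything below is illustrative: the weights `u1` and `u2`, the genotypes, and the function names are hypothetical, and the regression includes only an intercept and the product term, which is an assumption because any additional covariates are described in subjects and methods rather than here.

```python
import math


def factor_scores(genotypes, weights):
    """Weighted sum of allele counts per individual, like a polygenic score."""
    return [sum(g * w for g, w in zip(row, weights)) for row in genotypes]


def interaction_regression(phenotype, ef1, ef2):
    """OLS slope and standard error for phenotype ~ intercept + ef1 * ef2."""
    x = [a * b for a, b in zip(ef1, ef2)]
    n = len(x)
    mx = sum(x) / n
    my = sum(phenotype) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, phenotype))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, phenotype))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_err


# Hypothetical target-population genotypes (6 individuals x 4 SNPs).
genotypes = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [0, 2, 2, 1],
    [1, 0, 0, 0],
    [2, 2, 1, 2],
]
# Hypothetical EF weights learned in a discovery population.
u1 = [0.4, -0.1, 0.3, 0.0]
u2 = [0.0, 0.5, -0.2, 0.3]

ef1 = factor_scores(genotypes, u1)
ef2 = factor_scores(genotypes, u2)
# Noise-free toy phenotype with a known interaction of 0.3.
phenotype = [1.0 + 0.3 * a * b for a, b in zip(ef1, ef2)]

lam_hat, lam_se = interaction_regression(phenotype, ef1, ef2)
print(lam_hat, lam_se)
```

The point estimate and standard error returned here play the role of the Gaussian mean and standard deviation plotted in Figures 5I–5L.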

Overall, the regression-based estimates of λ are broadly consistent across the Black, Asian, and non-British European populations (Figures 5I–5L). These estimates are attenuated but sign consistent with the EFA estimates from the White British population, which is expected when porting genetic scores between populations. However, the power of these tests varied substantially due to differences across traits and sample size differences across populations. In urate, for example, all populations had highly significant and similar estimates of λ (all three p<4e-5, Table S7). In male testosterone, at the other extreme, the estimates were not significant and were inconsistent across populations. IGF-1 and female testosterone lie between, with consistent estimates of λ across populations that were significant in the larger non-British European population but not the smaller Black and Asian populations. Altogether, this suggests that the interactions learned by EFA are generally portable across ancestries.

Discussion

Despite strong evidence for epistasis in many domains of biology, it remains unclear whether modeling epistasis adds biological or statistical value in complex traits. We introduced a model called EFA to address this question. In contrast to existing uncoordinated models of polygenic epistasis, which lack biological motivation, EFA assumes a structured form of polygenic epistasis driven by latent factors (Figure 1C). We showed that EFA outperforms additive predictions in several yeast traits and replicated this result in a second dataset. Additionally, we found that EFA can identify epistasis in complex human traits where standard models fail and that the EFs can partly recover known biological pathways without prior information. Finally, we showed the EF interaction replicates well across populations.

We found that EFA can improve prediction in model organisms and simulations, and it is likely that epistasis will eventually improve prediction in natural populations. However, the within-population prediction gain will likely be modest because additive models partly capture epistasis effects—especially when causal variants have been driven to lower frequencies by natural selection.24,54 It is possible that epistasis predictions will improve portability across ancestries and it is promising that the EFs are somewhat portable, but epistasis has not yet been shown to contribute to this important problem.55,56 Therefore, the greater promise of epistasis models in complex human traits is not better statistical prediction but rather better biological understanding of GWAS results. Our results showing that EFs are enriched in predefined genomic annotations provide support for this idea.

Many polygenic epistasis models have been previously developed. The standard model, which we call “uncoordinated,”34 assumes that all SNP-SNP epistasis effects are independent of each other and of additive effects.31 While uncoordinated models are rigorous and easy to fit, they are biologically implausible and statistically under-powered.30 More recent polygenic epistasis models make parsimonious assumptions to improve power and interpretation. For example, autosome-sex interactions represent a particular form of structured epistasis that is strongly supported in diverse complex traits57,58,59,60 (though many of these interactions are at least partly due to gender, not sex). As another example, interactions between a single locus and a polygenic background can identify epistasis hubs.61,62,63,64,65,66 Finally, biologically annotated neural network (BANN) models complex traits as epistatic compositions of simpler pathways, like EFA, though BANN pathways are more numerous, less polygenic, and require prior biological annotations.67

There are several important limitations to our study. First, like linear regression, EFA can fit dozens to thousands of genetic variants (Figure S1) but cannot jointly fit the entire genome. In practice, we pre-screen SNPs based on additive signals, such as selecting only GWAS-significant SNPs in UKB. This is because SNPs with additive effects are enriched for epistasis effects.24,68 Second, our polygenic epistasis tests do not establish interactions between any given SNP pair. In the future, sparse models (Bayesian or frequentist) could test which SNPs affect which EFs. Third, following Sinnott-Armstrong et al.,35 we have only studied 4 relatively simple complex human traits, and it will be important in the future to evaluate EFA more broadly across more complex traits.

Another set of concerns is statistical false positives from phenotype scale and/or complex LD patterns. First, it is well known that scale transformations of an additive trait can induce interactions.34,49,69,70 Nonetheless, 2 of our 4 human EFA signals are qualitatively identical on the natural and quantile-normalized scales, and coordination is theoretically robust to modest rescaling.34 Second, LD with unmeasured causal variants can cause dramatic and replicating false positives, especially in small cis windows with large effects.18,48,51,52,53 We partly address this by LD-pruning SNPs, and we also use simulations to show that EFA’s polygenic nature reduces sensitivity to locus-specific LD biases. More broadly, EFA’s utility for prediction and biological characterization persists even if EFA signals reflect scaling and/or subtle LD patterns. Overall, these caveats are crucial for interpreting interactions, but they are not likely to impact our central conclusions.

There are several extensions to EFA that may prove useful. First, EFA could jointly model multiple traits that partly share EFs. This would add power to detect shared EFs and provide a rich decomposition of pleiotropy that goes beyond simple genetic correlation.71,72,73,74,75 Second, EFA could model complex diseases in a generalized linear-model framework, which is closely related to the idea of limiting disease pathways33 and disease subtyping.59 While the heterogeneous genetic architectures of binary case-control disease traits are naturally explained by the pathway-level model in EFA,34,59 extending EFA to model such traits requires care as the forms of scale-dependence issues that generate non-additive signals can be particularly extreme for binary traits.49,76 Third, we have focused on SNP interactions, but in principle, EFA is capable of modeling more powerful and/or interpretable genetic features such as imputed gene expression,77,78 copy number variants,27,28,79 or polygenic scores for secondary traits80 or exposures.41 EFA could also incorporate non-genetic variables such as epigenomic marks, medical image-derived features,81 disease symptoms, or comorbidities from electronic health records. Fourth, it would be useful to quantify the variance explained by coordinated vs. uncoordinated epistasis, but extending EFA to a variance component model remains challenging because coordination cannot be modeled as a simple Gaussian random effect. Ultimately, we hope that the EFA model and its strong results in real data provide a solid step toward unraveling epistasis in complex human traits.

Data and code availability

Acknowledgments

We thank Sasha Gusev, Hyunkyung Kim, Sriram Sankararaman, Matthew Stephens, and Noah Zaitlen for helpful feedback. We also thank the participants in UKB for making this study possible. Finally, we thank the Center for Research Informatics and the Research Computing Center for providing the computer resources necessary for carrying out this project. A.D. is supported by K25HL157603.

Author contributions

D.T. developed statistical methodology, performed analysis, and wrote the manuscript. J.F. performed analysis. A.D. conceived and supervised the project and wrote the manuscript.

Declaration of interests

The authors declare no competing interests.

Published: November 2, 2023

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.ajhg.2023.10.002.

Contributor Information

David Tang, Email: davidtang@g.harvard.edu.

Andy Dahl, Email: andywdahl@uchicago.edu.

Supplemental information

Document S1. Figures S1–S8, Tables S1–S7, Notes S1 and S2
mmc1.pdf (10MB, pdf)
Table S8. IGF-1 EFA results
mmc2.csv (11.4KB, csv)
Table S9. Urate EFA results
mmc3.csv (11.6KB, csv)
Table S10. Male testosterone EFA results
mmc4.csv (11.3KB, csv)
Table S11. Female testosterone EFA results
mmc5.csv (9KB, csv)
Document S2. Article plus supplemental information
mmc6.pdf (13.3MB, pdf)

References

  • 1.Cutting G.R. Modifier genes in mendelian disorders: the example of cystic fibrosis. Ann. N. Y. Acad. Sci. 2010;1214:57–69. doi: 10.1111/j.1749-6632.2010.05879.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Cutting G.R. Cystic fibrosis genetics: from molecular understanding to clinical application. Nat. Rev. Genet. 2015;16:45–56. doi: 10.1038/nrg3849. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Timberlake A.T., Choi J., Zaidi S., Lu Q., Nelson-Williams C., Brooks E.D., Bilguvar K., Tikhonova I., Mane S., Yang J.F., et al. Two locus inheritance of non-syndromic midline craniosynostosis via rare SMAD6 and common BMP2 alleles. Elife. 2016;5 doi: 10.7554/eLife.20125. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Starr T.N., Thornton J.W. Epistasis in protein evolution. Protein Sci. 2016;25:1204–1218. doi: 10.1002/pro.2897. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Bakerlee C.W., Nguyen Ba A.N., Shulgina Y., Rojas Echenique J.I., Desai M.M. Idiosyncratic epistasis leads to global fitness-correlated trends. Science. 2022;376:630–635. doi: 10.1126/science.abm4774. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Corbett-Detig R.B., Zhou J., Clark A.G., Hartl D.L., Ayroles J.F. Genetic incompatibilities are widespread within species. Nature. 2013;504:135–137. doi: 10.1038/nature12678. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Barton N.H. How does epistasis influence the response to selection? Heredity. 2017;118:96–109. doi: 10.1038/hdy.2016.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Lin X., Liu Y., Liu S., Zhu X., Wu L., Zhu Y., Zhao D., Xu X., Chemparathy A., Wang H., et al. Nested epistasis enhancer networks for robust genome regulation. Science. 2022;377:1077–1085. doi: 10.1126/science.abk3512. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Brem R.B., Storey J.D., Whittle J., Kruglyak L. Genetic interactions between polymorphisms that affect gene expression in yeast. Nature. 2005;436:701–703. doi: 10.1038/nature03865. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Shao H., Burrage L.C., Sinasac D.S., Hill A.E., Ernest S.R., O’Brien W., Courtland H.W., Jepsen K.J., Kirby A., Kulbokas E.J., et al. Genetic architecture of complex traits: large phenotypic effects and pervasive epistasis. Proc. Natl. Acad. Sci. USA. 2008;105:19910–19914. doi: 10.1073/pnas.0810388105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Sittig L.J., Carbonetto P., Engel K.A., Krauss K.S., Barrios-Camacho C.M., Palmer A.A. Genetic background limits generalizability of genotype-phenotype relationships. Neuron. 2016;91:1253–1259. doi: 10.1016/j.neuron.2016.08.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Huang W., Richards S., Carbone M.A., Zhu D., Anholt R.R.H., Ayroles J.F., Duncan L., Jordan K.W., Lawrence F., Magwire M.M., et al. Epistasis dominates the genetic architecture of drosophila quantitative traits. Proc. Natl. Acad. Sci. USA. 2012;109:15553–15559. doi: 10.1073/pnas.1213423109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Bloom J.S., Ehrenreich I.M., Loo W.T., Lite T.L.V., Kruglyak L. Finding the sources of missing heritability in a yeast cross. Nature. 2013;494:234–237. doi: 10.1038/nature11867. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Bloom J.S., Kotenko I., Sadhu M.J., Treusch S., Albert F.W., Kruglyak L. Genetic interactions contribute less than additive effects to quantitative trait variation in yeast. Nat. Commun. 2015;6:8712. doi: 10.1038/ncomms9712. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Mackay T.F.C. Epistasis and quantitative traits: using model organisms to study gene–gene interactions. Nat. Rev. Genet. 2014;15:22–33. doi: 10.1038/nrg3627. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Forsberg S.K.G., Bloom J.S., Sadhu M.J., Kruglyak L., Carlborg Ö. Accounting for genetic interactions improves modeling of individual quantitative trait phenotypes in yeast. Nat. Genet. 2017;49:497–503. doi: 10.1038/ng.3800. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Jiang Y., Reif J.C. Modeling epistasis in genomic selection. Genetics. 2015;201:759–768. doi: 10.1534/genetics.115.177907. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Schrauf M.F., Martini J.W.R., Simianer H., de Los Campos G., Cantet R., Freudenthal J., Korte A., Munilla S. Phantom epistasis in genomic selection: On the predictive ability of epistatic models. G3 (Bethesda) 2020;10:3137–3145. doi: 10.1534/g3.120.401300. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Hickey K.L., Dickson K., Cogan J.Z., Replogle J.M., Schoof M., D’Orazio K.N., Sinha N.K., Hussmann J.A., Jost M., Frost A., et al. GIGYF2 and 4EHP inhibit translation initiation of defective messenger RNAs to assist ribosome-associated quality control. Mol. Cell. 2020;79:950–962.e6. doi: 10.1016/j.molcel.2020.07.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Norman T.M., Horlbeck M.A., Replogle J.M., Ge A.Y., Xu A., Jost M., Gilbert L.A., Weissman J.S. Exploring genetic interaction manifolds constructed from rich single-cell phenotypes. Science. 2019;365:786–793. doi: 10.1126/science.aax4438. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Horlbeck M.A., Xu A., Wang M., Bennett N.K., Park C.Y., Bogdanoff D., Adamson B., Chow E.D., Kampmann M., Peterson T.R., et al. Mapping the Genetic Landscape of Human Cells. Cell. 2018;174:953–967.e22. doi: 10.1016/j.cell.2018.06.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Dixit A., Parnas O., Li B., Chen J., Fulco C.P., Jerby-Arnon L., Marjanovic N.D., Dionne D., Burks T., Raychowdhury R., et al. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell. 2016;167:1853–1866.e17. doi: 10.1016/j.cell.2016.11.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Carlborg Ö., Haley C.S. Epistasis: too often neglected in complex trait studies? Nat. Rev. Genet. 2004;5:618–625. doi: 10.1038/nrg1407. [DOI] [PubMed] [Google Scholar]
  • 24.Hill W.G., Goddard M.E., Visscher P.M. Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits. PLoS Genet. 2008;4 doi: 10.1371/journal.pgen.1000008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Huang W., Mackay T.F.C. The genetic architecture of quantitative traits cannot be inferred from variance component analysis. PLoS Genet. 2016;12 doi: 10.1371/journal.pgen.1006421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Schrode N., Ho S.M., Yamamuro K., Dobbyn A., Huckins L., Matos M.R., Cheng E., Deans P.J.M., Flaherty E., Barretto N., et al. Synergistic effects of common schizophrenia risk variants. Nat. Genet. 2019;51:1475–1485. doi: 10.1038/s41588-019-0497-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Weiner D.J., Wigdor E.M., Ripke S., Walters R.K., Kosmicki J.A., Grove J., Samocha K.E., Goldstein J.I., Okbay A., Bybjerg-Grauholm J., et al. Polygenic transmission disequilibrium confirms that common and rare variation act additively to create risk for autism spectrum disorders. Nat. Genet. 2017;49:978–985. doi: 10.1038/ng.3863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Bergen S.E., Ploner A., Howrigan D., CNV Analysis Group and the Schizophrenia Working Group of the Psychiatric Genomics Consortium. O’Donovan M.C., Smoller J.W., Sullivan P.F., Sebat J., Neale B., Kendler K.S. Joint contributions of rare copy number variants and common SNPs to risk for schizophrenia. Am. J. Psychiatry. 2019;176:29–35. doi: 10.1176/appi.ajp.2018.17040467. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Manolio T.A., Collins F.S., Cox N.J., Goldstein D.B., Hindorff L.A., Hunter D.J., McCarthy M.I., Ramos E.M., Cardon L.R., Chakravarti A., et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. doi: 10.1038/nature08494. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Hivert V., Sidorenko J., Rohart F., Goddard M.E., Yang J., Wray N.R., Yengo L., Visscher P.M. Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals. Am. J. Hum. Genet. 2021;108:786–798. doi: 10.1016/j.ajhg.2021.04.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Cockerham C.C. An Extension of the Concept of Partitioning Hereditary Variance for Analysis of Covariances among Relatives When Epistasis Is Present. Genetics. 1954;39:859–882. doi: 10.1093/genetics/39.6.859. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Henderson C.R. Best linear unbiased prediction of nonadditive genetic merits in noninbred populations. J. Anim. Sci. 1985;60:111–117. [Google Scholar]
  • 33.Zuk O., Hechter E., Sunyaev S.R., Lander E.S. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc. Natl. Acad. Sci. USA. 2012;109:1193–1198. doi: 10.1073/pnas.1119675109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Sheppard B., Rappoport N., Loh P.R., Sanders S.J., Zaitlen N., Dahl A. A model and test for coordinated polygenic epistasis in complex traits. Proc. Natl. Acad. Sci. USA. 2021;118 doi: 10.1073/pnas.1922305118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Sinnott-Armstrong N., Naqvi S., Rivas M., Pritchard J.K. GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background. Elife. 2021;10 doi: 10.7554/eLife.58615. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Young A.I., Frigge M.L., Gudbjartsson D.F., Thorleifsson G., Bjornsdottir G., Sulem P., Masson G., Thorsteinsdottir U., Stefansson K., Kong A. Relatedness disequilibrium regression estimates heritability without environmental bias. Nat. Genet. 2018;50:1304–1310. doi: 10.1038/s41588-018-0178-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Finucane H.K., Bulik-Sullivan B., Gusev A., Trynka G., Reshef Y., Loh P.R., Anttila V., Xu H., Zang C., Farh K., et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Finucane H.K., Reshef Y.A., Anttila V., Slowikowski K., Gusev A., Byrnes A., Gazal S., Loh P.R., Lareau C., Shoresh N., et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 2018;50:621–629. doi: 10.1038/s41588-018-0081-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Boyle E.A., Li Y.I., Pritchard J.K. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell. 2017;169:1177–1186. doi: 10.1016/j.cell.2017.05.038. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Liu X., Li Y.I., Pritchard J.K. Trans Effects on Gene Expression Can Drive Omnigenic Inheritance. Cell. 2019;177:1022–1034.e6. doi: 10.1016/j.cell.2019.04.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Ma Y., Patil S., Zhou X., Mukherjee B., Fritsche L.G. ExPRSweb: An online repository with polygenic risk scores for common health-related exposures. Am. J. Hum. Genet. 2022;109:1742–1760. doi: 10.1016/j.ajhg.2022.09.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A.R., Bender D., Maller J., Sklar P., de Bakker P.I.W., Daly M.J., Sham P.C. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Moscovich A., Rosset S. On the cross-validation bias due to unsupervised preprocessing. J. Roy. Stat. Soc. B. 2022;84:1474–1502. [Google Scholar]
  • 44.Vilhjálmsson B.J., Yang J., Finucane H.K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.R., Bhatia G., Do R., et al. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am. J. Hum. Genet. 2015;97:576–592. doi: 10.1016/j.ajhg.2015.09.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Márquez-Luna C., Gazal S., Loh P.R., Kim S.S., Furlotte N., Auton A., 23andMe Research Team. Price A.L. Incorporating functional priors improves polygenic prediction accuracy in UK biobank and 23andme data sets. Nat. Commun. 2021;12:6052. doi: 10.1038/s41467-021-25171-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Singhal P., Veturi Y., Dudek S.M., Lucas A., Frase A., van Steen K., Schrodi S.J., Fasel D., Weng C., Pendergrass R., et al. Evidence of epistasis in regions of long-range linkage disequilibrium across five complex diseases in the UK Biobank and eMERGE datasets. Am. J. Hum. Genet. 2023;110:575–591. doi: 10.1016/j.ajhg.2023.03.007.
  • 47.de Los Campos G., Sorensen D.A., Toro M.A. Imperfect linkage disequilibrium generates phantom epistasis (& perils of big data). G3 (Bethesda). 2019;9:1429–1436. doi: 10.1534/g3.119.400101.
  • 48.Hemani G., Powell J.E., Wang H., Shakhbazov K., Westra H.J., Esko T., Henders A.K., McRae A.F., Martin N.G., Metspalu A., et al. Phantom epistasis between unlinked loci. Nature. 2021;596:E1–E3. doi: 10.1038/s41586-021-03765-z.
  • 49.Sverdlov S., Thompson E.A. The Epistasis Boundary: Linear vs. Nonlinear Genotype-Phenotype Relationships. Preprint at bioRxiv. 2018. doi: 10.1101/503466.
  • 50.Pazokitoroudi A., Chiu A.M., Burch K.S., Pasaniuc B., Sankararaman S. Quantifying the contribution of dominance deviation effects to complex trait variation in biobank-scale data. Am. J. Hum. Genet. 2021;108:799–808. doi: 10.1016/j.ajhg.2021.03.018.
  • 51.Hemani G., Shakhbazov K., Westra H.J., Esko T., Henders A.K., McRae A.F., Yang J., Gibson G., Martin N.G., Metspalu A., et al. Detection and replication of epistasis influencing transcription in humans. Nature. 2014;508:249–253. doi: 10.1038/nature13005. [Retracted]
  • 52.Brown A.A., Buil A., Viñuela A., Lappalainen T., Zheng H.F., Richards J.B., Small K.S., Spector T.D., Dermitzakis E.T., Durbin R. Genetic interactions affecting human gene expression identified by variance association mapping. Elife. 2014;3 doi: 10.7554/eLife.01381.
  • 53.Fish A.E., Capra J.A., Bush W.S. Are interactions between cis-regulatory variants evidence for biological epistasis or statistical artifacts? Am. J. Hum. Genet. 2016;99:817–830. doi: 10.1016/j.ajhg.2016.07.022.
  • 54.Sackton T.B., Hartl D.L. Genotypic context and epistasis in individuals and populations. Cell. 2016;166:279–287. doi: 10.1016/j.cell.2016.06.047.
  • 55.Saitou M., Dahl A., Wang Q., Liu X. Allele frequency differences of causal variants have a major impact on low cross-ancestry portability of PRS. Preprint at medRxiv. 2022. doi: 10.1101/2022.10.21.22281371.
  • 56.Hou K., Ding Y., Xu Z., Wu Y., Bhattacharya A., Mester R., Belbin G.M., Buyske S., Conti D.V., Darst B.F., et al. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat. Genet. 2023;55:549–558. doi: 10.1038/s41588-023-01338-6.
  • 57.Rawlik K., Canela-Xandri O., Tenesa A. Evidence for sex-specific genetic architectures across a spectrum of human complex traits. Genome Biol. 2016;17:166. doi: 10.1186/s13059-016-1025-x.
  • 58.Khramtsova E.A., Davis L.K., Stranger B.E. The role of sex in the genomics of human complex traits. Nat. Rev. Genet. 2019;20:173–190. doi: 10.1038/s41576-018-0083-1.
  • 59.Dahl A., Zaitlen N. Genetic influences on disease subtypes. Annu. Rev. Genomics Hum. Genet. 2020;21:413–435. doi: 10.1146/annurev-genom-120319-095026.
  • 60.Oliva M., Muñoz-Aguirre M., Kim-Hellmuth S., Wucher V., Gewirtz A.D.H., Cotter D.J., Parsana P., Kasela S., Balliu B., Viñuela A., et al. The impact of sex on gene expression across human tissues. Science. 2020;369 doi: 10.1126/science.aba3066.
  • 61.Jannink J.L. Identifying Quantitative Trait Locus by Genetic Background Interactions in Association Studies. Genetics. 2007;176:553–561. doi: 10.1534/genetics.106.062992.
  • 62.Crawford L., Zeng P., Mukherjee S., Zhou X. Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits. PLoS Genet. 2017;13 doi: 10.1371/journal.pgen.1006869.
  • 63.Akdemir D., Jannink J.L. Locally epistatic genomic relationship matrices for genomic association and prediction. Genetics. 2015;199:857–871. doi: 10.1534/genetics.114.173658.
  • 64.Rau C.D., Gonzales N.M., Bloom J.S., Park D., Ayroles J., Palmer A.A., Lusis A.J., Zaitlen N. Modeling epistasis in mice and yeast using the proportion of two or more distinct genetic backgrounds: Evidence for “polygenic epistasis”. PLoS Genet. 2020;16 doi: 10.1371/journal.pgen.1009165.
  • 65.Smith S.P., Darnell G., Udwin D., Harpak A., Ramachandran S., Crawford L. Accounting for statistical non-additive interactions enables the recovery of missing heritability from GWAS summary statistics. Preprint at bioRxiv. 2023. doi: 10.1101/2022.07.21.501001.
  • 66.Turchin M.C., Darnell G., Crawford L., Ramachandran S. Pathway analysis within multiple human ancestries reveals novel signals for epistasis in complex traits. Preprint at bioRxiv. 2020. doi: 10.1101/2020.09.24.312421.
  • 67.Demetci P., Cheng W., Darnell G., Zhou X., Ramachandran S., Crawford L. Multi-scale inference of genetic trait architecture using biologically annotated neural networks. PLoS Genet. 2021;17:e1009754. doi: 10.1371/journal.pgen.1009754.
  • 68.Marchini J., Donnelly P., Cardon L.R. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat. Genet. 2005;37:413–417. doi: 10.1038/ng1537.
  • 69.Domingue B.W., Kanopka K., Mallard T.T., Trejo S., Tucker-Drob E.M. Modeling interaction and dispersion effects in the analysis of gene-by-environment interaction. Behav. Genet. 2022;52:56–64. doi: 10.1007/s10519-021-10090-8.
  • 70.Domingue B.W., Kanopka K., Trejo S., Rhemtulla M., Tucker-Drob E.M. Ubiquitous bias and false discovery due to model misspecification in analysis of statistical interactions: The role of the outcome’s distribution and metric properties. Psychol. Methods. 2022 doi: 10.1037/met0000532.
  • 71.Ballard J.L., O’Connor L.J. Shared components of heritability across genetically correlated traits. Am. J. Hum. Genet. 2022;109:989–1006. doi: 10.1016/j.ajhg.2022.04.003.
  • 72.Border R., Athanasiadis G., Buil A., Schork A.J., Cai N., Young A.I., Werge T., Flint J., Kendler K.S., Sankararaman S., et al. Cross-trait assortative mating is widespread and inflates genetic correlation estimates. Science. 2022;378:754–761. doi: 10.1126/science.abo2059.
  • 73.Udler M.S., Kim J., von Grotthuss M., Bonàs-Guarch S., Cole J.B., Chiou J., Anderson C.D. on behalf of METASTROKE and the ISGC, Boehnke M., Laakso M., et al. Type 2 diabetes genetic loci informed by multi-trait associations point to disease mechanisms and subtypes: A soft clustering analysis. PLoS Med. 2018;15 doi: 10.1371/journal.pmed.1002654.
  • 74.Liley J., Todd J.A., Wallace C. A method for identifying genetic heterogeneity within phenotypically defined disease subgroups. Nat. Genet. 2017;49:310–316. doi: 10.1038/ng.3751.
  • 75.Werme J., van der Sluis S., Posthuma D., de Leeuw C.A. An integrated framework for local genetic correlation analysis. Nat. Genet. 2022;54:274–282. doi: 10.1038/s41588-022-01017-y.
  • 76.Kendler K.S., Gardner C.O. Interpretation of interactions: guide for the perplexed. Br. J. Psychiatry. 2010;197:170–171. doi: 10.1192/bjp.bp.110.081331.
  • 77.Gusev A., Bhatia G., Zaitlen N., Vilhjálmsson B.J., Diogo D., Stahl E.A., Gregersen P.K., Worthington J., Klareskog L., Raychaudhuri S., et al. Quantifying Missing Heritability at Known GWAS Loci. PLoS Genet. 2013;9:e1003993. doi: 10.1371/journal.pgen.1003993.
  • 78.Gamazon E.R., Wheeler H.E., Shah K.P., Mozaffari S.V., Aquino-Michaels K., Carroll R.J., Eyler A.E., Denny J.C., GTEx Consortium, Nicolae D.L., et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 2015;47:1091–1098. doi: 10.1038/ng.3367.
  • 79.Mukamel R.E., Handsaker R.E., Sherman M.A., Barton A.R., Zheng Y., McCarroll S.A., Loh P.R. Protein-coding repeat polymorphisms strongly shape diverse human phenotypes. Science. 2021;373:1499–1505. doi: 10.1126/science.abg8289.
  • 80.LaBianca S., Brikell I., Helenius D., Loughnan R., Mefford J., Palmer C.E., Walker R., Gådin J.R., Krebs M., Appadurai V., et al. Polygenic profiles define aspects of clinical heterogeneity in ADHD. Preprint at medRxiv. 2021. doi: 10.1101/2021.07.13.21260299.
  • 81.Elliott L.T., Sharp K., Alfaro-Almagro F., Shi S., Miller K.L., Douaud G., Marchini J., Smith S.M. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature. 2018;562:210–216. doi: 10.1038/s41586-018-0571-7.

Associated Data

Supplementary Materials

Document S1. Figures S1–S8, Tables S1–S7, Notes S1 and S2
mmc1.pdf (10MB, pdf)
Table S8. IGF1 EFA results
mmc2.csv (11.4KB, csv)
Table S9. Urate EFA results
mmc3.csv (11.6KB, csv)
Table S10. Male testosterone EFA results
mmc4.csv (11.3KB, csv)
Table S11. Female testosterone EFA results
mmc5.csv (9KB, csv)
Document S2. Article plus supplemental information
mmc6.pdf (13.3MB, pdf)

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics