Genetics. 2014 Oct 17;198(4):1405–1416. doi: 10.1534/genetics.114.170795

Estimation of Epistatic Variance Components and Heritability in Founder Populations and Crosses

Alexander I Young, Richard Durbin
PMCID: PMC4256760  PMID: 25326236

Abstract

Genetic association studies have explained only a small proportion of the estimated heritability of complex traits, leaving the remaining heritability “missing.” Genetic interactions have been proposed as an explanation for this, because they lead to overestimates of the heritability and are hard to detect. Whether this explanation is true depends on the proportion of variance attributable to genetic interactions, which is difficult to measure in outbred populations. Founder populations exhibit a greater range of kinship than outbred populations, which helps in fitting the epistatic variance. We extend classic theory to founder populations, giving the covariance between individuals due to epistasis of any order. We recover the classic theory as a limit, and we derive a recently proposed estimator of the narrow sense heritability as a corollary. We extend the variance decomposition to include dominance. We show in simulations that it would be possible to estimate the variance from pairwise interactions with samples of a few thousand from strongly bottlenecked human founder populations, and we provide an analytical approximation of the standard error. Applying these methods to 46 traits measured in a yeast (Saccharomyces cerevisiae) cross, we estimate that pairwise interactions explain 10% of the phenotypic variance on average and that third- and higher-order interactions explain 14% of the phenotypic variance on average. We search for third-order interactions, discovering an interaction that is shared between two traits. Our methods will be relevant to future studies of epistatic variance in founder populations and crosses.

Keywords: cross, epistasis, founder, heritability, interaction


GENOME-WIDE association studies (GWAS) have renewed interest in methods for estimating the narrow sense heritability, which is the maximum proportion of the phenotypic variance that the additive effects found by GWAS could explain. The variance explained by the known associations for a trait is typically only a fraction of the estimated narrow sense heritability, with the remaining heritability often labeled “missing” (Eichler et al. 2010; Visscher et al. 2012).

Most estimates of narrow sense heritability come from twin and family studies (Boomsma et al. 2002), which can be upwardly biased in the presence of genetic interactions (Zuk et al. 2012). Genetic interactions cause this bias by making the relationship between phenotypic correlation and kinship convex (see Figure 1). Both the most common twin-studies estimator, the additive-common-environment (ACE) estimator, and a method that exploits the variation in kinship between siblings (Visscher et al. 2006) assume a linear relationship between phenotypic correlation and kinship. The gradient at the mean level of kinship in the population is the true narrow sense heritability (Zuk et al. 2012), and the gradient increases as kinship rises above the mean because of genetic interactions. The degree to which this has biased twin- and family-study estimates depends on the amount of epistatic variance, which is not known for most complex traits.

Figure 1. Epistatic trait with heritability estimators. Phenotypic correlation as a function of genotypic correlation is plotted for an epistatic trait (solid black curve) with narrow sense heritability $h^2 = 0.4$ and broad sense heritability $H^2 = 0.8$ and for an additive trait (dotted black line) with narrow sense heritability $h^2 = 0.8$. The genotypic correlation is a function of the kinship coefficient, $K$. The genotypic correlation for dizygotic twins (DZ) and monozygotic twins (MZ) is indicated on the x-axis. $K_0$ is the mean kinship coefficient in the population, where phenotypic correlation is zero. The ACE estimator is twice the difference between the monozygotic and dizygotic phenotypic correlation, the gradient of the blue line, which is 1 here. The estimator used by Visscher et al. (2006) is the gradient of the orange line (0.8), which is the rate of change of phenotypic correlation around the mean genotypic correlation for siblings. The estimator proposed by Zuk et al. (2012) is the gradient of the red line, which is equal to the narrow sense heritability, 0.4.
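The three gradients quoted in the Figure 1 caption can be checked with a short calculation. The sketch below is illustrative only: it assumes that all nonadditive genetic variance of the epistatic trait is pairwise, so that the curve is a quadratic in the genotypic correlation, and it assumes the usual outbred values of genotypic correlation (0.5 for DZ twins and siblings, 1 for MZ twins, 0 at the population mean). The function names are hypothetical.

```python
def phenotypic_correlation(g, h2=0.4, H2=0.8):
    """Phenotypic correlation at genotypic correlation g.

    Assumption: all nonadditive genetic variance (H2 - h2) is pairwise,
    so the curve is the quadratic h2*g + (H2 - h2)*g**2.
    """
    return h2 * g + (H2 - h2) * g ** 2


def gradient(g, h2=0.4, H2=0.8):
    """Slope of the curve above with respect to g."""
    return h2 + 2.0 * (H2 - h2) * g


# Assumed outbred values of genotypic correlation.
G_MEAN, G_DZ, G_MZ = 0.0, 0.5, 1.0

# ACE: twice the MZ-DZ difference in phenotypic correlation.
ace_estimate = 2.0 * (phenotypic_correlation(G_MZ) - phenotypic_correlation(G_DZ))
# Sibling-based estimator: slope around the sibling mean.
sibling_estimate = gradient(G_DZ)
# Zuk et al. estimator: slope at the population mean.
zuk_estimate = gradient(G_MEAN)

print(round(ace_estimate, 3), round(sibling_estimate, 3), round(zuk_estimate, 3))
```

Under these assumptions the three values agree with the caption (1, 0.8, and 0.4), which illustrates how the bias grows as the estimator is evaluated further from the mean kinship.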

How much variance there is from interaction effects reflects the genetic architecture of a trait and the statistical complexity of the relationship between genotype and phenotype; it is therefore of interest beyond the debate about “missing heritability.” If we knew in advance which traits exhibited considerable variance from interactions, it would help focus resources on searching for interactions in those traits.

Some of the first evidence that common variants interact to influence human traits was found by Strange et al. (2010), Brown et al. (2014), and Hemani et al. (2014). However, the interactions they found explained only a small amount of the phenotypic variance. By analogy with the problem of missing heritability for additive effects, it is unlikely that we will have the power to detect all of the interactions influencing a trait, so the only way to assess the statistical importance of interaction effects will be by methods that measure the variance they contribute in aggregate.

The evolutionary model of a phenotype depends on the partition of its genetic variance into additive and nonadditive components. Mackay (2014) argues that natural populations have evolved suppressing epistatic interactions as “canalizing” mechanisms, which is a possible explanation for why we do not observe many common, large marginal-effect alleles. Hemani et al. (2013) suggest that epistasis explains why there is more genetic variation than expected for traits under selection, a phenomenon called “stasis.” Vukcevic et al. (2011) and Hemani et al. (2013) suggest that some of the associations found by GWAS look additive only because of incomplete linkage disequilibrium between genotyped variants and interacting causal variants. These hypotheses could be tested by accurately estimating the variance from pairwise interactions.

Classical quantitative genetic theory generally assumes an infinite, outbred, random-mating population—see Gallais (1974) for a list of typical assumptions. Fisher (1918) showed how pairwise genetic interactions influence the covariance between relatives under these population assumptions. Cockerham (1954) and Kempthorne (1954, 1955) independently generalized Fisher’s result to include all orders of genetic interaction.

It has proved very difficult to estimate the variance from pairwise interactions in the outbred populations for which the theory was derived. This is because there is almost no contribution from interaction effects to the covariance between pairs of unrelated individuals in outbred populations. Samples of unrelated individuals therefore contain very little information about the contribution of interaction effects to phenotypic variation. Samples of closely related individuals do, but the information is often confounded with shared environmental and dominance effects.

The difficulty of estimating the variance from pairwise interactions can be likened to the difficulty of fitting the quadratic curve shown for the epistatic trait in Figure 1. Fitting a quadratic requires information about a wider range of points than fitting a line. Therefore, estimating the variance from interactions requires information about phenotypic correlation over a wider range of kinship than estimating the narrow sense heritability does. Bottlenecked populations are characterized by increased kinship variation and mean kinship compared to outbred populations (Abney et al. 2000; Carmi et al. 2013). A sample from a founder population therefore contains more information about the nonlinear change in phenotypic correlation with kinship, thereby bringing the variance from genetic interactions into statistical reach.

Theoretical work in founder populations was previously restricted to the contribution of the variance from pairwise interactions to the covariance between half and full siblings (Cockerham and Tachida 1988; Tachida and Cockerham 1989). Tachida and Cockerham (1989) adjust the covariance between siblings for the background relatedness introduced by the population bottleneck. We instead adjust the kinship for recent finite population size and use this expression to derive the covariance between relatives in a founder population as a function of their kinship, the mean kinship in the population, and the variance from genetic interactions of different order. We thereby give the estimator proposed by Zuk et al. (2012) (see Figure 1) as a simple corollary.

The theory we develop applies to certain laboratory crosses as well as to natural founder populations. Laboratory crosses often start with a small number of founding individuals, such as the “outbred” rat and mouse populations (Baud et al. 2013). The small number of founders gives a larger range of kinship than occurs in natural human founder populations, giving more power to estimate interaction variance. We exploit this greater power to perform the first estimates of the variance from third- and higher-order interactions in a yeast cross. We search for third-order interactions in those traits exhibiting evidence for variance from third- and higher-order interactions, and we find a third-order interaction shared between two traits that explains the majority of their covariance.

Theory

The theory is derived for a population recently founded by a finite number of ancestors carrying a total of A haplotypes, with random mating after founding.

Genotypic covariance

We consider an allele at a locus $i$ on haplotypes $t$ and $u$. We calculate the covariance of the allelic states of the haplotypes by conditioning on whether the alleles were inherited from the same founder: whether the alleles are identical-by-descent (IBD). The allelic state is coded as a binary variable: $g_{ti}$ for haplotype $t$ and $g_{ui}$ for haplotype $u$.

If there were $A$ ancestral haplotypes at the time of the bottleneck, and $c_i$ of these haplotypes carried the allele, then the probability that the two haplotypes carry the allele given that they are IBD at the locus is

$$\mathbb{P}(g_{ui} = g_{ti} = 1 \mid \mathrm{IBD}^{i}_{u,t}) = \frac{c_i}{A}, \tag{1}$$

where $\mathrm{IBD}^{i}_{u,t}$ indicates that haplotypes $u$ and $t$ are IBD at locus $i$.

Conversely, given that the two haplotypes are not IBD at the locus, then the probability they both carry the allele is

$$\mathbb{P}(g_{ui} = g_{ti} = 1 \mid \neg\mathrm{IBD}^{i}_{u,t}) = \frac{c_i (c_i - 1)}{A (A - 1)}. \tag{2}$$

This is because, given the alleles are not IBD, if one haplotype inherits the allele from one of the $c_i$ ancestral haplotypes carrying the allele, the other haplotype can inherit the allele only from one of the $(c_i - 1)$ other ancestral haplotypes carrying the allele, which corresponds to sampling from the ancestral haplotypes without replacement.

The probability that a pair of haplotypes sampled without replacement is IBD at a locus is the mean kinship coefficient, defined to be $K_0$, which is $A^{-1}$ for a random mating population. If we define the expected allele frequency to be $f_i = c_i/A$, (2) can therefore be expressed as

$$\mathbb{P}(g_{ui} = g_{ti} = 1 \mid \neg\mathrm{IBD}^{i}_{u,t}) = \frac{f_i (f_i - K_0)}{1 - K_0}. \tag{3}$$

Note that because $f_i - K_0 \ge 0$, $f_i^2 \ge \mathbb{P}(g_{ui} = g_{ti} = 1 \mid \neg\mathrm{IBD}^{i}_{u,t}) \ge 0$. Therefore, if $\kappa_{t,u}$ is the probability that haplotypes $t$ and $u$ are IBD at locus $i$, the probability that both haplotypes carry the allele is

$$\mathbb{P}(g_{ti} = g_{ui} = 1) = \kappa_{t,u} f_i + (1 - \kappa_{t,u}) \frac{f_i (f_i - K_0)}{1 - K_0}. \tag{4}$$

Because $E[g_{ui}] = E[g_{ti}] = f_i$, the covariance between $g_{ti}$ and $g_{ui}$ is therefore

$$\mathrm{Cov}(g_{ti}, g_{ui}) = E[g_{ti} g_{ui}] - E[g_{ui}] E[g_{ti}] \tag{5}$$
$$= \mathbb{P}(g_{ti} = g_{ui} = 1) - f_i^2 \tag{6}$$
$$= f_i (1 - f_i) \frac{\kappa_{t,u} - K_0}{1 - K_0}. \tag{7}$$

$K_0$ parameterizes the adjustment of the genotypic covariance for finite population history. When $K_0 = 0$, as in an infinite, random-mating population, alleles are independent when not shared IBD.
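The step from (4) to (7) can be checked numerically. The following sketch, with hypothetical function names and arbitrary example values for the allele frequency and mean kinship, computes the covariance directly from (4) and compares it with the closed form in (7).

```python
def prob_both_carry(kappa, f, k0):
    """Equation 4: probability that both haplotypes carry the allele."""
    not_ibd = f * (f - k0) / (1.0 - k0)  # Equation 3
    return kappa * f + (1.0 - kappa) * not_ibd


def genotypic_covariance(kappa, f, k0):
    """Equation 7: closed-form covariance of the two allelic states."""
    return f * (1.0 - f) * (kappa - k0) / (1.0 - k0)


# Arbitrary illustrative values (not taken from the paper).
k0 = 1.0 / 60
f = 0.25
for kappa in (0.0, k0, 0.5, 1.0):
    direct = prob_both_carry(kappa, f, k0) - f ** 2  # Equation 6
    closed = genotypic_covariance(kappa, f, k0)
    print(kappa, direct, closed)
```

The covariance is zero when the IBD probability equals the mean kinship, negative below it, and equal to the binomial variance f(1 - f) when the haplotypes are certainly IBD.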

Covariance between relatives

We derive the result first for the simple case of two interacting biallelic loci in linkage equilibrium, with the generalization following the same logic.

The phenotype of a diploid individual t is modeled as

$$Y_t = \mu + \beta_1 x_{t1} + \beta_2 x_{t2} + \beta_{1,2} x_{t1} x_{t2} + \epsilon_t. \tag{8}$$

The $x_{ti}$ variables represent $t$’s deviation in minor allele copy number from the mean at locus $i$:

$$x_{ti} = g^{m}_{ti} + g^{p}_{ti} - 2 f_i, \tag{9}$$

where $g^{m}_{ti}$ and $g^{p}_{ti}$ are indicator variables for whether the maternal and paternal haplotypes of individual $t$ carry the minor allele at locus $i$. The frequency of the minor allele at locus $i$ is $f_i$, and therefore $E[x_{ti}] = 0$. The phenotypic mean is $\mu$, and $\epsilon_t$ is the residual error, with mean zero and variance $\sigma_\epsilon^2$, which includes both environmental influences and random noise.

Expressing the genetic contribution to the phenotypic value in this way gives an orthogonal partition of the phenotypic variance. Because of linkage equilibrium, $\mathrm{Cov}(x_{t1}, x_{t2}) = 0$. $\mathrm{Cov}(x_{t1}, x_{t1} x_{t2})$ also equals zero because

$$\mathrm{Cov}(x_{t1}, x_{t1} x_{t2}) = E[x_{t1}^2 x_{t2}] = E[x_{t1}^2] E[x_{t2}] = 0, \tag{10}$$

where we have again relied on the fact that the loci are in linkage equilibrium. This implies that $\beta_1$ is the regression coefficient of the phenotype on the genotype at locus 1. $\beta_{1,2} x_{t1} x_{t2}$ is the residual effect of the interaction between loci 1 and 2 after accounting for the marginal effects of the loci.

The covariance between the phenotypes of two individuals t and u relies upon the covariance of their genotypes, which is a function of the IBD sharing between their haplotypes,

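$$\mathrm{Cov}(x_{t1}, x_{u1}) = \sum_{i=m,p} \sum_{j=m,p} \mathrm{Cov}(g^{i}_{t1}, g^{j}_{u1}) = f_1 (1 - f_1) \sum_{i=m,p} \sum_{j=m,p} \frac{\kappa^{i,j}_{t,u} - K_0}{1 - K_0}, \tag{11}$$

by (7), and where $\kappa^{m,p}_{t,u}$ is the proportion of the maternal haplotype of individual $t$ that is IBD with the paternal haplotype of individual $u$. This can be expressed in terms of the kinship coefficient between $t$ and $u$, defined to be $K_{t,u}$. This is the probability that an allele drawn at random from each individual is IBD. Therefore,

$$K_{t,u} = \frac{1}{4} \sum_{i=m,p} \sum_{j=m,p} \kappa^{i,j}_{t,u}. \tag{12}$$

The covariance can thereby be expressed as

$$\mathrm{Cov}(x_{t1}, x_{u1}) = 2\, \frac{K_{t,u} - K_0}{1 - K_0}\, \mathrm{Var}(x_1), \tag{13}$$

where $K_0$ is the mean kinship coefficient in the population.

The covariance between the interaction effects is

$$\beta_{1,2}^2 E[x_{t1} x_{t2} x_{u1} x_{u2}] = \beta_{1,2}^2 E[x_{t1} x_{u1}] E[x_{t2} x_{u2}] \tag{14}$$

by linkage equilibrium; by (13), this is equivalent to

$$4 \left(\frac{K_{t,u} - K_0}{1 - K_0}\right)^2 \beta_{1,2}^2\, \mathrm{Var}(x_1) \mathrm{Var}(x_2). \tag{15}$$

Therefore, the phenotypic covariance is

$$\mathrm{Cov}(Y_t, Y_u) = 2 \left(\frac{K_{t,u} - K_0}{1 - K_0}\right) v_1 + 4 \left(\frac{K_{t,u} - K_0}{1 - K_0}\right)^2 v_2 + \mathrm{Cov}(\epsilon_t, \epsilon_u), \tag{16}$$

where $v_1 = \beta_1^2 \mathrm{Var}(x_1) + \beta_2^2 \mathrm{Var}(x_2)$ is the additive variance, and $v_2 = \beta_{1,2}^2 \mathrm{Var}(x_1) \mathrm{Var}(x_2)$ is the pairwise interaction variance.

The phenotypic variance of individual $t$ is a function of $t$’s inbreeding coefficient, $F_t$. Setting $t = u$ in (16) and using the fact that $K_{t,t} = (1 + F_t)/2$,

$$\mathrm{Var}(Y_t) = \left(1 + \frac{F_t - K_0}{1 - K_0}\right) v_1 + \left(1 + \frac{F_t - K_0}{1 - K_0}\right)^2 v_2 + \sigma_\epsilon^2. \tag{17}$$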

In Supporting Information, File S1, we extend the two-locus model to include dominance effects at the loci. In a founder population, inbreeding induces a correlation between the additive and dominance effects at a locus. The change in the mean due to inbreeding—inbreeding depression—introduces a further variance component. The individual-level variance is

$$\mathrm{Var}(Y_t) = \sum_{\tau=1}^{2} \left(1 + \frac{F_t - K_0}{1 - K_0}\right)^{\tau} v_\tau + (1 - F_t) v_\delta + 4\, \frac{F_t - K_0}{1 - K_0}\, C_{a,d} + F_t v_h + \frac{F_t (1 - F_t)}{(1 - K_0)^2}\, SS_{\mu_h} + \sigma_\epsilon^2, \tag{18}$$

where $v_\delta$ is approximately equal to the dominance variance as defined in an outbred population; $C_{a,d}$ is the covariance between additive and dominance effects; $v_h$ is the dominance variance in a homozygous population; and $SS_{\mu_h}$ is the sum of the squared inbreeding depressions at the loci, which is approximately equal to $v_\delta$ when $K_0$ is small. The components, apart from $v_\delta$, are as defined in Abney et al. (2000); however, their coefficients are different.

The variance of the phenotype in the population is found by applying the law of total variance, $\mathrm{Var}(Y) = E_t[\mathrm{Var}(Y_t)] + \mathrm{Var}_t(E[Y_t])$, to Equation 18. Because the mean inbreeding coefficient is equal to the mean kinship coefficient in a random-mating population,

$$\mathrm{Var}(Y) = v_1 + \left(1 + \frac{\mathrm{Var}(F)}{(1 - K_0)^2}\right) v_2 + (1 - K_0) v_\delta + K_0 v_h + \frac{K_0}{1 - K_0}\, SS_{\mu_h} + \frac{\mathrm{Var}(F)}{(1 - K_0)^2} \left(\mu_h^2 - SS_{\mu_h}\right) + \sigma_\epsilon^2. \tag{19}$$

The narrow sense heritability in the population is $h^2 = v_1/\mathrm{Var}(Y)$. For an outbred population, the variance in inbreeding coefficient, $\mathrm{Var}(F)$, is zero, so the proportion of phenotypic variance explained by the interaction is $v_2/\mathrm{Var}(Y)$. However, for strongly bottlenecked populations, variation in inbreeding coefficient increases the contribution of the interaction to $\mathrm{Var}(Y)$. The dominance variance components arising from inbreeding, $v_h$, $SS_{\mu_h}$, and $\mu_h^2$, do not contribute much to population variation except in the most strongly bottlenecked populations.

We now give the generalization of (16) to arbitrary epistasis between a set of causal loci each with any number of alternative alleles—for the detailed derivation, see File S1. The phenotypic covariance, for a set of causal loci N, is

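$$\mathrm{Cov}(Y_t, Y_u) = \sum_{\tau=1}^{|N|} 2^{\tau} \left(\frac{K_{t,u} - K_0}{1 - K_0}\right)^{\tau} v_\tau + \mathrm{Cov}(\epsilon_t, \epsilon_u), \tag{20}$$

where $v_\tau$ is the variance from interactions involving $\tau$ loci.

If we take the limit of (20) as $K_0 \to 0$, we get

$$\mathrm{Cov}(Y_t, Y_u) = \sum_{\tau=1}^{|N|} (2 K_{t,u})^{\tau} v_\tau + \mathrm{Cov}(\epsilon_t, \epsilon_u), \tag{21}$$

which is equivalent to the result of Kempthorne (1954) without dominance effects.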
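Equation 20 translates directly into a small function. The sketch below is illustrative: the function names and the example variance components are assumptions, and the residual covariance is passed in as a plain number.

```python
def relatedness(k, k0):
    """Adjusted genotypic correlation 2(K - K0)/(1 - K0) from Equation 20."""
    return 2.0 * (k - k0) / (1.0 - k0)


def phenotypic_covariance(k, k0, v, residual_cov=0.0):
    """Equation 20: v[0] is v_1 (additive), v[1] is v_2 (pairwise), and so on."""
    r = relatedness(k, k0)
    genetic = sum(v_tau * r ** tau for tau, v_tau in enumerate(v, start=1))
    return genetic + residual_cov


# Outbred limit (K0 = 0), full siblings (K = 1/4), hypothetical components.
v = [0.4, 0.2]
outbred = phenotypic_covariance(0.25, 0.0, v)
# Same kinship in a founder population with K0 = 1/30.
founder = phenotypic_covariance(0.25, 1.0 / 30, v)
print(outbred, founder)
```

With $K_0 = 0$ the function reduces to the classic sum in (21); with $K_0 > 0$ the same kinship contributes less covariance because part of the sharing is background relatedness.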

Under more restrictive assumptions, Zuk et al. (2012) derived that, for haploids, the gradient of the phenotypic correlation at the mean IBD sharing is the narrow sense heritability (see Figure 1 for a visualization of this). The diploid version of the Zuk et al. (2012) theorem is a corollary of (20) given by

$$v_1 = \frac{1 - K_0}{2} \left. \frac{\partial\, \mathrm{Cov}(Y_t, Y_u)}{\partial K_{t,u}} \right|_{K_{t,u} = K_0}. \tag{22}$$

This shows that (20) unifies the estimator proposed by Zuk et al. (2012) with the classic result of Kempthorne (1954).
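The corollary in (22) can be verified numerically: differentiating the genetic part of (20) at the mean kinship and rescaling should return the additive variance whatever the higher-order components are. The sketch below uses a central finite difference; the function names, step size, and variance components are illustrative assumptions.

```python
def phenotypic_covariance(k, k0, v):
    """Genetic part of Equation 20; v[0] is v_1, v[1] is v_2, and so on."""
    r = 2.0 * (k - k0) / (1.0 - k0)
    return sum(v_tau * r ** tau for tau, v_tau in enumerate(v, start=1))


def additive_variance_from_gradient(k0, v, step=1e-6):
    """Equation 22, with the derivative taken by central finite difference."""
    slope = (phenotypic_covariance(k0 + step, k0, v)
             - phenotypic_covariance(k0 - step, k0, v)) / (2.0 * step)
    return 0.5 * (1.0 - k0) * slope


# Hypothetical components v_1, v_2, v_3 in a population with K0 = 1/30.
recovered = additive_variance_from_gradient(k0=1.0 / 30, v=[0.4, 0.2, 0.2])
print(recovered)
```

Only the first-order term has a nonzero derivative at $K_{t,u} = K_0$, which is why the epistatic components drop out of the recovered value.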

The regression method proposed by Zuk et al. (2012) to estimate $v_1$ does not take into account the dependencies between pairs of phenotype observations. It is therefore preferable to fit variance components by maximum likelihood or restricted maximum likelihood, as in Abney et al. (2000), Browning and Browning (2013), and Zaitlen et al. (2013). The off-diagonal elements of the phenotypic covariance matrix are given by (20), with the diagonal elements given by

$$\mathrm{Var}(Y_t) = \sum_{\tau=1}^{|N|} \left(1 + \frac{F_t - K_0}{1 - K_0}\right)^{\tau} v_\tau + \sigma_\epsilon^2. \tag{23}$$

Haploid case

The theory simplifies in the haploid case due to absence of inbreeding or dominance effects. The kinship between two haploids $i$ and $j$, $K_{i,j}$, is simply the proportion of the genome shared IBD, and the phenotypic covariance matrix is

$$\mathrm{Cov}(Y) = \sum_{\tau=1}^{n} v_\tau K_\tau + \mathrm{Cov}(E), \tag{24}$$

where $K_\tau$ is a symmetric matrix with 1’s on the diagonal and off-diagonal elements

$$[K_\tau]_{ij} = \left(\frac{K_{i,j} - K_0}{1 - K_0}\right)^{\tau}, \tag{25}$$

and $\mathrm{Cov}(E)$ is the covariance matrix of the environmental effects.
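As a concrete illustration of (24) and (25), the sketch below builds the matrices $K_\tau$ from a small kinship matrix and sums them into a phenotypic covariance matrix. The kinship values, the value of $K_0$, the variance components, and the assumption that the environmental covariance is a multiple of the identity are all hypothetical choices made for the example.

```python
def k_tau(kinship, k0, tau):
    """Equation 25: tau-th power of the adjusted kinship, with unit diagonal."""
    n = len(kinship)
    return [
        [1.0 if i == j else ((kinship[i][j] - k0) / (1.0 - k0)) ** tau
         for j in range(n)]
        for i in range(n)
    ]


def haploid_covariance(kinship, k0, v, env_var):
    """Equation 24, assuming Cov(E) = env_var * I (independent environments)."""
    n = len(kinship)
    cov = [[env_var if i == j else 0.0 for j in range(n)] for i in range(n)]
    for tau, v_tau in enumerate(v, start=1):
        kt = k_tau(kinship, k0, tau)
        for i in range(n):
            for j in range(n):
                cov[i][j] += v_tau * kt[i][j]
    return cov


# Hypothetical IBD sharing proportions among three haploids.
kinship = [[1.0, 0.75, 0.25],
           [0.75, 1.0, 0.5],
           [0.25, 0.5, 1.0]]
cov = haploid_covariance(kinship, k0=0.5, v=[0.4, 0.3], env_var=0.3)
for row in cov:
    print(row)
```

Pairs that share less than the mean have a negative additive covariance but a positive pairwise component, because the second-order matrix is the elementwise square of the first.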

Materials and Methods

Simulations for variance component inference

Pairwise interaction variance:

To investigate the precision with which the variance from pairwise interactions could be estimated in different populations, we simulated founder populations with different mean kinship by varying the number of founding haplotypes. The R code used for the simulations is in File S2.

The allele frequencies of the variants in the ancestral population were generated by randomly sampling from a distribution with density proportional to 1/f, where f is the allele frequency. We simulated 100 variants in this way.
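A density proportional to 1/f can be sampled by inverting its cumulative distribution function. The sketch below is not the authors' R code from File S2; it is a minimal Python illustration in which the frequency bounds `f_min` and `f_max` are assumptions, needed because a 1/f density cannot be normalized without a lower cutoff.

```python
import random


def sample_allele_frequencies(n, f_min=0.01, f_max=0.5, seed=None):
    """Draw n frequencies with density proportional to 1/f on [f_min, f_max).

    The CDF is log(f / f_min) / log(f_max / f_min), so inverting a uniform
    draw u gives f = f_min * (f_max / f_min) ** u.
    """
    rng = random.Random(seed)
    ratio = f_max / f_min
    return [f_min * ratio ** rng.random() for _ in range(n)]


frequencies = sample_allele_frequencies(100, seed=1)
print(min(frequencies), max(frequencies))
```

Under this density the logarithm of the frequency is uniform, so rare variants are far more common than they would be under a uniform distribution of frequencies.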

Each chromosome in the sample was made as a mosaic of independently inherited segments: the length of each segment was drawn from an exponential distribution with a mean of 10, and the genotypes in the segment were copied from a random ancestral haplotype. The expected number of independently inherited segments for each haplotype in the sample was therefore 10. The ancestor from whom each segment was inherited was recorded.
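The mosaic construction can be sketched as follows. This is an illustrative Python version rather than the R code in File S2; it assumes that variants sit at integer positions, that segment lengths are measured in the same units, and that founder genotypes are supplied as lists of 0/1 alleles. The random founder genotypes in the usage lines are placeholders.

```python
import random


def simulate_ancestry(n_variants, n_founders, mean_segment=10.0, rng=random):
    """Return the founder haplotype label at each variant of one mosaic haplotype."""
    ancestry = []
    right_end = 0.0  # right end of the current segment
    while len(ancestry) < n_variants:
        right_end += rng.expovariate(1.0 / mean_segment)  # segment length, mean 10
        founder = rng.randrange(n_founders)               # random ancestral haplotype
        # Variants at integer positions left of the segment end copy this founder.
        while len(ancestry) < n_variants and len(ancestry) < right_end:
            ancestry.append(founder)
    return ancestry


def copy_genotypes(ancestry, founder_genotypes):
    """Allele at each variant, copied from the recorded founder haplotype."""
    return [founder_genotypes[founder][j] for j, founder in enumerate(ancestry)]


rng = random.Random(1)
# Placeholder founder genotypes: 30 haplotypes, 100 variants.
founders = [[rng.randint(0, 1) for _ in range(100)] for _ in range(30)]
ancestry = simulate_ancestry(100, n_founders=30, rng=rng)
haplotype = copy_genotypes(ancestry, founders)
print(ancestry[:20])
```

Recording the ancestry labels alongside the genotypes is what later allows kinship to be computed from true IBD sharing rather than from allele matching.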

To calculate the diploid kinship coefficient for a pair of individuals, the total number of variants descending from the same ancestor for each of the four maternal/paternal–maternal/paternal haplotype pairs, one from each individual, was calculated; the sum total sharing across the four haplotype pairs divided by four times the number of variants gives the diploid kinship coefficient between the two individuals. The mean kinship coefficient, K0, was taken to be the inverse of the number of ancestral haplotypes, which is its expectation. There will be negligible deviation of the sample K0, calculated over all pairs, from its expectation.

Following the theoretical results, we calculated the component of the covariance matrix due to additive effects, defined to be $R_1$, by calculating element $s, t$ of $R_1$ as $2(K_{s,t} - K_0)/(1 - K_0)$, where $K_{s,t}$ is the diploid kinship coefficient of the pair of individuals $s$ and $t$. The component of the covariance matrix due to pairwise interaction effects, $R_2$, was calculated as the Hadamard square of $R_1$.
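The kinship calculation and the two relationship matrices can be written compactly once each haplotype is stored as a list of ancestor labels. The sketch below is an illustration, not the File S2 code; the data layout (each individual as a pair of ancestry lists) and the function names are assumptions.

```python
def haplotype_sharing(a, b):
    """Number of variants at which two haplotypes descend from the same ancestor."""
    return sum(1 for x, y in zip(a, b) if x == y)


def diploid_kinship(ind_s, ind_t):
    """Sharing summed over the four haplotype pairs, divided by 4 * variants."""
    n_variants = len(ind_s[0])
    shared = sum(haplotype_sharing(hs, ht) for hs in ind_s for ht in ind_t)
    return shared / (4.0 * n_variants)


def relationship_matrices(individuals, n_founder_haplotypes):
    """R1 with entries 2(K - K0)/(1 - K0), and R2 as its Hadamard square."""
    k0 = 1.0 / n_founder_haplotypes  # expected mean kinship
    n = len(individuals)
    r1 = [[2.0 * (diploid_kinship(individuals[s], individuals[t]) - k0) / (1.0 - k0)
           for t in range(n)] for s in range(n)]
    r2 = [[x * x for x in row] for row in r1]
    return r1, r2


# Two toy individuals: (maternal ancestry, paternal ancestry) over four variants.
ind_a = ([0, 0, 1, 1], [2, 2, 2, 2])
ind_b = ([0, 0, 3, 3], [2, 2, 1, 1])
r1, r2 = relationship_matrices([ind_a, ind_b], n_founder_haplotypes=4)
print(diploid_kinship(ind_a, ind_b), r1[0][1], r2[0][1])
```

Applying the same function to an individual paired with itself gives (1 + F)/2, so the diagonal of R1 automatically reflects each individual's inbreeding coefficient.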

The kinship coefficients were calculated using all 100 variants, corresponding to calculating kinship from genome-wide IBD sharing. However, only a small proportion of the genome is likely to affect a particular trait. To simulate the sparsity of causal variants, the traits were simulated by randomly choosing 10 variants to be causal.

The variants were independently chosen for each simulated trait, covering a range of different frequency distributions of causal variants. Each variant was given an additive effect, and each pair of variants was given an interaction effect. Effects were drawn from normal distributions scaled so that $v_1 = 0.4$ and $v_2 = 0.2$; Gaussian error was added with variance 0.4. The variance components were inferred by fitting the covariance matrix as

$$v_1 R_1 + v_2 R_2 + \sigma_\epsilon^2 I. \tag{26}$$

Variance components were estimated by restricted maximum likelihood, using the average information algorithm in GCTA (Yang et al. 2011).

We simulated four populations with mean kinship ranging from 1/240 to 1/30, covering a broad range of human founder populations. We simulated 500 phenotypes with the same variance components for each population.

To investigate how epistasis might bias inference of additive variance, we fitted the covariance matrix as $v_1 R_1 + \sigma_\epsilon^2 I$, ignoring any epistasis, across the four simulated populations for the phenotypes with $v_1 = 0.4$ and $v_2 = 0.2$. To measure the effect of the amount of epistasis on the bias, we simulated further phenotypes, varying $v_2$ from 0.1 to 0.4 for the population with mean kinship 1/240.

Third-order interaction variance:

To investigate the limits of our ability to fit epistatic variance components, for each of the four populations we simulated 200 additional phenotypes with $v_1 = 0.4$, $v_2 = 0.2$, and $v_3 = 0.2$; Gaussian error was added with variance 0.2. The phenotypes were simulated as above except every combination of three causal variants was given a third-order interaction effect, scaled so that the total variance from third-order interactions was 0.2. The variance components were inferred by fitting the covariance matrix as

$$v_1 R_1 + v_2 R_2 + v_3 R_3 + \sigma_\epsilon^2 I, \tag{27}$$

where $R_3$ is the Hadamard cube of $R_1$.

Yeast cross

Bloom et al. (2013) presented data from a cross of a laboratory strain and a wine strain of yeast. They sequenced the founder strains and 1008 genetically distinct haploid descendants (segregants) of the cross of the two strains. This allowed Bloom et al. (2013) to infer from which founder each allele had been inherited. We inferred IBD sharing proportions for each pair of haploid segregants by calculating the probability that a randomly chosen variant was inherited from the same founder. The phenotype data are final colony size for each segregant on 46 different growth media. Bloom et al. (2013) estimated the broad sense heritability H2 by analyzing biological replicates.

Inference of heritability components:

We fitted the following model to each phenotype Y,

$$Y \sim N(\mu,\ v_1 K_1 + v_2 K_2 + \sigma^2 I), \tag{28}$$

where $K_1$ and $K_2$ are as defined in Equation 25 and are calculated from IBD sharing between segregants. We used the average information algorithm (Gilmour et al. 1995) as implemented in GCTA (Yang et al. 2011) to find the restricted maximum-likelihood estimates of the narrow sense heritability, $h^2 = v_1/\mathrm{Var}(Y)$, and the proportion of phenotypic variance from pairwise interactions, $h_2^2 = v_2/\mathrm{Var}(Y)$. Bloom et al. (2013) estimate the broad sense heritability, $H^2$, from analyzing biological replicates, allowing us to take advantage of the fact that

$$\sum_{\tau=3}^{n} \frac{v_\tau}{\mathrm{Var}(Y)} = H^2 - (h^2 + h_2^2) \tag{29}$$

to estimate $\sum_{\tau=3}^{n} v_\tau/\mathrm{Var}(Y)$, which we define to be $h_{>}^2$. This is the component of the broad sense heritability that originates exclusively in interactions involving three or more loci.

The standard error of the estimate of $h_{>}^2$ is estimated as $\sqrt{\mathrm{SE}(\hat{h}^2)^2 + \mathrm{SE}(\hat{h}_2^2)^2 + \mathrm{SE}(\hat{H}^2)^2}$, where $\hat{h}^2$ and $\hat{h}_2^2$ are our maximum-likelihood estimates, and $\hat{H}^2$ is from Bloom et al. (2013).
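The arithmetic behind the higher-order component and its standard error is shown in the sketch below. The function names and the numerical inputs are hypothetical; the heritability values mirror the settings of the simulated yeast traits, and the standard errors are invented for illustration. Combining the standard errors as a root sum of squares treats the three estimates as uncorrelated.

```python
import math


def higher_order_heritability(H2, h2, h2_pair):
    """Equation 29: broad sense heritability minus additive and pairwise parts."""
    return H2 - (h2 + h2_pair)


def higher_order_se(se_H2, se_h2, se_h2_pair):
    """Root sum of squared standard errors of the three estimates."""
    return math.sqrt(se_h2 ** 2 + se_h2_pair ** 2 + se_H2 ** 2)


# Illustrative inputs only (not estimates reported in the paper).
estimate = higher_order_heritability(H2=0.9, h2=0.4, h2_pair=0.3)
se = higher_order_se(se_H2=0.03, se_h2=0.04, se_h2_pair=0.12)
print(round(estimate, 3), round(se, 3), round(estimate / se, 2))
```

The ratio of the estimate to its standard error is the quantity used later to decide which traits are searched for third-order interactions.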

Simulation of epistatic traits from yeast data:

To test inference of $h^2$, $h_2^2$, and $h_{>}^2$, we simulated 500 phenotypes from the genotypes. The R code used to simulate the phenotypes is in File S3. For each phenotype, 50 causal variants were sampled independently and at random from across the genome. All 50 causal variants were given additive effects, and all pairs were given interaction effects; 10 of the 50 causal variants were chosen at random to have third-order interactions with each other; and 8 of these were chosen at random to have fourth-order interactions with each other. The effects were drawn from normal distributions scaled so that $h^2 = 0.4$, $h_2^2 = 0.3$, and $h_{>}^2 = 0.2$, with the higher-order variance equally divided between third- and fourth-order interactions. Gaussian error was added so that $H^2 = 0.9$.

Search for third-order interactions:

To restrict ourselves to traits that were likely to harbor third-order interactions, we searched for interactions only in those traits whose estimated variance from third-order interactions was more than twice the estimated standard error. The R code used to do this is in File S4.

Phenotypic variance differs conditional on the genotype at a locus involved in an interaction. This has been exploited to reduce the multiple-testing burden when searching for pairwise interactions (Nelson et al. 2013), a burden that is an order of magnitude worse when searching for third-order interactions. To find candidate loci for third-order interactions, we performed a genome-wide scan for variance-controlling loci for each trait, using the Brown–Forsythe test to assess evidence for an effect of the locus on phenotypic variance. To reduce correlation between tests, we selected the locus with the smallest P-value in a sliding window of 300 SNPs. We adjusted these P-values for multiple comparisons, using the Benjamini–Hochberg method, selecting those loci with an adjusted P-value <0.05.
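The multiple-comparison adjustment in the last step can be illustrated with a plain implementation of the Benjamini–Hochberg procedure. This sketch is not the code in File S4; it assumes the Brown–Forsythe P-values for the window-selected loci have already been computed, and the example P-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up P-values for four candidate variance-controlling loci.
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
selected = [i for i, p in enumerate(adjusted) if p < 0.05]
print(adjusted, selected)
```

Loci whose adjusted P-value falls below 0.05 would then be carried forward as candidates for the three-locus regressions.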

We added the variance-controlling loci to the loci found to have significant marginal effects by Bloom et al. (2013). If a pair of loci from this list had a correlation of >0.7, we removed one of the pair. We performed regressions on each combination of three loci from this list, where we fitted the full model including all interaction effects. P-values for each effect were calculated by ANOVA. We selected only those third-order interactions with a Bonferroni-corrected P-value <0.05.

Results

Simulations

Pairwise interaction variance:

We simulated 5000 individuals from founder populations with varying degrees of kinship as described in Materials and Methods. We simulated 500 phenotypes for each population, where each phenotype had the same variance components: additive variance v1 = 0.4 and pairwise interaction variance v2 = 0.2. Gaussian error was added with variance 0.4. These variance components were chosen to test inference for a relatively small amount of epistatic variance and a relatively large amount of noise.

Figure 2A shows that the mean estimate is close to the true value for each population; the mean estimate across the four populations was 0.204, indicating the estimation was unbiased. Figure 2B shows that the standard deviations of the simulation estimates scale in proportion to $1/\sqrt{K_0}$. It may therefore be preferable to use a smaller sample from a more strongly bottlenecked population than a larger sample from a less strongly bottlenecked population.

Figure 2. Simulation results for the estimation of the variance from pairwise interactions. Phenotypes were simulated 500 times for four simulated populations with different mean kinship, each composed of 5000 individuals. (A) Boxplots of the simulation estimates of the variance from pairwise interactions for the four populations. The dashed red line indicates the true variance from pairwise interactions, 0.2. (B) The standard deviation of the simulation estimates of the variance from pairwise interactions plotted against the mean kinship of the sample. The points marked on the x-axis correspond to estimates of the mean kinship in Saguenay (QC, Canada) (Gauvin et al. 2013), the Amish (Khoury et al. 1987), and the Hutterites (Abney et al. 2000). The curve drawn is proportional to $1/\sqrt{K_0}$.

Third-order interaction variance:

To explore our ability to fit higher-order epistatic variance components, 200 additional phenotypes were simulated for each population, including third-order as well as pairwise interactions (v1 = 0.4, v2 = 0.2, v3 = 0.2). The estimation of the variance from third-order interactions was unbiased, with a mean estimate of 0.204 across the populations with K0 = 1/120, 1/60, and 1/30. There was not enough information to fit the model in the population with K0 = 1/240.

The standard deviation for the third-order interaction variance was at least twice as large as the standard deviation for the pairwise interaction variance across the populations (Figure 3). Even for the most strongly bottlenecked population, the standard error for the third-order interaction variance is nearly 15% of the phenotypic variance, comparable to the size of the variance component.

Figure 3. Standard deviations for variance component estimates in a simulation that includes third-order interactions, plotted as in Figure 2B. The standard deviations of the estimate for the third-order component are at least twice those of the estimate for the pairwise component.

Ignoring epistasis biases additive variance estimates:

To investigate any possible bias in the estimation of narrow sense heritability that may arise from ignoring epistasis, we fitted models with only additive variance components to the simulation data used in Figure 2. We found that the bias did not depend on the mean kinship of the population (Table S3).

We simulated additional phenotypes with varying amounts of epistasis (v2 = 0.1, 0.2, 0.3, 0.4) for the population with K0 = 1/240. The amount of bias was proportional to the amount of epistatic variance; the additive variance estimates were inflated by ∼6% of v2 (Table S4).

Ignoring epistasis resulted in inaccurate standard error estimates. Figure 4 shows that even when only 10% of the phenotypic variance is epistatic, the standard error of the additive variance estimated from simulations is >15% larger than the standard error estimated by GCTA.

Figure 4. The effect of ignoring epistasis on standard error estimates. Additive-only models were fitted to phenotypes with $v_1 = 0.4$ and $v_2$ ranging from 0.1 to 0.4. The standard deviations of the estimates of the additive variance, denoted simulation error, were calculated. Ratios of these to the standard error estimates given by GCTA are plotted on the y-axis.

Approximate analytic standard error

The amount of information a sample contains about the epistatic variance of a trait depends on the distribution of kinship in that sample. To better understand how the standard error of the estimator of v2 depends on the moments of the kinship distribution, we extend the analogy of fitting a quadratic from Figure 1 to derive an approximate analytic expression for the standard error.

If we define R_{s,t} = 2(K_{s,t} − K_0)/(1 − K_0), then the process of fitting the covariance matrix implied by (20) can be likened to fitting a polynomial in R. Fitting the additive variance, v1, and the variance from pairwise interactions, v2, is similar to fitting to all pairs s, t the regression model

$$\Sigma_{s,t} \sim N\left(v_1 R_{s,t} + v_2 R_{s,t}^2,\ \sigma^2\right), \qquad (30)$$

where Σ_{s,t} = (Y_s − E[Y])(Y_t − E[Y]) is the observed similarity between s and t. If s and t are independent and Y has variance 1, then σ^2 = 1.
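The analogy in (30) can be made concrete with a short simulation. The sketch below is illustrative only: the distribution of R, the number of pairs, and the function name `fit_pairwise_regression` are assumptions, and ordinary least squares on independent pairs stands in for the (restricted) maximum-likelihood fit of the full covariance matrix.

```python
import numpy as np

def fit_pairwise_regression(R, Sigma):
    """Least-squares fit of Sigma ~ v1*R + v2*R**2 (no intercept), as in (30)."""
    R = np.asarray(R, dtype=float)
    X = np.column_stack([R, R**2])               # columns: linear and quadratic terms
    coef, *_ = np.linalg.lstsq(X, np.asarray(Sigma, dtype=float), rcond=None)
    return coef                                  # array([v1_hat, v2_hat])

# Hypothetical example: a skewed, mean-zero relatedness distribution.
rng = np.random.default_rng(0)
n_pairs = 200_000
R = rng.exponential(scale=0.05, size=n_pairs) - 0.05
v1, v2 = 0.4, 0.2                                # assumed variance components
Sigma = v1 * R + v2 * R**2 + rng.normal(0.0, 1.0, size=n_pairs)  # sigma = 1

v1_hat, v2_hat = fit_pairwise_regression(R, Sigma)
print(f"v1_hat = {v1_hat:.3f}, v2_hat = {v2_hat:.3f}")
```

When most values of R are small relative to σ, single fits of v2 are noisy; rerunning with different seeds gives a feel for the sampling spread that the standard error is meant to capture.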

In File S1, we derive the asymptotic variance of the maximum-likelihood estimator of v2 in this model. If μc is the cth central moment of the distribution of R, then

$$\sqrt{\mathrm{Var}(\hat{v}_2)} \approx \eta^{-1/2}\left(\mu_4 - \frac{\mu_3^2}{\mathrm{Var}(R)}\right)^{-1/2}, \qquad (31)$$

where η is the number of pairs, and where we take σ to be 1.
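As a rough illustration, the approximation can be evaluated directly from a vector of pairwise R values. This sketch assumes that (31) reads sqrt(Var(v2_hat)) ≈ η^(−1/2) (μ4 − μ3²/Var(R))^(−1/2), with σ taken to be 1 by default; the function name `approx_se_v2` and its arguments are hypothetical.

```python
import numpy as np

def approx_se_v2(R, n_pairs=None, sigma=1.0):
    """Approximate standard error of the pairwise-interaction variance estimate.

    R       : pairwise values of R used to estimate the central moments
    n_pairs : number of pairs eta (defaults to len(R))
    sigma   : residual standard deviation in the regression analogy
    """
    R = np.asarray(R, dtype=float)
    d = R - R.mean()
    mu2 = np.mean(d**2)                  # Var(R)
    mu3 = np.mean(d**3)                  # third central moment
    mu4 = np.mean(d**4)                  # fourth central moment
    eta = R.size if n_pairs is None else n_pairs
    return sigma / np.sqrt(eta * (mu4 - mu3**2 / mu2))

# Toy check on a symmetric three-point distribution:
# mu2 = mu4 = 2/3 and mu3 = 0, so the result is 1/sqrt(3 * 2/3) = 1/sqrt(2).
print(approx_se_v2([-1.0, 0.0, 1.0]))
```

Quadrupling the number of pairs at fixed moments halves the approximate standard error, which follows from the η^(−1/2) factor.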

Testing this using the simulation results, we find the standard deviation of the simulation estimates for K0 = 1/30, 1/60 to be very close to (31) calculated from the sample kinship statistics, with the error increasing above (31) for smaller K0—see Figure S1.

The information about the epistatic variance in a sample increases with the fourth central moment of the kinship distribution, which is the unnormalized kurtosis. The samples with heavy tails in their kinship distributions are therefore likely to have the largest kurtosis and the most information about epistatic variance.

Yeast cross

We analyzed data, further described in Materials and Methods, from a cross of a laboratory strain (BY) and a wine strain (RM) of yeast (Bloom et al. 2013). The data included 46 growth phenotypes measured for 1008 haploids dissected from tetrads produced by crossing the two founder strains.

Variance components:

To establish that our methods worked for these data, we first simulated epistatic traits from the genetic data, as detailed in Materials and Methods. The variance component estimates from the simulated traits were unbiased and the standard error estimates were accurate (Table S2).

Next we applied our approach to partition the phenotypic variance of the 46 growth phenotypes into additive, pairwise, and higher-order genetic components, plus a residual. Figure 5 visualizes this partitioning. The numerical results of the analysis are in Table S1 and File S5. Five phenotypes had estimates of the variance from higher-order interactions >3 standard errors from zero. The mean proportions of phenotypic variance explained by pairwise interactions (h_2^2) and higher-order interactions (h_{>2}) were 0.10 and 0.14, respectively.

Figure 5.

Variance components inferred for 46 different growth traits in the yeast cross. The lengths of the bars give the estimated proportions of phenotypic variance explained by the components: additive (black), pairwise interactions (yellow), and interactions of order higher than pairwise (blue). Z2 gives the estimate of the variance from pairwise interactions divided by the estimated standard error for the trait. Z> gives the estimate of the variance from third-order interactions divided by the estimated standard error.

Third-order interactions:

To further explore the contributions of higher-order interactions, we searched directly for third-order interactions as described in Materials and Methods. Table 1 shows the only two third-order interactions found to be significant at the 0.05 level after Bonferroni correction. For each marker in the interaction for the formamide phenotype, there is a corresponding marker in near perfect linkage in the interaction for the indoleacetic acid phenotype. This suggests that the interaction is shared between the two phenotypes, and it may explain some of their covariance. To estimate how much of the covariance this shared third-order interaction explains, we calculated the fitted values for the full interaction models, including the additive and pairwise effects, for each trait separately, and the covariance of these fitted values. The covariance of the fitted values explains 59% of the covariance between the phenotypes.
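The covariance calculation described above can be sketched as follows. This is a minimal illustration on simulated data, not the analysis code: the function names, the 0/1 genotype coding, and the simulated effect sizes are assumptions, and for simplicity both traits use the same three markers, whereas the reported interactions use tightly linked but distinct markers.

```python
import numpy as np

def full_interaction_design(G):
    """Design matrix with intercept, additive, pairwise, and third-order terms.

    G is an (n, 3) array of haploid genotypes coded 0/1 at the three loci.
    """
    g1, g2, g3 = G[:, 0], G[:, 1], G[:, 2]
    return np.column_stack([np.ones(len(G)), g1, g2, g3,
                            g1 * g2, g1 * g3, g2 * g3, g1 * g2 * g3])

def fitted_values(G, y):
    """Least-squares fitted values of the full three-locus interaction model."""
    X = full_interaction_design(G)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def covariance_explained(G_a, y_a, G_b, y_b):
    """Covariance of the two traits' fitted values over the phenotypic covariance."""
    fit_a = fitted_values(G_a, y_a)
    fit_b = fitted_values(G_b, y_b)
    return np.cov(fit_a, fit_b)[0, 1] / np.cov(y_a, y_b)[0, 1]

# Hypothetical data: 1008 segregants, one shared three-way effect plus
# independent noise for each trait.
rng = np.random.default_rng(1)
G = rng.integers(0, 2, size=(1008, 3))
shared = G.prod(axis=1)                      # 1 only for one of the 8 genotypes
y_a = -1.0 * shared + rng.normal(size=1008)
y_b = -0.5 * shared + rng.normal(size=1008)
print(covariance_explained(G, y_a, G, y_b))
```

The ratio is unstable when the phenotypic covariance in the denominator is close to zero, so it is only informative for trait pairs that are clearly correlated.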

Table 1. Third-order interactions.

| Trait | H^2 exp (%) | h_{>2} exp (%) | Marker-1 pos (Chr) | Marker-2 pos (Chr) | Marker-3 pos (Chr) | P-value | Bonferroni P-value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Formamide | 2.2 | 11 | 997,621 (4) | 101,490 (8) | 487,251 (14) | 1.6e-7 | 0.0012 |
| Indoleacetic acid | 1.4 | 5 | 974,039 (4) | 101,016 (8) | 470,846 (14) | 5.9e-6 | 0.043 |

Two third-order interactions were found to be significant at the 0.05 level after Bonferroni correction. The interactions are for different traits but are between the same linkage blocks. Exp, explained; pos, position; Chr, chromosome.
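A Bonferroni-adjusted P-value is the raw P-value multiplied by the number of tests, capped at 1. The short sketch below shows the adjustment and the ratio of adjusted to raw P-values implied by Table 1; the function names are hypothetical, and the number of tests is not stated in this section, so the ratios are only what the rounded table entries imply.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value: raw P-value times number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)

def implied_n_tests(p_value, adjusted_p):
    """Ratio of adjusted to raw P-value (meaningful only when adjusted_p < 1)."""
    return adjusted_p / p_value

# Rounded values from Table 1.
print(implied_n_tests(1.6e-7, 0.0012))   # formamide
print(implied_n_tests(5.9e-6, 0.043))    # indoleacetic acid
```

Both ratios come out in the low thousands. Because the table entries are rounded to two significant figures, the two ratios need not agree exactly even if the same number of tests was used for both traits.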

Discussion

Theory

We used an approach based on allelic indicator variables to calculate the covariance between individuals in a founder population as a function of their kinship, the mean kinship in the population, and the variance components of the phenotype. This extends the classic result for outbred populations (Kempthorne 1954) to founder populations. Equations 20 and 23 together determine the phenotypic covariance matrix, the parameters of which can be estimated by (restricted) maximum likelihood by assuming the phenotype follows a particular distribution. These parameters, along with the central moments of the inbreeding distribution, determine the proportion of population variance explained by different orders of genetic interaction in a founder population. The relationship between (20) and Figure 1 can be seen by writing the phenotypic covariance as a polynomial function of R = 2(K − K_0)/(1 − K_0), the x-axis of Figure 1. The correlation for the epistatic trait in Figure 1 as a function of R is

$$v_1 R + v_2 R^2, \qquad (32)$$

where v1 is the additive variance in the population and is 0.4.
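To make the polynomial form concrete, the sketch below maps a kinship matrix to R and then to the implied genetic correlation, keeping only the additive and pairwise terms. The function names, the toy kinship values, and the value of v2 are assumptions for illustration; with K0 = 0 the mapping reduces to the classic outbred case, in which full siblings (kinship 1/4) have R = 1/2.

```python
import numpy as np

def relatedness(K, K0):
    """R = 2(K - K0) / (1 - K0), applied elementwise to a kinship matrix K."""
    K = np.asarray(K, dtype=float)
    return 2.0 * (K - K0) / (1.0 - K0)

def genetic_correlation(R, v1, v2):
    """Additive plus pairwise-interaction terms of the correlation: v1*R + v2*R^2."""
    R = np.asarray(R, dtype=float)
    return v1 * R + v2 * R**2

# Hypothetical kinship matrix for three diploid individuals in the outbred
# limit (K0 = 0): individuals 1 and 2 are full siblings, 3 is a first cousin of 1.
K = np.array([[0.5,    0.25,   0.0625],
              [0.25,   0.5,    0.0625],
              [0.0625, 0.0625, 0.5]])
R = relatedness(K, K0=0.0)
print(genetic_correlation(R, v1=0.4, v2=0.2))   # v2 = 0.2 is an assumed value
```

Because the pairwise term enters as R squared, it contributes appreciably only for pairs with large R, which is why a wide spread of kinship helps separate v2 from v1.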

The model applies exactly only to populations that have been randomly mating since being founded; however, the allelic indicator variable approach could be extended to non-random-mating populations by considering models for nonrandom inheritance of alleles.

Extending the method to include linkage disequilibrium would be possible but would rely on knowing the linkage disequilibrium between unknown causal alleles. We note that identity-by-descent-based methods such as this are biased only by linkage disequilibrium between causal alleles, whereas identity-by-state (IBS)-based methods, such as in Yang et al. (2010), are also biased by varying linkage disequilibrium between marker alleles and causal alleles (Speed et al. 2012).

We have derived the individual- and population-level variance decomposition for two interacting loci with dominance in a founder population (see File S1 for details). Finite population history induces dependence between allelic states, both within and between individuals. This prohibits a simple and exact expression for the covariance between relatives for the additional variance components introduced by dominance. However, except for the most strongly bottlenecked populations, using the identity states implemented by Abney et al. (2000) will probably give a good approximation.

We note that an alternative theoretical approach would be to extend the frequency-weighted IBS estimator employed by Yang et al. (2010) to epistatic variance components. The IBS-based method of estimating the additive variance was compared to an IBD-based approach by Zaitlen et al. (2013), using Icelandic data. They found the IBS-based approach underestimated the additive variance relative to the IBD-based approach by a considerable amount. The same underestimation would be expected to apply to an IBS-based epistatic variance estimator, because it originates from incomplete linkage disequilibrium between markers and causal variants. The underestimation could be even more severe for epistatic variance components because, for the variance from an interaction to be properly detected, all of the loci involved in the interaction would have to be in strong linkage with the markers. We therefore argue for the IBD-based approach, which takes advantage of the long shared segments present in a founder population to reduce the bias in the estimates.

Simulations and sampling

The simulations (Figure 2) suggest that with a sample of 5000 Hutterites, it would be possible to estimate the variance from pairwise interactions with a standard error of <5% of the phenotypic variance. The standard error scales in proportion to the inverse of the mean kinship. This explains why one cannot simply use a very large, random sample from an outbred population to fit the epistatic variance, as the standard error tends to infinity as K0 tends to zero.

More recently founded populations that have expanded rapidly should be preferred to older populations, due to the reduced amount of recombination since the founding, which results in more kinship variation.

Including close relatives would increase the precision of the estimator. However, the similarity between close relatives could be due to shared environment as well as shared genetics. The confounding with shared environment could be ameliorated by fitting additional variance components for different relative classes. However, if shared environmental effects extend beyond first-degree relatives, the model may become excessively complex. Dominance could also cause similarity between siblings above what is expected by additive effects, leading to overestimation of the epistatic variance. For traits that are known to have little dominance variance or shared environmental effects, including close relatives would increase precision without causing bias. Otherwise, very large samples of close relatives would be required to disentangle epistasis, dominance, and shared environment.

Power calculations can be aided by the approximate analytic formula for the standard error of the variance from pairwise interactions. This acts like a lower bound for the standard error in the simulated data—see Figure S1. The moments of the kinship distribution can be calculated from a small sample, and, from these, an estimate of the standard error can be calculated for different sample sizes. If this is too high to give a useful estimate, then the sample is probably not appropriate for estimating the variance from pairwise interactions.
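The power calculation described above might be sketched as follows. The sketch assumes the approximate standard error takes the form η^(−1/2) (μ4 − μ3²/Var(R))^(−1/2) with σ = 1, that η = n(n − 1)/2 pairs are available from n individuals, and that the moments of R estimated in a pilot sample carry over to the larger sample; the pilot values and the function name are hypothetical.

```python
import numpy as np

def projected_se_v2(R_pilot, n_individuals, sigma=1.0):
    """Project the approximate standard error of v2 to a sample of n individuals.

    R_pilot       : pairwise R values from a small pilot sample
    n_individuals : planned sample size; eta = n(n - 1)/2 pairs is assumed
    """
    R_pilot = np.asarray(R_pilot, dtype=float)
    d = R_pilot - R_pilot.mean()
    mu2, mu3, mu4 = np.mean(d**2), np.mean(d**3), np.mean(d**4)
    information_per_pair = mu4 - mu3**2 / mu2
    n_pairs = n_individuals * (n_individuals - 1) / 2
    return sigma / np.sqrt(n_pairs * information_per_pair)

# Hypothetical pilot sample of pairwise R values with a heavy right tail.
rng = np.random.default_rng(2)
R_pilot = rng.exponential(scale=0.03, size=5000) - 0.03

for n in (1000, 2000, 5000):
    print(n, projected_se_v2(R_pilot, n))
```

Since the approximation behaved like a lower bound in the simulations, a projected value that is already too large is a stronger signal than one that looks acceptable.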

Direct estimation of the variance from third-order interactions may be beyond the limits of possibility for current human samples. Even with 5000 Hutterites, the standard error is likely to be at least 15% of the phenotypic variance (Figure 3). Unless this component is a large part of the phenotypic variance, it is unlikely that any current samples of human founder populations would provide the power to detect that the component is nonzero.

Founder populations have recently been used to estimate the narrow sense heritability (Browning and Browning 2013; Zaitlen et al. 2013). We found in simulations that ignoring epistasis leads to a slight overestimation of the additive variance in proportion to the amount of epistatic variance (Table S3 and Table S4), as well as underestimation of the standard error (Figure 4). This could cause improper calibration of statistical tests. It is possible that these problems could be reduced by restricting to a smaller range of relatedness, but this would increase the standard error.

Yeast cross

Bloom et al. (2013) found evidence for epistatic variance in the difference between H^2 and h^2. In the yeast cross analysis (Figure 5 and Table S1) we have gone further by partitioning the epistatic variance into components arising from pairwise interactions and from third- and higher-order interactions. While the individual estimates of the higher-order components are not very precise, we provide strong evidence that the variance from pairwise interactions does not in general explain all of the difference between H^2 and h^2. It is impossible to draw precise conclusions about the relative size of h_2^2 and h_{>2} for individual traits, because the method of estimation results in a negative correlation between the estimates. A larger sample from a similarly designed experiment could overcome some of these difficulties and enable direct estimation of the variance from third-order interactions.

The relatively small amount of variance explained by the only third-order interaction found (Table 1) suggests that there are many more third- and higher-order interactions, each explaining a small amount of the variance. While a small number of third-order and higher-order interactions have previously been identified (Pettersson et al. 2011; Taylor and Ehrenreich 2014), this is the first such pleiotropic interaction to be discovered, as far as we are aware.

The statistical importance of pairwise and higher-order interactions in the yeast cross cannot be readily generalized to natural populations. For some interaction models, the proportion of the variance that is epistatic rather than additive is greatest for interacting alleles at intermediate frequencies (Hill et al. 2008; Mackay 2014). Therefore, if interactions occur between rarer alleles in natural populations, the proportion of the variance that is epistatic could be reduced.

The large amount of epistatic variance in the cross could be explained by the breakdown of coadapted variant combinations. The cross is between a laboratory strain and a wine strain of yeast, which have diverged under different selection pressures (Liti et al. 2009). Given that hybrid incompatibilities were observed between experimentally evolved strains (Anderson et al. 2010), it is plausible that these strains have accumulated them.

The large amount of epistatic variance arising from third- and higher-order interactions in particular could be explained by the buildup of hybrid incompatibilities. Incompatibilities between three or more loci are theoretically expected to be more common than incompatibilities between two loci, because a greater proportion of evolutionary paths to higher-order incompatibilities do not pass through a less fit intermediary (Orr 1995). Figure 6 shows that the fitness profile of the discovered third-order interaction is consistent with the interaction being the result of an evolved Dobzhansky–Muller-like incompatibility. Only one of the eight possible three-locus genotypes results in a sizeable reduction in fitness. Two-thirds of the evolutionary paths from a common ancestor to the two yeast strains would have avoided this less fit intermediary and would therefore have been allowed by selection (Orr 1995).
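One simple way to reproduce the two-thirds figure is to enumerate the orders in which the three substitutions separating the two strains could have occurred. The sketch below is an illustration under stated assumptions: paths are treated as orderings of single-locus substitutions linking the BY-like genotype (BBB) to the RM-like genotype (RRR), and the low-fitness genotype is assumed to be one of the six intermediate genotypes. The label "RBB" is a placeholder, not the genotype identified in Figure 6.

```python
from itertools import permutations

def fraction_paths_avoiding(bad_genotype, start="BBB", end="RRR"):
    """Fraction of substitution orders from start to end that never visit bad_genotype."""
    loci = [i for i in range(len(start)) if start[i] != end[i]]
    orders = list(permutations(loci))        # 3! = 6 orders for three loci
    n_avoiding = 0
    for order in orders:
        genotype = list(start)
        visited_bad = "".join(genotype) == bad_genotype
        for locus in order:
            genotype[locus] = end[locus]     # substitute one locus at a time
            if "".join(genotype) == bad_genotype:
                visited_bad = True
        if not visited_bad:
            n_avoiding += 1
    return n_avoiding / len(orders)

# Placeholder low-fitness genotype; any single intermediate gives the same answer.
print(fraction_paths_avoiding("RBB"))
```

Each intermediate genotype lies on exactly two of the six orderings, so four of six (two-thirds) avoid it regardless of which intermediate is the unfit one.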

Figure 6.

Growth on formamide media for different combinations of interacting alleles. Shown is the distribution of growth on media containing formamide for the eight possible three-locus genotypes of the three interacting loci identified in Table 1. BRB indicates that the locus on chromosome 4 has the BY allele, the locus on chromosome 8 has the RM allele, and the locus on chromosome 14 has the BY allele, etc.

Conclusion

These methods can be used to investigate the role of pairwise and higher-order epistasis in model organisms by applying them to appropriate crosses. In particular, by measuring the variance that higher-order interactions contribute to crosses between diverged populations, these methods could be used to investigate the role of higher-order interactions in hybrid incompatibilities.

We anticipate that it will soon be possible to apply these methods to precisely estimate the variance from pairwise interactions in human founder populations. These estimates, combined with estimates of the additive and dominant components of the variance, will help in answering where the missing heritability is, in searching for causal loci, in building prediction models, and in testing evolutionary models of traits.

Supplementary Material

Supporting Information

Acknowledgments

We acknowledge Peter Donnelly for helpful comments on the manuscript. This work was supported by the Wellcome Trust (grants WT098051 to Richard Durbin, 099670/Z/12/Z to Alexander Young, and 090532/Z/09/Z to the Wellcome Trust Centre for Human Genetics).

Footnotes

Communicating editor: G. A. Churchill

Literature Cited

  1. Abney M., McPeek M. S., Ober C., 2000. Estimation of variance components of quantitative traits in inbred populations. Am. J. Hum. Genet. 66: 629–650.
  2. Anderson J. B., Funt J., Thompson D. A., Prabhu S., Socha A., et al., 2010. Determinants of divergent adaptation and Dobzhansky-Muller interaction in experimental yeast populations. Curr. Biol. 20: 1383–1388.
  3. Baud A., Hermsen R., Guryev V., Stridh P., Graham D., et al., 2013. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats. Nat. Genet. 45: 767–775.
  4. Bloom J. S., Ehrenreich I. M., Loo W. T., Lite T.-L. V. o., Kruglyak L., 2013. Finding the sources of missing heritability in a yeast cross. Nature 494: 234–237.
  5. Boomsma D., Busjahn A., Peltonen L., 2002. Classical twin studies and beyond. Nat. Rev. Genet. 3: 872–882.
  6. Brown A. A., Buil A., Viñuela A., Lappalainen T., Zheng H.-f., et al., 2014. Genetic interactions affecting human gene expression identified by variance association mapping. eLife 3: e01381.
  7. Browning S. R., Browning B. L., 2013. Identity-by-descent-based heritability analysis in the Northern Finland Birth Cohort. Hum. Genet. 132: 129–138.
  8. Carmi S., Palamara P. F., Vacic V., Lencz T., Darvasi A., et al., 2013. The variance of identity-by-descent sharing in the Wright–Fisher model. Genetics 193: 911–928.
  9. Cockerham C. C., 1954. An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 39: 859–882.
  10. Cockerham C. C., Tachida H., 1988. Permanency of response to selection for quantitative characters in finite populations. Proc. Natl. Acad. Sci. USA 85: 1563–1565.
  11. Eichler E. E., Flint J., Gibson G., Kong A., Leal S. M., et al., 2010. Missing heritability and strategies for finding the underlying causes of complex disease. Nat. Rev. Genet. 11: 446–450.
  12. Fisher R., 1918. The correlation between relatives on the supposition of Mendelian inheritance. Trans. R. Soc. Edinb. 52: 399–433.
  13. Gallais A., 1974. Covariances between arbitrary relatives with linkage and epistasis in the case of linkage disequilibrium. Biometrics 30: 429–446.
  14. Gauvin H., Moreau C., Lefebvre J.-F., Laprise C., Vézina H., et al., 2013. Genome-wide patterns of identity-by-descent sharing in the French Canadian founder population. Eur. J. Hum. Genet. 22: 814–821.
  15. Gilmour A. R., Thompson R., Cullis B. R., 1995. Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51: 1440–1450.
  16. Hemani G., Knott S., Haley C., 2013. An evolutionary perspective on epistasis and the missing heritability. PLoS Genet. 9: e1003295.
  17. Hemani G., Shakhbazov K., Westra H.-J., Esko T., Henders A. K., et al., 2014. Detection and replication of epistasis influencing transcription in humans. Nature 508: 249–253. [Retracted]
  18. Hill W. G., Goddard M. E., Visscher P. M., 2008. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4: e1000008.
  19. Kempthorne O., 1954. The correlation between relatives in a random mating population. Proc. R. Soc. Lond. B Biol. Sci. 143: 103–113.
  20. Kempthorne O., 1955. The theoretical values of correlations between relatives in random mating populations. Genetics 40: 153–167.
  21. Khoury M. J., Cohen B. H., Diamond E. L., Chase G. A., McKusick V. A., 1987. Inbreeding and prereproductive mortality in the Old Order Amish. I. Genealogic epidemiology of inbreeding. Am. J. Epidemiol. 125: 453–461.
  22. Liti G., Carter D. M., Moses A. M., Warringer J., Parts L., et al., 2009. Population genomics of domestic and wild yeasts. Nature 458: 337–341.
  23. Mackay T. F. C., 2014. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat. Rev. Genet. 15: 22–33.
  24. Nelson R. M., Pettersson M. E., Li X., Carlborg O., 2013. Variance heterogeneity in Saccharomyces cerevisiae expression data: trans-regulation and epistasis. PLoS ONE 8: e79507.
  25. Orr H. A., 1995. The population genetics of speciation: the evolution of hybrid incompatibilities. Genetics 139: 1805–1813.
  26. Pettersson M., Besnier F., Siegel P. B., Carlborg O., 2011. Replication and explorations of high-order epistasis using a large advanced intercross line pedigree. PLoS Genet. 7: e1002180.
  27. Speed D., Hemani G., Johnson M. R., Balding D. J., 2012. Improved heritability estimation from genome-wide SNPs. Am. J. Hum. Genet. 91: 1011–1021.
  28. Strange A., Capon F., Spencer C. C. A., Knight J., Weale M. E., et al., 2010. A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1. Nat. Genet. 42: 985–990.
  29. Tachida H., Cockerham C. C., 1989. Effects of identity disequilibrium and linkage on quantitative variation in finite populations. Genet. Res. 53: 63–70.
  30. Taylor M. B., Ehrenreich I. M., 2014. Genetic interactions involving five or more genes contribute to a complex trait in yeast. PLoS Genet. 10: e1004324.
  31. Visscher P. M., Medland S. E., Ferreira M. R., Morley K. I., Zhu G., et al., 2006. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2: e41.
  32. Visscher P. M., Brown M. A., McCarthy M. I., Yang J., 2012. Five years of GWAS discovery. Am. J. Hum. Genet. 90: 7–24.
  33. Vukcevic D., Hechter E., Spencer C., Donnelly P., 2011. Disease model distortion in association studies. Genet. Epidemiol. 290: 278–290.
  34. Yang J., Benyamin B., McEvoy B. P., Gordon S., Henders A. K., et al., 2010. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42: 565–569.
  35. Yang J., Lee S. H., Goddard M. E., Visscher P. M., 2011. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88: 76–82.
  36. Zaitlen N., Kraft P., Patterson N., Pasaniuc B., Bhatia G., et al., 2013. Using extended genealogy to estimate components of heritability for 23 quantitative and dichotomous traits. PLoS Genet. 9: e1003520.
  37. Zuk O., Hechter E., Sunyaev S. R., Lander E. S., 2012. The mystery of missing heritability: genetic interactions create phantom heritability. Proc. Natl. Acad. Sci. USA 109: 1193–1198.


Articles from Genetics are provided here courtesy of Oxford University Press
