Using known QTLs to detect directional epistatic interactions

MONTGOMERY SLATKIN; MARK KIRKPATRICK

doi:10.1017/S0016672312000043

Author manuscript; available in PMC: 2016 Nov 14.

Published in final edited form as: Genet Res (Camb). 2012 Feb;94(1):39–48. doi: 10.1017/S0016672312000043

PMCID: PMC5108529 NIHMSID: NIHMS827399 PMID: 22353247

Summary

Epistasis plays important roles in evolution, for example in the evolution of recombination, but each of the current methods to study epistasis has limitations. Here, we propose a new strategy. If a quantitative trait locus (QTL) affecting a quantitative character has been identified, individuals who have the same genotype at that QTL can be regarded as comprising a subpopulation whose response to selection depends in part on interactions with other loci affecting the character. We define the marginal differences to be the differences in the average phenotypes of individuals with different genotypes of that QTL. We show that the response of the marginal differences to directional selection on the quantitative character depends on epistatic gene interactions. For a model with no interactions, the marginal differences do not differ on average from their starting values once linkage equilibrium has been re-established. If there is directional epistasis, meaning that interactions between the QTL and other loci tend to increase or decrease the character more than under an additive model, then the marginal differences will tend to increase or decrease accordingly when larger values of the character are selected for. We develop a likelihood ratio test for significant changes in the marginal differences and show that it has some power to detect directional epistasis for realistic sample sizes. We also show that epistatic interactions which affect the evolution of the marginal differences do not necessarily result in a substantial epistatic component of the genetic variance.

Gene interactions are commonly found in genetic experiments but their overall importance for quantitative genetics is not well understood (Phillips, 2008; Flint & Mackay, 2009). In this paper, we introduce a new way of testing for epistatic interactions when one or more QTLs affecting a quantitative character have been identified. For a specific QTL, the marginal averages, defined to be the character averages in individuals with each of the QTL genotypes, depend on other loci affecting the character. After the marginal averages are estimated in the initial population, selection can be applied. Changes in the marginal averages in the selected population reflect interactions with other loci affecting the character. In particular, the differences in the marginal averages, which we call the marginal differences, increase or decrease if interactions tend to increase or decrease the character from the expectation under additivity. Such interactions are called directional epistatic interactions by Hansen & Wagner (2001). They can affect the evolutionary dynamics of a trait even when they do not result in a substantial epistatic component of genetic variance.

Directional epistatic interactions have been discussed extensively in the context of loci affecting fitness. If alleles that increase fitness have negative interactions (i.e. overall fitness is less than would be predicted by the individual effects), then negative linkage disequilibrium results. Under some conditions, this situation can favour the evolution of increased recombination and sexual reproduction (reviewed in Otto, 2009). This theoretical prediction has motivated a substantial number of experiments searching for consistent patterns in the sign of epistasis. No clear trend has emerged (de Visser & Elena, 2007; Kouyos et al., 2007). Those results, however, largely rest on two experimental paradigms: constructing genotypes with known combinations of mutations, and following changes in time and differences between mutation accumulation lines. These and other approaches used to quantify epistasis, however, have a variety of limitations that we review in the Discussion. It would therefore be useful to have other experimental strategies that could be applied to other genes, organisms and traits.

Early work on the effects of epistasis on quantitative traits was led by Cockerham (1954) and Kempthorne (1954), who first worked out the theory for estimating epistatic components of genetic variance from the resemblance between relatives. Since then, many specific models of epistasis have been developed and analysed, some based on the deviations from additivity expressed in algebraic terms (Lewontin & Kojima, 1960) and others based on explicit assumptions about interactions of gene products (Kacser & Burns, 1981). In most cases, explicit models of gene interactions do not result in much epistatic variance (Keightley, 1989). Hansen and Wagner (2001), Barton and Turelli (2004) and others have explored the properties of general models of epistasis. Pavlicev et al. (2010) implemented the Hansen and Wagner theory and found evidence for directional epistasis between QTLs affecting several characters in inbred mice. Le Rouzic et al. (2010) developed a statistical method for estimating epistatic effects by analysing data from selection experiments. Although they did not assume that QTL frequencies could be monitored in the same experiment, their method could be adapted to allow for that possibility.

In this paper, we propose a strategy to detect directional epistasis using selection experiments. Our method differs from that used by Pavlicev et al. (2010) because it is designed to detect directional epistasis between a previously identified QTL and all other QTLs affecting the same character, including both those that have been identified and those that are still unknown. The method used by Pavlicev et al. (2010) tests for directional epistasis between only those QTLs that have been identified. We begin by describing a null model of additive effects across loci (i.e. no epistasis). We next define three simple models of gene interactions that will allow us to illustrate our results. The first is a completely symmetric model of epistasis for which the theory is relatively simple. We then consider two other models for epistasis in which one locus, representing a known QTL, is distinguished from the others. We then turn to detecting epistasis. Using analysis and simulation of the models, we show that with directional epistasis the average differences between the genotypes at a QTL change following selection. We propose a statistical test for detecting these changes and examine its power.

1. Model assumptions

We assume that a quantitative character, x, is determined by the genotype at L unlinked diallelic loci. At each locus, the allele that tends to increase x is denoted by +. The genotype of an individual is represented by a vector k = {k₁, …, k_L}, where k_i = 0, 1 or 2 indicates the number of + alleles at locus i. In general,

x = f (k),

(1)

where f (k) is the phenotype map that specifies the average phenotype associated with each genotype. In general, f depends on 3^L parameters, one for each genotype.

Our reference by which to judge the effects of epistasis is the additive model. It assumes that the contributions of each locus add:

f (k_{1}, \dots, k_{L}) = f_{1} (k_{1}) + \dots + f_{L} (k_{L}) .

(2)

The first and simplest of our three models that include epistasis is the symmetric model. It assumes that all + alleles are equivalent in their effect on the character, which implies that f depends only on the total number of + alleles:

f (k_{1}, \dots, k_{L}) = h_{i},

(3)

where $i = \sum_{j = 1}^{L} k_{j}$. If h_i is a linear function of i, then the symmetric model is also additive. In later analysis, we will assume quadratic functional dependence on i: h_i = βi + γi². A generalization of the symmetric model assumes that a random interaction term, e_k, is added with probability q to each genotype:

f (k) = β i + γ i^{2} + e_{k} .

(4)

The parameter q allows us to vary the sparseness of the additional epistatic terms. We will assume that each e_k is drawn independently from a distribution with mean 0 and standard deviation σ_I.

We next consider cases in which a focal locus, which we take to be locus 1 without loss of generality, is distinguished from the others. Locus 1 represents a QTL that has already been identified in a mapping study. In our second model for epistasis, which we call conditional additive, locus 1 interacts additively with all other loci considered together, but the other loci interact among themselves:

f (k_{1}, \dots, k_{L}) = f_{1} (k_{1}) + f_{2, L} (k_{2}, \dots, k_{L}) .

(5)

In our third and final model for epistasis, which we call conditional epistatic, locus 1 interacts with each other locus epistatically but the other loci do not interact with one another:

f (k_{1}, \dots, k_{L}) = f_{1} (k_{1}) + \dots + f_{L} (k_{L}) + f_{12} (k_{1}, k_{2}) + \dots + f_{1 L} (k_{1}, k_{L}) .

(6)

The conditional epistatic model allows for directional epistasis in an especially simple context.
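The phenotype maps defined in this section can be written down directly as functions of the genotype vector k. The following is a minimal sketch, not the authors' program: the function names are hypothetical, the random term e_k is assumed to be normally distributed (this section fixes only its mean and standard deviation), and the conditional epistatic map is restricted to the special case f_i(k_i) = k_i with multiplicative interactions β_i k₁ k_i.

```python
import random


def symmetric_map(beta, gamma):
    """Quadratic symmetric map h_i = beta*i + gamma*i**2 (eqn 3),
    where i is the total number of + alleles in the genotype."""
    def f(k):
        i = sum(k)
        return beta * i + gamma * i * i
    return f


def random_term_map(beta, gamma, q, sigma_i, rng):
    """Eqn (4): symmetric map plus a genotype-specific term e_k.

    Each genotype gets a nonzero e_k with probability q.
    ASSUMPTION: e_k is normal with mean 0 and s.d. sigma_i.
    """
    base = symmetric_map(beta, gamma)
    e = {}  # e_k is drawn once per genotype, then reused

    def f(k):
        k = tuple(k)
        if k not in e:
            e[k] = rng.gauss(0.0, sigma_i) if rng.random() < q else 0.0
        return base(k) + e[k]
    return f


def conditional_epistatic_map(betas):
    """Special case of eqn (6): f_i(k_i) = k_i and
    f_1i(k_1, k_i) = beta_i * k_1 * k_i.

    betas[0] is the coefficient for locus 2, betas[1] for locus 3, etc.
    """
    def f(k):
        return sum(k) + k[0] * sum(b * ki for b, ki in zip(betas, k[1:]))
    return f
```

Because the random terms are drawn lazily, the values assigned to genotypes depend on the order in which genotypes are first evaluated; a fixed seed keeps a run reproducible.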

2. Population properties

The mean and total genetic variance of x in the population are

\bar{x} = \sum_{k} Pr (k) f (k)

(7)

and

V_{G} = \sum_{k} Pr (k) {[f (k) - \bar{x}]}^{2},

(8)

where Pr(k) is the frequency of genotype k in the population.

We assume there is a base population in which the frequency of the + allele at locus j is p_j and the loci are in Hardy–Weinberg and linkage equilibrium. Then we allow directional selection on x to be applied. Continued random mating will ensure Hardy–Weinberg genotype frequencies each generation, but selection will create linkage disequilibrium even between unlinked loci. We will assume that selection is applied for t₁ generations and then t₂ generations of random mating occur without selection in order to allow linkage equilibrium to be restored.

We calculated the components of genetic variance in the standard way (Falconer & Mackay, 1996; Lynch & Walsh, 1998). The total genetic variance is V_G (eqn 8). We compute $a_{j} = ({\bar{x}}_{2, j} - {\bar{x}}_{0, j}) / 2$ and $d_{j} = {\bar{x}}_{1, j} - ({\bar{x}}_{2, j} + {\bar{x}}_{0, j}) / 2$, where ${\bar{x}}_{k, j}$ are the marginal averages for locus j. Then $V_{A} = \sum_{j = 1}^{L} 2 p_{j} (1 - p_{j}) {[a_{j} + d_{j} (1 - 2 p_{j})]}^{2}$, $V_{D} = \sum_{j = 1}^{L} {[2 p_{j} (1 - p_{j}) d_{j}]}^{2}$ and V_I = V_G − V_A − V_D.
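For a small number of loci, the variance components can be computed by enumerating all 3^L genotypes. The sketch below assumes Hardy–Weinberg and linkage equilibrium and allele frequencies strictly between 0 and 1; the function name `variance_components` is hypothetical and the code is an illustration of the formulas above, not the authors' implementation.

```python
from itertools import product
from math import prod


def variance_components(f, freqs):
    """Return (V_G, V_A, V_D, V_I) for phenotype map f.

    freqs[j] is the + allele frequency at locus j. Genotype
    frequencies assume Hardy-Weinberg and linkage equilibrium.
    All 3**L genotypes are enumerated, so L must be small.
    """
    L = len(freqs)
    hw = [((1 - p) ** 2, 2 * p * (1 - p), p ** 2) for p in freqs]
    genotypes = list(product(range(3), repeat=L))
    pr = {k: prod(hw[j][k[j]] for j in range(L)) for k in genotypes}

    mean = sum(pr[k] * f(k) for k in genotypes)               # eqn (7)
    v_g = sum(pr[k] * (f(k) - mean) ** 2 for k in genotypes)  # eqn (8)

    v_a = v_d = 0.0
    for j, p in enumerate(freqs):
        # Marginal averages for locus j: mean phenotype given k_j = g.
        xbar = [
            sum(pr[k] * f(k) for k in genotypes if k[j] == g) / hw[j][g]
            for g in range(3)
        ]
        a = (xbar[2] - xbar[0]) / 2
        d = xbar[1] - (xbar[2] + xbar[0]) / 2
        v_a += 2 * p * (1 - p) * (a + d * (1 - 2 * p)) ** 2
        v_d += (2 * p * (1 - p) * d) ** 2
    return v_g, v_a, v_d, v_g - v_a - v_d
```

For the purely additive map f(k) = Σ k_j, this returns V_A = V_G = Σ 2p_j(1 − p_j) with V_D = V_I = 0, which is a convenient sanity check.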

3. Marginal averages and marginal differences

Our test for epistasis depends on changes in the differences between the average phenotypes produced by genotypes at the focal QTL. The marginal averages for that QTL (locus 1) are the expectations of x, given the genotype at locus 1:

{\bar{x}}_{k} = \sum_{k_{2} \dots k_{L}} Pr (k_{2}, \dots, k_{L} ∣ k_{1} = k) f (k, k_{2}, \dots, k_{L}) .

(9)

where k = 0, 1, 2 corresponds to the number of + alleles at the focal locus. At linkage equilibrium, the conditional probabilities do not depend on k.

We define the marginal differences for locus 1 to be Δ₁ = x̄₂ − x̄₁ and Δ₀ = x̄₁ − x̄₀. These quantities are the key to the test we propose for detecting directional epistasis: as we will see in the following sections, changes in allele frequencies cause the marginal differences to change under some types of epistasis. If the marginal differences for a locus do change, that will alter the additive genetic variance the locus contributes to the trait. To see this, write the additive genetic effect of locus 1 in terms of the marginal differences:

α = \frac{Δ_{1} + Δ_{0}}{2} - (1 - 2 p) \frac{Δ_{1} - Δ_{0}}{2} .

(10)

The marginal differences are therefore of interest for two reasons: they provide an opportunity to test for epistasis, and changes in their values alter the contribution of a locus to the additive genetic variance for a trait.
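The step leading to eqn (10) can be made explicit. Taking a and d for the focal locus as defined in Section 2, and assuming that α denotes the usual average effect a + d(1 − 2p) (the quantity squared in the expression for V_A), substitution of the marginal differences gives:

```latex
\begin{aligned}
a &= \frac{{\bar{x}}_{2} - {\bar{x}}_{0}}{2} = \frac{Δ_{1} + Δ_{0}}{2}, \\
d &= {\bar{x}}_{1} - \frac{{\bar{x}}_{2} + {\bar{x}}_{0}}{2} = - \frac{Δ_{1} - Δ_{0}}{2}, \\
α &= a + d (1 - 2 p) = \frac{Δ_{1} + Δ_{0}}{2} - (1 - 2 p) \frac{Δ_{1} - Δ_{0}}{2} .
\end{aligned}
```

When Δ₁ = Δ₀, the second term vanishes and α equals the common marginal difference.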

We now consider the properties of the marginal differences under the conditional additive and conditional epistatic models, where simple analytic results can be obtained, and under the symmetric model, where we rely on numerical analysis.

(i) The conditional additive model

In the conditional additive model, the effect of locus 1 is added to the net effect of the other L–1 loci, which may interact among themselves. It is easy to see that the expectations of the marginal averages do not depend on the allele frequencies at the other loci at linkage equilibrium. From eqn (9),

\begin{array}{l} {\bar{x}}_{k} = \sum_{k_{2} \dots k_{L}} Pr (k_{2}, \dots, k_{L} ∣ k_{1} = k) f (k, k_{2}, \dots, k_{L}) \\ = \sum_{k_{2} \dots k_{L}} Pr (k_{2}, \dots, k_{L} ∣ k_{1} = k) [f_{1} (k) + f_{2, L} (k_{2}, \dots, k_{L})] \\ = f_{1} (k) + \sum_{k_{2} \dots k_{L}} Pr (k_{2}, \dots, k_{L}) f_{2, L} (k_{2}, \dots, k_{L}) \end{array}

(11)

because, at linkage equilibrium, Pr(k₂, …, k_L|k₁ = k) is independent of k. Therefore, Δ₁ = x̄₂ − x̄₁ = f₁(2) − f₁(1) and Δ₀ = x̄₁ − x̄₀ = f₁(1) − f₁(0). The conclusion is that the marginal differences at locus 1 are independent of the allele frequencies at the other loci. In fact, all that is required for this result is that there should be linkage equilibrium between locus 1 and the others, even if there is linkage disequilibrium among the other loci.

(ii) The conditional epistasis model

A different result holds under the conditional epistasis model. If locus 1 interacts with the other loci, the marginal differences may change. Substituting into eqn (9), we obtain

{\bar{x}}_{k} = f_{1} (k) + \sum_{k_{2} \dots k_{L}} Pr (k_{2}, \dots, k_{L} ∣ k_{1} = k) [f_{2} (k_{2}) + \dots + f_{L} (k_{L}) + f_{12} (k, k_{2}) + \dots + f_{1 L} (k, k_{L})] .

(12)

Subtracting and assuming linkage equilibrium yields

Δ_{1} = f_{1} (2) - f_{1} (1) + \sum_{i = 2}^{L} \sum_{k_{i} = 0}^{2} Pr (k_{i}) [f_{1 i} (2, k_{i}) - f_{1 i} (1, k_{i})]

(13)

and

Δ_{0} = f_{1} (1) - f_{1} (0) + \sum_{i = 2}^{L} \sum_{k_{i} = 0}^{2} Pr (k_{i}) [f_{1 i} (1, k_{i}) - f_{1 i} (0, k_{i})] .

(14)

The third terms on the right-hand sides of these expressions represent the net effects of the epistatic interactions between locus 1 and the others. These expressions show that the marginal differences at locus 1 are functions of the allele frequencies at the other loci.

A special case of the conditional epistasis model is that in which f_i(k_i) = k_i and f_1i(k₁, k_i) = β_i k₁ k_i. Then

Δ_{1} = Δ_{0} = 1 + 2 \sum_{i = 2}^{L} β_{i} p_{i} .

(15)

Thus, the changes in the marginal differences are proportional to the changes in allele frequency at the other loci, weighted by the β_i.

The key conclusion that emerges from the conditional epistasis model is that the marginal differences for the focal QTL will change if there are allele frequency changes at other loci with which the focal locus has directional epistatic interactions. This motivates the experimental test proposed below, in which the allele frequencies change in response to artificial selection.
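Eqn (15) can be checked numerically by computing the marginal averages of eqn (9) by brute force at linkage equilibrium. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the other loci are in Hardy–Weinberg proportions, and the number of loci is small enough to enumerate all genotypes.

```python
from itertools import product
from math import prod


def marginal_differences(f, other_freqs):
    """Return (Delta_1, Delta_0) for locus 1 at linkage equilibrium.

    f takes a genotype tuple (k_1, k_2, ..., k_L); other_freqs holds
    the + allele frequencies at loci 2..L (Hardy-Weinberg assumed).
    """
    hw = [((1 - p) ** 2, 2 * p * (1 - p), p ** 2) for p in other_freqs]
    xbar = []
    for k1 in range(3):
        total = 0.0
        for rest in product(range(3), repeat=len(other_freqs)):
            pr = prod(hw[j][g] for j, g in enumerate(rest))
            total += pr * f((k1,) + rest)  # eqn (9)
        xbar.append(total)
    return xbar[2] - xbar[1], xbar[1] - xbar[0]


def special_case_map(betas):
    """f_i(k_i) = k_i and f_1i(k_1, k_i) = beta_i * k_1 * k_i."""
    def f(k):
        return sum(k) + k[0] * sum(b * ki for b, ki in zip(betas, k[1:]))
    return f


betas = [0.02] * 4                                   # loci 2..5
f = special_case_map(betas)
before = marginal_differences(f, [0.2] * 4)
after = marginal_differences(f, [0.8] * 4)
# Eqn (15) predicts 1 + 2 * sum(beta_i * p_i) for both differences.
predicted_before = 1 + 2 * sum(b * 0.2 for b in betas)
predicted_after = 1 + 2 * sum(b * 0.8 for b in betas)
```

In this example both marginal differences move from the value predicted at p_i = 0·2 to the value predicted at p_i = 0·8, even though the map at the focal locus itself is unchanged.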

(iii) The symmetric model

The general model is not analytically or even numerically tractable for more than a few loci. This situation motivates the symmetric model, in which the frequency and effect of the + allele is assumed to be the same at all loci. These assumptions lead to a relatively simple theory that can be analysed numerically for larger numbers of loci.

Our analysis is based on that of Barton (1992). We assume an infinitely large population and ignore the effects of genetic drift. The population is characterized by the frequencies of gametes carrying i + alleles. A generation consists of the random union of gametes, then selection based on the number of + alleles in zygotes, and finally free recombination to create the next generation of gametes. After one generation

g_{j} (t + 1) = \sum_{i, i^{'} = 0}^{L} g_{i} (t) g_{i^{'}} (t) \frac{w (i + i^{'})}{\bar{w}} P (i, i^{'}; j),

(16)

where g_i(t) is the frequency of gametes carrying i + alleles in generation t, w(i + i′) is the relative fitness of an individual with i + i′ + alleles, w̄ is the average fitness in the population and P(i,i′;j) is the probability that an individual with gametes carrying i and i′ + alleles will produce a gamete with j + alleles. Barton (1992) derived the expression for P by assuming all configurations of + alleles are equally probable in a gamete carrying i + alleles:

P (i, i^{'}; j) = \sum_{m = 0}^{\min (i^{'}, j, i + i^{'} - j)} \frac{\binom{i}{m} \binom{L - i}{i^{'} - m}}{\binom{L}{i^{'}}} \binom{i + i^{'} - 2 m}{j - m} {\left( \frac{1}{2} \right)}^{i + i^{'} - 2 m}

(17)

if i′ ≤ i and max(0, i + i′ − L) ≤ j ≤ min(i + i′, L). By symmetry, P(i,i′; j) = P(i′,i; j). Because the ratio of binomial coefficients in the sum is the same as the ratio in the hypergeometric probability distribution, this model is sometimes called the hypergeometric model. We compute the marginal differences and the variance components for the symmetric model as described in Appendix A. A Mathematica program that carries out these calculations is available on request.
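The recursion in eqns (16) and (17) is straightforward to iterate. The sketch below is written in Python rather than Mathematica and is not the authors' program; the function names are hypothetical. The sum over m starts at max(0, i + i′ − L) instead of 0, which only skips terms whose binomial coefficients vanish. The fitness function w is left generic, because a truncation-selection version would also need a rule for the phenotype class that straddles the threshold.

```python
from math import comb


def transmission_prob(i, ip, j, L):
    """P(i, i'; j) of eqn (17): probability that a zygote formed from
    gametes carrying i and ip + alleles produces a gamete with j."""
    total = 0.0
    for m in range(max(0, i + ip - L), min(i, ip) + 1):
        het = i + ip - 2 * m  # number of heterozygous loci
        if 0 <= j - m <= het:
            # Hypergeometric probability that m loci are ++ homozygotes.
            shared = comb(i, m) * comb(L - i, ip - m) / comb(L, ip)
            # Each heterozygous locus transmits + with probability 1/2.
            total += shared * comb(het, j - m) * 0.5 ** het
    return total


def next_gametes(g, w):
    """One generation of eqn (16).

    g[i] is the frequency of gametes carrying i + alleles;
    w(n) is the relative fitness of a zygote with n + alleles.
    """
    L = len(g) - 1
    pairs = [(i, ip) for i in range(L + 1) for ip in range(L + 1)]
    wbar = sum(g[i] * g[ip] * w(i + ip) for i, ip in pairs)
    return [
        sum(g[i] * g[ip] * w(i + ip) / wbar * transmission_prob(i, ip, j, L)
            for i, ip in pairs)
        for j in range(L + 1)
    ]
```

With w constant and a binomial starting distribution (linkage equilibrium with equal allele frequencies), the gamete distribution should be unchanged from one generation to the next, which provides a basic check of the implementation.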

There is no closed-form solution for this model, but it is easy to iterate numerically for an arbitrary fitness function, w. Here we will assume truncation selection for larger x and let α_x be the fraction of zygotes that survive to breed. Our interest is in the dependence of the marginal differences on the phenotype map, h_i.

We assumed a quadratic phenotype map, h_i = βi + γi², in order to contrast the results for the additive model (γ = 0) with models that assume synergistic (γ > 0) and antagonistic (γ < 0) interactions of + alleles. Figure 1 shows typical results for experiments in which five generations of truncation selection with α_x = 0·5 are followed by 10 generations with no selection. In the additive model, the marginal differences initially decrease because of the accumulation of linkage disequilibrium, but then return to their original values after 10 generations of random mating. In contrast, if γ > 0, the marginal differences are larger than their initial values after linkage equilibrium has been re-established (Fig. 1b), and if γ < 0, the marginal differences are smaller than their initial values (Fig. 1c).

Fig. 1. Time dependence of the marginal differences, Δ₁ and Δ₀, after five generations of truncation selection with α_x = 1/2. The phenotype map in each case is h_i = βi + γi². In all cases, β = 0·5. Generation 0 is the initial population assumed to be in Hardy–Weinberg and linkage equilibrium with p = 0·3 at all 10 loci. Selection was applied in generations 0–4 followed by 10 generations of random mating without selection.

This trend in the results does not require much synergistic or antagonistic interaction and these interactions do not create much interaction variance once linkage equilibrium is re-established. The final values of V_I/V_G are 0·002 for γ = 0·05 and 0·015 for γ = −0·01.

We conclude from these numerical results that the marginal differences are sensitive to relatively weak gene interactions and that they are sensitive to gene interactions that do not create significant interaction components of the genetic variance.

4. Statistical test for changes in marginal differences

The symmetric model shows that the marginal differences depend on directional epistatic interactions between a QTL and other loci. To detect a significant change in the marginal differences, either in real or simulated data, a statistical test is needed. We developed a likelihood ratio test that is described in Appendix B. The test compares the genotype-specific distributions of the character in the initial and final populations. The data are in six vectors, {x_ki} for the starting population and {y_ki} for the final population, where k = 0, 1, 2 is the number of copies of the + allele at the QTL. We assume that the elements of each vector are drawn from a normal distribution with means μ_kx or μ_ky and variances V_kx or V_ky. We test whether μ_2x − μ_1x = μ_2y − μ_1y and μ_1x − μ_0x = μ_1y − μ_0y, allowing the variances to take any values. The test assumes that twice the logarithm of the ratio of the likelihood maximized with no constraints on the means to the likelihood maximized under these two constraints has a χ² distribution with two degrees of freedom. The P-value reported is the tail probability of this distribution.
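One way to carry out a test of this form is sketched below. It is not the procedure of Appendix B, which is not reproduced in this section; it is a minimal illustration under stated assumptions. The two constraints hold exactly when μ_ky = μ_kx + c for a single shift c, so the constrained model has four free means. The sketch maximizes the constrained likelihood by alternating between the means (weighted least squares for fixed variances) and the variances (mean squared residuals for fixed means), a scheme that may reach only a local maximum. The function names are hypothetical, and each group is assumed to contain at least two distinct values.

```python
import math


def _summary(v):
    """Sample size, mean and maximum-likelihood variance (divisor n)."""
    n = len(v)
    mean = sum(v) / n
    return n, mean, sum((a - mean) ** 2 for a in v) / n


def lrt_marginal_differences(x, y, n_iter=500):
    """Likelihood ratio test for a change in the marginal differences.

    x[k], y[k]: phenotypes of individuals with k = 0, 1, 2 copies of
    the + allele at the QTL, before (x) and after (y) selection.
    Returns (statistic, p_value).
    """
    sx = [_summary(v) for v in x]
    sy = [_summary(v) for v in y]
    vx = [s[2] for s in sx]  # start from the unconstrained variances
    vy = [s[2] for s in sy]
    for _ in range(n_iter):
        wx = [sx[k][0] / vx[k] for k in range(3)]
        wy = [sy[k][0] / vy[k] for k in range(3)]
        h = [wx[k] * wy[k] / (wx[k] + wy[k]) for k in range(3)]
        # Common shift c: weighted mean of the per-genotype mean changes.
        c = sum(h[k] * (sy[k][1] - sx[k][1]) for k in range(3)) / sum(h)
        mu = [(wx[k] * sx[k][1] + wy[k] * (sy[k][1] - c)) / (wx[k] + wy[k])
              for k in range(3)]
        # Variances given the constrained means.
        vx = [sx[k][2] + (sx[k][1] - mu[k]) ** 2 for k in range(3)]
        vy = [sy[k][2] + (sy[k][1] - mu[k] - c) ** 2 for k in range(3)]
    # 2 * (log L_unconstrained - log L_constrained), variances profiled out.
    stat = sum(s[0] * math.log(v / s[2]) for s, v in zip(sx + sy, vx + vy))
    # The upper tail of chi-square with 2 d.f. is exp(-stat / 2).
    return stat, math.exp(-stat / 2)
```

If every genotype class shifts by the same amount between the two samples, the statistic is zero; unequal shifts across classes, which are what a change in Δ₁ or Δ₀ means, make it positive.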

As with any statistical test, the power of this test increases with sample size. The power can be increased by accumulating data from several generations before and after selection is performed. In this way, larger total sample sizes can be obtained without having to increase the total size of the populations studied.

5. Simulations

The numerical analysis of the symmetric model assumes that selection is performed in infinitely large populations and that allele frequencies at all loci are the same in each generation. We wrote a simulation program for finite populations to test whether the trends seen in those analyses can be detected in selection experiments and to analyse other models for epistasis.

The simulation program assumes that a population of 2N gametes is formed into N zygotes. A phenotype x is assigned to each zygote according to the specified phenotype map. Then the N_s = α_xN individuals with the largest x are chosen to breed. Meiosis was modelled by randomly choosing individuals with replacement 2N times from the breeding pool and generating a random gamete from each, assuming no linkage. Because the gametes are paired randomly into zygotes, this method is equivalent to allowing self-fertilization.
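The life cycle described above can be sketched as follows. This is an illustration, not the authors' simulation program: the function names are hypothetical, gametes are represented as tuples of 0/1 indicators for the + allele, and the optional environmental deviation (normal, standard deviation `sigma_e`, added to the genotypic value) is an assumption about how environmental variance enters.

```python
import random


def initial_gametes(n_gametes, freqs, rng):
    """Gametes drawn at linkage equilibrium; 1 marks the + allele."""
    return [tuple(1 if rng.random() < p else 0 for p in freqs)
            for _ in range(n_gametes)]


def next_generation(gametes, phenotype_map, alpha, rng, sigma_e=0.0):
    """One generation: random union, truncation selection, meiosis.

    gametes: list of 2N tuples; phenotype_map: function of the genotype
    vector k (entries 0, 1, 2); alpha: fraction of zygotes that breed.
    ASSUMPTION: environmental noise is normal with s.d. sigma_e.
    """
    pool = list(gametes)
    rng.shuffle(pool)  # random pairing of gametes into N zygotes
    zygotes = [(pool[2 * n], pool[2 * n + 1]) for n in range(len(pool) // 2)]

    scored = []
    for mat, pat in zygotes:
        k = tuple(a + b for a, b in zip(mat, pat))
        x = phenotype_map(k) + rng.gauss(0.0, sigma_e)
        scored.append((x, mat, pat))

    # Truncation selection: keep the N_s = alpha * N largest phenotypes.
    scored.sort(key=lambda s: s[0], reverse=True)
    n_s = max(1, round(alpha * len(zygotes)))
    breeders = scored[:n_s]

    # Meiosis with free recombination: sample a breeder with replacement,
    # then pick one of its two alleles independently at every locus.
    offspring = []
    for _ in range(len(pool)):
        _, mat, pat = rng.choice(breeders)
        offspring.append(tuple(rng.choice(pair) for pair in zip(mat, pat)))
    return offspring
```

Setting `alpha = 1` gives a generation of random mating without selection, so the same function can serve for both the t₁ selected generations and the t₂ relaxation generations.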

In each generation, we recorded the marginal differences, the total genetic variance and its additive, dominance and epistatic components. After t₁ + t₂ generations, we tested for significant changes in the marginal differences at locus 1, obtaining a P-value for the null hypothesis that the marginal differences did not change.

(i) Simulation results for the symmetric model

Figure 2 shows the simulation results for the marginal differences in the symmetric model for the same parameter values used in Fig. 1. The numerical results from Fig. 1 are included for comparison. In Fig. 2, the marginal differences for all 10 loci are plotted for each generation. In the symmetric model, all loci are equivalent, and hence variation among loci represents the variation among replicates for a single locus. The simulation results are consistent on average with the results for an infinitely large population and they illustrate that there is substantial stochastic variation even with N = 1000. It is worth noting that the stochastic variation seen is not the result of the instability of allele frequencies caused by selection. Although the equality of allele frequencies in a symmetric model may be unstable to some kinds of selection (Barton & Shpak, 2000), that is not the case with the selection model we used. In these and other simulation results for the symmetric model, allele frequencies at different loci do not vary more than would be expected under genetic drift alone acting for the same number of generations. Instead, the variation in the marginal differences results from stochastic variation in linkage disequilibrium among the loci.

Fig. 2. Time dependence of Δ₁ and Δ₀ for all 10 loci in a population of N = 1000 individuals. The parameter values are the same as in Fig. 1. The predictions of the analytic theory from Fig. 1 are plotted with solid lines for comparison with the simulations.

To determine whether significant changes in the marginal differences can be detected with the test described above, we ran 100 replicates each with the same parameter values used in the three cases shown in Fig. 2 but with the environmental variance $σ_{e}^{2}$ chosen so that the heritability in each case is roughly 1/2: $σ_{e}^{2}$ = 1·1 for γ = 0, 3·0 for γ = 0·05 and 0·9 for γ = −0·01. For the linear model (γ = 0), 5% of replicates had a significant change in the marginal differences at the 5% level. For the synergistic model (γ = 0·05), 31% were significant at the 5% level, and for the antagonistic model (γ = −0·01), 14% were significant.

These simulation results confirm that the average behaviour predicted by the analytic theory is seen in populations of finite size and that there is some power to detect significant changes in the marginal differences. The symmetric model we analysed is not a realistic model for interactions affecting a quantitative character, however, because all interactions are in the same direction. As a consequence, V_G is predicted to change systematically in a way not seen in selection experiments. With γ = 0·05, V_G almost doubles and if γ = −0·01, it decreases by almost a half.

(ii) Simulation results for other models

The results from the symmetric model indicate that deviations from the additive model of the same sign (either synergistic or antagonistic) tend to have consistent effects on the marginal differences. To determine whether those results are sensitive to additional random epistatic effects, we simulated the model with the randomly generated interaction term, eqn (4). Figure 3(a) shows that if γ = 0, selection on x does not tend to change the marginal differences, even though the additional interaction terms create a substantial interaction component of the variance. After linkage equilibrium is re-established, V_I/V_G ≈ 0·25. With both synergistic (γ > 0) and antagonistic (γ < 0) interactions, the trends in the marginal differences persist (Fig. 3b and c). For each set of parameter values, we ran 100 replicate simulations and tested whether the marginal differences changed significantly at the 5% level. The changes were significant in 10% of replicates for the parameters of Fig. 3a (with $σ_{e}^{2}$ = 1·4), 56% for Fig. 3b (with $σ_{e}^{2}$ = 9) and 5% for Fig. 3c (with $σ_{e}^{2}$ = 0·9).

Fig. 3. Time dependence of Δ₁ and Δ₀ in the model with random epistatic terms added to the quadratic dependence on i (cf. eqn (4)). The parameter values are the same as in Figs 1 and 2. The additional epistatic term for each genotype was drawn with probability q = 0·5 from a normal distribution with mean 0 and standard deviation σ_I = 1. The predictions of the analytic theory from Fig. 1 are plotted with solid lines for comparison.

We determined the power of the likelihood ratio test for the conditional epistasis model with f_1i(k₁, k_i) = β_i k₁ k_i. Table 1 shows some results for L = 10, p_i = 0·2 initially, and the same β used for i = 2, …, 10. With the parameter values used, the average p_i was approximately 0·8 in the final population. Therefore, from eqn (15) the expected change in the marginal differences is $2 β (L - 1) (p_{i}^{final} - p_{i}^{initial})$ or approximately 10·6β. We can see there is some power to detect significant changes in marginal differences, particularly if N = 1000.
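The arithmetic behind this expected change follows from eqn (15) with a common β at the L − 1 = 9 other loci. Using the rounded final frequency of 0·8 as an illustrative value:

```latex
2 β (L - 1) (p_{i}^{final} - p_{i}^{initial}) = 2 β \times 9 \times (0.8 - 0.2) = 10.8 β
```

The quoted value of about 10·6β corresponds to an average final frequency slightly below 0·8 (close to 0·79), which is consistent with the final frequency being only approximately 0·8.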

Table 1.

Power to detect significant changes in the marginal differences (Δ₁ and Δ₀) for the conditional epistasis model. In all cases, 100 replicate simulations were run, α_x = 0·5, L = 10, f₁(k₁) = k₁, the frequency of the + allele was initially 0·2 at every locus, and the same value of β_i was assumed for loci i = 2, …, 10. If the + allele became fixed after selection, the test could not be performed. The numbers shown are the fractions of replicates for which the likelihood ratio test was performed and P ≤ 0·05. The numbers in parentheses are the numbers of tests performed.

β	N = 100	N = 250	N = 1000
0·01	0·10 (87)	0·10 (99)	0·25 (100)
0·02	0·22 (83)	0·22 (100)	0·56 (100)
0·03	0·25 (73)	0·23 (99)	0·75 (100)

As mentioned above, the power of our test can be increased if information is accumulated over several generations before and after selection. To illustrate, we ran additional simulations for the middle cell in Table 1 (N = 250 and β = 0·02). With data taken in one generation before and after selection was performed, 22% of the replicates had a significant change in the marginal differences at the 5% level. With data accumulated over three generations before and after, 47% of the replicates were significant and over five generations before and after 62% were significant.

6. Discussion and conclusions

Four research strategies are currently used to study epistasis among genes affecting quantitative traits, each with its strengths and weaknesses. The total genetic variance can be decomposed to determine the contribution from epistasis (Cockerham, 1954; Kempthorne, 1954). Limitations of this approach are that it does not give information about the form of epistasis, or how epistasis will affect evolutionary dynamics. A second approach is to construct known genotypes to study interactions between loci (Flint & Mackay, 2009). This strategy has the virtues that it can directly identify loci that interact, and allows the interactions to be quantified in detail and studied experimentally. It is limited to organisms in which appropriate mutations have been characterized and by the number of mutant combinations that can be constructed and analysed. A third strategy is to analyse replicated mutation accumulation lines (Elena & Lenski, 1997). The dynamics of trait values (such as fitness) within lines and differences between lines can be used to estimate parameters of epistasis. The strength of this method is that it can in principle detect the effects of epistasis at loci throughout the genome, and these loci do not need to be identified a priori. A major limitation is that it quantifies only the epistatic effects of spontaneous mutations, which may not be typical of genes segregating in natural populations. A fourth method used to study epistasis is QTL mapping (Manichaikul et al., 2010). This approach identifies the QTLs involved, but is constrained by statistical power to detecting only strong epistatic interactions.

This paper proposes a new strategy for studying epistasis that complements these other methods. Its strength is that it can detect contributions to epistasis from naturally segregating variation at loci throughout the genome. Its limitations are that it requires a large replicated selection experiment and is sensitive to only certain types of epistasis. The principle of our method depends on the fact that the marginal effects of a known QTL depend on the epistatic interactions of that locus with other loci affecting the same character. Changes in allele frequencies at those loci change the genetic environment for the QTL (Phillips, 2008). This generalization implies that changes in the marginal effects of a known QTL can provide evidence of interactions that tend to be in one direction, either synergistic or antagonistic. Furthermore, interactions that affect the marginal differences do not necessarily lead to a substantial interaction component of the genetic variance and therefore might not be detected by studies that focus on variance components.

Our method is not able to detect all types of epistatic interactions, only those that result in a net directional component. As shown in Fig. 3(a), interactions can be present and can generate a substantial interaction component of the variance and yet will not result in a net change in the marginal differences because their effects tend to cancel.

Our numerical and simulation results show that synergistic and antagonistic interactions result in predictable changes in the marginal differences, on average. At first sight, it may seem unlikely that epistatic interactions would show a bias towards synergistic or antagonistic interactions. It is, however, well established that dominance interactions often display directionality. Deleterious mutations are typically partly recessive (Eyre-Walker & Keightley, 2007), and consistent dominance patterns have been seen for other kinds of traits, a phenomenon called ‘directional dominance’ (Falconer & Mackay, 1996, p. 250). If dominance (i.e. interactions between alleles at a single locus) can show consistent patterns, it is plausible that epistasis (which is interactions between alleles at different loci) will also. Metabolic control theory gives support to that view (Keightley, 1989).

In our analysis, we have assumed that QTLs are unlinked. In that case a few generations of random mating with no selection will restore linkage equilibrium. Our conclusions are still true if there is very close linkage between the focal QTL and others. In the time scale of the experiment, very closely linked QTLs would behave as a single QTL because recombination between them would be unlikely. Complications arise if there is weaker linkage between the focal QTL and one or more unseen QTLs. Even if they are in linkage equilibrium in the initial population, directional selection would create LD between them that would not decay in a few generations of random mating. The remaining LD could create a false signal of directional epistasis that would decline if more generations of random mating were allowed. If a signal of significant directional epistasis were found in an experiment, it would be necessary to test for the presence of a linked QTL by doing controlled crosses.

Our method tests for directional epistasis between a specific QTL and others. If there were directional epistasis between all QTLs in the same direction, as in our symmetric model, then the directionality of the interactions would also be indicated by the response of the character mean to selection (an outcome that can also result from dominance). The mean would increase more than linearly in response to selection. In that case, the directionality could be removed by changing the scale of measurement so that the selection response is linear. The situation is different if the focal QTL interacts with others but they do not interact among themselves, as in our conditional epistasis model. Even when there are significant changes in the marginal differences, we found that the average response to selection is not distinguishable from linear. In a model in which there are positive directional interactions with one QTL and negative interactions with another, the effect of the directional interactions would cancel on average while directionality for each QTL could in principle be detected with our method.

It may be difficult to detect epistatic interactions by the approach explored in this paper because of the large stochastic fluctuations in the marginal differences. These fluctuations arise because each marginal difference is the difference between the averages of two distributions that are quite similar unless the QTL accounts for a substantial fraction of the genetic variance. Nevertheless, changes in marginal differences after selection reflect kinds of epistatic interactions that cannot be detected by other means.
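As a rough guide to the size of the sampling part of these fluctuations, and assuming that individuals are sampled independently, the variance of an estimated marginal difference can be written in the notation of Appendix B (m_k individuals and phenotypic variance V_kx within genotype k). This is the standard result for a difference of two sample means and ignores any additional variation among replicates caused by drift:

```latex
\mathrm{Var}({\hat{Δ}}_{1}) = \frac{V_{2 x}}{m_{2}} + \frac{V_{1 x}}{m_{1}}, \qquad \mathrm{Var}({\hat{Δ}}_{0}) = \frac{V_{1 x}}{m_{1}} + \frac{V_{0 x}}{m_{0}} .
```

When the within-genotype variances are large relative to the squared marginal differences, the m_k must be correspondingly large before a change in Δ₁ or Δ₀ stands out from sampling noise.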

Although relatively large sample sizes are required to detect significant directional epistasis between a known QTL and other QTLs that affect the same character, that is a problem for classical tests of epistatic interactions as well. Large sample sizes are needed to estimate additive by additive and other components of genetic variance from breeding experiments (Hill et al., 2008). Estimates for the genetic variance resulting from epistasis, however, cannot predict evolutionary consequences, for example how the selection response will change as allele frequencies evolve.

Our method will not be effective in detecting directional epistasis caused by interactions with low frequency alleles. Changes in the marginal differences result from changes in the frequency of unseen QTLs because of directional selection, and low frequency alleles will respond very slowly to directional selection. However, low frequency alleles would be expected to contribute little to directional epistasis (cf. eqn 15), just as they contribute little to the interaction variance (Hill et al., 2008). Epistatic interactions with more common QTLs may not contribute much to the interaction component of the genetic variance, even if they can be detected from changes in the marginal differences.

Our results also suggest a way to detect loci strongly affected by epistatic interactions. The idea is to perform a genome-wide association study (GWAS) for a quantitative character in an initial population and again after several generations of directional selection on the character. An experiment of this type was carried out by Burke et al. (2010). They selected for accelerated development in Drosophila melanogaster and tested for significant differences in single nucleotide polymorphism (SNP) frequencies between selected and unselected (control) populations in order to identify SNPs associated with development time. In such an experiment, SNPs for which the marginal differences change substantially would likely be closely linked to loci that are affected by epistatic interactions. The statistical problems arising in the analysis of such data are formidable, and complications would arise because linkage disequilibrium between the SNPs surveyed and causative genes could change. Nevertheless, data from independent replicates would give increasing power to detect interacting loci if they are present. Decreasing costs of genotyping on a genome-wide scale will make such experiments increasingly feasible.

Acknowledgments

We thank J. Felsenstein and Y. S. Song for helpful discussions of this topic and comments on a draft of this paper. M. S. was supported in part by NIH Grant R01-GM40282 and M. K. was supported in part by NSF Grant DEB-0819901.

Appendix A. Formulae for symmetric model

We show here the formulae for calculating the marginal averages and components of genetic variance for the symmetric model with equal allele frequencies. From the distribution of the number of + alleles in gametes, g_i, the distribution in zygotes is the convolution of g_i with itself:

ϕ_{i} = \sum_{j = max (0, i - L)}^{min (i, L)} g_{j} g_{i - j} .

Therefore,

\bar{x} = \sum_{i = 0}^{2 L} h_{i} ϕ_{i}

and

V_{G} = \sum_{i = 0}^{2 L} {(h_{i} - \bar{x})}^{2} ϕ_{i} .
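The three formulae above can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the function names, the choice of L = 10, the binomial gametic distribution (linkage equilibrium with allele frequency 1/2) and the additive phenotype map h_i = i are all assumptions made for the example.

```python
import numpy as np
from math import comb

def zygote_distribution(g):
    """Return phi, the distribution of the number of + alleles in zygotes.

    g[i] is the probability that a gamete carries i + alleles (i = 0..L).
    The full discrete convolution of g with itself gives phi[i], i = 0..2L.
    """
    g = np.asarray(g, dtype=float)
    return np.convolve(g, g)

def mean_and_genetic_variance(g, h):
    """Mean phenotype and total genetic variance V_G for a phenotype map h[i]."""
    phi = zygote_distribution(g)
    h = np.asarray(h, dtype=float)
    xbar = np.sum(h * phi)
    v_g = np.sum((h - xbar) ** 2 * phi)
    return xbar, v_g

# Illustrative inputs (assumptions, not values taken from the paper).
L = 10
g = np.array([comb(L, i) * 0.5 ** L for i in range(L + 1)])  # binomial gametes
h_additive = np.arange(2 * L + 1, dtype=float)               # h_i = i
phi = zygote_distribution(g)
xbar, v_g = mean_and_genetic_variance(g, h_additive)
print(xbar, v_g)
```

For this additive map the zygote count is binomial with 2L trials, so the mean and variance should come out close to L and L/2, which gives a quick check on the convolution.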

The distribution of i in gametes, given that one of the loci has k = 0 or 1 + alleles, is

g_{i}^{(k)} = g_{i} \frac{(\begin{matrix} L - 1 \\ i - k \end{matrix})}{(\begin{matrix} L \\ i \end{matrix})} / \sum_{i = 0}^{L} g_{i} \frac{(\begin{matrix} L - 1 \\ i - k \end{matrix})}{(\begin{matrix} L \\ i \end{matrix})}

(i = 0, …, L), where the binomial coefficients are assumed to be 0 if either of the arguments is negative. The marginal averages are then

\begin{array}{l} {\bar{x}}_{0} = \sum_{i, j = 0}^{L} g_{i}^{(0)} g_{j}^{(0)} h_{i + j}, \\ {\bar{x}}_{1} = \sum_{i, j = 0}^{L} g_{i}^{(1)} g_{j}^{(0)} h_{i + j} \end{array}

and

{\bar{x}}_{2} = \sum_{i, j = 0}^{L} g_{i}^{(1)} g_{j}^{(1)} h_{i + j} .

The marginal differences, Δ₁ and Δ₀, are obtained by subtraction. For each locus, a = (Δ₁ + Δ₀)/2 and d = (Δ₀ − Δ₁)/2. Substituting these into the expressions given in the text, V_A, V_D and V_I are obtained.
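The conditional gametic distributions and the marginal averages can be computed along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the gametic distribution is taken to be binomial with allele frequency 1/2, and the two phenotype maps (an additive one and one with a small quadratic term standing in for synergistic interactions) are illustrative choices rather than the models analysed in the text.

```python
import numpy as np
from math import comb

def conditional_gametes(g, k):
    """g_i^(k): distribution of i in gametes given k (0 or 1) + alleles at the focal locus."""
    L = len(g) - 1
    w = np.zeros(L + 1)
    for i in range(L + 1):
        # Binomial coefficients with out-of-range arguments are treated as 0.
        if 0 <= i - k <= L - 1:
            w[i] = g[i] * comb(L - 1, i - k) / comb(L, i)
    return w / w.sum()

def marginal_summary(g, h):
    """Return (x0, x1, x2, delta1, delta0, a, d) for the focal locus."""
    g = np.asarray(g, dtype=float)
    h = np.asarray(h, dtype=float)
    g0, g1 = conditional_gametes(g, 0), conditional_gametes(g, 1)
    idx = np.arange(len(g))
    H = h[np.add.outer(idx, idx)]  # H[i, j] = h_{i+j}
    x0 = g0 @ H @ g0
    x1 = g1 @ H @ g0
    x2 = g1 @ H @ g1
    delta1, delta0 = x2 - x1, x1 - x0
    a = (delta1 + delta0) / 2.0
    d = (delta0 - delta1) / 2.0
    return x0, x1, x2, delta1, delta0, a, d

# Illustrative inputs (assumptions, not values taken from the paper).
L = 10
g = np.array([comb(L, i) * 0.5 ** L for i in range(L + 1)])
n_plus = np.arange(2 * L + 1, dtype=float)
additive = marginal_summary(g, n_plus)                        # h_i = i
synergistic = marginal_summary(g, n_plus + 0.05 * n_plus**2)  # hypothetical map
print(additive[3:], synergistic[3:])
```

With the additive map the two marginal differences are equal and d is zero. With the quadratic map Δ₁ exceeds Δ₀, which shows how a curved genotype-phenotype map appears in the marginal differences.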

Appendix B. Likelihood ratio test for a change in the marginal differences

In the initial population, there are m_k individuals with k + alleles at locus 1 and n_k in the final population. In the initial population, the phenotypes of individuals with each of the three genotypes are vectors with elements x_ki (k = 0, 1, 2 and i = 1, …, m_k) and in the final population they are y_ki (k = 0, 1, 2 and i = 1, …, n_k). We assume the x’s and y’s are normally distributed with arbitrary means and variances.

In the unconstrained model, the parameters of the normal distributions for x_ki are μ_kx and V_kx and for y_ki the parameters are μ_ky and V_ky.

The likelihood expressed as a function of these means and variances is

L = \prod_{k = 0}^{2} {(\frac{1}{2 π V_{k x}})}^{m_{k} / 2} \prod_{i = 1}^{m_{k}} exp [\frac{- {(x_{k i} - μ_{k x})}^{2}}{2 V_{k x}}] {(\frac{1}{2 π V_{k y}})}^{n_{k} / 2} \times \prod_{i = 1}^{n_{k}} exp [\frac{- {(y_{k i} - μ_{k y})}^{2}}{2 V_{k y}}] .

For the unconstrained model, the maximum likelihood estimates of the means and variances are the same as for three independent normal distributions:

{\hat{μ}}_{k x} = \frac{1}{m_{k}} \sum_{i = 1}^{m_{k}} x_{k i}

with similar expressions for μ_ky, and

{\hat{V}}_{k x} = \frac{1}{m_{k}} \sum_{i = 1}^{m_{k}} {(x_{k i} - {\hat{μ}}_{k x})}^{2} .

(k = 0, 1, 2) with similar expressions for the V_ky.
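A short sketch of the unconstrained estimates follows. The function names and the toy data are hypothetical. The variance estimate divides by the sample size, as in the formula above, not by the sample size minus one, and at these estimates each group contributes −(m_k/2)[ln(2π V̂_kx) + 1] to the log likelihood.

```python
import numpy as np

def normal_mle(sample):
    """Maximum likelihood mean and variance of one normal sample."""
    sample = np.asarray(sample, dtype=float)
    mu_hat = sample.mean()
    v_hat = np.mean((sample - mu_hat) ** 2)  # divides by m_k, not m_k - 1
    return mu_hat, v_hat

def max_loglik_unconstrained(groups):
    """Maximised log likelihood summed over independent normal groups."""
    total = 0.0
    for sample in groups:
        _, v_hat = normal_mle(sample)
        total += -0.5 * len(sample) * (np.log(2.0 * np.pi * v_hat) + 1.0)
    return total

# Toy phenotypes for genotypes k = 0, 1, 2 (hypothetical values).
x = [[1.0, 2.0, 3.0], [2.0, 4.0], [3.0, 5.0, 7.0]]  # initial population
y = [[2.0, 3.0, 5.0], [4.0, 5.0], [6.0, 7.0, 9.0]]  # final population
mu_hat, v_hat = normal_mle(x[0])
ll_unconstrained = max_loglik_unconstrained(x + y)
print(mu_hat, v_hat, ll_unconstrained)
```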

The marginal differences at the beginning of the experiment are Δ₁ = μ_2x − μ_1x and Δ₀ = μ_1x − μ_0x and at the end of the experiment they are Δ′₁ = μ_2y − μ_1y and Δ′₀ = μ_1y − μ_0y. We want to test the hypothesis that Δ₁ = Δ′₁ and Δ₀ = Δ′₀. To do so, we find the maximum likelihood estimates for a model in which the variances can take any value but the means are constrained to satisfy Δ₁ = Δ′₁ and Δ₀ = Δ′₀.

For the constrained model, the six equations for the variances, given the means, are the same as for the unconstrained model. We parameterize the constrained model by setting μ_2x = μ_1x + Δ₁, μ_0x = μ_1x − Δ₀, μ_2y = μ_1y + Δ₁ and μ_0y = μ_1y − Δ₀. We find that the maximum likelihood estimates of μ_1x, μ_1y, Δ₁ and Δ₀ have to satisfy the following four equations:

\begin{array}{l} (\frac{m_{2}}{V_{2 x}} + \frac{m_{1}}{V_{1 x}} + \frac{m_{0}}{V_{0 x}}) μ_{1 x} + \frac{m_{2}}{V_{2 x}} Δ_{1} - \frac{m_{0}}{V_{0 x}} Δ_{0} = \frac{1}{V_{2 x}} \sum_{i = 1}^{m_{2}} x_{2 i} + \frac{1}{V_{1 x}} \sum_{i = 1}^{m_{1}} x_{1 i} + \frac{1}{V_{0 x}} \sum_{i = 1}^{m_{0}} x_{0 i}, \\ (\frac{n_{2}}{V_{2 y}} + \frac{n_{1}}{V_{1 y}} + \frac{n_{0}}{V_{0 y}}) μ_{1 y} + \frac{n_{2}}{V_{2 y}} Δ_{1} - \frac{n_{0}}{V_{0 y}} Δ_{0} = \frac{1}{V_{2 y}} \sum_{i = 1}^{n_{2}} y_{2 i} + \frac{1}{V_{1 y}} \sum_{i = 1}^{n_{1}} y_{1 i} + \frac{1}{V_{0 y}} \sum_{i = 1}^{n_{0}} y_{0 i}, \\ \frac{m_{2}}{V_{2 x}} μ_{1 x} + \frac{n_{2}}{V_{2 y}} μ_{1 y} + (\frac{m_{2}}{V_{2 x}} + \frac{n_{2}}{V_{2 y}}) Δ_{1} = \frac{1}{V_{2 x}} \sum_{i = 1}^{m_{2}} x_{2 i} + \frac{1}{V_{2 y}} \sum_{i = 1}^{n_{2}} y_{2 i}, \\ \frac{m_{0}}{V_{0 x}} μ_{1 x} + \frac{n_{0}}{V_{0 y}} μ_{1 y} - (\frac{m_{0}}{V_{0 x}} + \frac{n_{0}}{V_{0 y}}) Δ_{0} = \frac{1}{V_{0 x}} \sum_{i = 1}^{m_{0}} x_{0 i} + \frac{1}{V_{0 y}} \sum_{i = 1}^{n_{0}} y_{0 i}, \end{array}

where the carets have been omitted for notational convenience.

In the constrained model, the estimates of the variances depend on the means and the estimates of the μs and Δs depend on the variances. To solve all 10 equations, we used an EM algorithm that starts with the means and variances from the unconstrained model, estimates the means by solving the above four equations, re-estimates the variances from the new means, and continues until the maximum change in any of the estimates in one cycle is less than a specified small value (10⁻⁸). This procedure converged in a few iterations in all cases we ran. We then computed the logarithm of the ratio of likelihoods under the unconstrained and constrained models and computed a P-value from a χ² distribution with two degrees of freedom. We found in a simulation test of a null model that this test rejected the null hypothesis at the 5% level roughly 5% of the time.
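The iterative procedure and the test can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The function names and toy data are hypothetical, and the test statistic is assumed to be twice the log likelihood ratio, the usual convention for comparison with a χ² distribution. For two degrees of freedom the χ² upper tail probability is exp(−statistic/2), so no statistics library is needed.

```python
import numpy as np

def loglik(groups, means, variances):
    """Normal log likelihood summed over independent groups."""
    return sum(-0.5 * len(s) * np.log(2.0 * np.pi * v) - np.sum((s - mu) ** 2) / (2.0 * v)
               for s, mu, v in zip(groups, means, variances))

def lrt_marginal_differences(x, y, tol=1e-8, max_iter=1000):
    """x, y: three samples each (genotypes k = 0, 1, 2), before and after selection.

    Returns (statistic, p_value, delta1, delta0) for the constrained fit.
    """
    x = [np.asarray(s, dtype=float) for s in x]
    y = [np.asarray(s, dtype=float) for s in y]
    groups = x + y                                 # order: x0, x1, x2, y0, y1, y2
    m = np.array([len(s) for s in x], dtype=float)
    n = np.array([len(s) for s in y], dtype=float)
    sx = np.array([s.sum() for s in x])
    sy = np.array([s.sum() for s in y])

    # Unconstrained fit: six independent normal distributions.
    mu_u = np.array([s.mean() for s in groups])
    v_u = np.array([s.var() for s in groups])      # ddof=0, the ML estimate
    ll_u = loglik(groups, mu_u, v_u)

    # Constrained fit: alternate between the four linear equations and the variances.
    v = v_u.copy()
    theta = np.array([mu_u[1], mu_u[4], 0.0, 0.0])  # mu_1x, mu_1y, Delta_1, Delta_0
    for _ in range(max_iter):
        wx, wy = m / v[:3], n / v[3:]
        A = np.array([[wx.sum(), 0.0, wx[2], -wx[0]],
                      [0.0, wy.sum(), wy[2], -wy[0]],
                      [wx[2], wy[2], wx[2] + wy[2], 0.0],
                      [wx[0], wy[0], 0.0, -(wx[0] + wy[0])]])
        b = np.array([np.sum(sx / v[:3]),
                      np.sum(sy / v[3:]),
                      sx[2] / v[2] + sy[2] / v[5],
                      sx[0] / v[0] + sy[0] / v[3]])
        new_theta = np.linalg.solve(A, b)
        m1x, m1y, d1, d0 = new_theta
        mu_c = np.array([m1x - d0, m1x, m1x + d1, m1y - d0, m1y, m1y + d1])
        new_v = np.array([np.mean((s - mu) ** 2) for s, mu in zip(groups, mu_c)])
        change = max(np.max(np.abs(new_theta - theta)), np.max(np.abs(new_v - v)))
        theta, v = new_theta, new_v
        if change < tol:
            break

    stat = max(2.0 * (ll_u - loglik(groups, mu_c, v)), 0.0)  # guard against rounding
    return stat, float(np.exp(-stat / 2.0)), theta[2], theta[3]

# Toy data (hypothetical): identical marginal differences, then a shifted Delta_1.
z = np.linspace(-1.0, 1.0, 50)
x = [z + 0.0, z + 1.0, z + 2.0]
stat_null, p_null, _, _ = lrt_marginal_differences(x, [z + 5.0, z + 6.0, z + 7.0])
stat_alt, p_alt, _, _ = lrt_marginal_differences(x, [z + 5.0, z + 6.0, z + 9.0])
print(stat_null, p_null, stat_alt, p_alt)
```

In the first toy data set the marginal differences are the same before and after selection, so the constrained and unconstrained fits coincide and the statistic is essentially zero. In the second, Δ₁ changes from 1 to 3 and the test rejects strongly.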

References

Barton NH. On the spread of new gene combinations in the 3rd phase of Wright’s shifting-balance. Evolution. 1992;46:551–557. doi: 10.1111/j.1558-5646.1992.tb02058.x.
Barton NH, Shpak M. The stability of symmetric solutions to polygenic models. Theoretical Population Biology. 2000;57:249–263. doi: 10.1006/tpbi.2000.1455.
Barton NH, Turelli M. Effects of genetic drift on variance components under a general model of epistasis. Evolution. 2004;58:2111–2132. doi: 10.1111/j.0014-3820.2004.tb01591.x.
Burke MK, Dunham JP, Shahrestani P, Thornton KR, Rose MR, Long AD. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature. 2010;467:587–590. doi: 10.1038/nature09352.
Cockerham CC. An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics. 1954;39:859–882. doi: 10.1093/genetics/39.6.859.
de Visser JAGM, Elena SF. The evolution of sex: empirical insights into the roles of epistasis and drift. Nature Reviews Genetics. 2007;8:139–149. doi: 10.1038/nrg1985.
Elena SF, Lenski RE. Test of synergistic interactions among deleterious mutations in bacteria. Nature. 1997;390:395–398. doi: 10.1038/37108.
Eyre-Walker A, Keightley PD. The distribution of fitness effects of new mutations. Nature Reviews Genetics. 2007;8:610–618. doi: 10.1038/nrg2146.
Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. Longman; Essex: 1996.
Flint J, Mackay TFC. Genetic architecture of quantitative traits in mice, flies, and humans. Genome Research. 2009;19:723–733. doi: 10.1101/gr.086660.108.
Hansen TF, Wagner GP. Modeling genetic architecture: a multilinear theory of gene interaction. Theoretical Population Biology. 2001;59:61–86. doi: 10.1006/tpbi.2000.1508.
Hill WG, Goddard ME, Visscher PM. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genetics. 2008;4:e1000008. doi: 10.1371/journal.pgen.1000008.
Kacser H, Burns JA. The molecular basis of dominance. Genetics. 1981;97:639–666. doi: 10.1093/genetics/97.3-4.639.
Keightley PD. Models of quantitative variation of flux in metabolic pathways. Genetics. 1989;121:869–876. doi: 10.1093/genetics/121.4.869.
Kempthorne O. The correlation between relatives in a random mating population. Proceedings of the Royal Society of London Series B. Biological Sciences. 1954;143:103–113.
Kouyos RD, Silander OK, Bonhoeffer S. Epistasis between deleterious mutations and the evolution of recombination. Trends in Ecology and Evolution. 2007;22:308–315. doi: 10.1016/j.tree.2007.02.014.
Le Rouzic A, Skaug HJ, Hansen TF. Estimating genetic architectures from artificial-selection responses: a random-effect framework. Theoretical Population Biology. 2010;77:119–130. doi: 10.1016/j.tpb.2009.12.003.
Lewontin RC, Kojima K. The evolutionary dynamics of complex polymorphisms. Evolution. 1960;14:458–472.
Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits. Sinauer Associates; Sunderland, MA: 1998.
Manichaikul A, Moon JY, Sen S, Yandell BS, Broman KW. A model selection approach for the identification of quantitative trait loci in experimental crosses, allowing epistasis. Genetics. 2010;184:607–607. doi: 10.1534/genetics.108.094565.
Otto SP. The evolutionary enigma of sex. American Naturalist. 2009;174:S1–S14. doi: 10.1086/599084.
Pavlicev M, Le Rouzic A, Cheverud JM, Wagner GP, Hansen TF. Directionality of epistasis in a murine intercross population. Genetics. 2010;185:1489–1505. doi: 10.1534/genetics.110.118356.
Phillips PC. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nature Reviews Genetics. 2008;9:855–867. doi: 10.1038/nrg2452.

