Abstract
Many approaches have been developed for inferring selection coefficients from time series data while accounting for genetic drift. These approaches have been motivated by the intuition that properly accounting for the population size history can significantly improve estimates of selective strengths. However, the improvement in inference accuracy that can be attained by modeling drift has not been characterized. Here, by comparing maximum likelihood estimates of selection coefficients that account for the true population size history with estimates that ignore drift by assuming allele frequencies evolve deterministically in a population of infinite size, we address the following questions: how much can modeling the population size history improve estimates of selection coefficients? How much can mis-inferred population sizes hurt inferences of selection coefficients? We conduct our analysis under the discrete Wright–Fisher model by deriving the exact probability of an allele frequency trajectory in a population of time-varying size and we replicate our results under the diffusion model. For both models, we find that ignoring drift leads to estimates of selection coefficients that are nearly as accurate as estimates that account for the true population history, even when population sizes are small and drift is high. This result is of interest because inference methods that ignore drift are widely used in evolutionary studies and can be many orders of magnitude faster than methods that account for population sizes.
Keywords: selection, inference, time series, diffusion, Wright–Fisher
Introduction
Methods for inferring the selection coefficient at a single genetic locus from time series data have been employed extensively in evolutionary studies of simple traits. Such methods track the frequency of an allele or Mendelian trait over multiple generations and infer the selection coefficient that best explains the observed frequency changes. Studies of selective pressures conducted using time series approaches have provided evidence for selective forces in natural populations and have helped to characterize the ways in which environmental factors influence evolution through selection (Fisher and Ford 1947; Clarke and Murray 1962; Wall et al. 1980; Lynch 1987; Stine and Smith 1990; Goudsmit et al. 1996; Harrigan et al. 1998; Cook et al. 1999; Haubruge and Arnaud 2001; Bonhoeffer et al. 2002; Reimchen and Nosil 2002; Cook et al. 2005; Labbé et al. 2009).
Because random fluctuations in allele frequencies due to genetic drift are often small compared with changes due to selective pressures, it is common practice for studies to assume that allele frequencies change deterministically over time as they would in a population of infinite size according to well-known deterministic formulas of (Fisher 1922, p. 424) and (Haldane 1927, p. 840) or related expressions (Gillespie 1998; Hartl and Clark 2007). However, because allele frequency trajectories can be heavily influenced by genetic drift when population sizes or selection coefficients are small, many methods have been developed to account for drift by explicitly modeling finite population sizes when inferring selection coefficients from observed allele frequency trajectories (Manly 1985; O’Hara 2005; Bollback et al. 2008; Malaspinas et al. 2012; Mathieson and McVean 2013; Lacerda and Seoighe 2014; Steinrücken et al. 2014; Foll et al. 2015; Ferrer-Admetlla et al. 2015; Schraiber et al. 2016) and when testing hypotheses about selection versus drift (Fisher and Ford 1947; Schaffer et al. 1977; Wilson 1980; Nishino 2013; Feder et al. 2014).
Although it is commonly assumed that estimates of selection coefficients are likely to be improved by accounting for population size histories, the expected amount of improvement is not well characterized. Even in relatively small populations, allele frequencies and other evolutionary processes behave almost deterministically if the selection coefficient or allele frequency is sufficiently high (Rouzine et al. 2001), suggesting that methods that ignore drift might perform well under these conditions. Conversely, if drift is strong allele frequency trajectories can be noisy and the accuracy of methods that ignore drift may be comparable to that of methods that account for population size, as all methods are likely to perform poorly under these conditions (Gallet et al. 2012).
If computationally fast methods that ignore drift are accurate, they could dramatically reduce the time required to infer selection coefficients in data sets with many loci. In addition to their computational efficiency, methods that ignore drift by assuming that the population size is infinite do not require estimates of effective population sizes, which can be difficult to obtain accurately. Moreover, ignoring drift can lead to simple formulas and inference procedures under complicated scenarios involving multiple populations with migration or multiple loci with recombination (Illingworth et al. 2012). Therefore, in light of the beneficial properties of methods that ignore drift and assume deterministic allele frequency trajectories, it is of interest to compare their accuracy to that of methods that account for population size histories.
The theoretical accuracy of methods for inferring selection coefficients can be difficult to derive analytically. Thus, to explore differences between methods that ignore or account for drift, one can take the approach of empirically comparing inferences made by estimators that either account for the true population size history or ignore the size history by assuming that populations are large and drift is negligible. This is the approach we take here. For our analyses, we consider maximum likelihood estimators of selection coefficients because they are typically quite accurate and have desirable statistical properties. Moreover, the majority of recently developed methods for inferring selection coefficients from time series data are maximum likelihood estimators, making them an important category of methods to evaluate.
To draw conclusions about the accuracy of maximum likelihood estimators, it is important to consider estimators based on exact likelihoods rather than approximations, so that differences in estimates can be attributed entirely to whether a method ignores or accounts for drift. Although several approximate approaches have been developed for computing the likelihood of a selection model given time series allele frequency data, only three existing methods compute probabilities that are exact under a widely accepted model. In particular, the methods of Bollback et al. (2008) and Steinrücken et al. (2014) compute exact probabilities under the diffusion approximation of the Wright–Fisher process. However, these methods do not model time-varying population size histories. The third inference method based on an exact likelihood considers time-varying population size histories under the diffusion approximation of the Wright–Fisher process (Schraiber et al. 2016); however, it uses an MCMC algorithm to perform Bayesian inference that is not easily incorporated into a unified inference algorithm that allows us to directly compare inferences made by estimators that model the true population history with those that assume a population of infinite size without drift. No existing method computes the exact probability of an allele frequency trajectory under the discrete Wright–Fisher model, as the matrix powers required for such a method are considered to be computationally inefficient.
Here, we derive the exact probability of an allele frequency trajectory in a population of piecewise constant size under two classical models: the discrete Wright–Fisher model and the diffusion approximation of the Wright–Fisher process. We then use maximum likelihood estimators obtained using these probabilities to explore how ignoring or accounting for the true population history affects estimates of selection coefficients.
Our results are useful for understanding when point estimates obtained using estimators based on deterministic allele frequency trajectories are likely to be accurate and when accounting for the true population history could improve these estimates. Our results have implications for the interpretation of existing estimates of selection coefficients and for the use of deterministic estimators in future studies. The results can also help guide the development of demography-aware estimators of selection coefficients by identifying scenarios under which such estimators are likely to improve inference accuracy.
Results
To compare the performance of estimators that ignore or account for drift, we inferred selection coefficients from allele frequency trajectories simulated under a variety of population histories of time-varying size.
The Population Model
In all of our analyses, we considered a single biallelic locus with alleles labeled a and A evolving under selection and recurrent mutation in a panmictic population comprised of L different epochs , each with constant size diploid individuals (fig. 1). Epoch ℓ corresponds to the time interval , where time is measured continuously in units of generations and we define . By varying the population sizes across epochs, it is possible to model a variety of size-change patterns including exponential growth, bottlenecks, and rapidly oscillating population sizes.
Within epoch , all mutation and selection parameters are assumed to be constant. In particular, we assume that the per-generation probability that allele a mutates to allele A is and the per-generation probability that allele A to a is . The three possible genotypes, aa, aA, and AA, have relative fitnesses given by , and in epoch , where is the selection coefficient and is the dominance parameter.
We denote the collection of model parameters in epoch by and the set of parameters across all epochs by Θ. It will also be convenient to denote the value of the model parameters at time t by Nt, , st, and ht, where t is measured continuously in units of generations. The epoch in which time t lies will be denoted by and the epoch in which sampling event k lies will be denoted by . It will be clear from the context whether the subscript on refers to a time or a sampling event.
We denote the population-wide number of copies of allele A in generation t by ct and the population-wide frequency of allele A by yt. In practice, we do not observe the true population count of allele A. Instead, the data consist of observed counts of the number of times allele A is observed in K different samples of sizes haplotypes, taken at times . For simplicity, we assume that each sampling time tk is an integer for . The consecutive observed counts will be denoted by .
In general, we will denote random variables corresponding to observed quantities using capital letters (e.g., Ok, Ct, and Yt). The goal is to compute the probability of the observed data, conditional on the model parameters Θ.
Probabilities of Frequency Trajectories
Several different evolutionary models can be used to describe stochastic allele frequency changes over time in a population. Discrete changes in allele frequency are often modeled using the Wright–Fisher and Moran processes, whereas continuous changes are often modeled using the diffusion approximation of the Wright–Fisher process (Karlin and Taylor 1981; Ewens 2004; Wakeley 2008) or one of several approximations of the diffusion (Feder et al. 2014; Lacerda and Seoighe 2014).
Because it is unclear which model provides the most accurate description of biological evolutionary processes, we take the approach in this paper of deriving exact probabilities of allele frequency trajectories under two different evolutionary models: the discrete Wright–Fisher process and the continuous diffusion approximation.
Under the Wright–Fisher model, the probability of the observed allele counts can be obtained using the recursive formula presented in Procedure 1. Under the diffusion approximation, the probability can be obtained using the recursive formula presented in Procedure 2.
In the “Deterministic allele frequency trajectories under the Wright–Fisher model” and “Deterministic allele frequency trajectories under the diffusion model” sections, we show that if drift is ignored and allele frequencies evolve deterministically, then the probabilities and can be reduced to the simpler approximate probabilities and which ignore the population history and which are computed using Procedures 3 and 4, respectively.
Different estimates of the model parameters Θ can be obtained using each of the different probabilities , and by finding the value of Θ that maximizes the given probability of the observed allele counts . In our analyses, we estimated the model parameters Θ separately using each of the different probabilities, yielding the estimators , and . The estimator accounts for drift under the discrete Wright–Fisher model, whereas drift in this model is ignored by the estimator . Similarly, the estimator accounts for drift under the diffusion model, whereas drift in this model is ignored by the estimator .
The degree to which accounting for drift can improve estimates of selection coefficients can be investigated by comparing to on trajectories simulated under the discrete Wright–Fisher model and by comparing to on trajectories simulated under the diffusion approximation.
1: Define the quantities and , where with .
2: Initialize .
3: For , compute
4: Compute
______________________________________________________________________________________________
Modifications:
If is unspecified, omit Step 1and set in Step 2.
If conditioning on the event SK that allele A is segregating in the final sample, omit Step 4 and instead compute using equation(C.1).
1: For an initial starting frequency y0 initialize
where is the vector of eigenfunctions of the diffusion operator given inequation (A.14) and is given in equation (A.18).
2: For , compute
and
where the matrices , and are given byequations (A.17), (B.10), (A.15), and (A.11), respectively, and ζ is the set of Chebyshev nodes in the interval . The matrix inverse is computed easily using the diagonal matrices and in equations (A.18) and (A.19).
3: Compute
(1) |
where is the (0, 0) element of matrix in equation (A.18) and is the 0th element of the vector in equation (A.14).
______________________________________________________________________________________________
Modifications:
If y0 is unspecified, omit Step 1 and initialize using equation (B.17). Then iterate over in Step 2.
If conditioning on the event SK that allele A is segregating in the final sample, omit Step 3 and instead compute using equation (D.1).
1: Starting with , for ,
Compute .
Compute
where .
2: Compute
______________________________________________________________________________________________
Modifications:
If y0 is unspecified, set for a large value M and repeat Steps 1 and 2 for the dense uniform grid of M + 1 values . Set .
1: Fix a large integer n and set .
2: Starting with , for , compute
3: Compute
______________________________________________________________________________________________
Modifications:
If y0 is unspecified, set for a large value M and repeat Steps 1 and 2 for the dense uniform grid of M + 1 values . Set .
Overview of the Experimental Design
We simulated allele frequency trajectories under a variety of selection strengths and piecewise constant population histories reflecting demographic patterns such as exponential growth, bottlenecks, rapid population size oscillations, and constant histories. We then compared the demography-aware estimates and with the estimates and that ignore drift to study the degree to which accounting for population size can improve the accuracy of inferences.
Expected Allele Frequency Trajectories
Before comparing the accuracy of the different estimators, we first explored the degree to which trajectories that ignore drift differ from trajectories that account for drift resulting from finite population sizes. Figure 2 shows the expected frequency of allele A in a discrete Wright–Fisher population of constant size for several different initial allele frequencies, selection coefficients, and effective population sizes. Figure 2 illustrates that, for any starting frequency and selection coefficient, the mean allele frequency trajectory approaches the mean trajectory in a population without drift (i.e., in a population of infinite size), as the true population size increases. Moreover, if the initial frequency is sufficiently high, the expected trajectory is close to its deterministic limit even when the population size is small and drift is high.
From figure 2, it can be seen that an effective population size of several thousand individuals is often sufficiently large to produce deterministic behavior, even when the selection coefficient and initial allele frequency are small. Thus, selection coefficient inference methods that ignore drift are likely to be accurate for a broad range of population sizes and selection coefficients. As we will see, methods that ignore drift can be almost as accurate as methods that account for drift, even within the small-parameter-value regime.
Inference Accuracy for Populations of Constant Size
To explore how accounting for drift affects inference accuracy, we first considered the accuracy of inferring selection coefficients in a population of constant finite size. Figure 3 shows the maximum likelihood estimate (MLE) of the selection coefficient for three different effective population sizes (N = 100, 500, 1000), three selection coefficients , and two initial allele frequencies for h = ½. In each of panels A-R, the violin plots summarize the maximum likelihood estimates for 100 different simulation replicates in which an allele frequency trajectory was simulated for 500 generations with samples of size n = 50 taken at generations 50, 100, 150, 200, 250, 300, 350, 400, 450, and 500.
For the discrete Wright–Fisher model, allele frequency trajectories were simulated by sampling the allele frequency in each generation from the vector of transition probabilities, conditional on the frequency in the previous generation using Procedure 5. Under the diffusion model, trajectories were sampled using the approach in Procedure 6. Maximum likelihood estimates were obtained for the Wright–Fisher trajectories using the grid search described in Procedure 7 over the likelihoods computed using Procedures 1 and 3, and maximum likelihood estimates for the diffusion trajectories were obtained using the same grid search approach over the likelihoods computed using Procedures 2 and 4. In each panel in figure 3, the estimates and were computed for the same set of 100 allele frequency trajectories simulated under the discrete Wright–Fisher model and the estimates and were computed for the same set of 100 allele frequency trajectories simulated under the diffusion model.
By comparing the estimates computed accounting for drift with the estimates obtained ignoring drift, it can be seen that all methods have similar accuracies. All methods perform well when the population size, selection coefficient, and initial frequency are sufficiently large (e.g., fig. 3I for the case and Panels 3K through 3R for the case ), and all methods have reduced accuracy, otherwise. To put this another way: the allele frequency trajectories for which selection coefficients are inferred accurately by demography-aware methods correspond to those for which the deterministic estimates are also accurate. Thus, methods that ignore or account for drift are likely to produce estimates with similar accuracy.
Moreover, it can be seen from the scatter plots (Panels 3S–3X), which compare the estimators and , that the point estimates themselves are very similar for both the demography-aware and deterministic methods. Although this similarity may be expected given that the deterministic methods differ from the demography-aware methods only in that they ignore the additional variability in the allele frequencies arising from genetic drift, it is surprising that the point estimates are so similar, as the overall expected allele frequency trajectory in the deterministic case can differ considerably from the expected trajectory accounting for drift (fig. 2A–C).
As the magnitude of the selection coefficient decreases, the point estimates of the deterministic and demography-aware estimators remain well correlated, although the accuracy of all methods decreases. This can be seen in figure 3S–X and in supplementary fig. S1, Supplementary Material online in which the estimates by the different methods remain correlated, but become more variable as the selection coefficient s decreases and an increasingly large number of trajectories drift out of the population quickly, leading to strong negative estimates of selection coefficients by both methods.
Inference Accuracy in Populations of Piecewise Constant Size
We next explored the degree to which accounting for more complicated population histories can improve maximum likelihood estimates, focusing on three scenarios, a population with a bottleneck, a population undergoing exponential growth, and a population undergoing rapid oscillations in size. Under each scenario, we simulated 100 allele frequency trajectories for an allele with selection coefficient s = 0.05, dominance parameter h = ½, and initial frequency under the Wright–Fisher model and separately under the diffusion model. The parameter values in these simulations were chosen so that drift would be strong enough to affect allele frequency trajectories, but not strong enough to result in poor estimates of selection coefficients by the full-likelihood methods.
In addition to comparing estimates made by the deterministic estimators, and , with those of the exact estimators, and , that account for the true time-varying population history, we investigated the effect on accuracy of using crude, yet reasonable estimates of the population history. In particular, we also inferred selection coefficients using likelihoods computed using variants of Procedures 1 and 2 in which the population was assumed to consist of a single epoch of constant size equal to the Watterson estimate (Watterson 1975, Eqn. 1.4a; Hein et al. 2005, p.62). The Watterson estimate was obtained by computing the expected site frequency spectrum (SFS) for the multi-epoch model for a sample size of 20 alleles using the method of Kamm et al. (2016) and then inferring the effective size of a single epoch using Watterson’s estimator (Computing the Watterson estimator of Ne from the expected SFS of a piecewise constant population). The discrete Wright–Fisher and diffusion estimators based on the Watterson estimate of effective size are denoted by and , respectively.
The Case of a Bottleneck
To model populations with bottlenecks, we considered populations composed of three epochs, each of length 100 generations, with sizes N1, N2, and N3 satisfying . Samples of size 50 were taken at times , and 300. Figure 4A and B shows the results for two different populations; in the population in figure 4A, we set and in the population in figure 4B we set .
From figure 4A and B, it can be seen that all methods performed similarly. However, the deterministic estimators had significantly lower bias than the full-likelihood estimators computed using the mis-specified population history for a bottleneck of size with (fig. 4A). In the case of the bottleneck in figure 4A, the means of the deterministic estimators and were not significantly different from the true selection coefficient s (P = 0.59 and P = 0.93 for t-tests of the null hypotheses and versus the alternative hypotheses and ). In comparison, the means of the full likelihood estimators and with mis-specified histories were significantly different from s (P = 0.03 and P < 0.01, respectively). A similar trend was observed for the bottleneck history with larger sizes shown in figure 4B (P = 0.64 and P = 0.12 for t-tests of the null hypotheses and , compared with P = 0.13 and P = 0.01 for t-tests of the null hypotheses and ). Note that, despite the tight bottleneck in figure 4A, inferences were still relatively accurate due to the larger sizes of epochs 1 and 3.
The Case of Exponential Growth
To model exponential growth, we considered populations composed of five epochs, each of length 100 generations, with effective population sizes chosen to represent 5-fold exponential growth across all five epochs. Specifically, the size in epoch ℓ was set to , where we chose and the growth constant η was chosen such that . Samples of size 50 were taken in generations 100, 200, 300, 400, and 500. From the results in figure 4C and D, it can be seen that all methods performed with similar accuracy in the growth scenario.
The Case of Rapidly Oscillating Population Size
Figure 4E and F shows inferences of the selection coefficient in a population with rapidly oscillating size. Such demographic histories, which are often seen in insect populations like Drosophila, have moderate arithmetic mean sizes, but small harmonic mean sizes and experience episodes of extreme drift.
In the simulations shown in figure 4E, the population size oscillates rapidly between 10 and 100 diploids every five generations. In the simulations shown in figure 4F, the population size oscillates between 100 and 500 diploids every five generations. From figure 4, it can be seen that the methods that ignore drift have similar accuracy to the methods that account for drift. However, the methods that account for drift are slightly less biased when the population size oscillates between very small values (fig. 4E) (P = 0.017 and P = 0.012, respectively, for the one-tailed t-tests of the null hypotheses and versus the alternative hypotheses and ).
Conditioning on Segregation in the Final Sample
It is sometimes of interest to infer the selection coefficient of an allele, conditional on the event that the allele is segregating in the most recent sample. Such conditional inferences are useful if alleles are ascertained in present-day samples and their historical trajectories are subsequently investigated.
Conditioning on segregation in the final sample is also useful for estimating weak positive selection coefficients when initial allele frequencies are low. This is because a large fraction of weakly selected alleles with low initial frequencies will drift out of the population quickly resulting in large negative estimates of their selection coefficients. However, more accurate estimates can be obtained for the subset of alleles that are not lost quickly, which can be seen, for example, in figure 3B–I through I in which the part of the density corresponding to alleles that are not lost quickly from the population is localized around the true selection coefficient.
Considering only alleles that are segregating in the final sample can lead to biased estimates of selection coefficients if likelihood methods do not properly condition on segregation. For example, weakly selected alleles typically drift out of small populations quickly. Thus, weakly selected alleles that escape loss and ultimately fix generally exhibit faster-than-expected increases in frequency that are similar to the unconditional trajectories of alleles under stronger selection. Thus, if a likelihood method does not properly account for conditioning, weakly selected alleles that are segregating in the final sample will have inflated inferred selection coefficients.
Estimators that ignore drift cannot be modified to condition on the event of segregation in the final sample because they implicitly assume that alleles follow fixed trajectories whose long-term behavior in the absence of mutation is entirely determined by the selection coefficient: fixation for positively selected alleles and loss for negatively selected alleles. Thus, estimators that ignore drift are expected to produce biased estimates of selection coefficients when applied to conditioned trajectories.
In contrast, the allele frequency trajectories in likelihood methods that account for the population size are modeled stochastically, allowing likelihoods to be modified to condition on segregation in the final sample. It is expected that methods that account for the true population size can be modified to produce accurate estimates of selection coefficients, whereas methods that ignore drift will necessarily produce biased estimates.
Simulations Conditioning on Segregation
To investigate the degree to which accounting for drift can improve estimates of selection coefficients when allele frequency trajectories are conditioned on segregation in the final sample, we modified the discrete Wright–Fisher probability in Procedure 1 to compute the likelihood conditional on segregation in the final sample using results derived in the “Conditional Probabilities” section. Under a grid search, this modified likelihood yields the conditional maximum likelihood estimator . We compared the estimates computed using the exact conditional estimator with estimates computed using the approximate estimator that ignores drift and cannot be modified to account for conditional allele frequency trajectories.
The effect of failing to account for conditioning is evident in the violin plots in figure 5A–I corresponding to the unconditional approximate maximum likelihood estimates . As expected, when the true selection coefficient is small (), the estimates are biased upward. Conversely, when the selection coefficient is larger (), the approximate estimator produces negatively biased estimates because alleles under strong positive selection that remain segregating in the final sample show slower-than-expected increases in frequency. In contrast to the estimator , the bias is negligible in the estimator , which accounts for drift and properly conditions on segregation in the final sample.
The results shown in figure 5A–I suggest that methods that account for drift are capable of significantly improving the accuracy of estimates of selection coefficients when allele frequency trajectories are conditioned on segregation. The differences in accuracy between methods that ignore or account for drift are visible for a range of selection coefficients and population sizes. However, the differences in accuracy between the methods diminish as the population size becomes larger.
Simulations Conditioning on Segregation or Fixation
The magnitude of the bias in the estimates is due in part to the event on which trajectories are conditioned. In cases involving positive selection in populations of moderate or large size, most alleles will be fixed in the final sample (e.g., fixation within 10 generations when s = 0.1, h = ½, , and N = 1000). Thus, it may sometimes be more natural to condition on the event FK that a selected allele is found (segregating or fixed) in the final sample. Under this conditioning scheme, the approximate estimator will not generally produce negatively biased estimates of selection coefficients because allele frequency trajectories will not be constrained to those which exhibit slower-than-expected increases in allele frequency.
In light of these considerations, we repeated the analysis shown in figure 5A–I, simulating allele frequency trajectories conditional on the event that the allele was segregating or fixed in the final sample. To compare the estimates with maximum likelihood estimates that fully account for drift and the proper conditioning, we also modified the probability computed in Procedure 1 to condition on the event FK of segregation or fixation in the final sample, yielding the conditional probability (eq. 21) with the associated estimator .
By comparing figure 5J–R with figure 5A–I, it can be seen that the estimator has considerably less bias when conditioning on the event FK than when conditioning on SK. Although the bias is still high when the population size is small (), it decreases quickly as the population size increases and becomes comparable to the bias in the properly conditioned, demography-aware estimator when the population size is greater than approximately N = 500 diploids. In contrast to figure 5E–I, the bias in observed in figure 5M–R is positive because the trajectories on which these estimates are based exclude only those in which the allele is lost; thus, they exhibit faster-than-expected growth on average. The results in figure 5J–R suggest that under certain conditioning schemes, methods that ignore drift can produce similar estimates to methods that account for drift.
The Effect of Sample Size on Accuracy
When the sample size is small, the variance in estimates arising from sampling noise will tend to obscure small differences between estimators that ignore or account for population size. Thus, when comparing methods, it is important to sample a sufficiently large number of alleles to ensure that the differences between the methods due to ignoring or accounting for drift are visible.
To evaluate the effects of sample size on inference accuracy, we inferred the selection coefficient for a range of sample sizes for several different combinations of the population size and selection coefficient. Figure 6 shows a plot of the variance in selection coefficients inferred using Procedures 1 and 3 for sample sizes ranging from n = 2 to n = 50. For each combination of Ne, s, and n, the trajectories of 100 alleles were simulated under the Wright–Fisher process with an initial allele frequency of . Samples were taken in generations 50 and 100.
The plots in figure 6 suggest that variability due to small sample sizes has a strong effect on the variability in estimates only for sample sizes smaller than 10 alleles. Thus, in all of our simulations, we have used a sample size of n = 50 alleles so that differences between estimators are not likely to be obscured by the variance in estimates due to small sample sizes.
Unspecified Initial Allele Frequencies
In our simulations, we have assumed that the initial frequency y0 of allele A at time t = 0 is known. Knowledge of the initial allele frequency provides approximately the same information as an informative sample taken at time t = 0. Although this information can improve the estimate of the selection coefficient if few samples are taken, it contributes relatively little information to the estimate of s when the number of samples is moderate. However, because a lack of knowledge of the initial frequency could potentially lead to greater errors in the deterministic estimators, we also compared versions of the estimators that assume a uniform prior distribution on the allele frequency at the time of the first sampling event, and which do not incorporate information about the initial allele frequency (Computing When Is Unspecified, Computing When y0 Is Unspecified, and Deterministic Estimators When the Initial Allele Frequency Is Unknown).
Figure 7 shows a comparison between the deterministic and exact Wright–Fisher estimates when the initial frequency is drawn uniformly from the interval and the selection coefficient is inferred using Procedures 1 and 2 assuming no knowledge of the initial allele frequency. From figure 7, it can be seen that the deterministic and exact estimators produce similar results, even in the absence of knowledge of the initial frequency. Moreover, although the estimates produced by the two methods decrease in accuracy as the number of sampling events decreases, the accuracies of the two kinds of methods remain similar. In particular, From figure 7D–F, it can be seen that both the deterministic and exact methods are relatively accurate when allele frequency trajectories do not drift out or fix immediately due to random fluctuations.
Violation of Model Assumptions
It is possible that violations of model assumptions in real data could increase the differences in accuracy between the deterministic and demography-aware estimators. To investigate this possibility we compared the performance of the deterministic and demography-aware discrete Wright–Fisher estimators using experimentally sampled allele frequency time series data that are thought to violate several model assumptions. In particular, we considered time series data for an allele at the medionigra locus in the species Panaxia dominula, which confers a darkened wing phenotype (Cook and Jones 1996). Although these samples are thought to represent a single population that is isolated from migration events, there is evidence for temporal fluctuations in the selection coefficient, frequency dependent selection, and assortative mating. These data are also useful for our analyses because they include estimates of population sizes obtained using mark, release, and recapture. The data set, which spans over 40 years, provided important evidence for natural selection resulting from environmental pressures.
We computed the log likelihood as a function of the selection coefficient s for three different values of the dominance parameter (, and 1). Figure 8 shows the log likelihood for different values of h, along with asymptotic normal approximations of 95% confidence intervals. Figure 8 shows that the log likelihood surfaces computed using Procedures 1 and 3 are qualitatively similar, yielding similar point estimates for s. However, the confidence intervals for the deterministic estimator are considerably smaller than those of the demography-aware estimator. This result is expected, given that the deterministic estimator ignores the largely symmetrical variability arising from drift and considers only variability in the sampling frequency.
Note that the point estimates presented in figure 8 are similar to those estimated previously in other studies. Using a model of additive selection (h = 0.5) Cook and Jones (1996) estimated whereas Mathieson and McVean (2013) inferred . These estimates are close to the maximum likelihood estimates that we inferred under the same model of additive selection ( and ) (note that the estimates in Cook and Jones (1996) and Mathieson and McVean (2013) are reported using a different parameterization of the dominance model than the one we have used in this paper; thus, we have scaled the selection coefficients reported in these papers so that they are comparable with the ones reported here).
It is of interest to note that Mathieson and McVean (2013) found evidence that the medionigra allele is recessive, finding that the likelihood was maximized for a recessive model with a strong negative selection coefficient around , although they note that such a strong negative selection coefficient violates the assumptions of the Gaussian model under which their likelihoods were derived. In accordance with the result of Mathieson and McVean (2013), we find that both the deterministic and exact Wright–Fisher likelihoods we considered were also maximized for a recessive pattern of dominance (h = 0) with a large negative selection coefficient around .
Under a recessive model of dominance, the strongly negative selection coefficient that was inferred by the estimators in this study and by that of Mathieson and McVean (2013) is reasonable given that the allele frequency decreases rapidly, which would be unlikely under weak selection where alleles must combine in homozygotes for selection to act. Unlike the analyses of Mathieson and McVean (2013), a selection coefficient of does not violate our model assumptions. Thus, our analysis provides further support that the medionigra variant is largely recessive, although there is evidence that the dominance of the variant can change over time (Cook and Jones 1996).
Computational Efficiency of Methods
As we have noted, methods that assume that allele frequency trajectories are deterministic can be considerably faster than methods that account for population size histories. Table 1 shows the average runtimes of the estimators , and for the computations used to produce figure 3A–I.
Table 1.
Ne | s | ||||
---|---|---|---|---|---|
100 | 0.01 | 0.01 | 2.30 | 4.74 | 197.25 |
0.05 | 0.01 | 2.36 | 4.23 | 204.66 | |
0.1 | 0.01 | 2.29 | 3.98 | 217.34 | |
500 | 0.01 | 0.02 | 134.07 | 4.41 | 185.18 |
0.05 | 0.01 | 132.45 | 4.41 | 496.83 | |
0.1 | 0.02 | 126.35 | 4.46 | 531.15 | |
1000 | 0.01 | 0.02 | 175.27 | 4.64 | 196.90 |
0.05 | 0.02 | 191.53 | 4.78 | 815.27 | |
0.1 | 0.02 | 199.32 | 4.67 | 1950.59 |
From the table, it can be seen that the runtimes are considerably faster for the estimators based on deterministic trajectories ( and ). Moreover, the runtimes for and do not depend on the population size or selection coefficient. In comparison, the runtimes for the estimators and increase with increasing Ne and s because these methods depend on eigenvalue decompositions or sparse matrix products, which require larger matrices or greater precision when Ne or s is large. Note that although the discrete Wright–Fisher estimator is considerably faster than the diffusion estimator for the scenarios we considered, the diffusion estimator can still be more efficient when samples are widely separated in time and the repeated matrix–vector products required by the discrete Wright–Fisher method become cumbersome.
Discussion
The results of our analyses suggest that accurate estimates of selection coefficients from allele frequency time series data can often be obtained by assuming that alleles evolve without drift in a population of infinite size. In the majority of our simulations, the estimates obtained using deterministic approximations were nearly as accurate as estimates obtained by explicitly modeling the true population history and they were sometimes more accurate than estimates obtained using crude but reasonable estimates of the population history. The latter result indicates that an overestimate of the population size, for example due to underestimation of the mutation rate or other factors, may have only small adverse effects on the accuracy of exact likelihood estimates of the selection coefficient from time series data, a result that is of interest because population size histories can be difficult to infer accurately.
Surprisingly, estimates made under the deterministic approximation were generally as accurate as estimates that accounted for drift, due to the fact that the exact maximum likelihood methods had low accuracy when drift was strong. Accounting for the true population history only resulted in significantly improved estimates of selection coefficients when conditioning on the event that the target allele was segregating in the final sample. Methods that modeled the true population history were more accurate in this case because they could be modified to model conditional trajectories, whereas methods that assumed infinite population sizes could not. These results suggest that methods that account for drift are likely to be preferable under circumstances in which conditioning on segregation is desirable. However, it is important to note that deterministic methods can perform well when population sizes are moderately large if allele frequencies are conditioned on a slightly different event: the event that an allele is found (segregating or fixed) in the final sample.
It is important to note that we have focused on estimates of the selection coefficient rather than the dominance parameter h, which is a more difficult task. However, in our analyses of the medionigra variant in the Panaxia dominula moth, we found that both the deterministic and exact likelihoods were qualitatively similar in both h and s and were both maximized for a recessive model of dominance. A comprehensive exploration of the effects of deterministic assumptions on the inference of h is deferred to future analyses.
It is also important to note that our analyses do not imply that the deterministic and exact estimators will produce exactly the same point estimate for a given allele frequency trajectory. Although the scatter plots in figures 3, 7, and supplementary figure S1, Supplementary Material online suggest that both deterministic and exact estimators often agree closely on the magnitude of the selective strength, the estimates for any given trajectory can differ by 100% or more. This result is consistent with the observation by Schraiber et al. (2016) that specifying different demographic histories led to different estimates of dominance and selection parameters for alleles affecting coat coloration in horses. Although specific point estimates can differ, our results demonstrate that deterministic and exact estimators have similar accuracies for estimating the strength of selection, even when the population size is small.
The idea that ignoring drift can lead to accurate estimates of selection coefficients is not new. In fact, inference methods based on deterministic allele frequency trajectories capitalize on exactly this idea. However, our comparison with estimators based on exact likelihoods makes it possible to characterize the relative loss in accuracy that is incurred when drift is ignored, as well as the demographic, evolutionary, and sampling scenarios under which accounting for drift is likely to be important.
The comparatively accurate estimates achieved by methods that assume deterministic allele frequency trajectories are encouraging for three primary reasons. First, a large number of studies have relied on the assumption that alleles evolve deterministically in order to infer selection coefficients from biological time series data. Our results suggest that these estimates are likely to be nearly as accurate as those obtained using the exact likelihood accounting for drift. Second, estimators based on deterministic trajectories can be considerably faster than estimators that account for drift, making them useful for inferring selection coefficients at large numbers of loci. Third, it may be easier to obtain analytical results under the assumption that allele frequencies change deterministically, simplifying the development of inference methods for inferring selection coefficients under more complicated scenarios; for example, inferring coefficients at linked loci (Illingworth et al. 2012). Finally, the ability to ignore the population size is useful in situations in which the true population history is unknown or difficult to infer.
In addition to characterizing the degree to which accounting for drift can improve estimates of selection coefficients, our results shed light on the accuracy of exact maximum likelihood methods for inferring selection coefficients from allele frequency trajectories. In accordance with predictions about the relative strengths of genetic drift and selection (Gillespie 1998, section 3.7) and experimental work (Gallet et al. 2012), our findings suggest that very small selection coefficients () are difficult to infer if the initial allele frequency and population size are not large. Moreover, even if the population size is large, the accurate inference of a small selection coefficient may require samples taken over hundreds of generations, during which time the selection coefficient could change considerably (Felsenstein 1976; Siepielski et al. 2009).
Despite the difficulties of inferring weak selection coefficients when the population size is small, coefficients of one percent or lower can be inferred accurately if the initial allele frequency is sufficiently high. It is important to note that the selection coefficient need not be high at the time of the very first sampling event, as long as the allele has reached a sufficiently high frequency at one of the intermediate sampling events, leading to quasi-deterministic behavior between some sampling time points that can be exploited by the maximum likelihood estimator.
Although we have only considered positively selected alleles in our simulation analyses, our results apply equally well to negatively selected alleles, as it is arbitrary whether we choose to track the trajectory of the allele with higher or lower fitness. We have also focused on low initial allele frequencies for selected alleles; however, it is clear from figure 2 that allele frequency trajectories become increasingly deterministic as the initial allele frequency increases. Thus, the accuracy of a method that assumes a deterministic trajectory will become more similar to that of a method that accounts for drift as the initial allele frequency increases. Conversely, for negatively selected alleles, the accuracy of the deterministic method will approach that of the exact likelihood as the initial allele frequency decreases. Thus, our analyses provide a characterization of inference accuracy for both positively and negatively selected alleles for a broad range of starting frequencies.
At first glance, our finding that the population size does not strongly influence estimates of selection coefficients might appear to be at odds with the fact that population size histories can be inferred from allele frequency time series data (O’Hara 2005; Bollback et al. 2008; Ferrer-Admetlla et al. 2015). However, this is not the case. Methods for inferring the population size capitalize on information in the short-term fluctuations of the allele frequency around its expected value, arising from drift; conversely, estimators of selection coefficients capitalize on the long-term changes in allele frequency due to selection, effectively averaging over the short-term fluctuations due to drift. Our results suggest that allele frequencies often change quasi-deterministically, even in small populations. Thus, deviations around the expected trajectory can be distinguished from long-term changes, allowing effective population sizes to be inferred accurately even in small populations.
We have conducted our analyses under two different models of evolution: the discrete Wright–Fisher model and the continuous diffusion model. Although the diffusion model was developed as an approximation to the Wright–Fisher process, it also captures the limiting behavior of a large class of evolutionary models, including the Wright–Fisher process, as the population size grows to infinity and mutation and selection parameters are scaled accordingly. Thus, it is reasonable to believe that our findings will generalize to maximum likelihood estimators derived under a wide range of evolutionary models.
Taken together, our results help to characterize the properties of maximum likelihood methods for inferring selection coefficients from time series data. Because of the accuracy and beneficial properties of maximum likelihood methods, it is reasonable to believe that our results provide insight into the accuracy with which it is possible to infer selection coefficients from biological data, and the degree to which accounting for the true population history can improve these estimates. Our results also provide justification for the use of fast inference methods based on the assumption that allele frequencies evolve deterministically. Such methods can be applied to infer selection coefficients efficiently on large genomic data sets with many loci. Finally, our results provide further justification for the use of deterministic approximations in the development of statistical approaches for studying time series data.
Methods
In this section, we compute the exact probability of an allele frequency trajectory in a population of piecewise-constant size under the discrete Wright–Fisher model and under the diffusion approximation. We also describe how drift can be ignored in these probabilities, yielding approximate estimators of selection coefficients that are similar to commonly used approaches that assume deterministic allele frequency trajectories.
Computing under the Discrete Wright–Fisher Model
To compute the probability under the discrete Wright–Fisher model, we make use of a hidden Markov model (HMM) similar to that presented in Steinrücken et al. (2014). However, the hidden state in our discrete model is the count ct of the number of (unobserved) copies of allele A in the population at time t, rather than the continuous allele frequency yt.
In our model, the count ct of allele A evolves according to a Wright–Fisher process in which mutation occurs, followed by random mating, selection, and drift. Given that the count of allele A in generation t is ct = i, let be the frequency of allele A in the gamete pool after mutation. Then
(2) |
After random mating, the fraction of zygotes with each of the genotypes AA, Aa, and aa is , and , from which it follows that the fraction of genotypes of each kind remaining after selection is given by
(3) |
where is the mean fitness of the population.
Immediately after selection and before drift occurs, the probability that a randomly chosen allele is of type A is given by . Then, as the result of drift, the count of allele A in generation t + 1 is binomially distributed with mean . Thus, the probability that allele A has count j in generation t + 1, given that it had count i in generation t is
(4) |
The Wright–Fisher transition matrix from generation t to generation t + 1 is the matrix with entry i, j given by
(5) |
which can be used to obtain the allele frequency distribution at each discrete generation t given the initial distribution at some time r < t. In particular, define , to be the distribution of the count of allele A in generation t. Using equation (5), can be computed recursively as
(6) |
for r < t.
Computing the Probability
The probability of the observed data is computed using the forward procedure for hidden Markov models. In particular, we define the vector whose ith entry is the joint probability of the population-wide count of allele A at the kth sampling event and the observed sample allele counts up to sample k:
(7) |
To simplify calculations, we also define the conditional “emission probability”
(8) |
of the observed allele count, conditional on the population allele count. The probability in equation (8) comes from the fact that the observed allele count at time tk can be modeled as a binomial random variable with sample size nk and probability . Although the observed allele count is, strictly speaking, hypergeometric we use the binomial distribution to maintain consistency with the formulas for the diffusion model, which are often derived using a binomial sampling distribution. The binomial and hypergeometric distributions are very similar for the population and sample sizes we consider. We then construct the emission probability vector
(9) |
The probability of the data is then given by the forward procedure (Rabiner 1989), outlined in Procedure 1. In Procedure 1, the formula for comes from the fact that
(10) |
where denotes the square matrix whose diagonal entries are given by .
It has been noted by several authors that computing powers of the transition matrix is computationally prohibitive, providing one motivating factor for the use of approximations of the Wright–Fisher process, such as the diffusion and Gaussian approximations (Ewens 1963; Feder et al. 2014; Lacerda and Seoighe 2014). However, the products in Procedure 1 do not require products of the transition matrix because it suffices to repeatedly compute vector–matrix products instead of multiplying full matrices together. In practice, this can be done very quickly, even for large population sizes. A similar fast procedure was carried out by Zhao et al. (2014) to simulate trajectories under the Wright–Fisher model.
Computing When Is Unspecified
When the initial distribution is unspecified, the probability can be obtained by assuming that the distribution is uniform at the time of the first sampling event. Under this assumption, it follows directly from the second to last equality in equation (10) that the value of the joint density vector is given by
(11) |
where 1 is the vector of length with all entries equal to one. This form of can then be substituted into Procedure 1.
Computing under the Diffusion Approximation
The diffusion approximation models the evolution of the continuous population frequency Yt of allele A, rather than its count Ct. The time-evolution of the random frequency Yt is governed by the diffusion transition density given by
(12) |
for an infinitesimal increment dy. The quantity specifies the density of the allele frequency at time t, conditional on the value of the allele frequency at an earlier time s. For more details about the transition density function of the diffusion approximation, see Appendix A.
Using the diffusion transition density Steinrücken et al. (2014) developed an HMM to compute the probability of the data in a single epoch of constant size by efficiently integrating over the hidden allele frequencies at the set of sampling times. Here, we extend this HMM to the case of piecewise-constant population size.
To compute the probability efficiently, Steinrücken et al. (2014) define the quantities and satisfying
(13) |
and
(14) |
for an infinitesimal increment dy. The quantity is the joint density of the allele frequency at time tk and the observed counts up to sampling event k. The quantity is the joint density of the allele frequency at time tk and the observed counts up to sampling event k – 1.
It follows from the definition of that the probability of the data is given by
(15) |
The quantity can be obtained efficiently by recursion using the relationships
(16) |
and
(17) |
Equation (16) follows from the fact that the observed number of copies of allele A at sampling event k is binomially distributed with count nk and probability and equation (17) follows from the law of total probability integrating over .
Let be the ith eigenfunction of the backward diffusion operator and let be the speed density of in Epoch (Appendix A). Steinrücken et al. (2014) demonstrated that the recursive formulas in equations (16) and (17) can be evaluated efficiently by expressing and as series of the form
(18) |
and
(19) |
where , and where and are vectors of constants that encode the densities and . In Appendix B, we extend the results of Steinrücken et al. (2014) to derive recursive formulas for the coefficients and , resulting in Procedure 2, which computes the probability of an allele frequency trajectory under the diffusion approximation in a population of piecewise constant size.
Computing When y0 Is Unspecified
When the initial frequency y0 is unspecified, the probability can be obtained by assuming that the distribution of the allele frequency at the time of the first sampling event is uniform. Under this assumption, we show in Lemma B.3.2 that the coefficients encoding the distribution at the time of the first sampling event are given by equation (B.17). The form of in equation (B.17) is then used in Procedure 2.
Conditional Probabilities
Sometimes it is desirable to compute the probability of the observed allele counts conditional on the event SK that allele A is segregating in the final sample. In this section, we provide formulas for these conditional probabilities under the Wright–Fisher and diffusion models.
Computing
In the “Simulations Conditioning on Segregation” section, we consider the probability of the data conditional on the event SK that allele A is segregating in the final sample. In Appendix C, we show that in the case of the discrete Wright–Fisher model,
(20) |
where is defined in equation (7) and if , or 0 otherwise. The probability is given in equation (C.3). Thus, if we wish to compute conditional probabilities under the Wright–Fisher model, we carry out Procedure 1, replacing step 3 with equation (20).
Computing
Similarly, for the event FK that allele A is segregating or fixed in the final sample, we show in Appendix C that
(21) |
where is defined in equation (7) and if , or 0 otherwise. The probability is given in equation (C.6). If we wish to compute the conditional probability under the Wright–Fisher model, we carry out Procedure 1, replacing step 3 with equation (21).
Computing
In the case of the diffusion approximation, we show in Appendix D that the conditional probability of the data given SK can be computed as
(22) |
where if or 0 otherwise, and
(23) |
Thus, if we are interested in conditional probabilities under the diffusion model, we carry out Procedure 2, replacing step 3 with equation (22).
The Probability in the Absence of Genetic Drift
If we ignore genetic drift, the allele frequency changes deterministically over time, as it would in a population of infinite size. Here, we obtain versions of Procedures 1 and 2 in the case when the changes in allele frequency arising from genetic drift are negligible relative to the changes due to selection and recurrent mutation.
Deterministic Allele Frequency Trajectories under the Wright–Fisher Model
If there is no contribution to the change in allele frequency arising from genetic drift, the allele frequency in a given generation is equal to its expectation after mutation, random mating, and selection, conditional on its value in the previous generation. Because the expectation is not necessarily integer-valued, we no longer consider discrete integer allele counts ct. Instead, we track the expected allele frequency in the absence of drift, which we denote by , where the subscript ∞ denotes the expectation without drift as the effective population size tends to infinity.
The expected frequency in the absence of drift is obtained by combining equations (2) and (3), ignoring the drift step in equation (4), yielding
(24) |
where
(25) |
and . Equations (24) and (25) are iterated to find the allele frequency in any generation t > 0.
Equation (24) is related to equation 3.1 of Gillespie (1998), which describes the dynamics of allele frequency change in a population of infinite size without drift. The frequency trajectories described by this formula are closely approximated by logistic curves that have closed formulas (Feder et al. 2014, equation 3), which can be used to increase the speed of deterministic approaches even further. However, we have chosen to implement the exact formulas for better comparison with the exact likelihoods.
Deterministic Allele Frequency Trajectories under the Diffusion Model
Under the diffusion model in an Epoch of constant size, the allele frequency Yt obeys the stochastic differential equation (SDE)
(26) |
with the initial condition , where time is measured in units of generations and is the time at which Epoch begins (Karlin and Taylor 1981, Section 15.14; Durrett 2008, Section 7.2). The quantity in equation (26) controls random fluctuations due to drift whereas the quantity describes the deterministic change in the mean frequency of the allele over time due to mutation and selection and is given by
(27) |
In equation (27), we have rescaled the usual form of so that time is measured continuously in units of generations.
If the drift term in equation (26) is negligible compared with , then equation (26) can be approximated by the ordinary differential equation
(28) |
where we may write instead of Yt because the evolution of the allele frequency is deterministic and follows its expectation in the absence of drift.
We can also suppress the explicit dependence on the epoch by defining , yielding
(29) |
which holds for the full population history across all epochs . Equation (29) can be solved numerically, for instance by choosing a sufficiently small time step and iteratively computing .
Sample Probabilities Based on Deterministic Allele Frequency Trajectories
To compute the probability under either the discrete Wright–Fisher or diffusion models when drift is negligible, we note that the observations are conditionally independent of one another, given the underlying allele frequencies. Thus, in the absence of drift we have
(30) |
for both the diffusion and Wright–Fisher models, where is the deterministic allele frequency at time tk, for . Using equations (24) and (30), the probability of the data under the Wright–Fisher model in a population without drift can be obtained using Procedure 3. Similarly, using equations (29) and (30), the probability of the data in the case of the diffusion model is given by Procedure 4.
Deterministic Estimators When the Initial Allele Frequency Is Unknown
If the initial allele frequency is unspecified, the selection coefficient can still be inferred using the deterministic estimators by integrating over the allele frequency at the time of the first sampling event, where we assume in these analyses that the initial allele frequency distribution is uniform. In practice, we perform this integration by specifying a dense grid of values , where for some large predetermined value M. Defining the probability of the data when the initial allele frequency is y to be , we compute the probability of the data as
(31) |
Simulations
Allele frequency trajectories were simulated under two different models, the discrete Wright–Fisher model and the continuous diffusion model. All simulations were carried out by iteratively sampling the allele frequency at the time points starting with a specified frequency y0 or with the initial frequency y0 sampled uniformly from the interval at time .
Wright–Fisher simulations were carried out using Procedure 5. Specifically, for an initial allele frequency y0, we took the initial distribution to be the standard basis vector of length with element set to unity and all other elements set to zero. We then sampled the population allele count at each sampling time tk by iteratively propagating the allele frequency distribution forward, conditional on the population count at the previous sampling time using equation (6). The derived allele count ok in each sample k was then chosen from a binomial distribution with sample size nk and probability . For the results in the “Inference Accuracy for Populations of Constant Finite Size”, “Inference Accuracy in Populations of Piecewise Constant Size”, “Conditioning on Segregation in the Final Sample”, and “The Effect of Sample Size on Accuracy” sections, we fixed or in all simulations, as indicated in the results section. In the “Unspecified Initial Allele Frequencies” section, we simulated y0 uniformly from the interval .
Simulations under the diffusion model were carried out using Procedure 6. Specifically, for an initial frequency y0, we computed the coefficients of the expansion of the initial condition in the basis functions of the first epoch. In Appendix D, we show that the probability of sampling i derived alleles at sampling event k (eq. D.3) can be obtained by computing a set of coefficients that we call (eq. D.7) that specify an expansion in the basis of eigenfunctions .
Thus, to sample the derived allele count o1 in the first sample, we propagated the initial coefficients to obtain and we used the fact that (eq. D.3) to sample with probability . Using the fact that is defined in Section D as the set of coefficients such that for an infinitesimal increment dy, we then set and iterated this procedure to obtain the subsequent coefficient vectors and the full set of samples . Simulated trajectories were checked for accuracy against those obtained using the software of Jenkins and Spanò (2015), personal communication.
Grid Searches to Infer Selection Coefficients
The same iterated grid search procedure, implemented as a wrapper function and outlined in Procedure 7, was used to infer selection coefficients using each of the methods described in Procedures 1–4. The grid search is initialized by specifying lower and upper bounds on a region over which to search for a value of the selection coefficient s that maximizes the likelihood. Assuming that the likelihood surface is convex and smooth, the grid search iteratively refines the search to the region that contains the optimum until the width of the region is smaller than a specified tolerance ϵ. In our analyses, we chose the initial region to be and we set .
For each allele frequency trajectory simulated under the Wright–Fisher model, Procedure 7 was carried out using Procedure 1 and then subsequently for the same trajectory using Procedure 3, with or without conditioning. For each allele frequency trajectory simulated under the diffusion model, Procedure 7 was carried out using Procedure 2 and then subsequently for the same trajectory using Procedure 4, with or without conditioning. A gradient descent optimization approach produced nearly identical results and is available as an option in the software package we have released.
Computing the Watterson Estimator of Ne from the Expected SFS of a Piecewise Constant Population
In the “Inference Accuracy in Populations of Piecewise Constant Size” section, we obtain a crude constant estimate of the effective size Ne of a piecewise constant population by computing the expected unnormalized SFS for the true history and then inferring the effective size of a constant population with the same level of genetic diversity as the piecewise constant population using the Watterson estimator.
Under the assumption that any given base pair in a collection of n haplotypes from a population is at most biallelic, the unnormalized and folded SFS for the n haplotypes is a collection of counts in which is the number of sites with one or the other allele appearing in i out of n copies in the sample. When the expected number μ of new mutations occurring in a population within in a region of fixed length is specified, the expected SFS for n haplotypes spanning the region can be computed for a population of piecewise constant size using algorithms by Kamm et al. (2016).
The effective size of a population of constant size extending infinitely far back into the past with the same level of diversity as the piecewise constant population can be computed using the Watterson estimator (Watterson 1975, Eqn. 1.4a; Hein et al. 2005, p. 62). The information used by the Watterson estimator is the diversity in the n sampled haplotypes of specified length, combined with knowledge of the expected mutation rate μ. The Watterson estimator is given by
(32) |
where S is the total number of segregating sites observed in the sample. The total number of segregating sites can be computed as , where is the ith entry of the un-normalized folded SFS.
Thus, given the expected SFS for a piecewise constant population computed using the algorithm of Kamm et al. (2016) and a specified mutation rate μ, we computed the Watterson estimate of the size Ne of a constant-sized population with similar diversity as
(33) |
When computing the expected SFS and the Watterson estimator, we arbitrarily chose a value of μ = 1.
1: Optional: sample the initial population allele frequency y0 uniformly from a predetermined interval , where , and set .
2: Let be the standard basis vector of length with element equal to one.
3: For ,
Compute the conditional distribution at sampling event k, conditional on the distribution at time using
Sample with probability , for .
Set , where is the standard basis vector of length with element equal to one.
4: For, sample .
5: If conditioning on segregation in the final sample and oK = 0 or oK = nK, return to Step 1. Otherwise, the sampled trajectory is .
1: Optional: sample the initial population allele frequency y0 uniformly from a predetermined interval , where .
2: For an initial starting frequency y0 initialize
where is the vector of eigenfunctions of the diffusion operator given inequation (A.14) and is given in equation(A.18).
3: For ,
Compute
For , compute
noting that (eq. D.3).
Sample ok = i with probability .
Set .
Here, the matrices , and are given by equations (A.17), (B.10), (A.15) and (A.11), respectively and ζ is the set of Chebyshev nodes in the interval . The matrix inverse is computed easily using the diagonal matrices and in equations (A.18) and (A.19).
4: If conditioning on segregation in the final sample and oK = 0 or oK = nK, return to Step 1. Otherwise, the sampled trajectory is .
1: Specify the bounds and su of an interval in which to search.
2: Specify the stopping tolerance ε.
3: While :
Evaluate the likelihood at the points , where .
Let .
If , set and .
Else, if , set and .
Else, set and .
4: Return .
Supplementary Material
Supplementary figure S1 is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
This research was supported by the National Institutes of Health (grant number R01-GM094402) and by a Packard Fellowship for Science and Engineering. We would also like to thank Jeffrey Spence, Joshua Schraiber, and an anonymous reviewer for their helpful comments on this manuscript.
Appendix A
Diffusion Transition Densities: Background
The equations in the “Computing under the Diffusion Approximation” section were derived under a model in which the selected allele A evolves under the diffusion approximation in a population of piecewise constant size. Given that allele A has frequency x at a fixed time s, the density at a later time t is given by the transition density of the diffusion approximation (eq. 12). Steinrücken et al. (2014) derived a formula for the density for the case of a single population of constant size. Here, we review this derivation to provide background and notation for the derivation of the diffusion model probability computed in Procedure 2.
The Diffusion Approximation in a Population of Constant Size
Let denote the transition density restricted to a specific epoch of constant size with . The density is the unique solution of the Kolmogorov backward equation,
(A.1) |
satisfying the terminal condition , where is the Dirac delta distribution and is the Kolmogorov backward operator in the epoch defined in equation (A.2). The factor in equation (A.1) is introduced so that the time-scaling is the same in all epochs, and time is measured continuously in units of generations.
The Kolmogorov backward operator is defined in terms of the scaled mutation and selection parameters , and as
(A.2) |
where the quantity
(A.3) |
captures the contribution to the change in allele frequency arising from genetic drift and
(A.4) |
captures the contribution from recurrent mutation and selection.
Song and Steinrücken (2012) showed that can be expressed as an expansion in the eigenfunctions of of the form
(A.5) |
where are the eigenfunctions of with associated eigenvalues and the function is given by
(A.6) |
where . The inner product with respect to a weight function in equation (A.5) is defined for two functions f and g on an interval by
(A.7) |
In equation (A.5), the inner product is taken over the interval with respect to .
Expressions for the Quantities in Equation (A.5)
Expressions for the eigenvalues , eigenfunctions , and inner products in equation (A.5) can be obtained using a matrix formulation developed by Steinrücken et al. (2014). In particular, the eigenfunctions can be expressed as
(A.8) |
where and is the mth classical Jacobi polynomial (Abramowitz and Stegun 1972, Chapter 22). The vector is the nth left eigenvector of the infinite-dimensional matrix
(A.9) |
corresponding to the nth eigenvalue , where is the diagonal matrix with elements given by and the quantities are given by
(A.10) |
The matrix in equation (A.9) has elements given by
(A.11) |
which correspond to the coefficients of the three-term recurrence relation satisfied by the Jacobi Polynomials.
Matrix Expressions for the Transition Density
It is computationally and notationally simpler to express the eigenfunctions of and the transition density as products of matrices. In particular, we can express equation (A.8) as
(A.12) |
where
(A.13) |
and we can express the vector of eigenfunctions as
(A.14) |
where
(A.15) |
is the matrix whose rows are the left eigenvectors of the matrix in equation (A.9).
Using equations (A.5) and (A.14), the transition density in a single epoch ℓ can then be expressed as the matrix product
(A.16) |
where
(A.17) |
and . Steinrücken et al. (2014) showed that the matrix in equation (A.16) can be expressed as
(A.18) |
where
(A.19) |
and
(A.20) |
Thus, the transition density in a single epoch can be computed by constructing matrix , computing its eigenvectors and eigenvalues , and plugging these into the components of equation (A.16). In practice, because the matrix has infinite dimension, we approximate it by truncating its dimensions at some large integer M yielding approximate eigenvectors and eigenvalues . We also truncate the length of the vector at a large integer N. Although these truncations lead to approximate values of the transition density, the approximation can be made arbitrarily precise by taking to be sufficiently large.
Appendix B
Recursions for the Coefficients and
Discussion of the Problem
Here, we extend the HMM of Steinrücken et al. (2014) to accommodate populations of piecewise constant size. As we noted in the “Computing under the Diffusion Approximation” section, the probability of the data under the diffusion model can be obtained using the equation
(B.1) |
where the quantity is obtained by recursively evaluating equations (16) and (17). Because and can be expressed as the series and (eqs. 18 and 19), determining and amounts to determining the coefficients and . Thus, it is useful to develop analogs of the recursions (15) and (16) that apply to the coefficients themselves.
Equations for Propagating Coefficients
From equation (16), it can be seen that obtaining from involves only multiplication by a polynomial in y. Thus, the formula for obtaining the coefficients from the coefficients does not depend on the population history and, therefore, it can be obtained from results in Steinrücken et al. (2014) who derived formulas for the recursion for the case of a population of constant size. However, the formula for obtaining from (eq. 17) involves the transition probability , which depends on the population parameters Θ. Thus, it is necessary to account for the population history when computing the coefficients from the coefficients .
To obtain from , we first consider the more general problem of obtaining the generalized vector of coefficients from , where is defined as the vector of coefficients of the expansion of the generalized density defined by
(B.2) |
i.e., the joint density of the observed data up to sample k – 1 and the allele frequency at time t, where we assume so that the time t at which is evaluated occurs later than the time at which is evaluated. The density defined in equation (14) is a special case of the generalized density obtained when t = tk.
To obtain from , there are two scenarios to consider: the case in which both and t lie within the same epoch and the case in which and t lie within distinct epochs. Our derivations of these separate cases provide the results necessary for step 2 of Procedure 2.
The Case
If both and t lie within the same epoch , then the transition density is given by equation (A.16) and we have
(B.3) |
where the first equality follows from the definition of (eq. B.2) and the second equality follows from equation (17). In the fifth equality we have used the fact that . Because the eigenfunctions form a complete basis of the Hilbert space defined with respect to the inner product , the coefficients in the expansion on the left-hand side of equation (B.3) must equal those on the right-hand side. Thus,
(B.4) |
The Case When
If the times and t lie in different epochs, and , then the transition density is no longer given by equation (A.16). Instead, we must use a formula for the transition density across multiple epochs of different sizes. Steinrücken et al. (2015) showed that if the allele frequency density at time s in epoch is given by the expansion
(B.5) |
where are the coefficients encoding the density at time s in the basis of the eigenfunctions , then at time t in epoch , the allele frequency density is given by , where the coefficients are given by
(B.6) |
where is the time of the terminating boundary of epoch , and
(B.7) |
In equation (B.7), and are given by
(B.8) |
where is defined in equation (A.13) and
(B.9) |
for an arbitrary collection of distinct values . In practice, we take to be the Chebyshev nodes (Steinrücken et al. 2015).
By repeated application of equation (B.6), it follows that if the coefficients encode the density at time s in epoch , then the coefficients encoding the density at time t in epoch are given by where
(B.10) |
Moreover, if we define to be the vector of coefficients encoding the density , then it follows from equation (B.10) that the transition density for times s < t lying in distinct epochs is given by
(B.11) |
For the initial condition , it was shown in Proposition 1 of Steinrücken et al. (2014) that the coefficients are given by
(B.12) |
yielding
(B.13) |
which is obtained by plugging equation (B.12) into equation (B.11).
We can now plug equation (B.13) into equation (17) to obtain a relationship between and when times and t lie in different epochs:
(B.14) |
where we have again used the fact that . Finally, by the uniqueness of expansions in the Hilbert basis , we have
(B.15) |
The results derived in the “Equations for Propagating Coefficients” section provide the machinery necessary to propagate the coefficients and in the HMM over time. These results can now be used to compute the probability of observing a set of sampled allele frequencies under the diffusion model.
Derivation of Lemmas Necessary for Procedure 2
We now obtain three lemmas that provide the steps in Procedure 2.
Lemma B.3.1
If the initial frequency density at time is , then the value of the initial vector encoding the quantity is given by
(B.16) where is given in equation (A.14) and is the diagonal matrix given in equation (A.18).
Proof
Because depends only on the parameters in the first epoch, the proof of Lemma B.3.1 is the same whether we consider a population composed of a single epoch, or a population composed of multiple epochs. The equation for (eq. 18) is the same as equation 2.14 of Steinrücken et al. (2014). Thus, the coefficients in this paper correspond to the coefficients in Steinrücken et al. (2014) who proved Lemma B.3.1 for the case of a population of constant size. Thus, the first equality in Lemma B.3.1 follows directly from Proposition 1 of Steinrücken et al. (2014). The matrix representation in the second equality follows directly from the definitions of and . □
Lemma B.3.2
If the initial frequency density at time is unspecified and we instead assume a uniform prior on the allele frequency at the time of the first sampling event, then the coefficients encoding the density at time t1 are given by
(B.17) where , and are obtained by replacing , and in equations (A.18), (A.19), and (A.15) with σ, α, and β. Similarly, is obtained by replacing , and in equation (B.7) with σ, , α, , β, and , respectively, is the eigenfunction in equation (B.23) with , and replaced with σ, α, and β, and is the matrix given in equation (A.15) whose rows are the left eigenvectors of , i.e., the matrix in equation (A.9) with , and replaced with σ, α, and β. Finally, is the normalizing constant of the beta distribution with parameters α and β.
Proof
If the initial density of the allele frequency is unspecified, then the density of at the time t1 of the first sampling event, conditional on the observed number o1 of copies of allele A is given by Bayes Theorem:
(B.18) If we suppose that is uniform on the interval then equation (B.18) becomes
(B.19) where, in the final equality, we have used the fact that is the unnormalized beta distribution with parameters and . Thus, conditional on the first observation o1, we see that has a beta distribution with parameters and .
Steinrücken et al. (2014) showed in Appendix C of their paper that the distribution has coefficients given by
in an epoch with selection parameter σ and mutation parameters α and β. Thus, using the change of basis in equation (B.6), we see that the distribution has coefficients given by in the basis of eigenfunctions corresponding to an epoch with parameters , and , proving the result in equation (B.17). □
Lemma B.3.3
Let , and denote the matrices defined in equations (A.11), (A.15), (A.17), and (B.10), respectively, where is a set of distinct values arbitrarily chosen such that . Then the coefficient vectors and satisfy the recursive relationships
(B.20)
(B.21) where .
Proof
The relationship in equation (B.21) is obtained immediately by setting t = tk in equations (B.4) and (B.15), which follows because . The relationship in equation (B.20) does not depend on the population parameters Θ; therefore, equation (B.20) is the same as that derived in Steinrücken et al. (2014), who considered a population of constant size (see Steinrücken et al. 2014, Theorem 2). □
Lemma B.3.4
The probability of observing the allele counts , given the population parameters Θ is
(B.22) where is element 0, 0 of the matrix in equation (A.18) and
(B.23) The quantity in equation (B.23) is element i, j of the matrix given in equation (A.15).
Proof
Equation (B.22) can be obtained by integrating over the joint density of the data and the allele frequency at the final sampling time:
(B.24) where . In the fourth equality we have used the fact that is a constant function in y. To see why is constant, note that the eigenvalues are non-negative and strictly increasing. Thus, all terms in equation (A.5) must vanish in the limit , except possibly the term n = 0. Because approaches the stationary density in the limit , it must be the case that , so at least one term does not vanish. Thus, we have
(B.25) where we have used the fact that is proportional to the stationary density of the diffusion equation in Epoch . It follows from equation (B.25) that is constant. Thus, we obtain the result, proving equation (B.22). Equation (B.23) follows directly from the proof of Proposition 3 in Steinrücken et al. (2014). □
Appendix C
Conditional Probabilities: The Wright–Fisher Model
Under the Wright–Fisher model, the probability of the observed allele counts, conditional on the event SK that allele A is segregating in the final sample can be computed using the fact that
(C.1) |
where the third equality in equation (C.1) follows from the fact that the conditional probability depends only on the allele count oK and the final equality in equation (C.1) follows from the definition of . The probability in equation (C.1) is given by
(C.2) |
and the probability is given by
(C.3) |
where, as before, is given by the ith element of .
Note that it is easy to condition on other configurations of the final sample using a procedure similar to that used to derive equation (C.1). For example, for the event FK that allele A is segregating or fixed in the final sample, which we consider in the “Simulations Conditioning on Segregation or Fixation” section, the probability is given by
(C.4) |
where
(C.5) |
and
(C.6) |
Other probabilities can be obtained in a similar fashion.
Appendix D
Conditional Probabilities: Diffusion Model
Under the diffusion approximation, the probability of the observed allele counts conditional on the event SK that allele A is segregating in the final sample can be computed using the fact that
(D.1) |
where the second equality follows from the fact that the conditional probability depends only on the allele count oK in the final sample, and the final equality follows from equation (B.22).
The probability can be computed as
(D.2) |
In equation (D.2), the probability can be found easily by noting that if the only sampling time is tK, at which OK = j lineages are observed, then the probability computed using Procedure 2 is precisely the probability .
Consider the problem in which the only sampling occurs at time tK and denote the coefficient vectors for this related problem by and . Then, by equation (B.22), we see that
(D.3) |
where is obtained by computing the steps in Procedure 2. In Step 1, we compute
(D.4) |
which follows because the initial vector depends only on the initial frequency. In Step 2, we compute
(D.5) |
which follows because the coefficients are propagated directly from time to time tK. Finally, in Step 3 we have
(D.6) |
Combined together, equations (D.4), (D.5), and (D.6) yield
(D.7) |
Plugging equations (D.2) and (D.3) into equation (D.1) gives
(D.8) |
where
(D.9) |
Note that it is easy to condition on other configurations of the final sample by computing the probabilities and for some other event VK.
References
- Abramowitz M., Stegun I. A., editors. 1972. Handbook of mathematical functions with formulas, graphs, and mathematical tables. 9th ed. New York: Dover. [Google Scholar]
- Bollback JP, York TL, Nielsen R. 2008. Estimation of 2Nes from temporal allele frequency data. Genetics 179:497–502. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bonhoeffer S, Barbour AD, De Boer RJ. 2002. Procedures for reliable estimation of viral fitness from time-series data. Proc R Soc Lond B 269:1887–1893. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Clarke B, Murray J. 1962. Changes in gene-frequency in Cepaea nemoralis (L.): the estimation of selective values. Heredity 17:467–476. [Google Scholar]
- Cook LM, Jones DA. 1996. The medionigra gene in the moth Panaxia dominula: the case for selection. Phil Trans R Soc Lond B. 351:1623–1634. [Google Scholar]
- Cook LM, Cowie RH, Jones JS. 1999. Change in morph frequency in the snail Cepaea nemoralis on the Marlborough Downs. Heredity 82:336–342. [DOI] [PubMed] [Google Scholar]
- Cook LM, Sutton SL, Crawford TJ. 2005. Melanic moth frequencies in Yorkshire, an old English industrial hot spot. J Hered. 96:522–528. [DOI] [PubMed] [Google Scholar]
- Durrett R. 2008. Probability models for DNA sequence evolution. New York: Springer Science & Business Media. [Google Scholar]
- Ewens WJ. 1963. Numerical results and diffusion approximations in a genetic process. Biometrika 50:241–249. [Google Scholar]
- Ewens WJ. 2004. Mathematical population genetics: I. 2nd ed. New York: Springer. [Google Scholar]
- Feder AF, Kryazhimskiy S, Plotkin JB. 2014. Identifying signatures of selection in genetic time series. Genetics 196:509–522. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Felsenstein J. 1976. The theoretical population genetics of variable selection and migration. Annu Rev Genet. 10:253–280. [DOI] [PubMed] [Google Scholar]
- Ferrer-Admetlla A, Leuenberger C, Jensen JD, Wegmann D. 2015. An approximate Markov model for the Wright-Fisher diffusion. Genetics 203:831–846. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Fisher RA. 1922. On the dominance ratio. Proc R Soc Edin. 42:321–341. [Google Scholar]
- Fisher RA, Ford EB. 1947. The spread of a gene in natural conditions in a colony of the moth Panaxia dominula L. Edinnburgh: Oliver & Boyd.
- Foll M, Shim H, Jensen JD. 2015. WFABC: a Wright–Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Mol Ecol Resour. 15:87–98. [DOI] [PubMed] [Google Scholar]
- Gallet R, Cooper TF, Elena SF, Lenormand T. 2012. Measuring selection coefficients below : method, questions, and prospects. Genetics 190(1):175–186. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gillespie JH. 1998. Population genetics: a concise guide. Baltimore: JHU Press. [Google Scholar]
- Goudsmit J, De Ronde A, Ho DD, Perelson AS. 1996. Human Immunodeficiency Virus fitness in vivo: calculations based on a single zidovudine resistance mutation at codon 215 of reverse transcriptase. J Virol. 70:5662–5664. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Haldane JBS. 1927. A mathematical theory of natural and artificial selection, Part V: selection and mutation. Math Proc Cambridge. 23:838–844. [Google Scholar]
- Harrigan PR, Bloor S, Larder BA. 1998. Relative replicative fitness of zidovudine-resistant Human Immunodeficiency Virus Type 1 isolates in vitro. J Virol. 72:3773–3778. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hartl DL, Clark AG. 2007. Principles of population genetics. 4th ed. Sunderland: (MA): Sinauer Associates. [Google Scholar]
- Haubruge E, Arnaud L. 2001. Fitness consequences of malathion-specific resistance in Red Flour Beetle (Coleoptera: Tenebrionidae) and selection for resistance in the absence of malathion. J Econ Entomol. 94(2):552–557. [DOI] [PubMed] [Google Scholar]
- Hein J, Schierup MH, Wiuf C. 2005. Gene genealogies, variation and evolution. Milton Keynes (United Kingdom): Oxford University Press. [Google Scholar]
- Illingworth CJR, Parts L, Schiffels S, Liti G, Mustonen V. 2012. Quantifying selection acting on a complex trait using allele frequency time series data. Mol Biol Evol. 29:1187–1197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jenkins PA, Spanò D. 2015. Exact simulation of the Wright-Fisher diffusion. arXiv:1506.06998, http://arxiv.org/abs/1506.06998.
- Kamm JA, Terhorst J, Song YSS. 2016. Efficient computation of the joint sample frequency spectra for multiple populations. J Comput Graph. Stat. http://dx.doi.org/10.1080/10618600.2016.1159212. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Karlin S, Taylor H. 1981. A second course in stochastic processes. 2nd ed. New York: Academic Press. [Google Scholar]
- Labbé P, Sidos N, Raymond M, Lenormand T. 2009. Resistance gene replacement in the mosquito Culex pipiens: fitness estimation from long-term cline series. Genetics 182:303–312. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lacerda M, Seoighe C. 2014. Population genetics inference for longitudinally-sampled mutants under strong selection. Genetics 198:1237–1250. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lynch M. 1987. The consequences of fluctuating selection for isozyme polymorphisms in daphnia. Genetics 115:657–669. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Malaspinas A, Malaspinas O, Evans SN, Slatkin M. 2012. Estimating allele age and selection coefficient from time-serial data. Genetics 192:599–607. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Manly BF. 1985. The statistics of natural selection. London: Chapman & Hall. [Google Scholar]
- Mathieson I, McVean G. 2013. Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics 193:973–984. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nishino J. 2013. Detecting selection using time-series data of allele frequencies with multiple independent reference loci. G3 3:2151–2161. [DOI] [PMC free article] [PubMed] [Google Scholar]
- O’Hara RB. 2005. Comparing the effects of genetic drift and fluctuating selection on genotype frequency changes in the scarlet tiger moth. Proc Roy Soc Lond B. 272:211–217. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rabiner LR. 1989. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc IEEE. 77:257–286. [Google Scholar]
- Reimchen TE, Nosil P. 2002. Temporal variation in divergent selection on spine number in Threespine Stickleback. Evolution 56:2472–2483. [DOI] [PubMed] [Google Scholar]
- Rouzine IM, Rodrigo A, Coffin JM. 2001. Transition between stochastic evolution and deterministic evolution in the presence of selection: general theory and application to virology. Microbiol Mol Biol R. 65:151–185. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schaffer HE, Yardley D, Anderson WW. 1977. Drift or selection: a statistical test of gene frequency variation over generations. Genetics 87:371–379. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Schraiber JG, Evans SN, Slatkin M. 2016. Bayesian inference of natural selection from allele frequency time series. Genetics 203:493–511. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Siepielski AM, DiBattista JD, Carlson SM. 2009. Its about time: the temporal dynamics of phenotypic selection in the wild. Ecol Lett. 12(11):1261–1276. [DOI] [PubMed] [Google Scholar]
- Song YS, Steinrücken M. 2012. A simple method for finding explicit analytic transition densities of diffusion processes with general diploid selection. Genetics 190(3):1117–1129. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Steinrücken M, Bhaskar A, Song YS. 2014. A novel spectral method for inferring general diploid selection from time series genetic data. Ann Appl Stat. 8(4):2203–2222. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Steinrücken M, Jewett EM, Song YS. 2015. Spectraltdf: transition densities of diffusion processes with time-varying selection parameters, mutation rates and effective population sizes. Bioinformatics 32(5):795–797. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stine OC, Smith KD. 1990. The estimation of selection coefficients in afrikaners: Huntington disease, porphyria variegata, and lipoid proteinosis. Am J Hum Genet. 46:452–458. [PMC free article] [PubMed] [Google Scholar]
- Wakeley J. 2008. Coalescent theory: an introduction. Greenwood Village, CO: Roberts & Company Publishers. [Google Scholar]
- Wall S, Carter MA, Clarke B. 1980. Temporal changes of gene frequencies in cepaea hortensis. Biol J Linn Soc. 14(3–4):303–317. [Google Scholar]
- Watterson G. 1975. On the number of segregating sites in genetic models without recombination. Theor Popul Biol. 7:256–276. [DOI] [PubMed] [Google Scholar]
- Wilson SR. 1980. Analyzing gene-frequency data when the effective population size is finite. Genetics 95:489–502. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zhao L, Lascoux M, Waxman D. 2014. Exact simulation of conditioned wright–fisher models. J Theor Biol. 363:419–426. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.