Genetics. 2016 Mar 21;203(1):493–511. doi: 10.1534/genetics.116.187278

Bayesian Inference of Natural Selection from Allele Frequency Time Series

Joshua G Schraiber, Steven N Evans, Montgomery Slatkin
PMCID: PMC4858794  PMID: 27010022

Abstract

The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from modern individuals. We developed a Bayesian method to make use of allele frequency time series data and infer the parameters of general diploid selection, along with allele age, in nonequilibrium populations. We introduce a novel path augmentation approach, in which we use Markov chain Monte Carlo to integrate over the space of allele frequency trajectories consistent with the observed data. Using simulations, we show that this approach has good power to estimate selection coefficients and allele age. Moreover, when applying our approach to data on horse coat color, we find that ignoring a relevant demographic history can significantly bias the results of inference. Our approach is made available in a C++ software package.

Keywords: Bayesian inference, diffusion theory, natural selection, path augmentation


THE ability to obtain high-quality genetic data from ancient samples is revolutionizing the way that we understand the evolutionary history of populations. One of the most powerful applications of ancient DNA (aDNA) is to study the action of natural selection. While methods making use of only modern DNA sequences have successfully identified loci evolving subject to natural selection (Nielsen et al. 2005; Voight et al. 2006; Pickrell et al. 2009), they are inherently limited because they look indirectly for selection, finding its signature in nearby neutral variation. In contrast, by sequencing ancient individuals, it is possible to directly track the change in allele frequency that is characteristic of the action of natural selection. This approach has been exploited recently, using whole-genome data to identify candidate loci under selection in European humans (Mathieson et al. 2015).

To infer the action of natural selection rigorously, several methods have been developed to explicitly fit a population genetic model to a time series of allele frequencies obtained via aDNA. Initially, Bollback et al. (2008) extended an approach devised by Williamson and Slatkin (1999) to estimate the population-scaled selection coefficient, $\alpha = 2N_e s$, along with the effective size, $N_e$. To incorporate natural selection, Bollback et al. (2008) used the continuous diffusion approximation to the discrete Wright–Fisher model. This required them to use numerical techniques to solve the partial differential equation (PDE) associated with transition densities of the diffusion approximation to calculate the probabilities of the population allele frequencies at each time point. Ludwig et al. (2009) obtained an aDNA time series from six coat-color–related loci in horses and applied the method of Bollback et al. (2008) to find that two of them, ASIP and MC1R, showed evidence of strong positive selection.

Recently, a number of methods have been proposed to extend the generality of the Bollback et al. (2008) framework. To define the hidden Markov model they use, Bollback et al. (2008) were required to posit a prior distribution on the allele frequency at the first time point. They chose to use a uniform prior on the initial frequency; however, in truth the initial allele frequency is dictated by the fact that the allele at some point arose as a new mutation. Using this information, Malaspinas et al. (2012) developed a method that also infers allele age. They also extended the selection model of Bollback et al. (2008) to include fully recessive fitness effects. A more general selective model was implemented by Steinrücken et al. (2014), who model general diploid selection, and hence they are able to fit data where selection acts in an over- or underdominant fashion; however, Steinrücken et al. (2014) assumed a model with recurrent mutation and hence could not estimate allele age. The work of Mathieson and McVean (2013) is designed for inference of metapopulations over short timescales and so it is computationally feasible for them to use a discrete time, finite population Wright–Fisher model. Finally, the approach of Feder et al. (2014) is ideally suited to experimental evolution studies because they work in a strong selection, weak drift limit that is common in evolving microbial populations.

One key way that these methods differ from each other is in how they compute the probability of the underlying allele frequency changes. For instance, Malaspinas et al. (2012) approximated the diffusion with a birth–death type Markov chain, while Steinrücken et al. (2014) approximate the likelihood analytically, using a spectral representation of the diffusion discovered by Song and Steinrücken (2012). These different computational strategies are necessary because of the inherent difficulty in solving the Wright–Fisher partial differential equation. A different approach, used by Mathieson and McVean (2013) in the context of a densely sampled discrete Wright–Fisher model, is to instead compute the probability of the entire allele frequency trajectory in between sampling times.

In this work, we develop a novel approach for inference of general diploid selection and allele age from allele frequency time series obtained from aDNA. The key innovation of our approach is that we impute the allele frequency trajectory between sampled points when they are sparsely sampled. Moreover, by working with a diffusion approximation, we are able to easily incorporate general diploid selection and changing population size. This approach to inferring parameters from a sparsely sampled diffusion is known as high-frequency path augmentation and has been successfully applied in a number of contexts (Roberts and Stramer 2001; Golightly and Wilkinson 2005, 2008; Sørensen 2009; Fuchs 2013). The diffusion approximation to the Wright–Fisher model, however, has several features that are atypical in the context of high-frequency path augmentation, including a time-dependent diffusion coefficient and a bounded state space. We test this approach with simulation, showing that it is important to accurately model demographic history, and then apply it to several data sets and find that we have power to estimate parameters of interest from real data.

Model and Methods

Overview

We begin by first reviewing the Wright–Fisher model, presenting its diffusion approximation as a stochastic differential equation (SDE). We then describe our inferential strategy, using a path augmentation approach, in which we model the underlying allele frequency trajectory as an additional (infinite-dimensional) parameter. This requires us to derive an expression for the likelihood of an allele frequency trajectory, including accounting for the fact that we model alleles that start from low frequency as new mutants. Finally, we describe a Markov chain Monte Carlo algorithm for obtaining a posterior distribution of the parameters of natural selection, as well as the allele frequency trajectory.

Generative model

We assume a randomly mating diploid population that is of size $N(t)$ at time $t$, where $t$ is measured in units of $2N_0$ generations for some arbitrary, constant $N_0$. At the locus of interest, the ancestral allele, $A_0$, was fixed until some time $t_0$ when the derived allele, $A_1$, arose with diploid fitnesses as given in Table 1.

Table 1. Fitness scheme assumed in the text.

Genotype    $A_1A_1$    $A_1A_0$    $A_0A_0$
Fitness     $1+s_2$     $1+s_1$     $1$

Given that an allele arises at some finite population frequency $0 < x_0 < 1$ at some time $t_0$, the trajectory of population frequencies of $A_1$ at times $t \ge t_0$, $(X_t)_{t \ge t_0}$, is modeled by the usual diffusion approximation to the Wright–Fisher model (and many other models such as the Moran model), which we henceforth call the Wright–Fisher diffusion. While many treatments of the Wright–Fisher diffusion define it in terms of the partial differential equation that characterizes its transition densities (e.g., Ewens 2004), we instead describe it as the solution of an SDE. Specifically, $(X_t)_{t \ge t_0}$ satisfies the SDE

$$dX_t = X_t(1-X_t)\bigl(\alpha_1(1-2X_t) + \alpha_2 X_t\bigr)\,dt + \sqrt{\frac{X_t(1-X_t)}{\rho(t)}}\,dB_t, \qquad X_{t_0} = x_0, \tag{1}$$

where $B$ is a standard Brownian motion, $\alpha_1 = 2N_0 s_1$, $\alpha_2 = 2N_0 s_2$, and $\rho(t) = N(t)/N_0$. If $X_{t^*} = 0$ (resp. $X_{t^*} = 1$) at some time $t^* > t_0$, then $X_t = 0$ (resp. $X_t = 1$) for all $t \ge t^*$.
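As a consistency check, the selection part of the infinitesimal mean can be recovered from the fitnesses in Table 1. The sketch below writes $w_1$ and $w_0$ (symbols introduced here for illustration, not in the original text) for the marginal fitnesses of $A_1$ and $A_0$ under random mating at frequency $x$, and keeps only first-order terms in $s_1$ and $s_2$.

```latex
\begin{align*}
w_1 &= x(1+s_2) + (1-x)(1+s_1), \qquad w_0 = x(1+s_1) + (1-x), \\
\mathbb{E}[\Delta x] &\approx x(1-x)(w_1 - w_0)
   = x(1-x)\bigl(s_1(1-2x) + s_2 x\bigr) && \text{per generation}, \\
 &\;\longrightarrow\; x(1-x)\bigl(\alpha_1(1-2x) + \alpha_2 x\bigr) && \text{per } 2N_0 \text{ generations}, \\
\operatorname{Var}[\Delta x] &\approx \frac{x(1-x)}{2N(t)}
   \;\longrightarrow\; \frac{x(1-x)}{\rho(t)} && \text{per } 2N_0 \text{ generations}.
\end{align*}
```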

To make this description of the dynamics of the population allele frequency trajectory $(X_t)_{t \ge t_0}$ complete, we need to specify an initial condition at time $t_0$. In a finite population Wright–Fisher model we would take the allele $A_1$ to have frequency $1/(2N(t_0))$ at the time $t_0$ when it first arose in a single chromosome. This frequency converges to 0 when we pass to the diffusion limit, but we cannot start the Wright–Fisher diffusion at 0 at time $t_0$ because the diffusion started at 0 remains at 0. Instead, we take the value of $X_{t_0}$ to be some small, but arbitrary, frequency $x_0$. This arbitrariness in the choice of $x_0$ may seem unsatisfactory, but we will see that, in the context of a Bayesian inference procedure, the resulting posterior distribution for the parameters $\alpha_1, \alpha_2, t_0$ converges as $x_0 \downarrow 0$ to a limit that can be thought of as the posterior corresponding to a certain improper prior distribution, and so, in the end, there is actually no need to specify $x_0$.

Next, we require a model for how alleles arise. We assume that mutations at time $t$ occur at a rate proportional to $2N(t)$ and that a mutant allele arises exactly once. Further constraining alleles to have arisen more recently than some time, $T$, in the past, this implies that the prior density of allele ages is

$$\pi(t_0) = \frac{\rho(t_0)}{\int_{-T}^{0} \rho(s)\,ds}.$$

Taking the limit as $T \to \infty$ results in an improper distribution on allele age, which, in the context of our Bayesian inference algorithm, implies an improper prior distribution on $t_0$ that is proportional to $\rho$. However, we emphasize that this still produces a proper posterior distribution on allele age (see also Slatkin 2001).
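The proper (finite $T$) version of this prior is straightforward to evaluate numerically. The following is a minimal sketch, assuming time is measured with the present at 0 and the past negative; the function name `allele_age_prior`, the trapezoid rule, and the example demography are illustrative choices and are not taken from the authors' software.

```python
def allele_age_prior(rho, T, n_grid=10_000):
    """Return pi(t0) = rho(t0) / integral_{-T}^{0} rho(s) ds as a function.

    rho is a callable giving relative population size N(t)/N0.
    The normalizing integral uses a simple trapezoid rule (an assumption
    made for illustration; any quadrature would do).
    """
    h = T / n_grid
    total = 0.5 * (rho(-T) + rho(0.0))
    for j in range(1, n_grid):
        total += rho(-T + j * h)
    norm = total * h

    def pi(t0):
        # Allele ages outside [-T, 0] have zero prior density.
        if t0 < -T or t0 > 0.0:
            return 0.0
        return rho(t0) / norm

    return pi


# Hypothetical demography: the population was twice as large before t = -0.5.
rho = lambda t: 2.0 if t < -0.5 else 1.0
pi = allele_age_prior(rho, T=1.0)
```

Under this example demography an allele is a priori twice as likely to have arisen at a given instant before $t = -0.5$ as after it, which mirrors the statement that the prior is proportional to $\rho$.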

Finally, we model the data assuming that at known times $t_1, t_2, \ldots, t_k$ samples of known sizes $n_1, n_2, \ldots, n_k$ chromosomes are taken and $c_1, c_2, \ldots, c_k$ copies of the derived allele are found at the successive time points (Figure 1). Note that it is possible that some of the sampling times are more ancient than $t_0$, the age of the allele.

Figure 1. Taking samples from an allele frequency trajectory. An allele frequency trajectory is simulated from the Wright–Fisher diffusion (solid line). At each time, $t_i$, a sample of size $n_i$ chromosomes is taken and $c_i$ copies of the derived allele are observed. Each point corresponds to the observed allele frequency of sample $i$. Note that $t_1$ is more ancient than the allele age, $t_0$.

Bayesian path augmentation

We are interested in devising a Bayesian method to obtain the posterior distribution on the parameters, $\alpha_1$, $\alpha_2$, and $t_0$, given the sampled allele frequencies and sample times, data that we denote collectively as $D$. Because we are dealing with objects that do not necessarily have distributions that have densities with respect to canonical reference measures, it is convenient in the beginning to treat priors and posteriors as probability measures rather than as density functions. For example, the posterior is the probability measure

$$P(d\alpha_1, d\alpha_2, dt_0 \mid D) = \frac{P(dD \mid \alpha_1, \alpha_2, t_0)\,\pi(d\alpha_1, d\alpha_2, dt_0)}{P(dD)}, \tag{2}$$

where $\pi$ is a joint prior on the model parameters. However, computing the likelihood $P(dD \mid \alpha_1, \alpha_2, t_0)$ is computationally challenging because, implicitly,

$$P(dD \mid \alpha_1, \alpha_2, t_0) = \int P(dD \mid X)\,P(dX \mid \alpha_1, \alpha_2, t_0),$$

where the integral is over the (unobserved, infinite-dimensional) allele frequency path $X = (X_t)_{t \ge t_0}$, $P(\cdot \mid \alpha_1, \alpha_2, t_0)$ is the distribution of a Wright–Fisher diffusion with selection parameters $\alpha_1, \alpha_2$ started at time $t_0$ at the small but arbitrary frequency $x_0$, and

$$P(dD \mid X) = \prod_{i=1}^{k} \binom{n_i}{c_i} X_{t_i}^{c_i} (1 - X_{t_i})^{n_i - c_i}$$

because we assume that sampled allele frequencies at the times $t_1, \ldots, t_k$ are independent binomial draws governed by underlying population allele frequencies at these times. Integrating over the infinite-dimensional path $(X_t)_{t \ge t_0}$ involves either solving partial differential equations numerically or using Monte Carlo methods to find the joint distribution of population allele frequency paths at the times $t_1, \ldots, t_k$.
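The binomial sampling term is the only place where the data enter the model, so it is worth seeing concretely. The sketch below evaluates its logarithm for given population frequencies at the sample times; the function name `log_binom_likelihood` is hypothetical, and the handling of frequencies exactly equal to 0 or 1 (for example, samples older than the allele) is an assumption made for illustration.

```python
import math


def log_binom_likelihood(freqs, sizes, counts):
    """Log of prod_i C(n_i, c_i) x_i^c_i (1 - x_i)^(n_i - c_i).

    freqs  : population frequencies X_{t_i} of the derived allele
    sizes  : sample sizes n_i (chromosomes)
    counts : derived allele counts c_i
    """
    total = 0.0
    for x, n, c in zip(freqs, sizes, counts):
        # A derived copy cannot be sampled when the allele is absent,
        # nor an ancestral copy when the allele is fixed.
        if (x == 0.0 and c > 0) or (x == 1.0 and c < n):
            return -math.inf
        total += math.log(math.comb(n, c))
        if c > 0:
            total += c * math.log(x)
        if c < n:
            total += (n - c) * math.log(1.0 - x)
    return total
```

For instance, a sample taken before the allele arose has population frequency 0 and contributes a factor of 1 when no derived copies are observed.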

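To address this computational difficulty, we introduce a path augmentation method that treats the underlying allele frequency path $(X_t)_{t \ge t_0}$ as an additional parameter. Observe that the posterior may be expanded out to

$$P(d\alpha_1, d\alpha_2, dt_0 \mid D) = \frac{\int P(dD \mid X)\,P(dX \mid \alpha_1, \alpha_2, t_0)\,\pi(d\alpha_1, d\alpha_2, dt_0)}{\iint P(dD \mid X')\,P(dX' \mid \alpha_1', \alpha_2', t_0')\,\pi(d\alpha_1', d\alpha_2', dt_0')},$$

where we use primes to designate dummy variables over which we integrate. Thinking of the path $(X_t)_{t \ge t_0}$ as another parameter and taking the prior distribution for the augmented family of parameters to be

$$P(dX \mid \alpha_1, \alpha_2, t_0)\,\pi(d\alpha_1, d\alpha_2, dt_0),$$

the posterior for the augmented family of parameters is

$$P(d\alpha_1, d\alpha_2, dt_0; dX \mid D) = \frac{P(dD \mid X)\,P(dX \mid \alpha_1, \alpha_2, t_0)\,\pi(d\alpha_1, d\alpha_2, dt_0)}{\iint P(dD \mid X')\,P(dX' \mid \alpha_1', \alpha_2', t_0')\,\pi(d\alpha_1', d\alpha_2', dt_0')}. \tag{3}$$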

We thus see that treating the allele frequency path as a parameter is consistent with the initial “naive” Bayesian approach in that if we integrate the path variable out of the posterior (3) for the augmented family of parameters, then we recover the posterior (2) for the original family of parameters. In practice, this means that marginalizing out the path variable from a Monte Carlo approximation of the augmented posterior gives a Monte Carlo approximation of the original posterior.

Implicit in our setup is the initial frequency $x_0$ at time $t_0$. Under the probability measure governing the Wright–Fisher diffusion, any process started from $x_0 = 0$ will stay there forever. Thus, we would be forced to make an arbitrary choice of some $x_0 > 0$ as the initial frequency of our allele. However, we argue in the Appendix that in the limit as $x_0 \downarrow 0$, we can achieve an improper prior distribution on the space of allele frequency trajectories. We stress that our inference using such an improper prior is not one that arises directly from a generative probability model for the allele frequency path. However, it does arise as a limit as the initial allele frequency $x_0$ goes to zero of inferential procedures based on generative probability models and the limiting posterior distributions are probability distributions. Therefore, the parameters $\alpha_1, \alpha_2, t_0$ retain their meaning, our conclusions can be thought of as approximations to those that we would arrive at for all sufficiently small values of $x_0$, and we are spared the necessity of making an arbitrary choice of $x_0$.

Path likelihoods

Most instances of Bayesian inference in population genetics have hitherto involved finite-dimensional parameters. Recall that for continuous, finite-dimensional parameters, one simply includes the prior density of the parameter value in place of the prior probability. Finite-dimensional parameters usually have densities defined with respect to Lebesgue measure in an appropriate dimension; however, there is no infinite-dimensional Lebesgue measure against which to define a density for our infinite-dimensional augmented path. We thus require a reference measure on the infinite-dimensional space of paths that will play a role analogous to that of Lebesgue measure in the finite-dimensional case, allowing us to write down the probability density for each sampled path.

To see what is involved, suppose we have a diffusion process $(Z_t)_{t \ge t_0}$ that satisfies the SDE

$$dZ_t = a(Z_t, t)\,dt + dB_t, \qquad Z_{t_0} = z_0, \tag{4}$$

where $B$ is a standard Brownian motion (the Wright–Fisher diffusion is not of this form but, as we shall soon see, it can be reduced to it after suitable transformations of time and space). Let $\mathbb{P}$ be the distribution of $(Z_t)_{t \ge t_0}$; this is a probability distribution on the space of continuous paths that start from position $z_0$ at time $t_0$. While the probability assigned by $\mathbb{P}$ to any particular path is zero, we can, under appropriate conditions, make sense of the probability of a path under $\mathbb{P}$ relative to its probability under the distribution of Brownian motion. If we denote by $\mathbb{W}$ the distribution of Brownian motion starting from position $z_0$ at time $t_0$, then Girsanov’s theorem (Girsanov 1960) gives the density of the path segment $(Z_s)_{t_0 \le s \le t}$ under $\mathbb{P}$ relative to $\mathbb{W}$ as

$$\frac{d\mathbb{P}}{d\mathbb{W}}\bigl((Z_s)_{t_0 \le s \le t}\bigr) = \exp\left\{\int_{t_0}^{t} a(Z_s, s)\,dZ_s - \frac{1}{2}\int_{t_0}^{t} a^2(Z_s, s)\,ds\right\}, \tag{5}$$

where the first integral in the exponentiand is an Itô integral. For (5) to hold, the integral $\int_{t_0}^{t} a^2(Z_s, s)\,ds$ must be finite, in which case the Itô integral $\int_{t_0}^{t} a(Z_s, s)\,dZ_s$ is also well defined and finite.

However, the Wright–Fisher SDE (1) is not of the form (4). In particular, the factor multiplying the infinitesimal Brownian increment $dB_t$ (the so-called diffusion coefficient) depends on both space and time. To deal with this issue, we first apply a well-known time transformation (see, e.g., Slatkin and Hudson 1991 and Griffiths and Tavaré 1994) and consider the process $(\tilde{X}_\tau)_{\tau \ge 0}$ given by $\tilde{X}_\tau = X_{f^{-1}(\tau)}$, where

$$f(t) = \int_{t_0}^{t} \frac{1}{\rho(s)}\,ds, \qquad t \ge t_0. \tag{6}$$
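For a piecewise-constant population size history the time change in (6) has a closed form on each epoch. The sketch below is one way to compute it; the representation of $\rho$ by the lists `breaks` and `values`, and the function name `make_time_transform`, are assumptions for illustration rather than the interface of the authors' software.

```python
def make_time_transform(breaks, values, t0):
    """Return f with f(t) = integral_{t0}^{t} ds / rho(s), for t >= t0.

    rho(s) = values[i] on [breaks[i], breaks[i+1]); the last value extends
    to +infinity. breaks must be sorted and satisfy breaks[0] <= t0.
    """
    last = len(values) - 1

    def f(t):
        if t < t0:
            raise ValueError("f is only defined for t >= t0")
        total = 0.0
        for i, rho_i in enumerate(values):
            lo = max(breaks[i], t0)                      # epoch start, clipped at t0
            hi = t if i == last else min(breaks[i + 1], t)  # epoch end, clipped at t
            if hi > lo:
                total += (hi - lo) / rho_i
        return total

    return f


# Hypothetical history: rho = 2 on [-1, -0.5), rho = 1 afterwards; allele age -0.8.
f = make_time_transform(breaks=[-1.0, -0.5], values=[2.0, 1.0], t0=-0.8)
```

Time runs more slowly on the transformed scale while the population is large, which is why drift contributes less during those epochs.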

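It is not hard to see that $(\tilde{X}_\tau)_{\tau \ge 0}$ satisfies the following SDE with a time-independent diffusion coefficient,

$$d\tilde{X}_\tau = \rho(f^{-1}(\tau))\,\tilde{X}_\tau(1-\tilde{X}_\tau)\bigl(\alpha_1(1-2\tilde{X}_\tau) + \alpha_2\tilde{X}_\tau\bigr)\,d\tau + \sqrt{\tilde{X}_\tau(1-\tilde{X}_\tau)}\,d\tilde{B}_\tau, \qquad \tilde{X}_0 = x_0,$$

where $\tilde{B}$ is a standard Brownian motion. Next, we employ an angular space transformation first suggested by Fisher (1922), $Y_\tau = \arccos(1 - 2\tilde{X}_\tau)$. Applying Itô’s lemma (Itô 1944) shows that $(Y_\tau)_{\tau \ge 0}$ is a diffusion that satisfies the SDE

$$dY_\tau = \frac{1}{4}\Bigl(\rho(f^{-1}(\tau))\sin(Y_\tau)\bigl(\alpha_2 + (2\alpha_1 - \alpha_2)\cos(Y_\tau)\bigr) - 2\cot(Y_\tau)\Bigr)\,d\tau + dW_\tau, \qquad Y_0 = y_0 = \arccos(1 - 2x_0), \tag{7}$$

where $W$ is a standard Brownian motion. If the process $X$ hits either of the boundary points $0, 1$, then it stays there, and the same is true of the time and space transformed process $Y$ for its boundary points $0, \pi$.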
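The angular transformation and the drift of the transformed process are simple to code. The following sketch assumes the drift has the form given in (7); the function names are hypothetical and introduced only for illustration.

```python
import math


def to_angle(x):
    """Fisher's angular transformation y = arccos(1 - 2x), mapping [0, 1] to [0, pi]."""
    return math.acos(1.0 - 2.0 * x)


def from_angle(y):
    """Inverse transformation x = (1 - cos y) / 2."""
    return 0.5 * (1.0 - math.cos(y))


def angular_drift(y, rho, alpha1, alpha2):
    """Infinitesimal mean of Y in (7) at an interior point 0 < y < pi.

    rho is the relative population size at the corresponding original time.
    """
    selection = rho * math.sin(y) * (alpha2 + (2.0 * alpha1 - alpha2) * math.cos(y))
    return 0.25 * (selection - 2.0 / math.tan(y))
```

At frequency $x = 1/2$ (that is, $y = \pi/2$) the cotangent term vanishes and the drift reduces to $\rho\,\alpha_2/4$.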

The restriction of the distribution of the time and space transformed process $Y$ to some set of paths that do not hit the boundary is absolutely continuous with respect to the distribution of standard Brownian motion restricted to the same set; that is, the distribution of $Y$ restricted to such a set of paths has a density with respect to the distribution of Brownian motion restricted to the same set. However, the infinitesimal mean in (7) (that is, the term multiplying $d\tau$) becomes singular as $Y_\tau$ approaches the boundary points 0 and $\pi$, corresponding to the boundary points 0 and 1 for allele frequencies. These singularities prevent the process $Y$ from reentering the interior of its state space and ensure that a Wright–Fisher path will be absorbed when the allele is either fixed or lost. A consequence is that the density of the distribution of $Y$ relative to that of a Brownian motion blows up as the path approaches the boundary. We are modeling the appearance of a new mutation in terms of a Wright–Fisher diffusion starting at some small initial frequency $x_0$ at time $t_0$ and we want to perform our parameter inference in such a way that we get meaningful answers as $x_0 \downarrow 0$. This suggests that rather than working with the distribution $\mathbb{W}$ of Brownian motion as a reference measure it may be more appropriate to work with a tractable diffusion process that exhibits similar behavior near the boundary point 0.

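To start making this idea of matching singularities more precise, consider a diffusion process $(\bar{Z}_t)_{t \ge t_0}$ that satisfies the SDE

$$d\bar{Z}_t = b(\bar{Z}_t, t)\,dt + d\bar{B}_t, \qquad \bar{Z}_{t_0} = z_0, \tag{8}$$

where $\bar{B}$ is a standard Brownian motion. Write $\mathbb{Q}$ for the distribution of the diffusion process $(\bar{Z}_t)_{t \ge t_0}$ and recall that $\mathbb{P}$ is the distribution of a solution of (4). If $(Z_s)_{t_0 \le s \le t}$ is a segment of path such that both $\int_{t_0}^{t} a^2(Z_s, s)\,ds < \infty$ and $\int_{t_0}^{t} b^2(Z_s, s)\,ds < \infty$, then

$$\frac{d\mathbb{P}}{d\mathbb{Q}}\bigl((Z_s)_{t_0 \le s \le t}\bigr) = \frac{d\mathbb{P}}{d\mathbb{W}}\bigl((Z_s)_{t_0 \le s \le t}\bigr) \Big/ \frac{d\mathbb{Q}}{d\mathbb{W}}\bigl((Z_s)_{t_0 \le s \le t}\bigr) = \exp\left\{\int_{t_0}^{t} \bigl(a(Z_s, s) - b(Z_s, s)\bigr)\,dZ_s - \frac{1}{2}\int_{t_0}^{t} \bigl(a^2(Z_s, s) - b^2(Z_s, s)\bigr)\,ds\right\}. \tag{9}$$

Note that the right-hand side will stay bounded if one considers a sequence of paths, indexed by $\eta$, $(Z_s^\eta)_{t_0 \le s \le t}$, with $\int_{t_0}^{t} a^2(Z_s^\eta, s)\,ds < \infty$ and $\int_{t_0}^{t} b^2(Z_s^\eta, s)\,ds < \infty$, provided that $\int_{t_0}^{t} \bigl(a^2(Z_s^\eta, s) - b^2(Z_s^\eta, s)\bigr)\,ds$ stays bounded. These manipulations with densities may seem somewhat heuristic, but they can be made rigorous and, moreover, the form of $d\mathbb{P}/d\mathbb{Q}$ follows from an extension of Girsanov’s theorem that gives the density of $\mathbb{P}$ with respect to $\mathbb{Q}$ directly without using the densities with respect to $\mathbb{W}$ as intermediaries (see, for example, Kallenberg 2002, theorem 18.10).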

We wish to apply this observation to the time and space transformed Wright–Fisher diffusion of (7). Because

$$-\frac{1}{2}\cot(y) + \frac{1}{4}\rho(f^{-1}(t))\sin(y)\bigl((2\alpha_1 - \alpha_2)\cos(y) + \alpha_2\bigr) = -\frac{1}{2y} + O(y)$$

when $y$ is small, an appropriate reference process should have infinitesimal mean $b(y, t) \approx -1/(2y)$ as $y \downarrow 0$. Following suggestions by Schraiber et al. (2013) and Jenkins (2013), we compute path densities relative to the distribution of the Bessel(0) process, a process that is the solution of the SDE

$$d\bar{Y}_t = -\frac{1}{2\bar{Y}_t}\,dt + d\bar{B}_t, \qquad \bar{Y}_0 = y_0 = \arccos(1 - 2x_0) \tag{10}$$

up until the first time that $\bar{Y}_t$ hits 0, after which time $\bar{Y}_t$ stays at 0 (Revuz and Yor 1999, Chap. XI).
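The small-$y$ behavior used to choose the reference process can be checked with the standard series for the cotangent and sine. The worked expansion below is an illustration added here, with $\rho$ standing for $\rho(f^{-1}(t))$; it shows that the drift of (7) and the Bessel(0) drift differ only by a term that vanishes linearly at the boundary.

```latex
\begin{align*}
\cot(y) &= \frac{1}{y} - \frac{y}{3} + O(y^3), \qquad
\sin(y)\bigl((2\alpha_1-\alpha_2)\cos(y) + \alpha_2\bigr) = 2\alpha_1 y + O(y^3), \\
-\frac{1}{2}\cot(y) &+ \frac{1}{4}\rho\,\sin(y)\bigl((2\alpha_1-\alpha_2)\cos(y) + \alpha_2\bigr)
 = -\frac{1}{2y} + \Bigl(\frac{1}{6} + \frac{\alpha_1\rho}{2}\Bigr)y + O(y^3).
\end{align*}
```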

As we show more explicitly in the Appendix, this choice of dominating measure allows us to arrive at a proper posterior distribution as we send the initial frequency of the allele down to 0. In brief, if we write $\mathbb{P}_{y_0}$ and $\mathbb{Q}_{y_0}$ for the respective distributions of the solutions of (7) and (10) to emphasize the dependence on $y_0$ (equivalently, on the initial allele frequency $x_0$), then there are σ-finite measures $\mathbb{P}_0$ and $\mathbb{Q}_0$ with infinite total mass such that for each $\epsilon > 0$

$$\lim_{y_0 \downarrow 0} \mathbb{P}_{y_0}\bigl((Y_t)_{t \ge \epsilon} \in \cdot \mid Y_\epsilon > 0\bigr) = \mathbb{P}_0\bigl((Y_t)_{t \ge \epsilon} \in \cdot\bigr) \big/ \mathbb{P}_0(Y_\epsilon > 0)$$

and

$$\lim_{y_0 \downarrow 0} \mathbb{Q}_{y_0}\bigl((\bar{Y}_t)_{t \ge \epsilon} \in \cdot \mid \bar{Y}_\epsilon > 0\bigr) = \mathbb{Q}_0\bigl((\bar{Y}_t)_{t \ge \epsilon} \in \cdot\bigr) \big/ \mathbb{Q}_0(\bar{Y}_\epsilon > 0),$$

where the numerators and denominators in the last two equations are all finite. Moreover, $\mathbb{P}_0$ has a density with respect to $\mathbb{Q}_0$ that arises by naively taking limits as $y_0 \downarrow 0$ in the functional form of the density of $\mathbb{P}_{y_0}$ with respect to $\mathbb{Q}_{y_0}$ [we say “naively” because $\mathbb{P}_{y_0}$ and $\mathbb{Q}_{y_0}$ assign all of their mass to paths that start at position $y_0 = \arccos(1 - 2x_0)$ at time 0, whereas $\mathbb{P}_0$ and $\mathbb{Q}_0$ assign all of their mass to paths that start at position 0 at time 0, and so the set of paths at which it is relevant to compute the density changes as $y_0 \downarrow 0$]. As we have already remarked, the limit of our Bayesian inferential procedure may be thought of as Bayesian inference with an improper prior, but we stress that the resulting posterior is proper.

The notion of the infinite measure $\mathbb{Q}_0$ may seem somewhat forbidding, but this measure is characterized by the simple properties

$$\mathbb{Q}_0(\bar{Y}_\epsilon \in dy) = \frac{y^2}{\epsilon^2}\exp\left\{-\frac{y^2}{2\epsilon}\right\}dy, \qquad y > 0,$$

so that $\mathbb{Q}_0(\bar{Y}_\epsilon > 0) = \sqrt{\pi/2}\,(1/\sqrt{\epsilon})$, and conditional on the event $\{\bar{Y}_\epsilon = y\}$ the evolution of $(\bar{Y}_t)_{t \ge \epsilon}$ is exactly that of the Bessel(0) process started at position $y$ at time $\epsilon$. In the Appendix, we provide a more explicit construction of the measure $\mathbb{Q}_0$ as part of our derivation of the proposal ratios in our MCMC algorithm. Moreover, conditional on the event $\{\bar{Y}_s = a, \bar{Y}_u = b\}$ for $0 \le s < u$ and $a, b > 0$, the evolution of the “bridge” $(\bar{Y}_t)_{s \le t \le u}$ is the same as that of the corresponding bridge for a Bessel(4) process; a Bessel(4) process satisfies the SDE

$$d\hat{Y}_t = \frac{3}{2\hat{Y}_t}\,dt + d\hat{B}_t.$$

Very importantly for the sake of simulations, the Bessel(4) process is just the radial part of a four-dimensional standard Brownian motion; in particular, this process started at 0 leaves immediately and never returns.
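The radial representation suggests a direct way to draw a Bessel(4) bridge that starts at 0, which is the case needed when a new allele age is proposed. The sketch below takes the norm of a four-dimensional Brownian bridge from the origin to a point at distance `y_end`; by rotational symmetry the choice of that point does not matter. This shortcut is exact only for a bridge started at 0 (a nonzero starting point would also require sampling the angle between the end points), and the function name and discretization are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def bessel4_bridge_from_zero(y_end, T, n_steps, rng=random):
    """Discretized Bessel(4) bridge from 0 at time 0 to y_end at time T.

    Built as the Euclidean norm of a 4-d Brownian bridge from the origin
    to (y_end, 0, 0, 0), sampled exactly on a regular grid of n_steps steps.
    """
    dt = T / n_steps
    end = (y_end, 0.0, 0.0, 0.0)
    pos = [0.0, 0.0, 0.0, 0.0]
    path = [0.0]
    for j in range(1, n_steps):
        remaining = T - (j - 1) * dt  # time left before the end point is reached
        # Conditional law of a Brownian bridge one step ahead:
        # mean moves a fraction dt/remaining toward the end point.
        frac = dt / remaining
        sd = math.sqrt(dt * (remaining - dt) / remaining)
        pos = [p + frac * (e - p) + sd * rng.gauss(0.0, 1.0) for p, e in zip(pos, end)]
        path.append(math.sqrt(sum(p * p for p in pos)))
    path.append(y_end)
    return path
```

Because the path is a norm in four dimensions, its interior values are strictly positive, in line with the remark that the process leaves 0 immediately and never returns.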

Note that the Bessel(0) process arises naturally because our space transformation $x \mapsto \arccos(1 - 2x) = \int_0^x \bigl(1/\sqrt{w(1-w)}\bigr)\,dw$ is approximately $x \mapsto \int_0^x (1/\sqrt{w})\,dw = 2\sqrt{x}$ when $x > 0$ is small. Interestingly, a multiple of the square of the Bessel(0) process, sometimes called Feller’s continuous state branching process, arises naturally as an approximation to the Wright–Fisher diffusion for low frequencies and has a long history in population genetics (Haldane 1927; Feller 1951).

The joint likelihood of the data and the path

To write down the full likelihood of the observations and the path, we make the assumption that the population size function $\rho(t)$ is continuously differentiable except at a finite set of times $d_1 < d_2 < \cdots < d_M$. Further, we require that $\rho(d_i+) = \lim_{t \downarrow d_i} \rho(t)$ exists and is equal to $\rho(d_i)$ while $\rho(d_i-) = \lim_{t \uparrow d_i} \rho(t)$ also exists [although it may not necessarily equal $\rho(d_i)$].

We can write the joint likelihood of the data and the path as

$$L(D, (Y_t)_{t \ge 0} \mid \alpha_1, \alpha_2, t_0) = F(D \mid (Y_t)_{t \ge 0}, t_0)\,\frac{d\mathbb{P}}{d\mathbb{Q}}\bigl((Y_t)_{t \ge 0}; \alpha_1, \alpha_2, t_0\bigr),$$

where $F(\cdot)$ is the binomial sampling probability of the observed allele frequencies, $\mathbb{P}$ is the distribution of transformed Wright–Fisher paths, and $\mathbb{Q}$ is the distribution of Bessel(0) paths. In the Appendix, we show that

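$$\begin{aligned}
L(D, (Y_\tau)_{0 \le \tau \le f(t_k)} \mid \alpha_1, \alpha_2, t_0) ={}& \exp\Biggl\{ A(Y_{f(t_k)}, t_k) + A(Y_{f(d_m)}, d_m-) - \bigl(A(Y_{f(d_K)}, d_K) + A(Y_{f(t_0)}, t_0)\bigr) \\
& \quad + \sum_{i=m}^{K-1} \bigl[A(Y_{f(d_{i+1})}, d_{i+1}-) - A(Y_{f(d_i)}, d_i)\bigr] \\
& \quad - \int_{t_0}^{t_k} B(Y_{f(s)}, s)\,ds - \frac{1}{2}\int_{t_0}^{t_k} C(Y_{f(s)}, s)\,ds - \frac{1}{2}\int_{t_0}^{t_k} D(Y_{f(s)}, s)\,ds \Biggr\} \\
& \times \prod_{i=1}^{k} \binom{n_i}{c_i} \left(\frac{1 - \cos(Y_{f(t_i)})}{2}\right)^{c_i} \left(\frac{1 + \cos(Y_{f(t_i)})}{2}\right)^{n_i - c_i},
\end{aligned} \tag{11}$$

where $f$ is as in (6), $m = \min\{i : d_i > t_0\}$ and $K = \max\{i : d_i < t_k\}$, $A(y, d_i-)$ denotes $A$ evaluated with the left limit $\rho(d_i-)$ in place of $\rho(d_i)$, and

$$\begin{aligned}
A(y, t) &= \frac{\log(y)}{2} - \frac{1}{8}\Bigl(\rho(t)\cos(y)\bigl(2\alpha_2 + (2\alpha_1 - \alpha_2)\cos(y)\bigr) + 4\log(\sin(y))\Bigr), \\
B(y, t) &= -\frac{1}{8}\frac{d\rho}{dt}(t)\cos(y)\bigl(2\alpha_2 + (2\alpha_1 - \alpha_2)\cos(y)\bigr), \\
C(y, t) &= \frac{1}{4}\bigl(\alpha_2\cos(y) + (2\alpha_1 - \alpha_2)\cos(2y)\bigr) + \frac{\csc(y)^2}{2\rho(t)} - \frac{1}{2y^2\rho(t)}, \\
D(y, t) &= \frac{1}{16\rho(t)}\Bigl(\rho(t)\sin(y)\bigl(\alpha_2 + (2\alpha_1 - \alpha_2)\cos(y)\bigr) - 2\cot(y)\Bigr)^2 - \frac{1}{4y^2\rho(t)}.
\end{aligned}$$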

While this expression may appear complicated, it has the important feature that, unlike the form of the likelihood that would arise by simply applying Girsanov’s theorem, it involves only Lebesgue (indeed Riemann) integrals and not Itô integrals, which, as we recall in the Appendix, are known from the literature to be potentially difficult to compute numerically.

Metropolis–Hastings algorithm

We now describe a Markov chain Monte Carlo method for Bayesian inference of the parameters $\alpha_1$, $\alpha_2$, and $t_0$, along with the allele frequency path $(X_t)_{t \ge t_0}$ [equivalently, the transformed path $(Y_\tau)_{\tau \ge 0}$]. While updates to the selection parameters $\alpha_1$ and $\alpha_2$ do not require updating the path, updating the time $t_0$ at which the derived allele arose requires proposing updates to the segment of path from $t_0$ up to the time of the first sample with a nonzero number of derived alleles. Additionally, we require proposals to update small sections of the path without updating any parameters and proposals to update the allele frequency at the most recent sample time.

Interior path updates:

To update a section of the allele frequency path, we first choose a time $s_1 \in (t_0, t_k)$ uniformly at random and then choose a time $s_2$ that is a fixed fraction of the path length subsequent to $s_1$. We prefer this approach of updating a fixed fraction of the path to an alternative strategy of holding $s_2 - s_1$ constant because paths for very strong selection may be quite short. Recalling the definition of $f$ from (6), we subsequently propose a new segment of transformed path between the times $f(s_1)$ and $f(s_2)$ while keeping the values $Y_{f(s_1)}$ and $Y_{f(s_2)}$ fixed (Figure 2A). Such a path that is conditioned to take specified values at both end points of the interval over which it is defined is called a bridge. By updating small portions of the path instead of the whole path at once, our Metropolis–Hastings algorithm is able to stay in regions of path space with high posterior probability. If we instead drew the whole path each time, we would target the posterior distribution much less efficiently.

Figure 2. Illustration of path updates. Solid circles correspond to the same sample frequencies as in Figure 1. The shaded line in each panel is the current allele frequency trajectory and the dashed black lines are the proposed updates. In A, an interior section of path is proposed between points $s_1$ and $s_2$. In B, a new allele age, $t_0'$, is proposed and a new path is drawn between $t_0'$ and $t_s$. In C, a new most recent allele frequency $Y'_{t_k}$ is proposed and a new path is drawn between $t_f$ and $t_k$.

Noting that bridges must be sampled against the transformed timescale, the best bridges for the allele frequency path would be realizations of Wright–Fisher bridges themselves. However, sampling Wright–Fisher bridges is challenging (but see Schraiber et al. 2013; Jenkins and Spano 2015), so we instead opt to sample bridges for the transformed path from the Bessel(0) process. Sampling Bessel(0) bridges can be accomplished by first sampling Bessel(4) bridges (as described in Schraiber et al. 2013) and then recognizing that a Bessel(4) process is the same as a Bessel(0) process conditioned to never hit 0 and hence has the same bridges. In the language of the general theory of Markov processes, the Bessel(0) and Bessel(4) processes are Doob h-transforms of each other and it is well known that processes related in this way share the same bridges. We denote by $(Y'_\tau)_{\tau \ge 0}$ the path that has the proposed bridge spliced in between times $f(s_1)$ and $f(s_2)$ and coincides with $(Y_\tau)_{\tau \ge 0}$ outside the interval $[f(s_1), f(s_2)]$.

In the Appendix, we show that the acceptance probability in this case is simply

$$\min\left\{1, \frac{L(D, (Y'_\tau)_{f(s_1) \le \tau \le f(s_2)} \mid \alpha_1, \alpha_2, t_0)}{L(D, (Y_\tau)_{f(s_1) \le \tau \le f(s_2)} \mid \alpha_1, \alpha_2, t_0)}\right\}. \tag{12}$$

Note that we need to compute the likelihood ratio only for the segment of transformed path that changed between the times $f(s_1)$ and $f(s_2)$.
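The accept/reject step in (12) is usually carried out on the log scale to avoid numerical underflow. The following is a generic sketch of that step, assuming the log-likelihoods of the changed segment have already been computed; the function name `mh_accept` and the optional `log_hastings` argument (for the extra proposal and prior ratios that appear in the other update types) are hypothetical.

```python
import math
import random


def mh_accept(log_lik_new, log_lik_old, log_hastings=0.0, rng=random):
    """Return True with probability min(1, exp(log_lik_new - log_lik_old + log_hastings))."""
    log_ratio = log_lik_new - log_lik_old + log_hastings
    if log_ratio >= 0.0:
        return True
    # exp() of a very negative number underflows harmlessly to 0.0 (always reject).
    return rng.random() < math.exp(log_ratio)
```

For the interior path update the proposal is a bridge of the reference process itself, so `log_hastings` stays at its default of zero.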

Allele age updates:

The first sample time with a nonzero count of the derived allele (Figure 2B) is $t_s$, where $s = \min\{i : c_i > 0\}$. We must have $t_0 < t_s$. Along with proposing a new value $t_0'$ of the allele age $t_0$ we propose a new segment of the allele frequency path from time $t_0'$ to time $t_s$. Changing the allele age $t_0$ to some new proposed value $t_0'$ changes the definition of the function $f$ in (6). Write $f'(t) = \int_{t_0'}^{t} (1/\rho(s))\,ds$, where we stress that the prime does not denote a derivative. The proposed transformed path $Y'$ consists of a new piece of path that goes from location 0 at time 0 to location $Y_{f(t_s)}$ at time $f'(t_s)$ and then has $Y'_{f'(t)} = Y_{f(t)}$ for $t \ge t_s$. Recall that we use the improper prior $\rho(t_0)$ for $t_0$, which reflects the fact that an allele is more likely to arise during times of large population size (Slatkin 2001). In the Appendix, we show that the acceptance probability is

$$\min\left\{1, \frac{L(D, (Y'_\tau)_{0 \le \tau \le f'(t_s)} \mid \alpha_1, \alpha_2, t_0')}{L(D, (Y_\tau)_{0 \le \tau \le f(t_s)} \mid \alpha_1, \alpha_2, t_0)} \cdot \frac{\psi(Y'_{f'(t_s)}; f'(t_s))}{\psi(Y_{f(t_s)}; f(t_s))} \cdot \frac{q(t_0 \mid t_0')}{q(t_0' \mid t_0)} \cdot \frac{\rho(t_0')}{\rho(t_0)}\right\}, \tag{13}$$

where, in the notation of the Path likelihoods section,

$$\psi(y; \epsilon) = \frac{y^2}{\epsilon^2}\exp\left\{-\frac{y^2}{2\epsilon}\right\} = \frac{\mathbb{Q}_0(\bar{Y}_\epsilon \in dy)}{dy} \tag{14}$$

is the density of the so-called entrance law for the Bessel(0) process that appears in the characterization of the σ-finite measure $\mathbb{Q}_0$ and $q(t_0' \mid t_0)$ is the proposal distribution of $t_0'$ (in practice, we use a half-truncated normal distribution centered at $t_0$, with the upper truncation occurring at the first time of nonzero observed allele frequency).
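The factors other than the likelihood ratio in the allele age acceptance probability can be assembled as follows. This is a sketch under stated assumptions: $\psi$ is taken to be $(y^2/\epsilon^2)\exp\{-y^2/(2\epsilon)\}$ as we read it from (14), the proposal is a normal centered at the current age and truncated above at $t_s$ (so the Gaussian kernels cancel and only the normalizing constants remain), and all function and argument names are hypothetical.

```python
import math


def log_psi(y, eps):
    """Log entrance-law density, assuming psi(y; eps) = (y^2 / eps^2) exp(-y^2 / (2 eps))."""
    return 2.0 * math.log(y) - 2.0 * math.log(eps) - y * y / (2.0 * eps)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_q_ratio(t0_old, t0_new, t_s, sd):
    """log q(t0_old | t0_new) - log q(t0_new | t0_old) for a normal proposal
    centered at the current value and truncated above at t_s."""
    return math.log(norm_cdf((t_s - t0_old) / sd)) - math.log(norm_cdf((t_s - t0_new) / sd))


def log_age_hastings(y_s, tau_old, tau_new, t0_old, t0_new, t_s, sd, rho):
    """Log of the psi, proposal, and prior ratios in the allele age update.

    y_s     : transformed frequency at the first sample with derived alleles
    tau_old : f(t_s) under the current age; tau_new : f'(t_s) under the proposal
    rho     : callable giving relative population size
    """
    return (log_psi(y_s, tau_new) - log_psi(y_s, tau_old)
            + log_q_ratio(t0_old, t0_new, t_s, sd)
            + math.log(rho(t0_new)) - math.log(rho(t0_old)))
```

The returned value would be added to the log-likelihood ratio of the changed path segment before the usual accept/reject comparison.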

Most recent allele frequency update:

While the allele frequency at sample times $t_1, t_2, \ldots, t_{k-1}$ is updated implicitly by the interior path update, we update the allele frequency at the most recent sample time $t_k$ separately (note that the most recent allele frequency is not an additional parameter, but simply a random variable with a distribution implied by the Wright–Fisher model on paths). We do this by first proposing a new allele frequency $Y'_{f(t_k)}$ and then proposing a new bridge from $Y_{f(t_f)}$ to $Y'_{f(t_k)}$, where $t_f \in (t_{k-1}, t_k)$ is a fixed time (Figure 2C). If $q(Y'_{f(t_k)} \mid Y_{f(t_k)})$ is the proposal density for $Y'_{f(t_k)}$ given $Y_{f(t_k)}$ [in practice, we use a truncated normal distribution centered at $Y_{f(t_k)}$ and truncated at 0 and $\pi$], then, arguing along the same lines as the interior path update and the allele age update, we accept this update with probability

$$\min\left\{1, \frac{L(D, (Y'_\tau)_{f(t_f) \le \tau \le f(t_k)} \mid \alpha_1, \alpha_2, t_0)}{L(D, (Y_\tau)_{f(t_f) \le \tau \le f(t_k)} \mid \alpha_1, \alpha_2, t_0)} \cdot \frac{q(Y_{f(t_k)} \mid Y'_{f(t_k)})}{q(Y'_{f(t_k)} \mid Y_{f(t_k)})} \cdot \frac{Q(Y_{f(t_f)}, Y'_{f(t_k)}; f(t_k) - f(t_f))}{Q(Y_{f(t_f)}, Y_{f(t_k)}; f(t_k) - f(t_f))}\right\}, \tag{15}$$

where

$$Q(x, y; t) = \frac{y}{t}\exp\left\{-\frac{x^2 + y^2}{2t}\right\} I_1\!\left(\frac{xy}{t}\right) \tag{16}$$

is the transition density of the Bessel(0) process [with $I_1(\cdot)$ being the modified Bessel function of the first kind with index 1; see Knight 1981, section 4.3.6]. Again, it is necessary to compute the likelihood ratio only for the segment of transformed path that changed between the times $f(t_f)$ and $f(t_k)$.

Updates to $\alpha_1$ and $\alpha_2$:

Updates to $\alpha_1$ and $\alpha_2$ are conventional scalar parameter updates. For example, letting $q(\alpha_1' \mid \alpha_1)$ be the proposal density for the new value of $\alpha_1$, we accept the new proposal with probability

$$\min\left\{1, \frac{L(D, (Y_\tau)_{\tau \ge 0} \mid \alpha_1', \alpha_2, t_0)}{L(D, (Y_\tau)_{\tau \ge 0} \mid \alpha_1, \alpha_2, t_0)} \cdot \frac{q(\alpha_1 \mid \alpha_1')}{q(\alpha_1' \mid \alpha_1)} \cdot \frac{\pi(\alpha_1', \alpha_2, t_0)}{\pi(\alpha_1, \alpha_2, t_0)}\right\}.$$

The acceptance probability for $\alpha_2$ is similar. For both $\alpha_1$ and $\alpha_2$, we use a heavy-tailed Cauchy prior with median 0 and scale parameter 100, and we take the parameters $\alpha_1, \alpha_2, t_0$ to be independent under the prior distribution. In addition, we use a normal proposal distribution, centered around the current value of the parameter. Here, it is necessary to compute the likelihood across the whole path.
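A scalar update of this kind can be sketched in a few lines. In the example below the path log-likelihood is replaced by an arbitrary callable `log_lik`, which is a hypothetical stand-in for the whole-path likelihood; because the normal proposal is symmetric, the proposal ratio cancels and only the likelihood and Cauchy prior ratios remain.

```python
import math
import random


def log_cauchy(x, scale=100.0):
    """Log density of a Cauchy distribution with median 0 and the given scale."""
    return -math.log(math.pi * scale) - math.log1p((x / scale) ** 2)


def update_alpha(alpha, log_lik, sd, rng=random):
    """One random-walk Metropolis step for a selection parameter.

    log_lik : callable returning the log-likelihood of the current path
              as a function of the parameter (hypothetical stand-in)
    sd      : standard deviation of the symmetric normal proposal
    """
    proposal = alpha + sd * rng.gauss(0.0, 1.0)
    log_ratio = (log_lik(proposal) + log_cauchy(proposal)
                 - log_lik(alpha) - log_cauchy(alpha))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return alpha
```

With a scale of 100 the prior is nearly flat over the range of selection coefficients considered in the simulations, while still penalizing extremely large values.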

Data availability

C++ software implementing the method described in this article is freely available under a GNU Public License at https://github.com/Schraiber/selection.

Results

We first test our method using simulated data to assess its performance and then apply it to two real data sets from horses.

Simulation performance

To test the accuracy of our MCMC approach, we performed two sets of simulations. First, we simulated data under a constant demographic history to assess the quality of parameter inference under a simple model. Second, we simulated data under the horse demographic history of Der Sarkissian et al. (2015) and compared inferences performed with and without accounting for the demographic history.

In the constant demography simulations, we simulated allele frequency trajectories with ages uniformly distributed between 0.1 and 0.3 diffusion time units ago, evolving with α1 and α2 uniformly distributed between 0 and 100. We simulated allele frequency trajectories using an Euler approximation to the Wright–Fisher SDE (1) with $\rho(t) \equiv 1$. At each time point from 0.4 diffusion time units before the present to the present (time 0.0), in steps of 0.05, we simulated the sampling of 20 chromosomes.
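A minimal sketch of such a simulation is given below. It assumes the standard general diploid form of the drift, $x(1-x)(\alpha_1(1-2x)+\alpha_2 x)$, with unit population size; the starting frequency `x0`, the step size `dt`, and the function names are illustrative assumptions rather than the settings used in the paper.

```python
import math
import random

def simulate_trajectory(alpha1, alpha2, age, x0=0.01, dt=1e-4, rng=random):
    """Euler path of the Wright-Fisher SDE with rho(t) = 1, from time -age to 0."""
    n_steps = int(round(age / dt))
    x, path = x0, [x0]
    for _ in range(n_steps):
        drift = x * (1.0 - x) * (alpha1 * (1.0 - 2.0 * x) + alpha2 * x)
        noise = math.sqrt(x * (1.0 - x) * dt) * rng.gauss(0.0, 1.0)
        x = min(1.0, max(0.0, x + drift * dt + noise))   # clamp to [0, 1]
        path.append(x)
    return path

def sample_counts(path, age, dt=1e-4, n_chrom=20, rng=random):
    """Derived allele counts at times -0.4, -0.35, ..., 0.0."""
    counts = []
    for k in range(9):
        t = -0.4 + 0.05 * k
        if t < -age:
            freq = 0.0                                   # allele has not yet arisen
        else:
            idx = min(len(path) - 1, int(round((t + age) / dt)))
            freq = path[idx]
        counts.append(sum(rng.random() < freq for _ in range(n_chrom)))
    return counts

rng = random.Random(1)
path = simulate_trajectory(alpha1=50.0, alpha2=100.0, age=0.2, rng=rng)
counts = sample_counts(path, age=0.2, rng=rng)
print(counts)
```

Once the path reaches 0 or 1 both the drift and the noise vanish, so the clamped path stays absorbed at the boundary.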

We then ran the MCMC algorithm for 1,000,000 generations, sampling every 1000 generations to obtain 1000 MCMC samples for each simulation. After discarding the first 500 samples from each MCMC run as burn-in, we computed the effective sample size of the allele age estimate, using the R package coda (Plummer et al. 2006). For the analysis of the simulations, we included only simulations that had an effective sample size >150 for the allele age, resulting in retaining 744 of 1000 simulations.
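To make the role of the effective sample size concrete, here is a simple autocorrelation-based estimator. It is not the estimator used by coda (which is based on the spectral density at frequency zero); the truncation rule, the function name, and the synthetic chains are assumptions chosen for illustration.

```python
import random

def effective_sample_size(samples):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the first non-positive lag."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]
    var = sum(c * c for c in centred) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

rng = random.Random(3)
iid = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # independent draws
ar, x = [], 0.0
for _ in range(1000):                              # strongly autocorrelated AR(1) chain
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    ar.append(x)
ess_iid = effective_sample_size(iid)
ess_ar = effective_sample_size(ar)
print(round(ess_iid), round(ess_ar))
```

A chain that mixes slowly carries far fewer independent draws than its nominal length, which is why runs with a low effective sample size for the allele age were excluded.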

Because our MCMC analysis provides a full posterior distribution on parameter values, we summarized the results by computing the maximum a posteriori estimate of each parameter. We find that across the range of simulated α1 values, estimation is quite accurate (Figure 3A). There is some downward bias for large true values of α1, indicating the influence of the prior. On the other hand, the strength of selection in favor of the homozygote, α2, is less well estimated, with a more pronounced downward bias (Figure 3B). This is largely because most simulated alleles do not reach sufficiently high frequency for homozygotes to be common. Hence, there is very little information regarding the fitness of the homozygote. Allele age is estimated accurately, although there is a slight bias toward estimating a more recent age than the truth (Figure 3C).

Figure 3.

Maximum a posteriori estimates of different parameters. Panels A, B and C show the results for α1, α2, and t0, respectively. Each panel shows the true value of a parameter on the x-axis, and the inferred value is on the y-axis. Dashed line is y=x.

When simulating under the horse demographic history, we drew 1000 allele ages with probability proportional to ρ(t) for t between 0.1 and 0.3 diffusion time units ago. As in the simulations with constant demography, we drew α1 and α2 uniformly between 0 and 100 and then simulated allele frequency trajectories, using an Euler approximation to (1) with ρ(t) given by the history inferred by Der Sarkissian et al. (2015). The sampling scheme is identical to that of the constant demography simulations.

We ran our simulated data through two separate MCMC pipelines, one accounting for the true simulated demographic history and the other assuming a constant population size. All other settings were identical to the analysis of the data simulated under constant demography. We retained MCMC runs where the sampling likelihood, path likelihood, α1 estimate, α2 estimate, and allele age estimate all had effective sample sizes >50, resulting in 561 analyses retained from the inference with variable demography, 647 analyses retained from the inference with constant demography, and 454 analyses that were retained in both.

To quantify the overall impact of demographic model misspecification on parameter inference, we approximated the posterior root mean square error of a parameter (generically θ) by averaging over the posterior distribution,

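$$\mathrm{RMSE}(\theta)=\left(\int(\hat\theta-\theta)^2\,P(\hat\theta\mid D)\,d\hat\theta\right)^{1/2}\approx\left(\frac{1}{N}\sum_i(\hat\theta_i-\theta)^2\right)^{1/2},$$

where the sum is over the $N$ retained MCMC samples $\hat\theta_i$.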
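The Monte Carlo approximation amounts to a few lines of code. The sketch below assumes the retained samples are available as a plain list of numbers; the function name is illustrative.

```python
import math

def posterior_rmse(samples, true_value):
    """Root mean square deviation of posterior samples from the true parameter value."""
    n = len(samples)
    return math.sqrt(sum((s - true_value) ** 2 for s in samples) / n)

print(posterior_rmse([1.0, 2.0, 3.0], 2.0))
```

Unlike the error of a point estimate, this quantity grows both when the posterior is centered away from the truth and when it is diffuse.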

We found substantially smaller root mean square error (RMSE) for inference of α1 when demography is properly modeled (Figure 4). While inference of α2 was similar between the two models, there is somewhat larger RMSE when demography is incorrectly assumed to be constant. Interestingly, there seem to be two regimes of error in allele age estimation: for the most recent allele ages, modeling demography results in larger RMSE, while for more ancient ages, inferences with constant population size result in larger RMSE. These are likely caused by a particular feature of this demographic model, which is a very strong bottleneck inferred in the recent past. Because alleles are more likely to arise during periods of larger population size, accounting for demographic history extends the tail of the posterior distribution farther into the past, when the population was larger.

Figure 4.

Comparison of root mean square error (RMSE) when inference is performed with the proper (variable) demographic model on the x-axis compared to a misspecified constant demography model on the y-axis. Panels A, B and C show the results for α1, α2, and t0, respectively. Each point represents a single simulation, and points are shaded according to simulated parameter values (scale on the right of each panel). Solid line is y=x.

Application to ancient DNA

We applied our approach to real data by reanalyzing the MC1R and ASIP data from Ludwig et al. (2009). In contrast to earlier analyses of these data, we explicitly incorporated the demography of the domesticated horse, as inferred by Der Sarkissian et al. (2015), using a generation time of 8 years. Table 2 shows the sample configurations and sampling times corresponding to each locus, where diffusion units are scaled to 2N0, with N0=16,000 being the most recent effective size reported by Der Sarkissian et al. (2015). For comparison, we also analyzed the data assuming the population size has been constant at N0.

Table 2. Sample information for horse data.

| Sample time (years BCE) | 20,000 | 13,100 | 3,700 | 2,800 | 1,100 | 500 |
|---|---|---|---|---|---|---|
| Sample time (diffusion units) | 0.078 | 0.051 | 0.014 | 0.011 | 0.004 | 0.002 |
| Sample size | 10 | 22 | 20 | 20 | 36 | 38 |
| Count of ASIP alleles | 0 | 1 | 15 | 12 | 15 | 18 |
| Count of MC1R alleles | 0 | 0 | 1 | 6 | 13 | 24 |

Diffusion time units are calculated assuming N0=16,000 and a generation time of 8 years. BCE, Before Common Era.
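The diffusion-unit row of Table 2 can be reproduced by dividing each sample time in years by the generation time and then by 2N0. The sketch below uses N0=16,000 and a generation time of 8 years, the scaling described in the text, and takes the listed sample times at face value; the function name is illustrative.

```python
def to_diffusion_units(years, n0=16000, generation_time=8.0):
    """Convert a time in years to units of 2*N0 generations."""
    return years / generation_time / (2.0 * n0)

sample_years = [20000, 13100, 3700, 2800, 1100, 500]
diffusion_times = [round(to_diffusion_units(y), 3) for y in sample_years]
print(diffusion_times)
```

With N0=2500 and a generation time of 5 years, the same conversion would place the oldest sample at 0.8 diffusion units, an order of magnitude larger than the tabulated value.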

With the MC1R locus, we found that posterior inferences about selection coefficients can be strongly influenced by whether demographic information is included in the analysis (Figure 5). Marginally, incorporating demographic information results in a larger inferred α1 than under the constant-size model [maximum a posteriori (MAP) estimates of 267.6 and 74.1, with and without demography, respectively; Figure 5A], while α2 is inferred to be smaller (MAP estimates of 59.1 and 176.2, with and without demography, respectively; Figure 5B). This has very interesting implications for the mode of selection inferred on the MC1R locus. Recall that α2 > α1 > 0 is directional selection, in which the derived allele is always beneficial; α1 > 0 with α2 < α1 is overdominant selection, in which the heterozygote is favored; and α1 < 0 with α2 > α1 is underdominant selection, in which the heterozygote is disfavored. With constant demography, the trajectory of the allele is estimated to be shaped by positive directional selection (joint MAP, α1=87.6, α2=394.8; Figure 5C), while when demographic information is included, selection is inferred to act in an overdominant fashion (joint MAP, α1=262.5, α2=128.1; Figure 5D).
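The three regimes recalled above can be written as a small classifier and applied to the joint MAP estimates reported for MC1R. The function name and the catch-all label "other" for parameter combinations not covered by the three stated cases are assumptions for illustration.

```python
def selection_mode(alpha1, alpha2):
    """Classify the mode of selection from the heterozygote (alpha1) and homozygote (alpha2) coefficients."""
    if alpha2 > alpha1 > 0:
        return "directional"
    if alpha1 > 0 and alpha2 < alpha1:
        return "overdominant"
    if alpha1 < 0 and alpha2 > alpha1:
        return "underdominant"
    return "other"

print(selection_mode(87.6, 394.8))    # MC1R joint MAP, constant demography
print(selection_mode(262.5, 128.1))   # MC1R joint MAP, with demography
```

A classification of a point estimate ignores posterior uncertainty; the joint posterior contours in Figure 5 show how much mass lies on either side of the line α1=α2.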

Figure 5.

Posterior distributions of selection coefficients for the MC1R locus. A and B show marginal distributions of α1 and α2, respectively, with the solid line indicating the posterior obtained from an analysis including the full demographic history and the dotted line showing what would be inferred in a constant size population. C and D show contour plots of the joint distribution of α1 and α2 without and with demography, respectively.

Incorporation of demographic history also has substantial impacts on the inferred distribution of allele ages (Figure 6). Most notably, the distribution of the allele age for MC1R is significantly truncated when demography is incorporated, in a way that correlates with the demographic events (Supplemental Material, Figure S1). While both the constant-size history and the more complicated history result in a posterior mode at approximately the same value of the allele age, the domestication bottleneck inferred by Der Sarkissian et al. (2015) makes it far less likely that the allele arose earlier than the recent population expansion. Because the allele is inferred to be younger under the model incorporating demography, the strength of selection in favor of the heterozygote must be higher to allow it to escape low frequency quickly and reach the observed allele frequencies. Hence, α1 is inferred to be much higher when demographic history is explicitly modeled.

Figure 6.

(A and B) Posterior distribution on allele frequency paths for the MC1R locus. Each panel shows the sampled allele frequency data (solid circles), the point-wise median (black), 25% and 75% quantiles (red), and 5% and 95% quantiles (green) of the posterior distribution on paths and the posterior distribution on allele age (blue). A reports inference with constant demography, and B shows the result of inference with the full demographic history.

Incorporation of demographic history has an even more significant impact on inferences made about the ASIP locus (Figure 7). Most strikingly, while α1 is inferred to be very large without demography, it is inferred to be close to 0 when demography is incorporated (MAP estimates of 16.3 and 159.9 with and without demography, respectively; Figure 7A). On the other hand, inference of α2 is largely unaffected (MAP estimates of 34.7 and 39.8 with and without demography, respectively; Figure 7B). Interestingly, this has an opposite implication for the mode of selection compared to the results for the MC1R locus. With a constant-size demographic history, the allele is inferred to have evolved under overdominance (joint MAP, α1=153.3, α2=47; Figure 7C), but when the more complicated demography is modeled, the allele frequency trajectory is inferred to be shaped by positive, nearly additive, selection (joint MAP, α1=16.4, α2=46.8; Figure 7D).

Figure 7.

(A–D) Posterior distributions of selection coefficients for the ASIP locus. Panels are as in Figure 5.

Incorporating demography also affects inference of allele age in the opposite direction from what we saw for MC1R (Figure 8). In particular, the allele is inferred to be much older when demography is modeled and features a multimodal posterior distribution on allele age, with each mode corresponding to a period of historically larger population size (Figure S2). Because the allele is inferred to be substantially older when demography is modeled, selection in favor of the heterozygote must have been weaker than would be inferred with the much younger age. Hence, the mode of selection switches from one of overdominance in a constant demography to one in which the homozygote is more fit than the heterozygote.

Figure 8.

(A and B) Posterior distribution on allele frequency paths for the ASIP locus. Panels are as in Figure 6.

Discussion

Using DNA from ancient specimens, we have obtained a number of insights into evolutionary processes that were previously inaccessible. One of the most interesting aspects of ancient DNA is that it can provide a temporal component to evolution that has long been impossible to study. In particular, instead of making inferences about the allele frequencies, we can directly measure these quantities. To take advantage of these new data, we developed a novel Bayesian method for inferring the intensity and direction of natural selection from allele frequency time series. To circumvent the difficulties inherent in calculating the transition probabilities under the standard Wright–Fisher process of selection and drift, we used a data augmentation approach in which we learn the posterior distribution on allele frequency paths. Doing this not only allows us to efficiently calculate likelihoods, but also provides an unprecedented glimpse at the historical allele frequency dynamics.

The key innovation of our method is to apply high-frequency path augmentation methods (Roberts and Stramer 2001) to analyze genetic time series. The logic of the method is similar to the logic of a path integral, in which we average over all possible allele frequency trajectories that are consistent with the data (Schraiber 2014). By choosing a suitable reference probability distribution against which to compute likelihood ratios, we were able to adapt these methods to infer the age of alleles and properly account for variable population sizes through time. Moreover, because of the computational advantages of the path augmentation approach, we were able to infer a model of general diploid selection. To our knowledge, ours is the first work that can estimate both allele age and general diploid selection while accounting for demography.

Using simulations, we showed that our method performs well for strong selection and densely sampled time series. However, it is worth considering the work of Watterson (1979), who showed that even knowledge of the full trajectory results in very flat likelihood surfaces when selection is not strong. This is because for weak selection, the trajectory is extremely stochastic and it is difficult to disentangle the effects of drift and selection (Schraiber et al. 2013).

We also used simulations to test how misspecification of demographic history affects inference. We saw substantially increased posterior root mean square error in inference of selection parameters if demographic history is misspecified. To examine the impact of demographic history in the context of real data, we then applied our method to a classic data set from horses. We found that our inference of both the strength and mode of natural selection depended strongly on whether we incorporated demography. For the MC1R locus, a constant-size demographic model results in an inference of positive selection, while the more complicated demographic model inferred by Der Sarkissian et al. (2015) causes the inference to tilt toward overdominance, as well as a much younger allele age. In contrast, the ASIP locus is inferred to be overdominant under a constant-size demography, but the complicated demographic history results in an inference of positive selection and a much older allele age.

These results stand in contrast to those of Steinrücken et al. (2014), who found that the most likely mode of evolution for both loci under a constant demographic history is one of overdominance. There are several reasons for this discrepancy. First, we computed the diffusion time units differently, using N0=16,000 and a generation time of 8 years, as inferred by Der Sarkissian et al. (2015), while Steinrücken et al. (2014) used N0=2500 (consistent with the bottleneck size found by Der Sarkissian et al. 2015) and a generation time of 5 years. Hence, our constant-size model has far less genetic drift than the constant-size model assumed by Steinrücken et al. (2014). This emphasizes the importance of inferring appropriate demographic scaling parameters, even when a constant population size is assumed. Second, we use MCMC to integrate over the distribution of allele ages, which can have a very long tail going into the past, while Steinrücken et al. (2014) assume a fixed allele age.

One key limitation of this method is that it assumes that the aDNA samples all come from the same, continuous population. If there is in fact a discontinuity in the populations from which alleles have been sampled, this could cause rapid allele frequency change and create spurious signals of natural selection. Several methods have been devised to test this hypothesis (Sjödin et al. 2014), and one possibility would be to apply these methods to putatively neutral loci sampled from the same individuals, thus determining which samples form a continuous population. Alternatively, if our method is applied to a number of loci throughout the genome and an extremely large portion of the genome is determined to be evolving under selection, this could be evidence for model misspecification and suggest that the samples do not come from a continuous population.

An advantage of the method that we introduced is that it may be possible to extend it to incorporate information from linked neutral diversity. In general, computing the likelihood of neutral diversity linked to a selected site is difficult and many researchers have used Monte Carlo simulation and importance sampling (Slatkin 2001; Coop and Griffiths 2004; Chen and Slatkin 2013). These approaches average over allele frequency trajectories in much the same way as our method; however, each trajectory is drawn completely independently of the previous trajectories. Using a Markov chain Monte Carlo approach, as we do here, has the potential to ensure that only trajectories with a high posterior probability are explored and hence greatly increase the efficiency of such approaches.

Acknowledgments

We are grateful for helpful comments from and discussion with Yun Song, Matthias Steinrücken, and Anand Bhaskar during the conception and implementation of this work. We also thank two anonymous reviewers for their helpful comments that improved the clarity and thoroughness of this article. J.G.S. is supported by National Science Foundation (NSF) grant DBI-1402120 (to J.G.S.) and National Institutes of Health (NIH) grant R01-GM40282 (to M.S.); S.N.E. is supported in part by NSF grant DMS-0907630, NSF grant DMS-1512933, and NIH grant 1R01GM109454-01.

Appendix

A Proper Posterior in the Limit As the Initial Allele Frequency Approaches 0

For reasons that we explain in the Path likelihoods section, we reparameterize our model by replacing the path variable $(X_t)_{t\ge t_0}$ with a deterministic time and space transformation of it, $(Y_t)_{t\ge0}$, that takes values in the interval $[0,\pi]$, with the boundary point 0 (resp. $\pi$) for $(Y_t)_{t\ge0}$ corresponding to the boundary point 0 (resp. 1) for $(X_t)_{t\ge t_0}$. The transformation producing $(Y_t)_{t\ge0}$ is such that $(X_t)_{t\ge t_0}$ can be recovered from $(Y_t)_{t\ge0}$ and $t_0$.

Implicit in our setup is the initial frequency $x_0$ at time $t_0$ that corresponds to an initial value $y_0$ at time 0 of the transformed process $(Y_t)_{t\ge0}$. For the moment, let us make the dependence on $y_0$ explicit by including it in relevant notation as a superscript. For example, $\mathbb{P}^{y_0}(\cdot\mid\alpha_1,\alpha_2,t_0)$ is the prior distribution of $(Y_t)_{t\ge0}$ given the specified values of the other parameters $\alpha_1,\alpha_2,t_0$. We construct a tractable “reference” process $(\bar Y_t)_{t\ge0}$ with distribution $\mathbb{Q}^{y_0}(\cdot)$ such that the probability distribution $\mathbb{P}^{y_0}(\cdot\mid\alpha_1,\alpha_2,t_0)$ has a density with respect to $\mathbb{Q}^{y_0}(\cdot)$; explicitly, $\mathbb{Q}^{y_0}(\cdot)$ is the distribution of a Bessel(0) process started at location $y_0$ at time 0. That is, there is a function $\Phi^{y_0}(\cdot;\alpha_1,\alpha_2,t_0)$ on path space such that

$$\mathbb{P}^{y_0}(dy\mid\alpha_1,\alpha_2,t_0)=\Phi^{y_0}(y;\alpha_1,\alpha_2,t_0)\,\mathbb{Q}^{y_0}(dy) \tag{A1}$$

for a path $(y_t)_{t\ge0}$. Assuming that π has a density with respect to Lebesgue measure that, with a slight abuse of notation, we also denote by π, the outcome of our Bayesian inferential procedure is determined by the ratios

$$\frac{\mathbb{P}(dD\mid y,t_0^{**})\,\Phi^{y_0}(y;\alpha_1^{**},\alpha_2^{**},t_0^{**})\,\pi(\alpha_1^{**},\alpha_2^{**},t_0^{**})}{\mathbb{P}(dD\mid y,t_0^{*})\,\Phi^{y_0}(y;\alpha_1^{*},\alpha_2^{*},t_0^{*})\,\pi(\alpha_1^{*},\alpha_2^{*},t_0^{*})} \tag{A2}$$

for pairs of augmented parameter values $(y,\alpha_1^{*},\alpha_2^{*},t_0^{*})$ and $(y,\alpha_1^{**},\alpha_2^{**},t_0^{**})$ (i.e., the Metropolis–Hastings ratio).

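Under the probability measure $\mathbb{P}^{y_0}(\cdot\mid\alpha_1,\alpha_2,t_0)$, the process $(Y_t)_{t\ge0}$ converges in distribution as $y_0\downarrow0$ (equivalently, $x_0\downarrow0$) to the trivial process that starts at location 0 at time 0 and stays there. However, for all $\epsilon>0$ the conditional distribution of $(Y_t)_{t\ge\epsilon}$ under the probability measure $\mathbb{P}^{y_0}(\cdot\mid\alpha_1,\alpha_2,t_0)$ given the event $\{Y_\epsilon>0\}$ converges to a nontrivial probability measure as $y_0\downarrow0$. Similarly, the conditional distribution of the reference diffusion process $(\bar Y_t)_{t\ge\epsilon}$ under the probability measure $\mathbb{Q}^{y_0}(\cdot)$ given the event $\{\bar Y_\epsilon>0\}$ converges as $y_0\downarrow0$ to a nontrivial limit. There are σ-finite measures $\mathbb{P}^{0}(\cdot\mid\alpha_1,\alpha_2,t_0)$ and $\mathbb{Q}^{0}(\cdot)$ on path space that both have infinite total mass and are such that for any $\epsilon>0$ both of these measures assign finite, nonzero mass to the set of paths that are strictly positive at the time $\epsilon$, and the corresponding conditional probability measures are the limits as $y_0\downarrow0$ of the conditional probability measures described above. Moreover, there is a function $\Phi^{0}(\cdot;\alpha_1,\alpha_2,t_0)$ on path space such that

$$\mathbb{P}^{0}(dy\mid\alpha_1,\alpha_2,t_0)=\Phi^{0}(y;\alpha_1,\alpha_2,t_0)\,\mathbb{Q}^{0}(dy). \tag{A3}$$

The posterior distribution (3) converges to

$$\mathbb{P}^{0}(d\alpha_1,d\alpha_2,dt_0;dY\mid D)=\frac{\mathbb{P}(dD\mid Y,t_0)\,\mathbb{P}^{0}(dY\mid\alpha_1,\alpha_2,t_0)\,\pi(d\alpha_1,d\alpha_2,dt_0)}{\int\mathbb{P}(dD\mid Y,t_0)\,\mathbb{P}^{0}(dY\mid\alpha_1,\alpha_2,t_0)\,\pi(d\alpha_1,d\alpha_2,dt_0)}. \tag{A4}$$

Thus, the limit as $y_0\downarrow0$ of a Bayesian inferential procedure for the augmented set of parameters can be viewed as a Bayesian inferential procedure with the improper prior $\mathbb{P}^{0}(dY\mid\alpha_1,\alpha_2,t_0)\,\pi(d\alpha_1,d\alpha_2,dt_0)$ for the parameters $Y,\alpha_1,\alpha_2,t_0$. In particular, the limiting Bayesian inferential procedure is determined by the ratios

$$\frac{\mathbb{P}(dD\mid y,t_0^{**})\,\Phi^{0}(y;\alpha_1^{**},\alpha_2^{**},t_0^{**})\,\pi(\alpha_1^{**},\alpha_2^{**},t_0^{**})}{\mathbb{P}(dD\mid y,t_0^{*})\,\Phi^{0}(y;\alpha_1^{*},\alpha_2^{*},t_0^{*})\,\pi(\alpha_1^{*},\alpha_2^{*},t_0^{*})} \tag{A5}$$

for pairs of augmented parameter values $(y,\alpha_1^{*},\alpha_2^{*},t_0^{*})$ and $(y,\alpha_1^{**},\alpha_2^{**},t_0^{**})$.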

The Likelihood of the Data and the Path

Write $\tau_i=f(t_i)$. Note that $\tau_0=f(t_0)=0$. Using Equation 9, the density of the distribution of the transformed allele frequency process $(Y_s)_{0\le s\le\tau_k}$ against the reference distribution of the Bessel(0) process $(\bar Y_s)_{0\le s\le\tau_k}$ when $Y_0=\bar Y_0=y_0$ can be written

$$\exp\left\{\int_0^{\tau_k}\big(a(Y_r,r)-b(Y_r)\big)\,dY_r-\frac12\int_0^{\tau_k}\big(a^2(Y_r,r)-b^2(Y_r)\big)\,dr\right\}, \tag{A6}$$

where

$$a(y,\tau)=-\frac12\cot(y)+\frac14\,\rho(f^{-1}(\tau))\sin(y)\big(\alpha_2+(2\alpha_1-\alpha_2)\cos(y)\big)$$

is the infinitesimal mean of the transformed Wright–Fisher process and

$$b(y)=-\frac{1}{2y}$$

is the infinitesimal mean of the Bessel(0) process. However, as shown by Sermaidis et al. (2013), attempting to approximate the Itô integral in (A6) using a discrete representation of the path can lead to biased estimates of the posterior distribution. Instead, consider the potential functions

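$$H_1(y,\tau)=\int^{y}a(\xi,\tau)\,d\xi=-\frac18\Big(\rho(f^{-1}(\tau))\big(2\alpha_2\cos(y)+(2\alpha_1-\alpha_2)\cos^2(y)\big)+4\log(\sin(y))\Big)$$

and

$$H_2(y)=\int^{y}b(\xi)\,d\xi=-\frac{\log(y)}{2}.$$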

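If we assume that ρ is continuous (not merely right continuous with left limits), then Itô’s lemma, applied with the unit diffusion coefficient of the transformed process, shows that we can write

$$\int_0^{\tau_k}\big(a(Y_r,r)-b(Y_r)\big)\,dY_r=H_1(Y_{\tau_k},\tau_k)-H_2(Y_{\tau_k})-\big(H_1(Y_0,0)-H_2(Y_0)\big)-\int_0^{\tau_k}\left(\frac{\partial H_1}{\partial\tau}(Y_r,r)-\frac{\partial H_2}{\partial\tau}(Y_r)\right)dr-\frac12\int_0^{\tau_k}\left(\frac{\partial^2H_1}{\partial y^2}(Y_r,r)-\frac{\partial^2H_2}{\partial y^2}(Y_r)\right)dr.$$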

To generalize this to the case where ρ is right continuous with left limits, write

$$\int_0^{\tau_k}\big(a(Y_r,r)-b(Y_r)\big)\,dY_r=I_0+\sum_{i=m}^{K}I_i,$$

where $m$ and $K$ are defined in the main text,

$$I_0=\lim_{\tau\uparrow f(d_m)}\int_0^{\tau}\big(a(Y_r,r)-b(Y_r)\big)\,dY_r,$$

for $m\le i<K$,

$$I_i=\lim_{\tau\uparrow f(d_{i+1})}\int_{f(d_i)}^{\tau}\big(a(Y_r,r)-b(Y_r)\big)\,dY_r,$$

and

$$I_K=\lim_{\tau\uparrow\tau_k}\int_{f(d_K)}^{\tau}\big(a(Y_r,r)-b(Y_r)\big)\,dY_r.$$

Itô’s lemma can then be applied to each segment in turn. Following the conversion of the Itô integrals into ordinary Lebesgue integrals, making the substitution $s=f^{-1}(r)$ results in the path likelihood displayed in (11).
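A quick numerical check of the potential functions is sketched below: a central finite difference of each potential in $y$ should recover the corresponding infinitesimal mean. The forms of $a$, $b$, $H_1$, and $H_2$ coded here are assumptions, with $H_1$ taken to be the antiderivative in $y$ of the drift $a$ at a fixed value of ρ; the parameter values are arbitrary.

```python
import math

def drift_a(y, rho, alpha1, alpha2):
    """Infinitesimal mean of the transformed Wright-Fisher process (assumed form)."""
    return (-0.5 / math.tan(y)
            + 0.25 * rho * math.sin(y) * (alpha2 + (2.0 * alpha1 - alpha2) * math.cos(y)))

def potential_h1(y, rho, alpha1, alpha2):
    """Antiderivative of drift_a with respect to y at fixed rho."""
    c = math.cos(y)
    return -0.125 * (rho * (2.0 * alpha2 * c + (2.0 * alpha1 - alpha2) * c * c)
                     + 4.0 * math.log(math.sin(y)))

def drift_b(y):
    """Infinitesimal mean of the Bessel(0) reference process."""
    return -0.5 / y

def potential_h2(y):
    """Antiderivative of drift_b."""
    return -0.5 * math.log(y)

def central_diff(func, y, h=1e-5):
    """Central finite-difference approximation to the derivative of func at y."""
    return (func(y + h) - func(y - h)) / (2.0 * h)

rho, a1, a2, y = 1.3, 20.0, 35.0, 1.0
err_a = abs(central_diff(lambda v: potential_h1(v, rho, a1, a2), y) - drift_a(y, rho, a1, a2))
err_b = abs(central_diff(potential_h2, y) - drift_b(y))
print(err_a < 1e-6, err_b < 1e-6)
```

Replacing the stochastic integral by differences of potentials plus ordinary time integrals is what allows the path likelihood to be evaluated on a discretized path without approximating an Itô integral directly.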

Acceptance Probability for an Interior Path Update

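When we propose a new path $(y''_t)_{0\le t\le\tau_k}$ to update the current path $(y'_t)_{0\le t\le\tau_k}$ that does not hit the boundary, the new path agrees with the existing path outside some time interval $[v_1,v_2]$ and has a new segment spliced in that goes from $y'_{v_1}$ at time $v_1$ to $y'_{v_2}$ at time $v_2$. The proposed new path segment comes from a Bessel(0) process over the time interval $[v_1,v_2]$ that is pinned to take the values $y'_{v_1}$ and $y'_{v_2}$ at the end points; that is, the proposed new piece of path is a bridge.

The ratio that determines the probability of accepting the proposed path is

$$\frac{P(dD\mid y'',t_0)}{P(dD\mid y',t_0)}\times\frac{\mathbb{P}(dy'')\,\kappa(dy'\mid y'')}{\mathbb{P}(dy')\,\kappa(dy''\mid y')}, \tag{A7}$$

where $P(\cdot\mid y',t_0)$ and $P(\cdot\mid y'',t_0)$ give the probability of the observed allele counts given the transformed allele frequency paths and initial time $t_0$, $\mathbb{P}(\cdot)$ is the distribution of the transformed Wright–Fisher diffusion starting from $y_0>0$ at time 0 (that is, the distribution we have sometimes denoted by $\mathbb{P}^{y_0}$), the probability kernel $\kappa(\cdot\mid y')$ gives the distribution of the proposed path when the current path is $y'$, and $\kappa(\cdot\mid y'')$ is similar. To be completely rigorous, the second term in the product in (A7) should be interpreted as the Radon–Nikodym derivative of two probability measures on the product of path space with itself.

Consider a finite set of times $0\equiv\tau_0\equiv u_0<u_1<\cdots<u_\ell\equiv\tau_k$. Suppose that $\{v_1,v_2\}\subset\{u_0,\ldots,u_\ell\}$, $v_1=u_m$, and $v_2=u_n$ for some $m<n$. Let $(y'_t)_{0\le t\le\tau_k}$ and $(y''_t)_{0\le t\le\tau_k}$ be two paths that coincide on $[0,v_1]\cup[v_2,\tau_k]=[u_0,u_m]\cup[u_n,u_\ell]$. Write $P(x,y;s,t)$ for the transition density (with respect to Lebesgue measure) of the transformed Wright–Fisher diffusion from time $s$ to time $t$ and $Q(x,y;t)$ for the transition density (with respect to Lebesgue measure) of the Bessel(0) process. Suppose that $(\xi,\zeta)$ is a pair of random paths with $P((\xi,\zeta)\in(dy',dy''))=\mathbb{P}(dy')\,\kappa(dy''\mid y')$. Then, writing $z_t=y'_t=y''_t$ for $t\in[0,v_1]\cup[v_2,\tau_k]=[u_0,u_m]\cup[u_n,u_\ell]$, we have

$$\begin{aligned}
&P(\xi_{u_1}\in dy'_{u_1},\ldots,\xi_{u_\ell}\in dy'_{u_\ell},\ \zeta_{u_1}\in dy''_{u_1},\ldots,\zeta_{u_\ell}\in dy''_{u_\ell})\\
&\quad=P(z_{u_0},z_{u_1};u_0,u_1)\,dz_{u_1}\times\cdots\times P(z_{u_{m-1}},z_{u_m};u_{m-1},u_m)\,dz_{u_m}\\
&\qquad\times P(z_{u_m},y'_{u_{m+1}};u_m,u_{m+1})\,dy'_{u_{m+1}}\times\cdots\times P(y'_{u_{n-1}},z_{u_n};u_{n-1},u_n)\,dz_{u_n}\\
&\qquad\times P(z_{u_n},z_{u_{n+1}};u_n,u_{n+1})\,dz_{u_{n+1}}\times\cdots\times P(z_{u_{\ell-1}},z_{u_\ell};u_{\ell-1},u_\ell)\,dz_{u_\ell}\\
&\qquad\times Q(z_{u_m},y''_{u_{m+1}};u_{m+1}-u_m)\,dy''_{u_{m+1}}\times\cdots\times Q(y''_{u_{n-1}},z_{u_n};u_n-u_{n-1})\big/Q(z_{u_m},z_{u_n};u_n-u_m),
\end{aligned}$$

where the factor in the denominator arises because we are proposing bridges and hence conditioning on going from a fixed location at $v_1=u_m$ to another fixed location at $v_2=u_n$. Thus,

$$\frac{P(\xi_{u_1}\in dy''_{u_1},\ldots,\xi_{u_\ell}\in dy''_{u_\ell},\ \zeta_{u_1}\in dy'_{u_1},\ldots,\zeta_{u_\ell}\in dy'_{u_\ell})}{P(\xi_{u_1}\in dy'_{u_1},\ldots,\xi_{u_\ell}\in dy'_{u_\ell},\ \zeta_{u_1}\in dy''_{u_1},\ldots,\zeta_{u_\ell}\in dy''_{u_\ell})}=\frac{\prod_{j=m}^{n-1}P(y''_{u_j},y''_{u_{j+1}};u_j,u_{j+1})\big/Q(y''_{u_j},y''_{u_{j+1}};u_{j+1}-u_j)}{\prod_{j=m}^{n-1}P(y'_{u_j},y'_{u_{j+1}};u_j,u_{j+1})\big/Q(y'_{u_j},y'_{u_{j+1}};u_{j+1}-u_j)}.$$

Therefore, the Radon–Nikodym derivative appearing in (A7) is the ratio of Radon–Nikodym derivatives

$$\frac{(d\tilde{\mathbb{P}}/d\tilde{\mathbb{Q}})(y'')}{(d\tilde{\mathbb{P}}/d\tilde{\mathbb{Q}})(y')},$$

where $\tilde{\mathbb{P}}$ (resp. $\tilde{\mathbb{Q}}$) is the distribution of the transformed Wright–Fisher diffusion [resp. the Bessel(0) process] started at location $y'_{v_1}=y''_{v_1}$ at time $v_1$ and run until time $v_2$. The formula (12) for the acceptance probability associated with an interior path update follows immediately.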

The above argument was carried out under the assumption that the transformed initial allele frequency $y_0$ was strictly positive and so all the measures involved were probability measures. However, taking $y_0\downarrow0$ we see that the formula (12) continues to hold. Alternatively, we could have worked directly with the measure $\mathbb{P}^{0}$ in place of $\mathbb{P}^{y_0}$. The only difference is that we would have to replace $P(y_0,y;0,s)$ with the density $\varphi(y;0,s)$ of an entrance law for $\mathbb{P}^{0}$. That is, $\varphi(y;0,s)$ has the property that

$$\lim_{y_0\downarrow0}\frac{P(y_0,y';0,s')}{P(y_0,y'';0,s'')}=\frac{\varphi(y';0,s')}{\varphi(y'';0,s'')}$$

for all $y',y''>0$ and $s',s''>0$ so that

$$\int\varphi(y;0,s)\,P(y,z;s,t)\,dy=\varphi(z;0,t)$$

for $0<s<t$. Such a density, and hence the corresponding entrance law, is unique up to a multiplicative constant. In any case, it is clear that the choice of entrance law in the definition of $\mathbb{P}^{0}$ does not affect the formula (12) as the entrance law densities “cancel out.”

Acceptance Probability for an Allele Age Update

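The argument justifying the formula (13) for the probability of accepting a proposed update to the allele age $t_0$ is similar to the one just given for interior path updates. Now, however, we have to consider replacing a path $y'$ that starts from $y_0$ at time 0 and runs until time $f'(t_k)$ with a path $y''$ that starts from $y_0$ at time 0 and runs until time $f''(t_k)$. Instead of removing an internal segment of path and replacing it with one of the same length with the same values at the endpoints, we replace the initial segment of path that runs from time 0 to $f'(t_s)=\int_{t_0'}^{t_s}(1/\rho(s))\,ds$ with one that runs from time 0 to time $f''(t_s)=\int_{t_0''}^{t_s}(1/\rho(s))\,ds$, with $y'_{f'(t_s)}=y''_{f''(t_s)}$.

By analogy with the previous subsection, we need to consider

$$\frac{P(\xi\in dy'',\,T_0^{\xi}\in dt'',\,\zeta\in dy',\,T_0^{\zeta}\in dt')}{P(\xi\in dy',\,T_0^{\xi}\in dt',\,\zeta\in dy'',\,T_0^{\zeta}\in dt'')},$$

where $\xi$ is a transformed Wright–Fisher process starting at $y_0$ at time 0 and running to time $F^{\xi}=\int_{T_0^{\xi}}^{t_s}(1/\rho(s))\,ds$, where $P(T_0^{\xi}\in dt)=\rho(t)\,dt$, and conditional on $\xi$, $\zeta$ is a Bessel(0) bridge running from $y_0$ at time 0 to $\xi_{F^{\xi}}$ at time $F^{\zeta}=\int_{T_0^{\zeta}}^{t_s}(1/\rho(s))\,ds$, where $P(T_0^{\zeta}\in dt)=\rho(t)\,dt$ independent of $\xi$ and $T_0^{\xi}$.

Suppose that $0=u_0<u_1<\cdots<u_m=\int_{t''}^{t_s}(1/\rho(s))\,ds$ and $0=v_0<v_1<\cdots<v_n=\int_{t'}^{t_s}(1/\rho(s))\,ds$. We have for $y''_0,\ldots,y''_m$ and $y'_0,\ldots,y'_n$ with $y''_0=y'_0$ and $y''_m=y'_n$ that

$$\begin{aligned}
&\frac{P(\xi_{u_i}\in dy''_i,\ 1\le i\le m-1,\ T_0^{\xi}\in dt'',\ \zeta_{v_j}\in dy'_j,\ 1\le j\le n,\ T_0^{\zeta}\in dt')}{P(\xi_{v_j}\in dy'_j,\ 1\le j\le n-1,\ T_0^{\xi}\in dt',\ \zeta_{u_i}\in dy''_i,\ 1\le i\le m,\ T_0^{\zeta}\in dt'')}\\
&\quad=\frac{\prod_{i=0}^{m-1}P(y''_i,y''_{i+1};u_i,u_{i+1})\,dy''_{i+1}\times\rho(t'')\,dt''\times\Big[\prod_{j=0}^{n-2}Q(y'_j,y'_{j+1};v_{j+1}-v_j)\,dy'_{j+1}\times Q(y'_{n-1},y'_n;v_n-v_{n-1})\big/Q(y'_0,y'_n;v_n)\Big]\times dt'}{\prod_{j=0}^{n-1}P(y'_j,y'_{j+1};v_j,v_{j+1})\,dy'_{j+1}\times\rho(t')\,dt'\times\Big[\prod_{i=0}^{m-2}Q(y''_i,y''_{i+1};u_{i+1}-u_i)\,dy''_{i+1}\times Q(y''_{m-1},y''_m;u_m-u_{m-1})\big/Q(y''_0,y''_m;u_m)\Big]\times dt''}\\
&\quad=\frac{\prod_{i=0}^{m-1}P(y''_i,y''_{i+1};u_i,u_{i+1})\,dy''_{i+1}\times\rho(t'')\,dt''\times\Big[\prod_{j=0}^{n-1}Q(y'_j,y'_{j+1};v_{j+1}-v_j)\,dy'_{j+1}\big/Q(y'_0,y'_n;v_n)\Big]\times dt'}{\prod_{j=0}^{n-1}P(y'_j,y'_{j+1};v_j,v_{j+1})\,dy'_{j+1}\times\rho(t')\,dt'\times\Big[\prod_{i=0}^{m-1}Q(y''_i,y''_{i+1};u_{i+1}-u_i)\,dy''_{i+1}\big/Q(y''_0,y''_m;u_m)\Big]\times dt''}\\
&\quad=\frac{\prod_{i=0}^{m-1}P(y''_i,y''_{i+1};u_i,u_{i+1})\,dy''_{i+1}\Big/\Big[\prod_{i=0}^{m-1}Q(y''_i,y''_{i+1};u_{i+1}-u_i)\,dy''_{i+1}\Big]}{\prod_{j=0}^{n-1}P(y'_j,y'_{j+1};v_j,v_{j+1})\,dy'_{j+1}\Big/\Big[\prod_{j=0}^{n-1}Q(y'_j,y'_{j+1};v_{j+1}-v_j)\,dy'_{j+1}\Big]}\times\frac{Q(y''_0,y''_m;u_m)}{Q(y'_0,y'_n;v_n)}\times\frac{\rho(t'')}{\rho(t')},
\end{aligned}$$

where the second equality follows from the fact that $y'_n=y''_m$.

Thus,

$$\frac{P(\xi\in dy'',\,T_0^{\xi}\in dt'',\,\zeta\in dy',\,T_0^{\zeta}\in dt')}{P(\xi\in dy',\,T_0^{\xi}\in dt',\,\zeta\in dy'',\,T_0^{\zeta}\in dt'')}=\frac{(d\check{\mathbb{P}}/d\check{\mathbb{Q}})(y'')}{(d\hat{\mathbb{P}}/d\hat{\mathbb{Q}})(y')}\times\frac{Q(y_0,y''_{T''};T'')}{Q(y_0,y'_{T'};T')}\times\frac{\rho(t'')}{\rho(t')},$$

where $T'=\int_{t'}^{t_s}(1/\rho(s))\,ds$ and $T''=\int_{t''}^{t_s}(1/\rho(s))\,ds$, $\hat{\mathbb{P}}$ (resp. $\check{\mathbb{P}}$) is the distribution of the transformed Wright–Fisher diffusion starting at location $y_0$ at time 0 and running until time $T'$ (resp. $T''$), and $\hat{\mathbb{Q}}$ (resp. $\check{\mathbb{Q}}$) is the distribution of the Bessel(0) process starting at location $y_0$ at time 0 and running until time $T'$ (resp. $T''$).

We have thus far assumed that $y_0$ is strictly positive. As in the previous subsection, we can let $y_0\downarrow0$ to get an expression in terms of Radon–Nikodym derivatives of σ-finite measures and the density $\psi(y;s)$ of an entrance law for $\mathbb{Q}^{0}$. That is, $\psi(y;s)$ has the property that

$$\lim_{y_0\downarrow0}\frac{Q(y_0,y';s')}{Q(y_0,y'';s'')}=\frac{\psi(y';s')}{\psi(y'';s'')}$$

for all $y',y''>0$ and $s',s''>0$, so that

$$\int\psi(y;s)\,Q(y,z;t)\,dy=\psi(z;s+t)$$

for $s,t>0$. Up to an irrelevant multiplicative constant, $\psi$ is given by the expression (14), and the formula (13) for the acceptance probability follows immediately.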

Acceptance Probability for a Most Recent Allele Frequency Update

The derivation of formula (15) for the probability of accepting a proposed update to the most recent allele frequency is similar to those for the other acceptance probabilities (12) and (13), so we omit the details.

Footnotes

Communicating editor: M. A. Beaumont

Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.116.187278/-/DC1.

Literature Cited

  1. Bollback, J. P., T. L. York, and R. Nielsen, 2008.   Estimation of 2nes from temporal allele frequency data. Genetics 179: 497–502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Chen, H., and M. Slatkin, 2013 Inferring selection intensity and allele age from multilocus haplotype structure. G3 3: 1429–1442. [DOI] [PMC free article] [PubMed]
  3. Coop G., and R. C. Griffiths, 2004.  Ancestral inference on gene trees under selection. Theor. Popul. Biol. 66(3): 219–232. [DOI] [PubMed] [Google Scholar]
  4. Der Sarkissian C., Ermini L., Schubert M., M. A., Yang P., Librado et al, 2015.  Evolutionary genomics and conservation of the endangered Przewalski’s horse. Curr. Biol. 25(19): 2577–2583. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Ewens, W. J., 2004 Mathematical Population Genetics: I. Theoretical Introduction, Vol. 27. Springer-Verlag, Berlin/Heidelberg, Germany/New York.
  6. Feder, A. F., S. Kryazhimskiy, and J. B. Plotkin, 2014 Identifying signatures of selection in genetic time series. Genetics 196: 509–522.
  7. Feller, W., 1951 Diffusion processes in genetics, p. 246 in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, Vol. 227, edited by M. Kac. University of California Press.
  8. Fisher, R. A., 1922 On the dominance ratio. Proc. R. Soc. Edinb. 42: 321–341.
  9. Fuchs, C., 2013 Inference for Diffusion Processes: With Applications in Life Sciences. Springer-Verlag, Berlin/Heidelberg, Germany/New York.
  10. Girsanov, I. V., 1960 On transforming a certain class of stochastic processes by absolutely continuous substitution of measures. Theory Probab. Appl. 5(3): 285–301.
  11. Golightly, A., and D. J. Wilkinson, 2005 Bayesian inference for stochastic kinetic models using a diffusion approximation. Biometrics 61(3): 781–788.
  12. Golightly, A., and D. J. Wilkinson, 2008 Bayesian inference for nonlinear multivariate diffusion models observed with error. Comput. Stat. Data Anal. 52(3): 1674–1693.
  13. Griffiths, R. C., and S. Tavaré, 1994 Sampling theory for neutral alleles in a varying environment. Philos. Trans. R. Soc. Lond. B Biol. Sci. 344(1310): 403–410.
  14. Haldane, J. B. S., 1927 A mathematical theory of natural and artificial selection, part V: selection and mutation. Math. Proc. Camb. Philos. Soc. 23(7): 838–844.
  15. Itô, K., 1944 Stochastic integral. Proc. Jpn. Acad. Ser. A Math. Sci. 20(8): 519–524.
  16. Jenkins, P. A., 2013 Exact simulation of the sample paths of a diffusion with a finite entrance boundary. arXiv:1311.5777.
  17. Jenkins, P. A., and D. Spanò, 2015 Exact simulation of the Wright-Fisher diffusion. arXiv:1506.06998.
  18. Kallenberg, O., 2002 Foundations of Modern Probability (Probability and Its Applications, Ed. 2). Springer-Verlag, New York.
  19. Knight, F. B., 1981 Essentials of Brownian Motion and Diffusion (Mathematical Surveys, Vol. 18). American Mathematical Society, Providence, RI.
  20. Ludwig, A., M. Pruvost, M. Reissmann, N. Benecke, G. A. Brockmann et al., 2009 Coat color variation at the beginning of horse domestication. Science 324(5926): 485.
  21. Malaspinas, A.-S., O. Malaspinas, S. N. Evans, and M. Slatkin, 2012 Estimating allele age and selection coefficient from time-serial data. Genetics 192: 599–607.
  22. Mathieson, I., and G. McVean, 2013 Estimating selection coefficients in spatially structured populations from time series data of allele frequencies. Genetics 193: 973–984.
  23. Mathieson, I., I. Lazaridis, N. Rohland, S. Mallick, N. Patterson et al., 2015 Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528(7583): 499–503.
  24. Nielsen, R., S. Williamson, Y. Kim, M. J. Hubisz, A. G. Clark et al., 2005 Genomic scans for selective sweeps using SNP data. Genome Res. 15(11): 1566–1575.
  25. Pickrell, J. K., G. Coop, J. Novembre, S. Kudaravalli, J. Z. Li et al., 2009 Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19(5): 826–837.
  26. Plummer, M., N. Best, K. Cowles, and K. Vines, 2006 Coda: convergence diagnosis and output analysis for MCMC. R News 6(1): 7–11.
  27. Revuz, D., and M. Yor, 1999 Continuous Martingales and Brownian Motion (Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], Vol. 293, Ed. 3). Springer-Verlag, Berlin.
  28. Roberts, G. O., and O. Stramer, 2001 On inference for partially observed nonlinear diffusion models using the Metropolis–Hastings algorithm. Biometrika 88(3): 603–621.
  29. Schraiber, J. G., 2014 A path integral formulation of the Wright–Fisher process with genic selection. Theor. Popul. Biol. 92: 30–35.
  30. Schraiber, J. G., R. C. Griffiths, and S. N. Evans, 2013 Analysis and rejection sampling of Wright-Fisher diffusion bridges. Theor. Popul. Biol. 89: 64–74.
  31. Sermaidis, G., O. Papaspiliopoulos, G. O. Roberts, A. Beskos, and P. Fearnhead, 2013 Markov chain Monte Carlo for exact inference for diffusions. Scand. J. Stat. 40: 294–321.
  32. Sjödin, P., P. Skoglund, and M. Jakobsson, 2014 Assessing the maximum contribution from ancient populations. Mol. Biol. Evol. 31: 1248–1260.
  33. Slatkin, M., 2001 Simulating genealogies of selected alleles in a population of variable size. Genet. Res. 78(1): 49–57.
  34. Slatkin, M., and R. R. Hudson, 1991 Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129: 555–562.
  35. Song, Y. S., and M. Steinrücken, 2012 A simple method for finding explicit analytic transition densities of diffusion processes with general diploid selection. Genetics 190: 1117–1129.
  36. Sørensen, M., 2009 Parametric inference for discretely sampled stochastic differential equations, pp. 531–553 in Handbook of Financial Time Series. Springer-Verlag, Berlin/Heidelberg, Germany/New York.
  37. Steinrücken, M., A. Bhaskar, and Y. S. Song, 2014 A novel spectral method for inferring general diploid selection from time series genetic data. Ann. Appl. Stat. 8(4): 2203.
  38. Voight, B. F., S. Kudaravalli, X. Wen, and J. K. Pritchard, 2006 A map of recent positive selection in the human genome. PLoS Biol. 4(3): e72.
  39. Watterson, G. A., 1979 Estimating and testing selection: the two-alleles, genic selection diffusion model. Adv. Appl. Probab. 11: 14–30.
  40. Williamson, E. G., and M. Slatkin, 1999 Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152: 755–761.

Data Availability Statement

C++ software implementing the method described in this article is freely available under a GNU Public License at https://github.com/Schraiber/selection.


Articles from Genetics are provided here courtesy of Oxford University Press
