An explicit transition density expansion for a multi-allelic Wright-Fisher diffusion with general diploid selection

Matthias Steinrücken; Y X Rachel Wang; Yun S Song

doi:10.1016/j.tpb.2012.10.006

Author manuscript; available in PMC: 2014 Feb 1.

Published in final edited form as: Theor Popul Biol. 2012 Nov 2;83:1–14. doi: 10.1016/j.tpb.2012.10.006


PMCID: PMC3568258 NIHMSID: NIHMS419697 PMID: 23127866

Abstract

Characterizing time-evolution of allele frequencies in a population is a fundamental problem in population genetics. In the Wright-Fisher diffusion, such dynamics is captured by the transition density function, which satisfies well-known partial differential equations. For a multi-allelic model with general diploid selection, various theoretical results exist on representations of the transition density, but finding an explicit formula has remained a difficult problem. In this paper, a technique recently developed for a diallelic model is extended to find an explicit transition density for an arbitrary number of alleles, under a general diploid selection model with recurrent parent-independent mutation. Specifically, the method finds the eigenvalues and eigenfunctions of the generator associated with the multi-allelic diffusion, thus yielding an accurate spectral representation of the transition density. Furthermore, this approach allows for efficient, accurate computation of various other quantities of interest, including the normalizing constant of the stationary distribution and the rate of convergence to this distribution.

1. Introduction

Diffusion processes can be used to describe the evolution of population-wide allele frequencies in large populations, and they have been successfully applied in various population genetic analyses in the past. Karlin and Taylor (1981), Ewens (2004), and Durrett (2008) provide excellent introductions to the subject. The diffusion approximation captures the key features of the underlying evolutionary model and provides a concise framework for describing the dynamics of allele frequencies, even in complex evolutionary scenarios. However, finding explicit expressions for the transition density function (TDF) is a challenging problem for most models of interest. Although a partial differential equation (PDE) satisfied by the TDF can be readily obtained from standard diffusion theory, few models admit analytic solutions.

Since closed-form transition density functions are unknown for general diffusions, approaches such as finite difference methods (Bollback et al., 2008; Gutenkunst et al., 2009) and series expansions (Lukić et al., 2011) have been adopted recently to obtain approximate solutions. In a finite difference scheme, one needs to discretize the state space, but since the TDF depends on the parameters of the model (e.g., the selection coefficients), the suitability of a given discretization might depend strongly on the parameter values, and whether a particular discretization would produce accurate solutions is difficult to predict a priori. Series expansions allow one to circumvent the problem of choosing an appropriate discretization for the state space. However, if the chosen basis functions in the representation are not the eigenfunctions of the diffusion generator, as in Lukić et al. (2011), then one has to solve a system of coupled ordinary differential equations (ODEs) to obtain the transition density. Lukić et al. (2011) solve this system of ODEs numerically, which may introduce numerical approximation errors.

If the eigenvalues and eigenfunctions of the diffusion generator can be found, the spectral representation (which in some sense provides the optimal series expansion) of the TDF can be obtained. For the one-locus Wright-Fisher diffusion with an arbitrary number K of alleles evolving under a neutral parent-independent mutation (PIM) model, Shimakura (1977) and Griffiths (1979) derived an explicit spectral representation of the TDF using orthogonal polynomials. More recently, Baxter et al. (2007) derived the same solution by diagonalizing the associated PDE using a suitable coordinate transformation, followed by solving for each dimension independently. In a related line of research, Griffiths and Li (1983) and Tavaré (1984) expressed the time-evolution of the allele frequencies in terms of a stochastic process dual to the diffusion, and showed that the resulting expression is closely related to the spectral representation.

The duality approach was later extended by Barbour et al. (2000) to incorporate a general selection model. Although theoretically very interesting, this approach does not readily lead to efficient computation of the TDF, for the following reason: computation under the dual process requires evaluating the moments of the stationary distribution. Although the functional form of this distribution is known (Ethier and Kurtz, 1994; Barbour et al., 2000), the normalization constant and moments can only be computed analytically in special cases (Genz and Joyce, 2003), and numerical computation under a general model of diploid selection is difficult. Incidentally, this issue arises in various applications (e.g., Buzbas et al. 2009, 2011), and it has therefore received significant attention in the past; see, for example, Donnelly et al. (2001) and Buzbas and Joyce (2009).

Many decades ago, Kimura (1955, 1957) addressed the problem of finding an explicit spectral representation of the TDF for models with selection. Specifically, in the case of a diallelic model with special selection schemes, he employed a perturbation method to find the required eigenvalues and eigenfunctions of the diffusion generator. Being perturbative in the selection coefficient, this approach is accurate only for small selection parameters. Recently, Song and Steinrücken (2012) revisited this problem and developed an alternative method of deriving an explicit spectral representation of the TDF for the diallelic Wright-Fisher diffusion under a general diploid selection model with recurrent mutation. In contrast to Kimura’s approach, this new approach is non-perturbative and is applicable to a broad range of parameter values. The goal of the present paper is to extend the work of Song and Steinrücken (2012) to an arbitrary number K of alleles, assuming a PIM model with general diploid selection.

The rest of this paper is organized as follows. In Section 2, we lay out the necessary mathematical background and review the work of Song and Steinrücken (2012) in the case of a diallelic (K = 2) model with general diploid selection. In Section 3, we describe the spectral representation for the neutral PIM model with an arbitrary number K of alleles. Then, in Section 4, we generalize the method of Song and Steinrücken (2012) to an arbitrary K-allelic PIM model with general diploid selection. We demonstrate in Section 5 that the quantities involved in the spectral representation converge rapidly. Further, we discuss the computation of the normalization constant of the stationary distribution under mutation-selection balance and the rate of convergence to this distribution. We conclude in Section 6 with potential applications and extensions of our work.

2. Background

2.1. The Wright-Fisher diffusion

In this paper, we consider a single locus with K distinct possible alleles. The dynamics of the allele frequencies in a large population is commonly approximated by the Wright-Fisher diffusion on the (K − 1)-simplex

Δ_{K - 1} ≔ {x \in ℝ_{\geq 0}^{K - 1} : 1 - | x | \geq 0},

where $| x | = \sum_{i = 1}^{K - 1} x_{i}$ . For a given x = (x₁, …, x_K−1) ∈ Δ_K−1, the component x_i denotes the population frequency of allele i ∈ {1, …, K − 1}. The frequency of allele K is given by x_K = 1 − |x|.

The associated diffusion generator ℒ is a second-order differential operator of the form

ℒ f (x) = \frac{1}{2} \sum_{i, j = 1}^{K - 1} b_{i, j} (x) \frac{\partial^{2}}{\partial x_{i} \partial x_{j}} f (x) + \sum_{i = 1}^{K - 1} a_{i} (x) \frac{\partial}{\partial x_{i}} f (x),

(1)

which acts on twice continuously differentiable functions f : Δ_K−1 → ℝ. The diffusion coefficient b_i,j(x) is given by

b_{i, j} (x) = x_{i} (δ_{i, j} - x_{j}),

where the Kronecker delta δ_i,j is equal to 1 if i = j and 0 otherwise. For a neutral PIM model, we use θ_i = 4Nu_i to denote the population-scaled mutation rate associated with allele i, where u_i is the probability of mutation producing allele i per individual per generation and N is the effective population size. Under this model, the drift coefficient a_i(x) is given by

a_{i} (x) = \frac{1}{2} (θ_{i} - | θ | x_{i}),

where $θ = (θ_{1}, \dots, θ_{K}) \in ℝ_{> 0}^{K}$ and $| θ | = \sum_{i = 1}^{K} θ_{i}$.

Consider a general diploid selection model in which the relative fitness of a diploid individual with one copy of allele i and one copy of allele j is given by 1 + 2s_i,j. We measure fitness relative to that of an individual with two copies of allele K, thus s_K,K = 0. The diffusion generator in this case is given by ℒ = ℒ₀ + ℒ_σ, where ℒ₀ denotes the diffusion generator under neutrality and the additional term ℒ_σ, which captures the contribution from selection to the drift coefficient a_i(x), is given by

ℒ_{σ} = \sum_{i = 1}^{K - 1} x_{i} [σ_{i} (x) - σ̄ (x)] \frac{\partial}{\partial x_{i}},

where σ_i(x) denotes the marginal fitness of type i and σ̄(x) denotes the mean fitness of the population with allele frequencies x. More precisely,

σ_{i} (x) = \sum_{j = 1}^{K} σ_{i, j} x_{j}

(2)

and

σ̄ (x) ≔ \sum_{i, j = 1}^{K} σ_{i, j} x_{i} x_{j},

(3)

where σ_i,j = 2Ns_i,j. Intuitively, the population frequency of a given allele tends to increase if its marginal fitness is higher than the mean fitness of the population. The selection scheme is specified by a symmetric matrix $σ = {(σ_{i, j})}_{1 \leq i, j \leq K} \in ℝ^{K \times K}$ of population-scaled selection coefficients, with σ_K,K = 0.
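As a minimal sketch of definitions (2) and (3), the following Python snippet evaluates the marginal fitnesses, the mean fitness, and the selection drift x_i[σ_i(x) − σ̄(x)] for a given frequency vector. The function name `selection_drift` and the example inputs are illustrative assumptions and are not taken from the original work.

```python
def selection_drift(x_free, sigma):
    """Evaluate (2), (3) and the selection drift for all K alleles.

    x_free : frequencies (x_1, ..., x_{K-1}); x_K is implied as 1 - |x|.
    sigma  : symmetric K x K matrix of scaled selection coefficients.
    """
    K = len(sigma)
    x = list(x_free) + [1.0 - sum(x_free)]
    # Marginal fitness sigma_i(x) = sum_j sigma_{i,j} x_j, equation (2).
    marginal = [sum(sigma[i][j] * x[j] for j in range(K)) for i in range(K)]
    # Mean fitness sigma_bar(x) = sum_{i,j} sigma_{i,j} x_i x_j, equation (3).
    mean = sum(x[i] * marginal[i] for i in range(K))
    # Selection drift x_i * (sigma_i(x) - sigma_bar(x)) for each allele.
    drift = [x[i] * (marginal[i] - mean) for i in range(K)]
    return marginal, mean, drift


# Hypothetical example: three alleles with an arbitrary symmetric matrix.
sigma_example = [[12, 14, 15], [14, 11, 13], [15, 13, 0]]
marginal, mean, drift = selection_drift([0.2, 0.3], sigma_example)
```

Because the mean fitness is the frequency-weighted average of the marginal fitnesses, the drift components over all K alleles sum to zero, which gives a quick consistency check.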

The operators ℒ₀ and ℒ are elliptic inside the simplex Δ_K−1, but not on the boundaries. Thus, the precise domain of the generator is not straightforward to describe, but Epstein and Mazzeo (2011) give a suitable characterization.

2.2. Spectral representation of the transition density function

For t ≥ 0, the time evolution of a diffusion X_t on the simplex Δ_K−1 is described by the transition density function p(t; x, y)dy = ℙ[X_t ∈ dy | X₀ = x], where x, y ∈ Δ_K−1. The transition density function satisfies the Kolmogorov backward equation

\frac{\partial}{\partial t} p (t; x, y) = ℒ p (t; x, y),

(4)

where ℒ, the generator associated with the diffusion, is a differential operator in x.

We briefly review the framework underlying the spectral representation of the transition density function. The operator ℒ is said to be symmetric with respect to a density π : Δ_K−1 → ℝ_≥0 if, for all twice continuously differentiable functions f : Δ_K−1 → ℝ and g : Δ_K−1 → ℝ that belong to the domain of the operator, the following equality holds:

\int_{Δ_{K - 1}} [ℒ f (x)] g (x) π (x) d x = \int_{Δ_{K - 1}} f (x) [ℒ g (x)] π (x) d x .

A straightforward calculation using integration by parts yields that the diffusion generators described in Section 2.1 are symmetric with respect to their associated stationary densities.

Theorem 1.4.4 of Epstein and Mazzeo (2011) guarantees that an unbounded symmetric operator ℒ of the kind defined in Section 2.1 of this paper has countably many eigenvalues {−Λ₀, −Λ₁, −Λ₂, …}, which are real and non-positive, satisfying

0 \leq Λ_{0} \leq Λ_{1} \leq Λ_{2} \leq \dots,

with Λ_n → ∞ as n → ∞. An eigenfunction B_n : Δ_K−1 → ℝ with eigenvalue −Λ_n satisfies

ℒ B_{n} (x) = - Λ_{n} B_{n} (x),

(5)

and, furthermore, B_n(x) is an element of the Hilbert space L²(Δ_K−1, π(x)) of functions square integrable with respect to the density π(x), equipped with the canonical inner product 〈·, ·〉_π. If ℒ is symmetric with respect to π(x), then its eigenfunctions are orthogonal with respect to π(x):

{〈 B_{n}, B_{m} 〉}_{π} ≔ \int_{Δ_{K - 1}} B_{n} (x) B_{m} (x) π (x) d x = δ_{n, m} d_{n},

where δ_n,m is the Kronecker delta and d_n are some constants. In the cases considered in this paper, the eigenfunctions form a basis of the Hilbert space L²(Δ_K−1, π(x)).

It follows from equation (5) that exp(−Λ_nt)B_n(x) is a solution to the Kolmogorov backward equation (4). By linearity of (4), the sum of two solutions is again a solution. Combining the initial condition p(0; x, y) = δ(x − y) with the fact that {B_n(x)} form a basis yields the following spectral representation of the transition density:

p (t; x, y) = \sum_{n = 0}^{\infty} \frac{1}{d_{n}} e^{- Λ_{n} t} B_{n} (x) B_{n} (y) π (y) .

(6)

The initial condition being the Dirac delta δ(x − y) corresponds to the frequency at time zero being x.

2.3. Univariate Jacobi polynomials

The univariate Jacobi polynomials play an important role throughout this paper. Here we review some key facts about this particular type of classical orthogonal polynomials. An excellent treatise on univariate orthogonal polynomials can be found in Szegö (1939) and a comprehensive collection of useful formulas can be found in Abramowitz and Stegun (1965, Chapter 22).

The Jacobi polynomials $p_{n}^{(a, b)} (z)$ , for z ∈ [−1, 1], satisfy the differential equation

(1 - z^{2}) \frac{d^{2} f (z)}{d z^{2}} + [b - a - (a + b + 2) z] \frac{df (z)}{dz} + n (n + a + b + 1) f (z) = 0 .

(7)

For given a, b > −1, the set ${p_{n}^{(a, b)} (z)}_{n = 0}^{\infty}$ forms an orthogonal system on the interval [−1, 1] with respect to the weight function (1 − z)^a(1 + z)^b. For a more convenient correspondence with the diffusion parameters, we define the following modified Jacobi polynomials, for x ∈ [0, 1] and a, b > 0:

R_{n}^{(a, b)} (x) = p_{n}^{(b - 1, a - 1)} (2 x - 1) .

This definition is slightly different from that adopted by Griffiths and Spanò (2010).

Equation (7) implies that the modified Jacobi polynomials $R_{n}^{(a, b)} (x)$ , for x ∈ [0, 1], satisfy the differential equation

x (1 - x) \frac{d^{2} f (x)}{{dx}^{2}} + [a - (a + b) x] \frac{df (x)}{dx} + n (n + a + b - 1) f (x) = 0 .

(8)

For fixed a, b > 0, the set ${R_{n}^{(a, b)} (x)}_{n = 0}^{\infty}$ forms an orthogonal system on [0, 1] with respect to the weight function $x^{a - 1} {(1 - x)}^{b - 1}$. More precisely,

\int_{0}^{1} R_{n}^{(a, b)} (x) R_{m}^{(a, b)} (x) x^{a - 1} {(1 - x)}^{b - 1} dx = δ_{n, m} c_{n}^{(a, b)},

where δ_n,m denotes the Kronecker delta and

c_{n}^{(a, b)} = \frac{Γ (n + a) Γ (n + b)}{(2 n + a + b - 1) Γ (n + a + b - 1) Γ (n + 1)} .

(9)

Note that ${R_{n}^{(a, b)} (x)}_{n = 0}^{\infty}$ form a complete basis of the Hilbert space $L^{2} ([0, 1], x^{a - 1} {(1 - x)}^{b - 1})$.

For n ≥ 1, the modified Jacobi polynomial $R_{n}^{(a, b)} (x)$ satisfies the recurrence relation

x R_{n}^{(a, b)} (x) = \frac{(n + a - 1) (n + b - 1)}{(2 n + a + b - 1) (2 n + a + b - 2)} R_{n - 1}^{(a, b)} (x) + [\frac{1}{2} - \frac{b^{2} - a^{2} - 2 (b - a)}{2 (2 n + a + b) (2 n + a + b - 2)}] R_{n}^{(a, b)} (x) + \frac{(n + 1) (n + a + b - 1)}{(2 n + a + b) (2 n + a + b - 1)} R_{n + 1}^{(a, b)} (x),

(10)

while, for n = 0,

x R_{0}^{(a, b)} (x) = \frac{a}{a + b} R_{0}^{(a, b)} (x) + \frac{1}{a + b} R_{1}^{(a, b)} (x) .

(11)

Also, note that $R_{0}^{(a, b)} (x) \equiv 1$ . These recurrence relations play an important role in the work of Song and Steinrücken (2012), and the multivariate analogues, discussed later in Section 3.2, are similarly important for the present work.

The modified Jacobi polynomials satisfy other interesting relations, one of them being the following:

R_{n}^{(a, b)} (x) = \frac{n + a + b - 1}{2 n + a + b - 1} R_{n}^{(a, b + 1)} (x) - 𝟙_{{n > 0}} \frac{n + a - 1}{2 n + a + b - 1} R_{n - 1}^{(a, b + 1)} (x) .

(12)

Using this identity, polynomials with parameter b can be related to polynomials with parameter b + 1. We utilize this relation later.
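The recurrences (10) and (11) can be rearranged to evaluate the modified Jacobi polynomials numerically, and identity (12) then provides a convenient consistency check. The following is a minimal sketch under that reading; the function names `jacobi_R` and `identity_12_rhs` are illustrative assumptions, and the sketch assumes a, b > 0 with a + b ≠ 1.

```python
def jacobi_R(n_max, a, b, x):
    """Return [R_0(x), ..., R_{n_max}(x)] for the modified Jacobi polynomials.

    R_0 = 1 and R_1 = (a + b) x - a follow from (11); higher degrees follow
    from solving the three-term recurrence (10) for R_{n+1}.
    """
    R = [1.0]
    if n_max >= 1:
        R.append((a + b) * x - a)
    for n in range(1, n_max):
        s = 2 * n + a + b
        A = (n + a - 1) * (n + b - 1) / ((s - 1) * (s - 2))     # coefficient of R_{n-1}
        B = 0.5 - (b * b - a * a - 2 * (b - a)) / (2 * s * (s - 2))  # coefficient of R_n
        C = (n + 1) * (n + a + b - 1) / (s * (s - 1))           # coefficient of R_{n+1}
        R.append(((x - B) * R[n] - A * R[n - 1]) / C)
    return R


def identity_12_rhs(n, a, b, x):
    """Right-hand side of (12), built from polynomials with parameter b + 1."""
    R_shift = jacobi_R(n, a, b + 1, x)
    s = 2 * n + a + b - 1
    rhs = (n + a + b - 1) / s * R_shift[n]
    if n > 0:  # the indicator in (12) removes this term for n = 0
        rhs -= (n + a - 1) / s * R_shift[n - 1]
    return rhs


# Compare both sides of (12) at an arbitrary point for a few degrees.
a, b, x = 1.5, 2.5, 0.37
gaps = [abs(jacobi_R(n, a, b, x)[n] - identity_12_rhs(n, a, b, x)) for n in range(7)]
```

Forward evaluation of a three-term recurrence is a common choice for orthogonal polynomials on their interval of orthogonality; other evaluation schemes would serve equally well here.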

2.4. A review of the K = 2 case

To motivate the approach to be employed in the general case, we briefly review the work of Song and Steinrücken (2012) for deriving the transition density function in the diallelic (K = 2) case. The vector of mutation rates is given by θ = (α, β), while the symmetric matrix describing the general diploid selection scheme can be parametrized as

σ = \begin{pmatrix} 2 σ & 2 σ h \\ 2 σ h & 0 \end{pmatrix},

where σ is the selection strength and h the dominance parameter. For K = 2, the diffusion is one dimensional and the simplex Δ₁ is equal to the unit interval [0, 1]. With x denoting x₁, the generator (1) reduces to

ℒ f (x) = \frac{1}{2} x (1 - x) \frac{\partial^{2}}{\partial x^{2}} f (x) + {\frac{1}{2} [α - (α + β) x] + 2 σ x (1 - x) [x + h (1 - 2 x)]} \frac{\partial}{\partial x} f (x) .

In the neutral case (i.e., σ = 0), the modified Jacobi polynomials $R_{n}^{(α, β)} (x)$ are eigenfunctions of the diffusion generator with eigenvalues $- λ_{n}^{(α, β)}$, where $λ_{n}^{(α, β)} = \frac{1}{2} n (n - 1 + α + β)$. Hence, a spectral representation of the transition density function can be readily obtained via (6).
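For the neutral diallelic case, the spectral representation (6) can be evaluated directly from the modified Jacobi polynomials, the constants (9), and the stationary weight x^{α−1}(1 − x)^{β−1}. The sketch below is illustrative only: the function names, the truncation level `n_max`, and the parameter values are assumptions, and the code assumes α + β > 1 so that the Gamma functions in (9) are finite.

```python
import math


def jacobi_R(n_max, a, b, x):
    """Return [R_0(x), ..., R_{n_max}(x)] via the recurrences (10) and (11)."""
    R = [1.0]
    if n_max >= 1:
        R.append((a + b) * x - a)
    for n in range(1, n_max):
        s = 2 * n + a + b
        A = (n + a - 1) * (n + b - 1) / ((s - 1) * (s - 2))
        B = 0.5 - (b * b - a * a - 2 * (b - a)) / (2 * s * (s - 2))
        C = (n + 1) * (n + a + b - 1) / (s * (s - 1))
        R.append(((x - B) * R[n] - A * R[n - 1]) / C)
    return R


def c_norm(n, a, b):
    """Normalizing constant c_n^{(a,b)} from (9), computed with log-Gamma."""
    log_c = (math.lgamma(n + a) + math.lgamma(n + b)
             - math.lgamma(n + a + b - 1) - math.lgamma(n + 1))
    return math.exp(log_c) / (2 * n + a + b - 1)


def neutral_density(t, x, y, a, b, n_max=40):
    """Truncated version of (6) for K = 2 without selection."""
    Rx = jacobi_R(n_max, a, b, x)
    Ry = jacobi_R(n_max, a, b, y)
    weight = y ** (a - 1) * (1 - y) ** (b - 1)  # unnormalized stationary density
    total = 0.0
    for n in range(n_max + 1):
        lam = 0.5 * n * (n - 1 + a + b)  # lambda_n for the neutral generator
        total += math.exp(-lam * t) * Rx[n] * Ry[n] / c_norm(n, a, b)
    return total * weight


# Hypothetical usage: density of moving from x = 0.3 to y = 0.6 in time t = 0.5.
p_example = neutral_density(0.5, 0.3, 0.6, 2.0, 2.0)
```

For large t only the n = 0 term survives, so the sketch should approach the Beta(α, β) density; this limit is a useful sanity check on the normalization.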

In the non-neutral case (i.e., σ ≠ 0), consider the functions $S_{n}^{θ} (x) = e^{- σ̄ (x) / 2} R_{n}^{(α, β)} (x)$, which form an orthogonal basis of the Hilbert space $L^{2} ([0, 1], e^{σ̄ (x)} x^{α - 1} {(1 - x)}^{β - 1})$, where $e^{σ̄ (x)} x^{α - 1} {(1 - x)}^{β - 1}$ corresponds to the stationary distribution of the non-neutral diffusion, up to a multiplicative constant. Since the eigenfunctions B_n(x) of the diffusion generator are elements of this Hilbert space, we can pose an expansion $B_{n} (x) = \sum_{m = 0}^{\infty} w_{n, m} S_{m}^{θ} (x)$ in terms of the basis functions $S_{n}^{θ} (x)$, where w_n,m are to be determined. Then, the eigenvalue equation ℒB_n(x) = −Λ_nB_n(x) implies the algebraic equation

\sum_{m = 0}^{\infty} w_{n, m} [λ_{m}^{(α, β)} + Q (x; α, β, σ, h)] R_{m}^{(α, β)} (x) = Λ_{n} \sum_{m = 0}^{\infty} w_{n, m} R_{m}^{(α, β)} (x),

where Q(x; α, β, σ, h) is a polynomial in x of degree four. Utilizing the recurrence relations in (10) and (11), one can then arrive at a linear system Mw_n = Λ_nw_n, where w_n = (w_n,0, w_n,1, w_n,2, …) is an infinite-dimensional vector of variables and M is a sparse infinite-dimensional matrix with entries that depend on the index n and the parameters α, β, σ, h of the model. The infinite linear system Mw_n = Λ_nw_n is approximated by a finite-dimensional truncated linear system

M^{[D]} w_{n}^{[D]} = Λ_{n}^{[D]} w_{n}^{[D]},

where $w_{n}^{[D]} = (w_{n, 0}^{[D]}, w_{n, 1}^{[D]}, \dots, w_{n, D - 1}^{[D]})$ and M^[D] is the submatrix of M consisting of its first D rows and D columns. This finite-dimensional linear system can be easily solved using standard linear algebra to obtain the eigenvalues $Λ_{n}^{[D]}$ and the eigenvectors $w_{n}^{[D]}$ of M^[D]. Song and Steinrücken observed that $Λ_{n}^{[D]}$ and $w_{n, m}^{[D]}$ converge very rapidly as the truncation level D increases. Finally, the coefficients $w_{n, m}^{[D]}$ can be used to approximate the eigenfunctions B_n(x), and, together with the eigenvalues $Λ_{n}^{[D]}$, an efficient approximation of the transition density function can be obtained via (6).
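The truncation procedure can be sketched in a few lines for K = 2. The code below is a hedged illustration, not the authors' implementation: it builds the matrix of multiplication by x from the recurrences (10) and (11), forms the degree-four polynomial Q by substituting that matrix into the general expression for Q(x; σ, θ) given in Section 4 (specialized to two alleles), and finds the smallest eigenvalue of M^[D] by shifted inverse iteration. It uses the row-vector (left eigenvector) convention of Section 4. All function names, the padding by four extra rows, the shift, and the parameter values are assumptions.

```python
def jacobi_x_matrix(a, b, size):
    """G[n][m] = coefficient of R_m in x * R_n, from (10) and (11)."""
    G = [[0.0] * size for _ in range(size)]
    G[0][0] = a / (a + b)
    if size > 1:
        G[0][1] = 1.0 / (a + b)
    for n in range(1, size):
        s = 2 * n + a + b
        G[n][n - 1] = (n + a - 1) * (n + b - 1) / ((s - 1) * (s - 2))
        G[n][n] = 0.5 - (b * b - a * a - 2 * (b - a)) / (2 * s * (s - 2))
        if n + 1 < size:
            G[n][n + 1] = (n + 1) * (n + a + b - 1) / (s * (s - 1))
    return G


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def lincomb(*terms):
    """Entrywise sum of coef * matrix over (coef, matrix) pairs."""
    size = len(terms[0][1])
    return [[sum(c * A[i][j] for c, A in terms) for j in range(size)] for i in range(size)]


def truncated_M(alpha, beta, sigma, h, D):
    """M^[D] = diag(lambda_m) + Q(G), restricted to the first D rows and columns."""
    size = D + 4  # padding so that the leading D x D block of Q(G) is exact
    X = jacobi_x_matrix(alpha, beta, size)
    I = [[float(i == j) for j in range(size)] for i in range(size)]
    Y = lincomb((1.0, I), (-1.0, X))                     # represents 1 - x
    s1 = lincomb((2 * sigma, X), (2 * sigma * h, Y))     # marginal fitness of allele 1
    s2 = lincomb((2 * sigma * h, X))                     # marginal fitness of allele 2
    sbar = lincomb((1.0, matmul(X, s1)), (1.0, matmul(Y, s2)))  # mean fitness
    Q = lincomb((0.5, matmul(X, matmul(s1, s1))), (0.5, matmul(Y, matmul(s2, s2))),
                (0.5 * alpha, s1), (0.5 * beta, s2), (sigma, X),
                (-0.5 * (1 + alpha + beta), sbar), (-0.5, matmul(sbar, sbar)))
    M = [row[:D] for row in Q[:D]]
    for n in range(D):
        M[n][n] += 0.5 * n * (n - 1 + alpha + beta)
    return M


def solve(A, rhs):
    """Solve A v = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (aug[r][n] - sum(aug[r][k] * v[k] for k in range(r + 1, n))) / aug[r][r]
    return v


def smallest_eigenvalue(M, shift=-0.1, iters=60):
    """Inverse iteration on the transpose of M, i.e. for left eigenvectors u M = Lambda u."""
    n = len(M)
    A = [[M[j][i] - (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    u, lam = [1.0] * n, None
    for _ in range(iters):
        v = solve(A, u)
        lam = shift + sum(ui * ui for ui in u) / sum(ui * vi for ui, vi in zip(u, v))
        scale = max(abs(vi) for vi in v)
        u = [vi / scale for vi in v]
    return lam, u


# Hypothetical parameters; the smallest eigenvalue is expected to be near zero
# once the truncation level is large enough.
Lambda0, u0 = smallest_eigenvalue(truncated_M(1.0, 1.0, 1.0, 0.5, 20))
```

In practice a dense eigenvalue routine from a numerical library would replace the hand-written inverse iteration; the pure-Python version is used here only to keep the sketch self-contained.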

3. The Neutral Case with an Arbitrary Number of Alleles

In this section, we describe the spectral representation of the transition density of a neutral PIM model with an arbitrary number K of alleles. As in the case of K = 2, reviewed in Section 2.4, for an arbitrary K the eigenfunctions in the neutral case can be used to construct the eigenfunctions in the case with selection. The latter case is considered in Section 4.

3.1. Multivariate Jacobi polynomials

In what follows, let ℕ₀ = {0, 1, 2, …} denote the set of non-negative integers. As in Griffiths and Spanò (2011), we define the following system of multivariate orthogonal polynomials in K − 1 variables:

Definition 1. For each vector $n = (n_{1}, \dots, n_{K - 1}) \in ℕ_{0}^{K - 1}$ and $θ = (θ_{1}, \dots, θ_{K}) \in ℝ_{\geq 0}^{K}$, the orthogonal polynomial $P_{n}^{θ} (x)$ is defined as

P_{n}^{θ} (x) = \prod_{j = 1}^{K - 1} [{(1 - \frac{x_{j}}{1 - \sum_{i = 1}^{j - 1} x_{i}})}^{N_{j}} R_{n_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})} (\frac{x_{j}}{1 - \sum_{i = 1}^{j - 1} x_{i}})],

where $N_{j} = \sum_{i = j + 1}^{K - 1} n_{i}$ and $Θ_{j} = \sum_{i = j + 1}^{K} θ_{i}$.
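Definition 1 translates directly into a product of univariate evaluations. The sketch below is one possible reading of the definition, with the univariate polynomials evaluated through the recurrences (10) and (11); the function names are illustrative assumptions, and x is assumed to lie in the interior of the simplex so that no denominator vanishes.

```python
def jacobi_R(n_max, a, b, x):
    """Return [R_0(x), ..., R_{n_max}(x)] via the recurrences (10) and (11)."""
    R = [1.0]
    if n_max >= 1:
        R.append((a + b) * x - a)
    for n in range(1, n_max):
        s = 2 * n + a + b
        A = (n + a - 1) * (n + b - 1) / ((s - 1) * (s - 2))
        B = 0.5 - (b * b - a * a - 2 * (b - a)) / (2 * s * (s - 2))
        C = (n + 1) * (n + a + b - 1) / (s * (s - 1))
        R.append(((x - B) * R[n] - A * R[n - 1]) / C)
    return R


def multivariate_P(n, theta, x):
    """Evaluate P_n^theta(x) from Definition 1.

    n     : index vector (n_1, ..., n_{K-1})
    theta : mutation parameters (theta_1, ..., theta_K)
    x     : frequencies (x_1, ..., x_{K-1}) in the interior of the simplex
    """
    K = len(theta)
    value = 1.0
    remaining = 1.0  # 1 - sum_{i < j} x_i
    for j in range(K - 1):  # zero-based position of the one-based index j
        N_j = sum(n[j + 1:])          # N_j = n_{j+1} + ... + n_{K-1}
        Theta_j = sum(theta[j + 1:])  # Theta_j = theta_{j+1} + ... + theta_K
        z = x[j] / remaining
        value *= (1 - z) ** N_j * jacobi_R(n[j], theta[j], Theta_j + 2 * N_j, z)[n[j]]
        remaining -= x[j]
    return value


# Hypothetical usage for K = 3.
value_example = multivariate_P((0, 1), (0.5, 1.0, 1.5), (0.2, 0.3))
```

For K = 2 the product has a single factor and the function reduces to the univariate modified Jacobi polynomial, which is a simple way to test an implementation.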

For x = (x₁, …, x_K−1) ∈ Δ_K−1, let Π₀(x) denote an unnormalized density of the Dirichlet distribution with parameter θ = (θ₁, …, θ_K):

Π_{0} (x) = \prod_{i = 1}^{K} x_{i}^{θ_{i} - 1},

(13)

where x_K = 1 − |x|. The following lemma, a proof of which is provided in Appendix A, states that the above multivariate Jacobi polynomials $P_{n}^{θ} (x)$ are orthogonal with respect to Π₀(x):

Lemma 2. For $n, m \in ℕ_{0}^{K - 1}$ ,

\int_{Δ_{K - 1}} P_{n}^{θ} (x) P_{m}^{θ} (x) Π_{0} (x) d x = δ_{n, m} C_{n}^{θ},

where $δ_{n, m} = \prod_{i = 1}^{K - 1} δ_{n_{i}, m_{i}}$ and

C_{n}^{θ} ≔ \prod_{i = 1}^{K - 1} c_{n_{i}}^{(θ_{i}, Θ_{i} + 2 N_{i})},

(14)

with $c_{n}^{(a, b)}$ defined in (9).

Remark: The multivariate Jacobi polynomials form a complete basis of L²(Δ_K−1, Π₀(x)), the Hilbert space of functions on Δ_K−1 square integrable with respect to the unnormalized Dirichlet density Π₀(x).

3.2. Recurrence relation for multivariate Jacobi polynomials

Recall that the univariate Jacobi polynomials satisfy the recurrence relation (10). Theorem 3.2.1 of Dunkl and Xu (2001) guarantees that the multivariate Jacobi polynomials satisfy a similar recurrence relation. More precisely, we have the following lemma, the proof of which is provided in Appendix A:

Lemma 3. Given $n = (n_{1}, \dots, n_{K - 1}) \in ℕ_{0}^{K - 1}$ and $m = (m_{1}, \dots, m_{K - 1}) \in ℕ_{0}^{K - 1}$, define $N_{j} = \sum_{i = j + 1}^{K - 1} n_{i}$ and $M_{j} = \sum_{i = j + 1}^{K - 1} m_{i}$. For given i ∈ {1, …, K − 1} and n, $P_{n}^{θ} (x)$ satisfies the recurrence relation

x_{i} P_{n}^{θ} (x) = \sum_{m \in ℳ_{i} (n)} r_{n, m}^{(θ, i)} P_{m}^{θ} (x),

(15)

where $r_{n, m}^{(θ, i)}$ are known constants (provided in Appendix A) and

ℳ_{i} (n) ≔ \{m \in ℕ_{0}^{K - 1} : M_{j} = N_{j} \text{ for all } j > i \text{ and } | M_{j} - N_{j} | \leq 1 \text{ for all } j \leq i\} .

(16)

Impose an ordering on the (K − 1)-dimensional index vectors $n \in ℕ_{0}^{K - 1}$ . Then, the recurrence (15) can be represented as

x_{i} P_{n}^{θ} (x) = \sum_{m \in ℳ_{i} (n)} {[𝒢_{i}^{θ}]}_{n, m} P_{m}^{θ} (x),

where $𝒢_{i}^{θ}$ corresponds to an infinite dimensional matrix in which columns and rows are indexed by the ordered (K − 1)-tuples, and the (n, m)-th entry is defined as

{[𝒢_{i}^{θ}]}_{n, m} = \begin{cases} r_{n, m}^{(θ, i)}, & \text{if } m \in ℳ_{i} (n), \\ 0, & \text{otherwise} . \end{cases}

(17)

Note that, for each given n, the row of $𝒢_{i}^{θ}$ indexed by n has only finitely many non-zero entries. One can deduce the following corollary from the new representation:

Corollary 4. Let $a = {(a_{n})}_{n \in ℕ_{0}^{K - 1}}$ be such that $\sum_{n \in ℕ_{0}^{K - 1}} a_{n}^{2} C_{n}^{θ} < \infty$ . Then

x_{i} \cdot \sum_{n \in ℕ_{0}^{K - 1}} a_{n} P_{n}^{θ} (x) = \sum_{n \in ℕ_{0}^{K - 1}} b_{n} P_{n}^{θ} (x),

where ${(b_{n})}_{n \in ℕ_{0}^{K - 1}} = a \cdot 𝒢_{i}^{θ}$ .

Remark: Since under multiplication x_i commutes with x_j for 1 ≤ i, j ≤ K − 1, the corresponding matrices $𝒢_{i}^{θ}$ and $𝒢_{j}^{θ}$ also commute.

3.3. Eigenfunctions of the neutral generator ℒ₀

It is well known that the stationary distribution of the Wright-Fisher diffusion under a neutral PIM model is the Dirichlet distribution (Wright, 1949). The density of the Dirichlet distribution is a weight function with respect to which the associated diffusion generator ℒ₀ is symmetric. As discussed in Section 3.1, the multivariate Jacobi polynomials $P_{n}^{θ} (x)$ are orthogonal with respect to the weight function Π₀(x), which is equal to the density of the Dirichlet distribution up to a multiplicative constant. Given the discussion in Section 2.2, one might then suspect that $P_{n}^{θ} (x)$ are potential eigenfunctions of ℒ₀. The following lemma establishes that this is indeed the case:

Lemma 5. For all $n \in ℕ_{0}^{K - 1}$ , the multivariate Jacobi polynomials $P_{n}^{θ} (x)$ satisfy

ℒ_{0} P_{n}^{θ} (x) = - λ_{| n |}^{θ} P_{n}^{θ} (x),

where

λ_{| n |}^{θ} = \frac{1}{2} | n | (| n | - 1 + | θ |) .

(18)

That is, $P_{n}^{θ} (x)$ are eigenfunctions of ℒ₀ with eigenvalues $- λ_{| n |}^{θ}$ .

A proof of this lemma is deferred to Appendix A. We conclude this section with a few comments.

Remarks:

Substituting the eigenvalues and eigenfunctions into the spectral representation (6), we obtain

p (t; x, y) = \sum_{n \in ℕ_{0}^{K - 1}} \frac{1}{C_{n}^{θ}} e^{- λ_{| n |}^{θ} t} P_{n}^{θ} (x) P_{n}^{θ} (y) Π_{0} (y) .

(19)

For every $n \in ℕ_{0}^{K - 1}$, note that $- λ_{| n |}^{θ}$ only depends on the norm |n|, which implies degeneracy in the spectrum of ℒ₀. Griffiths (1979) constructed orthogonal kernel polynomials indexed by |n|, that is the sum over all orthogonal polynomials with index summing to |n|, and obtained the transition density expansion (19).

4. A General Diploid Selection Case with an Arbitrary Number of Alleles

In this section, we derive the spectral representation of the transition density function of the Wright-Fisher diffusion under a K-allelic PIM model with general diploid selection. This work extends the work of Song and Steinrücken (2012), the special case of K = 2 briefly summarized in Section 2.4, to an arbitrary number K of alleles. The recurrence relation presented in Lemma 3 plays a crucial role in the following derivation.

Recall that the backward generator for the full model is ℒ = ℒ₀ + ℒ_σ, where ℒ₀ corresponds to the generator under neutrality and ℒ_σ corresponds to the contribution from selection. The diffusion has a unique stationary density [see Ethier and Kurtz (1994) or Barbour et al. (2000)] proportional to

Π (x) ≔ e^{σ̄ (x)} Π_{0} (x),

(20)

where Π₀(x) is defined in (13) and σ̄(x) is the mean fitness defined in (3). As mentioned in Section 2.2, ℒ is symmetric with respect to Π(x). For n ∈ ℕ₀, we aim to find the eigenvalues −Λ_n and the eigenfunctions B_n of ℒ such that

ℒ B_{n} (x) = - Λ_{n} B_{n} (x) .

(21)

By convention, we place Λ_n in non-decreasing order. The symmetry of ℒ implies that {B_n(x)} form an orthogonal system with respect to Π(x), that is

\int_{Δ_{K - 1}} B_{n} (x) B_{m} (x) Π (x) d x \propto δ_{n, m} .

Such a system of orthogonal functions, however, is not unique. The orthogonality of ${P_{n}^{θ}}$ with respect to Π₀, established in Lemma 2, can be used to show that the functions

S_{n}^{θ} (x) ≔ P_{n}^{θ} (x) e^{- σ̄ (x) / 2}

(22)

are orthogonal with respect to Π, as are B_n(x). Furthermore, the fact that ${P_{n}^{θ} (x)}$ form a complete basis of L²(Δ_K−1, Π₀(x)) means that ${S_{n}^{θ} (x)}$ is a complete basis of L²(Δ_K−1, Π(x)). Since B_n ∈ L²(Δ_K−1, Π(x)), we thus seek to represent B_n(x) as a linear combination of the basis functions $S_{m}^{θ} (x)$:

B_{n} (x) = \sum_{m \in ℕ_{0}^{K - 1}} u_{n, m} S_{m}^{θ} (x),

(23)

where u_n,m are some constants to be determined.

Define an index set $ℐ = \cup_{L = 0}^{4} {\{1, \dots, K - 1\}}^{L}$ and, for i = (i₁, …, i_L) ∈ ℐ, define $x_{i} = x_{i_{1}} \cdots x_{i_{L}}$. We have the following theorem for solving the eigensystem associated with the full generator ℒ:

Theorem 6. For all n ∈ ℕ₀, the eigenfunction B_n(x) of ℒ can be represented by (23). The corresponding eigenvalues −Λ_n and the coefficients u_n,m can be found by solving the infinite-dimensional eigensystem

u_{n} M = u_{n} Λ_{n},

(24)

where $u_{n} = {(u_{n, m})}_{m \in ℕ_{0}^{K - 1}}$ and

M = diag ({λ_{| m |}^{θ}}_{m \in ℕ_{0}^{K - 1}}) + \sum_{i \in ℐ} q (i) 𝒢_{i}^{θ} .

Here, $λ_{| m |}^{θ}$ is defined as in (18) and, for i = (i₁, …, i_L) ∈ ℐ, we define $𝒢_{i}^{θ} = 𝒢_{i_{1}}^{θ} \cdots 𝒢_{i_{L}}^{θ}$, where each factor $𝒢_{i_{l}}^{θ}$ is given in (17). When L = 0, $𝒢_{i}^{θ}$ is defined to be the identity matrix. Explicit expressions of the constants q(i) are provided in Appendix C.

Remarks:

Although M is infinite dimensional, it is in fact sparse with only finitely many non-zero entries in every row and column.
Equation (24) implies that Λ_n and u_n are in fact the eigenvalues and left eigenvectors of M. Solving the eigensystem in practice requires truncating the matrix to finite dimensions, as in the K = 2 case described in Section 2.4. For a given n ∈ ℕ₀, we would like both Λ_n and u_n,m to converge as the truncation level increases. In Section 5, we demonstrate empirically that this is indeed the case.

We now provide a proof of the theorem.

Proof of Theorem 6. Substituting (23) into (21) we obtain

\sum_{k \in ℕ_{0}^{K - 1}} u_{n, k} ℒ S_{k}^{θ} (x) = - \sum_{k \in ℕ_{0}^{K - 1}} Λ_{n} u_{n, k} S_{k}^{θ} (x) .

It is shown in Appendix D that

ℒ S_{k}^{θ} (x) = - e^{- σ̄ (x) / 2} [λ_{| k |}^{θ} P_{k}^{θ} (x) + Q (x; σ, θ) P_{k}^{θ} (x)],

(25)

where

Q (x; σ, θ) = \frac{1}{2} [\sum_{i = 1}^{K} x_{i} σ_{i}^{2} (x) + \sum_{i = 1}^{K} θ_{i} σ_{i} (x) + \sum_{i = 1}^{K} x_{i} σ_{i, i} - (1 + | θ |) σ̄ (x) - σ̄ {(x)}^{2}],

with σ_i(x) and σ̄(x) defined as in (2) and (3), respectively. Thus, one arrives at the following equation:

\sum_{k \in ℕ_{0}^{K - 1}} Λ_{n} u_{n, k} P_{k}^{θ} (x) = \sum_{k \in ℕ_{0}^{K - 1}} u_{n, k} [λ_{| k |}^{θ} P_{k}^{θ} (x) + Q (x; σ, θ) P_{k}^{θ} (x)] .

(26)

We solve the equation by first representing Q(x; σ, θ) $P_{k}^{θ} (x)$ as a finite linear combination of ${P_{k}^{θ} (x)}_{k \in ℕ_{0}^{K - 1}}$ . Observe that Q is in fact a fourth-order polynomial in x. Collecting terms, Q can be written in the form

Q (x; σ, θ) = \sum_{i \in ℐ} q (i) x_{i},

(27)

for the constants q(i) given in Appendix C. Applying Corollary 4 recursively, we obtain

Q (x; σ, θ) P_{k}^{θ} (x) = \sum_{i \in ℐ} q (i) \sum_{l \in ℕ_{0}^{K - 1}} {[𝒢_{i}^{θ}]}_{k, l} P_{l}^{θ} (x) .

Finally, substituting this equation into (26), multiplying both sides of (26) by $P_{m}^{θ} (x)$ , and integrating with respect to Π₀(x) over the simplex Δ_K−1 yields the matrix equation (24).
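Identity (25) can be spot-checked numerically in the diallelic case, where the generator reduces to the one-dimensional operator of Section 2.4. The sketch below applies that operator to S_n = e^{−σ̄/2} R_n using central finite differences and compares the result with the right-hand side of (25). It is a rough illustration under stated assumptions: the function names, the step size `eps`, and the test parameters are hypothetical, and a finite-difference check only confirms agreement up to discretization error.

```python
import math


def jacobi_R(n_max, a, b, x):
    """Return [R_0(x), ..., R_{n_max}(x)] via the recurrences (10) and (11)."""
    R = [1.0]
    if n_max >= 1:
        R.append((a + b) * x - a)
    for n in range(1, n_max):
        s = 2 * n + a + b
        A = (n + a - 1) * (n + b - 1) / ((s - 1) * (s - 2))
        B = 0.5 - (b * b - a * a - 2 * (b - a)) / (2 * s * (s - 2))
        C = (n + 1) * (n + a + b - 1) / (s * (s - 1))
        R.append(((x - B) * R[n] - A * R[n - 1]) / C)
    return R


def mean_fitness(x, sigma, h):
    """sigma_bar(x) for the K = 2 selection matrix of Section 2.4."""
    return 2 * sigma * x * x + 4 * sigma * h * x * (1 - x)


def Q_poly(x, alpha, beta, sigma, h):
    """Q(x; sigma, theta) specialized to K = 2, with x_1 = x and x_2 = 1 - x."""
    s1 = 2 * sigma * x + 2 * sigma * h * (1 - x)  # marginal fitness of allele 1
    s2 = 2 * sigma * h * x                        # marginal fitness of allele 2
    sbar = x * s1 + (1 - x) * s2
    return 0.5 * (x * s1 ** 2 + (1 - x) * s2 ** 2 + alpha * s1 + beta * s2
                  + 2 * sigma * x - (1 + alpha + beta) * sbar - sbar ** 2)


def generator(f, x, alpha, beta, sigma, h, eps=1e-4):
    """Apply the K = 2 generator to f at x using central differences."""
    d1 = (f(x + eps) - f(x - eps)) / (2 * eps)
    d2 = (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2
    drift = 0.5 * (alpha - (alpha + beta) * x) \
        + 2 * sigma * x * (1 - x) * (x + h * (1 - 2 * x))
    return 0.5 * x * (1 - x) * d2 + drift * d1


def check_identity_25(n, x, alpha, beta, sigma, h):
    """Return (left-hand side, right-hand side) of (25) for K = 2."""
    def S(y):
        return math.exp(-0.5 * mean_fitness(y, sigma, h)) * jacobi_R(n, alpha, beta, y)[n]
    lhs = generator(S, x, alpha, beta, sigma, h)
    lam = 0.5 * n * (n - 1 + alpha + beta)
    rhs = -math.exp(-0.5 * mean_fitness(x, sigma, h)) \
        * (lam + Q_poly(x, alpha, beta, sigma, h)) * jacobi_R(n, alpha, beta, x)[n]
    return lhs, rhs


lhs_example, rhs_example = check_identity_25(3, 0.4, 0.7, 1.3, 1.0, 0.3)
```

The two returned values should agree up to the finite-difference error, which shrinks as `eps` is reduced until round-off dominates.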

5. Empirical Results and Applications

In this section, we study the convergence behavior of the eigenvalues and eigenvectors as we approximate the solutions of (24). Further, we show how the spectral representation can be employed to obtain the transient and stationary density explicitly (especially the normalizing constant), and to characterize the convergence rate of the diffusion to stationarity. A Mathematica implementation of the relevant formulas for computing the spectral representation is available from the authors upon request.

5.1. Convergence of the eigenvalues and eigenvectors

In what follows we order the Jacobi polynomials according to the graded lexicographic ordering of their corresponding indices. Thus $P_{n_{1}}^{θ} < P_{n_{2}}^{θ}$ if

|n₁| < |n₂|, or
|n₁| = |n₂| and n₁ is lexicographically smaller than n₂.

Fix K and note that, for a given truncation level D ∈ ℕ₀ and l ∈ ℕ₀, there are $\binom{l + K - 2}{K - 2}$ polynomials $P_{n}^{θ}$ with |n| = l, and $𝒰 (D) ≔ \binom{D + K - 1}{K - 1}$ polynomials with index |n| ≤ D. For the computations in the rest of this section we chose K = 3, unless otherwise stated.
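The graded lexicographic ordering and the counts above are easy to reproduce. The following sketch enumerates all index vectors with |n| ≤ D in that order; the function name `graded_lex_indices` is an illustrative assumption.

```python
from itertools import product
from math import comb


def graded_lex_indices(K, D):
    """All n in N_0^{K-1} with |n| <= D, sorted first by |n| and then lexicographically."""
    indices = [n for n in product(range(D + 1), repeat=K - 1) if sum(n) <= D]
    indices.sort(key=lambda n: (sum(n), n))
    return indices


# Hypothetical usage for K = 3 and truncation level D = 2.
order = graded_lex_indices(3, 2)
# The length of the list matches U(D) = binom(D + K - 1, K - 1).
count_matches = len(order) == comb(2 + 3 - 1, 3 - 1)
```

The position of an index vector in this list can serve as the row and column index when the truncated matrix M^[D] is assembled.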

Now, one can obtain a finite-dimensional linear system approximating (24) by truncation, that is, taking only those entries in M and u_n whose associated index vectors satisfy |n| ≤ D. More explicitly, with M^[D] = ([M]_k,l) ∈ ℝ^{𝒰(D)×𝒰(D)} and $u_{n}^{[D]} = (u_{n, k}) \in ℝ^{𝒰 (D)}$ , where $k, l \in ℕ_{0}^{K - 1}$ such that |k|, |l| ≤ D, the solutions of

u_{n}^{[D]} M^{[D]} = u_{n}^{[D]} Λ_{n}^{[D]}

should approximate the solutions Λ_n and u_n of the infinite system (24). The convergence patterns of $Λ_{n}^{[D]}$ and $u_{n, k}^{[D]}$ as D increases are exemplified in Figure 1 for the parameters

i) K = 3, θ = (0.01, 0.02, 0.03), $σ = σ_{1} ≔ \begin{pmatrix} 12 & 14 & 15 \\ 14 & 11 & 13 \\ 15 & 13 & 0 \end{pmatrix}$; and
ii) K = 3, θ = (0.01, 0.02, 0.03), $σ = σ_{2} ≔ \begin{pmatrix} 120 & 140 & 150 \\ 140 & 110 & 130 \\ 150 & 130 & 0 \end{pmatrix}$.

Figure 2 displays the convergence behavior for the parameters

iii)
K = 3, θ = (10, 20, 30), σ = σ₁; and
iv)
K = 4, θ = (0.01, 0.02, 0.03, 0.04), $σ = σ_{3} : = (\begin{matrix} 12 & 14 & 15 & 16 \\ 14 & 11 & 10 & 13 \\ 15 & 10 & 9 & 14 \\ 16 & 13 & 14 & 0 \end{matrix})$ .
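The truncation step itself can be illustrated on a toy problem. The sketch below is only an illustration under loud assumptions: `toy_matrix` is a hypothetical stand-in and is not the matrix M of equation (24). It merely combines a quadratically growing diagonal with a bounded off-diagonal band. The left eigenvectors in u M = u Λ are obtained as right eigenvectors of the transpose, using NumPy.

```python
import numpy as np


def toy_matrix(D, theta_sum=0.06, s=3.0):
    """Hypothetical stand-in for M^[D]; NOT the matrix from equation (24)."""
    n = np.arange(D + 1)
    M = np.diag(0.5 * n * (n - 1 + theta_sum))   # quadratically growing diagonal
    M += np.diag(np.full(D, s), k=1)             # coupling to the next index
    M += np.diag(np.full(D, 0.5 * s), k=-1)      # coupling to the previous index
    return M


def left_eigensystem(M):
    """Solve u M = Lambda u via the right eigenvectors of the transpose of M."""
    values, vectors = np.linalg.eig(M.T)
    order = np.argsort(values.real)
    # Row n of the returned array is the left eigenvector u_n.
    return values.real[order], vectors.real[:, order].T


# The smallest eigenvalues stabilize as the truncation level grows.
for D in (10, 20, 40):
    lam, u = left_eigensystem(toy_matrix(D))
    print(D, lam[:3])
```

In this toy setting the low end of the spectrum is already stable at moderate D; how quickly the actual system converges depends on the selection coefficients, as discussed in the text.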

Figure 1. Convergence of the truncated eigenvalues Λ_n and coefficients of the eigenvectors u_n as the truncation level D increases, for K = 3 with low mutation rates. Subfigures (a) and (b) show $Λ_{0}^{[D]}$, $Λ_{75}^{[D]}$, and $Λ_{150}^{[D]}$ for σ = σ₁ and σ = σ₂, respectively. Subfigures (c) and (d) show $u_{75, (8, 2)}^{[D]}$, $u_{75, (7, 3)}^{[D]}$, and $u_{75, (6, 4)}^{[D]}$ for σ = σ₁ and σ = σ₂, respectively. The mutation rates were set to θ = (0.01, 0.02, 0.03) for all computations.

Figure 2. Convergence of the truncated eigenvalues Λ_n and coefficients of the eigenvectors u_n as the truncation level D increases, for K = 3 with high mutation rates and for K = 4 with low mutation rates. Subfigures (a) and (c) show $Λ_{n}^{[D]}$ for n = 0, 75, 150, and $u_{75, m}^{[D]}$ for m = (8, 2), (7, 3), (6, 4), respectively, for mutation rates θ = (10, 20, 30) and selection coefficients σ = σ₁. The convergence behavior for K = 4 is shown in subfigures (b) and (d) for $Λ_{n}^{[D]}$ with n = 0, 75, 150, and $u_{75, m}^{[D]}$ with m = (3, 2, 4), (5, 3, 4), (3, 5, 2), respectively; the mutation rates were set to θ = (0.01, 0.02, 0.03, 0.04) and the selection coefficients to σ = σ₃.

In all cases, $Λ_{n}^{[D]}$ and $u_{n, k}^{[D]}$ converge with increasing truncation level to empirical limits. The eigenvalues $Λ_{n}^{[D]}$ decrease towards the empirical limit, whereas the coefficients $u_{n, k}^{[D]}$ show oscillatory behavior before ultimately stabilizing. The rate of convergence is faster for smaller selection intensity. Varying the mutation parameters does not influence convergence behavior significantly. As expected, $Λ_{0}^{[D]}$ converges rapidly to zero in all cases, consistent with the fact that the diffusion has a stationary distribution. For a fixed n, $Λ_{n}^{[D]}$ and its associated coefficients $u_{n, k}^{[D]}$ roughly converge at similar truncation levels.

Figure 3 shows $Λ_{n}^{[D]}$ for D = 24 and 0 ≤ n ≤ 35 under neutrality (σ = 0) and selection (σ = σ₁ and $σ = \frac{1}{4} σ_{2}$ ). Upon inspection all of the eigenvalues displayed have converged properly. Under neutrality, the eigenvalues are functions of |n|, thus they are degenerate and cluster into groups. In the presence of selection, however, we empirically observe that all of the eigenvalues are distinct. For moderate selection intensity, the group structure is less prominent. In general, increasing the selection parameters evens out the group structure and shifts the entire spectrum upward.
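The group structure under neutrality can be reproduced directly from the neutral eigenvalue formula λ = |n|(|n| − 1 + |θ|)/2 derived in the proof of Lemma 5. The following is a small sketch with hypothetical function names; it lists one eigenvalue per index vector in graded order and counts the distinct groups among the first 36.

```python
from itertools import product


def neutral_eigenvalue(l, theta):
    """lambda_l = l (l - 1 + |theta|) / 2, which depends on n only through l = |n|."""
    return 0.5 * l * (l - 1 + sum(theta))


def neutral_spectrum(theta, count):
    """First `count` neutral eigenvalues, one per index vector in graded order."""
    K = len(theta)
    values = []
    l = 0
    while len(values) < count:
        # Every index vector n in N_0^(K-1) with |n| = l shares one eigenvalue.
        multiplicity = sum(
            1 for n in product(range(l + 1), repeat=K - 1) if sum(n) == l
        )
        values.extend([neutral_eigenvalue(l, theta)] * multiplicity)
        l += 1
    return values[:count]


spectrum = neutral_spectrum((0.01, 0.02, 0.03), 36)
groups = sorted(set(spectrum))
print(len(groups), groups)
```

For K = 3 the group with |n| = l has l + 1 members, so the first 36 eigenvalues fill the groups |n| = 0, …, 7 exactly.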

Figure 3. The first 36 eigenvalues of the different spectra for the selection parameters σ = 0, σ₁ and $\frac{1}{4} σ_{2}$ , respectively. The latter was chosen so that the ranges of the eigenvalues are comparable. The truncation level D was set to 24 and mutation rates θ = (0.01, 0.02, 0.03) were used.

Computing the transition density function for large selection coefficients requires combining terms of substantially different orders of magnitude, because of the exponential weighting factors in the density (20) and in the expansion (22). Therefore, the coefficients $u_{n, k}^{[D]}$ have to be calculated with high precision to obtain accurate numerical results under strong selection.

5.2. Transient and stationary densities

The approximations to the eigenvalues $Λ_{n}^{[D]}$ and the eigenfunctions B_n (via the eigenvectors $u_{n}^{[D]}$ and equation (23)) can be used in the spectral representation (6) to approximate the transition density function at arbitrary times t. Examples with σ = σ₁ for different times are given in Figure 4. At first, the density is concentrated around the initial frequencies x = (0.02, 0.02, 0.96), but as time increases, the frequencies of the first and second alleles increase, since these have a higher relative fitness. Eventually, the transition density converges to the stationary distribution (similar to the distribution at t = 2), where the bulk of the mass is concentrated at high frequencies for the first and second alleles.

Figure 4. Approximation of the transition density function (6) for different times t ∈ {0.04, 0.2, 1.0, 2.0}. Selection was governed by the matrix of coefficients σ = σ₁ and x = (0.02, 0.02, 0.96) was used as initial condition. The truncation level was set to D = 40, whereas the summation in equation (23) ranged over all m such that 0 ≤ |m| ≤ 36, and all eigenfunctions and eigenvalues with 0 ≤ n ≤ 561 were included in equation (6). The mutation rates were set to θ = (0.01, 0.02, 0.03). The plots only vary in y₁ and y₂, since y₃ = 1 − y₁ − y₂. (a) t = 0.04. (b) t = 0.2. (c) t = 1.0. (d) t = 2.0.

The eigenvalues $Λ_{n}^{[D]}$ and coefficients $u_{n, k}^{[D]}$ can also be employed to approximate the constant that normalizes the stationary distribution Π(x) to a proper probability distribution. Following the same line of argument as Song and Steinrücken (2012), the orthogonality relations enable us to circumvent the difficulty involved in directly evaluating a multivariate integral over the simplex Δ_K−1. First, note that since ℒ maps constant functions to zero, any constant function is an eigenfunction with associated eigenvalue Λ₀ = 0, thus B₀(x) = B₀(y) = const. In (6), taking t → ∞, we get

lim_{t \to \infty} p (t; x, y) = Π (y) \frac{B_{0} (x) B_{0} (y)}{{〈 B_{0}, B_{0} 〉}_{Π}} ≕ \frac{1}{C_{Π}} Π (y) .

Then for x = y = 0, by (23) we have

C_{Π} = \int_{Δ_{K - 1}} Π (z) d z = \frac{{〈 B_{0}, B_{0} 〉}_{Π}}{B_{0} {(0)}^{2}} = \frac{\sum_{m \in ℕ_{0}^{K - 1}} u_{0, m}^{2} {〈 P_{m}^{θ}, P_{m}^{θ} 〉}_{Π_{0}}}{e^{- σ̄ (0)} {(\sum_{m \in ℕ_{0}^{K - 1}} u_{0, m} P_{m}^{θ} (0))}^{2}} = \frac{\sum_{m \in ℕ_{0}^{K - 1}} u_{0, m}^{2} C_{m}^{θ}}{{(\sum_{m \in ℕ_{0}^{K - 1}} u_{0, m} \prod_{j = 1}^{K - 1} {(- 1)}^{m_{j}} \frac{Γ (m_{j} + θ_{j})}{Γ (m_{j} + 1) Γ (θ_{j})})}^{2}},

(28)

since σ̄(0) = σ_K,K = 0 and $R_{n_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})} (0) = {(- 1)}^{n_{j}} \frac{Γ (n_{j} + θ_{j})}{Γ (n_{j} + 1) Γ (θ_{j})}$ . Here $C_{m}^{θ}$ is the constant defined in (14). The purely algebraic form of the right hand side of equation (28) allows one to compute an accurate approximation of the normalizing constant C_Π by replacing the infinite sums with sums over all indices m such that |m| is less than or equal to a given truncation level. This offers an attractive alternative to other computationally intensive methods (Donnelly et al., 2001; Genz and Joyce, 2003; Buzbas and Joyce, 2009).
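A truncated evaluation of (28) reduces to two finite sums. The sketch below is a minimal illustration, not the authors' Mathematica code: the function names are hypothetical, the coefficients u_{0,m} and the constants C_m^θ from (14) are assumed to be supplied by the caller as dictionaries keyed by index vectors, and P_m^θ(0) is assembled from the value of R at 0 stated above, including its sign.

```python
from math import gamma, prod


def jacobi_at_zero(m, theta):
    """P_m^theta(0) as the product over j of the stated values R_{m_j}(0)."""
    # zip stops at len(m) = K - 1, so only theta_1, ..., theta_{K-1} are used.
    return prod(
        (-1) ** mj * gamma(mj + tj) / (gamma(mj + 1) * gamma(tj))
        for mj, tj in zip(m, theta)
    )


def normalizing_constant(u0, C, theta):
    """Truncated version of (28).

    u0 and C are dicts keyed by index vectors m in N_0^(K-1), holding the
    coefficients u_{0,m} and the constants C_m^theta. Both are assumed inputs
    that have to be computed elsewhere.
    """
    numerator = sum(u0[m] ** 2 * C[m] for m in u0)
    denominator = sum(u0[m] * jacobi_at_zero(m, theta) for m in u0) ** 2
    return numerator / denominator


# Sanity check under neutrality: only u_{0,(0,0)} is non-zero, and C_0 equals
# the integral of the unnormalized Dirichlet weight.
theta = (0.01, 0.02, 0.03)
dirichlet = gamma(0.01) * gamma(0.02) * gamma(0.03) / gamma(0.06)
print(normalizing_constant({(0, 0): 1.0}, {(0, 0): dirichlet}, theta), dirichlet)
```

Because u_0 enters quadratically in both the numerator and the denominator, the result does not depend on how the eigenvector is scaled.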

Figure 5 shows two examples of stationary distributions for different selection coefficients. In Figure 5(a) the stationary density is concentrated in the interior of the simplex, since all homozygotes are less fit than the heterozygotes. This situation is referred to as heterozygote advantage, resulting in a balancing selection pattern, and the different alleles co-exist at stationarity. In Figure 5(b), allele number 1 is strongly favored by the given selection coefficients, and thus the stationary density is concentrated at high frequencies for this allele.

Figure 5. Two examples of the stationary distribution for different selection parameters. The mutation rates were set to θ = (0.01, 0.02, 0.03) in both cases. Again, a truncation level of D = 40 was used, and the summation in equation (28) ranged over all m such that 0 ≤ |m| ≤ 36. The plots only vary in y₁ and y₂, since y₃ = 1 − y₁ − y₂.

We can also use (6) to investigate the rate of convergence of the diffusion process to the stationary distribution. Denote the difference between the transition density and the stationary density by

d (t; x, y) ≔ p (t; x, y) - \frac{1}{C_{Π}} Π (y) = \sum_{n = 1}^{\infty} e^{- Λ_{n} t} Π (y) \frac{B_{n} (x) B_{n} (y)}{{〈 B_{n}, B_{n} 〉}_{Π}} .

We measure the magnitude of d(t; x, y) by the square of its L² norm with respect to the weight function 1/Π(y), that is,

{‖ d (t; x, \cdot) ‖}_{1 / Π}^{2} ≔ {〈 d, d 〉}_{1 / Π} = \sum_{n = 1}^{\infty} e^{- 2 Λ_{n} t} \frac{B_{n} {(x)}^{2}}{{〈 B_{n}, B_{n} 〉}_{Π}} = \sum_{n = 1}^{\infty} e^{- 2 Λ_{n} t} \frac{e^{- σ̄ (x)} {(\sum_{k \in ℕ_{0}^{K - 1}} u_{n, k} P_{k}^{θ} (x))}^{2}}{\sum_{m \in ℕ_{0}^{K - 1}} u_{n, m}^{2} C_{m}^{θ}} .

(29)

Again, the sums in this expression can be approximated by truncating at a given level. Figure 6 shows ${‖ d (t; x, \cdot) ‖}_{1 / Π}^{2}$ as a function of time t, for σ = σ₁, σ = 0.5σ₁, and σ = 0.1σ₁. The initial frequencies were x = (0.02, 0.02, 0.96). As expected, the distance to the stationary distribution decreases over time. Further, the rate of convergence is faster for larger values in σ, as was also observed by Song and Steinrücken (2012). We note that the spectral representation can also be readily employed to study convergence rates measured by other metrics such as the total variation distance or relative entropy.
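Once the eigenvalues and the weights B_n(x)² / ⟨B_n, B_n⟩_Π are available, the truncated form of (29) is a plain weighted sum of exponentials. The following sketch uses made-up numbers for both; they are hypothetical placeholders and not values computed from the model.

```python
from math import exp


def squared_distance(t, eigenvalues, weights):
    """Truncated version of (29): sum over n >= 1 of exp(-2 Lambda_n t) w_n,
    where w_n = B_n(x)^2 / <B_n, B_n>_Pi is supplied by the caller."""
    return sum(exp(-2.0 * lam * t) * w for lam, w in zip(eigenvalues, weights))


# Hypothetical values for Lambda_1, ..., Lambda_4 and the matching weights.
eigenvalues = [0.4, 0.9, 2.1, 3.5]
weights = [1.3, 0.7, 2.0, 0.5]

curve = [squared_distance(t, eigenvalues, weights) for t in (0.0, 0.5, 1.0, 2.0)]
print(curve)
```

Since every weight is non-negative and every Λ_n with n ≥ 1 is positive, the sum decreases monotonically in t, and for large t it is dominated by the term with the smallest non-zero eigenvalue.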

Figure 6. Convergence of the transition density to stationarity as time evolves, for initial frequencies x = (0.02, 0.02, 0.96)^T. Deviation from the stationary density is measured by ${‖ d (t; x, \cdot) ‖}_{1 / Π}^{2}$ , defined in (29). The mutation rates were chosen to be θ = (0.01, 0.02, 0.03) and the selection parameters were σ = σ₁, σ = 0.5σ₁ and σ = 0.1σ₁, respectively. The truncation level was set to D = 40, and (29) was approximated by summing over 0 ≤ n ≤ 561 and m, k such that 0 ≤ |m|, |k| ≤ 36.

6. Discussion

In this paper, we have extended the method of Song and Steinrücken (2012) to obtain an explicit spectral representation of the transition density function for the multi-dimensional Wright-Fisher diffusion under a PIM model with general diploid selection and an arbitrary number of alleles. We have demonstrated the fidelity and fast convergence of the approximations. Further, as an example application of our work, we have computed the normalization constant of the stationary distribution and quantified the rate at which the transition density approaches this distribution.

Efficient approximations of the eigensystem and the transition density function lead to a number of important applications. Combining the stationary distribution discussed in Section 5.2 with the recurrence relation shown in Lemma 3, one can calculate algebraically the probability of observing a given genetic configuration of individuals sampled from the stationary distribution of the non-neutral diffusion. This kind of algebraic approach would complement previous works (Evans et al., 2007; Živković and Stephan, 2011) on sample allele frequency spectra that involve solving ODEs satisfied by the moments of the diffusion. Further, the algebraic approach is potentially more efficient than computationally expensive Monte Carlo methods (Donnelly et al., 2001) and more generally applicable than methods relying on the selection coefficients being of a certain form (Genz and Joyce, 2003). Note that, by discretizing time and space, our representation of the transition density function can be used for approximate simulation of frequencies from stationarity as well as frequency trajectories, which can in turn be employed in the aforementioned Monte Carlo frameworks.

The sampling probability can be applied, for example, to estimate evolutionary parameters via maximum likelihood or Bayesian inference frameworks. Furthermore, the notion of sampling probability can be combined with the spectral representation of the transition density function in a hidden Markov model framework as in Bollback et al. (2008), to calculate the probability of observing a series of configurations sampled at different times. The method developed in this paper would allow for such an analysis in a model with multiple alleles subject to recurrent parent-independent mutation and general diploid selection.

An important, albeit very challenging, future direction is to extend our current approach to analyze the dynamics of multi-locus diffusions with recombination and selection. Such an extension would allow for the incorporation of additional data at closely linked loci, which has the potential to significantly improve the inference of evolutionary parameters, especially the strength and localization of selection. We have only considered Wright-Fisher diffusions in a single panmictic population of a constant size. As achieved in the alternative approaches of Gutenkunst et al. (2009) and Lukić et al. (2011), mentioned in the Introduction, it would be desirable to generalize our approach to incorporate subdivided populations exchanging migrants, with possibly fluctuating population sizes. Another possible extension is to relax the PIM assumption and consider a more general mutation model.

We note that our present technique relies on the diffusion generator being symmetric. This symmetry does not hold in some of the scenarios mentioned above, making a direct application of the ideas developed here difficult. However, we believe that it is worthwhile investigating whether one could apply our approach to devise approximations to the transition density function that are sufficiently accurate for practical applications.

Acknowledgement

We thank Anand Bhaskar for many helpful discussions. This research is supported in part by a DFG Research Fellowship STE 2011/1-1 to M.S.; and by an NIH grant R01-GM094402, an Alfred P. Sloan Research Fellowship, and a Packard Fellowship for Science and Engineering to Y.S.S.

Appendix A

Proofs of lemmas

Proof of Lemma 2. For two indices $n, m \in ℕ_{0}^{K - 1}$ , consider the integral

\int_{Δ_{K - 1}} P_{n}^{θ} (x) P_{m}^{θ} (x) Π_{0} (x) d x = \int_{{[0, 1]}^{K - 1}} {P̃}_{n}^{θ} (ξ) {P̃}_{m}^{θ} (ξ) Π_{0} (x (ξ)) | det (D x) (ξ) | d ξ,

(A.1)

where the right hand side is obtained from the coordinate transformation introduced in Appendix B and the multivariate substitution rule for integration. Using

Π_{0} (x (ξ)) = \prod_{i = 1}^{K - 1} ξ_{i}^{θ_{i} - 1} {(1 - ξ_{i})}^{Θ_{i} - (K - i)},

the determinant of the Jacobian

| det (D x) (ξ) | = \prod_{i = 1}^{K - 2} {(1 - ξ_{i})}^{K - (i + 1)},

see Baxter et al. (2007, Equation B.1), and the transformed Jacobi polynomials (B.1), it can be shown that

\prod_{j = 1}^{K - 1} \int_{0}^{1} R_{n_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})} (ξ_{j}) R_{m_{j}}^{(θ_{j}, Θ_{j} + 2 M_{j})} (ξ_{j}) ξ_{j}^{θ_{j} - 1} {(1 - ξ_{j})}^{Θ_{j} + N_{j} + M_{j} - 1} d ξ_{j} = C_{n}^{θ} δ_{n, m}

holds, with

C_{n}^{θ} = \prod_{j = 1}^{K - 1} c_{n_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})} .

In the case n = m this can be seen immediately. If n ≠ m, without loss of generality let 1 ≤ l ≤ K − 1 be the largest l such that n_l < m_l and n_k = m_k for all k = l+1, …, K − 1. Then N_l = M_l (recall N_{K−1} = M_{K−1} = 0) and $R_{m_{l}}^{(θ_{l}, Θ_{l} + 2 M_{l})} (ξ_{l})$ is orthogonal to all polynomials of lesser degree with respect to the weight function $ξ_{l}^{θ_{l} - 1} {(1 - ξ_{l})}^{Θ_{l} + 2 M_{l} - 1}$, and thus the l-th factor, and hence the whole product, is zero.

Proof of Lemma 3. We found it most convenient to derive a recurrence relation for

x_{i} P_{n}^{θ} (x)

(A.2)

by projecting expression (A.2) onto the orthogonal basis $\{P_{m}^{θ} (x)\}$ and investigating the respective coefficients. First, note that the coordinate transformation introduced in Appendix B yields x_i = ξ_i Π_{j<i}(1 − ξ_j), so

x_{i} P_{n}^{θ} (x) = ξ_{i} \prod_{j < i} (1 - ξ_{j}) {P̃}_{n}^{θ} (ξ) .

(A.3)

Further, integrate expression (A.3) against the basis function ${P̃}_{m}^{θ} (ξ)$ times the weight function Π₀ to get the respective coefficient in the basis representation. Using the integration by substitution rule again, as in equation (A.1), this yields

\frac{1}{C_{m}^{θ}} \int_{{[0, 1]}^{K - 1}} ξ_{i} \prod_{j < i} (1 - ξ_{j}) {P̃}_{n}^{θ} (ξ) {P̃}_{m}^{θ} (ξ) \prod_{k = 1}^{K - 1} ξ_{k}^{θ_{k} - 1} {(1 - ξ_{k})}^{Θ_{k} - 1} d ξ = \prod_{j = i + 1}^{K - 1} \frac{1}{c_{m_{j}}^{(θ_{j}, Θ_{j} + 2 M_{j})}} \int_{0}^{1} R_{n_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})} (ξ_{j}) R_{m_{j}}^{(θ_{j}, Θ_{j} + 2 M_{j})} (ξ_{j}) ξ_{j}^{θ_{j} - 1} {(1 - ξ_{j})}^{Θ_{j} + N_{j} + M_{j} - 1} d ξ_{j} \times \frac{1}{c_{m_{i}}^{(θ_{i}, Θ_{i} + 2 M_{i})}} \int_{0}^{1} R_{n_{i}}^{(θ_{i}, Θ_{i} + 2 N_{i})} (ξ_{i}) R_{m_{i}}^{(θ_{i}, Θ_{i} + 2 M_{i})} (ξ_{i}) ξ_{i}^{θ_{i}} {(1 - ξ_{i})}^{Θ_{i} + N_{i} + M_{i} - 1} d ξ_{i} \times \prod_{j = 1}^{i - 1} \frac{1}{c_{m_{j}}^{(θ_{j}, Θ_{j} + 2 M_{j})}} \int_{0}^{1} R_{n_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})} (ξ_{j}) R_{m_{j}}^{(θ_{j}, Θ_{j} + 2 M_{j})} (ξ_{j}) ξ_{j}^{θ_{j} - 1} {(1 - ξ_{j})}^{Θ_{j} + N_{j} + M_{j}} d ξ_{j} .

The first term on the right hand side yields zero, unless m_j = n_j for all j > i, thus M_i = N_i. In this case the term is equal to 1. Since m_j = n_j for j > i, note that the second term on the right hand side is of the form

\frac{1}{c_{m_{i}}^{(α, β)}} \int_{0}^{1} R_{n_{i}}^{(α, β)} (ξ) R_{m_{i}}^{(α, β)} (ξ) ξ w_{α, β} (ξ) d ξ = G_{n_{i}, m_{i}}^{(α, β)} δ_{n_{i} + 1, m_{i}} + G_{n_{i}, m_{i}}^{(α, β)} δ_{n_{i}, m_{i}} + G_{n_{i}, m_{i}}^{(α, β)} δ_{n_{i} - 1, m_{i}},

with $w_{α, β} (ξ) = ξ^{α - 1} {(1 - ξ)}^{β - 1}$, α = θ_i, and β = Θ_i + 2N_i. Here we applied the recurrence relation (10) to $ξ R_{n_{i}}^{(α, β)} (ξ)$ and used the orthogonality of the Jacobi polynomials. The constants $G_{n, m}^{(a, b)}$ are given by

G_{n, m}^{(a, b)} = {\begin{matrix} \frac{(n + a - 1) (n + b - 1)}{(2 n + a + b - 1) (2 n + a + b - 2)}, & if n - m = 1 and n > 0, \\ \frac{1}{2} - \frac{b^{2} - a^{2} - 2 (b - a)}{2 (2 n + a + b) (2 n + a + b - 2)}, & if n - m = 0 and n \geq 0, \\ \frac{(n + 1) (n + a + b - 1)}{(2 n + a + b) (2 n + a + b - 1)}, & if n - m = - 1 and n \geq 0 . \end{matrix}

This expression is non-zero for −1 ≤ n_i−m_i ≤ 1. Furthermore, the form of the integral for j = i − 1 depends on this difference, or rather the difference between N_j and M_j. Depending on the difference N_j − M_j we have to consider the integrals

- 1 : \frac{1}{c_{m_{j}}^{(α, β + 2)}} \int_{0}^{1} R_{n_{j}}^{(α, β)} (ξ) R_{m_{j}}^{(α, β + 2)} (ξ) w_{α, β + 2} (ξ) d ξ,

(A.4)

0 : \frac{1}{c_{m_{j}}^{(α, β)}} \int_{0}^{1} R_{n_{j}}^{(α, β)} (ξ) R_{m_{j}}^{(α, β)} (ξ) (1 - ξ) w_{α, β} (ξ) d ξ,

(A.5)

+ 1 : \frac{1}{c_{m_{j}}^{(α, β - 2)}} \int_{0}^{1} R_{n_{j}}^{(α, β)} (ξ) R_{m_{j}}^{(α, β - 2)} (ξ) w_{α, β} (ξ) d ξ,

(A.6)

with α = θ_j and β = Θ_j + 2N_j. In expression (A.6) we have to assume β > 2, which is equivalent to N_j ≥ 1. This holds true, because if N_j = 0, this case would not have to be considered.

Applying relation (12) twice to the polynomial $R_{n_{j}}^{(α, β)} (ξ)$ in equation (A.4) and using orthogonality yields

H_{n_{j}, m_{j}}^{(α, β)} δ_{n_{j}, m_{j}} + H_{n_{j}, m_{j}}^{(α, β)} δ_{n_{j} - 1, m_{j}} + H_{n_{j}, m_{j}}^{(α, β)} δ_{n_{j} - 2, m_{j}},

for some constants $H_{n, m}^{(α, β)}$ . Here $H_{0, - 1}^{(α, β)} = H_{0, - 2}^{(α, β)} = H_{1, - 1}^{(α, β)} = 0$ . Thus, in the case M_j = N_j + 1, the expression for j is non-zero for m_j = n_j, n_j − 1, and n_j − 2. Furthermore, relation (10) can be applied to the term $R_{n_{j}}^{(α, β)} (ξ)$ (1 − ξ), together with orthogonality to get

I_{n_{j}, m_{j}}^{(α, β)} δ_{n_{j} + 1, m_{j}} + I_{n_{j}, m_{j}}^{(α, β)} δ_{n_{j}, m_{j}} + I_{n_{j}, m_{j}}^{(α, β)} δ_{n_{j} - 1, m_{j}},

for given constants $I_{n, m}^{(α, β)}$ , with $I_{- 1, 0}^{(α, β)} = I_{0, - 1}^{(α, β)} = 0$ . In the case M_j = N_j, the expression is non-zero for m_j = n_j − 1, n_j, and n_j + 1. Finally, applying relation (12) to the term $R_{m_{j}}^{(α, β - 2)} (ξ)$ in expression (A.6) combined with orthogonality yields

J_{n_{j}, m_{j}}^{(α, β)} δ_{n_{j}, m_{j}} + J_{n_{j}, m_{j}}^{(α, β)} δ_{n_{j} + 1, m_{j}} + J_{n_{j}, m_{j}}^{(α, β)} δ_{n_{j} + 2, m_{j}},

for given constants $J_{n, m}^{(α, β)}$ . Again $J_{- 1, 0}^{(α, β)} = J_{- 2, 0}^{(α, β)} = J_{- 1, 1}^{(α, β)} = 0$ . Thus this expression is non-zero for m_j = n_j, n_j + 1, and n_j + 2. The constants $H_{n, m}^{(a, b)}, I_{n, m}^{(a, b)}, J_{n, m}^{(a, b)}$ are given by

H_{n, m}^{(a, b)} = {\begin{matrix} \frac{(n + a + b - 1) (n + a + b)}{(2 n + a + b - 1) (2 n + a + b)}, & if m - n = 0 and n \geq 0, \\ - \frac{2 a}{a + b + 2}, & if m - n = - 1 and n = 1 \\ - \frac{2 (n + a - 1) (n + a + b - 1)}{(2 n + a + b - 2) (2 n + a + b)}, & if m - n = - 1 and n > 1, \\ \frac{(n + a - 2) (n + a - 1)}{(2 n + a + b - 2) (2 n + a + b - 1)}, & if m - n = - 2 and n > 1, \end{matrix}

I_{n, m}^{(a, b)} = {\begin{matrix} - \frac{1}{a + b}, & if m - n = 1 and n = 0, \\ - \frac{(n + 1) (n + a + b - 1)}{(2 n + a + b - 1) (2 n + a + b)}, & if m - n = 1 and n > 0, \\ \frac{b}{a + b}, & if m - n = 0 and n = 0, \\ \frac{b^{2} + a (b + 2)}{(a + b) (a + b + 2)}, & if m - n = 0 and n = 1, \\ \frac{b^{2} + 2 n (n + a - 1) + b (2 n + a - 2)}{(2 n + a + b - 2) (2 n + a + b)}, & if m - n = 0 and n > 1, \\ - \frac{ab}{(a + b) (a + b + 1)}, & if m - n = - 1 and n = 1 \\ - \frac{(n + a - 1) (n + b - 1)}{(2 n + a + b - 2) (2 n + a + b - 1)}, & if m - n = - 1 and n > 1, \end{matrix}

J_{n, m}^{(a, b)} = {\begin{matrix} \frac{(b - 1) (b - 2)}{(a + b - 1) (a + b - 2)}, & if m - n = 0 and n = 0, \\ \frac{(n + b - 2) (n + b - 1)}{(2 n + a + b - 2) (2 n + a + b - 1)}, & if m - n = 0 and n > 0, \\ - \frac{2 (n + 1) (n + b - 1)}{(2 n + a + b - 2) (2 n + a + b)}, & if m - n = 1 and n \geq 0, \\ \frac{(n + 1) (n + 2)}{(2 n + a + b - 1) (2 n + a + b)}, & if m - n = 2 and n \geq 0 . \end{matrix}

Now considering all three possible values for N_j − M_j, and all possible implications for the difference n_j − m_j, it can be shown that −1 ≤ N_{j−1} − M_{j−1} ≤ 1 has to hold as well. Induction then shows that −1 ≤ N_j − M_j ≤ 1 holds for all j < i. Thus for all j < i the same integrals (A.4), (A.5), and (A.6), with adjusted parameters α = θ_j and β = Θ_j + 2N_j, have to be considered.

Combining these results shows that for fixed i and n the polynomials with a non-zero contribution to the recurrence relation for $x_{i} P_{n}^{θ} (x)$ are exactly those with indices from the set

ℳ_{i} (n) : = \{m \in ℕ_{0}^{K - 1} | m_{j} \geq 0 \forall j, M_{j} = N_{j} \forall j > i, | M_{j} - N_{j} | \leq 1 \forall j \leq i\},

defined in (16). Thus,

x_{i} P_{n}^{θ} (x) = \sum_{m \in ℳ_{i} (n)} r_{n, m}^{(θ, i)} P_{m}^{θ} (x),

where the coefficients $r_{n, m}^{(θ, i)}$ are given by

r_{n, m}^{(θ, i)} = G_{n_{i}, m_{i}}^{(θ_{i}, Θ_{i} + 2 N_{i})} \prod_{j < i} {\begin{matrix} H_{n_{j}, m_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})}, & if N_{j} - M_{j} = - 1, \\ I_{n_{j}, m_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})}, & if N_{j} - M_{j} = 0, \\ J_{n_{j}, m_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})}, & if N_{j} - M_{j} = + 1 . \end{matrix}

Proof of Lemma 5. Using the coordinate transformation introduced in Appendix B, and applying ℒ₀, given in equation (B.2), to ${P̃}_{n}^{θ} (ξ)$ from equation (B.1) yields

ℒ_{0} {P̃}_{n}^{θ} (ξ) = \frac{1}{2} \sum_{i = 1}^{K - 1} \frac{1}{Π_{k < i} (1 - ξ_{k})} \prod_{j = 1, j \neq i}^{K - 1} R_{n_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})} (ξ_{j}) {(1 - ξ_{j})}^{N_{j}} \times (ξ_{i} (1 - ξ_{i}) \frac{\partial^{2}}{\partial ξ_{i}^{2}} {R_{n_{i}}^{(θ_{i}, Θ_{i} + 2 N_{i})} (ξ_{i}) {(1 - ξ_{i})}^{N_{i}}} + (θ_{i} - Θ_{i - 1} ξ_{i}) \frac{\partial}{\partial ξ_{i}} {R_{n_{i}}^{(θ_{i}, Θ_{i} + 2 N_{i})} (ξ_{i}) {(1 - ξ_{i})}^{N_{i}}}) .

(A.7)

Employing equation (8), one can show that the terms in the brackets on the right hand side of equation (A.7) reduce to

{(1 - ξ_{i})}^{N_{i}} R_{n_{i}}^{(θ_{i}, Θ_{i} + 2 N_{i})} (ξ_{i}) (- N_{i - 1} (N_{i - 1} - 1 + Θ_{i - 1}) + \frac{1}{1 - ξ_{i}} N_{i} (N_{i} - 1 + Θ_{i})),

and substitution yields

ℒ_{0} {P̃}_{n}^{θ} (ξ) = \frac{1}{2} {P̃}_{n}^{θ} (ξ) (- \sum_{i = 1}^{K - 1} \frac{1}{Π_{k < i} (1 - ξ_{k})} N_{i - 1} (N_{i - 1} - 1 + Θ_{i - 1}) + \sum_{i = 2}^{K} \frac{1}{Π_{k < i} (1 - ξ_{k})} N_{i - 1} (N_{i - 1} - 1 + Θ_{i - 1})) = - λ_{| n |}^{θ} {P̃}_{n}^{θ} (ξ)

with $λ_{| n |}^{θ} = \frac{1}{2} | n | (| n | - 1 + | θ |)$ , since Θ₀ = |θ|, N₀ = |n|, and N_{K − 1} = 0.

Appendix B

Change of coordinates

Working with the multivariate Jacobi polynomials and the neutral diffusion, it is convenient, for some derivations, to transform the equations to a different coordinate system. This transformation maps the simplex Δ_{K − 1} to the (K − 1)-dimensional unit cube [0, 1]^{K−1}. It is implicitly used in Griffiths and Spanò (2011, Section 3), but more explicitly introduced and used as a transformation in Baxter et al. (2007). The vector ξ(x) = (ξ₁(x), …, ξ_{K−1}(x)) is obtained from the vector of population frequencies x via the transformation

ξ_{i} = \frac{x_{i}}{1 - \sum_{j < i} x_{j}}

for 1 ≤ i ≤ K − 1. The inverse of this transformation is given by

x_{i} = ξ_{i} \prod_{j < i} (1 - ξ_{j})

for 1 ≤ i ≤ K − 1. The inverse relation can be derived by noting that 1 − ∑_j<i x_j = Π_j<i(1 − ξ_j) holds.
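The transformation and its inverse are short enough to write out. The sketch below uses hypothetical function names and assumes an interior point of the simplex, so that no denominator vanishes; it tracks the running quantity 1 − ∑_{j<i} x_j = Π_{j<i}(1 − ξ_j) in both directions.

```python
def to_cube(x):
    """Map (x_1, ..., x_{K-1}) to xi with xi_i = x_i / (1 - sum_{j<i} x_j)."""
    xi, remaining = [], 1.0
    for x_i in x:
        xi.append(x_i / remaining)  # remaining = 1 - sum_{j<i} x_j
        remaining -= x_i
    return xi


def to_simplex(xi):
    """Inverse map: x_i = xi_i * prod_{j<i} (1 - xi_j)."""
    x, remaining = [], 1.0
    for xi_i in xi:
        x.append(xi_i * remaining)  # remaining = prod_{j<i} (1 - xi_j)
        remaining *= 1.0 - xi_i
    return x


# First K - 1 = 2 coordinates of the initial frequencies (0.02, 0.02, 0.96).
x = [0.02, 0.02]
xi = to_cube(x)
print(xi, to_simplex(xi))
```

Only the first K − 1 frequencies are passed in, since x_K is determined by the constraint that the frequencies sum to one.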

Definition 1 yields immediately that the multivariate Jacobi polynomials $P_{n}^{θ} (x)$ take the form

{P̃}_{n}^{θ} (ξ) = P_{n}^{θ} (x (ξ)) = \prod_{j = 1}^{K - 1} R_{n_{j}}^{(θ_{j}, Θ_{j} + 2 N_{j})} (ξ_{j}) {(1 - ξ_{j})}^{N_{j}}

(B.1)

in the transformed coordinates. The neutral diffusion generator ℒ₀ in the transformed coordinate system is given by the following lemma.

Lemma 7. Using variables in the new coordinate system, the backward generator of the diffusion under neutrality ℒ₀ can be written as

ℒ_{0} f̃ (ξ) = \frac{1}{2} \sum_{i = 1}^{K - 1} {b̃}_{i, i} (ξ) \frac{\partial^{2}}{\partial ξ_{i}^{2}} f̃ (ξ) + \sum_{i = 1}^{K - 1} ã_{i} (ξ) \frac{\partial}{\partial ξ_{i}} f̃ (ξ),

(B.2)

with

{b̃}_{i, j} (ξ) = δ_{i, j} (\frac{ξ_{i} (1 - ξ_{i})}{Π_{k < i} (1 - ξ_{k})})

and

ã_{i} (ξ) = \frac{1}{2} \frac{θ_{i} - Θ_{i - 1} ξ_{i}}{Π_{k < i} (1 - ξ_{k})} .

The proof of this lemma is paraphrased in Appendix B of Baxter et al. (2007). The transformation diagonalizes the operator by removing all the mixed second order partial derivatives.

Appendix C

Coefficients of the polynomial Q(x; σ, θ)

q = \frac{1}{2} (\sum_{j = 1}^{K} θ_{j} σ_{K, j} - | θ | σ_{K, K}) when i = \emptyset,

q (i_{1}) = \frac{1}{2} (\sum_{j = 1}^{K} θ_{j} (σ_{i_{1}, j} - σ_{j, K}) + σ_{i_{1}, K}^{2} + σ_{K, K}^{2} - 2 σ_{K, K} σ_{i_{1}, K} - 2 (1 + | θ |) σ_{i_{1}, K} + (1 + 2 | θ |) σ_{K, K} + σ_{i_{1}, i_{1}}),

q (i_{1}, i_{2}) = \frac{1}{2} (2 σ_{i_{1}, K} σ_{i_{1}, i_{2}} - 3 σ_{i_{1}, K} σ_{i_{2}, K} + 8 σ_{i_{2}, K} σ_{K, K} - 2 σ_{K, K} σ_{i_{1}, i_{2}} - 2 σ_{i_{1}, K}^{2} - 3 σ_{K, K}^{2} - (1 + | θ |) (σ_{i_{1}, i_{2}} + σ_{K, K} - 2 σ_{i_{2}, K})),

q (i_{1}, i_{2}, i_{3}) = \frac{1}{2} ((σ_{i_{1}, i_{3}} - σ_{i_{1}, K}) (σ_{i_{1}, i_{2}} - σ_{i_{1}, K}) - (σ_{i_{3}, K} - σ_{K, K}) (σ_{i_{2}, K} - σ_{K, K}) - 4 (σ_{i_{2}, i_{3}} + σ_{K, K} - 2 σ_{i_{3}, K}) (σ_{i_{1}, K} - σ_{K, K})),

q (i_{1}, i_{2}, i_{3}, i_{4}) = - \frac{1}{2} ((σ_{i_{1}, i_{2}} + σ_{K, K} - 2 σ_{i_{2}, K}) (σ_{i_{3}, i_{4}} + σ_{K, K} - 2 σ_{i_{4}, K})) .

Appendix D

Derivation of equation (25)

Applying ℒ to $S_{n}^{θ} (x)$ ,

ℒ S_{n}^{θ} (x) = (ℒ_{0} + ℒ_{σ}) (P_{n}^{θ} (x) e^{- σ̄ (x) / 2}) = e^{- \frac{σ̄ (x)}{2}} ℒ_{0} P_{n}^{θ} (x) + P_{n}^{θ} (x) ℒ_{0} e^{- \frac{σ̄ (x)}{2}} + \sum_{i, j = 1}^{K - 1} x_{i} (δ_{i, j} - x_{j}) \frac{\partial}{\partial x_{i}} {e^{- \frac{σ̄ (x)}{2}}} \frac{\partial}{\partial x_{j}} {P_{n}^{θ} (x)} + P_{n}^{θ} (x) ℒ_{σ} e^{- \frac{σ̄ (x)}{2}} + e^{- \frac{σ̄ (x)}{2}} ℒ_{σ} P_{n}^{θ} (x) = - λ_{n}^{θ} e^{- \frac{σ̄ (x)}{2}} P_{n}^{θ} (x) + P_{n}^{θ} (x) ℒ e^{- \frac{σ̄ (x)}{2}} + \sum_{i, j = 1}^{K - 1} x_{i} (δ_{i, j} - x_{j}) \frac{\partial}{\partial x_{i}} {e^{- \frac{σ̄ (x)}{2}}} \frac{\partial}{\partial x_{j}} {P_{n}^{θ} (x)} + e^{- \frac{σ̄ (x)}{2}} ℒ_{σ} P_{n}^{θ} (x) .

It can be shown that the last two terms in the above expression sum up to 0. Note that for 1 ≤ i, j ≤ K − 1,

\frac{\partial}{\partial x_{i}} σ̄ (x) = 2 \sum_{k = 1}^{K} σ_{k, i} x_{k} - 2 \sum_{l = 1}^{K} σ_{l, K} x_{l},

(D.1)

\frac{\partial^{2}}{\partial x_{j} \partial x_{i}} σ̄ (x) = 2 (σ_{i, j} - σ_{j, K} - σ_{i, K} + σ_{K, K}) .

(D.2)
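Identities (D.1) and (D.2) can be checked numerically. The sketch below assumes that σ̄(x) is the quadratic form ∑_{i,j} σ_{i,j} x_i x_j with x_K = 1 − x_1 − … − x_{K−1} and that σ is symmetric; these assumptions and the function names are illustrative and are not taken verbatim from the text. Because σ̄ is quadratic, a central difference reproduces the gradient up to rounding error.

```python
def mean_fitness(x, sigma):
    """Assumed form: sigma_bar(x) = sum_{i,j} sigma_ij x_i x_j, with x_K eliminated."""
    full = list(x) + [1.0 - sum(x)]
    K = len(full)
    return sum(sigma[i][j] * full[i] * full[j] for i in range(K) for j in range(K))


def gradient_D1(x, sigma):
    """Right-hand side of (D.1) for i = 1, ..., K - 1 (0-based in the code)."""
    full = list(x) + [1.0 - sum(x)]
    K = len(full)
    last = sum(sigma[l][K - 1] * full[l] for l in range(K))
    return [
        2.0 * sum(sigma[k][i] * full[k] for k in range(K)) - 2.0 * last
        for i in range(K - 1)
    ]


def hessian_D2(sigma):
    """Right-hand side of (D.2); it does not depend on x."""
    K = len(sigma)
    return [
        [2.0 * (sigma[i][j] - sigma[j][K - 1] - sigma[i][K - 1] + sigma[K - 1][K - 1])
         for j in range(K - 1)]
        for i in range(K - 1)
    ]


sigma1 = [[12, 14, 15], [14, 11, 13], [15, 13, 0]]
x, h = [0.02, 0.02], 1e-5
for i, analytic in enumerate(gradient_D1(x, sigma1)):
    up, down = list(x), list(x)
    up[i] += h
    down[i] -= h
    numeric = (mean_fitness(up, sigma1) - mean_fitness(down, sigma1)) / (2 * h)
    print(i, analytic, numeric)
print(hessian_D2(sigma1))
```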

It follows that

\sum_{i, j = 1}^{K - 1} x_{i} (δ_{i, j} - x_{j}) \frac{\partial}{\partial x_{i}} {e^{- \frac{σ̄ (x)}{2}}} \frac{\partial}{\partial x_{j}} {P_{n}^{θ} (x)} + e^{- \frac{σ̄ (x)}{2}} ℒ_{σ} P_{n}^{θ} (x) = e^{- \frac{σ̄ (x)}{2}} [\sum_{i = 1}^{K - 1} x_{i} \frac{\partial}{\partial x_{i}} {- \frac{σ̄ (x)}{2}} \frac{\partial}{\partial x_{i}} {P_{n}^{θ} (x)} - \sum_{i, j = 1}^{K - 1} x_{i} x_{j} \frac{\partial}{\partial x_{i}} {- \frac{σ̄ (x)}{2}} \frac{\partial}{\partial x_{j}} {P_{n}^{θ} (x)} + \sum_{i = 1}^{K - 1} x_{i} \frac{\partial}{\partial x_{i}} {P_{n}^{θ} (x)} \sum_{j = 1}^{K} σ_{i, j} x_{j} - σ̄ (x) \sum_{i = 1}^{K - 1} x_{i} \frac{\partial}{\partial x_{i}} {P_{n}^{θ} (x)}] = e^{- \frac{σ̄ (x)}{2}} [- \sum_{i = 1}^{K - 1} x_{i} (\sum_{k = 1}^{K} σ_{k, i} x_{k} - \sum_{l = 1}^{K} σ_{l, K} x_{l}) \frac{\partial}{\partial x_{i}} {P_{n}^{θ} (x)} + \sum_{i, j = 1}^{K - 1} x_{i} x_{j} (\sum_{k = 1}^{K} σ_{k, i} x_{k} - \sum_{l = 1}^{K} σ_{l, K} x_{l}) \frac{\partial}{\partial x_{j}} {P_{n}^{θ} (x)} + \sum_{i = 1}^{K - 1} x_{i} \frac{\partial}{\partial x_{i}} {P_{n}^{θ} (x)} \sum_{j = 1}^{K} σ_{i, j} x_{j} - σ̄ (x) \sum_{i = 1}^{K - 1} x_{i} \frac{\partial}{\partial x_{i}} {P_{n}^{θ} (x)}] = e^{- \frac{σ̄ (x)}{2}} [\sum_{i = 1}^{K - 1} x_{i} \frac{\partial}{\partial x_{i}} {P_{n}^{θ} (x)} \sum_{l = 1}^{K} σ_{l, K} x_{l} - \sum_{j = 1}^{K - 1} x_{j} \frac{\partial}{\partial x_{j}} {P_{n}^{θ} (x)} (\sum_{l = 1}^{K} σ_{l, K} x_{K} x_{l} + \sum_{i = 1}^{K - 1} x_{i} \sum_{l = 1}^{K} σ_{l, K} x_{l})] = 0,

where we used equation (D.1) for the second equality and $\sum_{i = 1}^{K} x_{i} = 1$ for the last equality.

Further, using equations (D.1) and (D.2) one can show that

ℒ e^{- \frac{σ̄ (x)}{2}} = \frac{1}{2} e^{- \frac{σ̄ (x)}{2}} (- \sum_{i = 1}^{K} x_{i} σ_{i}^{2} (x) - \sum_{i = 1}^{K} x_{i} σ_{ii} + (1 + | θ |) σ̄ (x) + σ̄ {(x)}^{2} - \sum_{i = 1}^{K} θ_{i} σ_{i} (x)) = e^{- \frac{\bar{σ} (x)}{2}} Q (x; σ, θ),

where Q takes the form (27), that is Q(x; σ, θ) = ∑_i∈ℐ q(i)x_i, with the constants q(i) given in Appendix C.

Contributor Information

Matthias Steinrücken, Email: steinrue@stat.berkeley.edu.

Y. X. Rachel Wang, Email: rachelwang@stat.berkeley.edu.

Yun S. Song, Email: yss@cs.berkeley.edu.

References

Abramowitz M, Stegun IA. Handbook of Mathematical Functions. Dover Publications; 1965.
Barbour AD, Ethier SN, Griffiths RC. A transition function expansion for a diffusion model with selection. Ann. Appl. Probab. 2000;10:123–162.
Baxter G, Blythe R, McKane A. Exact solution of the multi-allelic diffusion model. Math. Biosci. 2007;209:124–170. doi: 10.1016/j.mbs.2007.01.001.
Bollback JP, York TL, Nielsen R. Estimation of 2Nes from temporal allele frequency data. Genetics. 2008;179(1):497–502. doi: 10.1534/genetics.107.085019.
Buzbas EO, Joyce P. Maximum likelihood estimates under the k-allele model with selection can be numerically unstable. Ann. Appl. Stat. 2009;3(3):1147–1162.
Buzbas EO, Joyce P, Abdo Z. Estimation of selection intensity under overdominance by Bayesian methods. Stat. Appl. Genet. Mol. Biol. 2009;8(1):Article 32. doi: 10.2202/1544-6115.1466.
Buzbas EO, Joyce P, Rosenberg NA. Inference on the strength of balancing selection for epistatically interacting loci. Theor. Popul. Biol. 2011;79(3):102–113. doi: 10.1016/j.tpb.2011.01.002.
Donnelly P, Nordborg M, Joyce P. Likelihoods and simulation methods for a class of nonneutral population genetics models. Genetics. 2001;159(2):853–867. doi: 10.1093/genetics/159.2.853.
Dunkl C, Xu Y. Orthogonal Polynomials of Several Variables. Cambridge University Press; 2001.
Durrett R. Probability Models for DNA Sequence Evolution. Springer; 2008.
Epstein CL, Mazzeo R. Degenerate diffusion operators arising in population biology. 2011. arXiv preprint: http://arxiv.org/abs/1110.0032.
Ethier SN, Kurtz TG. Convergence to Fleming-Viot processes in the weak atomic topology. Stoch. Proc. Appl. 1994;54(1):1–27.
Evans SN, Shvets Y, Slatkin M. Non-equilibrium theory of the allele frequency spectrum. Theor. Popul. Biol. 2007;71(1):109–119. doi: 10.1016/j.tpb.2006.06.005.
Ewens W. Mathematical Population Genetics. Volume I: Theoretical Introduction. 2nd edition. Springer; 2004.
Genz A, Joyce P. Computation of the normalizing constant for exponentially weighted Dirichlet distribution integrals. Comp. Sci. Stat. 2003;35:181–212.
Griffiths R. A transition density expansion for a multi-allele diffusion model. Adv. Appl. Prob. 1979;11:310–325.
Griffiths RC, Li W-H. Simulating allele frequencies in a population and the genetic differentiation of populations under mutation pressure. Theor. Popul. Biol. 1983;23(1):19–33. doi: 10.1016/0040-5809(83)90003-5.
Griffiths RC, Spanò D. Diffusion processes and coalescent trees. In: Probability and Mathematical Genetics, Papers in Honour of Sir John Kingman. LMS Lecture Note Series 378, chapter 15. Cambridge University Press; 2010. pp. 358–375.
Griffiths RC, Spanò D. Multivariate Jacobi and Laguerre polynomials, infinite-dimensional extensions, and their probabilistic connections with multivariate Hahn and Meixner polynomials. Bernoulli. 2011;17(3):1095–1125.
Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 2009;5:e1000695. doi: 10.1371/journal.pgen.1000695.
Karlin S, Taylor H. A Second Course in Stochastic Processes. Academic Press; 1981.
Kimura M. Stochastic processes and distribution of gene frequencies under natural selection. Cold Spring Harb. Symp. Quant. Biol. 1955;20:33–53. doi: 10.1101/sqb.1955.020.01.006.
Kimura M. Some problems of stochastic processes in genetics. Ann. Math. Stat. 1957;28:882–901.
Lukić S, Hey J, Chen K. Non-equilibrium allele frequency spectra via spectral methods. Theor. Popul. Biol. 2011;79(4):203–219. doi: 10.1016/j.tpb.2011.02.003.
Shimakura N. Equations différentielles provenant de la génétique des populations. Tohoku Math. J. 1977;29:287–318.
Song YS, Steinrücken M. A simple method for finding explicit analytic transition densities of diffusion processes with general diploid selection. Genetics. 2012;190(3):1117–1129. doi: 10.1534/genetics.111.136929.
Szegö G. Orthogonal Polynomials. American Mathematical Society; 1939.
Tavaré S. Line-of-descent and genealogical processes, and their applications in population genetics models. Theor. Popul. Biol. 1984;26:119–164. doi: 10.1016/0040-5809(84)90027-3.
Wright S. Adaptation and selection. In: Jepson GL, Mayr E, Simpson GG, editors. Genetics, Paleontology and Evolution. Princeton, New Jersey: Princeton Univ. Press; 1949. pp. 365–389.
Živković D, Stephan W. Analytical results on the neutral non-equilibrium allele frequency spectrum based on diffusion theory. Theor. Popul. Biol. 2011;79(4):184–191. doi: 10.1016/j.tpb.2011.03.003.

