Author manuscript; available in PMC: 2021 Jan 1.
Published in final edited form as: J Am Stat Assoc. 2019 Jul 22;115(531):1472–1487. doi: 10.1080/01621459.2019.1635482

Efficiently inferring the demographic history of many populations with allele count data

Jack Kamm 1,2,6, Jonathan Terhorst 3, Richard Durbin 1,2, Yun S Song 4,5,6,*
PMCID: PMC7531012  NIHMSID: NIHMS1534767  PMID: 33012903

Abstract

The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. Admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, however, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than previously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed “basal Eurasian” admixture event in human history. We implement and release our method in a new open-source software package momi2.

Keywords: population genetics, coalescent theory, demographic inference, frequency spectrum

1. Introduction

All natural populations undergo evolutionary processes of migration, size changes, and divergence, and the history of these demographic events shapes their present genetic diversity. Thus, inferring demographic history is of central concern in evolutionary and population genetics, both for its intrinsic interest (e.g., in dating the out-of-Africa migration of modern humans (Schaffner et al., 2005; Gutenkunst et al., 2009)) and also for biological applications (such as distinguishing the effects of natural selection from demography (Beaumont and Nichols, 1996; Boyko et al., 2008)). However, genetic sequence data and the space of possible demographic models are both very high dimensional objects, leading to numerous statistical and computational challenges when inferring demographic history from genetic data.

The joint sample frequency spectrum (SFS) is the multidimensional histogram of mutant allele counts in a sample of DNA sequences, and is a popular summary statistic which lies at the core of hundreds of empirical studies in population genetics; e.g., see Wakeley and Hey (1997); Griffiths and Tavaré (1998); Nielsen (2000); Gutenkunst et al. (2009); Coventry et al. (2010); Gazave et al. (2014); Gravel et al. (2011); Nelson et al. (2012); Excoffier et al. (2013); Jenkins et al. (2014); Bhaskar et al. (2015); Jouganous et al. (2017). Recently, progress has also been made on theoretical fronts to characterize statistical properties of SFS-based inference. In particular, studies of identifiability (Myers et al., 2008; Bhaskar and Song, 2014) and the rate of convergence (Terhorst and Song, 2015; Baharian and Gravel, 2018) have been carried out.

Demographic history can be inferred by fitting the observed value of the SFS to its expected value in a composite likelihood framework. The expected SFS can be efficiently computed when the demographic history is a tree, and in previous work we developed a method momi to compute the SFS of hundreds of populations related by a tree (Kamm et al., 2017). However, natural populations are often related by a more complex history that is not tree-like, as gene flow (the exchange of migrants between populations) adds extra edges to the topology associated with the demographic history. In this case, computing the expected SFS is much more computationally demanding, and existing methods for computing the exact expected SFS can scale to only a handful of populations (Gutenkunst et al., 2009; Jouganous et al., 2017).

In this article, we extend our previous algorithm momi to handle discrete (or pulse) migration events between populations, in which case demographies are described by general directed acyclic graphs (DAGs). Our new method momi2 can compute the exact expected SFS with admixture for more populations than previously possible, and uses novel insights from a stochastic process known as the Lookdown Construction (Donnelly and Kurtz, 1996; Donnelly et al., 1999). In addition, momi2 utilizes automatic differentiation (Corliss et al., 2002; Bhaskar et al., 2015) to compute gradients of the SFS, which we use to efficiently search the parameter space during optimization. Finally, momi2 can efficiently compute linear functionals of the SFS, which we exploit to compute the expected values of a number of standard population genetic summary statistics under complex demography.

The rest of this paper is organized as follows. In Section 2 we provide some background and survey related work. Section 3 describes our method, first with an illustrative example in Section 3.1, and then with formulas and pseudocode in Section 3.2. Finally, in Section 4, we apply our method to an 8-population SFS, including ancient and contemporary human populations, to estimate the timing and strength of a proposed “basal Eurasian” admixture event in human history. The Appendix contains all proofs, an analysis of the computational complexity of our method, and additional details of the application to ancient DNA.

2. Background

Suppose $\mathbf{n}=(n_1,\ldots,n_D)$ haploid genomes have been sampled from $D$ “demes” or populations. The positions in the genome where the samples are not all identical are called segregating sites. In most organisms mutations are rare; most sites are not segregating. It is therefore reasonable to assume, as we do from now on, that each position in the genome has experienced at most a single mutation in its history, and that each individual can be labeled as having the “ancestral” or “derived” (mutant) allele at each segregating site. In population genetics, this simplifying assumption is known as the infinite sites model.

The sample frequency spectrum (SFS) is a $D$-dimensional array $[f_{\mathbf{x}}]$ of size $(n_1+1)\times\cdots\times(n_D+1)$ whose entry $f_{\mathbf{x}}$ counts the number of segregating sites with exactly $\mathbf{x}$ copies of the derived allele and $\mathbf{n}-\mathbf{x}$ copies of the ancestral allele, where $\mathbf{x}=(x_1,\ldots,x_D)\in\mathbb{Z}_{\ge 0}^D$ with $0 \le x_d \le n_d$ for each $d=1,\ldots,D$. Note we only consider segregating sites with 2 alleles, so $f_{\mathbf{0}} = f_{\mathbf{n}} = 0$ by definition. Compared to the full data set (i.e., the complete genetic sequences of all $n = n_1 + \cdots + n_D$ genomes), the SFS $[f_{\mathbf{x}}]$ is a compressed, low-dimensional summary which nevertheless preserves much of the signal about the various population size changes, divergence times, and admixture events that occurred over the course of the populations’ history.
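As a concrete illustration of this definition, the following minimal sketch tabulates a joint SFS from per-site derived allele counts. The function name `joint_sfs` and the toy data are hypothetical and are not part of momi2; the sketch stores the array sparsely as a dictionary and assumes the ancestral and derived states at each site are already known.

```python
from collections import Counter

def joint_sfs(sites, sample_sizes):
    """Tabulate the joint SFS from per-site derived allele counts.

    sites: iterable of tuples x = (x_1, ..., x_D), the derived allele
           count in each population at one site.
    sample_sizes: tuple n = (n_1, ..., n_D).
    Returns a Counter mapping x -> f_x over segregating sites only.
    """
    n = tuple(sample_sizes)
    zero = tuple(0 for _ in n)
    sfs = Counter()
    for x in sites:
        x = tuple(x)
        if len(x) != len(n) or any(not 0 <= xd <= nd for xd, nd in zip(x, n)):
            raise ValueError(f"invalid allele count pattern {x} for n={n}")
        # Monomorphic sites (all ancestral or all derived) are not segregating,
        # so f_0 = f_n = 0 by definition.
        if x == zero or x == n:
            continue
        sfs[x] += 1
    return sfs

# Toy data (hypothetical): D = 2 populations with n = (2, 3) sampled genomes.
sites = [(1, 0), (1, 0), (2, 1), (0, 0), (2, 3), (0, 2)]
sfs = joint_sfs(sites, (2, 3))
```

Here the patterns (0, 0) and (2, 3) are dropped because they are monomorphic, so only four of the six toy sites contribute to the spectrum.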

2.1. Demographic events

The expected multipopulation SFS can be obtained by integrating over random genealogies formed by a backwards-in-time stochastic process known as the structured coalescent (Kingman, 1982; Takahata, 1988; Notohara, 1990). Before getting to the technical details, we first review the basic dynamics of this process in order to build intuition for how the data have power to infer population splits, size changes, and gene flow events. See Durrett (2008) for a more detailed introduction.

Informally, the topology and branch lengths of genealogies are affected by a demographic history in two ways:

  1. Two lineages may not coalesce into a common ancestor until they reside in the same population, and the time until this occurs is affected by migration patterns and population split times.

  2. At any particular point in time, two members within the same population are more likely to have a common parent if the population size is small; so, for example, residents of a small village will typically be more closely related than residents of a large city.

Regarding the second point, we define the scaled effective population size η(t) such that the rate at which any two lineages find a common ancestor at time t is 1/η(t). Under the simplest random mating model (the Wright Fisher model, cf. Durrett (2008)), the census population size exactly equals , where T is the number of generations per unit time; more generally, η scales with the number of breeding individuals in the population. Thus, estimating η allows us to infer size change events such as bottlenecks, exponential growth, and population crashes.

If we could observe the true distribution of genealogies, then we could directly infer demographic history from the waiting times between coalescence events, following the principles listed in items 1 and 2 above. However, since genealogies are never directly observed, we must make inferences about demographic history indirectly using mutation data.

2.2. Likelihoods and the site frequency spectrum

Consider a genome with $L$ positions and mutation rate $\theta/L$ per position per generation. So at any given position in the genome, mutations arise on the tree there as a Poisson point process with rate $\theta/L$. The chance of 2 or more mutations at a single position is $O(1/L^2)$, and taking the limit $L \to \infty$ we arrive at the aforementioned infinite sites approximation (Kimura, 1969; Durrett, 2008), which assumes that each segregating site was caused by a single mutation, so that each allele may be labeled as ancestral or derived.
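The order of the bound can be checked with a short calculation. This is a sketch under the assumption that the tree at the position has a fixed total branch length; the symbols $t_{\mathrm{tot}}$ (total branch length), $N$ (number of mutations at the position), and $\lambda$ are introduced here only for illustration. With $N \sim \mathrm{Poisson}(\lambda)$ and $\lambda = \theta\, t_{\mathrm{tot}}/L$,

$$\mathbb{P}(N \ge 2) = 1 - e^{-\lambda} - \lambda e^{-\lambda} = \frac{\lambda^2}{2} + O(\lambda^3) = \frac{\theta^2 t_{\mathrm{tot}}^2}{2L^2} + O(L^{-3}).$$

Summed over all $L$ positions, the expected number of positions hit more than once is therefore $O(1/L)$, which vanishes as $L \to \infty$, while the expected number of positions hit at least once, $L\,(1 - e^{-\lambda}) \to \theta\, t_{\mathrm{tot}}$, stays finite.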

The observed segregating sites are not independent, because trees at neighboring positions are correlated. Unfortunately, even in the simplest case of a single population with constant size, an analytic expression for the likelihood of mutation data at a set of linked (non-independent) sites is not known (Bhaskar et al., 2015). Therefore, SFS data are generally used with composite likelihood methods. Recall that $f_{\mathbf{x}}$ is the total number of segregating sites with derived allele count pattern $\mathbf{x}=(x_1,\ldots,x_D)$. Define $\phi_{\mathbf{x}}=\frac{1}{\theta}E[f_{\mathbf{x}}]$, i.e., $\phi_{\mathbf{x}}$ is the expected frequency of $\mathbf{x}$ per unit mutation rate. Equivalently, $\phi_{\mathbf{x}}$ is the expected branch length subtending $\mathbf{x}$ leaves in a random coalescent tree (i.e., with $x_1$ descendants in population 1, $x_2$ in population 2, and so on). A commonly used composite likelihood is the Poisson random field model (Sawyer and Hartl, 1992), which assumes that the total number of segregating sites is Poisson with rate $\theta\sum_{\mathbf{x}}\phi_{\mathbf{x}}$, and that the patterns at the observed sites are independent with sampling probabilities proportional to $\phi_{\mathbf{x}}$; this yields a log-likelihood of

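$$\mathcal{L} = \|f\|_1\log(\theta\|\phi\|_1) - \theta\|\phi\|_1 + \sum_{\mathbf{x}} f_{\mathbf{x}}\log\frac{\phi_{\mathbf{x}}}{\|\phi\|_1}, \tag{1}$$

where $\|f\|_1 = \sum_{\mathbf{x}} f_{\mathbf{x}}$ and $\|\phi\|_1 = \sum_{\mathbf{x}} \phi_{\mathbf{x}}$, and additive terms that do not depend on $\theta$ or $\phi$ are omitted. Demographic history can then be inferred by searching for the parameter values that maximize $\mathcal{L}$.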
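The composite log-likelihood (1) is simple to evaluate once the expected branch lengths are available. The sketch below is illustrative only: the function name `composite_log_likelihood`, the dictionary-based inputs, and the toy numbers are assumptions and do not reflect the momi2 API. It evaluates the Poisson term for the total number of segregating sites plus the multinomial term for the patterns, dropping parameter-independent constants.

```python
import math

def composite_log_likelihood(f, phi, theta):
    """Poisson random field composite log-likelihood, as in equation (1),
    omitting additive terms that do not depend on theta or phi.

    f:   dict mapping pattern x -> observed count f_x
    phi: dict mapping pattern x -> expected branch length phi_x (> 0)
    """
    f_total = sum(f.values())        # ||f||_1
    phi_total = sum(phi.values())    # ||phi||_1
    # Poisson term for the total number of segregating sites.
    ll = f_total * math.log(theta * phi_total) - theta * phi_total
    # Multinomial term for the allele count patterns.
    for x, fx in f.items():
        if fx > 0:
            ll += fx * math.log(phi[x] / phi_total)
    return ll

# Toy inputs (hypothetical values, one population with n = 3).
phi = {(1,): 2.0, (2,): 1.0}
f = {(1,): 30, (2,): 12}
theta_hat = sum(f.values()) / sum(phi.values())   # maximizer in theta for fixed phi
ll_hat = composite_log_likelihood(f, phi, theta_hat)
```

For fixed $\phi$, setting the derivative of (1) with respect to $\theta$ to zero gives $\hat\theta = \|f\|_1/\|\phi\|_1$, which is what `theta_hat` computes; the demographic parameters enter only through $\phi$.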

2.3. Existing work and our contribution

The composite log-likelihood L in (1) requires us to compute ϕx, the expected branch length subtending x. Let G be a random genealogical tree sampled under the demography, and Lx(G) be the total length of all branches in G subtending x, so that

$$\phi_{\mathbf{x}} = E[L_{\mathbf{x}}(G)] = \int_G L_{\mathbf{x}}(G)\, d\mathbb{P}(G). \tag{2}$$

The integral (2) is difficult since the support of G is at least as large as the number of labeled binary trees with n leaves, a quantity which grows faster than exponentially in the sample size n. Consequently, a number of different methods have been proposed for evaluating (or approximating) the integral (2). Sampling-based methods include Markov chain Monte Carlo (Griffiths and Tavaré, 1997; Nielsen, 2000), importance sampling (Stephens and Donnelly, 2000; De Iorio and Griffiths, 2004), simulation (Excoffier and Foll, 2011; Excoffier et al., 2013), and ABC (Wegmann et al., 2010). The main advantage of these methods is their flexibility: since, as mentioned above, computing the likelihood of the data is trivial conditional on G, these methods can be used on a rich class of models. However, the high dimension of $\phi_{\mathbf{x}}$ makes it impractical to compute by sampling unless D is small. In particular, since the support of $\mathbf{x}$ grows like $O(n^D)$, Monte Carlo methods will assign zero mass to configurations that are actually observed in the data.

A second approach, implemented in the software ∂a∂i (Gutenkunst et al., 2009), computes ϕx by numerically solving PDEs arising from the Wright-Fisher diffusion (Ewens, 2004), which is dual to the coalescent process described above. For D populations, this involves numerically solving a D-dimensional integral. The initial ∂a∂i method in Gutenkunst et al. (2009) could handle up to D=3 populations; subsequent improvements (Lukić and Hey, 2012; Jouganous et al., 2017) extended this to D=4 and then D=5 populations by using spectral representations or alternative basis functions for solving the PDEs.

The third approach for computing ϕx, which includes our method, integrates over the sample allele frequencies “backwards-in-time”, exploiting conditional independence relationships to reduce computation. This involves considering the alleles of the sample’s ancestors at different points in the demographic history, and integrating out these random variables via inference algorithms for probabilistic graphical models (Pearl, 1982; Felsenstein, 1981; Lauritzen and Spiegelhalter, 1988; Koller and Friedman, 2009). Bryant et al. (2012) and Chen (2012) computed the SFS for finite- and infinite-sites models using this backward-in-time approach under the coalescent. De Maio et al. (2015) and Kamm et al. (2017) substantially lowered the computational burden of this approach by replacing the coalescent with the continuous-time Moran model (Durrett, 2008), a stochastic process which induces the same sampling distribution as the coalescent, but using a much smaller state space. However, until now these Moran-based approaches have been limited to analyzing tree-shaped demographies without admixture between populations.1

The main contribution of this paper is to extend our previous Moran-based method (Kamm et al., 2017) to allow for demographies defined on general directed acyclic graphs (DAGs), thus allowing for admixture between populations. We describe and implement an algorithm for computing the expected infinite-sites SFS under the multipopulation Moran model with size changes, exponential growth, population splits, and point admixture events (i.e., instantaneous migration “pulses”). This substantially enlarges the space of demographies for which the expected SFS can be accurately computed.

Additionally, our algorithm computes not only individual SFS entries, but also linear functionals of the expected SFS. Specifically, our method computes rank-1 tensor products of the SFS in the same time as a single entry (general linear functionals are sums of these rank-1 products). A number of widely used statistics in population genetics can be expressed as SFS functionals, and to our knowledge our method is the first to compute expectations of these statistics under complex demography.
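To make the notion of a rank-1 functional concrete, the sketch below evaluates one directly from a fully tabulated SFS. This only illustrates the definition: the function name `rank_one_functional` and the toy values are hypothetical, and the method described in this paper obtains the same quantity without first tabulating every entry.

```python
def rank_one_functional(sfs, vectors):
    """Evaluate sum_x sfs[x] * prod_d vectors[d][x_d].

    sfs:     dict mapping pattern x = (x_1, ..., x_D) -> value
    vectors: list of D sequences; vectors[d] has length n_d + 1
    """
    total = 0.0
    for x, value in sfs.items():
        weight = 1.0
        for d, xd in enumerate(x):
            weight *= vectors[d][xd]
        total += value * weight
    return total

# Toy expected SFS (hypothetical values) for D = 2, n = (1, 1).
toy_sfs = {(1, 0): 2.0, (0, 1): 5.0, (1, 1): 3.0}

# Indicator vectors pick out a single entry, here the entry at x = (1, 1).
single_entry = rank_one_functional(toy_sfs, [[0, 1], [0, 1]])

# An all-ones vector in population 2 sums over its allele count.
derived_in_pop1 = rank_one_functional(toy_sfs, [[0, 1], [1, 1]])
```

With indicator vectors the functional reduces to a single SFS entry; other choices of vectors yield weighted sums over entries.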

We demonstrate our method by using it to infer the history of eight human subpopulations that have undergone multiple admixture events. We complement our theoretical contributions with an open-source, user-friendly software implementation that will enable practitioners to deploy our method. The software uses automatic differentiation (Corliss et al., 2002; Bhaskar et al., 2015; Maclaurin et al., 2015) to compute derivatives of the SFS, leading to efficient optimization and parameter inference. Our package, called momi2, is available for download at https://github.com/popgenmethods/momi2.

3. Method

In this section, we describe the algorithm implemented in momi2 for computing the expected SFS under complex demographies. We begin in Subsection 3.1 with an illustrative example that highlights the novel aspects of our work. Then in Subsection 3.2 we provide pseudocode for our algorithm, and state the formulas used by our algorithm as Propositions. The proofs of these Propositions, and the proof of the correctness of our algorithm, require substantial additional notation, and we defer them to Appendix A.2. For ease of reference, the symbols we use are summarized in Table 1.

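Table 1.

Notation. Rows in the second part of the table are only used for proofs in the Appendix and may be ignored in the main text.

| Symbol | Description | Reference |
|---|---|---|
| $D$ | Sampled populations are $\{1,\ldots,D\}$ | §2 |
| $\mathbf{n}$ | # of samples per population $(n_1,\ldots,n_D)$ | |
| $f_{\mathbf{x}}$ | # of sites with $\mathbf{x}$ derived and $\mathbf{n}-\mathbf{x}$ ancestral alleles | |
| $\theta$ | Mutation rate | §2.2 |
| $\phi_{\mathbf{x}}$ | Expected branch length with $\mathbf{x}$ descendants, $\frac{1}{\theta}E[f_{\mathbf{x}}]$ | |
| $G$ | Population graph | Figure 2(a) |
| $v$ | A vertex in $G$, corresponding to a population | §3.2 |
| $\tau_v$ | Time between top and bottom events of $v$ | |
| $\eta_v(t)$ | Scaled population size of $v$ at $t \in [0, \tau_v]$ | |
| $n_v$ | # of samples with ancestry in $v$ | |
| $X_v(t)$ | # of derived alleles in $v$ at time $t$ | |
| $X_S(t)$ | Vector of allele counts $(X_{v_1}(t_1),\ldots,X_{v_k}(t_k))$ at $S = \{v_1,\ldots,v_k\}$ | |
| $X_v$, $X_S$ | Shorthand for $X_v(0)$, $X_S(0)$ respectively | |
| $T$ | Event tree | Figure 2(b) |
| $E$ | A join, split or leaf event in $T$ | |
| $K_E$ | Populations we are keeping track of at $E$ | |
| $\ell^{E,t}_{x,z}$ | Conditional likelihood $\mathbb{P}(X_{\mathrm{Leaves}(E)}=z \mid X_{K_E}(t)=x)$ | Eq. (4) |
| $\phi^E_z$ | $E[\text{branch length at or below } K_E \text{ with } z \text{ descendants}]$ | Eq. (5) |
| $\ell^{E,t}_x$, $\phi^E$ | Shorthand for $\ell^{E,t}_{x,z}$, $\phi^E_z$ respectively when $z$ is fixed | |
| $n_{\mathrm{tot}}$ | The total number of sampled lineages $\sum_{d=1}^{D} n_d$ | Appendix A.1 |
| $M_{v,t,(i)}$ | Label and allele of the $i$th lineage in $v$ at $t \in [0, \tau_v]$ | |
| $M_{v,t}$ | $(M_{v,t,(1)}, M_{v,t,(2)}, \ldots)$ | |
| $M_{E,t}$ | $(M_{v_1,t_1},\ldots,M_{v_k,t_k})$ where $K_E = \{v_1,\ldots,v_k\}$ and $t = (t_1,\ldots,t_k)$ | |
| $X_{v,m}(t)$ | # of derived alleles among the first $m$ lineages in $v$ at time $t$. (Note: $X_v(t) \equiv X_{v,n_v}(t)$) | |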

3.1. Example

Consider the model depicted in Figure 1. This model has 3 sampled (leaf) populations, related as follows. Populations 1 and 2 are sisters, with Population 3 an outgroup to them; however a pulse of migrants from 3 to 2 occurs after the split between populations 1 and 2.

Fig. 1.


An example of a 3-population Moran model. The bottom of the graph corresponds to the present and the top to the past. Population 2 receives admixture from population 3 after splitting from population 1. Other features of the demography include archaic samples in population 1, and various size changes along the edges of this demography. Lineages bearing the ancestral (derived) allele are shown in red (blue). The derived allele arises as the blue star (⋆) in the population immediately ancestral to population 4. The red circles (•) and blue stars represent one potential configuration of the allelic states of the ancestors to the sample; our method integrates over all potential configurations in order to calculate the likelihood. Arrows pointing to the right (→) represent reproduction events in the forward-time Moran model. The black arrow pointing to the left (←) at event 6 is a migration event from the population at node 7 to the population at node 5 (again in the forward-time sense).

At the leaves, we observe (n1, n2, n3) = (2,2,1) samples, with (X1, X2, X3) = (1,1,0) copies of the derived (blue star) allele. We wish to compute the expected number of mutations with pattern (X1, X2, X3); to do this, we will integrate over the unobserved variables (X4, X5, X6, X7, X8) which represent allele counts at certain internal positions within the demography. The random variables X1,…,X8 are related to each other by the DAG G in Figure 2(a), with an edge from population v to w if alleles may pass directly from v into w.

Fig. 2.


The population DAG G (a) and event tree T (b) corresponding to the demography shown in Figure 1. (a) Each vertex in G corresponds to a collection of alleles in Figure 1. (b) The corresponding event tree T is a junction tree of the DAG in G; each vertex of T corresponds to a set of vertices from G.

In the realization of Figure 1, the hidden blue allele counts are (X4, X5, X6, X7, X8) = (3,2,0,0,0). The blue mutation occurs in the edge above X4, spreading to 3 of the lineages. One copy of the blue allele moves on to X1, while 2 copies move on to X5. However, due to the admixture, X2 only inherits 1 blue allele from X5, inheriting a red allele from the 2 red alleles at X6.

Under the infinite-sites assumption described above, the mutation observed at this site arose at a single point in the genealogical tree depicted in Figure 1. To compute the expected number of mutations with (X1, X2, X3) = (1,1,0), we may condition on the population (i.e., the edge in Figure 1) on which it arose:

$$\phi_{1,1,0} = \frac{1}{\theta}E[\#\text{ mutations with } (X_1,X_2,X_3)=(1,1,0)] = \frac{1}{\theta}\sum_{v=1}^{8}\sum_{i=1}^{n_v} E[\#\text{ mutations in population } v \text{ yielding } X_v=i] \times \mathbb{P}(X_1=1,X_2=1,X_3=0 \mid \text{mutation at } v \text{ with } X_v=i). \tag{3}$$

We call $\frac{1}{\theta}E[\#\text{ mutations at } v \text{ with } X_v=i]$ the “truncated SFS”; this only concerns events within a single population and can be computed using the method momi1. We refer the interested reader to Kamm et al. (2017) for further details. The second term $\mathbb{P}(X_1=1,X_2=1,X_3=0 \mid \text{mutation at } v \text{ with } X_v=i)$ gives the conditional likelihood of observing the data given a mutation and its allele count at population v. In momi1, we computed this term in the case where G is a tree without admixture using the sum-product (also known as belief propagation) algorithm (Felsenstein, 1981; Koller and Friedman, 2009).

The main result of this work is to extend our previous dynamic program to the case where G is given by a DAG due to admixture. We will use a dynamic program that is essentially a kind of junction tree algorithm (Koller and Friedman, 2009). This algorithm works by decomposing a DAG into a tree in such a way that vertices in the tree correspond to collections of nodes in the original graph. Belief propagation is then applied to the tree decomposition.

We illustrate our algorithm using the example demography in Figure 2(b). We call the tree decomposition T an event tree, because each internal node corresponds to either an admixture or split event.2 We construct the event tree T, and compute the conditional likelihoods $\mathbb{P}(X_1=1,X_2=1,X_3=0 \mid \text{mutation at } v \text{ with } X_v=i)$, as follows:

  1. We initially start with a collection of 3 singleton sets of leaf populations: {{1},{2},{3}}. For v = 1, 2, 3, we also keep track of conditional likelihoods of the data beneath v; since we are at the leaves, these are simply the Kronecker delta functions
    $$\mathbb{P}(X_v=i \mid X_v=j) = \delta_{i,j} = \begin{cases} 1, & \text{if } i=j, \\ 0, & \text{if } i \ne j. \end{cases}$$
  2. Going back in time, the first event is the admixture of populations 5 and 6 into 2. To process this event, we remove the set {2} and replace it with {5, 6}; the current collection of sets becomes {{1},{3},{5,6}}. We make the new set {5, 6} the parent of the removed set {2} in T.

    In addition, we compute $[\mathbb{P}(X_2 \mid X_5=i, X_6=j)]_{i,j}$, the conditional likelihood of the data beneath {5, 6} given the allele counts $X_5, X_6$. We obtain this by applying Lemmas 1 and 2 to the previous conditional likelihood $[\mathbb{P}(X_2 \mid X_2=i)]_i$. Specifically, we apply Lemma 1 to “lift” the conditional likelihood at population 2 to the time point immediately below the admixture event; then, we apply Lemma 2 to obtain $\mathbb{P}(X_2 \mid X_5, X_6)$, the conditional likelihood at {5, 6} immediately above the admixture event.

  3. The next event has the alleles from 6 splitting off from population 3. To process this event, we merge the clusters containing the relevant populations ({5, 6} and {3}); then we remove 3,6 and replace them with their parent population 7, to obtain {5, 7} as the parent cluster of {5, 6} and {3} in T. After this stage, our collection of sets becomes {{1},{5,7}}.

    To obtain $\mathbb{P}(X_2, X_3 \mid X_5, X_7)$, the conditional likelihood of the data at the leaves beneath {5, 7}, we apply Lemma 3 to $\mathbb{P}(X_3 \mid X_3)$ and $\mathbb{P}(X_2 \mid X_5, X_6)$. Lemma 3 computes the conditional likelihood at a split event when the children fall into separate clusters beneath the split (in this case, the child clusters are {3} and {5, 6}).

  4. The next event is similar, with populations 1 and 5 merging into population 4. After combining the clusters {1} and {5, 7}, removing the merged populations 1 and 5, and adding in their parent 4, we are left with the collection {{4,7}}.

    Similar to previous steps, we compute $\mathbb{P}(X_1, X_2, X_3 \mid X_4, X_7)$, the conditional likelihood of the data at the leaves beneath {4, 7}, by applying Lemma 1 to “lift” the conditional likelihoods $\mathbb{P}(X_1 \mid X_1)$ and $\mathbb{P}(X_2, X_3 \mid X_5, X_7)$ to the point immediately below the split event, and then applying Lemma 3 to compute the conditional likelihood at a split event above two independent clusters ({1}, {5, 7}).

  5. The final event has populations 4 and 7 merging into population 8. We replace the cluster {4, 7} with its parent cluster {8} at the root of T.

    To compute $\mathbb{P}(X_1, X_2, X_3 \mid X_8)$ from the child likelihoods $\mathbb{P}(X_1, X_2, X_3 \mid X_4, X_7)$, we first apply Lemma 1 to lift the likelihoods immediately below the split event, and then apply Lemma 4, which computes the conditional likelihood at the parent of a split when the children belong to the same cluster ({4, 7} in this case).

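Finally, at each cluster $\{v_1, v_2, \ldots\}$ in T with leaves $\{l_1, l_2, \ldots\}$ we have computed $\mathbb{P}(X_{l_1}, X_{l_2}, \ldots \mid X_{v_1}, X_{v_2}, \ldots)$, the conditional likelihoods at the leaves beneath $\{v_1, v_2, \ldots\}$. But to apply equation (3), we need $\mathbb{P}((X_1,X_2,X_3)=(1,1,0) \mid \text{mutation at } v \text{ with } X_v=i)$. This is given by

$$\mathbb{P}((X_1,X_2,X_3)=(1,1,0) \mid \text{mutation at } v \text{ with } X_v=i) = \begin{cases} \mathbb{P}((X_1,X_2,X_3)=(1,1,0) \mid X_4=i, X_7=0), & \text{if } v=4, \\ \mathbb{P}((X_1,X_2,X_3)=(1,1,0) \mid X_8=i), & \text{if } v=8, \\ 0, & \text{else}, \end{cases}$$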

since we assume that each site experiences at most a single mutation in its history, and the only way derived alleles are observed in leaf populations 1,2 is if the corresponding mutation occurs in population 4 or 8.

Remark. The observant reader may have noticed that in Figure 1, population 8 has only n8 = 5 alleles, despite its children having n4 + n7 = 7 > 5 alleles in total. This is due to the fact that there are only 5 alleles at the leaves, so there are at most 5 ancestors in any population at any point; thus, when populations 4 and 7 merge into population 8, we can drop two of the tracked lineages. To show this formally, we use a stochastic process called the lookdown construction, which is a version of the Moran model with a countably infinite number of lineages. However, this analysis requires a great deal of additional notation and is not essential to the remainder of the text, so we defer it to Appendix A.1.

3.2. Algorithms and formulas

We now describe the algorithm to construct the event tree T. We assume that the population graph G has two types of topological events:

  1. Population split: two child populations u, v split from each other; their parent population is w. Looking backward in time, u, v merge and become the population w.

  2. Population admixture: a single child population u inherits from exactly two parent populations v, w, with the probability that an allele comes from v (respectively, w) being p (respectively, 1 − p).

Note that more complicated events, such as trifurcating splits or symmetric pulse migrations, may be expressed as a succession of these 2-way split and admixture events.

We provide pseudocode to construct the event tree T in Algorithm 1. In words, we initially start with K equal to a collection of singleton sets corresponding to the leaves of G. Processing each split or admixture event E back in time, we merge all blocks containing the child population(s) of E. Then we remove the child population(s) from this merged block, and add in the parent population(s). Within T, the new merged block is the parent of the blocks removed at this stage.

Algorithm 1.

Construct event tree T

procedure EventTree(G)
    K ← {{l} : l ∈ Leaves(G)}  ⊳ State variable containing current set of tracked populations
    V(T) ← {}  ⊳ Vertices of T
    E(T) ← {}  ⊳ Edges of T
    for events E in BackInTimeEvents(G) do
        if E is split then
            w1, w2 ← ChildPopulations(E)
            v ← ParentPopulation(E)
            K1 ← block in K containing w1
            K2 ← block in K containing w2
            K_E ← K1 ∪ K2 \ {w1, w2} ∪ {v}
            K ← K \ {K1, K2} ∪ {K_E}
            V(T) ← V(T) ∪ {K_E}
            if K1 ≠ K2 then
                E(T) ← E(T) ∪ {K_E → K1, K_E → K2}
            else
                E(T) ← E(T) ∪ {K_E → K1}
            end if
        else if E is admixture then
            w ← ChildPopulation(E)
            v1, v2 ← ParentPopulations(E)
            K′ ← block in K containing w
            K_E ← K′ \ {w} ∪ {v1, v2}
            K ← K \ {K′} ∪ {K_E}
            V(T) ← V(T) ∪ {K_E}
            E(T) ← E(T) ∪ {K_E → K′}
        end if
    end for
    return V(T), E(T)
end procedure
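For readers who prefer executable code, the following is a minimal Python sketch of Algorithm 1, applied to the example of Section 3.1. The function name `event_tree` and the tuple encoding of events are assumptions made for this illustration; they are not the data structures used in momi2. As in the pseudocode, only the blocks created at split and admixture events are recorded as vertices.

```python
def event_tree(leaves, events):
    """Sketch of Algorithm 1.

    leaves: iterable of leaf population labels.
    events: list ordered back in time; each event is either
            ("split", (w1, w2), v)        children w1, w2 merge into parent v
            ("admixture", w, (v1, v2))    child w inherits from parents v1, v2
    Returns (vertices, edges); blocks are frozensets of populations and
    edges are (parent_block, child_block) pairs.
    """
    blocks = {frozenset([leaf]) for leaf in leaves}   # the state variable K
    vertices, edges = [], []

    def block_containing(pop):
        for block in blocks:
            if pop in block:
                return block
        raise ValueError(f"population {pop!r} is not currently tracked")

    for kind, children, parents in events:
        if kind == "split":
            w1, w2 = children
            k1, k2 = block_containing(w1), block_containing(w2)
            k_e = ((k1 | k2) - {w1, w2}) | {parents}
            child_blocks = [k1] if k1 == k2 else [k1, k2]
        elif kind == "admixture":
            k_prime = block_containing(children)
            k_e = (k_prime - {children}) | set(parents)
            child_blocks = [k_prime]
        else:
            raise ValueError(f"unknown event type {kind!r}")
        blocks.difference_update(child_blocks)   # K <- K \ {child blocks}
        blocks.add(k_e)                          # K <- K u {K_E}
        vertices.append(k_e)
        edges.extend((k_e, child) for child in child_blocks)
    return vertices, edges

# The demography of Figure 1, with events listed back in time.
example_events = [
    ("admixture", 2, (5, 6)),
    ("split", (3, 6), 7),
    ("split", (1, 5), 4),
    ("split", (4, 7), 8),
]
vertices, edges = event_tree([1, 2, 3], example_events)
```

On this input the blocks are created in the order {5, 6}, {5, 7}, {4, 7}, {8}, matching steps 2 through 5 of the worked example.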

We now describe Algorithm 2, the dynamic program to compute the joint SFS. We need a bit more notation. For population $v$, let $n_v$ be the number of samples with ancestry in $v$; we will be keeping track of $n_v$ lineages within population $v$. Let $\tau_v$ denote the amount of time between the top and bottom events of $v$, and let $\eta_v(t)$ denote the scaled population size of $v$ at time $t \in [0, \tau_v]$ above the base of $v$. Let $0 \le X_v(t) \le n_v$ be the allele count within the $n_v$ lineages of $v$ at time $t$ above its bottom, so $X_v(0)=X_v$ in our earlier notation.

At event $E$ in $T$, let $K_E=\{v_1,\ldots,v_{|K_E|}\}$ be the corresponding block of populations in $G$. We define the conditional likelihood at $E$ as

$$\ell^{E,t}_{x,z} = \mathbb{P}(X_{\mathrm{Leaves}(E)}=z \mid X_{K_E}(t)=x) \tag{4}$$

where $X_{K_E}(t)=(X_{v_1}(t_1),\ldots,X_{v_{|K_E|}}(t_{|K_E|}))$ is the vector of allele counts in populations $K_E$ at times $t$, and $X_{\mathrm{Leaves}(E)}$ is the observed data at the leaves beneath $E$.

In addition, we define the “partial SFS”

$$\phi^E_z = \sum_{v \text{ descended from } K_E} E[\text{branch length in } v \text{ with } X_{\mathrm{Leaves}(E)}=z] \tag{5}$$

as the expected branch length at or below $K_E$ subtending $z$ leaves. (Here, the expectation is with respect to branch lengths in the coalescent tree relating the samples.) Note that $\phi^{\mathrm{Root}(T)}_z$ gives the desired final result, and corresponds to equation (3) in the previous subsection.

Algorithm 2.

Dynamic program to compute the SFS ϕ

procedure DP(ℓ^1, …, ℓ^D)  ⊳ ℓ^i = e_{X_i} ∈ ℝ^{n_i+1}
    for event E in DepthFirstSearch(T) do
        if K_E = {d} is leaf then
            ℓ^{E,0} ← ℓ^d
        else if E is split event then
            ℓ^{E,0} ← Lemmas 1 and 2
        else if E is join event then
            if |ChildEvents(E)| = 1 then
                ℓ^{E,0} ← Lemmas 1 and 4
            else if |ChildEvents(E)| = 2 then
                ℓ^{E,0} ← Lemmas 1 and 3
            end if
        end if  ⊳ Computed the conditional likelihood ℓ^{E,0}
        ϕ^E ← (10)  ⊳ Computes ϕ^E, the partial SFS
    end for
    return ϕ^{Root(T)}  ⊳ Return partial SFS at the root event
end procedure

For the remainder of the subsection, we fix the leaf data $X_{\mathrm{Leaves}(G)}=z$, and drop the dependence on $z$ in $\ell^{E,t}_{x,z}$ and $\phi^E_z$. Algorithm 2 defines a dynamic program (DP) over the conditional likelihoods $\ell^{E,0}_x$ and partial SFS $\phi^E$. The DP takes input vectors $\ell^1,\ldots,\ell^D$ corresponding to the leaf populations $\mathrm{Leaves}(G)=\{1,\ldots,D\}$. If the inputs $\ell^1,\ldots,\ell^D$ are set to indicator vectors corresponding to the observed counts $X_1,\ldots,X_D$, the DP of Algorithm 2 will return the corresponding SFS entry, as stated in the theorem below:

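Theorem 1. If $(X_1,\ldots,X_D) \notin \{\mathbf{0}, \mathbf{n}\}$ then

$$\phi = \mathrm{DP}(e_{X_1},\ldots,e_{X_D}),$$

where $e_{X_i}=(0,\ldots,1,\ldots,0) \in \mathbb{R}^{n_i+1}$ denotes the vector with 1 at coordinate $X_i$ and 0 elsewhere.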

We now present the formulas used by Algorithm 2, in a series of lemmas also used to prove Theorem 1. We start with a formula to “lift” $\mathbb{P}(\cdot \mid \ldots, X_v(0), \ldots)$ up to $\mathbb{P}(\cdot \mid \ldots, X_v(\tau_v), \ldots)$. That is, this formula transforms a likelihood conditioned on $X_v(0)$, the allele count at the bottom of $v$, into a likelihood conditioned on $X_v(\tau_v)$, the allele count at the top of $v$.

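Lemma 1 (Lifting). Let $E$ be a split or admixture event with corresponding block $K_E$, $v \in K_E$ be a population within this block, and $x$, $t$ be vectors of allele counts and times for the populations in $K_E$. Then, with $k = x_v$ the allele count of $x$ at $v$ and $x^{(-k)} = x - k e_v$, the conditional likelihood of $E$ is

$$\ell^{E,t}_{x} = \sum_{j=0}^{n_v}\left[e^{Q^{(n_v)}\int_0^{\tau_v}\frac{1}{\eta_v(t)}\,dt}\right]_{kj}\,\ell^{E,\,t-\tau_v e_v}_{x^{(-k)}+j e_v}, \tag{6}$$

where $Q^{(n)} \in \mathbb{R}^{(n+1)\times(n+1)}$ is the transition rate matrix of the Moran model with $n$ lineages; in particular, $Q^{(n)}=(q^{(n)}_{ij})_{0 \le i,j \le n}$ and

$$q^{(n)}_{ij} = \begin{cases} -i(n-i), & \text{if } i=j, \\ \frac{1}{2}i(n-i), & \text{if } |j-i|=1, \\ 0, & \text{else}. \end{cases}$$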
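The lifting step can be sketched numerically in the simplest setting, where the block contains a single population so that the likelihood is a vector indexed by the allele count. The code below is an illustrative sketch in pure Python, not the momi2 implementation: the names `moran_rate_matrix`, `expm`, and `lift` are hypothetical, the matrix exponential uses a basic scaling-and-squaring Taylor scheme chosen for self-containment, and the argument `scaled_time` stands for the integral of $1/\eta_v(t)$ over $[0, \tau_v]$.

```python
import math

def moran_rate_matrix(n):
    """(n+1) x (n+1) Moran rate matrix: -i(n-i) on the diagonal and
    i(n-i)/2 on the two adjacent off-diagonals."""
    q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        rate = 0.5 * i * (n - i)
        q[i][i] = -2.0 * rate
        if i > 0:
            q[i][i - 1] = rate
        if i < n:
            q[i][i + 1] = rate
    return q

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def expm(a, squarings=12, terms=12):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    m = len(a)
    scaled = [[v / 2 ** squarings for v in row] for row in a]
    result = [[float(i == j) for j in range(m)] for i in range(m)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def lift(bottom_likelihood, n, scaled_time):
    """Map a likelihood indexed by X_v(0) = j to one indexed by X_v(tau_v) = k."""
    p = expm([[scaled_time * v for v in row] for row in moran_rate_matrix(n)])
    return [sum(p[k][j] * bottom_likelihood[j] for j in range(n + 1))
            for k in range(n + 1)]

# With n = 2 and the data "one derived allele at the bottom", the lifted
# likelihood at k = 1 is the chance that no Moran event hits the pair.
lifted = lift([0.0, 1.0, 0.0], n=2, scaled_time=1.0)
```

For $n = 2$ the only non-absorbing state is $k = 1$, which is left at total rate 1, so the lifted likelihood at $k = 1$ equals $e^{-1}$ for unit scaled time. In practice a library routine for the matrix exponential, or an eigendecomposition of $Q^{(n)}$, would be preferable to the simple scheme used here.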

To process event E, we first use Lemma 1 to lift up the conditional likelihoods at the child populations, up to the time of E. We then apply one of Lemma 2, Lemma 3, or Lemma 4, to obtain the conditional likelihood at E from the lifted child likelihoods, depending on whether E is an admixture or split event, and whether the child populations of E fall into a single cluster or two independent clusters.

We first consider admixture events; Lemma 2 describes how to compute the conditional likelihood in this case. Let the child population be $w$ and the parent populations be $v_1, v_2$. Each of the $n_w$ lineages in $w$ independently inherits from $v_1$ with probability $q_1$, or from $v_2$ with probability $q_2 = 1 - q_1$. So the number of lineages inheriting from $v_1$ is Binomial$(n_w, q_1)$. Then, given that $m_1$ alleles are inherited from $v_1$ and $m_2 = n_w - m_1$ are inherited from $v_2$, the particular alleles inherited from $v_1$ or $v_2$ are chosen by sampling without replacement.

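Lemma 2 (Admixture event). Let $E$ be an admixture event, with child population $w$ and parent populations $v_1$, $v_2$. Let $E'$ be the child event in $T$. Suppose each lineage in $w$ comes from $v_1$ with probability $q_1$, and from $v_2$ with probability $q_2 = 1 - q_1$. For $K_E$ the population cluster at $E$, let $x$ be a vector of allele counts on $K_E \setminus \{v_1, v_2\}$. Then the conditional likelihood of allele counts $x + x_1 e_{v_1} + x_2 e_{v_2}$ at $E$ is given by

$$\ell^{E,0}_{x+x_1e_{v_1}+x_2e_{v_2}}=\sum_{x_w=0}^{n_w}\ell^{E',\tau_w e_w}_{x+x_w e_w}\sum_{\substack{m_1,m_2:\\ m_1+m_2=n_w}}\binom{n_w}{m_1}q_1^{m_1}q_2^{m_2}\sum_{\substack{j_1,j_2:\\ j_1+j_2=x_w}}\frac{\binom{x_1}{j_1}\binom{n_{v_1}-x_1}{m_1-j_1}}{\binom{n_{v_1}}{m_1}}\,\frac{\binom{x_2}{j_2}\binom{n_{v_2}-x_2}{m_2-j_2}}{\binom{n_{v_2}}{m_2}}. \tag{7}$$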
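The weights in the admixture formula combine a binomial choice of parent with two hypergeometric draws. The sketch below evaluates that weight for a single child population, under the simplifying assumptions that the cluster contains no other populations and that both parents track at least as many lineages as the child ($n_{v_1}, n_{v_2} \ge n_w$). The function names are illustrative and not taken from momi2.

```python
from math import comb

def hypergeom(total, derived, draws, hits):
    """P(`hits` derived among `draws` alleles sampled without replacement
    from `total` alleles, of which `derived` are derived)."""
    if hits < 0 or hits > derived or draws - hits < 0 or draws - hits > total - derived:
        return 0.0
    return comb(derived, hits) * comb(total - derived, draws - hits) / comb(total, draws)

def admixture_weight(x_w, x1, x2, n_w, n_v1, n_v2, q1):
    """P(child has x_w derived alleles | parents have x1 and x2 derived alleles)."""
    q2 = 1.0 - q1
    total = 0.0
    for m1 in range(n_w + 1):            # lineages inherited from parent v1
        m2 = n_w - m1                    # lineages inherited from parent v2
        binom = comb(n_w, m1) * q1 ** m1 * q2 ** m2
        inner = sum(hypergeom(n_v1, x1, m1, j1) * hypergeom(n_v2, x2, m2, x_w - j1)
                    for j1 in range(x_w + 1))
        total += binom * inner
    return total

def admixture_likelihood(child_likelihood, x1, x2, n_v1, n_v2, q1):
    """Sum over x_w of the lifted child likelihood times the admixture weight."""
    n_w = len(child_likelihood) - 1
    return sum(child_likelihood[x_w] * admixture_weight(x_w, x1, x2, n_w, n_v1, n_v2, q1)
               for x_w in range(n_w + 1))
```

For fixed parental counts the weights form a probability distribution over $x_w$, so they should sum to one; with $q_1 = 1$ and $n_{v_1} = n_w$ the child count simply equals $x_1$.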

We next consider a split event E, with parent population v and child populations w1, w2. We first consider the case where E has 2 distinct children in T, i.e., w1, w2 fall into 2 distinct blocks beneath E. Denote the corresponding child events as E1′, E2′ respectively. Then the conditional likelihood at E is given by a convolution of the conditional likelihoods at E1′, E2′, as described in Lemma 3:

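Lemma 3 (Population split, 2 clusters). Let $E$ be a split event with parent population $v$ and child populations $w_1$, $w_2$. Suppose $E$ has 2 child events $E_1'$, $E_2'$, with corresponding blocks $K_{E_1'}$, $K_{E_2'}$, where $w_1 \in K_{E_1'}$, $w_2 \in K_{E_2'}$. Let $x_{-1}$ be allele counts on $K_{E_1'} \setminus \{w_1\}$ and $x_{-2}$ be allele counts on $K_{E_2'} \setminus \{w_2\}$. Then the conditional likelihood at $E$ is

$$\ell^{E,0}_{x_{-1}+x_{-2}+x_v e_v}=\sum_{\substack{x_1,x_2:\\ x_1+x_2=x_v}}\frac{\binom{n_{w_1}}{x_1}\binom{n_{w_2}}{x_2}}{\binom{n_v}{x_v}}\;\ell^{E_1',\tau_{w_1}e_{w_1}}_{x_1 e_{w_1}+x_{-1}}\;\ell^{E_2',\tau_{w_2}e_{w_2}}_{x_2 e_{w_2}+x_{-2}}. \tag{8}$$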
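The two-cluster split is a weighted convolution of the two lifted child likelihoods. A minimal sketch follows, assuming each child cluster contains only the child population itself (so the vectors $x_{-1}$ and $x_{-2}$ are empty) and that $n_v = n_{w_1} + n_{w_2}$; the function name is hypothetical.

```python
from math import comb

def split_two_clusters(lik1, lik2):
    """Combine lifted child likelihoods lik1[x1], lik2[x2] into a parent
    likelihood indexed by the parent's derived-allele count x_v."""
    n1, n2 = len(lik1) - 1, len(lik2) - 1
    n_v = n1 + n2
    parent = []
    for x_v in range(n_v + 1):
        total = 0.0
        # x1 derived alleles come from child 1, x2 = x_v - x1 from child 2
        for x1 in range(max(0, x_v - n2), min(n1, x_v) + 1):
            x2 = x_v - x1
            weight = comb(n1, x1) * comb(n2, x2) / comb(n_v, x_v)
            total += weight * lik1[x1] * lik2[x2]
        parent.append(total)
    return parent
```

For example, `split_two_clusters([1, 0], [0, 1])` gives `[0.0, 0.5, 0.0]`: a single derived allele in the parent lands in the second child with probability one half.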

Next, consider the case where the child populations $w_1$, $w_2$ fall into the same cluster beneath $E$, so that $E$ has just one child event, say $E'$. The next lemma describes how to obtain the conditional likelihood at $E$ from the conditional likelihood at $E'$. This involves summing over the dimensions corresponding to $w_1$, $w_2$ within the conditional likelihood $\ell^{E',\tau_{w_1}e_{w_1}+\tau_{w_2}e_{w_2}}$. In addition, note that we may have $n_v < n_{w_1}+n_{w_2}$ (recall $n_v$ is the number of samples with ancestry in $v$). That is, after merging $w_1$, $w_2$ backwards in time, we may be keeping track of more alleles than originally sampled, allowing us to “drop” some extraneous non-ancestral lineages, as illustrated in the root population of Figure 1. Formally, let $y^x$ be the vector whose $i$-th entry is the likelihood at $E$ conditional on there being a total of $i$ derived alleles in the child populations $w_1$ and $w_2$, and also conditional on the vector of allele counts $x$ in the other populations. If $B \in \mathbb{R}^{(n_{w_1}+n_{w_2}+1)\times(n_v+1)}$ is a matrix whose $(i, j)$th entry is the probability of sampling without replacement $j$ alleles in the parent population $v$ out of the total of $i$ alleles segregating in child populations $w_1$ and $w_2$, then the vector of conditional likelihoods $\ell^{E}=[\ell^{E,0}_{je_v+x}]_{0\le j\le n_v}$ at $E$ satisfies the equation

$$y^x = B\,\ell^{E},$$

leading to the following lemma.

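Lemma 4 (Population split, 1 cluster). Let $E$ be a population split with exactly one child event $E'$. Denote the corresponding population clusters as $K_E$ and $K_{E'}$. Denote the parent population as $v$, the child populations as $w_1$, $w_2$. Let $x$, $y^x$ and $B$ be defined as in the preceding paragraph, and let $X_\cap$ denote the allele counts on $K_E \cap K_{E'}$:

$$y^x_i=\sum_{\substack{j,k:\\ j+k=i}}\frac{\binom{n_{w_1}}{j}\binom{n_{w_2}}{k}}{\binom{n_{w_1}+n_{w_2}}{i}}\;\ell^{E',\tau_{w_1}e_{w_1}+\tau_{w_2}e_{w_2}}_{je_{w_1}+ke_{w_2}+x}=\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{w_1}(\tau_{w_1})+X_{w_2}(\tau_{w_2})=i,\ X_\cap=x\big),$$

$$B_{i,j}=\frac{\binom{n_v}{j}\binom{n_{w_1}+n_{w_2}-n_v}{i-j}}{\binom{n_{w_1}+n_{w_2}}{i}}.$$

Then the conditional likelihood at $E$ is given by

$$\ell^{E,0}_{ke_v+x}=[B^{+}y^x]_k, \tag{9}$$

with $B^{+}$ denoting the Moore-Penrose pseudoinverse of $B$.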
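To make the role of the matrix $B$ concrete, the sketch below builds it from the hypergeometric formula and recovers a likelihood vector from its image. This is an illustration only: instead of an SVD-based pseudoinverse it solves the normal equations exactly with rational arithmetic, which yields the same result as the pseudoinverse whenever $B$ has full column rank (assumed here, with $n_v \le n_{w_1}+n_{w_2}$). All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def downsample_matrix(n_children, n_v):
    """B[i][j]: hypergeometric probability of j derived alleles among the
    n_v parent lineages, given i derived among all n_children lineages."""
    B = []
    for i in range(n_children + 1):
        row = []
        for j in range(n_v + 1):
            if j <= i:
                numerator = comb(n_v, j) * comb(n_children - n_v, i - j)
                row.append(Fraction(numerator, comb(n_children, i)))
            else:
                row.append(Fraction(0))
        B.append(row)
    return B

def pinv_apply(B, y):
    """Return B^+ y for full-column-rank B by solving (B^T B) l = B^T y exactly."""
    rows, cols = len(B), len(B[0])
    A = [[sum(B[r][a] * B[r][b] for r in range(rows)) for b in range(cols)]
         for a in range(cols)]
    rhs = [sum(B[r][a] * y[r] for r in range(rows)) for a in range(cols)]
    aug = [A[a] + [rhs[a]] for a in range(cols)]
    for c in range(cols):                       # Gauss-Jordan elimination
        pivot = next(r for r in range(c, cols) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(cols):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[a][cols] for a in range(cols)]
```

A floating-point implementation would more likely use a library pseudoinverse; exact fractions are used here only so that the round trip can be checked without tolerance.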

Finally, having computed the conditional likelihoods at event E, we wish to compute the partial SFS ϕE at the event. This is given by the recursive formula in Lemma 5, which involves the conditional likelihood at E, the expected number of mutations arising within each parent population v, and the partial SFS at the child events.

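Lemma 5. For an event $E$, define the partial ordering $E' \prec E$ if $E'$ occurs beneath $E$ in the event tree, and let $K_E^{\mathrm{new}} = K_E \setminus \big(\bigcup_{E' \prec E} K_{E'}\big)$ be the populations newly formed at $E$ (i.e., formed by a population split or admixture at $E$). For $v \in K_E^{\mathrm{new}}$, let $f_v(k)$ be the truncated SFS in population $v$,

$$f_v(k)=\frac{1}{\theta}\,\mathbb{E}[\#\text{mutations at } v \text{ with } X_v=k],$$

which can be computed by the formulas in Kamm et al. (2017). Then the partial SFS at $E$ is given by

$$\phi^{E}=\sum_{v\in K_E^{\mathrm{new}}}\sum_{k=1}^{n_v}f_v(k)\,\ell^{E,0}_{ke_v}+\begin{cases}0, & \text{if } E \text{ is a leaf event},\\ \phi^{E'}, & \text{if } \mathrm{ChildEvents}(E)=\{E'\},\\ \phi^{E_1'}\prod_{d\in\mathrm{Leaves}(E_2')}\ell^{d}_{0}+\phi^{E_2'}\prod_{d\in\mathrm{Leaves}(E_1')}\ell^{d}_{0}, & \text{if } \mathrm{ChildEvents}(E)=\{E_1',E_2'\}.\end{cases} \tag{10}$$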

3.3. Normalizing constant and other linear functionals

To compute the probability $\phi_z / \|\phi\|_1$ of a mutation having observed allele counts $z$, we need not just $\phi_z$, but also the normalizing constant $\|\phi\|_1=\sum_z \phi_z$, the expected total branch length.

Computing $\|\phi\|_1$ directly is inefficient because of the $O\big(\prod_{d=1}^{D} n_d\big)$ possible entries $z$. Instead, we can use Algorithm 2 to compute $\|\phi\|_1$, and many more statistics of the SFS, in the same time as a single entry:

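Theorem 2. For $\pi^{d}\in\mathbb{R}^{n_d+1}$, $d\in\{1,\ldots,D\}$, the tensor dot product of the SFS $\phi$ against $\pi^{1}\otimes\cdots\otimes\pi^{D}=[\pi^{1}_{z_1}\cdots\pi^{D}_{z_D}]_{z_1,\ldots,z_D}$ is

$$\phi\odot(\pi^{1}\otimes\cdots\otimes\pi^{D})=\sum_{z_1,\ldots,z_D}\phi_{z_1,\ldots,z_D}\,\pi^{1}_{z_1}\cdots\pi^{D}_{z_D}=\mathrm{DP}(\pi^{1},\ldots,\pi^{D})-\Big(\prod_{d=1}^{D}\pi^{d}_{0}\Big)\mathrm{DP}(e_0,\ldots,e_0)-\Big(\prod_{d=1}^{D}\pi^{d}_{n_d}\Big)\mathrm{DP}(e_{n_1},\ldots,e_{n_D}).$$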

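Theorem 2 says that for any rank-$K$ tensor $A\in\mathbb{R}^{(n_1+1)\times\cdots\times(n_D+1)}$ with

$$A=\sum_{k=1}^{K}a_1^{(k)}\otimes\cdots\otimes a_D^{(k)},$$

the product

$$\phi\odot A=\sum_{z}\phi_z A_z=\sum_{k=1}^{K}\phi\odot\big(a_1^{(k)}\otimes\cdots\otimes a_D^{(k)}\big)$$

can be computed in $K$ calls to $\mathrm{DP}(\pi^{1},\ldots,\pi^{D})$.³ In particular, the expected total branch length $\|\phi\|_1$ is given by

$$\|\phi\|_1=\sum_{z}\phi_z=\phi\odot(\mathbf{1}\otimes\cdots\otimes\mathbf{1})=\mathrm{DP}(\mathbf{1},\ldots,\mathbf{1})-\mathrm{DP}(e_0,\ldots,e_0)-\mathrm{DP}(e_{n_1},\ldots,e_{n_D}),$$

with $\mathbf{1}$ the vector with 1 at every coordinate.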
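The identity for the total branch length can be checked on a toy example. In the sketch below, `dp` is only a stand-in for Algorithm 2: it is an arbitrary multilinear function defined by contracting a small made-up tensor that also has entries at the two monomorphic corners. The tensor values, sample sizes and function names are all assumptions chosen for illustration.

```python
from itertools import product

def contract(tensor, vectors):
    """Multilinear map: sum_z tensor[z] * prod_d vectors[d][z_d]."""
    total = 0.0
    for z, value in tensor.items():
        weight = value
        for d, z_d in enumerate(z):
            weight *= vectors[d][z_d]
        total += weight
    return total

n = (2, 3)  # toy sample sizes for D = 2 populations
full = {z: 1.0 / (1 + z[0] + 2 * z[1])
        for z in product(range(n[0] + 1), range(n[1] + 1))}

def dp(*vectors):
    """Stand-in for DP: multilinear in its inputs, includes monomorphic corners."""
    return contract(full, vectors)

def unit(size, index):
    """Indicator vector with a 1 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def sfs_dot(vectors):
    """Rank-1 product against the SFS, subtracting the two monomorphic corners."""
    zeros = [unit(n_d + 1, 0) for n_d in n]
    tops = [unit(n_d + 1, n_d) for n_d in n]
    prod_zero, prod_top = 1.0, 1.0
    for vec, n_d in zip(vectors, n):
        prod_zero *= vec[0]
        prod_top *= vec[n_d]
    return dp(*vectors) - prod_zero * dp(*zeros) - prod_top * dp(*tops)

ones = [[1.0] * (n_d + 1) for n_d in n]
total_branch_length = sfs_dot(ones)
```

Here `total_branch_length` equals the sum of the tensor over all polymorphic configurations, obtained from three calls to `dp` rather than one call per entry.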

Beyond the applications we explore here, we expect this result to be useful in related settings. A number of population genetic statistics can be expressed as $f \odot A$, including Watterson’s estimator $\hat\theta_W$ of the mutation rate (Watterson, 1975), Fay and Wu’s H statistic for positive selection (Fay and Wu, 2000), and Patterson’s $f_2$, $f_3$, $f_4$ statistics for assessing topology (Patterson et al., 2012). Theorem 2 allows us to compute their expected values $\mathbb{E}[f\odot A]=\theta\,\phi\odot A$, and to construct test statistics from the deviance $f \odot A-\theta\,\phi\odot A$ under an appropriate null model.

An even wider class of population genetic statistics can be written as nonlinear functions of SFS-tensor products like $g\big(\tfrac{1}{\theta}f\odot A_1,\tfrac{1}{\theta}f\odot A_2,\ldots\big)$; this class includes Tajima’s D statistic for selection (Tajima, 1989), the $F_{ST}$ statistic for population structure (Holsinger and Weir, 2009), and Patterson’s D statistic for introgression (Patterson et al., 2012). These statistics may be viewed as plug-in estimators for $g(\phi\odot A_1, \phi\odot A_2,\ldots)$, which we can compute with Theorem 2. Note that these estimators are biased due to the nonlinear function $g$, but the bias can be estimated via block jackknife, and will typically be small since $\tfrac{1}{\theta}f\odot A\to\phi\odot A$ almost surely as the number of independent SNPs grows.

Another interesting linear statistic of the SFS that can be computed with Theorem 2 is $\mathbb{E}[T_{\mathrm{MRCA}}]$, the expected time of the most recent common ancestor. In particular, let $d$ be any leaf population; for simplicity assume $d$ is sampled at the present (i.e. $d$ is not archaic). Then

$$\mathbb{E}[T_{\mathrm{MRCA}}]=\sum_{z_1,\ldots,z_D}\phi_{z_1,\ldots,z_D}\,\frac{z_d}{n_d}=\phi\odot\Big(\mathbf{1}\otimes\cdots\otimes\big(\tfrac{z_d}{n_d}\big)_{z_d\in\{0,\ldots,n_d\}}\otimes\cdots\otimes\mathbf{1}\Big).$$

To see this, note that $\mathbb{E}[T_{\mathrm{MRCA}}]$ is proportional to the expected number of mutations hitting an arbitrary lineage in $d$, and if a mutation has configuration $z=(z_1,\ldots,z_D)$ derived copies, then the chance of hitting the lineage is $\tfrac{z_d}{n_d}$ (Zeng et al., 2006).

4. Application

We tested our method on a demographic inference problem in human genetics that is currently of interest. Lazaridis et al. (2014) showed that genetic variation in present-day Europeans suggests an admixture model involving three ancestral meta-populations: Ancient North Eurasian (ANE), Western Hunter Gatherers (WHG), and Early European Farmers (EEF). They also showed that EEF contains ancestry from a source that is an outgroup to all non-African populations, and yet shares much of the genetic drift common to non-African populations; they dubbed this ancestry component as “Basal Eurasian” ancestry. Later work (Lazaridis et al., 2016) showed that Basal Eurasian ancestry is shared by ancient and contemporary Middle Eastern populations, and is correlated with a decrease in Neanderthal ancestry, implying that Basal Eurasian ancestry contains lower levels of Neanderthal admixture when compared with non-Basal ancestry. The results from Lazaridis et al. (2014, 2016) were based on several related methods for modeling covariances in population allele frequencies, most notably qpGraph and qpAdm (Patterson et al., 2012; Haak et al., 2015). These methods are computationally efficient, and are robust to ascertainment bias and misspecification of the population size history; however, they are unable to infer the timing of demographic events.

We applied momi2 to estimate the strength and timing of basal Eurasian admixture into early European farmers, and the split time of the basal Eurasian lineage. To do this, we built a demographic model relating 12 samples from 8 populations, shown in Figure 3. These samples consisted of the Altai Neanderthal (Prüfer et al., 2014); the 45,000 year old Ust’Ishim man from Siberia (Fu et al., 2014); 3 present-day populations (Mbuti, Sardinian, Han) with 3, 2, and 2 samples respectively; and 3 ancient samples representing the European ancestry components identified by Lazaridis et al. (2014): a 7,500 year old sample from the Linearbandkeramik (LBK) culture (representing EEF), an 8,000 year old sample from the Loschbour rock shelter in Luxembourg (representing WHG), and the 24,000 year old Mal’ta boy (“MA1”) from Siberia (representing ANE). After data cleaning, our dataset consisted of $2.4 \times 10^6$ autosomal transversion SNPs. See Appendix A.4 for more details about the data.

Fig. 3.

Inferred model and bootstraps for the 11 population demography described in Section 4. The y-axis is linear below $5 \times 10^4$, then follows a logarithmic scale above $5 \times 10^4$. In the foreground (blue branches) is our point estimate from maximum composite likelihood; in the background (gray branches, with translucent pulse arrows) are 300 nonparametric bootstraps. Population sampling times are indicated by open circles; note demographic events may still involve populations below their sampling times, as in the case of Loschbour. The nonparametric bootstrap estimates were created by splitting the data into 100 equally sized contiguous blocks, resampling these blocks with replacement, and refitting the model.

To construct the topology of the model in Figure 3, we first obtained a tree by neighbor joining (Saitou and Nei, 1987), then added 3 extra admixture events reflecting prior knowledge, as well as a recent Neanderthal population decline starting at the Mbuti-Eurasian split. We inferred split times, population sizes (including the Neanderthal decline), and admixture times and proportions by maximizing a composite likelihood L, given by the product of the likelihoods at every SNP:

$$L=\prod_{s\in\mathrm{SNPs}}\mathbb{P}(z_s), \tag{11}$$

where $\mathbb{P}(z_s)\propto\phi_{z_s}$ was computed by momi2. The low coverage of the MA1 sample and the deep divergence of the Neanderthal sample may cause bias in SFS entries where only these samples contain derived alleles; we thus excluded these entries and corrected the normalizing constant appropriately (see Appendix A.4).
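The composite log-likelihood is a sum of per-SNP log probabilities, which can be grouped by unique configuration. A minimal sketch follows, assuming the expected SFS is available as a dictionary over exactly the configurations retained after any exclusions; the function name and data layout are hypothetical and do not reflect the momi2 interface.

```python
import math

def composite_log_likelihood(observed_counts, expected_sfs):
    """Sum over configurations z of count(z) * log(phi_z / ||phi||_1).

    observed_counts: dict mapping configuration -> number of SNPs observed
    expected_sfs:    dict mapping configuration -> expected SFS entry phi_z,
                     covering every configuration kept in the analysis
    """
    normalizer = sum(expected_sfs.values())   # normalizing constant ||phi||_1
    return sum(count * math.log(expected_sfs[z] / normalizer)
               for z, count in observed_counts.items())
```

Grouping SNPs by configuration means the cost scales with the number of unique observations rather than the number of SNPs.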

We used automatic differentiation to compute the gradient $\nabla \log L$, which we used to search for the optimum of $\log L$. We constructed nonparametric bootstrap confidence intervals by splitting the genome into 100 equally sized blocks, resampling these blocks to create 300 bootstrap datasets, and re-inferring the demography for each bootstrap dataset. We also used 300 parametric bootstraps to assess how well we could infer the demography under simulated data; for each parametric bootstrap dataset, we used msprime (Kelleher et al., 2016) to simulate ten 300 Mb chromosomes from our initial point estimate, and re-inferred the demography. The nonparametric bootstrap was used for all confidence intervals reported below.
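The nonparametric block bootstrap can be sketched as follows, assuming the SFS has already been tabulated separately for each contiguous genomic block. The function names and the per-block `Counter` representation are illustrative assumptions, not the authors' pipeline.

```python
import random
from collections import Counter

def resample_blocks(block_sfs, rng):
    """Draw as many blocks as there are in the data, with replacement,
    and add up their configuration counts."""
    resampled = Counter()
    for _ in range(len(block_sfs)):
        resampled.update(rng.choice(block_sfs))
    return resampled

def bootstrap_datasets(block_sfs, num_replicates, seed=0):
    """Generate `num_replicates` bootstrap SFS datasets from per-block counts."""
    rng = random.Random(seed)
    return [resample_blocks(block_sfs, rng) for _ in range(num_replicates)]
```

Each bootstrap dataset would then be refit with the same optimization used for the point estimate, and quantiles of the refit parameters give the confidence intervals.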

Our inferred demography, along with nonparametric bootstrap re-estimates, is shown in Figure 3 and Table 2. Our parametric bootstrap estimates are shown in Figure 4. We inferred a pulse of 0.094 (95% CI of 0.049–0.174) from the ghost Basal Eurasian population to EEF ancestry (LBK), substantially less than the 0.44 inferred by Lazaridis et al. (2014). This admixture was inferred to occur 33.7 kya (95% CI of 10.8–41.1 kya), shortly after the Loschbour-LBK split at 37.7 kya (95% CI of 32.2–42.3 kya). The split time of the ghost Basal Eurasian lineage from other Eurasians was inferred at 79.8 kya (95% CI of 67.4–101 kya). Other parameters were broadly in line with previous estimates, such as a Mbuti-Eurasian split of 96 kya, a Han-European split of 50 kya, a Neanderthal split of 696 kya, and Eurasians deriving 0.03 of their ancestry from Neanderthal (Terhorst et al., 2017; Green et al., 2010; Meyer et al., 2016).

Table 2.

Estimated parameters of the demography in Figure 3, along with nonparametric bootstrap estimates of the bias and standard deviation, and 95% bootstrap quantiles. We use (A, B) to denote the ancestor of A and B; tv and Nv to denote the height and size at vertex v, and tA→B and pA→B to denote respectively the time and strength of an admixture arrow from A to B.

| Parameter | Estimate | Bias | SD | 2.5% | 97.5% |
|---|---|---|---|---|---|
| N_Losch | 1.92e+03 | 14.1 | 185 | 1.57e+03 | 2.27e+03 |
| N_Mbu | 1.73e+04 | −255 | 1.86e+03 | 1.6e+04 | 1.92e+04 |
| N_(Mbu,Losch) | 2.91e+04 | −270 | 2.34e+03 | 2.7e+04 | 3.21e+04 |
| t_(Mbu,Losch) | 9.58e+04 | −1.48e+03 | 9.32e+03 | 9.19e+04 | 1.03e+05 |
| N_Han | 6.3e+03 | −17.1 | 400 | 5.71e+03 | 6.91e+03 |
| N_(Han,Losch) | 2.34e+03 | −70.2 | 372 | 2.13e+03 | 2.7e+03 |
| t_(Han,Losch) | 5.04e+04 | −171 | 2.75e+03 | 4.71e+04 | 5.43e+04 |
| t_(Ust,Losch) | 5.15e+04 | −79.4 | 2.47e+03 | 4.85e+04 | 5.47e+04 |
| N_(Nean,Losch) | 1.82e+04 | −216 | 1.59e+03 | 1.7e+04 | 1.99e+04 |
| t_(Nean,Losch) | 6.96e+05 | −7.6e+03 | 5.74e+04 | 6.49e+05 | 7.58e+05 |
| t_Nean→Eurasian | 5.68e+04 | −279 | 3.04e+03 | 5.38e+04 | 6.01e+04 |
| p_Nean→Eurasian | 0.0296 | −2.82e-05 | 0.00252 | 0.0251 | 0.0349 |
| N_Nean | 86.9 | −3.33 | 19.8 | 76.7 | 105 |
| t_(MA1,Losch) | 4.49e+04 | 98.2 | 2.36e+03 | 4.11e+04 | 4.87e+04 |
| N_LBK | 75.7 | −685 | 629 | 4.12 | 1.9e+03 |
| t_(LBK,Losch) | 3.77e+04 | 543 | 2.59e+03 | 3.22e+04 | 4.23e+04 |
| p_Basal→EEF | 0.0936 | −0.0122 | 0.0366 | 0.0485 | 0.174 |
| t_Basal→EEF | 3.37e+04 | 7.46e+03 | 1.01e+04 | 1.08e+04 | 4.11e+04 |
| t_(Basal,Losch) | 7.98e+04 | −706 | 1.37e+04 | 6.74e+04 | 1.01e+05 |
| N_Sard | 1.5e+04 | −1.28e+04 | 6.53e+04 | 8.58e+03 | 8.9e+04 |
| t_(Sard,LBK) | 7.69e+03 | −1.69e+03 | 1.64e+03 | 7.51e+03 | 1.24e+04 |
| N_(Sard,LBK) | 1.2e+04 | 1.32e+03 | 1.99e+03 | 7.33e+03 | 1.45e+04 |
| t_GhostWHG→Sard | 1.23e+03 | −2.93e+03 | 3.1e+03 | 597 | 1.06e+04 |
| p_GhostWHG→Sard | 0.0317 | −0.00223 | 0.0151 | 0.00631 | 0.0618 |
| t_(GhostWHG,Losch) | 1.56e+03 | −1.32e+04 | 1.07e+04 | 964 | 3.49e+04 |

Fig. 4.

Parametric bootstraps for the demography inferred in Figure 3. The inferred demography (blue tree) was used to simulate 300 bootstrap replicates. Re-running our method on each replicate produced a new inferred demography which is plotted in gray.

Inferring the optimal demography from start to finish took 2.5 hours on a laptop with 4 CPU cores, and used 2 GB RAM. The 300 bootstraps were run separately on a high-performance computing cluster. To our knowledge, no other method can infer this demographic model using the full SFS. The moments software package (Jouganous et al., 2017) is capable of computing the SFS for up to 5 populations, less than the 8 populations here, though it can scale to more individuals per population than momi2. While the fastsimcoal2 software package (Excoffier et al., 2013) is capable of handling demographies of this size and larger, it doesn’t compute the full, exact SFS, and also does not include an option for the ascertainment scheme we use here (excluding mutations private to Neanderthal and MA1).

We also used our model to estimate the human mutation rate as $1.22 \times 10^{-8}$ per base per generation, with a bootstrap-quantile 95% CI of $(1.12, 1.32) \times 10^{-8}$, closely matching previous estimates of the human mutation rate (Scally, 2016). To obtain this estimate, we compared the observed nucleotide diversity with that expected under the inferred demography, adjusting by the empirical transition to transversion ratio (Appendix A.4). Estimating the mutation rate was possible here because we did not use a prespecified mutation rate to estimate the model in Figure 3, instead using the known ages of the Ust’Ishim, LBK, Loschbour, and MA1 samples to calibrate dates.

Acknowledgments

This research is supported in part by an NIH grant R01-GM109454 (YSS); a Packard Fellowship for Science and Engineering (YSS); Wellcome Trust grants WT206194 and WT207492 (RD); and the Human Frontiers Science Program fellowship LT000402/2017 (JK). YSS is a Chan Zuckerberg Biohub investigator.

A. Appendix

A.1. Model and notation

In this section, we formally define the stochastic process underlying our model, and introduce some additional notation needed for the proofs in Section A.2.

The main stochastic process we use is the lookdown construction of the Moran model (Donnelly and Kurtz, 1996; Donnelly et al., 1999), a variant of the Moran model where copying only occurs in one “direction” (as in Figure 5). We will make use of certain conditional independence properties that result from this one-way copying. However, we note that there is a simple coupling between the lookdown and standard Moran models, and these two models generate data with the same distribution.

Fig. 5.

Standard and lookdown Moran models. (a) illustrates the standard Moran model, with copying in both directions; (b) illustrates the lookdown Moran model, with copying always from left to right. The sample genealogy (i.e., the lineages ancestral to present day samples) is in solid. The law of the genealogy is the coalescent under both models, and so the present-day samples of the two models are equal in distribution.

We now describe our lookdown model in more detail. Within each of the $D$ leaf populations we consider a countably infinite number of lineages, each with a unique label in $\mathbb{Z}_+$. We arbitrarily assign the $(n_1,\ldots,n_D)$ sampled lineages to the lowest labels $\{1,2,\ldots,n_{\mathrm{tot}}\}$, where $n_{\mathrm{tot}}\equiv\sum_{d=1}^{D}n_d$, and arbitrarily assign the remaining unsampled lineages to labels $\{n_{\mathrm{tot}}+1,n_{\mathrm{tot}}+2,\ldots\}$. Each lineage extends infinitely backwards in time, and at each admixture event, each lineage randomly chooses the parent population it extends into. In addition, each lineage has an allele, which changes through time due to mutation and copying events.

Similar to the usual Moran model, copying between a pair of lineages in population $v$ occurs at rate $\frac{1}{\eta_v(t)}$, where $\eta_v(t)$ is the scaled population size of $v$. However, copying only occurs in one direction, from lineages with lower labels to lineages with higher labels (Figure 5). So if two lineages are labeled $i$ and $j$ respectively, with $i < j$, then the copying event $i \to j$ occurs at rate $\frac{1}{\eta_v(t)}$, whereas the reverse copying event $j \to i$ never happens (has rate 0). By contrast, in the standard (non-lookdown) Moran model, both $i \to j$ and $j \to i$ copying events happen at rate $\frac{1}{2\eta_v(t)}$. However, under both models, two lineages going backwards in time coalesce at rate $\frac{1}{\eta_v(t)}$, as in the coalescent. Thus, the sample genealogies under both models follow the same multipopulation coalescent distribution.

Denote the labeled lineages in population v, and their alleles at height t, by

$$M_{v,t}=(M_{v,t,(1)},M_{v,t,(2)},\ldots)$$

where $M_{v,t,(i)}=(\mathrm{label}_{v,(i)},\mathrm{allele}_{v,t,(i)})\in\mathbb{Z}_+\times\{0,1\}$ is a pair, consisting of the $i$th lowest label at $v$, along with its corresponding allele at height $t\in[0,\tau_v)$. Note that we have ordered the lineages by their labels, so that $\mathrm{label}_{v,(i)}<\mathrm{label}_{v,(i+1)}$. Also note that we measure the time $t$ from the bottom of $v$, so that $\mathrm{allele}_{v,0,(i)}$ denotes the $i$th allele at the bottom of $v$, and $\mathrm{allele}_{v,\tau_v,(i)}$ the $i$th allele at the top of $v$.

Let $X_{v,m}(t)=\sum_{i=1}^{m}\mathbb{I}\{M_{v,t,(i)}\text{ derived}\}$ denote the number of derived alleles among the lowest $m$ lineages at $v$, $t$. Let $n_v=\sum_{d\preceq v}n_d$ be the number of samples in leaves $d$ below $v$. During computation, we will only need to keep track of the lowest $n_v$ lineages in $v$, so we denote $X_v(t)\equiv X_{v,n_v}(t)$ to be the number of derived alleles among the first $n_v$ lineages at $v$, $t$.

Finally, for an event $E$ with $K_E=(v_1,\ldots,v_k)$ the corresponding block of populations in Algorithm 1, let $t=(t_1,\ldots,t_k)$ be a corresponding set of times within each population $v_1,\ldots,v_k$. We define $M_{E,t}=(M_{v_1,t_1},\ldots,M_{v_k,t_k})$ as the labeled alleles at each of $(v_1,t_1),\ldots,(v_k,t_k)$.

A.2. Proofs

For the reader’s convenience, theorem statements are reproduced from the main text before each proof. The following two lemmas will be useful for several of the proofs below:

Lemma 6. The distribution of $M_{E,t}$ is invariant to finite permutations of the labels within any population $v\in K_E$. Furthermore, the labels are independent of the alleles.

Proof. By construction, none of the lineages within the populations in $K_E$ are ancestral to each other. Thus the sample genealogy of any finite subsample of the lineages is the multipopulation coalescent, because going backwards in time, coalescence (copying) between each pair of lineages occurs at rate $\frac{1}{\eta_w(s)}$ at vertex $w$ and time $s$. The invariance to permutation, and independence of alleles and labels, follows from the coalescent. □

Lemma 7. For event $E$ and population $u\in K_E$, let $I\subset\{n_u+1,n_u+2,\ldots\}$ be a (possibly random) collection of integers greater than $n_u$, and let $X_{u,I}(t_u)$ be the number of derived lineages in $\{M_{u,t_u,(i)}\}_{i\in I}$. Then $X_{\mathrm{Leaves}(E)}$ is conditionally independent of $X_{u,I}(t_u)$, the allele counts on $I$, given $X_{K_E}(t)$, the allele counts on the first $n_{v_1},\ldots,n_{v_k}$ lineages of $K_E=\{v_1,\ldots,v_k\}$.

Proof. Integrate over $M_{E,t,n_v}\equiv\{M_{v,t_v,(i)}\}_{v\in K_E,\,1\le i\le n_v}$, the first $(n_{v_1},\ldots,n_{v_k})$ labeled alleles within $K_E$, and $M_{u,t_u,I}=\{M_{u,t_u,(i)}\}_{i\in I}$, the labeled alleles at $I$:

$$\begin{aligned}\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(t),X_{u,I}(t_u)\big)&=\mathbb{E}\big[\mathbb{P}(X_{\mathrm{Leaves}(E)}\mid M_{E,t,n_v},M_{u,t_u,I})\mid X_{K_E}(t),X_{u,I}(t_u)\big]\\&=\mathbb{E}\big[\mathbb{P}(X_{\mathrm{Leaves}(E)}\mid M_{E,t,n_v})\mid X_{K_E}(t),X_{u,I}(t_u)\big]\\&=\mathbb{E}\big[\mathbb{P}(X_{\mathrm{Leaves}(E)}\mid M_{E,t,n_v})\mid X_{K_E}(t)\big]\\&=\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(t)\big),\end{aligned}$$

with the second equality because higher lineages cannot copy to lower lineages (so $X_{\mathrm{Leaves}(E)}\perp M_{u,t_u,I}\mid M_{E,t,n_v}$), and the third equality because of the exchangeability and independence from Lemma 6 (so given $X_{K_E}(t)$, the alleles of $M_{E,t,n_v}$ are ordered by a uniform permutation independent of $X_{u,I}(t_u)$, and the labels of $M_{E,t,n_v}$ are independent of $X_{K_E}(t)$, $X_{u,I}(t_u)$). □

A.2.1. Proof of Theorem 1

Theorem 1. If $(X_1,\ldots,X_D)\notin\{\mathbf{0},\mathbf{n}\}$, then

$$\phi_{X_1,\ldots,X_D}=\mathrm{DP}(e_{X_1},\ldots,e_{X_D}),$$

where $e_{X_i}=(0,\ldots,1,\ldots,0)\in\mathbb{R}^{n_i+1}$ denotes the vector with 1 at coordinate $X_i$ and 0 elsewhere.

Proof. Lemmas 2–4 give formulas for the partial likelihood $\ell^{E,0}_{x}$ using terms like $\ell^{E',t}_{x'}$, where $E'\in\mathrm{Children}(E)$ and $E$ is an admixture, 2-cluster split, or 1-cluster split, respectively. The terms $\ell^{E',t}_{x'}$ in turn can be computed from the partial likelihoods $\ell^{E',0}_{x'}$ using Lemma 1; then Lemma 5 provides a formula for $\phi^{E}$ in terms of $\ell^{E,0}$ and the partial SFS $\phi^{E'}$.

Algorithm 2 traverses the event tree $T$ in a depth-first search, applying Lemmas 1–5 to compute $\ell^{E,0}_{x}$, $\phi^{E}$ at each event $E$ from their values at the children of $E$. The inputs to Algorithm 2 are the likelihoods $\ell^{\{d\},0}$ at the leaves, and the output is the partial SFS $\phi^{\mathrm{Root}(T)}$ at the root.

Thus, since

$$\ell^{\{d\},0}=\big[\mathbb{P}(X_d\mid X_d=i)\big]_{0\le i\le n_d}=e_{X_d},$$

it follows that $\mathrm{DP}(e_{X_1},\ldots,e_{X_D})=\phi^{\mathrm{Root}(T)}=\phi_{X_1,\ldots,X_D}$. □

A.2.2. Proof of Lemma 5

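Lemma 5. For an event $E$, define the partial ordering $E' \prec E$ if $E'$ occurs beneath $E$ in the event tree, and let $K_E^{\mathrm{new}} = K_E \setminus \big(\bigcup_{E' \prec E} K_{E'}\big)$ be the populations newly formed at $E$ (i.e., formed by a population split or admixture at $E$). For $v \in K_E^{\mathrm{new}}$, let $f_v(k)$ be the truncated SFS in population $v$,

$$f_v(k)=\frac{1}{\theta}\,\mathbb{E}[\#\text{mutations at } v \text{ with } X_v=k],$$

which can be computed by the formulas in Kamm et al. (2017). Then the partial SFS at $E$ is given by

$$\phi^{E}=\sum_{v\in K_E^{\mathrm{new}}}\sum_{k=1}^{n_v}f_v(k)\,\ell^{E,0}_{ke_v}+\begin{cases}0, & \text{if } E \text{ is a leaf event},\\ \phi^{E'}, & \text{if } \mathrm{ChildEvents}(E)=\{E'\},\\ \phi^{E_1'}\prod_{d\in\mathrm{Leaves}(E_2')}\ell^{d}_{0}+\phi^{E_2'}\prod_{d\in\mathrm{Leaves}(E_1')}\ell^{d}_{0}, & \text{if } \mathrm{ChildEvents}(E)=\{E_1',E_2'\}.\end{cases} \tag{10}$$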

Proof. Without loss of generality assume $\theta = 1$. Then $\phi^{E}$ is the expected number of mutations at or below $K_E$ with observed counts $X_{\mathrm{Leaves}(E)}$. We split $\phi^{E}$ into two parts: the mutations within $K_E^{\mathrm{new}}=K_E\setminus\big(\bigcup_{E'\prec E}K_{E'}\big)$ (i.e. the populations formed by a split or join at $E$); and the mutations that occur in $\bigcup_{E'\prec E}K_{E'}$, the populations that arise strictly below $E$.

The first part, of mutations at the new populations of $E$, is given by

$$\sum_{v\in K_E^{\mathrm{new}}}\sum_{k=1}^{n_v}f_v(k)\,\ell^{E,0}_{ke_v},$$

since $f_v(k)$ is the expected number of mutations arising at $v$ with $X_v = k$, and $\ell^{E,0}_{ke_v}=\mathbb{P}(X_{\mathrm{Leaves}(E)}\mid X_v=k)$.

For the second part, of mutations strictly below $E$, we split into three cases: either $E$ is a leaf event, or $E$ has a single child event, or $E$ has two child events. In the first case, if $E$ is a leaf event, then no mutations can occur below $E$. In the second case, if $E$ has a single child event $E'$, then the expected number of mutations strictly below $E$ is simply $\phi^{E'}$ by definition.

Finally, if $E$ has two child events $E_1'$, $E_2'$, then $E_1'$ and $E_2'$ share no leaves, so a mutation underneath $E_1'$ will have no derived alleles in $\mathrm{Leaves}(E_2')$ and vice versa. Thus, the number of mutations strictly below $E$ is

$$\phi^{E_1'}\prod_{d\in\mathrm{Leaves}(E_2')}\ell^{d}_{0}+\phi^{E_2'}\prod_{d\in\mathrm{Leaves}(E_1')}\ell^{d}_{0}.$$

A.2.3. Proof of Lemma 1

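Lemma 1 (Lifting). Let $E$ be a split or admixture event with corresponding block $K_E$, $v \in K_E$ be a population within this block, and $x$, $t$ be vectors of allele counts and times for the populations in $K_E$. Then, writing $k = x_v$ and $x_{-v} = x - k e_v$, the conditional likelihood of $E$ is

$$\ell^{E,t}_{x}=\sum_{j=0}^{n_v}\left[e^{Q^{(n_v)}\int_0^{\tau_v}\frac{1}{\eta_v(t)}\,dt}\right]_{kj}\,\ell^{E,\,t-\tau_v e_v}_{x_{-v}+j e_v}, \tag{6}$$

where $Q^{(n)}\in\mathbb{R}^{(n+1)\times(n+1)}$ is the transition rate matrix of the Moran model with $n$ lineages; in particular, $Q^{(n)}=(q^{(n)}_{ij})_{0\le i,j\le n}$ and

$$q^{(n)}_{ij}=\begin{cases}-i(n-i), & \text{if } i=j,\\ \tfrac{1}{2}\,i(n-i), & \text{if } |j-i|=1,\\ 0, & \text{else.}\end{cases}$$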

Proof. Define a “quasi-lookdown” Moran model $M^*$, which is identical to $M$, except within the $n_v$ lowest lineages of $v$, where we allow copying in both directions at rate $\frac{1}{2\eta_v(t)}$ (i.e., the non-lookdown Moran model).

For an event $E$ with populations $K_E=(v_1,v_2,\ldots)$ and corresponding times $t=(t_1,t_2,\ldots)$, let $X_{K_E}(t)=(X_{v_1}(t_1),X_{v_2}(t_2),\ldots)$ be the corresponding allele counts. Next, define $X_{\preceq E,t}=\{X_{K_{E'}}(s)\}_{(E',s)\preceq(E,t)}$ as the sample path of allele counts below $(E,t)$, where $(E',s)\preceq(E,t)$ if either $E$ is above $E'$, or $E=E'$ and $s\le t$ component-wise. It will suffice to show $\mathbb{P}_{M}(X_{\preceq E,t})=\mathbb{P}_{M^*}(X_{\preceq E,t})$, because then for $t=\tau_ve_v+t_{-v}$,

$$\begin{aligned}\ell^{E,t}_{ke_v+x_{-v}}&=\sum_{j=0}^{n_v}\mathbb{P}_{M}\big(X_{K_E}(t_{-v})=je_v+x_{-v}\mid X_{K_E}(t)=ke_v+x_{-v}\big)\,\ell^{E,t_{-v}}_{je_v+x_{-v}}\\&=\sum_{j=0}^{n_v}\mathbb{P}_{M^*}\big(X_{K_E}(t_{-v})=je_v+x_{-v}\mid X_{K_E}(t)=ke_v+x_{-v}\big)\,\ell^{E,t_{-v}}_{je_v+x_{-v}}\\&=\sum_{j=0}^{n_v}\left[e^{Q^{(n_v)}\int_0^{\tau_v}\frac{1}{\eta_v(t)}\,dt}\right]_{kj}\ell^{E,t_{-v}}_{je_v+x_{-v}}\end{aligned}$$

as desired.

$\mathbb{P}_{M}(X_{\preceq E,t})=\mathbb{P}_{M^*}(X_{\preceq E,t})$ follows from a coupling argument. Let $M_{\preceq E,t}=\{M_{E',s}\}_{(E',s)\preceq(E,t)}$ be the sample path of $M$ below $(E,t)$. We can map the partial sample paths of $M^*_{v,t}$ onto those of $M_{v,t}$ as follows: moving from the bottom to the top of $v$, whenever a lower label is copied over by a higher label, swap the labels of the lineages above the copying. Then the relabeled sample path has the same distribution as the lookdown construction, since the allele with the higher label is always copied over, and the rate of copying between pairs of lineages is $\frac{1}{\eta_v(t)}$. Since this relabeling also leaves $X_v(t)$ unchanged, we have $\mathbb{P}_{M}(X_{\preceq E,t})=\mathbb{P}_{M^*}(X_{\preceq E,t})$. □

A.2.4. Proof of Lemma 2

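Lemma 2 (Admixture event). Let $E$ be an admixture event, with child population $w$ and parent populations $v_1$, $v_2$. Let $E'$ be the child event in $T$. Suppose each lineage in $w$ comes from $v_1$ with probability $q_1$, and from $v_2$ with probability $q_2 = 1 - q_1$. For $K_E$ the population cluster at $E$, let $x$ be a vector of allele counts on $K_E \setminus \{v_1, v_2\}$. Then the conditional likelihood of allele counts $x + x_1 e_{v_1} + x_2 e_{v_2}$ at $E$ is given by

$$\ell^{E,0}_{x+x_1e_{v_1}+x_2e_{v_2}}=\sum_{x_w=0}^{n_w}\ell^{E',\tau_w e_w}_{x+x_w e_w}\sum_{\substack{m_1,m_2:\\ m_1+m_2=n_w}}\binom{n_w}{m_1}q_1^{m_1}q_2^{m_2}\sum_{\substack{j_1,j_2:\\ j_1+j_2=x_w}}\frac{\binom{x_1}{j_1}\binom{n_{v_1}-x_1}{m_1-j_1}}{\binom{n_{v_1}}{m_1}}\,\frac{\binom{x_2}{j_2}\binom{n_{v_2}-x_2}{m_2-j_2}}{\binom{n_{v_2}}{m_2}}. \tag{7}$$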

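Proof. First note that $X_{\mathrm{Leaves}(E)} = X_{\mathrm{Leaves}(E')}$ and

$$\begin{aligned}\ell^{E,0}_{x+x_1e_{v_1}+x_2e_{v_2}}&=\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(0)=x+x_1e_{v_1}+x_2e_{v_2}\big)\\&=\sum_{x_w=0}^{n_w}\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(0)=x+x_1e_{v_1}+x_2e_{v_2},\,X_w(\tau_w)=x_w\big)\,\mathbb{P}\big(X_w(\tau_w)=x_w\mid X_{K_E}(0)=x+x_1e_{v_1}+x_2e_{v_2}\big)\\&=\sum_{x_w=0}^{n_w}\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(0)=x+x_1e_{v_1}+x_2e_{v_2},\,X_w(\tau_w)=x_w\big)\\&\qquad\times\sum_{\substack{m_1,m_2:\\ m_1+m_2=n_w}}\binom{n_w}{m_1}q_1^{m_1}q_2^{m_2}\sum_{\substack{j_1,j_2:\\ j_1+j_2=x_w}}\frac{\binom{x_1}{j_1}\binom{n_{v_1}-x_1}{m_1-j_1}}{\binom{n_{v_1}}{m_1}}\,\frac{\binom{x_2}{j_2}\binom{n_{v_2}-x_2}{m_2-j_2}}{\binom{n_{v_2}}{m_2}}\end{aligned}$$

by sampling $n_w$ alleles in $w$ from $n_{v_1}$, $n_{v_2}$ alleles in $v_1$, $v_2$, which are exchangeable by Lemma 6.

Next, consider the $n_{v_1}+n_{v_2}-n_w$ highest alleles of $v_1$, $v_2$, and let $I\subset\{n_w+1,n_w+2,\ldots\}$ be their relative positions within $w$. Then we conclude the proof by noting that $X_{\mathrm{Leaves}(E)} = X_{\mathrm{Leaves}(E')}$ and applying Lemma 7, to get

$$\begin{aligned}&\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(0)=x+x_1e_{v_1}+x_2e_{v_2},\,X_w(\tau_w)=x_w\big)\\&\quad=\mathbb{P}\big(X_{\mathrm{Leaves}(E')}\mid X_{v_1}=x_1,\,X_{v_2}=x_2,\,X_{K_{E'}}(\tau_we_w)=x+x_we_w\big)\\&\quad=\mathbb{P}\big(X_{\mathrm{Leaves}(E')}\mid X_{w,I}(\tau_w)=x_1+x_2-x_w,\,X_{K_{E'}}(\tau_we_w)=x+x_we_w\big)\\&\quad=\mathbb{P}\big(X_{\mathrm{Leaves}(E')}\mid X_{K_{E'}}(\tau_we_w)=x+x_we_w\big)\\&\quad=\ell^{E',\tau_we_w}_{x+x_we_w}.\end{aligned}$$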

A.2.5. Proof of Lemma 4

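Lemma 4 (Population split, 1 cluster). Let $E$ be a population split with exactly one child event $E'$. Denote the corresponding population clusters as $K_E$ and $K_{E'}$. Denote the parent population as $v$, the child populations as $w_1$, $w_2$. Let $x$, $y^x$ and $B$ be defined as in the preceding paragraph, and let $X_\cap$ denote the allele counts on $K_E \cap K_{E'}$:

$$y^x_i=\sum_{\substack{j,k:\\ j+k=i}}\frac{\binom{n_{w_1}}{j}\binom{n_{w_2}}{k}}{\binom{n_{w_1}+n_{w_2}}{i}}\;\ell^{E',\tau_{w_1}e_{w_1}+\tau_{w_2}e_{w_2}}_{je_{w_1}+ke_{w_2}+x}=\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{w_1}(\tau_{w_1})+X_{w_2}(\tau_{w_2})=i,\ X_\cap=x\big),$$

$$B_{i,j}=\frac{\binom{n_v}{j}\binom{n_{w_1}+n_{w_2}-n_v}{i-j}}{\binom{n_{w_1}+n_{w_2}}{i}}.$$

Then the conditional likelihood at $E$ is given by

$$\ell^{E,0}_{ke_v+x}=[B^{+}y^x]_k, \tag{9}$$

with $B^{+}$ denoting the Moore-Penrose pseudoinverse of $B$.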

Proof.

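Let $X_\cap=X_{K_E}(0)-X_v(0)e_v=X_{K_{E'}}(0)-X_{w_1}(0)e_{w_1}-X_{w_2}(0)e_{w_2}$ be the vector of allele counts on $K_E\cap K_{E'}$ at times 0. Then note that

$$\begin{aligned}y^x_i&=\sum_{j,k:\,j+k=i}\frac{\binom{n_{w_1}}{j}\binom{n_{w_2}}{k}}{\binom{n_{w_1}+n_{w_2}}{i}}\;\ell^{E',\tau_{w_1}e_{w_1}+\tau_{w_2}e_{w_2}}_{je_{w_1}+ke_{w_2}+x}\\&=\sum_{j,k:\,j+k=i}\mathbb{P}\big(X_{w_1}(\tau_{w_1})=j,\,X_{w_2}(\tau_{w_2})=k\mid X_{w_1}(\tau_{w_1})+X_{w_2}(\tau_{w_2})=i,\,X_\cap=x\big)\,\ell^{E',\tau_{w_1}e_{w_1}+\tau_{w_2}e_{w_2}}_{je_{w_1}+ke_{w_2}+x}\\&=\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{w_1}(\tau_{w_1})+X_{w_2}(\tau_{w_2})=i,\,X_\cap=x\big)\end{aligned}$$

with the second equality following from exchangeability (Lemma 6) and the third equality from $X_{\mathrm{Leaves}(E)} = X_{\mathrm{Leaves}(E')}$.

Next note that

$$\begin{aligned}&\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{w_1}(\tau_{w_1})+X_{w_2}(\tau_{w_2})=i,\,X_\cap=x\big)\\&\quad=\sum_{j=0}^{n_v}\mathbb{P}\big(X_{K_E}(0)=je_v+x\mid X_{w_1}(\tau_{w_1})+X_{w_2}(\tau_{w_2})=i,\,X_\cap=x\big)\\&\qquad\times\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(0)=je_v+x,\,X_{w_1}(\tau_{w_1})+X_{w_2}(\tau_{w_2})=i\big)\\&\quad=\sum_{j=0}^{n_v}\frac{\binom{n_v}{j}\binom{n_{w_1}+n_{w_2}-n_v}{i-j}}{\binom{n_{w_1}+n_{w_2}}{i}}\,\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(0)=je_v+x,\,X_{w_1}(\tau_{w_1})+X_{w_2}(\tau_{w_2})=i\big)\end{aligned}$$

with the second equality again due to exchangeability (Lemma 6).

Define $I\subset\{n_v+1,n_v+2,\ldots\}$ so that $\{1,\ldots,n_v\}\cup I$ are the indices in $v$ of the first $n_{w_1}$, $n_{w_2}$ alleles in $w_1$, $w_2$. Then because $X_{v,I}(0)=X_{w_1}(\tau_{w_1})+X_{w_2}(\tau_{w_2})-X_v(0)$ and Lemma 7,

$$\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(0),\,X_{w_1}(\tau_{w_1})+X_{w_2}(\tau_{w_2})\big)=\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(0),\,X_{v,I}(0)\big)=\mathbb{P}\big(X_{\mathrm{Leaves}(E)}\mid X_{K_E}(0)\big)$$

and thus

$$y^x_i=\sum_{j=0}^{n_v}\frac{\binom{n_v}{j}\binom{n_{w_1}+n_{w_2}-n_v}{i-j}}{\binom{n_{w_1}+n_{w_2}}{i}}\,\ell^{E,0}_{je_v+x}=\sum_{j=0}^{n_v}B_{ij}\,\ell^{E,0}_{je_v+x},$$

so letting $\ell=[\ell^{E,0}_{je_v+x}]_{0\le j\le n_v}\in\mathbb{R}^{n_v+1}$, we have $y^x=B\ell$ and therefore $\ell=B^{+}y^x$. □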

A.2.6. Proof of Lemma 3

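Lemma 3 (Population split, 2 clusters). Let $E$ be a split event with parent population $v$ and child populations $w_1$, $w_2$. Suppose $E$ has 2 child events $E_1'$, $E_2'$, with corresponding blocks $K_{E_1'}$, $K_{E_2'}$, where $w_1 \in K_{E_1'}$, $w_2 \in K_{E_2'}$. Let $x_{-1}$ be allele counts on $K_{E_1'} \setminus \{w_1\}$ and $x_{-2}$ be allele counts on $K_{E_2'} \setminus \{w_2\}$. Then the conditional likelihood at $E$ is

$$\ell^{E,0}_{x_{-1}+x_{-2}+x_v e_v}=\sum_{\substack{x_1,x_2:\\ x_1+x_2=x_v}}\frac{\binom{n_{w_1}}{x_1}\binom{n_{w_2}}{x_2}}{\binom{n_v}{x_v}}\;\ell^{E_1',\tau_{w_1}e_{w_1}}_{x_1 e_{w_1}+x_{-1}}\;\ell^{E_2',\tau_{w_2}e_{w_2}}_{x_2 e_{w_2}+x_{-2}}. \tag{8}$$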

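Proof. Notice that

$$\begin{aligned}\ell^{E,0}_{x_{-1}+x_{-2}+x_ve_v}&=\sum_{x_1,x_2:\,x_1+x_2=x_v}\mathbb{P}\big(X_{K_{E_1'}}(\tau_{w_1}e_{w_1})=x_{-1}+x_1e_{w_1},\,X_{K_{E_2'}}(\tau_{w_2}e_{w_2})=x_{-2}+x_2e_{w_2}\mid X_{K_E}(0)=x_{-1}+x_{-2}+x_ve_v\big)\\&\qquad\times\ell^{E_1',\tau_{w_1}e_{w_1}}_{x_1e_{w_1}+x_{-1}}\,\ell^{E_2',\tau_{w_2}e_{w_2}}_{x_2e_{w_2}+x_{-2}}\\&=\sum_{\substack{x_1,x_2:\\ x_1+x_2=x_v}}\frac{\binom{n_{w_1}}{x_1}\binom{n_{w_2}}{x_2}}{\binom{n_v}{x_v}}\;\ell^{E_1',\tau_{w_1}e_{w_1}}_{x_1e_{w_1}+x_{-1}}\,\ell^{E_2',\tau_{w_2}e_{w_2}}_{x_2e_{w_2}+x_{-2}}\end{aligned}$$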

with the first equality from the Markov property of the Moran process, and the second equality following from the exchangeability of the nv alleles at vertex v (Lemma 6). □

A.2.7. Proof of Theorem 2

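Theorem 2. For $\pi^{d}\in\mathbb{R}^{n_d+1}$, $d\in\{1,\ldots,D\}$, the tensor dot product of the SFS $\phi$ against $\pi^{1}\otimes\cdots\otimes\pi^{D}=[\pi^{1}_{z_1}\cdots\pi^{D}_{z_D}]_{z_1,\ldots,z_D}$ is

$$\phi\odot(\pi^{1}\otimes\cdots\otimes\pi^{D})=\sum_{z_1,\ldots,z_D}\phi_{z_1,\ldots,z_D}\,\pi^{1}_{z_1}\cdots\pi^{D}_{z_D}=\mathrm{DP}(\pi^{1},\ldots,\pi^{D})-\Big(\prod_{d=1}^{D}\pi^{d}_{0}\Big)\mathrm{DP}(e_0,\ldots,e_0)-\Big(\prod_{d=1}^{D}\pi^{d}_{n_d}\Big)\mathrm{DP}(e_{n_1},\ldots,e_{n_D}).$$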

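Proof. Below, we will prove $\mathrm{DP}(\ell^{1},\ldots,\ell^{D})$ is a multilinear function of the input vectors $\ell^{1},\ldots,\ell^{D}$. The result immediately follows from this, because then

$$\begin{aligned}\phi\odot(\pi^{1}\otimes\cdots\otimes\pi^{D})&=\sum_{z\notin\{\mathbf{0},\mathbf{n}\}}\phi_z\,\pi^{1}_{z_1}\cdots\pi^{D}_{z_D}\\&=\sum_{z\notin\{\mathbf{0},\mathbf{n}\}}\mathrm{DP}\big(\pi^{1}_{z_1}e_{z_1},\ldots,\pi^{D}_{z_D}e_{z_D}\big)\\&=\mathrm{DP}\Big(\sum_{z_1=0}^{n_1}\pi^{1}_{z_1}e_{z_1},\ldots,\sum_{z_D=0}^{n_D}\pi^{D}_{z_D}e_{z_D}\Big)-\mathrm{DP}\big(\pi^{1}_{0}e_0,\ldots,\pi^{D}_{0}e_0\big)-\mathrm{DP}\big(\pi^{1}_{n_1}e_{n_1},\ldots,\pi^{D}_{n_D}e_{n_D}\big)\\&=\mathrm{DP}(\pi^{1},\ldots,\pi^{D})-\mathrm{DP}\big(\pi^{1}_{0}e_0,\ldots,\pi^{D}_{0}e_0\big)-\mathrm{DP}\big(\pi^{1}_{n_1}e_{n_1},\ldots,\pi^{D}_{n_D}e_{n_D}\big).\end{aligned}$$

We now show $\mathrm{DP}(\ell^{1},\ldots,\ell^{D})$ is a multilinear function of $\ell^{1},\ldots,\ell^{D}$. We start by showing that if event $E$ has leaf populations $\mathrm{Leaves}(E)=(d_1,\ldots,d_L)$, then $\ell^{E,t}$ is a multilinear function of $\ell^{d_1},\ldots,\ell^{d_L}$. We show this by induction over $(E,t)$. The base case, where $t = 0$ and $E$ is a leaf with $K_E = (d_i)$, is trivially true because $\ell^{E,0}=\ell^{d_i}$.

For the next case, we note that (6), (7), and (9) express $\ell^{E,t}$ as a tensor product of $\ell^{E',t'}$ with a matrix, where $(E',t')\prec(E,t)$ and $\mathrm{Leaves}(E)=\mathrm{Leaves}(E')$. $\ell^{E',t'}$ is a multilinear function of $\ell^{d_1},\ldots,\ell^{d_L}$ by induction, hence $\ell^{E,t}$ is also.

Similarly, (8) expresses $\ell^{E,0}$ as a product of $\ell^{E_1',t_1}$ and $\ell^{E_2',t_2}$, where $\ell^{E_i',t_i}$ is a multilinear function of $\{\ell^{k}: k\in\mathrm{Leaves}(E_i')\}$ by induction, and furthermore $\mathrm{Leaves}(E_1')\cap\mathrm{Leaves}(E_2')=\emptyset$ and $\mathrm{Leaves}(E_1')\cup\mathrm{Leaves}(E_2')=\mathrm{Leaves}(E)$. Thus $\ell^{E,0}$ is a multilinear function of $\ell^{d_1},\ldots,\ell^{d_L}$.

Thus, each of the operations (6), (7), (8), (9) in Algorithm 2 preserves multilinearity of $\ell^{E,t}$, and thus each $\ell^{E,t}$ is a multilinear function of its leaf vectors by induction. Finally, a similar induction argument shows that each $\phi^{E}$ is a multilinear function of $\{\ell^{k}: k\in\mathrm{Leaves}(E)\}$, since (10) expresses $\phi^{E}$ as a sum of multilinear functions by induction. □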

A.3. Computational complexity

Computing the SFS $\phi_z$ with Algorithm 2 involves keeping track of the partial likelihoods $\ell^{E,0}_{x,z}$ at each event $E$. For a dataset with $s$ unique observations $z$, the array of partial likelihoods for every $z$ and every hidden state $x$ at event $E$ is $\ell^{E,0}=(\ell^{E,0}_{x,z})_{x,z}$, a tensor with $s\prod_{v\in K_E}(n_v+1)=O\big(s\prod_{v\in K_E}n_v\big)=O\big(sn^{|K_E|}\big)$ total entries (where $n=\sum_{d=1}^{D}n_d$).

We now consider the time cost of each operation in Algorithm 2. We start by considering the “universal constants” that depend only on $n_v$, and not on the parameters of the demography or the number of SNPs. Next, we consider terms that need to be computed once per demography, but don’t depend on the number of SNPs. Finally, we consider those terms that need to be recomputed once for each SNP and each demography.

Proposition 1. For a population graph $G$ with corresponding event tree $T$, let $|V|$ be the number of vertices (populations) in the graph, and let $M_{\text{admix}}$ be the number of admixture events. Then, for a dataset with $n$ samples and $s$ unique observations, computing the likelihoods of $d$ different demographies with the same topology ($G$ and $T$) but different parameters (sizes, pulse strengths, branch lengths) has a computational complexity of

$$O\Big(|V| n^3 + M_{\text{admix}} (n^5 + d n^4) + d |V| n^2 + d s \sum_{E \in T} n^{|K_E|+1}\Big).$$

Proof. In Lemma 1, we compute the matrix exponential of (6) as $e^{Q^{(n_v)} \int_0^{\tau_v} \frac{1}{\eta_v(t)}\,dt} = U e^{\Sigma \int_0^{\tau_v} \frac{1}{\eta_v(t)}\,dt} U^{-1}$, where the columns of $U$ are the eigenvectors in the decomposition $Q^{(n_v)} = U \Sigma U^{-1}$; this eigenvalue decomposition costs $O(n_v^3)$. Similarly, in Lemma 4 we compute the pseudoinverse of (9) as $B^{+} = V \Sigma^{-1} U^{T}$ for the SVD $B = U \Sigma V^{T}$; this SVD also costs $O(n_v^3)$. Both the eigenvalue decomposition and the SVD are universal for all demographic parameters, and computed at most once per population $v$, which contributes an overall complexity of $O(|V| n^3)$.

The innermost sum of (7) in Lemma 2 is also universal, and costs $O(n_w^5)$ to compute for all possible values of $x_1, x_2, x_w, m_1$, leading to the term $O(M_{\text{admix}} n^5)$. However, the middle sum of this equation is not universal, since it depends on $q_1, q_2$. While not universal, this middle sum only needs to be computed once per demography, and does not depend on the number of SNPs; in particular, computing it for all values of $x_1, x_2, x_w$ costs $O(n_w^4)$ per demography, contributing an overall complexity of $O(M_{\text{admix}}\, d\, n^4)$. Similarly, the term $f^{v}_{n_v}(k)$ in (10) of Lemma 5 is computed once per demography, and costs $O(n_v^2)$ (Kamm et al., 2017), contributing a complexity of $O(d |V| n^2)$.

Finally, we consider the terms that need to be recomputed for every SNP and every demography. Equations (6), (7), (8) of Lemmas 1, 2, 3, respectively, each cost $O(n^{|K_E|+1})$ per SNP, since they express $\ell^{E,0}_{\mathbf{x},\mathbf{z}}$ as a sum of $O(n)$ terms. Similarly, equation (9) of Lemma 4 costs $O(n^{|K_E|+1})$, since we compute $\ell^{E,0}_{\mathbf{x},\mathbf{z}}$ as a product of an $O(n) \times O(n)$ matrix $B^{+}$ with a tensor $\mathbf{y}_{\mathbf{x}}$ containing $O(n^{|K_E|-1})$ entries; $\mathbf{y}_{\mathbf{x}}$ is itself computed by summing over all $O(n^{|K_E|})$ entries of $\ell^{E,\,\tau_{w_1} \mathbf{e}_{w_1} + \tau_{w_2} \mathbf{e}_{w_2}}_{\mathbf{x},\mathbf{z}}$. Altogether, this contributes a complexity of $O\big(d s \sum_{E \in T} n^{|K_E|+1}\big)$. □
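To see which term of Proposition 1 dominates for a given problem size, the bound can be evaluated term by term. The following is only an order-of-magnitude sketch: constants hidden by the $O(\cdot)$ notation are ignored, the function and argument names are hypothetical, and `event_sizes` is assumed to list $|K_E|$ for each event $E$ in the event tree.

```python
def complexity_terms(num_populations, n, num_admixture, num_demographies,
                     num_observations, event_sizes):
    """Evaluate each term of the Proposition 1 bound, ignoring constants.

    event_sizes holds |K_E| for every event E in the event tree T.
    """
    V, M, d, s = num_populations, num_admixture, num_demographies, num_observations
    return {
        # Eigendecompositions and SVDs: universal, once per population.
        "decompositions": V * n**3,
        # Innermost admixture sum: universal.
        "admixture_universal": M * n**5,
        # Middle admixture sum: once per demography.
        "admixture_per_demography": M * d * n**4,
        # Truncated SFS terms: once per demography.
        "truncated_sfs": d * V * n**2,
        # Partial likelihood updates: per SNP and per demography.
        "per_snp": d * s * sum(n ** (k + 1) for k in event_sizes),
    }


# Hypothetical problem size, chosen only for illustration.
terms = complexity_terms(
    num_populations=10, n=20, num_admixture=2,
    num_demographies=100, num_observations=10_000,
    event_sizes=[1, 1, 2, 2, 3],
)
dominant = max(terms, key=terms.get)
print(dominant, terms[dominant])
```

For large numbers of unique observations the per-SNP term typically dominates, which is why the size $|K_E|$ of the largest event matters most in practice.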

A.4. Application supplement

A.4.1. Data

Table 3 gives the populations and samples we used for our example application in Section 4. All samples except for MA1 were taken from the SGDP dataset (Mallick et al., 2016). We added on the additional low-coverage MA1 sample (Raghavan et al., 2014) to represent the Ancient North Eurasian (ANE) component of European ancestry. Due to the low coverage of MA1, we only sampled a single random allele from it at each site, using a GATK pileup (McKenna et al., 2010) and restricting ourselves to reads with quality ≥30.
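The single-allele sampling used for MA1 can be sketched as follows. This is an illustrative stand-in, not the GATK pileup workflow itself: the input format (a list of `(base, quality)` pairs per site) and the function name are assumptions, and the text does not specify whether the quality threshold refers to base or mapping quality.

```python
import random


def sample_random_allele(reads, min_quality=30, rng=random):
    """Pick one allele uniformly at random among reads passing the filter.

    reads is a list of (base, quality) pairs covering one site (assumed format).
    Returns None when no read passes, so the site is treated as missing.
    """
    passing = [base for base, quality in reads if quality >= min_quality]
    if not passing:
        return None
    return rng.choice(passing)


# Hypothetical pileup at one site: two reads pass the quality filter.
site_reads = [("A", 12), ("C", 35), ("C", 40)]
print(sample_random_allele(site_reads, rng=random.Random(0)))
```

Sampling one read rather than calling a diploid genotype avoids biasing low-coverage samples toward homozygous calls, at the cost of contributing a single allele per site.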

Table 3.

Populations and samples used for the example application in Section 4. We only used 1 random allele for MA1 due to low coverage. The ages of the samples are given in thousands of years ago (kya).

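| Population | Individuals | Alleles | Age (kya) |
|---|---|---|---|
| Mbuti | 3 | 6 | 0 |
| Han | 2 | 4 | 0 |
| Sardinian | 2 | 4 | 0 |
| Loschbour | 1 | 2 | 7.5 |
| LBK (Stuttgart) | 1 | 2 | 8 |
| MA1 (Mal’ta) | 1 | 1 | 24 |
| Ust’Ishim | 1 | 2 | 45 |
| Altai Neanderthal | 1 | 2 | 50 |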

We ascertained SNPs at SGDP filter level 1, then used the genotype calls at filter level 0 at the ascertained SNPs. When ascertaining SNPs, we limited ourselves to sites that were polymorphic among the samples excluding MA1 and Neanderthal. We excluded MA1 during ascertainment due to its low coverage. We also excluded the Neanderthal sample during ascertainment because it had substantially fewer new mutations than expected based on its age; it was unclear whether this was due to changes in the mutation rate on that lineage since its deep split with modern humans, or whether this was an artifact of the SNP calling strategy used by SGDP. We used Theorem 2 to correct the normalizing constant of the SFS due to excluding MA1 and Neanderthal during ascertainment; in particular, we normalized the SFS by the total branch length of the subtree excluding MA1 and Neanderthal.

To avoid biases in ancient DNA caused by deamination (Dabney et al., 2013), we removed all transitions (i.e., A ↔ G and C ↔ T mutations), keeping only the transversions. We used Chimp as a proxy for the ancestral allele, removing all sites where the Chimp allele was missing. After data cleaning, we were left with 2,444,888 autosomal transversion SNPs that were segregating among the samples excluding MA1 and Neanderthal.
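The site filter described above can be written as a small predicate. This is a minimal sketch with hypothetical function names and an assumed representation of each site as single-letter alleles, with `None` marking a missing Chimp allele.

```python
BASES = frozenset("ACGT")
# Transitions are purine-purine (A <-> G) or pyrimidine-pyrimidine (C <-> T) changes.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transversion(allele_a, allele_b):
    """Return True if the two alleles differ by a transversion."""
    a, b = allele_a.upper(), allele_b.upper()
    if a == b or a not in BASES or b not in BASES:
        return False
    return frozenset((a, b)) not in TRANSITIONS


def keep_site(chimp_allele, allele_a, allele_b):
    """Keep a biallelic SNP only if the Chimp allele is known and it is a transversion."""
    if chimp_allele is None or chimp_allele.upper() not in BASES:
        return False
    return is_transversion(allele_a, allele_b)


print(keep_site("A", "A", "C"), keep_site("A", "A", "G"), keep_site(None, "A", "C"))
```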

A.4.2. Model fitting procedure

We fit the model in Figure 3 in an iterative fashion, adding populations one at a time and re-estimating the parameters. We started with a tree including the 4 populations Mbuti, Loschbour, Han, and Ust’Ishim, with no admixture events. This initial model had 8 parameters: the population sizes at the Han, Mbuti, and Loschbour leaves, the population sizes at the Eurasian and human ancestors (the Ust’Ishim leaf was set to the ancestral Eurasian population size), and the times that the Han, Ust’Ishim, and Mbuti populations diverged from Loschbour.

We next added in the Neanderthal population, with an admixture event from Neanderthal to the Eurasian ancestor. We added parameters for the Neanderthal-human split time, the Neanderthal-Eurasian introgression time and strength, the ancestral Neanderthal-human population size, and a Neanderthal population exponential decline rate starting at the Eurasian-Mbuti split, following results from Prüfer et al. (2014).

We followed by adding on the MA1 sample, adding a single parameter for its split time (we fixed its population size to be the ancestral Eurasian population size). We then added the LBK early farmer, adding 4 parameters for its divergence time, Basal Eurasian admixture time and strength, and the split time of the Basal Eurasian lineage. Finally, we added in a Sardinian population, along with parameters for its population size, its split time, and admixture times and strength from the Loschbour WHG.

At each step, we re-estimated all parameters using the L-BFGS-B optimization algorithm, maximizing the multinomial composite likelihood (11). For the final few models (adding in LBK and then Sardinian) we initialized each estimation with a stochastic gradient descent before finishing with L-BFGS-B.
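The objective maximized at each step can be sketched as follows. Equation (11) is not reproduced in this appendix, so the form below is an assumption: the standard multinomial log-likelihood $\sum_{\mathbf{x}} f_{\mathbf{x}} \log\big(\phi_{\mathbf{x}} / \sum_{\mathbf{y}} \phi_{\mathbf{y}}\big)$, where $f_{\mathbf{x}}$ is the number of SNPs with configuration $\mathbf{x}$. The function and argument names are hypothetical and are not the momi2 API.

```python
from math import log


def composite_log_likelihood(observed_counts, expected_sfs):
    """Multinomial composite log-likelihood of observed SNP configurations.

    observed_counts maps configuration -> number of SNPs (f_x).
    expected_sfs maps configuration -> expected SFS entry (phi_x), up to scale;
    it must contain every configuration that appears in observed_counts.
    """
    normalizer = sum(expected_sfs.values())
    return sum(
        count * log(expected_sfs[config] / normalizer)
        for config, count in observed_counts.items()
    )


# Hypothetical two-population example with two possible configurations.
observed = {(1, 0): 3, (0, 1): 1}
expected = {(1, 0): 2.0, (0, 1): 2.0}
print(composite_log_likelihood(observed, expected))
```

Because the expected SFS is normalized, the value does not change when every $\phi_{\mathbf{x}}$ is rescaled, so the mutation rate drops out of this objective. In an actual fit, the negative of this quantity would be passed to an L-BFGS-B optimizer as a function of the demographic parameters.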

A.4.3. Mutation rate estimation

We used the within-population nucleotide diversity to estimate the mutation rate. The nucleotide diversity is the number of sites where 2 random alleles (drawn without replacement) differ. The empirical value of the nucleotide diversity in population i is

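$$\hat{\pi}_i = \sum_{s \in \text{SNPs}} \frac{2 x_i^{(s)} (n_i - x_i^{(s)})}{n_i (n_i - 1)} = \sum_{\mathbf{x}} \frac{2 x_i (n_i - x_i)}{n_i (n_i - 1)} f_{\mathbf{x}},$$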

while the expected value of the nucleotide diversity per unit mutation rate is

$$\pi_i = \frac{1}{\theta} \mathbb{E}[\hat{\pi}_i] = \sum_{\mathbf{x}} \frac{2 x_i (n_i - x_i)}{n_i (n_i - 1)} \phi_{\mathbf{x}}.$$

This yields a mutation rate estimate $\hat{\theta}_i$ for each population, given by $\hat{\theta}_i = \hat{\pi}_i / \pi_i$. We then averaged over $\hat{\theta}_i$ to obtain our combined estimate $\hat{\theta} = \frac{1}{D} \sum_{i=1}^{D} \hat{\theta}_i$. To obtain the per-site mutation rate we need to divide by the number of bases $L$ after our data cleaning process. We approximated this as follows:

  1. At SGDP filter level 1, there are about $2.13 \times 10^{9}$ sites per individual.

  2. Multiply this by 0.93 due to excluding the sex chromosomes.

  3. Multiply this by 0.32 due to only using transversions.

  4. Multiply by 0.93 to account for excluding sites missing the Chimp allele.

  5. Finally, multiply by 1.1 to account for using genotype calls at filter level 0. The inflation in the number of sites here is because we are using a stricter filter to ascertain SNPs (SGDP filter level 1) than we are to call genotypes at the SNPs (SGDP filter level 0). In particular, we need to account for SNPs where the sample has a heterozygous genotype at filter level 0, but has a missing genotype at filter level 1; these genotypes are counted as heterozygotes in the SFS, and we need to adjust the number of bases L to account for this.

The latter 2 factors (accounting for sites missing the Chimp allele and for adding in genotype calls at filter level 0) we estimated by observing how the number of heterozygotes per individual changed after these data cleaning steps.
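The arithmetic of this estimate can be sketched as follows. The function names are hypothetical; the expected diversity per unit mutation rate $\pi_i$ is assumed to be supplied by the fitted model, and the default site count and correction factors are the approximate values listed in steps 1 to 5 above.

```python
def nucleotide_diversity(derived_counts, n):
    """Empirical pi-hat: sum over SNPs of 2x(n-x) / (n(n-1)).

    derived_counts holds the derived allele count x at each SNP in a
    population with n sampled alleles.
    """
    if n < 2:
        raise ValueError("need at least 2 alleles to compare")
    return sum(2 * x * (n - x) / (n * (n - 1)) for x in derived_counts)


def effective_sites(raw_sites=2.13e9, factors=(0.93, 0.32, 0.93, 1.1)):
    """Approximate number of bases L after data cleaning (steps 1 to 5)."""
    total = raw_sites
    for factor in factors:
        total *= factor
    return total


def per_site_theta(pi_hat, pi_per_unit_theta, num_sites):
    """Average the per-population estimates pi-hat_i / pi_i, then divide by L."""
    thetas = [obs / exp for obs, exp in zip(pi_hat, pi_per_unit_theta)]
    theta_hat = sum(thetas) / len(thetas)
    return theta_hat / num_sites


print(f"L is roughly {effective_sites():.3e} sites")
# Toy population with n = 4 alleles and three SNPs (hypothetical data).
print(nucleotide_diversity([1, 2, 3], n=4))
```

With the factors above, $L$ comes to roughly $6.5 \times 10^{8}$ sites.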

Footnotes

1

A reviewer noted that Jouganous et al. (2017) also employ a Moran-based model with discrete and continuous admixture; however, as noted above, their method of solution is completely different from ours.

2

We omit certain graph preprocessing steps (moralization and triangulation) from the general junction tree algorithm which are not necessary in our setting.

3

The calls to $\mathrm{DP}(\mathbf{e}_0,\dots,\mathbf{e}_0)$ and $\mathrm{DP}(\mathbf{e}_{n_1},\dots,\mathbf{e}_{n_D})$ only need to be computed once for all statistics.

References

  1. Baharian S and Gravel S (2018). On the decidability of population size histories from finite allele frequency spectra. Theoretical Population Biology 120, 42–51.
  2. Beaumont MA and Nichols RA (1996). Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society of London. Series B: Biological Sciences 263(1377), 1619–1626.
  3. Bhaskar A and Song YS (2014). Descartes’ rule of signs and the identifiability of population demographic models from genomic variation data. Annals of Statistics 42(6), 2469–2493.
  4. Bhaskar A, Wang YXR, and Song YS (2015). Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Research 25(2), 268–279.
  5. Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR, et al. (2008). Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genetics 4(5), e1000083.
  6. Bryant D, Bouckaert R, Felsenstein J, Rosenberg NA, and RoyChoudhury A (2012). Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution 29(8), 1917–1932.
  7. Chen H (2012). The joint allele frequency spectrum of multiple populations: A coalescent theory approach. Theoretical Population Biology 81(2), 179–195.
  8. Corliss G, Faure C, Griewank A, Hascoet L, and Naumann U (2002). Automatic Differentiation of Algorithms: From Simulation to Optimization, Volume 1. New York: Springer Science & Business Media.
  9. Coventry A, Bull-Otterson LM, Liu X, Clark AG, Maxwell TJ, Crosby J, Hixson JE, Rea TJ, Muzny DM, Lewis LR, et al. (2010). Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nature Communications 1, 131.
  10. Dabney J, Meyer M, and Pääbo S (2013). Ancient DNA damage. Cold Spring Harbor Perspectives in Biology 5(7), a012567.
  11. De Iorio M and Griffiths RC (2004). Importance sampling on coalescent histories. II: Subdivided population models. Adv. Appl. Prob. 36, 434–454.
  12. De Maio N, Schrempf D, and Kosiol C (2015). PoMo: An allele frequency-based approach for species tree estimation. Systematic Biology 64(6), 1018–1031.
  13. Donnelly P and Kurtz T (1996). A countable representation of the Fleming-Viot measure-valued diffusion. Ann. Probab. 24, 698–742.
  14. Donnelly P, Kurtz TG, et al. (1999). Particle representations for measure-valued population models. The Annals of Probability 27(1), 166–205.
  15. Durrett R (2008). Probability Models for DNA Sequence Evolution (2nd ed.). Springer, New York.
  16. Ewens WJ (2004). Mathematical Population Genetics: I. Theoretical Introduction. New York: Springer Science+Business Media, Inc.
  17. Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, and Foll M (2013). Robust demographic inference from genomic and SNP data. PLoS Genetics 9(10), e1003905.
  18. Excoffier L and Foll M (2011). Fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27(9), 1332–1334.
  19. Fay JC and Wu CI (2000). Hitchhiking under positive Darwinian selection. Genetics 155, 1405–1413.
  20. Felsenstein J (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17, 368–376.
  21. Fu Q, Li H, Moorjani P, Jay F, Slepchenko SM, Bondarev AA, Johnson PL, Aximu-Petri A, Prüfer K, de Filippo C, et al. (2014). Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514(7523), 445.
  22. Gazave E, Ma L, Chang D, Coventry A, Gao F, Muzny D, Boerwinkle E, Gibbs RA, Sing CF, Clark AG, et al. (2014). Neutral genomic regions refine models of recent rapid human population growth. Proceedings of the National Academy of Sciences 111(2), 757–762.
  23. Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, Yu F, Gibbs RA, Bustamante CD, Altshuler DL, et al. (2011). Demographic history and rare allele sharing among human populations. Proceedings of the National Academy of Sciences 108(29), 11983–11988.
  24. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. (2010). A draft sequence of the Neandertal genome. Science 328(5979), 710–722.
  25. Griffiths R and Tavaré S (1998). The age of a mutation in a general coalescent tree. Communications in Statistics. Stochastic Models 14(1–2), 273–295.
  26. Griffiths RC and Tavaré S (1997). Computational methods for the coalescent. In Donnelly P and Tavaré S (Eds.), Progress in Population Genetics and Human Evolution, Volume 87, pp. 165–182. Springer-Verlag, Berlin.
  27. Gutenkunst RN, Hernandez RD, Williamson SH, and Bustamante CD (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics 5(10), e1000695.
  28. Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, Llamas B, Brandt G, Nordenfelt S, Harney E, Stewardson K, et al. (2015). Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211.
  29. Holsinger KE and Weir BS (2009). Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics 10(9), 639–650.
  30. Jenkins PA, Mueller JW, and Song YS (2014). General triallelic frequency spectrum under demographic models with variable population size. Genetics 196(1), 295–311.
  31. Jouganous J, Long W, Ragsdale AP, and Gravel S (2017). Inferring the joint demographic history of multiple populations: Beyond the diffusion approximation. Genetics 206(3), 1549–1567.
  32. Kamm JA, Terhorst J, and Song YS (2017). Efficient computation of the joint sample frequency spectra for multiple populations. Journal of Computational and Graphical Statistics 26(1), 182–194.
  33. Kelleher J, Etheridge AM, and McVean G (2016). Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS Computational Biology 12(5), e1004842.
  34. Kimura M (1969). The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61(4), 893–903.
  35. Kingman JFC (1982). The coalescent. Stoch. Process. Appl. 13, 235–248.
  36. Koller D and Friedman N (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.
  37. Lauritzen SL and Spiegelhalter DJ (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society. Series B (Methodological) 50(2), 157–224.
  38. Lazaridis I, Nadel D, Rollefson G, Merrett DC, Rohland N, Mallick S, Fernandes D, Novak M, Gamarra B, Sirak K, et al. (2016). Genomic insights into the origin of farming in the ancient Near East. Nature 536(7617), 419–424.
  39. Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, Sudmant PH, Schraiber JG, Castellano S, Lipson M, et al. (2014). Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513(7518), 409–413.
  40. Lukić S and Hey J (2012). Demographic inference using spectral methods on SNP data, with an analysis of the human out-of-Africa expansion. Genetics 192(2), 619–639.
  41. Maclaurin D, Duvenaud D, and Adams RP (2015). Autograd: Effortless gradients in numpy. In ICML 2015 AutoML Workshop.
  42. Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. (2016). The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624), 201–206.
  43. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20(9), 1297–1303.
  44. Meyer M, Arsuaga J-L, de Filippo C, Nagel S, Aximu-Petri A, Nickel B, Martinez I, Gracia A, de Castro JMB, Carbonell E, et al. (2016). Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins. Nature 531(7595), 504–507.
  45. Myers S, Fefferman C, and Patterson N (2008). Can one learn history from the allelic spectrum? Theor. Popul. Biol. 73(3), 342–348.
  46. Nelson MR, Wegmann D, Ehm MG, Kessner D, Jean PS, Verzilli C, Shen J, Tang Z, Bacanu S-A, Fraser D, et al. (2012). An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337(6090), 100–104.
  47. Nielsen R (2000). Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154(2), 931–942.
  48. Notohara M (1990). The coalescent and the genealogical process in geographically structured population. Journal of Mathematical Biology 29(1), 59–75.
  49. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, and Reich D (2012). Ancient admixture in human history. Genetics 192(3), 1065–1093.
  50. Pearl J (1982). Reverend Bayes on inference engines: a distributed hierarchical approach. In Proceedings of the National Conference on Artificial Intelligence, pp. 133–136.
  51. Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, De Filippo C, et al. (2014). The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505(7481), 43–49.
  52. Raghavan M, Skoglund P, Graf KE, Metspalu M, Albrechtsen A, Moltke I, Rasmussen S, Stafford TW Jr, Orlando L, Metspalu E, et al. (2014). Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505(7481), 87–91.
  53. Saitou N and Nei M (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4(4), 406–425.
  54. Sawyer SA and Hartl DL (1992). Population genetics of polymorphism and divergence. Genetics 132(4), 1161–1176.
  55. Scally A (2016). The mutation rate in human evolution and demographic inference. Current Opinion in Genetics & Development 41, 36–43.
  56. Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, and Altshuler D (2005). Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 15, 1576–1583.
  57. Stephens M and Donnelly P (2000). Inference in molecular population genetics. J. Royal Stat. Soc. B 62, 605–655.
  58. Tajima F (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123(3), 585–595.
  59. Takahata N (1988). The coalescent in two partially isolated diffusion populations. Genetics Research 52(3), 213–222.
  60. Terhorst J, Kamm JA, and Song YS (2017). Robust and scalable inference of population history from hundreds of unphased whole genomes. Nature Genetics 49(2), 303–309.
  61. Terhorst J and Song YS (2015). Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum. Proceedings of the National Academy of Sciences 112(25), 7677–7682.
  62. Wakeley J and Hey J (1997). Estimating ancestral population parameters. Genetics 145(3), 847–855.
  63. Watterson G (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology 7(2), 256–276.
  64. Wegmann D, Leuenberger C, Neuenschwander S, and Excoffier L (2010). ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics 11(1), 116.
  65. Zeng K, Fu Y-X, Shi S, and Wu C-I (2006). Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174(3), 1431–1439.
