Efficiently inferring the demographic history of many populations with allele count data

Jack Kamm; Jonathan Terhorst; Richard Durbin; Yun S Song

doi:10.1080/01621459.2019.1635482

Author manuscript; available in PMC: 2021 Jan 1.

Published in final edited form as: J Am Stat Assoc. 2019 Jul 22;115(531):1472–1487. doi: 10.1080/01621459.2019.1635482


PMCID: PMC7531012 NIHMSID: NIHMS1534767 PMID: 33012903

Abstract

The sample frequency spectrum (SFS), or histogram of allele counts, is an important summary statistic in evolutionary biology, and is often used to infer the history of population size changes, migrations, and other demographic events affecting a set of populations. The expected multipopulation SFS under a given demographic model can be efficiently computed when the populations in the model are related by a tree, scaling to hundreds of populations. Admixture, back-migration, and introgression are common natural processes that violate the assumption of a tree-like population history, however, and until now the expected SFS could be computed for only a handful of populations when the demographic history is not a tree. In this article, we present a new method for efficiently computing the expected SFS and linear functionals of it, for demographies described by general directed acyclic graphs. This method can scale to more populations than previously possible for complex demographic histories including admixture. We apply our method to an 8-population SFS to estimate the timing and strength of a proposed “basal Eurasian” admixture event in human history. We implement and release our method in a new open-source software package momi2.

Keywords: population genetics, coalescent theory, demographic inference, frequency spectrum

1. Introduction

All natural populations undergo evolutionary processes of migration, size changes, and divergence, and the history of these demographic events shapes their present genetic diversity. Thus, inferring demographic history is of central concern in evolutionary and population genetics, both for its intrinsic interest (e.g., in dating the out-of-Africa migration of modern humans (Schaffner et al., 2005; Gutenkunst et al., 2009)) and also for biological applications (such as distinguishing the effects of natural selection from demography (Beaumont and Nichols, 1996; Boyko et al., 2008)). However, genetic sequence data and the space of possible demographic models are both very high dimensional objects, leading to numerous statistical and computational challenges when inferring demographic history from genetic data.

The joint sample frequency spectrum (SFS) is the multidimensional histogram of mutant allele counts in a sample of DNA sequences, and is a popular summary statistic which lies at the core of hundreds of empirical studies in population genetics; e.g., see Wakeley and Hey (1997); Griffiths and Tavaré (1998); Nielsen (2000); Gutenkunst et al. (2009); Coventry et al. (2010); Gazave et al. (2014); Gravel et al. (2011); Nelson et al. (2012); Excoffier et al. (2013); Jenkins et al. (2014); Bhaskar et al. (2015); Jouganous et al. (2017). Recently, progress has also been made on theoretical fronts to characterize statistical properties of SFS-based inference. In particular, studies of identifiability (Myers et al., 2008; Bhaskar and Song, 2014) and the rate of convergence (Terhorst and Song, 2015; Baharian and Gravel, 2018) have been carried out.

Demographic history can be inferred by fitting the observed value of the SFS to its expected value in a composite likelihood framework. The expected SFS can be efficiently computed when the demographic history is a tree, and in previous work we developed a method momi to compute the SFS of hundreds of populations related by a tree (Kamm et al., 2017). However, natural populations are often related by a more complex history that is not tree-like, as gene flow (the exchange of migrants between populations) adds extra edges to the topology associated with the demographic history. In this case, computing the expected SFS is much more computationally demanding, and existing methods for computing the exact expected SFS can scale to only a handful of populations (Gutenkunst et al., 2009; Jouganous et al., 2017).

In this article, we extend our previous algorithm momi to handle discrete (or pulse) migration events between populations, in which case demographies are described by general directed acyclic graphs (DAGs). Our new method momi2 can compute the exact expected SFS with admixture for more populations than previously possible, and uses novel insights from a stochastic process known as the Lookdown Construction (Donnelly and Kurtz, 1996; Donnelly et al., 1999). In addition, momi2 utilizes automatic differentiation (Corliss et al., 2002; Bhaskar et al., 2015) to compute gradients of the SFS, which we use to efficiently search the parameter space during optimization. Finally, momi2 can efficiently compute linear functionals of the SFS, which we exploit to compute the expected values of a number of standard population genetic summary statistics under complex demography.

The rest of this paper is organized as follows. In Section 2 we provide some background and survey related work. Section 3 describes our method, first with an illustrative example in Section 3.1, and then with formulas and pseudocode in Section 3.2. Finally, in Section 4, we apply our method to an 8-population SFS, including ancient and contemporary human populations, to estimate the timing and strength of a proposed “basal Eurasian” admixture event in human history. The Appendix contains all proofs, an analysis of the computational complexity of our method, and additional details of the application to ancient DNA.

2. Background

Suppose a sample of $n = (n_{1}, \dots, n_{D})$ haploid genomes have been sampled from $D$ “demes” or populations. The positions in the genome where the samples are not all identical are called segregating sites. In most organisms mutations are rare; most sites are not segregating. It is therefore reasonable to assume, as we do from now on, that each position in the genome has experienced at most a single mutation in its history, and that each individual can be labeled as having the “ancestral” or “derived” (mutant) allele at each segregating site. In population genetics, this simplifying assumption is known as the infinite sites model.

The sample frequency spectrum (SFS) is a $D$-dimensional array $[f_{x}] \in ℤ^{(n_{1} + 1) \times \dots \times (n_{D} + 1)}$ whose entry f_x counts the number of segregating sites with exactly x copies of the derived allele and n − x copies of the ancestral allele, where $x = (x_{1}, \dots, x_{D}) \in ℕ_{0}^{D}$ with 0 ≤ x_d ≤ n_d for each $d = 1, \dots, D$. Note we only consider segregating sites with 2 alleles, so f₀ = f_n = 0 by definition. Compared to the full data set (i.e., the complete genetic sequences of all n₁ + ⋯ + n_D genomes), the SFS [f_x] is a compressed, low-dimensional summary which nevertheless preserves much of the signal about the various population size changes, divergence times, and admixture events that occurred over the course of the populations’ history.
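
As a concrete illustration of this definition, the following minimal sketch tabulates a joint SFS from per-site derived allele counts using NumPy. The function name `joint_sfs` and the toy input are assumptions made for illustration; they are not part of momi2.

```python
import numpy as np

def joint_sfs(derived_counts, n):
    """Tabulate the joint SFS.

    derived_counts: sequence of length-D tuples, one per site, giving the
        number of derived alleles observed in each population.
    n: length-D tuple of sample sizes (n_1, ..., n_D).
    Returns an integer array of shape (n_1 + 1, ..., n_D + 1).
    """
    n = tuple(n)
    sfs = np.zeros(tuple(nd + 1 for nd in n), dtype=int)
    zero = tuple(0 for _ in n)
    for x in derived_counts:
        x = tuple(x)
        if len(x) != len(n) or any(not 0 <= xd <= nd for xd, nd in zip(x, n)):
            raise ValueError(f"invalid count vector {x} for sample sizes {n}")
        # Monomorphic sites are not segregating, so f_0 = f_n = 0.
        if x == zero or x == n:
            continue
        sfs[x] += 1
    return sfs

# Toy data (hypothetical): D = 2 populations with n = (2, 3) sampled genomes.
sites = [(1, 0), (1, 0), (2, 1), (0, 0), (2, 3), (0, 2)]
f = joint_sfs(sites, n=(2, 3))
```

Here the sites `(0, 0)` and `(2, 3)` are monomorphic in the sample and are therefore skipped, so only four of the six sites contribute to the array.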

2.1. Demographic events

The expected multipopulation SFS can be obtained by integrating over random genealogies formed by a backwards-in-time stochastic process known as the structured coalescent (Kingman, 1982; Takahata, 1988; Notohara, 1990). Before getting to the technical details, we first review the basic dynamics of this process in order to build intuition for how the data have power to infer population splits, size changes, and gene flow events. See Durrett (2008) for a more detailed introduction.

Informally, the topology and branch lengths of genealogies are affected by a demographic history in two ways:

1. Two lineages may not coalesce into a common ancestor until they reside in the same population, and the time until this occurs is affected by migration patterns and population split times.
2. At any particular point in time, two members within the same population are more likely to have a common parent if the population size is small; so, for example, residents of a small village will typically be more closely related than residents of a large city.

Regarding the second point, we define the scaled effective population size η(t) such that the rate at which any two lineages find a common ancestor at time t is 1/η(t). Under the simplest random mating model (the Wright-Fisher model, cf. Durrett (2008)), the census population size exactly equals Tη, where T is the number of generations per unit time; more generally, η scales with the number of breeding individuals in the population. Thus, estimating η allows us to infer size change events such as bottlenecks, exponential growth, and population crashes.
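
Because two lineages coalesce at rate 1/η(t), the probability that they are still distinct at time t in the past is exp(−∫₀ᵗ ds/η(s)). The sketch below evaluates this quantity for a piecewise-constant η; the function names and the example size history are hypothetical and chosen only to show how a short bottleneck sharply increases the chance of coalescence.

```python
import math

def coalescent_intensity(t, epochs):
    """Integral of 1/eta(s) ds over [0, t] for piecewise-constant eta.

    epochs: list of (start_time, eta) pairs sorted by start_time, with the
        first start_time equal to 0; each eta holds until the next start.
    """
    total = 0.0
    for i, (start, eta) in enumerate(epochs):
        end = epochs[i + 1][0] if i + 1 < len(epochs) else math.inf
        if t <= start:
            break
        # Add the part of [start, end) that lies before time t.
        total += (min(t, end) - start) / eta
    return total

def prob_not_coalesced(t, epochs):
    """Probability that two lineages are still distinct at time t."""
    return math.exp(-coalescent_intensity(t, epochs))

# Hypothetical history (time measured backward from the present):
# eta = 1 recently, a bottleneck with eta = 0.1 between times 0.5 and 0.6,
# then eta = 2 further back in time.
history = [(0.0, 1.0), (0.5, 0.1), (0.6, 2.0)]
before = prob_not_coalesced(0.5, history)
after = prob_not_coalesced(0.6, history)
```

In this example the bottleneck, although it lasts only 0.1 time units, contributes twice as much coalescent intensity as the preceding 0.5 time units at η = 1.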

If we could observe the true distribution of genealogies, then we could directly infer demographic history from the waiting times between coalescence events, following the principles listed in items 1 and 2 above. However, since genealogies are never directly observed, we must make inferences about demographic history indirectly using mutation data.

2.2. Likelihoods and the site frequency spectrum

Consider a genome with L positions and mutation rate $\frac{θ}{L}$ per position per generation. So at any given position in the genome, mutations arise on the tree there as a Poisson point process with rate $\frac{θ}{L}$ . The chance of 2 or more mutations at a single position is $O (\frac{1}{L^{2}})$ , and taking the limit L → ∞ we arrive at the aforementioned infinite sites approximation (Kimura, 1969; Durrett, 2008), which assumes that each segregating site was caused by a single mutation, so that each allele may be labeled as ancestral or derived.
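
The $O(\frac{1}{L^{2}})$ claim can be checked numerically. In the sketch below, the number of mutations at one position is taken to be Poisson with mean θ × (tree length) / L; the values of `theta` and `tree_length` are hypothetical. The product of the double-hit probability with L² settles near a constant, (θ × tree length)²/2, as L grows.

```python
import math

def prob_two_or_more(rate):
    """P(N >= 2) for N ~ Poisson(rate), computed stably for small rates."""
    # 1 - e^{-rate} - rate * e^{-rate}, using expm1 to avoid cancellation.
    return -math.expm1(-rate) - rate * math.exp(-rate)

theta = 1.0        # hypothetical genome-wide mutation rate
tree_length = 2.0  # hypothetical total branch length at the position

for L in (10**3, 10**4, 10**5):
    rate = theta * tree_length / L
    p = prob_two_or_more(rate)
    # p * L**2 approaches (theta * tree_length)**2 / 2 as L grows.
    print(L, p, p * L**2)
```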

The observed segregating sites are not independent, because trees at neighboring positions are correlated. Unfortunately, even in the simplest case of a single population with constant size, an analytic expression for the likelihood of mutation data at a set of linked (non-independent) sites is not known (Bhaskar et al., 2015). Therefore, SFS data are generally used with composite likelihood methods. Recall that f_x is the total number of segregating sites with derived allele count pattern $x = (x_{1}, \dots, x_{D})$ . Define $ϕ_{x} = \frac{1}{θ} E [f_{x}]$ , i.e., ϕ_x is the expected frequency of x per unit mutation rate. Equivalently, ϕ_x is the expected branch length subtending x leaves in a random coalescent tree (i.e., with x₁ descendants in population 1, x₂ in population 2, and so on). A commonly used composite likelihood is the Poisson random field model (Sawyer and Hartl, 1992), which assumes that the total number of segregating sites is Poisson with rate $θ \sum_{x} ϕ_{x}$ , and that the patterns at the observed sites are independent with sampling probabilities proportional to ϕ_x; this yields a log-likelihood of

$$L \propto ‖f‖_{1} \log (θ ‖ϕ‖_{1}) - θ ‖ϕ‖_{1} + \sum_{x} f_{x} \log \frac{ϕ_{x}}{‖ϕ‖_{1}}, \tag{1}$$

where $‖ f ‖_{1} ≔ \sum_{x} f_{x}$ and $‖ ϕ ‖_{1} ≔ \sum_{x} ϕ_{x}$ . Demographic history can then be inferred by searching for the parameter values that maximize $L$ .
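
The composite log-likelihood (1) is straightforward to evaluate once the expected SFS is available. The following is a minimal sketch, assuming the observed and expected spectra are stored as dictionaries keyed by count pattern; the function name and the toy numbers are hypothetical and this is not the momi2 interface.

```python
import math

def composite_log_likelihood(f, phi, theta):
    """Poisson random field composite log-likelihood, eq. (1) up to a constant.

    f: dict mapping count patterns x to observed site counts f_x.
    phi: dict mapping count patterns x to expected branch lengths phi_x > 0.
    theta: mutation rate.
    """
    f_norm = sum(f.values())      # ||f||_1
    phi_norm = sum(phi.values())  # ||phi||_1
    ll = f_norm * math.log(theta * phi_norm) - theta * phi_norm
    for x, fx in f.items():
        if fx > 0:
            ll += fx * math.log(phi[x] / phi_norm)
    return ll

# Toy example (hypothetical values) for one population with n = 3.
f_obs = {(1,): 30, (2,): 12}
phi_model = {(1,): 2.0, (2,): 1.0}

# For fixed phi, the first two terms of (1) are maximized at ||f||_1 / ||phi||_1.
theta_hat = sum(f_obs.values()) / sum(phi_model.values())
ll_hat = composite_log_likelihood(f_obs, phi_model, theta_hat)
```

Setting the derivative of (1) with respect to θ to zero gives θ = ‖f‖₁/‖ϕ‖₁, which is why `theta_hat` can be computed in closed form here; the demographic parameters that determine ϕ still require numerical optimization.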

2.3. Existing work and our contribution

The composite log-likelihood $L$ in (1) requires us to compute ϕ_x, the expected branch length subtending x. Let G be a random genealogical tree sampled under the demography, and L_x(G) be the total length of all branches in G subtending x, so that

$$ϕ_{x} = E [L_{x} (G)] = \int_{G} L_{x} (G) \, d ℙ (G) . \tag{2}$$

The integral (2) is difficult since the support of G is at least as large as the number of labeled binary trees with n leaves, a quantity which grows faster than exponentially in the sample size n. Consequently, a number of different methods have been proposed for evaluating (or approximating) the integral (2). Sampling-based methods include Markov chain Monte Carlo (Griffiths and Tavaré, 1997; Nielsen, 2000), importance sampling (Stephens and Donnelly, 2000; De Iorio and Griffiths, 2004), simulation (Excoffier and Foll, 2011; Excoffier et al., 2013), and ABC (Wegmann et al., 2010). The main advantage of these methods is their flexibility: since, as mentioned above, computing the likelihood of the data is trivial conditional on G, these methods can be used on a rich class of models. However, the high dimension of ϕ_x makes it impractical to compute by sampling unless $D$ is small. In particular, since the support of x grows like $O (n^{D})$ , Monte Carlo methods will assign zero mass to configurations that are actually observed in the data.
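
To give a sense of scale for both growth rates mentioned above, the sketch below counts rooted binary tree topologies with n labeled leaves, using the standard formula (2n − 3)!!, and the number of possible SFS entries for given sample sizes. The function names are hypothetical, and the tree count is one common convention for "labeled binary trees"; it serves only as a lower bound on the size of the support of G.

```python
import math

def num_labeled_binary_trees(n):
    """Number of rooted binary tree topologies with n labeled leaves: (2n - 3)!!."""
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 1
    # Product of the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count

def num_sfs_entries(n):
    """Number of count patterns x with 0 <= x_d <= n_d, excluding x = 0 and x = n."""
    return math.prod(nd + 1 for nd in n) - 2

trees_10 = num_labeled_binary_trees(10)
entries_example = num_sfs_entries((2, 2, 1))  # sample sizes used in Section 3.1
entries_8pop = num_sfs_entries((10,) * 8)     # hypothetical 8 populations, 10 samples each
```

Even with only 10 samples per population, an 8-population SFS has over 200 million entries, so a sampler would need to visit an enormous number of configurations to give each observed entry nonzero mass.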

A second approach, implemented in the software ∂a∂i (Gutenkunst et al., 2009), computes ϕ_x by numerically solving PDEs arising from the Wright-Fisher diffusion (Ewens, 2004), which is dual to the coalescent process described above. For $D$ populations, this involves numerically solving a $D$ -dimensional integral. The initial ∂a∂i method in Gutenkunst et al. (2009) could handle up to $D = 3$ populations; subsequent improvements (Lukić and Hey, 2012; Jouganous et al., 2017) extended this to $D = 4$ and then $D = 5$ populations by using spectral representations or alternative basis functions for solving the PDEs.

The third approach for computing ϕ_x, which includes our method, integrates over the sample allele frequencies “backwards-in-time”, exploiting conditional independence relationships to reduce computation. This involves considering the alleles of the sample’s ancestors at different points in the demographic history, and integrating out these random variables via inference algorithms for probabilistic graphical models (Pearl, 1982; Felsenstein, 1981; Lauritzen and Spiegelhalter, 1988; Koller and Friedman, 2009). Bryant et al. (2012) and Chen (2012) computed the SFS for finite- and infinite-sites models using this backward-in-time approach under the coalescent. De Maio et al. (2015) and Kamm et al. (2017) substantially lowered the computational burden of this approach by replacing the coalescent with the continuous-time Moran model (Durrett, 2008), a stochastic process which induces the same sampling distribution as the coalescent, but using a much smaller state space. However, until now these Moran-based approaches have been limited to analyzing tree-shaped demographies without admixture between populations.¹

The main contribution of this paper is to extend our previous Moran-based method (Kamm et al., 2017) to allow for demographies defined on general directed acyclic graphs (DAGs), thus allowing for admixture between populations. We describe and implement an algorithm for computing the expected infinite-sites SFS under the multipopulation Moran model with size changes, exponential growth, population splits, and point admixture events (i.e., instantaneous migration “pulses”). This substantially enlarges the space of demographies for which the expected SFS can be accurately computed.

Additionally, our algorithm computes not only individual SFS entries, but also linear functionals of the expected SFS. Specifically, our method computes rank-1 tensor products of the SFS in the same time as a single entry (general linear functionals are sums of these rank-1 products). A number of widely used statistics in population genetics can be expressed as SFS functionals, and to our knowledge our method is the first to compute expectations of these statistics under complex demography.
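
To clarify what a rank-1 functional is, the sketch below evaluates $\sum_{x} ϕ_{x} \prod_{d} a^{(d)}_{x_{d}}$ by brute force on a small array. This only illustrates the quantity being computed: it assumes the full array ϕ is already available, which is exactly what the method described in this paper avoids. The function name, the weight vectors, and the random stand-in for ϕ are all hypothetical.

```python
import numpy as np

def rank1_functional(phi, weights):
    """Return sum_x phi[x] * prod_d weights[d][x_d], contracting one axis at a time."""
    out = np.asarray(phi, dtype=float)
    for w in weights:
        # Contract the current leading axis with its weight vector.
        out = np.tensordot(np.asarray(w, dtype=float), out, axes=(0, 0))
    return float(out)

# A random stand-in for an expected SFS with sample sizes n = (2, 2, 1).
rng = np.random.default_rng(0)
phi = rng.random((3, 3, 2))

# Indicator vectors pick out a single entry, here x = (1, 1, 0).
e = [np.eye(3)[1], np.eye(3)[1], np.eye(2)[0]]
single_entry = rank1_functional(phi, e)

# All-ones vectors sum every entry of the array.
ones = [np.ones(3), np.ones(3), np.ones(2)]
total = rank1_functional(phi, ones)
```

Indicator weights recover one SFS entry, while other weight vectors aggregate many entries at once; a general linear functional is a sum of several such calls.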

We demonstrate our method by using it to infer the history of eight human subpopulations that have undergone multiple admixture events. We complement our theoretical contributions with an open-source, user-friendly software implementation that will enable practitioners to deploy our method. The software uses automatic differentiation (Corliss et al., 2002; Bhaskar et al., 2015; Maclaurin et al., 2015) to compute derivatives of the SFS, leading to efficient optimization and parameter inference. Our package, called momi2, is available for download at https://github.com/popgenmethods/momi2.

3. Method

In this section, we describe the algorithm implemented in momi2 for computing the expected SFS under complex demographies. We begin in Subsection 3.1 with an illustrative example that highlights the novel aspects of our work. Then in Subsection 3.2 we provide pseudocode for our algorithm, and state the formulas used by our algorithm as Propositions. The proofs of these Propositions, and the proof for the correctness of our algorithm, require substantial additional notation, and we defer this to Appendix A.2. For ease of reference, the symbols we use are summarized in Table 1.

Table 1.

Notation. Rows in the second part of the table are only used for proofs in the Appendix and may be ignored in the main text.

Symbol	Description	Reference
$D$	Sampled populations are $\{1, \dots, D\}$	§2
n	# of samples per population $(n_{1}, \dots, n_{D})$	”
f_x	# of sites with x derived and n − x ancestral alleles	”
θ	Mutation rate	§2.2
ϕ_x	Expected branch length with x descendants, $\frac{1}{θ} E [f_{x}]$	”
$G$	Population graph	Figure 2(a)
v	A vertex in $G$ , corresponding to a population	§3.2
τ_v	time between top and bottom events of v	”
η_v (t)	scaled population size of v at t ∈[0, τ_v]	”
n_v	# of samples with ancestry in v	”
$X_{v}^{(t)}$	# of derived alleles in v at time t	”
$X_{S}^{(t)}$	Vector of allele counts $(X_{v_{1}}^{(t_{1})}, \dots, X_{v_{k}}^{(t_{k})})$ at S = {v₁,…,v_k}	”
X_v, X_S	Shorthand for $X_{v}^{(0)}$ , $X_{S}^{(0)}$ respectively	”
$T$	Event tree	Figure 2(b)
E	A join, split or leaf event in $T$	”
K_E	Populations we are keeping track of at E	”
$ℓ_{x, z}^{E, t}$	Conditional likelihood $ℙ (X_{Leaves (E)} = z \| X_{K_{E}}^{(t)} = x)$	Eq. (4)
$ϕ_{z}^{E}$	$E [\text{branch length at or below } K_{E} \text{ with } z \text{ descendants}]$	Eq. (5)
$ℓ_{x}^{E, t}$ , ϕ^E	Shorthand for $ℓ_{x, z}^{E, t}$ , $ϕ_{z}^{E}$ respectively when z fixed	”
n_tot	The total number of sampled lineages $\sum_{d = 1}^{D} n_{d}$	Appendix A.1
$M_{v, t, (i)}$	Label and allele of the ith lineage in v at t ∈[0, τ_v]	”
$M_{v, t}$	$(M_{v, t, (1)}, M_{v, t, (2)}, \dots)$	”
$M_{E, t}$	$(M_{v_{1}, t_{1}}, \dots, M_{v_{k}, t_{k}})$ where K_E = {v₁,…,v_k} and t = (t₁,…,t_k)	”
$X_{v, m}^{(t)}$	the # derived among the 1st m lineages in v, t. (Note: $X_{v}^{(t)} \equiv X_{v, n_{v}}^{(t)}$ )	”


3.1. Example

Consider the model depicted in Figure 1. This model has 3 sampled (leaf) populations, related as follows. Populations 1 and 2 are sisters, with Population 3 an outgroup to them; however a pulse of migrants from 3 to 2 occurs after the split between populations 1 and 2.

Fig. 1 — An example of a 3-population Moran model. The bottom of the graph corresponds to the present and the top to the past. Population 2 receives admixture from population 3 after splitting from population 1. Other features of the demography include archaic samples in population 1, and various size changes along the edges of this demography. Lineages bearing the ancestral (derived) allele are shown in red (blue). The derived allele arises as the blue star (⋆) in the population immediately ancestral to population 4. The red circles (•) and blue stars represent one potential configuration of the allelic states of the ancestors to the sample; our method integrates over all potential configurations in order to calculate the likelihood. Arrows pointing to the right (→) represent reproduction events in the forward-time Moran model. The black arrow pointing to the left (←) at event 6 is a migration event from the population at node 7 to the population at node 5 (again in the forward-time sense).

At the leaves, we observe (n₁, n₂, n₃) = (2,2,1) samples, with (X₁, X₂, X₃) = (1,1,0) copies of the derived (blue star) allele. We wish to compute the expected number of mutations with pattern (X₁, X₂, X₃); to do this, we will integrate over the unobserved variables (X₄, X₅, X₆, X₇, X₈) which represent allele counts at certain internal positions within the demography. The random variables X₁,…,X₈ are related to each other by the DAG $G$ in Figure 2(a), with an edge from population v to w if alleles may pass directly from v into w.

Fig. 2 — The population DAG $G$ (a) and event tree $T$ (b) corresponding to the demography shown in Figure 1. (a) Each vertex in $G$ corresponds to a collection of alleles in Figure 1. (b) The corresponding event tree $T$ is a junction tree of the DAG in $G$ ; each vertex of $T$ corresponds to a set of vertices from $G$ .

In the realization of Figure 1, the hidden blue allele counts are (X₄, X₅, X₆, X₇, X₈) = (3,2,0,0,0). The blue mutation occurs in the edge above X₄, spreading to 3 of the lineages. One copy of the blue allele moves on to X₁, while 2 copies move on to X₅. However, due to the admixture, X₂ only inherits 1 blue allele from X₅, inheriting a red allele from the 2 red alleles at X₆.

Under the infinite-sites assumption described above, the mutation observed at this site arose at a single point in the genealogical tree depicted in Figure 1. To compute the expected number of mutations with (X₁, X₂, X₃) = (1,1,0), we may condition on the population (i.e., the edge in Figure 1) on which it arose:

$$ϕ_{1, 1, 0} = \frac{1}{θ} E [\# \text{ mutations with } (X_{1}, X_{2}, X_{3}) = (1, 1, 0)] = \frac{1}{θ} \sum_{v = 1}^{8} \sum_{i = 1}^{n_{v}} E [\# \text{ mutations in population } v \text{ yielding } X_{v} = i] \times ℙ (X_{1} = 1, X_{2} = 1, X_{3} = 0 \mid \text{mutation at } v \text{ with } X_{v} = i) . \tag{3}$$

We call $\frac{1}{θ} E [\# \text{ mutations at } v \text{ with } X_{v} = i]$ the “truncated SFS”; this only concerns events within a single population and can be computed using the method momi1. We refer the interested reader to Kamm et al. (2017) for further details. The second term $ℙ (X_{1} = 1, X_{2} = 1, X_{3} = 0 \mid \text{mutation at } v \text{ with } X_{v} = i)$ gives the conditional likelihood of observing the data given a mutation and its allele count at population v. In momi1, we computed this term in the case where $G$ is a tree without admixture using the sum-product (also known as belief propagation) algorithm (Felsenstein, 1981; Koller and Friedman, 2009).

The main result of this work is to extend our previous dynamic program to the case where $G$ is given by a DAG due to admixture. We will use a dynamic program that is essentially a kind of junction tree algorithm (Koller and Friedman, 2009). This algorithm works by decomposing a DAG into a tree in such a way that vertices in the tree correspond to collections of nodes in the original graph. Belief propagation is then applied to the tree decomposition.

We illustrate our algorithm using the example demography in Figure 2(b). We call the tree decomposition $T$ an event tree, because each internal node corresponds to either an admixture or split event.² We construct the event tree $T$ , and compute the conditional likelihoods $ℙ (X_{1} = 1, X_{2} = 1, X_{3} = 0 \mid \text{mutation at } v \text{ with } X_{v} = i)$ , as follows:

1. We start with a collection of 3 singleton sets of leaf populations: {{1},{2},{3}}. For v = 1, 2, 3, we also keep track of conditional likelihoods of the data beneath v; since we are at the leaves, these are simply the Kronecker delta functions
$ℙ (X_{v} = i | X_{v} = j) = δ_{i, j} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \neq j . \end{cases}$
2. Going back in time, the first event is the admixture of populations 5 and 6 into 2. To process this event, we remove the set {2} and replace it with {5, 6}; the current collection of sets becomes {{1},{3},{5,6}}. We make the new set {5, 6} the parent of the removed set {2} in $T$ .

In addition, we compute ${[ℙ (X_{2} | X_{5} = i, X_{6} = j)]}_{i, j}$ , the conditional likelihood of the data beneath {5, 6} given the allele counts X₅, X₆. We obtain this by applying Lemmas 1 and 2 to the previous conditional likelihood ${[ℙ (X_{2} | X_{2} = i)]}_{i}$ . Specifically, we apply Lemma 1 to “lift” the conditional likelihood at population 2 to the time point immediately below the admixture event; then, we apply Lemma 2 to obtain $ℙ (X_{2} | X_{5}, X_{6})$ , the conditional likelihood at {5, 6} immediately above the admixture event.
3. The next event has the alleles from 6 splitting off from population 3. To process this event, we merge the clusters containing the relevant populations ({5, 6} and {3}); then we remove 3 and 6 and replace them with their parent population 7, to obtain {5, 7} as the parent cluster of {5, 6} and {3} in $T$ . After this stage, our collection of sets becomes {{1},{5,7}}.

To obtain $ℙ (X_{2}, X_{3} | X_{5}, X_{7})$ , the conditional likelihood of the data at the leaves beneath {5, 7}, we apply Lemma 3 to $ℙ (X_{3} | X_{3})$ and $ℙ (X_{2} | X_{5}, X_{6})$ . Lemma 3 computes the conditional likelihood at a split event when the children fall into separate clusters beneath the split (in this case, the child clusters are {3} and {5, 6}).
4. The next event is similar, with populations 1 and 5 merging into population 4. After combining the clusters {1} and {5, 7}, removing the merged populations 1 and 5, and adding in their parent 4, we are left with the collection {{4,7}}.

Similar to previous steps, we compute $ℙ (X_{1}, X_{2}, X_{3} | X_{4}, X_{7})$ , the conditional likelihood of the data at the leaves beneath {4, 7}, by applying Lemma 1 to “lift” the conditional likelihoods $ℙ (X_{1} | X_{1})$ and $ℙ (X_{2}, X_{3} | X_{5}, X_{7})$ to the point immediately below the split event, and then applying Lemma 3 to compute the conditional likelihood at a split event above two independent clusters ({1},{5,7}).
5. The final event has populations 4 and 7 merging into population 8. We replace the cluster {4, 7} with its parent cluster {8} at the root of $T$ .

To compute $ℙ (X_{1}, X_{2}, X_{3} | X_{8})$ from the child likelihoods $ℙ (X_{1}, X_{2}, X_{3} | X_{4}, X_{7})$ , we first apply Lemma 1 to lift the likelihoods to the point immediately below the split event, and then apply Lemma 4, which computes the conditional likelihood at the parent of a split when the children belong to the same cluster ({4, 7} in this case).

Finally, at each cluster {v₁, v₂, …} in $T$ with leaves {l₁, l₂, …} we have computed $ℙ (X_{l_{1}}, X_{l_{2}}, \dots | X_{v_{1}}, X_{v_{2}}, \dots)$ , the conditional likelihoods at the leaves beneath {v₁, v₂, …}. But to apply equation (3), we need $ℙ ((X_{1}, X_{2}, X_{3}) = (1, 1, 0) \mid \text{mutation at } v \text{ with } X_{v} = i)$ . This is given by

$$ℙ ((X_{1}, X_{2}, X_{3}) = (1, 1, 0) \mid \text{mutation at } v \text{ with } X_{v} = i) = \begin{cases} ℙ ((X_{1}, X_{2}, X_{3}) = (1, 1, 0) \mid X_{4} = i, X_{7} = 0), & \text{if } v = 4, \\ ℙ ((X_{1}, X_{2}, X_{3}) = (1, 1, 0) \mid X_{8} = i), & \text{if } v = 8, \\ 0, & \text{else,} \end{cases}$$

since we assume that each site experiences at most a single mutation in its history, and the only way derived alleles are observed in leaf populations 1,2 is if the corresponding mutation occurs in population 4 or 8.

Remark. The observant reader may have noticed that in Figure 1, population 8 has only n₈ = 5 alleles, despite its children having n₄ + n₇ = 7 > 5 alleles in total. This is due to the fact that there are only 5 alleles at the leaves, so there are at most 5 ancestors in any population at any point; thus, when populations 4 and 7 merge into population 8, we can drop two of the tracked lineages. To show this formally, we use a stochastic process called the lookdown construction, which is a version of the Moran model with a countably infinite number of lineages. However, this analysis requires a great deal of additional notation and is not essential to the remainder of the text, so we defer it to Appendix A.1.
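
The lineage counts in this remark can be reproduced directly from the population graph: n_v is the total sample size over the distinct leaf populations descended from v. The sketch below does this for the example, with the parent-to-child edges read off from the event descriptions in this subsection; the function name and the dictionary encoding of the graph are assumptions made for illustration.

```python
def lineages_tracked(children, leaf_sizes):
    """n_v for each population v: total sample size of the leaves descended from v.

    children: dict mapping each internal population to its child populations.
    leaf_sizes: dict mapping each leaf population to its sample size.
    """
    cache = {}

    def leaves_below(v):
        if v in leaf_sizes:
            return frozenset([v])
        if v not in cache:
            # A leaf reachable along several paths is counted only once.
            cache[v] = frozenset().union(*(leaves_below(c) for c in children[v]))
        return cache[v]

    populations = set(children) | set(leaf_sizes)
    return {v: sum(leaf_sizes[l] for l in leaves_below(v)) for v in sorted(populations)}

# Example demography: 5 and 6 are both parents of 2 (admixture);
# 7 is the parent of 3 and 6; 4 is the parent of 1 and 5; 8 is the root.
children = {8: [4, 7], 7: [3, 6], 4: [1, 5], 5: [2], 6: [2]}
leaf_sizes = {1: 2, 2: 2, 3: 1}
n = lineages_tracked(children, leaf_sizes)
```

Population 2 is reachable from the root through both 4 and 7, which is why n₄ + n₇ exceeds n₈: its two samples are counted in both children but only once at the root.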

3.2. Algorithms and formulas

We now describe the algorithm to construct the event tree $T$ . We assume that the population graph $G$ has two types of topological events:

Population split: two child populations u, v split from each other; their parent population is w. Looking backward in time, u, v merge and become the population w.
Population admixture: a single child population u inherits from exactly two parent populations v, w, with the probability that an allele comes from v (respectively, w) being p (respectively, 1 − p).

Note that more complicated events, such as trifurcating splits or symmetric pulse migrations, may be expressed as a succession of these 2-way split and admixture events.

We provide pseudocode to construct the event tree $T$ in Algorithm 1. In words, we initially start with $K$ equal to a collection of singleton sets corresponding to the leaves of $G$ . Processing each split or admixture event E back in time, we merge all blocks containing the child population(s) of E. Then we remove the child population(s) from this merged block, and add in the parent population(s). Within $T$ , the new merged block is the parent of the blocks removed at this stage.

Algorithm 1. Construct event tree $T$

 1: procedure EventTree($G$)
 2:   $K \leftarrow \{\{l\} : l \in Leaves (G)\}$  ⊳ State variable containing current set of tracked populations
 3:   $V (T) \leftarrow \{\}$  ⊳ Vertices of $T$
 4:   $E (T) \leftarrow \{\}$  ⊳ Edges of $T$
 5:   for events E in BackInTimeEvents($G$)
 6:     if E is split then
 7:       w₁, w₂ ← ChildPopulations(E)
 8:       ν ← ParentPopulation(E)
 9:       K₁ ← block in $K$ containing w₁
10:       K₂ ← block in $K$ containing w₂
11:       K_E ← K₁ ∪ K₂ \ {w₁, w₂} ∪ {ν}
12:       $K \leftarrow K \setminus \{K_{1}, K_{2}\} \cup \{K_{E}\}$
13:       $V (T) \leftarrow V (T) \cup \{K_{E}\}$
14:       if K₁ ≠ K₂ then
15:         $E (T) \leftarrow E (T) \cup \{K_{E} \to K_{1}, K_{E} \to K_{2}\}$
16:       else
17:         $E (T) \leftarrow E (T) \cup \{K_{E} \to K_{1}\}$
18:       end if
19:     else if E is admixture then
20:       w ← ChildPopulation(E)
21:       ν₁, ν₂ ← ParentPopulations(E)
22:       K′ ← block in $K$ containing w
23:       K_E ← K′ \ {w} ∪ {ν₁, ν₂}
24:       $K \leftarrow K \setminus \{K'\} \cup \{K_{E}\}$
25:       $V (T) \leftarrow V (T) \cup \{K_{E}\}$
26:       $E (T) \leftarrow E (T) \cup \{K_{E} \to K'\}$
27:     end if
28:   end for
29:   return $V (T), E (T)$
30: end procedure
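
The following is a minimal Python sketch of Algorithm 1, applied to the example of Section 3.1. It is not the momi2 implementation: the function name, the tuple encoding of events, and the choice to record leaf blocks as vertices of the tree (so that every edge endpoint is a vertex) are assumptions made for illustration.

```python
def event_tree(leaves, events):
    """Build the event tree by processing events backward in time.

    leaves: iterable of leaf population labels.
    events: list of events ordered from the present into the past, each
        ("split", (w1, w2), v) for children w1, w2 with parent v, or
        ("admixture", w, (v1, v2)) for child w with parents v1, v2.
    Returns (vertices, edges): each vertex is a frozenset of populations
    and each edge is a (parent_block, child_block) pair.
    """
    vertices = [frozenset([l]) for l in leaves]
    blocks = set(vertices)  # current collection of tracked blocks
    edges = []

    def block_containing(pop):
        for block in blocks:
            if pop in block:
                return block
        raise ValueError(f"population {pop} is not currently tracked")

    for kind, child, parent in events:
        if kind == "split":
            w1, w2 = child
            k1, k2 = block_containing(w1), block_containing(w2)
            # Merge the child blocks, drop the children, add the parent.
            k_e = ((k1 | k2) - {w1, w2}) | {parent}
            blocks -= {k1, k2}
            edges.append((k_e, k1))
            if k1 != k2:
                edges.append((k_e, k2))
        elif kind == "admixture":
            k = block_containing(child)
            # Replace the child by its two parent populations.
            k_e = (k - {child}) | set(parent)
            blocks.discard(k)
            edges.append((k_e, k))
        else:
            raise ValueError(f"unknown event type {kind!r}")
        blocks.add(k_e)
        vertices.append(k_e)
    return vertices, edges

# Events of the Section 3.1 example, from the present into the past.
events = [
    ("admixture", 2, (5, 6)),
    ("split", (3, 6), 7),
    ("split", (1, 5), 4),
    ("split", (4, 7), 8),
]
vertices, edges = event_tree([1, 2, 3], events)
```

On this input the internal vertices are {5, 6}, {5, 7}, {4, 7} and {8}, matching the clusters obtained step by step in Section 3.1.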

We now describe Algorithm 2, the dynamic program to compute the joint SFS. We need a bit more notation. For population v, let n_v be the number of samples with ancestry in v; we will keep track of n_v lineages within population v. Let τ_v denote the amount of time between the top and bottom events of v, and let η_v(t) denote the scaled population size of v at time t ∈ [0, τ_v] above the base of v. Let $0 \leq X_{v}^{(t)} \leq n_{v}$ be the allele count within the n_v lineages of v at time t above its bottom, so $X_{v}^{(0)} = X_{v}$ in our earlier notation.

At event E in $T$ , let $K_{E} = \{v_{1}, \dots, v_{| K_{E} |}\}$ be the corresponding block of populations in $G$ . We define the conditional likelihood at E as

$$ℓ_{x, z}^{E, t} = ℙ (X_{Leaves (E)} = z \mid X_{K_{E}}^{(t)} = x) \tag{4}$$

where $X_{K_{E}}^{(t)} = (X_{v_{1}}^{(t_{1})}, \dots, X_{v_{| K_{E} |}}^{(t_{| K_{E} |})})$ is the vector of allele counts in populations K_E at times t, and $X_{Leaves (E)}$ is the observed data at the leaves beneath E.

In addition, we define the “partial SFS”

$$ϕ_{z}^{E} = \sum_{v \text{ descended from } K_{E}} E [\text{branch length in } v \text{ with } X_{Leaves (E)} = z] \tag{5}$$

as the expected branch length at or below K_E subtending z leaves. (Here, the expectation is with respect to branch lengths in the coalescent tree relating the samples.) Note that $ϕ_{z}^{Root (T)}$ gives the desired final result, and corresponds to equation (3) in the previous subsection.

Algorithm 2.

Dynamic program to compute the SFS ϕ

1: procedure

DP (ℓ^{1}, \dots, ℓ^{D}) ⊳ ℓ^{i} = e_{X_{i}} \in ℝ^{n_{i} + 1}

2: for event E in DepthFirstSearch

(T)

3: if K_E = {d} is leaf then

4: ℓ^E,0 ← ℓ^d

5: else if E is split event then

6: ℓ^E,0 ← Lemmas 1 and 2

7: else if E is join event then

8: if |ChildEvents(E)|=1 then

9: ℓ^E,0 ← Lemmas 1 and 4

10: else if |ChildEvents(E) |= 2 then

11: ℓ^E,0 ← Lemmas 1 and 3

12: end if

13: end if ⊳ Computed the conditional likelihood ℓ^E,0

14: ϕ^E ← (10) ⊳ Computes ϕ^E the partial SFS

15: end for

16: return ϕ^{Root (T)} ⊳ Return partial SFS at the root event

17: end procedure


For the remainder of the subsection, we fix $X_{Leaves (G)} = z$, and drop the dependence on z in $ℓ_{x, z}^{E, t}$ and $ϕ_{z}^{E}$. Algorithm 2 defines a dynamic program (DP) over the conditional likelihoods $ℓ_{x}^{E, 0}$ and partial SFS ϕ^E. The DP takes input vectors $ℓ^{1}, \dots, ℓ^{D}$ corresponding to the leaf populations $Leaves (G) = \{1, \dots, D\}$. If the inputs $ℓ^{1}, \dots, ℓ^{D}$ are set to indicator vectors corresponding to the observed counts $X_{1}, \dots, X_{D}$, the DP of Algorithm 2 will return the corresponding SFS entry, as stated in the theorem below:

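Theorem 1. If $(X_{1}, \dots, X_{D}) \neq \mathbf{0}, \mathbf{n}$, where $\mathbf{n} = (n_{1}, \dots, n_{D})$ (that is, the site is polymorphic in the sample), then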

ϕ = DP (e_{X_{1}}, \dots, e_{X_{D}}),

where $e_{X_{i}} = (0, \dots, 1, \dots, 0) \in ℝ^{n_{i} + 1}$ denotes the vector with 1 at coordinate X_i and 0 elsewhere.

We now present the formulas used by Algorithm 2, in a series of lemmas also used to prove Theorem 1. We start with a formula to “lift” $ℙ (\dots | \dots, X_{v}^{(0)}, \dots)$ up to $ℙ (\dots | \dots, X_{v}^{(τ_{v})}, \dots)$ . That is, this formula transforms a likelihood conditioned on $X_{v}^{(0)}$ , the allele count at the bottom of v, into a likelihood conditioned on $X_{v}^{(τ_{v})}$ , the allele count at the top of v.

Lemma 1 (Lifting). Let E be a split or admixture event with corresponding block K_E, let v ∈ K_E be a population within this block, and let x, t be vectors of allele counts and times for the populations in K_E. Then, writing k = x_v for the allele count of v in x and x_(k) = x − ke_v, the conditional likelihood of E is

ℓ_{x}^{E, t} = \sum_{j = 0}^{n_{v}} {[e^{Q^{(n_{v})} \int_{0}^{τ_{v}} \frac{1}{η_{v} (t)} d t}]}_{k j} ℓ_{x_{(k)} + j e_{v}}^{E, t - τ_{v} e_{v}},

(6)

where $Q^{(n)} \in ℝ^{(n + 1) \times (n + 1)}$ is the transition rate matrix of the Moran model with n lineages; in particular, $Q^{(n)} = {(q_{i j}^{(n)})}_{0 \leq i, j \leq n}$ and

q_{i j}^{(n)} = \begin{cases} - i (n - i), & if i = j, \\ \frac{1}{2} i (n - i), & if | j - i | = 1, \\ 0, & else. \end{cases}
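The lifting step can be illustrated for a block containing a single population. The following is a minimal sketch, not the momi2 implementation: the function names are hypothetical, the matrix exponential is a simple scaling-and-squaring Taylor approximation written with the standard library only, and `scaled_time` is assumed to hold the already-evaluated integral of 1/η_v(t) over [0, τ_v].

```python
import math

def moran_rate_matrix(n):
    """Q^(n): diagonal -i(n-i), off-diagonals i(n-i)/2, as in the display above."""
    Q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        rate = 0.5 * i * (n - i)
        Q[i][i] = -2.0 * rate
        if i > 0:
            Q[i][i - 1] = rate
        if i < n:
            Q[i][i + 1] = rate
    return Q

def matmul(A, B):
    size = len(A)
    return [[sum(A[r][m] * B[m][c] for m in range(size)) for c in range(size)]
            for r in range(size)]

def expm(A, terms=12):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    size = len(A)
    norm = max(sum(abs(a) for a in row) for row in A)
    squarings = max(0, math.ceil(math.log2(norm)) + 4) if norm > 0 else 0
    scaled = [[a / 2 ** squarings for a in row] for row in A]
    result = [[float(r == c) for c in range(size)] for r in range(size)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = [[x + y for x, y in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def lift(ell_bottom, n, scaled_time):
    """ell_top[k] = sum_j [exp(Q^(n) * scaled_time)]_{kj} * ell_bottom[j]."""
    P = expm([[q * scaled_time for q in row] for row in moran_rate_matrix(n)])
    return [sum(P[k][j] * ell_bottom[j] for j in range(n + 1)) for k in range(n + 1)]
```

For blocks with several populations, the same matrix would be applied along the axis of the likelihood tensor that corresponds to v, leaving the other axes untouched.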

To process event E, we first use Lemma 1 to lift the conditional likelihoods at the child populations up to the time of E. We then apply one of Lemma 2, Lemma 3, or Lemma 4 to obtain the conditional likelihood at E from the lifted child likelihoods, depending on whether E is an admixture or split event, and whether the child populations of E fall into a single cluster or two independent clusters.

We first consider admixture events; Lemma 2 describes how to compute the conditional likelihood in this case. Let the child population be w and the parent populations be v₁, v₂. Each of the n_w lineages in w independently inherits from v₁ with probability q₁, or from v₂ with probability q₂ = 1 − q₁. So the number of lineages inheriting from v₁ is Binomial(n_w, q₁). Then, given that m₁ alleles are inherited from v₁ and m₂ = n_w − m₁ are inherited from v₂, the particular alleles inherited from v₁ or v₂ are chosen by sampling without replacement.

Lemma 2 (Admixture event).

Let E be an admixture event, with child population w and parent populations v₁, v₂. Let E′ be the child event in $T$. Suppose each lineage in w comes from v₁ with probability q₁, and from v₂ with probability q₂ = 1 − q₁. For K_E the population cluster at E, let x_∩ be a vector of allele counts on K_E \ {v₁, v₂}. Then the conditional likelihood of allele counts $x_{\cap} + x_{1} e_{v_{1}} + x_{2} e_{v_{2}}$ at E is given by

ℓ_{x_{\cap} + x_{1} e_{v_{1}} + x_{2} e_{v_{2}}}^{E, 0} = \sum_{x_{w} = 0}^{n_{w}} ℓ_{x_{\cap} + x_{w} e_{w}}^{E^{'}, τ_{w} e_{w}} \sum_{\overset{m_{1}, m_{2} :}{m_{1} + m_{2} = n_{w}}} (\begin{matrix} n_{w} \\ m_{1} \end{matrix}) q_{1}^{m_{1}} q_{2}^{m_{2}} \sum_{\overset{j_{1}, j_{2} :}{j_{1} + j_{2} = x_{w}}} \frac{(\begin{matrix} x_{1} \\ j_{1} \end{matrix}) (\begin{matrix} n_{v_{1}} - x_{1} \\ m_{1} - j_{1} \end{matrix})}{(\begin{matrix} n_{v_{1}} \\ m_{1} \end{matrix})} \frac{(\begin{matrix} x_{2} \\ j_{2} \end{matrix}) (\begin{matrix} n_{v_{2}} - x_{2} \\ m_{2} - j_{2} \end{matrix})}{(\begin{matrix} n_{v_{2}} \\ m_{2} \end{matrix})} .

(7)
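To make the structure of equation (7) concrete, the sketch below evaluates its binomial and hypergeometric weights in the simplest setting where the cluster contains no populations other than v₁ and v₂ (so x_∩ is empty). The function names are hypothetical and the code assumes n_w ≤ n_{v₁} and n_w ≤ n_{v₂}; it is an illustration rather than the momi2 implementation.

```python
from math import comb

def hypergeom(j, total, derived, draws):
    """P(j derived among `draws` alleles drawn without replacement
    from `total` alleles of which `derived` are derived)."""
    if j < 0 or j > derived or draws - j < 0 or draws - j > total - derived:
        return 0.0
    return comb(derived, j) * comb(total - derived, draws - j) / comb(total, draws)

def admixture_weight(x_w, x1, x2, n_w, n_v1, n_v2, q1):
    """Coefficient multiplying the child likelihood at count x_w in equation (7)."""
    assert n_w <= n_v1 and n_w <= n_v2  # assumption of this sketch
    q2 = 1.0 - q1
    weight = 0.0
    for m1 in range(n_w + 1):          # lineages of w inherited from v1
        m2 = n_w - m1                  # lineages of w inherited from v2
        binom = comb(n_w, m1) * q1 ** m1 * q2 ** m2
        inner = sum(hypergeom(j1, n_v1, x1, m1) * hypergeom(x_w - j1, n_v2, x2, m2)
                    for j1 in range(x_w + 1))
        weight += binom * inner
    return weight

def admixture_likelihood(child_ell, n_v1, n_v2, q1):
    """Likelihood at E indexed by [x1][x2], given the lifted child vector child_ell[x_w]."""
    n_w = len(child_ell) - 1
    return [[sum(child_ell[x_w] * admixture_weight(x_w, x1, x2, n_w, n_v1, n_v2, q1)
                 for x_w in range(n_w + 1))
             for x2 in range(n_v2 + 1)]
            for x1 in range(n_v1 + 1)]
```

For fixed (x₁, x₂) the weights sum to one over x_w, which gives a quick sanity check on any implementation of the lemma.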

We next consider a split event E, with parent population v and child populations w₁, w₂. We first consider the case where E has 2 distinct children in $T$ , i.e., w₁, w₂ fall into 2 distinct blocks beneath E. Denote the corresponding child events as E₁′, E₂′ respectively. Then the conditional likelihood at E is given by a convolution of the conditional likelihoods at E₁′, E₂′, as described in Lemma 3:

Lemma 3 (Population split, 2 clusters). Let E be a split event with parent population v and child populations w₁, w₂. Suppose E has 2 child events E₁′, E₂′, with corresponding blocks $K_{E_{1}^{'}}, K_{E_{2}^{'}}$, where $w_{1} \in K_{E_{1}^{'}}, w_{2} \in K_{E_{2}^{'}}$. Let x₋₁ be allele counts on $K_{E_{1}^{'}} \setminus \{w_{1}\}$ and x₋₂ be allele counts on $K_{E_{2}^{'}} \setminus \{w_{2}\}$. Then the conditional likelihood at E is

ℓ_{x_{- 1} + x_{- 2} + x_{v} e_{v}}^{E, 0} = \sum_{\overset{x_{1}, x_{2} :}{x_{1} + x_{2} = x_{v}}} \frac{(\begin{matrix} n_{w_{1}} \\ x_{1} \end{matrix}) (\begin{matrix} n_{w_{2}} \\ x_{2} \end{matrix})}{(\begin{matrix} n_{v} \\ x_{v} \end{matrix})} ℓ_{x_{1} e_{w_{1}} + x_{- 1}}^{E_{1}^{'}, τ_{w_{1}} e_{w_{1}}} ℓ_{x_{2} e_{w_{2}} + x_{- 2}}^{E_{2}^{'}, τ_{w_{2}} e_{w_{2}}} .

(8)
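Equation (8) is a hypergeometrically weighted convolution of the two lifted child likelihoods. A minimal sketch follows, under the simplifying assumptions that each child block contains only w₁ or w₂ (so x₋₁ and x₋₂ are empty) and that n_v = n_{w₁} + n_{w₂}; the function name is hypothetical.

```python
from math import comb

def split_two_clusters(ell1, ell2):
    """Combine lifted child likelihood vectors ell1[x1], ell2[x2] into the
    parent vector indexed by x_v = x1 + x2, following equation (8)."""
    n1, n2 = len(ell1) - 1, len(ell2) - 1
    n_v = n1 + n2  # assumption of this sketch
    parent = []
    for x_v in range(n_v + 1):
        total = 0.0
        for x1 in range(max(0, x_v - n2), min(n1, x_v) + 1):
            x2 = x_v - x1
            weight = comb(n1, x1) * comb(n2, x2) / comb(n_v, x_v)
            total += weight * ell1[x1] * ell2[x2]
        parent.append(total)
    return parent
```

Because the weights form a hypergeometric distribution over (x₁, x₂) for each x_v, all-ones child vectors map to an all-ones parent vector.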

Next, consider the case where the child populations w₁, w₂ fall into the same cluster beneath E, so that E has just one child event, say E′. The next lemma describes how to obtain the conditional likelihood at E from the conditional likelihood at E′. This involves summing over the dimensions corresponding to w₁, w₂ within the conditional likelihood $ℓ^{E^{'}, τ_{w_{1}} e_{w_{1}} + τ_{w_{2}} e_{w_{2}}}$. In addition, note that we may have $n_{v} < n_{w_{1}} + n_{w_{2}}$ (recall n_v is the number of samples with ancestry in v). That is, after merging w₁, w₂ backwards in time, we may be keeping track of more alleles than originally sampled, allowing us to “drop” some extraneous non-ancestral lineages, as illustrated in the root population of Figure 1. Formally, let $y^{x_{\cap}}$ be the vector whose i-th entry is the likelihood at E conditional on there being a total of i derived alleles in the child populations w₁ and w₂, and also conditional on the vector of allele counts x_∩ in the other populations. If $B \in ℝ^{(n_{w_{1}} + n_{w_{2}} + 1) \times (n_{v} + 1)}$ is a matrix whose (i, j)th entry is the probability of sampling without replacement j alleles in the parent population v out of the total of i alleles segregating in child populations w₁ and w₂, then the conditional likelihood ℓ^E at E satisfies the equation

y^{x_{\cap}} = B ℓ^{E},

leading to the following lemma.

Lemma 4 (Population split, 1 cluster). Let E be a population split with exactly one child event E′. Denote the corresponding population clusters as K_E and K_E′. Denote the parent population as v and the child populations as w₁, w₂. Let x_∩, $y^{x_{\cap}}$ and B be defined as in the preceding paragraph:

y_{i}^{x_{\cap}} = \sum_{\overset{j, k :}{j + k = i}} \frac{(\begin{matrix} n_{w_{1}} \\ j \end{matrix}) (\begin{matrix} n_{w_{2}} \\ k \end{matrix})}{(\begin{matrix} n_{w_{1}} + n_{w_{2}} \\ i \end{matrix})} ℓ_{j e_{w_{1}} + k e_{w_{2}} + x_{\cap}}^{E^{'}, τ_{w_{1}} e_{w_{1}} + τ_{w_{2}} e_{w_{2}}} = ℙ (X_{Leaves (E)} | X_{w_{1}}^{(τ_{w_{1}})} + X_{w_{2}}^{(τ_{w_{2}})} = i, X_{\cap} = x_{\cap})

B_{i, j} = \frac{(\begin{matrix} n_{v} \\ j \end{matrix}) (\begin{matrix} n_{w_{1}} + n_{w_{2}} - n_{v} \\ i - j \end{matrix})}{(\begin{matrix} n_{w_{1}} + n_{w_{2}} \\ i \end{matrix})} .

Then the conditional likelihood at E is given by

ℓ_{k e_{v} + x_{\cap}}^{E, 0} = {[B^{+} y^{x_{\cap}}]}_{k},

(9)

with B⁺ denoting the Moore-Penrose pseudoinverse of B.
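As an illustration of equation (9), the sketch below builds B with exact rational arithmetic and recovers ℓ^E from y = Bℓ^E. It relies on the observation that B_{j,j} > 0 and B_{i,j} = 0 for i < j, so B has full column rank and B⁺y coincides with the least-squares solution (BᵀB)⁻¹Bᵀy. The function names are hypothetical, and a production implementation would more likely call a numerical pseudoinverse routine.

```python
from fractions import Fraction
from math import comb

def build_B(n_w1, n_w2, n_v):
    """B[i][j] = C(n_v, j) C(N - n_v, i - j) / C(N, i), with N = n_w1 + n_w2."""
    N = n_w1 + n_w2
    B = []
    for i in range(N + 1):
        row = []
        for j in range(n_v + 1):
            if 0 <= i - j <= N - n_v:
                row.append(Fraction(comb(n_v, j) * comb(N - n_v, i - j), comb(N, i)))
            else:
                row.append(Fraction(0))
        B.append(row)
    return B

def pinv_apply(B, y):
    """Return (B^T B)^{-1} B^T y, equal to B^+ y when B has full column rank."""
    rows, cols = len(B), len(B[0])
    A = [[sum(B[r][a] * B[r][b] for r in range(rows)) for b in range(cols)]
         for a in range(cols)]
    rhs = [sum(B[r][a] * y[r] for r in range(rows)) for a in range(cols)]
    for c in range(cols):  # Gauss-Jordan elimination on the normal equations
        pivot = next(r for r in range(c, cols) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        rhs[c], rhs[pivot] = rhs[pivot], rhs[c]
        inv = 1 / A[c][c]
        A[c] = [a * inv for a in A[c]]
        rhs[c] = rhs[c] * inv
        for r in range(cols):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [ar - factor * ac for ar, ac in zip(A[r], A[c])]
                rhs[r] = rhs[r] - factor * rhs[c]
    return rhs
```

Each row of B sums to one, since it is a hypergeometric distribution over the number j of derived alleles retained in v.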

Finally, having computed the conditional likelihoods at event E, we wish to compute the partial SFS ϕ^E at the event. This is given by the recursive formula in Lemma 5, which involves the conditional likelihood at E, the expected number of mutations arising within each parent population v, and the partial SFS at the child events.

Lemma 5. For an event E, define the partial ordering E′ ≺ E if E′ occurs beneath E in the event tree, and let $K_{E}^{new} = K_{E} \setminus (\cup_{E^{'} ≺ E} K_{E^{'}})$ be the populations newly formed at E (i.e., formed by a population split or admixture at E). For $v \in K_{E}^{new}$, let f_v(k) be the truncated SFS in population v,

f_{v} (k) = \frac{1}{θ} E [# mutations at v with X_{v} = k],

which can be computed by the formulas in Kamm et al. (2017). Then the partial SFS at E is given by

ϕ^{E} = \sum_{v \in K_{E}^{new}} \sum_{k = 1}^{n_{v}} f_{v} (k) ℓ_{k e_{v}}^{E, 0} + \begin{cases} 0, & if E is leaf event, \\ ϕ^{E^{'}}, & if ChildEvents (E) = \{E^{'}\}, \\ ϕ^{E_{1}^{'}} \prod_{d \in Leaves (E_{2}^{'})} ℓ_{0}^{d} + ϕ^{E_{2}^{'}} \prod_{d \in Leaves (E_{1}^{'})} ℓ_{0}^{d}, & if ChildEvents (E) = \{E_{1}^{'}, E_{2}^{'}\} . \end{cases}

(10)

3.3. Normalizing constant and other linear functionals

To compute the probability $\frac{ϕ_{z}}{‖ ϕ ‖_{1}}$ of a mutation having observed allele counts z, we need not just ϕ_z, but also the normalizing constant $‖ ϕ ‖_{1} = \sum_{z} ϕ_{z}$, the expected total branch length.

Computing ‖ϕ‖₁ directly is inefficient because of the $O (\prod_{d = 1}^{D} n_{d})$ possible entries z. Instead, we can use Algorithm 2 to compute ‖ϕ‖₁, and many more statistics of the SFS, in the same time as a single entry:

Theorem 2. For $π^{d} \in ℝ^{n_{d} + 1}, d \in {1, \dots, D}$ , the tensor dot product of the SFS ϕ against $π^{1} \otimes \dots \otimes π^{D} = {[π_{z_{1}}^{1} \dots π_{z_{D}}^{D}]}_{z_{1}, \dots, z_{D}}$ is

ϕ ⊙ (π^{1} \otimes \dots \otimes π^{D}) = \sum_{z_{1}, \dots, z_{D}} ϕ_{z_{1}, \dots, z_{D}} π_{z_{1}}^{1} \dots π_{z_{D}}^{D} = DP (π^{1}, \dots, π^{D}) - (\prod_{d = 1}^{D} π_{0}^{d}) DP (e_{0}, \dots, e_{0}) - (\prod_{d = 1}^{D} π_{n_{d}}^{d}) DP (e_{n_{1}}, \dots, e_{n_{D}}) .

Theorem 2 says that for any rank-K tensor $A \in ℝ^{(n_{1} + 1) \times \dots \times (n_{D} + 1)}$ with

A = \sum_{k = 1}^{K} a_{1}^{(k)} \otimes \dots \otimes a_{D}^{(k)},

ϕ ⊙ A = \sum_{z} ϕ_{z} A_{z} = \sum_{k = 1}^{K} ϕ ⊙ (a_{1}^{(k)} \otimes \dots \otimes a_{D}^{(k)})

can be computed in K calls to $DP (π^{1}, \dots, π^{D})$.³ In particular, the expected total branch length ‖ϕ‖₁ is given by

‖ ϕ ‖_{1} = \sum_{z} ϕ_{z} = ϕ ⊙ (1 \otimes \dots \otimes 1) = DP (1, \dots, 1) - DP (e_{0}, \dots, e_{0}) - DP (e_{n_{1}}, \dots, e_{n_{D}})

with 1 the vector with 1 at every coordinate.
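The identity in Theorem 2 can be checked on a toy example. In the sketch below, `make_toy_dp` is a hypothetical stand-in for Algorithm 2: it returns a multilinear function of its input vectors built from an arbitrary table that also contains the two monomorphic configurations. The helper `sfs_dot` then applies the three-term formula of Theorem 2. None of these names belong to momi2.

```python
from itertools import product
from math import prod

def make_toy_dp(table):
    """Hypothetical stand-in for DP: multilinear in the input vectors,
    summing over all configurations z, including the monomorphic ones."""
    def dp(vectors):
        return sum(value * prod(vec[z_d] for vec, z_d in zip(vectors, z))
                   for z, value in table.items())
    return dp

def unit_vector(size, index):
    return [1 if i == index else 0 for i in range(size)]

def sfs_dot(dp, pis):
    """phi ⊙ (pi^1 ⊗ ... ⊗ pi^D) via the three DP calls in Theorem 2."""
    all_ancestral = [unit_vector(len(pi), 0) for pi in pis]
    all_derived = [unit_vector(len(pi), len(pi) - 1) for pi in pis]
    w0 = prod(pi[0] for pi in pis)
    wn = prod(pi[-1] for pi in pis)
    return dp(pis) - w0 * dp(all_ancestral) - wn * dp(all_derived)

# Toy table for two populations with n = (2, 1); the values are arbitrary.
ns = (2, 1)
table = {z: 1 + 3 * z[0] + 7 * z[1] for z in product(*(range(n + 1) for n in ns))}
dp = make_toy_dp(table)
total_branch_length = sfs_dot(dp, [[1] * (n + 1) for n in ns])
```

Here `total_branch_length` equals the sum of the table over polymorphic configurations only, mirroring the formula for ‖ϕ‖₁ above.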

Beyond the applications we explore here, we expect this result to be useful in related settings. A number of population genetic statistics can be expressed as f ⨀ A, including Watterson’s estimator θ_W of the mutation rate (Watterson, 1975), Fay and Wu’s H statistic for positive selection (Fay and Wu, 2000), and Patterson’s f₂, f₃, f₄ statistics for assessing topology (Patterson et al., 2012). Theorem 2 allows us to compute their expected values $E [f ⊙ A] = θ ϕ ⊙ A$ , and to construct test statistics from the deviance f ⨀ A − θϕ⨀A under an appropriate null model.

An even wider class of population genetic statistics can be written as nonlinear functions of SFS-tensor products like $g (\frac{1}{θ} f ⊙ A_{1}, \frac{1}{θ} f ⊙ A_{2}, \dots)$ ; this class includes Tajima’s D statistic for selection (Tajima, 1989), the F_ST statistic for population structure (Holsinger and Weir, 2009), and Patterson’s D statistic for introgression (Patterson et al., 2012). These statistics may be viewed as plug-in estimators for g(ϕ⨀ A₁, ϕ⨀A₂,…) which we can compute with Theorem 2. Note that these estimators are biased due to the nonlinear function g, but the bias can be estimated via block jackknife, and will typically be small since $\frac{1}{θ} f ⊙ A \to ϕ ⊙ A$ almost surely as the number of independent SNPs grows.

Another interesting linear statistic of the SFS that can be computed with Theorem 2 is $E [T_{MRCA}]$ , the expected time of the most recent common ancestor. In particular, let d be any leaf population; for simplicity assume d is sampled at the present (i.e. d is not archaic). Then

E [T_{MRCA}] = \sum_{z_{1}, \dots, z_{D}} ϕ_{z_{1}, \dots, z_{D}} \frac{z_{d}}{n_{d}} = ϕ ⊙ (1 \otimes \dots \otimes {(\frac{z_{d}}{n_{d}})}_{z_{d} \in {0, \dots, n_{d}}} \otimes \dots \otimes 1) .

To see this, note that $E [T_{MRCA}]$ is proportional to the expected number of mutations hitting an arbitrary lineage in d, and if a mutation has configuration $z = (z_{1}, \dots, z_{D})$ of derived copies, then the chance that it hits the lineage is $\frac{z_{d}}{n_{d}}$ (Zeng et al., 2006).
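As a sanity check of this identity, consider a single population of constant size in standard coalescent units (each pair of lineages coalesces at rate 1). Under that assumption, which is not specific to this paper, the expected branch length subtending k of n samples is 2/k and E[T_MRCA] = 2(1 − 1/n). The hypothetical helpers below confirm that the weighted sum reproduces this value exactly.

```python
from fractions import Fraction

def constant_size_sfs(n):
    """Expected branch length subtending k of n leaves, for k = 1..n-1,
    in a constant-size population (standard coalescent units)."""
    return {k: Fraction(2, k) for k in range(1, n)}

def expected_tmrca_from_sfs(phi, n):
    """Weight each SFS entry by k/n, the chance that a mutation with k
    derived copies hits one fixed sampled lineage."""
    return sum(phi[k] * Fraction(k, n) for k in phi)
```

Each term of the sum equals 2/n, and there are n − 1 terms, which gives 2(n − 1)/n.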

4. Application

We tested our method on a demographic inference problem in human genetics that is currently of interest. Lazaridis et al. (2014) showed that genetic variation in present-day Europeans suggests an admixture model involving three ancestral meta-populations: Ancient North Eurasian (ANE), Western Hunter Gatherers (WHG), and Early European Farmers (EEF). They also showed that EEF contains ancestry from a source that is an outgroup to all non-African populations, and yet shares much of the genetic drift common to non-African populations; they dubbed this ancestry component as “Basal Eurasian” ancestry. Later work (Lazaridis et al., 2016) showed that Basal Eurasian ancestry is shared by ancient and contemporary Middle Eastern populations, and is correlated with a decrease in Neanderthal ancestry, implying that Basal Eurasian ancestry contains lower levels of Neanderthal admixture when compared with non-Basal ancestry. The results from Lazaridis et al. (2014, 2016) were based on several related methods for modeling covariances in population allele frequencies, most notably qpGraph and qpAdm (Patterson et al., 2012; Haak et al., 2015). These methods are computationally efficient, and are robust to ascertainment bias and misspecification of the population size history; however, they are unable to infer the timing of demographic events.

We applied momi2 to estimate the strength and timing of basal Eurasian admixture into early European farmers, and the split time of the basal Eurasian lineage. To do this, we built a demographic model relating 12 samples from 8 populations, shown in Figure 3. These samples consisted of the Altai Neanderthal (Prüfer et al., 2014); the 45,000 year old Ust’Ishim man from Siberia (Fu et al., 2014); 3 present-day populations (Mbuti, Sardinian, Han) with 3, 2, and 2 samples respectively; and 3 ancient samples representing the European ancestry components identified by Lazaridis et al. (2014): a 7,500 year old sample from the Linearbandkeramik (LBK) culture (representing EEF), an 8,000 year old sample from the Loschbour rock shelter in Luxembourg (representing WHG), and the 24,000 year old Mal’ta boy (“MA1”) from Siberia (representing ANE). After data cleaning, our dataset consisted of 2.4 × 10⁶ autosomal transversion SNPs. See Appendix A.4 for more details about the data.

Fig. 3 — Inferred model and bootstraps for the 11 population demography described in Section 4. The y-axis is linear below 5 × 10⁴ and logarithmic above 5 × 10⁴. In the foreground (blue branches) is our point estimate from maximum composite likelihood; in the background (gray branches, with translucent pulse arrows) are 300 nonparametric bootstraps. Population sampling times are indicated by open circles; note that demographic events may still involve populations below their sampling times, as in the case of Loschbour. The nonparametric bootstrap estimates were created by splitting the data into 100 equally sized contiguous blocks, resampling these blocks with replacement, and refitting the model.

To construct the topology of the model in Figure 3, we first obtained a tree by neighbor joining (Saitou and Nei, 1987), then added 3 extra admixture events reflecting prior knowledge, as well as a recent Neanderthal population decline starting at the Mbuti-Eurasian split. We inferred split times, population sizes (including the Neanderthal decline), and admixture times and proportions by maximizing a composite likelihood $L$ , given by the product of the likelihoods at every SNP:

L = \prod_{s \in SNPs} ℙ (z_{s})

(11)

where $ℙ (z_{s}) \propto ϕ_{z_{s}}$ and was computed by momi2. The low coverage of the MA1 sample and the deep divergence of the Neanderthal sample may cause bias in SFS entries where only these samples contain derived alleles; we thus excluded these entries and corrected the normalizing constant appropriately (see Appendix A.4).
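Because equation (11) depends on the data only through the number of SNPs observed in each configuration, its logarithm can be evaluated from the observed SFS counts. The sketch below is a simplified illustration with a hypothetical function name: it assumes every polymorphic configuration is retained, so the normalizing constant is the plain sum of ϕ, and it does not reproduce the ascertainment correction described above or the momi2 interface.

```python
import math

def composite_log_likelihood(counts, phi):
    """log L = sum_z counts[z] * log(phi[z] / ||phi||_1).

    counts: dict mapping configuration z to the number of SNPs observed with z.
    phi:    dict mapping every polymorphic configuration z to its expected SFS entry.
    """
    norm = sum(phi.values())
    return sum(c * math.log(phi[z] / norm) for z, c in counts.items())
```

Working on the log scale avoids underflow when the product runs over millions of SNPs.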

We used automatic differentiation to compute the gradient $\nabla \log L$, which we used to search for the optimum of $\log L$. We constructed nonparametric bootstrap confidence intervals by splitting the genome into 100 equally sized blocks, resampling these blocks to create 300 bootstrap datasets, and re-inferring the demography for each bootstrap dataset. We also used 300 parametric bootstraps to assess how well we could infer the demography under simulated data; for each parametric bootstrap dataset, we used msprime (Kelleher et al., 2016) to simulate ten 300 Mb chromosomes from our initial point estimate, and re-inferred the demography. The nonparametric bootstrap was used for all confidence intervals reported below.
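The nonparametric block bootstrap can be sketched as follows. This is an illustration under simplifying assumptions, not the pipeline used for the analysis: SNPs are represented as an ordered list of configurations, blocks are formed by SNP count rather than by genomic coordinates, and the function names are hypothetical.

```python
import random
from collections import Counter

def split_into_blocks(snp_configs, n_blocks=100):
    """Split an ordered list of SNP configurations into contiguous blocks
    of (nearly) equal size."""
    size, extra = divmod(len(snp_configs), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(snp_configs[start:end])
        start = end
    return blocks

def bootstrap_sfs(blocks, rng):
    """Resample blocks with replacement and tabulate the resulting SFS counts."""
    resampled = rng.choices(blocks, k=len(blocks))
    sfs = Counter()
    for block in resampled:
        sfs.update(block)
    return sfs

def bootstrap_replicates(snp_configs, n_replicates=300, n_blocks=100, seed=0):
    """Generate bootstrap SFS datasets; the model would be refit on each one."""
    rng = random.Random(seed)
    blocks = split_into_blocks(snp_configs, n_blocks)
    return [bootstrap_sfs(blocks, rng) for _ in range(n_replicates)]
```

Resampling whole blocks rather than individual SNPs preserves the correlation between nearby sites that the composite likelihood ignores.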

Our inferred demography, along with nonparametric bootstrap re-estimates, is shown in Figure 3 and Table 2. Our parametric bootstrap estimates are shown in Figure 4. We inferred a pulse of 0.094 (95% CI of 0.049–0.174) from the ghost Basal Eurasian population to EEF ancestry (LBK), substantially less than the 0.44 inferred by Lazaridis et al. (2014). This admixture was inferred to occur 33.7 kya (95% CI of 10.8–41.1 kya), shortly after the Loschbour-LBK split at 37.7 kya (95% CI of 32.2–42.3 kya). The split time of the ghost Basal Eurasian lineage from other Eurasians was inferred at 79.8 kya (95% CI of 67.4–101 kya). Other parameters were broadly in line with previous estimates, such as a Mbuti-Eurasian split of 96 kya, a Han-European split of 50 kya, a Neanderthal split of 696 kya, and Eurasians deriving 0.03 of their ancestry from Neanderthal (Terhorst et al., 2017; Green et al., 2010; Meyer et al., 2016).

Table 2.

Estimated parameters of the demography in Figure 3, along with nonparametric bootstrap estimates of the bias and standard deviation, and 95% bootstrap quantiles. We use (A, B) to denote the ancestor of A and B; t_v and N_v to denote the height and size at vertex v, and t_A→B and p_A→B to denote respectively the time and strength of an admixture arrow from A to B.

Parameter	Estimate	Bias	SD	2.5%	97.5%
N_Losch	1.92e+03	14.1	185	1.57e+03	2.27e+03
N_Mbu	1.73e+04	−255	1.86e+03	1.6e+04	1.92e+04
N_(Mbu,Losch)	2.91e+04	−270	2.34e+03	2.7e+04	3.21e+04
t_(Mbu,Losch)	9.58e+04	−1.48e+03	9.32e+03	9.19e+04	1.03e+05
N_Han	6.3e+03	−17.1	400	5.71e+03	6.91e+03
N_(Han,Losch)	2.34e+03	−70.2	372	2.13e+03	2.7e+03
t_(Han,Losch)	5.04e+04	−171	2.75e+03	4.71e+04	5.43e+04
t_(Ust,Losch)	5.15e+04	−79.4	2.47e+03	4.85e+04	5.47e+04
N_(Nean,Losch)	1.82e+04	−216	1.59e+03	1.7e+04	1.99e+04
t_(Nean,Losch)	6.96e+05	−7.6e+03	5.74e+04	6.49e+05	7.58e+05
t_{Nean→Eurasian}	5.68e+04	−279	3.04e+03	5.38e+04	6.01e+04
p_{Nean→Eurasian}	0.0296	−2.82e-05	0.00252	0.0251	0.0349
N_Nean	86.9	−3.33	19.8	76.7	105
t_(MA1,Losch)	4.49e+04	98.2	2.36e+03	4.11e+04	4.87e+04
N_LBK	75.7	−685	629	4.12	1.9e+03
t_(LBK,Losch)	3.77e+04	543	2.59e+03	3.22e+04	4.23e+04
p_Basal→EEF	0.0936	−0.0122	0.0366	0.0485	0.174
t_Basal→EEF	3.37e+04	7.46e+03	1.01e+04	1.08e+04	4.11e+04
t_{(Basal,Losch)}	7.98e+04	−706	1.37e+04	6.74e+04	1.01e+05
N_Sard	1.5e+04	−1.28e+04	6.53e+04	8.58e+03	8.9e+04
t_(Sard,LBK)	7.69e+03	−1.69e+03	1.64e+03	7.51e+03	1.24e+04
N_(Sard,LBK)	1.2e+04	1.32e+03	1.99e+03	7.33e+03	1.45e+04
t_{GhostWHG→Sard}	1.23e+03	−2.93e+03	3.1e+03	597	1.06e+04
p_{GhostWHG→Sard}	0.0317	−0.00223	0.0151	0.00631	0.0618
t_{(GhostWHG,Losch)}	1.56e+03	−1.32e+04	1.07e+04	964	3.49e+04


Fig. 4 — Parametric bootstraps for the demography inferred in Figure 3. The inferred demography (blue tree) was used to simulate 300 bootstrap replicates. Re-running our method on each replicate produced a new inferred demography which is plotted in gray.

Inferring the optimal demography from start to finish took 2.5 hours on a laptop with 4 CPU cores, and used 2 GB RAM. The 300 bootstraps were run separately on a high-performance computing cluster. To our knowledge, no other method can infer this demographic model using the full SFS. The moments software package (Jouganous et al., 2017) is capable of computing the SFS for up to 5 populations, fewer than the 8 populations here, though it can scale to more individuals per population than momi2. While the fastsimcoal2 software package (Excoffier et al., 2013) is capable of handling demographies of this size and larger, it does not compute the full, exact SFS, and also does not include an option for the ascertainment scheme we use here (excluding mutations private to Neanderthal and MA1).

We also used our model to produce estimates of the human mutation rate, which we estimated as 1.22 × 10⁻⁸ per base per generation, with a bootstrap-quantile 95% CI of (1.12, 1.32) × 10⁻⁸, closely matching previous estimates of the human mutation rate (Scally, 2016). To obtain this estimate, we compared the observed nucleotide diversity with that expected under the inferred demography, adjusting by the empirical transition to transversion ratio (Appendix A.4). Estimating the mutation rate was possible here because we did not use a prespecified mutation rate to estimate the model in Figure 3, instead using the known ages of the Ust’Ishim, LBK, Loschbour, and MA1 samples to calibrate dates.

Acknowledgments

This research is supported in part by an NIH grant R01-GM109454 (YSS); a Packard Fellowship for Science and Engineering (YSS); Wellcome Trust grants WT206194 and WT207492 (RD); and the Human Frontiers Science Program fellowship LT000402/2017 (JK). YSS is a Chan Zuckerberg Biohub investigator.

A. Appendix

A.1. Model and notation

In this section, we formally define the stochastic process underlying our model, and introduce some additional notation needed for the proofs in Section A.2.

The main stochastic process we use is the lookdown construction of the Moran model (Donnelly and Kurtz, 1996; Donnelly et al., 1999), a variant of the Moran model where copying only occurs in one “direction” (as in Figure 5). We will make use of certain conditional independence properties that result from this one-way copying. However, we note that there is a simple coupling between the lookdown and standard Moran models, and these two models generate data with the same distribution.

Fig. 5 — Standard and lookdown Moran models. (a) illustrates the standard Moran model, with copying in both directions; (b) illustrates the lookdown Moran model, with copying always from left to right. The sample genealogy (i.e., the lineages ancestral to present day samples) is shown in solid lines. The law of the genealogy is the coalescent under both models, and so the present-day samples of the two models are equal in distribution.

We now describe our lookdown model in more detail. Within each of the $D$ leaf populations we consider a countably infinite number of lineages, each with a unique label in $ℤ_{+}$ . We arbitrarily assign the $(n_{1}, \dots, n_{D})$ sampled lineages to the lowest labels {1,2, …, n_tot}, where $n_{tot} \equiv \sum_{d = 1}^{D} n_{d}$ , and arbitrarily assign the remaining unsampled lineages to labels {n_tot + 1,n_tot + 2, …}. Each lineage extends infinitely backwards in time, and at each admixture event, each lineage randomly chooses the parent population it extends into. In addition, each lineage has an allele, which changes through time due to mutation and copying events.

Similar to the usual Moran model, copying between a pair of lineages in population v occurs at rate $\frac{1}{η_{v} (t)}$ , where η_ν(t) is the scaled population size of v. However, copying only occurs in one direction, from lineages with lower labels to lineages with higher labels (Figure 5). So if two lineages are labeled i and j respectively, with i < j, then the copying event i → j, occurs at rate $\frac{1}{η_{v} (t)}$ , whereas the reverse copying event i ← j never happens (has rate 0). By contrast, in the standard (non-lookdown) Moran model, both i → j and i ← j copying events happen at rate $\frac{1}{2 η_{v} (t)}$ . However, under both models, two lineages going backwards in time coalesce at rate $\frac{1}{η_{v} (t)}$ , as in the coalescent. Thus, the sample genealogies under both models follow the same multipopulation coalescent distribution.

Denote the labeled lineages in population v, and their alleles at height t, by

M_{v, t} = (M_{v, t, (1)}, M_{v, t, (2)}, \dots)

where $M_{v, t, (i)} = ({label}_{v, (i)}, {allele}_{v, t, (i)}) \in ℤ_{+} \times {0, 1}$ is a pair, consisting of the ith lowest label at v, along with its corresponding allele at height t ∈[0, τ_ν). Note that we have ordered the lineages by their labels, so that label_ν,(i) < label_ν,(i+1). Also note that we measure the time t from the bottom of v, so that allele_ν,0,(i) denotes the ith allele at the bottom of v, and ${allele}_{v, τ_{v}, (i)}$ the ith allele at the top of v.

Let $X_{v, m}^{(t)} = \sum_{i = 1}^{m} I_{{M_{v, t, (i)} derived}}$ denote the number of derived alleles among the lowest m lineages at v, t. Let $n_{v} = \sum_{d ⪰ v} n_{d}$ be the number of samples in leaves below v. During computation, we will only need to keep track of the lowest n_v lineages in v, so we denote $X_{v}^{(t)} \equiv X_{v, n_{v}}^{(t)}$ to be the number of derived alleles among the first n_v lineages at v, t.

Finally, for an event E with K_E = (ν₁, …, ν_k) the corresponding block of populations in Algorithm 1, let t = (t₁, …, t_k) be a corresponding set of times within each population ν₁, …, ν_k. We define $M_{E, t} = (M_{v_{1}, t_{1}}, \dots, M_{v_{k}, t_{k}})$ as the labeled alleles at each of (ν₁,t₁), …, (ν_k, t_k)

A.2. Proofs

For the reader’s convenience, theorem statements are reproduced from the main text before each proof. The following two lemmas will be useful for several of the proofs below:

Lemma 6. The distribution of $M_{E, t}$ is invariant to finite permutations of the labels within any population v ∈ K_E. Furthermore, the labels are independent of the alleles.

Proof. By construction, none of the lineages within the populations in K_E are ancestral to each other. Thus the sample genealogy of any finite subsample of the lineages is the multipopulation coalescent, because going backwards in time, coalescence (copying) between each pair of lineages occurs at rate $\frac{1}{η_{w} (s)}$ at vertex w time s. The invariance to permutation, and independence of alleles and labels, follows from the coalescent. □

Lemma 7. For event E and population u ∈ K_E, let $I \subset \{n_{u} + 1, n_{u} + 2, \dots\}$ be a (possibly random) collection of integers greater than n_u, and let $X_{u, I}^{(t_{u})}$ be the number of derived lineages in $\{M_{u, t_{u}, (i)}\}_{i \in I}$. Then X_Leaves(E) is conditionally independent of $X_{u, I}^{(t_{u})}$, the allele counts on $I$, given $X_{K_{E}}^{(t)}$, the allele counts on the first $n_{v_{1}}, \dots, n_{v_{k}}$ lineages of K_E = {v₁, …, v_k}.

Proof. Integrate over $M_{E, t, n_{v}} \equiv \{M_{v, t_{v}, (i)}\}_{v \in K_{E}, 1 \leq i \leq n_{v}}$, the first $(n_{v_{1}}, \dots, n_{v_{k}})$ labeled alleles within K_E, and $M_{u, t_{u}, I} = \{M_{u, t_{u}, (i)}\}_{i \in I}$, the labeled alleles at $I$:

ℙ (X_{Leaves (E)} | X_{K_{E}}^{(t)}, X_{u, I}^{(t_{u})}) = E [ℙ (X_{Leaves (E)} | M_{E, t, n_{v}}, M_{u, t_{u}, I}) | X_{K_{E}}^{(t)}, X_{u, I}^{(t_{u})}] = E [ℙ (X_{Leaves (E)} | M_{E, t, n_{v}}) | X_{K_{E}}^{(t)}, X_{u, I}^{(t_{u})}] = E [ℙ (X_{Leaves (E)} | M_{E, t, n_{v}}) | X_{K_{E}}^{(t)}] = ℙ (X_{Leaves (E)} | X_{K_{E}}^{(t)})

with the second equality because higher lineages cannot copy to lower lineages (so $X_{Leaves (E)} ⊥ M_{u, t_{u}, I} | M_{E, t, n_{v}}$ ), and the third equality because of the exchangeability and independence from Lemma 6 (so given $X_{K_{E}}^{(t)}$ , the alleles of $M_{E, t, n_{v}}$ are ordered by a uniform permutation independent of $X_{u, I}^{(t_{u})}$ , and the labels of $M_{E, t, n_{v}}$ are independent of $X_{K_{E}}^{(t)}$ , $X_{u, I}^{(t_{u})}$ ). □

A.2.1. Proof of Theorem 1

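Theorem 1. If $(X_{1}, \dots, X_{D}) \neq \mathbf{0}, \mathbf{n}$, where $\mathbf{n} = (n_{1}, \dots, n_{D})$ (that is, the site is polymorphic in the sample), then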

ϕ = DP (e_{X_{1}}, \dots, e_{X_{D}}),

where $e_{X_{i}} = (0, \dots, 1, \dots, 0) \in ℝ^{n_{i} + 1}$ denotes the vector with 1 at coordinate X_i and 0 elsewhere.

Proof. Lemmas 2–4 give formulas for the partial likelihood $ℓ_{x}^{E, 0}$ using terms like $ℓ_{x^{'}}^{E^{'}, t^{'}}$ , where E′ ∈ Children(E) and E is an admixture, 2-cluster split, or 1-cluster split, respectively. The terms $ℓ_{x^{'}}^{E^{'}, t^{'}}$ in turn can be computed from the partial likelihoods $ℓ_{x^{'}}^{E^{'}, 0}$ using Lemma 1; then Lemma 5 provides a formula for ϕ^E in terms of ℓ^E,0 and the partial SFS ϕ^E′.

Algorithm 2 traverses the event tree $T$ in a depth-first search, applying Lemmas 1–5 to compute $ℓ_{x}^{E, 0}$ and ϕ^E at each event E from their values at the children of E. The inputs to Algorithm 2 are the likelihoods ℓ^{d},0 at the leaves, and the output is the partial SFS $ϕ^{Root (T)}$ at the root.

Thus, since

ℓ^{{d}, 0} = {[ℙ (X_{d} | X_{d} = i)]}_{0 \leq i \leq n_{d}} = e_{X_{d}},

it follows that $DP (e_{X_{1}}, \dots, e_{X_{D}}) = ϕ^{Root (T)} = ϕ$ . □

A.2.2. Proof of Lemma 5

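Lemma 5. For an event E, define the partial ordering E′ ≺ E if E′ occurs beneath E in the event tree, and let $K_{E}^{new} = K_{E} \setminus (\cup_{E^{'} ≺ E} K_{E^{'}})$ be the populations newly formed at E (i.e., formed by a population split or admixture at E). For $v \in K_{E}^{new}$, let f_v(k) be the truncated SFS in population v,

f_{v} (k) = \frac{1}{θ} E [# mutations at v with X_{v} = k],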

which can be computed by the formulas in Kamm et al. (2017). Then the partial SFS at E is given by

ϕ^{E} = \sum_{v \in K_{E}^{new}} \sum_{k = 1}^{n_{v}} f_{v} (k) ℓ_{k e_{v}}^{E, 0} + \begin{cases} 0, & if E is leaf event, \\ ϕ^{E^{'}}, & if ChildEvents (E) = \{E^{'}\}, \\ ϕ^{E_{1}^{'}} \prod_{d \in Leaves (E_{2}^{'})} ℓ_{0}^{d} + ϕ^{E_{2}^{'}} \prod_{d \in Leaves (E_{1}^{'})} ℓ_{0}^{d}, & if ChildEvents (E) = \{E_{1}^{'}, E_{2}^{'}\} . \end{cases}

(10)

Proof. Without loss of generality assume θ = 1. Then ϕ^E is the expected number of mutations at or below K_E with observed counts X_Leaves(E). We split ϕ^E into two parts: the mutations within $K_{E}^{new} = K_{E} \setminus (\cup_{E^{'} ≺ E} K_{E^{'}})$ (i.e. the populations formed by a split or join at E); and the mutations that occur in $\cup_{E^{'} ≺ E} K_{E^{'}}$, the populations that arise strictly below E.

The first part, of mutations at the new populations of E, is given by

\sum_{v \in K_{E}^{new}} \sum_{k = 1}^{n_{v}} f_{v} (k) ℓ_{k e_{v}}^{E, 0},

since f_v(k) is the expected number of mutations arising at v with X_v = k, and $ℓ_{k e_{v}}^{E, 0} = ℙ (X_{Leaves (E)} | X_{v} = k)$ .

For the second part, of mutations strictly below E, we split into three cases: either E is a leaf event, or E has a single child event, or E has two child events. In the first case, if E is a leaf event, then no mutations can occur below E. In the second case, if E has a single child event E′, then the expected number of mutations strictly below E is simply ϕ^E′ by definition.

Finally, if E has two child events E₁′, E₂′, then E₁′ and E₂′ share no leaves, so a mutation underneath E₁′ will have no derived alleles in Leaves(E₂′) and vice versa. Thus, the expected number of mutations strictly below E is

ϕ^{E_{1}^{'}} \prod_{d \in Leaves (E_{2}^{'})} ℓ_{0}^{d} + ϕ^{E_{2}^{'}} \prod_{d \in Leaves (E_{1}^{'})} ℓ_{0}^{d} .

□

A.2.3. Proof of Lemma 1

Lemma 1 (Lifting). Let E be a split or admixture event with corresponding block K_E, v ∈ K_E be a population within this block, and x, t be vectors of allele counts and times for the populations in K_E. Then, for x_(k) = x − ke_v, the conditional likelihood of E is

ℓ_{x}^{E, t} = \sum_{j = 0}^{n_{v}} {[e^{Q^{(n_{v})} \int_{0}^{τ_{v}} \frac{1}{η_{v} (t)} d t}]}_{k j} l_{x_{(j)} + j e_{v}}^{E, t - τ_{v} e_{v}},

(6)

where $Q^{(n)} \in ℝ^{(n + 1) \times (n + 1)}$ is the transition rate matrix of the Moran model with n lineages; in particular, $Q^{(n)} = {(q_{i j}^{(n)})}_{0 \leq i, j \leq n}$ and

q_{i j}^{(n)} = \begin{cases} - i (n - i), & \text{if } i = j, \\ \frac{1}{2} i (n - i), & \text{if } | j - i | = 1, \\ 0, & \text{else.} \end{cases}

Proof. Define a “quasi-lookdown” Moran model $M^{*}$ , which is identical to $M$ , except within the n_v lowest lineages of v, where we allow copying in both directions at rate $\frac{1}{2 η_{v} (t)}$ (as in the non-lookdown Moran model).

For an event E with populations K_E = (v₁, v₂,…) and corresponding times t = (t₁, t₂,…), let $X_{K_{E}}^{(t)} = (X_{v_{1}}^{(t_{1})}, X_{v_{2}}^{(t_{2})}, \dots)$ be the corresponding allele counts. Next, define $X_{E, t} = \{X_{K_{E^{'}}}^{(s)}\}_{(E^{'}, s) \preceq (E, t)}$ as the sample path of allele counts below (E, t), where (E′, s) ⪯ (E, t) if either E is above E′, or E = E′ and s ≤ t component-wise. It will suffice to show $ℙ_{M} (X_{E, t}) = ℙ_{M^{*}} (X_{E, t})$ , because then for t = τ_v e_v + t_{−v},

ℓ_{k e_{v} + x_{- v}}^{E, t} = \sum_{j = 0}^{n_{v}} ℙ_{M} (X_{K_{E}}^{(t_{- v})} = j e_{v} + x_{- v} | X_{K_{E}}^{(t)} = k e_{v} + x_{- v}) ℓ_{j e_{v} + x_{- v}}^{E, t_{- v}} = \sum_{j = 0}^{n_{v}} ℙ_{M^{*}} (X_{K_{E}}^{(t_{- v})} = j e_{v} + x_{- v} | X_{K_{E}}^{(t)} = k e_{v} + x_{- v}) ℓ_{j e_{v} + x_{- v}}^{E, t_{- v}} = \sum_{j = 0}^{n_{v}} {[e^{Q^{(n_{v})} \int_{0}^{τ_{v}} \frac{1}{η_{v} (t)} d t}]}_{k j} ℓ_{j e_{v} + x_{- v}}^{E, t_{- v}}

as desired.

$ℙ_{M} (X_{E, t}) = ℙ_{M^{*}} (X_{E, t})$ follows from a coupling argument. Let $M_{E, t} = \{M_{E^{'}, s}\}_{(E^{'}, s) \preceq (E, t)}$ be the sample path of $M$ below (E, t). We can map the partial sample paths of $M^{*}_{v, t}$ onto those of $M_{v, t}$ as follows: moving from the bottom to the top of v, whenever a lower label is copied over by a higher label, swap the labels of the lineages above the copying. Then the relabeled sample path has the same distribution as the lookdown construction, since the allele with the higher label is always copied over, and the rate of copying between pairs of lineages is $\frac{1}{η_{v} (t)}$ . Since this relabeling also leaves $X_{v}^{(t)}$ unchanged, we have $ℙ_{M} (X_{E, t}) = ℙ_{M^{*}} (X_{E, t})$ . □
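The lifting step (6) can be illustrated numerically along a single population. The sketch below builds the Moran rate matrix Q^(n) as defined in Lemma 1 and applies its exponential to a likelihood vector. It is only a sketch: the function names are hypothetical, `scaled_time` stands for the integral of 1/η_v(t) over [0, τ_v], and the matrix exponential is computed with a plain scaling-and-squaring Taylor series rather than the eigendecomposition discussed in Section A.3.

```python
import numpy as np

def moran_rate_matrix(n):
    """Rate matrix Q^(n) of the Moran model on allele counts 0..n."""
    q = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        rate = 0.5 * i * (n - i)      # rate of moving to i - 1 or to i + 1
        q[i, i] = -2.0 * rate         # diagonal entry -i(n - i)
        if i > 0:
            q[i, i - 1] = rate
        if i < n:
            q[i, i + 1] = rate
    return q

def expm_taylor(a, squarings=12, terms=18):
    """exp(a) by scaling and squaring with a truncated Taylor series."""
    scaled = a / (2.0 ** squarings)
    result = np.eye(a.shape[0])
    term = np.eye(a.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k      # scaled^k / k!
        result = result + term
    for _ in range(squarings):
        result = result @ result      # undo the scaling
    return result

def lift(likelihood, n, scaled_time):
    """Apply (6) along one population: sum_j [exp(Q s)]_{kj} likelihood[j]."""
    transition = expm_taylor(moran_rate_matrix(n) * scaled_time)
    return transition @ likelihood
```

For example, `lift(np.ones(5), 4, 0.3)` returns a vector of ones up to rounding, because each row of the transition matrix sums to one.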

A.2.4. Proof of Lemma 2

Lemma 2 (Admixture event).

Let E be an admixture event, with child population w and parent populations v₁, v₂. Let E′ be the child event in $T$ . Suppose each lineage in w comes from v₁ with probability q₁ and from v₂ with probability q₂ = 1 − q₁. For K_E the population cluster at E, let x_∩ be a vector of allele counts on K_E \ {v₁, v₂}. Then the conditional likelihood of allele counts $x_{\cap} + x_{1} e_{v_{1}} + x_{2} e_{v_{2}}$ at E is given by

ℓ_{x_{\cap} + x_{1} e_{v_{1}} + x_{2} e_{v_{2}}}^{E, 0} = \sum_{x_{w} = 0}^{n_{w}} ℓ_{x_{\cap} + x_{w} e_{w}}^{E^{'}, τ_{w} e_{w}} \sum_{\overset{m_{1}, m_{2} :}{m_{1} + m_{2} = n_{w}}} (\begin{matrix} n_{w} \\ m_{1} \end{matrix}) q_{1}^{m_{1}} q_{2}^{m_{2}} \sum_{\overset{j_{1}, j_{2} :}{j_{1} + j_{2} = x_{w}}} \frac{(\begin{matrix} x_{1} \\ j_{1} \end{matrix}) (\begin{matrix} n_{v_{1}} - x_{1} \\ m_{1} - j_{1} \end{matrix})}{(\begin{matrix} n_{v_{1}} \\ m_{1} \end{matrix})} \frac{(\begin{matrix} x_{2} \\ j_{2} \end{matrix}) (\begin{matrix} n_{v_{2}} - x_{2} \\ m_{2} - j_{2} \end{matrix})}{(\begin{matrix} n_{v_{2}} \\ m_{2} \end{matrix})} .

(7)

Proof. First note that X_Leaves(E) = X_Leaves(E′) and

ℓ_{x_{\cap} + x_{1} e_{v_{1}} + x_{2} e_{v_{2}}}^{E, 0} = ℙ (X_{Leaves (E)} | X_{K_{E}}^{(0)} = x_{\cap} + x_{1} e_{v_{1}} + x_{2} e_{v_{2}}) = \sum_{x_{w} = 0}^{n_{w}} ℙ (X_{Leaves (E)} | X_{K_{E}}^{(0)} = x_{\cap} + x_{1} e_{v_{1}} + x_{2} e_{v_{2}}, X_{w}^{(τ_{w})} = x_{w}) ℙ (X_{w}^{(τ_{w})} = x_{w} | X_{K_{E}}^{(0)} = x) = \sum_{x_{w} = 0}^{n_{w}} ℙ (X_{Leaves (E)} | X_{K_{E}}^{(0)} = x_{\cap} + x_{1} e_{v_{1}} + x_{2} e_{v_{2}}, X_{w}^{(τ_{w})} = x_{w}) \times \sum_{\overset{m_{1}, m_{2} :}{m_{1} + m_{2} = n_{w}}} (\begin{matrix} n_{w} \\ m_{1} \end{matrix}) q_{1}^{m_{1}} q_{2}^{m_{2}} \sum_{\overset{j_{1}, j_{2} :}{j_{1} + j_{2} = x_{w}}} \frac{(\begin{matrix} x_{1} \\ j_{1} \end{matrix}) (\begin{matrix} n_{v_{1}} - x_{1} \\ m_{1} - j_{1} \end{matrix})}{(\begin{matrix} n_{v_{1}} \\ m_{1} \end{matrix})} \frac{(\begin{matrix} x_{2} \\ j_{2} \end{matrix}) (\begin{matrix} n_{v_{2}} - x_{2} \\ m_{2} - j_{2} \end{matrix})}{(\begin{matrix} n_{v_{2}} \\ m_{2} \end{matrix})}

by sampling n_w alleles in w from $n_{v_{1}}$ , $n_{v_{2}}$ alleles in v₁, v₂ which are exchangeable by Lemma 6.

Next, consider the $n_{v_{1}} + n_{v_{2}} - n_{w}$ highest alleles of v₁, v₂, and let $I \subset \{n_{w} + 1, n_{w} + 2, \dots\}$ be their relative positions within w. Then we conclude the proof by noting that X_Leaves(E) = X_Leaves(E′) and applying Lemma 7, to get

ℙ (X_{Leaves (E)} | X_{K_{E}}^{(0)} = x_{\cap} + x_{1} e_{v_{1}} + x_{2} e_{v_{2}}, X_{w}^{(τ_{w})} = x_{w}) = ℙ (X_{Leaves (E^{'})} | X_{v_{1}} = x_{1}, X_{v_{2}} = x_{2}, X_{K_{E^{'}}}^{(τ_{w} e_{w})} = x_{\cap} + x_{w} e_{w}) = ℙ (X_{Leaves (E^{'})} | X_{w, I}^{(τ_{w})} = x_{1} + x_{2} - x_{w}, X_{K_{E^{'}}}^{(τ_{w} e_{w})} = x_{\cap} + x_{w} e_{w}) = ℙ (X_{Leaves (E^{'})} | X_{K_{E^{'}}}^{(τ_{w} e_{w})} = x_{\cap} + x_{w} e_{w}) = ℓ_{x_{\cap} + x_{w} e_{w}}^{E^{'}, τ_{w} e_{w}} .

□
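To make the combinatorial weights in (7) concrete, the following sketch evaluates the probability of observing x_w derived alleles in w given x₁ and x₂ derived alleles in the parents, i.e. the binomial mixture over (m₁, m₂) of products of hypergeometric terms. The function names are hypothetical, and the sketch assumes that each parent carries at least n_w lineages so that every hypergeometric denominator is nonzero.

```python
from math import comb

def hypergeom_pmf(j, total, derived, draws):
    """P(j derived alleles among `draws` alleles sampled without replacement
    from `total` alleles, of which `derived` are derived)."""
    if j < 0 or j > derived or draws - j < 0 or draws - j > total - derived:
        return 0.0
    return comb(derived, j) * comb(total - derived, draws - j) / comb(total, draws)

def admixture_kernel(x_w, x1, x2, n_w, n_v1, n_v2, q1):
    """Weight multiplying the child likelihood in (7) for a given x_w."""
    if n_w > min(n_v1, n_v2):
        raise ValueError("this sketch assumes n_w <= n_v1 and n_w <= n_v2")
    q2 = 1.0 - q1
    total = 0.0
    for m1 in range(n_w + 1):              # lineages of w drawn from v1
        m2 = n_w - m1                      # lineages of w drawn from v2
        weight = comb(n_w, m1) * q1 ** m1 * q2 ** m2
        inner = sum(
            hypergeom_pmf(j1, n_v1, x1, m1) * hypergeom_pmf(x_w - j1, n_v2, x2, m2)
            for j1 in range(x_w + 1)       # derived alleles inherited from v1
        )
        total += weight * inner
    return total
```

For fixed x₁ and x₂ these weights form a probability distribution over x_w, which gives a simple sanity check on any implementation of (7).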

A.2.5. Proof of Lemma 4

Lemma 4 (Population split, 1 cluster). Let E be a population split with exactly one child event E′. Denote the corresponding population clusters as K_E and K_E′. Denote the parent population as v, the child populations as w₁, w₂. Let x_∩, $y^{x_{\cap}}$ and B be defined as in the preceding paragraph:

y_{i}^{x_{\cap}} = \sum_{\overset{j, k :}{j + k = i}} \frac{(\begin{matrix} n_{w_{1}} \\ j \end{matrix}) (\begin{matrix} n_{w_{2}} \\ k \end{matrix})}{(\begin{matrix} n_{w_{1}} + n_{w_{2}} \\ i \end{matrix})} ℓ_{j e_{w_{1}} + k e_{w_{2}} + x_{\cap}}^{E^{'}, τ_{w_{1}} e_{w_{1}} + τ_{w_{2}} e_{w_{2}}} = ℙ (X_{Leaves (E)} | X_{w_{1}}^{(τ_{w_{1}})} + X_{w_{2}}^{(τ_{w_{2}})} = i, X_{\cap} = x_{\cap})

B_{i, j} = \frac{(\begin{matrix} n_{v} \\ j \end{matrix}) (\begin{matrix} n_{w_{1}} + n_{w_{2}} - n_{v} \\ i - j \end{matrix})}{(\begin{matrix} n_{w_{1}} + n_{w_{2}} \\ i \end{matrix})} .

Then the conditional likelihood at E is given by

ℓ_{k e_{v} + x_{\cap}}^{E, 0} = {[B^{+} y^{x_{\cap}}]}_{k},

(9)

with B⁺ denoting the Moore-Penrose pseudoinverse of B.

Proof.

Let $X_{\cap} = X_{K_{E}}^{(0)} - X_{v}^{(0)} e_{v} = X_{K_{E^{'}}}^{(0)} - X_{w_{1}}^{(0)} e_{w_{1}} - X_{w_{2}}^{(0)} e_{w_{2}}$ be the vector of allele counts on K_E ∩ K_E′ at time 0. Then note that

y_{i}^{x_{\cap}} = \sum_{\begin{matrix} j, k : \\ j + k = i \end{matrix}} \frac{(\begin{matrix} n_{w_{1}} \\ j \end{matrix}) (\begin{matrix} n_{w_{2}} \\ k \end{matrix})}{(\begin{matrix} n_{w_{1}} + n_{w_{2}} \\ i \end{matrix})} ℓ_{j e_{w_{1}} + k e_{w_{2}} + x_{\cap}}^{E^{'}, τ_{w_{1}} e_{w_{1}} + τ_{w_{2}} e_{w_{2}}} = \sum_{\begin{matrix} j, k : \\ j + k = i \end{matrix}} ℙ (X_{w_{1}}^{(τ_{w_{1}})} = j, X_{w_{2}}^{(τ_{w_{2}})} = k | X_{w_{1}}^{(τ_{w_{1}})} + X_{w_{2}}^{(τ_{w_{2}})} = i, X_{\cap} = x_{\cap}) ℓ_{j e_{w_{1}} + k e_{w_{2}} + x_{\cap}}^{E^{'}, τ_{w_{1}} e_{w_{1}} + τ_{w_{2}} e_{w_{2}}} = ℙ (X_{Leaves (E)} | X_{w_{1}}^{(τ_{w_{1}})} + X_{w_{2}}^{(τ_{w_{2}})} = i, X_{\cap} = x_{\cap})

with the second equality following from exchangeability (Lemma 6) and the third equality from X_Leaves(E) = X_Leaves(E′).

Next note that

ℙ (X_{Leaves (E)} | X_{w_{1}}^{(τ_{w_{1}})} + X_{w_{2}}^{(τ_{w_{2}})} = i, X_{\cap} = x_{\cap}) = \sum_{j = 0}^{n_{v}} ℙ (X_{K_{E}}^{(0)} = j e_{v} + x_{\cap} | X_{w_{1}}^{(τ_{w_{1}})} + X_{w_{2}}^{(τ_{w_{2}})} = i, X_{\cap} = x_{\cap}) \times ℙ (X_{Leaves (E)} | X_{K_{E}}^{(0)} = j e_{v} + x_{\cap}, X_{w_{1}}^{(τ_{w_{1}})} + X_{w_{2}}^{(τ_{w_{2}})} = i) = \sum_{j = 0}^{n_{v}} \frac{(\begin{matrix} n_{v} \\ j \end{matrix}) (\begin{matrix} n_{w_{1}} + n_{w_{2}} - n_{v} \\ i - j \end{matrix})}{(\begin{matrix} n_{w_{1}} + n_{w_{2}} \\ i \end{matrix})} ℙ (X_{Leaves (E)} | X_{K_{E}}^{(0)} = j e_{v} + x_{\cap}, X_{w_{1}}^{(τ_{w_{1}})} + X_{w_{2}}^{(τ_{w_{2}})} = i)

with the second equality again due to exchangeability (Lemma 6).

Define $I \subset \{n_{v} + 1, n_{v} + 2, \dots\}$ so that $\{1, \dots, n_{v}\} \cup I$ are the indices in v of the first $n_{w_{1}}$ , $n_{w_{2}}$ alleles in w₁, w₂. Then, because $X_{v, I}^{(0)} = X_{w_{1}}^{(τ_{w_{1}})} + X_{w_{2}}^{(τ_{w_{2}})} - X_{v}^{(0)}$ , Lemma 7 gives

ℙ (X_{Leaves (E)} | X_{K_{E}}^{(0)}, X_{w_{1}}^{(τ_{w_{1}})} + X_{w_{2}}^{(τ_{w_{2}})}) = ℙ (X_{Leaves (E)} | X_{K_{E}}^{(0)}, X_{v, I}^{(0)}) = ℙ (X_{Leaves (E)} | X_{K_{E}}^{(0)})

and thus

y_{i}^{x_{\cap}} = \sum_{j = 0}^{n_{v}} \frac{(\begin{matrix} n_{v} \\ j \end{matrix}) (\begin{matrix} n_{w_{1}} + n_{w_{2}} - n_{v} \\ i - j \end{matrix})}{(\begin{matrix} n_{w_{1}} + n_{w_{2}} \\ i \end{matrix})} ℓ_{j e_{v} + x_{\cap}}^{E, 0} = \sum_{j = 0}^{n_{v}} B_{i j} ℓ_{j e_{v} + x_{\cap}}^{E, 0}

so letting $ℓ^{'} = {[ℓ_{j e_{v} + x_{\cap}}^{E, 0}]}_{0 \leq j \leq n_{v}} \in ℝ^{n_{v} + 1}$ , we have $y^{x_{\cap}} = B ℓ^{'}$ and therefore $ℓ^{'} = B^{+} y^{x_{\cap}}$ . □
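The final step of this proof, recovering ℓ′ from y via the pseudoinverse of B, can be sketched with NumPy as follows. The function names are hypothetical, the vector `y` is assumed to have been computed already for a fixed x_∩, and the sketch assumes n_v ≤ n_{w₁} + n_{w₂}.

```python
from math import comb
import numpy as np

def split_matrix(n_v, n_w1, n_w2):
    """Matrix B of Lemma 4, with rows i = 0..n_w1+n_w2 and columns j = 0..n_v."""
    n_child = n_w1 + n_w2
    if n_v > n_child:
        raise ValueError("this sketch assumes n_v <= n_w1 + n_w2")
    b = np.zeros((n_child + 1, n_v + 1))
    for i in range(n_child + 1):
        for j in range(n_v + 1):
            if 0 <= i - j <= n_child - n_v:
                b[i, j] = (comb(n_v, j) * comb(n_child - n_v, i - j)
                           / comb(n_child, i))
    return b

def split_likelihood(y, n_v, n_w1, n_w2):
    """Conditional likelihoods at E from (9): B^+ applied to y."""
    return np.linalg.pinv(split_matrix(n_v, n_w1, n_w2)) @ y
```

Each row of B is a hypergeometric distribution over j, so its rows sum to one; in the small example tested here B has full column rank, so B⁺B is the identity and ℓ′ is recovered exactly up to rounding.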

A.2.6. Proof of Lemma 3

Lemma 3 (Population split, 2 clusters). Let E be a split event with parent population v and child populations w₁, w₂. Suppose E has 2 child events E₁′, E₂′, with corresponding blocks $K_{E_{1}^{'}}$ , $K_{E_{2}^{'}}$ , where $w_{1} \in K_{E_{1}^{'}}$ , $w_{2} \in K_{E_{2}^{'}}$ . Let x₋₁ be allele counts on $K_{E_{1}^{'}} \setminus \{w_{1}\}$ and x₋₂ be allele counts on $K_{E_{2}^{'}} \setminus \{w_{2}\}$ . Then the conditional likelihood at E is

ℓ_{x_{- 1} + x_{- 2} + x_{v} e_{v}}^{E, 0} = \sum_{\overset{x_{1}, x_{2} :}{x_{1} + x_{2} = x_{v}}} \frac{(\begin{matrix} n_{w_{1}} \\ x_{1} \end{matrix}) (\begin{matrix} n_{w_{2}} \\ x_{2} \end{matrix})}{(\begin{matrix} n_{v} \\ x_{v} \end{matrix})} ℓ_{x_{1} e_{w_{1}} + x_{- 1}}^{E_{1}^{'}, τ_{w_{1}} e_{w_{1}}} ℓ_{x_{2} e_{w_{2}} + x_{- 2}}^{E_{2}^{'}, τ_{w_{2}} e_{w_{2}}} .

(8)

Proof. Notice that

ℓ_{x_{- 1} + x_{- 2} + x_{v} e_{v}}^{E, 0} = \sum_{\begin{matrix} x_{1}, x_{2} : \\ x_{1} + x_{2} = x_{v} \end{matrix}} ℙ (X_{K_{E_{1}^{'}}}^{(τ_{w_{1}} e_{w_{1}})} = x_{- 1} + x_{1} e_{w_{1}}, X_{K_{E_{2}^{'}}}^{(τ_{w_{2}} e_{w_{2}})} = x_{- 2} + x_{2} e_{w_{2}} | X_{K_{E}}^{(0)} = x_{- 1} + x_{- 2} + x_{v} e_{v}) \times ℓ_{x_{1} e_{w_{1}} + x_{- 1}}^{E_{1}^{'}, τ_{w_{1}} e_{w_{1}}} ℓ_{x_{2} e_{w_{2}} + x_{- 2}}^{E_{2}^{'}, τ_{w_{2}} e_{w_{2}}} = \sum_{\overset{x_{1}, x_{2} :}{x_{1} + x_{2} = x_{v}}} \frac{(\begin{matrix} n_{w_{1}} \\ x_{1} \end{matrix}) (\begin{matrix} n_{w_{2}} \\ x_{2} \end{matrix})}{(\begin{matrix} n_{v} \\ x_{v} \end{matrix})} ℓ_{x_{1} e_{w_{1}} + x_{- 1}}^{E_{1}^{'}, τ_{w_{1}} e_{w_{1}}} ℓ_{x_{2} e_{w_{2}} + x_{- 2}}^{E_{2}^{'}, τ_{w_{2}} e_{w_{2}}}

with the first equality from the Markov property of the Moran process, and the second equality following from the exchangeability of the n_v alleles at vertex v (Lemma 6). □
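For fixed x₋₁ and x₋₂, equation (8) is a hypergeometric-weighted convolution of the two child likelihood vectors. The sketch below illustrates this in the simplest setting. The function name is hypothetical, the inputs are assumed to be one-dimensional vectors indexed by the allele counts in w₁ and w₂, and the sketch assumes n_v = n_{w₁} + n_{w₂}, so that the weights for each x_v sum to one.

```python
from math import comb

def split_two_clusters(lik1, lik2):
    """Combine child likelihoods as in (8), for fixed x_{-1} and x_{-2}.

    lik1[x1]: likelihood below E_1' given x1 derived alleles in w1.
    lik2[x2]: likelihood below E_2' given x2 derived alleles in w2.
    Returns a list indexed by x_v = 0..n_v, assuming n_v = n_w1 + n_w2.
    """
    n1, n2 = len(lik1) - 1, len(lik2) - 1
    n_v = n1 + n2
    out = []
    for x_v in range(n_v + 1):
        total = 0.0
        # x1 ranges over the counts compatible with x1 + x2 = x_v.
        for x1 in range(max(0, x_v - n2), min(n1, x_v) + 1):
            x2 = x_v - x1
            weight = comb(n1, x1) * comb(n2, x2) / comb(n_v, x_v)
            total += weight * lik1[x1] * lik2[x2]
        out.append(total)
    return out
```

With all-ones inputs the output is all ones (by the Vandermonde identity), which is a convenient check that the weights are normalized.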

A.2.7. Proof of Theorem 2

ϕ ⊙ (π^{1} \otimes \dots \otimes π^{D}) = \sum_{z_{1}, \dots, z_{D}} ϕ_{z_{1}, \dots, z_{D}} π_{z_{1}}^{1} \dots π_{z_{D}}^{D} = DP (π^{1}, \dots, π^{D}) - (\prod_{d = 1}^{D} π_{0}^{d}) DP (e_{0}, \dots, e_{0}) - (\prod_{d = 1}^{D} π_{n_{d}}^{d}) DP (e_{n_{1}}, \dots, e_{n_{D}}) .

Proof. Below, we will prove DP $(ℓ^{1}, \dots, ℓ^{D})$ is a multilinear function of the input vectors $ℓ^{1}, \dots, ℓ^{D}$ . The result immediately follows from this, because then

ϕ ⊙ (π^{1} \otimes \dots \otimes π^{D}) = \sum_{z \neq 0, n} ϕ_{z} π_{z_{1}}^{1} \dots π_{z_{D}}^{D} = \sum_{z \neq 0, n} DP (π_{z_{1}}^{1} e_{z_{1}}, \dots, π_{z_{D}}^{D} e_{z_{D}}) = DP (\sum_{z_{1} = 0}^{n_{1}} π_{z_{1}}^{1} e_{z_{1}}, \dots, \sum_{z_{D} = 0}^{n_{D}} π_{z_{D}}^{D} e_{z_{D}}) - DP (π_{0}^{1} e_{0}, \dots, π_{0}^{D} e_{0}) - DP (π_{n_{1}}^{1} e_{n_{1}}, \dots, π_{n_{D}}^{D} e_{n_{D}}) = DP (π^{1}, \dots, π^{D}) - DP (π_{0}^{1} e_{0}, \dots, π_{0}^{D} e_{0}) - DP (π_{n_{1}}^{1} e_{n_{1}}, \dots, π_{n_{D}}^{D} e_{n_{D}}) .

We now show DP $(ℓ^{1}, \dots, ℓ^{D})$ is a multilinear function of $ℓ^{1}, \dots, ℓ^{D}$ . We start by showing that if event E has leaf populations Leaves(E) = (d₁,…, d_L), then ℓ^{E, t} is a multilinear function of $ℓ^{d_{1}}, \dots, ℓ^{d_{L}}$ . We show this by induction over (E, t). The base case, where t = 0 and E is a leaf with K_E = (d_i), is trivially true because $ℓ^{E, 0} = ℓ^{d_{i}}$ .

For the next case, we note that (6), (7), and (9) express ℓ^{E, t} as a tensor product of ℓ^{E′, t′} with a matrix, where (E′, t′) ≺ (E, t) and Leaves(E) = Leaves(E′). ℓ^{E′, t′} is a multilinear function of $ℓ^{d_{1}}, \dots, ℓ^{d_{L}}$ by induction, hence ℓ^{E, t} is also.

Similarly, (8) expresses ℓ^{E, 0} as a product of $ℓ^{E_{1}^{'}, t_{1}}$ and $ℓ^{E_{2}^{'}, t_{2}}$ , where $ℓ^{E_{i}^{'}, t_{i}}$ is a multilinear function of {ℓ^k : k ∈ Leaves(E′_i)} by induction, and furthermore Leaves(E′₁) ∩ Leaves(E′₂) = ∅ and Leaves(E′₁) ∪ Leaves(E′₂) = Leaves(E). Thus ℓ^{E, 0} is a multilinear function of $ℓ^{d_{1}}, \dots, ℓ^{d_{L}}$ .

Thus, each of the operations (6), (7), (8), (9) in Algorithm 2 preserves multilinearity of ℓ^{E, t}, and thus each ℓ^{E, t} is a multilinear function of its leaf vectors by induction. Finally, a similar induction argument shows that each ϕ^E is a multilinear function of {ℓ^k : k ∈ Leaves(E)}, since (10) expresses ϕ^E as a sum of multilinear functions by induction. □
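The identity of Theorem 2 can be checked on a toy example. In the sketch below, a fixed table stands in for the multilinear map DP with D = 2; this is purely illustrative, since the real DP is the dynamic program of Algorithm 2 and never forms such a table. The names `make_toy_dp` and `sfs_functional` are hypothetical.

```python
import numpy as np

def make_toy_dp(table):
    """Return a toy multilinear map DP(l1, l2) = sum_{a,b} table[a,b] l1[a] l2[b]."""
    def dp(l1, l2):
        return float(np.einsum("ab,a,b->", table, l1, l2))
    return dp

def sfs_functional(dp, pi1, pi2):
    """phi . (pi1 x pi2) from three DP calls, removing the monomorphic corners."""
    n1, n2 = len(pi1) - 1, len(pi2) - 1
    e0_1, e0_2 = np.eye(n1 + 1)[0], np.eye(n2 + 1)[0]     # all ancestral
    en_1, en_2 = np.eye(n1 + 1)[n1], np.eye(n2 + 1)[n2]   # all derived
    return (dp(pi1, pi2)
            - pi1[0] * pi2[0] * dp(e0_1, e0_2)
            - pi1[n1] * pi2[n2] * dp(en_1, en_2))
```

The point of the theorem is that a linear functional of the SFS costs three DP evaluations in total, and the two corner evaluations can be shared across all statistics, rather than one evaluation per entry of the spectrum.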

A.3. Computational complexity

Computing the SFS ϕ_z with Algorithm 2 involves keeping track of the partial likelihoods $ℓ_{x, z}^{E, 0}$ at each event E. For a dataset with s unique observations z, the array of partial likelihoods for every z and every hidden state x at event E is $ℓ^{E, 0} = {(ℓ_{x, z}^{E, 0})}_{x, z}$ , a tensor with $s \prod_{v \in K_{E}} (n_{v} + 1) = O (s \prod_{v \in K_{E}} n_{v}) = O (s n^{| K_{E} |})$ total entries (where $n = \sum_{d = 1}^{D} n_{d}$ ).

We now consider the time cost of each operation in Algorithm 2. We start by considering the “universal constants” that depend only on n_v, and not on the parameters of the demography or the number of SNPs. Next, we consider terms that need to be computed once per demography, but don’t depend on the number of SNPs. Finally, we consider those terms that need to be recomputed once for each SNP and each demography.

Proposition 1. For a population graph $G$ with corresponding event tree $T$ , let |V| be the number of vertices (populations) in the graph, and let M_admix be the number of admixture events. Then, for a dataset with n samples and s unique observations, computing the likelihoods of d different demographies with the same topology ( $G$ and $T$ ) but different parameters (sizes, pulse strengths, branch lengths), has a computational complexity of

O (| V | n^{3} + M_{admix} (n^{5} + d n^{4}) + d | V | n^{2} + d s \sum_{E \in T} n^{| K_{E} | + 1}) .

Proof. In Lemma 1, we compute the matrix exponential of (6) as $e^{Q^{(n_{v})} \int_{0}^{τ_{v}} \frac{1}{η_{v} (t)} d t} = U e^{Σ \int_{0}^{τ_{v}} \frac{1}{η_{v} (t)} d t} U^{- 1}$ where the columns of U are the eigenvectors in the decomposition $Q^{(n_{v})} = U Σ U^{- 1}$ ; this eigenvalue decomposition costs $O (n_{v}^{3})$ . Similarly, in Lemma 4 we compute the pseudoinverse of (9) as B⁺ = VΣ⁻¹U^T for the SVD B = UΣV^T; this SVD also costs $O (n_{v}^{3})$ . Both the eigenvalue decomposition and the SVD are universal for all demographic parameters, and computed at most once per population v, which contributes an overall complexity of O(|V|n³).

The innermost sum of (7) in Lemma 2 is also universal, and costs $O (n_{w}^{5})$ to compute for all possible values of x₁, x₂, x_w, m₁, leading to the term O(M_admix n⁵). However, the middle sum of this equation is not universal, since it depends on q₁, q₂. While not universal, this middle sum only needs to be computed once per demography, and does not depend on the number of SNPs; in particular, computing it for all values of x₁, x₂, x_w costs $O (n_{w}^{4})$ per demography, contributing an overall complexity of O(M_admix d n⁴). Similarly, the term $f_{v} (k)$ in (10) of Lemma 5 is computed once per demography, and costs $O (n_{v}^{2})$ (Kamm et al., 2017), contributing a complexity of O(d|V|n²).

Finally, we consider the terms that need to be recomputed for every SNP and every demography. Equations (6), (7), (8) of Lemmas 1, 2, 3, respectively, each cost $O (n^{| K_{E} | + 1})$ per SNP, since they express $ℓ_{x, z}^{E, 0}$ as a sum of O(n) terms. Similarly, equation (9) of Lemma 4 costs $O (n^{| K_{E'} | + 1})$ , since we compute $ℓ_{x, z}^{E, 0}$ as a product of an O(n) × O(n) matrix B⁺ with a tensor $y^{x_{\cap}}$ containing $O (n^{| K_{E'} | - 1})$ entries; $y^{x_{\cap}}$ is itself computed by summing over all $O (n^{| K_{E'} |})$ entries of $ℓ_{x, z}^{E', τ_{w_{1}} e_{w_{1}} + τ_{w_{2}} e_{w_{2}}}$ . Altogether, this contributes a complexity of $O (d s \sum_{E} n^{| K_{E} | + 1})$ . □

A.4. Application supplement

A.4.1. Data

Table 3 gives the populations and samples we used for our example application in Section 4. All samples except for MA1 were taken from the SGDP dataset (Mallick et al., 2016). We added the low-coverage MA1 sample (Raghavan et al., 2014) to represent the Ancient North Eurasian (ANE) component of European ancestry. Due to the low coverage of MA1, we only sampled a single random allele from it at each site, using a GATK pileup (McKenna et al., 2010) and restricting ourselves to reads with quality ≥30.

Table 3.

Populations and samples used for the example application in Section 4. We only used 1 random allele for MA1 due to low coverage. The ages of the samples are given in thousands of years ago (kya).

Population	Individuals	Alleles	kya
Mbuti	3	6	0
Han	2	4	0
Sardinian	2	4	0
Loschbour	1	2	7.5
LBK (Stuttgart)	1	2	8
MA1 (Mal’ta)	1	1	24
Ust’Ishim	1	2	45
Altai Neanderthal	1	2	50

We ascertained SNPs at SGDP filter level 1, then used the genotype calls at filter level 0 at the ascertained SNPs. When ascertaining SNPs, we limited ourselves to sites that were polymorphic among the samples excluding MA1 and Neanderthal. We excluded MA1 during ascertainment due to its low coverage. We also excluded the Neanderthal sample during ascertainment because it had substantially fewer new mutations than expected based on its age; it was unclear whether this was due to changes in the mutation rate on that lineage since its deep split with modern humans, or whether this was an artifact of the SNP calling strategy used by SGDP. We used Theorem 2 to correct the normalizing constant of the SFS due to excluding MA1 and Neanderthal during ascertainment; in particular, we normalized the SFS by the total branch length of the subtree excluding MA1 and Neanderthal.

To avoid biases in ancient DNA caused by deamination (Dabney et al., 2013), we removed all transitions (i.e. A ↔ G and C ↔ T mutations), keeping only the transversions. We used Chimp as a proxy for the ancestral allele, removing all sites where the Chimp allele was missing. After data cleaning, we were left with 2,444,888 autosomal transversion SNPs that were segregating among the samples excluding MA1 and Neanderthal.

A.4.2. Model fitting procedure

We fit the model in Figure 3 in an iterative fashion, adding populations in one at a time and re-estimating the parameters. We started with a tree including the 4 populations Mbuti, Loschbour, Han, and Ust’Ishim, with no admixture events. This initial model had 8 parameters: the population sizes at the Han, Mbuti, and Loschbour leaves, the population size at the Eurasian and human ancestor (the Ust’Ishim leaf was set to the ancestral Eurasian population size), and the times that the Han, Ust’Ishim, and Mbuti populations diverged from Loschbour.

We next added in the Neanderthal population, with an admixture event from Neanderthal to the Eurasian ancestor. We added parameters for the Neanderthal-human split time, the Neanderthal-Eurasian introgression time and strength, the ancestral Neanderthal-human population size, and a Neanderthal population exponential decline rate starting at the Eurasian-Mbuti split, following results from Prüfer et al. (2014).

We followed by adding on the MA1 sample, adding a single parameter for its split time (we fixed its population size to be the ancestral Eurasian population size). We then added the LBK early farmer, adding 4 parameters for its divergence time, Basal Eurasian admixture time and strength, and the split time of the Basal Eurasian lineage. Finally, we added in a Sardinian population, along with parameters for its population size, its split time, and admixture times and strength from the Loschbour WHG.

At each step, we re-estimated all parameters using the L-BFGS-B optimization algorithm, maximizing the multinomial composite likelihood (11). For the final few models (adding in LBK and then Sardinian) we initialized each estimation with a stochastic gradient descent before finishing with L-BFGS-B.
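As a rough illustration of the objective being maximized, the sketch below evaluates a multinomial composite log-likelihood from observed SNP counts and an expected SFS. It assumes that (11) has the usual multinomial form, in which each observed configuration contributes its count times the log of its normalized expected frequency. The function name and the dictionary representation are hypothetical, and the optimizer itself (L-BFGS-B) is not shown.

```python
from math import log

def multinomial_composite_loglik(observed_counts, expected_sfs):
    """Sum over configurations x of f_x * log(phi_x / sum_y phi_y).

    observed_counts: dict mapping a configuration x to its SNP count f_x.
    expected_sfs: dict mapping every configuration to its expected value
        phi_x (assumed positive wherever f_x > 0).
    """
    total = sum(expected_sfs.values())
    return sum(
        count * log(expected_sfs[x] / total)
        for x, count in observed_counts.items()
        if count > 0
    )
```

Because the expected SFS is normalized inside the logarithm, this objective does not depend on the overall mutation rate, which is why the mutation rate is estimated separately in Section A.4.3.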

A.4.3. Mutation rate estimation

We used the within-population nucleotide diversity to estimate the mutation rate. The nucleotide diversity is the expected number of sites at which 2 random alleles (drawn without replacement) differ. The empirical value of the nucleotide diversity in population i is

{\hat{π}}_{i} = \sum_{s \in SNPs} 2 \frac{x_{i}^{(s)} (n_{i} - x_{i}^{(s)})}{n_{i} (n_{i} - 1)} = \sum_{x} 2 \frac{x_{i} (n_{i} - x_{i})}{n_{i} (n_{i} - 1)} f_{x},

while the expected value of the nucleotide diversity per unit mutation rate is

π_{i} = \frac{1}{θ} E [{\hat{π}}_{i}] = \sum_{x} 2 \frac{x_{i} (n_{i} - x_{i})}{n_{i} (n_{i} - 1)} ϕ_{x} .

This yields a mutation rate estimate ${\hat{θ}}_{i}$ for each population, given by ${\hat{θ}}_{i} = \frac{{\hat{π}}_{i}}{π_{i}}$ . We then averaged over ${\hat{θ}}_{i}$ to obtain our combined estimate $\hat{θ} = \frac{1}{D} \sum_{i = 1}^{D} {\hat{θ}}_{i}$ . To obtain the per-site mutation rate we need to divide by the number of bases L after our data cleaning process. We approximated this as follows:

At SGDP filter level 1, there are about 2.13 × 10⁹ sites per individual.
Multiply this by 0.93 due to excluding the sex chromosomes.
Multiply this by 0.32 due to only using transversions.
Multiply by 0.93 to account for excluding sites missing the Chimp allele.
Finally, multiply by 1.1 to account for using genotype calls at filter level 0. The inflation in the number of sites here is because we are using a stricter filter to ascertain SNPs (SGDP filter level 1) than we are to call genotypes at the SNPs (SGDP filter level 0). In particular, we need to account for SNPs where the sample has a heterozygous genotype at filter level 0, but has a missing genotype at filter level 1; these genotypes are counted as heterozygotes in the SFS, and we need to adjust the number of bases L to account for this.

The latter 2 factors (accounting for sites missing the Chimp allele and for adding in genotype calls at filter level 0) we estimated by observing how the number of heterozygotes per individual changed after these data cleaning steps.
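The two ingredients of this calculation, the empirical nucleotide diversity and the approximate number of bases L, can be sketched as follows. The function names are hypothetical, and the default arguments simply restate the approximate factors listed above.

```python
def nucleotide_diversity(derived_counts, n):
    """Empirical diversity: sum over SNPs of 2 x (n - x) / (n (n - 1)).

    derived_counts: derived allele count x at each SNP in one population.
    n: number of sampled alleles in that population (n >= 2).
    """
    return sum(2 * x * (n - x) / (n * (n - 1)) for x in derived_counts)

def approximate_num_bases(sites=2.13e9, factors=(0.93, 0.32, 0.93, 1.1)):
    """Approximate L: sites at filter level 1, times the correction factors
    for autosomes, transversions, Chimp allele present, and level-0 calls."""
    total = sites
    for factor in factors:
        total *= factor
    return total
```

With these factors the product is roughly 6.5 × 10⁸ bases. The per-site mutation rate then follows by dividing the combined estimate θ̂ by this value of L.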

Footnotes

A reviewer noted that Jouganous et al. (2017) also employ a Moran-based model with discrete and continuous admixture; however, as noted above, their method of solution is completely different from ours.

We omit certain graph preprocessing steps (moralization and triangulation) from the general junction tree algorithm which are not necessary in our setting.

The calls to DP(e₀,…,e₀) and DP $(e_{n_{1}}, \dots, e_{n_{D}})$ only need to be computed once for all statistics.

References

Baharian S and Gravel S (2018). On the decidability of population size histories from finite allele frequency spectra. Theoretical Population Biology 120, 42–51.
Beaumont MA and Nichols RA (1996). Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society of London. Series B: Biological Sciences 263(1377), 1619–1626.
Bhaskar A and Song YS (2014). Descartes’ rule of signs and the identifiability of population demographic models from genomic variation data. Annals of Statistics 42 (6), 2469–2493.
Bhaskar A, Wang YXR, and Song YS (2015). Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. Genome Research 25 (2), 268–279.
Boyko AR, Williamson SH, Indap AR, Degenhardt JD, Hernandez RD, Lohmueller KE, Adams MD, Schmidt S, Sninsky JJ, Sunyaev SR, et al. (2008). Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genetics 4 (5), e1000083.
Bryant D, Bouckaert R, Felsenstein J, Rosenberg NA, and RoyChoudhury A (2012). Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution 29 (8), 1917–1932.
Chen H (2012). The joint allele frequency spectrum of multiple populations: A coalescent theory approach. Theoretical Population Biology 81 (2), 179–195.
Corliss G, Faure C, Griewank A, Hascoet L, and Naumann U (2002). Automatic Differentiation of Algorithms: From Simulation to Optimization, Volume 1. New York: Springer Science & Business Media.
Coventry A, Bull-Otterson LM, Liu X, Clark AG, Maxwell TJ, Crosby J, Hixson JE, Rea TJ, Muzny DM, Lewis LR, et al. (2010). Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nature Communications 1, 131.
Dabney J, Meyer M, and Pääbo S (2013). Ancient DNA damage. Cold Spring Harbor perspectives in biology 5(7), a012567.
De Iorio M and Griffiths RC (2004). Importance sampling on coalescent histories. II: Subdivided population models. Adv. Appl. Prob. 36, 434–454.
De Maio N, Schrempf D, and Kosiol C (2015). Pomo: An allele frequency-based approach for species tree estimation. Systematic Biology 64 (6), 1018–1031.
Donnelly P and Kurtz T (1996). A countable representation of the Fleming-Viot measure-valued diffusion. Ann Probab 24, 698–742.
Donnelly P, Kurtz TG, et al. (1999). Particle representations for measure-valued population models. The Annals of Probability 27 (1), 166–205.
Durrett R (2008). Probability Models for DNA Sequence Evolution (2nd ed.). Springer, New York.
Ewens WJ (2004). Mathematical Population Genetics: I. Theoretical Introduction. New York: Springer Science+Business Media, Inc.
Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, and Foll M (2013). Robust demographic inference from genomic and SNP data. PLoS Genetics 9 (10), e1003905.
Excoffier L and Foll M (2011). Fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27 (9), 1332–1334.
Fay JC and Wu CI (2000). Hitchhiking under positive Darwinian selection. Genetics 155, 1405–1413.
Felsenstein J (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17, 368–376.
Fu Q, Li H, Moorjani P, Jay F, Slepchenko SM, Bondarev AA, Johnson PL, Aximu-Petri A, Prüfer K, de Filippo C, et al. (2014). Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514 (7523), 445.
Gazave E, Ma L, Chang D, Coventry A, Gao F, Muzny D, Boerwinkle E, Gibbs RA, Sing CF, Clark AG, et al. (2014). Neutral genomic regions refine models of recent rapid human population growth. Proceedings of the National Academy of Sciences 111 (2), 757–762.
Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, Yu F, Gibbs RA, Bustamante CD, Altshuler DL, et al. (2011). Demographic history and rare allele sharing among human populations. Proceedings of the National Academy of Sciences 108 (29), 11983–11988.
Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH-Y, et al. (2010). A draft sequence of the Neandertal genome. Science 328(5979), 710–722.
Griffiths R and Tavaré S (1998). The age of a mutation in a general coalescent tree. Communications in Statistics. Stochastic Models 14 (1–2), 273–295.
Griffiths RC and Tavaré S (1997). Computational methods for the coalescent. In Donnelly P and Tavaré S (Eds.), Progress in population genetics and human evolution, Volume 87, pp. 165–182. Springer-Verlag, Berlin.
Gutenkunst RN, Hernandez RD, Williamson SH, and Bustamante CD (2009). Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genetics 5(10), e1000695.
Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, Llamas B, Brandt G, Nordenfelt S, Harney E, Stewardson K, et al. (2015). Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211.
Holsinger KE and Weir BS (2009). Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics 10 (9), 639–650.
Jenkins PA, Mueller JW, and Song YS (2014). General triallelic frequency spectrum under demographic models with variable population size. Genetics 196(1), 295–311.
Jouganous J, Long W, Ragsdale AP, and Gravel S (2017). Inferring the joint demographic history of multiple populations: Beyond the diffusion approximation. Genetics 206(3), 1549–1567.
Kamm JA, Terhorst J, and Song YS (2017). Efficient computation of the joint sample frequency spectra for multiple populations. Journal of Computational and Graphical Statistics 26 (1), 182–194.
Kelleher J, Etheridge AM, and McVean G (2016). Efficient coalescent simulation and genealogical analysis for large sample sizes. PLoS computational biology 12(5), e1004842.
Kimura M (1969). The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61 (4), 893–903.
Kingman JFC (1982). The coalescent. Stoch. Process. Appl 13, 235–248.
Koller D and Friedman N (2009). Probabilistic Graphical Models: Principles and Techniques. MIT press.
Lauritzen SL and Spiegelhalter DJ (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society. Series B (Methodological) 50 (2), 157–224.
Lazaridis I, Nadel D, Rollefson G, Merrett DC, Rohland N, Mallick S, Fernandes D, Novak M, Gamarra B, Sirak K, et al. (2016). Genomic insights into the origin of farming in the ancient Near East. Nature 536 (7617), 419–424.
Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, Sudmant PH, Schraiber JG, Castellano S, Lipson M, et al. (2014). Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513 (7518), 409–413.
Lukić S and Hey J (2012). Demographic inference using spectral methods on SNP data, with an analysis of the human out-of-Africa expansion. Genetics 192 (2), 619–639.
Maclaurin D, Duvenaud D, and Adams RP (2015). Autograd: Effortless gradients in numpy. In ICML 2015 AutoML Workshop.
Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. (2016). The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538 (7624), 201–206.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. (2010). The Genome Analysis Toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Research 20 (9), 1297–1303.
Meyer M, Arsuaga J-L, de Filippo C, Nagel S, Aximu-Petri A, Nickel B, Martinez I, Gracia A, de Castro JMB, Carbonell E, et al. (2016). Nuclear DNA sequences from the middle pleistocene sima de los huesos hominins. Nature 531 (7595), 504–507.
Myers S, Fefferman C, and Patterson N (2008). Can one learn history from the allelic spectrum? Theor. Popul. Biol 73(3), 342–348.
Nelson MR, Wegmann D, Ehm MG, Kessner D, Jean PS, Verzilli C, Shen J, Tang Z, Bacanu S-A, Fraser D, et al. (2012). An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337(6090), 100–104.
Nielsen R (2000). Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154 (2), 931–942.
Notohara M (1990). The coalescent and the genealogical process in geographically structured population. Journal of Mathematical Biology 29 (1), 59–75.
Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, and Reich D (2012). Ancient admixture in human history. Genetics 192(3), 1065–1093. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pearl J (1982). Reverend bayes on inference engines: a distributed hierarchical approach. In Proceedings of the National Conference on Artificial Intelligence, pp. 133–136. [Google Scholar]
Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, De Filippo C, et al. (2014). The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505(7481), 43–49. [DOI] [PMC free article] [PubMed] [Google Scholar]
Raghavan M, Skoglund P, Graf KE, Metspalu M, Albrechtsen A, Moltke I, Rasmussen S, Stafford TW Jr, Orlando L, Metspalu E, et al. (2014). Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505(7481), 87–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
Saitou N and Nei M (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4(A), 406–425. [DOI] [PubMed] [Google Scholar]
Sawyer SA and Hartl DL (1992). Population genetics of polymorphism and divergence. Genetics 132(A), 1161–1176. [DOI] [PMC free article] [PubMed] [Google Scholar]
Scally A (2016). The mutation rate in human evolution and demographic inference. Current Opinion in Genetics & Development 41, 36–43. [DOI] [PubMed] [Google Scholar]
Schaffner SF, Foo C, Gabriel S, Reich D, Daly WJ, and Altshuler D (2005). Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 15, 1576–1583. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stephens M and Donnelly P (2000). Inference in molecular population genetics. J. Royal Stat. Soc. B 62, 605–655. [Google Scholar]
Tajima F (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123 (3), 585–595. [DOI] [PMC free article] [PubMed] [Google Scholar]
Takahata N (1988). The coalescent in two partially isolated diffusion populations. Genetics Research 52(3), 213–222. [DOI] [PubMed] [Google Scholar]
Terhorst J, Kamm JA, and Song YS (2017). Robust and scalable inference of population history from hundreds of unphased whole genomes. Nature Genetics 49(2), 303–309. [DOI] [PMC free article] [PubMed] [Google Scholar]
Terhorst J and Song YS (2015). Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum. Proceedings of the National Academy of Sciences 112(25), 7677–7682. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wakeley J and Hey J (1997). Estimating ancestral population parameters. Genetics 145(3), 847–855. [DOI] [PMC free article] [PubMed] [Google Scholar]
Watterson G (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology 7(2), 256–276. [DOI] [PubMed] [Google Scholar]
Wegmann D, Leuenberger C, Neuenschwander S, and Excoffier L (2010). ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC bioinformatics 11 (1), 116. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zeng K, Fu Y-X, Shi S, and Wu C-I (2006). Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174 (3), 1431–1439. [DOI] [PMC free article] [PubMed] [Google Scholar]

