2021 Jan 21;509:110400. doi: 10.1016/j.jtbi.2020.110400

The probability distribution of the ancestral population size conditioned on the reconstructed phylogenetic tree with occurrence data

Marc Manceau, Ankit Gupta, Timothy Vaughan, Tanja Stadler
PMCID: PMC7733867  PMID: 32739241

Highlights

  • Reconstructed phylogenetic trees contain valuable information on population dynamics.

  • Considering records of occurrences through time allows us to get more relevant data.

  • Data is modeled using a birth-death process with specific sampling scheme.

  • The distribution of the population size conditioned on the data is derived.

  • This distribution will foster new advances in macroevolution and epidemiology.

Keywords: Birth-death process, Fossilized birth-death model, Epidemiology, Macroevolution, Phylogenetics

Abstract

We consider a homogeneous birth-death process with three different sampling schemes. First, individuals can be sampled through time and included in a reconstructed phylogenetic tree. Second, they can be sampled through time and only recorded as a point ‘occurrence’ along a timeline. Third, extant individuals can be sampled and included in the reconstructed phylogenetic tree with a fixed probability. We further consider that sampled individuals can be removed or not from the process, upon sampling, with fixed probability. We derive the probability distribution of the population size at any time in the past conditional on the joint observation of a reconstructed phylogenetic tree and a record of occurrences not included in the tree. We also provide an algorithm to simulate ancestral population size trajectories given the observation of a reconstructed phylogenetic tree and occurrences. This distribution can be readily used to draw inferences about the ancestral population size in the field of epidemiology and macroevolution. In epidemiology, these results will allow data from epidemiological case count studies to be used in conjunction with molecular sequencing data (yielding reconstructed phylogenetic trees) to coherently estimate prevalence through time. In macroevolution, it will foster the joint examination of the fossil record and extant taxa to reconstruct past biodiversity.

1. Introduction

Owing to seminal papers by Yule (1925) and Kendall (1948), and much later by Nee et al. (1994), birth-death models have become ubiquitous in evolutionary biology. They are used as population dynamic models, parameterized by a birth rate and a death rate, in fields as diverse as paleontology, macroevolution, linguistics, and epidemiology (see e.g. Foote, 2000, Heath et al., 2014, Gray et al., 2009, Stadler et al., 2013). A major aim when using these models is to reliably estimate the ancestral number of species, languages or infected individuals, i.e. past biodiversity, past prevalence, or more generally past population sizes. In both macroevolution and epidemiology, inferences about population dynamics can rely on occurrence data, i.e. the fossil record and the case count record. These data are modeled as a sampling of individuals from the full population through time (Foote, 2000, Starrfelt and Liow, 2016).

In recent years, impressive sequencing efforts targeting present-day species and pathogens have enabled the reconstruction of phylogenies. Two main modeling approaches use these trees to quantify past population sizes. First, phylodynamic tools have been developed to fit the birth and death rates of a birth-death process on the reconstructed phylogenetic tree of interest, while integrating over past population sizes (Stadler, 2011, Morlon et al., 2011). To quantify past population sizes, one then typically computes the expected population sizes under these estimated birth and death rates (Morlon et al., 2011, Ratmann et al., 2016, Billaud et al., 2019).

Thus, such population sizes are not directly conditioned on the reconstructed phylogenetic tree. Instead, the statistical signal in the tree is only used to compute rate estimates. Second, phylodynamic tools have been developed to fit the expected population size of a coalescent model on a reconstructed phylogenetic tree. This modeling approach may appear as a better alternative, for it is directly parametrized with the population size that we wish to estimate. However, this comes at the cost of ignoring stochastic fluctuations in small populations (Morlon et al., 2010, Ratmann et al., 2016).

Statistical approaches stemming from the analysis of case count data or from the analysis of reconstructed evolutionary trees have been part of separate bodies of work for many years, historically yielding conflicts between biodiversity estimates based on the fossil record and estimates based on reconstructed phylogenies of extant taxa (Quental and Marshall, 2010 but see also Morlon et al., 2011). A first path towards merging these disparate data was introduced by the fossilized birth-death model of Stadler (2010), which considered a birth-death model with sampling and inclusion of individuals in the tree through time. This made it possible to take into account infection trees reconstructed from pathogen sequences sampled throughout an epidemic (Stadler et al., 2011). In macroevolution, it paved the way to more precise phylogenetic dating using well-conserved fossil taxa which could be placed on a reconstructed phylogeny using morphological characters (Gavryushkina et al., 2016). Less well-conserved fossils (i.e. occurrences) have also been used with this model, using a Markov Chain Monte Carlo (MCMC) scheme to integrate over all possible placements along a fixed tree (Heath et al., 2014). Analytical developments around this new model have been made by Gupta et al. (2019), who derived an analytical formula for the probability density of an outcome of the process, which consists of a reconstructed phylogenetic tree along with a record of occurrences. Again, none of these methods quantifies population sizes directly; they estimate birth and death rates while analytically integrating over population sizes.

Very recently, Vaughan et al. (2019) introduced a Monte-Carlo particle filtering algorithm allowing direct quantification of past population sizes and birth and death rates conditioned on reconstructed phylogenetic trees and occurrences (see Andrieu et al., 2010 for details about particle filtering methods). As such, it can produce more accurate population size estimates than the methods mentioned above as the estimates directly condition on all data, i.e. the occurrence record (e.g. poorly preserved fossils, or case count epidemiological record) and the reconstructed phylogenetic tree.

In this paper, we build on the analytical developments presented by Gupta et al. (2019), to calculate the past population size distribution as originally targeted by Vaughan et al. (2019). Our approach here is more analytic, leading to much faster numerical calculations compared to the particle filtering method previously developed. The efficiency of our method paves the way towards considering much bigger datasets, and towards extending the method to multi-type or density-dependent birth-death processes.

In Section 2, we present the model, notation, and an overview of the strategy to express the targeted distribution. In Section 3, we adapt the main results of Gupta et al. (2019) to compute the probability density of observations made after a given time, conditioned on the past population size. In Section 4, we provide a way to compute the joint density of the past population size and observations made before a given time. Combining results of Sections 3, 4 in Section 5, we compute the distribution of past population sizes conditional on the full outcome of the process, and perform sanity checks against previously published methods achieving similar tasks (Stadler, 2010, Vaughan et al., 2019, Gupta et al., 2019). We finally discuss applications and potential extensions of the model.

2. Model and notation

2.1. Parameters of the process

We consider a population of individuals, any of which can give birth to another individual at rate λ or die at rate μ. The process starts at time $t_{or}$ in the past with one individual, and evolves until reaching present time 0, i.e. time is oriented from the present towards the past. In the rest of the manuscript, something happening at time $t$ will thus always refer to an event taking place $t$ units of time before the present.

We superimpose to this background population dynamics three different sampling schemes. First, individuals can be ψ-sampled at rate ψ throughout their lifetime. When ψ-sampled, the individual will be included in the reconstructed phylogenetic tree. Second, individuals can be ω-sampled at rate ω throughout their lifetime. When ω-sampled, the individual is not included in the reconstructed phylogenetic tree, but its sampling time is nevertheless recorded and called ‘an occurrence’. Last, the process finishes upon reaching the present time 0, and each extant individual at that time is ρ-sampled with fixed probability ρ, leading to their inclusion in the reconstructed phylogenetic tree. The sum of all per-capita rates will be called for short γ=λ+μ+ψ+ω.

Following Vaughan et al. (2019), we also include in the model an effect of the ψ- and ω-sampling through time on the population dynamics. We consider that, upon sampling, an individual is either removed from the process with probability $r \in (0,1)$, or is unaffected by the sampling with probability $1-r$. The overall number of individuals, denoted $(I_t)$, thus follows a linear birth-death process with birth rate λ and death rate μ+(ψ+ω)r. Note that, because the ρ-sampling step occurs here at the end of the process, it does not matter whether or not individuals are removed upon ρ-sampling.

2.2. Introducing useful probabilities

Some aspects of this process have been previously investigated thoroughly. We now introduce two key probabilities. First, we call $u_t$ the probability that a process starting at time $t$ with only one individual remains unsampled up to and including the present time (time 0). We recall that $u_t$ satisfies the ordinary differential equation (ODE) (Maddison et al., 2007)

$$u_0 = z, \qquad \dot{u}_t = \lambda u_t^2 - \gamma u_t + \mu. \tag{2.1}$$

For a given initial condition $z$, the solution is

$$u(t,z) = \frac{x_1 (x_2 - z) - x_2 (x_1 - z)\, e^{-\sqrt{\Delta}\, t}}{(x_2 - z) - (x_1 - z)\, e^{-\sqrt{\Delta}\, t}} \tag{2.2}$$

where $\Delta = \gamma^2 - 4\lambda\mu > (\lambda+\mu)^2 - 4\lambda\mu = (\lambda-\mu)^2 \geq 0$ and $x_1, x_2$ are the two roots of the polynomial $\lambda x^2 - \gamma x + \mu$,

$$x_1 = \frac{\gamma - \sqrt{\Delta}}{2\lambda} \quad\text{and}\quad x_2 = \frac{\gamma + \sqrt{\Delta}}{2\lambda}.$$

Second, we call $p_t$ the probability that a process starting at time $t$ with one individual leads to precisely one sampled individual at present time 0. The ODE governing the evolution of this quantity is

$$p_0 = 1 - z, \qquad \dot{p}_t = \big(2\lambda u(t,z) - \gamma\big)\, p_t, \tag{2.3}$$

with solution

$$p(t,z) = \frac{(1-z)\,\Delta}{\lambda^2}\, \Big((x_2 - z) - (x_1 - z)\, e^{-\sqrt{\Delta}\, t}\Big)^{-2} e^{-\sqrt{\Delta}\, t}. \tag{2.4}$$

These formulas are well known, and correspond respectively to quantities called $p_0(t)$ and $p_1(t)$ in Stadler (2010). When $z = 1-\rho$, we will drop the dependence on $z$ and use the shorter notation $u_t, p_t$. We recall standard ways to derive these expressions in Appendix A.
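As a quick numerical illustration, the closed forms (2.2) and (2.4) can be evaluated directly. The sketch below is not taken from the paper: the function names and the parameter values are made up for illustration, and it only assumes that γ = λ+μ+ψ+ω as defined in Section 2.1.

```python
import math

def quadratic_roots(lam, mu, psi, omega):
    """Roots x1 < x2 of lam*x^2 - gamma*x + mu, together with sqrt(Delta)."""
    gamma = lam + mu + psi + omega
    sqrt_delta = math.sqrt(gamma ** 2 - 4.0 * lam * mu)
    x1 = (gamma - sqrt_delta) / (2.0 * lam)
    x2 = (gamma + sqrt_delta) / (2.0 * lam)
    return x1, x2, sqrt_delta

def u(t, z, lam, mu, psi, omega):
    """Closed form (2.2): solution of the ODE (2.1) with u(0) = z."""
    x1, x2, sd = quadratic_roots(lam, mu, psi, omega)
    e = math.exp(-sd * t)
    return (x1 * (x2 - z) - x2 * (x1 - z) * e) / ((x2 - z) - (x1 - z) * e)

def p(t, z, lam, mu, psi, omega):
    """Closed form (2.4): solution of the ODE (2.3) with p(0) = 1 - z."""
    x1, x2, sd = quadratic_roots(lam, mu, psi, omega)
    e = math.exp(-sd * t)
    return (1.0 - z) * (sd / lam) ** 2 * e / ((x2 - z) - (x1 - z) * e) ** 2

# Hypothetical parameter values, chosen only for the example.
lam, mu, psi, omega, rho = 1.5, 0.5, 0.3, 0.2, 0.7
z = 1.0 - rho
print(u(1.0, z, lam, mu, psi, omega), p(1.0, z, lam, mu, psi, omega))
```

Evaluating both functions at t = 0 should return z and 1 - z respectively, which is a convenient check of any reimplementation.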

2.3. Strategy of the paper

The process with sampling leads to the observation of two distinct objects (T,O) illustrated in Fig. 1.

Fig. 1. General setting of the method. a) The full process with sampling. Pink dots translate as dots in O and correspond to ω-sampling (sampling through time without sequencing). Blue dots translate as dots in T and correspond to ψ-sampling (sampling through time with sequencing). Yellow dots correspond to all present-day ρ-sampling events. Filled or unfilled dots correspond respectively to sampling with or without removal. b) Population size through time. c) Observed occurrences through time. d) Reconstructed phylogenetic tree. e) Number of individuals in reconstructed phylogenetic tree through time. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The reconstructed phylogenetic tree T, on the one hand, represents the evolutionary relationships between all ψ-sampled and ρ-sampled individuals. We further consider that ψ-sampled individuals are labeled either as ‘removed’ or ‘non-removed’. All ψ-sampled removed individuals are necessarily leaves of T, whereas ψ-sampled non-removed ones can either stand as leaves (when the descent of the individual is not sampled) or as vertices along a branch (when the descent of the individual is further sampled), in which case they are referred to as sampled ancestors.

The record of occurrences O, on the other hand, is an ordered list of all ω-sampling times. We also consider that these sampling times are labeled as either ‘removed’ or ‘non-removed’.

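In this paper, we are interested in computing the probability distribution of the number of individuals in the past, conditioned on the observed outcome $(\mathcal{T},\mathcal{O})$ of the process. If $k_t$ denotes the number of sampled lineages in $\mathcal{T}$ at time $t$, our target distribution is

$$\forall t \geq 0,\ \forall i \in \mathbb{N}_0 = \{0,1,2,\ldots\}, \quad K_t(i) := \mathbb{P}(I_t = k_t + i \mid \mathcal{T}, \mathcal{O}). \tag{2.5}$$

We will refer to epochs as the maximal time slices within which no sampling event in $\mathcal{O}$, nor branching event in $\mathcal{T}$, happened. These epochs are delimited by the union of sampling times in $\mathcal{O}$, branching times in the tree $\mathcal{T}$, and sampling times of leaves and sampled ancestors in $\mathcal{T}$. All pooled together, we call these ordered times $(t_h)_{h=0}^{n}$, starting at present time $t_0 = 0$ and ending at the origin time $t_n = t_{or}$.

At any time $t \geq 0$ we also introduce:

  • $\mathcal{T}_t^{\uparrow}$, the tree $\mathcal{T}$ starting at the origin time $t_{or}$ and cut at time $t$;

  • $\mathcal{T}_t^{\downarrow}$, the collection of trees (or forest) obtained by cutting $\mathcal{T}$ at time $t$, and considering all subtrees descending from cut lineages;

  • $\mathcal{O}_t^{\uparrow} := \mathcal{O}|_{(t, t_{or})}$;

  • $\mathcal{O}_t^{\downarrow} := \mathcal{O}|_{(0, t)}$.

The general strategy, and outline, of the paper is the following. We will traverse the tree and record of occurrences breadth-first, i.e. level-by-level through time. In a backward traversal we will compute the probability density of observations made between time $t$ and 0 conditioned on the population size at time $t$. We call this probability density,

$$\forall i \in \mathbb{N}_0, \quad L_t(i) := \mathbb{P}(\mathcal{T}_t^{\downarrow}, \mathcal{O}_t^{\downarrow} \mid I_t = k_t + i). \tag{2.6}$$

In a forward traversal we will then compute the joint probability density of the observations made prior to time $t$ and the population size at time $t$. We call this density,

$$\forall i \in \mathbb{N}_0, \quad M_t(i) := \mathbb{P}(\mathcal{T}_t^{\uparrow}, \mathcal{O}_t^{\uparrow}, I_t = k_t + i). \tag{2.7}$$

Provided we get expressions of $(L_t)_{t=0}^{t_{or}}$ and $(M_t)_{t=0}^{t_{or}}$, our target distribution can then be expressed by combining both, noting that

$$\begin{aligned}
K_t(i) &:= \mathbb{P}(I_t = k_t + i \mid \mathcal{T}, \mathcal{O}) \\
&\propto \mathbb{P}(I_t = k_t + i, \mathcal{T}_t^{\uparrow}, \mathcal{O}_t^{\uparrow}, \mathcal{T}_t^{\downarrow}, \mathcal{O}_t^{\downarrow}) \\
&= \mathbb{P}(\mathcal{T}_t^{\downarrow}, \mathcal{O}_t^{\downarrow} \mid I_t = k_t + i, \mathcal{T}_t^{\uparrow}, \mathcal{O}_t^{\uparrow})\ \mathbb{P}(I_t = k_t + i, \mathcal{T}_t^{\uparrow}, \mathcal{O}_t^{\uparrow}) \\
&= L_t(i)\, M_t(i)
\end{aligned} \tag{2.8}$$

where the last line holds because, conditionally on $I_t = k_t + i$, the future of the (Markov) process is independent of what happened before.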
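In practice, Eq. (2.8) amounts to an element-wise product followed by a normalization. The following minimal sketch assumes that truncated vectors of $L_t(i)$ and $M_t(i)$ values are already available for i = 0, ..., N; the function name and the toy numbers are hypothetical.

```python
def posterior_population_size(L_t, M_t, k_t):
    """Normalize L_t(i) * M_t(i) into a distribution over total population sizes k_t + i."""
    weights = [l * m for l, m in zip(L_t, M_t)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("L_t * M_t has no positive mass on the truncated range")
    return {k_t + i: w / total for i, w in enumerate(weights)}

# Toy values (not derived from any dataset), with k_t = 2 sampled lineages.
K_t = posterior_population_size([0.2, 0.1, 0.05], [0.5, 0.3, 0.2], k_t=2)
for size, prob in K_t.items():
    print(size, prob)
```

The normalizing constant `total` is itself an approximation of the probability density of the data, so it can be monitored across time points as a consistency check.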

In the process of getting the probability density of T,O under the same model, Gupta et al. (2019) provided an analytical formula and an algorithm to compute the first ingredient Lt in the case where all individuals are removed upon sampling (i.e. r=1). We thus recall their main result, and adapt it to our slightly different framework, in the next section.

3. Calculation of $L_t$ – The density of observations below $t$ conditioned on past population size

We start this section by presenting the ODEs satisfied by the probability density $L_t$. This provides us with a numerical algorithm to compute $L_t$, which we subsequently simplify with analytical results for specific sets of parameters.

3.1. Set of ODEs satisfied by $L_t$

We can derive the probability density $L_t$ by studying its evolution through time. First, observe that we can express $L_0$ at present time 0. Indeed, provided we know the exact number of individuals living at time 0, the probability of observing the tips of the tree is directly driven by the ρ-sampling,

$$\forall i \in \mathbb{N}_0, \quad L_0(i) = \rho^{k_0} (1-\rho)^{i}. \tag{3.1}$$

We now derive the ODE driving the evolution of Lt through time across any given epoch. We consider an infinitesimal time step δt and list the events which could have happened in the full process between t+δt and t, leading to our observations. Suppose the number of observed lineages in this epoch is k, and the total number of individuals alive is k+i. We emphasize three cases, illustrated in Fig. 2:

  • 1. nothing happened, with probability $1-\gamma(k+i)\delta t$;

  • 2. a birth event happened
    • (a) among the $k$ sampled lineages in $\mathcal{T}$, and it leads to an extinct or unsampled subtree to the left or to the right, with probability $2\lambda k\,\delta t$;
    • (b) among the $i$ other individuals, with probability $\lambda i\,\delta t$;

  • 3. a death event happened among the $i$ other individuals, with probability $\mu i\,\delta t$.

Fig. 2. Four unobservable scenarios taken into account to derive the ODEs (3.2), (4.1).

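These allow us to write, $\forall i \in \mathbb{N}_0$,

$$L_{t+\delta t}(i) = \big(1-\gamma(k+i)\delta t\big) L_t(i) + \lambda(2k+i)\,\delta t\, L_t(i+1) + \mu i\,\delta t\, L_t(i-1).$$

Note that for $i=0$, $L_t(i-1)$ is not defined, but the term cancels out thanks to the factor $i$.

Subtracting $L_t(i)$ from both sides, dividing by $\delta t$ and letting $\delta t \to 0$, we get the following set of ODEs driving the evolution of $L_t$,

$$\forall i \in \mathbb{N}_0, \quad L_0(i) = \rho^{k_0}(1-\rho)^{i}, \qquad \dot{L}_t(i) = -\gamma(k+i) L_t(i) + \lambda(2k+i) L_t(i+1) + \mu i L_t(i-1). \tag{3.2}$$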

Last, we need to study how Lt changes at punctual events. We call unsampled lineages the lineages that do not appear on the reconstructed phylogenetic tree, i.e. have not been ρ- or ψ-sampled. Note that these unsampled lineages might still be subject to ω-sampling events.

There are 6 types of punctual events that we can come across at time $t$ in the past, listed below and illustrated in Fig. 3. We denote by $L_{t^+}$ the probability just before (i.e. up) the punctual event and by $L_{t^-}$ the probability immediately after (i.e. down). One directly gets $L_{t^+}$ by decomposing it into what must occur below $t^-$, multiplied by the rate of the specific event happening on the infinitesimal time window $(t^-, t^+)$. We can either find,

  • 1. a leaf of $\mathcal{T}$, labeled as removed. This is a ψ-sampling with removal event for which the number of unsampled lineages remains constant, and the number of sampled lineages increases by one (going backward in time). It thus gives,
    $$L_{t^+}(i) = \psi r\, L_{t^-}(i). \tag{3.3}$$

  • 2. a leaf of $\mathcal{T}$, labeled as non-removed. This is a ψ-sampling without removal event for which one of the unsampled lineages becomes a sampled one (going backward in time). It thus gives,
    $$L_{t^+}(i) = \psi (1-r)\, L_{t^-}(i+1). \tag{3.4}$$

  • 3. a sampled ancestor along a branch of $\mathcal{T}$, necessarily labeled as non-removed. This is a ψ-sampling without removal event, not impacting the number of sampled or unsampled lineages. It thus gives,
    $$L_{t^+}(i) = \psi (1-r)\, L_{t^-}(i). \tag{3.5}$$

  • 4. an occurrence in $\mathcal{O}$, labeled as removed. This is an ω-sampling with removal event, for which the number of unsampled lineages increases by one (going backward in time). It thus gives,
    $$L_{t^+}(i) = \omega r\, i\, L_{t^-}(i-1). \tag{3.6}$$

    Note that here also, for $i=0$, $L_{t^-}(-1)$ is not defined but the term cancels out thanks to the factor $i$.

  • 5. an occurrence in $\mathcal{O}$, labeled as non-removed. This is an ω-sampling without removal event, not impacting the number of sampled or unsampled lineages. It thus gives,
    $$L_{t^+}(i) = \omega (k+i)(1-r)\, L_{t^-}(i). \tag{3.7}$$

  • 6. a branching event between two branches of $\mathcal{T}$. The number of sampled lineages decreases by one (going backward in time). It thus gives,
    $$L_{t^+}(i) = \lambda\, L_{t^-}(i). \tag{3.8}$$

Fig. 3. Six observable punctual events in the data.

Note that these updates can be adapted to the case when we do not observe the removal status of individuals. The update corresponding to a leaf of T is the sum of updates (3.3), (3.4), the update corresponding to an occurrence event is the sum of updates (3.6), (3.7), while updates (3.5), (3.8) are unchanged.
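The two summed updates for unlabeled events can be sketched on a truncated vector as follows. This is an illustrative reading of the remark above, not code from the paper; the function names are hypothetical, and `L_minus[i]` stands for $L_{t^-}(i)$ with i = 0, ..., N.

```python
def leaf_update_unlabeled(L_minus, psi, r):
    """Sum of updates (3.3) and (3.4): a leaf whose removal status is unknown."""
    n = len(L_minus)
    out = []
    for i in range(n):
        shifted = L_minus[i + 1] if i + 1 < n else 0.0  # L(N+1) is taken as 0 by truncation
        out.append(psi * r * L_minus[i] + psi * (1.0 - r) * shifted)
    return out

def occurrence_update_unlabeled(L_minus, omega, r, k):
    """Sum of updates (3.6) and (3.7): an occurrence whose removal status is unknown."""
    out = []
    for i in range(len(L_minus)):
        below = L_minus[i - 1] if i > 0 else 0.0  # multiplied by i, so irrelevant at i = 0
        out.append(omega * r * i * below + omega * (1.0 - r) * (k + i) * L_minus[i])
    return out

# Toy vector and parameters, for illustration only.
print(leaf_update_unlabeled([1.0, 0.5, 0.25], psi=0.4, r=0.5))
print(occurrence_update_unlabeled([1.0, 0.5, 0.25], omega=0.2, r=0.5, k=2))
```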

This set of ODEs (3.2) together with update Eqs. (3.3), (3.4), (3.5), (3.6), (3.7), (3.8) can be numerically approximated. To do so, we fix a finite upper bound N on the number of hidden individuals and numerically integrate a truncated ODE system. We detail this in the following algorithm to compute an approximation of Lt at any time t.

Algorithm 1: Computes a numerical approximation of $L_t$ for a specific set of times
Input:
 Observed tree and occurrence data $(\mathcal{T},\mathcal{O})$,
 parameters $(t_{or},\lambda,\mu,\psi,\omega,\rho,r)$,
 set of time points $(\tau_j)_{j=1}^{S}$ for which we want to compute the density $L_{\tau_j}(i)$,
 and the truncation $N$ setting the accuracy of the algorithm.
Output:
 A numerical approximation of $L_t$ at times $(\tau_j)_{j=1}^{S}$, i.e. $(L_{\tau_j}(i))$ for $i \in \{0,1,\ldots,N\}$ and $j \in \{1,2,\ldots,S\}$.
Pool all $(\tau_j)$ and all branching and sampling times of $(\mathcal{T},\mathcal{O})$ in an ordered list $(t_h)_{h=1}^{n}$, with $t_0 = 0$
Set $j=1$ and initialize $B$ as an $S \times (N+1)$ empty matrix
Set $\forall i \in \{0,1,\ldots,N\}$, $L_0(i) = \rho^{k_0}(1-\rho)^{i}$
for $h = 1, 2, \ldots, n$ do
  Numerically solve the ODE $\dot{L}_t = A L_t$ on $(t_{h-1}, t_h)$, by computing $L_{t_h} = e^{(t_h - t_{h-1})A} L_{t_{h-1}}$,
  where $k$ is the number of sampled lineages on $(t_{h-1}, t_h)$ and $A$ is the $(N+1)\times(N+1)$ tridiagonal matrix with entries
    $A(i,i) = -\gamma(k+i)$ for $i \in \{0,1,\ldots,N\}$,
    $A(i,i+1) = \lambda(2k+i)$ for $i \in \{0,1,\ldots,N-1\}$,
    $A(i,i-1) = \mu i$ for $i \in \{1,2,\ldots,N\}$
  if $t_h = \tau_j$ then
    Record $\forall i$, $B(j,i) = L_{t_h}(i)$
    Set $j = j+1$
  end if
  if $t_h = t_n$ or $t_h = \tau_S$ then
    return $B$
  else if $t_h$ is a removed leaf then
    Set $L_{t_h^+} = \psi r\, L_{t_h^-}$
  else if $t_h$ is a non-removed leaf then
    Set $L_{t_h^+}(i) = \psi(1-r)\, L_{t_h^-}(i+1)$ for all $i \in \{0,\ldots,N-1\}$, and $L_{t_h^+}(N) = 0$
  else if $t_h$ is a sampled ancestor then
    Set $L_{t_h^+} = \psi(1-r)\, L_{t_h^-}$
  else if $t_h$ is a removed occurrence then
    Set $L_{t_h^+}(i) = \omega r\, i\, L_{t_h^-}(i-1)$ for all $i \in \{1,\ldots,N\}$, and $L_{t_h^+}(0) = 0$
  else if $t_h$ is a non-removed occurrence then
    Set $L_{t_h^+}(i) = \omega(1-r)(k+i)\, L_{t_h^-}(i)$
  else if $t_h$ is a branching event then
    Set $L_{t_h^+} = \lambda\, L_{t_h^-}$
  end if
end for
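The backward traversal can be sketched in plain Python as below. This is a simplified illustration rather than the authors' implementation: it only returns $L_{t_{or}}(0)$, it replaces the matrix exponential with a classical fourth-order Runge-Kutta integrator so that no external library is needed, and the function names, the event labels and the `max_step` parameter are all hypothetical.

```python
import math

def rhs_L(L, k, lam, mu, gamma):
    """Right-hand side of the ODE system (3.2), truncated at N = len(L) - 1."""
    N = len(L) - 1
    out = []
    for i in range(N + 1):
        up = L[i + 1] if i < N else 0.0    # L(N+1) is dropped by the truncation
        down = L[i - 1] if i > 0 else 0.0  # multiplied by i, so irrelevant at i = 0
        out.append(-gamma * (k + i) * L[i] + lam * (2 * k + i) * up + mu * i * down)
    return out

def propagate_L(L, k, duration, lam, mu, gamma, max_step=1e-3):
    """Integrate (3.2) over one epoch with RK4 (stand-in for the matrix exponential)."""
    steps = max(1, math.ceil(duration / max_step))
    h = duration / steps
    for _ in range(steps):
        k1 = rhs_L(L, k, lam, mu, gamma)
        k2 = rhs_L([x + 0.5 * h * d for x, d in zip(L, k1)], k, lam, mu, gamma)
        k3 = rhs_L([x + 0.5 * h * d for x, d in zip(L, k2)], k, lam, mu, gamma)
        k4 = rhs_L([x + h * d for x, d in zip(L, k3)], k, lam, mu, gamma)
        L = [x + h / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(L, k1, k2, k3, k4)]
    return L

def apply_event_L(L, kind, k, lam, psi, omega, r):
    """Updates (3.3)-(3.8); k is the number of sampled lineages just below the event."""
    N = len(L) - 1
    if kind == "removed_leaf":
        return [psi * r * x for x in L]
    if kind == "nonremoved_leaf":
        return [psi * (1 - r) * (L[i + 1] if i < N else 0.0) for i in range(N + 1)]
    if kind == "sampled_ancestor":
        return [psi * (1 - r) * x for x in L]
    if kind == "removed_occurrence":
        return [omega * r * i * (L[i - 1] if i > 0 else 0.0) for i in range(N + 1)]
    if kind == "nonremoved_occurrence":
        return [omega * (1 - r) * (k + i) * L[i] for i in range(N + 1)]
    if kind == "branching":
        return [lam * x for x in L]
    raise ValueError("unknown event kind: " + kind)

def density_of_data(events, t_or, k0, lam, mu, psi, omega, rho, r, N=40):
    """Approximate L_{t_or}(0); events is a list of (time, kind) sorted by increasing time."""
    gamma = lam + mu + psi + omega
    L = [rho ** k0 * (1 - rho) ** i for i in range(N + 1)]  # Eq. (3.1)
    # Change in the number of sampled lineages when crossing an event backward in time.
    lineage_change = {"removed_leaf": 1, "nonremoved_leaf": 1, "branching": -1}
    k, t_prev = k0, 0.0
    for t, kind in events:
        L = propagate_L(L, k, t - t_prev, lam, mu, gamma)
        L = apply_event_L(L, kind, k, lam, psi, omega, r)
        k += lineage_change.get(kind, 0)
        t_prev = t
    return propagate_L(L, k, t_or - t_prev, lam, mu, gamma)[0]

# Toy dataset: two extant tips coalescing at time 1.0, origin at time 2.0.
print(density_of_data([(1.0, "branching")], t_or=2.0, k0=2,
                      lam=1.5, mu=0.5, psi=0.3, omega=0.0, rho=0.7, r=0.5))
```

An explicit integrator such as RK4 needs a step size small relative to 1/(γ(k+N)), which is why `max_step` is kept small here; a matrix exponential, as in the algorithm above, avoids that constraint.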

We also define a slight variation of this algorithm, which we will refer to as Algorithm 1’, where no set of time points $(\tau_j)$ is required, and the values of $L_t$ are not recorded through time (i.e. matrix $B$ disappears). Instead, when reaching $t_n = t_{or}$ we simply return $L_{t_{or}}(0)$, which by definition is an estimate of the probability density of $(\mathcal{T},\mathcal{O})$. Note that this strategy is identical to what has been used to compute the probability density of a reconstructed phylogenetic tree under a logistic birth-death process (Leventhal et al., 2013).

These two algorithms will prove useful to deal with the general case. Furthermore, we may obtain analytical expressions for Lt when ω=0 as well as when r=1 (Gupta et al., 2019). We reveal these in the next two subsections.

3.2. Special case ω=0

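Suppose we can express $L_t(i)$ as the product $L_t(i) = u_t^{i} W_t$, where $W_t$ is a function of time only, and $u_t$ is defined as in Eq. (2.2). We first get, from the initialization in Eq. (3.2), that $W_0 = \rho^{k_0}$. Moreover, substituting $u_t^{i} W_t$ in the ODE leads to

$$\dot{L}_t(i) = i u_t^{i-1} \dot{u}_t W_t + u_t^{i} \dot{W}_t = \big(\lambda i u_t^{i+1} - \gamma i u_t^{i} + \mu i u_t^{i-1}\big) W_t + u_t^{i} \dot{W}_t.$$

This leads to the following ODE for $W_t$, on any epoch $(t_h, t_{h+1})$ where the number of sampled lineages remains fixed and equal to $k$,

$$\begin{aligned}
u_t^{i} \dot{W}_t &= \big(-\gamma(i+k) u_t^{i} + \lambda(2k+i) u_t^{i+1} + \mu i u_t^{i-1} - \lambda i u_t^{i+1} + \gamma i u_t^{i} - \mu i u_t^{i-1}\big) W_t \\
\dot{W}_t &= (2\lambda u_t - \gamma)\, k\, W_t.
\end{aligned}$$

This is very close to the ODE (2.3) governing the evolution of $p_t$, and it leads to (see derivation in Appendix A),

$$\forall t \in (t_h, t_{h+1}), \quad W_t = W_{t_h} \left(\frac{p_t}{p_{t_h}}\right)^{k}. \tag{3.9}$$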
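For readers who want the intermediate step, Eq. (3.9) follows from comparing the logarithmic derivatives of $W_t$ and $p_t$. The short derivation below is a sketch that uses only Eq. (2.3) and the ODE for $W_t$ above.

```latex
\begin{aligned}
\frac{\dot{W}_t}{W_t} &= k\,(2\lambda u_t - \gamma) = k\,\frac{\dot{p}_t}{p_t}
  && \text{by Eq. (2.3)} \\
\frac{d}{dt}\ln W_t &= \frac{d}{dt}\ln p_t^{\,k} \\
\ln\frac{W_t}{W_{t_h}} &= k\,\ln\frac{p_t}{p_{t_h}}
  && \text{integrating over } (t_h, t) \\
W_t &= W_{t_h}\left(\frac{p_t}{p_{t_h}}\right)^{k}.
\end{aligned}
```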

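Last, because $\omega = 0$, updates (3.3), (3.4), (3.5), (3.6), (3.7), (3.8) simplify to only the following ψ- and λ-events,

$$\begin{aligned}
&\text{if } t \text{ is a removed leaf,} & W_{t^+} &= \psi r\, W_{t^-} && (3.10) \\
&\text{if } t \text{ is a non-removed leaf,} & W_{t^+} &= \psi (1-r)\, u_t\, W_{t^-} && (3.11) \\
&\text{if } t \text{ is a sampled ancestor,} & W_{t^+} &= \psi (1-r)\, W_{t^-} && (3.12) \\
&\text{if } t \text{ is a branching time,} & W_{t^+} &= \lambda\, W_{t^-}. && (3.13)
\end{aligned}$$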

Combining these updates with Eq. (3.9) leads to the following proposition.

Proposition 3.1

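When $\omega = 0$, at any time $t$ across epoch $(t_h, t_{h+1})$, considering that we observed so far (i.e. on $(0, t_{h+1})$) $v$ sampled ancestors, $w$ removed leaves at times $t_j \in \mathcal{W}$, $x$ branching events at times $t_j \in \mathcal{X}$, and $y$ non-removed leaves at times $t_j \in \mathcal{Y}$, we get,

$$L_t(i) = u_t^{i} W_t \quad\text{where}\quad W_t = \lambda^{x} \psi^{v+w+y} (1-r)^{v+y} r^{w}\, p_t^{k_t} \prod_{t_j \in \mathcal{X}} p_{t_j} \prod_{t_j \in \mathcal{Y}} u_{t_j} p_{t_j}^{-1} \prod_{t_j \in \mathcal{W}} p_{t_j}^{-1}.$$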

Proof

We prove this proposition by induction across the epochs in Appendix E, using as the main arguments the equation updates (3.10), (3.11), (3.12), (3.13), combined with Eq. (3.9).

Note that this proposition is very similar to what is presented in Section 3 by Gupta et al. (2019). We nevertheless need to highlight two differences.

The first one is that we allow here for removal or not of the individual upon sampling, with a given probability r, whereas Gupta et al. (2019) considered that all individuals were removed upon sampling (r=1), and Stadler (2010) considered that individuals were not removed upon sampling (r=0).

The second difference concerns the underlying framework under which we derive our results. In Gupta et al. (2019), individuals were distinguishable (say, each one is assigned a number and they can be ordered), whereas in the present paper they are not. When individuals are ordered, the probability density $L_t(i)$ is multiplied by a factor $\frac{(k+i)!}{i!}$, which is the number of ways we can arrange $k+i$ elements in a list of size $k$, i.e. the number of ordered configurations of hidden individuals.
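The conversion factor between the two conventions is a falling factorial, which Python exposes as `math.perm`. The helper names below are hypothetical and the values are toy numbers; the sketch only illustrates the factor $(k+i)!/i!$ described above.

```python
import math

def ordering_factor(k, i):
    """(k+i)!/i!: number of ordered lists of size k drawn from k+i individuals."""
    return math.perm(k + i, k)

def to_distinguishable(L_values, k):
    """Rescale L_t(i), i = 0..N, to the distinguishable-individuals convention."""
    return [ordering_factor(k, i) * value for i, value in enumerate(L_values)]

print(ordering_factor(2, 3))
print(to_distinguishable([0.5, 0.25], k=2))
```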

Note that, when reaching the origin of the tree, the formula in Proposition 3.1 reduces to a very similar formula for the probability density of T because i=0 and k=1. We summarize this as the following corollary.

Corollary 3.1.1

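When $\omega = 0$, the probability density of a reconstructed tree $\mathcal{T}$ with $v$ sampled ancestors, $w$ removed leaves at times $t_j \in \mathcal{W}$, $y$ non-removed leaves at times $t_j \in \mathcal{Y}$, and branching events at times $t_j \in \mathcal{X}$, is

$$\mathbb{P}(\mathcal{T}) = \lambda^{w+y+k_0-1} \psi^{v+w+y} (1-r)^{v+y} r^{w} \prod_{t_j \in \mathcal{X} \cup \{t_{or}\}} p_{t_j} \prod_{t_j \in \mathcal{Y}} u_{t_j} p_{t_j}^{-1} \prod_{t_j \in \mathcal{W}} p_{t_j}^{-1}. \tag{3.14}$$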

Proof

It directly follows from Proposition 3.1, by noting that $\mathbb{P}(\mathcal{T}) = L_{t_{or}}(0)$. Note also that a rooted binary tree with $w+y+k_0$ leaves necessarily has $x = w+y+k_0-1$ branching times.

Note that this formula is a straightforward generalization of formulas provided in Stadler (2010) (where r=0) or Stadler et al. (2011) (where ρ=0).
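Formula (3.14) is straightforward to evaluate in log space. The sketch below assumes ω = 0 and ρ > 0, and takes the event times as plain lists; the function names and the argument layout are hypothetical choices made for this illustration.

```python
import math

def u_and_p(t, z, lam, mu, gamma):
    """Closed forms (2.2) and (2.4) evaluated at time t for initial condition z."""
    sd = math.sqrt(gamma ** 2 - 4.0 * lam * mu)
    x1, x2 = (gamma - sd) / (2.0 * lam), (gamma + sd) / (2.0 * lam)
    e = math.exp(-sd * t)
    denom = (x2 - z) - (x1 - z) * e
    u = (x1 * (x2 - z) - x2 * (x1 - z) * e) / denom
    p = (1.0 - z) * (sd / lam) ** 2 * e / denom ** 2
    return u, p

def log_tree_density(t_or, branch_times, removed_leaf_times, nonremoved_leaf_times,
                     n_sampled_ancestors, k0, lam, mu, psi, rho, r):
    """Logarithm of Eq. (3.14), valid when omega = 0 and rho > 0."""
    gamma = lam + mu + psi  # omega = 0 in this special case
    z = 1.0 - rho
    v, w, y = n_sampled_ancestors, len(removed_leaf_times), len(nonremoved_leaf_times)
    if len(branch_times) != w + y + k0 - 1:
        raise ValueError("a binary tree with w+y+k0 leaves has w+y+k0-1 branching times")

    def power_term(count, base):
        # Skip the logarithm when the exponent is zero (e.g. r = 1 and no non-removed leaf).
        return count * math.log(base) if count > 0 else 0.0

    total = (power_term(w + y + k0 - 1, lam) + power_term(v + w + y, psi)
             + power_term(v + y, 1.0 - r) + power_term(w, r))
    for t in list(branch_times) + [t_or]:
        total += math.log(u_and_p(t, z, lam, mu, gamma)[1])
    for t in nonremoved_leaf_times:
        u_t, p_t = u_and_p(t, z, lam, mu, gamma)
        total += math.log(u_t) - math.log(p_t)
    for t in removed_leaf_times:
        total -= math.log(u_and_p(t, z, lam, mu, gamma)[1])
    return total

# Toy tree: one extant tip, one removed leaf at time 0.5, branching at 1.5, origin at 2.0.
print(log_tree_density(2.0, [1.5], [0.5], [], 0, 1,
                       lam=1.5, mu=0.5, psi=0.3, rho=0.7, r=0.5))
```

Note that the times of sampled ancestors do not enter the formula, only their number $v$, which is why the function takes a count rather than a list for them.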

3.3. Special case r=1

When $r = 1$, only three kinds of punctual events, corresponding to updates (3.3), (3.6), (3.8), need to be taken into account. Because the number of unsampled individuals $i$ enters formula (3.6), the simple expression $L_t(i) = u_t^{i} W_t$ cannot be used anymore, and one needs to find another expression. This has already been done in Gupta et al. (2019) and we only need to adapt their result to our slightly different framework.

Proposition 3.2

When $r = 1$, we can compute the $L_t(i)$ values at any time $t$ as

$$L_t(i) = \sum_{\ell=0}^{q} \frac{i!}{(i-\ell)!}\, u_t^{i-\ell}\, W_t(\ell),$$

where $W_t$ is a $q$ dimensional time-varying vector which can be computed following Algorithm 2 in Gupta et al. (2019).

Proof

The proof relies on the definition of a distinguishable version of the probability $L_t(i)$ as

$$\tilde{L}_t(i) = \frac{(k+i)!}{i!}\, L_t(i) \tag{3.15}$$

which allows us to use results previously derived in Gupta et al. (2019). Details are provided in Appendix B.

Note that when there is no ω-sampling, then $q = 0$ for all times and $W_t(0)$ is the same as $W_t$ defined in the previous section.

This ends our section on the computation of Lt. It thus remains to (i) present a way to compute Mt and (ii) combine Lt and Mt to get the target distribution Kt at any time t. We do this in turn in the next two sections.

4. Calculation of $M_t$ – the joint density of observations above $t$ and past population size

Recall that we are now interested in computing the joint density of observations above time $t$ and past population size at time $t$, i.e. $\forall i \in \mathbb{N}_0$, $M_t(i) := \mathbb{P}(\mathcal{T}_t^{\uparrow}, \mathcal{O}_t^{\uparrow}, I_t = k_t + i)$. We start by presenting the ODEs satisfied by $M_t$, before turning to its resolution for specific parameter sets. The approach is very similar to the one presented in the previous section to compute $L_t$, with the slight difference that we will need to traverse the tree forward in time instead of backward in time.

4.1. Set of ODEs satisfied by $M_t$

At the time of origin of the process $t_{or}$, we only observe one starting lineage in $\mathcal{T}_{t_{or}}^{\uparrow}$. This provides us with the following initialization condition on $M$,

$$M_{t_{or}}(i) = \mathbb{P}(I_{t_{or}} = 1 + i) = \mathbb{1}_{i=0}.$$

We then derive the ODEs driving the evolution of $M_t$ across an epoch on which the number of observed lineages is fixed and equal to $k$. Suppose we know $M_t$, and we observe no punctual event on the infinitesimal time interval $(t-\delta t, t)$. Unobservable events have already been illustrated in Fig. 2. This allows us to write

$$M_{t-\delta t}(i) = \big(1-\gamma(i+k)\delta t\big) M_t(i) + \lambda(2k+i-1)\,\delta t\, \mathbb{1}_{i>0}\, M_t(i-1) + \mu(i+1)\,\delta t\, M_t(i+1).$$

Subtracting $M_t(i)$ from both sides, multiplying by $-1$, dividing by $\delta t$ and letting $\delta t \to 0$, we get the following set of ODEs driving the evolution of $M_t$,

$$\forall i \in \mathbb{N}_0, \quad M_{t_{or}}(i) = \mathbb{1}_{i=0}, \qquad \dot{M}_t(i) = \gamma(i+k) M_t(i) - \lambda(2k+i-1)\, \mathbb{1}_{i>0}\, M_t(i-1) - \mu(i+1) M_t(i+1). \tag{4.1}$$

Last, we need to take into account the evolution of $M_t$ at punctual events. Again, there are 6 types of punctual events that we can come across at time $t$ in the past, listed below and illustrated in Fig. 3. We denote by $M_{t^-}$ the probability just after (i.e. below) the punctual event and by $M_{t^+}$ the probability immediately before (i.e. up). Because we are here deriving $M_t$ forward in time, one needs to carefully note differences with results derived in Section 3 relating to the number of lineages before and after the event. We can indeed find the same punctual events, namely,

  • 1. a leaf of $\mathcal{T}$, labeled as removed. This is a ψ-sampling with removal event for which the number of sampled lineages decreases by one and the number of unsampled lineages remains unchanged. This gives,
    $$M_{t^-}(i) = \psi r\, M_{t^+}(i). \tag{4.2}$$

  • 2. a leaf of $\mathcal{T}$, labeled as non-removed. This is a ψ-sampling without removal event for which one sampled lineage becomes unsampled. This gives,
    $$M_{t^-}(i) = \psi (1-r)\, \mathbb{1}_{i>0}\, M_{t^+}(i-1). \tag{4.3}$$

  • 3. a sampled ancestor along a branch of $\mathcal{T}$, necessarily labeled as non-removed. This is a ψ-sampling without removal event which does not affect the number of lineages. It gives,
    $$M_{t^-}(i) = \psi (1-r)\, M_{t^+}(i). \tag{4.4}$$

  • 4. an occurrence in $\mathcal{O}$, labeled as removed. This is an ω-sampling with removal event, for which the number of unsampled lineages decreases by one. This gives,
    $$M_{t^-}(i) = \omega r\, (i+1)\, M_{t^+}(i+1). \tag{4.5}$$

  • 5. an occurrence in $\mathcal{O}$, labeled as non-removed. This is an ω-sampling without removal event which does not affect the number of lineages. It gives,
    $$M_{t^-}(i) = (k+i)\, \omega (1-r)\, M_{t^+}(i). \tag{4.6}$$

  • 6. a branching event between two branches of $\mathcal{T}$. This is a λ-event increasing the number of sampled lineages by one. This gives,
    $$M_{t^-}(i) = \lambda\, M_{t^+}(i). \tag{4.7}$$

Finally, upon reaching present time 0, one needs to take into account the ρ-sampling, leading to the following update,

$$M_{0^-}(i) = (1-\rho)^{i} \rho^{k_0}\, M_{0^+}(i). \tag{4.8}$$

Note, as for $L_t$, that these updates can be adapted to the case when we do not observe the removal status of individuals. The update corresponding to a leaf of T is the sum of updates (4.2), (4.3), the update corresponding to an occurrence event is the sum of updates (4.5), (4.6), while updates (4.4), (4.7) are unchanged.

As already shown for $L_t$, we can build a similar algorithm to compute $M_t$ in the general case, relying on a numerical ODE solver for approximating Eq. (4.1). As with Algorithm 1’, previously introduced to compute the probability density of $(\mathcal{T},\mathcal{O})$, a slight variation of this algorithm would allow one to compute an estimate of the probability density of $(\mathcal{T},\mathcal{O})$ by summing the $M_{0^-}(i)$, i.e. the values obtained after the ρ-sampling update (4.8), over all $i$. Note that this strategy is identical to what has been used to compute the probability density of a reconstructed phylogenetic tree under a logistic birth-death process (Etienne et al., 2012, Laudanno et al., 2020).

While this approach is in theory a good approximation, it requires arbitrarily fixing a truncation parameter $N$, and exponentiating matrices of dimension $N \times N$, leading to potential speed or accuracy issues. In the remainder of this section, we derive analytical results to avoid resorting to a numerical ODE solver in specific cases.
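To make the forward traversal concrete, the sketch below integrates the truncated system (4.1) from the origin down to the present for the simplest possible dataset: a single sampled lineage (k = 1) and no punctual event. It is an illustration under stated assumptions, not the authors' code: a classical Runge-Kutta scheme stands in for the matrix exponential, and all function names and parameter values are hypothetical.

```python
import math

def rhs_M(M, k, lam, mu, gamma):
    """dM/ds in forward time s = t_or - t, i.e. minus the right-hand side of ODE (4.1)."""
    N = len(M) - 1
    out = []
    for i in range(N + 1):
        down = M[i - 1] if i > 0 else 0.0  # indicator 1_{i>0}
        up = M[i + 1] if i < N else 0.0    # M(N+1) is dropped by the truncation
        out.append(-gamma * (i + k) * M[i] + lam * (2 * k + i - 1) * down
                   + mu * (i + 1) * up)
    return out

def propagate_M(M, k, duration, lam, mu, gamma, max_step=1e-3):
    """Move M forward in time over one epoch with classical RK4."""
    steps = max(1, math.ceil(duration / max_step))
    h = duration / steps
    for _ in range(steps):
        k1 = rhs_M(M, k, lam, mu, gamma)
        k2 = rhs_M([x + 0.5 * h * d for x, d in zip(M, k1)], k, lam, mu, gamma)
        k3 = rhs_M([x + 0.5 * h * d for x, d in zip(M, k2)], k, lam, mu, gamma)
        k4 = rhs_M([x + h * d for x, d in zip(M, k3)], k, lam, mu, gamma)
        M = [x + h / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(M, k1, k2, k3, k4)]
    return M

def rho_update(M, k0, rho):
    """Update (4.8), applied upon reaching the present."""
    return [(1.0 - rho) ** i * rho ** k0 * m for i, m in enumerate(M)]

def density_single_lineage(t_or, lam, mu, psi, omega, rho, N=60):
    """Density of a one-tip tree with no occurrence, obtained by summing M_{0^-}(i)."""
    gamma = lam + mu + psi + omega
    M = [1.0] + [0.0] * N  # M_{t_or}(i) = 1_{i=0}
    M = propagate_M(M, 1, t_or, lam, mu, gamma)
    return sum(rho_update(M, 1, rho))

print(density_single_lineage(1.0, lam=1.5, mu=0.5, psi=0.3, omega=0.2, rho=0.7))
```

For this particular dataset the result can be compared with the closed form $p_{t_{or}}$ of Eq. (2.4), which gives a simple way to choose the truncation N and the step size.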

4.2. The corresponding generating function

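We now introduce the generating function corresponding to the density $M_t$, which will prove useful to get analytical results,

$$\hat{M}(t,z) := \sum_{i=0}^{\infty} z^{i} M_t(i).$$

The initial condition on $M$ translates into $\forall z,\ \hat{M}(t_{or}, z) = 1$. The ODE (4.1) furthermore translates into the following partial differential equation (PDE),

$$\begin{aligned}
\partial_t \hat{M} &= \sum_{i=0}^{\infty} z^{i} \Big(\gamma(i+k) M_t(i) - \lambda(2k+i-1)\, \mathbb{1}_{i>0}\, M_t(i-1) - \mu(i+1) M_t(i+1)\Big) \\
&= \gamma k \sum_{i=0}^{\infty} z^{i} M_t(i) + \gamma \sum_{i=1}^{\infty} i z^{i} M_t(i) - \lambda \sum_{i=0}^{\infty} z^{i+1} (2k+i) M_t(i) - \mu \sum_{i=1}^{\infty} i z^{i-1} M_t(i) \\
&= \gamma k \hat{M} + \gamma z\, \partial_z \hat{M} - 2k\lambda z \hat{M} - \lambda z^{2}\, \partial_z \hat{M} - \mu\, \partial_z \hat{M} \\
&= -k(2\lambda z - \gamma) \hat{M} - (\lambda z^{2} - \gamma z + \mu)\, \partial_z \hat{M}.
\end{aligned}$$

Our target generating function $\hat{M}$ is thus the solution of the following PDE problem across a given epoch $(t_{h-1}, t_h)$, on which the number of observed lineages remains constant and equal to $k$,

$$\hat{M}(t_h, z) = F(z), \qquad \partial_t \hat{M} + (\lambda z^{2} - \gamma z + \mu)\, \partial_z \hat{M} + k(2\lambda z - \gamma) \hat{M} = 0. \tag{4.9}$$

Solving this PDE problem allows us to obtain an analytical expression of $\hat{M}$ for any time across an epoch, provided we know the expression of $\hat{M}(t_h, z)$ at the end of the epoch.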

Proposition 4.1

The solution to the PDE problem (4.9) is given by

$$\hat{M}(t,z) = F\big(u(t_h-t,z)\big)\,R(t_h-t,z)^{k}$$

where we introduce R(t,z)=p(t,z)/(1-z) to ease the notation.

Proof

We used the method of characteristics to solve this first order linear PDE, see derivations in Appendix C.
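As an illustrative sanity check of Proposition 4.1 (a sketch, not the authors' implementation), one can integrate the truncated ODE system across a single epoch starting from M(i)=1 for i=0 and 0 otherwise (so F=1), and compare with the coefficients of R(s,z)^k, where s=t_h-t. The sketch assumes γ=λ+μ+ψ+ω and uses the closed form of p given in Appendix A.2; the function names, the plain RK4 integrator and the parameter values are hypothetical choices made for the example.

```python
import math


def closed_form_M(i, s, k, lam, mu, psi, omega):
    """Coefficient of z^i in R(s, z)^k, i.e. Proposition 4.1 with F = 1."""
    gamma = lam + mu + psi + omega            # assumed definition of gamma
    sq = math.sqrt(gamma ** 2 - 4 * lam * mu)  # square root of the discriminant
    x1 = (gamma - sq) / (2 * lam)
    x2 = (gamma + sq) / (2 * lam)
    e = math.exp(-sq * s)
    r0 = (sq / lam) ** 2 * e / (x2 - x1 * e) ** 2   # R(s, 0)
    a = (1 - e) / (x2 - x1 * e)                     # R(s, z) = R(s, 0) * (1 - a z)^-2
    return r0 ** k * math.comb(2 * k + i - 1, i) * a ** i


def rhs(M, k, lam, mu, gamma):
    """Truncated ODE (4.1) written in s = t_h - t, so every sign is flipped."""
    N = len(M) - 1
    out = []
    for i in range(N + 1):
        d = -gamma * (k + i) * M[i]
        if i > 0:
            d += lam * (2 * k + i - 1) * M[i - 1]
        if i < N:
            d += mu * (i + 1) * M[i + 1]
        out.append(d)
    return out


def integrate_M(s, k, lam, mu, psi, omega, N=40, steps=2000):
    """Classical RK4 from M(i) = 1_{i=0} over a duration s within one epoch."""
    gamma = lam + mu + psi + omega
    h = s / steps
    M = [1.0] + [0.0] * N
    for _ in range(steps):
        k1 = rhs(M, k, lam, mu, gamma)
        k2 = rhs([m + 0.5 * h * d for m, d in zip(M, k1)], k, lam, mu, gamma)
        k3 = rhs([m + 0.5 * h * d for m, d in zip(M, k2)], k, lam, mu, gamma)
        k4 = rhs([m + h * d for m, d in zip(M, k3)], k, lam, mu, gamma)
        M = [m + h / 6 * (a + 2 * b + 2 * c + d)
             for m, a, b, c, d in zip(M, k1, k2, k3, k4)]
    return M


params = dict(s=0.5, k=2, lam=1.5, mu=1.0, psi=0.3, omega=0.2)
numeric = integrate_M(**params)
exact = [closed_form_M(i, **params) for i in range(4)]
```

With these values the first entries of `numeric` should agree with `exact` up to the integration error, provided the truncation N is large enough for the mass above N to be negligible.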

Between epochs, one must also update M^ according to punctual events taking place. Previously presented updates of M (Eqs. (4.2), (4.3), (4.4), (4.5), (4.6), (4.7)) translate into the following updates for M^,

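if t is a removed leaf,

$$\hat{M}(t^-,z)=\sum_{i=0}^{\infty} z^i\,\psi r\,M_{t^+}(i)=\psi r\,\hat{M}(t^+,z) \tag{4.10}$$

if t is a non-removed leaf,

$$\hat{M}(t^-,z)=\sum_{i=0}^{\infty} z^i\,\psi(1-r)\,\mathbb{1}_{i>0}\,M_{t^+}(i-1)=\psi(1-r)\,z\,\hat{M}(t^+,z) \tag{4.11}$$

if t is a sampled ancestor,

$$\hat{M}(t^-,z)=\sum_{i=0}^{\infty} z^i\,\psi(1-r)\,M_{t^+}(i)=\psi(1-r)\,\hat{M}(t^+,z) \tag{4.12}$$

if t is a removed occurrence,

$$\hat{M}(t^-,z)=\sum_{i=0}^{\infty} z^i\,\omega r\,(i+1)\,M_{t^+}(i+1)=\omega r\,\partial_z\hat{M}(t^+,z) \tag{4.13}$$

if t is a non-removed occurrence,

$$\hat{M}(t^-,z)=\sum_{i=0}^{\infty} z^i\,\omega(1-r)(k+i)\,M_{t^+}(i)=\omega(1-r)\Big(k\,\hat{M}(t^+,z)+z\,\partial_z\hat{M}(t^+,z)\Big) \tag{4.14}$$

if t is a branching event,

$$\hat{M}(t^-,z)=\sum_{i=0}^{\infty} z^i\,\lambda\,M_{t^+}(i)=\lambda\,\hat{M}(t^+,z). \tag{4.15}$$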

If we are interested in the distribution at some point, we can thus start the formula at tor with F(z)=1, and then iteratively alternate between the updates at punctual events and the use of Proposition 4.1 over each epoch. When reaching present time 0, the step of ρ-sampling expressed in Eq. (4.8) moreover translates into,

$$\hat{M}(0^-,z)=\sum_{i=0}^{\infty} z^i(1-\rho)^i\rho^{k_0}M_{0^+}(i)=\rho^{k_0}\,\hat{M}\big(0^+,(1-\rho)z\big). \tag{4.16}$$

While this procedure in theory allows us to get the analytical formula of M^ at any time, updates (4.13), (4.14) require differentiating the generating function, greatly complicating the expression of the function after a few occurrences. When ω=0, these two updates disappear and a nice recursion leads to a closed-form formula that we will detail in Proposition 4.3.

We implemented this procedure in SageMath, a programming language able to deal with symbolic calculus. We were however not able to make it find concise expressions, and computing these successive derivatives was too time-consuming to be applicable to standard datasets in the field.

Instead, when ω≠0, we suggest another strategy for computing the Mt(i)’s, namely approximating M^ across punctual events by a polynomial of order N, $\sum_{l=0}^{N} M_t(l)z^l$, while still relying on Proposition 4.1 to drive the evolution of the probability generating function between events. This is a more efficient alternative to numerically solving the ODE system. We only need to derive the expression of the generating function at punctual events, as given in the following Proposition 4.2.

Proposition 4.2

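The derivatives at z=0 of a generating function which can be expressed as

$$\hat{M}(t_h-t,z) := R(t_h-t,z)^{k}\sum_{l=0}^{N} M_{t_h}(l)\,u(t_h-t,z)^{l}$$

can be numerically computed using the formula

$$
\begin{aligned}
\partial_z^{i}\hat{M}(t_h-t,z)\Big|_{z=0} = {} & \left(\frac{\Delta}{\lambda^{2}}e^{-\sqrt{\Delta}(t_h-t)}\right)^{k}\sum_{\alpha=0}^{i}\sum_{l=\alpha}^{N} M_{t_h}(l)\binom{i}{\alpha}\frac{l!}{(l-\alpha)!}\prod_{m=0}^{i-\alpha-1}(2k+l+m) \\
& \times (x_1x_2)^{l-\alpha}\big(-x_1+x_2e^{-\sqrt{\Delta}(t_h-t)}\big)^{\alpha}\big(1-e^{-\sqrt{\Delta}(t_h-t)}\big)^{l+i-2\alpha}\big(x_2-x_1e^{-\sqrt{\Delta}(t_h-t)}\big)^{-(2k+l+i-\alpha)}.
\end{aligned}
$$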

Proof

The derivation is detailed in Appendix D.1.

This derivation is at the heart of Algorithm 2, allowing us to follow the evolution of the Mt(i)’s through each epoch, as well as at times when we want to record them.

We will refer to Algorithm 2’ as the slight variation of this algorithm aimed at computing the density of (T,O). No set of time points (τj) is required, and the values of Mt are not recorded through time (i.e. matrix B disappears). Instead, when reaching th=t0 we simply return $\sum_{i=0}^{N}\rho^{k_0}(1-\rho)^{i}M(i)$.

Algorithm 2: Computes a numerical approximation of Mt for a specific set of times
Input: Observed tree and occurrence data (T,O),
 parameters (tor,λ,μ,ψ,ω,ρ),
 set of time points $(\tau_j)_{j=1}^{S}$ for which we want to compute the density,
 and the truncation N setting the accuracy of the algorithm.
Output: A numerical approximation of Mt at times $(\tau_j)_{j=1}^{S}$, $(M_{\tau_j}(i))_{i\in\{0,1,\ldots,N\},\ j\in\{1,2,\ldots,S\}}$.
1:  Pool all (τj) and all branching and sampling times of (T,O) in an ordered list $(t_h)_{h=1}^{n}$
2:  Set j=S and B as an S×(N+1) empty matrix
3:  Set $\forall i\in\{0,1,\ldots,N\}$, $M(i)=\mathbb{1}_{i=0}$
4:  Set k=1
5:  for h=n-1,n-2,…,0 do
6:   Compute the values right before the punctual event, using the values M at the start of the epoch,
     $\tilde{M}(i)=\left(\frac{\Delta}{\lambda^{2}}e^{-\sqrt{\Delta}(t_{h+1}-t_h)}\right)^{k}\sum_{\alpha=0}^{i}\sum_{l=\alpha}^{N}M(l)\binom{l}{\alpha}\frac{1}{(i-\alpha)!}\prod_{m=0}^{i-\alpha-1}(2k+l+m)\,\big(-x_1+x_2e^{-\sqrt{\Delta}(t_{h+1}-t_h)}\big)^{\alpha}(x_1x_2)^{l-\alpha}\big(1-e^{-\sqrt{\Delta}(t_{h+1}-t_h)}\big)^{l+i-2\alpha}\big(x_2-x_1e^{-\sqrt{\Delta}(t_{h+1}-t_h)}\big)^{-(2k+l+i-\alpha)}$
7:   if th=τj then
8:    Record the result in B: ∀i, B(j,i)=M~(i), and set M=M~
9:    Set j=j-1
10:  end if
11:  if th=0 or th=τS then
12:   return B
13:  else if th is a removed leaf
14:   Update ∀i, M(i)=ψrM~(i)
15:   Set k=k-1
16:  else if th is a non-removed leaf
17:   Update M(0)=0 and ∀i>0, M(i)=ψ(1-r)M~(i-1)
18:   Set k=k-1
19:  else if th is a sampled ancestor
20:   Update ∀i, M(i)=ψ(1-r)M~(i)
21:  else if th is a removed occurrence
22:   Update ∀i<N, M(i)=ωr(i+1)M~(i+1) and M(N)=0
23:  else if th is a non-removed occurrence
24:   Update ∀i, M(i)=ω(1-r)(k+i)M~(i)
25:  else th is a branching event
26:   Update ∀i, M(i)=λM~(i)
27:   Set k=k+1
28:  end if
29: end for
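The punctual-event updates of Algorithm 2 (steps 13 to 27) act on the vector of polynomial coefficients only. The following sketch shows one possible way to code them; the function name and the string labels for event types are hypothetical, and the truncation simply drops any mass that would move above index N.

```python
def update_at_event(M, event, k, lam, psi, omega, r):
    """Return (new_M, new_k) after one punctual event, for coefficients M[0..N]."""
    N = len(M) - 1
    if event == "removed_leaf":
        return [psi * r * m for m in M], k - 1
    if event == "non_removed_leaf":
        # Multiplication by z in Eq. (4.11): shift coefficients up, dropping old M[N].
        return [0.0] + [psi * (1 - r) * M[i - 1] for i in range(1, N + 1)], k - 1
    if event == "sampled_ancestor":
        return [psi * (1 - r) * m for m in M], k
    if event == "removed_occurrence":
        # Derivative in z in Eq. (4.13): coefficient i becomes (i + 1) * M[i + 1].
        return [omega * r * (i + 1) * M[i + 1] for i in range(N)] + [0.0], k
    if event == "non_removed_occurrence":
        # k * M^ + z * dM^/dz in Eq. (4.14), coefficient by coefficient.
        return [omega * (1 - r) * (k + i) * M[i] for i in range(N + 1)], k
    if event == "branching":
        return [lam * m for m in M], k + 1
    raise ValueError(f"unknown event type: {event}")
```

Each branch mirrors one of Eqs. (4.10) to (4.15) at the level of coefficients, which is why no symbolic differentiation is needed once the generating function is represented by a truncated polynomial.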

Note that we tried to follow an analogous generating function approach as an alternative to Algorithm 1 to compute Lt as well. This leads to another PDE problem, described in Appendix F, that will require further work to be solved.

4.3. Special case ω=0

We were not able to come up with any analytical simplification, as in the previous section, for the case r=1. However, for the special case ω=0, which leads to the observation of O=∅, a nice recursion leads to a closed-form formula for M^.

Proposition 4.3

When ω=0, at any time t, considering that we have observed so far – i.e. on (t,tor) – v sampled ancestors, w removed leaves at times $t_j\in W$, x branching events at times $t_j\in X$, y non-removed leaves at times $t_j\in Y$, we get,

$$\hat{M}(t,z)=\lambda^{x}\psi^{v+w+y}r^{w}(1-r)^{v+y}\prod_{t_j\in X\cup\{t_{or}\}}R(t_j-t,z)\prod_{t_j\in W}R(t_j-t,z)^{-1}\prod_{t_j\in Y}u(t_j-t,z)\,R(t_j-t,z)^{-1}.$$

Proof

We prove this result by induction across the epochs of T in Appendix E, using as the main arguments the update Eqs. (4.10), (4.11), (4.12), (4.15), combined with Proposition 4.1 driving the evolution across an epoch.

As a simple corollary of this result, when th=0 is the present, we get back the same probability density formula of T as provided, e.g. in Theorem 3.5 in Stadler (2010) (when r=0), in Section 3 in Gupta et al. (2019) (when r=1), or in our previous Corollary 3.1.1.

Indeed, Proposition 4.3 offers yet another proof of Corollary 3.1.1 by noting that

$$P(T)=\sum_{i=0}^{\infty}M_{0^-}(i)=\hat{M}(0^-,1)=\rho^{k_0}\hat{M}(0^+,1-\rho)$$

where the last equality follows from Eq. (4.16) taking into account the ρ-sampling at present. Note that this alternative proof is also presented in Laudanno et al. (2020).

When ω=0, Proposition 4.3 also offers an alternative to Algorithm 2 for deriving Mt. Indeed, resorting to the generating function to get back the probability density, one can get the following corollary.

Corollary 4.3.1

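When ω=0, at any time t, considering that we have observed so far – i.e. on (t,tor) – v sampled ancestors, w removed leaves at times $t_j\in W$, x branching events at times $t_j\in X$, y non-removed leaves at times $t_j\in Y$, we can compute Mt(i) using the following recursion,

$$
\begin{aligned}
M_t(0) &= \lambda^{x}\psi^{v+w+y}r^{w}(1-r)^{v+y}\prod_{t_j\in X\cup\{t_{or}\}}R(t_j-t,0)\prod_{t_j\in W}R(t_j-t,0)^{-1}\prod_{t_j\in Y}u(t_j-t,0)\,R(t_j-t,0)^{-1} \\
M_t(i) &= \frac{1}{i}\sum_{\alpha=1}^{i}M_t(i-\alpha)\,C(\alpha)
\end{aligned}
$$

where we define

$$
\begin{aligned}
C(\alpha) &= 2\sum_{t_j\in X\cup\{t_{or}\}}a_{t_j-t}^{\alpha}-2\sum_{t_j\in W}a_{t_j-t}^{\alpha}-\sum_{t_j\in Y}\big(a_{t_j-t}^{\alpha}+b_{t_j-t}^{\alpha}\big) \\
a_t &= \big(1-e^{-\sqrt{\Delta}t}\big)\big(x_2-x_1e^{-\sqrt{\Delta}t}\big)^{-1} \\
b_t &= \big(x_1-x_2e^{-\sqrt{\Delta}t}\big)\big(x_1x_2-x_2x_1e^{-\sqrt{\Delta}t}\big)^{-1}.
\end{aligned}
$$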

Proof

The probability density Mt(i) can be recovered by taking

$$M_t(i)=\frac{1}{i!}\,\partial_z^{i}\hat{M}(t,z)\Big|_{z=0}.$$

The result follows from the derivation of these derivatives in Appendix D.2.
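The recursion of Corollary 4.3.1 has the generic form M(i) = (1/i) Σ_{α=1..i} M(i-α) C(α), which recovers the power-series coefficients of a function from those of its logarithmic derivative. The sketch below implements this generic step only; the function name is hypothetical and C is passed in as a callable, so the expressions for a_t and b_t are not reproduced here. It is checked on the factor (1 - a z)^(-2), which is the z-dependence of a single R term and corresponds to C(α) = 2 a^α.

```python
def coefficients_from_recursion(m0, C, n_max):
    """Return [M(0), ..., M(n_max)] with M(i) = (1/i) * sum_{a=1..i} M(i-a) * C(a)."""
    M = [m0]
    for i in range(1, n_max + 1):
        M.append(sum(M[i - alpha] * C(alpha) for alpha in range(1, i + 1)) / i)
    return M


# Check on (1 - a z)^(-2), whose coefficients are (i + 1) * a^i.
a = 0.4
series = coefficients_from_recursion(1.0, lambda alpha: 2 * a ** alpha, 6)
```

The cost is quadratic in the number of coefficients requested, and no derivative of the generating function has to be formed explicitly.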

This special case ends the section. In the next section, we will combine results from Sections 3 and 4 and use our ability to compute Lt and Mt to compute Kt, the probability distribution of the population size given (T,O).

5. The distribution of past population size conditioned on observations

5.1. The distribution at fixed times

In Section 3, we explained how to compute Lt, the probability density of the observations below time t conditioned on the population size at time t. This relies either on Algorithm 1 in the general case, or on the more optimized Proposition 3.1 in case ω=0, or Proposition 3.2 in the case r=1.

In Section 4, we explained how to compute Mt, the probability density of the observations above time t and the population size at time t. This relies either on Algorithm 2 in the general case, or on the more optimized Corollary 4.3.1 when ω=0. We now combine Lt and Mt to derive the probability distribution of the population size given (T,O). Provided we have stored numerical values $(L_{\tau_j}(i))_{i\in\{0,1,\ldots,N\},\ j\in\{1,2,\ldots,S\}}$ and $(M_{\tau_j}(i))_{i\in\{0,1,\ldots,N\},\ j\in\{1,2,\ldots,S\}}$ for a set of time points $(\tau_j)_{j=1}^{S}$, recall from the first section that we obtain

$$
K_{\tau_j}(i)=P\big(I_{\tau_j}=k_{\tau_j}+i \mid T,O\big)=\frac{L_{\tau_j}(i)M_{\tau_j}(i)}{P(T,O)}\approx
\begin{cases}
\dfrac{L_{\tau_j}(i)M_{\tau_j}(i)}{P(T,O)} & \text{if } i\le N, \\
0 & \text{otherwise.}
\end{cases}
$$

Note that the denominator needs only be computed once, by evaluating $\sum_{i=0}^{N}L_{\tau_j}(i)M_{\tau_j}(i)$, for example at time τj=tor or τj=0, as described in previous sections.
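As a minimal sketch of this combination step, assuming the truncated vectors of Lt and Mt at one time point are already available as plain lists (the function name and the toy numbers are hypothetical):

```python
def population_size_distribution(L, M):
    """Combine L_t(i) and M_t(i), i = 0..N, into K_t(i) = L_t(i) M_t(i) / P(T, O)."""
    weights = [l * m for l, m in zip(L, M)]
    total = sum(weights)          # truncated estimate of P(T, O)
    return [w / total for w in weights]


# Toy vectors for illustration only; they do not come from a real dataset.
K = population_size_distribution([0.5, 0.25, 0.125], [0.2, 0.4, 0.4])
```

Here the normalising sum is recomputed at the time point considered; since it estimates the same quantity P(T,O) at every time, comparing it across time points is also a cheap consistency check on the truncation N.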

Depending on the parameter space that one wants to consider, it thus remains to arrange pieces stemming from the previous sections. We provide a flowchart in Fig. 4 to guide the reader to choose the most efficient path.

Fig. 4.


The most efficient results depending on the parameter space considered. In red, results already described in Stadler (2010) and Gupta et al. (2019). In blue, the new contribution of this manuscript. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

5.2. Generator of trajectories

The previous result gives us the distribution of the population size at any time in the past, but does not state anything about population size trajectories. We provide now an approximate way of simulating population size trajectories conditioned on (T,O).

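Indeed, recall we have,

$$
\begin{aligned}
K_t(i) &:= P(I_t=k_t+i \mid T,O) \propto L_t(i)M_t(i) \\
\dot{L}_t(i) &= -\gamma(k_t+i)L_t(i)+\lambda(2k_t+i)L_t(i+1)+\mu i L_t(i-1) \\
\dot{M}_t(i) &= \gamma(k_t+i)M_t(i)-\mu(i+1)M_t(i+1)-\lambda(2k_t+i-1)\mathbb{1}_{i>0}M_t(i-1).
\end{aligned}
$$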

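We thus get,

$$
\begin{aligned}
\dot{K}_t(i) &\propto \dot{L}_t(i)M_t(i)+L_t(i)\dot{M}_t(i) \\
&\propto -\gamma(k_t+i)L_t(i)M_t(i)+\lambda(2k_t+i)L_t(i+1)M_t(i)+\mu i L_t(i-1)M_t(i) \\
&\quad +\gamma(k_t+i)M_t(i)L_t(i)-\mu(i+1)M_t(i+1)L_t(i)-\lambda(2k_t+i-1)\mathbb{1}_{i>0}M_t(i-1)L_t(i) \\
&\propto \lambda(2k_t+i)\frac{L_t(i+1)}{L_t(i)}K_t(i)+\mu i\frac{L_t(i-1)}{L_t(i)}K_t(i) \\
&\quad -\lambda(2k_t+i-1)\mathbb{1}_{i>0}\frac{L_t(i)}{L_t(i-1)}K_t(i-1)-\mu(i+1)\frac{L_t(i)}{L_t(i+1)}K_t(i+1) \\
&\propto Q_t(i,i)K_t(i)+Q_t(i-1,i)K_t(i-1)+Q_t(i+1,i)K_t(i+1).
\end{aligned}
\tag{5.1}
$$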

We introduced in the last line the following notation,

$$
\begin{aligned}
Q_t(i+1,i) &= -\mu(i+1)\frac{L_t(i)}{L_t(i+1)} \\
Q_t(i-1,i) &= -\lambda(2k_t+i-1)\mathbb{1}_{i>0}\frac{L_t(i)}{L_t(i-1)} \\
Q_t(i,i) &= \lambda(2k_t+i)\frac{L_t(i+1)}{L_t(i)}+\mu i\frac{L_t(i-1)}{L_t(i)}.
\end{aligned}
$$

Using these, we see that $Q_t(i,i)=-\big(Q_t(i,i+1)+Q_t(i,i-1)\big)$. This allows us to draw trajectories of the number of ancestors in the past as a time-continuous Markov process with the (inhomogeneous) rates Qt written above.
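To make the rates concrete, the sketch below computes, from a truncated vector of Lt values at one time point, the two non-negative quantities -Qt(i,i+1) and -Qt(i,i-1) whose sum is Qt(i,i). Reading them as the intensities of gaining or losing one hidden individual when the trajectory is followed from the past toward the present is our interpretation of the signs in (5.1); the function name and the convention of setting the upward rate to zero at the truncation bound N are assumptions.

```python
def forward_jump_rates(L, k, lam, mu):
    """For each i, return (up, down) = (-Q_t(i, i+1), -Q_t(i, i-1)) at one time point."""
    N = len(L) - 1
    rates = []
    for i in range(N + 1):
        # Gain of one hidden individual, reweighted by L_t(i+1) / L_t(i).
        up = lam * (2 * k + i) * L[i + 1] / L[i] if i < N else 0.0
        # Loss of one hidden individual, reweighted by L_t(i-1) / L_t(i).
        down = mu * i * L[i - 1] / L[i] if i > 0 else 0.0
        rates.append((up, down))
    return rates


# Toy L_t vector for illustration only.
rates = forward_jump_rates([1.0, 0.5, 0.25], k=2, lam=1.5, mu=1.0)
```

Because the rates depend on t through Lt and kt, a trajectory simulator would have to re-evaluate them along the time grid on which Lt has been stored, for instance by holding them constant between consecutive grid points.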

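Observe that we could equally write these ODE coefficients using the Mt(i)’s. This gives,

$$
\begin{aligned}
\dot{K}_t(i) &\propto \lambda(2k_t+i)\frac{M_t(i)}{M_t(i+1)}K_t(i+1)+\mu i\frac{M_t(i)}{M_t(i-1)}K_t(i-1) \\
&\quad -\mu(i+1)\frac{M_t(i+1)}{M_t(i)}K_t(i)-\lambda(2k_t+i-1)\mathbb{1}_{i>0}\frac{M_t(i-1)}{M_t(i)}K_t(i) \\
&\propto R_t(i+1,i)K_t(i+1)+R_t(i-1,i)K_t(i-1)+R_t(i,i)K_t(i)
\end{aligned}
\tag{5.2}
$$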

where we introduced in the last line the following notation,

$$
\begin{aligned}
R_t(i+1,i) &= \lambda(2k_t+i)\frac{M_t(i)}{M_t(i+1)} \\
R_t(i-1,i) &= \mu i\frac{M_t(i)}{M_t(i-1)} \\
R_t(i,i) &= -\lambda(2k_t+i-1)\mathbb{1}_{i>0}\frac{M_t(i-1)}{M_t(i)}-\mu(i+1)\frac{M_t(i+1)}{M_t(i)}.
\end{aligned}
$$

This is a standard result for Markov chains that are conditioned on a final state, and the shape of the newly derived transition kernel is called a Doob transform (Levin and Peres, 2017). Note that these transitions simplify for special cases when we have an analytical expression of either Lt(i) or Mt(i).

5.3. Numerical implementation

Results of this paper have been implemented numerically and the code is freely available on GitLab: https://gitlab.com/MMarc/popsize-distribution/.

We used the numerical implementation to verify the correctness of the results in several ways:

  1. We verified that the values of the probability density of (T,O) computed using Lt and Mt (i.e. respectively using Algorithms 1’ and 2’) were equivalent to values computed using already known formulas when (ω=0,r=0) (Stadler, 2010) or when r=1 (Gupta et al., 2019). See result in Fig. 5AB.

  2. We verified that the values of the probability density of (T,O) computed using Lt or Mt (Algorithms 1’ and 2’) were identical on examples for which no previous formula was known. See result in Fig. 5C.

  3. We assessed the distribution of the population size against the only numerical method serving the same goal, the particle filtering developed in Vaughan et al. (2019). We compared values of a few quantiles computed using the two methods, see result in Fig. 5DEF. Note that Vaughan et al. (2019) considered that we never have data on the removal status of individuals. We thus adapted our developments to this scenario in this specific comparison, by summing updates corresponding to the removal or not of the sampled individuals.

Fig. 5.


Assessment of the accuracy of the methods presented in this paper, on toy datasets. First row, probability density of data, A) against known analytical formula when ω=0 and (μ,ρ,ψ,r)=(1,0.5,0.3,0.2); B) against known analytical formula when r=1 and (μ,ρ,ψ,ω)=(1,0.5,0.3,0.6); C) obtained using Algorithms 1’ or 2’ otherwise, with (μ,ρ,ψ,r,ω)=(1,0.5,0.3,0.2,0.6). Second row, quantiles of the population size distribution, against the particle filter in Vaughan et al. (2019), with parameters (μ,ρ,ψ,r,ω)=(1,0.1,0.001,0.5,0.001). D) quantile of level 0.2; E) median; F) quantile of level 0.8.

On each of these sanity checks, we verified that different quantities match across different λ values. Note that we could equivalently have chosen any other parameter to be varied.

We also illustrate in Fig. 6 our target distribution Kt of the past population size conditioned on (T,O), on a few simulated examples.

Fig. 6.


Inferred population size distribution Kt using (T,O) matches the simulated population size trajectory It under three different processes: A) A homogeneous birth-death with ρ-sampling at present; B) A homogeneous birth-death with ρ-sampling at present and ψ-sampling through time; C) A homogeneous birth-death process with ρ-, ψ- and ω-sampling. Note that we plot on the same graph kt, the number of observed lineages in the tree, as this is an obvious lower bound in our population size inference.

6. Discussion

The results we have derived in this paper fit into two main categories. The first category concerns results allowing one to compute the probability density of a tree and occurrences, while the second category concerns results allowing one to compute the probability distribution of the population size in the past. We discuss these two categories below, before presenting ideas for future extensions of the model.

6.1. Using the probability density of the data

We present in this article new ways to compute the probability density of the data, P(T,O). For the special cases (ω=0,r=0) or (r=1), efficient calculations are available in Stadler (2010) and Gupta et al. (2019). Our two Algorithms 1’ and 2’ have the potential to improve the computation time of P(T,O) also when ω≠0 and r≠1. When analysing data, as described below, this probability density is often conditioned on sampling at least one individual, using $u_{t_{or}}$ (Stadler, 2012).

In the case that the tree is known, we can use P(T,O|λ,μ,ρ,ψ,r,ω,tor) (with conditioning on sampling at least one individual) to obtain maximum likelihood parameter estimates for the birth-death parameters λ,μ as well as the sampling parameters ρ,ψ,r,ω. For special cases of this model, it has been shown that not all sampling parameters are identifiable (see e.g. Stadler et al., 2019). Future work will involve investigating which of the sampling parameters in the general model can be estimated.

On the other hand, data may consist of sequencing data A and occurrence data O. Bayesian tools are then typically employed to obtain a sample from the posterior distribution of the parameters using Markov chain Monte Carlo methods. The posterior distribution is,

$$f(T,\theta,\lambda,\mu,\rho,\psi,\omega,t_{or}\mid O,A)\propto f(A\mid T,\theta)\,f(O,T\mid\lambda,\mu,\rho,\psi,r,\omega,t_{or})\,f(\lambda,\mu,\rho,\psi,r,\omega,\theta,t_{or}),$$

with θ summarizing the parameters of the model of molecular evolution and f(λ,μ,ρ,ψ,r,ω,θ,tor) being the prior distribution on the model parameters.

6.2. Probability distribution of past population sizes

The main results of this paper allow one to compute the probability distribution of the population size in the past and to generate population size trajectories conditioned on (T,O) (Section 5).

Given a tree and occurrences together with birth-death parameters (which may be the maximum likelihood parameters obtained based on the tree and record of occurrences), we can simulate the distribution of past population sizes as described in Section 5.2. Furthermore, we can calculate the probability of a population size at any time in the past as described in Section 5.1.

If we are instead provided with sequencing data A and occurrence data O, and want to generate a simulated ensemble characterizing the posterior distribution of past population size trajectories I, we can use the following strategy. The posterior distribution is,

f(T,I,θ,λ,μ,ρ,ψ,ω,tor|O,A)=f(I|T,θ,λ,μ,ρ,ψ,ω,tor,O,A)f(T,θ,λ,μ,ρ,ψ,ω,tor|O,A)

We have described above how to obtain a sample from the posterior distribution f(T,θ,λ,μ,ρ,ψ,ω,tor|O,A) using Markov chain Monte Carlo. For each sample of (T,θ,λ,μ,ρ,ψ,ω,tor) thus obtained, we can simulate an appropriately conditioned population size trajectory I as described in Section 5.2. The ensemble of trajectories thus generated has the required distribution. We can employ an analogous procedure if we are interested in the posterior probability distribution of the population size at a particular time t. For each posterior sample of (T,θ,λ,μ,ρ,ψ,ω,tor), we can calculate the population size distribution at time t using Section 5.1. The posterior population size at time t is then the average over all these conditional distributions.

6.3. Increased efficiency opens new research avenues

Both the density P(T,O) and the probability distribution of the population size in the past (Kt) can be obtained using the Monte-Carlo particle filtering algorithm developed in Vaughan et al. (2019). The new approach presented in this paper is nevertheless appealing for two reasons. First, it provides a direct link with previous analytical formulas developed in Stadler (2010) and Gupta et al. (2019), thus improving our understanding of these processes and leading to very efficient results in the specific case where ω=0. Second, Algorithms 1 and 2 have the potential to be more efficient alternatives to the Monte-Carlo particle filtering algorithm. Computing the quantiles shown in Fig. 5DEF using the particle filtering took a few days, as compared to a few minutes with our method, mainly because it can be applied directly on a fixed tree and does not need to be part of an MCMC. A more thorough quantitative comparison of both approaches would require implementing this work in an MCMC framework, which is beyond the scope of this paper.

This increased efficiency could open up the possibility to analyse much bigger datasets in the near future. In macroevolution, the study of clades with a huge fossil record like cetaceans could benefit from our approach. This dataset is characterized by a rather small number of extant species and fossils with morphological data available (respectively ρ-sampled and ψ-sampled species), but includes a huge number of fossils without morphological data (ω-sampled species) (Morlon et al., 2011, Barido-Sottani et al., 2019). For the cetaceans as well as many other clades, it will be of great interest to compute diversity estimates under the modelling framework presented here (assuming ρ≠0, ω≠0, r=0). Ultimately, all ω-samples could be taken into account to inform the tree and diversity estimates.

In the context of epidemiology, typically, the genetic sequences of the pathogen are only available for a fraction of the infected individuals. These correspond to ψ-samples, while other sampled infected individuals correspond to ω-samples. Further developing our approach in a Bayesian framework, both the genetic sequences and the record of occurrence could be jointly used to estimate the underlying transmission tree and prevalence of the disease through time. Depending on the cost of sequencing and the ability of numerical methods to handle some critical amount of both genetic sequences and number of occurrences, optimal sampling procedure could be investigated, to make the most of both types of data.

Finally, while improving on current methods, Algorithms 1 and 2 still only provide approximations of, respectively, Lt and Mt, that critically rely on the truncation parameter of the state space N. Increasing N leads to a more accurate approximation, while increasing the runtime of the method. If the probability mass of the number of hidden individuals is non-negligible above N, both algorithms will lead to very poor approximations of Lt and Mt. This value should thus be carefully chosen in empirical applications, depending on what is expected with the data at hand. We point out that the behaviour of these algorithms strongly relies on the runtime and accuracy of the matrix exponentiation steps. Numerous matrix exponentiation methods have been proposed in the literature (Moler and Van Loan, 2003). In our current implementation, we rely on a recent matrix exponentiation method already implemented in scipy (Al-Mohy and Higham, 2010). Future avenues towards improving this specific step could focus on new theoretical results adapted to tridiagonal matrices (Smith and Shahrezaei, 2015) or alternatively try to adapt Laplace transform approximations derived in Crawford et al. (2014), who present theoretical results bounding the errors made in their approximation.

6.4. Future extensions

Our proposed modelling framework lends itself well to various biologically realistic extensions that would allow a closer fit to empirical data in a variety of situations.

The first extension that we envision is to relax the assumption of rate homogeneity and instead work with time-varying rates. This has already been considered in different studies relying on birth-death processes, either with exponentially varying functions (Morlon et al., 2011) or with piecewise constant rates (a model dubbed the skyline birth-death process, see Stadler et al., 2013, Gavryushkina et al., 2016). As all our results can be straightforwardly adapted to such a framework, this would not require much theoretical work. However, the challenge would be to do so without overfitting the data.

Another popular extension that has been described in the literature on birth-death processes for phylodynamics is to consider multi-type birth-death processes (Maddison et al., 2007). Each individual is assigned a type, which impacts its propensity to give birth to other types. All sampling-related parameters can also be considered type-dependent. The main challenge here boils down to dealing with an increase of dimensionality, because we would be interested in the joint distribution of all subpopulation sizes. This extension is particularly interesting for epidemiological applications, when different populations of infected individuals, clustered according to some characteristic (e.g. patient behaviour or geography) might have very different dynamics (Stadler and Bonhoeffer, 2013).

Finally, we are very hopeful that this piece of work could be applied as well to density-dependent birth-death processes, also known as logistic birth-death models. Indeed, very similar ideas to the breadth-first forward and backward traversals as applied in Algorithms 1’ and 2’ appear in the context of logistic birth-death models (Etienne et al., 2012, Leventhal et al., 2013, Laudanno et al., 2020). Preliminary results obtained by adapting our numerical algorithms to this framework are very encouraging, and we are currently in the process of deriving as many analytical results as we can to speed up the method. We are hoping to present this in a subsequent paper.

6.5. Conclusion

This manuscript presents a way to efficiently compute the distribution of the past population size in a linear birth-death process, conditioned on the observation of a reconstructed phylogenetic tree and a record of occurrences through time. Such data are very common in macroevolution where the reconstructed phylogenetic tree of extant species is available together with occurrences from the fossil record. In epidemiology, pathogen genetic sequencing data and case count data are a common data source. Our method thus promises to allow efficient quantification of past population sizes, representing past biodiversity or past prevalence, from these rich datasets.

We believe that this method also paves the way for the consideration of more complex and more realistic demographic scenarios, assuming either time-dependent (Morlon et al., 2011, Stadler et al., 2013, Gavryushkina et al., 2016) or density-dependent parameters (Etienne et al., 2012, Leventhal et al., 2013), potentially catering for populations with multiple demographic categories/types (Maddison et al., 2007, Stadler and Bonhoeffer, 2013, Freyman and Höhna, 2018). It is our hope that this manuscript will foster important research advances for unravelling demographic histories in epidemiology, macroevolution, and any other fields where birth-death processes form a relevant model framework.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Marc Manceau: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft. Ankit Gupta: Validation, Investigation, Writing - original draft, Visualization. Timothy Vaughan: Validation, Investigation, Writing - review & editing, Supervision. Tanja Stadler: Conceptualization, Resources, Writing - review & editing, Supervision, Funding acquisition.

Acknowledgements

The authors are grateful to Rachel Warnock for helpful discussion on potential applications of the model, and Alex Zarebski for his thorough examination of this manuscript. A.G. is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement no. 743269 (CyberGenetics project).

Contributor Information

Marc Manceau, Email: marc.manceau@bsse.ethz.ch.

Tanja Stadler, Email: tanja.stadler@bsse.ethz.ch.

Appendix A. Solving well-known ODEs

A.1. The extinction probability

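We first deal with Eq. (2.1) governing ut, and start by studying the polynomial $\lambda x^2-\gamma x+\mu$.

This polynomial has discriminant $\Delta=\gamma^2-4\lambda\mu>(\lambda+\mu)^2-4\lambda\mu=(\lambda-\mu)^2\ge 0$. Note that the first inequality holds in the case we are interested in because we can assume that ψ+ω>0. When this is not the case, one needs to consider that λ≠μ. Roots are

$$x_1=\frac{\gamma-\sqrt{\Delta}}{2\lambda}\quad\text{and}\quad x_2=\frac{\gamma+\sqrt{\Delta}}{2\lambda}.$$

Moreover, we know that both roots are positive because $\Delta<\gamma^2\Rightarrow\sqrt{\Delta}<\gamma\Rightarrow x_1>0$.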

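On an interval including zero and where the polynomial remains positive (as $(-\infty,x_1)$ for example), we can write,

$$
\begin{aligned}
\frac{du}{\lambda u^2-\gamma u+\mu}=dt
&\;\Longrightarrow\; \frac{du}{(x_1-u)(x_2-u)}=\lambda\,dt \\
&\;\Longrightarrow\; \frac{1}{x_2-x_1}\left(\frac{1}{x_1-u}-\frac{1}{x_2-u}\right)du=\lambda\,dt \\
&\;\Longrightarrow\; \left(\frac{1}{x_1-u}-\frac{1}{x_2-u}\right)du=\sqrt{\Delta}\,dt.
\end{aligned}
$$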

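Integrating both sides between time 0 and t, we get

$$
\begin{aligned}
\frac{x_2-u_t}{x_1-u_t}=\frac{x_2-z}{x_1-z}e^{\sqrt{\Delta}t}
&\;\Longrightarrow\; x_2(x_1-z)e^{-\sqrt{\Delta}t}-u_t(x_1-z)e^{-\sqrt{\Delta}t}=x_1(x_2-z)-u_t(x_2-z) \\
&\;\Longrightarrow\; u_t\Big[(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t}\Big]=x_1(x_2-z)-x_2(x_1-z)e^{-\sqrt{\Delta}t} \\
&\;\Longrightarrow\; u_t=\frac{x_1(x_2-z)-x_2(x_1-z)e^{-\sqrt{\Delta}t}}{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t}}.
\end{aligned}
$$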

This is the result stated in Eq. (2.2). Note that this quantity is called p0(t) in Stadler (2010), or E(t) in Maddison et al. (2007).
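The closed form above is straightforward to evaluate numerically. The sketch below is illustrative only: it assumes γ=λ+μ+ψ+ω, the function names are hypothetical, and the parameter values are arbitrary.

```python
import math


def birth_death_roots(lam, mu, psi, omega):
    """Return (x1, x2, sqrt(Delta)) for the polynomial lam*x^2 - gamma*x + mu."""
    gamma = lam + mu + psi + omega            # assumed definition of gamma
    sq = math.sqrt(gamma ** 2 - 4 * lam * mu)
    return (gamma - sq) / (2 * lam), (gamma + sq) / (2 * lam), sq


def u(t, z, lam, mu, psi, omega):
    """Closed-form solution u(t, z) of du/dt = lam*u^2 - gamma*u + mu with u(0, z) = z."""
    x1, x2, sq = birth_death_roots(lam, mu, psi, omega)
    e = math.exp(-sq * t)
    return (x1 * (x2 - z) - x2 * (x1 - z) * e) / ((x2 - z) - (x1 - z) * e)


# Arbitrary example values.
par = (1.5, 1.0, 0.3, 0.2)
value = u(0.7, 0.2, *par)
```

Two quick checks follow from the formula itself: u(0,z)=z, and u(t,z) tends to the smaller root x1 as t grows.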

A.2. Probability to leave only one sampled descendant

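We aim here to integrate a slight variation of Eq. (2.3) governing pt when k=1. The equation we are interested in is,

$$\frac{dW_s}{ds}=\big(2\lambda u(s,z)-\gamma\big)kW_s \tag{A.1}$$

$$
\begin{aligned}
\frac{dW_s}{W_s}&=\left(2\lambda k\,\frac{x_1(x_2-z)-x_2(x_1-z)e^{-\sqrt{\Delta}s}}{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}s}}-\gamma k\right)ds \\
&=\left(\frac{2\lambda kx_1}{\sqrt{\Delta}}\,\frac{\sqrt{\Delta}(x_2-z)e^{\sqrt{\Delta}s}}{(x_2-z)e^{\sqrt{\Delta}s}-(x_1-z)}-\frac{2\lambda kx_2}{\sqrt{\Delta}}\,\frac{(x_1-z)\sqrt{\Delta}e^{-\sqrt{\Delta}s}}{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}s}}-\gamma k\right)ds.
\end{aligned}
$$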

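All three terms can be integrated by inspection between some time th and t, leading to,

$$
\begin{aligned}
\ln\frac{W_t}{W_{t_h}}&=\frac{2\lambda kx_1}{\sqrt{\Delta}}\Big[\ln\big((x_2-z)e^{\sqrt{\Delta}s}-(x_1-z)\big)\Big]_{t_h}^{t}-\frac{2\lambda kx_2}{\sqrt{\Delta}}\Big[\ln\big((x_2-z)-(x_1-z)e^{-\sqrt{\Delta}s}\big)\Big]_{t_h}^{t}-\gamma k(t-t_h) \\
&=\frac{2\lambda kx_1}{\sqrt{\Delta}}\ln\frac{(x_2-z)e^{\sqrt{\Delta}t}-(x_1-z)}{(x_2-z)e^{\sqrt{\Delta}t_h}-(x_1-z)}-\frac{2\lambda kx_2}{\sqrt{\Delta}}\ln\frac{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t}}{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t_h}}-\gamma k(t-t_h) \\
&=-\frac{2\lambda k(x_2-x_1)}{\sqrt{\Delta}}\ln\frac{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t}}{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t_h}}-\gamma k(t-t_h)+2\lambda kx_1(t-t_h) \\
&=-2k\ln\frac{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t}}{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t_h}}-k\sqrt{\Delta}(t-t_h).
\end{aligned}
$$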

This leads to the final expression below,

$$W_t=W_{t_h}\left(\frac{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t}}{(x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t_h}}\right)^{-2k}e^{-k\sqrt{\Delta}(t-t_h)}. \tag{A.2}$$

Note that the case k=1, th=0 and W0=1-z corresponds to the probability pt given as Eq. (2.4),

$$p(t,z)=(1-z)\,\frac{\Delta}{\lambda^{2}}\Big((x_2-z)-(x_1-z)e^{-\sqrt{\Delta}t}\Big)^{-2}e^{-\sqrt{\Delta}t},$$

while the general case can be expressed using function p as

$$W_t=W_{t_h}\left(\frac{p(t,z)}{p(t_h,z)}\right)^{k}.$$

A.3. A few useful properties

Solutions u(t,z) and p(t,z) to ODEs (2.1), (2.3) satisfy two properties relying on the semi-group property of solutions of ODEs, namely,

$$u\big(t_2-t_1,u(t_1,z)\big)=u(t_2,z) \tag{A.3}$$

$$p\big(t_2-t_1,u(t_1,z)\big)=\frac{p(t_2,z)\,p\big(0,u(t_1,z)\big)}{p(t_1,z)}=\frac{p(t_2,z)\big(1-u(t_1,z)\big)}{p(t_1,z)}. \tag{A.4}$$

These two properties are useful in many calculations throughout this document, e.g.

  • Solving the main PDE in Appendix C requires inverting u, using the first property with,
    $$z=u(t-t_h,z_0)\;\Longrightarrow\;u(t_h-t,z)=u\big(t_h-t,u(t-t_h,z_0)\big)\;\Longrightarrow\;u(t_h-t,z)=u(0,z_0)\;\Longrightarrow\;z_0=u(t_h-t,z).$$
  • The same Appendix section also requires composing functions p and u, using
    $$\frac{p\big(t-t_h,u(t_h-t,z)\big)}{p\big(0,u(t_h-t,z)\big)}=\frac{p(0,z)}{p(t_h-t,z)}.$$
  • In the proof of Proposition 4.3, we switch to the notation R(t,z)=p(t,z)/(1-z) and again compose R and u in the same way,
    $$\frac{p\big(t_j-t_h,u(t_h-t,z)\big)}{p\big(0,u(t_h-t,z)\big)}=\frac{p(t_j-t,z)}{p(t_h-t,z)}\;\Longrightarrow\;\frac{p\big(t_j-t_h,u(t_h-t,z)\big)}{1-u(t_h-t,z)}=\frac{p(t_j-t,z)}{p(t_h-t,z)}\;\Longrightarrow\;R\big(t_j-t_h,u(t_h-t,z)\big)=\frac{R(t_j-t,z)}{R(t_h-t,z)}.$$
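These composition properties can be checked numerically from the closed forms of u and p derived in A.1 and A.2. The sketch below is only a sanity check under the assumption γ=λ+μ+ψ+ω; the function names and the parameter values are hypothetical.

```python
import math


def _roots(lam, mu, psi, omega):
    gamma = lam + mu + psi + omega            # assumed definition of gamma
    sq = math.sqrt(gamma ** 2 - 4 * lam * mu)
    return (gamma - sq) / (2 * lam), (gamma + sq) / (2 * lam), sq


def u(t, z, lam, mu, psi, omega):
    """Closed form of u(t, z) from Appendix A.1."""
    x1, x2, sq = _roots(lam, mu, psi, omega)
    e = math.exp(-sq * t)
    return (x1 * (x2 - z) - x2 * (x1 - z) * e) / ((x2 - z) - (x1 - z) * e)


def p(t, z, lam, mu, psi, omega):
    """Closed form of p(t, z) from Appendix A.2."""
    x1, x2, sq = _roots(lam, mu, psi, omega)
    e = math.exp(-sq * t)
    return (1 - z) * (sq / lam) ** 2 * e / ((x2 - z) - (x1 - z) * e) ** 2


def R(t, z, lam, mu, psi, omega):
    """R(t, z) = p(t, z) / (1 - z)."""
    return p(t, z, lam, mu, psi, omega) / (1 - z)


par = (1.5, 1.0, 0.3, 0.2)          # arbitrary (lam, mu, psi, omega)
t1, t2, z = 0.3, 0.8, 0.2
z1 = u(t1, z, *par)

gap_A3 = u(t2 - t1, z1, *par) - u(t2, z, *par)
gap_A4 = p(t2 - t1, z1, *par) - p(t2, z, *par) * (1 - z1) / p(t1, z, *par)
gap_inverse = u(-t1, z1, *par) - z             # inverting u by running time backwards
gap_R = R(t2 - t1, z1, *par) - R(t2, z, *par) / R(t1, z, *par)
```

All four gaps should vanish up to floating-point error, which covers (A.3), (A.4), the inversion of u and the composition rule for R.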

Appendix B. Link with previous work by Gupta et al. (2019)

We aim here at providing details to link this work with results previously derived by Gupta et al. (2019), allowing efficient computation of Lt(i) in the special case r=1.

To do so, we define the distinguishable version of the probability Lt(i), written here $\mathcal{L}_t(i)$, as

$$\mathcal{L}_t(i)=\frac{(k+i)!}{i!}L_t(i). \tag{B.1}$$

We now derive the ODE for $\mathcal{L}_t(i)$. Multiplying both sides of (3.2) by $(k+i)!/i!$ we obtain

$$
\begin{aligned}
\dot{\mathcal{L}}_t(i)&=-\gamma(k+i)\frac{(k+i)!}{i!}L_t(i)+\lambda(2k+i)\frac{(k+i)!}{i!}L_t(i+1)+\mu i\frac{(k+i)!}{i!}L_t(i-1) \\
&=-\gamma(k+i)\frac{(k+i)!}{i!}L_t(i)+\lambda(k+i)\frac{(2k+i)(i+1)}{(k+i+1)(k+i)}\frac{(k+i+1)!}{(i+1)!}L_t(i+1)+\mu(k+i)\frac{(k+i-1)!}{(i-1)!}L_t(i-1) \\
&=-\gamma(k+i)\mathcal{L}_t(i)+\lambda(k+i)\phi_{i,k}\mathcal{L}_t(i+1)+\mu(k+i)\mathcal{L}_t(i-1),
\end{aligned}
\tag{B.2}
$$

where

$$\phi_{i,k}=\frac{(2k+i)(i+1)}{(k+i+1)(k+i)}=1-\frac{k(k-1)}{(k+i+1)(k+i)}$$

is the probability that a coalescing pair of randomly chosen lineages (from (k+i+1) total lineages) does not consist of two sampled lineages. This shows that $\mathcal{L}_t(i)$ satisfies the ODE (B.2) across any epoch. One can see that at punctual events the transition conditions (3.3), (3.8) hold for $\mathcal{L}_t(i)$ for ψ-sampling and branching events respectively. Moreover at ω-sampling events the transition condition (3.6) transforms to

$$\mathcal{L}_{t^+}(i)=\omega(k+i)\mathcal{L}_{t^-}(i-1).$$

With these transition conditions and initial condition $\mathcal{L}_0(i)=\frac{(k_0+i)!}{i!}L_0(i)=\frac{(k_0+i)!}{i!}\rho^{k_0}(1-\rho)^{i}$, the ODE (B.2) was solved explicitly in Gupta et al. (2019) and the solution is of the form

$$\mathcal{L}_{t^+}(i)=\sum_{\ell=0}^{q}\frac{(k+i)!}{(i-\ell)!}\,u_t^{\,i-\ell}\,W_t^{(\ell)}$$

where q is the number of ω-sampling events in the time-interval [0,t) and the (q+1)-dimensional time-varying vector $W_t=(W_t^{(0)},\ldots,W_t^{(q)})$ can be analytically computed following the approach in Gupta et al. (2019). Therefore from (B.1) we state Proposition 3.2.
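The two expressions given for ϕ above can be checked against each other with exact rational arithmetic. This is a small illustrative check; the function names are hypothetical.

```python
from fractions import Fraction


def phi(i, k):
    """phi_{i,k} written as (2k + i)(i + 1) / ((k + i + 1)(k + i))."""
    return Fraction((2 * k + i) * (i + 1), (k + i + 1) * (k + i))


def phi_complement_form(i, k):
    """phi_{i,k} written as 1 - k(k - 1) / ((k + i + 1)(k + i))."""
    return 1 - Fraction(k * (k - 1), (k + i + 1) * (k + i))


# Compare the two forms on a grid of hidden counts i and observed lineages k >= 1.
agree = all(phi(i, k) == phi_complement_form(i, k)
            for i in range(0, 21) for k in range(1, 11))
```

For k=1 both forms equal 1, consistent with the interpretation: with a single sampled lineage, no coalescing pair can consist of two sampled lineages.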

Appendix C. Solving the main PDE

We aim now at finding an analytical solution for the following PDE, driving the evolution of $\hat{M}$ across a given epoch $(t_{h-1},t_h)$, on which the number of observed lineages remains constant and equal to k,

$$
\begin{cases}
\hat{M}(t_h,z)=F(z) \\
\partial_t\hat{M}+(\lambda z^2-\gamma z+\mu)\,\partial_z\hat{M}+k(2\lambda z-\gamma)\hat{M}=0.
\end{cases}
$$

C.1. Principle of the method of characteristics

This problem can be solved by the method of characteristics. We suppose that we can write $\hat{M}(t,z)=\hat{M}(t(s),z(s))$ where functions t and z satisfy the ODEs,

$$\frac{dz}{ds}=\lambda z^2-\gamma z+\mu,\qquad\frac{dt}{ds}=1.$$

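This way, the function $g(s)=\hat{M}(t(s),z(s))$ satisfies another ODE, that we will have to solve,

$$
\begin{aligned}
\frac{dg}{ds}&=\frac{dz}{ds}\,\partial_z\hat{M}+\frac{dt}{ds}\,\partial_t\hat{M}=(\lambda z^2-\gamma z+\mu)\,\partial_z\hat{M}+\partial_t\hat{M}=-k(2\lambda z-\gamma)\hat{M} \\
&\Longrightarrow\;\frac{dg}{ds}+k(2\lambda z-\gamma)g=0.
\end{aligned}
$$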

C.2. Step 1, solve for t(s),z(s) and g(s)

We start by integrating t(s). We moreover fix that t(0)=th, thus leading to t(s)=th+s.

We now turn to z, and notice that it satisfies the previously studied ODE (2.1). Integrating between 0 and s leads to,

$$z(s)=u(s,z_0)=\frac{x_1(x_2-z_0)-x_2(x_1-z_0)e^{-\sqrt{\Delta}s}}{(x_2-z_0)-(x_1-z_0)e^{-\sqrt{\Delta}s}}.$$

Last, g satisfies an ODE very similar to (A.1). Taking care of the minus sign, it leads to the following result,

$$g_s=g_0\left(\frac{1}{R(s,z_0)}\right)^{k}. \tag{C.1}$$

C.3. Step 2, express $\hat M$ back as a function of $t, z$

We want to express our two unknown quantities $s$ and $z_0$ as functions of $t$ and $z$.

First, we easily get $s = t - t_h$. We can moreover solve for $z_0$ in the following equation, using the semi-group property of $u$,

$$z = u(t - t_h, z_0) \quad\Longleftrightarrow\quad z_0 = u(t_h - t, z).$$

Substituting these into the previous expression (C.1) of $g(s)$, with $g(0) = \hat M(t_h, z_0) = F(z_0)$, then leads to,

$$\hat M(t,z) = F\big(u(t_h - t, z)\big)\, R\big(t - t_h, u(t_h - t, z)\big)^{-k} = F\big(u(t_h - t, z)\big)\, R(t_h - t, z)^{k},$$

where the second equality relies on a property shown in A.3. This gives the final formula stated in Proposition 4.1.
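As a numerical sanity check of this formula, the sketch below evaluates the residual of the PDE with central finite differences. It assumes that $\Delta = \sqrt{\gamma^2 - 4\lambda\mu}$ and that $x_1 < x_2$ are the roots of $\lambda x^2 - \gamma x + \mu$, and it takes the closed forms of $u$ and $R$ as we read them from the expressions used in this Appendix. The function names, the polynomial `F` and all parameter values are purely illustrative.

```python
import math

def make_u_and_R(lam, gamma, mu):
    """Closed forms of u(t, z) and R(t, z) (assumed definitions of Delta, x1, x2)."""
    delta = math.sqrt(gamma**2 - 4.0 * lam * mu)
    x1 = (gamma - delta) / (2.0 * lam)
    x2 = (gamma + delta) / (2.0 * lam)

    def denom(t, z):
        return (x2 - z) - (x1 - z) * math.exp(-delta * t)

    def u(t, z):
        e = math.exp(-delta * t)
        return (x1 * (x2 - z) - x2 * (x1 - z) * e) / denom(t, z)

    def R(t, z):
        return (delta / lam) ** 2 * math.exp(-delta * t) / denom(t, z) ** 2

    return u, R

def make_M_hat(lam, gamma, mu, k, t_h, F):
    """Candidate solution M_hat(t, z) = F(u(t_h - t, z)) * R(t_h - t, z)**k."""
    u, R = make_u_and_R(lam, gamma, mu)

    def M_hat(t, z):
        return F(u(t_h - t, z)) * R(t_h - t, z) ** k

    return M_hat

def pde_residual(M_hat, lam, gamma, mu, k, t, z, eps=1e-5):
    """Left-hand side of the PDE, with derivatives taken by central differences."""
    d_t = (M_hat(t + eps, z) - M_hat(t - eps, z)) / (2.0 * eps)
    d_z = (M_hat(t, z + eps) - M_hat(t, z - eps)) / (2.0 * eps)
    return (d_t + (lam * z * z - gamma * z + mu) * d_z
            + k * (2.0 * lam * z - gamma) * M_hat(t, z))

# Illustrative values only (not taken from the paper).
lam, gamma, mu, k, t_h = 1.5, 2.5, 0.5, 3, 1.0
F = lambda z: 0.2 + 0.5 * z + 0.3 * z * z
M_hat = make_M_hat(lam, gamma, mu, k, t_h, F)
residual = pde_residual(M_hat, lam, gamma, mu, k, t=0.6, z=0.4)
```

The residual should vanish up to discretisation error, and `M_hat(t_h, z)` should return `F(z)`, which is the boundary condition of the problem.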

Appendix D. Some useful algebra

This section of the Appendix pools together all bits of algebra that are not really digestible, but are used in the main text.

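D.1. Derivative of $\hat M$

We first slightly rewrite the expression of the generating function,

$$\begin{aligned}
\hat M(t_h - t, z) &= R(t_h - t, z)^{k} \sum_{l=0}^{N} M_{t_h}(l)\, u(t_h - t, z)^{l}\\
&= \left(\left(\frac{\Delta}{\lambda}\right)^{2} e^{-\Delta t}\right)^{k} \sum_{l=0}^{N} M_{t_h}(l)\, \big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{l}\, \big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-(2k+l)}.
\end{aligned}$$

Applying Leibniz’s derivation rule to the product, we get,

$$\partial_z^{i} \hat M(t_h - t, z)\Big|_{z=0} = \left(\left(\frac{\Delta}{\lambda}\right)^{2} e^{-\Delta t}\right)^{k} \sum_{l=0}^{N} M_{t_h}(l) \sum_{\alpha=0}^{i} \binom{i}{\alpha}\, \partial_z^{\alpha}\big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{l}\Big|_{z=0}\; \partial_z^{i-\alpha}\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-(2k+l)}\Big|_{z=0}. \tag{D.1}$$

The first of the two derivatives in the sum can be computed as,

$$\begin{aligned}
\partial_z \big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{l} &= l\,\big(-x_1 + x_2 e^{-\Delta t}\big)\big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{l-1}\,\mathbb{1}_{l \geq 1}\\
\partial_z^{2} \big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{l} &= l(l-1)\,\big(-x_1 + x_2 e^{-\Delta t}\big)^{2}\big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{l-2}\,\mathbb{1}_{l \geq 2}\\
\partial_z^{\alpha} \big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{l} &= \frac{l!}{(l-\alpha)!}\,\big(-x_1 + x_2 e^{-\Delta t}\big)^{\alpha}\big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{l-\alpha}\,\mathbb{1}_{l \geq \alpha}.
\end{aligned}$$

While the second gives us,

$$\begin{aligned}
\partial_z \big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-(2k+l)} &= (2k+l)\,\big(1 - e^{-\Delta t}\big)\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-(2k+l+1)}\\
\partial_z^{2} \big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-(2k+l)} &= (2k+l)(2k+l+1)\,\big(1 - e^{-\Delta t}\big)^{2}\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-(2k+l+2)}\\
\partial_z^{i-\alpha} \big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-(2k+l)} &= \prod_{m=0}^{i-\alpha-1}(2k+l+m)\,\big(1 - e^{-\Delta t}\big)^{i-\alpha}\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-(2k+l+i-\alpha)}.
\end{aligned}$$

Evaluating these derivatives at $z=0$ in Eq. (D.1) yields,

$$\partial_z^{i} \hat M(t_h - t, z)\Big|_{z=0} = \left(\left(\frac{\Delta}{\lambda}\right)^{2} e^{-\Delta t}\right)^{k} \sum_{\alpha=0}^{i} \sum_{l=\alpha}^{N} M_{t_h}(l) \binom{i}{\alpha} \frac{l!}{(l-\alpha)!} \prod_{m=0}^{i-\alpha-1}(2k+l+m)\, \big(-x_1 + x_2 e^{-\Delta t}\big)^{\alpha} (x_1 x_2)^{l-\alpha} \big(1 - e^{-\Delta t}\big)^{l+i-2\alpha} \big(x_2 - x_1 e^{-\Delta t}\big)^{-(2k+l+i-\alpha)}$$

which is the expression provided in Proposition 4.2.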
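The closed-form derivative can be transcribed almost literally into code. The following sketch is one possible implementation, under the assumption that $\Delta = \sqrt{\gamma^2 - 4\lambda\mu}$ and that $x_1 < x_2$ are the roots of $\lambda x^2 - \gamma x + \mu$. The argument `tau` plays the role of the time written $t$ in the exponentials above, `M_th` is the truncated vector $(M_{t_h}(0), \dots, M_{t_h}(N))$, and the helper `m_hat_direct` is a hypothetical reference implementation used only for comparison.

```python
import math

def _roots(lam, gamma, mu):
    # Assumed definitions: Delta is the discriminant root, x1 < x2 the two roots.
    delta = math.sqrt(gamma**2 - 4.0 * lam * mu)
    return delta, (gamma - delta) / (2.0 * lam), (gamma + delta) / (2.0 * lam)

def derivative_at_zero(i, M_th, k, lam, gamma, mu, tau):
    """i-th z-derivative of the generating function at z = 0 (double sum above)."""
    delta, x1, x2 = _roots(lam, gamma, mu)
    e = math.exp(-delta * tau)
    prefactor = ((delta / lam) ** 2 * e) ** k
    total = 0.0
    for alpha in range(i + 1):
        for l in range(alpha, len(M_th)):
            # prod_{m=0}^{i-alpha-1} (2k + l + m); the empty product equals 1
            rising = math.prod(2 * k + l + m for m in range(i - alpha))
            total += (M_th[l] * math.comb(i, alpha) * math.perm(l, alpha) * rising
                      * (-x1 + x2 * e) ** alpha
                      * (x1 * x2) ** (l - alpha)
                      * (1.0 - e) ** (l + i - 2 * alpha)
                      * (x2 - x1 * e) ** (-(2 * k + l + i - alpha)))
    return prefactor * total

def m_hat_direct(z, M_th, k, lam, gamma, mu, tau):
    """Direct evaluation of the generating function, for comparison only."""
    delta, x1, x2 = _roots(lam, gamma, mu)
    e = math.exp(-delta * tau)
    A = x1 * (x2 - z) - x2 * (x1 - z) * e
    B = (x2 - z) - (x1 - z) * e
    return ((delta / lam) ** 2 * e) ** k * sum(
        m * A ** l * B ** (-(2 * k + l)) for l, m in enumerate(M_th))

# Illustrative values only (not taken from the paper).
M_th, k, lam, gamma, mu, tau = [0.1, 0.3, 0.4, 0.2], 2, 1.5, 2.5, 0.5, 0.7
first_derivative = derivative_at_zero(1, M_th, k, lam, gamma, mu, tau)
```

Dividing `derivative_at_zero(i, ...)` by $i!$ gives the $i$-th coefficient of the power series in $z$. Comparing against finite differences of `m_hat_direct` is a cheap way to catch transcription errors in the exponents.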

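D.2. Derivatives of $\hat M$ when $\omega = 0$

We wish here to derive $\partial_z^{i} \hat M(t,z)$, where the function $\hat M$ is as given in Proposition 4.3, i.e.

$$\hat M(t,z) = \lambda^{x}\psi^{v+w+y} r^{w}(1-r)^{v+y}\, R(t_{or} - t, z) \prod_{t_j \in \mathcal{X}} R(t_j - t, z) \prod_{t_j \in \mathcal{W}} R(t_j - t, z)^{-1} \prod_{t_j \in \mathcal{Y}} u(t_j - t, z)\, R(t_j - t, z)^{-1}.$$

For simplicity, we take the derivatives of the logarithm of $\hat M$ and express the derivatives of $\hat M$ using these and Leibniz’s formula,

$$\begin{aligned}
\partial_z \hat M &= \hat M\, \partial_z(\ln \hat M)\\
\partial_z^{2} \hat M &= \partial_z \hat M\, \partial_z(\ln \hat M) + \hat M\, \partial_z^{2}(\ln \hat M)\\
\partial_z^{3} \hat M &= \partial_z^{2} \hat M\, \partial_z(\ln \hat M) + 2\,\partial_z \hat M\, \partial_z^{2}(\ln \hat M) + \hat M\, \partial_z^{3}(\ln \hat M)\\
\partial_z^{i} \hat M &= \sum_{\alpha=1}^{i} \binom{i-1}{\alpha-1}\, \partial_z^{i-\alpha} \hat M\; \partial_z^{\alpha}(\ln \hat M).
\end{aligned} \tag{D.2}$$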

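In order to compute the derivatives of $\ln \hat M$, one needs the derivatives of $\ln R(t,z)$ and $\ln u(t,z)$. We have

$$\begin{aligned}
\ln R(t,z) &= -2\ln\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big) + \ln\left(\frac{\Delta}{\lambda}\right)^{2} - \Delta t\\
\partial_z \ln R(t,z) &= 2\,\big(1 - e^{-\Delta t}\big)\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-1}\\
\partial_z^{2} \ln R(t,z) &= 2\,\big(1 - e^{-\Delta t}\big)^{2}\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-2}\\
\partial_z^{3} \ln R(t,z) &= 4\,\big(1 - e^{-\Delta t}\big)^{3}\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-3}\\
\partial_z^{\alpha} \ln R(t,z) &= 2(\alpha-1)!\,\big(1 - e^{-\Delta t}\big)^{\alpha}\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-\alpha}.
\end{aligned}$$

Finally, evaluating at $z=0$ leads to

$$\partial_z^{\alpha} \ln R(t,0) = 2(\alpha-1)!\, a_t^{\alpha} \tag{D.3}$$

where we defined

$$a_t := \big(1 - e^{-\Delta t}\big)\big(x_2 - x_1 e^{-\Delta t}\big)^{-1}. \tag{D.4}$$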

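In the same way we get,

$$\begin{aligned}
\ln u(t,z) &= \ln\big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big) - \ln\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)\\
\partial_z \ln u(t,z) &= -\big(x_1 - x_2 e^{-\Delta t}\big)\big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{-1} + \big(1 - e^{-\Delta t}\big)\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-1}\\
\partial_z^{2} \ln u(t,z) &= -\big(x_1 - x_2 e^{-\Delta t}\big)^{2}\big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{-2} + \big(1 - e^{-\Delta t}\big)^{2}\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-2}\\
\partial_z^{3} \ln u(t,z) &= -2\big(x_1 - x_2 e^{-\Delta t}\big)^{3}\big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{-3} + 2\big(1 - e^{-\Delta t}\big)^{3}\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-3}\\
\partial_z^{\alpha} \ln u(t,z) &= (\alpha-1)!\Big[-\big(x_1 - x_2 e^{-\Delta t}\big)^{\alpha}\big(x_1(x_2 - z) - x_2(x_1 - z)e^{-\Delta t}\big)^{-\alpha} + \big(1 - e^{-\Delta t}\big)^{\alpha}\big((x_2 - z) - (x_1 - z)e^{-\Delta t}\big)^{-\alpha}\Big].
\end{aligned}$$

Here also, we are interested in the function at $z=0$,

$$\partial_z^{\alpha} \ln u(t,0) = (\alpha-1)!\,\big(a_t^{\alpha} - b_t^{\alpha}\big) \tag{D.5}$$

where we defined

$$b_t := \big(x_1 - x_2 e^{-\Delta t}\big)\big(x_1 x_2 - x_2 x_1 e^{-\Delta t}\big)^{-1}. \tag{D.6}$$

As the last ingredient needed to write the derivatives of $\ln \hat M$, we get,

$$\partial_z^{\alpha} \ln\big(u(t,z)\,R(t,z)^{-1}\big)\Big|_{z=0} = \partial_z^{\alpha} \ln u(t,z)\Big|_{z=0} - \partial_z^{\alpha} \ln R(t,z)\Big|_{z=0} = -(\alpha-1)!\,\big(a_t^{\alpha} + b_t^{\alpha}\big). \tag{D.7}$$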

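Finally, using Eqs. (D.3) and (D.7), one can compute

$$\partial_z^{\alpha}\big(\ln \hat M(t,z)\big)\Big|_{z=0} = (\alpha-1)!\, C(\alpha)$$

where we defined

$$C(\alpha) := 2\,a_{t_{or}-t}^{\alpha} + 2\sum_{t_j \in \mathcal{X}} a_{t_j - t}^{\alpha} - 2\sum_{t_j \in \mathcal{W}} a_{t_j - t}^{\alpha} - \sum_{t_j \in \mathcal{Y}} \big(a_{t_j - t}^{\alpha} + b_{t_j - t}^{\alpha}\big).$$

Plugging this into Eq. (D.2) and noting that $\partial_z^{i} \hat M(t,z)\big|_{z=0} = i!\, M_t(i)$, we get

$$M_t(i) = \sum_{\alpha=1}^{i} \binom{i-1}{\alpha-1} \frac{(i-\alpha)!\,(\alpha-1)!}{i!}\, M_t(i-\alpha)\, C(\alpha) = \frac{1}{i} \sum_{\alpha=1}^{i} M_t(i-\alpha)\, C(\alpha)$$

which is the result stated in Corollary 4.3.1.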
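This recursion lends itself to a short implementation. The sketch below assumes $\Delta = \sqrt{\gamma^2 - 4\lambda\mu}$ with $x_1 < x_2$ the roots of $\lambda x^2 - \gamma x + \mu$, builds $C(\alpha)$ from $a_t$ and $b_t$ as defined in (D.4) and (D.6), and then applies the recursion starting from a user-supplied $M_t(0)$. The function names, the event times and the parameter values are illustrative, and the lists `X`, `W`, `Y` are assumed to contain only the event times lying between $t$ and $t_{or}$.

```python
import math

def make_a_b(lam, gamma, mu):
    """Return the functions a_t and b_t (assumed definitions of Delta, x1, x2)."""
    delta = math.sqrt(gamma**2 - 4.0 * lam * mu)
    x1 = (gamma - delta) / (2.0 * lam)
    x2 = (gamma + delta) / (2.0 * lam)

    def a(t):
        e = math.exp(-delta * t)
        return (1.0 - e) / (x2 - x1 * e)

    def b(t):  # only defined for t > 0
        e = math.exp(-delta * t)
        return (x1 - x2 * e) / (x1 * x2 * (1.0 - e))

    return a, b

def make_C(a, b, t, t_or, X, W, Y):
    """Build C(alpha) from branching (X), removed-leaf (W) and non-removed-leaf (Y) times."""
    def C(alpha):
        return (2.0 * a(t_or - t) ** alpha
                + 2.0 * sum(a(tj - t) ** alpha for tj in X)
                - 2.0 * sum(a(tj - t) ** alpha for tj in W)
                - sum(a(tj - t) ** alpha + b(tj - t) ** alpha for tj in Y))
    return C

def population_size_coefficients(C, n_max, m0=1.0):
    """Apply M(i) = (1/i) * sum_{alpha=1..i} M(i - alpha) * C(alpha), with M(0) = m0."""
    M = [m0]
    for i in range(1, n_max + 1):
        M.append(sum(M[i - alpha] * C(alpha) for alpha in range(1, i + 1)) / i)
    return M

# Illustrative values only (not taken from the paper).
a, b = make_a_b(lam=1.5, gamma=2.5, mu=0.5)
C = make_C(a, b, t=0.2, t_or=1.2, X=[0.9], W=[0.5], Y=[0.7])
coeffs = population_size_coefficients(C, n_max=30)
```

With `m0=1.0` the output is expressed relative to $M_t(0)$; passing the actual value of $\hat M(t,0)$ rescales all coefficients. When $|b_{t_j - t}|$ is large the individual terms of the sum can be much larger than the result, so for large `n_max` it may be worth checking the output against a direct evaluation of $\hat M$.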

Appendix E. Inductions across the epochs

E.1. Proof of Proposition 3.1

We prove the proposition by induction across the epochs.

If we observe only the first epoch and the $k_0$ leaves at present, then starting from $L_0(i) = \rho^{k_0}(1-\rho)^{i}$ we get, at any time $t$ across the first epoch $(0, t_1)$, $L_t(i) = u_t^{i}\, p_t^{k_0}$, which satisfies Proposition 3.1.

Suppose we observed so far, i.e. on $(0, t_{h+1})$, $v$ sampled ancestors, $w$ removed leaves at times $t_j \in \mathcal{W}$, $x$ branching events at times $t_j \in \mathcal{X}$, and $y$ non-removed leaves at times $t_j \in \mathcal{Y}$. Suppose also that Proposition 3.1 is verified across epoch $(t_h, t_{h+1})$. Let us look at what happens across epoch $(t_{h+1}, t_{h+2})$.

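The observed punctual event $t_{h+1}$ can be either,

  1. a removed ancestral leaf. Update (3.10) then applies. Subsequently, the number of sampled lineages increases by one and formula (3.9) applies on the next epoch, leading to $L_t(i) = u_t^{i}\, W_t$ where
    $$\begin{aligned}
    W_t &= \lambda^{x}\psi^{v+(w+1)+y}(1-r)^{v+y} r^{w+1}\, p_{t_{h+1}}^{k} \left(\frac{p_t}{p_{t_{h+1}}}\right)^{k+1} \prod_{t_j \in \mathcal{X}} p_{t_j} \prod_{t_j \in \mathcal{Y}} u_{t_j} p_{t_j}^{-1} \prod_{t_j \in \mathcal{W}} p_{t_j}^{-1}\\
    &= \lambda^{x}\psi^{v+(w+1)+y}(1-r)^{v+y} r^{w+1}\, p_t^{k+1} \prod_{t_j \in \mathcal{X}} p_{t_j} \prod_{t_j \in \mathcal{Y}} u_{t_j} p_{t_j}^{-1} \prod_{t_j \in \mathcal{W} \cup \{t_{h+1}\}} p_{t_j}^{-1}.
    \end{aligned}$$
  2. a non-removed ancestral leaf. Update (3.11) then applies. Subsequently, the number of sampled lineages increases by one and formula (3.9) applies on the next epoch, leading to $L_t(i) = u_t^{i}\, W_t$ where
    $$\begin{aligned}
    W_t &= \lambda^{x}\psi^{v+w+(y+1)}(1-r)^{v+(y+1)} r^{w}\, p_{t_{h+1}}^{k}\, u_{t_{h+1}} \left(\frac{p_t}{p_{t_{h+1}}}\right)^{k+1} \prod_{t_j \in \mathcal{X}} p_{t_j} \prod_{t_j \in \mathcal{Y}} u_{t_j} p_{t_j}^{-1} \prod_{t_j \in \mathcal{W}} p_{t_j}^{-1}\\
    &= \lambda^{x}\psi^{v+w+(y+1)}(1-r)^{v+(y+1)} r^{w}\, p_t^{k+1} \prod_{t_j \in \mathcal{X}} p_{t_j} \prod_{t_j \in \mathcal{Y} \cup \{t_{h+1}\}} u_{t_j} p_{t_j}^{-1} \prod_{t_j \in \mathcal{W}} p_{t_j}^{-1}.
    \end{aligned}$$
  3. a non-removed sampled ancestor along a branch. Update (3.12) then applies. The number of sampled lineages does not change, and formula (3.9) applies on the next epoch, leading to $L_t(i) = u_t^{i}\, W_t$ where
    $$\begin{aligned}
    W_t &= \lambda^{x}\psi^{(v+1)+w+y}(1-r)^{(v+1)+y} r^{w}\, p_{t_{h+1}}^{k} \left(\frac{p_t}{p_{t_{h+1}}}\right)^{k} \prod_{t_j \in \mathcal{X}} p_{t_j} \prod_{t_j \in \mathcal{Y}} u_{t_j} p_{t_j}^{-1} \prod_{t_j \in \mathcal{W}} p_{t_j}^{-1}\\
    &= \lambda^{x}\psi^{v+w+y+1}(1-r)^{v+y+1} r^{w}\, p_t^{k} \prod_{t_j \in \mathcal{X}} p_{t_j} \prod_{t_j \in \mathcal{Y}} u_{t_j} p_{t_j}^{-1} \prod_{t_j \in \mathcal{W}} p_{t_j}^{-1}.
    \end{aligned}$$
  4. a branching event between two sampled lineages. Update (3.13) then applies. The number of sampled lineages decreases by one, and formula (3.9) applies on the next epoch, leading to $L_t(i) = u_t^{i}\, W_t$ where
    $$\begin{aligned}
    W_t &= \lambda^{x+1}\psi^{v+w+y}(1-r)^{v+y} r^{w}\, p_{t_{h+1}}^{k} \left(\frac{p_t}{p_{t_{h+1}}}\right)^{k-1} \prod_{t_j \in \mathcal{X}} p_{t_j} \prod_{t_j \in \mathcal{Y}} u_{t_j} p_{t_j}^{-1} \prod_{t_j \in \mathcal{W}} p_{t_j}^{-1}\\
    &= \lambda^{x+1}\psi^{v+w+y}(1-r)^{v+y} r^{w}\, p_t^{k-1} \prod_{t_j \in \mathcal{X} \cup \{t_{h+1}\}} p_{t_j} \prod_{t_j \in \mathcal{Y}} u_{t_j} p_{t_j}^{-1} \prod_{t_j \in \mathcal{W}} p_{t_j}^{-1}.
    \end{aligned}$$

In all four cases, Proposition 3.1 is satisfied across epoch $(t_{h+1}, t_{h+2})$.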
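The closed form maintained by this induction can be evaluated on the log scale. The sketch below is a minimal illustration under the assumptions that $u_t = u(t, 1-\rho)$ and $p_t = \rho\, R(t, 1-\rho)$ (which is what the initial condition and the ODEs satisfied by $u_t$ and $p_t$ suggest), that $\Delta = \sqrt{\gamma^2 - 4\lambda\mu}$ with $x_1 < x_2$ the roots of $\lambda x^2 - \gamma x + \mu$, and that $0 < r < 1$. All names and values are hypothetical.

```python
import math

def make_u_p(lam, gamma, mu, rho):
    """u_t and p_t, assuming u_t = u(t, 1 - rho) and p_t = rho * R(t, 1 - rho)."""
    delta = math.sqrt(gamma**2 - 4.0 * lam * mu)
    x1 = (gamma - delta) / (2.0 * lam)
    x2 = (gamma + delta) / (2.0 * lam)
    z0 = 1.0 - rho

    def u_t(t):
        e = math.exp(-delta * t)
        return (x1 * (x2 - z0) - x2 * (x1 - z0) * e) / ((x2 - z0) - (x1 - z0) * e)

    def p_t(t):
        e = math.exp(-delta * t)
        return rho * (delta / lam) ** 2 * e / ((x2 - z0) - (x1 - z0) * e) ** 2

    return u_t, p_t

def log_L(t, i, k0, lam, psi, r, v, X, W, Y, u_t, p_t):
    """log L_t(i) given v sampled ancestors and event times X, W, Y observed on (0, t)."""
    x, w, y = len(X), len(W), len(Y)
    k = k0 + w + y - x  # leaves add a sampled lineage, branchings remove one
    out = i * math.log(u_t(t)) + k * math.log(p_t(t))
    out += x * math.log(lam) + (v + w + y) * math.log(psi)
    out += (v + y) * math.log(1.0 - r) + w * math.log(r)
    out += sum(math.log(p_t(tj)) for tj in X)
    out += sum(math.log(u_t(tj)) - math.log(p_t(tj)) for tj in Y)
    out -= sum(math.log(p_t(tj)) for tj in W)
    return out

# Illustrative values only (not taken from the paper).
lam, gamma, mu, rho, psi, r = 1.5, 2.5, 0.5, 0.6, 0.3, 0.4
u_t, p_t = make_u_p(lam, gamma, mu, rho)
value = log_L(0.8, 2, 3, lam, psi, r, v=1, X=[0.6], W=[0.3], Y=[0.5], u_t=u_t, p_t=p_t)
```

Working with logarithms avoids underflow when the tree contains many events, since each event contributes one additive term.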

E.2. Proof of Proposition 4.3

This Proposition is also proven by induction across the epochs.

We start at $t_{or} = t_n$ with $k=1$ lineage. Across epoch $(t_{n-1}, t_n)$, applying Proposition 4.1 with $F(z)=1$ and $k=1$, we get $\hat M(t,z) = R(t_{or} - t, z)$, which verifies Proposition 4.3.

Suppose now that Proposition 4.3 is verified across epoch $(t_h, t_{h+1})$ and that we observed, on $(t_h, t_{or})$, $v$ sampled ancestors, $w$ removed leaves at times $t_j \in \mathcal{W}$, $x$ branching events at times $t_j \in \mathcal{X}$, and $y$ non-removed leaves at times $t_j \in \mathcal{Y}$. Let us look at what happens on $(t_{h-1}, t_h)$.

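The punctual event $t_h$ can be either,

  1. a removed leaf. The number of sampled lineages then goes from $1+x-y-w$ to $x-y-w$, and applying update (4.10) followed by Proposition 4.1 leads to
    $$\begin{aligned}
    \hat M(t,z) &= \lambda^{x}\psi^{v+(w+1)+y} r^{w+1}(1-r)^{v+y}\, R\big(t_{or}-t_h, u(t_h-t,z)\big)\, R(t_h-t,z)^{x-y-w} \prod_{t_j \in \mathcal{X}} R\big(t_j-t_h, u(t_h-t,z)\big) \prod_{t_j \in \mathcal{W}} R\big(t_j-t_h, u(t_h-t,z)\big)^{-1} \prod_{t_j \in \mathcal{Y}} u\big(t_j-t_h, u(t_h-t,z)\big)\, R\big(t_j-t_h, u(t_h-t,z)\big)^{-1}\\
    &= \lambda^{x}\psi^{v+(w+1)+y} r^{w+1}(1-r)^{v+y}\, \frac{R(t_{or}-t,z)}{R(t_h-t,z)}\, R(t_h-t,z)^{x-y-w} \prod_{t_j \in \mathcal{X}} \frac{R(t_j-t,z)}{R(t_h-t,z)} \prod_{t_j \in \mathcal{W}} \frac{R(t_h-t,z)}{R(t_j-t,z)} \prod_{t_j \in \mathcal{Y}} \frac{u(t_j-t,z)\,R(t_h-t,z)}{R(t_j-t,z)}\\
    &= \lambda^{x}\psi^{v+(w+1)+y} r^{w+1}(1-r)^{v+y}\, R(t_{or}-t,z) \prod_{t_j \in \mathcal{X}} R(t_j-t,z) \prod_{t_j \in \mathcal{W} \cup \{t_h\}} R(t_j-t,z)^{-1} \prod_{t_j \in \mathcal{Y}} u(t_j-t,z)\, R(t_j-t,z)^{-1},
    \end{aligned}$$
    where the first to second equality is detailed in Appendix A, and the second to third comes after canceling out the powers of $R(t_h-t,z)$.
  2. a non-removed leaf. The number of sampled lineages then goes from $1+x-y-w$ to $x-y-w$, and applying update (4.11) followed by Proposition 4.1 leads to
    $$\begin{aligned}
    \hat M(t,z) &= \lambda^{x}\psi^{v+w+(y+1)} r^{w}(1-r)^{v+(y+1)}\, \frac{R(t_{or}-t,z)}{R(t_h-t,z)}\, R(t_h-t,z)^{x-y-w}\, u(t_h-t,z) \prod_{t_j \in \mathcal{X}} \frac{R(t_j-t,z)}{R(t_h-t,z)} \prod_{t_j \in \mathcal{W}} \frac{R(t_h-t,z)}{R(t_j-t,z)} \prod_{t_j \in \mathcal{Y}} \frac{u(t_j-t,z)\,R(t_h-t,z)}{R(t_j-t,z)}\\
    &= \lambda^{x}\psi^{v+w+(y+1)} r^{w}(1-r)^{v+(y+1)}\, R(t_{or}-t,z) \prod_{t_j \in \mathcal{X}} R(t_j-t,z) \prod_{t_j \in \mathcal{W}} R(t_j-t,z)^{-1} \prod_{t_j \in \mathcal{Y} \cup \{t_h\}} u(t_j-t,z)\, R(t_j-t,z)^{-1}.
    \end{aligned}$$
  3. a sampled ancestor. The number of sampled lineages then remains unchanged and equal to $1+x-y-w$. Applying update (4.12) followed by Proposition 4.1 leads to
    $$\begin{aligned}
    \hat M(t,z) &= \lambda^{x}\psi^{(v+1)+w+y} r^{w}(1-r)^{(v+1)+y}\, \frac{R(t_{or}-t,z)}{R(t_h-t,z)}\, R(t_h-t,z)^{1+x-y-w} \prod_{t_j \in \mathcal{X}} \frac{R(t_j-t,z)}{R(t_h-t,z)} \prod_{t_j \in \mathcal{W}} \frac{R(t_h-t,z)}{R(t_j-t,z)} \prod_{t_j \in \mathcal{Y}} \frac{u(t_j-t,z)\,R(t_h-t,z)}{R(t_j-t,z)}\\
    &= \lambda^{x}\psi^{(v+1)+w+y} r^{w}(1-r)^{(v+1)+y}\, R(t_{or}-t,z) \prod_{t_j \in \mathcal{X}} R(t_j-t,z) \prod_{t_j \in \mathcal{W}} R(t_j-t,z)^{-1} \prod_{t_j \in \mathcal{Y}} u(t_j-t,z)\, R(t_j-t,z)^{-1}.
    \end{aligned}$$
  4. a branching time. The number of sampled lineages then goes from $1+x-y-w$ to $2+x-y-w$, and applying update (4.15) followed by Proposition 4.1 leads to
    $$\begin{aligned}
    \hat M(t,z) &= \lambda^{x+1}\psi^{v+w+y} r^{w}(1-r)^{v+y}\, \frac{R(t_{or}-t,z)}{R(t_h-t,z)}\, R(t_h-t,z)^{2+x-y-w} \prod_{t_j \in \mathcal{X}} \frac{R(t_j-t,z)}{R(t_h-t,z)} \prod_{t_j \in \mathcal{W}} \frac{R(t_h-t,z)}{R(t_j-t,z)} \prod_{t_j \in \mathcal{Y}} \frac{u(t_j-t,z)\,R(t_h-t,z)}{R(t_j-t,z)}\\
    &= \lambda^{x+1}\psi^{v+w+y} r^{w}(1-r)^{v+y}\, R(t_{or}-t,z) \prod_{t_j \in \mathcal{X} \cup \{t_h\}} R(t_j-t,z) \prod_{t_j \in \mathcal{W}} R(t_j-t,z)^{-1} \prod_{t_j \in \mathcal{Y}} u(t_j-t,z)\, R(t_j-t,z)^{-1}.
    \end{aligned}$$

In all these cases, Proposition 4.3 is verified across epoch $(t_{h-1}, t_h)$, which ends the proof.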
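Each case of this induction relies on two composition identities, $u(a, u(b,z)) = u(a+b, z)$ and $R(a, u(b,z)) = R(a+b, z)/R(b,z)$, applied with $a = t_j - t_h$ and $b = t_h - t$. The sketch below checks both numerically. It assumes $\Delta = \sqrt{\gamma^2 - 4\lambda\mu}$ with $x_1 < x_2$ the roots of $\lambda x^2 - \gamma x + \mu$, and the function names and values are illustrative.

```python
import math

def make_u_and_R(lam, gamma, mu):
    """Closed forms of u(t, z) and R(t, z) (assumed definitions of Delta, x1, x2)."""
    delta = math.sqrt(gamma**2 - 4.0 * lam * mu)
    x1 = (gamma - delta) / (2.0 * lam)
    x2 = (gamma + delta) / (2.0 * lam)

    def denom(t, z):
        return (x2 - z) - (x1 - z) * math.exp(-delta * t)

    def u(t, z):
        e = math.exp(-delta * t)
        return (x1 * (x2 - z) - x2 * (x1 - z) * e) / denom(t, z)

    def R(t, z):
        return (delta / lam) ** 2 * math.exp(-delta * t) / denom(t, z) ** 2

    return u, R

def composition_errors(u, R, a, b, z):
    """Absolute errors in u(a, u(b, z)) = u(a + b, z) and R(a, u(b, z)) = R(a + b, z) / R(b, z)."""
    err_u = abs(u(a, u(b, z)) - u(a + b, z))
    err_R = abs(R(a, u(b, z)) - R(a + b, z) / R(b, z))
    return err_u, err_R

# Illustrative values only (not taken from the paper).
u, R = make_u_and_R(lam=1.5, gamma=2.5, mu=0.5)
err_u, err_R = composition_errors(u, R, a=0.3, b=0.5, z=0.2)
```

Both errors should be at the level of floating-point round-off. The same identities explain why every factor evaluated at $u(t_h - t, z)$ collapses to a ratio of factors evaluated at $z$.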

Appendix F. Using a generating function to solve for Lt

F.1. A slightly different strategy

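Recall that $L_t$ satisfies the following ODEs,

$$\dot L_t(i) = -\gamma(i+k)\,L_t(i) + \lambda(2k+i)\,L_t(i+1) + \mu i\,L_t(i-1), \qquad L_0(i) = \rho_0^{k_0}(1-\rho_0)^{i}.$$

If we introduce the corresponding generating function,

$$\hat L(t,z) = \sum_{i=0}^{\infty} z^{i} L_t(i)$$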

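then the initial condition on $L$ translates into,

$$\hat L(0,z) = \sum_{i=0}^{\infty} \big(z(1-\rho)\big)^{i} \rho^{k_0} = \rho^{k_0}\, \frac{1}{1 - z(1-\rho)}, \qquad z \neq \pm\frac{1}{1-\rho}.$$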

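The ODE translates into a PDE, but not as nicely as for $M_t$, see below,

$$\begin{aligned}
\partial_t \hat L &= \sum_{i=0}^{\infty} z^{i}\Big(-\gamma(i+k)\,L_t(i) + \lambda(2k+i)\,L_t(i+1) + \mu i\,L_t(i-1)\Big)\\
&= -\gamma k \sum_{i=0}^{\infty} z^{i} L_t(i) - \gamma \sum_{i=1}^{\infty} i z^{i} L_t(i) + \lambda \sum_{i=1}^{\infty} z^{i-1}(2k+i-1)\,L_t(i) + \mu \sum_{i=0}^{\infty} (i+1) z^{i+1} L_t(i)\\
&= -\gamma k\,\hat L - \gamma z\,\partial_z \hat L + (2k-1)\lambda\,\frac{1}{z}\big(\hat L - L_t(0)\big) + \lambda\,\partial_z \hat L + \mu z^{2}\,\partial_z \hat L + \mu z\,\hat L\\
&= \Big(-\gamma k + (2k-1)\lambda\,\frac{1}{z} + \mu z\Big)\hat L + (\mu z^{2} - \gamma z + \lambda)\,\partial_z \hat L - (2k-1)\lambda\,\frac{1}{z}\,\hat L(t,0).
\end{aligned}$$

We are thus left with the following PDE problem,

$$\hat L(0,z) = \frac{\rho^{k_0}}{1 - z(1-\rho)}, \qquad -z\,\partial_t \hat L + (\mu z^{3} - \gamma z^{2} + \lambda z)\,\partial_z \hat L + \big(\mu z^{2} - \gamma k z + (2k-1)\lambda\big)\hat L - (2k-1)\lambda\,\hat L(t,0) = 0. \tag{F.1}$$

The remaining term in $\hat L(t,0)$ complicates things slightly. However, the initial condition on $\hat L$ provides us with a first candidate function to satisfy this PDE.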

F.2. Solution

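We introduce below the function $f$, and show that it satisfies the PDE problem (F.1).

$$f(t,z) := \frac{p_t^{k}}{1 - z u_t}.$$

First, we observe that it satisfies the initial condition. We then need to check that it satisfies the PDE, and to do so we expand each of the four components of Eq. (F.1).

The first one gives us,

$$\begin{aligned}
-z\,\partial_t f &= -z\,\frac{k\,\dot p_t\, p_t^{k-1}(1 - z u_t) + z\,\dot u_t\, p_t^{k}}{(1 - z u_t)^{2}}\\
&= -z\,\frac{k(2\lambda u_t - \gamma)(1 - z u_t) + (\lambda u_t^{2} - \gamma u_t + \mu)z}{(1 - z u_t)^{2}}\, p_t^{k}\\
&= \frac{\lambda\big(-2k z u_t + (2k-1) z^{2} u_t^{2}\big) + \gamma\big(k z - (k-1) z^{2} u_t\big) - \mu z^{2}}{(1 - z u_t)^{2}}\, p_t^{k}.
\end{aligned}$$

We then turn to the second component,

$$(\mu z^{3} - \gamma z^{2} + \lambda z)\,\partial_z f = \frac{\lambda z u_t - \gamma z^{2} u_t + \mu z^{3} u_t}{(1 - z u_t)^{2}}\, p_t^{k}.$$

And the third one,

$$\big(\mu z^{2} - \gamma k z + (2k-1)\lambda\big) f = \frac{(1 - z u_t)\big(\mu z^{2} - \gamma k z + (2k-1)\lambda\big)}{(1 - z u_t)^{2}}\, p_t^{k} = \frac{\lambda\big((2k-1) - (2k-1) z u_t\big) + \gamma\big(-k z + k z^{2} u_t\big) + \mu\big(z^{2} - z^{3} u_t\big)}{(1 - z u_t)^{2}}\, p_t^{k}.$$

And the fourth and final one,

$$-(2k-1)\lambda\, f(t,0) = \frac{-\lambda(2k-1)(1 - z u_t)^{2}}{(1 - z u_t)^{2}}\, p_t^{k} = \frac{\lambda\big(-(2k-1) z^{2} u_t^{2} + 2(2k-1) z u_t - (2k-1)\big)}{(1 - z u_t)^{2}}\, p_t^{k}.$$

Putting everything together, the coefficients of $\lambda$, $\gamma$ and $\mu$ in the four numerators each sum to zero, so that indeed,

$$-z\,\partial_t f + (\mu z^{3} - \gamma z^{2} + \lambda z)\,\partial_z f + \big(\mu z^{2} - \gamma k z + (2k-1)\lambda\big) f - (2k-1)\lambda\, f(t,0) = 0.$$

While the branching and $\psi$-sampling with removal updates do not alter this solution, all the others do. Further work is thus needed to look for other solutions to this same PDE with different initial conditions.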
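The verification of (F.1) can also be done numerically. The following sketch evaluates the residual of the PDE for $f(t,z) = p_t^{k}/(1 - z u_t)$ with central finite differences. It assumes $u_t = u(t, 1-\rho)$ and $p_t = \rho\,R(t, 1-\rho)$, with $\Delta = \sqrt{\gamma^2 - 4\lambda\mu}$ and $x_1 < x_2$ the roots of $\lambda x^2 - \gamma x + \mu$. The names and parameter values are illustrative only.

```python
import math

def make_f(lam, gamma, mu, rho, k):
    """Candidate solution f(t, z) = p_t**k / (1 - z * u_t) (assumed closed forms of u_t, p_t)."""
    delta = math.sqrt(gamma**2 - 4.0 * lam * mu)
    x1 = (gamma - delta) / (2.0 * lam)
    x2 = (gamma + delta) / (2.0 * lam)
    z0 = 1.0 - rho

    def u_t(t):
        e = math.exp(-delta * t)
        return (x1 * (x2 - z0) - x2 * (x1 - z0) * e) / ((x2 - z0) - (x1 - z0) * e)

    def p_t(t):
        e = math.exp(-delta * t)
        return rho * (delta / lam) ** 2 * e / ((x2 - z0) - (x1 - z0) * e) ** 2

    def f(t, z):
        return p_t(t) ** k / (1.0 - z * u_t(t))

    return f

def residual_F1(f, lam, gamma, mu, k, t, z, eps=1e-5):
    """Left-hand side of the PDE in (F.1), with derivatives taken by central differences."""
    d_t = (f(t + eps, z) - f(t - eps, z)) / (2.0 * eps)
    d_z = (f(t, z + eps) - f(t, z - eps)) / (2.0 * eps)
    return (-z * d_t
            + (mu * z**3 - gamma * z**2 + lam * z) * d_z
            + (mu * z**2 - gamma * k * z + (2 * k - 1) * lam) * f(t, z)
            - (2 * k - 1) * lam * f(t, 0.0))

# Illustrative values only (not taken from the paper).
lam, gamma, mu, rho, k = 1.5, 2.5, 0.5, 0.6, 3
f = make_f(lam, gamma, mu, rho, k)
residual = residual_F1(f, lam, gamma, mu, k, t=0.5, z=0.3)
```

The residual should be zero up to discretisation error, and `f(0.0, z)` should reproduce the initial condition $\rho^{k}/(1 - z(1-\rho))$.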

References

  1. Al-Mohy A.H., Higham N.J. A new scaling and squaring algorithm for the matrix exponential. SIAM J. Matrix Anal. Appl. 2010;31(3):970–989.
  2. Andrieu C., Doucet A., Holenstein R. Particle markov chain monte carlo methods. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 2010;72(3):269–342.
  3. Barido-Sottani J., Aguirre-Fernández G., Hopkins M.J., Stadler T., Warnock R. Ignoring stratigraphic age uncertainty leads to erroneous estimates of species divergence times under the fossilized birth–death process. Proc. Roy. Soc. B. 2019;286(1902):20190685. doi: 10.1098/rspb.2019.0685.
  4. Billaud, O., Moen, D.S., Parsons, T.L., Morion, H., 2019. Estimating diversity through time using molecular phylogenies: Old and species-poor frog families are the remnants of a diverse past. bioRxiv.
  5. Crawford F.W., Minin V.N., Suchard M.A. Estimation for general birth-death processes. J. Am. Stat. Assoc. 2014;109(506):730–747. doi: 10.1080/01621459.2013.866565.
  6. Etienne R.S., Haegeman B., Stadler T., Aze T., Pearson P.N., Purvis A., Phillimore A.B. Diversity-dependence brings molecular phylogenies closer to agreement with the fossil record. Proc. Roy. Soc. B Biol. Sci. 2012;279(1732):1300–1309. doi: 10.1098/rspb.2011.1439.
  7. Foote M. Origination and extinction components of taxonomic diversity: general problems. Paleobiology. 2000;26(S4):74–102.
  8. Freyman, W.A., Höhna, S., 11 2018. Stochastic character mapping of state-dependent diversification reveals the tempo of evolutionary decline in self-compatible onagraceae lineages.
  9. Gavryushkina A., Heath T.A., Ksepka D.T., Stadler T., Welch D., Drummond A.J. Bayesian total-evidence dating reveals the recent crown radiation of penguins. Syst. Biol. 2016;66(1):57–73. doi: 10.1093/sysbio/syw060.
  10. Gray R.D., Drummond A.J., Greenhill S.J. Language phylogenies reveal expansion pulses and pauses in pacific settlement. Science. 2009;323(5913):479–483. doi: 10.1126/science.1166858.
  11. Gupta, A., Manceau, M., Vaughan, T., Khammash, M., Stadler, T., 2019. The probability distribution of the reconstructed phylogenetic tree with occurrence data. bioRxiv, 679365
  12. Heath T.A., Huelsenbeck J.P., Stadler T. The fossilized birth–death process for coherent calibration of divergence-time estimates. Proc. Nat. Acad. Sci. 2014;111(29):2957–2966. doi: 10.1073/pnas.1319091111.
  13. Kendall D.G. On the generalized ‘birth-and-death’ process. Ann. Math. Stat. 1948;19:1–15.
  14. Laudanno G., Haegeman B., Etienne R.S. Additional analytical support for a new method to compute the likelihood of diversification models. Bull. Math. Biol. 2020;82(2):22. doi: 10.1007/s11538-020-00698-y.
  15. Leventhal G.E., Günthard H.F., Bonhoeffer S., Stadler T. Using an epidemiological model for phylogenetic inference reveals density dependence in hiv transmission. Mol. Biol. Evol. 2013;31(1):6–17. doi: 10.1093/molbev/mst172.
  16. Levin, D.A., Peres, Y., 2017. Markov chains and mixing times. Vol. 107. American Mathematical Soc
  17. Maddison W.P., Midford P.E., Otto S.P. Estimating a binary character’s effect on speciation and extinction. Syst. Biol. 2007;56(5):701–710. doi: 10.1080/10635150701607033.
  18. Moler C., Van Loan C. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev. 2003;45(1):3–49.
  19. Morlon H., Parsons T.L., Plotkin J.B. Reconciling molecular phylogenies with the fossil record. Proc. Natl. Acad. Sci. USA. 2011;108(39):16327–16332. doi: 10.1073/pnas.1102543108.
  20. Morlon H., Potts M.D., Plotkin J.B. Inferring the dynamics of diversification: a coalescent approach. PLoS Biol. 2010;8(9):1–13. doi: 10.1371/journal.pbio.1000493.
  21. Nee S., May R.M., Harvey P.H. The reconstructed evolutionary process. Philos. Trans. Roy. Soc. Lond. B Biol. Sci. 1994;344(1309):305–311. doi: 10.1098/rstb.1994.0068.
  22. Quental T.B., Marshall C.R. Diversity dynamics: molecular phylogenies need the fossil record. Trends Ecol. Evol. 2010;25(8):434–441. doi: 10.1016/j.tree.2010.05.002.
  23. Ratmann O., Hodcroft E.B., Pickles M., Cori A., Hall M., Lycett S., Colijn C., Dearlove B., Didelot X., Frost S. Phylogenetic tools for generalized hiv-1 epidemics: findings from the pangea-hiv methods comparison. Mol. Biol. Evol. 2016;34(1):185–203. doi: 10.1093/molbev/msw217.
  24. Smith S., Shahrezaei V. General transient solution of the one-step master equation in one dimension. Phys. Rev. E. 2015;91(6) doi: 10.1103/PhysRevE.91.062119.
  25. Stadler T. Sampling-through-time in birth–death trees. J. Theor. Biol. 2010;267(3):396–404. doi: 10.1016/j.jtbi.2010.09.010.
  26. Stadler, T., 2011. Inferring speciation and extinction processes from extant species data. Proc. Natl. Acad. Sci. USA
  27. Stadler T. How can we improve accuracy of macroevolutionary rate estimates? Syst. Biol. 2012;62(2):321–329. doi: 10.1093/sysbio/sys073.
  28. Stadler T., Bonhoeffer S. Uncovering epidemiological dynamics in heterogeneous host populations using phylogenetic methods. Philos. Trans. Roy. Soc. Lond. B Biol. Sci. 2013;368(1614):20120198. doi: 10.1098/rstb.2012.0198.
  29. Stadler T., Kouyos R., von Wyl V., Yerly S., Böni J., Bürgisser P., Klimkait T., Joos B., Rieder P., Xie D. Estimating the basic reproductive number from viral sequence data. Mol. Biol. Evol. 2011;29(1):347–357. doi: 10.1093/molbev/msr217.
  30. Stadler, T., Kühnert, D., Bonhoeffer, S., Drummond, A.J., 2013. Birth–death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proc. Nat. Acad. Sci. 110 (1), 228–233
  31. Stadler T., Steel M. Swapping birth and death: symmetries and transformations in phylodynamic models. Syst. Biol. 2019;68(5):852–858. doi: 10.1093/sysbio/syz039.
  32. Starrfelt J., Liow L.H. How many dinosaur species were there? fossil bias and true richness estimated using a poisson sampling model. Philos. Trans. Roy. Soc. B Biol. Sci. 2016;371(1691):20150219. doi: 10.1098/rstb.2015.0219.
  33. Vaughan, T.G., Leventhal, G.E., Rasmussen, D.A., Drummond, A.J., Welch, D., Stadler, T., 05 2019. Estimating epidemic incidence and prevalence from genomic data. Mol. Biol. Evol.
  34. Yule G.U. Philos. Trans. Roy. Soc. Lond. B; J.C: 1925. A mathematical theory of evolution, based on the conclusions of Dr. Willis, F.R.S.
