The probability distribution of the ancestral population size conditioned on the reconstructed phylogenetic tree with occurrence data

Marc Manceau; Ankit Gupta; Timothy Vaughan; Tanja Stadler

doi:10.1016/j.jtbi.2020.110400

2021 Jan 21;509:110400. doi: 10.1016/j.jtbi.2020.110400

PMCID: PMC7733867 PMID: 32739241

Highlights

• Reconstructed phylogenetic trees contain valuable information on population dynamics.
• Considering records of occurrences through time allows us to get more relevant data.
• Data is modeled using a birth-death process with specific sampling scheme.
• The distribution of the population size conditioned on the data is derived.
• This distribution will foster new advances in macroevolution and epidemiology.

Keywords: Birth-death process, Fossilized birth-death model, Epidemiology, Macroevolution, Phylogenetics

Abstract

We consider a homogeneous birth-death process with three different sampling schemes. First, individuals can be sampled through time and included in a reconstructed phylogenetic tree. Second, they can be sampled through time and only recorded as a point ‘occurrence’ along a timeline. Third, extant individuals can be sampled and included in the reconstructed phylogenetic tree with a fixed probability. We further consider that sampled individuals can be removed or not from the process, upon sampling, with fixed probability. We derive the probability distribution of the population size at any time in the past conditional on the joint observation of a reconstructed phylogenetic tree and a record of occurrences not included in the tree. We also provide an algorithm to simulate ancestral population size trajectories given the observation of a reconstructed phylogenetic tree and occurrences. This distribution can be readily used to draw inferences about the ancestral population size in the field of epidemiology and macroevolution. In epidemiology, these results will allow data from epidemiological case count studies to be used in conjunction with molecular sequencing data (yielding reconstructed phylogenetic trees) to coherently estimate prevalence through time. In macroevolution, it will foster the joint examination of the fossil record and extant taxa to reconstruct past biodiversity.

1. Introduction

Owing to seminal papers by Yule (1925), Kendall (1948), and much later Nee et al. (1994), birth-death models have become ubiquitous in evolutionary biology. They are used as a population dynamic model, parameterized via a birth and death rate, in studies spanning fields as diverse as paleontology, macroevolution, linguistics, and epidemiology (see e.g. Foote, 2000, Heath et al., 2014, Gray et al., 2009, Stadler et al., 2013). A major aim when using these models is to reliably estimate the ancestral number of species, languages or infected individuals, i.e. past biodiversity, past prevalence, or more generally past population sizes. In both macroevolution and epidemiology, inferences about population dynamics can rely on occurrence data, i.e. the fossil record and the case count record. These data are modeled as a sampling of individuals from the full population through time (Foote, 2000, Starrfelt and Liow, 2016).

In recent years, impressive sequencing efforts targeting present-day species and pathogens have enabled the reconstruction of phylogenies. Two main modeling approaches make it possible to quantify past population sizes using these trees. First, phylodynamic tools have been developed to fit the birth and death rates of a birth-death process on the reconstructed phylogenetic tree of interest, while integrating over past population sizes (Stadler, 2011, Morlon et al., 2011). To quantify past population sizes, the expected population sizes based on these estimated birth and death rates are then typically calculated (Morlon et al., 2011, Ratmann et al., 2016, Billaud et al., 2019).

Thus, such population sizes are not directly conditioned on the reconstructed phylogenetic tree. Instead, the statistical signal in the tree is only used to compute rate estimates. Second, phylodynamic tools have been developed to fit the expected population size of a coalescent model on a reconstructed phylogenetic tree. This modeling approach may appear as a better alternative, for it is directly parametrized with the population size that we wish to estimate. However, this comes at the cost of ignoring stochastic fluctuations in small populations (Morlon et al., 2010, Ratmann et al., 2016).

Statistical approaches stemming from the analysis of case count data or from the analysis of reconstructed evolutionary trees have been part of separate bodies of work for many years, historically yielding conflicts between biodiversity estimates based on the fossil record and estimates based on reconstructed phylogenies of extant taxa (Quental and Marshall, 2010 but see also Morlon et al., 2011). A first path towards merging these disparate data was introduced by the fossilized birth-death model of Stadler (2010), which considered a birth-death model with sampling and inclusion of individuals in the tree through time. This allowed taking into account infection trees reconstructed from pathogen sequences sampled throughout an epidemic (Stadler et al., 2011). In macroevolution, it paved the way to more precise phylogenetic dating using well-conserved fossil taxa which could be placed on a reconstructed phylogeny using morphological characters (Gavryushkina et al., 2016). Less well-conserved fossils (i.e. occurrences) have also been used with this model, using a Markov Chain Monte Carlo (MCMC) scheme to integrate over all possible placements along a fixed tree (Heath et al., 2014). Analytical developments around this new model have been made by Gupta et al. (2019), who derived an analytical formula for the probability density of an outcome of the process, which consists of a reconstructed phylogenetic tree along with a record of occurrences. Again, none of these methods quantifies population sizes directly; they estimate birth and death rates while analytically integrating over population sizes.

Very recently, Vaughan et al. (2019) introduced a Monte-Carlo particle filtering algorithm allowing direct quantification of past population sizes and birth and death rates conditioned on reconstructed phylogenetic trees and occurrences (see Andrieu et al., 2010 for details about particle filtering methods). As such, it can produce more accurate population size estimates than the methods mentioned above as the estimates directly condition on all data, i.e. the occurrence record (e.g. poorly preserved fossils, or case count epidemiological record) and the reconstructed phylogenetic tree.

In this paper, we build on the analytical developments presented by Gupta et al. (2019), to calculate the past population size distribution as originally targeted by Vaughan et al. (2019). Our approach here is more analytic, leading to much faster numerical calculations compared to the particle filtering method previously developed. The efficiency of our method paves the way towards considering much bigger datasets, and towards extending the method to multi-type or density-dependent birth-death processes.

In Section 2, we present the model, notation, and an overview of the strategy to express the targeted distribution. In Section 3, we adapt the main results of Gupta et al. (2019) to compute the probability density of observations made after a given time, conditioned on the past population size. In Section 4, we provide a way to compute the joint density of the past population size and observations made before a given time. Combining the results of Sections 3 and 4 in Section 5, we compute the distribution of past population sizes conditional on the full outcome of the process, and perform sanity checks against previously published methods achieving similar tasks (Stadler, 2010, Vaughan et al., 2019, Gupta et al., 2019). We finally discuss applications and potential extensions of the model.

2. Model and notation

2.1. Parameters of the process

We consider a population of individuals, any of which can give birth to another individual at rate $λ$ or die at rate $μ$ . The process starts at time $t_{or}$ in the past with one individual, and evolves until reaching present time 0, i.e. time is oriented from the present towards the past. In the rest of the manuscript, something happening at time t will thus always refer to an event taking place t units before present.

We superimpose on this background population dynamics three different sampling schemes. First, individuals can be $ψ$ -sampled at rate $ψ$ throughout their lifetime. When $ψ$ -sampled, the individual will be included in the reconstructed phylogenetic tree. Second, individuals can be $ω$ -sampled at rate $ω$ throughout their lifetime. When $ω$ -sampled, the individual is not included in the reconstructed phylogenetic tree, but its sampling time is nevertheless recorded and called ‘an occurrence’. Last, the process finishes upon reaching the present time 0, and each extant individual at that time is $ρ$ -sampled with fixed probability $ρ$ , leading to their inclusion in the reconstructed phylogenetic tree. The sum of all per-capita rates will be denoted $γ = λ + μ + ψ + ω$ for short.

Following Vaughan et al. (2019), we also include in the model an effect of the $ψ$ - and $ω$ -sampling through time on the population dynamics. We consider that, upon sampling, an individual is either removed from the process with probability $r \in [0, 1]$ , or is unaffected by the sampling with probability $(1 - r)$ . The overall number of individuals, denoted $(I_{t})$ , thus follows a linear birth-death process with birth rate $λ$ and death rate $μ + (ψ + ω) r$ . Note that, because the $ρ$ -sampling step occurs here at the end of the process, it does not matter whether or not individuals are removed upon $ρ$ -sampling.
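As an illustration of the dynamics just described, the population size and the sampling events can be simulated with a standard Gillespie scheme. The following is a minimal sketch, not the authors' simulation algorithm: the function name `simulate_population` and its return format are assumptions made for this example, and it only tracks the population size and the sampling times, without building the reconstructed tree.

```python
import random


def simulate_population(t_or, lam, mu, psi, omega, rho, r, seed=0):
    """Simulate the process from age t_or (origin) down to age 0 (present).

    Returns (trajectory, psi_times, omega_times, n_rho_sampled), where
    trajectory lists (age, population size) after each event.
    """
    rng = random.Random(seed)
    gamma = lam + mu + psi + omega      # total per-capita event rate
    age, size = t_or, 1                 # one individual at the origin
    trajectory = [(age, size)]
    psi_times, omega_times = [], []
    while size > 0:
        age -= rng.expovariate(gamma * size)   # waiting time to next event
        if age <= 0.0:
            break                              # the present is reached
        x = rng.random() * gamma               # choose the event type
        if x < lam:
            size += 1                          # birth
        elif x < lam + mu:
            size -= 1                          # death
        else:
            is_psi = x < lam + mu + psi        # psi- or omega-sampling
            (psi_times if is_psi else omega_times).append(age)
            if rng.random() < r:
                size -= 1                      # removed upon sampling
        trajectory.append((age, size))
    trajectory.append((0.0, size))             # state at the present
    # Each extant individual is rho-sampled independently.
    n_rho_sampled = sum(rng.random() < rho for _ in range(size))
    return trajectory, psi_times, omega_times, n_rho_sampled
```

Ages decrease along the trajectory because time is measured backward from the present, as in the rest of the paper.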

2.2. Introducing useful probabilities

Some aspects of this process have been previously investigated thoroughly. We now use two key probabilities. First, we will call $u_{t}$ the probability that a process starting at time t with only one individual remains unsampled up to and including the present time (time 0). We recall that $u_{t}$ satisfies the ordinary differential equation (ODE) (Maddison et al., 2007)

$\begin{matrix} u_{0} = z \\ {\dot{u}}_{t} = λ u_{t}^{2} - γ u_{t} + μ . \end{matrix}$ (2.1)

For a given initial condition z, the solution is

$u (t, z) = \frac{x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t}}{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t}}$ (2.2)

where $Δ = γ^{2} - 4 λ μ > {(λ + μ)}^{2} - 4 λ μ = {(λ - μ)}^{2} ⩾ 0$ (the strict inequality holds as soon as $ψ + ω > 0$ ) and $x_{1}, x_{2}$ are the two roots of the polynomial $λ x^{2} - γ x + μ$ ,

$x_{1} = \frac{γ - \sqrt{Δ}}{2 λ} \quad \text{and} \quad x_{2} = \frac{γ + \sqrt{Δ}}{2 λ} .$

Second, we call $p_{t}$ the probability that a process starting at time t with one individual precisely leads to one sampled individual at present time 0. Writing the ODE governing the evolution of this quantity leads to

$\begin{matrix} p_{0} = 1 - z \\ {\dot{p}}_{t} = (2 λ u (t, z) - γ) p_{t} . \end{matrix}$ (2.3)

The solution is

$p (t, z) = (1 - z) \frac{Δ}{λ^{2}} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 2} e^{- \sqrt{Δ} t} .$ (2.4)

These formulas are well known, and correspond respectively to quantities called $p_{0} (t)$ and $p_{1} (t)$ in Stadler (2010). When $z = 1 - ρ$ , we will drop the dependence on z and use the shorter notation $u_{t}, p_{t}$ . We recall standard ways to derive these expressions in Appendix A.
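The closed forms (2.2) and (2.4) are straightforward to evaluate numerically. The sketch below is one possible transcription; the function names `roots`, `u` and `p` and the argument order are choices made for this example rather than anything prescribed by the paper.

```python
import math


def roots(lam, mu, psi, omega):
    """Return (x1, x2, sqrt(Delta)) for the polynomial lam*x^2 - gamma*x + mu."""
    gamma = lam + mu + psi + omega
    sqrt_delta = math.sqrt(gamma ** 2 - 4 * lam * mu)
    x1 = (gamma - sqrt_delta) / (2 * lam)
    x2 = (gamma + sqrt_delta) / (2 * lam)
    return x1, x2, sqrt_delta


def u(t, z, lam, mu, psi, omega):
    """Eq. (2.2): probability that one individual at time t leaves no sample."""
    x1, x2, sd = roots(lam, mu, psi, omega)
    e = math.exp(-sd * t)
    return (x1 * (x2 - z) - x2 * (x1 - z) * e) / ((x2 - z) - (x1 - z) * e)


def p(t, z, lam, mu, psi, omega):
    """Eq. (2.4): probability of exactly one sampled individual at present."""
    x1, x2, sd = roots(lam, mu, psi, omega)
    e = math.exp(-sd * t)
    return (1 - z) * (sd ** 2 / lam ** 2) * e / ((x2 - z) - (x1 - z) * e) ** 2
```

With $z = 1 - ρ$ , calling `u(t, 1 - rho, ...)` and `p(t, 1 - rho, ...)` gives the quantities written $u_{t}$ and $p_{t}$ in the text. A quick consistency check is that `u(0, z, ...)` returns `z` and `p(0, z, ...)` returns `1 - z`, matching the initial conditions in (2.1) and (2.3).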

2.3. Strategy of the paper

The process with sampling leads to the observation of two distinct objects $(T, O)$ illustrated in Fig. 1.

The reconstructed phylogenetic tree $T$ , on the one hand, represents the evolutionary relationships between all $ψ$ -sampled and $ρ$ -sampled individuals. We further consider that $ψ$ -sampled individuals are labeled either as ‘removed’ or ‘non-removed’. All $ψ$ -sampled removed individuals are necessarily leaves of $T$ , whereas $ψ$ -sampled non-removed ones can either stand as leaves (when the descent of the individual is not sampled) or as vertices along a branch (when the descent of the individual is further sampled), in which case they are referred to as sampled ancestors.

The record of occurrences $O$ , on the other hand, is an ordered list of all $ω$ -sampling times. We also consider that these sampling times are labeled as either ‘removed’ or ‘non-removed’.

In this paper, we are interested in computing the probability distribution of the number of individuals in the past, conditioned on the observed outcome $(T, O)$ of the process. If $k_{t}$ denotes the number of sampled lineages in $T$ at time t, we call our target distribution,

$\forall t ⩾ 0, \forall i \in N_{0} = \{0, 1, 2, \dots\}, \quad K_{t}^{(i)} ≔ P (I_{t} = k_{t} + i \mid T, O) .$ (2.5)

We will refer to epochs as the maximal time slices within which no sampling event in $O$ , nor branching event in $T$ , happened. These epochs are delimited by the union of sampling times in $O$ , branching times in the tree $T$ , and sampling times of leaves and sampled ancestors in $T$ . All pooled together, we call these ordered times ${(t_{h})}_{h = 0}^{n}$ , starting at present time $t_{0} = 0$ and ending at the origin time $t_{n} = t_{or}$ .

At any time $t ⩾ 0$ we also introduce:

- $T_{t}^{↑}$ ≔ the tree $T$ starting at the origin time $t_{or}$ and cut at time t,
- $T_{t}^{↓}$ ≔ the collection of trees (or forest) obtained by cutting $T$ at time t, and considering all subtrees descending from cut lineages,
- $O_{t}^{↑} ≔ O_{| (t, t_{or})}$ ,
- $O_{t}^{↓} ≔ O_{| (0, t)}$ .

The general strategy – and outline – of the paper is the following. We will traverse the tree and record of occurrences breadth-first, i.e. level-by-level through time. In a backward traversal we will compute the probability density of observations made between time t and 0 conditioned on the population size at time t. We call this probability density,

$\forall i \in N_{0}, \quad L_{t}^{(i)} ≔ P (T_{t}^{↓}, O_{t}^{↓} \mid I_{t} = k_{t} + i) .$ (2.6)

In a forward traversal we will then compute the joint probability density of the observations made prior to time t and the population size at time t. We call this density,

$\forall i \in N_{0}, \quad M_{t}^{(i)} ≔ P (T_{t}^{↑}, O_{t}^{↑}, I_{t} = k_{t} + i) .$ (2.7)

Provided we get expressions of ${(L_{t})}_{t = 0}^{t_{or}}$ and ${(M_{t})}_{t = 0}^{t_{or}}$ , our target distribution can then be expressed by combining both, noting that

$\begin{matrix} K_{t}^{(i)} & ≔ P (I_{t} = k_{t} + i \mid T, O) \\ & \propto P (I_{t} = k_{t} + i, T_{t}^{↑}, O_{t}^{↑}, T_{t}^{↓}, O_{t}^{↓}) \\ & = P (T_{t}^{↓}, O_{t}^{↓} \mid I_{t} = k_{t} + i, T_{t}^{↑}, O_{t}^{↑}) P (I_{t} = k_{t} + i, T_{t}^{↑}, O_{t}^{↑}) \\ & = L_{t}^{(i)} M_{t}^{(i)} \end{matrix}$ (2.8)

where the last line holds because, conditionally on $I_{t} = k_{t} + i$ , the future of the (Markov) process is independent of what happened before.
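In practice, Eq. (2.8) means that once $L_{t}$ and $M_{t}$ are available up to some truncation, the target distribution is obtained by an elementwise product followed by a normalization. A minimal sketch, assuming both vectors are stored as Python lists indexed by the number i of hidden individuals (the function name is hypothetical):

```python
def posterior_population_size(L, M):
    """Return K with K[i] proportional to L[i] * M[i], normalized to sum to 1.

    L[i] and M[i] are (truncated) values of L_t^{(i)} and M_t^{(i)} at the
    same time t, for i = 0, 1, ..., N hidden individuals.
    """
    if len(L) != len(M):
        raise ValueError("L and M must have the same truncation")
    weights = [l * m for l, m in zip(L, M)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("the product L * M has no positive mass")
    return [w / total for w in weights]
```

The entry `K[i]` then approximates the probability that the population size at time t is $k_{t} + i$ , up to the truncation error.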

In the process of getting the probability density of $T, O$ under the same model, Gupta et al. (2019) provided an analytical formula and an algorithm to compute the first ingredient $L_{t}$ in the case where all individuals are removed upon sampling (i.e. $r = 1$ ). We thus recall their main result, and adapt it to our slightly different framework, in the next section.

3. Calculation of $L_{t}$ – The density of observations below t conditioned on past population size

We start this section by presenting the ODEs satisfied by the probability density $L_{t}$ . This provides us with a numerical algorithm to compute $L_{t}$ , which we subsequently simplify with analytical results for specific sets of parameters.

3.1. Set of ODEs satisfied by $L_{t}$

We can derive the probability density $L_{t}$ by studying its evolution through time. First, observe that we can express $L_{0}$ at present time 0. Indeed, provided we know the exact number of individuals living at time 0, the probability to see the tips of the tree is directly driven by the $ρ$ -sampling,

$\forall i \in N_{0}, \quad L_{0}^{(i)} = ρ^{k_{0}} {(1 - ρ)}^{i} .$ (3.1)

We now derive the ODE driving the evolution of $L_{t}$ through time across any given epoch. We consider an infinitesimal time step $δ t$ and list the events which could have happened in the full process between $t + δ t$ and t, leading to our observations. Suppose the number of observed lineages in this epoch is k, and the total number of individuals alive is $k + i$ . We emphasize three cases, illustrated in Fig. 2:

1. nothing happened with probability $(1 - γ (k + i) δ t)$
2. a birth event happened
- (a) among the k sampled lineages in $T_{t}^{↓}$ , and it leads to an extinct or unsampled subtree to the left or to the right, with probability $2 λ k δ t$ .
- (b) among the i other individuals, with probability $λ i δ t$ .
3. a death event happened among the i particles, with probability $μ i δ t$ .

Fig. 2 — Four unobservable scenarios taken into account to derive the ODEs (3.2), (4.1).

These allow us to write, $\forall i \in N_{0}$ ,

$L_{t + δ t}^{(i)} = (1 - γ (k + i) δ t) L_{t}^{(i)} + λ (2 k + i) δ t \, L_{t}^{(i + 1)} + μ i δ t \, L_{t}^{(i - 1)} .$

Note that for $i = 0$ , $L_{t}^{(- 1)}$ is not defined, but the term cancels out thanks to the factor i.

Subtracting $L_{t}^{(i)}$ from both sides, dividing by $δ t$ and letting $δ t \to 0$ , we get the following set of ODEs driving the evolution of $L_{t}$ ,

$\begin{matrix} \forall i \in N_{0}, & L_{0}^{(i)} = ρ^{k_{0}} {(1 - ρ)}^{i} \\ & {\dot{L}}_{t}^{(i)} = - γ (k + i) L_{t}^{(i)} + λ (2 k + i) L_{t}^{(i + 1)} + μ i L_{t}^{(i - 1)} . \end{matrix}$ (3.2)

Last, we need to study how $L_{t}$ changes at punctual events. We call unsampled lineages the lineages that do not appear on the reconstructed phylogenetic tree, i.e. have not been $ρ$ - or $ψ$ -sampled. Note that these unsampled lineages might still be subject to $ω$ -sampling events.

There are 6 types of punctual events that we can come across at time t in the past, listed below and illustrated in Fig. 3. We denote $L_{t^{+}}$ the probability just before (i.e. up) the punctual event and $L_{t^{-}}$ the probability immediately after (i.e. down). One directly gets $L_{t^{+}}$ by decomposing it into what must occur below $t^{-}$ , multiplied by the rate of the specific event happening on the infinitesimal time window $(t^{-}, t^{+})$ . We can either find,

1. a leaf of $T_{t}^{↓}$ , labeled as removed. This is a $ψ$ -sampling with removal event for which the number of unsampled lineages remains constant, and the number of sampled lineages increases by one (going backward in time). It thus gives,
$L_{t^{+}}^{(i)} = ψ r L_{t^{-}}^{(i)} .$ (3.3)
2. a leaf of $T_{t}^{↓}$ , labeled as non-removed. This is a $ψ$ -sampling without removal event for which one of the unsampled lineages becomes a sampled one (going backward in time). It thus gives,
$L_{t^{+}}^{(i)} = ψ (1 - r) L_{t^{-}}^{(i + 1)} .$ (3.4)
3. a sampled ancestor along a branch of $T_{t}^{↓}$ , necessarily labeled as non-removed. This is a $ψ$ -sampling without removal event, not impacting the number of sampled or unsampled lineages. It thus gives,
$L_{t^{+}}^{(i)} = ψ (1 - r) L_{t^{-}}^{(i)} .$ (3.5)
4. an occurrence in $O_{t}^{↓}$ , labeled as removed. This is an $ω$ -sampling with removal event, for which the number of unsampled lineages increases by one (going backward in time). It thus gives,
$L_{t^{+}}^{(i)} = ω r i L_{t^{-}}^{(i - 1)} .$ (3.6)
Note that here also, for $i = 0$ , $L_{t^{-}}^{(- 1)}$ is not defined but the term cancels out thanks to the factor i.
5. an occurrence in $O_{t}^{↓}$ , labeled as non-removed. This is an $ω$ -sampling without removal event, not impacting the number of sampled or unsampled lineages. It thus gives,
$L_{t^{+}}^{(i)} = ω (k + i) (1 - r) L_{t^{-}}^{(i)} .$ (3.7)
6. a branching event between two branches of $T_{t}^{↓}$ . The number of sampled lineages decreases by one (going backward in time). It thus gives,
$L_{t^{+}}^{(i)} = λ L_{t^{-}}^{(i)} .$ (3.8)

Note that these updates can be adapted to the case when we do not observe the removal status of individuals. The update corresponding to a leaf of $T$ is the sum of updates (3.3), (3.4), the update corresponding to an occurrence event is the sum of updates (3.6), (3.7), while updates (3.5), (3.8) are unchanged.
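The six updates (3.3) to (3.8) translate directly into operations on a truncated vector. The sketch below assumes that $L_{t^{-}}$ is stored as a list of length $N + 1$ indexed by i, and that entries shifted in from outside the range $0, \dots, N$ are set to zero; the event labels and the function name are hypothetical choices for this example.

```python
def apply_event_L(L_minus, event, k, lam, psi, omega, r):
    """Return L at t^+ from L at t^- for one punctual event.

    L_minus[i] holds L_{t^-}^{(i)} for i = 0..N; k is the number of sampled
    lineages, only used for non-removed occurrences, Eq. (3.7).
    """
    n = len(L_minus)
    if event == "removed_leaf":            # Eq. (3.3)
        return [psi * r * x for x in L_minus]
    if event == "non_removed_leaf":        # Eq. (3.4), top entry truncated
        return [psi * (1 - r) * L_minus[i + 1] for i in range(n - 1)] + [0.0]
    if event == "sampled_ancestor":        # Eq. (3.5)
        return [psi * (1 - r) * x for x in L_minus]
    if event == "removed_occurrence":      # Eq. (3.6), factor i kills i = 0
        return [0.0] + [omega * r * i * L_minus[i - 1] for i in range(1, n)]
    if event == "non_removed_occurrence":  # Eq. (3.7)
        return [omega * (1 - r) * (k + i) * x for i, x in enumerate(L_minus)]
    if event == "branching":               # Eq. (3.8)
        return [lam * x for x in L_minus]
    raise ValueError("unknown event type: " + str(event))
```

When the removal status is not observed, the leaf update would be the sum of the `"removed_leaf"` and `"non_removed_leaf"` results, and similarly for occurrences, as stated in the text.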

This set of ODEs (3.2) together with update Eqs. (3.3), (3.4), (3.5), (3.6), (3.7), (3.8) can be numerically approximated. To do so, we fix a finite upper bound N on the number of hidden individuals and numerically integrate a truncated ODE system. We detail this in the following algorithm to compute an approximation of $L_{t}$ at any time t.

Algorithm 1: Computes a numerical approximation of $L_{t}$ for a specific set of times

Input: Observed tree and occurrence data $(T, O)$ , parameters $(t_{or}, λ, μ, ψ, ω, ρ, r)$ , set of time points ${(τ_{j})}_{j = 1}^{S}$ for which we want to compute the density $L_{τ_{j}}^{(i)}$ , and the truncation N setting the accuracy of the algorithm.

Output: A numerical approximation of $L_{t}$ at times ${(τ_{j})}_{j = 1}^{S}$ , ${({\tilde{L}}_{τ_{j}}^{(i)})}_{i \in \{0, 1, \dots, N\}, \, j \in \{1, 2, \dots, S\}}$ .

1: Pool all $(τ_{j})$ and all branching and sampling times of $(T, O)$ in an ordered list ${(t_{h})}_{h = 1}^{n}$

2: Set $j = 1$ and initialize B as a $S \times (N + 1)$ empty matrix

3: Set $\forall i \in \{0, 1, \dots, N\}, {\tilde{L}}_{0}^{(i)} = ρ^{k_{0}} {(1 - ρ)}^{i}$

4: for $h = 1, 2, \dots, n$

5: Numerically solve the ODE ${\dot{\tilde{L}}}_{t} = A {\tilde{L}}_{t}$ on $(t_{h - 1}, t_{h})$ , by computing ${\tilde{L}}_{t_{h}} = e^{(t_{h} - t_{h - 1}) A} {\tilde{L}}_{t_{h - 1}}$

6: where matrix A is a $(N + 1) \times (N + 1)$ tridiagonal matrix with entries given by,

$\begin{matrix} \forall i \in \{0, 1, \dots, N\}, & A^{(i, i)} = - γ (k + i) \\ \forall i \in \{0, 1, \dots, N - 1\}, & A^{(i, i + 1)} = λ (2 k + i) \\ \forall i \in \{1, 2, \dots, N\}, & A^{(i, i - 1)} = μ i \end{matrix}$

7: if $t_{h} = τ_{j}$ then

8: Record $\forall i, B^{(j, i)} = {\tilde{L}}_{t_{h}}^{(i)}$

9: Set $j = j + 1$

10: end if

11: if $t_{h} = t_{n}$ or $t_{h} = τ_{S}$ then

12: return B

13: else if $t_{h}$ is a removed leaf then

14: Set ${\tilde{L}}_{t_{h}^{+}} = ψ r {\tilde{L}}_{t_{h}^{-}}$

15: else if $t_{h}$ is a non-removed leaf then

16: Set $\forall i < N, {\tilde{L}}_{t_{h}^{+}}^{(i)} = ψ (1 - r) {\tilde{L}}_{t_{h}^{-}}^{(i + 1)}$ and ${\tilde{L}}_{t_{h}^{+}}^{(N)} = 0$

17: else if $t_{h}$ is a sampled ancestor then

18: Set ${\tilde{L}}_{t_{h}^{+}} = ψ (1 - r) {\tilde{L}}_{t_{h}^{-}}$

19: else if $t_{h}$ is a removed occurrence then

20: Set $\forall i > 0, {\tilde{L}}_{t_{h}^{+}}^{(i)} = ω r i {\tilde{L}}_{t_{h}^{-}}^{(i - 1)}$ and ${\tilde{L}}_{t_{h}^{+}}^{(0)} = 0$

21: else if $t_{h}$ is a non-removed occurrence then

22: Set ${\tilde{L}}_{t_{h}^{+}}^{(i)} = ω (1 - r) (k + i) {\tilde{L}}_{t_{h}^{-}}^{(i)}$

23: else ( $t_{h}$ is a branching event)

24: Set ${\tilde{L}}_{t_{h}^{+}} = λ {\tilde{L}}_{t_{h}^{-}}$

25: end if

26: end for

We also define a slight variation of this algorithm, which we will refer to as Algorithm 1’, where no set of time points $(τ_{j})$ is required, and the values of ${\tilde{L}}_{t}$ are not recorded through time (i.e. matrix B disappears). Instead, when reaching $t_{n} = t_{or}$ we simply return ${\tilde{L}}_{t_{or}}^{(0)}$ , which by definition is an estimate of the probability density of $(T, O)$ . Note that this strategy is identical to what has been used to compute the probability density of a reconstructed phylogenetic tree under a logistic birth-death process (Leventhal et al., 2013).
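The within-epoch step of the algorithm can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it replaces the matrix exponential by a classical fourth-order Runge-Kutta integrator (to stay within the Python standard library), exploits the tridiagonal structure of A directly, and uses hypothetical function names. For an epoch starting at the present with no sampling event below it, the closed forms (2.2) and (2.4) give $L_{t}^{(i)} = u_{t}^{i} p_{t}^{k}$ , which provides a reference to compare against.

```python
import math


def u_and_p(t, lam, mu, gamma, z):
    """Closed forms (2.2) and (2.4) for u(t, z) and p(t, z)."""
    sd = math.sqrt(gamma ** 2 - 4 * lam * mu)
    x1, x2 = (gamma - sd) / (2 * lam), (gamma + sd) / (2 * lam)
    e = math.exp(-sd * t)
    denom = (x2 - z) - (x1 - z) * e
    u = (x1 * (x2 - z) - x2 * (x1 - z) * e) / denom
    p = (1 - z) * sd ** 2 / lam ** 2 * e / denom ** 2
    return u, p


def rhs_L(L, k, lam, mu, gamma):
    """Right-hand side A @ L of the truncated system (3.2)."""
    n = len(L)
    out = []
    for i in range(n):
        val = -gamma * (k + i) * L[i]
        if i + 1 < n:                      # entry A^(i, i+1), dropped at i = N
            val += lam * (2 * k + i) * L[i + 1]
        if i > 0:                          # entry A^(i, i-1)
            val += mu * i * L[i - 1]
        out.append(val)
    return out


def integrate_epoch_L(L, k, duration, lam, mu, gamma, steps=2000):
    """Propagate L over an epoch of the given duration with RK4 steps."""
    h = duration / steps
    for _ in range(steps):
        k1 = rhs_L(L, k, lam, mu, gamma)
        k2 = rhs_L([x + 0.5 * h * d for x, d in zip(L, k1)], k, lam, mu, gamma)
        k3 = rhs_L([x + 0.5 * h * d for x, d in zip(L, k2)], k, lam, mu, gamma)
        k4 = rhs_L([x + h * d for x, d in zip(L, k3)], k, lam, mu, gamma)
        L = [x + h / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(L, k1, k2, k3, k4)]
    return L


# Example with arbitrary parameter values: k = 2 sampled lineages, N = 50.
lam, mu, psi, omega, rho = 1.5, 0.5, 0.3, 0.4, 0.6
gamma = lam + mu + psi + omega
k, N, t = 2, 50, 1.0
L0 = [rho ** k * (1 - rho) ** i for i in range(N + 1)]       # Eq. (3.1)
L_num = integrate_epoch_L(L0, k, t, lam, mu, gamma)
u_t, p_t = u_and_p(t, lam, mu, gamma, 1 - rho)
L_ref = [u_t ** i * p_t ** k for i in range(N + 1)]
```

For small i, `L_num[i]` and `L_ref[i]` should agree closely, the residual discrepancy coming from the truncation at N and from the time discretization. The step count must be large enough for the explicit integrator to remain stable, since the largest rates in A grow linearly with N.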

These two algorithms will prove useful to deal with the general case. Furthermore, we may obtain analytical expressions for $L_{t}$ when $ω = 0$ as well as when $r = 1$ (Gupta et al., 2019). We reveal these in the next two subsections.

3.2. Special case $ω = 0$

Suppose we can express $L_{t}^{(i)}$ as the product $L_{t}^{(i)} = u_{t}^{i} W_{t}$ where $W_{t}$ is a function of time only, and $u_{t}$ is defined as in Eq. (2.2). We first get, from the initialization in Eq. (3.2), that $W_{0} = ρ^{k_{0}}$ . Moreover, substituting $u_{t}^{i} W_{t}$ in the ODE and using Eq. (2.1) leads to

$\begin{matrix} {\dot{L}}_{t}^{(i)} & = i u_{t}^{i - 1} {\dot{u}}_{t} W_{t} + u_{t}^{i} {\dot{W}}_{t} \\ & = (λ i u_{t}^{i + 1} - γ i u_{t}^{i} + μ i u_{t}^{i - 1}) W_{t} + u_{t}^{i} {\dot{W}}_{t} . \end{matrix}$

This leads to the following ODE for $W_{t}$ , on any epoch $(t_{h}, t_{h + 1})$ where the number of sampled lineages remains fixed and equal to k,

$\begin{matrix} u_{t}^{i} {\dot{W}}_{t} & = (- γ (i + k) u_{t}^{i} + λ (2 k + i) u_{t}^{i + 1} + μ i u_{t}^{i - 1} - λ i u_{t}^{i + 1} + γ i u_{t}^{i} - μ i u_{t}^{i - 1}) W_{t} \\ \Rightarrow {\dot{W}}_{t} & = (2 λ u_{t} - γ) k W_{t} . \end{matrix}$

This is very close to the ODE (2.3) governing the evolution of $p_{t}$ , and it leads to (see derivation in Appendix A),

$\forall t \in (t_{h}, t_{h + 1}), \quad W_{t} = W_{t_{h}} {(\frac{p_{t}}{p_{t_{h}}})}^{k} .$ (3.9)

Last, because $ω = 0$ , updates (3.3), (3.4), (3.5), (3.6), (3.7), (3.8) simplify to only the following $ψ$ - and $λ$ -events,

if t is a removed leaf, $W_{t^{+}} = ψ r W_{t^{-}}$ (3.10)

if t is a non-removed leaf, $W_{t^{+}} = ψ (1 - r) u_{t} W_{t^{-}}$ (3.11)

if t is a sampled ancestor, $W_{t^{+}} = ψ (1 - r) W_{t^{-}}$ (3.12)

if t is a branching time, $W_{t^{+}} = λ W_{t^{-}} .$ (3.13)

Combining these updates with Eq. (3.9) leads to the following proposition.

Proposition 3.1

When $ω = 0$ , at any time t across epoch $(t_{h}, t_{h + 1})$ , considering that we observed so far – i.e. on $(0, t_{h + 1})$ – v sampled ancestors, w removed leaves at times $t_{j} \in W$ , x branching events at times $t_{j} \in X$ , y non-removed leaves at times $t_{j} \in Y$ , we get,

$\begin{matrix} L_{t}^{(i)} & = u_{t}^{i} W_{t} \\ \text{where } W_{t} & = λ^{x} ψ^{v + w + y} {(1 - r)}^{v + y} r^{w} p_{t}^{k_{t}} \prod_{t_{j} \in X} p_{t_{j}} \prod_{t_{j} \in Y} u_{t_{j}} p_{t_{j}}^{- 1} \prod_{t_{j} \in W} p_{t_{j}}^{- 1} . \end{matrix}$

Proof

We prove this proposition by induction across the epochs in Appendix E, using as the main arguments the equation updates (3.10), (3.11), (3.12), (3.13), combined with Eq. (3.9).

Note that this proposition is very similar to what is presented in Section 3 by Gupta et al. (2019). We nevertheless need to highlight two differences.

The first one is that we allow here for removal or not of the individual upon sampling, with a given probability r, whereas Gupta et al. (2019) considered that all individuals were removed upon sampling ( $r = 1$ ), and Stadler (2010) considered that individuals were not removed upon sampling ( $r = 0$ ).

The second difference concerns the underlying framework under which we derive our results. In Gupta et al. (2019), individuals were distinguishable (say, each one is assigned a number and they can be ordered), whereas in the present paper they are not. When individuals are ordered, the probability density $L_{t}^{(i)}$ is changed by a factor $\frac{(k + i)!}{i!}$ , which is the number of ways we can arrange $k + i$ elements in a list of size k, i.e. the number of ordered configurations of hidden individuals.

Note that, when reaching the origin of the tree, the formula in Proposition 3.1 reduces to a very similar formula for the probability density of $T$ because $i = 0$ and $k = 1$ . We summarize this as the following corollary.

Corollary 3.1.1

When $ω = 0$ , the probability density of a reconstructed tree $T$ with v sampled ancestors, w removed leaves at times $t_{j} \in W$ , y non-removed leaves at times $t_{j} \in Y$ , and branching events at times $t_{j} \in X$ , is

$P (T) = λ^{w + y + k_{0} - 1} ψ^{v + w + y} {(1 - r)}^{v + y} r^{w} \prod_{t_{j} \in X \cup \{t_{or}\}} p_{t_{j}} \prod_{t_{j} \in Y} u_{t_{j}} p_{t_{j}}^{- 1} \prod_{t_{j} \in W} p_{t_{j}}^{- 1}$ (3.14)

Proof

It directly follows from Proposition 3.1, by noting that $P (T) = L_{t_{or}}^{(0)}$ . Note also that a rooted binary tree with $w + y + k_{0}$ leaves shows necessarily $x = w + y + k_{0} - 1$ branching times.

Note that this formula is a straightforward generalization of formulas provided in Stadler (2010) (where $r = 0$ ) or Stadler et al. (2011) (where $ρ = 0$ ).
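Formula (3.14) can be evaluated directly from a summary of the tree. The sketch below assumes the tree is described only by its origin time, its number $k_{0}$ of $ρ$ -sampled tips, its lists of branching times and leaf times, and its number of sampled ancestors; this input format and the function names are assumptions made for the example, and $ω = 0$ throughout.

```python
import math


def u_and_p(t, lam, mu, gamma, z):
    """Closed forms (2.2) and (2.4) for u(t, z) and p(t, z)."""
    sd = math.sqrt(gamma ** 2 - 4 * lam * mu)
    x1, x2 = (gamma - sd) / (2 * lam), (gamma + sd) / (2 * lam)
    e = math.exp(-sd * t)
    denom = (x2 - z) - (x1 - z) * e
    u = (x1 * (x2 - z) - x2 * (x1 - z) * e) / denom
    p = (1 - z) * sd ** 2 / lam ** 2 * e / denom ** 2
    return u, p


def tree_density(t_or, k0, branch_times, removed_leaf_times,
                 non_removed_leaf_times, n_sampled_ancestors,
                 lam, mu, psi, rho, r):
    """Probability density (3.14) of a reconstructed tree when omega = 0."""
    gamma = lam + mu + psi                 # omega = 0 in this special case
    z = 1 - rho
    x = len(branch_times)
    w = len(removed_leaf_times)
    y = len(non_removed_leaf_times)
    v = n_sampled_ancestors
    if x != w + y + k0 - 1:
        raise ValueError("a binary tree with w + y + k0 leaves needs "
                         "w + y + k0 - 1 branching times")
    density = lam ** x * psi ** (v + w + y) * (1 - r) ** (v + y) * r ** w
    for t in list(branch_times) + [t_or]:  # product over X and the origin
        density *= u_and_p(t, lam, mu, gamma, z)[1]
    for t in non_removed_leaf_times:       # product over Y of u / p
        u_t, p_t = u_and_p(t, lam, mu, gamma, z)
        density *= u_t / p_t
    for t in removed_leaf_times:           # product over W of 1 / p
        density /= u_and_p(t, lam, mu, gamma, z)[1]
    return density
```

For a tree reduced to a single lineage with one $ρ$ -sampled tip and no other event, the function returns $p_{t_{or}}$ , as expected from the definition of $p_{t}$ . For large trees, working with the logarithm of each factor would avoid numerical underflow.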

3.3. Special case $r = 1$

When $r = 1$ , only three kinds of punctual events, corresponding to updates (3.3), (3.6), (3.8) need to be taken into account. Because the number of unsampled individuals i goes into formula (3.6), the simple expression $L_{t}^{(i)} = u_{t}^{i} W_{t}$ cannot be considered anymore, and one needs to find another expression. This has already been done in Gupta et al. (2019) and we only need to adapt here their result to our slightly different framework.

Proposition 3.2

When $r = 1$ , we can compute the $L_{t}^{(i)}$ values at any time t as

$L_{t}^{(i)} = \sum_{ℓ = 0}^{q} \frac{i!}{(i - ℓ)!} u_{t}^{i - ℓ} W_{t}^{(ℓ)} .$

where $W_{t}$ is a q dimensional time-varying vector which can be computed following Algorithm 2 in Gupta et al. (2019).

Proof

The proof relies on the definition of a distinguishable version of the probability $L_{t}^{(i)}$ as

${\overline{L}}_{t}^{(i)} = \frac{(k + i)!}{i!} L_{t}^{(i)}$ (3.15)

which allows us to use results previously derived in Gupta et al. (2019). Details are provided in Appendix B.
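The conversion factor in Eq. (3.15) is the number of k-permutations of $k + i$ elements, which Python exposes as `math.perm` (available from Python 3.8). A short sketch, with hypothetical function names, of how a truncated vector $L_{t}$ could be mapped to its distinguishable counterpart:

```python
import math


def ordering_factor(k, i):
    """Return (k + i)! / i!, the factor in Eq. (3.15)."""
    return math.perm(k + i, k)


def to_distinguishable(L, k):
    """Map [L^(0), ..., L^(N)] to the distinguishable version of Eq. (3.15)."""
    return [ordering_factor(k, i) * x for i, x in enumerate(L)]
```

For instance, with $k = 2$ sampled lineages and $i = 3$ hidden individuals, the factor is $5! / 3! = 20$ .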

Note that when there is no $ω$ -sampling, then $q = 0$ for all times and $W_{t}^{(0)}$ is the same as $W_{t}$ defined in the previous section.

This ends our section on the computation of $L_{t}$ . It thus remains to (i) present a way to compute $M_{t}$ and (ii) combine $L_{t}$ and $M_{t}$ to get the target distribution $K_{t}$ at any time t. We do this in turn in the next two sections.

4. Calculation of $M_{t}$ – the joint density of observations above t and past population size

Recall that we are now interested in computing the joint density of observations above time t and past population size at time t, i.e. $\forall i \in N_{0}, M_{t}^{(i)} ≔ P (T_{t}^{↑}, O_{t}^{↑}, I_{t} = k_{t} + i)$ . We start by presenting the ODEs satisfied by $M_{t}$ , before turning to its resolution for specific parameter sets. The approach is very similar to the one presented in the previous section to compute $L_{t}$ , with the slight difference that we will need to traverse the tree forward in time instead of backward in time.

4.1. Set of ODEs satisfied by $M_{t}$

At the time of origin of the process $t_{or}$ , we only observe one starting lineage in $T_{t_{or}}^{↑}$ . This provides us with the following initialization condition on M,

$M_{t_{or}}^{(i)} = P (I_{t_{or}} = 1 + i) = 1_{i = 0} .$

We then derive the ODEs driving the evolution of $M_{t}$ across an epoch on which the number of observed lineages is fixed and equal to k. Suppose we know $M_{t}$ , and we observe no punctual event on the infinitesimal time interval $(t - δ t, t)$ . Unobservable events have already been illustrated in Fig. 2. It allows us to get

$M_{t - δ t}^{(i)} = (1 - γ (i + k) δ t) M_{t}^{(i)} + λ (2 k + i - 1) δ t \, 1_{i > 0} M_{t}^{(i - 1)} + μ (i + 1) δ t \, M_{t}^{(i + 1)} .$

Subtracting $M_{t}^{(i)}$ from both sides, multiplying by $- 1$ , dividing by $δ t$ and letting $δ t \to 0$ , we get the following set of ODEs driving the evolution of $M_{t}$ ,

$\begin{matrix} \forall i \in N_{0}, & M_{t_{or}}^{(i)} = 1_{i = 0} \\ & {\dot{M}}_{t}^{(i)} = γ (i + k) M_{t}^{(i)} - λ (2 k + i - 1) 1_{i > 0} M_{t}^{(i - 1)} - μ (i + 1) M_{t}^{(i + 1)} . \end{matrix}$ (4.1)

Last, we need to take into account the evolution of $M_{t}$ at punctual events. Again, there are 6 types of punctual events that we can come across at time t in the past, listed below and illustrated in Fig. 3. We denote $M_{t^{-}}$ the probability just after (i.e. below) the punctual event and $M_{t^{+}}$ the probability immediately before (i.e. up). Because we are here deriving $M_{t}$ forward in time, one needs to carefully note differences with results derived in Section 3 relating to the number of lineages before and after the event. We can indeed find the same punctual events, namely,

1.
a leaf of $T_{t}^{↓}$ , labeled as removed. This is a $ψ$ -sampling with removal event for which the number of sampled lineages decreases by one and the number of unsampled lineages remains unchanged. This gives,
$M_{t^{-}}^{(i)} = ψ r M_{t^{+}}^{(i)} .$ (4.2)
2.
a leaf of $T_{t}^{↓}$ , labeled as non-removed. This is a $ψ$ -sampling without removal event for which one sampled lineage becomes unsampled. This gives,
$M_{t^{-}}^{(i)} = ψ (1 - r) 1_{i > 0} M_{t^{+}}^{(i - 1)} .$ (4.3)
3.
a sampled ancestor along a branch of $T_{t}^{↓}$ , necessarily labeled as non-removed. This is a $ψ$ -sampling without removal event which does not affect the number of lineages. It gives,
$M_{t^{-}}^{(i)} = ψ (1 - r) M_{t^{+}}^{(i)} .$ (4.4)
4.
an occurrence in $O_{t}^{↓}$ , labeled as removed. This is a $ω$ -sampling with removal event, for which the number of unsampled lineages decreases by one. This gives,
$M_{t^{-}}^{(i)} = ω r (i + 1) M_{t^{+}}^{(i + 1)} .$ (4.5)
5.
an occurrence in $O_{t}^{↓}$ , labeled as non-removed. This is a $ω$ -sampling without removal event which does not affect the number of lineages. It gives,
$M_{t^{-}}^{(i)} = (k + i) ω (1 - r) M_{t^{+}}^{(i)} .$ (4.6)
6.
a branching event between two branches of $T_{t}^{↓}$ . This is a $λ$ -event increasing the number of sampled lineages by one. This gives,
$M_{t^{-}}^{(i)} = λ M_{t^{+}}^{(i)} .$ (4.7)

Finally, upon reaching present time 0, one needs to take into account the $ρ$ -sampling, leading to the following update,

M_{0^{-}}^{(i)} = {(1 - ρ)}^{i} ρ^{k_{0}} M_{0^{+}}^{(i)} .

(4.8)

Note, as for $L_{t}$ , that these updates can be adapted to the case when we do not observe the removal status of individuals. The update corresponding to a leaf of $T$ is the sum of updates (4.2), (4.3), the update corresponding to an occurrence event is the sum of updates (4.5), (4.6), while updates (4.4), (4.7) are unchanged.
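The punctual updates (4.2) to (4.7) act on a truncated vector $(M^{(0)}, \dots, M^{(N)})$ as simple shifts and rescalings. The sketch below is one possible way of writing them; the function names and the string labels for event types are hypothetical, and the bookkeeping of k follows the forward traversal (one observed lineage fewer after a leaf, one more after a branching).

```python
def apply_event(M, kind, k, lam, psi, omega, r):
    """Apply one punctual update to M; returns (M_after, k_after)."""
    N = len(M) - 1
    if kind == "removed_leaf":            # Eq. (4.2)
        return [psi * r * m for m in M], k - 1
    if kind == "non_removed_leaf":        # Eq. (4.3): shift i -> i + 1
        return [0.0] + [psi * (1 - r) * M[i - 1] for i in range(1, N + 1)], k - 1
    if kind == "sampled_ancestor":        # Eq. (4.4)
        return [psi * (1 - r) * m for m in M], k
    if kind == "removed_occurrence":      # Eq. (4.5): shift i + 1 -> i
        return [omega * r * (i + 1) * M[i + 1] for i in range(N)] + [0.0], k
    if kind == "non_removed_occurrence":  # Eq. (4.6)
        return [(k + i) * omega * (1 - r) * M[i] for i in range(N + 1)], k
    if kind == "branching":               # Eq. (4.7)
        return [lam * m for m in M], k + 1
    raise ValueError("unknown event type: " + kind)


def leaf_unknown_status(M, k, psi, r):
    """Leaf with unobserved removal status: sum of updates (4.2) and (4.3)."""
    N = len(M) - 1
    out = [psi * r * M[0]]
    out += [psi * r * M[i] + psi * (1 - r) * M[i - 1] for i in range(1, N + 1)]
    return out, k - 1
```

Truncation shows up in two places: the removed-occurrence update has no information about $M^{(N + 1)}$ and sets the last entry to zero, and the non-removed-leaf update drops the mass that would move to index $N + 1$ .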

As already exhibited for $L_{t}$ , we can build a similar algorithm to compute $M_{t}$ in the general case, relying on a numerical ODE solver for approximating Eq. (4.1). As for Algorithm 1’ previously introduced to compute the probability density of $(T, O)$ , a slight variation of this algorithm would allow one to compute an estimate of the probability density of $(T, O)$ by summing the $M_{0}^{(i)}$ ’s over all i. Note that this strategy is identical to what has been used to compute the probability density of a reconstructed phylogenetic tree under a logistic birth-death process (Etienne et al., 2012, Laudanno et al., 2020).

While this approach is in theory a good approximation, it requires arbitrarily fixing a truncation parameter N, and exponentiating matrices of dimension $N \times N$ , leading to potential speed or accuracy issues. In the remainder of this section, we derive analytical results to avoid resorting to a numerical ODE solver in specific cases.

4.2. The corresponding generating function

We introduce now the generating function corresponding to the density $M_{t}$ , which will prove useful to get analytical results,

\hat{M} (t, z) ≔ \sum_{i = 0}^{\infty} z^{i} M_{t}^{(i)} .

The initial condition on M translates into, $\forall z, \hat{M} (t_{or}, z) = 1$ . The ODE (4.1) furthermore translates into the following partial differential equation (PDE),

\begin{matrix} \partial_{t} \hat{M} = & \sum_{i = 0}^{\infty} z^{i} (γ (i + k) M_{t}^{(i)} - λ (2 k + i - 1) 1_{i > 0} M_{t}^{(i - 1)} - μ (i + 1) M_{t}^{(i + 1)}) \\ = & γ k \sum_{i = 0}^{\infty} z^{i} M_{t}^{(i)} + γ \sum_{i = 1}^{\infty} i z^{i} M_{t}^{(i)} - λ \sum_{i = 0}^{\infty} z^{i + 1} (2 k + i) M_{t}^{(i)} - μ \sum_{i = 1}^{\infty} i z^{i - 1} M_{t}^{(i)} \\ = & γ k \hat{M} + γ z \partial_{z} \hat{M} - 2 k λ z \hat{M} - λ z^{2} \partial_{z} \hat{M} - μ \partial_{z} \hat{M} \\ = & - k (2 λ z - γ) \hat{M} - (λ z^{2} - γ z + μ) \partial_{z} \hat{M} . \end{matrix}

Our target generating function $\hat{M}$ is thus the solution of the following PDE problem across a given epoch $(t_{h - 1}, t_{h})$ , on which the number of observed lineages remains constant and equal to k,

\begin{matrix} \hat{M} (t_{h}, z) = F (z) \\ \partial_{t} \hat{M} + (λ z^{2} - γ z + μ) \partial_{z} \hat{M} + k (2 λ z - γ) \hat{M} = 0 . \end{matrix}

(4.9)

Solving this PDE problem allows us to obtain an analytical expression of $\hat{M}$ for any time across an epoch, provided we know the expression of $\hat{M} (t_{h}, z)$ at the end of the epoch.

Proposition 4.1

The solution to the PDE problem (4.9) is given by

$\hat{M} (t, z) = F (u (t_{h} - t, z)) R {(t_{h} - t, z)}^{k}$

where we introduce $R (t, z) = p (t, z) / (1 - z)$ to ease the notation.

Proof

We used the method of characteristics to solve this first order linear PDE, see derivations in Appendix C.

Between epochs, one must also update $\hat{M}$ according to punctual events taking place. Previously presented updates of M (Eqs. (4.2), (4.3), (4.4), (4.5), (4.6), (4.7)) translate into the following updates for $\hat{M}$ ,

if t is a removed leaf,

\hat{M} (t^{-}, z) = \sum_{i = 0}^{\infty} z^{i} (ψ r M_{t^{+}}^{(i)}) = ψ r \hat{M} (t^{+}, z)

(4.10)

if t is a non-removed leaf,

\hat{M} (t^{-}, z) = \sum_{i = 0}^{\infty} z^{i} (ψ (1 - r) 1_{i > 0} M_{t^{+}}^{(i - 1)}) = ψ (1 - r) z \hat{M} (t^{+}, z)

(4.11)

if t is a sampled ancestor,

\hat{M} (t^{-}, z) = \sum_{i = 0}^{\infty} z^{i} (ψ (1 - r) M_{t^{+}}^{(i)}) = ψ (1 - r) \hat{M} (t^{+}, z)

(4.12)

if t is a removed occurrence,

\hat{M} (t^{-}, z) = \sum_{i = 0}^{\infty} z^{i} (ω r (i + 1) M_{t^{+}}^{(i + 1)}) = ω r \partial_{z} \hat{M} (t^{+}, z)

(4.13)

if t is a non-removed occurrence,

\hat{M} (t^{-}, z) = \sum_{i = 0}^{\infty} z^{i} (ω (1 - r) (k + i) M_{t^{+}}^{(i)}) = ω (1 - r) (k \hat{M} (t^{+}, z) + z \partial_{z} \hat{M} (t^{+}, z))

(4.14)

if t is a branching event,

\hat{M} (t^{-}, z) = \sum_{i = 0}^{\infty} z^{i} (λ M_{t^{+}}^{(i)}) = λ \hat{M} (t^{+}, z) .

(4.15)

If we are interested in the distribution at some point, we can thus start the formula at $t_{or}$ with $F (z) = 1$ , and then iteratively alternate between the updates at punctual events and the use of Proposition 4.1 over each epoch. When reaching present time 0, the step of $ρ$ -sampling expressed in Eq. (4.8) moreover translates into,

\hat{M} (0^{-}, z) = \sum_{i = 0}^{\infty} z^{i} {(1 - ρ)}^{i} ρ^{k_{0}} M_{0^{+}}^{(i)} = ρ^{k_{0}} \hat{M} (0^{+}, (1 - ρ) z) .

(4.16)
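When $\hat{M}$ is represented by a finite list of coefficients, the updates (4.13), (4.14) and (4.16) reduce to elementary polynomial operations. The following sketch, with hypothetical helper names, makes the correspondence between operations on $\hat{M}$ and operations on the coefficients explicit.

```python
def evaluate(coeffs, z):
    """Evaluate sum_i coeffs[i] * z**i with Horner's scheme."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * z + c
    return total


def derivative(coeffs):
    """Coefficients of the z-derivative, padded to the original length."""
    return [(i + 1) * coeffs[i + 1] for i in range(len(coeffs) - 1)] + [0.0]


def removed_occurrence(coeffs, omega, r):
    """Eq. (4.13): multiply the z-derivative by omega * r."""
    return [omega * r * c for c in derivative(coeffs)]


def non_removed_occurrence(coeffs, k, omega, r):
    """Eq. (4.14): k * M_hat + z * dM_hat/dz has coefficients (k + i) * M^(i)."""
    return [omega * (1 - r) * (k + i) * c for i, c in enumerate(coeffs)]


def rho_sampling(coeffs, rho, k0):
    """Eq. (4.16): rescale z by (1 - rho) and multiply by rho**k0."""
    return [rho ** k0 * (1 - rho) ** i * c for i, c in enumerate(coeffs)]
```

Evaluating the output of `rho_sampling` at $z = 1$ gives the truncated sum $\sum_{i ⩽ N} ρ^{k_{0}} {(1 - ρ)}^{i} M_{0^{+}}^{(i)}$ , i.e. a truncated estimate of the density of the data.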

While this procedure in theory allows us to get the analytical formula of $\hat{M}$ at any time, updates (4.13), (4.14) require differentiating the generating function, greatly complicating the expression of the function after a few occurrences. When $ω = 0$ , these two updates disappear and a nice recursion leads to a closed-form formula that we will detail in Proposition 4.3.

We implemented this procedure in the SageMath programming language, which is able to deal with symbolic calculus. We were however not able to make it find concise expressions, and computing these successive derivatives was too time-consuming to be applicable to standard datasets in the field.

Instead, when $ω \neq 0$ , we suggest another strategy for computing the $M_{t}^{(i)}$ ’s, namely approximating $\hat{M}$ across punctual events by a polynomial of order N , $\sum_{l = 0}^{N} {\tilde{M}}_{t}^{(l)} z^{l}$ , while still relying on Proposition 4.1 to drive the evolution of the probability generating function between events. This is a more efficient alternative to numerically solving the ODE system. We only need to derive the expression of the generating function at punctual events as given in the following Proposition 4.2.

Proposition 4.2

The derivatives at $z = 0$ of a generating function which can be expressed as

$\hat{M} (t_{h} - t, z) ≔ R {(t_{h} - t, z)}^{k} \sum_{l = 0}^{N} {\tilde{M}}_{t_{h}}^{(l)} u {(t_{h} - t, z)}^{l}$

can be numerically computed using the formula

$\begin{matrix} {(\partial_{z}^{i} \hat{M} (t_{h} - t, z))}_{z = 0} = & {(\frac{Δ}{λ^{2}} e^{- \sqrt{Δ} (t_{h} - t)})}^{k} \sum_{α = 0}^{i} \sum_{l = α}^{N} {\tilde{M}}_{t_{h}}^{(l)} (\begin{matrix} i \\ α \end{matrix}) \frac{l!}{(l - α)!} (\prod_{m = 0}^{i - α - 1} (2 k + l + m)) {(x_{1} x_{2})}^{l - α} \\ {(- x_{1} + x_{2} e^{- \sqrt{Δ} (t_{h} - t)})}^{α} {(1 - e^{- \sqrt{Δ} (t_{h} - t)})}^{l + i - 2 α} {(x_{2} - x_{1} e^{- \sqrt{Δ} (t_{h} - t)})}^{- (2 k + l + i - α)} . \end{matrix}$

Proof

The derivation is detailed in Appendix D.1.

This derivation is at the heart of Algorithm 2, allowing us to follow the evolution of the ${\tilde{M}}_{t}^{(i)}$ ’s through each epoch, as well as at times when we want to record them.
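The propagation step can be sketched as follows: dividing the formula of Proposition 4.2 by $i!$ gives the coefficients of the polynomial approximation after an epoch of length `dt`. This is only a minimal sketch. It assumes the conventions $γ = λ + μ + ψ + ω$ , $Δ = γ^{2} - 4 λ μ$ , and $x_{1} < x_{2}$ the two roots of $λ x^{2} - γ x + μ$ , which is how we read the notation of the earlier sections; the function name `propagate` is hypothetical.

```python
import math


def propagate(M_tilde, k, dt, lam, mu, psi, omega):
    """Coefficients of the truncated generating function after an epoch of length dt.

    M_tilde[l] are the coefficients at the older end of the epoch and k is the
    number of observed lineages on the epoch.
    """
    gamma = lam + mu + psi + omega          # assumed definition of gamma
    delta = gamma ** 2 - 4 * lam * mu       # assumed definition of Delta
    sq = math.sqrt(delta)
    x1 = (gamma - sq) / (2 * lam)           # assumed: smaller root
    x2 = (gamma + sq) / (2 * lam)           # assumed: larger root
    e = math.exp(-sq * dt)
    N = len(M_tilde) - 1
    prefactor = (delta / lam ** 2 * e) ** k
    out = []
    for i in range(N + 1):
        total = 0.0
        for alpha in range(i + 1):
            for l in range(alpha, N + 1):
                if M_tilde[l] == 0.0:
                    continue
                rising = 1.0                # prod_{m=0}^{i-alpha-1} (2k + l + m)
                for m in range(i - alpha):
                    rising *= 2 * k + l + m
                total += (M_tilde[l] * math.comb(l, alpha)
                          / math.factorial(i - alpha) * rising
                          * (-x1 + x2 * e) ** alpha
                          * (x1 * x2) ** (l - alpha)
                          * (1 - e) ** (l + i - 2 * alpha)
                          * (x2 - x1 * e) ** (-(2 * k + l + i - alpha)))
        out.append(prefactor * total)
    return out
```

Two quick sanity checks are available under these assumptions: an epoch of length zero returns the input coefficients, and with $μ = 0$ the first component reduces to $e^{- γ k \, dt}$ when starting from $(1, 0, \dots, 0)$ . The triple loop costs on the order of $N^{3}$ operations per epoch, so a production implementation would likely vectorise it.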

We will refer to Algorithm 2’ as the slight variation of this algorithm aimed at computing the density of $(T, O)$ . No set of time points $(τ_{j})$ is required, and the values of ${\tilde{M}}_{t}$ are not recorded through time (i.e. matrix $B'$ disappears). Instead, when reaching $t_{h} = t_{0}$ we simply return $\sum_{i = 0}^{N} ρ^{k_{0}} {(1 - ρ)}^{i} {\tilde{M}}^{(i)}$ .

Algorithm 2: Computes a numerical approximation of $M_{t}$ for a specific set of times

Input: Observed tree and occurrence data $(T, O)$ , parameters $(t_{or}, λ, μ, ψ, ω, ρ)$ , set of time points ${(τ_{j})}_{j = 1}^{S}$ for which we want to compute the density, and the truncation N setting the accuracy of the algorithm.

Output: A numerical approximation of $M_{t}$ at times ${(τ_{j})}_{j = 1}^{S}$ , ${({\tilde{M}}_{τ_{j}}^{(i)})}_{\begin{matrix} i \in \{0, 1, \dots, N\} \\ j \in \{1, 2, \dots, S\} \end{matrix}}$

1: Pool all $(τ_{j})$ and all branching and sampling times of $(T, O)$ in an ordered list ${(t_{h})}_{h = 1}^{n}$
2: Set $j = S$ and $B^{'}$ as an $S \times (N + 1)$ empty matrix
3: Set $\forall i \in \{0, 1, \dots, N\}, {\tilde{M}}^{(i)} = 1_{i = 0}$
4: Set $k = 1$
5: for $h = n - 1, n - 2, \dots, 0$
6:     Compute the values right before the punctual event,
       $\begin{matrix} {\tilde{\tilde{M}}}^{(i)} = & {(\frac{Δ}{λ^{2}} e^{- \sqrt{Δ} (t_{h} - t)})}^{k} \sum_{α = 0}^{i} \sum_{l = α}^{N} {\tilde{M}}_{t_{h}}^{(l)} (\begin{matrix} l \\ α \end{matrix}) \frac{1}{(i - α)!} (\prod_{m = 0}^{i - α - 1} (2 k + l + m)) \\ {(- x_{1} + x_{2} e^{- \sqrt{Δ} (t_{h} - t)})}^{α} {(x_{1} x_{2})}^{l - α} {(1 - e^{- \sqrt{Δ} (t_{h} - t)})}^{l + i - 2 α} {(x_{2} - x_{1} e^{- \sqrt{Δ} (t_{h} - t)})}^{- (2 k + l + i - α)} \end{matrix}$
7:     if $t_{h} = τ_{j}$ then
8:         Record the result in $B^{'}$ : $\forall i, {B^{'}}^{(j, i)} = {\tilde{\tilde{M}}}^{(i)}$
9:         Set $j = j - 1$
10:    end if
11:    if $t_{h} = 0$ $t_{h} = τ_{S}$
12:        return $B^{'}$
13:    else if $t_{h}$ is a removed leaf
14:        Update $\forall i, {\tilde{M}}^{(i)} = ψ r {\tilde{\tilde{M}}}^{(i)}$
15:        Set $k = k - 1$
16:    else if $t_{h}$ is a non-removed leaf
17:        Update ${\tilde{M}}^{(0)} = 0$ and $\forall i > 0, {\tilde{M}}^{(i)} = ψ (1 - r) {\tilde{\tilde{M}}}^{(i - 1)}$
18:        Set $k = k - 1$
19:    else if $t_{h}$ is a sampled ancestor
20:        Update $\forall i, {\tilde{M}}^{(i)} = ψ (1 - r) {\tilde{\tilde{M}}}^{(i)}$
21:    else if $t_{h}$ is a removed occurrence
22:        Update $\forall i < N, {\tilde{M}}^{(i)} = ω r (i + 1) {\tilde{\tilde{M}}}^{(i + 1)}$ and ${\tilde{M}}^{(N)} = 0$
23:    else if $t_{h}$ is a non-removed occurrence
24:        Update $\forall i, {\tilde{M}}^{(i)} = ω (1 - r) (k + i) {\tilde{\tilde{M}}}^{(i)}$
25:    else $t_{h}$ is a branching event
26:        Update $\forall i, {\tilde{M}}^{(i)} = λ {\tilde{\tilde{M}}}^{(i)}$
27:        Set $k = k + 1$
28:    end if
29: end for


Note that we tried to follow an analogous generating function approach as an alternative to Algorithm 1 to compute $L_{t}$ as well. This leads to another PDE problem, described in Appendix F, that will require further work to be solved.

4.3. Special case $ω = 0$

We were not able to come up with any analytical simplification, as in the previous section, for the case $r = 1$ . However, for the special case $ω = 0$ , which leads to the observation of $O = \emptyset$ , a nice recursion leads to a closed-form formula for $\hat{M}$ .

Proposition 4.3

When $ω = 0$ , at any time t, considering that we have observed so far –i.e. on $(t, t_{or})$ – v sampled ancestors, w removed leaves at times $t_{j} \in W$ , x branching events at times $t_{j} \in X$ , y non-removed leaves at times $t_{j} \in Y$ , we get,

$\hat{M} (t, z) = λ^{x} ψ^{v + w + y} r^{w} {(1 - r)}^{v + y} \prod_{t_{j} \in X \cup \{t_{or}\}} R (t_{j} - t, z) \prod_{t_{j} \in W} R {(t_{j} - t, z)}^{- 1} \prod_{t_{j} \in Y} u (t_{j} - t, z) R {(t_{j} - t, z)}^{- 1} .$

Proof

We prove this result by induction across the epochs of $T$ in Appendix E, using as the main arguments the update Eqs. (4.10), (4.11), (4.12), (4.15), combined with Proposition 4.1 driving the evolution across an epoch.

As a simple corollary of this result, when $t_{h} = 0$ is the present, we get back the same probability density formula of $T$ as provided, e.g. in Theorem 3.5 in Stadler (2010) (when $r = 0$ ), in Section 3 in Gupta et al. (2019) (when $r = 1$ ), or in our previous Corollary 3.1.1.

Indeed, Proposition 4.3 offers yet another proof of Corollary 3.1.1 by noting that

P (T) = \sum_{i = 0}^{\infty} M_{0^{-}}^{(i)} = \hat{M} (0^{-}, 1) = ρ^{k_{0}} \hat{M} (0^{+}, 1 - ρ)

where the last equality follows from Eq. (4.16) taking into account the $ρ$ -sampling at present. Note that this alternative proof is also presented in Laudanno et al. (2020).

When $ω = 0$ , Proposition 4.3 also offers an alternative to Algorithm 2 for deriving $M_{t}$ . Indeed, resorting to the generating function to get back the probability density, one can get the following corollary.

Corollary 4.3.1

When $ω = 0$ , at any time t, considering that we have observed so far –i.e. on $(t, t_{or})$ – v sampled ancestors, w removed leaves at times $t_{j} \in W$ , x branching events at times $t_{j} \in X$ , y non-removed leaves at times $t_{j} \in Y$ , we can compute $M_{t}^{(i)}$ using the following recursion,

$\begin{matrix} M_{t}^{(0)} = & λ^{x} ψ^{v + w + y} r^{w} {(1 - r)}^{v + y} \prod_{t_{j} \in X \cup \{t_{or}\}} R (t_{j} - t, 0) \prod_{t_{j} \in W} R {(t_{j} - t, 0)}^{- 1} \prod_{t_{j} \in Y} u (t_{j} - t, 0) R {(t_{j} - t, 0)}^{- 1} \\ M_{t}^{(i)} = & \frac{1}{i} \sum_{α = 1}^{i} M_{t}^{(i - α)} C^{(α)} \end{matrix}$

where we define

$\begin{matrix} C^{(α)} = & 2 \sum_{t_{j} \in X \cup \{t_{or}\}} a_{t_{j} - t}^{α} - 2 \sum_{t_{j} \in W} a_{t_{j} - t}^{α} - \sum_{t_{j} \in Y} (a_{t_{j} - t}^{α} + b_{t_{j} - t}^{α}) \\ a_{t} = & (1 - e^{- \sqrt{Δ} t}) {(x_{2} - x_{1} e^{- \sqrt{Δ} t})}^{- 1} \\ b_{t} = & (x_{1} - x_{2} e^{- \sqrt{Δ} t}) {(x_{1} x_{2} - x_{2} x_{1} e^{- \sqrt{Δ} t})}^{- 1} . \end{matrix}$

Proof

The probability density $M_{t}^{(i)}$ can be recovered by taking

$M_{t}^{(i)} = \frac{1}{i!} {(\partial_{z}^{i} \hat{M} (t, z))}_{z = 0} .$

The result follows from the derivation of these derivatives in Appendix D.2.
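The recursion of Corollary 4.3.1 is straightforward to implement once the coefficients $C^{(α)}$ are available. The sketch below takes them as input; the function name is hypothetical. As a worked check, consider a time t before any event other than the origin: reading the terms of $C^{(α)}$ as contributions accumulated over event times, only the origin contributes and $C^{(α)} = 2 a^{α}$ with $a = a_{t_{or} - t}$ , for which the recursion gives $M_{t}^{(i)} = (i + 1) a^{i} M_{t}^{(0)}$ .

```python
def series_from_recursion(m0, C, n_terms):
    """Return [M^(0), ..., M^(n_terms - 1)] from the recursion

        M^(i) = (1 / i) * sum_{alpha = 1}^{i} M^(i - alpha) * C^(alpha),

    where C[alpha - 1] stores C^(alpha) and m0 is M^(0).
    """
    if len(C) < n_terms - 1:
        raise ValueError("need C^(alpha) for alpha = 1 .. n_terms - 1")
    M = [m0]
    for i in range(1, n_terms):
        acc = 0.0
        for alpha in range(1, i + 1):
            acc += M[i - alpha] * C[alpha - 1]
        M.append(acc / i)
    return M
```

Each new term requires a sum over all previous ones, so computing the first N terms costs on the order of $N^{2}$ operations, without any symbolic differentiation.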

This special case ends the section. In the next section, we will combine results from Sections 3 and 4 and use our ability to compute $L_{t}$ and $M_{t}$ to compute $K_{t}$ , the probability distribution of the population size given $(T, O)$ .

5. The distribution of past population size conditioned on observations

5.1. The distribution at fixed times

In Section 3, we explained how to compute $L_{t}$ , the probability density of the observations below time t conditioned on the population size at time t. This relies either on Algorithm 1 in the general case, or on the more optimized Proposition 3.1 in the case $ω = 0$ , or Proposition 3.2 in the case $r = 1$ .

In Section 4, we explained how to compute $M_{t}$ , the probability density of the observations above time t and the population size at time t. This relies either on Algorithm 2 in the general case, or on the more optimized Corollary 4.3.1 when $ω = 0$ . We now combine $L_{t}$ and $M_{t}$ to derive the probability distribution of the population size given $(T, O)$ . Provided we have stored numerical values ${({\tilde{L}}_{τ_{j}}^{(i)})}_{\begin{matrix} i \in \{0, 1, \dots, N\} \\ j \in \{1, 2, \dots, S\} \end{matrix}}$ and ${({\tilde{M}}_{τ_{j}}^{(i)})}_{\begin{matrix} i \in \{0, 1, \dots, N\} \\ j \in \{1, 2, \dots, S\} \end{matrix}}$ for a set of time points ${(τ_{j})}_{j = 1}^{S}$ , recall from the first section that we obtain

\begin{matrix} K_{τ_{j}}^{(i)} = & P (I_{τ_{j}} = k_{τ_{j}} + i | T, O) \\ = & \frac{L_{τ_{j}}^{(i)} M_{τ_{j}}^{(i)}}{P (T, O)} \\ \approx & \frac{{\tilde{L}}_{τ_{j}}^{(i)} {\tilde{M}}_{τ_{j}}^{(i)}}{P (T, O)} \text{ if } i ⩽ N, \text{ and } 0 \text{ otherwise} . \end{matrix}

Note that the denominator needs only be computed once, by evaluating $\sum_{i = 0}^{N} {\tilde{L}}_{τ_{j}}^{(i)} {\tilde{M}}_{τ_{j}}^{(i)}$ for example at time $τ_{j} = t_{or}$ or $τ_{j} = 0$ as described in previous sections.
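In practice, combining the two stored vectors at a given time point amounts to an elementwise product followed by a normalization. The following minimal sketch (function name hypothetical) returns the distribution indexed by total population size $k_{τ_{j}} + i$ together with the normalizing constant, which approximates $P (T, O)$ .

```python
def population_size_distribution(L_tilde, M_tilde, k):
    """Return ({population size: probability}, normalizing constant).

    L_tilde[i] and M_tilde[i] are the truncated values at one time point,
    and k is the number of observed lineages at that time.
    """
    joint = [l * m for l, m in zip(L_tilde, M_tilde)]
    total = sum(joint)                      # truncated estimate of P(T, O)
    if total <= 0.0:
        raise ValueError("the product L * M has no mass; check inputs or N")
    return {k + i: p / total for i, p in enumerate(joint)}, total
```

Comparing the returned normalizing constant across time points offers a simple consistency check, since it should not depend on the time at which it is evaluated.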

Depending on the parameter space that one wants to consider, it thus remains to arrange pieces stemming from the previous sections. We provide a flowchart in Fig. 4 to guide the reader to choose the most efficient path.

Fig. 4 — The most efficient results depending on the parameter space considered. In red, results already described in Stadler (2010) and Gupta et al. (2019). In blue, the new contribution of this manuscript. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

5.2. Generator of trajectories

The previous result gives us the distribution of the population size at any time in the past, but does not state anything about population size trajectories. We provide now an approximate way of simulating population size trajectories conditioned on $(T, O)$ .

Indeed, recall we have,

\begin{matrix} K_{t}^{(i)} ≔ & P (I_{t} = k_{t} + i | T, O) \propto L_{t}^{(i)} M_{t}^{(i)} \\ {\dot{L}}_{t}^{(i)} = & - γ (k_{t} + i) L_{t}^{(i)} + λ (2 k_{t} + i) L_{t}^{(i + 1)} + μ i L_{t}^{(i - 1)} \\ {\dot{M}}_{t}^{(i)} = & γ (k_{t} + i) M_{t}^{(i)} - μ (i + 1) M_{t}^{(i + 1)} - λ (2 k_{t} + i - 1) 1_{i > 0} M_{t}^{(i - 1)} . \end{matrix}

We thus get,

\begin{matrix} {\dot{K}}_{t}^{(i)} \propto & {\dot{L}}_{t}^{(i)} M_{t}^{(i)} + L_{t}^{(i)} {\dot{M}}_{t}^{(i)} \\ \propto & - γ (k_{t} + i) L_{t}^{(i)} M_{t}^{(i)} + λ (2 k_{t} + i) L_{t}^{(i + 1)} M_{t}^{(i)} + μ i L_{t}^{(i - 1)} M_{t}^{(i)} \\ + & γ (k_{t} + i) M_{t}^{(i)} L_{t}^{(i)} - μ (i + 1) M_{t}^{(i + 1)} L_{t}^{(i)} - λ (2 k_{t} + i - 1) 1_{i > 0} M_{t}^{(i - 1)} L_{t}^{(i)} \\ \propto & λ (2 k_{t} + i) \frac{L_{t}^{(i + 1)}}{L_{t}^{(i)}} K_{t}^{(i)} + μ i \frac{L_{t}^{(i - 1)}}{L_{t}^{(i)}} K_{t}^{(i)} - λ (2 k_{t} + i - 1) 1_{i > 0} \frac{L_{t}^{(i)}}{L_{t}^{(i - 1)}} K_{t}^{(i - 1)} - μ (i + 1) \frac{L_{t}^{(i)}}{L_{t}^{(i + 1)}} K_{t}^{(i + 1)} \\ \propto & Q_{t}^{(i, i)} K_{t}^{(i)} + Q_{t}^{(i - 1, i)} K_{t}^{(i - 1)} + Q_{t}^{(i + 1, i)} K_{t}^{(i + 1)} . \end{matrix}

(5.1)

We introduced in the last line the following notation,

\begin{matrix} Q_{t}^{(i + 1, i)} = & - μ (i + 1) \frac{L_{t}^{(i)}}{L_{t}^{(i + 1)}} \\ Q_{t}^{(i - 1, i)} = & - λ (2 k_{t} + i - 1) 1_{i > 0} \frac{L_{t}^{(i)}}{L_{t}^{(i - 1)}} \\ Q_{t}^{(i, i)} = & λ (2 k_{t} + i) \frac{L_{t}^{(i + 1)}}{L_{t}^{(i)}} + μ i \frac{L_{t}^{(i - 1)}}{L_{t}^{(i)}} . \end{matrix}

Using these, we see that $Q_{t}^{(i, i)} = - (Q_{t}^{(i, i + 1)} + Q_{t}^{(i, i - 1)})$ . This allows us to draw trajectories of the number of ancestors in the past as a time-continuous Markov process with the (inhomogeneous) rates $Q_{t}$ written above.
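A rough way of drawing a trajectory within a single epoch from these rates is sketched below. It is only an illustration under several assumptions: `L_at` is a hypothetical user-supplied function returning the truncated vector ${\tilde{L}}_{t}$ at any time t, the rates are frozen over each small time step (a first-order discretization rather than an exact simulation), and jumps imposed by punctual events between epochs are not handled. Rates are written as forward-in-time intensities, i.e. the absolute values of the off-diagonal coefficients above.

```python
import random


def jump_rates(L, i, k, lam, mu):
    """Forward-in-time intensities (up, down) of the conditioned process in state i."""
    up = lam * (2 * k + i) * L[i + 1] / L[i] if i + 1 < len(L) else 0.0
    down = mu * i * L[i - 1] / L[i] if i > 0 else 0.0
    return up, down


def simulate_epoch(i, k, t_start, t_end, L_at, lam, mu, n_steps=1000, rng=random):
    """Simulate hidden-lineage counts from t_start down to t_end (t_start > t_end).

    n_steps must be large enough that (up + down) * step stays well below 1.
    Returns a list of (time, state) pairs.
    """
    step = (t_start - t_end) / n_steps
    path = [(t_start, i)]
    for j in range(n_steps):
        t = t_start - j * step
        up, down = jump_rates(L_at(t), i, k, lam, mu)
        u = rng.random()
        if u < up * step:
            i += 1
        elif u < (up + down) * step:
            i -= 1
        path.append((t_start - (j + 1) * step, i))
    return path
```

The truncation at N is enforced by setting the upward rate to zero in state N, which keeps the simulated count within the range on which ${\tilde{L}}_{t}$ is available.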

Observe that we could equally write these ODE coefficients using the $M_{t}^{(i)}$ ’s. This gives,

\begin{matrix} {\dot{K}}_{t}^{(i)} \propto & λ (2 k_{t} + i) \frac{M_{t}^{(i)}}{M_{t}^{(i + 1)}} K_{t}^{(i + 1)} + μ i \frac{M_{t}^{(i)}}{M_{t}^{(i - 1)}} K_{t}^{(i - 1)} - μ (i + 1) \frac{M_{t}^{(i + 1)}}{M_{t}^{(i)}} K_{t}^{(i)} - λ (2 k_{t} + i - 1) 1_{i > 0} \frac{M_{t}^{(i - 1)}}{M_{t}^{(i)}} K_{t}^{(i)} \\ \propto & R_{t}^{(i + 1, i)} K_{t}^{(i + 1)} + R_{t}^{(i - 1, i)} K_{t}^{(i - 1)} + R_{t}^{(i, i)} K_{t}^{(i)} \end{matrix}

(5.2)

where we introduced in the last line the following notation,

\begin{matrix} R_{t}^{(i + 1, i)} = & λ (2 k_{t} + i) \frac{M_{t}^{(i)}}{M_{t}^{(i + 1)}} \\ R_{t}^{(i - 1, i)} = & μ i \frac{M_{t}^{(i)}}{M_{t}^{(i - 1)}} \\ R_{t}^{(i, i)} = & - λ (2 k_{t} + i - 1) 1_{i > 0} \frac{M_{t}^{(i - 1)}}{M_{t}^{(i)}} - μ (i + 1) \frac{M_{t}^{(i + 1)}}{M_{t}^{(i)}} . \end{matrix}

This is a standard result for Markov chains that are conditioned on a final state, and the shape of the newly derived transition kernel is called a Doob transform (Levin and Peres, 2017). Note that these transitions simplify for special cases when we have an analytical expression of either $L_{t}^{(i)}$ or $M_{t}^{(i)}$ .

5.3. Numerical implementation

Results of this paper have been implemented numerically and the code is freely available on GitLab: https://gitlab.com/MMarc/popsize-distribution/.

We used the numerical implementation to verify the correctness of the results in several ways:

1.
We verified that the values of the probability density of $(T, O)$ computed using $L_{t}$ and $M_{t}$ (i.e. respectively using Algorithms 1’ and 2’) were equivalent to values computed using already known formulas when $(ω = 0, r = 0)$ (Stadler, 2010) or when $r = 1$ (Gupta et al., 2019). See result in Fig. 5AB.
2.
We verified that the values of the probability density of $(T, O)$ computed using $L_{t}$ or $M_{t}$ (Algorithms 1’ and 2’) were identical on examples for which no previous formula was known. See result in Fig. 5C.
3.
We assessed the distribution of the population size against the only numerical method pursuing the same goal, the particle filtering developed in Vaughan et al. (2019). We compared values of a few quantiles computed using the two methods (see result in Fig. 5DEF). Note that Vaughan et al. (2019) considered that we never have data on the removal status of individuals. We thus adapted our developments to this scenario in this specific comparison, by summing updates corresponding to the removal or not of the sampled individuals.

Fig. 5 — Assessment of the accuracy of the methods presented in this paper, on toy datasets. First row, probability density of data, A) against known analytical formula when $ω = 0$ and $(μ, ρ, ψ, r) = (1, 0.5, 0.3, 0.2)$ ; B) against known analytical formula when $r = 1$ and $(μ, ρ, ψ, ω) = (1, 0.5, 0.3, 0.6)$ ; C) obtained using Algorithms 1’ or 2’ otherwise, with $(μ, ρ, ψ, r, ω) = (1, 0.5, 0.3, 0.2, 0.6)$ . Second row, quantiles of the population size distribution, against the particle filter in Vaughan et al. (2019), with parameters $(μ, ρ, ψ, r, ω) = (1, 0.1, 0.001, 0.5, 0.001)$ . D) quantile of level 0.2; E) median; F) quantile of level 0.8.

On each of these sanity checks, we verified that different quantities match across different $λ$ values. Note that we could equivalently have chosen any other parameter to be varied.

We also illustrate in Fig. 6 our target distribution $K_{t}$ of the past population size conditioned on $(T, O)$ , on a few simulated examples.

6. Discussion

The results we have derived in this paper fit into two main categories. The first category concerns results allowing one to compute the probability density of a tree and occurrences, while the second category concerns results allowing one to compute the probability distribution of the population size in the past. We discuss these two categories below, before presenting ideas for future extensions of the model.

6.1. Using the probability density of the data

We present in this article new ways to compute the probability density of the data, $P (T, O)$ . For the special cases $(ω = 0, r = 0)$ or $(r = 1)$ , efficient calculations are available in Stadler (2010) and Gupta et al. (2019). Our two Algorithms 1’ and 2’ have the potential to improve the computation time of $P (T, O)$ also when $ω \neq 0$ and $r \neq 1$ . When analysing data, as described below, this probability density is often conditioned on sampling at least one individual, using $u_{t_{or}}$ (Stadler, 2012).

In the case that the tree is known, we can use $P (T, O | λ, μ, ρ, ψ, r, ω, t_{or})$ (with conditioning on sampling at least one individual) to obtain maximum likelihood parameter estimates for the birth-death parameters $λ, μ$ as well as the sampling parameters $ρ, ψ, r, ω$ . For special cases of this model, it has been shown that not all sampling parameters are identifiable (see e.g. Stadler et al., 2019). Future work will involve investigating which of the sampling parameters in the general model can be estimated.

On the other hand, data may consist of sequencing data $A$ and occurrence data $O$ . Bayesian tools are then typically employed to obtain a sample from the posterior distribution of the parameters using Markov chain Monte Carlo methods. The posterior distribution is,

f (T, θ, λ, μ, ρ, ψ, ω, t_{or} | O, A) \propto f (A | T, θ) f (O, T | λ, μ, ρ, ψ, r, ω, t_{or}) f (λ, μ, ρ, ψ, r, ω, θ, t_{or}),

with $θ$ summarizing the parameters of the model of molecular evolution and $f (λ, μ, ρ, ψ, r, ω, θ, t_{or})$ being the prior distribution on the model parameters.

6.2. Probability distribution of past population sizes

The main results of this paper allow one to compute the probability distribution of the population size in the past and to generate population size trajectories conditioned on $(T, O)$ (Section 5).

Given a tree and occurrences together with birth-death parameters (which may be the maximum likelihood parameters obtained based on the tree and record of occurrences), we can simulate the distribution of past population sizes as described in Section 5.2. Furthermore, we can calculate the probability of a population size at any time in the past as described in Section 5.1.

If we are instead provided with sequencing data $A$ and occurrence data $O$ , and want to generate a simulated ensemble characterizing the posterior distribution of past population size trajectories $I$ , we can use the following strategy. The posterior distribution is,

f (T, I, θ, λ, μ, ρ, ψ, ω, t_{or} | O, A) = f (I | T, θ, λ, μ, ρ, ψ, ω, t_{or}, O, A) f (T, θ, λ, μ, ρ, ψ, ω, t_{or} | O, A)

We have described above how to obtain a sample from the posterior distribution $f (T, θ, λ, μ, ρ, ψ, ω, t_{or} | O, A)$ using Markov chain Monte Carlo. For each sample of $(T, θ, λ, μ, ρ, ψ, ω, t_{or})$ thus obtained, we can simulate an appropriately conditioned population size trajectory $I$ as described in Section 5.2. The ensemble of trajectories thus generated has the required distribution. We can employ an analogous procedure if we are interested in the posterior probability distribution of the population size at a particular time t. For each posterior sample of $(T, θ, λ, μ, ρ, ψ, ω, t_{or})$ , we can calculate the population size distribution at time t using Section 5.1. The posterior population size at time t is then the average over all these conditional distributions.
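The averaging step at a fixed time t can be sketched as follows. Since the number of observed lineages $k_{t}$ may differ between sampled trees, each conditional distribution is indexed here by total population size rather than by the number of hidden lineages. The function name and the dictionary representation are assumptions made for this example, and the posterior samples are assumed to be equally weighted, as is the case for the output of a standard Markov chain Monte Carlo run.

```python
from collections import defaultdict


def average_distributions(conditional_distributions):
    """Average population size distributions over posterior samples.

    Each element maps a population size n to P(I_t = n | T, O, parameters)
    for one posterior sample of the tree and parameters.
    """
    n_samples = len(conditional_distributions)
    if n_samples == 0:
        raise ValueError("at least one posterior sample is required")
    averaged = defaultdict(float)
    for dist in conditional_distributions:
        for size, prob in dist.items():
            averaged[size] += prob / n_samples
    return dict(averaged)
```

The result is a mixture of the per-sample distributions, so its quantiles generally differ from the average of the per-sample quantiles.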

6.3. Increased efficiency opens new research avenues

Both the density $P (T, O)$ and the probability distribution of the population size in the past $(K_{t})$ can be obtained using the Monte-Carlo particle filtering algorithm developed in Vaughan et al. (2019). The new approach presented in this paper is nevertheless appealing for two reasons. First, it provides a direct link with previous analytical formulas developed in Stadler (2010) and Gupta et al. (2019), thus improving our understanding of these processes and leading to very efficient results in the specific case where $ω = 0$ . Second, Algorithms 1 and 2 have the potential to be more efficient alternatives to the Monte-Carlo particle filtering algorithm. Computing the quantiles shown in Fig. 5DEF using the particle filtering took a few days, as compared to a few minutes with our method, mainly because it can be applied directly on a fixed tree and does not need to be part of an MCMC. A more thorough quantitative comparison of both approaches would require implementing this work in an MCMC framework, which is beyond the scope of this paper.

This increased efficiency could open up the possibility to analyse much bigger datasets in the near future. In macroevolution, the study of clades with a huge fossil record like cetaceans could benefit from our approach. This dataset is characterized by a rather small number of extant species and fossils with morphological data available (respectively $ρ$ -sampled and $ψ$ -sampled species), but includes a huge number of fossils without morphological data ( $ω$ -sampled species) (Morlon et al., 2011, Barido-Sottani et al., 2019). For the cetaceans as well as many other clades, it will be of great interest to compute diversity estimates under the modelling framework presented here (assuming $ρ \neq 0, ω \neq 0, r = 0$ ). Ultimately, all $ω$ -samples could be taken into account to inform the tree and diversity estimates.

In the context of epidemiology, typically, the genetic sequences of the pathogen are only available for a fraction of the infected individuals. These correspond to $ψ$ -samples, while other sampled infected individuals correspond to $ω$ -samples. Further developing our approach in a Bayesian framework, both the genetic sequences and the record of occurrence could be jointly used to estimate the underlying transmission tree and prevalence of the disease through time. Depending on the cost of sequencing and the ability of numerical methods to handle some critical amount of both genetic sequences and number of occurrences, optimal sampling procedure could be investigated, to make the most of both types of data.

Finally, while improving on current methods, Algorithms 1 and 2 still only provide approximations of, respectively, $L_{t}$ and $M_{t}$ , that critically rely on the truncation parameter of the state space N. Increasing N leads to a more accurate approximation, while increasing the runtime of the method. If the probability mass of the number of hidden individuals is non-negligible above N, both algorithms will lead to very poor approximations of $L_{t}$ and $M_{t}$ . This value should thus be carefully chosen in empirical applications, depending on what is expected with the data at hand. We point out that the behaviour of these algorithms strongly relies on the runtime and accuracy of the matrix exponentiation steps. Numerous matrix exponentiation methods have been proposed in the literature (Moler and Van Loan, 2003). In our current implementation, we rely on a recent matrix exponentiation method already implemented in scipy (Al-Mohy and Higham, 2010). Future avenues towards improving this specific step could focus on new theoretical results adapted to tridiagonal matrices (Smith and Shahrezaei, 2015) or alternatively try to adapt Laplace transform approximations derived in Crawford et al. (2014), who present theoretical results bounding the errors made in their approximation.

6.4. Future extensions

Our proposed modelling framework lends itself well to various biologically realistic extensions to allow closer fit to empirical data in a variety of situations.

The first extension that we envision is to relax the assumption of rate homogeneity and instead work with time-varying rates. This has already been considered in different studies relying on birth-death processes, either with exponentially varying functions (Morlon et al., 2011) or with piecewise constant rates (a model dubbed as skyline birth-death process, see Stadler et al., 2013, Gavryushkina et al., 2016). As all our results can be straightforwardly adapted to such a framework, this would not require much theoretical work. However, the challenge would be to do so without overfitting the data.

Another popular extension that has been described in the literature on birth-death processes for phylodynamics is to consider multi-type birth-death processes (Maddison et al., 2007). Each individual is assigned a type, which impacts its propensity to give birth to other types. All sampling-related parameters can also be considered type-dependent. The main challenge here boils down to dealing with an increase of dimensionality, because we would be interested in the joint distribution of all subpopulation sizes. This extension is particularly interesting for epidemiological applications, when different populations of infected individuals, clustered according to some characteristic (e.g. patient behaviour or geography) might have very different dynamics (Stadler and Bonhoeffer, 2013).

Finally, we are very hopeful that this piece of work could be applied as well to density-dependent birth-death processes, also known as logistic birth-death models. Indeed, very similar ideas to the breadth-first forward and backward traversals as applied in Algorithms 1’ and 2’ appear in the context of logistic birth-death models (Etienne et al., 2012, Leventhal et al., 2013, Laudanno et al., 2020). Preliminary results obtained by adapting our numerical algorithms to this framework are very encouraging, and we are currently in the process of deriving as many analytical results as we can to speed up the method. We are hoping to present this in a subsequent paper.

6.5. Conclusion

This manuscript presents a way to efficiently compute the distribution of the past population size in a linear birth-death process, conditioned on the observation of a reconstructed phylogenetic tree and a record of occurrences through time. Such data are very common in macroevolution where the reconstructed phylogenetic tree of extant species is available together with occurrences from the fossil record. In epidemiology, pathogen genetic sequencing data and case count data are a common data source. Our method thus promises to allow efficient quantification of past population sizes, representing past biodiversity or past prevalence, from these rich datasets.

We believe that this method also paves the way for the consideration of more complex and more realistic demographic scenarios, assuming either time-dependent (Morlon et al., 2011, Stadler et al., 2013, Gavryushkina et al., 2016) or density-dependent parameters (Etienne et al., 2012, Leventhal et al., 2013), potentially catering for populations with multiple demographic categories/types (Maddison et al., 2007, Stadler and Bonhoeffer, 2013, Freyman and Höhna, 2018). It is our hope that this manuscript will foster important research advances for unravelling demographic histories in epidemiology, macroevolution, and any other fields where birth-death processes form a relevant model framework.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Marc Manceau: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft. Ankit Gupta: Validation, Investigation, Writing - original draft, Visualization. Timothy Vaughan: Validation, Investigation, Writing - review & editing, Supervision. Tanja Stadler: Conceptualization, Resources, Writing - review & editing, Supervision, Funding acquisition.

Acknowledgements

The authors are grateful to Rachel Warnock for helpful discussion on potential applications of the model, and Alex Zarebski for his thorough examination of this manuscript. A.G. is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement no. 743269 (CyberGenetics project).

Contributor Information

Marc Manceau, Email: marc.manceau@bsse.ethz.ch.

Tanja Stadler, Email: tanja.stadler@bsse.ethz.ch.

Appendix A. Solving well-known ODEs

A.1. The extinction probability

We first deal with Eq. (2.1) governing $u_{t}$ , and start by studying the polynomial $λ x^{2} - γ x + μ$ .

This polynomial has discriminant $Δ = γ^{2} - 4 λ μ > {(λ + μ)}^{2} - 4 λ μ = {(λ - μ)}^{2} ⩾ 0$ . Note that the first inequality holds in the case we are interested in because we can assume that $ψ + ω > 0$ . When this is not the case, one needs to assume that $λ \neq μ$ . The roots are

x_{1} = \frac{γ - \sqrt{Δ}}{2 λ} and x_{2} = \frac{γ + \sqrt{Δ}}{2 λ} .

Moreover, we know that both roots are positive because $Δ < γ^{2} \Rightarrow \sqrt{Δ} < γ \Rightarrow x_{1} > 0$ .

On an interval including zero and where the polynomial remains positive (as $(- \infty, x_{1})$ for example), we can write,

\begin{matrix} \frac{du}{λ u^{2} - γ u + μ} = dt \\ \Leftrightarrow & \frac{du}{(x_{1} - u) (x_{2} - u)} = λ dt \\ \Leftrightarrow & \frac{1}{x_{2} - x_{1}} (\frac{1}{x_{1} - u} - \frac{1}{x_{2} - u}) du = λ dt \\ \Leftrightarrow & (\frac{1}{x_{1} - u} - \frac{1}{x_{2} - u}) du = \sqrt{Δ} dt . \end{matrix}

Integrating both sides between time 0 and t, we get

\begin{matrix} \frac{x_{2} - u_{t}}{x_{1} - u_{t}} = \frac{x_{2} - z}{x_{1} - z} e^{\sqrt{Δ} t} \\ \Leftrightarrow & x_{2} (x_{1} - z) e^{- \sqrt{Δ} t} - u_{t} (x_{1} - z) e^{- \sqrt{Δ} t} = x_{1} (x_{2} - z) - u_{t} (x_{2} - z) \\ \Leftrightarrow & u_{t} ((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t}) = x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t} \\ \Leftrightarrow & u_{t} = \frac{x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t}}{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t}} . \end{matrix}

This is the result stated in Eq. (2.2). Note that this quantity is called $p_{0} (t)$ in Stadler (2010), or $E (t)$ in Maddison et al. (2007).
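As a quick numerical sanity check of this closed form, the following sketch evaluates $u (t, z)$ and compares a finite-difference time derivative with the right-hand side $λ u^{2} - γ u + μ$ . It assumes $γ = λ + μ + ψ + ω$ , consistent with the inequality used above; the function names and parameter values are illustrative only.

```python
import math

def roots(lam, mu, psi, omega):
    """Return gamma, sqrt(Delta) and the two roots x1 < x2 of lam*x^2 - gamma*x + mu."""
    gamma = lam + mu + psi + omega             # assumed definition of gamma
    sq = math.sqrt(gamma ** 2 - 4 * lam * mu)
    return gamma, sq, (gamma - sq) / (2 * lam), (gamma + sq) / (2 * lam)

def u(t, z, lam, mu, psi, omega):
    """Closed-form solution of du/dt = lam*u^2 - gamma*u + mu with u(0) = z."""
    _, sq, x1, x2 = roots(lam, mu, psi, omega)
    e = math.exp(-sq * t)
    return (x1 * (x2 - z) - x2 * (x1 - z) * e) / ((x2 - z) - (x1 - z) * e)

def ode_residual(t, z, lam, mu, psi, omega, h=1e-6):
    """Central finite difference of u in t, minus the ODE right-hand side."""
    gamma = roots(lam, mu, psi, omega)[0]
    du = (u(t + h, z, lam, mu, psi, omega) - u(t - h, z, lam, mu, psi, omega)) / (2 * h)
    ut = u(t, z, lam, mu, psi, omega)
    return du - (lam * ut ** 2 - gamma * ut + mu)
```

With, say, `lam=1.5, mu=0.5, psi=0.3, omega=0.2`, the value `u(0, z, ...)` returns z and `ode_residual` stays at the level of the finite-difference error.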

A.2. Probability to leave only one sampled descendant

We aim here to integrate a slight variation of Eq. (2.3) governing $p_{t}$ when $k = 1$ . The equation we are interested in is,

\frac{{dW}_{s}}{ds} = (2 λ u (s, z) - γ) {kW}_{s}

(A.1)

\begin{matrix} \frac{{dW}_{s}}{W_{s}} = & (2 λ k \frac{x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} s}}{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} s}} - γ k) ds \\ = & (\frac{2 λ {kx}_{1}}{\sqrt{Δ}} \frac{\sqrt{Δ} (x_{2} - z) e^{\sqrt{Δ} s}}{(x_{2} - z) e^{\sqrt{Δ} s} - (x_{1} - z)} - \frac{2 λ {kx}_{2}}{\sqrt{Δ}} \frac{(x_{1} - z) \sqrt{Δ} e^{- \sqrt{Δ} s}}{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} s}} - γ k) ds . \end{matrix}

All three terms can be integrated by inspection between some time $t_{h}$ and t, leading to,

\begin{matrix} \ln \frac{W_{t}}{W_{t_{h}}} = & \frac{2 λ {kx}_{1}}{\sqrt{Δ}} {[\ln ((x_{2} - z) e^{\sqrt{Δ} s} - (x_{1} - z))]}_{t_{h}}^{t} - \frac{2 λ {kx}_{2}}{\sqrt{Δ}} {[\ln ((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} s})]}_{t_{h}}^{t} - γ k (t - t_{h}) \\ = & \frac{2 λ {kx}_{1}}{\sqrt{Δ}} \ln \frac{(x_{2} - z) e^{\sqrt{Δ} t} - (x_{1} - z)}{(x_{2} - z) e^{\sqrt{Δ} t_{h}} - (x_{1} - z)} - \frac{2 λ {kx}_{2}}{\sqrt{Δ}} \ln \frac{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t}}{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t_{h}}} - γ k (t - t_{h}) \\ = & - \frac{2 λ k (x_{2} - x_{1})}{\sqrt{Δ}} \ln \frac{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t}}{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t_{h}}} - γ k (t - t_{h}) + 2 λ {kx}_{1} (t - t_{h}) \\ = & - 2 k \ln \frac{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t}}{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t_{h}}} - k \sqrt{Δ} (t - t_{h}) . \end{matrix}

This leads to the final expression below,

W_{t} = W_{t_{h}} {(\frac{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t}}{(x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t_{h}}})}^{- 2 k} e^{- k \sqrt{Δ} (t - t_{h})} .

(A.2)

Note that the case $k = 1, t_{h} = 0$ and $W_{0} = 1 - z$ corresponds to the probability $p_{t}$ given as Eq. (2.4),

p (t, z) = (1 - z) \frac{Δ}{λ^{2}} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 2} e^{- \sqrt{Δ} t} .

while the general case can be expressed using function p as

W_{t} = W_{t_{h}} {(\frac{p (t, z)}{p (t_{h}, z)})}^{k} .
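The two expressions for $W_{t}$ can be compared numerically. The sketch below codes $p (t, z)$ as in Eq. (2.4) and $W_{t}$ as in Eq. (A.2), again assuming $γ = λ + μ + ψ + ω$ ; all function names are hypothetical.

```python
import math

def constants(lam, mu, psi, omega):
    gamma = lam + mu + psi + omega             # assumed definition of gamma
    sq = math.sqrt(gamma ** 2 - 4 * lam * mu)  # sqrt(Delta)
    return gamma, sq, (gamma - sq) / (2 * lam), (gamma + sq) / (2 * lam)

def denom(t, z, sq, x1, x2):
    """Common denominator (x2 - z) - (x1 - z) * exp(-sqrt(Delta) * t)."""
    return (x2 - z) - (x1 - z) * math.exp(-sq * t)

def u(t, z, lam, mu, psi, omega):
    _, sq, x1, x2 = constants(lam, mu, psi, omega)
    e = math.exp(-sq * t)
    return (x1 * (x2 - z) - x2 * (x1 - z) * e) / denom(t, z, sq, x1, x2)

def p(t, z, lam, mu, psi, omega):
    """Eq. (2.4): the case k = 1, t_h = 0, W_0 = 1 - z of Eq. (A.2)."""
    _, sq, x1, x2 = constants(lam, mu, psi, omega)
    return (1 - z) * (sq / lam) ** 2 * math.exp(-sq * t) / denom(t, z, sq, x1, x2) ** 2

def W(t, t_h, W_th, k, z, lam, mu, psi, omega):
    """Eq. (A.2), the general solution started from W_th at time t_h."""
    _, sq, x1, x2 = constants(lam, mu, psi, omega)
    ratio = denom(t, z, sq, x1, x2) / denom(t_h, z, sq, x1, x2)
    return W_th * ratio ** (-2 * k) * math.exp(-k * sq * (t - t_h))
```

Evaluating `W(t, t_h, W_th, k, z, ...)` and `W_th * (p(t, z, ...) / p(t_h, z, ...)) ** k` on the same inputs should give the same number up to rounding.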

A.3. A few useful properties

Solutions $u (t, z)$ and $p (t, z)$ to ODEs (2.1), (2.3) satisfy two properties relying on the semi-group property of solutions of ODEs, namely,

u (t_{2} - t_{1}, u (t_{1}, z)) = u (t_{2}, z)

(A.3)

p (t_{2} - t_{1}, u (t_{1}, z)) = \frac{p (t_{2}, z) p (0, u (t_{1}, z))}{p (t_{1}, z)} = \frac{p (t_{2}, z) (1 - u (t_{1}, z))}{p (t_{1}, z)} .

(A.4)

These two properties are useful in many calculations throughout this document, e.g.

•
Solving the main PDE in Appendix C requires inverting u, using the first property with,
$\begin{matrix} z = u (t - t_{h}, z_{0}) \\ \Leftrightarrow & u (t_{h} - t, z) = u (t_{h} - t, u (t - t_{h}, z_{0})) \\ \Leftrightarrow & u (t_{h} - t, z) = u (0, z_{0}) \\ \Leftrightarrow & z_{0} = u (t_{h} - t, z) . \end{matrix}$
•
The same Appendix section also requires composing functions p and u, using
$\frac{p (t - t_{h}, u (t_{h} - t, z))}{p (0, u (t_{h} - t, z))} = \frac{p (0, z)}{p (t_{h} - t, z)} .$
•
In the proof of Proposition 4.3, we switch to the notation $R (t, z) = p (t, z) / (1 - z)$ and again compose R and u in the same way,
$\begin{matrix} \frac{p (t_{j} - t_{h}, u (t_{h} - t, z))}{p (0, u (t_{h} - t, z))} = \frac{p (t_{j} - t, z)}{p (t_{h} - t, z)} \\ \Leftrightarrow & \frac{p (t_{j} - t_{h}, u (t_{h} - t, z))}{1 - u (t_{h} - t, z)} = \frac{p (t_{j} - t, z)}{p (t_{h} - t, z)} \\ \Leftrightarrow & R (t_{j} - t_{h}, u (t_{h} - t, z)) = \frac{R (t_{j} - t, z)}{R (t_{h} - t, z)} . \end{matrix}$
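Properties (A.3) and (A.4), and the inversion of u used in Appendix C, can be checked numerically with the short sketch below. It assumes $γ = λ + μ + ψ + ω$ and uses hypothetical helper names and arbitrary parameter values.

```python
import math

LAM, MU, PSI, OMEGA = 1.5, 0.5, 0.3, 0.2       # illustrative rates
GAMMA = LAM + MU + PSI + OMEGA                  # assumed definition of gamma
SQ = math.sqrt(GAMMA ** 2 - 4 * LAM * MU)       # sqrt(Delta)
X1 = (GAMMA - SQ) / (2 * LAM)
X2 = (GAMMA + SQ) / (2 * LAM)

def u(t, z):
    e = math.exp(-SQ * t)
    return (X1 * (X2 - z) - X2 * (X1 - z) * e) / ((X2 - z) - (X1 - z) * e)

def p(t, z):
    e = math.exp(-SQ * t)
    return (1 - z) * (SQ / LAM) ** 2 * e / ((X2 - z) - (X1 - z) * e) ** 2

def semigroup_gaps(t1, t2, z):
    """Differences between the two sides of (A.3), of (A.4), and of the inversion."""
    w = u(t1, z)
    gap_u = u(t2 - t1, w) - u(t2, z)                       # (A.3)
    gap_p = p(t2 - t1, w) - p(t2, z) * (1 - w) / p(t1, z)  # (A.4)
    gap_inv = u(-t1, w) - z                                # u(-t, .) inverts u(t, .)
    return gap_u, gap_p, gap_inv
```

For example, `semigroup_gaps(0.4, 1.0, 0.3)` returns three values that vanish up to floating-point rounding.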

Appendix B. Link with previous work by Gupta et al. (2019)

We aim here at providing details to link this work with results previously derived by Gupta et al. (2019), allowing efficient computation of $L_{t}^{(i)}$ in the special case $r = 1$ .

To do so, we define the distinguishable version of the probability $L_{t}^{(i)}$ as

{\overline{L}}_{t}^{(i)} = \frac{(k + i)!}{i!} L_{t}^{(i)} .

(B.1)

We now derive the ODE for ${\overline{L}}_{t}^{(i)}$ . Multiplying both sides of (3.2) by $\frac{(k + i)!}{i!}$ we obtain

\begin{matrix} {\dot{\overline{L}}}_{t}^{(i)} = & - γ (k + i) \frac{(k + i)!}{i!} L_{t}^{(i)} + λ (2 k + i) \frac{(k + i)!}{i!} L_{t}^{(i + 1)} + μ i \frac{(k + i)!}{i!} L_{t}^{(i - 1)} \\ = - γ (k + i) \frac{(k + i)!}{i!} L_{t}^{(i)} + λ (k + i) [\frac{(2 k + i) (i + 1)}{(k + i + 1) (k + i)}] \frac{(k + i + 1)!}{(i + 1)!} L_{t}^{(i + 1)} + μ (k + i) \frac{(k + i - 1)!}{(i - 1)!} L_{t}^{(i - 1)} \\ = - γ (k + i) {\overline{L}}_{t}^{(i)} + λ (k + i) ϕ_{i, k} {\overline{L}}_{t}^{(i + 1)} + μ (k + i) {\overline{L}}_{t}^{(i - 1)}, \end{matrix}

(B.2)

where

ϕ_{i, k} = \frac{(2 k + i) (i + 1)}{(k + i + 1) (k + i)} = 1 - \frac{k (k - 1)}{(k + i + 1) (k + i)}

is the probability that a coalescing pair of randomly chosen lineages (from ( $k + i + 1$ ) total lineages) does not consist of two sampled lineages. This shows that ${\overline{L}}_{t}^{(i)}$ satisfies the ODE (B.2) across any epoch. One can see that at punctual events the transition conditions (3.3), (3.8) hold for ${\overline{L}}_{t}^{(i)}$ for $ψ$ -sampling and branching events respectively. Moreover at $ω$ -sampling events the transition condition (3.6) transforms to

{\overline{L}}_{t^{+}}^{(i)} = ω (k + i) {\overline{L}}_{t^{-}}^{(i - 1)} .

With these transition conditions and initial condition ${\overline{L}}_{0}^{(i)} = \frac{(k_{0} + i)!}{i!} L_{0}^{(i)} = \frac{(k_{0} + i)!}{i!} ρ^{k_{0}} {(1 - ρ)}^{i}$ , the ODE (B.2) was solved explicitly in Gupta et al. (2019) and the solution is of the form

{\overline{L}}_{t^{+}}^{(i)} = \sum_{ℓ = 0}^{q} \frac{(k + i)!}{(i - ℓ)!} u_{t}^{i - ℓ} W_{t}^{(ℓ)}

where q is the number of $ω$ -sampling events in the time-interval $[0, t)$ and the $(q + 1)$ -dimensional time-varying vector $W_{t} = (W_{t}^{(0)}, \dots, W_{t}^{(q)})$ can be analytically computed following the approach in Gupta et al. (2019). Therefore from (B.1) we state Proposition 3.2.
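The factorial bookkeeping behind (B.2) is easy to get wrong by hand, so a small exact check may help. The sketch below uses rational arithmetic to confirm that the λ and μ coefficients transform as stated and that the two expressions of $ϕ_{i, k}$ agree; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def phi(i, k):
    """phi_{i,k} = (2k + i)(i + 1) / ((k + i + 1)(k + i))."""
    return Fraction((2 * k + i) * (i + 1), (k + i + 1) * (k + i))

def scale(i, k):
    """(k + i)! / i!, the factor linking L and its distinguishable version in (B.1)."""
    return Fraction(factorial(k + i), factorial(i))

def coefficients_match(i, k):
    # lambda term: (2k + i) * scale(i) == (k + i) * phi * scale(i + 1)
    birth_ok = (2 * k + i) * scale(i, k) == (k + i) * phi(i, k) * scale(i + 1, k)
    # mu term: i * scale(i) == (k + i) * scale(i - 1), only meaningful for i >= 1
    death_ok = i == 0 or i * scale(i, k) == (k + i) * scale(i - 1, k)
    # second expression of phi_{i,k}
    alt_ok = phi(i, k) == 1 - Fraction(k * (k - 1), (k + i + 1) * (k + i))
    return birth_ok and death_ok and alt_ok
```

Looping `coefficients_match(i, k)` over a grid of small i and k values covers the cases that matter in practice.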

Appendix C. Solving the main PDE

We aim now at finding an analytical solution for the following PDE, driving the evolution of $\hat{M}$ across a given epoch $(t_{h - 1}, t_{h})$ , on which the number of observed lineages remains constant and equal to k,

\begin{matrix} \hat{M} (t_{h}, z) = & F (z) \\ \partial_{t} \hat{M} + (λ z^{2} - γ z + μ) \partial_{z} \hat{M} + k (2 λ z - γ) \hat{M} = & 0 . \end{matrix}

C.1. Principle of the method of characteristics

This problem can be solved by the method of characteristics. We suppose that we can write $\hat{M} (t, z) = \hat{M} (t (s), z (s))$ where functions t and z satisfy the ODEs,

\begin{matrix} \frac{dz}{ds} = & λ z^{2} - γ z + μ \\ \frac{dt}{ds} = & 1 . \end{matrix}

This way, the function $g (s) = \hat{M} (t (s), z (s))$ satisfies another ODE, that we will have to solve,

\begin{matrix} \frac{dg}{ds} = \frac{dz}{ds} \partial_{z} \hat{M} + \frac{dt}{ds} \partial_{t} \hat{M} = (λ z^{2} - γ z + μ) \partial_{z} \hat{M} + \partial_{t} \hat{M} = - k (2 λ z - γ) \hat{M} \\ \Leftrightarrow & \frac{dg}{ds} + k (2 λ z - γ) g = 0 . \end{matrix}

C.2. Step 1, solve for $t (s), z (s)$ and $g (s)$

We start by integrating $t (s)$ . We moreover fix that $t (0) = t_{h}$ , thus leading to $t (s) = t_{h} + s$ .

We now turn to z, and notice that it satisfies the previously studied ODE (2.1). Integrating between 0 and s leads to,

z (s) = u (s, z_{0}) = \frac{x_{1} (x_{2} - z_{0}) - x_{2} (x_{1} - z_{0}) e^{- \sqrt{Δ} s}}{(x_{2} - z_{0}) - (x_{1} - z_{0}) e^{- \sqrt{Δ} s}} .

Last, g satisfies an ODE very similar to (A.1). Taking care of the minus sign, it leads to the following result,

g_{s} = g_{0} {(\frac{1}{R (s, z_{0})})}^{k} .

(C.1)

C.3. Step 2, express $\hat{M}$ back as a function of $t, z$

We want to express our two unknown quantities s and $z_{0}$ as functions of t and z.

First, we easily get $s = t - t_{h}$ . We can moreover solve for $z_{0}$ in the following equation, remembering the semi-group property of u,

z = u (t - t_{h}, z_{0}) \Leftrightarrow z_{0} = u (t_{h} - t, z) .

Substituting these into the previous expression (C.1) of $g_{s}$ then leads to,

\begin{matrix} \hat{M} (t, z) = & F (u (t_{h} - t, z)) R {(t - t_{h}, u (t_{h} - t, z))}^{- k} \\ = & F (u (t_{h} - t, z)) R {(t_{h} - t, z)}^{k} . \end{matrix}

where the first to second equality relies on a property presented in Appendix A.3. This gives us the final formula, which is stated in Proposition 4.1.
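The solution obtained by the method of characteristics can be verified numerically by plugging it back into the PDE with finite differences. In the sketch below, the terminal condition F is an arbitrary polynomial chosen for illustration, $γ = λ + μ + ψ + ω$ is assumed, and the names `M_hat` and `pde_residual` are hypothetical.

```python
import math

LAM, MU, PSI, OMEGA = 1.5, 0.5, 0.3, 0.2       # illustrative rates
GAMMA = LAM + MU + PSI + OMEGA                  # assumed definition of gamma
SQ = math.sqrt(GAMMA ** 2 - 4 * LAM * MU)       # sqrt(Delta)
X1 = (GAMMA - SQ) / (2 * LAM)
X2 = (GAMMA + SQ) / (2 * LAM)

def u(t, z):
    e = math.exp(-SQ * t)
    return (X1 * (X2 - z) - X2 * (X1 - z) * e) / ((X2 - z) - (X1 - z) * e)

def R(t, z):
    """R(t, z) = p(t, z) / (1 - z)."""
    e = math.exp(-SQ * t)
    return (SQ / LAM) ** 2 * e / ((X2 - z) - (X1 - z) * e) ** 2

def F(z):
    """Arbitrary terminal condition at t_h, for illustration only."""
    return 0.3 + 0.5 * z + 0.2 * z ** 2

def M_hat(t, z, t_h, k):
    """Candidate solution F(u(t_h - t, z)) * R(t_h - t, z)^k."""
    return F(u(t_h - t, z)) * R(t_h - t, z) ** k

def pde_residual(t, z, t_h, k, h=1e-5):
    """Left-hand side of the PDE, with derivatives taken by central differences."""
    d_t = (M_hat(t + h, z, t_h, k) - M_hat(t - h, z, t_h, k)) / (2 * h)
    d_z = (M_hat(t, z + h, t_h, k) - M_hat(t, z - h, t_h, k)) / (2 * h)
    return (d_t + (LAM * z ** 2 - GAMMA * z + MU) * d_z
            + k * (2 * LAM * z - GAMMA) * M_hat(t, z, t_h, k))
```

At $t = t_{h}$ the function `M_hat` reduces to F, and for $t < t_{h}$ the residual remains at the level of the finite-difference error.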

Appendix D. Some useful algebra

This section of the Appendix pools together all bits of algebra that are not really digestible, but are used in the main text.

D.1. Derivative of $\hat{M}$

We first slightly rewrite the expression of the generating function,

\begin{matrix} \hat{M} (t_{h} - t, z) = & R {(t, z)}^{k} \sum_{l = 0}^{N} {\tilde{M}}_{t_{h}}^{(l)} u {(t, z)}^{l} \\ = & {(\frac{Δ}{λ^{2}} e^{- \sqrt{Δ} t})}^{k} \sum_{l = 0}^{N} {\tilde{M}}_{t_{h}}^{(l)} {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{l} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- (2 k + l)} . \end{matrix}

Applying Leibniz’s derivation rule to the product, we get,

{(\partial_{z}^{i} \hat{M} (t_{h} - t, z))}_{z = 0} = {(\frac{Δ}{λ^{2}} e^{- \sqrt{Δ} t})}^{k} \sum_{l = 0}^{N} {\tilde{M}}_{t_{h}}^{(l)} \sum_{α = 0}^{i} (\begin{matrix} i \\ α \end{matrix}) {(\partial_{z}^{α} {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{l})}_{z = 0} {(\partial_{z}^{i - α} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- (2 k + l)})}_{z = 0} .

(D.1)

The first of the two derivatives in the sum can be computed as,

\begin{matrix} \partial_{z} {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{l} = & l (- x_{1} + x_{2} e^{- \sqrt{Δ} t}) {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{l - 1} 1_{l ⩾ 1} \\ \partial_{z}^{2} {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{l} = & l (l - 1) {(- x_{1} + x_{2} e^{- \sqrt{Δ} t})}^{2} {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{l - 2} 1_{l ⩾ 2} \\ ⋮ \\ \partial_{z}^{α} {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{l} = & \frac{l!}{(l - α)!} {(- x_{1} + x_{2} e^{- \sqrt{Δ} t})}^{α} {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{l - α} 1_{l ⩾ α} . \end{matrix}

While the second gives us,

\begin{matrix} \partial_{z} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- (2 k + l)} = & (2 k + l) (1 - e^{- \sqrt{Δ} t}) {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- (2 k + l + 1)} \\ \partial_{z}^{2} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- (2 k + l)} = & (2 k + l) (2 k + l + 1) {(1 - e^{- \sqrt{Δ} t})}^{2} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- (2 k + l + 2)} \\ ⋮ \\ \partial_{z}^{i - α} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- (2 k + l)} = & (\prod_{m = 0}^{i - α - 1} (2 k + l + m)) {(1 - e^{- \sqrt{Δ} t})}^{i - α} \\ {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- (2 k + l + i - α)} . \end{matrix}

Evaluating these derivatives at $z = 0$ in Eq. (D.1) yields,

\begin{matrix} {(\partial_{z}^{i} \hat{M} (t_{h} - t, z))}_{z = 0} = & {(\frac{Δ}{λ^{2}} e^{- \sqrt{Δ} t})}^{k} \sum_{α = 0}^{i} \sum_{l = α}^{N} {\tilde{M}}_{t_{h}}^{(l)} (\begin{matrix} i \\ α \end{matrix}) \frac{l!}{(l - α)!} (\prod_{m = 0}^{i - α - 1} (2 k + l + m)) \\ {(- x_{1} + x_{2} e^{- \sqrt{Δ} t})}^{α} {(x_{1} x_{2})}^{l - α} {(1 - e^{- \sqrt{Δ} t})}^{l + i - 2 α} {(x_{2} - x_{1} e^{- \sqrt{Δ} t})}^{- (2 k + l + i - α)} \end{matrix}

which is the expression provided in Proposition 4.2.

D.2. Derivatives of $\hat{M}$ when $ω = 0$

We wish here to derive the $\partial_{z}^{i} \hat{M} (t, z)$ where function $\hat{M}$ is as given in Proposition 4.3, i.e.

\hat{M} (t, z) = λ^{x} ψ^{v + w + y} r^{w} {(1 - r)}^{v + y} R (t_{or} - t, z) \prod_{t_{j} \in X} R (t_{j} - t, z) \prod_{t_{j} \in W} R {(t_{j} - t, z)}^{- 1} \prod_{t_{j} \in Y} u (t_{j} - t, z) R {(t_{j} - t, z)}^{- 1} .

We take for simplicity the derivative of the logarithm of $\hat{M}$ and express the derivatives of $\hat{M}$ using these and Leibniz’s formula,

\begin{matrix} \partial_{z} \hat{M} = \hat{M} \partial_{z} (\ln \hat{M}) \\ \partial_{z}^{2} \hat{M} = \partial_{z} \hat{M} \partial_{z} (\ln \hat{M}) + \hat{M} \partial_{z}^{2} (\ln \hat{M}) \\ \partial_{z}^{3} \hat{M} = \partial_{z}^{2} \hat{M} \partial_{z} (\ln \hat{M}) + 2 \partial_{z} \hat{M} \partial_{z}^{2} (\ln \hat{M}) + \hat{M} \partial_{z}^{3} (\ln \hat{M}) \\ ⋮ \\ \partial_{z}^{i} \hat{M} = \sum_{α = 1}^{i} (\begin{matrix} i - 1 \\ α - 1 \end{matrix}) (\partial_{z}^{i - α} \hat{M}) (\partial_{z}^{α} (\ln \hat{M})) . \end{matrix}

(D.2)

In order to compute the derivatives of $\ln \hat{M}$ , one needs to get the derivatives of $\ln R (t, z)$ and $\ln u (t, z)$ . We have

\begin{matrix} \ln R (t, z) = & - 2 \ln ((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t}) + \ln \frac{Δ}{λ^{2}} - \sqrt{Δ} t \\ \partial_{z} \ln R (t, z) = & 2 (1 - e^{- \sqrt{Δ} t}) {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 1} \\ \partial_{z}^{2} \ln R (t, z) = & 2 {(1 - e^{- \sqrt{Δ} t})}^{2} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 2} \\ \partial_{z}^{3} \ln R (t, z) = & 4 {(1 - e^{- \sqrt{Δ} t})}^{3} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 3} \\ ⋮ \\ \partial_{z}^{α} \ln R (t, z) = & 2 (α - 1)! {(1 - e^{- \sqrt{Δ} t})}^{α} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- α} . \end{matrix}

Finally, evaluating the function at $z = 0$ leads to

\partial_{z}^{α} \ln R (t, 0) = 2 (α - 1)! a_{t}^{α}

(D.3)

where we defined a_{t} ≔ (1 - e^{- \sqrt{Δ} t}) {(x_{2} - x_{1} e^{- \sqrt{Δ} t})}^{- 1} .

(D.4)

In the same way we get,

\begin{matrix} \ln u (t, z) = & \ln (x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t}) - \ln ((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t}) \\ \partial_{z} \ln u (t, z) = & - (x_{1} - x_{2} e^{- \sqrt{Δ} t}) {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 1} \\ + (1 - e^{- \sqrt{Δ} t}) {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 1} \\ \partial_{z}^{2} \ln u (t, z) = & - {(x_{1} - x_{2} e^{- \sqrt{Δ} t})}^{2} {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 2} \\ + {(1 - e^{- \sqrt{Δ} t})}^{2} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 2} \\ \partial_{z}^{3} \ln u (t, z) = & - 2 {(x_{1} - x_{2} e^{- \sqrt{Δ} t})}^{3} {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 3} \\ + 2 {(1 - e^{- \sqrt{Δ} t})}^{3} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- 3} \\ ⋮ \\ \partial_{z}^{α} \ln u (t, z) = & (α - 1)! [- {(x_{1} - x_{2} e^{- \sqrt{Δ} t})}^{α} {(x_{1} (x_{2} - z) - x_{2} (x_{1} - z) e^{- \sqrt{Δ} t})}^{- α} \\ + {(1 - e^{- \sqrt{Δ} t})}^{α} {((x_{2} - z) - (x_{1} - z) e^{- \sqrt{Δ} t})}^{- α}] . \end{matrix}

Here also, we are interested in the value of the function at $z = 0$ ,

\partial_{z}^{α} \ln u (t, 0) = (α - 1)! (a_{t}^{α} - b_{t}^{α})

(D.5)

where we defined b_{t} ≔ (x_{1} - x_{2} e^{- \sqrt{Δ} t}) {(x_{1} x_{2} - x_{2} x_{1} e^{- \sqrt{Δ} t})}^{- 1} .

(D.6)

As the last ingredient needed to write the derivative of $\ln \hat{M}$ , we get,

\begin{matrix} {(\partial_{z}^{α} \ln (u (t, z) R {(t, z)}^{- 1}))}_{z = 0} = & {(\partial_{z}^{α} \ln u (t, z))}_{z = 0} - {(\partial_{z}^{α} \ln R (t, z))}_{z = 0} \\ = & - (α - 1)! (a_{t}^{α} + b_{t}^{α}) . \end{matrix}

(D.7)

Finally, using Eqs. (D.3) and (D.7), one can compute

\begin{matrix} {(\partial_{z}^{α} (\ln \hat{M} (t, z)))}_{z = 0} = & (α - 1)! C^{(α)} \\ where we defined C^{(α)} ≔ & 2 a_{t_{or} - t}^{α} + 2 \sum_{t_{j} \in X} a_{t_{j} - t}^{α} - 2 \sum_{t_{j} \in W} a_{t_{j} - t}^{α} - \sum_{t_{j} \in Y} (a_{t_{j} - t}^{α} + b_{t_{j} - t}^{α}) . \end{matrix}

Plugging this into Eq. (D.2) and noting that ${(\partial_{z}^{i} \hat{M} (t, z))}_{z = 0} = i! M_{t}^{(i)}$ , we get

M_{t}^{(i)} = \sum_{α = 1}^{i} (\begin{matrix} i - 1 \\ α - 1 \end{matrix}) \frac{(i - α)! (α - 1)!}{i!} M_{t}^{(i - α)} C^{(α)} = \frac{1}{i} \sum_{α = 1}^{i} M_{t}^{(i - α)} C^{(α)}

which is the result stated in Corollary 4.3.1.
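This recursion is straightforward to code. The sketch below takes the zeroth coefficient `M0` and a list `C` with `C[alpha]` holding $C^{(α)}$ (index 0 unused), and returns the first coefficients; the function name is hypothetical. As an independent check that does not rely on the model, it can be run on the toy function ${(1 - az)}^{- m}$ , whose logarithmic derivatives give $C^{(α)} = m a^{α}$ and whose coefficients are the binomial numbers $\binom{m + i - 1}{i} a^{i}$ .

```python
from math import comb

def coefficients_from_log_derivatives(M0, C):
    """Apply M[i] = (1 / i) * sum_{alpha = 1..i} M[i - alpha] * C[alpha].

    C[0] is ignored; the returned list has the same length as C.
    """
    M = [M0]
    for i in range(1, len(C)):
        M.append(sum(M[i - alpha] * C[alpha] for alpha in range(1, i + 1)) / i)
    return M

# Toy check on (1 - a z)^(-m), chosen only because its coefficients are known.
a, m, n_terms = 0.5, 3, 8
C_toy = [0.0] + [m * a ** alpha for alpha in range(1, n_terms)]
toy = coefficients_from_log_derivatives(1.0, C_toy)
expected = [comb(m + i - 1, i) * a ** i for i in range(n_terms)]
```

Each coefficient costs a sum over the previous ones, so computing the first N coefficients takes on the order of $N^{2}$ operations once the $C^{(α)}$ are available.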

Appendix E. Inductions across the epochs

E.1. Proof of Proposition 3.1

We prove the proposition by induction across the epochs.

If we observe only the first epoch and the $k_{0}$ leaves at present, then the initial condition is $L_{0}^{(i)} = ρ^{k_{0}} {(1 - ρ)}^{i}$ and we get, at any time t across the first epoch $(0, t_{1})$ , $L_{t}^{(i)} = u_{t}^{i} p_{t}^{k_{0}}$ , which satisfies Proposition 3.1.

Suppose that we have observed so far, i.e. on $(0, t_{h + 1})$ , v sampled ancestors, w removed leaves at times $t_{j} \in W$ , x branching events at times $t_{j} \in X$ , y non-removed leaves at times $t_{j} \in Y$ . And suppose that Proposition 3.1 is verified across epoch $(t_{h}, t_{h + 1})$ . Let us have a look at what happens across epoch $(t_{h + 1}, t_{h + 2})$ .

The observed punctual event $t_{h + 1}$ can either be,

1.
a removed ancestral leaf. Update (3.10) then applies. Subsequently, the number of sampled lineages increases by one and formula (3.9) applies on the next epoch, leading to
$\begin{matrix} L_{t}^{(i)} = & u_{t}^{i} W_{t} \\ where W_{t} = & λ^{x} ψ^{v + (w + 1) + y} {(1 - r)}^{v + y} r^{w + 1} p_{t_{h + 1}}^{k} {(\frac{p_{t}}{p_{t_{h + 1}}})}^{k + 1} \prod_{t_{j} \in X} p_{t_{j}} \prod_{t_{j} \in Y} u_{t_{j}} p_{t_{j}}^{- 1} \prod_{t_{j} \in W} p_{t_{j}}^{- 1} \\ = & λ^{x} ψ^{v + (w + 1) + y} {(1 - r)}^{v + y} r^{w + 1} p_{t}^{k + 1} \prod_{t_{j} \in X} p_{t_{j}} \prod_{t_{j} \in Y} u_{t_{j}} p_{t_{j}}^{- 1} \prod_{t_{j} \in W \cup \{t_{h + 1}\}} p_{t_{j}}^{- 1} . \end{matrix}$
2.
a non-removed ancestral leaf. Update (3.11) then applies. Subsequently, the number of sampled lineages increases by one and formula (3.9) applies on the next epoch, leading to
$\begin{matrix} L_{t}^{(i)} = & u_{t}^{i} W_{t} \\ where W_{t} = & λ^{x} ψ^{v + w + (y + 1)} {(1 - r)}^{v + (y + 1)} r^{w} p_{t_{h + 1}}^{k} u_{t_{h + 1}} {(\frac{p_{t}}{p_{t_{h + 1}}})}^{k + 1} \prod_{t_{j} \in X} p_{t_{j}} \prod_{t_{j} \in Y} u_{t_{j}} p_{t_{j}}^{- 1} \prod_{t_{j} \in W} p_{t_{j}}^{- 1} \\ = & λ^{x} ψ^{v + w + (y + 1)} {(1 - r)}^{v + (y + 1)} r^{w} p_{t}^{k + 1} \prod_{t_{j} \in X} p_{t_{j}} \prod_{t_{j} \in Y \cup \{t_{h + 1}\}} u_{t_{j}} p_{t_{j}}^{- 1} \prod_{t_{j} \in W} p_{t_{j}}^{- 1} . \end{matrix}$
3.
a non-removed sampled ancestor along a branch. Update (3.12) then applies. The number of sampled lineages does not change, and formula (3.9) applies on the next epoch, leading to
$\begin{matrix} L_{t}^{(i)} = & u_{t}^{i} W_{t} \\ where W_{t} = & λ^{x} ψ^{(v + 1) + w + y} {(1 - r)}^{(v + 1) + y} r^{w} p_{t_{h + 1}}^{k} {(\frac{p_{t}}{p_{t_{h + 1}}})}^{k} \prod_{t_{j} \in X} p_{t_{j}} \prod_{t_{j} \in Y} u_{t_{j}} p_{t_{j}}^{- 1} \prod_{t_{j} \in W} p_{t_{j}}^{- 1} \\ = & λ^{x} ψ^{v + w + y + 1} {(1 - r)}^{v + y + 1} r^{w} p_{t}^{k} \prod_{t_{j} \in X} p_{t_{j}} \prod_{t_{j} \in Y} u_{t_{j}} p_{t_{j}}^{- 1} \prod_{t_{j} \in W} p_{t_{j}}^{- 1} . \end{matrix}$
4.
a branching event between two sampled lineages. Update (3.13) then applies. The number of sampled lineages decreases by one, and formula (3.9) applies on the next epoch, leading to
$\begin{matrix} L_{t}^{(i)} = & u_{t}^{i} W_{t} \\ where W_{t} = & λ^{x + 1} ψ^{v + w + y} {(1 - r)}^{v + y} r^{w} p_{t_{h + 1}}^{k} {(\frac{p_{t}}{p_{t_{h + 1}}})}^{k - 1} \prod_{t_{j} \in X} p_{t_{j}} \prod_{t_{j} \in Y} u_{t_{j}} p_{t_{j}}^{- 1} \prod_{t_{j} \in W} p_{t_{j}}^{- 1} \\ = & λ^{x + 1} ψ^{v + w + y} {(1 - r)}^{v + y} r^{w} p_{t}^{k - 1} \prod_{t_{j} \in X \cup \{t_{h + 1}\}} p_{t_{j}} \prod_{t_{j} \in Y} u_{t_{j}} p_{t_{j}}^{- 1} \prod_{t_{j} \in W} p_{t_{j}}^{- 1} . \end{matrix}$

In all four cases, Proposition 3.1 is satisfied across epoch $(t_{h + 1}, t_{h + 2})$ .

E.2. Proof of Proposition 4.3

This Proposition is also proven by induction across the epochs.

We start at $t_{or} = t_{n}$ with $k = 1$ lineage. Across epoch $(t_{n - 1}, t_{n})$ , applying Proposition 4.1 with $F (z) = 1$ and $k = 1$ , we get $\hat{M} (t, z) = R (t_{or} - t, z)$ , which verifies Proposition 4.3.

Suppose now that Proposition 4.3 is verified across epoch $(t_{h}, t_{h + 1})$ and that we observed, on $(t_{h}, t_{or})$ , v sampled ancestors, w removed leaves at times $t_{j} \in W$ , x branching events at times $t_{j} \in X$ , y non-removed leaves at times $t_{j} \in Y$ . Let us have a look at what happens on $(t_{h - 1}, t_{h})$ .

Punctual event $t_{h}$ can either be,

a removed leaf. The number of sampled lineages then goes from $1 + x - y - w$ to $x - y - w$ , and applying update (4.10) followed by Proposition 4.1 leads to

\begin{matrix} \hat{M} (t, z) = & λ^{x} ψ^{v + (w + 1) + y} r^{w + 1} {(1 - r)}^{v + y} R (t_{or} - t_{h}, u (t_{h} - t, z)) R {(t_{h} - t, z)}^{x - y - w} \prod_{t_{j} \in X} R (t_{j} - t_{h}, u (t_{h} - t, z)) \\ \prod_{t_{j} \in W} R {(t_{j} - t_{h}, u (t_{h} - t, z))}^{- 1} \prod_{t_{j} \in Y} u (t_{j} - t_{h}, u (t_{h} - t, z)) R {(t_{j} - t_{h}, u (t_{h} - t, z))}^{- 1} \\ = & λ^{x} ψ^{v + (w + 1) + y} r^{w + 1} {(1 - r)}^{v + y} \frac{R (t_{or} - t, z)}{R (t_{h} - t, z)} R {(t_{h} - t, z)}^{x - y - w} \\ \prod_{t_{j} \in X} \frac{R (t_{j} - t, z)}{R (t_{h} - t, z)} \prod_{t_{j} \in W} \frac{R (t_{h} - t, z)}{R (t_{j} - t, z)} \prod_{t_{j} \in Y} u (t_{j} - t, z) \frac{R (t_{h} - t, z)}{R (t_{j} - t, z)} \\ = & λ^{x} ψ^{v + (w + 1) + y} r^{w + 1} {(1 - r)}^{v + y} R (t_{or} - t, z) \\ \prod_{t_{j} \in X} R (t_{j} - t, z) \prod_{t_{j} \in W \cup \{t_{h}\}} R {(t_{j} - t, z)}^{- 1} \prod_{t_{j} \in Y} u (t_{j} - t, z) R {(t_{j} - t, z)}^{- 1} . \end{matrix}

where the first to second equality is detailed in Appendix A, and the second to third comes after canceling out the $R (t_{h} - t, z)$ factors.

a non-removed leaf. The number of sampled lineages then goes from $1 + x - y - w$ to $x - y - w$ , and applying update (4.11) followed by Proposition 4.1 leads to

\begin{matrix} \hat{M} (t, z) = & λ^{x} ψ^{v + w + (y + 1)} r^{w} {(1 - r)}^{v + (y + 1)} \frac{R (t_{or} - t, z)}{R (t_{h} - t, z)} R {(t_{h} - t, z)}^{x - y - w} u (t_{h} - t, z) \\ \prod_{t_{j} \in X} \frac{R (t_{j} - t, z)}{R (t_{h} - t, z)} \prod_{t_{j} \in W} \frac{R (t_{h} - t, z)}{R (t_{j} - t, z)} \prod_{t_{j} \in Y} u (t_{j} - t, z) \frac{R (t_{h} - t, z)}{R (t_{j} - t, z)} \\ = & λ^{x} ψ^{v + w + (y + 1)} r^{w} {(1 - r)}^{v + (y + 1)} R (t_{or} - t, z) \\ \prod_{t_{j} \in X} R (t_{j} - t, z) \prod_{t_{j} \in W} R {(t_{j} - t, z)}^{- 1} \prod_{t_{j} \in Y \cup \{t_{h}\}} u (t_{j} - t, z) R {(t_{j} - t, z)}^{- 1} . \end{matrix}

a sampled ancestor. The number of sampled lineages then remains unchanged and equal to $1 + x - y - w$ . Applying update (4.12) followed by Proposition 4.1 leads to

\begin{matrix} \hat{M} (t, z) = & λ^{x} ψ^{(v + 1) + w + y} r^{w} {(1 - r)}^{(v + 1) + y} \frac{R (t_{or} - t, z)}{R (t_{h} - t, z)} R {(t_{h} - t, z)}^{1 + x - y - w} \\ \prod_{t_{j} \in X} \frac{R (t_{j} - t, z)}{R (t_{h} - t, z)} \prod_{t_{j} \in W} \frac{R (t_{h} - t, z)}{R (t_{j} - t, z)} \prod_{t_{j} \in Y} u (t_{j} - t, z) \frac{R (t_{h} - t, z)}{R (t_{j} - t, z)} \\ = & λ^{x} ψ^{(v + 1) + w + y} r^{w} {(1 - r)}^{(v + 1) + y} R (t_{or} - t, z) \\ \prod_{t_{j} \in X} R (t_{j} - t, z) \prod_{t_{j} \in W} R {(t_{j} - t, z)}^{- 1} \prod_{t_{j} \in Y} u (t_{j} - t, z) R {(t_{j} - t, z)}^{- 1} . \end{matrix}

a branching time. The number of sampled lineages then goes from $1 + x - y - w$ to $2 + x - y - w$ , and applying update (4.15) followed by Proposition 4.1 leads to

\begin{matrix} \hat{M} (t, z) = & λ^{x + 1} ψ^{v + w + y} r^{w} {(1 - r)}^{v + y} \frac{R (t_{or} - t, z)}{R (t_{h} - t, z)} R {(t_{h} - t, z)}^{2 + x - y - w} \\ \prod_{t_{j} \in X} \frac{R (t_{j} - t, z)}{R (t_{h} - t, z)} \prod_{t_{j} \in W} \frac{R (t_{h} - t, z)}{R (t_{j} - t, z)} \prod_{t_{j} \in Y} u (t_{j} - t, z) \frac{R (t_{h} - t, z)}{R (t_{j} - t, z)} \\ = & λ^{x + 1} ψ^{v + w + y} r^{w} {(1 - r)}^{v + y} R (t_{or} - t, z) \\ \prod_{t_{j} \in X \cup \{t_{h}\}} R (t_{j} - t, z) \prod_{t_{j} \in W} R {(t_{j} - t, z)}^{- 1} \prod_{t_{j} \in Y} u (t_{j} - t, z) R {(t_{j} - t, z)}^{- 1} . \end{matrix}

In all these cases, Proposition 4.3 is verified across epoch $(t_{h - 1}, t_{h})$ , which ends the proof.

Appendix F. Using a generating function to solve for $L_{t}$

F.1. A slightly different strategy

Recall that $L_{t}$ verifies the following ODEs,

\begin{matrix} {\dot{L}}_{t}^{(i)} = & - γ (i + k) L_{t}^{(i)} + λ (2 k + i) L_{t}^{(i + 1)} + μ {iL}_{t}^{(i - 1)} \\ L_{0}^{(i)} = & ρ^{k_{0}} {(1 - ρ)}^{i} . \end{matrix}

If we introduce the corresponding generating function,

\hat{L} (t, z) = \sum_{i = 0}^{\infty} z^{i} L_{t}^{(i)}

then the initial condition on L translates into,

\hat{L} (0, z) = \sum_{i = 0}^{\infty} {(z (1 - ρ))}^{i} ρ^{k_{0}} = ρ^{k_{0}} \frac{1}{1 - z (1 - ρ)}, \forall z \in (- \frac{1}{1 - ρ}, \frac{1}{1 - ρ}) .

The ODE translates into a PDE, but not as nicely as for $M_{t}$ , see below,

\begin{matrix} \partial_{t} \hat{L} = & \sum_{i = 0}^{\infty} z^{i} (- γ (i + k) L_{t}^{(i)} + λ (2 k + i) L_{t}^{(i + 1)} + μ {iL}_{t}^{(i - 1)}) \\ = & - γ k \sum_{i = 0}^{\infty} z^{i} L_{t}^{(i)} - γ \sum_{i = 1}^{\infty} {iz}^{i} L_{t}^{(i)} + λ \sum_{i = 1}^{\infty} z^{i - 1} (2 k + i - 1) L_{t}^{(i)} + μ \sum_{i = 0}^{\infty} (i + 1) z^{i + 1} L_{t}^{(i)} \\ = & - γ k \hat{L} - γ z \partial_{z} \hat{L} + (2 k - 1) λ \frac{1}{z} (\hat{L} - L_{t}^{(0)}) + λ \partial_{z} \hat{L} + μ z^{2} \partial_{z} \hat{L} + μ z \hat{L} \\ = & (- γ k + (2 k - 1) λ \frac{1}{z} + μ z) \hat{L} + (μ z^{2} - γ z + λ) \partial_{z} \hat{L} - (2 k - 1) λ \frac{1}{z} \hat{L} (t, 0) . \end{matrix}

We are thus left with the following PDE problem,

\begin{matrix} \hat{L} (0, z) = \frac{ρ^{k_{0}}}{1 - z (1 - ρ)} \\ - z \partial_{t} \hat{L} + (μ z^{3} - γ z^{2} + λ z) \partial_{z} \hat{L} + (μ z^{2} - γ kz + (2 k - 1) λ) \hat{L} - (2 k - 1) λ \hat{L} (t, 0) = 0 . \end{matrix}

(F.1)

This remaining term with $\hat{L} (t, 0)$ complicates things a little bit. However, the initial condition on $\hat{L}$ provides us with a first candidate function to satisfy this PDE.

F.2. Solution

We introduce below function f, and show that it satisfies the PDE problem (F.1).

f (t, z) ≔ \frac{p_{t}^{k}}{1 - {zu}_{t}} .

First, we observe that it satisfies the initial condition. We then need to check that it satisfies the PDE, and to do so we expand each of the four components of Eq. (F.1).

The first one gives us,

\begin{matrix} - z \partial_{t} f = & - z \frac{k {\dot{p}}_{t} p_{t}^{k - 1} (1 - {zu}_{t}) + z {\dot{u}}_{t} p_{t}^{k}}{{(1 - {zu}_{t})}^{2}} \\ = & - z \frac{k (2 λ u_{t} - γ) (1 - {zu}_{t}) + (λ u_{t}^{2} - γ u_{t} + μ) z}{{(1 - {zu}_{t})}^{2}} p_{t}^{k} \\ = & \frac{λ (- 2 {kzu}_{t} + (2 k - 1) z^{2} u_{t}^{2}) + γ (kz - (k - 1) z^{2} u_{t}) - μ z^{2}}{{(1 - {zu}_{t})}^{2}} p_{t}^{k} . \end{matrix}

We then turn to the second component,

(μ z^{3} - γ z^{2} + λ z) \partial_{z} f = \frac{λ {zu}_{t} - γ z^{2} u_{t} + μ z^{3} u_{t}}{{(1 - {zu}_{t})}^{2}} p_{t}^{k} .

And the third one,

\begin{matrix} (μ z^{2} - γ kz + (2 k - 1) λ) f = & \frac{(1 - {zu}_{t}) (μ z^{2} - γ kz + (2 k - 1) λ)}{{(1 - {zu}_{t})}^{2}} p_{t}^{k} \\ = & \frac{λ ((2 k - 1) - (2 k - 1) {zu}_{t}) + γ (- kz + {kz}^{2} u_{t}) + μ (z^{2} - z^{3} u_{t})}{{(1 - {zu}_{t})}^{2}} p_{t}^{k} . \end{matrix}

And the fourth and final one,

\begin{matrix} - (2 k - 1) λ f (t, 0) = & \frac{- λ (2 k - 1) {(1 - z u_{t})}^{2}}{{(1 - z u_{t})}^{2}} p_{t}^{k} \\ = & \frac{λ (- (2 k - 1) z^{2} u_{t}^{2} + 2 (2 k - 1) z u_{t} - (2 k - 1))}{{(1 - z u_{t})}^{2}} p_{t}^{k} . \end{matrix}

Putting everything together, we can now check that indeed,

- z \partial_{t} f + (μ z^{3} - γ z^{2} + λ z) \partial_{z} f + (μ z^{2} - γ k z + (2 k - 1) λ) f - (2 k - 1) λ f (t, 0) = 0 .
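
The cancellation can also be checked mechanically. The sketch below is not part of the original derivation. It evaluates the left-hand side of the PDE in (F.1) on f in exact rational arithmetic, at arbitrary parameter values. The expressions used for ${\dot{p}}_{t}$ and ${\dot{u}}_{t}$ are those substituted in the expansion of the first component above, and are taken as given here. The function names and the test values are hypothetical. As a second check, it verifies that the coefficients $p_{t}^{k} u_{t}^{i}$ of the power-series expansion of f in $z$ satisfy the ODE system for $L_{t}^{(i)}$ term by term.

```python
from fractions import Fraction as F


def time_derivatives(p, u, lam, gamma, mu):
    """Time derivatives of p_t and u_t, as substituted in the text above."""
    p_dot = (2 * lam * u - gamma) * p
    u_dot = lam * u**2 - gamma * u + mu
    return p_dot, u_dot


def pde_residual(z, p, u, k, lam, gamma, mu):
    """Left-hand side of the PDE in (F.1), evaluated on f = p^k / (1 - z u)."""
    p_dot, u_dot = time_derivatives(p, u, lam, gamma, mu)
    denom = 1 - z * u
    f = p**k / denom
    f_at_zero = p**k  # f(t, 0)
    # Quotient rule in t and in z.
    df_dt = (k * p_dot * p**(k - 1) * denom + z * u_dot * p**k) / denom**2
    df_dz = u * p**k / denom**2
    return (-z * df_dt
            + (mu * z**3 - gamma * z**2 + lam * z) * df_dz
            + (mu * z**2 - gamma * k * z + (2 * k - 1) * lam) * f
            - (2 * k - 1) * lam * f_at_zero)


def ode_residual(i, p, u, k, lam, gamma, mu):
    """Residual of the ODE for L^(i) when L^(j) = p^k u^j (coefficients of f)."""
    p_dot, u_dot = time_derivatives(p, u, lam, gamma, mu)

    def coeff(j):
        return p**k * u**j if j >= 0 else F(0)

    # d/dt (p^k u^i) by the product rule; the second term vanishes for i = 0.
    lhs = k * p_dot * p**(k - 1) * u**i
    if i > 0:
        lhs += i * p**k * u**(i - 1) * u_dot
    rhs = (-gamma * (i + k) * coeff(i)
           + lam * (2 * k + i) * coeff(i + 1)
           + mu * i * coeff(i - 1))
    return lhs - rhs


# Arbitrary rational test points (hypothetical values, chosen with z * u != 1).
points = [
    dict(z=F(1, 2), p=F(1, 3), u=F(1, 2), k=1, lam=F(1), gamma=F(3), mu=F(1)),
    dict(z=F(-2, 5), p=F(7, 9), u=F(3, 4), k=4,
         lam=F(2, 7), gamma=F(5, 3), mu=F(1, 6)),
]
for pt in points:
    others = {name: value for name, value in pt.items() if name != "z"}
    print(pde_residual(**pt), [ode_residual(i, **others) for i in range(4)])
```

Because the arithmetic is exact, a residual of zero at such points reflects the algebraic identity rather than a floating-point coincidence. It does not replace the term-by-term expansion above.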

While the branching and $ψ$-sampling with removal updates do not alter this solution, all the other updates do. Further work is thus needed to look for other solutions to this same PDE with different initial conditions.

References

Al-Mohy A.H., Higham N.J. A new scaling and squaring algorithm for the matrix exponential. SIAM J. Matrix Anal. Appl. 2010;31(3):970–989. [Google Scholar]
Andrieu C., Doucet A., Holenstein R. Particle markov chain monte carlo methods. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 2010;72(3):269–342. [Google Scholar]
Barido-Sottani J., Aguirre-Fernández G., Hopkins M.J., Stadler T., Warnock R. Ignoring stratigraphic age uncertainty leads to erroneous estimates of species divergence times under the fossilized birth–death process. Proc. Roy. Soc. B. 2019;286(1902):20190685. doi: 10.1098/rspb.2019.0685. [DOI] [PMC free article] [PubMed] [Google Scholar]
Billaud, O., Moen, D.S., Parsons, T.L., Morion, H., 2019. Estimating diversity through time using molecular phylogenies: Old and species-poor frog families are the remnants of a diverse past. bioRxiv. [DOI] [PubMed]
Crawford F.W., Minin V.N., Suchard M.A. Estimation for general birth-death processes. J. Am. Stat. Assoc. 2014;109(506):730–747. doi: 10.1080/01621459.2013.866565. [DOI] [PMC free article] [PubMed] [Google Scholar]
Etienne R.S., Haegeman B., Stadler T., Aze T., Pearson P.N., Purvis A., Phillimore A.B. Diversity-dependence brings molecular phylogenies closer to agreement with the fossil record. Proc. Roy. Soc. B Biol. Sci. 2012;279(1732):1300–1309. doi: 10.1098/rspb.2011.1439. [DOI] [PMC free article] [PubMed] [Google Scholar]
Foote M. Origination and extinction components of taxonomic diversity: general problems. Paleobiology. 2000;26(S4):74–102. [Google Scholar]
Freyman, W.A., Höhna, S., 11 2018. Stochastic character mapping of state-dependent diversification reveals the tempo of evolutionary decline in self-compatible onagraceae lineages. [DOI] [PubMed]
Gavryushkina A., Heath T.A., Ksepka D.T., Stadler T., Welch D., Drummond A.J. Bayesian total-evidence dating reveals the recent crown radiation of penguins. Syst. Biol. 2016;66(1):57–73. doi: 10.1093/sysbio/syw060. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gray R.D., Drummond A.J., Greenhill S.J. Language phylogenies reveal expansion pulses and pauses in pacific settlement. Science. 2009;323(5913):479–483. doi: 10.1126/science.1166858. [DOI] [PubMed] [Google Scholar]
Gupta, A., Manceau, M., Vaughan, T., Khammash, M., Stadler, T., 2019. The probability distribution of the reconstructed phylogenetic tree with occurrence data. bioRxiv, 679365 [DOI] [PubMed]
Heath T.A., Huelsenbeck J.P., Stadler T. The fossilized birth–death process for coherent calibration of divergence-time estimates. Proc. Nat. Acad. Sci. 2014;111(29):2957–2966. doi: 10.1073/pnas.1319091111. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kendall D.G. On the generalized ‘birth-and-death’ process. Ann. Math. Stat. 1948;19:1–15. [Google Scholar]
Laudanno G., Haegeman B., Etienne R.S. Additional analytical support for a new method to compute the likelihood of diversification models. Bull. Math. Biol. 2020;82(2):22. doi: 10.1007/s11538-020-00698-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Leventhal G.E., Günthard H.F., Bonhoeffer S., Stadler T. Using an epidemiological model for phylogenetic inference reveals density dependence in hiv transmission. Mol. Biol. Evol. 2013;31(1):6–17. doi: 10.1093/molbev/mst172. [DOI] [PMC free article] [PubMed] [Google Scholar]
Levin, D.A., Peres, Y., 2017. Markov chains and mixing times. Vol. 107. American Mathematical Soc
Maddison W.P., Midford P.E., Otto S.P. Estimating a binary character’s effect on speciation and extinction. Syst. Biol. 2007;56(5):701–710. doi: 10.1080/10635150701607033. [DOI] [PubMed] [Google Scholar]
Moler C., Van Loan C. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev. 2003;45(1):3–49. [Google Scholar]
Morlon H., Parsons T.L., Plotkin J.B. Reconciling molecular phylogenies with the fossil record. Proc. Natl. Acad. Sci. USA. 2011;108(39):16327–16332. doi: 10.1073/pnas.1102543108. [DOI] [PMC free article] [PubMed] [Google Scholar]
Morlon H., Potts M.D., Plotkin J.B. Inferring the dynamics of diversification: a coalescent approach. PLoS Biol. 2010;8(9):1–13. doi: 10.1371/journal.pbio.1000493. [DOI] [PMC free article] [PubMed] [Google Scholar]
Nee S., May R.M., Harvey P.H. The reconstructed evolutionary process. Philos. Trans. Roy. Soc. Lond. B Biol. Sci. 1994;344(1309):305–311. doi: 10.1098/rstb.1994.0068. [DOI] [PubMed] [Google Scholar]
Quental T.B., Marshall C.R. Diversity dynamics: molecular phylogenies need the fossil record. Trends Ecol. Evol. 2010;25(8):434–441. doi: 10.1016/j.tree.2010.05.002. [DOI] [PubMed] [Google Scholar]
Ratmann O., Hodcroft E.B., Pickles M., Cori A., Hall M., Lycett S., Colijn C., Dearlove B., Didelot X., Frost S. Phylogenetic tools for generalized hiv-1 epidemics: findings from the pangea-hiv methods comparison. Mol. Biol. Evol. 2016;34(1):185–203. doi: 10.1093/molbev/msw217. [DOI] [PMC free article] [PubMed] [Google Scholar]
Smith S., Shahrezaei V. General transient solution of the one-step master equation in one dimension. Phys. Rev. E. 2015;91(6) doi: 10.1103/PhysRevE.91.062119. [DOI] [PubMed] [Google Scholar]
Stadler T. Sampling-through-time in birth–death trees. J. Theor. Biol. 2010;267(3):396–404. doi: 10.1016/j.jtbi.2010.09.010. [DOI] [PubMed] [Google Scholar]
Stadler, T., 2011. Inferring speciation and extinction processes from extant species data. Proc. Natl. Acad. Sci. USA [DOI] [PMC free article] [PubMed]
Stadler T. How can we improve accuracy of macroevolutionary rate estimates? Syst. Biol. 2012;62(2):321–329. doi: 10.1093/sysbio/sys073. [DOI] [PubMed] [Google Scholar]
Stadler T., Bonhoeffer S. Uncovering epidemiological dynamics in heterogeneous host populations using phylogenetic methods. Philos. Trans. Roy. Soc. Lond. B Biol. Sci. 2013;368(1614):20120198. doi: 10.1098/rstb.2012.0198. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stadler T., Kouyos R., von Wyl V., Yerly S., Böni J., Bürgisser P., Klimkait T., Joos B., Rieder P., Xie D. Estimating the basic reproductive number from viral sequence data. Mol. Biol. Evol. 2011;29(1):347–357. doi: 10.1093/molbev/msr217. [DOI] [PubMed] [Google Scholar]
Stadler, T., Kühnert, D., Bonhoeffer, S., Drummond, A.J., 2013. Birth–death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proc. Nat. Acad. Sci. 110 (1), 228–233 [DOI] [PMC free article] [PubMed]
Stadler T., Steel M. Swapping birth and death: symmetries and transformations in phylodynamic models. Syst. Biol. 2019;68(5):852–858. doi: 10.1093/sysbio/syz039. [DOI] [PMC free article] [PubMed] [Google Scholar]
Starrfelt J., Liow L.H. How many dinosaur species were there? fossil bias and true richness estimated using a poisson sampling model. Philos. Trans. Roy. Soc. B Biol. Sci. 2016;371(1691):20150219. doi: 10.1098/rstb.2015.0219. [DOI] [PMC free article] [PubMed] [Google Scholar]
Vaughan, T.G., Leventhal, G.E., Rasmussen, D.A., Drummond, A.J., Welch, D., Stadler, T., 05 2019. Estimating epidemic incidence and prevalence from genomic data. Mol. Biol. Evol. [DOI] [PMC free article] [PubMed]
Yule G.U. Philos. Trans. Roy. Soc. Lond. B; J.C: 1925. A mathematical theory of evolution, based on the conclusions of Dr. Willis, F.R.S. [Google Scholar]

