2023 Sep 2;87(3):53. doi: 10.1007/s00285-023-01988-4

The expected loss of feature diversity (versus phylogenetic diversity) following rapid extinction at the present

Marcus Overwater¹, Daniel Pelletier², Mike Steel¹
PMCID: PMC10475005  PMID: 37658909

Abstract

The current rapid extinction of species leads not only to their loss but also to the disappearance of the unique features they harbour, which have evolved along the branches of the underlying evolutionary tree. One proxy for estimating the feature diversity (FD) of a set S of species at the tips of a tree is ‘phylogenetic diversity’ (PD): the sum of the branch lengths of the subtree connecting the species in S. For a phylogenetic tree that evolves under a standard birth–death process, and which is then subject to a sudden extinction event at the present (the simple ‘field of bullets’ model with a survival probability of s per species), the proportion of the original PD that is retained after extinction at the present is known to converge quickly to a particular concave function φPD(s) as t grows. To investigate how the loss of FD mirrors the loss of PD for a birth–death tree, we model FD by assuming that distinct discrete features arise randomly and independently along the branches of the tree at rate r and are lost at a constant rate ν. We derive an exact mathematical expression for the ratio φFD(s) of the two expected feature diversities (prior to and following an extinction event at the present) as t becomes large. We find that although φFD behaves similarly to φPD (and coincides with it for ν=0), when ν>0, φFD(s) is described by a function that is different from φPD(s). We also derive an exact expression for the expected number of features that are present in precisely one extant species. Our paper begins by establishing some generic properties of FD in a more general (non-phylogenetic) setting and applies this to fixed trees, before considering the setting of random (birth–death) trees.

Keywords: Feature diversity, Phylogenetic diversity, Birth–death processes, Extinction

Introduction

Phylogenetic trees provide a way to quantify biodiversity and the extent to which it might be lost in the face of the current mass extinction event. One such biodiversity measure is phylogenetic diversity (PD), which associates to each subset S of extant species, the sum of the branch lengths of the underlying evolutionary tree that connects (just) these species to the root of the tree. This measure, pioneered by Faith (1992), provides a more complete measure of biodiversity than merely counting the number of species in S (see e.g. Miller et al. 2018). Moreover, if new features evolve along the branches of a tree and are never lost, then the resulting features present amongst the species in the subset S (the feature diversity (FD) of S) are directly correlated with the phylogenetic diversity of S (Wicke et al. 2021). However, when features are lost, it has recently been shown mathematically that under simple (deterministic or stochastic) models of feature gain and loss, FD necessarily deviates from PD except for very trivial types of phylogenetic trees (Theorems 2 and 3 of Rosindell et al. (2022)). More generally, the question of the extent to which PD captures feature or functional diversity has been the subject of considerable debate in the biological literature (see Devictor et al. 2010; Mazel et al. 2017, 2018, 2019; Owen et al. 2019; Tucker et al. 2018, 2019).

In this paper, we investigate a related question: under a standard phylogenetic diversification model and a simple stochastic process of feature gain and loss, what proportion of feature diversity is expected to be lost in a mass extinction event at the present? And how does this proportion compare with the corresponding proportion of phylogenetic diversity that is lost? Although the latter proportion (for PD) has been determined in earlier work, here we provide a corresponding result for FD, and show how it differs from PD when the rate of feature loss is non-zero. Our results suggest that the relative loss of FD under a mass extinction event at the present is greater than the relative loss of PD. We also investigate the number of features that are expected to be found in just one species at the present.

We begin by considering the properties of FD in a more general setting based purely on the species themselves (i.e. not involving any underlying phylogenetic tree or network) and then consider FD on fixed phylogenetic trees before presenting the results for (random) birth–death trees. Some of the results of these earlier sections are applied in the later sections.

General properties of feature diversity without reference to phylogenies

This section considers the generic properties of expected FD for sets of species, and thus assumes no underlying phylogenetic tree, no stochastic process that generates a tree, and no model of feature evolution. We mostly follow the notation from Wicke et al. (2021).

Definitions

Let $X$ be a labelled set of species, and for each $x \in X$, let $F_x$ be a non-empty set of discrete features (e.g. genes, genomic inserts, traits) that are associated with species $x$.

  • The collection of ordered pairs $\mathcal{F} = \{(x, F_x) : x \in X\}$ is called a feature assignment on $X$.

  • For any subset $A$ of $X$, let $F(A) = \bigcup_{x \in A} F_x$, and let $\mu : F(X) \to \mathbb{R}_{>0}$ be a function assigning some richness or novelty value to each feature $f \in F(X)$.

  • The feature diversity of a subset $A$ of a set of species $X$ is defined as
    $$\mathrm{FD}(A) = \sum_{f \in F(A)} \mu(f). \tag{2.1}$$

A default option for $\mu$ is to set $\mu(f) = 1$, which simply counts the number of features present. Notice that for any subset $A$ of $X$ we have $\mathrm{FD}(A) \le \sum_{x \in A}\mathrm{FD}(\{x\}) = \sum_{x \in A}\sum_{f \in F_x}\mu(f)$, with equality if and only if no feature is shared by two species in $A$.
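To make Eq. (2.1) concrete, the following minimal Python sketch computes $\mathrm{FD}(A)$ for a feature assignment stored as a dictionary from species to feature sets. The function name `feature_diversity` and the toy assignment are hypothetical illustrations, not code from the paper; the default weight $\mu(f) = 1$ simply counts features, and the two printed values show the inequality above being strict when a feature is shared.

```python
from typing import Callable, Dict, Hashable, Iterable, Set

def feature_diversity(assignment: Dict[str, Set[Hashable]],
                      subset: Iterable[str],
                      mu: Callable[[Hashable], float] = lambda f: 1.0) -> float:
    """FD(A): sum of mu(f) over the union of the feature sets F_x, x in A (Eq. 2.1).
    Hypothetical helper for illustration only."""
    features: Set[Hashable] = set()
    for x in subset:
        features |= assignment[x]          # union, so a shared feature counts once
    return sum(mu(f) for f in features)

# Hypothetical feature assignment on X = {x1, x2, x3}.
F = {"x1": {"a", "b"}, "x2": {"b", "c"}, "x3": {"d"}}
A = ["x1", "x2"]
print(feature_diversity(F, A))                      # 3.0 (features a, b, c)
print(sum(feature_diversity(F, [x]) for x in A))    # 4.0 (b counted twice)
```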

Feature diversity loss under a ‘Field of Bullets’ model of extinction at the present

Definition 2.1

Consider a sudden extinction event taking place across a set of species (Raup 1993). In the generalised field of bullets (g-FOB) model, each species $x \in X$ either survives the extinction event (with probability $s_x$) or disappears (with probability $1 - s_x$), and these survival events are assumed to be independent among the species. We write $\mathbf{s}(X)$ (or, more briefly, $\mathbf{s}$) to denote the vector $(s_x : x \in X)$, and we let $X'$ denote the (random) subset of $X$ corresponding to the species that survive the extinction event. If $s_x = s$ for all $x \in X$, then we have the simpler field of bullets (FOB) model.

We define the following quantity:

$$\varphi(\mathcal{F}, \mathbf{s}) = \frac{\mathrm{FD}(X')}{\mathrm{FD}(X)},$$

which is the proportion of feature diversity that survives the extinction event. In the case of the FOB model, we denote this ratio by $\varphi(\mathcal{F}, s)$.

Proposition 2.1

For the g-FOB model,

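$$\mathbb{E}[\varphi(\mathcal{F}, \mathbf{s})] = \sum_{f \in F(X)} \tilde{\mu}(f)\Big(1 - \prod_{x :\, f \in F_x}(1 - s_x)\Big),$$

where $\tilde{\mu}(f) = \mu(f) \big/ \sum_{f' \in F(X)} \mu(f')$ are the normalised $\mu$ values (which sum to 1).

For the FOB model, the equation in Proposition 2.1 simplifies to

$$\mathbb{E}[\varphi(\mathcal{F}, s)] = \sum_{f \in F(X)} \tilde{\mu}(f)\big(1 - (1 - s)^{n_f}\big), \tag{2.2}$$

where $n_f = |\{x \in X : f \in F_x\}|$ is the number of species in $X$ that possess feature $f$.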

Proof

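We have $\varphi(\mathcal{F}, \mathbf{s}) = \sum_{f \in F(X)} \tilde{\mu}(f) \cdot I_f$, where $I_f$ is the Bernoulli variable that takes the value 1 if at least one species in $X$ with feature $f$ survives the g-FOB extinction event, and 0 otherwise. Applying linearity of expectation and noting that, by the independence of the survival events, $\mathbb{P}(I_f = 1) = 1 - \prod_{x :\, f \in F_x}(1 - s_x)$ gives the result.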
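Proposition 2.1 can be checked by brute force on a small example. The sketch below (hypothetical function names and toy data, not from the paper) evaluates the closed-form expectation and compares it with an explicit average of $\mathrm{FD}(X')/\mathrm{FD}(X)$ over all $2^{|X|}$ survival outcomes of the g-FOB model; the enumeration is only feasible for a handful of species.

```python
from itertools import product
from math import prod

def expected_ratio(assignment, survival, mu=lambda f: 1.0):
    """E[phi(F, s)] under the g-FOB model via the closed form of Proposition 2.1 (hypothetical helper)."""
    features = set().union(*assignment.values())
    total = sum(mu(f) for f in features)
    result = 0.0
    for f in features:
        # Probability that every species carrying f goes extinct.
        all_lost = prod(1 - survival[x] for x, fx in assignment.items() if f in fx)
        result += mu(f) / total * (1 - all_lost)
    return result

def expected_ratio_brute_force(assignment, survival, mu=lambda f: 1.0):
    """Same expectation, averaging FD(X')/FD(X) over all 2^|X| independent survival outcomes."""
    species = list(assignment)
    total = sum(mu(f) for f in set().union(*assignment.values()))
    result = 0.0
    for outcome in product([False, True], repeat=len(species)):
        p = prod(survival[x] if alive else 1 - survival[x]
                 for x, alive in zip(species, outcome))
        kept = set().union(*(assignment[x] for x, alive in zip(species, outcome) if alive))
        result += p * sum(mu(f) for f in kept) / total
    return result

F = {"x1": {"a", "b"}, "x2": {"b", "c"}, "x3": {"d"}}
s = {"x1": 0.5, "x2": 0.2, "x3": 0.9}
print(expected_ratio(F, s), expected_ratio_brute_force(F, s))   # both approximately 0.55
```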

Notice that $\mathbb{E}[\varphi(\mathcal{F}, s)] = s$ when $s = 0$ or $s = 1$. The behaviour between these two extreme values of $s$ is described next.

Proposition 2.2

Under the FOB model, the following hold:

  • (i)

    $\mathbb{E}[\varphi(\mathcal{F}, s)] \ge s$ for all $s \in [0, 1]$.

  • (ii)

    $\mathbb{E}[\varphi(\mathcal{F}, s)] = s$ for some $s \in (0, 1)$ if and only if the sets $F_x$ in $\mathcal{F}$ are pairwise disjoint, in which case $\mathbb{E}[\varphi(\mathcal{F}, s)] = s$ for all $s \in [0, 1]$.

  • (iii)

    If the sets Fx are not pairwise disjoint, then E[φ(F,s)] is a strictly concave increasing function of s.

Proof

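Part (i): Since $n_f \ge 1$ in Eq. (2.2) and $1 - (1-s)^{n_f} \ge s$ for all $n_f \ge 1$, the claimed inequality is immediate. Part (ii): If $n_f = 1$ then $1 - (1-s)^{n_f} = s$ for all $s \in [0,1]$, and if $n_f > 1$ then $1 - (1-s)^{n_f} > s$ for every $s \in (0,1)$; hence equality holds at some (equivalently, every) $s \in (0,1)$ exactly when $n_f = 1$ for every feature $f$, that is, when the sets $F_x$ are pairwise disjoint. Part (iii): By Eq. (2.2),

$$\frac{d}{ds}\mathbb{E}[\varphi(\mathcal{F}, s)] = \sum_{f \in F(X)} \tilde{\mu}(f)\, n_f (1-s)^{n_f - 1} > 0,$$

and

$$\frac{d^2}{ds^2}\mathbb{E}[\varphi(\mathcal{F}, s)] = -\sum_{f \in F(X) :\, n_f \ge 2} \tilde{\mu}(f)\, n_f (n_f - 1)(1-s)^{n_f - 2} < 0,$$

for all $s \in (0, 1)$, from which the claimed results follow.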
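A short numerical check of Proposition 2.2, written directly from Eq. (2.2). The sketch takes the multiplicities $n_f$ and weights $\mu(f)$ as two parallel lists (a hypothetical input format) and tests the inequality, the strict concavity (via negative second differences on a grid) and the disjoint case; the example multiplicities are arbitrary.

```python
def expected_ratio_fob(counts, weights, s):
    """Eq. (2.2): sum over features of mu~(f) * (1 - (1 - s)**n_f).

    counts[i] is n_f for feature i and weights[i] is mu(f); hypothetical helper."""
    total = sum(weights)
    return sum(w / total * (1 - (1 - s) ** n) for n, w in zip(counts, weights))

grid = [i / 100 for i in range(101)]
values = [expected_ratio_fob([1, 2, 3], [1.0, 1.0, 2.0], s) for s in grid]

# Proposition 2.2(i): E[phi] >= s on [0, 1].
assert all(v >= s - 1e-12 for v, s in zip(values, grid))
# Proposition 2.2(iii): some n_f >= 2, so the curve is strictly concave (negative second differences).
assert all(values[i - 1] - 2 * values[i] + values[i + 1] < 0 for i in range(1, 100))
# Proposition 2.2(ii): pairwise disjoint feature sets (all n_f = 1) give E[phi] = s.
assert abs(expected_ratio_fob([1, 1], [1.0, 3.0], 0.37) - 0.37) < 1e-12
```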

Approximating φ(F,s) by its expected value

For each $n \ge 1$, let $X_n$ be a labelled set of $n$ species with feature assignment $\mathcal{F}_n$. Let $X_n'$ denote the (random) set of species after a g-FOB extinction event with survival probability vector $\mathbf{s}(X_n)$, which assigns each $x \in X_n$ a corresponding survival probability $s_n(x)$; we write $\mathbf{s}_n$ for this vector. Note that we make no assumption regarding how the species in $X_n$ and $X_m$ are related (e.g. they may be disjoint, overlapping or nested), or about any a priori relationship between $\mathbf{s}(X_n)$ and $\mathbf{s}(X_m)$.

In the following result, we provide a sufficient condition under which $\varphi(\mathcal{F}_n, \mathbf{s}_n)$ is likely to be close to its expected value $\mathbb{E}[\varphi(\mathcal{F}_n, \mathbf{s}_n)]$ (readily computed via Proposition 2.1) when the number of species $n$ is large. This condition allows some species to contribute proportionately more FD than the average species does, and this proportion may grow with $n$, provided that it does not grow too quickly.

Proposition 2.3

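  (a) For $\epsilon > 0$,
    $$\mathbb{P}\Big(\big|\varphi(\mathcal{F}_n, \mathbf{s}_n) - \mathbb{E}[\varphi(\mathcal{F}_n, \mathbf{s}_n)]\big| > \epsilon\Big) \le 2\exp\!\left(-\frac{2\epsilon^2}{R_n}\right),$$
    where $R_n = \sum_{x \in X_n}\left(\frac{\mathrm{FD}(\{x\})}{\mathrm{FD}(X_n)}\right)^2$.
  (b) If $R_n \to 0$ as $n \to \infty$, then $\varphi(\mathcal{F}_n, \mathbf{s}_n) - \mathbb{E}[\varphi(\mathcal{F}_n, \mathbf{s}_n)] \xrightarrow{P} 0$.

  (c) Let $\mathrm{avFD}(X_n) = \mathrm{FD}(X_n)/n$ (the average contribution of each species to the total FD), and suppose that for each $x \in X_n$, $\mathrm{FD}(\{x\})/\mathrm{avFD}(X_n) \le B_n$, where $B_n^2 = o(n)$ (e.g. $B_n = n^{1/3}$). We then have the following convergence in probability as $n \to \infty$:
    $$\varphi(\mathcal{F}_n, \mathbf{s}_n) - \mathbb{E}[\varphi(\mathcal{F}_n, \mathbf{s}_n)] \xrightarrow{P} 0.$$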

Proof

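Let $Y_n = \{Y_i : i \in [n]\}$ be a sequence of Bernoulli random variables where each $Y_i$ takes the value 1 if species $x_i$ survives and 0 otherwise. For the g-FOB model, the random variables $Y_i$ are independent. We can write $\varphi(\mathcal{F}_n, \mathbf{s}_n) = h(Y_n)$, where $h(y_1, \ldots, y_n)$ is the ratio $\mathrm{FD}(\{x_i \in X_n : y_i = 1\})/\mathrm{FD}(X_n)$. Observe that for any particular value of $i \in \{1, \ldots, n\}$, if we change $y_i$ (from 0 to 1 or vice versa) to give $y_i'$, then:

$$|h(y_1, \ldots, y_i, \ldots, y_n) - h(y_1, \ldots, y_i', \ldots, y_n)| \le \frac{\mathrm{FD}(\{x_i\})}{\mathrm{FD}(X_n)}.$$

We now apply McDiarmid’s inequality (McDiarmid 1989) to obtain (for each $\epsilon > 0$):

$$\mathbb{P}\Big(\big|\varphi(\mathcal{F}_n, \mathbf{s}_n) - \mathbb{E}[\varphi(\mathcal{F}_n, \mathbf{s}_n)]\big| \ge \epsilon\Big) \le 2\exp\!\left(-\frac{2\epsilon^2}{R_n}\right). \tag{2.3}$$

This establishes Part (a). Part (b) now follows immediately, and Part (c) follows from Part (b), since the condition in Part (c) implies that

$$R_n = \sum_{x \in X_n}\left(\frac{\mathrm{FD}(\{x\})}{n \cdot \mathrm{avFD}(X_n)}\right)^2 \le \sum_{x \in X_n}\left(\frac{B_n}{n}\right)^2 = n \cdot \frac{B_n^2}{n^2} = \frac{B_n^2}{n} \to 0,$$

as $n \to \infty$.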
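The quantity $R_n$ in Proposition 2.3 is easy to evaluate once the single-species values $\mathrm{FD}(\{x\})$ are known. The following sketch (hypothetical helper names) computes $R_n$ and the right-hand side of Eq. (2.3) for an assumed toy family in which every species carries one private feature, so that $R_n = 1/n$ and the bound shrinks as $n$ grows (it is vacuous, i.e. above 1, for small $n$).

```python
from math import exp

def r_n(single_fd, total_fd):
    """R_n = sum over species x of (FD({x}) / FD(X_n))**2, as in Proposition 2.3(a). Hypothetical helper."""
    return sum((v / total_fd) ** 2 for v in single_fd)

def mcdiarmid_bound(eps, single_fd, total_fd):
    """Right-hand side 2 exp(-2 eps^2 / R_n) of Eq. (2.3)."""
    return 2 * exp(-2 * eps ** 2 / r_n(single_fd, total_fd))

# Toy family (assumption): each of n species has one private feature, so FD({x}) = 1 and FD(X_n) = n.
for n in (10, 100, 1000):
    print(n, r_n([1.0] * n, float(n)), mcdiarmid_bound(0.1, [1.0] * n, float(n)))
```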

Remark

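Note that the conclusion of Proposition 2.3(c) can fail when the condition stated in Part (c) does not hold, even for the simpler FOB model. We provide a simple example to demonstrate this. Let $X_n = \{1, \ldots, n\}$ and let $F_i = \{f\}$ for $i = 1, \ldots, n-1$ and $F_n = \{g\}$, where $f$ and $g$ are distinct features with $\mu(f) = \mu(g) = 1$. Here $\mathrm{FD}(\{x\}) = 1$ for every species and $\mathrm{avFD}(X_n) = 2/n$, so the smallest admissible $B_n$ is $n/2$ and $B_n^2 = o(n)$ fails. In this case:

$$\varphi(\mathcal{F}_n, s) = \begin{cases} 0, & \text{w.p. } (1-s)^n;\\ \tfrac{1}{2}, & \text{w.p. } (1-s)^{n-1}s + \big(1 - (1-s)^{n-1}\big)(1-s);\\ 1, & \text{w.p. } \big(1 - (1-s)^{n-1}\big)s. \end{cases}$$

As $n \to \infty$, $\varphi(\mathcal{F}_n, s)$ takes the value $\tfrac{1}{2}$ with probability tending to $1 - s$ and the value 1 with probability tending to $s$, while $\mathbb{E}[\varphi(\mathcal{F}_n, s)] \to (1 + s)/2$. Therefore, for $s \in (0,1)$, $\varphi(\mathcal{F}_n, s) - \mathbb{E}[\varphi(\mathcal{F}_n, s)]$ does not converge in probability to 0 (or to any constant) as $n \to \infty$.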
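The failure of concentration in this example can be seen directly from its exact distribution. The sketch below (hypothetical function name) tabulates that distribution; as $n$ grows, the probability mass settles on $\tfrac{1}{2}$ and 1 with probabilities close to $1 - s$ and $s$, rather than on a single value.

```python
def counterexample_distribution(n, s):
    """Exact law of phi(F_n, s) when species 1..n-1 carry feature f and species n carries g.
    Hypothetical helper; keys are the possible values of phi, values are their probabilities."""
    q = 1 - s
    p_f_survives = 1 - q ** (n - 1)          # at least one of the n - 1 carriers of f survives
    return {
        0.0: q ** n,                         # everything dies
        0.5: q ** (n - 1) * s + p_f_survives * q,   # exactly one of f, g survives
        1.0: p_f_survives * s,               # both f and g survive
    }

for n in (5, 50, 500):
    dist = counterexample_distribution(n, s=0.3)
    print(n, {v: round(p, 4) for v, p in dist.items()})
```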

Consequences for phylogenetic diversity

Proposition 2.2 and Proposition 2.3 provide a simple way to derive certain results concerning phylogenetic diversity, both on rooted trees and on rooted phylogenetic networks (specifically for the subNet diversity measure described in Wicke and Fischer (2018)). To each edge $e$ of a rooted phylogenetic tree (or network), associate some unique feature $f_e$, give it the value $\mu(f_e) = \ell(e)$, where $\ell(e)$ is the length of edge $e$ in the tree (or network), and assign $f_e$ to every leaf that lies below $e$. For any subset $Y$ of $X$ (the leaf set of $T$) we then have:

$$\mathrm{PD}(Y) = \mathrm{FD}(Y).$$

It follows that for the simple field of bullets models, PD satisfies the concavity properties described in Proposition 2.2, where the condition that the sets Fx are pairwise disjoint corresponds to the tree being a star tree. These results were established for rooted trees by specific tree-based arguments (see e.g. Sect. 5 of Lambert and Steel (2013)), but they directly follow from the more general framework above, and extend beyond trees.
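The encoding of PD as FD described above can be sketched in a few lines. Here a rooted tree is stored as a child-to-parent dictionary, with edge lengths keyed by the child vertex (an assumed representation, and a toy tree without a stem edge); each leaf carries one feature per edge on its path to the root, weighted by that edge's length, so summing the weights of the union of features gives $\mathrm{PD}(Y)$.

```python
def pd_via_fd(parent, length, leaves):
    """PD(Y) computed as FD(Y): each leaf carries the feature f_e of every edge e on its root path,
    with mu(f_e) = length of e. Edge (parent[v], v) is named by its child v. Hypothetical helper."""
    features = set()
    for x in leaves:
        v = x
        while v in parent:          # walk up until the root, which has no parent
            features.add(v)         # union over leaves: each edge is counted once
            v = parent[v]
    return sum(length[e] for e in features)

# Hypothetical tree: root -> u (1.0), u -> a (2.0), u -> b (0.5), root -> c (3.0)
parent = {"u": "root", "a": "u", "b": "u", "c": "root"}
length = {"u": 1.0, "a": 2.0, "b": 0.5, "c": 3.0}
print(pd_via_fd(parent, length, ["a", "b"]))   # 3.5 = 1.0 + 2.0 + 0.5
print(pd_via_fd(parent, length, ["a", "c"]))   # 6.0 = 1.0 + 2.0 + 3.0
```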

Using this same link between PD and FD, Proposition 2.3 provides a further application to any sequence of rooted phylogenetic trees $T_n$ with $n$ leaves and ultrametric edge lengths. For example, the ratio of surviving PD to original PD under the FOB model converges in probability to the expected value of this ratio for a sequence of trees $T_n$ if the total PD of $T_n$ grows at least as fast as $nL/n^{\beta}$ (i.e. $Ln^{1-\beta}$), where $L$ is the height of the tree and $0 < \beta < 1/2$. (Each leaf $x$ has $\mathrm{FD}(\{x\}) = L$, so the ratio in Proposition 2.3(c) is then at most $B_n = n^{\beta}$, and $B_n^2 = n^{2\beta} = o(n)$.) This condition holds, for example, for Yule trees (Stadler and Steel 2012).

The feature diversity ratio φ for a model of feature evolution on a phylogenetic tree

Consider a rooted binary phylogenetic tree $T_n$, in which each edge has a positive length that corresponds to a temporal duration, with the root $\rho$ of $T_n$ placed at the top of a stem edge at time 0, and with each leaf in the leaf set $X_n = \{x_1, \ldots, x_n\}$ of $T_n$ placed at time $t$ (as in Fig. 1). For convenience, we assume in this section that $\mu(f) = 1$ for all features; however, this assumption can be relaxed (e.g. by allowing $\mu(f)$ to take values in a fixed interval $[a, b]$ with $a > 0$, according to some fixed distribution and independently between features) without altering the results significantly.

Fig. 1.


An example of feature gain and loss on a tree and the impact of extinction at the present. Left: New features arise (indicated by +) and disappear (indicated by −) along the branches of the tree. To simplify this example, no features are present at time 0; however, our results do not require this restriction. In total, there are five features present amongst the leaves of the tree at time $t$, namely $\{\alpha, \beta, \gamma, \delta, \epsilon\}$. Right: An extinction event at the present (marked in the figure) results in the loss of three of the extant species, leaving just three features present among the leaves of the resulting pruned tree, namely $\{\alpha, \beta, \gamma\}$. Thus, the ratio of surviving features to total features is $3/5 = 0.6$.

We let Fρ denote the (possibly empty) set of features present at time 0 (i.e. at the top of the stem edge), and we assume throughout this section that |Fρ| is bounded by some fixed constant B, independent of n.

On $T_n$, we apply a stochastic process in which (discrete) features arise independently along the branches of this tree at rate $r$, and each feature that arises is novel (i.e. it has not appeared earlier elsewhere in the tree). Once a feature arises, it is carried forward in time along the branches of $T_n$ (and is passed on to the two lineages arising at any speciation event). In addition, any feature can be lost from a lineage at any point according to a continuous-time pure-death process that operates at rate $\nu$ per feature. This model was investigated in a different setting in Huson and Steel (2004) and studied more recently in Rosindell et al. (2022). Under this process, each leaf $x$ of $T_n$ will have a (possibly empty) set of features $F_x$. Fig. 1 illustrates the processes described. Note that $|F_x|$ (for any $x \in X_n$) and $\mathrm{FD}(X_n)$ are now random variables.

Let $N_\ell$ denote the number of features at the end of any path $P$ in $T_n$ that starts at time $t = 0$ and ends at time $\ell$. Then $N_\ell$ is described by a continuous-time Markov process that has a constant birth rate ($r$) and a linear death rate ($\nu$ per feature). It is then a classical result (Feller 1950) that $N_\ell$ has expected value $\frac{r}{\nu}(1 - e^{-\nu\ell}) + F_0 e^{-\nu\ell}$, where $F_0 = |F_\rho|$ (the number of features present at time 0), and that $N_\ell$ converges to a Poisson distribution with mean $r/\nu$ as $\ell$ grows. Moreover, if $N_0 = 0$ then $N_\ell$ has a Poisson distribution for any value of $\ell > 0$ (Feller 1950, p. 461), and so, regardless of the value of $F_\rho$, the random variable $|F_x \setminus F_\rho|$ has a Poisson distribution with mean $\frac{r}{\nu}(1 - e^{-\nu\ell(x)})$, where $\ell(x)$ is the length of the path from the root to leaf $x$. The (random) number of features at any leaf $x$ of $T_n$ (i.e. $|F_x|$) has the same distribution as $N_{\ell(x)}$. Note that for distinct leaves $x$ and $y$ of $T_n$, the random variables $|F_x|$ and $|F_y|$ are not independent.
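The two facts quoted from Feller (1950) translate into simple formulas. The sketch below (hypothetical function names, illustrative parameter values) evaluates the expected number of features at the end of a path of length $\ell$ and the Poisson probabilities that apply when $F_0 = 0$.

```python
from math import exp, factorial

def mean_features(ell, r, nu, f0=0):
    """E[N_ell] = (r/nu)(1 - e^{-nu*ell}) + F_0 e^{-nu*ell}: gain at rate r, loss at rate nu per feature.
    Hypothetical helper."""
    return (r / nu) * (1 - exp(-nu * ell)) + f0 * exp(-nu * ell)

def poisson_pmf(k, mean):
    """P(N_ell = k) when F_0 = 0, since N_ell is then Poisson with mean (r/nu)(1 - e^{-nu*ell})."""
    return exp(-mean) * mean ** k / factorial(k)

m = mean_features(ell=10.0, r=0.3, nu=0.1)        # about 1.90 features expected on a root-to-leaf path
probs = [poisson_pmf(k, m) for k in range(6)]
```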

Notational convention: Henceforth we will write $\mathrm{FD}(T_n)$ in place of $\mathrm{FD}(X_n)$, and $\mathrm{FD}(T_n^s)$ in place of $\mathrm{FD}(X_n')$, where $X_n'$ is the subset of the leaves $X_n$ of $T_n$ that survive under a FOB model with survival probability $s$. We will let $F_\rho$ denote the set of features present at the root vertex $\rho$ at the top of the stem edge.

Lemma 3.1

Suppose that $F_\rho = \emptyset$ (so $F_0 = 0$). Then for any value of $n \ge 1$ the following hold:

  • (i)

    The random variable FD(Tn) has a Poisson distribution.

  • (ii)
    The expected value of FD(Tn) satisfies the following bound:
    $\mathbb{E}[\mathrm{FD}(T_n)] \le \frac{r}{\nu}\, n$.

Proof

Part (i): We use induction on $n$. Since $F_\rho = \emptyset$, the base case ($n = 1$) holds by the results mentioned in the previous paragraph. Thus, suppose that $n \ge 2$, and let $T_n$ be a binary tree with $n$ leaves and with $F_\rho = \emptyset$. Then:

$$\mathrm{FD}(T_n) = \mathrm{FD}(T^{(1)}) + \mathrm{FD}(T^{(2)}) + G,$$

where the trees $T^{(i)}$ ($i = 1, 2$) are the subtrees of $T_n$ obtained by deleting the stem edge and its endpoints (and setting the set of features at the top of the stem edge of each $T^{(i)}$ equal to the empty set), and $G$ is the number of features that arise on the stem edge of $T_n$ and are present in at least one leaf of $T_n$.

Notice that $\mathrm{FD}(T^{(1)})$, $\mathrm{FD}(T^{(2)})$ and $G$ are independent random variables, and, by induction, $\mathrm{FD}(T^{(1)})$ and $\mathrm{FD}(T^{(2)})$ each have a Poisson distribution. Conditional on the number $Z$ of features that are present at the end of the stem edge, $G$ has a binomial distribution with parameters $Z$ and $p$, where $p$ is the probability that a single feature present at the end of the stem edge is present in at least one leaf of $T_n$ (distinct features evolve independently). Since $Z$ has a Poisson distribution, and a binomial thinning of a Poisson variable is again Poisson, $G$ has a Poisson distribution. It follows that $\mathrm{FD}(T_n)$, being the sum of three independent Poisson variables, also has a Poisson distribution. This establishes the induction step and thus Part (i).

Part (ii): We have:

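$$\mathrm{FD}(T_n) = \Big|\bigcup_{x \in X_n} F_x\Big| \le \sum_{x \in X_n}|F_x|.$$

Now, for each $x \in X_n$, we have $\mathbb{E}[|F_x|] = \frac{r}{\nu}(1 - e^{-\nu L_x}) \le \frac{r}{\nu}$, where $L_x$ denotes the length of the path from the top of the stem edge of $T_n$ to the leaf $x$. Thus $\mathbb{E}[\mathrm{FD}(T_n)] \le \frac{r}{\nu} \cdot n$.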

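Example ($n = 2$). Consider the process described on $T_2$ with $F_\rho = \emptyset$. Let $\ell_0$ denote the length of the stem edge, and $\ell$ the length of each of the two pendant edges. Then $\mathrm{FD}(T_2)$ has a Poisson distribution with expected value

$$\frac{r}{\nu}(1 - e^{-\nu\ell_0})\big(1 - (1 - e^{-\nu\ell})^2\big) + 2\,\frac{r}{\nu}(1 - e^{-\nu\ell}),$$

and $\mathrm{FD}(T_2^s)$ has expected value

$$s^2\,\mathbb{E}[\mathrm{FD}(T_2)] + 2s(1-s)\left[\frac{r}{\nu}(1 - e^{-\nu\ell_0})e^{-\nu\ell} + \frac{r}{\nu}(1 - e^{-\nu\ell})\right].$$

In particular,

$$\frac{\mathbb{E}[\mathrm{FD}(T_2^s)]}{\mathbb{E}[\mathrm{FD}(T_2)]} = s\left(s + 2(1-s)\,\frac{1 - e^{-\nu(\ell_0 + \ell)}}{(1 - e^{-\nu\ell_0})\big(1 - (1 - e^{-\nu\ell})^2\big) + 2(1 - e^{-\nu\ell})}\right).$$

When $\ell_0 = 0$, the right-hand side of this equation equals $s$, but for all other values it is strictly greater than $s$ (for $s \in (0,1)$). Moreover, by differentiating $\mathbb{E}[\mathrm{FD}(T_2^s)]/\mathbb{E}[\mathrm{FD}(T_2)]$ it can be verified that this ratio is monotone decreasing as $\nu$ increases, for all positive values of $\ell$ and $\ell_0$; in particular, for $s \in (0,1)$ and $\nu > 0$, this ratio is always less than the expected proportion of PD that survives in $T_2$.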
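The closed form for the cherry $T_2$ can be evaluated directly. The following sketch (hypothetical function names, arbitrary parameter values) implements the ratio above, in which $r/\nu$ cancels, together with the corresponding PD ratio $s^2 + 2s(1-s)(\ell_0 + \ell)/(\ell_0 + 2\ell)$, which is the $\nu \to 0$ limit of the same expression. It can be used to check numerically that the FD ratio exceeds $s$ when $\ell_0 > 0$, decreases in $\nu$, and stays below the PD ratio.

```python
from math import exp

def fd_ratio_two_leaf(s, nu, ell0, ell):
    """E[FD(T_2^s)] / E[FD(T_2)] for the cherry with stem length ell0 and pendant lengths ell,
    assuming no features at time 0 (the factor r/nu cancels). Hypothetical helper."""
    a = 1 - exp(-nu * (ell0 + ell))                     # expected features at one leaf, times nu/r
    d = (1 - exp(-nu * ell0)) * (1 - (1 - exp(-nu * ell)) ** 2) + 2 * (1 - exp(-nu * ell))
    return s * (s + 2 * (1 - s) * a / d)

def pd_ratio_two_leaf(s, ell0, ell):
    """Expected surviving proportion of PD on the same tree (the nu -> 0 limit of the FD ratio)."""
    return s * s + 2 * s * (1 - s) * (ell0 + ell) / (ell0 + 2 * ell)

for nu in (0.1, 1.0, 5.0):
    print(nu, fd_ratio_two_leaf(0.4, nu, 0.5, 1.0), pd_ratio_two_leaf(0.4, 0.5, 1.0))
```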

A limit result for sequences of trees

The main result of this section is Theorem 3.1, and its proof relies on establishing a sequence of preliminary lemmas.

Lemma 3.2

Let $\beta > 0$ be a fixed constant. Given a sequence $T_n$ of trees with leaf set $X_n$, let $E_n$ be the event that $|F_x| \le n^{\beta}$ for every $x \in X_n$. Then $\lim_{n \to \infty}\mathbb{P}(E_n) = 1$.

Proof

We combine the Bonferroni inequality with a standard right-tail probability bound for a Poisson variable. Firstly, observe that:

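$$\mathbb{P}(E_n) \ge 1 - \sum_{x \in X_n}\mathbb{P}(|F_x| > n^{\beta}) = 1 - n\,\mathbb{P}(|F_{x_1}| > n^{\beta}), \tag{3.1}$$

where the equality holds because all leaves lie at time $t$, so the variables $|F_x|$ are identically distributed.

Now, $|F_{x_1}|$ can be written as the sum of two independent random variables $U_{x_1} + V_{x_1}$, where $U_{x_1}$ has a Poisson distribution with mean $m \le r/\nu$, and $V_{x_1} \le |F_\rho| \le B$ with probability 1 ($U_{x_1}$ counts the features at $x_1$ that arise after time 0, $V_{x_1}$ counts the features in $F_\rho$ that remain present at $x_1$, and $B$ is the global bound on $|F_\rho|$ described near the start of Sect. 3). Thus, the Chernoff bound on the right-hand tail of a Poisson variable (Mitzenmacher and Upfal 2005, p. 97) gives $\mathbb{P}(U_{x_1} \ge x) \le e^{-m}(em/x)^{x}$ for $x > m$. Since $|F_{x_1}| > n^{\beta}$ implies $U_{x_1} > n^{\beta} - B$, applying this bound with $x = n^{\beta} - B$ shows that $\mathbb{P}(|F_{x_1}| > n^{\beta})$ decays faster than any power of $n$, and so $n\,\mathbb{P}(|F_{x_1}| > n^{\beta}) \to 0$ as $n$ grows. Applying this to the inequality in (3.1) establishes the result.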
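To see how quickly the union bound in (3.1) vanishes, the sketch below evaluates $n \cdot e^{-m}(em/x)^{x}$ with $x = n^{\beta} - B$, the Chernoff-based upper bound on $n\,\mathbb{P}(|F_{x_1}| > n^{\beta})$. The values $m = 3$, $B = 5$ and $\beta = 1/3$ are arbitrary illustrations, and the helper names are hypothetical.

```python
from math import exp, log

def chernoff_tail(m, x):
    """Chernoff bound P(U >= x) <= e^{-m} (e m / x)^x for U ~ Poisson(m), valid when x > m."""
    if x <= m:
        return 1.0                                  # the bound says nothing useful here
    return exp(-m + x * (1 + log(m) - log(x)))      # computed in log form to avoid overflow

def union_bound_term(n, beta, m, B):
    """Upper bound on n * P(|F_{x1}| > n^beta), using |F_{x1}| <= U + B. Hypothetical helper."""
    return n * chernoff_tail(m, n ** beta - B)

for n in (10**4, 10**5, 10**6):
    print(n, union_bound_term(n, beta=1/3, m=3.0, B=5))
```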

Lemma 3.3

Let $(T_n, n \ge 1)$ be a sequence of rooted binary trees, with $T_n$ having leaf set $X_n$, and suppose that $\mathbb{E}[\mathrm{FD}(T_n)] \ge cn$ for some constant $c > 0$. Then $\mathrm{FD}(T_n)/\mathbb{E}[\mathrm{FD}(T_n)]$ converges in probability to 1 as $n \to \infty$.

Proof

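First observe that we can write $\mathrm{FD}(T_n)$ as the sum of two independent random variables, namely $\mathrm{FD}_0(T_n) + K$, where $\mathrm{FD}_0(T_n)$ is the FD value when $F_\rho = \emptyset$ (i.e. the number of features that arise after time 0 and are present in at least one leaf), and $K$ is the number of features present at the root that are also present in at least one leaf. In particular, $K$ is a sum of at most $|F_\rho| \le B$ independent Bernoulli variables (with $B$ independent of $n$), so $\mathrm{Var}[K] \le B$. Thus

$$\mathrm{Var}[\mathrm{FD}(T_n)] = \mathrm{Var}[\mathrm{FD}_0(T_n)] + \mathrm{Var}[K] \le \mathrm{Var}[\mathrm{FD}_0(T_n)] + B.$$

By Lemma 3.1, $\mathrm{FD}_0(T_n)$ is Poisson, so $\mathrm{Var}[\mathrm{FD}_0(T_n)] = \mathbb{E}[\mathrm{FD}_0(T_n)] \le \frac{r}{\nu} \cdot n$; hence $\mathrm{Var}[\mathrm{FD}(T_n)] \le \frac{r}{\nu} \cdot n + B$, and thus:

$$\mathrm{Var}\left[\frac{\mathrm{FD}(T_n)}{n}\right] = \frac{\mathrm{Var}[\mathrm{FD}(T_n)]}{n^2} \to 0, \tag{3.2}$$

as $n \to \infty$. We now apply Chebyshev’s inequality to obtain:

$$\mathbb{P}\left(\left|\frac{\mathrm{FD}(T_n)}{\mathbb{E}[\mathrm{FD}(T_n)]} - 1\right| \ge \epsilon\right) \le \mathrm{Var}\left[\frac{\mathrm{FD}(T_n)}{\mathbb{E}[\mathrm{FD}(T_n)]}\right] \Big/ \epsilon^2, \tag{3.3}$$

and since

$$\frac{\mathrm{FD}(T_n)}{\mathbb{E}[\mathrm{FD}(T_n)]} = \frac{\mathrm{FD}(T_n)}{n} \cdot \frac{n}{\mathbb{E}[\mathrm{FD}(T_n)]},$$

we have:

$$\mathrm{Var}\left[\frac{\mathrm{FD}(T_n)}{\mathbb{E}[\mathrm{FD}(T_n)]}\right] = \mathrm{Var}\left[\frac{\mathrm{FD}(T_n)}{n}\right] \cdot \left(\frac{n}{\mathbb{E}[\mathrm{FD}(T_n)]}\right)^2,$$

and so the term on the right of Eq. (3.3) is bounded above by $\mathrm{Var}\left[\frac{\mathrm{FD}(T_n)}{n}\right] \cdot \frac{1}{\epsilon^2 c^2}$, which converges to 0 as $n \to \infty$ by Eq. (3.2).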

Lemma 3.4

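Let $(T_n, n \ge 1)$ be a sequence of rooted binary trees, with $T_n$ having leaf set $X_n$, and suppose that $\mathbb{E}[\mathrm{FD}(T_n)] \ge cn$ for some constant $c > 0$. Suppose that for each $x \in X_n$, $\mathrm{FD}(\{x\}) \le B_n$, where $B_n^2 = o(n)$. Then $\mathrm{FD}(T_n^s)/\mathbb{E}[\mathrm{FD}(T_n^s)] \xrightarrow{P} 1$ as $n \to \infty$.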

Proof

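Let $W_n = H(Y_n)$, where $Y_n = \{Y_i : i \in [n]\}$ is the sequence of Bernoulli random variables with $Y_i = 1$ if species $x_i$ survives the FOB extinction event and $Y_i = 0$ otherwise, and $H(y_1, \ldots, y_n)$ is the ratio $\mathrm{FD}(\{x_i \in X_n : y_i = 1\})/\mathbb{E}[\mathrm{FD}(T_n^s)]$, where the numerator is as defined in Eq. (2.1) with $\mu(f) = 1$ for all $f$. Observe that for any particular value of $i \in \{1, \ldots, n\}$, if we change $y_i$ (from 0 to 1 or vice versa) to give $y_i'$, then:

$$|H(y_1, \ldots, y_i, \ldots, y_n) - H(y_1, \ldots, y_i', \ldots, y_n)| \le \mathrm{FD}(\{x_i\})/\mathbb{E}[\mathrm{FD}(T_n^s)] \le B_n/\mathbb{E}[\mathrm{FD}(T_n^s)].$$

We now apply McDiarmid’s inequality to obtain (for each $\epsilon > 0$):

$$\mathbb{P}(|W_n - 1| \ge \epsilon) \le 2\exp\!\left(-\frac{2\epsilon^2\,\mathbb{E}[\mathrm{FD}(T_n^s)]^2}{n B_n^2}\right). \tag{3.4}$$

Now, $\mathbb{E}[\mathrm{FD}(T_n^s)] \ge s \cdot \mathbb{E}[\mathrm{FD}(T_n)]$ by Proposition 2.2(i) (taking $\mu(f) = 1$ for all $f$) and, by assumption, $\mathbb{E}[\mathrm{FD}(T_n)] \ge cn$. Thus we obtain:

$$\mathbb{P}(|W_n - 1| \ge \epsilon) \le 2\exp\!\left(-\frac{2\epsilon^2 c^2 s^2\, n}{B_n^2}\right).$$

Therefore, since $n/B_n^2 \to \infty$, $\mathbb{P}(|W_n - 1| \ge \epsilon) \to 0$ as $n \to \infty$. Since this holds for all $\epsilon > 0$, we obtain the claimed result.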

We can now state the main result of this section. Recall that $\mathrm{FD}(T_n^s)$ is the number of features present among the leaves of $T_n$ that survive the FOB extinction event.

Theorem 3.1

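Let $(T_n, n \ge 1)$ be a sequence of binary trees and let features evolve on $T_n$ according to the stochastic feature evolution process described. If $\mathbb{E}[\mathrm{FD}(T_n)] \ge cn$ for some constant $c > 0$, and if $\mathbb{E}[\mathrm{FD}(T_n^s)]/\mathbb{E}[\mathrm{FD}(T_n)]$ converges to a constant $c_s$ as $n$ grows, then

$$\frac{\mathrm{FD}(T_n^s)}{\mathrm{FD}(T_n)} \xrightarrow{P} c_s$$

as $n \to \infty$.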

Before we proceed to the proof, we provide the following comments.

Remarks

  • Theorem 3.1 can fail without the condition $\mathbb{E}[\mathrm{FD}(T_n)] \ge cn$. Fig. 2 provides a simple example of a sequence of trees $T_n$ for which $\mathrm{FD}(T_n^s)/\mathrm{FD}(T_n)$ does not converge in probability to any constant value as $n$ grows. In this example, $F_\rho = \emptyset$ and the tree has height 2, with one leaf having an incident edge of length $\ell$ and the remaining $n - 1$ leaves having incident edges of length $1/n$.

  • A sufficient condition for the inequality $\mathbb{E}[\mathrm{FD}(T_n)] \ge cn$ in Theorem 3.1 to hold is that, for some $\epsilon > 0$ and $\delta > 0$, the proportion of pendant edges of $T_n$ of length at least $\epsilon$ is at least $\delta$ for all $n \ge 1$. Briefly, the reason is that the expected number of features arising on each of these pendant edges and surviving to the end of the edge is bounded away from 0, and the features associated with distinct pendant edges are always different from each other and from any other features arising in the tree.

Fig. 2.


For the tree $T_n$ shown (with $\ell > 1$ fixed and $s \in (0,1)$), the sequence of trees $(T_n, n \ge 2)$ has the property that $\mathrm{FD}(T_n^s)/\mathrm{FD}(T_n)$ does not converge in probability to any constant value as $n$ grows.

Proof of Theorem 3.1

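Let $A_\epsilon(n)$ be the event that $\left|\frac{\mathrm{FD}(T_n^s)}{\mathbb{E}[\mathrm{FD}(T_n^s)]} - 1\right| < \epsilon$, and let $E_n$ be the event described in Lemma 3.2 with $\beta = \frac{1}{3}$. Then $\lim_{n\to\infty}\mathbb{P}(A_\epsilon(n) \mid E_n) = 1$, by Lemma 3.4 (applied with $B_n = n^{1/3}$, so that $B_n^2 = o(n)$). Now,

$$\mathbb{P}(A_\epsilon(n)) = \mathbb{P}(A_\epsilon(n) \mid E_n)\,\mathbb{P}(E_n) + \mathbb{P}(A_\epsilon(n) \mid \overline{E_n})\,\mathbb{P}(\overline{E_n}),$$

and since $\mathbb{P}(E_n) \to 1$ as $n \to \infty$ (by Lemma 3.2), we have $\lim_{n\to\infty}\mathbb{P}(A_\epsilon(n)) = 1$. Since this holds for all $\epsilon > 0$, it follows that $\mathrm{FD}(T_n^s)/\mathbb{E}[\mathrm{FD}(T_n^s)] \xrightarrow{P} 1$ as $n$ grows.

Moreover, from Lemma 3.3, we have $\mathrm{FD}(T_n)/\mathbb{E}[\mathrm{FD}(T_n)] \xrightarrow{P} 1$. Now we can write $\mathrm{FD}(T_n^s)/\mathrm{FD}(T_n)$ as follows:

$$\frac{\mathrm{FD}(T_n^s)}{\mathrm{FD}(T_n)} = \frac{\mathrm{FD}(T_n^s)}{\mathbb{E}[\mathrm{FD}(T_n^s)]} \cdot \frac{\mathbb{E}[\mathrm{FD}(T_n)]}{\mathrm{FD}(T_n)} \cdot \frac{\mathbb{E}[\mathrm{FD}(T_n^s)]}{\mathbb{E}[\mathrm{FD}(T_n)]}.$$

The first two terms on the right of this equation each converge in probability to 1 as $n \to \infty$, whereas the third (deterministic) term converges to $c_s$. This completes the proof.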

We will apply Theorem 3.1 in the next section to establish a result for FD loss on birth–death trees.

Feature diversity ratios in birth–death trees

In this section, we continue to investigate the stochastic model of feature gain and loss, but rather than considering fixed trees, we will now allow the trees themselves to be stochastically generated, following the simple birth–death processes that are common in phylogenetics. Thus, there will now be three stochastic processes in play: the linear-birth/linear-death process that generates the tree, the constant-birth/linear-death process of feature gain and loss operating along the branches of the tree, and the simple FOB extinction event at the present.

Definitions

Let Tt denote a birth–death tree grown from a single lineage for time t with birth and death parameters λ and μ, respectively. We will assume throughout that λ>μ (since otherwise the tree is guaranteed to die out as t becomes large).

On $T_t$, we impose the model of feature gain and loss from the previous section with parameters $r$ and $\nu$. We now apply the FOB model, in which each extant species (i.e. each leaf of $T_t$ present at time $t$) has probability $s > 0$ of surviving and $1 - s$ of becoming extinct (independently across the extant species), and we let $T_t^s$ denote the tree obtained from $T_t$ by removing the species at the present that did not survive this process. We refer to $T_t^s$ as the pruned tree, and to the leaves of $T_t$ and $T_t^s$ that are present at time $t$ as the extant species (or leaves) of these trees (to distinguish them from leaves of $T_t$ that lie at the endpoints of extinct lineages). If $s < 1$, there may now be fewer features present among the (typically smaller number of) extant species in $T_t^s$ than there were in $T_t$.

Let $\mathrm{FD}(T_t)$ be the discrete random variable that counts the number of features present in at least one of the extant leaves of $T_t$, and let $\mathrm{FD}(T_t^s)$ denote the number of these features that are also present in at least one extant leaf in the pruned tree $T_t^s$.

Expected feature diversity

Next, we consider the expected values of FD(Tt) and FD(Tts), where this expectation is across all three processes (the birth–death process that generates Tt, feature gain and loss on this tree, and the species that survive the extinction event at the present under the FOB model). Of particular interest is the ratio of these expectations, and their limit as t becomes large. Specifically, let:

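$$\varphi_{FD}(t, s) = \frac{\mathbb{E}[\mathrm{FD}(T_t^s)]}{\mathbb{E}[\mathrm{FD}(T_t)]} \quad\text{and}\quad \varphi_{FD}(s) = \lim_{t \to \infty}\varphi_{FD}(t, s).$$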

Note that φFD(s) is a function of five parameters (s,r,λ,μ,ν); however, we will show that it is just a function of s and two other parameters. Notice also that once these parameters are fixed, φFD(t,s) and φFD(s) are monotone increasing functions of s taking the value 0 at s=0 and 1 at s=1. Moreover, φFD(t,s) and φFD(s) are both independent of r (the rate at which features arise along any lineage) as we formally show shortly.

Relationship to phylogenetic diversity (PD)

Recall that for a rooted phylogenetic tree $T$ with branch lengths, the PD value of a subset $S$ of leaves, $\mathrm{PD}(S, T)$, is the sum of the lengths of the edges of the subtree of $T$ that connects $S$ and the root of the tree.

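In the special case where $\nu = 0$ and no features are present at time 0, $\mathrm{FD}(T_t^s)$ (conditioned on $T_t$ and on $S_t$) has a Poisson distribution with a mean of $r$ times $\mathrm{PD}(S_t, T_t)$, where $S_t$ is the (random) set of leaves at time $t$ that survive the extinction event at the present. Consequently, $\mathbb{E}[\mathrm{FD}(T_t^s) \mid T_t] = \mathbb{E}[r\,\mathrm{PD}(S_t, T_t) \mid T_t] = r\,\mathbb{E}[\mathrm{PD}(S_t, T_t) \mid T_t]$. Similarly, $\mathbb{E}[\mathrm{FD}(T_t) \mid T_t] = r\,\mathrm{PD}(L_t, T_t)$, where $L_t$ is the set of extant leaves of $T_t$. Taking expectations over $T_t$, the factor $r$ cancels, and in this special case we have $\varphi_{FD}(s) = \varphi_{PD}(s)$, where: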

$$\varphi_{PD}(s) = \lim_{t \to \infty}\frac{\mathbb{E}[\mathrm{PD}(S_t, T_t)]}{\mathbb{E}[\mathrm{PD}(L_t, T_t)]}.$$

The function $\varphi_{PD}(s)$ was explicitly determined by Mooers et al. (2012) and Lambert and Steel (2013) as follows:

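$$\varphi_{PD}(s) := \begin{cases} \dfrac{\rho s}{\rho + s - 1}\cdot\dfrac{\ln\big(s/(1-\rho)\big)}{\ln\big(1/(1-\rho)\big)}, & \text{if } \rho = \mu/\lambda \notin \{0, 1 - s\};\\[2ex] -s\ln(s)/(1-s), & \text{if } \rho = 0 \text{ (i.e. a Yule tree)};\\[1ex] (1-s)/\ln(1/s), & \text{if } \rho = 1 - s. \end{cases} \tag{4.1}$$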
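Eq. (4.1) can be coded with its two limiting cases handled explicitly. In the sketch below (hypothetical function name), `rho` is $\mu/\lambda$ as in Eq. (4.1); the endpoint values $s = 0$ and $s = 1$ are returned directly because the formulas involve $\ln s$, and the tolerance used to detect $\rho = 1 - s$ is our own choice.

```python
from math import log

def phi_pd(s, rho):
    """Eq. (4.1): limiting expected proportion of PD that survives, with rho = mu/lambda in [0, 1).
    Hypothetical helper for illustration."""
    if s in (0.0, 1.0):
        return s
    if rho == 0.0:                                   # Yule tree
        return -s * log(s) / (1 - s)
    if abs(rho - (1 - s)) < 1e-12:                   # removable singularity at rho = 1 - s
        return (1 - s) / log(1 / s)
    return rho * s / (rho + s - 1) * log(s / (1 - rho)) / log(1 / (1 - rho))

for s in (0.1, 0.5, 0.9):
    print(s, phi_pd(s, 0.0), phi_pd(s, 0.8))
```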

Calculating φFD(s)

We first recall a standard result from birth–death theory. Consider a linear birth–death process (starting with a single individual at time 0) with birth rate $\lambda$ and death rate $\theta$. For the individuals present at time $t$, sample each individual independently with sampling probability $s > 0$. Let $X_t$ ($t \ge 0$) denote the number of these sampled individuals, and let $R_t^s(\lambda, \theta)$ be the probability that $X_t > 0$. Then

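$$R_t^s(\lambda, \theta) = \begin{cases} \dfrac{s(\lambda - \theta)}{s\lambda + \big(\lambda(1-s) - \theta\big)e^{(\theta - \lambda)t}}, & \text{if } \lambda \ne \theta;\\[2ex] \dfrac{s}{1 + \lambda s t}, & \text{if } \lambda = \theta. \end{cases} \tag{5.1}$$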

In particular, as $t \to \infty$, $R_t^s(\lambda, \theta)$ converges to 0 if $\lambda \le \theta$ and converges to the strictly positive value $1 - \theta/\lambda$ if $\lambda > \theta$ (Kendall 1948; Yang and Rannala 1997).
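Eq. (5.1) is straightforward to evaluate. The sketch below (hypothetical function name) returns $R_t^s(\lambda, \theta)$, switching to the separate formula when $\lambda = \theta$ exactly; the comments note the $t = 0$ value and the large-$t$ limits stated above.

```python
from math import exp

def survival_prob(t, s, lam, theta):
    """Eq. (5.1): probability that a linear birth-death process (birth lam, death theta), started
    from one individual and sampled with probability s at time t, has at least one sampled survivor.
    Equals s at t = 0; tends to 1 - theta/lam if lam > theta and to 0 if lam <= theta.
    Hypothetical helper."""
    if lam == theta:
        return s / (1 + lam * s * t)
    return s * (lam - theta) / (s * lam + (lam * (1 - s) - theta) * exp((theta - lam) * t))

for t in (0.0, 5.0, 50.0):
    print(t, survival_prob(t, s=0.3, lam=1.0, theta=0.4))
```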

The number of species at time $t$ in $T_t^s$ that carry a copy of a particular feature $f$ that arose at some fixed time $t_0 \in (0, t)$ in $T_t$ is described exactly by the birth–death process $X_{t - t_0}$ with parameters $\lambda$ and $\theta = \mu + \nu$ and sampling probability $s$ at the present (a lineage stops carrying $f$ either when it dies, at rate $\mu$, or when it loses $f$, at rate $\nu$). It follows that, as $t$ becomes large, it becomes increasingly certain that none of the species at time $t$ in the pruned tree will contain feature $f$ if $\lambda \le \mu + \nu$, whereas if $\lambda > \mu + \nu$, there is a positive limiting probability that $f$ will be present in the extant leaves of the pruned tree.

Since $\varphi_{FD}(s) = \varphi_{PD}(s)$ (as given by Eq. (4.1)) for all values of $s$ when $\nu = 0$, in this section we will assume that $\nu > 0$ (in addition to our universal assumption that $\lambda > \mu$). Our main result provides an explicit formula for $\varphi_{FD}(s)$ in Part (a), and describes some of its key properties in Parts (b) and (c).

Theorem 5.1

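Given $\lambda > \mu$ and $\nu > 0$, let $\rho = \frac{\mu + \nu}{\lambda}$ and $\beta = 1 - \frac{\lambda - \mu}{\nu}$. Then:

  (a) $\varphi_{FD}(s) = s \cdot \dfrac{I(s)}{I(1)}$,
    where
    $$I(s) = \int_0^1 \frac{dx}{1 - \frac{s(1 - x^{\beta})}{1 - \rho}}, \quad\text{when } \rho \ne 1,$$
    and
    $$I(s) = \int_0^1 \frac{dx}{\nu/\lambda - s\ln x}, \quad\text{when } \rho = 1.$$
  (b) Conditional on the non-extinction of $T_t$, $\mathrm{FD}(T_t^s)/\mathrm{FD}(T_t)$ converges in probability to $\varphi_{FD}(s)$ as $t \to \infty$.

  (c) $\varphi_{FD}(s)$ is an increasing concave function that satisfies $1 \ge \varphi_{FD}(s) \ge s$ for all $s \in [0, 1]$.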
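Theorem 5.1(a) reduces $\varphi_{FD}(s)$ to two one-dimensional integrals. The sketch below approximates them with a plain midpoint rule (our own choice for self-containment; an adaptive quadrature routine would also work) and evaluates $x^{\beta}$ through logarithms so that very negative $\beta$ (small $\nu$) does not overflow. Function names and the grid size are illustrative assumptions.

```python
from math import exp, log

def _i_integral(s, rho, beta, nu_over_lam, n=20000):
    """Midpoint-rule approximation of I(s) from Theorem 5.1(a). Hypothetical helper."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        if abs(rho - 1.0) < 1e-12:                     # case rho = 1
            total += 1.0 / (nu_over_lam - s * log(x))
        else:
            t = beta * log(x)                          # log of x**beta
            if t > 700.0:                              # only when beta < 0 (rho < 1): x**beta is
                continue                               # astronomically large and the term is ~0
            total += 1.0 / (1.0 - s * (1.0 - exp(t)) / (1.0 - rho))
    return total * h

def phi_fd(s, lam, mu, nu):
    """phi_FD(s) = s I(s) / I(1), with rho = (mu + nu)/lam and beta = 1 - (lam - mu)/nu."""
    if s == 0.0:
        return 0.0
    rho = (mu + nu) / lam
    beta = 1.0 - (lam - mu) / nu
    return s * _i_integral(s, rho, beta, nu / lam) / _i_integral(1.0, rho, beta, nu / lam)

# Yule tree (mu = 0) with nu/lambda = 0.5; compare each value with the line y = s.
for s in (0.1, 0.5, 0.9):
    print(s, round(phi_fd(s, lam=1.0, mu=0.0, nu=0.5), 4))
```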

Remarks

  • (i)

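    Notice that although $\varphi_{FD}(s)$ depends on five parameters $(r, s, \lambda, \mu, \nu)$, Theorem 5.1(a) reveals that just three derived parameters suffice to determine $\varphi_{FD}(s)$, namely $s$ and the ratios $\rho_1 = \mu/\lambda \in [0, 1)$ and $\rho_2 = \nu/\lambda$ (these determine $\rho$ and $\beta$, since $\rho = \rho_1 + \rho_2$ and $\beta = 1 - 1/\rho_2 + \rho_1/\rho_2$). Notice also that $\rho = 1 \iff \beta = 0$ and $\rho > 1 \iff \beta > 0$.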

  • (ii)
    Our proof relies on establishing the following exact expression for $\mathbb{E}[\mathrm{FD}(T_t^s)]$:
    $$\mathbb{E}[\mathrm{FD}(T_t^s)] = r\,e^{(\lambda - \mu)t}\int_0^t e^{-(\lambda - \mu)\tau} R_\tau^s(\lambda, \mu + \nu)\,d\tau + F_0\,R_t^s(\lambda, \mu + \nu), \tag{5.2}$$
    where $F_0$ is the number of features present at time $t = 0$.

    In particular, this also provides an exact expression for $\varphi_{FD}(t, s)$. Notice that the expected number of features present in the pruned tree, divided by the expected number of species in the pruned tree, is $\mathbb{E}[\mathrm{FD}(T_t^s)]/(s\,e^{(\lambda - \mu)t})$, and this ratio converges to $\frac{r(1 - \rho)}{\nu}\int_0^1 \frac{dx}{1 - s - \rho + s x^{\beta}}$ as $t \to \infty$ when $\rho \ne 1$ (via a further analysis of Eq. (5.2) in the Appendix).

  • (iii)

    We saw in Sect. 4.3 that when $\nu = 0$, $\varphi_{FD}(s) = \varphi_{PD}(s)$. At the other extreme, if $\lambda$ and $\mu$ are fixed and we let $\nu \to \infty$, then $\varphi_{FD}(s)$ converges to $s$ (since $\rho \to \infty$ and $\beta \to 1$ in Theorem 5.1(a), so that $I(s) \to 1$). Informally, when $\nu$ is large compared to $\lambda$, most of the features present among the extant leaves of $T_t$ will have arisen near the ends of the pendant edges incident with these extant leaves (a formalisation of this claim appears in Rosindell et al. (2022)); if we now apply the FOB model, then the expected proportion of these features that survive will be close to the expected proportion of leaves that survive, namely $s$.
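As a consistency check between Eq. (5.2) and the limiting per-species ratio in Remark (ii), the sketch below evaluates the $\tau$-integral of Eq. (5.2) numerically for a moderately large $t$ and compares $\mathbb{E}[\mathrm{FD}(T_t^s)]/(s\,e^{(\lambda-\mu)t})$ with the $x$-integral. It assumes $\rho \ne 1$ (so $\lambda \ne \mu + \nu$) and uses the midpoint rule; the helper names and parameter values are hypothetical.

```python
from math import exp, log

def r_sampled(t, s, lam, theta):
    """R_t^s(lam, theta) from Eq. (5.1); this sketch assumes lam != theta."""
    return s * (lam - theta) / (s * lam + (lam * (1 - s) - theta) * exp((theta - lam) * t))

def expected_fd(t, s, lam, mu, nu, r, f0=0, n=20000):
    """Eq. (5.2), with the tau-integral approximated by the midpoint rule (hypothetical helper)."""
    theta, h = mu + nu, t / n
    integral = h * sum(exp(-(lam - mu) * tau) * r_sampled(tau, s, lam, theta)
                       for tau in ((i + 0.5) * h for i in range(n)))
    return r * exp((lam - mu) * t) * integral + f0 * r_sampled(t, s, lam, theta)

def per_species_limit(s, lam, mu, nu, r, n=20000):
    """Limit of E[FD(T_t^s)] / (s e^{(lam-mu)t}) from Remark (ii), valid for rho != 1."""
    rho, beta, h = (mu + nu) / lam, 1 - (lam - mu) / nu, 1.0 / n
    total = 0.0
    for i in range(n):
        z = beta * log((i + 0.5) * h)                  # log of x**beta
        if z < 700.0:                                  # larger z: x**beta overflows, term ~ 0
            total += 1.0 / (1 - s - rho + s * exp(z))
    return r * (1 - rho) / nu * total * h

lam, mu, nu, r, s = 1.0, 0.3, 0.5, 2.0, 0.6          # illustrative values (rho = 0.8)
finite_t = expected_fd(40.0, s, lam, mu, nu, r) / (s * exp((lam - mu) * 40.0))
print(finite_t, per_species_limit(s, lam, mu, nu, r))  # should agree closely
```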

Illustrative examples

First, consider a Yule tree (i.e. $\mu = 0$) grown for time $t$. Figure 3 (left) plots $\varphi_{FD}(s)$ for values of $\nu/\lambda \in \{0, 0.5, 1, 2, 10\}$. When $\nu = 0$, $\varphi_{FD}(s)$ describes the proportion of expected PD that survives in the pruned tree, and as $\nu$ increases, $\varphi_{FD}(s)$ converges towards $s$ (the expected proportion of leaves that survive extinction at the present). Figure 3 (right) plots $\varphi_{FD}(s)$ for birth–death trees with $\mu/\lambda = 0.8$, showing a similar trend, but with $\varphi_{FD}(s)$ lying further above the line $y = s$.

Fig. 3.


Left: The function φFD(s) as a function of s for pure-birth Yule trees (μ=0). Right: The function φFD(s) for birth–death trees with μ/λ=0.8. The curves within each graph show the effect of increasing ν, for values of ν/λ between 0 (top) and 10 (bottom). The top curve also corresponds to the phylogenetic diversity ratio φPD(s)

Simulations

We ran simulations to test the expected relationship between $\varphi_{FD}(s)$ and $s$ and to estimate standard deviations. All simulations were run in R version 4.2.1 (R Core Team 2021). We first simulated 500 Yule trees (age = 100, $\lambda = 0.055$, $\mu = 0$; simulations were repeated and filtered to keep 500 trees with 250 tips) and 500 birth–death trees (age = 100, $\lambda = 0.11$, $\mu = 0.088$; repeated and filtered to keep 500 trees with 250–300 tips) using the sim.bd.age function in the package TreeSim (Stadler 2011). Features were then evolved on each tree, followed by separate extinction events.

Keeping the rate of feature gain fixed (r = 0.3), we modelled five different rates of feature loss (ν ∈ {0, 0.5λ, λ, 2λ, 10λ}). We simulated feature gain and loss along each edge using the Gillespie algorithm (Gillespie 1976): the time until the next event (either a feature gain or a loss) was drawn from an exponential distribution with rate r + kν, where k is the number of features present on the edge at that moment (at the start of the edge, the number inherited from its parent). The type of event was then determined by a Bernoulli draw with probability of a gain equal to r/(r + kν). At each split in the tree, all existing features were copied to both descendant edges. Each gain event created a new, unique feature, and each loss event removed a feature chosen at random from those present on the current edge. At the end of the simulation, the presence of features at each tip of the tree was recorded.
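The per-edge scheme described above can be sketched as follows. The reported simulations were run in R; this is an independent Python illustration of the same Gillespie steps, not the authors' code, and the helper name `evolve_edge`, the integer labelling of features and the parameter values in the example are our own choices.

```python
import math
import random

def evolve_edge(features, length, r, nu, rng, counter):
    """Gillespie simulation of feature gain and loss along one edge.

    features: set of feature labels present at the start of the edge
    length:   edge length (time)
    r, nu:    gain rate per lineage and loss rate per feature
    counter:  one-element list used to hand out fresh labels for new features
    Returns the set of features present at the end of the edge.
    """
    current = set(features)           # copy, so sister edges never share one set
    t = 0.0
    while True:
        k = len(current)
        total = r + k * nu            # total event rate with k features present
        if total == 0:
            break
        t += rng.expovariate(total)   # exponential waiting time to the next event
        if t > length:
            break
        if rng.random() < r / total:  # gain with probability r / (r + k*nu)
            current.add(counter[0])
            counter[0] += 1
        else:                         # loss of an existing feature chosen uniformly
            current.remove(rng.choice(sorted(current)))
    return current

rng, counter = random.Random(42), [0]

# At a split, both daughter edges start from the parent edge's end state.
parent_end = evolve_edge(set(), 5.0, 0.3, 0.055, rng, counter)
left = evolve_edge(parent_end, 2.0, 0.3, 0.055, rng, counter)
right = evolve_edge(parent_end, 2.0, 0.3, 0.055, rng, counter)

# Check: started empty, the mean number of features after time L should be
# close to (r / nu) * (1 - exp(-nu * L)) for gains at rate r and losses at rate nu.
r, nu, L = 0.3, 0.1, 10.0
sizes = [len(evolve_edge(set(), L, r, nu, rng, counter)) for _ in range(5000)]
print(sum(sizes) / len(sizes), r / nu * (1 - math.exp(-nu * L)))
```

Passing the parent's end state to each daughter call (which copies it) is what implements the copying of features at a split; the final check compares the simulation with the mean of a simple gain–loss process.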

Extinction events were simulated by randomly selecting a proportion s of tips to survive (i.e. deleting a proportion 1 − s), with s ranging from 0.05 to 0.95 in steps of 0.05. The proportion of distinct features remaining after each extinction event was recorded; the results are shown in Fig. 4.
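A matching sketch of the extinction step, again in Python rather than the R used for the reported simulations: given the features at each tip, a proportion s of tips is kept at random (treating s as the survival probability of the FOB model) and the proportion of distinct features that remain is returned. The function name and the tip labels and feature sets in the example are hypothetical, for illustration only.

```python
import random

def fd_retained(tip_features, s, rng):
    """Proportion of distinct features retained after extinction at the present.

    tip_features: dict mapping tip label -> set of features at that tip
    s: proportion of tips that survive (a proportion 1 - s is deleted)
    """
    tips = sorted(tip_features)
    survivors = rng.sample(tips, round(s * len(tips)))
    before = set().union(*tip_features.values())
    after = set().union(*(tip_features[tip] for tip in survivors))
    return len(after) / len(before) if before else 1.0

# Toy example with made-up tips and features
tips = {"A": {1, 2, 5}, "B": {1, 2, 6}, "C": {3, 7}, "D": {3, 4}}
rng = random.Random(3)
for s in [k / 20 for k in range(1, 20)]:   # s = 0.05, 0.10, ..., 0.95
    print(round(s, 2), fd_retained(tips, s, rng))
```

Features shared by several tips (here 1, 2 and 3) are lost only when every tip carrying them is deleted, which is why the retained proportion of features tends to exceed s.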

Fig. 4.

Simulation results for remaining feature diversity as a function of s for pure-birth Yule trees (μ = 0, left) and birth–death trees (μ/λ = 0.8, right). Results are shown for two of the five feature-evolution parameter scenarios. Each point represents one simulation on one simulated tree (n = 500 trees). The solid curves are the theoretical relationships between the expected proportion of remaining feature diversity and s for the two scenarios shown.

The simulated proportions of remaining features generally tracked the theoretical expectations (the bias for birth–death trees when ν/λ = 0 is likely due to our theoretical results conditioning on the tree age t rather than on the number of leaves n). The standard deviation (SD), calculated for each ν on both Yule and birth–death trees, was fairly consistent across all values of ν except ν = 10λ, where it was noticeably higher (Table 1). Because the total number of events (gains and losses) along an edge is approximately Poisson-distributed, its variance increases with its mean, and this may explain the higher standard deviation at the highest loss rate.

Table 1.

Mean standard deviations of the proportions of remaining feature diversity (each value is averaged over the 19 values of s from 0.05 to 0.95)

ν/λ            0           0.5         1           2           10
Yule           2.06×10⁻²   1.83×10⁻²   1.84×10⁻²   1.99×10⁻²   3.67×10⁻²
Birth–death    2.33×10⁻²   2.02×10⁻²   2.16×10⁻²   2.52×10⁻²   5.01×10⁻²

Features that appear in only one extant species

Let $\mathcal{U}_t$ denote the number of features that are present in precisely one species in $T_t$, and let $U_t = \mathbb{E}[\mathcal{U}_t]$. The following result describes a simple relationship between $U_t$ and $F_t = \mathbb{E}[FD(T_t)]$.

Proposition 1.4

$$\frac{dF_t}{dt} = r e^{(\lambda-\mu)t} - (\mu+\nu)\,U_t. \qquad (5.3)$$

Proof

Let $\mathcal{F}_t$ denote the number of features present at time t among the leaves of $T_t$. Consider evolving $T_t$ for an additional (short) period δ into the future. Then, conditional on $N_t$ (the number of leaves of $T_t$ present at time t) and $\mathcal{U}_t$:

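$$\mathcal{F}_{t+\delta}-\mathcal{F}_t = \begin{cases} 1, & \text{with probability } rN_t\delta + o(\delta);\\ -1, & \text{with probability } (\mu+\nu)\delta\cdot\mathcal{U}_t + o(\delta);\\ 0, & \text{with probability } 1-\big((\mu+\nu)\,\mathcal{U}_t + rN_t\big)\delta + o(\delta).\end{cases}$$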

Applying the expectation operator (and using $\mathbb{E}[\mathcal{F}_{t+\delta}] = \mathbb{E}\big[\mathbb{E}[\mathcal{F}_{t+\delta} \mid N_t, \mathcal{U}_t]\big]$) and letting δ → 0 leads to the equation stated.
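In slightly more detail, the case split above gives the conditional drift below; the last step uses $\mathbb{E}[N_t] = e^{(\lambda-\mu)t}$, the expected number of lineages at time t in a birth–death process started from a single lineage (a standard fact that the proof uses implicitly):

$$\mathbb{E}\big[\mathcal{F}_{t+\delta}-\mathcal{F}_t \,\big|\, N_t,\mathcal{U}_t\big] = \big(rN_t-(\mu+\nu)\,\mathcal{U}_t\big)\delta + o(\delta),$$

$$\frac{F_{t+\delta}-F_t}{\delta} = r\,\mathbb{E}[N_t]-(\mu+\nu)\,U_t + \frac{o(\delta)}{\delta} \;\longrightarrow\; r e^{(\lambda-\mu)t}-(\mu+\nu)\,U_t \quad (\delta\to 0).$$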

Concluding comments

In this paper, we have considered, in order, three types of data to quantify the expected loss of feature diversity: sets of features across species (without any model of feature evolution or phylogeny), sets of features at the tips of a given phylogenetic tree, and sets of features at the tips of a random (birth–death) tree. The results of the earlier sections also proved helpful in establishing certain results in later sections.

In terms of wider significance for biodiversity conservation, our results and graphs in Sect. 5 suggest that the relative loss of feature diversity following extinction at the present is likely to be greater than that predicted by the relative loss of phylogenetic diversity, for any given survival probability s ∈ (0, 1).

Of course, our results are based on simple models (of feature gain and loss, and extinction at the present) and so exploring how these results might extend to more complex and realistic biological models would be a worthwhile topic for future work.

Acknowledgements

We thank François Bienvenu and James Rosindell for helpful suggestions on an earlier draft of this manuscript, and Ailene MacPherson for technical advice regarding the simulations. We also thank the two anonymous reviewers for further helpful comments and the New Zealand Marsden Fund (MFP-UOC2005) for supporting this research.

Appendix: Proof of Theorem 5.1

Part (a): Consider a birth–death tree $T_t$ with parameters λ, μ, a feature evolution model with parameters r, ν, and a survival probability s for leaves at the present. Let $\mathcal{F}^s_t = FD(T^s_t)$, and let $\mathcal{G}^s_t$ be the random variable that has the same distribution as $\mathcal{F}^s_t$ but with the initial condition $\mathcal{G}^s_0 = 0$ (i.e. no features at the root of the tree). Let $\Delta^s_t$ be the number of features that are present at the root of $T_t$ and also in the pruned tree $T^s_t$. Then

$$\mathcal{F}^s_t = \mathcal{G}^s_t + \Delta^s_t.$$

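Let $F^s_t = \mathbb{E}[\mathcal{F}^s_t]$ and $G^s_t = \mathbb{E}[\mathcal{G}^s_t]$. The two random variables $\mathcal{G}^s_t$ and $\Delta^s_t$ are not independent (they are linked both by the underlying tree $T_t$ and by the pruning event at the present); however, linearity of expectation does not require independence, so applying the expectation operator gives: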

$$F^s_t = \mathbb{E}[\mathcal{G}^s_t] + \mathbb{E}[\Delta^s_t] = G^s_t + F_0 R^s_t(\lambda,\mu+\nu), \qquad (7.1)$$

where $F_0$ is the number of features present at the root of $T_t$.
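One way to read the factor $R^s_t(\lambda,\mu+\nu)$ in Eq. (7.1) (our gloss): the lineages that carry a given root feature speciate at rate λ and stop carrying it at rate μ + ν (lineage death or feature loss), so they form a birth–death process with rates λ and θ = μ + ν, and the feature is present in $T^s_t$ if at least one carrier at time t survives the extinction event. The Python sketch estimates that probability by simulating this process and compares it with the closed form assumed for Eq. (5.1), namely $\frac{s(1-\rho)}{(1-s-\rho)e^{-\lambda(1-\rho)t}+s}$ with ρ = θ/λ ≠ 1 (the form implied by Eq. (7.5)). The function names and parameter values are arbitrary illustrative choices.

```python
import math
import random

def r_closed(t, s, lam, theta):
    """Assumed closed form of R^s_t(lam, theta), with rho = theta / lam != 1."""
    rho = theta / lam
    return s * (1 - rho) / ((1 - s - rho) * math.exp(-lam * (1 - rho) * t) + s)

def r_monte_carlo(t, s, lam, theta, trials, rng):
    """Estimate P(some lineage carrying the feature at time t survives extinction).

    Carriers split at rate lam and each stops carrying at rate theta (= mu + nu).
    """
    total = 0.0
    for _ in range(trials):
        k, clock = 1, 0.0
        while k > 0:
            clock += rng.expovariate(k * (lam + theta))   # time to next event
            if clock > t:
                break
            k += 1 if rng.random() < lam / (lam + theta) else -1
        total += 1 - (1 - s) ** k   # P(at least one of k carriers survives)
    return total / trials

rng = random.Random(1)
print(r_closed(3.0, 0.4, 1.0, 0.5), r_monte_carlo(3.0, 0.4, 1.0, 0.5, 20_000, rng))
```

Averaging $1-(1-s)^k$ over the simulated carrier counts, instead of sampling survivors explicitly, gives the same expectation with less Monte Carlo noise.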

Now, consider $\mathcal{G}^s_{t+\delta}$ and the events that can occur in the interval [0, δ):

  (a) A new feature arises (with probability rδ + o(δ));

  (b) A speciation event occurs (with probability λδ + o(δ));

  (c) The lineage (and hence the tree) dies (with probability μδ + o(δ));

  (d) None of the above occurs (with probability 1 − (r + λ + μ)δ + o(δ)).

Let X be the random variable taking values in {a, b, c, d} that indicates which of these four events occurs. In Case (a), the new feature is also present in $T^s_t$ with probability $R^s_t(\lambda,\mu+\nu)$, and so

$$\mathbb{E}[\mathcal{G}^s_{t+\delta} \mid X = a] = \mathbb{E}[\mathcal{G}^s_t] + R^s_t(\lambda,\mu+\nu) = G^s_t + R^s_t(\lambda,\mu+\nu).$$

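For Case (b), $\mathbb{E}[\mathcal{G}^s_{t+\delta} \mid X = b] = \mathbb{E}[\mathcal{G}^s_t + \mathcal{H}^s_t]$, where $\mathcal{H}^s_t$ is an independent copy of $\mathcal{G}^s_t$. Thus,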

$$\mathbb{E}[\mathcal{G}^s_{t+\delta} \mid X = b] = 2G^s_t.$$

For Cases (c) and (d), we have $\mathbb{E}[\mathcal{G}^s_{t+\delta} \mid X = c] = 0$ and $\mathbb{E}[\mathcal{G}^s_{t+\delta} \mid X = d] = G^s_t$. Thus, by the law of total expectation,

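$$G^s_{t+\delta} = \mathbb{E}\big[\mathbb{E}[\mathcal{G}^s_{t+\delta} \mid X]\big] = G^s_t + \big(rR^s_t(\lambda,\mu+\nu) + (\lambda-\mu)G^s_t\big)\delta + o(\delta).$$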
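Written out term by term, the law of total expectation weights the four conditional expectations for cases (a)–(d) by their probabilities:

$$G^s_{t+\delta} = r\delta\,\big(G^s_t + R^s_t(\lambda,\mu+\nu)\big) + \lambda\delta\cdot 2G^s_t + \mu\delta\cdot 0 + \big(1-(r+\lambda+\mu)\delta\big)G^s_t + o(\delta).$$

The terms $\pm r\delta G^s_t$ cancel, and $2\lambda\delta G^s_t - \lambda\delta G^s_t - \mu\delta G^s_t = (\lambda-\mu)\delta G^s_t$, which leaves the expression above.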

Consequently, the function $G^s_t$ satisfies the first-order linear differential equation:

$$\frac{d}{dt}G^s_t - (\lambda-\mu)G^s_t = rR^s_t(\lambda,\mu+\nu), \qquad (7.2)$$

subject to the initial condition $G^s_0 = 0$. Solving Eq. (7.2) gives:

$$G^s_t = \int_0^t r e^{(\lambda-\mu)\tau} R^s_{t-\tau}(\lambda,\mu+\nu)\,d\tau. \qquad (7.3)$$

By making the change of variable τ ↦ t − τ, we can rewrite Eq. (7.3) as:

$$G^s_t = r e^{(\lambda-\mu)t}\int_0^t e^{-(\lambda-\mu)\tau} R^s_{\tau}(\lambda,\mu+\nu)\,d\tau. \qquad (7.4)$$

Combining this equation and Eq. (7.1) provides the explicit expression for $F^s_t$ described earlier (Eq. (5.2)).
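A quick numerical consistency check of Eqs. (7.2)–(7.4): the Python sketch integrates the differential equation (7.2) from $G^s_0 = 0$ with a fourth-order Runge–Kutta scheme and compares the result with the integral formula (7.4). It assumes the form of $R^s_\tau(\lambda,\mu+\nu)$ implied by Eq. (7.5), with ρ = (μ + ν)/λ ≠ 1; the parameter values and function names are arbitrary illustrative choices, not taken from the paper.

```python
import math

# Illustrative parameter values only; here rho = (MU + NU) / LAM = 0.8 != 1
LAM, MU, NU, R_GAIN, S = 1.0, 0.3, 0.5, 0.3, 0.6

def r_surv(tau):
    """Assumed form of R^S_tau(LAM, MU + NU), as implied by Eq. (7.5)."""
    rho = (MU + NU) / LAM
    return S * (1 - rho) / ((1 - S - rho) * math.exp(-LAM * (1 - rho) * tau) + S)

def g_integral(t, n=20_000):
    """G^s_t from Eq. (7.4), with the integral done by the midpoint rule."""
    h = t / n
    acc = sum(math.exp(-(LAM - MU) * (i + 0.5) * h) * r_surv((i + 0.5) * h)
              for i in range(n))
    return R_GAIN * math.exp((LAM - MU) * t) * acc * h

def g_ode(t, n=2_000):
    """G^s_t by solving Eq. (7.2), G' = (LAM - MU) G + R_GAIN * R(t), G(0) = 0, with RK4."""
    def f(u, g):
        return (LAM - MU) * g + R_GAIN * r_surv(u)
    h, g = t / n, 0.0
    for i in range(n):
        u = i * h
        k1 = f(u, g)
        k2 = f(u + h / 2, g + h * k1 / 2)
        k3 = f(u + h / 2, g + h * k2 / 2)
        k4 = f(u + h, g + h * k3)
        g += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return g

print(g_integral(5.0), g_ode(5.0))
```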

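We now substitute in the expression for $R^s_t(\lambda,\theta)$ from Eq. (5.1) (with t = τ and θ = μ + ν). For ρ ≠ 1, we have:

$$G^s_t = r e^{(\lambda-\mu)t} s(1-\rho)\int_0^t \frac{e^{(\mu-\lambda)\tau}}{(1-s-\rho)e^{-\lambda(1-\rho)\tau}+s}\,d\tau. \qquad (7.5)$$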

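Multiplying the numerator and denominator of the integrand by $e^{\lambda(1-\rho)\tau}$ gives $\frac{e^{-\nu\tau}}{1-s-\rho+s e^{\lambda(1-\rho)\tau}}$, and then, making the substitution $x = e^{-\nu\tau}$, we obtain:

$$G^s_t = \frac{r e^{(\lambda-\mu)t} s(1-\rho)}{\nu}\int_{e^{-\nu t}}^{1}\frac{dx}{1-s-\rho+s x^{\beta}}, \qquad (7.6)$$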

for the value of β described in the theorem (so that $x^{\beta} = e^{\lambda(1-\rho)\tau}$). Combining Eqs. (7.1) and (7.6) gives:

$$F^s_t = \frac{r e^{(\lambda-\mu)t} s(1-\rho)}{\nu}\int_{e^{-\nu t}}^{1}\frac{dx}{1-s-\rho+s x^{\beta}} + F_0 R^s_t(\lambda,\mu+\nu). \qquad (7.7)$$

Thus, for ρ ≠ 1,

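$$\frac{F^s_t}{F^1_t} = \frac{s\int_{e^{-\nu t}}^{1}\frac{dx}{1-s-\rho+s x^{\beta}} + o(1)}{\int_{e^{-\nu t}}^{1}\frac{dx}{-\rho+x^{\beta}} + o(1)},$$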

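where the two o(1) terms (which converge to zero as t grows) arise from the last term on the right of Eq. (7.7) after dividing through by $r e^{(\lambda-\mu)t}(1-\rho)/\nu$. That term is bounded above by the constant $F_0$, and so it is asymptotically negligible compared with the factor $e^{(\lambda-\mu)t}$ in Eq. (7.6). Letting t → ∞ (so that the lower limit $e^{-\nu t}$ of each integral tends to 0) gives the limit for $\varphi_{FD}(s)$ stated in Part (a) for fixed ρ ≠ 1.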
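Letting t → ∞ in the ratio above, for ρ ≠ 1 and ν > 0 the limit can be evaluated as $\varphi_{FD}(s) = s\int_0^1 \frac{dx}{1-s-\rho+sx^{\beta}} \big/ \int_0^1 \frac{dx}{x^{\beta}-\rho}$. The Python sketch below computes this by the midpoint rule, taking ρ = (μ + ν)/λ and β = (μ + ν − λ)/ν. These two definitions are our reading rather than a quotation of Theorem 5.1: they are what the substitution step requires ($e^{(\mu-\lambda)\tau}e^{\lambda(1-\rho)\tau} = e^{-\nu\tau}$ and $x^{\beta} = e^{\lambda(1-\rho)\tau}$), and they give β = 0 when ρ = 1. For ρ < 1 (so β < 0), the integrands are rewritten in terms of $y = x^{-\beta} \le 1$ to avoid overflow; the function name is hypothetical.

```python
import math

def phi_fd_x(s, lam, mu, nu, n=20_000):
    """Part (a) limit for rho != 1, from the x-integral form (requires nu > 0).

    Assumes rho = (mu + nu) / lam and beta = (mu + nu - lam) / nu.
    """
    if nu <= 0:
        raise ValueError("the substitution x = exp(-nu * tau) needs nu > 0")
    rho = (mu + nu) / lam
    if math.isclose(rho, 1.0):
        raise ValueError("rho = 1 needs the separate formula")
    beta = (mu + nu - lam) / nu
    num = den = 0.0
    for i in range(n):
        x = (i + 0.5) / n               # midpoint rule on (0, 1); never hits x = 0
        if beta >= 0:
            xb = x ** beta
            num += 1.0 / (1.0 - s - rho + s * xb)
            den += 1.0 / (xb - rho)
        else:                           # rho < 1: use y = x**(-beta) <= 1 to avoid overflow
            y = x ** (-beta)
            num += y / ((1.0 - s - rho) * y + s)
            den += y / (1.0 - rho * y)
    return s * num / den                # the common step 1/n cancels

print(phi_fd_x(0.5, 1.0, 0.0, 1000.0))  # very large nu: should be close to s = 0.5
```

Taking ν very large gives a value close to s, in line with the discussion of large ν in the main text, and evaluating the function on a grid of s values is a quick way to see the bounds s ≤ $\varphi_{FD}(s)$ ≤ 1 and the concavity established in Part (c).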

In the case where ρ = 1 (which implies β = 0), we use the corresponding expression for $R^s_t(\lambda,\theta)$ from Eq. (5.1) with t = τ and θ = μ + ν to obtain:

$$G^s_t = r s e^{(\lambda-\mu)t}\int_0^t \frac{e^{-(\lambda-\mu)\tau}}{1+\lambda s\tau}\,d\tau.$$

By a similar approach to the above we are led to the second equation in Part (a).
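Carrying out that approach explicitly (our own rearrangement, which should agree with the second equation of Part (a) even if it is written differently there): dividing by $G^1_t$ cancels the factor $r e^{(\lambda-\mu)t}$, the $F_0$ term in Eq. (7.1) is again negligible, and so, for λ > μ,

$$\varphi_{FD}(s) = \lim_{t\to\infty}\frac{F^s_t}{F^1_t} = \frac{s\int_0^\infty \frac{e^{-(\lambda-\mu)\tau}}{1+\lambda s\tau}\,d\tau}{\int_0^\infty \frac{e^{-(\lambda-\mu)\tau}}{1+\lambda\tau}\,d\tau}.$$

Using $\int_0^\infty \frac{e^{-a\tau}}{1+b\tau}\,d\tau = \frac{1}{b}\,e^{a/b}E_1(a/b)$ for a, b > 0, where $E_1$ is the exponential integral, both integrals can also be written in closed form.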

Part (b): Let $T_t$ be a birth–death tree with rates λ, μ where λ > μ, let $n(T_t)$ denote the number of leaves of $T_t$ present at time t, and let E be the event that $n(T_t) > 0$ (i.e. the non-extinction of $T_t$).

Conditional on the event E, the number of leaves in $T_t$ tends to infinity (with probability 1) as t → ∞ (Jagers 1992), and so we can define a sequence of trees $T_1, T_2, \ldots, T_n, \ldots$ by letting $T_k$ denote the tree $T_\tau$ at the first time τ = τ(k) when $T_\tau$ has k extant leaves (we ignore leaves of $T_t$ that have already become extinct by time τ).

Next, we establish that $\mathbb{E}[FD(T_n)] \ge cn$ for a constant c > 0 (in order to apply Theorem 3.1). The tree $T_n$ has n extant pendant edges and, by Theorem 3.1 of Stadler and Steel (2012), a randomly selected pendant edge of $T_n$ has length at least κ with a strictly positive probability p, where κ > 0 and p depend on λ and μ. Now, $\mathbb{E}[FD(T_n)]$ is bounded below by the expected total number of features that arise on the n pendant edges and survive to the end of their edge (since all these features are necessarily distinct from each other, and from all other features that arise in the tree). Moreover, for each edge of length at least κ, the expected number of features that arise on this edge and survive to its end is at least $\frac{r}{\nu}(1-e^{-\kappa\nu})$. Thus the expected number of features contributed by the pendant edges to $FD(T_n)$ is at least $n\cdot p\,\frac{r}{\nu}(1-e^{-\kappa\nu})$.
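The per-edge bound comes from integrating over where on the edge a feature arises: features arise at rate r, and one that arises at distance u from the end of an edge of length ℓ ≥ κ survives to the end of the edge with probability $e^{-\nu u}$, so the expected number of such features is

$$\int_0^{\ell} r e^{-\nu u}\,du = \frac{r}{\nu}\left(1-e^{-\nu\ell}\right) \;\ge\; \frac{r}{\nu}\left(1-e^{-\kappa\nu}\right).$$

Since the expected number of pendant edges of length at least κ is np, one can take $c = p\,\frac{r}{\nu}\left(1-e^{-\kappa\nu}\right)$.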

Thus, we can now apply Theorem 3.1, since $c_s = \lim_{n\to\infty}\frac{\mathbb{E}[FD(T^s_n)]}{\mathbb{E}[FD(T_n)]}$ exists and equals $\varphi_{FD}(s)$, so $\frac{FD(T^s_n)}{FD(T_n)}$ (and thus $\frac{FD(T^s_t)}{FD(T_t)}$) converges to $\varphi_{FD}(s)$ as n (respectively t) grows.

Part (c): We apply Proposition 2.2. By Part (i) of that result, conditioning on $T_t$ we obtain $\mathbb{E}[FD(T^s_t) \mid T_t] \ge s\,FD(T_t)$, and so, taking expectations again (over the distribution of $T_t$), $\mathbb{E}[FD(T^s_t)] \ge s\,\mathbb{E}[FD(T_t)]$, and thus $\varphi_{FD}(s) = \lim_{t\to\infty}\frac{\mathbb{E}[FD(T^s_t)]}{\mathbb{E}[FD(T_t)]} \ge s$.

The inequality $\varphi_{FD}(s) \le 1$ is clear since, for any choice of $T_t$, we have $FD(T^s_t) \le FD(T_t)$ with probability 1.

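For concavity, Proposition 2.2 implies that the conditional expectation $\mathbb{E}\left[\frac{FD(T^s_t)}{FD(T_t)} \,\middle|\, T_t\right]$ is concave as a function of s, and (by taking expectation over the distribution of $T_t$) $\mathbb{E}\left[\frac{FD(T^s_t)}{FD(T_t)}\right]$ is also concave as a function of s. Finally, by Parts (a) and (b) of the current theorem (the ratio lies in [0, 1], so its convergence carries over to its expectation), $\mathbb{E}\left[\frac{FD(T^s_t)}{FD(T_t)}\right]$ converges (deterministically) to $\varphi_{FD}(s)$, and so this function is also concave as a function of s.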

Funding

Open Access funding enabled and organized by CAUL and its Member Institutions.


References

  1. Devictor V, Mouillot D, Meynard C, Jiguet F, Thuiller W, Mouquet N. Spatial mismatch and congruence between taxonomic, phylogenetic and functional diversity: the need for integrative conservation strategies in a changing world. Ecol Lett. 2010;13:1030–1040. doi: 10.1111/j.1461-0248.2010.01493.x.
  2. Faith DP. Conservation evaluation and phylogenetic diversity. Biol Cons. 1992;61(1):1–10. doi: 10.1016/0006-3207(92)91201-3.
  3. Feller W. An introduction to probability theory and its applications. 3. London: Wiley; 1950.
  4. Gillespie D. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys. 1976;22:403–434. doi: 10.1016/0021-9991(76)90041-3.
  5. Huson D, Steel M. Phylogenetic trees based on gene content. Bioinformatics. 2004;20:2044–2049. doi: 10.1093/bioinformatics/bth198.
  6. Jagers P. Stabilities and instabilities in population dynamics. J Appl Probab. 1992;29:770–780. doi: 10.2307/3214711.
  7. Kendall DG. On the generalized birth-and-death process. Ann Math Stat. 1948;19:1–15. doi: 10.1214/aoms/1177730285.
  8. Lambert A, Steel M. Predicting the loss of phylogenetic diversity under non-stationary diversification models. J Theor Biol. 2013;337:111–124. doi: 10.1016/j.jtbi.2013.08.009.
  9. Mazel F, Mooers AO, Riva GVD, Pennell MW. Conserving phylogenetic diversity can be a poor strategy for conserving functional diversity. Syst Biol. 2017;66(6):1019–1027. doi: 10.1093/sysbio/syx054.
  10. Mazel F, Pennell MW, Cadotte MW, Diaz S, Riva GVD, Grenyer R, Leprieur F, Mooers AO, Mouillot D, Tucker CM, Pearse WD. Prioritizing phylogenetic diversity captures functional diversity unreliably. Nat Commun. 2018;9:2888. doi: 10.1038/s41467-018-05126-3.
  11. Mazel F, Pennell MW, Cadotte MW, Diaz S, Riva GVD, Grenyer R, Leprieur F, Mooers AO, Mouillot D, Tucker CM, Pearse WD. Reply to: “Global conservation of phylogenetic diversity captures more than just functional diversity”. Nat Commun. 2019;10:858. doi: 10.1038/s41467-019-08603-5.
  12. McDiarmid C. Surveys in combinatorics, London mathematical society lecture notes series 141. Cambridge: Cambridge University Press; 1989. On the method of bounded differences; pp. 148–188.
  13. Miller JT, Jolley-Rogers G, Mishler BD, Thornhill AH. Phylogenetic diversity is a better measure of biodiversity than taxon counting. J Syst Evol. 2018;56(6):663–667. doi: 10.1111/jse.12436.
  14. Mitzenmacher M, Upfal E. Probability and computing: Randomized algorithms and probabilistic analysis. Cambridge: Cambridge University Press; 2005.
  15. Mooers A, Gascuel O, Stadler T, Li H, Steel M. Branch lengths on birth–death trees and the expected loss of phylogenetic diversity. Syst Biol. 2012;61(2):195–203. doi: 10.1093/sysbio/syr090.
  16. Owen NR, Gumbs R, Gray CL, Faith DP. Global conservation of phylogenetic diversity captures more than just functional diversity. Nat Commun. 2019;10:859. doi: 10.1038/s41467-019-08600-8.
  17. Raup DM. Extinction: bad genes or bad luck? Oxford: Oxford University Press; 1993.
  18. Rosindell J, Manson K, Gumbs R, Pearse W, Steel M (2022) Phylogenetic biodiversity metrics should account for both accumulation and attrition of evolutionary heritage. Technical Report 2022.07.16.499419, BioRxiv.
  19. Stadler T. Simulating trees with a fixed number of extant species. Syst Biol. 2011;60:676–684. doi: 10.1093/sysbio/syr029.
  20. Stadler T, Steel M. Distribution of branch lengths and phylogenetic diversity under homogeneous speciation models. J Theor Biol. 2012;297(2):33–40. doi: 10.1016/j.jtbi.2011.11.019.
  21. R Core Team (2021) A language and environment for statistical computing (accessed on 7 August 2021).
  22. Tucker CM, Aze T, Cadotte MW, Cantalapiedra JL, Chisholm C, Díaz S. Assessing the utility of conserving evolutionary history. Biol Rev. 2019;94:1740–1760. doi: 10.1111/brv.12526.
  23. Tucker CM, Davies TJ, Cadotte MW, Pearse WD. On the relationship between phylogenetic diversity and trait diversity. Ecology. 2018;99(6):1473–1479. doi: 10.1002/ecy.2349.
  24. Wicke K, Fischer M. Phylogenetic diversity and biodiversity indices on phylogenetic networks. Math Biosci. 2018;298:80–90. doi: 10.1016/j.mbs.2018.02.005.
  25. Wicke K, Mooers A, Steel M. Formal links between feature diversity and phylogenetic diversity. Syst Biol. 2021;70:480–490. doi: 10.1093/sysbio/syaa062.
  26. Yang Z, Rannala B. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol Biol Evol. 1997;14(7):717–724. doi: 10.1093/oxfordjournals.molbev.a025811.

Articles from Journal of Mathematical Biology are provided here courtesy of Springer
