Abstract
Stochastic birth–death models provide the foundation for studying and simulating evolutionary trees in phylodynamics. A curious feature of such models is that they exhibit fundamental symmetries when the birth and death rates are interchanged. In this article, we first provide intuitive reasons for these known transformational symmetries. We then show that these transformational symmetries (encoded in algebraic identities) are preserved even when individuals at the present are sampled with some probability. However, these extended symmetries require the death rate parameter to sometimes take a negative value. In the last part of this article, we describe the relevance of these transformations and their application to computational phylodynamics, particularly to maximum likelihood and Bayesian inference methods, as well as to model selection.
Keywords: Algebraic symmetries, Bayesian inference, birth–death models, maximum likelihood, phylodynamics, phylogenetics, speciation–extinction models
Linear birth–death models play a pivotal role in phylodynamics. These stochastic models provide a prior distribution on evolutionary trees (both the shape and edge length distribution) for Bayesian inference methods (Yang and Rannala 1997; Stadler et al. 2013). Moreover, these models allow biologists to estimate key parameters of macroevolution (such as speciation rates corresponding to birth rates and extinction rates corresponding to death rates) from reconstructed phylogenetic trees which were dated by fossil (or other time-sampled) evidence (Nee et al. 1994).
The study of such models dates back to some classical papers from the early to mid-20th century (Yule 1924; Kendall 1948a,b), and the application of these models to phylogenetics and phylodynamics flourished from the 1990s onwards (Nee et al. 1994; Rannala and Yang 1996). Further in-depth mathematical analysis (Aldous 2001; Maddison 2007; Aldous et al. 2009; Morlon et al. 2011; Lambert and Stadler 2013) has extended our understanding of the properties of these models and extensions that allow more complex processes of birth and death.
In this article, we identify and explore curious symmetries in fundamental birth–death model probability distributions when the birth and death rates ($\lambda$ and $\mu$) are swapped. This symmetry has been known in the case of complete sampling of individuals at present (Waugh 1958; Tavaré 2018), and we start the article by providing an intuitive account of this symmetry, which at first seems a little surprising. We extend this to the more general setting where a third parameter is introduced (the sampling probability $\rho$ of individuals sampled at the present) and show how analogous symmetries can be derived by a transformation that reduces these three parameters to just two ($\hat\lambda, \hat\mu$). One can view these as “corrected” birth and death rates, except for the caveat that this new death rate $\hat\mu$ can now take negative values. A major advantage of working with the transformed pair of parameters ($\hat\lambda, \hat\mu$) is that it captures the correct dimensionality of the process (namely 2), thereby avoiding the inherent redundancy present in the 3D parameterization that uses the triple $(\lambda, \mu, \rho)$. This viewpoint has implications for phylogenetic and phylodynamic inferences, both in the maximum likelihood and Bayesian settings, and we explore these implications in the latter part of our article.
Birth–Death Symmetries
Consider a phylogenetic tree that evolves from a single ancestral individual according to a birth–death process, with a constant birth rate $\lambda$ and a constant death rate $\mu$. Suppose that at some time point in the tree, there are $m$ individuals present. Let $p_{m,n}(t) = p_{m,n}(t;\lambda,\mu)$ be the probability that at time $t$ later, there will be $n$ individuals present. These transition probabilities are classical and provide a foundation for phylodynamic models. The starting point for this article is the following curious symmetry, which goes back to Waugh (1958) and was recently highlighted again by Tavaré (2018):
$$p_{1,1}(t;\lambda,\mu) = p_{1,1}(t;\mu,\lambda). \tag{1}$$
This equation states the surprising result that the probability of one individual having exactly one surviving descendant after time $t$ remains the same if we swap the birth rate ($\lambda$) and the death rate ($\mu$). Thus a process with a birth rate of, say, 100 and a death rate of, say, 1 (a scenario with a very fast-growing population) has the same probability of having one surviving descendant as a process with a birth rate of 1 and a death rate of 100 (a scenario where we know that the process eventually leads to extinction). This symmetry can be extended to more general scenarios, as stated in the following theorem.
Theorem 1.
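For any non-negative values of $\lambda$ and $\mu$ and any value of $t \ge 0$:
$$p_{1,1}(t;\lambda,\mu) = p_{1,1}(t;\mu,\lambda).$$
More generally, set
$$q_{m,n}(t;\lambda,\mu) := \frac{p_{m,n}(t;\lambda,\mu)}{m}.$$
Then for all $m, n \ge 1$ and $t \ge 0$ the following birth–death interchange symmetry holds:
$$q_{m,n}(t;\lambda,\mu) = q_{n,m}(t;\mu,\lambda).$$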
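The special case $m = n = 1$ can be checked numerically. The sketch below assumes the classical closed form $p_{1,1}(t) = (\lambda-\mu)^2 e^{-(\lambda-\mu)t} / (\lambda - \mu e^{-(\lambda-\mu)t})^2$, with limit $1/(1+\lambda t)^2$ when $\lambda = \mu$; this formula is not stated in the text above. The function name `p11` is our own.

```python
import math

def p11(t: float, lam: float, mu: float) -> float:
    """Probability that one individual has exactly one descendant after time t
    in a linear birth-death process with birth rate lam and death rate mu."""
    if math.isclose(lam, mu):
        # Critical case: limit of the general formula as lam - mu -> 0.
        return 1.0 / (1.0 + lam * t) ** 2
    r = lam - mu
    e = math.exp(-r * t)  # may overflow if (mu - lam) * t is very large
    return r * r * e / (lam - mu * e) ** 2

# Equation (1): interchanging birth and death rates leaves p_{1,1} unchanged,
# including the (100, 1) versus (1, 100) example from the text.
for lam, mu, t in [(100.0, 1.0, 0.05), (2.0, 0.5, 1.3), (0.3, 1.7, 4.0)]:
    print(lam, mu, t, p11(t, lam, mu), p11(t, mu, lam))
```

The check is non-trivial because the two evaluations use different signs of $\lambda - \mu$ inside the exponential.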
This result has been established in Waugh (1958) and explicitly stated in Tavaré (2018) (an alternative formal proof of Theorem 1 is provided in the supplementary material available on Dryad at http://dx.doi.org/10.5061/dryad.57704ft). To provide some intuitive insight into this result, we now give a direct and conceptually transparent proof of Theorem 1 in the case where $m = n = 1$ (i.e., equation (1)); the result for general $m$ and $n$ follows by essentially applying the same idea. We start a birth–death process with one individual. The waiting time between “events” (a birth event or a death event) is exponentially distributed with rate $j(\lambda+\mu)$, where $j$ is the number of individuals at the considered time point. Fix $t > 0$, and consider two different scenarios (one proceeds forward in time, the other backward):
Scenario 1: The process starts at time 0 and is stopped at time $t$. At an event, with probability $\lambda/(\lambda+\mu)$, we add an individual and, with probability $\mu/(\lambda+\mu)$, we remove an individual. Scenario 1 is a classic forward-in-time birth–death process.
Scenario 2: The process starts at time $t$ and is stopped at time 0. At an event, with probability $\mu/(\lambda+\mu)$, we add an individual and, with probability $\lambda/(\lambda+\mu)$, we remove an individual. Scenario 2 is a birth–death process in reversed time with the birth and death rates interchanged compared with Scenario 1.
Intuitively, the time-reversed process with birth and death interchanged behaves like the forward-in-time process. We justify this intuition with a formal argument showing that the probability of observing one individual after time $t$ is the same under Scenario 1 and Scenario 2.
Consider some population size trajectory $\mathcal{N}$ that starts at time 0 with one individual and ends with one individual after time $t$. At each event, $\mathcal{N}$ can grow or decrease by one. Let the number of growth (birth) events be $k$, which therefore also equals the number of death events. Denote the times of these $2k$ events by $t_1 < t_2 < \dots < t_{2k}$, and define $t_0 := 0$ and $t_{2k+1} := t$. See Figure 1 for an example.
Figure 1.
The forward-in-time birth–death process with realization $\mathcal{N}$ and the equivalent time-reversed process with interchanged rates and realization $\mathcal{N}'$.
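The probability density of $\mathcal{N}$ under Scenario 1, $f_1(\mathcal{N})$, is a product of three kinds of factors. The birth events contribute $\left(\frac{\lambda}{\lambda+\mu}\right)^k$ and the death events contribute $\left(\frac{\mu}{\lambda+\mu}\right)^k$. The waiting times between events contribute $\prod_{i=1}^{2k} n_i(\lambda+\mu)\,e^{-n_i(\lambda+\mu)(t_i-t_{i-1})}$, where $n_i$ is the number of individuals prior to the event at time $t_i$. Finally, the term $e^{-(\lambda+\mu)(t-t_{2k})}$ stipulates that no subsequent event happens after the event at time $t_{2k}$, after which a single individual remains. In summary, the probability density of $\mathcal{N}$ under Scenario 1 for $k \ge 1$ is:
$$f_1(\mathcal{N}) = \lambda^k \mu^k \left(\prod_{i=1}^{2k} n_i\, e^{-n_i(\lambda+\mu)(t_i-t_{i-1})}\right) e^{-(\lambda+\mu)(t-t_{2k})}.$$
For $k = 0$, we have
$$f_1(\mathcal{N}) = e^{-(\lambda+\mu)t}.$$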
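Now we reverse time in the realization $\mathcal{N}$ and call the result $\mathcal{N}'$. Thus, $\mathcal{N}'$ starts where $\mathcal{N}$ ends, and $\mathcal{N}'$ ends where $\mathcal{N}$ starts. We denote the probability density of $\mathcal{N}'$ under Scenario 2 by $f_2(\mathcal{N}')$. We establish $f_2(\mathcal{N}')$ analogously to the procedure above, with the birth events in $\mathcal{N}$ being death events in $\mathcal{N}'$ and vice versa. Since Scenario 2 also interchanges the probabilities of adding and removing an individual, the same $\lambda^k$ and $\mu^k$ factors appear in the probability density of $\mathcal{N}'$ under Scenario 2 as in the probability density of $\mathcal{N}$ under Scenario 1.

The waiting time contributions are also the same for Scenario 1 and Scenario 2. The exponential terms depend only on how long the trajectory spends at each population size, and time reversal does not change this. The number of individuals prior to each event in $\mathcal{N}'$ is the number of individuals after the corresponding event in $\mathcal{N}$. Because $\mathcal{N}$ starts and ends with one individual, the products of these population sizes agree. Thus $f_1(\mathcal{N}) = f_2(\mathcal{N}')$.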
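The argument can be illustrated on a concrete trajectory. In the sketch below, the event-list encoding and the helper names `path_density` and `reverse_path` are our own. We multiply, event by event, the rate of the event that occurs ($n\lambda$ or $n\mu$) and the survival factors between events. This equals the product of the event probabilities $\lambda/(\lambda+\mu)$ and $\mu/(\lambda+\mu)$ and the waiting-time densities used above.

```python
import math

def path_density(events, n0, t, birth, death):
    """Density of one birth-death trajectory on [0, t].

    events: list of (time, step) with strictly increasing times in (0, t),
            step = +1 for a birth and -1 for a death.
    n0:     population size at time 0.
    """
    dens, n, prev = 1.0, n0, 0.0
    for time, step in events:
        # No event during (prev, time) while n individuals are alive.
        dens *= math.exp(-n * (birth + death) * (time - prev))
        # The event itself: total rate n*birth for a birth, n*death for a death.
        dens *= n * (birth if step == +1 else death)
        n += step
        prev = time
    # No further event between the last event and time t.
    return dens * math.exp(-n * (birth + death) * (t - prev))

def reverse_path(events, n0, t):
    """Reverse time: an event at time s moves to t - s, and births become deaths."""
    n_end = n0 + sum(step for _, step in events)
    return [(t - time, -step) for time, step in reversed(events)], n_end

lam, mu, t = 1.3, 0.4, 2.0
# Trajectory 1 -> 2 -> 3 -> 2 -> 1, i.e., k = 2 births and 2 deaths.
events = [(0.3, +1), (0.7, +1), (1.1, -1), (1.6, -1)]
rev, n_end = reverse_path(events, 1, t)
f1 = path_density(events, 1, t, lam, mu)   # Scenario 1: rates (lam, mu)
f2 = path_density(rev, n_end, t, mu, lam)  # Scenario 2: rates interchanged
print(f1, f2)
```

Applying the same functions to a trajectory from one to two individuals shows that the reversed density is then twice the forward one. This is the factor $n/m$ that appears for general $m$ and $n$.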
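Note that $p_{1,1}(t;\lambda,\mu)$ is the integral over all realizations $\mathcal{N}$ under Scenario 1:
$$p_{1,1}(t;\lambda,\mu) = \sum_{k=0}^{\infty} \int_{0<t_1<\dots<t_{2k}<t} f_1(\mathcal{N}_{k,\mathbf{t}})\, d\mathbf{t}.$$
Here $\mathcal{N}_{k,\mathbf{t}}$ is a realization with $k$ birth events according to an event time vector $\mathbf{t} = (t_1,\dots,t_{2k})$. The sum also runs over all admissible orders of birth and death events, namely those that never let the population reach zero.
Analogously, $p_{1,1}(t;\mu,\lambda) = \sum_{k=0}^{\infty} \int f_2(\mathcal{N}'_{k,\mathbf{t}})\, d\mathbf{t}$. Since $f_1(\mathcal{N}_{k,\mathbf{t}}) = f_2(\mathcal{N}'_{k,\mathbf{t}})$, each component in this integration has the same probability density, and thus we have $p_{1,1}(t;\lambda,\mu) = p_{1,1}(t;\mu,\lambda)$.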
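One can directly extend this argument to establish Theorem 1 for any values of $m$ and $n$ by considering the associated forward-in-time and backward-in-time processes. The only change is that the product of the population sizes prior to each event is larger by the factor $n/m$ for the reversed trajectory than for the forward one. This yields $n\,p_{m,n}(t;\lambda,\mu) = m\,p_{n,m}(t;\mu,\lambda)$.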
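The general identity $n\,p_{m,n}(t;\lambda,\mu) = m\,p_{n,m}(t;\mu,\lambda)$ can also be checked without any closed form. The sketch below integrates the forward Kolmogorov equations $\frac{d}{dt}p_j = (j-1)\lambda p_{j-1} + (j+1)\mu p_{j+1} - j(\lambda+\mu)p_j$ with a classical fourth-order Runge–Kutta scheme. It assumes that truncating the state space at `n_max = 80` is harmless for the moderate rates used here; the truncation level, step count and helper names are our own choices.

```python
def generator_apply(p, lam, mu):
    """Right-hand side of the forward Kolmogorov equations, truncated at len(p) - 1."""
    n_max = len(p) - 1
    d = [0.0] * (n_max + 1)
    for j, pj in enumerate(p):
        birth = j * lam if j < n_max else 0.0  # no births out of the top state
        death = j * mu
        d[j] -= (birth + death) * pj
        if j < n_max:
            d[j + 1] += birth * pj
        if j > 0:
            d[j - 1] += death * pj
    return d

def transition_row(m, t, lam, mu, n_max=80, steps=500):
    """Approximate [p_{m,0}(t), ..., p_{m,n_max}(t)] with classical RK4."""
    p = [0.0] * (n_max + 1)
    p[m] = 1.0
    h = t / steps
    for _ in range(steps):
        k1 = generator_apply(p, lam, mu)
        k2 = generator_apply([x + 0.5 * h * y for x, y in zip(p, k1)], lam, mu)
        k3 = generator_apply([x + 0.5 * h * y for x, y in zip(p, k2)], lam, mu)
        k4 = generator_apply([x + h * y for x, y in zip(p, k3)], lam, mu)
        p = [x + h / 6.0 * (a + 2 * b + 2 * c + e)
             for x, a, b, c, e in zip(p, k1, k2, k3, k4)]
    return p

lam, mu, t = 1.2, 0.6, 1.0
fwd = {m: transition_row(m, t, lam, mu) for m in (1, 2, 3)}  # rates (lam, mu)
bwd = {n: transition_row(n, t, mu, lam) for n in (1, 2, 3)}  # rates interchanged
for m in (1, 2, 3):
    for n in (1, 2, 3):
        # Theorem 1 in the form n * p_{m,n}(t; lam, mu) = m * p_{n,m}(t; mu, lam).
        print(m, n, n * fwd[m][n], m * bwd[n][m])
```

A direct numerical solution is used here, rather than the path argument, so that the symmetry is confirmed independently of the proof.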
General Symmetries under Incomplete Sampling
We continue to study a birth–death model with constant and non-negative birth and death rates $\lambda$ and $\mu$. However, we now allow each of the individuals present at time $t$ to be sampled (independently) with probability $\rho$.
Let us first suppose that we start with one individual at time 0, and let $p_n(t)$ be the probability that $n$ sampled descendants are observed (i.e., extant and sampled) at time $t$. The exact expressions for $p_n(t)$ are provided by the following theorem.
Theorem 2.
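For $\lambda \neq \mu$, we have:
$$p_0(t) = 1 - \frac{\rho(\lambda-\mu)}{\rho\lambda + \left(\lambda(1-\rho) - \mu\right)e^{-(\lambda-\mu)t}}, \qquad p_n(t) = p_1(t)\,u(t)^{n-1} \quad (n \ge 1),$$
with
$$p_1(t) = \frac{\rho(\lambda-\mu)^2 e^{-(\lambda-\mu)t}}{\left(\rho\lambda + \left(\lambda(1-\rho) - \mu\right)e^{-(\lambda-\mu)t}\right)^2}, \qquad u(t) = \frac{\rho\lambda\left(1 - e^{-(\lambda-\mu)t}\right)}{\rho\lambda + \left(\lambda(1-\rho) - \mu\right)e^{-(\lambda-\mu)t}}.$$
For the critical case $\lambda = \mu$, we have:
$$p_0(t) = 1 - \frac{\rho}{1 + \rho\lambda t}, \qquad p_n(t) = p_1(t)\,u(t)^{n-1} \quad (n \ge 1),$$
with
$$p_1(t) = \frac{\rho}{(1 + \rho\lambda t)^2}, \qquad u(t) = \frac{\rho\lambda t}{1 + \rho\lambda t}.$$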
For $\lambda \neq \mu$ and $n \in \{0, 1\}$, the result is already provided in Stadler (2010), based on earlier work by Nee et al. (1994) and Yang and Rannala (1997). The critical case for $\rho = 1$ is provided, for example, in Feller (2008). For the proof of the remaining cases, refer to the Supplementary Material available on Dryad.
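One way to sanity-check closed forms for $p_n(t)$ under sampling is to compare them with brute-force binomial thinning of the complete-sampling distribution. Each of the $k$ extant individuals is kept independently with probability $\rho$. The sketch below assumes $\lambda \neq \mu$ and writes out the closed forms it tests: $p_0 = 1 - \rho(\lambda-\mu)/D$, $p_1 = \rho(\lambda-\mu)^2 e^{-(\lambda-\mu)t}/D^2$ and $p_n = p_1 u^{n-1}$, with $D = \rho\lambda + (\lambda(1-\rho)-\mu)e^{-(\lambda-\mu)t}$ and $u = \rho\lambda(1-e^{-(\lambda-\mu)t})/D$. It also assumes the classical geometric form of the complete-sampling distribution. The truncation `k_max = 400` and all function names are our own choices.

```python
import math

def complete_probs(t, lam, mu, k_max):
    """P(N(t) = k), k = 0..k_max, for one ancestor under complete sampling (lam != mu)."""
    e = math.exp(-(lam - mu) * t)
    den = lam - mu * e
    p0 = mu * (1.0 - e) / den          # extinction probability
    u = lam * (1.0 - e) / den          # geometric ratio of P(N = k), k >= 1
    return [p0] + [(1.0 - p0) * (1.0 - u) * u ** (k - 1) for k in range(1, k_max + 1)]

def sampled_probs_closed_form(t, lam, mu, rho, n_max):
    """p_n(t), n = 0..n_max, from the closed forms with sampling probability rho."""
    r = lam - mu
    e = math.exp(-r * t)
    den = rho * lam + (lam * (1.0 - rho) - mu) * e
    p0 = 1.0 - rho * r / den
    p1 = rho * r * r * e / den ** 2
    u = rho * lam * (1.0 - e) / den
    return [p0] + [p1 * u ** (n - 1) for n in range(1, n_max + 1)]

def sampled_probs_by_thinning(t, lam, mu, rho, n_max, k_max=400):
    """Same quantity: keep each of the k extant individuals with probability rho."""
    complete = complete_probs(t, lam, mu, k_max)
    return [sum(complete[k] * math.comb(k, n) * rho ** n * (1.0 - rho) ** (k - n)
                for k in range(n, k_max + 1))
            for n in range(n_max + 1)]

for lam, mu, rho, t in [(1.5, 0.4, 0.7, 1.0), (1.5, 0.2, 0.5, 1.0), (0.5, 1.2, 0.3, 1.5)]:
    print(sampled_probs_closed_form(t, lam, mu, rho, 4))
    print(sampled_probs_by_thinning(t, lam, mu, rho, 4))
```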
In what follows, we investigate the expressions for $p_n(t)$ in detail and identify symmetries with respect to adjusted birth and death rates.
Negative “Death Rates” in the Case of Incomplete Sampling
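We introduce two new variables $\hat\lambda$ and $\hat\mu$, which will play a key role in the remainder of the article. They are defined in terms of $\lambda$, $\mu$, and $\rho$ by the following transformation:
$$\hat\lambda := \rho\lambda, \qquad \hat\mu := \mu - \lambda(1-\rho).$$
Note that when $\rho = 1$, we have $\hat\lambda = \lambda$ and $\hat\mu = \mu$. Further, for all values of $\rho$ we have $\hat\lambda - \hat\mu = \lambda - \mu$ (thus $\hat\lambda = \hat\mu$ if and only if $\lambda = \mu$). Note also that $\hat\mu < 0$ is entirely possible. For example, when $\mu = 0$ and $\rho < 1$, we obtain $\hat\mu = -\lambda(1-\rho) < 0$ for any $\lambda > 0$. In this case, $\hat\mu$ cannot easily be viewed as a death rate (nor as a birth rate). However, if we allow $\hat\mu$ to take any real value (positive or negative), then every parameter triplet $(\lambda, \mu, \rho)$ has a transformation to $(\hat\lambda, \hat\mu)$.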
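The transformation is simple enough to state as code. This is a minimal sketch; the function name `to_hat` is our own. It shows the identity $\hat\lambda - \hat\mu = \lambda - \mu$ and a negative $\hat\mu$ arising from a pure-birth process ($\mu = 0$) with incomplete sampling.

```python
def to_hat(lam, mu, rho):
    """Transform (lambda, mu, rho) to (lambda_hat, mu_hat)."""
    lam_hat = rho * lam
    mu_hat = mu - lam * (1.0 - rho)
    return lam_hat, mu_hat

for lam, mu, rho in [(1.3, 0.4, 1.0), (1.3, 0.4, 0.5), (2.0, 0.0, 0.5)]:
    lam_hat, mu_hat = to_hat(lam, mu, rho)
    # lam_hat - mu_hat equals lam - mu for every rho (up to rounding).
    print(lam, mu, rho, "->", lam_hat, mu_hat, lam_hat - mu_hat, lam - mu)
```

With $\rho = 1$ the pair is returned unchanged. With $\rho = 0.5$, both the second and the third triplet give a negative $\hat\mu$.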
The following lemma is straightforward to verify using simple algebra (Stadler 2013).
Lemma 3.
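For all $\rho \in (0, 1]$ and $t \ge 0$, the four functions can be written as functions of only two parameters ($\hat\lambda$ and $\hat\mu$) when $\lambda \neq \mu$, rather than of the three parameters $\lambda$, $\mu$, and $\rho$. When $\lambda = \mu$, these four functions can be written as functions of the single parameter $\hat\lambda$.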
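Lemma 3 can be illustrated numerically by taking two different triplets $(\lambda, \mu, \rho)$ that share the same $(\hat\lambda, \hat\mu)$. The sketch below uses three normalized quantities of our own choosing: $(1-p_0)/\rho$, $p_1/\rho$ and the ratio $u = p_{n+1}/p_n$. These may differ from the four functions the lemma refers to. The code assumes the standard closed forms for $\lambda \neq \mu$, written out in its docstring. It also checks that $p_1/\rho$ coincides with the complete-sampling $p_{1,1}(t;\hat\lambda,\hat\mu)$. That expression is unchanged when $\hat\lambda$ and $\hat\mu$ are interchanged, even for $\hat\mu < 0$.

```python
import math

def normalized_quantities(t, lam, mu, rho):
    """Return ((1 - p0)/rho, p1/rho, u) for one ancestor with sampling prob rho.

    Standard closed forms for lam != mu:
      D  = rho*lam + (lam*(1 - rho) - mu) * exp(-(lam - mu) t)
      p0 = 1 - rho*(lam - mu)/D
      p1 = rho*(lam - mu)**2 * exp(-(lam - mu) t) / D**2
      u  = rho*lam*(1 - exp(-(lam - mu) t)) / D      (p_n = p1 * u**(n - 1))
    """
    r = lam - mu
    e = math.exp(-r * t)
    d = rho * lam + (lam * (1.0 - rho) - mu) * e
    p0 = 1.0 - rho * r / d
    p1 = rho * r * r * e / d ** 2
    u = rho * lam * (1.0 - e) / d
    return (1.0 - p0) / rho, p1 / rho, u

def p11(t, lam, mu):
    """Complete-sampling p_{1,1}(t; lam, mu), evaluated for arbitrary real mu."""
    r = lam - mu
    e = math.exp(-r * t)
    return r * r * e / (lam - mu * e) ** 2

# Two triplets with the same (lam_hat, mu_hat) = (1.0, -0.5).
a = (2.0, 0.5, 0.5)
b = (4.0, 2.5, 0.25)
for t in (0.5, 1.0, 3.0):
    print(normalized_quantities(t, *a), normalized_quantities(t, *b))
    print(p11(t, 1.0, -0.5), p11(t, -0.5, 1.0))
```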
In order to investigate symmetries, we define the following functions, which only depend on $\hat\lambda$, $\hat\mu$, and $t$ rather than on the four parameters $\lambda$, $\mu$, $\rho$, and $t$ (this dependence on $\hat\lambda$, $\hat\mu$, and $t$ can easily be seen from Lemma 3). Let:
![]() |
For $\hat\lambda = \hat\mu$, these equations are
![]() |
In particular, we have:
. This leads to the following symmetries with respect to $\hat\lambda$ and $\hat\mu$. A proof is provided in the Supplementary Material available on Dryad.
Theorem 4.
For
, the following symmetries hold:
and for all
:
Tree Probability Densities
Let $\mathcal{T}$ be a phylogenetic tree generated by a birth–death process that starts with one individual and is stopped after time $t$. Each individual alive after time $t$ is sampled with probability $\rho$. In this tree, all extinct lineages are pruned, and only the lineages leading to the sampled tips are kept. Such a tree is also called the reconstructed tree (Nee et al. 1994), as indicated by the red lines in Figure 2. Let this tree have $n$ sampled tips and the branching times $x_1 > x_2 > \dots > x_{n-1}$, where time is measured from the present time 0. Let $k(s)$ be the number of coexisting lineages of tree $\mathcal{T}$ at time $s$ (see Fig. 2).
Figure 2.

A phylogenetic tree $\mathcal{T}$ that evolves under a birth–death process with rates $\lambda$ and $\mu$ and with sampling at the present with probability $\rho$. Lineages ending in a death (extinction) are marked by
whereas lineages at the present that are not sampled are marked by o. The reconstructed tree on the sampled extant individuals is indicated by the additional lines starting at
.
Let $f[\mathcal{T} \mid x_0]$ be the probability density of the tree $\mathcal{T}$, and let $f[\mathcal{T} \mid x_0, S]$ be the probability density of the tree $\mathcal{T}$, given that at least one individual is sampled at present. Here $t = x_0$ is the stem age of the process. For $\rho = 1$, this corresponds to conditioning on nonextinction of the process. Let $f[\mathcal{T} \mid x_0, n]$ denote the probability density of the tree $\mathcal{T}$, given that we sample exactly $n$ tips at present.
The tree $\mathcal{T}$ in these formulations was a tree starting with one individual, leading to two lineages at time $x_1$ in the past. Alternatively, a tree $\mathcal{T}$ may start with two lineages at time $x_1$ ago; we denote the probability density of such a tree by $f[\mathcal{T} \mid x_1]$. Let $f[\mathcal{T} \mid x_1, S]$ be the probability density of the tree $\mathcal{T}$ conditioned on sampling at least one descendant individual from both initial lineages. Note that when conditioning on sampling, the time $x_1$ is the crown age of the clade. Furthermore, let $f[\mathcal{T} \mid x_1, n]$ be the probability density of the tree $\mathcal{T}$ conditioned on sampling exactly $n$ tips at present. Finally, consider the setting where the stem age $x_0$ is chosen uniformly at random from $(0, \infty)$, an improper prior. A tree $\mathcal{T}$ conditioned on $n$ tips and integrated over all possible $x_0$ then has probability density $f[\mathcal{T} \mid n]$.
In what follows, we assume
and thus
; otherwise, we cannot obtain a tree with
.
Theorem 5.
The tree probability densities can be expressed as functions of
and
, or
and
. Omitting the parameters
, and
in these functions for easier reading, the expressions are given in the following table:
|
|
|
|---|---|---|
|
||
|
|
|
|
|
|
|
||
|
|
|
|
|
|
|
|
|
|
||
|
|
|
|
||
|
|
|
We note that the expressions in the middle column have been presented in Stadler (2013) (equation 1–7), highlighting that
goes back to Thompson (1975) for
,
to Nee et al. (1994), and
to (Yang and Rannala, 1997) (both for
). Furthermore, the probability density
for
is described in Felsenstein (2004) and in earlier work by Rannala (1997). The idea of parameter transformation (right column) has been introduced for
in (Stadler, 2009).
Remark 6.
Only the expressions for the unconditioned tree probability densities (i.e., the equations not conditioning on observing at least one sample) depend on all three parameters $\lambda$, $\mu$, and $\rho$. The remaining five expressions (the conditioned tree probability densities) only depend on two parameters ($\hat\lambda$, $\hat\mu$). This means that only two out of the three birth–death parameters $\lambda$, $\mu$, $\rho$ can be inferred from the phylogenetic tree. This has already been observed for one of these densities by Stadler (2009), and the generalization to the other equations is then straightforward. Furthermore, based on Theorem 4, the two expressions that condition on both the age of the process and the number of sampled tips give the same result for $(\hat\lambda, \hat\mu)$ and for the swapped parameters $(\hat\mu, \hat\lambda)$. For complete sampling, Rannala (1997) noticed this symmetry in one of these densities (this author also mentioned that this special symmetry had also been independently observed by Monty Slatkin). Note that $\hat\mu < 0$ is possible, whereas $\hat\lambda > 0$. Thus the swapping is only well-defined if $\hat\mu > 0$.
Implications for Empirical Data Analysis
Tree Symmetries for Complete Sampling with Implications on Parameter Inference
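As highlighted in Remark 6, we can use Corollary 1 of the supplementary material available on Dryad to draw a direct conclusion for complete sampling ($\rho = 1$, so that $\hat\lambda = \lambda$ and $\hat\mu = \mu$). The tree probability densities conditioned on both the age of the process and the number of sampled tips are then unchanged when $\lambda$ and $\mu$ are interchanged.
Thus, we obtain the same probability density when swapping birth and death. As a consequence, we have to specify whether the birth rate is larger or smaller than the death rate prior to any analysis based on these equations.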
Mapping from $(\hat\lambda, \hat\mu)$ to the Birth–Death Model Parameters $(\lambda, \mu, \rho)$ with Implications for Maximum Likelihood and Bayesian Inference
When using the tree probability densities in a maximum likelihood inference framework, the expressions are maximized over the parameters for a given tree. Based on the five conditioned tree probability density equations, we should optimize over $\hat\lambda$ and $\hat\mu$, with $\hat\lambda > 0$ and $\hat\mu \in \mathbb{R}$, instead of maximizing over the three parameters $\lambda$, $\mu$, and $\rho$. The latter parameterization induces a ridge in the likelihood surface, which makes optimization problematic. This is equivalent to optimizing under the assumption of complete sampling, while allowing the “death rate” $\hat\mu$ to be negative. In a second step, one assumes a sampling probability $\rho$ and transforms from $(\hat\lambda, \hat\mu)$ to $(\lambda, \mu)$. This procedure was already suggested in Stadler (2009), Section 6.2, although that work did not point out the possibility of a negative $\hat\mu$. We next investigate for which chosen values of $\rho$ we can transform $(\hat\lambda, \hat\mu)$ to $(\lambda, \mu, \rho)$. A proof is provided in the Supplementary Material available on Dryad.
Theorem 7.
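Let $f(\mathcal{T}; \hat\lambda, \hat\mu)$ denote the conditioned tree probability density for an arbitrary tree $\mathcal{T}$ given $\hat\lambda$ and $\hat\mu$. The expression for $f(\mathcal{T}; \hat\lambda, \hat\mu)$ is given in the right column of Theorem 5. Each $(\hat\lambda, \hat\mu)$ has corresponding birth–death parameters $(\lambda, \mu, \rho)$, namely:
$$\lambda = \frac{\hat\lambda}{\rho}, \qquad \mu = \hat\mu + \frac{\hat\lambda(1-\rho)}{\rho}.$$
Given $\hat\mu \ge 0$, we obtain the same tree probability density $f(\mathcal{T}; \hat\lambda, \hat\mu)$ using the expression in the middle column of Theorem 5 with parameters $(\lambda, \mu, \rho)$, where $\rho$ is any value in $(0, 1]$.
Given $\hat\mu < 0$, we obtain the same tree probability density $f(\mathcal{T}; \hat\lambda, \hat\mu)$ using the expression in the middle column of Theorem 5 with parameters $(\lambda, \mu, \rho)$, where $\rho$ is any value in $\left(0, \frac{\hat\lambda}{\hat\lambda - \hat\mu}\right]$.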
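In summary, if we estimate a negative $\hat\mu$, then for some values of $\rho$ (namely $\rho > \hat\lambda/(\hat\lambda - \hat\mu)$) we cannot transform the parameters to a triplet $(\lambda, \mu, \rho)$ with $\mu \ge 0$. Thus, for parameter inference on empirical data, the best strategy might be to fix $\rho$ and then estimate $\lambda$ and $\mu$.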
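Theorem 7 translates into two small helper functions, sketched below with names of our own (`from_hat`, `rho_upper_bound`). The first maps $(\hat\lambda, \hat\mu)$ and a chosen $\rho$ back to $(\lambda, \mu)$. The second returns the largest $\rho$ for which the resulting death rate is non-negative.

```python
def from_hat(lam_hat, mu_hat, rho):
    """Birth-death triplet (lambda, mu, rho) corresponding to (lam_hat, mu_hat)."""
    lam = lam_hat / rho
    mu = mu_hat + lam_hat * (1.0 - rho) / rho
    return lam, mu, rho

def rho_upper_bound(lam_hat, mu_hat):
    """Largest rho for which from_hat(...) yields a non-negative death rate."""
    if mu_hat >= 0:
        return 1.0
    return lam_hat / (lam_hat - mu_hat)

bound = rho_upper_bound(1.0, -0.5)       # 2/3 for this example
print(bound, from_hat(1.0, -0.5, bound))  # mu is (numerically) zero at the bound
print(from_hat(1.0, -0.5, 0.5))           # admissible: mu >= 0
print(from_hat(1.0, -0.5, 0.9))           # rho above the bound: mu < 0
```

At the bound, the triplet corresponds to a pure-birth process with incomplete sampling. This links back to the $\mu = 0$ example for a negative $\hat\mu$.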
Given that the conditioned tree probability densities depend on only two parameters, $\hat\lambda$ and $\hat\mu$, one may decide to perform a Bayesian analysis on $(\hat\lambda, \hat\mu)$ (see also Stadler (2009), Section 6.1). Care has to be taken, though, regarding the priors, since these priors play out in nonstraightforward ways. Assume, for example, that the analysis is performed by sampling $(\hat\lambda, \hat\mu)$. For each sampled parameter pair, one might choose $\rho$ uniformly at random from the values that yield a valid triplet $(\lambda, \mu, \rho)$. For pairs with $\hat\mu \ge 0$, this would yield a uniform distribution of the chosen $\rho$ on $(0, 1]$. However, some sampled parameter pairs have $\hat\mu < 0$. For these pairs, only a small $\rho$ is possible, namely $\rho \le \hat\lambda/(\hat\lambda - \hat\mu)$. Overall, the samples of $\rho$ would therefore be nonuniform, with a preference for small values of $\rho$. Thus, in the Bayesian setting, we need to assess the effective priors on $(\lambda, \mu, \rho)$ given the parameter nonidentifiability.
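The effect on the prior for $\rho$ can be seen in a small Monte Carlo experiment. The uniform priors on $\hat\lambda \in (0.5, 2)$ and $\hat\mu \in (-1, 1)$ below are hypothetical and chosen only for illustration. For each draw, $\rho$ is taken uniformly from its admissible range according to Theorem 7.

```python
import random

def effective_rho_samples(n_samples, seed=1):
    """Draw rho under a hypothetical prior on (lam_hat, mu_hat).

    For each pair, rho is uniform on its admissible range: (0, 1] if
    mu_hat >= 0, and (0, lam_hat / (lam_hat - mu_hat)] otherwise.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        lam_hat = rng.uniform(0.5, 2.0)   # hypothetical prior on lam_hat
        mu_hat = rng.uniform(-1.0, 1.0)   # hypothetical prior on mu_hat (may be negative)
        upper = 1.0 if mu_hat >= 0 else lam_hat / (lam_hat - mu_hat)
        samples.append(rng.uniform(0.0, upper))
    return samples

rhos = effective_rho_samples(20000)
print(sum(rhos) / len(rhos))                   # below 0.5, the mean of a uniform rho
print(sum(r > 0.9 for r in rhos) / len(rhos))  # below 0.1, the uniform-rho value
```

The shift toward small $\rho$ comes entirely from the pairs with $\hat\mu < 0$.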
Mappings between Birth–Death Model Parameters $(\lambda, \mu, \rho)$ and $(\lambda', \mu', \rho')$
Next, we characterize all birth–death parameters that are transformations of $(\lambda, \mu, \rho)$; the proof is again provided in the Supplementary Material available on Dryad.
Theorem 8.
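Let $(\lambda, \mu, \rho)$ be birth–death parameters with the corresponding $(\hat\lambda, \hat\mu)$. Consider a second triplet $(\lambda', \mu', \rho')$ with $\mu' \ge 0$ that has the same transformed parameters $(\hat\lambda, \hat\mu)$. Such parameters $\lambda'$ and $\mu'$ exist for all $\rho' \in (0, 1]$ if $\hat\mu \ge 0$, and for all $\rho' \in \left(0, \frac{\hat\lambda}{\hat\lambda - \hat\mu}\right]$ if $\hat\mu < 0$.
Note that the parameters $(\lambda, \mu, \rho)$ and $(\lambda', \mu', \rho')$ thus give rise to the same tree probability density.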
Corollary 9.
With $\rho = 1$ (and thus $\hat\mu = \mu \ge 0$), a transformation always exists for $\rho' \in (0, 1]$. However, a parameter transformation may not be possible for $\rho < 1$ (e.g., if $\hat\mu < 0$, we cannot transform to $\rho' = 1$).
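Next, we consider $\rho' = 1$ (i.e., the transformation to the case of complete sampling). A further consequence of Theorem 8 is the following result from Stadler and Steel (2012).
Corollary 10.
With $\rho < 1$, a transformation to $\rho' = 1$ exists if $\mu \ge \lambda(1 - \rho)$, namely to $(\lambda', \mu') = (\rho\lambda, \mu - \lambda(1 - \rho))$. If $\mu < \lambda(1 - \rho)$, no transformation exists.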
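Theorem 8 and Corollaries 9 and 10 can be summarized in one helper, sketched below; the name `transform` and the tolerance argument are our own. It maps a triplet to another sampling probability while keeping $(\hat\lambda, \hat\mu)$ fixed. It returns `None` when the required death rate would be negative.

```python
def transform(lam, mu, rho, rho_new, tol=1e-12):
    """Return (lam', mu') such that (lam', mu', rho_new) and (lam, mu, rho)
    share the same (lam_hat, mu_hat), or None if this would need mu' < 0."""
    lam_hat = rho * lam
    mu_hat = mu - lam * (1.0 - rho)
    lam_new = lam_hat / rho_new
    mu_new = mu_hat + lam_hat * (1.0 - rho_new) / rho_new
    if mu_new < -tol:                 # negative death rate: no valid triplet
        return None
    return lam_new, max(mu_new, 0.0)  # clip rounding noise at the boundary

print(transform(1.0, 0.5, 1.0, 0.25))  # (4.0, 3.5): complete -> incomplete sampling
print(transform(2.0, 1.5, 0.5, 1.0))   # (1.0, 0.5): mu >= lam*(1 - rho)
print(transform(2.0, 0.5, 0.5, 1.0))   # None: mu < lam*(1 - rho)
```

The first call reflects Corollary 9, where a fully sampled triplet can always be mapped to any $\rho'$. The last two calls reflect the two cases of Corollary 10.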
Implications for proving properties of the birth–death tree distribution.
Properties of the birth–death tree distribution need to be known in order to test whether empirical data differ significantly from these properties, in which case the birth–death model has to be rejected for the given data. Sometimes, proofs of the properties of the conditioned tree distribution are carried out for complete sampling (i.e., for parameters $(\lambda, \mu, 1)$). Such properties also hold for incomplete sampling if $\hat\mu \ge 0$, that is, if $\mu \ge \lambda(1 - \rho)$. To include the parameter space $\mu < \lambda(1 - \rho)$, the proof must explicitly account for incomplete sampling. This was noticed already in Stadler and Steel (2012).
Implications regarding model selection.
For a given phylogenetic tree, it is tempting to ask whether a model with $\rho = 1$ or a model with $\rho < 1$ fits the data better. However, for every parameter combination $(\lambda, \mu, 1)$, we also find a parameter combination $(\lambda', \mu', \rho')$ with $\rho' < 1$ such that both parameter triplets have the same conditioned tree probability density. Moreover, there are parameter combinations $(\lambda, \mu, \rho)$ with $\rho < 1$ without a corresponding triplet where $\rho' = 1$ (see Corollary 9). Thus, the model with $\rho < 1$ always gets more support than the model with $\rho = 1$. In summary, such a test is meaningless because of the parameter nonidentifiability.
Discussion
Birth–death models have been studied for almost 100 years (Yule 1924; Kendall 1948a). However, surprising properties are still being uncovered. Here, we presented some unexpected symmetries in birth–death models with incomplete sampling of individuals. In particular, a birth–death process with incomplete sampling can be described phylogenetically through two parameters instead of three parameters, resulting in parameter nonidentifiability.
Such parameter nonidentifiability has important consequences for using birth–death models in phylogenetic and phylodynamic inference. In particular, the likelihood surface of the three birth–death parameters $\lambda$, $\mu$, and $\rho$ for a given tree has a ridge, and we can therefore only estimate two of the three parameters. Maximum likelihood estimation should thus be done for a fixed sampling probability. In Bayesian analysis, we need to carefully consider the effective prior when using such nonidentifiable parameter triplets.
Furthermore, we showed that for some of the parameter triplets $(\lambda, \mu, \rho)$, their two-parameter description is, in fact, equivalent to a birth–death process with complete sampling. However, in some cases the resulting “death” rate is negative, and thus the transformed parameters cannot always be considered as a birth–death process with complete sampling. This means that we cannot simply prove properties of phylogenetic trees for complete sampling and then extrapolate to incomplete sampling, as we would then miss some birth–death parameter combinations (namely the ones leading to a negative “death” rate). Furthermore, testing whether the data are completely sampled ($\rho = 1$) or not ($\rho < 1$) is not informative, as the models with $\rho < 1$ always have more support. Parameter triplets for incomplete sampling may only have corresponding complete sampling parameters with a negative “death” rate. In contrast, birth and death rates under complete sampling have a corresponding triplet for all $\rho' \in (0, 1]$.
The birth–death model presented here is the simplest model for speciation and extinction, or for transmission and recovery. However, it has limitations for explaining the data. It assumes exponential growth of the population, although populations cannot grow without limit, and it assumes that all individuals are dynamically equivalent. There has been considerable work on extending the birth–death model to address such limitations (Maddison 2007; Morlon et al. 2011; Stadler 2011; Etienne et al. 2012; Stadler and Bonhoeffer 2013). However, no symmetries and only very special cases of parameter nonidentifiability have been observed (Stadler et al. 2013). It will be interesting to explore in the future whether the observed symmetries and nonidentifiabilities in our simple model are also present in these more complex models.
Acknowledgements
We wish to thank Joe Felsenstein and Nicolas Salamin for drawing our attention to the symmetry stated in Equation (1). Further, we thank Bruce Rannala for pointing us to his work on tree symmetry in a special case (Rannala 1997). We further wish to thank the two anonymous reviewers for several helpful suggestions, in particular one of them pointing us to (Waugh, 1958) and (Tavaré, 2018).
Supplementary Material
Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.57704ft.
Funding
This work was supported in part by the European Research Council under the Seventh Framework Programme of the European Commission [PhyPD: grant agreement number 335529 to T.S.].
References
- Aldous D. 2001. Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Stat. Sci. 16:23–34.
- Aldous D., Krikun M., Popovic L. 2009. Five statistical questions about the tree of life. Syst. Biol. 60:318–328.
- Etienne R.S., Haegeman B., Stadler T., Aze T., Pearson P.N., Purvis A., Phillimore A.B. 2012. Diversity-dependence brings molecular phylogenies closer to agreement with the fossil record. Proc. R. Soc. Lond. B. 279:1300–1309.
- Feller W. 2008. An introduction to probability theory and its applications, Vol. 2. New York (USA): John Wiley & Sons.
- Felsenstein J. 2004. Inferring phylogenies. Sunderland, MA: Sinauer Associates; 8:8–5.
- Kendall D.G. 1948a. On some modes of population growth leading to R. A. Fisher’s logarithmic series distribution. Biometrika 35:6–15.
- Kendall D.G. 1948b. On the generalized “birth-and-death” process. Ann. Math. Statist. 19:1–15.
- Lambert A., Stadler T. 2013. Birth–death models and coalescent point processes: the shape and probability of reconstructed phylogenies. Theor. Pop. Biol. 90:113–128.
- Maddison W. 2007. Estimating a binary character’s effect on speciation and extinction. Syst. Biol. 56:701–710.
- Morlon H., Parsons T.L., Plotkin J.B. 2011. Reconciling molecular phylogenies with the fossil record. Proc. Natl. Acad. Sci. USA 1081:6327–6332.
- Nee S.C., May R.M., Harvey P. 1994. The reconstructed evolutionary process. Philos. Trans. R. Soc. Ser B. 344:305–311.
- Rannala B. 1997. Gene genealogy in a population of variable size. Heredity 78:417.
- Rannala B., Yang Z. 1996. Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J. Mol. Evol. 43:304–311.
- Stadler T. 2009. On incomplete sampling under birth-death models and connections to the sampling-based coalescent. J. Theor. Biol. 261:58–66.
- Stadler T. 2010. Sampling-through-time in birth–death trees. J. Theor. Biol. 267:396–404.
- Stadler T. 2011. Mammalian phylogeny reveals recent diversification rate shifts. Proc. Natl. Acad. Sci. USA 108:6187–6192.
- Stadler T. 2013. How can we improve accuracy of macroevolutionary rate estimates? Syst. Biol. 62:321.
- Stadler T., Bonhoeffer S. 2013. Uncovering epidemiological dynamics in heterogeneous host populations using phylogenetic methods. Philos. Trans. R. Soc. Ser. B. 368:20120198.
- Stadler T., Kühnert D., Bonhoeffer S., Drummond A.J. 2013. Birth–death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proc. Natl. Acad. Sci. USA 110:228–233.
- Stadler T., Steel M. 2012. Distribution of branch lengths and phylogenetic diversity under homogeneous speciation models. J. Theor. Biol. 297:33–40.
- Tavaré S. 2018. The linear birth–death process: an inferential retrospective. Adv. Appl. Prob. 50:253–269.
- Thompson E.A. 1975. Human evolutionary trees. New York (USA): Cambridge University Press.
- Waugh W.O. 1958. Conditioned Markov processes. Biometrika 45:241–249.
- Yang Z., Rannala B. 1997. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol. Biol. Evol. 17:717–724.
- Yule G.U. 1924. A mathematical theory of evolution: based on the conclusions of Dr. J.C. Willis. Philos. Trans. R. Soc. Ser. B. 213:21–87.