Systematic Biology. 2023 Dec 28;73(1):235–246. doi: 10.1093/sysbio/syad075

The Limits of the Constant-rate Birth–Death Prior for Phylogenetic Tree Topology Inference

Mark P Khurana 1, Neil Scheidwasser-Clow 2, Matthew J Penn 3, Samir Bhatt 4,5,#, David A Duchêne 6,#
Editor: Sebastian Hoehna
PMCID: PMC11129600  PMID: 38153910

Abstract

Birth–death models are stochastic processes describing speciation and extinction through time and across taxa, and they are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under the constant-rate birth–death (crBD) model tend to differ from empirical trees, for example, with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between the crBD model and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which the crBD model differs from empirically inferred phylogenies and to test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD model trees frequently differ topologically from empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis of a subset of the empirical studies. When comparing studies that used Bayesian methods with crBD priors against those that used non-crBD priors or non-Bayesian methods (e.g., maximum likelihood), we do not find any significant differences in inferred tree topologies. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 trees with the greatest imbalance from our dataset, simulated sequence data for these tree topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and under the crBD model in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but the tendency is negligible when substitution rates are sufficiently high.
Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios but also highlight that empirically observed phylogenetic imbalance is highly improbable under the crBD model, leading to systematic bias in data sets with limited information content.

Keywords: Birth–death model, phylogenetic timescale inference, tree imbalance, tree shape


Understanding macroevolutionary processes is highly dependent on the use of diversification models, which describe speciation and extinction through time. In Bayesian phylogenetics, the choice of tree prior often follows a diversification model, informing some aspects of the inferred tree topology and branch lengths (Colijn and Plazzotta 2018). Diversification models in phylogenetics are, therefore, useful tools for inferring the relationships among taxa as well as for understanding the drivers of speciation and/or extinction (Mooers and Heard 1997), and they are used in research fields ranging from epidemiology to conservation (Tomiuk and Loeschcke 1994; Andréoletti et al. 2022; Attwood et al. 2022). Consequently, how the choice of evolutionary model affects tree topology has been a source of great interest. As growing numbers of empirically estimated trees become available, it becomes possible to test general hypotheses about diversification processes and to understand how theoretical models, when used as general-purpose prior distributions, affect our inferences of processes in real-world settings (Hey 1992).

Birth–death models are commonly used as phylogenetic tree priors. They are continuous-time Markov processes that describe the birth (i.e., speciation) and death (i.e., extinction) of lineages forwards in time. Each lineage speciates or goes extinct at a given rate, and the model is described by parameters for birth (λ), death (μ), and the present-day sampling fraction (ρ). Both λ and μ are non-negative real numbers, and ρ is a real number between zero and one. A standard birth–death model is the constant-rate birth–death (crBD) process (Kendall 1948; Thompson 1975; Nee 2006). It is age-independent, so rates are constant over time and the waiting time until speciation or extinction is exponentially distributed. The simplest form of the crBD model is the Yule model, which assumes no extinction (μ = 0) and therefore has only λ and ρ parameters (Yule 1925). crBD models are generally considered flexible (i.e., not highly informative and therefore not strongly biasing), computationally efficient, and mathematically convenient (Gernhard 2008; Morlon 2014). These properties make them attractive general priors for phylogenetic inference, and the crBD model remains ubiquitous in the biological sciences.
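To illustrate the process just described, here is a minimal Python sketch of a forward-time (Gillespie-style) simulation of lineage counts under a crBD process. It assumes complete sampling (ρ = 1) and tracks only the number of living lineages rather than a full tree. The function name `simulate_crbd_lineages` is hypothetical, and this is not how TreeSim or any other package used in this study is implemented.

```python
import random


def simulate_crbd_lineages(lam, mu, t_max, n0=1, seed=None):
    """Forward-time simulation of lineage counts under a constant-rate birth-death process.

    With n lineages alive, the waiting time to the next event is exponential with
    rate n * (lam + mu). The event is a speciation with probability lam / (lam + mu)
    and an extinction otherwise. Returns (time, n_lineages) pairs, starting at
    (0.0, n0) and recorded after each event up to t_max.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    history = [(t, n)]
    while n > 0 and lam + mu > 0:
        t += rng.expovariate(n * (lam + mu))  # exponential waiting time
        if t > t_max:
            break
        if rng.random() < lam / (lam + mu):
            n += 1  # speciation (birth)
        else:
            n -= 1  # extinction (death)
        history.append((t, n))
    return history


# Yule case (mu = 0): lineages only accumulate.
yule_run = simulate_crbd_lineages(lam=1.0, mu=0.0, t_max=2.0, seed=1)
print(yule_run[-1])
```

A tree-valued simulator would additionally record parent–child relationships and prune extinct lineages to obtain the reconstructed phylogeny. That step is omitted here for brevity.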

Previous research has highlighted how trees inferred using the crBD model as a prior differ from empirical trees. There has been significant interest in discrepancies in divergence times and diversification rates between empirical data and crBD priors, with substantial differences reported in empirical case studies (Phillimore and Price 2008; Jones 2011; Ritchie et al. 2016; Ritchie and Ho 2019) and in simulations under well-understood evolutionary models (Sarver et al. 2019). Since early in their development, neutral evolutionary models, including the crBD model, were also recognized to produce tree topologies that differ from empirical trees, with empirical trees being significantly more imbalanced (where balance is the extent to which branching events lead to groups of equal size) (Guyer and Slowinski 1991; Losos and Adler 1995; Aldous 1996, 2001; Huelsenbeck and Kirkpatrick 1996; Mooers and Heard 1997; Steel and McKenzie 2001; Pinelis 2003; Bienvenu et al. 2021). In addition, tree topology has been linked with a range of complex evolutionary processes, including fitness, diversity-dependence, and selection (Maia et al. 2004; Dayarian and Shraiman 2014). Nonetheless, few studies have used large numbers of empirical data sets and comprehensive simulations to test the absolute performance of the crBD model as a prior for topology inference (Jones 2011; Hagen et al. 2015; Bienvenu et al. 2021). As a result, our understanding of how well the crBD model describes empirical trees, and of its consequences for inferring fundamental parameters in phylogenetics, remains incomplete.

Importantly, using a range of tree indices can provide a detailed description of model performance (Brown and Thomson 2018; Duchêne et al. 2018), including indices that describe the space of empirical versus model-simulated trees (Colijn and Plazzotta 2018; Duchene et al. 2019). Choosing the right indices for comparison is particularly important because some indices have inherent flaws. For example, some are not applicable to certain kinds of tree topologies or networks, and some are unsuitable for comparing trees with different numbers of leaves (Coronado et al. 2020; Fischer et al. 2021; Lemant et al. 2022). Recently, several new indices have been proposed to describe phylogenetic trees (Mir et al. 2018; Lima et al. 2020; Bocharov et al. 2022), building on a host of older indices used to quantify tree features such as balance (Sackin 1972; Colless 1982) and stemminess (i.e., the relative length of internal to terminal branches) (Fiala and Sokal 1985; Rohlf et al. 1990). Because these indices have been integrated into software packages, large numbers of empirically estimated phylogenetic trees can now be characterized in high detail. Thus, in this Point of View, we use several recently compiled indices to explore the limits of the crBD prior for empirically inferred tree topologies. This allows us to understand the role that the simplifying assumptions of these models play in shaping our understanding of macroevolutionary processes.
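To make the two classic balance statistics concrete, the following Python sketch computes the Colless and Sackin indices for rooted binary trees written as nested 2-tuples of leaf labels. This representation is hypothetical and chosen for brevity. Dividing by the maximum Colless value, (n − 1)(n − 2)/2, is one common normalization and is an assumption here; software packages may normalize differently.

```python
def n_leaves(tree):
    """Number of leaves in a rooted binary tree written as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return 1
    return n_leaves(tree[0]) + n_leaves(tree[1])


def colless(tree):
    """Colless index: sum over internal nodes of |leaves(left) - leaves(right)|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return abs(n_leaves(left) - n_leaves(right)) + colless(left) + colless(right)


def normalized_colless(tree):
    """Colless index divided by its maximum (n - 1)(n - 2) / 2; requires n >= 3."""
    n = n_leaves(tree)
    return colless(tree) / ((n - 1) * (n - 2) / 2)


def sackin(tree, depth=0):
    """Sackin index: sum over leaves of the number of edges from the leaf to the root."""
    if not isinstance(tree, tuple):
        return depth
    return sackin(tree[0], depth + 1) + sackin(tree[1], depth + 1)


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(colless(balanced), colless(caterpillar))  # 0 3
print(sackin(balanced), sackin(caterpillar))    # 8 9
```

Both statistics increase with imbalance. The caterpillar (maximally imbalanced) tree attains the maximum Colless value, so its normalized value is 1.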

Empirical Tree Data Set

To compare empirical trees with theoretical crBD-simulated trees, we sourced a sample of empirically estimated phylogenetic trees from TimeTree (TToL5) (Kumar et al. 2022), together with their associated references and other metadata. These trees are intended to represent the signal from empirical data, even though in many cases this signal will be affected by noise and poor model performance. Many of these trees were inferred using the crBD model as a prior, so we assume that any departure from the prior reflects the signal in the data (e.g., from molecular sequences). Regardless of the influence of a prior distribution, we take these trees to be a better representation of the empirical diversification process than that arising from any existing model.

All species trees available from TimeTree were obtained upon request (accessed October 2022), totaling 4070 trees. Among these, we selected trees that included branch lengths, were in valid Newick format, were rooted, ultrametric, and bifurcating, and contained more than 3 tips. We also required the number of unique terminal branch lengths to exceed half the number of tips. This ensured that the selected trees were not highly simplified estimates of the diversification process, while allowing for some redundancy that can arise from dating methods (e.g., in an ultrametric tree, the two tips of a cherry, or sister pair, share the same terminal branch length). In total, this resulted in 1189 unique empirical trees for analysis; the final set of trees and code can be found on Dryad. The original Newick trees can be obtained from TimeTree upon request. A full list of the original studies from which the Newick trees are derived, including PubMed/TimeTree ID, article name, first author name, year, and DOI, can be found in Supplementary Table S1 (available on Dryad at https://doi.org/10.5061/dryad.2fqz612vg).
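The selection criteria above can be sketched as a single filter. This is a minimal Python sketch, not the code deposited on Dryad, and it rests on three assumptions:

- Trees have already been parsed from Newick into a hypothetical `Node` structure with branch lengths.
- Ultrametricity is checked with an arbitrary relative tolerance.
- "Tip lengths" are treated as terminal branch lengths rounded to 6 decimal places.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    length: float = 0.0                           # length of the branch above this node
    children: list = field(default_factory=list)  # empty list for a tip


def tips_with_depth(node, depth=0.0):
    """Yield (tip, root-to-tip distance) pairs."""
    depth += node.length
    if not node.children:
        yield node, depth
    for child in node.children:
        yield from tips_with_depth(child, depth)


def is_bifurcating(node):
    """True if every internal node, including the root, has exactly two children.
    A two-child root is used here as a proxy for a rooted tree, since unrooted
    Newick trees usually have a basal trichotomy."""
    if not node.children:
        return True
    return len(node.children) == 2 and all(is_bifurcating(c) for c in node.children)


def passes_filters(root, rel_tol=1e-6):
    tips = list(tips_with_depth(root))
    n = len(tips)
    if n <= 3 or not is_bifurcating(root):
        return False
    depths = [d for _, d in tips]
    if max(depths) - min(depths) > rel_tol * max(depths):
        return False  # not ultrametric within tolerance
    # Rounding keeps floating-point noise from creating spurious unique lengths.
    unique_tip_lengths = {round(tip.length, 6) for tip, _ in tips}
    return len(unique_tip_lengths) > n / 2


# A 4-tip ultrametric caterpillar with terminal lengths 0.5, 0.5, 1.0, 2.0.
cat = Node(0.0, [Node(1.0, [Node(0.5, [Node(0.5), Node(0.5)]), Node(1.0)]), Node(2.0)])
print(passes_filters(cat))  # True: 3 unique terminal lengths > 4 / 2
```

Checks for valid Newick syntax and the presence of branch lengths would happen at parsing time and are not shown.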

Identifying Sensitive Tree Topology Indices Using Null-model Scenarios

To measure the discrepancy between empirical and simulated trees with maximal detail and accuracy, and to identify the most sensitive methods for measuring this discrepancy, we used an existing compilation of tree topology indices (Fischer et al. 2021). Indices used in this study can broadly be categorized into 3 groups: balance indices (i.e., where more balanced trees have higher values), imbalance indices (i.e., where higher values denote more imbalance), and other tree topology indices. We selected 29 computationally tractable indices that were likely to capture the main ways in which empirical trees violate crBD priors (Table 1). A full list of each index name, description, interpretation, software availability in R, and source reference can be found in Supplementary Table S2 on Dryad. Further derivations and mathematical details for the majority of indices have also been detailed previously (Fischer et al. 2021).

Table 1.

Tree topology indices used in the study. *As defined by Fischer et al. (2021).

Index Description Source
Imbalance indices*
Normalized Colless Index, Ic Sum, over all internal nodes including the root, of the absolute difference between the numbers of leaves subtended by the left and right children. It is an aggregate imbalance value among internal nodes, normalized by the maximum value for a tree of that size. (Heard 1992)
Colless-like indices, CD,f Sum of the balance values of the internal nodes relative to f (f-size) and D (dissimilarity) of a tree. A generalization of the Colless index to rooted non-binary trees, with several variations. In this study, we used an exponential function for the f size and mean deviation from the median (MDM) for the dissimilarity D function. (Mir et al. 2018)
Equal weights Colless index (I2 index) Modified Colless index that applies equal weight to all internal nodes of a tree. It counteracts the heavier weighting of ‘deep’ nodes in the standard Ic. (Mooers and Heard 1997)
Total I (I’) index Sum of imbalance values of all binary nodes in the tree. Imbalance values are the “ratio between the observed deviation of the larger pending subtree from the minimum value possible and the maximum deviation possible” (Fischer et al. 2021). (Fusco and Cronk 1995; Purvis et al. 2002)
Mean and Median I (I’) index Mean/median of imbalance values of all binary nodes in the tree, where imbalance values for I are defined as above. (Fusco and Cronk 1995; Purvis et al. 2002)
Normalized Sackin index Sackin index (the sum, over all leaves, of the number of edges from the leaf to the root) normalized by the size of the tree. (Blum et al. 2006)
Total cophenetic index, Φ Sum, over all pairs of leaves, of the depth (i.e., number of edges from the root) of their most recent common ancestor (i.e., the sum of cophenetic values). (Mir et al. 2013)
Average leaf depth (N̄) Mean of leaf depths in a tree, where the depth of a leaf is the number of edges from the leaf to the root. (Sackin 1972; Shao and Sokal 1990)
Leaf depth variance (σN²) Variance of leaf depths in a tree. (Sackin 1972; Shao and Sokal 1990; Coronado et al. 2020)
J index Number of internal nodes that are not perfectly balanced in a rooted, binary tree. (Rogers 1996)
Symmetry nodes index (SNI) Number of interior nodes which are not symmetry nodes (i.e., nodes whose two maximal pendant subtrees are symmetrical). (Kersting and Fischer 2021)
Ŝ-shape statistic Sum, over all internal nodes, of log(n − 1), where n is the number of leaves in the clade subtended by the node. (Blum and François 2006)
Maximum depth Maximum depth of any leaf in the tree, where depth is the number of edges from a leaf to the root. (Colijn and Gardy 2014)
stairs/stairs1/staircase-ness measure The proportion of internal nodes whose subtrees are imbalanced to any extent (i.e., whose left child subtends more leaves than the right child, or vice versa). (Norström et al. 2012)
Balance indices*
Mean depth Mean topological distance of each node, including both internal nodes and leaves, to the root. (Herrada et al. 2011)
Rooted Quartet index Balance index based on the symmetry of the evolutionary history of every set of 4 leaves. It is the sum of symmetry values of each 4-tuple of leaves in a tree. (Coronado et al. 2019)
Furnas rank Rank of a tree under the “left-light rooted ranking” system, which assigns an integer rank to each tree shape with a given number of terminal nodes. Balanced trees receive larger values relative to the number of possible distinct topologies. (Furnas 1984; Kirkpatrick and Slatkin 1993)
stairs2 The mean ratio between the numbers of leaves in the smaller and larger pending subtrees over all inner vertices (Fischer et al. 2021), i.e., the average of min(l,r)/max(l,r) over all internal nodes, where l and r are the numbers of leaves in the left and right children. (Norström et al. 2012)
Maximum width Maximum number of nodes at any single depth, d (as defined above), in a tree. (Colijn and Gardy 2014)
Maximum difference in widths Maximum difference in width values at any value for depth, d. (Colijn and Gardy 2014)
B1 index The sum of the inverse heights (i.e., maximum depth of any leaf) of the subtrees rooted at the internal nodes of a tree, excluding the root. (Shao and Sokal 1990)
B2 index Measures the equitability of arriving at the leaves of a tree when starting at the root and assuming equiprobable branching at each inner vertex; a Shannon-Wiener information function (Fischer et al. 2021). (Shao and Sokal 1990)
Other topology indices
Normalized number of cherries Number of cherries (pairs of leaves that share a parent node) divided by half the number of tips in a tree. (McKenzie and Steel 2000)
Area Per Pair (APP) index The average distance between all pairs of leaves in a tree, where distance is the number of edges on the path connecting two leaves. (Lima et al. 2020)
Normalized IL number Normalized number of internal nodes with a single tip child; equivalent to the modified cherry index for rooted, binary trees (Fischer et al. 2021). (Colijn and Gardy 2014)
Average ladder length Mean size of ladders in a tree, where a ladder is a maximal path of connected IL nodes (internal nodes with a single leaf child). (Colijn and Gardy 2014)
Maximum height Maximum height (i.e., maximum depth of any leaf) among all tips in the tree. (Metzig et al. 2019)
Normalized number of pitchforks Number of pitchforks (subtrees with exactly 3 tips), normalized by the size of the tree. (Metzig et al. 2019)
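Several of the indices in Table 1 reduce to simple recursions over subtree sizes and leaf depths. The Python sketch below works on rooted binary trees written as nested 2-tuples of leaf labels, a hypothetical representation. It computes:

- leaf depth variance, taken as a population variance;
- the normalized number of cherries;
- stairs2;
- the per-node Fusco–Cronk I values, from which mean I and median I are taken.

It is an illustrative sketch under these assumptions. It omits the I′ correction for even-sized nodes and is not a substitute for the TreeBalance or phyloTop implementations.

```python
from statistics import mean, median, pvariance


def n_leaves(t):
    return 1 if not isinstance(t, tuple) else n_leaves(t[0]) + n_leaves(t[1])


def internal_nodes(t):
    """Yield every internal node (a 2-tuple) of the tree in preorder."""
    if isinstance(t, tuple):
        yield t
        yield from internal_nodes(t[0])
        yield from internal_nodes(t[1])


def leaf_depths(t, d=0):
    """Number of edges from each leaf to the root."""
    if not isinstance(t, tuple):
        return [d]
    return leaf_depths(t[0], d + 1) + leaf_depths(t[1], d + 1)


def leaf_depth_variance(t):
    return pvariance(leaf_depths(t))  # population variance of leaf depths


def normalized_cherries(t):
    """Cherries (internal nodes whose two children are both leaves) / (n / 2)."""
    cherries = sum(1 for v in internal_nodes(t)
                   if not isinstance(v[0], tuple) and not isinstance(v[1], tuple))
    return cherries / (n_leaves(t) / 2)


def stairs2(t):
    """Mean of min(l, r) / max(l, r) over all internal nodes."""
    ratios = [min(n_leaves(v[0]), n_leaves(v[1])) / max(n_leaves(v[0]), n_leaves(v[1]))
              for v in internal_nodes(t)]
    return mean(ratios)


def i_values(t):
    """Fusco-Cronk I = (B - m) / (M - m) for internal nodes with at least 4 leaves,
    where B is the larger child's leaf count, m = ceil(n / 2) and M = n - 1."""
    values = []
    for v in internal_nodes(t):
        n = n_leaves(v)
        if n < 4:
            continue
        big = max(n_leaves(v[0]), n_leaves(v[1]))
        m = (n + 1) // 2
        values.append((big - m) / (n - 1 - m))
    return values


caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
for name, tree in [("caterpillar", caterpillar), ("balanced", balanced)]:
    iv = i_values(tree)
    print(name, leaf_depth_variance(tree), normalized_cherries(tree),
          round(stairs2(tree), 4), mean(iv), median(iv))
```

For these two 4-tip trees, the caterpillar has higher leaf depth variance and I values and lower cherry and stairs2 values than the balanced tree. This matches the imbalance/balance grouping in Table 1.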

To identify tree indices that are highly sensitive in discriminating between empirical and crBD trees, we simulated trees under a null-model scenario using empirical estimates of the λ and μ parameters. To perform these simulations, crBD parameters were first estimated from empirical trees assuming age-independent speciation and extinction, with the sampling fraction fixed at ρ = 1 to avoid non-identifiability among parameters. We also retained any potential outgroups in the empirical trees to mimic the procedure used by researchers in real-life settings. For each empirical tree, λ and μ were estimated using maximum likelihood (ML) under the model of Nee et al. (1994), as implemented in the phytools R package (Revell 2012). These ML estimates (Fig. 1) were then used to generate a parametric bootstrap distribution. For each empirical tree, we made 1000 simulations in the TreeSim R package (Stadler 2011), version 2.4, with the number of taxa and tree age matched to the empirical values. All data extraction, simulation, and analysis methods were implemented in R and are available on Dryad. Tree indices were calculated using the R packages TreeBalance (Fischer et al. 2021) and phyloTop (Kendall et al. 2023).
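For reference, the following Python sketch writes out a log-likelihood for the branching times of a fully sampled (ρ = 1) tree under the crBD model, conditioned on the crown age. It follows the parameterization of Nee et al. (1994) used by ape's `birthdeath()` function (r = λ − μ, a = μ/λ). Maximization is done here with a crude grid search purely for illustration. The function names are hypothetical, and this is not the phytools implementation used in the study. With μ = 0 the maximum has the closed form λ̂ = (N − 2)/L, where L is the total branch length of the crown tree; the sketch uses this value to set the scale of the grid.

```python
import math


def nee_loglik(branching_times, a, r):
    """Crown-conditioned crBD log-likelihood of branching times with rho = 1.

    r = lambda - mu (net diversification), a = mu / lambda (turnover).
    branching_times holds the N - 1 node ages of a tree with N tips.
    """
    if r <= 0 or not 0 <= a < 1:
        return -math.inf
    x = sorted(branching_times, reverse=True)  # x[0] is the crown (root) age
    n_tips = len(x) + 1
    # log(exp(r * t) - a) written as r * t + log1p(-a * exp(-r * t)) to avoid overflow
    log_terms = sum(r * t + math.log1p(-a * math.exp(-r * t)) for t in x)
    return (math.lgamma(n_tips)               # log((N - 1)!)
            + (n_tips - 2) * math.log(r)
            + r * sum(x[1:])                   # all node ages except the root
            + n_tips * math.log(1 - a)
            - 2 * log_terms)


def yule_rate_mle(branching_times):
    """Closed-form MLE of lambda when mu = 0: (N - 2) / total branch length."""
    x = sorted(branching_times, reverse=True)
    total_length = 2 * x[0] + sum(x[1:])
    return (len(x) - 1) / total_length


def fit_crbd_grid(branching_times, n_grid=100):
    """Crude grid search for (lambda, mu); a real analysis would use an optimizer."""
    r0 = yule_rate_mle(branching_times)
    best_ll, best_a, best_r = -math.inf, 0.0, r0
    for i in range(n_grid):
        a = i / n_grid                       # turnover in [0, 1)
        for j in range(1, n_grid + 1):
            r = 3 * r0 * j / n_grid          # net diversification in (0, 3 * r0]
            ll = nee_loglik(branching_times, a, r)
            if ll > best_ll:
                best_ll, best_a, best_r = ll, a, r
    lam = best_r / (1 - best_a)
    return lam, best_a * lam                 # (lambda, mu)


bt = [2.0, 1.2, 0.7, 0.3]  # hypothetical node ages of a 5-tip tree
print(round(yule_rate_mle(bt), 4))  # 0.4839
print(fit_crbd_grid(bt))
```

Because turnover a is constrained to [0, 1), the estimated μ can sit exactly on its lower bound of zero.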

Figure 1.


Distribution of crBD model parameter values inferred from empirical TimeTree trees (n = 1189 trees) with ρ = 1. (a) Birth parameters, log(λ). (b) Death parameters, log(μ). Of the μ values, 778 (65.4%) were zero. The green dotted line represents the median value, excluding μ values equal to zero.

Topological Comparison of Empirical and Simulated Trees

We tested the hypothesis that empirical trees differ from crBD model simulations, as described by our broad set of statistics. We first compared, for each of the 1189 empirical trees, the index value of the empirical tree with the distribution of index values across its simulated trees. To do this, we calculated the z-score of each empirical tree, that is, the number of standard deviations separating the empirical index value from the mean of the simulated distribution. We next tested whether there were systematic differences in index values between the pooled sets of empirical and simulated trees, under the null hypothesis of no difference. To do this, we used Wilcoxon (Mann–Whitney) U tests on the pooled data across studies, comparing index values from all empirical trees against those from all simulated trees.
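The two comparisons can be sketched in Python as follows. The first is a z-score of one empirical index value against its simulated distribution. The second is a Mann–Whitney U statistic for pooled empirical versus simulated values, reported as the difference from its null expectation n₁n₂/2 as in Fig. 2a. The P-value uses a plain normal approximation without tie or continuity corrections, so R's `wilcox.test` may give slightly different values. The 0.05 family-wise level behind the Bonferroni threshold is an assumption.

```python
import math
from statistics import mean, stdev


def z_score(empirical_value, simulated_values):
    """Standard deviations between an empirical index value and the simulated mean."""
    return (empirical_value - mean(simulated_values)) / stdev(simulated_values)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, U minus its null expectation, two-sided P-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    expected = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    p_value = math.erfc(abs(u1 - expected) / sd / math.sqrt(2))
    return u1, u1 - expected, p_value


BONFERRONI_THRESHOLD = 0.05 / 29  # assumed alpha of 0.05 across the 29 indices

print(z_score(3.0, [1.0, 2.0, 3.0]))
print(mann_whitney_u([4, 5, 6], [1, 2, 3]))
```

A positive U − n₁n₂/2 means the first sample (e.g., empirical trees) tends to have larger index values than the second.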

We found that imbalance indices were significantly greater, and balance indices significantly lower, for empirical trees than for simulated trees (Fig. 2a). This indicates that empirical trees have significantly greater heterogeneity in rates of diversification across lineages than predicted by the crBD model. Notably, there was strong and consistent evidence that leaf depth variance and other depth indices (mean depth, maximum height, maximum depth, and average leaf depth; Table 1) were higher for empirical trees in all scenarios. Empirical trees also tended to have a lower maximum width (i.e., the maximum number of nodes at a given depth) than simulated trees, indicating greater imbalance in empirical trees. These findings are consistent with previous work showing that more balanced trees have lower maximum depth and lower leaf depth variance than imbalanced ones (Sackin 1972; Kirkpatrick and Slatkin 1993; Coronado et al. 2020). The results were consistent even when removing trees with potential outgroups (n = 706 trees without potential outgroups; Supplementary Fig. S1), indicating that the observed differences were not driven by their inclusion.

Figure 2.


(a) Differences between the realized Wilcoxon (Mann–Whitney) U statistic and its expected value under the null hypothesis, with corresponding P-values, for each index (n = 1189 trees). The dotted line marks the Bonferroni-corrected significance threshold. (b) Z-score distributions for median I, the normalized Colless index, leaf depth variance, B2, and stairs2. The green dotted line represents the mean value; the gray dotted line denotes zero.

Overall, the B2 index, leaf depth variance, I indices (specifically mean I and median I), stairs2, and the normalized Colless index were the strongest discriminators of empirical versus crBD-simulated trees (Fig. 2b). In contrast, APP, SNI, the J index, Furnas rank, and the B1 index were poor discriminators, showing little evidence of a difference between empirical and simulated trees. This result is consistent with recent evidence that some indices provide far more detailed descriptions of tree topology than others, with B2 being uniquely powerful for discriminating different processes of diversification (Bienvenu et al. 2021). A full distribution of z-scores for each index can be found in Supplementary Figure S2 on Dryad.

The results reinforce previous findings of greater imbalance in empirical trees than in trees simulated under a range of theoretical models (Losos and Adler 1995; Aldous 1996, 2001; Mooers and Heard 1997; Steel and McKenzie 2001; Pinelis 2003). The substantial imbalance observed in nature can be driven by variation in both speciation and extinction rates, through time and across lineages. Concrete mechanisms to explain differences in tree balance include taxon sampling (Heath et al. 2008), refractory periods after speciation, age-dependent speciation (Hagen et al. 2015), stochastic differences in population sizes (Hubbell 2001; Rosindell et al. 2010), adaptive radiation (i.e., the rapid diversification of a single lineage into many species), random mass extinction (Heard and Mooers 2002), selective extinction of lineages (Kirkpatrick and Slatkin 1993), and changes in selection pressure over time (Stich and Manrubia 2009), with the resulting variation in evolutionary rates between clades leading to less balanced trees. Even species interactions among close and/or distant relatives can influence tree topology to increase imbalance (Donatti et al. 2011; Chamberlain et al. 2014; Bello and Barreto 2021).

It is worth highlighting that B2 is one of the indices that performed best in discriminating empirical from theoretical trees. This statistic has the attractive property of a probabilistic interpretation. It is the Shannon entropy of the probabilities that a process starting at the root, and choosing among descendant branches with equal probability at each node, ends at each of the tips (Shao and Sokal 1990). An intuitive description of this process is to send water down a phylogeny, let it trickle down the edges, and measure the distribution of water among leaves (Bienvenu et al. 2021). This statistic was recently found to be useful for comparing phylogenetic networks and to provide a finer-grained description of tree topology (Bienvenu et al. 2021). We therefore encourage the use of B2 as a metric for measuring imbalance and for testing the performance of tree priors. We speculate that other metrics that performed well in this study will also be useful for diversification model assessment, such as leaf depth variance (Sackin 1972; Shao and Sokal 1990; Coronado et al. 2020) and stairs2 (Norström et al. 2012).
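The water-trickling description translates directly into a recursion. Each node passes its probability mass equally to its children, and B2 is the Shannon entropy of the mass reaching the leaves. Below is a minimal Python sketch, assuming trees written as nested tuples of leaf labels and entropy measured in bits (log base 2). The choice of base is a convention, and other implementations may use a different one.

```python
import math


def b2_index(tree, mass=1.0):
    """B2 = -sum_i p_i * log2(p_i), where p_i is the probability that a walk from the
    root, choosing among child branches with equal probability, ends at leaf i.
    Trees are nested tuples of leaf labels; multifurcations are allowed."""
    if not isinstance(tree, tuple):
        return -mass * math.log2(mass)
    # Split the "water" arriving at this node equally among its children.
    return sum(b2_index(child, mass / len(tree)) for child in tree)


print(b2_index((("a", "b"), ("c", "d"))))  # 2.0 (every leaf receives 1/4)
print(b2_index(((("a", "b"), "c"), "d")))  # 1.75 (leaves receive 1/8, 1/8, 1/4, 1/2)
```

The balanced tree spreads the water evenly and so reaches the maximum entropy for its number of leaves. Imbalance concentrates water on a few shallow leaves and lowers B2, which is why B2 is listed among the balance indices.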

Depth and width also provide an informative description of tree shape and are worth discussing separately. Several biological mechanisms could explain why empirical trees tend to be “deeper” and “narrower” than simulated trees. The mechanisms that explain imbalance more broadly could result in rapid speciation events and subsequent increases in depth within certain clades. By extension, this reduces the maximum width and increases the variance in leaf depths compared with a simulated tree of equal size. In real-world inference settings, depth can also be explained by the existence of polytomies, which can be either soft (i.e., due to insufficient data) or hard (i.e., true polytomies) (Coddington and Scharff 1996). Biologically, certain species could rapidly expand their range while peripheral populations of other species rapidly become isolated. This would lead to rapid speciation events and a subsequently observed hard polytomy (Anacker and Strauss 2014; Lawson et al. 2015). Higher depth and decreased width thus point to strong imbalance in empirically inferred trees compared with crBD-simulated ones. However, while these findings demonstrate the topological differences between empirical and crBD model trees from a theoretical perspective, the degree to which these differences influence phylogenetic inference in practice remains unclear.
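To see why imbalance shows up as greater depth and smaller width, the following Python sketch tabulates the number of nodes at each depth (the width profile) for a perfectly balanced tree and a caterpillar tree with 8 tips each. Trees are written as nested tuples, a hypothetical representation. "Maximum difference in widths" is taken here between consecutive depths, which is an assumption about the exact definition.

```python
from collections import Counter


def node_depths(tree, depth=0):
    """Yield the depth (edges from the root) of every node, internal nodes and leaves."""
    yield depth
    if isinstance(tree, tuple):
        for child in tree:
            yield from node_depths(child, depth + 1)


def width_profile(tree):
    """Number of nodes at each depth 0, 1, ..., maximum depth."""
    counts = Counter(node_depths(tree))
    return [counts[d] for d in range(max(counts) + 1)]


def shape_summary(tree):
    widths = width_profile(tree)
    return {
        "max_depth": len(widths) - 1,
        "max_width": max(widths),
        # assumption: largest change in width between consecutive depths
        "max_width_diff": max(abs(b - a) for a, b in zip(widths, widths[1:])),
    }


def caterpillar(n):
    """Maximally imbalanced tree with tips t1..tn."""
    tree = ("t1", "t2")
    for i in range(3, n + 1):
        tree = (tree, f"t{i}")
    return tree


balanced8 = ((("t1", "t2"), ("t3", "t4")), (("t5", "t6"), ("t7", "t8")))
print(width_profile(balanced8), shape_summary(balanced8))
print(width_profile(caterpillar(8)), shape_summary(caterpillar(8)))
```

For these two 8-tip trees, the balanced tree has widths 1, 2, 4, 8 (maximum depth 3). The caterpillar has width 2 at every depth below the root (maximum depth 7).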

Meta-analysis of Empirically Published Trees

To test the degree to which the crBD prior influences tree (im)balance in real-world inference settings, we conducted a meta-analysis of a random subset of 300 trees from the 1189 empirical trees. For each tree, we collected metadata including the study name, first author name, DOI, year, molecular dating method, and tree prior used in the study (see Supplementary Table S3). One study was excluded because the article lacked information on both the molecular dating method and the tree prior, resulting in 299 included studies. We then classified the molecular dating methods and tree priors into 3 broad method groups: Bayesian (BD) [n = 180], Bayesian (non-BD) [n = 57], and non-Bayesian [n = 62].

To understand whether the inference method and tree prior influenced the topology of the published trees, we fit linear regressions for each of the 5 most sensitive topology indices (B2 index, leaf depth variance, median I, stairs2, and the normalized Colless index). Each regression used the inference method (including tree prior) and the number of tips in the tree as predictor variables. We found no evidence that tree topologies were significantly explained by the inference method (Fig. 3; Supplementary Table S4), suggesting that crBD priors are robust across a broad range of phylogenetic inference scenarios. A similar analysis exploring the specific prior used in Bayesian BD analyses (i.e., Yule, crBD, and non-specified crBD models as priors) yielded similar results (Supplementary Table S4).
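The regression design can be sketched as ordinary least squares with dummy-coded method groups, using non-Bayesian studies as the reference group as in Fig. 3, plus the number of tips as a covariate. This minimal pure-Python sketch returns point estimates only, with no standard errors or confidence intervals. The helper names and the example data are hypothetical, and the study's analysis was carried out in R.

```python
def design_row(method, n_tips):
    """Intercept, dummies for the two Bayesian groups (non-Bayesian = reference), n_tips."""
    return [1.0,
            1.0 if method == "Bayesian (BD)" else 0.0,
            1.0 if method == "Bayesian (non-BD)" else 0.0,
            float(n_tips)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical studies: (method group, number of tips, index value)
studies = [("non-Bayesian", 12, 0.62), ("non-Bayesian", 40, 0.90),
           ("Bayesian (BD)", 25, 0.85), ("Bayesian (BD)", 60, 1.20),
           ("Bayesian (non-BD)", 18, 0.48), ("Bayesian (non-BD)", 33, 0.63)]
X = [design_row(m, n) for m, n, _ in studies]
y = [v for _, _, v in studies]
intercept, b_bd, b_non_bd, b_tips = ols(X, y)
print(intercept, b_bd, b_non_bd, b_tips)
```

Each method coefficient is the estimated shift in the index relative to non-Bayesian studies at a fixed number of tips. A coefficient whose confidence interval overlaps zero corresponds to "no evidence" of a method effect in Fig. 3.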

Figure 3.


(a) Linear regression coefficients and corresponding 95% confidence intervals for the relationship between inference method (including tree prior) and various topology indices (Imbalance indices: median I, normalized Colless index, leaf depth variance; Balance indices: B2 index, stairs2), controlling for the number of tree tips (not included in the figure). Non-Bayesian studies were used as the reference group. There was no evidence that tree topologies were significantly influenced by the inference method for the 299 included empirical studies. (b) Scatter and density plots for the values of the two most discriminatory indices, B2 and leaf depth variance, colored by inference method.

Exploring the Role of the crBD Prior for Highly Imbalanced Trees

To explore the limits of the crBD prior for inference of highly imbalanced phylogenies with finite data, we selected the 100 most imbalanced trees, ranked by the normalized Colless index, from among the trees in the dataset with 10 to 100 tips (n = 855). We then scaled all trees to a height (root age) of 1 and simulated 10-kb sequence alignments for these trees under a Jukes–Cantor (JC) model, using the simSeq function in the R package phangorn (Schliep 2011). This alignment length was chosen to ensure reasonable phylogenetic informativeness. To vary the information content of the alignments, sequences were simulated under 3 different evolutionary rates (0.5, 0.05, and 0.005), which are the expected numbers of substitutions per site from the root to each tip. To place these rates in empirical biological context, a tree height of 1 could correspond to 1 myr (e.g., in data sets of mammals) or 1 year (e.g., in data sets of RNA viruses). In that case, the slowest evolutionary rate of 0.005 is comparable to rates observed in population-level data sampled in birds or mammals (~2 × 10⁻³ substitutions per site per million years [Zhang et al. 2014]), but relatively fast for data sampled across strains of SARS-CoV-2 (10⁻³ to 10⁻⁴ substitutions per site per year [Van Dorp et al. 2020]).
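The simulation step can be illustrated with a minimal Jukes–Cantor sequence simulator in Python. It is not phangorn's `simSeq`. It assumes a hypothetical tree representation in which each node is a `(label_or_children, branch_length)` pair, with the root-to-tip height scaled to 1, so that multiplying branch lengths by `rate` gives expected substitutions per site. Under JC, a site remains unchanged along a branch of length d with probability 1/4 + (3/4)e^(−4d/3). Otherwise it changes to one of the other three bases with equal probability.

```python
import math
import random

BASES = "ACGT"


def jc_evolve(seq, d, rng):
    """Evolve a sequence along a branch with d expected substitutions per site (JC)."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)
    out = []
    for base in seq:
        if rng.random() < p_same:
            out.append(base)
        else:
            out.append(rng.choice([b for b in BASES if b != base]))
    return "".join(out)


def simulate_alignment(tree, n_sites, rate, seed=None):
    """tree: (name, length) for a tip or ([child, ...], length) for an internal node.
    Returns a dict mapping tip names to simulated sequences."""
    rng = random.Random(seed)
    # Root sequence drawn from the JC stationary distribution (equal base frequencies).
    root_seq = "".join(rng.choice(BASES) for _ in range(n_sites))
    alignment = {}

    def recurse(node, seq):
        label, length = node
        seq = jc_evolve(seq, length * rate, rng)
        if isinstance(label, str):
            alignment[label] = seq
        else:
            for child in label:
                recurse(child, seq)

    recurse(tree, root_seq)
    return alignment


# Hypothetical 3-tip tree of height 1: ((a:0.5, b:0.5):0.5, c:1.0)
tree = ([([("a", 0.5), ("b", 0.5)], 0.5), ("c", 1.0)], 0.0)
aln = simulate_alignment(tree, n_sites=1000, rate=0.05, seed=42)
print({name: seq[:20] for name, seq in aln.items()})
```

Lowering `rate` leaves fewer substitutions on short internal branches, which is how the information content of the alignments was varied in the study's design.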

The simulated alignments were then used to re-infer tree topologies in two ways. Bayesian inference with the crBD model as tree prior was implemented in RevBayes (version 1.2.1) (Höhna et al. 2016a). Maximum likelihood inference without a diversification model was implemented in IQ-TREE 2 (version 2.2.0) (Minh et al. 2020). For the RevBayes Markov chain Monte Carlo (MCMC) analyses, we ran two independent chains of 20,000 generations each, with a burn-in of 2000 generations and a tuning interval of 100. We also specified a global molecular clock and a JC substitution model to match the simulation model and avoid substitution model misspecification. Similarly, IQ-TREE 2 analyses were run using a JC substitution model and 1000 bootstrap replicates.

We assessed MCMC convergence by verifying that the effective sample size (ESS) exceeded 200 for all parameters in each run, using the R package coda (version 0.19-4) (Plummer et al. 2006). All analyses showed good convergence, with all ESS values ≥ 750 for each of the 100 inferred trees at all 3 substitution rates (Supplementary Table S5 on Dryad). The resulting ML and maximum clade credibility (MCC) trees for each rate and inference method were then compared with the original empirical trees. The comparison used the 5 most discriminatory topology indices (B2 index, leaf depth variance, median I, stairs2, and the normalized Colless index) (Fig. 4; see Supplementary Fig. S3 for box plots). We also calculated the topological distance between each original empirical tree and its re-inferred tree to measure inference accuracy (1 − normalized Robinson–Foulds distance).
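Topological accuracy can be sketched as 1 minus a normalized Robinson–Foulds distance. The Python sketch below uses the rooted version, comparing the sets of non-trivial clusters (clades) of two rooted trees with the same leaves and normalizing by the total number of clusters. Using rooted clusters rather than unrooted splits is an assumption of this sketch, and the function names are hypothetical.

```python
def clusters(tree):
    """Return (leaf set, set of clusters of internal nodes) for a nested-tuple rooted tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found |= child_clusters
    found.add(leaves)
    return leaves, found


def topological_accuracy(t1, t2):
    """1 - RF / max RF, with RF the size of the symmetric difference of clusters."""
    leaves1, c1 = clusters(t1)
    leaves2, c2 = clusters(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf set")
    c1.discard(leaves1)  # the root cluster is shared by every tree
    c2.discard(leaves2)
    rf = len(c1 ^ c2)
    max_rf = len(c1) + len(c2)
    return 1.0 - rf / max_rf if max_rf else 1.0


true_tree = ((("a", "b"), "c"), "d")
print(topological_accuracy(true_tree, true_tree))                 # 1.0
print(topological_accuracy(true_tree, ((("a", "b"), "d"), "c")))  # 0.5
```

For binary rooted trees with n leaves, each tree contributes n − 2 non-trivial clusters. The accuracy therefore ranges from 0 (no shared clades) to 1 (identical topologies).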

Figure 4.


Differences between trees re-inferred from simulated alignments and the source imbalanced empirical trees, with alignments simulated under 3 substitution rates and trees inferred with IQ-TREE 2 and RevBayes. Whiskers denote error bars. Imbalance indices: (a) median I, (b) normalized Colless index, (c) leaf depth variance; Balance indices: (d) B2 index, (e) stairs2; Distance index: (f) topological accuracy (1 − normalized Robinson–Foulds distance).

Trees from Bayesian inference are more balanced than their corresponding source empirical trees at low substitution rates, but this difference diminishes as the substitution rate increases. We also find that some posterior values were extremely improbable under the prior distribution. This suggests model misspecification rather than the more desirable behavior of a flat prior distribution. Trees inferred under ML often show the opposite pattern, being more imbalanced than the source empirical trees, although this is not consistent across all indices. ML inference also overestimated imbalance compared with Bayesian inference, especially at medium and high substitution rates. Importantly, Bayesian inference led to more accurate tree topologies than ML inference across the set of highly imbalanced trees.

The tendency towards balanced trees in Bayesian inference appears to reflect the prior probability across tree topologies. The distribution of indices across prior-simulated trees shows that the prior is not uniform across values of imbalance, regardless of the metric used (Fig. 5). In fact, the most highly imbalanced or balanced trees are extremely unlikely under the prior (Supplementary Fig. S4), which places disproportionate weight on trees with intermediate balance. Posteriors are consistently more imbalanced than the prior, while trees analyzed under ML show the greatest imbalance according to most metrics (Supplementary Fig. S5). The crBD model is traditionally viewed as uninformative, with every labeled history being equally likely. However, such a prior places limited weight on a ubiquitous phenomenon in empirical data: highly imbalanced trees. We propose that this is problematic because many empirical cases have short branches deep in time, for which even genomic data will contain substantial amounts of error. Clear examples are fast radiations whose phylogenetic tree topology and branching times remain difficult to resolve to this day, such as the early divergences of birds (Jarvis et al. 2014; Prum et al. 2015), insects (Misof et al. 2014), or all metazoans (Dohrmann and Wörheide 2017; Laumer et al. 2019). We suggest that researchers exploring questions involving dramatic changes in net diversification rates, as observed in these early radiations, also report comparisons of the prior and posterior distributions of diversification rates and phylogenetic imbalance. They should also consider performing posterior-predictive tests (e.g., Höhna et al. 2016b) focusing on metrics of imbalance.
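The low prior weight on extreme imbalance can be made concrete. Conditioned on the number of tips, a crBD tree has the same distribution of topologies as a Yule tree, with every labeled history equally likely. Consider growing a tree by repeatedly splitting a uniformly chosen leaf. A caterpillar with k ≥ 2 tips stays a caterpillar only if one of the 2 leaves in its deepest cherry is split. The caterpillar probability is therefore the product of 2/k for k = 2, ..., n − 1, that is, 2^(n−2)/(n−1)!. The Python sketch below checks this formula by simulation; the function names are hypothetical.

```python
import math
import random


def caterpillar_probability(n):
    """Exact probability that a Yule (equivalently, crBD conditioned on n tips)
    tree with n >= 2 tips has the caterpillar (maximally imbalanced) shape."""
    return 2 ** (n - 2) / math.factorial(n - 1)


def simulate_yule_shape(n, rng):
    """Grow a Yule tree by splitting a uniformly chosen leaf until there are n leaves.
    Returns a dict mapping each internal node id to its two child ids."""
    children = {0: (1, 2)}
    leaves = [1, 2]
    next_id = 3
    while len(leaves) < n:
        leaf = leaves.pop(rng.randrange(len(leaves)))
        children[leaf] = (next_id, next_id + 1)
        leaves += [next_id, next_id + 1]
        next_id += 2
    return children


def is_caterpillar(children):
    """A caterpillar is a tree in which every internal node has at least one leaf child."""
    return all(any(c not in children for c in kids) for kids in children.values())


def mc_caterpillar_frequency(n, reps=20000, seed=1):
    rng = random.Random(seed)
    return sum(is_caterpillar(simulate_yule_shape(n, rng)) for _ in range(reps)) / reps


print(caterpillar_probability(5), mc_caterpillar_frequency(5))
print(caterpillar_probability(10))
```

For n = 10 tips, the formula gives 256/9! ≈ 7.1 × 10⁻⁴, roughly one caterpillar in 1400 prior draws. The probability shrinks faster than exponentially as n grows.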

Figure 5.

Density plots for each index (Imbalance indices: normalized Colless index, median I, leaf depth variance; Balance indices: B2 index, stairs2) with rows showing the simulation substitution rates (rate = 0.5, 0.05, 0.005). The prior group values are derived from simulated trees, where the best crBD parameters were inferred from the 100 imbalanced empirical trees. One thousand crBD model trees were then simulated for each set of crBD parameters (with ρ = 1), after which index values were calculated. The prior distribution therefore represents a best-case prior distribution for the given set of 100 imbalanced empirical trees and is the same distribution for each rate row.
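The indices in Figure 5 are functions of tree shape alone. The sketch below computes four of them for a binary tree written as nested tuples, assuming these common definitions: the Colless index normalized by its maximum (n - 1)(n - 2)/2; leaf depth variance as the population variance of tip depths (Coronado et al. 2020); B2 as the sum of d_i 2^(-d_i) over tips at depth d_i (Shao and Sokal 1990; Bienvenu et al. 2021); and stairs2 as the mean, over internal nodes, of the ratio of the smaller to the larger daughter tip count (Norström et al. 2012). Packages such as phyloTop may normalize differently, and median I is omitted, so these are illustrations rather than the exact implementations behind the figure. The helper names are hypothetical.

```python
from statistics import pvariance


def leaf_depths(tree, depth=0):
    """Depths (edges from the root) of every tip; tips are non-tuple values."""
    if not isinstance(tree, tuple):
        return [depth]
    left, right = tree
    return leaf_depths(left, depth + 1) + leaf_depths(right, depth + 1)


def splits(tree):
    """Return (tip count, [(left tips, right tips) for each internal node])."""
    if not isinstance(tree, tuple):
        return 1, []
    left, right = tree
    n_left, s_left = splits(left)
    n_right, s_right = splits(right)
    return n_left + n_right, s_left + s_right + [(n_left, n_right)]


def normalized_colless(tree):
    """Colless index divided by its maximum, so a caterpillar scores 1."""
    n, nodes = splits(tree)
    if n < 3:
        raise ValueError("normalization needs at least 3 tips")
    colless = sum(abs(a - b) for a, b in nodes)
    return colless / ((n - 1) * (n - 2) / 2)


def leaf_depth_variance(tree):
    return pvariance(leaf_depths(tree))


def b2_index(tree):
    """Shannon entropy (base 2) of the probabilities 2^-d of reaching each tip."""
    return sum(d / 2 ** d for d in leaf_depths(tree))


def stairs2(tree):
    """Mean of min/max daughter tip counts; expects at least one internal node."""
    _, nodes = splits(tree)
    return sum(min(a, b) / max(a, b) for a, b in nodes) / len(nodes)


caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
for name, t in [("caterpillar", caterpillar), ("balanced", balanced)]:
    print(name, normalized_colless(t), leaf_depth_variance(t), b2_index(t), stairs2(t))
```

On these four-tip trees the caterpillar gives a normalized Colless index of 1, a leaf depth variance of 0.6875, B2 of 1.75 and stairs2 of about 0.611, whereas the fully balanced tree gives 0, 0, 2 and 1. The imbalance indices are therefore higher for the caterpillar and the balance indices higher for the balanced tree, matching the grouping in the caption.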

Concluding Remarks

Using a large sample of empirically estimated phylogenetic time trees and a wide range of indices, we demonstrate the limits of the crBD prior in practice. First, we confirm previous findings showing theoretical differences in the topological characteristics of empirical and crBD-simulated trees, with crBD model trees being significantly more balanced (Guyer and Slowinski 1991; Losos and Adler 1995; Aldous 1996, 2001; Mooers and Heard 1997; Steel and McKenzie 2001; Pinelis 2003). We also identify several tree topology indices that are powerful for discriminating between empirical and simulated crBD trees, including the B2 index, leaf depth variance, median I, stairs2, and the normalized Colless index. However, in practice, our results show that crBD priors are generally robust across a broad range of phylogenetic inference scenarios: we found no significant differences in inferred tree topologies between studies that used crBD priors and those that used non-Bayesian methods or Bayesian methods with non-BD priors. On the other hand, we find topological limitations of crBD priors when inferring highly imbalanced trees in scenarios with limited information content (i.e., low substitution rates).
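One simple way to express how well an index discriminates empirical from crBD-simulated trees is the probability that a randomly chosen empirical tree has a larger index value than a randomly chosen simulated tree. This is the area under the ROC curve, equivalent to a scaled Mann–Whitney U statistic. It is offered here as an illustration of "discriminatory power", not as the test used in this study, and the function name is hypothetical.

```python
def discrimination_auc(empirical, simulated):
    """P(random empirical value > random simulated value), counting ties as 1/2.

    0.5 means the index does not separate the two groups; values near 0 or 1
    mean strong separation. Uses an O(n*m) double loop for clarity.
    """
    if not empirical or not simulated:
        raise ValueError("both samples must be non-empty")
    score = 0.0
    for e in empirical:
        for s in simulated:
            if e > s:
                score += 1.0
            elif e == s:
                score += 0.5
    return score / (len(empirical) * len(simulated))


# Toy normalized Colless values (made-up numbers, not data from this study).
empirical = [0.30, 0.45, 0.60, 0.20]
simulated = [0.10, 0.15, 0.25, 0.20]
print(discrimination_auc(empirical, simulated))  # 0.90625
```

For balance indices such as B2 or stairs2, where empirical trees are expected to score lower, values near 0 indicate equally strong separation. For large samples, a rank-based computation avoids the quadratic loop.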

Our results provide evidence that the space of highly imbalanced trees is highly realistic and yet largely rejected a priori by common crBD branching models. The space of highly imbalanced trees is also likely to be difficult to explore, because these trees are far less common than trees with intermediate amounts of imbalance. During inference, this is compounded by the greater disparity in branch lengths found in imbalanced trees (Duchêne et al. 2015; Murray et al. 2016) and by the difficulty of efficiently exploring a vast tree space that contains phylogenetic terraces (Sanderson et al. 2015). If imbalanced trees pose a more difficult inference problem, efficiently sampling these regions of tree space is an important objective for future research. Avenues to overcome this difficulty might include terrace-aware methods (Chernomor et al. 2016) or approaches that ameliorate substitution-model performance issues associated with short branches (Kalyaanamoorthy et al. 2017; Duchêne et al. 2018; Crotty et al. 2019). Improvements to Markov chain Monte Carlo (MCMC) methods (Müller and Bouckaert 2020; Zhang et al. 2020) and tree proposal operators (Zhang et al. 2020), as well as altogether different alternatives (Zhang and Matsen 2022), could also be assessed with the metrics proposed here to determine whether new methods provide better coverage of imbalanced tree space. Perhaps the most interesting advances in the field are in the growing list of variations on BD, including models in which lineages are allowed unique rates of speciation and extinction (e.g., multitype birth–death, birth–death shift, and ClaDS models) (Höhna et al. 2019; Maliet et al. 2019; Barido-Sottani et al. 2020; Barido-Sottani and Morlon 2023), time-dependent models (i.e., in which speciation rates change over time) (Paradis 2011), and diversity-dependent extensions of BD priors (Stadler 2013; Hagen et al. 2015). These advances should prove critical for improving topological accuracy, but their benefit might be limited by the computational burden of Bayesian analysis of genome-scale data. For this reason, methods that improve computational efficiency will be fundamental for future implementations of BD models.

Overall, crBD priors remain in standard use in biology for the inference of evolutionary timescales. Our findings demonstrate that crBD priors are generally robust across a broad range of phylogenetic inference scenarios; however, understanding of the model's limitations in certain scenarios is accumulating. We conclusively show that one of these limitations is a difficulty in inferring highly imbalanced tree topologies when phylogenetic informativeness is low, and we propose that model assessment using highly discriminatory metrics is a useful avenue for future model development. Non-BD priors that could describe macroevolutionary processes more accurately, such as Bellman–Harris or Crump–Mode–Jagers models (Jones 2011; Holman 2017; Hagen and Stadler 2018), are examples of promising methods that could be assessed using the metrics proposed here, allowing researchers to evaluate the merit of novel methods in real-world settings.

Supplementary Material

Data available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.2fqz612vg

Acknowledgments

The authors would like to thank Swapnil Mishra for his invaluable input during the analysis stage of the project. We would also like to thank the reviewers for their excellent feedback on earlier drafts of the article.

Contributor Information

Mark P Khurana, Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark.

Neil Scheidwasser-Clow, Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark.

Matthew J Penn, Department of Statistics, University of Oxford, OX1 3LB, Oxford, UK.

Samir Bhatt, Section of Epidemiology, Department of Public Health, University of Copenhagen, 1352 Copenhagen, Denmark; MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, SW7 2AZ, London, UK.

David A Duchêne, Centre for Evolutionary Hologenomics, University of Copenhagen, 1352 Copenhagen, Denmark.

Funding

S.B. acknowledges support from the MRC Centre for Global Infectious Disease Analysis (MR/R015600/1), jointly funded by the UK Medical Research Council (MRC) and the UK Foreign, Commonwealth & Development Office (FCDO), under the MRC/FCDO Concordat agreement, and part of the EDCTP2 programme supported by the European Union. S.B. is funded by the National Institute for Health Research (NIHR) Health Protection Research Unit in Modelling and Health Economics, a partnership between the UK Health Security Agency, Imperial College London and LSHTM (grant code NIHR200908). Disclaimer: “The views expressed are those of the author(s) and not necessarily those of the NIHR, UK Health Security Agency or the Department of Health and Social Care.” S.B. acknowledges support from the Novo Nordisk Foundation via The Novo Nordisk Young Investigator Award (NNF20OC0059309), which also supports N.S. S.B. acknowledges support from the Danish National Research Foundation via a chair grant, which also supports M.K. S.B. acknowledges support from The Eric and Wendy Schmidt Fund for Strategic Innovation via the Schmidt Polymath Award (G-22-63345). M.P. is sponsored by an EPSRC DTP Studentship. D.D. acknowledges support from a European Research Council Marie Sklodowska-Curie fellowship to D.A.D. (H2020-MSCA-IF-2019-883832).

References

  1. Aldous D. 1996. Probability distributions on cladograms. In: Aldous D., Pemantle R., editors. Random Discrete Structures. New York, NY: Springer New York. p. 1–18.
  2. Aldous D.J. 2001. Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Statist. Sci. 16:23–34.
  3. Anacker B.L., Strauss S.Y. 2014. The geography and ecology of plant speciation: range overlap and niche divergence in sister species. Proc. Biol. Sci. 281:20132980.
  4. Andréoletti J., Zwaans A., Warnock R.C.M., Aguirre-Fernández G., Barido-Sottani J., Gupta A., Stadler T., Manceau M. 2022. The occurrence birth–death process for combined-evidence analysis in macroevolution and epidemiology. Syst. Biol. 71:1440–1452.
  5. Attwood S.W., Hill S.C., Aanensen D.M., Connor T.R., Pybus O.G. 2022. Phylogenetic and phylodynamic approaches to understanding and combating the early SARS-CoV-2 pandemic. Nat. Rev. Genet. 23:547–562.
  6. Barido-Sottani J., Morlon H. 2023. The ClaDS rate-heterogeneous birth–death prior for full phylogenetic inference in BEAST2. Syst. Biol. 72:syad027.
  7. Barido-Sottani J., Vaughan T.G., Stadler T. 2020. A multitype birth–death model for Bayesian inference of lineage-specific birth and death rates. Syst. Biol. 69:973–986.
  8. Bello C., Barreto E. 2021. The footprint of evolution in seed dispersal interactions. Science 372:682–683.
  9. Bienvenu F., Cardona G., Scornavacca C. 2021. Revisiting Shao and Sokal’s B2 index of phylogenetic balance. J. Math. Biol. 83:52.
  10. Blum M.G.B., François O. 2006. Which random processes describe the Tree of Life? A large-scale study of phylogenetic tree imbalance. Syst. Biol. 55:685–691.
  11. Blum M.G.B., François O., Janson S. 2006. The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance. Ann. Appl. Probab. 16:2195–2214.
  12. Bocharov S., Harris S., Kominek E., Mooers A.O., Steel M. 2022. Predicting long pendant edges in model phylogenies, with applications to biodiversity and tree inference. Syst. Biol. 72:syac059.
  13. Brown J.M., Thomson R.C. 2018. Evaluating model performance in evolutionary biology. Annu. Rev. Ecol. Evol. Syst. 49:95–114.
  14. Chamberlain S., Vázquez D.P., Carvalheiro L., Elle E., Vamosi J.C. 2014. Phylogenetic tree shape and the structure of mutualistic networks. J. Ecol. 102:1234–1243.
  15. Chernomor O., von Haeseler A., Minh B.Q. 2016. Terrace aware data structure for phylogenomic inference from supermatrices. Syst. Biol. 65:997–1008.
  16. Coddington J.A., Scharff N. 1996. Problems with “soft” polytomies. Cladistics 12:139–145.
  17. Colijn C., Gardy J. 2014. Phylogenetic tree shapes resolve disease transmission patterns. Evol. Med. Public Health 2014:96–108.
  18. Colijn C., Plazzotta G. 2018. A metric on phylogenetic tree shapes. Syst. Biol. 67:113–126.
  19. Colless D.H., Wiley E.O. 1982. Review of phylogenetics: the theory and practice of phylogenetic systematics. Syst. Zool. 31:100–104.
  20. Coronado T., Mir A., Rosselló F., Rotger L. 2020. On Sackin’s original proposal: the variance of the leaves’ depths as a phylogenetic balance index. BMC Bioinf. 21:154.
  21. Coronado T.M., Mir A., Rosselló F., Valiente G. 2019. A balance index for phylogenetic trees based on rooted quartets. J. Math. Biol. 79:1105–1148.
  22. Crotty S.M., Minh B.Q., Bean N.G., Holland B.R., Tuke J., Jermiin L.S., Haeseler A.V. 2019. GHOST: Recovering historical signal from heterotachously evolved sequence alignments. Syst. Biol. 69:syz051.
  23. Dayarian A., Shraiman B.I. 2014. How to infer relative fitness from a sample of genomic sequences. Genetics 197:913–923.
  24. Dohrmann M., Wörheide G. 2017. Dating early animal evolution using phylogenomic data. Sci. Rep. 7:3599.
  25. Donatti C.I., Guimarães P.R., Galetti M., Pizo M.A., Marquitti F.M.D., Dirzo R. 2011. Analysis of a hyper-diverse seed dispersal network: modularity and underlying mechanisms. Ecol. Lett. 14:773–781.
  26. Duchêne D.A., Duchêne S., Ho S.Y.W. 2015. Tree imbalance causes a bias in phylogenetic estimation of evolutionary timescales using heterochronous sequences. Mol. Ecol. Resour. 15:785–794.
  27. Duchêne D.A., Duchêne S., Ho S.Y.W. 2018. Differences in performance among test statistics for assessing phylogenomic model adequacy. Genome Biol. Evol. 10:1375–1388.
  28. Duchene S., Bouckaert R., Duchene D.A., Stadler T., Drummond A.J. 2019. Phylodynamic model adequacy using posterior predictive simulations. Syst. Biol. 68:358–364.
  29. Fiala K.L., Sokal R.R. 1985. Factors determining the accuracy of cladogram estimation: evaluation using computer simulation. Evolution 39:609–622.
  30. Fischer M., Herbst L., Kersting S., Kühn L., Wicke K. 2021. Tree balance indices: a comprehensive survey. arXiv 2109.12281, doi: 10.48550/arXiv.2109.12281.
  31. Furnas G.W. 1984. The generation of random, binary unordered trees. J. Classif. 1:187–233.
  32. Fusco G., Cronk Q.C.B. 1995. A new method for evaluating the shape of large phylogenies. J. Theor. Biol. 175:235–243.
  33. Gernhard T. 2008. The conditioned reconstructed process. J. Theor. Biol. 253:769–778.
  34. Guyer C., Slowinski J.B. 1991. Comparisons of observed phylogenetic topologies with null expectations among three monophyletic lineages. Evolution 45:340–350.
  35. Hagen O., Hartmann K., Steel M., Stadler T. 2015. Age-dependent speciation can explain the shape of empirical phylogenies. Syst. Biol. 64:432–440.
  36. Hagen O., Stadler T. 2018. TreeSimGM: Simulating phylogenetic trees under general Bellman–Harris models with lineage-specific shifts of speciation and extinction in R. Methods Ecol. Evol. 9:754–760.
  37. Heard S.B. 1992. Patterns in tree balance among cladistic, phenetic, and randomly generated phylogenetic trees. Evolution 46:1818–1826.
  38. Heard S.B., Mooers A.O. 2002. Signatures of random and selective mass extinctions in phylogenetic tree balance. Syst. Biol. 51:889–897.
  39. Heath T.A., Zwickl D.J., Kim J., Hillis D.M. 2008. Taxon sampling affects inferences of macroevolutionary processes from phylogenetic trees. Syst. Biol. 57:160–166.
  40. Herrada A., Eguíluz V.M., Hernández-García E., Duarte C.M. 2011. Scaling properties of protein family phylogenies. BMC Evol. Biol. 11:155.
  41. Hey J. 1992. Using phylogenetic trees to study speciation and extinction. Evolution 46:627–640.
  42. Höhna S., Freyman W.A., Nolen Z., Huelsenbeck J.P., May M.R., Moore B.R. 2019. A Bayesian approach for estimating branch-specific speciation and extinction rates. bioRxiv. doi: 10.1101/555805.
  43. Höhna S., Landis M.J., Heath T.A., Boussau B., Lartillot N., Moore B.R., Huelsenbeck J.P., Ronquist F. 2016a. RevBayes: Bayesian phylogenetic inference using graphical models and an interactive model-specification language. Syst. Biol. 65:726–736.
  44. Höhna S., May M.R., Moore B.R. 2016b. TESS: an R package for efficiently simulating phylogenetic trees and performing Bayesian inference of lineage diversification rates. Bioinformatics 32:789–791.
  45. Holman E.W. 2017. Age-dependent and lineage-dependent speciation and extinction in the imbalance of phylogenetic trees. Syst. Biol. 66:912–916.
  46. Hubbell S.P. 2001. The unified neutral theory of biodiversity and biogeography. Princeton: Princeton University Press.
  47. Huelsenbeck J.P., Kirkpatrick M. 1996. Do phylogenetic methods produce trees with biased shapes? Evolution 50:1418–1424.
  48. Jarvis E.D., Mirarab S., Aberer A.J., Li B., Houde P., Li C., Ho S.Y.W., Faircloth B.C., Nabholz B., Howard J.T., Suh A., Weber C.C., Da Fonseca R.R., Li J., Zhang F., Li H., Zhou L., Narula N., Liu L., Ganapathy G., Boussau B., Bayzid Md S., Zavidovych V., Subramanian S., Gabaldón T., Capella-Gutiérrez S., Huerta-Cepas J., Rekepalli B., Munch K., Schierup M., Lindow B., Warren W.C., Ray D., Green R.E., Bruford M.W., Zhan X., Dixon A., Li S., Li N., Huang Y., Derryberry E.P., Bertelsen M.F., Sheldon F.H., Brumfield R.T., Mello C.V., Lovell P.V., Wirthlin M., Schneider M.P.C., Prosdocimi F., Samaniego J.A., Velazquez A.M.V., Alfaro-Núñez A., Campos P.F., Petersen B., Sicheritz-Ponten T., Pas A., Bailey T., Scofield P., Bunce M., Lambert D.M., Zhou Q., Perelman P., Driskell A.C., Shapiro B., Xiong Z., Zeng Y., Liu S., Li Z., Liu B., Wu K., Xiao J., Yinqi X., Zheng Q., Zhang Y., Yang H., Wang J., Smeds L., Rheindt F.E., Braun M., Fjeldsa J., Orlando L., Barker F.K., Jønsson K.A., Johnson W., Koepfli K.-P., O’Brien S., Haussler D., Ryder O.A., Rahbek C., Willerslev E., Graves G.R., Glenn T.C., McCormack J., Burt D., Ellegren H., Alström P., Edwards S.V., Stamatakis A., Mindell D.P., Cracraft J., Braun E.L., Warnow T., Jun W., Gilbert M.T.P., Zhang G. 2014. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346:1320–1331.
  49. Jones G.R. 2011. Tree models for macroevolution and phylogenetic analysis. Syst. Biol. 60:735–746.
  50. Kalyaanamoorthy S., Minh B.Q., Wong T.K.F., von Haeseler A., Jermiin L.S. 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14:587–589.
  51. Kendall D.G. 1948. On the generalized “Birth-and-Death” process. Ann. Math. Statist. 19:1–15.
  52. Kendall M., Boyd M., Colijn C. 2023. phyloTop. Available from https://michellekendall.github.io/phyloTop/
  53. Kersting S.J., Fischer M. 2021. Measuring tree balance using symmetry nodes — A new balance index and its extremal properties. Math. Biosci. 341:108690.
  54. Kirkpatrick M., Slatkin M. 1993. Searching for evolutionary patterns in the shape of a phylogenetic tree. Evolution 47:1171–1181.
  55. Kumar S., Suleski M., Craig J.M., Kasprowicz A.E., Sanderford M., Li M., Stecher G., Hedges S.B. 2022. TimeTree 5: An expanded resource for species divergence times. Mol. Biol. Evol. 39:msac174.
  56. Laumer C.E., Fernández R., Lemer S., Combosch D., Kocot K.M., Riesgo A., Andrade S.C.S., Sterrer W., Sørensen M.V., Giribet G. 2019. Revisiting metazoan phylogeny with genomic sampling of all phyla. Proc. Biol. Sci. 286:20190831.
  57. Lawson L.P., Bates J.M., Menegon M., Loader S.P. 2015. Divergence at the edges: peripatric isolation in the montane spiny throated reed frog complex. BMC Evol. Biol. 15:128.
  58. Lemant J., Le Sueur C., Manojlović V., Noble R. 2022. Robust, universal tree balance indices. Syst. Biol. 71:1210–1224.
  59. Lima T.A., Marquitti F.M.D., de Aguiar M.A.M. 2020. Measuring tree balance with normalized tree area. arXiv 2008.12867, doi: 10.48550/arXiv.2008.12867.
  60. Losos J.B., Adler F.R. 1995. Stumped by trees? A generalized null model for patterns of organismal diversity. Am. Nat. 145:329–342.
  61. Maia L.P., Colato A., Fontanari J.F. 2004. Effect of selection on the topology of genealogical trees. J. Theor. Biol. 226:315–320.
  62. Maliet O., Hartig F., Morlon H. 2019. A model with many small shifts for estimating species-specific diversification rates. Nat. Ecol. Evol. 3:1086–1092.
  63. McKenzie A., Steel M. 2000. Distributions of cherries for two models of trees. Math. Biosci. 164:81–92.
  64. Metzig C., Ratmann O., Bezemer D., Colijn C. 2019. Phylogenies from dynamic networks. PLoS Comput. Biol. 15:e1006761.
  65. Minh B.Q., Schmidt H.A., Chernomor O., Schrempf D., Woodhams M.D., Von Haeseler A., Lanfear R. 2020. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37:1530–1534.
  66. Mir A., Rosselló F., Rotger L. 2013. A new balance index for phylogenetic trees. Math. Biosci. 241:125–136.
  67. Mir A., Rotger L., Rosselló F. 2018. Sound Colless-like balance indices for multifurcating trees. PLoS One 13:e0203401.
  68. Misof B., Liu S., Meusemann K., Peters R.S., Donath A., Mayer C., Frandsen P.B., Ware J., Flouri T., Beutel R.G., Niehuis O., Petersen M., Izquierdo-Carrasco F., Wappler T., Rust J., Aberer A.J., Aspöck U., Aspöck H., Bartel D., Blanke A., Berger S., Böhm A., Buckley T.R., Calcott B., Chen J., Friedrich F., Fukui M., Fujita M., Greve C., Grobe P., Gu S., Huang Y., Jermiin L.S., Kawahara A.Y., Krogmann L., Kubiak M., Lanfear R., Letsch H., Li Y., Li Z., Li J., Lu H., Machida R., Mashimo Y., Kapli P., McKenna D.D., Meng G., Nakagaki Y., Navarrete-Heredia J.L., Ott M., Ou Y., Pass G., Podsiadlowski L., Pohl H., Von Reumont B.M., Schütte K., Sekiya K., Shimizu S., Slipinski A., Stamatakis A., Song W., Su X., Szucsich N.U., Tan M., Tan X., Tang M., Tang J., Timelthaler G., Tomizuka S., Trautwein M., Tong X., Uchifune T., Walzl M.G., Wiegmann B.M., Wilbrandt J., Wipfler B., Wong T.K.F., Wu Q., Wu G., Xie Y., Yang S., Yang Q., Yeates D.K., Yoshizawa K., Zhang Q., Zhang R., Zhang W., Zhang Y., Zhao J., Zhou C., Zhou L., Ziesmann T., Zou S., Li Y., Xu X., Zhang Y., Yang H., Wang J., Wang J., Kjer K.M., Zhou X. 2014. Phylogenomics resolves the timing and pattern of insect evolution. Science 346:763–767.
  69. Mooers A.O., Heard S.B. 1997. Inferring evolutionary process from phylogenetic tree shape. Q. Rev. Biol. 72:31–54.
  70. Morlon H. 2014. Phylogenetic approaches for studying diversification. Ecol. Lett. 17:508–525.
  71. Müller N.F., Bouckaert R.R. 2020. Adaptive Metropolis-coupled MCMC for BEAST 2. PeerJ 8:e9473.
  72. Murray G.G.R., Wang F., Harrison E.M., Paterson G.K., Mather A.E., Harris S.R., Holmes M.A., Rambaut A., Welch J.J. 2016. The effect of genetic structure on molecular dating and tests for temporal signal. Methods Ecol. Evol. 7:80–89.
  73. Nee S. 2006. Birth-death models in macroevolution. Annu. Rev. Ecol. Evol. Syst. 37:1–17.
  74. Nee S., May R.M., Harvey P.H. 1994. The reconstructed evolutionary process. Phil. Trans. R. Soc. Lond. B 344:305–311.
  75. Norström M.M., Prosperi M.C.F., Gray R.R., Karlsson A.C., Salemi M. 2012. PhyloTempo: A set of R scripts for assessing and visualizing temporal clustering in genealogies inferred from serially sampled viral sequences. Evol. Bioinform. Online 8:261–269.
  76. Paradis E. 2011. Time-dependent speciation and extinction from phylogenies: A least squares approach. Evolution 65:661–672.
  77. Phillimore A.B., Price T.D. 2008. Density-dependent cladogenesis in birds. PLoS Biol. 6:e71.
  78. Pinelis I. 2003. Evolutionary models of phylogenetic trees. Proc. Biol. Sci. 270:1425–1431.
  79. Plummer M., Best N., Cowles K., Vines K. 2006. CODA: convergence diagnosis and output analysis for MCMC. R News 6:7–11.
  80. Prum R.O., Berv J.S., Dornburg A., Field D.J., Townsend J.P., Lemmon E.M., Lemmon A.R. 2015. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 526:569–573.
  81. Purvis A., Katzourakis A., Agapow P.-M. 2002. Evaluating phylogenetic tree shape: two modifications to Fusco & Cronk’s method. J. Theor. Biol. 214:99–103.
  82. Revell L.J. 2012. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3:217–223.
  83. Ritchie A.M., Ho S.Y.W. 2019. Influence of the tree prior and sampling scale on Bayesian phylogenetic estimates of the origin times of language families. J. Lang. Evol. 4:108–123.
  84. Ritchie A.M., Lo N., Ho S.Y.W. 2016. The impact of the tree prior on molecular dating of data sets containing a mixture of inter- and intraspecies sampling. Syst. Biol. 66:syw095.
  85. Rogers J.S. 1996. Central moments and probability distributions of three measures of phylogenetic tree imbalance. Syst. Biol. 45:99–110.
  86. Rohlf F.J., Chang W.S., Sokal R.R., Kim J. 1990. Accuracy of estimated phylogenies: effects of tree topology and evolutionary model. Evolution 44:1671–1684.
  87. Rosindell J., Cornell S.J., Hubbell S.P., Etienne R.S. 2010. Protracted speciation revitalizes the neutral theory of biodiversity. Ecol. Lett. 13:716–727.
  88. Sackin M.J. 1972. “Good” and “bad” phenograms. Syst. Biol. 21:225–226.
  89. Sanderson M.J., McMahon M.M., Stamatakis A., Zwickl D.J., Steel M. 2015. Impacts of terraces on phylogenetic inference. Syst. Biol. 64:709–726.
  90. Sarver B.A.J., Pennell M.W., Brown J.W., Keeble S., Hardwick K.M., Sullivan J., Harmon L.J. 2019. The choice of tree prior and molecular clock does not substantially affect phylogenetic inferences of diversification rates. PeerJ 7:e6334.
  91. Schliep K.P. 2011. phangorn: phylogenetic analysis in R. Bioinformatics 27:592–593.
  92. Shao K.T., Sokal R.R. 1990. Tree balance. Syst. Biol. 39:266–276.
  93. Stadler T. 2011. Simulating trees with a fixed number of extant species. Syst. Biol. 60:676–684.
  94. Stadler T. 2013. Recovering speciation and extinction dynamics based on phylogenies. J. Evol. Biol. 26:1203–1219.
  95. Steel M., McKenzie A. 2001. Properties of phylogenetic trees generated by Yule-type speciation models. Math. Biosci. 170:91–112.
  96. Stich M., Manrubia S.C. 2009. Topological properties of phylogenetic trees in evolutionary models. Eur. Phys. J. B 70:583–592.
  97. Thompson E.A. 1975. Human evolutionary trees. Cambridge [Eng.]; New York: Cambridge University Press.
  98. Tomiuk J., Loeschcke V. 1994. On the application of birth-death models in conservation biology. Conserv. Biol. 8:574–576.
  99. Van Dorp L., Acman M., Richard D., Shaw L.P., Ford C.E., Ormond L., Owen C.J., Pang J., Tan C.C.S., Boshier F.A.T., Ortiz A.T., Balloux F. 2020. Emergence of genomic diversity and recurrent mutations in SARS-CoV-2. Infect. Genet. Evol. 83:104351.
  100. Yule G. 1925. II—A mathematical theory of evolution, based on the conclusions of Dr J C Willis, F R S. Phil. Trans. R. Soc. Lond. B 213:21–87.
  101. Zhang C., Huelsenbeck J.P., Ronquist F. 2020. Using parsimony-guided tree proposals to accelerate convergence in Bayesian phylogenetic inference. Syst. Biol. 69:1016–1032.
  102. Zhang C., Matsen F.A. 2022. A variational approach to Bayesian phylogenetic inference. arXiv 2204.07747, doi: 10.48550/arXiv.2204.07747.
  103. Zhang G., Li C., Li Q., Li B., Larkin D.M., Lee C., Storz J.F., Antunes A., Greenwold M.J., Meredith R.W., Ödeen A., Cui J., Zhou Q., Xu L., Pan H., Wang Z., Jin L., Zhang P., Hu H., Yang W., Hu J., Xiao J., Yang Z., Liu Y., Xie Q., Yu H., Lian J., Wen P., Zhang F., Li H., Zeng Y., Xiong Z., Liu S., Zhou L., Huang Z., An N., Wang J., Zheng Q., Xiong Y., Wang G., Wang B., Wang J., Fan Y., Da Fonseca R.R., Alfaro-Núñez A., Schubert M., Orlando L., Mourier T., Howard J.T., Ganapathy G., Pfenning A., Whitney O., Rivas M.V., Hara E., Smith J., Farré M., Narayan J., Slavov G., Romanov M.N., Borges R., Machado J.P., Khan I., Springer M.S., Gatesy J., Hoffmann F.G., Opazo J.C., Håstad O., Sawyer R.H., Kim H., Kim K.-W., Kim H.J., Cho S., Li N., Huang Y., Bruford M.W., Zhan X., Dixon A., Bertelsen M.F., Derryberry E., Warren W., Wilson R.K., Li S., Ray D.A., Green R.E., O’Brien S.J., Griffin D., Johnson W.E., Haussler D., Ryder O.A., Willerslev E., Graves G.R., Alström P., Fjeldså J., Mindell D.P., Edwards S.V., Braun E.L., Rahbek C., Burt D.W., Houde P., Zhang Y., Yang H., Wang J., Jarvis E.D., Gilbert M.T.P., Wang J., Ye C., Liang S., Yan Z., Zepeda M.L., Campos P.F., Velazquez A.M.V., Samaniego J.A., Avila-Arcos M., Martin M.D., Barnett R., Ribeiro A.M., Mello C.V., Lovell P.V., Almeida D., Maldonado E., Pereira J., Sunagar K., Philip S., Dominguez-Bello M.G., Bunce M., Lambert D., Brumfield R.T., Sheldon F.H., Holmes E.C., Gardner P.P., Steeves T.E., Stadler P.F., Burge S.W., Lyons E., Smith J., McCarthy F., Pitel F., Rhoads D., Froman D.P.; Avian Genome Consortium. 2014. Comparative genomics reveals insights into avian genome evolution and adaptation. Science 346:1311–1320.
