Abstract
The branching structure of biological evolution confers statistical dependencies on phenotypic trait values in related organisms. For this reason, comparative macroevolutionary studies usually begin with an inferred phylogeny that describes the evolutionary relationships of the organisms of interest. The probability of the observed trait data can be computed by assuming a model for trait evolution, such as Brownian motion, over the branches of this fixed tree. However, the phylogenetic tree itself contributes statistical uncertainty to estimates of rates of phenotypic evolution, and many comparative evolutionary biologists regard the tree as a nuisance parameter. In this article, we present a framework for analytically integrating over unknown phylogenetic trees in comparative evolutionary studies by assuming that the tree arises from a continuous-time Markov branching model called the Yule process. To do this, we derive a closed-form expression for the distribution of phylogenetic diversity (PD), which is the sum of branch lengths connecting the species in a clade. We then present a generalization of PD which is equivalent to the expected trait disparity in a set of taxa whose evolutionary relationships are generated by a Yule process and whose traits evolve by Brownian motion. We find expressions for the distribution of expected trait disparity under a Yule tree. Given one or more observations of trait disparity in a clade, we perform fast likelihood-based estimation of the Brownian variance for unresolved clades. Our method does not require simulation or a fixed phylogenetic tree. We conclude with a brief example illustrating Brownian rate estimation for 12 families in the mammalian order Carnivora, in which the phylogenetic tree for each family is unresolved. [Brownian motion; comparative method; Markov reward process; phylogenetic diversity; pure-birth process; quantitative trait evolution; trait disparity; Yule process.]
Evolutionary relationships between organisms induce statistical dependencies in their phenotypic traits (Felsenstein 1985). Closely related species that have been evolving separately for only a short time will generally have similar trait values, and species whose most recent common ancestor is more distant will often have dissimilar trait values (Harvey and Pagel 1991). However, the origins of phenotypic diversity are still poorly understood (Eldredge et al. 1972; Gould and Eldredge 1977; Ricklefs 2006; Bokma 2010). Even simple idealized models of evolutionary change can give rise to highly varying phenotype values (Foote 1993; Sidlauskas 2007), and researchers disagree about the relative importance of time, the rate of speciation, and the rate of phenotypic evolution in generating phenotypic diversity (Purvis 2004; Ricklefs 2004, 2006).
Comparative phylogenetic studies seek to explain phenotypic differences between groups of taxa, and stochastic models of evolutionary change have assisted in this task. Researchers often treat phenotypic evolution as a Brownian motion process occurring independently along the branches of a fixed macroevolutionary tree (Felsenstein 1985). In comparative studies, the Brownian motion model of trait evolution has a convenient consequence: given an evolutionary tree topology and branching times, the trait values at the concurrently observed tips of the tree have a multivariate normal distribution. Brownian motion on a fixed phylogenetic tree is the basis for the most popular regression-based methods for comparative inference and hypothesis testing (Grafen 1989; Garland et al. 1992; Martins and Hansen 1997; Blomberg et al. 2003; O'Meara et al. 2006; Revell 2010). In the regression approach, inference of evolutionary parameters of interest becomes a 2-step process: first, one must infer a phylogenetic tree; then, conditional on that tree, one estimates relevant evolutionary parameters, usually by maximizing the likelihood of the observed trait data under the model for trait evolution. Unfortunately, the uncertainty involved in estimating the tree propagates into the comparative analysis in a way that is difficult to account for (but see Stone 2011), and comparative researchers often lack a precise phylogenetic tree on which to base a regression analysis of trait data. Modern techniques for dealing with this issue generally resort to simulation. Some researchers simulate a large number of possible trees and estimate parameters conditional on a single representative tree, such as the maximum clade credibility tree (see, e.g., Alfaro et al. 2009). Alternative approaches that utilize simultaneous simulation of trees and parameters via Bayesian methods are gaining in popularity (Sidlauskas 2007; Drummond et al. 2012; Slater et al. 2012).
However, simulation methods can be extremely slow and may require assumptions about prior distributions of unknown parameters that are difficult to justify. Indeed, in macroevolutionary studies, the phylogenetic tree is often not of interest per se, but must be taken into account in order to accurately model the dependency of the traits under consideration. Many comparative phylogeneticists regard the evolutionary tree as a nuisance parameter in the larger evolutionary statistical model. For this reason, there is increased interest in tree-free methods of comparative analysis that preserve information about the variance of phenotypic values within unresolved clades (Bokma 2010).
To develop a method for comparative inference in evolutionary studies that do not rely on a particular tree, it is convenient to specify a generative model for phylogenetic trees. In the Yule (pure-birth) process, every existing species independently gives birth with instantaneous rate λ; when there are n species, the total rate of speciation is nλ (Yule 1925). The Yule process is widely used as a null model in evolutionary hypothesis testing and can provide a plausible prior distribution on the space of evolutionary trees in Bayesian phylogenetic inference (Nee et al. 1994; Rannala and Yang 1996; Nee 2006). One can easily derive finite-time transition probabilities (Bailey 1964), and efficient methods exist to simulate samples from the distribution of Yule trees, conditional on tree age, number of species, or both (Stadler 2011). Interestingly, some researchers have pointed out that even the simple Yule process can have unexpected properties that may be relevant in evolutionary theory and reconstruction (Gernhard et al. 2008; Steel and Mooers 2010).
Due to Yule trees' simple Markov branching structure and analytically tractable transition probabilities, many researchers have made progress in characterizing summary properties of the Yule process—that is, integrating over all Yule tree realizations. For example, Steel and McKenzie (2002) study aspects of the shape of phylogenies under the Yule model, such as the distribution of the number of edges separating a subset of the extant taxa from the MRCA; Gernhard et al. (2008) find distributions of branch lengths; Steel and Mooers (2010) study the expected length of pendant and interior edges of Yule trees; and Steel and McKenzie (2001) and Mulder (2011) study the distribution of the number of internal nodes separating taxa.
One important summary statistic for trees in biodiversity applications is phylogenetic diversity (PD), defined as the sum of all branch lengths in the evolutionary tree connecting the species in a clade (Faith 1992). Applied researchers in evolutionary biology have found PD to be useful in conservation and biodiversity applications; see, for example, Webb et al. (2002), Moritz (2002), and Turnbaugh et al. (2008). PD also has the virtue of being a mathematically tractable statistic for analyzing the properties of phylogenetic trees. For example, Faller et al. (2008) show that the asymptotic distribution (as the number of taxa n → ∞) of PD is normal and give a recursion for computing the distribution of PD where edge lengths are integral. Mooers et al. (2011) discuss branch lengths on Yule trees and expected loss of PD in conservation applications. Most importantly for our study, Stadler and Steel (2012) find the moment-generating function for PD conditional on n extant taxa and tree age t under the Yule model. Following on these inspiring results, we seek now to study analytic properties of Yule trees that are useful for comparative evolutionary studies.
In this article, we present a framework for computing probability distributions related to diversity and quantitative trait evolution over unresolved Yule trees and describe methods for estimating related parameters. We first give a mathematical description of the Yule model of speciation and briefly discuss its properties. Next, we introduce the Markov reward process, a probabilistic method for deriving probability distributions related to the accumulation of diversity under a Yule model. In Theorem 1, we give an expression for the probability distribution of PD under a Yule model, conditional on the number of species n, time to the most recent common ancestor (TMRCA) or tree age t, and speciation rate λ. In Theorem 2, we show that this probability distribution approaches a normal distribution as the number of taxa n becomes large. We then demonstrate an important and previously unappreciated relationship between trait disparity, the sample variance for a group of taxa (O'Meara et al. 2006), and PD for traits evolving on a Yule tree via Brownian motion. Theorem 3 gives an expression for the distribution of expected trait disparity when integrating over the branch lengths of a Yule tree. Next, we describe a statistical method for performing fast maximum-likelihood estimation of Brownian variance, given an unresolved clade and observed trait disparity. Our approach does not require fixing a phylogenetic tree or specification of prior probabilities for unknown parameters. The method is simulation-free and does not seek to infer branch lengths or ancestral states. We show empirically that our estimators are asymptotically consistent. We provide software and data files for reproducing the results in this article in the Dryad repository (doi:10.5061/dryad.2cb46). We conclude with an application of our method to body size evolution in the mammalian order Carnivora.
Mathematical Background
To aid in exposition, we briefly establish some notation. Denote the topology of a phylogenetic tree by τ. A topology is the shape of a tree, disregarding branch lengths or age. We always condition our calculations on the phylogenetic tree having age t with n extant taxa. Let t1, t2,... denote the branching points of a tree, where tk is the time of branching from k to k + 1 lineages. We measure time in the forward direction, so at the TMRCA, t = 0. This is in keeping with our mechanistic orientation: the Yule process, to be developed below, runs forward in time from 0 to t.
Yule Processes
Let Y(t) ∈ {1,2,...} be a Yule process with birth rate λ that keeps track of the number of species at time t. The transition probabilities Pmn(t) = Pr(Y(t) = n | Y(0) = m) satisfy the Kolmogorov forward equations
$$\frac{dP_{mn}(t)}{dt} = (n-1)\lambda\,P_{m,n-1}(t) - n\lambda\,P_{mn}(t) \tag{1}$$
for n ≥ 1. This infinite system of ordinary differential equations can be solved to yield closed forms for the finite-time transition probabilities,
$$P_{mn}(t) = \binom{n-1}{m-1}\,e^{-\lambda m t}\left(1 - e^{-\lambda t}\right)^{n-m} \tag{2}$$
which have a negative binomial form (Bailey 1964). This expression was also derived in an evolutionary context by Foote et al. (1999). In the Yule process, we are only concerned with the number of species that exist at any moment in time, not their genealogy. That is, we assume that the lineage that branches is chosen uniformly from all extant lineages. We study Yule processes starting with any number of lineages at the beginning of time of observation; if m = 1, then t is the stem age, if m = 2, t is the crown age, and m ≥ 3 allows for polytomies at the root. The transition probability (2) is useful for performing statistical inference: suppose we know the branching rate λ and the age t of a tree, and we observe Y(t) = n. Then Equation (2) gives the likelihood of our observation. Figure 1 shows an example realization of a Yule tree, with the corresponding counting process diagram below. In this example, λ = 2, Y(0) = 2, and Y(t = 1) = 12.
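To illustrate how the negative binomial transition probability serves as a likelihood, the following Python sketch evaluates it for the setting of Figure 1. The function name `yule_transition_prob` and its argument names are hypothetical choices made for this example, not part of any published software.

```python
import math

def yule_transition_prob(m, n, t, lam):
    """Pr(Y(t) = n | Y(0) = m) for a Yule process with birth rate lam."""
    if n < m:
        return 0.0  # a pure-birth process never loses lineages
    p = math.exp(-lam * t)  # chance that a single lineage has not split by time t
    return math.comb(n - 1, m - 1) * p**m * (1.0 - p) ** (n - m)

# Likelihood of observing 12 species at t = 1, starting from 2 lineages, lam = 2
lik = yule_transition_prob(2, 12, 1.0, 2.0)
```

Because the expression is a proper probability mass function in n, summing it over n = m, m + 1, ... should return 1, which is a convenient sanity check when coding the likelihood.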
Figure 1.
Example of a Yule (pure-birth) tree with the corresponding counting process Y(t) below. The birth rate in this example is λ = 2. In the counting process representation of this realization, we only keep track of the total number of species in existence at each time.
Markov Reward Processes
In a Markov reward process, a non-negative reward ak accrues for each unit of time a Markov process spends in state k (Neuts 1995). Consider a Yule process Y(s) beginning at Y(0) = 1 and ending at Y(t) = n. The accumulated reward up to time t is
$$R_t = \int_0^t a_{Y(s)}\,ds \tag{3}$$
When Y(s) is observed continuously from time 0 to t, the process aY(s) is a fully observed step function, and Rt can be easily computed as the area under that function. To illustrate, suppose that the process makes jumps at times t1,...,tn−1, and we define t0 = 0 and tn = t. We assume Y(s) is right-continuous, so Y(ti) = i + 1. Then, at time t, the accumulated reward is
$$R_t = \sum_{i=0}^{n-1} a_{i+1}\,(t_{i+1} - t_i) \tag{4}$$
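The area-under-a-step-function computation can be sketched in a few lines of Python. This is a minimal illustration for a fully observed trajectory; the function name `accumulated_reward` and the dictionary representation of the rewards are assumptions of the example.

```python
def accumulated_reward(jump_times, t, rewards, start_state=1):
    """Area under the step function a_{Y(s)} on [0, t].

    jump_times: increasing birth times in (0, t); each birth adds one lineage.
    rewards: mapping from state k to the reward rate a_k.
    """
    grid = [0.0] + list(jump_times) + [t]
    total = 0.0
    for i in range(len(grid) - 1):
        state = start_state + i  # state occupied on [grid[i], grid[i + 1])
        total += rewards[state] * (grid[i + 1] - grid[i])
    return total

# With a_k = k, the reward is the total branch length of the tree
a = {k: k for k in range(1, 4)}
pd_example = accumulated_reward([0.25, 0.75], 1.0, a)  # 1*0.25 + 2*0.5 + 3*0.25 = 2.0
```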
When only Y(0) and Y(t) are observed, it can be challenging to compute the distribution of Rt. In our proofs of Theorems 1 and 2, we appeal to the method developed by Neuts (1995) and Minin and Suchard (2008) to find reward probabilities conditional on Y(0) and Y(t). Let
$$v_{mn}(x,t) = \Pr\big(R_t = x,\ Y(t) = n \mid Y(0) = m\big) \tag{5}$$
be the joint probability that the reward at time t is x and the process is in state n, given that the process began in state m at time 0. This joint probability formulation is more mathematically convenient than the more natural conditional probability, as we demonstrate in the proofs of the theorems. However, it is easy to transform the distribution vmn(x,t) into the more useful conditional (on Y(t) = n) probability via Bayes' rule, as we show below. Appendix A gives a preliminary lemma deriving a representation for Yule reward processes that will be useful in proving the theorems that follow.
The Distribution of PD in a Yule Process
The Yule process is a simple and analytically tractable mechanistic model for producing the birth times of a clade. If we assume that the species that undergoes speciation is chosen randomly from the extant species at that time, then the Yule process is also a distribution over bifurcating trees of age t. The Markov rewards framework provides a technique to understand integrals over Yule processes in precisely this context. In this section, we study the distribution of PD in trees generated by a Yule process. PD depends only on the branching times, and not the underlying topology, of the phylogenetic tree, making it a suitable first step in our goal of integrating over trees in comparative studies.
To proceed, let Y(s) be a Yule process with branching rate λ that keeps track of the number of lineages at time s. We seek an expression for PD, the total branch length of the tree, which is equivalent to the area under the trajectory of the counting process Y(s). Define a Markov reward process with Y(s) and ak = k for k = 1,2, .... Then,
$$R_t = \int_0^t Y(s)\,ds = \mathrm{PD} \tag{6}$$
We now state our first theorem giving an expression for the distribution of Rt in a Yule process.
Theorem 1 For a Yule process with birth rate λ, starting at Y(0) = m and ending at Y(t) = n, the probability density of PD, conditional on Y(0) = m and Y(t) = n, is
$$f_Y(x \mid m,n,t,\lambda) = \frac{v_{mn}(x,t)}{P_{mn}(t)} \tag{7}$$
where Pmn(t) is the Yule transition probability (2), and
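$$v_{mn}(x,t) = \begin{cases} e^{-m\lambda t}\,\delta(x - mt), & n = m,\\[4pt] \dbinom{n-1}{m-1}\dfrac{\lambda^{n-m}e^{-\lambda x}}{(n-m-1)!}\displaystyle\sum_{j=m}^{n}(-1)^{j-m}\binom{n-m}{j-m}(x - jt)^{n-m-1}H(x - jt), & n > m. \end{cases} \tag{8}$$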
Here, δ(x) is the Dirac delta function and H(x) is the Heaviside step function.
The proof of this theorem is given in Appendix B. There has been disagreement about whether the definition of PD in different contexts includes the root lineage (Faith 1992, 2006; Crozier et al. 2006; Faith and Baker 2006). We do not take a stance on this issue but note that if t is the stem age of an unresolved clade, then taking a1 = 1 in Equation (3) includes the root lineage in the distribution, allowing PD to accumulate in the root lineage, and a1 = 0 does not. The form of Equation (8) will change slightly if m = 1 and a1 = 0. This family of probability distributions has some interesting properties. Figure 2 shows fY(x|m,n,t,λ) for m = 1 (with a1 = 1), n = 1,...,8, λ = 1.2, and t = 1. The unusual shape of the distribution for smaller n demonstrates the piecewise nature of the density, apparent in the functional form (8).
Figure 2.
Probability densities of PD, or total branch length, under the Yule process starting at Y(0) = 1, ending at Y(t) = n for n = 1,...,8, with t = 1 and λ = 1.2. When n = 1, no births have occurred, so the accrued PD must be exactly x = t = 1, which we represent here as a point mass at 1. For n = 2, the minimum accumulated PD is 1, since the process spent at most one unit of time with one species; likewise the maximum accumulated PD is 2, since the process spent at most one unit of time with 2 species. The functional form of Equation (8) reveals the piecewise nature of the density, which gradually becomes smoother as n becomes large. The vertical probability axis is the same for all plots.
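The supports and shapes described in this caption can be checked numerically. The sketch below assumes that, for n > m, the joint density of PD and Y(t) = n is λ^{n−m} e^{−λx} times an alternating, binomially weighted sum of truncated powers (x − jt)^{n−m−1} H(x − jt); this form follows from convolving the exponential holding times of the Yule process and should be read as a working assumption of the example. All function names are hypothetical.

```python
import math

def yule_prob(m, n, t, lam):
    """Yule transition probability Pr(Y(t) = n | Y(0) = m)."""
    p = math.exp(-lam * t)
    return math.comb(n - 1, m - 1) * p**m * (1.0 - p) ** (n - m)

def pd_joint_density(x, m, n, t, lam):
    """Continuous part of the joint density of (PD = x, Y(t) = n) for n > m."""
    k = n - m
    total = 0.0
    for j in range(m, n + 1):
        if x > j * t:  # Heaviside step H(x - j t)
            total += (-1) ** (j - m) * math.comb(k, j - m) * (x - j * t) ** (k - 1)
    const = math.comb(n - 1, m - 1) * lam**k / math.factorial(k - 1)
    return const * math.exp(-lam * x) * total

def pd_conditional_density(x, m, n, t, lam):
    """Density of PD given Y(0) = m and Y(t) = n, supported on [m t, n t]."""
    if not (m * t <= x <= n * t):
        return 0.0
    return pd_joint_density(x, m, n, t, lam) / yule_prob(m, n, t, lam)

def integrate(f, lo, hi, steps=20000):
    """Midpoint rule, adequate for these piecewise-polynomial densities."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

# Each conditional density should integrate to 1 over its support
mass = integrate(lambda x: pd_conditional_density(x, 1, 4, 1.0, 1.2), 1.0, 4.0)
```

For n = m + 1 the sum reduces to a truncated exponential on [mt, (m + 1)t], which is why the n = 2 panel is largest at its lower boundary.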
Interestingly, Faller et al. (2008) show that the distribution of PD tends toward a normal distribution as n → ∞, a fact suggested by the shape of the distributions in Figure 2. We make this observation more formal in the following theorem.
Theorem 2 For a Yule tree with age t, Y(0) = m, Y(t) = n, and branching rate λ,
![]() |
(9) |
where
is the variance and
![]() |
(10) |
as n → ∞. We give only a brief outline of the proof in Appendix C, since the main details are given by Faller et al. (2008), Steel and Mooers (2010), and Stadler and Steel (2012).
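A heuristic reason to expect normality can be sketched by simulation. Conditional on Y(0) = m and Y(t) = n, the joint density of the n − m birth times in a pure-birth process is proportional to exp(λ Σ s_i), so PD can be written as mt plus a sum of n − m independent exponential variables truncated to [0, t]. This representation, the closed-form mean used below, and the function names are assumptions of the example rather than expressions quoted from the theorem.

```python
import math
import random

def sample_pd(m, n, t, lam, rng):
    """Draw PD given Y(0) = m, Y(t) = n as m*t plus n - m truncated exponentials."""
    total = m * t
    for _ in range(n - m):
        u = rng.random()
        # inverse CDF of an Exp(lam) variable truncated to [0, t]
        total += -math.log(1.0 - u * (1.0 - math.exp(-lam * t))) / lam
    return total

def pd_mean(m, n, t, lam):
    """Mean implied by the sum-of-truncated-exponentials representation."""
    return m * t + (n - m) * (1.0 / lam - t / math.expm1(lam * t))

rng = random.Random(1)
draws = [sample_pd(2, 50, 1.0, 1.2, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```

Because the summands are independent, identically distributed, and bounded, the central limit theorem applies directly as n − m grows, which is consistent with the increasingly bell-shaped panels of Figure 2.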
Practical Uses
The distribution of PD may be useful in a variety of evolutionary and conservation applications. Zink and Slowinski (1995) and Pybus and Harvey (2000) introduce statistics intended to help in testing the hypothesis of homogeneous branching rate λ using an inferred phylogeny. Both statistics utilize expressions of the form
$$\sum_{k=2}^{n} k\,(t_k - t_{k-1}) \tag{11}$$
where tk is the time of speciation from k to k + 1 lineages and we define tn = t. We have given the exact distribution of PD in Theorem 1 and the asymptotic distribution in Theorem 2. Therefore, one can perform a hypothesis test to evaluate the appropriateness of the Yule model using the tail probability
$$\Pr(R_t > x \mid m,n,t,\lambda) = \int_x^{nt} f_Y(y \mid m,n,t,\lambda)\,dy \tag{12}$$
where fY(x) is given by Equation (7) and nt is the largest value of PD that can accrue for a tree with n tips over time t. If n is large, we have the normal approximation
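$$\Pr(R_t > x \mid m,n,t,\lambda) \approx 1 - \Phi\!\left(\frac{x - \mathrm{E}[R_t]}{\sqrt{\mathrm{Var}[R_t]}}\right) \tag{13}$$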
where the expectation and variance of PD are given in Theorem 2 and Φ(x) is the standard normal distribution function. One can use these expressions to predict the PD that will arise under the Yule model from a collection of extant species up to time t in the future. In addition, it might be of interest to calculate the probability that future PD at time t in one group is greater than in the other, conditional on the number of species and diversification rates in both groups; this probability may have uses in conservation applications.
The Distribution of Expected Phenotypic Variance
Since researchers generally do not know the phylogenetic tree for a set of species with certainty, PD is not observable until after a tree has been estimated. We therefore seek a distribution for an analogous quantity that is observable directly from knowledge of the number of species n and their trait values, bypassing the need to infer a detailed phylogenetic tree. For this, we will need a model for phenotypic trait evolution on the branches of an unknown phylogenetic tree generated by a Yule process.
The simplest and most popular model for evolution of continuous phenotypic traits on phylogenetic trees is Brownian motion (Felsenstein 1985). Under this model, trait increments over a branch of length t are normally distributed with mean 0 and variance σ2t. The trait values for extant species at the present time are observed as the vector X = (X1,...,Xn). For a given topology τ with n taxa and branching times t = (t2,...,tn−1), with tip data X generated on the branches of this tree by zero-mean Brownian motion with variance σ2, the tip data are distributed according to a multivariate normal random variable. More formally,
$$X \mid \tau, \mathbf{t}, \sigma^2 \sim \mathrm{N}\big(\mathbf{0},\ \sigma^2 C(\tau,\mathbf{t})\big) \tag{14}$$
where the entries of the variance–covariance matrix C(τ,t) = {cij} are defined as follows: cii = t, and cij is the time of shared ancestry for taxa i and j, where i ≠ j. O'Meara et al. (2006) introduce disparity, the sample variance of the tip data X,
![]() |
(15) |
where is the mean of the elements of X. The expectation of the disparity, conditional on the tree topology τ, branching times t, and the Brownian variance σ2, is
![]() |
(16) |
where we use the notation to indicate that the expectation is taken over realizations of the Brownian process that generates X. The second-to-last equality above arises since the matrix C(τ,t) is symmetric and every element on the diagonal is t. However, every off-diagonal entry cij is either zero or a branching time from the vector t, so the nonzero terms in the sum consist of branching times tk. Let zk be the coefficient multiplying the branching time tk in the last line of Equation (16). It is important to note that the zk depend on the tree topology. Then, we can express expected disparity as a weighted sum of the branching times,
![]() |
(17) |
Figure 3 illustrates how tree topology determines the matrix C(τ,t) and expected disparity. Notably, Sagitov and Bartoszek (2012) derive related expressions for the variance of the sample mean and the expectation of the sample variance
under a Yule model.
Figure 3.
How tree topology determines the matrix of Brownian covariances and expected disparity. At left, a tree topology τ of crown age t has 5 taxa and branch times t = (t2,t3,t4), where tk is the time of the branch from k to k + 1 lineages. At right, the corresponding matrix C(τ,t) of Brownian covariances. The diagonal entries of C(τ,t) are all t. The (i,j)th entry of C(τ,t) is the time of shared ancestry between taxa i and j, for i ≠ j. For example, Taxa 1 and 2 share ancestry for time t4. Expected disparity (using Brownian variance σ2) is calculated using Equation (??). Trait disparity cannot accumulate when there is only one species, so we draw the tree τ beginning with 2 lineages at time 0.
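The dependence of expected disparity on the covariance matrix can be illustrated with a short Python sketch. It uses the standard identity E[Σ(X_i − X̄)²] = σ²(tr C − 1ᵀC1/n) for multivariate normal data and assumes the 1/(n − 1) normalization of the sample variance; the 3-taxon tree and the function name are hypothetical.

```python
def expected_disparity(C, sigma2):
    """Expected sample variance of tip values when Cov(X) = sigma2 * C.

    Assumes the 1/(n - 1) sample-variance normalization.
    """
    n = len(C)
    trace = sum(C[i][i] for i in range(n))
    total = sum(sum(row) for row in C)
    return sigma2 * (trace - total / n) / (n - 1)

# Hypothetical 3-taxon tree of age t = 1: taxa 1 and 2 share ancestry for
# 0.6 time units, and taxon 3 separates from them at time 0.
t, shared = 1.0, 0.6
C = [[t, shared, 0.0],
     [shared, t, 0.0],
     [0.0, 0.0, t]]
d = expected_disparity(C, sigma2=1.0)  # (3 - 4.2/3) / 2 = 0.8
```

Shared ancestry enters only through the off-diagonal entries, so longer shared history lowers the expected disparity relative to a star tree of the same age.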
Expected Disparity as an Accumulated Reward
The expected disparity (17) has features in common with PD, since it is a scalar quantity that accumulates over the branches of the tree from time 0 to t. The difference is that disparity implicitly incorporates tree-topological factors, which enter Equation (17) as weights in the sum of the branch lengths. In addition, a Yule tree accumulates PD even when there is a single lineage, but the same is not true for disparity. We develop these ideas in greater detail in this section.
Our goal is to express Equation (17) as a Markov reward process in a form equivalent to Equation (4),
![]() |
(18) |
From Equation (??), we see that , and an−1 can be found using an and zn−1, and so on. We can formalize this recursive solution for the rewards by equating Equations (17) and (18) as follows:
![]() |
(19) |
Then, recursively solving for the aks gives a1 = 0 and
![]() |
(20) |
for k = 2,...,n. Note that the ak are increasing with k, and therefore ak > 0 for all k > 1. Now defining Rt(a) to be the Yule reward process with rewards a = (a1,...,an) under the topology τ, the expected disparity is distributed as
![]() |
(21) |
Note that we no longer need to condition on the branch lengths t = (t1,...,tn) in the expected disparity—they have been “integrated out.” Therefore, to find the distribution of expected trait variance under a Brownian motion process on a Yule tree with topology τ, we need only to find the relevant rewards a and compute the corresponding distribution of Rt(a).
As a concrete example, consider the 5-taxon tree in Figure 3. The expected disparity, given this topology τ and arbitrary branch lengths t = (t2,t3,t4), is
![]() |
(22) |
The coefficients are given by
![]() |
(23) |
Solving for the rewards a, we obtain
![]() |
(24) |
which is easily verified by hand. This leads us to our third theorem, which gives an expression for the distribution of Rt(a).
Theorem 3 In a Yule process with rate λ and arbitrary non-negative rewards a = (a1,...,an), the Laplace transform of vmn(x,t|a) is given by
![]() |
(25) |
The proof of this theorem is given in Appendix D. To obtain the probability distribution of the accumulated reward, we must invert Equation (25) to obtain the joint probability conditional on the reward vector a,
![]() |
(26) |
For m = n and n = m+1, there are simple expressions for the inverse Laplace transform. Under certain conditions on the rewards a, there is a straightforward analytic inversion of Equation (25) for general n > m, but the rewards computed using the times of shared ancestry in a phylogenetic tree do not always satisfy these conditions, which we outline in greater detail in Appendix E. Therefore, it is often easier to numerically invert Equation (25), and we provide a straightforward method for numerical inversion of the Laplace transform (25) based on the method popularized by Abate and Whitt (1995).
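A minimal sketch of numerical Laplace inversion in the style of the Euler algorithm of Abate and Whitt is given below. It is demonstrated on a stand-in transform with a known inverse, F(s) = 1/(s + 1), rather than on the transform in Equation (25); the parameter defaults A = 18.4, n = 15, and m = 11 are commonly used values, and the function name is hypothetical.

```python
import math

def euler_invert(F, x, A=18.4, n=15, m=11):
    """Approximate f(x) from its Laplace transform F by Euler summation."""
    def partial_sum(N):
        # trapezoidal discretization of the Bromwich integral, truncated at N terms
        s = 0.5 * F(complex(A / (2 * x), 0.0)).real
        for k in range(1, N + 1):
            s += (-1) ** k * F(complex(A, 2 * k * math.pi) / (2 * x)).real
        return math.exp(A / 2) / x * s
    # binomial (Euler) averaging of the partial sums n, ..., n + m
    return sum(math.comb(m, k) * partial_sum(n + k) for k in range(m + 1)) / 2**m

# Stand-in transform: F(s) = 1 / (s + 1) is the transform of f(x) = exp(-x)
approx = euler_invert(lambda s: 1.0 / (s + 1.0), 1.0)
```

The densities of interest here have kinks at multiples of t, so in practice the accuracy of any such routine should be checked against the cases where a closed-form inverse is available.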
Approximate Likelihood and Inference for σ2
We now describe a statistical procedure for using Theorem 3 to perform statistical inference for the unknown Brownian variance σ2. Suppose that in a clade of n species we have a crude tree topology τ (without branch lengths). This topology could be derived from parsimony, distance-based tree reconstruction methods, or one could simply use family/genus/species information to assign a hierarchy of relationships and resolve polytomies randomly. Given the tree topology τ, one can compute the rewards a. Suppose also that we have calculated trait disparity for each of J independent continuous quantitative traits that arise from Brownian motion on the branches of the unknown phylogenetic tree, starting at the root. Let
![]() |
(27) |
be the observed disparity for the jth phenotypic trait, where X(j) is the vector of n trait values for the jth phenotypic trait and is the mean of the elements of X(j). Then, we calculate the mean disparity
across these J traits:
![]() |
(28) |
Note that in order to find , we do not need the individual trait measurements themselves—only the disparities. Then, by the law of large numbers,
as J → ∞, where Dn is the asymptotic mean disparity across all possible traits. Therefore, we approximate the distribution of
as follows:
![]() |
(29) |
where a is the vector of rewards obtained from the topology τ. This approximate relation provides the connection between observable mean trait disparity and the probability distribution in Theorem 3 that we need in order to compute the probability of the observed disparities. Suppose the stem age of an unresolved tree is t, and let
![]() |
(30) |
be the distribution of expected disparity in a Yule process with general rewards a, conditional on Y(0) = m and Y(t) = n. Here, x is the expected trait disparity, which we approximate by our observed (and therefore fixed) . From Equation (29), we write
![]() |
(31) |
so the likelihood is approximately
![]() |
(32) |
Finally, we propose the approximate maximum-likelihood estimator
![]() |
(33) |
We solve Equation (33) using the numerical Newton–Raphson method provided by the R function nlm. To find a reasonable starting value for , note that in a Yule reward process in which ak < aj for k < j, the value of the reward is constrained to lie in the interval
$$a_m t \le R_t(a) \le a_n t \tag{34}$$
where we have assumed Y(0) = m and Y(t) = n. Additionally, when none of the rewards ak are zero and we approximate Rt by , we obtain bounds on the possible value of
that maximizes Equation (32),
![]() |
(35) |
When m = 1 and a1 = 0, we have only the lower bound
![]() |
(36) |
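The bracketing argument can be turned into a small helper for choosing a starting value. The sketch assumes only that the reward lies between a_m t and a_n t and that the observed mean disparity is approximately σ² times the reward; the function name and arguments are hypothetical.

```python
def sigma2_bounds(mean_disparity, a_m, a_n, t):
    """Bracket for sigma^2 implied by a_m*t <= R_t <= a_n*t and D ~ sigma^2 * R_t."""
    lower = mean_disparity / (a_n * t)
    # when a_m = 0 (for example m = 1 with a_1 = 0) there is no finite upper bound
    upper = mean_disparity / (a_m * t) if a_m > 0 else float("inf")
    return lower, upper

lower, upper = sigma2_bounds(0.8, a_m=0.5, a_n=1.0, t=1.0)
```

Any value inside the bracket is a candidate starting point for the numerical optimizer; when only the lower bound is available, a modest multiple of it is one reasonable choice.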
We emphasize that Equation (33) is not a traditional maximum-likelihood estimator. We have approximated the distribution of the observed mean disparity by the distribution of expected disparity, giving an approximate likelihood (32) that may not attain its maximum at the same value of σ2 as the true likelihood. In addition, the density (32) is nondifferentiable at several points, and for n = m+1 attains its maximum at the lower boundary Rt(a) = amt (see Fig. 2 with m = 1, n = 2). These issues complicate application of traditional asymptotic theory for maximum-likelihood estimates, and the classical large sample theory may not apply because the likelihood is only an approximation. One consequence of the violation of these traditional assumptions is that we are unable to provide meaningful standard errors for the estimate of σ2 using only the approximate likelihood (32).
Practical Uses
Before presenting simulation results and a real-life application, we briefly discuss 3 scenarios where approximate likelihood inference of σ2 may be helpful in comparative evolutionary studies.
First, we consider the case where only the species richness n, tree age t, and mean sample variance of a collection of n trait values are known for the clade. In this case, the estimate of the Brownian variance σ2 is given by Equation (33). There are several ways to deal with the absence of the tree topology. Since all trait values at the tips are known, an approximate tree (without branch lengths) can be inferred quickly via distance methods, and our method provides a way to integrate over branch lengths. Alternatively, one can treat the unknown clade as a “soft polytomy” and either resolve the topology arbitrarily or sample over topologies, while integrating out branch lengths. Either way, an approximate or arbitrary topology may be used as the input to Equation (33).
Second, when the number of sampled phenotype values at the tips of the unresolved clade is less than the known species richness, the problem becomes more difficult, and traditional techniques for comparative phylogenetic analysis (e.g., phylogenetic regression) may be of little use. Reconstruction of the full tree is impossible, and so it can be difficult to draw inference on the evolutionary rate in the clade as a whole. To make our discussion more formal, suppose the clade has richness n, but we only have k < n trait values. Slater et al. (2012) consider an approximate Bayesian computation (ABC) solution to this problem, in which they sample trees and their Brownian realizations for n tips and only accept simulations in which the trait variance among the k observed phenotype values is sufficiently close to the simulated value. By repeating this procedure many times, these authors approximate the likelihood of the observed data, allowing approximate inference for evolutionary rate and other parameters by Markov chain Monte Carlo. However, using our approximate likelihood framework, we can modify Equation (29) as follows:
![]() |
(37) |
If the first approximation is good, then we can treat as
and integrate over Yule trees with n tips to obtain an estimate of σ2. This approach may be reasonable if the n−k unsampled trait values are assumed to be “typical,” and not substantially different from the k sampled values. Finally, when the tree topology is known, but branch lengths are unknown, we can apply Equation (33) directly. These scenarios are illustrated schematically in Figure 4.
Figure 4.
Three scenarios for which the methods outlined in “The distribution of expected phenotypic variance” section may be useful. Top left: the tree is completely unknown and the sample variance of the trait values is known exactly. Top right: the tree is completely unknown and only a subset of the total number of taxa are sampled. We approximate the sample variance for all taxa by the sample variance for the subset. Bottom center: the tree topology is known but branch lengths are unknown.
Simulations
To empirically evaluate the correctness of the analytic distributions we derived in Theorem 3, we simulated trait data via Brownian motion on trees generated by a Yule process with age t = 1 and branching rate λ = 1. For n = 3, ...,8, we chose one tree topology and simulated Nbtimes = 2000 sets of branch lengths from a Yule process for n species (Stadler 2011). For each set of branching times, we simulated NBM = 2000 realizations of Brownian motion with σ2 = 1 to generate trait values at the tips of the tree (Paradis et al. 2004). For each of the 2000 sets of tip values, we calculated the mean disparity using Equation (28). Figure 5 shows histograms of the mean disparities with the analytic distribution fY(x) overlaid, with good correspondence. The tree topology (with arbitrary branch lengths for display) is shown in gray above each histogram.
Figure 5.
Empirical correspondence between the derived expressions for the distribution of expected disparity and simulated mean disparity histograms for trees with different numbers of taxa. For each n = 3,...,8, we simulated a single tree topology (shown above each histogram in gray). We then simulated 2000 sets of branch lengths for this topology under the Yule process. For each set of branch lengths, we calculated the mean disparity from 2000 simulations of zero-mean Brownian motion with variance σ2 = 1 on this tree.
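The inner step of this simulation check, Brownian motion on one fixed tree, can be sketched in a few lines. The following is a minimal illustration and not the code used for Figure 5: the 4-taxon tree, the function names, and the choice of the n − 1 denominator for the sample variance are all assumptions made here, and the branch lengths are fixed rather than drawn from a Yule process. It compares the average simulated disparity with the value implied by the shared-ancestry (covariance) matrix of the tips.

```python
import random
import statistics

# Hypothetical ultrametric 4-taxon tree of age 1.0 (not taken from the article).
# Each node is (branch_length, children); a tip has an empty list of children.
TREE = (0.0, [
    (0.4, [(0.6, []), (0.6, [])]),
    (0.7, [(0.3, []), (0.3, [])]),
])


def simulate_tips(node, start, sigma2, rng):
    """Evolve a Brownian trait down the tree and return the tip values."""
    length, children = node
    value = start + rng.gauss(0.0, (sigma2 * length) ** 0.5)
    if not children:
        return [value]
    tips = []
    for child in children:
        tips.extend(simulate_tips(child, value, sigma2, rng))
    return tips


def shared_ancestry(node):
    """Tip-by-tip matrix of shared branch length (the covariance when sigma2 = 1)."""
    length, children = node
    if not children:
        return [[length]]
    blocks = [shared_ancestry(child) for child in children]
    n = sum(len(block) for block in blocks)
    cov = [[0.0] * n for _ in range(n)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, entry in enumerate(row):
                cov[offset + i][offset + j] = entry
        offset += len(block)
    # Every pair of tips below this node also shares this node's own branch.
    return [[entry + length for entry in row] for row in cov]


def expected_disparity(tree, sigma2):
    """Expected sample variance of the tip values (assumed n - 1 denominator)."""
    cov = shared_ancestry(tree)
    n = len(cov)
    trace = sum(cov[i][i] for i in range(n))
    total = sum(sum(row) for row in cov)
    return sigma2 * (trace - total / n) / (n - 1)


def mean_simulated_disparity(tree, sigma2, n_rep, seed=0):
    """Average the sample variance of the tips over n_rep Brownian realizations."""
    rng = random.Random(seed)
    draws = [statistics.variance(simulate_tips(tree, 0.0, sigma2, rng))
             for _ in range(n_rep)]
    return statistics.fmean(draws)
```

Calling `expected_disparity(TREE, 1.0)` and `mean_simulated_disparity(TREE, 1.0, 20000)` should give nearly equal values, and doubling `sigma2` doubles the expected disparity, which is the linear scaling in σ2 that the approximate likelihood relies on.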
To evaluate our estimation methodology, we take a similar approach, but for each simulated set of mean disparities, we infer σ̂2, an approximate maximum-likelihood estimate of σ2. Figure 6 shows estimates of σ2 for different species richness n under different simulation conditions. For n = 3, ...,12, we generated 100 trees, each with Nbtimes = 1, 5, 10, or 100 sets of branching times. For each set of branching times, we evolved NBM = 1, 5, 10, or 100 traits by Brownian motion along the branches with rate σ2 = 1 and computed the mean disparity. Then, given the Nbtimes mean disparities, we maximized the approximate likelihood to find σ̂2. Each dot in Figure 6 represents one estimate, and the dots are jittered slightly to show their density. The variance of the estimator is large when the number of simulated branching time sets and Brownian realizations is small, since Equation (29), and hence Equation (31), then becomes a poor approximation to the mean disparity. However, the approximate maximum-likelihood estimator for σ2 appears to have the desirable property of statistical consistency: the deviation of the estimates from the true value σ2 = 1 goes to zero as the number of mean disparity observations becomes large.
Figure 6.
Empirical consistency of approximate maximum-likelihood estimates of σ2. The number of species n in the unobserved phylogenetic tree is shown on the horizontal axis. Each set of plots shows 100 estimates of σ2 from Nbtimes simulations of different branching times under a Yule process with n species and from NBM independent realizations of Brownian motion used to compute the mean disparity for each set of branching times. The estimates are jittered to show the sampling distribution. The gray dotted line shows the true value σ2 = 1.
Application to Evolution of Body Size in the Order Carnivora
To illustrate the usefulness of our method in practical comparative inference, we estimate 12 family-wise Brownian variance rates for body size evolution in the mammalian order Carnivora using observed log-body size disparities and species richness information (Gittleman 1986; Gittleman and Purvis 1998; Slater et al. 2012). Carnivora includes members with very large and very small body masses, and body mass varies widely within individual families (Nowak and Paradiso 1999). We included only families with 2 or more species, because families with only one species reveal no useful information about intra-family Brownian variance. We also excluded Prionodontidae, which has only 2 species, because the maximum-likelihood estimate of the Brownian variance is always zero in that case. The reason is that, for an unresolved Yule tree beginning with 1 species at time 0 and ending with 2 species at time t, the expected sample variance of the trait values is distributed as $\sigma^2 R_t$, where $R_t$ has density $f(u) = \lambda e^{-\lambda u}/(1 - e^{-\lambda t})$; this function attains its maximum at zero. The data set, comprising 282 species, included the families Canidae, Eupleridae, Felidae, Herpestidae, Hyaenidae, Mephitidae, Mustelidae, Otariidae, Phocidae, Procyonidae, Ursidae, and Viverridae. Figure 7 shows the backbone phylogeny (from Eizirik et al. 2010) and the unresolved clades. Our analysis takes advantage of several utilities for creating and manipulating phylogenetic trees and quantitative trait data (Paradis et al. 2004; Harmon et al. 2009; Bortolussi et al. 2011; FitzJohn et al. 2012; Stadler 2011, 2012), using the statistical programming language R (CRAN 2012). We intentionally limit our analysis to the Brownian variance in each family and use only body size disparity in order to demonstrate the simplicity of our approximate method under conditions of very little data.
Figure 7.
A family-level phylogenetic tree for order Carnivora. The phylogeny within each family, shown here as a gray triangle, is not known with certainty. The horizontal axis represents time, in millions of years since the MRCA. The length of the right-hand side of the gray triangles is proportional to the number of species in the family. The “backbone” tree connecting the unresolved clades is assumed fixed. Branch lengths are shown along each branch.
Since our estimation method for the Brownian variance in an unresolved clade depends on the branching rate in the clade, the first step in our analysis is to estimate the speciation rate λ from the backbone tree and the unresolved clades. Even though the tree is unobserved within each family, we can still find the maximum-likelihood estimate of λ for the tree as a whole. On a fully resolved branch of length t, the log-likelihood of λ is $\log(\lambda) - \lambda t$. For an unresolved clade of age t with one species at the crown which grows to include n species, the log-likelihood from Equation (2) is, up to an additive constant, $-\lambda t + (n-1)\log(1 - e^{-\lambda t})$. Summing these log-likelihoods over the whole tree gives the log-likelihood for λ; maximizing this function yields the estimate λ̂, expressed per million years. In what follows, we assume that λ = λ̂. To each unresolved family clade, we associate the clade disparity for body size. Our analysis will consist of estimating the family-wise Brownian variance σj2 for the jth family for each clade shown in Figure 7. Since the clade stem ages are given implicitly by the backbone tree, we integrate over crown ages for each family under the Yule model. To apply the methodology for modeling expected trait disparity developed in “The distribution of expected phenotypic variance” section, we regard the unresolved clades as “soft polytomies” in which branch lengths are unknown (Purvis and Garland 1993). We therefore resolve these polytomies randomly by sampling a tree topology from the discrete Yule distribution conditional on species richness and age, without assigning branch lengths, and then integrate over branch lengths.
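The branching-rate step described above amounts to a one-dimensional maximization, which can be sketched as follows. This is an illustrative sketch only: the function names, the search bounds, and the golden-section optimizer are choices made here, not the authors' implementation, and any branch lengths or clade data passed in are the user's own. The search is valid because both log-likelihood terms are concave in λ.

```python
import math


def yule_log_likelihood(lam, resolved_branches, unresolved_clades):
    """Log-likelihood of the branching rate lam, up to an additive constant.

    resolved_branches: lengths of fully resolved branches.
    unresolved_clades: (age, richness) pairs, each clade grown from one lineage.
    """
    total = 0.0
    for t in resolved_branches:
        total += math.log(lam) - lam * t
    for t, n in unresolved_clades:
        # log(1 - exp(-lam * t)), computed stably with expm1
        total += -lam * t + (n - 1) * math.log(-math.expm1(-lam * t))
    return total


def maximize_rate(resolved_branches, unresolved_clades, lo=1e-8, hi=10.0, tol=1e-10):
    """Golden-section search for the maximizing lam on the interval [lo, hi]."""
    def f(lam):
        return yule_log_likelihood(lam, resolved_branches, unresolved_clades)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)
```

Two limiting cases give a quick check: a single resolved branch of length t has its maximum at 1/t, and a single unresolved clade of age t with n species has its maximum at log(n)/t. The upper bound `hi` should be raised if branch lengths are measured in units that make rates larger than 10 plausible.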
Our analysis of the Carnivora family-wise Brownian variances takes less than 10 s to run on a laptop computer. However, to evaluate the variability in our estimates due to topology, we ran the analysis 1000 times with randomly resolved polytomies for each family. Table 1 shows the family name, species richness (n), stem age (t), observed body size disparity, the mean estimate of σj2, and the approximate standard error of the estimate for each family. We calculated standard errors as the empirical standard deviation of the Brownian variance estimates across these repeated runs. Our estimates of σ2 reveal readily interpretable information about the evolution of body size in each family that is not available directly from observed disparities alone. For example, the families Herpestidae and Viverridae have almost identical species richness, but quite different disparity measurements. Perhaps surprisingly, we have estimated nearly equal Brownian variances for the 2 families. Why does our method produce such similar estimates of σ2? The answer lies in the ages of the clades: Viverridae is almost 50% older than Herpestidae. Two clades with the same richness whose traits evolve by Brownian motion at the same rate can exhibit very different disparity measurements, depending on their ages. This example illustrates how our method provides an approximate way to untangle the complex interaction of time, species, and observed trait variance (in the words of Ricklefs 2006) for unresolved clades.
Table 1.
Species richness, TMRCA, body size disparity, and estimated Brownian variance for each family in the Carnivora data set
Family | Richness | TMRCA (Myr) | Disparity | Estimated σ2 | SE |
---|---|---|---|---|---|
Felidae | 40 | 33.3 | 1.588 | 0.080 | 0.009 |
Viverridae | 34 | 37.4 | 0.606 | 0.029 | 0.004 |
Herpestidae | 33 | 25.5 | 0.482 | 0.030 | 0.003 |
Eupleridae | 8 | 25.5 | 0.916 | 0.079 | 0.010 |
Hyaenidae | 4 | 32.2 | 0.805 | 0.122 | 0.005 |
Canidae | 35 | 48.9 | 0.678 | 0.030 | 0.004 |
Ursidae | 8 | 42.6 | 0.303 | 0.024 | 0.002 |
Otariidae | 16 | 24.5 | 0.386 | 0.028 | 0.003 |
Phocidae | 19 | 24.5 | 0.751 | 0.052 | 0.005 |
Mephitidae | 12 | 32.0 | 0.570 | 0.039 | 0.005 |
Mustelidae | 59 | 27.4 | 2.263 | 0.126 | 0.014 |
Procyonidae | 14 | 27.4 | 0.531 | 0.037 | 0.004 |
Notes: The Canidae and Herpestidae have very different disparity measurements, but nearly identical estimates of σ2. This discrepancy is due to the difference in their ages; we explain the interaction between time, species number and disparity in greater detail in the text.
Discussion: Comparative Phylogenetics without Trees?
In this article, we have outlined a method for integrating over Yule trees. We presented an expression for the distribution of PD in an unresolved tree, conditional on the number of species n and age t. We showed that the expected disparity can be represented in a similar way as PD, since it accumulates along the branches of a phylogenetic tree. We also derived a statistical framework that uses a very small amount of information (n, t, and the observed disparity) for an unresolved clade to derive a meaningful estimator for the Brownian rate σ2. It may seem counterintuitive that one can estimate the Brownian rate for an unresolved tree with n taxa, given a single disparity measurement. However, the structure provided by the Yule process allows this inference by providing just enough information about the distribution of branching times that generate the tree to model the average phenotypic disparity under Brownian motion. This permits analytic integration over 2 random objects: the collection of branching times of the tree and realizations of Brownian motion. Three assumptions make this possible: first, we fix a topology τ without branch lengths; second, we assume that branch lengths come from a Yule process; third, we compute the distribution of expected disparity, which is a scalar quantity that encapsulates the most important information in the covariance matrix C(τ). In exchange for these assumptions, we gain what frustrated reviewers of Felsenstein's paper apparently wished for: an estimator that “obviate[s] the need to have an accurate knowledge of phylogeny” (Felsenstein 1985). Whether these assumptions are warranted depends fundamentally on the scientific questions at hand and on the available data.
Perhaps the most satisfying use of our method is in providing an approximate and model-based answer to the questions posed by Ricklefs (2006) and Bokma (2010), in their similarly named papers. Our answer is approximate because it substitutes an observation for an expectation in Equation (29); it is model based because we assume that trees arise from a Yule process and traits evolve according to Brownian motion. Equation (29) expresses a heuristic relationship explaining the origins of phenotypic disparity, which we reproduce here for emphasis:
![]() |
(38) |
On the left-hand side is the observed disparity. On the right-hand side, Rt(a) serves as a scalar summary of tree shape: it depends only on n, clade age t, and tree topology τ. This reveals that even when we restrict our attention to expected disparity under the simplest evolutionary models, the interaction between n, t, and the branching structure of the tree in Rt(a) is complex, but the Brownian variance simply scales the tree-topological term. We see that the expected disparity scales linearly with σ2 when n, t, and the topology τ are fixed. However, changing one of n, t, or τ while holding σ2 constant will induce a nonlinear change in the expected disparity. We conclude that it is not possible to partition the time-dependent and speciation-dependent influences on the accumulation of trait variance in a simple way as suggested by Ricklefs (2006) under the stochastic models we study in this article.
As an inducement to spur research on analytic integration over trees, Bokma (2010) offers a monetary reward for an expression for the distribution of sample variance from a birth–death tree with Brownian trait evolution on its branches. We have solved a simpler version of Bokma's challenge by providing the distribution of expected trait variance for a specific topology under a pure-birth process. The expression Bokma (2010) seeks is difficult to find for 2 reasons: first, it would require analytic summation over discrete tree topologies; second, and more intuitively, integrating over both topologies and Brownian realizations would subsume the Brownian variance σ2 on the right-hand side of Equation (29) into a nonlinear term that depended on n, t, and σ2 in a very complicated way.
Simulation-based approaches provide an appealing alternative way to integrate over trees and Brownian motions without approximating the disparity by its expectation. Indeed, Bayesian methods exist to sample from the distribution of Yule trees, conditional on observed trait values at the tips, thereby providing estimates of both Brownian rates and phylogenies simultaneously, while using all the available trait data (Drummond et al. 2012). However, simulation approaches to inference can be painfully slow. Although forward simulation of stochastic processes is usually extremely fast, Markov chain Monte Carlo algorithms that rely on repeated simulation to approximate likelihoods (ABC methods) are often quite slow. The reason is that the simulated realizations are rejected if they are “far” from the observed data; it may take many millions of simulated realizations of Yule trees and Brownian motions on those trees to approximate the likelihood to reasonable precision. The problem is exacerbated when starting parameter values are not close to the posterior mode, since many simulated realizations will be rejected. Most evolutionary ABC approaches then use a Metropolis–Hastings step to accept or reject the proposed parameter values. Repeating these steps to obtain a large number of samples from the posterior distribution of the unknown parameters can be extremely time-consuming. One important use of our method is in computing good starting parameter values for sampling-based analyses, including ABC.
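The cost of rejecting simulations that fall far from the data can be seen even in a toy rejection sampler. The sketch below is a deliberately simplified stand-in for the ABC methods discussed above, under assumptions made only for this illustration: a star phylogeny (so that tips are independent and no tree needs to be simulated), a uniform prior on σ2, plain rejection rather than a Metropolis–Hastings step, and hypothetical function and parameter names.

```python
import random
import statistics


def simulate_disparity(sigma2, n, t, rng):
    """Sample variance of n tip values under a star phylogeny of age t.

    Assumption for brevity: on a star tree the tips are independent
    Normal(0, sigma2 * t) draws, so no branching process is simulated.
    """
    sd = (sigma2 * t) ** 0.5
    return statistics.variance([rng.gauss(0.0, sd) for _ in range(n)])


def abc_rejection(observed, n, t, n_draws, eps, rng, prior_max=5.0):
    """Keep prior draws of sigma2 whose simulated disparity is within eps of the data."""
    accepted = []
    for _ in range(n_draws):
        sigma2 = rng.uniform(0.0, prior_max)  # assumed Uniform(0, prior_max) prior
        if abs(simulate_disparity(sigma2, n, t, rng) - observed) < eps:
            accepted.append(sigma2)
    return accepted
```

With an observed disparity of 1.0, n = 20, t = 1, and eps = 0.05, only a few percent of the draws are accepted in this toy setting, so most of the simulation effort is discarded. Tightening `eps` or widening the prior lowers the acceptance rate further.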
Tree-free comparative evolutionary biology comes at a price: there are several important drawbacks to our approach. First, even under the Yule model for speciation (with the correctly specified branching rate λ) and zero-mean Brownian motion for traits, integrating over all possible Yule trees introduces great uncertainty in estimates of σ2. Figure 6 illustrates this issue: while the estimates of σ2 eventually converge to the true value as the number of different branch length and trait realizations becomes large, the variance in these estimates can be substantial for smaller data sets. Furthermore, the assumption that the observed mean disparity approximates the expected disparity may be suspect if the number of traits analyzed is small.
We conclude with a mixed message about analytic integration over trees. First, it is possible to derive meaningful estimators for parameters of interest under simple evolutionary models, if one is willing to make assumptions about the mean behavior of the models. The estimates are usually reasonable and may provide valuable insight into the basic properties of evolutionary change under these models—even our simplistic analysis of Carnivora body size evolution reveals the complex interaction of clade age, species number, and evolutionary rate. These estimates may be useful as starting points for more time-consuming simulation analyses. Second, and more pessimistically, sophisticated analytic methods for integrating over trees cannot conjure evolutionary information from the data that is not there already. As evolutionary biologists further refine our knowledge of the tree of life, the number of clades whose phylogeny is truly unknown may diminish, along with interest in tree-free estimation methods.
Supplementary Material
We provide the software and data files used to produce the results in this article in the Dryad repository at http://datadryad.org, doi:10.5061/dryad.2cb46.
Acknowledgments
We are extremely grateful to Michael Alfaro and Graham Slater for introducing us to the problem of finding the distribution of quantitative trait variance, providing the Carnivora data set, and for helpful comments on the manuscript. John Welch and Eric Stone provided insightful criticism and suggestions. We thank Mike Steel, Folmer Bokma, and one anonymous reviewer for exacting and thoughtful reviews. This work was supported by National Institutes of Health [grant T32GM008185 to F. W. C.], and by National Institutes of Health [grants R01 GM086887, HG006139, and NSF grant DMS-0856099 to M. A. S.].
Appendix A
Markov Rewards for Yule Processes
In this appendix, we prove one lemma and the 2 theorems presented in the text. In the first proof, we derive a representation of the forward equation for a Yule reward process. Our development follows that given by Neuts (1995).
Lemma 1. In a Yule reward process $R_t = \int_0^t a_{Y(s)}\,ds$ with arbitrary positive rewards a1,a2,..., the Laplace-transformed reward probabilities satisfy the ordinary differential equations
![]() |
(A.1) |
Proof. Let Vmn(x,t) = Pr(Rt ≤ x,Y(t) = n | Y(0) = m). We can rewrite this quantity in a more useful form by conditioning on the time of departure u from state m, noting that the accumulated reward is amu, and then integrating over u. If m = n and no departure occurs, the accumulated reward is amt. Putting these ideas together, we obtain
![]() |
(A.2) |
Now, consider the Laplace transform Fmn(r,t) of Vmn(x,t), with respect to the reward variable x,
![]() |
(A.3) |
Making the substitution y = x−amu in the Laplace integral, we have
![]() |
(A.4) |
Now multiplying both sides by $e^{(m\lambda + a_m r)t}$ and differentiating with respect to t, we obtain
![]() |
(A.5) |
Expanding the left-hand side by the product rule and using the fundamental theorem of calculus on the right, we find that
![]() |
(A.6) |
Cancelling common factors and rearranging, we obtain the Kolmogorov backward equation,
![]() |
(A.7) |
However,
![]() |
(A.8) |
Plugging rFmn(r,t) = fmn(r,t) into Equation (A.7), we find that the fmn(r,t) satisfy the same system of ordinary differential equations,
![]() |
(A.9) |
These are the backward equations for the Laplace transformed reward process. To solve Equation (A.9), we note that any solution to the forward equations is a solution to the backward equations in a birth process (Grimmett and Stirzaker 2001, p. 251). Therefore, Equation (A.9) is equivalent to the forward system
![]() |
(A.10) |
for n = m,m+1,m+2,... This completes the proof.
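The reward process in Lemma 1 is also easy to simulate directly, which gives an independent check on any closed-form result. The sketch below is illustrative and uses names chosen here (`simulate_reward`, `reward`); it is not part of the authors' software. It follows one Yule path from m lineages, accumulating reward a_k per unit time while k lineages exist. With a_k = k the reward is the PD, whose unconditional mean is m(e^{λt} − 1)/λ because E[Y(s)] = m e^{λs} for a Yule process.

```python
import math
import random


def simulate_reward(m, t, lam, reward, rng):
    """Simulate one Yule path from m lineages over [0, t].

    Returns (R_t, Y(t)), where R_t accumulates reward(k) per unit time
    while the process has k lineages.
    """
    k, clock, total = m, 0.0, 0.0
    while True:
        wait = rng.expovariate(k * lam)  # time until the next speciation
        if clock + wait >= t:
            total += reward(k) * (t - clock)
            return total, k
        total += reward(k) * wait
        clock += wait
        k += 1


def mean_reward(m, t, lam, reward, n_rep, seed=0):
    """Monte Carlo estimate of E[R_t], not conditioned on the final state."""
    rng = random.Random(seed)
    return sum(simulate_reward(m, t, lam, reward, rng)[0] for _ in range(n_rep)) / n_rep
```

For example, `mean_reward(2, 1.0, 1.0, lambda k: k, 20000)` should be close to 2(e − 1). Conditioning on Y(t) = n, as the theorems in the text do, could be added by keeping only the paths that end with n lineages.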
Appendix B
Proof of Theorem 1
Proof. Lemma 1 with ak = k for k = 1,2,... gives
![]() |
(A.11) |
Define gmn(r,s) to be the Laplace transform of fmn(r,t) with respect to the time variable t. Transforming Equation (A.11) gives
![]() |
(A.12) |
Letting m = n, we find that
![]() |
(A.13) |
Next, we form a recurrence and solve for gmn(r,s) to obtain
![]() |
(A.14) |
We proceed via a partial fractions decomposition of the product in the denominator above,
![]() |
(A.15) |
when n > m. Inverse transforming with respect to s, we obtain
![]() |
(A.16) |
and
![]() |
(A.17) |
when n > m. Again inverse transforming Equation (A.17), this time with respect to the Laplace reward variable r, we find that for m = n,
![]() |
(A.18) |
which is a point mass at x = mt. For n > m,
![]() |
(A.19) |
This completes the proof.
Appendix C
Proof of Theorem 2
Proof. Steel and Mooers (2010) and Stadler and Steel (2012) provide details of this proof for m = 2, and our presentation follows theirs closely, with the exception that we allow m to be any positive integer less than n. The PD, or total branch length, of a Yule tree beginning with m lineages at time 0 and ending with n tips at time t, with branching rate λ, is the sum of mt and n−m independent and identically distributed random variables with density
![]() |
(A.20) |
This is simply an exponential density truncated to the interval (0, t), that is, an exponential density divided by the probability that a branch is shorter than t (Gernhard 2008). A tree beginning with m lineages and ending in n tips has n−m branching times. Let Si be the length of the ith branch in the tree. Then, Si = t for i = 1,...,m, and
![]() |
(A.21) |
where the Si are independent and identically distributed with density (A.20). Note that
![]() |
(A.22) |
and
![]() |
(A.23) |
The variance is
![]() |
(A.24) |
The expected PD is
![]() |
(A.25) |
When m = 2, Equation (A.25) agrees with Theorem 2 of Steel and Mooers (2010). Now since the branch lengths are independent and identically distributed, the variance of PD is the sum of the n−m variance terms,
![]() |
(A.26) |
Consider a new random variable
![]() |
(A.27) |
Now, Zn is a standardized sum of n−m independent and identically distributed random variables, so it has mean 0 and variance 1. By the central limit theorem, Zn → Normal(0,1) in distribution as n → ∞.
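The moments used in this proof can be checked numerically. The sketch below assumes the branch-length density is the truncated exponential $\lambda e^{-\lambda s}/(1 - e^{-\lambda t})$ on (0, t), as stated in the main text for the two-species case; the closed forms for its mean and variance were worked out for this illustration and may be written differently from the corresponding displayed equations in the article. Function names are hypothetical.

```python
import math
import random


def branch_moments(lam, t):
    """Mean and variance of an exponential(lam) variable truncated to (0, t)."""
    q = math.exp(-lam * t) / (1.0 - math.exp(-lam * t))
    mean = 1.0 / lam - t * q
    var = 1.0 / lam ** 2 - t ** 2 * q * (1.0 + q)
    return mean, var


def pd_moments(m, n, lam, t):
    """Mean and variance of PD: m*t plus n - m independent truncated branches."""
    mean_s, var_s = branch_moments(lam, t)
    return m * t + (n - m) * mean_s, (n - m) * var_s


def sample_pd(m, n, lam, t, rng):
    """Draw one PD value by inverse-CDF sampling of the n - m random branches."""
    scale = 1.0 - math.exp(-lam * t)
    # CDF on (0, t) is F(s) = (1 - exp(-lam * s)) / scale, inverted below.
    branches = (-math.log(1.0 - rng.random() * scale) / lam for _ in range(n - m))
    return m * t + sum(branches)
```

Averaging many calls to `sample_pd(2, 10, 1.0, 1.0, rng)` should reproduce `pd_moments(2, 10, 1.0, 1.0)`, and a histogram of the standardized draws approaches the normal shape as n − m grows.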
Appendix D
Proof of Theorem 3
Proof. Lemma 1 with arbitrary rewards ak, k = 1,2,..., gives
![]() |
(A.28) |
To solve the system, apply the Laplace transform with respect to time t. First note that the transform of fmm(r,t) is
![]() |
(A.29) |
Transforming the nth equation, and recalling that fmn(r,0) = 0 for n > m,
![]() |
(A.30) |
We expand the denominator by partial fractions to find
![]() |
(A.31) |
Transforming back to the time domain, we have, for m = n,
![]() |
(A.32) |
When n > m,
![]() |
(A.33) |
This completes the proof.
Appendix E
Analytic and Numerical Inversion
Analytic inversion of Equation (25) in Theorem 3 is possible, but unfortunately depends on the structure of the tree topology in unexpected ways. One convenient property of the rewards a = (a1,...,an) is that ai < ai+1 for all 1 ≤ i ≤ n−1, a fact apparent from Equation (??). Therefore, ai ≠ aj for distinct i and j. When m = n, no speciation events have taken place, and we have
![]() |
(A.34) |
For n = m+1, there is only one distinct topology, so
![]() |
(A.35) |
In general, when
![]() |
(A.36) |
for any ℓ, k, or j in 1,...,n, then Equation (A.33) becomes
![]() |
(A.37) |
and so the full probability density for n > m is
![]() |
(A.38) |
However, the rewards a for many topologies do not satisfy Equation (A.36). This can be seen in Figure A1, where m = 2 and n = 4. Then, we see that when j = 2, k = 4, and ℓ = 3 in Equation (A.38),
![]() |
(A.39) |
and
![]() |
(A.40) |
so the denominator in the second sum in Equation (A.38) is 0. Unfortunately this happens whenever there is symmetry in the tree so that more than one pair of taxa have the same time of shared ancestry. Note also that Equation (A.38) does not reduce to Equation (25) in Theorem 3 when ak = k since the denominator in the summand of Equation (A.38) is 0.
Figure A1.
Demonstration of a problematic reward vector a computed for the symmetric 4-taxon tree. In this case, the analytic inversion formula (A.38) cannot be applied, since the denominator in the sum becomes 0.
Despite the difficulty in writing a general inversion to obtain fmn(x,t) for any topology, numerical inversion to arbitrary precision remains straightforward. Abate and Whitt (1995) describe a numerical method for inverting the Laplace transform of probability densities by a discrete Riemann sum using the trapezoidal rule with step size h:
![]() |
(A.41) |
where Re(z) indicates the real part of a complex number z and we choose A = 20. Reasonable error bounds can be derived to justify approximation via truncation of the infinite sum. Crawford and Suchard (2012) give a more detailed derivation of this numerical approximation.
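A numerical inversion of this kind can be sketched as follows. The code implements the standard trapezoidal (Fourier-series) formula of Abate and Whitt with A = 20; the Euler summation used to accelerate the alternating series, the truncation parameters, and the function name are choices made for this illustration and are not necessarily those used by the authors. The transform `F` must accept a complex argument.

```python
import math


def invert_laplace(F, t, A=20.0, n_terms=50, euler_order=12):
    """Approximate f(t) from its Laplace transform F by the trapezoidal rule.

    Uses f(t) ~ (e^{A/2} / t) * [ Re F(A / 2t) / 2
                                  + sum_{k>=1} (-1)^k Re F((A + 2 k pi i) / 2t) ],
    with Euler (binomial) averaging of the partial sums.
    """
    def term(k):
        s = complex(A, 2.0 * k * math.pi) / (2.0 * t)
        return (-1) ** k * F(s).real

    partial = 0.5 * F(complex(A / (2.0 * t), 0.0)).real
    partial_sums = []
    for k in range(1, n_terms + euler_order + 1):
        partial += term(k)
        if k >= n_terms:
            partial_sums.append(partial)  # S_n, S_{n+1}, ..., S_{n+m}

    # Euler summation: binomially weighted average of the last m + 1 partial sums.
    weights = [math.comb(euler_order, j) / 2.0 ** euler_order
               for j in range(euler_order + 1)]
    averaged = sum(w * s for w, s in zip(weights, partial_sums))
    return math.exp(A / 2.0) / t * averaged
```

As a check on a transform with a known inverse, `invert_laplace(lambda s: 1.0 / (s + 1.0), 1.0)` should agree closely with exp(−1). The method is intended for t > 0 and for densities that are smooth at the evaluation point; accuracy degrades near discontinuities.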
References
- Abate J., Whitt W. Numerical inversion of Laplace transforms of probability distributions. ORS J. Comput. 1995;7(1):36–43. [Google Scholar]
- Alfaro M., Santini F., Brock C., Alamillo H., Dornburg A., Rabosky D., Carnevale G., Harmon L. Nine exceptional radiations plus high turnover explain species diversity in jawed vertebrates. Proc. Natl Acad. Sci. U S A. 2009;106(32):13410–13414. doi: 10.1073/pnas.0811087106. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bailey N. T. J. New York: Wiley; 1964. The elements of stochastic processes with applications to the natural sciences. [Google Scholar]
- Blomberg S., Garland T., Ives A. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution. 2003;57(4):717–745. doi: 10.1111/j.0014-3820.2003.tb00285.x. [DOI] [PubMed] [Google Scholar]
- Bokma F. Time, species, and separating their effects on trait variance in clades. Syst. Biol. 2010;59(5):602–607. doi: 10.1093/sysbio/syq029. [DOI] [PubMed] [Google Scholar]
- Bortolussi N., Durand E., Blum M., Francois O. 2011. apTreeshape: Analyses of Phylogenetic Treeshape. R package version 1.4-4 [Internet] Available from: URL http://CRAN.R-project.org/package=apTreeshape. [Google Scholar]
- CRAN. 2012. The comprehensive R archive network [Internet] Available from: URL http://cran.r-project.org. [Google Scholar]
- Crawford F. W., Suchard M. A. Transition probabilities for general birth–death processes with applications in ecology, genetics, and evolution. J. Math. Biol. 2012;65:553–580. doi: 10.1007/s00285-011-0471-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Crozier R., Agapow P., Dunnett L. Conceptual issues in phylogeny and conservation: a reply to Faith and Baker. Evol. Bioinform. Online. 2006;2:197–199. [Google Scholar]
- Drummond A. J., Suchard M. A., Xie D., Rambaut A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 2012;29:1969–1973. doi: 10.1093/molbev/mss075. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eizirik E., Murphy W., Koepfli K., Johnson W., Dragoo J., Wayne R., O'Brien S. Pattern and timing of diversification of the mammalian order Carnivora inferred from multiple nuclear gene sequences. Mol. Phylogenet. Evol. 2010;56(1):49–63. doi: 10.1016/j.ympev.2010.01.033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eldredge N., Gould S.J. Punctuated equilibria: an alternative to phyletic gradualism. Model in Paleobiol. 1972;82:115. [Google Scholar]
- Faith D. Conservation evaluation and phylogenetic diversity. Biol. Conserv. 1992;61(1):1–10. [Google Scholar]
- Faith D. The role of the phylogenetic diversity measure, pd, in bio-informatics: getting the definition right. Evol. Bioinform. Online. 2006;2:277. [PMC free article] [PubMed] [Google Scholar]
- Faith D., Baker A. Phylogenetic diversity (PD) and biodiversity conservation: some bioinformatics challenges. Evol. Bioinform. Online. 2006;2:121. [PMC free article] [PubMed] [Google Scholar]
- Faller B., Pardi F., Steel M. Distribution of phylogenetic diversity under random extinction. J. Theor. Biol. 2008;251(2):286–296. doi: 10.1016/j.jtbi.2007.11.034. [DOI] [PubMed] [Google Scholar]
- Felsenstein J. Phylogenies and the comparative method. Am. Nat. 1985;125(1):1–15. doi: 10.1086/703055. [DOI] [PubMed] [Google Scholar]
- FitzJohn R. G., Goldberg E. E., Magnuson-Ford K. 2012. Diversitree: comparative phylogenetic analyses of diversification. R package version 0.9-3 [Internet] Available from: URL http://CRAN.R-project.org/package=diversitree. [Google Scholar]
- Foote M. Contributions of individual taxa to overall morphological disparity. Paleobiology. 1993;19(4):403–419. [Google Scholar]
- Foote M., Hunter J. P., Janis C. M., Sepkoski J. J., Jr. Evolutionary and preservational constraints on origins of biologic groups: divergence times of eutherian mammals. Science. 1999;283(5406):1310–1314. doi: 10.1126/science.283.5406.1310. [DOI] [PubMed] [Google Scholar]
- Garland T., Harvey P., Ives A. Procedures for the analysis of comparative data using phylogenetically independent contrasts. Syst. Biol. 1992;41(1):18–32. [Google Scholar]
- Gernhard T. The conditioned reconstructed process. J. Theor. Biol. 2008;253(4):769–778. doi: 10.1016/j.jtbi.2008.04.005. [DOI] [PubMed] [Google Scholar]
- Gernhard T., Hartmann K., Steel M. Stochastic properties of generalised Yule models, with biodiversity applications. J. Math. Biol. 2008;57(5):713–735. doi: 10.1007/s00285-008-0186-y. [DOI] [PubMed] [Google Scholar]
- Gittleman J. Carnivore life history patterns: allometric, phylogenetic, and ecological associations. Am. Nat. 1986;127(6):744–771. [Google Scholar]
- Gittleman J., Purvis A. Body size and species-richness in carnivores and primates. Proc. Roy. Soc. Lond. B Bio. 1998;265(1391):113–119. doi: 10.1098/rspb.1998.0271. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gould S., Eldredge N. Punctuated equilibria: the tempo and mode of evolution reconsidered. Paleobiology. 1977;3(2):115–151. [Google Scholar]
- Grafen A. The phylogenetic regression. Phil. Trans. Roy. Soc. B. 1989;326(1233):119–157. doi: 10.1098/rstb.1989.0106. [DOI] [PubMed] [Google Scholar]
- Grimmett G., Stirzaker D. Probability and random processes. Oxford: Oxford University Press; 2001. [Google Scholar]
- Harmon L., Weir J., Brock C., Glor R., Challenger W., Hunt G. 2009. geiger: analysis of evolutionary diversification. R package version 1.3-1 [Internet] Available from: URL http://CRAN.R-project.org/package=geiger. [Google Scholar]
- Harvey P., Pagel M. Oxford: Oxford University Press; 1991. The comparative method in evolutionary biology. [Google Scholar]
- Martins E., Hansen T. Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. Am. Nat. 1997;149(4):646–667. [Google Scholar]
- Minin V., Suchard M. Counting labeled transitions in continuous-time Markov models of evolution. J. Math. Biol. 2008;56(3):391–412. doi: 10.1007/s00285-007-0120-8. [DOI] [PubMed] [Google Scholar]
- Mooers A., Gascuel O., Stadler T., Li H., Steel M. Branch lengths on birth–death trees and the expected loss of phylogenetic diversity. Syst. Biol. 2011;61(2):195–203. doi: 10.1093/sysbio/syr090. [DOI] [PubMed] [Google Scholar]
- Moritz C. Strategies to protect biological diversity and the evolutionary processes that sustain it. Syst. Biol. 2002;51(2):238–254. doi: 10.1080/10635150252899752. [DOI] [PubMed] [Google Scholar]
- Mulder W. Probability distributions of ancestries and genealogical distances on stochastically generated rooted binary trees. J. Theor. Biol. 2011;280(1):139–145. doi: 10.1016/j.jtbi.2011.04.009. [DOI] [PubMed] [Google Scholar]
- Nee S. Birth–death models in macroevolution. Annu. Rev. Ecol. Evol. Syst. 2006;37:1–17. [Google Scholar]
- Nee S., May R. M., Harvey P. H. The reconstructed evolutionary process. Philos. Trans. Roy. Soc. B. 1994;344(1309):305–311. doi: 10.1098/rstb.1994.0068. [DOI] [PubMed] [Google Scholar]
- Neuts M. F. London: Chapman and Hall/CRC; 1995. Algorithmic probability: a collection of problems (Stochastic Modeling Series) [Google Scholar]
- Nowak R., Paradiso J. Baltimore (MD): The Johns Hopkins University Press; 1999. Walker's mammals of the world. [Google Scholar]
- O'Meara B. C., Ané C., Sanderson M. J., Wainwright P. C. Testing for different rates of continuous trait evolution using likelihood. Evolution. 2006;60(5):922–933. [PubMed] [Google Scholar]
- Paradis E., Claude J., Strimmer K. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20(2):289–290. doi: 10.1093/bioinformatics/btg412. [DOI] [PubMed] [Google Scholar]
- Purvis A. Evolution: How do characters evolve? Nature. 2004;432(7014) doi: 10.1038/nature03092. doi:10.1038/nature03092. [DOI] [PubMed] [Google Scholar]
- Purvis A., Garland T. Polytomies in comparative analyses of continuous characters. Syst. Biol. 1993;42(4):569–575.
- Pybus O., Harvey P. Testing macro-evolutionary models using incomplete molecular phylogenies. Proc. Roy. Soc. Lond. Ser. B Biol. Sci. 2000;267(1459):2267–2272. doi: 10.1098/rspb.2000.1278.
- Rannala B., Yang Z. Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J. Mol. Evol. 1996;43(3):304–311. doi: 10.1007/BF02338839.
- Revell L. Phylogenetic signal and linear regression on species data. Methods Ecol. Evol. 2010;1(4):319–329.
- Ricklefs R. Cladogenesis and morphological diversification in passerine birds. Nature. 2004;430(6997):338–341. doi: 10.1038/nature02700.
- Ricklefs R. Time, species, and the generation of trait variance in clades. Syst. Biol. 2006;55(1):151–159. doi: 10.1080/10635150500431205.
- Sagitov S., Bartoszek K. Interspecies correlation for neutrally evolving traits. J. Theor. Biol. 2012;309:11–19. doi: 10.1016/j.jtbi.2012.06.008.
- Sidlauskas B. Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Evolution. 2007;61(2):299–316. doi: 10.1111/j.1558-5646.2007.00022.x.
- Slater G., Harmon L., Wegmann D., Joyce P., Revell L., Alfaro M. Fitting models of continuous trait evolution to incompletely sampled comparative data using Approximate Bayesian Computation. Evolution. 2012;66:752–762. doi: 10.1111/j.1558-5646.2011.01474.x.
- Stadler T. Simulating trees with a fixed number of extant species. Syst. Biol. 2011;60(5):676–684. doi: 10.1093/sysbio/syr029.
- Stadler T. 2012. TreeSim: simulating trees under the birth–death model. R package version 1.6 [Internet]. Available from: http://CRAN.R-project.org/package=TreeSim.
- Stadler T., Steel M. Distribution of branch lengths and phylogenetic diversity under homogeneous speciation models. J. Theor. Biol. 2012;297:33–40. doi: 10.1016/j.jtbi.2011.11.019.
- Steel M., McKenzie A. Properties of phylogenetic trees generated by Yule-type speciation models. Math. Biosci. 2001;170(1):91–112. doi: 10.1016/s0025-5564(00)00061-4.
- Steel M., McKenzie A. The “shape” of phylogenies under simple random speciation models. Biol. Evol. Statist. Phys. 2002;585:162–180.
- Steel M., Mooers A. The expected length of pendant and interior edges of a Yule tree. Appl. Math. Lett. 2010;23(11):1315–1319.
- Stone E. Why the phylogenetic regression appears robust to tree misspecification. Syst. Biol. 2011;60(3):245–260. doi: 10.1093/sysbio/syq098.
- Turnbaugh P., Hamady M., Yatsunenko T., Cantarel B., Duncan A., Ley R., Sogin M., Jones W., Roe B., Affourtit J., Egholm M., Henrissat B., Heath A.C., Knight R., Gordon J.I. A core gut microbiome in obese and lean twins. Nature. 2008;457(7228):480–484. doi: 10.1038/nature07540.
- Webb C., Ackerly D., McPeek M., Donoghue M. Phylogenies and community ecology. Annu. Rev. Ecol. Syst. 2002;33:475–505.
- Yule G. U. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. Philos. Trans. Roy. Soc. Lond. B. 1925;213:21–87.
- Zink R. M., Slowinski J. B. Evidence from molecular systematics for decreased avian diversification in the Pleistocene epoch. Proc. Natl. Acad. Sci. USA. 1995;92(13):5832–5835. doi: 10.1073/pnas.92.13.5832.