Evolutionary Bioinformatics Online. 2007 Feb 20;2:285–293.

Estimating the Relative Order of Speciation or Coalescence Events on a Given Phylogeny

Tanja Gernhard 1, Daniel Ford 2, Rutger Vos 3, Mike Steel 4
PMCID: PMC2674681  PMID: 19455222

Abstract

The reconstruction of large phylogenetic trees from data that violates clocklike evolution (or as a supertree constructed from any m input trees) raises a difficult question for biologists–how can one assign relative dates to the vertices of the tree? In this paper we investigate this problem, assuming a uniform distribution on the order of the inner vertices of the tree (which includes, but is more general than, the popular Yule distribution on trees). We derive fast algorithms for computing the probability that (i) any given vertex in the tree was the j–th speciation event (for each j), and (ii) any one given vertex is earlier in the tree than a second given vertex. We show how the first algorithm can be used to calculate the expected length of any given interior edge in any given tree that has been generated under either a constant-rate speciation model, or the coalescent model.

Keywords: Phylogenetics, neutral model, dating speciation events, edge lengths

1. Introduction

A fundamental task in evolutionary biology is constructing evolutionary trees from a variety of data. These constructed trees show the ancestral relationships between the species.

Not only the relationship between species is of interest, but also the time between speciation events. When constructing an evolutionary tree from a set of molecular data which satisfies the molecular clock, the edge lengths can be interpreted as a time scale. In many cases, however, constructing a tree yields no time scale:

  • Often, molecular data does not satisfy the molecular clock and so the edge lengths do not represent a time scale.

  • Trees can be constructed from morphological data or non-standard molecular data like gene order. This does not provide any edge lengths.

  • Having several different trees, one can combine them and construct a ‘supertree.’ Even though there may have been time scales on the original trees, most supertree methods return a tree without a time scale.

For those trees, we still want to find edge lengths representing the time between speciation events. In this paper, we will estimate the edge lengths from the shape of the tree. The method works for trees which evolved under the Yule model [Yule, 1924; Edwards, 1970; Harding, 1971; Page, 1991]. Under the Yule model, at each point in time, each species is equally likely to split. Minor changes to the method for the Yule model give us an edge length estimation for trees under the popular coalescent setting [Nordborg, 2001].

An example of a tree with unknown edge lengths is the primate supertree 𝒯p recently published in [Vos and Mooers]. Figure 1 shows a part of 𝒯p. The primate tree is a supertree on 218 species and was constructed with the MRP method (Matrix Representation using Parsimony analysis, see [Baum, 1992; Ragan, 1992]). Since no molecular estimates were available for most of the interior vertices, the edge lengths for the tree were estimated. In [Vos and Mooers], $10^6$ rank functions on 𝒯p were drawn uniformly at random. For each of those rank functions, the expected time intervals, i.e. the edge lengths, between vertices were considered (the expected waiting time after the (n − 1)th event until the nth event is 1/n). The authors of [Vos and Mooers] concluded their paper by asking for an analytical approach to the estimation of the edge length, which we will provide below.

Figure 1. Part of the primate supertree. Figure 4–13 are some subtrees, for details see [Vos and Mooers].

In order to estimate the edge lengths, we developed the algorithms RankProb and Compare. Those algorithms answer questions like:

Was the speciation event with label 76 in the primate tree (see Fig. 1) more likely to be an early event in the tree or a late event? What is the probability that 76 was the 6th speciation event? Was it more likely that speciation event 76 happened before speciation event 162, or 162 before 76?

The algorithms work for trees where every labeled history is equiprobable. This class of model, which includes the Yule model and the coalescent model, has been popular in macroevolutionary studies [Nee and May, 1997; Zhaxybayeva and Gogarten, 2004]. Note that the algorithms here are the same for the Yule model and the coalescent model, whereas the edge length estimation has minor differences for the two models.

The algorithms RankProb, Compare and an algorithm for obtaining the expected rank and variance for a vertex were implemented in Python, see [Gernhard, 2006].

2. Probability Distribution of the Rank of a Vertex

Let $\mathcal{T}$ be a rooted phylogenetic tree [Semple and Steel, 2003] with $|V| = n$ leaves. The set of interior vertices of $\mathcal{T}$ shall be $\mathring{V}$. For a binary tree, we have $|\mathring{V}| = n - 1$. Let the function $r$ be a bijection from the set of interior vertices of $\mathcal{T}$ into $\{1, 2, \ldots, |\mathring{V}|\}$ with $r(v_1) \le r(v_2)$ if $v_1$ is an ancestor of $v_2$. The function $r$ is called a rank function for $\mathcal{T}$. A vertex $v$ with $r(v) = i$ is said to have rank $i$. Note that $r$ induces a linear order on the set $\mathring{V}$. Further, define $r(\mathcal{T}) := \{r : r \text{ is a rank function on } \mathcal{T}\}$. We are interested in the distribution of the possible ranks for a certain vertex, i.e. we want to know the probability of $r(v) = i$ for a given $v \in \mathring{V}$. If every rank function on a given tree is equally likely, we have

$$\mathbb{P}[r(v) = i] = \frac{|\{r : r(v) = i,\ r \in r(\mathcal{T})\}|}{|r(\mathcal{T})|} \tag{1}$$

which will be calculated for rooted binary trees in polynomial time by algorithm RankProb. In the algorithm, we will use the formula [Semple and Steel, 2003]

$$|r(\mathcal{T})| = \frac{|\mathring{V}|!}{\prod_{v \in \mathring{V}} (n_v - 1)} \tag{2}$$

where nv is the number of leaves below v. Note that Equation 2 holds for binary and nonbinary trees.
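On small trees, Equations 1 and 2 can be checked by brute force. The Python sketch below is illustrative only and is not the implementation of [Gernhard, 2006]. It assumes a binary tree encoded as nested 2-tuples with strings as leaves, and it identifies an interior vertex by its path of child indices from the root; the encoding and all function names are hypothetical choices made for this example.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def interior_paths(tree, path=()):
    """Yield the root-to-vertex path (tuple of child indices) of each interior vertex."""
    if isinstance(tree, tuple):
        yield path
        for k, child in enumerate(tree):
            yield from interior_paths(child, path + (k,))


def n_leaves(tree):
    """Number of leaves in `tree` (a non-tuple counts as one leaf)."""
    return sum(n_leaves(c) for c in tree) if isinstance(tree, tuple) else 1


def subtree(tree, path):
    """Subtree rooted at the vertex reached by following `path` from the root."""
    for k in path:
        tree = tree[k]
    return tree


def rank_functions(tree):
    """Enumerate all rank functions as dicts {path: rank}. Exponential: small trees only."""
    verts = list(interior_paths(tree))
    for order in permutations(verts):
        rank = {v: i + 1 for i, v in enumerate(order)}
        # The parent of a non-root interior vertex v has path v[:-1].
        if all(rank[v[:-1]] < rank[v] for v in verts if v):
            yield rank


def count_rank_functions(tree):
    """Equation 2 for a binary tree: |interior|! / product of (n_v - 1)."""
    verts = list(interior_paths(tree))
    denom = 1
    for v in verts:
        denom *= n_leaves(subtree(tree, v)) - 1
    return factorial(len(verts)) // denom


def rank_distribution(tree, v):
    """Equation 1 by enumeration: [P[r(v) = 1], ..., P[r(v) = |interior|]]."""
    ranks = [r[v] for r in rank_functions(tree)]
    n_int = len(list(interior_paths(tree)))
    return [Fraction(ranks.count(i), len(ranks)) for i in range(1, n_int + 1)]


tree = ((("a", "b"), "c"), ("d", "e"))
print(count_rank_functions(tree), sum(1 for _ in rank_functions(tree)))
print(rank_distribution(tree, (1,)))
```

For this five-leaf tree both counts are 3, and the vertex above leaves d and e takes ranks 2, 3 and 4 with probability 1/3 each.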

Examples of stochastic models on phylogenetic trees where each rank function is equally likely include:

  • The Yule model has the probability distribution $\mathbb{P}[r \mid \mathcal{T}] = \frac{\prod_{v \in \mathring{V}} (n_v - 1)}{(n-1)!}$, which is the uniform distribution [Edwards, 1970; Brown, 1994].

  • The coalescent model has the same probability distribution on rooted binary ranked trees as the Yule model. So ℙ[r|𝒯] is the uniform distribution [Aldous, 2001].

  • For some sets of trees (e.g. those drawn from the uniform model [Pinelis, 2003], also known as PDA model), no rank function is induced. If one assumes that all rank functions are equally likely on these trees, one can apply Equation 1 to such trees as well.

2.1. A polynomial-time algorithm

The following algorithm calculates the probability distribution of the rank of a vertex $v$ in a rooted binary phylogenetic tree $\mathcal{T}$. The idea of the algorithm is the following (cf. Figure 2). Label the vertices on the path from $v$ to the root $\rho$ by $v = x_1, \ldots, x_n = \rho$. Let $\mathcal{T}_m$ be the subtree of $\mathcal{T}$ containing the vertex $x_m$ and all its descendants. Let $\alpha_{\mathcal{T}_m,v}(i)$ be the number of rank functions on the tree $\mathcal{T}_m$ where $v$ has rank $i$. The values $\alpha_{\mathcal{T}_m,v}(i)$, $i = 1, \ldots, |\mathring{V}|$, are calculated iteratively for $m = 1, \ldots, n$. The probability $\mathbb{P}[r(v) = i]$ equals

$$\frac{\alpha_{\mathcal{T}_n,v}(i)}{\sum_{i=1}^{|\mathring{V}|} \alpha_{\mathcal{T}_n,v}(i)}.$$

The $\alpha$-values in the fraction have a lot of factors in common which cancel out. In the following algorithm, we therefore calculate the values $\tilde{\alpha}_{\mathcal{T}_m,v}(i)$, i.e. the $\alpha$-values without the unnecessary terms. We have $\alpha_{\mathcal{T}_m,v}(i) = \tilde{\alpha}_{\mathcal{T}_m,v}(i)\,|r(\mathcal{T}_1)|\,|r(\mathcal{T}'_1)|\,|r(\mathcal{T}'_2)| \cdots |r(\mathcal{T}'_{m-1})|$.

Figure 2. Labeling the tree for the algorithm RankProb.

Algorithm: RankProb($\mathcal{T}$, $v$)

Input: A rooted binary phylogenetic tree $\mathcal{T}$ and an interior vertex $v$.

Output: The probabilities $\mathbb{P}[r(v) = i]$ for $i = 1, \ldots, |\mathring{V}|$.

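1: Denote the vertices of the path from $v$ to the root $\rho$ by $(v = x_1, x_2, \ldots, x_n = \rho)$.
2: Denote the subtree of $\mathcal{T}$ consisting of root $x_m$ and all its descendants by $\mathcal{T}_m$ for $m = 1, \ldots, n$.
3: Initialize $\tilde{\alpha}_{\mathcal{T}_m,v}(i) := 0$ for $i = 1, \ldots, |\mathring{V}_{\mathcal{T}}|$, $m = 1, \ldots, n$.
4: $\tilde{\alpha}_{\mathcal{T}_1,v}(1) := 1$
5: for $m = 2, \ldots, n$ do
6:   $\mathcal{T}'_{m-1} := \mathcal{T}_m \setminus (\mathcal{T}_{m-1} \cup x_m)$ (cf. Figure 3)
7:   for $i = m, \ldots, |\mathring{V}_{\mathcal{T}_m}|$ do
8:     $M := \min\{|\mathring{V}_{\mathcal{T}'_{m-1}}|,\ i - 2\}$
9:     $\tilde{\alpha}_{\mathcal{T}_m,v}(i) := \sum_{j=0}^{M} \tilde{\alpha}_{\mathcal{T}_{m-1},v}(i-j-1) \binom{|\mathring{V}_{\mathcal{T}_{m-1}}| + |\mathring{V}_{\mathcal{T}'_{m-1}}| - (i-1)}{|\mathring{V}_{\mathcal{T}'_{m-1}}| - j} \binom{i-2}{j}$  (*)
10:   end for
11: end for
12: for $i = 1, \ldots, |\mathring{V}_{\mathcal{T}}|$ do
13:   $\mathbb{P}[r(v) = i] := \dfrac{\tilde{\alpha}_{\mathcal{T}_n,v}(i)}{\sum_{j} \tilde{\alpha}_{\mathcal{T}_n,v}(j)}$
14: end for
15: RETURN $\mathbb{P}[r(v) = i]$, $i = 1, \ldots, |\mathring{V}|$.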

Figure 3. Labeling the tree for the recursion in RankProb.
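The recursion of RankProb (lines 5 to 11) can be sketched in Python as follows. This is a minimal illustration and not the implementation referenced in [Gernhard, 2006]. It assumes a binary tree given as nested 2-tuples with strings as leaves, and the vertex $v$ is specified by a hypothetical `path` of child indices leading from the root to $v$. Exact fractions are used so that the output can be compared with hand calculations.

```python
from fractions import Fraction
from math import comb


def n_interior(tree):
    """Number of interior vertices of a nested-tuple binary tree."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + n_interior(tree[0]) + n_interior(tree[1])


def rank_prob(tree, path):
    """Return [P[r(v)=1], ..., P[r(v)=|interior|]] for the interior vertex at `path`."""
    # chain[d] is the subtree rooted at the d-th vertex on the path root -> v.
    chain = [tree]
    for step in path:
        chain.append(chain[-1][step])
    if not isinstance(chain[-1], tuple):
        raise ValueError("v must be an interior vertex")

    a = n_interior(chain[-1])      # interior vertices of T_1 (rooted at v)
    alpha = [0] * (a + 1)          # alpha[i] plays the role of alpha-tilde(i); index 0 unused
    alpha[1] = 1                   # v is the root of T_1, so it has rank 1 there

    # Walk from v up to the root: T_2, T_3, ..., T_n.
    for depth in range(len(path) - 1, -1, -1):
        sibling = chain[depth][1 - path[depth]]   # root of T'_{m-1}
        b = n_interior(sibling)                   # interior vertices of T'_{m-1}
        new = [0] * (a + b + 2)                   # T_m has a + b + 1 interior vertices
        for i in range(2, a + b + 2):
            total = 0
            for j in range(min(b, i - 2) + 1):    # j vertices of T'_{m-1} precede v
                p = i - j - 1                     # rank of v inside T_{m-1}
                if p <= a:
                    total += alpha[p] * comb(a + b - (i - 1), b - j) * comb(i - 2, j)
            new[i] = total
        alpha, a = new, a + b + 1

    norm = sum(alpha)
    return [Fraction(x, norm) for x in alpha[1:]]


tree = ((("a", "b"), "c"), ("d", "e"))
print(rank_prob(tree, (0,)))   # vertex above a, b, c
print(rank_prob(tree, (1,)))   # vertex above d, e
```

Starting the inner loop at $i = 2$ rather than $i = m$ is a simplification of this sketch: the skipped entries evaluate to zero anyway. On the five-leaf example, the vertex above a, b, c gets rank 2 with probability 2/3 and rank 3 with probability 1/3.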

Proving the correctness and runtime of RankProb makes use of the following two observations.

Remark 1

Let $A_i$ be a set containing $n_i$ elements with a linear order, $i \in \{1, 2\}$. There are $\binom{n_1+n_2}{n_1}$ possible linear orders on $A_1 \cup A_2$ which preserve the linear order on $A_1$ and $A_2$. This follows from the observation that the number of such linear orders on $A_1 \cup A_2$ equals the number of ways of choosing $n_1$ elements from $n_1 + n_2$ elements, which is $\binom{n_1+n_2}{n_1}$.

Remark 2

The values $\binom{n}{k}$ for all $n, k \le N$ ($n, k, N \in \mathbb{N}$) can be calculated in $O(N^2)$ using Pascal’s Triangle. Thus, after $O(N^2)$ calculations, any value $\binom{n}{k}$ with $n, k \le N$ can be obtained in constant time.
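Both remarks are easy to check numerically. The following Python sketch (function names are hypothetical) builds the table of Remark 2 with Pascal's rule and verifies the count of Remark 1 by enumerating all orderings of two small disjoint ordered sets.

```python
from itertools import permutations


def pascal_table(N):
    """C[n][k] = binomial(n, k) for 0 <= k <= n <= N, using O(N^2) additions."""
    C = [[0] * (N + 1) for _ in range(N + 1)]
    for n in range(N + 1):
        C[n][0] = 1
        for k in range(1, n + 1):
            C[n][k] = C[n - 1][k - 1] + C[n - 1][k]
    return C


def count_order_preserving(a1, a2):
    """Count linear orders on a1 + a2 that keep the given order within a1 and within a2."""
    def keeps(order, seq):
        positions = [order.index(x) for x in seq]
        return positions == sorted(positions)

    return sum(keeps(p, a1) and keeps(p, a2) for p in permutations(a1 + a2))


C = pascal_table(10)
print(C[5][2], count_order_preserving(("a", "b"), ("x", "y", "z")))
```

After the table is built, each lookup `C[n][k]` costs constant time, which is what the runtime analysis of RankProb relies on.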

Theorem 3

RankProb returns the quantities $\mathbb{P}[r(v) = i]$ for each given $v \in \mathring{V}$ and all $i \in \{1, \ldots, |\mathring{V}|\}$. The runtime is $O(|\mathring{V}|^2)$.

Proof

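Let $\alpha_{\mathcal{T}_m,v}(i) = \tilde{\alpha}_{\mathcal{T}_m,v}(i)\,|r(\mathcal{T}_1)|\,|r(\mathcal{T}'_1)|\,|r(\mathcal{T}'_2)| \cdots |r(\mathcal{T}'_{m-1})|$. We first show that $\alpha_{\mathcal{T}_m,v}(i) = |\{r : r(v) = i,\ r \in r(\mathcal{T}_m)\}|$ for $m = 1, \ldots, n$, $i = 1, \ldots, |\mathring{V}_{\mathcal{T}}|$. That implies

$$\mathbb{P}[r(v) = i] = \frac{|\{r : r(v) = i,\ r \in r(\mathcal{T})\}|}{|r(\mathcal{T})|} = \frac{\alpha_{\mathcal{T},v}(i)}{\sum_i \alpha_{\mathcal{T},v}(i)} = \frac{\tilde{\alpha}_{\mathcal{T},v}(i)}{\sum_i \tilde{\alpha}_{\mathcal{T},v}(i)}$$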

which proves the theorem.

The proof is by induction over m.

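For $m = 1$, $\alpha_{\mathcal{T}_1,v}(1) = |r(\mathcal{T}_1)|\,\tilde{\alpha}_{\mathcal{T}_1,v}(1) = |r(\mathcal{T}_1)| = |\{r : r(v) = 1,\ r \in r(\mathcal{T}_1)\}|$. Vertex $v$ is the root of $\mathcal{T}_1$, so $\alpha_{\mathcal{T}_1,v}(i) = 0$ for all $i > 1$.

Let $m = k$ and assume that $\alpha_{\mathcal{T}_m,v}(i) = |\{r : r(v) = i,\ r \in r(\mathcal{T}_m)\}|$ holds for all $m < k$. Then $\alpha_{\mathcal{T}_k,v}(i) = 0$ clearly holds for all $i > |\mathring{V}_{\mathcal{T}_k}|$ since $r_{\mathcal{T}_k} : \mathring{V}_{\mathcal{T}_k} \to \{1, \ldots, |\mathring{V}_{\mathcal{T}_k}|\}$. So it remains to verify that the term (*) returns the right values for $\alpha_{\mathcal{T}_k,v}(i)$. Assume that the vertex $v$ is in the $(i-j-1)$-th position in $\mathcal{T}_{k-1}$ (with $i - j - 1 > 0$) for some rank function $r_{\mathcal{T}_{k-1}}$ and $v$ shall be in the $i$-th position in $\mathcal{T}_k$.

Now combine the linear order in the tree $\mathcal{T}_{k-1}$ induced by $r_{\mathcal{T}_{k-1}}$ with a linear order in $\mathcal{T}'_{k-1}$ induced by $r_{\mathcal{T}'_{k-1}}$ to get a linear order on $\mathcal{T}_k$. The first $j$ vertices of $\mathcal{T}'_{k-1}$ must be inserted between vertices of $\mathcal{T}_{k-1}$ with lower rank than $v$ so that $v$ ends up in the $i$-th position of the tree $\mathcal{T}_k$. Count the number of possible ways to do this as follows. The tree $\mathcal{T}'_{k-1}$ has $|r(\mathcal{T}'_{k-1})|$ possible rank functions. Combining a rank function $r_{\mathcal{T}_{k-1}}$ with a rank function $r_{\mathcal{T}'_{k-1}}$ to get a rank function $r_{\mathcal{T}_k}$ with $r_{\mathcal{T}_k}(v) = i$ means inserting the first $j$ vertices of $\mathcal{T}'_{k-1}$ anywhere between the first $(i - j - 2)$ vertices of $\mathcal{T}_{k-1}$. There are

$$\binom{(i-j-2)+j}{j} = \binom{i-2}{j}$$

possibilities according to Remark 1. For combining the $|\mathring{V}_{\mathcal{T}_{k-1}}| - (i-j-1)$ vertices of rank bigger than $v$ in $\mathcal{T}_{k-1}$ with the remaining $|\mathring{V}_{\mathcal{T}'_{k-1}}| - j$ vertices in $\mathcal{T}'_{k-1}$, there are

$$\binom{|\mathring{V}_{\mathcal{T}_{k-1}}| - (i-j-1) + |\mathring{V}_{\mathcal{T}'_{k-1}}| - j}{|\mathring{V}_{\mathcal{T}'_{k-1}}| - j} = \binom{|\mathring{V}_{\mathcal{T}_{k-1}}| + |\mathring{V}_{\mathcal{T}'_{k-1}}| - (i-1)}{|\mathring{V}_{\mathcal{T}'_{k-1}}| - j}$$

possibilities. This follows again from Remark 1. The number of rank functions $r_{\mathcal{T}_{k-1}}$ with $r_{\mathcal{T}_{k-1}}(v) = i - j - 1$ is $\alpha_{\mathcal{T}_{k-1},v}(i-j-1)$ by the induction assumption. Multiplying all those possibilities gives

$$\alpha_{\mathcal{T}_{k-1},v}(i-j-1) \cdot |r(\mathcal{T}'_{k-1})| \cdot \binom{|\mathring{V}_{\mathcal{T}_{k-1}}| + |\mathring{V}_{\mathcal{T}'_{k-1}}| - (i-1)}{|\mathring{V}_{\mathcal{T}'_{k-1}}| - j} \binom{i-2}{j}$$

where $\alpha_{\mathcal{T}_{k-1},v}(i) = \tilde{\alpha}_{\mathcal{T}_{k-1},v}(i)\,|r(\mathcal{T}_1)|\,|r(\mathcal{T}'_1)|\,|r(\mathcal{T}'_2)| \cdots |r(\mathcal{T}'_{k-2})|$. The value $|\{r : r(v) = i,\ r \in r(\mathcal{T}_k)\}|$ is then the sum over all possible $j$, which establishes the correctness of the algorithm.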

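All that remains is to verify the runtime. Note that the combinatorial factors $\binom{n}{k}$ for all $n, k \le |\mathring{V}|$ can be calculated in advance in quadratic time, see Remark 2. In the algorithm, those factors can then be obtained in constant time.

The most time-consuming part of the algorithm is line 9. Adding up all calculations needed for obtaining $\tilde{\alpha}_{\mathcal{T}_m,v}(i)$, $m = 1, \ldots, n$, $i = 1, \ldots, |\mathring{V}_{\mathcal{T}_m}|$ comes to:

$$\sum_{m=2}^{n} |\mathring{V}_{\mathcal{T}_m}|\,|\mathring{V}_{\mathcal{T}'_{m-1}}| \le \sum_{m=2}^{n} |\mathring{V}|\,|\mathring{V}_{\mathcal{T}'_{m-1}}| = |\mathring{V}| \sum_{m=2}^{n} |\mathring{V}_{\mathcal{T}'_{m-1}}| \le |\mathring{V}|^2$$

The last inequality holds since the vertices of the $\mathcal{T}'_m$, $m = 1, \ldots, n - 1$, are distinct. Therefore, the runtime is quadratic.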

Remark 4

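With $\mathbb{P}[r(v) = i]$ from Theorem 3, the expected value $\mu_{r(v)}$ and the variance $\sigma^2_{r(v)}$ for $r(v)$ can be calculated by

$$\mu_{r(v)} = \sum_{i=1}^{|\mathring{V}|} i\,\mathbb{P}[r(v) = i], \qquad \sigma^2_{r(v)} = \sum_{i=1}^{|\mathring{V}|} i^2\,\mathbb{P}[r(v) = i] - \mu_{r(v)}^2$$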
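The two formulas of Remark 4 translate directly into code. The sketch below is an illustration with a hypothetical function name; it takes the rank distribution as a list whose entry at index $i - 1$ is $\mathbb{P}[r(v) = i]$.

```python
from fractions import Fraction


def rank_mean_and_variance(probs):
    """Return (mean, variance) of the rank, given probs[i-1] = P[r(v) = i]."""
    mean = sum(i * p for i, p in enumerate(probs, start=1))
    second_moment = sum(i * i * p for i, p in enumerate(probs, start=1))
    return mean, second_moment - mean ** 2


# A vertex whose rank is 2, 3 or 4 with probability 1/3 each.
probs = [Fraction(0), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
print(rank_mean_and_variance(probs))
```

For this distribution the mean rank is 3 and the variance is 2/3.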

Remark 5

The algorithm RankProb can be generalized to non-binary trees [Gernhard, 2006]. The runtime is again quadratic.

3. Application of RankProb - Estimating Edge Lengths

3.1. The Yule model

A very common stochastic model for rooted binary phylogenetic trees with edge lengths is the continuous-time Yule model [Edwards, 1970]. As in the discrete Yule model, at every point in time, each species is equally likely to split and give birth to two new species. The expected waiting time for the next speciation event in a tree with n leaves is 1/n. That is, each species at any given time has a constant speciation rate (normalized so that 1 is the expected time until it next speciates).

Assume that the primate tree 𝒯p evolved under the continuous-time Yule model. In [Gernhard, 2006], the tree shape of 𝒯p (i.e. the tree without edge lengths) is tested under the discrete Yule model against the uniform model, and the test accepts the Yule model.

Here, we describe how to estimate the edge lengths for a tree which is assumed to have evolved under the continuous-time Yule model.

Let (u, v) be an interior edge in 𝒯 with u the immediate ancestor of v. Let X be the random variable ‘length of the edge (u, v)’ given that 𝒯 is generated according to the continuous-time Yule model.

The expected length 𝔼[X] of the edge (u, v) is given by

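$$\mathbb{E}[X] = \sum_{i,j} \mathbb{E}[X \mid r(u) = i,\ r(v) = j]\;\mathbb{P}[r(u) = i,\ r(v) = j].$$

Since, under the continuous-time Yule model, the expected waiting time for the next speciation event is 1/n, it follows that:

$$\mathbb{E}[X \mid r(u) = i,\ r(v) = j] = \sum_{k=1}^{j-i} \frac{1}{i+k}.$$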

It remains to calculate the probability ℙ[r(u) = i, r(v) = j]. This is equivalent to counting all the possible rank functions where r(u) = i and r(v) = j. The subtree 𝒯v consists of v and all its descendants. The tree 𝒯u equals the tree 𝒯 where all the descendants of v are deleted, i.e. v is a leaf in 𝒯u, see Figure 4.

Figure 4. Labeling the tree for estimating the edge lengths.

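Note that $\mathbb{P}[r(u) = i,\ r(v) = j] = 0$ if $|\mathring{V}_{\mathcal{T}_u}| < j - 1$. Therefore, assume $|\mathring{V}_{\mathcal{T}_u}| \ge j - 1$ in the following.

The number of rank functions on $\mathcal{T}_u$ is $|r(\mathcal{T}_u)|$. The probability $\mathbb{P}[r(u) = i]$ can be calculated with RankProb($\mathcal{T}_u$, $u$). So the number of rank functions on $\mathcal{T}_u$ with $r(u) = i$ is $\mathbb{P}[r(u) = i] \cdot |r(\mathcal{T}_u)|$.

The number of rank functions on $\mathcal{T}_v$ is $|r(\mathcal{T}_v)|$. Let any linear order on the trees $\mathcal{T}_u$ and $\mathcal{T}_v$ be given. Combining those two linear orders into an order, $r$, on $\mathcal{T}$ with $r(v) = j$ means that the vertices with rank $1, 2, \ldots, j - 1$ in $\mathcal{T}_u$ keep their rank. Vertex $v$ gets rank $j$. The remaining $|\mathring{V}_{\mathcal{T}_u}| - (j - 1)$ vertices in $\mathcal{T}_u$ and $|\mathring{V}_{\mathcal{T}_v}| - 1$ vertices in $\mathcal{T}_v$ have to be shuffled together. According to Remark 1, this can be done in

$$\binom{|\mathring{V}_{\mathcal{T}_u}| - (j-1) + |\mathring{V}_{\mathcal{T}_v}| - 1}{|\mathring{V}_{\mathcal{T}_v}| - 1} = \binom{|\mathring{V}_{\mathcal{T}_u}| + |\mathring{V}_{\mathcal{T}_v}| - j}{|\mathring{V}_{\mathcal{T}_v}| - 1}$$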

different ways. Thus overall there are:

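$$\mathbb{P}[r(u) = i] \cdot |r(\mathcal{T}_u)| \cdot |r(\mathcal{T}_v)| \cdot \binom{|\mathring{V}_{\mathcal{T}_u}| + |\mathring{V}_{\mathcal{T}_v}| - j}{|\mathring{V}_{\mathcal{T}_v}| - 1}$$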

different rank functions on 𝒯 with r(u) = i and r(v) = j. For the probability ℙ[r(u) = i, r(v) = j]:

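$$\mathbb{P}[r(u) = i,\ r(v) = j] = \frac{\mathbb{P}[r(u) = i] \cdot |r(\mathcal{T}_u)| \cdot |r(\mathcal{T}_v)| \cdot \binom{|\mathring{V}_{\mathcal{T}_u}| + |\mathring{V}_{\mathcal{T}_v}| - j}{|\mathring{V}_{\mathcal{T}_v}| - 1}}{\sum_{i,j} \mathbb{P}[r(u) = i] \cdot |r(\mathcal{T}_u)| \cdot |r(\mathcal{T}_v)| \cdot \binom{|\mathring{V}_{\mathcal{T}_u}| + |\mathring{V}_{\mathcal{T}_v}| - j}{|\mathring{V}_{\mathcal{T}_v}| - 1}}$$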

Since |r(𝒯u)| and |r(𝒯v)| are independent of i and j, those factors cancel out, giving

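$$\mathbb{P}[r(u) = i,\ r(v) = j] = \frac{\mathbb{P}[r(u) = i] \cdot \binom{|\mathring{V}_{\mathcal{T}_u}| + |\mathring{V}_{\mathcal{T}_v}| - j}{|\mathring{V}_{\mathcal{T}_v}| - 1}}{\sum_{i,j} \mathbb{P}[r(u) = i] \cdot \binom{|\mathring{V}_{\mathcal{T}_u}| + |\mathring{V}_{\mathcal{T}_v}| - j}{|\mathring{V}_{\mathcal{T}_v}| - 1}} \tag{3}$$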

Furthermore, note that

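$$\binom{|\mathring{V}_{\mathcal{T}_u}| + |\mathring{V}_{\mathcal{T}_v}| - j}{|\mathring{V}_{\mathcal{T}_v}| - 1} = \frac{(|\mathring{V}_{\mathcal{T}}| - j)!}{(|\mathring{V}_{\mathcal{T}_v}| - 1)!\,\bigl(|\mathring{V}_{\mathcal{T}}| - j - (|\mathring{V}_{\mathcal{T}_v}| - 1)\bigr)!}$$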

Again, since (|𝒯v| − 1)! is independent of i and j, this factor cancels out, and so

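$$\mathbb{P}[r(u) = i,\ r(v) = j] = \frac{\mathbb{P}[r(u) = i] \cdot \prod_{k=0}^{|\mathring{V}_{\mathcal{T}_v}| - 2} (|\mathring{V}_{\mathcal{T}}| - j - k)}{\sum_{i,j} \mathbb{P}[r(u) = i] \cdot \prod_{k=0}^{|\mathring{V}_{\mathcal{T}_v}| - 2} (|\mathring{V}_{\mathcal{T}}| - j - k)}$$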

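Let $\Omega = \{(i, j) : i < j,\ i, j \in \{1, \ldots, |\mathring{V}|\},\ |\mathring{V}_{\mathcal{T}_u}| \ge j - 1\}$. With this notation, the expected edge length $\mathbb{E}[X]$ is

$$
\begin{aligned}
\mathbb{E}[X] &= \sum_{(i,j) \in \Omega} \mathbb{E}[X \mid r(u) = i,\ r(v) = j]\;\mathbb{P}[r(u) = i,\ r(v) = j] \\
&= \sum_{(i,j) \in \Omega} \left[ \left( \sum_{k=1}^{j-i} \frac{1}{i+k} \right) \frac{\mathbb{P}[r(u) = i] \cdot \prod_{k=0}^{|\mathring{V}_{\mathcal{T}_v}| - 2} (|\mathring{V}_{\mathcal{T}}| - j - k)}{\sum_{(i,j) \in \Omega} \left[ \mathbb{P}[r(u) = i] \cdot \prod_{k=0}^{|\mathring{V}_{\mathcal{T}_v}| - 2} (|\mathring{V}_{\mathcal{T}}| - j - k) \right]} \right] \\
&= \frac{\sum_{(i,j) \in \Omega} \left[ \left( \sum_{k=1}^{j-i} \frac{1}{i+k} \right) \cdot \mathbb{P}[r(u) = i] \cdot \prod_{k=0}^{|\mathring{V}_{\mathcal{T}_v}| - 2} (|\mathring{V}_{\mathcal{T}}| - j - k) \right]}{\sum_{(i,j) \in \Omega} \left[ \mathbb{P}[r(u) = i] \cdot \prod_{k=0}^{|\mathring{V}_{\mathcal{T}_v}| - 2} (|\mathring{V}_{\mathcal{T}}| - j - k) \right]}
\end{aligned}
\tag{4}
$$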
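Equation 4 can be evaluated with two nested loops once the distribution of $r(u)$ on $\mathcal{T}_u$ is known. The Python sketch below is a minimal illustration, not the authors' code. It assumes that distribution is supplied as a list (for example from a RankProb implementation), together with the interior-vertex counts of $\mathcal{T}_u$ and $\mathcal{T}_v$; the function and parameter names are hypothetical. The conditional expectation is passed in as a function so that a different waiting-time model can be substituted.

```python
from fractions import Fraction


def yule_gap(i, j):
    """E[X | r(u) = i, r(v) = j] under the continuous-time Yule model."""
    return sum(Fraction(1, i + k) for k in range(1, j - i + 1))


def expected_edge_length(p_u, n_int_u, n_int_v, gap=yule_gap):
    """Evaluate Equation 4 for the interior edge (u, v).

    p_u[i-1] -- P[r(u) = i] computed on T_u (v is a leaf of T_u)
    n_int_u  -- number of interior vertices of T_u
    n_int_v  -- number of interior vertices of T_v (at least 1, v included)
    """
    total = n_int_u + n_int_v                       # interior vertices of T
    num = Fraction(0)
    den = Fraction(0)
    for i in range(1, n_int_u + 1):
        for j in range(i + 1, n_int_u + 2):         # i < j and j - 1 <= n_int_u
            weight = Fraction(p_u[i - 1])
            for k in range(n_int_v - 1):            # k = 0, ..., n_int_v - 2
                weight *= total - j - k
            num += gap(i, j) * weight
            den += weight
    return num / den


# Tree (((a,b),c),(d,e)): u is the root, v is the vertex above d and e.
# T_u then has 3 interior vertices, u has rank 1, and T_v has 1 interior vertex.
print(expected_edge_length([1, 0, 0], 3, 1))
```

In this example $v$ has rank 2, 3 or 4 with probability 1/3 each, so the expected edge length is the average of 1/2, 5/6 and 13/12, which is 29/36.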

Remark 6

Equation 4 enables the estimation of the length of every interior edge. For pendant edges, the approach above gives no definite answer. All we know is that the time from the latest interior vertex, which has rank n − 1, until today is expected to be at most 1/n where n is the number of leaves.

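Suppose that the growth process is stopped as soon as the (n − 1)-st speciation event occurs. In this case the expected length X of a pendant edge below an interior vertex v is:

$$\mathbb{E}[X] = \sum_{i=1}^{n-1} \mathbb{P}[r(v) = i] \sum_{k=i}^{n-2} \frac{1}{k+1}$$

The expected depth of vertex v from the first branchpoint is:

$$\sum_{i=1}^{n-1} \mathbb{P}[r(v) = i] \sum_{k=1}^{i-1} \frac{1}{k+1}$$

So the depth Y of the leaf in question from the first branchpoint has expectation independent of v:

$$\mathbb{E}[Y] = \sum_{i=1}^{n-1} \mathbb{P}[r(v) = i] \sum_{k=1}^{i-1} \frac{1}{k+1} + \sum_{i=1}^{n-1} \mathbb{P}[r(v) = i] \sum_{k=i}^{n-2} \frac{1}{k+1} = \sum_{i=1}^{n-1} \mathbb{P}[r(v) = i] \sum_{k=1}^{n-2} \frac{1}{k+1} = \sum_{k=1}^{n-2} \frac{1}{k+1}$$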

In other words, assigning to each edge of a given tree topology its expected length gives a tree which obeys a molecular clock.
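The cancellation in Remark 6 can be confirmed numerically. The sketch below (a hypothetical helper, assuming the rank distribution of $v$ is given as a list of length $n - 1$) computes the expected pendant edge length and the expected depth of $v$ separately, so that their sum can be compared across different distributions.

```python
from fractions import Fraction


def pendant_and_depth(probs):
    """probs[i-1] = P[r(v) = i] for i = 1..n-1, where n is the number of leaves.

    Returns (expected pendant edge length below v, expected depth of v).
    """
    n = len(probs) + 1
    pendant = sum(p * sum(Fraction(1, k + 1) for k in range(i, n - 1))
                  for i, p in enumerate(probs, start=1))
    depth = sum(p * sum(Fraction(1, k + 1) for k in range(1, i))
                for i, p in enumerate(probs, start=1))
    return pendant, depth


spread = [Fraction(0), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
latest = [Fraction(0), Fraction(0), Fraction(0), Fraction(1)]
print(sum(pendant_and_depth(spread)), sum(pendant_and_depth(latest)))
```

Both sums equal 1/2 + 1/3 + 1/4 = 13/12 for n = 5, whatever the rank distribution of $v$ is.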

Remark 7

Often, an inferred tree has vertices with more than two descendants, i.e. there is a lack of resolution due to, e.g., conflicting data. Our calculation for the expected edge length assumes a binary tree, though.

However, the expected edge length may be calculated for each possible binary resolution of the supertree. Assume the supertree 𝒯 has the possible binary resolutions 𝒯1,…, 𝒯m. For an edge (u, v) in 𝒯 where u is the immediate ancestor of v, the expected edge length is calculated in the trees 𝒯i for i = 1,…, m. The expected edge length in 𝒯i is denoted by ei for i =1,…, m. Note that if u is a vertex with more than two descendants in 𝒯 then v is in general not a direct descendant of u in 𝒯i. The value ei in resolution 𝒯i is then the sum of all expected edge lengths on the path from u to v in 𝒯i.

Calculate the expected edge length 𝔼[X] of (u, v) in the supertree 𝒯 by

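$$\mathbb{E}[X] = \frac{\sum_i e_i\,\mathbb{P}[\mathcal{T}_i]}{\sum_i \mathbb{P}[\mathcal{T}_i]} \tag{5}$$

where the probability of a tree $\mathcal{T}$ under the Yule model is [Brown, 1994]

$$\mathbb{P}[\mathcal{T}] = \frac{2^{n-1}}{n!\,\prod_{v \in \mathring{V}} (n_v - 1)}$$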

Again, once the expected length of pendant edges is included the resulting tree obeys a molecular clock, meaning that all leaves are at the same depth.
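Equation 5 is a weighted average over binary resolutions, with weights given by the Yule probability quoted above from [Brown, 1994]. The following Python sketch illustrates both formulas under the assumption that each resolution is encoded as nested 2-tuples with strings as leaves and that the per-resolution values $e_i$ have already been computed; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def n_leaves(tree):
    """Number of leaves in a nested-tuple tree."""
    return sum(n_leaves(c) for c in tree) if isinstance(tree, tuple) else 1


def yule_probability(tree):
    """P[T] = 2^(n-1) / (n! * product over interior v of (n_v - 1))."""
    n = n_leaves(tree)
    denom = factorial(n)
    stack = [tree]
    while stack:
        t = stack.pop()
        if isinstance(t, tuple):          # interior vertex
            denom *= n_leaves(t) - 1
            stack.extend(t)
    return Fraction(2 ** (n - 1), denom)


def expected_length_over_resolutions(lengths, resolutions):
    """Equation 5: average the e_i, weighted by the Yule probability of each resolution."""
    weights = [yule_probability(t) for t in resolutions]
    return sum(e * w for e, w in zip(lengths, weights)) / sum(weights)


caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
print(yule_probability(caterpillar), yule_probability(balanced))
```

For four leaves, a labelled caterpillar tree has probability 1/18 and a labelled balanced tree 1/9, so the balanced resolution receives twice the weight in Equation 5.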

3.2. The coalescent process

The edge length estimation in the previous section works for the continuous-time Yule model. By changing the method above slightly, we get an edge length estimation for the coalescent process. In the coalescent setting, we have

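$$\mathbb{E}[X \mid r(u) = i,\ r(v) = j] = \sum_{k=1}^{j-i} \frac{1}{(i+k)(i+k-1)}.$$

Therefore, the expected edge length for an interior edge (u, v) can be calculated by the following modification of Equation 4:

$$\mathbb{E}[X] = \frac{\sum_{(i,j) \in \Omega} \left[ \left( \sum_{k=1}^{j-i} \frac{1}{(i+k)(i+k-1)} \right) \cdot \mathbb{P}[r(u) = i] \cdot \prod_{k=0}^{|\mathring{V}_{\mathcal{T}_v}| - 2} (|\mathring{V}_{\mathcal{T}}| - j - k) \right]}{\sum_{(i,j) \in \Omega} \left[ \mathbb{P}[r(u) = i] \cdot \prod_{k=0}^{|\mathring{V}_{\mathcal{T}_v}| - 2} (|\mathring{V}_{\mathcal{T}}| - j - k) \right]}$$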
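As a side note that is not part of the original derivation, the conditional expectation in the coalescent setting has a closed form, because each summand splits into a difference of two unit fractions and the sum telescopes:

$$\sum_{k=1}^{j-i} \frac{1}{(i+k)(i+k-1)} = \sum_{k=1}^{j-i} \left( \frac{1}{i+k-1} - \frac{1}{i+k} \right) = \frac{1}{i} - \frac{1}{j}.$$

For example, $i = 2$ and $j = 5$ give $\frac{1}{6} + \frac{1}{12} + \frac{1}{20} = \frac{3}{10} = \frac{1}{2} - \frac{1}{5}$.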

The calculations in Sections 3.1 and 3.2 provide exact values for the expected length of an interior edge under the Yule or coalescent process as an alternative to simulations. However, simulations also provide some indication of the variability in the estimate of edge lengths, and it may be of interest in future work to investigate analytically the variance (or even the distribution) of the edge length, rather than just its mean.

4. Comparing Two Interior Vertices

The algorithm RankProb can also be used for comparing two interior vertices. Assume again that every rank function on a rooted binary phylogenetic tree $\mathcal{T}$ is equally likely. The aim is to compare two interior vertices $u$ and $v$ of $\mathcal{T}$. Was $u$ more likely before (of lower rank than) $v$, or $v$ before $u$? In other words, what is the probability

$$\mathbb{P}_{u<v} := \mathbb{P}[r(u) < r(v)]$$

where $r$ is drawn from $r(\mathcal{T})$, the set of all possible rank functions on $\mathcal{T}$? Note that $\mathbb{P}[r(u) < r(v)] = \mathbb{P}[r(u) > r(v)]$ does not hold in general, even with the uniform distribution on the rank functions. The probability $\mathbb{P}_{u<v}$ is obtained by counting all the possible rank functions on $\mathcal{T}$ in which $u$ has lower rank than $v$ and dividing that number by the number of all possible rank functions on $\mathcal{T}$. One idea is to sum up the probabilities $\mathbb{P}[r(u) = i,\ r(v) = j]$ in Equation 3 for all $i < j$, which leads to a runtime of $O(|\mathring{V}|^4)$. The following algorithm Compare solves the problem in quadratic time. In the following, for a vertex $v$, the subtree $\mathcal{T}_v$ of $\mathcal{T}$ consists again of $v$ and all its descendants.

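Algorithm: Compare($\mathcal{T}$, $u$, $v$)

Input: A rooted binary phylogenetic tree $\mathcal{T}$ and two distinct interior vertices $u$ and $v$.

Output: The probability $\mathbb{P}_{u<v} := \mathbb{P}[r(u) < r(v) \mid \mathcal{T}]$.

1: Denote the most recent common ancestor of $u$ and $v$ by $\rho_1$.
2: if $\rho_1 = v$ then
3:   RETURN $\mathbb{P}_{u<v} = 0$.
4: end if
5: if $\rho_1 = u$ then
6:   RETURN $\mathbb{P}_{u<v} = 1$.
7: end if
8: Let $\mathcal{T}_{\rho_1}$ be the subtree of $\mathcal{T}$ which is induced by $\rho_1$.
9: Delete the vertex $\rho_1$ from $\mathcal{T}_{\rho_1}$. The two resulting subtrees are labeled $\mathcal{T}_u$ and $\mathcal{T}_v$ with $u \in \mathcal{T}_u$ and $v \in \mathcal{T}_v$.
10: Run RankProb($\mathcal{T}_u$, $u$) and RankProb($\mathcal{T}_v$, $v$) to get $\mathbb{P}[r(u) = i]$ on $\mathcal{T}_u$ and $\mathbb{P}[r(v) = i]$ on $\mathcal{T}_v$ for all possible $i$.
11: for $i = 1, \ldots, |\mathring{V}_{\mathcal{T}_u}|$ do
12:   $ucum(i) := \sum_{k=1}^{i} \mathbb{P}[r(u) = k]$
13: end for
14: $\mathbb{P}_{u<v} := 0$
15: for $i = 1, \ldots, |\mathring{V}_{\mathcal{T}_v}|$ do
16:   for $j = 1, \ldots, |\mathring{V}_{\mathcal{T}_u}|$ do
17:     $p := \mathbb{P}[r(v) = i] \cdot \binom{i-1+j}{j} \cdot \binom{|\mathring{V}_{\mathcal{T}_v}| - i + |\mathring{V}_{\mathcal{T}_u}| - j}{|\mathring{V}_{\mathcal{T}_u}| - j} \cdot ucum(j)$
18:     $\mathbb{P}_{u<v} := \mathbb{P}_{u<v} + p$
19:   end for
20: end for
21: $tot := \binom{|\mathring{V}_{\mathcal{T}_u}| + |\mathring{V}_{\mathcal{T}_v}|}{|\mathring{V}_{\mathcal{T}_v}|}$
22: $\mathbb{P}_{u<v} := \mathbb{P}_{u<v} / tot$
23: RETURN $\mathbb{P}_{u<v}$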
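Lines 11 to 23 of Compare can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the two rank distributions from line 10 are taken as given lists (one entry per interior vertex of $\mathcal{T}_u$ and $\mathcal{T}_v$ respectively), the ancestor cases of lines 2 to 7 are assumed to have been handled by the caller, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def compare(p_u, p_v):
    """Return P[r(u) < r(v)] in the subtree rooted at the common ancestor.

    p_u[k-1] -- P[r(u) = k] on T_u, one entry per interior vertex of T_u
    p_v[k-1] -- P[r(v) = k] on T_v, one entry per interior vertex of T_v
    """
    a, b = len(p_u), len(p_v)

    # ucum[j] = P[r(u) <= j] on T_u, with ucum[0] = 0.
    ucum = [Fraction(0)]
    for p in p_u:
        ucum.append(ucum[-1] + p)

    acc = Fraction(0)
    for i in range(1, b + 1):             # rank of v inside T_v
        for j in range(1, a + 1):         # number of T_u vertices placed before v
            acc += (p_v[i - 1]
                    * comb(i - 1 + j, j)              # interleavings before v
                    * comb(b - i + a - j, a - j)      # interleavings after v
                    * ucum[j])                        # u is among the first j of T_u
    return acc / comb(a + b, b)           # all interleavings of the two orders


# T_u has the single interior vertex u; T_v is a chain of two interior
# vertices with v on top. The three shuffles are (u,v,w), (v,u,w), (v,w,u).
print(compare([Fraction(1)], [Fraction(1), Fraction(0)]))
```

Only the first of the three shuffles places $u$ before $v$, so the sketch returns 1/3; swapping the two arguments gives the complementary value 2/3.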

Theorem 8

The algorithm Compare returns the value $\mathbb{P}_{u<v} = \mathbb{P}[r(u) < r(v)]$. The runtime of Compare is $O(|\mathring{V}|^2)$.

Proof

Note that the probability of u having smaller rank than v in tree 𝒯ρ1 equals the probability of u having smaller rank than v in tree 𝒯, since for any rank function on 𝒯 ρ1, there is the same number of linear extensions to get a rank function on the tree 𝒯.

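So it is sufficient to calculate the probability $\mathbb{P}_{u<v}$ in $\mathcal{T}_{\rho_1}$. If $\rho_1 = u$ then $u$ is an ancestor of $v$ in $\mathcal{T}$, so return $\mathbb{P}_{u<v} = 1$. If $\rho_1 = v$ then $v$ is an ancestor of $u$ in $\mathcal{T}$, so return $\mathbb{P}_{u<v} = 0$.

Now assume that $\rho_1 \ne u$ and $\rho_1 \ne v$. The run of RankProb calculates the probability $\mathbb{P}[r(u) = i]$ in the tree $\mathcal{T}_u$ and $\mathbb{P}[r(v) = i]$ in $\mathcal{T}_v$ for all $i$. Next, combine those two linear orders. Assume that $r(v) = i$ and that $j$ vertices of $\mathcal{T}_u$ are inserted before $v$. Inserting $j$ vertices of $\mathcal{T}_u$ into the linear order of $\mathcal{T}_v$ before $v$ is possible in $\binom{i-1+j}{j}$ ways (see Remark 1). Putting the remaining vertices in a linear order is possible in $\binom{|\mathring{V}_{\mathcal{T}_v}| - i + |\mathring{V}_{\mathcal{T}_u}| - j}{|\mathring{V}_{\mathcal{T}_u}| - j}$ ways. The probability that the vertex $u$ is among the $j$ vertices which have smaller rank than $v$ is $\mathbb{P}[r(u) \le j] = ucum(j)$. There are $|r(\mathcal{T}_u)|$ possible linear orders on $\mathcal{T}_u$ and $|r(\mathcal{T}_v)|$ possible linear orders on $\mathcal{T}_v$. The number of linear orders where vertex $v$ has rank $i$ in $\mathcal{T}_v$, $v$ has rank $i + j$ in $\mathcal{T}_{\rho_1}$ and $r(u) < i + j$ therefore equals

$$p'_{i,j} = \mathbb{P}[r(v) = i] \cdot |r(\mathcal{T}_v)| \cdot \binom{i-1+j}{j} \cdot \binom{|\mathring{V}_{\mathcal{T}_v}| - i + |\mathring{V}_{\mathcal{T}_u}| - j}{|\mathring{V}_{\mathcal{T}_u}| - j} \cdot ucum(j) \cdot |r(\mathcal{T}_u)|$$

Adding up the $p'_{i,j}$ for each $i$ and $j$ gives the number of linear orders where $u$ has smaller rank than $v$.

Combining a linear order on $\mathcal{T}_v$ with a linear order on $\mathcal{T}_u$ is possible in

$$tot := \binom{|\mathring{V}_{\mathcal{T}_u}| + |\mathring{V}_{\mathcal{T}_v}|}{|\mathring{V}_{\mathcal{T}_v}|}$$

different ways (see Remark 1). There are $|r(\mathcal{T}_u)|$ linear orders on $\mathcal{T}_u$ and $|r(\mathcal{T}_v)|$ linear orders on $\mathcal{T}_v$, so on $\mathcal{T}_{\rho_1}$, there are

$$tot' := \binom{|\mathring{V}_{\mathcal{T}_u}| + |\mathring{V}_{\mathcal{T}_v}|}{|\mathring{V}_{\mathcal{T}_v}|}\,|r(\mathcal{T}_u)|\,|r(\mathcal{T}_v)|$$

linear orders. Therefore:

$$\mathbb{P}_{u<v} = \frac{\sum_{i,j} p'_{i,j}}{tot'} = \frac{\sum_{i,j} p_{i,j}}{tot}$$

with

$$p_{i,j} = \mathbb{P}[r(v) = i] \cdot \binom{i-1+j}{j} \cdot \binom{|\mathring{V}_{\mathcal{T}_v}| - i + |\mathring{V}_{\mathcal{T}_u}| - j}{|\mathring{V}_{\mathcal{T}_u}| - j} \cdot ucum(j).$$

This shows that Compare is correct.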

Since RankProb has quadratic runtime, Compare also has quadratic runtime.

Acknowledgements

We thank Arne Mooers for very helpful comments and suggestions on earlier versions of this manuscript and the two anonymous referees for a very careful report.

The second author’s work is partially supported by grant NSF-DMS-0241246.

References

  1. Aldous DJ. Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Statist. Sci. 2001;16(1):23–34. ISSN 0883-4237.
  2. Baum BR. Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees. Taxon. 1992;41(1):3–10.
  3. Brown JKM. Probabilities of evolutionary trees. Syst. Biol. 1994;43(1):78–91.
  4. Edwards AWF. Estimation of the branch points of a branching diffusion process. (With discussion.) J Roy Statist Soc Ser B. 1970;32:155–174. ISSN 0035-9246.
  5. Gernhard T. Stochastic models of speciation events in phylogenetic trees. Diplom thesis. 2006.
  6. Harding EF. The probabilities of rooted tree-shapes generated by random bifurcation. Advances in Appl Probability. 1971;3:44–77. ISSN 0001-8678.
  7. Hey J. Using phylogenetic trees to study speciation and extinction. Evolution. 1992;46:627–640. doi: 10.1111/j.1558-5646.1992.tb02071.x.
  8. Nee SC, May RM. Extinction and the loss of evolutionary history. Science. 1997;278:692–694. doi: 10.1126/science.278.5338.692.
  9. Nordborg M. Coalescent theory. Handbook of Statistical Genetics. 2001:179–212.
  10. Page B. Random cladograms and null hypotheses in cladistic biogeography. Systematic Zoology. 1991;40:54–62.
  11. Pinelis I. Evolutionary models of phylogenetic trees. Roy. Soc. Lond. Proc. Ser. Biol. Sci. 2003;270(1522):1425–1431+15. doi: 10.1098/rspb.2003.2374. ISSN 0962-8452. With an electronic appendix.
  12. Ragan M. Phylogenetic inference based on matrix representation of trees. Mol. Phylogenet. Evol. 1992;1:53–58. doi: 10.1016/1055-7903(92)90035-f.
  13. Semple C, Steel M. Phylogenetics, volume 24 of Oxford Lecture Series in Mathematics and its Applications. Oxford University Press; Oxford: 2003. ISBN 0-19-850942-1.
  14. Vos RA, Mooers AO. A new dated supertree of the primates. Systematic Biology, in revision.
  15. Yule GU. A mathematical theory of evolution: based on the conclusions of Dr. J.C. Willis. Philos. Trans. Roy. Soc. London Ser. B. 1924;213:21–87.
  16. Zhaxybayeva OD, Gogarten JP. Cladogenesis, coalescence and the evolution of the three domains of life. Trends in Genetics. 2004;20:182–187. doi: 10.1016/j.tig.2004.02.004.

