Journal of Computational Biology. 2017 Sep 1;24(9):831–850. doi: 10.1089/cmb.2016.0159

Enumeration of Ancestral Configurations for Matching Gene Trees and Species Trees

Filippo Disanto, Noah A Rosenberg
PMCID: PMC5610458  PMID: 28437136

Abstract

Given a gene tree and a species tree, ancestral configurations represent the combinatorially distinct sets of gene lineages that can reach a given node of the species tree. They have been introduced as a data structure for use in the recursive computation of the conditional probability under the multispecies coalescent model of a gene tree topology given a species tree, the cost of this computation being affected by the number of ancestral configurations of the gene tree in the species tree. For matching gene trees and species trees, we obtain enumerative results on ancestral configurations. We study ancestral configurations in balanced and unbalanced families of trees determined by a given seed tree, showing that for seed trees with more than one taxon, the number of ancestral configurations increases for both families exponentially in the number of taxa n. For fixed n, the maximal number of ancestral configurations tabulated at the species tree root node and the largest number of labeled histories possible for a labeled topology occur for trees with precisely the same unlabeled shape. For ancestral configurations at the root, the maximum increases with $k_0^n$, where $k_0 \approx 1.5028$ is a quadratic recurrence constant. Under a uniform distribution over the set of labeled trees of given size, the mean number of root ancestral configurations grows with Inline graphic and the variance with ∼Inline graphic. The results provide a contribution to the combinatorial study of gene trees and species trees.

Keywords: combinatorics, gene trees, phylogenetics, species trees

1. Introduction

Investigations of the evolution of genomic regions along species tree branches have generated new combinatorial structures that can assist in studying gene trees and species trees (Maddison, 1997; Degnan and Salter, 2005; Than and Nakhleh, 2009; Degnan et al., 2012; Wu, 2012). Among these structures are ancestral configurations, structures that for a given gene tree topology and species tree topology describe the possible sets of gene lineages that can reach a given node of the species tree (Wu, 2012).

Ancestral configurations represent the set of objects over which recursive computations are performed in a fundamental calculation for inference of species trees from information on multiple genetic loci: the evaluation of gene tree probabilities conditional on species trees (Wu, 2012). Because of the appearance of ancestral configurations in sets over which sums are computed [e.g., Eq. (7) of Wu (2012)], solutions to enumerative problems involving ancestral configurations contribute to an understanding of the computational complexity of phylogenetic calculations.

Under the assumption that a gene tree and a species tree have a matching labeled topology t, we examine the number of ancestral configurations that can appear at the nodes of the species tree. Extending results of Wu (2012), whose appendix reported the number of ancestral configurations for caterpillar species trees and established a lower bound for completely balanced species trees, we study the number of ancestral configurations when t belongs to families of trees characterized by a balanced or unbalanced pattern and a seed tree. As a special case, we derive upper and lower bounds on the number of ancestral configurations possessed by matching gene trees and species trees of given size. Finally, we study the mean and the variance of the number of ancestral configurations when t is a random labeled tree of given size selected under a uniform distribution.

2. Preliminaries

We study ancestral configurations for rooted binary labeled trees. We start with some definitions and preliminary results. In Section 2.1, we recall basic properties of rooted binary labeled trees. In Section 2.2, we recall properties of generating functions that will be used to derive some of our enumerative results. Following Wu (2012), in Section 2.3, we define ancestral configurations, and we determine a recursive procedure to compute their number for matching gene trees and species trees at a given species tree node. We then relate the total number of ancestral configurations in a tree to the number of ancestral configurations at the root of the tree.

2.1. Labeled topologies

A labeled topology, or tree for short, of size $n \geq 1$ is a bifurcating rooted tree with n labeled taxa (Fig. 1A). We assume without loss of generality a linear (alphabetical) order $\prec$ among the possible labels for the taxa of a tree. A tree of size n has leaves labeled using the first n labels in the order $\prec$. Given two trees $t_1$ and $t_2$, we write $t_1 \cong t_2$ and say that $t_1$ is isomorphic to $t_2$ when, removing labels at their taxa, $t_1$ and $t_2$ share the same unlabeled topology. The set of trees of size n is denoted by $T_n$, and $T = \bigcup_{n \geq 1} T_n$ denotes the set of all trees of any size. The number of trees of size $n \geq 2$ can be computed as $|T_n| = (2n-3)!! = 1 \times 3 \times 5 \times \cdots \times (2n-3)$ (Felsenstein, 1978), which can be rewritten for $n \geq 1$ as

$$|T_n| = \frac{(2n)!}{2^n (2n-1)\, n!}. \tag{1}$$

FIG. 1.


A gene tree and a species tree with a matching labeled topology t. (A) A tree t of size 6 isomorphic to the gene tree and species tree depicted in (B, C). Tree t is characterized by its shape and by the labeling of its taxa. It is convenient to label the internal nodes of t. We identify each lineage (edge) of t by its immediate descendant node, so, for example, lineage g results from the coalescence of lineages a and b. (B) A possible realization R1 of the gene tree in (A) (dotted lines) in the species tree with a matching topology (solid lines). The ancestral configuration at species tree node Inline graphic is Inline graphic. The configuration at node m is Inline graphic. (C) A different realization R2 of the gene tree in (A) in the matching species tree. The configurations at species tree nodes Inline graphic and m are Inline graphic and Inline graphic, respectively.

The exponential generating function associated with the sequence $|T_n|$ is defined as

$$T(z) = \sum_{n \geq 1} \frac{|T_n|}{n!} z^n, \tag{2}$$

and it is given by (Flajolet and Sedgewick, 2009, Example II.19)

$$T(z) = 1 - \sqrt{1 - 2z}. \tag{3}$$
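As a quick check on the count of labeled topologies, the following minimal Python sketch compares the double-factorial product $(2n-3)!!$ with one factorial-based closed form. The function names are illustrative, and the closed form $(2n)!/[2^n (2n-1)\, n!]$ is assumed here as an equivalent rewriting of the double factorial.

```python
from math import factorial


def labeled_topologies_product(n: int) -> int:
    """Count rooted binary labeled trees with n leaves as (2n-3)!!."""
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-3 (empty product for n <= 2).
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count


def labeled_topologies_closed_form(n: int) -> int:
    """Same count through the factorial form (2n)! / (2^n (2n-1) n!)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return factorial(2 * n) // (2 ** n * (2 * n - 1) * factorial(n))


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, labeled_topologies_product(n), labeled_topologies_closed_form(n))
```

Both functions use exact integer arithmetic, so the two forms can be compared for any n without rounding concerns.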

Throughout the article, most of our results are purely combinatorial. Where a probability distribution on the set of labeled topologies of a given size is needed, we assume a uniform probability distribution over the set of trees of given size.

2.2. Exponential growth and analytic combinatorics

Following Flajolet and Sedgewick (2009), a sequence of non-negative numbers $a_n$ is said to have exponential growth $k^n$ or, equivalently, to be of exponential order k when

$$\limsup_{n \to \infty} (a_n)^{1/n} = k.$$

This relationship can be rephrased as $a_n = k^n s(n)$, where s is a subexponential factor, that is, $\limsup_{n \to \infty} [s(n)]^{1/n} = 1$. By these definitions, a sequence $a_n$ grows exponentially in n when its exponential order strictly exceeds 1.

The exponential order of a sequence gives basic information about its speed of growth and enables comparisons with other sequences. In particular, from the definition, it follows that if $(a_n)$ has exponential order $k_a$ and $(b_n)$ has exponential order $k_b > k_a$, then the sequence of ratios $a_n / b_n$ converges to 0 exponentially fast as $n \to \infty$. If two sequences $(a_n)$ and $(b_n)$ have the same exponential growth, then we write $a_n \bowtie b_n$.

We are interested in the exponential growth of several increasing sequences of non-negative integers. Several results will be obtained through techniques of analytic combinatorics [see Sections IV and VI of Flajolet and Sedgewick (2009)]. The entries of a sequence of integers $(a_n)_{n \geq 0}$ can be interpreted as the coefficients of the power series expansion $A(z) = \sum_{n \geq 0} a_n z^n$ at $z = 0$ of a function $A(z)$, the generating function of the sequence. Considering z as a complex variable, under suitable conditions, there exists a general correspondence between the singular expansion of the generating function $A(z)$ near its dominant singularity (the one nearest to the origin) and the asymptotic behavior of the associated coefficients $a_n$. In particular, the exponential order of the sequence $(a_n)$ is given by the inverse of the modulus of the dominant singularity of $A(z)$. For instance, the exponential order of the sequence $|T_n|/n!$, with $|T_n|$ as in Equation (1), is 2 because $z = 1/2$ is the dominant singularity of the associated generating function [Eq. (3)]. In other words, $|T_n|/n!$ increases with a subexponential multiple of $2^n$ as n becomes large.
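The link between the dominant singularity and the exponential order can be illustrated numerically. The sketch below evaluates the n-th root of $|T_n|/n!$ in log space, using the identity $|T_n| = (2n-2)!/[2^{n-1}(n-1)!]$ (another way of writing $(2n-3)!!$). The function names are illustrative, and the slow drift toward 2 reflects the subexponential factor.

```python
import math


def log_trees_over_factorial(n: int) -> float:
    """Natural log of |T_n| / n!, with |T_n| = (2n-2)! / (2^(n-1) (n-1)!)."""
    return (
        math.lgamma(2 * n - 1)        # log (2n-2)!
        - (n - 1) * math.log(2.0)     # log 2^(n-1)
        - math.lgamma(n)              # log (n-1)!
        - math.lgamma(n + 1)          # log n!
    )


def nth_root_of_coefficient(n: int) -> float:
    """(|T_n| / n!)^(1/n), which should approach the exponential order."""
    return math.exp(log_trees_over_factorial(n) / n)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, round(nth_root_of_coefficient(n), 4))
```

Working with `math.lgamma` avoids building very large integers and keeps the computation stable for large n.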

2.3. Gene trees, species trees, and ancestral configurations

In this section, we define the object on which our study focuses: the ancestral configurations of a gene tree G in a species tree S. Ancestral configurations have been introduced by Wu (2012). In our framework, where exactly one gene lineage has been selected from each species, we assume G and S to have the same labeled topology t.

2.3.1. Ancestral configurations

Suppose R is a realization of a gene tree G in a species tree S, where we focus on the case of $G = S = t$ (Fig. 1). In other words, R is one of the evolutionary possibilities for the gene tree G on the matching species tree S. Viewed backward in time, for a given node k of t, consider the set $C(k, R)$ of gene lineages (edges of G) that are present in S at the point right before node k.

As in Wu (2012), the set Inline graphic is called the ancestral configuration of the gene tree at node k of the species tree. Taking the tree t depicted in Figure 1A and considering the realization R1 of the gene tree Inline graphic in the species tree Inline graphic as given in Figure 1B, we see that the gene lineages a, b, and Inline graphic are those present in the species tree at the point right before the root node m. The set Inline graphic is thus the ancestral configuration of the gene tree at node m of the species tree. Similarly, the ancestral configuration of the gene tree at node Inline graphic of the species tree is the set of gene lineages Inline graphic. In Figure 1C, where a different realization R2 of the same gene tree is depicted, the ancestral configuration at the root m of the species tree is the set of gene lineages Inline graphic. The ancestral configuration at node Inline graphic is Inline graphic.

Let $\mathcal{R}(G, S)$ be the set of possible realizations of the gene tree $G = t$ in the species tree $S = t$. For a given node k of t, by considering all possible elements $R \in \mathcal{R}(G, S)$, we define the set

$$C(k) = \{ C(k, R) : R \in \mathcal{R}(G, S) \} \tag{4}$$

and the number

$$c_k = |C(k)|. \tag{5}$$

Thus, Inline graphic corresponds to the number of different ways the gene lineages of G can reach the point right before node k in S, when all possible realizations of the gene tree G in the species tree S are considered. For instance, taking t as in Figure 1A, we have Inline graphic, Inline graphic, and

graphic file with name eq68.gif

Note that for two different realizations Inline graphic and an internal node k, we do not necessarily have Inline graphic.

For each internal node k, our definition of ancestral configuration specifically excludes as a possibility the case in which all gene tree lineages descended from node k have coalesced at species tree node k so that Inline graphic. Each configuration at node k is considered at the point right before node k in the species tree, and there is thus no time for the gene lineages from the left subtree of k to coalesce with those from the right subtree of k. Our definition is identical to that of Wu (2012), with the exception that we say that a leaf or 1-taxon tree has 0 ancestral configurations, whereas Wu assigns these cases 1 ancestral configuration.

Because we assume gene tree G and species tree S have the same labeled topology t, the set $C(k)$ and the quantity $c_k$ defined in Equations (4) and (5) depend only on node k and tree t. In what follows, we use the term configuration at node k of t to denote an element of $C(k)$. The next result provides a recursive procedure for calculating the number $c_k$ at a given node k of t.

Proposition 1. Given a tree t with $|t| \geq 2$, the number $c_r$ of possible configurations at the root r of t can be recursively computed as

$$c_r = (c_{r_\ell} + 1)(c_{r_r} + 1), \tag{7}$$

where $r_\ell$ (resp. $r_r$) denotes the left (resp. right) child of r and $c_k$ is set to 0 when node k is a leaf.

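Proof. If A and B are two sets of sets, we define $A \otimes B = \{ a \cup b : a \in A, b \in B \}$. The set $C(r)$ of configurations at internal node r can be decomposed as

$$C(r) = \{\{r_\ell, r_r\}\} \cup \left[ \{\{r_\ell\}\} \otimes C(r_r) \right] \cup \left[ C(r_\ell) \otimes \{\{r_r\}\} \right] \cup \left[ C(r_\ell) \otimes C(r_r) \right], \tag{8}$$

where the set unions are disjoint because, as already noted, $\{r_\ell\} \notin C(r_\ell)$ and $\{r_r\} \notin C(r_r)$. We immediately obtain Equation (7), as $c_r = 1 + c_{r_r} + c_{r_\ell} + c_{r_\ell} c_{r_r}$.    ■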

We reiterate that for Equation (7) to apply for all t with Inline graphic, we must set to 0 the number of configurations at a species tree leaf and at the root of the 1-taxon tree. For the tree depicted in Figure 1A, each configuration in Inline graphic [Eq. (6)] can be obtained as described in Equation (8) from the configurations in Inline graphic and Inline graphic. Note indeed that Inline graphic, as determined by Equation (7).
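The recursion of Proposition 1 can be sketched in a few lines of Python. The encoding below is an assumption made for illustration: a leaf is a string label, an internal node is a 2-tuple of subtrees, and each internal lineage is named by the parenthesized string of its subtree. The function `configurations` builds the configuration sets by letting each root subtree contribute either its single coalesced lineage or one of its own configurations, which is one reading of the decomposition used in the proof.

```python
def lineage_name(tree) -> str:
    """Name a lineage by the subtree below it (illustrative convention)."""
    if isinstance(tree, str):
        return tree
    return "(" + lineage_name(tree[0]) + "," + lineage_name(tree[1]) + ")"


def root_configurations(tree) -> int:
    """Number of root configurations: 0 at a leaf, else a product over the two children."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return (root_configurations(left) + 1) * (root_configurations(right) + 1)


def configurations(tree) -> set:
    """Explicit set of configurations at the root of `tree`."""
    if not isinstance(tree, tuple):
        return set()
    left, right = tree
    # Each side offers its fully coalesced lineage or any of its own configurations.
    left_options = {frozenset([lineage_name(left)])} | configurations(left)
    right_options = {frozenset([lineage_name(right)])} | configurations(right)
    return {a | b for a in left_options for b in right_options}


if __name__ == "__main__":
    caterpillar = ((("a", "b"), "c"), "d")
    balanced = (("a", "b"), ("c", "d"))
    for t in (caterpillar, balanced):
        print(lineage_name(t), root_configurations(t), len(configurations(t)))
```

Counting with `root_configurations` takes time linear in the number of nodes, whereas `configurations` materializes every set and is only practical for small trees.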

2.3.2. Total configurations and root configurations

Let $N(t)$ be the set of nodes of a tree t. The number of nodes $|N(t)|$ satisfies $|N(t)| = 2|t| - 1$. Define the total number of configurations in t as the sum

$$c = \sum_{k \in N(t)} c_k.$$

Let $c_r$ be the number of configurations at the root r of t, or root configurations for short. As is shown in Appendix 1, $c_r$ satisfies the bound

$$c_r \geq |t| - 1. \tag{9}$$

Furthermore, because $c_k \leq c_r$ for each node k of t, we have

$$c_r \leq c \leq (2|t| - 1)\, c_r. \tag{10}$$

This result indicates that the total number of configurations c and the number of root configurations $c_r$ are equal up to a factor that is at most polynomial in the tree size $|t|$. A consequence is that in measuring c for a family $(t_n)$ of trees of increasing size, an exponential growth of the form $k^{|t|}$ for the number of root configurations translates into the same exponential growth for the total number of configurations in t:

$$c_r \bowtie k^{|t|} \implies c \bowtie k^{|t|}, \tag{11}$$

where, by virtue of Equation (9), $k \geq 1$.

An equivalent result holds when we consider the expected value of the total number of configurations $\mathbb{E}_n[c]$ in a random labeled tree topology of given size n. Indeed, when a tree of size n is selected at random from the set of labeled topologies, Equation (10) gives $\mathbb{E}_n[c_r] \leq \mathbb{E}_n[c] \leq (2n-1)\, \mathbb{E}_n[c_r]$. Thus, the exponential growth of $\mathbb{E}_n[c]$ with respect to n can be recovered from the exponential growth of $\mathbb{E}_n[c_r]$,

$$\mathbb{E}_n[c] \bowtie \mathbb{E}_n[c_r]. \tag{12}$$

Similarly, for the second moment $\mathbb{E}_n[c^2]$, we have $\mathbb{E}_n[c_r^2] \leq \mathbb{E}_n[c^2] \leq (2n-1)^2\, \mathbb{E}_n[c_r^2]$, and thus

$$\mathbb{E}_n[c^2] \bowtie \mathbb{E}_n[c_r^2]. \tag{13}$$

Using these results, in Sections 3 and 5 we will determine the exponential growth of $c_r$ and c with respect to size $|t|$ when t is considered in different settings. In Section 3, t belongs to families of unbalanced or balanced trees, whereas in Section 5, we perform our analysis considering t as a random labeled topology of given size.
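The relation between total and root configurations can be checked on small examples. The sketch below assumes the same illustrative tree encoding as before (string leaves, 2-tuples for internal nodes) and tests the sandwich $c_r \leq c \leq (2|t| - 1)\, c_r$, which follows from the fact that a binary tree of size $|t|$ has $2|t| - 1$ nodes, none of which carries more configurations than the root.

```python
def root_count(tree) -> int:
    """Root configurations via the product recursion (0 at a leaf)."""
    if not isinstance(tree, tuple):
        return 0
    return (root_count(tree[0]) + 1) * (root_count(tree[1]) + 1)


def size(tree) -> int:
    """Number of leaves."""
    if not isinstance(tree, tuple):
        return 1
    return size(tree[0]) + size(tree[1])


def total_configurations(tree) -> int:
    """Sum of the configuration counts over all nodes of the tree."""
    if not isinstance(tree, tuple):
        return 0
    return (
        root_count(tree)
        + total_configurations(tree[0])
        + total_configurations(tree[1])
    )


def within_bounds(tree) -> bool:
    """Check c_r <= c <= (2|t| - 1) c_r for one tree."""
    c_root = root_count(tree)
    c_total = total_configurations(tree)
    return c_root <= c_total <= (2 * size(tree) - 1) * c_root


if __name__ == "__main__":
    caterpillar = ((("a", "b"), "c"), "d")
    print(root_count(caterpillar), total_configurations(caterpillar), within_bounds(caterpillar))
```

Recomputing `root_count` at every node keeps the sketch short; memoizing per node would make it linear in the tree size.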

2.4. Root configurations in small trees

For small values of n, Equation (7) enables the exhaustive computation of the number of root configurations $c_r$ for representative labelings of each of the unlabeled topologies of size n. In Figure 2, each dot corresponds to the logarithm of the number of root configurations for a certain tree shape of size determined by its x-coordinate. The dots associated with the largest values of $\log c_r$ are connected by the top line, whose growth is linear in n. Indeed, as was shown by Wu (2012), there exist families of trees for which the growth of the number of root configurations is exponential in the tree size. From Equation (9), it follows that the growth of the sequence of the largest number of root configurations in trees of size n must be exponential in n as well.

FIG. 2.


Natural logarithm of the number of root configurations for all possible tree shapes of size Inline graphic. The value for Inline graphic, Inline graphic, is omitted. Dots corresponding to the largest and smallest numbers of root configurations for each n are connected by the top and bottom lines, respectively.

The tree shapes whose labeled topologies possess the largest number of root configurations among trees of fixed size appear in Figure 3 together with their number of root configurations Inline graphic. Starting with Inline graphic, each shape in the sequence can be seen to be produced by connecting two smaller shapes also in the sequence (possibly the same shape) to a shared root.

FIG. 3.


Tree shapes of size Inline graphic whose labeled topologies have the largest number of root configurations among trees of size n. The number of root configurations Inline graphic is indicated for each tree. In each tree displayed, the two root subtrees each maximize the number of root configurations among trees of their size.

The tree shape that minimizes the number of root configurations is the caterpillar topology. The number of root configurations in the caterpillar of size n is $n - 1$ (Wu, 2012). The bottom line in Figure 2, which connects dots corresponding to the smallest number of root configurations for a tree with n taxa, grows with $\log(n - 1)$.
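The exhaustive computation over tree shapes can be sketched as follows. This is an illustrative enumeration rather than the procedure used to draw Figure 2: unlabeled shapes are generated as nested tuples with `"*"` marking a leaf, mirror-image duplicates are avoided by pairing equal-sized subtrees with replacement, and the root count uses the product recursion of Proposition 1.

```python
from functools import lru_cache
from itertools import combinations_with_replacement


@lru_cache(maxsize=None)
def shapes(n: int) -> tuple:
    """All unlabeled rooted binary tree shapes with n leaves."""
    if n == 1:
        return ("*",)
    result = []
    for i in range(1, n // 2 + 1):
        if i < n - i:
            # Subtrees of different sizes: every pair gives a distinct shape.
            result.extend((a, b) for a in shapes(i) for b in shapes(n - i))
        else:
            # Equal sizes: unordered pairs, so mirror images are not repeated.
            result.extend(combinations_with_replacement(shapes(i), 2))
    return tuple(result)


def root_count(shape) -> int:
    """Root configurations of a shape (0 at a leaf)."""
    if not isinstance(shape, tuple):
        return 0
    return (root_count(shape[0]) + 1) * (root_count(shape[1]) + 1)


def root_counts_by_size(n: int) -> list:
    """Sorted root-configuration counts, one entry per shape of size n."""
    return sorted(root_count(s) for s in shapes(n))


if __name__ == "__main__":
    for n in range(2, 11):
        counts = root_counts_by_size(n)
        print(n, len(counts), counts[0], counts[-1])
```

For each n, the first and last printed counts correspond to the smallest and largest values that the bottom and top lines of Figure 2 would connect.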

These observations show that tree topology can have a considerable impact on the number of ancestral configurations that are possible for a given tree size. Indeed, the next section investigates the effect of tree balance on the number of root configurations in a tree. Figure 2 suggests that for random labeled topologies of a specified size, we can expect the variance of the number of root configurations to be large. We will confirm this claim in Section 5. We will also show that although there exist tree families (e.g., caterpillars) for which the growth of the number of root configurations is polynomial in the tree size, the expected number of root configurations in a random labeled topology of given size n grows exponentially in n.

3. Root Configurations for Unbalanced and Balanced Families of Trees

In this section, we study the number of root configurations for particular families of trees, extending beyond two cases considered by Wu (2012): the caterpillar case, which was studied exactly, and the completely balanced case, for which a loose lower bound of Inline graphic was reported. As balance is an important tree property that influences ancestral configurations, we study unbalanced and balanced families generated by different seed trees. Upper and lower bound results on the number of root configurations for trees of specified size appear in Section 4.

For a given seed tree s, we consider the unbalanced family $\mathcal{U}(s) = (u_h)_{h \geq 0}$ (Fig. 4A) and the balanced family $\mathcal{B}(s) = (b_h)_{h \geq 0}$ (Fig. 4B) defined as follows:

$$u_0 = s, \qquad u_{h+1} = (u_h, s), \tag{14}$$
$$b_0 = s, \qquad b_{h+1} = (b_h, b_h), \tag{15}$$

FIG. 4.


Unbalanced and balanced families of trees defined from a given seed tree s. (A) The unbalanced family $(u_h)_{h \geq 0}$ is defined by $u_0 = s$, setting $u_{h+1}$ as the tree of size $(h+2)|s|$ obtained by appending $u_h$ and s to a shared root node. (B) The balanced family $(b_h)_{h \geq 0}$ is defined by $b_0 = s$, setting $b_{h+1}$ as the tree of size $2^{h+1}|s|$ obtained by appending two copies of $b_h$ to a shared root node.

where $(t_1, t_2)$ is the tree shape obtained by appending trees $t_1$ and $t_2$ to a shared root node. Note that the family of caterpillar trees is obtained as the unbalanced family when $|s| = 1$. For the same seed tree of size 1, the balanced family is the family of completely balanced trees. When $|s| = 2$, the unbalanced family resembles the lodgepole family $(\lambda_h)_{h \geq 0}$, which is defined recursively by setting $\lambda_0$ as the 1-taxon tree, and $\lambda_{h+1}$ as the tree obtained by appending $\lambda_h$ and a cherry to a shared root node (Disanto and Rosenberg, 2015). The only difference is that in the unbalanced family with $|s| = 2$, each leaf is in a cherry, whereas $\lambda_h$ has a unique leaf that is not in a cherry. For each family, it is understood that we consider an arbitrary labeling of each unlabeled shape in the family.

3.1. Unbalanced families

Fix a seed tree s and consider the unbalanced family $(u_h)_{h \geq 0}$ as defined in Equation (14). Let $\gamma$ be the number of root configurations in s, and define $\gamma_h$ as the number of root configurations in $u_h$. If s is the 1-taxon tree, then as noted earlier, the number of root configurations $\gamma$ is set to 0. From Proposition 1, we obtain the recursion

$$\gamma_{h+1} = (\gamma_h + 1)(\gamma + 1), \tag{16}$$

starting with $\gamma_0 = \gamma$. As shown in Appendix 2, the generating function

$$U(z) = \sum_{h \geq 0} \gamma_h z^h$$

is described by

$$U(z) = \frac{\gamma + z}{(1 - z)\,[1 - (\gamma + 1) z]}.$$

For $\gamma \geq 1$, the dominant singularity of $U(z)$, that is, the singularity nearest to the origin, is the solution $z = 1/(\gamma + 1)$ of the equation $1 - (\gamma + 1) z = 0$. Applying Theorem IV.7 of Flajolet and Sedgewick (2009) yields the exponential growth of the sequence $\gamma_h$ with respect to the index h as

$$\gamma_h \bowtie (\gamma + 1)^h. \tag{18}$$

Because $u_h$ has $n = (h+1)|s|$ leaves, substituting $h = n/|s| - 1$ in Equation (18), we obtain the next proposition.

Proposition 2. In the unbalanced family $(u_h)_{h \geq 0}$, the exponential growth of the number of root configurations in the size $n = |u_h|$ is

$$\gamma_h \bowtie \left[ (\gamma + 1)^{1/|s|} \right]^n, \tag{19}$$

where $|s|$ is the size of the seed tree and $\gamma$ is its number of root configurations. The total number of configurations in the unbalanced family has the same exponential growth.

In other words, for values of the number of leaves n at which a member of the unbalanced family exists, the number of root configurations in the unbalanced family grows with $[(\gamma + 1)^{1/|s|}]^n$.

When the seed tree is the 1-taxon tree, so that $\gamma = 0$ and the unbalanced family is the sequence of caterpillar trees, Equation (19) gives the exponential growth $1^n$. Indeed, the number of root configurations in the caterpillar family grows like a polynomial function of the size, as immediately follows from Equation (16) [see also Wu (2012)]. Taking $|s| \geq 2$, the number of root configurations in the unbalanced family becomes exponential in the tree size. Table 1 illustrates that for unbalanced families defined by small seed trees of size greater than one, root configurations in n-taxon trees (provided that a tree with n taxa is in the family) have exponential growth in the range Inline graphic to Inline graphic.
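The growth in the unbalanced family can be explored numerically. The sketch below iterates the recursion obtained from Proposition 1 when a copy of the seed tree is attached at each step, and evaluates the per-taxon constant $(\gamma + 1)^{1/|s|}$. The function names are illustrative; `seed_size` and `gamma` stand for $|s|$ and the number of root configurations of the seed tree.

```python
def unbalanced_counts(gamma: int, length: int) -> list:
    """Root configurations of u_0, u_1, ..., for a seed with `gamma` root configurations."""
    counts = [gamma]
    while len(counts) < length:
        # Attaching one more seed tree multiplies (previous + 1) by (gamma + 1).
        counts.append((counts[-1] + 1) * (gamma + 1))
    return counts


def unbalanced_growth_constant(seed_size: int, gamma: int) -> float:
    """Per-taxon growth constant (gamma + 1)^(1 / |s|)."""
    return (gamma + 1) ** (1.0 / seed_size)


if __name__ == "__main__":
    # Seed trees given as (|s|, gamma): leaf, cherry, 3-taxon tree, balanced 4-taxon tree.
    for seed_size, gamma in [(1, 0), (2, 1), (3, 2), (4, 4)]:
        print(
            seed_size,
            gamma,
            unbalanced_counts(gamma, 5),
            round(unbalanced_growth_constant(seed_size, gamma), 3),
        )
```

With the 1-taxon seed, the counts increase by one at each step, which is the polynomial caterpillar case; any seed with at least one root configuration gives a constant above 1.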

Table 1.

Approximate Values of the Constants That When Raised to the Power n Describe the Exponential Growth with the Number of Taxa n of the Number of Ancestral Configurations in Unbalanced and Balanced Families for Small Seed Trees

Seed tree s (image)    $|s|$    $\gamma$    $(\gamma+1)^{1/|s|}$ (unbalanced)    $k_\gamma^{1/|s|}$ (balanced)
inl-1.gif              1        0           1                                    1.503
inl-2.gif              2        1           1.414                                1.503
inl-3.gif              3        2           1.442                                1.469
inl-4.gif              4        3           1.414                                1.425
inl-5.gif              4        4           1.495                                1.503
inl-6.gif              5        4           1.380                                1.385
inl-7.gif              5        5           1.431                                1.435
inl-8.gif              5        6           1.476                                1.479
inl-9.gif              6        5           1.348                                1.351
inl-10.gif             6        6           1.383                                1.385
inl-11.gif             6        7           1.414                                1.416
inl-12.gif             6        8           1.442                                1.444
inl-13.gif             6        10          1.491                                1.492
inl-14.gif             6        9           1.468                                1.469

Each constant is obtained to three decimal places by numerically evaluating Equation (20).

3.2. Balanced families

The results change when we consider balanced families. For a fixed seed tree s, consider the balanced family $(b_h)_{h \geq 0}$ as defined in Equation (15). Let $\gamma$ be the number of root configurations in seed tree s, and define $\gamma_h$ as the number of root configurations in $b_h$. If $|s| = 1$, then $\gamma$ is 0. From Proposition 1, we obtain

$$\gamma_{h+1} = (\gamma_h + 1)^2,$$

with $\gamma_0 = \gamma$. Defining the sequence $x_h = \gamma_h + 1$, with $x_0 = \gamma + 1$, it is straightforward to show that $x_{h+1} = x_h^2 + 1$.

Sequence $x_h$ can be studied as in Aho and Sloane (1973, Section 3 and Example 2.2). For $h \geq 0$, a constant $k_\gamma$ exists for which

$$x_h = \left\lfloor k_\gamma^{2^h} \right\rfloor,$$

where $\lfloor k \rfloor$ is the floor function for k. The constant $k_\gamma$ can be approximated using the recursive definition of $x_h$, summing terms in a series:

$$k_\gamma = (\gamma + 1) \exp\left[ \sum_{i \geq 0} 2^{-i-1} \log\left( 1 + \frac{1}{x_i^2} \right) \right]. \tag{20}$$

Switching back to $\gamma_h$, for $h \geq 0$, we obtain

$$\gamma_h = \left\lfloor k_\gamma^{2^h} \right\rfloor - 1. \tag{21}$$

Thus, because $\gamma_h$ grows with $k_\gamma^{2^h}$, to determine the exponential growth of the number of root configurations, it remains to evaluate the constant $k_\gamma$. Rescaling Equation (21) to consider the number of leaves $n = 2^h |s|$ as a parameter, we obtain the next proposition.

Proposition 3. In the balanced family $(b_h)_{h \geq 0}$, the exponential growth of the number of root configurations in the size $n = |b_h|$ is

$$\gamma_h \bowtie \left( k_\gamma^{1/|s|} \right)^n, \tag{22}$$

where $|s|$ is the size of the seed tree. The constant $k_\gamma$ can be computed as in Equation (20) and bounded by

$$\gamma + 1 < k_\gamma < (\gamma + 1) \left[ 1 + \frac{1}{(\gamma + 1)^2} \right]. \tag{23}$$

The total number of configurations in the balanced family has the same exponential growth.

In other words, for values of the number of leaves n at which a member of the balanced family exists, the number of root configurations in the balanced family grows with $(k_\gamma^{1/|s|})^n$.

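Proof. It remains only to prove the bound [Eq. (23)]. The lower bound follows quickly from Equation (20), as the exponent is positive. The upper bound is obtained by observing that the sequence $(x_i)$ is increasing, and thus $x_i \geq \gamma + 1$ for each $i \geq 0$, with strict inequality for $i \geq 1$. Therefore, from Equation (20) and the fact that $\sum_{i \geq 0} 2^{-i-1} = 1$, we have

$$k_\gamma < (\gamma + 1) \exp\left[ \log\left( 1 + \frac{1}{(\gamma + 1)^2} \right) \sum_{i \geq 0} 2^{-i-1} \right] = (\gamma + 1) \left[ 1 + \frac{1}{(\gamma + 1)^2} \right].$$

    ■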

Comparing the number of root configurations in balanced families with those in unbalanced families (Table 1), we see that the exponential order for balanced families is greater than in unbalanced families, although typically still in the range Inline graphic to Inline graphic.
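The constant for a balanced family can be approximated with a short numerical sketch. It assumes the series form $k_\gamma = (\gamma + 1) \exp[\sum_{i \geq 0} 2^{-i-1} \log(1 + 1/x_i^2)]$ with $x_0 = \gamma + 1$ and $x_{i+1} = x_i^2 + 1$, which follows from taking logarithms in the quadratic recurrence. The function names and the default number of terms are illustrative choices.

```python
import math


def x_sequence(gamma: int, length: int) -> list:
    """x_0 = gamma + 1 and x_{h+1} = x_h^2 + 1, as exact integers."""
    xs = [gamma + 1]
    while len(xs) < length:
        xs.append(xs[-1] ** 2 + 1)
    return xs


def balanced_constant(gamma: int, terms: int = 10) -> float:
    """Approximate k_gamma by truncating the series after `terms` summands."""
    x = gamma + 1
    log_k = math.log(x)
    for i in range(terms):
        # Summand 2^-(i+1) * log(1 + 1/x_i^2); it shrinks doubly exponentially fast.
        log_k += math.log1p(1 / (x * x)) / 2 ** (i + 1)
        x = x * x + 1
    return math.exp(log_k)


def balanced_growth_constant(seed_size: int, gamma: int) -> float:
    """Per-taxon growth constant k_gamma^(1 / |s|)."""
    return balanced_constant(gamma) ** (1.0 / seed_size)


if __name__ == "__main__":
    k0 = balanced_constant(0)
    print("k_0 ~", k0)
    print([math.floor(k0 ** (2 ** h)) for h in range(5)], x_sequence(0, 5))
    print(round(balanced_growth_constant(3, 2), 3))
```

Because the summands decay so quickly, a handful of terms already gives the constant to double precision. The floor identity is only checked for small h here, since floating-point error grows with the exponent $2^h$.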

3.3. Comparing unbalanced and balanced families

For a given seed tree s, the quantities $(\gamma + 1)^{1/|s|}$ and $k_\gamma^{1/|s|}$ determine the exponential orders of the sequences considered in Propositions 2 and 3, respectively. We observe three facts.

(i) Applying the lower bound in Equation (23), $k_\gamma > \gamma + 1$, for a fixed seed tree s, we always have

$$(\gamma + 1)^{1/|s|} < k_\gamma^{1/|s|}. \tag{24}$$

Therefore, the growth of the number of ancestral configurations in the balanced family is exponentially faster than the growth in the unbalanced family. When s is not small, however, $k_\gamma^{1/|s|}$ can become close to $(\gamma + 1)^{1/|s|}$. For large s, $\gamma$ is also large. Owing to the upper bound in Equation (23), although $k_\gamma > \gamma + 1$, $k_\gamma$ only slightly exceeds $\gamma + 1$. Furthermore, the exponent $1/|s|$ in the expressions for $(\gamma + 1)^{1/|s|}$ and $k_\gamma^{1/|s|}$ further reduces the difference between them.

For instance, if s is the caterpillar tree with 10 leaves, we have Inline graphic, Inline graphic, and Inline graphic. In this case, Inline graphic is bounded above by a constant near Inline graphic. The increasing similarity of Inline graphic and Inline graphic is already evident in Table 1, as their values for 6-taxon seed trees are substantially closer to each other than for the smaller 1-, 2-, and 3-taxon seed trees.

(ii) The choice of the seed tree can play an important role in the relative values of $(\gamma + 1)^{1/|s|}$ and $k_\gamma^{1/|s|}$, as taking two different seed trees can flip the inequality in Equation (24). In fact, if $s_1$ and $s_2$ are two seed trees of the same size $|s_1| = |s_2|$ with numbers of root configurations $\gamma(s_1) < \gamma(s_2)$, then

$$k_{\gamma(s_1)}^{1/|s_1|} < [\gamma(s_2) + 1]^{1/|s_2|}. \tag{25}$$

To obtain this result, we note that $\gamma(s_2) + 1 \geq \gamma(s_1) + 2 > k_{\gamma(s_1)}$, where the latter inequality follows from the upper bound [Eq. (23)]. The result is observable in Table 1, where at fixed $|s|$ of 4, 5, or 6, $(\gamma + 1)^{1/|s|}$ for some of the shapes exceeds $k_\gamma^{1/|s|}$ for other shapes.

(iii) When the seed tree s is chosen as the 1-taxon tree with $\gamma = 0$, the constant $k_0$ determines an upper bound for the number of root configurations that a tree of given size can have. This result is shown in more detail in the following section. The value of $k_0$ can be computed numerically from Equation (20):

$$k_0 \approx 1.5028. \tag{26}$$

This constant provides the exact value for which Inline graphic, reported by Wu (2012), provided a lower bound.

4. Smallest and Largest Numbers of Root Configurations for Trees of Fixed Size

We have seen that the number of root configurations for caterpillar trees grows polynomially and that the number of root configurations in unbalanced noncaterpillar families and balanced families grows exponentially. In the examples we have considered, the exponential growth proceeds with Inline graphic to Inline graphic. We now show that the caterpillar trees have the smallest number of root configurations and that the constant k0 [Eq. (26)], in fact, provides an upper bound on the exponential growth of the number of root configurations as n increases. We characterize the labeled topologies that possess the largest number of root configurations at fixed n.

4.1. Smallest number of root configurations

For the caterpillar tree of size n, the number of root configurations is $n - 1$. We show that this value, $n - 1$, is the smallest number of root configurations for a tree of size n.

Let Inline graphic denote the number of root configurations of tree t. Let Inline graphic. Suppose we have shown for each i with Inline graphic that

graphic file with name eq281.gif

The claim clearly holds for Inline graphic, for each of which the sole tree t has Inline graphic root configurations.

For Inline graphic, we use induction to prove Equation (27) for Inline graphic. Suppose Inline graphic is a tree of size n such that Inline graphic. The number of root configurations of Inline graphic is given by Proposition 1 as the product Inline graphic, where Inline graphic and Inline graphic are the root subtrees of Inline graphic. Because Inline graphic has the minimal number of root configurations, Inline graphic and Inline graphic must separately possess the minimal number of root configurations among trees of their size. We can then write Inline graphic and Inline graphic, where, without loss of generality, i is a certain value with Inline graphic. Therefore, Inline graphic has the form Inline graphic. It is determined from the minimum

graphic file with name eq301.gif

Applying the inductive hypothesis [Eq. (27)], we obtain Inline graphic. In the permissible range for i, the product Inline graphic reaches its minimum value at Inline graphic, equaling Inline graphic as desired.

By induction, we have shown that Equation (27) holds for each Inline graphic. Furthermore, the fact that the product Inline graphic in Equation (28) is minimal only at Inline graphic also demonstrates that those tree shapes of size n with the smallest number of root configurations can be recursively obtained by appending the 1-taxon tree and the tree shape of size Inline graphic with the smallest number of root configurations to a shared root node. Trees resulting from this recursive construction are exactly those having a caterpillar shape.

4.2. Largest number of root configurations

For the largest number of root configurations, we denote Inline graphic. Similarly to Equation (28), we seek to identify the trees t that produce the maximum in the following equation and to evaluate that maximum:

graphic file with name eq312.gif

Note that Inline graphic. Taking Inline graphic, we have the recursion

graphic file with name eq315.gif

starting with Inline graphic. The sequence Inline graphic was studied by de Mier and Noy (2012, Theorems 1 and 2), where it was shown (i) taking Inline graphic as the power of 2 nearest to Inline graphic, we have Inline graphic, so that

graphic file with name eq321.gif

(ii) for all Inline graphic, Inline graphic, that is,

graphic file with name eq324.gif

where the constant k0 has been already computed in Equation (26).
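As a numerical illustration of the constant k0, the sketch below assumes that k0 is the limit of x_h^(1/2^h) for the quadratic recurrence x_{h+1} = x_h^2 + 1 with x_0 = 1, in the manner of the doubly exponential sequences of Aho and Sloane (1973). Here x_h is read as one more than the number of root configurations of the completely balanced tree of depth h. Equation (26) is not reproduced in this section, so this definition is an assumption rather than a restatement of it.

```python
import math


def quadratic_recurrence_constant(depth: int = 10) -> float:
    """Approximate lim x_h ** (1 / 2**h) for x_{h+1} = x_h**2 + 1, x_0 = 1."""
    x = 1  # exact integer arithmetic; x grows doubly exponentially
    for _ in range(depth):
        x = x * x + 1
    # math.log accepts arbitrarily large Python integers
    return math.exp(math.log(x) / 2 ** depth)


if __name__ == "__main__":
    print(quadratic_recurrence_constant())
```

The estimate stabilizes after very few iterations, because each step changes the logarithm of x_h by a relative amount of order 1/x_h^2.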

For small n, the labeled topologies with the largest numbers of root configurations appear in Figure 3. Collecting the results for the smallest and largest number of root configurations, we can state the following facts.

Proposition 4. (i) For each Inline graphic, the smallest number of root configurations in a tree of size n is $n-1$. The caterpillar tree shape of size n has exactly $n-1$ root configurations. (ii) For each Inline graphic, the largest number of root configurations in a tree of size n, Inline graphic, can be bounded as in Equation (30). For Inline graphic, if Inline graphic denotes the power of 2 nearest to Inline graphic, then Inline graphic is the number of root configurations in the tree shape tn recursively defined as Inline graphic, Inline graphic. When Inline graphic for integers h, tn is the completely balanced tree of depth h and Inline graphic [Eq. (21)].
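Part (ii) can be explored with a short dynamic program. This is a sketch under two assumptions: that Proposition 1 gives the root configuration count as the product of the two subtree counts each increased by 1, and that "the power of 2 nearest to" refers to n/2. The second assumption is a reading of the missing expression and is checked below only for small n. All names are illustrative.

```python
def max_root_configs(n_max):
    """Return (best, splits).

    best[n] is the largest number of root configurations for n leaves;
    splits[n] lists the sizes i <= n/2 of the smaller root subtree attaining it.
    """
    best = [0, 0]      # index 0 unused; the 1-taxon tree has 0 root configurations
    splits = [[], []]
    for n in range(2, n_max + 1):
        values = {
            i: (best[i] + 1) * (best[n - i] + 1) for i in range(1, n // 2 + 1)
        }
        top = max(values.values())
        best.append(top)
        splits.append([i for i, v in values.items() if v == top])
    return best, splits


def nearest_power_of_two_split(n):
    """Smaller root subtree size when one subtree has d leaves,
    with d the power of 2 nearest to n/2 (assumed reading)."""
    powers = [2 ** k for k in range(n.bit_length()) if 2 ** k < n]
    # compare |d - n/2| through |2d - n| to stay in integers;
    # a tie between d and 2d occurs only at n = 3d and gives the same unordered split
    d = min(powers, key=lambda p: abs(2 * p - n))
    return min(d, n - d)


if __name__ == "__main__":
    best, splits = max_root_configs(16)
    for n in range(2, 17):
        print(n, best[n], splits[n], nearest_power_of_two_split(n))
```

For the sizes checked, the optimal split agrees with the power-of-2 rule, and sizes that are powers of 2 give the completely balanced shape.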

As a corollary, we obtain the following result, the proof of which appears in Appendix 3.

Corollary 1. (i) The exponential growth of the sequences Inline graphic and Inline graphic follows Inline graphic and Inline graphic. (ii) The sequences Inline graphic and Inline graphic, giving, respectively, the smallest and the largest total number of configurations ct in a tree t of size n, have exponential growth Inline graphic and Inline graphic.

The family of tree shapes Inline graphic defined in Proposition 4 by the recursive decomposition Inline graphic and Inline graphic, where d is the power of 2 nearest to Inline graphic, already has a place in the study of gene trees and species trees, as it provides the maximally probable tree shapes of Degnan and Rosenberg (2006). Given a labeled topology t of size n, a labeled history of t is a linear ordering of the Inline graphic internal nodes of t such that the order of the nodes in each path going from the root of t to a leaf of t is increasing (Fig. 5). As reported by Harding (1974) and proved by Hammersley and Grimmett (1974), each labeled topology with tn as its underlying unlabeled topology possesses the maximal number of labeled histories among labeled topologies of size n. Consider the Yule model for the probability distribution of tree shapes, in which pairs of lineages in a labeled set of n lineages are joined together, at each step choosing uniformly among pairs (Yule, 1925; Harding, 1971; Brown, 1994; McKenzie and Steel, 2000; Steel and McKenzie, 2001; Rosenberg, 2006; Disanto et al., 2013; Disanto and Wiehe, 2013). Among all labeled topologies with size n, those with the largest number of labeled histories—and hence with shape tn—have the highest probability under the model.

FIG. 5.


The three labeled histories of the labeled topology Inline graphic of size Inline graphic. Each labeled history can be represented by bijectively labeling the Inline graphic internal nodes of t with the integers in Inline graphic in such a way that each path from the root of t to a leaf of t is labeled by an increasing sequence.

For Inline graphic, the maximally probable labeled topologies of size n—those with the most labeled histories—can be recursively characterized as those labeled topologies whose two root subtrees are maximally probable labeled topologies of sizes Inline graphic and Inline graphic, where Inline graphic (Hammersley and Grimmett, 1974; Harding, 1974). This characterization matches our characterization that the unlabeled shapes with the largest number of root configurations are those for which the subtrees have the most root configurations and sizes d and Inline graphic, where Inline graphic is the nearest power of 2 to Inline graphic.

To see that the characterizations are identical so that Inline graphic, note that a specific Inline graphic is the nearest power of 2 to Inline graphic precisely for integers Inline graphicInline graphic. On the endpoints of the interval, there are two choices for d, but in both cases, one choice is Inline graphic. At the same time, the integers n for which Inline graphic are precisely those in Inline graphic. Thus, Inline graphic for all integers n in Inline graphic. On the lower boundary, for Inline graphic, Inline graphic and Inline graphicInline graphic. Dividing the integers in Inline graphic into a union of intervals Inline graphic, we see that Inline graphic on each interval and hence Inline graphic for all Inline graphic.

This result shows that for a given tree size, those labeled topologies whose shapes belong to the family Inline graphic maximize both the number of root configurations and the number of labeled histories. For these labeled topologies, in Figure 6, we plot the logarithm of the maximum number of labeled histories possible for a labeled topology of size n as a function of the logarithm of the maximum number of root configurations. Although the shapes are the same, the number of labeled histories is considerably larger than the number of root configurations. The growth is approximately linear, suggesting that the maximal number of labeled histories increases approximately exponentially in the maximal number of root configurations.
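The coincidence of the two maximizing families can be illustrated for small sizes. The sketch below assumes the standard count of labeled histories, in which a tree whose root subtrees have i and n-i leaves has C(n-2, i-1) times the product of the subtree counts, and it assumes the product form of Proposition 1 for root configurations. The helper `best_splits` and its arguments are hypothetical names introduced for this example.

```python
from math import comb


def best_splits(n_max, combine, leaf_value):
    """Maximize a quantity defined recursively over root splits.

    combine(n, i, a, b) gives the value for a tree of size n whose root
    subtrees have sizes i and n - i and values a and b.
    """
    best = [None, leaf_value]
    splits = [[], []]
    for n in range(2, n_max + 1):
        values = {
            i: combine(n, i, best[i], best[n - i]) for i in range(1, n // 2 + 1)
        }
        top = max(values.values())
        best.append(top)
        splits.append(sorted(i for i, v in values.items() if v == top))
    return best, splits


def root_config_rule(n, i, a, b):
    # assumed form of Proposition 1
    return (a + 1) * (b + 1)


def labeled_history_rule(n, i, a, b):
    # interleave the i - 1 and n - i - 1 internal nodes below the root
    return comb(n - 2, i - 1) * a * b


if __name__ == "__main__":
    configs, config_splits = best_splits(13, root_config_rule, 0)
    histories, history_splits = best_splits(13, labeled_history_rule, 1)
    for n in range(2, 14):
        print(n, configs[n], histories[n], config_splits[n], history_splits[n])
```

Both recursions are maximized by optimizing each root subtree separately, because each value is increasing in the subtree values for a fixed split. The printed maxima of labeled histories quickly outgrow those of root configurations.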

FIG. 6.


Natural logarithm of the maximum number of histories possible for a labeled topology of size n as a function of the natural logarithm of the maximum number of root configurations possessed by a labeled topology of the same size (Inline graphic). The maxima occur at the same set of labeled topologies.

5. The Number of Root Configurations in a Random Labeled Topology

We now use generating functions to study the number of root configurations when trees of a given size are randomly selected under a uniform distribution on the set of labeled topologies. In Section 5.1, we show that the expectation $\mathbb{E}_n[c_r]$ of the number of root configurations in a random labeled topology of size n has exponential growth $(4/3)^n$. In Section 5.2, we show that the variance $\mathbb{V}_n[c_r]$ of the number of root configurations has exponential growth $[4/(7(8\sqrt{2}-11))]^n \approx 1.8215^n$. The same results hold for the random total number of configurations.

5.1. Mean number of root configurations

Define the exponential generating function

$$F(z) = \sum_{t \in T} \frac{c_r(t)\, z^{|t|}}{|t|!}, \tag{31}$$

where $c_r(t)$ is the number of root configurations in tree t. As shown in Appendix 4, the function F satisfies

$$F(z) = \frac{1}{2}\left[F(z) + T(z)\right]^2, \tag{32}$$

where $T(z) = 1 - \sqrt{1-2z}$ is the exponential generating function in Equation (3). Solving Equation (32), we obtain a closed form for $F(z)$,

$$F(z) = \sqrt{1-2z} - \sqrt{2\sqrt{1-2z} - 1}. \tag{33}$$

We have taken the negative root of the quadratic equation, as it is the root that produces the correct value of $F(z)$ at $z = 0$. It can be seen that $F(0) = 0$ is required by noting that the first term in Equation (31) is the $z^1$ term, as the set T contains only trees of size at least 1, so that Equation (31) has no constant term.

The value of z that cancels the second square root in Equation (33) is $z = 3/8$, which is smaller than the value $z = 1/2$ that cancels the first square root, $\sqrt{1-2z}$. In the complex plane, both $z = 3/8$ and $z = 1/2$ are singularities of $F(z)$. The dominant singularity is $z = 3/8$ as it is nearer to the origin. To highlight the type of singularity that $F(z)$ has at the point $z = 3/8$, it is convenient to factor the second square root in Equation (33), writing $F(z)$ as

graphic file with name eq408.gif

where

graphic file with name eq409.gif

is an analytic function in the circle Inline graphic, except at a removable singularity Inline graphic. Thus, we see that at Inline graphic, the generating function Inline graphic has a singularity of the square root type.

We can then apply Theorems VI.1 and VI.4 of Flajolet and Sedgewick (2009) to recover the asymptotic behavior of the nth coefficient of Inline graphic,

graphic file with name eq415.gif

as the nth coefficient of the expansion of Inline graphic at the singularity Inline graphic. This expansion is given by

graphic file with name eq418.gif

We thus have

graphic file with name eq419.gif

where we have used the asymptotic relationship Inline graphic (Flajolet and Sedgewick, 2009). Dividing by the number of trees of size n, Inline graphic, as given in Equation (1), using Stirling's formula Inline graphic, and noting the definition of Inline graphic as a mean over all labeled topologies, we obtain the asymptotic expected number of root configurations in a random labeled topology of size n:

graphic file with name eq424.gif

We summarize these results in a proposition.

Proposition 5. The mean number of root configurations in a random labeled topology of size n among the $(2n-3)!!$ labeled tree topologies is asymptotically

$$\mathbb{E}_n[c_r] \sim \sqrt{\frac{3}{2}} \left(\frac{4}{3}\right)^n.$$

The mean total number of configurations has exponential growth

$$\mathbb{E}_n[c] \bowtie \left(\frac{4}{3}\right)^n.$$

In Figure 7A, we can see that the approach of the natural logarithm of the exact mean number of root configurations—computed by evaluating the expansion of the generating function Inline graphic—to the asymptotic value Inline graphic proceeds quickly, so that even with small values of n, the exact mean and the asymptote are quite close on a logarithmic scale.
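The comparison between the exact mean and its asymptote can be reproduced with integer arithmetic. This sketch assumes that the tree decomposition of Appendix 4 yields, for n >= 2, the recursion a_n = (1/2) sum_i C(n, i) (a_i + T_i)(a_{n-i} + T_{n-i}), where a_n is the sum of the root configuration counts over all labeled topologies of size n and T_n = (2n-3)!!. It also assumes the asymptote sqrt(3/2) (4/3)^n, which is what a square-root singularity at z = 3/8 gives. Function names are illustrative.

```python
from math import comb, sqrt


def total_root_configs(n_max):
    """Return (trees, total): trees[n] = (2n-3)!! labeled topologies of size n,
    total[n] = sum of their numbers of root configurations."""
    trees = [0, 1]
    total = [0, 0]  # the 1-taxon tree has 0 root configurations
    for n in range(2, n_max + 1):
        trees.append(trees[-1] * (2 * n - 3))
        # every unordered pair of relabeled root subtrees appears twice in this sum
        pairs = sum(
            comb(n, i) * (total[i] + trees[i]) * (total[n - i] + trees[n - i])
            for i in range(1, n)
        )
        total.append(pairs // 2)
    return trees, total


def exact_mean(n, trees, total):
    return total[n] / trees[n]


def asymptotic_mean(n):
    return sqrt(1.5) * (4 / 3) ** n  # assumed asymptote


if __name__ == "__main__":
    trees, total = total_root_configs(200)
    for n in (5, 10, 20, 50, 100, 200):
        print(n, exact_mean(n, trees, total) / asymptotic_mean(n))
```

The printed ratios approach 1 as n grows, consistent with the rapid convergence on a logarithmic scale described for Figure 7A.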

FIG. 7.


Mean and variance of the number of root configurations in random labeled topologies of fixed size. (A) Exact natural logarithm of the mean, computed from the power series expansion of Inline graphic [Eq. (33)], and its asymptotic approximation from Proposition 5. (B) Exact natural logarithm of the variance, computed from the power series expansion of Inline graphic [Eq. (39)], and its asymptotic approximation from Proposition 6.

5.2. Variance of the number of root configurations

By applying the same approach used to determine the mean value of the number of root configurations across labeled topologies, in this section, we study the expectation Inline graphic and then derive the asymptotic variance Inline graphic of the number of root configurations.

Define the generating function

graphic file with name eq434.gif

As shown in Appendix 5, the function Inline graphic satisfies

graphic file with name eq436.gif

This equation relates Inline graphic to the generating functions Inline graphic and Inline graphic appearing in Equations (33) and (3). Solving for Inline graphic, we obtain the function

graphic file with name eq441.gif

which has its dominant singularity at Inline graphic. In the same way as in the derivation of Inline graphic, we have taken the negative root of the quadratic Equation (38) as it is this root that produces the correct value of Inline graphic at Inline graphic. At the dominant singularity for z, the first square root in Equation (39) cancels. Factoring this square root, the function Inline graphic can be written as

graphic file with name eq447.gif

where

graphic file with name eq448.gif

The function Inline graphic is analytic in the circle Inline graphic, except at the removable singularity Inline graphic. By Theorems VI.1 and VI.4 of Flajolet and Sedgewick (2009), we can recover the asymptotic behavior of the nth coefficient Inline graphic as

graphic file with name eq453.gif

Dividing by Inline graphic and using Stirling's approximation, we get

graphic file with name eq455.gif

To obtain an asymptotic estimate for the variance, we use Equation (36) to note that the exponential growth of Inline graphic is Inline graphic. Because Inline graphic, we have that as Inline graphic,

graphic file with name eq460.gif

and thus, the variance asymptotically satisfies Inline graphic.

Furthermore, because Inline graphic and Inline graphic as shown in Equations (12) and (13), Equation (43) also holds when we replace Inline graphic by c. Thus, the variance Inline graphic of the total number of configurations in a random labeled topology of size n satisfies

graphic file with name eq466.gif

We summarize these results in a proposition.

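Proposition 6. The variance of the number of root configurations in a random labeled topology of size n among the $(2n-3)!!$ labeled tree topologies is asymptotically

$$\mathbb{V}_n[c_r] \sim \sqrt{\frac{7(11-\sqrt{2})}{34}} \left[\frac{4}{7(8\sqrt{2}-11)}\right]^n,$$

where $\sqrt{7(11-\sqrt{2})/34} \approx 1.4048$ and $4/[7(8\sqrt{2}-11)] \approx 1.8215$. The variance of the total number of configurations has exponential growth

$$\mathbb{V}_n[c] \bowtie \left[\frac{4}{7(8\sqrt{2}-11)}\right]^n.$$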

Figure 7B demonstrates that on a logarithmic scale, the approach of the exact variance of the number of root configurations—computed from Inline graphic—to the asymptotic value Inline graphic occurs rapidly in n, although slower than was seen for the mean in Figure 7A.
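The exact variance can be computed in the same way as the exact mean. The sketch below assumes that the decomposition of Appendix 5 leads to the same type of recursion as for the mean, applied to the sums of (c + 1)^2 = c^2 + 2c + 1 over the root subtrees, where c is the number of root configurations. The asymptotic constants in `asymptotic_variance` are a reconstruction from a square-root singularity at z = 7(8 sqrt(2) - 11)/8 and should be checked against Proposition 6. All names are illustrative.

```python
from fractions import Fraction
from math import comb, sqrt


def root_config_moments(n_max):
    """Sums over all labeled topologies of size n of 1, c, and c**2,
    where c is the number of root configurations."""
    trees, first, second = [0, 1], [0, 0], [0, 0]
    for n in range(2, n_max + 1):
        trees.append(trees[-1] * (2 * n - 3))  # (2n-3)!! labeled topologies
        u = [first[i] + trees[i] for i in range(n)]                   # sums of (c + 1)
        v = [second[i] + 2 * first[i] + trees[i] for i in range(n)]   # sums of (c + 1)**2
        first.append(sum(comb(n, i) * u[i] * u[n - i] for i in range(1, n)) // 2)
        second.append(sum(comb(n, i) * v[i] * v[n - i] for i in range(1, n)) // 2)
    return trees, first, second


def exact_variance(n, moments):
    trees, first, second = moments
    mean = Fraction(first[n], trees[n])
    return float(Fraction(second[n], trees[n]) - mean ** 2)


def asymptotic_variance(n):
    # assumed constants: about 1.4048 * 1.8215**n
    base = 4 * (8 * sqrt(2) + 11) / 49  # equals 4 / (7 * (8 * sqrt(2) - 11))
    return sqrt(7 * (11 - sqrt(2)) / 34) * base ** n


if __name__ == "__main__":
    moments = root_config_moments(200)
    for n in (10, 50, 100, 200):
        print(n, exact_variance(n, moments) / asymptotic_variance(n))
```

Convergence of the ratio is slower than for the mean because the squared mean grows as (16/9)^n, only slightly below the growth rate of the second moment, so its relative contribution decays slowly.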

6. Conclusions

Under the assumption that the labeled gene tree topology matches the species tree topology, Inline graphic, we have studied the number of ancestral configurations in a given phylogenetic tree t. In particular, we have focused on the exponential growth of the number of root configurations in t, a quantity that also describes the exponential growth of the total number of configurations in t.

In Section 3, extending results of Wu (2012), in which the enumeration of ancestral configurations for caterpillar trees and a lower bound for their number in completely balanced trees were determined, we considered special families of trees generated by arbitrary seed trees s, namely the unbalanced family Inline graphic and the balanced family Inline graphic (Fig. 4). The main results describing the influence of tree balance and the seed tree topology on the number of ancestral configurations are collected in Proposition 2 and Proposition 3 for the unbalanced and balanced cases. We have shown that for each fixed seed tree s, the number of ancestral configurations in the balanced family Inline graphic grows exponentially faster than in the unbalanced family Inline graphic. When the size of the seed tree s is large, however, the difference between the exponential orders of the two integer sequences can become small. We have also observed that the choice of the seed tree can have an important influence on the number of root configurations. In fact, the number of root configurations in the family Inline graphic can grow exponentially faster than in the family Inline graphic when the number of root configurations in s1 exceeds that of s2.

When Inline graphic, the unbalanced family Inline graphic reduces to the caterpillar family, and the balanced family Inline graphic gives the family of completely balanced trees. As shown in Proposition 4, among trees of size n, the caterpillar tree with n taxa possesses the smallest number of root configurations. When n is a power of 2, the completely balanced tree of size n has the largest number; more generally, the largest number of root configurations occurs at precisely those labeled topologies that for a fixed n generate the largest number of labeled histories. As the caterpillar labeled topologies give rise to the smallest number of labeled histories at fixed n—only one—both the largest and smallest numbers of root configurations occur at trees producing the extrema in the number of labeled histories. The growth of the number of root configurations in the caterpillar family is polynomial, whereas for the completely balanced trees, it is exponential with order Inline graphic.

Assuming a uniform distribution over the labeled topologies with a given size n, in Section 5 we studied the mean and the variance of the number of ancestral configurations in a random labeled topology of size n. By using a generating function approach, in Propositions 5 and 6, we have shown that the mean number of ancestral configurations has exponential growth Inline graphic, whereas for the variance, we have

graphic file with name eq493.gif

Our results can assist in relating the complexity of algorithms that compute gene tree probabilities from ancestral configurations, as in STELLS (Wu, 2012), to the complexity of algorithms that use a different class of combinatorial objects, the coalescent histories (Degnan and Salter, 2005; Rosenberg, 2007; Than et al., 2007; Rosenberg and Degnan, 2010; Rosenberg, 2013; Disanto and Rosenberg, 2015, 2016). In such comparisons, we expect that the number of ancestral configurations will often grow more slowly, as is seen in comparing the polynomial growth of the number of ancestral configurations in the caterpillar case with the corresponding exponential growth of the number of coalescent histories. However, the trees with the largest numbers of coalescent histories and the trees with the largest numbers of ancestral configurations are not the same, so that each type of algorithm may be favorable in different cases. It remains to be seen whether the complexity of gene tree probability calculations can be reduced by choosing the computational approach according to the sizes and shapes of the trees under consideration.

Many enumerative problems on ancestral configurations remain open. First, we assumed that the gene tree and species tree have the same labeled topology, and we did not study nonmatching gene trees and species trees. As has been seen for coalescent histories (Rosenberg and Degnan, 2010), however, the nonmatching case merits further analysis, as a nonmatching gene tree labeled topology can have more root configurations and more total configurations than the topology that matches the species tree. Consider a caterpillar species tree topology Inline graphic, labeling the unique internal node with k descendants bk for Inline graphic. For a matching caterpillar gene tree, by Proposition 1, the number of configurations at node bk is Inline graphic, so that the number of root configurations is Inline graphic and the total number of configurations is Inline graphic.

Now consider a pseudocaterpillar gene tree topology Inline graphicInline graphic with Inline graphic, continuing with Inline graphic as the species tree topology. Topology Inline graphic differs from Inline graphic only in the placement of a4. We label the node of Inline graphic ancestral to a1 and a2 by d2, the node ancestral to a3 and a4 by Inline graphic, and the unique node ancestral to k taxa, Inline graphic, by dk. At nodes b2, b3, b4, and b5 of Inline graphic, the configurations are Inline graphic, Inline graphic, Inline graphic, and Inline graphic, with Inline graphic, Inline graphic, Inline graphic, and Inline graphic. For Inline graphic, Inline graphic is obtained by adding taxon ak to each configuration in Inline graphic and noting the existence of one additional configuration, Inline graphic, so that Inline graphic. The number of root configurations of Inline graphic for Inline graphic is Inline graphic, and the number of total configurations is Inline graphic Inline graphic. Because Inline graphic and Inline graphic for Inline graphic, root configurations and total configurations are more numerous for the nonmatching pseudocaterpillar topology than for the matching caterpillar.

Second, when ancestral configurations are grouped according to an equivalence relationship defined in the appendix of Wu (2012) that accounts for symmetries in gene trees, the number of the resulting equivalence classes—the number of nonequivalent ancestral configurations—remains to be investigated. For gene trees and species trees with a matching labeled topology, our enumerations can be used as upper bounds for the number of nonequivalent ancestral configurations, and they can help in measuring the decrease in the number of ancestral configurations when the equivalence relationship is taken into account. We defer this analysis for future work.

7. Appendix 1. Proof of Equation (9)

Given a tree t, fix without loss of generality one of the possible planar representations of the tree t: one of the possible drawings of t in which edges do not cross and intersect only at their endpoints (Fig. 1A).

A root configuration of t uniquely determines a partition of the set of leaves of t in the following way. If Inline graphic is a root configuration of t, where each ki is a node of t, then the associated partition is Inline graphic where Inline graphic is the set of leaves of t descended from node ki (including ki itself when ki is a leaf). For instance, the partition of the leaf label set Inline graphic associated with the root configuration Inline graphic depicted in Figure 1B is Inline graphic. Note that for each pair of indices Inline graphic with Inline graphic, the leaves in Inline graphic are either all on the left or all on the right of the leaves in Inline graphic in the planar representation of t.

Without loss of generality, we can assume that the set Inline graphic is indexed such that if Inline graphic, then the leaves in Inline graphic are all depicted in the planar representation to the left of the leaves in Inline graphic. Taking the cardinality of each element Inline graphic of Inline graphic determines the vector Inline graphic which represents a composition, or ordered partition, of the integer Inline graphic. For instance, for the root configurations of the tree of size Inline graphic depicted in Figure 1A, we obtain the following compositions of 6:

graphic file with name eq549.gif

As can be seen in this example, for a given planar representation of t, the mapping Inline graphic is injective (i.e., Inline graphic). For Inline graphic, there are Inline graphic compositions of n into i parts, as Inline graphic demarcations must be placed among Inline graphic possible positions between entries of the length-n vector Inline graphic to separate groups of 1s that will be aggregated together. Using the binomial theorem to sum over all possible values of i, the number of distinct compositions of n is Inline graphic. Because each root configuration is associated with a distinct composition of n, we obtain Inline graphic, and the proof of Equation (9) is complete.
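The mapping from root configurations to compositions can be made concrete for a small planar tree. The sketch below represents a tree as nested 2-tuples with string leaves and assumes that a root configuration is obtained by replacing each root subtree either by its own root node or by one of its root configurations. The example tree is hypothetical and is not the tree of Figure 1A.

```python
from itertools import product


def leaves(t):
    """Number of leaves below node t."""
    return 1 if not isinstance(t, tuple) else leaves(t[0]) + leaves(t[1])


def lineage_choices(t):
    """Ways to represent the subtree t by lineages, in left-to-right planar order.

    The first entry is the single lineage (t,); the others combine the
    choices made independently in the two subtrees of t.
    """
    choices = [(t,)]
    if isinstance(t, tuple):
        choices += [
            a + b for a, b in product(lineage_choices(t[0]), lineage_choices(t[1]))
        ]
    return choices


def root_configurations(t):
    # the root node alone is not counted as a root configuration
    return lineage_choices(t)[1:]


def composition(config):
    """Leaf counts of the lineages of a configuration, read left to right."""
    return tuple(leaves(k) for k in config)


if __name__ == "__main__":
    tree = ((("a", "b"), "c"), (("d", "e"), "f"))  # hypothetical tree of size 6
    for config in root_configurations(tree):
        print(composition(config))
```

For this tree, the nine root configurations give nine different compositions of 6, well below the 2^5 = 32 compositions available.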

8. Appendix 2. Proof of Equation (17)

We obtain Equation (17) from Equation (16) by noting that for z close to 0, the following expansion holds:

graphic file with name eq559.gif

9. Appendix 3. Proof of Corollary 1

The proof follows from the properties of Inline graphic and Inline graphic stated in Proposition 4. Part (i) is immediate from Proposition 4 and the definition of the exponential order.

For (ii), we start with mn. Let Inline graphic be the exponential growth of the sequence mn, so that km is its exponential order. Denote by Inline graphic the caterpillar family of trees, where tn is the caterpillar with Inline graphic taxa. Thus, Inline graphic is the total number of configurations in tn and Inline graphic is its number of root configurations. By Equation (11), we have Inline graphic, and Inline graphic from part (i) of the corollary. Thus, Inline graphic Because total configurations are at least as numerous as root configurations, Inline graphic. Then the growth of mn has exponential order at most that of Inline graphic, so that Inline graphic. Clearly, however, we cannot have Inline graphic, because Inline graphic for Inline graphic and Inline graphic would imply that the sequence mn decreases below 1 with increasing n. Thus, Inline graphic.

For the sequence Mn, let Inline graphic be the exponential growth of the sequence Inline graphic This sequence has exponential order kM. Suppose Inline graphic is any sequence of trees with Inline graphic such that Inline graphic; that is, tn has the largest total number of configurations among trees of size n. From Equation (11), Inline graphic, where the latter sequence has order smaller than or equal to k0 because by definition Inline graphic for all n, and Inline graphic from part (i) of the corollary. Thus, Inline graphic At the same time, for all n, we have Inline graphic, as the largest total number of configurations is larger than the largest number of root configurations. Thus, Inline graphic. It follows that Inline graphic.

10. Appendix 4. Proof of Equation (32)

The proof follows from the tree decomposition procedure that is illustrated in Figure 8. According to this procedure, each tree t of size n is either the 1-taxon tree Inline graphic or it can be created in a unique way by relabeling and appending to a shared root node two smaller trees t1 and t2 that become the root subtrees of t. From Proposition 1, the number Inline graphic of root configurations of t can be computed in this case as the product Inline graphic. Summing over all possible trees t, the tree decomposition described in Figure 8 translates into the following decomposition for the generating function Inline graphic:

graphic file with name eq594.gif

FIG. 8.


Composition of two trees t1 and t2 of sizes Inline graphic and Inline graphic to obtain a tree t of size Inline graphic. (A) Trees t1 and t2, with leaves labeled by Inline graphic and Inline graphic. As in Section 2.1, we impose without loss of generality a linear order Inline graphic for the leaves of a tree; here, we have Inline graphic and Inline graphic. (B) Relabeling of trees t1 and t2. After relabeling, t1 and t2 have leaves labeled in the set Inline graphic of size Inline graphic. For the relabeling procedure, we choose (dotted circles) n1 elements among the n possible new labels Inline graphic. There are exactly Inline graphic different choices. The chosen elements relabel t1, whereas the elements not selected (dotted squares) relabel t2. With respect to the order Inline graphic, the ith label of t1 is assigned the label determined by the ith circle. Similarly, the ith label of t2 is assigned the label determined by the ith square. (C) After relabeling t1 and t2, the new tree t is obtained by appending t1 and t2 to a shared root node. Starting with trees t1 and t2 in (A), the same procedure can generate Inline graphic different trees t, one for each possible choice of the n1 elements (dotted circles) among the n new labels. The only exception is when Inline graphic, for which the Inline graphic relabelings generate each tree exactly twice.

The first equality is the definition of Inline graphic. In the second equality, the set of trees over which the sum is evaluated is partitioned into two parts, the 1-taxon tree Inline graphic and the trees of size larger than 1. In the third equality, the set of trees t with Inline graphic is realized taking all possible pairs of trees Inline graphic and applying to each pair the procedure in Figure 8, considering all Inline graphic possible relabelings of t1 and t2. The quantity Inline graphic in the sum Inline graphic is replaced by the product Inline graphic and the term Inline graphic is replaced by Inline graphic. Note the factor Inline graphic that appears in Equation (44) before the summation. This factor takes into account the fact that for each pair Inline graphic with Inline graphic, there exists a symmetric pair Inline graphic. Symmetric pairs generate exactly the same trees according to the procedure in Figure 8, and multiplying by Inline graphic is required to avoid double counting. When Inline graphic, the factor Inline graphic is still required because only half of the Inline graphic relabelings of t1 and t2 (Fig. 8B) create nonisomorphic trees when t1 and t2 are appended to a shared root node. Finally, observe that the number Inline graphic of root configurations in the 1-taxon tree is 0.

From Equation (44) and the definitions of Inline graphic and Inline graphic in Equations (2) and (31), algebraic manipulations yield

graphic file with name eq616.gif
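The decomposition can be tested by brute force for small n. This sketch enumerates every labeled topology on n labels, sums the root configuration counts under the assumed product rule of Proposition 1, and compares the sum with the value predicted from smaller sizes by the relabeling argument of Figure 8, including the factor 1/2. The names are illustrative.

```python
from itertools import combinations
from math import comb


def labeled_topologies(labels):
    """All labeled topologies on a tuple of distinct labels, as nested 2-tuples."""
    if len(labels) == 1:
        return [labels[0]]
    first, rest = labels[0], labels[1:]
    trees = []
    # the root subtree holding the first label receives k further labels,
    # so each unordered split of the label set is listed exactly once
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(x for x in rest if x not in extra)
            trees += [
                (a, b)
                for a in labeled_topologies(left)
                for b in labeled_topologies(right)
            ]
    return trees


def root_configs(t):
    if not isinstance(t, tuple):
        return 0  # 1-taxon tree
    return (root_configs(t[0]) + 1) * (root_configs(t[1]) + 1)


def brute_force_totals(n_max):
    """counts[n] = number of labeled topologies, totals[n] = summed root configurations."""
    counts, totals = [0], [0]
    for n in range(1, n_max + 1):
        trees = labeled_topologies(tuple(range(n)))
        counts.append(len(trees))
        totals.append(sum(root_configs(t) for t in trees))
    return counts, totals


def decomposition_total(n, counts, totals):
    """Prediction for totals[n] from smaller sizes: C(n, i) relabelings per pair, halved."""
    pairs = sum(
        comb(n, i) * (totals[i] + counts[i]) * (totals[n - i] + counts[n - i])
        for i in range(1, n)
    )
    return pairs // 2


if __name__ == "__main__":
    counts, totals = brute_force_totals(6)
    for n in range(2, 7):
        print(n, counts[n], totals[n], decomposition_total(n, counts, totals))
```

Agreement between the enumerated and predicted totals is what the identity for the generating function expresses coefficient by coefficient.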

11. Appendix 5. Proof of Equation (38)

The proof follows the case of Equation (32). For Inline graphic, the number Inline graphic can be obtained as the product Inline graphic, where t1 and t2 are the root subtrees of t. The tree decomposition described in Figure 8 yields

graphic file with name eq620.gif

Acknowledgments

The authors thank Elizabeth Allman, James Degnan, and John Rhodes for discussions and NIH grant R01 GM117590 for financial support.

Author Disclosure Statement

No competing financial interests exist.

References

  1. Aho A.V., and Sloane N.J.A. 1973. Some doubly exponential sequences. Fibonacci Q. 11, 429–437
  2. Brown J.K.M. 1994. Probabilities of evolutionary trees. Syst. Biol. 43, 78–91
  3. Degnan J.H., and Rosenberg N.A. 2006. Discordance of species trees with their most likely gene trees. PLoS Genet. 2, 762–768
  4. Degnan J.H., Rosenberg N.A., and Stadler T. 2012. The probability distribution of ranked gene trees on a species tree. Math. Biosci. 235, 45–55
  5. Degnan J.H., and Salter L.A. 2005. Gene tree distributions under the coalescent process. Evolution 59, 24–37
  6. de Mier A., and Noy M. 2012. On the maximum number of cycles in outerplanar and series-parallel graphs. Graphs Combinator. 28, 265–275
  7. Disanto F., and Rosenberg N.A. 2015. Coalescent histories for lodgepole species trees. J. Comput. Biol. 22, 918–929
  8. Disanto F., and Rosenberg N.A. 2016. Asymptotic properties of the number of matching coalescent histories for caterpillar-like families of species trees. IEEE/ACM Trans. Comput. Biol. Bioinf. 13, 913–925
  9. Disanto F., Schlizio A., and Wiehe T. 2013. Yule-generated trees constrained by node imbalance. Math. Biosci. 246, 139–147
  10. Disanto F., and Wiehe T. 2013. Exact enumeration of cherries and pitchforks in ranked trees under the coalescent model. Math. Biosci. 242, 195–200
  11. Felsenstein J. 1978. The number of evolutionary trees. Syst. Zool. 27, 27–33
  12. Flajolet P., and Sedgewick R. 2009. Analytic Combinatorics. Cambridge University Press, Cambridge
  13. Hammersley J.M., and Grimmett G.R. 1974. Maximal solutions of the generalized subadditive inequality. Pages 270–285 in Harding E.F., and Kendall D.G., eds. Stochastic Geometry. Wiley, London
  14. Harding E.F. 1971. The probabilities of rooted tree-shapes generated by random bifurcation. Adv. Appl. Probab. 3, 44–77
  15. Harding E.F. 1974. The probabilities of the shapes of randomly bifurcating trees. Pages 259–269 in Harding E.F., and Kendall D.G., eds. Stochastic Geometry. Wiley, London
  16. Maddison W.P. 1997. Gene trees in species trees. Syst. Biol. 46, 523–536
  17. McKenzie A., and Steel M. 2000. Distributions of cherries for two models of trees. Math. Biosci. 164, 81–92
  18. Rosenberg N.A. 2006. The mean and variance of the numbers of r-pronged nodes and r-caterpillars in Yule-generated genealogical trees. Ann. Comb. 10, 129–146
  19. Rosenberg N.A. 2007. Counting coalescent histories. J. Comput. Biol. 14, 360–377
  20. Rosenberg N.A. 2013. Coalescent histories for caterpillar-like families. IEEE/ACM Trans. Comp. Biol. Bioinf. 10, 1253–1262
  21. Rosenberg N.A., and Degnan J.H. 2010. Coalescent histories for discordant gene trees and species trees. Theor. Popul. Biol. 77, 145–151
  22. Steel M., and McKenzie A. 2001. Properties of phylogenetic trees generated by Yule-type speciation models. Math. Biosci. 170, 91–112
  23. Than C., and Nakhleh L. 2009. Species tree inference by minimizing deep coalescences. PLoS Comp. Biol. 5, e1000501
  24. Than C., Ruths D., Innan H., and Nakhleh L. 2007. Confounding factors in HGT detection: Statistical error, coalescent effects, and multiple solutions. J. Comput. Biol. 14, 517–535
  25. Wu Y. 2012. Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood. Evolution 66, 763–775
  26. Yule G.U. 1925. A mathematical theory of evolution based on the conclusions of Dr. J.C. Willis, F.R.S. Phil. Trans. R. Soc. Lond. B 213, 21–87

Articles from Journal of Computational Biology are provided here courtesy of Mary Ann Liebert, Inc.
