Abstract
Coalescent histories provide lists of species tree branches on which gene tree coalescences can take place, and their enumerative properties assist in understanding the computational complexity of calculations central in the study of gene trees and species trees. Here, we solve an enumerative problem left open by Rosenberg concerning the number of coalescent histories for gene trees and species trees with a matching labeled topology that belongs to a generic caterpillar-like family. By bringing a generating function approach to the study of coalescent histories, we prove that for any caterpillar-like family with seed tree t, the sequence (hn)n≥0 describing the number of matching coalescent histories of the nth tree of the family grows asymptotically as a constant multiple of the Catalan numbers. Thus, hn ~ βtcn, where the asymptotic constant βt > 0 depends on the shape of the seed tree t. The result extends a claim demonstrated only for seed trees with at most 8 taxa to arbitrary seed trees, expanding the set of cases for which detailed enumerative properties of coalescent histories can be determined. We introduce a procedure that computes from t the constant βt as well as the algebraic expression for the generating function of the sequence (hn)n≥0.
Keywords: Catalan numbers, caterpillar-like trees, coalescent, enumeration, generating functions, phylogenetics
1 Introduction
Coalescent histories, mathematical structures representing combinatorially distinct ways in which a given gene tree can coalesce along the branches of a given species tree, are important in a variety of phylogenetic problems [6], [14], [15]. They arise, for example, in proofs concerning theoretical properties of species tree inference algorithms [1], [18], in empirical analyses of gene tree probability distributions [16], and in studies of gene trees under hybridization [21]. Many of these applications trace to the appearance of coalescent histories in a sum performed in a fundamental calculation for inference of species trees from information on multiple genetic loci, the evaluation of gene tree probabilities conditional on species trees [5].
Owing to uses of coalescent histories in sets over which sums are computed, as well as in state spaces of certain phylogenetic Markov chains [7], [10], [11], solutions to enumerative problems involving coalescent histories contribute to an understanding of the computational complexity of phylogenetic calculations. A recursion for the number of coalescent histories for a given gene tree and species tree has been established [13], and several studies have reported exact numerical results and closed-form expressions for the number of coalescent histories for small trees and for specific types of trees of arbitrarily large size [4]–[6], [13]–[15], [19]. The latter computations have proceeded both by solving or deploying the recursion in specific cases [13]–[15], [19], as well as by identifying correspondences between coalescent histories and other combinatorial structures for which enumerative results have already been established [4]–[6].
One class of gene trees and species trees of particular interest for enumeration of coalescent histories is the caterpillar-like families, trees that have a caterpillar shape, except that the caterpillar subtree with r taxa is replaced by a subtree of size r that is not necessarily a caterpillar subtree (Fig. 1). For the simplest caterpillar-like family, the caterpillar trees themselves, if the gene tree and species tree have the same caterpillar labeled topology with n taxa, then, as reported in [5], the number of coalescent histories is a Catalan number,
$$c_{n-1} = \frac{1}{n}\binom{2n-2}{n-1}. \tag{1}$$
For Tr-caterpillar-like families, in which the r-taxon subtree of an n-taxon caterpillar species tree is replaced by an r-taxon subtree Tr (Fig. 1), by employing the recursion, Rosenberg [14] obtained the exact number of coalescent histories for all n, for each Tr with r ≤ 8, in the case that the gene tree and species tree have the same labeled topology. Rosenberg [14] argued that in each of these cases, as n → ∞, the number of coalescent histories is asymptotic to a constant multiple of the Catalan numbers. A proof of this result has been presented in full for each case with r ≤ 5 [4], [13], [14], and by computer algebra for cases with r = 6, 7, and 8 [14].
Each case considered by [14] involved cumbersome computations specific to the choice of Tr, limiting the generality of the approach. While no reason exists to suspect that the method of [14] would not extend to larger r, it is desirable to find another method that is practical for a general Tr. Here, using a substantially different strategy that brings to studies of coalescent histories the methods of analytic combinatorics, we produce an enumeration result that covers caterpillar-like families in general. We show that the result of [14] applies to all caterpillar-like families, not only those for which Tr has r ≤ 8. That is, we demonstrate that for any Tr, as n → ∞, the number of coalescent histories in the Tr-caterpillar-like family is asymptotic to a constant multiple of the Catalan numbers—thus extending a result known only for r ≤ 8 to arbitrarily large r. We describe a method and symbolic tool for computing the constant. Finally, we discuss the impact of the results in mathematical phylogenetics.
2 Preliminaries
2.1 Species trees and coalescent histories
We consider binary rooted leaf-labeled species trees, taking a single arbitrary labeling (without loss of generality) to represent a given unlabeled species tree topology. We consider an arbitrarily labeled species tree and its unlabeled tree interchangeably, treating the labeling as implicit.
We examine coalescent histories for the case in which gene trees and species trees have the same labeled topology t, terming a coalescent history in this case a matching coalescent history. To be a matching coalescent history, a mapping h from the internal nodes of t (viewed as the gene tree) to the branches of t (viewed as the species tree) must satisfy two conditions (Fig. 2): (a) for each leaf x in t, if x descends from node k in t, then x descends from branch h(k) in t; (b) for each pair of internal nodes k1 and k2 in t, if k2 descends from k1 in t, then branch h(k2) descends from or coincides with branch h(k1) in t. We henceforth consider only matching coalescent histories, treating “matching” as implicit; we also refer simply to histories for short.
2.2 Caterpillar-like families of species trees
For a binary species tree t with at least 2 taxa, we denote by (t(n))n≥0 the caterpillar-like family generated by seed tree t. This family is recursively defined by taking t(0) = t and letting t(n+1) be the tree obtained by appending t(n) and a single leaf to a shared root (Fig. 1).
Our interest is in the number of matching coalescent histories of t(n) for n ≥ 0, a quantity we denote by hn(t) or simply hn. We note that whereas [14] indexed trees by their numbers of taxa, here n represents the number of taxa appended above the root of the seed tree, so that if seed tree t has |t| taxa, then |t| + n gives the number of taxa in t(n).
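As an illustration of the recursive definition, the family can be sketched with nested tuples. The representation (leaves as strings, internal nodes as 2-tuples) and the names `caterpillar_like`, `num_taxa`, and the leaf prefix `x` are assumptions made for this sketch only.

```python
def caterpillar_like(seed, n, prefix="x"):
    """Return t^(n): start from the seed tree and append n single leaves,
    each time joining the current tree and a new leaf at a shared root."""
    tree = seed
    for i in range(1, n + 1):
        tree = (tree, f"{prefix}{i}")  # t^(i) = (t^(i-1), new leaf)
    return tree


def num_taxa(tree):
    """Count the leaves of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return num_taxa(left) + num_taxa(right)


seed = (("A", "B"), ("C", "D"))
for n in range(4):
    # The number of taxa in t^(n) is |t| + n.
    print(n, num_taxa(caterpillar_like(seed, n)))
```

With this indexing, the count printed for each n equals |t| + n, in agreement with the convention stated above.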
2.3 Principles of analytic combinatorics
We rely on techniques of analytic combinatorics [8] to obtain our enumerative results, and recall several key points. In general, an integer sequence (an)n≥0 can be associated with a formal power series $A(z) = \sum_{n \geq 0} a_n z^n$, also termed the generating function of the integers an. Considering z as a complex variable, typically in a neighborhood of 0, features of the function A(z) are related to the growth of the coefficients an.
More precisely, generating functions, considered as complex functions, enable analyses of the asymptotic growth of the associated integer sequences through the analysis of their singularities in the complex plane. In particular, under suitable conditions, there exists a general correspondence between the singular expansion of a generating function A(z) near its dominant singularities—those nearest the origin—and the asymptotic behavior of the associated coefficients an (Chapter VI of [8]). We make use of theorems that describe this correspondence.
2.4 Catalan numbers
The Catalan sequence appears often in combinatorics [8], [9], [17] and features prominently in our analysis. Rewriting eq. (1) with index n rather than n − 1,
$$c_n = \frac{1}{n+1}\binom{2n}{n}. \tag{2}$$
The associated generating function is well known [17]:
$$C(z) = \sum_{n \geq 0} c_n z^n = \frac{1 - \sqrt{1-4z}}{2z}. \tag{3}$$
By definition, if [zn]f(z) denotes the nth term in the power series expansion of f(z) at z = 0, we have
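$$c_n = [z^n]\, C(z) = [z^{n+1}]\, \frac{1 - \sqrt{1-4z}}{2} = [z^{n+1}] \left( -\frac{\sqrt{1-4z}}{2} \right). \tag{4}$$
Here, $\frac{1 - \sqrt{1-4z}}{2}$ is replaced by $-\frac{\sqrt{1-4z}}{2}$, as the constant 1 does not contribute to the power series expansion for terms of order n+1, with n ≥ 0. Asymptotically, applying Stirling's formula to eq. (2), the Catalan sequence satisfies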
$$c_n \sim \frac{4^n}{\sqrt{\pi}\, n^{3/2}}. \tag{5}$$
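A small numerical check can make the asymptotic statement concrete. The sketch below computes exact Catalan numbers from the binomial formula c_n = binom(2n, n)/(n + 1) and compares them with the estimate 4^n/(√π n^{3/2}); the function names are hypothetical and chosen only for this illustration.

```python
from math import comb, pi, sqrt


def catalan(n):
    """Exact Catalan number c_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def catalan_asymptotic_ratio(n):
    """Ratio of c_n to the estimate 4^n / (sqrt(pi) * n^(3/2))."""
    # Divide the big integers first so that the quotient fits in a float.
    return (catalan(n) / 4 ** n) * sqrt(pi) * n ** 1.5


for n in (10, 100, 1000):
    print(n, catalan_asymptotic_ratio(n))
```

The printed ratios approach 1 as n grows, which is the content of the asymptotic equivalence.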
3 The number of matching coalescent histories for caterpillar-like families
We aim to find a procedure that evaluates the number of coalescent histories hn(t) for matching gene trees and species trees in the caterpillar-like family that begins with seed tree t, and moreover, to show that
$$h_n(t) \sim \beta_t\, c_n, \tag{6}$$
where the multiplier βt > 0 for the Catalan sequence is a constant depending on t. In other words, we wish to demonstrate that as n → ∞, hn/cn converges to a constant βt > 0 that depends on the seed tree t.
First, in Section 3.1, we determine a lower bound for the number of matching coalescent histories of the nth tree t(n) of the caterpillar-like family with seed tree t. Next, in Section 3.2, we introduce a concept of m-rooted histories of a species tree t(n). The section provides an iterative construction of the rooted histories of t(n+1) from those of t(n), describing the construction by means of a convenient labeling scheme. We follow a commonly used combinatorial enumeration strategy [2], [3] that determines a recursive succession rule for successive collections of objects in a sequence and then uses this rule to compute a generating function. In Section 3.3, we use the iterative construction to produce a bivariate generating function whose coefficients hn,m are the numbers of m-rooted histories for trees t(n). We next obtain the generating function for the integer sequence (hn)n≥0 describing the number of matching coalescent histories for the t(n). Finally, using the lower bound from Section 3.1, in Section 3.4, we apply methods of analytic combinatorics to study the asymptotic behavior of hn.
3.1 Lower bound for hn
For our asymptotic analysis, we will need an initial lower bound for hn. To produce this bound, we first define V as the tree with 2 taxa. Recalling that we index trees so that the number of taxa in a tree exceeds by n the number of taxa in the seed tree, we have [4], [13], [14]
$$h_n(V) = c_{n+1}.$$
We can then use a constructive procedure, illustrated in detail in Figure 3, to show that for any seed tree t with |t| ≥ 2,
$$h_n(t) \geq h_n(V) = c_{n+1}. \tag{7}$$
For a seed tree t, we can superimpose V on t so that the root rV of V matches the root rt of t (Fig. 3B). The two leaves of V are identified with two of the leaves of t, one on each side of the root of t. Generating caterpillar-like families by adding n single branches separately to V and to t, the superposition of V on t extends, so that V(n) is superimposed on t(n) (Fig. 3C). The n caterpillar branches of t(n) and V(n) then correspond.
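Each matching coalescent history h of t(n) determines a corresponding matching coalescent history h′ of V(n) by considering the restriction of h to the set of internal nodes of t(n) that correspond to internal nodes of V(n) (Fig. 3D). Conversely, every matching coalescent history of V(n) arises in this way from at least one history of t(n): keep the images of the root of t and of the caterpillar nodes, and map each remaining internal node of t to the branch immediately above it. Thus, for any seed tree t, the number of matching coalescent histories of t(n) is greater than or equal to that of V(n). In symbols, we have eq. (7). We will use this result in Section 3.4.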
3.2 Iterative generation of rooted histories
This section describes the iterative procedure that for a seed tree t eventually enables us to determine a formula for hn. First, in Section 3.2.1, we discuss m-rooted histories, which extend the concept of matching coalescent histories, introducing an additional parameter m. Next, in Section 3.2.2, we examine the relationship between rooted histories and the extended coalescent histories of [13], importing results on extended coalescent histories into the more convenient framework of rooted histories. We expand our goal of enumerating matching coalescent histories for t(n), considering a more general problem of enumerating for m ≥ 1 the m-rooted histories of t(n).
In Section 3.2.3, we define an operator Ω for constructing the rooted histories of t(n+1) from the rooted histories of t(n). Next, in Section 3.2.4, we introduce a labeling scheme that in Section 3.2.5 enables us to switch from counting rooted histories to counting multisets of labels. At the end of Section 3.2, we will have converted our enumeration problem into an enumeration that is more convenient for constructing a generating function.
3.2.1 m-rooted histories
Consider a tree t with |t| ≥ 2, and suppose that the branch above the root of t (the root-branch) is divided into infinitely many components. A matching coalescent history mapping the internal nodes of t onto the branches of t is said to be m-rooted for m ≥ 1 if the root of t is mapped exactly onto the mth component of the root-branch (Fig. 4). It is said to be rooted if it is m-rooted for some m. Components are numbered so that component m = 1 is immediately above the root node, and m is greater for components that are farther from the root.
For a rooted history h of a tree t, m = m(h) denotes the component of the root-branch of t that receives the image of the root of t. Hn,m(t) denotes the set of m-rooted histories of t(n), and $H_n(t) = \bigcup_{m \geq 1} H_{n,m}(t)$ the set of its rooted histories. The number of m-rooted histories of t(n) is hn,m = |Hn,m|, and the number of 1-rooted histories hn = hn,1 is also the number of matching coalescent histories. Enumerating the matching coalescent histories of t(n) is equivalent to enumerating its 1-rooted histories.
3.2.2 Rooted histories and extended histories
Rooted histories are closely related to extended coalescent histories, as defined by [13]. We use this relationship to study properties of rooted histories. Rosenberg [13] defined the set of k-extended coalescent histories of a tree t with |t| ≥ 1 for integers k ≥ 1; we also consider k = 0 by setting the number of 0-extended histories to 0.
A k-extended history is defined as a coalescent history for a species tree whose root-branch is divided into exactly k ≥ 0 parts. In other words, the root-branch has exactly k ≥ 0 possible components onto which a k-extended history can map the gene tree root. Here we consider matching k-extended histories, so that the internal nodes of a tree t are mapped to the branches of t and its k components above the root. For convenience, we refer to extended histories by the index k, reserving the index m for rooted histories.
By the definitions of k-extended and m-rooted histories, for each k ≥ 0, the set of k-extended histories of a tree is exactly the set of all m-rooted histories with 1 ≤ m ≤ k. Therefore, for a tree t with at least 2 leaves, if we label by et,k its number of k-extended histories, then for each m ≥ 1 the number of m-rooted histories of t is
$$h_{0,m} = e_{t,m} - e_{t,m-1}. \tag{8}$$
Note that for m = 1, we explicitly use in eq. (8) the fact that et,0 is defined and equal to 0. In addition to setting et,0 = 0 for any tree t, as in [13] we set et,k = 1 for all k ≥ 1 in the case that t has exactly 1 leaf.
Suppose |t| ≥ 1 and k ≥ 0. Denote by tL and tR the left and right subtrees of the root of t. We can compute et,k recursively as in Theorem 3.1 of [13]:
$$e_{t,k} = \sum_{i=1}^{k} e_{t_L,i+1}\, e_{t_R,i+1}. \tag{9}$$
As was already observed in the remarks following Corollary 3.2 of [13], by eq. (9), for any tree t with |t| ≥ 1, for positive integers k ≥ 1, the function f(k) = et,k is a polynomial in k. With our extension to permit k = 0, we can extend this fact to k ≥ 0 for |t| ≥ 2: for any tree t with |t| ≥ 2, and for k ≥ 0, we claim that the function f(k) = et,k is a polynomial in k. Note that in allowing k = 0, we claim et,k is a polynomial in k only for |t| ≥ 2; for |t| = 1, et,k is not a polynomial in k because et,0 = 0 and et,k = 1 for k ≥ 1.
To prove the claim, fix t with |t| ≥ 2 and consider the variable k over domain [1, ∞). We demonstrate that f(k) is a polynomial in k for domain [0, ∞) by showing that the closed form for f(k) has a factor of k, so that our choice et,0 = 0 in eq. (9) is compatible with the polynomial expression valid for k ≥ 1.
Observe that for i ≥ 1, etL,i and etR,i are polynomials in i, say PtL(i) and PtR(i). Replacing terms etL,i+1 and etR,i+1 in the recursion in eq. (9) by polynomials PtL(i + 1) and PtR(i + 1), we obtain
$$e_{t,k} = \sum_{i=1}^{k} P_{t_L}(i+1)\, P_{t_R}(i+1) = \sum_{i=1}^{k} P'(i), \tag{10}$$
where P′(i) denotes a polynomial in i that results from the product of PtL(i + 1) and PtR(i + 1). By Faulhaber's formula for sums of powers of integers, symbolic sums of the form $\sum_{i=1}^{k} i^p$ for a fixed integer p ≥ 0 are polynomials containing a factor of k in their closed forms (Section 6.5 of [9]); for example, $\sum_{i=1}^{k} i^2 = k(k+1)(2k+1)/6$. Thus, because the polynomial P′(i) is a linear combination of terms of the form $i^p$, the closed-form expression for the sum appearing in eq. (10) also has a factor of k. It therefore has a value of 0 at k = 0.
Functions et,k for trees t with 1 ≤ |t| ≤ 9 and k ≥ 1 appear in Tables 1-4 of [13]. For |t| ≥ 2, as we have shown, these example polynomials are divisible by the variable representing the number of components of the root-branch. By eq. (8), we immediately obtain the following result.
Proposition 1
For any tree t with |t| ≥ 2 and for m ≥ 1, the number h0,m of m-rooted histories of t is a polynomial in m that can be computed by the difference in eq. (8) using et,k as in eq. (9).
As an example of Proposition 1, consider the tree t = ((A, B), (C, D)), identifying this arbitrary labeling with the unlabeled tree (()()). By applying the recursive procedure in eq. (9), we find that for k ≥ 0, the number of k-extended coalescent histories for t is $e_{t,k} = k(2k^2 + 9k + 13)/6$ [13]. The difference eq. (8) yields that for m ≥ 1, the number of m-rooted histories of t is $h_{0,m} = e_{t,m} - e_{t,m-1} = m^2 + 2m + 1$.
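The computation in this example can be reproduced with a short sketch. It assumes the recursion of Theorem 3.1 of [13] in the form e_{t,k} = Σ_{i=1}^{k} e_{tL,i+1} e_{tR,i+1}, with e_{t,0} = 0 for every tree and e_{t,k} = 1 for a single leaf and k ≥ 1, and it represents trees as nested tuples; the function names `extended` and `rooted` are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def extended(tree, k):
    """Number e_{t,k} of matching k-extended histories of a nested-tuple tree."""
    if k <= 0:
        return 0  # convention e_{t,0} = 0
    if not isinstance(tree, tuple):
        return 1  # a single leaf has one k-extended history for k >= 1
    left, right = tree
    # Assumed recursion: sum over i = 1..k of e_{tL,i+1} * e_{tR,i+1}.
    return sum(extended(left, i + 1) * extended(right, i + 1)
               for i in range(1, k + 1))


def rooted(tree, m):
    """Number h_{0,m} of m-rooted histories, as the difference e_{t,m} - e_{t,m-1}."""
    return extended(tree, m) - extended(tree, m - 1)


seed = (("A", "B"), ("C", "D"))
print([extended(seed, k) for k in range(5)])   # values of e_{t,k}
print([rooted(seed, m) for m in range(1, 6)])  # values of h_{0,m}
```

For this seed tree the second list consists of the squares (m + 1)^2 = m^2 + 2m + 1, matching the polynomial obtained above.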
3.2.3 Rooted histories of t(n+1) from those of t(n)
This section introduces an operator Ω that generates the rooted histories of t(n+1) from those of t(n). For each rooted history h′ of t(n+1), there exists exactly one rooted history h of t(n) with h′ ∈ Ω(h). Recalling the definitions of the sets Hn,m(t) and Hn(t) of m-rooted and rooted histories of t(n), we define Ω as follows.
Definition
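Let $\mathcal{P}(X)$ denote the power set of set X, and fix tree t. The operator Ω is a function that, for each n ≥ 0, acts as
$$\Omega : H_n(t) \to \mathcal{P}\big(H_{n+1}(t)\big),$$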
where for a rooted history h ∈ Hn(t), Ω(h) is the set of rooted histories h′ ∈ Hn+1(t) for which the restriction of h′ to t(n+1) excluding its most basal caterpillar branch coincides with the rooted history h of t(n).
Denote by b1, b2, . . . , bn+1 the caterpillar branches in t(n+1), from the least basal b1 to the most basal bn+1 (Fig. 5). Upon removal of the most basal caterpillar branch bn+1 from t(n+1), the root of t(n+1)—to which branch bn+1 is attached—is replaced by a demarcation between the first and second components of the root-branch of t(n). For instance, in Fig. 5A, starting from tree t = ((A, B), (C, D)), we consider h‴, a 3-rooted history of t(3). By removing the most basal caterpillar branch b3 of t(3), we reduce to the 1-rooted history h″ of t(2) (Fig. 5B). Next, by removing the caterpillar branch b2 of t(2), we reduce to the 2-rooted history h′ of t(1) (Fig. 5C). By removing the remaining caterpillar branch b1 from t(1), we reduce to the 2-rooted history h of t = t(0) (Fig. 5D). Therefore, by the definition of Ω, we have h′ ∈ Ω(h), h″ ∈ Ω(h′), and h‴ ∈ Ω(h″).
By definition, Ω has the property that for each rooted history h′ ∈ Hn+1(t), with n ≥ 0, there exists exactly one rooted history h ∈ Hn(t) such that h′ ∈ Ω(h). In other words, for each n ≥ 0, the set of rooted histories Hn+1(t) can be partitioned as a disjoint union,
$$H_{n+1}(t) = \bigsqcup_{h \in H_n(t)} \Omega(h). \tag{11}$$
The set Hn+1(t) is therefore generated without double occurrences of any rooted history by applying Ω to the rooted histories in Hn(t). It follows immediately that in performing n iterations of Ω to obtain Ω[. . . [Ω[Ω(H0)]] . . .] from the set H0 of rooted histories of t(0), all the rooted histories of t(n) are generated exactly once.
3.2.4 Labels for rooted histories
The operator Ω, starting from the rooted histories of t(n), generates the rooted histories of t(n+1). In this section, we introduce a labeling scheme, giving each m-rooted history h of t(n) a label L(h) = (n, m). We then describe how Ω acts on the labels of the rooted histories, characterizing the set of labels L[Ω(h)] = {L(h′) : h′ ∈ Ω(h)}. Our goal is to represent each set Hn of rooted histories of t(n) by the multiset of its labels, reducing the enumeration of |Hn,m| to the problem of counting certain ordered pairs (n, m) iteratively generated by simple rules that reflect how the rooted histories in Hn+1 are generated according to rule Ω from the rooted histories in Hn by eq. (11).
In our labeling, each rooted history h ∈ Hn(t) that maps the root of t(n) onto the mth component of the root-branch of t(n) receives label L(h) = (n, m). Enumeration of hn = |Hn,1| then reduces to enumeration of those rooted histories labeled by (n, 1).
Note that a label (n, m) does not uniquely specify an m-rooted history of t(n): a tree t(n) has in general many m-rooted histories, each receiving the label (n, m). In other words, if h, h̄ ∈ Hn(t) and L(h) = L(h̄), then h and h̄ are not necessarily the same rooted history of t(n). We will, however, consider for n ≥ 0 multisets of labels in which we find a copy of the label (n, m) for each m-rooted history of t(n).
To characterize how the operator Ω acts on the labels for rooted histories, consider an m-rooted history h ∈ Hn(t), so that h maps the root of t(n) onto the mth component of the root-branch of t(n). This history is labeled L(h) = (n, m). For instance, taking the seed tree t = ((A, B), (C, D)), the history h of t = t(0) depicted in Figure 6A is labeled L(h) = (0, 3), whereas the history h of t(1) in Figure 6C has L(h) = (1, 1).
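By applying Ω to a history h of t(n) with L(h) = (n, m), we produce a set of rooted histories Ω(h) ⊆ Hn+1(t). The set of labels for Ω(h),
$$L[\Omega(h)] = \{L(h') : h' \in \Omega(h)\},$$
is determined according to the rule:
$$L[\Omega(h)] = \begin{cases} \{(n+1, m') : m' \geq m-1\} & \text{if } m \geq 2, \\ \{(n+1, m') : m' \geq 1\} & \text{if } m = 1, \end{cases} \tag{12}$$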
where m′ denotes the value of the parameter m—the component of the root-branch of t(n+1) to which the root is mapped—for the rooted histories h′ ∈ Ω(h) of t(n+1).
The rule in eq. (12) distinguishes between two cases depending on whether the value of the parameter m = m(h) of the generating rooted history h is equal to or exceeds 1. In both cases, the set L[Ω(h)] contains infinitely many labels, each with its first component equal to n+1, as the labels refer to rooted histories of t(n+1). The value of the second component m′ ranges in [m − 1, ∞) if m ≥ 2, and in [1, ∞) if m = 1.
Recall that according to the definition of Ω, from an m-rooted history h of t(n) (Fig. 6A and 6C), we generate an m′-rooted history h′ ∈ Ω(h) of t(n+1) (Fig. 6B and 6D) by (i) choosing the component m′ of the root-branch of t(n+1) onto which h′ maps the root of t(n+1), and (ii) letting h′ coincide with h on all nodes of t(n+1) except the root. The rooted history h′ coincides with h once we remove the most basal caterpillar branch of t(n+1).
Figure 6 illustrates both cases of eq. (12). In step (i), infinitely many choices of m′ are possible, because the root-branch of t(n+1) is divided into infinitely many parts. The most basal caterpillar branch in t(n+1) is attached at the border between the first and second components of the root-branch of t(n). Thus, the addition of the (n + 1)st caterpillar branch eliminates a component of the root-branch, so that if the starting rooted history h has m ≥ 2 (Fig. 6A), then the root of t(n) maps to component m − 1 of the root-branch of t(n+1). The root of t(n+1) can map to this same branch, or to any branch m′ with m′ ≥ m − 1. For instance, in Figure 6B, one of the rooted histories h′ generated by a rooted history h with m = 3 has m′ = m − 1 = 2.
If h has m = 1, however, then production of h′ is slightly different (Fig. 6C). By definition, the parameter m for a rooted history cannot be smaller than 1. The value m′ = m − 1 is not permitted, and m′ remains greater than or equal to m = 1 (Fig. 6D).
3.2.5 Counting the labels of rooted histories
The labeling scheme in Section 3.2.4 encodes the application of the operator Ω to the rooted histories of t(n). Now that we have described the set of labels L[Ω(h)] arising from the label L(h) according to the rule in eq. (12), the problem of counting a set of rooted histories becomes a problem of counting the set of the associated labels along with their multiplicities—or the multiset of the labels.
For n ≥ 0 and m ≥ 1, we use Ω((n, m)) to denote, with an abuse of notation, the set of labels L[Ω(h)] when L(h) = (n, m). Recalling that iterative application of Ω to the rooted histories H0 of tree t(0) generates the rooted histories Hn of t(n), the enumeration of |Hn,m| for tree t = t(0) becomes a problem of counting those labels of the form (n, m) that are generated when we iteratively apply the operator Ω as Ω[. . . [Ω[Ω(L0)]] . . .] starting from the multiset of labels L0 = {L(h) : h ∈ H0(t)} (Fig. 7).
Eq. (12) characterizes the set of labels L[Ω(h)] of the rooted histories in Ω(h) in terms of the label L(h) of rooted history h. If L(h) = (n, m), then Ω((n, m)) denotes the set of labels L[Ω(h)]. Thus, converting the notation from histories to labels, eq. (12) becomes
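$$\Omega((n, m)) = \begin{cases} \{(n+1, m') : m' \geq m-1\} & \text{if } m \geq 2, \\ \{(n+1, m') : m' \geq 1\} & \text{if } m = 1. \end{cases} \tag{13}$$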
For the seed tree t, we count hn,m = |Hn,m| by evaluating the number of occurrences of the ordered pair (n, m) in the multiset Ln defined as
$$L_n = \{L(h) : h \in H_n(t)\}. \tag{14}$$
In symbols, we have
$$h_{n,m} = |H_{n,m}| = \text{number of copies of } (n, m) \text{ in } L_n. \tag{15}$$
By eq. (11), each multiset Ln is generated iteratively (Fig. 7). We start with the multiset of labels
$$L_0 = \{L(h) : h \in H_0(t)\}. \tag{16}$$
For each n ≥ 0, the multiset Ln+1 is obtained as
$$L_{n+1} = \biguplus_{(n, m) \in L_n} \Omega((n, m)), \tag{17}$$
where the symbol ⨄ denotes the union operator for multisets. Thus, in M = M1 ⨄ M2, if an element x appears n1 times in M1 and n2 times in M2, then it appears n1 + n2 times in M. Eq. (17) provides an iterative generation of the labels for the rooted histories of Hn+1(t) from the labels of the rooted histories of Hn(t), retaining information about the multiplicity of occurrences of each label.
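The iterative generation of label multisets can be sketched numerically. The sketch assumes the rule described in Section 3.2.4 (a label (n, m) with m ≥ 2 produces one label (n+1, m′) for each m′ ≥ m − 1, and a label (n, 1) produces one label (n+1, m′) for each m′ ≥ 1). Under this rule, a label (n+1, m′) is produced once by every label (n, m) with 1 ≤ m ≤ m′ + 1, so the multiplicities satisfy h_{n+1,m′} = Σ_{m=1}^{m′+1} h_{n,m}. Only finitely many values of m are needed to reach a given n, so the infinite multisets can be truncated; the function name `iterate_counts` is hypothetical.

```python
def iterate_counts(h0, n_max):
    """Return [h_{0,1}, h_{1,1}, ..., h_{n_max,1}] from the seed counts h0(m) = h_{0,m}.

    Each step applies the label rule in aggregated form:
    h_{n+1,m'} = sum of h_{n,m} over 1 <= m <= m' + 1.
    """
    width = n_max + 1                           # h_{n_max,1} depends on h_{0,m} for m <= n_max + 1
    row = [h0(m) for m in range(1, width + 1)]  # row[m-1] = h_{n,m}
    result = [row[0]]
    for _ in range(n_max):
        prefix, running = [], 0
        for value in row:
            running += value
            prefix.append(running)              # prefix[j] = sum of h_{n,m} for m <= j + 1
        row = prefix[1:]                        # new row[m'-1] = prefix[m']
        result.append(row[0])
    return result


# Seed tree ((A, B), (C, D)), for which h_{0,m} = (m + 1)^2.
print(iterate_counts(lambda m: (m + 1) ** 2, 4))
# Two-taxon seed tree, for which h_{0,m} = 1 for all m >= 1.
print(iterate_counts(lambda m: 1, 4))
```

For the two-taxon seed the output reproduces the Catalan numbers c_{n+1}, as expected for caterpillar trees.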
3.3 Rooted histories and generating functions
We have now obtained eq. (15), which gives an equivalence between the number of m-rooted histories of t(n) and the number of labels (n, m) in the multiset Ln, and eqs. (16) and (17), which give through Ω (eq. (13)) an iterative procedure that generates the family of multisets (Ln)n≥0. In this section, we translate the iterative procedure into algebraic terms, determining the generating function associated with the integer sequence (hn)n≥0.
First, in Section 3.3.1, we characterize a generating function g(y) for the sequence (h0,m)m≥1. Next, in Section 3.3.2, we deduce an equation satisfied by the bivariate generating function F(y, z) for (hn,m)n≥0,m≥1. In Section 3.3.3, we solve the equation, obtaining the desired generating function f(z) for the sequence (hn,1)n≥0. This generating function can be written in turn as a function of g(y).
3.3.1 Generating function for (h0,m)m≥1
In this section, we characterize the generating function g(y) that counts for a given seed tree t the labels in the multiset L0 describing the labels of the rooted histories of t.
Fix the seed tree t. Recalling the equivalence in eq. (15), define the generating function
$$g(y) = \sum_{m \geq 1} h_{0,m}\, y^m, \tag{18}$$
the mth coefficient of whose power series expansion provides the number h0,m of labels (0, m) appearing in L0. By Proposition 1, h0,m can be expressed as a polynomial in the variable m and can thus be decomposed as a finite linear combination of terms of the form mk, where k is a non-negative integer. That is, for a certain finite set of non-negative integers with largest element K,
$$h_{0,m} = \sum_{k=0}^{K} w_k\, m^k, \tag{19}$$
where the wk are constants.
We introduce generating functions gmk, one for each k from 0 to K, in which the mth coefficient is mk:
$$g_{m^k}(y) = \sum_{m \geq 1} m^k\, y^m. \tag{20}$$
Because K is finite, the desired generating function g(y) can be written as a finite linear combination of this new collection of generating functions gm0 (y), gm1 (y), . . . , gmK (y). More precisely, by substituting in eq. (18) the polynomial in eq. (19) and switching the order of summation, we obtain
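$$g(y) = \sum_{m \geq 1} \left( \sum_{k=0}^{K} w_k\, m^k \right) y^m = \sum_{k=0}^{K} w_k \sum_{m \geq 1} m^k\, y^m = \sum_{k=0}^{K} w_k\, g_{m^k}(y). \tag{21}$$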
We now state a lemma that characterizes the generating functions gmk (y).
Lemma 1
For each non-negative integer k from 0 to K, the generating function gmk (y) in eq. (20) is rational with denominator (1 − y)k+1. That is, gmk (y) has the form
$$g_{m^k}(y) = \frac{P(y)}{(1-y)^{k+1}},$$
where P(y) is a polynomial in y.
Proof
We proceed by induction on k. If k = 0, then by eq. (20), $g_{m^0}(y) = \frac{1}{1-y} - 1 = \frac{y}{1-y}$. Assume the inductive hypothesis for gmk (y). Applying eq. (20) to gmk+1 (y), we can recover gmk+1 (y) as
$$g_{m^{k+1}}(y) = \sum_{m \geq 1} m^{k+1}\, y^m = y\, \frac{d}{dy}\, g_{m^k}(y), \tag{22}$$
which by the quotient rule for derivatives is a rational function with denominator (1 − y)k+2.
The proof of the lemma gives a recursive procedure in eq. (22) to compute the functions gmk (y). By eq. (21), we immediately obtain from the lemma a result about the generating function g(y).
Proposition 2
The generating function g(y) whose mth coefficient [ym]g(y) is the number of m-rooted histories h0,m of a seed tree t can be written as a finite linear combination
$$g(y) = \sum_{j=1}^{J} q_j\, \frac{y^{a_j}}{(1-y)^{b}}, \tag{23}$$
where b ≥ 1 and J ≥ 1 are positive integers, each aj is a non-negative integer, and the qj are constants.
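As an example, we show how the procedure in Proposition 2 can determine the generating function g(y) for t = ((A, B), (C, D)), the same example seed tree for which we computed the polynomial h0,m via Proposition 1. Recall from Section 3.2.2 that $h_{0,m} = m^2 + 2m + 1$. To obtain the generating function g(y) that has coefficients $[y^m]g(y) = m^2 + 2m + 1$, we sum generating functions for monomials $m^2$, $2m$, and 1. We already know gm0 (y), and by applying eq. (22), we have
$$g_{m^1}(y) = y\, \frac{d}{dy}\, \frac{y}{1-y} = \frac{y}{(1-y)^2}, \qquad g_{m^2}(y) = y\, \frac{d}{dy}\, \frac{y}{(1-y)^2} = \frac{y(1+y)}{(1-y)^3}.$$
Thus,
$$g(y) = g_{m^2}(y) + 2\, g_{m^1}(y) + g_{m^0}(y) = \frac{4y - 3y^2 + y^3}{(1-y)^3}. \tag{24}$$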
In eq. (24), g(y) is written as in eq. (23), taking b = 3, J = 3, (a1, a2, a3) = (1, 2, 3), and (q1, q2, q3) = (4, −3, 1).
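The values b = 3, (a1, a2, a3) = (1, 2, 3), and (q1, q2, q3) = (4, −3, 1) correspond to the rational function (4y − 3y^2 + y^3)/(1 − y)^3. As a sanity check, the sketch below expands this function as a power series by long division and compares the coefficients with m^2 + 2m + 1; the helper name `series_of_rational` is hypothetical, and the routine assumes a denominator with constant term 1.

```python
def series_of_rational(num, den, order):
    """Power series coefficients of num(y)/den(y) up to y^(order-1).

    num and den are coefficient lists in increasing powers of y;
    den[0] must equal 1 so that all coefficients stay integers.
    """
    if den[0] != 1:
        raise ValueError("denominator must have constant term 1")
    coeffs = []
    for n in range(order):
        value = num[n] if n < len(num) else 0
        # Subtract the contributions of earlier coefficients: sum_j den[j] * c_{n-j}.
        for j in range(1, min(n, len(den) - 1) + 1):
            value -= den[j] * coeffs[n - j]
        coeffs.append(value)
    return coeffs


numerator = [0, 4, -3, 1]     # 4y - 3y^2 + y^3
denominator = [1, -3, 3, -1]  # (1 - y)^3
print(series_of_rational(numerator, denominator, 8))
```

The coefficient of y^m in the output equals (m + 1)^2 for each m ≥ 1, and the constant term is 0.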
3.3.2 Bivariate generating function for (hn,m)n≥0,m≥1
Given t, the polynomial nature of h0,m in m enabled us to obtain a generating function for h0,m. We now use the iterative procedure in eq. (17) to determine an equation that characterizes the bivariate generating function with coefficients hn,m. We represent each label of the form (n, m) by a symbolic algebraic expression in the variables y and z, so that (n, m) is replaced by $z^n y^m$. Let $L = \biguplus_{n \geq 0} L_n$ be the multiset of the labels of all m-rooted histories for all trees t(n). Considering y and z as complex variables in two sufficiently small neighborhoods of 0, we aim to characterize the bivariate function F(y, z) that admits the expansion
$$F(y, z) = \sum_{(n, m) \in L} z^n y^m,$$
where the sum is over all labels in the multiset L and thus has a term for each m-rooted history of each t(n). In particular, the function F(y, z) is the bivariate generating function of the integers hn,m, and its Taylor expansion can be written as
$$F(y, z) = \sum_{n \geq 0} \sum_{m \geq 1} h_{n,m}\, z^n y^m, \tag{25}$$
where the coefficients hn,m appear explicitly.
By differentiating F(y, z) with respect to y and then taking y = 0, we obtain
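$$\frac{\partial F}{\partial y}(0, z) = \sum_{n \geq 0} h_{n,1}\, z^n. \tag{26}$$
Thus, for each n ≥ 0, we have
$$h_n = h_{n,1} = [z^n]\, \frac{\partial F}{\partial y}(0, z).$$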
By representing each label of the form (n, m) by the symbolic expression znym and assuming the complex variables y and z are sufficiently close to 0, the recursive generation in eq. (17) of the multisets of labels L0, L1, L2, . . . determines an equation for F(y, z), demonstrated in Appendix 1:
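$$F(y, z) \left[ 1 - \frac{z}{y(1-y)} \right] = g(y) - z\, \frac{\partial F}{\partial y}(0, z). \tag{27}$$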
Eq. (27) holds if the complex variables y and z are in two sufficiently small neighborhoods of 0, and it characterizes the generating function F(y, z).
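The following sketch indicates how an equation of this kind can arise from the label rule; it is a reconstruction under the rule of Section 3.2.4 and is not intended to replace the demonstration in Appendix 1. A label (n, m) is represented by $z^n y^m$. Its offspring labels contribute $z^{n+1} \sum_{m' \geq m-1} y^{m'} = z^{n+1} y^{m-1}/(1-y)$ if m ≥ 2, and $z^{n+1} \sum_{m' \geq 1} y^{m'} = z^{n+1} y/(1-y)$ if m = 1. Summing over all labels, with the labels of the seed tree contributing g(y), and writing $\sum_{n \geq 0} h_{n,1} z^n = \frac{\partial F}{\partial y}(0, z)$,

$$
\begin{aligned}
F(y, z) &= g(y) + \sum_{n \geq 0} \sum_{m \geq 2} h_{n,m}\, z^{n+1}\, \frac{y^{m-1}}{1-y} + \sum_{n \geq 0} h_{n,1}\, z^{n+1}\, \frac{y}{1-y} \\
&= g(y) + \frac{z}{y(1-y)} \left[ F(y, z) - y\, \frac{\partial F}{\partial y}(0, z) \right] + \frac{z y}{1-y}\, \frac{\partial F}{\partial y}(0, z).
\end{aligned}
$$

Collecting the terms in F(y, z) on the left, and noting that $-\frac{z}{1-y} + \frac{zy}{1-y} = -z$, gives

$$F(y, z) \left[ 1 - \frac{z}{y(1-y)} \right] = g(y) - z\, \frac{\partial F}{\partial y}(0, z).$$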
3.3.3 Generating function for (hn,1)n≥0
We now have an equation satisfied by the bivariate generating function F(y, z). Further, we have eq. (26), which demonstrates that the desired generating function for the sequence (hn)n≥0 is obtained from $\frac{\partial F}{\partial y}(0, z)$. By applying the kernel method [2], [12], we can determine the power series $\frac{\partial F}{\partial y}(0, z)$ from eq. (27).
The idea of the method consists of coupling the two variables (z, y) as (z, y(z)) in such a way that two conditions hold. First, (i) substituting y = y(z) cancels the kernel of the equation, that is, the factor 1−z/[y(1−y)] on the left-hand side of eq. (27). Second, (ii) for z near 0, the value of y(z) remains in a sufficiently small neighborhood of y = 0, so that eq. (27) still holds near z = 0 after substituting y = y(z). This condition is required, as the power series expansion in eq. (25) for F(y, z) has been assumed to be valid in a neighborhood of (y, z) = (0, 0), and the derivation of eq. (27) relies on the fact that y and z are sufficiently close to 0. If the two conditions hold, then
$$g(y(z)) = z\,\frac{\partial F}{\partial y}(0,z),$$
so that g(y(z)) must be a power series for z near 0, because so must be $z\,\frac{\partial F}{\partial y}(0,z)$.
The required substitution couples y and z in such a way that 1 − z/[y(1 − y)] = 0, so that $y = \left(1 \pm \sqrt{1-4z}\right)/2$. To determine whether to take the negative root y1(z) or the positive root y2(z), we note that if z is near 0, then y1(z) approaches 0, so that y1(z) lies in a neighborhood of y = 0 and g(y1(z)) admits a power series expansion for z near 0. For y2(z), however, if z is near 0, then y2(z) approaches 1, and thus, g(y2(z)) is not a power series for z near 0 due to the pole of the function g(y) at y = 1 (Proposition 2). The only solution satisfying both (i) and (ii) is consequently
$$Y(z) = \frac{1 - \sqrt{1-4z}}{2}, \tag{28}$$
which, with the generating function C(z) of the Catalan numbers as in eq. (3), satisfies Y(z) = zC(z). Substituting y = Y(z) in eq. (27), we have $0 = g(Y(z)) - z\,\frac{\partial F}{\partial y}(0,z)$, yielding the following result.
Proposition 3
Fix tree t. Let g(y) be the generating function associated with the polynomial h0,m (eq. (18)). Let Y(z) be as in eq. (28). Then the generating function $f(z) = \sum_{n \ge 0} h_n z^n$ is given by
$$f(z) = \frac{g(Y(z))}{z}. \tag{29}$$
The proposition thus determines the generating function f(z) = g(Y(z))/z for the integer sequence describing the number of matching coalescent histories of species trees in the caterpillar-like family (t(n))n≥0. The function g depends on the seed tree t, whereas Y(z) is fixed in eq. (28) and does not depend on t.
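The cancellation of the kernel by Y(z) can be checked with exact integer power series. The following is a minimal sketch in Python rather than part of our Mathematica program; the helper names `catalan` and `series_mul` and the truncation order `ORDER` are illustrative choices, and the Catalan numbers are taken in their standard form $c_n = \binom{2n}{n}/(n+1)$. It builds Y(z) = zC(z) and multiplies it by 1 − Y(z), which should return the series z if the kernel 1 − z/[y(1 − y)] vanishes at y = Y(z).

```python
from math import comb


def catalan(n):
    """Standard Catalan number c_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def series_mul(a, b, order):
    """Multiply two power series given as coefficient lists, truncated at z^order."""
    out = [0] * (order + 1)
    for i, ai in enumerate(a[: order + 1]):
        if ai == 0:
            continue
        for j, bj in enumerate(b[: order + 1 - i]):
            out[i + j] += ai * bj
    return out


ORDER = 12  # illustrative truncation order

# Y(z) = z C(z): the coefficient of z^n is c_{n-1} for n >= 1.
Y = [0] + [catalan(n - 1) for n in range(1, ORDER + 1)]

# 1 - Y(z)
one_minus_Y = [1] + [-c for c in Y[1:]]

# If y = Y(z) cancels the kernel, then Y(z) * (1 - Y(z)) is exactly z.
kernel_product = series_mul(Y, one_minus_Y, ORDER)
print(kernel_product)
```

Because Y(z) has no constant term, it also stays near 0 for z near 0, which is condition (ii) of the kernel method.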
As an example, recall that for t = ((A, B), (C, D)), in eq. (24), we have computed the generating function g for the number h0,m of m-rooted histories of t = t(0). By Proposition 3, the generating function for the number hn of matching coalescent histories of t(n) is
$$f(z) = \frac{g(Y(z))}{z} = \frac{4Y(z) - 3Y(z)^2 + Y(z)^3}{z\,[1 - Y(z)]^3}.$$
Taking the Taylor expansion of f, we obtain
$$f(z) = 4 + 13z + 42z^2 + 138z^3 + 462z^4 + 1573z^5 + \cdots \tag{30}$$
The coefficients hn accord with the enumeration of matching coalescent histories reported in Corollary 3.9 of [13] and Table 3 of [14] for caterpillar-like families with seed tree t = ((A, B), (C, D)), except that those results tabulated numbers of coalescent histories by the number of taxa, whereas here, we use the index of the caterpillar-like family. Thus, in this example, the coefficient of zn gives the number of matching coalescent histories for a tree with n+4 taxa, as |t| = 4. Shifting the index in the formula from [13], [14] to agree with our indexing scheme, we obtain [(5(n+4) − 12)/(4(n + 4) − 6)]c(n+4)−1 = [(5n + 8)/(4n + 10)]cn+3 for the number of matching coalescent histories of t(n). This formula gives precisely the coefficients in the Taylor expansion in eq. (30).
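The agreement between Proposition 3 and the closed form can be reproduced with a short computation. The sketch below is a Python illustration, not our Mathematica implementation. It assumes that for t = ((A, B), (C, D)) the number of m-rooted histories is h0,m = (m + 1)², an assumption that is consistent with the closed form quoted above, and it expands f(z) = g(Y(z))/z by summing h0,m Y(z)^m term by term. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def catalan(n):
    """Standard Catalan number c_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def series_mul(a, b, order):
    """Multiply two power series (coefficient lists), truncated at z^order."""
    out = [0] * (order + 1)
    for i, ai in enumerate(a[: order + 1]):
        if ai == 0:
            continue
        for j, bj in enumerate(b[: order + 1 - i]):
            out[i + j] += ai * bj
    return out


def matching_histories_series(h0, order):
    """Coefficients h_0..h_order of f(z) = g(Y(z))/z, where g(y) = sum_{m>=1} h0(m) y^m."""
    top = order + 1  # g(Y(z)) is needed up to z^(order + 1)
    Y = [0] + [catalan(n - 1) for n in range(1, top + 1)]
    power = [1] + [0] * top  # Y(z)^0
    total = [0] * (top + 1)
    for m in range(1, top + 1):
        power = series_mul(power, Y, top)  # Y(z)^m, which starts at z^m
        weight = h0(m)
        for n in range(top + 1):
            total[n] += weight * power[n]
    return total[1:]  # dividing by z shifts the coefficients down by one


def closed_form(n):
    """[(5n + 8)/(4n + 10)] c_{n+3}, the formula quoted from [13], [14]."""
    return Fraction(5 * n + 8, 4 * n + 10) * catalan(n + 3)


# Assumption: h_{0,m} = (m + 1)^2 for the seed tree ((A, B), (C, D)).
coeffs = matching_histories_series(lambda m: (m + 1) ** 2, 5)
print(coeffs)
print([closed_form(n) for n in range(6)])
```

Only the terms with m ≤ n + 1 contribute to the coefficient of z^n, so the truncated sum is exact for the coefficients that are returned.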
3.4 Asymptotic behavior of hn
From Proposition 3, we have the generating function f that counts matching histories of t(n) for a given fixed seed tree t. Applying techniques of analytic combinatorics as introduced in Section 2.3, we can determine the asymptotic behavior of the coefficients of the generating function
$$\tilde{f}(z) = z f(z) = g(Y(z)), \tag{31}$$
with Y(z) as in eq. (28). To simplify notation, we work with f̃ instead of f.
First, in Section 3.4.1, we obtain an asymptotic equivalence between hn and βtcn, where βt is a constant depending on the seed tree t, and the cn are the Catalan numbers (eq. (1)). Next, in Section 3.4.2, we produce a general procedure to determine the constants βt, employing this procedure to obtain values of βt for all seed trees t with |t| ≤ 9. We demonstrate that our values of βt accord with constant multiples of the Catalan numbers previously obtained according to a different method [14] for seed trees with |t| ≤ 8.
3.4.1 A general asymptotic result
Recall that given t, Proposition 2 provides a procedure to determine the rational function g in eq. (31). Writing g as the finite linear combination in eq. (23), the values of b, J, and the (aj)1≤j≤J and (qj)1≤j≤J can all be computed.
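As noted in Section 2.3, the expansion of f̃ at its dominant singularity characterizes the asymptotic behavior of the coefficients hn−1. Appendix 2 obtains this expansion at the dominant singularity $z = \frac{1}{4}$,
$$\tilde{f}(z) = \alpha_t + \beta_t\, Y(z) \pm O(1-4z) \tag{32}$$
$$\phantom{\tilde{f}(z)} = \alpha_t + \beta_t\, \frac{1 - \sqrt{1-4z}}{2} \pm O(1-4z), \tag{33}$$
with
$$\alpha_t = 2^{b} \sum_{j=1}^{J} q_j\, \frac{1 - a_j - b}{2^{a_j}}, \tag{34}$$
$$\beta_t = 2^{b+1} \sum_{j=1}^{J} q_j\, \frac{a_j + b}{2^{a_j}}. \tag{35}$$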
Note that in eq. (32), the seed tree affects only the constants αt and βt computed in eqs. (34) and (35) from g, as written in the linear combination in eq. (23). Excluding the constant αt that does not influence the asymptotic behavior of the coefficients, the main term of the expansion of f̃(z) (eq. (33)) is the product of βt and the generating function $\left(1 - \sqrt{1-4z}\right)/2 = zC(z)$, whose nth coefficient is the Catalan number cn−1 (eq. (4)).
Theorem VI.4 of [8] indicates that under conditions satisfied by f̃, the asymptotic coefficients of a generating function as n → ∞ are obtained from the expansion of the function at the dominant singularity; moreover, the error term in the asymptotic coefficients can be computed from the error term in the singular expansion. Applying the theorem to the expansion in eq. (32), we obtain the asymptotic behavior of the coefficients [zn]f̃(z) = hn−1.
Proposition 4
For any seed tree t, when n → ∞, the number hn of matching coalescent histories for t(n) satisfies
$$h_{n-1} = \beta_t\, c_{n-1} \pm O\!\left(\frac{4^n}{n^2}\right), \tag{36}$$
where βt is a constant that depends on t. The constant βt is computed in eq. (35) once the function g, defined in eq. (18), is written as the linear combination in eq. (23).
We immediately obtain the following corollary, corresponding to our initial claim in eq. (6).
Corollary 1
For any seed tree t, there exists a constant βt > 0 (eq. (35)) such that when n → ∞,
$$h_n \sim \beta_t\, c_n. \tag{37}$$
Proof
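The result follows from Proposition 4 by noting that if βt > 0, then
$$\lim_{n \to \infty} \frac{h_{n-1}}{\beta_t\, c_{n-1}} = 1,$$
because by eq. (5), the error term $O(4^n/n^2)$ in eq. (36) is of smaller order than $c_{n-1}$.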
Note that we are claiming βt > 0. From the definition of βt in eq. (35), because the qj are permitted to be negative, it is not immediately clear that βt > 0. Proposition 4 eliminates the possibility that βt is negative, as hn−1 is necessarily positive. To show that βt ≠ 0, note that by eq. (36), βt = 0 would give
$$h_{n-1} = O\!\left(\frac{4^n}{n^2}\right), \tag{38}$$
so that hn−1/(4n/n2) would remain bounded by a constant as n → ∞.
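We now apply the lower bound hn ≥ cn+1 from eq. (7). By eq. (7), we have
$$\frac{h_{n-1}}{4^n/n^2} \ge \frac{c_n}{4^n/n^2} = \sqrt{\frac{n}{\pi}} \cdot \frac{c_n}{4^n/\left(n^{3/2}\sqrt{\pi}\right)}.$$
As n → ∞, $\sqrt{n/\pi}$ diverges to ∞, while $c_n / \left[4^n/\left(n^{3/2}\sqrt{\pi}\right)\right]$ converges to 1 by eq. (5). Therefore, the sequence hn−1/(4n/n2) must diverge and eq. (38) cannot hold. Thus, βt ≠ 0.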
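As an example of Corollary 1, consider t = ((A, B), (C, D)). By decomposing the function g of eq. (24) as in eq. (23), we have already obtained the parameters b, J, (aj)1≤j≤J, and (qj)1≤j≤J in Section 3.3.1. Therefore, computing βt as in eq. (35), we obtain
$$\beta_t = 2^{b+1} \sum_{j=1}^{J} q_j\, \frac{a_j + b}{2^{a_j}} = 2^{4}\left(4 \cdot \frac{4}{2} - 3 \cdot \frac{5}{4} + 1 \cdot \frac{6}{8}\right) = 80.$$
Eq. (37) then produces hn ~ 80cn. Note that the limit produced for this tree from hn = [(5n + 8)/(4n + 10)]cn+3 in Section 3.3.3 agrees with the limiting result hn ~ 80cn. Recalling eq. (2),
$$\lim_{n \to \infty} \frac{h_n}{c_n} = \lim_{n \to \infty} \frac{5n+8}{4n+10} \cdot \frac{c_{n+3}}{c_n} = \frac{5}{4} \cdot 4^3 = 80.$$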
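The convergence of hn/cn to 80 for this seed tree can also be observed numerically. The sketch below is a Python illustration that uses only the closed form hn = [(5n + 8)/(4n + 10)]cn+3 quoted in Section 3.3.3 and the standard formula for the Catalan numbers; the function names and the sample values of n are arbitrary choices.

```python
from fractions import Fraction
from math import comb


def catalan(n):
    """Standard Catalan number c_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def h(n):
    """Closed form [(5n + 8)/(4n + 10)] c_{n+3} for seed tree ((A, B), (C, D))."""
    return Fraction(5 * n + 8, 4 * n + 10) * catalan(n + 3)


def ratio(n):
    """Exact value of h_n / c_n."""
    return h(n) / catalan(n)


# The ratio increases toward the limiting constant beta_t = 80.
ratios = {n: float(ratio(n)) for n in (10, 100, 1000)}
for n, value in ratios.items():
    print(n, value)
```

The ratio approaches 80 from below at a rate proportional to 1/n, so moderately large n is needed before the asymptotic constant becomes visible.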
3.4.2 Determining βt from the seed tree t
We have shown in Corollary 1 that the number of matching coalescent histories hn for the caterpillar-like family t(n) is, for a constant βt, asymptotic to βtcn. We can now assemble our results to describe a procedure that given a seed tree t with |t| ≥ 2 determines both the generating function with coefficients hn and the constant βt.
(i) Determine by eq. (9) the polynomial et,k in k ≥ 0 that counts k-extended histories of t.
(ii) Compute from eq. (8) the polynomial in m that counts for m ≥ 1 the number of m-rooted histories of t.
(iii) Obtain the generating function with coefficients h0,m by using Proposition 2.
(iv) Determine the generating function with coefficients hn by applying Proposition 3.
(v) Write g(y) as a linear combination according to eq. (23), determining the values of b, J, and the aj and qj.
(vi) Compute the asymptotic constant βt from eq. (35).
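Steps (iii), (v), and (vi) can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the code of CatFamily.nb. Rather than the recursion of Proposition 2, it uses the standard fact that if h0,m is a polynomial of degree d in m, then $(1-y)^{d+1} g(y)$ is a polynomial of degree at most d + 1, so that the decomposition of eq. (23) can be read off with b = d + 1 and aj = j. It also takes βt to be the derivative g′(1/2), which is how the expansion of Appendix 2 is read here. The function names `decompose_g` and `beta` are hypothetical.

```python
from fractions import Fraction
from math import comb


def decompose_g(h0, degree):
    """Return (b, q) such that g(y) = sum_j q[j] * y^j / (1 - y)^b.

    Here g(y) = sum_{m >= 1} h0(m) y^m, and h0 is assumed to be a
    polynomial in m of the given degree.
    """
    b = degree + 1
    # Coefficients of y^0..y^b in g(y); the sum starts at m = 1.
    series = [0] + [h0(m) for m in range(1, b + 1)]
    # Coefficients of (1 - y)^b.
    binom = [(-1) ** i * comb(b, i) for i in range(b + 1)]
    # Numerator (1 - y)^b * g(y), a polynomial of degree at most b.
    q = [sum(series[j - i] * binom[i] for i in range(j + 1)) for j in range(b + 1)]
    return b, q


def beta(b, q):
    """g'(1/2) for g(y) = sum_j q[j] * y^j / (1 - y)^b (assumed form of beta_t)."""
    return sum(
        Fraction(qj * (j + b) * 2 ** (b + 1), 2 ** j) for j, qj in enumerate(q)
    )


# Assumption: h_{0,m} = (m + 1)^2 for the seed tree ((A, B), (C, D)).
b, q = decompose_g(lambda m: (m + 1) ** 2, 2)
beta_t = beta(b, q)
print(b, q, beta_t)
```

Step (i), the recursive computation of et,k from the shape of t by eq. (9), is deliberately left out of this sketch; the polynomial h0,m is supplied directly.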
We have programmed this procedure in Mathematica; starting from a given seed tree t, our program CatFamily.nb can automatically compute for the caterpillar-like family t(n) the generating function with coefficients hn and the asymptotic constant βt. Using this program, we have evaluated βt for each seed tree with 9 taxa, collecting the results in Table 1.
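TABLE 1.
| Seed tree t | βt | $\beta_t^*$ | Seed tree t | βt | $\beta_t^*$ |
|---|---|---|---|---|---|
| | 65,536 | 1 | | 128,864 | 4027/2048 |
| | 81,920 | 5/4 | | 166,624 | 5207/2048 |
| | 94,208 | 23/16 | | 197,296 | 12331/4096 |
| | 104,448 | 51/32 | | 224,704 | 3511/1024 |
| | 138,240 | 135/64 | | 308,576 | 9643/2048 |
| | 118,784 | 29/16 | | 262,000 | 16375/4096 |
| | 113,408 | 443/256 | | 250,272 | 7821/2048 |
| | 148,480 | 145/64 | | 339,504 | 21219/4096 |
| | 177,664 | 347/128 | | 417,632 | 13051/2048 |
| | 141,312 | 69/32 | | 326,240 | 10195/2048 |
| | 193,536 | 189/64 | | 464,128 | 1813/256 |
| | 121,472 | 949/512 | | 182,912 | 1429/512 |
| | 157,888 | 2467/1024 | | 243,904 | 3811/1024 |
| | 187,776 | 1467/512 | | 296,064 | 2313/512 |
| | 214,720 | 3355/1024 | | 344,512 | 5383/1024 |
| | 296,192 | 1157/256 | | 487,808 | 3811/512 |
| | 251,136 | 981/256 | | 410,112 | 801/128 |
| | 162,560 | 635/256 | | 214,016 | 209/64 |
| | 219,136 | 107/32 | | 306,112 | 4783/1024 |
| | 268,288 | 131/32 | | 294,784 | 2303/512 |
| | 177,664 | 347/128 | | 425,216 | 1661/256 |
| | 249,344 | 487/128 | | 366,720 | 2865/512 |
| | 353,536 | 1381/256 | | 532,224 | 2079/256 |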
Values of βt appear for each of the 46 unlabeled species trees with 9 taxa. For each species tree t, we also provide the constant $\beta_t^* = \beta_t/4^{|t|-1}$ (eq. (40)). Trees are listed in increasing order by rank as defined in Section 2 of [14]. In the left column, each seed tree t belongs to a caterpillar-like family (t̃(n))n, with |t̃| < 9. In these cases, we recover the values of $\beta_t^*$ as determined in Table 3 of [14].
Recall that Rosenberg [14] reported the asymptotic constant multiples of the Catalan numbers, $\beta_t^* c_{m-1}$, which represent asymptotic numbers of coalescent histories for seed trees with up to 8 taxa, indexing the results by the number of taxa m rather than by the index n of the caterpillar-like family. Also recall that for seed tree t, tree t(n) has m = |t| + n taxa (Fig. 1). In the notation of [14], writing Atm,1 as the number of matching coalescent histories in the caterpillar-like tree with seed tree t and m ≥ |t| taxa, we have hn = Atm,1.
By eq. (5), we have the asymptotic equivalence cn ~ cn+k/4k for each positive integer k. Therefore,
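$$h_n \sim \beta_t\, c_n \sim \beta_t\, \frac{c_{n+|t|-1}}{4^{|t|-1}} = \beta_t^*\, c_{m-1}, \tag{39}$$
where the asymptotic constant βt of Corollary 1 is normalized to obtain
$$\beta_t^* = \frac{\beta_t}{4^{|t|-1}}. \tag{40}$$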
This computation converts the asymptotic constant multiple βt of cn into a corresponding multiple of cm−1, as reported in [14] for small trees. Comparing Table 1 with Table 3 of [14], we see that for the cases examined by [14], the values of $\beta_t^*$ we compute from the associated βt agree with the values that were previously reported. This agreement is unsurprising; our method for calculating the constants βt and $\beta_t^*$ is simply a computational implementation based on our theorems, and the agreement confirms the validity of the implementation. Although [14] considered only |t| ≤ 8, our method applies for arbitrary |t|.
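The normalization is a single exact division. As a small Python illustration, with the hypothetical helper name `normalized_constant` and with the normalized constant read as βt divided by 4^{|t|−1}, the first two values of βt in Table 1 for 9-taxon seed trees give:

```python
from fractions import Fraction


def normalized_constant(beta_t, num_taxa):
    """Convert a multiple of c_n into a multiple of c_{m-1}, with m = |t| + n.

    Assumes the normalization beta_t / 4^(|t| - 1).
    """
    return Fraction(beta_t, 4 ** (num_taxa - 1))


# First two entries of the left column of Table 1 (|t| = 9).
first = normalized_constant(65536, 9)
second = normalized_constant(81920, 9)
print(first, second)
```

Exact rational arithmetic is used so that the normalized constants can be compared directly with fractions tabulated elsewhere.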
Evaluation of βt proceeds quadratically in |t|. The recursive step (i) requires at most |t| − 1 recursive calls, one for each internal node of t. Step (ii) is a polynomial subtraction at most linear in |t|, producing the polynomial h0,m with order at most equal to the order of et,m minus 1, that is, at most |t| − 2. Step (iii) determines the generating function g(y) (eq. (18)) from h0,m and the generating functions gmk (y) (eq. (20)). For each k with 0 ≤ k ≤ |t| − 2, gmk (y) is computed in k recursive calls of eq. (22). As the order of h0,m is at most |t| − 2, the total cost for calculating g(y) is thus quadratic in |t|. Steps (iv), (v), and (vi) do not involve recursion and are at most linear in |t|. Thus, because step (iii) is the most expensive step, we see that the cost of the procedure that determines the asymptotic constant βt increases as $O(|t|^2)$.
4 Conclusions
In this paper, we have solved a problem left open by [14] on determining the number of coalescent histories for gene trees and species trees that have a matching labeled topology and that belong to a generic caterpillar-like family. We have proven that for any seed tree t, the integer sequence (hn)n≥0, whose nth element represents the number of matching coalescent histories of the caterpillar-like tree t(n), grows asymptotically as a constant multiple of the Catalan numbers, that is, hn ~ βtcn, where the constant βt > 0 depends on the shape of the seed tree t. Rosenberg [14] had previously obtained this result for seed trees with at most 8 taxa; here, by using a succession rule for recursive enumeration and then applying techniques of analytic combinatorics, we have not only proven the existence of the constant βt for seed trees of any size, we have also produced a procedure that computes βt as well as the expression for the generating function of the integers (hn)n≥0.
The numerical results on the constants βt extend the empirical observation of [14] that the caterpillar-like families that produce the largest numbers of matching coalescent histories are those whose seed tree has a high level of balance. By extending from seed trees with |t| ≤ 8 taxa to those with |t| = 9, we observe that the constants βt for the caterpillar-like families with the largest and smallest numbers of matching coalescent histories become further separated, so that for n large, many more coalescent histories exist by which a gene tree can match the species tree for some species trees than for others. For the 9-taxon seed tree with the largest $\beta_t^*$, $\beta_t^* = 2079/256 \approx 8.12$, compared to $\beta_t^* = 1$ for the seed tree with the smallest $\beta_t^*$. Our procedure for evaluating βt and $\beta_t^*$ as a function of the seed tree can now enable further systematic analyses of the correlates of the constants βt and $\beta_t^*$, to facilitate additional explorations of determinants of the numbers of matching coalescent histories.
Nevertheless, although the constants βt and $\beta_t^*$ do depend on the seed tree, we have shown that otherwise, all caterpillar-like families are asymptotically equivalent in their numbers of matching coalescent histories. Computation time is often a challenge in phylogenetic problems, as the discrete structures of phylogenetics can grow rapidly in number with the number of taxa. Our results contribute to the study of computational complexity in phylogenetics, as the complexity of the evaluation of probabilities important in characterizing gene tree distributions [5] is proportional to the number of coalescent histories. That all caterpillar-like families have the same growth pattern up to a constant suggests that as the number of taxa increases, such evaluations will be comparably complex for all caterpillar-like trees. In large trees, the caterpillar branches contribute to the asymptotic growth of the number of matching coalescent histories (which follows a multiple of the Catalan numbers), and the seed tree contributes only to the constant by which the Catalan numbers are multiplied.
The extent to which other tree families follow the Catalan sequence in their numbers of matching coalescent histories remains unknown, though we have recently found a family, the lodgepole family (defined iteratively by setting λ0 to a tree with one taxon and sequentially forming λn+1 by appending λn and a cherry to a shared root), for which the number of matching coalescent histories grows faster than a constant multiple of the Catalan numbers [6]. Further analysis of this heterogeneous behavior of the increase in the number of coalescent histories will be useful in performing comparisons of coalescent history algorithms with algorithms that obtain similar phylogenetic probabilities but that do not rely on coalescent histories [20]. The use of our substantially different approach employing analytic combinatorics opens new methods for theoretical analysis of coalescent histories and can potentially assist in understanding when Catalan-like growth, the rapid growth of the lodgepole family, and intermediate or perhaps still faster growth patterns will apply.
We note, however, that our strategy for evaluating the asymptotic properties of the number of coalescent histories in caterpillar-like families has, like the work of [14], relied on the fact that the difficulty of the general problem of enumerating coalescent histories is partly evaded by restricting attention to caterpillar-like trees. In the recursion for the number of coalescent histories given a matching gene tree and species tree [13, eq. 1], a term arising from the subtree with fewer branches collapses to 1 for the caterpillar case, greatly simplifying the recursion. This reduction enabled the work of [14] for caterpillar-like families, and it also enables our approach of iteratively adding single-taxon branches to define the operator Ω and the generating function with coefficients hn,m. Thus, in enumerating coalescent histories for matching lodgepole gene trees and species trees, we proceeded by a different method, establishing a bijection between coalescent histories and established combinatorial structures [6]. We do expect, however, that a generating function approach will be fruitful in other scenarios, perhaps including cases with gene trees and species trees that are caterpillar-like, but non-matching.
Acknowledgments
We acknowledge grant support from the National Science Foundation (DBI-1146722) and the National Institutes of Health (R01 GM117590). A Mathematica notebook CatFamily.nb implementing the procedure in Section 3.4.2 for obtaining from a seed tree t the generating function f(z), the coefficients hn, and the constant βt is available from the authors.
Biographies
Filippo Disanto received the PhD degree in theoretical computer science from both the University of Siena and the University of Paris VII in 2010. After receiving the PhD degree, he was a postdoc at CNRS in Montpellier and at the Institut für Genetik, University of Cologne. Since November 2013, he has been a postdoc in the Rosenberg Laboratory, Stanford University. His main research interests include combinatorics and its applications.
Noah A. Rosenberg received the PhD degree in biological sciences from Stanford University in 2001 and completed postdoctoral training at the University of Southern California. He was on the faculty of the University of Michigan from 2005 to 2011, and he is currently a professor in the Department of Biology at Stanford University. His research interests include human evolutionary genetics, population-genetic theory, and mathematical phylogenetics.
Appendix 1. The equation for F(y, z)
In this appendix, we complete the derivation of eq. (27) satisfied by F(y, z). In the generating function F(y, z) (eq. (25)), each monomial znym corresponds to a label (n, m) ∈ Ln that in turn represents an m-rooted history of t(n). Recall that the multisets of labels L0, L1, L2, . . . (eq. (14)) can be iteratively generated according to eq. (17) through the operator Ω defined in eq. (13), starting from the multiset L0. Also recall that by considering the multiset of labels $L = \bigcup_{n \ge 0} L_n$, we can write $F(y,z) = \sum_{(n,m) \in L} z^n y^m$. We use the iterative generation of the family of multisets (Ln)n≥0 to obtain an equation for F.
By eq. (13), for n ≥ 0 and m ≥ 2, for each occurrence in Ln of a label (n, m), a copy of each label in set
$$\Omega((n,m)) = \{(n+1, j) : j \ge m-1\}$$
belongs to the multiset Ln+1. Thus, in algebraic terms, each time that an expression znym with n ≥ 0 and m ≥ 2 is counted in the generating function F (written znym ∈ F in what follows), the terms $z^{n+1} \sum_{j \ge m-1} y^j$ appear in F as well. Summing over all znym ∈ F with n ≥ 0 and m ≥ 2, we obtain
$$\sum_{\substack{z^n y^m \in F \\ n \ge 0,\; m \ge 2}} z^{n+1} \sum_{j \ge m-1} y^j. \tag{41}$$
Similarly, for n ≥ 0 and m = 1, for each occurrence in Ln of a label (n, 1), a copy of each label in set Ω((n, 1)) = {(n + 1, j) : j ≥ 1} appears in multiset Ln+1. Thus, for each term zny ∈ F, with n ≥ 0, the terms $z^{n+1} \sum_{j \ge 1} y^j$ are counted in F as well. Summing these terms for all zny ∈ F with n ≥ 0,
$$\sum_{\substack{z^n y \in F \\ n \ge 0}} z^{n+1} \sum_{j \ge 1} y^j. \tag{42}$$
Notice that the sum of the expressions in eqs. (41) and (42) is the algebraic representation of the multiset of labels L \ L0. More precisely, each term znym ∈ F associated with a label (n, m) ∈ Ln, with n ≥ 1, is counted—and counted exactly once—in the sum of eqs. (41) and (42). Therefore, to complete the description of F, we require only those terms z0ym associated with labels (0, m) ∈ L0. These terms are represented
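$$\sum_{m \ge 1} h_{0,m}\, y^m = g(y), \tag{43}$$
considering that $L_0$ contains each label (0, m) with multiplicity $h_{0,m}$ (eq. (15)) and that by definition, $g(y) = \sum_{m \ge 1} h_{0,m}\, y^m$ (eq. (18)).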
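We can now equate the full generating function F(y, z) to the sum of eqs. (43), (41), and (42), obtaining
$$F(y,z) = g(y) + \sum_{\substack{z^n y^m \in F \\ n \ge 0,\; m \ge 2}} z^{n+1} \sum_{j \ge m-1} y^j + \sum_{\substack{z^n y \in F \\ n \ge 0}} z^{n+1} \sum_{j \ge 1} y^j.$$
Applying the fact that $\sum_{j \ge k} y^j = y^k/(1-y)$ for y near 0 in the complex plane, we then have
$$F(y,z) = g(y) + \frac{z}{y(1-y)} \sum_{\substack{z^n y^m \in F \\ n \ge 0,\; m \ge 2}} z^n y^m + \frac{z}{1-y} \sum_{\substack{z^n y \in F \\ n \ge 0}} z^n y. \tag{44}$$
By eq. (25) and the fact that the multisets Ln of labels (n, m) for m-rooted histories of t(n) have hn,m elements,
$$\sum_{\substack{z^n y^m \in F \\ n \ge 0,\; m \ge 2}} z^n y^m = F(y,z) - y\,\frac{\partial F}{\partial y}(0,z), \qquad \sum_{\substack{z^n y \in F \\ n \ge 0}} z^n y = y\,\frac{\partial F}{\partial y}(0,z).$$
Substituting in eq. (44), the last two expressions yield
$$F(y,z) = g(y) + \frac{z}{y(1-y)}\left[F(y,z) - y\,\frac{\partial F}{\partial y}(0,z)\right] + \frac{zy}{1-y}\,\frac{\partial F}{\partial y}(0,z), \tag{45}$$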
which can be rewritten as in eq. (27).
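The iterative generation of the multisets Ln can also be carried out directly on the counts hn,m. The sketch below is a Python illustration under two labeled assumptions: that the operator Ω sends a label (n, m) with m ≥ 2 to the labels (n + 1, j) with j ≥ m − 1, so that together with Ω((n, 1)) the counts satisfy $h_{n+1,j} = \sum_{m=1}^{j+1} h_{n,m}$, and that h0,m = (m + 1)² for the seed tree ((A, B), (C, D)). The function name `iterate_histories` is hypothetical.

```python
def iterate_histories(h0, n_max):
    """Return [h_{0,1}, h_{1,1}, ..., h_{n_max,1}] by iterating the succession rule.

    Assumed rule: a label (n, m) with m >= 2 produces (n + 1, j) for all
    j >= m - 1, and a label (n, 1) produces (n + 1, j) for all j >= 1, so that
    h_{n+1,j} is the sum of h_{n,m} over 1 <= m <= j + 1.
    """
    # h_{n,1} depends on h_{0,m} only for m <= n + 1, so a finite window suffices.
    width = n_max + 1
    row = [h0(m) for m in range(1, width + 1)]  # row[m - 1] = h_{0,m}
    firsts = [row[0]]
    for _ in range(n_max):
        prefix = row[0]
        new_row = []
        for j in range(1, len(row)):
            prefix += row[j]        # running sum of h_{n,1..j+1}
            new_row.append(prefix)  # new_row[j - 1] = h_{n+1,j}
        row = new_row               # the valid window shrinks by one entry
        firsts.append(row[0])
    return firsts


# Assumption: h_{0,m} = (m + 1)^2 for the seed tree ((A, B), (C, D)).
h_values = iterate_histories(lambda m: (m + 1) ** 2, 5)
print(h_values)
```

Under these assumptions, the values hn,1 obtained by iteration agree with the closed form [(5n + 8)/(4n + 10)]cn+3 from [13], [14] for the values of n computed here.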
Appendix 2. The dominant singularity and singular expansion of f̃(z)
This appendix obtains the singular expansion of f̃(z) described in eq. (32). In eq. (31), we have defined f̃(z) as a composition f̃(z) = g(Y(z)), with the internal function Y(z) as in eq. (28) and the external function g(y) as in eq. (23). Owing to the presence of the square root in the expression for Y(z), the dominant singularity of the internal function Y(z), the singularity nearest the origin of the complex plane, is at $z = \frac{1}{4}$. Computing the value of Y(z) at its dominant singularity, we obtain $Y(\frac{1}{4}) = \frac{1}{2}$. In particular, we have $Y(\frac{1}{4}) < 1$, where 1 is the radius of convergence of the series corresponding to the external function g in f̃. Indeed, it immediately follows from Proposition 2 that y = 1 is the dominant singularity of g(y).
As detailed in Section VI.9 of [8], on dominant singularities of compositions, we are in the setting of the subcritical case, in which the inequality $Y(\frac{1}{4}) < 1$ implies that the dominant singularity of g(Y(z)) coincides with the dominant singularity of the internal function Y(z) rather than the dominant singularity y = 1 of the external function g(y). The desired singular expansion of f̃(z) = g(Y(z)) at the dominant singularity $z = \frac{1}{4}$ can be obtained by inserting y = Y(z) in the regular (non-singular) expansion of g(y) at $y = \frac{1}{2}$.
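To recover the expansion of g(y) at $y = \frac{1}{2}$, we expand and then sum each term qj[yaj/(1 − y)b] of the finite linear combination in eq. (23). At $y = \frac{1}{2}$, each of these terms is an analytic function, and we can thus use Taylor's formula to produce the desired expansion. We obtain at $y = \frac{1}{2}$
$$q_j\, \frac{y^{a_j}}{(1-y)^{b}} = q_j\, 2^{b-a_j}(1 - a_j - b) + q_j\, 2^{b-a_j+1}(a_j + b)\, y \pm O\!\left(\left(y - \tfrac{1}{2}\right)^2\right).$$
By summing over the indices 1 ≤ j ≤ J of eq. (23), the expansion of g(y) at $y = \frac{1}{2}$ is
$$g(y) = \alpha_t + \beta_t\, y \pm O\!\left(\left(y - \tfrac{1}{2}\right)^2\right), \tag{46}$$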
with the constants αt and βt defined as in eqs. (34) and (35). Plugging y = Y(z) from eq. (28) into eq. (46), we finally obtain the singular expansion of f̃(z) at $z = \frac{1}{4}$ as in eq. (32).
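The quality of this expansion can be examined numerically for the seed tree t = ((A, B), (C, D)). The sketch below is a Python illustration under labeled assumptions: g(y) = (4y − 3y² + y³)/(1 − y)³ for this seed tree, αt read as g(1/2) − g′(1/2)/2 = −29, and βt read as g′(1/2) = 80. If the remainder is of order 1 − 4z, then the error divided by 1 − 4z should remain bounded as z approaches 1/4 from below.

```python
from math import sqrt


def g(y):
    """Assumed generating function of h_{0,m} for seed tree ((A, B), (C, D))."""
    return (4 * y - 3 * y**2 + y**3) / (1 - y) ** 3


def Y(z):
    """Branch of the kernel root that vanishes at z = 0."""
    return (1 - sqrt(1 - 4 * z)) / 2


ALPHA = -29.0  # assumed alpha_t = g(1/2) - g'(1/2)/2
BETA = 80.0    # assumed beta_t = g'(1/2)


def scaled_error(eps):
    """Remainder of the singular expansion at z = (1 - eps)/4, divided by eps."""
    z = (1 - eps) / 4
    return (g(Y(z)) - (ALPHA + BETA * Y(z))) / eps


scaled = [scaled_error(eps) for eps in (1e-2, 1e-3, 1e-4)]
print(scaled)
```

A bounded scaled error is what the transfer from the singular expansion to the O(4^n/n²) error term in the coefficients relies on.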
Contributor Information
Filippo Disanto, Department of Biology, Stanford University, Stanford, CA, USA. fdisanto@stanford.edu.
Noah A. Rosenberg, Department of Biology, Stanford University, Stanford, CA, USA. noahr@stanford.edu.
REFERENCES
1. Allman ES, Degnan JH, Rhodes JA. Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent. J. Math. Biol. 2011;62:833–862. doi: 10.1007/s00285-010-0355-7.
2. Banderier C, Bousquet-Mélou M, Denise A, Flajolet P, Gardy D, Gouyou-Beauchamps D. Generating functions for generating trees. Discr. Math. 2002;246:29–55.
3. Barcucci E, Del Lungo A, Pergola E, Pinzani R. ECO: a methodology for the enumeration of combinatorial objects. J. Differ. Equ. Appl. 1999;5:435–490.
4. Degnan JH. Gene tree distributions under the coalescent process. PhD thesis. University of New Mexico; Albuquerque: 2005.
5. Degnan JH, Salter LA. Gene tree distributions under the coalescent process. Evolution. 2005;59:24–37.
6. Disanto F, Rosenberg NA. Coalescent histories for lodgepole species trees. J. Comp. Biol. 2015;22. doi: 10.1089/cmb.2015.0015.
7. Dutheil JY, Ganapathy G, Hobolth A, Mailund T, Uyenoyama MK, Schierup MH. Ancestral population genomics: the coalescent hidden Markov model approach. Genetics. 2009;183:259–274. doi: 10.1534/genetics.109.103010.
8. Flajolet P, Sedgewick R. Analytic Combinatorics. Cambridge University Press; Cambridge: 2009.
9. Graham RL, Knuth DE, Patashnik O. Concrete Mathematics. 2nd ed. Addison-Wesley; Boston: 2008.
10. Hobolth A, Christensen OF, Mailund T, Schierup MH. Genomic relationships and speciation times of human, chimpanzee, and gorilla inferred from a coalescent hidden Markov model. PLoS Genet. 2007;3:294–304. doi: 10.1371/journal.pgen.0030007.
11. Hobolth A, Dutheil JY, Hawks J, Schierup MH, Mailund T. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome Res. 2011;21:349–356. doi: 10.1101/gr.114751.110.
12. Prodinger H. The kernel method: a collection of examples. Sém. Lothar. Combin. 2004;50:B50f.
13. Rosenberg NA. Counting coalescent histories. J. Comp. Biol. 2007;14:360–377. doi: 10.1089/cmb.2006.0109.
14. Rosenberg NA. Coalescent histories for caterpillar-like families. IEEE/ACM Trans. Comp. Biol. Bioinf. 2013;10:1253–1262. doi: 10.1109/tcbb.2013.123.
15. Rosenberg NA, Degnan JH. Coalescent histories for discordant gene trees and species trees. Theor. Pop. Biol. 2010;77:145–151. doi: 10.1016/j.tpb.2009.12.004.
16. Rosenberg NA, Tao R. Discordance of species trees with their most likely gene trees: the case of five taxa. Syst. Biol. 2008;57:131–140. doi: 10.1080/10635150801905535.
17. Stanley RP. Enumerative Combinatorics Volume 2. Cambridge University Press; New York: 1999.
18. Than CV, Rosenberg NA. Consistency properties of species tree inference by minimizing deep coalescences. J. Comp. Biol. 2011;18:1–15. doi: 10.1089/cmb.2010.0102.
19. Than C, Ruths D, Innan H, Nakhleh L. Confounding factors in HGT detection: statistical error, coalescent effects, and multiple solutions. J. Comp. Biol. 2007;14:517–535. doi: 10.1089/cmb.2007.A010.
20. Wu Y. Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood. Evolution. 2012;66:763–775. doi: 10.1111/j.1558-5646.2011.01476.x.
21. Yu Y, Than C, Degnan JH, Nakhleh L. Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. Syst. Biol. 2011;60:138–149. doi: 10.1093/sysbio/syq084.