2024 Mar 22;86(5):45. doi: 10.1007/s11538-024-01270-8

Enumeration of Rooted Binary Unlabeled Galled Trees

Lily Agranat-Tamir 1, Shaili Mathur 1, Noah A Rosenberg 1
PMCID: PMC10959814  PMID: 38519704

Abstract

Rooted binary galled trees generalize rooted binary trees to allow a restricted class of cycles, known as galls. We build upon the Wedderburn-Etherington enumeration of rooted binary unlabeled trees with n leaves to enumerate rooted binary unlabeled galled trees with n leaves, also enumerating rooted binary unlabeled galled trees with n leaves and g galls, $0 \le g \le \lfloor \frac{n-1}{2} \rfloor$. The enumerations rely on a recursive decomposition that considers subtrees descended from the nodes of a gall, adopting a restriction on galls that amounts to considering only the rooted binary normal unlabeled galled trees in our enumeration. We write an implicit expression for the generating function encoding the numbers of trees for all n. We show that the number of rooted binary unlabeled galled trees grows with $0.0779(4.8230^n)n^{-3/2}$, exceeding the growth $0.3188(2.4833^n)n^{-3/2}$ of the number of rooted binary unlabeled trees without galls. However, the growth of the number of galled trees with only one gall has the same exponential order 2.4833 as the number with no galls, exceeding it only in the subexponential term, $0.3910n^{1/2}$ compared to $0.3188n^{-3/2}$. For a fixed number of leaves n, the number of galls g that produces the largest number of rooted binary unlabeled galled trees lies intermediate between the minimum of g=0 and the maximum of $g=\lfloor \frac{n-1}{2} \rfloor$. We discuss implications in mathematical phylogenetics.

Keywords: Galled trees, Phylogenetics, Unlabeled trees

Introduction

Evolutionary histories of genes, populations, and species are often described by phylogenetic trees that seek to represent their descent relationships. Owing in part to the centrality of phylogenetic trees in evolutionary biology, mathematical studies have characterized numerous classes of phylogenetic trees, investigating their combinatorial properties (Semple and Steel 2003; Felsenstein 2004; Gascuel 2005; Steel 2016; Warnow 2018).

The use of tree structures—typically treated as binary—is often appropriate for representing standard phenomena of evolutionary descent, by which biological entities sequentially bifurcate, in a manner in which diverged entities do not merge back together. Processes such as genetic admixture, horizontal gene transfer, and hybridization, however, produce evolutionary relationships that are not tree-like. These processes involve the merging of separate lineages that had previously descended from shared ancestors. With increasing interest in merging mechanisms during evolutionary descent, much recent attention in mathematical phylogenetics has been devoted to phylogenetic networks (Huson et al. 2010; Gusfield 2014; Kong et al. 2022), in which graphs describing relationships of biological entities permit certain types of cycles.

Among the simplest phylogenetic networks are the galled trees, named for the growths that can appear in plant tissues to produce distinctively shaped structures (Gusfield et al. 2003, 2004b). First introduced in studies of ancestral recombination graphs (Wang et al. 2001; Gusfield et al. 2003, 2004a, b; Gusfield 2005, 2014; Song 2006), galled trees allow diverged lineages to merge forward in time, but only in circumscribed ways. Each merging event creates a gall corresponding to a cycle in the associated network.

From a standpoint that considers galled trees as mathematical objects separately from the processes that could produce them biologically, the defining feature of a galled tree is that cycles in a graph structure are disjoint, so that in a galled tree, a vertex or edge is contained in at most one cycle (Semple and Steel 2006). With this graph-theoretic sense for the meaning of galled trees, the enumerative combinatorics of galled trees has been investigated, both for unrooted and for rooted binary galled trees, focusing on galled trees that are leaf-labeled (Semple and Steel 2006; Bouvel et al. 2020; Cardona and Zhang 2020).

Chang et al. (2018) and Mathur and Rosenberg (2023) have posed the problem of enumerating rooted binary galled trees in which the leaves are not labeled. In a study focused on introducing encodings for galled trees, Chang et al. (2018) argued that the number of rooted binary unlabeled galled trees with n leaves is bounded above by a sequence with a certain generating function. In an enumerative study of labeled histories for rooted binary leaf-labeled galled trees, Mathur and Rosenberg (2023) enumerated a class of rooted binary unlabeled galled trees for n from 1 to 6, obtaining 1, 1, 2, 6, 20, 72. These values are indeed bounded above by the corresponding upper bounds of Chang et al. (2018)—1, 1, 4, 28, 245, 2402 for n=1 to 6—though Chang et al. (2018) used a more expansive definition of rooted binary unlabeled galled trees. They are also bounded above by the enumeration in Theorem 8 of Cardona and Zhang (2020) of the corresponding set of rooted binary labeled galled trees, which gives 1, 1, 6, 69, 960, 24,750 for n=1 to 6.

How many rooted binary unlabeled galled trees possess a given number of leaves n and a given number of galls g? Here, we perform this general enumeration, counting the rooted binary unlabeled galled trees of Mathur and Rosenberg (2023) for $n \ge 1$ leaves and $g \ge 0$ galls. We first recursively enumerate all such rooted binary unlabeled galled trees with a specified number of leaves n, considering all possible numbers of galls. We then refine this enumeration by subdividing it according to specified numbers of leaves n and galls g, considering all possible values of g for a fixed n.

Background

Definitions

We follow Mathur and Rosenberg (2023) in describing key concepts, assuming that all networks and trees are binary (and henceforth dropping the term binary). A rooted phylogenetic network is a directed acyclic graph with four properties: (i) there exists a unique root node with in-degree 0 and out-degree 2; (ii) all leaf nodes have in-degree 1 and out-degree 0; (iii) all non-leaf, non-root nodes have in-degree 2 and out-degree 1 or in-degree 1 and out-degree 2; and (iv) all edges are directed away from the root. Nodes with in-degree 2 and out-degree 1 are termed reticulation nodes, and nodes with in-degree 1 and out-degree 2 are tree nodes.

A rooted galled tree is a rooted (binary) phylogenetic network in which two properties hold (Fig. 1). First, (i) each reticulation node $a_r$ has a unique ancestor node r such that exactly two non-overlapping paths of edges exist from r to $a_r$; if the direction of edges is ignored, then the two paths connecting r and $a_r$ form a cycle $C_r$, known as a gall. Following Mathur and Rosenberg (2023), the ancestor node r must be separated from $a_r$ by at least two edges. This requirement that cycles contain at least four nodes follows from the perspective of Mathur and Rosenberg (2023) that views galled trees as evolving temporally by a biological process such as hybridization. It is equivalent to the requirement that a galled tree be a normal network and is not imposed in a more expansive galled tree definition that permits 3-node galls (Kong et al. 2022).

Fig. 1.

Rooted galled trees. (A) A rooted galled tree. In our definition of rooted galled trees, this example is the smallest network that possesses a gall. The gall is a root gall; node 1 is r, the top node; node 2 is the left hybridizing side node; node 4 is the right hybridizing side node; finally, node 3 is $a_r$, the hybrid node. We depict the hybridizing side nodes and the hybrid node in a horizontal line, representing simultaneity of these nodes in the embedding of the rooted galled tree in time, proceeding from the top to the bottom of the diagram. (B) A network that does not satisfy our definition of a rooted galled tree, but that does qualify according to some definitions. This network is missing a hybrid node; each gall in our definition possesses at least four nodes. (C) A more complex rooted galled tree by our definition. (D) A more complex network that is not a rooted galled tree by our definition, because the red triangle lacks a hybrid node.

The second criterion is: (ii) the set of nodes in the gall $C_r$, associated with reticulation node $a_r$, and the set of nodes in the gall $C_s$, associated with reticulation node $a_s \neq a_r$, are disjoint.

We term the ancestor node r of a gall with reticulation node $a_r$ and cycle $C_r$ the top node. Other nodes in a gall, excluding the top node and reticulation node, are called side nodes. We term the reticulation node a hybrid node, and the two immediate parents of a hybrid node hybridizing side nodes, or just hybridizing nodes. The side nodes to the left and right of the hybrid node are the left side nodes and right side nodes, respectively; the distinction between “left” and “right” is only for convenience, and a gall is invariant with respect to exchange of its left and right side nodes. If the root node of a rooted galled tree is part of a gall, then we call this gall the root gall. The root node is always a top node if it is part of a gall.

Although a rooted galled tree is only strictly a tree if it contains no galls, it is convenient to continue to refer to galled trees as trees; similarly, we allow “subtrees” to possess galls. All networks and trees that we consider are rooted, and we henceforth drop the term rooted. Mathur and Rosenberg (2023) focused on labeled galled trees, in which each leaf is associated with a distinct leaf label; here we consider unlabeled galled trees, and we often drop the term unlabeled. Unlike Mathur and Rosenberg (2023), we have no need to assign a temporal embedding to nodes, with the exception that ancestor nodes can be no more recent than their descendants; the galled trees that we consider are understood to be unordered.

Compositions

We will have occasion to consider the sums of ordered b-tuples of positive integers that equal a positive integer a: the compositions $C(a,b)$ of a into b parts, $1 \le b \le a$. The set $C(a,b)$ has cardinality $|C(a,b)|=\binom{a-1}{b-1}$. This result is obtained by noting that a list of a copies of the number 1 has a-1 “breakpoints” between consecutive 1’s, and, summing 1’s between neighboring breakpoints, the compositions into b parts are produced by the distinct sets of b-1 among the a-1 breakpoint locations. For b>a, we define $C(a,b)=\emptyset$, with $|C(a,b)|=0$.
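The breakpoint argument translates directly into code. The following Python sketch is illustrative only and is not part of the original analysis; the helper name `compositions` is our own hypothetical choice. It generates the compositions of a into b parts by choosing b-1 of the a-1 breakpoints, so the number of tuples produced equals the binomial coefficient stated above.

```python
from itertools import combinations
from math import comb


def compositions(a, b):
    """Yield each composition of a into b positive parts (hypothetical helper)."""
    if b < 1 or b > a:
        return  # the set of compositions is empty when b > a
    # Choose b-1 of the a-1 breakpoints between consecutive 1's.
    for cuts in combinations(range(1, a), b - 1):
        bounds = (0,) + cuts + (a,)
        # Each part is the number of 1's between neighboring breakpoints.
        yield tuple(bounds[i + 1] - bounds[i] for i in range(b))


print(list(compositions(4, 2)))                      # [(1, 3), (2, 2), (3, 1)]
print(len(list(compositions(9, 5))) == comb(8, 4))   # True
```

Generating tuples lazily keeps memory use small, although the number of compositions still grows quickly with a.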

We distinguish between palindromic and non-palindromic compositions. Palindromic compositions are unchanged when the order of the parts is reversed; non-palindromic compositions do change when the order is reversed. For example, in $C(9,5)$, (1, 2, 3, 2, 1) is a palindromic composition; (1, 2, 2, 3, 1) is non-palindromic. We denote the palindromic compositions of a into b parts by $C_p(a,b)$ and the non-palindromic compositions of a into b parts by $C_{np}(a,b)$, such that $C_p(a,b) \cup C_{np}(a,b)=C(a,b)$ and $C_p(a,b) \cap C_{np}(a,b)=\emptyset$.

The numbers of palindromic and non-palindromic compositions, $|C_p(a,b)|$ and $|C_{np}(a,b)|$, appear in Table 1. Palindromic compositions are counted by counting the ways to place breakpoints on the “left” half of a list of 1’s; breakpoints are then palindromically placed on the “right” half. Distinct cases exist depending on the parity of a and b. For non-palindromic compositions, we obtain $|C_{np}(a,b)|$ from $|C(a,b)|-|C_p(a,b)|$.

Table 1.

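The number of palindromic compositions of a into b parts, $1 \le b \le a$, and the corresponding number of non-palindromic compositions

Parity of a | Parity of b | $|C_p(a,b)|$ | $|C_{np}(a,b)|$
Even | Even | $\binom{\frac{a}{2}-1}{\frac{b}{2}-1}$ | $\binom{a-1}{b-1}-\binom{\frac{a}{2}-1}{\frac{b}{2}-1}$
Even | Odd | $\binom{\frac{a}{2}-1}{\frac{b-1}{2}}$ | $\binom{a-1}{b-1}-\binom{\frac{a}{2}-1}{\frac{b-1}{2}}$
Odd | Even | 0 | $\binom{a-1}{b-1}$
Odd | Odd | $\binom{\frac{a-1}{2}}{\frac{b-1}{2}}$ | $\binom{a-1}{b-1}-\binom{\frac{a-1}{2}}{\frac{b-1}{2}}$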
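The four parity cases of Table 1 can be checked against direct enumeration. The Python sketch below is illustrative; the function names are hypothetical and not taken from the original study. It encodes the closed forms and compares them with a brute-force count of compositions that equal their own reversal.

```python
from itertools import combinations
from math import comb


def compositions(a, b):
    """Yield each composition of a into b positive parts (hypothetical helper)."""
    if b < 1 or b > a:
        return
    for cuts in combinations(range(1, a), b - 1):
        bounds = (0,) + cuts + (a,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(b))


def count_palindromic(a, b):
    """Number of palindromic compositions, from the parity cases of Table 1."""
    if b < 1 or b > a:
        return 0
    if a % 2 == 0 and b % 2 == 0:
        return comb(a // 2 - 1, b // 2 - 1)
    if a % 2 == 0:                      # a even, b odd
        return comb(a // 2 - 1, (b - 1) // 2)
    if b % 2 == 0:                      # a odd, b even
        return 0
    return comb((a - 1) // 2, (b - 1) // 2)


def count_non_palindromic(a, b):
    """All compositions minus the palindromic ones."""
    if b < 1 or b > a:
        return 0
    return comb(a - 1, b - 1) - count_palindromic(a, b)


def table_matches_brute_force(max_a):
    """Compare the closed forms with enumeration for all 1 <= b <= a <= max_a."""
    for a in range(1, max_a + 1):
        for b in range(1, a + 1):
            brute = sum(1 for c in compositions(a, b) if c == c[::-1])
            if brute != count_palindromic(a, b):
                return False
    return True


print(count_palindromic(9, 5), count_non_palindromic(9, 5))  # 6 64
print(table_matches_brute_force(12))                         # True
```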

Unlabeled Trees with n Leaves

Our approach to enumerating unlabeled galled trees extends the Wedderburn-Etherington enumeration of unlabeled trees with no galls. For unlabeled trees with no galls, the root of a tree with $n \ge 2$ leaves possesses two immediate subtrees. Assume without loss of generality that the number of leaves in the “left” subtree is greater than or equal to the number of leaves in the “right” subtree. If $n \ge 3$ is odd, then $U_n$, the number of unlabeled trees with n leaves, is obtained by considering the $\frac{n-1}{2}$ possible numbers of leaves k for the right subtree, for each k pairing all $U_k$ unlabeled trees with k leaves for the right subtree with all $U_{n-k}$ unlabeled trees with n-k leaves for the left subtree. If n is even, then the enumeration is similar for $k \neq \frac{n}{2}$; if $k=\frac{n}{2}$, however, then we have $\binom{U_{n/2}}{2}$ ways of choosing two distinct subtrees for the left and right subtrees and $U_{n/2}$ ways of choosing two copies of the same subtree.

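The recursion for $U_n$ is [e.g. Harding (1971), Felsenstein (2004)]:

$$
U_n = \begin{cases}
1, & n=1, \\
\sum_{k=1}^{\frac{n-1}{2}} U_k U_{n-k}, & \text{odd } n \ge 3, \\
\left( \sum_{k=1}^{\frac{n}{2}-1} U_k U_{n-k} \right) + \frac{U_{n/2}(U_{n/2}+1)}{2}, & \text{even } n.
\end{cases} \tag{1}
$$

With $U_0=0$, the generating function $U(t)=\sum_{n \ge 0} U_n t^n$ for the $U_n$ satisfies (Comtet 1974, p. 55):

$$
U(t) = t + \frac{1}{2}U^2(t) + \frac{1}{2}U(t^2). \tag{2}
$$

The number of trees with no galls grows asymptotically as $d_0 \rho^{-n} n^{-3/2}$, for constants $d_0 \approx 0.3188$ and $1/\rho \approx 2.4833$ (Harding 1971; Landau 1977; Flajolet and Sedgewick 2009, p. 65).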
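The recursion of Eq. 1 and the functional equation of Eq. 2 can be checked numerically. The following Python sketch is illustrative; the names `U` and `gf_coefficient` are hypothetical. It computes the Wedderburn-Etherington numbers by memoized recursion and confirms that the coefficient of each power of t on the right side of Eq. 2 reproduces the same values.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def U(n):
    """Number of rooted binary unlabeled trees with n leaves (Eq. 1)."""
    if n == 1:
        return 1
    if n % 2 == 1:
        return sum(U(k) * U(n - k) for k in range(1, (n - 1) // 2 + 1))
    half = U(n // 2)
    # Unequal subtree sizes, plus unordered pairs of trees of size n/2.
    return sum(U(k) * U(n - k) for k in range(1, n // 2)) + half * (half + 1) // 2


def gf_coefficient(n):
    """Coefficient of t^n in t + U(t)^2 / 2 + U(t^2) / 2, taking U_0 = 0."""
    convolution = sum(U(k) * U(n - k) for k in range(1, n))
    from_square_argument = U(n // 2) if n % 2 == 0 else 0
    return (1 if n == 1 else 0) + (convolution + from_square_argument) // 2


print([U(n) for n in range(1, 11)])
# [1, 1, 1, 2, 3, 6, 11, 23, 46, 98]
print(all(gf_coefficient(n) == U(n) for n in range(1, 30)))  # True
```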

The Maximum Number of Galls for Galled Trees with n Leaves

For a fixed number of leaves n, the number of galls that a galled tree can possess is constrained as a function of n. Because a gall contains at least three descendant subtrees (those descended from the two hybridizing nodes and the hybrid node), a minimum of n=3 leaves is required before a tree can possess a gall. Each successive addition of a gall then replaces one subtree with a minimum of three subtrees (those descended from the two hybridizing nodes and the hybrid node of the new gall), so that each gall adds at least two leaves. It follows that a galled tree with n leaves can have at most $\lfloor \frac{n-1}{2} \rfloor$ galls (Mathur and Rosenberg 2023).

Unlabeled Galled Trees with n Leaves

We are now ready to enumerate unlabeled galled trees. We denote by $A_n$ the number of unlabeled galled trees with n leaves. Trees with n=1 or n=2 leaves have no galls: $A_1=U_1=1$ and $A_2=U_2=1$. To recursively evaluate $A_n$ for $n \ge 3$ leaves, we sum counts from two cases: (1) the root is not the top node of a gall; (2) the root is the top node of a gall. We count galled trees in the former case in $B_n$, with $B_1=B_2=1$, and we count galled trees in the latter case in $D_n$, with $D_1=D_2=0$. The goal is to evaluate

$$
A_n = B_n + D_n. \tag{3}
$$

Root is not a Top Node of a Gall

If the root is not the top node of a gall, then an unlabeled galled tree possesses two immediate subtrees of the root, each of which is itself an unlabeled galled tree (Fig. 2A). The number of unlabeled galled trees then follows a recursion analogous to Eq. 1:

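$$
B_n = \begin{cases}
1, & n=1, \\
\sum_{m=1}^{\frac{n-1}{2}} A_m A_{n-m}, & \text{odd } n \ge 3, \\
\left( \sum_{m=1}^{\frac{n}{2}-1} A_m A_{n-m} \right) + \frac{A_{n/2}(A_{n/2}+1)}{2}, & \text{even } n.
\end{cases} \tag{4}
$$

It is convenient to express $B_n$ in a form that considers compositions c of n into 2 parts. For odd $n \ge 3$,

$$
B_n = \frac{1}{2} \sum_{m=1}^{n-1} A_m A_{n-m} = \frac{1}{2} \sum_{c \in C(n,2)} A_{c_1} A_{c_2}. \tag{5}
$$

For even n,

$$
B_n = \frac{1}{2} A_{n/2} + \frac{1}{2} \sum_{m=1}^{n-1} A_m A_{n-m} = \frac{1}{2} A_{n/2} + \frac{1}{2} \sum_{c \in C(n,2)} A_{c_1} A_{c_2}. \tag{6}
$$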

Fig. 2.

Recursive enumeration of rooted galled trees. Triangles indicate unspecified subtrees with at least one leaf. (A) The root is not a top node of a gall (Eq. 4). (B, C) The root is a top node of a gall with an even number of subtrees (Eq. 9). The two trees show the two cases with k=6: $(\ell,r)=(4,1)$ (B) and $(\ell,r)=(3,2)$ (C). (D) The root is a top node of a gall with an odd number of subtrees and $r<\ell$ (Eq. 10). In this case, k=5 and $(\ell,r)=(3,1)$. (E) The root is a top node of a gall with an odd number of subtrees, $r=\ell$, and the composition of n leaves descended from the root gall into $k=\ell+r+1$ parts representing k subtrees is non-palindromic (Eq. 11). In this case, k=5 and $(\ell,r)=(2,2)$. Different outline colors for triangles indicate different numbers of leaves in associated subtrees. (F, G) The root is a top node of a gall with an odd number of subtrees, $r=\ell$, and the composition of n leaves into k parts is palindromic (Eq. 12). In both trees, k=5 and $(\ell,r)=(2,2)$; trees with distinct (F) and identical (G) lists of galled subtrees for left and right subtrees are depicted. Different outline colors for triangles indicate different numbers of leaves in subtrees, and different patterns in the same color indicate different topologies with equally many leaves (Color figure online).

Root is a Top Node of a Gall

If the root is a top node of a gall, then the recursion is more complex. We first count subtrees of the root gall, equal to the count of all side nodes plus the hybrid node. Suppose the root gall contains k subtrees. We have the constraint $3 \le k \le n$, as a gall has at least 3 subtrees (of the left hybridizing node, hybrid node, and right hybridizing node), and the root gall can have as many as n subtrees, each ancestral to a single leaf.

Without loss of generality, we can assume that the number of right side nodes in the root gall, r, is less than or equal to the number of left side nodes $\ell$; owing to the existence of the hybrid node, $\ell+1+r=k$. We divide the case further based on the parity of k, writing

$$
D_n = D_n^{(e)} + D_n^{(o)}, \tag{7}
$$

where $D_n^{(e)}$ and $D_n^{(o)}$ count unlabeled trees with n leaves in which the root is a top node and the number of descendant subtrees of the root gall is even and odd, respectively.

Even Number of Subtrees of the Root Gall

We consider each even value of k, k=2a for $a=2,3,\ldots,\lfloor \frac{n}{2} \rfloor$. Given k, because $r \le \ell$, r ranges from 1 to $\frac{k}{2}-1=a-1$. Because $\ell+1+r=k$ and k is even, we have the strict inequality $r<\ell$.

Consider the k subtrees in an order that proceeds from the most ancestral left side node descending through subsequent left side nodes to the left hybridizing node, then to the hybrid node, then the right hybridizing node, and then through ancestors to the most ancestral right side node (Fig. 2B, C). Once k and r have been specified, we consider all possible ways of placing the n leaves into the k subtrees of the gall: the compositions $C(n,k)$ of n into k parts. For each composition $c=(c_1,c_2,\ldots,c_k)$ in $C(n,k)$, where $c_i$ is the value of the ith term, the number of galled trees is $\prod_{i=1}^{k} A_{c_i}$. We have

$$
D_n^{(e)} = \sum_{a=2}^{\lfloor \frac{n}{2} \rfloor} \sum_{r=1}^{a-1} \sum_{c \in C(n,2a)} \prod_{i=1}^{2a} A_{c_i}. \tag{8}
$$

The summand $\sum_{c \in C(n,2a)} \prod_{i=1}^{2a} A_{c_i}$, representing the number of distinct lists of 2a subtrees with total number of leaves n, does not depend on r, the number of those subtrees descended from right side nodes. For $n \le 2$, a sum from a=2 to $a=\lfloor \frac{n}{2} \rfloor$ is empty. We can therefore simplify Eq. 8 to obtain, for all $n \ge 1$,

$$
D_n^{(e)} = \sum_{a=2}^{\lfloor \frac{n}{2} \rfloor} \left[ (a-1) \sum_{c \in C(n,2a)} \prod_{i=1}^{2a} A_{c_i} \right]. \tag{9}
$$

Odd Number of Subtrees of the Root Gall

If k is odd, then we consider k=2a+1 for $a=1,2,\ldots,\lfloor \frac{n-1}{2} \rfloor$. In this case, with $r \le \ell$, we have $1 \le r \le a$. With $\ell+1+r=k$ and k odd, $r=\ell$ is possible.

If $r<\ell$, then r ranges from 1 to $\frac{k-1}{2}-1=a-1$ (Fig. 2D). We follow the reasoning of the case of even k and find that this case contributes a number of galled trees equal to

$$
\sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \left[ (a-1) \sum_{c \in C(n,2a+1)} \prod_{i=1}^{2a+1} A_{c_i} \right]. \tag{10}
$$

Consider $r=\ell=a$. For non-palindromic compositions of n into k parts representing the k subtrees of the gall, an equivalent unlabeled galled tree is obtained by a mirror-image composition that corresponds to an exchange of left and right subtrees of the root gall (Fig. 2E). We therefore multiply by $\frac{1}{2}$ to account for the fact that each unlabeled galled tree is counted twice, so that the non-palindromic compositions of n contribute a number of galled trees equal to

$$
\frac{1}{2} \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \sum_{c \in C_{np}(n,2a+1)} \prod_{i=1}^{2a+1} A_{c_i}. \tag{11}
$$

With $r=\ell=a$, for a palindromic composition c of n into k parts representing the k subtrees of the root gall (Fig. 2F, G), we can select two distinct lists of galled subtrees for the a left and the a right subtrees; the number of ways to do so is $\left(\prod_{i=1}^{a} A_{c_i}\right)\left[\left(\prod_{i=1}^{a} A_{c_i}\right)-1\right]/2$. Alternatively, we can select the same lists of galled subtrees for the a left and a right subtrees, in $\prod_{i=1}^{a} A_{c_i}$ ways. Any choices for the left and right subtrees can be combined with $A_{c_{a+1}}$ choices for the subtree of the hybrid node of the gall. The palindromic compositions produce a number of galled trees equal to

$$
\sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \sum_{c \in C_p(n,2a+1)} \frac{\left(\prod_{i=1}^{a} A_{c_i}\right)\left[\left(\prod_{i=1}^{a} A_{c_i}\right)+1\right] A_{c_{a+1}}}{2}. \tag{12}
$$

Summing the three cases in Eqs. 10, 11, and 12, we obtain

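$$
D_n^{(o)} = \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \left[ \left( \sum_{r=1}^{a-1} \sum_{c \in C(n,2a+1)} \prod_{i=1}^{2a+1} A_{c_i} \right) + \left( \frac{1}{2} \sum_{c \in C_{np}(n,2a+1)} \prod_{i=1}^{2a+1} A_{c_i} \right) + \left( \sum_{c \in C_p(n,2a+1)} \frac{\left(\prod_{i=1}^{a} A_{c_i}\right)\left[\left(\prod_{i=1}^{a} A_{c_i}\right)+1\right] A_{c_{a+1}}}{2} \right) \right]. \tag{13}
$$

We can simplify this expression further. For a palindromic composition $c \in C_p(n,2a+1)$, by definition of palindromic compositions, $c_i=c_{2a+2-i}$ for $i=1,2,\ldots,a$. $C(n,2a+1)$ is the disjoint union of $C_p(n,2a+1)$ and $C_{np}(n,2a+1)$. We can then write

$$
\left( \frac{1}{2} \sum_{c \in C_{np}(n,2a+1)} \prod_{i=1}^{2a+1} A_{c_i} \right) + \left( \sum_{c \in C_p(n,2a+1)} \frac{\left(\prod_{i=1}^{a} A_{c_i}\right)\left(\prod_{i=1}^{a} A_{c_i}\right) A_{c_{a+1}}}{2} \right) = \frac{1}{2} \sum_{c \in C(n,2a+1)} \prod_{i=1}^{2a+1} A_{c_i}.
$$

Equation 13 then becomes

$$
D_n^{(o)} = \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \left[ \left[ \left( \sum_{r=1}^{a-1} 1 \right) + \frac{1}{2} \right] \left( \sum_{c \in C(n,2a+1)} \prod_{i=1}^{2a+1} A_{c_i} \right) + \left( \frac{1}{2} \sum_{c \in C_p(n,2a+1)} \prod_{i=1}^{a+1} A_{c_i} \right) \right].
$$

Note that for $n \le 2$, a sum from a=1 to $\lfloor \frac{n-1}{2} \rfloor$ is empty. Therefore, for all $n \ge 1$,

$$
D_n^{(o)} = \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \left[ \left( a-\frac{1}{2} \right) \left( \sum_{c \in C(n,2a+1)} \prod_{i=1}^{2a+1} A_{c_i} \right) + \left( \frac{1}{2} \sum_{c \in C_p(n,2a+1)} \prod_{i=1}^{a+1} A_{c_i} \right) \right]. \tag{14}
$$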

Summary

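To summarize the enumeration, the desired number of unlabeled galled trees with n leaves, $A_n$, can be calculated in Eq. 3 by summing Eqs. 9 and 14 in Eq. 7, and then adding the result to $B_n$ from Eq. 4.

We simplify by writing the sum of $D_n^{(e)}$ and the first term of $D_n^{(o)}$ in one expression. When adding the even-k terms in $D_n^{(e)}$ and the odd-k terms in $D_n^{(o)}$, k now ranges from 3 to n, considering even values of k with k=2a and $a=2,3,\ldots,\lfloor \frac{n}{2} \rfloor$, and odd values of k with k=2a+1 and $a=1,2,\ldots,\lfloor \frac{n-1}{2} \rfloor$. For k=2a, $a-1=\frac{k-2}{2}$, and for k=2a+1, $a-\frac{1}{2}=\frac{k-2}{2}$ as well.

Recalling Eqs. 5 and 6, we can now write a simplified expression for the recursion for $A_n$. First, $A_1=1$. If the number of leaves n of an unlabeled galled tree is an odd value $n \ge 3$, then

$$
A_n = \frac{1}{2} \left[ \left( \sum_{c \in C(n,2)} \prod_{i=1}^{2} A_{c_i} \right) + \left( \sum_{k=3}^{n} (k-2) \sum_{c \in C(n,k)} \prod_{i=1}^{k} A_{c_i} \right) + \left( \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \sum_{c \in C_p(n,2a+1)} \prod_{i=1}^{a+1} A_{c_i} \right) \right]. \tag{15}
$$

If the number of leaves n is even, then an extra term appears:

$$
A_n = \frac{1}{2} \left[ \left( \sum_{c \in C(n,2)} \prod_{i=1}^{2} A_{c_i} \right) + A_{n/2} + \left( \sum_{k=3}^{n} (k-2) \sum_{c \in C(n,k)} \prod_{i=1}^{k} A_{c_i} \right) + \left( \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \sum_{c \in C_p(n,2a+1)} \prod_{i=1}^{a+1} A_{c_i} \right) \right]. \tag{16}
$$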
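The recursion of Eqs. 15 and 16 can be evaluated by enumerating compositions directly. The Python sketch below is a minimal illustration under that assumption, not an optimized or authoritative implementation; the names `compositions` and `A` are hypothetical. Its running time grows exponentially with n, so it is suited only to small n.

```python
from functools import lru_cache
from itertools import combinations
from math import prod


def compositions(a, b):
    """Yield each composition of a into b positive parts (hypothetical helper)."""
    if b < 1 or b > a:
        return
    for cuts in combinations(range(1, a), b - 1):
        bounds = (0,) + cuts + (a,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(b))


@lru_cache(maxsize=None)
def A(n):
    """Number of unlabeled galled trees with n leaves (Eqs. 15 and 16)."""
    if n == 1:
        return 1
    # First term: the root is not a top node; compositions of n into 2 parts.
    total = sum(A(c[0]) * A(c[1]) for c in compositions(n, 2))
    if n % 2 == 0:
        total += A(n // 2)  # extra term for even n (Eq. 16)
    # Second term: root gall with k subtrees, weighted by k - 2.
    for k in range(3, n + 1):
        total += (k - 2) * sum(prod(A(x) for x in c) for c in compositions(n, k))
    # Third term: palindromic compositions into 2a + 1 parts, first a + 1 entries.
    for a in range(1, (n - 1) // 2 + 1):
        for c in compositions(n, 2 * a + 1):
            if c == c[::-1]:
                total += prod(A(x) for x in c[: a + 1])
    return total // 2


print([A(n) for n in range(1, 7)])  # [1, 1, 2, 6, 20, 72]
```

For larger n, the inner sums over compositions could instead be obtained as coefficients of powers of a truncated generating polynomial, which avoids enumerating every composition.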

Example

To illustrate the recursive enumeration of rooted galled trees, we enumerate the $A_5=20$ rooted galled trees with 5 leaves. The base case in Eqs. 15 and 16 is $A_1=1$. To evaluate $A_5$, we first evaluate $A_2$, $A_3$, and $A_4$.

Trees with Two Leaves

Only one galled tree has two leaves: the 2-leaf tree with no galls (Table 2). Equation 16 recovers this result: $B_2=A_1(A_1+1)/2=1$ (Eq. 4), $D_2^{(e)}=0$ (Eq. 9), and $D_2^{(o)}=0$ (Eq. 14), so that $A_2=B_2+D_2^{(e)}+D_2^{(o)}=1$.

Table 2.

Rooted unlabeled galled trees with at most 5 leaves

Number of leaves | Tree number | Number of galls | Galled tree
1 | 1 | 0 | [image]
2 | 1 | 0 | [image]
3 | 1 | 0 | [image]
3 | 2 | 1 | [image]
4 | 1 | 0 | [image]
4 | 2 | 1 | [image]
4 | 3 | 0 | [image]
4 | 4 | 1 | [image]
4 | 5 | 1 | [image]
4 | 6 | 1 | [image]
5 | 1 | 0 | [image]
5 | 2 | 1 | [image]
5 | 3 | 0 | [image]
5 | 4 | 1 | [image]
5 | 5 | 1 | [image]
5 | 6 | 1 | [image]
5 | 7 | 0 | [image]
5 | 8 | 1 | [image]
5 | 9 | 1 | [image]
5 | 10 | 2 | [image]
5 | 11 | 1 | [image]
5 | 12 | 1 | [image]
5 | 13 | 1 | [image]
5 | 14 | 2 | [image]
5 | 15 | 1 | [image]
5 | 16 | 1 | [image]
5 | 17 | 1 | [image]
5 | 18 | 1 | [image]
5 | 19 | 1 | [image]
5 | 20 | 1 | [image]

Galled trees with different numbers of galls appear in different colors (0, black; 1, orange; 2, purple). For each number of leaves n, we enumerate galled trees in a canonical order. We recursively proceed through trees in which the root is not a top node of a gall, incrementing the number of leaves in the right subtree. Next, for trees in which the root is a top node, we proceed in increasing order of the number of subtrees of the root gall; for fixed numbers of subtrees, we proceed in dictionary order of $(\ell,r)$ values; for fixed $(\ell,r)$, we use reverse dictionary order of the compositions of leaves into subtrees of the gall. The canonical order is used in proceeding through subtrees of a fixed size.

Trees with Three Leaves

For n=3, there are two galled trees (Table 2). Using Eqs. 4, 9, and 14, we have

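$$
\begin{aligned}
B_3 &= A_1 A_2 = 1, \\
D_3^{(e)} &= 0, \\
D_3^{(o)} &= \tfrac{1}{2} A_1 A_1 A_1 + \tfrac{1}{2} A_1 A_1 = 1.
\end{aligned}
$$

Summing $B_3=1$, representing the tree with n=3 leaves and no galls, $D_3^{(e)}=0$, and $D_3^{(o)}=1$ for the unique tree containing a gall, we have $A_3=2$.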

Trees with Four Leaves

For n=4, the number of galled trees is 6. We use Eqs. 4, 9, and 14 to obtain

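$$
\begin{aligned}
B_4 &= A_1 A_3 + \tfrac{1}{2} A_2 (A_2+1) = 3, \\
D_4^{(e)} &= A_1 A_1 A_1 A_1 = 1, \\
D_4^{(o)} &= \tfrac{1}{2} (A_2 A_1 A_1 + A_1 A_2 A_1 + A_1 A_1 A_2) + \tfrac{1}{2} A_1 A_2 = 2.
\end{aligned}
$$

$B_4$ counts trees 1 to 3 in Table 2. $D_4^{(e)}$ counts the unique tree with k=4 subtrees of the root gall, tree 6. $D_4^{(o)}$ counts trees with k=3 subtrees of the root gall, trees 4 and 5. Summing the three quantities, $A_4=6$.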

Trees with Five Leaves

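We are now ready for the calculation of $A_5$, which produces $A_5=20$ galled trees with five leaves (Table 2).

$$
\begin{aligned}
B_5 &= A_1 A_4 + A_2 A_3 = 8, \\
D_5^{(e)} &= A_2 A_1 A_1 A_1 + A_1 A_2 A_1 A_1 + A_1 A_1 A_2 A_1 + A_1 A_1 A_1 A_2 = 4, \\
D_5^{(o)} &= \left[ \tfrac{1}{2} (A_3 A_1 A_1 + A_2 A_2 A_1 + A_2 A_1 A_2 + A_1 A_3 A_1 + A_1 A_2 A_2 + A_1 A_1 A_3) + \tfrac{1}{2} (A_2 A_1 + A_1 A_3) \right] \\
&\quad + \left( \tfrac{3}{2} A_1 A_1 A_1 A_1 A_1 + \tfrac{1}{2} A_1 A_1 A_1 \right) = 8.
\end{aligned}
$$

$B_5$ enumerates trees in which the root is not a top node of a gall, trees 1 to 8 for n=5 in Table 2. $D_5^{(e)}$ enumerates trees with k=4 subtrees of the root gall, trees 15 to 18. $D_5^{(o)}$ enumerates trees for which the root gall has k=3 (trees 9 to 14) or k=5 subtrees (trees 19 and 20). $B_5$, $D_5^{(e)}$, and $D_5^{(o)}$ sum to $A_5=20$.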
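The three contributions in the worked examples can be reproduced separately. The following Python sketch is illustrative only; the function names `B`, `D_even`, `D_odd`, and `product_sum` are hypothetical. It evaluates Eqs. 4, 9, and 14 and combines them as in Eqs. 3 and 7.

```python
from functools import lru_cache
from itertools import combinations
from math import prod


def compositions(a, b):
    """Yield each composition of a into b positive parts (hypothetical helper)."""
    if b < 1 or b > a:
        return
    for cuts in combinations(range(1, a), b - 1):
        bounds = (0,) + cuts + (a,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(b))


@lru_cache(maxsize=None)
def A(n):
    """Total number of galled trees with n leaves (Eqs. 3 and 7)."""
    return B(n) + D_even(n) + D_odd(n)


def B(n):
    """Root is not a top node of a gall (Eq. 4)."""
    if n == 1:
        return 1
    if n % 2 == 1:
        return sum(A(m) * A(n - m) for m in range(1, (n - 1) // 2 + 1))
    half = A(n // 2)
    return sum(A(m) * A(n - m) for m in range(1, n // 2)) + half * (half + 1) // 2


def product_sum(n, k):
    """Sum over compositions of n into k parts of the product of A values."""
    return sum(prod(A(x) for x in c) for c in compositions(n, k))


def D_even(n):
    """Root gall with an even number k = 2a of subtrees (Eq. 9)."""
    return sum((a - 1) * product_sum(n, 2 * a) for a in range(2, n // 2 + 1))


def D_odd(n):
    """Root gall with an odd number k = 2a + 1 of subtrees (Eq. 14)."""
    twice = 0  # accumulate twice the value to stay in integer arithmetic
    for a in range(1, (n - 1) // 2 + 1):
        k = 2 * a + 1
        palindromic = sum(prod(A(x) for x in c[: a + 1])
                          for c in compositions(n, k) if c == c[::-1])
        twice += (2 * a - 1) * product_sum(n, k) + palindromic
    return twice // 2


for n in (3, 4, 5):
    print(n, B(n), D_even(n), D_odd(n), A(n))
# 3 1 0 1 2
# 4 3 1 2 6
# 5 8 4 8 20
```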

Unlabeled Galled Trees with n Leaves and g Galls

A salient feature of a galled tree is its number of galls. Having enumerated unlabeled galled trees with n leaves, we now proceed to subdivide the calculation according to the number of galls: the number of galled trees with n leaves is a sum over g from 0 to $\lfloor \frac{n-1}{2} \rfloor$ of the number of galled trees with n leaves and g galls.

We denote by $E_{n,g}$ the number of galled trees with n leaves and g galls. Because the maximum number of galls with n leaves is $\lfloor \frac{n-1}{2} \rfloor$ (Sect. 2.4), we define $E_{n,g}=0$ for $g>\lfloor \frac{n-1}{2} \rfloor$. As the unique galled tree with n=1 leaf has no galls, the base case is $E_{1,0}=1$.

Again we separate two cases: (1) the root is not the top node of a gall, and (2) the root is the top node of a gall. For the former case, we denote the count by $P_{n,g}$, with $P_{1,0}=P_{2,0}=1$. For the latter case, we denote the count by $R_{n,g}$, with $R_{1,g}=R_{2,g}=0$ for all g. We seek to obtain $E_{n,g}=P_{n,g}+R_{n,g}$. We use reasoning that parallels the case in which we do not keep track of the number of galls (Sect. 3).

Note that when summing over all possible values of g, for $n \ge 1$, we have

$$
A_n = \sum_{g=0}^{\lfloor \frac{n-1}{2} \rfloor} E_{n,g}, \qquad
B_n = \sum_{g=0}^{\lfloor \frac{n-1}{2} \rfloor} P_{n,g}, \qquad
D_n = \sum_{g=0}^{\lfloor \frac{n-1}{2} \rfloor} R_{n,g}.
$$

Root is not a Top Node of a Gall

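If the root is not a top node, then a tree with $n \ge 2$ leaves and g galls can be decomposed into two subtrees. We assign one of these trees m leaves, $1 \le m \le \lfloor \frac{n}{2} \rfloor$, and h galls, $0 \le h \le \min(g, \lfloor \frac{m-1}{2} \rfloor)$. If n and g are both even, then it is possible for the two subtrees to be identical. Similarly to Eq. 4, we have

$$
P_{n,g} = \begin{cases}
1, & (n,g)=(1,0), \\
\sum_{m=1}^{\frac{n-1}{2}} \sum_{h=0}^{\min(g, \lfloor \frac{m-1}{2} \rfloor)} E_{m,h} E_{n-m,g-h}, & \text{odd } n \ge 3, \\
\sum_{m=1}^{\frac{n}{2}} \sum_{h=0}^{\min(g, \lfloor \frac{m-1}{2} \rfloor)} E_{m,h} E_{n-m,g-h}, & \text{even } n \text{ and odd } g \ge 1, \\
\left( \sum_{m=1}^{\frac{n}{2}-1} \sum_{h=0}^{\min(g, \lfloor \frac{m-1}{2} \rfloor)} E_{m,h} E_{n-m,g-h} \right) + \frac{E_{\frac{n}{2},\frac{g}{2}} \left( E_{\frac{n}{2},\frac{g}{2}}+1 \right)}{2}, & \text{even } n \text{ and even } g.
\end{cases} \tag{17}
$$

Note that in this equation, we can replace $\min(g, \lfloor \frac{m-1}{2} \rfloor)$ with g; for $\lfloor \frac{m-1}{2} \rfloor < h \le g$, $E_{m,h}$ in the summand is zero, as a tree with m leaves has at most $\lfloor \frac{m-1}{2} \rfloor$ galls.

We write another expression for $P_{n,g}$ by considering compositions of n into two parts representing the numbers of leaves in the left and right subtrees. We also decompose g; because entries in a composition are strictly positive, we consider compositions of g+2, noting that each entry of the composition exceeds the associated number of galls by 1.

For (n, g) in which n or g is odd and $n \ge 2$, similarly to Eq. 5,

$$
P_{n,g} = \frac{1}{2} \sum_{c \in C(n,2)} \sum_{d \in C(g+2,2)} E_{c_1,d_1-1} E_{c_2,d_2-1}. \tag{18}
$$

For (n, g) both even, as in Eq. 6,

$$
P_{n,g} = \frac{1}{2} E_{\frac{n}{2},\frac{g}{2}} + \frac{1}{2} \sum_{c \in C(n,2)} \sum_{d \in C(g+2,2)} E_{c_1,d_1-1} E_{c_2,d_2-1}. \tag{19}
$$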

Root is a Top Node of a Gall

In the case of a root gall, we distribute among the subtrees of the root gall g-1 galls, as one of the g galls is the root gall. We again distinguish between even and odd numbers of subtrees of the root gall, writing $R_{n,g}=R_{n,g}^{(e)}+R_{n,g}^{(o)}$, where $R_{n,g}^{(e)}$ gives the number of trees with n leaves and g galls in which the root is a top node and the root gall has an even number of descendant subtrees, and $R_{n,g}^{(o)}$ gives the corresponding number of trees with an odd number of descendant subtrees of the root gall. We follow our reasoning of Sects. 3.2.1 and 3.2.2.

Even Number of Subtrees of the Root Gall

Suppose the number of the subtrees of the root gall is even, k=2a, $a=2,3,\ldots,\lfloor \frac{n}{2} \rfloor$. As in Sect. 3.2.1, given k, the number of right side nodes of the root gall, r, ranges from 1 to $\frac{k}{2}-1=a-1$.

Here, however, we consider all ways of distributing g-1 galls across k=2a subtrees. Just as n leaves are placed into 2a subtrees by a composition of n into 2a parts, g-1 galls are placed into 2a subtrees by a composition of g-1+2a into 2a parts. By decomposing g-1+2a, we allow for the possibility of 0 galls in a subtree; in a composition d of g-1+2a, the number of galls associated with entry $d_i$ is $d_i-1$.

For all (n, g) with $n \ge 1$ and $0 \le g \le \lfloor \frac{n-1}{2} \rfloor$, the resulting number of trees is similar to Eq. 9:

$$
R_{n,g}^{(e)} = \sum_{a=2}^{\lfloor \frac{n}{2} \rfloor} \left[ (a-1) \sum_{c \in C(n,2a)} \sum_{d \in C(g-1+2a,2a)} \prod_{i=1}^{2a} E_{c_i,d_i-1} \right]. \tag{20}
$$

Odd Number of Subtrees of the Root Gall

For an odd number of subtrees of the root gall k=2a+1, $a=1,2,\ldots,\lfloor \frac{n-1}{2} \rfloor$, as in Sect. 3.2.2, r ranges from 1 to a. Again we consider $(\ell,r)$ with $r<\ell$ (as in Eq. 10) and add the $r=\ell$ case (as in Eqs. 11 and 12).

If $r<\ell$, then similarly to the case of even k, the number of galled trees with k subtrees descended from a root gall of a tree with $n \ge 1$ leaves and $0 \le g \le \lfloor \frac{n-1}{2} \rfloor$ galls is

$$
\sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \left[ (a-1) \sum_{c \in C(n,2a+1)} \sum_{d \in C(g-1+2a+1,2a+1)} \prod_{i=1}^{2a+1} E_{c_i,d_i-1} \right]. \tag{21}
$$

If $r=\ell=a$, then we again distinguish between non-palindromic and palindromic compositions of n leaves into the k subtrees. Non-palindromic compositions do not result in symmetric trees, irrespective of the way the galls are placed across the subtrees. Therefore, considering only the non-palindromic compositions, similarly to Eq. 11, the number of galled trees with $n \ge 1$ leaves and $g \ge 0$ galls is

$$
\frac{1}{2} \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \sum_{c \in C_{np}(n,2a+1)} \sum_{d \in C(g-1+2a+1,2a+1)} \prod_{i=1}^{2a+1} E_{c_i,d_i-1}. \tag{22}
$$

Finally, for the palindromic compositions of n leaves with k odd and $r=\ell=a$, we distinguish between cases with palindromic and non-palindromic compositions describing the placement of the g galls across k subtrees. For the non-palindromic compositions, similarly to Eq. 22, the number of trees is

$$
\frac{1}{2} \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \sum_{c \in C_p(n,2a+1)} \sum_{d \in C_{np}(g-1+2a+1,2a+1)} \prod_{i=1}^{2a+1} E_{c_i,d_i-1}. \tag{23}
$$

If both the composition of n leaves and the composition of g-1 galls are palindromic, then, as in our reasoning for Eq. 12, we can choose either two distinct or two identical lists of subtrees for the a left subtrees and the a right subtrees, and the number of trees is

$$
\sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \sum_{c \in C_p(n,2a+1)} \sum_{d \in C_p(g-1+2a+1,2a+1)} \frac{\left(\prod_{i=1}^{a} E_{c_i,d_i-1}\right)\left[\left(\prod_{i=1}^{a} E_{c_i,d_i-1}\right)+1\right] E_{c_{a+1},d_{a+1}-1}}{2}. \tag{24}
$$

We sum Eqs. 21, 22, 23, and 24 to obtain

$$
R_{n,g}^{(o)} = \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \left[ \left( a-\frac{1}{2} \right) \left( \sum_{c \in C(n,2a+1)} \sum_{d \in C(g-1+2a+1,2a+1)} \prod_{i=1}^{2a+1} E_{c_i,d_i-1} \right) + \left( \frac{1}{2} \sum_{c \in C_p(n,2a+1)} \sum_{d \in C_p(g-1+2a+1,2a+1)} \prod_{i=1}^{a+1} E_{c_i,d_i-1} \right) \right]. \tag{25}
$$

Summary

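As $E_{n,g}=P_{n,g}+R_{n,g}^{(e)}+R_{n,g}^{(o)}$, we summarize by adding Eqs. 17, 20, and 25. $E_{1,0}=1$ and $E_{1,g}=0$ for $g \ge 1$. For (n, g) with $n \ge 2$ leaves and $0 \le g \le \lfloor \frac{n-1}{2} \rfloor$ galls, if n is odd, g is odd, or both n and g are odd, then

$$
E_{n,g} = \frac{1}{2} \left[ \left( \sum_{c \in C(n,2)} \sum_{d \in C(g+2,2)} \prod_{i=1}^{2} E_{c_i,d_i-1} \right) + \left( \sum_{k=3}^{n} (k-2) \sum_{c \in C(n,k)} \sum_{d \in C(g-1+k,k)} \prod_{i=1}^{k} E_{c_i,d_i-1} \right) + \left( \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \sum_{c \in C_p(n,2a+1)} \sum_{d \in C_p(g-1+2a+1,2a+1)} \prod_{i=1}^{a+1} E_{c_i,d_i-1} \right) \right]. \tag{26}
$$

If both n and g are even, then an extra term appears:

$$
E_{n,g} = \frac{1}{2} \left[ \left( \sum_{c \in C(n,2)} \sum_{d \in C(g+2,2)} \prod_{i=1}^{2} E_{c_i,d_i-1} \right) + E_{\frac{n}{2},\frac{g}{2}} + \left( \sum_{k=3}^{n} (k-2) \sum_{c \in C(n,k)} \sum_{d \in C(g-1+k,k)} \prod_{i=1}^{k} E_{c_i,d_i-1} \right) + \left( \sum_{a=1}^{\lfloor \frac{n-1}{2} \rfloor} \sum_{c \in C_p(n,2a+1)} \sum_{d \in C_p(g-1+2a+1,2a+1)} \prod_{i=1}^{a+1} E_{c_i,d_i-1} \right) \right]. \tag{27}
$$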
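Equations 26 and 27 can be evaluated by pairing each composition c of the leaves with each composition d of the galls. The Python sketch below is a minimal illustration under that reading, not an efficient or authoritative implementation; the names `compositions`, `E`, and `term` are hypothetical. Entries of d exceed the associated gall counts by 1, so the code subtracts 1 when indexing.

```python
from functools import lru_cache
from itertools import combinations
from math import prod


def compositions(a, b):
    """Yield each composition of a into b positive parts (hypothetical helper)."""
    if b < 1 or b > a:
        return
    for cuts in combinations(range(1, a), b - 1):
        bounds = (0,) + cuts + (a,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(b))


@lru_cache(maxsize=None)
def E(n, g):
    """Number of galled trees with n leaves and g galls (Eqs. 26 and 27)."""
    if g < 0 or g > (n - 1) // 2:
        return 0
    if n == 1:
        return 1  # only g == 0 reaches this line

    def term(c, d, upto):
        # Product over the first `upto` subtrees; d[i] - 1 is the gall count.
        return prod(E(c[i], d[i] - 1) for i in range(upto))

    # First term: no root gall; all g galls are split between two subtrees.
    total = sum(term(c, d, 2)
                for c in compositions(n, 2)
                for d in compositions(g + 2, 2))
    if n % 2 == 0 and g % 2 == 0:
        total += E(n // 2, g // 2)  # extra term of Eq. 27
    # Second term: root gall with k subtrees sharing the other g - 1 galls.
    for k in range(3, n + 1):
        total += (k - 2) * sum(term(c, d, k)
                               for c in compositions(n, k)
                               for d in compositions(g - 1 + k, k))
    # Third term: both compositions palindromic, first a + 1 entries only.
    for a in range(1, (n - 1) // 2 + 1):
        k = 2 * a + 1
        total += sum(term(c, d, a + 1)
                     for c in compositions(n, k) if c == c[::-1]
                     for d in compositions(g - 1 + k, k) if d == d[::-1])
    return total // 2


for n in range(1, 7):
    print(n, [E(n, g) for g in range((n - 1) // 2 + 1)])
# 1 [1]
# 2 [1]
# 3 [1, 1]
# 4 [2, 4]
# 5 [3, 15, 2]
# 6 [6, 48, 18]
```

When g=0, the compositions of g-1+k into k parts are empty, so the root-gall terms vanish and the recursion reduces to the count of trees without galls.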

Example: 1 Gall

After the galled trees with no galls (Sect. 2.3), the next simplest case for enumeration of galled trees is the galled trees with only one gall. For this case, if there is no root gall, then the one gall must be in exactly one of the two subtrees descended from the root. The other subtree is a tree with no galls. If a root gall is present, then there are no other galls, and all subtrees descended from the root gall are trees with no galls.

For $n=1$, $E_{1,1}=0$. For $n \geq 2$, using the odd case Eq. 26, the first term of Eq. 26 when $g=1$ is

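$$\frac{1}{2}\sum_{c \in C(n,2)}\ \sum_{d \in C(3,2)}\ \prod_{i=1}^{2} E_{c_i,d_i-1}=\frac{1}{2}\sum_{c_1=1}^{n-1}\ \sum_{d_1=0}^{1} E_{c_1,d_1}E_{n-c_1,1-d_1}=\sum_{m=1}^{n-1} E_{m,0}E_{n-m,1}=\sum_{m=1}^{n-1} U_mE_{n-m,1}.$$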

The second term is

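$$\frac{1}{2}\sum_{k=3}^{n}(k-2)\sum_{c \in C(n,k)}\ \sum_{d \in C(k,k)}\ \prod_{i=1}^{k} E_{c_i,d_i-1}=\frac{1}{2}\sum_{k=3}^{n}(k-2)\sum_{c \in C(n,k)}\ \prod_{i=1}^{k} E_{c_i,0}=\frac{1}{2}\sum_{k=3}^{n}(k-2)\sum_{c \in C(n,k)}\ \prod_{i=1}^{k} U_{c_i}.$$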

The third term is

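$$\frac{1}{2}\sum_{a=1}^{\lfloor (n-1)/2 \rfloor}\ \sum_{c \in C_p(n,2a+1)}\ \sum_{d \in C_p(2a+1,2a+1)}\ \prod_{i=1}^{a+1} E_{c_i,d_i-1}=\frac{1}{2}\sum_{a=1}^{\lfloor (n-1)/2 \rfloor}\ \sum_{c \in C_p(n,2a+1)}\ \prod_{i=1}^{a+1} E_{c_i,0}=\frac{1}{2}\sum_{a=1}^{\lfloor (n-1)/2 \rfloor}\ \sum_{c \in C_p(n,2a+1)}\ \prod_{i=1}^{a+1} U_{c_i}.$$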

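Summing the three terms, the number of galled trees with $n \geq 2$ leaves and $g=1$ gall is:

$$E_{n,1}=\Bigg(\sum_{m=1}^{n-1} U_mE_{n-m,1}\Bigg)+\frac{1}{2}\Bigg[\Bigg(\sum_{k=3}^{n}(k-2)\sum_{c \in C(n,k)}\ \prod_{i=1}^{k} U_{c_i}\Bigg)+\Bigg(\sum_{a=1}^{\lfloor (n-1)/2 \rfloor}\ \sum_{c \in C_p(n,2a+1)}\ \prod_{i=1}^{a+1} U_{c_i}\Bigg)\Bigg]. \tag{28}$$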

Generating Functions

We now derive and analyze generating functions for An, the number of galled trees with n leaves, and En,1, the number of galled trees with n leaves and 1 gall. We also show that the exponential growth of An proceeds faster with n than the exponential growth of Un, the number of trees without galls—but that En,1 and Un follow the same exponential growth.

To analyze the generating functions, we will need the values of $A_n$ for small $n$ and $E_{n,g}$ for small $n$ and $g$. Hence, we use our recursions to exhaustively calculate the number of galled trees with $n$ leaves, $A_n$ (Eqs. 15 and 16), and the number of galled trees with $n$ leaves and $0 \leq g \leq \lfloor (n-1)/2 \rfloor$ galls, $E_{n,g}$ (Eqs. 26 and 27). Considering $n$ from 1 to 18, the numerical values appear in Table 3.

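Table 3.

Numbers of galled trees with specified numbers of leaves and galls. Columns $g=0$ to $g=8$ give the number of trees with a fixed number of galls ($E_{n,g}$).

| Number of leaves ($n$) | Total number of trees ($A_n$) | g=0 | g=1 | g=2 | g=3 | g=4 | g=5 | g=6 | g=7 | g=8 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | | | | | | | | |
| 2 | 1 | 1 | | | | | | | | |
| 3 | 2 | 1 | 1 | | | | | | | |
| 4 | 6 | 2 | 4 | | | | | | | |
| 5 | 20 | 3 | 15 | 2 | | | | | | |
| 6 | 72 | 6 | 48 | 18 | | | | | | |
| 7 | 272 | 11 | 148 | 107 | 6 | | | | | |
| 8 | 1064 | 23 | 435 | 528 | 78 | | | | | |
| 9 | 4271 | 46 | 1250 | 2295 | 661 | 19 | | | | |
| 10 | 17,497 | 98 | 3512 | 9185 | 4356 | 346 | | | | |
| 11 | 72,843 | 207 | 9726 | 34,503 | 24,564 | 3776 | 67 | | | |
| 12 | 307,307 | 451 | 26,587 | 123,612 | 123,825 | 31,289 | 1543 | | | |
| 13 | 1,310,792 | 983 | 71,975 | 426,218 | 574,149 | 216,501 | 20,720 | 246 | | |
| 14 | 5,643,555 | 2179 | 193,200 | 1,425,011 | 2,493,129 | 1,316,450 | 206,644 | 6942 | | |
| 15 | 24,493,270 | 4850 | 515,051 | 4,643,119 | 10,269,351 | 7,254,224 | 1,695,084 | 110,647 | 944 | |
| 16 | 107,043,258 | 10,905 | 1,364,896 | 14,804,696 | 40,496,606 | 36,980,263 | 12,063,205 | 1,291,278 | 31,409 | |
| 17 | 470,668,034 | 24,631 | 3,598,794 | 46,336,619 | 153,960,249 | 176,934,884 | 76,980,753 | 12,248,152 | 580,235 | 3717 |
| 18 | 2,080,681,402 | 56,011 | 9,447,028 | 142,720,317 | 567,348,929 | 803,058,979 | 450,309,678 | 99,840,890 | 7,756,699 | 142,871 |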

Entries En,g are computed recursively from Eqs. 26 and 27, and for fixed n, the entries in a row sum to the value of An computed recursively from Eqs. 15 and 16. Additional values of An used for approximating the generating function A(t) are A19=9,242,180,923, A20=41,229,189,089, A21=184,634,145,428, A22=829,732,117,279, A23=3,740,636,883,361, A24=16,912,812,764,736, and A25=76,673,344,515,050

Generating Function for An

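Define a generating function $A(t)=\sum_{n \geq 0} A_nt^n$. We rewrite Eqs. 15 and 16 in a single equation. To do so, we note $A_1=1$ and define $A_0=0$ and $A_n=0$ for non-integer values of $n$. For $n \geq 2$, we then have

$$A_n=\frac{1}{2}\Bigg[\Bigg(\sum_{m=0}^{n} A_mA_{n-m}\Bigg)+A_{\frac{n}{2}}+\Bigg(\sum_{k=3}^{n}(k-2)\sum_{c \in C(n,k)}\ \prod_{i=1}^{k} A_{c_i}\Bigg)+\Bigg(\sum_{a=1}^{\lfloor (n-1)/2 \rfloor}\ \sum_{c \in C_p(n,2a+1)}\ \prod_{i=1}^{a+1} A_{c_i}\Bigg)\Bigg]. \tag{29}$$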

We write the terms of the generating function with three components:

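$$A(t)=\sum_{n \geq 0} A_nt^n=\frac{1}{2}\Bigg[\underbrace{2t+\sum_{n \geq 2}\Bigg(\Big(\sum_{m=0}^{n} A_mA_{n-m}\Big)+A_{\frac{n}{2}}\Bigg)t^n}_{A_i(t)}+\underbrace{\sum_{n \geq 2}\Bigg(\sum_{k=3}^{n}(k-2)\sum_{c \in C(n,k)}\ \prod_{i=1}^{k} A_{c_i}\Bigg)t^n}_{A_{ii}(t)}+\underbrace{\sum_{n \geq 2}\Bigg(\sum_{a=1}^{\lfloor (n-1)/2 \rfloor}\ \sum_{c \in C_p(n,2a+1)}\ \prod_{i=1}^{a+1} A_{c_i}\Bigg)t^n}_{A_{iii}(t)}\Bigg]. \tag{30}$$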

The first term has the form of twice the generating function for the Wedderburn-Etherington numbers (Eq. 2):

$$A_i(t)=2t+A^2(t)+A(t^2). \tag{31}$$

For the second term,

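$$A_{ii}(t)=\sum_{k \geq 3}(k-2)\sum_{n \geq k}\ \sum_{c \in C(n,k)}\ \prod_{i=1}^{k}\big(A_{c_i}t^{c_i}\big)=\sum_{k \geq 3}(k-2)\sum_{i_1 \geq 0}\ \sum_{i_2 \geq 0}\cdots\sum_{i_k \geq 0} A_{i_1}A_{i_2}\cdots A_{i_k}\,t^{i_1+i_2+\cdots+i_k} \tag{32}$$
$$=\sum_{k \geq 3}(k-2)A^k(t)=\sum_{m \geq 4}(m-3)A^{m-1}(t)=\Bigg[\sum_{m \geq 1}(m-3)A^{m-1}(t)\Bigg]-\big[-2+(-1)A(t)\big]=\frac{A(t)}{[1-A(t)]^2}-\frac{2}{1-A(t)}+2+A(t). \tag{33}$$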

The step in Eq. 32 makes use of A0=0. Equation 33 is obtained if and only if the sum in the previous step converges; that is, if and only if |A(t)|<1.

Finally, for the third term,

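$$A_{iii}(t)=\sum_{n \geq 3}\Bigg(\sum_{a=1}^{\lfloor (n-1)/2 \rfloor}\ \sum_{c \in C_p(n,2a+1)}\ \prod_{i=1}^{a+1} A_{c_i}\Bigg)t^n=\sum_{m \geq 1}\ \sum_{a \geq m}\ \sum_{n \geq 2a+1}\ \sum_{c \in C(a,m)}\Bigg(\prod_{i=1}^{m}\big(A_{c_i}t^{2c_i}\big)\Bigg)A_{n-2a}t^{n-2a}=\sum_{m \geq 1}\ \sum_{i_1 \geq 0}\ \sum_{i_2 \geq 0}\cdots\sum_{i_m \geq 0} A_{i_1}A_{i_2}\cdots A_{i_m}\,t^{2i_1+2i_2+\cdots+2i_m}\sum_{\ell \geq 0} A_\ell t^\ell \tag{34}$$
$$=\sum_{m \geq 1} A(t)A^m(t^2)=\frac{A(t)}{1-A(t^2)}-A(t). \tag{35}$$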

Equation 34 holds because A0=0, and the last equality holds if and only if |A(t2)|<1.

Labeling the radius of convergence of A(t) by α and inserting into Eq. 30 the quantities in Eqs. 31, 33 and 35, for 0<|t|<α,

$$A(t)=1+t+\frac{1}{2}A^2(t)+\frac{1}{2}A(t^2)-\frac{1}{1-A(t)}+\frac{A(t)}{2[1-A(t)]^2}+\frac{A(t)}{2[1-A(t^2)]}. \tag{36}$$
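The three components of the generating function give a way to compute $A_n$ without enumerating compositions: the coefficient of $t^n$ in $A^2(t)+A(t^2)$, in $\sum_{k \geq 3}(k-2)A^k(t)$, and in $\sum_{m \geq 1}A(t)A^m(t^2)$ depends only on $A_1,\ldots,A_{n-1}$, because $A_0=0$. The sketch below is one possible implementation under that observation; the function names `poly_mul` and `galled_tree_counts` are illustrative assumptions, not part of the original analysis.

```python
def poly_mul(p, q, deg):
    """Multiply two coefficient lists, keeping terms up to t**deg."""
    out = [0] * (deg + 1)
    for i, pi in enumerate(p):
        if pi == 0:
            continue
        for j, qj in enumerate(q):
            if i + j > deg:
                break
            out[i + j] += pi * qj
    return out


def galled_tree_counts(n_max):
    """Return [A_0, A_1, ..., A_{n_max}] using the recursion of Eq. 29."""
    A = [0] * (n_max + 1)
    if n_max >= 1:
        A[1] = 1
    for n in range(2, n_max + 1):
        # A[n] is still 0 here; since A_0 = 0 it never enters the products.
        a = A[: n + 1]
        a_sq = [0] * (n + 1)              # coefficients of A(t^2) up to t^n
        for j in range(n // 2 + 1):
            a_sq[2 * j] = A[j]

        power = poly_mul(a, a, n)         # A(t)^2
        total = power[n] + a_sq[n]        # first component (Eq. 31)

        for k in range(3, n + 1):         # second component: sum (k-2) A^k
            power = poly_mul(power, a, n)
            total += (k - 2) * power[n]

        mixed = a                         # third component: sum A(t) A(t^2)^m
        for _ in range(1, (n - 1) // 2 + 1):
            mixed = poly_mul(mixed, a_sq, n)
            total += mixed[n]

        A[n] = total // 2
    return A


if __name__ == "__main__":
    counts = galled_tree_counts(25)
    for n in range(1, 26):
        print(n, counts[n])
```

The values can be compared with the $A_n$ column of Table 3 and with the additional values $A_{19},\ldots,A_{25}$ listed beneath it.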

Growth of An

We now address the asymptotic growth of An. In particular, we show that the number of galled trees grows exponentially faster in the number of leaves n than the corresponding number of trees without galls.

First, note that the radius of convergence $\alpha$ is a positive constant less than 1. The convergence radius of generating function $U(t)$ for the $U_n$ (Eq. 2) is a value $\rho \approx 0.4027$, and in particular, $0<\rho<1$ (p. 262 Landau 1977). Because $A_n>U_n$ for all $n \geq 3$, $A(t)>U(t)$ for all $0<t<\rho$. Hence, we have $\alpha \leq \rho<1$; in the Appendix, we show $\alpha>0$.

Also note that because $\alpha<1$, $t^2<t$ for $0<t<\alpha$. Because $|A(t)|<1$ for $0<t<\alpha$ and $A(t)$ increases monotonically for $0<t<\alpha$, $|A(t^2)|<1$ for $0<t<\alpha$.

To find the asymptotic growth from the generating function for galled trees, A(t), we use the asymptotics of implicit tree-like classes theorem (Meir and Moon 1989a, b; Flajolet and Sedgewick 2009, pp. 467–468). This theorem describes the asymptotic growth of the coefficients of a generating function that is described implicitly, such as in Eq. 36. We write A(t)=ϕ(t,A(t)), and we denote A(t)=w.

To use the theorem, we must first show that the function $A(t)$, defined by $\phi(t,w)=\sum_{n,k}s_{n,k}t^nw^k$, belongs to the smooth implicit-function schema. Indeed, the necessary conditions are satisfied:

  1. ϕ is analytic in t and w around 0 from Eq. 36 and the positive convergence radius α>0.

  2. A0=0.

  3. $A_n \geq 0$ for $n \geq 1$.

  4. $s_{0,1} \neq 1$, which is verified by noting that the $t^0w^1$ term in the right-hand side of Eq. 36 is equal to $-1+\frac{1}{2}+\frac{1}{2}\sum_{m \geq 0}A^m(t^2)=\frac{1}{2}\sum_{m \geq 1}A^m(t^2) \neq 1$.

  5. $s_{0,0}=0$, which follows from $\phi(0,0)=A(0)=0$, and $s_{n,k} \geq 0$, which is verified from the series expansion of Eq. 36.

  6. From Eq. 33, there exists a coefficient $s_{n,k}>0$ for $n \geq 0$ and $k \geq 2$: for example, $s_{0,2}=\frac{1}{2}-1+\frac{1}{2}\cdot 2=\frac{1}{2}$.

  7. The last condition, which we show below, is that there are solutions $\alpha$ and $w_0$ for the characteristic system:
    $\phi(\alpha,w_0)=w_0$ (37)
    $\phi_w(\alpha,w_0)=1.$ (38)

According to the theorem, functions belonging to the smooth implicit-function schema converge at the solution to the characteristic system, where they possess a square-root singularity. We conclude that $A(t)$ converges at $\alpha$, with $A(\alpha)=w_0$, and that $[t^n]A(t) \sim [\delta/(2\sqrt{\pi})]\,\alpha^{-n}n^{-3/2}$, where

$$\delta=\sqrt{\frac{2\alpha\,\phi_t(\alpha,w_0)}{\phi_{ww}(\alpha,w_0)}}. \tag{39}$$

It remains to show condition (7). We can write ϕ(t,w) as:

$$\phi(t,w)=g_1(t)+\frac{1}{2}w^2-\frac{1}{1-w}+\frac{w}{2(1-w)^2}+w\,g_2(t), \tag{40}$$

where $g_1(t)=1+t+\frac{1}{2}A(t^2)$ and $g_2(t)=1/[2(1-A(t^2))]$. Taking the derivative with respect to $w$, we have

$$\phi_w(t,w)=\frac{[-1+2g_2(t)]+[5-6g_2(t)]w+[-6+6g_2(t)]w^2+[6-2g_2(t)]w^3-2w^4}{2(1-w)^3}. \tag{41}$$

We do not know the value of $A(\alpha^2)$ that appears in $g_2(\alpha)$. $A(t)$ is monotonically increasing with $t>0$; because $\alpha^2$ is less than the radius of convergence $\alpha$, $A$ converges at $\alpha^2$ and $A(\alpha^2)$ is a finite constant. As shown above, $A(\alpha^2)<1$. To find $(\alpha,w_0)$ numerically, we first note that Eq. 41 depends on $t$ only through $A(t^2)$. Hence, we can traverse values of $y=A(t^2)$, numerically solving Eq. 38 for the single variable $w$ in terms of $y$. Solutions for $w$ must satisfy $w>y$, as $w=A(t)>A(t^2)$ by the monotonicity of $A(t)$.

Next, we see that Eq. 40 contains variables $w$, $y$, and $t$; using the pairs $(w,y)$ obtained in the previous step, we numerically solve Eq. 37 for $t$ in terms of $w$ and $y$. In the third step, for each triple $(t,w,y)$, we insert the value of $t$ into the generating function $\sum_{n=1}^{25}A_nt^{2n}$, where values $A_1,A_2,\ldots,A_{25}$ are taken from Table 3; we retain triples with small $|y-\sum_{n=1}^{25}A_nt^{2n}|$. Note that in this step, we could instead have retained triples with small $|w-\sum_{n=1}^{25}A_nt^{n}|$; faster convergence of $\sum_{n=1}^{25}A_nt^{2n}$ compared to $\sum_{n=1}^{25}A_nt^{n}$ with a fixed number of known values of $A_n$ suggests that a more accurate result is obtained by use of $y$ rather than $w$.

Finally, the best-fit triple $(t,w,y)$ gives the numerical solution for $(\alpha,w_0,A(\alpha^2))$, or (0.2073, 0.3550, 0.0450). We hence have $\alpha \approx 0.2073$ for the radius of convergence, and $A(\alpha) \approx 0.3550$. The radius $\alpha$ is indeed lower than $\rho \approx 0.4027$. Taking additional digits, the exponential order of the sequence $A_n$ is approximately $0.2073397^{-1} \approx 4.8230$, greater than that of the sequence $U_n$ for trees without galls ($0.4026975^{-1} \approx 2.4833$).

To calculate the asymptotic approximation to An, we evaluate the constant δ. We have:

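$$\phi_{ww}(t,w)=1+\frac{3w}{(1-w)^4}, \qquad \phi_t(t,w)=1+tA'(t^2)+\frac{wtA'(t^2)}{[1-A(t^2)]^2}.$$

We numerically evaluate the derivative $A'(\alpha^2)$ from the first 25 terms by $A'(\alpha^2) \approx [\sum_{n=1}^{25}A_n(\alpha^2)^n-\sum_{n=1}^{25}A_n(\alpha^2-0.001)^n]/0.001$. Inserting $\alpha \approx 0.2073397$ for $t$ and $A(\alpha) \approx 0.3550$ for $w$, we have $\phi_{ww}(\alpha,w_0) \approx 7.1533$, $A'(\alpha^2) \approx 1.0981$, $\phi_t(\alpha,w_0) \approx 1.3163$, $\delta \approx \sqrt{2 \cdot 0.2073 \cdot 1.3163/7.1533} \approx 0.2762$ by Eq. 39, and

$$A_n=[t^n]A(t) \sim 0.0779\,(4.8230^n)\,n^{-3/2}. \tag{42}$$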
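The arithmetic leading to Eq. 42 can be retraced from the reported numerical values. The short sketch below plugs the quoted values of $\alpha$, $A(\alpha)$, $A(\alpha^2)$, and $A'(\alpha^2)$ into the expressions for $\phi_{ww}$, $\phi_t$, and $\delta$; the variable names are illustrative. As an informal consistency check that is not part of the original analysis, it also compares the ratio $A_{25}/A_{24}$ with the ratio $\alpha^{-1}(24/25)^{3/2}$ implied by the asymptotic form.

```python
import math

# Numerical values quoted in the text.
alpha = 0.2073397        # radius of convergence of A(t)
w0 = 0.3550              # A(alpha)
y = 0.0450               # A(alpha^2)
dA = 1.0981              # finite-difference estimate of A'(alpha^2)

# Second derivative in w and first derivative in t of phi(t, w).
phi_ww = 1 + 3 * w0 / (1 - w0) ** 4
phi_t = 1 + alpha * dA + w0 * alpha * dA / (1 - y) ** 2

delta = math.sqrt(2 * alpha * phi_t / phi_ww)      # Eq. 39
constant = delta / (2 * math.sqrt(math.pi))        # leading constant in Eq. 42
growth = 1 / alpha                                 # exponential order

# Informal check against exact counts listed beneath Table 3.
A24 = 16_912_812_764_736
A25 = 76_673_344_515_050
observed_ratio = A25 / A24
predicted_ratio = growth * (24 / 25) ** 1.5

if __name__ == "__main__":
    print(f"phi_ww = {phi_ww:.4f}, phi_t = {phi_t:.4f}")
    print(f"delta = {delta:.4f}, constant = {constant:.4f}, growth = {growth:.4f}")
    print(f"A25/A24 = {observed_ratio:.4f}, predicted = {predicted_ratio:.4f}")
```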

Generating Function for En,1

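We next find the generating function of $E_{n,1}$, $E(t)=\sum_{n \geq 0}e_nt^n$, writing $e_n=E_{n,1}$. We define $e_0=0$, and recall that $e_1=0$ and that Eq. 28 applies for $n \geq 2$. We then have for $n \geq 1$

$$e_n=\Bigg(\sum_{m=0}^{n} U_me_{n-m}\Bigg)+\frac{1}{2}\Bigg[\Bigg(\sum_{k=3}^{n}(k-2)\sum_{c \in C(n,k)}\ \prod_{i=1}^{k} U_{c_i}\Bigg)+\Bigg(\sum_{a=1}^{\lfloor (n-1)/2 \rfloor}\ \sum_{c \in C_p(n,2a+1)}\ \prod_{i=1}^{a+1} U_{c_i}\Bigg)\Bigg]. \tag{43}$$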

We can now write

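$$E(t)=\sum_{n \geq 0} e_nt^n=\frac{1}{2}\Bigg[\underbrace{2\sum_{n \geq 3}\Bigg(\sum_{m=0}^{n} U_me_{n-m}\Bigg)t^n}_{E_i(t)}+\underbrace{\sum_{n \geq 3}\Bigg(\sum_{k=3}^{n}(k-2)\sum_{c \in C(n,k)}\ \prod_{i=1}^{k} U_{c_i}\Bigg)t^n}_{E_{ii}(t)}+\underbrace{\sum_{n \geq 3}\Bigg(\sum_{a=1}^{\lfloor (n-1)/2 \rfloor}\ \sum_{c \in C_p(n,2a+1)}\ \prod_{i=1}^{a+1} U_{c_i}\Bigg)t^n}_{E_{iii}(t)}\Bigg]. \tag{44}$$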

As in the derivation of A(t), we calculate the three parts separately.

First, because em=0 for m=0,1,2,

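$$E_i(t)=2\sum_{n \geq 0}\Bigg(\sum_{m=0}^{n} U_me_{n-m}\Bigg)t^n=2\sum_{m \geq 0}\ \sum_{n \geq m}\big(U_mt^m\big)\big(e_{n-m}t^{n-m}\big)=2\sum_{m \geq 0}\ \sum_{\ell \geq 0}\big(U_mt^m\big)\big(e_\ell t^\ell\big)=2U(t)E(t). \tag{45}$$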

For the second term, the derivation is identical to that of Eq. 33:

$$E_{ii}(t)=\frac{U(t)}{[1-U(t)]^2}-\frac{2}{1-U(t)}+2+U(t). \tag{46}$$

Analogously to Eq. 33, Eq. 46 relies on a summation that can be completed if and only if |U(t)|<1, that is, for |t|<ρ (Landau 1977, Eqs. 4 and 5).

Finally, for the third term, following the derivation of Eq. 35,

$$E_{iii}(t)=\frac{U(t)}{1-U(t^2)}-U(t), \tag{47}$$

where the last equality holds if and only if $|U(t^2)|<1$. Because $\rho<1$, $|t|^2<|t|$ for $0<|t|<\rho$, and by the monotonicity of $U(t)$ for $0<t<\rho$, $|U(t^2)|<|U(t)|<1$ for $0<|t|<\rho$.

Summarizing Eqs. 44, 45, 46, and 47, for 0<t<ρ,

$$E(t)=1+U(t)E(t)-\frac{1}{1-U(t)}+\frac{U(t)}{2[1-U(t)]^2}+\frac{U(t)}{2[1-U(t^2)]}.$$

Solving for E(t), we have

$$E(t)=\frac{1}{1-U(t)}-\frac{1}{[1-U(t)]^2}+\frac{U(t)}{2[1-U(t)]^3}+\frac{U(t)}{2[1-U(t)][1-U(t^2)]}. \tag{48}$$
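The coefficients $e_n=E_{n,1}$ encoded by $E(t)$ can be generated from Eq. 43 by truncated series products, using the same identification of the second and third components with $\sum_{k \geq 3}(k-2)U^k(t)$ and $\sum_{m \geq 1}U(t)U^m(t^2)$. The sketch below is illustrative only; the names `poly_mul` and `one_gall_counts` are assumptions, and the $U_n$ are obtained from the Wedderburn-Etherington recursion $U_n=\frac{1}{2}[\sum_{m=1}^{n-1}U_mU_{n-m}+U_{n/2}]$ with $U_1=1$.

```python
def poly_mul(p, q, deg):
    """Multiply two coefficient lists, keeping terms up to t**deg."""
    out = [0] * (deg + 1)
    for i, pi in enumerate(p):
        if pi == 0:
            continue
        for j, qj in enumerate(q):
            if i + j > deg:
                break
            out[i + j] += pi * qj
    return out


def one_gall_counts(n_max):
    """Return (U, e) with U[n] = U_n and e[n] = E_{n,1} for 0 <= n <= n_max."""
    U = [0] * (n_max + 1)
    e = [0] * (n_max + 1)
    if n_max >= 1:
        U[1] = 1
    for n in range(2, n_max + 1):
        # Wedderburn-Etherington recursion for trees without galls.
        conv = sum(U[m] * U[n - m] for m in range(1, n))
        if n % 2 == 0:
            conv += U[n // 2]
        U[n] = conv // 2

    for n in range(3, n_max + 1):
        u = U[: n + 1]
        u_sq = [0] * (n + 1)              # coefficients of U(t^2) up to t^n
        for j in range(n // 2 + 1):
            u_sq[2 * j] = U[j]

        # First term of Eq. 43 (U_0 = e_0 = 0, so m runs over 1..n-1).
        first = sum(U[m] * e[n - m] for m in range(1, n))

        power = poly_mul(u, u, n)         # U(t)^2
        second = 0
        for k in range(3, n + 1):         # sum (k-2) [t^n] U^k
            power = poly_mul(power, u, n)
            second += (k - 2) * power[n]

        mixed = u
        third = 0
        for _ in range(1, (n - 1) // 2 + 1):   # sum [t^n] U(t) U(t^2)^m
            mixed = poly_mul(mixed, u_sq, n)
            third += mixed[n]

        e[n] = first + (second + third) // 2
    return U, e


if __name__ == "__main__":
    U, e = one_gall_counts(18)
    for n in range(1, 19):
        print(n, U[n], e[n])
```

The two printed columns can be compared with the $g=0$ and $g=1$ columns of Table 3.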

Growth of En,1

We now show that the asymptotic growth of the number of galled trees with one gall follows the asymptotic exponential growth of the number of trees with no galls. We also find the asymptotic approximation of En,1.

First, $E(t)>U(t)$. From the form of Eq. 48, $E(t)$ converges if and only if $|U(t)|<1$. It is shown in Eqs. 4 and 5 of Landau (1977) that $0<U(t)<1$ for $0<t<\rho$, with $\lim_{t \to \rho^-}U(t)=1$. We conclude that $E(t)$ has the same radius of convergence $\rho$ as $U(t)$. To find the asymptotic behavior of $E(t)$, we notice that

$$E(t)=\frac{U(t)[2U(t)-1]}{2[1-U(t)]^3}+\frac{U(t)}{2[1-U(t)][1-U(t^2)]}. \tag{49}$$

As $t \to \rho^-$, with $U(t) \sim 1-\gamma\sqrt{1-t/\rho}$ and $\gamma \approx 1.1300$ (Flajolet and Sedgewick 2009, pp. 476–477), $1-U(t) \to 0$. Hence, the first of the two terms in Eq. 49 is the leading term as $t \to \rho^-$, producing

$$E(t) \sim \frac{1}{2\gamma^3(1-t/\rho)^{3/2}}. \tag{50}$$

At this point, we seek to use transfer theorems to transfer the asymptotic equivalence for E(t) to an asymptotic equivalence for its coefficients. To do so, we note that U(t) satisfies the technical criterion that it is Δ-analytic at ρ—that is, it is analytic in a domain Δ of particular shape around the singularity at ρ. The computation of E(t) from U(t) maintains the property that E(t) is Δ-analytic with a singularity at ρ.

We can therefore use a transfer formula [Corollary VI.1, page 392 and Theorem VI.4, page 393 in Flajolet and Sedgewick (2009)], according to which, if $f(t)$ is $\Delta$-analytic with a singularity at $b$, and $f(t) \sim (1-t/b)^{-a}$ as $t/b \to 1$ with $t$ in $\Delta$, and $a \notin \{0,-1,-2,\ldots\}$, then the coefficients of $f$ satisfy $[t^n]f(t) \sim n^{a-1}b^{-n}/\Gamma(a)$. Using Eq. 50, we apply the transfer formula to $E(t)$ with $\rho$ in the role of $b$ and $\frac{3}{2}$ for $a$, noting $\Gamma(\frac{3}{2})=\sqrt{\pi}/2$:

$$E_{n,1} \sim \frac{1}{2\gamma^3\,\Gamma(\frac{3}{2})}\,n^{1/2}\rho^{-n}=\frac{1}{\gamma^3\sqrt{\pi}}\,n^{1/2}\rho^{-n}. \tag{51}$$

$E_{n,1}$ and $U_n$ have the same exponential growth. Whereas $U_n$ has subexponential term $0.3188\,n^{-3/2}$, however, $E_{n,1}$ has larger subexponential term $0.3910\,n^{1/2}$.
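As a worked check of the constant in Eq. 51, the sketch below evaluates $1/(2\gamma^3\Gamma(\frac{3}{2}))$ and $1/(\gamma^3\sqrt{\pi})$ from the quoted value $\gamma \approx 1.1300$, together with $1/\rho$ from $\rho \approx 0.4026975$. The function name `one_gall_estimate` is an illustrative assumption; it returns only the leading-order approximation, which need not be close to the exact $E_{n,1}$ at small $n$.

```python
import math

gamma_const = 1.1300     # gamma in U(t) ~ 1 - gamma * sqrt(1 - t/rho)
rho = 0.4026975          # radius of convergence of U(t)

# Leading constant of Eq. 51, written in its two equivalent forms.
d1 = 1 / (2 * gamma_const ** 3 * math.gamma(1.5))
d1_alt = 1 / (gamma_const ** 3 * math.sqrt(math.pi))


def one_gall_estimate(n):
    """Leading-order approximation d1 * n^(1/2) * rho^(-n) to E_{n,1}."""
    return d1 * math.sqrt(n) * rho ** (-n)


if __name__ == "__main__":
    print(f"d1 = {d1:.4f}, 1/rho = {1 / rho:.4f}")
    print(f"estimate for n = 18: {one_gall_estimate(18):.0f}")
```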

Bivariate Generating Function for En,g

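We now find the bivariate generating function $A(t,u)=\sum_{n \geq 0}\sum_{g \geq 0}E_{n,g}t^nu^g$. First, note that $E_{0,g}=0$ for each $g \geq 0$. For $n=1$, $E_{1,0}=1$ and $E_{1,g}=0$ for $g \geq 1$. From the recursion for $E_{n,g}$ (Eqs. 26, 27), we get

$$A(t,u)=\frac{1}{2}\Bigg[\underbrace{2t+\sum_{n \geq 2}\ \sum_{g \geq 0}\Bigg(\Big(\sum_{c \in C(n,2)}\ \sum_{d \in C(g+2,2)}\ \prod_{i=1}^{2}E_{c_i,d_i-1}\Big)+E_{\frac{n}{2},\frac{g}{2}}\Bigg)t^nu^g}_{A_i(t,u)}+\underbrace{\sum_{n \geq 2}\ \sum_{g \geq 0}\Bigg(\sum_{k=3}^{n}(k-2)\sum_{c \in C(n,k)}\ \sum_{d \in C(g-1+k,\,k)}\ \prod_{i=1}^{k}E_{c_i,d_i-1}\Bigg)t^nu^g}_{A_{ii}(t,u)}+\underbrace{\sum_{n \geq 2}\ \sum_{g \geq 0}\Bigg(\sum_{a=1}^{\lfloor (n-1)/2 \rfloor}\ \sum_{c \in C_p(n,2a+1)}\ \sum_{d \in C_p(g-1+2a+1,\,2a+1)}\ \prod_{i=1}^{a+1}E_{c_i,d_i-1}\Bigg)t^nu^g}_{A_{iii}(t,u)}\Bigg], \tag{52}$$

where $E_{m,\ell}=0$ if at least one of $(m,\ell)$ is not in $\mathbb{N}$.

We can solve to find an expression for $A(t,u)$ in a manner similar to the solution for $A(t)$. For the second and third terms, we have $\sum_{i=1}^{k}(d_i-1)=g-1$ and $\sum_{i=1}^{2a+1}(d_i-1)=g-1$; in these terms, the $g$th gall is the root gall. Therefore,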

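$$A_i(t,u)=2t+\sum_{n \geq 2}\ \sum_{g \geq 0}\ \sum_{m=0}^{n}\ \sum_{\ell=0}^{g}E_{m,\ell}E_{n-m,g-\ell}\,t^nu^g+\sum_{n \geq 0}\ \sum_{g \geq 0}E_{n,g}t^{2n}u^{2g}=2t+\sum_{m \geq 0}\ \sum_{\ell \geq 0}E_{m,\ell}t^mu^\ell\sum_{n \geq m}\ \sum_{g \geq \ell}E_{n-m,g-\ell}\,t^{n-m}u^{g-\ell}+A(t^2,u^2)=2t+A^2(t,u)+A(t^2,u^2), \tag{53}$$
$$A_{ii}(t,u)=\sum_{k \geq 3}(k-2)\sum_{n \geq k}\ \sum_{g \geq 1}\ \sum_{c \in C(n,k)}\ \sum_{d \in C(g-1+k,\,k)}\Bigg(\prod_{i=1}^{k}E_{c_i,d_i-1}t^{c_i}u^{d_i-1}\Bigg)u=u\sum_{k \geq 3}(k-2)\sum_{i_1 \geq 0}\ \sum_{j_1 \geq 0}E_{i_1,j_1}t^{i_1}u^{j_1}\sum_{i_2 \geq 0}\ \sum_{j_2 \geq 0}E_{i_2,j_2}t^{i_2}u^{j_2}\cdots\sum_{i_k \geq 0}\ \sum_{j_k \geq 0}E_{i_k,j_k}t^{i_k}u^{j_k}=u\sum_{k \geq 3}(k-2)A^k(t,u)=u\Bigg[\frac{A(t,u)}{[1-A(t,u)]^2}-\frac{2}{1-A(t,u)}+2+A(t,u)\Bigg], \tag{54}$$
$$A_{iii}(t,u)=\sum_{m \geq 1}\ \sum_{a \geq m}\ \sum_{n \geq 2a+1}\ \sum_{b \geq 0}\ \sum_{g-1 \geq 2b}\ \sum_{c \in C(a,m)}\ \sum_{d \in C(b+m,\,m)}\Bigg(\prod_{i=1}^{m}E_{c_i,d_i-1}t^{2c_i}u^{2(d_i-1)}\Bigg)E_{n-2a,(g-1)-2b}\,t^{n-2a}u^{(g-1)-2b}\,u=u\sum_{m \geq 1}\ \sum_{i_1 \geq 0}\ \sum_{j_1 \geq 0}E_{i_1,j_1}t^{2i_1}u^{2j_1}\sum_{i_2 \geq 0}\ \sum_{j_2 \geq 0}E_{i_2,j_2}t^{2i_2}u^{2j_2}\cdots\sum_{i_m \geq 0}\ \sum_{j_m \geq 0}E_{i_m,j_m}t^{2i_m}u^{2j_m}\sum_{\ell \geq 0}\ \sum_{p \geq 0}E_{\ell,p}t^\ell u^p=u\sum_{m \geq 1}A(t,u)A^m(t^2,u^2)=u\Bigg[\frac{A(t,u)}{1-A(t^2,u^2)}-A(t,u)\Bigg]. \tag{55}$$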

In summary, inserting Eqs. 53, 54, and 55 into Eq. 52,

$$A(t,u)=u+t+\frac{1}{2}A^2(t,u)+\frac{1}{2}A(t^2,u^2)-\frac{u}{1-A(t,u)}+\frac{uA(t,u)}{2[1-A(t,u)]^2}+\frac{uA(t,u)}{2[1-A(t^2,u^2)]}. \tag{56}$$

The Distribution of the Number of Galled Trees with a Fixed Number of Leaves

The bivariate generating function $A(t,u)$ provides a basis for studying the distribution of the number of galls across galled trees with $n$ leaves. The approach follows a theorem concerning asymptotic distributions in Theorem 2.23 of Drmota (2009) and Proposition IX.17 on p. 682 of Flajolet and Sedgewick (2009). We use the form of the theorem quoted in Theorem 2 of Bouvel et al. (2020), who considered labeled galled trees. We conclude that for a fixed number of leaves, the number of galled trees as a function of the number of galls $g$ is asymptotically normally distributed with mean and variance linear in $n$.

Following Bouvel et al. (2020), we consider a power series $C(z,x)$ in two variables that is defined implicitly as the solution of $C(z,x)=F(z,x,C(z,x))$, where $F$ satisfies certain conditions. We suppose $\{X_n\}$ is a sequence of random variables such that $\mathbb{E}[x^{X_n}]=[z^n]C(z,x)/[z^n]C(z,1)$. Then $X_n$ is asymptotically normally distributed with a mean and variance that are linear multiples of $n$ calculated from $F$.

In our scenario, $t$, $u$, and $A$ play the roles of $z$, $x$, and $C$. $A(t,u)$ is implicitly defined as a function of $t$, $u$, and $A$ itself. With $A(t,u)=\sum_{n \geq 0}\sum_{g \geq 0}E_{n,g}t^nu^g$, $X_n$ gives the random number of galls in a randomly selected galled tree with $n$ leaves. Fixing the number of leaves $n$ in $A(t,u)$, this random variable satisfies

$$\mathbb{E}[u^{X_n}]=\frac{\sum_{g \geq 0}E_{n,g}t^nu^g}{\sum_{g \geq 0}E_{n,g}t^n}=\frac{\sum_{g \geq 0}E_{n,g}u^g}{\sum_{g \geq 0}E_{n,g}}=\frac{[t^n]A(t,u)}{[t^n]A(t,1)}. \tag{57}$$

To conclude that the random variable $X_n$, the random number of galls in a tree with $n$ leaves, is asymptotically normally distributed, it remains only to verify the conditions of the theorem.

Translating from the notation of Bouvel et al. (2020) and writing A(t,u)=ψ(t,u,A(t,u)) with A(t,u)=w so that w=ψ(t,u,w), we must show all of the following:

  1. $\psi(t,u,w)$ is analytic in $t$, $u$, $w$ around 0.

  2. $\psi(0,u,w)=0$.

  3. $\psi(t,u,0) \neq 0$ for $t>0$.

  4. All coefficients $[t^nu^g]\psi(t,u,w)$ are real and nonnegative.

  5. Nonnegative solutions $(t,w)=(t_0,w_0)$ exist for the following pair of equations:
    i. $\psi(t,1,w)=w$, ii. $\psi_w(t,1,w)=1$.
  6. The solutions satisfy:
    i. $\psi_{ww}(t_0,1,w_0) \neq 0$, ii. $\psi_t(t_0,1,w_0) \neq 0$.

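Condition 1 holds because $\psi$ is a quotient of analytic functions in $t$, $u$, $w$ with denominator greater than 0 near (0, 0, 0). Condition 2 is met because $A(t,u)=\sum_{n \geq 0}\sum_{g \geq 0}E_{n,g}t^nu^g=\sum_{n \geq 1}\sum_{g \geq 0}E_{n,g}t^nu^g$ (because $E_{0,g}=0$ for all $g \geq 0$) and so $A(0,u)=\sum_{n \geq 1}\sum_{g \geq 0}E_{n,g}0^nu^g=0$. Therefore,

$$\psi(0,u,A)=u+0+\frac{1}{2}A^2(0,u)+\frac{1}{2}A(0^2,u^2)-\frac{u}{1-A(0,u)}+\frac{uA(0,u)}{2[1-A(0,u)]^2}+\frac{uA(0,u)}{2[1-A(0^2,u^2)]}=u+0+0+0-u+0+0=0.$$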

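For condition 3, setting $w=0$ in Eq. 56 gives $\psi(t,u,0)=u+t+0+\frac{1}{2}A(t^2,u^2)-u+0+0=t+\frac{1}{2}A(t^2,u^2)$, which is not equal to 0 for $t>0$. Condition 4 holds trivially from the definition of $A(t,u)$. For conditions 5 and 6, we first show $\psi(t,1,w)=\phi(t,w)$. First,

$$\psi(t,1,A(t,1))=1+t+\frac{1}{2}A^2(t,1)+\frac{1}{2}A(t^2,1^2)-\frac{1}{1-A(t,1)}+\frac{1 \cdot A(t,1)}{2[1-A(t,1)]^2}+\frac{1 \cdot A(t,1)}{2[1-A(t^2,1^2)]}.$$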

Next, we have

$$A(t,1)=\sum_{n \geq 0}\ \sum_{g \geq 0}E_{n,g}t^n1^g=\sum_{n \geq 0}\Bigg(\sum_{g \geq 0}E_{n,g}\Bigg)t^n=\sum_{n \geq 0}A_nt^n=A(t).$$

We then have ψ(t,1,w)=ϕ(t,w). We have already shown conditions 5 and 6 in our analysis of function ϕ.

With all the conditions demonstrated, we conclude that the random number of galls in a tree with $n$ leaves is asymptotically normally distributed.

Numerical Results

The numerical results for the number of galled trees with $n$ leaves and the number of galled trees with $0 \leq g \leq \lfloor (n-1)/2 \rfloor$ galls suggest a number of simple observations (Table 3). First, for $g=0$, we recover the Wedderburn-Etherington numbers obtained from Eq. 1. For $n=1$ to 6, we obtain the values of $A_n$ and $E_{n,g}$ computed by Mathur and Rosenberg (2023). Finally, as $g$ is bounded above by $g_{\max}=\lfloor (n-1)/2 \rfloor$, pairs of consecutive values of $n$, an odd then an even integer, have the same number of values of $g$ for which the number of galled trees $E_{n,g}$ is nonzero, namely $\lfloor (n+1)/2 \rfloor$.

Considering a fixed number of leaves $n \leq 18$, we comment informally on the number of galled trees across different values of $g$. The number of trees with at least one gall is larger than the number without galls. As $g$ increases for fixed $n$, the number of trees increases to a maximum, then declines. For values of $n$ for which the maximal number of galls is even ($n=1,2,5,6,9,10,13,14,17,18$), the largest number of trees occurs when the number of galls is $g_{\max}/2$, half of this maximum. When $g_{\max}$ is odd ($n=3,4,7,8,11,12,15,16$), the largest number of trees occurs at $g=(g_{\max}-1)/2$ or $g=(g_{\max}+1)/2$.

Figure 3 plots the number of trees for fixed n as a function of g, considering four consecutive values of n that represent the four cases possible for the parity of n and gmax. The plots are somewhat symmetric; for n=16 and 17, a neighboring value of g produces a number of trees close to the maximum, and for n=15 and 18, the peak stands out more clearly. The patterns accord with the asymptotic normal distribution demonstrated for the number of galls as n increases (Sect. 5.6).

Fig. 3.

Number of galled trees as a function of the number of galls g, for fixed numbers of leaves n. A n=15. B n=16. C n=17. D n=18. Values are computed from Eqs. 26 and 27

Figure 4 examines the growth of $E_{n,g}$ on a logarithmic scale for different fixed values of $g$. The number of trees with no galls has exponential growth $d_0\rho^{-n}n^{-3/2}$, for constants $d_0 \approx 0.3188$ and $1/\rho \approx 2.4833$ (Sect. 2.3). With one gall, $E_{n,1}$ exceeds $U_n$ with growth $d_1\rho^{-n}n^{1/2}$ for $d_1 \approx 0.3910$, but with the same exponential growth (Sect. 5.4). With specified numbers of galls $g \geq 2$, we see that growth of $E_{n,g}$ for fixed $g$ appears to also follow an exponential trend.

Fig. 4.

Number of galled trees as a function of the number of leaves n, for fixed numbers of galls g=0,1,2,3,4,5,6. Values are computed from Eqs. 26 and 27. The y-axis appears on a logarithmic scale

Discussion

Building on the Wedderburn-Etherington recursion for enumerating rooted binary unlabeled trees with n leaves (Eq. 1), we have introduced a recursion to enumerate rooted binary unlabeled (normal) galled trees with n leaves. The recursion follows the spirit of the Wedderburn-Etherington formula in its recursive descent from the tree root—but with additional terms for cases in which the root of the tree is also the top node of a gall (Eqs. 15 and 16). Continuing with a similar recursive strategy, we have also obtained a recursive formula for the number of galled trees with a fixed number of leaves n and a fixed number of galls g (Eqs. 26 and 27). We have derived generating functions for the number of galled trees (Eq. 36) and for the number of galled trees with 1 gall (Eq. 48), analyzing their asymptotic behavior.

Our numerical calculations find that for small n, for a fixed number of galls g, the increase of the number of galled trees En,g with n appears faster for larger values of the fixed number of galls g (Table 3, Fig. 4). Because En,g=0 for n<2g+1, for higher values of g, values of En,g at small n do not reflect the asymptotic trend. Nevertheless, for g=1, the initial apparent rapid growth of En,g visible with increasing n moderates, in accord with the finding that the exponential order of the increase is the same as for the case of no galls (Sect. 5.4). A similar moderation in growth with increasing n is just observable for En,2, which could hint at a similar exponential growth; we can conjecture that each En,g with fixed g has the same exponential growth. Note that Fuchs et al. (2019, Theorem 5.1; 2022, Theorem 1 and Corollary 2) showed that the exponential growth of labeled tree-child networks and normal networks with a fixed number of reticulation vertices (corresponding to a fixed number of galls in our case) is the same for any such number; only the subexponential growth differs.

On the other hand, when $g$ is not restricted, we have shown in Sect. 5.2 that the convergence radius of the generating function $A(t)$ satisfies $0<\alpha<\rho<1$, so that $A_n$ grows with $0.0779\,(4.8230^n)\,n^{-3/2}$. We also observed that the number of galled trees $A_n$ grows numerically faster with $n$ than does the number of trees with no galls (Table 3).

For a fixed number of leaves $n$, the number of trees $E_{n,g}$ with a fixed number of galls increases to a maximum when the number of galls is at or near half the maximum number of galls $\lfloor (n-1)/2 \rfloor$, then decreases. This pattern accords with the normal distribution we expect as $n$ increases (Sect. 5.6). It is explained by the fact that many ways often exist to add a gall to a tree with a small number of galls without changing the number of lineages $n$ (Fig. 5A). As the number of galls grows, fewer places are available in the tree to add more galls (Fig. 5B), and the number of possible trees declines. Informally, for a tree with $n$ leaves, when we have a maximum of $g_{\max}$ potential galls from which to choose, the binomial coefficient $\binom{g_{\max}}{g}$, describing the number of possible subsets containing $g$ galls, is highest for $g$ near $g_{\max}/2$.

Fig. 5.

The addition of a gall to a galled tree. A One of the galled trees with 5 leaves and 0 galls. B The ten galled trees with 5 leaves and 1 gall produced by adding a gall to the tree in (A). The gall is indicated in blue. C The sole galled tree with 5 leaves and 2 galls produced by adding a gall to the tree highlighted in (B). The new gall is indicated in orange. A tree with 5 leaves has at most 2 galls (Color figure online)

Galled trees provide a class of networks for use with biological processes such as admixture of populations, horizontal gene transfer, hybridization, and the recombination processes for which galled trees were originally introduced (Wang et al. 2001; Gusfield et al. 2004a). Other definitions of galled trees have previously been considered in enumerative problems (Semple and Steel 2006; Chang et al. 2018; Bouvel et al. 2020; Cardona and Zhang 2020); our definition, which requires galled trees to be “normal” by imposing a minimum of four nodes per gall, is designed for scenarios in which two lineages merge to form a new third lineage, but continue to have other descendants that are not descended from this merging event. Such scenarios are suited to phenomena such as admixture and hybridization, in which the merging process of two groups to form a third group has this feature: it does not cause the disappearance of the original two groups, which are free to produce additional descendants through processes that do not involve admixture and hybridization.

In related work, Cardona and Zhang (2020) enumerated rooted binary labeled normal galled trees. Their Theorem 8 finds that the number Mn of such trees with n leaves is

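$$M_n=\sum_{(k_2,k_3,\ldots,k_n) \in \mathcal{C}}\frac{(n+k_2+\cdots+k_n-1)!\ 1^{k_3}2^{k_4}\cdots(n-2)^{k_n}}{k_2!\,k_3!\cdots k_n!\ 2^{k_2+k_3+\cdots+k_n}}, \tag{58}$$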

where C is the set of vectors (k2,k3,,kn) of nonnegative integers satisfying 1+k2+2k3++(n-1)kn=n. This enumeration accords with our enumeration of the corresponding unlabeled normal galled trees. For n=1 and 2, Eq. 58 produces 1 rooted binary labeled normal galled tree; for n=3, it gives 6 labeled trees—in accord with our count of 2 unlabeled normal galled trees, each of which has 3 possible labelings. For n=4, Eq. 58 gives 69 labeled trees; the 6 unlabeled normal galled trees in Table 2 have 12, 12, 3, 12, 6, and 24 labelings, respectively, summing to 69.
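The values quoted from Eq. 58 can be reproduced by enumerating the vectors $(k_2,k_3,\ldots,k_n)$ directly. The following sketch is an illustration under the reading of Eq. 58 given above, with hypothetical helper names `exponent_vectors` and `labeled_normal_galled_trees`; exact rational arithmetic is used so that no rounding enters the sum.

```python
from fractions import Fraction
from math import factorial


def exponent_vectors(n):
    """Yield (k_2, ..., k_n) of nonnegative integers with
    1 + k_2 + 2*k_3 + ... + (n-1)*k_n = n."""
    def extend(j, remaining):
        # Each unit of k_j contributes (j - 1) to the constrained sum.
        if j > n:
            if remaining == 0:
                yield ()
            return
        for kj in range(remaining // (j - 1) + 1):
            for rest in extend(j + 1, remaining - kj * (j - 1)):
                yield (kj,) + rest

    yield from extend(2, n - 1)


def labeled_normal_galled_trees(n):
    """Evaluate M_n from Eq. 58 by summing over all admissible vectors."""
    total = Fraction(0)
    for ks in exponent_vectors(n):
        numerator = factorial(n + sum(ks) - 1)
        denominator = 2 ** sum(ks)
        for j, kj in zip(range(2, n + 1), ks):
            if j >= 3:
                numerator *= (j - 2) ** kj   # factors 1^{k_3} 2^{k_4} ...
            denominator *= factorial(kj)
        total += Fraction(numerator, denominator)
    if total.denominator != 1:
        raise ValueError("expected an integer count")
    return int(total)


if __name__ == "__main__":
    print([labeled_normal_galled_trees(n) for n in range(1, 7)])
```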

The enumeration of galled trees can assist in studies involving mixture processes in the same way that the Wedderburn-Etherington enumeration assists in evolutionary biology more generally, by describing the contents of a space of biologically relevant trees that must be traversed in a variety of algorithmic, combinatorial, probabilistic, and statistical problems [e.g. Harding (1971), Matsen and Evans (2012), Sievers et al. (2014), Colijn and Plazzotta (2018), Rosenberg (2021)]. The study adds to the growing area of enumerative combinatorics of phylogenetic networks [e.g. Bouvel et al. (2020), Cardona and Zhang (2020), Gunawan et al. (2020), Bienvenu et al. (2022), Fuchs et al. (2022)] and is one of relatively few studies to examine a class of unlabeled networks (Chang et al. 2018; Mathur and Rosenberg 2023). Further work can investigate the properties of $E_{n,g}$ for fixed $g \geq 2$.

Acknowledgements

We are grateful to M. Fuchs, B. Gittenberger, and a reviewer; comments from the reviewer and discussions with M. Fuchs and B. Gittenberger have assisted with the asymptotic analyses of the generating functions. We acknowledge NSF grant BCS-2116322 and the Scholarship for Outstanding Postdoctoral Fellows in Data Science given by the council for Higher Education of Israel.

Appendix

This appendix shows that the radius of convergence $\alpha$ for the generating function $A(t)=\sum_{n \geq 0}A_nt^n$ is positive. The approach is to bound $A_n$ from above by a quantity whose generating function is known to have a positive radius of convergence. For clarity in comparing with Bouvel et al. (2020) and Cardona and Zhang (2020), we retain the terms “rooted,” “labeled” and “unlabeled,” and “normal” in this appendix.

Our number of rooted unlabeled normal galled trees $A_n$ is bounded above by $M_n$, the corresponding number of rooted labeled normal galled trees in Theorem 8 of Cardona and Zhang (2020) (Eq. 58). $M_n$ is in turn bounded above by the number of rooted labeled galled trees tabulated without imposing the normality requirement, a quantity of Bouvel et al. (2020) that we call $Q_n$. Section 5 of Bouvel et al. (2020) showed that the exponential generating function $Q(t)=\sum_{n \geq 0}Q_nt^n/n!$ has positive radius of convergence $r=\frac{1}{8}$.

To prove that $\alpha>0$, we note that the number of rooted labeled normal galled trees with $n$ leaves is $M_n=\sum_{i=1}^{A_n}M(T_i)$. Here, the sum proceeds over the $A_n$ rooted unlabeled normal galled trees, as each rooted labeled normal galled tree is obtained by placing a labeling on one of the rooted unlabeled normal galled trees. $M(T_i)$ is the number of labelings of rooted unlabeled normal galled tree $T_i$.

Next, consider the concept of a symmetric node of a rooted unlabeled galled tree, an internal node with two identical rooted unlabeled subtrees. The top node of a gall can be symmetric, but side nodes of a gall cannot, as one subtree of a side node contains the reticulation node and the other does not. A reticulation node also cannot be a symmetric node, as it has only one subtree.

For a rooted unlabeled normal galled tree, the number of distinct labelings is $M(T_i)=n!/2^{s_i}$, where $s_i$ is the number of symmetric nodes of $T_i$. To see why this result holds, consider a planar representation of $T_i$, and examine all $n!$ labelings of the $n$ leaves. For each such labeling, for each symmetric node, a rotation of $T_i$ around the node generates a distinct labeling for the same labeled tree, so that each rooted labeled normal galled tree is obtained from $2^{s_i}$ of the $n!$ labelings.

The number of symmetric nodes is bounded above by the maximal number of internal nodes that are not side nodes or hybridization nodes. This number is n-1, the number of internal nodes of a rooted tree with no galls; note that each gall adds two internal nodes to the tree, but neither of the “extra” nodes can be symmetric, as they include a side node and a reticulation node.

It is convenient to use n rather than n-1 for the upper bound on the number of symmetric nodes. Then

$$\frac{A_n}{2^n}\,n!=\sum_{i=1}^{A_n}\frac{n!}{2^n}<\sum_{i=1}^{A_n}\frac{n!}{2^{s_i}}=M_n \leq Q_n,$$

so that

$$A_n<\frac{Q_n}{n!}\,2^n. \tag{59}$$

The generating function $Q(t)$ has positive radius of convergence $r=\frac{1}{8}$, so that $Q(2t)$ converges for $|t|<\frac{1}{16}$. Multiplying Eq. 59 by $t^n$ and summing over all $n$, we have $|A(t)|<|Q(2t)|$ for $0<|t|<\frac{1}{16}$. Hence, the smaller $A(t)$ must have positive radius of convergence $\alpha \geq \frac{1}{16}$. In particular, $\alpha>0$.

References

  1. Bienvenu F, Lambert A, Steel M. Combinatorial and stochastic properties of ranked tree-child networks. Random Struct Algorithms. 2022;60:653–689. doi: 10.1002/rsa.21048. [DOI] [Google Scholar]
  2. Bouvel M, Gambette P, Mansouri M. Counting phylogenetic networks of level 1 and 2. J Math Biol. 2020;81:1357–1395. doi: 10.1007/s00285-020-01543-5. [DOI] [PubMed] [Google Scholar]
  3. Cardona G, Zhang L. Counting and enumerating tree-child networks and their subclasses. J Comput Syst Sci. 2020;114:84–104. doi: 10.1016/j.jcss.2020.06.001. [DOI] [Google Scholar]
  4. Chang K-Y, Hon W-K, Thankachan SV (2018) Compact encoding for galled-trees and its applications. In: 2018 Data Compression Conference, Snowbird, UT, pp 297–306
  5. Colijn C, Plazzotta G. A metric on phylogenetic tree shapes. Syst Biol. 2018;67:113–126. doi: 10.1093/sysbio/syx046. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Comtet L. Advanced combinatorics. Boston: Reidel; 1974. [Google Scholar]
  7. Drmota M. Random trees. Vienna: Springer; 2009. [Google Scholar]
  8. Felsenstein J. Inferring phylogenies. Sunderland, MA: Sinauer; 2004. [Google Scholar]
  9. Flajolet P, Sedgewick R. Analytic combinatorics. Cambridge: Cambridge University Press; 2009. [Google Scholar]
  10. Fuchs M, Gittenberger B, Mansouri M. Counting phylogenetic networks with few reticulation vertices: tree-child and normal networks. Australas J Comb. 2019;73:385–423. [Google Scholar]
  11. Fuchs M, Huang E-Y, Yu G-R. Counting phylogenetic networks with few reticulation vertices: a second approach. Discr Appl Math. 2022;320:140–149. doi: 10.1016/j.dam.2022.03.026. [DOI] [Google Scholar]
  12. Gascuel O. Mathematics of evolution and phylogeny. Oxford: Oxford University Press; 2005. [Google Scholar]
  13. Gunawan AD, Rathin J, Zhang L. Counting and enumerating galled networks. Discrete Appl Math. 2020;283:644–654. doi: 10.1016/j.dam.2020.03.005. [DOI] [Google Scholar]
  14. Gusfield D. Optimal, efficient reconstruction of root-unknown phylogenetic networks with constrained and structured recombination. J Comput Syst Sci. 2005;70:381–398. doi: 10.1016/j.jcss.2004.12.009. [DOI] [Google Scholar]
  15. Gusfield D. ReCombinatorics. Cambridge: MIT Press; 2014. [Google Scholar]
  16. Gusfield D, Eddhu S, Langley C (2003) Efficient reconstruction of phylogenetic networks with constrained recombination. In: Computational Systems Bioinformatics, CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference, Stanford, CA, pp 363–374 [PubMed]
  17. Gusfield D, Eddhu S, Langley C (2004a) The fine structure of galls in phylogenetic networks. INFORMS J Comput 16:459–469
  18. Gusfield D, Eddhu S, Langley C (2004b) Optimal, efficient reconstruction of phylogenetic networks with constrained recombination. J Bioinform Comput Biol 2:173–213 [DOI] [PubMed]
  19. Harding EF. The probabilities of rooted tree-shapes generated by random bifurcation. Adv Appl Prob. 1971;3:44–77. doi: 10.2307/1426329.
  20. Huson DH, Rupp R, Scornavacca C. Phylogenetic networks: concepts, algorithms and applications. Cambridge: Cambridge University Press; 2010.
  21. Kong S, Pons JC, Kubatko L, Wicke K. Classes of explicit phylogenetic networks and their biological and mathematical significance. J Math Biol. 2022;84:47. doi: 10.1007/s00285-022-01746-y.
  22. Landau BV. An asymptotic expansion for the Wedderburn–Etherington sequence. Mathematika. 1977;24:262–265. doi: 10.1112/S0025579300009177.
  23. Mathur S, Rosenberg NA. All galls are divided into three or more parts: recursive enumeration of labeled histories for galled trees. Algorithms Mol Biol. 2023;18:1. doi: 10.1186/s13015-023-00224-4.
  24. Matsen FA, Evans SN. Ubiquity of synonymity: almost all large binary trees are not uniquely identified by their spectra or their immanantal polynomials. Algorithms Mol Biol. 2012;7:14. doi: 10.1186/1748-7188-7-14.
  25. Meir A, Moon JW. On an asymptotic method in enumeration. J Comb Theory Ser A. 1989;51:77–89. doi: 10.1016/0097-3165(89)90078-2.
  26. Meir A, Moon JW. Erratum: on an asymptotic method in enumeration. J Comb Theory Ser A. 1989;52:163. doi: 10.1016/0097-3165(89)90071-X.
  27. Rosenberg NA. On the Colijn–Plazzotta numbering scheme for unlabeled binary rooted trees. Discrete Appl Math. 2021;291:88–98. doi: 10.1016/j.dam.2020.11.021.
  28. Semple C, Steel M. Phylogenetics. Oxford: Oxford University Press; 2003.
  29. Semple C, Steel M. Unicyclic networks: compatibility and enumeration. IEEE/ACM Trans Comput Biol Bioinform. 2006;3:84–91. doi: 10.1109/TCBB.2006.14.
  30. Sievers F, Hughes GM, Higgins DG. Systematic exploration of guide-tree topology effects for small protein alignments. BMC Bioinform. 2014;15:338. doi: 10.1186/1471-2105-15-338.
  31. Song YS. A concise necessary and sufficient condition for the existence of a galled-tree. IEEE/ACM Trans Comput Biol Bioinform. 2006;3:186–191. doi: 10.1109/TCBB.2006.15.
  32. Steel M. Phylogeny: discrete and random processes in evolution. Philadelphia: Society for Industrial and Applied Mathematics; 2016.
  33. Wang L, Zhang K, Zhang L. Perfect phylogenetic networks with recombination. J Comput Biol. 2001;8:69–78. doi: 10.1089/106652701300099119.
  34. Warnow T. Computational phylogenetics. Cambridge: Cambridge University Press; 2018.

Articles from Bulletin of Mathematical Biology are provided here courtesy of Springer
