PLOS ONE. 2013 Apr 8;8(4):e60123. doi: 10.1371/journal.pone.0060123

The Effect of Single Recombination Events on Coalescent Tree Height and Shape

Luca Ferretti 1,2, Filippo Disanto 1, Thomas Wiehe 1,*
Editor: Nadia Singh 3
PMCID: PMC3620475  PMID: 23593168

Abstract

The coalescent with recombination is a fundamental model to describe the genealogical history of DNA sequence samples from recombining organisms. Considering recombination as a process which acts along genomes and which creates sequence segments with shared ancestry, we study the influence of single recombination events upon tree characteristics of the coalescent. We focus on properties such as tree height and tree balance and quantify analytically the changes in these quantities incurred by recombination in terms of probability distributions. We find that changes in tree topology are often relatively mild under conditions of neutral evolution, while changes in tree height are on average quite large. Our results add to a quantitative understanding of the spatial coalescent and provide the neutral reference to which the impact by other evolutionary scenarios, for instance tree distortion by selective sweeps, can be compared.

Introduction

Coalescent theory is a central part of modern population genetics [1]–[3]. It constitutes the basis of genealogical models, of statistical tests of the neutral evolution hypothesis [4] as well as of many simulation tools [5]–[7]. Besides application in population genetics, coalescent models and their various generalizations became an object of study in their own right in probability, graph theory and combinatorics [8]–[12].

The classical coalescent is a binary, rooted, unordered tree with a fixed number Inline graphic of leaves. The latter is also called the size of the tree (Figure 1A). Such a tree can be interpreted as the genealogical history of a sample of DNA sequences, where mergers (“coalescents”) of two lineages represent events of common ancestry. Thus, coalescent trees are naturally fitted with a time scale and for this reason they are sometimes called labelled histories. A biologically important generalization of the simple case is the coalescent with recombination. Recombination is a process by which two DNA sequences reciprocally exchange genetic material. In the coalescent framework this translates into lineage splits (Figure 1B). A split represents the uncoupling of the genealogical history of two sequence fragments. The ancestral recombination graph (ARG) [13] is a model to integrate such lineage splits into coalescent trees. Each sequence position Inline graphic along the chromosome is associated with a coalescent tree Inline graphic, which is the marginal tree of the ARG at position Inline graphic. Depending on the rate of recombination, chromosomes are divided into smaller or larger sequence fragments Inline graphic (“haplotype block”) in such a way that all positions within a fragment are free of recombination and therefore have the same marginal tree Inline graphic.

Figure 1. Example coalescent trees.

A: Tree of size Inline graphic generated under the coalescent process. The Inline graphic-axis represents a time scale, with leaves at the ‘present’, and the root in the ‘past’. Starting from the present and going backwards in time, coalescent events are exponentially distributed with a parameter depending on population size (Inline graphic) and the number of lineages at any given point in time. B: Recombination is a prune (asterisk) and re-graft (circle) event: a lineage splits and merges onto another lineage which exists in the population at the time of recombination. This lineage does not need to extend to the present, and it may have become extinct from the entire population (cross). Recombination has changed the height of the coalescent tree with respect to the tree in panel A (Inline graphic), but has not changed root imbalance: for both trees Inline graphic.

The spatial coalescent is the sequence Inline graphic of coalescent trees along a sample of recombining chromosomes. Study of the spatial coalescent is of prominent interest in population genomics, since it contains information about the demographic and evolutionary history of a population. For instance, it has lately been used to infer demographic parameters in non-African human populations [14]. Unfortunately, the spatial coalescent is not a simple Markov process [15], complicating its probabilistic analysis and leaving many open problems to be addressed.

Here, we investigate the impact of single recombination events upon some measures of tree topology and shape. By topology we mean the branching pattern of a tree; by shape we mean its topology and branch lengths. In particular, we ask how recombination affects tree height and tree (im-)balance. The latter is measured by the difference in size of the left and right subtrees emerging from the root or any internal node. Depending on when and where a recombination event occurs, its effect on tree structure may be drastic, mild or completely silent. Informally, drastic events are those which lead to a large change of tree height or balance. These are events which typically involve splits by recombination of the branches emerging from the root of the tree. As such they may strongly affect the genealogical structure of haplotypes. Identifying and characterizing these events is very informative for population genetic inference. Mild events are typically those which occur along very recent branches, close to the leaves of the tree. They do not, or only mildly, affect haplotype structure and mutation frequency spectrum. Interestingly, there is a non-negligible portion of recombination events which do not alter tree topology, i.e. the branching pattern. We call these events silent. Sometimes the branch lengths also remain unchanged; we call these events hidden (Figure 2).

Figure 2. Non-silent, silent and hidden recombination events.

A: Non-silent recombination changes tree topology. In the case shown, Inline graphic also changes from Inline graphic to Inline graphic. B: A recombination event which changes the order of internal nodes. Whether this event is classified as non-silent or silent depends on the tree definition. It is non-silent for labelled histories (considered here; eq (1)), but it would be silent for unlabelled trees. C: A silent recombination event, which does not affect the branching pattern but changes the lengths of the recombining branches. D: A hidden recombination event. It affects neither branching pattern nor branch lengths.

Our goals are to formalize these concepts, to characterize in more detail the effect of single recombination events upon tree shape and to quantify the relative frequencies of drastic, mild and silent events. We explicitly calculate the probabilities of changes in height or root balance induced by a single recombination event. Our results are based on the assumption of a standard neutral model of constant population size. This means that for each coalescent event two lineages are chosen at random to merge. Further, the timing of events is exponentially distributed with a rate which, after re-scaling by population size Inline graphic, depends only on the number of lineages at a given time.

In Results Section (a), we define a probability density for the trees in the spatial coalescent and we explain the difference between pointwise marginal trees Inline graphic, evaluated at every basepair Inline graphic of the DNA sequences, and the marginal trees Inline graphic, evaluated at every fragment Inline graphic. We derive a simple relation between the densities of Inline graphic and Inline graphic. In Section (b) we analyze the recombination events which lead to height-changes and derive their probabilities. In Section (c) we quantify the concept of root imbalance, called Inline graphic, and derive the first-order transition probabilities under single recombination events. We focus on events which produce unbalanced trees and, at the same time, lead to an increase of tree height. This type of event is of particular interest for the analysis of biological data. Their effect on the mutation frequency spectrum and on haplotype structure is the basis of tests to reject the neutral evolution hypothesis (e.g., [16]–[18]). Therefore, for bench-marking it is highly interesting to know how often such events occur under purely neutral conditions, but it is not the goal of this paper to devise another neutrality test. Then, we generalize the results regarding the tree topology parameter Inline graphic and derive the transition probability for arbitrary types of recombination events. Using this, we calculate the run-length distribution of Inline graphic along recombining chromosomes. Finally, in Section (d), we calculate the average proportion of hidden recombination events and derive its limiting behavior for large sample sizes.

We remind the reader that the spatial coalescent is a non-Markovian process and not completely determined by transitions of any finite order. However, it is a homogeneous process. Therefore, first-order transition probabilities are well-defined and independent of the position in the sequence. Here, we compute first-order probabilities for single recombination events from one tree to the next, averaging over all trees of the ARG which are not directly involved in the recombination event considered. Therefore, our results hold for the spatial coalescent as described by the ARG [13]. In fact, the ARG is the model underlying all our calculations.

Results

(a) Tree Distribution and Recombination

We consider a sample of Inline graphic “chromosomes” from a diploid panmictic population of constant size Inline graphic. Without recombination, the genealogical history for these chromosomes is described by the classical coalescent process [1], [2]. The set of all possible coalescent trees of size Inline graphic is a product Inline graphic, where Inline graphic contains positive real waiting times of Inline graphic independent coalescent events and the discrete set Inline graphic represents the set of all possible tree topologies. For our purposes here it is more convenient to consider labelled coalescent trees: this means that not only the internal nodes are ordered but also the leaves carry leaf labels. Hence [19] (see also http://oeis.org/A006472), the cardinality of Inline graphic is

$$\frac{n!\,(n-1)!}{2^{n-1}}, \quad n = \text{tree size} \qquad (1)$$
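The count of labelled histories referenced through OEIS A006472 can be checked numerically. The sketch below is illustrative only; the function names are hypothetical. It builds the count as a product over coalescent events (with k lineages present there are C(k, 2) ways to choose the merging pair) and compares it with the closed form n!(n-1)!/2^(n-1).

```python
from math import comb, factorial


def labelled_histories(n: int) -> int:
    """Count labelled histories of n leaves as a product over coalescent events."""
    count = 1
    for k in range(2, n + 1):
        # With k lineages present, any of the C(k, 2) pairs may merge next.
        count *= comb(k, 2)
    return count


def labelled_histories_closed_form(n: int) -> int:
    """Closed form n! (n-1)! / 2^(n-1); the division is exact."""
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, labelled_histories(n), labelled_histories_closed_form(n))
```

Because every labelled history has the same probability under the standard coalescent, the reciprocal of this count is the topology weight that enters the tree density.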

Furthermore, all trees in Inline graphic have the same probability Inline graphic, when they are generated under the standard coalescent process [20]. The waiting times Inline graphic for a coalescent event, given Inline graphic lineages, are exponentially distributed with mean Inline graphic. Time runs backward from the leaves to the root of the tree and is measured in units of the coalescent, i.e. time is scaled by four times the population size. Therefore, Inline graphic can be regarded as being equipped with a probability mass function which factorizes into a probability density Inline graphic for each waiting time (Inline graphic) and the discrete probability for the topology Inline graphic. For trees Inline graphic in the above sense, we denote the resulting probability ‘density’ by

graphic file with name pone.0060123.e045.jpg

and we have

graphic file with name pone.0060123.e046.jpg (2)

where Inline graphic is the time interval during which the coalescent tree Inline graphic has Inline graphic lineages.
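As a rough illustration of the waiting-time structure described above, the following sketch samples the level durations of a standard coalescent tree and computes its height and total length. The parameter `pair_rate` (coalescence rate per pair of lineages) is an assumption of this sketch: reading "time is scaled by four times the population size" as `pair_rate = 2` is an interpretation, not something the extracted text states explicitly. All function names are hypothetical.

```python
import random
from math import comb


def sample_waiting_times(n, pair_rate=2.0, rng=random):
    """Return {k: t_k} for k = n..2, where t_k ~ Exp(pair_rate * C(k, 2))."""
    return {k: rng.expovariate(pair_rate * comb(k, 2)) for k in range(n, 1, -1)}


def tree_height(times):
    """Height is the sum of all level durations."""
    return sum(times.values())


def tree_length(times):
    """Total branch length: level k contributes k lineages for a time t_k."""
    return sum(k * t for k, t in times.items())


def expected_height(n, pair_rate=2.0):
    """Sum of 1 / (pair_rate * C(k, 2)) over k = 2..n, in closed form."""
    return (2.0 / pair_rate) * (1.0 - 1.0 / n)


def expected_length(n, pair_rate=2.0):
    """Sum of k / (pair_rate * C(k, 2)) over k = 2..n, a harmonic sum."""
    return (2.0 / pair_rate) * sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    rng = random.Random(2013)
    heights = [tree_height(sample_waiting_times(10, rng=rng)) for _ in range(10000)]
    print(sum(heights) / len(heights), expected_height(10))
```

Changing `pair_rate` only rescales time, so ratios such as height change relative to mean height do not depend on this choice.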

Modeling recombination as an ARG [13], there are two processes to be considered: coalescence and recombination. Given Inline graphic independent lineages, in the coalescent process two lineages merge into a single one with rate Inline graphic. In the recombination process, a single lineage splits into two with rate Inline graphic, where Inline graphic denotes the population recombination rate, Inline graphic is the recombination rate per base and Inline graphic is the finite length of the sequence. After a recombinational split the two ancestral lineages correspond to different sequence fragments, left and right of the point of recombination. This point is chosen uniformly along the sequence of length Inline graphic. We assume that Inline graphic is small, so multiple recombination events in the same position are negligible.

Given a tree Inline graphic in position Inline graphic, the length before the first recombination event downstream (or upstream) of Inline graphic is geometrically distributed with parameter Inline graphic, where Inline graphic represents the total length of the tree. Since Inline graphic is small, it can be safely approximated by an exponential distribution with the same parameter Inline graphic.
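The quality of the exponential approximation to a geometric distribution with a small parameter can be illustrated numerically. In the sketch below, `p` is a hypothetical per-position probability of hitting a recombination breakpoint; its value is chosen only for illustration and is not taken from the paper.

```python
from math import exp


def geometric_tail(p, x):
    """P(no breakpoint within the next x positions) for per-position probability p."""
    return (1.0 - p) ** x


def exponential_tail(p, x):
    """Exponential approximation with the same parameter p."""
    return exp(-p * x)


if __name__ == "__main__":
    p = 1e-4  # hypothetical value, for illustration only
    for x in (1_000, 10_000, 50_000):
        print(x, geometric_tail(p, x), exponential_tail(p, x))
```

The two tails differ at relative order p·x·p/2, which is negligible as long as the per-position probability is small.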

Recombination events may change the shape of the tree. The local tree at position Inline graphic in the genome may differ from the local tree at position Inline graphic due to recombination. Moving along the genome, we consider two different sequences of trees: the sequence Inline graphic of local trees for all positions Inline graphic, and the sequence Inline graphic of local trees which are separated by a single recombination event (Figure 3). Note that a tree in Inline graphic can span several base positions, as the typical length Inline graphic of the fragment Inline graphic is greater than 1. Also, note that consecutive trees in Inline graphic need not be different. This occurs when fragments are separated by hidden recombination events.

Figure 3. Distinction between sequences Inline graphic and Inline graphic along a recombining chromosome (sketched in the middle).

Sequence Inline graphic is the sequence of coalescent trees plotted for each nucleotide. Sequence Inline graphic is the sequence of coalescent trees for each recombination fragment. Recombination breakpoints are indicated by arrows.

The standard coalescent without recombination is recovered when looking at the tree for a single position Inline graphic in the sequence, ignoring all other trees. Neither the rate of coalescent events nor the choice of coalescing lineages in this tree are influenced by ancestral lineages at other positions. The local tree Inline graphic at any position Inline graphic is therefore a standard coalescent tree without recombination [21] and the marginal density of a tree in position Inline graphic of the ARG is identical to Inline graphic; i.e., picking the tree in position Inline graphic from a random sequence Inline graphic is equivalent to generating one from the standard coalescent process without recombination.

On the other hand, picking a tree from a random sequence Inline graphic results in a different distribution. The reason is that short trees recombine less, therefore they tend to span larger regions and to be under-represented in Inline graphic compared to Inline graphic, as illustrated in Figures 4 and 5.

Figure 4. Cumulative distribution of tree height for Inline graphic (black) and Inline graphic (red) along a recombining chromosome of length Inline graphic bp.

Shown are the height distribution of trees in Inline graphic (solid; “positions”) and in Inline graphic (dashed; “fragments”). For comparison, the theoretical distributions for Inline graphic are plotted in light colors.

Figure 5. Height of neutral coalescent trees along the genome.

One simulation run using ms [5] with Inline graphic and Inline graphic. On the right, the distribution of the trees according to Inline graphic and Inline graphic and the average length before a recombination event, for a simulation of a sequence of length Inline graphic.

In fact, the two distributions differ by weights which are proportional to the length Inline graphic of the fragments spanned by each tree. Since in the limit of large sequences the average length is Inline graphic, we have Inline graphic. Therefore, for large sequences, the tree density after a random recombination event is given by

graphic file with name pone.0060123.e102.jpg (3)

where Inline graphic denotes the total length of the tree. For the standard neutral model, Inline graphic. Note that the two distributions differ only in their weights of branch lengths, but not with respect to topology.
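The effect of this reweighting on total tree length can be sketched as follows, assuming that eq (3) weights each tree in proportion to its total length and normalizes by the expected length. Under a weight proportional to length, the mean length becomes E[l²]/E[l] = E[l] + Var(l)/E[l]. The moments below use the fact that, with k lineages, the contribution k·t_k to the total length is exponential; `pair_rate` is the same assumed time-scaling parameter (2 for the scaling by four times the population size), and the function names are hypothetical.

```python
def length_moments(n, pair_rate=2.0):
    """Mean and variance of total tree length under the standard coalescent.

    The contribution k * t_k of level k is exponential with mean
    2 / (pair_rate * (k - 1)), and the levels are independent.
    """
    scale = 2.0 / pair_rate
    mean = scale * sum(1.0 / i for i in range(1, n))
    var = scale ** 2 * sum(1.0 / i ** 2 for i in range(1, n))
    return mean, var


def length_biased_mean(n, pair_rate=2.0):
    """Mean total length when trees are weighted in proportion to their length."""
    mean, var = length_moments(n, pair_rate)
    return mean + var / mean  # equals E[l^2] / E[l]


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        mean, _ = length_moments(n)
        print(n, mean, length_biased_mean(n))
```

The relative gap between the two means is Var(l)/E[l]², which shrinks as the sample size grows because the variance stays bounded while the mean grows logarithmically.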

The argument leading to eq (3) can be made rigorous under the assumption of infinitely long chromosomes, using the fact that the coalescent with recombination is an ergodic process [22] (see Text S1, Supporting Information eqs (1)–(3)). As a check of eq (3), we show that Inline graphic is invariant under a single recombination event. Let Inline graphic be the transition density from tree Inline graphic in a given position Inline graphic to tree Inline graphic in position Inline graphic, and Inline graphic the transition density from tree Inline graphic to tree Inline graphic obtained by a single recombination event. Since the marginal density Inline graphic is the same for every position, we have

graphic file with name pone.0060123.e115.jpg (4)

independent of the recombination rate. For small recombination rates and at first order in Inline graphic, we have Inline graphic. Substituting this into (4) gives

graphic file with name pone.0060123.e118.jpg (5)

That is, after normalization Inline graphic is an invariant distribution under Inline graphic. The normalization is Inline graphic.

Furthermore, any marginal tree obtained from an ARG (conditioned on the number of recombinations in the sequence) by choosing randomly an ancestral lineage for every recombination event is distributed according to Inline graphic. This can be seen from symmetry: neither of two trees separated by a single recombination event is distinguished, so they have the same distribution, which is the invariant distribution under a single recombination event, i.e. Inline graphic. This property has far-reaching consequences since it makes it possible to exploit the symmetries of the ARG.

Note that the two distributions, Inline graphic and Inline graphic, become asymptotically identical when Inline graphic becomes large. To see this, it suffices to consider the random variable Inline graphic. Its mean is identical to Inline graphic. Since Inline graphic for large Inline graphic [2], one has

graphic file with name pone.0060123.e131.jpg (6)

The right hand side of equation (6) converges to Inline graphic with increasing Inline graphic. Therefore the factor Inline graphic converges to 1 and Inline graphic (in the sense of local weak convergence). The relations between the empirical probability distributions Inline graphic and Inline graphic along the sequence and the probability densities Inline graphic and Inline graphic are summarized in the following diagram:

graphic file with name pone.0060123.e140.jpg

The distributions Inline graphic and Inline graphic need to be carefully distinguished when measuring the effect of a single recombination event. If one asks for the first recombination event downstream of a given position Inline graphic in the genome, then the initial tree at position Inline graphic is distributed with Inline graphic. If one asks instead for the effect of a randomly chosen recombination event, then the density Inline graphic is the appropriate one.

(b) Height-changing Recombination Events

Probabilities of height changing events

Recombination can be interpreted as a random prune-and-regraft event on the tree [23]. First, a time point of pruning is selected uniformly anywhere on the tree; second, the node immediately above the selected branch is removed; third, the pruned branch is re-grafted onto the tree anywhere above the pruning point or onto the ancestral lineage of the root, forming a new node. For hidden recombination events, prune and re-graft occur on the same branch, without modifying topology or branch lengths of the tree.

We denote the root node by Inline graphic and the first internal node by Inline graphic. There are four types of recombination events that change the height of the tree (Figure 6).

Figure 6. Types of height-changing recombination events.

The square indicates the new node created by re-grafting. It forms the new root in cases U, D and N. In case S, an existing internal node becomes the new root (empty square overlaid on node Inline graphic).

U (‘up’): a prune-and-regraft event on the root branches generates a higher root without changing the topology;

D (‘down’): a prune-and-regraft event on the root branches generates a lower root without changing the topology;

N (‘new’): pruning a branch below the root branches and re-grafting onto the ancestral branch of the root creates a new root, while the old root becomes internal node Inline graphic;

S (‘substitute’): pruning a root branch and re-grafting onto a branch in the subtree of Inline graphic causes Inline graphic to become the root.

In fact, for the root to change height it must either be shifted (cases U and D) or be replaced (cases N and S). If the root is replaced, it can become an internal node Inline graphic (case N) or be lost (case S). Cases U and D leave the topology unchanged, while cases N and S do not.

We denote the probabilities of these events by Inline graphic, Inline graphic, Inline graphic, Inline graphic. We compute these quantities under both distributions, Inline graphic and Inline graphic.

Given a coalescent tree of size Inline graphic, let the level Inline graphic be the time interval when exactly Inline graphic independent lineages coexist, with Inline graphic. The waiting time at the Inline graphicth level is Inline graphic, in the following called Inline graphic for short. Tree height may be increased by recombination events of type U or N. The total probability for this, Inline graphic, is given by the sum of the probabilities of pruning at all possible levels, but never re-grafting lower than the root:

graphic file with name pone.0060123.e168.jpg (7)

where the product is defined to be Inline graphic when Inline graphic. This is a telescoping series that can be re-summed as a function of the total length of the tree

graphic file with name pone.0060123.e171.jpg

yielding the simple result

graphic file with name pone.0060123.e172.jpg (8)
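The telescoping step can be checked numerically for a concrete tree. The sketch below rests on stated assumptions rather than on the original formulas, which are not recoverable from the extracted text: the pruning point is uniform on the tree, the pruned lineage re-grafts onto each of the k lineages present at rate `pair_rate` (including its own branch, which gives a hidden event), and height increases only if it escapes above the root. Under these assumptions the level-by-level sum collapses to (1 - exp(-c·l))/(c·l), with c = `pair_rate` and l the total tree length. The dictionary layout `{k: t_k}` and all names are hypothetical.

```python
from math import exp


def height_increase_prob_levels(times, pair_rate=2.0):
    """Sum over pruning levels k of P(prune in level k, then escape above the root)."""
    total_length = sum(k * t for k, t in times.items())
    prob = 0.0
    for k in sorted(times):  # k = 2, 3, ..., n
        # Integral over the pruning time inside level k of k * exp(-c * k * u),
        # where u is the time remaining in that level.
        within = (1.0 - exp(-pair_rate * k * times[k])) / pair_rate
        # No re-grafting at any level j between level k and the root.
        above = 1.0
        for j in range(2, k):
            above *= exp(-pair_rate * j * times[j])
        prob += within * above
    return prob / total_length


def height_increase_prob_closed(total_length, pair_rate=2.0):
    """Telescoped form, depending on the tree only through its total length."""
    return (1.0 - exp(-pair_rate * total_length)) / (pair_rate * total_length)


if __name__ == "__main__":
    tree = {2: 0.5, 3: 0.2, 4: 0.1}
    length = sum(k * t for k, t in tree.items())
    print(height_increase_prob_levels(tree), height_increase_prob_closed(length))
```

Two trees with different level durations but equal total length give the same value, which illustrates that topology and the distribution of time across levels drop out.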

Interestingly, this probability depends only on the total length Inline graphic of the tree and not on the topology. Very short trees grow with high probability, very long trees are unlikely to grow (Figure S1). The average probability of height-increase when passing from one recombination-delimited sequence fragment to the next is

graphic file with name pone.0060123.e174.jpg
graphic file with name pone.0060123.e175.jpg (9)

which agrees very well with simulations (Figure 7). Note that Inline graphic approaches zero as slowly as Inline graphic.

Figure 7. Increase of tree height.

Probabilities Inline graphic (black), Inline graphic (green) and Inline graphic (red) of events that increase tree height as a function of sample size Inline graphic. Dots represent the values of Inline graphic obtained by simulations using program ms [5] and selecting a random recombination event which is far from the sequence boundaries.

This result can also be derived directly by counting ARGs, since Inline graphic corresponds to the distribution of a random tree in an ARG. We will consider the case of a recombination event at a given level Inline graphic and then average over all levels. To obtain the total number of ARGs Inline graphic with a single recombination event at level Inline graphic, choose a tree at random (among Inline graphic possibilities), then choose the branch to be pruned (Inline graphic possibilities) and the branch to which it is re-grafted at the same or a higher level (Inline graphic possibilities). Therefore,

graphic file with name pone.0060123.e190.jpg (10)

The number of ARGs where the new tree is higher than the old one is Inline graphic, because there is just one possibility of re-grafting, namely on the ancestral lineage above the root of the old tree. The probability of pruning at level Inline graphic in the old tree is Inline graphic. Therefore, one can average over Inline graphic to obtain Inline graphic, which is identical to equation (9).
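The counting argument can be reproduced with exact rational arithmetic. The sketch below assumes the following reading of the argument: pruning at level k can be followed by re-grafting onto any of j branches at each level j = 1..k (level 1 being the ancestral lineage above the root), giving k(k+1)/2 targets of which exactly one raises the root; and the probability of pruning at level k is the expected share of tree length at that level, 1/((k-1)·a_n) with a_n the harmonic sum from 1 to n-1. These symbols and function names are assumptions of this sketch.

```python
from fractions import Fraction


def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the expected total length in suitable units."""
    return sum(Fraction(1, i) for i in range(1, n))


def prob_height_increase(n):
    """Average over pruning levels of P(re-graft above the root)."""
    a_n = harmonic(n)
    total = Fraction(0)
    for k in range(2, n + 1):
        prune_at_k = Fraction(1, k - 1) / a_n       # expected share of length at level k
        regraft_targets = Fraction(k * (k + 1), 2)  # sum of j for j = 1..k
        total += prune_at_k / regraft_targets       # exactly one target raises the root
    return total


def prob_height_increase_closed(n):
    """Closed form obtained by summing the telescoping series over k."""
    return Fraction((n - 1) * (n + 2), 2 * n * (n + 1)) / harmonic(n)


if __name__ == "__main__":
    for n in (2, 3, 10, 100):
        print(n, float(prob_height_increase(n)))
```

For large n the value behaves like 1/(2·a_n), so it decays only logarithmically with sample size.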

Focusing now on pruning of the root branches, we obtain Inline graphic analogously to equation (7). Let Inline graphic be the number of direct descendants of node Inline graphic at level Inline graphic. Inline graphic can take values Inline graphic. The average value of Inline graphic satisfies the recursion

graphic file with name pone.0060123.e203.jpg
graphic file with name pone.0060123.e204.jpg

that has the solution

graphic file with name pone.0060123.e205.jpg

In particular, the average number of direct descendants of the root at level Inline graphic is Inline graphic. The probability Inline graphic is a modification of equation (7): multiplying by the fraction of events that are actually of type U, i.e. Inline graphic, one obtains

graphic file with name pone.0060123.e210.jpg (11)
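The recursion for the number of root branches that are still present at a given level can be illustrated with a small exact computation. The sketch assumes the following model, which is a reading of the text rather than a transcription of the lost formulas: at level 2 both lineages are root branches, and when passing from level k to level k+1 (toward the leaves) one of the k lineages is chosen uniformly to split, so a surviving root branch is lost with probability 1/k. Under this assumption the mean is 2/(k-1). Names are hypothetical.

```python
from fractions import Fraction


def root_branch_distribution(n):
    """Return {k: {d: P(d root branches are present at level k)}} for k = 2..n."""
    dist = {2: {2: Fraction(1)}}
    for k in range(2, n):
        nxt = {}
        for d, p in dist[k].items():
            split_root_branch = Fraction(d, k)  # the splitting lineage is a root branch
            if d > 0:
                nxt[d - 1] = nxt.get(d - 1, Fraction(0)) + p * split_root_branch
            nxt[d] = nxt.get(d, Fraction(0)) + p * (1 - split_root_branch)
        dist[k + 1] = nxt
    return dist


def mean_root_branches(n):
    """Mean number of root branches at each level k = 2..n."""
    return {
        k: sum(d * p for d, p in level.items())
        for k, level in root_branch_distribution(n).items()
    }


if __name__ == "__main__":
    for k, mean in mean_root_branches(8).items():
        print(k, mean, Fraction(2, k - 1))
```

Dividing this mean by k gives the expected fraction of pruning events at level k that hit a root branch, which is the kind of factor that distinguishes events of type U from the combined class of height-increasing events.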

In contrast to equation (7), equation (11) cannot be easily simplified since it depends also on the topology. After averaging over Inline graphic, we obtain

graphic file with name pone.0060123.e212.jpg (12)

and

graphic file with name pone.0060123.e213.jpg (13)

where Inline graphic.

The probabilities Inline graphic and Inline graphic can be computed similarly to the above formulae, giving

graphic file with name pone.0060123.e217.jpg (14)

and

graphic file with name pone.0060123.e218.jpg (15)

(Text S1, Supporting Information eqs (4)–(9)). Alternatively, one may employ an argument based on symmetry properties of the ARG. Among two adjacent trees in the ARG, the left one is smaller or larger than the right one with equal probability. Therefore,

graphic file with name pone.0060123.e219.jpg (16)

The same is true when the root is only shifted. Thus,

graphic file with name pone.0060123.e220.jpg (17)

Hence, by subtraction,

graphic file with name pone.0060123.e221.jpg (18)

Note that the identities (17) and (18), being topological in nature, are also valid for models with variable population size. A related result about the probability that a random recombination event leaves tree height unchanged (Inline graphic) has been obtained previously by Griffiths & Marjoram [24].

Equations (8), (11), (14), (15) are valid also when averaging over the distribution Inline graphic, instead of Inline graphic. However, exact results are available only for small sample sizes. For the case of arbitrary Inline graphic we use the following Taylor approximation of the ratio moment

graphic file with name pone.0060123.e226.jpg (19)

where Inline graphic represents the desired probability Inline graphic. When the expansion is truncated at zeroth order (i.e., replacing the first moment of the ratio by the ratio of first moments), one obtains the results analogous to equations (12), (13), (17) and (18). More detailed calculations are given in Text S1, Supporting Information eqs (10)–(12). These yield, for instance, the probability of increasing tree height

graphic file with name pone.0060123.e229.jpg (20)

Note that the scaling factor on the right hand side in equation (20) approaches Inline graphic very slowly with increasing Inline graphic. The case Inline graphic is actually an exception since an exact formula exists [15] for all values of ; in fact, Inline graphic depends only on Inline graphic, therefore it is sufficient to average this quantity over the distribution of Inline graphic obtained in [15]. For small samples there is a considerable difference between Inline graphic and Inline graphic. For example, if Inline graphic, we have Inline graphic while only Inline graphic.

Amount of change in height

The variation in height Inline graphic has a simple distribution. If the height increases, then the difference is given by the waiting time for coalescence of two lineages. It is

graphic file with name pone.0060123.e242.jpg (21)

and

graphic file with name pone.0060123.e243.jpg (22)

where Inline graphic is the Heaviside function, Inline graphic if Inline graphic and Inline graphic otherwise. If the height decreases because of an event of type D, its distribution is given by the waiting time for coalescence before time Inline graphic, equivalent to the “bounded coalescent” for two lineages [25]

graphic file with name pone.0060123.e249.jpg (23)

For events of type S, the variation in height is simply the waiting time Inline graphic of the tree

graphic file with name pone.0060123.e251.jpg (24)

where Inline graphic is the Dirac delta distribution. Averaging these quantities over Inline graphic and using the symmetries of the ARG, we obtain

graphic file with name pone.0060123.e254.jpg (25)

and

graphic file with name pone.0060123.e255.jpg (26)

i.e., all these variations in height are exponentially distributed for an average tree.

Taking expectations, the average change in height after one of these events is

graphic file with name pone.0060123.e256.jpg

irrespective of the type of event, i.e. Inline graphic. Comparing this to the average height of a tree, Inline graphic, one notices that a single recombination event changes tree height by 50% on average.

(c) Root Imbalance and Recombination

Let Inline graphic (Inline graphic) be the number of left (right) descendants of the root. We have Inline graphic. We call the random variable Inline graphic root imbalance. Inline graphic is a coarse-grained measure of tree topology. A recombination event may or may not change Inline graphic and a change of Inline graphic is neither sufficient nor necessary for a change in tree height. Since many recombination events induce rearrangements of the lower branches (close to the leaves) of the tree, they may affect Inline graphic without affecting tree height. Still, large changes in Inline graphic are often associated with height-changing recombination events of type N or S and thus are associated with drastic changes of tree topology.

In this section we calculate the transition probabilities Inline graphic for Inline graphic under a single recombination event, averaged over the initial tree. First, we focus on events of type UN, i.e. increasing height, and then we obtain the transition probabilities for all types of events separately.

Root imbalance and height-increasing events

Let the size of a branch be the number of leaves below the branch. A specific tree of size Inline graphic can be fully described by the probability Inline graphic that a randomly chosen branch at level Inline graphic has size Inline graphic. Averaging over trees of size Inline graphic, the probability that a branch of level Inline graphic has size Inline graphic is

graphic file with name pone.0060123.e277.jpg (27)

[26]. Let Inline graphic be the probability that the height increases and the pruned branch has size Inline graphic. It is obtained, similarly to Inline graphic, by multiplying each term of the sum in equation (7) by Inline graphic. Thus, given a tree Inline graphic,

graphic file with name pone.0060123.e283.jpg (28)

and, averaging over Inline graphic, one obtains

graphic file with name pone.0060123.e285.jpg (29)
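The branch-size distribution used in this derivation can be tabulated directly. The sketch assumes that eq (27) is the standard coalescent result that a random branch at level k of a tree of size n subtends j leaves with probability C(n-j-1, k-2)/C(n-1, k-1), for j = 1..n-k+1; the original formula is not recoverable from the extracted text, so this identification is an assumption. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def branch_size_prob(n, k, j):
    """P(a random branch at level k of a size-n tree has j leaves below it)."""
    if k < 2 or k > n or not 1 <= j <= n - k + 1:
        return Fraction(0)
    return Fraction(comb(n - j - 1, k - 2), comb(n - 1, k - 1))


if __name__ == "__main__":
    n = 6
    for k in range(2, n + 1):
        row = [branch_size_prob(n, k, j) for j in range(1, n - k + 2)]
        print(k, [str(p) for p in row])
```

Two sanity checks follow from the construction: each row sums to one, and the mean size at level k is n/k, since the k branches at that level partition the n leaves. For k = 2 the distribution is uniform on 1..n-1, matching the uniform law of the root split mentioned later in the text.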

More generally, the probability that the pruned branch has size Inline graphic, given that recombination leads to an increase in height, is simply Inline graphic. The random variable Inline graphic can take values between Inline graphic and Inline graphic and is the folded version of the random variable Inline graphic which ranges from Inline graphic to Inline graphic. Hence, the distribution of Inline graphic, after an event that increases tree height, is

graphic file with name pone.0060123.e295.jpg

and the distribution of Inline graphic, conditioned on tree height increase, is

graphic file with name pone.0060123.e297.jpg (30)

as illustrated in Figure S3.

Now we calculate the probability conditioned on the value Inline graphic of Inline graphic before recombination, i.e. the transition probability Inline graphic. The basic quantity for this computation is the probability Inline graphic that a branch at level Inline graphic has size Inline graphic in a tree of total size Inline graphic, given that the sizes of the root branches are Inline graphic and Inline graphic. To compute this, we need information about the actual size Inline graphic at level Inline graphic of the subtree of size Inline graphic of the root. We denote the distribution of Inline graphic by Inline graphic and the distribution of Inline graphic given the sizes Inline graphic and Inline graphic of its root subtree at levels Inline graphic and Inline graphic by Inline graphic. Note that Inline graphic does not depend on Inline graphic nor on Inline graphic, but only on the size of the root subtree to which it belongs (see Figure S4). Therefore we have

graphic file with name pone.0060123.e321.jpg (31)

The probability Inline graphic is equal to

graphic file with name pone.0060123.e323.jpg (32)

as can be shown by considering the corresponding subtree of the root as the whole tree and using equation (27). The probability Inline graphic depends only on the topology, therefore it can be obtained by counting the number of labelled coalescent trees (http://arxiv.org/abs/1112.1295v2) with a root branch of size Inline graphic in the whole tree that reduces to size Inline graphic at level Inline graphic, denoted by Inline graphic, and dividing by the total number of trees with a root branch of size Inline graphic, denoted by Inline graphic. Using that Inline graphic, that the coalescent process induces a uniform distribution on Inline graphic and that the distribution of Inline graphic is Inline graphic [27], we have

graphic file with name pone.0060123.e335.jpg (33)

The set of all trees in Inline graphic can be generated in the following way: (i) choose Inline graphic leaves out of Inline graphic; (ii) choose a relative order of the Inline graphic coalescent events among the two subsets with Inline graphic and Inline graphic leaves such that among the first Inline graphic events Inline graphic events belong to the first subset and Inline graphic belong to the second; (iii) choose a topology for the root subtree of size Inline graphic; (iv) choose a topology for the complementary subtree of the root. This process generates every tree in Inline graphic exactly once, except for the case Inline graphic, where each tree is generated twice. Therefore, we have

graphic file with name pone.0060123.e348.jpg (34)

Taking the ratio of tree counts, we obtain a hypergeometric distribution

graphic file with name pone.0060123.e349.jpg (35)
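
The hypergeometric form can be checked numerically with exact arithmetic. The sketch below rests on assumptions that are not taken from the text: it grows the tree forward from the root by splitting a uniformly chosen lineage at each level (which reproduces the coalescent distribution of labelled histories), tracks the number of lineages in one root subtree, and compares the conditional distribution with a hypergeometric expression. The symbols n, k, i, j and all function names are illustrative, and the parametrization may differ from the one used in equation (35).

```python
from fractions import Fraction
from math import comb


def urn_step(dist, level):
    """Split one uniformly chosen lineage when `level` lineages exist."""
    new = {}
    for j, p in dist.items():
        # the split falls inside the tracked root subtree with probability j/level
        new[j + 1] = new.get(j + 1, 0) + p * Fraction(j, level)
        new[j] = new.get(j, 0) + p * Fraction(level - j, level)
    return new


def evolve(size, start_level, end_level):
    """Distribution of the tracked subtree size at end_level, given size at start_level."""
    dist = {size: Fraction(1)}
    for level in range(start_level, end_level):
        dist = urn_step(dist, level)
    return dist


def conditional(j, k, i, n):
    """P(size j at level k | size i at level n), obtained by Bayes' rule."""
    prior = evolve(1, 2, k).get(j, Fraction(0))
    likelihood = evolve(j, k, n).get(i, Fraction(0))
    evidence = evolve(1, 2, n).get(i, Fraction(0))
    return prior * likelihood / evidence


def hypergeometric(j, k, i, n):
    """Assumed closed form: j-1 of the first k-2 splits fall in the tracked subtree."""
    return Fraction(comb(i - 1, j - 1) * comb(n - i - 1, k - j - 1), comb(n - 2, k - 2))


if __name__ == "__main__":
    n, k, i = 10, 5, 4  # hypothetical values
    for j in range(1, k):
        print(j, conditional(j, k, i, n), hypergeometric(j, k, i, n))
```

The two columns printed for each j agree exactly. This is the exchangeability of the split order expressed in step (ii) of the construction above.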

Finally, inserting the results (32) and (35) into (31), we obtain

graphic file with name pone.0060123.e350.jpg (36)
graphic file with name pone.0060123.e351.jpg

where Inline graphic and Inline graphic are the normalization and the mean (i.e., the zeroth and first moment) of the hypergeometric distribution with parameters Inline graphic, Inline graphic and Inline graphic, if they satisfy Inline graphic, and Inline graphic otherwise. Note that Inline graphic.

As before, we introduce Inline graphic in equation (7) to obtain

graphic file with name pone.0060123.e361.jpg (37)

and, finally, the result

graphic file with name pone.0060123.e362.jpg (38)
graphic file with name pone.0060123.e363.jpg (39)

Figures 8 and S5 illustrate these probabilities. With a recombination event of type N, Inline graphic tends to change to smaller values. Thus, the tree becomes more unbalanced. However, by far the highest probability is attained for Inline graphic, irrespective of Inline graphic and mainly due to events of type U. This case is omitted from the figures for clarity.

Figure 8. Transition probabilities of Inline graphic.

Distribution Inline graphic as a function of Inline graphic (horizontal axis) and Inline graphic (vertical axis) for Inline graphic. The diagonal terms (Inline graphic) are not shown.

Other recombination events that change root imbalance

Now we consider all possible recombination events that change Inline graphic. Events of type U and D do not change Inline graphic, so they can be ignored. Apart from the events of type N that we discussed above, other relevant recombination events are of type S and of type R (‘root remains’), i.e. any event which leaves the root untouched. To compute the probability of a change in Inline graphic for these types of events, we use the fact that random trees from an ARG have the distribution Inline graphic and that the probability of each labelled ARG topology is the same. Due to this, we need only count the number of ARGs with a single recombination event at level Inline graphic compatible with root imbalances Inline graphic and Inline graphic, and denoted by Inline graphic and Inline graphic. Then, we divide by the total number Inline graphic of ARGs with a recombination at level Inline graphic and root imbalance Inline graphic for the original tree. Putting everything together, we obtain

graphic file with name pone.0060123.e385.jpg (40)
graphic file with name pone.0060123.e386.jpg
graphic file with name pone.0060123.e387.jpg
graphic file with name pone.0060123.e388.jpg
graphic file with name pone.0060123.e389.jpg
graphic file with name pone.0060123.e390.jpg
graphic file with name pone.0060123.e391.jpg

where Inline graphic is the second moment of the hypergeometric distribution with parameters Inline graphic, Inline graphic and Inline graphic satisfying Inline graphic, and Inline graphic otherwise, and Inline graphic is the Heaviside function, Inline graphic if Inline graphic and 0 otherwise. Note that the ARG symmetries imply the non-trivial relation

graphic file with name pone.0060123.e401.jpg (41)

The relative importance of Inline graphic versus Inline graphic and Inline graphic is shown in Figure S6.

The contribution for events of type S can be obtained using the symmetry properties of the ARG. In fact, an ARG with a recombination event of type S changing Inline graphic to Inline graphic is equivalent to an ARG with an event of type N changing Inline graphic to Inline graphic. Therefore,

graphic file with name pone.0060123.e409.jpg (42)

This result is essentially the transpose of the one shown in Figure 8, i.e. after an event of type S, Inline graphic has an almost uniform distribution irrespective of Inline graphic.

Finally, the transition probability is

graphic file with name pone.0060123.e412.jpg (43)

This distribution is shown in Figures S7 and S8 for Inline graphic.

(d) Hidden and Silent Recombination Events

Counting ARGs we now determine the fraction of hidden recombination events, i.e. those which neither change tree topology nor branch lengths. Since these events are ‘invisible’ when analysing sequence polymorphisms or haplotype structure, their frequency can only be estimated by theoretical means.

Hidden recombination events are caused by pruning and re-grafting on the same branch (see Figure 2D). Let Inline graphic denote the number of ARGs with a hidden event at level Inline graphic. Since ARG topologies are equiprobable under Inline graphic, the probability that a recombination event is hidden is

graphic file with name pone.0060123.e417.jpg (44)

where Inline graphic is the probability of pruning at level Inline graphic. To calculate Inline graphic we need to consider the following ingredients. A branch pruned under node Inline graphic can be regrafted in Inline graphic topologically inequivalent ways on the same branch (but possibly on different levels). This number has to be multiplied by the number of branches under node Inline graphic at level Inline graphic (denoted by Inline graphic). Then, one has to sum over all possible nodes Inline graphic and over all possible initial trees Inline graphic. This yields

graphic file with name pone.0060123.e428.jpg (45)

Combining eqs (44) and (45) we obtain

graphic file with name pone.0060123.e429.jpg (46)

This means that the fraction of hidden recombination events is of the order Inline graphic. They are quite frequent for small to moderate Inline graphic, but become increasingly rare with increasing Inline graphic. Still, even when Inline graphic, about 9% of all recombination events are hidden.

Using the same technique of counting ARGs, the fraction of silent recombination events (i.e. events that do not change topology but may change branch lengths) can also be obtained. We start by counting events that are silent but not hidden. Given a tree, select a branch for pruning. Then, there are exactly two ways for re-grafting: either on the branch immediately above or on the branch immediately below the old parent node of the pruned branch (Figure 2B or C), but not on the pruned branch itself (the latter would be a hidden event). Performing calculations similar to those above, we obtain

graphic file with name pone.0060123.e434.jpg (47)

Therefore,

graphic file with name pone.0060123.e435.jpg (48)

Note that the following holds:

graphic file with name pone.0060123.e436.jpg (49)

An intuitive explanation is the following: for any pruning point, there are two possible ways for re-grafting such that tree topology remains unchanged and there is exactly one way for re-grafting which leads to an increase of tree height. Therefore, Inline graphic. Then, eq (49) follows from symmetry of the ARG. Note that this argument is topological and does not depend on waiting times, i.e. branch lengths.

(e) Correlation Lengths

Since the spatial coalescent is a non-Markovian process, it is important to know over which chromosomal distances correlation and statistical dependence among trees persist. Correlation between trees, measured by any well-behaved tree statistic, decreases with distance. An interesting question is how quickly recombination reduces correlation. The answer depends on the particular statistic which is employed to measure correlation. Topology based statistics, such as Inline graphic (measuring imbalance at the root) or Colless’ index [28] (measuring imbalance at all internal nodes), behave differently from length based statistics, such as tree height (Figure 9).

Figure 9. Correlation length Inline graphic (blue line) as a function of sample size n.

The red line is the approximation Inline graphic.

We use our above results regarding events of type U, D, N, S and R to give a quantitative answer. The idea is to approximate the correlation length for a statistic by the inverse of the probability of recombination events that have a strong impact on this statistic.

Events of type U or D change height, but leave the topology unchanged. Events of type R preserve height but alter topology. Events of type N or S may change both height and topology. They also lead to the fastest decay of correlation.

The average number of recombination events before an event of type N or S occurs is the inverse of the probability of such an event. This quantity is a rough estimate for the correlation length of tree shape. The numerical values of Inline graphic for 20≲n≲100 lie between Inline graphic (Figure S2). Based on this estimate, correlation between trees should decay strongly within Inline graphic to Inline graphic recombination events. This is in agreement with numerical simulations. More generally, the topological correlation length can be roughly estimated as

graphic file with name pone.0060123.e446.jpg (50)

It increases logarithmically in Inline graphic (Figure 9).
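
The rule "correlation length is roughly the inverse of the probability of a drastic event" is the mean of a geometric waiting time. The following sketch checks this by summing the series directly. It assumes that successive recombination events are drastic independently and with the same probability, which is only an approximation for the non-Markovian spatial coalescent. The value of `p_drastic` is hypothetical and not taken from Figure S2.

```python
def expected_events_until_hit(p, max_terms=10_000):
    """Mean index of the first drastic event, summed term by term.

    Assumes each recombination event is drastic independently with probability p.
    """
    total = 0.0
    survive = 1.0  # probability that none of the earlier events was drastic
    for k in range(1, max_terms + 1):
        total += k * survive * p  # first drastic event is exactly the k-th one
        survive *= 1.0 - p
    return total


if __name__ == "__main__":
    p_drastic = 0.2  # hypothetical probability of a type N or S event
    print(expected_events_until_hit(p_drastic), 1.0 / p_drastic)
```

Both printed numbers agree up to rounding, so under the independence assumption the estimate reduces to 1/p.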

To translate this into physical length, we assume that the distance between two consecutive recombination events is exponentially distributed with mean Inline graphic. Averaging over Inline graphic we obtain Inline graphic. Therefore, the distance Inline graphic between two events of type N or S is approximately

graphic file with name pone.0060123.e452.jpg (51)

independent of Inline graphic. For example, if the scaled recombination rate is Inline graphic, the genomic distance between such events is about Inline graphic kb. Assuming that the scaled mutation rate is also Inline graphic per bp and assuming Inline graphic, an interval between drastic recombination events of type N or S contains about Inline graphic polymorphic sites. This number should be sufficiently high to enable at least a rough tree reconstruction from SNP data, and to estimate Inline graphic. It will probably not be sufficient for the reconstruction of the fine topological structure of the lower branches.

To estimate the correlation length of Inline graphic, also events of type R need to be taken into account. In fact, changes in Inline graphic occur more often than events of type N or S. Using equation (43), we determined the run-length of Inline graphic, i.e. the number of recombination events that occur before a change in Inline graphic happens. Considering a random initial tree, an estimate for the run-length is given by

graphic file with name pone.0060123.e464.jpg (52)

The run-length is longer for more imbalanced trees, but always on the order of a few recombination events (between Inline graphic and Inline graphic; Figure 10). This is also a reasonable estimate for the correlation length of the fine topological structure.
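
One way to read such a run-length estimate is as a per-state waiting time computed from the diagonal of a transition matrix and then averaged over the initial state. The sketch below follows that reading under explicit assumptions: the transition probabilities are invented toy values for three states, not those of equation (43), the initial distribution is likewise a placeholder, successive events are treated as independent, and the count includes the event that causes the change. It is not claimed to reproduce equation (52).

```python
from fractions import Fraction as F


def run_lengths(transition):
    """Expected number of events up to and including the first change of state."""
    return {s: 1 / (1 - row[s]) for s, row in transition.items()}


def mean_run_length(transition, initial):
    """Average the per-state run lengths over the initial state distribution."""
    lengths = run_lengths(transition)
    return sum(initial[s] * lengths[s] for s in transition)


# Hypothetical toy transition matrix; each row sums to 1.
TOY_TRANSITION = {
    1: {1: F(4, 5), 2: F(1, 10), 3: F(1, 10)},
    2: {1: F(1, 8), 2: F(3, 4), 3: F(1, 8)},
    3: {1: F(1, 6), 2: F(1, 6), 3: F(2, 3)},
}
# Hypothetical initial distribution over the three states.
TOY_INITIAL = {1: F(2, 5), 2: F(2, 5), 3: F(1, 5)}

if __name__ == "__main__":
    print(run_lengths(TOY_TRANSITION))
    print(mean_run_length(TOY_TRANSITION, TOY_INITIAL))
```

In this toy matrix the most imbalanced state (state 1) has the largest probability of staying put and therefore the longest run, which mirrors the qualitative statement above.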

Figure 10. Run length Inline graphic as a function of Inline graphic for even sample sizes (A) (Inline graphic) and for odd sample sizes (B) (Inline graphic).

We now consider correlation in tree height. Height can change through events of type U, D, N and S. The average change in height is the same, Inline graphic, for all these events. Therefore, the correlation length can be estimated as

graphic file with name pone.0060123.e472.jpg

Since Inline graphic is between Inline graphic and Inline graphic for 20≲n≲100 (Figure 7), drastic changes in height are expected on average every Inline graphic to Inline graphic recombination events. More generally, the correlation length also increases logarithmically in Inline graphic and is

graphic file with name pone.0060123.e479.jpg (53)

For the physical correlation length we have

graphic file with name pone.0060123.e480.jpg (54)

This is only about a quarter of the topological correlation length. Therefore, an exact reconstruction of tree height is difficult. For instance, for Inline graphic and Inline graphic, one would have on average only Inline graphic SNPs to estimate height or other tree parameters.

For the case Inline graphic, Hudson [21] gives a formula for the correlation between the heights of two trees as a function of the recombination rate Inline graphic. The formula predicts that the correlation drops to about Inline graphic with Inline graphic, i.e. after approximately 1.4 recombination events. Our rough estimate for the correlation length in this case is Inline graphic, in good agreement with Hudson’s result.

Finally, we briefly comment that linkage disequilibrium and haplotype block size depend strongly on the number and distribution of mutation and recombination events along coalescent trees, i.e. they depend strongly on tree topology and length. Since topology can in practice only be indirectly estimated from polymorphism patterns, not all changes in topology are actually visible for these statistics. The correlation lengths estimated from experimental data will tend to be larger than the theoretical estimates presented here. Assuming that haplotype blocks are mostly delimited by ‘drastic’ recombination events, involving a change of topology, we estimate the size of these haplotype fragments Inline graphic, centered at some position Inline graphic with a tree Inline graphic. Assuming further that neither tree length Inline graphic nor the probability of topology-changing drastic recombination events Inline graphic change much after a ‘non-drastic’ recombination event, the probability distribution for the haplotype sizes is

graphic file with name pone.0060123.e494.jpg (55)

The average size is then

graphic file with name pone.0060123.e495.jpg (56)

The class of drastic recombination events that should be considered to determine Inline graphic is probably larger than the class of type N and S events. However, Inline graphic is a reasonable lower bound approximation.

Discussion

We have considered the effect of single recombination events on coalescent tree topology and explicitly determined the probability with which recombination triggers ‘drastic’ changes. We consider a change to be drastic if it leads to a change of tree height or of tree imbalance. These types of events are of practical interest because both have an effect on the pattern of polymorphic sites which are informative for genealogical reconstruction and evolutionary inferences. The primary effect of height change is upon the number of mutations, while a change in tree imbalance primarily affects the mutation site frequency spectrum.

Our results show important qualitative differences for the two types. The average change in height is quite drastic per se (50% of average tree height), while the average change in imbalance is quite mild, with large jumps occurring only very rarely. Our results hold for the standard neutral model, i.e. a model with constant population size and without substructure. As such, our results may serve as the analytical reference case for constructing formal tests of the neutral evolution hypothesis. For instance, the probabilities of height or topology change are markedly altered in the presence of selective sweeps, i.e. the fast fixation of a mutant allele due to positive selection. Recombination close to the sweep site, where tree height is severely reduced [29], tends to lead to both a drastic increase of tree height and highly imbalanced trees [16], [18]. In contrast, variable population size leaves a different signature on the probabilities of drastic recombination events. Non-constancy of Inline graphic is reflected in branch length variation, but it has no impact on the branching pattern, i.e. on topology. In fact, if panmixis continues to hold, the probability distribution of tree topologies does not depend on population size. Variation of Inline graphic affects only branch lengths and waiting times. Since all our results, averaged over Inline graphic, depend implicitly on the first moments of the waiting times through the quantity Inline graphic, they can in principle be adapted to models with variable population size using the theory developed earlier [26], [30]. A detailed treatment is left to further investigation. Here we just note that the relations (17), (18) and (49) are valid for all models of variable population size.

Population substructure is another important case of deviation from the standard neutral model. Restricted gene flow between sub-populations strongly affects the transition probabilities of root imbalance, but less the distribution of height change. A more detailed discussion of the impact of these evolutionary scenarios upon a test statistic of the neutral evolution hypothesis is given in [18].

We have derived a number of further results which shed more light on the details and consequences of recombination. We analysed the correlation length between trees on a recombining chromosome and showed that topological correlation is generally longer-ranging than correlation in tree height. Still, for both types very few recombination events – on the order of ten – are sufficient to unlink the genealogical histories of two genomic fragments, given standard neutral conditions. The calculations also make clear that correlation length (number of recombinations) scales logarithmically in Inline graphic. This is important to take into account for deep sequencing association studies.

It is perhaps surprising to see that a considerable fraction of recombination events remains hidden. Even for large sample sizes, about Inline graphic of the recombination events are not visible. An even larger fraction is silent, i.e. does not cause topological changes of the underlying genealogy.

Analyzing root imbalance in more detail, we found that the distribution of Inline graphic-run lengths is biased towards unbalanced trees: under the standard neutral model, unbalanced trees tend to span larger genomic regions than balanced trees. Interestingly, the Inline graphic-run length, when normalized, is asymptotically independent of Inline graphic. Our results provide a basis to tackle problems of correlation between tree statistics in coalescent models. They extend known results, such as the one by Hudson [21] concerning tree height correlation, to the more general case of arbitrary sample size Inline graphic.

Some of the quantities studied here involve counting problems of ancestral recombination graphs with a single recombination event. These problems are related to counting problems of phylogenetic networks [31]. Unlike counting problems of trees, which can often be tackled by generating function techniques ([20], arxiv.org/abs/1112.1295v2, arxiv.org/abs/1202.5668v3), only few results are available for tree-like structures with independent cycles so far [32]. Our results represent a step towards a combinatorial treatment of these problems.

Supporting Information

Figure S1

Probability of increasing height after a recombination event as a function of the total tree length Inline graphic.

(PDF)

Figure S2

Probability of recombination events Inline graphic which change tree height and topology as a function of the sample size n.

(PDF)

Figure S3

Distribution Inline graphic of Inline graphic after an event that increases tree height, for Inline graphic.

(PDF)

Figure S4

Illustration of the sizes Inline graphic and Inline graphic of the subtrees at the levels Inline graphic and Inline graphic corresponding to pruning and regrafting, respectively.

(PDF)

Figure S5

Probability distribution Inline graphic for Inline graphic (in blue, pink, yellow, green) and Inline graphic. For clarity, only the probabilities for Inline graphic are shown.

(PDF)

Figure S6

Ratio Inline graphic as a function of Inline graphic (Inline graphic-axis) and Inline graphic (Inline graphic-axis) for Inline graphic. For clarity, only the probabilities for Inline graphic are shown.

(PDF)

Figure S7

Distribution Inline graphic of Inline graphic for Inline graphic (in blue, pink, yellow, green) and Inline graphic.

(PDF)

Figure S8

Distribution Inline graphic as a function of Inline graphic (Inline graphic-axis) and Inline graphic (Inline graphic-axis) for Inline graphic. For clarity, only the probabilities for Inline graphic are shown.

(PDF)

Text S1

Supporting information.

(PDF)

Acknowledgments

We would like to thank Jeff Thorne and two anonymous reviewers for very constructive comments, and A. Klassmann and S. Ramos-Onsins for numerous discussions.

Funding Statement

This work was supported by grants from the German Research Foundation (DFG-SFB680 and DFG-SPP1590) to TW, grant AGL2010-14822 (MICINN, Spain) to Miguel Perez-Enciso and by a Consolider Grant from the Spanish Ministry of Research, CSD2007-00036 “Centre for Research in Agrigenomics.” LF acknowledges support from CSIC (Spain) under the JAE-doc program, and by a visiting scholar grant from SFB680. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Kingman JFC (1982) The coalescent. Stochastic Processes and their Applications 13: 235–248.
  • 2. Hudson RR (1990) Gene genealogies and the coalescent process. In: Oxford Surveys in Evolutionary Biology, Oxford University Press, volume 7: 1–44.
  • 3. Wakeley J (2009) Coalescent theory – an introduction. Greenwood Village, Colorado: Roberts & Company.
  • 4. Kimura M (1987) Molecular evolutionary clock and the neutral theory. J Mol Evol 26: 24–33.
  • 5. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18: 337–338.
  • 6. Kim Y, Wiehe T (2009) Simulation of DNA sequence evolution under models of recent directional selection. Brief Bioinform 10: 84–96.
  • 7. Ewing G, Hermisson J (2010) MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26: 2064–2065.
  • 8. Griffiths RC (1984) Asymptotic line-of-descent distributions. J Math Biol 21: 67–75.
  • 9. Sagitov S (1999) The general coalescent with asynchronous mergers of ancestral lines. J Appl Probab 36: 1116–1125.
  • 10. Greven A, Pfaffelhuber P, Winter A (2009) Convergence in distribution of random metric measure spaces (Λ-coalescent measure trees). Probab Theory Relat Fields 145: 285–322.
  • 11. Bhaskar A, Kamm JA, Song YS (2012) Approximate sampling formulae for general finite-alleles models of mutation. Adv Appl Probab 44: 408–428.
  • 12. Angel O, Berestycki N, Limic V (2012) Global divergence of spatial coalescents. Probab Theory Relat Fields 152: 625–679.
  • 13. Griffiths RC, Marjoram P (1996) Ancestral inference from samples of DNA sequences with recombination. J Comput Biol 3: 479–502.
  • 14. Li H, Durbin R (2011) Inference of human population history from individual whole-genome sequences. Nature 475: 493–496.
  • 15. Wiuf C, Hein J (1999) Recombination as a point process along sequences. Theor Popul Biol 55: 248–259.
  • 16. Fay J, Wu C (2000) Hitchhiking under positive Darwinian selection. Genetics 155: 1405–1413.
  • 17. Li H (2011) A new test for detecting recent positive selection that is free from the confounding impacts of demography. Mol Biol Evol 28: 365–375.
  • 18. Li H, Wiehe T (2012) Coalescent tree imbalance as an indicator of selective sweeps. (in review).
  • 19. Murtagh F (1984) Counting dendrograms: A survey. Discrete Applied Mathematics 7: 191–199.
  • 20. Disanto F, Wiehe T (2013) Exact enumeration of cherries and pitchforks in ranked trees under the coalescent model. Mathematical Biosciences 242: 195–200.
  • 21. Hudson R (1983) Properties of a neutral allele model with intragenic recombination. Theor Popul Biol 23: 183–201.
  • 22. Wiuf C (2006) Consistency of estimators of population scaled parameters using composite likelihood. J Math Biol 53: 821–841.
  • 23. Paul J, Steinrücken M, Song Y (2011) An accurate sequentially Markov conditional sampling distribution for the coalescent with recombination. Genetics 187: 1115–1128.
  • 24. Griffiths RC, Marjoram P (1997) An ancestral recombination graph. In: Progress in population genetics and human evolution (Minneapolis, MN, 1994), New York: Springer, volume 87 of IMA Vol. Math. Appl. 257–270.
  • 25. Rasmussen M, Kellis M (2012) Unified modeling of gene duplication, loss, and coalescence using a locus tree. Genome Res 22: 755–765.
  • 26. Zivkovic D, Wiehe T (2008) Second-order moments of segregating sites under variable population size. Genetics 180: 341–357.
  • 27. Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105: 437–460.
  • 28. Colless DH (1982) Review: [untitled]. Systematic Zoology 31: 100–104.
  • 29. Kaplan N, Hudson R, Langley C (1989) The “hitchhiking effect” revisited. Genetics 123: 887–899.
  • 30. Griffiths RC, Tavaré S (2003) The genealogy of a neutral mutation. In: Green P, Hjort N, Richardson S, editors, Highly Structured Stochastic Systems, Oxford Statistical Science Series, Oxford University Press, volume 27: 393–412.
  • 31. Huson DH, Rupp R, Scornavacca C (2011) Phylogenetic Networks: Concepts, Algorithms and Applications. Cambridge University Press.
  • 32. Semple C, Steel M (2006) Unicyclic networks: compatibility and enumeration. IEEE/ACM Trans Comput Biol Bioinform 3: 84–91.

Articles from PLoS ONE are provided here courtesy of PLOS
