Abstract
Understanding the patterns and processes of diversification of life on the planet is a key challenge of science. The Tree of Life represents such diversification processes through the evolutionary relationships among the different taxa, and can be extended down to intra-specific relationships. Here we examine the topological properties of a large set of interspecific and intraspecific phylogenies and show that the branching patterns follow allometric rules conserved across the different levels in the Tree of Life, all significantly departing from those expected from the standard null models. The finding of non-random universal patterns of phylogenetic differentiation suggests that similar evolutionary forces drive diversification across the broad range of scales, from macro-evolutionary to micro-evolutionary processes, shaping the diversity of life on the planet.
Introduction
The Tree of Life is a synoptic depiction of the pathways of evolutionary differentiation between Earth life forms [1], and contains valuable clues on the key issue of understanding the diversification of life on the planet [2]. The branching pattern of the Tree of Life, which is being captured at increasing resolution by the advent of molecular tools [3], can be examined to investigate fundamental questions, such as whether it follows universal rules, and to what extent random differentiation mechanisms explain the shape of phylogenetic trees. Examining the structure of the Tree of Life can also help to infer whether evolution acts at intraspecific scales differently from how it acts at the interspecific scale. Here we address these fundamental questions on the basis of a comprehensive comparative analysis of phylogenetic trees representing different fractions and domains of the Tree of Life, from interspecific to intraspecific scales. We draw on previous analyses of the geometry of the Tree of Life [4] and the characterization of other branching systems [5], [6], and use tools derived from modern network theory [7]–[10] to examine the scaling of the branching in the Tree of Life [11], [12]. Our analysis is based on a thorough data set of more than 5000 interspecific phylogenies and a sample of 67 intraspecific phylogenies (see Text S1), thereby testing the universality of the results derived across scales.
A phylogenetic tree is a set of nodes, each node representing a diversification event, connected by branches (links). For each node i, a subtree Si is made up of a root at node i and all the descendant nodes stemming from this root. The subtree size Ai gives the number of subtaxa that diversify from node i (including itself). Beyond this measure of the degree of diversity, how the diversity is arranged within a phylogeny can be characterized through the cumulative branch size, Ci, a measure of the subtree shape. It is defined [13] as the sum of the branch sizes associated with all the nodes in the subtree Si, Ci = ΣAj, with the sum running over all nodes j in Si. For the same tree size, and restricting to binary branching events, the smallest value of the cumulative branch size is obtained for a completely symmetric, balanced tree, whereas the most asymmetric tree, the pectinate or comb-like tree in which all branches split successively from a single one, yields the largest Ci value [13]. To illustrate, we show in Figure 1 the analysis of Ai and Ci for a completely balanced tree (Figure 1A) and for a completely imbalanced tree (Figure 1B). A portion of a real phylogenetic tree is also shown (Figure 1C). How the shape of the tree (i.e., the distribution of the biological diversification) changes with tree size (i.e., with the number of taxa it contains) is given by the scaling of the subtree shape C vs. the subtree size A, as described by the allometric scaling relation C ∼ A^η. We quantitatively characterize the shape of each tree in our data set by calculating the functions F(A) and F(C), which are the complementary cumulative distribution functions (CCDF) of the Ai and Ci values in the tree, respectively, and the value of the allometric scaling exponent, η.
We compare the results derived from the analyses of inter- and intra-specific phylogenetic trees with each other, to test for the preservation of branching patterns across evolutionary scales, and against those derived from the analyses of randomly generated trees, to test whether the allometric scaling derived can be modeled using simple, random branching rules.
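The definitions of Ai and Ci can be made concrete with a short computation. The following is a minimal sketch, not the authors' code: it assumes a tree stored as a hypothetical dictionary `children` mapping each node to its list of child nodes, and uses the recursion Ci = Ai + (sum of Ck over the children k of i), which follows directly from Ci = ΣAj. The two four-tip example trees are illustrative only.

```python
def subtree_measures(children, root):
    """Return dicts A (subtree size) and C (cumulative branch size) per node.

    `children` maps a node to the list of its children; leaves may be absent.
    """
    A, C = {}, {}
    stack = [(root, False)]
    while stack:  # iterative post-order traversal, safe for very deep trees
        node, visited = stack.pop()
        kids = children.get(node, [])
        if not visited:
            stack.append((node, True))
            stack.extend((kid, False) for kid in kids)
        else:
            # A_i counts node i itself plus all of its descendants.
            A[node] = 1 + sum(A[kid] for kid in kids)
            # C_i = sum of A_j over the subtree = A_i + sum of C over children.
            C[node] = A[node] + sum(C[kid] for kid in kids)
    return A, C


# Two illustrative binary trees with four tips (seven nodes) each.
balanced = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
pectinate = {0: [1, 2], 2: [3, 4], 4: [5, 6]}

A_bal, C_bal = subtree_measures(balanced, 0)
A_pec, C_pec = subtree_measures(pectinate, 0)
print(A_bal[0], C_bal[0])  # 7 17
print(A_pec[0], C_pec[0])  # 7 19
```

For the same size A = 7, the balanced tree gives the smaller cumulative branch size (17) and the pectinate tree the larger one (19), in line with the ordering described above.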
Results
The branch-size CCDF displays power-law tails of the form F(A) ∼ A^(1−τA) for large branch size A (Figure 2A). The power-law exponents τA are remarkably similar for the data sets analyzed: τA = 1.76±0.03 and 1.74±0.02 for intra- and interspecific phylogenies, respectively. Similarly, the cumulative-branch-size CCDF displays a power-law tail of the form F(C) ∼ C^(1−τC) at large C, with similar agreement between the exponents of the intra- and interspecific data sets: τC = 1.53±0.02 and 1.53±0.02, respectively (Figure 2B). The discrepancy observed between the two data sets at the tail of the distributions can be explained by the different sizes of the typical trees in them: each tree contributes a natural cutoff to the overall distribution, and since the intraspecific trees are smaller on average, their cutoff appears at smaller tree sizes.
The allometric exponent, η, that characterizes the scaling of tree shape with tree size (Figure 3A), is also remarkably similar for the intraspecific (η = 1.43±0.01) and the interspecific (η = 1.44±0.01) phylogenies. This constancy of the exponents is still more remarkable when one realizes (inset of Figure 3A) that it applies not only to average properties of sets of intraspecific and interspecific trees, but also to individual phylogenies of groups of organisms pertaining to different kingdoms and living across widely contrasting environments, as reflected by the very narrow range of η obtained from different phylogenies (〈η〉 = 1.47, σ = 0.03, Figure 3A). The scaling exponents for our large interspecific data set are also matched almost perfectly (Figure S1) by those derived from a set of 67 interspecific phylogenies randomly drawn from the published literature, thereby validating the uniformity of the scaling rules across the broad interspecific phylogenies and the smaller set of intraspecific ones used here. The latter was also derived from a similar random sample taken from the published literature (see Text S1).
The allometric scaling C ∼ A^1.44 derived from our analysis falls between those obtained for two extreme topologies: the symmetric tree gives C ∼ A ln A, which corresponds to η = 1 with a logarithmic correction, while the pectinate tree has η = 2. The natural null model for tree construction, the Equal-Rates Markov (ERM) model [14], [15], yields a scaling C ∼ A ln A, similar to the symmetric tree with η = 1 but different from the scaling displayed by empirical inter- and intraspecific phylogenies, particularly for large ones (Figure 3B). Therefore some topological aspects of phylogenetic trees are not adequately reproduced by the ERM model. Our results imply that successful lineages diversify more profusely than expected under random branching, generating the large imbalances that characterize emerging depictions of the Tree of Life [4]. Alternative models introducing correlations, such as the proportional-to-distinguishable-arrangements (PDA) model [4], [16] or the beta splitting model [17], could generate more realistic phylogenies. Guided by previous biological allometric scaling analysis, we have assumed a power-law scaling of the form C ∼ A^η. However, other ansätze could also fit the data. The important point is that these modeling approaches should give similar scaling properties for intra- as for interspecific branching.
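Assuming the power-law form C ∼ A^η, one simple way to estimate η from a set of (A, C) pairs is a least-squares fit of log C against log A. The text does not state which fitting procedure was used, so the sketch below is only an illustration of the idea; the function name `fit_allometric_exponent` and the synthetic data are hypothetical.

```python
import math


def fit_allometric_exponent(A_values, C_values):
    """Least-squares slope of log(C) versus log(A), i.e. an estimate of eta."""
    xs = [math.log(a) for a in A_values]
    ys = [math.log(c) for c in C_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic check: points lying exactly on C = A**1.5 should return 1.5.
sizes = [2, 4, 8, 16, 32]
shapes = [a ** 1.5 for a in sizes]
eta = fit_allometric_exponent(sizes, shapes)
print(round(eta, 6))  # 1.5
```

On real trees the small-A region is constrained by binary branching, so a fit of this kind would normally be restricted to larger subtree sizes; the choice of range is left open here.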
Discussion
Traditionally, microevolutionary and macroevolutionary processes have been studied independently by population geneticists and evolutionary biologists, respectively [18]. The divide between these two levels of generation of biological diversity is an old one, rooted in the controversy between Darwinian gradualism and the saltationism proposed by others, prominently paleontologists, to explain macroevolutionary processes [19]. The debate as to whether macroevolution is more than the accumulation of microevolutionary events remains active [18], [20], [21], although refined paleontological evidence supports the continuum between micro- and macroevolution for some lineages [22]. The results presented here show that the branching and scaling patterns in intraspecific and interspecific phylogenies do not differ significantly for the topological properties we have calculated. Thus, should saltation processes be a factor at the macroevolutionary level, this is not reflected in the topology of phylogenetic branching as examined here. Evidence for possible differences in phylogenetic topologies between the inter- and intraspecific levels may require a detailed analysis of branching times, which we have not attempted.
Processes leading to scaling laws in size distributions in natural systems have been formulated as growth models [23], [24]. Many of the findings carry over to scaling properties found in networks [25] and their description in terms of branching processes [26]. But most of these models predict branching topologies similar to the ERM model. An alternative approach to understanding the observed exponent would be to trace analogies with scaling laws in different branching systems [5], [6], [27], which have been explained by invoking a natural optimization criterion based on the fact that the observed trees contain the largest possible number of apices within the smallest number of branching levels. For binary trees of size A, where nodes are restricted to occupy uniformly a D-dimensional Euclidean space, the minimum value of C scales as A^η, with η = (D+1)/D. This scaling also describes the D-dimensional tree with the maximum size for a given depth (the average distance between root and leaves). The value of η obtained in our phylogeny analysis, η≅1.44, is achieved only for optimal trees restricted to spaces of D≅2.27 dimensions. Given the apparently unlimited number of variables that may yield differences among taxa, restricting their representation to a space with such a small number of dimensions seems unreasonable. This interpretation suggests that the evolutionary process yielding the observed phylogenies is not the most parsimonious one, which could potentially yield a similar biodiversity with fewer branching levels. In fact, the natural choice D = ∞ gives an optimal exponent η = 1, which corresponds to the ERM value and departs from the observed scaling. Optimal traffic networks [28] also lead to the exponent τA = 2, which departs from the empirical scaling exponent reported here for phylogenetic trees.
In summary, the remarkably similar allometric exponents reported here to characterize universally the scaling properties of intra- and inter-specific phylogenies across kingdoms, reproductive systems and environments strongly suggest the conservation of branching rules, and hence of the evolutionary processes that drive biological diversification, across the entire history of life. Although at small branch sizes the topology of observed phylogenies cannot differ much from that expected under random and symmetric trees, due to the restriction to binary bifurcations in phylogenetic tree reconstruction, significant departures become universally evident as trees become larger, where the null ERM model and real phylogenies differ (Figure 2B). These deviations suggest (a) that the evolution of life leads to less biodiversity than an optimal tree can possibly generate; and (b) the operation of a mechanism generating a correlated branching, where some memory of past evolutionary events is maintained along each branch. This correlated branching pattern implies that entities that diversify faster than average lead to new biological forms that diversify more than average themselves. Invariance across the broad scales considered here indicates that relatively simple rules govern the phylogenetic branching and the unfolding of biodiversity. Their deviation from random models indicates that evolutionary success is a correlated trait within lineages, yielding present asymmetries in the structure of the Tree of Life.
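The dimension quoted above follows from inverting the relation η = (D+1)/D. As a short worked step, using only the values given in the text:

```latex
\eta = \frac{D+1}{D} = 1 + \frac{1}{D}
\quad\Longrightarrow\quad
D = \frac{1}{\eta - 1},
\qquad
\eta \simeq 1.44 \;\Rightarrow\; D \simeq \frac{1}{0.44} \simeq 2.27,
\qquad
D \to \infty \;\Rightarrow\; \eta \to 1 .
```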
Materials and Methods
Phylogenies databases
On June 30th 2007 we downloaded the 5,212 phylogenetic trees available at that time in the database TreeBASE (http://www.treebase.org). TreeBASE constitutes a large database of interspecific phylogenies, which were collected from previously published research papers. The size of the trees ranges from 10 to 600 tips. Most of the bifurcations in these trees are binary, as confirmed by the fact that the ratio between the number of tips and the total number of nodes gives 0.52 when averaged over all the trees (for perfect binary trees, the ratio is 0.50).
As a comprehensive database comparable to TreeBASE does not exist for intraspecific phylogenies, we constructed an intraspecific data set by manually compiling 67 intraspecific phylogenies from several published phylogenetic analyses [S1–S45]. We compiled this data set in such a way that it contains: 1) Organisms from the main different environments (terrestrial, marine and fresh water), climatic regions (from polar to desert), and branches of life (Table S1). 2) Phylogenetic trees reconstructed with the main phylogenetic tree estimation methods, i.e., neighbor-joining, maximum parsimony and maximum likelihood methods.
In order to test whether the results derived from the examination of the relatively small (67 phylogenies) intraspecific database can be compared with the much larger (5,212) set of interspecific phylogenies extracted from TreeBASE, we constructed a data set of 67 interspecific phylogenies drawn from the literature [S46–S85] using the same criteria as those used to derive the intraspecific phylogeny database (Table S1), obtaining full agreement (Figure S1). The intra- and interspecific phylogenies derived from the literature ranged between 30 and 170 tips, and they contained mainly binary branching events. An example of each kind of phylogeny is shown in Figures S2A and S3A.
Branch size and cumulative branch size distributions
We associate with each node i of a phylogenetic tree two quantities: the size Ai (number of nodes) of the subtree Si made up of node i and all the descendant nodes below it, that is, the subtree which does not contain the global root of the original tree, and the cumulative branch size, Ci, defined as the sum of the branch sizes associated with all the nodes in the subtree Si, Ci = ΣAj. To characterize the probability distributions of the Ai and Ci values on a particular phylogenetic tree we compute the respective complementary cumulative distribution functions (CCDF): F(A) = probability(Ai>A), and F(C) = probability(Ci>C). We observe that these quantities scale, for large values of A and C, as power laws: F(A) ∼ A^(1−τA) and F(C) ∼ C^(1−τC). The exponents τA and τC thus characterize the probability distributions of {Ai} and {Ci}: p(A) ∼ A^(−τA) and p(C) ∼ C^(−τC), respectively.
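The empirical CCDF defined above, F(x) = probability(value > x), can be computed directly from the list of Ai (or Ci) values of a tree. The sketch below is an illustration under that definition only; the function name `ccdf` and the example values (the subtree sizes of a balanced four-tip tree) are hypothetical.

```python
from collections import Counter


def ccdf(values):
    """Return [(x, F(x))] with F(x) = fraction of values strictly greater than x."""
    n = len(values)
    counts = Counter(values)
    remaining = n
    points = []
    for x in sorted(counts):
        remaining -= counts[x]  # values strictly greater than x
        points.append((x, remaining / n))
    return points


# Subtree sizes A_i of a balanced binary tree with four tips:
# four leaves (A = 1), two internal nodes (A = 3) and the root (A = 7).
example_sizes = [1, 1, 1, 1, 3, 3, 7]
for x, f in ccdf(example_sizes):
    print(x, f)
```

Plotting these points on doubly logarithmic axes is the usual way to inspect a power-law tail; for data pooled over many trees, the same function can be applied to the concatenated values.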
Allometric scaling relationship
We observe that a functional relationship between the values of C and A, i.e. between shape and size, exists and also follows a power law, C ∼ A^η, characterized by an exponent η. Since this relationship encodes the variation of a system property as size is varied, we can call this an allometric scaling relationship, to stress its connections with other functional relationships relating function and size [11], [13], [27]. We note that introducing the change of variables C ∼ A^η into F(C) ∼ C^(1−τC) leads to F(A) ∼ A^(η(1−τC)), from which η = (1−τA)/(1−τC). Thus, only two out of the three exponents are independent. As simple examples for which the above exponents can be computed by direct counting, we mention the pectinate or fully unbalanced tree, i.e. a tree in which all branching occurs successively along a single branch, characterized by the exponents τA = 0, τC = 1/2, η = 2, and the fully symmetric or Cayley tree, characterized by τA = 2 and C ∼ A ln A, which except for the weak logarithmic correction corresponds to η = 1 and τC = 2. Figures S2B and S3B show, in contrast, the allometric scaling relationship for the particular examples of intra- and inter-specific phylogenies displayed in Figures S2A and S3A.
In order to investigate whether observations differ from random expectations, we have compared the allometric scaling found here with the prediction of a null model [29], the Equal-Rates Markov (ERM) model. The ERM model was attributed to Harding [30], and to Cavalli-Sforza and Edwards [31], although it is based on models of the diversification process that date back at least to Yule [23]. The main assumption of the ERM model is that the phylogeny is the product of random branching. This is the result when the “effective speciation rate” (the difference between the speciation and extinction rates) is equal for all species. The effective speciation rate may change chronologically, provided that it is the same for all lineages at a given time [23]. For this model we obtain C ∼ A ln A, or η = 1, and also τA = τC = 2. The random asymmetries introduced by the ERM are not strong enough to change the scaling behavior from the symmetric tree result.
The quantity Ci/Ai can be thought of as a measure of the average depth, or distance, of the nodes of the subtree to the node i. This can be seen by taking into account that Ci = Σ(dij+1), where dij is the distance of each of the nodes j of the subtree Si to the root i. Thus, the relationship between C and A can be written as Ci = Ai+〈d〉iAi, where 〈d〉i is the average depth of the nodes in the subtree Si, which gives Ci/Ai = 〈d〉i+1. This quantity is closely related to Sackin's index, defined as the sum of the distances of the leaves to the root: S = Σl ∈leaves dl ,root [32], [33]. It can be shown that for binary trees C = 2S+1, where C = Σ∀i (di ,root+1). Since the scaling law relating the increase of the depth or Sackin's index with tree size is known to be the same as the scaling of Colless' index, which measures the symmetry or balance of a phylogenetic tree [34], our results for η can be put in the context of the numerous studies available on the imbalance of phylogenetic trees [4], [17], [35]. Thus, connections between several methodologies previously used to analyze the topology of trees, such as size distributions [10], [23], imbalance and depth [4], [8], [32]–[35], and transport efficiency [7], [13], [27], [28], are revealed within the framework presented here.
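The relation between the three exponents can be written out step by step. The derivation below assumes CCDF tails of the form F(A) ∼ A^(1−τA) and F(C) ∼ C^(1−τC), which is the form consistent with the relation η = (1−τA)/(1−τC) quoted above, and uses the fact that C grows monotonically with A, so that the events Ci > C and Ai > A coincide in the tail.

```latex
F(C) \sim C^{\,1-\tau_C}, \qquad C \sim A^{\eta}
\;\;\Longrightarrow\;\;
F(A) \sim \left(A^{\eta}\right)^{1-\tau_C} = A^{\,\eta\,(1-\tau_C)} .

\text{Comparing with } F(A) \sim A^{\,1-\tau_A}: \qquad
\eta\,(1-\tau_C) = 1-\tau_A
\;\;\Longrightarrow\;\;
\eta = \frac{1-\tau_A}{1-\tau_C} .

\text{Check (pectinate tree): } \tau_A = 0,\ \tau_C = \tfrac{1}{2}
\;\Rightarrow\;
\eta = \frac{1-0}{1-\tfrac{1}{2}} = 2 .
```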
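A standard way to generate ERM (Yule) tree topologies is to start from a single tip and repeatedly split a tip chosen uniformly at random. The sketch below follows that construction as an illustration; it is not the authors' simulation code, and the names `erm_tree` and `subtree_measures`, the tree size and the seed are arbitrary, hypothetical choices.

```python
import random


def erm_tree(n_tips, seed=None):
    """Grow a binary tree by splitting a uniformly chosen tip until n_tips exist.

    Returns a dict mapping each node id to its list of children; the root is 0
    and every child id is larger than the id of its parent.
    """
    rng = random.Random(seed)
    children = {0: []}
    tips = [0]
    next_id = 1
    while len(tips) < n_tips:
        pos = rng.randrange(len(tips))  # every tip is equally likely to split
        parent = tips[pos]
        left, right = next_id, next_id + 1
        next_id += 2
        children[parent] = [left, right]
        children[left], children[right] = [], []
        tips[pos] = left  # the split tip is replaced by its two daughters
        tips.append(right)
    return children


def subtree_measures(children):
    """Subtree size A and cumulative branch size C for every node."""
    A, C = {}, {}
    # Children have larger ids than parents, so descending order is bottom-up.
    for node in sorted(children, reverse=True):
        kids = children[node]
        A[node] = 1 + sum(A[k] for k in kids)
        C[node] = A[node] + sum(C[k] for k in kids)
    return A, C


tree = erm_tree(200, seed=1)
A, C = subtree_measures(tree)
print(A[0], C[0])  # root subtree size (2 * 200 - 1 = 399) and its C value
```

Collecting the (Ai, Ci) pairs over many such trees of different sizes gives the null-model curve against which an empirical C versus A relation can be compared.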
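The identity C = 2S+1 for binary trees can be checked numerically on small examples. The sketch below assumes C is computed as the sum of (depth + 1) over all nodes, consistent with Ci = Σ(dij+1) above, and S as the sum of leaf depths; the dictionary representation and function names are hypothetical.

```python
def node_depths(children, root):
    """Distance of every node to the root."""
    depth = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for kid in children.get(node, []):
            depth[kid] = depth[node] + 1
            stack.append(kid)
    return depth


def sackin_and_cumulative(children, root):
    """Return (S, C): Sackin's index and the cumulative branch size of the root."""
    depth = node_depths(children, root)
    S = sum(d for node, d in depth.items() if not children.get(node))  # leaves only
    C = sum(d + 1 for d in depth.values())  # all nodes
    return S, C


balanced = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
pectinate = {0: [1, 2], 2: [3, 4], 4: [5, 6]}

for name, tree in (("balanced", balanced), ("pectinate", pectinate)):
    S, C = sackin_and_cumulative(tree, 0)
    print(name, S, C, C == 2 * S + 1)
# balanced 8 17 True
# pectinate 9 19 True
```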
Supporting Information
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: We acknowledge financial support from MEC (Spain) and FEDER, project FISICOS, from CSIC (Spain) project PIE 200750I016, from SBF (Switzerland) through project C05.0148 (Physics of Risk), and from the European Commission through the NEST-Complexity project EDEN. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Cracraft J, Donoghue MJ. Assembling the Tree of Life. Oxford: Oxford University Press; 2004.
- 2. Purvis A, Hector A. Getting the measure of biodiversity. Nature. 2000;405:212–219. doi: 10.1038/35012221.
- 3. Rokas A. Genomics and the Tree of Life. Science. 2006;313:1897–1899. doi: 10.1126/science.1134490.
- 4. Blum MGB, François O. Which random processes describe the tree of life? A large-scale study of phylogenetic tree imbalance. Syst Biol. 2006;55:685–691. doi: 10.1080/10635150600889625.
- 5. Rodriguez-Iturbe I, Rinaldo A. Fractal river basins: chance and self-organization. New York: Cambridge University Press; 1997.
- 6. Makarieva AM, Gorshkov VG, Li B-L. Revising the distributive networks models of West, Brown and Enquist (1997) and Banavar, Maritan and Rinaldo (1999): metabolic inequity of living tissues provides clues for the observed allometric scaling rules. J Theor Biol. 2005;237:291–301. doi: 10.1016/j.jtbi.2005.04.016.
- 7. Garlaschelli D, Caldarelli G, Pietronero L. Universal scaling relations in food webs. Nature. 2003;423:165–168. doi: 10.1038/nature01604.
- 8. Camacho J, Arenas A. Food-web topology Universal scaling in food-web structure? Nature. 2005;435:E3–E4. doi: 10.1038/nature03839.
- 9. Proulx SR, Promislow DEL, Phillips PC. Network thinking in ecology and evolution. Trends Ecol Evol. 2005;20:345–353. doi: 10.1016/j.tree.2005.04.004.
- 10. Klemm K, Eguíluz VM, San Miguel M. Scaling in the structure of directory trees in a computer cluster. Phys Rev Lett. 2005;95:128701. doi: 10.1103/PhysRevLett.95.128701.
- 11. LaBarbera M. Analyzing Body Size as a Factor in Ecology and Evolution. Annu Rev Ecol Syst. 1989;20:97–117.
- 12. Webb JK, Brook BW, Shine R. What makes a species vulnerable to extinction? Comparative life-history traits of two sympatric snakes. Ecol Res. 2002;17:59–67.
- 14. Mooers AO, Heard SB. Inferring evolutionary process from phylogenetic tree shape. Q Rev Biol. 1997;72:31–54.
- 15. Caldarelli G, Cartozo CC, De Los Rios P, Servedio VDP. Widespread occurrence of the inverse square distribution in social sciences and taxonomy. Phys Rev E. 2004;69:035101(1–3). doi: 10.1103/PhysRevE.69.035101.
- 16. Pinelis I. Evolutionary models of phylogenetic trees. Proc R Soc Lond B. 2003;270:1425–1431. doi: 10.1098/rspb.2003.2374.
- 17. Aldous DJ. Stochastic models and descriptive statistics for phylogenetic trees from Yule to today. Stat Sci. 2001;16:23–34.
- 18. Simons A. The continuity of microevolution and macroevolution. J Evol Biol. 2002;15:688–701.
- 19. Mayr E. Speciation and macroevolution. Evolution. 1982;36:1119–1132. doi: 10.1111/j.1558-5646.1982.tb05483.x.
- 20. Grantham T. Is macroevolution more than successive rounds of microevolution? Paleontology. 2007;50:75–85.
- 21. Erwin DH. Macroevolution is more than repeated rounds of microevolution. Evol Dev. 2000;2:78–84. doi: 10.1046/j.1525-142x.2000.00045.x.
- 22. Kutschera U, Niklas KJ. The modern theory of biological evolution: an expanded synthesis. Naturwissenschaften. 2004;91:255–276. doi: 10.1007/s00114-004-0515-y.
- 23. Yule GU. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis. Philos Trans R Soc Lond A. 1924;213:21–87.
- 24. Simon HA. On a class of skew distribution functions. Biometrika. 1955;42:425–440.
- 25. Bornholdt S, Ebel H. World Wide Web Scaling Exponent from Simon's 1955 Model. Phys Rev E. 2001;64:035104(R). doi: 10.1103/PhysRevE.64.035104.
- 26. Durrett R. Random Graph Dynamics. Cambridge: Cambridge University Press; 2007.
- 27. Brown JH, Gillooly JF, Allen AP, Savage VM, West GB. Toward a Metabolic Theory of Ecology. Ecology. 2004;85:1771–1789.
- 28. Barthélemy M, Flammini A. Optimal traffic networks. J Stat Mech. 2006;07:L07002.
- 29. Harvey PH, Colwell RK, Silvertown JW, May RM. Null models in ecology. Ann Rev Ecol Syst. 1983;14:189–211.
- 30. Harding EF. The probabilities of rooted tree-shapes generated by random bifurcation. Adv Appl Prob. 1971;3:44–77.
- 31. Cavalli-Sforza LL, Edwards AWF. Phylogenetic analysis: models and estimation procedures. Evolution. 1967;21:550–570. doi: 10.1111/j.1558-5646.1967.tb03411.x.
- 32. Sackin MJ. “Good” and “bad” phenograms. Sys Zool. 1972;21:225–226.
- 33. Shao KT, Sokal R. Tree balance. Sys Zool. 1990;39:226–276.
- 34. Ford DJ. Probabilities on cladograms: introduction to the alpha model (PhD Thesis, Stanford University) 2006.
- 35. Holman EW. Nodes in phylogenetic trees: the relation between imbalance and number of descendent species. Syst Biol. 2005;54:895–899. doi: 10.1080/10635150500354696.