Proceedings of the National Academy of Sciences of the United States of America. 2023 May 22;120(22):e2220389120. doi: 10.1073/pnas.2220389120

Phylogenomic comparative methods: Accurate evolutionary inferences in the presence of gene tree discordance

Mark S Hibbins a,b,1, Lara C Breithaupt b,c, Matthew W Hahn b,d
PMCID: PMC10235958  PMID: 37216509

Significance

Phylogenetic comparative methods allow for the study of trait evolution between species by accounting for their shared evolutionary history. These methods usually assume that species relationships can be described by a single tree. However, different parts of the genome can have their own independent evolutionary histories that can disagree with each other. If these disagreeing histories contribute to trait evolution over time, standard comparative methods can be misled. In this work, we developed two approaches to phylogenetic comparative methods that account for this variation in histories across the genome. We used these methods to estimate more accurate rates of floral trait evolution in wild tomatoes. Our work opens alternative approaches for the study of trait evolution among species.

Keywords: phylogenetic comparative methods, trait evolution, phylogenetic discordance

Abstract

Phylogenetic comparative methods have long been a mainstay of evolutionary biology, allowing for the study of trait evolution across species while accounting for their common ancestry. These analyses typically assume a single, bifurcating phylogenetic tree describing the shared history among species. However, modern phylogenomic analyses have shown that genomes are often composed of mosaic histories that can disagree both with the species tree and with each other—so-called discordant gene trees. These gene trees describe shared histories that are not captured by the species tree, and therefore that are unaccounted for in classic comparative approaches. The application of standard comparative methods to species histories containing discordance leads to incorrect inferences about the timing, direction, and rate of evolution. Here, we develop two approaches for incorporating gene tree histories into comparative methods: one that constructs an updated phylogenetic variance–covariance matrix from gene trees, and another that applies Felsenstein's pruning algorithm over a set of gene trees to calculate trait histories and likelihoods. Using simulation, we demonstrate that our approaches generate much more accurate estimates of tree-wide rates of trait evolution than standard methods. We apply our methods to two clades of the wild tomato genus Solanum with varying rates of discordance, demonstrating the contribution of gene tree discordance to variation in a set of floral traits. Our approaches have the potential to be applied to a broad range of classic inference problems in phylogenetics, including ancestral state reconstruction and the inference of lineage-specific rate shifts.


A major goal of evolutionary biology is to understand how and why traits vary among species. One of the major sources of this variation is common ancestry. If left unaccounted for, this shared history can lead to pseudoreplication and spurious trait correlations (1). Phylogenetic comparative methods have been developed to account for shared history, enabling more accurate inferences about the tempo and mode of trait evolution (2). With the statistical toolkit offered by phylogenetic comparative methods, researchers can ask questions about the rate at which traits evolve, whether these rates have changed over time or in different lineages, what traits may have looked like in ancestral or extinct lineages, and whether trait shifts are correlated with historical or environmental factors (3–6).

In classic comparative methods, common ancestry among species is accounted for by using a single species phylogeny. However, genome-scale analyses of phylogenetic history have revealed that individual loci can have their own independent histories (7–15). The result is gene tree discordance: the disagreement of trees at individual loci both with each other and with the species phylogeny. Gene tree discordance has important implications for phylogenetic comparative methods because discordant gene trees contain branches that are not present in the species phylogeny. Evolution along such discordant branches can result in trait similarity among species with no shared history in the species tree (Fig. 1). Such patterns of trait variation can mislead standard phylogenetic comparative methods, particularly by resulting in overestimates of the number of trait transitions or the rate of trait evolution (16–20). This effect has been termed “hemiplasy”, as single transitions on discordant gene trees can falsely resemble homoplasy when analyzed on the species tree (21).

Fig. 1.


Conceptualizing quantitative trait evolution with discordant gene trees. (A) Given a species tree (far left), we model gene trees as arising under the multispecies coalescent process. One topology is concordant with the species tree (blue), while the other two possible topologies are discordant (red and yellow). Under ILS, the two discordant topologies are expected at equal frequencies. (B) Over the course of evolution, mutations occur at loci that affect quantitative traits, each of which has a topology drawn from the multispecies coalescent process. Mutations on the internal branches of discordant gene trees can introduce shared trait history that is not captured by the species tree. Here, we summarize mutations occurring at different loci on a single tree if the loci had the same topology, with each mutation at each locus contributing positively or negatively to the trait value in each species. In the example here, species pairs B–C and A–C might covary in quantitative trait values due to mutations on shared branches in gene trees, despite sharing no common ancestor in the species tree. For example, a mutation on the internal branch of the magenta tree causes species B and C to have more similar trait values. (C) Given a large number of mutations and loci, trait evolution over time can be modeled by Brownian motion on each gene tree topology. This stochastic process models the trait value as a random walk over time, with species trait values calculated as the weighted average of the values on each gene tree (18, 22).

Discordance is a concern for evolutionary inference because it has biological causes that cannot be overcome by addressing technical errors or by increasing species sampling (23). Two primary causes of discordance, incomplete lineage sorting (ILS) and introgression, have different effects on gene tree frequencies and branch lengths and are therefore expected to bias comparative methods in different ways. ILS, a stochastic process that depends on species tree internal branch lengths and population sizes, generates symmetry in the frequencies of possible discordant gene trees (24, 25). Therefore, higher amounts of ILS lead to broad increases in the occurrence of hemiplasy across multiple possible incongruent trait patterns (18, 26). Introgression is a process of historical hybridization and back-crossing that, while widespread in modern phylogenomic datasets (27–29), is often more limited to specific pairs of taxa. In particular, postspeciation introgression between nonsister lineages leads to an excess of gene trees grouping those lineages as sister (30–33). This pattern should result in an excess of trait-sharing for the species involved in introgression compared to the species not exchanging genes (19, 20, 22).

While some progress has been made in accounting for discordance in the evolution of discrete traits, especially in nucleotide models (34–38), many classic phylogenetic comparative methods remain unable to account for gene tree discordance when analyzing quantitative traits. The approaches required to improve these methods will depend on the question being asked. Some tasks, such as maximum-likelihood estimation of the rate of trait evolution under Brownian motion (σ²) (e.g., refs. 39 and 40) or phylogenetic regression (41), depend on the specification of a matrix that describes the trait variances and covariances expected from the species phylogeny (often denoted C). Other comparative approaches, such as ancestral state reconstruction (42) and inference of lineage-specific rate shifts (43), can require more sophisticated approaches that calculate state probabilities on different parts of a phylogeny; one such approach is to use Felsenstein's pruning algorithm applied to a species tree with specified branch lengths (44). Mendes et al. (18) showed that failing to account for discordance can bias estimates of σ² upward and can lead to falsely inflated numbers of trait-mean transitions. In general, the development of a comparative framework incorporating gene tree discordance would lead to more accurate evolutionary inferences in a wide variety of systems with ILS and/or introgression, across a wide variety of approaches for making inferences about quantitative traits.

Here, we demonstrate the utility of an updated phylogenomic comparative framework, using two distinct approaches to incorporate the summed history of concordant and discordant gene trees into evolutionary inference. In the first approach, we show how to construct an updated phylogenetic variance/covariance matrix (which we denote C*) to include the covariances introduced by discordant gene trees. We provide an R package, seastaR, that can construct this updated matrix for any number of species, either by summing the internal branches of an input set of gene trees or by calculating expected gene tree internal branches from an input species tree using the multispecies coalescent model. We show how estimates of the evolutionary rate are made more accurate by using C*, and suggest how this updated matrix can be passed to other available software packages to make multiple evolutionary inferences more robust to discordance. In the second approach, we develop a method for applying the pruning algorithm over a set of gene trees to return the likelihood of an observed trait across species. Using a pilot implementation of this approach for a rooted three-species tree, we show how it can be used to accurately estimate the rate of quantitative trait evolution. Although currently limited to a smaller number of species, this latter approach has the potential to perform more complicated comparative inferences in the presence of discordance. Finally, we apply our approaches to empirical morphological data from wild tomatoes (45), finding a greater discrepancy between species tree and gene tree rate estimates in a clade with a higher rate of gene tree discordance. Overall, our approaches pave the way toward more accurate evolutionary inferences in the presence of gene tree discordance.

Methods

Building a Phylogenetic Variance/Covariance Matrix From Data with Discordance.

As previously discussed, one of the most common ways that phylogeny is incorporated into comparative analyses is by constructing a phylogenetic variance/covariance matrix, C. This square matrix has rows and columns corresponding to the number of taxa in the phylogeny, with the diagonal elements containing the expected trait variances for each species and the off-diagonal elements containing the expected trait covariances between each species pair. Considering three species with the relationship [(A, B), C] (Fig. 2A), the standard covariance matrix has the following form:

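$$C = \begin{bmatrix} \mathrm{Var}_A & \mathrm{Cov}_{AB} & 0 \\ \mathrm{Cov}_{BA} & \mathrm{Var}_B & 0 \\ 0 & 0 & \mathrm{Var}_C \end{bmatrix}. \tag{1}$$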

Fig. 2.


Inferring the gene tree variance/covariance matrix, C*. (A) Gene trees are generated from a species tree under the multispecies coalescent process (note that introgression can readily be incorporated, but is not shown here for clarity). Each gene tree contributes its internal branch length (for covariance terms) and its total height (for variance terms) to C*. The contribution of each tree to C* is weighted by its expected or observed frequency, depending on the approach taken. Frequencies are denoted as f(XY), where X and Y are the taxa sister in the gene tree of interest. (B) A comparison of C and C* for a five-taxon species tree with branch lengths labeled in coalescent units (not precisely to scale). Each internal branch has a length of 0.3, corresponding to a level of discordance of approximately 50%. This level of discordance means that each clade descended from these internal branches (5/4/3/2, 5/4, and 3/2) will be present in ~50% of gene trees. The standard phylogenetic covariance matrix, C, contains no covariance between species 1 and the other taxa, because they do not share an internal branch in the species tree. In contrast, species 1 covaries with all other species in the tree using C*, because multiple discordant gene trees have species 2 to 5 sharing an internal branch with species 1.

Trait covariances arise from shared internal branches in the phylogeny. As only species A and B share an internal branch in the species tree, the other two species pairs have no expected covariance.

In contrast, if we consider the gene trees that are generated by the species tree in Fig. 2A, the two discordant gene trees contain internal branches shared by pairs B–C and A–C. Discordance due to ILS generates all three possible topologies for this species tree, so all off-diagonal entries in the covariance matrix should have nonzero values (18). We are interested in estimating this updated covariance matrix, which we denote C*:

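$$C^{*} = \begin{bmatrix} \mathrm{Var}_A & \mathrm{Cov}_{AB} & \mathrm{Cov}_{AC} \\ \mathrm{Cov}_{BA} & \mathrm{Var}_B & \mathrm{Cov}_{BC} \\ \mathrm{Cov}_{CA} & \mathrm{Cov}_{CB} & \mathrm{Var}_C \end{bmatrix}. \tag{2}$$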

To construct C*, we provide the R package seastaR. seastaR uses two approaches for estimating C*, both following the same principle: Each gene tree topology contributes an internal branch which, after being weighted by that tree’s expected frequency, fills an off-diagonal entry in the covariance matrix (Fig. 2A). Each gene tree also contributes its total height, weighted by frequency, to the expected trait variances for each species. Both approaches assume that each individual gene tree contributes equally to trait variation among species (i.e., the effect size of mutations that affect trait variation does not differ on average among loci). We also assume that loci contributing to trait variation follow the same distribution of tree topologies as the genome at large, so the specified loci do not have to be explicitly related to the trait in question.
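The weighting principle described above can be illustrated with a minimal Python sketch. This is not seastaR's code or API: the function name `weighted_vcv`, the input format, and all branch lengths and frequencies are hypothetical, chosen only to show how frequency-weighted internal branches and tree heights fill the entries of C* for three species.

```python
def weighted_vcv(taxa, gene_trees):
    """Frequency-weighted average of per-gene-tree covariance matrices.

    gene_trees: list of (frequency, height, shared) tuples. `height` is the
    root-to-tip distance of an ultrametric gene tree; `shared` maps a pair of
    taxa to the length of the path they share between the root and their
    most recent common ancestor in that tree. (Hypothetical input format.)
    """
    index = {name: i for i, name in enumerate(taxa)}
    n = len(taxa)
    total = sum(freq for freq, _, _ in gene_trees)
    c_star = [[0.0] * n for _ in range(n)]
    for freq, height, shared in gene_trees:
        weight = freq / total  # normalise in case raw counts are supplied
        for i in range(n):
            c_star[i][i] += weight * height      # variance terms
        for (a, b), length in shared.items():
            i, j = index[a], index[b]
            c_star[i][j] += weight * length      # covariance terms
            c_star[j][i] += weight * length
    return c_star


# Illustrative values only: one concordant and two discordant topologies.
taxa = ["A", "B", "C"]
gene_trees = [
    (0.50, 2.0, {("A", "B"): 0.8}),  # ((A,B),C), concordant
    (0.25, 2.2, {("B", "C"): 0.4}),  # ((B,C),A), discordant
    (0.25, 2.2, {("A", "C"): 0.4}),  # ((A,C),B), discordant
]
c_star = weighted_vcv(taxa, gene_trees)
for row in c_star:
    print([round(v, 3) for v in row])
```

With these inputs every off-diagonal entry is nonzero, whereas a matrix built from the concordant tree alone would have zeros for the A–C and B–C pairs.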

The first approach for estimating C*, trees_to_vcv, constructs this matrix from a list of provided gene trees (with branch lengths) and their observed frequencies. The method works by obtaining all the internal branch lengths present in each gene tree, as well as the height of each gene tree, and averaging them to get C*. A major advantage of this approach is that it can easily account for both ILS and introgression as sources of gene tree discordance, as the effects of both are captured in the distribution of observed gene tree topologies and branch lengths. On the other hand, individual gene trees may be inferred with error, making their branch lengths and frequencies less reliable. If accurately estimated gene trees are unavailable, our second approach, get_full_matrix, constructs C* solely from an input species tree in coalescent units. This method breaks the input phylogeny down into each possible triplet, and for each triplet uses expectations from the multispecies coalescent model to calculate the expected internal branches and frequencies for each possible gene tree (see ref. 18). For an exemplar five-taxon tree specified in coalescent units, we compared the standard C matrix to a C* matrix computed using get_full_matrix (Fig. 2B). The test tree has three internal branches, each of length 0.3 coalescent units. Given these branch lengths, we expect approximately 50% of trees to be discordant for each of these three branches, meaning that only about 50% of gene trees will have (for instance) the clade containing species 5 and 4 sister to the clade containing species 3 and 2. As expected, C* contains covariance entries for species pairs that do not share an internal branch in the species tree, but that share internal branches in at least one discordant gene tree (Fig. 2B). In addition, the sister lineages in the species tree have smaller covariances in C* than in C, because they do not share an internal branch in many discordant trees.
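The approximate 50% figure can be checked against the standard multispecies coalescent result for a rooted triplet with an internal branch of length T coalescent units. The worked calculation below assumes ILS only and treats each internal branch as the internal branch of a triplet, which is a simplification for the five-taxon example.

```latex
\begin{align*}
P(\text{concordant}) &= 1 - \tfrac{2}{3}e^{-T}, \qquad
P(\text{each discordant topology}) = \tfrac{1}{3}e^{-T}, \\
T = 0.3: \quad P(\text{discordant}) &= \tfrac{2}{3}e^{-0.3}
  \approx \tfrac{2}{3}(0.741) \approx 0.494 .
\end{align*}
```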

Our package, seastaR, contains several other utilities, including a parser for an input set of estimated gene trees, a simulator that can simulate trait evolution using C*, and a function to obtain the maximum-likelihood estimate of σ² using C* (Results). Also note that, although not currently implemented, seastaR could be extended to construct C* from an input species network specified in coalescent units, using expectations from the multispecies network coalescent model (22).

Calculating Trait Likelihoods over a Set of Gene Trees Using Felsenstein’s Pruning Algorithm.

Updating the phylogenetic variance/covariance matrix provides a straightforward solution to account for gene tree discordance that works for several important inference tasks in comparative methods. However, many questions require more sophisticated models that do not have straightforward solutions making use of this matrix. For these questions, the field would benefit from a general approach to calculating likelihoods given a set of gene trees and a model of trait evolution. Our solution makes use of Felsenstein’s pruning algorithm (44), a dynamic programming algorithm that calculates probabilities for a set of character states across all nodes in a phylogeny. A tree-wide likelihood can be calculated from the probabilities at the root, which can be used in conjunction with numerical optimization methods to estimate model parameters.

We developed an approach to apply the pruning algorithm to a specified set of gene trees, rather than to a single tree. This approach is implemented in C++ and draws heavily on the infrastructure of CAFE (46, 47), a program that uses the pruning algorithm to calculate likelihoods for a birth–death model of gene family evolution. We make several modifications based on the methods presented in ref. 48 and implemented in CAGEE (https://github.com/hahnlab/CAGEE) that allow CAFE’s implementation of the pruning algorithm to be applied to continuous traits rather than integer counts of gene families (see also ref. 49). First, the pruning algorithm requires a vector of possible discrete character states over which probabilities can be calculated. To obtain this vector from an observed continuous trait, we take the range (−2·max(X), 2·max(X)), where X is the vector of observed characters for each species. The vector of character states is then filled with 100 equidistant steps from the lower bound to the higher bound. Second, we need to assign probabilities to all the character states at the tips of the phylogeny, so that the pruning algorithm has a place to start. This is straightforward for integer count data, as the observed value can simply be assigned a probability of 1 at the tip. However, for continuous traits it will often be the case that none of the values in our discretized trait vector exactly match the observed values at the tips. Therefore, we implement an approach that distributes the probability at the tip over the two states in the discretized vector closest to each of the observed values, proportional to how distant they are from the observed value (Equation 18 in the appendix of ref. 48). Third, to calculate the transition probability between each pair of discretized trait values over a branch in the phylogeny, we use the Brownian motion model. The probability density for Brownian motion is

$$p(x, x_0, t) = \frac{1}{\sqrt{2\pi t}\,\sigma} \exp\!\left(-\frac{(x - x_0)^2}{2\sigma^2 t}\right), \tag{3}$$

where x₀ is the initial trait value, x is the trait value after time t, and σ² is the evolutionary rate per unit time. With these methods in place, we can apply the standard pruning algorithm to an individual tree with observed character states and a specified σ² value.
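The three ingredients just described (a discretized state vector, tip probabilities split between the two nearest states, and the Brownian motion density) can be sketched in a few lines of Python. This is an illustrative sketch rather than the C++ implementation: the function names are hypothetical, the tip split is assumed to be a linear interpolation, and the grid assumes that the largest observed trait value is positive.

```python
import math


def state_grid(observed, n_states=100):
    """Equidistant trait states spanning (-2*max(X), 2*max(X)).

    Assumes max(observed) > 0, otherwise the range collapses.
    """
    bound = 2.0 * max(observed)
    step = 2.0 * bound / (n_states - 1)
    return [-bound + i * step for i in range(n_states)]


def tip_probabilities(value, grid):
    """Split probability 1 between the two grid states bracketing `value`.

    The nearer state receives the larger share (linear interpolation; this
    weighting is an assumption about the scheme described in the text).
    """
    step = grid[1] - grid[0]
    pos = (value - grid[0]) / step
    lo = min(max(math.floor(pos), 0), len(grid) - 2)
    frac = min(max(pos - lo, 0.0), 1.0)
    probs = [0.0] * len(grid)
    probs[lo] = 1.0 - frac
    probs[lo + 1] = frac
    return probs


def bm_density(x, x0, t, sigma2):
    """Brownian motion transition density for moving from x0 to x in time t."""
    var = sigma2 * t
    return math.exp(-((x - x0) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical trait values for three species.
observed = [1.0, 1.2, 0.8]
grid = state_grid(observed)
probs = tip_probabilities(1.0, grid)
print(len(grid), grid[0], grid[-1])
print(sum(probs), sum(p * g for p, g in zip(probs, grid)))
```

Under the linear split, the probability-weighted mean of the two occupied states equals the observed value, so the discretization does not shift the tip data.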

To estimate a single likelihood over a set of gene trees, we initially apply the standard pruning algorithm to each gene tree individually. Like the covariance matrix approach, we assume that individual loci contribute equally to trait variation among species, and that trait loci follow the same distribution of tree topologies as the genome at large. These gene trees with branch lengths are given to the method directly, and must be ultrametric. Any set of trees can be specified, but the manner in which they are specified will depend on the size of the species tree (i.e., number of tips). For a large species tree, individual gene trees can be inferred or predicted by theory, similarly to the two approaches used by seastaR. Because it may not be possible to sample every possible topology, we recommend sampling a reasonable number of individual gene trees (Discussion). For a small species tree, the most efficient approach will be to specify one tree for each possible topology, along with its frequency. Again, the branch lengths and frequencies of each tree topology can be averaged from a set of inferred trees or predicted from theory. The total likelihood is then calculated as

$$L = \sum_{\tau} f_{\tau_i}\left[-\log\left(\max\left(p_{\tau_i}\right)\right)\right], \tag{4}$$

where τ is the set of gene trees, $f_{\tau_i}$ is the frequency of gene tree i, and $p_{\tau_i}$ is the vector of character state probabilities at the root for gene tree i. In words: we obtain a partial negative log-likelihood for each individual gene tree, weight each partial likelihood by that gene tree’s observed frequency, and sum the weighted partial likelihoods to produce the total likelihood (Fig. 3).
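The calculation described in words above can be sketched for a rooted three-species case as follows. This is a simplified Python illustration, not the C++ implementation: the nested-tuple tree format, the helper names, the trait values, and the gene tree branch lengths are all hypothetical, and multiplying the Brownian motion density by the grid spacing to obtain transition probabilities is an assumption.

```python
import math


def make_grid(values, n_states=100):
    """Equidistant states on (-2*max(X), 2*max(X)); assumes max(X) > 0."""
    bound = 2.0 * max(values)
    step = 2.0 * bound / (n_states - 1)
    return [-bound + i * step for i in range(n_states)], step


def tip_vector(value, grid, step):
    """Split probability 1 between the two states bracketing the observation."""
    pos = (value - grid[0]) / step
    lo = min(max(math.floor(pos), 0), len(grid) - 2)
    frac = min(max(pos - lo, 0.0), 1.0)
    vec = [0.0] * len(grid)
    vec[lo], vec[lo + 1] = 1.0 - frac, frac
    return vec


def transition(grid, step, t, sigma2):
    """Matrix of Brownian motion density times grid spacing (assumption)."""
    var = sigma2 * t
    norm = step / math.sqrt(2.0 * math.pi * var)
    return [[norm * math.exp(-((x - x0) ** 2) / (2.0 * var)) for x in grid]
            for x0 in grid]


def prune(node, traits, grid, step, sigma2):
    """Pruning step: conditional likelihood vector at `node`.

    A node is (label, branch_length); label is a species name for a tip or a
    tuple of child nodes for an internal node.
    """
    label, _ = node
    if isinstance(label, str):
        return tip_vector(traits[label], grid, step)
    vec = [1.0] * len(grid)
    for child in label:
        child_vec = prune(child, traits, grid, step, sigma2)
        probs = transition(grid, step, child[1], sigma2)
        for s in range(len(grid)):
            vec[s] *= sum(p * c for p, c in zip(probs[s], child_vec))
    return vec


def total_neg_log_lik(gene_trees, traits, sigma2):
    """Frequency-weighted sum of -log(max(root probabilities)) over gene trees."""
    grid, step = make_grid(list(traits.values()))
    total = 0.0
    for freq, tree in gene_trees:
        root = prune(tree, traits, grid, step, sigma2)
        total += freq * -math.log(max(root))
    return total


def triplet(sisters, outgroup, t_inner, height):
    """Ultrametric rooted triplet ((s1, s2), outgroup) as nested tuples."""
    s1, s2 = sisters
    cherry = (((s1, height - t_inner), (s2, height - t_inner)), t_inner)
    return ((cherry, (outgroup, height)), 0.0)


# Hypothetical trait values, topologies, branch lengths, and frequencies.
traits = {"A": 1.0, "B": 1.1, "C": 0.7}
gene_trees = [
    (0.50, triplet(("A", "B"), "C", 0.4, 1.0)),
    (0.25, triplet(("B", "C"), "A", 0.2, 1.1)),
    (0.25, triplet(("A", "C"), "B", 0.2, 1.1)),
]
for sigma2 in (0.01, 0.05, 0.5):
    print(sigma2, round(total_neg_log_lik(gene_trees, traits, sigma2), 3))
```

Because the quantity returned is a weighted negative log-likelihood, smaller values indicate a better fit. Passing this function to a numerical optimizer over σ² would give a rate estimate for the whole set of gene trees.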

Fig. 3.


Applying the pruning algorithm to sets of gene trees. In our proposed approach, the pruning algorithm (shown as upward arrows) is applied to each individual gene tree to obtain character state probabilities at each node (curved lines) up to the root for a quantitative trait. These root probabilities are then used to obtain a partial likelihood from each gene tree, which are then summed together weighted by the gene tree frequency to obtain the final likelihood.

A major advantage of the pruning algorithm method is that maximum-likelihood inference can be used to estimate parameters for a wide variety of models. In addition, like the trees_to_vcv method of seastaR, this approach can easily handle introgression events if the signals of introgression are contained in the specified gene trees, or if expected gene trees under the multispecies network coalescent could be specified by the user. Currently, our implementation uses the Nelder–Mead algorithm (50) to find the optimal Brownian motion evolutionary rate parameter, σ². In the future, we would like the software to also perform more sophisticated inferences, such as ancestral state reconstruction or lineage-specific rate shifts. Our approach could also be extended to any evolutionary model, not just Brownian motion.

Simulating Complex Traits with Discordance.

To demonstrate the utility of our phylogenomic comparative approaches, we used simulations to evaluate their performance on a simple inference task: estimating the evolutionary rate parameter, σ². We simulated traits from a phylogenetic history with increasing rates of gene tree discordance by making random draws from a multivariate normal distribution (where C* specifies the covariance structure). This simulation approach assumes an infinitesimal contribution to the trait from all genomic loci, an approximation that holds reasonably well for many complex quantitative traits. For each simulated dataset, we applied both standard inference of σ² using the species tree and our updated inferences that account for gene tree discordance. See SI Appendix, Methods for the exact conditions and parameters used in our simulations.
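Drawing traits from a multivariate normal distribution with covariance σ²C* can be sketched with a Cholesky factorization. The Python example below is illustrative only and is not the simulator shipped with seastaR: the function names, the C* entries, the root value of zero, and the rate are hypothetical.

```python
import math
import random


def cholesky(m):
    """Lower-triangular factor L with L * L^T = m (m must be positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def simulate_traits(c_star, sigma2, root_value=0.0, rng=random):
    """One draw of tip trait values from MVN(root_value, sigma2 * C*)."""
    low = cholesky([[sigma2 * v for v in row] for row in c_star])
    z = [rng.gauss(0.0, 1.0) for _ in c_star]
    return [root_value + sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(len(c_star))]


# Hypothetical C* for three species (A, B, C) and a hypothetical rate.
c_star = [[2.1, 0.4, 0.1],
          [0.4, 2.1, 0.1],
          [0.1, 0.1, 2.1]]
rng = random.Random(42)
for _ in range(3):
    print([round(v, 3) for v in simulate_traits(c_star, 0.5, rng=rng)])
```

Each draw is one simulated dataset of tip values; repeating the draw many times and re-estimating σ² from each dataset is the pattern used in the simulations described here.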

We simulated our traits under the model parameterization of ref. 18. Levels of discordance in this model are altered by changing the effective population size, N, allowing us to increase the level of discordance by increasing N. The equations for trait variances and covariances are also scaled by N, such that branch lengths in units of absolute time are divided by 2N. The evolutionary rate is a compound parameter, $2N\mu\sigma_M^2$, where μ is the mutation rate and $\sigma_M^2$ is the variance in mutational effect sizes. A consequence of this formulation of the evolutionary rate is that the true rate used to simulate the data increases as we increase the rate of discordance by increasing N. This model is akin to the one shown in Fig. 1B, in which mutations occur on gene tree branches with normally distributed effect sizes. Given enough mutations and enough time, these cumulative effects resemble Brownian motion of trait means along each lineage (Fig. 1C; 18).

Results

Phylogenomic Comparative Approaches Yield More Accurate Evolutionary Rate Estimates in the Presence of Discordance.

We applied both of our phylogenomic comparative approaches to data simulated with discordance in order to evaluate their accuracy in estimating the evolutionary rate parameter, σ². For the approach that uses the updated variance/covariance matrix, C*, we use the maximum-likelihood estimator of σ²:

$$\hat{\sigma}^2 = \frac{(X - E[X])^{T}\, C^{-1}\, (X - E[X])}{n}, \tag{5}$$

(40), where X is the vector of observed trait values at the tips, C is the phylogenetic variance-covariance matrix, and n is the number of tips. E[X] is the vector containing the expected trait value at the root, calculated as follows:

$$E[X] = \mathbf{1}\left(\mathbf{1}^{T} C^{-1} \mathbf{1}\right)^{-1}\left(\mathbf{1}^{T} C^{-1} X\right), \tag{6}$$

where 1 is a column vector of ones of size n × 1. To account for gene tree discordance with this estimator, we simply use C* in place of C in Eqs. 5 and 6. We have implemented this method in seastaR to allow users to estimate σ². For this approach, we simulated 1,000 trait datasets for each condition of increasing gene tree discordance, estimating σ² using both C and C* for each dataset.
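Eqs. 5 and 6 can be evaluated directly for a small matrix. The following Python sketch is illustrative and is not seastaR's implementation: the function names, the trait values, and both matrices are hypothetical, and the matrix inverse uses plain Gauss–Jordan elimination to avoid external dependencies.

```python
def invert(m):
    """Inverse of a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def sigma2_mle(x, c):
    """Return (sigma^2 estimate, root value) following Eqs. 5 and 6."""
    n = len(x)
    c_inv = invert(c)
    ones_c_inv = [sum(c_inv[i][j] for i in range(n)) for j in range(n)]  # 1^T C^-1
    root = sum(w * xi for w, xi in zip(ones_c_inv, x)) / sum(ones_c_inv)  # Eq. 6
    d = [xi - root for xi in x]
    quad = sum(d[i] * c_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return quad / n, root  # Eq. 5


# Hypothetical tip values and matrices for species A, B, C.
x = [1.0, 1.1, 0.7]
c_species = [[2.0, 0.8, 0.0],
             [0.8, 2.0, 0.0],
             [0.0, 0.0, 2.0]]
c_star = [[2.1, 0.4, 0.1],
          [0.4, 2.1, 0.1],
          [0.1, 0.1, 2.1]]
print("species tree C :", sigma2_mle(x, c_species))
print("gene trees  C* :", sigma2_mle(x, c_star))
```

The only change between the two calls is the matrix supplied, which mirrors the substitution of C* for C described above.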

For the approach using the pruning algorithm, we implemented the Nelder–Mead optimization algorithm. Given a set of input gene trees and tree frequencies, our optimization approach proposes a new value of σ² in each iteration, returning a single likelihood value over the set of gene trees each time; the optimal value of σ² is the one that maximizes the total likelihood (equivalently, minimizes the summed negative log-likelihood of Eq. 4). Owing to longer computation times, we simulated 100 trait datasets for each set of parameters with this method, using either a single tree specified (the species tree) or multiple trees specified (the gene trees).
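The propose-and-evaluate loop can be illustrated with a one-parameter Nelder–Mead search. The sketch below is a toy stand-in, not the optimizer in the C++ implementation: the objective is a simple Brownian motion negative log-likelihood for independent trait changes over known times (chosen because its optimum is known analytically), the data are hypothetical, and the search is over log σ² to keep the rate positive.

```python
import math


def nelder_mead_1d(f, x0, step=1.0, tol=1e-8, max_iter=500):
    """Minimise f over one real parameter with a two-point Nelder-Mead simplex."""
    pts = [(f(x0), x0), (f(x0 + step), x0 + step)]
    for _ in range(max_iter):
        pts.sort()
        (f_best, best), (f_worst, worst) = pts
        if abs(worst - best) < tol:
            break
        refl = best + (best - worst)            # reflect worst through best
        f_refl = f(refl)
        if f_refl < f_best:
            expd = best + 2.0 * (best - worst)  # try expanding further
            f_expd = f(expd)
            pts[1] = (f_expd, expd) if f_expd < f_refl else (f_refl, refl)
        else:
            target = refl if f_refl < f_worst else worst
            con = best + 0.5 * (target - best)  # outside or inside contraction
            f_con = f(con)
            if f_con < min(f_refl, f_worst):
                pts[1] = (f_con, con)
            else:
                mid = best + 0.5 * (worst - best)  # shrink toward best
                pts[1] = (f(mid), mid)
    return min(pts)[1]


# Hypothetical trait changes d_i accumulated over times t_i under BM.
changes = [0.3, -0.5, 0.2, 0.4]
times = [1.0, 2.0, 0.5, 1.5]


def neg_log_lik(log_sigma2):
    """Negative log-likelihood of the changes, given log(sigma^2)."""
    sigma2 = math.exp(log_sigma2)
    return sum(0.5 * math.log(2.0 * math.pi * sigma2 * t) + d * d / (2.0 * sigma2 * t)
               for d, t in zip(changes, times))


estimate = math.exp(nelder_mead_1d(neg_log_lik, 0.0))
analytic = sum(d * d / t for d, t in zip(changes, times)) / len(changes)
print(estimate, analytic)
```

In the approach described here, the toy objective would be replaced by the likelihood summed over the set of gene trees, which has no closed-form optimum and therefore requires a numerical search of this kind.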

As expected, we found that increasing the level of discordance results in an increasingly upward bias in estimates of the evolutionary rate from the species tree (Fig. 4, green lines). As there are no internal branches in the species tree that can explain the increased trait covariances between nonsister taxa, such methods must propose a higher evolutionary rate to explain the data. In contrast, we found that both the covariance matrix approach (i.e., C*; Fig. 4A) and pruning algorithm approach (Fig. 4B) yielded more accurate evolutionary rate estimates, ones that closely tracked the true population-scaled evolutionary rate as the level of discordance increased. Both phylogenomic comparative approaches can model the increased covariances generated by the increasing frequencies of discordant gene trees. As can be observed, both approaches tend to slightly underestimate the true evolutionary rate, but they are much closer to the true value than standard species tree estimates, especially at higher rates of discordance.

Fig. 4.


Phylogenomic comparative methods produce more accurate evolutionary rate estimates. (A) Rate estimates obtained using a maximum-likelihood estimator applied to the covariance matrix (Eq. 5). (B) Rate estimates obtained using numerical optimization of the likelihood with the pruning algorithm. In both panels, the green line shows inferences from methods using only the species tree, the red line shows the inferences from methods accounting for gene tree discordance, and the blue dashed line shows the true simulated value of the evolutionary rate. The level of gene tree discordance expected from each simulated species tree (SI Appendix, Methods) is shown on the X axis. Note that panels A and B have different Y axis scales.

Phylogenomic Comparative Approaches Are Robust to the Effects of Gene Tree Estimation Error.

In empirical datasets, it is reasonable to expect gene trees to be estimated with some degree of error, especially in the limits of short sequence length (such as ultraconserved elements), long periods of evolutionary divergence, or high rates of sequencing error. In general, these sources of technical error should not be biased toward specific lineages, so their effect should be to cause general overestimation of gene tree discordance. This may in turn result in lower evolutionary rate estimates when using our approaches, as they might “overcorrect” the problem. More generally, we were concerned that increasing the rate of discordance might always lead to a lower evolutionary rate estimate, regardless of the true history that generated the data. Such behavior would present a potential problem for the application of our approaches to empirical datasets.

To address these concerns, we simulated traits from gene trees under a single, fixed rate of gene tree discordance (of approximately 15%) using the methods described in the previous section. We then applied our approaches to estimating σ² to this dataset, varying the specified rate of gene tree discordance from 0 (in which case we used the standard species tree inference) to approximately 60%. In contrast to our initial concerns, we found that in both the covariance matrix (Fig. 5A) and pruning algorithm (Fig. 5B) approaches: 1) the effect of mis-specifying the rate of gene tree discordance is relatively small compared to the effect of using the species tree in place of gene trees; 2) increasing the specified rate of gene tree discordance leads to a small increase, rather than a decrease, in the estimated evolutionary rate, although the estimates still closely track the true value. This latter effect may occur because increasing the specified rate of gene tree discordance requires branch lengths to be scaled down in accordance with N, resulting in less proposed time over which evolutionary changes can occur. Overall, these results suggest that gene tree estimation error should not be a major concern for our approaches, as long as the correct set of tree topologies is specified.

Fig. 5.


Phylogenomic comparative methods are robust to gene tree estimation error. In both panels, the solid vertical line denotes the true rate of discordance used to simulate the trait data, and the horizontal blue line denotes the true evolutionary rate. The X axis shows the rate of discordance supplied to each approach when estimating the evolutionary rate from the simulated data. For a rate of discordance equal to 0, we used the standard species tree inference rather than gene tree inference. Specifying too much discordance can also cause overestimation of the evolutionary rate. Note that panels A and B have different Y axis scales.

Rate Estimates for Floral Traits in the Wild Tomato Clade Solanum Are Consistent with Evolution on Discordant Gene Trees.

Our simulations show that when traits evolve on discordant gene trees, standard species tree approaches tend to greatly overestimate the true value, while our gene tree approaches slightly underestimate the true value but are much more accurate. The degree of discrepancy between species tree and gene tree approaches grows larger as the rate of discordance increases. To further test these expectations and to highlight the application of our methods to empirical data, we estimated the evolutionary rates of several floral traits (anther length, corolla diameter, and stigma length) measured in wild tomatoes (Solanum) (45). We obtained the time-scaled phylogeny of this clade from ref. 10 and converted from time in years to coalescent units assuming N = 100,000 and one generation every 2 y (51). We then pruned the phylogenetic tree into high and low ILS triplets, each consisting of three taxa. The high ILS group consisted of the following accessions (IDs from the Tomato Genetics Resource Center): S. galapagense LA0436, S. cheesmaniae LA3124, and S. pimpinellifolium LA1269; the low ILS group consisted of S. pennellii LA3778, S. pennellii LA0716, and S. pimpinellifolium LA1589. Based on the internal branch lengths in coalescent units, the high ILS and low ILS triplets had expected rates of discordance of approximately 47% and 0.9%, respectively. These rates correspond to the rates of discordance seen in empirically estimated gene trees in ref. 10. Based on our simulation results, if discordant gene trees contribute to tomato floral trait variation, we should see a greater discrepancy between species tree methods and our gene tree methods in the high ILS triplet.
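The conversion from years to coalescent units, and the link between an internal branch length and the expected rate of discordance, can be sketched as follows. The population size and generation time are those stated above; the function names and the example branch length in years are hypothetical, and the discordance formula is the standard ILS-only expectation for a rooted triplet.

```python
import math


def years_to_coalescent_units(years, n_e, generation_time):
    """Branch length in units of 2N generations."""
    return years / (generation_time * 2.0 * n_e)


def triplet_discordance(t_internal):
    """Expected fraction of discordant gene trees for a rooted triplet (ILS only)."""
    return (2.0 / 3.0) * math.exp(-t_internal)


def internal_branch_for_discordance(p_discordant):
    """Inverse of triplet_discordance: internal branch in coalescent units."""
    return -math.log(1.5 * p_discordant)


N_E = 100_000          # effective population size assumed in the text
GENERATION_TIME = 2.0  # years per generation assumed in the text

# Hypothetical internal branch of 140,000 years, for illustration only.
t = years_to_coalescent_units(140_000, N_E, GENERATION_TIME)
print(t, triplet_discordance(t))

# Internal branch lengths implied by the two discordance levels quoted above.
for p in (0.47, 0.009):
    print(p, internal_branch_for_discordance(p))
```

The inverse function shows how short the internal branch must be, in coalescent units, for a triplet to reach a given expected rate of discordance.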

In both of our approaches, we used the multispecies coalescent model to calculate the expected gene tree frequencies and branch lengths in each triplet. For the covariance matrix method, we used these expectations to construct the covariance matrix C* for each triplet using the get_full_matrix() function. For the pruning algorithm method, we specified a representative gene tree of each of the possible topologies with expected branch lengths, and weighted each tree by its expected frequency. For both methods, we estimated the evolutionary rate with the standard approach of specifying a single species tree and with our gene tree approaches, using the mean trait value within each accession when multiple individuals were measured.
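As a rough illustration of how a matrix such as C* can be assembled from gene tree expectations, the sketch below averages the covariance matrices implied by individual three-taxon gene trees, weighting each by its frequency. It is a minimal Python sketch, not the seastaR implementation. The helper names, frequencies, and branch lengths are hypothetical, and the frequency-weighted average is our reading of the construction described in the text.

```python
def triplet_vcv(taxa, sister_pair, internal_branch, height):
    """Covariance matrix implied by one ultrametric three-taxon gene tree.

    Diagonal entries are the root-to-tip height; the two sister taxa share
    the internal branch; every other pair shares no history below the root.
    """
    n = len(taxa)
    vcv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        vcv[i][i] = height
    i, j = taxa.index(sister_pair[0]), taxa.index(sister_pair[1])
    vcv[i][j] = vcv[j][i] = internal_branch
    return vcv

def weighted_vcv(matrices, weights):
    """Frequency-weighted average of per-gene-tree covariance matrices."""
    total = sum(weights)
    n = len(matrices[0])
    out = [[0.0] * n for _ in range(n)]
    for m, w in zip(matrices, weights):
        for r in range(n):
            for c in range(n):
                out[r][c] += (w / total) * m[r][c]
    return out

taxa = ["A", "B", "C"]
# (sister pair, internal branch, tree height, frequency); hypothetical values
# in coalescent units, chosen for illustration only.
gene_trees = [
    (("A", "B"), 1.2, 2.5, 0.60),  # concordant topology
    (("B", "C"), 0.4, 2.7, 0.20),  # discordant topology 1
    (("A", "C"), 0.4, 2.7, 0.20),  # discordant topology 2
]
matrices = [triplet_vcv(taxa, pair, internal, height)
            for pair, internal, height, _ in gene_trees]
weights = [freq for _, _, _, freq in gene_trees]
c_star = weighted_vcv(matrices, weights)
for row in c_star:
    print([round(v, 3) for v in row])
```

A matrix built from the species tree alone would have zero covariance for the A–C and B–C pairs. In this sketch those entries are non-zero because the discordant topologies contribute shared internal branches.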

In line with our expectations, rate estimates obtained from standard species tree approaches are much higher than those obtained from both of our gene tree methods in the high ILS triplet, for all three traits (Fig. 6). The discrepancy is much smaller in the low ILS triplet (Fig. 6). The bias due to discordance was very large for the estimates obtained from the covariance matrix method in seastaR (Fig. 6A), where the species tree rate estimates were several orders of magnitude higher than the gene tree estimates in the high ILS triplet. This is consistent with our simulation finding that the estimated evolutionary rate is more biased under discordance when using covariance methods (Fig. 4A). Even when accounting for gene tree discordance, the rate estimates obtained from the covariance matrix method were substantially higher than those obtained from the pruning algorithm method (compare Fig. 6 A and B). This discrepancy can be explained by flat or undefined likelihood surfaces for the proposed values of σ² (SI Appendix, Fig. S1): the pruning algorithm method, which employs a likelihood search, reaches a likelihood plateau and does not propose further improvements, whereas the covariance matrix method uses the analytical likelihood estimator to obtain the maximum value, regardless of the shape of the likelihood surface. This problem may arise when a small number of taxa are studied, as less information is available to discern the rate of trait evolution. To help users evaluate this problem in their datasets, we have implemented a function in seastaR for calculating trait likelihoods over a range of proposed σ² values. In this case, we believe the pruning algorithm estimates represent more biologically realistic rates of evolution.
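To make the contrast between an analytical estimator and a likelihood profile concrete, the sketch below uses the standard Brownian motion maximum-likelihood expressions: the root state is â = (1ᵀC⁻¹1)⁻¹ 1ᵀC⁻¹x and the rate is σ̂² = (x − â1)ᵀC⁻¹(x − â1)/n. It is a plain-Python illustration and not the seastaR function mentioned above. The helper names, the three-taxon covariance matrix, and the trait values are hypothetical.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def log_det(a):
    """Log-determinant of a symmetric positive-definite matrix."""
    n = len(a)
    m = [row[:] for row in a]
    total = 0.0
    for col in range(n):
        total += math.log(m[col][col])
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return total

def root_and_quadratic_form(c, x):
    """ML root state and the quadratic form (x - a)' C^-1 (x - a)."""
    ones = [1.0] * len(x)
    root = sum(solve(c, x)) / sum(solve(c, ones))
    resid = [xi - root for xi in x]
    quad = sum(r * s for r, s in zip(resid, solve(c, resid)))
    return root, quad

def bm_fit(c, x):
    """Analytical ML estimates of the root state and the rate sigma^2."""
    root, quad = root_and_quadratic_form(c, x)
    return root, quad / len(x)

def bm_loglik(c, x, sigma2):
    """Log-likelihood at a proposed sigma^2, with the root at its ML value."""
    n = len(x)
    _, quad = root_and_quadratic_form(c, x)
    return -0.5 * (n * math.log(2 * math.pi * sigma2) + log_det(c) + quad / sigma2)

# Hypothetical three-taxon covariance matrix and tip trait values.
c = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 2.0]]
x = [1.0, 3.0, 2.0]

root_hat, sigma2_hat = bm_fit(c, x)
for s in (0.1, 0.3, sigma2_hat, 1.5, 5.0, 50.0):
    print(f"sigma2={s:.3f}  logLik={bm_loglik(c, x, s):.3f}")
```

Printing the profile alongside the analytical estimate shows how sharply, or how weakly, the data constrain σ². With only three tips the surface declines slowly at large σ², which is the situation in which an optimizer and an analytical estimator can report different values.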

Fig. 6.

Evolutionary rate estimates for three floral traits in Solanum using our approaches (red bars), in comparison to standard species tree methods (blue bars). (A) Rate estimates (σ²) obtained using the analytical maximum-likelihood estimator as implemented in our R package seastaR. In the high ILS triplet, the species tree estimates (blue) go far above the scale of the Y axis, so these bars are labeled with an up arrow and the estimated values for clarity. (B) Rate estimates (σ²) obtained via maximum likelihood optimization using our pruning algorithm implementation for calculating likelihoods on gene trees. Note that the Y axis scales differ between the two panels.

Discussion

Much phylogenetic research has focused on the accurate estimation of species trees in the face of gene tree discordance (18, 23, 52–55). Despite this focus on inferring trees in the face of discordance, standard phylogenetic comparative methods still rely on a single “resolved” tree to describe the shared history of species. Recent work has made it clear that, if only a single tree is used, gene tree discordance can shape trait variation and mislead comparative methods (e.g., refs. 18 and 19). However, few solutions have been proposed to solve these problems, especially for quantitative traits evolving on clades containing discordance. Here, we have developed two approaches, which we refer to as phylogenomic comparative methods, that can incorporate gene tree discordance into comparative inference. One approach uses a more complete phylogenetic variance–covariance matrix that includes the covariance present in discordant gene trees. We have developed an R package, seastaR, for building this matrix using the frequencies and branch lengths of relevant gene trees. The second approach applies the pruning algorithm over a set of gene trees, both concordant and discordant, to estimate likelihoods. Using simulation, we demonstrate that these methods generate more accurate evolutionary rate estimates for traits evolving in the presence of discordance, and are generally robust to the effects of gene tree estimation error. Finally, we demonstrate that empirical floral traits in the wild tomato clade Solanum are consistent with evolution on discordant gene trees, with the clade with a higher rate of gene tree discordance exhibiting a greater discrepancy in rate estimates between traditional approaches and our methods.

Many phylogenetic comparative methods take the variance–covariance matrix, C, as input (e.g., refs. 40, 42, 56, and 57). Because of the wide use of C, we anticipate that a more complete variance–covariance matrix, C*, will be easy to incorporate into many comparative analyses. The seastaR package provides an easy way for users to generate C*, either from a set of specified gene trees or from a specified species tree (assuming a multispecies coalescent process). Here, we have demonstrated how C* can be used to obtain a maximum-likelihood estimate of the rate of quantitative trait evolution under Brownian motion, a method that is also implemented in seastaR. One obvious extension of the use of C* is in phylogenetic generalized linear mixed models, where the covariance matrix is often specified directly in packages such as MCMCglmm (58). However, many popular packages for implementing comparative methods—such as phytools (59), ape (60), and Geiger (61)—do not take a matrix directly, instead turning an input species tree into a matrix. Furthermore, they require a strictly bifurcating tree as input to construct a phylo class object. Integrating the ability to accept C* (or equivalent sets of gene trees) into these methods would enable a much larger array of inference tasks to take discordance into account.

The pruning algorithm is widely used in likelihood-based inference of phylogenetic trees (62) and for some applications in quantitative trait evolution (e.g., refs. 46 and 63–68). Our method using the pruning algorithm across a set of gene trees makes many of the same assumptions as previous implementations, but models each trait as the combined result of a large number of loci; these loci were represented in our calculations by a smaller number of exemplar gene tree topologies, each with the mean set of branch lengths for a given topology. Although our method is not as straightforward to incorporate into other approaches as C* is, the pruning algorithm is a general method for calculating likelihoods and therefore has enormous potential to be applied to a wide variety of inference problems. As trees get larger, the computational cost of the matrix operations in Eqs. 5 and 6 grows steeply with the number of taxa (roughly cubically for standard dense-matrix inversion). In contrast, the number of calculations in the pruning algorithm only grows linearly, and therefore trees with thousands of tips can be analyzed (69). Furthermore, even though several methods for dealing with sparse matrices make it possible to analyze larger numbers of taxa (e.g., ref. 70), C* has more covariance entries and is therefore less sparse than C; this again limits matrix-based approaches in phylogenomic comparative methods.

Both of our approaches can be extended in multiple ways. While we have only considered Brownian motion models here, there are multiple other trait models that could be used. The Ornstein–Uhlenbeck process is a popular model for trait evolution, with estimators available using both matrix (71–74) and pruning algorithm approaches (63, 65, 66, 68). Additional models for continuous traits include “early burst” (75) and Lévy (“jump”) processes (76). Phylogenomic comparative methods should be able to accommodate all of these models. In addition, although we have described the covariances in our models with a particular set of gene trees in mind, both methods can be used with any weighted mixture of trees. This means that users do not have to assume a particular model of species tree evolution (e.g., the multispecies coalescent model) and can even ignore ILS altogether in favor of phylogenetic network models (77). This should also allow our approaches to accommodate unequal contributions to trait variation across individual loci, for example, if some loci are known to be functionally related to the trait of interest and therefore expected to have mutations of larger effect on average.

There are also multiple caveats that come with our proposed approaches, and some important technical limitations to consider. First, errors in gene tree or species tree specification might bias inferences. This is especially true if gene trees are being used as inputs, as we require both accurate and ultrametric trees. Our methods assume that loci controlling trait variation and the genome at large follow the same distribution of trees; however, if trait loci experience stronger than average selection, these loci could have proportionally fewer discordant gene trees than the genomic background (78). Gene trees may also be misspecified due to technical errors in their estimation. We found that error in gene tree frequencies and branch lengths is relatively inconsequential for our approaches, under the conditions considered here. Specifying a set of incorrect gene tree topologies may have more of an effect, but since ILS is expected to generate all possible topologies with respect to a single branch, we do not expect this to be a significant issue. Obtaining ultrametric gene trees remains challenging due to variation in rates of evolution among loci and small amounts of data per locus. Even when species trees are used to generate gene tree frequencies (i.e., get_full_matrix), many coalescent-based methods for inferring species trees do not estimate tip branch lengths (e.g., refs. 54 and 79), further limiting accurate inferences (but see refs. 19 and 77). If there is uncertainty in the species tree topology or branch lengths, a straightforward solution would be to embed the approaches used here within a Bayesian framework (e.g., refs. 80 and 81). It is important to note, however, that gene tree discordance is not equivalent to species tree uncertainty: averaging over each gene tree topology on its own in a Bayesian framework would simply mean averaging over many incorrect trees. 
Instead, a proper Bayesian approach to accommodating discordance would have to sum over a new set of gene trees (or covariances) for each species tree topology proposed, as was done here with a single topology.

A second caveat is that large numbers of taxa make it harder to accurately estimate both the matrix used in seastaR and the gene trees used within the pruning algorithm. If gene trees are predicted from theory, seastaR calculates C* from the species tree by breaking the tree into triplets. While this will return approximately correct covariances for all pairs of species, it necessarily ignores any covariance structures that might only be possible in trees with four or more taxa. The problem for the pruning algorithm approach could be even worse, as separate gene tree topologies must be specified: specifying representative gene trees for all possible topologies becomes prohibitive with more taxa because the number of gene trees grows super-exponentially. Even if gene trees are estimated from the data, with only a few dozen taxa there are more possible gene tree topologies than independent loci in a genome. Two solutions suggest ways around these issues. First, the problems can be somewhat ameliorated by recognizing that it is not the number of taxa that is the issue, but instead the number of lineages within “knots” (cf. ref. 82) on the larger phylogeny that are prone to gene tree discordance. For instance, even in a tree with 100 species, if only three are undergoing ILS, then only three topologies must be considered. Judicious choices as to the number of different topologies that must be considered in any particular analysis could save a lot of computational effort. Second, as mentioned in the Methods, one approach that can be applied to the pruning algorithm method is to sample a limited number of individual gene trees, either directly from the inferred trees or from the multispecies coalescent model applied to the species tree. Even if we have to sample 100 trees, the likelihood calculations on each are relatively fast and can be parallelized. 
Such a sampling scheme will also naturally recapitulate the degree of discordance associated with every branch in the species tree.
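The super-exponential growth in the number of gene tree topologies can be made concrete with the standard count of rooted, bifurcating, labeled trees, (2n − 3)!! for n taxa. The short sketch below evaluates it for a few tree sizes. The function name is illustrative, and the comparison to the number of loci in a genome is left to the reader.

```python
def rooted_topologies(n_taxa):
    """Number of rooted, bifurcating, labeled tree topologies: (2n - 3)!!."""
    if n_taxa < 2:
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count

for n in (3, 4, 5, 10, 20, 30):
    print(n, rooted_topologies(n))
```

Three taxa give 3 topologies, consistent with the triplet example above, and ten taxa already give more than 34 million, which is why restricting attention to the lineages inside a knot, or sampling gene trees, becomes necessary.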

Throughout our analyses, we found that rate estimates using a single species tree differed from those accounting for gene trees, even when the level of discordance was very low or zero (Fig. 4A). This result occurs because the two modes of inference are fundamentally different: even with no discordance, “gene tree” analyses are based on gene tree branch lengths, not species tree branch lengths. Gene tree branch lengths are always longer than species tree branch lengths because each pair of lineages is expected to coalesce 2N generations before their time of speciation (83, 84). These longer gene tree branch lengths imply higher expected trait variances, such that a higher evolutionary rate must be proposed to explain the same data when using the species tree for analysis. This distinction highlights an additional challenge for a potential application of our pruning algorithm approach: ancestral state reconstruction. Because the internal nodes of gene trees (including concordant gene trees) do not exist at the same moment in time as the internal nodes of species trees, reconstructing ancestral states at the time of speciation requires knowledge of the contribution of each gene tree branch to trait evolution at that particular time point. This could be accomplished by the insertion of single-descendant nodes on gene tree lineages that are concurrent with ancestral nodes on the species tree. Inferring lineage-specific rate shifts will likewise require that each gene tree branch, or segment of a gene tree branch, be assigned to specific species tree lineages (cf. ref. 37). In general, these considerations highlight the fact that using gene trees in place of a species tree is a fundamentally different mode of inference, and that standard comparative methods using the species tree may yield incorrect inferences even if there is no discordance.

We consistently found that evolutionary rate estimates for tomato floral traits were much greater using species tree approaches than our gene tree approaches (Fig. 6). For the high ILS triplet, there was even more bias than in the low ILS knot. These results are consistent with a contribution of gene trees, rather than a single species phylogeny, to variation in these traits. Our analysis of gene tree error suggests that this result is not simply an artifact of increasing the specified rate of gene tree discordance, but is the result of biological variation in the floral traits. Furthermore, our findings have implications for the study of evolutionary rate variation among clades. For example, imagine that researchers wished to investigate whether the evolutionary rate of corolla diameter differed between our high ILS and low ILS triplets. Applying standard species tree methods, they would find that the corolla diameter of the high ILS species evolves at a much faster rate than in their low ILS counterparts. However, from our results in Fig. 6, after correcting for the contribution of discordant gene trees, this difference disappears and the trait appears to evolve at approximately the same rate in both clades. This result highlights how variation in the rate of gene tree discordance among clades is a confounding factor when studying the evolution of lineage-specific rate shifts.

An increasingly common finding in phylogenomics is that of rapid and/or highly parallel trait evolution associated with rapid species radiations (85–89). The application of classic comparative approaches in these systems has suggested that many radiations violate constant-rate Brownian motion models, with more complex models being proposed instead (90–92). However, adaptive radiations often have very little time between speciation events, resulting in high rates of gene tree discordance and therefore high potential for hemiplasy (10). Here, we have found that apparently higher rates of trait evolution in rapid radiations may be perfectly consistent with a standard Brownian motion model with a constant evolutionary rate. In this circumstance, higher apparent rates of evolution are simply the result of a stronger contribution of discordant gene trees to covariance among species. Our proposed phylogenomic comparative methods help to address these issues, providing more accurate evolutionary inferences in systems with high rates of discordance.

Supplementary Material

Appendix 01 (PDF)

Acknowledgments

We would like to thank Ben Fulton and Jason Bertram for offering helpful programming and statistical advice in implementing the pruning algorithm approach, as well as David Haak for sharing floral trait data. Two reviewers provided helpful comments on the manuscript. This work was supported by the EEB Postdoctoral Fellowship awarded to M.S.H. by the Department of Ecology and Evolutionary Biology at the University of Toronto, as well as NSF grant DEB-1936187 awarded to M.W.H. Portions of this paper were developed from the Ph.D. thesis of M.S.H.

Author contributions

M.S.H. and M.W.H. designed research; M.S.H. and L.C.B. performed research; M.S.H. and L.C.B. contributed new reagents/analytic tools; M.S.H. and L.C.B. analyzed data; and M.S.H., L.C.B., and M.W.H. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

Data, Materials, and Software Availability

Source code and analysis scripts related to seastaR and the covariance matrix method can be found in https://github.com/larabreithaupt/seastaR (93). Code and scripts related to the pruning algorithm method can be found in https://github.com/mhibbins/genetreepruningalg (94).

References

  • 1.Felsenstein J., Phylogenies and the comparative method. Am. Nat. 125, 1–15 (1985), 10.1086/284325. [DOI] [PubMed] [Google Scholar]
  • 2.Harvey P. H., Pagel M., The Comparative Method in Evolutionary Biology (Oxford University Press, Oxford, 1991). [Google Scholar]
  • 3.Martins E. P., Hansen T. F., “The statistical analysis of interspecific data: A review and evaluation of phylogenetic comparative methods” in Phylogenies and The Comparative Method in Animal Behavior, Martins E. P., Ed. (Oxford University Press, New York, 1996). [Google Scholar]
  • 4.Garamszegi L. Z., Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology (Springer, Berlin, 2014). [Google Scholar]
  • 5.Adams D. C., Collyer M. L., Multivariate phylogenetic comparative methods: Evaluations, comparisons, and recommendations. Syst. Biol. 67, 14–31 (2018), 10.1093/sysbio/syx055. [DOI] [PubMed] [Google Scholar]
  • 6.Revell L. J., Harmon L. J., Phylogenetic Comparative Methods in R (Princeton University Press, 2022). [Google Scholar]
  • 7.Pollard D. A., Iyer V. N., Moses A. M., Eisen M. B., Widespread discordance of gene trees with species tree in Drosophila: Evidence for incomplete lineage sorting. PLoS Genet. 2, e173 (2006), 10.1371/journal.pgen.0020173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.White M. A., Ane C., Dewey C. N., Larget B. R., Payseur B. A., Fine-scale phylogenetic discordance across the house mouse genome. PLoS Genet. 5, e1000729 (2009), 10.1371/journal.pgen.1000729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Fontaine M. C., et al. , Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science 347, 1258524 (2015), 10.1126/science.1258524. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Pease J. B., Haak D. C., Hahn M. W., Moyle L. C., Phylogenomics reveals three sources of adaptive variation during a rapid radiation. PLoS Biol. 14, e1002379 (2016), 10.1371/journal.pbio.1002379. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Novikova P. Y., et al. , Sequencing of the genus Arabidopsis identifies a complex history of nonbifurcating speciation and abundant trans-specific polymorphism. Nat. Genet. 48, 1077–1082 (2016), 10.1038/ng.3617. [DOI] [PubMed] [Google Scholar]
  • 12.Copetti D., et al. , Extensive gene tree discordance and hemiplasy shaped the genomes of North American columnar cacti. Proc. Natl. Acad. Sci. U.S.A. 114, 12003–12008 (2017), 10.1073/pnas.1706367114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Wu M., Kostyun J. L., Hahn M. W., Moyle L. C., Dissecting the basis of novel trait evolution in a radiation with widespread phylogenetic discordance. Mol. Ecol. 27, 3301–3316 (2018), 10.1111/mec.14780. [DOI] [PubMed] [Google Scholar]
  • 14.Edelman N. B., et al. , Genomic architecture and introgression shape a butterfly radiation. Science 366, 594–599 (2019), 10.1126/science.aaw2090. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Vanderpool D., et al. , Primate phylogenomics uncovers multiple rapid radiations and ancient interspecific introgression. PLoS Biol. 18, e3000954 (2020), 10.1371/journal.pbio.3000954. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Hahn M. W., Nakhleh L., Irrational exuberance for resolved species trees. Evolution 70, 7–17 (2016), 10.1111/evo.12832. [DOI] [PubMed] [Google Scholar]
  • 17.Mendes F. K., Hahn M. W., Gene tree discordance causes apparent substitution rate variation. Syst. Biol. 65, 711–721 (2016), 10.1093/sysbio/syw018. [DOI] [PubMed] [Google Scholar]
  • 18.Mendes F. K., Fuentes-Gonzalez J. A., Schraiber J. G., Hahn M. W., A multispecies coalescent model for quantitative traits. eLife 7, e36482 (2018), 10.7554/eLife.36482. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Hibbins M. S., Gibson M. J., Hahn M. W., Determining the probability of hemiplasy in the presence of incomplete lineage sorting and introgression. eLife 9, e63753 (2020), 10.7554/eLife.63753. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Wang Y., Cao Z., Ogilvie H. A., Nakhleh L., Phylogenomic assessment of the role of hybridization and introgression in trait evolution. PLoS Genet. 17, e1009701 (2021), 10.1371/journal.pgen.1009701. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Avise J. C., Robinson T. J., Hemiplasy: A new term in the lexicon of phylogenetics. Syst. Biol. 57, 503–507 (2008), 10.1080/10635150802164587. [DOI] [PubMed] [Google Scholar]
  • 22.Hibbins M. S., Hahn M. W., The effects of introgression across thousands of quantitative traits revealed by gene expression in wild tomatoes. PLoS Genet. 17, e1009892 (2021), 10.1371/journal.pgen.1009892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Degnan J. H., Rosenberg N. A., Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol. 24, 332–340 (2009), 10.1016/j.tree.2009.01.009. [DOI] [PubMed] [Google Scholar]
  • 24.Hudson R. R., Testing the constant-rate neutral allele model with protein sequence data. Evolution 37, 203–217 (1983), 10.1111/j.1558-5646.1983.tb05528.x. [DOI] [PubMed] [Google Scholar]
  • 25.Pamilo P., Nei M., Relationships between gene trees and species trees. Mol. Biol. Evol. 5, 568–583 (1988), 10.1093/oxfordjournals.molbev.a040517. [DOI] [PubMed] [Google Scholar]
  • 26.Guerrero R. F., Hahn M. W., Quantifying the risk of hemiplasy in phylogenetic inference. Proc. Natl. Acad. Sci. U.S.A. 115, 12787–12792 (2018), 10.1073/pnas.1811268115. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Mallet J., Besansky N., Hahn M. W., How reticulated are species? BioEssays 38, 140–149 (2016), 10.1002/bies.201500149. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Taylor S. A., Larson E. L., Insights from genomes into the evolutionary importance and prevalence of hybridization in nature. Nat. Ecol. Evol. 3, 170–177 (2019), 10.1038/s41559-018-0777-y. [DOI] [PubMed] [Google Scholar]
  • 29.Dagilis A. J., et al. , A need for standardized reporting of introgression: Insights from studies across eukaryotes. Evol. Lett. 6, 344–357 (2022), 10.1002/evl3.294. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Reich D., Thangaraj K., Patterson N., Price A. L., Singh L., Reconstructing Indian population history. Nature 461, 489–494 (2009), 10.1038/nature08365. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Green R. E., et al. , A draft sequence of the Neandertal genome. Science 328, 710–722 (2010), 10.1126/science.1188021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Durand E. Y., Patterson N., Reich D., Slatkin M., Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252 (2011), 10.1093/molbev/msr048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Patterson N., et al. , Ancient admixture in human history. Genetics 192, 1065–1093 (2012), 10.1534/genetics.112.145037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.De Maio N., Schlotterer C., Kosiol C., Linking great apes genome evolution across time scales using polymorphism-aware phylogenetic models. Mol. Biol. Evol. 30, 2249–2262 (2013), 10.1093/molbev/mst131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.De Maio N., Wu C. H., O’Reilly K. M., Wilson D., New routes to phylogeography: A Bayesian structured coalescent approximation. PLoS Genet. 11, e1005421 (2015), 10.1371/journal.pgen.1005421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Schrempf D., Minh B. Q., De Maio N., von Haeseler A., Kosiol C., Reversible polymorphism-aware phylogenetic models and their application to tree inference. J. Theoret Biol. 407, 362–370 (2016), 10.1016/j.jtbi.2016.07.042. [DOI] [PubMed] [Google Scholar]
  • 37.Ogilvie H. A., Bouckaert R. R., Drummond A. J., StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates. Mol. Biol. Evol. 34, 2101–2114 (2017), 10.1093/molbev/msx126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Schrempf D., Minh B. Q., von Haeseler A., Kosiol C., Polymorphism-aware species trees with advanced mutation models, bootstrap, and rate heterogeneity. Mol. Biol. Evol. 36, 1294–1301 (2019), 10.1093/molbev/msz043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Garland T., Ives A. R., Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods. Am. Nat. 155, 346–364 (2000), 10.1086/303327. [DOI] [PubMed] [Google Scholar]
  • 40.O’Meara B. C., Ane C., Sanderson M. J., Wainwright P. C., Testing for different rates of continuous trait evolution using likelihood. Evolution 60, 922–933 (2006), https://www.ncbi.nlm.nih.gov/pubmed/16817533. [PubMed] [Google Scholar]
  • 41.Grafen A., The phylogenetic regression. Philos. Trans. R. Soc. Lond. B Biol. Sci. 326, 119–157 (1989), 10.1098/rstb.1989.0106. [DOI] [PubMed] [Google Scholar]
  • 42.Pagel M., Inferring the historical patterns of biological evolution. Nature 401, 877–884 (1999), 10.1038/44766. [DOI] [PubMed] [Google Scholar]
  • 43.Alfaro M. E., et al. , Nine exceptional radiations plus high turnover explain species diversity in jawed vertebrates. Proc. Natl. Acad. Sci. U.S.A. 106, 13410–13414 (2009), 10.1073/pnas.0811087106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Felsenstein J., Maximum-likelihood estimation of evolutionary trees from continuous characters. Am. J. Hum. Genet. 25, 471–492 (1973), https://www.ncbi.nlm.nih.gov/pubmed/4741844. [PMC free article] [PubMed] [Google Scholar]
  • 45.Haak D. C., Ballenger B. A., Moyle L. C., No evidence for phylogenetic constraint on natural defense evolution among wild tomatoes. Ecology 95, 1633–1641 (2014), 10.1890/13-1145.1. [DOI] [PubMed] [Google Scholar]
  • 46.Hahn M. W., De Bie T., Stajich J. E., Nguyen C., Cristianini N., Estimating the tempo and mode of gene family evolution from comparative genomic data. Genome Res. 15, 1153–1160 (2005), 10.1101/gr.3567505. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Mendes F. K., Vanderpool D., Fulton B., Hahn M. W., CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics 36, 5516–5518 (2020), 10.1093/bioinformatics/btaa1022. [DOI] [PubMed] [Google Scholar]
  • 48.Boucher F. C., Demery V., Inferring bounded evolution in phenotypic characters from phylogenetic comparative data. Systematic Biol. 65, 651–661 (2016), 10.1093/sysbio/syw015. [DOI] [PubMed] [Google Scholar]
  • 49.Bertram J., et al. , CAGEE: Computational analysis of gene expression evolution. Mol. Biol. Evol. msad106, (2023), 10.1093/molbev/msad106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Nelder J. A., Mead R., A simplex-method for function minimization. Computer J. 7, 308–313 (1965), 10.1093/comjnl/7.4.308. [DOI] [Google Scholar]
  • 51.Hamlin J. A. P., Hibbins M. S., Moyle L. C., Assessing biological factors affecting postspeciation introgression. Evol. Lett. 4, 137–154 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Bryant D., Bouckaert R., Felsenstein J., Rosenberg N. A., RoyChoudhury A., Inferring species trees directly from biallelic genetic markers: Bypassing gene trees in a full coalescent analysis. Mol. Biol. Evol. 29, 1917–1932 (2012), 10.1093/molbev/mss086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Chifman J., Kubatko L., Quartet inference from SNP data under the coalescent model. Bioinformatics 30, 3317–3324 (2014), 10.1093/bioinformatics/btu530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Mirarab S., et al. , ASTRAL: Genome-scale coalescent-based species tree estimation. Bioinformatics 30, i541–i548 (2014), 10.1093/bioinformatics/btu462. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Zhang C., Rabiee M., Sayyari E., Mirarab S., ASTRAL-III: Polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 19, 153 (2018), 10.1186/s12859-018-2129-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Housworth E. A., Martins E. P., Lynch M., The phylogenetic mixed model. Am. Nat. 163, 84–96 (2004), 10.1086/380570. [DOI] [PubMed] [Google Scholar]
  • 57.Revell L. J., Harmon L. J., Testing quantitative genetic hypotheses about the evolutionary rate matrix for continuous characters. Evol. Ecol. Res. 10, 311–331 (2008). [Google Scholar]
  • 58.Hadfield J. D., MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. J. Stat. Software 33, 1–22 (2010), 10.18637/jss.v033.i02. [DOI] [Google Scholar]
  • 59.Revell L. J., phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223 (2012), 10.1111/j.2041-210X.2011.00169.x. [DOI] [Google Scholar]
  • 60.Paradis E., K. Schliep, ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019), 10.1093/bioinformatics/bty633. [DOI] [PubMed] [Google Scholar]
  • 61. Pennell M. W., et al., geiger v2.0: An expanded suite of methods for fitting macroevolutionary models to phylogenetic trees. Bioinformatics 30, 2216–2218 (2014), 10.1093/bioinformatics/btu181.
  • 62. Felsenstein J., Evolutionary trees from DNA-sequences–A maximum-likelihood approach. J. Mol. Evol. 17, 368–376 (1981), 10.1007/BF01734359.
  • 63. FitzJohn R. G., DiversiTree: Comparative phylogenetic analyses of diversification in R. Methods Ecol. Evol. 3, 1084–1092 (2012), 10.1111/j.2041-210X.2012.00234.x.
  • 64. Freckleton R. P., Fast likelihood calculations for comparative analyses. Methods Ecol. Evol. 3, 940–947 (2012), 10.1111/j.2041-210X.2012.00220.x.
  • 65. Ho L., Ané C., A linear-time algorithm for Gaussian and non-Gaussian trait evolution models. Syst. Biol. 63, 397–408 (2014), 10.1093/sysbio/syu005.
  • 66. Uyeda J. C., Harmon L. J., A novel Bayesian method for inferring and interpreting the dynamics of adaptive landscapes from phylogenetic comparative data. Syst. Biol. 63, 902–918 (2014), 10.1093/sysbio/syu057.
  • 67. Hiscott G., Fox C., Parry M., Bryant D., Efficient recycled algorithms for quantitative trait models on phylogenies. Genome Biol. Evol. 8, 1338–1350 (2016), 10.1093/gbe/evw064.
  • 68. Mitov V., Bartoszek K., Asimomitis G., Stadler T., Fast likelihood calculation for multivariate Gaussian phylogenetic models with shifts. Theoret. Popul. Biol. 131, 66–78 (2020), 10.1016/j.tpb.2019.11.005.
  • 69. Mitov V., Stadler T., Parallel likelihood calculation for phylogenetic comparative models: The SPLITT C++ library. Methods Ecol. Evol. 10, 493–506 (2019), 10.1111/2041-210x.13136.
  • 70. Hadfield J. D., Nakagawa S., General quantitative genetic methods for comparative biology: Phylogenies, taxonomies and multi-trait models for continuous and categorical characters. J. Evol. Biol. 23, 494–508 (2010), 10.1111/j.1420-9101.2009.01915.x.
  • 71. Hansen T. F., Stabilizing selection and the comparative analysis of adaptation. Evolution 51, 1341–1351 (1997), 10.1111/j.1558-5646.1997.tb01457.x.
  • 72. Butler M. A., King A. A., Phylogenetic comparative analysis: A modeling approach for adaptive evolution. Am. Nat. 164, 683–695 (2004), 10.1086/426002.
  • 73. Beaulieu J. M., Jhwueng D. C., Boettiger C., O’Meara B. C., Modeling stabilizing selection: Expanding the Ornstein-Uhlenbeck model of adaptive evolution. Evolution 66, 2369–2383 (2012), 10.1111/j.1558-5646.2012.01619.x.
  • 74. Rohlfs R. V., Harrigan P., Nielsen R., Modeling gene expression evolution with an extended Ornstein-Uhlenbeck process accounting for within-species variation. Mol. Biol. Evol. 31, 201–211 (2014), 10.1093/molbev/mst190.
  • 75. Harmon L. J., et al., Early bursts of body size and shape evolution are rare in comparative data. Evolution 64, 2385–2396 (2010), 10.1111/j.1558-5646.2010.01025.x.
  • 76. Landis M. J., Schraiber J. G., Pulsed evolution shaped modern vertebrate body sizes. Proc. Natl. Acad. Sci. U.S.A. 114, 13224–13229 (2017), 10.1073/pnas.1710920114.
  • 77. Bastide P., Solís-Lemus C., Kriebel R., Sparks K. W., Ané C., Phylogenetic comparative methods on phylogenetic networks with reticulations. Syst. Biol. 67, 800–820 (2018), 10.1093/sysbio/syy033.
  • 78. Pease J. B., Hahn M. W., More accurate phylogenies inferred from low-recombination regions in the presence of incomplete lineage sorting. Evolution 67, 2376–2384 (2013), 10.1111/evo.12118.
  • 79. Liu L., Yu L., Edwards S. V., A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol. 10, 302 (2010), 10.1186/1471-2148-10-302.
  • 80. Huelsenbeck J. P., Rannala B., Masly J. P., Accommodating phylogenetic uncertainty in evolutionary studies. Science 288, 2349–2350 (2000), 10.1126/science.288.5475.2349.
  • 81. Pagel M., Meade A., Bayesian analysis of correlated evolution of discrete characters by reversible-jump Markov chain Monte Carlo. Am. Nat. 167, 808–825 (2006), 10.1086/503444.
  • 82. Ané C., Larget B., Baum D. A., Smith S. D., Rokas A., Bayesian estimation of concordance among gene trees. Mol. Biol. Evol. 24, 412–426 (2007), 10.1093/molbev/msl170.
  • 83. Gillespie J. H., Langley C. H., Are evolutionary rates really variable? J. Mol. Evol. 13, 27–34 (1979), 10.1007/BF01732751.
  • 84. Edwards S. V., Beerli P., Perspective: Gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies. Evolution 54, 1839–1854 (2000), 10.1111/j.0014-3820.2000.tb01231.x.
  • 85. Schluter D., Price T., Mooers A. O., Ludwig D., Likelihood of ancestor states in adaptive radiation. Evolution 51, 1699–1711 (1997), 10.1111/j.1558-5646.1997.tb05095.x.
  • 86. Boughman J. W., Rundle H. D., Schluter D., Parallel evolution of sexual isolation in sticklebacks. Evolution 59, 361–373 (2005), https://www.ncbi.nlm.nih.gov/pubmed/15807421.
  • 87. Sun Y., Wang A., Wan D., Wang Q., Liu J., Rapid radiation of Rheum (Polygonaceae) and parallel evolution of morphological traits. Mol. Phylogenet. Evol. 63, 150–158 (2012), 10.1016/j.ympev.2012.01.002.
  • 88. Parins-Fukuchi T., Stull G. W., Smith S. A., Phylogenomic conflict coincides with rapid morphological innovation. Proc. Natl. Acad. Sci. U.S.A. 118, e2023058118 (2021), 10.1073/pnas.2023058118.
  • 89. Urban S., Gerwin J., Hulsey C. D., Meyer A., Kratochwil C. F., The repeated evolution of stripe patterns is correlated with body morphology in the adaptive radiations of East African cichlid fishes. Ecol. Evol. 12, e8568 (2022), 10.1002/ece3.8568.
  • 90. Simpson G. G., Tempo and Mode in Evolution (Columbia University Press, New York, 1944).
  • 91. Blomberg S. P., Garland T. Jr., Ives A. R., Testing for phylogenetic signal in comparative data: Behavioral traits are more labile. Evolution 57, 717–745 (2003), 10.1111/j.0014-3820.2003.tb00285.x.
  • 92. Freckleton R. P., Harvey P. H., Detecting non-Brownian trait evolution in adaptive radiations. PLoS Biol. 4, e373 (2006), 10.1371/journal.pbio.0040373.
  • 93. Breithaupt L. C., Hibbins M. S., seastaR. GitHub. https://github.com/larabreithaupt/seastaR. Deposited March 15, 2023.
  • 94. Hibbins M. S., Gene tree pruning algorithm. GitHub. https://github.com/mhibbins/genetreepruningalg. Deposited November 11, 2022.

Associated Data

Supplementary Materials

Appendix 01 (PDF)

Data Availability Statement

Source code and analysis scripts for seastaR and the covariance matrix method are available at https://github.com/larabreithaupt/seastaR (93). Code and scripts for the pruning algorithm method are available at https://github.com/mhibbins/genetreepruningalg (94).


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
