Abstract
Comparative biology seeks to unlock the power of cross-species trait variation to learn the rules of life. In this venture, modern studies increasingly leverage large datasets spanning many traits and levels of biological organization and complexity. To analyze these complex data in a statistically sound manner, researchers must choose a phylogeny that is assumed to model the mean trait values across species, an assumption that may be tenuous depending on the true evolutionary architecture of the traits. Yet the consequences of this decision remain poorly understood, particularly for modern studies seeking to analyze multiple, distinct traits within the same framework. Here, we conduct a comprehensive simulation study to examine how tree choice impacts phylogenetic regression in large-scale analyses of many traits and species. We find that regression outcomes are highly sensitive to the assumed tree, sometimes yielding alarmingly high false positive rates as the numbers of traits and species increase together. Counterintuitively, adding more data exacerbates rather than mitigates this issue, highlighting the risks inherent in the high-throughput analyses typical of modern comparative research. Experimental manipulations of tree topology in an empirical case study of gene expression and longevity traits further reveal extreme sensitivity to tree choice. While significant challenges remain in aligning traits with appropriate trees, we find compelling promise in robust estimators, which can mitigate the effects of tree misspecification under realistic evolutionary scenarios. Collectively, our findings underscore the critical need for careful tree selection in comparative studies while pointing to robust regression as a powerful tool for navigating phylogenetic uncertainty in modern evolutionary research.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12862-025-02451-2.
Introduction
Comparative biology has been transformed by phylogenetic regression [1, 2] and the wave of evolutionary advances that followed in its wake. Over the years, the principles of phylogenetic regression have been expanded [3, 4], clarified [5–7], refined [8, 9], and debated [10], laying the groundwork for phylogenetic comparative methods (PCMs) in the 21st century. As modern trait datasets continue to grow in size and complexity, these methods have become essential tools for analyzing biodiversity [11–13]. Researchers aiming to study trait evolution across species must now heed Felsenstein’s warnings about the dangers of phylogenetic ignorance [1]. Importantly, all PCMs rest on a critical assumption: that the chosen tree accurately reflects the evolutionary history of the traits under study.
When considering the vastness of biodiversity, the range of biological traits available for comparative study seems equally limitless. Modern phylogenetic analyses of traits routinely span molecular to organismal scales—ranging from classical quantitative traits like brain size [14, 15] and longevity [16–18], to more recently studied genomic-era traits, such as gene expression [18–21] and three-dimensional chromosomal interactions [22]. Underlying these traits is a vast array of genetic architectures and associated phylogenies, the precise identities of which remain unknown for most traits [23]. Regardless, all PCMs require a phylogenetic tree [1, 24–30]. That is, researchers must choose one tree or another to model evolution [1, 24–30]. This raises a critical question: how does one choose the best tree (or trees) to study the evolutionary history of a dataset spanning many different traits together?
A natural inclination is to select the overall species-level phylogeny, often estimated from genomic data [31, 32]. This is a popular [33–36] and seemingly justifiable choice for certain traits, as many models describe the evolution of the mean trait value at the species level [5, 37, 38]. However, one could also argue for using a specific tree, or set of trees, representing the genealogies of genes encoding the architecture of a trait (e.g., [35, 39–42]). If the evolution of a trait is governed by a specific gene or restricted set of genes, then its evolution may be best captured by the corresponding gene trees, which may or may not agree with the species tree [35, 39–42]. For example, gene expression evolution may follow the genealogy of the gene itself [35, 40, 43], while wingspan evolution might be better represented by the phylogenies of multiple genes involved in wing development. Extending this logic, a weighted mean of all possible trees might be appropriate for some quantitative traits [39, 41, 44]. Each of these strategies is justifiable, depending on the nature and genetic architecture of the traits under study. Practically speaking, researchers must commit to one of these choices [1, 20–26], often without knowing whether the decision is optimal and rarely with a full understanding of the consequences of poor tree choice.
It seems that selecting a set of traits to study may be easier than choosing an appropriate tree. Of course, “all models are wrong, but some are useful” [45]. Seldom is the true phylogeny known, and the real challenge lies in determining whether an assumed tree adequately reflects the evolutionary history of a given trait. This is perhaps a more complex and pressing question than often appreciated [39, 44, 46], particularly in modern studies that span a wide range of complex traits with largely unknown genetic architectures [23]. Evidence from simple phylogenetic regression models with only a single predictor trait demonstrates sensitivity to tree misspecification [47]. However, the situation is more complicated for modern studies seeking to analyze an expanding breadth of biological traits varying widely in their complexities and associated phylogenies. Previous research has shown that, in some contexts, larger datasets can mitigate poor model fit in regression by diluting misleading signals from model misspecification [48–50]. This raises a key unanswered question: can more data buffer against the risks of poor tree choice? Here, we address this question by systematically exploring the impact of tree selection in modern evolutionary analyses of large-scale trait datasets.
Results
We hypothesized that the consequences of poor tree choice [47] could be mitigated by increasing the amount of available data, reflecting conditions of modern comparative studies. To test this hypothesis, we conducted a large-scale simulation study to examine how tree choice impacts phylogenetic regression (see Methods), focusing on the most pervasive and widely studied source of phylogenetic conflict: gene tree–species tree mismatch [51–53]. In these simulations, we modeled traits according to a particular tree and assessed how different tree assumptions affected regression outcomes. We considered two simple scenarios representing correct tree choice: one in which the trait evolved along a gene tree and the gene tree was assumed (GG), and another in which the trait evolved along the species tree and the species tree was assumed (SS). We also evaluated four simple scenarios involving incorrect tree choice: the first in which the trait evolved along the gene tree but the species tree was assumed (GS), the second in which the trait evolved along the species tree but the gene tree was assumed (SG), the third in which a random tree unrelated to trait evolution was assumed (RandTree), and the fourth in which no tree was assumed (NoTree). We then extended our analysis to consider a more complex and realistic setting in which each trait evolved along its own trait-specific gene tree. In this case, we assessed misspecified phylogenetic regression performance under the GS, RandTree, and NoTree scenarios. This framework allowed us to systematically investigate how tree choice affects the accuracy of phylogenetic regression across varying numbers of traits, species, and levels of phylogenetic conflict (speciation rate, see Methods).
We first evaluated the accuracy of conventional phylogenetic regression in the six simple simulation scenarios in which all traits were generated under the same tree (Figs. S1 and S2). When the correct tree was assumed (GG or SS), false positive rates consistently remained below the widely accepted threshold of 5%, regardless of the number of traits, number of species, or speciation rate. In contrast, conventional regression proved highly sensitive to incorrect tree choice, yielding excessively high false positive rates across all four mismatched scenarios (GS, SG, RandTree, and NoTree). A clear pattern emerged: false positive rates increased with more traits, more species, and higher speciation rates. That is, assuming an incorrect tree led to worse outcomes as dataset size or speciation rate increased, with false positive rates soaring to nearly 100% in some scenarios. The identity of the assumed tree also played a major role in model performance, with SG generally performing best among the mismatched scenarios, followed by GS, NoTree, and RandTree (Figs. S1 and S2). Though the better performance of SG relative to GS is consistent with previous findings [47], both scenarios produced unacceptably high false positive rates, with GS sometimes performing substantially worse than NoTree. Further, the consistently poorer performance of RandTree relative to NoTree suggests that assuming a random tree may be worse than ignoring phylogeny altogether.
Given the high sensitivity of conventional phylogenetic regression to tree choice, we next explored potential solutions with robust estimators, hypothesizing that they could reduce this sensitivity. Application of a robust sandwich estimator [54, 55] to the same simulation data and scenarios indeed revealed consistently lower sensitivity to incorrect tree choice (Figs. S1 and S2). Specifically, robust phylogenetic regression nearly always yielded lower false positive rates than its conventional counterpart for the same tree choice scenario. The greatest improvements were observed for RandTree, followed by GS, SG, and NoTree. Notably, while assuming a random tree led to the worst outcomes with conventional regression, robust regression delivered the largest performance gains in this challenging scenario. When the number of species was large, robust regression reduced false positive rates for RandTree to levels even lower than those observed for GS with conventional regression. It also reduced sensitivity in the GS mismatch scenario, decreasing false positive rates from 56–80% down to 7–18% in analyses of large trees, often making them comparable to those for SG with conventional regression.
We then extended our evaluations of conventional and robust phylogenetic regression to the more complex and realistic setting in which each trait evolved along its own trait-specific gene tree (Fig. 1). As before, conventional regression yielded unacceptably high false positive rates across all mismatched scenarios (GS, RandTree, and NoTree), which increased with more traits, more species, and higher speciation rates. The ordering of performance also remained consistent, with GS generally performing best, followed by NoTree and then RandTree. In contrast, despite the added complexity of heterogeneous trait histories, robust regression still markedly outperformed conventional regression across all misspecified scenarios. This time, the most pronounced gains were observed for GS, followed by RandTree and then NoTree. Notably, the improvement for GS was so substantial that false positive rates nearly always dropped near or below the widely accepted 5% threshold, demonstrating that robust regression can effectively rescue tree misspecification under realistic and challenging conditions.
Fig. 1.
Robust regression rescues poor tree choice in a realistic setting in which all traits evolve according to their own specific tree. Estimated false positive rate plotted as a function of speciation rate for simulations with different numbers of traits (left-to-right columns) and species (top-to-bottom rows). Solid lines represent results for conventional phylogenetic regression, while dashed lines represent results for robust regression, with line color corresponding to tree choice scenarios: red (GS), purple (RandTree), and blue (NoTree)
Last, we investigated the impact of tree choice in an empirical context by applying conventional and robust phylogenetic regression to a dataset comprising expression levels of 15,898 genes in three tissues from 106 mammals and two life history traits related to lifespan (maximum lifespan and female time to maturity; see Methods) [18]. In this analysis, we tested for associations between gene expression and lifespan traits, comparing results obtained assuming the original species tree to those assuming seven increasingly perturbed trees. Specifically, we experimentally manipulated the original tree using nearest neighbor interchanges (NNIs) [56] to generate a series of trees with progressively larger topological changes, representing increasingly poor empirical tree choices. To quantify the impact of tree choice, we computed two measures from the P-values obtained assuming the original and perturbed trees, with and without the robust estimator. The first measure captures the difference in significance between conventional regression on the original versus perturbed tree, while the second measure represents the difference in significance between conventional and robust regression on the same perturbed tree. Together, they allow us to separately assess the sensitivity of conventional regression to tree misspecification (first measure) and the ability of robust regression to mitigate these effects (second measure).
Our empirical study uncovered compelling evidence of strong sensitivity to tree choice with conventional phylogenetic regression (Figs. S3-S8). Across experiments, increases in both the number of predictor genes and the number of NNIs inflated the significance of regression results when assuming manipulated trees. These findings were recapitulated for analyses of both traits and all three tissues, as indicated by increasing values of the first measure (original versus perturbed tree) with more genes and NNIs. That is, conventional regression tended to produce more significant associations when assuming a perturbed tree unlikely to reflect the true evolutionary history of a trait. As in our simulation studies, robust phylogenetic regression showed improved performance, with trends toward negative values of the second measure (conventional versus robust regression) in some conditions (Figs. S3-S8). In these cases, the second measure often moved in the opposite direction of the first, suggesting that robust regression yielded lower significance than its conventional counterpart when applied to the same perturbed trees.
Focused analyses of 40 candidate genes [18] per tissue, previously identified as associated with longevity, produced results broadly consistent with our main findings (Fig. 2). Conventional phylogenetic regression again yielded inflated significance for models fit to perturbed trees, reflected in increasingly positive values of the first measure (original versus perturbed tree) with more NNIs (Fig. 2a-c). However, comparisons between conventional and robust regression revealed more nuanced, tissue-specific trends. In brain and liver, the second measure (conventional versus robust regression) was positive at small numbers of NNIs but shifted toward negative values at larger numbers of NNIs. Thus, robust regression tended to yield more significant associations than conventional regression when the original and assumed trees were similar, whereas the opposite was true when they differed substantially. Consistent with this pattern, assuming the original phylogeny (NNI = 0) resulted in higher significance for robust regression in brain and liver (second measure = 0.96 in liver), but not in kidney (second measure = −0.014). These analyses also revealed high gene-to-gene variability in sensitivity to tree choice (Fig. 2d-f). In particular, while most genes exhibited negative values of both measures, a small subset showed shifts toward positive values as tree perturbation increased.
Fig. 2.
Tree choice influences empirical tests of trait associations. Results for conventional and robust phylogenetic regression applied to mammalian gene expression data from 40 candidate genes associated with maximum lifespan in brain (left), liver (center), and kidney (right). Distributions of the two significance-difference measures (a-c) and heatmaps illustrating both measures for each candidate gene (d-f) for NNI = 1, 2, 5, 10, 25, 50, and 100
Discussion
Our results for conventional phylogenetic regression point to a simple truth: we cannot trust inferences of trait evolution when we cannot trust the assumed tree upon which they are based. Phylogenetic comparisons require a tree [1, 20–26], and this decision has significant consequences for modern studies that increasingly sample many species and traits of varying complexity. We recently showed that tree mismatch can bias simple phylogenetic regression models when analyzing a single predictor trait [47], raising the question of whether adding more trait data could alleviate this issue. Our results make clear that it does not. Choosing an incorrect tree can substantially inflate support for spurious evolutionary associations. These findings add to a growing realization that accurate tree choice—not just accurate estimation [57, 58]—is essential for sound evolutionary inference.
How should we interpret these findings, and what do they mean for empirical studies? Answers to these questions depend on both confidence in the assumed tree and the extent to which it reflects the true genetic architecture of the studied trait. We observed a clear ranking in the severity of poor tree choice: assuming a random and likely grossly incorrect species tree was often worse than assuming no tree at all, followed by incorrectly assuming the species tree, and then incorrectly assuming the gene tree. Notably, the better performance of no tree relative to a random tree has also been demonstrated in previous work [59, 60]. That said, we included the RandTree and NoTree scenarios primarily for context, as such strategies are unlikely to be applied in modern studies. Nonetheless, our results suggest that omitting a tree may in some cases be preferable when its identity and relation to a studied trait are highly uncertain.
All is not lost, however, as our findings also show that robust phylogenetic regression offers a promising path forward in challenging scenarios (see also [39, 47]). Across all scenarios explored here, robust regression nearly always outperformed its conventional counterpart. Notably, it helped mitigate the adverse effects of incorrectly assuming the species tree, which may be of particular importance given that the species tree is a popular choice [33–36]. In our most realistic simulations, robust regression not only improved performance under this scenario, but effectively neutralized tree misspecification, transforming a major analytical vulnerability into a manageable source of uncertainty. We also found promising evidence of adequate power to detect true trait relationships despite poor tree choice (see Supplementary Materials). Together, these findings suggest that robust regression has the potential to strengthen the foundation of comparative studies, enabling researchers to move forward despite persistent challenges in tree choice.
Knowledge (or lack thereof) of the phylogenetic identity and genetic architecture of traits is key to choosing a tree. If a chosen tree does indeed reflect the genetic architecture of a trait, then the specter of false significance should be nothing to fear. On the other hand, there is good reason to be cautious when tree identity is uncertain, and our results argue that simply assuming the species tree is not a guaranteed solution. Even so, applying robust regression alongside the species tree assumption offered a promising improvement, nearly always yielding fewer false positives than conventional regression—an especially important result given that modern studies often target traits with largely unknown genetic architectures [23]. Many trait models are based on classical models of quantitative genetics that describe the evolution of the mean trait value of species [1, 37, 61, 62], which has led to the prominence of species trees in comparative studies [33–36]. If model assumptions hold, these issues may be less relevant for certain traits. Researchers have questioned the irrational exuberance for species trees [35], and we find a promising path through analytical rescue with robust regression applied to the species tree, even for traits encoded by many gene trees. The possibility—if not plausibility—that genetic architecture itself evolves and differs among related species is also a potential source of issues, which we leave for future studies to explore.
It is worth emphasizing that our study is deliberately focused on the problem of accurate tree choice and identity. That is, we asked whether a trait evolved according to one tree or another and probed the consequences of choosing the wrong tree (or no tree at all). We sought to isolate the impact of tree choice by using known, simulated trees, rather than estimated ones. Practically speaking, this means that our simulations effectively represent “best-case” scenarios with perfectly estimated trees, free of error, where we either get the identity right or wrong. Previous studies investigated the consequences of simple estimation error [47, 63], though arguably more work is needed. Additionally, there are many sources of biological conflict [64–68], such as selection, migration, population structure, and gene duplication, that future studies will need to examine to fully appreciate these complexities and their implications for comparative trait studies.
Our findings can be framed in light of previous investigations into the impacts of phylogenetic error and tree misspecification. Importantly, the asymptotic properties of generalized least squares are not guaranteed when the assumed and true covariance structures differ [59, 60, 69], as occurs when phylogenetic regression is based on an incorrect tree. Errors in tree topology or branch lengths can distort the assumed covariance structure, causing standard errors to be underestimated and confidence intervals to be biased even if coefficient estimates remain unbiased [70]. This issue is likely reflected in our amplified false positive rates that track increasing levels of gene tree-species tree discordance with higher speciation rates. Motivated by Felsenstein’s independent contrasts and analyses of a simple primate phylogeny, Stone (2011) [71] showed that phylogenetic regression can be robust to mild perturbations to the assumed tree; this pattern is also reflected in our empirical case study for smaller numbers of NNIs. However, Stone’s (2011) study also suggested that this robustness may be data-dependent, as observed in analyses of a small five-species phylogeny. Our results expand on these findings by applying them to much larger datasets with greater numbers of traits and species. Additionally, we focus on the specific process of incomplete lineage sorting as a source of conflict that can cause differences in both topology and branch lengths resulting from coalescent processes. Studies based on discrete traits have likewise found similar issues with biased estimates of substitution rates [46], while others have documented inflated significance of phylogenetic ANOVA due to hemiplasy [41] and inflated measures of phenotypic divergence when forced to randomly resolve polytomies [72].
We explored one promising avenue with robust phylogenetic regression [55], though other approaches are also emerging. While not a perfect solution, robust estimators generally outperformed conventional regression, reducing the impact of model misspecification in our most realistic simulation setting. It is important to note, however, that these estimators have their own limitations, including the need for small-sample bias correction and the potential for overcorrection [73–75]. Recently, multi-response phylogenetic models [76, 77] have been developed to partition the effects of phylogeny alongside other sources of error while modeling among-trait correlations. These methods therefore represent an important complementary direction for explicitly incorporating multiple sources of phylogenetic error into comparative analyses. Recent advances in genome-scale comparative methods also show potential for modeling phylogenetic architecture as a weighted contribution of gene trees [39]. Other studies have sought ways to model trait evolution without strong a priori knowledge of trees [34, 36], and established strategies exist for accommodating uncertainty due specifically to estimation error obtained from Bayesian posteriors [78, 79]. Yet, it is important to emphasize that these methods are primarily equipped to address uncertainty due to tree reconstruction errors and do not specifically target the problem of tree choice itself. If a trait truly evolves according to a specific configuration of genes and associated genealogies, then fitting trait models across a distribution of species tree estimates may still be problematic [35]. Future work in this area will be important for developing robust solutions that account for both tree choice and tree estimation error.
Our results contribute to a growing appreciation for careful and thoughtful phylogenetic modeling in modern comparative analyses [4, 44]. We aimed to provide a foundational understanding of the perils of tree choice by focusing on simulation and empirical case studies that arguably represent “best-case” conditions. In reality, evolution is highly complex and shaped by a wide array of processes that generate variation both within and among species. Many traits of interest are polygenic, often influenced by hundreds or even thousands of genes. In these situations, the true phylogeny may be unattainable, yet our results suggest that robust methods may help even for complex trait datasets shaped by many trees. An important related question is whether predictor traits evolve in a collinear fashion, which could introduce instabilities into regression models [80]. Moreover, models of trait evolution increasingly incorporate additional processes, such as adaptive optima [81, 82], early bursts [83], and non-Brownian dynamics [84]. Future studies that evaluate the impacts of tree choice under these added complexities will be valuable for advancing phylogenetic regression and comparative methods more broadly.
Materials and methods
Revealing the problem: simulating poor tree choice
Following previous studies [46, 85], we modeled trait evolution based on the principles of Brownian motion and phylogenetic regression via independent contrasts [1]. In this model, traits evolve according to either the overall species tree $S$ or a single gene tree $G$. For tree $T \in \{S, G\}$, we simulated the response trait vector $\mathbf{y}_T$ as

$$\mathbf{y}_T = \mathbf{X}_T \boldsymbol{\beta} + \boldsymbol{\varepsilon}_T, \qquad (1)$$

where phylogenetic signal was incorporated into both the response and predictor traits. The design matrix $\mathbf{X}_T$ and model residual vector $\boldsymbol{\varepsilon}_T$ were modeled using a multivariate normal (MVN) distribution with phylogenetic covariance matrix $\boldsymbol{\Sigma}_T$. The first column of $\mathbf{X}_T$ corresponded to the intercept, represented by a vector $\mathbf{1}$ of all ones, while the remaining columns contained $p$ predictor traits organized into vectors $\mathbf{x}_1, \ldots, \mathbf{x}_p$, defined according to a specific tree [2, 24, 86], reflecting shared evolutionary ancestry [87, 88]. The vector $\boldsymbol{\beta}$ contained model parameters, with the first element representing the intercept $\beta_0$ and the remaining elements corresponding to the slope parameters $\beta_1, \ldots, \beta_p$ for each of the $p$ predictor traits. Therefore, each $\beta_j$, $j = 1, \ldots, p$, represents the effect of a particular trait on the response variable. To test the null hypothesis that none of the predictor traits included in the model were associated with the response trait $\mathbf{y}_T$ ($H_0\colon \beta_1 = \cdots = \beta_p = 0$), we employed multiple phylogenetic regression with F-tests using the standard significance threshold of $\alpha = 0.05$. Under this null hypothesis, we expect approximately 5% of simulations to yield a statistically significant F statistic with P-value $< 0.05$ by chance alone (false positive). By focusing on the model-wide significance based on the F statistic, we simultaneously tested the null hypothesis that all regression coefficients are zero without performing multiple comparisons.
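As an illustration of the data-generating step, the following is a minimal sketch in Python (the study itself used R packages). It assumes a hypothetical nested-tuple tree encoding and hypothetical helper names (`bm_covariance`, `cholesky`, `simulate_trait`) that do not come from the paper. Under Brownian motion with unit rate, the covariance between two species equals the branch length they share from the root, and a trait vector can be drawn from the corresponding multivariate normal distribution via a Cholesky factor.

```python
import math
import random

# Hypothetical encoding: a node is (content, branch_length); content is a
# species name for a tip, or a pair of child nodes for an internal node.
TREE = ((((("A", 1.0), ("B", 1.0)), 2.0), ("C", 3.0)), 0.0)


def bm_covariance(tree):
    """Return tip names and the Brownian-motion covariance matrix (unit rate)."""
    cov = {}

    def visit(node, depth):
        content, length = node
        depth += length  # distance from the root to the end of this branch
        if isinstance(content, str):
            cov[(content, content)] = depth
            return [content]
        left = visit(content[0], depth)
        right = visit(content[1], depth)
        for a in left:
            for b in right:
                cov[(a, b)] = cov[(b, a)] = depth  # shared history ends here
        return left + right

    names = visit(tree, 0.0)
    return names, [[cov[(a, b)] for b in names] for a in names]


def cholesky(m):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def simulate_trait(cov, rng):
    """Draw one zero-mean MVN trait vector with the given covariance."""
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in cov]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(cov))]


names, cov = bm_covariance(TREE)
rng = random.Random(1)
response = simulate_trait(cov, rng)    # response trait under the null
predictor = simulate_trait(cov, rng)   # independent predictor trait
```

Drawing the response and each predictor independently, as in the last two lines, corresponds to the null hypothesis in which all slope parameters are zero.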
These two models ($T = S$ or $T = G$) defined the data-generating processes for simulating response and predictor traits according to a given tree. After simulating the traits, we computed their phylogenetic independent contrasts (PICs) using the original Felsenstein algorithm [1]. Choosing an appropriate tree for PIC computation is crucial [1, 20–26], as phylogenetic regression assumes that this tree accurately represents the evolutionary histories of the studied traits. This step in comparative analyses determines whether phylogenetic regression is matched (the same tree is used for both simulation and PIC regression) or mismatched (different trees are used for simulation and PIC regression). We explored two matched scenarios: SS ($S$ used for both simulations and PICs) and GG ($G$ used for both simulations and PICs). We also explored two simple mismatched scenarios: SG ($S$ used for simulations, $G$ used for PICs) and GS ($G$ used for simulations, $S$ used for PICs). For each replicate under these four scenarios, we generated a simulated dataset and performed multiple phylogenetic regression to compute F statistics and P-values. We also considered two further scenarios of mismatched regression: RandTree, where a random tree unrelated to the generating tree was used to compute PICs, and NoTree, where regression was applied directly to the raw trait data without computing PICs.
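The contrast computation can be sketched as follows. This is a minimal Python illustration of Felsenstein's pruning-style algorithm, not the ape `pic` function used in the study; the nested-tuple tree encoding and the function name `independent_contrasts` are assumptions made for the example, and all non-root branch lengths are assumed to be positive.

```python
import math

# Hypothetical encoding: a node is (content, branch_length); content is a
# species name for a tip, or a pair of child nodes for an internal node.
TREE = ((((("A", 1.0), ("B", 1.0)), 2.0), ("C", 3.0)), 0.0)


def independent_contrasts(tree, values):
    """Standardized contrasts for a bifurcating tree and a dict of tip values."""
    contrasts = []

    def visit(node):
        content, length = node
        if isinstance(content, str):
            return values[content], length
        x1, v1 = visit(content[0])
        x2, v2 = visit(content[1])
        # Contrast between the two descendants, scaled by its expected SD.
        contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
        # Ancestral value: average weighted by inverse branch lengths.
        x = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
        # Lengthen the branch below to reflect uncertainty in the estimate.
        return x, length + v1 * v2 / (v1 + v2)

    visit(tree)
    return contrasts


contrasts = independent_contrasts(TREE, {"A": 1.0, "B": 3.0, "C": 8.0})
```

A matched analysis passes the same tree used to simulate the traits; a mismatched analysis (for example GS) would pass a different tree to `independent_contrasts` than the one that generated the values.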
Our multifactorial simulation experiment varied three key parameters: the number of predictor traits, the number of species, and the level of gene tree-species tree conflict (determined by the speciation rate) [89]. Specifically, we simulated trait datasets with differing values of each parameter. For our initial analyses, we considered a smaller range of predictor traits with small, medium, and large trees. For a second set of more expansive analyses, we used a larger range of predictor traits with larger tree sizes. The speciation rate influences the probability of gene tree-species tree conflict, as faster speciation results in shorter intervals between speciation events and less time for coalescence events to occur, thereby increasing the probability of incomplete lineage sorting (ILS). Hence, by simultaneously varying all three parameters, our simulations captured a broad array of phylogenetic and experimental conditions.
To align with similar studies [47], our simulation protocol followed a multi-step process: (1) a speciation rate was selected, (2) a species tree $S$ was simulated using sim.bd.taxa.age from TreeSim [90] according to a birth-death process with a fixed depth of 10 and specified birth and death rates, (3) a gene tree $G$ was simulated from the species tree $S$ under the multispecies coalescent process using sim.coaltree.phylo from Phybase [91], and (4) two distinct trait datasets were generated, one using $S$ and the other using $G$, each comprising a response trait and a set of predictor traits. After generating the traits, we computed PICs using the pic function from ape [92] under the four modeling scenarios (SS, GG, SG, and GS). Lastly, phylogenetic regression was conducted for each scenario to compute F statistics and associated P-values for the full multiple regression model comprising all predictor traits. This process was repeated across replicates under each combination of simulation conditions. In all cases, traits were generated under the null hypothesis of statistical independence (all slope parameters equal to zero), allowing us to estimate the false positive rate as the proportion of replicates with P $< 0.05$ for each scenario (SS, GG, SG, and GS). For the RandTree scenario, PICs were computed using a randomly generated birth-death tree unrelated to the true trait-generating tree, while for the NoTree scenario, we applied standard non-phylogenetic regression directly to the simulated raw trait values (Step 4) without computing PICs on any tree.
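The false positive rate estimate described above reduces to a simple proportion. The sketch below is illustrative only: the function name and the binomial standard error are additions for the example, not quantities reported in the study.

```python
import math


def false_positive_rate(p_values, alpha=0.05):
    """Proportion of null replicates with P < alpha, plus a binomial SE."""
    if not p_values:
        raise ValueError("need at least one replicate")
    n = len(p_values)
    rate = sum(p < alpha for p in p_values) / n
    # Standard error of a proportion; an assumption added for illustration.
    se = math.sqrt(rate * (1.0 - rate) / n)
    return rate, se


# Hypothetical model-wide P-values from four null replicates.
rate, se = false_positive_rate([0.01, 0.20, 0.04, 0.50])
```

Because every replicate is simulated under the null, a well-calibrated analysis should return a rate close to the chosen threshold of 0.05.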
To complement these analyses, we conducted a separate set of simulation experiments reflecting a more complex and realistic scenario in which each trait evolved according to its own trait-specific gene tree. That is, each of the
predictor traits, as well as the response trait itself, was associated with its own gene tree. We then investigated the impacts of incorrectly assuming the species tree (GS), a random species tree (RandTree), or no tree at all (NoTree) when computing PICs for phylogenetic regression.
Robust phylogenetic regression
Alongside our analyses of conventional phylogenetic regression, we applied a robust estimator using the lm_robust function with option se_type = "HC3" from the estimatr package [54] to compute F statistics and associated P-values. This estimator incorporates an estimate of the residual variance-covariance structure during model fitting [93], and this structure accounts for both the residual and the leverage of each observation, thereby guarding against high-leverage outliers. Specifically, it is a sandwich estimator [93, 94] that employs the HC3 variant of the Huber-White heteroskedasticity-consistent covariance matrix estimator. The HC3 adjustment corrects for small-sample bias by dividing each squared residual by the square of one minus its leverage, which compensates for the tendency of high-leverage observations to have artificially small residuals. This correction improves upon earlier variants (HC0, HC1, and HC2) when sample sizes are small, which in our study corresponds to small trees. We chose this estimator over our previously implemented algorithms [55] because it is better suited to addressing poor covariance specification.
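As a minimal sketch of the HC3 correction, the code below computes HC0 and HC3 standard errors for the slope of a single-predictor regression with an intercept. It is not the lm_robust implementation and omits the multiple-regression F statistic used in this study; the function name and the example data are assumptions chosen to include one high-leverage point.

```python
import math


def hc3_simple_regression(x, y):
    """Fit y = a + b*x by OLS; return (slope, HC0 SE, HC3 SE) for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    var_hc0 = 0.0
    var_hc3 = 0.0
    for xi, yi in zip(x, y):
        residual = yi - (intercept + slope * xi)
        # Leverage of this observation in a simple regression with intercept.
        leverage = 1.0 / n + (xi - x_bar) ** 2 / sxx
        # Weight with which this observation enters the slope estimate.
        weight = (xi - x_bar) / sxx
        var_hc0 += (weight * residual) ** 2
        # HC3 divides each squared residual by (1 - leverage)^2.
        var_hc3 += (weight * residual / (1.0 - leverage)) ** 2
    return slope, math.sqrt(var_hc0), math.sqrt(var_hc3)


# Hypothetical data with one high-leverage point at x = 10.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.8, 12.0]
slope, se_hc0, se_hc3 = hc3_simple_regression(x, y)
print(slope, se_hc0, se_hc3)
```

Because one minus the leverage never exceeds one, the HC3 standard error is never smaller than the HC0 standard error, and the gap widens as leverage grows.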
An empirical test of tree choice
We obtained all data for our empirical analyses from a prior study [18]. These data comprised a phylogeny of 106 mammals, two longevity traits (maximum lifespan and female time to maturity), and normalized expression levels of 15,898 genes in three tissues (brain, kidney, and liver). For species with multiple sampled individuals in a given tissue, we computed the mean expression value to obtain a single estimate per species. These values were then log2-transformed following the protocol of the original study [18]. As in the original study, we sought to link variation in gene expression across species (predictor variables) with each of the two life history traits (response variables).
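The per-species summarization described above can be sketched as follows, taking the mean across individuals first and applying the log2 transform second. The record layout, function name, and example values are assumptions, and the sketch assumes strictly positive expression values with no pseudocount.

```python
import math
from collections import defaultdict


def species_log2_means(records):
    """Average expression per species, then log2-transform the mean.

    records: iterable of (species, expression) pairs for one gene and tissue.
    Assumes every species mean is strictly positive; no pseudocount is added.
    """
    by_species = defaultdict(list)
    for species, value in records:
        by_species[species].append(value)
    return {
        species: math.log2(sum(values) / len(values))
        for species, values in by_species.items()
    }


# Hypothetical normalized expression values for two species.
records = [("mouse", 2.0), ("mouse", 6.0), ("human", 8.0)]
print(species_log2_means(records))
```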
Our empirical case study involved experimentally manipulating phylogenetic trees and using these perturbed trees as input for phylogenetic regression. To address the high dimensionality of the dataset, we first employed a random sampling strategy that included: (1) sampling uniformly at random without replacement a set of
genes, (2) computing PICs for each gene using the original tree, (3) mimicking tree error by generating an alternative perturbed tree through one, two, five, 10, 25, 50, or 100 sequential, random NNIs [56], (4) computing PICs using the perturbed tree, and (5) fitting two separate phylogenetic regression models: one assuming the original tree and one assuming the perturbed tree. NNIs were computed using the NNI function in the R package TreeSearch [95], which reassigns branch lengths after interchanges. We repeated this process
times for each combination of
and NNI to obtain a distribution of
P-values for both models, allowing us to compare regression model fit between the original and perturbed trees. We applied this same strategy to our focused analyses of candidate genes, for which we obtained three distinct sets of the top 40 most significant genes previously inferred as associated with maximum lifespan in one of the three tissues [18].
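The tree perturbation step can be sketched as a sequence of random nearest-neighbor interchanges on a rooted binary topology. This is an illustrative Python version and not the TreeSearch NNI function used in this study: it changes the topology only and does not reassign branch lengths. The nested-tuple encoding, function names, and five-tip example tree are assumptions.

```python
import random


def leaf_names(tree):
    """Tip labels of a rooted binary tree stored as nested 2-tuples."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaf_names(left) + leaf_names(right)


def clades(tree):
    """Set of tip sets below each internal node; identifies the topology."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    return {frozenset(leaf_names(tree))} | clades(left) | clades(right)


def nni_neighbors(tree):
    """All rooted topologies one nearest-neighbor interchange away."""
    if isinstance(tree, str):
        return []
    left, right = tree
    neighbors = []
    if not isinstance(left, str):
        a, b = left
        # Swap the sibling subtree with either child of the left subtree.
        neighbors += [((right, b), a), ((a, right), b)]
    if not isinstance(right, str):
        a, b = right
        # Swap the sibling subtree with either child of the right subtree.
        neighbors += [(a, (left, b)), (b, (a, left))]
    # Interchanges deeper in the tree leave the other subtree untouched.
    neighbors += [(n, right) for n in nni_neighbors(left)]
    neighbors += [(left, n) for n in nni_neighbors(right)]
    return neighbors


def perturb(tree, n_moves, rng):
    """Apply n_moves sequential random NNIs (needs at least three tips)."""
    for _ in range(n_moves):
        tree = rng.choice(nni_neighbors(tree))
    return tree


original = ((("A", "B"), "C"), ("D", "E"))
rng = random.Random(1)  # fixed seed so the example is reproducible
for k in (1, 2, 5):
    perturbed = perturb(original, k, rng)
    shared = len(clades(original) & clades(perturbed))
    print(k, perturbed, "shared clades:", shared)
```

Counting shared clades gives a simple measure of how far a perturbed tree has drifted from the original. Sequential moves can partially undo one another, so the number of NNIs is an upper bound on the resulting topological distance.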
Acknowledgements
This research was supported by a grant from the Arkansas Bioscience Institutes, as well as the Arkansas High Performance Computing Center, which is funded through multiple National Science Foundation grants and the Arkansas Economic Development Commission. This work was also supported by National Science Foundation grants NSF DBI-2130666, DEB-2529693, as well as National Institutes of Health grants R35GM142438 and R35GM128590, and start-up funds provided by the University of Arkansas.
Authors’ contributions
M.S.D. and R.H.A. conducted analyses and prepared figures. R.H.A., R.A., and M.D. contributed to experimental, conceptual, and general study design. All authors wrote, edited, and reviewed the manuscript.
Data availability
All data and materials are available as Supplementary Materials associated with this publication and in a public GitHub repository: https://github.com/radamsRHA/TreeChoiceExperiment.
Declarations
Ethics approval and consent to participate
Not Applicable.
Consent for publication
Not Applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Raquel Assis, Email: rassis@fau.edu.
Richard Adams, Email: adamsrh@uark.edu.
References
- 1. Felsenstein J. Phylogenies and the comparative method. Am Nat. 1985;125:1–15.
- 2. Grafen A. The phylogenetic regression. Phil Trans R Soc Lond B. 1989;326:119–57.
- 3. Pennell MW, Harmon LJ. An integrative view of phylogenetic comparative methods: connections to population genetics, community ecology, and paleobiology. Ann N Y Acad Sci. 2013;1289:90–105.
- 4. Uyeda JC, Zenil-Ferguson R, Pennell MW. Rethinking phylogenetic comparative methods. Syst Biol. 2018;67:1091–109.
- 5. Harvey PH, Pagel MD. The comparative method in evolutionary biology. Oxford: Oxford University Press; 1991. 10.1093/oso/9780198546412.001.0001.
- 6. Huey RB, Garland T, Turelli M. Revisiting a key innovation in evolutionary biology: Felsenstein’s phylogenies and the comparative method. Am Nat. 2019;193:755–72.
- 7. Sanford GM, Lutterschmidt WI, Hutchison VH. The comparative method revisited. BioScience. 2002;52:830.
- 8. Beaulieu JM, Jhwueng D-C, Boettiger C, O’Meara BC. Modeling stabilizing selection: expanding the Ornstein-Uhlenbeck model of adaptive evolution. Evolution. 2012;66:2369–83.
- 9. Felsenstein J. Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods in enzymology. Volume 266. Elsevier; 1996. pp. 418–27.
- 10. Blomberg SP, Garland T, Ives AR. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution. 2003;57:717–45.
- 11. Cornwallis CK, Griffin AS. A guided tour of phylogenetic comparative methods for studying trait evolution. Annu Rev Ecol Evol Syst. 2024;55:181–204.
- 12. Cornwell W, Nakagawa S. Phylogenetic comparative methods. Curr Biol. 2017;27:R333–6.
- 13. Dewar AE, Belcher LJ, West SA. A phylogenetic approach to comparative genomics. Nat Rev Genet. 2025. 10.1038/s41576-024-00803-0.
- 14. Lande R. Quantitative genetic analysis of multivariate evolution, applied to brain: body size allometry. Evolution. 1979;33:402–16.
- 15. Smaers JB, et al. The evolution of mammalian brain size. Sci Adv. 2021;7:eabe2101.
- 16. Magalhães JPD, Costa J, Church GM. An analysis of the relationship between metabolism, developmental schedules, and longevity using phylogenetic independent contrasts. Journals Gerontology: Ser A. 2007;62:149–60.
- 17. Cagan A, et al. Somatic mutation rates scale with lifespan across mammals. Nature. 2022;604:517–24.
- 18. Liu W, et al. Large-scale across species transcriptomic analysis identifies genetic selection signatures associated with longevity in mammals. EMBO J. 2023;42:e112740.
- 19. Rohlfs RV, Harrigan P, Nielsen R. Modeling gene expression evolution with an extended Ornstein–Uhlenbeck process accounting for within-species variation. Mol Biol Evol. 2014;31:201–11.
- 20. Dunn CW, Luo X, Wu Z. Phylogenetic analysis of gene expression. Integr Comp Biol. 2013;53:847–56.
- 21. Brawand D, et al. The evolution of gene expression levels in mammalian organs. Nature. 2011;478:343–8.
- 22. Yang Y, Zhang Y, Ren B, Dixon JR, Ma J. Comparing 3D genome organization in multiple species using Phylo-HMRF. Cell Syst. 2019;8:494–505.e14.
- 23. Lappalainen T, Li YI, Ramachandran S, Gusev A. Genetic and molecular architecture of complex traits. Cell. 2024;187:1059–75.
- 24. Martins EP, Garland T. Phylogenetic analyses of the correlated evolution of continuous characters: a simulation study. Evolution. 1991;45:534–57.
- 25. Gittleman J, Luh HK. On comparing comparative methods. Annu Rev Ecol Syst. 1992. 10.1146/annurev.es.23.110192.002123.
- 26. Miles D, Dunham A. Historical perspectives in ecology and evolutionary biology: the use of phylogenetic comparative analyses. Annu Rev Ecol Syst. 1993;24:587–619. https://www.jstor.org/stable/2097191?seq=1
- 27. Boettiger C, Coop G, Ralph P. Is your phylogeny informative? Measuring the power of comparative methods. Evolution. 2012;66:2240–51.
- 28. Cressler CE, Butler MA, King AA. Detecting adaptive evolution in phylogenetic comparative analysis using the Ornstein–Uhlenbeck model. Syst Biol. 2015;64:953–68.
- 29. Harmon L. Phylogenetic comparative methods: learning from trees. Scotts Valley: CreateSpace Independent Publishing Platform; 2018.
- 30. Brahmantio B, Bartoszek K, Yapar E. Bayesian inference of mixed Gaussian phylogenetic models. 2024. Preprint at 10.48550/ARXIV.2410.11548.
- 31. Brassac J, Blattner FR. Species-level phylogeny and polyploid relationships in Hordeum (Poaceae) inferred by next-generation sequencing and in silico cloning of multiple nuclear loci. Syst Biol. 2015;64:792–808.
- 32. Marcussen T, Meseguer AS. Species-level phylogeny, fruit evolution and diversification history of geranium (Geraniaceae). Mol Phylogenet Evol. 2017;110:134–49.
- 33. Pagel M. Inferring the historical patterns of biological evolution. Nature. 1999;401:877–84.
- 34. Einum S, Fleming IA. Of chickens and eggs: diverging propagule size of iteroparous and semelparous organisms. Evolution. 2007;61:232–8.
- 35. Hahn MW, Nakhleh L. Irrational exuberance for resolved species trees. Evolution. 2016;70:7–17.
- 36. Moyers Arévalo RL, Amador LI, Almeida FC, Giannini NP. Evolution of body mass in bats: insights from a large supermatrix phylogeny. J Mammal Evol. 2020;27:123–38.
- 37. Felsenstein J. Phylogenies and quantitative characters. Annu Rev Ecol Syst. 1988;19:445–71.
- 38. Felsenstein J. Inferring phylogenies. Sinauer Associates; 2004.
- 39. Hibbins MS, Breithaupt LC, Hahn MW. Phylogenomic comparative methods: accurate evolutionary inferences in the presence of gene tree discordance. Proc Natl Acad Sci U S A. 2023;120:e2220389120.
- 40. Dimayacyac JR, Wu S, Jiang D, Pennell M. Evaluating the performance of widely used phylogenetic models for gene expression evolution. Genome Biol Evol. 2023;15:evad211.
- 41. Mendes FK, Fuentes-González JA, Schraiber JG, Hahn MW. A multispecies coalescent model for quantitative traits. eLife. 2018;7:e36482.
- 42. Kryuchkova-Mostacci N, Robinson-Rechavi M. Tissue-specificity of gene expression diverges slowly between orthologs, and rapidly between paralogs. PLoS Comput Biol. 2016;12:e1005274.
- 43. Avise JC, Robinson TJ. Hemiplasy: a new term in the lexicon of phylogenetics. Syst Biol. 2008;57:503–7.
- 44. Schraiber JG, Edge MD, Pennell M. Unifying approaches from statistical genetics and phylogenetics for mapping phenotypes in structured populations. PLoS Biol. 2024;22:e3002847.
- 45. Box GEP. Science and statistics. J Am Stat Assoc. 1976;71:791–9.
- 46. Mendes FK, Hahn MW. Gene tree discordance causes apparent substitution rate variation. Syst Biol. 2016;65:711–21.
- 47. Adams R, et al. A tale of too many trees: a conundrum for phylogenetic regression. Mol Biol Evol. 2025;msaf032. 10.1093/molbev/msaf032.
- 48. Domingos P. A few useful things to know about machine learning. Commun ACM. 2012;55:78–87.
- 49. Halevy A, Norvig P, Pereira F. The unreasonable effectiveness of data. IEEE Intell Syst. 2009;24:8–12.
- 50. Hastie T, Friedman J, Tibshirani R. The elements of statistical learning. New York, NY: Springer; 2001. 10.1007/978-0-387-21606-5.
- 51. Maddison WP. Gene trees in species trees. Syst Biol. 1997;46:523–36.
- 52. Degnan JH, Rosenberg NA. Discordance of species trees with their most likely gene trees. PLoS Genet. 2006;2:e68.
- 53. Edwards SV. Is a new and general theory of molecular systematics emerging? Evolution. 2009;63:1–19.
- 54. Blair G, Cooper J, Coppock A, Humphreys M, Sonnet L. estimatr: fast estimators for design-based inference. R package version 1.0.4. 2024. https://CRAN.R-project.org/package=estimatr.
- 55. Adams R, Cain Z, Assis R, DeGiorgio M. Robust phylogenetic regression. Syst Biol. 2024;73:140–57.
- 56. DasGupta B. On computing the nearest neighbor interchange distance. In: Proc. DIMACS Workshop on Discrete Problems with Medical Applications. 1998;55:125–43.
- 57. Revell LJ, et al. Comparing evolutionary rates between trees, clades and traits. Methods Ecol Evol. 2018;9:994–1005.
- 58. Caetano DS, Harmon LJ. Estimating correlated rates of trait evolution with uncertainty. Syst Biol. 2019;68:412–29.
- 59. Blomberg SP, Lefevre JG, Wells JA, Waterhouse M. Independent contrasts and PGLS regression estimators are equivalent. Syst Biol. 2012;61:3.
- 60. Mittelhammer R, Judge GG, Miller DJ. Econometric foundations pack with CD-ROM. Cambridge University Press; 2000.
- 61. Revell L, Harmon L. Testing quantitative genetic hypotheses about the evolutionary rate matrix for continuous characters. Evol Ecol Res. 2008;10:311–31.
- 62. Blomberg SP, Rathnayake SI, Moreau CM. Beyond brownian motion and the Ornstein-Uhlenbeck process: stochastic diffusion models for the evolution of quantitative characters. Am Nat. 2020;195:145–65.
- 63. Symonds MRE. The effects of topological inaccuracy in evolutionary trees on the phylogenetic comparative method of independent contrasts. Syst Biol. 2002;51:541–53.
- 64. Lanier HC, Knowles LL. Is recombination a problem for species-tree analyses? Syst Biol. 2012;61:691–701.
- 65. Bartoszek K, Glémin S, Kaj I, Lascoux M. Using the Ornstein–Uhlenbeck process to model the evolution of interacting populations. J Theor Biol. 2017;429:35–45.
- 66. Adams RH, Schield DR, Card DC, Castoe TA. Assessing the impacts of positive selection on coalescent-based species tree estimation and species delimitation. Syst Biol. 2018;67:1076–90.
- 67. Koch H, DeGiorgio M. Maximum likelihood estimation of species trees from gene trees in the presence of ancestral population structure. Genome Biol Evol. 2020;12:3977–95.
- 68. Begum T, Robinson-Rechavi M. Special care is needed in applying phylogenetic comparative methods to gene trees with speciation and duplication nodes. Mol Biol Evol. 2021;38:1614–26.
- 69. Casella G, Berger RL. Statistical inference. 2nd ed. Pacific Grove (CA): Duxbury; 2002.
- 70. Rohlf FJ. A comment on phylogenetic correction. Evolution. 2006;60:1509–15.
- 71. Stone EA. Why the phylogenetic regression appears robust to tree misspecification. Syst Biol. 2011;60:245–60.
- 72. Rabosky DL. No substitute for real data: a cautionary note on the use of phylogenies from birth–death polytomy resolvers for downstream comparative analyses. Evolution. 2015;69:3207–16.
- 73. Hayes AF, Cai L. Using heteroskedasticity-consistent standard error estimators in OLS regression: an introduction and software implementation. Behav Res Methods. 2007;39:709–22.
- 74. Kauermann G, Carroll RJ. A note on the efficiency of sandwich covariance matrix estimation. J Am Stat Assoc. 2001;96:1387–96.
- 75. Wilcox RR. Comment. Am Stat. 2001;55:374–5.
- 76. Halliwell B, Holland BR, Yates LA. Multi-response phylogenetic mixed models: concepts and application. Biol Rev. 2025;100:1294–316.
- 77. Westoby M, Yates L, Holland B, Halliwell B. Phylogenetically conservative trait correlation: quantification and interpretation. J Ecol. 2023;111:2105–17.
- 78. Villemereuil PD, Wells JA, Edwards RD, Blomberg SP. Bayesian models for comparative analysis integrating phylogenetic uncertainty. BMC Evol Biol. 2012;12:102.
- 79. Fuentes-G JA, Polly PD, Martins EP. A bayesian extension of phylogenetic generalized least squares: incorporating uncertainty in the comparative study of trait relationships and evolutionary rates. Evolution. 2020;74:311–25.
- 80. Mason CH, Perreault WD Jr. Collinearity, power, and interpretation of multiple regression analysis. J Mark Res. 1991;28:268–80.
- 81. Hansen TF. Stabilizing selection and the comparative analysis of adaptation. Evolution. 1997;51:1341–51.
- 82. Butler MA, King AA. Phylogenetic comparative analysis: a modeling approach for adaptive evolution. Am Nat. 2004;164:683–95.
- 83. Harmon LJ, Losos JB, Davies TJ, Gillespie RG, Gittleman JL, Jennings WB, Kozak KH, McPeek MA, Moreno-Roark F, Near TJ, Purvis A, et al. Early bursts of body size and shape evolution are rare in comparative data. Evolution. 2010;64:2385–96.
- 84. Landis MJ, Schraiber JG, Liang M. Phylogenetic analysis using Lévy processes: finding jumps in the evolution of continuous traits. Syst Biol. 2013;62:193–204.
- 85. Mazel F, et al. Improving phylogenetic regression under complex evolutionary models. Ecology. 2016;97:286–93.
- 86. Martins EP. Conducting phylogenetic comparative studies when the phylogeny is not known. Evolution. 1996;50:12–22.
- 87. Mundry R. Statistical issues and assumptions of phylogenetic generalized least squares. In: Garamszegi LZ, editor. Modern phylogenetic comparative methods and their application in evolutionary biology. Berlin, Heidelberg: Springer Berlin Heidelberg; 2014. pp. 131–53. 10.1007/978-3-662-43550-2_6.
- 88. Rohlf FJ. Comparative methods for the analysis of continuous variables: geometric interpretations. Evolution. 2001;55:2143–60.
- 89. Yule GU. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. Phil Trans R Soc Lond B. 1925;213:21–87.
- 90. Stadler T. TreeSim: simulating phylogenetic trees. R package version 2.4. 2019. https://CRAN.R-project.org/package=TreeSim
- 91. Liu L, Yu L. Phybase: an R package for species tree analysis. Bioinformatics. 2010;26:962–3.
- 92. Paradis E, Schliep K. Ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35:526–8.
- 93. MacKinnon JG, White H. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. J Econ. 1985;29:305–25.
- 94. White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48:817–38.
- 95. Smith MR. TreeSearch: morphological phylogenetic analysis in R. bioRxiv. 2021.