Abstract
Comparative phylogenetics provides a wealth of computational tools to understand evolutionary processes and their outcomes. Advances in these methodologies have occurred in parallel with a surge in cross-species genomic and phenotypic data. To date, however, the majority of published studies have focused on classical questions in evolutionary biology, such as speciation and the ecological drivers of trait evolution. Here, we argue that evolutionary medicine in general, and our understanding of the origin and diversification of disease traits in particular, would be greatly expanded by a wider integration of phylogenetic comparative methods (PCMs). We use comparative oncology – the study of cancer across the tree of life – as a case study to demonstrate the power of the approach and show that implementing PCMs can highlight the mode and tempo of the evolutionary changes in intrinsic, species-level disease vulnerabilities.
Introduction
Evolutionary medicine seeks to understand the ultimate origin and persistence of human disease within the context of the species’ evolutionary history and that of their relatives (1–3). A central explanatory concept in the field has been the idea of evolutionary mismatch (4,5), which hypothesizes that contemporary maladies are often rooted in the dramatic shift in environment humans have undergone on a timescale that natural selection cannot meaningfully act upon. In the last decade, a plethora of human diseases have been explored through this lens, including autoimmune disorders, diabetes, heart disease, and psychiatric disorders (1,2,6–14).
The evolutionary mismatch framework has been incredibly productive. To date, however, the majority of research in evolutionary medicine has focused on humans (Homo sapiens) and our immediate ancestors, with relatively little attention paid to the context of broader phylogeny. For example, early attempts to explain the variation in cancer risk across taxa heavily relied on a life history framework of disease risk among humans and our closest relatives (15–19). Given the diverse ecological backgrounds in which species evolve, variant strategies will dictate the amount of energetic investment into somatic maintenance and other disease relevant domains. Moreover, the application of evolutionary trait models can drive deeper testing of the hypotheses developed from a life history framework of disease vulnerability. Life history theory has provided a powerful toolkit for understanding the evolution of reproductive and survival strategies across different species. More recently, this framework has been applied to studies of cross-species cancer prevalence, and a growing body of research suggests that cancer susceptibility may be linked to reduced energetic investment in somatic maintenance, favoring a high-volume reproductive output instead, as suggested by fast life history theory (15,20–22). Yet, there is more to be learned about human maladies when we expand our focal point to examine disease across deeper evolutionary history. In recent years, the integration of phylogenetic comparative methods (henceforth, PCMs), the study of evolutionary relationships among species, has provided such avenues for investigating the evolutionary origins and diversification of disease risk.
The comparative method is a cornerstone of evolutionary biology (23). However, it was not until the 1980s that the development of PCMs made significant headway, due in equal parts to the development of robust statistical approaches and confidence in DNA-based phylogenies (24–27). (“The comparative method of 1950 was indistinguishable from the comparative method of 350 BC,” as Ridley 1983, p. 6 put it(28).) By now, the rigor of statistical and computational techniques within comparative phylogenetics have come to rival those of population genetics. The continuous growth in computational methods and the accumulation of vast amounts of genomic and phenotypic data across a wide range of species has made PCMs increasingly popular. While attempts to apply PCMs to questions in evolutionary medicine have increased, efforts to outline ‘best practices’ for PCMs in evolutionary medicine have been sparse (29,30).
In this review, we argue that the integration of PCMs in evolutionary medicine can greatly enhance our understanding of disease risk, which may inform novel strategies for prevention and treatment of disease. Drawing on recent advances in PCMs and evolutionary medicine, we highlight the key benefits of this integrative approach and provide examples of its applications in the study of comparative oncology – the study of cancer across the tree of life (31–34). We begin by outlining some of the basics of the comparative method and then apply it to a previously published dataset of cross-species cancer prevalence. We end by discussing the challenges and limitations of using PCMs in the context of human health, and provide a roadmap for its applications in broader evolutionary medicine topics.
A primer on comparative phylogenetics
The foundation of modern models of continuous trait evolution rests with Brownian motion (discrete traits - traits with distinct states - are typically modeled with a model called the Markov k-state, or Mk, model). Models of Brownian motion originally developed from attempts to explain the outcome of processes that are seemingly the product of random interactions, specifically the random movement of particles suspended in a medium (fluid or gas). First described while observing the “irregular motion” of coal dust on the surface of alcohol by Jan Ingenhousz, the Brownian motion is actually named after the work of botanist Robert Brown, some decades later. Brownian motion is the basis of a large family of mathematical models that are popular due to their mathematical tractability and are suitable for modeling phenomena where small stochastic displacements add up to approximate Gaussian fluctuations. In comparative phylogenetics, data that has changed over time is plentiful, in the form of the average values of continuous evolutionary traits within populations or species. Many such traits are well-fit by Brownian motion. Naturally, early phylogenetic studies and research applied the Brownian model to the divergence of traits over time rather successfully. So it follows that this is often the basis of the null hypothesis in models of phylogenetic context (35). Although one possible mechanism for random change fitting a Brownian model is neutral “genetic drift”, there are other possible mechanisms for Brownian motion involving selection (36,37). In cases where Brownian motion is the most suitable model for a trait, it is characterized by a “random walk,” signifying a lack of directionality or identifiable trends in variation. Given that Brownian motion is one of the most commonly best-supported models, it provides a useful null hypothesis in investigating alternative modes of phenotypic evolutionary change (38). Building on the foundational model of Brownian motion, alternative evolutionary models have been developed over the past decades that are now able to incorporate various evolutionary forces, such as directional selection, adaptive radiation, and evolutionary constraints, to better fit empirical data (39–43). Crucially, the advancement and sophistication of the computational modeling abilities have also expanded the range of questions we can inquire about the evolutionary forces driving the extensive variation we observe in species’ traits.
Phylogenetic Variance-Covariance Matrices
Phylogenetic variance covariance (VCV) matrices are essential to phylogenetic statistical models. In phylogenetic comparative methods, many models predict that the observed average values of continuous traits across species will follow a multivariate normal distribution. To account for phylogenetic relationships, a matrix is constructed by summing the shared branch lengths between species from their most recent common ancestor to the root, capturing evolutionary relatedness. This matrix is then scaled according to the assumptions of a given evolutionary model to generate the variance-covariance matrix, which describes the expected covariances among species based on their shared evolutionary history (44). The phylogenetic VCV can be utilized in various statistical methods to model trait relationships, applying it in evolutionary model fitting to capture expected trait covariances and in phylogenetic generalized least squares (PGLS) to model residual error and account for the non-independence of species for regression estimates.
Analyzing Models of Phenotype Evolution
A prominent extension of Brownian motion, used widely in mathematical and physical frameworks, is the Ornstein–Uhlenbeck (OU) model, which describes trait evolution that reverts toward an optimal or mean value. George Eugene Uhlenbeck and Leonard Ornstein discovered the model in 1930 to describe the velocity of a Brownian particle undergoing friction (resistance to change)(45). OU has broad applications in physics, finance, and evolutionary biology. The OU model of evolution was adapted from models to explain random movements in a statistical framework to describe the evolution of continuous traits in a population over time (46). Evolutionary biologists commonly interpret this model as incorporating both stabilizing selection and genetic drift to explain how a trait may evolve in response to stochastic displacements around a center, e.g. the fitness optimum (47). An OU model includes a stochastic component, similar to Brownian motion, while the deterministic component models describes the pull toward a medial value due to stabilizing selection or any other restraining force.
Beyond the OU model, other evolutionary models further expand on Brownian motion to capture various aspects of evolution. For instance, the rate trend model is variation of the Brownian motion model with linear trends in traits or step variance (i.e. how fast a trait spreads around a mean) that describe the rate of change in continuous traits over time (40,48). These rates can be either positive (indicating increasing rates over time) or negative (indicating decreasing rates over time). This model can explain how evolutionary pressures and environmental changes influence or inhibit rates of trait change.
The mean trend model in phylogenetic trait evolution accounts for a directional change in traits over time, incorporating both a random walk component and a consistent drift. Unlike the Brownian Motion model, which describes evolution purely as a stochastic process, the mean trend model adds a “drift” parameter to capture the gradual increase or decrease in trait values across evolutionary time. This drift represents a directional bias, suggesting that traits may not only evolve randomly but may also follow a specific trend due to persistent selective pressures or long-term directional processes (39,40).
The white noise model represents a scenario in which trait values are entirely independent across species, meaning there is no phylogenetic correlation or influence of evolutionary history on trait similarity. Under this model, trait values follow a normal distribution with a constant variance, calculated from the trait values, without any shared evolutionary trajectory. The variance-covariance matrix for this model is characterized by non-zero values only on the diagonal, indicating that each species’ trait is unrelated to any others (39).
Another early approach to stepping beyond Brownian motion was suggested by Mark Pagel in 1999 (49). While the originally suggested approach was designed for discrete traits, the author later pointed out that it applies equally well to continuous traits, and has been used extensively ever since. In traditional explicit evolutionary models, such as Brownian motion and its extensions, trait evolution is modeled by explicitly defining and adjusting parameters that represent various evolutionary processes driving trait change. Pagel’s frameworks of tree transformations differ from the traditional explicit evolutionary models discussed above. Instead of directly adjusting the mechanics of the model, Pagel’s tree transformations (λ, δ, and κ) directly adjust the elements of the phylogenetic variance-covariance matrix. This translates into different transformations of branch lengths of the tree.
Pagel’s λ measures phylogenetic signal using the Brownian motion model as a baseline, where values range from 0 (independent trait evolution) to 1 (evolution under Brownian motion), with intermediate values indicating partial phylogenetic influence. The δ transformation adjusts node heights to model changes in evolutionary rates over time, with δ < 1 indicating deceleration and δ > 1 suggesting acceleration toward the present. Finally, the κ transformation alters branch lengths to explore how evolutionary distance affects trait similarity, with κ = 1 retaining the original tree structure and κ = 0 equalizing all branch lengths across the tree, representing a situation where change is proportional to the number of nodes separating species rather than time (37).
Variations in Evolutionary Mode Across Species
In Table 1, σ2 represents one measure of evolutionary rates in terms of the variance in trait evolution. z0 denotes the trait value at the root of the phylogenetic tree (ancestral state). tij refers to the shared evolutionary time between species i and j. α is the strength of stabilizing selection in the Ornstein-Uhlenbeck (OU) model. a is the rate change parameter in the Early Burst (EB) model, where negative values indicate decelerating rates of evolution. λ measures the degree of phylogenetic signal in Pagel’s Lambda model. δ adjusts evolutionary rates as a function of time in Pagel’s Delta model. κ describes evolutionary change following a speciational pattern in Pagel’s Kappa model, scaling the branch lengths. Slope indicates a linear trend in the rate of evolution over time (Rate Trend model), while Drift represents the directional trend in trait evolution over time (Mean Trend model).
Table 1:
Comparison of models in phylogenetic trait evolution.
Model | Biological Meaning | Fitted Parameters | Variance-Covariance Equation |
---|---|---|---|
Brownian Motion (BM) | Trait evolution via random walk | , | |
Ornstein-Uhlenbeck (OU) | Trait evolution with stabilizing selection | ,, | |
Early Burst (EB) | Evolutionary rates decrease over time | ,, | |
Pagel’s Lambda | Trait evolution with phylogenetic signal | ,, | |
PagePs Delta | Evolutionary rates vary over time | ,, | |
PagePs Kappa | Evolutionary changcs follow a speciational pattern | ,, | |
Rate Trend | Evolutionary rates increase or decrease over time | ,, | |
Mean Trend | Trait evolution with directional trend | ,, | |
White Noise | Trait values are independently normally distributed | , |
Variations in Best Fit Evolutionary Mode
Across the tree of life, variations in life history strategies, genetics, and environmental pressures have all driven a diverse set of evolutionary solutions for managing disease risk. Instead of assuming a uniform path of evolution, fitting and comparing modes of evolution across groups using geiger’s fitContinuous function provides us with a representation of how evolutionary modes can diverge across species. The models are first fit to the overall tree and dataset based on the lowest AIC score, followed by model fitting at the class level. A model with an AIC score at least 10 points lower than the alternatives is considered a significantly better fit. If multiple models fall within 10 AIC points of each other, the evidence for selecting a single best model is weak. In this case the best-fit model from the higher taxonomic classification is designated as the best fit model for the group. This approach maintains consistency in model assumptions, ensuring that for a group to diverge from the evolutionary patterns of its ancestors, it must demonstrate a significant deviation in model fit. See Table 1 for the biological meaning of each evolutionary model.
Reconstructing Disease Traits
To properly reconstruct the evolutionary rates and disease states for a group of species, the phylogenetic tree and disease data is first fit to various models of evolution to assess the model of best fit. In the case of the mammal data, Pagel’s λ tree transformation is the best fit. The λ value is fit to the phylogeny using the fitContinuous function from geiger. This lambda is then used to rescale the phylogeny using the rescale function (e.g., λ=0.46), adjusting the branch lengths to reflect the phylogenetic signal in the trait evolution. Ancestral states and confidence intervals under a Brownian motion model for malignancy prevalence were estimated using maximum likelihood, implemented via the anc.ML function from phytools. Estimated ancestral states under Pagel’s model are represented by the white line, illustrating the inferred malignancy prevalence at internal nodes of the phylogeny. The surrounding blue haze represents the 95% confidence interval, capturing the range of uncertainty in these reconstructions and highlighting variability in trait evolution across mammalian lineages.
Adapting Phylogenetic Generalized Least Squares
Through MLE, we found Pagel’s Lambda to be the overall best fit model for the cross-species cancer prevalence dataset. Two other Pagel models; Ornstein-Uhlenbeck and Rate Trend, were the best fit models for smaller clades. The method, PGLSSeyPagel, used in Compton et al. to test for statistical relationships between life history and cancer data, assumed Brownian motion(34). With this, we developed a new function, PGLSSeyOptim. This method builds upon the PGLS.SEy, from phytools (53), and the PGLSSeyPagel method created by Diego Mallo. This function adds the ability to assume Brownian Motion, OU, Pagel’s lambda, delta, kappa, white noise, and rate trend. Parameter estimates within the PGLSSeyOptim utilize fitContinuous to ensure continuity between the fitting and the trend estimation process.
Phylogenetic Comparative Methods Currently utilized within Comparative Oncology
Given the plethora of phenotypic traits and the distinct data formats that they assume (e.g., discrete or continuous) there has been an equally diverse set of phylogenetic comparative methods (PCMs) developed to analyze them. Given a specific phenotypic trait and the type of data collected, a mainstay of comparative phylogenetics has been debates on what is the best process method given the context of the data. For any discipline that is eager to integrate the use of PCMs, it can be daunting to face these debates and determine which method is best suited for their specific purpose. Across the field, a standard for testing the relationships between life history and disease traits across species has yet to be set. To understand which method is most appropriate in the context of evolutionary medicine, simulation studies were conducted. In our simulation, we compared two methods used in previous studies alongside our newly created PGLSSEyOptim function. The first method, used in Compton et al.’s Cancer Prevalence Across Vertebrates is the PGLSSEyPagel function(34). The other method is the Compar.gee function from ape. The Compar.gee function, adapted from the approach described by Paradis and Claude (2002), employs a generalized estimation equation alongside a phylogenetic correlation structure in order to generate log-odd relationships (54). This method does not use a continuous variable, such as species’ level cancer prevalence, but relies upon the occurrences and non-occurrences of the independent variable within each species. This method is used within Bulls et al.’s similar analysis of cancer across tetrapods to perform a binomial regression (33).
Validating Popular PCMs in Evolutionary Medicine
Within the recent attempts to apply PCMs to cross-species disease prevalence data, there are several statistical methods being utilized for species level trend estimation. To understand which method is best for estimating trends in disease risk across species, we conducted a simulation study. The study used previously collected life history data from PanTHERIA and Amniote along with simulated cancer prevalence data, to compare error rates between multiple phylogenetic statistical methods. (50–52). The simulation has two changing parameters, which are the minimum number of records per species and the degree of the relationship between simulated life history and neoplasia prevalence values (slope). For each of these combinations, with each method, 1000 iterations were completed before moving on to the next combination of parameters. Each iteration of the simulation first randomly samples the needed amount of species from the dataset. From there, the number of records are simulated using the negative binomial distribution. The negative binomial distribution most similarly mirrored the real distribution of records per species. The phylogenetic trees are then cut and standard errors are calculated for the PGLS functions. The relationships are then simulated using the slope-intercept formula, with a small amount of noise to avoid convergence. For simulations where slope is equal to 0, noise is maintained to ensure a non-significant relationship. For compar.gee, Neoplasia Occurrences and Non-Occurrences are generated from the same data used for the PGLS tests.
Within the simulation study, three different models of evolution, OU, BM, and Pagel’s Lambda, were used to simulate trait data with a trend. The trend was randomly chosen for each trial between −2 and 2. For the random noise experiment, both independent and dependent variables were generated randomly. The number of records per species was chosen randomly for each species from the negative binomial distribution with a minimum threshold of 1, 5, 10 or 20 records per species, to mimic different studies in the literature(17,31,32,34). For the experiments with a given trend and model, a false negative error was indicated if the p-value was greater than 0.05. For the null experiment, false positive errors were indicated if the p-value was less than 0.05. Across all parameter values and methods, the average error rates were 15.9% for the compar.gee binomial, 6.9% for PGLSSey, and 5.84% for PGLSSey.Optim. Additional summary statistics are in supplementary table 2.
Discussion
Modes of Evolution Between Clades
Figure 1 illustrates how the best-fit evolutionary models can vary across different branches of a phylogeny. While a single model may explain trait evolution over time across a large group of species, closer examination of specific clades reveals shifts in these models. Under the life history framework, natural selection leads species to adopt different trade-offs to optimize survival and reproduction. As species diverge and follow distinct life history strategies, their evolutionary trajectories can vary as a result. These changes in trait evolution often reflect shifts in those evolutionary trajectories, as species adapt to different ecological pressures and life history demands (20,55–57). Applying a single model across hundreds of species may fail to capture the diverse dynamics of trait evolution, especially when these species experience different evolutionary pressures. Utilizing multiple evolutionary models better reflects the varying rates and processes of evolution within clades. This approach provides a more accurate depiction of the complex, non-uniform evolutionary paths that shape adaptations, including those related to disease prevention. As Beaulieu et al. (2012) argue, biological reality is too complex for any single model to fully capture, and the fit of a model depends on both its parameters and the available data (58). Using multiple models helps account for the varied evolutionary pressures across species.
Figure 1.
One phylogeny, multiple modes of evolution
Insights from Ancestral State Reconstruction
Understanding the evolutionary history of cancer prevalence across species has significant implications for human medicine. Evolutionary biologists can utilize ancestral state reconstruction under a best-fit model of evolution to infer how disease traits have changed over time across lineages. If specific genetic variations or evolutionary pressures are associated with higher or lower cancer risk in certain lineages, these findings can guide human cancer research by highlighting conserved or divergent biological pathways involved in tumor suppression, DNA repair, or immune response (59–63). Ancestral state reconstructions can be applied beyond cancer research, providing valuable insights into the evolutionary drivers of various diseases that impact human health.
Best Practices for Evolutionary Analyses
Our analyses indicate that the PGLS approach provides greater accuracy in detecting trends compared to the binomial regression (Fig 3, ST2). The robustness and additional functionality of PGLSSeyOptim, in conjunction with first fitting for the best evolutionary model for trait data, provides a rigorous framework to ensure that the most appropriate assumptions are made when testing for relationships across a phylogeny. This approach parallels that of Brocklehurst in 2016, who similarly tested for the best fit evolutionary model to analyze evolution rates and variations of body size (64). As phylogenetic comparative methods (PCMs) evolve and become central tools in evolutionary medicine, it is critical to account for the different evolutionary processes that may vary for different traits and trees. Various disease traits may follow distinct evolutionary modes. Failing to account for potential variations in evolutionary models could lead to misinterpretations of phylogenetic relationships and trait correlations. When utilizing tools like PGLSSeyOptim, users should seek the best fit evolutionary model for their data and phylogeny, then utilize the best fit model and its estimated parameters for further phylogenetic analysis.
Figure 3.
Simulation results for minimums of 1,5,10, and 20 records for 50, 150 or 350 random mammalian species. False positive error rates were based on simulations with no relationship between the independent and dependent variables. False negative error rates were calculated based on simulated data that had a linear relationship between the independent and dependent variables with a slope randomly chosen uniformly between −2 and 2, but that changed along the phylogeny based on one of three models: Brownian motion (Brownian), Ornstein-Ulenbeck (OU) or Pagel’s Lambda.
Comparison of Phylogenetic Regression Models
All regression methods had similar false negative rates, across modes of evolution, number of species, and number of observations per species (Figure 3). They all appear to be well suited for analyzing disease susceptibility if it evolved randomly over evolutionary time (Brownian motion). That is, there were very few instances of false negative errors when we simulated the trait evolving via Brownian motion, regardless of the type of regression used. However, when disease susceptibility evolved under the OU model, with sudden discrete changes, all regression methods were susceptible to some level of false negatives, which was worse when there were fewer species in the analyses. The standard PGLS method (pgls.SEy) performed similarly to our optimized version (pgls.SEy.Optim), which estimates parameters within the evolutionary model used to fit the data. With just 50 species in the analysis, both PGLS regressions worked surprisingly well on data generated with the Pagel’s Lambda model, resulting in very few false negative errors. We did find that the binomial regression models are vulnerable to detecting false positive results, and this actually gets worse with more species added to the analyses (Figure 3). This discrepancy likely stems from how the underlying distribution of the necropsy data does not align well with the binomial model’s assumptions, leading to overdispersion. Overdispersion occurs here as the binomial model underestimates the variance inherent in the data, resulting in artificially low standard errors and inflated test statistics. Consequently, binomial regression models appear to fit the data better than they truly do, increasing the risk of false positives. The nature of zoological databases will likely result in a distribution of records that follows a negative binomial distribution. If using count data, researchers should ensure that their data follows the expected distribution of their given statistical method. The MCMCglmm R package allows for count data input following a negative binomial distribution(65). Other groups have employed regressions using phylogenetic independent contrasts instead of phylogenetic VCV matrices to mitigate potential errors in regression estimates caused by phylogenetic correlations. This approach is particularly useful when researchers aim to minimize the influence of phylogenetic signal and evolutionary history in their analyses (66). However, when studying evolving disease traits in a comparative setting, incorporating phylogenetic relationships is crucial for understanding trends within an evolutionary context.
Comparative phylogenetics has a clear, complementary place alongside comparative genomics in explaining the origin, risk, and persistence of human pathologies. As comparative data on diseases in non-human species increases, so shall the need for the robust utilization of comparative phylogenetic methods. Like many other subfields within evolutionary medicine, comparative oncology is a nascent field and we should expect to see an array of methodological approaches to other diseases as well (e.g., studying cardiovascular disease across the tree of life). However, just as genomics has managed to do, there must be agreed upon “best practices” for the analyses of disease prevalence data. While these best practices must be an emergent property of peer-review and discussion amongst the field, we hope that we have at least laid the groundwork for how to evaluate the robustness of these methods within evolutionary medicine.
Methods
Aggregating Necropsy Data for Species Level Analysis
We utilized a previously published dataset of cross-species cancer prevalence, containing data on 292 species and 16049 individuals (34). In short, the database is built on the necropsy records of each individual, collected, with permission, from 99 zoological institutions, aquariums, and animal housing facilities. The individual necropsy data was then aggregated, where relevant variables such as benign and malignant tumor incidence were summarized to represent each species. Species level data also includes the number of necropsy records and taxonomic information. The summarized necropsy data was then paired with species-specific adult mass data. Life history trait data was collected from two of the popular life history databases often utilized in comparative oncology studies: PanTHERIA and Amniote (50–52).
Fitting Popular Models of Evolution to Cross-Species Cancer Prevalence Data
The fitContinuous function from the geiger package allows for the fitting of phylogenies and disease trait data to various models of evolution(40). The function utilizes Maximum Likelihood Estimation (MLE) to assess the log-likelihood for each model. It iterates through different parameter estimates using the subplex optimization algorithm to maximize the log-likelihood and generate the best parameter estimates. Before fitting, trait prevalence data undergoes an arcsine square root transformation to ensure it follows a normal distribution. The AIC (Akaike Information Criterion) is then utilized to determine the best model, while accounting for overfitting. We applied this framework to fit evolutionary models to cancer prevalence data from 292 species. Based on AIC scores, Pagel’s Lambda tree transformation provided the best fit (Log-Likelihood: 124.72; AIC: −243.45; ST1). The model was fit with an intermediate lambda value of 0.46. This is typically described as an intermediate phylogenetic signal, which indicates that species tend to have similar traits as their close relatives, but not to the extent predicted by Brownian motion.
Conclusion
Evolutionary medicine has made important insights into the origin and persistence of human disease. As the field has diversified, it has continued to permeate new fields of disease and demonstrated its broad utility to medicine. Despite its successes, much of the progress made in evolutionary medicine has been conceptual, expanding our understanding of human disease within the already established framework of what we know about humans’ evolutionary history. Primary explorations of human-relevant disease are possible through the classical methods employed in evolutionary biology, namely comparative phylogenetics. When we compare human diseases across species the prevalence of that disease is well understood as a continuous species trait. Virtually every human disease that can be viewed through the lens of epidemiology (i.e., a measurement of its prevalence in a population) can be studied in a similar way across species through comparative phylogenetics. As comparative phylogenetics becomes increasingly integrated into disease research, it is essential to establish best practices to ensure accurate and meaningful results that will significantly contribute to our understanding of cancer.
Figure 2.
Ancestral State Reconstruction of Malignancy Prevalence under Pagel’s Lambda Model Over Time for 98 Mammalian Species
Funding
National Institutes of Health grant U54 CA217376 (CCM, ZTC, WJM, BY, MM, HH, HR)
National Institutes of Health grant T32 CA272303 (ZTC)
Templeton Foundation, Science of Purpose Initiative grant 62220 (BS)
Funding Statement
National Institutes of Health grant U54 CA217376 (CCM, ZTC, WJM, BY, MM, HH, HR)
National Institutes of Health grant T32 CA272303 (ZTC)
Templeton Foundation, Science of Purpose Initiative grant 62220 (BS)
Footnotes
“We must learn to treat comparative data with the same respect as we would experimental results.” (Maynard Smith and Holliday 1979, p. vii)
Supplementary Materials:
All code, tables, and figures are available at https://github.com/wjmellon/EvMedPCMS
References
- 1.Nesse RM, Williams GC. Evolution and the origins of disease. Sci Am. 1998. Nov;279(5):86–93. [DOI] [PubMed] [Google Scholar]
- 2.Williams GC, Nesse RM. The dawn of Darwinian medicine. Q Rev Biol. 1991. Mar;66(1):1–22. [DOI] [PubMed] [Google Scholar]
- 3.Nesse RM, Williams GC. On Darwinian Medicine [Internet]. www-personal.umich.edu; 1999. [cited 2023 May 30]. Available from: http://www-personal.umich.edu/~esse/Articles/OnDarMed-LifeScience-1999.PDF
- 4.Lea AJ, Clark AG, Dahl AW, Devinsky O, Garcia AR, Golden CD, et al. Evolutionary mismatch and the role of GxE interactions in human disease. ArXiv [Internet]. 2023. Feb 13; Available from: 10.1101/2021.10.24.21265442 [DOI] [Google Scholar]
- 5.Manus MB. Evolutionary mismatch. Evol Med Public Health. 2018. Aug 8;2018(1):190–1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Lea AJ, Clark AG, Dahl AW, Devinsky O, Garcia AR, Golden CD, et al. Applying an evolutionary mismatch framework to understand disease susceptibility. PLoS Biol. 2023. Sep;21(9):e3002311. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Bourrat P, Griffiths PE. The idea of mismatch in evolutionary medicine. Br J Philos Sci [Internet]. 2021. Jul 20; Available from: 10.1086/716543 [DOI] [Google Scholar]
- 8.Turaman C. Classification of the risk factors of coronary heart disease and their evolutionary origins. Health Sciences Review. 2022. Jun 1;3:100027. [Google Scholar]
- 9.Keestra SM, Male V, Salali GD. Out of balance: the role of evolutionary mismatches in the sex disparity in autoimmune disease. Med Hypotheses. 2021. Jun;151:110558. [DOI] [PubMed] [Google Scholar]
- 10.Campbell BC, Cajigal A. Diabetes: energetics, development and human evolution. Med Hypotheses. 2001. Jul;57(1):64–7. [DOI] [PubMed] [Google Scholar]
- 11.Nesse RM. Fear and fitness: An evolutionary analysis of anxiety disorders. Ethol Sociobiol [Internet]. 1994; Available from: https://www.sciencedirect.com/science/article/pii/0162309594900027 [Google Scholar]
- 12.Nesse RM. Good reasons for bad feelings: Insights from the frontier of evolutionary psychiatry. 2019; Available from: https://books.google.ca/books?hl=en&lr=&id=frmHDwAAQBAJ&oi=fnd&pg=PR13&ots=TBHFiHMFwl&sig=4usb1AswcmAYa-x4Ex8z_v5i9m8
- 13.Nesse RM. Why Evolutionary Do Mental Disorders Persist? Perspectives on Evolution and Mental Health [Internet]. 2022; Available from: https://books.google.com/books?hl=en&lr=&id=Tph-EAAAQBAJ&oi=fnd&pg=PA84&dq=Why+Evolutionary+Mental+Disorders+Persist&ots=d6KaREPMi8&sig=bnsD-2uUjZKVB6BjEd76YEgWgB0 [Google Scholar]
- 14.Nesse RM. Is depression an adaptation? Arch Gen Psychiatry [Internet]. 2000; Available from: https://jamanetwork.com/journals/jamapsychiatry/article-abstract/481547 [DOI] [PubMed] [Google Scholar]
- 15.Jacqueline C, Biro PA, Beckmann C, Moller AP, Renaud F, Sorci G, et al. Cancer: A disease at the crossroads of trade-offs. Evol Appl. 2017. Mar;10(3):215–25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Nedelcu AM. Evo-devo perspectives on cancer. Essays Biochem. 2022. Dec 8;66(6):797–815. [DOI] [PubMed] [Google Scholar]
- 17.Kapsetaki SE, Compton ZT, Dolan J, Harris VK, Mellon W, Rupp SM, et al. Life history traits and cancer prevalence in birds. Evol Med Public Health [Internet]. 2024. Jun 27; Available from: https://academic.oup.com/emph/advance-article/doi/10.1093/emph/eoae011/7700206 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Boddy AM, Abegglen LM, Pessier AP, Aktipis A, Schiffman JD, Maley CC, et al. Lifetime cancer prevalence and life history traits in mammals. Evol Med Public Health. 2020. May 25;2020(1):187–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Dujon AM, Boutry J, Tissot S, Lemaître JF, Boddy AM, Gérard AL, et al. Cancer Susceptibility as a Cost of Reproduction and Contributor to Life History Evolution. Frontiers in Ecology and Evolution [Internet]. 2022;10. Available from: https://www.frontiersin.org/articles/10.3389/fevo.2022.861103 [Google Scholar]
- 20.Brommer JE. The evolution of fitness in life-history theory. Biol Rev Camb Philos Soc. 2000. Aug;75(3):377–404. [DOI] [PubMed] [Google Scholar]
- 21.Boddy AM, Kokko H, Breden F, Wilkinson GS, Aktipis CA. Cancer susceptibility and reproductive trade-offs: a model of the evolution of cancer defences. Philos Trans R Soc Lond B Biol Sci [Internet]. 2015. Jul 19;370(1673). Available from: 10.1098/rstb.2014.0220 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Boddy AM, Harrison TM, Abegglen LM. Comparative Oncology: New Insights into an Ancient Disease. iScience. 2020. Aug 21;23(8):101373. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Cornwallis CK, Griffin AS. A guided tour of phylogenetic comparative methods for studying trait evolution. Annu Rev Ecol Evol Syst [Internet]. 2024. Aug 6 [cited 2024 Oct 22]; Available from: https://www.annualreviews.org/content/journals/10.1146/annurev-ecolsys-102221-050754 [Google Scholar]
- 24.Felsenstein J. Phylogenies and the Comparative Method. Am Nat. 1985. Jan 1;125(1):1–15. [Google Scholar]
- 25.Felsenstein J. Phylogenies from molecular sequences: inference and reliability. Annu Rev Genet. 1988;22:521–65. [DOI] [PubMed] [Google Scholar]
- 26.Felsenstein J. Phylogenies and quantitative characters. Annu Rev Ecol Syst [Internet]. 1988; Available from: https://www.jstor.org/stable/2097162?casa_token=MpYB7MCAtCsAAAAA:1svZfnl3lAF8SXNE4FVQFUUnpf-2QAHzvJkzBzcHgYupr7plC1eAcL7hHyl3NpZlhlpOjeViNczJq8ob7efuE3OExcZdcHdTvBlO23l9GJw0c8GMQg [Google Scholar]
- 27.Pagel MD, Harvey PH. Comparative methods for examining adaptation depend on evolutionary models. Folia Primatol (Basel). 1989. Sep 11;53(1–4):203–20. [DOI] [PubMed] [Google Scholar]
- 28.Ridley M. The explanation of organic diversity: the comparative method and adaptations for mating. New York: Oxford University Press; 1983. 1 online resource (viii, 272 pages illustrations). (Oxford science publications). [Google Scholar]
- 29.Garland T Jr, Bennett AF, Rezende EL. Phylogenetic approaches in comparative physiology. J Exp Biol. 2005. Aug;208(Pt 16):3015–35. [DOI] [PubMed] [Google Scholar]
- 30.Rezende EL, F. DFJA. Phylogenetic analyses: comparing species to infer adaptations and physiological mechanisms. Comprehensive Physiology. 2012;2(1):639–74. [DOI] [PubMed] [Google Scholar]
- 31.Vincze O, Colchero F, Lemaître JF, Conde DA, Pavard S, Bieuville M, et al. Cancer risk across mammals. Nature. 2022. Jan;601(7892):263–7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Boddy AM, Abegglen LM, Pessier AP, Schiffman JD, Maley CC, Witte C. Lifetime cancer prevalence and life history traits in mammals. Evol Med Public Health [Internet]. 2020. May 25 [cited 2020 Jul 9]; Available from: https://academic.oup.com/emph/article-pdf/doi/10.1093/emph/eoaa015/33293561/eoaa015.pdf [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Bulls SE, Platner L, Ayub W, Moreno N, Arditi JP, Dreyer S, et al. Cancer prevalence is related to body mass and lifespan in tetrapods and remarkably low in turtles [Internet]. bioRxiv. 2023. [cited 2023 Mar 21]. p. 2022.07.12.499088. Available from: https://www.biorxiv.org/content/biorxiv/early/2023/02/24/2022.07.12.499088 [Google Scholar]
- 34.Compton ZT, Mellon W, Harris VK, Rupp S, Mallo D, Kapsetaki SE, et al. Cancer prevalence across vertebrates. Cancer Discov. 2024. Oct 24;OF1–18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.McShea DW, Wang SC, Brandon RN. A quantitative formulation of biology’s first law. Evolution. 2019. Jun;73(6):1101–15. [DOI] [PubMed] [Google Scholar]
- 36.Revell LJ, Harmon LJ, Collar DC. Phylogenetic signal, evolutionary process, and rate. Syst Biol. 2008. Aug;57(4):591–601. [DOI] [PubMed] [Google Scholar]
- 37.Harmon LJ. Phylogenetic comparative methods. 2019; Available from: https://ecoevorxiv.org/repository/object/4486/download/9000/?embed=True
- 38.Hunt G, Hopkins MJ, Lidgard S. Simple versus complex models of trait evolution and stasis as a response to environmental change. Proc Natl Acad Sci U S A. 2015. Apr 21;112(16):4885–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Harmon LJ, Weir JT, Brock CD, Glor RE, Challenger W. GEIGER: investigating evolutionary radiations. Bioinformatics. 2008. Jan 1;24(1):129–31. [DOI] [PubMed] [Google Scholar]
- 40.Pennell MW, Eastman JM, Slater GJ, Brown JW, Uyeda JC, FitzJohn RG, et al. Geiger v2.0: An expanded suite of methods for fitting macroevolutionary models to phylogenetic trees. Bioinformatics. 2014. Aug 1;30(15):2216–8. [DOI] [PubMed] [Google Scholar]
- 41.Revell LJ. Ancestral character estimation under the threshold model from quantitative genetics. Evolution. 2014. Mar;68(3):743–59. [DOI] [PubMed] [Google Scholar]
- 42.Castiglione S, Tesone G, Piccolo M, Melchionna M, Mondanaro A, Serio C, et al. A new method for testing evolutionary rate variation and shifts in phenotypic evolution. Methods Ecol Evol. 2018. Apr;9(4):974–83. [Google Scholar]
- 43.Baker J, Meade A, Pagel M, Venditti C. Adaptive evolution toward larger size in mammals. Proc Natl Acad Sci U S A. 2015. Apr 21;112(16):5093–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Garland T Jr, Ives AR. Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods. Am Nat. 2000. Mar;155(3):346–64. [DOI] [PubMed] [Google Scholar]
- 45.Uhlenbeck GE, Ornstein LS. On the theory of the Brownian motion. Physical review [Internet]. 1930; Available from: https://journals.aps.org/pr/abstract/10.1103/PhysRev.36.823 [Google Scholar]
- 46.Felsenstein J. Maximum-likelihood estimation of evolutionary trees from continuous characters. Am J Hum Genet. 1973. Sep;25(5):471–92. [PMC free article] [PubMed] [Google Scholar]
- 47.Blomberg SP, Rathnayake SI, Moreau CM. Beyond Brownian motion and the Ornstein-Uhlenbeck process: Stochastic diffusion models for the evolution of quantitative characters. Am Nat. 2020. Feb 1;195(2):145–65. [DOI] [PubMed] [Google Scholar]
- 48.Hunt G. Measuring rates of phenotypic evolution and the inseparability of tempo and mode. Paleobiology. 2012;38(3):351–73. [Google Scholar]
- 49.Pagel M. Inferring the historical patterns of biological evolution. Nature. 1999. Oct 28;401(6756):877–84. [DOI] [PubMed] [Google Scholar]
- 50.Jones KE, Bielby J, Cardillo M, Fritz SA, O’Dell J, Orme CDL, et al. PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals: Ecological Archives E090–184. Ecology. 2009;90(9):2648–2648. [Google Scholar]
- 51.de Magalhães JP, Costa J. A database of vertebrate longevity records and their relation to other life-history traits. J Evol Biol. 2009. Aug;22(8):1770–4. [DOI] [PubMed] [Google Scholar]
- 52.Myhrvold NP, Baldridge E, Chan B, Sivam D, Freeman DL, Ernest SKM. An amniote life-history database to perform comparative analyses with birds, mammals, and reptiles. Ecology. 2015. Nov;96(11):3109–3000. [Google Scholar]
- 53.Revell LJ. phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol. 2012;3(2):217–23. [Google Scholar]
- 54.Paradis E, Claude J. Analysis of comparative data using generalized estimating equations. J Theor Biol. 2002. Sep 21;218(2):175–85. [DOI] [PubMed] [Google Scholar]
- 55.Stearns SC. Life history evolution: successes, limitations, and prospects. Naturwissenschaften. 2000. Nov;87(11):476–86. [DOI] [PubMed] [Google Scholar]
- 56.Lika K, Kooijman SALM. Life history implications of allocation to growth versus reproduction in dynamic energy budgets. Bull Math Biol. 2003. Sep;65(5):809–34. [DOI] [PubMed] [Google Scholar]
- 57.Roff D. Evolution of life histories: Theory and analysis. 1993rd ed. Roff DA, editor. New York, NY: Springer; 1993. 548 p. [Google Scholar]
- 58.Beaulieu JM, Jhwueng DC, Boettiger C, O’Meara BC. Modeling stabilizing selection: expanding the Ornstein-Uhlenbeck model of adaptive evolution. Evolution. 2012. Aug;66(8):2369–83. [DOI] [PubMed] [Google Scholar]
- 59.Greaves M. Evolutionary determinants of cancer. Cancer Discov. 2015. Aug;5(8):806–20. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Abegglen LM, Caulin AF, Chan A, Lee K, Robinson R, Campbell MS, et al. Potential Mechanisms for Cancer Resistance in Elephants and Comparative Cellular Response to DNA Damage in Humans. JAMA. 2015. Nov 3;314(17):1850–60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Vazquez JM, Pena MT, Muhammad B, Kraft M, Adams LB, Lynch VJ. Parallel evolution of reduced cancer risk and tumor suppressor duplications in Xenarthra. Elife [Internet]. 2022. Dec 8;11. Available from: 10.7554/eLife.82558 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Seluanov A, Gladyshev VN, Vijg J, Gorbunova V. Mechanisms of cancer resistance in long-lived mammals. Nat Rev Cancer. 2018. Jul;18(7):433–41. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Vazquez JM, Sulak M, Chigurupati S, Lynch VJ. A Zombie LIF Gene in Elephants Is Upregulated by TP53 to Induce Apoptosis in Response to DNA Damage. Cell Rep. 2018. Aug 14;24(7):1765–76. [DOI] [PubMed] [Google Scholar]
- 64.Brocklehurst N. Rates and modes of body size evolution in early carnivores and herbivores: a case study from Captorhinidae. PeerJ. 2016. Jan 11;4(e1555):e1555. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 65.Hadfield JD. MCMC methods for multi-response generalized linear mixed models: TheMCMCglmmRPackage. J Stat Softw. 2010. Feb 2;33(2):1–22.20808728 [Google Scholar]
- 66.Adams R, Cain Z, Assis R, DeGiorgio M. Robust phylogenetic regression. Syst Biol. 2024. May 27;73(1):140–57. [DOI] [PMC free article] [PubMed] [Google Scholar]