Abstract
Simultaneous molecular dating of population and species divergences is essential in many biological investigations, including phylogeography, phylodynamics and species delimitation studies. In these investigations, multiple sequence alignments consist of both intra- and interspecies samples (mixed samples). As a result, the phylogenetic trees contain interspecies, interpopulation and within-population divergences. Bayesian relaxed clock methods are often employed in these analyses, but they assume the same tree prior for both inter- and intraspecies branching processes and require specification of a clock model for branch rates (independent vs. autocorrelated rates models). We evaluated the impact of a single tree prior on Bayesian divergence time estimates by analysing computer-simulated data sets. We also examined the effect of the assumption of independence of evolutionary rate variation among branches when the branch rates are autocorrelated. The Bayesian approach with coalescent tree priors generally produced excellent molecular dates and highest posterior densities with high coverage probabilities. We also evaluated the performance of a non-Bayesian method, RelTime, which does not require the specification of a tree prior or a clock model. RelTime’s performance was similar to that of the Bayesian approach, suggesting that it is also suitable for analysing data sets containing both population and species variation when its computational efficiency is needed.
Keywords: Bayesian, divergence times, molecular dating, RelTime, tree prior
1 |. INTRODUCTION
Divergence times derived from molecular data have become critical for elucidating earth’s historical processes that have shaped the evolution of life on Earth (Hedges, Marin, Suleski, Paymer, & Kumar, 2015; Ho, 2014; O’Reilly, dos Reis, & Donoghue, 2015). Recent biological timescales of closely related taxa are generally built in the context of phylogeographic, phylodynamics and species delimitation studies (Esselstyn, Evans, Sedlock, Anwarali Khan, & Heaney, 2012; McCluskey & Postlethwait, 2015; Mello, Vilela, & Schrago, 2018; Wang, Shikano, Persat, & Merilä, 2015). Divergence times are usually estimated from molecular data with multiple individuals per species sampled. Consequently, both inter- and intraspecies divergences (nodes) are present in the same phylogenetic tree, such that both speciation and population processes are at play. Such data sets are becoming increasingly common (Edwards, Shultz, & Campbell-Staton, 2015; Lemmon & Lemmon, 2012; Manzanilla et al., 2018; McCormack, Hird, Zellmer, Carstens, & Brumfield, 2013; Melville et al., 2017; Merwe, McPherson, Siow, & Rossetto, 2014).
To accurately infer divergence times in data sets with a mixture of micro- and macro-evolutionary events, a multispecies coalescent (MSC) approach would be highly appropriate because it explicitly accounts for conflicts between gene genealogies and the species tree by modelling incomplete lineage sorting (ILS) across lineages (Degnan & Salter, 2005; Edwards & Beerli, 2000; Heled & Drummond, 2010; Maddison, 1997; Rannala & Yang, 2003; Takahata, 1989; Takahata & Nei, 1985; Yang & Rannala, 2014). However, MSC is not always applied because computational times required to estimate divergence times can be impractical for contemporary data sizes, even though some recent advances have been made to reduce the computational burden (Ogilvie, Bouckaert, & Drummond, 2017; Ogilvie, Heled, Xie, & Drummond, 2016; Rannala & Yang, 2017; Xu & Yang, 2016). Moreover, some implementations of MSC require an assumption of a strict molecular clock (Heled & Drummond, 2010; Xu & Yang, 2016; Yang, 2015).
Instead, researchers frequently use the standard Bayesian framework that is not based on the MSC, like the one implemented in the BEAST software (Bouckaert et al., 2014; dos Reis & Yang, 2011). BEAST requires specification of priors, including models that describe the branching pattern (tree prior) and that assume the absence of autocorrelated branch rates (ABR). The commonly used tree priors in BEAST analyses are those that describe speciation processes, for example Yule (Yule, 1924) and birth and death (BD) (Feller, 1939; Kendall, 1948), and within-species population processes, for example coalescent priors considering a constant population size or an exponential growth (Kuhner, Yamato, & Felsenstein, 1995, 1998). Yule and BD priors model macro-evolutionary events, so their use to describe intraspecies divergences would improperly assume these divergences to be like speciation events rather than due to intrapopulation coalescence. On the other hand, the use of coalescent priors may bias time estimation of ancient interspecies divergences, like the tMRCA (time to the most recent common ancestor) of a whole genus or several extant species.
Therefore, it is critical to investigate whether the divergence times produced by Bayesian methods are robust to the selection of tree prior and to the application of a single tree prior for trees with mixed samples. Also, the relaxed clock models in recent implementations of BEAST (version 1.10.4; Suchard et al., 2018) and BEAST 2 (Bouckaert et al., 2019) do not include autocorrelation of evolutionary rates among branches (Kishino, Thorne, & Bruno, 2001; Thorne, Kishino, & Painter, 1998). However, although studies have found statistical support for independent branch rates across lineages (e.g. Drummond, Ho, Phillips, & Rambaut, 2006; Smith, Beaulieu, & Donoghue, 2010; Linder, Britton, & Sennblad, 2011; Jarvis et al., 2014; Barba-Montoya, Dos Reis, Schneider, Donoghue, & Yang, 2018), a recent study found that autocorrelated branch rates may be the norm in biological data (Tao, Tamura, Battistuzzi, & Kumar, 2019), so the impact of rate model misspecification when using BEAST must be evaluated.
Ritchie et al. (Ritchie, Lo, & Ho, 2017) have already conducted simulations and empirical analyses to evaluate the impact of tree priors on time estimates in BEAST and tested their performance under a stepping-stone sampling approach. Their study has confirmed that the choice of tree prior can substantially impact molecular dates and highlighted the importance of testing their relative performance for data sets with a mix of both intra- and interspecies divergences. Although extensive simulations have been conducted, they considered data sets simulated under an independent branch rate (IBR) model and with an MSC branching model (Table S1) while evaluating the effect of using different numbers of intra- and interspecies sequences on estimates of divergence times with distinct tree priors. We have built upon their efforts and examined the impact of using a single tree prior on time estimates in BEAST under many other scenarios, such as an autocorrelated branch rate (ABR) model and the complete absence of ILS. Furthermore, we have evaluated the performance of a fast non-Bayesian method, RelTime, that relaxes the molecular clock throughout the tree (Battistuzzi, Tao, Jones, Tamura, & Kumar, 2018; Tamura et al., 2012; Tamura, Tao, & Kumar, 2018). Other fast non-Bayesian methods relax molecular clocks for estimating divergence times (e.g. penalized likelihood methods) (Ho & Duchêne, 2014; Sanderson, 2002). However, we have chosen RelTime because other non-Bayesian methods require the assumption of rate variation (e.g. rate autocorrelation is assumed in penalized likelihood methods) and cannot calculate the uncertainty of time estimates reliably, tending to produce overly narrow confidence intervals (Tao, Tamura, & Kumar, 2020; Tao, Tamura, Mello, & Kumar, 2019). RelTime has been reported to perform well for simulated and empirical data sets (Mello, Tao, Tamura, & Kumar, 2017; Tamura et al., 2012). 
It is often used to establish biological timescales, mainly for phylogenomic data sets (Bond et al., 2014; Chung et al., 2015; De Vega et al., 2015; Duellman, Marion, & Hedges, 2016; Harkins, Schwartz, Cartwright, & Stone, 2016; Irisarri et al., 2018; Kim et al., 2016; Kuntner et al., 2019; Mahler, Ingram, Revell, & Losos, 2013; Mahony, Foley, Biju, & Teeling, 2017; Pacheco et al., 2018; Qiao et al., 2016; Shin et al., 2018; Teixeira et al., 2017; Zhou et al., 2014). So, it is necessary to evaluate RelTime’s absolute and comparative performance for use in data sets containing both inter- and intraspecies sampling.
Overall, we have conducted many analyses and comparisons that reveal new properties of molecular dating with Bayesian and non-Bayesian approaches in addition to those reported in Ritchie et al. (2017) (Table S1). Ultimately, our analyses provide insights into many questions that are faced by practitioners in molecular dating studies.
2 |. MATERIAL AND METHOD
We analysed simulated data sets to test the performance of a Bayesian approach (BEAST 2) (Drummond et al., 2006) and the RelTime method (MEGA X) (Kumar, Stecher, Li, Knyaz, & Tamura, 2018; Tamura et al., 2012). Simulations provide valuable means to test the reliability of inferred node ages because the actual time is known and the impact of many biologically realistic conditions can be explored, including the presence/absence of autocorrelation of branch rates across the tree (ABR vs. IBR model) and the type of branching process (MSC vs. mixture model branching process).
The MSC branching process is fully dictated by a coalescent model, in which ILS is probable. The mixture branching process is a mixture model, where both coalescence and speciation processes are used to model ramification along the phylogenetic tree. In this case, a coalescent model is assumed for within-species divergences, while a speciation model is assumed for between-species divergences. Therefore, all species are reciprocally monophyletic, and incomplete lineage sorting is completely absent. This is more complex than a model fully dictated by an MSC branching process. Although it may seem biologically improbable, this type of phylogeny is often reported in the literature (e.g. Benítez-Benítez, Escudero, Rodríguez-Sánchez, Martín-Bravo, & Jiménez-Mejías, 2018; Betkas et al., 2019; Lim & Lee, 2018; Stribna et al., 2019), and one reason for obtaining such phylogenies is small effective population sizes and/or high levels of purifying selection on the analysed locus. We will use the nomenclature ILS and no-ILS to refer to the MSC branching process and mixture branching process, respectively (Figure 1). Although the MSC model does not entail the presence of ILS, we think this is a more convenient way to label the simulations, facilitating the communication of our findings.
FIGURE 1.
An example of the diversity of trees used in this study. Panels a-c, trees simulated with incomplete lineage sorting (ILS); d-f, trees simulated without incomplete lineage sorting (no-ILS). Panels a and d were simulated under a constant branch rate (CBR) model; b and e were simulated under an independent branch rate (IBR) model; and c and f were simulated under an autocorrelated branch rate (ABR) model.
2.1 |. Computer-simulated data sets
We considered two distinct scenarios to simulate data sets with a mixed sampling of intra- and interspecies sequences. First, we assumed the MSC branching model for the whole tree, hereafter referred to as ILS, which represents the baseline scenario for understanding sources of error. In this case, genealogies were simulated within species phylogeny under the MSC approach (e.g. Figure 1a–c). Second, we simulated a mixture branching model, hereafter referred to as no-ILS. Thus, all species are reciprocally monophyletic, and speciation times are concordant with the divergence of sequences (e.g. Figure 1d–f). The no-ILS scenario allowed us to assess the performance of methods under more complex evolutionary histories. Figure 2 summarizes the simulations conducted and analyses performed.
FIGURE 2.
Simulations and analyses performed in the present work. Branch rates were simulated under constant rates (CBR), independent rates (IBR) and autocorrelated rates (ABR) with 100 replicates each. Then, alignments were analysed under RelTime and BEAST 2 methods. In BEAST, four distinct tree priors were used, namely the Yule, birth–death, constant–coalescent and skyline–coalescent models. This represented 1,500 analyses for each of the no-ILS and ILS scenarios (see text for details), totalling 3,000 analyses.
For ILS simulations, phylogenies of ten species with a root height of 10 million years (Ma) and a Yule process with λ = 0.11 (speciation rate) were generated in the TreeSim R package (Stadler, 2011). The speciation rate adopted was based on the value estimated with the function yule from the ape R package (Paradis & Schliep, 2019) for a mammalian timetree containing 5,364 species (Hedges et al., 2015). Then, single-gene genealogies were simulated in the Phybase R package (Liu & Yu, 2010) with a constant scaled population parameter (theta) of 0.014 and ten individual samples per species. Assumed generation time (g) was 20 years, and the effective population size (Ne) was 50,000. These values were based on the inferences made for one of the most documented lineages in terms of population parameter estimates, the great apes (F.-C. Chen & Li, 2001; Hobolth, Dutheil, Hawks, Schierup, & Mailund, 2011; Schrago, 2014). An evolutionary rate of 3.5 × 10⁻⁹ substitutions/site/year was used, which was based on the evolutionary rate estimated from first and second codon positions of mitochondrial coding genes from vertebrate lineages (Endicott & Ho, 2008; Eo & DeWoody, 2010).
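As a consistency check on the parameters above, the scaled population parameter can be recomputed from the stated rate, generation time and effective population size. The sketch below assumes the common diploid definition θ = 4·Ne·μ, with μ expressed per site per generation; the text gives the values but not the formula, so the formula and the function name `scaled_theta` are assumptions made for illustration.

```python
# Values taken from the text.
RATE_PER_YEAR = 3.5e-9   # substitutions/site/year
GENERATION_TIME = 20     # years per generation (g)
NE = 50_000              # effective population size (Ne)


def scaled_theta(ne, rate_per_year, generation_time):
    """Return theta = 4 * Ne * mu, with mu per site per generation.

    Assumption: diploid convention (factor of 4); not stated in the text.
    """
    mu_per_generation = rate_per_year * generation_time
    return 4 * ne * mu_per_generation


theta = scaled_theta(NE, RATE_PER_YEAR, GENERATION_TIME)
print(round(theta, 6))
```

Under this assumed definition, the three stated values reproduce the reported theta of 0.014.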
The no-ILS phylogenies were simulated with the same procedure and parameters used in ILS simulations (Yule process with ten species, λ = 0.11 and root height = 10 Ma). Then, the ms function from the phyclust R package, which generates coalescent trees under a modified version of Hudson’s (2002) neutral model, was used to simulate coalescent trees under constant population size with ten samples (k = 10) (Chen, 2011; Hudson, 2002). As in the ILS simulations, g = 20 years and Ne = 50,000. Coalescent trees were then ‘pasted’ at each tip (species) of the phylogeny while keeping the expected phylogeny ultrametric because all the sequences were contemporaneous. This was done by removing the last portion of each tip species branch according to the coalescent tMRCA. An evolutionary rate of 3.5 × 10⁻⁹ substitutions/site/year was again assumed. One hundred data sets were generated for both ILS and no-ILS analyses.
Furthermore, we simulated three different types of evolutionary rate variation among branches: constant branch rates (CBR, Figure 1a,d), independent branch rates (IBR, Figure 1b,e) and autocorrelated branch rates (ABR, Figure 1c,f). In CBR, the evolutionary rate of 3.5 × 10⁻⁹ substitutions/site/year was applied to all branches of the phylogenies. In IBR, branch rates were allowed to randomly vary as much as 50% of the mean (Tamura et al., 2012). Branch rates were sampled from a uniform density in the interval [(1 − x)r, (1 + x)r], where r is the mean evolutionary rate (3.5 × 10⁻⁹ substitutions/site/year), and x is the degree of rate variation (0.5 for 50% rate variation) (Fig. S1). In ABR, we used modified functions from the NELSI R package (Ho, Duchêne, & Duchêne, 2015) using 3.5 × 10⁻⁹ substitutions/site/year as the initial rate and a ν parameter of 0.01 (Kishino et al., 2001) (Fig. S1). Sequences were simulated in seq-gen (Rambaut & Grassly, 1997) under the HKY substitution model (Hasegawa, Kishino, & Yano, 1985) to generate alignments of 10,000 sites with equilibrium frequencies and transition/transversion ratios sampled from a range of values extracted from an empirical distribution (Rosenberg & Kumar, 2003). Thus, we simulated six scenarios: CBR-ILS, IBR-ILS, ABR-ILS, CBR-no-ILS, IBR-no-ILS and ABR-no-ILS, each of which contains 100 simulated data sets. We opted to simulate one long alignment, instead of simulating and concatenating several loci with distinct modes of evolution. This is because a long alignment allowed us to evaluate the best-case scenario for molecular dating methodologies, eliminating other sources of uncertainty due to short sequences, so that the actual performance of priors under distinct rate scenarios could be evaluated. This way, we can identify the shortcomings of dating methodologies and compare our results with previous work that did not use partitioning schemes (Ritchie et al., 2017).
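The two rate-variation schemes described above can be illustrated with a short Python sketch. The IBR function follows the uniform interval given in the text. The ABR function is only an assumption-laden illustration of an autocorrelated lognormal process in the spirit of Kishino et al. (2001): the log rate of a branch is drawn around the log rate of its parent with variance ν·t. It is not the modified NELSI code used in the study, and the tree encoding (a parent-index list), the time unit for t and all function names are hypothetical.

```python
import math
import random

MEAN_RATE = 3.5e-9  # substitutions/site/year (from the text)


def sample_ibr_rates(n_branches, mean_rate=MEAN_RATE, x=0.5, rng=None):
    """Independent branch rates: uniform on [(1 - x) r, (1 + x) r]."""
    rng = rng or random.Random()
    low, high = (1 - x) * mean_rate, (1 + x) * mean_rate
    return [rng.uniform(low, high) for _ in range(n_branches)]


def sample_abr_rates(parent, branch_time, root_rate=MEAN_RATE, nu=0.01, rng=None):
    """Autocorrelated branch rates on a tree given as a parent-index list.

    Assumptions (not from the text): parent[i] is the index of node i's
    parent (None for the root), parents precede children in the list, and
    log(rate_i) ~ Normal(log(rate_parent) - nu*t/2, nu*t), so that the
    expected rate of a branch equals the rate of its parent.
    """
    rng = rng or random.Random()
    rates = [None] * len(parent)
    for node, par in enumerate(parent):
        if par is None:
            rates[node] = root_rate  # initial rate at the root
            continue
        var = nu * branch_time[node]
        mean_log = math.log(rates[par]) - var / 2
        rates[node] = math.exp(rng.gauss(mean_log, math.sqrt(var)))
    return rates


rng = random.Random(1)
ibr = sample_ibr_rates(18, rng=rng)
# Toy five-node tree: node 0 is the root; branch times are in arbitrary units.
abr = sample_abr_rates([None, 0, 0, 1, 1], [0.0, 2.0, 5.0, 3.0, 3.0], rng=rng)
```

In the IBR sketch every branch is drawn independently of its neighbours, whereas in the ABR sketch a branch inherits its expected rate from its parent, which is the property that distinguishes the two simulation scenarios.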
Additionally, to evaluate the impact of the evolutionary rate on the performance of dating methodologies, we also performed simulations under the no-ILS scenario with a faster rate of evolution, which was 1.0 × 10⁻⁸ substitutions/site/year. This was more than 2.5 times faster than the original rate used and agreed with the values reported in the literature for synonymous substitution rates and the rate of fast-evolving genes of the mitochondrial DNA of several animal lineages (Endicott & Ho, 2008; Eo & DeWoody, 2010; Lynch, 2006; Mueller, 2006). Thus, for analyses with a faster rate, we simulated three scenarios: CBR, IBR and ABR, each of which contains 100 simulated data sets and evolved under the no-ILS model. All these 100 simulated data sets, under the three distinct rate scenarios, were analysed in RelTime. For BEAST, because of the limitation of the computational times imposed by the analyses, we evaluated ten data sets from each rate scenario under the same conditions used for the slower rate simulations (see below).
2.2 |. Analysis of simulated data
Bayesian divergence times were estimated by using BEAST v2.4.7 (Bouckaert et al., 2014), and RelTime estimates were obtained using MEGA X (Kumar et al., 2018; Tamura et al., 2012). Correct substitution model (HKY) and topologies were used in all the dating analyses to prevent confounding errors from phylogeny inference and substitution model misspecification.
In BEAST 2, a strict clock was used for the data sets simulated under CBR, while the uncorrelated lognormal relaxed clock was adopted for the data sets simulated under IBR and ABR. All analyses were run with the default clock rate and mean rate priors for the strict clock and the uncorrelated lognormal relaxed clock, respectively. The root height was calibrated with a normal prior density with mean 1.0 and standard deviation of 0.05. This was done to exclude the effects of multiple calibrations and the interaction between them, which can impact the results considerably (Battistuzzi, Billing-Ross, Murillo, Filipski, & Kumar, 2015; Duchêne, Duchêne, Holmes, & Ho, 2015; dos Reis et al., 2018; dos Reis et al., 2015). Node ages were inferred based on the Yule (pure birth, BEAST-Yule) and birth–death (BEAST-BD) speciation processes in BEAST, both with default settings (Yule birthRate = uniform prior [0.0, ∞]; BDBirthRate = uniform prior [0.0, 1,000.0] and BDDeathRate = uniform prior [0.0, 1.0]), in order to test the performance of distinct Bayesian tree priors. BEAST was also run with coalescent priors, namely the constant population size (BEAST-Constant) and the skyline-coalescent (BEAST-Skyline) with default settings (PopSize = 1/X prior [−∞, ∞]; MarkovChainedPopSizes = Jeffreys, shape = 1.0) (Drummond, Rambaut, Shapiro, & Pybus, 2005). These coalescent tree priors were chosen to allow the comparison of our results with previous work (Ritchie et al., 2017). The same prior settings were used for analysing both slow and fast rate data sets. In all analyses, ESS (effective sample sizes) values were checked with the function effectiveSize from the coda R package (Plummer, Best, Cowles, & Vines, 2006), after discarding the burn-in period. The Geweke convergence diagnostic was computed in coda for the first 10% and the last 50% of all MCMC runs (after discarding the burn-in period) to test for the stationarity of every single chain (Geweke, 1992). 
The recovered distribution of z-scores for individual parameters was generally centred around zero, with few values being outside two standard deviations above or below the mean (Figs. S2–S5). No significant discrepancies were recovered in the profile of the z-score distributions across distinct simulation conditions. Because of computational constraints, two chains were run only for randomly selected replicates for each simulation scenario and tree prior, and the convergence was visually assessed in the Tracer program (Rambaut, Drummond, Xie, Baele, & Suchard, 2018). In all of these cases, the BEAST chains converged. Additionally, the univariate potential scale reduction factors (PSRF) (Gelman & Rubin, 1992) were computed in coda for each parameter of those randomly selected chains and respected the limit of 1.2 suggested in Brooks and Gelman (1998) (with 0.4% of cases having scale reduction factors within the interval 1.2–1.25). Thus, the strategy used to monitor MCMC implies that differences found in the performance of distinct tree priors are not related to the mixing/stationarity of the MCMC chains.
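To make the PSRF check concrete, the following Python sketch computes a basic univariate potential scale reduction factor from two or more chains of equal length. It is a simplified textbook form of the Gelman and Rubin (1992) statistic and is not the coda implementation used in the study (coda's gelman.diag applies an additional degrees-of-freedom correction that is omitted here); the function name `psrf` is hypothetical.

```python
import statistics


def psrf(chains):
    """Basic univariate potential scale reduction factor.

    chains: list of equal-length lists of post-burn-in samples for one
    parameter. Simplification: no degrees-of-freedom correction.
    """
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    # W: mean of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


# Two chains that sample different regions give a PSRF far above 1.2.
print(psrf([[0, 1, 2, 3], [10, 11, 12, 13]]) > 1.2)
```

Values close to 1 indicate that between-chain variation is small relative to within-chain variation, which is why a threshold such as 1.2 is used as a convergence criterion.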
In RelTime, one does not need to specify a tree prior, a clock model for evolutionary rates or a statistical distribution to describe the heterogeneity of branch rates (Tamura et al., 2012, 2018). RelTime computes relative times and lineage rates directly from the branch lengths that are inferred from the molecular sequences (Tamura et al., 2012, 2018), so branch length errors and variations in lineage rates are automatically incorporated in the calculation of confidence intervals (CIs) in RelTime (Tao, Tamura, Mello, et al., 2019). Calibrations are also not a prerequisite to estimating divergence times, so we did not use calibrations in RelTime analysis.
2.3 |. Measurements
To compare the estimated and true times, we report two primary metrics. One is the normalized time difference (Δt), which was computed for each node in every phylogeny analysed. Δt is the difference between the estimated and the true divergence time divided by the true time. For no-ILS phylogenies, we report Δt for within-population, coalescent (last coalescent within species) and interspecies comparisons to evaluate performance for these three distinct evolutionary levels. However, such a distinction is not possible for ILS data sets. Therefore, we also conducted a linear regression between all the estimated and true times in a phylogeny. The slope of the linear regression through the origin is referred to as the time slope, which is reported alongside the standard R² statistic. For BEAST analyses, we used the median value as the point estimate from the posterior distributions. The precision of time estimates was evaluated by the width of HPDs (highest posterior densities) and RelTime CIs (Tao, Tamura, Mello, et al., 2019) as well as by their coverage probabilities, that is, the frequency with which the corresponding HPD/CI included the correct time. All simulated and estimated divergence times were normalized to the height of the ingroup node so that the ingroup height was one in all cases, and all the other divergences were proportions of that. This node (the ingroup root height) was then excluded from all the accuracy and precision measures.
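The three measures defined above can be written out as a minimal Python sketch. It assumes that node times have already been normalized to the ingroup height and that the time slope is obtained by regressing estimated times on true times through the origin; the direction of the regression is not stated explicitly in the text, and the function names and toy values are hypothetical.

```python
def normalized_time_difference(estimated, true):
    """Delta-t: (estimated - true) / true for a single node."""
    return (estimated - true) / true


def time_slope(true_times, estimated_times):
    """Least-squares slope through the origin of estimated vs. true times.

    Assumption: estimated times are the response variable.
    """
    sxy = sum(t * e for t, e in zip(true_times, estimated_times))
    sxx = sum(t * t for t in true_times)
    return sxy / sxx


def coverage_probability(true_times, intervals):
    """Fraction of nodes whose (lower, upper) interval contains the true time."""
    hits = sum(1 for t, (lo, hi) in zip(true_times, intervals) if lo <= t <= hi)
    return hits / len(true_times)


# Toy values for three nodes (illustrative only, not data from the study).
true_t = [0.1, 0.2, 0.5]
est_t = [0.2, 0.4, 1.0]
ci = [(0.05, 0.15), (0.3, 0.5), (0.4, 0.6)]

dt = [normalized_time_difference(e, t) for e, t in zip(est_t, true_t)]
slope = time_slope(true_t, est_t)
coverage = coverage_probability(true_t, ci)
```

A time slope of 1 and Δt values of 0 correspond to unbiased estimates, while the coverage probability summarizes how often the reported HPD or CI contains the true time.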
3 |. RESULTS
We carried out a total of 2,400 BEAST analyses for 600 simulated data sets, as four different tree priors were applied for each data set. The RelTime analysis was conducted only once for each data set, as RelTime does not require the specification of a tree prior. RelTime calculations completed in less than 2 min for every data set, but BEAST analyses took orders of magnitude longer (e.g. Battistuzzi, Billing-Ross, Paliwal, & Kumar, 2011). Also, several BEAST MCMC calculations did not reach acceptable ESS values (≥200) even with chain lengths that exceeded one hundred million generations and took ~12 hr to run on an Intel Core i7® iMac @ 4GHz machine. This problem was encountered frequently in BEAST-Yule analyses; they failed for 15% and 32% of no-ILS data sets for IBR and ABR models, respectively. For ILS data sets, 4%, 5% and 3% of ABR replicates in BEAST-Yule, BEAST-BD and BEAST-Skyline, respectively, did not reach ESS values higher than 200. We excluded these data sets from further consideration and present the results based on the remaining data sets for BEAST analyses. Importantly, we focused the results mainly on the no-ILS simulations, as they represent the most complex scenario of the branching process.
When methods were evaluated with a faster evolutionary rate (1.0 × 10⁻⁸ substitutions/site/year) under the no-ILS scenario, their performance improved. For RelTime and BEAST, the use of a fast rate led to a general increase in accuracy for all node levels (within-pop, coalescent and interspecies). Importantly, results based on the simulations with a faster evolutionary rate were very similar to those obtained with a slower rate of evolution (3.5 × 10⁻⁹ substitutions/site/year) (Figures S6–S8). Therefore, we will present here only the slow rate scenario.
3.1 |. Performance for constant rate phylogenies
For no-ILS phylogenies, distributions of normalized difference between true and estimated time (Δt) for within-population divergences were centred around zero for BEAST (Figure 3a). BEAST-BD and BEAST-Skyline performed the best, and BEAST-Constant showed a slight tendency to overestimate times (Figure 3a). BEAST-Yule overestimated divergence times with a much higher median Δt than other tree priors (Figure 4a). In contrast, RelTime showed a tendency to underestimate within-population divergences (median Δt < 0; Figure 3a), because it assigns a time equal to 0 for nodes at which the tip sequences show a small amount of difference.
FIGURE 3.
Boxplots of the differences between true and estimated times normalized by the true times (simulations under no-ILS). Constant branch rates in the top panels (a-c), independent branch rates in the middle panels (d-f) and autocorrelated branch rates in the bottom panels (g-i). Left-hand panels a, d and g display normalized difference values for within-population divergences; centre panels b, e and h for species coalescent times; and right-hand panels c, f and i for interspecies divergences. The numbers inside parentheses indicate the percentage of outliers, which are not displayed in the figure. The colours correspond to the distinct dating methodologies: RelTime in grey, BEAST birth–death tree prior in blue, BEAST constant-size coalescent tree prior in green and BEAST-skyline-coalescent tree prior in yellow.
FIGURE 4.
Performance of the Yule prior for estimating divergence times in BEAST in the absence of incomplete lineage sorting (simulations under no-ILS). Normalized differences between actual and estimated times for intrapopulation divergences, species tMRCAs and interspecies divergences are shown. The numbers inside parentheses indicate the percentage of outliers that are not displayed in the figure.
Unlike RelTime, BEAST tree priors assign non-zero times to all the branches (Marin & Hedges, 2018). In our simulations, all the expected divergence times were non-zero, so the difference between the small non-zero divergence times assigned by BEAST (represented by the posterior median) and the expected values was smaller than the difference between the zero times assigned by RelTime and the expected values. However, this strategy would result in an overestimation of times in BEAST if some of the sampled sequences were truly identical or the sequence divergence was very small. For example, Δt for nodes < 0.1% of the root (ingroup) height was always larger than 0 for BEAST, but it was smaller for RelTime (Figure 5). Consistent with this pattern is the observation that RelTime produced fewer outlier Δt values as compared to BEAST (numbers in parentheses in Figures 3a and 4a).
FIGURE 5.
Boxplots of the differences between actual and estimated times normalized by the true times for nodes < 0.1% of the root (ingroup) height in simulations under CBR and no-ILS (100 replicates). The number inside parentheses shows outliers (in percentage) not displayed. RelTime is displayed in grey, while Yule tree prior is displayed in pink, birth–death in blue, constant-size coalescent in green and skyline-coalescent in yellow.
Patterns for coalescent time (tMRCA) estimates were similar to those for within-population divergences for BEAST and RelTime (Figures 3b and 4a), except that the median Δt were closer to zero for BEAST-Constant, BEAST-Yule and RelTime (Figures 3b and 4a). RelTime performed better in estimating species tMRCAs than within-population divergences because the effect of very short branches was much smaller. All methods performed well for interspecies comparisons, and Δt values were generally close to zero (Figures 3c and 4a).
Overall, the analyses of data simulated under a strict clock and without any ILS produced similar patterns among divergence levels. Interestingly, such situations have not been analysed previously. So, with this baseline, we can begin to assess the impact of biological complexities on divergence time inference over different time scales. For example, the presence of incomplete lineage sorting (ILS) is a possible major biological feature of the data utilized in phylogeography and species delimitation studies. We examined whether adopting an MSC process (ILS simulations) would have any impact on node time estimates. For this comparison, we used time slopes because it was generally not possible to distinguish between within-population, coalescence and interspecies divergences in ILS simulations. The distributions of time slopes, derived from 100 simulated data sets in each case, are shown for RelTime (grey), BEAST-Yule (pink), BEAST-BD (blue), BEAST-Constant (green) and BEAST-Skyline (yellow) (Figure 6). Generally, BEAST analyses under distinct tree priors and RelTime performed well, with time slopes showing a sharp peak close to 1 for no-ILS (Figure 6a) and ILS phylogenies (Figure 6d). Interestingly, the dispersion of time slopes for ILS phylogenies was smaller than that for the no-ILS phylogenies (Figure 6a,d), which is likely because no-ILS phylogenies contained a greater number of short branches near the tips of the phylogeny than the ILS phylogenies (e.g. Figure 1).
FIGURE 6.
Histograms of the slopes of the linear regressions between true times and inferred times for RelTime (grey) and BEAST under distinct tree priors: Yule (pink/red), birth–death (blue), constant-size coalescent (green) and skyline-coalescent (yellow) (for 100 simulations). Mean and standard deviation (in parentheses) values are displayed inside each panel coloured according to the method. Top panels (a-c) display results in the absence of incomplete lineage sorting (no-ILS), while bottom panels (d-f) display results in the presence of incomplete lineage sorting (ILS). Constant branch rates on the left (a and d), independent branch rates in the centre (b and e) and autocorrelated branch rates on the right (c and f).
3.2 |. Branch rates varying independently (IBR)
The presence of evolutionary rate variability across branches produced similar results to those observed for the strict clock for most of the methods (Figures 3 and 5). The main problem was observed in BEAST-Yule analysis, in which the overestimation of times became more acute (Figure 4b), and time slopes showed a clear departure away from 1 (red curves in Figure 6b,e). These results agree with the study of Ritchie et al. (2017), which already reported poor performance of BEAST-Yule for ILS phylogenies. Still, our analyses showed that this problem becomes even more acute for no-ILS phylogenies (Figure 6b, red curve). RelTime performed well for both no-ILS and ILS phylogenies simulated under the IBR model. For intrapopulation nodes, RelTime underestimated times slightly (Figure 3d), but its time slopes were closer to 1 (Figure 6b,e).
As expected, the variability of evolutionary rates caused a greater dispersion of time slopes for BEAST and RelTime for no-ILS data sets (Figure 6a vs. Figure 6b) and ILS data sets (Figure 6d vs. Figure 6e). However, the difference in dispersion for no-ILS and ILS data sets was rather small (Figure 6a vs. Figure 6d and Figure 6b vs. Figure 6e), which can be observed through the similar standard deviation (SD) values of the distributions of slopes. Overall, we found that all methods performed similarly for no-ILS and ILS phylogenies. The exception was BEAST-Yule analyses, in which the mean of the distribution of slopes was closer to one with SD values considerably lower in ILS simulations (SD = 0.124 for no-ILS data sets and SD = 0.074 for ILS data sets) (Figure 6b vs. Figure 6e). Additional simulations with branch rates randomly varying as much as 100% of the mean also showed results similar to those noted above for Bayesian methods (Figure S9). Also, RelTime produced estimates similar to those reported above.
3.3 |. Branch rate variation with autocorrelation (ABR)
In the analysis of CBR and IBR data sets, we were able to assume the correct evolutionary rate prior in the BEAST analyses. However, BEAST did not have a facility to select an autocorrelated clock model, so we examined how the use of the IBR model for ABR data sets impacts performance. There was not much difference between the time slopes for IBR and ABR models in no-ILS phylogenies, except for the BEAST-Yule analyses (Figure 6b vs. Figure 6c). This means that the violation of the clock model in BEAST when analysing ABR data sets caused only a limited loss of performance. However, the dispersion of time slopes was markedly higher for data sets with ILS (Figure 6e vs. Figure 6f). RelTime produced a distribution of error that was slightly narrower than that of BEAST under the BD and coalescent tree priors (Figure 6f). Interestingly, BEAST-Yule’s performance for ABR data sets with ILS turned out to be better than that of the other BEAST analyses (Figure 6f). This result differs from the IBR analysis in our simulation and from the previous work of Ritchie et al. (2017), which evaluated BEAST-Yule’s performance exclusively under the IBR model.
3.4 |. Precision and coverage probabilities of divergence time estimation
To evaluate the impact of using a mixed sample of population and species sequences on the precision of divergence time estimation, we examined the highest posterior density intervals (HPDs) produced by Bayesian analyses and the confidence intervals (CIs) produced by RelTime. The width of HPDs of time estimates for the analyses of data simulated under CBR was very similar among the different tree prior analyses in BEAST (Figure 7a). Generally, adopting a Yule tree prior led to narrower HPDs, while the skyline tree prior produced wider HPDs. RelTime produced wider CIs than BEAST under the CBR, regardless of the tree prior adopted.
FIGURE 7.
Boxplots of the CI widths normalized by the estimated node ages in simulations under CBR (a), IBR (b) and ABR (c) scenarios with no-ILS (average of 100 replicates). The percentages above the boxplots show coverage probabilities. RelTime is displayed in grey, the Yule tree prior in pink, birth–death in blue, constant-size coalescent in green and skyline-coalescent in yellow.
Coverage probabilities were similar for RelTime and BEAST analyses (>94%). Under IBR (Figure 7b) and ABR (Figure 7c), the Yule tree prior produced the widest HPDs among all BEAST analyses, while presenting the lowest coverage probability. The coverage probabilities of the Yule tree prior were 62.2% under the IBR scenario and 71.8% under the ABR scenario. In comparison, coverage probabilities were higher than 90% for all other tree priors and rate scenarios.
For RelTime analyses of IBR data sets, the CI widths presented the same pattern as observed for the CBR simulations. In this case, RelTime CIs were much wider than BEAST HPDs overall. RelTime coverage probability was close to 90%, while BEAST-BD, BEAST-Constant and BEAST-Skyline coverage probabilities were around 96%. For the ABR data, RelTime CI widths were wider than BEAST-BD, BEAST-Constant and BEAST-Skyline HPD widths, but narrower than BEAST-Yule. Importantly, coverage probabilities were lower for both RelTime and BEAST analyses under the ABR scenario. BEAST-BD, BEAST-Constant and BEAST-Skyline analyses showed coverage probabilities around 91%, while RelTime coverage probability was ~88%.
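The two summaries reported in this section, interval width normalized by the estimated node age and coverage probability, can be sketched as follows. This is only an illustration under simple assumptions: each node has one true age, one point estimate and one interval (an HPD or a CI), and coverage is the fraction of nodes whose interval contains the true age. The function names and the toy values are hypothetical.

```python
def coverage_probability(true_ages, intervals):
    """Fraction of nodes whose interval (lower, upper) contains the true age."""
    if len(true_ages) != len(intervals) or not true_ages:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for age, (lower, upper) in zip(true_ages, intervals)
               if lower <= age <= upper)
    return hits / len(true_ages)


def normalized_widths(estimated_ages, intervals):
    """Interval width divided by the estimated node age.

    Nodes with a zero estimate are skipped here (an assumption of this sketch),
    because the ratio is undefined for them.
    """
    return [(upper - lower) / estimate
            for estimate, (lower, upper) in zip(estimated_ages, intervals)
            if estimate > 0]


# Hypothetical toy values for four nodes of one simulated data set.
true_ages = [1.0, 2.0, 3.0, 4.0]
estimated_ages = [1.1, 1.9, 3.5, 4.0]
intervals = [(0.8, 1.4), (1.5, 2.3), (3.2, 3.8), (3.0, 5.0)]

coverage = coverage_probability(true_ages, intervals)
widths = normalized_widths(estimated_ages, intervals)
print(f"coverage = {coverage:.0%}")  # three of the four intervals contain the true age
print([round(w, 2) for w in widths])
```

Normalizing by the estimated age makes widths comparable between shallow and deep nodes, which is why a method can show wide normalized intervals at very young nodes even when the absolute widths are small.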
For data sets in which branch rates varied randomly by as much as 100% of the mean, HPD intervals from BEAST were narrower than RelTime CIs (Figure S10; see also Figure 6b). This was mainly due to the larger effect that very short branches have on RelTime than on BEAST. RelTime CI widths were much smaller when they were calculated after excluding very shallow nodes, whereas excluding these nodes had only a minor effect in BEAST (Figure S10).
4 |. DISCUSSION
Bayesian methods are frequently used to estimate sequence divergence times for data sets containing a mixed sampling of species and populations. However, there has been a paucity of computer simulation studies assessing their accuracy, apart from Ritchie et al.’s (2017) study. Therefore, we have conducted simulations under a variety of scenarios to explore the accuracy of Bayesian time estimates in mixed sample phylogenies with the presence of ILS, the violation of a strict molecular clock and the misspecification of the branch rate model (Table S1). We assessed the impact of tree priors on divergence time estimation in BEAST and showed that BEAST performs well for this kind of data set. Moreover, we evaluated the performance of RelTime, a non-Bayesian approach, on mixed sampled data. We found that RelTime is a reliable and computationally efficient method for estimating divergence times in phylogeographic, phylodynamics and species delimitation studies. In these evaluations, we chose not to study the impact of calibrations on time estimation, which other molecular dating studies have examined before (Duchêne, Lanfear, & Ho, 2014; Warnock, Parham, Joyce, Lyson, & Donoghue, 2015; Warnock, Yang, & Donoghue, 2017), in order to avoid confounding the impact of calibrations with that of rate models and other priors.
We found that the choice of tree priors had little impact on BEAST results for data sets under CBR evolution because the confounding effect of times and rates is less problematic than in the case in which rates vary among branches. For ABR and IBR phylogenies, BEAST performed well when the birth–death, constant–coalescent and skyline-coalescent priors were used. The performance of BEAST-Yule varied greatly depending on the presence/absence of rate variation, ILS and rate autocorrelation. Setting the performance of BEAST-Yule aside, one can conclude that the presence of rate variation increased the uncertainty in time estimation at least threefold (for instance, the SD of the distribution of slopes was ~0.019 and ~0.103 for ILS data sets under CBR and ABR, respectively). The difference was the highest for ILS data sets with the ABR model. Interestingly, our simulation results suggest that the accuracy of dating with data sets containing a mix of inter- and intraspecies sampling may not be strongly impacted by the actual shape of the phylogenetic trees, as ILS and no-ILS phylogenies resulted in similar performance of the dating methods.
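As a quick check of the arithmetic behind the comparison of uncertainty above, the ratio of the two approximate SD values quoted in the text for ILS data sets can be computed directly. The variable names are hypothetical, and the inputs are the rounded values given in this paragraph, so the ratio is itself approximate.

```python
# Approximate SDs of the slope distributions for ILS data sets, as quoted in the text.
sd_cbr_ils = 0.019  # constant branch rates (CBR)
sd_abr_ils = 0.103  # autocorrelated branch rates (ABR)

# Fold increase in dispersion when rates vary with autocorrelation.
fold_increase = sd_abr_ils / sd_cbr_ils
print(f"fold increase in SD = {fold_increase:.1f}")  # prints: fold increase in SD = 5.4
```

This example corresponds to the ILS/ABR case, which the text identifies as the scenario with the largest difference.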
For BEAST-Yule, the performance was significantly improved when ILS data were analysed, mainly for ABR simulations. This improvement is likely because no-ILS phylogenies contain many more closely related sequences, for which divergence dates are harder to estimate. The number of short branches close to the tips of the phylogeny is much higher in no-ILS than in ILS phylogenies (see Figure 1). This pattern leads to a stronger violation of the Yule process than in the case where ILS was allowed to occur, but it does not appear to impact dating with BD and coalescent priors. This difference is likely because the Yule prior has less flexibility than the other priors, as it considers all divergences to be speciation events and does not model extinction. This result is in agreement with Ritchie et al. (2017), who also showed that the Yule tree prior’s performance was very sensitive across a range of sampling scenarios (varying the number of species and alleles sampled per species), in contrast to BEAST-BD and coalescent tree priors, for which results were consistent across distinct sampling schemes.
Our analyses confirmed conclusions reached by Ritchie et al. (2017) about the poor performance of BEAST-Yule analyses as compared to BEAST-BD and BEAST-Skyline. In addition, our analysis of no-ILS phylogenies with the IBR model extends this conclusion, as we found BEAST-Yule to perform even worse for no-ILS data sets than for ILS data sets. BEAST-Skyline performed better across distinct sequence sampling schemes in the simulations of Ritchie et al. (2017). We have also extended their findings by showing that the BEAST-Skyline approach appears to perform the best across all rate and lineage sorting models. This is likely because the skyline approach (Drummond et al., 2005) does not assume a constant population size and can accommodate increased diversification in recent times. Interestingly, preliminary analyses for IBR and ABR simulations showed that the adoption of an MSC model to estimate divergence times in StarBEAST 2 produced results very similar to those recovered using BEAST 2 with BD and coalescent tree priors (Figure S11). Therefore, BEAST 2 may be as effective as StarBEAST 2 to infer timetrees for data sets with mixed sampling and few loci, as long as the Yule tree prior is avoided.
As for the precision of time estimates, the birth–death, constant–coalescent and skyline-coalescent tree priors had an outstanding performance in all rate scenarios, with high coverage probabilities. However, uncertainty levels were higher for BEAST-Yule analyses than for the aforementioned tree priors under IBR and ABR scenarios. Additionally, the adoption of a Yule tree prior reduced coverage probability to low values under these rate scenarios. This makes hypothesis testing in phylogeography and phylodynamics studies difficult when using the Yule tree prior because these studies generally employ data sets with a mix of intra- and interspecies sampling. Our finding is in agreement with Ritchie et al.’s (2017) findings, suggesting that the Yule tree prior is a poor choice to estimate timetrees for data sets with mixed sampling.
RelTime also performed well in inferring timetrees for data sets with mixed sampling. The performance was comparable to that of BEAST when birth–death, constant–coalescent and skyline tree priors were used. We found that RelTime’s performance was least affected by the different simulated scenarios. This is likely because RelTime does not require a priori specification of tree priors or rate models and directly estimates relative lineage rates and times from branch lengths (in units of substitutions per site).
Importantly, RelTime performed well in estimating divergences when the branch rates were autocorrelated. Tao, Tamura, Battistuzzi, et al. (2019) have shown that the autocorrelation of rates is common in molecular phylogenies, so RelTime will be suitable for empirical analysis of large data sets with a mix of inter- and intraspecies divergences. RelTime also performed well for ILS phylogenies, which is essential because ILS is widespread for data sets with mixed sampling (Jennings & Edwards, 2005; Wang et al., 2018).
In RelTime, the relative rate between sister lineages is the ratio of the evolutionary depths of the two lineages, and a relative rate framework is used without any need for model selection. It was already shown that this implementation generates a reasonable description of molecular rate evolution with minimal computational demands (Tamura et al., 2018; Tao, Tamura, Mello, et al., 2019). This approach is in contrast to Bayesian methods that require the specification of a branch rate model. Therefore, the varying modes of the rate of evolution simulated, as well as the presence/absence of ILS, did not impact RelTime performance as much as they impacted BEAST under some tree priors.
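The idea that a relative rate between sister lineages can be read from the ratio of their evolutionary depths can be illustrated with a deliberately simplified sketch. It is not the RelTime algorithm as implemented in MEGA. It assumes a strictly bifurcating tree in which each lineage is a pair (stem branch length, children), and it takes the depth of a lineage to be its stem branch length plus the mean depth of its two daughter lineages; both the data structure and this averaging rule are assumptions made for illustration. All names and branch lengths are hypothetical.

```python
def lineage_depth(node):
    """Evolutionary depth of a lineage, in substitutions per site.

    `node` is (branch_length, children), where children is None for a tip or a
    pair of nodes for an internal lineage. Assumption of this sketch: depth is
    the stem branch length plus the mean depth of the two daughter lineages.
    """
    branch_length, children = node
    if children is None:
        return branch_length
    left, right = children
    return branch_length + (lineage_depth(left) + lineage_depth(right)) / 2.0


def sister_rate_ratio(left, right):
    """Ratio of the depths of two sister lineages, used here as their relative rate."""
    depth_right = lineage_depth(right)
    if depth_right == 0:
        # Identical or nearly identical sequences give zero-length lineages,
        # for which the ratio is undefined.
        raise ValueError("sister lineage has zero depth; ratio is undefined")
    return lineage_depth(left) / depth_right


# Hypothetical sister lineages descending from the same node.
clade_a = (0.02, ((0.01, None), (0.03, None)))  # stem 0.02 plus mean tip depth 0.02
clade_b = (0.02, None)                          # a single tip with branch length 0.02

ratio = sister_rate_ratio(clade_a, clade_b)
print(f"relative rate of clade_a to clade_b = {ratio:.2f}")
```

Because sister lineages have had the same amount of time to evolve since their common ancestor, a depth ratio different from 1 is attributed to a rate difference rather than to a time difference, and no explicit model of rate change across branches needs to be chosen.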
We found a fundamental difference between RelTime and Bayesian methods in their time estimates for nodes near the tips for which sequences are identical or have very few changes (i.e. branch lengths and node heights are close to zero). RelTime will produce a time estimate of zero (or close to zero), which was generally an underestimate in our simulations. Bayesian approaches will often produce a non-zero estimate in these cases based on the tree prior chosen. Therefore, the estimates and confidence intervals for very shallow molecular dates will differ between the two methods.
Although RelTime CIs were generally wider than BEAST HPDs produced by birth–death, constant–coalescent and skyline–coalescent tree priors, both types of approaches produced excellent coverage probabilities for the ABR scenario. This reinforces the utility of the method to estimate molecular dates for data sets with mixed sampling, since the autocorrelation of rates may be widespread in such data. Additionally, RelTime generated time estimates orders of magnitude faster than BEAST. We suggest its usefulness for data sets that have a mixture of intra- and interspecies sampling (often found in phylogeographic, phylodynamics and species delimitation studies) without assuming a priori any model for the branching process on the tree or the branch rate variation.
In conclusion, our simulations covered a wide range of modes of evolution that considered distinct scenarios of rate evolution (CBR, IBR and ABR) and presence/absence of ILS, many of which were never carried out before for data sets with a mixed sampling of intra- and interspecies sequences. Therefore, our study provides important insights into the impact of tree priors on divergence time estimation in BEAST. Evaluation of RelTime’s performance on mixed sampling data also showed that RelTime could be used as a reliable and computationally efficient alternative for estimating divergence times in phylogeographic, phylodynamics and species delimitation studies.
ACKNOWLEDGEMENTS
We thank many reviewers for helpful comments on previous versions of this manuscript. We also thank Dr. Marcos Caraballo for technical assistance. This research was supported by grants from the Brazilian Research Council (CNPq, 233920/2014-5 and 409152/2018-8) to BM and National Aeronautics and Space Administration (NASA NNX16AJ30G), National Institutes of Health (GM0126567-03) and National Science Foundation (NSF 1661218 and 1932765) to SK.
Funding information
Brazilian Research Council (CNPq), Grant/Award Number: 233920/2014-5 and 409152/2018-8; National Aeronautics and Space Administration, Grant/Award Number: NNX16AJ30G; Division of Biological Infrastructure, Grant/Award Number: 1661218 and 1932765; National Institute of General Medical Sciences, Grant/Award Number: GM0126567-02
Footnotes
DATA AVAILABILITY STATEMENT
All the simulated data sets are available from Dryad (https://doi.org/10.5061/dryad.q573n5tgr). The data set consists of xml files used to perform BEAST2 (and StarBEAST2) analyses, which contain the simulated alignments and trees.
SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section.
REFERENCES
- Barba-Montoya J, Dos Reis M, Schneider H, Donoghue PCJ, & Yang Z (2018). Constraining uncertainty in the timescale of angiosperm evolution and the veracity of a Cretaceous Terrestrial Revolution. New Phytologist, 218(2), 819–834. 10.1111/nph.15011
- Battistuzzi FU, Billing-Ross P, Murillo O, Filipski A, & Kumar S (2015). A protocol for diagnosing the effect of calibration priors on posterior time estimates: A case study for the cambrian explosion of animal Phyla. Molecular Biology and Evolution, 32(7), 1907–1912. 10.1093/molbev/msv075
- Battistuzzi FU, Billing-Ross P, Paliwal A, & Kumar S (2011). Fast and slow implementations of relaxed-clock methods show similar patterns of accuracy in estimating divergence times. Molecular Biology and Evolution, 28(9), 2439–2442. 10.1093/molbev/msr100
- Battistuzzi FU, Tao Q, Jones L, Tamura K, & Kumar S (2018). RelTime relaxes the strict molecular clock throughout the phylogeny. Genome Biology and Evolution, 10(6), 1631–1636. 10.1093/gbe/evy118
- Benítez-Benítez C, Escudero M, Rodríguez-Sánchez F, Martín-Bravo S, & Jiménez-Mejías P (2018). Pliocene-Pleistocene ecological niche evolution shapes the phylogeography of a Mediterranean plant group. Molecular Ecology, 27, 1696–1713. 10.1111/mec.14567
- Betkas Y, Aksu I, Kaya C, Baycelebi E, Atasaral S, Ekmekci FG, & Turan D (2019). Phylogeny and phylogeography of the genus Alburnoides (Teleostei, Cyprinidae) in Turkey based on mitochondrial DNA sequences. Mitochondrial DNA Part A, 30(7), 794–805. 10.1080/24701394.2019.1664493
- Bond JE, Garrison NL, Hamilton CA, Godwin RL, Hedin M, & Agnarsson I (2014). Phylogenomics resolves a spider backbone phylogeny and rejects a prevailing paradigm for orb web evolution. Current Biology, 24(15), 1765–1771. 10.1016/j.cub.2014.06.034
- Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu C-H, Xie D, … Drummond AJ (2014). BEAST 2: A software platform for bayesian evolutionary analysis. PLOS Computational Biology, 10(4), e1003537. 10.1371/journal.pcbi.1003537
- Bouckaert R, Vaughan TG, Barido-Sottani J, Duchêne S, Fourment M, Gavryushkina A, … Drummond AJ (2019). BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Computational Biology, 15(4), e1006650. 10.1371/journal.pcbi.1006650
- Brooks SP, & Gelman A (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434–455. 10.2307/1390675
- Chen F-C, & Li W-H (2001). Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. The American Journal of Human Genetics, 68(2), 444–456. 10.1086/318206
- Chen W-C (2011). Overlapping codon model, phylogenetic clustering, and alternative partial expectation conditional maximization algorithm. Ph.D. Dissertation. Iowa State University
- Chung O, Jin S, Cho YS, Lim J, Kim H, Jho S, … Paek WK (2015). The first whole genome and transcriptome of the cinereous vulture reveals adaptation in the gastric and immune defense systems and possible convergent evolution between the Old and New World vultures. Genome Biology, 16(215), 1–11. 10.1186/s13059-015-0780-4
- De Vega JJ, Ayling S, Hegarty M, Kudrna D, Goicoechea JL, Ergon Å, … Skøt L (2015). Red clover (Trifolium pratense L.) draft genome provides a platform for trait improvement. Scientific Reports, 5(1), 10.1038/srep17394
- Degnan JH, & Salter LA (2005). Gene tree distributions under the coalescent process. Evolution; International Journal of Organic Evolution, 59(1), 24–37. 10.1111/j.0014-3820.2005.tb00891.x
- dos Reis M, Gunnell GF, Barba-Montoya J, Wilkins A, Yang Z, & Yoder AD (2018). Using phylogenomic data to explore the effects of relaxed clocks and calibration strategies on divergence time estimation: primates as a test case. Systematic Biology, 67(4), 594–615. 10.1093/sysbio/syy001
- dos Reis M, Thawornwattana Y, Angelis K, Telford MJ, Donoghue PCJ, & Yang Z (2015). Uncertainty in the timing of origin of animals and the limits of precision in molecular timescales. Current Biology, 25(22), 2939–2950. 10.1016/j.cub.2015.09.066
- dos Reis M, & Yang Z (2011). Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times. Molecular Biology and Evolution, 28(7), 2161–2172. 10.1093/molbev/msr045
- Drummond AJ, Ho SYW, Phillips MJ, & Rambaut A (2006). Relaxed phylogenetics and dating with confidence. PLoS Biology, 4(5), e88. 10.1371/journal.pbio.0040088
- Drummond AJ, Rambaut A, Shapiro B, & Pybus OG (2005). Bayesian coalescent inference of past population dynamics from molecular sequences. Molecular Biology and Evolution, 22(5), 1185–1192. 10.1093/molbev/msi103
- Duchêne DA, Duchêne S, Holmes EC, & Ho SYW (2015). Evaluating the adequacy of molecular clock models using posterior predictive simulations. Molecular Biology and Evolution, 32(11), 2986–2995. 10.1093/molbev/msv154
- Duchêne S, Lanfear R, & Ho SYW (2014). The impact of calibration and clock-model choice on molecular estimates of divergence times. Molecular Phylogenetics and Evolution, 78, 277–289. 10.1016/j.ympev.2014.05.032
- Duellman WE, Marion AB, & Hedges SB (2016). Phylogenetics, classification, and biogeography of the treefrogs (Amphibia: Anura: Arboranae). Zootaxa, 4104(1), 1. 10.11646/zootaxa.4104.1.1
- Edwards SV, & Beerli P (2000). Gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies. Evolution; International Journal of Organic Evolution, 54(6), 1839–1854.
- Edwards SV, Shultz AJ, & Campbell-Staton SC (2015). Next-generation sequencing and the expanding domain of phylogeography. Folia Zoologica, 64(3), 187–206. 10.25225/fozo.v64.i3.a2.2015
- Endicott P, & Ho SYW (2008). A Bayesian evaluation of human mitochondrial substitution rates. American Journal of Human Genetics, 82(4), 895–902. 10.1016/j.ajhg.2008.01.019
- Eo SH, & DeWoody JA (2010). Evolutionary rates of mitochondrial genomes correspond to diversification rates and to contemporary species richness in birds and reptiles. Proceedings of the Royal Society B: Biological Sciences, 277(1700), 3587–3592. 10.1098/rspb.2010.0965
- Esselstyn JA, Evans BJ, Sedlock JL, Anwarali Khan FA, & Heaney LR (2012). Single-locus species delimitation: A test of the mixed Yule-coalescent model, with an empirical application to Philippine round-leaf bats. Proceedings of the Royal Society B: Biological Sciences, 279(1743), 3678–3686. 10.1098/rspb.2012.0705
- Feller W (1939). Die Grundlagen der Volterraschen Theorie Des Kampfes Ums Dasein in Wahrscheinlichkeitstheoretischer Behandlung. Acta Biotheoretica, 5(1), 11–40. 10.1007/BF01602932
- Gelman A, & Rubin DB (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–511. 10.1214/ss/1177011136
- Geweke J (1992). Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In Bernardo JM, Berger JO, Dawid AP, & Smith AFM (Eds.), Bayesian statistics, 4th ed. Oxford, UK: Clarendon Press.
- Harkins KM, Schwartz RS, Cartwright RA, & Stone AC (2016). Phylogenomic reconstruction supports supercontinent origins for Leishmania. Infection, Genetics and Evolution, 38, 101–109. 10.1016/j.meegid.2015.11.030
- Hasegawa M, Kishino H, & Yano T (1985). Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution, 22(2), 160–174. 10.1007/BF02101694
- Hedges SB, Marin J, Suleski M, Paymer M, & Kumar S (2015). Tree of life reveals clock-like speciation and diversification. Molecular Biology and Evolution, 32(4), 835–845. 10.1093/molbev/msv037
- Heled J, & Drummond AJ (2010). Bayesian inference of species trees from multilocus data. Molecular Biology and Evolution, 27(3), 570–580. 10.1093/molbev/msp274
- Ho SYW (2014). The changing face of the molecular evolutionary clock. Trends in Ecology and Evolution, 29(9), 496–503. 10.1016/j.tree.2014.07.004
- Ho SYW, & Duchêne S (2014). Molecular-clock methods for estimating evolutionary rates and timescales. Molecular Ecology, 23(24), 5947–5965. 10.1111/mec.12953
- Ho SYW, Duchêne S, & Duchêne DA (2015). Simulating and detecting autocorrelation of molecular evolutionary rates among lineages. Molecular Ecology Resources, 15(4), 688–696. 10.1111/1755-0998.12320
- Hobolth A, Dutheil JY, Hawks J, Schierup MH, & Mailund T (2011). Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome Research, 21(3), 349–356. 10.1101/gr.114751.110
- Hudson RR (2002). Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 18(2), 337–338. 10.1093/bioinformatics/18.2.337
- Irisarri I, Singh P, Koblmüller S, Torres-Dowdall J, Henning F, Franchini P, … Meyer A (2018). Phylogenomics uncovers early hybridization and adaptive loci shaping the radiation of Lake Tanganyika cichlid fishes. Nature Communications, 9(1), 3159. 10.1038/s41467-018-05479-9
- Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, … Zhang G (2014). Whole-genome analyses resolve early branches in the tree of life of modern birds. Science, 346(6215), 1320–1331. 10.1126/science.1253451
- Jennings WB, & Edwards SV (2005). Speciational history of Australian grass finches (Poephila) inferred from thirty gene trees. Evolution; International Journal of Organic Evolution, 59(9), 2033–2047.
- Kendall DG (1948). On the generalized “Birth-and-Death” process. The Annals of Mathematical Statistics, 19(1), 1–15. 10.1214/aoms/1177730285
- Kim S, Cho YS, Kim H-M, Chung O, Kim H, Jho S, … Yeo J-H (2016). Comparison of carnivore, omnivore, and herbivore mammalian genomes with a new leopard assembly. Genome Biology, 17(1), 10.1186/s13059-016-1071-4
- Kishino H, Thorne JL, & Bruno WJ (2001). Performance of a divergence time estimation method under a probabilistic model of rate evolution. Molecular Biology and Evolution, 18(3), 352–361. 10.1093/oxfordjournals.molbev.a003811
- Kuhner MK, Yamato J, & Felsenstein J (1995). Estimating effective population size and mutation rate from sequence data using Metropolis-Hastings sampling. Genetics, 140(4), 1421–1430.
- Kuhner MK, Yamato J, & Felsenstein J (1998). Maximum likelihood estimation of population growth rates based on the coalescent. Genetics, 149(1), 429–434.
- Kumar S, Stecher G, Li M, Knyaz C, & Tamura K (2018). MEGA X: Molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution, 35(6), 1547–1549. 10.1093/molbev/msy096
- Kuntner M, Hamilton CA, Cheng R-C, Gregorič M, Lupše N, Lokovšek T, … Bond JE (2019). Golden orbweavers ignore biological rules: phylogenomic and comparative analyses unravel a complex evolution of sexual size dimorphism. Systematic Biology, 68(4), 555–572. 10.1093/sysbio/syy082
- Lemmon AR, & Lemmon EM (2012). High-throughput identification of informative nuclear loci for shallow-scale phylogenetics and phylogeography. Systematic Biology, 61(5), 745–761. 10.1093/sysbio/sys051
- Lim BK, & Lee TE Jr (2018). Community ecology and phylogeography of bats in the guianan savannas of Northern South America. Diversity, 10, 129. 10.3390/d10040129
- Linder M, Britton T, & Sennblad B (2011). Evaluation of Bayesian models of substitution rate evolution-parental guidance versus mutual independence. Systematic Biology, 60(3), 329–342. 10.1093/sysbio/syr009
- Liu L, & Yu L (2010). Phybase: An R package for species tree analysis. Bioinformatics, 26(7), 962–963. 10.1093/bioinformatics/btq062
- Lynch M (2006). Mutation pressure and the evolution of organelle genomic architecture. Science, 311(5768), 1727–1730. 10.1126/science.1118884
- Maddison WP (1997). Gene trees in species trees. Systematic Biology, 46(3), 523–536. 10.1093/sysbio/46.3.523
- Mahler DL, Ingram T, Revell LJ, & Losos JB (2013). Exceptional convergence on the macroevolutionary landscape in island lizard radiations. Science, 341(6143), 292–295. 10.1126/science.1232392
- Mahony S, Foley NM, Biju SD, & Teeling EC (2017). Evolutionary history of the asian horned frogs (Megophryinae): integrative approaches to timetree dating in the absence of a fossil record. Molecular Biology and Evolution, 34(3), 744–771. 10.1093/molbev/msw267
- Manzanilla V, Kool A, Nguyen Nhat L, Van Nong H, Le Thi Thu H, & de Boer HJ (2018). Phylogenomics and barcoding of Panax: Toward the identification of ginseng species. BMC Evolutionary Biology, 18, 44. 10.1186/s12862-018-1160-y
- Marin J, & Hedges SB (2018). Undersampling genomes has biased time and rate estimates throughout the tree of life. Molecular Biology and Evolution, 35(8), 2077–2084. 10.1093/molbev/msy103
- McCluskey BM, & Postlethwait JH (2015). Phylogeny of Zebrafish, a “Model Species,” within Danio, a “Model Genus.”. Molecular Biology and Evolution, 32(3), 635–652. 10.1093/molbev/msu325
- McCormack JE, Hird SM, Zellmer AJ, Carstens BC, & Brumfield RT (2013). Applications of next-generation sequencing to phylogeography and phylogenetics. Molecular Phylogenetics and Evolution, 66(2), 526–538. 10.1016/j.ympev.2011.12.007
- Mello B, Tao Q, Tamura K, & Kumar S (2017). Fast and accurate estimates of divergence times from big data. Molecular Biology and Evolution, 34(1), 45–50. 10.1093/molbev/msw247
- Mello B, Vilela JF, & Schrago CG (2018). Conservation phylogenetics and computational species delimitation of Neotropical primates. Biological Conservation, 217, 397–406. 10.1016/j.biocon.2017.11.017
- Melville J, Haines ML, Boysen K, Hodkinson L, Kilian A, Smith Date KL, … Parris KM (2017). Identifying hybridization and admixture using SNPs: Application of the DArTseq platform in phylogeographic research on vertebrates. Royal Society Open Science, 4(7), 161061. 10.1098/rsos.161061
- Merwe M, McPherson H, Siow J, & Rossetto M (2014). Next-Gen phylogeography of rainforest trees: Exploring landscape-level cpDNA variation from whole-genome sequencing. Molecular Ecology Resources, 14(1), 199–208. 10.1111/1755-0998.12176
- Mueller RL (2006). Evolutionary rates, divergence dates, and the performance of mitochondrial genes in bayesian phylogenetic analysis. Systematic Biology, 55(2), 289–300. 10.1080/10635150500541672
- Ogilvie HA, Bouckaert RR, & Drummond AJ (2017). StarBEAST2 brings faster species tree inference and accurate estimates of substitution rates. Molecular Biology and Evolution, 34(8), 2101–2114. 10.1093/molbev/msx126
- Ogilvie HA, Heled J, Xie D, & Drummond AJ (2016). Computational performance and statistical accuracy of *BEAST and comparisons with other methods. Systematic Biology, 65(3), 381–396. 10.1093/sysbio/syv118
- O’Reilly JE, dos Reis M, & Donoghue PCJ (2015). Dating tips for divergence-time estimation. Trends in Genetics, 31(11), 637–650. 10.1016/j.tig.2015.08.001
- Pacheco MA, Matta NE, Valkiunas G, Parker PG, Mello B, Stanley CE, … Escalante AA (2018). Mode and rate of evolution of haemosporidian mitochondrial genomes: Timing the radiation of avian parasites. Molecular Biology and Evolution, 35(2), 383–403. 10.1093/molbev/msx285
- Paradis E, & Schliep K (2019). ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics, 35(3), 526–528.
- Plummer M, Best N, Cowles K, & Vines K (2006). CODA: Convergence diagnosis and output analysis for MCMC. R News, 6(1), 7–11.
- Qiao Q, Xue LI, Wang Q, Sun H, Zhong Y, Huang J, … Zhang T (2016). Comparative transcriptomics of strawberries (Fragaria spp.) provides insights into evolutionary patterns. Frontiers in Plant Science, 7(1839), 1–10. 10.3389/fpls.2016.01839
- Rambaut A, Drummond AJ, Xie D, Baele G, & Suchard MA (2018). Posterior summarisation in Bayesian phylogenetics using Tracer 1.7. Systematic Biology, 67(5), 901–904. 10.1093/sysbio/syy032
- Rambaut A, & Grassly NC (1997). Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Computer Applications in the Biosciences, 13(3), 235–238. 10.1093/bioinformatics/13.3.235
- Rannala B, & Yang Z (2003). Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics, 164(4), 1645–1656.
- Rannala B, & Yang Z (2017). Efficient Bayesian species tree inference under the multispecies coalescent. Systematic Biology, 66(5), 823–842. 10.1093/sysbio/syw119
- Ritchie AM, Lo N, & Ho SYW (2017). The impact of the tree prior on molecular dating of data sets containing a mixture of inter- and intraspecies sampling. Systematic Biology, 66(3), 413–425. 10.1093/sysbio/syw095
- Rosenberg MS, & Kumar S (2003). Heterogeneity of nucleotide frequencies among evolutionary lineages and phylogenetic inference. Molecular Biology and Evolution, 20(4), 610–621. 10.1093/molbev/msg067
- Sanderson MJ (2002). Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Molecular Biology and Evolution, 19(1), 101–109. 10.1093/oxfordjournals.molbev.a003974
- Schrago CG (2014). The effective population sizes of the anthropoid ancestors of the human-chimpanzee lineage provide insights on the historical biogeography of the great apes. Molecular Biology and Evolution, 31(1), 37–47. 10.1093/molbev/mst191
- Shin S, Clarke DJ, Lemmon AR, Moriarty Lemmon E, Aitken AL, Haddad S, … McKenna DD (2018). Phylogenomic data yield new and robust insights into the phylogeny and evolution of Weevils. Molecular Biology and Evolution, 35(4), 823–836. 10.1093/molbev/msx324
- Smith SA, Beaulieu JM, & Donoghue MJ (2010). An uncorrelated relaxed-clock analysis suggests an earlier origin for flowering plants. Proceedings of the National Academy of Sciences, 107(13), 5897–5902. 10.1073/pnas.1001225107
- Stadler T (2011). Simulating trees with a fixed number of extant species. Systematic Biology, 60(5), 676–684. 10.1093/sysbio/syr029
- Stribna T, Romportl D, Demjanovič J, Vogeler A, Tschapka M, Benda P, … Hulva P (2019). Pan African phylogeography and palaeodistribution of rousettine fruit bats: Ecogeographic correlation with Pleistocene climate vegetation cycles. Journal of Biogeography, 46(10), 2336–2349. 10.1111/jbi.13651
- Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, & Rambaut A (2018). Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evolution, 4(1), vey016. 10.1093/ve/vey016
- Takahata N (1989). Gene genealogy in three related populations: Consistency probability between gene and population trees. Genetics, 122(4), 957–966.
- Takahata N, & Nei M (1985). Gene genealogy and variance of interpopulational nucleotide differences. Genetics, 110(2), 325–344.
- Tamura K, Battistuzzi FU, Billing-Ross P, Murillo O, Filipski A, & Kumar S (2012). Estimating divergence times in large molecular phylogenies. Proceedings of the National Academy of Sciences, 109(47), 19333–19338. 10.1073/pnas.1213199109
- Tamura K, Tao Q, & Kumar S (2018). Theoretical foundation of the RelTime method for estimating divergence times from variable evolutionary rates. Molecular Biology and Evolution, 35(7), 1770–1782. 10.1093/molbev/msy044
- Tao Q, Tamura K, Battistuzzi FU, & Kumar S (2019). A machine learning method for detecting autocorrelation of evolutionary rates in large phylogenies. Molecular Biology and Evolution, 36(4), 811–824. 10.1093/molbev/msz014
- Tao Q, Tamura K, & Kumar S (2020). Rapid and reliable methods for molecular dating. In The molecular evolutionary clock. Theory and practice. Edited by Ho S (accepted with revision).
- Tao Q, Tamura K, Mello B, & Kumar S (2019). Reliable confidence intervals for RelTime estimates of evolutionary divergence times. Molecular Biology and Evolution, 37(1), 280–290. 10.1093/molbev/msz236
- Teixeira MM, Moreno LF, Stielow BJ, Muszewska A, Hainaut M, Gonzaga L, … de Hoog GS (2017). Exploring the genomic diversity of black yeasts and relatives (Chaetothyriales, Ascomycota). Studies in Mycology, 86, 1–28. 10.1016/j.simyco.2017.01.001
- Thorne JL, Kishino H, & Painter IS (1998). Estimating the rate of evolution of the rate of molecular evolution. Molecular Biology and Evolution, 15(12), 1647–1657. 10.1093/oxfordjournals.molbev.a025892
- Wang C, Shikano T, Persat H, & Merilä J (2015). Mitochondrial phylogeography and cryptic divergence in the stickleback genus Pungitius. Journal of Biogeography, 42(12), 2334–2348. 10.1111/jbi.12591
- Wang K, Lenstra JA, Liu L, Hu Q, Ma T, Qiu Q, & Liu J (2018). Incomplete lineage sorting rather than hybridization explains the inconsistent phylogeny of the wisent. Communications Biology, 1, 169. 10.1038/s42003-018-0176-6
- Warnock RCM, Parham JF, Joyce WG, Lyson TR, & Donoghue PCJ (2015). Calibration uncertainty in molecular dating analyses: There is no substitute for the prior evaluation of time priors. Proceedings of the Royal Society B: Biological Sciences, 282(1798), 20141013. 10.1098/rspb.2014.1013
- Warnock RCM, Yang Z, & Donoghue PCJ (2017). Testing the molecular clock using mechanistic models of fossil preservation and molecular evolution. Proceedings of the Royal Society B, 284(1857), 20170227. 10.1098/rspb.2017.0227
- Xu B, & Yang Z (2016). Challenges in species tree estimation under the multispecies coalescent model. Genetics, 204(4), 1353–1368. 10.1534/genetics.116.190173
- Yang Z (2015). The BPP program for species tree estimation and species delimitation. Current Zoology, 61(5), 854–865. 10.1093/czoolo/61.5.854
- Yang Z, & Rannala B (2014). Unguided species delimitation using DNA sequence data from multiple loci. Molecular Biology and Evolution, 31(12), 3125–3135. 10.1093/molbev/msu279
- Yule GU (1924). A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis. Philosophical Transactions of the Royal Society B, 213, 21–87. 10.1098/rstb.1925.0002
- Zhou X, Wang B, Pan QI, Zhang J, Kumar S, Sun X, … Li M (2014). Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history. Nature Genetics, 46(12), 1303–1310. 10.1038/ng.3137