Abstract
Tip-dating methods are becoming popular alternatives to traditional node calibration approaches for building time-scaled phylogenetic trees, but questions remain about their application to empirical datasets. We compared the performance of the most popular methods against a dated tree of fossil Canidae derived from previously published monographs. Using a canid morphology dataset, we performed tip-dating using BEAST v. 2.1.3 and MrBayes v. 3.2.5. We find that for key nodes (Canis, approx. 3.2 Ma; Caninae, approx. 11.7 Ma) a non-mechanistic model using a uniform tree prior produces estimates that are unrealistically old (27.5 and 38.9 Ma, respectively). Mechanistic models (incorporating lineage birth, death and sampling rates) estimate ages that are closely in line with prior research. We provide a discussion of these two families of models (mechanistic versus non-mechanistic) and their applicability to fossil datasets.
Keywords: tip-dating, total evidence dating, Canidae, MrBayes, BEASTmasteR, uniform tree prior
1. Introduction
‘Tip-dating’ methods allow fossils to be incorporated as terminal taxa in divergence dating analysis. These methods require a tree model that allows non-contemporaneous tips. These models can be categorized broadly into two types: mechanistic models, where trees are a function of parametrized speciation, extinction and sampling processes, termed birth–death–serial–sampling (BDSS; [1]) or fossilized birth–death (FBD; [2]) models; and the non-mechanistic uniform prior on trees and node ages [3], which does not have parameters for the rates of these processes. BDSS/FBD models can allow or disallow sampled ancestors (SAs; [4,5]). Importantly, tip-dating methods allow researchers to avoid relying on node calibrations. While node calibration approaches are valuable, they are subject to a number of well-known criticisms [2,3,6–8] such as subjectivity and incomplete use of information. Node calibration also weakens inferential capacity by requiring a priori constraint of dates that researchers would prefer to infer.
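To make the idea of a mechanistic tree model concrete, the following is a minimal sketch of a forward simulation in which lineage counts and fossil occurrences arise from constant per-lineage birth, death and sampling rates. It is an illustration only, not the model implementation in BEAST2 or MrBayes: the function name, the rate values and the assumption that a sampled lineage persists after sampling (as in FBD-style models) are all choices made for this example.

```python
import random

def simulate_bdss(birth, death, sampling, t_max, seed=0, max_lineages=10_000):
    """Simulate lineage counts under constant per-lineage rates (hypothetical helper).

    Time runs forward from the clade origin (t = 0) to t_max.
    Returns (number of lineages alive when the simulation stops,
             list of fossil sampling times).
    Assumes birth + death + sampling > 0.
    """
    rng = random.Random(seed)
    t = 0.0
    n = 1                      # a single lineage at the origin
    fossils = []
    total = birth + death + sampling
    while 0 < n < max_lineages:
        # Waiting time to the next event anywhere among the n lineages.
        t += rng.expovariate(n * total)
        if t >= t_max:
            break
        u = rng.random() * total
        if u < birth:
            n += 1             # speciation
        elif u < birth + death:
            n -= 1             # extinction
        else:
            fossils.append(t)  # fossil sampled; the lineage persists (assumption)
    return n, fossils

if __name__ == "__main__":
    # Illustrative rates per lineage per myr; not estimates from this study.
    alive, fossil_times = simulate_bdss(0.3, 0.2, 0.5, t_max=20.0, seed=1)
    print(alive, len(fossil_times))
```

A uniform tree prior has no counterpart to the three rate parameters in this sketch, which is the distinction drawn above between the two families of models.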
As a result of these analytical advantages, tip-dating methods are becoming popular. However, some studies using these approaches on empirical datasets have reached negative conclusions about the plausibility of inferred dates (references in the electronic supplementary material). While tip-dating methods have been validated against simulations, it is debatable to what extent the manufactured histories are comparable to the complexity of real evolutionary histories [9]. For empirical work, it can be difficult to tell whether problematic inferences in a particular study are due to the data, the methods, human error or a combination of the three.
It may therefore be useful to compare tip-dating inferences on a high-quality empirical dataset, one where the fossil record strongly corroborates key divergence times without Bayesian computational methods. An ideal dataset would also avoid difficulties found in classic dating questions such as the origin of angiosperms, placental mammals, crown birds and the Cambrian phyla (table 1). Suitable fossil datasets are rare, but one for which a strong argument (table 1) can be made is the fossil Canidae (dog family; [11]). Monographs on the three Canidae subfamilies Hesperocyoninae [12], Borophaginae [13] and Caninae [14] combined cladistic analysis of discrete characters with expert knowledge of stratigraphy and continuous characters to produce species-level phylogenies dated to approximately 1–2 myr resolution. We use Canidae to compare date estimates made under mechanistic (BDSS/FBD) and non-mechanistic (uniform tree prior) models to expert opinion. We conclude that reasonable date estimation requires an appropriate choice of tree prior, which may vary by palaeontological dataset.
Table 1.
clade features that make tip-dating challenging | example clades with challenges | Canidae |
---|---|---|
clade evolved into widely disparate niches | angiosperms, mammals; hominids (forest versus savannah habitats) | clade in about the same ecological niche (carnivore) |
clade spans a mass extinction and post-extinction diversification | mammals, birds | approximately constant macroevolutionary regime |
clade has a massive worldwide radiation, and/or biogeographic history in region with weaker fossil availability (e.g. Australia) | angiosperms, mammals, birds, Australian marsupials | mostly endemic to a single region (North America) for most of Canidae history |
fossils have few characters | angiosperms (pollen), bivalves | canid fossils have many characters (100+), although more desired owing to the number of extant/fossil taxa (160+) |
fossils episodic or scarce near possible clade origin | placentals, angiosperms, Cambrian arthropods | fossils preserved continuously throughout clade history (40–0 Ma) |
morphological evolution affecting preservability | angiosperms (woody versus herbaceous); Cambrian phyla (soft versus hard parts; body size) | approximately constant preservability |
likely changes in molecular/morphological rate (owing to major changes in body size, population size, growth rate, etc.) | angiosperms (woody versus herbaceous, annuals versus perennials) | moderate change |
available coded fossils represent only a small proportion of total known diversity | e.g. O'Leary et al. [10] placental dataset | coded fossil diversity greatly exceeds extant diversity |
2. Methods
(a). Data
The ‘expert tree’ was digitized from the monographs of Wang & Tedford [12–14] using TreeRogue [15], with judgement calls resolved in favour of preserving the authors' depiction of divergence times (electronic supplementary material). Morphological characters and dates came from Slater [16,17].
(b). Tip-dating analyses
MrBayes analyses were conducted by modification of Slater's commands file. Fifty-eight variants of MrBayes analyses were constructed to investigate several issues noted in the interaction of MrBayes versions and documentation, and Slater's commands file (electronic supplementary material, appendix 1).
We compared the expert tree (figure 1a) and Slater's published uniform tree prior analysis that included many node-date constraints (figure 1b: mb1_orig) to six focal analyses (four MrBayes v. 3.2.5 analyses and two BEAST v. 2.1.3 analyses). These were (figure 1c) mb1_UC: Slater's analysis with various corrections; (figure 1d) mb8_UU: uniform tree prior, uninformative priors on clock parameters and no node date calibrations except for a required root age calibration, set to uniform(45, 100) to represent the common situation where researchers wish to infer node dates rather than pre-specify them; (figure 1f) mb9x_SA: mb8_UU but with SA-BDSS tree prior and flat priors on speciation, extinction and sampling rate; (figure 1e) mb10_noSA: mb9x_SA but noSA-BDSS, i.e. disallowing SAs; (figure 1g) r1_noSA: BEAST2 noSA-BDSS analysis with flat priors used for each major parameter (mean and s.d. of the lognormal relaxed clock; and birth, death and serial sampling rates); (figure 1h) r2_SA: BEAST2 SA-BDSS analysis with the same priors. BEAST2 analyses were constructed with BEASTmasteR [18,19]; full details on the analyses are in the electronic supplementary material.
3. Results
The six focal analyses are compared in figure 1, and key priors and results are shown in the electronic supplementary material, table S1. The unconstrained MrBayes uniform tree prior analysis (mb8_UU) produces estimates with implausibly old ages and huge uncertainties, and with the age of Canidae overlapping the K–Pg boundary. This behaviour was also noted by Slater [16]. The expert-tree dates of crown Canis (which includes Cuon, Lycaon and Xenocyon) and crown Caninae are approximately 3.2 and approximately 11.7 Ma, but mb8_UU makes mean estimates of 27.5 and 38.9 Ma, and even the wide 95% highest posterior densities (HPDs), spanning 22–25 myr, do not overlap expert opinion. More surprisingly, even Slater's highly constrained analysis (mb1_UC), although closer, does not produce HPDs (5.1–9.6 Ma; 17.8–25.5 Ma) that overlap expert-tree dates. By contrast, both BEAST2 estimates (r1_noSA and r2_SA) and both MrBayes BDSS estimates (mb10_noSA and mb9x_SA) are within approximately 1–2 myr of expert estimates (HPD widths approx. 2–3 myr). The date of total-group Canidae (node 3, figure 1) matches the expert tree when it has been constrained (mb1_UC), but is 27 myr older in mb8_UU, and consistently approximately 3–5 myr younger in BDSS-type analyses.
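The comparison described above reduces to simple arithmetic on the quoted values. The sketch below uses only the expert-tree ages, the mb1_UC HPDs and the mb8_UU posterior means given in the text; the variable and function names are introduced here for illustration and do not come from the study's scripts.

```python
# Ages in Ma, as quoted in the Results text.
EXPERT = {"Canis": 3.2, "Caninae": 11.7}
MB1_UC_HPD = {"Canis": (5.1, 9.6), "Caninae": (17.8, 25.5)}
MB8_UU_MEAN = {"Canis": 27.5, "Caninae": 38.9}

def hpd_contains(hpd, age):
    """Return True if the age falls inside the (lower, upper) HPD interval."""
    lower, upper = hpd
    return lower <= age <= upper

def excess_age(estimate, expert_age):
    """How much older (in myr) an estimate is than the expert-tree age."""
    return round(estimate - expert_age, 1)

for node, expert_age in EXPERT.items():
    overlaps = hpd_contains(MB1_UC_HPD[node], expert_age)
    older_by = excess_age(MB8_UU_MEAN[node], expert_age)
    print(f"{node}: mb1_UC HPD contains expert age = {overlaps}; "
          f"mb8_UU mean is {older_by} myr older")
```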
Additional comparisons are available in the electronic supplementary material and tables S1–S2, including comparisons of topological distances between the Bayesian dating estimates and an undated MrBayes analysis on the same data and posterior prediction of tip dates. The electronic supplementary material, appendix S1 also discusses difficulties observed in some non-focal runs.
4. Discussion
The result of greatest interest is the contrast between expert-tree dates and dates inferred with the uniform tree prior. Whether or not this is surprising may depend on researcher background. We argue from first principles that effective tip-dating under the uniform tree prior will be difficult without strongly informative priors on node dates and/or clock rate and variability. Apart from such constraints, nothing in the tip dates or the uniform tree prior restricts the age of nodes below the dated tips; thus, in our fossils-only analysis, the node ages are scaled up and down as the root age is sampled according to the root age prior. Without informative priors, the clock rate and variability parameters will adjust along with the tree height; highly uncertain node ages will result.
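The confounding of clock rate and time in this argument can be illustrated with a deliberately simplified example. The sketch below treats the number of character changes on a single branch as Poisson-distributed with mean rate × duration; this is an assumption made for illustration (the analyses here use a full morphological model, not change counts), and the numbers are arbitrary. Because the likelihood depends only on the product of rate and duration, stretching the branch tenfold while slowing the clock tenfold leaves it unchanged.

```python
import math

def branch_loglik(n_changes, rate, duration):
    """Poisson log-likelihood of n_changes on a branch (illustrative model only).

    The expected number of changes is rate * duration, so the likelihood
    depends on the product of the two, not on either one separately.
    """
    expected = rate * duration
    return n_changes * math.log(expected) - expected - math.lgamma(n_changes + 1)

if __name__ == "__main__":
    # Hypothetical values: 4 changes, clock rate per myr, duration in myr.
    young_fast = branch_loglik(4, rate=0.02, duration=10.0)
    old_slow = branch_loglik(4, rate=0.002, duration=100.0)
    print(math.isclose(young_fast, old_slow))
```

In this simplified setting the data alone cannot distinguish a young tree with a fast clock from an old tree with a slow one; something else, such as an informative prior or a tree model that penalizes long unsampled branches, has to supply the time information.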
Despite what first principles suggest, we suspect our results may surprise some researchers. The MrBayes uniform tree prior was the leading model in the early tip-dating literature (11 out of 16 papers as of mid-2015, nine of them as the exclusive Bayesian tip-dating method; electronic supplementary material), and until recently (October 2014, v. 3.2.3), the uniform tree prior was the only option available in MrBayes. Early tip-dating efforts in BEAST/BEAST2 required tedious manual editing of XML and/or elaborate scripting efforts (such as BEASTmasteR), whereas MrBayes was relatively easy to use. Therefore, many early attempts at tip-dating used the uniform tree prior.
In contrast to the results with the uniform tree prior, analyses using BDSS/FBD tree priors (mb10_noSA, mb9x_SA, r1_noSA, r2_SA) retrieved results that approximate previous age estimates. Given only the characters and tip-dates, and with uninformative priors on parameters and the root age, these analyses were able to estimate node ages that were close to expert opinion, with a high rate of fossil sampling limiting node ages. These analyses gave more reasonable age and uncertainty estimates than the uniform tree prior even when the analysis was given substantial additional information in the form of many node calibrations (mb1_UC). Even well constrained uniform tree prior analyses displayed a tendency to space node ages evenly between calibrations and tip dates, regardless of morphological branch lengths (electronic supplementary material).
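One way to see why a high rate of fossil sampling limits node ages is to ask how probable it is that a lineage persists for a given duration without leaving a sampled fossil. The sketch below assumes fossil recovery along a lineage is a Poisson process with a constant rate, which is one ingredient of BDSS/FBD-type models rather than the full tree density; the function name and the sampling rates are hypothetical and are not estimates from this study.

```python
import math

def prob_unsampled(sampling_rate, duration):
    """Probability that a lineage leaves no sampled fossil over `duration` myr,
    assuming Poisson sampling at `sampling_rate` per lineage per myr."""
    return math.exp(-sampling_rate * duration)

if __name__ == "__main__":
    # Compare short and long unsampled branches at low and high sampling rates.
    for rate in (0.05, 0.5):
        for duration in (2.0, 20.0):
            p = prob_unsampled(rate, duration)
            print(f"rate={rate}, duration={duration} myr: P(no fossils) = {p:.3g}")
```

Under this simplification, pushing a node much deeper than its oldest fossil descendants implies long unsampled branches, which become increasingly improbable as the sampling rate rises. The uniform tree prior contains no such penalty.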
Tip-dating with the uniform tree prior was introduced [3] as an alternative to node calibration, attractive because tip-dating avoided various undesirable compromises that researchers are forced to make when constructing node-age priors. Ronquist et al. [3] also critiqued Stadler's [1] BDSS prior as being ‘complete but unrealistic’, particularly owing to assumptions about constant birth/death/sampling rates and sampling in the Recent. They offered the uniform prior as an alternative, free of these difficulties. If, however, strongly informative priors on rates or node age calibrations are required to produce reasonable results under the uniform tree prior, its main appeal is lost. The addition of BDSS/FBD models with SAs to MrBayes [5] suggests that the best prospects for tip-dating may lie in adding realism to mechanistic models, rather than in attempting to devise non-mechanistic, agnostic dating priors.
A major caveat in our study is that we did not attempt to study the effect of poorer fossil taxon sampling on the inferences made under different tree priors. Canidae are unusually well sampled. In other cases, researchers may only have a handful of fossils when true diversity was hundreds or thousands of species (closer to the situation in the exemplar Hymenoptera dataset explored by [3,5]). In such situations, the uniform tree prior's performance may improve relative to BDSS-type models attempting to estimate mechanistic parameters from few data.
A great deal of work remains to understand how best to perform tip-dating analyses. We have shown that for this high-quality dataset, mechanistic and non-mechanistic models perform quite differently, and present an argument that mechanistic models are more appropriate for this dataset.
Supplementary Material
Acknowledgements
We thank David Bapst, Graeme Lloyd, Jeremy Beaulieu, Kathryn Massana, Brian O'Meara, Graham Slater and Mike Lee for helpful comments and discussion, as well as the participants of the 2014 Society of Vertebrate Palaeontology tip-dating workshop/symposium. We also thank the BEAST developers and the beast-users Google Group, particularly Remco Bouckaert.
Data accessibility
All scripts, data files and results files are available via a zipfile on Dryad at http://dx.doi.org/10.5061/dryad.vn52f.
Authors' contributions
N.J.M. wrote BEASTmasteR, conducted the BEAST2 computational analyses and drafted the manuscript. A.W. contributed to MrBayes dating efforts and edited and corrected the manuscript. Both authors agree to be held accountable for the content therein and approve the final version of the manuscript.
Competing interests
We have no competing interests.
Funding
N.J.M. was supported by a NIMBioS fellowship under NSF award no. EF-0832858, and ARC DECRA fellowship DE150101773. Work on this topic began under the NSF Bivalves in Time and Space grant (DEB-0919451). A.W. was supported by NSF DEB-1256993.
References
- 1. Stadler T. 2010. Sampling-through-time in birth-death trees. J. Theor. Biol. 267, 396–404. (doi:10.1016/j.jtbi.2010.09.010)
- 2. Heath TA, Huelsenbeck JP, Stadler T. 2014. The fossilized birth-death process for coherent calibration of divergence-time estimates. Proc. Natl Acad. Sci. USA 111, E2957–E2966. (doi:10.1073/pnas.1319091111)
- 3. Ronquist F, Klopfstein S, Vilhelmsen L, Schulmeister S, Murray DL, Rasnitsyn AP. 2012. A total-evidence approach to dating with fossils, applied to the early radiation of the Hymenoptera. Syst. Biol. 61, 973–999. (doi:10.1093/sysbio/sys058)
- 4. Gavryushkina A, Welch D, Stadler T, Drummond AJ. 2014. Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration. PLoS Comput. Biol. 10, e1003919. (doi:10.1371/journal.pcbi.1003919)
- 5. Zhang C, Stadler T, Klopfstein S, Heath TA, Ronquist F. 2016. Total-evidence dating under the fossilized birth-death process. Syst. Biol. 65, 228–249. (doi:10.1093/sysbio/syv080)
- 6. Parham JF, et al. 2012. Best practices for justifying fossil calibrations. Syst. Biol. 61, 346–359. (doi:10.1093/sysbio/syr107)
- 7. Pyron RA. 2011. Divergence time estimation using fossils as terminal taxa and the origins of Lissamphibia. Syst. Biol. 60, 466–481. (doi:10.1093/sysbio/syr047)
- 8. Wood HM, Matzke NJ, Gillespie RG, Griswold CE. 2013. Treating fossils as terminal taxa in divergence time estimation reveals ancient vicariance patterns in the Palpimanoid spiders. Syst. Biol. 62, 264–284. (doi:10.1093/sysbio/sys092)
- 9. Hillis DM. 1995. Approaches for assessing phylogenetic accuracy. Syst. Biol. 44, 3–16. (doi:10.1093/sysbio/44.1.3)
- 10. O'Leary MA, et al. 2013. The placental mammal ancestor and the post-K–Pg radiation of placentals. Science 339, 662–667. (doi:10.1126/science.1229237)
- 11. Wang X, Tedford RH. 2008. Dogs: their fossil relatives and evolutionary history. New York, NY: Columbia University Press.
- 12. Wang X. 1994. Phylogenetic systematics of the Hesperocyoninae (Carnivora, Canidae). Bull. Am. Mus. Nat. Hist. 221, 1–207.
- 13. Wang X, Tedford RH, Taylor BE. 1999. Phylogenetic systematics of the Borophaginae. Bull. Am. Mus. Nat. Hist. 243, 1–391.
- 14. Tedford RH, Wang X, Taylor BE. 2009. Phylogenetic systematics of the North American fossil Caninae (Carnivora, Canidae). Bull. Am. Mus. Nat. Hist. 325, 1–218. (doi:10.1206/574.1)
- 15. Matzke NJ. 2013. TreeRogue: R code for digitizing trees. See https://stat.ethz.ch/pipermail/r-sig-phylo/2010-October/000816.html.
- 16. Slater GJ. 2015. Iterative adaptive radiations of fossil canids show no evidence for diversity-dependent trait evolution. Proc. Natl Acad. Sci. USA 112, 4897–4902. (doi:10.1073/pnas.1403666111)
- 17. Slater GJ. 2015. Data from: iterative adaptive radiations of fossil canids show no evidence for diversity-dependent trait evolution. Dryad. Accessed 1 May 2015.
- 18. Matzke NJ. 2015. BEASTmasteR: automated conversion of NEXUS data to BEAST2 XML format, for fossil tip-dating and other uses. PhyloWiki. See http://phylo.wikidot.com/beastmaster.
- 19. Matzke NJ. 2016. The evolution of antievolution policies after Kitzmiller versus Dover. Science 351, 28–30. (doi:10.1126/science.aad4057)