Abstract
Phylogenies are a central and indispensable tool for evolutionary and ecological research. Even though most angiosperm families are well investigated from a phylogenetic point of view, there are far fewer possibilities to carry out large-scale meta-analyses at order level or higher. Here, we reconstructed a large-scale dated phylogeny including nearly 1/8th of all angiosperm species, based on two plastid barcoding genes, matK (incl. trnK) and rbcL. Novel sequences were generated for several species, while the rest of the data were mined from GenBank. The resulting tree was dated using 56 angiosperm fossils as calibration points. The resulting megaphylogeny is one of the largest dated phylogenetic trees of angiosperms yet, consisting of 36,101 sampled species, representing 8,399 genera, 426 families and all orders. This novel framework will be useful for investigating different broad-scale research questions in ecological and evolutionary biology.
Keywords: phylogeny, angiosperms, large-scale dating analyses, evolution, ecology
Introduction
During the past two decades, awareness has grown that ecological and evolutionary studies benefit from incorporating phylogenetic information (Wanntorp et al. 1990, Webb et al. 2002). In some ecological disciplines, it has even become almost unimaginable that a spatiotemporal context is not considered when specific hypotheses are tested. For example, in the fields of community ecology, trait-based ecology and macroecology, macroevolutionary and historical biogeography research hypotheses cannot be properly tested without the incorporation of a phylogenetic framework (e.g. Graham and Fine 2008, Hardy 2008, Kissling 2017, Vandelook et al. 2012, Vandelook et al. 2018, Couvreur et al. 2011, Janssens et al. 2009, Janssens et al. 2016). Likewise, phylogenetic diversity is considered an important element in conservation biology and related biodiversity assessment studies (Chave et al. 2007). Even though the importance of phylogenetics in ecology and evolution is recognised, it remains somewhat difficult to combine ecological research with evolutionary biology and integrate it into a phylogenetic scenario. This discrepancy is sometimes caused by a lack of awareness and knowledge about the other disciplines, whereby researchers can be reluctant to reach out to such expertise and combine their results across disciplines. Additionally, differences in the methodologies and techniques applied by ecologists and evolutionary biologists can sometimes cause a certain hesitation to adopt a complementary approach that blends disciplines. In addition, there is a nearly continuous development of new insights and techniques in the fields of ecology and evolution (e.g. Bouckaert et al. 2019, Revell et al. 2008, Revell 2012, Suchard et al. 2018), making it rather challenging to keep up to date with the latest novelties.
Furthermore, not all organisms investigated from an ecological perspective are present in molecular databases, which makes it difficult to construct a perfectly matching phylogenetic hypothesis for further analysis. For scientists who focus on resolving specific evolutionary or ecological queries, building a phylogenetic framework from novel gene sequence data is often a heavy burden as it takes a lot of time, money and effort, even apart from the specific expertise needed. The construction of a purpose-built phylogeny is rather costly and labour-intensive and requires more elaborate expertise in novel techniques than mining existing sequences from GenBank in order to make a tree. Whereas the former strategy allows the user to make a tailor-made phylogeny that can be used for further ecological or evolutionary purposes, the latter is more limited, as one can only use the sequences that are available in genetic databases. Nevertheless, in the case of large-scale meta-analyses, it becomes almost impossible to obtain sequence data from all species investigated. When there is a need to examine evolutionary and ecological trends in a historical context, a large-scale phylogenetic hypothesis that is optimised in a spatiotemporal context provides an optimal solution.
There is currently an ongoing quest to optimise the methodology for constructing large-scale mega-phylogenies that can be used for further ecological and evolutionary studies. This is done by either mining and analysing publicly available DNA sequences (Zanne et al. 2014), amalgamating published phylograms (Hinchliff et al. 2015) or a combination of both (Smith and Brown 2018). For example, Zanne et al. (2014) constructed their own large supermatrix-based phylogeny that was used to gain more insights into the evolution of cold-tolerant angiosperm lineages. However, the study of Qian and Jin (2016) showed that the phylogeny of Zanne et al. (2014) contained several taxonomic errors. The approaches of Smith and Brown (2018) and Hinchliff et al. (2015) also do not always provide the optimal phylogenetic framework for further analyses, as both studies use a (partially) synthetic approach, based on already published phylograms that can putatively contain inconsistencies in their estimated node ages. The main goal of the present study is, therefore, to provide a large-scale dated phylogeny, encompassing nearly 1/8th of all angiosperms, that can be used for further ecological and evolutionary analyses. In order to construct this angiosperm phylogeny, a comprehensive approach was applied in which sequence data were both mined and generated, subsequently aligned, phylogenetically analysed and dated using over 50 fossil calibration points. With the applied methodology, we aimed to create sufficient overlap in molecular markers without having too much missing sequence data in the data matrix. In addition, phylogenetic analyses, as well as the age estimation assessment, were performed as a single analysis on the whole data matrix in order to create a dated angiosperm mega-phylogeny that is characterised by a low degree of synthesis.
Material and methods
Marker choice
In 2009, the Consortium for the Barcode of Life working group (CBOL) advised sequencing of the two plastid markers matK (incl. trnK) and rbcL for identifying plant species, resulting in a massive amount of data available on GenBank. rbcL is a conservative locus with a low level of variation across flowering plants and is therefore useful for reconstructing higher level divergence. In contrast, matK contains rapidly evolving regions that are useful for studying interspecific divergence (Hilu et al. 2003, Kress et al. 2005). Thus, the combination of matK (incl. trnK) and rbcL has the advantage of combining different evolutionary rates, making it possible to infer relationships at different taxonomic levels. In addition, we sampled only matK (incl. trnK) and rbcL markers in order to reduce missing data to a minimum, as missing data impact the phylogenetic inference between species. Supermatrix approaches, which generally contain a substantial amount of missing data, can suffer from imbalance in presence/absence for each taxon per locus, resulting in low resolution and support or even wrongly inferred relationships (Sanderson and Shaffer 2002, Roure et al. 2013).
Taxon sampling
We extracted angiosperm sequence data of rbcL and matK (incl. trnK) from GenBank (15 February 2015) using the ‘NCBI Nucleotide extraction’ tool in Geneious v11.0 (Auckland, New Zealand). Five gymnosperm genera were chosen as outgroup (Suppl. material 1). This large dataset was supplemented with 468 specimens of African tree species obtained via multiple barcoding projects (available at the Barcode of Life Data Systems (BOLD)), as well as via additional lab work (see paragraph on molecular protocols below). In total, 820 newly obtained sequences were submitted to GenBank (Suppl. material 1).
Molecular protocols
A modified CTAB protocol was used for total genomic DNA isolation (Tel-Zur et al. 1999). Secondary metabolites were removed by washing ground leaf material with extraction buffer (100 mM Tris pH 8, 5 mM EDTA pH 8, 0.35 M sorbitol). After the addition of 575 µl CTAB lysis buffer supplemented with 3% PVP-40, the samples were incubated for 1.5 hours (60°C). Chloroform-isoamylalcohol (24/1 v/v) extraction was done twice, followed by an ethanol-salt precipitation (absolute ethanol, sodium acetate 3 M). After centrifugation, the pellet was washed twice (70% ethanol), air-dried and dissolved in 100 µl TE buffer (10 mM Tris pH 8, 1 mM EDTA pH 8).
Amplification reactions of matK (incl. trnK) and rbcL were carried out with a 25 μl reaction mix containing 1 µl DNA, 2 x 1 µl oligonucleotide primer (100 ng/µl), 2.5 µl of 10 mM dNTPs, 2.5 µl Taq Buffer, 0.2 µl KAPA Taq DNA polymerase and 16.8 µl MilliQ water. Reactions commenced with a 3 minute heating at 95°C, followed by 30 cycles consisting of 95°C denaturation for 30 s, primer annealing for 60 s and extension at 72°C for 60 s. Reactions ended with a 3 minute incubation at 72°C. Annealing temperatures for matK (incl. trnK) and rbcL were set at 50°C and 55°C, respectively. Primers designed by Kim J. (unpublished) were used to sequence matK (incl. trnK), whereas rbcL primers were adopted from Fay et al. (1997) and Little and Barrington (2003). PCR products were cleaned using an ExoSap purification protocol. Purified amplification products were sequenced by the Macrogen sequencing facilities (Macrogen, Seoul, South Korea). Raw sequences were assembled using Geneious v11.0 (Biomatters, New Zealand).
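The composition of the reaction mix can be checked with simple arithmetic. The following minimal sketch uses the volumes given above; the component labels (including the split of the two primers into forward and reverse) and the 10% pipetting overage in the master-mix helper are assumptions made for illustration and are not taken from the protocol.

```python
# Volumes (in microlitres) per 25 µl reaction, as listed in the text.
# Labels are illustrative; the forward/reverse split is an assumption.
REACTION_MIX_UL = {
    "template DNA": 1.0,
    "forward primer (100 ng/µl)": 1.0,
    "reverse primer (100 ng/µl)": 1.0,
    "dNTPs (10 mM)": 2.5,
    "Taq buffer": 2.5,
    "KAPA Taq DNA polymerase": 0.2,
    "MilliQ water": 16.8,
}


def total_volume(mix):
    """Return the summed volume of all components, rounded to 0.01 µl."""
    return round(sum(mix.values()), 2)


def master_mix(mix, n_reactions, excess=0.1):
    """Scale all components except template DNA for n reactions.

    `excess` is a hypothetical pipetting overage (10% by default).
    """
    factor = n_reactions * (1 + excess)
    return {
        name: round(volume * factor, 2)
        for name, volume in mix.items()
        if name != "template DNA"
    }


if __name__ == "__main__":
    print(total_volume(REACTION_MIX_UL))
    print(master_mix(REACTION_MIX_UL, n_reactions=10))
```

The listed components sum to the stated reaction volume of 25 µl.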
Sequence alignment and phylogenetic analyses
We are aware that the publicly available database, GenBank, contains a large amount of erroneous data (Ashelford et al. 2005, Yao et al. 2004, Shen et al. 2013). The retrieved sequence data were, therefore, subjected to a quality control procedure. All downloaded sequences were blasted (Megablast option) against the GenBank database and all sequences with anomalies against their original identification were discarded. Minimum similarity in BLAST was set at 0.0005, whereas word size (W) was reduced to 8 for greater sensitivity of the local pairwise alignment and the maximum number of hits was set at 250. A single sequence of each fragment was retained for each taxon name or non-canonical NCBI taxon identifier given in GenBank. Where multiple accessions per species were available on GenBank, we chose the accession with the greatest sequence length, the best quality and the highest sequence similarity compared to the other accessions of the same species in the GenBank database. Additionally, sequences with multiple ambiguities were discarded, as well as sequences with similar taxon names, but different nucleotide sequences. In addition, sequences with erroneous taxonomic names (checked in R using the “Taxize” and “Taxonstand” packages (R Development Core Team 2009, Cayuela et al. 2012, Chamberlain et al. 2016)) were removed from further analyses. Importantly, Taxize uses the Taxonomic Name Resolution Service (TNRS; Boyle et al. 2013) function to match taxonomic names, whereas Taxonstand is linked with ‘The Plant List’ database. As such, we checked the validity of the taxonomic names in our dataset using both databases. Only those taxa with names that were considered valid in both databases were kept for further analyses.
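The selection of a single accession per species can be illustrated as follows. This is a minimal sketch rather than the procedure actually used: it only implements the length and ambiguity criteria (the similarity and BLAST checks are omitted), and the accession identifiers, the example sequences and the `max_ambiguities` threshold are hypothetical.

```python
from collections import defaultdict

UNAMBIGUOUS = set("ACGT")


def ambiguity_count(seq):
    """Count IUPAC ambiguity codes (anything other than A, C, G, T or a gap)."""
    return sum(1 for base in seq.upper() if base not in UNAMBIGUOUS and base != "-")


def pick_representatives(records, max_ambiguities=5):
    """Keep one accession per species.

    records: iterable of (species, accession, sequence) tuples.
    Sequences with more than `max_ambiguities` ambiguous bases are discarded
    (hypothetical threshold); among the rest, the longest sequence wins and
    ties are broken by the lower number of ambiguities.
    """
    by_species = defaultdict(list)
    for species, accession, seq in records:
        if ambiguity_count(seq) <= max_ambiguities:
            by_species[species].append((accession, seq))
    chosen = {}
    for species, candidates in by_species.items():
        best = max(candidates, key=lambda c: (len(c[1]), -ambiguity_count(c[1])))
        chosen[species] = best[0]
    return chosen


if __name__ == "__main__":
    example = [
        ("Rosa canina", "ACC1", "ATGCATGCAT"),
        ("Rosa canina", "ACC2", "ATGCATGCATGC"),
        ("Rosa canina", "ACC3", "NNNNNNNNATGCATGCATGC"),
        ("Ilex aquifolium", "ACC4", "ATGNATGC"),
    ]
    print(pick_representatives(example))
```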
For protein-coding sequence fragments, the amino acid (AA) sequences, translated from the associated triplet codons, were compared between taxa. Taxa showing a sudden shift in AA sequence or a frame shift were discarded from the dataset.
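A codon-based check of this kind can be sketched as below. The sketch assumes the standard genetic code (whose amino acid assignments also apply to plant plastid genes) and flags sequences with an internal stop codon, which is one common symptom of a frame shift; the function names are illustrative and the comparison of AA sequences between taxa is not reproduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a nucleotide sequence in frame 1; gaps are removed first.

    Codons containing ambiguity codes are translated as 'X'; a trailing
    incomplete codon is ignored.
    """
    seq = seq.upper().replace("-", "")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def has_premature_stop(seq):
    """Return True if a stop codon occurs before the last translated codon."""
    protein = translate(seq)
    return "*" in protein[:-1]


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
    print(has_premature_stop("ATGTAAGCC"))
```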
Alignment was carried out in multiple stages. Due to the size of our angiosperm-wide dataset, an initial alignment (automatic and manual) was conducted for each order included in the dataset. Subsequently, the different alignments were combined using the Profile alignment algorithm (Geneious v11.0, Auckland, New Zealand). The initial automatic alignment was conducted with MAFFT (Katoh et al. 2002) using the E-INS-i algorithm, a 100PAM/k = 2 scoring matrix, a gap open penalty of 1.3 and an offset value of 0.123. Manual fine-tuning of the aligned dataset was performed in Geneious v11.0 (Auckland, New Zealand). During the manual alignment of the different datasets, we carefully assessed the homology of every nucleotide at each position in the alignment (Phillips et al. 2009). The large number of angiosperm taxa included in the analyses often provided a good view of the evolution of the nucleotides at certain positions, in which some taxa functioned as transition lineages between differing nucleotides and their exact position in the alignment. Careful homology assessment of this complex sequence dataset proved important for the phylogenetic inference of the angiosperms.
The best-fit nucleotide substitution model for both rbcL and matK (incl. trnK) was selected using jModelTest 2.1.4 (Posada 2008) out of 88 possible models under the Akaike Information Criterion (AIC). The GTR+G model was determined as the best substitution model for each locus and, as such, both markers were jointly analysed under this model. Maximum Likelihood (ML) tree inference was conducted using the Randomized Axelerated Maximum Likelihood (RAxML) software version 7.4.2 (Stamatakis 2006) under the general time-reversible (GTR) substitution model with gamma rate heterogeneity and Lewis correction. Although the phylogeny, based on the plastid dataset, generated relationships that corresponded well with currently known angiosperm phylogenies (e.g. Wikström et al. 2001, Soltis et al. 2002, Moore et al. 2007, Magallón and Castillo 2009, Magallón 2014, Magallón et al. 2015, Bell et al. 2005, Bell et al. 2010), we decided to use a constraint (Suppl. material 2) in order to reduce possible unrecognised mismatches for certain puzzling lineages. The constraint tree follows the phylogenetic framework of APG IV (Angiosperm Phylogeny Group 2016) at order level. At the lower phylogenetic level, families were only constrained as a polytomy within their specific angiosperm order. Genera and species were not constrained.
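Model selection under the AIC amounts to choosing the model with the lowest value of AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximised log-likelihood. The sketch below illustrates the ranking only; the log-likelihoods are invented for illustration, the parameter counts cover the substitution model alone (branch lengths, which are identical across models for a fixed tree, are left out) and none of the numbers come from the jModelTest output of this study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the model name with the lowest AIC.

    fits: dict mapping model name -> (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits for three of the candidate models.
EXAMPLE_FITS = {
    "JC": (-10500.0, 0),
    "HKY+G": (-10120.0, 5),
    "GTR+G": (-10050.0, 9),
}

if __name__ == "__main__":
    for name, (lnl, k) in EXAMPLE_FITS.items():
        print(name, aic(lnl, k))
    print("best:", best_model(EXAMPLE_FITS))
```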
Support values for the large angiosperm dataset were obtained via the rapid bootstrapping algorithm as implemented in RAxML 7.4.2 (Stamatakis 2006), examining 1000 pseudo-replicates under the same parameters as for the heuristic ML analyses. Bootstrap values were visualised using the Consensus Tree Builder algorithm as implemented in Geneious v11.0.
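Once support values are stored as internal node labels of a Newick tree, the proportion of well-supported branches can be summarised with a few lines of code. The following is a minimal sketch that assumes plain numeric node labels directly after closing parentheses and unquoted tip names; the example tree and the threshold argument are illustrative only.

```python
import re

# Matches a numeric label that directly follows a closing parenthesis.
SUPPORT_PATTERN = re.compile(r"\)(\d+(?:\.\d+)?)")


def support_values(newick):
    """Extract bootstrap support values from internal node labels."""
    return [float(value) for value in SUPPORT_PATTERN.findall(newick)]


def fraction_above(newick, threshold=75.0):
    """Fraction of labelled internal branches with support above threshold."""
    values = support_values(newick)
    if not values:
        return 0.0
    return sum(value > threshold for value in values) / len(values)


if __name__ == "__main__":
    tree = "((A:0.1,B:0.2)90:0.05,(C:0.1,D:0.1)60:0.02,E:0.3);"
    print(support_values(tree))
    print(fraction_above(tree))
```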
Divergence time analysis
Evaluation of fossil calibration points was carried out following the specimen-based approach for assessing paleontological data by Parham et al. (2012). As such, 56 angiosperm fossils were used as calibration points in our molecular dating analysis. Detailed information about the fossils, including (1) citation of museum specimens, (2) locality and stratigraphy of fossils, (3) referenced stratigraphic age and (4) crown/stem node position is provided in Table 1. Fossils are placed at both early and recently diversified lineages within the angiosperms. Due to the large size of the dataset, we applied the penalised likelihood algorithm as implemented in treePL (Smith and O'Meara 2012), which utilises hard minimum and maximum age constraints. In order to estimate these hard minimum and maximum age constraints, we calculated the log-normal distribution of each fossil calibration point using BEAUti v.1.10 (Suchard et al. 2018). Maximum age constraints for each fossil correspond to the 95.0% upper boundary of the computed log-normal distribution, in which the offset equals the age of the fossil calibration point, the mean is set at 1.0 and the standard deviation at 1.0. This methodology resulted in an interval of at least 15 million years for each angiosperm calibration point (Table 1). Because recently published studies retrieved both old and young age estimates for the crown node of the angiosperms (e.g. Bell et al. 2005, Bell et al. 2010, Magallón et al. 2015, Magallón 2014, Magallón and Castillo 2009, Moore et al. 2007, Smith et al. 2010, Wikström et al. 2001, Soltis et al. 2002), we opted to set the hard maximum and minimum calibration of the angiosperms at 220 and 180 million years, respectively. As for the overall calibration, we followed the strategy of Smith et al. (2010), in which all fossils were considered as a minimum-age constraint. Smith et al. (2010) applied this approach since earlier studies on angiosperm evolution had treated tricolpate fossil pollen as a maximum-age constraint, thereby possibly pushing the root age of the angiosperms artificially towards more recent times (e.g. Soltis et al. 2002, Magallón et al. 2015, Magallón 2014, Magallón and Castillo 2009, Moore et al. 2007, Bell et al. 2010, Bell et al. 2005).
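Hard minimum and maximum constraints are passed to treePL through `mrca`, `min` and `max` entries in its configuration file. The sketch below shows how such entries could be generated from a calibration table; it is an illustration under stated assumptions, not the configuration used in this study. The ages are those reported for Ebenaceae in Table 1 and for the angiosperm crown node in the text, whereas the tip labels (`Taxon_A` to `Taxon_D`) are placeholders for the two taxa that would span each calibrated node.

```python
def calibration_lines(name, taxon_a, taxon_b, min_age, max_age):
    """Return treePL configuration lines for one calibrated node.

    `taxon_a` and `taxon_b` are two tips whose most recent common ancestor
    defines the node; here they are placeholder labels.
    """
    return [
        f"mrca = {name} {taxon_a} {taxon_b}",
        f"min = {name} {min_age}",
        f"max = {name} {max_age}",
    ]


# (clade, placeholder tip 1, placeholder tip 2, min. age, max. age)
EXAMPLE_CALIBRATIONS = [
    ("Ebenaceae", "Taxon_A", "Taxon_B", 37.8, 54.62),
    ("Angiosperms", "Taxon_C", "Taxon_D", 180, 220),
]

if __name__ == "__main__":
    for calibration in EXAMPLE_CALIBRATIONS:
        print("\n".join(calibration_lines(*calibration)))
```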
Table 1.
List of fossils used as calibration points, including their oldest stratigraphic occurrence, minimum and maximum ages, the calibrated clades and used references. cr.=crown, st.=stem.
Clade | Fossil | Reference | Period | Locality/Formation/Group | Min. age | Max. age | cr. / st. |
Ebenaceae | Austrodiospyros cryptostoma Basinger et Christophel | Basinger and Christophel 1985 | Late Eocene | Anglesea formation (Victoria, Australia) | 37.8 | 54.62 | cr. |
Apocynaceae | Apocynophyllum helveticum Heer | Wilde 1989 | Middle Eocene | Messel formation (Darmstadt, Germany) | 47.8 | 64.62 | cr. |
Cornaceae | Hironoia fusiformis Takahashi, Crane et Manchester | Takahashi et al. 2002 | Early Coniacian | Ashizawa formation, Futaba group (North-eastern Honshu, Japan) | 89.8 | 106.6 | cr. |
Dipelta | Dipelta europaea Reid et Chandler | Reid and Chandler 1926 | Late Eocene-Early Oligocene | Bembridge Flora (UK) | 33.9 | 50.72 | st. |
Oleaceae | Fraxinus wilcoxiana (Berry) Call et Dilcher | Call and Dilcher 1992 | Middle Eocene | Claiborne formation (Tennessee, USA) | 47.8 | 64.62 | st. |
Diervilla | Diervilla echinata Piel | Piel 1971 | Oligocene | Fraser River system (British Columbia, Canada) | 27.8 | 44.62 | st. |
Solanaceae (Physalinae) | Physalis infinemundi Wilf, Carvalho, Gandolfo et Cuneo | Wilf et al. 2017 | Early Eocene | Laguna del Hunco (Chubut, Patagonia, Argentina) | 52.0 | 68.82 | st. |
Valeriana | Valeriana sp. | Mai 1985 | Late Miocene | Europe | 11.6 | 28.42 | st. |
Emmenopterys | Emmenopterys Oliv. | Wehr and Manchester 1996 | Middle Eocene | Middle Eocene Republic Flora (Washington, USA) | 47.8 | 64.62 | st. |
Pelliciera | Pelliciera rhizophorae Planch. et Triana | Graham 1977 | Middle Eocene | Gatuncillo formation (Panama) | 47.8 | 64.62 | st. |
Araliaceae | Acanthopanax gigantocarpus Knobloch et Mai | Knobloch and Mai 1986 | Maastrichtian | Eisleben formation (Germany) | 72.1 | 88.92 | st. |
Ilex | Ilex hercynica Mai | Mai 1987 | Early Paleocene | Gonna formation (Sangerhausen, Germany) | 66.0 | 82.82 | st. |
Actinidiaceae | Saurauia antiqua Knobloch et Mai | Knobloch and Mai 1986 | Late Santonian | Klikov-Schichtenfolge (Germany) | 85.8 | 102.6 | st. |
Nymphaeales | unnamed Nymphaeales | Friis et al. 2001 | Late Aptian-Early Albian | Vale de Agua (Portugal) | 112.0 | 128.8 | cr. |
Canellales | Walkeripollis gabonensis Doyle, Hotton et Ward | Doyle et al. 1990 | Late Barremian-Early Aptian | Cocobeach (Gabon) | 125.0 | 141.8 | st. |
Magnoliaceae | Archaeanthus linnenbergeri Dilcher et Crane | Dilcher and Crane 1984 | Early Cenomanian | Dakota formation (Kansas, USA) | 100.5 | 117.3 | cr. |
Magnoliales | Endressinia brasiliana Mohr et Bernardes-de-Oliveira | Mohr and Bernardes-De-Oliveira 2004 | Aptian-Albian | Crato formation (Brazil) | 112.0 | 128.8 | cr. |
Lauraceae | Potomacanthus lobatus Crane, Friis et Pedersen | Crane et al. 1994 | Early and Middle Albian | Puddledock locality (Virginia, USA) | 119.0 | 135.8 | cr. |
Arecaceae | unnamed palms | Christopher 1979, Daghlian 1981 | Coniacian-Santonian | Magothy formation (Maryland) | 89.8 | 106.6 | cr. |
Musella-Ensete | Ensete oregonense Manchester et Kress | Manchester and Kress 1993 | Middle Eocene | Clarno formation (Oregon, USA) | 43.0 | 59.82 | st. |
Zingiberaceae | Zingiberopsis attenuata Hickey et Peterson | Hickey and Peterson 1978 | Middle to late Paleocene | Paskapoo formation (Alberta, Canada) | 61.6 | 78.42 | cr. |
Zingiberales | Spirematospermum chandlerae Friis | Friis 1988 | Santonian-Campanian | Neuse River formation (North Carolina, USA) | 83.6 | 100.4 | cr. |
Araceae | Mayoa portugallica Friis, Pedersen et Crane | Friis et al. 2004 | Barremian-Aptian | Almargem formation (Torres Vedras, Portugal) | 125.0 | 141.8 | cr. |
Restionaceae | unnamed Restionaceae | Jarzen 1978 | Maastrichtian | Morgan Creek (Saskatchewan, Canada) | 72.1 | 88.92 | st. |
Poaceae | unnamed grasses | Jardiné and Magloire 1965 | Maastrichtian | Senegal-Ivory Coast | 72.1 | 88.92 | cr. |
Berberidaceae | Mahonia Nutt. | Manchester 1999 | Middle Eocene | Green River formation (Colorado-Utah, USA) | 47.8 | 64.62 | cr. |
Platanaceae | Platanocarpus brookensis Crane, Pedersen, Friis et Drinnan | Crane et al. 1993 | Early and Middle Albian | Patapsco formation (Virginia, USA) | 112.0 | 128.8 | st. |
Sabiales | Insitiocarpus moravicus Knobloch et Mai | Knobloch and Mai 1986 | Early Cenomanian | Peruc-schichten (Czech Republic) | 98.0 | 114.8 | cr. |
Iteaceae | Divisestylus brevistamineus | Hermsen et al. 2003 | Turonian | Raritan formation (New Jersey) | 93.9 | 110.7 | cr. |
Altingiaceae | Microaltingia apocarpela | Zhou et al. 2001 | Turonian | Raritan formation (New Jersey) | 93.9 | 110.7 | cr. |
Tilia | Tilia vescipites Nichols et Ott | Nichols and Ott 1978 | Middle Paleocene | Wind River basin (Wyoming, USA) | 61.6 | 78.42 | cr. |
Polygonaceae | Persicaria (L.) Mill. | Muller 1981 | Paleocene | Europe | 66.0 | 82.82 | cr. |
Clausena | Clausena Burm.f. | Pan 2010 | Late Oligocene | Guang River Flora (Ethiopia) | 27.36 | 44.18 | cr. |
Malpighiales | Paleoclusia chevalieri Crepet et Nixon | Crepet and Nixon 1998 | Turonian | Raritan formation (New Jersey) | 93.5 | 110.3 | cr. |
Fagales | Normapolles | Batten 1981, Kedves 1989, Pacltova 1966 | Late Cenomanian | Europe and USA | 94.7 | 111.5 | cr. |
Phytolaccaceae | Coahuilacarpon phytolaccoides Cevallos-Ferriz, Estrada-Ruiz et Perez-Hernandez | Cevallos-Ferriz et al. 2008 | Late Campanian | Cerro del Pueblo formation (Mexico) | 72.5 | 89.32 | st. |
Juglandaceae | Cyclocarya brownii Manchester et Dilcher | Crane et al. 1990 | Late Paleocene | Almont and Beicegel Creek (North Dakota, USA) | 59.2 | 76.02 | cr. |
Rosales | unnamed Rosidae | Crepet and Nixon 1996 | Turonian | Raritan formation (New Jersey) | 93.9 | 110.7 | cr. |
Betulaceae | Endressianthus miraensis Friis, Pedersen et Schoenenberger | Friis et al. 2003 | Campanian-Maastrichtian | Mira (Portugal) | 72.1 | 88.92 | cr. |
Fagaceae | Antiquacupula sulcata Sims, Herendeen et Crane | Sims et al. 1998 | Late Santonian | Gaillard formation (Georgia, USA) | 85.8 | 102.6 | cr. |
Salicaceae | Pseudosalix handleyi Boucher, Manchester et Judd | Boucher et al. 2003 | Middle Eocene | Green River formation (Colorado-Utah, USA) | 53.5 | 70.32 | cr. |
Ranunculales | Leefructus mirus Sun, Dilcher, Wang et Chen | Sun et al. 2011 | Barremian-Aptian | Yixian formation (China) | 125.0 | 141.8 | cr. |
Fabaceae | Fabaceae sp. | Herendeen et al. 1992 | Early Eocene | Buchanan clay pit (Tennessee, USA) | 56.0 | 72.82 | cr. |
Styracaceae | Rehderodendron stonei Vaudois-Mieja | Vaudois-Miéja 1983 | Early Eocene | Sabals d'Anjou (France) | 56.0 | 72.82 | cr. |
Dipterocarpaceae | Shorea maomingensis Feng, Kodrul et Jin | Feng et al. 2013 | Late Eocene | Huangniuling formation (Maoming Basin, China) | 37.8 | 54.62 | cr. |
Lamiaceae | Ajuginucula smithii Reid et Chandler | Reid and Chandler 1926 | Late Eocene-Early Oligocene | Bembridge Flora (UK) | 33.9 | 50.72 | cr. |
Theaceae s.l. | Pentapetalum trifasciculandricus Martinez-Millan, Crepet et Nixon | Martinez-Millan et al. 2009 | Turonian | Raritan formation (New Jersey) | 93.9 | 110.7 | cr. |
Myrsinaceae | unnamed Myrsinaceae | Pole 1996 | Middle Miocene | Foulden Hills Diatomite (New Zealand) | 15.9 | 32.72 | cr. |
Myrtaceae | Tristaniandra alleyi Wilson et Basinger | Basinger et al. 2007 | Middle Eocene | Golden Grove - East Yatala Sand Pit (South Australia) | 47.8 | 64.62 | cr. |
Lythraceae | Decodon tiffneyi Estrada-Ruiz, Calvillo-Canadell et Cevallos-Ferriz | Estrada-Ruiz et al. 2009 | Late Campanian | Cerro del Pueblo formation (Mexico) | 72.5 | 89.32 | cr. |
Ampelocissus s.l. | Ampelocissus parvisemina Chen et Manchester | Chen and Manchester 2007 | Late Paleocene | Beicegel Creek (North Dakota, USA) | 59.2 | 76.02 | cr. |
Vitaceae | Indovitis chitaleyae Manchester, Kapgate et Wen | Manchester et al. 2013 | Maastrichtian | Mahurzari (India) | 72.1 | 88.92 | cr. |
Rosa | Rosa germerensis Edelman | Edelman 1975 | Early Eocene | Germer Basin Flora (Idaho, USA) | 56.0 | 72.82 | cr. |
Prunus | Prunus wutuensis Li, Smith, Liu, Awasthi, Yang et Li | Li et al. 2011 | Early Eocene | Wutu (China) | 56.0 | 72.82 | cr. |
Myristicaceae | Myristicacarpum chandlerae Manchester, Doyle et Sauquet | Doyle et al. 2008 | Early Eocene | London Clay (UK) | 56.0 | 72.82 | cr. |
The molecular clock hypothesis was tested using a chi-squared likelihood ratio test (Felsenstein 1988), which demonstrated that the substitution rates in the combined dataset are not clock-like (P < 0.001 for all markers). The most optimal maximum likelihood tree obtained via RAxML was used as input for the penalised likelihood dating analysis in treePL (Smith and O'Meara 2012). Due to the large size of the dataset, treePL was preferred over other age estimation software packages such as BEAST 1.10 (Suchard et al. 2018), BEAST 2.5 (Bouckaert et al. 2019) or MrBayes 3.2 (Ronquist et al. 2012). The best-fit smoothing parameter of 0.0033 was specified empirically using an adaptation of the cross-validation test as implemented in treePL (Sanderson 2003, Smith and O'Meara 2012). An adapted methodology was set up as the original tree of over 35,000 taxa was too large for correctly calculating the best-fit smoothing parameter. In order to accurately carry out the cross-validation test, 500 replicates were made of the original dataset, in each of which 90% of the original species were randomly pruned. Each of the replicates was then subjected to a cross-validation test under the following parameters: cvstart = 10; cvstop = 0.0001; cvmultstep = 0.9; randomcv. The best-fit smoothing parameter was selected as the value with the highest proportion amongst the replicates (0.0033; 12%), with the second best-fit smoothing parameter being situated at 0.0036 (11%). Smoothing parameters calculated per replicate followed a normal distribution with its optimum around 0.0033 and 0.0036 (Suppl. material 3). This strategy of calculating the smoothing parameter of very large datasets seemed effective and robust for estimating node ages of our angiosperm phylogeny using treePL. Furthermore, since there is a large amount of rate heterogeneity amongst angiosperm lineages that could violate the treePL model, a low smoothing parameter is expected to provide a more robust analysis. By applying a lower penalty, potential issues caused by strongly contrasting evolutionary rates within distant angiosperm clades will putatively be avoided (Stephen Smith, pers. comm.). In order to generate 95% confidence intervals for the dated nodes, we generated 1,000 bootstrap pseudo-replicates using the ML topology of the earlier heuristic analysis as constraint. Each ML bootstrap tree was then individually dated using treePL under the same parameters as for the single age estimation analysis, described above. Subsequently, the 1,000 dated bootstrap trees were imported into TreeAnnotator v1.10 in order to calculate and visualise the 95% confidence intervals for each node (Suchard et al. 2018).
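The replicate-based cross-validation strategy described above can be sketched in two steps: drawing a random subset of tips to retain in each replicate and, once a best smoothing value has been obtained for every replicate, selecting the most frequent value. The sketch below covers only these two steps under stated assumptions; pruning the tree and running the treePL cross-validation itself are outside its scope, and the function names, tip labels and example values are illustrative.

```python
import random
from collections import Counter


def tips_to_keep(tips, prune_fraction=0.9, rng=None):
    """Randomly choose the tips retained in one replicate.

    With prune_fraction = 0.9, 90% of the tips are pruned and 10% are kept.
    """
    rng = rng or random.Random()
    n_keep = max(2, round(len(tips) * (1 - prune_fraction)))
    return sorted(rng.sample(tips, n_keep))


def modal_smoothing(values):
    """Return the most frequent smoothing value and its proportion."""
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    return value, count / len(values)


if __name__ == "__main__":
    tips = [f"tip{i}" for i in range(1000)]  # placeholder tip labels
    replicate = tips_to_keep(tips, rng=random.Random(1))
    print(len(replicate))
    # Hypothetical per-replicate results of the cross-validation test.
    print(modal_smoothing([0.0033, 0.0036, 0.0033, 0.01]))
```

Using the modal value rather than the mean keeps the selected smoothing parameter on the grid of values that the cross-validation actually tested.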
Results and Discussion
The final aligned data matrix consists of 36,101 angiosperm species. matK (incl. trnK) sequences were mined for 31,391 species (87%), whereas rbcL sequences were obtained for 26,811 (74%) species (Suppl. material 1). The sequence dataset has an aligned length of 4,968 basepairs (bp) of which 4,285 (86%) belong to matK (incl. trnK) and 683 (14%) to rbcL. Within rbcL, all characters were variable (100%), whereas for matK (incl. trnK) 3,921 characters (91.5%) were variable. Support value analyses indicate that approximately 26% of the branches have a bootstrap value > 75 (Suppl. material 4). Based on the different studies that estimated the total number of flowering plants currently described (between 260,000 and 450,000 species) (Crane et al. 1995, Christenhusz and Byng 2016, Cronquist 1981, Lupia et al. 1999, Pimm and Joppa 2015, Prance et al. 2000, Thorne 2002), the presented phylogeny represents between 14% and 8% of the known flowering plants, respectively. In addition, the phylogenetic tree contains 54.6% (8,399) of all currently accepted angiosperm genera and 94.5% (426) of all families of flowering plants, as well as all currently known angiosperm orders. As such, the current angiosperm tree can be regarded as the largest dated angiosperm phylogenetic framework that is generated by combining genuine sequence data and fossil calibration points and will be useful for large-scale ecological and biogeographical studies. Compared to the species-level-based tree of Zanne et al. (2014) and its updated version by Qian and Jin (2016), the current phylogeny is larger in size, containing more species (+4,797 species) and genera (+468). However, the phylogeny of Zanne et al. (2014) included more families and an equal number of orders. Additionally, the updated version of the phylogeny of Zanne et al. (2014) (Qian and Jin 2016) also included 1,190 taxa of bryophytes, pteridophytes and gymnosperms, whereas the current phylogeny only contains 5 outgroup gymnosperm species. As a result, when comparing the differences in species number between both angiosperm mega-phylogenies, the current tree contains nearly 20% more flowering plant lineages (+5,987 species).
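The coverage figures quoted above follow from simple ratios. As a minimal check, the sketch below recomputes them from the counts given in the text; the function and variable names are illustrative and the species totals of 260,000 and 450,000 are the lower and upper estimates cited above.

```python
SAMPLED_SPECIES = 36_101


def coverage_percent(sampled, total):
    """Percentage of `total` represented by `sampled`, to one decimal place."""
    return round(100 * sampled / total, 1)


if __name__ == "__main__":
    # Share of described flowering plants under the low and high estimates.
    print(coverage_percent(SAMPLED_SPECIES, 260_000))
    print(coverage_percent(SAMPLED_SPECIES, 450_000))
    # Share of sampled species with a matK (incl. trnK) or rbcL sequence.
    print(coverage_percent(31_391, SAMPLED_SPECIES))
    print(coverage_percent(26_811, SAMPLED_SPECIES))
    # Additional flowering plant species relative to the updated Zanne tree:
    # the overall surplus plus the 1,190 non-angiosperm taxa in that tree.
    print(4_797 + 1_190)
```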
Age estimation of the large-scale angiosperm tree resulted in a dated phylogeny (Fig. 1; Suppl. material 5) that largely corresponds to the different recent angiosperm-wide dating analyses (e.g. Bell et al. 2010, Magallón et al. 2015, Smith et al. 2010, Wikström et al. 2001, Zanne et al. 2014). Even though small dissimilarities are present concerning the age of the earliest diversified angiosperm lineages (see Table 1), the overall age of the different families corresponds rather well to what is known from these other studies. Differences in stem node age of large clades such as superasterids, superrosids, eudicots, monocots or magnoliids are probably due to the use of a slightly different and larger set of fossil calibration points, as well as not using tricolpate fossil pollen as a maximum-age constraint for eudicots. Compared to the angiosperm phylogeny of Zanne et al. (2014), where time-scaling was carried out with 39 fossil calibrations, the current tree contains 56 fossils in total. Although some fossils are shared between Zanne’s study and ours (e.g. Pseudosalix handleyi, Fraxinus wilcoxiana, Spirematospermum chandlerae), several fossils that have been used to optimise the age estimation of the current megaphylogeny were carefully chosen from other dating analyses (Bell et al. 2010, Magallón et al. 2015, Smith et al. 2010).
Figure 1.
Maximum Likelihood-based angiosperm phylogram based on the combined rbcL and matK (incl. trnK) dataset.
Recently, Qian and Jin (2016) developed S.PhyloMaker, an R package that generates artificially enriched species trees from an updated version of the angiosperm mega-phylogeny of Zanne et al. (2014). According to Qian and Jin (2016), the package can produce a phylogeny for any set of species that needs to be assessed in a community ecology context. S.PhyloMaker grafts the species of interest onto the mega-phylogeny either as a basal polytomy (regular or Phylomatic/BLADJ approach; Webb et al. 2008) or at a random position within the existing parental clade. Branch lengths and time-calibrated node splits of the newly added taxa are likewise estimated artificially from their relative position in the original mega-phylogeny. Although the package offers a good workaround for the sparse sampling of angiosperm taxa in mega-phylogenies for some ecological studies, not all ecological or evolutionary disciplines that need a phylogenetic framework can rely on it, because the added taxa are not placed on the basis of original sequence data. The current, more densely sampled phylogenetic framework could also be used as the backbone in S.PhyloMaker to reduce the variance associated with the random addition of new lineages: with more nodes of known height, new taxa can be placed more precisely.
Using only chloroplast data to construct this large-scale angiosperm mega-phylogeny does have disadvantages. The chloroplast genome constitutes a single, linked locus that is mainly maternally inherited in angiosperms, and processes such as hybridisation and subsequent introgression, reticulate evolution and incomplete lineage sorting are difficult to detect with data from only one genome (Soltis and Soltis 2009, Lee et al. 2011). Together with the fact that only two gene markers were used for phylogeny reconstruction, this means that the phylogeny should be regarded as an angiosperm gene tree rather than a species tree. Despite these potential issues, the large-scale phylogenetic hypothesis constructed here has already proven useful for addressing large-scale evolutionary questions at the angiosperm level (e.g. Dagallier et al. in press). It remains a continuing challenge to expand large-scale angiosperm phylogenies with new species and gene markers and so create a reliable platform on which ecological and evolutionary research can be combined with phylogenetics. The current phylogeny is a further step towards an all-encompassing angiosperm phylogeny that can be used to address large-scale ecological and evolutionary questions.
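The two grafting strategies attributed to S.PhyloMaker in the discussion above (attachment as a basal polytomy versus random placement within the parental clade) can be illustrated with a minimal Python sketch. It is not the S.PhyloMaker implementation and does not use its R interface: the `Node` class, the functions `graft_basal_polytomy` and `graft_random`, the toy genus and its node ages are all hypothetical, and the rule of placing a new split at the midpoint of the chosen branch is an assumption made only for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by content
class Node:
    """Node of a rooted, dated tree; `age` is time before present (tips are 0)."""
    name: str
    age: float = 0.0
    children: List["Node"] = field(default_factory=list)


def tips(node: Node) -> List[str]:
    """Names of all terminal taxa below `node`."""
    if not node.children:
        return [node.name]
    return [t for child in node.children for t in tips(child)]


def edges(node: Node) -> List[Tuple[Node, Node]]:
    """All (parent, child) branches inside the clade rooted at `node`."""
    found = []
    for child in node.children:
        found.append((node, child))
        found.extend(edges(child))
    return found


def graft_basal_polytomy(clade: Node, species: str) -> None:
    """Attach the unsampled species directly to the crown node of its clade."""
    clade.children.append(Node(species))


def graft_random(clade: Node, species: str, rng: random.Random) -> None:
    """Attach the species to a randomly chosen branch inside the clade.

    Assumption: the new split is placed at the midpoint of that branch, so its
    age is derived from the surrounding nodes and not from sequence data.
    """
    parent, child = rng.choice(edges(clade))
    split = Node(
        name=f"split_{species}",
        age=(parent.age + child.age) / 2.0,
        children=[child, Node(species)],
    )
    parent.children[parent.children.index(child)] = split


def example_genus() -> Node:
    """Hypothetical genus with three sampled species and invented node ages."""
    inner = Node("inner", age=4.0, children=[Node("sp_a"), Node("sp_b")])
    return Node("genus_crown", age=10.0, children=[inner, Node("sp_c")])


polytomy_tree = example_genus()
graft_basal_polytomy(polytomy_tree, "sp_new")

random_tree = example_genus()
graft_random(random_tree, "sp_new", random.Random(1))
```

In this sketch the position of a randomly grafted tip changes with the random seed, which is the source of variance mentioned above. A backbone with more dated nodes narrows the set of branches, and hence the range of ages, that such a tip can receive.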
Supplementary Material
Supplementary Table
Steven Janssens
Data type: Species list
Brief description: Table S1. Accession numbers of rbcL and matK (incl. trnK) sequences of the species included in the angiosperm phylogeny (including information on genera, family and order). Newly obtained accessions are indicated with an asterisk.
File: oo_329737.xlsx
Constraint input topology
Steven Janssens
Data type: Constraint topology
Brief description: Constraint input topology for RAxML analyses of all angiosperms analysed in this study (incl. outgroup taxa).
File: oo_362680.tre
Proportion of smoothing parameters
Steven Janssens
Data type: graph
Brief description: Proportion of smoothing parameters calculated for each of the 500 tree replicates
File: oo_363086.pdf
Angiosperm phylogeny - ML bootstrap values
Steven Janssens
Data type: phylogeny
Brief description: Maximum Likelihood bootstrap consensus tree. Values above the branches indicate bootstrap support. Note that the support values above order level are all artificially set at 100 because of the use of a constraint backbone.
File: oo_329452.tre
Dated angiosperm phylogram
Steven Janssens
Data type: phylogeny
Brief description: Maximum Likelihood phylogram of 36101 angiosperm species (nexus file). Outgroup included. Blue bars indicate 95% confidence intervals.
File: oo_330891.tre
Acknowledgements
This study is part of the HERBAXYLAREDD project (BR/143/A3/HERBAXYLAREDD), funded by the Belgian Belspo-BRAIN program axis 4. This project is supported by Plant.ID, which has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement N° 765000. This study is also supported by the BRAIN.be BELSPO research program AFRIFORD, by the French Foundation for Research on Biodiversity (FRB) and by the Provence-Alpes-Côte d’Azur region (PACA) through the Centre for Synthesis and Analysis of Biodiversity data (CESAB) programme, as part of the RAINBIO research project (http://rainbio.cesab.org). The authors thank Kenneth Oberlander and an anonymous reviewer for improving the manuscript.
References
- Angiosperm Phylogeny Group. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Botanical Journal of the Linnean Society. 2016;181:1–20. doi: 10.1111/boj.12385.
- Ashelford K. E., Chuzhanova N. A., Fry J. C., Jones A. J., Weightman A. J. At least 1 in 20 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies. Applied and Environmental Microbiology. 2005;71:7724–7736. doi: 10.1128/AEM.71.12.7724-7736.2005.
- Basinger J. F., Christophel D. C. Fossil flowers and leaves of the Ebenaceae from the Eocene of southern Australia. Canadian Journal of Botany. 1985;63:1825–1843. doi: 10.1139/b85-258.
- Basinger J. F., Greenwood D. R., Wilson P. G., Christophel D. C. Fossil flowers and fruits of capsular Myrtaceae from the Eocene of South Australia. Canadian Journal of Botany. 2007;85:204–215. doi: 10.1139/B07-001.
- Batten D. J. Stratigraphy, palaeogeography and evolutionary significance of Late Cretaceous and Early Tertiary Normapolles pollen. Review of Palaeobotany and Palynology. 1981;35:125–137. doi: 10.1016/0034-6667(81)90104-4.
- Bell C. D., Soltis D. E., Soltis P. S. The age of the angiosperms: A molecular timescale without a clock. Evolution. 2005;59:1245–1258. doi: 10.1554/05-005.
- Bell C. D., Soltis D. E., Soltis P. S. The age and diversification of the angiosperms re-visited. American Journal of Botany. 2010;97:1296–1303. doi: 10.3732/ajb.0900346.
- Boucher L. D., Manchester S. R., Judd W. S. An extinct genus of Salicaceae based on twigs with attached flowers, fruits, and foliage from the Eocene Green River Formation of Utah and Colorado, USA. American Journal of Botany. 2003;90:1389–1399. doi: 10.3732/ajb.90.9.1389.
- Bouckaert R., Vaughan T. G., Barido-Sottani J., Duchêne S., Fourment M., Gavryushkina A., Heled J., Jones G., Kühnert D., De Maio N., Matschiner M., Mendes F. K., Müller N. F., Ogilvie H., Plessis L., Popinga A., Rambaut A., Rasmussen D., Siveroni I., Suchard M. A., Wu C. H., Xie D., Zhang C., Stadler T., Drummond A. J. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Computational Biology. 2019;15:e1006650. doi: 10.1371/journal.pcbi.1006650.
- Boyle B., Hopkins N., Lu Z., Raygoza Garay J. A., Mozzherin D., Rees T., Matasci N., Narro M. L., Piel W. H., McKay S. J., Lowry S., Freeland C., Peet R. K., Enquist B. J. The taxonomic name resolution service: an online tool for automated standardization of plant names. BMC Bioinformatics. 2013;14:16. doi: 10.1186/1471-2105-14-16.
- Call V. B., Dilcher D. L. Investigations of angiosperms from the Eocene of southeastern North America: Samaras of Fraxinus wilcoxiana Berry. Review of Palaeobotany and Palynology. 1992;74:249–266. doi: 10.1016/0034-6667(92)90010-E.
- Cayuela L., Granzow-de la Cerda Í., Albuquerque F. S., Golicher D. J. taxonstand: An R package for species names standardisation in vegetation databases. Methods in Ecology and Evolution. 2012;3:1078–1083. doi: 10.1111/j.2041-210X.2012.00232.x.
- Cevallos-Ferriz S. R. S., Estrada-Ruiz E., Perez-Hernandez B. R. Phytolaccaceae infructescence from Cerro del Pueblo Formation, Upper Cretaceous (Late Campanian). American Journal of Botany. 2008;95:77–83. doi: 10.3732/ajb.95.1.77.
- Chamberlain S., Szocs E., Boettiger C., Ram K., Bartomeus I., Baumgartner J., Foster Z., O’Donnell J. Taxize: taxonomic information from around the web. Version 0.7.8. https://github.com/ropensci/taxize. 2016.
- Chave J., Chust G., Thébaud C. The importance of phylogenetic structure in biodiversity studies. In: Storch D., Marquet P., Braun J., editors. Scaling Biodiversity. Cambridge University Press; Cambridge: 2007. 16
- Chen I., Manchester S. R. Seed morphology of modern and fossil Ampelocissus (Vitaceae) and implications for phytogeography. American Journal of Botany. 2007;94:1534–1553. doi: 10.3732/ajb.94.9.1534.
- Christenhusz M. J. M., Byng J. W. The number of known plants species in the world and its annual increase. Phytotaxa. 2016;261:201–217. doi: 10.11646/phytotaxa.261.3.1.
- Christopher R. A. Normapolles and triporate pollen assemblages from the Raritan and Magothy formations (upper Cretaceous) of New Jersey. Palynology. 1979;3:73–122. doi: 10.1080/01916122.1979.9989185.
- Couvreur T. L. P., Pirie M. D., Chatrou L. W., Saunders R. M., Su Y. C., Richardson J. E., Erkens R. H. Early evolutionary history of the flowering plant family Annonaceae: Steady diversification and boreotropical geodispersal. Journal of Biogeography. 2011;38:664–680. doi: 10.1111/j.1365-2699.2010.02434.x.
- Crane P. R., Manchester S. R., Dilcher D. L. A preliminary survey of fossil leaves and well-preserved reproductive structures from the Sentinel Butte Formation (Paleocene) near Almont, North Dakota. Fieldiana Geology. 1990;20:1–63.
- Crane P. R., Pedersen K. R., Friis E. M., Drinnan A. N. Early Cretaceous (early to middle Albian) platanoid inflorescences associated with Sapindopsis leaves from the Potomac Group of North America. Systematic Botany. 1993;18:328–344. doi: 10.2307/2419407.
- Crane P. R., Friis E. M., Pedersen K. R. Paleobotanical evidence on the early radiation of magnoliid angiosperms. Plant Systematics and Evolution. 1994;8:51–72.
- Crane P. R., Friis E. M., Pedersen K. R. The origin and early diversification of angiosperms. Nature. 1995;374:27–33. doi: 10.1038/374027a0.
- Crepet W. L., Nixon K. C. The fossil history of stamens. In: D’Arcy W. G., Keating R. C., editors. The anther: form, function and phylogeny. Cambridge University Press; Cambridge, UK: 1996. 25–27.
- Crepet W. L., Nixon K. C. Fossil Clusiaceae from the Late Cretaceous (Turonian) of New Jersey and implications regarding the history of bee pollination. American Journal of Botany. 1998;85:1122–1133. doi: 10.2307/2446345.
- Cronquist A. The evolution and classification of flowering plants. Columbia University Press; New York: 1981.
- Dagallier Léo‐Paul, Janssens Steven, Dauby Gilles, Blach‐Overgaard Anne, Mackinder Barbara, Droissart Vincent, Svenning Jens‐Christian, Sosef Marc, Stévart Tariq, Harris David, Sonké Bonaventure, Wieringa Jan, Hardy Olivier, Couvreur Thomas. Cradles and museums of generic plant diversity across tropical Africa. Journal of Biogeography. in press. doi: 10.1111/nph.16293.
- Daghlian C. P. A review of the fossil record of monocotyledons. Botanical Review. 1981;47:517–555. doi: 10.1007/BF02860540.
- Dilcher D. L., Crane P. R. Archaeanthus: an early angiosperm from the Cenomanian of the Western Interior of North America. Annals of the Missouri Botanical Garden. 1984;71:351–383. doi: 10.2307/2399030.
- Doyle J. A., Hotton C. L., Ward J. V. Early Cretaceous tetrads, zonasulculate pollen, and Winteraceae. 1. Taxonomy, morphology, and ultrastructure. American Journal of Botany. 1990;77:1544–1557. doi: 10.1002/j.1537-2197.1990.tb11395.x.
- Doyle J. A., Manchester S. R., Sauquet H. A seed related to Myristicaceae in the Early Eocene of South England. Systematic Botany. 2008;33:636–646. doi: 10.1600/036364408786500217.
- Edelman D. W. The Eocene Germer Basin flora of south-central Idaho. MSc thesis, University of Idaho, Moscow, ID, USA; 1975.
- Estrada-Ruiz E., Calvillo-Canadell L., Cevallos-Ferriz S. R. S. Upper Cretaceous aquatic plants from Northern Mexico. Aquatic Botany. 2009;90:283–288.
- Fay M. F., Swensen S. M., Chase M. W. Taxonomic affinities of Medusagyne oppositifolia (Medusagynaceae). Kew Bulletin. 1997;52:111–120. doi: 10.2307/4117844.
- Felsenstein J. Phylogenies and quantitative characters. Annual Review of Ecology and Systematics. 1988;19:445–471. doi: 10.1146/annurev.es.19.110188.002305.
- Feng X., Tang B., Kodrul T. M., Jin J. Winged fruits and associated leaves of Shorea (Dipterocarpaceae) from the Late Eocene of South China and their phytogeographic and paleoclimatic implications. American Journal of Botany. 2013;100:574–581. doi: 10.3732/ajb.1200397.
- Friis E. M., Pedersen K. R., Crane P. R. Fossil evidence of water lilies (Nymphaeales) in the Early Cretaceous. Nature. 2001;410:357–360. doi: 10.1038/35066557.
- Friis E. M., Pedersen K. R., Crane P. R. Araceae from the Early Cretaceous of Portugal: evidence on the emergence of monocotyledons. Proceedings of the National Academy of Sciences of the USA. 2004;101:16565–16570. doi: 10.1073/pnas.0407174101.
- Friis E. M. Spirematospermum chandlerae sp. nov., an extinct species of Zingiberaceae from the North American Cretaceous. Tertiary Research. 1988;9:7–12.
- Friis E. M., Pedersen K. R., Schönenberger J. Endressianthus, a new Normapolles-producing plant genus of Fagalean affinity from the Late Cretaceous of Portugal. International Journal of Plant Sciences. 2003;164:201–223. doi: 10.1086/376875.
- Graham A. New records of Pelliciera (Theaceae/Pelliceriaceae) in the Tertiary of the Caribbean. Biotropica. 1977;9:48–52. doi: 10.2307/2387858.
- Graham C., Fine P. Phylogenetic beta diversity: linking ecological and evolutionary processes across space and time. Ecology Letters. 2008;11:1265–1277. doi: 10.1111/j.1461-0248.2008.01256.x.
- Hardy O. J. Testing the spatial phylogenetic structure of local communities: statistical performances of different null models and test statistics on a locally neutral community. Journal of Ecology. 2008;96:914–926. doi: 10.1111/j.1365-2745.2008.01421.x.
- Herendeen P. S., Crepet W. L., Dilcher D. L. The fossil history of the Leguminosae: phylogenetic and biogeographic implications. In: Herendeen P. S., Dilcher D. L., editors. Advances in Legume Systematics, Part 4: The Fossil Record. Royal Botanic Gardens; Kew, UK: 1992. 303–316.
- Hermsen E., Gandolfo M. A., Nixon K. C., Crepet W. L. Divisestylus gen. nov. (aff. Iteaceae), a fossil saxifrage from the Late Cretaceous of. American Journal of Botany. 2003;90:1373–1388. doi: 10.3732/ajb.90.9.1373.
- Hickey L. J., Peterson A. K. Zingiberopsis, a fossil genus of the ginger family from the Late Cretaceous to Early Eocene sediments of western interior North America. Canadian Journal of Botany. 1978;56:1136–1152. doi: 10.1139/b78-128.
- Hilu K. W., Borsch T., Müller K., Soltis D. E., Soltis P. S., Savolainen V., Chase M. W., Powell M. P., Alice L. A., Evans R., Sauquet H., Neinhuis C., Slotta T. A. B., Rohwer J. G., Campbell C. S., Chatrou L. W. Angiosperm phylogeny based on matK sequence information. American Journal of Botany. 2003;90:1758–1776. doi: 10.3732/ajb.90.12.1758.
- Hinchliff C. E., Smith S. A., Allman J. F., Burleigh J. G., Chaudhary R., Coghill L. M., Crandall K. A., Deng J., Drew B. T., Gazis R., Gude K., Hibbett D. S., Katz L. A., Laughinghouse H. D., McTavish E. J., Midford P. E., Owen C. L., Ree R. H., Rees J. A., Soltis D. E., Williams T., Cranston K. A. Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proceedings of the National Academy of Sciences, USA. 2015;112:12764–12769. doi: 10.1073/pnas.1423041112.
- Janssens S. B., Knox E. B., Huysmans S., Smets E. F., Merckx V. S. F. T. Rapid radiation of Impatiens: Result of a global climate change. Molecular Phylogenetics and Evolution. 2009;52:806–824. doi: 10.1016/j.ympev.2009.04.013.
- Janssens S. B., Vandelook F., De Langhe E., Verstraete B., Smets E., Vandenhouwe I., Swennen R. Evolutionary dynamics and biogeography of Musaceae reveal a correlation between the diversification of the banana family and the geological and climatic history of Southeast Asia. New Phytologist. 2016;210:1453–1465. doi: 10.1111/nph.13856.
- Jardiné S., Magloire H. Palynologie et stratigraphie du Crétacé des Bassin du Sénégal et Cote d'Ivoire. Mémoires du Bureau de Recherches Géologiques et Minières. 1965;32:187–245.
- Jarzen D. M. Some Maastrichtian palynomorphs and their phytogeographical and paleoecological implications. Palynology. 1978;2:29–38. doi: 10.1080/01916122.1978.9989163.
- Katoh K., Misawa K., Kuma K., Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on Fourier transform. Nucleic Acids Research. 2002;30:3059–3066. doi: 10.1093/nar/gkf436.
- Kedves M. Evolution of the Normapolles complex. In: Crane P. R., Blackmore S., editors. Evolution, Systematics, and Fossil History of the Hamamelidae. Systematics Association Special Volume 40b. Clarendon Press; Oxford: 1989. 1–7.
- Kissling W. D. Has frugivory influenced the macroecology and diversification of a tropical keystone plant family? Research Ideas and Outcomes. 2017;3:e14944. doi: 10.3897/rio.3.e14944.
- Knobloch E., Mai D. H. Monographie der Früchte und Samen in der Kreide von Mitteleuropa. Rozpravy Ustredniho Ustavu Geologickeho Praha. 1986;47:1–219.
- Kress W. J., Wurdack K. J., Zimmer E. A., Weigt L. A., Janzen D. H. Use of DNA barcodes to identify flowering plants. Proceedings of the National Academy of Sciences USA. 2005;102:8369–8374. doi: 10.1073/pnas.0503123102.
- Lee Ernest, Cibrian-Jaramillo Angelica, Kolokotronis Sergios-Orestis, Katari Manpreet, Stamatakis Alexandros, Ott Michael, Chiu Joanna, Little Damon, Stevenson Dennis, McCombie W. Richard, Martienssen Robert, Coruzzi Gloria, DeSalle Rob. A Functional Phylogenomic View of the Seed Plants. PLoS Genetics. 2011;7:e1002411. doi: 10.1371/journal.pgen.1002411.
- Little D. P., Barrington D. S. Major evolutionary events in the origin and diversification of the fern genus Polystichum (Dryopteridaceae). American Journal of Botany. 2003;90:508–514. doi: 10.3732/ajb.90.3.508.
- Li Y., Smith T., Liu C. J., Awasthi N., Yang J., Wang Y. F., Li C. S. Endocarps of Prunus (Rosaceae: Prunoideae) from the early Eocene of Wutu, Shandong Province, China. Taxon. 2011;60:555–564. doi: 10.1002/tax.602021.
- Lupia R., Lidgard S., Crane P. R. Comparing palynological abundance and diversity: Implications for biotic replacement during the Cretaceous angiosperm radiation. Paleobiology. 1999;25:305–340. doi: 10.1017/S009483730002131X.
- Magallón S., Castillo A. Angiosperm diversification through time. American Journal of Botany. 2009;96:349–365. doi: 10.3732/ajb.0800060.
- Magallón S. A review of the effect of relaxed clock method, long branches, genes, and calibrations in the estimation of angiosperm age. Botanical Sciences. 2014;92:1–22. doi: 10.17129/botsci.37.
- Magallón S., Gómez-Acevedo S., Sánchez-Reyes L. L. A metacalibrated time‐tree documents the early rise of flowering plant phylogenetic diversity. New Phytologist. 2015;207:437–453. doi: 10.1111/nph.13264.
- Mai D. H. Entwicklung der Wasser- und Sumpfpflanzen-Gesellschaften Europas von der Kreide bis ins Quartär. Flora. 1985;176:449–511. doi: 10.1016/S0367-2530(17)30141-X.
- Mai D. H. Neue Früchte und Samen aus paläozänen Ablagerungen Mitteleuropas. Feddes Repertorium. 1987;98:197–229.
- Manchester S. R., Kress W. J. Fossil bananas (Musaceae): Ensete oregonense sp. nov. from the Eocene of western North America and its phytogeographic significance. American Journal of Botany. 1993;80:1264–1272. doi: 10.1002/j.1537-2197.1993.tb15363.x.
- Manchester S. R. Biogeographical relationships of North American Tertiary floras. Annals of the Missouri Botanical Garden. 1999;86:472–522. doi: 10.2307/2666183.
- Manchester S. R., Kapgate D. K., Wen J. Oldest fruits of the grape family (Vitaceae) from the Late Cretaceous Deccan Cherts of India. American Journal of Botany. 2013;100:1849–1859. doi: 10.3732/ajb.1300008.
- Martinez-Millan M., Crepet W. L., Nixon K. C. Pentapetalum trifasciculandricus gen. et sp. nov., a thealean fossil flower from the Raritan Formation, New Jersey, USA (Turonian, Late Cretaceous). American Journal of Botany. 2009;96:933–949. doi: 10.3732/ajb.0800347.
- Mohr B. A. R., Bernardes-De-Oliveira M. E. C. Endressinia brasiliana, a magnolialean angiosperm from the Lower Cretaceous Crato Formation (Brazil). International Journal of Plant Sciences. 2004;165:1121–1133. doi: 10.1086/423879.
- Moore M., Bell C., Soltis P., Soltis D. Using plastid genome‐scale data to resolve enigmatic relationships among basal angiosperms. Proceedings of the National Academy of Sciences USA. 2007;104:19363–19368. doi: 10.1073/pnas.0708072104.
- Muller J. Fossil pollen records of extant angiosperms. The Botanical Review. 1981;47:1–142. doi: 10.1007/BF02860537.
- Nichols D. J., Ott H. L. Biostratigraphy and evolution of the Momipites-Caryapollenites lineage in the Early Tertiary in the Wind River Basin, Wyoming. Palynology. 1978;2:93–112. doi: 10.1080/01916122.1978.9989167.
- Pacltova B. Pollen grains of angiosperms in the Cenomanian Peruc Formation in Bohemia. Palaeobotanist. 1966;15:52–54.
- Pan A. D. Rutaceae leaf fossils from the Late Oligocene (27.23 Ma) Guang River flora of northwestern Ethiopia. Review of Palaeobotany and Palynology. 2010;159:188–194. doi: 10.1016/j.revpalbo.2009.12.005.
- Parham J. F., Donoghue P. C., Bell C. J., Calway T. D., Head J. J., Holroyd P. A., Inoue J. G., Irmis R. B., Joyce W. G., Ksepka D. T., Patane J. S. L., Smith N. D., Tarver J. E., Tuinen M., Yang Z., Angielczyk K. D., Greenwood J. M., Hipsley C. A., Jacobs L., Makovicky P. J., Müller J., Smith K. T., Theodor J. M., Warnock R. C. M., Benton M. J. Best practices for justifying fossil calibrations. Systematic Biology. 2012;61:346–359. doi: 10.1093/sysbio/syr107.
- Phillips S. J., Dudik M., Elith J., Graham C. H., Lehmann A., Leathwick J., Ferrier S. Sample selection bias and presence-only distribution models: Implications for background and pseudo-absence data. Ecological Applications. 2009;19:181–197. doi: 10.1890/07-2153.1.
- Piel K. M. Palynology of Oligocene sediments from central British Columbia. Canadian Journal of Botany. 1971;49:1885–1920. doi: 10.1139/b71-266.
- Pimm S. L., Joppa L. N. How many plant species are there, where are they and at what rate are they going extinct? Annals of the Missouri Botanical Garden. 2015;100:170–176. doi: 10.3417/2012018.
- Pole M. Plant macrofossils from the Foulden Hills Diatomite (Miocene), Central Otago, New Zealand. Journal of The Royal Society of New Zealand. 1996;26:1–39. doi: 10.1080/03014223.1996.9517503.
- Posada D. jModelTest: phylogenetic model averaging. Molecular Biology and Evolution. 2008;25:1253–1256. doi: 10.1093/molbev/msn083.
- Prance G., Beentje H., Dransfield J., Johns R. The tropical flora remains undercollected. Annals of the Missouri Botanical Garden. 2000;87:67–71. doi: 10.2307/2666209.
- Qian H., Jin Y. An updated megaphylogeny of plants, a tool for generating plant phylogenies and an analysis of phylogenetic community structure. Journal of Plant Ecology. 2016;9:233–239.
- R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing; Vienna, Austria: 2009.
- Reid E. M., Chandler M. E. J. Catalogue of Cainozoic plants in the Department of Geology. Vol. I. The Bembridge flora. British Museum (Natural History); London: 1926.
- Revell L. J., Harmon L. J., Collar D. C. Phylogenetic signal, evolutionary process, and rate. Systematic Biology. 2008;57:591–601. doi: 10.1080/10635150802302427.
- Revell L. J. phytools: An R package for phylogenetic comparative biology (and other things). Methods in Ecology and Evolution. 2012;3:217–223.
- Ronquist F., Teslenko M., Mark P., Ayres D. L., Darling A., Höhna S., Larget B., Liu L., Suchard M. A., Huelsenbeck J. P. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology. 2012;61:539–542. doi: 10.1093/sysbio/sys029.
- Roure B., Baurain D., Philippe H. Impact of missing data on phylogenies inferred from empirical phylogenomic data sets. Molecular Biology and Evolution. 2013;30:197–214. doi: 10.1093/molbev/mss208.
- Sanderson M. J., Shaffer H. B. Troubleshooting molecular phylogenetic analyses. Annual Review of Ecology and Systematics. 2002;33:49–72. doi: 10.1146/annurev.ecolsys.33.010802.150509.
- Sanderson M. J. r8s: Inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics. 2003;19:301–302. doi: 10.1093/bioinformatics/19.2.301.
- Shen Y. Y., Chen X., Murphy R. W. Assessing DNA barcoding as a tool for species identification and data quality control. PLoS ONE. 2013;8:e57125. doi: 10.1371/journal.pone.0057125.
- Sims H., Herendeen P., Crane P. New genus of fossil Fagaceae from the Santonian (Late Cretaceous) of Central Georgia, U.S.A. International Journal of Plant Sciences. 1998;159:391–404. doi: 10.1086/297559.
- Smith S. A., Beaulieu J. M., Donoghue M. J. An uncorrelated relaxed-clock analysis suggests an earlier origin for flowering plants. Proceedings of the National Academy of Sciences USA. 2010;107:5897–5902. doi: 10.1073/pnas.1001225107.
- Smith S. A., O'Meara B. C. treePL: divergence time estimation using penalized likelihood for large phylogenies. Bioinformatics. 2012;28:2689–2690. doi: 10.1093/bioinformatics/bts492.
- Smith S. A., Brown J. W. Constructing a broadly inclusive seed plant phylogeny. American Journal of Botany. 2018;105:302–314. doi: 10.1002/ajb2.1019.
- Soltis D. E., Soltis P. S. The role of hybridization in plant speciation. Annual Review of Plant Biology. 2009;60:561–588. doi: 10.1146/annurev.arplant.043008.092039.
- Soltis P., Soltis D., Savolainen V., Crane P., Barraclough T. Rate heterogeneity among lineages of tracheophytes: integration of molecular and fossil data and evidence for molecular living fossils. Proceedings of the National Academy of Sciences USA. 2002;99:4430–4435. doi: 10.1073/pnas.032087199.
- Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–2690. doi: 10.1093/bioinformatics/btl446.
- Suchard M. A., Lemey P., Baele G., Ayres D. L., Drummond A. J., Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evolution. 2018;4:vey016. doi: 10.1093/ve/vey016.
- Sun G., Dilcher D. L., Wang H., Chen Z. A eudicot from the Early Cretaceous of China. Nature. 2011;471:625–628. doi: 10.1038/nature09811.
- Takahashi M., Crane P. R., Manchester S. R. Hironoia fusiformis gen. et sp. nov.: A cornalean fruit from the Kamikitaba locality (Upper Cretaceous, Lower Coniacian) in northeastern Japan. Journal of Plant Research. 2002;115:463–473. doi: 10.1007/s10265-002-0062-6.
- Tel-Zur N., Abbo S., Myslabodski D., Mizrahi Y. Modified CTAB procedure for DNA isolation from epiphytic cacti of the genera Hylocereus and Selenicereus (Cactaceae). Plant Molecular Biology Reporter. 1999;17:249–254. doi: 10.1023/A:1007656315275.
- Thorne R. F. How many species of seed plants are there? Taxon. 2002;51:511–522. doi: 10.2307/1554864.
- Vandelook F., Janssens S. B., Probert R. J. Relative embryo length as an adaptation to habitat and life cycle in Apiaceae. New Phytologist. 2012;195:479–487. doi: 10.1111/j.1469-8137.2012.04172.x.
- Vandelook F., Janssens S. B., Matthies D. Ecological niche and phylogeny explain distribution of seed mass in the central European flora. Oikos. 2018;127:1410–1421. doi: 10.1111/oik.05239.
- Vaudois-Miéja N. Extension paléogéographique en Europe de l’actuel genre asiatique Rehderodendron Hu (Styracacées). Comptes-Rendus des Séances de l’Académie des Sciences, Série 2: Mécanique-Physique, Chimie, Sciences de l’Univers, Sciences de la Terre. 1983;296:125–130.
- Wanntorp H. E., Brooks D. R., Nilson T., Nylin S., Ronquist F., Stearns S. C., Wedell N. Phylogenetic approach in ecology. Oikos. 1990;41:119–132. doi: 10.2307/3565745.
- Webb C. O., Ackerly D. D., McPeek M. A., Donoghue M. J. Phylogenies and community ecology. Annual Review of Ecology and Systematics. 2002;33:475–505. doi: 10.1146/annurev.ecolsys.33.010802.150448.
- Webb C. O., Ackerly D. D., Kembel S. W. Phylocom: software for the analysis of phylogenetic community structure and trait evolution. Bioinformatics. 2008;24:2098–2100. doi: 10.1093/bioinformatics/btn358.
- Wehr W. C., Manchester S. R. Paleobotanical significance of Eocene flowers, fruits, and seeds from Republic, Washington. Washington Geology. 1996;24:25–27.
- Wikström N., Savolainen V., Chase M. W. Evolution of the angiosperms: Calibrating the family tree. Proceedings of the Royal Society of London B Biological Sciences. 2001;268:2211–2220. doi: 10.1098/rspb.2001.1782.
- Wilde V. Untersuchungen zur Systematik der Blattreste aus dem Mitteleozän der Grube Messel bei Darmstadt (Hessen, Bundesrepublik Deutschland). Courier Forschungsinstitut Senckenberg. 1989;115:1–213.
- Wilf P., Carvalho M. R., Gandolfo M. A., Cuneo N. R. Eocene lantern fruits from Gondwanan Patagonia and the early origins of Solanaceae. Science. 2017;355:71–75. doi: 10.1126/science.aag2737.
- Yao Y. G., Bravi C. M., Bandelt H. J. A call for mtDNA data quality control in forensic science. Forensic Science International. 2004;141:1–6. doi: 10.1016/j.forsciint.2003.12.004.
- Zanne A. E., Tank D. C., Cornwell W. K., Eastman J. M., Smith S. A., FitzJohn R. G., McGlinn D. J., O'Meara B. C., Moles A. T., Reich P. B., Royer D. L., Soltis D. E., Stevens P. F., Westoby M., Wright I. J., Aarssen L., Bertin R. I., Calaminus A., Govaerts R., Hemmings F., Leishman M. R., Oleksyn J., Soltis P. S., Swenson N. G., Warman L., Beaulieu J. M. Three keys to the radiation of angiosperms into freezing environments. Nature. 2014;506:89–92. doi: 10.1038/nature12872.
- Zhou Z., Crepet W. L., Nixon K. C. The earliest fossil evidence of the Hamamelidaceae: Late Cretaceous (Turonian) inflorescences and fruits of Altingioideae. American Journal of Botany. 2001;88:753–766. doi: 10.2307/2657028.