Abstract
The phylogeny of flowering plants is now rapidly being disclosed by analysis of DNA sequence data, and currently, many Cretaceous fossils of flowering plants are being described. Combining molecular phylogenies with reference fossils of known minimum age makes it possible to date the nodes of the phylogenetic tree. The dating may be done by counting inferred changes in sequenced genes along the branches of the phylogeny and calculating change rates by using the reference fossils. Plastid DNA rbcL sequences and eight reference fossils indicate that ≈14 of the extant monocot lineages may have diverged from each other during the Early Cretaceous >100 million years B.P. The lineages are very different in size and geographical distribution and provide perspective on flowering plant evolution.
During the last 10 years, there has been a tremendous increase in our knowledge of flowering plant evolution. Many Cretaceous fossils of various groups of flowering plants have been discovered and described (1–3), and the major branching pattern of the flowering plant phylogeny has been disclosed from cladistic analysis of molecular data (4–6). Combining the increasing information on fossils with the currently available molecular phylogenies opens up the possibility of dating the entire phylogeny of flowering plants. The major diversification of the flowering plants took place during the Early Cretaceous (2, 3), and with dated phylogenies, the extant lineages that survived from this period [>100 million years (Myr) B.P.] may now be characterized. Such entities of similar age are more relevant units of comparison in evolutionary biology and historical biogeography than the orders and families of current classification.
Monocots comprise one-fourth of all flowering plants and include such familiar groups as lilies, orchids, palms, and perhaps the ecologically (grasslands) and economically (cereals) most important of all plant families, the grasses (Poaceae). A strongly supported phylogeny of monocots based on analysis of three genes is now available (6, 7), and there are sequences from the plastid DNA gene rbcL from all but a few of the ≈100 families of monocots. Several monocot fossils assignable to family or order are known from the Late Cretaceous (8–14). From these data, 14 lineages of monocot flowering plants are tentatively hypothesized herein to date back to the Early Cretaceous.
All dating methods based on molecular data involve some kind of molecular clock assumption (15) despite the fact that substitution rates are known to vary in different lineages (16, 17). In monocots, plastid DNA substitution rates have been shown to decrease in the order grasses-orchids-lilies-bromeliads-palms (18, 19). Several methods have been devised to deal with such rate variation and the problems of clock-based dating (20). One approach is to remove lineages and species showing significantly different rates until remaining lineages pass a relative-rate test (21). Another approach, also followed herein, is to allow for different rates in different parts of the tree. Sanderson (22) developed this approach even further, allowing for different rates among all of the branches of the tree, under the assumption that such differences are small between adjacent branches. There are, however, no real data or analysis of branch lengths supporting this assumption. Sanderson's method is also computationally complex and not feasible for dating large trees with several reference fossils.
Herein, the focus is on divergence times for the basal nodes of the monocot phylogeny, and no attempt is made to date the upper nodes of the tree with any precision. To this end, mean branch lengths from the terminals to the basal nodes of the tree are calculated. Unequal rates in different lineages are manifested as unequal branch lengths counting from the root to the terminals in phylogenetic trees, and averaging branch lengths over many terminals reduces the problem of unequal rates toward the base of the tree. Dating is done with a set of Cretaceous monocot fossils that may be attached to the nodes of the tree, and confidence intervals on the age estimates are calculated from the variation in branch lengths.
Materials and Methods
DNA sequences of the rbcL gene were obtained from GenBank or by sequencing material from the herbarium at Uppsala University. Eight sequences were generated for this study for terminals 2, 30, 32, 62, 64, 80, 83, and 87 as listed in Fig. 1. Sequencing procedures have been described elsewhere (23). GenBank accession numbers are given in the legend to Fig. 1. Alignment is straightforward and involves no indels. The data matrix of aligned sequences is available on request. A general outline of the phylogeny of monocots was obtained from published analyses based on three genes providing strong support for the major clades (6, 7). Complete resolution was attained by parsimony analysis (24) of the rbcL sequences with major clades constrained if supported also by the other data (4–7). The topology is seen in Fig. 1. Alternative equally parsimonious trees involve rearrangements mainly within some of the major groups and do not influence the conclusions on Early Cretaceous clades. Branch lengths (not shown in Fig. 1) were obtained by optimization of the rbcL data on the tree. Alternative optimization strategies (24) affect branch lengths and rate calibrations but have a negligible effect on the final age estimates.
The fossil record of monocots was surveyed, and all Cretaceous fossils of monocots that can safely be attached to the phylogenetic tree in Fig. 1 are listed in Table 1. Tertiary fossils were not included, because their meaningful use as reference fossils for the phylogenetic tree in Fig. 1 would necessitate a much larger sample of terminals with several representatives from each family. The fossils provide a minimum age for the nodes above which they attach. The mean distance in rbcL nucleotide changes, as expressed by branch lengths, from the terminals to a node divided by the node's minimum age then provides an observed change rate in that particular part of the tree (25). For example, for node A (Fig. 1) the mean distance in branch lengths from the node to terminals 2–16 is divided by the age to obtain the change rate 75/69.5 = 1.08 (Table 1).
Table 1.
Taxon (material) | Node | Terminals | Distance, branch lengths | Age, Myr | Rate, distance/age |
---|---|---|---|---|---|
Tofieldiaceae/Dicolpopollis (pollen; ref. 8) | A | 15 | 75 | 69.5 | 1.08 |
Araceae/Pistia (plants; ref. 9) | B | 2 | 49 | 69.5 | 0.71 |
Cymodoceaceae/Cymodocea (plants; ref. 10) | C | 2 | 37 | 69.5 | 0.53 |
Arecaceae/Spinizonocolpites (pollen; ref. 11) | D | 31 | 85 | 89.5 | 0.95 |
Zingiberales/Spirematospermum (fruits; ref. 12) | E | 12 | 61 | 83 | 0.73 |
Typhaceae/Typha (fruits; ref. 13) | F | 2 | 26 | 69.5 | 0.37 |
Poaceae/Monoporites (pollen; ref. 14) | G | 3 | 37 | 69.5 | 0.53 |
Flagellariaceae/Joinvilleaceae/Restionaceae/Milfordia (pollen; ref. 14) | H | 6 | 78 | 69.5 | 1.12 |
Shown are the taxon (material) to which the fossil is assigned, the node above which they attach in the phylogeny of Fig. 1, the number of terminals descended from the node, the mean distance in branch lengths from the terminals to the node, the age of the oldest known fossil in Myr, and the observed change rate (distance/age). Several types of palm pollen (Arecaceae) are known from the Cretaceous, and reliable records have been obtained from the Turonian (89.5 Myr). Zingiberales fruits are from the Santonian–Campanian boundary (83 Myr). The other fossils are all from the Maastrichtian, without any more precise dating, and the age is thus set to mid Maastrichtian (69.5 Myr). There is a recent report (1) of a 90-Myr-old fossil of Triuridaceae, but this family lacks the rbcL gene; this family is thus not represented in the phylogeny of Fig. 1, and the assignment to family requires further consideration (E. M. Friis, personal communication).
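The rate column of Table 1 can be reproduced directly from the distance and age columns. The following is a minimal sketch of that arithmetic; the names `CALIBRATIONS`, `change_rate`, and `RATES` are illustrative and do not come from the original analysis.

```python
# Fossil calibrations from Table 1:
# node -> (mean distance in branch lengths, minimum age in Myr).
CALIBRATIONS = {
    "A": (75, 69.5),
    "B": (49, 69.5),
    "C": (37, 69.5),
    "D": (85, 89.5),
    "E": (61, 83.0),
    "F": (26, 69.5),
    "G": (37, 69.5),
    "H": (78, 69.5),
}


def change_rate(distance, age_myr):
    """Observed change rate: mean inferred rbcL changes per Myr below a node."""
    return distance / age_myr


# One observed rate per calibrated node.
RATES = {node: change_rate(d, age) for node, (d, age) in CALIBRATIONS.items()}

for node, rate in RATES.items():
    print(f"node {node}: {rate:.2f} changes per Myr")
```

Rounded to two decimals, the computed values agree with the rate column of Table 1. Because the fossil ages are minimum ages, each rate is best read as an upper bound on the observed rate for that part of the tree.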
The different reference fossils provide different change rates; however, where there was no reason to assume any overall differences in observed rates between different parts of the tree (as investigated by ANOVA of branch lengths, explained below), a mean change rate was calculated and used for the dating. Because a major source of error in phylogenetic dating is the rate calibrations (as discussed below), it is desirable to use a mean rate derived from several reference fossils rather than the individual rates. By dividing the mean distance in nucleotide changes from the terminals to other nodes by the relevant mean change rate, an approximate dating of all of the nodes is possible. Being dependent on the species sampled as terminals, this dating is uncertain for upper nodes of the tree but should be more reliable for basal nodes where more terminals contribute to the age estimate.
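The dating step described here, mean terminal-to-node distance divided by a change rate, can be sketched as follows. The tree `TOY_TREE` and its branch lengths are hypothetical and chosen only for illustration, and the function names are assumptions; only the rate 0.73 changes per Myr is taken from the text.

```python
from statistics import mean

# Hypothetical toy tree: each internal node maps to a list of
# (child, branch length in inferred changes). Names absent from the
# dictionary are terminals. None of these numbers come from the paper.
TOY_TREE = {
    "root": [("X", 20), ("t1", 70)],
    "X": [("t2", 45), ("Y", 10)],
    "Y": [("t3", 30), ("t4", 50)],
}


def terminal_distances(tree, node):
    """Return the path length from `node` to every terminal below it."""
    children = tree.get(node)
    if not children:  # a terminal is at zero distance from itself
        return [0]
    distances = []
    for child, length in children:
        distances.extend(length + d for d in terminal_distances(tree, child))
    return distances


def node_age(tree, node, rate):
    """Approximate node age in Myr: mean terminal distance / change rate."""
    return mean(terminal_distances(tree, node)) / rate


MEAN_RATE = 0.73  # changes per Myr, reported for monocots excluding Poales

for name in ("root", "X", "Y"):
    print(f"{name}: {node_age(TOY_TREE, name, MEAN_RATE):.1f} Myr")
```

In this sketch every terminal below a node contributes one path length to the mean, which is why basal nodes, with many descendant terminals, are less sensitive to a single unusually long or short lineage than upper nodes are.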
Unequal branch lengths indicate that unequal rates may have been present. Specifically, if the branch lengths from the root to the terminals differ, as they certainly do, the observed change rates are unequal. ANOVA was used to investigate variation in branch lengths from the root to the terminals among and within the major clades of the phylogenetic tree. The major clades compared were Alismatales (terminals 2–16 in Fig. 1), Pandanales (terminals 17–20), Liliales (terminals 21–29), Dioscoreales (terminals 30–32), Asparagales (terminals 33–60), commelinoids (terminals 61–91), Commelinales + Zingiberales (terminals 63–74), and Poales (terminals 79–91). Branch lengths were found to be significantly longer in Commelinales + Zingiberales and in Poales. Excluding these two clades, ANOVA found no significant added variation in branch lengths among the major clades (commelinoids then restricted to terminals 61–62 and 75–78 in Fig. 1) compared with the variation detected within the major clades. Because of the significantly longer branch lengths in Commelinales + Zingiberales and in Poales, separately calculated change rates were obtained as explained above and used for the node datings within these two clades. Confidence intervals (95%) on the node datings were calculated from the variation in branch lengths as ± 1.96 standard errors for the mean branch length to the node divided by the change rate.
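The confidence-interval calculation can be sketched as below, under two assumptions that the text does not spell out: the standard error is taken as the sample standard deviation (n − 1 denominator) of the terminal-to-node path lengths divided by the square root of their number, and the change rate is treated as fixed. The function name and the example path lengths are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def node_age_with_ci(distances, rate, z=1.96):
    """Return (age, lower, upper) in Myr for one node.

    distances: path lengths (inferred changes) from the node to each
               descendant terminal.
    rate:      change rate in changes per Myr, treated as a fixed value.
    z:         normal quantile; 1.96 gives an approximate 95% interval.
    """
    n = len(distances)
    if n < 2:
        raise ValueError("need at least two terminals to estimate a standard error")
    # Assumption: sample standard deviation (n - 1 denominator).
    standard_error = stdev(distances) / sqrt(n)
    age = mean(distances) / rate
    half_width = z * standard_error / rate
    return age, age - half_width, age + half_width


# Hypothetical path lengths for a node with five descendant terminals.
example = [60, 72, 81, 68, 79]
age, low, high = node_age_with_ci(example, rate=0.73)
print(f"{age:.1f} Myr (95% CI {low:.1f} to {high:.1f})")
```

An interval computed this way reflects only the variation in branch lengths among terminals. It does not include uncertainty in the fossil ages or in the rate calibration itself.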
Results
ANOVA found branch lengths to be significantly longer in most of the Poales and in Commelinales + Zingiberales than in other monocots. However, with these two clades excluded ANOVA found no significant added variance caused by major clade membership. Because branches were found to be longer in Poales and Commelinales + Zingiberales, three mean rates were calculated and used for dating the phylogeny. These rates are (0.53 + 1.12)/2 = 0.83 for Poales, 0.73 for Commelinales + Zingiberales, and (1.08 + 0.71 + 0.53 + 0.95 + 0.37)/5 = 0.73 changes per Myr for the remaining monocots (obtained from the rate column in Table 1). Observed mean rates thus turned out to be the same (0.73) in Commelinales + Zingiberales as in the remaining monocots (excluding Poales), despite the overall differences in branch lengths detected by ANOVA.
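The three mean rates follow from the rate column of Table 1 once the calibration nodes are grouped as in the formulas above. A minimal sketch of that grouping follows; the dictionary names are illustrative.

```python
from statistics import mean

# Rounded rates from the rate column of Table 1, keyed by calibration node.
TABLE1_RATES = {
    "A": 1.08, "B": 0.71, "C": 0.53, "D": 0.95,
    "E": 0.73, "F": 0.37, "G": 0.53, "H": 1.12,
}

# Grouping of calibration nodes implied by the formulas in the text.
GROUPS = {
    "Poales": ["G", "H"],
    "Commelinales + Zingiberales": ["E"],
    "remaining monocots": ["A", "B", "C", "D", "F"],
}

MEAN_RATES = {
    name: mean(TABLE1_RATES[node] for node in nodes)
    for name, nodes in GROUPS.items()
}

for name, rate in MEAN_RATES.items():
    print(f"{name}: {rate:.3f} changes per Myr")
```

The unrounded means are 0.825 for Poales and 0.728 for the remaining monocots, which the text reports as 0.83 and 0.73. The Commelinales + Zingiberales rate rests on a single reference fossil (node E).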
The datings and confidence intervals are shown in Fig. 1. The figure indicates that 13 nodes (including the root) in the monocot phylogeny are older than 100 Myr, and they lead to 14 clades or lineages of extant monocots that consequently date back to the Early Cretaceous. These are (i) Acorus, the sister group of all other monocots (26), (ii) Tofieldiaceae, (iii) Araceae, (iv) most Alismatales (excluding Araceae and Tofieldiaceae), (v) Pandanales, (vi) Liliales, (vii) Dioscoreales + Nartheciaceae, (viii) Asparagales, (ix) Arecaceae, (x) Dasypogonaceae, (xi) Commelinales + Zingiberales, (xii) Bromeliaceae + Rapateaceae, (xiii) Sparganium + Typha, and (xiv) most Poales (excluding Sparganium and Typha).
Discussion
The rates are based on the rbcL nucleotide changes along the branches of the tree as inferred from character optimization in parsimony analysis and should not, and need not, be taken as estimates of the real substitution rates. The dating method is not based on mean distances between pairs of sequences (27–29) but on mean branch lengths from the terminals to the nodes of a specific phylogeny. Attaching fossils to nodes of the phylogeny allows for calculation of change rates along the branches and dating the phylogeny, thereby bypassing pairwise sequence comparisons and estimates of real substitution rates.
Rate calibration is probably a much greater problem in molecular clock dating than are unequal rates. The issue has not received much attention (20) and frequently only single or few fossils and reference nodes are used. Uncertain dating of the reference nodes rather than unequal substitution rates is probably the reason behind the discrepancies in earlier estimates of monocot and flowering plant origin (27–29). In these earlier studies, single or few reference nodes of rather uncertain age were used, for example 50–70 Myr for a maize-wheat divergence (27) or around 1,000 Myr for a plants-animals-yeast divergence (28). Herein, several fossils have been used to alleviate this problem. Because the focus of this analysis is on the boundary between Early and Late Cretaceous, an attempt was made to include all known Cretaceous monocot fossils (Table 1) that can be attached to the phylogeny of Fig. 1.
The observed change rates for the monocot phylogeny in Fig. 1 vary from 0.37 to 1.12 changes per Myr for the whole rbcL gene (Table 1). It seems that the variation in rates stems mainly from error sources in rate calibration rather than underlying differences in real substitution rates, although there is such a difference between the Poales and other monocots. The variation may be due to inadequate sampling of terminals (partly inevitable because of extinction) leading to short branch lengths and low rates or inadequate sampling of fossils (inevitable because of incomplete preservation and discovery) leading to underestimated node ages and high rates.
The split between Acorus and the remaining monocots is estimated to be more than 134 Myr old, near the Jurassic–Cretaceous boundary (145 Myr). This result is in agreement with Sanderson's (22) dating of the split between Araceae (Spathiphyllum) and other monocots to the Jurassic–Cretaceous boundary in his analysis of land plant divergences. The result is also compatible with the estimate of the monocot–dicot divergence to 200 ± 40 Myr ago made by Wolfe et al. (27), if it is assumed that at least 160 − 134 = 26 Myr passed between monocot origin (at least 160 Myr) and the split between Acorus and other monocots (134 Myr).
The list of 14 clades identified herein should be taken as a first hypothesis. As seen from the confidence intervals (Fig. 1), there are uncertainties regarding the number of Early Cretaceous clades particularly within Alismatales and Poales. Several of the 14 lineages may be slightly younger than 100 Myr. On the other hand, there are some clades within the Poales that could be more than 100 Myr considering the confidence limits, although the dating indicates that they are younger. A more robust hypothesis will eventually be obtained by adding sequences from more genes and more terminals. There are a few monocot families not represented in the phylogeny of Fig. 1 and not assigned to any order in the current classification (6), and they may attach at nodes around 100 Myr old. Near-parsimonious alternatives indicate that some of the Early Cretaceous clades may consist of two such clades more than 100 Myr old. Hence, it could be that the number of Early Cretaceous monocot clades approaches 20, given that the change rates calculated herein are not too misleading. Interestingly, the rate for monocots excluding Poales (0.73) is close to that found with the same approach in a different group of flowering plants, the Asterales (0.74; ref. 25).
The largest known clade of flowering plants is the eudicots. They have some families with an Early Cretaceous fossil record (30, 31), and clearly there are many clades of eudicots more than 100 Myr old. Preliminary calculations indicate that there may be more than twice as many as for the monocots. Phylogenetically basal flowering plants (magnoliids, neither monocots nor eudicots) comprise 15 more or less isolated orders and families not assigned to order (6). There are several Early Cretaceous fossils (2, 3), and it is likely that each of these 15 groups dates back to the Early Cretaceous; thus, the number of Early Cretaceous clades seems to be about the same as for the monocots. In all, it seems that the number of flowering plant clades that survived from the Early Cretaceous is more than 50 but less than 100. Identification of those Early Cretaceous clades is an exciting new goal for plant systematics.
Phylogenetic dating provides a new perspective on flowering plant evolution. The clades that survived from the period of early flowering plant diversification during the Early Cretaceous >100 Myr ago are not the same as those recognized as families, orders, etc. in plant classification (6). A brief look at the 14 Early Cretaceous clades of monocots listed herein indicates how different they may be. Most contain more than 1,000 species and are pantropical or worldwide but with very different distribution patterns. Asparagales include the orchids and comprise perhaps 25,000 species; the Poales comprise around 20,000 species. There are, however, only about 25 species of Tofieldiaceae and of Sparganium and Typha. Some of the clades consist of only a few species: there are two species of Acorus in the northern hemisphere and eight species of Dasypogonaceae, all in Australia. Although present distributions may be very different from those of the Cretaceous, such 100-Myr-old clades with restricted distributions and with no fossils found elsewhere may represent ancient Laurasian and Gondwanan elements and exemplify how we may be able to discover old patterns of historical biogeography.
Acknowledgments
I thank Birgitta Bremer for access to lab facilities, Nahid Heidari for sequencing several monocots, Else Marie Friis for critical evaluation of the fossil reports, Bengt Oxelman for ANOVA calculations, Tom Britton for methods of computation of confidence intervals, Sten Kaijser for various advice on statistics, and Mark W. Chase, Michael J. Sanderson, and two anonymous reviewers for comments on earlier versions of the text. The research was supported by the Swedish Natural Science Research Council.
Abbreviation
- Myr, million years
Footnotes
This paper was submitted directly (Track II) to the PNAS office.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. are listed in the legend to Fig. 1).
Article published online before print: Proc. Natl. Acad. Sci. USA, 10.1073/pnas.080421597.
Article and publication date are at www.pnas.org/cgi/doi/10.1073/pnas.080421597
References
- 1. Gandolfo M A, Nixon K C, Crepet W L, Stevenson D W, Friis E M. Nature (London) 1998;394:532–533.
- 2. Crane P R, Friis E M, Pedersen K R. Nature (London) 1995;374:27–33.
- 3. Crane P R, Friis E M, Pedersen K R. Plant Syst Evol Suppl. 1994;8:51–72.
- 4. Källersjö M, Farris J S, Chase M W, Bremer B, Fay M F, Humphries C J, Petersen G, Seberg O, Bremer K. Plant Syst Evol. 1998;213:259–287.
- 5. Soltis D E, Soltis P S, Mort M E, Chase M W, Savolainen V, Hoot S B, Morton C M. Syst Biol. 1998;47:32–42. doi: 10.1080/106351598261012.
- 6. Angiosperm Phylogeny Group. Ann Mo Bot Gard. 1998;85:531–553.
- 7. Chase M W, Soltis D E, Soltis P S, Rudall P J, Fay M F, Hahn W H, Sullivan S, Joseph J, Molvray M, Kores P J, et al. In: Monocots: Systematics and Evolution. Vol. 1 of Proceedings of the Second International Conference on the Comparative Biology of Monocotyledons. Wilson K L, Morrison D A, editors. Melbourne: CSIRO; 2000, in press.
- 8. Chmura C A. Palaeontogr Abteilung B Paläophytol. 1973;141:89–171.
- 9. Hickey L H. Am J Bot. 1991;78, Suppl. 115:6.
- 10. Voight E. In: Recent and Fossil Bryozoa. Larwood G P, Nielsen C, editors. Fredensborg, Denmark: Olsen & Olsen; 1981. pp. 281–298.
- 11. Harley M M. Dissertation. Kew, U.K.: Royal Botanic Gardens; 1996.
- 12. Rodríguez-de la Rosa R A, Cevallos-Ferriz S R S. Int J Plant Sci. 1994;155:786–805.
- 13. Knobloch E, Mai D H. Rozpr Ustred Ustavu Geol. 1986;47:1–219.
- 14. Linder H P. Kew Bull. 1987;42:297–318.
- 15. Zuckerkandl E, Pauling L. In: Evolutionary Divergence and Convergence. Bryson V, Vogel H J, editors. New York: Academic; 1965. pp. 97–166.
- 16. Bousquet J, Strauss S H, Doerksen A D, Price R A. Proc Natl Acad Sci USA. 1992;89:7844–7848. doi: 10.1073/pnas.89.16.7844.
- 17. Gaut B S, Muse S V, Clegg M T. Mol Phylogenet Evol. 1993;2:89–96. doi: 10.1006/mpev.1993.1009.
- 18. Wilson M A, Gaut B, Clegg M T. Mol Biol Evol. 1990;7:303–314. doi: 10.1093/oxfordjournals.molbev.a040605.
- 19. Gaut B S, Muse S V, Clark W D, Clegg M T. J Mol Evol. 1992;35:292–303. doi: 10.1007/BF00161167.
- 20. Sanderson M J. In: Molecular Systematics of Plants II DNA Sequencing. Soltis D E, Soltis P S, Doyle J J, editors. Boston: Kluwer; 1998. pp. 242–264.
- 21. Takezaki N, Rzhetsky A, Nei M. Mol Biol Evol. 1995;12:823–833. doi: 10.1093/oxfordjournals.molbev.a040259.
- 22. Sanderson M J. Mol Biol Evol. 1997;14:1218–1231.
- 23. Kårehed J, Lundberg J, Bremer B, Bremer K. Syst Bot. 2000;24:660–682.
- 24. Swofford D L. PAUP, Phylogenetic Analysis Using Parsimony. Champaign, IL: Illinois Natural History Survey; 1993, Version 3.1.1.
- 25. Bremer K, Gustafsson M H G. Proc Natl Acad Sci USA. 1997;94:9188–9190. doi: 10.1073/pnas.94.17.9188.
- 26. Duvall M R, Clegg M T, Chase M W, Clark W D, Kress W J, Hills H G, Eguiarte L, Smith J F, Gaut B S, Zimmer E A, et al. Ann Mo Bot Gard. 1993;80:607–619.
- 27. Wolfe K H, Gouy M, Yang Y-W, Sharp P M, Li W-H. Proc Natl Acad Sci USA. 1989;86:6201–6205. doi: 10.1073/pnas.86.16.6201.
- 28. Martin W, Gierl A, Saedler H. Nature (London) 1989;339:46–48.
- 29. Martin W, Lydiate D, Brinkmann H, Forkmann G, Saedler H, Cerff R. Mol Biol Evol. 1993;10:140–162. doi: 10.1093/oxfordjournals.molbev.a039989.
- 30. Drinnan A N, Crane P R, Friis E M, Pedersen K R. Am J Bot. 1991;78:153–176.
- 31. Crane P R, Pedersen K R, Friis E M, Drinnan A N. Syst Bot. 1993;18:328–344.