Abstract
The Zingiberales are an iconic order of monocotyledonous plants comprising eight families with distinctive and diverse floral morphologies and representing an important ecological element of tropical and subtropical forests. While the eight families are demonstrated to be monophyletic, phylogenetic relationships among these families remain unresolved. Neither combined morphological and molecular studies nor recent attempts to resolve family relationships using sequence data from whole plastomes has resulted in a well-supported, family-level phylogenetic hypothesis of relationships. Here we approach this challenge by leveraging the complete genome of one member of the order, Musa acuminata, together with transcriptome information from each of the other seven families to design a set of nuclear loci that can be enriched from highly divergent taxa with a single array-based capture of indexed genomic DNA. A total of 494 exons from 418 nuclear genes were captured for 53 ingroup taxa. The entire plastid genome was also captured for the same 53 taxa. Of the total genes captured, 308 nuclear and 68 plastid genes were used for phylogenetic estimation. The concatenated plastid and nuclear dataset supports the position of Musaceae as sister to the remaining seven families. Moreover, the combined dataset recovers known intra- and inter-family phylogenetic relationships with generally high bootstrap support. This is a flexible and cost effective method that gives the broader plant biology community a tool for generating phylogenomic scale sequence data in non-model systems at varying evolutionary depths.
Keywords: Array-based capture, Ancient radiation, Exon capture, High-throughput sequencing, HTS, Heliconiaceae, Musaceae, Gingers, Banana
Introduction
Zingiberales are a diverse group of tropical monocots, including important tropical crop plants (e.g., ginger, turmeric, cardamom, bananas) and ornamentals (e.g., cannas, bird-of-paradise, prayer plants). Eight families are recognized with a total of ca. 2500 species. Fossil zingibers are known since the Cretaceous, and show a mix of characters from Musaceae and Zingiberaceae (Friis, 1988; Rodriguez-de la Rosa & Cevallos-Ferriz, 1994; Iles et al., 2015) on the basis of fruits, seeds, leaves, rhizomes, and phytoliths (Friis, Crane & Pedersen, 2011; Chen & Smith, 2013). Zingiberales are thought to have diverged from the sister order Commelinales (sensu Angiosperm Phylogeny Group, 2003) between 80–124 Ma, with diversification into the major lineages occurring from ca. 60–100 Ma (Kress & Specht, 2006; Magallón et al., 2015). However, relationships among the families are not well-resolved using multi-gene phylogenies (Kress et al., 2001; Barrett et al., 2014), likely due to this early rapid radiation. Specifically, the relationship between Musaceae, Strelitziaceae + Lowiaceae, Heliconiaceae, and the remaining four families, which form a well-supported monophyletic group (i.e., the ‘ginger clade’), have conflicting support among studies. Whole plastid data for 14 taxa spanning the eight families still failed to resolve the early diverging branches of the phylogeny, perhaps owing to limited sampling and a lack of phylogenetic signal in the plastome (Barrett et al., 2014). However challenging to resolve, rapid evolutionary radiations are thought to be a common theme across the tree of life and are thought to explain poorly resolved phylogenies in many groups including insects, birds, bees, turtles, mammals, and angiosperms (Whitfield & Lockhart, 2007; Whitfield & Kjer, 2008).
The advent of high throughput sequencing and methods that extend the utility of new sequencing technology to non-model organisms has enabled sequence-based understanding of evolutionary relationships in previously intractable groups (Crawford et al., 2012; Faircloth et al., 2012; Lemmon, Emme & Lemmon, 2012; Bi et al., 2013). Specifically, for phylogenetic studies, multiple genes containing appropriate levels of sequence divergence can now be obtained for many phylogenetically distant individuals. Various genome enrichment methods, using hybridization to capture a targeted set of genes based on appropriately designed nucleotide probes, have enabled targeted sets of hundreds or thousands of loci to be sequenced in parallel for multiple individuals. However, the ability to capture loci across relatively deep phylogenetic scales has remained challenging because of the inverse relationship between capture efficiency and the evolutionary distance from the individual(s) used to design the probes (Bi et al., 2012; Lemmon, Emme & Lemmon, 2012; Peñalba et al., 2014; Weitemier et al., 2014). For very deep divergences in animals, to understand amniote evolution or deep divergences in vertebrate evolution for example, ultra-conserved elements (Faircloth et al., 2012) and anchored hybrid enrichment (Lemmon, Emme & Lemmon, 2012) have been used to target conserved loci that are flanked by less conserved regions. However, these regions were developed using animal genomes and are unsuitable for use in plants (Reneker et al., 2012).
Historical whole genome duplication followed by fractionation and diploidization, genome-level processes that are common during plant evolution and occur in a lineage-specific manner, make it likely that loci with known orthology will need to be tested and developed separately for each plant lineage. Some methods have been developed for lineage specific capture, such as whole exome capture (Bi et al., 2012) that uses a transcriptome sequence and a relatively closely related sequenced genome to design lineage-specific baits. This approach was modified and recently used in plants (Weitemier et al., 2014). However, the success of these approaches to capture targeted genes is limited by the distance of the samples to the target transcriptome. A more flexible approach uses PCR products to generate a home-made, in-solution capture (Maricic, Whitten & Pääbo, 2010; Peñalba et al., 2014), but this requires some prior knowledge of locus sequence and primer optimization and likely is most useful to target 10–50 loci with known phylogenetic utility.
In the case of the Zingiberales, with possibly over 100 Myr of divergence since the initial lineage diversification leading to the modern families, it is necessary to design a set of probes that can capture sequences with a relatively high percentage of polymorphisms, yet still allow the reliable assignment of orthology to captured sequences. In order to do this, we used transcriptomes that were generated as part of the Monocot Tree of Life Project (MonAToL: http://www.botany.wisc.edu/monatol/) or One Thousand Plant Transcriptomes (OneKP: https://sites.google.com/a/ualberta.ca/onekp/home) together with the annotated whole genome of Musa acuminata (D’Hont et al., 2012) to design a set of probes that were printed on an Agilent microarray chip in parallel. This parallel printing approach enables divergent taxa to be captured on a single array and alleviates binding competition between closely related and divergent individuals. Simultaneously, we captured whole plastid genomes based on published plastid genomes from one member each of the eight families (Barrett et al., 2014).
We show the utility of this cost effective method in generating phylogenetically informative sequence data by constructing a phylogenetic tree of the Zingiberales that recaptures known relationships and resolves previously recalcitrant parts of the phylogeny with high support. Because of the phylogenetic breadth of transcriptomes becoming publically available across the plant kingdom, this method has the potential to aid in the design of lineage specific sequencing projects that span phylogenetic distances on the order of 100 Myr or possibly greater.
Methods
Taxon sampling, DNA extraction, and library preparation
Sampling included several members of each of the eight families: Heliconiaceae (5), Musaceae (9, including 2 previously published whole genomes, D’Hont et al., 2012; Davey et al., 2013), Strelitziaceae (3), Lowiaceae (2), Zingiberaceae (16), Costaceae (10), Marantaceae (7), and Cannaceae (3). In total, 53 individuals were sequenced de novo (Table S1). DNA was extracted using an SDS and salt extraction protocol (Edwards, Johnstone & Thompson, 1991; Konieczny & Ausubel, 1993) from freshly collected leaves dried in silica, eluted in TE buffer, and sonicated with a Bioruptor® (Diagenode, Liège, Belgium) or qSonica Q800R machine to an average size of approximately 250bp. Sonicated DNA was cleaned and concentrated with solid phase reversible immobilization magnetic beads (Sera-Mag; GE Healthcare, Little Chalfont, UK), and libraries were prepared according to Meyer & Kircher (2010).
Probe design, sequence capture, sequencing
To generate a nuclear probe set, the Musa acuminata CDS was downloaded from the banana genome hub (http://banana-genome.cirad.fr/) and split into annotated exons. Raw reads of transcriptomes for each of the remaining seven families were cleaned to remove adapters, low-complexity sequences, contamination, and PCR duplicates (Singhal, 2013). Cleaned transcriptome reads were aligned to the Musa acuminata exons using NovoAlign v3.01 (http://novocraft.com) with –t 502 to allow highly divergent sequences to map. After mapping, SNPs were called using SAMtools v0.1.18 (Li et al., 2009) and VarScan v2.3.6 (Koboldt et al., 2012) and consensus sequences for each family were made based on SNP calls. All exons were filtered for: (1) having overlapping read coverage in all 7 families (2) being longer than 150 bp (3) having between 30–70% GC content (4) being unique by reciprocal BLAST (5) not being found in the RepeatMasker database (command parameters can be found in Supplementary Methods). Although conservation to the Musa sequence was not used as a direct filter, the alignment protocol inherently limits chosen regions to those with relative conservation across the families–the minimum percent identity between any Musa exon and a family specific bait for the same region was 86% (Table S1). After filtering, a total of 494 exons from 418 genes for each of the eight families (the Musa reference sequence plus each sequence from the seven families) were printed with 1 bp tiling twice each on an Agilent 1M microarray chip (G3358A) (Fig. 1A). A second chip was printed with one complete plastid genome from each family (Barrett et al., 2014) with slightly less than 1 bp tiling. Libraries from a total of 56 individuals were quantified by Qubit® and pooled in equimolar quantities. The total library pool was split in half and one half was hybridized to the nuclear array and the other half was hybridized to the plastid array (Hodges et al., 2009). After hybridization, pools were subject to a limited amount of PCR amplification and enrichment success was verified with qPCR using primers matching both targeted and non-targeted regions. Because of known bias toward plastid dominance in sequenced reads owing to a greater percentage of plastid DNA in the total genomic DNA extractions, the separate hybridization pools were combined in a ratio of 3 parts nuclear to 1 part plastid and sequenced (100 bp paired-end reads) in one lane of a Illumina® HiSeq® 2500 platform at the Vincent J. Coates Genomics Sequencing Facility at the University of California, Berkeley.
Read processing
Raw reads were cleaned to remove adapters, low-complexity sequences, contamination, and PCR duplicates (Singhal, 2013). Custom Perl scripts were created to perform a series of alignment and reference adjustments using NovoAlign v3.01 (NovoCraft: http://novocraft.com), VarScan v2.3.6 (Koboldt et al., 2012) and Mapsembler2 v2.1.6 (Peterlongo & Chikhi, 2012) to generate a per individual reference for SNP calling without the need for de novo assembly (Fig. 1B). Perl scripts are available in a github repository (https://github.com/chodon/zingiberales). The plastid sequences were processed the same way except extension with Mapsembler2 was omitted, and individual genes were extracted from the whole plastid prior to final mapping. Finally, reads were mapped with NovoAlign with –t 90 and PCR duplicates were removed with Picard v1.103 (http://picard.sourceforge.net). SNPs were called following best practices guidelines using the HaplotypeCaller and readBackedPhasing algorithms in GATK v3.1.1 (McKenna et al., 2010; DePristo et al., 2011; Van der Auwera et al., 2013), except quality scores were not recalibrated because the lack of a reference set of known variants. Consensus sequences were created based on SNP calls for regions with greater than 20× coverage (Nielsen et al., 2011). SNPs in areas with less than 20× coverage were converted to Ns and regions with less than 5× coverage were discarded. For outgroup taxa, raw reads from transcriptomes generated as part of OneKP were subject to the same pipeline as sequences generated de novo. The raw sequence data from the Musa balbisiana genome project (Davey et al., 2013) was also subject to the pipeline, but only aligned for the plastid gene set. Raw de novo sequence reads are available on NCBI under bioproject SRP066318 and the final concatenated alignment is accessible on github https://github.com/chodon/zingiberales.
Alignment
After consensus sequences were made, a second pipeline was made to pass sequences through a series of alignment steps to (1) trim sequences to the Musa reference (MAFFT v7.164 (Katoh et al., 2002; Katoh & Standley, 2013) and mothur v1.34.4 (Schloss et al., 2009)), (2) place sequences into coding frame (MACSE v1.01b (Ranwez et al., 2011)), and (3) align by codon position (prank v140603 (Löytynoja & Goldman, 2005)). Plastid gene introns were spliced out by hand in Geneious v5.6.4 (Kearse et al., 2012) prior to step 3, above. After alignment, several additional steps were taken to eliminate genes that might contain non-orthologous sequences. Gene trees were generated with RAxML v8.1.17 (Stamatakis, 2014) and the single gene trees were assessed to identify those in which the gene of a single individual taxon accounted for greater than 15% of the total tree length (dos Reis et al., 2012). Exon sequences from one individual were BLASTed to the nucleotide collection database (BLASTN v2.2.30+, Altschul et al., 1997). Exons were removed from further analyses if significant BLAST hits were found to a whole plastid genome, or to ribosomal, transposon, or mitochondrial DNA. Exons were also removed from further analysis if they had unexpectedly high average coverage of greater than 200× or because frameshifts were introduced during codon position assignment or the alignment had too many indels to be reliable (Table S2). We also manually checked all alignments for potential problems (Rothfels et al., 2015). Command parameters for all steps can be found in Supplementary Methods.
Phylogenetic analyses
The nuclear and plastid sequence data were concatenated and analyzed using maximum parsimony (MP), maximum likelihood (ML), and coalescent approaches. For MP, PAUP* v4.0a142 (Swofford, 2002) was used to perform a heuristic search with 100 random addition sequence replicates and default parameters (TBR branch swapping with one tree held per replicate). MP support was evaluated with 1000 bootstrap replicates, each with 10 random addition sequence replicates. For ML reconstruction, gene-by-codon position partitions were created for the complete concatenated data set resulting in a total of 1128 initial partition subsets. These initial subsets were then grouped using the relaxed hierarchical clustering algorithm with a 1% search strategy (Lanfear et al., 2014) implemented in PartitionFinder v1.1.1 (Lanfear et al., 2012). The resulting partitioning scheme generated by PartitionFinder consisted of 112 subsets (see Supplementary Methods). The PartitionFinder scheme was analyzed with RAxML v8.1.24 (Stamatakis, 2014) with the GTR+Γ4 model of sequence evolution estimated for each partition subset and the topology linked across partitions. ML support was evaluated for the same partitioning scheme with 1000 bootstrap replicates, using the rapid bootstrap algorithm (Stamatakis, Hoover & Rougemont, 2008), and using the CAT25 approximation instead of Γ4, to model site-to-site rate heterogeneity (Stamatakis, 2006). The RAxML analysis was performed on the CIPRES web server (Miller, Pfeiffer & Schwartz, 2010). For coalescent analysis, first, best ML gene trees and bootstrap gene trees were generated in RAxML v8.1.17 using the GTR+Γ4 model. Best gene trees were the best tree of 20 independent searches and support was evaluated with 1000 bootstrap replicates. ASTRAL-II v4.7.8 (Mirarab & Warnow, 2015) was used to generate an optimal tree based on the best ML gene trees and support evaluated with the 1000 bootstrap replicates.
Results
Probe design, sequence capture, and alignment
All targeted regions for all individuals were successfully captured, although average coverage varied based on gene region (Fig. 2A), individual, and phylogenetic distance to the reference sequence (Fig. 2B). Members of the Musaceae, in general, captured better than any other family, likely because they are phylogenetically closest to the original genomic reference upon which the probes were designed. Within each family, close relatives of the species or taxon used to design the bait had higher success rates of capture than more distant members of the family. For example, Siphonochilus kirkii, had the lowest average coverage and capture efficiency for Zingiberaceae (Figs. 2B and 2C) as predicted by its evolutionary distance from the transcriptome-sequenced taxon Curcuma longa. The minimum percent identity of any captured sequence to its bait was 73%, while average distances were between 94–99% identity (Table S1). Of the total sequenced bases, the capture efficiency varied across individuals with the maximum percentage of bases mapping 3.5× higher than the minimum percentage (Fig. 2C). An average of 26% of captured bases mapped to target, which is similar to capture efficiency reported in captures of human mitochondrial DNA (Maricic, Whitten & Pääbo, 2010) and transcriptome based capture of chipmunk DNA (Bi et al., 2013). Despite the attempt to capture nuclear and plastid targets evenly, sequencing was highly biased towards plastid targets (Fig. 2C). There was some variability between individuals that was independent of phylogenetic distance, likely due to the standard variation in the success of DNA library preparation, which results from differences in DNA quality, genome size, and difficulties of accurately quantifying DNA for pooling in equimolar quantities. Any differences in DNA concentration were likely amplified in the post-hybridization PCR enrichment step.
Of the 494 nuclear probe exons, 124 were removed from further analyses based on coverage, BLAST results, skewed tree length, or alignment anomalies (Table S2). These 124 exons were from 110 genes. Twenty exons from 14 genes failed a test for coverage outliers because they had greater than 200× coverage, which is outside of the 99.99% confidence interval (Fig. S1). It is possible that these regions were either incorrectly annotated as nuclear regions in the Musa draft genome, or were transferred to the nuclear genome from more high copy genomes, especially considering that 15 of these exons were annotated as having an “unknown chromosomal location” in the Musa draft genome (Fig. 2A, Table S2). A total of 37 exons from 34 genes were removed from the nuclear dataset and 13 genes from the plastid dataset due to skewed tree length. Four nuclear exons from two genes were removed because of introduced frameshifts and ycf1 from the plastid was eliminated because of insertions and deletions in the alignment apparent after manual inspection. Finally, 63 additional exons from 61 genes were removed because of a top BLAST hit to a whole plastid genome, mitochondrial, transposon or ribosomal DNA. Of these 63 exons, the 27 ribosomal and 21 mitochondrial exons could likely be included in further analyses or within family specific analyses in future work after analyzing secondary structure and genomic location.
The final dataset of 308 nuclear genes had a total aligned length of 81,546 bp with 24,379 (29.9%) parsimony informative sites and an average coverage of 40 ± 13× (mean ± s.d.). The 68 gene plastid dataset had a total aligned length of 56,202 bp with 8,336 (14.8%) parsimony informative sites and an average coverage of 377 ± 589× (Table S2).
Phylogenetic analyses
The recovered topology (Fig. 3) places Musaceae as sister to all other families with 100% parsimony bootstrap support (pb) and maximum likelihood bootstrap support (mlb). The ginger families (Cannaceae, Costaceae, Zingiberaceae and Marantaceae) are well supported (100 pb/mlb) as monophyletic. The MP and ML trees are largely congruent and support values are generally high from shallow to deep phylogenetic relationships. The optimal coalescent tree does not conflict with the ML tree but there is low support for several nodes, notably the placement of Musaceae sister to the rest of the order (Fig. 3).
Discussion
This method functions to capture hundreds of loci across deep divergence, with successful capture across individual species that are divergent from the genomic data for which the baits were generated. Using several different taxa as bait and filtering genes for those found in all families ameliorated the problem of decreased capture efficiency as phylogenetic distance from probes increases. Furthermore, this protocol can be customized to any plant group and can often be generated with publically available data generated from previous studies. Despite deep phylogenetic divergence, the array-based capture was effective, enabling the avoidance of high efficiency, but costly, in-solution capture protocols. Although the method is limited by the necessity to find orthologs across transcriptomes of varying quality, generated in different labs, and under different conditions, the probe generation and filtration protocol successfully found hundreds of orthologous loci, which offered significant signal at the evolutionary depth of this study. The number of orthologous loci that are expected to be necessary to provide sufficient power to resolve questions asked should be considered when tailoring this pipeline for other systems. Future work will focus on limiting mistaken high copy and excessive plastid capture as well as minimizing the introduction of PCR duplicates.
Family relationships within Zingiberales have been studied since the mid-1950s (Tomlinson, 1956; Tomlinson, 1962). Based on morphological, anatomical, and developmental data a monophyletic ‘ginger’ clade (Zingiberaceae, Costaceae, Cannaceae and Marantaceae) has long been established (Dahlgren & Rasmussen, 1983; Kirchoff, 1988). However, there are no reliable estimates for the relationships among the other four families (i.e., the ‘banana’ lineages: Musaceae, Heliconiaceae, Lowiaceae, and Strelitziaceae) and the ginger clade despite several phylogenetic studies from combined genomic compartments and morphological data (Kress, 1990; Kress et al., 2001; Johansen, 2005). Even studies using plastome scale datasets failed to produce a well resolved phylogeny near the root of the Zingiberales (Barrett et al., 2014). Here, we show that a targeted exon capture generates phylogenomic scale data that can fruitfully address this problem and may be adapted for resolving ancient radiation in other plant groups. Our main finding suggests that Musaceae is the sister group to the remaining families of Zingiberales and that many other deep relationships within Zingiberales are well supported (Fig. 3). Recent studies of gene family evolution and gene duplication (Bartlett & Specht, 2010; Yockteng et al., 2013; Almeida, Yockteng & Specht, 2015) further support this placement of Musaceae. Relationships within individual Zingiberales families are also well supported in the ML and MP analyses (Fig. 3). The coalescent analysis using ASTRAL-II did not show support for some relationships, but the validity of applying these approaches remains unclear (Gatesy & Springer, 2013; Gatesy & Springer, 2014; Mirarab et al., 2014). Importantly, the relationships found here are not in conflict with existing well supported hypotheses for generic-level relationships (Kress, Prince & Williams, 2002; Johansen, 2005; Prince & Kress, 2006; Specht, 2006; Kress et al., 2007; Prince, 2010; Li et al., 2010; Cron et al., 2012), indicating that our method is identifying orthologs and that the data produced should be useful at finer phylogenetic scales as well a deep ones.
This pilot study is a first attempt at harnessing phylogenomic data from both the nuclear and plastid genomes to address the global phylogeny of Zingiberales. We have planned substantially increased taxon sampling for both ingroups and out groups and work is ongoing to incorporate morphological data from living and fossil representatives into a phylogenetic reconstruction pipeline to co-estimate fossil placement and lineage divergence times. This will permit us to make full use of information recorded in both the fossil record and genetic data to understand morphological evolution of floral and vegetative traits across the Zingiberales, and estimate ages of diversification for the major lineages, testing the hypothesis of an ancient and rapid radiation at the base of the order.
Supplemental Information
Acknowledgments
We thank Lydia Smith at the Evolutionary Genetics Lab and Ke Bi at the Computational Genomics Resource Lab at UC Berkeley for invaluable help and discussion with lab and bioinformatics work; Igor Antoshechkin at CalTech for help with the QSonica; the Huntington and UC Botanical Gardens for collections and hosting CS; Jerry Davis for collections and DNA; OneKP and MonAToL, especially Jim Leebens-Mack and Dennis Wm. Stevenson, for early access to raw transcriptome reads used for baits.
Funding Statement
This research was supported by NSF DEB 1257701 Collaborative Research Award to CDS and SYS, and NSF DEB 0816661 Research Opportunity Award Supplement to CDS and CFB. This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 Instrumentation Grants S10RR029668 and S10RR027303. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional Information and Declarations
Competing Interests
The authors declare that they have no competing interests.
Author Contributions
Chodon Sass conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
William J.D. Iles performed the experiments, analyzed the data, wrote the paper, prepared figures and/or tables, reviewed drafts of the paper.
Craig F. Barrett conceived and designed the experiments, performed the experiments, reviewed drafts of the paper.
Selena Y. Smith conceived and designed the experiments, reviewed drafts of the paper.
Chelsea D. Specht conceived and designed the experiments, contributed reagents/materials/analysis tools, wrote the paper, reviewed drafts of the paper.
DNA Deposition
The following information was supplied regarding the deposition of DNA sequences:
NCBI bioproject SRP066318
Data Deposition
The following information was supplied regarding data availability: https://github.com/chodon/zingiberales and NCBI bioproject SRP066318
References
- Almeida, Yockteng & Specht (2015).Almeida AMR, Yockteng R, Specht CD. Evolution of petaloidy in the Zingiberales: an assessment of the relationship between ultrastructure and gene expression patterns. Developmental Dynamics. 2015;244(9):1121–1132. doi: 10.1002/dvdy.24280. [DOI] [PubMed] [Google Scholar]
- Altschul et al. (1997).Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997;25(17):3389–3402. doi: 10.1093/nar/25.17.3389. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Angiosperm Phylogeny Group (2003).Angiosperm Phylogeny Group An update of the angiosperm phylogeny group classification for the orders and families of flowering plants: APG II. Botanical Journal of the Linnean Society. 2003;141(4):399–436. doi: 10.1046/j.1095-8339.2003.t01-1-00158.x. [DOI] [Google Scholar]
- Barrett et al. (2014).Barrett CF, Specht CD, Leebens-Mack J, Stevenson DW, Zomlefer WB, Davis JI. Resolving ancient radiations: can complete plastid gene sets elucidate deep relationships among the tropical gingers (Zingiberales)? Annals of Botany. 2014;113(1):119–133. doi: 10.1093/aob/mct264. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bartlett & Specht (2010).Bartlett ME, Specht CD. Evidence for the involvement of GLOBOSA-like gene duplications and expression divergence in the evolution of floral morphology in the Zingiberales. New Phytologist. 2010;187(2):521–541. doi: 10.1111/j.1469-8137.2010.03279.x. [DOI] [PubMed] [Google Scholar]
- Bi et al. (2012).Bi K, Vanderpool D, Singhal S, Linderoth T, Moritz C, Good JM. Transcriptome-based exon capture enables highly cost-effective comparative genomic data collection at moderate evolutionary scales. BMC Genomics. 2012;13:403. doi: 10.1186/1471-2164-13-403. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bi et al. (2013).Bi K, Linderoth T, Vanderpool D, Good JM, Nielsen R, Moritz C. Unlocking the vault: next-generation museum population genomics. Molecular Ecology. 2013;22(24):6018–6032. doi: 10.1111/mec.12516. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen & Smith (2013).Chen ST, Smith SY. Phytolith variability in Zingiberales: a tool for the reconstruction of past tropical vegetation. Palaeogeography, Palaeoclimatology, Palaeoecology. 2013;370:1–12. doi: 10.1016/j.palaeo.2012.10.026. [DOI] [Google Scholar]
- Crawford et al. (2012).Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. More than 1000 ultraconserved elements provide evidence that turtles are the sister group of archosaurs. Biology Letters. 2012;8(5):783–786. doi: 10.1098/rsbl.2012.0331. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cron et al. (2012).Cron GV, Pirone C, Bartlett M, Kress WJ, Specht C. Phylogenetic relationships and evolution in the Strelitziaceae (Zingiberales) Systematic Botany. 2012;37(3):606–619. doi: 10.1600/036364412X648562. [DOI] [Google Scholar]
- D’Hont et al. (2012).D’Hont A, Denoeud F, Aury J-M, Baurens F-C, Carreel F, Garsmeur O, Noel B, Bocs S, Droc G, Rouard M, Da Silva C, Jabbari K, Cardi C, Poulain J, Souquet M, Labadie K, Jourda C, Lengellé J, Rodier-Goud M, Alberti A, Bernard M, Correa M, Ayyampalayam S, Mckain MR, Leebens-Mack J, Burgess D, Freeling M, Mbéguié-A-Mbéguié D, Chabannes M, Wicker T, Panaud O, Barbosa J, Hribova E, Heslop-Harrison P, Habas R, Rivallan R, Francois P, Poiron C, Kilian A, Burthia D, Jenny C, Bakry F, Brown S, Guignon V, Kema G, Dita M, Waalwijk C, Joseph S, Dievart A, Jaillon O, Leclercq J, Argout X, Lyons E, Almeida A, Jeridi M, Dolezel J, Roux N, Risterucci A-M, Weissenbach J, Ruiz M, Glaszmann J-C, Quétier F, Yahiaoui N, Wincker P. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012;488(7410):213–217. doi: 10.1038/nature11241. [DOI] [PubMed] [Google Scholar]
- Dahlgren & Rasmussen (1983).Dahlgren RMT, Rasmussen FN. Monocotyledon evolution: characters and phylogenetic estimation. Evolutionary Biology. 1983;16:255–395. doi: 10.1007/978-1-4615-6971-8_7. [DOI] [Google Scholar]
- Davey et al. (2013).Davey MW, Gudimella R, Harikrishna JA, Sin LW, Khalid N, Keulemans J. A draft Musa balbisiana genome sequence for molecular genetics in polyploid, inter- and intra-specific Musa hybrids. BMC Genomics. 2013;14:683. doi: 10.1186/1471-2164-14-683. [DOI] [PMC free article] [PubMed] [Google Scholar]
- DePristo et al. (2011).DePristo MA, Banks E, Poplin R, Garimella K V, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genetics. 2011;43(5):491–498. doi: 10.1038/ng.806. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Dos Reis et al. (2012).Dos Reis M, Inoue J, Hasegawa M, Asher RJ, Donoghue PCJ, Yang Z. Phylogenomic datasets provide both precision and accuracy in estimating the timescale of placental mammal phylogeny. Proceedings of the Royal Society B: Biological Sciences. 2012;279(1742):3491–3500. doi: 10.1098/rspb.2012.0683. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Edwards, Johnstone & Thompson (1991).Edwards K, Johnstone C, Thompson C. A simple and rapid method for the preparation of plant genomic DNA for PCR analysis. Nucleic Acids Research. 1991;19(6):1349. doi: 10.1093/nar/19.6.1349. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Faircloth et al. (2012).Faircloth BC, McCormack JE, Crawford NG, Harvey MG, Brumfield RT, Glenn TC. Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales. Systematic Biology. 2012;61(5):717–726. doi: 10.1093/sysbio/sys004. [DOI] [PubMed] [Google Scholar]
- Friis (1988).Friis EM. Spirematospermum chandlerae sp. nov., an extinct species of Zingiberaceae from the North American Cretaceous. Tertiary Research. 1988;9:7–12. [Google Scholar]
- Friis, Crane & Pedersen (2011).Friis EM, Crane PR, Pedersen KR. Early flowers and angiosperm evolution. Cambridge: Cambridge University Press; 2011. [DOI] [Google Scholar]
- Gatesy & Springer (2013).Gatesy J, Springer MS. Concatenation versus coalescence versus “concatalescence.”. Proceedings of the National Academy of Sciences. 2013;110(13):e1584. doi: 10.1073/pnas.1221121110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gatesy & Springer (2014).Gatesy J, Springer MS. Phylogenetic analysis at deep timescales: unreliable gene trees, bypassed hidden support, and the coalescence/concatalescence conundrum. Molecular Phylogenetics and Evolution. 2014;80:231–266. doi: 10.1016/j.ympev.2014.08.013. [DOI] [PubMed] [Google Scholar]
- Hodges et al. (2009).Hodges E, Rooks M, Xuan ZY, Bhattacharjee A, Gordon DB, Brizuela L, McCombie WR, Hannon GJ. Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nature Protocols. 2009;4:960–974. doi: 10.1038/nprot.2009.68. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Iles et al. (2015).Iles WJD, Smith SY, Gandolfo MA, Graham SW. A review of monocot fossils suitable for molecular dating analyses. Botanical Journal of the Linnean Society. 2015;178(3):346–374. doi: 10.1111/boj.12233. [DOI] [Google Scholar]
- Johansen (2005).Johansen LB. Phylogeny of Orchidantha (Lowiaceae) and the Zingiberales based on six DNA regions. Systematic Botany. 2005;30(1):106–117. doi: 10.1600/0363644053661931. [DOI] [Google Scholar]
- Katoh et al. (2002).Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast fourier transform. Nucleic Acids Research. 2002;30(14):3059–3066. doi: 10.1093/nar/gkf436. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Katoh & Standley (2013).Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution. 2013;30(4):772–780. doi: 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kearse et al. (2012).Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, Thierer T, Ashton B, Mentjies P, Drummond A. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–1649. doi: 10.1093/bioinformatics/bts199. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kirchoff (1988).Kirchoff BK. Floral ontogeny and evolution in the ginger group of the Zingiberales. In: Leins P, Tucker SC, Endress PK, editors. Aspects of floral development. Berlin: Lubrecht & Cramer Ltd.; 1988. pp. 45–56. [Google Scholar]
- Koboldt et al. (2012).Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research. 2012;22:568–576. doi: 10.1101/gr.129684.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Konieczny & Ausubel (1993).Konieczny A, Ausubel FM. A procedure for mapping Arabidopsis mutations using co-dominant ecotype-specific PCR-based markers. The Plant Journal. 1993;4(2):403–410. doi: 10.1046/j.1365-313X.1993.04020403.x. [DOI] [PubMed] [Google Scholar]
- Kress (1990).Kress WJ. The phylogeny and classification of the Zingiberales. Annals of Missouri Botanical Garden. 1990;77(4):698–721. doi: 10.2307/2399669. [DOI] [Google Scholar]
- Kress et al. (2001).Kress WJ, Prince LM, Hahn WJ, Zimmer EA. Unraveling the evolutionary radiation of the families of the Zingiberales using morphological and molecular evidence. Systematic Biology. 2001;50(6):926–944. doi: 10.1080/106351501753462885. [DOI] [PubMed] [Google Scholar]
- Kress et al. (2007).Kress WJ, Newman MF, Poulsen AD, Specht C. An analysis of generic circumscriptions in tribe Alpinieae (Alpiniodeae: Zingiberaceae) Gardens’ Bulletin Singapore. 2007;59:113–128. [Google Scholar]
- Kress, Prince & Williams (2002).Kress WJ, Prince LM, Williams KJ. The phylogeny and a new classification of the gingers (Zingiberaceae): evidence from molecular data. American Journal of Botany. 2002;89(10):1682–1696. doi: 10.3732/ajb.89.10.1682. [DOI] [PubMed] [Google Scholar]
- Kress & Specht (2006).Kress WJ, Specht CD. The evolutionary and biogeographic origin and diversification of the tropical monocot order Zingiberales. In: Columbus JT, Friar EA, Hamilton CW, Porter JM, Prince LM, Simpson MG, editors. Monocots: Comparative Biology and Evolution. Claremont: Rancho Santa Ana Botanic Garden; 2006. pp. 619–630. [Google Scholar]
- Lanfear et al. (2012).Lanfear R, Calcott B, Ho SYW, Guindon S. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution. 2012;29(6):1695–1701. doi: 10.1093/molbev/mss020. [DOI] [PubMed] [Google Scholar]
- Lanfear et al. (2014).Lanfear R, Calcott B, Kainer D, Mayer C, Stamatakis A. Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evolutionary Biology. 2014;14:82. doi: 10.1186/1471-2148-14-82. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lemmon, Emme & Lemmon (2012).Lemmon AR, Emme SA, Lemmon EM. Anchored hybrid enrichment for massively high-throughput phylogenomics. Systematic Biology. 2012;61(5):727–744. doi: 10.1093/sysbio/sys049. [DOI] [PubMed] [Google Scholar]
- Li et al. (2009).Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li et al. (2010).Li L-F, Häkkinen M, Yuan Y-M, Hao G, Ge X-J. Molecular phylogeny and systematics of the banana family (Musaceae) inferred from multiple nuclear and chloroplast DNA fragments, with a special reference to the genus Musa. Molecular Phylogenetics and Evolution. 2010;57(1):1–10. doi: 10.1016/j.ympev.2010.06.021. [DOI] [PubMed] [Google Scholar]
- Löytynoja & Goldman (2005).Löytynoja A, Goldman N. An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences, USA. 2005;102(30):10557–10562. doi: 10.1073/pnas.0409137102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Magallón et al. (2015).Magallón S, Gómez-Acevedo S, Sánchez-Reyes LL, Hernández-Hernández T. A metacalibrated time-tree documents the early rise of flowering plant phylogenetic diversity. New Phytologist. 2015;207(2):437–453. doi: 10.1111/nph.13264. [DOI] [PubMed] [Google Scholar]
- Maricic, Whitten & Pääbo (2010).Maricic T, Whitten M, Pääbo S. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS ONE. 2010;5(11):e1584. doi: 10.1371/journal.pone.0014004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McKenna et al. (2010).McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meyer & Kircher (2010).Meyer M, Kircher M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harbor Protocols. 2010;2010(6):732–741. doi: 10.1101/pdb.prot5448. [DOI] [PubMed] [Google Scholar]
- Miller, Pfeiffer & Schwartz (2010).Miller MA, Pfeiffer W, Schwartz T. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. 2010 Gateway Computing Environments Workshop (GCE); 14–14 November; New Orleans, LA: IEEE; 2010. pp. 1–8. [DOI] [Google Scholar]
- Mirarab et al. (2014).Mirarab S, Bayzid MS, Boussau B, Warnow T. Statistical binning enables an accurate coalescent-based estimation of the avian tree. Science. 2014;346(6215):1250463. doi: 10.1126/science.1250463. [DOI] [PubMed] [Google Scholar]
- Mirarab & Warnow (2015).Mirarab S, Warnow T. ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics. 2015;31(12):i44–i52. doi: 10.1093/bioinformatics/btv234. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nielsen et al. (2011).Nielsen R, Paul JS, Albrechtsen A, Song YS. Genotype and SNP calling from next-generation sequencing data. Nature Reviews Genetics. 2011;12:443–451. doi: 10.1038/nrg2986. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peñalba et al. (2014).Peñalba JV, Smith LL, Tonione MA, Sass C, Hykin SM, Skipwith PL, McGuire JA, Bowie RCK, Moritz C. Sequence capture using PCR-generated probes: a cost-effective method of targeted high-throughput sequencing for non-model organisms. Molecular Ecology Resources. 2014;14:1000–1010. doi: 10.1111/1755-0998.12249. [DOI] [PubMed] [Google Scholar]
- Peterlongo & Chikhi (2012).Peterlongo P, Chikhi R. Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. BMC Bioinformatics. 2012;13:48. doi: 10.1186/1471-2105-13-48. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Prince (2010).Phylogenetic relationships and species delimitation in Canna (Cannaceae) In: Prince LM, editor; Seberg O, Petersen G, Barford, Davis JI, editors. Diversity, Phylogeny and Evolution in the Monocotyledons. Aarhus: Aarhus University Press; 2010. pp. 307–331. [Google Scholar]
- Prince & Kress (2006).Prince LM, Kress WJ. Phylogenetic relationships and classification in Marantaceae: insights from plastid DNA sequence data. Taxon. 2006;55(2):281–296. doi: 10.2307/25065578. [DOI] [Google Scholar]
- Ranwez et al. (2011).Ranwez V, Harispe S, Delsuc F, Douzery EJP. MACSE: multiple alignment of coding sequences accounting for frameshifts and stop codons. PLoS ONE. 2011;6(9):e1584. doi: 10.1371/journal.pone.0022594. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Reneker et al. (2012).Reneker J, Lyons E, Conant GC, Pires JC, Freeling M, Shyu C-R, Korkin D. Long identical multispecies elements in plant and animal genomes. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(19):E1183–E1191. doi: 10.1073/pnas.1121356109. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rodriguez-de la Rosa & Cevallos-Ferriz (1994).Rodriguez-de la Rosa RA, Cevallos-Ferriz SRS. Upper cretaceous zingiberalean fruits with in situ seeds from southeastern Coahuila, Mexico. International Journal of Plant Sciences. 1994;155(6):786–805. doi: 10.1086/297218. [DOI] [Google Scholar]
- Rothfels et al. (2015).Rothfels CJ, Li F-W, Sigel EM, Huiet L, Larsson A, Burge DO, Ruhsam M, Deyholos M, Soltis DE, Stewart CN, Shaw SW, Pokorny L, Chen T, DePamphilis C, DeGironimo L, Chen L, Wei X, Sun X, Korall P, Stevenson DW, Graham SW, Wong GK-S, Pryer KM. The evolutionary history of ferns inferred from 25 low-copy nuclear genes. American Journal of Botany. 2015;102(7):1089–1107. doi: 10.3732/ajb.1500089. [DOI] [PubMed] [Google Scholar]
- Schloss et al. (2009).Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology. 2009;75(23):7537–7541. doi: 10.1128/AEM.01541-09. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Singhal (2013).Singhal S. De novo transcriptomic analyses for non-model organisms: an evaluation of methods across a multi-species data set. Molecular Ecology Resources. 2013;13(3):403–416. doi: 10.1111/1755-0998.12077. [DOI] [PubMed] [Google Scholar]
- Specht (2006).Specht CD. Systematics and evolution of the tropical monocot family Costaceae (Zingiberales): a multiple dataset approach. Systematic Botany. 2006;31(1):89–106. doi: 10.1600/036364406775971840. [DOI] [Google Scholar]
- Stamatakis (2006).Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22(21):2688–2690. doi: 10.1093/bioinformatics/btl446. [DOI] [PubMed] [Google Scholar]
- Stamatakis (2014).Stamatakis A. A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. (RAxML version 8) 2014;30(9):1312–1313. doi: 10.1093/bioinformatics/btu033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stamatakis, Hoover & Rougemont (2008).Stamatakis A, Hoover P, Rougemont J. A rapid bootstrap algorithm for the RAxML Web servers. Systematic Biology. 2008;57(5):758–771. doi: 10.1080/10635150802429642. [DOI] [PubMed] [Google Scholar]
- Swofford (2002).Swofford DL. PAUP*: phylogenetic analysis using parsimony (*and other methods) version 4. 2002. Available at http://paup.csit.fsu.edu/about.html .
- Tomlinson (1956).Tomlinson PB. Studies in the systematic anatomy of the Zingiberaceae. Journal of the Linnean Society (Botany) 1956;55(361):547–592. doi: 10.1111/j.1095-8339.1956.tb00023.x. [DOI] [Google Scholar]
- Tomlinson (1962).Tomlinson PB. Phylogeny of the scitamineae—morphological and anatomical considerations. Evolution. 1962;16(2):192–213. doi: 10.2307/2406197. [DOI] [Google Scholar]
- Van der Auwera et al. (2013).Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA. From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Current Protocols in Bioinformatics. 2013;43:11.10.10–11.10.33. doi: 10.1002/0471250953.bi1110s43. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Weitemier et al. (2014).Weitemier K, Straub SCK, Cronn RC, Fishbein M, Schmickl R, McDonnell A, Liston A. Hyb-Seq: combining target enrichment and genome skimming for plant phylogenomics. Applications in Plant Sciences. 2014;2(9):1400042. doi: 10.3732/apps.1400042. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Whitfield & Kjer (2008).Whitfield JB, Kjer KM. Ancient rapid radiations of insects: challenges for phylogenetic analysis. Annual Review of Entomology. 2008;53:449–472. doi: 10.1146/annurev.ento.53.103106.093304. [DOI] [PubMed] [Google Scholar]
- Whitfield & Lockhart (2007).Whitfield JB, Lockhart PJ. Deciphering ancient rapid radiations. Trends in Ecology & Evolution. 2007;22(5):258–265. doi: 10.1016/j.tree.2007.01.012. [DOI] [PubMed] [Google Scholar]
- Yockteng et al. (2013).Yockteng R, Almeida AMR, Morioka K, Alvarez-Buylla ER, Specht CD. Molecular evolution and patterns of duplication in the SEP/AGL6-like lineage of the Zingiberales: a proposed mechanism for floral diversification. Molecular Biology and Evolution. 2013;30(11):2401–2422. doi: 10.1093/molbev/mst137. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.