Abstract
Premise
Rubiaceae is among the most species‐rich plant families, as well as one of the most morphologically and geographically diverse. Currently available phylogenies have mostly relied on few genomic and plastid loci, as opposed to large‐scale genomic data. Target enrichment provides the ability to generate sequence data for hundreds to thousands of phylogenetically informative, single‐copy loci, which often leads to improved phylogenetic resolution at both shallow and deep taxonomic scales; however, a publicly accessible Rubiaceae‐specific probe set that allows for comparable phylogenetic inference across clades is lacking.
Methods
Here, we use publicly accessible genomic resources to identify putatively single‐copy nuclear loci for target enrichment in two Rubiaceae groups: tribe Hillieae (Cinchonoideae) and tribal complex Palicoureeae+Psychotrieae (Rubioideae). We sequenced 2270 exonic regions corresponding to 1059 loci in our target clades and generated in silico target enrichment sequences for other Rubiaceae taxa using our designed probe set. To test the utility of our probe set for phylogenetic inference across Rubiaceae, we performed a coalescent‐aware phylogenetic analysis using a subset of 27 Rubiaceae taxa from 10 different tribes and three subfamilies, and one outgroup in Apocynaceae.
Results
We recovered an average of 75% and 84% of targeted exons and loci, respectively, per Rubiaceae sample. Probes designed using genomic resources from a particular subfamily were most efficient at targeting sequences from taxa in that subfamily. The number of paralogs recovered during assembly varied for each clade. Phylogenetic inference of Rubiaceae with our target regions resolves relationships at various scales. Relationships are largely consistent with previous studies of relationships in the family with high support (≥0.98 local posterior probability) at nearly all nodes and evidence of gene tree discordance.
Discussion
Our probe set, which we call Rubiaceae2270x, was effective for targeting loci in species across and even outside of Rubiaceae. This probe set will facilitate phylogenomic studies in Rubiaceae and advance systematics and macroevolutionary studies in the family.
Keywords: phylogenetics, probe set, Rubiaceae, target enrichment
Target enrichment has emerged as a relevant and widely applicable tool for generating large‐scale genomic data for resolving phylogenies across the tree of life. This reduced‐representation sequencing strategy is characterized by targeting pre‐selected single‐copy loci from across the genome, which are isolated using RNA probes before sequencing. The resulting data set consists of hundreds to thousands of individual nuclear loci sequenced across taxa (Andermann et al., 2019). These data sets are ideal for applying complex phylogenetic models, including species tree methods that incorporate incomplete lineage sorting and model reticulate evolution. This method also has the benefit of being robust to low‐quality, low‐quantity input DNA, meaning that specimens from natural history collections can serve as a DNA input source (Hart et al., 2016).
Studies employing target enrichment data can rely either on universal probe sets, which target loci that are highly conserved across major clades (e.g., angiosperms [Johnson et al., 2019] or flagellate plants [Breinholt et al., 2021]), or on lineage‐specific probe sets. The relative utility of lineage‐specific vs. universal probe sets depends on the genetic distance between the probe sequences and the target regions in the group of interest, as well as the scope of the phylogenetic study (Kadlec et al., 2017; Chau et al., 2018; Soto Gomez et al., 2019; Straub et al., 2020; Yardeni et al., 2022). While universal probe sets such as Angiosperms353 are an important community resource and have been successfully used to resolve evolutionary relationships at both macro‐ and micro‐evolutionary timescales (Smith et al., 2014; Slimp et al., 2021; Le et al., 2022), they also present challenges. For example, universal probes are developed to recover primarily highly conserved loci and may not be useful for resolving relationships at or below the species level, especially when applied to rapid radiations or very shallow phylogenetic splits. Difficulty with sequence assembly can arise due to the negative correlation between enrichment success and the degree of divergence between focal taxa and the taxa used for probe design (Liu et al., 2019). This can result in fewer and less complete loci when using a universal probe set as compared to a family‐specific one (Siniscalchi et al., 2021; Ufimov et al., 2021; Yardeni et al., 2022). This also decreases the number of informative characters available for phylogenetic inference. Furthermore, as the degree of ploidy is extremely variable across angiosperms, using a universal probe set can result in a high proportion of paralogs that complicate phylogenetic inference, particularly in groups where polyploidy is common (Frost et al., 2022).
Multiple studies have found that lineage‐specific probe sets outperform universal probe sets in species‐level phylogenetic studies (Ufimov et al., 2021; Yardeni et al., 2022). An advantage of using lineage‐specific probe sets is that they can be designed to be as taxon‐specific as the user wants, mitigating some of the issues with universal probe sets. As a result, loci from lineage‐specific probe sets tend to have more variable sites (Soto Gomez et al., 2019; Shah et al., 2021; Ufimov et al., 2021; Yardeni et al., 2022). Fortunately, predesigned, lineage‐specific probe sets exist for many major clades of plants, including Asteraceae (Mandel et al., 2014), Orchidaceae (Eserman et al., 2021), Brassicaceae (Nikolov et al., 2019), and Melastomataceae (Jantzen et al., 2020). For other groups, custom lineage‐specific probe sets can easily be designed for specific projects using a variety of existing pipelines (Chamala et al., 2015; Schmickl et al., 2016; Jantzen et al., 2020), as long as there are genomic resources (such as draft genomes or transcriptome references) available. Despite their shortcomings, the utility of universal probe sets cannot be refuted, and when financially feasible it may be advantageous to incorporate both universal and lineage‐specific probes in target enrichment studies (Hendriks et al., 2021).
Rubiaceae is the fourth largest flowering plant family, with ~13,500 species in ~620 genera (Plants of the World Online, 2023). While most of its genera and species are tropical, Rubiaceae species occur in nearly all habitats, ranging from taiga to rainforest, and on all continents including Antarctica. Along with geographical diversity, the family exhibits high variation in floral and fruit morphology, habit, and ecological interactions. This variation has inspired multiple studies to understand patterns of trait evolution (Bremer and Eriksson, 1992; Ferrero et al., 2012; Razafimandimbison et al., 2014; Ehrendorfer et al., 2019), and the family has been used as a model to characterize angiosperm macroevolution (Antonelli et al., 2009). While many phylogenetic studies have been performed for different clades in Rubiaceae (Löfstrand et al., 2019; Borges et al., 2021; Razafimandimbison et al., 2021; Amenu et al., 2022), as well as for the entire family (Bremer and Eriksson, 2009), these have mostly relied on a few loci. Only a handful of studies have used genomic‐scale phylogenetic data to resolve relationships within Rubiaceae, with most of these using the Angiosperms353 data (Antonelli et al., 2021; Canales et al., 2022; Thureborn et al., 2022) and another using microarray technology (Prata et al., 2018).
Here, we present the first lineage‐specific probe set for target enrichment phylogenetics of Rubiaceae. Using publicly accessible genomic resources for the family, we identified a set of 2270 exonic regions across 1059 low‐copy nuclear loci. We explored the performance of these loci using both in silico and in vitro methods, the latter targeting two distantly related and understudied groups: tribe Hillieae (Cinchonoideae) and tribal complex Palicoureeae+Psychotrieae (Rubioideae). To assess the strength of our probe set in recovering intertribal and subfamilial relationships, we also inferred phylogenetic relationships for a subset of 27 Rubiaceae taxa and an outgroup (Apocynaceae) using a coalescent‐aware approach. Our probe set was successful in recovering loci in Hillieae and Palicoureeae+Psychotrieae, as well as in other tribes of Rubiaceae, and inferred phylogenetic relationships were largely consistent with previous studies in the family. Therefore, we anticipate that this probe set will be useful for resolving relationships across the family and can complement studies employing universal probe sets. This resource will facilitate multi‐locus phylogenetic studies by the community of researchers studying the coffee family and improve our understanding of evolutionary relationships and macroevolutionary dynamics in this important plant group.
METHODS
Locus selection
Putatively single‐ to low‐copy nuclear loci were identified from existing Rubiaceae genomic data: six transcriptomes (Mapouria douarrei Beauvis., Psychotria marginata Sw., Carapichea ipecacuanha (Brot.) L. Andersson, Cinchona pubescens Vahl, Hamelia patens Jacq., and Neolamarckia cadamba (Roxb.) Bosser) and four genome‐skimming paired‐end read sets (Corynanthe mayumbensis (R. D. Good) Raym.‐Hamet ex N. Hallé, Mitragyna speciosa (Korth.) Havil., Gardenia jasminoides J. Ellis, and Galium odoratum (L.) Scop.). Transcriptomes were retrieved from the National Center for Biotechnology Information (NCBI) TSA database (https://www.ncbi.nlm.nih.gov/genbank/tsa/), the 1000 Plant Transcriptomes project (1KP) database (Matasci et al., 2014), and MedPlant RNA Seq Database (https://medplantrnaseq.org). Genome skim data were retrieved from the NCBI SRA database (https://www.ncbi.nlm.nih.gov/sra). The plastome from Gardenia jasminoides, available from the NCBI Genome database (https://www.ncbi.nlm.nih.gov/genome/), was used to identify plastome sequences. No mitochondrial reference was used in probe design.
Sondovač version 1.3 was used to identify putative orthologous loci from the publicly mined genomic resources (Schmickl et al., 2016). Briefly, Sondovač identifies single‐copy nuclear loci from low‐coverage genome‐skimming data that are mapped to a transcriptome while discarding non‐coding reads and those that map to organellar genomes. To reduce the presence of multi‐copy regions in our data set, duplicated transcripts with a BLAT score above 1000 were removed in Sondovač. The filtered genome reads were subsequently de novo assembled into exonic regions (ERs) using Geneious (version 2021.1.1; Biomatters, Auckland, New Zealand). Recovery of complete exons is not guaranteed, as this is determined by the quality of the input transcriptomes and genome skim data sets. ERs were filtered to maintain contigs ≥120 bp, and these were subsequently filtered to keep those that belonged to transcripts with a total length ≥360 bp across all contigs. Lastly, we removed ERs with ≥90% sequence similarity from the data set.
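The two length filters described above can be sketched as follows. This is a minimal illustration only: the `(transcript_id, er_id, sequence)` tuple layout and the function name are hypothetical and do not reflect Sondovač's actual output format, and we assume that the ≥360 bp total is computed over the contigs that survive the ≥120 bp filter.

```python
from collections import defaultdict

MIN_ER_LEN = 120            # minimum contig length (bp), as stated in the text
MIN_TRANSCRIPT_TOTAL = 360  # minimum summed ER length per transcript (bp)


def filter_exonic_regions(ers):
    """Apply both length filters to (transcript_id, er_id, sequence) tuples.

    The tuple layout is hypothetical and used only for illustration.
    """
    # Step 1: keep only contigs of at least 120 bp.
    long_enough = [er for er in ers if len(er[2]) >= MIN_ER_LEN]

    # Step 2: sum the retained contig lengths for each transcript
    # (assumption: the total is taken over retained contigs only).
    total_by_transcript = defaultdict(int)
    for transcript_id, _, seq in long_enough:
        total_by_transcript[transcript_id] += len(seq)

    # Step 3: keep contigs whose transcript totals at least 360 bp.
    return [er for er in long_enough
            if total_by_transcript[er[0]] >= MIN_TRANSCRIPT_TOTAL]


toy = [
    ("tx1", "er1", "A" * 200),
    ("tx1", "er2", "C" * 180),
    ("tx1", "er3", "G" * 90),   # dropped: shorter than 120 bp
    ("tx2", "er4", "T" * 150),  # dropped: transcript total is only 150 bp
]
kept = filter_exonic_regions(toy)
print([er_id for _, er_id, _ in kept])
```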
While Sondovač uses genome‐skimming data for one sample and one transcriptome at a time, previous research has found that the number of identified loci can be maximized if multiple genome‐skimming data sets are combined and paired with each available transcriptome (Uribe‐Converse, 2016; Bagley et al., 2020). Adopting this strategy, we combined all genome‐skimming reads belonging to the same subfamily, and the resulting composite genome‐skimming read sets were paired with each same‐subfamily transcriptome. Six separate Sondovač runs were performed in total, following the design in Table 1. This resulted in three sets of loci derived from Rubioideae genomic resources (RUB) and three sets derived from Cinchonoideae genomic resources (CIN). To identify loci shared among taxa, ERs with ≥90% sequence similarity across the six sets were clustered using cd‐hit‐est (V4.8.1; Li and Godzik, 2006), and the longest sequence in each cluster was retained. We subsequently removed unique ERs (i.e., those that clustered with no other ER) to avoid including sequences that are not shared across taxa, leaving 2270 final ERs, each of which may be a complete or partial exon.
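The clustering step can be illustrated with a short parser for the `.clstr` file that cd‐hit‐est writes alongside its FASTA output. The sketch below is an illustration under stated assumptions rather than the script used in this study: the ER names in the toy input are hypothetical, and a "unique" ER is taken to mean a cluster with a single member.

```python
import re

# Member lines in a cd-hit-est .clstr file look like:
#   0<TAB>480nt, >CIN2_er15... *
#   1<TAB>310nt, >RUB2_er8... at +/93.55%
MEMBER = re.compile(r"(\d+)nt, >(.+?)\.\.\.")


def parse_clusters(clstr_text):
    """Return a list of clusters, each a list of (name, length) tuples."""
    clusters = []
    for line in clstr_text.splitlines():
        if line.startswith(">Cluster"):
            clusters.append([])
            continue
        match = MEMBER.search(line)
        if match and clusters:
            clusters[-1].append((match.group(2), int(match.group(1))))
    return clusters


def shared_representatives(clusters):
    """Drop singleton clusters and keep the longest member of the rest."""
    reps = []
    for members in clusters:
        if len(members) < 2:
            continue  # unique ER: not shared with any other set
        name, _ = max(members, key=lambda member: member[1])
        reps.append(name)
    return reps


# Toy input with hypothetical ER names.
example = (
    ">Cluster 0\n"
    "0\t480nt, >CIN2_er15... *\n"
    "1\t310nt, >RUB2_er8... at +/93.55%\n"
    ">Cluster 1\n"
    "0\t250nt, >CIN1_er3... *\n"
)
print(shared_representatives(parse_clusters(example)))
```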
Table 1.
Sondovač design and output. Sources of genomic resources are included in Appendix 2.
| Locus set | Transcriptome input | Genome skim input | Plastome input | Unfiltered ERs | Final ERs |
|---|---|---|---|---|---|
| RUB1 | Mapouria douarrei | Combined Galium paired‐end reads | Gardenia jasminoides | 273 | 38 |
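| RUB2 | Carapichea ipecacuanha | Combined Galium paired‐end reads | Gardenia jasminoides | 1930 | 429 |
| RUB3 | Psychotria marginata | Combined Galium paired‐end reads | Gardenia jasminoides | 1676 | 201 |
| CIN1 | Hamelia patens | Combined Neolamarckia and Corynanthe paired‐end reads | Gardenia jasminoides | 2904 | 462 |
| CIN2 | Neolamarckia cadamba | Combined Neolamarckia and Corynanthe paired‐end reads | Gardenia jasminoides | 12,209 | 834 |
| CIN3 | Cinchona pubescens | Combined Neolamarckia and Corynanthe paired‐end reads | Gardenia jasminoides | 1108 | 306 |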
Note: ER = exonic region.
To determine whether our developed locus data set shared sequences with the Angiosperms353 data set, we ran a BLAST search of our 2270 ERs against the Angiosperms353 representative sequences (Johnson et al., 2019; https://github.com/mossmatters/Angiosperms353/tree/master/Probes).
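As an illustration of how such a search can be summarized, the sketch below counts query ERs with at least one hit longer than a length threshold. It assumes BLAST tabular output (`-outfmt 6`, in which the first column is the query ID and the fourth is the alignment length); the output format, the function name, and the sequence identifiers in the toy data are our assumptions and are not taken from the study.

```python
def queries_with_long_hits(blast_tabular, min_len=120):
    """Return the set of query IDs with at least one hit longer than min_len.

    Assumes BLAST tabular output (-outfmt 6): column 1 is the query ID and
    column 4 is the alignment length.
    """
    hits = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, aln_len = fields[0], int(fields[3])
        if aln_len > min_len:
            hits.add(query_id)
    return hits


# Toy rows with hypothetical query and subject identifiers.
toy_rows = [
    ["ER_0001", "A353_gene1", "88.2", "245", "29", "0",
     "1", "245", "10", "254", "1e-60", "230"],
    ["ER_0001", "A353_gene1", "91.0", "100", "9", "0",
     "250", "349", "300", "399", "1e-20", "95"],
    ["ER_0002", "A353_gene2", "85.0", "118", "17", "1",
     "5", "122", "40", "157", "1e-18", "90"],
]
toy_blast = "\n".join("\t".join(row) for row in toy_rows)
print(sorted(queries_with_long_hits(toy_blast)))
```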
Sampling and DNA extraction for target enrichment
We included 132 species of Rubiaceae, representing 164 samples. Sampling was mostly from two distantly related clades: (1) Hillieae (Cinchonoideae), represented by 26 species and 44 accessions in the genera Hillia, Cosmibuena, and Balmea (90% of all species in the clade); and (2) Palicoureeae and the closely related Psychotrieae (Rubioideae), represented by 102 species and 109 accessions in Palicourea, Notopleura, Rudgea, Carapichea, and Eumachia in Palicoureeae and Psychotria in Psychotrieae (~5% of all species in the clade; Appendix 1). Our sampling also included additional members of Cinchonoideae: one Hamelia patens (Hamelieae), one Hoffmannia phoenicopoda K. Schum. (Hamelieae), one Coutarea hexandra (Jacq.) K. Schum. (Chiococceae), and one Ferdinandusa paraensis Ducke (Dialypetalantheae) (Appendix 1). From the total samples included in this study, 79% came from herbarium specimens deposited at MO or LSU (acronyms following Thiers, 2023), including all Hillieae accessions (Appendix 1).
We extracted total genomic DNA from leaf tissue using a sorbitol extraction protocol adapted from Štorchová et al. (2000). DNA concentration was assessed with a Qubit fluorometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA), and fragment size distribution was visualized with gel electrophoresis (1% agarose). Sample extractions that yielded low DNA concentrations were repeated and then pooled together in order to isolate enough DNA for sequencing (i.e., ≥280 ng).
Target enrichment probe design, sequencing, and assembly
Probe design, library preparation, and sequencing were conducted by RAPiD Genomics (Gainesville, Florida, USA). Using the 2270 identified ERs as templates and a proprietary workflow, RAPiD Genomics synthesized 14,429 biotinylated 120‐mer RNA probes in an overlapping strategy to reach full coverage of each locus. Following library preparation and target enrichment, 150‐bp paired‐end read sequencing was conducted in one lane of an Illumina NovaSeq S4 (Illumina, San Diego, California, USA).
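Because the RAPiD Genomics workflow is proprietary, the exact tiling scheme is not known to us. The sketch below shows only the general idea of overlapping 120‐mer tiling with full coverage of a template; the 60‐bp step and the function name are assumptions chosen for illustration.

```python
def tile_probes(template, probe_len=120, step=60):
    """Tile a template with overlapping probes so every base is covered.

    The step size is an illustrative assumption, not the vendor's setting.
    Templates shorter than one probe return an empty list.
    """
    if len(template) < probe_len:
        return []
    last_start = len(template) - probe_len
    starts = list(range(0, last_start + 1, step))
    # Add a final probe flush with the 3' end if the regular grid misses it.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [template[s:s + probe_len] for s in starts]


# A 323-bp toy template (323 bp is the mean ER length reported in this study).
toy_template = ("ACGT" * 81)[:323]
toy_probes = tile_probes(toy_template)
print(len(toy_probes), {len(p) for p in toy_probes})
```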
The quality of demultiplexed raw paired‐end sequenced reads was assessed in FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and adapter sequences were trimmed with Trimmomatic (version 0.39; Bolger et al., 2014) using default settings. Cleaned reads were assembled with HybPiper (version 2.0.1; Johnson et al., 2016). For each clade of interest (i.e., Hillieae and Palicoureeae+Psychotrieae), we applied two approaches to data assembly to assess performance of (1) individual ERs or single‐exonic regions (SERs) and (2) ERs concatenated into multi‐exonic regions (MERs). With the first approach, we were able to more directly assess the utility and performance of our target sequences. The second approach allows us to recover longer sequences that likely include more informative sites useful for gene tree inference. Analyzing exons individually may be desirable for data sets that have a high proportion of multi‐copy loci and paralogs in which the assembly of supercontigs (i.e., sequences containing both coding and non‐coding flanking sequences) that span multiple exons can result in chimeric sequences (Morales‐Briones et al., 2022).
In the first approach, the individual 2270 ERs were added to the HybPiper target file and used as references for assembly. In the second approach, all ERs within each gene were concatenated within the target file. Same‐gene ERs were identified by mapping the 2270 ER sequences to multiple sets of assembled CapSim captured fragments (see section below) in an iterative fashion (Appendix S1; see Supporting Information with this article) using the Geneious mapper function in Geneious (version 2021.1.1). This approach allowed us to concatenate ERs into loci without knowledge of their synteny within each locus.
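The concatenation of same‐gene ERs into MER targets can be sketched as follows. The ER‐to‐gene mapping, the identifiers, and the `Rubiaceae` source label are hypothetical; the `>source-gene` header style follows our understanding of the HybPiper target file convention and should be checked against the HybPiper documentation. ERs are joined in input order because their synteny is unknown.

```python
def build_mer_targets(er_seqs, er_to_gene, source="Rubiaceae"):
    """Concatenate ERs that belong to the same gene into one target each.

    er_seqs maps ER ID -> sequence; er_to_gene maps ER ID -> gene ID.
    ERs are joined in input order (their true order in the gene is unknown).
    """
    grouped = {}
    for er_id, seq in er_seqs.items():
        grouped.setdefault(er_to_gene[er_id], []).append(seq)
    return {f"{source}-{gene}": "".join(parts)
            for gene, parts in grouped.items()}


def to_fasta(records):
    """Format a {header: sequence} mapping as FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records.items())


# Toy data with hypothetical identifiers.
toy_ers = {"er1": "ACGTAC", "er2": "GGGTTT", "er3": "TTTAAACC"}
toy_map = {"er1": "gene001", "er2": "gene002", "er3": "gene001"}
mer_targets = build_mer_targets(toy_ers, toy_map)
print(to_fasta(mer_targets), end="")
```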
In both the SER and MER assemblies, we used the HybPiper intronerate.py script to extract supercontigs. We calculated summary statistics for all assemblies (Table 2, Appendices S2 and S3) and identified putative paralogs with HybPiper.
Table 2.
HybPiper single‐exonic region (SER) and multi‐exonic region (MER) assembly statistics averaged for each clade. Statistics are presented as SER/MER. Additional HybPiper statistics are provided in Appendix S6.
| Statistics | Hillieae | Other CIN | Palicoureeae+Psychotrieae | Other RUB | Ixoroideae | Apocynaceae |
|---|---|---|---|---|---|---|
| PctOnTarget | 85/84 | 58/61 | 74/74 | 39/39 | 32/32 | 48/48 |
| GenesWithSeqs | 1736/883 | 1377/685 | 1786/943 | 895/433 | 1204/577 | 403/224 |
| GenesAt25pct | 1712/819 | 1355/625 | 1759/865 | 869/393 | 1168/520 | 391/199 |
| GenesAt50pct | 1701/794 | 1340/593 | 1754/851 | 865/384 | 1162/510 | 384/189 |
| GenesAt75pct | 1669/764 | 1294/557 | 1733/823 | 847/370 | 1139/496 | 365/175 |
| GenesAt150pct | 1341/549 | 882/417 | 1513/640 | 688/333 | 910/443 | 242/117 |
| ParalogWarningsDepth | 596/274 | 383/153 | 336/124 | 37/17 | 65/27 | 10/5 |
Note: CIN = Cinchonoideae; RUB = Rubioideae.
Testing loci and probe performance across Rubiaceae using simulations
We performed multiple in silico target enrichment experiments using CapSim to (1) assess whether loci that were identified via Sondovač would be captured in other, distantly related Rubiaceae taxa prior to probe design and (2) assess how well the final probes would perform on taxa from across Rubiaceae (Cao et al., 2018). Briefly, CapSim performs in silico targeted enrichment experiments by taking a set of sequences and an assembled genome as input and simulates probe hybridization and Illumina sequencing, resulting in a set of captured fragments representing paired‐end reads (Cao et al., 2018).
First, we ran CapSim using five Rubiaceae assembled genomes from GenBank (Appendix 2): three Ixoroideae (Coffea canephora Pierre ex A. Froehner, Coffea eugenioides S. Moore, and Gardenia jasminoides) and two Rubioideae (Leptodermis oblonga Bunge and Ophiorrhiza pumila Champ. ex Benth.). We also included one outgroup Apocynaceae, also in Gentianales (Rhazya stricta Decne.). The classification of subfamily Ixoroideae has been in disagreement, with mixed evidence for the monophyly of Ixoroideae and its sister subfamily, Cinchonoideae (Rydin et al., 2017; Antonelli et al., 2021). Throughout this paper, we recognize both traditional subfamilies: Ixoroideae and Cinchonoideae sensu stricto. In these runs, the full length of the 2270 filtered ERs served as probes. While using CapSim in this way is not directly comparable to an in vitro target enrichment experiment that uses 120‐bp probe sequences, it allows us to determine whether our loci are conserved across Rubiaceae, and thus, their likely empirical utility. In each CapSim run, we simulated Illumina sequencing and chose to produce between 10 and 40 million captured fragments (see Appendix S4), approximating real sequencing effort. Each resulting set of captured fragments was mapped to its original reference genome assembly and contigs were de novo assembled while trimming paired‐end read overhangs in Geneious (version 2021.1.1), using the ‘Geneious mapper’ function under default settings. After assembling contigs from the CapSim output for these six taxa, we used the total number of contigs assembled as a proxy for sequence recovery from our probe set, with a higher number indicating greater utility within that clade.
After validating the performance of ERs across Rubiaceae using the first CapSim approach described above, we assessed how well the final designed probes would perform on other taxa from across Rubiaceae. We reran CapSim, using the 14,429 120‐mer RNA probe sequences, rather than the full‐length ERs as above. We simulated 10 million captured fragments from all assembled genomes listed in Appendix S5. This approach approximates how the probes designed by RAPiD Genomics from our target loci will perform on other Rubiaceae taxa, especially those distantly related to our focal subfamilies Rubioideae and Cinchonoideae for which we did not generate empirical target enrichment data. We followed the same downstream methodology as stated in the previous section: HybPiper was used to assemble captured fragments using both approaches (i.e., using SERs and MERs), calculate summary statistics, and identify potential paralogs.
To assess the informativeness of the sequence data captured across taxonomic scales in Rubiaceae, we created multiple sequence alignments of the ER assemblies (Table 3) with MAFFT (Katoh and Standley, 2013). Summary statistics were calculated for each alignment set with AMAS (Borowiec, 2016). No cleaning was performed on these alignments prior to calculating summary statistics.
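For readers unfamiliar with these summary statistics, the sketch below counts variable and parsimony‐informative sites in a toy alignment. It is not AMAS code: we assume that gaps, `?`, and `N` are ignored, that a variable site has at least two distinct bases, and that a parsimony‐informative site has at least two bases that each occur in at least two sequences. AMAS may treat ambiguity codes differently.

```python
from collections import Counter

MISSING = set("-?Nn")  # characters treated as missing data (assumption)


def site_stats(alignment):
    """Count variable and parsimony-informative columns in an alignment.

    alignment is a list of equal-length sequence strings.
    """
    n_variable = 0
    n_informative = 0
    for column in zip(*alignment):
        counts = Counter(b.upper() for b in column if b not in MISSING)
        # Variable: at least two distinct bases observed.
        if len(counts) >= 2:
            n_variable += 1
        # Parsimony informative: at least two bases seen twice or more.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            n_informative += 1
    length = len(alignment[0])
    return {
        "alignment_length": length,
        "variable_sites": n_variable,
        "parsimony_informative_sites": n_informative,
        "proportion_variable": n_variable / length,
    }


toy_alignment = [
    "ACGTA-",
    "ACGTC-",
    "ATGAA?",
    "ATGAA-",
]
stats = site_stats(toy_alignment)
print(stats)
```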
Table 3.
AMAS single‐exonic region alignment statistics, showing average values for each statistic with ranges in parentheses. Additional AMAS statistics are provided in Appendix S8.
| Statistics | Hillieae | Palicoureeae+Psychotrieae | All combined^a |
|---|---|---|---|
| No_of_taxa | 32 (2–41) | 90 (2–109) | 123 (2–164) |
| Alignment_length (bp) | 1510 (282–12,591) | 2199 (227–32,805) | 4184 (389–41,669) |
| Missing_percent (%) | 52 (7–80) | 56 (1.5–90) | 73 (14–95) |
| No_variable_sites | 247 (1–1237) | 808 (2–9764) | 1473 (11–13,407) |
| Proportion_variable_sites | 0.2 (0.002–0.5) | 0.4 (0.003–0.7) | 0.4 (0.01–0.6) |
| Parsimony_informative_sites | 85 (0–674) | 386 (0–5462) | 710 (0–7080) |
| Proportion_parsimony_informative | 0.06 (0–0.3) | 0.2 (0–0.5) | 0.2 (0–0.5) |
^a “All combined” alignments include Hillieae, Palicoureeae+Psychotrieae, and all outgroups (including CapSim samples).
Phylogenetic inference
To assess the phylogenetic utility of our target sequences at different taxonomic scales and to compare our SER and MER data sets, we inferred two coalescent‐aware phylogenetic trees using each data type and a subset of our sampled taxa. We included a small group of 10 closely related Palicourea in subfamily Rubioideae, three species from tribe Hillieae in subfamily Cinchonoideae (Hillia triflora var. triflora (Oerst.) C. M. Taylor, Balmea stormiae Martínez, and Cosmibuena grandiflora (Ruiz & Pav.) Rusby), and two species from Hamelieae in subfamily Cinchonoideae (Hamelia patens and Hoffmannia phoenicopoda). We also included all taxa from the CapSim experiments: four Rubioideae from four tribes (Morindeae, Paederieae, Rubieae, and Ophiorrhizeae), four Ixoroideae from two tribes (Coffeeae and Gardenieae), two Cinchonoideae from two tribes (Cinchoneae and Naucleeae), and outgroup Rhazya stricta from Apocynaceae (Appendix 2). We excluded all loci flagged as paralogous by HybPiper from the analysis, leaving a remainder of 356 out of 2270 SERs and 428 out of 1059 MERs to be used for downstream analyses. In analyses of both data sets, sequences were aligned using MAFFT (Katoh and Standley, 2013). Columns with more than 20% missing data were removed from each alignment using Phyx version 1.3 (Brown et al., 2017), and spurious sequences, here defined as sequences shorter than 20% of the total alignment length, were removed using trimAl (Capella‐Gutiérrez et al., 2009). We also removed erroneous sequences from alignments using TAPER (Zhang et al., 2021).
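The two threshold‐based cleaning steps can be sketched in plain Python as follows. This is an illustration of the logic only and does not call Phyx or trimAl, whose exact definitions of missing data and sequence length may differ; here we assume that `-`, `?`, and `N` count as missing and that sequence length is the number of non‐missing characters after column filtering.

```python
MISSING = set("-?Nn")  # characters treated as missing data (assumption)


def drop_sparse_columns(aln, max_missing=0.2):
    """Remove columns in which more than max_missing of taxa lack data."""
    names = list(aln)
    rows = [aln[name] for name in names]
    keep = [
        i for i, column in enumerate(zip(*rows))
        if sum(base in MISSING for base in column) / len(column) <= max_missing
    ]
    return {name: "".join(aln[name][i] for i in keep) for name in names}


def drop_short_sequences(aln, min_fraction=0.2):
    """Remove sequences with data at fewer than min_fraction of columns."""
    length = len(next(iter(aln.values())))
    return {
        name: seq for name, seq in aln.items()
        if sum(base not in MISSING for base in seq) >= min_fraction * length
    }


toy_aln = {
    "t1": "ACGTACGTAC",
    "t2": "ACGTACGTAC",
    "t3": "ACGTAC--AC",
    "t4": "ACGTACG-AC",
    "t5": "A---------",
}
cleaned = drop_short_sequences(drop_sparse_columns(toy_aln))
print(cleaned)
```

In the toy alignment, the two columns missing in more than one of five taxa are removed first, after which `t5` retains a single base across eight columns and is discarded as spurious.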
After cleaning the alignments, gene trees were inferred using RAxML‐NG (Kozlov et al., 2019) under the GTR+G substitution model. This method automatically excludes alignments with fewer than four taxa. All other gene trees were kept, but nodes with less than 20% support were collapsed using Newick Utilities version 1.6 (Junier and Zdobnov, 2010). Species trees were inferred using ASTRAL III (Zhang et al., 2018) with default settings. A total of 322 and 406 gene trees (with four or more taxa) from the SER and MER data sets, respectively, were inferred and used as input for each ASTRAL analysis. We used the local posterior probability scores (LPP) and normalized quartet scores provided by ASTRAL to estimate branch support and explore gene tree discordance.
RESULTS
Locus selection and testing
We identified a total of 20,100 ERs across the six runs with Sondovač. The initial run of cd‐hit‐est resulted in 17,464 clusters, although most (15,194 = 87%) included only a single ER. These unique ERs predominantly originated from the CIN1 and CIN2 ER sets (Table 1), likely due to the relatively high quality and large size of the Hamelia patens and Neolamarckia cadamba transcriptomes. After removal of unique ERs, the set was reduced to 2270 clusters; 662 consisted of sequences uniquely mined from Rubioideae, 1589 consisted of sequences uniquely mined from Cinchonoideae, and 19 included sequences from both Rubioideae and Cinchonoideae. Of the representative sequences chosen for the 19 clusters shared between subfamilies, six came from Rubioideae and 13 from Cinchonoideae. Below, we refer to loci that are derived from Rubioideae sequences as RUB and those derived from Cinchonoideae as CIN. ER length was 120–4841 bp (x̄ = 323 bp); concatenating same‐gene ERs resulted in a set of 1059 MER loci 125–4841 bp long (x̄ = 669 bp). Only 21 ERs (0.9%) had hits longer than 120 bp corresponding to Angiosperms353 loci.
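The counts reported above and in Table 1 can be cross‐checked with simple arithmetic. All numbers below are taken from the text and Table 1; the interpretation that each subfamily's final ER total equals its subfamily‐only clusters plus the shared clusters represented by that subfamily is our reading, under which the figures agree.

```python
# Unfiltered and final ER counts per Sondovač run, from Table 1.
unfiltered_ers = {"RUB1": 273, "RUB2": 1930, "RUB3": 1676,
                  "CIN1": 2904, "CIN2": 12209, "CIN3": 1108}
final_ers = {"RUB1": 38, "RUB2": 429, "RUB3": 201,
             "CIN1": 462, "CIN2": 834, "CIN3": 306}

total_unfiltered = sum(unfiltered_ers.values())
total_final = sum(final_ers.values())

# Cluster counts reported in the text.
clusters, singletons = 17464, 15194
rub_only, cin_only, shared = 662, 1589, 19
shared_rub_reps, shared_cin_reps = 6, 13

rub_final = sum(n for run, n in final_ers.items() if run.startswith("RUB"))
cin_final = sum(n for run, n in final_ers.items() if run.startswith("CIN"))

print(total_unfiltered, total_final)            # totals across the six runs
print(clusters - singletons)                    # clusters left after removing unique ERs
print(round(100 * singletons / clusters))       # percentage of singleton clusters
print(rub_final == rub_only + shared_rub_reps)  # RUB totals agree
print(cin_final == cin_only + shared_cin_reps)  # CIN totals agree
```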
The first round of CapSim target enrichment simulations to assess the performance of our probe set consistently resulted in sets of >500 assembled contigs despite variation in the size of the input assembled genome and number of reads simulated (Appendix S4). While paralogs and contigs assembled from off‐target reads may add to the total number of contigs assembled per input genome, the relatively high number of contigs assembled for all Rubiaceae taxa included suggests phylogenetic utility of our selected loci across the family.
Sequencing, assembly, and performance of probes across Rubiaceae
Silica‐dried specimens tended to yield DNA with higher molecular weight than herbarium specimens (25 samples had DNA with a high enough molecular weight to be sheared prior to sequencing and only one of these was derived from a herbarium specimen); however, total DNA quantity varied considerably among these samples. While the samples with the lowest DNA quantities were derived from herbarium specimens, the sample with the most DNA also came from a herbarium specimen with degraded DNA. Of the 164 samples for which empirical target enrichment was attempted, 153 (~93%) were successfully sequenced (Appendix 1). All of the samples that failed sequencing were derived from herbarium specimens. Sequenced samples include at least one sample representing each of 25 Hillieae species (86% of all species), three other Cinchonoideae species (Hamelia patens, Hoffmannia phoenicopoda, and Coutarea hexandra), 97 Palicoureeae (~10% of all species), and 12 Psychotrieae (~2% of all species). On average, 4,684,470 reads were sequenced per sample.
HybPiper assembly statistics for SERs and MERs averaged for each clade (Rubioideae, Cinchonoideae, Ixoroideae, Apocynaceae) are summarized in Table 2 (additional statistics are provided in Appendix S6). HybPiper statistics for each sample are provided in Appendices S2 and S3 for SERs and MERs, respectively. Both newly sequenced samples and those that underwent CapSim simulation using the final designed probes are included in the assembly results (Table 2). The percentage of reads that mapped to sequences in the HybPiper target file (PctOnTarget) varied from 30% to 96% across all samples. CapSim‐derived samples had a lower PctOnTarget, averaging 36% for SER assemblies, while all other samples had an average of 77%. We recovered an average of 1712 SERs and 892 MERs per Rubiaceae sample, amounting to 75% and 84% of SER and MER targets, respectively. Only six samples (five Hillieae and one Apocynaceae) had a recovery of less than 25% of SERs and MERs. Most SERs (71%) had both ≥75% of the locus length recovered and ≥70% taxon coverage.
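The joint completeness criterion (≥75% of the locus length recovered and ≥70% taxon coverage) can be sketched as below. The exact definition used in the study is not spelled out here, so this sketch assumes that taxon coverage is the fraction of samples with any sequence for a locus and that length recovery is the mean recovered fraction among those samples; the data structures and names are hypothetical.

```python
def loci_passing(recovered, target_len, min_len_frac=0.75, min_taxon_frac=0.70):
    """Return loci that meet both the length and taxon-coverage thresholds.

    recovered maps sample -> {locus: recovered length in bp};
    target_len maps locus -> target (reference) length in bp.
    """
    samples = list(recovered)
    passing = []
    for locus, tlen in target_len.items():
        fractions = [recovered[s].get(locus, 0) / tlen for s in samples]
        present = [f for f in fractions if f > 0]
        if not present:
            continue  # locus not recovered in any sample
        taxon_coverage = len(present) / len(samples)
        mean_recovery = sum(present) / len(present)
        if taxon_coverage >= min_taxon_frac and mean_recovery >= min_len_frac:
            passing.append(locus)
    return passing


# Toy data: four samples and three loci with hypothetical names.
toy_targets = {"L1": 300, "L2": 200, "L3": 400}
toy_recovered = {
    "s1": {"L1": 300, "L2": 200, "L3": 100},
    "s2": {"L1": 280, "L2": 190, "L3": 120},
    "s3": {"L1": 250, "L3": 90},
    "s4": {"L1": 290, "L2": 180},
}
passing_loci = loci_passing(toy_recovered, toy_targets)
print(passing_loci, len(passing_loci) / len(toy_targets))
```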
In both the SER and MER assemblies, Palicoureeae+Psychotrieae samples had the greatest number of genes with assembled sequences on average, followed by Hillieae, other Cinchonoideae, Ixoroideae, other Rubioideae, and Apocynaceae (Table 2). Generally, the number and completeness of genes assembled for a given sample was greater when the sample originated from the same clade as the genomic source from which the probe sequence was derived (Figure 1, Appendix S7). For Rubioideae assemblies, the percentage of the locus length recovered by HybPiper was higher for loci captured by probes derived from Rubioideae (RUB) vs. the probes derived from Cinchonoideae (CIN). Within this clade, 91% of SERs captured by RUB probes had both ≥75% of the locus length recovered and ≥70% taxon coverage, compared to 67% of SERs captured by CIN probes. Similarly for Cinchonoideae, the percentage of the locus length recovered is higher for loci captured by CIN probes. In this clade, 92% of SERs captured by CIN probes had both ≥75% of the locus length recovered and ≥70% taxon coverage, compared to 62% of SERs captured by RUB probes. For the Ixoroideae assemblies, completeness (in terms of taxon coverage and the percentage of the locus length recovered) was also higher for loci captured by CIN probes. Genes assembled for Rhazya stricta (Apocynaceae) had similar capture for both RUB and CIN probes. We observed no significant differences in the number of sequenced reads (two‐tailed t‐test P value = 0.14, α = 0.05) or locus completeness (two‐tailed t‐test P value = 0.054, α = 0.05) between samples derived from herbarium and silica‐dried tissue.
Figure 1.

Heatmap showing the results of HybPiper single‐exonic region (SER) assemblies for Cinchonoideae (CIN), Rubioideae (RUB), and Ixoroideae (IXOR) samples, and one Apocynaceae sample. Target SERs are on the x‐axis and are grouped by the taxonomic source of the reference sequence. “RUB+CIN” indicates the 19 SERs from either RUB or CIN that shared ≥90% sequence similarity with one or more SERs from other clades during cd‐hit‐est clustering. Samples are on the y‐axis and are grouped by taxonomy. Darker shades represent higher percentages of the locus length recovered by HybPiper. Locus completeness tends to be higher when target species are from the same clade as the probe sequences.
The number of potential paralogs flagged by HybPiper, as identified by the number of loci for which the coverage depth of coding sequences extracted by Exonerate (Slater and Birney, 2005) within HybPiper was >1 for 75% of the length of the reference sequence (ParalogWarningsDepth in HybPiper), varied widely from clade to clade. To a lesser extent, the number of potential paralogs also varied according to whether SERs or MERs were used as references during assembly. At the subfamily level, Cinchonoideae had the highest average proportion of paralogs (32% of SERs and 26% of MERs), followed by Rubioideae (18% of SERs and 13% of MERs), Ixoroideae (5% of both SERs and MERs), and Apocynaceae (1% of SERs and 2% of MERs; Table 2).
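The depth‐based criterion described above can be expressed as a simple rule. The sketch below is not HybPiper code; it assumes a list of per‐position coverage depths along the reference (a hypothetical input) and flags a locus when positions with depth >1 make up at least 75% of the reference length.

```python
def paralog_warning_by_depth(depth, min_fraction=0.75):
    """Flag a locus if coverage depth exceeds 1 over enough of its length.

    depth is a list with one integer per reference position, giving the
    number of extracted coding sequences that cover that position.
    """
    if not depth:
        return False
    covered_more_than_once = sum(1 for d in depth if d > 1)
    return covered_more_than_once / len(depth) >= min_fraction


# Two toy loci, each 100 positions long.
flagged = paralog_warning_by_depth([2] * 80 + [1] * 20)
not_flagged = paralog_warning_by_depth([2] * 50 + [1] * 50)
print(flagged, not_flagged)
```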
AMAS alignment statistics are summarized in Table 3 (additional statistics are provided in Appendix S8). Alignment statistics varied depending on which Rubiaceae taxa were included in the alignments (Figure 2A). For the set of alignments including all Rubiaceae sampled in this study, the number of variable sites ranged from 11 to 13,407 per locus (x̄ = 1473 sites) and the number of parsimony informative sites ranged from 0 to 7080 per locus (x̄ = 710 sites; Figure 2B). As expected, the average proportion of parsimony informative sites across loci was lower for within‐clade (Hillieae and Palicoureeae+Psychotrieae) assemblies than for the across‐clade assemblies (Figure 2A). In both cases, long loci composed of hundreds of informative sites were recovered.
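For readers less familiar with these alignment statistics, the following Python sketch counts variable and parsimony informative sites using the standard definitions: a site is variable if it has more than one character state, and parsimony informative if at least two states each occur in at least two sequences. It is not AMAS code. The function name and the decision to treat gaps, "?", and "N" as missing data are assumptions made for illustration.

```python
from collections import Counter

# ASSUMPTION: these characters are treated as missing data and ignored.
MISSING = set("-?Nn")


def site_stats(alignment):
    """Return (variable_sites, parsimony_informative_sites).

    alignment: list of equal-length aligned sequences (strings).
    """
    variable = 0
    informative = 0
    for column in zip(*alignment):
        counts = Counter(c.upper() for c in column if c not in MISSING)
        if len(counts) > 1:
            variable += 1
        # Informative: at least two states, each seen in >= 2 sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return variable, informative


# Toy alignment (invented): columns 2 and 4 are informative,
# column 5 is variable but not informative.
toy_alignment = ["ACGTA", "ACGTA", "ATGAA", "ATGAT"]
variable, informative = site_stats(toy_alignment)
print(variable, informative, informative / len(toy_alignment[0]))
```

Dividing the informative count by the alignment length gives the per‐locus proportion of parsimony informative sites of the kind plotted in Figure 2A.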
Figure 2. AMAS summary statistics for the single‐exonic region alignments. (A) Violin plots showing the proportion of parsimony informative sites for within‐clade (“HIL” and “PAL+PSY”) and across‐clade (“All”) alignments. (B) Scatter plot of alignment length against the number of parsimony informative sites, including the set of alignments with all Rubiaceae combined.
Phylogenetic inference
Species tree inference from SERs (Figure 3) and MERs (Appendix S9) recovered the same phylogenetic relationships and varied in support at certain internal nodes. Notably, the placement of Cinchona pubescens as sister to the rest of Cinchonoideae is highly supported (LPP = 1.00) in the tree inferred from SERs but is poorly supported (LPP = 0.22) in the tree inferred from MERs. Additionally, two internal branches in Palicoureeae have lower support in the tree inferred from MERs compared to the tree inferred from SERs.
Figure 3. ASTRAL species tree estimated using the single‐exonic region data set. Numbers above branches indicate local posterior probability support values; only values <0.98 are shown. Pie charts at internal nodes indicate quartet support (i.e., the percentage of quartets in gene trees that agree with the branch) for the main topology (blue), the first alternate topology (yellow), and the second alternate topology (pink). The two Palicourea suerrensis samples included originate from different populations.
In analyses of both SERs and MERs, quartet topologies at most branches tended to be congruent with the species tree; in both analyses, the main quartet topology accounted for ≥50% of possible topologies at 84% of branches. Quartet support for the main topology on internal branches of the SER tree ranged from 0.36 to 0.98 (x̄ = 0.67), support for the first alternative topology ranged from 0.01 to 0.40 (x̄ = 0.17), and support for the second alternative topology ranged from 0.01 to 0.40 (x̄ = 0.16). In the MER tree, quartet support for the main topology ranged from 0.37 to 0.97 (x̄ = 0.65), support for the first alternative topology ranged from 0.02 to 0.38 (x̄ = 0.17), and support for the second alternative topology ranged from 0.01 to 0.39 (x̄ = 0.17). In both species trees, Cinchonoideae was inferred as paraphyletic with respect to Ixoroideae and Hamelieae was inferred as paraphyletic with respect to Hillieae.
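The quartet support summaries above (range, mean, and the share of branches at which the main topology reaches ≥50%) can be reproduced from per‐branch quartet counts with a short Python sketch. The input format, function names, and toy counts are hypothetical; they are not output from the analyses reported here.

```python
def quartet_frequencies(counts):
    """Normalize (main, alt1, alt2) quartet counts to proportions."""
    total = sum(counts)
    return tuple(c / total for c in counts)


def summarize_main_support(branch_counts, threshold=0.5):
    """Summarize support for the main topology across internal branches."""
    main = [quartet_frequencies(c)[0] for c in branch_counts]
    return {
        "min": min(main),
        "max": max(main),
        "mean": sum(main) / len(main),
        # Share of branches where the main topology reaches the threshold.
        "share_at_threshold": sum(q >= threshold for q in main) / len(main),
    }


# Toy counts (invented): gene-tree quartets supporting the main topology
# and the two alternatives at four internal branches.
toy_counts = [(90, 5, 5), (40, 30, 30), (60, 25, 15), (36, 40, 24)]
summary = summarize_main_support(toy_counts)
print(summary)
```

Note that the main topology, which is the one present in the species tree, need not have the highest quartet frequency at every branch, as the last toy branch shows.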
DISCUSSION
Despite the growing importance of target enrichment as a strategy to estimate phylogenies across many different plant groups, the method has rarely been used in Rubiaceae. To date, most published target enrichment data from the family were obtained using universal probe sets (Antonelli et al., 2021; Thureborn et al., 2022) or with microarray technology (Prata et al., 2018), and Rubiaceae lacked a publicly accessible set of family‐specific probes for target enrichment. Filling this gap, we used publicly accessible genomic resources for various Rubioideae and Cinchonoideae taxa to isolate a set of 2270 ERs for target enrichment with phylogenetic utility at various scales across Rubiaceae. We tested the performance of these probes, called Rubiaceae2270x, via in vitro target enrichment in two distantly related clades—Hillieae and Palicoureeae+Psychotrieae—and with in silico simulations for taxa outside of our clades of interest. In general, performance of this probe set was high in both empirical data generation and simulated target enrichment in taxa across the family.
Although low‐depth whole genome sequences may not yield high coverage across single‐copy nuclear loci, by mapping genome skimming reads to transcriptomes (the basis of our target selection strategy) we were able to identify thousands of exonic regions, useful for phylogenetic inference. Our Rubiaceae2270x probe set performed well in targeting loci from species across Rubiaceae, despite probe sequences being entirely derived from Rubioideae and Cinchonoideae. There was little overlap between the loci identified from these two subfamilies: less than 1% of the CIN and RUB final ERs mapped to the same genes. Despite this, probes designed from one clade's genomic resources were efficient at targeting loci in other clades. Illustrating this, although ca. 70% of the probe sequences originated from a single clade (Cinchonoideae), an average of 1733 SERs (76.3%) were recovered with at least 75% of the target sequence for sequenced Rubioideae taxa in Palicoureeae+Psychotrieae (Table 2). Similarly, simulated target enrichment was high for Rubiaceae outside these two clades, resulting in 668–1168 SERs with at least 75% of the target sequence; simulated capture even resulted in 345 SERs for Rhazya stricta (Apocynaceae), the outgroup taxon (Figure 2). However, probes did tend to perform better (at least in terms of percentage of the locus length recovered) when the target species were from the same clade from which probe sequences were derived. Overall, we demonstrate the utility of Rubiaceae2270x by generating dense empirical data sets for Cinchonoideae and Palicoureeae+Psychotrieae using a high proportion of herbarium specimens, while success of in silico simulations suggests that this locus data set can be successfully captured across Rubiaceae.
The loci isolated using Rubiaceae2270x have properties that are desirable for phylogenomics, including high variability (Figure 2), long average length (Table 3), and a relatively low proportion of paralogs (the greatest average proportion of potential paralogs observed in a clade was 27% [MERs] for Hillieae). Comparing our Palicoureeae+Psychotrieae (Rubioideae) data set to Thureborn et al. (2022), which used Angiosperms353 to sequence taxa across Rubioideae, Rubiaceae2270x resulted in a similar proportion of total loci with paralog warnings (55% in our study vs. 53% in Thureborn et al., 2022). Here, these percentages refer to loci for which >1 contig covering a specified percentage of the length of the reference sequence was assembled by HybPiper (our study uses a 75% cutoff and Thureborn et al. uses a less conservative cutoff of 85%). While informative for assessing the number of potentially paralogous genes present in a data set, this metric does not distinguish allelic variation from potential paralogs. Our probe set will facilitate future work needed to determine the nature of the loci that receive this paralog warning.
Both family‐wide and within‐clade alignment sets showed sufficient sequence divergence and informativeness for phylogenetic analysis, with an average proportion of parsimony informative sites varying from 6% in Hillieae to nearly 20% across Rubiaceae (Figure 2A). Palicoureeae+Psychotrieae alignments tended to be more variable than Hillieae alignments (Figure 2), a pattern that is expected when sequences for more species are included. An increase in the proportion of informative sites across alignments may also be explained by the relative ages of the clades in this study—18.7 Ma for Hillieae vs. 63.0 Ma for Palicoureeae+Psychotrieae (Bremer and Eriksson, 2009). By representing an older clade, Palicoureeae+Psychotrieae may have accumulated more substitutions through time. The variation present in these loci suggests that Rubiaceae2270x has utility across phylogenetic scales within Rubiaceae.
In the ASTRAL analyses of a subset of our sampled Rubiaceae, most nodes were recovered with high support (LPP ≥ 0.98) in both species trees, including relationships between closely related Palicourea species, despite only including the small proportion of loci with no paralog warnings (14%) in the analyses. Relationships between tribes and subfamilies were mostly consistent with previous studies (Bremer and Eriksson, 2009; Wikström et al., 2015). One major distinction is that our results reject the monophyly of subfamily Cinchonoideae. In our SER (Figure 3) and MER (Appendix S9) species trees, Mitragyna speciosa (Cinchonoideae) was recovered as sister to Ixoroideae, although placement of this species was not strongly supported (LPP = 0.70 in the SER tree and 0.75 in the MER tree) and a second alternative topology is supported by a large proportion of loci. While Cinchona pubescens was sister to Hillieae+Hamelieae in both trees, its placement was poorly supported (LPP = 0.22) in the MER tree. Other recent phylogenetic studies of Rubiaceae and Gentianales more broadly have also shown Cinchonoideae to be paraphyletic with respect to Ixoroideae (Rydin et al., 2017; Wikström et al., 2020; Antonelli et al., 2021). However, these studies also demonstrate that tree topology and support for relationships can vary depending on the nature of the data as well as the methods being used for phylogenetic inference (e.g., plastid vs. nuclear genomic data, or concatenation‐based vs. coalescent‐aware phylogenetic inference). Other lower‐order phylogenetic relationships inferred (e.g., relationships within Hillieae and Rubioideae) were more consistent with our expectations based on previous studies (Bremer and Eriksson, 2009; Sedio et al., 2013; Wikström et al., 2015; Razafimandimbison et al., 2017; Thureborn et al., 2022). 
These phylogenetic results provide additional support for the efficacy of our probe set in recovering informative loci in taxa from across Rubiaceae, and for their applicability in both shallow‐ and deep‐level phylogenomic studies. By providing thousands of variable loci specific to Rubiaceae, our probe set could facilitate the study of sources of gene tree discordance (e.g., incomplete lineage sorting, gene flow, whole‐genome duplication), help identify nodes with higher conflict, and support the inference of non‐bifurcating relationships in Rubiaceae.
Rubiaceae2270x has very little overlap with the Angiosperms353 loci, with fewer than 1% of loci mapping to Angiosperms353 for more than 120 bp in length. Therefore, our data set can be combined with Angiosperms353, or other universal locus data sets, in future target enrichment–based studies. Angiosperms353 has already been used successfully to infer highly supported phylogenetic relationships in the Rubioideae clade (Thureborn et al., 2022). Combining universal probe set data and lineage‐specific data is a promising future direction for resolving recalcitrant relationships in Rubiaceae (particularly at shallow nodes). Studies in other families have already successfully adopted this approach (Chau et al., 2018; Hendriks et al., 2021; Shah et al., 2021; Siniscalchi et al., 2021; Ufimov et al., 2021). Combining probe sets allows for the selection of loci that have the most desirable properties for phylogenetic inference within the taxon of interest (Baker et al., 2022; Yardeni et al., 2022) and has the added benefit of drawing from and adding to a large, publicly available data set of shared taxa across plant groups. The Rubiaceae2270x target enrichment probe set is a tool for researchers to generate hundreds of single‐copy loci for phylogenomic inference in Rubiaceae. Similar resources have been developed for other species‐rich angiosperm families, including Asteraceae (Mandel et al., 2014), Melastomataceae (Jantzen et al., 2020), Orchidaceae (Eserman et al., 2021), and Bromeliaceae (Yardeni et al., 2022). We hope that Rubiaceae2270x will facilitate an increase in the amount of genome‐scale data available for this large and ecologically important plant family, leading to improved resolution of relationships among major clades of Rubiaceae, which remain to be fully understood (Bremer and Eriksson, 2009).
AUTHOR CONTRIBUTIONS
L.D.B. designed the probe set, performed all Hillieae DNA extractions, and wrote the first draft of this manuscript. L.P.L. designed the sampling strategy for the target clades and funded the project. L.D.B. and A.M.B. performed bioinformatic analyses. C.M.T. validated the taxonomic status of all studied specimens and designed the sampling strategy for Rubioideae. All authors collaborated on writing, editing, and approving the final version of the document ahead of submission.
Supporting information
Appendix S1. Exonic region concatenation protocol.
Appendix S2. HybPiper summary statistics for single‐exonic region assemblies. Grayed rows indicate CapSim‐derived sequencing data. Bolded rows indicate silica‐dried specimens.
Appendix S3. HybPiper summary statistics for multi‐exonic region assemblies. Grayed rows indicate CapSim‐derived sequencing data. Bolded rows indicate silica‐dried specimens.
Appendix S4. Results of the initial CapSim simulated target enrichment experiment. “Contigs” is the number of contigs recovered during the map‐to‐reference assembly.
Appendix S5. Statistics for downloaded genomes used in CapSim experiments. “Total length” (Mb = megabases) is the length of all contigs combined in the original genome assembly. “N50” is the scaffold N50 score provided with the NCBI accession for the genome (see Appendix 2), used here as a metric for genome quality.
Appendix S6. HybPiper single‐exonic region and multi‐exonic region (MER) assembly statistics averaged for each clade. Statistics for MER assemblies are in blue.
Appendix S7. Heatmap showing results of HybPiper multi‐exonic region (MER) assemblies for Cinchonoideae (CIN), Rubioideae (RUB), and Ixoroideae (IXOR) samples, and one Apocynaceae sample. Target MERs are on the x‐axis and are grouped by the taxonomic source of the reference sequence. “RUB+CIN” indicates the 22 MERs with constituent ERs from both RUB and CIN. Samples are on the y‐axis and are grouped by taxonomy. Darker shades represent higher percentages of the locus length recovered by HybPiper. Locus completeness tends to be higher when target species are from the same clade as the probe sequences.
Appendix S8. AMAS single‐exonic region alignment statistics, showing average values for each statistic with ranges in parentheses. “All combined” alignments include Hillieae, Palicoureeae+Psychotrieae, and all outgroups (including CapSim samples).
Appendix S9. ASTRAL species tree estimated using the multi‐exonic region data set. Numbers above branches indicate local posterior probability support values; only values <0.98 are shown. Pie charts at internal nodes indicate quartet support (i.e., the percentage of quartets in gene trees that agree with the branch) for the main topology (blue), the first alternate topology (yellow), and the second alternate topology (pink). The two Palicourea suerrensis samples included originate from different populations.
ACKNOWLEDGMENTS
Financial support was provided by start‐up funds from the Louisiana State University (LSU) College of Science, LSU Office of Research and Economic Development, and National Science Foundation (NSF) Award #2055525 to L.P.L., and graduate student grants from the Botanical Society of America, Garden Club of America, American Society of Plant Taxonomists, and Society of Systematic Biologists to L.D.B. L.D.B. was further supported by an NSF Graduate Research Fellowship. Feedback from Janet Mansaray, Diego Paredes‐Burneo, Katherin Arango‐Gómez, Aislinn Mumford, and Laura Frost significantly improved previous drafts of this manuscript. We thank LSU and the Missouri Botanical Garden for providing access to their herbaria, and for granting permission to destructively sample specimens.
Appendix 1. Voucher information for sampled specimens that underwent sequencing.
| Species | Tribe | Voucher | Institution | Tissue type |
|---|---|---|---|---|
| Coutarea hexandra (Jacq.) K. Schum. | Chiococceae | Callejas 4587 | LSU | Herbarium |
| Ferdinandusa paraensis Ducke | Dialypetalantheae | Mori 20866 | LSU | Herbarium |
| Hamelia patens Jacq. | Hamelieae | Lagomarsino s.n. | LSU | Silica |
| Hoffmannia phoenicopoda K. Schum. | Hamelieae | Wendt 3392 | LSU | Herbarium |
| Balmea stormiae Martínez | Hillieae | Vázquez 1081 | MO | Herbarium |
| Cosmibuena grandiflora (Ruiz & Pav.) Rusby | Hillieae | Haber 409 | MO | Herbarium |
| Cosmibuena grandiflora (Ruiz & Pav.) Rusby* | Hillieae | Pipoly 10310 | LSU | Herbarium |
| Cosmibuena grandiflora (Ruiz & Pav.) Rusby | Hillieae | Tyson 904 | MO | Herbarium |
| Cosmibuena macrocarpa (Benth.) Klotzsch ex Walp. | Hillieae | Silverstone‐Sopkin 10729 | MO | Herbarium |
| Cosmibuena matudae (Standl.) L. O. Williams | Hillieae | Moreno 9550 | MO | Herbarium |
| Cosmibuena valerii Standl. | Hillieae | Haber 548 | LSU | Herbarium |
| Hillia allenii C. M. Taylor | Hillieae | McPherson 11676 | MO | Herbarium |
| Hillia bonoi Steyerm. | Hillieae | Burandt V0007 | MO | Herbarium |
| Hillia chiapensis Standl. | Hillieae | Stevens 11558 | LSU | Herbarium |
| Hillia foldatsii Steyerm. | Hillieae | Holst 3772 | MO | Herbarium |
| Hillia grayumii C. M. Taylor | Hillieae | Stevens 24958 | MO | Herbarium |
| Hillia illustris (Vell.) K. Schum. | Hillieae | Rimachi Y. 9155 | LSU | Herbarium |
| Hillia illustris (Vell.) K. Schum. | Hillieae | Solomon 13957 | LSU | Herbarium |
| Hillia illustris (Vell.) K. Schum. | Hillieae | Fuentes 4120 | MO | Herbarium |
| Hillia illustris (Vell.) K. Schum. | Hillieae | Vásquez 4130 | MO | Herbarium |
| Hillia killipii Standl. | Hillieae | Valenzuela 7574 | MO | Herbarium |
| Hillia killipii Standl. | Hillieae | van der Werff 25095 | MO | Herbarium |
| Hillia longifilamentosa (Steyerm.) C. M. Taylor | Hillieae | Gamboa R. 2206 | MO | Herbarium |
| Hillia loranthoides Standl. | Hillieae | Bello 844 | MO | Herbarium |
| Hillia macbridei Standl. | Hillieae | Zak 3788 | MO | Herbarium |
| Hillia macrophylla Standl. | Hillieae | Bello 767 | MO | Herbarium |
| Hillia macrophylla Standl. | Hillieae | Cornejo 8056 | MO | Herbarium |
| Hillia macrophylla Standl. | Hillieae | Rojas 9037 | MO | Herbarium |
| Hillia macrophylla Standl. | Hillieae | Werner 2171 | MO | Herbarium |
| Hillia maxonii Standl.* | Hillieae | Dwyer 7377 | MO | Herbarium |
| Hillia maxonii Standl. | Hillieae | Luteyn 12690 | MO | Herbarium |
| Hillia maxonii Standl. | Hillieae | Morales 4855 | MO | Herbarium |
| Hillia palmana Standl. | Hillieae | Hammel 13930 | MO | Herbarium |
| Hillia parasitica Jacq. | Hillieae | Jiménez 2189 | MO | Herbarium |
| Hillia parasitica Jacq. | Hillieae | Prance 29385 | LSU | Herbarium |
| Hillia parasitica Jacq.* | Hillieae | Veley 1362 | LSU | Herbarium |
| Hillia parasitica Jacq. | Hillieae | Zarucchi 5659 | MO | Herbarium |
| Hillia parasitica Jacq. | Hillieae | Kvist 192 | LSU | Herbarium |
| Hillia pumila C. M. Taylor | Hillieae | Vásquez 28189 | MO | Herbarium |
| Hillia tetrandra Sw. | Hillieae | Martínez 20613 | MO | Herbarium |
| Hillia triflora (Oerst.) C. M. Taylor var. triflora | Hillieae | Bello 5314 | MO | Herbarium |
| Hillia triflora (Oerst.) C. M. Taylor var. triflora | Hillieae | Feinsinger 626‐A | MO | Herbarium |
| Hillia triflora var. pittieri (Standl.) C. M. Taylor | Hillieae | Croat 14341 | MO | Herbarium |
| Hillia ulei K. Schum. ex Ule | Hillieae | Foster 9667 | LSU | Herbarium |
| Hillia ulei K. Schum. ex Ule | Hillieae | Revilla 576 | MO | Herbarium |
| Hillia wurdackii Steyerm. | Hillieae | Monteagudo 15255 | MO | Herbarium |
| Hillia wurdackii Steyerm. | Hillieae | Valenzuela 10940 | MO | Herbarium |
| Hillia wurdackii Steyerm. | Hillieae | Woytkowski 8236 | MO | Herbarium |
| Carapichea guianensis Aubl. | Palicoureeae | González 2158 | MO | Herbarium |
| Carapichea ipecacuanha (Brot.) L. Andersson | Palicoureeae | Croat 15117 | MO | Herbarium |
| Eumachia boliviana (Standl.) Delprete & J. H. Kirkbr. | Palicoureeae | Campbell 22035 | MO | Herbarium |
| Notopleura epiphytica (K. Krause) C. M. Taylor | Palicoureeae | Neill 15737 | MO | Silica |
| Notopleura uliginosa (Sw.) Bremek. | Palicoureeae | Stevens 37138 | MO | Silica |
| Palicourea acanthacea (Standl. ex Steyerm.) C. M. Taylor | Palicoureeae | Monsalve 352 | MO | Herbarium |
| Palicourea acuminata (Benth.) Borhidi | Palicoureeae | Fonnegra 6968 | MO | Herbarium |
| Palicourea acuminata (Benth.) Borhidi | Palicoureeae | Lachenaud 966 | MO | Herbarium |
| Palicourea acuminata (Benth.) Borhidi | Palicoureeae | Suazo 4616 | MO | Herbarium |
| Palicourea allenii (Standl.) Borhidi | Palicoureeae | Clark 243 | MO | Herbarium |
| Palicourea amethystina (Ruiz & Pav.) DC. | Palicoureeae | Jaramillo 1989 | MO | Herbarium |
| Palicourea andina C. M. Taylor | Palicoureeae | Dziedzioch 149 | MO | Herbarium |
| Palicourea angustifolia Kunth | Palicoureeae | Wolff 16 | MO | Herbarium |
| Palicourea apicata Kunth | Palicoureeae | Stergios 2543 | MO | Herbarium |
| Palicourea apoda (Steyerm.) Delprete & J. H. Kirkbr.* | Palicoureeae | Pipoly 8373 | MO | Herbarium |
| Palicourea attenuata Rusby | Palicoureeae | Fuentes 4647 | MO | Silica |
| Palicourea bangii (Rusby) C. M. Taylor | Palicoureeae | Fuentes 12864 | MO | Silica |
| Palicourea berteroana (DC.) Borhidi | Palicoureeae | Taylor 11646 | MO | Silica |
| Palicourea brachiata (Sw.) Borhidi | Palicoureeae | Davidse 36906 | MO | Silica |
| Palicourea brachiata (Sw.) Borhidi | Palicoureeae | Taylor 11718 | MO | Silica |
| Palicourea brevicollis (Müll. Arg.) C. M. Taylor | Palicoureeae | Zardini 15831 | MO | Herbarium |
| Palicourea callithrix (Miq.) Delprete & J. H. Kirkbr. | Palicoureeae | Granville 13399 | MO | Herbarium |
| Palicourea colorata (Benth.) Borhidi | Palicoureeae | Liesner s.n. | MO | Herbarium |
| Palicourea conephoroides (Rusby) C. M. Taylor | Palicoureeae | Quizhpe 672 | MO | Herbarium |
| Palicourea correae (Dwyer & M. V. Hayden) Borhidi | Palicoureeae | Dwyer 1968 | MO | Herbarium |
| Palicourea correae (Dwyer & M. V. Hayden) Borhidi | Palicoureeae | MacDougal 6258 | MO | Silica |
| Palicourea corymbifera (Müll. Arg.) Standl. | Palicoureeae | Grimes 3319 | MO | Herbarium |
| Palicourea croceoides (Sw.) Roem. & Schult. | Palicoureeae | Merello 1713 | MO | Silica |
| Palicourea croceoides Desv. ex Ham. | Palicoureeae | Taylor 11640 | MO | Silica |
| Palicourea cyanococca (Dombrain) Borhidi | Palicoureeae | Stevens 30807 | MO | Silica |
| Palicourea deflexa (DC.) Borhidi | Palicoureeae | Meave 1172 | MO | Herbarium |
| Palicourea deflexa (DC.) Borhidi | Palicoureeae | Taylor 11717 | MO | Silica |
| Palicourea demissa Standl. | Palicoureeae | Zak 3081 | MO | Herbarium |
| Palicourea dichotoma (Rudge) Delprete & J. H. Kirkbr. | Palicoureeae | Maceda 1475 | MO | Herbarium |
| Palicourea didymocarpos (A. Rich.) Griseb. | Palicoureeae | Nee 35839 | MO | Herbarium |
| Palicourea divaricata Schltdl. | Palicoureeae | Carvalho 6490 | MO | Herbarium |
| Palicourea domingensis (Jacq.) DC. | Palicoureeae | Axelrod 1028 | MO | Herbarium |
| Palicourea egensis (Müll. Arg.) Borhidi | Palicoureeae | Liesner 7059 | MO | Herbarium |
| Palicourea elata (Sw.) Borhidi | Palicoureeae | Ibarra‐Manriquez 5301 | MO | Herbarium |
| Palicourea elata (Sw.) Borhidi | Palicoureeae | Stevens 36104 | MO | Silica |
| Palicourea flavescens Kunth | Palicoureeae | van der Werff 10948 | MO | Herbarium |
| Palicourea flavifolia (Rusby) Standl. | Palicoureeae | Maldonado 2948 | MO | Silica |
| Palicourea glomerulata (Donn. Sm.) Borhidi | Palicoureeae | Berger 1481 | MO | Herbarium |
| Palicourea gracilenta (Müll. Arg.) Delprete & J. H. Kirkbr. | Palicoureeae | Croat 102206 | MO | Herbarium |
| Palicourea grandifolia (Humb. & Bonpl. ex Roem. & Schult.) Standl. | Palicoureeae | Liesner 6534 | MO | Herbarium |
| Palicourea guianensis Aubl. | Palicoureeae | de la Quintana 257 | MO | Silica |
| Palicourea guianensis Aubl. | Palicoureeae | Redden 2302 | MO | Herbarium |
| Palicourea guianensis Aubl. | Palicoureeae | Will 83 | MO | Herbarium |
| Palicourea hazenii (Standl.) Borhidi | Palicoureeae | Freire 1098 | MO | Herbarium |
| Palicourea hoffmannseggiana (Müll. Arg.) Delprete & J. H. Kirkbr. | Palicoureeae | Torida‐Marbot 177 | MO | Herbarium |
| Palicourea jelskii Standl. | Palicoureeae | Fuentes 12894 | MO | Silica |
| Palicourea justicifolia (Rudge) Delprete & J. H. Kirkbr. | Palicoureeae | Gutiérrez 534 | MO | Herbarium |
| Palicourea lasiantha K. Krause | Palicoureeae | Graham 199 | MO | Herbarium |
| Palicourea lasiorrhachis Oerst.* | Palicoureeae | Wilbur 19554 | MO | Herbarium |
| Palicourea lehmannii (K. Schum. & K. Krause) Standl. | Palicoureeae | Silverstone 8396 | MO | Herbarium |
| Palicourea lineata Benth. | Palicoureeae | Garcia 112 | MO | Herbarium |
| Palicourea loxensis C. M. Taylor | Palicoureeae | Neill 16912 | MO | Silica |
| Palicourea luteonivea C. M. Taylor | Palicoureeae | Cayola 2534 | MO | Silica |
| Palicourea macrobotrys (Ruiz & Pav.) DC. | Palicoureeae | Gatti 17549 | MO | Herbarium |
| Palicourea marcgravii A. St.‐Hil. | Palicoureeae | Vasconcelos s.n. | MO | Herbarium |
| Palicourea muscosa (Jacq.) Delprete & J. H. Kirkbr. | Palicoureeae | Meier 3151 | MO | Herbarium |
| Palicourea nitidella (Müll. Arg.) Standl. | Palicoureeae | Liesner 6369 | MO | Herbarium |
| Palicourea obliquinervia (Müll. Arg.) Borhidi | Palicoureeae | Clarke 1268 | MO | Herbarium |
| Palicourea ostreophora (Wernham) Borhidi | Palicoureeae | Schunke 8366 | MO | Herbarium |
| Palicourea padifolia (Humb. & Bonpl. ex Roem. & Schult.) C. M. Taylor & Lorence | Palicoureeae | Dietzsch 1390 | MO | Herbarium |
| Palicourea petiolaris Kunth | Palicoureeae | Ortega 3101 | MO | Herbarium |
| Palicourea polycephala (Benth.) Delprete & J. H. Kirkbr. | Palicoureeae | Ehringhaus 56 | MO | Herbarium |
| Palicourea potaroensis (Sandwith) Delprete & J. H. Kirkbr.* | Palicoureeae | Henkel 1670 | MO | Herbarium |
| Palicourea prunifolia (Kunth) Borhidi | Palicoureeae | Nee 41304 | MO | Herbarium |
| Palicourea pubescens (Sw.) Borhidi | Palicoureeae | Stevens 21197 | MO | Herbarium |
| Palicourea pubescens Sw. | Palicoureeae | Taylor 316 | MO | Herbarium |
| Palicourea pyramidalis Standl. | Palicoureeae | Hurtado 1015 | MO | Herbarium |
| Palicourea quadrifolia (Rudge) DC. | Palicoureeae | Richard 76 | MO | Herbarium |
| Palicourea quadrilateralis C. M. Taylor* | Palicoureeae | Callejas 4018 | MO | Herbarium |
| Palicourea quinquepyrena C. M. Taylor | Palicoureeae | Rodríguez 1748 | MO | Herbarium |
| Palicourea racemosa (Aubl.) G. Nicholson | Palicoureeae | Rivero 265 | MO | Herbarium |
| Palicourea reticulata (Ruiz & Pav.) C. M. Taylor | Palicoureeae | Zambrana 5775 | MO | Silica |
| Palicourea rhodothamna (Standl.) C. M. Taylor | Palicoureeae | Rimachi 7296 | MO | Herbarium |
| Palicourea rigida Kunth | Palicoureeae | Gillespie 1715 | MO | Herbarium |
| Palicourea rigida Kunth | Palicoureeae | Subieta 322 | MO | Herbarium |
| Palicourea seemannii Standl. | Palicoureeae | Juncosa 556 | MO | Herbarium |
| Palicourea sessilis (Vell.) C. M. Taylor | Palicoureeae | Fiaschi 2844 | MO | Herbarium |
| Palicourea solitudinum (Standl.) Borhidi* | Palicoureeae | Duke 5262 | MO | Herbarium |
| Palicourea standleyana C. M. Taylor | Palicoureeae | Gentry 63598 | MO | Herbarium |
| Palicourea stenosepala Standl. | Palicoureeae | Link 13 | MO | Herbarium |
| Palicourea stipularis Benth. | Palicoureeae | Leiva 15000 | MO | Herbarium |
| Palicourea subfusca (Müll. Arg.) C. M. Taylor | Palicoureeae | Parada 996 | MO | Herbarium |
| Palicourea suerrensis (Donn. Sm.) Borhidi | Palicoureeae | McPherson 15864 | MO | Silica |
| Palicourea suerrensis (Donn. Sm.) Borhidi | Palicoureeae | Stevens 36741 | MO | Silica |
| Palicourea sulphurea (Ruiz & Pav.) DC. | Palicoureeae | Palacios 9594 | MO | Herbarium |
| Palicourea tetragona (Donn. Sm.) C. M. Taylor & Lorence | Palicoureeae | Beach 1469 | MO | Herbarium |
| Palicourea tetragona (Donn. Sm.) C. M. Taylor & Lorence | Palicoureeae | Davidse 36909 | MO | Silica |
| Palicourea thyrsiflora (Ruiz & Pav.) DC. | Palicoureeae | Homeier 5199 | MO | Herbarium |
| Palicourea timbiquensis (Standl.) C. M. Taylor | Palicoureeae | Hoover 4120 | MO | Herbarium |
| Palicourea tinctoria Roem. & Schult. | Palicoureeae | Cornejo 149 | MO | Herbarium |
| Palicourea tomentosa (Aubl.) Borhidi | Palicoureeae | Araujo 1546 | MO | Herbarium |
| Palicourea topoensis C. M. Taylor | Palicoureeae | Zak 3747 | MO | Herbarium |
| Palicourea trichocephala (Poepp. & Endl.) Borhidi | Palicoureeae | Schunke 7495 | MO | Herbarium |
| Palicourea triphylla DC. | Palicoureeae | Killeen 6548 | MO | Herbarium |
| Palicourea triphylla DC. | Palicoureeae | Nee 46793 | MO | Herbarium |
| Palicourea triphylla DC. | Palicoureeae | Stevens 36461 | MO | Silica |
| Palicourea winkleri Borhidi | Palicoureeae | Stevens 37485 | MO | Silica |
| Palicourea woronovii (Standl.) C. M. Taylor, Bruniera & Zappi | Palicoureeae | van der Werff 20100 | MO | Herbarium |
| Rudgea cornifolia (Kunth) Standl. | Palicoureeae | de Gracia Cruz 818 | MO | Herbarium |
| Psychotria brachypoda (Müll. Arg.) L. B. Sm. & Downs | Psychotrieae | Silva 1622 | MO | Herbarium |
| Psychotria carthagenensis Jacq. | Psychotrieae | Araujo 2124 | MO | Silica |
| Psychotria grandis Sw. | Psychotrieae | Taylor 11745 | MO | Silica |
| Psychotria guianensis DC. | Psychotrieae | Merello 1711 | MO | Herbarium |
| Psychotria horizontalis Sw. | Psychotrieae | Stevens 32733 | MO | Silica |
| Psychotria jinotegensis C. Nelson, Ant. Molina & Standl. | Psychotrieae | Stevens 33549 | MO | Silica |
| Psychotria limonensis K. Krause | Psychotrieae | Stevens 31580 | MO | Silica |
| Psychotria marginata Sw. | Psychotrieae | Stevens 32781 | MO | Silica |
| Psychotria nervosa Sw. | Psychotrieae | Stevens 32362 | MO | Silica |
| Psychotria panamensis Standl. | Psychotrieae | Stevens 32285 | MO | Silica |
| Psychotria subsessilis Benth. | Psychotrieae | Stevens 31494 | MO | Silica |
| Psychotria suterella Müll. Arg. | Psychotrieae | Souza 8789 | MO | Herbarium |
Note: LSU = Louisiana State University; MO = Missouri Botanical Garden.
*Samples that failed sequencing.
Appendix 2. Genomic resources used throughout the study.
| Species | Subfamily | Tribe | Data type | Source | ID |
|---|---|---|---|---|---|
| Carapichea ipecacuanha (Brot.) L. Andersson | RUB | Palicoureeae | T | 1KP | JOPH |
| Cinchona pubescens Vahl | CIN | Cinchoneae | T | MedPlant | medp_cinpu‐20110618 |
| Cinchona pubescens Vahl | CIN | Cinchoneae | AG | NCBI Genome | GCA_025175665.1 |
| Coffea arabica L. | IXOR | Coffeeae | AG | NCBI Genome | GCA_003713225.1 |
| Coffea eugenioides S. Moore | IXOR | Coffeeae | AG | NCBI Genome | GCA_003713205.1 |
| Coffea canephora Pierre ex A. Froehner | IXOR | Coffeeae | AG | NCBI Genome | GCA_900059795.1 |
| Coffea humblotiana Baill. | IXOR | Coffeeae | AG | NCBI Genome | GCA_023065734.1 |
| Corynanthe mayumbensis (R. D. Good) Raym.‐Hamet ex N. Hallé | CIN | Naucleeae | GS | NCBI SRA | SRX5486283 |
| Galium californicum Hook. & Arn. | RUB | Rubieae | GS | NCBI SRA | SRX5658933 |
| Galium odoratum (L.) Scop. | RUB | Rubieae | GS | NCBI SRA | SRX8928310 |
| Galium porrigens Dempster | RUB | Rubieae | AG | NCBI Genome | GCA_012274505.1 |
| Galium verum L. | RUB | Rubieae | GS | NCBI SRA | ERR3089164 |
| Gardenia jasminoides J. Ellis | IXOR | Gardenieae | AG | NCBI Genome | GCA_013103745.1 |
| Gardenia jasminoides J. Ellis | IXOR | Gardenieae | AG | NCBI Nucleotide | CM023130.1 |
| Gynochthodes officinalis (F. C. How) Razafim. & B. Bremer | RUB | Morindeae | AG | NCBI Genome | GCA_020080225.1 |
| Hamelia patens Jacq. | CIN | Hamelieae | T | NCBI TSA | SRX8873813 |
| Leptodermis oblonga Bunge | RUB | Paederieae | AG | NCBI Genome | GCA_016801395.1 |
| Mitragyna speciosa (Korth.) Havil. | CIN | Naucleeae | AG | NCBI Genome | GCA_024721245.1 |
| Mitragyna speciosa (Korth.) Havil. | CIN | Naucleeae | GS | NCBI SRA | SRR5602600 |
| Neolamarckia cadamba (Roxb.) Bosser | CIN | Naucleeae | T | NCBI TSA | SRX400176 |
| Ophiorrhiza pumila Champ. ex Benth. | RUB | Ophiorrhizeae | AG | NCBI Genome | GCA_016586305.1 |
| Mapouria douarrei Beauvis. | RUB | Psychotrieae | T | 1KP | DNQA |
| Psychotria marginata Sw. | RUB | Psychotrieae | T | 1KP | PCNH |
| Rhazya stricta Decne. | RAUV | Amsonieae | AG | NCBI Genome | GCA_001752375.1 |
Note: CIN = Cinchonoideae; IXOR = Ixoroideae; RAUV = Rauvolfioideae; RUB = Rubioideae; AG = assembled genome; GS = genome skim; T = transcriptome.
Subfamily RAUV is in Apocynaceae.
Ball, L. D., Bedoya A. M., Taylor C. M., and Lagomarsino L. P. 2023. A target enrichment probe set for resolving phylogenetic relationships in the coffee family, Rubiaceae. Applications in Plant Sciences 11(6): e11554. 10.1002/aps3.11554
DATA AVAILABILITY STATEMENT
The Rubiaceae2270x sequences are available on Figshare (DOI: 10.6084/m9.figshare.22776512; Li, 2018). Raw reads from the in vitro target enrichment study are available on the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (BioProject ID PRJNA970862).
REFERENCES
- Amenu, S. G., Wei N., Wu L., Oyebanji O., Hu G., Zhou Y., and Wang Q. 2022. Phylogenomic and comparative analyses of Coffeeae alliance (Rubiaceae): Deep insights into phylogenetic relationships and plastome evolution. BMC Plant Biology 22(1): 88.
- Andermann, T., Torres Jiménez M. F., Matos‐Maraví P., Batista R., Blanco‐Pastor J. L., Gustafsson A. L. S., Kistler L., et al. 2019. A guide to carrying out a phylogenomic target sequence capture project. Frontiers in Genetics 10: 1407.
- Antonelli, A., Nylander J. A. A., Persson C., and Sanmartín I. 2009. Tracing the impact of the Andean uplift on Neotropical plant evolution. Proceedings of the National Academy of Sciences, USA 106(24): 9749–9754.
- Antonelli, A., Clarkson J. J., Kainulainen K., Maurin O., Brewer G. E., Davis A. P., Epitawalage N., et al. 2021. Settling a family feud: A high‐level phylogenomic framework for the Gentianales based on 353 nuclear genes and partial plastomes. American Journal of Botany 108(7): 1143–1165.
- Bagley, J. C., Uribe‐Convers S., Carlsen M. M., and Muchhala N. 2020. Utility of targeted sequence capture for phylogenomics in rapid, recent angiosperm radiations: Neotropical Burmeistera bellflowers as a case study. Molecular Phylogenetics and Evolution 152(November): 106769.
- Baker, W. J., Bailey P., Barber V., and Barker A. 2022. A comprehensive phylogenomic platform for exploring the angiosperm tree of life. Systematic Biology 71(2): 301–319. https://academic.oup.com/sysbio/article-abstract/71/2/301/6275244
- Bolger, A. M., Lohse M., and Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30(15): 2114–2120.
- Borges, R. L., Razafimandimbison S. G., Roque N., and Rydin C. 2021. Phylogeny of the Neotropical element of the Randia clade (Gardenieae, Rubiaceae, Gentianales). Plant Ecology and Evolution 154(3): 458–469.
- Borowiec, M. L. 2016. AMAS: A fast tool for alignment manipulation and computing of summary statistics. PeerJ 4(January): e1660.
- Breinholt, J. W., Carey S. B., Tiley G. P., Davis E. C., Endara L., McDaniel S. F., Neves L. G., et al. 2021. A target enrichment probe set for resolving the flagellate land plant tree of life. Applications in Plant Sciences 9(1): e11406.
- Bremer, B., and Eriksson O. 1992. Evolution of fruit characters and dispersal modes in the tropical family Rubiaceae. Biological Journal of the Linnean Society 47(1): 79–95.
- Bremer, B., and Eriksson T. 2009. Time tree of Rubiaceae: Phylogeny and dating the family, subfamilies, and tribes. International Journal of Plant Sciences 170(6): 766–793.
- Brown, J. W., Walker J. F., and Smith S. A. 2017. Phyx: Phylogenetic tools for Unix. Bioinformatics 33(12): 1886–1888.
- Canales, N. A., Gardner E. M., Gress T., Walker K., Bieker V., Martin M. D., Nesbitt M., et al. 2022. Museomic approaches to genotype historic Cinchona barks. BioRxiv [Preprint]. Available at: 10.1101/2022.04.26.489609 [posted 28 April 2022; accessed 5 October 2023].
- Cao, M. D., Ganesamoorthy D., Zhou C., and Coin L. J. M. 2018. Simulating the dynamics of targeted capture sequencing with CapSim. Bioinformatics 34(5): 873–874.
- Capella‐Gutiérrez, S., Silla‐Martínez J. M., and Gabaldón T. 2009. trimAl: A tool for automated alignment trimming in large‐scale phylogenetic analyses. Bioinformatics 25(15): 1972–1973.
- Chamala, S., García N., Godden G. T., Krishnakumar V., Jordon‐Thaden I. E., De Smet R., Barbazuk W. B., et al. 2015. MarkerMiner 1.0: A new application for phylogenetic marker development using angiosperm transcriptomes. Applications in Plant Sciences 3(4): e1400115. 10.3732/apps.1400115
- Chau, J. H., Rahfeldt W. A., and Olmstead R. G. 2018. Comparison of taxon‐specific versus general locus sets for targeted sequence capture in plant phylogenomics. Applications in Plant Sciences 6(3): e1032.
- Ehrendorfer, F., Barfuss M. H. J., Manen J.‐F., and Schneeweiss G. M. 2019. Correction: Phylogeny, character evolution and spatiotemporal diversification of the species‐rich and world‐wide distributed tribe Rubieae (Rubiaceae). PLoS ONE 14(1): e0211589.
- Eserman, L. A., Thomas S. K., Coffey E. E. D., and Leebens‐Mack J. H. 2021. Target sequence capture in orchids: Developing a kit to sequence hundreds of single‐copy loci. Applications in Plant Sciences 9(7): e11416.
- Ferrero, V., Rojas D., Vale A., and Navarro L. 2012. Delving into the loss of heterostyly in Rubiaceae: Is there a similar trend in tropical and non‐tropical climate zones? Perspectives in Plant Ecology, Evolution and Systematics 14(3): 161–167.
- Frost, L., Bedoya A. M., and Lagomarsino L. 2022. Strong phylogenetic signal despite high phylogenomic complexity in an Andean plant radiation (Freziera, Pentaphylacaceae). BioRxiv [Preprint]. Available at: 10.1101/2021.07.01.450750 [posted 12 September 2023; accessed 5 October 2023].
- Hart, M. L., Forrest L. L., Nicholls J. A., and Kidner C. A. 2016. Retrieval of hundreds of nuclear loci from herbarium specimens. Taxon 65(5): 1081–1092.
- Hendriks, K. P., Mandáková T., Hay N. M., Ly E., Hooft van Huysduynen A., Tamrakar R., Thomas S. K., et al. 2021. The best of both worlds: Combining lineage‐specific and universal bait sets in target‐enrichment hybridization reactions. Applications in Plant Sciences 9(7): e11438. 10.1002/aps3.11438
- Jantzen, J. R., Amarasinghe P., Folk R. A., Reginato M., Michelangeli F. A., Soltis D. E., Cellinese N., and Soltis P. S. 2020. A two‐tier bioinformatic pipeline to develop probes for target capture of nuclear loci with applications in Melastomataceae. Applications in Plant Sciences 8(5): e11345.
- Johnson, M. G., Gardner E. M., Liu Y., Medina R., Goffinet B., Shaw A. J., Zerega N. J. C., and Wickett N. J. 2016. HybPiper: Extracting coding sequence and introns for phylogenetics from high‐throughput sequencing reads using target enrichment. Applications in Plant Sciences 4(7): e1600016.
- Johnson, M. G., Pokorny L., Dodsworth S., Botigué L. R., Cowan R. S., Devault A., Eiserhardt W. L., et al. 2019. A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k‐medoids clustering. Systematic Biology 68(4): 594–606.
- Junier, T., and Zdobnov E. M. 2010. The Newick utilities: High‐throughput phylogenetic tree processing in the UNIX shell. Bioinformatics 26(13): 1669–1670.
- Kadlec, M., Bellstedt D. U., Le Maitre N. C., and Pirie M. D. 2017. Targeted NGS for species level phylogenomics: ‘Made to measure’ or ‘one size fits all’? PeerJ 5(July): e3569.
- Katoh, K., and Standley D. M. 2013. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Molecular Biology and Evolution 30(4): 772–780.
- Kozlov, A. M., Darriba D., Flouri T., Morel B., and Stamatakis A. 2019. RAxML‐NG: A fast, scalable and user‐friendly tool for maximum likelihood phylogenetic inference. Bioinformatics 35(21): 4453–4455.
- Le, H. T. T., Nguyen L. N., Pham H. L. B., Le H. T. M., Luong T. D., Huynh H. T. T., Nguyen V. T., et al. 2022. Target capture reveals the complex origin of Vietnamese ginseng. Frontiers in Plant Science 13(July): 814178.
- Li, W. 2018. Figshare dataset. Website: 10.6084/m9.figshare.6025748.v1 [accessed 6 October 2023].
- Li, W., and Godzik A. 2006. Cd‐Hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22(13): 1658–1659.
- Liu, Y., Johnson M. G., Cox C. J., Medina R., Devos N., Vanderpoorten A., Hedenäs L., et al. 2019. Resolution of the ordinal phylogeny of mosses using targeted exons from organellar and nuclear genomes. Nature Communications 10(1): 1485.
- Löfstrand, S. D., Razafimandimbison S. G., and Rydin C. 2019. Phylogeny of Coussareeae (Rubioideae, Rubiaceae). Plant Systematics and Evolution 305(4): 293–304.
- Mandel, J. R., Dikow R. B., Funk V. A., Masalia R. R., Staton S. E., Kozik A., Michelmore R. W., et al. 2014. A target enrichment method for gathering phylogenetic information from hundreds of loci: An example from the Compositae. Applications in Plant Sciences 2(2): e1300085. 10.3732/apps.1300085
- Matasci, N., Hung L.‐H., Yan Z., Carpenter E. J., Wickett N. J., Mirarab S., Nguyen N., et al. 2014. Data access for the 1,000 Plants (1KP) Project. GigaScience 3(October): 17.
- Morales‐Briones, D. F., Gehrke B., and Huang C. H. 2022. Analysis of paralogs in target enrichment data pinpoints multiple ancient polyploidy events in Alchemilla s.l. (Rosaceae). Systematic Biology 71(1): 190–207. https://academic.oup.com/sysbio/article-abstract/71/1/190/6274658
- Nikolov, L. A., Shushkov P., Nevado B., Gan X., Al‐Shehbaz I. A., Filatov D., Bailey C. D., and Tsiantis M. 2019. Resolving the backbone of the Brassicaceae phylogeny for investigating trait diversity. New Phytologist 222(3): 1638–1651.
- Plants of the World Online. 2023. Plants of the World Online. Facilitated by the Royal Botanic Gardens, Kew. Website: https://powo.science.kew.org/ [accessed 4 February 2023].
- Prata, E. M. B., Sass C., Rodrigues D. P., Domingos F. M. C. B., Specht C. D., Damasco G., Ribas C. C., et al. 2018. Towards integrative taxonomy in Neotropical botany: Disentangling the Pagamea guianensis species complex (Rubiaceae). Botanical Journal of the Linnean Society 188(2): 213–231.
- Razafimandimbison, S. G., Taylor C. M., Wikström N., Pailler T., Khodabandeh A., and Bremer B. 2014. Phylogeny and generic limits in the sister tribes Psychotrieae and Palicoureeae (Rubiaceae): Evolution of schizocarps in Psychotria and origins of bacterial leaf nodules of the Malagasy species. American Journal of Botany 101(7): 1102–1126.
- Razafimandimbison, S. G., Kainulainen K., Wikström N., and Bremer B. 2017. Historical biogeography and phylogeny of the pantropical Psychotrieae alliance (Rubiaceae), with particular emphasis on the Western Indian Ocean region. American Journal of Botany 104(9): 1407–1423.
- Razafimandimbison, S. G., Wong K.‐M., and Rydin C. 2021. Molecular systematics of the tribe Prismatomerideae (Rubiaceae) and its taxonomic consequences, with notes on the importance of the inflorescence morphology for species‐group recognition in Rennellia. Taxon 70(2): 324–338.
- Rydin, C., Wikström N., and Bremer B. 2017. Conflicting results from mitochondrial genomic data challenge current views of Rubiaceae phylogeny. American Journal of Botany 104(10): 1522–1532.
- Schmickl, R., Liston A., Zeisek V., Oberlander K., Weitemier K., Straub S. C. K., Cronn R. C., et al. 2016. Phylogenetic marker development for target enrichment from transcriptome and genome skim data: The pipeline and its application in Southern African Oxalis (Oxalidaceae). Molecular Ecology Resources 16(5): 1124–1135.
- Sedio, B. E., Paul J. R., Taylor C. M., and Dick C. W. 2013. Fine‐scale niche structure of Neotropical forests reflects a legacy of the great American biotic interchange. Nature Communications 4: 2317.
- Shah, T., Schneider J. V., Zizka G., Maurin O., Baker W., Forest F., Brewer G. E., et al. 2021. Joining forces in Ochnaceae phylogenomics: A tale of two targeted sequencing probe kits. American Journal of Botany 108(7): 1201–1216.
- Siniscalchi, C. M., Hidalgo O., Palazzesi L., Pellicer J., Pokorny L., Maurin O., Leitch I. J., et al. 2021. Lineage‐specific vs. universal: A comparison of the Compositae1061 and Angiosperms353 enrichment panels in the sunflower family. Applications in Plant Sciences 9(7): e11422. 10.1002/aps3.11422
- Slater, G., and Birney E. 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6: 31.
- Slimp, M., Williams L. D., Hale H., and Johnson M. G. 2021. On the potential of Angiosperms353 for population genomic studies. Applications in Plant Sciences 9(7): e11419. 10.1002/aps3.11419
- Smith, B. T., Harvey M. G., Faircloth B. C., Glenn T. C., and Brumfield R. T. 2014. Target capture and massively parallel sequencing of ultraconserved elements for comparative studies at shallow evolutionary time scales. Systematic Biology 63(1): 83–95.
- Soto Gomez, M., Pokorny L., Kantar M. B., Forest F., Leitch I. J., Gravendeel B., Wilkin P., et al. 2019. A customized nuclear target enrichment approach for developing a phylogenomic baseline for Dioscorea yams (Dioscoreaceae). Applications in Plant Sciences 7(6): e11254.
- Straub, S. C. K., Boutte J., Fishbein M., and Livshultz T. 2020. Enabling evolutionary studies at multiple scales in Apocynaceae through Hyb‐Seq. Applications in Plant Sciences 8(11): e11400.
- Štorchová, H., Hrdličková R., Chrtek J. Jr., Tetera M., Fitze D., and Fehrer J. 2020. An improved method of DNA isolation from plants collected in the field and conserved in saturated NaCl/CTAB solution. Taxon 49(1): 79–84.
- Thiers, B. 2023 (continuously updated). Index Herbariorum. Website: http://sweetgum.nybg.org/science/ih/ [accessed 6 October 2023].
- Thureborn, O., Razafimandimbison S. G., Wikström N., and Rydin C. 2022. Target capture data resolve recalcitrant relationships in the coffee family (Rubioideae, Rubiaceae). Frontiers in Plant Science 13(September): 967456.
- Ufimov, R., Zeisek V., Píšová S., Baker W. J., Fér T., van Loo M., Dobeš C., and Schmickl R. 2021. Relative performance of customized and universal probe sets in target enrichment: A case study in subtribe Malinae. Applications in Plant Sciences 9(7): e11442.
- Uribe‐Convers, S. 2016. Capture probe design with Sondovac. Simon's Lab Book. Website: https://uribeconvers.wordpress.com/ [posted 11 May 2016; accessed 14 September 2020].
- Wikström, N., Kainulainen K., Razafimandimbison S. G., Smedmark J. E. E., and Bremer B. 2015. A revised time tree of the asterids: Establishing a temporal framework for evolutionary studies of the coffee family (Rubiaceae). PLoS ONE 10(5): e0126690.
- Wikström, N., Bremer B., and Rydin C. 2020. Conflicting phylogenetic signals in genomic data of the coffee family (Rubiaceae). Journal of Systematics and Evolution 58(4): 440–460.
- Yardeni, G., Viruel J., Paris M., Hess J., Groot Crego C., de La Harpe M., Rivera N., et al. 2022. Taxon‐specific or universal? Using target capture to study the evolutionary history of rapid radiations. Molecular Ecology Resources 22(3): 927–945.
- Zhang, C., Rabiee M., Sayyari E., and Mirarab S. 2018. ASTRAL‐III: Polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 19(Suppl 6): 153.
- Zhang, C., Zhao Y., Braun E. L., and Mirarab S. 2021. TAPER: Pinpointing errors in multiple sequence alignments despite varying rates of evolution. Methods in Ecology and Evolution 12(11): 2145–2158.
Supplementary Materials
Appendix S1. Exonic region concatenation protocol.
Appendix S2. HybPiper summary statistics for single‐exonic region assemblies. Grayed rows indicate CapSim‐derived sequencing data. Bolded rows indicate silica‐dried specimens.
Appendix S3. HybPiper summary statistics for multi‐exonic region assemblies. Grayed rows indicate CapSim‐derived sequencing data. Bolded rows indicate silica‐dried specimens.
Appendix S4. Results of the initial CapSim simulated target enrichment experiment. “Contigs” is the number of contigs recovered during the map‐to‐reference assembly.
Appendix S5. Statistics for downloaded genomes used in CapSim experiments. “Total length” (Mb = megabases) is the length of all contigs combined in the original genome assembly. “N50” is the scaffold N50 score provided with the NCBI accession for the genome (see Appendix 2), used here as a metric for genome quality.
Appendix S6. HybPiper single‐exonic region and multi‐exonic region (MER) assembly statistics averaged for each clade. Statistics for MER assemblies are in blue.
Appendix S7. Heatmap showing results of HybPiper multi‐exonic region (MER) assemblies for Cinchonoideae (CIN), Rubioideae (RUB), and Ixoroideae (IXOR) samples, and one Apocynaceae sample. Target MERs are on the x‐axis and are grouped by the taxonomic source of the reference sequence. “RUB+CIN” indicates the 22 MERs with constituent ERs from both RUB and CIN. Samples are on the y‐axis and are grouped by taxonomy. Darker shades represent higher percentages of the locus length recovered by HybPiper. Locus completeness tends to be higher when target species are from the same clade as the probe sequences.
Appendix S8. AMAS single‐exonic region alignment statistics, showing average values for each statistic with ranges in parentheses. “All combined” alignments include Hillieae, Palicoureeae+Psychotrieae, and all outgroups (including CapSim samples).
Appendix S9. ASTRAL species tree estimated using the multi‐exonic region data set. Numbers above branches indicate local posterior probability support values; only values <0.98 are shown. Pie charts at internal nodes indicate quartet support (i.e., the percentage of quartets in gene trees that agree with the branch) for the main topology (blue), the first alternate topology (yellow), and the second alternate topology (pink). The two Palicourea suerrensis samples included originate from different populations.