2022 Apr 18;109(4):580–601. doi: 10.1002/ajb2.1827

Phylogenomic discordance suggests polytomies along the backbone of the large genus Solanum

Edeline Gagnon 1,2, Rebecca Hilgenhof 1,2, Andrés Orejuela 1,2, Angela McDonnell 3, Gaurav Sablok 4,5, Xavier Aubriot 6, Leandro Giacomin 7, Yuri Gouvêa 8, Thamyris Bragionis 8, João Renato Stehmann 8, Lynn Bohs 9, Steven Dodsworth 10,11, Christopher Martine 12, Péter Poczai 4,13, Sandra Knapp 14, Tiina Särkinen 1
PMCID: PMC9321964  PMID: 35170754

Abstract

Premise

Evolutionary studies require solid phylogenetic frameworks, but increased volumes of phylogenomic data have revealed incongruent topologies among gene trees in many organisms both between and within genomes. Some of these incongruences indicate polytomies that may remain impossible to resolve. Here we investigate the degree of gene‐tree discordance in Solanum, one of the largest flowering plant genera that includes the cultivated potato, tomato, and eggplant, as well as 24 minor crop plants.

Methods

A densely sampled species‐level phylogeny of Solanum is built using unpublished and publicly available Sanger sequences comprising 60% of all accepted species (742 spp.) and nine regions (ITS, waxy, and seven plastid markers). The robustness of this topology is tested by examining a full plastome dataset with 140 species and a nuclear target‐capture dataset with 39 species of Solanum (Angiosperms353 probe set).

Results

While the taxonomic framework of Solanum remained stable, gene tree conflicts and discordance between phylogenetic trees generated from the target‐capture and plastome datasets were observed. The latter correspond to regions with short internodal branches, and network analysis and polytomy tests suggest the backbone is composed of three polytomies found at different evolutionary depths. The strongest area of discordance, near the crown node of Solanum, could potentially represent a hard polytomy.

Conclusions

We argue that incomplete lineage sorting due to rapid diversification is the most likely cause for these polytomies, and that embracing the uncertainty that underlies them is crucial to understand the evolution of large and rapidly radiating lineages.

Keywords: Angiosperms353, hard polytomy, incomplete lineage sorting, incongruence, multilocus phylogenetic trees, nuclear‐plastid discordances, plastomes, short backbone branches, Solanaceae, target capture


Recent advances in high‐throughput sequencing have provided larger molecular datasets, including entire genomes, for reconstructing evolutionary relationships (e.g., Ronco et al., 2021). Considerable progress has been made since the publication of the first molecular‐based classification of orders and families of flowering plants (APG, 1998), with one of the most recent examples including a phylogenetic tree of the entire Viridiplantae based on transcriptome data from more than a thousand species (One Thousand Plant Transcriptomes Initiative, 2019). While large datasets have strengthened our understanding of evolutionary relationships and classifications across the Tree of Life, several of them have demonstrated repeated cases of persistent topological discordance across key nodes in birds (Suh et al., 2015; Suh, 2016), mammals (Morgan et al., 2013; Romiguier et al., 2013; Simion et al., 2017), amphibians (Hime et al., 2021), plants (Wickett et al., 2014; One Thousand Plant Transcriptomes Initiative, 2019), and fungi (Kuramae et al., 2006). Whereas previous expectations were that these “soft polytomies” would be resolved with the addition of more data, their persistence after the addition of more taxonomic and molecular data has led some authors to suggest that they actually represent “hard polytomies”, i.e., extremely rapid divergence events of three or more lineages at the same time or reticulate evolution due to species hybridization and/or introgression. In an era where obtaining genome‐wide sampling of species for phylogenetic reconstruction has become mainstream, the question of whether persistent topological discordance can be resolved with more data or instead reflects complex biological realities (Jeffroy et al., 2006; Philippe et al., 2011) is becoming increasingly common.

Discordance in phylogenetic signal can be due to three general classes of effects (Wendel and Doyle, 1998): (1) technical causes such as gene choice, sequencing error, model selection, or poor taxonomic sampling (Philippe et al., 2011, 2017); (2) organism‐level processes such as rapid or convergent evolution, rapid diversification, incomplete lineage sorting (ILS), or horizontal gene transfer (Degnan and Rosenberg, 2009); and (3) gene and genome‐level processes such as interlocus interactions and concerted evolution, intragenic recombination, use of paralogous genes for analysis, and/or non‐independence of sites used for analysis. Together, these biological and non‐biological processes can lead to conflicting phylogenetic signals between different loci in the genome and hinder the recovery of the evolutionary history of a group (Degnan and Rosenberg, 2009). Consequently, careful assessment of phylogenetic discordance across mitochondrial, plastid, and nuclear datasets is critical for understanding realistic evolutionary patterns in a group, as traditional statistical branch support measures fail to reflect topological variation of the gene trees underlying a species tree (Liu et al., 2009; Kumar et al., 2012).

Here we explore the presence of topological discordance in nuclear and plastome datasets of the large and economically important angiosperm genus Solanum L. (Solanaceae), which includes 1,228 accepted species and several major crops and their wild relatives, including potato, tomato and brinjal eggplant (aubergine), as well as at least 24 minor crop species (website: Solanaceaesource.org, accessed November 2020). Building a robust species‐level phylogeny for Solanum has been challenging because of the sheer size of the genus, and because of persistent poorly resolved nodes along the phylogenetic backbone. Bohs (2005) published the first plastid phylogenetic analysis for Solanum and established a set of 12 highly supported clades based on her strategic sampling of 112 species (9% of the total species number in the genus), spanning morphological and geographic variation. As new studies have emerged with increased taxonomic and genetic sampling (e.g., Levin et al., 2006; Weese and Bohs, 2007; Stern et al., 2011; Särkinen et al., 2013; Tepe et al., 2016), the understanding of overall phylogenetic relationships within Solanum has evolved to recognise three main clades: (1) the Thelopodium clade containing three species sister to the rest of the genus; (2) Clade I containing c. 350 mostly herbaceous and non‐spiny species (including the Tomato, Petota, and Basarthrum clades that contain the cultivated tomato, potato, and pepino, respectively); and (3) Clade II consisting of c. 900 predominantly spiny and shrubby species, including the cultivated brinjal eggplant (Table 1). The two latter clades are further resolved into 10 major and 43 minor clades (Table 1).

Table 1.

Number of species and taxon sampling across major and minor clades of Solanum. Clades are based on groups identified in previous molecular phylogenetic studies (Bohs, 2005; Weese and Bohs, 2007; Stern et al., 2011; Stern and Bohs, 2012; Särkinen et al., 2013; Tepe et al., 2016). Species number for each clade is based on current updated taxonomy in the SolanaceaeSource database (website: solanaceaesource.org, accessed November 2020). The 19 clades sampled in the pruned trees for the principal coordinate analysis in this study are in bold. New associated major clade names are given where applicable. Rows shaded in gray represent major and minor clades belonging to Clade II. The Eastern Hemisphere Spiny clade (EHS, formerly known as Old World spiny clade) comprises almost all the spiny solanums occurring in the eastern hemisphere.

| Minor clade | Associated major clade (Särkinen et al., 2013) | New associated major clade (this study) | Species | Sampled species (%): Supermatrix | Sampled species (%): Plastome (PL) | Sampled species (%): Target capture (TC) |
|---|---|---|---|---|---|---|
| Thelopodium | Thelopodium | | 3 | 3 (100%) | 1 (33%) | 1 (33%) |
| African non‐spiny | M Clade | VANAns | 14 | 5 (36%) | 1 (7%) | |
| Normania | M Clade | VANAns | 3 | 2 (67%) | 1 (33%) | 1 (33%) |
| Archaesolanum | M Clade | VANAns | 8 | 8 (100%) | 1 (13%) | 1 (13%) |
| Valdiviense | M Clade | VANAns | 1 | 1 (100%) | 1 (100%) | 1 (100%) |
| Dulcamaroid | M Clade | DulMo | 45 | 25 (56%) | 8 (18%) | 1 (2%) |
| Morelloid | M Clade | DulMo | 75 | 66 (88%) | 15 (20%) | 1 (1%) |
| Regmandra | Potato | Regmandra | 12 | 6 (50%) | 4 (33%) | 1 (8%) |
| Herpystichum | Potato | | 10 | 10 (100%) | | |
| Pteroidea | Potato | | 10 | 10 (100%) | 1 (10%) | |
| Oxycoccoides | Potato | | 1 | 1 (100%) | | |
| Articulatum | Potato | | 2 | 2 (100%) | | |
| Basarthrum | Potato | | 16 | 10 (56%) | 3 (19%) | 3 (19%) |
| Anarrhichomenum | Potato | | 12 | 8 (82%) | | |
| Etuberosum | Potato | | 3 | 2 (67%) | 2 (67%) | 1 (33%) |
| Tomato | Potato | | 17 | 14 (82%) | 8 (47%) | 3 (18%) |
| Petota | Potato | | 113 | 61 (54%) | 38 (34%) | 2 (2%) |
| Clandestinum‐Mapiriense | Clandestinum‐Mapiriense | | 3 | 3 (100%) | 1 (33%) | 1 (33%) |
| Wendlandii‐Allophyllum | Wendlandii‐Allophyllum | | 10 | 7 (70%) | 1 (10%) | 1 (10%) |
| Nemorense | Nemorense | | 4 | 4 (100%) | 1 (25%) | |
| Pachyphylla | Cyphomandra | | 39 | 32 (82%) | 1 (3%) | |
| Cyphomandropsis | Cyphomandra | | 11 | 7 (64%) | 1 (9%) | 1 (9%) |
| Geminata | Geminata | | 150 | 68 (45%) | 5 (3%) | 1 (1%) |
| Reductum | Geminata | | 2 | 2 (100%) | 1 (50%) | |
| Brevantherum | Brevantherum | | 83 | 29 (35%) | 3 (4%) | |
| Gonatotrichum | Brevantherum | | 7 | 7 (100%) | 1 (14%) | |
| Inornatum | Brevantherum | | 5 | 2 (40%) | 1 (20%) | |
| Trachytrichium | Brevantherum | | 2 | 2 (100%) | | |
| Elaeagniifolium | Leptostemonum | | 5 | 5 (100%) | 1 (20%) | 1 (20%) |
| Micracantha | Leptostemonum | | 14 | 9 (64%) | 1 (7%) | |
| Torva | Leptostemonum | | 54 | 34 (63%) | 5 (9%) | 1 (2%) |
| Erythrotrichum | Leptostemonum | | 33 | 13 (39%) | 1 (3%) | |
| Thomasiifolium | Leptostemonum | | 9 | 4 (44%) | 1 (11%) | |
| Gardneri | Leptostemonum | | 10 | 8 (80%) | 1 (10%) | |
| Acanthophora | Leptostemonum | | 22 | 13 (59%) | 1 (5%) | |
| Lasiocarpa | Leptostemonum | | 12 | 12 (100%) | | |
| Sisymbriifolium | Leptostemonum | | 4 | 4 (100%) | 1 (25%) | 1 (25%) |
| Androceras | Leptostemonum | | 16 | 15 (94%) | | |
| Crinitum | Leptostemonum | | 23 | 10 (43%) | | |
| Bahamense | Leptostemonum | | 3 | 3 (100%) | | |
| Asterophorum | Leptostemonum | | 4 | 2 (50%) | | |
| Carolinense | Leptostemonum | | 11 | 8 (73%) | 1 (9%) | |
| Hieronymi | Leptostemonum | | 1 | 1 (100%) | 1 (100%) | |
| Eastern Hemisphere Spiny | Leptostemonum | | 332 | 197 (59%) | 24 (7%) | 16 (5%) |
| Campechiense | Leptostemonum | | 1 | 1 (100%) | | |
| Crotonoides | Leptostemonum | | 3 | 2 (67%) | 1 (33%) | |
| Multispinum | Leptostemonum | | 1 | 1 (100%) | 1 (100%) | |
| Unplaced | Leptostemonum | | 9 | 1 (13%) | | |
| TOTALS: | | | 1228 | 746 (60%) | 140 (11%) | 39 (3%) |

Despite these advancements, phylogenetic relationships between many of the major clades of Solanum have remained poorly resolved, mainly due to limitations in taxon and molecular marker sampling. The most recent genus‐wide phylogenetic study by Särkinen et al. (2013), based on seven markers (two nuclear and five plastid) and fewer than half (34%) of the species of Solanum, failed to resolve the relationships among major clades, especially within Clade II and the large component Leptostemonum clade, which includes the Old World spiny clade, comprising almost all spiny Solanum species that occur in the eastern hemisphere. To reduce colonial connotations associated with this name, we hereafter refer to this clade as the Eastern Hemisphere Spiny clade (EHS; Table 1).

To gain a better understanding of the evolutionary relationships of Solanum, we built a new Sanger supermatrix that included 60% of the species of the genus and compared the phylogenetic relationships obtained with the Sanger supermatrix with genus‐wide plastid (PL) and nuclear target‐capture (TC) phylogenomic datasets. We ask: (1) Does a significant increase in taxon sampling of the supermatrix dataset lead to significant changes in the circumscription of major and minor clades in Solanum? (2) Does increased gene sampling in both plastome and nuclear data resolve previously identified polytomies between major clades? (3) Is there evidence of discordance within and between genomic datasets? and (4) Are areas of high discordance in the Solanum phylogeny better represented by polytomies rather than bifurcating nodes? Comparison of the topologies from the different datasets, and results from discordance analyses, a filtered supertree network, and polytomy tests lead us to suggest that some of the soft polytomies of Solanum might be hard polytomies caused by rapid speciation and diversification coupled with ILS. We discuss the consequences that such an interpretation has for investigating the biogeography and morphological trait evolution across the economically important genus.

MATERIALS AND METHODS

Taxon sampling

A Sanger sequence supermatrix was generated including all available sequences from GenBank related to the genus Solanum for nine regions: (1) the nuclear ribosomal internal transcribed spacer (ITS); (2) low‐copy nuclear region waxy (i.e., GBSSI); (3) two protein‐coding plastid genes matK and ndhF; and (4) five non‐coding plastid regions (ndhF‐rpl32, psbA‐trnH, rpl32‐trnL, trnS‐G, and trnT‐L). Only vouchered and verified samples were utilized. All sequences were blasted against target regions in USEARCH version 11 (Edgar, 2010). Taxon names were checked against SolanaceaeSource synonymy (website: solanaceaesource.org, accessed November 2020) and duplicate sequences belonging to the same species were pruned out to retain a single individual per taxon. A total of 817 Sanger sequences were generated and added to the matrix, adding 129 previously unsampled species and new data for 257 species (Appendix S1). Final species sampling across major and minor clades of Solanum varied from 13 to 100%, with 742 species of Solanum (60% of the 1228 currently accepted species as of November 2020; Table 1). Four species of Jaltomata Schltdl. were used as an outgroup (Appendix S1).

To assess phylogenetic discordance within Solanum, a set of species was selected for the phylogenomic study to represent all 10 major and as many of the 43 minor clades of Solanum as possible (Table 1), as well as the outgroup Jaltomata. The final sampling included 151 samples for the plastome (PL) dataset (140 Solanum species; Table 1 and Appendix S2) and 40 samples for the target‐capture (TC) dataset (39 Solanum species; Table 1 and Appendix S3). For the PL dataset, 86 samples were sequenced using low‐coverage genome skimming, and the remaining samples were downloaded from GenBank (November 2019). For the TC dataset, 12 samples were sequenced as part of the Plant and Fungal Trees of Life project (Baker et al., 2021) using the Angiosperms353 bait set (Johnson et al., 2019). In addition, 17 sequences were added from an unpublished dataset provided by A. McDonnell and C. Martine. Sequences for the remaining 12 samples were extracted from the GenBank SRA archive using the SRA Toolkit 2.10.7 (website: https://github.com/ncbi/sra-tools; Appendix S3).

DNA extraction, library preparation and sequencing

Supermatrix Sanger sequencing

DNA extractions for Sanger sequencing were done using DNeasy plant mini extraction kits (Qiagen, Valencia, California, USA) or the FastDNA kit (MP Biomedicals, Irvine, California, USA). Amplification of waxy followed Levin et al. (2005) using two (waxyF with 1171R and 1058F with 2R) or four primer pairs (waxyF with Ex4R, Ex4F with 1171R, 1058F with 3′N, and 3F with 2R). trnT‐L was amplified with primers a‐d and c‐f (Taberlet et al., 1991; Bohs and Olmstead, 2001; Bohs, 2004). ndhF amplification followed Bohs and Olmstead (1997), psbA‐trnH followed Sang et al. (1997), matK followed Rosario et al. (2019), ITS and trnS‐G followed Levin et al. (2006), and rpl32‐trnL and ndhF‐rpl32 followed Miller et al. (2009). Sequencing was carried out on ABI automated sequencers at the University of Utah DNA sequencing facility (Salt Lake City, Utah, USA), at the Natural History Museum (London, UK), and at Myleus Biotecnologia (Belo Horizonte, Brazil). Contigs were visually checked in Sequencher version 4.8 (GeneCodes, Ann Arbor, Michigan, USA) and Geneious Prime 2020.1.1 (website: https://www.geneious.com). The combined matrix was 10,908 bp long (Appendix S4). The two most densely sampled regions (trnT‐L and ITS) included 84% and 82% of the sampled species, respectively; waxy (54%) and ITS (67%) loci had the most parsimony informative characters (Appendix S4).

PL and TC datasets

DNA for high‐throughput sequencing was extracted using the low‐salt CTAB method (Arseneau et al., 2017) and quantified on a Qubit fluorometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA). Genome skimming was done at the Institute of Biotechnology, University of Helsinki (Finland). A paired‐end genomic library was constructed using the Nextera DNA library preparation kit (Illumina, San Diego, California, USA). Fragment analysis was conducted with an Agilent Technologies (Santa Clara, California, USA) 2100 Bioanalyzer using a DNA 1000 chip. Sequencing was performed on an Illumina MiSeq platform from both ends with a read length of 150 bp. DNA extraction, quantification, and sequencing for TC followed Johnson et al. (2019). All PL and TC reads have been submitted to GenBank and the European Nucleotide Archive (Appendices S2 and S3).

Phylogenetic analyses

Overview of methodological strategy

Ten phylogenetic analyses with different methodological strategies were compared across the supermatrix, PL and TC datasets, to test if the phylogenetic results were robust despite these different choices (e.g., Philippe et al., 2011, 2017; Saarela et al., 2018; Duvall et al., 2020). The Sanger supermatrix analyses based on Maximum Likelihood (ML) and Bayesian inference (BI) were used as a reference to compare results from the PL and TC species trees because the Sanger supermatrix had the most complete taxonomic sampling (Table 2). For the PL dataset, a total of four analyses were compared to test the effect of missing data and sampling on the resulting phylogenies, as well as the effect of different partitioning schemes in IQ‐TREE2 (Table 2; Minh et al., 2020b). For the TC dataset, a total of four analyses were compared to test the effect of the phylogenetic method (ML vs. coalescent methods), missing data, and taxonomic sampling on the resulting phylogenies (Table 2). Full methods for all analyses are described below. All bioinformatic analyses were run either on the Toby‐G1 server at the Royal Botanic Garden Edinburgh (Scotland, UK), or the Crop Diversity Server from the James Hutton Institute, in Dundee, Scotland, except for the supermatrix ML analysis.

Table 2.

Overview of the 10 different analyses conducted across the Sanger supermatrix, plastome (PL), and target capture (TC) datasets. Acronyms indicate how each analysis is referred to in the figures and text. ML = Maximum Likelihood; BI = Bayesian Inference, A353 = Angiosperms353 bait set. See Materials and Methods section for full details.

| Dataset | Taxon and genomic sampling | Phylogenetic method | Partitioning scheme | Acronym |
|---|---|---|---|---|
| Supermatrix | 746 taxa, 9 loci | ML: RaxML | | Supermatrix ML |
| Supermatrix | 746 taxa, 9 loci | BI: Beast2 | | Supermatrix BI |
| Plastome (PL) | 151 taxa, full + partial plastomes | ML: IQ‐TREE2 | Unpartitioned | PL‐151‐UP |
| Plastome (PL) | 151 taxa, full + partial plastomes | ML: IQ‐TREE2 | Best‐Partition scheme | PL‐151‐BP |
| Plastome (PL) | 125 taxa, full plastomes only | ML: IQ‐TREE2 | Unpartitioned | PL‐125‐UP |
| Plastome (PL) | 125 taxa, full plastomes only | ML: IQ‐TREE2 | Best‐Partition scheme | PL‐125‐BP |
| Target capture (TC) (A353) | 40 taxa, 338 exons | ML: IQ‐TREE2 | | TC‐min04‐ML |
| Target capture (TC) (A353) | 40 taxa, 338 exons | Coalescent: ASTRAL‐III | | TC‐min04‐ASTRAL‐III |
| Target capture (TC) (A353) | 40 taxa, 303 exons | ML: IQ‐TREE2 | | TC‐min20‐ML |
| Target capture (TC) (A353) | 40 taxa, 303 exons | Coalescent: ASTRAL‐III | | TC‐min20‐ASTRAL‐III |

Supermatrix dataset

Sequences were aligned in MAFFT version 7 (Katoh et al., 2005), manually checked, and optimised. Short multi‐repeats and ambiguously aligned regions were excluded manually or with trimAl (‐gappyout method; Capella‐Gutiérrez et al., 2009). Both ML and BI analyses were run on individual loci, as well as on a combined plastid alignment (seven loci in total) to check for topological incongruences, rogue taxa, and misidentified sequences. Visual checks revealed a small number of clear mis‐determinations and/or lab errors. A further 26 samples were removed based on high RogueNaRok scores (Aberer et al., 2013). Nuclear sequence data (ITS and waxy) were identified for all known polyploid species (63 species, Appendix S5), and subsequently examined to determine if there were any strong incongruences with the results from the plastid loci. As none were found (Appendices S6 and S7), sequences from these species were kept in the final supermatrix analysis.

Maximum likelihood (ML) and Bayesian inference (BI) analyses were run on all nine loci individually and on the combined plastid dataset (seven loci). ML analyses were run in RaxML‐HPC version 8.2.12 (Stamatakis, 2014) on XSEDE on CIPRES Science Gateway version 3.3 (Miller et al., 2010), with 10 independent runs based on unique starting trees. The General Time Reversible (GTR) model with CAT (Tavaré, 1986; Stamatakis, 2006) was used for all partitions. A total of 1,000 non‐parametric bootstraps were run; bootstrap support (BS) ≥ 95% was considered strong, 75 to 94% moderate, and 60 to 74% weak.
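The bootstrap support classes defined above can be written as a small lookup. The following Python sketch is illustrative only: the function name is hypothetical, and the "unsupported" label for values below 60% is an assumption, because the text does not name that class.

```python
def classify_bootstrap(bs: float) -> str:
    """Map a bootstrap percentage to the support classes used in the text."""
    if not 0 <= bs <= 100:
        raise ValueError("bootstrap support must be a percentage between 0 and 100")
    if bs >= 95:
        return "strong"
    if bs >= 75:
        return "moderate"
    if bs >= 60:
        return "weak"
    # Assumption: the text leaves values below 60% unnamed.
    return "unsupported"


if __name__ == "__main__":
    for value in (100, 95, 80, 60, 42):
        print(value, classify_bootstrap(value))
```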

BI analyses were run using Beast version 2.6.3 (Bouckaert et al., 2019), with two parallel runs sampling trees every 10,000 generations. ModelTest‐NG (Darriba et al., 2020) was used to find the most suitable nucleotide substitution model for the individual loci and combined plastid loci; JC + G4 was specified for the ITS and trnS‐G regions, GTR + G4 for the psbA‐trnH, trnT‐L, rpL32 and matK regions, and the GTR + I + G4 model for all other regions, as well as the combined plastid dataset and the full supermatrix dataset. For all analyses, an uncorrelated log‐normal relaxed clock, a birth‐death tree prior, and a normally distributed UCLD.mean prior (mean = 1, SD = 0.3) were specified. All runs were checked with Tracer version 1.7.1 (Rambaut et al., 2018) to ensure that adequate effective sample sizes were reached (ESS > 200). LogCombiner and TreeAnnotator were used to generate the final maximum clade credibility tree with a 15% burn‐in. Posterior probability (PP) values ≥0.95 were considered strong, and values from 0.75 to 0.94 moderate to weak.

The concatenated ML Sanger supermatrix analysis was run on a concatenated matrix, with the same settings as described above in RaxML. The concatenated BI Sanger supermatrix was analysed partitioning the dataset between ITS, waxy and the plastid genes. Modifications to the analysis included a monophyletic constraint on Solanum, and four parallel runs that were run for 60 million generations with two chains, sampling trees every 10,000 generations. The ML best tree was used as a starting topology to speed up convergence of the chains.

PL dataset

Paired reads from genome skimming were cleaned using BBDuk from the BBTools suite (sourceforge.net/projects/bbmap/; ktrimright = t, k = 27, hdist = 1, edist = 0, qtrim = rl, trimq = 20, minlength = 36, trimbyoverlap = t, minoverlap = 24, and qin = 33). Sequence quality was checked with FastQC (Andrews, 2010) and MultiQC (Ewels et al., 2016). Plastome assembly was done using de novo assembly with Fast‐Plast version 1.2.6 (website: https://github.com/mrmckain/Fast-Plast), and reference‐guided assembly using GetOrganelle version 1.6.2.e (Jin et al., 2020) with the high‐coverage plastome sequence of S. dulcamara L. (GenBank KY863443; Amiryousefi et al., 2018). For GetOrganelle, the following settings were used: ‐w 0.6, ‐R 20, and ‐k 85,95,105,127; for Fast‐Plast, the Solanales Bowtie index was used for the assembly. Results from both methods were aligned in Geneious and visually checked to determine consistency. Assembly quality was assessed using the reads identified from the Bowtie step in the Fast‐Plast analysis, which were mapped against the final recovered plastome sequence using BWA (Li and Durbin, 2010). Mean and standard deviation of coverage depth for each base pair were determined by examining the same files in Geneious. Assemblies were annotated using both Chlorobox GeSeq (Tillich et al., 2017) and the “Annotate from database” tool in Geneious using the reference plastid genome of S. dulcamara. Results were compared to ensure that start and stop codons for exon boundaries were congruent. Annotated plastomes were submitted to GenBank (Appendix S2). A total of 55 full plastomes were assembled, with a mean length of 155,498 bp (max. 156,138 bp, min. 154,715 bp; Appendix S2) and a mean coverage of 158 (min. 22, max. 571; Appendix S2), along with 28 partial plastomes (45,398 to 154,598 bp) with a mean coverage of 29 (min. 4, max. 96; Appendix S2). All plastomes had a highly conserved quadripartite structure, with no loss, duplication, or expansion of gene families.

Plastomes from this study and those retrieved from GenBank were aligned in Geneious using MAFFT (Katoh et al., 2005), visually checked, and corrected. A copy of the inverted repeat (IRa) was removed prior to phylogenomic analyses, although 1,189 bp were kept at the beginning of the region to be able to extract the gene that spans the boundary between the small single copy (SSC) and IRa region. We then separated the plastome alignment into: (1) 79 protein‐coding regions; (2) 15 introns; and (3) 73 intergenic regions. For each dataset, the ambiguously aligned regions and polyA repeats were removed, using visual checks for the exon and intron regions, and the strict mode of trimAl (Capella‐Gutiérrez et al., 2009) for the intergenic regions (Appendix S8). Sequences shorter than 25% of the length of the aligned matrix for each region and columns containing >75% gaps were removed in trimAl (Capella‐Gutiérrez et al., 2009) to avoid issues with long branch attraction following Gardner et al. (2021). Two pseudogenes (ycf1 and rps19) at the junction of IRa and the Large Single Copy (LSC) region (Amiryousefi et al., 2018), and four intergenic regions with no parsimony informative characters were excluded from the final analysis. All remaining locus alignments were concatenated for the final PL phylogenetic analyses.
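The two alignment filters described above (removal of sequences shorter than 25% of the aligned length, and of columns with more than 75% gaps) can be illustrated with a minimal Python sketch. This is not the trimAl implementation: the function name, the treatment of "-" as the only gap character, and the order of the two steps (sequences first, then columns) are assumptions made for illustration.

```python
def filter_alignment(alignment, min_seq_frac=0.25, max_gap_frac=0.75):
    """Drop short sequences, then gap-rich columns, from an aligned matrix.

    alignment maps a sample name to its aligned sequence (equal lengths).
    """
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    aln_len = lengths.pop()

    # Step 1: keep sequences whose ungapped length is at least
    # min_seq_frac of the aligned matrix length.
    kept = {
        name: seq
        for name, seq in alignment.items()
        if len(seq) - seq.count("-") >= min_seq_frac * aln_len
    }
    if not kept:
        return {}

    # Step 2: keep columns in which the proportion of gaps among the
    # remaining sequences does not exceed max_gap_frac.
    n_seqs = len(kept)
    keep_cols = [
        i
        for i in range(aln_len)
        if sum(seq[i] == "-" for seq in kept.values()) / n_seqs <= max_gap_frac
    ]
    return {name: "".join(seq[i] for i in keep_cols) for name, seq in kept.items()}


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from this study.
    toy = {
        "s1": "ACGT-A",
        "s2": "ACGT--",
        "s3": "ACG---",
        "s4": "AC----",
        "s5": "A-----",  # 1 of 6 positions: below the 25% threshold
    }
    for name, seq in filter_alignment(toy).items():
        print(name, seq)
```

In the toy example, sample s5 is dropped in the first step, and the fifth column (all gaps among the remaining sequences) is dropped in the second; the last column, with exactly 75% gaps, is retained because the threshold is exclusive.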

To test for the effect of missing data, two datasets were compared: (1) a matrix with 151 taxa containing all 140 species selected for this study with a higher proportion of missing data (147,278 bp long with the second IR removed); and (2) a matrix with 125 samples containing only complete plastid sequences (Appendices S2 and S8).

ML searches were run on all PL datasets in IQ‐TREE2 (Minh et al., 2020b) with 1,000 non‐parametric bootstraps. Optimal substitution models were determined using –TEST in IQ‐TREE2 (Appendix S9). For both PL datasets, topologies from two different partitioning schemes were also compared (unpartitioned vs. best‐fit partition scheme based on PartitionFinder; Lanfear et al., 2012) in IQ‐TREE2, to test if accounting for variation in substitution rate amongst loci affected the phylogenetic results. BS values ≥95% were considered strong, 75 to 94% moderate, and 60 to 74% weak.

TC dataset

Trimmomatic (Bolger et al., 2014) was used to trim reads (TruSeq3‐PE‐simpleclip.fa:1:30:6, LEADING:30, TRAILING:30, SLIDINGWINDOW:4:30, MINLEN:36). Read quality was checked with FastQC (Andrews, 2010) and MultiQC (Ewels et al., 2016). Over‐represented repeat sequences were removed with CutAdapt (Martin, 2011). HybPiper (Johnson et al., 2016) was used to produce reference‐guided de novo assemblies using the reference provided by Johnson et al. (2019). Putative paralogs were identified using the HybPiper script “paralog_retriever.py”. Phylogenies were generated for all 45 loci for which paralog warnings were found using MAFFT (Katoh et al., 2005) and FastTree (Price et al., 2010). Five loci were deleted and several taxa whose paralogs caused paraphyly of clades were excluded from 27 loci (one to seven taxa per locus). A single gene (g5299) presented a clear duplication event and was divided into two separate matrices for downstream analyses.

Default HybPiper settings were used for all but three samples (S. betaceum Cav., S. valdiviense Dunal, and S. etuberosum Lindl.), for which the coverage cutoff was reduced from eight to four to maximise recovery of target genes. One sample (S. terminale Forssk.) was excluded due to poor sequence quality. Only the exon dataset was analyzed in downstream phylogenomic analyses, because the transcriptome dataset showed large differences in the recovered flanking regions of target loci between samples, likely due to post‐transcriptional splicing and editing of messenger RNA. The HybPiper script “fasta_merge.py” was used to concatenate all genes together and produce a partition file. In summary, an average of 289 genes per sample was recovered for the TC analysis (min 48, max 340) when the two samples with low numbers were excluded (S. betaceum and S. etuberosum, Appendix S3). Furthermore, to reduce the effect of missing data and long branch attraction, sequences shorter than 25% of the average length for the gene were eliminated. The number of loci retained for the min20 and min04 datasets was 310 and 348, respectively, with the final aligned length varying between 242,272 bp and 261,975 bp (Appendix S10).

The effect of missing data was tested by comparing two different sampling thresholds based on the minimum number of taxa in each of the target genes alignments (min20 vs. min04, i.e., a minimum of 20 taxa per gene and a minimum of four taxa per gene, respectively) using HybPiper (Johnson et al., 2016) to retrieve and filter the genes.
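The two occupancy thresholds can be illustrated as a simple filter on the number of taxa recovered per gene. The Python sketch below is a hypothetical illustration, not the HybPiper workflow used here: the function name, the gene names, and the taxon labels are all invented for the example.

```python
def filter_by_occupancy(gene_taxa, min_taxa):
    """Return the sorted names of genes recovered in at least min_taxa taxa.

    gene_taxa maps a gene name to the set of taxa with a sequence for it.
    """
    return sorted(gene for gene, taxa in gene_taxa.items() if len(taxa) >= min_taxa)


if __name__ == "__main__":
    # Hypothetical recovery table (not data from this study).
    recovery = {
        "geneA": {f"taxon{i}" for i in range(25)},  # present in 25 taxa
        "geneB": {f"taxon{i}" for i in range(10)},  # present in 10 taxa
        "geneC": {f"taxon{i}" for i in range(3)},   # present in 3 taxa
    }
    min20 = filter_by_occupancy(recovery, 20)
    min04 = filter_by_occupancy(recovery, 4)
    print("min20:", min20)
    print("min04:", min04)
```

By construction, every gene that passes the stricter min20 threshold also passes min04, so the min20 gene set is a subset of the min04 set and contains less missing data per gene.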

ML analyses were run on both TC datasets in IQ‐TREE2 (Minh et al., 2020b) with partitioning between loci. In addition, IQ‐TREE2 was used to generate individual ML trees for each locus, and the resulting phylogenetic trees were used for coalescent analyses with ASTRAL‐III version 5.7.3 (Appendix S9; Zhang et al., 2018), where tree nodes with <10% BS values were collapsed using Newick Utilities version 1.5.0 (Junier and Zdobnov, 2010). Trees with excessively long branches were identified using phyx (Brown et al., 2017) by looking at tree lengths and root‐to‐tip variation (command “pxlstr”); seven gene trees with excessively long branches were identified and excluded for the min20 and ten for the min04 datasets, leading to a total of 303 and 338 gene trees being used for the respective coalescent analyses. Branch support was assessed using local PP support (Sayyari and Mirarab, 2016) calculated in ASTRAL‐III, where PP values >0.95 were considered strong, 0.75 to 0.94 weak to moderate, and ≤0.74 as unsupported.

Discordance analyses

Comparison of resulting species trees

Topological congruence and discordances between all 10 topologies generated were assessed visually by generating graphical representations through custom R‐scripts using the following packages: “ggtree” (Yu, 2020), “stringr” (Wickham and Wickham, 2019), “ape” (Paradis and Schliep, 2019), “ggplot2” (Villanueva and Chen, 2019) and “gridExtra” (Auguie, 2017). To facilitate comparisons, all trees were reduced to include the outgroup Jaltomata and nine taxa representing the following clades of Solanum, which were recovered in all analyses: Thelopodium, Regmandra, Potato, Morelloid (as a representative of both the Dulcamaroid and Morelloid clades), Archaesolanum, S. anomalostemon S.Knapp & M.Nee (species sister to Clade II), Acanthophora (minor clade of the Leptostemonum) and two representatives of the EHS clade (Table 1). The species sampled in the PL and TC datasets were identical for all except three minor clades, in which different closely related species were sequenced (Acanthophora: S. viarum Dunal/S. capsicoides All.; Morelloid: S. opacum A.Braun & C.D. Bouché/S. americanum Mill.).

Concordance factors

Phylogenomic discordance was measured using gene concordance factors (gCF) and site concordance factors (sCF) calculated in IQ‐TREE2 (Minh et al., 2020a). These metrics assess the proportion of gene trees that are concordant with different nodes along the phylogenetic tree and the number of informative sites supporting alternative topologies. Low gCF values can result from limited information (i.e., short branches) and/or genuine conflicting signal; low sCF values (~30%) indicate lack of phylogenetic information in loci (Minh et al., 2020a). The metrics were calculated using the TC‐min20‐ASTRAL‐III topology (303 genes) and the unpartitioned PL IQ‐TREE2 topology of 151 taxa (PL‐151‐UP), with sampling reduced to 21 and 34 tips in the TC and PL topologies, respectively, retaining a single tip for each of the different minor and major clades. An additional tip was retained for the EHS Clade to visualize the gCF and sCF for the crown node of that lineage.
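The idea behind the gene concordance factor can be illustrated with a simplified Python sketch: for each internal branch of a reference tree, gCF is the percentage of gene trees that contain the same bipartition. This is not the IQ‐TREE2 implementation. It assumes that every gene tree contains all taxa (so every gene tree is decisive for every branch), represents trees as nested tuples of tip names, and uses hypothetical function names and toy trees.

```python
def tips(tree):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of a tree.

    Each bipartition is stored as the side that excludes a reference tip
    (the alphabetically first tip), so rooting does not affect the result.
    """
    all_tips = tips(tree)
    ref = min(all_tips)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = tips(node)
        side = all_tips - clade if ref in clade else clade
        # Keep only splits with at least two tips on each side.
        if 1 < len(side) < len(all_tips) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def gene_concordance_factors(species_tree, gene_trees):
    """Percentage of gene trees containing each species-tree bipartition."""
    gene_splits = [splits(g) for g in gene_trees]
    return {
        split: 100.0 * sum(split in gs for gs in gene_splits) / len(gene_splits)
        for split in splits(species_tree)
    }


if __name__ == "__main__":
    # Hypothetical five-taxon example (not data from this study).
    species = ((("A", "B"), "C"), ("D", "E"))
    genes = [
        ((("A", "B"), "C"), ("D", "E")),
        ((("A", "C"), "B"), ("D", "E")),
        (("A", "B"), ("C", ("D", "E"))),
        ((("A", "D"), "B"), ("C", "E")),
    ]
    for split, gcf in gene_concordance_factors(species, genes).items():
        print(sorted(split), gcf)
```

In the toy example, the branch separating D and E from the other taxa is found in three of the four gene trees (gCF = 75%), whereas the branch separating A and B from C, D, and E is found in two (gCF = 50%).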

Network analyses and polytomy tests

The presence of reticulate evolution and conflicting signals in gene trees in the TC dataset was explored by generating a filtered supertree network in SplitsTree 4 (Huson and Bryant, 2006) of the TC min20 dataset (303 genes) collapsing branches with <75% local PP support with a minimum number of trees set to 50% (151 trees). Polytomy tests were carried out in ASTRAL‐III (Sayyari and Mirarab, 2018), using the ASTRAL‐III topologies of the two datasets (min20 and min04). Gene trees were used to infer quartet frequencies for all branches to determine the presence of polytomies while accounting for ILS. The analysis was run twice to minimize gene tree error.
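The logic of a quartet‐based polytomy test can be sketched as follows. Under the null hypothesis that a branch has zero length, the three possible quartet topologies around that branch are expected at equal frequencies, which can be assessed with a chi‐square test with two degrees of freedom. The sketch below is an illustration of that principle under stated assumptions, not the ASTRAL‐III code; the function name and the example counts are hypothetical.

```python
import math


def polytomy_p_value(n1, n2, n3):
    """P-value for the null hypothesis that a branch is a polytomy.

    n1, n2, n3 are the (effective) numbers of gene trees supporting
    each of the three quartet topologies around the branch. Under the
    null, each topology is expected in one third of the gene trees.
    """
    total = n1 + n2 + n3
    if total <= 0:
        raise ValueError("at least one gene tree is required")
    expected = total / 3.0
    chi2 = sum((n - expected) ** 2 / expected for n in (n1, n2, n3))
    # The survival function of a chi-square distribution with
    # 2 degrees of freedom is exactly exp(-x / 2).
    return math.exp(-chi2 / 2.0)


# Hypothetical counts: nearly equal frequencies versus one dominant topology.
p_flat = polytomy_p_value(110, 95, 98)
p_resolved = polytomy_p_value(200, 50, 50)
print(p_flat > 0.05, p_resolved < 0.05)
```

When the p‐value exceeds the chosen threshold, the null hypothesis of a zero‐length branch cannot be rejected and the node is a candidate for being collapsed into a polytomy.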

RESULTS

Phylogenetic analyses

Congruent recovery of major clades

All three datasets, including the supermatrix and the two phylogenomic datasets (PL and TC), recovered previously recognized major clades in Solanum (Figures 1 and 2A, C); a few minor clades, concentrated in Clade II, were found to be polyphyletic in the supermatrix phylogeny, including the Mapiriense‐Clandestinum, Sisymbriifolium, Wendlandii‐Allophyllum and Cyphomandropsis minor clades (Appendices S11 and S12); comparison with PL and TC phylogenies is not possible, as only one species of each clade was sampled in these datasets. In Clade I, nearly all specimens of the Dulcamaroid clade formed a monophyletic group. The only exception concerned S. alphonsei Dunal, sampled here for the first time. In both the supermatrix and PL analyses, this species was sister to S. valdiviense of the Valdiviense clade, with maximum branch support in the PL analyses (Figure 2, Appendix S13).

Figure 1.

Supermatrix phylogeny from Maximum Likelihood analysis (RAxML) of 742 Solanum species based on two nuclear and seven plastid regions. Bootstrap branch support values are color‐coded: black = strong (0.95–1.0), white = moderate to weak support (0.75–0.94). Dashed lines in phylogeny indicate relationships that were not recovered in the TC and PL analyses (see Figures 2, 3). Clade names refer to major and minor clades discussed in the text (see Table 1).

Figure 2.

Comparison of Solanum clades recovered in plastome (PL) and target‐capture (TC) phylogenomic datasets. (A) Plastome phylogeny from the unpartitioned maximum likelihood analysis (PL‐151‐UP) based on 160 loci representing exons, introns and intergenic regions; (B) Filtered supertree network of the TC dataset (min20) based on 303 gene trees with a 50% minimum tree threshold. (C) TC phylogeny with 40 species from coalescent analysis (TC‐min20‐ASTRAL‐III). Clades are shown in the same color in all three phylogenies to enable comparison. Branch support values (BS values in (A) and local PP values in (C)) are color coded: black = strong (0.95–1.0), white = moderate to weak (0.75–0.94). Scale bars = substitutions/site. Collection or GenBank numbers are indicated in the PL phylogeny for duplicate species sampled in the phylogenetic trees.

Despite these minor novelties, all analyses recovered the Thelopodium clade as sister to the rest of Solanum (Figures 1 and 2; Appendices S11–S15). The Potato clade was strongly supported across all analyses (Figures 1 and 2; Appendices S11–S15), as was the Regmandra clade in supermatrix and PL analyses (only one sample in TC phylogenies). Furthermore, all analyses recovered a clade here referred to as DulMo that includes the Morelloid and Dulcamaroid clades (Figures 1 and 2; Appendices S11–S15). A new strongly supported clade, here referred to as VANAns clade and comprising the Valdiviense (including S. alphonsei, see below), Archaesolanum, Normania, and the African non‐spiny clades, was found across all analyses (Figures 1 and 2; Appendices S11–S15).

Clade II was supported as monophyletic across all topologies (Figures 1 and 2A, C), with maximum branch support in all 10 species trees (Appendices S11–S15). While differences in sampling prevent thorough comparisons of relationships between clades within Clade II, no deep incongruences were detected amongst topologies obtained with the supermatrix, PL, and TC datasets (Figures 1 and 2A, C; Appendices S9–S15). Within Clade II, the large Leptostemonum clade (the spiny solanums) was strongly supported in all cases (Figures 1 and 2A, C; Appendices S11–S15).

Incongruent relationships amongst clades and impact of different analyses

Overall, we found that despite using different phylogenetic analyses and investigating the impact of missing data and taxon sampling on the different datasets, these had little impact on the relationships recovered amongst clades. The BI and ML supermatrix analyses were identical in terms of composition and relationships of major clades (Figure 3B), as were the four PL species trees (Figure 3D, E). There were some differences amongst the topologies of the TC datasets, but these differences concerned branches which had little support (Figure 3A–C). Between supermatrix, PL and TC datasets, however, major incongruences between species trees were observed with respect to the relationships among the main clades identified in the section above (Figures 1, 3).

Figure 3.

Comparison of Solanum clades recovered in the three different datasets. (A) TC ASTRAL‐III phylogeny of the min20 dataset, with local posterior probabilities indicated at nodes; (B) ML and BI phylogenies of supermatrix dataset, with bootstrap support and posterior probabilities indicated at nodes; (C) TC ML phylogeny of the min20 dataset, with local posterior probabilities indicated at nodes; (D) PL ML phylogenies of the unpartitioned and best partition‐scheme of the 151 taxa dataset, with bootstrap for each respective analysis indicated at nodes; (E) TC ML phylogeny and ASTRAL‐III phylogeny of the min04 dataset, with bootstrap support and local posterior probabilities indicated at nodes; (F) PL ML phylogenies of the unpartitioned and best partition‐scheme of the 125 taxa dataset, with bootstrap for each respective analysis indicated at nodes.

While the BI and ML supermatrix phylogenies supported the monophyly of the previously recognised Clade I that includes most non‐spiny Solanum clades (Figure 1; Appendices S11 and S12), the PL and TC phylogenetic trees resolved clades associated with Clade I as a grade relative to Clade II (Figure 2A, C; Appendices S13–S15). This was due in large part to the unstable position of the Regmandra clade, which was subtended by a particularly short branch and resolved in different positions along the backbone in all three datasets (Figure 3). For example, the ML supermatrix analysis recovered the Regmandra clade as sister to the Potato clade with strong to moderate branch support (Figure 3B), although the BI supermatrix analysis could not resolve whether the Regmandra clade was sister to the DulMo + VANAns clade or to the Potato clade (Figure 3B, Appendix S12). In contrast, the PL analyses resolved Regmandra as sister to the M clade + Clade II, with either maximal or no branch support at all (Figure 3). The TC species trees resolved Regmandra as sister to the Potato clade, DulMo, and Clade II, with maximum support (Figure 3). While one of the TC ASTRAL‐III analyses also recovered this topology with moderate support (local posterior probability 0.82, Figure 3), the other TC ASTRAL‐III analysis resolved Regmandra as sister to the VANAns clade, but without any branch support (local PP 0.4, Figure 3).

The previously identified M Clade, composed of the VANAns and DulMo clades, was not supported by all analyses (Figure 3). While all PL ML analyses recovered the M clade with maximum BS values (Figure 3), none of the TC analyses recovered it. Instead, they resolved the DulMo clade as sister to the Potato clade, with maximal BS or local PP support values (Figure 3). Furthermore, the VANAns clade was recovered as sister to the rest of Solanum (excluding the Thelopodium clade) with moderate support in the TC ML analyses. Placement of the VANAns clade in the TC ASTRAL‐III analyses had low or no support, being resolved as either sister to DulMo, or sister to the rest of Solanum, excluding the Thelopodium clade (Figure 3).

In addition, the position of the Potato clade within Solanum was incongruent between datasets: whereas it was resolved as sister to Regmandra in the supermatrix analysis, it was resolved as sister to the remaining Solanum in the PL dataset, and sister to the DulMo clade in all TC analyses (Figure 3), all with strong branch support. The phylogenomic datasets also showed incongruent positions for the Etuberosum clade within the larger Potato clade, where TC analyses resolved it as sister to the Petota clade with maximum local PP support in the ASTRAL‐III analyses (Appendix S15); in the ML analyses, this position either had moderate BS values (76%) or Etuberosum was found to be nested within the Petota clade with no branch support (Appendix S14). In contrast, PL analyses placed the Etuberosum clade as sister to the Tomato clade with maximum branch support (Appendix S13).

Finally, the BI and ML supermatrix phylogenies resolved the morphologically unusual S. anomalostemon as sister to the rest of Clade II (BS 95%, PP 1.0; Figure 3, Appendices S11 and S12). This contrasts with results from previous analyses, which found it to be part of the Mapiriense clade (Särkinen et al., 2015). PL analyses supported S. anomalostemon + Brevantherum clade as sister to the rest of Clade II with high branch support (Appendix S13). Solanum anomalostemon was also found to be sister to Clade II in the TC analyses, although the Brevantherum clade was not included in these, preventing a strict comparison (Figure 3). Two other taxa were found to represent single‐species lineages: S. polygamum Vahl as sister to the Leptostemonum clade and S. euacanthum Phil. as sister to the EHS clade (Appendices S11 and S12). Within the Leptostemonum clade, the EHS clade was strongly supported in all analyses (Figures 1, 3). There were, however, some minor differences in species‐level relationships for closely related species of the Eggplant clade and Anguivi Grade (viz. S. campylacanthum Hochst. ex A.Rich., S. melongena L., S. linnaeanum Hepper & P.‐M.L.Jaeger, S. dasyphyllum Schum. & Thonn., and S. aethiopicum L.; Figures 1 and 2A, C; Appendices S11–S15).

Discordance analyses

Concordance factors

Phylogenomic discordance was generally high across the PL and TC topologies, with gCF values >50% in only three nodes in the PL phylogeny (Solanum as a whole, S. chilense (Dunal) Reiche + S. lycopersicum L. or the Tomato clade, and S. hieronymi Kuntze + S. aridum Morong in the Leptostemonum clade; Figure 4). Elsewhere, along the backbone of the PL phylogeny, gCF fell to 39% and below (8 nodes with gCF values 10% and below), with the lowest values found near branch nodes that varied the most amongst the different reconstructed species trees. This included the node subtending Regmandra (gCF 4%, sCF 38%; Figure 4), and that positioning Regmandra + DulMo + VANAns clade as sister to Clade II (gCF 2%, sCF 31%). Similarly, low gCF and uninformative sCF values around 33% were found across Clade II, including the node placing S. hieronymi + S. aridum as sister to the Elaeagnifolium + EHS minor clades (gCF 6%, sCF 36%; Figure 4), as well as the placement of the Erythrotrichum + Thomasiifolium clades within the large Leptostemonum clade (gCF 5%, sCF 23%; Figure 4).

Figure 4.

Discordance analyses within and between the plastome (PL) and target capture (TC) phylogenomic datasets across Solanum. Rooted TC ASTRAL‐III phylogeny (left) and PL IQ‐TREE2 phylogeny (right) with gene concordance factor (gCF) and site concordance factor (sCF) values shown as pie charts, above and below each node respectively; the PL topology is the unpartitioned ML analysis of 151 taxa, whereas the TC topology is based on the analysis of 40 taxa and 303 genes recovered from the A353 bait set. Both trees have been pruned to retain a single tip for each of the major and minor clades present within the PL and TC datasets. For gCF pie charts, blue represents proportion of gene trees concordant with that branch (gCF), green is proportion of gene trees concordant for 1st alternative quartet topology (gDF1), yellow support for 2nd alternative quartet topology (gDF2), and red is the gene discordance support due to polyphyly (gDFP). For the sCF pie charts: blue represents proportion of concordance across sites (sCF), green support for 1st alternative topology (quartet 1), and yellow support for 2nd alternative topology (quartet 2) as averaged over 100 sites. Percentages of gCF and sCF are given above branches, in bold. Branch support (local posterior probability) values ≥0.95 are not shown, and 0.94 and below are shown in italic grey, on the right; double‐dash (‐‐) indicates that the branch support was unavailable due to rooting of the phylogenetic tree.

Across the TC phylogeny, gCF and sCF values were slightly higher on average, with 3 nodes presenting values >50% for both metrics, i.e., one within the Petota clade (gCF 67%, sCF 69%; Figure 4), one at the base of the Leptostemonum clade (gCF 64%, sCF 72%; Figure 4), and another at the base of the EHS clade within Leptostemonum (gCF 58%, sCF 75%; Figure 4). Three nodes had low gCF values of 10% or less, with again some of the lowest values located near the base of the tree, including the relationship of Regmandra as sister to the VANAns clade (gCF 3%, sCF 39%; Figure 4), or placement of Potato as sister to the DulMo clade (gCF 10%, sCF 41%; Figure 4), and the relationship of the Potato + DulMo clades as sister to Clade II (gCF 4%, sCF 41%; Figure 4).

Network analyses and polytomy tests

A high amount of reticulation/gene tree conflict was recovered between major clades of Solanum previously assigned to Clade I (e.g., Thelopodium, Regmandra, Potato, DulMo, VANAns), as well as with some lineages belonging to Clade II, in the filtered supertree network using the TC data with 303 genes (min20; Figure 2B). The network clearly supported the monophyly of the Leptostemonum and the EHS clade (Figure 2B), corresponding to the nodes with high gCF and sCF values in the TC ASTRAL‐III phylogeny (N1 and N2, Figure 4).

The polytomy tests carried out for the two TC ASTRAL‐III datasets resulted in 10 nodes each for which the null hypothesis of branch lengths equal to zero could not be rejected, suggesting they should be collapsed into polytomies (Appendix S16); these nodes corresponded to the ones subtending the Regmandra, Leptostemonum and EHS clades, but were also located within the VANAns clade as well as within Clade II. Polytomies were also detected within the Petota clade, including at the base of the Tomato clade (min04 dataset, Appendix S16), and at the base of the Etuberosum + Petota + Tomato clade (min20 dataset, Appendix S16). Repeating the analysis by collapsing nodes with <75% local PP support led to the collapse of 12 to 13 nodes across the analyses, most of them affecting the same clades as in the previous runs, but also leading to the collapse of the crown node of Solanum. When nodes with <75% local PP support were collapsed, the effective number of gene trees was too low to carry out the test for two nodes subtending S. betaceum and S. anomalostemon, most likely because of the low number of genes recovered for S. betaceum (Appendix S3).

DISCUSSION

The results of the ten phylogenetic analyses conducted here provide an updated evolutionary framework for the large and economically important genus Solanum, demonstrating that the major and minor clades within the group are stable (with a few noteworthy exceptions, see below). However, the strong levels of nuclear and nuclear‐plastome discordance uncovered in the PL and TC analyses, in combination with the network analysis and polytomy tests, suggest that there are polytomies present along the backbone of the phylogeny. We first discuss the stability of the clades within Solanum, and the discovery of a few novel minor clades. We then examine the nuclear‐plastome discordance and polytomies recovered and explore the possible causes underlying these, and their implications for the study of biogeography and trait evolution.

Updated evolutionary framework for Solanum

The supermatrix phylogeny, despite being based on only nine loci, nearly doubles the species sampling, confirming the monophyly of most major and minor clades established in previous analyses (Särkinen et al., 2013) and the polyphyly of three minor clades (Pachyphylla, Cyphomandropsis, and Allophyllum, the latter including species of Mapiriense‐Clandestinum clade). It also reveals three new minor clades in Solanum comprising a single species each and confirms the placement of 129 previously unsampled species (e.g., S. alphonsei in the Valdiviense clade and S. graveolens Bunbury in the Cyphomandra clade; Appendices S11 and S12). Meanwhile, the phylogenomic analyses with increased gene sampling reveal a previously undetected major clade referred to as VANAns comprising four minor clades (Valdiviense, Archaesolanum, Normania, and African non‐spiny clades). Finally, our results did not support two previously resolved major clades due to nuclear‐plastome discordance (Clade I and the M clade; Figure 2). Detailed molecular systematic studies with increased taxon and genetic sampling will be required to fully resolve the circumscription of all the major and minor clades recovered with diagnostic features, including the new ones identified here (Hilgenhof et al., unpublished manuscript).

Overall, our results establish that the taxonomic framework used in Solanum dividing the large genus into major and minor clades is robust, based on both phylogenomic datasets recovering the same major clades independent of methodological choices compared to the Sanger sequence supermatrix (e.g., Thelopodium, Regmandra, Potato, DulMo, VANAns, Clade II, Leptostemonum, and EHS clade). The major and minor clades currently used as informal infrageneric groups in Solanum were first established by Bohs (2005) based on a single locus of c. 2000 bp in length (ndhF). Our results demonstrate that larger species and gene sampling support the clades established earlier (e.g., Weese and Bohs, 2007; Särkinen et al., 2013). However, increased gene sampling provided by the two phylogenomic datasets does not help to resolve any of the polytomies along the backbone of Solanum close to the crown node and along the backbone of Clade II (Särkinen et al., 2013).

Nuclear and nuclear‐plastome discordance

Our results reveal three regions of the Solanum phylogeny with gene discordance with low gCF and sCF values in the PL and TC dataset (Figure 4). These regions with nuclear discordance include: (1) the backbone of Solanum near the crown node of the genus where major clades previously identified as Clade I diverge (from here on referred to as Grade I); (2) the backbone of the large Leptostemonum clade; and (3) the backbone of the EHS clade within the Leptostemonum (Figures 2B and 3). Many of the branches within these regions are extremely short in both PL and TC phylogenomic datasets (Figures 1 and 2; Appendices S11–S15), and network analyses of the nuclear dataset reveal reticulation in one of them (Grade I, Figure 2B). Polytomy tests confirm that multiple nodes within all three regions should be collapsed in the TC dataset (Appendix S16) and support the recognition of these regions as polytomies. Hence, we refer to these three regions of the phylogeny as polytomies from here on.

Further exploration of the polytomies reveals nuclear‐plastome discordance within Grade I, relating to the position and relationship between Regmandra, Potato, DulMo and VANAns clades (Figures 3 and 4). No signal of nuclear‐plastome discordance was detected in the other polytomies based on the species sampling presented here (Figures 3 and 4), but increased species sampling will be needed to confirm these results.

Altogether, our results indicate the presence of three polytomies which differ somewhat in nature. The deepest of these polytomies along the backbone of Solanum near the crown node shows high nuclear and nuclear‐plastid discordance with reticulation evident even within the nuclear phylogenomic dataset (Figure 2B). This polytomy could be referred to as a hard polytomy because it will probably be difficult to resolve even with more genomic data, due to its deeper position in the phylogeny in terms of evolutionary depth and time, the presence of clear nuclear‐plastome discordance, short branch lengths and evidence for reticulation within the nuclear phylogenomic dataset. In contrast, the other two polytomies along the backbone of Leptostemonum and the EHS clades are at shallower evolutionary depth and show nuclear discordance only without clear/widespread reticulation in the nuclear dataset (Figure 2B). These polytomies represent simpler cases and may turn out to be possible to resolve with more genomic data. In either case, to confirm whether the polytomies recovered here are truly “hard” or “soft”, denser taxon sampling and more genomic data will be required to carry out more rigorous tests concerning the cause of the gene discordance observed here.

What is causing genomic discordance in our dataset?

Finding genomic discordance in our phylogenomic datasets is unsurprising, given that it has also been found in many other phylogenomic studies in the Solanaceae, including Nicotiana (Dodsworth et al., 2020), the Capsiceae (Capsicum and relatives; Spalink et al., 2018), subtribe Iochrominae (Gates et al., 2018), Jaltomata (Wu et al., 2019), and two studies of Solanum involving the Tomato (Strickler et al., 2015; Pease et al., 2016) and Petota clades (Huang et al., 2019). ILS was shown to be responsible for the widespread discordance found in phylogenomic data in the diploid Tomato clade (Strickler et al., 2015; Pease et al., 2016), while hybridization and introgression have been argued to be behind genomic discordance in the Petota clade, which includes many polyploids (Huang et al., 2019).

Potential processes responsible for nuclear or nuclear‐plastome discordance involve gene introgression, ILS, hybridization, and polyploidization; distinguishing between these remains difficult even with increased genomic sampling involving custom bait sets (Larridon et al., 2020; Koenen et al., 2021) or whole genome‐sequences (Suh, 2016; Malinsky et al., 2018; Williams et al., 2021). Comparison of the nuclear and plastome topologies in our study does not indicate any obvious chloroplast capture events that could explain the observed nuclear‐plastome discordance along the backbone of Solanum near the crown node. Furthermore, cytogenetic and chromosome studies show no evidence for genome duplication or polyploidy along the three polytomies discovered here, despite the three‐fold increase in genome size between the distantly related potato (S. tuberosum L., Potato clade) and eggplant (S. melongena, Leptostemonum clade; Barchi et al., 2019). Chromosome counts indicate that the ancestor of Solanum was diploid, i.e., a large majority of Solanum species are reported to be diploid (>97% of the 506 species for which chromosome counts are available), and mapping of ploidy level across the phylogeny indicates that most of the lineages involved in the three polytomy regions identified here are diploid (Chiarini et al., 2018). Polyploidy has arisen independently within the Archaesolanum, Petota, Morelloid, Caroliniense, Elaeagnifolium, and EHS minor clades within the larger Leptostemonum clade (Chiarini et al., 2018), and hybridization/introgression has been argued to be the cause behind phylogenomic discordance found in the Petota clade (Huang et al., 2019). Gene duplication could explain the signal recovered here for the EHS clade but is unlikely to explain the discordance observed here. Save for one locus, our analyses did not detect the presence of paralogs in our nuclear dataset.

Currently, the most likely explanation for the discordance along the backbone of Solanum is ILS caused by rapid speciation. Two of the polytomies include the most species‐rich (Table 1) and rapidly diversifying lineages of Solanum, the Leptostemonum and the EHS clades (Echeverría‐Londoño et al., 2020), whose crown ages have been estimated at 8 to 11 and 4 to 6 million years (Myr), respectively (Särkinen et al., 2013). The backbone of Solanum near the crown node has been estimated to be almost twice as old as the Leptostemonum clade (13 to 17 Myr; Särkinen et al., 2013) yet shows a strong signal of nuclear‐plastome discordance. While past studies have not detected any increased rates of diversification near the crown node of Solanum, detecting diversification rate shifts remains a challenge (Louca and Pennell, 2020), especially in older nodes. Hence, we cannot fully exclude the option that ILS and rapid speciation have taken place close to the crown node of the genus.

Presence of short internal branches is typical of ILS in lineages with large population sizes and high mutation rates (Schrempf and Szöllősi, 2020). This fits with the biology of Solanum in general, which is typically known to contain “weedy”, disturbance‐loving pioneer species resilient to change. Many species are known to have large geographical ranges and ecological amplitude, including globally distributed weeds from the Leptostemonum, Brevantherum and Morelloid clades, such as S. elaeagnifolium Cav., S. caroliniense L., S. torvum Sw., S. erianthum D.Don, S. mauritianum Scop., S. americanum, and S. nigrum L. (Knapp et al., 2017, 2019; Cowie et al., 2018; Särkinen et al., 2018). Some of the weedy characteristics found in these species include the ability to improve fitness and defense traits in response to disturbance (Chavana et al., 2021), as well as having allelopathic properties which allow them to establish themselves to the detriment of native vegetation (Cowie et al., 2018). If such characteristics were present in ancestral Solanum, they could have promoted rapid speciation across the globe, followed by rapid morphological evolution and speciation within areas. The patterns observed here could possibly be the result of three major rapid speciation “pulses” across the evolutionary history of Solanum, involving lineages close to the crown node of Solanum, Leptostemonum, and the EHS clade. The idea of an ecologically opportunistic ancestor is supported by the tendency of many of the major clades near the crown node of Solanum to occupy periodically highly stressed and disturbed habitats, including flooded varzea forests occupied by Thelopodium clade, hyper‐arid deserts occupied by Regmandra clade, and highly disturbed and dynamic open mid‐elevation Andean montane habitats occupied by DulMo clade, where landslides are among the most common areas where many of the species are found (Knapp, 2013; Särkinen et al., 2018; Knapp et al., 2019).

Future studies with larger datasets will be able to carry out additional tests, such as the impact of using phylogenetic models that take into consideration the heterogeneity of molecular sequence evolution (Williams et al., 2021), as well as different data types (Romiguier et al., 2013; Reddy et al., 2017). Future studies will need to untangle how introgression and ILS are potentially affecting the patterns of genomic discordance observed here at different phylogenetic depths (Meleshko et al., 2021). Additional information about recombination, chromosome structure, and genomic size and evolution of Solanum will also be useful to clearly define coalescence genes in phylogenomic datasets, fundamental units in coalescent analyses which are rarely examined (Springer and Gatesy, 2018). Currently, information about genome evolution in Solanum is lacking, as only 62 species (5% of Solanum) are recorded in the plant DNA C‐value database (Pellicer and Leitch, 2020), and 86 species (7% of Solanum) have been studied with chromosome banding and/or FISH techniques (Chiarini et al., 2018). Information about genome size is missing for lineages such as the Thelopodium and Regmandra clades and for the majority of species not directly related to major commercial crops.

Implications for biogeographical and morphological studies in Solanum

The idea that well‐supported and fully bifurcating phylogenies are a requisite for evolutionary studies is built on the premise that such trees are the accurate way of representing evolution. The shift in systematics from “tree”‐ to “bush”‐like thinking, where polytomies and reticulate patterns of evolution are considered as acceptable or real (Poczai, 2013; Mallet et al., 2016; Edelman et al., 2019), comes from the accumulation of studies finding similar unresolvable phylogenetic nodes, despite using different large‐scale genomic sampling strategies and various analytical methods (Suh, 2016). Given the difficulty of resolving short internal branches in phylogenies and the rapid evolution of major clades in Solanum, it will be important to adopt methods that incorporate polytomies and networks to conduct biogeographical and morphological studies (Than et al., 2008; Solís‐Lemus et al., 2017; Wen et al., 2018; Olave and Meyer, 2020; Lutteropp et al., 2021 [Preprint]).

In terms of biogeography, our inability to resolve relationships amongst the major lineages in Solanum, especially along the backbone of Solanum near the crown node, has implications for understanding the ancestral environment of Solanum and its major lineages. Uncertainty amongst the relationships of major clades does not change the hypothesis that the genus probably originated from South America and spread multiple times to Africa, Asia, Australia, North America, and Europe (Olmstead and Palmer, 1997; Echeverría‐Londoño et al., 2020). The polytomy near the crown node of Solanum does, however, cast uncertainty on the specific region and habitat/biome that the major clades originated within the South American continent. For example, the sister relationship of Regmandra and the Potato clade inferred by the Sanger supermatrix analysis suggests that the wild ancestors of both potato and tomato evolved from an ancestor adapted to survive in lomas deserts from coastal South America (Bennett, 2008; Figure 1). Yet, both nuclear and plastome phylogenomic datasets suggest that the Potato clade is more closely related to the DulMo clade found to occur in tropical montane and subtropical biomes (Figure 3).

The hard polytomy along the backbone of Solanum also has important implications for evolutionary biologists interested in trait evolution. Standard methods of trait evolution relying on bifurcating trees may incorrectly infer how traits evolve (Hahn and Nakhleh, 2016). The discordance between traits, gene trees, and species trees has been defined as hemiplasy (Avise and Robinson, 2008), and studies have shown that depending on the level of ILS present in the data, hemiplasy can lead to different interpretations of convergent evolution of traits across phylogenetic trees (Mendes et al., 2016). While broad mapping of morphological traits on a species‐level phylogeny can help gain a rough understanding of phenotypic variation across clades, careful study of gene tree topologies in relation to a trait of interest is essential to gain an exact understanding of its evolutionary origin.

Our findings reflect results from recently published studies showing rapid morphological innovation coinciding with areas of strong phylogenomic discordance in different plant and animal groups (Parins‐Fukuchi et al., 2021), where the signal of nuclear‐plastome discordance corresponds to strong ecological diversification and morphological innovation across major clades in Solanum previously assigned to Clade I. The major clades involved in the nuclear‐plastome discordance along Grade I show large differences in their ecology as well as morphology. Members of the Thelopodium, Regmandra, VANAns, Potato, and DulMo clades occupy a wide range of tropical, montane, and temperate habitats across South America, Africa, and Australia (Symon, 1994; Knapp, 2000; Bohs and Olmstead, 2001; Spooner et al., 2004, 2016, 2019; Bohs, 2005; Peralta et al., 2007; Bennett, 2008; Knapp, 2013; Knapp and Vorontsova, 2016; Tepe et al., 2016; Särkinen et al., 2018; Knapp et al., 2019). Morphology shows equally high polymorphism between these major clades across many traits, such as growth form, which varies from single‐stemmed wand‐like shrubs (Thelopodium clade), annual herbs (Regmandra, Potato, and Morelloid clade), woody climbers and shrubs (VANAns clade), and herbaceous vines rooting along nodes (Potato clade). Similar patterns are observed in inflorescence position and branching, corolla shape, stamen dimorphism, and anther shape showing the presence of high polymorphism in these clades of which only some was retained in Clade II (Hilgenhof et al., unpublished manuscript). Testing the idea that this phenotypic diversity is linked to ecological diversification will require the construction of detailed morphological and ecological datasets to test if this pattern holds up in more formal and rigorous analyses.

CONCLUSIONS

We demonstrate the stability of the majority of the clades defined within Solanum and uncover significant nuclear and nuclear‐plastome discordance amongst relationships of major clades in Solanum based on the first phylogenomic study of the genus with wide species sampling. Three major polytomies are identified in Solanum based on the short branch lengths, gene concordance factor results, and polytomy tests. Two of these polytomies correspond to the biggest and most quickly diversifying lineages within Solanum (Leptostemonum and EHS clades). The third polytomy along the backbone of Solanum near the crown node involves reticulation and strong nuclear‐plastome discordance and highlights great uncertainty in the relationships between the Potato, DulMo, Regmandra, and VANAns clades. This region of nuclear‐plastome discordance corresponds with high ecological and morphological innovation, and we argue that it is most likely due to ILS and rapid speciation based on current knowledge of genome evolution in Solanum. Future studies, even with full genome sequences and increased taxon sampling, might not be able to resolve the polytomy near the crown node of Solanum because of its pattern of high reticulation combined with short internodal branches and its older age. Data on genome size and chromosome structure of the earliest branching lineages in Solanum will be required to further explore the nature and causes of this hard polytomy. We argue that acknowledging and embracing polytomies and reticulation is crucial if we are to design research programs aimed at understanding the biology of large and rapidly radiating lineages, such as the large and economically important Solanum.
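The logic behind a quartet‐based polytomy test can be sketched briefly. Under a true polytomy (an internal branch of zero length), the three possible quartet topologies around that branch are expected at equal frequencies, so the null hypothesis can be examined with a chi‐square goodness‐of‐fit test on the three counts. The sketch below is a simplified illustration under that assumption; the function name and the example counts are hypothetical, and it is not the ASTRAL‐III implementation, which derives the quartet frequencies from the gene trees and handles further details.

```python
import math


def polytomy_p_value(n1, n2, n3):
    """Chi-square test of equal frequencies for three quartet topologies.

    n1, n2, n3 are counts of gene trees supporting each of the three
    resolutions around one branch (hypothetical inputs). Returns the
    p-value for the null hypothesis that the branch is a polytomy.
    """
    counts = (n1, n2, n3)
    if any(n < 0 for n in counts) or sum(counts) == 0:
        raise ValueError("counts must be non-negative and not all zero")
    expected = sum(counts) / 3.0
    chi2 = sum((n - expected) ** 2 / expected for n in counts)
    # For 2 degrees of freedom the chi-square survival function is
    # exactly exp(-x / 2), so no statistics library is needed.
    return math.exp(-chi2 / 2.0)


if __name__ == "__main__":
    # Illustrative counts only (assumed, not taken from Appendix S16).
    print(polytomy_p_value(100, 100, 100))  # even split: polytomy not rejected
    print(polytomy_p_value(200, 50, 50))    # one dominant topology: rejected
```

A large p‐value means the null hypothesis of a polytomy cannot be rejected for that branch; with few informative gene trees this may reflect low power rather than a genuine hard polytomy.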

FUNDING INFORMATION

This work was supported by the Fonds de recherche du Québec en Nature et Technologies postdoctoral fellowship and a grant from the Department of Biological Sciences of the University of Moncton to E.G., the Sibbald Trust fellowship to R.H., the Ceiba Foundation to A.O., CNPq Conselho Nacional de Desenvolvimento Científico e Tecnológico awards 479921/2010‐54 and 427198/2016‐0 and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior CAPES/FAPESPA award 88881.159124/2017‐01 to L.L.G., NSF through grant DEB‐0316614 “PBI Solanum: a worldwide treatment” to S.K. and L.B., the Calleva Foundation & Sackler Trust (Plant and Fungal Trees of Life Project at Kew), the LUOMUS Trigger and Systematics Research Fund to P.P., the OECD CRP and Eötvös Research Grant (MAEÖ−00074‐002/2021). Field sampling was supported by the Northern Territory Herbarium (Palmerston, Northern Territory, Australia), and the David Burpee Endowment at Bucknell University (Lewisburg, Pennsylvania, USA) and National Geographic Society Northern Europe Award GEFNE49‐12 (Peru, TS). Peruvian specimens were collected and sequenced under the permission of Ministerio de Agricultura, Dirección General Forestal y de Fauna Silvestre (collection permits 084‐2012‐AG‐DGFFSDGEFFS and 096‐2017‐SERFOR/DGGSPFFS, and genetic resource permit 008‐2014‐MINAGRI‐DGFFS/DGEFFS).

AUTHOR CONTRIBUTIONS

E.G. designed and performed the analyses for the paper, with guidance from P.P., A.O., S.D., and T.S.; E.G. produced all figures, and wrote the manuscript, with major contributions from T.S., as well as P.P., S.D., S.K., and X.A. R.H. and T.S. helped in data gathering and analyses. All other authors contributed data to the main analyses. All authors read and contributed to the final version of the manuscript.

Supporting information

Additional supporting information can be found online in the Supporting Information section at the end of this article.

Appendix S1. Supermatrix sample information, including voucher details and GenBank numbers for sequences used.

Appendix S2. Plastome (PL) sample information, including voucher details and plastome assembly results. Total length, as well as length for the long‐single copy region (LSC), the short‐single copy region (SSC), and the two inverted repeat regions (IR1 and IR2) are shown; statistics of mean coverage per base pair and standard deviation are also provided.

Appendix S3. Target‐capture (TC) sample information, including voucher details and sequence recovery statistics. The number of reads (NumReads), the number of reads mapped to the targets (ReadsMapped), the percentage of reads on target (PctOnTarget), the number of genes with reads (GenesMapped), the number of genes with contigs (GenesWithContigs), the gene recovery statistics (GenesWithSeqs, GenesAt25pct, GenesAt50pct, GenesAt75pct, GenesAt150pct), and the number of genes with paralog warnings (ParalogWarnings) are shown.

Appendix S4. Supermatrix alignment details, with details about the nine regions selected for this study. Number of species sampled per region, accumulative percentage of species sampled per region, aligned length, proportions of parsimony informative characters (PI), and variable sites (VS) per region in the dataset are indicated. Values are calculated with outgroups, and with ambiguous regions and repeats excluded. Bp = base pairs.

Appendix S5. List of polyploid taxa in Solanum.

Appendix S6. ML results for each of the nine individual loci and combined plastid loci. (A) ITS; (B) matK; (C) ndhF; (D) ndhF‐rpL32; (E) psbA‐trnH; (F) rpL32‐trnL; (G) trnL‐trnT; (H) trnS‐trnG; (I) waxy; and (J) seven plastid loci. Nodes with bootstrap support equal and above 95% are in cyan, and with branch support between 75% and 94% in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1.

Appendix S7. BI results for each of the nine individual loci and combined plastid loci. (A) ITS; (B) matK; (C) ndhF; (D) ndhF‐rpL32; (E) psbA‐trnH; (F) rpL32‐trnL; (G) trnL‐trnT; (H) trnS‐trnG; (I) waxy; and (J) seven plastid loci. Nodes with posterior probability equal and above 0.95 are in cyan, and nodes with posterior probabilities between 0.75 and 0.95 are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1.

Appendix S8. Plastome (PL) alignment statistics. Data show the number of sequences, trimming mode, the number of loci retained for coalescent analysis after checking for excessive gene tree branch lengths, alignment length, number of informative and constant sites, pairwise identity, average GC content, percentage of gaps, and average locus length for the exon, intron, and intergenic regions.

Appendix S9. Optimal substitution model used in ML analyses for the PL and TC datasets, determined using ModelFinder in IQ‐TREE2. For each locus, the number of taxa, sites, informative sites, and invariable sites are indicated, as well as the model selected and the AICc score. Worksheet titles correspond to the following: PLUnpartitioned = Models selected for PL unpartitioned datasets, for 151 taxa and 125 taxa; PLBestPartScheme = Models selected for PL datasets analysed according to the best‐partition scheme; TCPartitioned_Min4 = Models selected for loci of the TC dataset, with minimum 4 taxa per loci; TCPartitioned_Min20 = Models selected for loci of the TC dataset, with minimum 20 taxa per loci.

Appendix S10. Target‐capture (TC) alignment statistics. Loci excluded refer to the number of excluded loci based on excessively long branch lengths, and loci retained is the final number of loci retained for both ML and coalescent analyses. Empty sequences inserted refers to amount of missing data. Min = minimum; Bp = base pairs.

Appendix S11. Detailed ML (RAxML) supermatrix phylogenetic tree with 746 taxa. Nodes with bootstrap support equal and above 95% are in cyan, and with bootstrap support between 75% and 94% in red. Bootstrap support values for each node are indicated in italic. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1.

Appendix S12. Detailed Bayesian inference (BEAST) supermatrix phylogenetic tree with 746 taxa. Nodes with posterior probability equal and above 0.95 are in cyan, and nodes with posterior probabilities between 0.75 and 0.95 are in red. Posterior probability values for each node are indicated in italic. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1.

Appendix S13. ML phylogenetic trees of plastome datasets. Nodes with bootstrap support equal and above 95% are in cyan, and with bootstrap support between 75% and 94% in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. (A) 151 taxa, all data, unpartitioned; (B) 125 taxa, all data, unpartitioned; (C) 151 taxa, all data, best partition scheme; (D) 125 taxa, all data, best partition scheme.

Appendix S14. ML phylogenetic trees of A353 target capture datasets (IQ‐TREE2). Nodes with bootstrap support equal and above 95% are in cyan, and with bootstrap support between 75% and 94% in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. (A) Filtering threshold of minimum 4 taxa per loci; (B) filtering threshold of minimum 20 taxa per loci.

Appendix S15. Coalescent phylogenetic trees of A353 target capture datasets (ASTRAL‐III). Nodes with multi‐locus local posterior probability support equal and above 0.95 are in cyan, and with support between 0.75 and 0.94 in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. (A) Filtering threshold of minimum of 4 taxa per loci; (B) filtering threshold of minimum 20 taxa per loci.

Appendix S16. Polytomy test results with ASTRAL‐III. (A) Target Capture A353 species tree ASTRAL‐III, filtering threshold of minimum 4 taxa per loci, branches in gene trees with 10% or less branch support collapsed; (B) Target Capture A353, ASTRAL‐III, filtering threshold of minimum 4 taxa per loci, branches in gene trees with 75% or less branch support collapsed; (C) Target Capture A353, ASTRAL‐III, filtering threshold of minimum 20 taxa per loci, branches in gene trees with 10% or less branch support collapsed; (D) Target Capture A353, ASTRAL‐III, filtering threshold of minimum 20 taxa per loci, branches in gene trees with 75% or less branch support collapsed.

ACKNOWLEDGMENTS

We thank Elliot Gardner for sharing scripts and advice on phylogenomic analyses with HybPiper, Royce Steeves for providing advice on DNA extraction for genome skimming, Felix Forest and Olivier Maurin for providing technical support and providing feedback on the manuscript, and João R. Stehmann, Thais Almeida, Paul Gonzáles, and Maria Baden who greatly contributed to fieldwork and sample acquisition. Finally, we would also like to thank the three reviewers, including Stacey Smith and William J. Baker, who provided constructive reviews and feedback that greatly improved the final version of this manuscript.

Gagnon, E., Hilgenhof R., Orejuela A., McDonnell A., Sablok G., Aubriot X., Giacomin L., Gouvêa Y., Bragionis T., Stehmann J. R., Bohs L., Dodsworth S., Martine C., Poczai P., Knapp S., and Särkinen T. 2022. Phylogenomic discordance suggests polytomies along the backbone of the large genus Solanum. American Journal of Botany 109(4): 580–601. https://doi.org/10.1002/ajb2.1827

DATA AVAILABILITY STATEMENT

Raw sequence data generated in this study are deposited in various archives, including GenBank (website: https://www.ncbi.nlm.nih.gov/genbank/) and the European Nucleotide Archive (website: https://www.ebi.ac.uk/ena/browser/home); full accession numbers are provided in Appendices S1, S2, and S3. In addition, the 10 species trees generated for this study, as well as the alignments used for the different phylogenetic analyses, including the concatenated Sanger supermatrix, the plastome dataset, and the target capture datasets (min04 and min20) are available via Data Dryad, at the following link: https://datadryad.org/stash/dataset/doi:10.5061/dryad.2v6wwpzpt.

REFERENCES

  1. Aberer, A. J., Krompass D., and Stamatakis A. 2013. Pruning rogue taxa improves phylogenetic accuracy: An efficient algorithm and webservice. Systematic Biology 62: 162–166.
  2. Amiryousefi, A., Hyvönen J., and Poczai P. 2018. The chloroplast genome sequence of bittersweet (Solanum dulcamara): Plastid genome structure evolution in Solanaceae. PLoS One 13: e0196069.
  3. Andrews, S. 2010. FastQC: A quality control tool for high throughput sequence data. Website: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  4. APG [Angiosperm Phylogeny Group]. 1998. An ordinal classification for the families of flowering plants. Annals of the Missouri Botanical Garden 85: 531–553.
  5. Arseneau, J.‐R., Steeves R., and Laflamme M. 2017. Modified low‐salt CTAB extraction of high‐quality DNA from contaminant‐rich tissues. Molecular Ecology Resources 17: 686–693.
  6. Auguie, B. 2017. gridExtra: Miscellaneous functions for “grid” graphics. R package version 2.3. Website: https://CRAN.R-project.org/package=gridExtra
  7. Avise, J. C., and Robinson T. J. 2008. Hemiplasy: A new term in the lexicon of phylogenetics. Systematic Biology 57: 503–507.
  8. Baker, W. J., Bailey P., Barber V., Barker A., Bellot S., Bishop D., Botigué L. R., et al. 2021. A comprehensive phylogenomic platform for exploring the angiosperm tree of life. Systematic Biology 71: 301–319.
  9. Barchi, L., Pietrella M., Venturini L., Minio A., Toppino L., Acquadro A., Andolfo G., et al. 2019. A chromosome‐anchored eggplant genome sequence reveals key events in Solanaceae evolution. Scientific Reports 9: 11769.
  10. Bennett, J. R. 2008. Revision of Solanum section Regmandra (Solanaceae). Edinburgh Journal of Botany 65: 69–112.
  11. Bohs, L. 2004. A chloroplast DNA phylogeny of Solanum section Lasiocarpa. Systematic Botany 29: 177–187.
  12. Bohs, L. 2005. Major clades in Solanum based on ndhF sequence data. Monographs in Systematic Botany 104: 27–49.
  13. Bohs, L., and Olmstead R. G. 1997. Phylogenetic relationships in Solanum (Solanaceae) based on ndhF sequences. Systematic Botany 22: 5–17.
  14. Bohs, L., and Olmstead R. G. 2001. A reassessment of Normania and Triguera (Solanaceae). Plant Systematics and Evolution 228: 33–48.
  15. Bolger, A. M., Lohse M., and Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.
  16. Bouckaert, R., Vaughan T. G., Barido‐Sottani J., Duchêne S., Fourment M., Gavryushkina A., Heled J., et al. 2019. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Computational Biology 15: e1006650.
  17. Brown, J. W., Walker J. F., and Smith S. A. 2017. Phyx: Phylogenetic tools for UNIX. Bioinformatics 33: 1886–1888.
  18. Capella‐Gutiérrez, S., Silla‐Martínez J. M., and Gabaldón T. 2009. trimAl: A tool for automated alignment trimming in large‐scale phylogenetic analyses. Bioinformatics 25: 1972–1973.
  19. Chavana, J., Singh S., Vazquez A., Christoffersen B., Racelis A., and Kariyat R. R. 2021. Local adaptation to continuous mowing makes the noxious weed Solanum elaeagnifolium a superweed candidate by improving fitness and defense traits. Scientific Reports 11: 6634.
  20. Chiarini, F., Sazatornil F., and Bernardello G. 2018. Data reassessment in a phylogenetic context gives insight into chromosome evolution in the giant genus Solanum (Solanaceae). Systematics and Biodiversity 16: 397–416.
  21. Cowie, B. W., Venter N., Witkowski E. T. F., Byrne M. J., and Olckers T. 2018. A review of Solanum mauritianum biocontrol: Prospects, promise and problems: A way forward for South Africa and globally. Biocontrol 63: 475–491.
  22. Darriba, D., Posada D., Kozlov A. M., Stamatakis A., Morel B., and Flouri T. 2020. ModelTest‐NG: A new and scalable tool for the selection of DNA and protein evolutionary models. Molecular Biology and Evolution 37: 291–294.
  23. Degnan, J. H., and Rosenberg N. A. 2009. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology & Evolution 24: 332–340.
  24. Dodsworth, S., Christenhusz M. J. M., Conran J. G., Guignard M. S., Knapp S., Struebig M., Leitch A. R., and Chase M. W. 2020. Extensive plastid‐nuclear discordance in a recent radiation of Nicotiana section Suaveolentes (Solanaceae). Botanical Journal of the Linnean Society 193: 546–559.
  25. Duvall, M. R., Burke S. V., and Clark D. C. 2020. Plastome phylogenomics of Poaceae: Alternate topologies depend on alignment gaps. Botanical Journal of the Linnean Society 192: 9–20.
  26. Echeverría‐Londoño, S., Särkinen T., Fenton I. S., Purvis A., and Knapp S. 2020. Dynamism and context‐dependency in diversification of the megadiverse plant genus Solanum (Solanaceae). Journal of Systematics and Evolution 58: 767–782.
  27. Edelman, N. B., Frandsen P. B., Miyagi M., Clavijo B., Davey J., Dikow R. B., García‐Accinelli G., et al. 2019. Genomic architecture and introgression shape a butterfly radiation. Science 366: 594–599.
  28. Edgar, R. C. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460–2461.
  29. Ewels, P., Magnusson M., Lundin S., and Käller M. 2016. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32: 3047–3048.
  30. Gardner, E. M., Garner M., Cowan R., Dodsworth S., Epitawalage N., Arifiani D., Sahromi S., et al. 2021. Repeated parallel losses of inflexed stamens in Moraceae: Phylogenomics and generic revision of the tribe Moreae and the reinstatement of the tribe Olmedieae (Moraceae). Taxon 70: 946–988.
  31. Gates, D. J., Pilson D., and Smith S. D. 2018. Filtering of target sequence capture individuals facilitates species tree construction in the plant subtribe Iochrominae (Solanaceae). Molecular Phylogenetics and Evolution 123: 26–34.
  32. Hahn, M. W., and Nakhleh L. 2016. Irrational exuberance for resolved species trees. Evolution 70: 7–17.
  33. Hime, P. M., Lemmon A. R., Lemmon E. C. M., Prendini E., Brown J. M., Thomson R. C., Kratovil J. D., et al. 2021. Phylogenomics reveals ancient gene tree discordance in the amphibian tree of life. Systematic Biology 70: 49–66.
  34. Huang, B., Ruess H., Liang Q., Colleoni C., and Spooner D. M. 2019. Analyses of 202 plastid genomes elucidate the phylogeny of Solanum section Petota. Scientific Reports 9: 4454.
  35. Huson, D. H., and Bryant D. 2006. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23: 254–267.
  36. Jeffroy, O., Brinkmann H., Delsuc F., and Philippe H. 2006. Phylogenomics: The beginning of incongruence? Trends in Genetics 22: 225–231.
  37. Jin, J.‐J., Yu W.‐B., Yang J.‐B., Song Y., dePamphilis C. W., Yi T.‐S., and Li D.‐Z. 2020. GetOrganelle: A fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biology 21: 241.
  38. Johnson, M. G., Gardner E. M., Liu Y., Medina R., Goffinet B., Shaw A. J., Zerega N. J. C., and Wickett N. J. 2016. HybPiper: Extracting coding sequence and introns for phylogenetics from high‐throughput sequencing reads using target enrichment. Applications in Plant Sciences 4: 1600016.
  39. Johnson, M. G., Pokorny L., Dodsworth S., Botigué L. R., Cowan R. S., Devault A., Eiserhardt W. L., et al. 2019. A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k‐Medoids clustering. Systematic Biology 68: 594–606.
  40. Junier, T., and Zdobnov E. M. 2010. The Newick utilities: High‐throughput phylogenetic tree processing in the UNIX shell. Bioinformatics 26: 1669–1670.
  41. Katoh, K., Kuma K.‐I., Toh H., and Miyata T. 2005. MAFFT version 5: Improvement in accuracy of multiple sequence alignment. Nucleic Acids Research 33: 511–518.
  42. Knapp, S. 2013. A revision of the Dulcamaroid clade of Solanum L. (Solanaceae). PhytoKeys 22: 1–432.
  43. Knapp, S., Barboza G. E., Bohs L., and Särkinen T. 2019. A revision of the Morelloid clade of Solanum L. (Solanaceae) in North and Central America and the Caribbean. PhytoKeys 123: 1–144.
  44. Knapp, S. 2000. A revision of Solanum thelopodium species group (section Anthoresis sensu Seithe, pro parte): Solanaceae. Bulletin of the Natural History Museum, Botany Series 30: 13–30.
  45. Knapp, S., Sagona E., Carbonell A. K. Z., and Chiarini F. 2017. A revision of the Solanum elaeagnifolium clade (Elaeagnifolium clade; subgenus Leptostemonum, Solanaceae). PhytoKeys 67: 1–104.
  46. Knapp, S., and Vorontsova M. S. 2016. A revision of the “African Non‐Spiny” Clade of Solanum L. (Solanum sections Afrosolanum Bitter, Benderianum Bitter, Lemurisolanum Bitter, Lyciosolanum Bitter, Macronesiotes Bitter, and Quadrangulare Bitter: Solanaceae). PhytoKeys 66: 1–142.
  47. Koenen, E. J. M., Ojeda D. I., and Bakker F. T. 2021. The origin of the legumes is a complex paleopolyploid phylogenomic tangle closely associated with the Cretaceous–Paleogene (K–Pg) mass extinction event. Systematic Biology 70: 508–526.
  48. Kumar, S., Filipski A. J., Battistuzzi F. U., Kosakovsky Pond S. L., and Tamura K. 2012. Statistics and truth in phylogenomics. Molecular Biology and Evolution 29: 457–472.
  49. Kuramae, E. E., Robert V., Snel B., Weiss M., and Boekhout T. 2006. Phylogenomics reveal a robust fungal tree of life. FEMS Yeast Research 6: 1213–1220.
  50. Lanfear, R., Calcott B., Ho S. Y. W., and Guindon S. 2012. Partitionfinder: Combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution 29: 1695–1701.
  51. Larridon, I., Villaverde T., Zuntini A. R., Pokorny L., Brewer G. E., Epitawalage N., Fairlie I., et al. 2020. Tackling rapid radiations with targeted sequencing. Frontiers in Plant Science 10: 1655.
  52. Levin, R. A., Myers N. R., and Bohs L. 2006. Phylogenetic relationships among the “spiny solanums” (Solanum subgenus Leptostemonum, Solanaceae). American Journal of Botany 93: 157–169.
  53. Levin, R. A., Watson K., and Bohs L. 2005. A four‐gene study of evolutionary relationships in Solanum section Acanthophora. American Journal of Botany 92: 603–612.
  54. Li, H., and Durbin R. 2010. Fast and accurate long‐read alignment with Burrows–Wheeler transform. Bioinformatics 26: 589–595.
  55. Liu, L., Yu L., Kubatko L., Pearl D. K., and Edwards S. V. 2009. Coalescent methods for estimating phylogenetic trees. Molecular Phylogenetics and Evolution 53: 320–328.
  56. Louca, S., and Pennell M. W. 2020. Extant timetrees are consistent with a myriad of diversification histories. Nature 580: 502–505.
  57. Lutteropp, S., Scornavacca C., Kozlov A. M., and Morel B. 2021. NetRAX: Accurate and fast maximum likelihood phylogenetic network inference. bioRxiv, website: https://doi.org/10.1101/2021.08.30.458194 [Preprint].
  58. Malinsky, M., Svardal H., Tyers A. M., Miska E. A., Genner M. J., Turner G. F., and Durbin R. 2018. Whole‐genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nature Ecology & Evolution 2: 1940–1955.
  59. Mallet, J., Besansky N., and Hahn M. W. 2016. How reticulated are species? BioEssays 38: 140–149.
  60. Meleshko, O., Martin M. D., Korneliussen T. S., Schröck C., Lamkowski P., Schmutz J., Healey A., et al. 2021. Extensive genome‐wide phylogenetic discordance is due to incomplete lineage sorting and not ongoing introgression in a rapidly radiated bryophyte genus. Molecular Biology and Evolution 38: 2750–2766.
  61. Mendes, F. K., Hahn Y., and Hahn M. W. 2016. Gene tree discordance can generate patterns of diminishing convergence over time. Molecular Biology and Evolution 33: 3299–3307.
  62. Miller, J. S., Kamath A., and Levin R. A. 2009. Do multiple tortoises equal a hare? The utility of nine noncoding plastid regions for species‐level phylogenetics in tribe Lycieae (Solanaceae). Systematic Botany 34: 796–804.
  63. Miller, M. A., Pfeiffer W., and Schwartz T. 2010. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. 2010 Gateway Computing Environments Workshop (GCE), 1‐8. IEEE [Institute of Electrical and Electronics Engineers], New York, New York, USA.
  64. Minh, B. Q., Hahn M. W., and Lanfear R. 2020a. New methods to calculate concordance factors for phylogenomic datasets. Molecular Biology and Evolution 37: 2727–2733.
  65. Minh, B. Q., Schmidt H. A., Chernomor O., Schrempf D., Woodhams M. D., von Haeseler A., and Lanfear R. 2020b. IQ‐TREE2: New models and efficient methods for phylogenetic inference in the genomic era. Molecular Biology and Evolution 37: 1530–1534.
  66. Morgan, C. C., Foster P. G., Webb A. E., Pisani D., McInerney J. O., and O'Connell M. J. 2013. Heterogeneous models place the root of the placental mammal phylogeny. Molecular Biology and Evolution 30: 2145–2156.
  67. Olave, M., and Meyer A. 2020. Implementing large genomic single nucleotide polymorphism data sets in phylogenetic network reconstructions: A case study of particularly rapid radiations of cichlid fish. Systematic Biology 69: 848–862.
  68. Olmstead, R. G., and Palmer J. D. 1997. Implications for the phylogeny, classification, and biogeography of Solanum from cpDNA restriction site variation. Systematic Botany 22: 19–29.
  69. One Thousand Plant Transcriptomes Initiative. 2019. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 574: 679–685.
  70. Paradis, E., and Schliep K. 2019. ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35: 526–528.
  71. Parins‐Fukuchi, C., Stull G. W., and Smith S. A. 2021. Phylogenomic conflict coincides with rapid morphological innovation. Proceedings of the National Academy of Sciences, USA 118: e2023058118.
  72. Pease, J. B., Haak D. C., Hahn M. W., and Moyle L. C. 2016. Phylogenomics reveal three sources of adaptive variation during a rapid radiation. PLoS Biology 14: e1002379.
  73. Pellicer, J., and Leitch I. J. 2020. The Plant DNA C‐values database (release 7.1): An updated online repository of plant genome size data for comparative studies. New Phytologist 226: 301–305.
  74. Peralta, I. E., Knapp S., and Spooner D. M. 2007. The taxonomy of tomatoes: A revision of wild tomatoes (Solanum L. section Lycopersicon (Mill.) Wettst.) and their outgroup relatives (Solanum sections Juglandifolium (Rydb.) Child and Lycopersicoides (Child) Peralta). Systematic Botany Monographs 84: 1–186.
  75. Philippe, H., Brinkmann H., Lavrov D. V., Littlewood D. T. J., Manuel M., Wörheide G., and Baurain D. 2011. Resolving difficult phylogenetic questions: Why more sequences are not enough. PLoS Biology 9: e1000602.
  76. Philippe, H., de Vienne D. M., Ranwez V., Roure B., Baurain D., and Delsuc F. 2017. Pitfalls in supermatrix phylogenomics. European Journal of Taxonomy 283: 1–125.
  77. Poczai, P. 2013. To network or not to network, that is the question. Journal of Genetics 92: 703–705.
  78. Price, M. N., Dehal P. S., and Arkin A. P. 2010. FastTree 2–approximately maximum‐likelihood trees for large alignments. PLoS One 5: e9490.
  79. Rambaut, A., Drummond A. J., Xie D., Baele G., and Suchard M. A. 2018. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Systematic Biology 67: 901–904.
  80. Reddy, S., Kimball R. T., Pandey A., Hosner P. A., Braun M. J., Hackett S. J., Han K.‐L., et al. 2017. Why do phylogenomic data sets yield conflicting trees? Data type influences the avian tree of life more than taxon sampling. Systematic Biology 66: 857–879.
  81. Romiguier, J. , Ranwez V., Delsuc F., Galtier N., and Douzery E. J. P.. 2013. Less is more in mammalian phylogenomics: AT‐rich genes minimize tree conflicts and unravel the root of placental mammals. Molecular Biology and Evolution 30: 2134–2144. [DOI] [PubMed] [Google Scholar]
  82. Ronco, F. , Matschiner M., Böhne A., Boila A., Büscher H. H., El Taher A., Indermaur A., et al. 2021. Drivers and dynamics of a massive adaptive radiation in cichlid fishes. Nature 589: 76–81. [DOI] [PubMed] [Google Scholar]
  83. Rosario, L. H. , Rodríguez Padilla J. O., Martínez D. R., Grajales A. M., Mercado Reyes J. A., Veintidós Feliu G. J., Van Ee B., and Siritunga D.. 2019. DNA barcoding of the Solanaceae family in Puerto Rico including endangered and endemic species. Journal of the American Society for Horticultural Science 144: 363–374. [Google Scholar]
  84. Saarela, J. M. , Burke S. V., Wysocki W. P., Barrett M. D., Clark L. G., Craine J. M., Peterson P. M., et al. 2018. A 250 plastome phylogeny of the grass family (Poaceae): Topological support under different data partitions. PeerJ 6: e4299. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Sang, T. , Crawford D., and Stuessy T.. 1997. Chloroplast DNA phylogeny, reticulate evolution, and biogeography of Paeonia (Paeoniaceae). American Journal of Botany 84: 1120–1136. [PubMed] [Google Scholar]
  86. Särkinen, T. , Barboza G. E., and Knapp S.. 2015. True black nightshades: Phylogeny and delimitation of the Morelloid clade of Solanum . Taxon 64: 945–958. [Google Scholar]
  87. Särkinen, T. , Bohs L., Olmstead R. G., and Knapp S.. 2013. A phylogenetic framework for evolutionary study of the nightshades (Solanaceae): A dated 1000‐tip tree. BMC Evolutionary Biology 13: 214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Särkinen, T. , Poczai P., Barboza G. E., van der Weerden G. M., Baden M., and Knapp S.. 2018. A revision of the Old World black nightshades (Morelloid clade of Solanum L., Solanaceae). PhytoKeys 106: 1–223. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Sayyari, E. , and Mirarab S.. 2016. Fast coalescent‐based computation of local branch support from quartet frequencies. Molecular Biology and Evolution 33: 1654–1668. [DOI] [PMC free article] [PubMed] [Google Scholar]
  90. Sayyari, E. , and Mirarab S.. 2018. Testing for polytomies in phylogenetic species trees using quartet frequencies. Genes 9: 132. [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. Schrempf, D., and Szöllősi G. 2020. The sources of phylogenetic conflicts. In Scornavacca C., Delsuc F., and Galtier N. [eds.], Phylogenetics in the Genomic Era, 3.1:1–3.1:23. No commercial publisher.
  92. Simion, P., Philippe H., Baurain D., Jager M., Richter D. J., Di Franco A., Roure B., et al. 2017. A large and consistent phylogenomic dataset supports sponges as the sister group to all other animals. Current Biology 27: 958–967.
  93. Solís‐Lemus, C., Bastide P., and Ané C. 2017. PhyloNetworks: A package for phylogenetic networks. Molecular Biology and Evolution 34: 3292–3298.
  94. Spalink, D., Stoffel K., Walden G. K., Hulse‐Kemp A. M., Hill T. A., VanDeynze A., and Bohs L. 2018. Comparative transcriptomics and genomic patterns of discordance in Capsiceae (Solanaceae). Molecular Phylogenetics and Evolution 126: 293–302.
  95. Spooner, D. M., Alvarez N., Peralta I. E., and Clausen A. M. 2016. Taxonomy of wild potatoes and their relatives in Southern South America (Solanum sect. Petota and Etuberosum). Systematic Botany Monographs 100: 1–240.
  96. Spooner, D. M., van den Berg R. G., Rodrigues A., Bamberg J. B., Hijmans R. J., and Lara‐Cabrera S. 2004. Wild potatoes (Solanum section Petota; Solanaceae) of North and Central America. Systematic Botany Monographs 68: 1–209.
  97. Spooner, D. M., Jansky S., Rodríguez F., Simon R., Ames M., Fajardo D., and Castillo R. O. 2019. Taxonomy of wild potatoes in northern South America (Solanum section Petota). Systematic Botany Monographs 108: 1–305.
  98. Springer, M. S., and Gatesy J. 2018. Delimiting coalescence genes (C‐Genes) in phylogenomic data sets. Genes 9: 123.
  99. Stamatakis, A. 2006. RAxML‐VI‐HPC: Maximum likelihood‐based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
  100. Stamatakis, A. 2014. RAxML version 8: A tool for phylogenetic analysis and post‐analysis of large phylogenies. Bioinformatics 30: 1312–1313.
  101. Stern, S., de M. Agra F., and Bohs L. 2011. Molecular delimitation of clades within New World species of the “spiny solanums” (Solanum subg. Leptostemonum). Taxon 60: 1429–1441.
  102. Stern, S., and Bohs L. 2012. An explosive innovation: Phylogenetic relationships of Solanum section Gonatotrichum (Solanaceae). PhytoKeys 8: 89–98.
  103. Strickler, S. R., Bombarely A., Munkvold J. D., York T., Menda N., Martin G. B., and Mueller L. A. 2015. Comparative genomics and phylogenetic discordance of cultivated tomato and close wild relatives. PeerJ 3: e793.
  104. Suh, A. 2016. The phylogenomic forest of bird trees contains a hard polytomy at the root of Neoaves. Zoologica Scripta 45: 50–62.
  105. Suh, A., Smeds L., and Ellegren H. 2015. The dynamics of incomplete lineage sorting across the ancient adaptive radiation of neoavian birds. PLoS Biology 13: e1002224.
  106. Symon, D. E. 1994. Kangaroo apples: Solanum sect. Archaesolanum. Published by the author, Adelaide, Australia.
  107. Taberlet, P., Gielly L., Pautou G., and Bouvet J. 1991. Universal primers for amplification of three non‐coding regions of chloroplast DNA. Plant Molecular Biology 17: 1105–1109.
  108. Tavaré, S. 1986. Some probabilistic and statistical problems in the analysis of DNA sequences. Lectures on Mathematics in the Life Sciences 17: 57–86.
  109. Tepe, E. J., Anderson G. J., Spooner D. M., and Bohs L. 2016. Relationships among wild relatives of the tomato, potato, and pepino. Taxon 65: 262–276.
  110. Than, C., Ruths D., and Nakhleh L. 2008. PhyloNet: A software package for analyzing and reconstructing reticulate evolutionary relationships. BMC Bioinformatics 9: 322.
  111. Tillich, M., Lehwark P., Pellizzer T., Ulbricht‐Jones E. S., Fischer A., Bock R., and Greiner S. 2017. GeSeq‐versatile and accurate annotation of organelle genomes. Nucleic Acids Research 45: W6–W11.
  112. Villanueva, R. A. M., and Chen Z. J. 2019. ggplot2: Elegant graphics for data analysis (2nd ed.). Measurement: Interdisciplinary Research and Perspectives 17: 160–167.
  113. Weese, T. L., and Bohs L. 2007. A three‐gene phylogeny of the genus Solanum (Solanaceae). Systematic Botany 32: 445–463.
  114. Wen, D., Yu Y., Zhu J., and Nakhleh L. 2018. Inferring phylogenetic networks using PhyloNet. Systematic Biology 67: 735–740.
  115. Wendel, J. F., and Doyle J. J. 1998. Phylogenetic incongruence: Window into genome history and molecular evolution. In Soltis D. E., Soltis P. S., and Doyle J. J. [eds.], Molecular Systematics of plants II: DNA sequencing, 265–296. Springer, Boston, Massachusetts, USA.
  116. Wickett, N. J., Mirarab S., Nguyen N., Warnow T., Carpenter E., Matasci N., Ayyampalayam S., et al. 2014. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proceedings of the National Academy of Sciences, USA 111: e4859–e4868.
  117. Wickham, H., and Wickham M. H. 2019. Package ‘stringr.’ Website: http://stringr.tidyverse.org, https://github.com/tidyverse/stringr
  118. Williams, T. A., Schrempf D., Szöllősi G. J., Cox C. J., Foster P. G., and Embley T. M. 2021. Inferring the deep past from molecular data. Genome Biology and Evolution 13(5): evab067.
  119. Wu, M., Kostyun J. L., and Moyle L. C. 2019. Genome sequence of Jaltomata addresses rapid reproductive trait evolution and enhances comparative genomics in the hyper‐diverse Solanaceae. Genome Biology and Evolution 11: 335–349.
  120. Yu, G. 2020. Using ggtree to visualize data on tree‐like structures. Current Protocols in Bioinformatics 69: e96.
  121. Zhang, C., Rabiee M., Sayyari E., and Mirarab S. 2018. ASTRAL‐III: Polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics 19: 153.

Supplementary Materials

Additional supporting information can be found online in the Supporting Information section at the end of this article.

Appendix S1. Supermatrix sample information, including voucher details and GenBank numbers for sequences used.

Appendix S2. Plastome (PL) sample information, including voucher details and plastome assembly results. Total length and the lengths of the large single‐copy region (LSC), the small single‐copy region (SSC), and the two inverted repeat regions (IR1 and IR2) are shown; mean coverage per base pair and its standard deviation are also provided.

Appendix S3. Target‐capture (TC) sample information, including voucher details and sequence recovery statistics. The number of reads (NumReads), the number of reads mapped to the targets (ReadsMapped), the percentage of reads on target (PctOnTarget), the number of genes with reads (GenesMapped), the number of genes with contigs (GenesWithContigs), the gene recovery columns GenesWithSeqs, GenesAt25pct, GenesAt50pct, GenesAt75pct, and GenesAt150pct, and the number of genes with paralog warnings (ParalogWarnings) are shown.

Appendix S4. Supermatrix alignment details for the nine regions selected for this study. The number of species sampled per region, cumulative percentage of species sampled per region, aligned length, and proportions of parsimony‐informative characters (PI) and variable sites (VS) per region in the dataset are indicated. Values are calculated with outgroups included and with ambiguous regions and repeats excluded. Bp = base pairs.

Appendix S5. List of polyploid taxa in Solanum.

Appendix S6. ML results for each of the nine individual loci and combined plastid loci. (A) ITS; (B) matK; (C) ndhF; (D) ndhF‐rpL32; (E) psbA‐trnH; (F) rpL32‐trnL; (G) trnL‐trnT; (H) trnS‐trnG; (I) waxy; and (J) seven plastid loci. Nodes with bootstrap support equal to or above 95% are in cyan, and nodes with bootstrap support between 75% and 94% are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1.

Appendix S7. BI results for each of the nine individual loci and combined plastid loci. (A) ITS; (B) matK; (C) ndhF; (D) ndhF‐rpL32; (E) psbA‐trnH; (F) rpL32‐trnL; (G) trnL‐trnT; (H) trnS‐trnG; (I) waxy; and (J) seven plastid loci. Nodes with posterior probability equal to or above 0.95 are in cyan, and nodes with posterior probabilities between 0.75 and 0.95 are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1.

Appendix S8. Plastome (PL) alignment statistics. Data show the number of sequences, trimming mode, the number of loci retained for coalescent analysis after checking for excessive gene tree branch lengths, alignment length, number of informative and constant sites, pairwise identity, average GC content, percentage of gaps, and average locus length for the exon, intron, and intergenic regions.

Appendix S9. Optimal substitution models used in ML analyses for the PL and TC datasets, determined using ModelFinder in IQ‐TREE2. For each locus, the number of taxa, sites, informative sites, and invariable sites are indicated, as well as the model selected and the AICc score. Worksheet titles correspond to the following: PLUnpartitioned = Models selected for PL unpartitioned datasets, for 151 taxa and 125 taxa; PLBestPartScheme = Models selected for PL datasets analysed according to the best‐partition scheme; TCPartitioned_Min4 = Models selected for loci of the TC dataset, with a minimum of 4 taxa per locus; TCPartitioned_Min20 = Models selected for loci of the TC dataset, with a minimum of 20 taxa per locus.

Appendix S10. Target‐capture (TC) alignment statistics. Loci excluded refers to the number of loci excluded because of excessively long branch lengths, and loci retained is the final number of loci retained for both ML and coalescent analyses. Empty sequences inserted refers to the amount of missing data. Min = minimum; Bp = base pairs.

Appendix S11. Detailed RAxML supermatrix phylogenetic tree with 746 taxa. Nodes with bootstrap support equal to or above 95% are in cyan, and nodes with bootstrap support between 75% and 94% are in red. Bootstrap support values for each node are indicated in italics. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1.

Appendix S12. Detailed Bayesian inference (BEAST) supermatrix phylogenetic tree with 746 taxa. Nodes with posterior probability equal to or above 0.95 are in cyan, and nodes with posterior probabilities between 0.75 and 0.95 are in red. Posterior probability values for each node are indicated in italics. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1.

Appendix S13. ML phylogenetic trees of plastome datasets. Nodes with bootstrap support equal to or above 95% are in cyan, and nodes with bootstrap support between 75% and 94% are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. (A) 151 taxa, all data, unpartitioned; (B) 125 taxa, all data, unpartitioned; (C) 151 taxa, all data, best partition scheme; (D) 125 taxa, all data, best partition scheme.

Appendix S14. ML phylogenetic trees of A353 target capture datasets (IQ‐TREE2). Nodes with bootstrap support equal to or above 95% are in cyan, and nodes with bootstrap support between 75% and 94% are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. (A) Filtering threshold of a minimum of 4 taxa per locus; (B) filtering threshold of a minimum of 20 taxa per locus.

Appendix S15. Coalescent phylogenetic trees of A353 target capture datasets (ASTRAL‐III). Nodes with multi‐locus local posterior probability support equal to or above 0.95 are in cyan, and nodes with support between 0.75 and 0.94 are in red. Tips indicate species names, followed by major and/or minor clade, as indicated in Table 1. (A) Filtering threshold of a minimum of 4 taxa per locus; (B) filtering threshold of a minimum of 20 taxa per locus.

Appendix S16. Polytomy test results with ASTRAL‐III. (A) Target Capture A353 species tree, ASTRAL‐III, filtering threshold of a minimum of 4 taxa per locus, branches in gene trees with 10% or less branch support collapsed; (B) Target Capture A353, ASTRAL‐III, filtering threshold of a minimum of 4 taxa per locus, branches in gene trees with 75% or less branch support collapsed; (C) Target Capture A353, ASTRAL‐III, filtering threshold of a minimum of 20 taxa per locus, branches in gene trees with 10% or less branch support collapsed; (D) Target Capture A353, ASTRAL‐III, filtering threshold of a minimum of 20 taxa per locus, branches in gene trees with 75% or less branch support collapsed.

Data Availability Statement

Raw sequence data generated in this study are deposited in various archives, including GenBank (website: https://www.ncbi.nlm.nih.gov/genbank/) and the European Nucleotide Archive (website: https://www.ebi.ac.uk/ena/browser/home); full accession numbers are provided in Appendices S1, S2, and S3. In addition, the 10 species trees generated for this study, as well as the alignments used for the different phylogenetic analyses, including the concatenated Sanger supermatrix, the plastome dataset, and the target capture datasets (min04 and min20), are available via Data Dryad, at the following link: https://datadryad.org/stash/dataset/doi:10.5061/dryad.2v6wwpzpt.


Articles from American Journal of Botany are provided here courtesy of Wiley