PLOS ONE. 2018 Apr 25;13(4):e0196069. doi: 10.1371/journal.pone.0196069

The chloroplast genome sequence of bittersweet (Solanum dulcamara): Plastid genome structure evolution in Solanaceae

Ali Amiryousefi 1, Jaakko Hyvönen 1,2, Péter Poczai 2,*
Editor: Berthold Heinze 3
PMCID: PMC5919006  PMID: 29694416

Abstract

Bittersweet (Solanum dulcamara) is a native Old World member of the nightshade family. This European diploid species can be found from marshlands to high mountainous regions, and it is a common weed that serves as an alternative host and a source of resistance genes against plant pathogens such as late blight (Phytophthora infestans). We sequenced the complete chloroplast genome of bittersweet, which is 155,580 bp in length and is characterized by a typical quadripartite structure composed of a large (85,901 bp) and a small (18,449 bp) single-copy region separated by two identical inverted repeats (25,615 bp each). It consists of 112 unique genes, of which 81 are protein-coding, 27 are tRNA and four are rRNA genes. All bittersweet plastid genes, including non-functional ones, and even intergenic spacer regions are transcribed in primary plastid transcripts covering 95.22% of the genome. These are later substantially edited in a post-transcriptional phase to activate gene functions. By comparing the bittersweet plastid genome with all available Solanaceae sequences we found that gene content and synteny are highly conserved across the family. During genome comparison we identified several annotation errors, which we corrected in a manual curation process. We then identified the major plastid genome structural changes in Solanaceae. Interpreted in a phylogenetic context, they seem to provide additional support for larger clades. The plastid genome sequence of bittersweet could help to benchmark Solanaceae plastid genome annotations and could be used as a reference for further studies. Such reliable annotations are important for gene diversity calculations, synteny map construction and assigning partitions for phylogenetic analysis with de novo sequenced plastomes of Solanaceae.

Introduction

The genus Solanum L., with approximately 1,400 species, is one of the largest genera of angiosperms, and includes many major and minor food crops such as tomato, potato, eggplant, and pepino. Bittersweet (Solanum dulcamara L.) is a European native diploid (2n = 2× = 24) species, which is found throughout the northern hemisphere across a wide range of habitats. It was also introduced to North America, possibly for its medicinal properties [1]. It is still used as a source of various alkaloids with diuretic and diaphoretic properties to treat rheumatism and skin diseases in Asia and India [2, 3].

This semi-woody perennial vine is easy to recognize (Fig 1). However, it is a highly polymorphic and phenotypically plastic species showing extreme forms, which has led to confused taxonomy. Previous treatments placed Solanum dulcamara in sect. Dulcamara (Moench) Dumort. in subg. Potatoe (G.Don) D’Arcy, related to potatoes (sect. Petota Dumort.) and tomatoes (sect. Lycopersicum (Tourn.) Wettst.) [4–7]. This was based on the scandent habit, pinnate leaves and the articulation of pedicels above the base [1, 4]. However, recent phylogenetic studies have shown that it belongs to the Dulcamaroid clade [8–11], which is closely related to the Morelloid clade including species of black nightshades of sect. Solanum (e.g. S. nigrum L. and S. scabrum Mill.).

Fig 1. The berries and flowers of Solanum dulcamara L.

Solanum dulcamara serves as a host for important plant pathogens such as those causing bacterial wilt (Ralstonia solanacearum (Smith 1896) Yabuuchi et al. 1996) and late blight (Phytophthora infestans (Mont.) de Bary), and also for some viruses [12, 13]. Late blight is one of the most serious potato diseases worldwide [14]. However, it was shown that bittersweet has a minimal role in late blight infections since most plants are resistant and the inocula of the pathogen do not overwinter [15]. Populations of this species seem to have experienced a genetic bottleneck [16], but some allelic variation was found to be distributed among populations, resulting in more structured populations at larger regional levels [17]. The differentiation of the populations could have arisen by genetic drift or even by inbreeding over a very long period. Bittersweet is mostly an outcrossing species, but its population structure might have been affected by its perennial self-compatibility [18], reducing genetic diversity within regional populations and enhancing inbreeding. This leads to high interpopulation or spatial differentiation [17]. Genetic drift, on the other hand, may not have shaped the population structure of the species recently, based on the observed moderate level of diversity among populations [16, 17]. However, over a longer time scale, population expansion from postglacial refugia is known to leave such traces [19].

High-throughput sequencing is revolutionizing phylogenetics as it allows hundreds to thousands of markers to be obtained in a cost-effective way. Complete plastid genome (plastome) sequences can now be easily acquired for phylogenomic analyses at relatively low cost. Angiosperm plastid genomes exist in circular and linear forms [20] and the percentage of each form varies within plant cells [21]. They are small, typically ~ 120–160 kb in size, and have a highly conserved quadripartite structure containing two inverted repeats (IRA and IRB), which separate the large and small single copy regions (LSC and SSC). The plastid genome includes 110–130 genes primarily participating in photosynthesis, transcription and translation [22]. Their conserved gene content, order and organization make them relatively well suited for evolutionary studies since gene losses, structural rearrangements, pseudogenes or additional mutation events could be characteristic for some lineages. The information from length mutational events could be used in addition to the information from DNA substitutions occurring in the plastid genome. Such changes have been shown to be informative, for example, in Araliaceae [23], Geraniaceae [24], Poaceae [25] and in early embryophyte lineages [26]. It has been shown that independent gene and intron losses are limited to the more derived monocot and eudicot clades with lineage-specific correlation between rates of nucleotide substitutions, indels, and genomic rearrangements [27].

Here we present the complete chloroplast genome sequence of bittersweet using high-throughput sequencing, as well as the assembly, annotation, gene expression and unique structure characterization of its plastome. We also compare the gene order, inverted repeat (IR) length and examine the variation of structural changes across the family. In order to achieve this we revise the annotations of Solanaceae plastid genome records and correct possible errors. Using this edited plastid genome dataset we present a phylogenetic hypothesis of Solanaceae and examine the distribution of structural changes in the plastid genomes.

Materials and methods

Chloroplast isolation

Bittersweet leaves were collected in the Kaisaniemi Botanical Garden of the University of Helsinki, Finland during the summer of 2015. DNA isolation was carried out according to the modified high-salt protocol of Shi et al. [28]. DNA concentration was measured with a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and checked on a 0.8% agarose gel. We carried out a multiply-primed rolling circle amplification (RCA) according to the protocol of Atherton et al. [29] using a REPLI-g Mini Kit (Qiagen, Hilden, Germany) to produce abundant DNA template.

Plastid genome sequencing

Paired-end libraries of 300 bp were prepared with Illumina TruSeq DNA Sample prep kit (Illumina, San Diego, CA, USA). Fragment analysis was conducted with an Agilent Technologies 2100 Bioanalyzer using a DNA 1000 chip. Sequencing was carried out on an Illumina MiSeq platform from both ends with 150 bp read length.

Genome assembly and annotation

Raw reads were first filtered to obtain high-quality clean data by removing low quality reads with a sliding window quality cutoff of Q20 using Trimmomatic [30]. Plastid reads were filtered by reference mapping to Solanaceae plastid genome sequences using Geneious 9.1.7 [31] with medium-low sensitivity and 1,000 iterations. From the collected reads a de novo assembly was carried out with the built-in Geneious assembler with zero mismatches and gaps allowed among the reads. A similar procedure was conducted with Velvet v1.2.10 [32] with k-mer length 37, minimum contig length 74 and default settings, applying a 400× upper coverage limit. The resulting contigs were then circularized by matching end points. The results of the reference mapping and the two de novo methods were compared and inspected. Sanger-based gap closure and IR junction verification were carried out following Moore et al. [33]. Gene annotation followed a two-step procedure. First we used the gene prediction tools DOGMA [34], tRNAscan-SE [35], CpGAVAS [36], Verdant [37] and GeSeq [38] to obtain annotations based on different approaches. In a second step we inspected and curated all annotations manually by comparison to all published (as of 18.10.2016) plastid genomes of Solanaceae using Geneious. Local BLAST searches were further carried out to confirm the position of CDS regions and genes. We confirmed start and stop codons manually and by comparison to RNA-seq data. For each gene we inspected gene length based on amino acid translations and reconfirmed any internal stop codons. The resulting genome map was drawn with OGDraw v.1.2 [39]. The annotated bittersweet plastid genome was further used as a reference to revise all Solanaceae plastid genomes (deposited by 16.8.2016). Reannotation followed the two-step protocol described above. Plastid genome sequences were converted to FASTA format and then annotated with the software tools [34–38]. All annotations were transferred to Geneious as a new track under the corresponding genome. Sequences were aligned, compared and manually curated against bittersweet.
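The end-matching step used to circularize a linear contig can be illustrated with a short sketch. It assumes that the end of the contig repeats its start exactly, and it is not the routine used by Geneious or Velvet; the function name `trim_circular_overlap` and the `min_overlap` parameter are hypothetical.

```python
def trim_circular_overlap(contig: str, min_overlap: int = 10) -> str:
    """Remove a terminal repeat from a linear contig of a circular genome.

    Assumption: the end of the contig repeats its start exactly. The longest
    such overlap of at least `min_overlap` bases is trimmed from the end.
    """
    max_len = len(contig) // 2
    for k in range(max_len, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return contig[:-k]
    return contig  # no terminal repeat found, contig left unchanged


# Toy contig: 22 unique bases followed by a copy of its first 10 bases.
toy = "ACGTTGCAAGGCTTAACCGGTT" + "ACGTTGCAAG"
trimmed = trim_circular_overlap(toy, min_overlap=5)
print(len(toy), len(trimmed))  # 32 22
```

A real assembly would normally tolerate mismatches in the overlap and confirm the junction with reads spanning it; exact matching keeps the sketch short.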

Genome analyses

Codon frequency and relative synonymous codon usage (RSCU) were calculated on the basis of protein-coding genes using an in-house script. We also computed the overall mean of pairwise distances of 80 protein-coding genes of the 32 Solanaceae species based on the Kimura 2-parameter model using MEGA 7.0.21 [40]. Standard error estimates were obtained by bootstrapping (1,000 replicates). Complete plastid genome sequences were compared and aligned using mVISTA online tools [41], while the expansion and contraction of the inverted repeat (IR) regions at junction sites were examined and plotted using IRscope [42]. We identified and located repeat sequences (n ≥ 30 bp and a sequence identity ≥ 90%) found in the bittersweet plastome using REPuter [43]. Repeats of 10 bp or longer were classified into the following groups: (i) forward or direct repeats (F), (ii) repeats found in reverse orientation (R), (iii) palindromic repeats forming hairpin loops in their structure (P) and (iv) repeats found in reverse complement orientation (C). Because REPuter overestimates the number of repeats, we manually inspected the output file and located the repeats in Geneious. Redundant repeats found entirely within other repeats as well as duplicated parts of tRNAs were pruned. Perfect and compound simple sequence repeats (SSRs), the latter interrupted by up to 100 bp, were located with MISA [44]. A threshold of seven repeat units was applied to mononucleotide repeats, four to dinucleotide repeats and three to tri-, tetra-, penta-, and hexanucleotide repeats. Output files were manually edited and exported to Geneious for further inspection.
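The SSR thresholds stated above can be illustrated with a small regular-expression search. This is a minimal sketch of perfect-SSR detection only, not MISA itself: compound SSRs are not handled, and the function names and the toy sequence are hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the Methods:
# seven for mono-, four for di-, three for tri- to hexanucleotide motifs.
MIN_REPEATS = {1: 7, 2: 4, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(n % d == 0 and motif[:d] * (n // d) == motif
                   for d in range(1, n))


def find_ssrs(seq):
    """Return (start, end, motif, copies) with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. "AA" repeats, already mono
                copies = (m.end() - m.start()) // unit
                hits.append((m.start() + 1, m.end(), motif, copies))
    return sorted(hits)


# Toy sequence with one mono-, one di- and one trinucleotide SSR.
toy = "GGC" + "A" * 8 + "CG" + "AT" * 4 + "G" + "TTC" * 3 + "G"
ssrs = find_ssrs(toy)
for hit in ssrs:
    print(hit)
```

On the toy sequence this reports an (A)8, an (AT)4 and a (TTC)3 repeat. Merging hits separated by at most 100 bp into compound SSRs would be a further step on top of this list.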

Transcriptome analysis and RNA editing site prediction

RNA-seq library files were downloaded from NCBI Short Read Archive for Solanum dulcamara (SRR2056039). Reads were mapped to the complete plastid genome and filtered reads were collected with Bowtie 2.0 [45] (mismatch ≤ 2). RNA-seq reads were re-mapped with Geneious using the genome annotation to calculate reads per kilobase per million (RPKM), fragments per kilobase of exon per million fragments mapped (FPKM) and transcripts per million (TPM) for transcript variants. Ambiguously mapped reads were counted as partial matches for each CDS. Putative RNA editing sites were predicted with an in silico approach using the PREP database [46]. Verification of the predicted editing sites was carried out by variant calling with FreeBayes [47].
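The expression measures named above follow standard definitions, which can be sketched as below. The read counts and gene lengths are invented toy values, not data from this study, and the function names are hypothetical; Geneious computes these values internally. FPKM uses the same formula as RPKM with fragments counted instead of reads.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def tpm(read_counts, gene_lengths_bp):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [count / length for count, length in zip(read_counts, gene_lengths_bp)]
    total_rate = sum(rates)
    return [rate / total_rate * 1e6 for rate in rates]


# Toy example with three hypothetical genes.
counts = [500, 1000, 1500]
lengths = [1000, 2000, 1500]
total_reads = sum(counts)

rpkm_values = [rpkm(c, l, total_reads) for c, l in zip(counts, lengths)]
tpm_values = tpm(counts, lengths)
print(tpm_values)  # [250000.0, 250000.0, 500000.0]
```

TPM values always sum to one million within a sample, which makes them easier to compare across libraries than RPKM.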

Phylogenomic analyses

Our aim was to compare the 32 chloroplast genomes of Solanaceae (data present in NCBI on 16.8.2016) with each other and to hypothesize when changes have taken place among the species and major clades. As outgroup terminals we used Coffea arabica L. of Rubiaceae, Ipomoea batatas (L.) Lam. and I. purpurea (L.) Roth. We aligned the 35 complete chloroplast genomes (S1 Table) with MAFFT [48] (S1 Data) since they were lacking inversions or other major changes. We conducted maximum likelihood (ML) analyses using RAxML-NG [49] under three different strategies. 1) One of the IR regions was removed from all plastid genomes to reduce overrepresentation of duplicated sequences, and we then ran RAxML-NG on the unpartitioned alignment under the GTR+I+G substitution model as a single partition; 2) the same data matrix was partitioned by gene, exon, intron and intergenic spacer regions (n = 258), allowing separate base frequencies, α-shape parameters, and evolutionary rates to be estimated for each; 3) we inferred the best-fitting partitioning strategy with PartitionFinder2 [50] for the alignment (n = 24). The best-fitting nucleotide substitution models were inferred with jModelTest2 [51]. Branch support values were obtained from 10,000 non-parametric bootstrap replicates. For each alignment we conducted ten separate runs with RAxML-NG v0.5.0b since log-likelihoods could show variation among individual runs [52]. The complete plastid genome alignment was also analyzed with parsimony as an optimality criterion using the program TNT [53]. The matrix included 19,956 parsimony informative characters and due to its small size we were able to perform analyses using “traditional” search starting from Wagner trees improved using the tree bisection reconnection (TBR) algorithm. This search was performed twice with 3,000 replications. We also examined the phylogenetic distribution of structural changes on the trees obtained with the parsimony and ML methods, using the ancestral state reconstruction tools of Mesquite 3.2 [54]. Major genomic changes were binary coded (S2 Data) and mapped on phylogenetic trees. Phylogenetic trees were visualized and edited with TreeGraph2 [55].
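Mapping a binary-coded structural change onto a tree can be illustrated with Fitch parsimony, one standard way to reconstruct ancestral states for discrete characters. The sketch below is not Mesquite's implementation; the tree, the taxon labels (A to E) and the character states are hypothetical placeholders, not data from S2 Data.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass on a rooted binary tree.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    `states` maps leaf names to 0 (change absent) or 1 (change present).
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one change is required on a descendant branch.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical five-taxon tree: (((A, B), C), (D, E))
toy_tree = ((("A", "B"), "C"), ("D", "E"))

# A change present only in A and B is explained by a single gain.
single_gain = fitch(toy_tree, {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0})

# A change present in A and C but not B requires two steps.
two_steps = fitch(toy_tree, {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0})

print(single_gain, two_steps)
```

A character that needs only one step on the tree behaves like a synapomorphy of the clade it marks, while characters needing several steps indicate independent gains or losses.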

Results and discussion

Chloroplast genome assembly and validation

Enriched chloroplast DNA was used to generate 1,645,956 paired-end reads with an average fragment length of 277 bp, yielding an average genome coverage of 1,340×. Low quality reads (below Q20) were filtered out, and the remaining high quality reads were utilized in further assembly. For genome assembly we used one reference mapping and two de novo methods. As a first step, quality-filtered reads were mapped to Solanaceae reference genomes, which resulted in an entire contig showing good agreement with published genome sequences. Based on these collected reads we used Geneious and Velvet to produce a single contiguous fragment representing the plastid genome. The three assemblies were compared and discrepancies were manually resolved. With Velvet we obtained a linear contig 43 bp longer (155,623 bp) than with Geneious (155,580 bp); the difference was caused by a sequence repeated at the start and end points, which was removed. Most de novo methods do not account for the circularity of the plastid genome, while Geneious overcomes this by allowing contig circularization during the assembly. The assembly was validated by PCR amplification and Sanger sequencing targeting the four junctions between the IRs and LSC/SSC regions. The Sanger sequences were identical to the plastid genome assembly, demonstrating its accuracy. The final chloroplast genome sequence was then submitted to GenBank (KY863443).

Genome organization, repeats and sequence diversity

The chloroplast genome of Solanum dulcamara is 155,580 bp long, showing a quadripartite structure of large and small single-copy regions of 85,901 and 18,449 bp, separated by two inverted repeat regions of 25,615 bp each (Fig 2). The genome contains 81 protein-coding, 27 tRNA and four rRNA genes, comprising a total of 112 unique genes (S2 Table). Seventeen genes contained introns, with ycf3 and clpP containing two. All of these are group II introns except that of trnL-UAA, which is a group I intron (S3 Table). The distribution of the genes across the regions of the genome is similar to that of other Solanaceae, with 13 genes in the SSC and 19 genes in the IR while the rest were in the LSC. The overall GC content of the chloroplast genome is 37.8%, resembling other species of Solanaceae (S4 Table). Eighty percent of the total length of the genome consists of genic regions. The Arg amino acid coded with the AGA codon was the most frequent codon, showing an RSCU rate of 1,187 (S5 Table).
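For readers unfamiliar with RSCU, the measure is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The in-house script is not reproduced here; the following is a minimal sketch under the standard genetic code, with a hypothetical function name and a toy coding sequence.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_sequences):
    """RSCU per codon: count * (number of synonymous codons) / family total."""
    counts = Counter()
    for cds in cds_sequences:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        family_total = sum(counts[c] for c in codons)
        if family_total == 0:
            continue  # amino acid not observed, RSCU undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / family_total
    return values


# Toy CDS: ATG AGA AGA CGT TAA (Met, Arg, Arg, Arg, stop).
values = rscu(["ATGAGAAGACGTTAA"])
print(values["AGA"], values["CGT"], values["CGG"])  # 4.0 2.0 0.0
```

An RSCU above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often.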

Fig 2. Map of the chloroplast genome of the Solanum dulcamara.

Genes lying inside of the outer circle are transcribed counterclockwise while those outside that circle are transcribed clockwise. Genes belonging to different functional groups are color coded differently and the GC, AT content of the genome are plotted on the inner circle as dark and light gray, respectively. The inverted repeats, large single copy, and small single copy regions are denoted by IR, LSC, and SSC, respectively.

The majority of the genes show relatively slow evolutionary divergence, with almost all genes having an average sequence distance of less than 0.10 (S6 Table). Low levels of sequence distance indicate the conserved nature of protein-coding genes in Solanaceae. The only gene with a unique function showing a slightly larger distance was sprA (d = 0.114; S.E. = 0.016). Chloroplast genes are mostly subjected to purifying selection and low sequence diversity is due to conservation of the functions of the photosynthetic system. In this context the plastid genome diversity of Solanaceae does not resemble that of other economically important plant families such as Poaceae, where plastid genomes harbor many divergent genes and unique plastid rearrangements [25].
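The distances discussed here are Kimura 2-parameter (K2P) distances, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions between two aligned sequences. The sketch below shows the formula and an overall mean of pairwise distances; it is not MEGA's implementation, the toy alignment is invented, and the function names are hypothetical.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    # Keep only columns where both sequences have an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    transitions = sum(1 for a, b in pairs
                      if a != b and (a in PURINES) == (b in PURINES))
    transversions = sum(1 for a, b in pairs
                        if a != b and (a in PURINES) != (b in PURINES))
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def mean_pairwise_distance(alignment):
    """Overall mean of K2P distances over all sequence pairs."""
    distances = [k2p_distance(a, b) for a, b in combinations(alignment, 2)]
    return sum(distances) / len(distances)


# Toy alignment: the second sequence differs by one transition (A to G).
toy_alignment = ["ACGTACGTAC", "ACGTACGTGC", "ACGTACGTAC"]
d = k2p_distance(toy_alignment[0], toy_alignment[1])
mean_d = mean_pairwise_distance(toy_alignment)
print(round(d, 4), round(mean_d, 4))
```

With one transition in ten sites the uncorrected proportion of differences is 0.10, while the K2P correction gives a slightly larger value because it allows for multiple substitutions at the same site.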

Using MISA we identified 374 SSRs in the bittersweet plastid genome, of which 253 were mono-, 40 di-, 70 tri-, 10 tetra- and one pentanucleotide repeats (S7 Table and S3 Data). SSRs were more abundant in the LSC and SSC regions than in the IRs, and 107 occurred in compound formation, composed of several combinations of SSRs interrupted by maximum distances of 100 bp. The most abundant SSR motifs were poly-A/T stretches, which are characteristic of angiosperm plastid genomes. Using REPuter we also identified 25 larger repeats (≥ 10 bp) in the bittersweet plastid genome, composed of 12 forward, five reverse, five palindromic and three mixed (forward/palindromic) repeats (Table 1). The largest repeat, with a size of 83 bp, was a forward repeat found in the IGS region of ycf3 and trnS-GGA. Forward repeats were commonly distributed in the intergenic spacer regions of the genome, located mostly in the LSC. Two repeats were found among the introns of ndhA, ycf3 and petD, while one repeat appeared in the infA pseudogene. Three repeats were found among the CDS of atpI, ndhC and ycf2, while another motif was repeated in the psaA and psbB genes. The repeats in atpI and ycf2 seem to be conserved since they have also been reported from grasses [25]. The most variable region was the trnE-UUC–trnT-GGU IGS, which had two palindromic repeats and one forward repeat.

Table 1. Repeat sequences of the Solanum dulcamara chloroplast genome.

No	Type	Location	Region	Repeat unit	Period size (bp)	Copy no.
1	F	ycf3–trnS-GGA IGS	LSC	AACAATTTTAAAGAAAAATTGTATCTTTATCCCGGAGTCTTGAAGGAAAGAAAAATGGTTCTTTGTTTTGACTTTGATGAAA	83	2
2	F	psaA and psbB CDS	LSC	TGCAATAGCTAAATGGTGATGGGCAATATCAGTCAGCC	38	2
3	F	ndhA and ycf3 intron	LSC/SSC	CAGAACCGTACGTGAGATTTTCACCTCATACGGCTCCT	38	2
4	F	infA pseudogene	LSC	AGGTATCAACTAATCTAATCCAATTTGGATATTATAAA	38	2
5	F	atpB–rbcL IGS	LSC	TTAGCACTCGATGAGACTGAGTTAATTTGCAAGCT	34	2
6	F	psbA–ycf3–trnS-GGA IGS	LSC	TTAATATAATAAAAAGAAGTCTATTTTGT	29	2
7	F	sprA–trnL-UAG IGS	SSC	CCTTTTTAACTCTATTCCTTAATTGAGT	28	2
8	P	rps12–trnV-GAC IGS	IR	TGAGATTTTCACCTCATACGGCTCCT	26	2
9	P	petD intron	LSC	TATAAGTGAACTAGATAAAACGGAAT	26	2
10	F	trnG-GCC–trnR-UCU IGS	LSC	TTAGTACATCATTGAATATACAA	23	2
11	F	psaJ–rpl33 IGS	LSC	GTGGACGGGCTGAGGAATGGGG	22	2
12	F/P	rps12–trnV-GAC IGS	IR	ATTAGATTAGTATTAGTTAGT	21	4
13	F	ndhC–trnV-UAC IGS	LSC	TCCTTTTATTATTATTTAAT	20	2
14	P	psbT–psbN IGS	LSC	AGTTGAAGTACGGAGCCTCC	20	2
15	F	trnE-UUC–trnT-GGU and rps4–trnT-UGU IGS	LSC	TTATTTAGTATTTCGAATT	19	2
16	F/P	ycf2 CDS	IR	CGATATTGATGATAGTGAC	19	4
17	F	rps16–trnQ-UUG IGS	LSC	ATTATAATATTAATTA	16	3
18	P	trnE-UUC–trnT-GGU IGS	LSC	TTTTATTTAGAAA	13	2
19	P	trnE-UUC–trnT-GGU IGS	LSC	CATCATACTATGA	13	2
20	R	trnF-GAA–ndhJ IGS	LSC	TCTCCTCTTTT	11	2
21	R	ndhC CDS	LSC	CATCAAAAACA	11	2
22	R	atpH–atpI IGS	LSC	TTTATTATTTA	11	2
23	R	atpI CDS	LSC	ACAAAAATAA	11	2
24	R	petL–petG IGS	LSC	CCTCTTTTTT	10	2
25	F/P	rps12–trnV-GAC IGS	IR	AACTAATACT	10	6

Reannotation of Solanaceae plastid genomes

We noticed numerous errors in currently deposited annotations, which were corrected for our analyses in a two-step curation process using gene prediction tools followed by manual adjustments. The reannotated genome files can be accessed as an online supplement (S4 and S5 Data). We provide here the first annotation for the sequences of S. pennellii Correll and Iochroma loxense (Kunth) Miers, which entirely lacked genome features. A complete list of annotation errors is found in S8 Table, and illustrates the difficulties encountered when attempting to compare across genomes. These differences could have considerable consequences when inferring gene functionality or synteny. In general, annotations of the LSC and SSC corresponding to the basic quadripartite structure of angiosperm plastid genomes were entirely missing or sparsely indicated. Inverted repeats (IRs) were either unannotated or their orientation, size and naming were erroneous. Compared to the tobacco reference order LSC-IRB-SSC-IRA [56], the erroneous annotation LSC-IRA-SSC-IRB is often applied. It is important to note that the IR sequences of Atropa belladonna L. and Saracha punctata Ruiz & Pav. were dissimilar. Inverted repeat sequences are under concerted evolution [22], so divergent sequences could be sequencing or assembly errors in these two genomes, or they could represent a relatively rare case of chloroplast evolution. Several protein-coding genes had errors in the assigned start or stop codons. For example, the start codon of the rpoC2 gene is shifted by 12 bp in most deposited plastid genomes except in Nicotiana L. species and in Datura stramonium L. Annotations were found to be insufficient for genes containing introns since they were lacking exon and/or intron designations. The exon-intron boundaries were annotated inconsistently for many genes with a high level of synteny, e.g., atpF or rpoC1. Gene annotations were missing for some species in the case of psbK and psbZ, while the latter was often annotated as lhbA, now regarded as a synonym of psbZ.

Besides previously described genes, we located and annotated the hypothetical gene ycf68 and the 218-bp small plastid RNA (sprA) gene in all studied genomes. Homologs of sprA are present in eudicots but absent from monocots, and they are rarely annotated in plastid genomes. This gene was reported to play a role in 16S rRNA maturation in Nicotiana tabacum L. [57], but its function is non-essential under normal growth conditions [58]. It is not part of the catalytic core, nor does it guide the rRNA machinery; rather, it acts independently. In this respect its function is similar to that of other non-essential plastid spRNAs.

In our experience during the reannotation, none of the currently existing tools provided submission-ready annotations. They required minor or even extensive manual curation, especially the most commonly used tool, DOGMA, which produces results that require expert interpretation and laborious adjustments. For example, annotating intron-containing genes or genes with short exons such as petB, and dealing with trans-spliced reading frames like rps12, is challenging with DOGMA. Moreover, DOGMA [34] generates a special output file, whereas CpGAVAS [36] and GeSeq [38] generate standard general feature format (.gff) or GenBank (.gb) files that can be integrated with other software without further processing. Of the currently available tools, GeSeq [38] generated the highest quality results, annotating >95% of the genes and coding regions correctly compared to our curated reference set. In most cases annotation errors were propagated from erroneous references to newly assembled genomes, creating a systematic problem in Solanaceae. For future work we advise abandoning outdated annotation tools such as DOGMA in favor of up-to-date software such as GeSeq to avoid complications. For de novo sequenced Solanaceae plastid genomes, bittersweet can also serve as a novel reference for comparison and annotation.

Expansion and contraction of IR regions

Using the curated genome annotations we compared the junction sites of ten selected Solanaceae plastid genomes. In general, IRs are systematically unannotated in deposited plastid genomes, with several genes, for example rpl2, missing. Pseudogenes like the truncated ψrps19 are mislabeled or entirely missing, which made the comparison of the IR regions cumbersome and time consuming. Therefore, we used an in-house script, IRscope [42], to overcome these problems, and located the IRs and plotted the genes in the vicinity of the junctions (Fig 3). The lengths of the IR regions were similar, ranging from 25,343 bp to 25,906 bp, showing some expansion. The endpoint of the Solanaceae JLA is characteristically located upstream of rps19 and downstream of trnH-GUG. In Solanoideae, the IR expanded to partially include rps19, creating a truncated ψrps19 copy at JLA, thus this pseudogene is missing from Nicotiana. The extent of the IR expansion into rps19 varies from 24 to 91 bp and the end point seems to be conserved, not extending into the following intergenic spacer region. Furthermore, infA, ycf15, and a copy of ycf1 located on the JSB were detected as pseudogenes. In contrast to Solanum tuberosum and S. lycopersicum, where JSB is tangent to the end of the pseudo ycf1 gene, the copy of this gene in S. dulcamara shows an extra part extending further into the SSC (Fig 3).

Fig 3. Junction sites of the inverted repeats.

For each species, genes transcribed on the positive strand are depicted on the top of their corresponding track with right to left direction, while the genes on the negative strand are depicted below from left to right. The arrows show the distance of the start or end coordinate of a given gene from the corresponding junction site. For genes extending from one region into another, the T bar above or below them shows the extent of their parts with the corresponding values in base pairs, while nothing is plotted for genes tangent to the sites. The plotted genes and distances in the vicinity of the junction sites are a scaled projection of the genome. JLB (IRb/LSC), JSB (IRb/SSC), JSA (SSC/IRa) and JLA (IRa/LSC) denote the junction sites between the two corresponding regions of the genome.
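As a minimal sketch, the four junction positions can be derived from the region lengths reported for S. dulcamara, assuming that the sequence starts at the first base of the LSC and follows the LSC-IRb-SSC-IRa order. The function names are hypothetical, the coordinates in the deposited record may be offset differently, and this is not the IRscope code.

```python
def junction_coordinates(lsc, ir, ssc):
    """Last 1-based position before each junction, for LSC-IRb-SSC-IRa order.

    Assumption: position 1 is the first base of the LSC.
    """
    jlb = lsc          # LSC / IRb
    jsb = jlb + ir     # IRb / SSC
    jsa = jsb + ssc    # SSC / IRa
    jla = jsa + ir     # IRa / LSC, also the total genome length
    return {"JLB": jlb, "JSB": jsb, "JSA": jsa, "JLA": jla}


def overlap_with_junction(gene_start, gene_end, junction):
    """Bases of a gene on each side of a junction (0 on a side it does not reach)."""
    left = max(0, min(gene_end, junction) - gene_start + 1)
    right = max(0, gene_end - max(gene_start - 1, junction))
    return left, right


# Region lengths reported in the text for Solanum dulcamara.
junctions = junction_coordinates(lsc=85_901, ir=25_615, ssc=18_449)
print(junctions)

# Hypothetical gene spanning positions 85,880 to 85,950 across JLB.
print(overlap_with_junction(85_880, 85_950, junctions["JLB"]))  # (22, 49)
```

The last junction equals the total genome length, which gives a quick consistency check on the reported region sizes (85,901 + 18,449 + 2 × 25,615 = 155,580 bp).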

Phylogenetic relationships in Solanaceae

Our phylogenetic analyses of the whole plastid genome alignment resulted in highly resolved trees (Fig 4), with almost all clades recovered having maximum branch support values (S1 Fig). We conducted phylogenetic analysis with three different partitioning strategies under maximum likelihood and analyzed the matrix also using parsimony. All our analyses resolved similar topologies which confirm results of previous phylogenetic analyses based on fewer genes [10, 59] but in several cases groups with low support values of earlier studies are resolved in our tree with high support values.

Fig 4. Cladogram illustrating the phylogenetic relationships of Solanaceae based on complete chloroplast genome sequences.

Plastid genome rearrangement events are mapped on the branches of the best scoring maximum likelihood tree generated with RAxML-NG. Each node has a 100% bootstrap support value unless a lower support value is indicated; nodes with support values below 50% are collapsed. Currently recognized suprageneric groups are listed on the right.

Trees of parsimony and ML analyses are congruent except for the clade composed of iochromas (S1 Fig). Iochrominae is a diverse clade of Physaleae with ca. 34 species and six traditionally recognized genera, including Acnistus Schott, Dunalia Kunth, Eriolarynx (Hunz.) Hunz., Iochroma Benth., Saracha Ruiz & Pav. and Vassobia Rusby. Members of this group are shrubs of high elevations in the Andes displaying great diversity in floral characteristics and pollination systems. Recent molecular phylogenetic studies resolved Iochrominae with high support values but relationships within the clade have remained poorly resolved [10, 59]. In this group nodal resolution does not scale proportionately with the length of sequence analyzed, and structural variations in the plastid genome seem to have accumulated compared to other clades.

Iochrominae, represented here by Iochroma, Dunalia and Saracha, appear to be monophyletic based on the analyses of the complete chloroplast genome sequences. However, our results also suggest that two of these morphologically delimited genera (Iochroma and Dunalia) are not monophyletic. Smith and Baum [60], utilizing nuclear markers (ITS, waxy and LEAFY), also found that generic boundaries are not congruent with the current taxonomy. Iochromas might have a highly reticulate history that cannot be represented by a dichotomous tree. The unequivocal resolution of iochromas will likely require the inclusion of nuclear genomic regions.

We resolved Solanum dulcamara in a separate clade with S. nigrum appearing as a sister group. This reinforces the close relationship of the Dulcamaroid and Morelloid clades as proposed by other molecular phylogenetic analyses based on fewer markers [8–10]. The informally named x = 12 clade is found in our analysis as sister to Nicotianoideae. In this group the chromosome numbers are based on 12 pairs [61], and members are estimated to have gone through two separate whole-genome duplication (WGD) events ca. 117 Ma [62] and 49 Ma BP [63], respectively. Increased sampling outside this group is needed since this could shed light on ancient WGDs in the family. Plastid genomes of Solanaceae hold much promise for resolving relationships among clades of the family that have previously been problematic. Although the phylogenomic tree presented in this study is largely robust, it should be kept in mind that our sampling is still sparse in terms of the number of terminals. It is also important to note that organellar phylogenomics may fail in rapidly radiating groups with interspecific hybridization, as exemplified here by iochromas. Other biological processes such as incomplete lineage sorting might also make phylogenetic analyses very difficult; however, organellar phylogenomics can be used to detect such processes.

Plastid genome structure of Solanaceae

To identify and map the major structural changes of Solanaceae plastid genomes on the phylogenetic tree, we selected ten Solanaceae plastid genomes for detailed comparison, representing diverse groups of the family, and included two outgroup taxa in the analysis. Gene comparisons were extended to the entire Solanaceae dataset using local alignments with MAFFT and the curated genome annotations. The size of the plastid genomes varied from 155,312 bp (Solanum tuberosum) to 162,046 bp (Ipomoea purpurea) (S4 Table). Our comparison shows that gene content and synteny are highly conserved across Solanaceae plastid genomes (S2 Fig). All species analyzed display complete gene synteny when accounting for expansion and contraction of the IRs (Fig 3). The organization and evolution of Solanaceae plastid DNA have been analyzed by previous studies using restriction site methods [64], PCR surveys [65–68] and complete genome sequences [69–74]. These comparisons highlighted some features of Solanaceae, but the phylogenetic distribution of these rearrangements has not been examined. Our comprehensive comparison of complete chloroplast genomes of ten Solanaceae and S. dulcamara confirms the presence of all the genomic rearrangements reported previously. We briefly review the earlier conclusions, then highlight the novel aspects resulting from our analysis and examine the distribution of these structural changes using the phylogenetic hypothesis constructed from the complete plastid genome alignment.

We observed ten characteristic features in Solanaceae plastid genomes linked to indels or pseudogenization processes (Table 2). Two genes, one copy each of ycf1 (ψycf1) and rps19 (ψrps19), located at the IRb/SSC and IRa/LSC junctions, respectively, were truncated pseudogenes, while infA has become non-functional through partial degradation. The infA orthologues in Solanaceae show almost equal numbers of substitutions at all codon positions and lack start codons. It is also a pseudogene in Ipomoea, representing Convolvulaceae, the sister family of Solanaceae, but it appears to be functional in Coffea of Rubiaceae [75], used as a distant outgroup of Lamiids. The infA gene seems to have become non-functional in the ancestor of Solanales multiple times independently. In Solanaceae the pseudogenization continued further with a monophyletic 124-bp deletion in the ancestor of the genus Solanum. Further changes appeared in four protein-coding genes: there is a 64-bp deletion in psbD of Iochroma tingoanum, while 31 bp were deleted from the rpl20 gene in members of Physaleae. Capsicum lycianthoides Bitter had a unique 15-bp insertion in the rpl33 gene. The accD gene, which encodes one of the four subunits of the acetyl-CoA carboxylase enzyme in most chloroplasts, shows a 24-bp insertion in the members of the ‘x = 12 clade’ [61]. This seems to be an ancestral trait shared by members of Nicotianoideae and Solanoideae and maintained in Datura L., Nicotiana, Physalis L. and iochromas but lost independently in Hyoscyamus L., Capsicum L. and Solanum. The latter two underwent a characteristic 141-bp and a small 9-bp insertion, respectively. The 141-bp insertion was also confirmed in Capsicum by Jo et al. [72]. The small plastid RNA (sprA) gene, which includes a segment complementary to the pre-16S rRNA, shows high variability among Solanaceae. Functional sprA copies were present in most Solanaceae, but several mutation events indicate that it has become non-functional in some groups. A 52-bp deletion appeared in Capsicum at the 5’ end and a further 37 bp were deleted in iochromas, while Physalis showed an autapomorphic 14-bp insertion (S3 Fig). The function of sprA has been lost independently multiple times, once in Iochrominae and once in Capsiceae; however, the gene remained functional in Capsicum lycianthoides.

Table 2. Major changes in the chloroplast genomes of Solanaceae.

Gene Insertion Deletion Pseudogene Notes
accD 2 1 - 24-bp deletion in the 'x = 12 clade' except Nicotiana, Datura, Physalis and iochromas; 141-bp insertion in Capsicum; 9-bp insertion in Solanum
infA - 1 + 124-bp deletion in Solanum
psbD - 1 - 64-bp deletion in Iochroma tingoanum
rpl20 - 1 - 31-bp deletion in Physaleae
rpl33 1 - - 15-bp insertion in Capsicum lycianthoides
rps19 - - + -
sprA 1 2 - 14-bp insertion in Physalis; 52-bp deletion in Capsicum; 37-bp deletion in iochromas
trnA-UGC - 2 - 108-bp and 141-bp intron deletions in Nicotiana and Atropa/Hyoscyamus
trnF-GAA - - + Uniting a group of Pseudosolanoids
ycf1 - - + Truncated pseudogenization of one ycf1 copy in Solanaceae

Genomic changes also affect tRNA genes and neighboring regions. The most notable change is the duplication of the original phenylalanine (trnF-GAA) gene in a tandem array composed of multiple pseudogene copies in Solanaceae. The pseudogene copies are composed of several highly structured motifs that are partial residues or entire parts of the anticodon, T- and D-domains of the original trnF gene [66]. Previously it was shown that these copies are subject to possible inter- or intrachromosomal recombination events [67] and that they have high taxonomic relevance, uniting a unique plastid clade of Pseudosolanoids [68]. They provide support for previous results [10, 59] separating the Atropina and Juanulloae clades from Solaneae, Capsiceae, Physaleae, Datureae and Salpichroina [68]. Another tRNA-related structural change is apparent in the group II intron of trnA-UGC, where 108 bp were deleted in Nicotiana and the deletion extended up to 147 bp in Atropa L. and Hyoscyamus.
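
Mapping such changes on a tree requires scoring each one as present or absent in every taxon. The sketch below is a minimal illustration only: the three characters and four taxa are taken from the changes described in this section, the character names and the helper functions `taxa_with` and `to_nexus` are hypothetical, and the matrix is not the coding deposited as S2 Data.

```python
# Hypothetical binary coding (1 = change present, 0 = absent) for three of the
# changes described in the text; this is NOT the matrix from S2 Data.
CHARACTERS = ["infA_124bp_deletion", "psbD_64bp_deletion", "rpl33_15bp_insertion"]

MATRIX = {
    "Solanum_dulcamara": [1, 0, 0],       # infA deletion shared by Solanum
    "Iochroma_tingoanum": [0, 1, 0],      # psbD deletion
    "Capsicum_lycianthoides": [0, 0, 1],  # rpl33 insertion
    "Nicotiana": [0, 0, 0],               # none of these three changes
}


def taxa_with(character):
    """Return the taxa scored as 1 for the named character."""
    column = CHARACTERS.index(character)
    return [taxon for taxon, states in MATRIX.items() if states[column] == 1]


def to_nexus(matrix, characters):
    """Serialise the matrix as a minimal NEXUS DATA block."""
    width = max(len(taxon) for taxon in matrix)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(matrix)} NCHAR={len(characters)};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "  MATRIX",
    ]
    for taxon, states in matrix.items():
        lines.append(f"  {taxon.ljust(width)}  {''.join(map(str, states))}")
    lines += ["  ;", "END;"]
    return "\n".join(lines)


print(to_nexus(MATRIX, CHARACTERS))
print(taxa_with("psbD_64bp_deletion"))
```

A character scored as 1 in a single terminal, such as the psbD deletion here, is an autapomorphy and carries no grouping information, whereas a state shared by several terminals can support a clade.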

Gene expression analyses

We carried out the expression analysis of 85 protein-coding genes (Table 3). As we were mostly interested in CDS/gene features, we used only these annotation types for read mapping. We also used the RNA-seq data set to verify start/stop codon positions and further ultimate or penultimate editing sites from the reannotation process. A total of 147,721 reads were mapped to the bittersweet plastid genome with an average read depth of 112×. The largest portions of reads, 25,910 (17.53%) and 12,582 (8.51%), were derived from adenosine triphosphate (ATP) synthase genes and from the photosystem II (PSII) complex, respectively. All genes were normally expressed, while the five most abundant were atpB, atpE, clpP, rps7 and psbM (>10,000 FPKM). The assembled consensus sequence from the mapped reads (148,110 bp long) covered 95.22% of the genome, also spanning intergenic spacer (IGS) sequences. Accordingly, a nearly complete pseudo Solanum dulcamara plastid genome was unexpectedly obtained by means of transcriptome data. We found multiple transcripts mapping to several non-functional genes, for example ycf15 and infA, and to the truncated pseudogenes ψycf1 and ψrps19 at the JLA (IRa/LSC). Of these, infA, ψycf1 and ψrps19 were nearly completely covered (S4 Fig), showing that they are indeed transcribed, while ycf15 had sparse coverage. This indicates that transcriptome sequencing captured both primary and processed mRNA sequences of the plastome. The detected and mapped reads of the bittersweet plastid RNA population could be grouped into three major types: i) mRNAs, ii) non-coding RNAs from IGS regions and iii) traditional non-coding RNAs (rRNAs and tRNAs). Similar patterns were observed by Shi et al. [76] and also in earlier studies using Northern blot hybridization, where 90% of the plastid genome was found to be transcribed [77]. Such patterns could be caused by transcriptional uncoupling of genes in polycistronic clusters [78]. Non-coding RNAs (ncRNAs) in the plastome are further transcribed from intergenic spacers (IGSs), which play an important role in post-transcriptional regulation [79]. Cyanobacteria contain several ncRNAs, making it plausible that plastomes also harbor a wide variety of undetected regulatory ncRNAs [80]. These results show that non-functional genes are transcribed as part of precursor polycistronic transcripts, which are later edited during pre-mRNA maturation. To activate the function of other genes, plastid primary transcripts are edited, and regulation of expression in the plastome mainly occurs at a post-transcriptional stage. The multiple transcription arrangement leading to the full transcription of plastid genomes is an ancestral prokaryotic trait still preserved in eukaryotic cells more than a billion years after the primary endosymbiosis [81, 82].
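
The percentages quoted in this section follow from simple ratios of the reported counts and lengths. The sketch below recomputes them as a sanity check; the constant names and the `percent` helper are illustrative, the genome length is the one given in the abstract, and small differences in the last digit relative to the reported values are to be expected from rounding.

```python
# Figures quoted in the text and abstract.
GENOME_LENGTH = 155_580     # bp, complete plastid genome
CONSENSUS_LENGTH = 148_110  # bp, consensus assembled from mapped RNA-seq reads
TOTAL_MAPPED_READS = 147_721
ATP_SYNTHASE_READS = 25_910
PSII_READS = 12_582


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


coverage = percent(CONSENSUS_LENGTH, GENOME_LENGTH)
atp_share = percent(ATP_SYNTHASE_READS, TOTAL_MAPPED_READS)
psii_share = percent(PSII_READS, TOTAL_MAPPED_READS)

print(f"genome covered by transcripts: {coverage:.1f}%")
print(f"ATP synthase share of reads:   {atp_share:.1f}%")
print(f"PSII share of reads:           {psii_share:.1f}%")
```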

Table 3. RNA Expression of protein-coding genes in the Solanum dulcamara chloroplast genome.

Reads per kilobase per million (RPKM), fragments per kilobase of exon per million fragments mapped (FPKM) and transcripts per million (TPM) for transcript variants.

Gene Start End Length FPKM RPKM TPM
atpB 54,285 55,781 1,497 278926.7 278926.7 232422.2
atpE 53,887 54,288 402 238932.2 238932.2 199095.9
clpP 71,842 73,864 591 120109.6 120109.6 100084.1
rps7 142,238 142,705 468 91956.3 91956.3 76624.7
rps7 98,701 99,168 468 88572.2 88572.2 73804.8
psbM 30,605 30,709 105 22431.7 22431.7 18691.7
psbA 552 1,613 1,062 21738.5 21738.5 18114.1
ycf1 125,388 131,069 5,682 21573.2 21573.2 17976.3
psbK 7,750 7,935 186 21287.0 21287.0 17737.9
psaJ 68,897 69,031 135 19101.3 19101.3 15916.6
rbcL 56,597 58,030 1,434 13932.8 13932.8 11609.9
rpl20 70,391 70,777 387 11700.0 11700.0 9749.3
psbI 8,248 8,406 159 11493.2 11493.2 9576.9
rps12 71,590 142,184 372 11407.7 11407.7 9505.7
rps12 71,590 100,015 372 11243.9 11243.9 9369.3
psbJ 65,856 65,978 123 10565.0 10565.0 8803.5
atpF 11,989 13,234 555 8579.1 8579.1 7148.8
psbE 66,378 66,629 252 8540.8 8540.8 7116.8
rps16 5,077 6,199 267 7528.7 7528.7 6273.4
atpH 13,637 13,882 246 7180.9 7180.9 5983.6
ycf1 110,382 111,527 1,146 6963.1 6963.1 5802.2
rps18 69,855 70,160 306 6768.2 6768.2 5639.8
rps15 124,723 124,986 264 6537.5 6537.5 5447.5
rps19 85,655 85,933 279 6258.8 6258.8 5215.3
rpl22 85,135 85,602 468 6225.9 6225.9 5187.8
rps14 38,024 38,326 303 6098.1 6098.1 5081.4
ndhH 123,425 124,606 1,182 5531.4 5531.4 4609.1
psbT 76,034 76,138 105 5511.2 5511.2 4592.4
rpl16 82,913 84,349 405 5113.7 5113.7 4261.1
psbZ 37,053 37,241 189 4941.9 4941.9 4117.9
psaC 118,619 118,864 246 4704.7 4704.7 3920.3
cemA 62,915 63,604 690 4472.9 4472.9 3727.1
rps3 84,494 85,150 657 4249.4 4249.4 3540.9
ycf3 43,702 45,689 507 4245.1 4245.1 3537.4
psbC 34,984 36,369 1,386 4211.8 4211.8 3509.6
psbB 74,308 75,834 1,527 4135.4 4135.4 3445.9
rpl33 69,463 69,663 201 3838.7 3838.7 3198.7
ndhA 121,171 123,423 1,092 3830.3 3830.3 3191.7
rpl2 153,916 155,406 825 3704.0 3704.0 3086.5
psaB 38,445 40,649 2,205 3480.8 3480.8 2900.4
petN 29,403 29,492 90 3384.1 3384.1 2819.9
psaA 40,675 42,927 2,253 3343.5 3343.5 2786.1
rpl2 86,000 87,490 825 3322.6 3322.6 2768.6
psbD 33,939 35,000 1,062 3259.8 3259.8 2716.3
petB 76,806 78,207 652 3238.8 3238.8 2698.8
ndhK 50,792 51,535 744 3097.5 3097.5 2581.1
ndhI 120,574 121,077 504 2860.4 2860.4 2383.5
ndhB 96,202 98,413 1,533 2834.4 2834.4 2361.9
ndhB 142,993 145,204 1,533 2794.7 2794.7 2328.7
rps2 16,048 16,758 711 2770.1 2770.1 2308.3
atpA 10,411 11,934 1,524 2618.0 2618.0 2181.5
atpI 15,056 15,799 744 2565.4 2565.4 2137.6
rps8 81,838 82,242 405 2506.7 2506.7 2088.8
rpl14 82,410 82,778 369 2366.1 2366.1 1971.6
ndhJ 50,210 50,686 477 2256.1 2256.1 1879.9
psbN 76,212 76,343 132 2230.4 2230.4 1858.6
ndhC 51,526 51,888 363 2097.6 2097.6 1747.9
petD 78,398 79,611 483 1639.5 1639.5 1366.2
psbH 76,455 76,676 222 1554.9 1554.9 1295.6
ndhG 119,645 120,175 531 1453.1 1453.1 1210.8
matK 2,136 3,665 1,530 1446.5 1446.5 1205.4
petG 67,909 68,022 114 1424.9 1424.9 1187.3
rpoC1 21,302 24,105 2,067 1414.5 1414.5 1178.7
rps11 80,882 81,298 417 1412.1 1412.1 1176.6
petA 63,824 64,786 963 1244.0 1244.0 1036.6
rpoA 79,803 80,816 1,014 1201.5 1201.5 1001.1
ycf4 61,594 62,148 555 1170.7 1170.7 975.5
ndhE 119,116 119,421 306 1128.0 1128.0 940.0
rpl23 153,616 153,897 282 1116.0 1116.0 930.0
psaI 61,037 61,147 111 1097.5 1097.5 914.6
rpl23 87,509 87,790 282 1080.0 1080.0 900.0
rpl32 114,524 114,691 168 966.9 966.9 805.7
accD 58,765 60,288 1,524 906.0 906.0 754.9
rps4 46,706 47,311 606 770.6 770.6 642.1
rpoB 24,111 27,338 3,228 673.0 673.0 560.8
petL 67,627 67,722 96 634.5 634.5 528.7
ndhD 116,999 118,501 1,503 580.9 580.9 484.1
rpoC2 16,983 21,149 4,167 438.5 438.5 365.4
rpl36 81,400 81,513 114 356.2 356.2 296.8
ycf2 88,118 94,960 6,843 308.6 308.6 257.1
ycf2 146,446 153,288 6,843 281.9 281.9 234.9
ndhF 111,507 113,729 2,223 246.6 246.6 205.5
ccsA 115,826 116,767 942 215.5 215.5 179.6
ycf15 95,045 95,308 264 76.9 76.9 64.1
ycf15 146,098 146,361 264 76.9 76.9 64.1
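
The three expression units in Table 3 differ only in how raw counts are normalised. The sketch below shows the standard definitions, assuming one read (or fragment) count per gene; the read counts are invented placeholders, only the gene lengths are taken from Table 3, and the function names are hypothetical. FPKM uses the same formula as RPKM with fragments in place of reads, so the two coincide when every fragment contributes one count.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of gene per million mapped reads."""
    total_reads = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths[g] * total_reads) for g in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    total_rate = sum(rates.values())
    return {g: rates[g] / total_rate * 1e6 for g in counts}


def tpm_from_rpkm(values):
    """Convert RPKM (or FPKM) values to TPM by rescaling them to sum to 1e6."""
    total = sum(values.values())
    return {g: v / total * 1e6 for g, v in values.items()}


# Gene lengths (bp) from Table 3; the read counts are made-up placeholders.
lengths = {"atpB": 1497, "atpE": 402, "psbM": 105}
counts = {"atpB": 3000, "atpE": 700, "psbM": 20}

for gene, value in tpm(counts, lengths).items():
    print(gene, round(value, 1))
```

Because TPM is RPKM (or FPKM) rescaled to sum to one million, the TPM values within one sample are a constant multiple of the FPKM values, which is the pattern visible between those two columns of Table 3.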

Plastid RNA editing

Chloroplast RNA editing was first discovered in 1991 [83] and can be defined as the post-transcriptional modification of pre-RNAs by insertion, deletion or substitution of specific nucleotides to form functional RNAs. In the plastid genome this processing machinery is crucial for altering the long pre-RNA transcripts detailed above. The most frequent editing events in plants are C-to-U changes; however, U-to-C editing has also been observed [84]. RNA editing is absent in liverworts and green algae, while it is abundant in lycophytes, ferns and hornworts [85]. To gain insight into the RNA metabolism of bittersweet we first predicted 28 RNA editing sites across 35 plastid genes with PREP (Table 4). We aligned RNA read sequences using bittersweet as a reference genome and, by variant searching, confirmed 23 of the editing sites predicted with PREP. The variant search found four additional editing sites not detected by PREP, resulting in 27 confirmed editing sites. Of these, 25 (92.5%) were C-to-U changes and the remaining two were an A-to-G and a G-to-U conversion, resulting in non-synonymous amino acid changes. The conversion rate of each edit, calculated as the proportion of mapped reads carrying the alternate base rather than the reference base, ranged from 25 to 95.9%. Some edits showed high rates (>90%) in the atpF, ndhB, petB, psbE and rps14 genes, making it clear that these forms are highly abundant among processed RNAs in bittersweet. Edits of these particular genes have also been reported in previous studies of embryophytes [86, 87], suggesting that such sites are conserved. It has been proposed that RNA editing is of monophyletic origin and evolved as a mechanism to conserve certain codons [88]. For example, the start codon (AUG) of psbL and ndhD is created by C-to-U editing in all Solanaceae except Datura stramonium, where the start codon of psbL remains unedited.
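
How a single C-to-U edit changes the encoded amino acid can be checked directly against the standard genetic code. The sketch below is illustrative only: it assumes 1-based nucleotide positions in which every third base closes a codon, which is how the Nt pos and AA pos columns of Table 4 relate, and the helper names `codon_position` and `apply_edit` are hypothetical rather than part of PREP or of the pipeline used in this study.

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in UCAG order at each position.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_position(nt_pos):
    """Map a 1-based nucleotide position to (1-based codon number, 0-based offset)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3


def apply_edit(codon, offset, new_base):
    """Return (edited codon, amino acid before, amino acid after)."""
    edited = codon[:offset] + new_base + codon[offset + 1:]
    return edited, CODON_TABLE[codon], CODON_TABLE[edited]


# atpF site from Table 4: nucleotide 92, codon CCA, C-to-U edit.
aa_pos, offset = codon_position(92)
print(aa_pos, apply_edit("CCA", offset, "U"))
```

The same two helpers reproduce the start-codon case mentioned above: an edit at the second position of ACG yields AUG, changing threonine to methionine.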

Table 4. RNA editing sites in the Solanum dulcamara chloroplast genome.

Gene Length Strand Region Nt pos AA pos Effect Nt change Score RNA-seq PREP Number of reads
atpF 1246 + LSC 92 31 CCA (P) => CUA (L) C => U 0.86 + + U; 49 (90.7%), C; 5 (9.3%)
ndhA 2258 + SSC 341 114 UCA (S) => UUA (L) C => U 1 + + U; 35 (70%), C; 15 (30%)
ndhA 2258 + SSC 566 189 UCA (S) => UUA (L) C => U 1 + + U; 20 (34.4%), C; 38 (65.6%)
ndhA 2258 + SSC 1073 358 UCC (S) => UUC (F) C => U 1 + + U; 49 (74.2%), C; 17 (25.8%)
ndhB 2212 + IR 149 50 UCA (S) => UUA (L) C => U 1 + + U; 33 (86.8%), C; 5 (13.1%)
ndhB 2212 + IR 467 156 CCA (P) => CUA (L) C => U 1 + + U; 34 (87.1%), C; 5 (12.9%)
ndhB 2212 + IR 586 196 CAU (H) => UAU (Y) C => U 1 + + U; 26 (82.3%), C; 8 (17.7%)
ndhB 2212 + IR 611 204 UCA (S) => UUA (L) C => U 0.80 + + U; 33 (89.1%), C; 4 (10.9%)
ndhB 2212 + IR 737 246 CCA (P) => CUA (L) C => U 1 + + U; 47 (95.9%), C; 2 (4.1%)
ndhB 2212 + IR 746 249 UCU (S) => UUU (F) C => U 1 + + U; 40 (95.2%), C; 2 (4.8%)
ndhB 2212 + IR 780 260 UGG (W) => UGU (C) G => U - + - U; 32 (50.8%), G; 31 (49.2%)
ndhB 2212 + IR 830 277 UCA (S) => UUA (L) C => U 1 + + U; 44 (97.1%), C; 1 (2.9%)
ndhB 2212 + IR 836 279 UCA (S) => UUA (L) C => U 1 - + -
ndhB 2212 + IR 1481 494 CCA (P) => CUA (L) C => U 1 + + U; 20 (52.6%), C; 18 (47.4%)
ndhD 1504 + SSC 2 1 ACG (T) => AUG (M) C => U - + - U; 40 (95.2%), C; 2 (4.8%)
ndhF 2223 + SSC 290 97 UCA (S) => UUA (L) C => U 1 - + -
petB 1398 - LSC 1168 390 CGG (R) => UGG (W) C => U 1 + + U; 15 (93.8%), C; 1 (6.2%)
petB 1398 - LSC 1361 454 CCA (P) => CUA (L) C => U 1 + + U; 23 (74.2%), C; 8 (25.8%)
psbE 252 + LSC 214 72 CCU (P) => UCU (S) C => U 1 + + U; 112 (93.3%), C; 8 (6.7%)
psbL 124 + LSC 2 1 ACG (T) => AUG (M) C => U - + - U; 40 (95.2%), C; 2 (4.8%)
rpl20 387 + LSC 308 103 UCA (S) => UUA (L) C => U 0.86 + + U; 107 (56.6%), C; 82 (43.4%)
rpoA 1014 + LSC 830 277 UCA (S) => UUA (L) C => U 1 + + U; 8 (61.5%), C; 5 (38.5%)
rpoA 1014 + LSC 903 301 AUG (M) => GUG (V) A => G - + - G; 25 (62.5%), A; 15 (37.5%)
rpoB 3213 + LSC 338 113 UCU (S) => UUU (F) C => U 1 + + U; 15 (75%), C; 5 (25%)
rpoB 3213 + LSC 473 158 UCA (S) => UUA (L) C => U 0.86 + + U; 13 (76.5%), C; 4 (23.5%)
rpoB 3213 + LSC 551 184 UCA (S) => UUA (L) C => U 1 - + -
rpoB 3213 + LSC 2000 667 UCU (S) => UUU (F) C => U 1 - + -
rpoB 3213 + LSC 2426 809 UCA (S) => UUA (L) C => U 0.86 + + U; 5 (25%), C; 15 (75%)
rpoC1 2783 + LSC 41 14 UCA (S) => UUA (L) C => U 1 + + U; 5 (27.7%), C; 13 (72.3%)
rpoC2 4167 + LSC 119 40 CCC (P) => CUC (L) C => U - + - U; 10 (37.1%), C; 17 (62.9%)
rpoC2 4167 + LSC 3731 1244 UCA (S) => UUA (L) C => U 0.86 - + -
rps2 711 + LSC 134 45 ACA (T) => AUA (I) C => U - + - C; 8 (38.1%), U; 13 (61.9%)
rps2 711 + LSC 248 83 UCA (S) => UUA (L) C => U 1 + + C; 5 (31.3%), U; 11 (68.7%)
rps14 303 + LSC 80 27 UCA (S) => UUA (L) C => U 1 + + C; 5 (5.8%), U; 81 (94.2%)
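
The percentages in the last column of Table 4 are the share of mapped reads supporting each base at the site. The following is a minimal sketch of that calculation, assuming the cell format `BASE; count (percent%)` used in the table; the regular expression and the function names are illustrative assumptions, not the variant-calling procedure used in the study.

```python
import re


def parse_read_support(cell):
    """Extract {base: read count} from a cell such as 'U; 49 (90.7%), C; 5 (9.3%)'."""
    return {base: int(n) for base, n in re.findall(r"([ACGU]);\s*(\d+)", cell)}


def editing_rate(support, edited_base):
    """Percentage of reads at the site that carry the edited base."""
    total = sum(support.values())
    return 100.0 * support[edited_base] / total


atpF = parse_read_support("U; 49 (90.7%), C; 5 (9.3%)")
print(round(editing_rate(atpF, "U"), 1))
```

With few reads at a site the estimate is coarse: a site supported by 20 reads can only take rates in steps of 5%, so low-depth rows should be read with that resolution in mind.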

Conclusions

Comparison of chloroplast genome organization not only provides valuable information for understanding the processes of chloroplast evolution, but also gives insights into the mechanisms underlying genomic rearrangements [25]. Furthermore, investigation of plastid genome structures could trigger further breakthroughs in applied sciences. For example, herbicides like PSI and PSII inhibitors have their target genes in the chloroplast genome; thus understanding the chloroplast genome may indirectly support the exploration of herbicide resistance and the development of novel control methods [89], while plastid engineering can also be useful to develop resistance to various abiotic and biotic stress factors based on discovered resistance traits. Here we report the complete chloroplast genome sequence of Solanum dulcamara as a genomic tool for potential plastid genome comparative studies. We also present the reannotation of Solanaceae plastid genomes by manual curation with S. dulcamara as a reference. Based on the reannotated genome sequences we introduce a hypothesis of the ancestral plastid genome organization of Solanaceae and the rearrangements unique to some major clades. The ancestral plastid genome of Solanaceae had two degraded non-functional genes, infA and a truncated ycf1 copy, a deletion in the trnA intron and a highly divergent gene (sprA). Our ancestral genome reconstruction suggests further rearrangements in the stem branch of Solanoideae through the expansion of the IR and, as a consequence of that expansion, the occurrence of a truncated ψrps19 copy at the JLA. This has been followed by independent rearrangements in deeper nodes, such as the accumulation of trnF pseudogenes in tandem arrays in a clade referred to as the ‘Pseudosolanoids’ [68] or the pseudogenization of sprA in Physaleae and Capsiceae by two deletions. Further degradation of the infA pseudogene is specific to the largest genus, Solanum, which includes tomato and potato.

Supporting information

S1 Data. MAFFT sequence alignment for 35 complete plastid genome sequences used in phylogenetic analysis.

(RAR)

S2 Data. NEXUS file containing the binary coding used to map genomic changes appearing in the chloroplast genome.

(RAR)

S3 Data. Annotated checklist of SSRs in the Solanum dulcamara plastid genome found by MISA.

(RAR)

S4 Data. Reannotation file of Solanaceae plastid genomes in Geneious format, accessible with 7.1 or later version.

(RAR)

S5 Data. Reannotation files in GFF and GB file format.

(ZIP)

S1 Table. NCBI GenBank accession numbers used in this study.

(DOCX)

S2 Table. List of genes in the chloroplast genome of bittersweet.

(DOCX)

S3 Table. Intron-containing genes in the Solanum dulcamara plastid genome and the lengths of their exons and introns.

(DOCX)

S4 Table. Comparison of major features of Solanum dulcamara and nine Solanaceae plastid genomes.

(DOCX)

S5 Table. Relative synonymous codon usage (RSCU) of Solanum dulcamara is given in parentheses following the codon frequency.

(DOCX)

S6 Table. Estimates of average evolutionary divergence over 80 protein coding-gene sequences from Solanaceae.

(DOCX)

S7 Table. Total number of perfect simple sequence repeats (SSRs) identified within the chloroplast genome of Solanum dulcamara.

(DOCX)

S8 Table. List of annotation errors found in Solanaceae chloroplast genomes.

(XLSX)

S1 Fig. Best scoring maximum likelihood trees obtained with RAxML and the most parsimonious tree generated with TNT.

(DOCX)

S2 Fig. Visualization of the alignment of chloroplast genome sequences with mVISTA-based identity plots.

(PNG)

S3 Fig. Alignment of the sprA gene in Solanaceae.

(PDF)

S4 Fig. RNAseq reads mapped to the genomic region of ycf15 pseudogene.

(PDF)

Acknowledgments

We thank the staff and colleagues of the Viikki Biocenter who kindly contributed reagents, materials and analysis tools for our study.

Data Availability

The chloroplast genome data is available from NCBI under accession number KY863443.

Funding Statement

The authors received no specific funding for this work.

References

  • 1. Knapp S. The revision of the Dulcamaroid clade of Solanum L. (Solanaceae). PhytoKeys. 2013; 22: 1–432.
  • 2. Máthé I, Máthé I Jr. Variations in alkaloids in Solanum dulcamara L. In: Hawkes JG, Lester RN, Skelding AD (Eds) The biology and taxonomy of the Solanaceae. Academic Press, London, 1979; pp. 211–222.
  • 3. Kumar P, Sharma B, Bakshi N. Biological activity of alkaloids from Solanum dulcamara L. Nat Prod Res. 2009; 23: 719–723. doi: 10.1080/14786410802267692
  • 4. D’Arcy WG. Solanaceae studies II: Typification of subdivisions of Solanum. Ann Miss Bot Gard. 1972; 59: 262–278.
  • 5. Nee M. Synopsis of Solanum in the New World. In: Nee M, Symon DE, Lester RN, Jessop JP (Eds) Solanaceae IV: Advances in biology and utilization. Royal Botanic Gardens, Kew, 1999; pp. 285–333.
  • 6. Lester RN. Evolutionary relationships of tomato, potato, pepino and wild species of Lycopersicon and Solanum. In: Hawkes JG, Lester RN, Nee M, Estrada-R N (Eds) Solanaceae III: Taxonomy, Chemistry and Evolution. Royal Botanic Gardens, Kew, 1991; pp. 283–301.
  • 7. Child A, Lester RN. Synopsis of the genus Solanum L. and its infrageneric taxa. In: van den Berg RG, Barendse GWM, van der Weerden GM, Mariani C (Eds) Solanaceae V: advances in taxonomy and utilization. Nijmegen University Press, 2001; pp. 39–52.
  • 8. Bohs L. Major clades in Solanum based on ndhF sequence data. In: Keating RC, Hollowell VC, Croat TB (Eds) A Festschrift for William G. D’Arcy: the legacy of a taxonomist. Missouri Botanical Garden Press, St. Louis (Monographs in systematic botany from the Missouri Botanical Garden), 2005; 104: 27–49.
  • 9. Weese T, Bohs L. A three-gene phylogeny of the genus Solanum (Solanaceae). Syst Bot. 2007; 32: 445–463.
  • 10. Särkinen T, Bohs L, Olmstead RG, Knapp S. A phylogenetic framework for evolutionary study of the nightshades (Solanaceae): a dated 1000-tip tree. BMC Evol Biol. 2013; 13: 214. doi: 10.1186/1471-2148-13-214
  • 11. Särkinen T, Barboza GE, Knapp S. True black nightshades: phylogeny and delimitation of the Morelloid clade of Solanum. Taxon. 2015; 64: 945–958.
  • 12. Takács AP, Kazinczi G, Horváth J, Pribék D. New host-virus relations between different Solanum species and viruses. Meded Rijkuniv Gent Fak Landbouwkd Toegep Biol Wt. 2001; 66: 183–186.
  • 13. Perry KL, McLane H. Potato virus M in bittersweet nightshade (Solanum dulcamara) in New York State. Plant Dis. 2011; 95: 619–623.
  • 14. Hajianfar R, Kolics B, Cernák I, Wolf I, Polgár Zs, Taller J. Expression of biotic stress response genes to Phytophthora infestans inoculation in White Lady, a potato cultivar with race-specific resistance to late blight. Physiol Mol Plant Pathol. 2016; 93: 22–28.
  • 15. Golas TM, Weerden GMVD, Berg RGVD, Mariani C, Allefs JJHM. Role of Solanum dulcamara L. in potato late blight epidemiology. Potato Res. 2010; 53: 69–81.
  • 16. Golas TM, Feron RMC, van den Berg RG, van der Weerden GM, Mariani C, Allefs JJHM. Genetic structure of European accessions of Solanum dulcamara L. (Solanaceae). Plant Syst Evol. 2010a; 285: 103–110.
  • 17. Poczai P, Varga I, Bell NE, Hyvönen J. Genetic diversity assessment of bittersweet (Solanum dulcamara, Solanaceae) germplasm using conserved DNA-derived polymorphism and intron-targeting markers. Ann Appl Biol. 2011; 159: 141–153.
  • 18. Vallejo-Marín M, O’Brien HE. Correlated evolution of self-incompatibility and clonal reproduction in Solanum (Solanaceae). New Phytol. 2006; 173: 415–421.
  • 19. Hewitt GM. Genetic consequences of climatic oscillations in the Quaternary. Phil Trans R Soc Lond B. 2004; 359: 183–195.
  • 20. Oldenburg DJ, Bendich AJ. DNA maintenance in plastids and mitochondria of plants. Frontiers Plant Sci. 2015; 6: 883.
  • 21. Oldenburg DJ, Bendich AJ. The linear plastid chromosomes of maize: terminal sequences, structures, and implications for DNA replication. Curr Genet. 2016; 62: 431–442. doi: 10.1007/s00294-015-0548-0
  • 22. Daniell H, Lin C-S, Yu M, Chang W-J. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 2016; 17: 134. doi: 10.1186/s13059-016-1004-2
  • 23. Kim KJ, Lee HL. Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA Res. 2004; 11: 247–261.
  • 24. Weng M-L, Blazier JC, Govindu M, Jansen RK. Reconstruction of the ancestral plastid genome in Geraniaceae reveals a correlation between genome rearrangements, repeats, and nucleotide substitution rates. Mol Biol Evol. 2014; 31: 645–659. doi: 10.1093/molbev/mst257
  • 25. Poczai P, Hyvönen J. The complete chloroplast genome sequence of the CAM epiphyte Spanish moss (Tillandsia usneoides, Bromeliaceae) and its comparative analysis. PLoS ONE. 2017; 12: e0187199. doi: 10.1371/journal.pone.0187199
  • 26. Karol KG, Arumuganathan K, Boore JL, Duffy AM, Everett KDE, Hall JD et al. Complete plastome sequences of Equisetum arvense and Isoetes flaccida: implications for phylogeny and plastid genome evolution of early land plant lineages. BMC Evol Biol. 2010; 10: 321. doi: 10.1186/1471-2148-10-321
  • 27. Jansen RK, Cai Z, Raubeson LA, Daniell H, dePamphilis CW, Leebens-Mack J et al. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc Natl Acad Sci USA. 2007; 104: 19369–19374. doi: 10.1073/pnas.0709121104
  • 28. Shi C, Hu N, Huang H, Gao J, Zhao Y-J, Gao L-Z. An improved chloroplast DNA extraction procedure for whole plastid genome sequencing. PLoS ONE. 2012; 7: e31468. doi: 10.1371/journal.pone.0031468
  • 29. Atherton RA, McComish BJ, Shepherd LD, Berry LA, Albert NW, Lockhart PJ. Whole genome sequencing of enriched chloroplast DNA using the Illumina GAII platform. Plant Meth. 2010; 6: 22.
  • 30. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014; 30: 2114–2120. doi: 10.1093/bioinformatics/btu170
  • 31. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012; 28: 1647–1649. doi: 10.1093/bioinformatics/bts199
  • 32. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008; 18: 821–829. doi: 10.1101/gr.074492.107
  • 33. Moore MJ, Bell CD, Soltis PS, Soltis DE. Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proc Natl Acad Sci USA. 2007; 104: 19363–19368. doi: 10.1073/pnas.0708072104
  • 34. Wyman SK, Jansen RK, Boore JL. Automatic annotation of organellar genomes with DOGMA. Bioinformatics. 2004; 20: 3252–3255. doi: 10.1093/bioinformatics/bth352
  • 35. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997; 25: 955–964.
  • 36. Liu C, Shi L, Chen H, Zhang J, Lin X, Guan X. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC Genomics. 2012; 13: 715. doi: 10.1186/1471-2164-13-715
  • 37. McKain MR, Hartsock RH, Wohl MM, Kellogg EA. Verdant: automated annotation, alignment and phylogenetic analysis of whole chloroplast genomes. Bioinformatics. 2017; 33: 130–132. doi: 10.1093/bioinformatics/btw583
  • 38. Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R et al. GeSeq–versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017; 45 (W1): W6–W11. doi: 10.1093/nar/gkx391
  • 39. Lohse M, Drechsel O, Bock R. OrganellarGenomeDRAW (OGDRAW)–a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr Genet. 2007; 52: 267–274. doi: 10.1007/s00294-007-0161-y
  • 40. Kumar S, Stecher G, Tamura K. MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016; 33: 1870–1874. doi: 10.1093/molbev/msw054
  • 41. Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I. VISTA: computational tools for comparative genomics. Nucleic Acids Res. 2004; 32: W273–279. doi: 10.1093/nar/gkh458
  • 42. Amiryousefi A, Hyvönen J, Poczai P. IRscope: an online program to visualize the junction sites of chloroplast genomes. Bioinformatics. 2018; bty220, https://doi.org/10.1093/bioinformatics/bty220
  • 43. Kurtz S, Schleiermacher C. REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics. 1999; 15: 426–427.
  • 44. Thiel T, Michalek W, Varshney RK, Graner A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet. 2003; 106: 411–422. doi: 10.1007/s00122-002-1031-0
  • 45. Langmead B, Trapnell C, Pop M, Salzberg S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009; 10: R25. doi: 10.1186/gb-2009-10-3-r25
  • 46. Mower JP. The PREP suite: predictive RNA editors for plant mitochondrial genes, chloroplast genes and user-defined alignments. Nucleic Acids Res. 2009; 37: W253–W259. doi: 10.1093/nar/gkp337
  • 47. Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. 2012; arXiv preprint arXiv:1207.3907 [q-bio.GN]
  • 48. Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in performance and usability. Mol Biol Evol. 2013; 30: 772–780. doi: 10.1093/molbev/mst010
  • 49.RAxML Next Generation: faster, easier-to-use and more flexible. 2018; doi: 10.5281/zenodo.593079
  • 50.Lanfear R, Frandsen PB, Wright AM, Senfeld T, Calcott B. ParitionFinder2: new methods for selecting partitioned models of evolution for molecular and morphological phylogenetic analyses. Mol Biol Evol. 2017; 34:772–773. doi: 10.1093/molbev/msw260 [DOI] [PubMed] [Google Scholar]
  • 51.Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, new heuristics and high-performance computing. Nature Math. 2012; 9:772. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Nguyen L-T, Schmidt HA, Haeseler VA, Minh BQ. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015; 32:268–274. doi: 10.1093/molbev/msu300 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Goloboff PA, Farris JS, Nixon KC. TNT, a free program for phylogenetic analysis. Cladistics. 2008; 24: 774–786.
  • 54.Maddison WP, Maddison DR. Mesquite: a modular system for evolutionary analysis. 2017; Version 3.2 http://mesquiteproject.org
  • 55.Stöver BC, Müller KF. TreeGraph2: combining and visualizing evidence from different phylogenetic analyses. BMC Bioinf. 2010; 11:7.
  • 56.Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsubayashi T et al. The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 1986; 5:2043–2049.
  • 57.Vera A, Sugiura M. A novel RNA gene in the tobacco plastid genome: its possible role in the maturation of 16S rRNA. EMBO J. 1994; 13: 2211–2217.
  • 58.Sugita M, Svab Z, Maliga P, Sugiura M. Targeted deletion of sprA from the tobacco plastid genome indicates that the encoded small RNA is not essential for pre-16S rRNA maturation in plastids. Mol Gen Genet. 1997; 257:23–27.
  • 59.Olmstead RG, Bohs L, Migid HA, Santiago-Valentin E, Garcia VF, Collier SM. A molecular phylogeny of the Solanaceae. Taxon. 2008; 57: 1159–1181.
  • 60.Smith SD, Baum DA. Phylogenetics of the florally diverse Andean clade Iochrominae (Solanaceae). Am J Bot. 2006; 93: 1140–1153. doi: 10.3732/ajb.93.8.1140
  • 61.Olmstead RG, Palmer JD. A chloroplast DNA phylogeny of the Solanaceae: subfamilial relationships and character evolution. Ann Miss Bot Gard. 1992; 79: 346–360.
  • 62.Tomato Genome Consortium. The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012; 485: 635–641. doi: 10.1038/nature11119
  • 63.Bombarely A, Moser M, Amrad A, Bao M, Bapaume L, Barry S et al. Insight into the evolution of the Solanaceae from the parental genomes of Petunia hybrida. Nature Plants. 2016; 2:16074. doi: 10.1038/nplants.2016.74
  • 64.Olmstead RG, Palmer JD. A chloroplast DNA phylogeny of the Solanaceae: subfamilial relationships and character evolution. Ann Miss Bot Gard. 1992; 79: 346–360.
  • 65.Chung H-J, Jung JD, Park H-W, Kim J-H, Cha HW, Min SR et al. The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence. Plant Cell Rep. 2006; 25: 1369–1379. doi: 10.1007/s00299-006-0196-4
  • 66.Poczai P, Hyvönen J. Identification and characterization of plastid trnF(GAA) pseudogenes in four species of Solanum (Solanaceae). Biotech Lett. 2011; 33: 2317–2323.
  • 67.Poczai P, Hyvönen J. Plastid trnF pseudogenes are present in Jaltomata, the sister genus of Solanum (Solanaceae): molecular evolution of tandemly repeated structural mutations. Gene. 2013; 530:143–150. doi: 10.1016/j.gene.2013.08.013
  • 68.Poczai P, Hyvönen J. Discovery of novel plastid phenylalanine (trnF) pseudogenes defines a distinctive clade in Solanaceae. SpringerPlus. 2013; 2: 459. doi: 10.1186/2193-1801-2-459
  • 69.Schmitz-Linneweber C, Regel R, Du TG, Hupfer H, Herrmann RG, Maier RM. The plastid chromosome of Atropa belladonna and its comparison with that of Nicotiana tabacum: the role of RNA editing in generating divergence in the process of plant speciation. Mol Biol Evol. 2002; 19: 1602–1612. doi: 10.1093/oxfordjournals.molbev.a004222
  • 70.Kahlau S, Aspinall S, Gray JC, Bock R. Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes. J Mol Evol. 2006; 63: 194–207. doi: 10.1007/s00239-005-0254-5
  • 71.Daniell H, Lee S-B, Grevich J, Saski C, Quesada-Vargas T, Guda C et al. Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes. Theor Appl Genet. 2006; 112: 1503–1518. doi: 10.1007/s00122-006-0254-x
  • 72.Jo YD, Park J, Kim J, Song W, Hur C-G, Lee Y-H et al. Complete sequencing and comparative analyses of the pepper (Capsicum annuum L.) plastome revealed high frequency of tandem repeats and large insertion/deletions on pepper plastome. Plant Cell Rep. 2011; 30: 217–229. doi: 10.1007/s00299-010-0929-2
  • 73.Sanchez-Puerta MV, Abbona CC. The chloroplast genome of Hyoscyamus niger and a phylogenetic study of the tribe Hyoscyameae (Solanaceae). PLoS ONE. 2014; 9: e98353. doi: 10.1371/journal.pone.0098353
  • 74.Yang Y, Yuanye D, Qing L, Jinjian L, Xiwen L, Yitao W. Complete chloroplast genome sequence of poisonous and medicinal plant Datura stramonium: organizations and implications for genetic engineering. PLoS ONE. 2014; 9: e110656. doi: 10.1371/journal.pone.0110656
  • 75.Samson N, Bausher MG, Lee S-B, Jansen RK, Daniell H. The complete nucleotide sequence of the coffee (Coffea arabica L.) chloroplast genome: organization and implications for biotechnology and phylogenetic relationships amongst angiosperms. Plant Biotech J. 2007; 5:339–353.
  • 76.Shi C, Wang S, Xia E-H, Jiang J-J, Zeng F-C, Gao L-Z. Full transcription of the chloroplast genome in photosynthetic eukaryotes. Sci Rep. 2016; 6:30135. doi: 10.1038/srep30135
  • 77.Woodbury NW, Roberts LL, Palmer JD, Thompson WF. A transcription map of the pea chloroplast genome. Curr Genet. 1988; 14: 75–89.
  • 78.Zhelyazkova P, Sharma CM, Förstner KU, Liere K, Vogel J, Börner T. The primary transcriptome of barley chloroplasts: numerous noncoding RNAs and the dominating role of the plastid-encoded RNA polymerase. Plant Cell. 2012; 24: 1123–1136.
  • 79.Germain A, Hotto AM, Barkan A, Stern DB. RNA processing and decay in plastids. WIREs RNA. 2013; 4: 295–316. doi: 10.1002/wrna.1161
  • 80.Hotto AM, Germain A, Stern DB. Plastid non-coding RNAs: emerging candidates for gene regulation. Trends Plant Sci. 2012; 17: 737–744. doi: 10.1016/j.tplants.2012.08.002
  • 81.Jacquier A. The complex eukaryotic transcriptome: unexpected pervasive transcription and novel small RNAs. Nature Rev Genet. 2009; 10: 833–844. doi: 10.1038/nrg2683
  • 82.Shi C, Liu Y, Huang H, Xia E-H, Zhang H-B, Gao L-Z. Contradiction between plastid gene transcription and function due to complex posttranscriptional splicing: an exemplary study of ycf15 function and evolution in angiosperms. PLoS ONE. 2013; 8: e59620. doi: 10.1371/journal.pone.0059620
  • 83.Hoch B, Maier RM, Appel K, Igloi GL, Kössel H. Editing of a chloroplast mRNA by creation of an initiation codon. Nature. 1991; 353: 178–180. doi: 10.1038/353178a0
  • 84.Tsudzuki T, Wakasugi T, Sugiura M. Comparative analysis of RNA editing sites in higher plant chloroplasts. J Mol Evol. 2001; 53: 327–332. doi: 10.1007/s002390010222
  • 85.Oldenkott B, Yamaguchi K, Tsuji-Tsukinoki S, Knie N, Knoop V. Chloroplast RNA editing going extreme: more than 3400 events of C-to-U editing in the chloroplast transcriptome of the lycophyte Selaginella uncinata. RNA. 2014; 20: 1499–1506. doi: 10.1261/rna.045575.114
  • 86.Lee J, Kang Y, Shin SC, Park H, Lee H. Combined analysis of the chloroplast genome and transcriptome of the antarctic vascular plant Deschampsia antarctica Desv. PLoS ONE. 2014; 9: e92501. doi: 10.1371/journal.pone.0092501
  • 87.Wang W, Zhang W, Wu Y, Maliga P, Messing J. RNA Editing in chloroplasts of Spirodela polyrhiza, an aquatic monocotelydonous species. PLoS ONE. 2015; 10: e0140285. doi: 10.1371/journal.pone.0140285
  • 88.Tillich M, Lehwark P, Morton BR, Maier UG. The evolution of chloroplast RNA editing. Mol Biol Evol. 2006; 23: 1912–1921. doi: 10.1093/molbev/msl054
  • 89.Nagy E, Hegedűs G, Taller J, Kutasy B, Virág E. Illumina sequencing of the chloroplast genome of common ragweed (Ambrosia artemisiifolia L.). Data Brief. 2017; 15:606–611. doi: 10.1016/j.dib.2017.10.009

Associated Data

Supplementary Materials

S1 Data. MAFFT sequence alignment for 35 complete plastid genome sequences used in phylogenetic analysis.

(RAR)

S2 Data. NEXUS file containing the binary coding used to map genomic changes appearing in the chloroplast genome.

(RAR)

S3 Data. Annotated checklist of SSRs in the Solanum dulcamara plastid genome, listing the hits found by MISA.

(RAR)

S4 Data. Reannotation file of Solanaceae plastid genomes in Geneious format, accessible with version 7.1 or later.

(RAR)

S5 Data. Reannotation files in GFF and GB file format.

(ZIP)

S1 Table. NCBI GenBank accession numbers used in this study.

(DOCX)

S2 Table. List of genes in the chloroplast genome of bittersweet.

(DOCX)

S3 Table. Intron-containing genes in the Solanum dulcamara plastid genome, with the lengths of their exons and introns.

(DOCX)

S4 Table. Comparison of major features of Solanum dulcamara and nine Solanaceae plastid genomes.

(DOCX)

S5 Table. Relative synonymous codon usage (RSCU) of Solanum dulcamara is given in parentheses following the codon frequency.

(DOCX)

S6 Table. Estimates of average evolutionary divergence over 80 protein-coding gene sequences from Solanaceae.

(DOCX)

S7 Table. Total number of perfect simple sequence repeats (SSRs) identified within the chloroplast genome of Solanum dulcamara.

(DOCX)

S8 Table. List of annotation errors found in Solanaceae chloroplast genomes.

(XLSX)

S1 Fig. Best scoring maximum likelihood trees obtained with RAxML and the most parsimonious tree generated with TNT.

(DOCX)

S2 Fig. Visualization of the chloroplast genome sequence alignment with mVISTA-based identity plots.

(PNG)

S3 Fig. Alignment of the sprA gene in Solanaceae.

(PDF)

S4 Fig. RNAseq reads mapped to the genomic region of ycf15 pseudogene.

(PDF)

Data Availability Statement

The chloroplast genome data is available from NCBI under accession number KY863443.


