PLOS ONE. 2019 Dec 20;14(12):e0226865. doi: 10.1371/journal.pone.0226865

Complete chloroplast genomes of two Siraitia Merrill species: Comparative analysis, positive selection and novel molecular marker development

Hongwu Shi 1,#, Meng Yang 1,#, Changming Mo 2, Wenjuan Xie 3, Chang Liu 1, Bin Wu 1,*, Xiaojun Ma 1,*
Editor: Shashi Kumar 4
PMCID: PMC6924677  PMID: 31860647

Abstract

Siraitia grosvenorii fruit, known as Luo-Han-Guo, has been used as a traditional Chinese medicine for many years, and mogrosides are its primary active ingredients. Unfortunately, Siraitia siamensis, its wild relative, might be misused due to its indistinguishable appearance, not only threatening the reliability of the medication but also partly exacerbating wild resource scarcity. Therefore, high-resolution genetic markers must be developed to discriminate between these species. Here, the complete chloroplast genomes of S. grosvenorii and S. siamensis were assembled and analyzed for the first time; they were 158,757 and 159,190 bp in length, respectively, and possessed conserved quadripartite circular structures. Both contained 134 annotated genes, including 8 rRNA, 37 tRNA and 89 protein-coding genes. Twenty divergences (Pi > 0.03) were found in the intergenic regions. Nine protein-coding genes, accD, atpA, atpE, atpF, clpP, ndhF, psbH, rbcL, and rpoC2, underwent selection within Cucurbitaceae. Phylogenetic relationship analysis indicated that these two species originated from the same ancestor. Finally, four pairs of molecular markers were developed to distinguish the two species. The results of this study will be beneficial for taxonomic research, identification and conservation of Siraitia Merrill wild resources in the future.

Introduction

Siraitia plants are important perennial vines belonging to the fourth most economically important plant family, Cucurbitaceae, and the genus has been widely cultivated as an economic crop in southern China and northern Thailand [1]. Among these crops, S. grosvenorii, a traditional Chinese medicinal plant native to Guangxi, China, has been cultivated for approximately 200 years. The fruit of S. grosvenorii, Luo-Han-Guo, has been used as a traditional Chinese medicine for the treatment of lung congestion, cold, and sore throat [2,3]. The primary active ingredients of S. grosvenorii are mogrosides, which are a class of cucurbitane-type triterpenoids, including mogrosides IV, V and VI. Modern pharmacology studies have shown that mogrosides have antidiabetic, antioxidative and anti-inflammatory effects [4,5]. Mogrosides are also natural zero-calorie sweeteners and have been used as sugar substitutes; mogroside V is 250 times sweeter than sucrose [6]. Compared to S. grosvenorii, S. siamensis has advantages in disease resistance and fruit set percentage [7]. Siamenoside I, a kind of mogroside, was separated from the fruit of S. siamensis and is approximately 560 times sweeter than sucrose [8]. In recent years, people have paid more attention to developing and utilizing Siraitia germplasm resources because of the importance of mogrosides in sweetener development.

Many studies have focused on improving cultivated varieties and exploring the potential medicinal value of medicinal plants [9–11], as well as on identifying wild species [12,13]. Among Siraitia plant materials, only S. grosvenorii fruit has been approved for medicinal use and is listed in the latest edition of the Chinese Pharmacopoeia [14]. Thus, indiscriminate use of wild relatives, such as S. siamensis, might reduce the therapeutic effect of Luo-Han-Guo. In addition, both S. grosvenorii and S. siamensis are dioecious and have a low natural pollination rate, which leads to few fruits, even though seed traits are important indicators for identifying these two species [15]. Most cultivated Siraitia plants originate from privately collected wild resources, without professional identification, and are given ordinary variety names [16]. These observations suggest that the cultivation of Siraitia species has been unsystematic and nonstandard to some extent. Moreover, the lack of an effective approach for distinguishing among Siraitia species has hindered genetic diversity studies and at least partly led to the gradual loss of some varieties. Thus, high-resolution molecular markers are urgently needed to solve these problems.

Universal molecular markers, such as ITS, rbcL and psbA, are widely used for identifying some species rapidly and accurately [17–19], but they cannot distinguish wild relatives. The chloroplast is a vital and semiautonomous plant cell organelle and has essential roles in photosynthesis and carbon fixation [20,21]. Although most plant chloroplast genomes display highly conserved structures, some structural rearrangements, including inverted repeat (IR) loss, gene loss and indels, are the result of adaptation to their environments [22]; thus, several highly variable regions could be developed as markers for species identification [23], such as indel and single nucleotide polymorphism (SNP) markers for Panax ginseng subspecies [24] and indel markers for Ipomoea nil and Ipomoea purpurea [25]. In addition, abundant closely related species could be identified by combining several markers, such as two indel markers for the identification of three Aconitum species [26].

In this study, the complete chloroplast genomes of S. grosvenorii and S. siamensis were assembled and analyzed, providing the first two chloroplast genome sequences for Siraitia species. Comparative analyses revealed that the IR regions and coding sequence regions are highly conserved, and the regions with higher variation were primarily located in intergenic regions. Phylogenetic relationship analysis supported the position of the two species in the basal lineage of Cucurbitaceae. The identification of nine protein-coding genes with several sites under positive selection contributes to further investigation of the adaptive evolution of plants in ecosystems. Finally, four novel molecular markers (GSPC-F/R, GSPR-F/R, GSPB-F/R and GSPY-F/R) were developed to distinguish the two species. Overall, the sequencing and analysis of the chloroplast genomes of two Siraitia species will be beneficial for enhancing medicinal safety and for the identification and conservation of wild Siraitia species, and it will provide new insight into plant adaptive evolution in ecosystems.

Materials and methods

Plant materials, DNA extraction, and sequencing

The fresh leaves of two-year-old S. grosvenorii and S. siamensis plants were collected from Guangxi Medicinal Botanical Garden of the Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College (Nanning, China), then frozen at -80°C until further use. Total DNA was extracted from approximately 100 mg samples using a plant genomic DNA kit (DP305) (Tiangen Biotech Co., Ltd., Beijing, China), DNA quality was assessed in a Nanodrop 2000 spectrophotometer (Thermo Scientific), and DNA integrity was evaluated using a 1.0% (w/v) agarose gel. DNA samples from each species were used to prepare two separate libraries with an average insert size of 500 bp and sequenced using an Illumina HiSeq 4000 (Illumina Inc., San Diego, CA, USA) with a standard protocol.

Chloroplast genome assembly

First, low-quality reads from all samples were filtered with Trimmomatic [27]. Then, the trimmed reads, which included nuclear and organelle genome data, were used to assemble the chloroplast genome. All plant chloroplast genomes available from the National Center for Biotechnology Information (NCBI) were searched against the Illumina paired-end reads using SRA-BLASTN with an E-value cutoff of 1e-5 [28]. Clean reads with high homology were considered plastome reads and used for downstream genome assembly. SPAdes (v3.10.1) and CLC Genomics Workbench (v7) were used for the de novo genome assembly; SPAdes was run with the parameters “-k 21,33,55,77,99,127 --careful” [29]. The contigs obtained were identified by Gepard and spanned the entire plastome [30]. All the identified contigs were assembled using the SeqMan module of DNASTAR (v11.0) [31]. Then, three scaffolds, comprising the large single-copy (LSC), the IR, and the small single-copy (SSC) regions, were obtained. The dedicated de novo organelle assembler NOVOPlasty was also used to reassemble the chloroplast genomes of the two species for verification [32]. To verify the assembly accuracy, the four boundaries between the single-copy (SC) regions and IR regions of the assembled sequence were confirmed by PCR amplification and Sanger sequencing, and the sequences of the primers are listed in S1 Table.
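The SPAdes parameter string quoted above can be written out as a full argument list. The following is a minimal sketch, not the authors' script: the read file names and output directory are hypothetical placeholders, and the helper only builds the command without running it. The check on k-mer sizes reflects the SPAdes requirement that they be odd, below 128, and listed in ascending order.

```python
from typing import List


def spades_command(reads_1: str, reads_2: str, out_dir: str,
                   kmers: List[int]) -> List[str]:
    """Build a SPAdes argument list for paired-end reads (does not run it)."""
    # SPAdes expects odd k-mer sizes below 128, given in ascending order.
    if any(k % 2 == 0 or k >= 128 for k in kmers):
        raise ValueError("k-mer sizes must be odd and smaller than 128")
    if kmers != sorted(set(kmers)):
        raise ValueError("k-mer sizes must be unique and ascending")
    return [
        "spades.py",
        "-1", reads_1,            # forward reads
        "-2", reads_2,            # reverse reads
        "-k", ",".join(str(k) for k in kmers),
        "--careful",              # mismatch/indel correction mode
        "-o", out_dir,
    ]


# File and directory names below are hypothetical placeholders.
cmd = spades_command("plastome_R1.fq", "plastome_R2.fq", "spades_out",
                     [21, 33, 55, 77, 99, 127])
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run` on a machine where SPAdes is installed.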

Genome annotation, repeats and simple sequence repeats (SSRs) analyses

The online program Dual Organellar GenoMe Annotator (DOGMA, http://dogma.ccbb.utexas.edu/) and the Chloroplast Genome Annotation, Visualization, Analysis, and GenBank Submission (CPGAVAS) were used to annotate the two genomes [33,34]. The protein-coding sequences were verified by Blastp against the GenBank database. The tRNA genes were identified by tRNAscan-SE and DOGMA [33,35]. Then, manual corrections of the positions of the start and stop codons and the intron/exon boundaries were performed based on the entries in the plastome database using the Apollo program (v.1.11.8) [36,37]. The circular genomic maps were drawn using OrganellarGenomeDRAW (v1.2) with the default setting and checked manually [38]. The newly generated complete chloroplast genome sequences were submitted to GenBank.

The software CodonW (1.4.4) was used to investigate the distribution of codon usage with the relative synonymous codon usage (RSCU) ratio [39]. The codon usage frequency and GC content of both species were calculated using the programs Cusp and Compseq in EMBOSS (v.6.3.1) [40,41]. Repeats, including forward, palindromic, reverse, and complement, were identified by REPuter with the following setting parameters: 3 for Hamming distance and 30 for minimal repeat size [42]. SSRs were detected by MISA software with the parameters set as reported previously, and the cutoffs of the unit numbers for mono-, di-, tri-, tetra-, penta-, and hexa- nucleotides were 8, 4, 4, 3, 3, and 3, respectively [43].
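To make the SSR thresholds concrete, here is a minimal sketch of a perfect-repeat scanner that applies the same minimum unit numbers (8, 4, 4, 3, 3, 3 for mono- to hexanucleotides). It is an illustration only and not MISA itself: the function names are invented for this example, compound SSRs are not merged, and motifs that are themselves repeats of a shorter unit (such as "AA" or "ATAT") are skipped so that each repeat is reported once under its shortest unit.

```python
from typing import Dict, List, Tuple

# Minimum number of repeat units per motif length, as given in the text.
MIN_REPEATS: Dict[int, int] = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def is_reducible(motif: str) -> bool:
    """True if the motif is a repetition of a shorter unit (e.g. ATAT)."""
    n = len(motif)
    return any(n % d == 0 and motif[:d] * (n // d) == motif
               for d in range(1, n))


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, unit count) for each perfect SSR."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and not is_reducible(motif):
                n = 1
                # Count consecutive copies of the motif.
                while seq[i + n * k:i + (n + 1) * k] == motif:
                    n += 1
                if n >= min_rep:
                    hits.append((i, motif, n))
                    i += n * k
                    continue
            i += 1
    return sorted(hits)


# Toy sequence (not real plastome data): A x8, AT x4 and AAT x4.
toy = "GG" + "A" * 8 + "C" + "AT" * 4 + "G" + "AAT" * 4 + "C"
print(find_ssrs(toy))
```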

Comparative genomic and selective pressure analyses

The mVISTA program in Shuffle-LAGAN mode with default parameters was used to compare the five complete chloroplast genomes, using the S. grosvenorii chloroplast genome as the reference [44,45]. The sequence divergence of the chloroplast genomes was analyzed with a sliding window using DnaSP (v5.10) [46]; the window length was 600 bp and the step size was 200 bp. Moreover, a total of 104 intergenic regions and 77 exons were manually extracted from four species, and the corresponding sequences were aligned using ClustalW2 (v2.0.12) [47] and then used to calculate the nucleotide variability (Pi) using DnaSP (v5.10). Selective pressure was analyzed for consensus protein-coding genes among twelve genomes from Cucurbitaceae species. EasyCodeML software with the site model was used to calculate the nonsynonymous (Ka) and synonymous (Ks) substitution ratios and to perform likelihood ratio tests (LRTs). The values of both Ka/Ks (ω) and the LRTs were combined to evaluate the selection on amino acid sites [48].
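The sliding-window calculation can be sketched as follows, assuming pre-aligned, uppercase sequences of equal length. Pi is taken here as the mean number of pairwise differences per site, and alignment columns containing gaps or ambiguous bases are skipped; DnaSP's exact treatment of such sites may differ, so this is an approximation of the approach rather than a reimplementation. The function names are chosen for this example.

```python
from itertools import combinations
from typing import List, Tuple


def nucleotide_diversity(aligned: List[str]) -> float:
    """Mean pairwise differences per site, skipping gapped/ambiguous columns."""
    columns = [col for col in zip(*aligned)
               if all(base in "ACGT" for base in col)]
    if not columns:
        return 0.0
    pairs = list(combinations(range(len(aligned)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


def sliding_window_pi(aligned: List[str], window: int = 600,
                      step: int = 200) -> List[Tuple[int, float]]:
    """Return (window midpoint, Pi) for each window along the alignment."""
    if len(aligned) < 2:
        raise ValueError("at least two aligned sequences are required")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in aligned]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment (not real data): six substitutions at positions 300-305.
seq_a = "A" * 1000
seq_b = "A" * 300 + "C" * 6 + "A" * 694
for midpoint, pi in sliding_window_pi([seq_a, seq_b]):
    print(midpoint, round(pi, 5))
```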

Phylogenetic analyses

A total of 30 chloroplast genomes were used for the phylogenetic analyses: 28 from Cucurbitaceae, plus Nicotiana tabacum and Arabidopsis thaliana as outgroups. Of these, 28 chloroplast genome sequences were downloaded from NCBI GenBank (S2 Table). The software MAFFT was used to generate alignments of 64 consensus protein-coding gene sequences [49], and then the alignments were manually adjusted in BioEdit [50]. Maximum likelihood (ML) analysis was carried out based on the Tamura-Nei model, which was selected as the most appropriate model by Modeltest 3.7, using a heuristic search for initial trees [51]. Maximum parsimony (MP) analysis was conducted using PAUP (v4.0a) [52]. Bootstrap analysis was performed with 1000 replicates. Finally, the reconstructed trees were visualized using Figtree (v1.4.3) (http://tree.bio.ed.ac.uk/software/figtree/).

Molecular marker development and validation

The molecular regions between the two Siraitia species were examined by the alignment and comparison of mVISTA similarities. The primers for the molecular markers were designed using the software Primer Premier 5.0 [53]. The accuracy of the molecular markers was verified using PCR amplification, and PCR was conducted with the following program: initial denaturation at 94°C for 2 minutes; followed by 35 cycles of amplification at 94°C for 20 seconds, 56°C for 20 seconds, and 72°C for 2 minutes; and final extension at 72°C for 10 minutes. The PCR products were separated with 1.0% (w/v) agarose gel for 20 minutes at 180 volts. Then, the DNA fragments were purified and sequenced.

Results and discussion

General features of the chloroplast genomes

The chloroplast genome structures of the two species are similar to those of other Cucurbitaceae species, which display a single circular molecule with a typical quadripartite structure. The complete chloroplast genome of S. grosvenorii was 158,757 bp in length and consisted of a pair of IRs (IRa and IRb, each 26,288 bp in length) separated by an LSC region (87,625 bp) and an SSC region (18,556 bp) (Fig 1, S1 Fig and S3 Table). The complete chloroplast genome of S. siamensis possessed the same structure. The IRa and IRb were 26,289 bp each, the LSC was 88,069 bp, and the SSC was 18,543 bp, giving a whole-genome length of 159,190 bp (Fig 1, S3 Table). The length of each region is similar to those of most plant chloroplast genomes reported previously [54]. The sequencing data of S. grosvenorii and S. siamensis were deposited in GenBank under the accession numbers MK755853 and MK755854, respectively.
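As a quick consistency check on the reported region sizes, a quadripartite plastome length should equal LSC + SSC + 2 × IR. The short sketch below applies that arithmetic to the values given above; the dictionary layout and function name are choices made for this example.

```python
# Region lengths (bp) as reported in the text.
REGIONS = {
    "S. grosvenorii": {"LSC": 87_625, "SSC": 18_556, "IR": 26_288,
                       "total": 158_757},
    "S. siamensis": {"LSC": 88_069, "SSC": 18_543, "IR": 26_289,
                     "total": 159_190},
}


def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


for species, r in REGIONS.items():
    computed = plastome_length(r["LSC"], r["SSC"], r["IR"])
    print(species, computed, computed == r["total"])

# Most of the size difference between the two genomes lies in the LSC.
print(REGIONS["S. siamensis"]["LSC"] - REGIONS["S. grosvenorii"]["LSC"])
```

Both totals reproduce the reported genome lengths, and the LSC accounts for 444 bp of difference against a net genome difference of 433 bp.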

Fig 1. Circular gene map of the complete chloroplast genomes of S. grosvenorii and S. siamensis.

The quadripartite structure includes two copies of an IR region (IRa and IRb) that are separated by the LSC and SSC regions. Genes drawn inside the circle are transcribed clockwise, and those on the outside are transcribed counter-clockwise. The darker gray area in the inner circle shows the GC content, whereas the lighter gray area corresponds to the AT content. Different gene groups are shown in different colors.

Further analysis revealed that the two species had approximately 36.8% GC content (S3 Table), which was distributed unevenly across the whole chloroplast genome. The IR regions had the highest GC content, 42.8% in both S. grosvenorii and S. siamensis, possibly because the rRNA genes (rrn16, rrn23, rrn4.5 and rrn5; 55.3% GC content in both cases) located in the IR regions have a high GC content [55]; the higher GC content in the IR region was regarded as an indicator of species affinity [56]. Moreover, the LSC regions had a GC content of 34.6% in both species, and the lowest values of 30.5% and 30.7% were found in the SSC regions in S. grosvenorii and S. siamensis, respectively. In addition, the GC contents of the protein-coding regions were 37.8% in S. grosvenorii and 37.9% in S. siamensis, and the percentages of GC content for the first, second, and third codon positions were 45.6%, 38.0% and 29.8% in S. grosvenorii and 45.6%, 38.0% and 29.9% in S. siamensis, respectively (S3 Table). A bias toward using thymine (T) and adenine (A) in the third codon position has also been observed in other land plant plastomes [57–59]. The skewed GC distribution across the whole genome might be associated with the position of the origin and terminals for gene replication [60–62].
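The per-position GC percentages can be obtained by splitting each coding sequence into its first, second and third codon positions. The sketch below is a simplified stand-in for the EMBOSS programs named in the methods, with a function name invented for this example and a toy coding sequence rather than a real gene. As a sanity check on the reported figures, the mean of the three values for S. grosvenorii, (45.6 + 38.0 + 29.8) / 3, is 37.8%, which matches the reported GC content of the protein-coding regions.

```python
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """GC percentage at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of three")
    values = []
    for offset in range(3):
        bases = cds[offset::3]          # every third base from this offset
        gc = sum(base in "GC" for base in bases)
        values.append(100.0 * gc / len(bases))
    return values[0], values[1], values[2]


# Toy CDS (ATG GCT GGA TAA), not a real gene.
print(gc_by_codon_position("ATGGCTGGATAA"))
```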

The two species exhibited the same gene content and arrangement in the chloroplast genome. A total of 134 genes were identified, including 89 protein-coding genes, 37 transfer RNA (tRNA) genes and eight ribosomal RNA (rRNA) genes. Eight protein-coding genes, seven tRNA genes, and four rRNA genes were duplicated in the IR regions in both species (Fig 1); ycf15-orf was duplicated, and its start codon was GTG. The data revealed that 23 genes contain introns in both genomes, 21 with one intron and the rest with two introns, and the genes clpP and ycf3 both contained three exons (S4 and S5 Tables). The gene clpP was related to energy transformation [63], and ycf3 was necessary for the stable accumulation of photosystem I complexes [64]. Introns found in functional genes play a significant role in the regulation of gene expression, which can trigger desirable biological traits at particular times [65,66]. In addition, the rps12 gene contained three exons and one intron because of trans-splicing, which resulted in a 5’ end exon located in the LSC region, whereas the remaining exons were located in the IRs (S5 Table). Therefore, rps12 was duplicated in the IR region. Furthermore, the results of gene location analysis revealed twelve genes with partial overlaps in their sequences: trnK-UUU/matK, psbD/psbC, trnM-CAU/trnT-GGU, atpE/atpB, rps3/rpl22, and trnP-UGG/trnP-GGG.

The basic characteristics of the chloroplast genomes of two Siraitia species and six other species from Cucurbitaceae are shown in S6 Table. Comparative analysis showed that the lengths of the eight genomes ranged from 155,293 bp (Cucumis sativus) to 159,190 bp (S. siamensis), and the overall GC content percentage of C. sativus (37.2%) was higher than that of any other genome (36.7%-37.1%). However, little difference could be found in gene number, gene type or GC content between S. grosvenorii and S. siamensis, which suggests that we should focus on other areas to find variation, such as intergenic spacers.

IR/SC boundaries and IR contraction and expansion

The contraction and expansion of the IR regions are common evolutionary events and a major cause of differences in chloroplast genome size, and evaluating them could shed some light on the evolution of some taxa [67,68]. A detailed comparison of the IR/SC boundary regions of twelve species is shown in Fig 2. A. thaliana and N. tabacum were set as outgroups, and the rest were Cucurbitaceae. From the comparison, we noticed that the ycf1 pseudogenes of both Siraitia species were 1,181 bp long, almost as long as those of the other five Cucurbitaceae species, except Lagenaria siceraria (28 bp) and Momordica charantia (29 bp). In addition, the ndhF gene, located in the SSC, extends 134–135 bp across the IRb/SSC boundary in both L. siceraria and M. charantia, while the corresponding region was 7–12 bp in most other Cucurbitaceae species. Moreover, the gene rps19 is located in the LSC region and extends 265–277 bp across the LSC/IRb boundary in Trichosanthes kirilowii, Hemsleya lijiangensis and Gynostemma laxiflorum, whereas in other species, it was located 68–208 bp away from the LSC/IRb boundary. The rpl2 gene, duplicated in the IRs in most species, was present as only one copy in the IRb region in M. charantia, Cucurbita pepo and Citrullus lanatus, which might result from the location of the LSC/IRb boundary. All of these phenomena were related to the contraction/expansion of the two IR regions in the complete chloroplast genomes.

Fig 2. Comparison of border distance between adjacent genes and junctions of the LSC, SSC and IR regions among twelve chloroplast genomes.

Numbers with arrows above the gene features indicate the distance between the ends of genes and the border sites. The figure is not to scale with respect to sequence length.

Codon usage

RSCU is a measure of nonuniform synonymous codon usage in coding sequences in which values above 1 indicate that codons are used more frequently than expected [69,70]. All the protein-coding genes were encoded with 26,509 and 26,508 codons in the S. grosvenorii and S. siamensis chloroplast genomes, respectively. Detailed codon analysis revealed that the two genomes had similar codon constitutions and RSCU values (S7 Table). Among the amino acids, leucine (Leu) was the most prevalent (10.48%) and cysteine (Cys) the least prevalent (1.13%) in the two Siraitia species. In addition, most of the amino acid codons showed preferences, with the exception of methionine (Met) and tryptophan (Trp), which both had RSCU values of 1. The chloroplast genomes of the two Siraitia species both had 30 biased codons with RSCU > 1, and the third positions of the biased codons were A/U except for Leu (UUG); in addition, the codons with high frequency (>30%) and fraction were Asp-GAU, Glu-GAA, Ile-AUU, Lys-AAA, Leu-UUA, Asn-AAU, and Tyr-UAU, and the bias toward these seven codons was consistent with the low GC content in the third codon position. Fig 3 shows that the RSCU value increased with the quantity of codons coding for a specific amino acid. A strong AT bias in codon usage is common in sequences with strong codon preference and was also found in most other land plant chloroplast genomes [71,72].
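The RSCU definition above can be illustrated with a short calculation: for each amino acid, the observed count of a codon is divided by the count expected if all of its synonymous codons were used equally. This is a minimal sketch rather than CodonW itself. It assumes in-frame coding sequences, uses the codon assignments of the standard genetic code (which the plastid code shares), and the function and variable names are chosen for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Standard codon table built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds_list: Iterable[str]) -> Dict[str, float]:
    """Relative synonymous codon usage for every codon of observed families."""
    counts: Counter = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        expected = total / len(codons)  # equal use of synonymous codons
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy CDS: ATG GAT GAT GAT GAC TGG (Asp encoded three times by GAT, once by GAC).
table = rscu(["ATGGATGATGATGACTGG"])
print(table["GAT"], table["GAC"], table["ATG"], table["TGG"])
```

In the toy input, GAT gets an RSCU of 1.5 and GAC 0.5, while ATG (Met) and TGG (Trp) are fixed at 1 because each is the only codon for its amino acid, which is why those two amino acids show no preference in the text.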

Fig 3. Codon usage for 20 amino acids and stop codons in all protein-coding genes.

The stacked columns on the left and right of each amino acid display the codon usage within the chloroplast genomes of S. grosvenorii and S. siamensis, respectively.

Repeat structure and SSRs analyses

Repeat units, which are distributed in chloroplast genomes with high frequency, play an important role in genome evolution [73–75]. S2(A) Fig shows repeat structures that were longer than 30 bases in eight species. The repeats of the S. grosvenorii chloroplast genome consist of 19 forward, 24 palindromic, two reverse, and one complement. By contrast, a slightly different number of repeats was found in S. siamensis, which contained 17 forward, 20 palindromic, one reverse and no complement. In the eight-species comparison, the number of repeats for C. lanatus (31) was the lowest, and the highest number of repeats was 49 in M. charantia, G. laxiflorum, and Cucurbita maxima.

SSRs, which are also known as microsatellites and are distributed abundantly across genomes, are tracts of repetitive DNA with certain motifs, ranging from 1–6 or more base pairs, that are typically repeated 5–50 times [76,77]. SSRs are widely used as molecular markers for species identification, analysis of phylogenetic relationships and population genetics because of their high polymorphism rates and stable reproducibility [78,79]. Here, a total of 252 and 253 SSRs were identified by MISA software within the chloroplast genomes of S. grosvenorii and S. siamensis, respectively (S2(B) Fig), and mononucleotide repeats were largest in number, 57 and 56, respectively. Moreover, S8 Table shows that A/T mononucleotide repeats (97.2% and 97.8%, respectively) were the most common, and for dinucleotide repeats, AT/TA (68.4% and 67.8%, respectively) was the majority. However, only one repeat unit (AAT/TTA) was found among trinucleotide repeats. In addition, AT-rich repetitive motifs were frequent in the remaining SSR types. The SSRs within the chloroplast genomes of both species mainly comprised AT-rich repetitive motifs, consistent with the fact that AT content (63.2%) was very high (GC content was 36.8%) in both cases. Furthermore, these results were also consistent with previous reports that proportions of short poly-A or poly-T repeats were higher than those of poly-G or poly-C within most SSRs in many plant chloroplast genomes [80,81]. The distribution of the SSR loci in the chloroplast genomes of S. grosvenorii and S. siamensis is shown in S9 Table.

Sequence divergence and nucleotide diversity

Complete chloroplast genomes are often used to analyze plant taxonomy, phylogenetic relationships, and genetic diversity [82]. In this study, the two Siraitia species were compared with three other species in Cucurbitaceae using mVISTA software, and S. grosvenorii was set as the reference. As shown in Fig 4, the level of sequence divergence was similar across the complete chloroplast genomes. Compared with the other two Cucurbitaceae species, M. charantia was more similar to the two Siraitia species across the complete chloroplast genomes. The data plot revealed that the noncoding regions were more divergent than the coding regions. The two IR regions were both less divergent than the single-copy regions, which might be the result of the four highly conserved rRNAs located in the IR regions [55].

Fig 4. Structure comparison of five chloroplast genomes using the mVISTA program.

Gray arrows and thick black lines above the alignment indicate genes with their orientation and the position of the IRs, respectively. Purple bars, blue bars, pink bars, gray bars and white peaks represent exons, untranslated regions (UTRs), conserved noncoding sequences (CNS), mRNA and genome differences, respectively. A cut-off of 70% identity was used for the plots, and the Y-scale represents the percent identity ranging from 50% to 100%.

In addition, the nucleotide diversity of 181 regions, comprising 77 protein-coding genes and 104 intergenic regions, was analyzed using DnaSP software among four chloroplast genomes (T. kirilowii, M. charantia, and the two Siraitia species). The results revealed that intergenic regions were more divergent than protein-coding genes (Fig 5). The average nucleotide variability (Pi) in the intergenic regions was 0.01772, almost twice that in protein-coding genes (Pi = 0.00950). This is consistent with previous research in angiosperm chloroplast genomes [83]. The trnL-ccsA (Pi = 0.07302) and petG-trnW (Pi = 0.06780) regions were notably variable among the intergenic regions, as were the genes ycf1 (Pi = 0.03048), psaJ (Pi = 0.02972) and atpE (Pi = 0.02447) among the protein-coding genes. ycf1 is commonly used as a representative plant DNA barcoding region [84]. Twenty highly divergent intergenic regions (Pi > 0.03) were found that could be developed as specific molecular markers for species identification: trnR-atpA, ndhC-trnV, petG-trnW, rpl32-trnL, trnL-ccsA, ndhF-rpl32, psbZ-trnG, psbC-trnS, ndhG-ndhI, rps8-rpl14, ccsA-ndhD, trnT-trnL, psbK-psbI, psbA-trnK, rps15-ycf1, trnF-ndhJ, atpA-atpF, rps19-rpl2, psaC-ndhE, and petD-rpoA. It has been reported that divergent noncoding regions allow the discrimination of potential molecular markers and DNA barcodes [85]. Furthermore, sliding window analysis was also performed (S3 Fig). The average value of Pi was 0.00221 between the two Siraitia species, and higher variability was found in the LSC and SSC regions.

Fig 5. Comparison of nucleotide diversity (Pi) values for 77 protein-coding genes and 104 intergenic regions among four closely related species.

Phylogenetic relationships of six genera within Cucurbitaceae

Chloroplast genomes are significant genomic resources for the reconstruction of accurate and high-resolution phylogenetic relationships and taxonomic status in angiosperms [86]. Complete chloroplast genomes and protein-coding genes have been widely employed to determine phylogenetic relationships at almost every taxonomic level [87]. In this study, to identify the phylogenetic positions of the two Siraitia species within the Cucurbitaceae family, we aligned 64 protein-coding sequences from 30 chloroplast genomes; A. thaliana and N. tabacum were set as outgroups, and the alignment length was 62,522 bp. The ML and MP trees displayed similar phylogenetic topologies (Fig 6). All nodes in the ML tree and MP tree were strongly supported by high bootstrap values: 22 of 27 nodes with 100% bootstrap values were found in the MP tree and 21 of 27 in the ML tree. In addition, the results illustrated that the two Siraitia species were most closely related to M. charantia, and these two taxa were grouped with two species from Sicyoeae, three species from Cucurbiteae, and nine from Benincaseae, which showed a nested evolutionary relationship in the MP and ML trees. Furthermore, all species were clustered into a lineage distinct from the outgroups, which strongly supported the new classification system of Cucurbitaceae [54].

Fig 6. Molecular phylogenetic relationship of Siraitieae based on 64 protein-coding genes of 26 Cucurbitaceae species.

The unrooted trees were constructed by (A) maximum parsimony (MP) and (B) maximum likelihood (ML) methods with bootstrap values ≥50%.

Selective pressure events

Synonymous substitutions (Ks) accumulate nearly neutrally, whereas nonsynonymous substitutions (Ka) are subjected to selective pressures of varying degree and direction (positive or negative); values of Ka/Ks (ω) above 1.0 indicate that the corresponding genes experience positive selection, while ω values ranging from 0.5 to 1.0 indicate relaxed selection [88]. In the current study, we performed a selection analysis of the exons of each protein-coding gene using site-specific models with four model comparisons (M0 vs. M3, M1a vs. M2a, M7 vs. M8 and M8a vs. M8; LRT threshold p ≤ 0.05) in EasyCodeML software as reported previously [48]. Among these models, M2a was the positive selection model, and p (p0, p1, p2) were the proportions of sites under purifying selection, neutral selection, and positive selection. A total of 58 consensus protein-coding genes from 12 closely related species were evaluated with respect to selective pressure. Nine genes (accD, atpA, atpE, atpF, clpP, ndhF, psbH, rbcL, and rpoC2) were found to have undergone positive selection, and the ω2 values ranged from 2.17 to 11.44 in the M2a model (Table 1).

Table 1. Results of the positive selective pressure analysis in the M7 vs. M8 model comparison.

| Gene name | Model | np | LnL | ω2 (M2a) | LRTs (2ΔLnL) | LRT P-value | Positive sites |
|---|---|---|---|---|---|---|---|
| rpoC2 | M8 | 26 | -8422.120232 | 11.44045 | 43.1617 | 0 | 486 V*, 527 P**, 754 S*, 965 S**, 1024 F*, 1331 P**, 1353 F* |
| | M7 | 24 | -8443.701061 | | | | |
| atpF | M8 | 26 | -1276.119906 | 2.17251 | 6.8495 | 0.032557126 | 2 E*, 8 K**, 9 K*, 14 F*, 61 E*, 97 L* |
| | M7 | 24 | -1279.544665 | | | | |
| rbcL | M8 | 26 | -2716.450055 | 3.38881 | 38.4242 | 0.000000005 | 251 M*, 255 I**, 470 E*, 472 M* |
| | M7 | 24 | -2735.662163 | | | | |
| atpA | M8 | 26 | -2937.475433 | 5.46438 | 14.4739 | 0.00071952 | 258 S*, 484 N* |
| | M7 | 24 | -2944.712359 | | | | |
| atpE | M8 | 26 | -839.182861 | 7.38230 | 7.7637 | 0.020612244 | 36 D*, 130 G* |
| | M7 | 24 | -843.064731 | | | | |
| ndhF | M8 | 26 | -5697.671745 | 9.36793 | 15.6948 | 0.000390771 | 509 I*, 686 I* |
| | M7 | 24 | -5705.519135 | | | | |
| clpP | M8 | 26 | -1298.084585 | 3.91456 | 9.4032 | 0.009080555 | 11 V*, 51 Y* |
| | M7 | 24 | -1302.786205 | | | | |
| accD | M8 | 26 | -3041.598734 | 2.91222 | 9.2524 | 0.009791966 | 223 I* |
| | M7 | 24 | -3046.224927 | | | | |
| psbH | M8 | 26 | -387.753418 | 11.27276 | 10.1816 | 0.006153015 | 72 S** |
| | M7 | 24 | -392.844231 | | | | |

* P < 0.05

** P < 0.01

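np is the number of free parameters in each model; the difference in np between the two models (here, 26 − 24 = 2) gives the degrees of freedom of the LRT.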
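The LRT columns in Table 1 can be reproduced from the log-likelihoods alone. The statistic is 2ΔLnL = 2 × (LnL of M8 − LnL of M7), compared against a chi-square distribution whose degrees of freedom equal the difference in np (2 here). For exactly 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(−x/2), so no statistics library is needed. This is a minimal sketch with a function name chosen for this example, not the EasyCodeML implementation.

```python
import math
from typing import Tuple


def lrt_two_df(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Return (2*delta lnL, P-value) for a likelihood ratio test with 2 df.

    For a chi-square distribution with 2 degrees of freedom the
    survival function is exactly exp(-x / 2).
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    p_value = math.exp(-statistic / 2.0) if statistic > 0 else 1.0
    return statistic, p_value


# LnL values for M7 (null) and M8 (alternative) taken from Table 1.
TABLE_1 = {
    "rpoC2": (-8443.701061, -8422.120232),
    "atpF": (-1279.544665, -1276.119906),
    "psbH": (-392.844231, -387.753418),
}

for gene, (lnl_m7, lnl_m8) in TABLE_1.items():
    stat, p = lrt_two_df(lnl_m7, lnl_m8)
    print(f"{gene}: 2dLnL = {stat:.4f}, P = {p:.3g}")
```

For atpF this gives 2ΔLnL of about 6.8495 and P of about 0.0326, in agreement with the tabulated values; the P-value listed as 0 for rpoC2 corresponds to a value too small to display at the table's precision.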

To determine which sites were subject to positive selection, naive empirical Bayes (NEB) and Bayes empirical Bayes (BEB) methods were used to analyze the location of consistent selective sites in the alignment of chloroplast genomes in the M7 vs. M8 model. The data analysis revealed that the gene rpoC2 possesses 7 positive selective sites, followed by atpF (6), rbcL (4), atpA (2), atpE (2), clpP (2), ndhF (2), accD (1), and psbH (1). All positively selected sites in these nine genes are shown in Table 1.

Among the nine positively selected genes, rpoC2 encodes the RNA polymerase β″ subunit. A comparison of rpoC2 between fertile lines of sorghum and cytoplasmic male sterile lines identified a 165 bp deletion in a region that encodes several protein motifs involved with transcription factors; this region might play an important role in the regulation of developmental pollination [89]. The Siraitia species are dioecious, so the finding that rpoC2 evolved under positive selection might indicate that it is essential for sex differentiation. The chloroplast plays important roles in photosynthesis and carbon fixation, and six genes (atpF, rbcL, atpA, atpE, ndhF, and psbH) with essential roles in photosynthesis were positively selected in this study. The Siraitia species are distributed in southeast Asia, so requirements for sufficient light might have exerted strong selective forces on the six genes during plant evolution. This phenomenon was also found in species within the Urophysa genus, which is distributed in southwest China [75]. The clpP gene, encoding the ATP-dependent Clp protease, is likely involved in the transformation of chloroplast protein and might be essential for shoot development under clpP-mediated protein degradation [63,90,91]. The positive selection of the gene clpP in our study might be associated with the evolution of the vining character within Cucurbitaceae. The gene accD encodes the β-carboxyl transferase subunit of acetyl-CoA carboxylase [92]; it is a vital gene for leaf development and has effects on leaf longevity and seed yield [93]. Expression of the gene accD might indirectly affect the efficiency of photosynthesis. The positive selection of these nine genes might be the result of adaptation to their barren environment.

Molecular markers for distinguishing S. grosvenorii and S. siamensis

In this study, several notably variable regions were found in the comparison of the two chloroplast genomes. To develop high-resolution molecular markers for the identification of these two species, four divergent regions, namely the intergenic spacers ndhC-trnV-UAC, trnR-UCU-atpA and rpoB-trnC-GCA and the gene ycf1, were chosen as molecular marker regions, and specific primers were designed against the flanking conserved regions (Table 2). The primer pairs GSPC-F/R, GSPR-F/R and GSPB-F/R were used to amplify the three intergenic regions and produced fragments of different lengths in S. grosvenorii and S. siamensis. The highly divergent gene ycf1 was also chosen for marker development, and several SNP sites were found after amplification with GSPY-F/R (Fig 7). The four molecular markers developed in this study will contribute to the identification of Siraitia species and facilitate efficient utilization and conservation of wild Siraitia resources.

Table 2. Primer identification for molecular markers.

| Primer name | Primer sequence (5' to 3') | Position |
|---|---|---|
| GSPC-F | GATGAACCAAATCAAGTGGC | ndhC-trnV-UAC |
| GSPC-R | CAGAAGCAGGACGATAGAGA | ndhC-trnV-UAC |
| GSPR-F | GGTTCAAATCCTATTGGACG | trnR-UCU-atpA |
| GSPR-R | GGCAAGAGGTCAACGATTAC | trnR-UCU-atpA |
| GSPB-F | CTGTTTCCTACTCACACGAG | rpoB-trnC-GCA |
| GSPB-R | GGATTGGCTCTATCTCTTCG | rpoB-trnC-GCA |
| GSPY-F | GACGACTTGCTTTAGCGTTG | ycf1 |
| GSPY-R | GGACTAAACAGGAACAAGAG | ycf1 |
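For readers who want a quick sanity check of the primers in Table 2, the sketch below computes length, GC content and a rough melting temperature for each sequence. The Wallace rule used here, Tm = 2(A+T) + 4(G+C), is only a crude approximation for short oligonucleotides and is an assumption of this example; it is not presented as the design procedure used in this study.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)

def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C from the Wallace rule.

    Assumes the sequence contains only A, C, G and T, so every
    non-G/C base is counted as A or T.
    """
    primer = primer.upper()
    gc = sum(base in "GC" for base in primer)
    return 2 * (len(primer) - gc) + 4 * gc

# Primer sequences copied from Table 2.
PRIMERS = {
    "GSPC-F": "GATGAACCAAATCAAGTGGC",
    "GSPC-R": "CAGAAGCAGGACGATAGAGA",
    "GSPR-F": "GGTTCAAATCCTATTGGACG",
    "GSPR-R": "GGCAAGAGGTCAACGATTAC",
    "GSPB-F": "CTGTTTCCTACTCACACGAG",
    "GSPB-R": "GGATTGGCTCTATCTCTTCG",
    "GSPY-F": "GACGACTTGCTTTAGCGTTG",
    "GSPY-R": "GGACTAAACAGGAACAAGAG",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.0f}%, Tm ~{wallace_tm(seq)} C")
```

More accurate estimates would come from a nearest-neighbor thermodynamic model with the actual salt and primer concentrations of the PCR.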

Fig 7. Schematic diagram of the four novel molecular markers.

The indel markers in the intergenic spacers (GSPC-F/R, GSPR-F/R, and GSPB-F/R) and the ycf1 SNP marker were verified in Siraitia species using five individuals. (A) Sequencing showed that the GSPC-F/R amplicons spanning the region between ndhC and trnV-UAC were 790 bp in S. grosvenorii and 808 bp in S. siamensis. Length differences between the two Siraitia species were also detected with GSPR-F/R (B) and GSPB-F/R (C). (D) The ycf1 SNP sites amplified with GSPY-F/R were validated in each case and effectively discriminated the two Siraitia species.
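The length comparison behind an indel marker can be sketched as a simple in silico PCR: locate the forward primer and the reverse complement of the reverse primer on a template and measure the distance between them. The example below uses the GSPC-F/R sequences from Table 2, but the templates are synthetic placeholders built with poly-A filler so that the products match the 790 bp and 808 bp sizes reported in the caption; they are not real chloroplast sequences, and the helper names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def amplicon_length(template: str, fwd: str, rev: str):
    """Length of the product primed by fwd/rev on this strand, or None.

    Exact matching only; real primer binding tolerates some mismatches.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start < 0:
        return None
    rev_site = template.find(reverse_complement(rev), start + len(fwd))
    if rev_site < 0:
        return None
    return rev_site + len(rev) - start

FWD = "GATGAACCAAATCAAGTGGC"  # GSPC-F (Table 2)
REV = "CAGAAGCAGGACGATAGAGA"  # GSPC-R (Table 2)

def toy_template(filler_len: int) -> str:
    """Synthetic template: placeholder filler, NOT real chloroplast sequence."""
    return "TTTT" + FWD + "A" * filler_len + reverse_complement(REV) + "CCCC"

short_len = amplicon_length(toy_template(750), FWD, REV)
long_len = amplicon_length(toy_template(768), FWD, REV)
print(short_len, long_len, long_len - short_len)  # prints: 790 808 18
```

With real data, the two templates would be the assembled chloroplast genomes, and the 18 bp difference would reflect the indel in the ndhC-trnV-UAC spacer.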

The lack of an effective approach made the taxonomic position of the species in the genus unreliable until the genus Siraitia Merrill was acknowledged in 1980 [15]. The phylogenetic analysis in this study showed that the two Siraitia species were most closely related to M. kirilowii, which explains why S. grosvenorii was initially placed in Momordica L. under the name Momordica grosvenorii Swingle [94]. To the best of our knowledge, only seven species within the genus Siraitia have been confirmed based on morphological characteristics [15], although some varieties may remain undiscovered. Unfortunately, the increasing demand for high production of these species has brought the wild resources to the verge of depletion. The comparative analysis of chloroplast genomes in this study revealed several highly variable intergenic regions that will contribute to the development of specific markers to support the conservation of wild Siraitia varieties. Among the seven species, S. grosvenorii, which is recorded as both a medicinal and an edible species [4], has been widely cultivated as an important commercial crop. S. siamensis is known to have better disease resistance, a higher fruit set percentage and a considerable siamenoside I content. Different germplasm resources should be distinguished accurately and used for different purposes. Adulterants of raw medicinal materials not only threaten the safety and reliability of food and medicine but also exacerbate the scarcity of wild resources of similar species. The novel molecular markers developed here will be of great use in the species authentication and conservation of wild Siraitia resources in the future.

Conclusions

In the current study, two complete chloroplast genomes of the genus Siraitia were assembled and analyzed for the first time. In a comparison with other species within Cucurbitaceae, several highly divergent noncoding regions were identified that would be beneficial for developing high-resolution molecular markers. Phylogenetic analysis supported the view that S. grosvenorii and S. siamensis originated from the same ancestor, consistent with previous studies. Furthermore, nine protein-coding genes were found to have undergone positive selection, which might be the result of adaptation to the environment. Finally, four molecular markers (GSPC-F/R, GSPR-F/R, GSPB-F/R and GSPY-F/R) were developed to distinguish the two species. The results of this study will be beneficial for taxonomic research, species identification, and conservation of the genetic diversity of wild Siraitia resources in the future.

Supporting information

S1 Fig. Chloroplast genomes of S. grosvenorii and S. siamensis in linear form.

(TIF)

S2 Fig. Distribution of the number of different type repeats and SSRs.

(A) Repeat sequences in eight chloroplast genomes. Repeat sequences were identified by REPuter with length ≥30 bp and sequence identity ≥90%. F, P, R, and C denote the repeat types forward, palindrome, reverse and complement, respectively. Repeats of different lengths are colored correspondingly. (B) Analysis of simple sequence repeats (SSRs) in the chloroplast genomes of five species.

(TIF)

S3 Fig. Sliding window analysis of the whole chloroplast genome.

Window length: 600 sites, Step size: 200 sites. X-axis: position of the midpoint of a window; Y-axis: nucleotide diversity (π) of each window. (A) Pi among S. grosvenorii and S. siamensis; (B) Pi among six species of Cucurbitaceae.

(TIF)

S1 Table. Primer sequences at the boundaries between single-copy and IR regions.

(DOCX)

S2 Table. List of chloroplast genome sequences used in the study.

(DOCX)

S3 Table. Base composition in the chloroplast genomes of S. grosvenorii and S. siamensis.

(DOCX)

S4 Table. Genes contained in the chloroplast genomes of S. grosvenorii and S. siamensis.

(DOCX)

S5 Table. Location information of genes with introns in the chloroplast genome of S. grosvenorii and S. siamensis.

(DOCX)

S6 Table. Comparison of the chloroplast genome characteristics of S. grosvenorii, S. siamensis and six other Cucurbitaceae species.

(DOCX)

S7 Table. Codon usage and codon-anticodon recognition in all protein-coding genes of the chloroplast genomes of two Siraitia species.

(DOCX)

S8 Table. Types and amounts of SSRs in the S. grosvenorii and S. siamensis chloroplast genomes.

(DOCX)

S9 Table. Distribution of the SSRs loci in the chloroplast genome of S. grosvenorii and S. siamensis.

(XLSX)

Acknowledgments

The authors acknowledge Mr. Jianguo Zhou, Miss Hui Zhang and Miss Mei Jiang of the Institute of Medicinal Plant Development for their kind suggestions on raw data processing. We also thank the Guangxi Medicinal Botanical Garden of the Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College (Nanning, China) for generously supplying the plant materials.

Data Availability

The chloroplast genome sequencing data were deposited in GenBank under accession numbers MK755853 and MK755854, respectively.

Funding Statement

This work was supported by Beijing Municipal Natural Science Foundation (5172028), the National Natural Science Foundation of China (81373914, 81573521), and CAMS Innovation Fund for Medical Sciences (CIFMS) (No. 2017-I2M-1-013).

References

  • 1. Jeffrey C (1980) A review of the Cucurbitaceae. Botanical Journal of the Linnean Society 81: 233–247. doi: 10.1111/j.1095-8339.1980.tb01676.x
  • 2. Tang Q, Ma X, Mo C, Wilson IW, Song C, Zhao H, et al. (2011) An efficient approach to finding Siraitia grosvenorii triterpene biosynthetic genes by RNA-seq and digital gene expression analysis. BMC Genomics 12: 343. doi: 10.1186/1471-2164-12-343
  • 3. Philippe RN, De MM, Anderson J, Ajikumar PK (2014) Biotechnological production of natural zero-calorie sweeteners. Curr Opin Biotechnol 26: 155–161. doi: 10.1016/j.copbio.2014.01.004
  • 4. Chun LI, Lin LM, Feng S, Wang ZM, Huo HR, Li D, et al. (2014) Chemistry and pharmacology of Siraitia grosvenorii: A review. Chinese Journal of Natural Medicines 12: 89–102. doi: 10.1016/S1875-5364(14)60015-7
  • 5. Deng F, Liang X, Yang L, Liu Q, Liu H (2013) Analysis of Mogroside V in Siraitia grosvenorii with micelle-mediated cloud-Point extraction. Phytochemical Analysis 24: 381–385. doi: 10.1002/pca.2420
  • 6. Itkin M, Davidovich-Rikanati R, Cohen S, Portnoy V, Doron-Faigenboim A, Oren E, et al. (2016) The biosynthetic pathway of the nonsugar, high-intensity sweetener mogroside V from Siraitia grosvenorii. Proceedings of the National Academy of Sciences of the United States of America 113: E7619–E7628. doi: 10.1073/pnas.1604828113
  • 7. Zhong S, Lin W, Tang B (1993) Characteristics and cultivation of Siraitia siamensis. Guangxi Agricultural Sciences 5: 218–218.
  • 8. Kasai R, Nie RL, Nashi K, Ohtani K, Zhou J, Tao GD, et al. (1989) Sweet cucurbitane glycosides from fruits of Siraitia siamensis (chi-zi luo-han-guo), a Chinese folk medicine. Agricultural and Biological Chemistry 53: 3347–3349. doi: 10.1271/bbb1961.53.3347
  • 9. Koc S, Isgor BS, Isgor YG, Shomali MN, Yildirim O (2015) The potential medicinal value of plants from Asteraceae family with antioxidant defense enzymes as biological targets. Pharmaceutical Biology 53: 746–751. doi: 10.3109/13880209.2014.942788
  • 10. Misra RS, Sriram S, Govil JN, Pandey J, Shivakumar BG, Singh VK (2002) Medicinal value and export potential of tropical tuber crops. Crop Improvement Production Technology Trade and Commerce: 376–386.
  • 11. Zheng Y, Zhang WJ, Wang XM (2013) Triptolide with potential medicinal value for diseases of the central nervous system. Cns Neuroscience and Therapeutics 19: 76–82. doi: 10.1111/cns.12039
  • 12. Rivière-Dobigny T, Doan LP, Quang NL, Maillard JC, Michaux J (2010) Species identification, molecular sexing and genotyping using non-invasive approaches in two wild Bovids species: Bos gaurus and Bos javanicus. Zoo Biology 28: 127–136.
  • 13. Caroline T, Segatto ALA, Júlia B, Bonatto SL, Freitas LB (2015) Genetic differentiation and hybrid identification using microsatellite markers in closely related wild species. AoB Plants 7: plv084. doi: 10.1093/aobpla/plv084
  • 14. Pharmacopoeia of the People's Republic of China 2015 Edition. In: Commission CP, editor. Beijing: China Medical Science Press.
  • 15. Lu AM, Zhang ZY (1984) The genus Siraitia Merr. in China. Guihaia 4: 27–33.
  • 16. Ma XJ, Mo CM, Bai LH, Feng SX (2008) A New Siraitia grosvenorii Cultivar 'Yongqing 1'. Acta Horticulturae Sinica 35: 1855–1855. doi: 10.16420/j.issn.0513-353x.2008.12.028
  • 17. Trobajo R, Mann DG, Clavero E, Evans KM, Vanormelingen P, Mcgregor RC (2010) The use of partial cox1, rbcL and LSU rDNA sequences for phylogenetics and species identification within the Nitzschia palea species complex (Bacillariophyceae). European Journal of Phycology 45: 413–425. doi: 10.1080/09670262.2010.498586
  • 18. Turenne CY, Sanche SE, Hoban DJ, Karlowsky JA, Kabani AM (1999) Rapid identification of fungi by using the ITS2 genetic region and an automated fluorescent capillary electrophoresis system. Journal of Clinical Microbiology 37: 1846–1851.
  • 19. Liu C, Liang D, Gao T, Pang X, Song J, Yao H, et al. (2011) PTIGS-IdIt, a system for species identification by DNA sequences of the psbA-trnH intergenic spacer region. BMC Bioinformatics 12: S4. doi: 10.1186/1471-2105-12-S13-S4
  • 20. Park J (2001) The Cell: A molecular approach, second edition. Yale Journal of Biology and Medicine 74: 361–365. doi: 10.1016/0014-5793(78)81037-0
  • 21. Jansen RK, Ruhlman TA (2012) Plastid genomes of seed plants. Genomics of Chloroplasts and Mitochondria 35: 103–126. doi: 10.1007/978-94-007-2920-9_5
  • 22. Etienne D, Sota F, Catherine FS, Mark B, Ian S (2011) Rampant gene loss in the underground orchid Rhizanthella gardneri highlights evolutionary constraints on plastid genomes. Molecular Biology and Evolution 28: 2077–2086. doi: 10.1093/molbev/msr028
  • 23. Parks M, Cronn R, Liston A (2009) Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biology 7: 84. doi: 10.1186/1741-7007-7-84
  • 24. Kim K, Lee SC, Lee J, Lee HO, Joh HJ, Kim NH, et al. (2015) Comprehensive survey of genetic diversity in chloroplast genomes and 45S nrDNAs within Panax ginseng species. Plos One 10: e0117159. doi: 10.1371/journal.pone.0117159
  • 25. Park I, Yang S, Kim WJ, Noh P, Lee HO, Moon BC (2018) The complete chloroplast genomes of Six Ipomoea species and indel marker development for the discrimination of authentic Pharbitidis Semen (seeds of I. nil or I. purpurea). Frontiers in Plant Science 9: 965. doi: 10.3389/fpls.2018.00965
  • 26. Park I, Yang S, Choi G, Kim WJ, Moon BC (2017) The complete chloroplast genome sequences of Aconitum pseudolaeve and Aconitum longecassidatum, and development of molecular markers for distinguishing species in the Aconitum subgenus Lycoctonum. Molecules 22: 2012. doi: 10.3390/molecules22112012
  • 27. Bolger AM, Marc L, Bjoern U (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120. doi: 10.1093/bioinformatics/btu170
  • 28. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421. doi: 10.1186/1471-2105-10-421
  • 29. Anton B, Sergey N, Dmitry A, Gurevich AA, Mikhail D, Kulikov AS, et al. (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19: 455–477. doi: 10.1089/cmb.2012.0021
  • 30. Krumsiek J, Rattei AT (2007) Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23: 1026–1028. doi: 10.1093/bioinformatics/btm039
  • 31. Swindell SR, Plasterer TN (1997) SEQMAN. Contig assembly. Methods in Molecular Biology 70: 75–89. doi: 10.1385/0-89603-358-9:75
  • 32. Dierckxsens N, Mardulyn P, Smits G (2016) NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Research 2016: gkw955. doi: 10.1093/nar/gkw955
  • 33. Wyman SK, Jansen RK, Boore JL (2004) Automatic annotation of organellar genomes with DOGMA. Bioinformatics 20: 3252–3255. doi: 10.1093/bioinformatics/bth352
  • 34. Liu C, Shi L, Zhu Y, Chen H, Zhang J, Lin X, et al. (2012) CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC Genomics 13: 715. doi: 10.1186/1471-2164-13-715
  • 35. Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research 25: 955–964. doi: 10.1093/nar/25.5.955
  • 36. Liying C, Narayanan V, Alexander R, Kerr W, Jansen RK, Jim LM, et al. (2006) ChloroplastDB: the Chloroplast Genome Database. Nucleic Acids Research 34: D692–D696. doi: 10.1093/nar/gkj055
  • 37. Ed L, Nomi HM, Gibson, Raymond C, Suzanna L (2009) Apollo: a community resource for genome annotation editing. Bioinformatics 25: 1836–1837. doi: 10.1093/bioinformatics/btp314
  • 38. Dean L, Bjorn C (2004) ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Research 32: 11–16. doi: 10.1093/nar/gkh152
  • 39. Sharp PM, Li WH (1987) The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Research 15: 1281–1295. doi: 10.1093/nar/15.3.1281
  • 40. Rice P, Longden I, Bleasby A (2000) EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics 16: 276–277. doi: 10.1016/s0168-9525(00)02024-2
  • 41. Itaya H (2013) GEMBASSY: an EMBOSS associated software package for comprehensive genome analyses. Source Code for Biology and Medicine 8: 17. doi: 10.1186/1751-0473-8-17
  • 42. Kurtz S, Schleiermacher C (1999) REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics 15: 426–427. doi: 10.1093/bioinformatics/15.5.426
  • 43. Beier S, Thiel T, Münch T, Scholz U, Mascher M (2017) MISA-web: a web server for microsatellite prediction. Bioinformatics 33: 2583–2585. doi: 10.1093/bioinformatics/btx198
  • 44. Dubchak I, Ryaboy DV (2005) VISTA family of computational tools for comparative analysis of DNA sequences and whole genomes. Methods in Molecular Biology 338: 69–89. doi: 10.1385/1-59745-097-9:69
  • 45. Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, et al. (2000) VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics 16: 1046–1047. doi: 10.1093/bioinformatics/16.11.1046
  • 46. Librado P, Rozas J (2009) DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25: 1451–1452. doi: 10.1093/bioinformatics/btp187
  • 47. Thompson JD, Gibson TJ, Higgins DG (2002) Multiple sequence alignment using ClustalW and ClustalX. Current Protocols Bioinformatics Chapter 2: Unit 2.3. doi: 10.1002/0471250953.bi0203s00
  • 48. Gao F, Chen C, Arab DA, Du Z, He Y, Ho SYW (2019) EasyCodeML: A visual tool for analysis of selection using CodeML. Ecology and Evolution: 1–8. doi: 10.1038/s41559-018-0779-9
  • 49. Kazutaka K, Kei-Ichi K, Hiroyuki T, Takashi M (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research 33: 511–518. doi: 10.1093/nar/gki198
  • 50. Tippmann HF (2004) Analysis for free: Comparing programs for sequence analysis. Brief Bioinform 5: 82–87. doi: 10.1093/bib/5.1.82
  • 51. Posada D (1998) MODELTEST: testing the model of DNA substitution. Bioinformatics 14: 817–818. doi: 10.1093/bioinformatics/14.9.817
  • 52. Swofford DL (2002) PAUP*. Phylogenetic Analysis Using Parsimony (* and other methods). Version 4.0b10 ed. MA, USA.
  • 53. Lalitha S (2000) Primer Premier 5. Biotech software and internet report 1: 270–272. doi: 10.1089/152791600459894
  • 54. Zhang X, Zhou T, Kanwal N, Zhao Y, Bai G, Zhao G (2017) Completion of eight Gynostemma BL. (Cucurbitaceae) chloroplast genomes: characterization, comparative analysis, and phylogenetic relationships. Frontiers in Plant Science 8: 1583. doi: 10.3389/fpls.2017.01583
  • 55. Qian J, Song JY, Gao HH, Zhu YJ, Xu J, Pang XH, et al. (2013) The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza. Plos One 8: e57607. doi: 10.1371/journal.pone.0057607
  • 56. Shen X, Wu M, Liao B, Liu Z, Bai R, Xiao S, et al. (2017) Complete chloroplast genome sequence and phylogenetic analysis of the medicinal plant Artemisia annua. Molecules 22: 1330. doi: 10.3390/molecules22081330
  • 57. Clegg MT, Gaut BS, Learn GH, Morton BR (1994) Rates and patterns of chloroplast DNA evolution. Proceedings of the National Academy of Sciences of the United States of America 91: 6795–6801. doi: 10.1073/pnas.91.15.6795
  • 58. Jiang M, Chen H, He S, Wang L, Chen AJ, Liu C (2018) Sequencing, characterization, and comparative analyses of the plastome of Caragana rosea var. rosea. International Journal of Molecular Science 19: 1419. doi: 10.3390/ijms19051419
  • 59. Zhou J, Chen X, Cui Y, Sun W, Li Y, Wang Y, et al. (2017) Molecular structure and phylogenetic analyses of complete chloroplast genomes of two Aristolochia medicinal species. International Journal of Molecular Science 18: 1839. doi: 10.3390/ijms18091839
  • 60. Lobry JR (1996) Asymmetric substitution patterns in the two DNA strands of bacteria. Molecular Biology and Evolution 13: 660–665. doi: 10.1093/oxfordjournals.molbev.a025626
  • 61. Necsulea A, Lobry J (2007) A new method for assessing the effect of replication on DNA base composition asymmetry. Molecular Biology and Evolution 24: 2169–2179. doi: 10.1093/molbev/msm148
  • 62. Tillier ER, Collins RA (2000) The contributions of replication orientation, gene direction, and signal sequences to base-composition asymmetries in bacterial genomes. Journal of Molecular Evolution 50: 249–257. doi: 10.1007/s002399910029
  • 63. Clarke AK, Gustafsson P, Lidholm JÅ (1994) Identification and expression of the chloroplast clpP gene in the conifer Pinus contorta. Plant Molecular Biology 26: 851–862. doi: 10.1007/bf00028853
  • 64. Boudreau E, Takahashi Y, Lemieux C, Turmel M, Rochaix JD (2014) The chloroplast ycf3 and ycf4 open reading frames of Chlamydomonas reinhardtii are required for the accumulation of the photosystem I complex. Embo Journal 16: 6095–6104. doi: 10.1093/emboj/16.20.6095
  • 65. Rose AB (2008) Intron-mediated regulation of gene expression. Curr Top Microbiol Immunol 326: 277–290. doi: 10.1007/978-3-540-76776-3_15
  • 66. Sugita M, Sugiura M (1996) Regulation of gene expression in chloroplasts of higher plants. Plant Molecular Biology 32: 315–326. doi: 10.1007/bf00039388
  • 67. Boudreau E, Turmel M (1995) Gene rearrangements in Chlamydomonas chloroplast DNAs are accounted for by inversions and by the expansion/contraction of the inverted repeat. Plant Molecular Biology 27: 351–364. doi: 10.1007/bf00020189
  • 68. Nazareno AG, Carlsen M, Lohmann LG (2015) Complete chloroplast genome of Tanaecium tetragonolobum: The first Bignoniaceae plastome. Plos One 10: e0129930. doi: 10.1371/journal.pone.0129930
  • 69. Comeron JM, Aguadé M (1998) An evaluation of measures of synonymous codon usage bias. Journal of Molecular Evolution 47: 268–274. doi: 10.1007/pl00006384
  • 70. Lee S, Weon S, Lee S, Kang C (2010) Relative codon adaptation index, a sensitive measure of codon usage bias. Evolutionary Bioinformatics 2010: 47–55. doi: 10.4137/EBO.S4608
  • 71. Zhou J, Cui Y, Chen X, Li Y, Xu Z, Duan B, et al. (2018) Complete chloroplast genomes of Papaver rhoeas and Papaver orientale: molecular structures, comparative analysis, and phylogenetic analysis. Molecules 23: 437. doi: 10.3390/molecules23020437
  • 72. Guo S, Guo L, Zhao W, Xu J, Li Y, Zhang X, et al. (2018) Complete chloroplast genome sequence and phylogenetic analysis of Paeonia ostii. Molecules 23: 246. doi: 10.3390/molecules23020246
  • 73. Liu W, Kong H, Zhou J, Fritsch PW, Hao G, Gong W (2018) Complete chloroplast genome of Cercis chuniana (Fabaceae) with structural and genetic comparison to six species in Caesalpinioideae. International Journal of Molecular Science 19: 1286. doi: 10.3390/ijms19051286
  • 74. Dong W, Xu C, Cheng T, Lin K, Zhou S (2013) Sequencing angiosperm plastid genomes made easy: a complete set of universal primers and a case study on the phylogeny of Saxifragales. Genome Biology & Evolution 5: 989–997. doi: 10.1093/gbe/evt063
  • 75. Xie DF, Yu Y, Deng YQ, Li J, Liu HY, Zhou SD, et al. (2018) Comparative analysis of the chloroplast genomes of the Chinese endemic genus Urophysa and their contribution to chloroplast phylogeny and adaptive evolution. International Journal of Molecular Science 19: 1847. doi: 10.3390/ijms19071847
  • 76. Guy-Franck R, Alix K, Bernard D (2008) Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiology and molecular biology reviews 72: 686–727. doi: 10.1128/MMBR.00011-08
  • 77. Gulcher J (2012) Microsatellite markers for linkage and association studies. Cold Spring Harbor Protocols 2012: 425–432. doi: 10.1101/pdb.top068510
  • 78. Kawabe A, Nukii H, Furihata H (2018) Exploring the history of chloroplast capture in Arabis using whole chloroplast genome sequencing. International Journal of Molecular Science 19: 602. doi: 10.3390/ijms19020602
  • 79. Smith JSC, Chin ECL, Shu H, Smith OS, Wall SJ, Senior ML, et al. (1997) An evaluation of the utility of SSR loci as molecular markers in maize (Zea mays L.): comparisons with data from RFLPS and pedigree. Theoretical and Applied Genetics 95: 163–173. doi: 10.1007/s001220050544
  • 80. Li X, Yang Y, Henry RJ, Rossetto M, Wang Y, Chen S (2015) Plant DNA barcoding: from gene to genome. Biol Rev Camb Philos Soc 90: 157–166. doi: 10.1111/brv.12104
  • 81. Lu Q, Ye W, Lu R (2018) Phylogenomic and comparative analyses of complete plastomes of Croomia and Stemona (Stemonaceae). International Journal of Molecular Science 19: 2383. doi: 10.3390/ijms19082383
  • 82. Khan A, Khan IA, Asif H, Azim MK (2010) Current trends in chloroplast genome research. African Journal of Biotechnology 9: 3494–3500. doi: 10.1186/1471-2105-11-321
  • 83. Zhang X, Zhou T, Yang J, Sun JJ, Ju MM, Zhao YM, et al. (2018) Comparative analyses of chloroplast genomes of Cucurbitaceae species: lights into selective pressures and phylogenetic relationships. Molecules 23: 2165. doi: 10.3390/molecules23092165
  • 84. Group CPW (2009) A DNA barcode for land plants. Proceedings of the National Academy of Sciences of the United States of America 106: 12794–12797. doi: 10.1073/pnas.0905845106
  • 85. Lei W, Ni D, Wang Y, Shao J, Wang X, Yang D, et al. (2016) Intraspecific and heteroplasmic variations, gene losses and inversions in the chloroplast genome of Astragalus membranaceus. Scientific Reports 6: 21669. doi: 10.1038/srep21669
  • 86. Jansen RK, Raubeson LA, Boore JL, Depamphilis CW, Chumley TW, Haberle RC, et al. (2005) Methods for obtaining and analyzing whole chloroplast genome sequences. Methods in Enzymology 395: 348–384. doi: 10.1016/S0076-6879(05)95020-9
  • 87. Li Y, Zhou J, Chen X, Cui Y, Xu Z, Li Y, et al. (2017) Gene losses and partial deletion of small single-copy regions of the chloroplast genomes of two hemiparasitic Taxillus species. Scientific Reports 7: 12834. doi: 10.1038/s41598-017-13401-4
  • 88. Ohta T (1995) Synonymous and nonsynonymous substitutions in mammalian genes and the nearly neutral theory. Journal of Molecular Evolution 40: 56–63. doi: 10.1007/bf00166595
  • 89. Chen Z, Schertz KF, Mullet JE, Dubell A, Hart GE (1995) Characterization and expression of rpoC2 in CMS and fertile lines of sorghum. Plant Molecular Biology 28: 799–809. doi: 10.1007/bf00042066
  • 90. Hiroshi K, Pal M (2003) The plastid clpP1 protease gene is essential for plant development. Nature 425: 86–89. doi: 10.1038/nature01909
  • 91. Maliga P (2004) Plastid transformation in higher plants. Annual Review of Plant Biology 55: 289–313. doi: 10.1146/annurev.arplant.55.031903.141633
  • 92. Tseng CC, Sung TY, Li YC, Hsu SJ, Lin CL, Hsieh MH (2010) Editing of accD and ndhF chloroplast transcripts is partially affected in the Arabidopsis vanilla cream1 mutant. Plant Molecular Biology 73: 309–323. doi: 10.1007/s11103-010-9616-5
  • 93. Madoka Y, Tomizawa KI, Mizoi J, Nishida I, Nagano Y, Sasaki Y (2002) Chloroplast transformation with modified accD operon increases acetyl-CoA carboxylase and causes extension of leaf longevity and increase in seed yield in tobacco. Plant and Cell Physiology 43: 1518–1525. doi: 10.1093/pcp/pcf172
  • 94. Li J (1993) A revision of the genus Siraitia Merr. and two new genera of Cucurbitaceae. Acta Phytotax Sinica 31: 45–55.

Decision Letter 0

Shashi Kumar

9 Sep 2019

PONE-D-19-14994

Complete chloroplast genomes of two Siraitia Merrill species: comparative analysis, positive selection and novel molecular marker development

PLOS ONE

Dear Dr Ma,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Please make all the necessary changes and improve your manuscript by incorporating all the comments from the three reviewers before submitting the revised version.

==============================

We would appreciate receiving your revised manuscript by November 7, 2019. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Shashi Kumar, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Additional Editor Comments (if provided):

Dear Dr. Ma,

Please correct your manuscript following the comments of the three reviewers.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper assembles and analyses two complete chloroplast genomes of the Siraitia genus. The paper covers most aspects of comparative genome analysis between the two Siraitia species, and the authors have developed molecular markers to distinguish the two species. Overall the paper is well written; however, there are a few suggestions, listed below:

1) Line 46: There is no reference for this statement. Please add a reference to this statement.

2) Line 56: "Unstable treatment effect" statement is not clear.

3) Line 62: "most origin plants": it is not clear which origin plants the author is referring to.

4) Line 168: Please mention the version of Modeltest used.

5) Line 197: Please add some description of the figure shown " The quadripartite structure includes two copies of an IR region (IRA and IRB) that separate large single-copy (LSC) and small single copy (SSC) regions ".

6) Line 199: It would be easier if you could represent the chloroplast genome in linear form also for better and faster visualization.

7) Line 230: Add reference of "S5 Table" next to line " In addition, the rps12 gene contained three exons and one intron because of trans-splicing, which resulted in a 5’ end exon located in the LSC region, whereas the remaining exons were located in the IRs".

8) Line 242: Does "SC" written in this line refers to SSC region of chloroplast genome?

9) Line 315: The distribution of the SSR loci in the chloroplast genome could also be described; giving the details of their location (coding region or intergenic spacer region) in tabular form would be helpful.

10) Line 392: Please correct the spelling of "freedom"; it is written as "freedon".

11) Line 437: Please rewrite the description of Fig 7 in little more detail.

12) Line 590: Reference number 43 is not written in the prescribed format.

Reviewer #2: This study reports the complete plastome of Siraitia grosvenorii, a traditional Chinese medicine with antidiabetic, antioxidative and anti-inflammatory properties, and Siraitia siamensis, its wild relative, having 158751 bp and 159190 bp respectively. High resolution novel molecular markers GSPC-F/R, GSPR-F/R, GSPB-F/R and GSPY-F/R were developed to distinguish between the two species. Phylogenetic analysis of the two species within the Cucurbitaceae family placed the two species close to M. kirilowii.

Minor Comments

Material and Methods

Chloroplast genome assembly:

1) Mention the parameters used while assembling using SPAdes.

2) Why was a dedicated de novo genome assembler such as NOVOPlasty, which is specific to plant chloroplast genomes, not used?

3) How do the results differ if such a specific genome assembler is used?

Genome annotation, repeats and simple sequence repeats (SSRs) analyses:

1) DOGMA is no longer supported. Annotate using newer plastid annotation tools.

Reviewer #3: First of all, I would like to acknowledge Dr Shashi Kumar, Academic Editorial Board member of PLOS ONE, for providing me this scholastic opportunity to peer review, and for his patience and forbearance whilst receiving my reviewer comments. At the outset, I am glad to congratulate the respected authors from the Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, PRC: Drs. Xiaojun Ma, Hongwu Shi, Meng Yang, Changming Mo, Wenjuan Xie, Chang Liu and Bin Wu, for having worked assiduously on such a significant paper on chloroplast de novo assembly and sequence analysis, with deeply underpinned significance in medicinal plant breeding.

This research article is worthy to be accepted for publication in PLOS ONE, since it satisfies each of the below Criteria:

1. The study presents the results of original research.

2. Results reported have not been published elsewhere.

3. Experiments, statistics, and other analyses are performed to a high technical standard and are described in sufficient detail.

4. Conclusions are presented in an appropriate fashion and are supported by the data.

5. The article is presented in an intelligible fashion and is written in standard English.

6. The research meets all applicable standards for the ethics of experimentation and research integrity.

7. The article adheres to appropriate reporting guidelines and community standards for data availability.

(i) The 333bp disparity between Chloroplast genome assemblies of grosvenorii and siamensis can be carefully addressed using Compositional data analysis paradigm for checking INTEGRITY of genome assemblies carried out thus far expected to be published by the Peer Reviewer soon in NAR-Genomics Bioinformatics CoDA issue CfP. The authors are kindly requested to revisit in this regard, my works on FastQ-ome (15th SocBiN, Moscow, Russia) and IUPAC-IUB probability conservation (DFG- Hyderabad).

(ii) The 'Cucurbitaceae' phylogenetic divergence lineage is suggested to be revised and precisely expressed in terms of Mya or Bya using Carbon-dating.

(iii) The sweetness indices may be more favorably expressed in terms of Aspartame, an artificial sweetener, which is ~400 times sweeter than Sucrose.

(iv) Chinese Pharmacopoeia and Guangxi Medical Botanical Garden of IMPLAD can please share the 2 Germplasm accessions with NBPGR, National Bureau of Plant Genetic Resources, New Delhi (India) for better data interoperability. Also, apart from GenBank/ NCBI, kindly deposit and archive in Gigascience as well. Hopefully, RefSeq assemblies shall be annotated soon, with Feature table flat files made available, such as rRNA, tRNA, CDS.

(v) The two INDEL markers and 3 Aconitum species may be plotted in a Ternary diagram for ease of Visualisation for readers.

(vi) Apart from Trimmomatic for the Adapter removal, fastx-clipper and Cutadapt may be employed for comparative validation; fastqc_data.txt files may be auto-parsed for text-mining the adapter sequences using Natural Language Processing from master Adapter files, such as Illumina Universal Sequence adapters.

(vii) SRA-BLASTN may be preferably mentioned over simply BLASTN under "Chloroplast genome assembly" section, since Illumina paired end reads are the BLAST database.

(viii) Apart from SPAdes and CLC Genomics Workbench, Minia developed by Rayan Chikhi, Inria (France) can also please be used since it is Computationally efficient.

(ix) SSR analysis may be re-performed in line with Peer Reviewer's FAO, Rome abstract communicated to 8th International Conference on Agricultural Statistics, IASRI New Delhi, toward which all the Chinese authors are cordially invited to attend my presentation, as a self-cited extension of my Indo-Belarusian and 2nd IOCBS, Kolkata works on k-Mer based permutative Confusion matrix analysis of mono-, di-, tri-, tetra- nucleotide frequencies with Five number summary by Sourmash, jellyfish.

(x) Relative Synonymous Codon Usage (RSCU) may be reperceived in the light of Chargaff's rules and Wobble hypothesis. Refer to my Big Data Genetic Code bit.ly/bdgcode

(xi) Coming to Hamming distance allowance of 30, apart from REPuter, RADMIDs subtool from RADtools (UK) may be implemented, even for de novo SNP genotyping in the Chloroplast genomes.

(xii) MAFFT fourier transform-based Multiple Sequence Alignment of the 64 Protein-coding gene sequences may be complemented by Low Complexity Regions-MSA analysis akin to My DST-NNMCB proceeding derivative work of Rune Linding Lab, Denmark Kinome Rendering Work furthered during my Pre Doctorate at SggW, Poland during Erasmus Mundus.

(xiii) Intron-Exon distributions in clpP, ycf3 genes may be set theoretically analysed using Venny plots. Partial sequence overlaps in gene-Codon pairs, may be visualized using SPAdes ICARUS browser from Center for Algorithmic Bioinformatics, SPbU Russia, since SPAdes was already run anyway.

(xiv) Contraction-Expansion of IR regions may be modeled, off-bit using Compression-Rarefaction profiles inspired by Modern Acoustic Physics.

(xv) Strong AT/GC bias observations can neatly be further strengthened using Kurtosis calculations.

(xvi) Repeat structure can be subject to justification using pattern recognition as per Peer Reviewer's communication to Dr Liela Taher, University of Nuremberg, Germany based upon Elementary Cellular Automata.

(xvii) Sequence divergence can be modeled using Entropy based decision trees as per my Seminar at Polish Academy of Sciences: Nencki Institute on SuperReference genome

(xviii) Specificity of Medicinal and Wild type molecular markers can be assessed better by Peer Reviewer's Octa filial homozygosity computations Schema to be presented Orally at IRRI, Philippines Germplasm to Genome Conference at NASC, New Delhi, to which all the Authors and their research community are being cordially invited.

(xix) Selective pressure events may be better deciphered using Gene Expression studies, both transcriptomic and small RNA interference, under Drought and Stress levels.

(xx) Transcription Factor DNA motif analysis can be confirmed using JASPAR database, ChIP seq approaches using MAC14 and gem-Tools at Bioinformatics level, and computationally augmented by Molecular Information Theory conception developed by Dr Thomas D. Schneider at NCI, National Cancer Institute, USA.

(xxi) Molecular markers at a distinguishable level for S. grosvenorii and S. siamensis can be ambitiously characterized by Tagging with Biodegradable IoT/ Internet of Things, please refer to my TEDx talk delivered 2015 at Skopje, Former Yugoslavian Republic of Macedonia.

(xxii) Primer identification of the molecular markers can be in silico cross-validated using Primer-BLAST and ApoE for instance.

(xxiii) The Four Molecular Markers GSPC-F/R, GSPR-F/R, GSPB-F/R, GSPY-F/R can be fitted into Confusion Matrix parameters namely True positives, Negatives and False Positives, Negatives so as to successfully subject the Binary classification of the two Chloroplast species in a Mathematically coherent manner. Thank you.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Surbhi sharma

Reviewer #2: No

Reviewer #3: Yes: PRAHARSHIT SHARMA

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2019 Dec 20;14(12):e0226865. doi: 10.1371/journal.pone.0226865.r002

Author response to Decision Letter 0


24 Oct 2019

Dear Dr Shashi Kumar:

We would like to thank you and the three reviewers for the time spent reviewing the manuscript "Complete chloroplast genomes of two Siraitia Merrill species: comparative analysis, positive selection and novel molecular marker development" (Manuscript Number: PONE-D-19-14994), and for giving us the opportunity to revise it. We have improved the manuscript and added new supporting information according to the reviewers’ suggestions. Reviewer #3, who is skilled in bioinformatics, put forward many constructive comments involving many software tools and genetic algorithms. The software used in our study has been published and cited in authoritative journals, and we did not aim to compare different software tools and algorithms. Nevertheless, we did our best to respond to the third reviewer’s suggestions. Please find our point-by-point responses to the reviewers’ comments below.

We look forward to hearing from you regarding our revised manuscript.

Best regards,

Xiaojun Ma, Professor

Institute of Medicinal Plant Development, Chinese Academy of Medical Science & Peking Union Medical College

Phone: (+86) 010-5783-3155

Email: mayixuan10@163.com

Reviewer #1: The paper assembles and analyses two complete chloroplast genomes of the Siraitia genus. The paper covers most aspects of comparative genome analysis between the two Siraitia species, and the authors have developed molecular markers to distinguish the two species. Overall the paper is well written; however, there are a few suggestions, listed below:

1) Line 46: There is no reference for this statement. Please add a reference to this statement.

Response: Thanks for the suggestion. We have added a reference to the statement.

2) Line 56: "Unstable treatment effect" statement is not clear.

Response: We thank the reviewer for the suggestion. “Unstable treatment effect” has been changed to “poor therapeutic effect”.

3) Line 62: "most origin plants": it is not clear which origin plants the author is referring to.

Response: The sentence “To reduce the proportion of staminiferous plants and plant viral disease, most of the species have been collected and are cultivated via tissue culture to improve production, whereas most origin plants were obtained privately from wild resources without professional identification and named with ordinary variety names.” has been revised as “Most of the Siraitia plants origin privately from wild resources without professional identification and named with ordinary variety names.”

4) Line 168: Please mention the version of Modeltest used.

Response: We have added the version number (3.7) for Modeltest.

5) Line 197: Please add some description of the figure shown " The quadripartite structure includes two copies of an IR region (IRA and IRB) that separate large single-copy (LSC) and small single copy (SSC) regions ".

Response: According to reviewer's suggestion, we have described the figure “ Circular Gene map of the complete chloroplast genomes of S. grosvenorii and S. siamensis. The quadripartite structure includes two copies of an IR region (IRa and IRb) that separated by (LSC) and SSC regions. Genes drawn in the circle are the transcribed clockwise, and those on the outside are transcribed counter-clockwise. The darker gray area in the inner circle show the GC content, whereas the lighter corresponds to AT content. Different genes groups are colored.” .

6) Line 199: It would be easier if you could represent the chloroplast genome in linear form also for better and faster visualization.

Response: Thanks for the suggestion. We have added “S1 Figure” to the supporting information, which displays the chloroplast genome in linear form. Because the chloroplast genome is a single circular molecule with a typical quadripartite structure, we retain the circular representation in the main manuscript.

7) Line 230: Add reference of "S5 Table" next to line " In addition, the rps12 gene contained three exons and one intron because of trans-splicing, which resulted in a 5’ end exon located in the LSC region, whereas the remaining exons were located in the IRs".

Response: It is a good idea. The position of the "S5 Table" has been corrected.

8) Line 242: Does "SC" written in this line refers to SSC region of chloroplast genome?

Response: “SC” means “single-copy”. We have described SC in the Chloroplast Genome Assembly paragraph of the Materials and Methods as “the four boundaries between the single-copy (SC) regions and IR regions…”.

9) Line 315: The distribution of the SSR loci in the chloroplast genome could also be described; giving the details of their location (coding region or intergenic spacer region) in tabular form would be helpful.

Response: Thanks for the suggestion. We have added S9 Table (XLSX, Distribution of the SSRs loci in chloroplast genome of S. grosvenorii and S. siamensis.) to the supporting information, which contains detailed SSR information for the two species.

10) Line 392: Please correct the spelling of "freedom"; it is written as "freedon".

Response: The mistake has been corrected.

11) Line 437: Please rewrite the description of Fig 7 in little more detail.

Response: The description of Fig 7 has been revised as “Fig 7. Schematic diagram displayed the four novel molecular markers. The indel makers in the intergenic spacers, including GSPC-F/R, GSPR-F/R, and SCPB-F/R, and ycf1 SNP were verified in Siraitia species with five individuals. (A) Sequencing results showed that PCR amplification by GSPC-F/R between the ndhC and trnV-UAC in S. grosvenorii and S. siamensis were 790 bp and 808 bp, respectively. The difference can also be found by GSPR-F/R (B), and SCPB-F/R (C) between the two Siraitia species. (D) GSPY-F/R marker for ycf1 SNP sites were validated in each cases, which were effective to discriminate the two Siraitia species.”

12) Line 590: Reference number 43 is not written in the prescribed format.

Response: The reference has been corrected as “Beier S, Thiel T, Münch T, Scholz U, Mascher M (2017) MISA-web: a web server for microsatellite prediction. Bioinformatics 33: 2583-2585. doi: 10.1093/bioinformatics/btx198.”

Reviewer #2: This study reports the complete plastome of Siraitia grosvenorii, a traditional Chinese medicine with antidiabetic, antioxidative and anti-inflammatory properties, and Siraitia siamensis, its wild relative, having 158751 bp and 159190 bp respectively. High resolution novel molecular markers GSPC-F/R, GSPR-F/R, GSPB-F/R and GSPY-F/R were developed to distinguish between the two species. Phylogenetic analysis of the two species within the Cucurbitaceae family placed the two species close to M. kirilowii.

Minor Comments

Material and Methods

Chloroplast genome assembly:

1) Mention the parameters used while assembling using SPAdes.

Response: Thanks for the suggestion. The parameters have been added to the manuscript. SPAdes was run for the assembly with the parameters “-k 21,33,55,77,99,127 --careful”.
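As a minimal sketch of how the quoted parameters fit into a full SPAdes invocation, the snippet below only builds and prints a command line; it does not run the assembler. The read file names and output directory are hypothetical placeholders, and the paired-end input layout (`-1`/`-2`) is an assumption, since the response states only the `-k` list and the careful mode.

```python
import shlex


def spades_command(reads_1, reads_2, out_dir):
    """Build (but do not run) a SPAdes command line with the quoted parameters."""
    return [
        "spades.py",
        "-1", reads_1,               # forward reads (hypothetical file name)
        "-2", reads_2,               # reverse reads (hypothetical file name)
        "-k", "21,33,55,77,99,127",  # k-mer sizes quoted in the response
        "--careful",                 # mismatch/indel correction mode quoted in the response
        "-o", out_dir,               # output directory (hypothetical)
    ]


cmd = spades_command("reads_R1.fastq.gz", "reads_R2.fastq.gz", "spades_out")
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` later without shell quoting issues.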

2) Why was a dedicated de novo genome assembler such as NOVOPlasty, which is specific to plant chloroplast genomes, not used?

Response: Thanks for the suggestion. In this study, the raw data were acquired and analyzed in March 2017, when NOVOPlasty was not yet widely applied to chloroplast genome assembly. After reading the comments, we downloaded the paper by Nicolas et al. (NOVOPlasty: De novo assembly of organelle genomes from whole genome data) and learned that NOVOPlasty is an effective assembler for plant chloroplast genomes. Therefore, we used NOVOPlasty to assemble the chloroplast genomes of the two species for validation. The results showed that the NOVOPlasty assembly was identical to that produced by SPAdes and CLC Genomics Workbench, indicating the reliability of our assembly.

3) How do the results differ if such a specific genome assembler is used?

Response: It took only a few hours to assemble the two Siraitia chloroplast genomes with NOVOPlasty. The results showed that the NOVOPlasty assembly was identical to the earlier assembly by SPAdes and CLC Genomics Workbench. Therefore, we will use NOVOPlasty for chloroplast genome assembly of other plant species in future work.
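One way such an identity check between two assemblies could be done is sketched below. This is an illustration, not the authors' procedure: because a chloroplast genome is circular, two assemblers may report the same molecule with a different start coordinate or on the opposite strand, so the hypothetical helper `same_circular_sequence` treats rotations and reverse complements as equal. It does not handle the alternative isomer in which the SSC region is inverted.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def same_circular_sequence(a, b):
    """True if b equals a up to rotation and/or reverse complement."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        return False
    doubled = a + a  # every rotation of a is a substring of a + a
    return b in doubled or reverse_complement(b) in doubled


# Toy sequences standing in for two assemblies of the same circular genome.
assembly_1 = "AACGTTGC"
assembly_2 = "GTTGCAAC"  # same circle, different start coordinate
print(same_circular_sequence(assembly_1, assembly_2))
```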

Genome annotation, repeats and simple sequence repeats (SSRs) analyses:

1) DOGMA is no longer supported. Annotate using newer plastid annotation tools.

Response: In this work, the chloroplast genomes were primarily annotated with CPGAVAS, which is an effective tool for the complete annotation of chloroplast genomes. A few annotations from CPGAVAS were ambiguous, and these were re-annotated with DOGMA. By combining CPGAVAS with DOGMA, the chloroplast genomes of the two Siraitia species were annotated completely, and all the genes were checked by BLAST against the NCBI database for verification.

Reviewer #3: First of all, I would like to acknowledge Dr Shashi Kumar, Academic Editorial Board member of PLOS ONE, for providing me this scholastic opportunity to peer review, and for his patience and forbearance whilst receiving my reviewer comments. At the outset, I am glad to congratulate the respected authors from the Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, PRC: Drs. Xiaojun Ma, Hongwu Shi, Meng Yang, Changming Mo, Wenjuan Xie, Chang Liu and Bin Wu, for having worked assiduously on such a significant paper on chloroplast de novo assembly and sequence analysis, with deeply underpinned significance in medicinal plant breeding.

This research article is worthy to be accepted for publication in PLOS ONE, since it satisfies each of the below Criteria:

1. The study presents the results of original research.

2. Results reported have not been published elsewhere.

3. Experiments, statistics, and other analyses are performed to a high technical standard and are described in sufficient detail.

4. Conclusions are presented in an appropriate fashion and are supported by the data.

5. The article is presented in an intelligible fashion and is written in standard English.

6. The research meets all applicable standards for the ethics of experimentation and research integrity.

7. The article adheres to appropriate reporting guidelines and community standards for data availability.

(i) The 333bp disparity between Chloroplast genome assemblies of grosvenorii and siamensis can be carefully addressed using Compositional data analysis paradigm for checking INTEGRITY of genome assemblies carried out thus far expected to be published by the Peer Reviewer soon in NAR-Genomics Bioinformatics CoDA issue CfP. The authors are kindly requested to revisit in this regard, my works on FastQ-ome (15th SocBiN, Moscow, Russia) and IUPAC-IUB probability conservation (DFG- Hyderabad).

Response: Thanks for the suggestions. The assemblies of the two chloroplast genomes were verified with the dedicated de novo assembler NOVOPlasty, and the boundary sequences were validated by PCR and sequencing, which indicates the integrity of the genomes. We were glad to revisit the reviewer’s work on FastQ-ome and learned a lot from the paper “A random forest ensemble of FastQ reads as decision trees”.

(ii) The 'Cucurbitaceae' phylogenetic divergence lineage is suggested to be revised and precisely expressed in terms of Mya or Bya using Carbon-dating.

Response: Thanks for the suggestion. Phylogenetic divergence can be expressed in different ways. Xia et al. reported that S. grosvenorii diverged from members of the Cucurbitaceae family approximately 40.9 million years ago, based on phylogenetic analysis (DOI: 10.1093/gigascience/giy067). In this manuscript, we constructed the Cucurbitaceae phylogeny at the molecular level based on coding sequences in the chloroplast genomes, which contain both conserved and divergent sequences. Expressing the divergence in terms of Mya or Bya is not the focus of our study.

(iii) The sweetness indices may be more favorably expressed in terms of Aspartame, an artificial sweetener, which is ~400 times sweeter than Sucrose.

Response: We agree with this point. Aspartame is a more favorable reference for expressing sweetness. To remain faithful to the cited source, we revised the sentence to “and is approximately 560 times sweeter than sucrose, and is about 1.4 fold sweeter than aspartame”.
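The 1.4-fold figure follows from dividing the two sucrose-relative values, taking the reviewer's approximate value of 400 for aspartame as given. The symbols below are introduced here only for illustration:

```latex
\[
\frac{S_{\text{siamenoside I}}}{S_{\text{aspartame}}}
  \approx \frac{560}{400} = 1.4
\]
```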

(iv) Chinese Pharmacopoeia and Guangxi Medical Botanical Garden of IMPLAD can please share the 2 Germplasm accessions with NBPGR, National Bureau of Plant Genetic Resources, New Delhi (India) for better data interoperability. Also, apart from GenBank/ NCBI, kindly deposit and archive in Gigascience as well. Hopefully, RefSeq assemblies shall be annotated soon, with Feature table flat files made available, such as rRNA, tRNA, CDS.

Response: Thanks for the suggestion. First, we are pleased to share the 2 germplasm accessions with NBPGR (India) for better data interoperability, within the scope of the laws and regulations of the P.R.C. Second, we have uploaded the two chloroplast genome sequences to GenBank/NCBI with annotation, including the rRNA, tRNA and CDS features. When the manuscript is accepted, we will release all the data. We will deposit the data in Gigascience once the manuscript is published.

(v) The two INDEL markers and 3 Aconitum species may be plotted in a Ternary diagram for ease of Visualisation for readers.

Response: Plotting the two INDEL markers and 3 Aconitum species in a ternary diagram is a good idea for ease of visualisation. We will use this display method in future writing. In this manuscript, the “two INDEL markers and 3 Aconitum species” statement in the Introduction summarizes results from other authors. Readers who want more detail can consult the original article.

(vi) Apart from Trimmomatic for the Adapter removal, fastx-clipper and Cutadapt may be employed for comparative validation; fastqc_data.txt files may be auto-parsed for text-mining the adapter sequences using Natural Language Processing from master Adapter files, such as Illumina Universal Sequence adapters.

Response: Thanks for the reviewer’s suggestion. Trimmomatic is widely used for adapter removal from raw sequencing data, and related articles have been published in authoritative journals such as Nature Protocols (DOI: 10.1038/nprot.2016.011) and Genome Biology (DOI: 10.1186/s13059-014-0517-9). Both fastx-clipper and Cutadapt are suitable tools for adapter removal, and Illumina Universal Sequence adapters are useful for comparative validation. We will improve our methods in future research.

(vii) SRA-BLASTN may be preferably mentioned over simply BLASTN under "Chloroplast genome assembly" section, since Illumina paired end reads are the BLAST database.

Response: Thanks for the suggestion. We apologize that the description of the methods in the "Chloroplast genome assembly" section was too brief; it has been improved. We have replaced “BLASTN” with “SRA-BLASTN”.

(viii) Apart from SPAdes and CLC Genomics Workbench, Minia developed by Rayan Chikhi, Inria (France) can also please be used since it is Computationally efficient.

Response: Thanks for the suggestion. Apart from SPAdes and CLC Genomics Workbench, we also used the dedicated de novo assembler NOVOPlasty, which is specific to plant chloroplast genomes. We will adopt Minia in further research, because it is also a good tool.

(ix) SSR analysis may be re-performed in line with Peer Reviewer's FAO, Rome abstract communicated to 8th International Conference on Agricultural Statistics, IASRI New Delhi, toward which all the Chinese authors are cordially invited to attend my presentation, as a self-cited extension of my Indo-Belarusian and 2nd IOCBS, Kolkata works on k-Mer based permutative Confusion matrix analysis of mono-, di-, tri-, tetra- nucleotide frequencies with Five number summary by Sourmash, jellyfish.

Response: Thanks for the suggestions. We have re-performed the SSR analysis and added S9 Table (XLSX) to the supporting information, which describes the distribution of the SSR loci in the chloroplast genomes in detail.
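To illustrate what locating SSR loci involves, here is a minimal regex-based sketch. It is not the tool used in the study; the minimum repeat counts in `MIN_REPEATS` are hypothetical thresholds chosen for the example, not the settings from the manuscript, and only mono-, di- and trinucleotide motifs are covered.

```python
import re

# Hypothetical minimum repeat counts per motif length (not from the manuscript).
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeat_count) tuples, 1-based inclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                count = (m.end() - m.start()) // unit_len
                hits.append((m.start() + 1, m.end(), motif, count))
    return sorted(hits)


toy = "GG" + "A" * 12 + "CC" + "AT" * 6 + "GGC"
for hit in find_ssrs(toy):
    print(hit)
```

Intersecting the reported coordinates with annotated gene intervals would then give the coding versus intergenic breakdown that Reviewer #1 asked for.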

(x) Relative Synonymous Codon Usage (RSCU) may be reperceived in the light of Chargaff's rules and Wobble hypothesis. Refer to my Big Data Genetic Code bit.ly/bdgcode

Response: Thanks for the suggestion. We analyzed the relative synonymous codon usage (RSCU) with the software CodonW (1.4.4). CodonW is written in standard ANSI C and compiles cleanly using the GNU C compiler (version 2.7.2.1) with the stringent ANSI and pedantic command line switches. In addition, Cusp and Compseq in EMBOSS (v.6.3.1) were used to analyze codon usage frequency and GC content. CodonW, Cusp and Compseq have been used in many papers, such as Weinel C et al., “General method of rapid Smith/Birnstiel mapping adds for gap closure in shotgun microbial genome sequencing projects: application to Pseudomonas putida KT2440”, Nucleic Acids Research, 2001, 29(22):E110.
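For readers unfamiliar with the statistic, the sketch below shows the usual RSCU definition: a codon's observed count divided by the mean count of all synonymous codons for the same amino acid. It is an illustration only and does not reproduce CodonW; the function name `rscu` and the toy sequence are made up for the example, and the standard codon assignments are assumed.

```python
from collections import Counter
from itertools import product

# Standard codon assignments in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for amino acids observed in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids that never occur
        for c in family:
            # observed count / (total / number of synonymous codons)
            values[c] = counts[c] * len(family) / total
    return values


toy = rscu("GCTGCTGCTGCC")  # four alanine codons: GCT x3, GCC x1
print(toy["GCT"], toy["GCC"], toy["GCA"])
```

An RSCU above 1 indicates a codon used more often than expected under equal usage of its synonyms, and a value below 1 indicates one used less often.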

(xi) Coming to Hamming distane allowance of 30, apart from REPuter, RADMIDs subtool from RADtools (UK) may be implemented, even for deNovo SNP genotyping in the Chloroplast genomes.

Response: Thanks for the suggestions. In this manuscript, REPuter was used to analyze the repeats, including forward, palindromic, reverse, and complement repeats. REPuter is an authoritative software for repeat analysis, as described in the paper (‘REPuter: fast computation of maximal repeats in complete genomes’, DOI: 10.1093/bioinformatics/15.5.426). In recent years, REPuter has been widely used in chloroplast genome analyses of medicinal plants and has been cited frequently. We also believe that the RADMIDs subtool from RADtools is effective for repeat analysis. Every software has its own strengths and limitations. The RADMIDs subtool will be used in our further study.
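
As a small illustration of the Hamming-distance criterion mentioned in this exchange, the sketch below compares two equal-length segments. It is not REPuter's algorithm; the function names and the toy sequences are hypothetical. For a 30 bp repeat, three mismatches correspond to 90% identity, which matches the thresholds quoted in the S2 Fig legend (length ≥30 bp, identity ≥90%).

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return 1 - hamming_distance(a, b) / len(a)

# Hypothetical 30 bp segment and a copy carrying three substitutions.
segment = "ACGT" * 7 + "AC"
copy = list(segment)
for position in (0, 10, 20):
    copy[position] = "T"
copy = "".join(copy)

mismatches = hamming_distance(segment, copy)
similarity = identity(segment, copy)
```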

(xii) MAFFT Fourier transform-based Multiple Sequence Alignment of the 64 protein-coding gene sequences may be complemented by Low Complexity Regions-MSA analysis akin to my DST-NNMCB proceeding, a derivative work of the Rune Lindin Lab, Denmark Kinome Rendering Work, furthered during my Pre-Doctorate at SggW, Poland during Erasmus Mundus

Response: Thanks for the suggestion. In our manuscript, the MAFFT Fourier transform-based Multiple Sequence Alignment of 64 consensus protein-coding gene sequences was complemented by Low Complexity Regions-MSA analysis. MAFFT alignment has been used in many species for phylogenetic analyses of chloroplast genomes, and was described in Nucleic Acids Research in the paper titled ‘aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity’ (DOI: 10.1093/nar/gkt389).

(xiii) Intron-exon distributions in the clpP and ycf3 genes may be set-theoretically analysed using Venny plots. Partial sequence overlaps in gene-codon pairs may be visualized using the SPAdes ICARUS browser from the Center for Algorithmic Bioinformatics, SPbU Russia, since SPAdes was already run anyway.

Response: Thanks for the suggestion. Intron-exon distributions in clpP, ycf3 and the other genes that contain introns and exons can be found in Fig 4. Partial sequence overlaps in gene-codon pairs can also be visualized in Fig 4.

(xiv) Contraction-Expansion of IR regions may be modeled, off-bit using Compression-Rarefaction profiles inspired by Modern Acoustic Physics.

Response: This is a good idea. The Compression-Rarefaction profiles inspired by Modern Acoustic Physics are creative. We would like to try this method for modeling the contraction-expansion of IR regions, although our team has no experience with it. If possible, we would be pleased to learn about it and discuss it with you.

(xv) Strong AT/GC bias observations can neatly be further strengthened using Kurtosis calculations.

Response: Thank you for the recommendation. Kurtosis calculation is a method for dealing with large datasets that follow a Gaussian distribution. In our study, the GC content of the four regions (LSC, SSC, IRa, and IRb) was analyzed, and the two species had approximately 36.8% GC. We attempted to analyze the AT/GC bias by kurtosis calculations, but it was difficult to assign the parameters for the four bases, and there are few references on analyzing AT/GC bias using kurtosis calculations.
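
Kurtosis itself is a summary statistic (the fourth standardized moment), so it can be computed directly from, for example, per-window GC fractions. The sketch below illustrates one way this might be done; the window size, function names and toy sequence are hypothetical and are not taken from the manuscript.

```python
def gc_fractions(seq, window, step):
    """GC fraction of each full window along seq."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions

def excess_kurtosis(values):
    """Population excess kurtosis, m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3.0

# Hypothetical toy sequence, for illustration only.
toy = "ATATATATGCGCGCGCATATGCGC"
fractions = gc_fractions(toy, window=4, step=4)
kurt = excess_kurtosis(fractions)
```

A strongly negative value, as in this toy case, reflects a two-peaked distribution of window GC fractions rather than a single bell-shaped one.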

(xvi) Repeat structure can be subject to justification using pattern recognition as per the Peer Reviewer's communication to Dr Liela Taher, University of Nuremberg, Germany, based upon Elementary Cellular Automata.

Response: Thanks for the suggestion. In our manuscript, repeat structure was identified with REPuter. We think that all results from bioinformatic analysis should be verified by experiments before being applied. Pattern recognition might make the analysis results more accurate than analysis with REPuter alone. If necessary, we will apply pattern recognition to the repeat structure when it is meaningful in application.

(xvii) Sequence divergence can be modeled using Entropy based decision trees as per my Seminar at Polish Academy of Sciences: Nencki Institute on SuperReference genome

Response: Entropy-based decision trees might be a new approach for sequence divergence analysis. We think the modeled analysis would make the divergence easier to visualize. We would like to learn how to model sequence divergence in this way, because we lack the details of your seminar at the Polish Academy of Sciences.

(xviii) Specificity of medicinal and wild-type molecular markers can be assessed better by the Peer Reviewer's Octa filial homozygosity computation schema, to be presented orally at the IRRI, Philippines Germplasm to Genome Conference at NASC, New Delhi, to which all authors and their research community are cordially invited.

Response: Thanks for the suggestion. The homozygosity computation schema for assessing the specificity of medicinal and wild-type molecular markers is a creative method. We will pay attention to the conference at NASC, New Delhi, and the molecular markers developed from the chloroplast genomes of the two Siraitia species will be assessed for homozygosity by a computational approach in the future.

(xix) Selective pressure events may be better deciphered using Gene Expression studies, Both transcriptomic and small RNA interference, under Drought and Stress levels.

Response: We agree with these points. Transcriptomics and small RNA interference are important approaches for deciphering selective pressure events. In further work, we will use these methods to verify the selective pressure events under drought and stress levels. In this manuscript, all the analyses were performed at the genome level.

(xx) Transcription Factor DNA motif analysis can be confirmed using JASPAR database, ChIP seq approaches using MAC14 and gem-Tools at Bioinformatics level, and computationally augmented by Molecular Information Theory conception developed by Dr Thomas D. Schneider at NCI, National Cancer Institute, USA.

Response: Thanks for the reviewer’s suggestions. Transcription factor DNA motif analysis is not the focus of this work. We will improve our work with these methods in the future.

(xxi) Molecular markers at a distinguishable level for S. grosvenorii and S. siamensis can be ambitiously characterized by tagging with Biodegradable IoT/Internet of Things; please refer to my TEDx talk delivered in 2015 at Skopje, Former Yugoslavian Republic of Macedonia.

Response: Thank you for the suggestion. In this study, we developed four molecular markers to distinguish S. grosvenorii and S. siamensis. We wish to explore more Siraitia resources to examine the applicability of the four molecular markers. Finally, we will tag the markers with Biodegradable IoT/Internet of Things.

(xxii) Primer identification of the molecular markers can be cross-validated in silico using Primer-BLAST and ApoE, for instance.

Response: We agree with this viewpoint. To ensure the accuracy of the markers, they were validated by PCR and sequencing.

(xxiii) The four molecular markers GSPC-F/R, GSPR-F/R, GSPB-F/R and GSPY-F/R can be fitted into confusion matrix parameters, namely true positives, true negatives, false positives and false negatives, so as to subject the binary classification of the two chloroplast species to a mathematically coherent treatment. Thank you.

Response: In our study, all four molecular markers GSPC-F/R, GSPR-F/R, GSPB-F/R and GSPY-F/R were validated in different individuals of the two species. As shown in Fig 7, the indel markers in the intergenic spacers (GSPC-F/R, GSPR-F/R, and GSPB-F/R) and the ycf1 SNP were verified in five individuals of each Siraitia species. Sequencing results showed that the PCR products amplified by GSPC-F/R between ndhC and trnV-UAC in S. grosvenorii and S. siamensis were 790 bp and 808 bp, respectively. Differences between the two Siraitia species can also be found with GSPR-F/R and GSPB-F/R. The GSPY-F/R marker for the ycf1 SNP sites was validated in each case and was effective in discriminating the two Siraitia species.
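
The reviewer's confusion-matrix framing could be sketched as follows, treating S. grosvenorii as the positive class and calling species from the GSPC-F/R amplicon length (790 bp versus 808 bp, as reported in the response). The nearest-length calling rule and the sample list are hypothetical illustrations, not methods or data from the study.

```python
# Reference amplicon lengths (bp) for GSPC-F/R, as reported in the response.
REFERENCE_LENGTHS = {"S. grosvenorii": 790, "S. siamensis": 808}

def call_species(amplicon_bp):
    """Assign the species with the closest reference length (assumed rule)."""
    return min(
        REFERENCE_LENGTHS,
        key=lambda species: abs(REFERENCE_LENGTHS[species] - amplicon_bp),
    )

def confusion_matrix(samples, positive="S. grosvenorii"):
    """Count TP, FP, TN and FN for (true_species, amplicon_bp) pairs."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for true_species, amplicon_bp in samples:
        predicted = call_species(amplicon_bp)
        if predicted == positive:
            counts["TP" if true_species == positive else "FP"] += 1
        else:
            counts["TN" if true_species != positive else "FN"] += 1
    return counts

# Hypothetical samples for illustration only; not measurements from the study.
samples = [
    ("S. grosvenorii", 790),
    ("S. grosvenorii", 805),  # deliberately aberrant length: a false negative
    ("S. siamensis", 808),
    ("S. siamensis", 807),
]
matrix = confusion_matrix(samples)
sensitivity = matrix["TP"] / (matrix["TP"] + matrix["FN"])
specificity = matrix["TN"] / (matrix["TN"] + matrix["FP"])
```

Sensitivity and specificity computed this way summarize how reliably a marker separates the two species once enough validated individuals are available.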

Attachment

Submitted filename: Response to Reviewers.doc

Decision Letter 1

Shashi Kumar

10 Dec 2019

Complete chloroplast genomes of two Siraitia Merrill species: comparative analysis, positive selection and novel molecular marker development

PONE-D-19-14994R1

Dear Dr. Xiaojun Ma,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Shashi Kumar, Ph.D.

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have incorporated the changes suggested in the previous review, which gives more clarity to the data and thus to the final study.

Reviewer #2: The authors have addressed all the comments raised on the previous version of the manuscript, which has improved the manuscript considerably. The manuscript is technically sound and the data support the conclusions drawn. All the statistical analyses have been rigorously performed. The manuscript meets PLOS ONE criteria and I am happy to accept this publication. Congratulations to the authors.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Shashi Kumar

13 Dec 2019

PONE-D-19-14994R1

Complete chloroplast genomes of two Siraitia Merrill species: comparative analysis, positive selection and novel molecular marker development

Dear Dr. Ma:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Shashi Kumar

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Chloroplast genomes of S. grosvenorii and S. siamensis in linear form.

    (TIF)

    S2 Fig. Distribution of the number of different type repeats and SSRs.

    (A) Repeat sequences in eight chloroplast genomes. Repeat sequences were identified by REPuter with length ≥30 bp and sequence identity ≥90%. F, P, R, and C are the abbreviations of the repeat types F (forward), P (palindrome), R (reverse) and C (complement), respectively. Repeat sequences of different lengths are colored correspondingly. (B) Analysis of simple sequence repeats (SSRs) in the chloroplast genomes of five species.

    (TIF)

    S3 Fig. Sliding window analysis of the whole chloroplast genome.

    Window length: 600 sites, Step size: 200 sites. X-axis: position of the midpoint of a window; Y-axis: nucleotide diversity (π) of each window. (A) Pi among S. grosvenorii and S. siamensis; (B) Pi among six species of Cucurbitaceae.

    (TIF)

    S1 Table. Primer sequences at the boundaries between single copy and IR regions.

    (DOCX)

    S2 Table. List of chloroplast genome sequences used in the study.

    (DOCX)

    S3 Table. Base composition in the chloroplast genomes of S. grosvenorii and S. siamensis.

    (DOCX)

    S4 Table. Genes contained in the chloroplast genomes of S. grosvenorii and S. siamensis.

    (DOCX)

    S5 Table. Location information of genes with introns in the chloroplast genome of S. grosvenorii and S. siamensis.

    (DOCX)

    S6 Table. Comparisons among the chloroplast genome characteristics of S. grosvenorii, S. siamensis, and six other Cucurbitaceae species.

    (DOCX)

    S7 Table. Codon usage and codon-anticodon recognition in all protein-coding genes of the chloroplast genomes of two Siraitia species.

    (DOCX)

    S8 Table. Types and amounts of SSRs in the S. grosvenorii and S. siamensis chloroplast genomes.

    (DOCX)

    S9 Table. Distribution of the SSRs loci in the chloroplast genome of S. grosvenorii and S. siamensis.

    (XLSX)

    Data Availability Statement

    The chloroplast genome sequencing data were deposited in GenBank under accession numbers MK755853 and MK755854, respectively.


    Articles from PLoS ONE are provided here courtesy of PLOS
