F1000Res. 2020 Jul 21;9:751. [Version 1] doi: 10.12688/f1000research.24396.1

The complete genome sequence of Stevia rebaudiana, the Sweetleaf

Kathleen O'Neill, Stacy Pirro
PMCID: PMC7676390  PMID: 33299551

Abstract

The Sweetleaf (Stevia rebaudiana: Asteraceae) is widely grown for use as a sweetener. We present the whole genome sequence and annotation of this species. A total of 146,838,888 paired-end reads comprising 22.2 Gb were obtained by sequencing one leaf from a commercially grown seedling. The reads were assembled de novo, and the resulting contigs were then scaffolded by alignment to related species. Annotation was performed with GeneMark-ES. The raw and assembled data are publicly available via GenBank: Sequence Read Archive (SRR6792730) and Assembly (GCA_009936405).

Keywords: Stevia rebaudiana, Sweetleaf, genome, assembly, annotation

Introduction

The Sweetleaf (Stevia rebaudiana: Asteraceae) is cultivated commercially for use as a sweetener. Its sweetness comes from a group of steviol glycosides, chiefly stevioside and rebaudioside. These compounds are 200 to 300 times as sweet as sugar (Abdullateef & Osman, 2012) but contribute no calories. The market for raw Stevia and derived products is expected to exceed USD 1 billion by 2021 (International Stevia Council, 2017).

Stevia rebaudiana has been used as a sweetener for centuries in Brazil and Paraguay (Misra et al., 2011). The botanist Moisés Santiago Bertoni first described the plant as growing in eastern Paraguay and noted its use as a sweetener (Bertoni, 1899).

The chemists Bridel and Lavielle isolated stevioside and rebaudioside, the glycosides that give the leaves their sweet taste (Bridel & Lavielle, 1931). The chemical structures of the aglycone, steviol, and of its glycoside were subsequently determined (Mosettig & Nes, 1955).

A complete genome sequence for this species will help in discovering markers for crop yield, disease resistance, and drought resistance, and in determining the biochemical pathways that produce these metabolites.

Methods

A single commercially grown Stevia rebaudiana plant (Behnke Nurseries, Beltsville, MD, USA) was used for this study. DNA was extracted from a single leaf with the Qiagen DNeasy genomic DNA extraction kit for plants, following the standard protocol. A paired-end sequencing library was constructed with the Illumina TruSeq kit according to the manufacturer's instructions and sequenced on an Illumina HiSeq platform in 2 × 150 bp paired-end format.

The resulting FASTQ files were trimmed of adapter/primer sequences and low-quality regions with Trimmomatic v0.33 (Bolger et al., 2014). The trimmed reads were assembled with SPAdes v2.5 (Bankevich et al., 2012), followed by a finishing step with RagTag v1.0.0 (Alonge, 2020), which made additional contig joins based on regions conserved in three related Asteraceae species: Erigeron canadensis (GCA_010389155), Mikania micrantha (GCA_009363875), and Helianthus annuus (GCA_002127325). Default parameters were used for all assembly steps.
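The reference-guided finishing step can be pictured as ordering and orienting contigs by where they align on a related genome. The Python sketch below is a simplified illustration of that idea, not RagTag's implementation: it assumes each contig already has a single best alignment (the hypothetical `Hit` record, which in practice would come from an aligner such as minimap2), sorts contigs by reference start, reverse-complements contigs that align to the minus strand, and joins neighbours with a run of Ns. The 100-N default gap, the `_scaffold` naming, and all function names are illustrative assumptions.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass
class Hit:
    """Hypothetical best alignment of one contig to a related reference genome."""
    contig: str
    ref_chrom: str
    ref_start: int
    strand: str  # "+" or "-"


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def scaffold(contigs: dict[str, str], hits: list[Hit], gap: int = 100) -> dict[str, str]:
    """Order and orient aligned contigs along each reference chromosome and join
    neighbours with `gap` Ns; contigs without a hit are passed through unchanged."""
    by_chrom: dict[str, list[Hit]] = {}
    for hit in hits:
        by_chrom.setdefault(hit.ref_chrom, []).append(hit)

    scaffolds: dict[str, str] = {}
    placed: set[str] = set()
    for chrom, chrom_hits in by_chrom.items():
        parts = []
        # Contig order follows the alignment position on the reference chromosome.
        for hit in sorted(chrom_hits, key=lambda h: h.ref_start):
            seq = contigs[hit.contig]
            parts.append(seq if hit.strand == "+" else revcomp(seq))
            placed.add(hit.contig)
        scaffolds[f"{chrom}_scaffold"] = ("N" * gap).join(parts)

    # Unaligned contigs are kept as they are.
    for name, seq in contigs.items():
        if name not in placed:
            scaffolds[name] = seq
    return scaffolds


if __name__ == "__main__":
    contigs = {"c1": "AAAC", "c2": "GGTT", "c3": "CCCC"}
    hits = [Hit("c2", "chr1", 500, "-"), Hit("c1", "chr1", 10, "+")]
    print(scaffold(contigs, hits, gap=3))
    # {'chr1_scaffold': 'AAACNNNAACC', 'c3': 'CCCC'}
```

A real scaffolder also filters ambiguous or low-confidence alignments and decides what to do with contigs that align to several reference chromosomes; the sketch skips those steps to keep the ordering logic visible.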

Genes were predicted with GeneMark-ES v2.0 (Lomsadze et al., 2005), run fully de novo in its self-training mode (without a curated training set) with default parameters.
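Gene counts such as the one in the Results are typically tallied from the predictor's output file. Assuming the GeneMark-ES predictions are written in GTF, with a `gene_id "<id>"` attribute in column 9 (an assumption to check against the output of the version actually used), a minimal Python sketch counts distinct gene identifiers. The example lines are hypothetical and are not taken from this assembly.

```python
import re
from typing import Iterable

# Captures the value of a gene_id attribute in GTF column 9, e.g. gene_id "1_g";
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def count_genes(gtf_lines: Iterable[str]) -> int:
    """Count distinct gene_id values in GTF-formatted lines (comments and blanks skipped)."""
    genes: set[str] = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF feature line
        match = GENE_ID.search(fields[8])
        if match:
            genes.add(match.group(1))
    return len(genes)


# Hypothetical GeneMark-style GTF lines: two CDS features of one gene, one of another.
EXAMPLE_GTF = [
    "# illustrative only\n",
    'scaf1\tGeneMark.hmm\tCDS\t100\t300\t.\t+\t0\tgene_id "1_g"; transcript_id "1_t";\n',
    'scaf1\tGeneMark.hmm\tCDS\t400\t600\t.\t+\t1\tgene_id "1_g"; transcript_id "1_t";\n',
    'scaf2\tGeneMark.hmm\tCDS\t50\t250\t.\t-\t0\tgene_id "2_g"; transcript_id "2_t";\n',
]

if __name__ == "__main__":
    print(count_genes(EXAMPLE_GTF))  # 2
```

On a real file the same function accepts an open file handle, for example `count_genes(open("genemark.gtf"))`, where the file name is a hypothetical placeholder. Counting distinct `gene_id` values rather than lines avoids counting multi-exon genes more than once.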

Results

The assembly has a total length of 411,383,069 bp in 55,557 scaffolds, with a scaffold N50 of 37,276,437 bp. The GeneMark-ES annotation predicted 24,994 genes.
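The N50 reported above is the length L such that scaffolds of length L or longer contain at least half of the total assembled sequence. The Python sketch below computes it from a list of sequence lengths, together with a mean read depth obtained by dividing sequenced bases by a reference length. Plugging in the abstract's rounded 22.2 Gb of raw (pre-trimming) sequence and the 411,383,069 bp assembly gives roughly 54× depth over the assembled sequence; against the ~1.3 Gb genome size suggested in the third review, the same reads would give roughly 17×. Function names are illustrative.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer test avoids rounding at exactly half
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


def mean_depth(total_bases: int, reference_length: int) -> float:
    """Average number of sequenced bases per base of the reference (e.g. the assembly)."""
    return total_bases / reference_length


if __name__ == "__main__":
    print(n50([10, 8, 5, 3, 2]))  # 8, since 10 + 8 = 18 >= 28 / 2
    print(round(mean_depth(22_200_000_000, 411_383_069), 1))    # 54.0
    print(round(mean_depth(22_200_000_000, 1_300_000_000), 1))  # 17.1
```

Because N50 depends only on the length distribution, it says nothing about whether joins are correct; that is one reason reviewers ask for statistics before and after reference-guided scaffolding.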

Data availability

Underlying data

Raw and assembled data are publicly available via GenBank:

Raw sequence reads of Stevia rebaudiana, Accession number SRR6792730: https://www.ncbi.nlm.nih.gov/sra/?term=SRR6792730

Assembly of Stevia rebaudiana (ASM993640v1), Accession number GCA_009936405.1: https://www.ncbi.nlm.nih.gov/assembly/GCA_009936405.1/

Funding Statement

This study was supported by IRIDIAN GENOMES (IRGEN-55670).

[version 1; peer review: 2 approved]

References

  1. Abdullateef RA, Osman M: Studies on effects of pruning on vegetative traits in Stevia rebaudiana Bertoni (Compositae). Int J Biol. 2012;4(1). doi: 10.5539/ijb.v4n1p146
  2. Alonge M: RagTag: Reference-guided genome assembly correction and scaffolding. GitHub archive. 2020. doi: 10.5281/zenodo.3887140
  3. Bankevich A, Nurk S, Antipov D, et al.: SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J Comput Biol. 2012;19(5):455–477. doi: 10.1089/cmb.2012.0021
  4. Bertoni MS: Revista de Agronomia de l'Assomption. 1899.
  5. Bolger AM, Lohse M, Usadel B: Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics. 2014;30(15):2114–20. doi: 10.1093/bioinformatics/btu170
  6. Bridel M, Lavielle R: Sur le principe sucré des feuilles de kaa-he-e (stevia rebaundiana B). Comptes rendus de l'Académie des Sciences. 1931;192:1123–5.
  7. International Stevia Council: Industry data report, September 2017. 2017.
  8. Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, et al.: Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 2005;33(20):6494–6506. doi: 10.1093/nar/gki937
  9. Misra H, Soni M, Silawat N, et al.: Antidiabetic activity of medium-polar extract from the leaves of Stevia rebaudiana Bert. (Bertoni) on alloxan-induced diabetic rats. J Pharm Bioallied Sci. 2011;3(2):242–8. doi: 10.4103/0975-7406.80779
  10. Mosettig E, Nes WR: Stevioside. II. The structure of the aglucon. J Org Chem. 1955;20(7):884–899. doi: 10.1021/jo01125a013
F1000Res. 2020 Nov 18. doi: 10.5256/f1000research.26914.r74315

Reviewer response for version 1

Dawson White 1

Stevia is an economically important plant with no prior genomic resource for researchers. Standard methods were used for laboratory preparation, sequencing, assembly, and annotation of the genome. The resulting assembly and gene annotations are good based on the provided statistics.

I would like to see a voucher specimen of the sequenced plant (or a similar individual) from Behnke Nurseries deposited in a herbarium. This is important for reproducible science, and its omission almost caused me to mark the methods as only "partly" sufficient.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Plant systematics.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2020 Aug 7. doi: 10.5256/f1000research.26914.r67544

Reviewer response for version 1

Andrew Miller 1,2

The authors present the whole genome sequence of Sweetleaf, the plant from which the sweetener Stevia is derived. I am surprised no one has sequenced this genome before; it is certainly timely. The data are well presented and publicly available through NCBI. The paper is well written and worthy of indexing. I have only minor comments:

  1. Should the keywords repeat words found in the title?

  2. Can more be said in the Results about the genome?

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Sanger sequencing; mycology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

F1000Res. 2020 Aug 7. doi: 10.5256/f1000research.26914.r68815

Reviewer response for version 1

Eric Lu Zhang 1

The authors sequenced and assembled the genome of Sweetleaf from Illumina short reads and annotated the genes by de novo prediction. The data could be useful to the field, but some points need to be clarified:

  1. Commonly, the first step of genome assembly is to estimate the genome size from the k-mer frequency distribution. The authors should add such an analysis in the revision.

  2. Based on the results in (1), the genome size of Stevia rebaudiana is ca. 1.3 Gb. I don't think SPAdes is a good choice, as it works well only on small genomes.

  3. Gene prediction is too simple and may miss a considerable number of genes. The prediction should integrate multiple lines of evidence, such as de novo prediction, homology-based gene prediction, etc.

  4. Add a citation for the "standard process" of DNA extraction and specify the model of the sequencing platform, e.g. HiSeq 2500.

  5. It would be better to include a table showing statistics such as N50, genome size, and the number of genes.

  6. Both the short-read-only assembly and the short-read + RagTag assembly should be provided, because RagTag may introduce some bias.
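The k-mer analysis requested in point 1 rests on a simple estimate: genome size is roughly the number of k-mer occurrences (after discarding low-depth k-mers that mostly come from sequencing errors) divided by the depth of the main peak in the k-mer histogram. The Python sketch below is a minimal illustration under stated assumptions: upper-case reads, canonical k-mers, k-mers containing N skipped, and an assumed error cut-off of depth 5. It ignores heterozygosity and repeats, which tools such as Jellyfish (counting) and GenomeScope (model fitting) handle at real genome scale; no numbers here come from the Stevia data.

```python
from collections import Counter
from typing import Iterable, Mapping

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmer_histogram(reads: Iterable[str], k: int) -> Counter:
    """Map depth -> number of distinct canonical k-mers seen at that depth.
    Assumes upper-case reads; k-mers containing N are skipped."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram: Mapping[int, int], min_depth: int = 5) -> tuple[int, float]:
    """Return (peak depth, estimated genome size) as k-mer occurrences / peak depth.
    Depths below min_depth are treated as sequencing errors (an assumed cut-off)."""
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers above the error cut-off")
    peak = max(usable, key=usable.get)  # depth with the most distinct k-mers
    total_occurrences = sum(depth * n for depth, n in usable.items())
    return peak, total_occurrences / peak


if __name__ == "__main__":
    # Hypothetical histogram: error k-mers at depth 1-2, main peak at depth 10.
    toy = {1: 1000, 2: 50, 10: 400, 11: 300, 20: 10}
    print(estimate_genome_size(toy))  # (10, 750.0)
```

A heterozygous diploid produces a second peak at about half the main depth; choosing the wrong peak would roughly double the estimate, which is why model-based tools are preferred for a real analysis.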

Are sufficient details of methods and materials provided to allow replication by others?

Partly

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Partly

Reviewer Expertise:

Computational genomics, bioinformatics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


Articles from F1000Research are provided here courtesy of F1000 Research Ltd
