Illumina sequencing data of the complete chloroplast genome of rare species Juniperus seravschanica (Cupressaceae) from Kazakhstan

Moldir Yermagambetova; Saule Abugalieva; Yerlan Turuspekov; Shyryn Almerekova

doi:10.1016/j.dib.2022.108866

. 2022 Dec 29;46:108866. doi: 10.1016/j.dib.2022.108866

Illumina sequencing data of the complete chloroplast genome of rare species Juniperus seravschanica (Cupressaceae) from Kazakhstan

Moldir Yermagambetova ^a,^b, Saule Abugalieva ^a,^b, Yerlan Turuspekov ^a,^b, Shyryn Almerekova ^a,^⁎

PMCID: PMC9850033 PMID: 36687154

Abstract

The species of the genus Juniperus L. play an important role in Kazakhstan forest ecosystems and one of them is Juniperus seravschanica Kom. which has been listed as a rare species in the Red Book of Kazakhstan. The distribution area of J. seravschanica extends from Central Asia (Kazakhstan, Uzbekistan, Kyrgyzstan, Tajikistan, and Turkmenistan) to northern and eastern Afghanistan, northern Pakistan, Kashmir, southeastern Iran, and Oman. J. seravschanica occurred in the southern part of Kazakhstan along with the ranges Karatau, Talas Alatau, Kyrgyz Alatau, Chu-Ili, Karzhantau, and Ugam. The distribution area of J. seravschanica is constantly decreasing due to intensive logging, forest fires, and excessive cattle grazing. The species has ecological importance in the stabilization of mountain slopes against erosion, for hydrobiological regulation, and as a significant medicinal herb. The species J. excelsa M. Bieb., J. polycarpos K.Koch (var. polycarpos and var. turcomanica R.P.Adams), and J. seravschanica are morphologically very similar with some difficulties in species identification. For a better understanding of the evolutionary relationship of these species in the Juniperus genus, it is important to obtain genetic information on the highly conserved chloroplast (cp) genome. Due to the conserved genomic structure, the cp genome nucleotide sequences are widely used in species distinguishing and reconstructing phylogenetic relationships. Unfortunately, there are no publicly available nucleotide sequences of cp genomes data for J. polycarpos (var. polycarpos and var. turcomanica), J. excelsa and J. seravschanica. We report the de novo assembly of the J. seravschanica chloroplast genome by applying next-generation sequencing technology based on Illumina NovaSeq 6000. The assembled cp genome of J. seravschanica is 127,609 bp in length and contained 118 genes, including 82 protein-coding genes, 32 transfer RNA genes, and 4 ribosomal RNA genes. In total 152 simple sequence repeats were identified in the chloroplast genome sequence of J. seravschanica. The Bioproject (PRJNA883033), Sequence Read Archive (SRR21673293), and GenBank (OL684343) data were deposited at National Center for Biotechnology Information.

Keywords: Cupressaceae, Juniperus seravschanica, Rare species, Illumina sequencing, Chloroplast genome, De novo assembly, Chloroplast SSRs

Specifications Table

Subject	Omics: Genomics
Specific subject area	Genomics, Forest ecosystem, Environmental science
Type of data	Tables, Figure
How the data were acquired	The data were acquired using the Illumina NovaSeq 6000 (San Diego, USA) sequencer and assembled with SPAdes v. 3.13.0
Data format	Raw data (fastq) and analyzed data (fasta)
Description of data collection	The fresh leaves of J. seravschanica were collected from the Turkistan region of Southern Kazakhstan and desiccated in silica gel. Total DNA was isolated from the leaves using the CTAB protocol [1]. The concentration and quality of the extracted DNA were checked by gel electrophoresis in agarose and Nanodrop 2000 spectrophotometry (Thermo Fisher Scientific, Wilmington, DE, USA). Paired-end sequencing was performed using NovaSeq 6000 platform (Illumina, San Diego, CA, USA).
Data source location	• Institution: Institute of Plant Biology and Biotechnology • City/Town/Region: Almaty • Country: Kazakhstan GPS coordinates for collected sample: 42.331250 N, 70.372583 E, altitude 1605 m.
Data accessibility	Repository name: National Center for Biotechnology Information Raw data are available in the Sequence Read Archive (SRA) under BioProject PRJNA883033 with SRA number SRR21673293. The complete chloroplast genome is available under accession number OL684343 Direct URL to data: https://www.ncbi.nlm.nih.gov/sra/PRJNA883033 (SRA) https://www.ncbi.nlm.nih.gov/nuccore/OL684343 (Nucleotide)

Open in a new tab

Value of the Data

•
The newly sequenced chloroplast genome data of J. seravschanica can be useful in plant molecular identification and evaluating phylogenetic relationships at the Juniperus genus level.
•
Researchers in molecular botany, genomics, and bioinformatics will benefit from these data.
•
The detected simple sequence repeats can be used in the development of potentially useful molecular markers and evaluation of genetic diversity in J. seravschanica populations and closely related species.

1. Objective

Chloroplast genome data can be used in species distinguishing and reconstructing plant evolutionary relationships due to the highly conserved genome structure. There are some difficulties in species identification for morphologically very similar Juniperus species J. excelsa, J. polycarpos (var. polycarpos and var. turcomanica), and J. seravschanica. Unfortunately, presently there are no publicly available nucleotide sequences of cp genomes data for these listed species. In the present study, we report de novo assembled data of the J. seravschanica cp genome by applying next-generation sequencing technology based on Illumina NovaSeq 6000. The genome assembly details and annotation for the J. seravschanica cp genome were described. The obtained data will provide valuable resources for plant molecular identification and evaluation of phylogenetic relationships at the genus level.

2. Data Description

Complete chloroplast genome sequencing using Illumina NovaSeq 6000 of J. seravschanica generated about 4 GB of raw data which consisting 24,772,052 paired-end reads with GC content of 34,45% and phred score of 94,39% (Q30) and 98,85% (Q20). The assembled chloroplast genome size of the J. seravschanica was 127,609 bp. The structure of the chloroplast genome is circular with a small single-copy region (SSC) and a large single-copy region (LSC). Fig. 1 presented a circular gene map of J. seravschanica chloroplast genome.

Fig 1 — Gene map of the *J. seravschanica* chloroplast genome.

Maximum parsimony concatenated phylogenetic tree based on matK and rbcL nucleotide sequences is given in Fig. 2. The phylogenetic tree separated Juniperus species into two clades which corresponding to the sections Juniperus and Sabina.

The Bioproject (PRJNA883033), Sequence Read Archive (SRR21673293), and GenBank (OL684343) data were deposited at National Center for Biotechnology Information.The chloroplast genome of J. seravschanica encoded 118 genes, including 82 protein-coding genes, 32 transfer RNA (tRNA) genes, and 4 ribosomal RNA (rRNA) genes (Table 1).

Table 1.

List of genes identified in the J. seravschanica cp genome.

Category	Group of genes	Name of genes
Self-replication	Transfer RNA	trnA-UGC, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC, trnH-GUG, trnI-CAU (x2), trnI-GAU, trnK-UUU, trnL-CAA, trnL-UAA, trnL-UAG, trnM-CAU (x2), trnN-GUU, trnP-UGG, trnQ-UUG (x2), trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UAC, trnW-CCA, trnY-GUA
	Ribosomal RNA	rrn16, rrn23, rrn5, rrn4.5
	Small subunit of ribosome	rps2, rps3, rps4, rps7, rps8, rps11, rps12, rps14, rps15, rps18, rps19*
	Large subunit of ribosome	rpl14, rpl16, rpl2, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36*
	DNA-dependent RNA polymerase	rpoA, rpoB, rpoC1, rpoC2*
	Translational initiation factor	infA
Genes for photosynthesis	Rubisco	rbcL
	Photosystem I	psaA, psaB, psaC, psaI, psaJ, psaM
	Photosystem II	psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
	ATP synthase	atpA, atpB, atpE, atpF, atpH, atpI*
	Subunits of cytochrome	petA, petB, petD, petG, petL, petN
	Chlorophyll biosynthesis	chlB, chlL, chlN
	NADH dehydrogenase	ndhA, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Other genes	Maturase	matK
	Protease	clpP
	Envelope membrane protein	cemA
	Subunit of acetyl-CoA	accD
	C-type cytochrome synthesis gene	ccsA
Genes of unknown function	Conserved open reading frames	ycf1, ycf2, ycf3, ycf4

Open in a new tab

Note: One or two asterisks indicate one or two intron-containing genes, respectively, (x2) indicates duplicated genes.

Among these 118 genes, three genes (trnI-CAU, trnM-CAU and trnQ-UUG) are duplicated, 16 genes (trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC, rps12, rpl16, rpl2, rpl23, rpoC1, atpF, petB, petD, ndhA and ndhB) contain one intron and one gene (ycf3) contain two introns. The overall GC content of the J. seravschanica assembled chloroplast genome was 35.05%.

In total, 152 simple sequence repeats (SSRs) were determined in J. seravschanica plastome by MicroSAtellite (MISA) [2]. Four types of SSRs were detected: 108 mononucleotides, 33 dinucleotides, 5 trinucleotides, and 6 tetranucleotides. Types and amounts of identified SSRs are provided in Table 2.

Table 2.

Types and amounts of simple sequence repeats (SSRs) in the J. seravschanica chloroplast genome.

SSR type	Repeat Unit	Ammount	Ratio (%)
Mono	A/T	106	98.1
Mono	C/G	2	1.9
Di	AC/GT	5	15.1
	AG/CT	13	39.4
	AT/AT	15	45.5
Tri	AAG/CTT	3	60
Tri	AAT/ATT	2	40
Tetra	AAAC/GTTT	1	16.7
	AAAG/CTTT	2	33.2
	AAGT/ACTT	1	16.7
	ACCT/AGGT	1	16.7
	ATCC/ATGG	1	16.7

Open in a new tab

Among these 152 SSR markers (Supplementary Table 1), 77 (50.7%) SSRs were located in intergenic region, 56 (36.8%) in protein-coding genes, 14 (9.2%) in introns, 3 (2%) in rRNA and 2 in tRNA (1.3%) (rrn23 and trnS-UGA, trnS-GCU, respectively). Most of the SSRs identified in J. seravschanica chloroplast genome were located in the intergenic and genic regions (87.5%).

3. Experimental Design, Materials and Methods

3.1. Plant Material and DNA Extraction

In this study, the fresh leaves of J. seravschanica were collected from the Turkistan region of Southern Kazakhstan (42.331250N, 70.372583E). Fresh leaves from J. seravschanica samples were desiccated in silica gel and stored at room temperature until DNA extraction. Then, the total DNA was isolated from the leaves under highly sterile conditions using the CTAB protocol [1]. The concentration and quality of the extracted DNA were checked by gel electrophoresis in agarose and Nanodrop 2000 spectrophotometry (Thermo Fisher Scientific, Wilmington, DE, USA).

3.2. Library Preparation and Sequencing

Library preparation and cp genome sequencing were conducted by Macrogen Inc. (Seoul, Korea). The library was performed with the TruSeq Nano DNA Kit (Illumina, USA). Paired-end sequencing was performed on Illumina NovaSeq 6000 sequencer based on sequencing by synthesis technology. Generated raw read Fastq format files were used for the genome assembly.

3.3. Genome Assembly and Annotation

For accurate genome assembly raw data were quality filtered. Reads in which 90% of the bases had a phred score of 20 or higher were used for assembly. After quality filtering, poly-G trimming was performed using fastp 0.19.4 with a quality phred option as 10 and an unqualified percent limit as 50. In order to reduce biases in the analysis, low-quality reads were removed using Trimmomatic [3]. After filtering, the library for J. seravschanica included 24,772,052 total reads. Trimmed reads were used for de novo assembly by SPAdes 3.13.0 [4] assembler approach. The complete genome contigs were combined into one contig by joining overlapping DNA segments of each contig. After the draft genome was assembled, the locations of protein genes were predicted and their functions were annotated using Prokka [5]. The circular map (Fig. 1) of the J. seravschanica cp genome was generated using Organellar Genome DRAW (OGDRAW) software [6].

3.4. Detection of SSR Markers in the Chloroplast Genome

Simple sequence repeats (SSRs) were determined by MISA software [2] with the following thresholds: eight for mononucleotide repeats, four for dinucleotide repeats, four for trinucleotide repeats, three for tetranucleotide repeats, three for pentanucleotide repeats, and three for hexanucleotide repeats. A total of 152 putative SSR markers were identified in the chloroplast genome sequence of J. seravschanica (Table 2).

Ethics Statements

The manuscript adheres to Ethical requirements for publication. The work does not involve studies with animals and humans.

CRediT authorship contribution statement

Moldir Yermagambetova: Methodology, Investigation, Data curation. Saule Abugalieva: Investigation, Writing – review & editing. Yerlan Turuspekov: Conceptualization, Investigation, Writing – review & editing. Shyryn Almerekova: Supervision, Conceptualization, Investigation, Software, Writing – original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research was funded by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (Grant number AP09259027).

Footnotes

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.dib.2022.108866.

Appendix. Supplementary materials

mmc1.docx^{(34.9KB, docx)}

Data Availability

Sequence Read Archive (Original data) (National Center for Biotechnology Information).

Complete chloroplast genome of Juniperus seravschanica (Original data) (National Center for Biotechnology Information).

References

1.Doyle J.J., Doyle J.L. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 1987;19:11–15. [Google Scholar]
2.Beier S., Thiel T., Münch T., Scholz U., Mascher M. MISA-web: a web server for microsatellite prediction. Bioinformatics. 2017;33:2583–2585. doi: 10.1093/bioinformatics/btx198. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., Lesin V.M., Nikolenko S.I., Pham S., Prjibelski A.D., Pyshkin A.V., Sirotkin A.V., Vyahhi N., Tesler G., Alekseyev M.A., Pevzner P.A. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19:455–477. doi: 10.1089/cmb.2012.0021. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–2069. doi: 10.1093/bioinformatics/btu153. [DOI] [PubMed] [Google Scholar]
6.Lohse M., Drechsel O., Kahlau S., Bock R. OrganellarGenomeDRAW-a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Res. 2013;41:W575–W581. doi: 10.1093/nar/gkt289. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

mmc1.docx^{(34.9KB, docx)}

Data Availability Statement

Sequence Read Archive (Original data) (National Center for Biotechnology Information).

Complete chloroplast genome of Juniperus seravschanica (Original data) (National Center for Biotechnology Information).

[bib0001] 1.Doyle J.J., Doyle J.L. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 1987;19:11–15. [Google Scholar]

[bib0002] 2.Beier S., Thiel T., Münch T., Scholz U., Mascher M. MISA-web: a web server for microsatellite prediction. Bioinformatics. 2017;33:2583–2585. doi: 10.1093/bioinformatics/btx198. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0003] 3.Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0004] 4.Bankevich A., Nurk S., Antipov D., Gurevich A.A., Dvorkin M., Kulikov A.S., Lesin V.M., Nikolenko S.I., Pham S., Prjibelski A.D., Pyshkin A.V., Sirotkin A.V., Vyahhi N., Tesler G., Alekseyev M.A., Pevzner P.A. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012;19:455–477. doi: 10.1089/cmb.2012.0021. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib0005] 5.Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–2069. doi: 10.1093/bioinformatics/btu153. [DOI] [PubMed] [Google Scholar]

[bib0006] 6.Lohse M., Drechsel O., Kahlau S., Bock R. OrganellarGenomeDRAW-a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Res. 2013;41:W575–W581. doi: 10.1093/nar/gkt289. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Illumina sequencing data of the complete chloroplast genome of rare species Juniperus seravschanica (Cupressaceae) from Kazakhstan

Moldir Yermagambetova

Saule Abugalieva

Yerlan Turuspekov

Shyryn Almerekova

Abstract

Value of the Data

1. Objective

2. Data Description

Fig. 1.

Fig. 2.

Table 1.

Table 2.

3. Experimental Design, Materials and Methods

3.1. Plant Material and DNA Extraction

3.2. Library Preparation and Sequencing

3.3. Genome Assembly and Annotation

3.4. Detection of SSR Markers in the Chloroplast Genome

Ethics Statements

CRediT authorship contribution statement

Declaration of Competing Interest

Acknowledgments

Footnotes

Appendix. Supplementary materials

Data Availability

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Illumina sequencing data of the complete chloroplast genome of rare species Juniperus seravschanica (Cupressaceae) from Kazakhstan

Moldir Yermagambetova

Saule Abugalieva

Yerlan Turuspekov

Shyryn Almerekova

Abstract

Value of the Data

1. Objective

2. Data Description

Fig. 1.

Fig. 2.

Table 1.

Table 2.

3. Experimental Design, Materials and Methods

3.1. Plant Material and DNA Extraction

3.2. Library Preparation and Sequencing

3.3. Genome Assembly and Annotation

3.4. Detection of SSR Markers in the Chloroplast Genome

Ethics Statements

CRediT authorship contribution statement

Declaration of Competing Interest

Acknowledgments

Footnotes

Appendix. Supplementary materials

Data Availability

References

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases