Data in Brief. 2019 Dec 14;28:104998. doi: 10.1016/j.dib.2019.104998

De novo transcriptome datasets of Shorea balangeran leaves and basal stem in waterlogged and dry soil

Fitri Indriani, Ulfah J Siregar, Deden D Matra, Iskandar Z Siregar
PMCID: PMC7093797  PMID: 32226802

Abstract

Shorea balangeran Burck, locally known as balangeran, is widely recommended for tropical peat swamp forest restoration because it can grow in both waterlogged and dry areas. However, information on the genetic basis of its adaptation to varying ecological conditions is limited, and no transcriptome study has been reported in this context. Here we report two transcriptome datasets, from leaf and basal stem samples taken from seedlings grown in pots containing peat or mineral soil. The raw reads are deposited in the DDBJ under accession number DRA008633.

Keywords: Shorea balangeran, Transcriptome, RNA-seq, Adaptation


Specifications Table

Subject: Agricultural and Biological Sciences (Forestry)
Specific subject area: Molecular study in forestry
Type of data: RNA sequencing data
How data were acquired: Illumina HiSeq 4000
Data format: Raw sequencing reads and assembled contigs
Parameters for data collection: Leaves and basal stems of balangeran seedlings planted in waterlogged peat, dry peat, waterlogged mineral soil and dry mineral soil
Description of data collection: Total RNA was sequenced on the Illumina HiSeq 4000 platform at NovogeneAIT, Singapore
Data source location: Bogor, West Java, Indonesia
Data accessibility: Repository name: DDBJ (DNA Data Bank of Japan)
  Data identification number: DRA008633
  Direct URL to data: https://ddbj.nig.ac.jp/DRASearch/submission?acc=%20DRA008633
Related research article: F. Indriani, D.D. Matra, U.J. Siregar, I.Z. Siregar, Ecological aspects and genetic diversity of Shorea balangeran in two forest types of Muara Kendawangan Nature Reserve, West Kalimantan, Indonesia, Biodiversitas 20 (2019) 482–488. https://doi.org/10.13057/biodiv/d200226

Value of the Data
  • These are the first transcriptome data of Shorea balangeran, obtained from leaves and basal stem.

  • These data can help elucidate the molecular mechanisms and gene pathways underlying the response of Shorea balangeran to different ecological conditions.

  • These data allow further analyses to identify genes of interest involved in the adaptation of Shorea balangeran.

1. Data

Shorea balangeran (balangeran) belongs to the family Dipterocarpaceae and is distributed in peat and heath forests in Indonesia [1]. Here, a de novo transcriptome assembly of balangeran is reported for the first time. The transcriptome data were obtained from the leaves and basal stems of seedlings grown in pots containing either peat or mineral soil. The extracted high-quality mRNA was sequenced on the Illumina HiSeq 4000 platform. Statistics of the reads and assembled sequences are presented in Table 1, and the functional annotation of the contigs is summarized in Table 2. Of the 180,291 merged contigs, 113,998 (63.62%) had significant matches in the NCBI nr database, 78,407 (43.49%) in Swiss-Prot and 90,875 (50.40%) in TrEMBL. A total of 130,314 open reading frames (ORFs) were predicted (Table 3), of which 31,209 (23.95%) were 5′ partial, 17,633 (13.53%) were 3′ partial and 64,374 (49.40%) were complete. Microsatellite motifs were also identified in the merged contigs (Table 4): mononucleotide repeats were the most abundant type (44,626; 70.30%), followed by trinucleotide (11,160; 17.58%) and dinucleotide repeats (6,270; 9.88%).

Table 1.

Properties of reads and assembled sequences of balangeran.

Features | Leaf | Basal stem | Merged, leaf and basal stem (b)
Reads
Number of reads | 64,101,942 | 56,537,051 | 120,638,993
Number of bases | 9,615,291,300 | 8,480,557,650 | 18,095,848,950
Number of post-trimming reads (% of raw) | 62,400,243 (97.35) | 54,917,915 (97.14) | 117,318,158 (97.25)
Number of post-trimming bases (% of raw) | 9,360,036,450 (97.35) | 8,237,687,250 (97.14) | 17,597,723,700 (97.25)
Transcripts (a)
Number of transcripts | 279,598 | 574,875 | -
Number of bases | 175,610,736 | 342,696,076 | -
Length range (bp) | 201-16,510 | 201-16,960 | -
Average length (bp) | 628.08 | 596.12 | -
N50 (bp) | 940 | 839 | -
GC content (%) | 42.28 | 45.56 | -
Contigs (b)
Number of contigs | 187,297 | 440,665 | 180,291
Number of bases | 118,677,247 | 252,486,917 | 197,305,352
Length range (bp) | 201-16,510 | 201-16,960 | 201-17,014
Average length (bp) | 633.63 | 572.97 | 1094.37
N50 (bp) | 918 | 762 | 1489
GC content (%) | 42.6 | 46.2 | 44.3

(a) Constructed by the Trinity program.

(b) Constructed with CAP3, CD-HIT-EST and Corset (Corset applied to the merged contigs only).
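The average length, N50 and GC content in Table 1 follow standard assembly-statistic definitions. As an illustration only (not the authors' script), the Python sketch below computes them from contig sequences held as plain strings, e.g. after parsing a FASTA file; the function name `assembly_stats` and the toy sequences are hypothetical.

```python
from typing import Dict, Iterable


def assembly_stats(seqs: Iterable[str]) -> Dict[str, float]:
    """Summary statistics of the kind reported in Table 1 for a set of sequences."""
    seq_list = [s.upper() for s in seqs]
    lengths = sorted((len(s) for s in seq_list), reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("no sequence data supplied")

    # N50: the largest length L such that sequences of length >= L
    # together contain at least half of all assembled bases.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break

    # GC content over all bases; ambiguous bases (e.g. N) stay in the denominator.
    gc = sum(s.count("G") + s.count("C") for s in seq_list)

    return {
        "count": len(lengths),
        "bases": total,
        "min_length": lengths[-1],
        "max_length": lengths[0],
        "average_length": round(total / len(lengths), 2),
        "n50": n50,
        "gc_percent": round(100 * gc / total, 2),
    }


# Hypothetical toy contigs with lengths 10, 8, 6, 4 and 2 bp.
toy = ["GC" * 5, "AT" * 4, "GGAATT", "ACGT", "AT"]
print(assembly_stats(toy))
```

For the toy input the N50 is 8 bp, because the two longest contigs (10 and 8 bp) already hold 18 of the 30 bases. The average length is simply total bases divided by sequence count; applied to the Table 1 totals for leaf transcripts, 175,610,736 / 279,598 gives the reported 628.08 bp.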

Table 2.

Functional annotation of balangeran contigs using several databases.

Database Source Number (percentage)
Total contigs 180,291
Non-redundant protein (nr) NCBI 113,998 (63.62)
Non-redundant Nucleotide (nt) NCBI 53,407 (29.62)
Swiss-Prot UniProt 78,407 (43.49)
TrEMBL UniProt 90,875 (50.40)
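Table 2 counts a contig as annotated when it has a significant BLAST+ hit in the given database. The e-value cut-off defining "significant" is not stated in the text, so the sketch below assumes a commonly used value of 1e-5; the helper names are hypothetical. It reads BLAST+ tabular output (`-outfmt 6` with its default columns) and expresses counts as percentages of the 180,291 merged contigs.

```python
from typing import Iterable, Set


def annotated_contigs(blast_tab_lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Query IDs with at least one hit whose e-value is <= max_evalue.

    Assumes BLAST+ tabular output with the default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The 1e-5 default cut-off is an assumption, not a value given in the article.
    """
    hits: Set[str] = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:  # column 11 = evalue
            hits.add(fields[0])               # column 1 = qseqid (the contig ID)
    return hits


def percent_of_total(n: int, total: int) -> float:
    """Percentage rounded to two decimals, as in Table 2."""
    return round(100 * n / total, 2)


# Percentages for Table 2 counts, relative to the 180,291 merged contigs.
TOTAL_CONTIGS = 180_291
for db, n in [("nt", 53_407), ("Swiss-Prot", 78_407), ("TrEMBL", 90_875)]:
    print(db, n, percent_of_total(n, TOTAL_CONTIGS))
```

Counting distinct query IDs, rather than hit lines, matters because a contig usually has several hits in each database.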

Table 3.

Characteristics of open reading frames (ORFs) predicted in balangeran contigs using TransDecoder.

Features Contigs Number (percentage)
ORF contig 130,314
ORF type:
 a. 5prime_partial 31,209 (23.95)
 b. 3prime_partial 17,633 (13.53)
 c. Internal 17,104 (13.13)
 d. Complete 64,374 (49.40)
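The four ORF types in Table 3 are the labels TransDecoder assigns according to whether a predicted ORF has a start codon, a stop codon, both or neither. The sketch below reproduces only that labelling for an in-frame coding sequence and tallies percentages as in Table 3; it does not reproduce TransDecoder's ORF extraction or coding-potential scoring, and the function names are hypothetical. It assumes the standard genetic code (ATG start; TAA, TAG and TGA stops).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_type(cds: str) -> str:
    """Label an in-frame coding sequence with TransDecoder-style ORF types.

    complete       : starts with ATG and ends with a stop codon
    5prime_partial : no start codon (runs off the 5' end), ends with a stop codon
    3prime_partial : starts with ATG, no stop codon (runs off the 3' end)
    internal       : neither start nor stop codon
    """
    cds = cds.upper()
    if len(cds) < 3 or len(cds) % 3 != 0:
        raise ValueError("expected an in-frame sequence whose length is a multiple of 3")
    has_start = cds[:3] == "ATG"
    has_stop = cds[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_stop:
        return "5prime_partial"
    if has_start:
        return "3prime_partial"
    return "internal"


def tally(types: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Count each ORF type and its percentage of all ORFs (two decimals, as in Table 3)."""
    counts = Counter(types)
    total = sum(counts.values())
    return {t: (n, round(100 * n / total, 2)) for t, n in counts.items()}


# Hypothetical toy ORFs, one of each type.
print(tally(orf_type(s) for s in ["ATGAAATAA", "AAATTTTAG", "ATGAAACCC", "AAACCCGGG"]))
```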

Table 4.

Number and motif types of microsatellites in balangeran contigs.

Motifs Number of Contigs (percentage)
Leaf Basal Stem Merged
Mononucleotide 26,259 (72.93) 48,786 (68.83) 44,626 (70.30)
Dinucleotide 3939 (10.94) 6943 (9.80) 6270 (9.88)
Trinucleotide 5192 (14.42) 13,443 (18.97) 11,160 (17.58)
Tetranucleotide 421 (1.17) 1221 (1.72) 995 (1.57)
Pentanucleotide 142 (0.39) 292 (0.41) 267 (0.42)
Hexanucleotide 54 (0.15) 193 (0.27) 164 (0.26)

2. Experimental design, materials, and methods

Balangeran seedlings were grown and treated for 6 months in the nursery of the Department of Silviculture, Faculty of Forestry, IPB University, Bogor. Two seedlings were grown in peat soil, one under waterlogged and one under dry conditions, and two seedlings were grown in mineral soil, again one waterlogged and one dry. Total RNA was isolated from leaves and basal stems using the Plant Total RNA Mini Kit (Geneaid) following the manufacturer's protocol. RNA quantity and integrity were evaluated using a P360 NanoPhotometer (Implen, München, Germany) and a Bioanalyzer 2100 (Agilent Technologies); RNA integrity number (RIN) values ranged from 7.4 to 8.6. The extracted total RNA was sequenced on the Illumina HiSeq 4000 platform following the service provider's protocol (NovogeneAIT, Singapore).

Raw read quality was examined using FastQC [2], and Trimmomatic 0.39 [3] was then used to remove adaptor sequences, contaminants and low-quality reads with the parameters ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25. Pre-processed reads from leaves and from basal stems were assembled de novo separately using Trinity v2.3.2 with default parameters (minimum contig length = 200) [4] to generate high-quality contigs [5]. The contigs of each tissue were then reassembled with CAP3 [6] and clustered with CD-HIT-EST v4.6.8 [7]. Finally, the leaf and basal stem contigs were merged and processed again with CAP3, CD-HIT-EST and Corset [8,9].

The contigs were annotated with the BLAST+ program [10] against the NCBI non-redundant protein (nr) and nucleotide (nt) databases (both downloaded by October 1, 2018) and the UniProt Swiss-Prot and TrEMBL databases (downloaded by September 14, 2018). Open reading frames (ORFs) were predicted using the TransDecoder package (https://github.com/TransDecoder/TransDecoder) with a minimum ORF length of 100 bp [11].
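The Trimmomatic quality-trimming steps described above can be illustrated with the following simplified Python sketch. It assumes Phred+33 quality encoding, applies the steps in the order listed (SLIDINGWINDOW, LEADING, TRAILING, MINLEN) and omits adapter clipping (ILLUMINACLIP). It is not Trimmomatic itself, whose handling of edge cases may differ in detail; `trim_read` and `phred33` are hypothetical names.

```python
from typing import List, Optional, Tuple


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 (Sanger/Illumina 1.8+) quality string."""
    return [ord(c) - 33 for c in qual]


def trim_read(
    seq: str,
    qual: str,
    window: int = 4,
    window_min: float = 5,
    leading: int = 5,
    trailing: int = 5,
    min_len: int = 25,
) -> Optional[Tuple[str, str]]:
    """Simplified SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25, applied in that order.

    Returns the trimmed (sequence, quality) pair, or None if the read is dropped.
    Adapter clipping (ILLUMINACLIP) is not modelled.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    q = phred33(qual)

    # SLIDINGWINDOW: scan from the 5' end and cut at the first window
    # whose mean quality falls below window_min.
    end = len(q)
    for i in range(len(q) - window + 1):
        if sum(q[i:i + window]) / window < window_min:
            end = i
            break

    # LEADING: drop low-quality bases from the 5' end.
    start = 0
    while start < end and q[start] < leading:
        start += 1

    # TRAILING: drop low-quality bases from the 3' end.
    while end > start and q[end - 1] < trailing:
        end -= 1

    # MINLEN: discard reads that became too short.
    if end - start < min_len:
        return None
    return seq[start:end], qual[start:end]


read = "ACGT" * 10
qual = "##" + "I" * 30 + "#" * 8  # '#' = Q2, 'I' = Q40
print(trim_read(read, qual))
```

In the example, the two leading Q2 bases are removed by LEADING and the Q2 tail is cut by SLIDINGWINDOW, leaving the 30 high-quality bases, which pass MINLEN:25.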
Microsatellites were identified using MISA (http://pgrc.ipk-gatersleben.de/misa) with the following minimum numbers of repeats per unit size (unit size–minimum repeats): 1–10, 2–6, 3–5, 4–5, 5–5 and 6–5. The maximum interruption, i.e. the largest number of bases separating two microsatellites for them to be reported as a compound microsatellite, was 100 bases.
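The MISA thresholds above (minimum repeat number per unit size) can be illustrated with a small regular-expression sketch. It only finds perfect microsatellites and processes unit sizes from 1 to 6 so that shorter motifs take precedence. It also ignores motifs that are themselves repeats of a shorter unit (e.g. ATAT). It does not merge neighbouring SSRs into compound microsatellites within the 100-base interruption distance, so its counts will not match MISA exactly. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Minimum number of repeats per unit size, as reported: 1-10, 2-6, 3-5, 4-5, 5-5, 6-5.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, int, str, int]]:
    """Return perfect SSRs as (start, end, motif, repeats), 0-based with exclusive end."""
    seq = seq.upper()
    covered = [False] * len(seq)  # positions already assigned to a shorter-unit SSR
    found = []
    for k in sorted(min_repeats):
        # A k-base motif followed by at least (minimum - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats[k] - 1))
        for m in pattern.finditer(seq):
            motif, start, end = m.group(1), m.start(), m.end()
            if not is_primitive(motif) or any(covered[start:end]):
                continue
            for i in range(start, end):
                covered[i] = True
            found.append((start, end, motif, (end - start) // k))
    return sorted(found)


def count_by_unit_size(ssrs: List[Tuple[int, int, str, int]]) -> Counter:
    """Number of SSRs per unit size (1 = mononucleotide, 2 = dinucleotide, ...)."""
    return Counter(len(motif) for _, _, motif, _ in ssrs)


# Hypothetical toy contig with one mono-, one di- and one trinucleotide SSR.
contig = "A" * 12 + "C" + "AT" * 7 + "T" + "CAG" * 5 + "T"
ssrs = find_ssrs(contig)
print(ssrs)
print(count_by_unit_size(ssrs))
```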

Acknowledgments

This research was funded by the PMDSU program (Master Program of Education Leading to Doctoral Degree for Excellent Graduates) of the Ministry of Research, Technology and Higher Education of the Republic of Indonesia, and was partially funded under the World Class University (WCU) Program managed by Institut Teknologi Bandung. We thank FRDC, FOERDIA, MoEF, Bogor, West Java, Indonesia, for providing seedlings of Shorea balangeran.

Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  • 1. Indriani F., Matra D.D., Siregar U.J., Siregar I.Z. Ecological aspects and genetic diversity of Shorea balangeran in two forest types of Muara Kendawangan Nature Reserve, West Kalimantan, Indonesia. Biodiversitas. 2019;20:482–488.
  • 2. Andrews S. FastQC: A Quality Control Tool for High Throughput Sequence Data. 2010. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  • 3. Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
  • 4. Grabherr M.G., Haas B.J., Yassour M., Levin J.Z., Thompson D.A., Amit I., Adiconis X., Fan L., Raychowdhury R., Zeng Q., Chen Z., Mauceli E., Hacohen N., Gnirke A., Rhind N., di Palma F., Birren B.W., Nusbaum C., Lindblad-Toh K., Friedman N., Regev A. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 2011;29:644–652. doi: 10.1038/nbt.1883.
  • 5. Matra D.D., Ritonga A.W., Natawijaya A., Poerwanto R., Sobir, Widodo W.D., Inoue E. Dataset from de novo transcriptome assembly of Nephelium lappaceum aril. Data in Brief. 2019;22:566–569. doi: 10.1016/j.dib.2018.12.034.
  • 6. Huang X., Madan A. CAP3: a DNA sequence assembly program. Genome Res. 1999;9:868–877. doi: 10.1101/gr.9.9.868.
  • 7. Li W., Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–1659. doi: 10.1093/bioinformatics/btl158.
  • 8. Matra D.D., Kozaki T., Ishii K., Poerwanto R., Inoue E. Comparative transcriptome analysis of translucent flesh disorder in mangosteen (Garcinia mangostana L.) fruits in response to different water regimes. PLoS One. 2019;14. doi: 10.1371/journal.pone.0219976.
  • 9. Davidson N.M., Oshlack A. Corset: enabling differential gene expression analysis for de novo assembled transcriptomes. Genome Biol. 2014;15. doi: 10.1186/s13059-014-0410-6.
  • 10. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
  • 11. Matra D.D., Kozaki T., Ishii K., Poerwanto R., Inoue E. De novo transcriptome assembly of mangosteen (Garcinia mangostana L.) fruit. Genom. Data. 2016;10:35–37. doi: 10.1016/j.gdata.2016.09.003.

Articles from Data in Brief are provided here courtesy of Elsevier
