Abstract
Polygonum minus is rich in secondary metabolites, especially terpenoids and flavonoids. The present study generates a transcriptome resource for P. minus to help decipher its secondary metabolite biosynthesis pathways. Raw reads and the transcriptome assembly project have been deposited at GenBank under the accessions SRX313492 (root) and SRX669305 (leaf), respectively.
Keywords: Hybrid assembly, Kesum, Secondary metabolites, Transcriptome
Specifications | |
---|---|
Subject area | Biology, Plant Molecular Biology |
Type of data | Transcriptome sequences |
Organism/Cell line/tissue | Polygonum minus (leaf and root) |
Sequencer type | Illumina HiSeq™ 2000 (leaf), Roche 454 GS-FLX (root) |
Data format | Raw and processed |
Experimental factors | Controlled growth chamber (leaf), experimental plot (root) |
Experimental features | RNA-seq dataset for gene discovery in plant |
Sample source location | Malaysia |
Data accessibility | GenBank accession SRX669305 (http://www.ncbi.nlm.nih.gov/sra/SRX669305) and SRX313492 (http://www.ncbi.nlm.nih.gov/sra/SRX313492) |
1. Value of the data
- The current transcriptome datasets greatly improve on the previous EST study in P. minus [1].
- P. minus is a non-model medicinal plant rich in bioactive terpenoid compounds [2].
- The improved transcript repository, with increased KEGG pathway coverage, provides an extensive genetic resource for integrating research on gene expression and metabolite compounds in P. minus.
- These data will add to the Polygonum transcriptome resource for understanding secondary metabolite production in this genus.
2. Data
To profile the leaf and root transcriptomes of P. minus, RNA-seq short reads were generated from polyA-enriched cDNA libraries prepared from total RNA extracted from the leaf and root tissues. The short reads were filtered, processed, assembled and analyzed as described below. The raw data and assembly project have been deposited at GenBank under the accessions SRX313492 (http://www.ncbi.nlm.nih.gov/sra/SRX313492) and SRX669305 (http://www.ncbi.nlm.nih.gov/sra/SRX669305) for the root and leaf tissues, respectively.
3. Experimental design, materials and methods
3.1. Plant materials
Sampling of P. minus root and leaf tissues was done at the experimental plot (3° 16′ 14.63″ N, 101° 41′ 11.32″ E) at Universiti Kebangsaan Malaysia, Bangi. Collected samples were rinsed with distilled water and frozen in liquid nitrogen before being stored at −80 °C.
3.2. Total RNA extraction, quality control, library preparation and RNA-seq
Total RNA was extracted according to the protocol reported by Lopez-Gomez and Gomez-Lim [3]. For the root sample, 250 ng of poly(A) RNA was prepared using the PolyATract mRNA isolation kit (Promega, USA) and used as starting material for pyrosequencing on the Roche 454 GS FLX platform at the Malaysia Genome Institute, Malaysia. Emulsion PCR was done with long fragment Lib-emPCR amplification for amplicons that are 550 bp or greater in length. The conditions used were as follows: 94 °C for 4 min, followed by 50 cycles of 94 °C for 30 s and 60 °C for 10 min.
Two biological replicates of P. minus leaf samples were sequenced using the Illumina HiSeq 2000 sequencing platform. Paired-end reads of 90 bp were generated through the standard library preparation protocol implemented by BGI-Shenzhen, P. R. China.
3.3. Transcriptome de novo assembly, annotation and classification
Raw reads were filtered to remove adapter sequences with the sequence pre-processing tools Cutadapt [4] and Trimmomatic [5]. High-quality Illumina reads with a Phred score ≥ 25 were kept for assembly. Root 454 reads were clipped into pseudo reads of 90 bp, matching the length of the leaf Illumina short reads, with a 5 bp overlap, using an in-house PHP script (http://gitlab.inbiosis.ws/open-source/rnaseq-utils). These reads were then digitally normalized following the Khmer protocol (http://khmer.readthedocs.org/en/v1.0/). De novo hybrid assembly of the processed reads was performed with Trinity (release r20140717) [6]. Statistics of the hybrid assembly are shown in Table 1.
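The clipping of long 454 reads into overlapping 90 bp pseudo reads can be illustrated with a minimal Python sketch. This is not the authors' in-house PHP script. The function name, the parameter names, and the choice to keep a final fragment shorter than 90 bp are assumptions made only for illustration.

```python
def clip_to_pseudo_reads(seq, read_len=90, overlap=5):
    """Split one long read into windows of at most `read_len` bases.

    Consecutive windows share `overlap` bases. The last window may be
    shorter than `read_len` (an assumption; the original script may
    handle the tail differently).
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be non-negative and smaller than read_len")
    step = read_len - overlap          # distance between window starts
    reads = []
    start = 0
    while start < len(seq):
        reads.append(seq[start:start + read_len])
        if start + read_len >= len(seq):
            break                      # this window reached the end of the read
        start += step
    return reads


if __name__ == "__main__":
    # Toy 200-base sequence, not real P. minus data.
    toy_read = "ACGT" * 50
    for i, pseudo in enumerate(clip_to_pseudo_reads(toy_read), start=1):
        print(i, len(pseudo))
```

With a 90 bp window and a 5 bp overlap, each new pseudo read starts 85 bases after the previous one, so the original read can be rebuilt by dropping the first 5 bases of every pseudo read after the first.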
Table 1.
Attributes | Value |
---|---|
Pre-assembly | |
Total raw reads | 48,615,711 |
Total processed reads | 34,365,872 |
Post-assembly | |
Number of unigenes | 108,541 |
Number of unique transcripts | 188,735 |
N50 (bp) | 1009 |
Size range (bp) | 201–12,106 |
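For readers unfamiliar with the N50 statistic in Table 1, the sketch below shows the usual definition: the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases. The function name and the toy lengths are illustrative assumptions and are not taken from the P. minus assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:       # reached half of all assembled bases
            return length


if __name__ == "__main__":
    toy_lengths = [100, 200, 300, 400, 500]   # invented values
    print(n50(toy_lengths))
```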
Protein-coding sequences of the unique transcripts were analyzed with TransDecoder, which is embedded as a utility script in the Trinity pipeline. The standard Trinotate (release r20140708) annotation pipeline (https://trinotate.github.io/) was used to annotate the assembled unique transcripts against Swiss-Prot [7], Pfam [8], eggNOG [9], Gene Ontology [10], SignalP [11], and RNAmmer [12]. A summary of the annotation is shown in Table 2. Gene Ontology terms annotated by Trinotate were associated with the EC2GO database [13] for KEGG pathway mapping via the KEGG Search & Color Mapper API [14] (Table 3).
Table 2.
Annotation/tools | Number of unique transcripts |
---|---|
Total Transdecoder Peptides | 86,295 |
BLASTX-SwissProt | 17,307 |
BLASTP-SwissProt | 29,283 |
PFAM-TMHMM | 13,617 |
eggNOG | 29,004 |
Gene Ontology (GO) | 52,796 |
SignalP | 3715 |
RNAMMER | 9 |
Table 3.
Mapping resources | Total mapping entities |
---|---|
GO2EC | 482 unique enzymes |
KEGG search & color | 7037 unique KO, 376 KEGG pathways |
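The association of annotated GO terms with enzyme (EC) numbers through EC2GO, summarized in Table 3, can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the EC2GO mapping is available as text lines of the form `EC:<number> > GO:<term name> ; GO:<id>`, with comment lines starting with `!`; that layout should be checked against the actual file. The transcript identifiers and function names are hypothetical.

```python
from collections import defaultdict


def parse_ec2go(lines):
    """Build a GO id -> set of EC numbers lookup from EC2GO-style lines."""
    go_to_ec = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue                              # skip blanks and comments
        left, _, right = line.partition(" > ")
        go_id = right.rsplit(";", 1)[-1].strip()  # GO id follows the last ';'
        if left.startswith("EC:") and go_id.startswith("GO:"):
            go_to_ec[go_id].add(left[3:])
    return go_to_ec


def enzymes_for_transcripts(transcript_go, go_to_ec):
    """Map each transcript to the EC numbers implied by its GO terms."""
    result = {}
    for transcript, go_ids in transcript_go.items():
        ecs = set()
        for go_id in go_ids:
            ecs |= go_to_ec.get(go_id, set())
        if ecs:                                   # keep only transcripts with a hit
            result[transcript] = ecs
    return result


if __name__ == "__main__":
    # Illustrative mapping lines and hypothetical transcript annotations.
    ec2go_lines = [
        "!example comment line",
        "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022",
    ]
    annotations = {"transcript_A": ["GO:0004022"], "transcript_B": ["GO:0005575"]}
    lookup = parse_ec2go(ec2go_lines)
    print(enzymes_for_transcripts(annotations, lookup))
```

The resulting EC numbers are the kind of identifiers that can then be submitted to a KEGG pathway mapping tool. That submission step is not shown here.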
Conflict of interest
All the authors have approved submission and there are no conflicts of interest.
Acknowledgments
This research was supported by the Fundamental Research Grant Scheme (FRGS/1/2013/SG05/UKM/01/2) from the Malaysian Ministry of Higher Education (MOHE) and a Research University Grant under the Arus Perdana (UKM-AP-BPB-14-2009) from Universiti Kebangsaan Malaysia.
References
- 1. Roslan N.D., Yusop J.M., Baharum S.N., Othman R., Mohamed-Hussein Z.-A., Ismail I., Noor N.M., Zainal Z. Int. J. Mol. Sci. 2012;13:2692–2706. doi: 10.3390/ijms13032692.
- 2. Baharum S.N., Bunawan H., Ghani M.a.A., Mustapha W.A.W., Noor N.M. Molecules. 2010;15:7006–7015. doi: 10.3390/molecules15107006.
- 3. Lopez-Gomez R., Gomez-Lim M.A. Hortscience. 1992;27:440–442.
- 4. Martin M. EMBnet.journal. 2011;17:10–12.
- 5. Bolger A.M., Lohse M., Usadel B. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 6. Haas B.J., Papanicolaou A., Yassour M., Grabherr M., Blood P.D., Bowden J., Couger M.B., Eccles D., Li B., Lieber M. Nat. Protoc. 2013;8:1494–1512. doi: 10.1038/nprot.2013.084.
- 7. Boeckmann B., Bairoch A., Apweiler R., Blatter M.-C., Estreicher A., Gasteiger E., Martin M.J., Michoud K., O'Donovan C., Phan I. Nucleic Acids Res. 2003;31:365–370. doi: 10.1093/nar/gkg095.
- 8. Finn R.D., Bateman A., Clements J., Coggill P., Eberhardt R.Y., Eddy S.R., Heger A., Hetherington K., Holm L., Mistry J. Nucleic Acids Res. 2013:D222–D230. doi: 10.1093/nar/gkt1223.
- 9. Powell S., Forslund K., Szklarczyk D., Trachana K., Roth A., Huerta-Cepas J., Gabaldón T., Rattei T., Creevey C., Kuhn M. Nucleic Acids Res. 2013:D231–D239. doi: 10.1093/nar/gkt1253.
- 10. Gene Ontology Consortium. Nucleic Acids Res. 2004;32:D258–D261. doi: 10.1093/nar/gkh036.
- 11. Petersen T.N., Brunak S., von Heijne G., Nielsen H. Nat. Methods. 2011;8:785–786. doi: 10.1038/nmeth.1701.
- 12. Lagesen K., Hallin P., Rødland E.A., Stærfeldt H.-H., Rognes T., Ussery D.W. Nucleic Acids Res. 2007;35:3100–3108. doi: 10.1093/nar/gkm160.
- 13. Camon E., Barrell D., Brooksbank C., Magrane M., Apweiler R. Comp. Funct. Genomics. 2003;4:71–74. doi: 10.1002/cfg.235.
- 14. Kawashima S., Katayama T., Sato Y., Kanehisa M. Genome Inform. 2003;14:673–674.