Genomics Data. 2015 Nov 7;7:12–13. doi: 10.1016/j.gdata.2015.11.003

RNA-seq analysis for secondary metabolite pathway gene discovery in Polygonum minus

Kok-Keong Loke a, Reyhaneh Rahnamaie-Tajadod a, Chean-Chean Yeoh b, Hoe-Han Goh a, Zeti-Azura Mohamed-Hussein a,b, Normah Mohd Noor a, Zamri Zainal a,b, Ismanizan Ismail a,b
PMCID: PMC4778588  PMID: 26981350

Abstract

Polygonum minus is rich in secondary metabolites, especially terpenoids and flavonoids. The present study generates a transcriptome resource for P. minus to help decipher its secondary metabolite biosynthesis pathways. Raw reads and the transcriptome assembly project have been deposited at GenBank under the accessions SRX313492 (root) and SRX669305 (leaf).

Keywords: Hybrid assembly, Kesum, Secondary metabolites, Transcriptome


Specifications

Subject area: Biology, Plant Molecular Biology
Type of data: Transcriptome sequences
Organism/Cell line/tissue: Polygonum minus (leaf and root)
Sequencer type: Illumina HiSeq™ 2000 (leaf), Roche 454 GS-FLX (root)
Data format: Raw and processed
Experimental factors: Controlled growth chamber (leaf), experimental plot (root)
Experimental features: RNA-seq dataset for gene discovery in plant
Sample source location: Malaysia
Data accessibility: GenBank accessions SRX669305 (http://www.ncbi.nlm.nih.gov/sra/SRX669305) and SRX313492 (http://www.ncbi.nlm.nih.gov/sra/SRX313492)

1. Value of the data

  • The current transcriptome datasets greatly extend the previous EST study in P. minus [1].

  • P. minus is a non-model medicinal plant rich in bioactive terpenoid compounds [2].

  • The improved transcript repository, with increased KEGG pathway coverage, provides an extensive genetic resource for integrating research on gene expression and metabolite compounds in P. minus.

  • These data will add to the Polygonum transcriptome resource for understanding secondary metabolite production in this genus.

2. Data

To profile the leaf and root transcriptomes of P. minus, RNA-seq short reads were generated from the polyA-enriched cDNA libraries prepared from the total RNAs extracted from the leaf and root tissues. The short reads were filtered, processed, assembled and analyzed as described below. The raw data and assembly project have been deposited at GenBank under the accessions SRX313492 (http://www.ncbi.nlm.nih.gov/sra/SRX313492) and SRX669305 (http://www.ncbi.nlm.nih.gov/sra/SRX669305) for the root and leaf tissues respectively.

3. Experimental design, materials and methods

3.1. Plant materials

Root and leaf tissues of P. minus were sampled from the experimental plot (3° 16′ 14.63″ N, 101° 41′ 11.32″ E) at Universiti Kebangsaan Malaysia, Bangi. Collected samples were rinsed with distilled water and frozen in liquid nitrogen before storage at −80 °C.

3.2. Total RNA extraction, quality control, library preparation and RNA-seq

Total RNA was extracted according to the protocol reported by Lopez-Gomez and Gomez-Lim [3]. From the P. minus root sample, 250 ng of poly(A) RNA was prepared using the PolyATract mRNA isolation kit (Promega, USA) and used as starting material for pyrosequencing on the Roche 454 GS FLX platform at the Malaysia Genome Institute, Malaysia. Emulsion PCR was performed with the long-fragment Lib-emPCR amplification protocol for amplicons of 550 bp or greater in length. The cycling conditions were as follows: 94 °C for 4 min, followed by 50 cycles of 94 °C for 30 s and 60 °C for 10 min.

Two biological replicates of P. minus leaf samples were sequenced on the Illumina HiSeq 2000 platform. Paired-end reads of 90 bp were generated using the standard library preparation protocol implemented by BGI-Shenzhen, P. R. China.

3.3. Transcriptome de novo assembly, annotation and classification

Raw reads were filtered to remove adapter sequences using the pre-processing tools Cutadapt [4] and Trimmomatic [5]. Illumina reads with a Phred quality score ≥ 25 were kept for assembly. Root 454 reads were clipped into pseudo-reads of 90 bp, matching the length of the leaf Illumina short reads, with a 5 bp overlap, using an in-house PHP script (http://gitlab.inbiosis.ws/open-source/rnaseq-utils). These reads were then digitally normalized following the khmer protocol (http://khmer.readthedocs.org/en/v1.0/). De novo hybrid assembly of the processed reads was performed with Trinity (release r20140717) [6]. Statistics of the hybrid assembly are shown in Table 1.
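The clipping of long 454 reads into 90 bp pseudo-reads can be illustrated with a minimal Python sketch. This is not the authors' PHP script: the function name `clip_to_pseudo_reads` is hypothetical, and two behaviours are assumptions, namely that "5 bp overlap" means consecutive pseudo-reads share 5 bases (so the window advances by 85 bp) and that a shorter trailing fragment is kept rather than discarded. Quality strings are ignored here for brevity.

```python
def clip_to_pseudo_reads(seq, read_len=90, overlap=5):
    """Split one long read into fixed-length pseudo-reads.

    Assumption: consecutive pseudo-reads share `overlap` bases, so the
    window advances by (read_len - overlap) bases at each step.
    Assumption: the final, possibly shorter, fragment is kept.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be in [0, read_len)")
    step = read_len - overlap
    pseudo_reads = []
    start = 0
    while start < len(seq):
        pseudo_reads.append(seq[start:start + read_len])
        # Stop once the current window has reached the end of the read.
        if start + read_len >= len(seq):
            break
        start += step
    return pseudo_reads


if __name__ == "__main__":
    long_read = "ACGT" * 50  # a 200 bp toy read
    for i, pseudo in enumerate(clip_to_pseudo_reads(long_read)):
        print(i, len(pseudo))
```

Whether short tail fragments should be kept or dropped depends on the minimum read length the downstream assembler accepts, which the text does not specify.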

Table 1.

Statistics of P. minus hybrid assembly.

Attributes | Value
Pre-assembly |
Total raw reads | 48,615,711
Total processed reads | 34,365,872
Post-assembly |
Number of unigenes | 108,541
Number of unique transcripts | 188,735
N50 (bp) | 1,009
Size range (bp) | 201–12,106
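For readers unfamiliar with the N50 statistic reported in Table 1, the following Python sketch shows one common way to compute it: sort the transcript lengths from longest to shortest and return the length at which the running total first reaches half of all assembled bases. The function name `n50` is illustrative, and the text does not state which tool produced the reported value, so this is a sketch of the usual definition rather than the authors' procedure.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy lengths, not data from the P. minus assembly.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```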

Protein-coding sequences of the unique transcripts were predicted with TransDecoder, which is bundled as a utility in the Trinity pipeline. The standard Trinotate (release r20140708) annotation pipeline (https://trinotate.github.io/) was used to annotate the assembled unique transcripts against Swiss-Prot [7], Pfam [8], eggNOG [9], Gene Ontology [10], SignalP [11], and RNAmmer [12]. A summary of the annotation is shown in Table 2. Gene Ontology terms annotated by Trinotate were associated with the EC2GO database [13] for KEGG pathway mapping via the KEGG Search & Color Mapper API [14] (Table 3).

Table 2.

Functional annotation of P. minus unique transcripts.

Annotation/tools | Number of unique transcripts
Total Transdecoder Peptides | 86,295
BLASTX-SwissProt | 17,307
BLASTP-SwissProt | 29,283
PFAM-TMHMM | 13,617
eggNOG | 29,004
Gene Ontology (GO) | 52,796
SignalP | 3,715
RNAMMER | 9

Table 3.

Statistics of EC2GO mapped enzymes and KEGG pathway mapping.

Mapping resources | Total mapping entities
GO2EC | 482 unique enzymes
KEGG search & color | 7,037 unique KO, 376 KEGG pathways
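The association of annotated GO terms with enzyme (EC) numbers described in Section 3.3 can be sketched in Python as follows. This is a hedged illustration, not the authors' code: it assumes the mapping file uses lines of the form `EC:1.1.1.1 > GO:term name ; GO:0004022` with `!` marking comment lines, and the function names, transcript identifiers and example lines are all hypothetical. The subsequent KEGG pathway mapping step is not shown.

```python
from collections import defaultdict


def parse_ec2go(lines):
    """Build a GO ID -> set of EC numbers lookup from ec2go-style lines.

    Assumed line format: 'EC:<number> > GO:<term name> ; GO:<id>'.
    Lines starting with '!' are treated as comments.
    """
    go_to_ec = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        ec_part, _, go_part = line.partition(" > ")
        # The GO identifier is the last ';'-separated field.
        go_id = go_part.rsplit(";", 1)[-1].strip()
        if ec_part.startswith("EC:") and go_id.startswith("GO:"):
            go_to_ec[go_id].add(ec_part[len("EC:"):])
    return go_to_ec


def enzymes_for_transcripts(transcript_go, go_to_ec):
    """Map each transcript to the EC numbers implied by its GO terms.

    Transcripts whose GO terms have no EC mapping are omitted.
    """
    result = {}
    for transcript, go_ids in transcript_go.items():
        ecs = set()
        for go_id in go_ids:
            ecs |= go_to_ec.get(go_id, set())
        if ecs:
            result[transcript] = ecs
    return result


if __name__ == "__main__":
    # Illustrative mapping lines and made-up transcript IDs.
    mapping = parse_ec2go([
        "!example comment line",
        "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022",
        "EC:2.7.1.1 > GO:hexokinase activity ; GO:0004396",
    ])
    annotations = {"transcript_1": ["GO:0004022"], "transcript_2": ["GO:0005575"]}
    print(enzymes_for_transcripts(annotations, mapping))
```

Counting the distinct EC numbers across all transcripts in such a result would give a figure comparable in kind to the "unique enzymes" entry in Table 3.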

Conflict of interest

All the authors have approved submission and there are no conflicts of interest.

Acknowledgments

This research was supported by the Fundamental Research Grant Scheme (FRGS/1/2013/SG05/UKM/01/2) from the Malaysian Ministry of Higher Education (MOHE) and a Research University Grant under the Arus Perdana (UKM-AP-BPB-14-2009) from Universiti Kebangsaan Malaysia.

References

  • 1. Roslan N.D., Yusop J.M., Baharum S.N., Othman R., Mohamed-Hussein Z.-A., Ismail I., Noor N.M., Zainal Z. Int. J. Mol. Sci. 2012;13:2692–2706. doi: 10.3390/ijms13032692.
  • 2. Baharum S.N., Bunawan H., Ghani M.A.A., Mustapha W.A.W., Noor N.M. Molecules. 2010;15:7006–7015. doi: 10.3390/molecules15107006.
  • 3. Lopez-Gomez R., Gomez-Lim M.A. Hortscience. 1992;27:440–442.
  • 4. Martin M. EMBnet.journal. 2011;17:10–12.
  • 5. Bolger A.M., Lohse M., Usadel B. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
  • 6. Haas B.J., Papanicolaou A., Yassour M., Grabherr M., Blood P.D., Bowden J., Couger M.B., Eccles D., Li B., Lieber M. Nat. Protoc. 2013;8:1494–1512. doi: 10.1038/nprot.2013.084.
  • 7. Boeckmann B., Bairoch A., Apweiler R., Blatter M.-C., Estreicher A., Gasteiger E., Martin M.J., Michoud K., O'Donovan C., Phan I. Nucleic Acids Res. 2003;31:365–370. doi: 10.1093/nar/gkg095.
  • 8. Finn R.D., Bateman A., Clements J., Coggill P., Eberhardt R.Y., Eddy S.R., Heger A., Hetherington K., Holm L., Mistry J. Nucleic Acids Res. 2013:D222–D230. doi: 10.1093/nar/gkt1223.
  • 9. Powell S., Forslund K., Szklarczyk D., Trachana K., Roth A., Huerta-Cepas J., Gabaldón T., Rattei T., Creevey C., Kuhn M. Nucleic Acids Res. 2013:D231–D239. doi: 10.1093/nar/gkt1253.
  • 10. Gene Ontology Consortium. Nucleic Acids Res. 2004;32:D258–D261. doi: 10.1093/nar/gkh036.
  • 11. Petersen T.N., Brunak S., von Heijne G., Nielsen H. Nat. Methods. 2011;8:785–786. doi: 10.1038/nmeth.1701.
  • 12. Lagesen K., Hallin P., Rødland E.A., Stærfeldt H.-H., Rognes T., Ussery D.W. Nucleic Acids Res. 2007;35:3100–3108. doi: 10.1093/nar/gkm160.
  • 13. Camon E., Barrell D., Brooksbank C., Magrane M., Apweiler R. Comp. Funct. Genomics. 2003;4:71–74. doi: 10.1002/cfg.235.
  • 14. Kawashima S., Katayama T., Sato Y., Kanehisa M. Genome Inform. 2003;14:673–674.

Articles from Genomics Data are provided here courtesy of Elsevier
