RNA-seq analysis for secondary metabolite pathway gene discovery in Polygonum minus

Kok-Keong Loke; Reyhaneh Rahnamaie-Tajadod; Chean-Chean Yeoh; Hoe-Han Goh; Zeti-Azura Mohamed-Hussein; Normah Mohd Noor; Zamri Zainal; Ismanizan Ismail

doi:10.1016/j.gdata.2015.11.003

2015 Nov 7;7:12–13. doi: 10.1016/j.gdata.2015.11.003

PMCID: PMC4778588 PMID: 26981350

Abstract

Polygonum minus is a plant rich in secondary metabolites, especially terpenoids and flavonoids. The present study generates a transcriptome resource for P. minus to help decipher its secondary metabolite biosynthesis pathways. Raw reads and the transcriptome assembly project have been deposited at GenBank under the accessions SRX313492 (root) and SRX669305 (leaf), respectively.

Keywords: Hybrid assembly, Kesum, Secondary metabolites, Transcriptome

Specifications
Subject area	Biology, Plant Molecular Biology
Type of data	Transcriptome sequences
Organism/Cell line/tissue	Polygonum minus (leaf and root)
Sequencer type	Illumina HiSeq™ 2000 (leaf), Roche 454 GS-FLX (root)
Data format	Raw and processed
Experimental factors	Controlled growth chamber (leaf), experimental plot (root)
Experimental features	RNA-seq dataset for gene discovery in plant
Sample source location	Malaysia
Data accessibility	GenBank accession SRX669305 (http://www.ncbi.nlm.nih.gov/sra/SRX669305) and SRX313492 (http://www.ncbi.nlm.nih.gov/sra/SRX313492)

1. Value of the data

• The current transcriptome datasets greatly improve on the previous EST study in P. minus [1].
• P. minus is a non-model medicinal plant rich in bioactive terpenoid compounds [2].
• The improved transcript repository, with increased KEGG pathway coverage, provides an extensive genetic resource for integrating research on gene expression and metabolite compounds in P. minus.
• These data will add to the Polygonum transcriptome resource for understanding secondary metabolite production in this genus.

2. Data

To profile the leaf and root transcriptomes of P. minus, RNA-seq short reads were generated from polyA-enriched cDNA libraries prepared from total RNA extracted from the leaf and root tissues. The short reads were filtered, processed, assembled and analyzed as described below. The raw data and assembly project have been deposited at GenBank under the accessions SRX313492 (http://www.ncbi.nlm.nih.gov/sra/SRX313492) and SRX669305 (http://www.ncbi.nlm.nih.gov/sra/SRX669305) for the root and leaf tissues, respectively.

3. Experimental design, materials and methods

3.1. Plant materials

P. minus root and leaf tissues were sampled from the experimental plot (3° 16′ 14.63″ N, 101° 41′ 11.32″ E) at Universiti Kebangsaan Malaysia, Bangi. Collected samples were rinsed with distilled water and frozen in liquid nitrogen before being stored at −80 °C.

3.2. Total RNA extraction, quality control, library preparation and RNA-seq

Total RNA was extracted according to the protocol reported by Lopez-Gomez and Gomez-Lim [3]. From the P. minus root sample, 250 ng of poly(A) RNA was prepared using the PolyATract mRNA isolation kit (Promega, USA) and used as starting material for the Roche 454 GS FLX pyrosequencing platform at the Malaysia Genome Institute, Malaysia. Emulsion PCR was performed with the long-fragment Lib-emPCR amplification protocol for amplicons of 550 bp or longer. The cycling conditions were as follows: 94 °C for 4 min, followed by 50 cycles of 94 °C for 30 s and 60 °C for 10 min.

Two biological replicates of P. minus leaf samples were sequenced using the Illumina HiSeq 2000 sequencing platform. Paired-end reads of 90 bp were generated through the standard library preparation protocol implemented by BGI-Shenzhen, P. R. China.

3.3. Transcriptome de novo assembly, annotation and classification

Raw reads were filtered to remove adapter sequences with the sequence pre-processing tools Cutadapt [4] and Trimmomatic [5]. High-quality Illumina reads with a Phred score ≥ 25 were kept for assembly. Root 454 reads were clipped into pseudo-reads of 90 bp, matching the length of the leaf Illumina short reads, with a 5 bp overlap using an in-house PHP script (http://gitlab.inbiosis.ws/open-source/rnaseq-utils). These reads were then digitally normalized following the khmer protocol (http://khmer.readthedocs.org/en/v1.0/). De novo hybrid assembly of the processed reads was performed with Trinity (release r20140717) [6]. Statistics of the hybrid assembly are shown in Table 1.
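
The clipping of long 454 reads into 90 bp pseudo-reads with a 5 bp overlap can be illustrated with a short sketch. This is not the in-house PHP script; it is a minimal Python approximation, and the function name clip_to_pseudo_reads as well as the decision to keep a shorter final fragment are assumptions made for illustration only.

```python
def clip_to_pseudo_reads(seq, read_len=90, overlap=5):
    """Split one long read into windows of `read_len` bases.

    Consecutive windows share `overlap` bases. The last window may be
    shorter than `read_len`; keeping it is an assumption of this sketch,
    not a documented behaviour of the original script.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be >= 0 and smaller than read_len")
    step = read_len - overlap
    pseudo_reads = []
    start = 0
    while start < len(seq):
        pseudo_reads.append(seq[start:start + read_len])
        if start + read_len >= len(seq):
            break  # this window already reached the end of the read
        start += step
    return pseudo_reads


# Toy 200 bp read standing in for a 454 read (hypothetical sequence).
toy_read = "ACGT" * 50
for i, fragment in enumerate(clip_to_pseudo_reads(toy_read)):
    print(i, len(fragment))
```

For FASTQ input, the quality string would need to be sliced with the same start and end positions so that each pseudo-read keeps its per-base scores.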

Table 1.

Statistics of P. minus hybrid assembly.

Attributes	Value
Pre-assembly
Total raw reads	48,615,711
Total processed reads	34,365,872

Post-assembly
Number of unigenes	108,541
Number of unique transcripts	188,735
N50 (bp)	1009
Size range (bp)	201–12,106
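
For readers unfamiliar with the N50 statistic reported in Table 1, the following sketch shows one common way to compute it: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The function name n50 and the toy lengths are illustrative assumptions; the exact script used to produce the value in Table 1 is not stated in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Toy transcript lengths (hypothetical, not taken from the assembly).
toy_lengths = [2, 3, 4, 5, 6, 10]
print(n50(toy_lengths))
```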

Protein-coding sequences of the unique transcripts were analyzed with TransDecoder, which is embedded as a utility script in the Trinity pipeline. The standard Trinotate (release r20140708) annotation pipeline (https://trinotate.github.io/) was used to annotate the assembled unique transcripts against Swiss-Prot [7], Pfam [8], eggNOG [9], Gene Ontology [10], SignalP [11], and RNAmmer [12]. A summary of the annotation is shown in Table 2. Gene Ontology terms annotated by Trinotate were associated with the EC2GO database [13] for KEGG pathway mapping via the KEGG Search & Color Mapper API [14] (Table 3).

Table 2.

Functional annotation of P. minus unique transcripts.

Annotation/tools	Number of unique transcripts
Total Transdecoder Peptides	86,295
BLASTX-SwissProt	17,307
BLASTP-SwissProt	29,283
PFAM-TMHMM	13,617
eggNOG	29,004
Gene Ontology (GO)	52,796
SignalP	3715
RNAMMER	9
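
To put the counts in Table 2 in proportion, they can be expressed as a share of the 188,735 unique transcripts reported in Table 1. The sketch below does this arithmetic for a few rows; it assumes that each count in Table 2 refers to distinct unique transcripts, which the table header suggests but the text does not state explicitly. The names used are illustrative.

```python
TOTAL_UNIQUE_TRANSCRIPTS = 188_735  # from Table 1

# Selected rows from Table 2 (number of unique transcripts per resource).
annotation_counts = {
    "BLASTX-SwissProt": 17_307,
    "BLASTP-SwissProt": 29_283,
    "eggNOG": 29_004,
    "Gene Ontology (GO)": 52_796,
    "SignalP": 3_715,
}


def coverage_percent(count, total=TOTAL_UNIQUE_TRANSCRIPTS):
    """Percentage of all unique transcripts carrying a given annotation."""
    return 100.0 * count / total


for resource, count in annotation_counts.items():
    print(f"{resource}: {coverage_percent(count):.1f}%")
```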

Table 3.

Statistics of EC2GO mapped enzymes and KEGG pathway mapping.

Mapping resources	Total mapping entities
GO2EC	482 unique enzymes
KEGG search & color	7037 unique KO, 376 KEGG pathways
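
The association of Gene Ontology terms with enzymes can be sketched as a lookup through an EC2GO-style mapping file. The example below assumes the plain-text layout "EC:x.x.x.x > GO:term name ; GO:nnnnnnn" with comment lines starting with "!"; the sample lines, transcript identifiers and function names are illustrative and are not taken from the authors' pipeline.

```python
from collections import defaultdict

# Illustrative lines in the assumed EC2GO layout.
SAMPLE_EC2GO = [
    "! comment line, ignored",
    "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022",
    "EC:2.5.1.1 > GO:dimethylallyltranstransferase activity ; GO:0004161",
]


def parse_ec2go(lines):
    """Build a GO identifier -> set of EC numbers lookup."""
    go_to_ec = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        ec_part, separator, go_part = line.partition(" > ")
        if not separator:
            continue  # not a mapping line
        go_id = go_part.rsplit(";", 1)[-1].strip()
        go_to_ec[go_id].add(ec_part.strip())
    return go_to_ec


def enzymes_for_transcripts(transcript_go, go_to_ec):
    """Collect the unique EC numbers reachable from annotated GO terms."""
    enzymes = set()
    for go_terms in transcript_go.values():
        for go_id in go_terms:
            enzymes.update(go_to_ec.get(go_id, ()))
    return enzymes


# Hypothetical transcript annotations as they might come out of Trinotate.
transcript_go = {
    "transcript_1": ["GO:0004022", "GO:9999999"],
    "transcript_2": ["GO:0004161"],
}
lookup = parse_ec2go(SAMPLE_EC2GO)
print(sorted(enzymes_for_transcripts(transcript_go, lookup)))
```

The resulting set of EC numbers is the kind of list that could then be submitted to a pathway mapping service; the submission step itself is not shown here.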

Conflict of interest

All the authors have approved submission and there are no conflicts of interest.

Acknowledgments

This research was supported by the Fundamental Research Grant Scheme (FRGS/1/2013/SG05/UKM/01/2) from the Malaysian Ministry of Higher Education (MOHE) and the Research University Grant under the Arus Perdana (UKM-AP-BPB-14-2009) from Universiti Kebangsaan Malaysia.

References

1. Roslan N.D., Yusop J.M., Baharum S.N., Othman R., Mohamed-Hussein Z.-A., Ismail I., Noor N.M., Zainal Z. Int. J. Mol. Sci. 2012;13:2692–2706. doi: 10.3390/ijms13032692.
2. Baharum S.N., Bunawan H., Ghani M.A., Mustapha W.A.W., Noor N.M. Molecules. 2010;15:7006–7015. doi: 10.3390/molecules15107006.
3. Lopez-Gomez R., Gomez-Lim M.A. Hortscience. 1992;27:440–442.
4. Martin M. EMBnet.journal. 2011;17:10–12.
5. Bolger A.M., Lohse M., Usadel B. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
6. Haas B.J., Papanicolaou A., Yassour M., Grabherr M., Blood P.D., Bowden J., Couger M.B., Eccles D., Li B., Lieber M. Nat. Protoc. 2013;8:1494–1512. doi: 10.1038/nprot.2013.084.
7. Boeckmann B., Bairoch A., Apweiler R., Blatter M.-C., Estreicher A., Gasteiger E., Martin M.J., Michoud K., O'Donovan C., Phan I. Nucleic Acids Res. 2003;31:365–370. doi: 10.1093/nar/gkg095.
8. Finn R.D., Bateman A., Clements J., Coggill P., Eberhardt R.Y., Eddy S.R., Heger A., Hetherington K., Holm L., Mistry J. Nucleic Acids Res. 2013:D222–D230. doi: 10.1093/nar/gkt1223.
9. Powell S., Forslund K., Szklarczyk D., Trachana K., Roth A., Huerta-Cepas J., Gabaldón T., Rattei T., Creevey C., Kuhn M. Nucleic Acids Res. 2013:D231–D239. doi: 10.1093/nar/gkt1253.
10. Gene Ontology Consortium. Nucleic Acids Res. 2004;32:D258–D261. doi: 10.1093/nar/gkh036.
11. Petersen T.N., Brunak S., von Heijne G., Nielsen H. Nat. Methods. 2011;8:785–786. doi: 10.1038/nmeth.1701.
12. Lagesen K., Hallin P., Rødland E.A., Stærfeldt H.-H., Rognes T., Ussery D.W. Nucleic Acids Res. 2007;35:3100–3108. doi: 10.1093/nar/gkm160.
13. Camon E., Barrell D., Brooksbank C., Magrane M., Apweiler R. Comp. Funct. Genomics. 2003;4:71–74. doi: 10.1002/cfg.235.
14. Kawashima S., Katayama T., Sato Y., Kanehisa M. Genome Inform. 2003;14:673–674.
