RNA-seq data from whole rice grains of pigmented and non-pigmented Malaysian rice varieties

Rabiatul-Adawiah Zainal-Abidin; Zamri Zainal; Zeti-Azura Mohamed-Hussein; Norliza Abu-Bakar; Mohd Shahril Firdaus Ab Razak; Sanimah Simoh; Yun Shin Sew

doi:10.1016/j.dib.2020.105432

2020 Mar 16;30:105432. doi: 10.1016/j.dib.2020.105432

PMCID: PMC7138961 PMID: 32280737

Abstract

Pigmented rice is rich in antioxidants, macronutrients and micronutrients. A comprehensive investigation of gene expression patterns among pigmented rice varieties would help to elucidate the cellular mechanisms and biological processes underlying rice grain pigmentation. Hence, we performed RNA sequencing and analysis on whole grains from dehusked mature seeds of six selected Malaysian rice varieties with varying grain pigmentation. These varieties were black rice (BALI and Pulut Hitam 9), red rice (MRM16 and MRQ100) and white rice (MR297 and MRQ76). An Illumina HiSeq™ 4000 sequencer was used to generate approximately 53 Gb of raw nucleotides. From 353,937,212 total paired-end raw reads, 340,131,496 total clean reads were obtained. The raw reads were deposited in the European Nucleotide Archive (ENA) database and can be accessed via accession number PRJEB34340. This dataset allows us to identify and profile all expressed genes with functions related to nutritional traits (i.e. antioxidants, folate and amylose content) and a quality trait (i.e. aroma) across both pigmented and non-pigmented rice varieties. In addition, the transcriptome data will be valuable for the discovery of potential gene markers and functional SNPs related to functional traits to assist rice breeding programmes.

Keywords: Rice grain, Pigmented rice, Transcriptome, Nutritional trait, Quality trait

Specifications table

Subject	Agricultural and Biological Sciences
Specific subject area	Plant transcriptomics
Type of data	Table, text file
How data were acquired	Illumina HiSeq™ 4000 sequencing platform
Data format	Raw (FASTQ)
Parameters for data collection	Mature rice seeds of six rice varieties with varying grain pigmentation, namely black rice (BALI and Pulut Hitam 9), red rice (MRM16 and MRQ100) and white rice (MR297 and MRQ76), were collected from MARDI rice field plots. BALI is a landrace rice variety, while Pulut Hitam 9 (PH9), MRM16, MRQ100, MR297 and MRQ76 are modern cultivated rice varieties. BALI, PH9, MRM16 and MRQ100 were chosen for their high antioxidant properties [1]. MRQ76 is an aromatic rice variety [2], while MR297 has been shown to have high micronutrient content [1]. The seeds were dehusked and the whole rice grains were used for total RNA extraction, cDNA library preparation and sequencing.
Description of data collection	The RNA-seq dataset was collected by paired-end sequencing of rice cDNA libraries on the Illumina HiSeq™ 4000 platform with 2 × 150 bp reads. The raw reads were recorded in FASTQ files. Raw reads were filtered to remove reads containing adapters and reads of low quality, and clean reads were mapped to the reference genome of Oryza sativa japonica cv. Nipponbare. Total mapped reads and the number of transcripts were estimated from transcript assembly with a threshold of FPKM ≥ 0.1.
Data source location	City/Town/Region: Serdang, Selangor; Country: Malaysia; Latitude and longitude for collected samples/data: 2.9885871°N, 101.697955417°E
Data accessibility	The raw paired-end transcriptome sequence reads from BALI, PH9, MRM16, MRQ100, MR297 and MRQ76 were deposited in the ENA database (www.ebi.ac.uk/ena) under the accession number PRJEB34340. Direct URL to data: https://www.ebi.ac.uk/ena/browser/view/PRJEB34340

Value of the data

• These RNA-seq data, obtained from six selected rice varieties, represent the first complete set of transcriptome data generated from rice varieties with varying grain pigmentation (black, red and white).
• This dataset allows the discovery of functional genes related to rice grain pigmentation, nutritional and aromatic properties.
• These data permit comparative transcriptomics between pigmented and non-pigmented rice varieties. Differential gene expression profiles between varieties could help in understanding the molecular mechanisms and biological processes responsible for valuable rice traits.
• These RNA-seq data, together with rice genomic data, are important for identifying functional markers such as single nucleotide polymorphisms (SNPs) and microsatellites related to nutritional and quality traits for future rice genetic improvement research.

1. Data description

The dataset in this article comprises raw RNA-seq reads from dehusked whole rice grains obtained from mature seeds of four pigmented (BALI, Pulut Hitam 9, MRM16 and MRQ100) and two non-pigmented (MR297 and MRQ76) rice varieties. Raw data obtained from the Illumina HiSeq™ 4000 sequencer were deposited in FASTQ format in the ENA database (accession number: PRJEB34340). The accession number for each rice variety in the ENA database is given as the ENA run primary accession in Table 1. Sequencing statistics for each rice variety (raw and clean reads, raw and clean nucleotides) are shown in Table 2. The quality of the clean reads was assessed and the percentage of high-quality clean reads was obtained. The number of mapped reads was estimated by mapping clean reads to the Oryza sativa japonica cv. Nipponbare reference genome (Table 3). This genome was used for mapping because it is well assembled and well annotated. Although a few indica rice cultivars have been sequenced, those genomes are not well annotated [3]. Additionally, transcript assembly against the reference genome with a threshold of FPKM ≥ 0.1 gave the number of transcripts for each rice variety listed in Table 3.

Table 1.

List of accession numbers of the individual pigmented and non-pigmented rice transcriptomes in the ENA database.

Rice variety	Phenotype	ENA study primary accession	ENA run primary accession
BALI	Pigmented (Black)	PRJEB34340	ERR3515585
Pulut Hitam 9	Pigmented (Black)	PRJEB34340	ERR3515586
MRM16	Pigmented (Red)	PRJEB34340	ERR3515587
MRQ100	Pigmented (Red)	PRJEB34340	ERR3515588
MR297	Non-pigmented (White)	PRJEB34340	ERR3515589
MRQ76	Non-pigmented (White)	PRJEB34340	ERR3515590
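
The accessions in Table 1 can be kept as a small lookup table when scripting downloads. The following is a minimal sketch: the base URL is the one given for the study under "Data accessibility", and reusing it for individual run accessions is an assumption rather than something stated in this article. The helper name `ena_url` is hypothetical.

```python
# Run accessions copied from Table 1.
RUN_ACCESSIONS = {
    "BALI": "ERR3515585",
    "Pulut Hitam 9": "ERR3515586",
    "MRM16": "ERR3515587",
    "MRQ100": "ERR3515588",
    "MR297": "ERR3515589",
    "MRQ76": "ERR3515590",
}

STUDY_ACCESSION = "PRJEB34340"

# Base URL taken from the study's direct link in the specifications table.
# ASSUMPTION: the same pattern is used here for run accessions.
ENA_BROWSER = "https://www.ebi.ac.uk/ena/browser/view/"


def ena_url(accession: str) -> str:
    """Return the ENA browser URL for a study or run accession."""
    return ENA_BROWSER + accession


if __name__ == "__main__":
    print(f"Study\t{STUDY_ACCESSION}\t{ena_url(STUDY_ACCESSION)}")
    for variety, run in RUN_ACCESSIONS.items():
        print(f"{variety}\t{run}\t{ena_url(run)}")
```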

Table 2.

Statistics of sequencing data for each pigmented and non-pigmented rice variety.

Rice variety	Phenotype	Raw reads (paired-end)	Raw nucleotides (bp)	Clean reads (paired-end)	Clean nucleotides (bp)
BALI	Pigmented (Black)	53,901,374	8,085,206,100	52,008,296	52,008,296
Pulut Hitam 9	Pigmented (Black)	63,166,848	9,475,027,200	61,139,386	61,139,386
MRM16	Pigmented (Red)	61,143,304	9,171,495,600	58,422,974	58,422,974
MRQ100	Pigmented (Red)	47,999,632	7,199,944,800	45,725,970	45,725,970
MR297	Non-pigmented (White)	73,151,820	10,972,773,000	71,132,050	71,132,050
MRQ76	Non-pigmented (White)	54,574,234	8,186,134,100	51,702,820	51,702,820
Total		353,937,212	53,090,580,800 (∼53 Gb)	340,131,496	51,019,724,400 (∼51 Gb)
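
The relationship between read counts and nucleotide counts in Table 2 can be checked with a few lines of arithmetic. This is a sketch under the assumption that every counted read contributes exactly 150 bases (the reported read length); trimmed reads may in practice be shorter, so the derived figures are illustrative. Under that assumption the BALI raw figure and the total clean figure in the table are reproduced, and per-variety clean nucleotide counts can be derived the same way.

```python
READ_LENGTH_BP = 150  # read length reported for the 2 x 150 bp run

# variety: (raw reads, clean reads), copied from Table 2
READS = {
    "BALI": (53_901_374, 52_008_296),
    "Pulut Hitam 9": (63_166_848, 61_139_386),
    "MRM16": (61_143_304, 58_422_974),
    "MRQ100": (47_999_632, 45_725_970),
    "MR297": (73_151_820, 71_132_050),
    "MRQ76": (54_574_234, 51_702_820),
}


def nucleotides(reads: int, read_length: int = READ_LENGTH_BP) -> int:
    """Bases contributed by `reads` reads, ASSUMING a fixed read length."""
    return reads * read_length


total_raw = sum(raw for raw, _ in READS.values())
total_clean = sum(clean for _, clean in READS.values())
retained_percent = 100 * total_clean / total_raw

if __name__ == "__main__":
    for variety, (raw, clean) in READS.items():
        print(f"{variety}\t{nucleotides(raw):,}\t{nucleotides(clean):,}")
    print(f"Total clean nucleotides: {nucleotides(total_clean):,} bp")
    print(f"Reads retained after filtering: {retained_percent:.2f}%")
```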

Table 3.

Statistics of reads mapping and transcripts assembly for each pigmented and non-pigmented rice variety.

Rice variety	High-quality reads (paired-end)	Percentage of high quality reads (%)	Mapped reads	Percentages of mapped reads (%)	Number of transcripts
BALI	52,008,296	98.19	41,648,421	80.08	24,307
Pulut Hitam 9	61,139,386	99.40	49,891,537	81.6	26,223
MRM16	58,422,974	99.31	46,768,318	80.05	25,066
MRQ100	45,725,970	94.43	34,637,723	75.75	25,123
MR297	71,132,050	99.26	55,444,321	77.95	25,416
MRQ76	51,702,820	99.36	40,626,254	78.58	25,092
Total	340,131,496		269,016,574
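
The mapping percentages in Table 3 follow from dividing mapped reads by high-quality clean reads. The sketch below recomputes them from the table values; the function name `mapped_percent` is hypothetical and the rounding to two decimals is a presentation choice.

```python
# variety: (high-quality clean reads, mapped reads), copied from Table 3
TABLE3 = {
    "BALI": (52_008_296, 41_648_421),
    "Pulut Hitam 9": (61_139_386, 49_891_537),
    "MRM16": (58_422_974, 46_768_318),
    "MRQ100": (45_725_970, 34_637_723),
    "MR297": (71_132_050, 55_444_321),
    "MRQ76": (51_702_820, 40_626_254),
}


def mapped_percent(clean_reads: int, mapped_reads: int) -> float:
    """Percentage of clean reads that mapped to the reference genome."""
    return round(100 * mapped_reads / clean_reads, 2)


total_clean = sum(clean for clean, _ in TABLE3.values())
total_mapped = sum(mapped for _, mapped in TABLE3.values())

if __name__ == "__main__":
    for variety, (clean, mapped) in TABLE3.items():
        print(f"{variety}\t{mapped_percent(clean, mapped)}")
    print(f"Overall\t{mapped_percent(total_clean, total_mapped)}")
```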

2. Experimental design, materials, and methods

2.1. Plant materials, total RNA extraction and quality assessment of total RNA

Mature seeds of each pigmented and non-pigmented rice variety were obtained from the field plots at MARDI Seberang Perai, Penang, Malaysia. The seeds were dehusked and the whole rice grain tissue (three plants of each variety) was ground into a fine powder in liquid nitrogen. Total RNA was extracted using the MTL method [4] with modifications. A NanoDrop ND-1000 ultraviolet spectrophotometer (Thermo Scientific, Waltham, MA, USA) was used to quantify the isolated total RNA, and 1% (w/v) agarose gel electrophoresis was used to check for RNA degradation and contamination.

2.2. Library preparation and transcriptome sequencing

High-quality total RNA samples with RIN values ≥ 6.5 were subjected to messenger RNA isolation using oligo(dT) beads, and cDNA synthesis was performed using random hexamers and SuperScript II Reverse Transcriptase (Invitrogen, USA) according to the manufacturer's instructions. Second-strand synthesis by nick translation was then carried out using a custom second-strand synthesis buffer (Illumina) supplemented with dNTPs, RNase H and Escherichia coli polymerase I. The cDNA library was constructed after a round of purification, terminal repair, A-tailing, ligation of sequencing adapters, size selection and PCR enrichment. The cDNA library concentration was quantified using a Qubit 2.0 fluorometer (Life Technologies, USA), and the library was then diluted to 1 ng/µl before the insert size was checked on an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). Paired-end sequencing of the resulting libraries was performed on the Illumina HiSeq™ 4000 platform with a read length of 150 bp at each end.
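
The dilution to 1 ng/µl follows the usual C1·V1 = C2·V2 relationship. As a minimal sketch, the function below computes how much diluent to add to an aliquot; the stock concentration and aliquot volume in the example are hypothetical values chosen for illustration and are not reported in this article.

```python
def diluent_volume_ul(stock_ng_per_ul: float,
                      aliquot_ul: float,
                      target_ng_per_ul: float = 1.0) -> float:
    """Volume of diluent (in µl) to add to an aliquot to reach the target
    concentration, from C1 * V1 = C2 * V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    final_volume_ul = stock_ng_per_ul * aliquot_ul / target_ng_per_ul
    return final_volume_ul - aliquot_ul


if __name__ == "__main__":
    # HYPOTHETICAL example: a 12.5 ng/µl library, 2 µl aliquot.
    print(diluent_volume_ul(12.5, 2.0))  # 23.0 µl of diluent, 25 µl final volume
```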

2.3. Repository and processing of RNA-seq raw data

The raw sequencing reads were deposited in the European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena) under accession number PRJEB34340. Table 1 lists the ENA run primary accession numbers of the individual pigmented and non-pigmented rice transcriptomes. The raw reads were subsequently filtered using Trimmomatic version 0.36 [5] to remove adapter sequences, contamination and low-quality reads. Table 2 shows the statistics of raw and clean reads for each rice transcriptome after sequence processing and analysis.
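
The article does not report the Trimmomatic settings that were used, so the following is only a sketch of how a paired-end filtering command could be assembled. The trimming steps and their values (ILLUMINACLIP, SLIDINGWINDOW, MINLEN) are the example settings from the Trimmomatic manual, not the authors' parameters; the file names, jar path and adapter file are likewise hypothetical.

```python
import shlex


def trimmomatic_pe_command(sample: str,
                           jar: str = "trimmomatic-0.36.jar",
                           adapters: str = "TruSeq3-PE.fa") -> list[str]:
    """Build (but do not run) a Trimmomatic paired-end command line.

    ASSUMPTIONS: file naming scheme, adapter file and all trimming
    thresholds are illustrative and not taken from the article.
    """
    return [
        "java", "-jar", jar, "PE", "-phred33",
        f"{sample}_1.fastq.gz", f"{sample}_2.fastq.gz",               # inputs
        f"{sample}_1.paired.fastq.gz", f"{sample}_1.unpaired.fastq.gz",
        f"{sample}_2.paired.fastq.gz", f"{sample}_2.unpaired.fastq.gz",
        f"ILLUMINACLIP:{adapters}:2:30:10",  # adapter clipping
        "SLIDINGWINDOW:4:15",                # trim when window quality drops
        "MINLEN:36",                         # drop reads shorter than 36 bp
    ]


if __name__ == "__main__":
    print(shlex.join(trimmomatic_pe_command("ERR3515585")))
```

Building the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues.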

2.4. Reads mapping, transcripts assembly and gene expression analysis

The clean reads were mapped to the reference genome of Oryza sativa japonica cv. Nipponbare. Bowtie2 version 2.3.0 was used to index the reference genome, while TopHat2 version 2.0.12 [6] was used to map the clean reads onto the reference genome. Default parameters were used for these analyses. HTSeq version 0.6.1 [7] was used to estimate the expression of each rice gene as Fragments Per Kilobase of transcript per Million mapped reads (FPKM). A threshold of FPKM ≥ 0.1 was used to determine the significance of gene expression. Cufflinks version 2.1.1 [8] was used to combine and assemble the mapped reads into transcripts. The number of mapped reads, percentage of mapped reads and number of transcripts are shown in Table 3. These sequences and information will be used for further downstream analyses such as differential gene expression analysis, gene co-expression network analysis and SNP calling.
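
The FPKM measure and the ≥ 0.1 threshold described above can be illustrated with a short calculation. This is a sketch using the standard FPKM definition (fragments scaled by gene length in kilobases and by millions of mapped fragments); the gene names, counts and lengths in the example are hypothetical, and the code is not the authors' pipeline.

```python
FPKM_THRESHOLD = 0.1  # expression threshold stated in the article


def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def expressed_genes(counts: dict[str, int],
                    lengths_bp: dict[str, int],
                    total_mapped_fragments: int,
                    threshold: float = FPKM_THRESHOLD) -> dict[str, float]:
    """Return FPKM values for genes at or above the threshold."""
    result = {}
    for gene, fragments in counts.items():
        value = fpkm(fragments, lengths_bp[gene], total_mapped_fragments)
        if value >= threshold:
            result[gene] = value
    return result


if __name__ == "__main__":
    # HYPOTHETICAL counts and gene lengths, for illustration only.
    counts = {"geneA": 500, "geneB": 2}
    lengths = {"geneA": 2000, "geneB": 1000}
    print(expressed_genes(counts, lengths, total_mapped_fragments=40_000_000))
```

In this example only geneA passes the threshold, because geneB's two fragments over 1 kb correspond to an FPKM below 0.1.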

Acknowledgments

The authors would like to acknowledge financial support from MARDI RMK-11 Developmental Fund (P21003004010001-l).

Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Footnotes

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.dib.2020.105432.

Appendix. Supplementary materials

mmc1.xml (1.2 KB, xml)

References

1. Sew Y.S., Ahmad M.A., Abd Rashid M.R., Abu Bakar N., Machap C., Ling A.C.K., Zainal Abidin R.A., Rozano L., Simoh S. Antioxidant activities and microelement composition of Malaysian local pigmented and non-pigmented rice varieties. Trans. Persatuan Genet. Malays. 2016;3:205–212.
2. Harun R., Halim N.A., Engku Ariff E.E., Serin T. Consumer preferences on Malaysia's specialty rice. FFTC Agricultural Policy Platform (FFTC-AP). 2018; pp. 1–9.
3. Mahesh H.B., Shirke M.D., Singh S., Rajamani A., Hittalmani S., Wang G.L., Gowda M. Indica rice genome assembly, annotation and mining of blast disease resistance genes. BMC Genom. 2016;17:242. doi: 10.1186/s12864-016-2523-7.
4. Mornkham T., Wangsomnuk P.P., Fu Y.B., Wangsomnuk P., Jogloy S., Patanothai A. Extractions of high quality RNA from the seeds of Jerusalem artichoke and other plant species with high levels of starch and lipid. Plants. 2013;2(2):302–316. doi: 10.3390/plants2020302.
5. Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170.
6. Kim D., Pertea G., Trapnell C., Pimentel H., Kelley R., Salzberg S.L. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:1–13. doi: 10.1186/gb-2013-14-4-r36.
7. Anders S., Pyl P.T., Huber W. HTSeq - a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638.
8. Trapnell C., Roberts A., Goff L., Pertea G., Kim D., Kelley D.R., Pimentel H., Salzberg S.L., Rinn J.L., Pachter L. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 2012;7(3):562–578. doi: 10.1038/nprot.2012.016.
