Skip to main content
Data in Brief logoLink to Data in Brief
. 2022 Feb 3;41:107908. doi: 10.1016/j.dib.2022.107908

Transcriptome dataset of sago palm in peat soil

Wei-Jie Yan a,, Hasnain Hussain a,, Hung Hui Chung b, Norzainizul Julaihi c, Rina Tommy c
PMCID: PMC8857430  PMID: 35242906

Abstract

Sago palm (Metroxylon sagu Rottb.) is an important agricultural starch-producing palm that contributes to Malaysia's economics, especially in the State of Sarawak, Malaysian Borneo. In this palm tree, the central part of the plant storage-starch. Under normal condition, sago palm develop its trunk after 4-5 years being planted. However, sago palms planted on deep-peat soil failed to develop their trunk even after 17 years of being planted. This phenomenon is known as ‘non-trunking’, which eliminates the economic value of the palms. Numerous research has been done to address the phenomenon, but the molecular mechanisms of sago palm responding toward the responsible stresses are still lacking. Therefore, in this study, leaf samples were collected from trunking (normal) and non-trunking sago palms planted on peat soil for total RNA extraction, followed by next-generation sequencing using the BGISEQ-500 platform. The raw reads were cleaned, and de novo assembled using TRINITY software package. A total of 40.11 Gb bases were sequenced from the sago palm leaf samples. The assembled sequence produced 102,447 unigenes, with N50 score 1809 bp and GC ratio of 44.34%. The alignment of unigenes with seven functional databases (NR, NT, GO, KOG, KEGG, SwissProt and InterPro) resulted in the annotation of 65,523 (63.96%) unigenes. Functional annotation results in the detection of 46,335 coding DNA sequences by Transdecoder. A total of 30,039 simple-sequence repeats distributed on 21,676 unigenes were detected using Primer3 software, and 2355 transcription factor coding unigenes were predicted using getorf and hmmseach software. This work is registered under NCBI BioProject PRJNA781491. The raw RNA sequencing data are available in Sequence Read Archive (SRA) database with accession numbers SRX13165895, SRX13165896, SRX13165897, SRX13165898, SRX13165899, and SRX13165900. Gene expression and annotation information are accessible in public functional genomics data repository Gene Expression Omnibus (GEO) with accession number GSE189085.

Keywords: Sago palm, Metroxylon sagu, RNA sequencing, Transcriptome, Non-trunking

Specifications Table

Subject Biological sciences; Omics: Transcriptomics
Specific subject area Trunk development of sago palm under stress
Type of data Transcriptomics data (raw RNA sequence reads, gene expression and sequence annotation)
How the data were acquired BGISEQ-500 platform
Data format Raw: *fastq.gz files
Assembly: *Unigene.fa.gz files
Processed Data: *gene.fpkm.txt.gz files
Description of data collection Total RNA was extracted from trunking and non-trunking sago palm (M. sagu) leaf tissue, mRNA library preparation and then was sequenced using the BGISEQ-500 platform
Data source location Dalat Sago Plantation, Mukah, Sarawak, Malaysia
GPS location are listed in Table 1)
Data accessibility Repository name: NCBI's Gene expression omnibus (GEO)
Data identification number: GSE189085
Direct URL to data: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE189085
Repository name: NCBI's Sequence Read Archive (SRA)
Sample ID: GSM5694359 (ST1: Trunking Sample 1)
Data identification number: SRX13165895
Direct URL to data: https://www.ncbi.nlm.nih.gov/sra/SRX13165895[accn]
Repository name: NCBI's Sequence Read Archive (SRA)
Sample ID: GSM5694360 (ST4: Trunking Sample 4)
Data identification number: SRX13165896
Direct URL to data: https://www.ncbi.nlm.nih.gov/sra/SRX13165896[accn]
Repository name: NCBI's Sequence Read Archive (SRA)
Sample ID: GSM5694361 (ST5: Trunking Sample 5)
Data identification number: SRX13165897
Direct URL to data: https://www.ncbi.nlm.nih.gov/sra/SRX13165897[accn]
Repository name: NCBI's Sequence Read Archive (SRA)
Sample ID: GSM5694362 (NT7: Trunking Sample 7)
Data identification number: SRX13165898
Direct URL to data: https://www.ncbi.nlm.nih.gov/sra/SRX13165898[accn]
Repository name: NCBI's Sequence Read Archive (SRA)
Sample ID: GSM5694363 (NT8: Trunking Sample 8)
Data identification number: SRX13165899
Direct URL to data: https://www.ncbi.nlm.nih.gov/sra/SRX13165899[accn]
Repository name: NCBI's Sequence Read Archive (SRA)
Sample ID: GSM5694364 (NT9: Trunking Sample 9)
Data identification number: SRX13165900
Direct URL to data: https://www.ncbi.nlm.nih.gov/sra/SRX13165900[accn]

Value of the Data

  • This data is useful for the scientific community as it provides insights into the transcriptome of M. sagu.

  • This data provides a comprehensive transcriptomic expression using pair-end sequencing with two sets of samples with three biological replicate datasets, each to comprehend gene expression contributing to the non-trunking phenomenon in M. sagu.

  • Researchers involved with the work related to the omics study of M. sagu could also benefit from this data as cross-references information to support their findings.

1. Data Description

Sago palm grows through a series of developmental stages, which takes up to twelve years to be ready for the harvest. M. sagu generates suckers (soboliferous) every 18 months as the successor of the mother plant, which dies after fruiting (hapaxanth). Mature sago palm yields 15–25 metric tons of air-dried starch per hectare at the end of an 8-year growth cycle under good condition [1]. The advantages of sago palm as a starch-producing crop that grows in peat soil with seasonal waterlogged has triggered the Land Custody and Development Authority Sarawak [2] to initiate the commercial plantation in Mukah, Sarawak in 1987. However, there was the occurrence of non-trunking sago palms even after ten years of cultivation. The non-trunking sago palm reduced starch yield per hectare of land, resulting in the instability of the sago starch market. It reduced the plantation income, consequently restricting the development of sago industries and loss of confidence in this palm by the potential or current sago palm farmers [3].

Numerous studies were performed to address the non-trunking sago palm problem such as soil physicochemical properties [2], soil microbiome [4] and molecular studies [5], [6], [7]. The general outcome of the studies revealed that the mineral deficiency causes the non-trunking in sago palm, but how this deficiency affects sago development remains unanswered. Currently, several research studies of this palm in genomics and proteomics are being conducted. In conjunction with those studies, this study utilises transcriptome analysis to compare the gene expression between the trunking and non-trunking sago palm leaf tissue to highlight the differential expressed genes and their correlation with the non-trunking phenomenon in sago palm.

The information in this article includes the transcriptomics of trunking sago palm (control) and non-trunking sago palm (target of interest) from peat soil. The global gene expression between the trunking and non-trunking sago palm was evaluated by differential expressed genes analysis. The files of the transcriptome dataset, which generated from 6 libraries of raw data and 2 sets of processed data, were submitted to Sequence Read Archive (SRA) and Gene Expression Omnibus (GEO) NCBI database.

2. Experimental Design, Materials and Methods

2.1. Sample collection

Sago palm leaf tissues were used in this study. They were six samples consisting of three biological replicates of 2 phenotypes. All the samples were collected from Dalat Sago Plantation in Mukah, Sarawak (Table 1). The samples were wiped with a kitchen towel containing 70% ethanol to remove debris. The samples were then stored in containers followed by snap-freeze in liquid nitrogen. The samples were kept in liquid nitrogen before being transferred into a ‒80 °C freezer for long-term storage.

Table 1.

Samples Global positioning system (GPS).

Morphology Sample GPS (WGS84 datum)
Non-Trunking (Target of interest) SN7 +2° 49′ 40.5″, +111° 50′ 25.8″
SN8 +2° 49′ 40.7″, +111° 50′ 25.6″
SN9 +2° 49′ 40.4″, +111° 50′ 26.3″
Trunking (Control) ST1 +2° 51′ 07.7″, +111° 49′ 35.9″
ST4 +2° 51′ 08.0″, +111° 49′ 35.5″
ST5 +2° 51′ 07.8″, +111° 49′ 36.4″

2.2. RNA extraction and RNA-seq information

Total RNA of the six samples were extracted using CTAB protocol and sequenced using BGISEQ-500 platform. Trunking and non-trunking sago palm (M. sagu) transcriptome were successfully sequenced, and the raw RNA sequence reads were deposited in NCBI's Sequence Read Archive (SRA) database with the accession numbers SRX13165895, SRX13165896, SRX13165897, SRX13165898, SRX13165899, and SRX13165900.

The total RNA samples were subjected to mRNA enrichment before the RNA sequencing. About 40.11 Gb bases raw sequence reads of the six RNA samples were successfully generated using the BGISEQ-500 sequencing platform. The raw reads containing more than 5% unknown N base, adaptor-polluted and more than 20% of bases in the total read with a quality score lower than 15 were then cut-off, and the remaining reads are characterised as clean reads. The clean read ratio exceeded 95% with high accuracy reflected by Q score Q30 (equivalent to the probability of an incorrect base call of 1 in 1000 times) above 90% of the reads and Q20 (equivalent to the probability of an incorrect base call of 1 in 100 times) above 95% of the reads (Refer Table 2).

Table 2.

Clean reads quality metrics.

Sample Total Raw Reads (Mb) Total Clean Reads (Mb) Total Clean Bases (Gb) Clean Reads Q20 (%) Clean Reads Q30 (%) Clean Reads Ratio (%)
ST1 69.96 66.82 6.68 97.88 91.45 95.51
ST4 69.96 66.97 6.7 98.09 92.07 95.72
ST5 69.96 66.68 6.67 97.76 91.14 95.31
SN7 69.96 66.66 6.67 98 91.78 95.27
SN8 69.41 66.62 6.66 97.94 91.59 95.99
SN9 69.96 67.33 6.73 97.98 91.63 96.24

Keys; Sample: Sample name

Total Raw Reads(Mb): The reads amount before filtering

Total Clean Reads(Mb): The reads amount after filtering

Total Clean Bases(Gb): The total base amount after filtering

Clean Reads Q20(%): The rate of bases which quality is greater than 20 value in clean reads

Clean Reads Q30(%): The rate of bases which quality is greater than 30 value in clean reads

Clean Reads Ratio(%): The ratio of the amount of clean reads

2.3. De novo assembly

The clean reads were then de novo assembled using trinity software and generated the reference sequence (Table 3). Reference sequences were then undergone abundance screening using TIGR gene indices clustering tools (TGICL) to obtain unique gene (Unigene) sequences (Table 4; Fig. 1).

Table 3.

Quality metrics of transcripts.

Sample Total Number Total Length Mean Length N50 N70 N90 GC(%)
ST1 75730 56765699 749 1334 740 283 45.54
ST4 94989 73822623 777 1447 778 285 44.11
ST5 84578 65560398 775 1430 776 285 44.67
SN7 68068 55890222 821 1472 860 309 45.44
SN8 91100 65484623 718 1303 684 270 44.75
SN9 97205 78047560 802 1503 821 293 43.65

Keys; Sample: Sample name

Total Number: The total number of transcripts

Total Length: The read length of transcripts

Mean Length: The average length of transcripts

N50: The N50 length is used to determine the assembly continuity, the higher the better. N50 is a weighted median statistic that 50% of the total length is contained in Unigenes that are equal to or larger than this value;

N70: Similar to N50

N90: Similar to N50

GC(%): the percentage of G and C bases in all transcripts

Table 4.

Quality metrics of unigenes.

Sample Total Number Total Length Mean Length N50 N70 N90 GC(%)
ST1 53270 45270441 849 1409 847 332 45.64
ST4 68390 59838729 874 1522 889 333 44.17
ST5 60537 52810113 872 1506 881 333 44.72
SN7 48542 44955041 926 1542 973 367 45.51
SN8 64137 52366130 816 1390 800 316 44.82
SN9 70377 63341038 900 1570 924 342 43.69
All-Unigene 102447 103410779 1009 1809 1122 382 44.34

Keys; Sample: Sample name

Total Number: The total number of transcripts

Total Length: The read length of transcripts

Mean Length: The average length of transcripts

N50: The N50 length is used to determine the assembly continuity, the higher the better. N50 is a weighted median statistic that 50% of the total length is contained in Unigenes that are equal to or larger than this value;

N70: Similar to N50

N90: Similar to N50

GC(%): the percentage of G and C bases in all transcripts

Fig. 1.

Fig. 1.

Unigene length distribution. X axis represents the length of Unigenes. Y axis represents the number of unigenes.

2.4. Unigene functional annotation

After assembly, the Unigenes were functionally annotated with seven functional databases, namely; NCBI protein database (NR), NCBI nucleotide database (NT), Gene Ontology (GO), Eukaryotic Orthologous Groups of proteins (KOG), Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot, a curated protein sequence database of UniProt, and InterPro (Table 5; Fig. 2). Unigene annotation and expression information are deposited in NCBI's Gene Expression Omnibus (GEO) with accession number GSE189085.

Table 5.

Annotation summary of the Unigenes with the seven databases.

Values Total Nr Nt SwissProt KEGG KOG InterPro GO Intersection Overall
Number 102,447 56,600 54,530 42,327 44,057 45,181 47,388 15,312 7,421 65,523
Percentage 100% 55.25% 53.23% 41.32% 43.00% 44.10% 46.26% 14.95% 7.24% 63.96%

Fig. 2.

Fig. 2.

Venn diagram between NR, KOG, KEGG, Swissprot and Interpro.

2.5 Unigene expression

Based on the assembly result, the clean reads of each sample were mapped to the Unigenes with Bowtie2 software and the gene expression level were calculated with RSEM. Correlation between samples are distinguished in Principal component analysis (PCA) (Fig. 3).

Fig. 3.

Fig. 3.

Principal component analysis of the samples gene expressions.

Transcriptomic data of two sago phenotypes were completed, with 40.11 Gb bases sequenced, producing annotated Unigenes, and the detection of SSR and transcription factors. The data obtained from this study can be used to understand gene expression contributing to the trunking phenomenon in M. sagu.

Ethics Statement

This work does not contain any studies with humans. The original collections of sago palm leaf (M. sagu) were made with the direct permission of Dalat Sago Plantation owned by Land Custody and Development Authority (LCDA) Holdings Sdn. Bhd., in the Mukah division. The sago palm leaf samples were not collected from any National Parks or protected wilderness areas. Additionally, the sago palm (M. sagu) are not endangered species.

CRediT authorship contribution statement

Wei-Jie Yan: Conceptualization, Methodology, Data curation, Writing – original draft, Visualization. Hasnain Hussain: Conceptualization, Funding acquisition, Supervision, Writing – review & editing. Hung Hui Chung: Writing – review & editing. Norzainizul Julaihi: Funding acquisition, Writing – review & editing. Rina Tommy: Funding acquisition, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research is supported by Sarawak Research Development Council Grant: RDCRG/CAT/2019/23. The authors thank Land Custody and Development Authority (LCDA) Holdings Sdn. Bhd. for the permission to obtain samples from its plantation and logistical supports.

Contributor Information

Wei-Jie Yan, Email: 17010084@siswa.unimas.my.

Hasnain Hussain, Email: hhasnain@unimas.my.

References

  • 1.Flach M. Vol. 13. Bioversity International; 1997. (Sago Palm: Metroxylon sagu Rottb. - Promoting the Conservation and Use of Underutilised and Neglected Crops). [Google Scholar]
  • 2.Melling L. Department of Agriculture, Soils Branch; Sarawak: 2000. Dalat & Mukah sago plantation peat soil study: Final report. [Google Scholar]
  • 3.Chua S.N.D., Kho E.P., Lim S.F., Hussain M.H. Sago palm (Metroxylon sagu) starch yield, influencing factors and estimation from morphological traits. Adv. Mater. Process. Technol. 2021:1–23. doi: 10.1080/2374068X.2021.1878702. [DOI] [Google Scholar]
  • 4.Apun K., Lihan S., Wong M.K., Bilung L.M. Proceedings of the 1st ASEAN Sago Symposium. 2009. Microbiological characteristics of trunking and non-trunking sago palm peat soil; pp. 29–30. 2009Current Trend and Development in Sago Research. Universiti Malaysia Sarawak: Kuching. [Google Scholar]
  • 5.Hussain H., Yan W.J., Ngaini Z., Julaihi N., Tommy R., Bhawani S.A. Differential metabolites markers from trunking and stressed non-trunking sago palm (Metroxylon sagu Rottb.) Curr. Chem. Biol. 2020;14:262–278. doi: 10.2174/2212796814999200930120925. [DOI] [Google Scholar]
  • 6.Hussain H., Kamal M.M., Al-Obaidi J.R., Hamdin N.E., Ngaini Z., Mohd-Yusuf Y. Proteomics of sago palm towards identifying contributory proteins in stress-tolerant cultivar. Protein J. 2020;39(1):62–72. doi: 10.1007/s10930-019-09878-9. [DOI] [PubMed] [Google Scholar]
  • 7.Lim L.W.K., Chung H.H., Hussain H., Gan H.M. Genome survey of sago palm (Metroxylon sagu Rottboll) Plant Gene. 2021;28 doi: 10.1016/j.plgene.2021.100341. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Data in Brief are provided here courtesy of Elsevier

RESOURCES