The first transcriptome sequencing and data analysis of the Javan mahseer (Tor tambra)

Melinda Mei Lin Lau; Leonard Whye Kit Lim; Hung Hui Chung; Han Ming Gan

doi:10.1016/j.dib.2021.107481

2021 Oct 14;39:107481.

PMCID: PMC8529094 PMID: 34712757

Abstract

The Javan mahseer (Tor tambra) is one of the most valuable freshwater fishes in the genus Tor. To date, other than mitogenomic data (BioProject: PRJNA422829), genomic and transcriptomic resources for this species are lacking, although such resources are crucial for understanding the molecular mechanisms associated with important traits such as growth, immune response, reproduction and sex determination. For the first time, we sequenced the transcriptome of a whole juvenile fish on an Illumina NovaSeq6000, generating raw paired-end reads. De novo transcriptome assembly generated a draft transcriptome (BUSCO5 completeness of 91.2% [Actinopterygii_odb10 database]) consisting of 259,403 putative transcripts with a total length of 333,881,215 bp and an N50 length of 2283 bp. A total of 77,503 non-redundant protein-coding sequences were predicted from the transcripts and used for functional annotation. We mapped the predicted proteins to 304 known KEGG pathways, with the signal transduction cluster having the highest representation, followed by the immune system and endocrine system. In addition, we identified transcripts exhibiting significant similarity to previously published growth- and immune-related genes, which will facilitate future molecular breeding of Tor tambra.

Keywords: Transcriptome, Unigenes, Gene annotation, Tor tambra

Specifications Table

Subject	Biological Sciences
Specific subject area	Omics: Transcriptomics
Type of data	Sequencing raw reads, assembly, Table, Figure, Graph
How data were acquired	Sequencing
Data format	Raw Reads (fastq), Assembly (fasta)
Parameters for data collection	Total RNA extracted from a whole specimen of fish fry was used for library preparation and sequencing.
Description of data collection	Total RNA extraction was performed using Wizol, a TRIzol-like reagent (WizBio). The purified total RNA was subjected to mRNA enrichment using poly-T magnetic beads (NEB). The enriched mRNA was subsequently processed using the NEB Ultra II RNA library preparation kit and sequenced on an Illumina NovaSeq6000 (2 × 150 bp).
Data source location	The sample fish fry in this study was provided by a fish breeder who stated that it originated from Pahang, Malaysia. We subsequently extracted the mitochondrial genes from the transcriptome and showed that this specimen indeed formed a monophyletic cluster with Tor spp. described from Pahang, Malaysia (Fig. 1) [1].
Data accessibility	Raw data and final assembled contigs were deposited in the NCBI database under the Bioproject PRJNA727425 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA727425). Additional files such as BUSCO analysis output, GO annotation, KEGG annotation and COG annotation are available in the Zenodo database https://doi.org/10.5281/zenodo.4766490.

Value of the Data

• The transcriptome dataset from the Javan mahseer is useful for gaining insight into transcriptional regulation and for biomarker discovery, supporting the subsequent improvement of this species for aquaculture.
• The high completeness of the transcriptome dataset will aid future phylotranscriptomic studies, especially for fish taxonomists.
• The dataset is useful for facilitating genetic management for the conservation of the remaining mahseer populations in Malaysian rivers.

1. Data Description

Standard RNA sequencing was performed to generate a transcriptome assembly for the Javan mahseer (Tor tambra). Sequencing and assembly results are summarized in Table 1. Coding regions were extracted using TransDecoder, generating 77,503 predicted non-redundant proteins [2]. The proteins were annotated using eggNOG-mapper [3], which maps them to the KEGG, GO and COG databases. Unigene lengths ranged from < 300 bp to > 5000 bp (Fig. 2), and the number of unigenes decreased as length increased. A Venn diagram illustrates the differences and commonalities among the unigenes annotated in the three databases (Fig. 3). Among a total of 63,191 unigenes, the COG database had the highest number of matches (61,616 unigenes), while 42,644 and 40,150 unigenes matched the KEGG and GO databases, respectively (Table 2). Overall, 32,317 unigenes (51.14%) exhibited a significant match in all three databases, and 50,405 unigenes (79.77%) had a significant match in at least one of them (Table 2).

Fig. 1. Maximum-likelihood phylogenetic tree constructed from the standard cytochrome oxidase I gene fragment with 1000 bootstrap replications. The black bracket highlights the sample fish fry used in this study [1].

Table 1.

Transcriptome sequencing and assembly statistics.

Raw sequence reads	108,657,770 (16.29 Gb)
Number of contigs	278,297
Total assembled contig length	276,327,107 bp
Contig N50 length	1,922 bp
Number of predicted proteins	77,503
Total predicted protein length	24,833,897 aa
BUSCO Completeness (Actinopterygii odb10)
Complete BUSCOs	84% (3055)
Complete and single-copy BUSCOs	18.7% (679)
Complete and duplicated BUSCOs	6.9% (250)
Missing BUSCOs	9.1% (335)
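
The contig N50 reported in Table 1 is the contig length at which contigs of that length or longer account for at least half of the total assembled length. A minimal Python sketch of that calculation follows; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the published pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the contig at which the running
    total first reaches half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy example (hypothetical lengths, not the Tor tambra assembly).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

Applied to the lengths of all assembled contigs, the same function would be expected to reproduce the N50 value listed in the table.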

Fig. 2. Length distribution of Tor tambra unigenes.

Fig. 3. Venn diagram showing the differences and commonalities in annotation among the GO, KEGG and COG databases.
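
The regions of a three-way Venn diagram such as Fig. 3 correspond to set intersections, unions and differences of the unigene identifiers annotated in each database. The sketch below shows one way this could be computed in Python; the function name `venn_summary` and the toy identifiers (`u1` to `u6`) are hypothetical and do not come from the dataset.

```python
def venn_summary(go, kegg, cog):
    """Summarise the overlap between three sets of annotated unigene IDs."""
    return {
        "all_three": len(go & kegg & cog),      # annotated in every database
        "at_least_one": len(go | kegg | cog),   # annotated in any database
        "go_only": len(go - kegg - cog),
        "kegg_only": len(kegg - go - cog),
        "cog_only": len(cog - go - kegg),
    }


# Toy identifiers (hypothetical), standing in for real unigene IDs.
go_ids = {"u1", "u2", "u3"}
kegg_ids = {"u2", "u3", "u4"}
cog_ids = {"u3", "u4", "u5", "u6"}

summary = venn_summary(go_ids, kegg_ids, cog_ids)
print(summary)
```

With the real annotation tables, the three input sets would hold the identifiers of the unigenes that received a GO, KEGG or COG assignment.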

Table 2.

Unigenes functional annotation by various databases.

Database	Number of Unigenes	Percentage (%)
GO	40,150	63.54
KEGG	42,644	67.48
COG	61,616	97.51
Annotated in at least one database	50,405	79.77
Annotated in all databases	32,317	51.14
All unigenes	63,191	100.00
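
Each percentage in Table 2 is the corresponding count divided by the 63,191 unigenes. The short Python sketch below recomputes the column from the counts given in the table; the variable and function names are illustrative assumptions.

```python
TOTAL_UNIGENES = 63191  # "All unigenes" row of Table 2

# Counts copied from Table 2.
counts = {
    "GO": 40150,
    "KEGG": 42644,
    "COG": 61616,
    "Annotated in at least one database": 50405,
    "Annotated in all databases": 32317,
}


def percentage(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


percentages = {name: percentage(n, TOTAL_UNIGENES) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}\t{n:,}\t{percentages[name]:.2f}")
```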

Fig. 4 shows the top ten subcategories of each main GO ontology. For biological process, 4404 counts (9.87%) were in metabolism, 2125 (4.76%) in cell organization and biogenesis, and 1773 (3.97%) in transport. For molecular function, 3297 counts (7.39%) were assigned to development, while 2121 (4.75%) and 1222 (2.74%) were assigned to catalytic activity and binding, respectively. For cellular component, 1643 counts (3.68%) were assigned to cell, 1256 (2.81%) to intracellular and 608 (1.36%) to cytoplasm. Only a very small number of counts were assigned to extracellular region (0.22%), nucleoplasm (0.17%) and mitochondrion (0.17%).

KEGG is another widely used reference database of pathway networks for integrating and interpreting large-scale datasets generated by RNA sequencing. A total of 34 KEGG categories across five main groups (Cellular Processes, Environmental Information Processing, Genetic Information Processing, Metabolism and Organismal Systems) were mapped to 304 known KEGG pathways (Fig. 5). Among the five main groups, the largest was organismal systems (36,792, 38.79%), whilst genetic information processing had the lowest count (4640, 4.89%). The clusters with the most counts were signal transduction (17,527, 18.48%), immune system (10,897, 11.49%) and endocrine system (9059, 9.55%). For signal transduction, various pathways, including two-component system, MAPK, ErbB, Ras, Rap1, Wnt, Notch, Hedgehog, TGF-beta, Hippo, VEGF, Apelin, JAK-STAT, NF-kappa B, TNF, HIF-1, FoxO, calcium, phosphatidylinositol, phospholipase D, sphingolipid, cAMP, cGMP-PKG, PI3K-Akt, AMPK and mTOR, were found in Tor tambra, indicating that a large number of signals are generated during the developmental stage. Fig. 6 shows the 10 KEGG pathways with the most counts across the five main KEGG groups. The largest count was for metabolic pathways in the metabolism group (4386, 4.62%), followed by the NOD-like receptor signaling pathway (2247, 2.37%) and necroptosis (1940, 2.05%). Necroptosis belongs to the cellular processes group, while the NOD-like receptor signaling pathway belongs to the organismal systems group.

The COG database consists of clusters of orthologous groups and is divided into 25 COG classifications (Fig. 7). Altogether, 63,191 unigenes were mapped to the COG database and grouped into four main categories: information storage and processing (15.59%), cellular processes and signaling (40.63%), metabolism (12.62%) and poorly characterised (31.17%). Among the 25 classifications, the largest clusters were function unknown (20,560, 31.17%) and signal transduction mechanisms (13,521, 20.50%), followed by posttranslational modification, protein turnover, chaperones (5138, 7.79%), transcription (4529, 6.87%) and cytoskeleton (2364, 3.58%).

A total of 73 growth-related genes and 30 immune-related genes were selected based on a literature review [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. For each gene, the accession number of the corresponding protein sequence was retrieved from NCBI (https://www.ncbi.nlm.nih.gov/). Out of the 103 genes, 51 growth-related genes and 13 immune-related genes were retained based on a stringent E-value cutoff of 10⁻¹⁰. Table 3 lists the growth-related proteins and Table 4 lists the immune-related proteins.
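
An E-value cutoff of this kind can be applied by filtering a table of similarity-search hits. The Python sketch below assumes the hits are in the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); the article does not state the output format that was used, and the identifiers in the toy input are hypothetical. Treating a hit exactly at the cutoff as passing is also an assumption.

```python
E_VALUE_CUTOFF = 1e-10  # cutoff stated in the text


def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Keep tabular BLAST hits whose E-value is at or below the cutoff.

    Returns a list of (query, subject, percent_identity, evalue) tuples.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 holds the E-value
        if evalue <= cutoff:
            kept.append((fields[0], fields[1], float(fields[2]), evalue))
    return kept


# Toy hits with hypothetical identifiers.
toy_hits = [
    "refA\tcontig1.p1\t95.2\t300\t14\t0\t1\t300\t1\t300\t1e-150\t550",
    "refB\tcontig2.p1\t40.1\t80\t48\t2\t5\t84\t10\t89\t2e-05\t45.1",
    "refC\tcontig3.p1\t71.0\t120\t35\t1\t1\t120\t3\t122\t1e-10\t130",
]

passing = filter_hits(toy_hits)
for query, subject, identity, evalue in passing:
    print(query, subject, identity, evalue)
```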

Table 3.

Growth-related proteins. Proteins marked with an asterisk (*) were selected after the E-value cutoff, while the best parameters were entered for proteins that did not pass the cutoff filter.

Contig ID	Protein	Accession	Blast Hit	Aligned Length	Similarity (%)	Reference
TRINITY_DN2932_c0_g1_i4.p1	*Acetyl-CoA carboxylase alpha (Haplotype 1)	ADT82650.1	15	2342	74.552	[7]
TRINITY_DN2318_c0_g1_i1.p1	*Acetyl-CoA carboxylase alpha (Haplotype 2)	ADX43925.1	15	2390	96.987	[7]
TRINITY_DN2932_c0_g1_i4.p1	*Acetyl-CoA carboxylase 2	XP_018962132.1	20	1837	96.788	[7]
TRINITY_DN1703_c0_g1_i1.p1	*Acetyl-CoA Acetyltransferase 2	AAD34966.1	13	395	91.899	[8]
TRINITY_DN1946_c0_g2_i1.p1	*Alpha actinin 3	XP_031588180.1	189	896	93.973	[6]
TRINITY_DN2816_c0_g1_i1.p1	*ATP synthase	BAA82837.1	10	518	99.421	[5]
TRINITY_DN7716_c0_g1_i1.p1	*G2/mitotic-specific cyclin-B1	NP_571588.1	27	397	94.71	[7]
TRINITY_DN7305_c0_g1_i14.p1	*Cell division cycle protein 20	NP_998245.2	460	496	90.524	[7]
TRINITY_DN9513_c0_g1_i1.p1	*Conserved Edge Expressed	ABK35126.2	2	322	91.304	[11]
TRINITY_DN148_c0_g1_i1.p1	*Creatine kinase M-type	XP_028660838.1	18	381	81.102	[6]
TRINITY_DN4485_c0_g1_i10.p1	*Peroxisomal carnitine O-octanoyltransferase	XP_018953213.1	35	333	89.189	[9]
TRINITY_DN7185_c0_g1_i1.p1	*Cathepsin K precursor	NP_001017778.1	54	335	68.657	[5]
TRINITY_DN2821_c0_g1_i10.p1	*Cathepsin L	AAI08032.1	54	612	95.425	[5]
TRINITY_DN3405_c0_g1_i1.p2	*25-hydroxycholesterol 7-alpha-hydroxylase	XP_699028.2	69	506	81.423	[9]
TRINITY_DN2424_c0_g1_i2.p1	*Delta-6 desaturase	AZL94116.1	12	444	97.973	[13]
TRINITY_DN3834_c0_g3_i1.p1	*Elongation of very long chain fatty acids protein 6	XP_018957993.1	25	266	99.624	[5]
TRINITY_DN787_c0_g1_i1.p1	*Fatty acid synthase	ARO92273.1	35	2517	94.915	[9]
TRINITY_DN14382_c0_g1_i1.p1	*Farnesyl pyrophosphate synthase isoform X1	XP_005472704.1	1	359	81.058	[9]
TRINITY_DN11024_c0_g1_i2.p1	*Forkhead box protein K1 isoform X1	XP_025764132.1	90	687	64.338	[9]
TRINITY_DN741_c0_g1_i8.p1	*Glucose-6-phosphatase	AVP32214.1	8	355	95.211	[5]
TRINITY_DN13312_c0_g2_i4.p1	*Growth hormone receptor	ADZ13484.1	29	602	95.085	[10]
TRINITY_DN46810_c0_g1_i2.p1	*Glutathione synthetase	XP_018970229.1	1	475	90.105	[9]
TRINITY_DN909_c1_g1_i3.p1	*70-kDa heat shock proteins	AAF70445.1	49	643	85.07	[8]
TRINITY_DN1380_c0_g2_i1.p1	*Insulin-like growth factor-binding protein 1	ACV72066.1	45	262	96.947	[7]
TRINITY_DN958_c0_g1_i1.p1	*Insulin-like growth factor-binding protein 2	ACM47497.1	54	274	97.08	[7]
TRINITY_DN7562_c0_g1_i3.p1	*Insulin-like growth factor-binding protein 3	ACM47527.1	67	293	89.761	[7]
TRINITY_DN4955_c0_g1_i1.p1	*Inositol Monophosphatase 1	XP_018975133.1	5	282	97.872	[9]
TRINITY_DN230_c0_g1_i10.p1	*Insulin-induced gene 1 protein	NP_956163.1	8	251	97.61	[8]
TRINITY_DN5482_c0_g1_i1.p1	*Tyrosine-protein kinase JAK2-like	XP_022620925.1	1540	1117	76.455	[14]
TRINITY_DN844_c0_g1_i1.p1	*Lipopolysaccharide binding protein	NP_001118057.1	70	469	74.2	[8]
TRINITY_DN1369_c0_g1_i2.p1	*Hepatic triacylglycerol lipase	XP_018956861.1	23	498	91.767	[5]
TRINITY_DN2357_c0_g1_i12.p1	*Myocyte Enhancer Factor 2°	BAA33567.1	22	475	92.632	[11]
TRINITY_DN31591_c0_g1_i2.p1	*Myostatin 1	AJF48833.1	44	375	98.133	[11]
TRINITY_DN63384_c0_g1_i1.p1	*Myostatin 2	AJF48834.1	43	366	94.536	[11]
TRINITY_DN610_c0_g1_i3.p1	*Cytochrome P450	AAK37960.1	178	505	68.119	[8]
TRINITY_DN26556_c0_g1_i1.p1	*Paired box protein 7 isoform X1	XP_013988550.1	230	520	93.077	[11]
TRINITY_DN6261_c0_g1_i12.p1	*Cytosolic phospholipase A2 gamma-like	XP_018952444.1	45	580	61.724	[5]
TRINITY_DN98343_c0_g1_i1.p1	*Proopiomelanocortin	AAM93491.2	3	117	40.171	[14]
TRINITY_DN10545_c1_g2_i1.p1	*Peroxisome proliferator-activated receptor alpha	CAJ76702.1	175	462	87.013	[8]
TRINITY_DN6218_c0_g1_i1.p1	*Peroxisome proliferator-activated receptor beta	ACR15760.1	1171	488	74.795	[8]
TRINITY_DN2759_c3_g1_i3.p1	*Prolactin receptor	QIB98245.1	45	611	86.579	[10]
TRINITY_DN371_c1_g1_i2.p1	*Antithrombin-III	XP_018920986.1	83	450	91.778	[5]
TRINITY_DN2228_c1_g2_i1.p1	*SMAD family member 3	ABI94729.1	25	423	99.527	[9]
TRINITY_DN2168_c0_g1_i3.p1	*Secreted Protein Acidic And Cysteine Rich	XP_003447656.1	34	300	83.667	[9]
TRINITY_DN10498_c0_g1_i2.p1	*Squalene monooxygenase	NP_001103509.1	3	557	90.305	[9]
TRINITY_DN26296_c0_g1_i1.p1	*Somatostatin Receptor type 1-like	XP_018943223.1	563	367	98.365	[10]
TRINITY_DN17037_c0_g1_i2.p1	*Somatostatin Receptor type 2-like	XP_018946514.1	404	337	88.427	[10]
TRINITY_DN6684_c0_g1_i4.p1	*Signal transducer and activator of transcription 1b	NP_956385.2	28	722	80.332	[14]
TRINITY_DN150_c0_g1_i2.p1	*Signal transducer and activator of transcription 2	NP_001258730.1	29	852	78.638	[14]
TRINITY_DN677_c0_g1_i4.p1	*Signal transducer and activator of transcription 3	BAH47263.1	28	806	99.007	[14]
TRINITY_DN28414_c0_g1_i5.p1	*Signal transducer and activator of transcription 4	NP_001004510.1	29	679	95.582	[14]
TRINITY_DN12936_c0_g1_i1.p1	*Signal transducer and activator of transcription 5	BAH47264.1	28	787	84.117	[14]
TRINITY_DN8483_c0_g1_i1.p1	*Ubiquitin carboxyl-terminal hydrolase 38	XP_003449754.1	114	1039	70.356	[9]

Table 4.

Immune-related proteins. Proteins marked with an asterisk (*) were selected after the E-value cutoff, while the best parameters were entered for proteins that did not pass the cutoff filter.

Contig ID	Protein	Accession	Blast Hit	Assembled Length	Similarity (%)	Reference
TRINITY_DN399_c1_g1_i6.p1	*C-X-C Motif Chemokine Receptor 4	BAA32797.1	454	338	93.491	[4]
TRINITY_DN3511_c0_g1_i2.p1	*Myeloid differentiation primary response protein MyD88	XP_018923074.1	18	282	91.489	[14]
TRINITY_DN37340_c0_g2_i1.p1	*60S ribosomal protein L8	XP_034741439.1	4	257	98.054	[4]
TRINITY_DN29928_c0_g1_i1.p1	*Toll-like receptor 1	NP_001124065.1	264	798	75.439	[10]
TRINITY_DN32082_c1_g1_i2.p1	*Toll-like receptor 13	NP_001133860.1	941	951	45.216	[14]
-	*Toll-like receptor 14	AXL48518.1	451	-	-	[14]
TRINITY_DN33713_c0_g1_i2.p1	*Toll-like receptor 2	NP_997977.1	279	787	79.288	[14]
TRINITY_DN10726_c0_g1_i3.p1	*Toll-like receptor 21	AVX48323.1	829	960	87.604	[10]
TRINITY_DN32082_c1_g1_i2.p1	*Toll-like receptor 22	NP_001117884.1	1	958	47.182	[10]
TRINITY_DN39710_c0_g1_i2.p1	*Toll-like receptor 3	ABL11473.1	581	904	88.496	[12]
TRINITY_DN32146_c0_g1_i5.p1	*Toll-like receptor 4ba	AHH85806.1	300	755	73.51	[12]
TRINITY_DN32146_c0_g1_i5.p1	*Toll-like receptor 4bb	AHH85807.1	379	817	91.31	[12]
TRINITY_DN12666_c0_g1_i5.p1	*Toll-like receptor 5b precursor	NP_001124067.2	716	686	79.446	[14]
TRINITY_DN50330_c0_g1_i1.p1	*Toll-like receptor 7	AIS23537.1	550	1026	40.448	[14]
TRINITY_DN28916_c0_g1_i1.p1	*Toll-like receptor 9	ADE20130.1	1252	69	33.333	[4]

2. Experimental Design, Materials and Methods

2.1. Sampling and RNA extraction

A euthanized juvenile fish fry was provided by a local fish breeder. The whole specimen was homogenized in Wizol reagent (WizBio), a Trizol-like reagent. Total RNA extraction was subsequently performed as per the manufacturer's instructions.

2.2. Library construction and sequencing

Approximately 1 µg of total RNA was used as the input for mRNA enrichment using the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB). The enriched mRNA was subsequently processed using the NEBNext Ultra II non-directional RNA library preparation kit (NEB). The RNA library was sequenced on an Illumina NovaSeq6000 with a run configuration of 2 × 150 bp.

2.3. Sequence data processing and assembly

Raw reads were processed with fastp v0.22.0 at its default settings to trim poly-G tails at the 3' end and Illumina adapters and to remove low-quality reads [15]. The trimmed paired-end reads were assembled de novo using Trinity v2.8.5 with default settings [16]. Transcriptome completeness was assessed using BUSCO v5 [17] based on the single-copy orthologs represented in the actinopterygii_odb10 database.
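
The three steps above could be driven from a small Python script along the following lines. This is a sketch only: the file names, output directory names, thread count and memory limit are hypothetical placeholders, and only the basic input and output options of each tool are passed so that the tools run at their defaults as described. The script builds and prints the command lines rather than executing them.

```python
import shlex


def build_commands(raw_r1, raw_r2, threads=8, max_memory="50G"):
    """Build fastp, Trinity and BUSCO command lines for paired-end reads.

    threads and max_memory are assumed values, not taken from the article.
    """
    trimmed_r1 = "trimmed_R1.fastq.gz"  # hypothetical output names
    trimmed_r2 = "trimmed_R2.fastq.gz"

    # fastp at default settings; only input and output files are given.
    fastp = ["fastp", "-i", raw_r1, "-I", raw_r2,
             "-o", trimmed_r1, "-O", trimmed_r2]

    # Trinity requires the read type and a memory limit to be stated.
    trinity = ["Trinity", "--seqType", "fq",
               "--left", trimmed_r1, "--right", trimmed_r2,
               "--CPU", str(threads), "--max_memory", max_memory,
               "--output", "trinity_out_dir"]

    # BUSCO in transcriptome mode against the lineage named in the text.
    busco = ["busco", "-i", "trinity_out_dir/Trinity.fasta",
             "-l", "actinopterygii_odb10", "-m", "transcriptome",
             "-o", "busco_tor_tambra", "-c", str(threads)]

    return [fastp, trinity, busco]


commands = build_commands("raw_R1.fastq.gz", "raw_R2.fastq.gz")
for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` once the tools are installed; keeping the commands as lists avoids shell quoting problems.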

2.4. Mitogenome reconstruction and phylogeny

Trimmed paired-end reads were aligned to the reference mitochondrial genome of the Javan mahseer (GenBank accession: NC_036511.1) using bowtie2 [18]. The SAM alignment was normalized to reduce high coverage, particularly in the rRNA gene region, and a consensus sequence was then generated using samtools mpileup and bcftools [19]. The draft mitogenome assembly was annotated and used for phylogenetic analysis as previously described [1].

2.5. Annotation of unigenes

The protein-coding sequences were extracted using TransDecoder v5.5.0 and then clustered at 98% protein similarity using CD-HIT v4.7 (-g 1 -c 98). The non-redundant predicted protein dataset was annotated using eggNOG-mapper (eggNOG: evolutionary genealogy of genes: Non-supervised Orthologous Groups) with a minimum E-value of 0.001. Functional annotation of unigenes was carried out by mapping against three databases: GO (Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes) and COG (Clusters of Orthologous Groups).
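
The coding-sequence extraction and clustering steps might be scripted as in the Python sketch below. The input path and output file name are hypothetical. CD-HIT takes its identity threshold as a fraction, so the 98% similarity is written here as 0.98, on the assumption that this is what the `-c 98` in the text denotes. The eggNOG-mapper step is left out because its options differ between versions and the version used is not stated.

```python
import os


def build_annotation_commands(transcripts, identity=0.98):
    """Build TransDecoder and CD-HIT command lines for one transcript FASTA."""
    # TransDecoder runs in two stages: ORF extraction, then prediction.
    long_orfs = ["TransDecoder.LongOrfs", "-t", transcripts]
    predict = ["TransDecoder.Predict", "-t", transcripts]

    # TransDecoder.Predict names its peptide output after the input file.
    peptides = os.path.basename(transcripts) + ".transdecoder.pep"

    # -g 1 assigns each sequence to its most similar cluster.
    cd_hit = ["cd-hit", "-i", peptides, "-o", "nonredundant.pep",
              "-c", str(identity), "-g", "1"]

    return [long_orfs, predict, cd_hit]


annotation_commands = build_annotation_commands("trinity_out_dir/Trinity.fasta")
for command in annotation_commands:
    print(" ".join(command))
```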

Ethics Statement

All experiments comply with the ARRIVE guidelines and were carried out in accordance with the U.K. Animals (Scientific Procedures) Act, 1986 and associated guidelines, EU Directive 2010/63/EU for animal experiments, or the National Institutes of Health guide for the care and use of Laboratory animals (NIH Publications No. 8023, revised 1978).

CRediT authorship contribution statement

Melinda Mei Lin Lau: Writing – original draft, Data curation, Conceptualization. Leonard Whye Kit Lim: Data curation, Writing – original draft, Conceptualization. Hung Hui Chung: Conceptualization, Funding acquisition, Writing – review & editing. Han Ming Gan: Methodology, Conceptualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Acknowledgments

The work was funded by Sarawak Research and Development Council through the Research Initiation Grant Scheme with grant number RDCRG/RIF/2019/13 awarded to H. H. Chung.

References

1. Lim L.W.K., Chung H.H., Lau M.M.L., Aziz F., Gan H.M. Improving the phylogenetic resolution of Malaysian and Javan mahseer (Cyprinidae), Tor tambroides and Tor tambra: whole mitogenomes sequencing, phylogeny and potential mitogenome markers. Gene. 2021;791. doi: 10.1016/j.gene.2021.145708.
2. Haas B.J., Papanicolaou A., Yassour M., Grabherr M., Blood P.D., Bowden J., Regev A. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013;8(8):1494–1512. doi: 10.1038/nprot.2013.084.
3. Huerta-Cepas J., Forslund K., Coelho L.P., Szklarczyk D., Jensen L.J., Von Mering C., Bork P. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 2017;34(8):2115–2122. doi: 10.1093/molbev/msx148.
4. Chandhini S., Rejish Kumar V.J. Transcriptomics in aquaculture: current status and applications. Rev. Aquac. 2019;11(4):1379–1397.
5. Dam C.T.M., Ventura T., Booth M., Pirozzi I., Salini M., Smullen R., Elizur A. Intestinal transcriptome analysis highlights key differentially expressed genes involved in nutrient metabolism and digestion in yellowtail kingfish (Seriola lalandi) fed terrestrial animal and plant proteins. Genes. 2020;11(6):621. doi: 10.3390/genes11060621.
6. Danzmann R.G., Kocmarek A.L., Norman J.D., Rexroad C.E., Palti Y. Transcriptome profiling in fast versus slow-growing rainbow trout across seasonal gradients. BMC Genom. 2016;17(1):1–18. doi: 10.1186/s12864-016-2363-5.
7. Guan W.Z., Qiu G.F. Transcriptome analysis of the growth performance of hybrid mandarin fish after food conversion. PLoS One. 2020;15(10). doi: 10.1371/journal.pone.0240308.
8. Hu G., Gu W., Sun P., Bai Q., Wang B. Transcriptome analyses reveal lipid metabolic process in liver related to the difference of carcass fat content in rainbow trout (Oncorhynchus mykiss). Int. J. Genom. 2016;2016. doi: 10.1155/2016/7281585.
9. Lin G., Thevasagayam N.M., Wan Z.Y., Ye B.Q., Yue G.H. Transcriptome analysis identified genes for growth and omega-3/-6 ratio in saline tilapia. Front. Genet. 2019;10:244. doi: 10.3389/fgene.2019.00244.
10. Ma D., Ma A., Huang Z., Wang G., Wang T., Xia D., Ma B. Transcriptome analysis for identification of genes related to gonad differentiation, growth, immune response and marker discovery in the turbot (Scophthalmus maximus). PLoS One. 2016;11(2). doi: 10.1371/journal.pone.0149414.
11. Overturf K., Sakhrani D., Devlin R.H. Expression profile for metabolic and growth-related genes in domesticated and transgenic coho salmon (Oncorhynchus kisutch) modified for increased growth hormone production. Aquaculture. 2010;307(1-2):111–122.
12. Palti Y. Toll-like receptors in bony fish: from genomics to function. Dev. Comp. Immunol. 2011;35(12):1263–1272. doi: 10.1016/j.dci.2011.03.006.
13. Vagner M., Santigosa E. Characterization and modulation of gene expression and enzymatic activity of delta-6 desaturase in teleosts: a review. Aquaculture. 2011;315(1-2):131–143.
14. Xie Z.Z., Ling X., Dengdong W., Chao F., Qiongyu L., Zihao L., Haoran L. Transcriptome analysis of the Trachinotus ovatus: identification of reproduction, growth and immune-related genes and microsatellite markers. PLoS One. 2014;9(10). doi: 10.1371/journal.pone.0109419.
15. Chen S., Zhou Y., Chen Y., Gu J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. doi: 10.1093/bioinformatics/bty560.
16. Grabherr M.G., Haas B.J., Yassour M., Levin J.Z., Thompson D.A., Amit I., Regev A. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011;29(7):644. doi: 10.1038/nbt.1883.
17. Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. doi: 10.1093/bioinformatics/btv351.
18. Langmead B., Trapnell C., Pop M., Salzberg S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):1–10. doi: 10.1186/gb-2009-10-3-r25.
19. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.
