Data in Brief. 2021 Oct 14;39:107481. doi: 10.1016/j.dib.2021.107481

The first transcriptome sequencing and data analysis of the Javan mahseer (Tor tambra)

Melinda Mei Lin Lau a, Leonard Whye Kit Lim a, Hung Hui Chung a, Han Ming Gan b,c
PMCID: PMC8529094  PMID: 34712757

Abstract

The Javan mahseer (Tor tambra) is one of the most valuable freshwater fish in the genus Tor. To date, other than mitogenomic data (BioProject: PRJNA422829), genomic and transcriptomic resources for this species are still lacking, even though such resources are crucial for understanding the molecular mechanisms associated with important traits such as growth, immune response, reproduction and sex determination. For the first time, we sequenced the transcriptome of a whole juvenile fish using an Illumina NovaSeq6000, generating raw paired-end reads. De novo transcriptome assembly generated a draft transcriptome (BUSCO5 completeness of 91.2% [Actinopterygii_odb10 database]) consisting of 259,403 putative transcripts with a total length of 333,881,215 bp and an N50 length of 2283 bp. A total of 77,503 non-redundant protein-coding sequences were predicted from the transcripts and used for functional annotation. We mapped the predicted proteins to 304 known KEGG pathways, with the signal transduction cluster having the highest representation, followed by the immune system and the endocrine system. In addition, transcripts exhibiting significant similarity to previously published growth- and immune-related genes were identified, which will facilitate future molecular breeding of Tor tambra.

Keywords: Transcriptome, Unigenes, Gene annotation, Tor tambra


Specifications Table

Subject Biological Sciences
Specific subject area Omics: Transcriptomics
Type of data Sequencing raw reads, assembly, Table, Figure, Graph
How data were acquired Sequencing
Data format Raw Reads (fastq), Assembly (fasta)
Parameters for data collection Total RNA extracted from a whole specimen of fish fry was used for library preparation and sequencing.
Description of data collection Total RNA extraction was performed using Wizol, a TRIzol-like reagent (WizBio). The purified total RNA was subjected to mRNA enrichment using poly-T magnetic beads (NEB). The enriched mRNA was subsequently processed using the NEB Ultra II RNA library preparation kit and sequenced on an Illumina NovaSeq6000 (2 × 150 bp).
Data source location The sample fish fry in this study was provided by a fish breeder who stated that it originated from Pahang, Malaysia. We subsequently extracted the mitochondrial genes from the transcriptome and showed that this specimen indeed formed a monophyletic cluster with Tor spp. described from Pahang, Malaysia (Fig. 1) [1].
Data accessibility Raw data and final assembled contigs were deposited in the NCBI database under BioProject PRJNA727425 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA727425). Additional files such as the BUSCO analysis output, GO annotation, KEGG annotation and COG annotation are available in the Zenodo database (https://doi.org/10.5281/zenodo.4766490).

Value of the Data

  • The transcriptome dataset from the Javan mahseer is useful for gaining insight into transcriptional regulation and for biomarker discovery, supporting the subsequent improvement of this species for aquaculture purposes.

  • The high completeness of the transcriptome dataset will aid future phylotranscriptomic studies, especially for fish taxonomists.

  • The dataset is useful in facilitating genetic management for the conservation of the remaining populations of mahseer in Malaysian rivers.

1. Data Description

Standard RNA sequencing was performed to generate a transcriptome assembly for the Javan mahseer (Tor tambra). Sequencing and assembly results are summarized in Table 1. Coding regions were extracted using TransDecoder, yielding 77,503 predicted non-redundant proteins [2]. The proteins were annotated using eggNOG-mapper [3], which maps them to the KEGG, GO and COG databases. Unigene lengths ranged from < 300 bp to > 5000 bp (Fig. 2), and the number of unigenes decreased as length increased. Of the 63,191 unigenes in total, COG had the highest number of matches (61,616 unigenes), followed by KEGG (42,644) and GO (40,150) (Table 2). A Venn diagram illustrates the annotations shared among and unique to the three databases (Fig. 3). Overall, 32,317 unigenes (51.14%) showed a significant match in all three databases, and 50,405 unigenes (79.77%) showed a significant match in at least one of them (Table 2).

Fig. 1.

Maximum-likelihood phylogenetic tree constructed from the standard cytochrome oxidase I gene fragment with 1000 bootstrap replications. The black bracket highlights the sample fish fry used in this study [1].

Table 1.

Transcriptome sequencing and assembly statistics.

Raw sequence reads 108,657,770 (16.29 Gb)
Number of contigs 278,297
Total assembled contig length 276,327,107 bp
Contig N50 length 1,922 bp
Number of predicted proteins 77,503
Total predicted protein length 24,833,897 aa
BUSCO completeness (Actinopterygii odb10)
Complete BUSCOs 84% (3055)
Complete and single-copy BUSCOs 18.7% (679)
Complete and duplicated BUSCOs 6.9% (250)
Missing BUSCOs 9.1% (335)
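
For readers unfamiliar with the contig N50 statistic reported in Table 1, the following minimal Python sketch shows how it is commonly defined. The function name `n50` and the toy lengths are illustrative assumptions and are not data or code from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example with made-up contig lengths (not data from this study).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```

In practice the lengths would be read from the assembled FASTA file; the sorting step is what makes N50 less sensitive to the many short contigs than a simple mean length.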

Fig. 2.

Length distribution of Tor tambra unigenes.

Fig. 3.

Venn diagram showing differences and commonality of annotation based on GO, KEGG and COG.
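
The overlap counts behind a three-way Venn diagram such as Fig. 3 can be derived with simple set operations. The sketch below is illustrative only: the function name `annotation_overlap` and the toy unigene identifiers are hypothetical, and the real inputs would be the sets of unigene IDs annotated by each database.

```python
def annotation_overlap(go_ids, kegg_ids, cog_ids):
    """Summarise how three sets of annotated unigene IDs overlap."""
    go, kegg, cog = set(go_ids), set(kegg_ids), set(cog_ids)
    return {
        "at_least_one": len(go | kegg | cog),   # union of all three
        "all_three": len(go & kegg & cog),      # common to all three
        "go_only": len(go - kegg - cog),
        "kegg_only": len(kegg - go - cog),
        "cog_only": len(cog - go - kegg),
    }


# Toy identifiers (hypothetical, not taken from the dataset).
toy_summary = annotation_overlap(
    go_ids={"u1", "u2", "u3"},
    kegg_ids={"u2", "u3", "u4"},
    cog_ids={"u3", "u4", "u5", "u6"},
)
```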

Table 2.

Unigenes functional annotation by various databases.

Database Number of Unigenes Percentage (%)
GO 40,150 63.54
KEGG 42,644 67.48
COG 61,616 97.51
Annotated in at least one database 50,405 79.77
Annotated in all database 32,317 51.14
All unigenes 63,191 100.00
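
The percentages in Table 2 are each count divided by the 63,191 total unigenes. As a quick check, the short Python sketch below recomputes them from the counts given in the table; the variable and function names are illustrative.

```python
TOTAL_UNIGENES = 63_191

# Counts copied from Table 2.
annotated_counts = {
    "GO": 40_150,
    "KEGG": 42_644,
    "COG": 61_616,
    "At least one database": 50_405,
    "All databases": 32_317,
}


def percentage(count, total=TOTAL_UNIGENES):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


table2_percentages = {
    name: percentage(count) for name, count in annotated_counts.items()
}
```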

Fig. 4 shows the top ten subcategories for each main GO ontology. For biological process, 4404 (9.87%) counts were in metabolism, 2125 (4.76%) in cell organization and biogenesis, and 1773 (3.97%) in transport. For molecular function, 3297 (7.39%) counts were assigned to development, while 2121 (4.75%) and 1222 (2.74%) were assigned to catalytic activity and binding, respectively. For cellular component, 1643 (3.68%) counts were assigned to cell, 1256 (2.81%) to intracellular and 608 (1.36%) to cytoplasm. Only a small number of counts were assigned to extracellular region (0.22%), nucleoplasm (0.17%) and mitochondrion (0.17%).

Fig. 4.

GO functional annotations.

KEGG is another widely used reference database of pathway networks for integrating and interpreting large-scale datasets generated by RNA sequencing. The predicted proteins were mapped to 304 known KEGG pathways in 34 categories under five main groups (Cellular Processes, Environmental Information Processing, Genetic Information Processing, Metabolism and Organismal Systems) (Fig. 5). Among the five main groups, the largest was Organismal Systems (36,792, 38.79%), whilst Genetic Information Processing had the lowest count (4640, 4.89%). The categories with the most counts were signal transduction (17527, 18.48%), immune system (10897, 11.49%) and endocrine system (9059, 9.55%). Within signal transduction, pathways including two-component system, MAPK, ErbB, Ras, Rap1, Wnt, Notch, Hedgehog, TGF-beta, Hippo, VEGF, Apelin, JAK-STAT, NF-kappa B, TNF, HIF-1, FoxO, calcium, phosphatidylinositol, phospholipase D, sphingolipid, cAMP, cGMP-PKG, PI3K-Akt, AMPK and mTOR were found in Tor tambra, suggesting extensive signaling activity during this developmental stage. Fig. 6 shows the top 10 KEGG pathways with the most counts across the five main KEGG groups. The largest count was for metabolic pathways in the Metabolism group (4386, 4.62%), followed by the NOD-like receptor signaling pathway (2247, 2.37%) and necroptosis (1940, 2.05%). Necroptosis belongs to Cellular Processes, while the NOD-like receptor signaling pathway belongs to Organismal Systems.

Fig. 5.

KEGG annotation.

Fig. 6.

Top 10 KEGG annotations.

The COG database consists of clusters of orthologous groups and is divided into 25 functional classifications (Fig. 7). Altogether, 63,191 unigenes were mapped to the COG database and grouped into four main categories: information storage and processing (15.59%), cellular processes and signaling (40.63%), metabolism (12.62%) and poorly characterised (31.17%). Among the 25 classifications, the largest clusters were function unknown (20560, 31.17%) and signal transduction mechanisms (13521, 20.50%), followed by posttranslational modification, protein turnover, chaperones (5138, 7.79%), transcription (4529, 6.87%) and cytoskeleton (2364, 3.58%).

Fig. 7.

COG annotation.

A total of 73 growth-related genes and 30 immune-related genes were selected based on a literature review [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. For each gene, the accession number of the corresponding protein sequence was retrieved from NCBI (https://www.ncbi.nlm.nih.gov/). Of these 103 genes, 51 growth-related and 13 immune-related genes were retained after applying a stringent E-value cutoff of 10⁻¹⁰. Table 3 lists the growth-related proteins and Table 4 lists the immune-related proteins.
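
The E-value filtering step described above can be sketched in Python as follows. This is a hedged illustration, not the authors' script: it assumes the similarity search results are available in the standard 12-column BLAST tabular layout (query, subject, percent identity, ..., E-value, bit score), and the query and contig identifiers in the example are hypothetical.

```python
E_VALUE_CUTOFF = 1e-10  # cutoff stated in the text


def best_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Keep, per query, the lowest-E-value hit that passes the cutoff.

    Assumes tab-separated BLAST tabular output in which column 1 is the
    query, column 2 the subject, column 3 the percent identity and
    column 11 the E-value.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, evalue = float(fields[2]), float(fields[10])
        if evalue > cutoff:
            continue
        if query not in best or evalue < best[query]["evalue"]:
            best[query] = {
                "subject": subject,
                "identity": identity,
                "evalue": evalue,
            }
    return best


# Hypothetical hits: one strong, one weaker for the same query, one failing.
example_lines = [
    "ref_A\tcontig_1.p1\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-150\t520",
    "ref_A\tcontig_2.p1\t80.0\t290\t58\t0\t1\t290\t1\t290\t1e-40\t160",
    "ref_B\tcontig_3.p1\t35.0\t60\t39\t0\t1\t60\t1\t60\t0.002\t30",
]
example_best = best_hits(example_lines)
```

Here `ref_B` is dropped because its only hit has an E-value above the cutoff, which mirrors how genes without a sufficiently strong match would be excluded.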

Table 3.

Growth-related proteins. Proteins marked with an asterisk (*) passed the E-value cutoff; for proteins that did not pass the cutoff, the best available hit is reported.

Contig ID Protein Accession Blast Hit Aligned Length Similarity (%) Reference
TRINITY_DN2932_c0_g1_i4.p1 *Acetyl-CoA carboxylase alpha (Haplotype 1) ADT82650.1 15 2342 74.552 [7]
TRINITY_DN2318_c0_g1_i1.p1 *Acetyl-CoA carboxylase alpha (Haplotype 2) ADX43925.1 15 2390 96.987 [7]
TRINITY_DN2932_c0_g1_i4.p1 *Acetyl-CoA carboxylase 2 XP_018962132.1 20 1837 96.788 [7]
TRINITY_DN1703_c0_g1_i1.p1 *Acetyl-CoA Acetyltransferase 2 AAD34966.1 13 395 91.899 [8]
TRINITY_DN1946_c0_g2_i1.p1 *Alpha actinin 3 XP_031588180.1 189 896 93.973 [6]
TRINITY_DN2816_c0_g1_i1.p1 *ATP synthase BAA82837.1 10 518 99.421 [5]
TRINITY_DN7716_c0_g1_i1.p1 *G2/mitotic-specific cyclin-B1 NP_571588.1 27 397 94.71 [7]
TRINITY_DN7305_c0_g1_i14.p1 *Cell division cycle protein 20 NP_998245.2 460 496 90.524 [7]
TRINITY_DN9513_c0_g1_i1.p1 *Conserved Edge Expressed ABK35126.2 2 322 91.304 [11]
TRINITY_DN148_c0_g1_i1.p1 *Creatine kinase M-type XP_028660838.1 18 381 81.102 [6]
TRINITY_DN4485_c0_g1_i10.p1 *Peroxisomal carnitine O-octanoyltransferase XP_018953213.1 35 333 89.189 [9]
TRINITY_DN7185_c0_g1_i1.p1 *Cathepsin K precursor NP_001017778.1 54 335 68.657 [5]
TRINITY_DN2821_c0_g1_i10.p1 *Cathepsin L AAI08032.1 54 612 95.425 [5]
TRINITY_DN3405_c0_g1_i1.p2 *25-hydroxycholesterol_7-alpha-hydroxylase XP_699028.2 69 506 81.423 [9]
TRINITY_DN2424_c0_g1_i2.p1 *Delta-6_desaturase AZL94116.1 12 444 97.973 [13]
TRINITY_DN3834_c0_g3_i1.p1 *Elongation of very long chain fatty acids protein 6 XP_018957993.1 25 266 99.624 [5]
TRINITY_DN787_c0_g1_i1.p1 *Fatty acid synthase ARO92273.1 35 2517 94.915 [9]
TRINITY_DN14382_c0_g1_i1.p1 *Farnesyl pyrophosphate synthase isoform X1 XP_005472704.1 1 359 81.058 [9]
TRINITY_DN11024_c0_g1_i2.p1 *Forkhead box protein K1 isoform X1 XP_025764132.1 90 687 64.338 [9]
TRINITY_DN741_c0_g1_i8.p1 *Glucose-6-phosphatase AVP32214.1 8 355 95.211 [5]
TRINITY_DN13312_c0_g2_i4.p1 *Growth hormone receptor ADZ13484.1 29 602 95.085 [10]
TRINITY_DN46810_c0_g1_i2.p1 *Glutathione synthetase XP_018970229.1 1 475 90.105 [9]
TRINITY_DN909_c1_g1_i3.p1 *70-kDa heat shock proteins AAF70445.1 49 643 85.07 [8]
TRINITY_DN1380_c0_g2_i1.p1 *Insulin-like growth factor-binding protein 1 ACV72066.1 45 262 96.947 [7]
TRINITY_DN958_c0_g1_i1.p1 *Insulin-like growth factor-binding protein 2 ACM47497.1 54 274 97.08 [7]
TRINITY_DN7562_c0_g1_i3.p1 *Insulin-like growth factor-binding protein 3 ACM47527.1 67 293 89.761 [7]
TRINITY_DN4955_c0_g1_i1.p1 *Inositol Monophosphatase 1 XP_018975133.1 5 282 97.872 [9]
TRINITY_DN230_c0_g1_i10.p1 *Insulin-induced_gene_1_protein NP_956163.1 8 251 97.61 [8]
TRINITY_DN5482_c0_g1_i1.p1 *Tyrosine-protein kinase JAK2-like XP_022620925.1 1540 1117 76.455 [14]
TRINITY_DN844_c0_g1_i1.p1 *Lipopolysaccharide binding protein NP_001118057.1 70 469 74.2 [8]
TRINITY_DN1369_c0_g1_i2.p1 *Hepatic triacylglycerol lipase XP_018956861.1 23 498 91.767 [5]
TRINITY_DN2357_c0_g1_i12.p1 *Myocyte Enhancer Factor 2° BAA33567.1 22 475 92.632 [11]
TRINITY_DN31591_c0_g1_i2.p1 *Myostatin 1 AJF48833.1 44 375 98.133 [11]
TRINITY_DN63384_c0_g1_i1.p1 *Myostatin 2 AJF48834.1 43 366 94.536 [11]
TRINITY_DN610_c0_g1_i3.p1 *Cytochrome P450 AAK37960.1 178 505 68.119 [8]
TRINITY_DN26556_c0_g1_i1.p1 *Paired box protein 7 isoform X1 XP_013988550.1 230 520 93.077 [11]
TRINITY_DN6261_c0_g1_i12.p1 *Cytosolic phospholipase A2 gamma-like XP_018952444.1 45 580 61.724 [5]
TRINITY_DN98343_c0_g1_i1.p1 *Proopiomelanocortin AAM93491.2 3 117 40.171 [14]
TRINITY_DN10545_c1_g2_i1.p1 *Peroxisome proliferator-activated receptor alpha CAJ76702.1 175 462 87.013 [8]
TRINITY_DN6218_c0_g1_i1.p1 *Peroxisome proliferator-activated receptor beta ACR15760.1 1171 488 74.795 [8]
TRINITY_DN2759_c3_g1_i3.p1 *Prolactin receptor QIB98245.1 45 611 86.579 [10]
TRINITY_DN371_c1_g1_i2.p1 *Antithrombin-III XP_018920986.1 83 450 91.778 [5]
TRINITY_DN2228_c1_g2_i1.p1 *SMAD family member 3 ABI94729.1 25 423 99.527 [9]
TRINITY_DN2168_c0_g1_i3.p1 *Secreted Protein Acidic And Cysteine Rich XP_003447656.1 34 300 83.667 [9]
TRINITY_DN10498_c0_g1_i2.p1 *Squalene monooxygenase NP_001103509.1 3 557 90.305 [9]
TRINITY_DN26296_c0_g1_i1.p1 *Somatostatin Receptor type 1-like XP_018943223.1 563 367 98.365 [10]
TRINITY_DN17037_c0_g1_i2.p1 *Somatostatin Receptor type 2-like XP_018946514.1 404 337 88.427 [10]
TRINITY_DN6684_c0_g1_i4.p1 *Signal transducer and activator of transcription 1b NP_956385.2 28 722 80.332 [14]
TRINITY_DN150_c0_g1_i2.p1 *Signal transducer and activator of transcription 2 NP_001258730.1 29 852 78.638 [14]
TRINITY_DN677_c0_g1_i4.p1 *Signal transducer and activator of transcription 3 BAH47263.1 28 806 99.007 [14]
TRINITY_DN28414_c0_g1_i5.p1 *Signal transducer and activator of transcription 4 NP_001004510.1 29 679 95.582 [14]
TRINITY_DN12936_c0_g1_i1.p1 *Signal transducer and activator of transcription 5 BAH47264.1 28 787 84.117 [14]
TRINITY_DN8483_c0_g1_i1.p1 *Ubiquitin carboxyl-terminal hydrolase 38 XP_003449754.1 114 1039 70.356 [9]

Table 4.

Immune-related proteins. Proteins marked with an asterisk (*) passed the E-value cutoff; for proteins that did not pass the cutoff, the best available hit is reported.

Contig ID Protein Accession Blast Hit Assembled Length Similarity (%) Reference
TRINITY_DN399_c1_g1_i6.p1 *C-X-C Motif Chemokine Receptor 4 BAA32797.1 454 338 93.491 [4]
TRINITY_DN3511_c0_g1_i2.p1 *Myeloid differentiation primary response protein MyD88 XP_018923074.1 18 282 91.489 [14]
TRINITY_DN37340_c0_g2_i1.p1 *60S ribosomal protein L8 XP_034741439.1 4 257 98.054 [4]
TRINITY_DN29928_c0_g1_i1.p1 *Toll-like receptor 1 NP_001124065.1 264 798 75.439 [10]
TRINITY_DN32082_c1_g1_i2.p1 *Toll-like receptor 13 NP_001133860.1 941 951 45.216 [14]
- *Toll-like receptor 14 AXL48518.1 451 - - [14]
TRINITY_DN33713_c0_g1_i2.p1 *Toll-like receptor 2 NP_997977.1 279 787 79.288 [14]
TRINITY_DN10726_c0_g1_i3.p1 *Toll-like receptor 21 AVX48323.1 829 960 87.604 [10]
TRINITY_DN32082_c1_g1_i2.p1 *Toll-like receptor 22 NP_001117884.1 1 958 47.182 [10]
TRINITY_DN39710_c0_g1_i2.p1 *Toll-like receptor 3 ABL11473.1 581 904 88.496 [12]
TRINITY_DN32146_c0_g1_i5.p1 *Toll-like receptor 4ba AHH85806.1 300 755 73.51 [12]
TRINITY_DN32146_c0_g1_i5.p1 *Toll-like receptor 4bb AHH85807.1 379 817 91.31 [12]
TRINITY_DN12666_c0_g1_i5.p1 *Toll-like receptor 5b precursor NP_001124067.2 716 686 79.446 [14]
TRINITY_DN50330_c0_g1_i1.p1 *Toll-like receptor 7 AIS23537.1 550 1026 40.448 [14]
TRINITY_DN28916_c0_g1_i1.p1 *Toll-like receptor 9 ADE20130.1 1252 69 33.333 [4]

2. Experimental Design, Materials and Methods

2.1. Sampling and RNA extraction

A euthanized juvenile fish fry was provided by a local fish breeder. The whole specimen was homogenized in Wizol reagent (WizBio), a Trizol-like reagent. Total RNA extraction was subsequently performed as per the manufacturer's instructions.

2.2. Library construction and sequencing

Approximately 1 µg of total RNA was used as the input for mRNA enrichment using the NEBNext Poly(A) mRNA magnetic isolation module (NEB). The enriched mRNA was subsequently processed using the NEBNext Ultra II non-directional RNA library preparation kit (NEB). Sequencing of the RNA library was performed on an Illumina NovaSeq6000 using a run configuration of 2 × 150 bp.

2.3. Sequence data processing and assembly

Raw reads were trimmed of 3′ poly-G tails and Illumina adapters, and low-quality reads were removed, using the default settings of fastp v0.22.0 [15]. The trimmed paired-end reads were assembled de novo using Trinity v2.8.5 with default settings [16]. Transcriptome completeness was assessed using BUSCO v5 [17] based on the single-copy orthologs represented in the actinopterygii_odb10 database.
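
The processing steps above can be sketched as a sequence of command lines. The Python snippet below only builds and prints the commands; it does not run them. File names, thread count and the `--max_memory` value are assumptions (Trinity requires a memory limit to be given), and the exact invocations used by the authors are not stated in the article, so the flags should be checked against the installed tool versions.

```python
import shlex


def build_assembly_commands(r1, r2, threads=8, max_memory="50G"):
    """Return argument lists for trimming, assembly and BUSCO assessment.

    All output names are hypothetical placeholders.
    """
    trimmed_r1, trimmed_r2 = "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz"
    # fastp with default settings on paired-end input.
    fastp = ["fastp", "-i", r1, "-I", r2, "-o", trimmed_r1, "-O", trimmed_r2]
    # Trinity de novo assembly of the trimmed reads.
    trinity = [
        "Trinity", "--seqType", "fq",
        "--left", trimmed_r1, "--right", trimmed_r2,
        "--CPU", str(threads), "--max_memory", max_memory,
        "--output", "trinity_out",
    ]
    # BUSCO in transcriptome mode against the lineage named in the text.
    # The assembly path assumes Trinity wrote Trinity.fasta inside its
    # output directory; newer Trinity releases may name it differently.
    busco = [
        "busco", "-i", "trinity_out/Trinity.fasta",
        "-l", "actinopterygii_odb10", "-m", "transcriptome",
        "-o", "busco_out", "-c", str(threads),
    ]
    return [fastp, trinity, busco]


for command in build_assembly_commands("raw_R1.fastq.gz", "raw_R2.fastq.gz"):
    print(shlex.join(command))
```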

2.4. Mitogenome reconstruction and phylogeny

Trimmed paired-end reads were aligned to the reference mitochondrial genome of the Javan mahseer (GenBank accession: NC_036511.1) using bowtie2 [18]. The SAM alignment was normalized to reduce excessive coverage, particularly in the rRNA gene region, followed by consensus generation using samtools mpileup and bcftools [19]. The draft mitogenome assembly was annotated and used for phylogenetic analysis as previously described [1].

2.5. Annotation of unigenes

The protein-coding sequences were extracted using TransDecoder v5.5.0, followed by clustering at 98% protein similarity using CD-HIT v4.7 (-g 1 -c 98). The non-redundant predicted protein dataset was annotated using eggNOG-mapper (evolutionary genealogy of genes: Non-supervised Orthologous Groups) with an E-value threshold of 0.001. Functional annotation of unigenes was performed by mapping against three databases: GO (Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes) and COG (Clusters of Orthologous Groups).
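
A possible command sequence for the annotation steps is sketched below in Python; again, the commands are only built and printed, not executed. File names and the thread count are hypothetical. The sketch passes the CD-HIT identity threshold as the fraction 0.98, on the assumption that the "98" quoted above denotes 98% identity (CD-HIT's `-c` option takes a fraction), and the eggNOG-mapper `--evalue` and `--cpu` options reflect version 2 of that tool and should be verified for other versions.

```python
import os
import shlex


def build_annotation_commands(transcripts="Trinity.fasta", identity=0.98,
                              evalue=0.001, threads=8):
    """Return argument lists for ORF prediction, clustering and annotation."""
    # TransDecoder writes its peptide predictions next to the working
    # directory, named after the input file.
    peptides = os.path.basename(transcripts) + ".transdecoder.pep"
    non_redundant = "proteins_nr.faa"  # hypothetical output name
    return [
        ["TransDecoder.LongOrfs", "-t", transcripts],
        ["TransDecoder.Predict", "-t", transcripts],
        # Cluster predicted proteins; -g 1 selects the most similar cluster.
        ["cd-hit", "-i", peptides, "-o", non_redundant,
         "-c", str(identity), "-g", "1", "-T", str(threads)],
        # Functional annotation of the non-redundant proteins.
        ["emapper.py", "-i", non_redundant, "-o", "tor_tambra",
         "--evalue", str(evalue), "--cpu", str(threads)],
    ]


for command in build_annotation_commands():
    print(shlex.join(command))
```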

Ethics Statement

All experiments comply with the ARRIVE guidelines and were carried out in accordance with the U.K. Animals (Scientific Procedures) Act, 1986 and associated guidelines, EU Directive 2010/63/EU for animal experiments, or the National Institutes of Health guide for the care and use of Laboratory animals (NIH Publications No. 8023, revised 1978).

CRediT authorship contribution statement

Melinda Mei Lin Lau: Writing – original draft, Data curation, Conceptualization. Leonard Whye Kit Lim: Data curation, Writing – original draft, Conceptualization. Hung Hui Chung: Conceptualization, Funding acquisition, Writing – review & editing. Han Ming Gan: Methodology, Conceptualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Acknowledgments

The work was funded by Sarawak Research and Development Council through the Research Initiation Grant Scheme with grant number RDCRG/RIF/2019/13 awarded to H. H. Chung.

References

  • 1. Lim L.W.K., Chung H.H., Lau M.M.L., Aziz F., Gan H.M. Improving the phylogenetic resolution of Malaysian and Javan mahseer (Cyprinidae), Tor tambroides and Tor tambra: whole mitogenomes sequencing, phylogeny and potential mitogenome markers. Gene. 2021;791. doi: 10.1016/j.gene.2021.145708.
  • 2. Haas B.J., Papanicolaou A., Yassour M., Grabherr M., Blood P.D., Bowden J., Regev A. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013;8(8):1494–1512. doi: 10.1038/nprot.2013.084.
  • 3. Huerta-Cepas J., Forslund K., Coelho L.P., Szklarczyk D., Jensen L.J., Von Mering C., Bork P. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 2017;34(8):2115–2122. doi: 10.1093/molbev/msx148.
  • 4. Chandhini S., Rejish Kumar V.J. Transcriptomics in aquaculture: current status and applications. Rev. Aquac. 2019;11(4):1379–1397.
  • 5. Dam C.T.M., Ventura T., Booth M., Pirozzi I., Salini M., Smullen R., Elizur A. Intestinal transcriptome analysis highlights key differentially expressed genes involved in nutrient metabolism and digestion in yellowtail kingfish (Seriola lalandi) fed terrestrial animal and plant proteins. Genes. 2020;11(6):621. doi: 10.3390/genes11060621.
  • 6. Danzmann R.G., Kocmarek A.L., Norman J.D., Rexroad C.E., Palti Y. Transcriptome profiling in fast versus slow-growing rainbow trout across seasonal gradients. BMC Genom. 2016;17(1):1–18. doi: 10.1186/s12864-016-2363-5.
  • 7. Guan W.Z., Qiu G.F. Transcriptome analysis of the growth performance of hybrid mandarin fish after food conversion. PLoS One. 2020;15(10). doi: 10.1371/journal.pone.0240308.
  • 8. Hu G., Gu W., Sun P., Bai Q., Wang B. Transcriptome analyses reveal lipid metabolic process in liver related to the difference of carcass fat content in rainbow trout (Oncorhynchus mykiss). Int. J. Genom. 2016;2016. doi: 10.1155/2016/7281585.
  • 9. Lin G., Thevasagayam N.M., Wan Z.Y., Ye B.Q., Yue G.H. Transcriptome analysis identified genes for growth and omega-3/-6 ratio in saline tilapia. Front. Genet. 2019;10:244. doi: 10.3389/fgene.2019.00244.
  • 10. Ma D., Ma A., Huang Z., Wang G., Wang T., Xia D., Ma B. Transcriptome analysis for identification of genes related to gonad differentiation, growth, immune response and marker discovery in the turbot (Scophthalmus maximus). PLoS One. 2016;11(2). doi: 10.1371/journal.pone.0149414.
  • 11. Overturf K., Sakhrani D., Devlin R.H. Expression profile for metabolic and growth-related genes in domesticated and transgenic coho salmon (Oncorhynchus kisutch) modified for increased growth hormone production. Aquaculture. 2010;307(1-2):111–122.
  • 12. Palti Y. Toll-like receptors in bony fish: from genomics to function. Dev. Comp. Immunol. 2011;35(12):1263–1272. doi: 10.1016/j.dci.2011.03.006.
  • 13. Vagner M., Santigosa E. Characterization and modulation of gene expression and enzymatic activity of delta-6 desaturase in teleosts: a review. Aquaculture. 2011;315(1-2):131–143.
  • 14. Xie Z.Z., Ling X., Dengdong W., Chao F., Qiongyu L., Zihao L., Haoran L. Transcriptome analysis of the Trachinotus ovatus: identification of reproduction, growth and immune-related genes and microsatellite markers. PLoS One. 2014;9(10). doi: 10.1371/journal.pone.0109419.
  • 15. Chen S., Zhou Y., Chen Y., Gu J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34(17):i884–i890. doi: 10.1093/bioinformatics/bty560.
  • 16. Grabherr M.G., Haas B.J., Yassour M., Levin J.Z., Thompson D.A., Amit I., Regev A. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011;29(7):644. doi: 10.1038/nbt.1883.
  • 17. Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. doi: 10.1093/bioinformatics/btv351.
  • 18. Langmead B., Trapnell C., Pop M., Salzberg S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):1–10. doi: 10.1186/gb-2009-10-3-r25.
  • 19. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.
