Scientific Data. 2024 Apr 24;11:426. doi: 10.1038/s41597-024-03262-8

Chromosome-level genome assembly of the yellow-cheek carp Elopichthys bambusa

Shunyao Li 1, Xuemei Xiong 1, Siyu Qiu 1, Zhigang Shen 1, Yan He 1, Zexia Gao 1,2, Shiming Wan 1,2
PMCID: PMC11043341  PMID: 38658574

Abstract

Yellow-cheek carp (Elopichthys bambusa) is a typical large and ferocious carnivorous fish endemic to East Asia, with high growth rate, nutritional value and economic value. In this study, a chromosome-level genome of yellow-cheek carp was generated by combining PacBio reads, Illumina reads and Hi-C data. The genome size is 827.63 Mb with a scaffold N50 size of 33.65 Mb, and 99.51% (823.61 Mb) of the assembled sequences were anchored to 24 pseudo-chromosomes. The genome is predicted to contain 24,153 protein-coding genes, with 95.54% having functional annotations. Repeat elements account for approximately 55.17% of the genomic landscape. The completeness of yellow-cheek carp genome assembly is highlighted by a BUSCO score of 98.4%. This genome will help us understand the genetic diversity of yellow-cheek carp and facilitate its conservation planning.

Subject terms: Genome, Ichthyology, Animal breeding

Background & Summary

Yellow-cheek carp (Elopichthys bambusa), also known as “water tiger”, is a species in the genus Elopichthys, subfamily Leuciscinae and family Cyprinidae. It is a typical large and ferocious carnivorous fish endemic to East Asia. In China, it is mainly distributed in river systems such as the Yangtze River, Pearl River and Yellow River [1]. Yellow-cheek carp lives in the upper layer of rivers and lakes, has strong swimming ability and chases other fish for food. By preying on diseased and weak fish, it helps control their population size, which is of great significance for maintaining the ecological balance of the water environment [2]. Yellow-cheek carp is also an important economic fish with firm meat and delicious taste, rich in high-quality protein, unsaturated fatty acids, minerals and other nutrients [3–5]. However, anthropogenic factors such as overfishing, hydrological modification and water pollution have depleted the natural resources of yellow-cheek carp [6,7], which has been listed in the “Key Protected Endangered and Threatened Aquatic Species” and the IUCN Red List of Threatened Species (Version 2020.3) [8].

As a typical carnivore, yellow-cheek carp stands out among East Asian carp species, which are mainly omnivorous or herbivorous. For example, yellow-cheek carp and grass carp both belong to the subfamily Leuciscinae and are closely related, yet they have evolved completely opposite feeding habits [9], which provides excellent material for studying the evolution and genetic regulation of fish feeding habits. However, the lack of genomic information limits the study of the mechanism underlying carnivory in yellow-cheek carp. At the same time, high breeding profits have promoted the continuous development of the artificial breeding industry of yellow-cheek carp. Using live or frozen fish as the main bait not only raises breeding costs, but also easily pollutes the aquaculture water, which greatly restricts the expansion of farming scale [10]. Therefore, research on the dietary transformation of typical carnivorous fishes such as yellow-cheek carp has gradually become a hot topic, and there is an urgent need for genetic breeding of yellow-cheek carp based on whole-genome information.

In this study, we combined PacBio long-read sequencing, Illumina short-read sequencing and Hi-C technology to generate a high-quality chromosome-level genome of the yellow-cheek carp (Fig. 1). We expect this resource to accelerate genetics research on yellow-cheek carp and the discovery of functional genes related to its key economic traits. Elucidating the genome structure and function will support more in-depth research into the genetic basis of important traits such as carnivory in yellow-cheek carp, thereby contributing to its resource protection, genetic selection and artificial breeding.

Fig. 1.

Characterization of assembled yellow-cheek carp genome. Circos plot of the yellow-cheek carp genome, with visualization of gene density (1), TRP (2), LTR (3), SINE (4), LINE (5) and GC content (6) in order from outside to inside.

Methods

Sample collection and sequencing

An adult male yellow-cheek carp was collected from the Yangtze River in Wuhan, Hubei, China. High-quality genomic DNA was extracted from muscle by the CTAB method for Illumina sequencing, PacBio SMRT sequencing [11] and Hi-C. The quality of the extracted DNA was assessed using agarose gel electrophoresis and a NanoDrop Spectrophotometer (Thermo Fisher Scientific, USA), and the DNA was quantified with a Qubit Fluorometer (Invitrogen, USA).

For Illumina sequencing, the genomic DNA was randomly sheared into 300–500 bp fragments, and a paired-end genomic library was prepared following the manufacturer’s protocol. The library was then sequenced on an Illumina NovaSeq platform using a paired-end 150 bp layout to enable genome survey and base-level correction. For PacBio long-read sequencing, SMRTbell libraries were constructed from the genomic DNA and sequenced on the PacBio Sequel II platform. In total, approximately 58.98 Gb of Illumina short-read data (71.31× coverage) and 27.35 Gb of PacBio continuous long read (CLR) data (32.65× coverage) were obtained.

To generate a chromosome-level assembly of the yellow-cheek carp genome, a Hi-C library was prepared using DNA extracted from the same individual. After cell crosslinking, cell lysis, chromatin digestion, biotin labelling, proximal chromatin DNA ligation and DNA purification, the resulting Hi-C library was subjected to paired-end sequencing with 150 bp read lengths on an Illumina NovaSeq platform. In total, 151.98 Gb of Hi-C data were obtained, corresponding to 183.78× coverage of the genome.

To aid genome annotation, total RNA was extracted from muscle, spleen, gonad and skin and tested for purity and integrity using a NanoDrop Spectrophotometer (Thermo Fisher Scientific, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). The RNA library was constructed using the NEBNext® Ultra™ RNA Library Prep Kit (Illumina, USA) following the manufacturer’s protocol and sequenced on an Illumina NovaSeq 6000 platform. In total, 23.74 Gb of data were obtained (Table 1).

Table 1.

Statistics of the sequencing data used for genome assembly.

Libraries Insert sizes Clean data (bp) Sequencing coverage (×)
Illumina 300 bp 58,975,349,100 71.31×
PacBio 10–15 kb 27,351,494,268 32.65×
Hi-C 300 bp 151,983,658,870 183.78×
RNA 300 bp 23,735,378,400 27.81×

Genome assembly

First, SOAPnuke (v2.1.0) [12] was used for quality control of the Illumina data, and the clean data were used for genome size estimation. K-mer analysis [13] was conducted using GCE (v1.0.2). The genome size was estimated to be 786.16 Mb, with a heterozygosity ratio of 0.47% and a repeat sequence ratio of 47.03% (Table 2). A total of 27.35 Gb of PacBio long-read data were used for de novo genome assembly with MECAT2 (v2.0.0) [14] and NextDenovo (v2.4.0). The assembly was then polished with gcpp (v2.0.2) and Pilon (v1.22) [15]. The resulting assembly consists of 170 contigs with a total length of 827.63 Mb (Table 3).

Table 2.

K-mer frequency and genome size evaluation of yellow-cheek carp genome.

K-mer number K-mer Depth Genome Size (Mb) Heterozygous Ratio (%) Repeat (%)
52,684,645,196 64 786.16 0.47 47.03
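
As a rough cross-check of Table 2, the textbook k-mer estimate divides the total k-mer count by the main peak depth. The sketch below applies that naive ratio to the reported values. It is not GCE's procedure: the function name is hypothetical, and the gap between the naive figure and the reported 786.16 Mb presumably reflects corrections applied by GCE, which is an assumption since those corrections are not detailed here.

```python
def naive_genome_size_mb(kmer_count: int, peak_depth: float) -> float:
    """Naive k-mer estimate: total k-mers / main peak depth, in Mb."""
    return kmer_count / peak_depth / 1e6


# Values copied from Table 2.
KMER_NUMBER = 52_684_645_196
KMER_DEPTH = 64
REPORTED_MB = 786.16  # genome size reported by GCE

naive_mb = naive_genome_size_mb(KMER_NUMBER, KMER_DEPTH)
print(f"naive estimate: {naive_mb:.2f} Mb; reported by GCE: {REPORTED_MB:.2f} Mb")
```

The naive ratio gives an upper-end figure because it counts every k-mer, including any that a dedicated estimator might discount.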

Table 3.

Statistics for Hi-C assisted assembly.

Assembly | Total length (bp) | Contig number | Contig N50 (bp) | Scaffold number | Scaffold N50 (bp) | Anchored proportion | GC content (%)
Hi-C assisted pre-assembly | 827,626,473 | 170 | 9,879,208 | - | - | - | -
Hi-C-assisted assembly | 823,606,315 | 165 | 9,879,208 | 24 | 33,649,237 | 99.51% | 37.45

Hi-C scaffolding

Hi-C technology was used for chromosome-level genome assembly. Trimmomatic [16] was used with the parameters LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50 to remove adapters and low-quality fragments from the raw Hi-C reads. The processed reads were then aligned to the assembly using Juicer (v1.6) [17] with default settings. Contigs were scaffolded using the 3D-DNA pipeline [18] with all valid Hi-C reads. Juicebox (v2.13.07) [17] was used to manually adjust the chromosome-scale scaffolds (Fig. 2, Table 4). The 24 chromosomes contain 141 gaps in total.
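
For readers who want to reproduce the read-trimming step, the sketch below assembles a Trimmomatic paired-end command line with the four trimming steps reported above. The file names, the output prefix, the helper name and the `trimmomatic` wrapper executable are all assumptions for illustration; the exact invocation used in this study is not given. No adapter-clipping step is included because none is listed among the reported parameters.

```python
import shlex


def trimmomatic_pe_command(r1, r2, prefix, executable="trimmomatic"):
    """Build a Trimmomatic paired-end command using the reported trimming steps.

    Trimmomatic PE expects two inputs followed by four outputs
    (paired/unpaired for each mate), then the trimming steps in order.
    """
    outputs = [
        f"{prefix}_R1.paired.fq.gz",
        f"{prefix}_R1.unpaired.fq.gz",
        f"{prefix}_R2.paired.fq.gz",
        f"{prefix}_R2.unpaired.fq.gz",
    ]
    steps = ["LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:50"]
    return [executable, "PE", r1, r2, *outputs, *steps]


# Hypothetical input files; replace with the real Hi-C read pairs.
cmd = trimmomatic_pe_command("hic_R1.fq.gz", "hic_R2.fq.gz", "hic_clean")
print(shlex.join(cmd))
```

Returning the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues.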

Fig. 2.

Genome-wide Hi-C interaction mapping of chromosome sections.

Table 4.

Statistics of the assembled chromosomes: contig number, length and gap count.

Chromosome ID Number of Contigs Length (bp) Gaps
chr1 10 48,801,470 9
chr2 3 47,476,723 2
chr3 15 43,850,734 14
chr4 14 41,595,563 13
chr5 7 40,868,316 6
chr6 6 36,732,165 5
chr7 8 36,442,319 7
chr8 4 35,157,168 3
chr9 8 35,141,945 7
chr10 4 34,436,776 3
chr11 4 33,649,237 3
chr12 6 33,538,482 5
chr13 7 32,527,850 6
chr14 9 32,137,104 8
chr15 5 31,940,173 4
chr16 3 31,691,200 2
chr17 5 30,801,312 4
chr18 8 30,664,716 7
chr19 3 30,038,157 2
chr20 9 29,852,686 8
chr21 7 27,984,395 6
chr22 5 26,913,480 4
chr23 10 26,690,801 9
chr24 5 24,744,043 4
TOTAL 165 823,676,815 141
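
The figures in Tables 3 and 4 can be cross-checked with a few lines of code. The sketch below recomputes the scaffold N50 from the 24 chromosome lengths, derives the gap count as contigs minus one per chromosome, and compares the Table 4 total with the anchored length in Table 3. Restricting the N50 to the 24 chromosomes is an assumption, and so is the reading of the length difference: it equals 500 bp per gap, which would be consistent with fixed-size gap padding, but that is not stated in the text.

```python
# (number of contigs, length in bp) for chr1..chr24, copied from Table 4.
CHROMS = [
    (10, 48_801_470), (3, 47_476_723), (15, 43_850_734), (14, 41_595_563),
    (7, 40_868_316), (6, 36_732_165), (8, 36_442_319), (4, 35_157_168),
    (8, 35_141_945), (4, 34_436_776), (4, 33_649_237), (6, 33_538_482),
    (7, 32_527_850), (9, 32_137_104), (5, 31_940_173), (3, 31_691_200),
    (5, 30_801_312), (8, 30_664_716), (3, 30_038_157), (9, 29_852_686),
    (7, 27_984_395), (5, 26_913_480), (10, 26_690_801), (5, 24_744_043),
]
ANCHORED_BP = 823_606_315  # Hi-C-assisted assembly length from Table 3


def n50(lengths):
    """Length of the sequence at which the sorted cumulative sum reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one length")


lengths = [length for _, length in CHROMS]
total_bp = sum(lengths)
gaps = sum(contigs - 1 for contigs, _ in CHROMS)  # contigs joined end to end

print("scaffold N50:", n50(lengths))
print("gaps:", gaps)
print("Table 4 total minus Table 3 anchored length:", total_bp - ANCHORED_BP)
```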

Repeat annotation

We used de novo prediction and homology comparison to annotate repetitive sequences in the genome. RepeatModeler [19] was used to detect and classify repetitive sequences in the genome assembly, together with RECON (v1.08) [20], RepeatScout (v1.0.5) [21], LTR_FINDER (v1.0.5) [22] and TRF (v4.0.935) [23]. For homology comparison, RepeatMasker (open-4.0.9) and RepeatProteinMask (open-4.0.9) were used to identify known TEs in the yellow-cheek carp genome against the Repbase TE library [24,25] and the TE protein database, respectively. The results showed that repetitive sequences totalled 456.66 Mb, accounting for 55.17% of the assembled genome. Among the repeat elements, short interspersed nuclear elements (SINEs) accounted for 0.24% of the genome and long interspersed nuclear elements (LINEs) for 7.67%. Long terminal repeats (LTRs) and DNA elements accounted for 12.31% and 34.87%, respectively (Table 5).

Table 5.

Repetitive elements and their proportions in yellow-cheek carp genome.

Type | Repbase TEs length (bp) | Repbase TEs (%) | Protein TEs length (bp) | Protein TEs (%) | De novo TEs length (bp) | De novo TEs (%) | Combined TEs length (bp) | Combined TEs (%)
DNA | 135,569,082 | 16.38 | 21,468,489 | 2.59 | 208,673,761 | 25.21 | 288,628,347 | 34.87
LINE | 17,380,180 | 2.10 | 17,851,894 | 2.16 | 52,066,672 | 6.29 | 63,480,091 | 7.67
SINE | 1,034,564 | 0.12 | 0 | 0.00 | 1,364,468 | 0.16 | 2,016,734 | 0.24
LTR | 24,846,205 | 3.00 | 19,281,719 | 2.33 | 91,771,796 | 11.09 | 101,898,770 | 12.31
Unknown | 1,887,900 | 0.23 | 6,603 | 0.00 | 44,616,285 | 5.39 | 46,455,288 | 5.61
Total | 173,959,113 | 21.02 | 58,476,207 | 7.06 | 343,673,320 | 41.52 | 429,931,954 | 51.94
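
The percentages in Table 5 appear to be each class's length divided by the total assembly length. The sketch below recomputes the Combined TEs column on that basis. Using 827,626,473 bp (the pre-Hi-C assembly length in Table 3) as the denominator is an assumption, as are the function and variable names.

```python
ASSEMBLY_BP = 827_626_473  # assumed denominator: total assembly length (Table 3)

# class: (combined length in bp, reported % of genome), copied from Table 5
COMBINED_TES = {
    "DNA": (288_628_347, 34.87),
    "LINE": (63_480_091, 7.67),
    "SINE": (2_016_734, 0.24),
    "LTR": (101_898_770, 12.31),
}


def percent_of_genome(length_bp, genome_bp=ASSEMBLY_BP):
    """Share of the genome covered by a repeat class, in percent."""
    return 100.0 * length_bp / genome_bp


computed = {name: percent_of_genome(bp) for name, (bp, _) in COMBINED_TES.items()}
for name, (_, reported) in COMBINED_TES.items():
    print(f"{name:5s} computed {computed[name]:6.2f}%  reported {reported:6.2f}%")
```

Because repeat classes can overlap on the genome, per-class lengths are not expected to add up to the Total row.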

Protein-coding gene prediction and annotation

In this study, ab initio gene prediction, homology-based gene prediction and transcript-based prediction were used to predict protein-coding genes in the yellow-cheek carp genome. Prior to gene prediction, the assembled genome was hard- and soft-masked using RepeatMasker. Ab initio gene prediction was performed using Augustus (v3.3.1) [26,27] and Genscan (v1.0) [28]. The models used by each gene predictor were trained on a set of high-quality proteins generated from the RNA-Seq data. For homology-based prediction, GlimmerHMM (v3.0.4) [29] was used to align the protein sequences to our genome assembly and predict coding genes with default parameters. The reference protein sequences of five fish species, Ctenopharyngodon idella, Sinocyclocheilus grahami, Megalobrama amblycephala, Danio rerio and Cyprinus carpio, were obtained from the NCBI database. For transcript-based prediction, clean RNA-Seq reads were assembled against the yellow-cheek carp genome using StringTie (v2.1.1) [30], and gene structures were then determined using PASA (v2.4.1) [31]. Finally, MAKER (v3.00) [32] was used to merge and integrate the gene predictions from these three methods.

For functional annotation of the predicted genes, BLASTP (v2.6.0) [33,34] was used to align them against the Kyoto Encyclopedia of Genes and Genomes (KEGG) [35], Gene Ontology (GO) [36], NCBI-NR (non-redundant protein database), Swiss-Prot [37], TrEMBL [38] and InterPro [39] databases. In total, we predicted 24,153 protein-coding genes in the genome. These genes have an average coding sequence length of 1,638.21 bp, an average gene length of 18,969.98 bp and an average of 9.87 exons per gene (Table 6). Further, 22,965 genes, accounting for 95.54% of the total number of predicted genes, were assigned at least one functional annotation (Table 7).

Table 6.

Basic statistical results of gene prediction.

Gene set Number Average gene length (bp) Average CDS length (bp) Average exon number per gene Average exon length (bp) Average intron length (bp)
denovo/AUGUSTUS 19,271 19,665.20 1,726.50 10.08 171.34 1,976.46
denovo/GlimmerHMM 54,008 14,259.34 905.18 6.10 148.33 2,617.17
denovo/Genscan 23,400 24,954.02 1,692.64 9.19 184.09 2,838.60
homo/C. carpio 46,149 10,108.37 1,077.86 5.61 91.98 1,957.04
homo/S. grahami 43,803 11,026.80 1,115.46 5.75 193.90 2,085.45
homo/M. amblycephala 47,792 12,277.38 1,201.90 5.81 207.02 2,304.66
homo/D. rerio 45,504 9,494.07 1,020.30 5.28 193.18 1,979.17
homo/C. idella 63,196 7,385.67 972.24 4.59 211.79 1,786.17
trans.orf/RNAseq 15,467 21,165.74 1,680.38 10.78 281.86 1,853.98
PASA 24,038 19,597.60 1,651.11 9.97 257.30 1,898.72
MAKER 24,153 18,969.98 1,638.21 9.87 243.04 1,868.06

Table 7.

Functional annotation statistics.

Gene number Percent (%)
Total 24,038 NA
InterPro 20,189 83.99
GO 14,812 61.62
KEGG_ALL 22,561 93.86
KEGG_KO 16,013 66.62
Swissprot 20,884 86.88
TrEMBL 22,382 93.11
NR 22,936 95.42
Annotated 22,965 95.54
Unannotated 1,073 4.46
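
The percentages in Table 7 can be reproduced directly from the gene counts. The sketch below does so using the table's own Total row (24,038) as the denominator, which is the value that reproduces the listed percentages. The dictionary and function names are illustrative only.

```python
TOTAL_GENES = 24_038  # "Total" row of Table 7

# database: number of genes with a hit, copied from Table 7
HITS = {
    "InterPro": 20_189,
    "GO": 14_812,
    "KEGG_ALL": 22_561,
    "KEGG_KO": 16_013,
    "Swissprot": 20_884,
    "TrEMBL": 22_382,
    "NR": 22_936,
    "Annotated": 22_965,
}


def percent(count, total=TOTAL_GENES):
    """Share of genes with an annotation, rounded to two decimals."""
    return round(100.0 * count / total, 2)


percentages = {db: percent(n) for db, n in HITS.items()}
unannotated = TOTAL_GENES - HITS["Annotated"]

for db, value in percentages.items():
    print(f"{db:10s} {HITS[db]:>6,d} {value:6.2f}%")
print(f"{'Unannotated':10s} {unannotated:>6,d} {percent(unannotated):6.2f}%")
```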

Annotation of non-coding RNA genes

tRNAscan-SE (v1.3.1) [40] was used with default parameters to identify tRNA genes. rRNA sequences of closely related species were downloaded from the Ensembl database and aligned against our genome using BLASTn (v2.6.0) [41] with E-value < 1e-5, identity ≥ 85% and match length ≥ 50 bp. The miRNAs and snRNAs were identified with Infernal (v1.1.2) [42] against the Rfam (v14.1) database with default parameters. As a result, we annotated 76 rRNAs, 2,469 tRNAs, 291 miRNAs and 212 snRNAs (Table 8).
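
The rRNA hit thresholds described above (E-value below 1e-5, identity of at least 85%, match length of at least 50 bp) can be expressed as a simple filter. The sketch below assumes BLAST's standard 12-column tabular output, which is not specified in the text, and the example hits are invented purely to exercise each threshold.

```python
def passes_rrna_filter(line, max_evalue=1e-5, min_identity=85.0, min_length=50):
    """Return True if a BLAST tabular hit meets all three thresholds.

    Assumes the standard 12 tab-separated columns: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    fields = line.rstrip("\n").split("\t")
    identity = float(fields[2])   # percent identity
    length = int(fields[3])       # alignment length in bp
    evalue = float(fields[10])    # expectation value
    return evalue < max_evalue and identity >= min_identity and length >= min_length


def make_hit(query, subject, identity, length, evalue):
    """Build a hypothetical 12-column hit; unused columns get placeholder values."""
    return "\t".join(
        [query, subject, str(identity), str(length), "0", "0",
         "1", str(length), "1", str(length), str(evalue), "100"]
    )


hits = [
    make_hit("rRNA_18S", "chr1", 97.5, 1800, 0.0),    # passes all thresholds
    make_hit("rRNA_5S", "chr3", 82.0, 110, 1e-20),    # identity too low
    make_hit("rRNA_5S", "chr7", 95.0, 42, 1e-8),      # match too short
    make_hit("rRNA_28S", "chr9", 90.0, 60, 1e-3),     # E-value too high
]
kept = [hit for hit in hits if passes_rrna_filter(hit)]
print(f"{len(kept)} of {len(hits)} hits kept")
```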

Table 8.

Statistics of non-coding RNA annotation.

Type | Subtype | Copy | Average length (bp) | Total length (bp) | % of genome
miRNA | - | 291 | 88.84 | 25,853 | 0.0031
tRNA | - | 2,469 | 75.51 | 186,428 | 0.0225
rRNA | rRNA (all) | 76 | 338.30 | 25,711 | 0.0031
rRNA | 18S | 4 | 1,891.75 | 7,567 | 0.0009
rRNA | 28S | 2 | 5,047.50 | 10,095 | 0.0012
rRNA | 5S | 70 | 114.99 | 8,049 | 0.0010
snRNA | snRNA (all) | 212 | 128.75 | 27,295 | 0.0033
snRNA | CD-box | 75 | 108.87 | 8,165 | 0.0010
snRNA | HACA-box | 48 | 156.56 | 7,515 | 0.0009
snRNA | splicing | 74 | 132.45 | 9,801 | 0.0012
snRNA | scaRNA | 6 | 220.00 | 1,320 | 0.0002

Data Records

All the raw sequencing data have been deposited in the NCBI Sequence Read Archive under the accession number SRP470306 [43]. The genome assembly has been deposited at GenBank under the accession GCA_037101425.1 [44]. Genome annotations, along with predicted coding sequences and protein sequences, can be accessed through Figshare [45].

Technical Validation

We assessed assembly completeness using BUSCO (v3.0.259) [46] with the reference actinopterygii gene set (n = 3,640). The final genome assembly showed a BUSCO completeness of 98.4%, consisting of 3,538 (97.2%) single-copy BUSCOs and 45 (1.2%) duplicated BUSCOs, with a further 26 (0.7%) fragmented BUSCOs and 31 (0.9%) missing BUSCOs (Table 9). Compared with the BUSCO results for Squaliobarbus curriculus (95.8%) and Mylopharyngodon piceus (96.0%), this indicates the high quality of the yellow-cheek carp genome assembly [47].

Table 9.

BUSCO evaluation of the genome assembly.

Number Percentage (%)
Complete BUSCOs 3,583 98.4
Complete and single-copy BUSCOs 3,538 97.2
Complete and duplicated BUSCOs 45 1.2
Fragmented BUSCOs 26 0.7
Missing BUSCOs 31 0.9
Total BUSCO groups searched 3,640 100
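
The BUSCO categories in Table 9 are related by simple arithmetic: complete equals single-copy plus duplicated, and complete, fragmented and missing add up to the number of groups searched. The sketch below recomputes the percentages from the counts and formats them in BUSCO's compact one-line summary notation. Variable names are illustrative.

```python
TOTAL_GROUPS = 3_640  # total BUSCO groups searched (Table 9)

# Counts copied from Table 9.
single_copy = 3_538
duplicated = 45
fragmented = 26
missing = 31

complete = single_copy + duplicated


def pct(count, total=TOTAL_GROUPS):
    """Percentage of BUSCO groups, rounded to one decimal as in Table 9."""
    return round(100.0 * count / total, 1)


summary = (
    f"C:{pct(complete)}%[S:{pct(single_copy)}%,D:{pct(duplicated)}%],"
    f"F:{pct(fragmented)}%,M:{pct(missing)}%,n:{TOTAL_GROUPS}"
)
print(summary)
```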

Acknowledgements

This work was supported by the Key Research and Development Program of Hubei Province (2021BBA233 and 2023BBA001). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author contributions

S.L., S.W. and Z.G. conceived this study. S.L., S.Q., Z.S. and Y.H. collected the samples and performed the experiments; S.L. and X.X. performed the research and analyzed the data. S.L. drafted the manuscript. All authors have read and approved the final manuscript.

Code availability

All commands and pipelines used in data processing were executed according to the manual and protocols of the corresponding bioinformatic software. No specific code has been developed for this study.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Zexia Gao, Email: gaozx@mail.hzau.edu.cn.

Shiming Wan, Email: wansm@mail.hzau.edu.cn.

References

  • 1. Zhu NS, Chen HX. Food habits of yellow-cheek carp in Liangzi lake. Acta Hydrobiologica Sinica. 1959;03:262–271.
  • 2. Liang ZS, Yi BL, Yu ZT. Reproductive habits and embryonic development of yellow-cheek carp in the main stream of the Yangtze River and the Han River. Acta Hydrobiologica Sinica. 1984;04:389–403.
  • 3. Ma XF, Wang WM, Yang ZL. Biochemical composition and nutritional characteristics of yellow-cheek carp. Journal of Huazhong Agricultural University. 2008;06:759–762.
  • 4. Yi CP, Zhong CM. Yellow-cheek carp fat content determination and fatty acid composition analysis. Food Science. 2013;14:255–258.
  • 5. Zhang ZQ, et al. Yellow-cheek carp meat rate and muscle nutrient analysis. Tianjin Agricultural Sciences. 2013;04:29–33.
  • 6. Zhu TB, et al. Lushan west sea yellow-cheek carp national aquatic germplasm resources protection zone aquatic biological resources preliminary investigation. Biotic Resources. 2021;02:188–193.
  • 7. Qi XR. Survey of fishery resources in the upper Han River. Journal of Fisheries Research. 2022;01:21–32.
  • 8. Liao F, et al. Complete mitochondrial genome of Elopichthys bambusa (Cypriniformes, Cyprinidae). Mitochondrial DNA. 2016;27:1387–1388. doi: 10.3109/19401736.2014.947593.
  • 9. Han XL, et al. The AFLP analysis of yellow-cheek carp group genetic diversity. Journal of Nanjing Normal University (Natural Science Edition). 2009;01:110–114.
  • 10. Yang W, Fan QX. The specialization breeding technology of yellow-cheek carp. Animals Breeding and Feed. 2011;09:32–33.
  • 11. Chin CS, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods. 2016;13:1050–1054. doi: 10.1038/nmeth.4035.
  • 12. Chen Y, et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience. 2018;7:1–6. doi: 10.1093/gigascience/gix120.
  • 13. Liu B, et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Quantitative Biology. 2013;35:62–67.
  • 14. Xiao CL, et al. MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nat Methods. 2017;14:1072–1074. doi: 10.1038/nmeth.4432.
  • 15. Walker BJ, et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE. 2014;9:e112963. doi: 10.1371/journal.pone.0112963.
  • 16. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
  • 17. Durand NC, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
  • 18. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
  • 19. Flynn JM, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117.
  • 20. Bao Z, Eddy SR. Automated De Novo Identification of Repeat Sequence Families in Sequenced Genomes. Genome Res. 2002;12:1269–1276. doi: 10.1101/gr.88502.
  • 21. Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21:i351–i358. doi: 10.1093/bioinformatics/bti1018.
  • 22. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research. 2007;35:W265–W268. doi: 10.1093/nar/gkm286.
  • 23. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research. 1999;27:573–580. doi: 10.1093/nar/27.2.573.
  • 24. Jurka J, et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–467. doi: 10.1159/000084979.
  • 25. Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends in Genetics. 2000;16:418–420. doi: 10.1016/S0168-9525(00)02093-X.
  • 26. Stanke M, Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Research. 2005;33:W465–W467. doi: 10.1093/nar/gki458.
  • 27. Stanke M, et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research. 2006;34:W435–W439. doi: 10.1093/nar/gkl200.
  • 28. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology. 1997;268:78–94. doi: 10.1006/jmbi.1997.0951.
  • 29. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20:2878–2879. doi: 10.1093/bioinformatics/bth315.
  • 30. Pertea M, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–295. doi: 10.1038/nbt.3122.
  • 31. Haas BJ. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research. 2003;31:5654–5666. doi: 10.1093/nar/gkg770.
  • 32. Cantarel BL, et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18:188–196. doi: 10.1101/gr.6743907.
  • 33. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Local Alignment Search Tool. Journal of Molecular Biology. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
  • 34. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
  • 35. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research. 2012;40:D109–D114. doi: 10.1093/nar/gkr988.
  • 36. Ashburner M, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.
  • 37. Boeckmann B. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research. 2003;31:365–370. doi: 10.1093/nar/gkg095.
  • 38. Bairoch A. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research. 2000;28:45–48. doi: 10.1093/nar/28.1.45.
  • 39. Mitchell A, et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Research. 2015;43:D213–D221. doi: 10.1093/nar/gku1243.
  • 40. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research. 1997;25:955–964. doi: 10.1093/nar/25.5.955.
  • 41. Altschul S. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
  • 42. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25:1335–1337. doi: 10.1093/bioinformatics/btp157.
  • 43. 2023. NCBI Sequence Read Archive. SRP470306
  • 44. 2023. NCBI GenBank. GCA_037101425.1
  • 45. Li S. 2024. Chromosome-level genome assembly of the yellow-cheek carp Elopichthys bambusa. figshare.
  • 46. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution. 2021;38:4647–4654. doi: 10.1093/molbev/msab199.
  • 47. Xu MRX, et al. Maternal dominance contributes to subgenome differentiation in allopolyploid fishes. Nature Communications. 2023;14:8357. doi: 10.1038/s41467-023-43740-y.
