Chromosome-level genome assembly of the yellow-cheek carp Elopichthys bambusa

Shunyao Li; Xuemei Xiong; Siyu Qiu; Zhigang Shen; Yan He; Zexia Gao; Shiming Wan

doi:10.1038/s41597-024-03262-8

Sci Data. 2024 Apr 24;11:426. doi: 10.1038/s41597-024-03262-8

PMCID: PMC11043341 PMID: 38658574

Abstract

Yellow-cheek carp (Elopichthys bambusa) is a typical large and ferocious carnivorous fish endemic to East Asia, with high growth rate, nutritional value and economic value. In this study, a chromosome-level genome of yellow-cheek carp was generated by combining PacBio reads, Illumina reads and Hi-C data. The genome size is 827.63 Mb with a scaffold N50 size of 33.65 Mb, and 99.51% (823.61 Mb) of the assembled sequences were anchored to 24 pseudo-chromosomes. The genome is predicted to contain 24,153 protein-coding genes, with 95.54% having functional annotations. Repeat elements account for approximately 55.17% of the genomic landscape. The completeness of yellow-cheek carp genome assembly is highlighted by a BUSCO score of 98.4%. This genome will help us understand the genetic diversity of yellow-cheek carp and facilitate its conservation planning.

Subject terms: Genome, Ichthyology, Animal breeding

Background & Summary

Yellow-cheek carp (Elopichthys bambusa), also known as “water tiger”, is a species of the genus Elopichthys, subfamily Leuciscinae, family Cyprinidae. It is a typical large and ferocious carnivorous fish endemic to East Asia. In China, it is mainly distributed in river systems such as the Yangtze River, Pearl River and Yellow River¹. Yellow-cheek carp lives in the upper layer of rivers and lakes, swims strongly and chases other fish for food. By preying on diseased and weak fish, it helps control their population size, which is of great significance for maintaining the ecological balance of the water environment². Yellow-cheek carp is also an economically important fish, with firm meat and a delicious taste, and it is rich in high-quality protein, unsaturated fatty acids, minerals and other nutrients^3–5. However, anthropogenic factors such as overfishing, hydrological modification and water pollution have depleted the natural resources of yellow-cheek carp^6,7, which has been listed in the “Key Protected Endangered and Threatened Aquatic Species” and the IUCN Red List of Threatened Species (Version 2020.3)⁸.

As a typical carnivore, yellow-cheek carp stands out among East Asian carp species, which are mainly omnivorous or herbivorous. For example, yellow-cheek carp and grass carp both belong to the subfamily Leuciscinae and are closely related, yet they have evolved completely opposite feeding habits⁹, which provides excellent material for studying the evolution and genetic regulation of fish feeding habits. However, the lack of genomic information limits research on how carnivory arose in yellow-cheek carp. At the same time, high breeding profits have promoted the continuous development of the artificial breeding industry of yellow-cheek carp. Using live or frozen fish as the main bait not only raises breeding costs, but also easily pollutes the aquaculture water, which greatly restricts the expansion of the farming scale¹⁰. Therefore, research on the dietary transformation of typical carnivorous fishes such as yellow-cheek carp has gradually become a hot topic, and there is an urgent need for genetic breeding of yellow-cheek carp based on whole-genome information.

In this study, we combined PacBio long-read sequencing, Illumina short-read sequencing and Hi-C technology to generate a high-quality chromosome-level genome of the yellow-cheek carp (Fig. 1). We expect this resource to accelerate genetic research on yellow-cheek carp and the discovery of functional genes related to its key economic traits. Elucidating the structure and function of the genome will support more in-depth research into the genetic basis of important traits such as carnivory in yellow-cheek carp, thereby contributing to its resource protection, genetic selection and artificial breeding.

Fig. 1 — Characterization of assembled yellow-cheek carp genome. Circos plot of the yellow-cheek carp genome, with visualization of gene density (1), TRP (2), LTR (3), SINE (4), LINE (5) and GC content (6) in order from outside to inside.

Methods

Sample collection and sequencing

An adult male yellow-cheek carp was collected from the Yangtze River in Wuhan, Hubei, China. High-quality genomic DNA was extracted from muscle by the CTAB method for Illumina sequencing, PacBio SMRT sequencing¹¹ and Hi-C. The quality of the extracted DNA was assessed using agarose gel electrophoresis and NanoDrop Spectrophotometer (Thermo Fisher Scientific, USA), and quantified by a Qubit Fluorometer (Invitrogen, USA).

For Illumina sequencing, the genomic DNA was randomly sheared into 300–500 bp fragments, and a paired-end genomic library was prepared following the manufacturer’s protocol. The library was then sequenced on an Illumina NovaSeq platform using a paired-end 150 bp layout to enable genome survey and base-level correction. For PacBio long-read sequencing, SMRTbell libraries were constructed from the genomic DNA and sequenced on the PacBio Sequel II platform. In total, approximately 58.98 Gb of Illumina short-read data (71.31× coverage) and 27.35 Gb of PacBio continuous long read (CLR) data (32.65× coverage) were obtained.

To generate a chromosome-level assembly of the yellow-cheek carp genome, a Hi-C library was prepared using DNA extracted from the same individual. After cell crosslinking, cell lysis, chromatin digestion, biotin labelling, proximal chromatin DNA ligation and DNA purification, the resulting Hi-C library was subjected to paired-end sequencing with 150 bp read lengths on an Illumina NovaSeq platform. In total, 151.98 Gb of Hi-C data were obtained, corresponding to 183.78× coverage of the genome.

To aid genome annotation, total RNA from muscle, spleen, gonad and skin was extracted and tested for purity and integrity using a NanoDrop Spectrophotometer (Thermo Fisher Scientific, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). The RNA library was constructed using the NEBNext® Ultra™ RNA Library Prep Kit (Illumina, USA) following the manufacturer’s protocol and sequenced on an Illumina NovaSeq 6000 platform. Finally, 23.74 Gb of data was obtained (Table 1).

Table 1.

Statistics of the sequencing data used for genome assembly.

Libraries	Insert sizes	Clean data (bp)	Sequencing coverage (×)
Illumina	300 bp	58,975,349,100	71.31×
PacBio	10–15 kb	27,351,494,268	32.65×
Hi-C	300 bp	151,983,658,870	183.78×
RNA	300 bp	23,735,378,400	27.81×
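
The coverage column of Table 1 is the amount of clean data divided by a genome size. As a rough cross-check, the sketch below divides each yield by the pre-scaffolding assembly length reported in Table 3 (827,626,473 bp). That denominator is an assumption, since the text does not state which genome size was used; the computed depths fall within a few percent of the reported values rather than matching them exactly.

```python
# Clean-data yields (bp) and reported coverage, copied from Table 1.
clean_bases = {
    "Illumina": 58_975_349_100,
    "PacBio": 27_351_494_268,
    "Hi-C": 151_983_658_870,
    "RNA": 23_735_378_400,
}
reported = {"Illumina": 71.31, "PacBio": 32.65, "Hi-C": 183.78, "RNA": 27.81}

# Assumption: the assembled length from Table 3 stands in for the genome size.
ASSUMED_GENOME_SIZE = 827_626_473

# Depth of coverage = sequenced bases / genome size.
coverage = {lib: bases / ASSUMED_GENOME_SIZE for lib, bases in clean_bases.items()}

for lib, depth in coverage.items():
    print(f"{lib}: {depth:.2f}x computed, {reported[lib]:.2f}x reported")
```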


Genome assembly

First, SOAPnuke (v2.1.0)¹² was used for quality control of the Illumina data, and the clean data were used for genome size estimation. K-mer analysis¹³ was conducted using GCE (v1.0.2). The genome size was estimated to be 786.16 Mb, with a heterozygosity ratio of 0.47% and a repeat sequence ratio of 47.03% (Table 2). A total of 27.35 Gb of PacBio long-read data were used for de novo genome assembly with MECAT2 (v2.0.0)¹⁴ and NextDenovo (v2.4.0). Polishing was then carried out with gcpp (v2.0.2) and Pilon (v1.22)¹⁵. The resulting assembly consists of 170 contigs with a total length of 827.63 Mb (Table 3).

Table 2.

K-mer frequency and genome size evaluation of yellow-cheek carp genome.

K-mer number	K-mer Depth	Genome Size (Mb)	Heterozygous Ratio (%)	Repeat (%)
52,684,645,196	64	786.16	0.47	47.03


Table 3.

Statistics for Hi-C assisted assembly.

	Total length (bp)	Contig number	Contig N50 (bp)	Scaffold number	Scaffold N50 (bp)	Anchored proportion	GC content (%)
Hi-C assisted pre-assembly	827,626,473	170	9,879,208	—	—	—	—
Hi-C-assisted assembly	823,606,315	165	9,879,208	24	33,649,237	99.51%	37.45
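
The anchored proportion in Table 3 follows directly from the two total lengths. The short sketch below recomputes it; reading the five contigs missing from the Hi-C-assisted row as unplaced contigs is an interpretation, not something the table states.

```python
# Values copied from Table 3.
pre_assembly_bp = 827_626_473   # total length before Hi-C scaffolding
anchored_bp = 823_606_315       # length placed on the 24 pseudo-chromosomes
contigs_total, contigs_anchored = 170, 165

anchored_pct = 100 * anchored_bp / pre_assembly_bp
# Interpretation: contigs not counted in the scaffolded row remained unplaced.
unplaced_contigs = contigs_total - contigs_anchored
unplaced_bp = pre_assembly_bp - anchored_bp

print(f"anchored: {anchored_pct:.2f}%")
print(f"unplaced: {unplaced_contigs} contigs, {unplaced_bp:,} bp")
```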


Hi-C scaffolding

Hi-C technology was used for chromosome-level genome assembly. Trimmomatic¹⁶ (parameters: LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:50) was used to remove adapters and low-quality fragments from the raw Hi-C reads. The processed reads were then aligned to the assembly using Juicer (v1.6)¹⁷ with default settings. Contigs were scaffolded using the 3D-DNA pipeline¹⁸ with all valid Hi-C reads. Juicebox (v2.13.07)¹⁷ was used to manually adjust the chromosome-scale scaffolds (Fig. 2, Table 4). The 24 chromosomes contain 141 gaps in total.
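
To make the four trimming parameters concrete, here is a simplified Python sketch of what they ask for, applied to a list of per-base Phred scores. It is an illustration only, not Trimmomatic's implementation: the function name `trim_read` is hypothetical, adapter removal is not modelled, and Trimmomatic's exact choice of cut position inside a failing window may differ.

```python
def trim_read(quals, leading=3, trailing=3, window=4, min_avg=15, min_len=50):
    """Return the (start, end) slice of the read to keep, or None if it is dropped.

    quals is a list of per-base Phred quality scores for one read.
    """
    start, end = 0, len(quals)

    # LEADING:3 - remove bases from the 5' end while their quality is below 3.
    while start < end and quals[start] < leading:
        start += 1

    # TRAILING:3 - remove bases from the 3' end while their quality is below 3.
    while end > start and quals[end - 1] < trailing:
        end -= 1

    # SLIDINGWINDOW:4:15 - scan 5'->3' and cut the read at the first
    # 4-base window whose mean quality falls below 15.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break

    # MINLEN:50 - discard the read if fewer than 50 bases remain.
    if end - start < min_len:
        return None
    return start, end


# Hypothetical read: 2 poor bases, 60 good bases, then a low-quality tail.
example = [2, 2] + [30] * 60 + [5] * 10 + [2] * 3
print(trim_read(example))      # kept slice of the example read
print(trim_read([30] * 40))    # too short after trimming, so dropped
```

In this example the two leading and three trailing bases are removed first, and the window scan then cuts the read where the mean of four consecutive scores first drops below 15.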

Fig. 2 — Genome-wide Hi-C interaction mapping of chromosome sections.

Table 4.

Contig number, length and gap count for each of the 24 assembled chromosomes.

Chromosome ID	Number of Contigs	Length (bp)	Gaps
chr1	10	48,801,470	9
chr2	3	47,476,723	2
chr3	15	43,850,734	14
chr4	14	41,595,563	13
chr5	7	40,868,316	6
chr6	6	36,732,165	5
chr7	8	36,442,319	7
chr8	4	35,157,168	3
chr9	8	35,141,945	7
chr10	4	34,436,776	3
chr11	4	33,649,237	3
chr12	6	33,538,482	5
chr13	7	32,527,850	6
chr14	9	32,137,104	8
chr15	5	31,940,173	4
chr16	3	31,691,200	2
chr17	5	30,801,312	4
chr18	8	30,664,716	7
chr19	3	30,038,157	2
chr20	9	29,852,686	8
chr21	7	27,984,395	6
chr22	5	26,913,480	4
chr23	10	26,690,801	9
chr24	5	24,744,043	4
TOTAL	165	823,676,815	141
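
The scaffold N50 and the totals in Table 4 can be recomputed from the per-chromosome rows. The sketch below does so with a small helper, `n50`, written for this illustration. It also compares the Table 4 total with the anchored length in Table 3: the difference equals 500 bp per gap, which would be consistent with fixed-length gap padding between contigs, although the text does not say so.

```python
# (number of contigs, length in bp) for chr1..chr24, copied from Table 4.
chromosomes = [
    (10, 48_801_470), (3, 47_476_723), (15, 43_850_734), (14, 41_595_563),
    (7, 40_868_316), (6, 36_732_165), (8, 36_442_319), (4, 35_157_168),
    (8, 35_141_945), (4, 34_436_776), (4, 33_649_237), (6, 33_538_482),
    (7, 32_527_850), (9, 32_137_104), (5, 31_940_173), (3, 31_691_200),
    (5, 30_801_312), (8, 30_664_716), (3, 30_038_157), (9, 29_852_686),
    (7, 27_984_395), (5, 26_913_480), (10, 26_690_801), (5, 24_744_043),
]


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 needs at least one sequence")


lengths = [length for _, length in chromosomes]
total_bp = sum(lengths)
contigs = sum(n for n, _ in chromosomes)
gaps = sum(n - 1 for n, _ in chromosomes)  # one gap between adjacent contigs

ANCHORED_BP_TABLE3 = 823_606_315           # anchored length reported in Table 3
extra_per_gap = (total_bp - ANCHORED_BP_TABLE3) / gaps

print(f"scaffold N50: {n50(lengths):,} bp")
print(f"total: {total_bp:,} bp in {contigs} contigs with {gaps} gaps")
print(f"extra length per gap relative to Table 3: {extra_per_gap:.0f} bp")
```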


Repeat annotation

We used de novo prediction and homology comparison to annotate the repetitive sequences in the genome. RepeatModeler¹⁹ was used to detect and classify the repetitive sequences in the genome assembly, using tools including RECON (v1.08)²⁰, RepeatScout (v1.0.5)²¹, LTR-FINDER (v1.0.5)²² and TRF (v4.0.935)²³. For homology comparison, RepeatMasker (open-4.0.9) and RepeatProteinMask (open-4.0.9) were used to identify the known TEs of the yellow-cheek carp genome in the Repbase TE library^24,25 and TE protein database, respectively. The results showed that repetitive sequences totalled 456.66 Mb, accounting for 55.17% of the assembled genome. Among the repeat elements, short interspersed nuclear elements (SINEs) accounted for 0.24% of the genome and long interspersed nuclear elements (LINEs) accounted for 7.67%. Long terminal repeats (LTRs) and DNA elements accounted for 12.31% and 34.87%, respectively (Table 5).

Table 5.

Repetitive elements and their proportions in yellow-cheek carp genome.

Type	Repbase TEs length (bp)	Repbase TEs (%)	Protein TEs length (bp)	Protein TEs (%)	De novo TEs length (bp)	De novo TEs (%)	Combined TEs length (bp)	Combined TEs (%)
DNA	135,569,082	16.38	21,468,489	2.59	208,673,761	25.21	288,628,347	34.87
LINE	17,380,180	2.1	17,851,894	2.16	52,066,672	6.29	63,480,091	7.67
SINE	1,034,564	0.12	0	0	1,364,468	0.16	2,016,734	0.24
LTR	24,846,205	3	19,281,719	2.33	91,771,796	11.09	101,898,770	12.31
Unknown	1,887,900	0.23	6,603	0	44,616,285	5.39	46,455,288	5.61
Total	173,959,113	21.02	58,476,207	7.06	343,673,320	41.52	429,931,954	51.94


Protein-coding gene prediction and annotation

In this study, ab initio gene prediction, homology-based gene prediction and transcript-based prediction were used to predict protein-coding genes in the yellow-cheek carp genome. Prior to gene prediction, the assembled genome was hard- and soft-masked using RepeatMasker. Ab initio gene prediction was performed using Augustus (v3.3.1)^26,27 and Genscan (v1.0)²⁸. The models used for each gene predictor were trained on a set of high-quality proteins generated from the RNA-Seq data. For homology-based prediction, GlimmerHMM (v3.0.4)²⁹ was used to align the protein sequences to our genome assembly and predict coding genes with the default parameters. The reference protein sequences of five fish species (Ctenopharyngodon idella, Sinocyclocheilus grahami, Megalobrama amblycephala, Danio rerio and Cyprinus carpio) were obtained from the NCBI database. For transcript-based prediction, clean RNA-Seq reads were assembled against the yellow-cheek carp genome using StringTie (v2.1.1)³⁰, and gene structures were then determined using PASA (v2.4.1)³¹. To consolidate the results from these three methods, MAKER (v3.00)³² was used to merge and integrate the gene predictions.

For functional annotation of the predicted genes, BLASTP (v2.6.0)^33,34 was used to align them against the Kyoto Encyclopedia of Genes and Genomes (KEGG)³⁵, Gene Ontology (GO)³⁶, NCBI-NR (non-redundant protein database), Swiss-Prot³⁷, TrEMBL³⁸ and InterPro³⁹ databases. In total, we predicted 24,153 protein-coding genes in the genome. These genes have an average coding sequence length of 1,638.21 bp, an average gene length of 18,969.98 bp and an average of 9.87 exons per gene (Table 6). Further, 22,965 genes, accounting for 95.54% of the total number of predicted genes, were assigned at least one functional annotation (Table 7).

Table 6.

Basic statistical results of gene prediction.

Gene set	Number	Average gene length (bp)	Average CDS length (bp)	Average exon number per gene	Average exon length (bp)	Average intron length (bp)
denovo/AUGUSTUS	19,271	19,665.20	1,726.50	10.08	171.34	1,976.46
denovo/GlimmerHMM	54,008	14,259.34	905.18	6.10	148.33	2,617.17
denovo/Genscan	23,400	24,954.02	1,692.64	9.19	184.09	2,838.60
homo/C. carpio	46,149	10,108.37	1,077.86	5.61	91.98	1,957.04
homo/S. grahami	43,803	11,026.80	1,115.46	5.75	193.90	2,085.45
homo/M. amblycephala	47,792	12,277.38	1,201.90	5.81	207.02	2,304.66
homo/D. rerio	45,504	9,494.07	1,020.30	5.28	193.18	1,979.17
homo/C. idella	63,196	7,385.67	972.24	4.59	211.79	1,786.17
trans.orf/RNAseq	15,467	21,165.74	1,680.38	10.78	281.86	1,853.98
PASA	24,038	19,597.60	1,651.11	9.97	257.30	1,898.72
MAKER	24,153	18,969.98	1,638.21	9.87	243.04	1,868.06


Table 7.

Functional annotation statistics.

	Gene number	Percent (%)
Total	24,038	NA
InterPro	20,189	83.99
GO	14,812	61.62
KEGG_ALL	22,561	93.86
KEGG_KO	16,013	66.62
Swissprot	20,884	86.88
TrEMBL	22,382	93.11
NR	22,936	95.42
Annotated	22,965	95.54
Unannotated	1,073	4.46
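
The percentages in Table 7 can be recomputed from the gene counts. A minimal sketch follows; note that the values reproduce when the denominator is the table's own total of 24,038 genes, not the 24,153 genes of the final MAKER set, so the choice of denominator here follows the table rather than the running text.

```python
# Gene counts copied from Table 7.
TOTAL_GENES = 24_038  # "Total" row of Table 7 (denominator used by the table)
annotated = {
    "InterPro": 20_189,
    "GO": 14_812,
    "KEGG_ALL": 22_561,
    "KEGG_KO": 16_013,
    "Swissprot": 20_884,
    "TrEMBL": 22_382,
    "NR": 22_936,
    "Annotated": 22_965,
    "Unannotated": 1_073,
}

# Percentage of genes with a hit in each database, rounded as in the table.
percent = {db: round(100 * n / TOTAL_GENES, 2) for db, n in annotated.items()}

for db, pct in percent.items():
    print(f"{db}: {pct:.2f}%")
```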


Annotation of non-coding RNA genes

tRNAscan-SE (v1.3.1)⁴⁰ with default parameters was used to identify tRNA genes. rRNA sequences of closely related species were downloaded from the Ensembl database and aligned against our genome using BLASTn (v2.6.0)⁴¹ with E-value <1e-5, identity ≥85% and match length ≥50 bp. The miRNAs and snRNAs were identified by searching the Rfam (v14.1) database with Infernal (v1.1.2)⁴² using default parameters. As a result, we annotated 76 rRNAs, 2,469 tRNAs, 291 miRNAs and 212 snRNAs (Table 8).
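
The three rRNA thresholds can be applied as a simple filter over BLAST tabular output. The sketch below assumes the standard 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the source does not say which output format was used, and the example hits, sequence names and helper names are invented for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str       # rRNA reference sequence
    subject: str     # genome sequence it aligned to
    identity: float  # percent identity
    length: int      # alignment length (bp)
    evalue: float


def parse_hit(line: str) -> Hit:
    """Parse one line of 12-column BLAST tabular output."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]))


def passes(hit: Hit, max_evalue=1e-5, min_identity=85.0, min_length=50) -> bool:
    """Thresholds from the text: E-value <1e-5, identity >=85%, length >=50 bp."""
    return (hit.evalue < max_evalue
            and hit.identity >= min_identity
            and hit.length >= min_length)


# Hypothetical hits, one passing and three each failing a different threshold.
example_lines = [
    "rRNA_5S\tchr1\t98.3\t118\t2\t0\t1\t118\t1000\t1117\t3e-52\t205",
    "rRNA_18S\tchr3\t82.0\t400\t72\t0\t1\t400\t5000\t5399\t1e-60\t230",
    "rRNA_28S\tchr7\t95.0\t40\t2\t0\t1\t40\t900\t939\t2e-9\t60",
    "rRNA_5S\tchr9\t90.0\t60\t6\t0\t1\t60\t10\t69\t1e-3\t30",
]

kept = [hit for hit in map(parse_hit, example_lines) if passes(hit)]
for hit in kept:
    print(hit)
```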

Table 8.

Statistics of non-coding RNA annotation.

Type		Copy	AverageLength (bp)	TotalLength (bp)	% of genome
miRNA		291	88.84	25,853	0.0031
tRNA		2,469	75.51	186,428	0.0225
rRNA	rRNA	76	338.30	25,711	0.0031
	18S	4	1,891.75	7,567	0.0009
	28S	2	5,047.50	10,095	0.0012
	5S	70	114.99	8,049	0.0010
snRNA	snRNA	212	128.75	27,295	0.0033
	CD-box	75	108.87	8,165	0.0010
	HACA-box	48	156.56	7,515	0.0009
	splicing	74	132.45	9,801	0.0012
	scaRNA	6	220.00	1,320	0.0002
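
The derived columns of Table 8 follow from the copy numbers and total lengths. The sketch below recomputes them for the four main classes. Using the 827,626,473 bp assembly length from Table 3 as the genome size is an assumption, chosen because it reproduces the reported percentages.

```python
# Assumption: genome size taken as the pre-scaffolding assembly length (Table 3).
GENOME_BP = 827_626_473

# type: (copies, total length bp, reported average length, reported % of genome)
rows = {
    "miRNA": (291, 25_853, 88.84, 0.0031),
    "tRNA": (2_469, 186_428, 75.51, 0.0225),
    "rRNA": (76, 25_711, 338.30, 0.0031),
    "snRNA": (212, 27_295, 128.75, 0.0033),
}

computed = {}
for rna_type, (copies, total_bp, _, _) in rows.items():
    average_length = total_bp / copies          # mean length per copy
    genome_pct = 100 * total_bp / GENOME_BP     # share of the genome, in percent
    computed[rna_type] = (average_length, genome_pct)
    print(f"{rna_type}: mean {average_length:.2f} bp, {genome_pct:.4f}% of genome")
```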


Data Records

All raw sequencing data have been deposited in the NCBI Sequence Read Archive under the accession number SRP470306⁴³. The genome assembly has been deposited at GenBank under the accession GCA_037101425.1⁴⁴. Genome annotations, along with predicted coding sequences and protein sequences, can be accessed through Figshare⁴⁵.

Technical Validation

We assessed the completeness of the genome assembly using BUSCO (v3.0.259)⁴⁶ with the reference Actinopterygii gene set (n = 3,640). The final genome assembly showed a BUSCO completeness of 98.4%, consisting of 3,538 (97.2%) single-copy BUSCOs, 45 (1.2%) duplicated BUSCOs, 26 (0.7%) fragmented BUSCOs, and 31 (0.9%) missing BUSCOs (Table 9). This score is higher than those reported for Squaliobarbus curriculus (95.8%) and Mylopharyngodon piceus (96.0%)⁴⁷, indicating the high quality of the yellow-cheek carp genome assembly.

Table 9.

BUSCO evaluation of the genome assembly.

	Number	Percentage (%)
Complete BUSCOs	3,583	98.4
Complete and single-copy BUSCOs	3,538	97.2
Complete and duplicated BUSCOs	45	1.2
Fragmented BUSCOs	26	0.7
Missing BUSCOs	31	0.9
Total BUSCO groups searched	3,640	100
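
The BUSCO percentages are simple ratios over the number of groups searched. The following sketch recomputes them from the counts in Table 9; the variable names are illustrative.

```python
# Counts copied from Table 9.
busco = {"single": 3_538, "duplicated": 45, "fragmented": 26, "missing": 31}

total = sum(busco.values())                      # groups searched
complete = busco["single"] + busco["duplicated"]  # complete = single + duplicated

pct = {category: round(100 * n / total, 1) for category, n in busco.items()}
complete_pct = round(100 * complete / total, 1)

print(f"complete: {complete} of {total} ({complete_pct}%)")
for category, value in pct.items():
    print(f"{category}: {value}%")
```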


Acknowledgements

This work was supported by the Key Research and Development Program of Hubei Province (2021BBA233 and 2023BBA001). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author contributions

S.L., S.W. and Z.G. conceived this study. S.L., S.Q., Z.S. and Y.H. collected the samples and performed the experiments; S.L. and X.X. performed the research and analyzed the data. S.L. drafted the manuscript. All authors have read and approved the final manuscript.

Code availability

All commands and pipelines used in data processing were executed according to the manual and protocols of the corresponding bioinformatic software. No specific code has been developed for this study.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Zexia Gao, Email: gaozx@mail.hzau.edu.cn.

Shiming Wan, Email: wansm@mail.hzau.edu.cn.

References

1. Zhu NS, Chen HX. Food habits of yellow-cheek carp in Liangzi lake. Acta Hydrobiologica Sinica. 1959;03:262–271.
2. Liang ZS, Yi BL, Yu ZT. Reproductive habits and embryonic development of yellow-cheek carp in the main stream of the Yangtze River and the Han River. Acta Hydrobiologica Sinica. 1984;04:389–403.
3. Ma XF, Wang WM, Yang ZL. Biochemical composition and nutritional characteristics of yellow-cheek carp. Journal of Huazhong Agricultural University. 2008;06:759–762.
4. Yi CP, Zhong CM. Yellow-cheek carp fat content determination and fatty acid composition analysis. Food Science. 2013;14:255–258.
5. Zhang ZQ, et al. Yellow-cheek carp meat rate and muscle nutrient analysis. Tianjin Agricultural Sciences. 2013;04:29–33.
6. Zhu TB, et al. Lushan west sea yellow-cheek carp national aquatic germplasm resources protection zone aquatic biological resources preliminary investigation. Biotic Resources. 2021;02:188–193.
7. Qi XR. Survey of fishery resources in the upper Han River. Journal of Fisheries Research. 2022;01:21–32.
8. Liao F, et al. Complete mitochondrial genome of Elopichthys bambusa (Cypriniformes, Cyprinidae). Mitochondrial DNA. 2016;27:1387–1388. doi: 10.3109/19401736.2014.947593.
9. Han XL, et al. The AFLP analysis of yellow-cheek carp group genetic diversity. Journal of Nanjing Normal University (Natural Science Edition). 2009;01:110–114.
10. Yang W, Fan QX. The specialization breeding technology of yellow-cheek carp. Animals Breeding and Feed. 2011;09:32–33.
11. Chin CS, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods. 2016;13:1050–1054. doi: 10.1038/nmeth.4035.
12. Chen Y, et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience. 2018;7:1–6. doi: 10.1093/gigascience/gix120.
13. Liu B, et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Quantitative Biology. 2013;35:62–67.
14. Xiao CL, et al. MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nat Methods. 2017;14:1072–1074. doi: 10.1038/nmeth.4432.
15. Walker BJ, et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE. 2014;9:e112963. doi: 10.1371/journal.pone.0112963.
16. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
17. Durand NC, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
18. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
19. Flynn JM, et al. RepeatModeler 2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117.
20. Bao Z, Eddy SR. Automated De Novo Identification of Repeat Sequence Families in Sequenced Genomes. Genome Res. 2002;12:1269–1276. doi: 10.1101/gr.88502.
21. Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21:i351–i358. doi: 10.1093/bioinformatics/bti1018.
22. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research. 2007;35:W265–W268. doi: 10.1093/nar/gkm286.
23. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research. 1999;27:573–580. doi: 10.1093/nar/27.2.573.
24. Jurka J, et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–467. doi: 10.1159/000084979.
25. Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends in Genetics. 2000;16:418–420. doi: 10.1016/S0168-9525(00)02093-X.
26. Stanke M, Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Research. 2005;33:W465–W467. doi: 10.1093/nar/gki458.
27. Stanke M, et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research. 2006;34:W435–W439. doi: 10.1093/nar/gkl200.
28. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology. 1997;268:78–94. doi: 10.1006/jmbi.1997.0951.
29. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20:2878–2879. doi: 10.1093/bioinformatics/bth315.
30. Pertea M, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–295. doi: 10.1038/nbt.3122.
31. Haas BJ. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research. 2003;31:5654–5666. doi: 10.1093/nar/gkg770.
32. Cantarel BL, et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18:188–196. doi: 10.1101/gr.6743907.
33. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Local Alignment Search Tool. Journal of Molecular Biology. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
34. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
35. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research. 2012;40:D109–D114. doi: 10.1093/nar/gkr988.
36. Ashburner M, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.
37. Boeckmann B. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research. 2003;31:365–370. doi: 10.1093/nar/gkg095.
38. Bairoch A. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research. 2000;28:45–48. doi: 10.1093/nar/28.1.45.
39. Mitchell A, et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Research. 2015;43:D213–D221. doi: 10.1093/nar/gku1243.
40. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research. 1997;25:955–964. doi: 10.1093/nar/25.5.955.
41. Altschul S. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
42. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25:1335–1337. doi: 10.1093/bioinformatics/btp157.
43. 2023. NCBI Sequence Read Archive. SRP470306
44. 2023. NCBI GenBank. GCA_037101425.1
45. Li S. 2024. Chromosome-level genome assembly of the yellow-cheek carp Elopichthys bambusa. figshare.
46. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution. 2021;38:4647–4654. doi: 10.1093/molbev/msab199.
47. Xu MRX, et al. Maternal dominance contributes to subgenome differentiation in allopolyploid fishes. Nature Communications. 2023;14:8357. doi: 10.1038/s41467-023-43740-y.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

2023. NCBI Sequence Read Archive. SRP470306
2023. NCBI GenBank. GCA_037101425.1
Li S. 2024. Chromosome-level genome assembly of the yellow-cheek carp Elopichthys bambusa. figshare. [DOI] [PMC free article] [PubMed]

Data Availability Statement

All commands and pipelines used in data processing were executed according to the manual and protocols of the corresponding bioinformatic software. No specific code has been developed for this study.

[CR1] 1.Zhu NS, Chen HX. Food habits of yellow-cheek carp in Liangzi lake. Acta Hydrobiologica Sinica. 1959;03:262–271. [Google Scholar]

[CR2] 2.Liang ZS, Yi BL, Yu ZT. Reproductive habits and embryonic development of yellow-cheek carp in the main stream of the Yangtze River and the Han River. Acta Hydrobiologica Sinica. 1984;04:389–403. [Google Scholar]

[CR3] 3.Ma XF, Wang WM, Yang ZL. Biochemical composition and nutritional characteristics of yellow-cheek carp. Journal of Huazhong Agricultural University. 2008;06:759–762. [Google Scholar]

[CR4] 4.Yi CP, Zhong CM. Yellow-cheek carp fat content determination and fatty acid composition analysis. Food Science. 2013;14:255–258. [Google Scholar]

[CR5] 5.Zhang ZQ, et al. Yellow-cheek carp meat rate and muscle nutrient analysis. Tianjin Agricultural Sciences. 2013;04:29–33. [Google Scholar]

[CR6] 6.Zhu TB, et al. Lushan west sea yellow-cheek carp national aquatic germplasm resources protection zone aquatic biological resources preliminary investigation. Biotic Resources. 2021;02:188–193. [Google Scholar]

[CR7] 7.Qi XR. Survey of fishery resources in the upper Han River. Journal of Fisheries Research. 2022;01:21–32. [Google Scholar]

[CR8] 8.Liao F, et al. Complete mitochondrial genome of Elopichthys bambusa (Cypriniformes, Cyprinidae) Mitochondrial DNA. 2016;27:1387–1388. doi: 10.3109/19401736.2014.947593. [DOI] [PubMed] [Google Scholar]

[CR9] 9.Han XL, et al. The AFLP analysis of yellow-cheek carp group genetic diversity. Journal of Nanjing Normal University (Natural Science Edition). 2009;01:110–114. [Google Scholar]

[CR10] 10.Yang W, Fan QX. The specialization breeding technology of yellow-cheek carp. Animals Breeding and Feed. 2011;09:32–33. [Google Scholar]

[CR11] 11.Chin CS, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods. 2016;13:1050–1054. doi: 10.1038/nmeth.4035. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Chen Y, et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience. 2018;7:1–6. doi: 10.1093/gigascience/gix120. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Liu B, et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Quantitative Biology. 2013;35:62–67. [Google Scholar]

14. Xiao CL, et al. MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nat Methods. 2017;14:1072–1074. doi: 10.1038/nmeth.4432.

15. Walker BJ, et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS ONE. 2014;9:e112963. doi: 10.1371/journal.pone.0112963.

16. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.

17. Durand NC, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Systems. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.

18. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.

19. Flynn JM, et al. RepeatModeler 2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117.

20. Bao Z, Eddy SR. Automated De Novo Identification of Repeat Sequence Families in Sequenced Genomes. Genome Res. 2002;12:1269–1276. doi: 10.1101/gr.88502.

21. Price AL, Jones NC, Pevzner PA. De novo identification of repeat families in large genomes. Bioinformatics. 2005;21:i351–i358. doi: 10.1093/bioinformatics/bti1018.

22. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research. 2007;35:W265–W268. doi: 10.1093/nar/gkm286.

23. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research. 1999;27:573–580. doi: 10.1093/nar/27.2.573.

24. Jurka J, et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–467. doi: 10.1159/000084979.

25. Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends in Genetics. 2000;16:418–420. doi: 10.1016/S0168-9525(00)02093-X.

26. Stanke M, Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Research. 2005;33:W465–W467. doi: 10.1093/nar/gki458.

27. Stanke M, et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Research. 2006;34:W435–W439. doi: 10.1093/nar/gkl200.

28. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology. 1997;268:78–94. doi: 10.1006/jmbi.1997.0951.

29. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20:2878–2879. doi: 10.1093/bioinformatics/bth315.

30. Pertea M, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–295. doi: 10.1038/nbt.3122.

31. Haas BJ. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research. 2003;31:5654–5666. doi: 10.1093/nar/gkg770.

32. Cantarel BL, et al. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 2008;18:188–196. doi: 10.1101/gr.6743907.

33. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Local Alignment Search Tool. Journal of Molecular Biology. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.

34. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.

35. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research. 2012;40:D109–D114. doi: 10.1093/nar/gkr988.

36. Ashburner M, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.

37. Boeckmann B. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research. 2003;31:365–370. doi: 10.1093/nar/gkg095.

38. Bairoch A. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research. 2000;28:45–48. doi: 10.1093/nar/28.1.45.

39. Mitchell A, et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Research. 2015;43:D213–D221. doi: 10.1093/nar/gku1243.

40. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research. 1997;25:955–964. doi: 10.1093/nar/25.5.955.

41. Altschul S. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.

42. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25:1335–1337. doi: 10.1093/bioinformatics/btp157.

43. 2023. NCBI Sequence Read Archive. SRP470306

44. 2023. NCBI GenBank. GCA_037101425.1

45. Li S. 2024. Chromosome-level genome assembly of the yellow-cheek carp Elopichthys bambusa. figshare.

46. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution. 2021;38:4647–4654. doi: 10.1093/molbev/msab199.

47. Xu MRX, et al. Maternal dominance contributes to subgenome differentiation in allopolyploid fishes. Nature Communications. 2023;14:8357. doi: 10.1038/s41467-023-43740-y.
