A chromosome-level genome assembly of the darkbarbel catfish Pelteobagrus vachelli

Gaorui Gong; Wensi Ke; Qian Liao; Yang Xiong; Jingqi Hu; Jie Mei

Sci Data. 2023 Sep 8;10:598. doi: 10.1038/s41597-023-02509-0

PMCID: PMC10491679 PMID: 37684295

Abstract

The darkbarbel catfish (Pelteobagrus vachelli), an economically important aquaculture species in China, is extensively used in hybrid yellow catfish production because of its superior growth rate. However, genomic information for this species has been limited, constraining further genetic studies and breeding programs. Using PacBio long-read sequencing and Hi-C technologies, we present a high-quality, chromosome-level genome assembly for the darkbarbel catfish. The resulting assembly spans 692.10 Mb, with 99.9% of the sequence anchored to 26 chromosomes. The contig N50 and scaffold N50 are 13.30 Mb and 27.55 Mb, respectively. The genome is predicted to contain 22,109 protein-coding genes, 96.1% of which have functional annotations. Repeat elements account for approximately 35.79% of the genome. The completeness of the assembly is supported by a BUSCO score of 99.07%. This high-quality genome assembly provides a critical resource for future hybrid catfish breeding, comparative genomics, and evolutionary studies in catfish and other related species.

Subject terms: Genomics, Genome

Background & Summary

Siluriformes, better known as catfishes, constitute a significant portion of the teleosts, making up approximately 11% of all species. With 39 families and around 4,094 species documented to date, this order is among the largest in existence¹. Owing to their palatable meat, few intermuscular bones, and favourable feed conversion ratio, catfishes have become one of the top three farmed fish and shellfish, with a production of 5,519 kilotons in 2017 alone^2,3.

Yellow catfish (Pelteobagrus fulvidraco) is an important aquaculture fish species in China, with a production of about 565 thousand tons in 2020⁴. Darkbarbel catfish (Pelteobagrus vachelli), a close relative of yellow catfish within the family Bagridae, has a faster growth rate than yellow catfish⁵. Recently, hybrid yellow catfish (P. fulvidraco♀ × P. vachelli♂) has been widely cultured in China due to its faster growth rate and better emergence rate⁶. However, genomic resources for the darkbarbel catfish remain scarce, and a high-quality reference genome for this species would benefit breeding programs and other genetic studies. Insights gained from the darkbarbel catfish genome can also support the production of hybrid yellow catfish by improving our understanding of the genetic makeup of the hybrids and the genetic factors underlying their advantageous traits.

In this study, we combined PacBio long-read sequencing and Hi-C technology to generate a high-quality, chromosome-level assembly of the darkbarbel catfish genome. We expect this reference genome to advance population genetic studies and the identification of functional genes associated with economically important traits in the darkbarbel catfish. It should also provide deeper insight into the hybrid vigor observed in catfish hybrids, thereby helping to optimize hybrid breeding strategies.

Methods

Sample collection and sequencing

An adult female darkbarbel catfish was collected from the Yangtze River in Wuhan, Hubei, China. High-molecular-weight (HMW) genomic DNA was extracted from muscle for Illumina sequencing and PacBio SMRT sequencing. The quality and quantity of the extracted DNA were assessed using standard agarose gel electrophoresis and a Qubit fluorometer (Thermo Fisher Scientific, USA).

For Illumina sequencing, the genomic DNA was randomly sheared to ~350 bp fragments, and a paired-end genomic library was prepared following the manufacturer’s protocol. The library was then sequenced on an Illumina HiSeq X-Ten platform using a paired-end 150 bp layout. For PacBio sequencing, the genomic DNA was used to construct SMRTbell libraries following the manufacturer’s protocol, and the libraries were sequenced on a PacBio Sequel platform with SMRT technology. In total, we generated 96.33 Gb of Illumina short-read data and 109.76 Gb of raw PacBio continuous long reads (CLR) with an average read length of 13.8 kb and an N50 read length of 22.4 kb (Table 1).

Table 1.

Statistics of the sequencing data.

Library type	Platform	Tissue	Data size (Gb)	Average length (bp)
WGS short reads	Illumina HiSeq X-Ten	Muscle	96.33	150
WGS long reads	PacBio Sequel II	Muscle	109.76	13,806
Hi-C	Illumina HiSeq X-Ten	Blood	89.12	150
RNA-Seq	Illumina HiSeq X-Ten	Spleen, kidney, brain, muscle, ovary, and liver	13.11	150
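
As a rough orientation, the data volumes in Table 1 can be converted into approximate sequencing depth by dividing them by the final assembly size of 692.10 Mb reported in Table 2. The sketch below does only this arithmetic; the variable and function names are illustrative, and the resulting depths are estimates derived from the published figures, not values stated by the authors.

```python
# Approximate sequencing depth = data volume / assembly size.
# Data sizes (Gb) are copied from Table 1; the assembly size (Mb) from Table 2.
ASSEMBLY_SIZE_MB = 692.10

data_size_gb = {
    "WGS short reads": 96.33,
    "WGS long reads": 109.76,
    "Hi-C": 89.12,
}


def depth(gb: float, assembly_mb: float = ASSEMBLY_SIZE_MB) -> float:
    """Return fold coverage for `gb` gigabases over `assembly_mb` megabases."""
    return gb * 1000.0 / assembly_mb


coverage = {name: depth(gb) for name, gb in data_size_gb.items()}

for name, fold in coverage.items():
    print(f"{name}: ~{fold:.0f}x")
```

Under these assumptions the genomic libraries each correspond to well over 100-fold depth (roughly 139x, 159x, and 129x, respectively).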

For genome scaffolding, a Hi-C library was prepared using a blood sample from the same darkbarbel catfish used for genomic DNA sequencing. The Hi-C library construction, including cell crosslinking, cell lysis, chromatin digestion, biotin labelling, proximal chromatin DNA ligation and DNA purification, was performed as previously described⁷, and the resulting Hi-C library was then subjected to paired-end sequencing with 150 bp read lengths on an Illumina HiSeq X-Ten platform. As a result, 89.12 Gb of Hi-C read data was generated (Table 1).

To aid in genome annotation, total RNA was extracted from multiple tissues, including spleen, kidney, brain, muscle, ovary and liver, and its quality was evaluated using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). The mixed RNA sample was used to construct a cDNA library using the TruSeq Stranded mRNA Library Prep Kit (Illumina, USA) following the manufacturer’s protocol. The library was then sequenced on an Illumina HiSeq X-Ten platform using a paired-end 150 bp layout, and 13.11 Gb of data was obtained (Table 1).

Genome assembly and polishing

To assemble the genome, we used two assemblers: Wtdbg2 v2.5⁸ and Flye v2.9⁹ (Fig. 1a). The assembly generated by each assembler with default parameters was polished using Arrow, a consensus algorithm that generates highly accurate consensus sequences from PacBio subreads. The two polished assemblies were then merged using Quickmerge¹⁰, a tool that combines multiple genome assemblies into a single consensus assembly. The merged assembly was further polished with two rounds of Arrow and two rounds of NextPolish with default parameters (Fig. 1a). We used PacBio subreads for Arrow and Illumina short reads for NextPolish. The resulting assembly consists of 368 contigs and has a total length of 691.96 Mb (Table 2).

Table 2.

Assembly statistics of darkbarbel catfish.

Type	Contig	Scaffold
Number	368	77
N50 (Mb)	13.30	27.55
L50	21	11
Max length (Mb)	25.86	43.66
Total length (Mb)	691.96	692.10
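
For readers unfamiliar with the N50 and L50 statistics in Table 2, the following minimal sketch shows the standard definition. The function name and the toy lengths are hypothetical and are not data from this assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    length; L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must not be empty")


# Toy example (hypothetical lengths): total = 150, half = 75.
# 50 + 40 = 90 reaches the halfway point, so N50 = 40 and L50 = 2.
toy = [50, 40, 30, 20, 10]
print(n50_l50(toy))
```

A larger N50 together with a smaller L50 indicates that more of the assembly is contained in fewer, longer sequences.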

Hi-C scaffolding

The raw Hi-C reads were processed to remove adapters and low-quality bases using Fastp v0.20.1¹¹ with parameters -q 20 -l 50. The processed reads were then aligned to the assembly using the Juicer pipeline¹². The 3D-DNA pipeline¹³ was then used to group the contigs into chromosomes and to order and orient the contigs within each chromosome. To further improve the quality of the assembly, we manually corrected errors using Juicebox Assembly Tools¹². Following scaffolding, 691.40 Mb were anchored to the 26 chromosomes (Fig. 1b), accounting for 99.9% of the total assembly size of 692.10 Mb (Table 2). The observed chromosome number agrees with the karyotype analysis reported in a previous study¹⁴. The scaffold N50 of the final assembly reached 27.55 Mb (Table 2). Notably, 16 of the 26 chromosomes contain no more than 10 gaps (Table 3).
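
The read-trimming step described above could be invoked along the following lines. This is a sketch, not the authors' script: only `-q 20 -l 50` comes from the text, the file names are hypothetical, and `-i/-I/-o/-O` are fastp's paired-end input and output options.

```python
import shlex


def fastp_command(r1, r2, out1, out2, min_qual=20, min_len=50):
    """Build a fastp argument list for paired-end trimming.

    min_qual maps to -q (minimum Phred quality for a base to count as
    qualified) and min_len to -l (reads shorter than this are discarded).
    """
    return [
        "fastp",
        "-i", r1, "-I", r2,      # paired-end inputs
        "-o", out1, "-O", out2,  # paired-end outputs
        "-q", str(min_qual),
        "-l", str(min_len),
    ]


# Hypothetical file names, for illustration only.
cmd = fastp_command(
    "hic_R1.fastq.gz", "hic_R2.fastq.gz",
    "hic_R1.clean.fastq.gz", "hic_R2.clean.fastq.gz",
)
print(shlex.join(cmd))
```

On a machine with fastp installed, the list can be passed to `subprocess.run(cmd, check=True)`; here it is only printed.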

Table 3.

Assembly statistics for chromosomes.

Name	Length (bp)	Gaps
Chromosome 1	43,657,716	13
Chromosome 2	40,896,051	14
Chromosome 3	36,888,217	27
Chromosome 4	33,276,060	5
Chromosome 5	32,707,000	6
Chromosome 6	32,597,123	16
Chromosome 7	31,818,857	42
Chromosome 8	30,665,500	13
Chromosome 9	28,731,355	6
Chromosome 10	28,256,991	15
Chromosome 11	27,553,930	4
Chromosome 12	27,443,794	21
Chromosome 13	26,028,991	1
Chromosome 14	24,604,000	8
Chromosome 15	24,378,500	7
Chromosome 16	23,162,500	35
Chromosome 17	22,852,063	2
Chromosome 18	22,476,563	7
Chromosome 19	22,294,437	3
Chromosome 20	22,045,500	3
Chromosome 21	19,572,866	18
Chromosome 22	19,205,148	1
Chromosome 23	18,096,000	8
Chromosome 24	17,657,601	5
Chromosome 25	17,528,722	1
Chromosome 26	17,005,764	10
Unplaced	703,650	0
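
The figures in Table 3 can be cross-checked against Table 2 with a few sums. The sketch below recomputes the anchored fraction, the number of chromosomes with at most 10 gaps, and the contig count implied by the gaps. The relation contigs = scaffolds + gaps assumes that every unplaced scaffold is a single contig, which the 0 gaps in the "Unplaced" row suggest but the text does not state.

```python
# (length in bp, gaps) for chromosomes 1-26, copied from Table 3.
chromosomes = [
    (43_657_716, 13), (40_896_051, 14), (36_888_217, 27), (33_276_060, 5),
    (32_707_000, 6), (32_597_123, 16), (31_818_857, 42), (30_665_500, 13),
    (28_731_355, 6), (28_256_991, 15), (27_553_930, 4), (27_443_794, 21),
    (26_028_991, 1), (24_604_000, 8), (24_378_500, 7), (23_162_500, 35),
    (22_852_063, 2), (22_476_563, 7), (22_294_437, 3), (22_045_500, 3),
    (19_572_866, 18), (19_205_148, 1), (18_096_000, 8), (17_657_601, 5),
    (17_528_722, 1), (17_005_764, 10),
]
UNPLACED_BP = 703_650  # "Unplaced" row of Table 3
SCAFFOLDS = 77         # scaffold count from Table 2

anchored_bp = sum(length for length, _ in chromosomes)
total_bp = anchored_bp + UNPLACED_BP
anchored_pct = 100 * anchored_bp / total_bp

total_gaps = sum(gaps for _, gaps in chromosomes)
low_gap = sum(1 for _, gaps in chromosomes if gaps <= 10)

# Each gap joins two contigs, so contigs = scaffolds + gaps
# (assumption: unplaced scaffolds are single contigs).
contigs = SCAFFOLDS + total_gaps

print(f"anchored: {anchored_bp:,} bp of {total_bp:,} bp ({anchored_pct:.1f}%)")
print(f"gaps: {total_gaps}; chromosomes with <= 10 gaps: {low_gap}")
print(f"implied contig count: {contigs}")
```

With these values the chromosomes sum to 691,401,249 bp, about 99.9% of the 692,104,899 bp total; the gaps total 291; and 77 + 291 = 368 matches the contig count in Table 2.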

Repeat annotation

RepeatModeler v2.0.2¹⁵ was first used to identify repetitive sequences in the genome assembly; it draws on several tools, including RECON, RepeatScout, TRF, LTR_retriever, and LTRharvest. The identified sequences were clustered and classified into families by RepeatModeler. The classified library was combined with the Teleostei library from Repbase¹⁶. RepeatMasker v4.1.4¹⁷ was then used to mask repetitive sequences in the genome assembly with the combined library. Approximately 35.79% of the genome (247,692,317 bp) was masked as repetitive elements. Retroelements, including long terminal repeats (LTRs, 6.90%), long interspersed nuclear elements (LINEs, 5.87%), and short interspersed nuclear elements (SINEs, 1.01%), collectively make up the largest proportion, occupying 13.77% of the genome (Fig. 2). DNA transposons occupy 11.48% of the genome, and unclassified elements constitute 4.69% (Fig. 2).
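
The repeat percentages can be sanity-checked from the numbers given in the text. The sketch below uses the 692.10 Mb assembly size from Table 2 as the denominator, which is an assumption about how the percentage was computed. The remainder not covered by the three named categories is not broken down in the text; it presumably includes classes such as simple repeats and low-complexity regions, but that attribution is also an assumption.

```python
ASSEMBLY_BP = 692.10e6   # total assembly size from Table 2 (Mb converted to bp)
MASKED_BP = 247_692_317  # masked bases reported in the text

masked_pct = 100 * MASKED_BP / ASSEMBLY_BP

# Percent of genome per class, as reported in the text.
retro = {"LTR": 6.90, "LINE": 5.87, "SINE": 1.01}
dna_transposons = 11.48
unclassified = 4.69

# Summing the rounded components gives about 13.78, one rounding step away
# from the reported 13.77, which was presumably computed from unrounded values.
retro_sum = sum(retro.values())

named_total = retro_sum + dna_transposons + unclassified
other_pct = masked_pct - named_total  # not itemised in the text

print(f"masked: {masked_pct:.2f}%")
print(f"retroelements (sum of rounded parts): {retro_sum:.2f}%")
print(f"remaining masked sequence: {other_pct:.2f}%")
```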

Fig. 2 — Genomic landscape of darkbarbel catfish. Circos plot of darkbarbel catfish illustrating from outside to inside, gene density (a), GC content (b), and the densities of DNA transposons (c), LTRs (d), LINEs (e), and SINEs (f), all represented in 200-kb genomic windows.

Gene prediction and function assignment

We combined transcriptome-based, de novo, and homology-based methods to predict genes in the genome. For transcriptome-based prediction, RNA-seq reads were quality-filtered using Fastp v0.20.1 with parameters -q 20 -l 50. The filtered reads were aligned to the genome assembly using HISAT2 v2.2.1¹⁸ and assembled into transcripts using StringTie v2.2.1¹⁹. Gene structures were then predicted using TransDecoder v5.5.0 (https://github.com/TransDecoder/TransDecoder). For de novo prediction, the BAM files of aligned RNA-seq reads served as input for training AUGUSTUS v3.4.0²⁰ via BRAKER²¹, and the trained model was used to predict gene structures in the genome. For homology-based prediction, we used miniprot v0.11²² to align protein sequences from P. fulvidraco⁷, Silurus meridionalis²³, and Ictalurus punctatus²⁴ to the genome assembly and predicted gene structures from the homologous evidence. EVidenceModeler²⁵ was used to integrate the predictions from these three methods into a final gene set. The final gene set was then functionally annotated by searching several databases: we used BLASTP v2.9.0²⁶ to align the predicted genes against the SwissProt²⁷, TrEMBL²⁸, eggNOG²⁹, and NCBI non-redundant (NR) protein databases.

In total, we predicted 22,109 protein-coding genes in the genome (Fig. 2). The predicted genes have an average coding sequence length of 1,695.04 bp, an average gene length of ~15 kb, and an average of 10 exons. Of the predicted genes, 21,243 (96.1%) were assigned at least one functional annotation (Table 4 and Fig. 3).

Table 4.

Statistics of functional annotation result.

Database	Number
Swiss-Prot	19,983 (90.38%)
TrEMBL	21,229 (96.02%)
NR	21,171 (95.76%)
eggNOG	20,182 (91.28%)
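
The percentages in Table 4 follow directly from the gene counts. The short sketch below recomputes them against the 22,109 predicted genes; the variable names are illustrative.

```python
TOTAL_GENES = 22_109    # predicted protein-coding genes
ANNOTATED_ANY = 21_243  # genes with at least one functional annotation

# Genes with a hit in each database, copied from Table 4.
hits = {
    "Swiss-Prot": 19_983,
    "TrEMBL": 21_229,
    "NR": 21_171,
    "eggNOG": 20_182,
}

pct = {db: round(100 * n / TOTAL_GENES, 2) for db, n in hits.items()}
any_pct = round(100 * ANNOTATED_ANY / TOTAL_GENES, 1)
unannotated = TOTAL_GENES - ANNOTATED_ANY

for db, value in pct.items():
    print(f"{db}: {value:.2f}%")
print(f"annotated in at least one database: {any_pct}%")
print(f"genes without any annotation: {unannotated}")
```

The recomputed values agree with Table 4 and leave 866 genes without a functional annotation in any of the four databases.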

Fig. 3 — Venn diagram of functional annotations from various databases. The Venn diagram displays the overlap and uniqueness of functional gene annotations derived from 4 databases: TrEMBL, NR, Swiss-Prot, and eggNOG.

Genome synteny analysis

To compare whole-genome synteny, two chromosome-level genomes of Bagridae, yellow catfish³⁰ and Chinese longsnout catfish (Leiocassis longirostris)³¹, were aligned to the genome assembly of darkbarbel catfish using LAST v1354³² with default parameters. The synteny was visualized using Circos v0.69.9³³. A high degree of synteny conservation between the compared genomes was observed (Fig. 4).

Fig. 4 — Chromosome sequence synteny comparisons. (a) Syntenic relationship between the darkbarbel catfish genome and the Chinese longsnout catfish genome. (b) Syntenic relationship between the darkbarbel catfish genome and the yellow catfish genome. Each line connects a pair of homologous sequences between the two species.

Data Records

All the raw sequencing data used in this study, including WGS, RNA-Seq, and Hi-C, have been deposited in the NCBI database under the BioProject accession number PRJNA819563. Specifically, the Illumina WGS data were archived with the accession number SRR24926343³⁴, while the PacBio WGS data were deposited with the accession number SRR22354957³⁵. The RNA-Seq and Hi-C data sets were archived under the accession numbers SRR24928263³⁶ and SRR21799063³⁷, respectively. The genome assembly is available at NCBI GenBank under the accession number GCA_030014155.1³⁸. Genome annotations, along with predicted coding sequences and protein sequences, can be accessed through Figshare³⁹.

Technical Validation

To evaluate the completeness of the genome assembly, we used BUSCO v5.4.2⁴⁰ with the Actinopterygii database (actinopterygii_odb10) to assess the presence of conserved single-copy genes in the assembly. Of the 3,640 BUSCO groups searched, 99.07% were identified as complete, indicating that the conserved gene content is almost fully represented. Of the total, 98.54% were complete and single-copy. Only 0.33% of the BUSCOs were fragmented, and 0.60% were missing from the assembly (Table 5). This is one of the best BUSCO scores among reported catfish genomes.

Table 5.

BUSCO assessment result.

Type	Number
Complete BUSCOs	3,606 (99.07%)
Complete and single-copy BUSCOs	3,587 (98.54%)
Complete and duplicated BUSCOs	19 (0.52%)
Fragmented BUSCOs	12 (0.33%)
Missing BUSCOs	22 (0.60%)
Total BUSCO groups searched	3,640
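
The percentages in Table 5 can be recomputed from the raw counts, and the same counts can be written in the compact one-line notation that BUSCO prints in its short summary. The sketch below is illustrative only; the variable names are not part of BUSCO.

```python
# Counts copied from Table 5.
single, duplicated, fragmented, missing = 3587, 19, 12, 22

complete = single + duplicated
total = complete + fragmented + missing


def pct(n: int) -> float:
    """Percentage of all searched BUSCO groups."""
    return 100 * n / total


# Compact notation: C = complete, S = single-copy, D = duplicated,
# F = fragmented, M = missing, n = groups searched.
summary = (
    f"C:{pct(complete):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
    f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{total}"
)
print(summary)
```

The four categories sum to the 3,640 groups searched, and the complete fraction (3,606 of 3,640) corresponds to the 99.07% quoted in the text.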

To assess the quality and accuracy of the genome assembly, we used a two-step validation process. First, the assembly’s Quality Value (QV) was estimated using Merqury⁴¹, giving a QV of 40.89, which indicates high base-level accuracy. Second, we mapped the raw sequencing data back to the assembly. For WGS short reads, we used BWA v0.7.17⁴², which gave a mapping rate of 99.79%. For RNA-Seq reads, we used HISAT2 v2.2.1 and obtained an overall mapping rate of 96.44%.
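
A QV is a Phred-scaled value, so it can be converted into an estimated per-base error rate with QV = -10 · log10(error rate). The sketch below applies that conversion to the reported QV of 40.89; the function names are illustrative, and the resulting error rate is a derived estimate, not a figure reported by the authors.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled value."""
    return -10 * math.log10(p)


qv = 40.89  # QV reported in the text
err = qv_to_error_rate(qv)

print(f"estimated error rate: {err:.2e} per base")
print(f"about one error per {1 / err:,.0f} bases")
print(f"estimated consensus accuracy: {100 * (1 - err):.4f}%")
```

A QV slightly above 40 therefore corresponds to fewer than one consensus error per 10,000 bases.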

Acknowledgements

This work was supported by the China Agricultural Research System (CARS-46). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author contributions

G.G. and J.M. conceived this study. Y.X. and J.H. collected the samples and performed the experiments; G.G., W.K. and Q.L. performed the research and analyzed the data. G.G. drafted the manuscript. All authors have read and approved the final manuscript.

Code availability

No custom code was developed as part of this research. All bioinformatics tools and pipelines were run following the manuals and protocols provided by the respective software developers. The software versions and their parameters are described in the Methods section.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Fricke, R., Eschmeyer, W. N. & Fong, J. D. Eschmeyer’s Catalog of Fishes: Genera/Species by Family/Subfamily. https://research.calacademy.org/research/ichthyology/catalog/SpeciesByFamily.asp. Accessed 15 February 2023.
2. Naylor RL, et al. A 20-year retrospective review of global aquaculture. Nature. 2021;591:551–563. doi: 10.1038/s41586-021-03308-6.
3. Tacon AG. Trends in global aquaculture and aquafeed production: 2000–2017. Rev. Fish. Sci. Aquac. 2020;28:43–56. doi: 10.1080/23308249.2019.1649634.
4. Huang P, et al. Genome-wide association study reveals the genetic basis of growth trait in yellow catfish with sexual size dimorphism. Genomics. 2022;114:110380. doi: 10.1016/j.ygeno.2022.110380.
5. Liu Y, et al. Mitochondrial genome of the yellow catfish Pelteobagrus fulvidraco and insights into Bagridae phylogenetics. Genomics. 2019;111:1258–1265. doi: 10.1016/j.ygeno.2018.08.005.
6. Zhang G, et al. The effects of water temperature and stocking density on survival, feeding and growth of the juveniles of the hybrid yellow catfish from Pelteobagrus fulvidraco (♀) × Pelteobagrus vachelli (♂). Aquac. Res. 2016;47:2844–2850. doi: 10.1111/are.12734.
7. Gong G, et al. Chromosomal-level assembly of yellow catfish genome using third-generation DNA sequencing and Hi-C analysis. Gigascience. 2018;7:giy120. doi: 10.1093/gigascience/giy120.
8. Ruan J, Li H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods. 2020;17:155–158. doi: 10.1038/s41592-019-0669-3.
9. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 2019;37:540–546. doi: 10.1038/s41587-019-0072-8.
10. Chakraborty M, Baldwin-Brown JG, Long AD, Emerson J. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Res. 2016;44:e147. doi: 10.1093/nar/gkw654.
11. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
12. Durand NC, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
13. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
14. Zhang J, et al. Comparative analysis of the karyotype and nutritional ingredient for the hybrids of Pelteobagrus fulvidraco (♀) × P. vachelli (♂) and their parental fish. Mar. Fish. 2017;39:149–161.
15. Flynn JM, et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117.
16. Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 2015;6:1–6. doi: 10.1186/s13100-015-0041-9.
17. Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics. 2004;5:4.10.1–4.10.14. doi: 10.1002/0471250953.bi0410s05.
18. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4.
19. Pertea M, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 2015;33:290–295. doi: 10.1038/nbt.3122.
20. Stanke M, et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:W435–W439. doi: 10.1093/nar/gkl200.
21. Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 2016;32:767–769. doi: 10.1093/bioinformatics/btv661.
22. Li H. Protein-to-genome alignment with miniprot. Bioinformatics. 2023;39:btad014. doi: 10.1093/bioinformatics/btad014.
23. Zheng S, et al. Chromosome-level assembly of southern catfish (Silurus meridionalis) provides insights into visual adaptation to nocturnal and benthic lifestyles. Mol. Ecol. Resour. 2021;21:1575–1592. doi: 10.1111/1755-0998.13338.
24. Liu Z, et al. The channel catfish genome sequence provides insights into the evolution of scale formation in teleosts. Nat. Commun. 2016;7:11757. doi: 10.1038/ncomms11757.
25. Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9:1–22. doi: 10.1186/gb-2008-9-1-r7.
26. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
27. Bairoch A, Boeckmann B. The SWISS-PROT protein sequence data bank: current status. Nucleic Acids Res. 1994;22:3578.
28. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000;28:45–48. doi: 10.1093/nar/28.1.45.
29. Huerta-Cepas J, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;47:D309–D314. doi: 10.1093/nar/gky1085.
30. Gong G, et al. Origin and chromatin remodeling of young X/Y sex chromosomes in catfish with sexual plasticity. Natl. Sci. Rev. 2023;10:nwac239. doi: 10.1093/nsr/nwac239.
31. He WP, et al. Chromosome-level genome assembly of the Chinese longsnout catfish Leiocassis longirostris. Zool. Res. 2021;42:417–422. doi: 10.24272/j.issn.2095-8137.2020.327.
32. Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011;21:487–493. doi: 10.1101/gr.113985.110.
33. Krzywinski M, et al. Circos: An information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109.
34. 2023. NCBI Sequence Read Archive. SRR24926343
35. 2022. NCBI Sequence Read Archive. SRR22354957
36. 2023. NCBI Sequence Read Archive. SRR24928263
37. 2022. NCBI Sequence Read Archive. SRR21799063
38. Gong G. 2023. GenBank. GCA_030014155.1
39. Gong G. 2023. Genome annotations of darkbarbel catfish (Pelteobagrus vachelli). figshare.
40. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 2021;38:4647–4654. doi: 10.1093/molbev/msab199.
41. Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21:1–27. doi: 10.1186/s13059-020-02134-9.
42. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
