GigaByte. 2025 May 9;2025:gigabyte155. doi: 10.46471/gigabyte.155

Chromosome-level genome assemblies of five Sinocyclocheilus species

Chao Bian 1, Ruihan Li 2, Yuqian Ouyang 1, Junxing Yang 3, Xidong Mu 4,*, Qiong Shi 1,5,*
PMCID: PMC12089701  PMID: 40395690

Abstract

Sinocyclocheilus is a genus of tetraploid fishes endemic to the karst regions of Southwest China, and its members are classified as second-class nationally protected species because of their fragile habitats. Limited high-quality genomic resources have hampered studies of their phylogenetic relationships and the origin of their polyploidy. Here, we present a high-quality genome assembly of the most abundant Sinocyclocheilus species, the golden-line barbel (Sinocyclocheilus grahami), obtained by integrating PacBio long-read and Hi-C sequencing. The resulting scaffold-level genome assembly is 1.6 Gb long, with a scaffold N50 of 30.7 Mb. We annotated 42,806 protein-coding genes. In addition, 93.1% of the assembled genome sequences (about 1.5 Gb) and 93.8% of the predicted genes were anchored onto 48 chromosomes. Furthermore, we obtained chromosome-level genome assemblies for four other Sinocyclocheilus species (S. anophthalmus, S. maitianheensis, S. anshuiensis, and S. rhinocerous) based on homologous comparisons. These genomic resources will enable in-depth investigations of cave adaptation, improvement of economic value, and conservation of diverse Sinocyclocheilus fishes.

Introduction

Sinocyclocheilus (order: Cypriniformes; family: Cyprinidae; subfamily: Barbinae) is a genus of tetraploid fishes endemic to the karst regions of the Yunnan-Guizhou plateau and surrounding areas in Southwest China, including Guangxi, Guizhou, Yunnan, and Hubei provinces [1]. All members of this genus are classified as second-class protected species, highlighting the urgent need for their conservation and further investigation. Despite recent research and development efforts, such as an artificial breeding program for S. yunnanensis to prevent extinction [2], many other species, particularly those with small populations and limited distributions, remain threatened.

Due to long-term geographic isolation, Sinocyclocheilus has undergone extensive speciation, making it the most species-rich genus of cavefish, with 76 known members [1]. Members of this genus occupy a range of habitats, from surface-dwelling to semi-cave-dwelling and cave-restricted. These distinct habitat types have led to diverse traits in morphology, behavior, and physiology [3], making these fishes good models for studying cave adaptation and phylogenetic evolution. Although Sinocyclocheilus is of significant scientific interest, high-quality genomic resources and whole genome-based comparative studies remain rare for this genus. The lack of genomic information hinders a deeper understanding of key evolutionary questions, such as phylogenetic relationships, the origin of polyploidy, and the evolution of ancestral chromosomes within this genus.

To enrich the genetic resources for Sinocyclocheilus members, we constructed a chromosome-level genome assembly for the most abundant and surface-dwelling representative, S. grahami (NCBI:txid75366, locally named golden-line barbel), using PacBio and Hi-C sequencing technologies. We subsequently conducted a homologous comparison to obtain chromosome-level genome assemblies for four other Sinocyclocheilus species: surface-dwelling S. maitianheensis (NCBI:txid307951), semi-cave-dwelling S. rhinocerous (NCBI:txid307959), and cave-restricted S. anshuiensis (NCBI:txid1608454) and S. anophthalmus (NCBI:txid307955). To verify the allotetraploid origin of Sinocyclocheilus, we conducted a genome-wide synteny analysis between S. grahami and its relative, the common carp (Cyprinus carpio). The analysis revealed extensive chromosomal rearrangements and supported the allotetraploid origin of the Sinocyclocheilus genus. The genomic data presented in this paper provide valuable genetic resources for a deeper investigation into the mechanisms of cave adaptation, and for exploring the potential economic and ecological value of various Sinocyclocheilus species.

Methods

Sample collection, DNA extraction, and genome and transcriptome sequencing

A muscle sample of artificially bred S. grahami was collected from the Endangered Fish Conservation Center of Kunming Institute of Zoology, which is located in Kunming City, Yunnan Province, China. Genomic DNA (gDNA) and total RNA were extracted using the Nucleic Acid Kit (Qiagen, Germantown, MD, USA) and TRIzol Reagent (Invitrogen, Carlsbad, CA, USA), respectively, following the manufacturer’s instructions.

Multiple sequencing strategies were applied to construct a whole-genome assembly of S. grahami. In brief, the draft genome assembly based on Illumina sequencing technology (Illumina Inc., San Diego, CA, USA) was taken from our previous study [4]. The genomic DNA from muscle tissue in our present study was used to construct a SMRTbell library with an insert size of 20 kb, and this library was subsequently sequenced on a PacBio Sequel platform (Pacific Biosciences, Menlo Park, CA, USA). For the construction of a chromosome-level genome assembly, a Hi-C (high-throughput chromosome conformation capture) library was generated for sequencing on an Illumina HiSeq X-Ten platform. In addition, a paired-end library with an insert size of 400 bp was constructed from the extracted gDNA and then sequenced on an Illumina HiSeq X-Ten platform for genome size estimation. Adapters, duplicated reads, and low-quality reads with 10 or more N bases were removed by SOAPfilter v2.2 [5]. For transcriptome sequencing (to support gene annotation), a paired-end library with an insert size of 350 bp was generated and then sequenced on an Illumina HiSeq X-Ten platform. Raw data were filtered by SOAPnuke v1.0 (RRID:SCR_015025) [6].

Genome size estimation and chromosome-level genome assembly

A 17-mer frequency distribution, confirmed to follow a Poisson pattern [7], was used to estimate the genome size of S. grahami from the short-insert (400 bp) library. The genome size was calculated as follows [4]: $\text{Genome Size} = K_{\text{num}} / K_{\text{depth}}$, where $K_{\text{num}}$ is the number of 17-mers and $K_{\text{depth}}$ is the sequencing depth at the main peak of the frequency distribution.
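The formula above can be illustrated with a minimal Python sketch. It is not the authors' script: the function name, the histogram layout (depth mapped to the number of distinct k-mers at that depth), and the `min_depth` cutoff used to skip low-depth error k-mers are all assumptions made for illustration, and the toy numbers are invented.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as K_num / K_depth from a k-mer depth histogram.

    histogram maps a k-mer depth to the number of distinct k-mers observed
    at that depth. min_depth is a hypothetical cutoff that drops low-depth
    k-mers, which are usually dominated by sequencing errors.
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # K_depth: the depth at which the histogram peaks.
    k_depth = max(usable, key=usable.get)
    # K_num: total number of k-mer occurrences (depth times distinct count).
    k_num = sum(d * n for d, n in usable.items())
    return k_num / k_depth


# Invented toy histogram with an error tail at depths 1-2 and a peak at 20.
toy = {1: 1000, 2: 300, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50}
print(estimate_genome_size(toy))
```

Real k-mer histograms from a tetraploid genome often show more than one peak, so choosing the main peak may need manual inspection rather than a simple maximum.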

Based on our published contigs of S. grahami [4], we performed a hybrid genome assembly by combining these short contigs with PacBio long reads into primary scaffolds using DBG2OLC v1.1 [8] with default parameters. These scaffolds were subsequently extended using SSPACE v2.0 (RRID:SCR_005056) [9]. Minimap (RRID:SCR_018550) [10] and Racon (RRID:SCR_017642) [11] were then employed for two rounds of error correction against the PacBio long reads to obtain the final scaffolds.

The Hi-C raw reads were mapped onto these scaffolds by Bowtie2 (RRID:SCR_016368) [12], and quality control was performed using HiC-Pro v2.8.0 (RRID:SCR_017643) [13] to obtain data for generating a genome-wide interaction matrix. Juicer v1.5 (RRID:SCR_017226) [14] and 3D-DNA de novo v170123 [15] were used to arrange and orient scaffolds into chromosomes. A Hi-C heatmap was drawn by HiCPlotter v0.6.6 [16] for visualization.

Annotation of repeat sequences, genes, and gene functions

Three prediction methods were combined for the annotation of repeat sequences, including de novo, homolog-based, and tandem repeat prediction. A de novo repeat library was built using RepeatModeler v1.04 (RRID:SCR_015027) [17] and LTR_FINDER v1.0.6 (RRID:SCR_015247) [18]. Genome sequences were mapped onto this library to identify repeat sequences using RepeatMasker v4.06 (RRID:SCR_012954) [19]. For homolog-based predictions, transposable elements were identified using RepeatMasker v4.06 and RepeatProteinMask v4.06 [19] based on the Repbase TE v21.01 (RRID:SCR_021169) [20] library. Tandem repeat sequences were finally identified by Tandem Repeats Finder v4.09 (TRF, RRID:SCR_022065) [21].

Protein-coding genes were predicted by integrating three methods: homology-based annotation, de novo prediction, and transcriptome-based annotation. Protein sequences of five representative teleost species were downloaded from NCBI [22] for genome-wide mapping onto S. grahami: zebrafish (Danio rerio, NCBI accession: GCF_000002035.6), medaka (Oryzias latipes, NCBI accession: GCF_002234675.1), S. anshuiensis, S. rhinocerous and S. grahami (the primary genome assemblies using Illumina data). BLAT (RRID:SCR_011919) [23] and GeneWise v2.4.2 (RRID:SCR_015054) [24] were used for sequence alignment and gene structure prediction. Augustus v3.2.1 (RRID:SCR_008417) [25] was used to de novo predict coding sequences (CDS) after the repeat elements were masked. Hisat v0.1.6 (RRID:SCR_015530) [26] and Cufflinks v2.2.1 (RRID:SCR_014597) [27] were employed to perform the transcriptome-based annotation. Finally, a non-redundant gene set was merged by MAKER v2.31.8 (RRID:SCR_005309) [28]. For function annotation, we searched four public databases (Swiss-Prot [29], TrEMBL [29], InterPro [30] and KEGG [31]) as references to complete the annotation of gene functions.

Pseudochromosome construction for four additional scaffold-level assemblies of Sinocyclocheilus fishes

The typical chromosome number of Sinocyclocheilus fishes is 2n = 96 [32]. Pairwise whole-genome alignments against the reference chromosome-level assembly of S. grahami [4] were used to construct pseudochromosomes from the scaffold-level assemblies of four Sinocyclocheilus fishes (three previously published genome assemblies and one unpublished assembly): S. anshuiensis (GCF_001515605.1), S. rhinocerous (GCF_001515625.1) [4], S. maitianheensis (GCA_018148995.1) [33], and S. anophthalmus (GCA_044706345.1) [34].

Lastz v1.1 (RRID:SCR_018556) [35] was used to generate the genome alignments. Aligned sequences longer than 10 kb were retained for pseudochromosome construction. Synteny blocks within each of the five Sinocyclocheilus genomes were identified using MCScanX (RRID:SCR_022067) [36] after aligning each species' protein set against itself using BLAST [37] with an E-value threshold of 1 × 10⁻⁵. Circos figures were plotted using Circos (RRID:SCR_011798) [38].
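The text gives only the 10 kb length filter, not the rule used to place scaffolds, so the following Python sketch is one plausible reading rather than the authors' procedure. It assumes a hypothetical input of (scaffold, reference chromosome, reference start, reference end) tuples, assigns each scaffold to the reference chromosome with the greatest retained aligned length, and orders scaffolds by a length-weighted mean start position. All names are illustrative.

```python
from collections import defaultdict

MIN_ALIGNED_BP = 10_000  # alignments must be longer than 10 kb, as in the text


def assign_scaffolds(alignments, min_len=MIN_ALIGNED_BP):
    """Map each scaffold to (best reference chromosome, anchor position).

    alignments: iterable of (scaffold, ref_chrom, ref_start, ref_end).
    Assumption: the best chromosome is the one with the largest total
    aligned length among alignments that pass the length filter.
    """
    totals = defaultdict(lambda: defaultdict(int))
    hits = defaultdict(list)
    for scaffold, chrom, start, end in alignments:
        length = end - start
        if length <= min_len:
            continue  # drop short alignments
        totals[scaffold][chrom] += length
        hits[(scaffold, chrom)].append((start, length))

    placement = {}
    for scaffold, per_chrom in totals.items():
        best = max(per_chrom, key=per_chrom.get)
        kept = hits[(scaffold, best)]
        # Length-weighted mean start on the reference, used only for ordering.
        anchor = sum(s * l for s, l in kept) / sum(l for _, l in kept)
        placement[scaffold] = (best, anchor)
    return placement


def order_pseudochromosomes(placement):
    """Group placed scaffolds by chromosome and sort them by anchor position."""
    grouped = defaultdict(list)
    for scaffold, (chrom, anchor) in placement.items():
        grouped[chrom].append((anchor, scaffold))
    return {chrom: [s for _, s in sorted(items)] for chrom, items in grouped.items()}


# Invented toy alignments: s3 has no alignment above 10 kb and stays unplaced.
toy = [
    ("s1", "chr1", 100_000, 150_000),
    ("s1", "chr2", 0, 5_000),
    ("s2", "chr1", 0, 20_000),
    ("s3", "chr1", 0, 8_000),
]
print(order_pseudochromosomes(assign_scaffolds(toy)))
```

A production workflow would also need to decide scaffold orientation from alignment strand, which this sketch omits.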

Subgenome identification in S. grahami and phylogenetic analysis

The common carp (Cyprinus carpio) is a well-known allotetraploid species [39] that shares a recent whole-genome duplication event with Sinocyclocheilus species, as we reported previously [33]. Subgenomes A and B of the common carp [40] were used as references to identify the corresponding synteny blocks in the genomes of goldfish (Carassius auratus), S. grahami, and S. anophthalmus for subsequent subgenome construction using MUMmer v4.0beta1 (RRID:SCR_018171) [41]. RectChr (RRID:SCR_026859) [42] was used to visualize the synteny blocks and chromosome structure variations.

For the phylogenetic analysis, BLASTp (RRID:SCR_001010) [37] and OrthoMCL (RRID:SCR_007839) [43] were used for protein sequence alignment and gene family clustering. All single-copy orthologous genes across the genomes and subgenomes were aligned using MUSCLE v3.8.31 (RRID:SCR_011812) [44]. Subsequently, Gblocks (RRID:SCR_015945) [45] was used to extract the conserved blocks of the multiple sequence alignments. Finally, we used PhyML v3.0 (RRID:SCR_014629) [46] to construct a phylogenetic tree using the maximum likelihood method. MCMCtree in the PAML package (RRID:SCR_014932) [47] was used to estimate the divergence times among the above-mentioned fishes and other representative species.
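The step of picking single-copy orthologous genes from clustered gene families can be sketched in Python as follows. This is an illustration only: the input layout (family identifier mapped to a list of (taxon, gene) pairs) is a hypothetical simplification and not the OrthoMCL output format, and the function name is invented.

```python
def single_copy_families(families, taxa):
    """Return families that contain exactly one gene from every taxon.

    families: dict mapping a family ID to a list of (taxon, gene_id) pairs.
    taxa: the genomes or subgenomes that must each contribute one gene.
    """
    wanted = set(taxa)
    selected = {}
    for family_id, members in families.items():
        counts = {}
        for taxon, _gene in members:
            counts[taxon] = counts.get(taxon, 0) + 1
        # Keep the family only if every taxon is present exactly once.
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected[family_id] = dict(members)
    return selected


# Invented toy clusters: only f1 is single-copy in both taxa.
toy = {
    "f1": [("A", "a1"), ("B", "b1")],
    "f2": [("A", "a2"), ("A", "a3"), ("B", "b2")],
    "f3": [("A", "a4")],
}
print(single_copy_families(toy, ["A", "B"]))
```

Treating each subgenome as its own taxon, as the text describes for the tetraploid species, lets a gene pair retained after duplication count as one copy per subgenome.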

Results and discussion

Chromosome-level genome assemblies of the five Sinocyclocheilus species

A total of 86.4 Gb, 79.1 Gb, and 229.0 Gb of Illumina, PacBio, and Hi-C reads, respectively, were sequenced. We constructed a chromosome-level genome assembly for S. grahami using PacBio and Hi-C sequencing technologies. Based on the k-mer analysis, we estimated the genome size to be 1.9 Gb. The final chromosome-level genome assembly of S. grahami is 1.6 Gb, with a contig N50 of 738.5 kb and a scaffold N50 of 30.7 Mb. About 93.1% of the assembled genome sequences (1.5 Gb) and 93.8% of the predicted genes were anchored onto 48 chromosomes (Figure 1A–B). In the BUSCO assessment (RRID:SCR_015008), 85.0% (3,093) of the BUSCO genes were complete, with 60.6% (2,205) identified as single-copy and 24.4% (888) as duplicated, and only 3.6% (131) were fragmented.
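For readers unfamiliar with the N50 statistic reported above, here is a minimal Python sketch of its standard definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequences given")


# Invented toy assembly: total 20, half 10; 6 + 5 = 11 reaches it at length 5.
print(n50([2, 3, 4, 5, 6]))  # 5
```

The same function applies to contigs and scaffolds alike; only the input lengths differ.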

Figure 1.

(A) Circos atlas of the reference chromosome-level genome assembly of S. grahami. Rings from outside to inside show chromosome length (Mb), gene density in each 100-kb non-overlapping genomic window, SNP density in each 100-kb non-overlapping genomic window, GC content in each 100-kb non-overlapping genomic window, and internal syntenic blocks between chromosomes, connected by green lines. Red lines mark a special syntenic block between chromosome 1 and chromosome 3. (B) Genome-wide Hi-C heatmap of the S. grahami genome. (C) Circos atlases of the chromosome-level genome assemblies of four Sinocyclocheilus species. (D–F) Two chromosomal fusions, five chromosomal translocations, and eighteen chromosomal inversion events between S. grahami (top) and C. carpio (bottom). (G) A phylogenetic tree of seven vertebrate genomes and eight sub-genomes of tetraploid species. The orange box represents the clade of sub-genome A; the blue box marks the clade of sub-genome B; the purple box highlights a clade homologous to the ancestors of sub-genome A. Divergence times are numbered in blue, and a geological time scale in millions of years ago is provided.

A total of 583.2 Mb of repeat sequences were annotated (Table 1). A total of 42,806 protein-coding genes were annotated from the S. grahami genome assembly (Table 2), and 39,458 of them (92.2%) were assigned functions. The detailed functional annotation results are shown in Table 3. We also constructed chromosome-level genome assemblies for the other four Sinocyclocheilus species based on homologous comparisons (Figure 1C). Over 82% of the genome sequences of each of the four species were anchored onto the constructed chromosomes.

Table 1.

Statistics of repeat sequences in the S. grahami genome.

| Type | Repeat size (bp) | % of genome |
| --- | --- | --- |
| ProteinMask | 72,493,669 | 4.6 |
| RepeatMasker | 330,527,709 | 20.8 |
| TRF | 33,729,084 | 2.1 |
| De novo | 372,094,752 | 23.4 |
| Total | 583,165,599 | 36.7 |
| DNA | 374,708,178 | 23.6 |
| LINE | 105,180,165 | 6.6 |
| SINE | 5,429,507 | 0.3 |
| LTR | 145,884,791 | 9.2 |
| Other | 4,064 | 0 |
| Unknown | 2,578,927 | 0.2 |
| Total | 547,757,670 | 34.5 |
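The four per-method rows in Table 1 add up to about 808.8 Mb, which is more than the reported total of 583.2 Mb. That pattern is what one would expect if bases flagged by several methods are counted once in the total. The paper does not state how the total was computed, so the following Python sketch of a non-redundant interval union is an assumption, with an invented function name and toy coordinates.

```python
def merged_length(intervals):
    """Total number of bases covered by half-open (start, end) intervals.

    Overlapping or nested intervals are merged first, so each base is
    counted once regardless of how many predictions cover it.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


# Invented toy predictions on one sequence from two hypothetical methods.
method_a = [(0, 100), (200, 250)]
method_b = [(50, 150)]
print(merged_length(method_a + method_b))  # 200, not 100 + 50 + 100 = 250
```

In practice the merge would be run per chromosome or scaffold and the per-sequence totals summed.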

Table 2.

Protein-coding gene annotation of S. grahami genome.

| Method | Software or species | Gene number | Average transcript length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| De novo | Augustus | 47,723 | 20,378 | 1,148 | 7.6 | 150.9 | 2,911 |
| Homolog | Danio rerio | 64,268 | 36,029 | 1,523 | 11.4 | 134.1 | 3,333 |
| Homolog | Oryzias latipes | 29,532 | 29,701 | 1,637 | 12.5 | 131.2 | 2,445 |
| Homolog | Sinocyclocheilus anshuiensis | 55,080 | 19,982 | 1,644 | 12.6 | 130.3 | 1,579 |
| Homolog | Sinocyclocheilus rhinocerous | 57,118 | 21,844 | 1,631 | 12.5 | 130.6 | 1,760 |
| Homolog | Sinocyclocheilus grahami | 49,556 | 29,535 | 1,507 | 11.1 | 135.5 | 2,768 |
| Transcriptome | | 31,114 | 9,515 | 1,628 | 7.6 | 214.7 | 1,198 |
| Consensus | MAKER | 42,806 | 18,370 | 1,331 | 8.9 | 148.2 | 1,984 |

Note: For the homolog-based annotation, we used the genome and gene set of S. grahami that were assembled from Illumina data [4].

Table 3.

The number of functional assignments from diverse databases.

| | Number | Percentage (%) |
| --- | --- | --- |
| Total | 42,806 | 100 |
| InterPro | 29,358 | 68.6 |
| KEGG | 34,734 | 81.1 |
| Swiss-Prot | 33,908 | 79.2 |
| TrEMBL | 38,498 | 89.9 |
| Annotated | 39,458 | 92.2 |
| Unannotated | 3,348 | 7.8 |

Allotetraploid origin of diverse Sinocyclocheilus members

To confirm that Sinocyclocheilus fishes have an allotetraploid origin, we performed a genome-wide synteny analysis of S. grahami and C. carpio (Figures 1D–F and 2). Compared with the common carp, 18 large chromosomal rearrangements were observed in the S. grahami genome, including two chromosomal fusions (Figures 1D and 2), five chromosomal translocations (Figures 1E and 2), and eighteen chromosomal inversions (Figures 1F and 2). Among them, chromosome 1 of S. grahami was homologous to chromosomes A22 and A14 of the common carp, and chromosome 3 of S. grahami was homologous to chromosomes B22 and A25 of the common carp. These two fusions account for S. grahami having two fewer chromosomes than the common carp.
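One simple way to flag candidate fusions from synteny blocks is to look for a query chromosome whose syntenic length is split between two or more reference chromosomes. The Python sketch below illustrates that idea only; it is not the authors' method, the 20% `min_fraction` cutoff is an arbitrary assumption, and the block lengths are invented (the chromosome names echo the example in the text).

```python
from collections import defaultdict


def candidate_fusions(blocks, min_fraction=0.2):
    """Find query chromosomes with substantial synteny to 2+ reference chromosomes.

    blocks: iterable of (query_chrom, ref_chrom, block_length).
    min_fraction: hypothetical cutoff; a reference chromosome counts as a
    major partner if it holds at least this share of the query chromosome's
    total syntenic length.
    """
    per_query = defaultdict(lambda: defaultdict(int))
    for query, ref, length in blocks:
        per_query[query][ref] += length

    flagged = {}
    for query, refs in per_query.items():
        total = sum(refs.values())
        major = sorted(r for r, n in refs.items() if n / total >= min_fraction)
        if len(major) >= 2:
            flagged[query] = major
    return flagged


# Invented block lengths; only chr1 is split between two carp chromosomes.
toy = [
    ("chr1", "A22", 600),
    ("chr1", "A14", 400),
    ("chr2", "A1", 950),
    ("chr2", "B3", 50),
]
print(candidate_fusions(toy))  # {'chr1': ['A14', 'A22']}
```

A translocation produces the same signature, so flagged chromosomes would still need to be checked against chromosome counts and block positions to tell the two apart.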

Figure 2.

Genome synteny of S. grahami (top) and C. carpio (bottom).

Based on our synteny results, we renumbered the chromosomes of S. grahami and divided them into two sub-genomes. Two sub-genomes were likewise identified in the other four Sinocyclocheilus members and in goldfish for the phylogenetic analysis. In the resulting phylogenetic tree, the sub-genome A sequences clustered into a single branch; the branch of sub-genome B was homologous to the ancestors of O. macrolepis, P. huangchuchieni, and P. tetrazona (Figure 1G), consistent with patterns from earlier reports [38, 48].

Conclusion

We constructed chromosome-level genome assemblies of five Sinocyclocheilus species. These reference genomic data are valuable resources for in-depth studies of the phylogenetic evolution and biodiversity of various Sinocyclocheilus species, and lay a solid foundation for understanding cave adaptation and cavefish biology. Our study can also contribute to species conservation and to the exploitation of the potential economic and ecological value of diverse Sinocyclocheilus members.

Funding Statement

This study was supported by the National Key Research and Development Program of China (2023YFE0205100), Shenzhen Natural Science Foundation (no. JCYJ20241202124511016), Key Laboratory of Tropical and Subtropical Fishery Resources Application and Cultivation, Ministry of Agriculture and Rural Affairs, Pearl River Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou, 51038, PR China (20220202).

Data availability

The genome assembly of S. grahami was uploaded to NCBI under the BioProject PRJNA1172646, and the genome assembly of S. anophthalmus is available under the BioProject PRJNA669129. The PacBio, Hi-C, and transcriptome reads are deposited in NCBI with accession numbers SRR32815372, SRR32815371, and SRR32815370, respectively. All other data, including the repeat and gene annotations, have been shared via the GigaDB repository [49], with separate entries for the individual species genomes [34, 50–53].

Abbreviations

CDS, coding sequences; gDNA, genomic DNA; LINE, long interspersed nuclear element; LTR, long terminal repeat; SINE, short interspersed nuclear element; TRF, Tandem Repeats Finder.

Declarations

Ethics approval and consent to participate

The authors declare that ethical approval was not required for this type of research.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

QS and JY conceived the study and designed the project. YO and JY managed the project and prepared samples. CB, RL, YO, and XM performed data analysis and wrote the manuscript. QS and JY revised the manuscript. All authors contributed to data interpretation.

References

1. Wen H, Luo T, Wang Y et al. Molecular phylogeny and historical biogeography of the cave fish genus Sinocyclocheilus (Cypriniformes: Cyprinidae) in southwest China. Integr. Zool., 2022; 17(2): 311–325. doi: 10.1111/1749-4877.12624.
2. Yin YH, Zhang XH, Wang XA et al. Construction of a chromosome-level genome assembly for genome-wide identification of growth-related quantitative trait loci in Sinocyclocheilus grahami (Cypriniformes, Cyprinidae). Zool. Res., 2021; 42(3): 262–266. doi: 10.24272/j.issn.2095-8137.2020.321.
3. Krishnan J, Rohner N. Cavefish and the basis for eye loss. Philos. Trans. R. Soc. Lond. B: Biol. Sci., 2017; 372(1713): 20150487. doi: 10.1098/rstb.2015.0487.
4. Yang J, Chen X, Bai J et al. The Sinocyclocheilus cavefish genome provides insights into cave adaptation. BMC Biol., 2016; 14: 1. doi: 10.1186/s12915-015-0223-4.
5. Li R, Yu C, Li Y et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics, 2009; 25(15): 1966–1967. doi: 10.1093/bioinformatics/btp336.
6. Chen Y, Chen Y, Shi C et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience, 2018; 7(1): 1–6. doi: 10.1093/gigascience/gix120.
7. Liu B, Shi Y, Yuan J et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv preprint. 2013; doi: 10.48550/arXiv.1308.2012.
8. Ye C, Hill CM, Wu S et al. DBG2OLC: efficient assembly of large genomes using long erroneous reads of the third generation sequencing technologies. Sci. Rep., 2016; 6: 31900. doi: 10.1038/srep31900.
9. Boetzer M, Henkel CV, Jansen HJ et al. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics, 2010; 27(4): 578–579. doi: 10.1093/bioinformatics/btq683.
10. Li H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics, 2016; 32(14): 2103–2110. doi: 10.1093/bioinformatics/btw152.
11. Vaser R, Sović I, Nagarajan N et al. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res., 2017; 27(5): 737–746. doi: 10.1101/gr.214270.116.
12. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods, 2012; 9(4): 357–359. doi: 10.1038/nmeth.1923.
13. Servant N, Varoquaux N, Lajoie BR et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol., 2015; 16(1): 259. doi: 10.1186/s13059-015-0831-x.
14. Durand NC, Shamim MS, Machol I et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst., 2016; 3(1): 95–98. doi: 10.1016/j.cels.2016.07.002.
15. Dudchenko O, Batra SS, Omer AD et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science, 2017; 356(6333): 92–95. doi: 10.1126/science.aal3327.
16. Akdemir KC, Chin L. HiCPlotter integrates genomic data with interaction matrices. Genome Biol., 2015; 16(1): 198. doi: 10.1186/s13059-015-0767-1.
17. Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform., 2004; 5(1): 4.10.1–4.10.14. doi: 10.1002/0471250953.bi0410s05.
18. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res., 2007; 35(suppl_2): W265–W268. doi: 10.1093/nar/gkm286.
19. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform., 2009; 25(1): 4.10.1–4.10.14. doi: 10.1002/0471250953.bi0410s25.
20. Jurka J, Kapitonov VV, Pavlicek A et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res., 2005; 110(1–4): 462–467. doi: 10.1159/000084979.
21. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res., 1999; 27(2): 573–580. doi: 10.1093/nar/27.2.573.
22. Benson DA, Karsch-Mizrachi I, Lipman DJ et al. GenBank. Nucleic Acids Res., 2006; 34(Database issue): D16–D20. doi: 10.1093/nar/gkj157.
23. James Kent W. BLAT—the BLAST-like alignment tool. Genome Res., 2002; 12(4): 656–664. doi: 10.1101/gr.229202.
24. Birney E, Clamp M, Durbin R. GeneWise and genomewise. Genome Res., 2004; 14(5): 988–995. doi: 10.1101/gr.1865504.
25. Stanke M, Keller O, Gunduz I et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res., 2006; 34(Web Server issue): W435–W439. doi: 10.1093/nar/gkl200.
26. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods, 2015; 12(4): 357–360. doi: 10.1038/nmeth.3317.
27. Trapnell C, Hendrickson DG, Sauvageau M et al. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol., 2013; 31(1): 46–53. doi: 10.1038/nbt.2450.
28. Cantarel BL, Korf I, Robb SM et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res., 2008; 18(1): 188–196. doi: 10.1101/gr.6743907.
29. Boeckmann B, Bairoch A, Apweiler R et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 2003; 31(1): 365–370. doi: 10.1093/nar/gkg095.
30. Hunter S, Apweiler R, Attwood TK et al. InterPro: the integrative protein signature database. Nucleic Acids Res., 2009; 37(Database issue): D211–D215. doi: 10.1093/nar/gkn785.
31. Kanehisa M, Furumichi M, Tanabe M et al. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res., 2017; 45(D1): D353–D361. doi: 10.1093/nar/gkw1092.
32. Heng X, Ren-Dong Z, Jian-Guo F et al. Nuclear DNA content and ploidy of seventeen species of fishes in Sinocyclocheilus. Zool. Res., 2002; 23(3): 195–199. https://www.zoores.ac.cn/article/id/860.
33. Li R, Wang X, Bian C et al. Whole-genome sequencing of Sinocyclocheilus maitianheensis reveals phylogenetic evolution and immunological variances in various Sinocyclocheilus fishes. Front. Genet., 2021; 12: 736500. doi: 10.3389/fgene.2021.736500.
34. Bian C, Li R, Ouyang Y et al. Genome assembly of the eyeless golden-line fish Sinocyclocheilus anophthalmus. GigaScience Database, 2025; doi: 10.5524/102705.
35. Harris RS. Improved Pairwise Alignment of Genomic DNA. The Pennsylvania State University, 2007. https://www.bx.psu.edu/~rsharris/lastz/.
36. Wang Y, Tang H, Debarry JD et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res., 2012; 40(7): e49. doi: 10.1093/nar/gkr1293.
37. McGinnis S, Madden TL. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res., 2004; 32(Web Server issue): W20–W25. doi: 10.1093/nar/gkh435.
38. Krzywinski M, Schein J, Birol I et al. Circos: an information aesthetic for comparative genomics. Genome Res., 2009; 19(9): 1639–1645. doi: 10.1101/gr.092759.109.
39. Xu P, Xu J, Liu G et al. The allotetraploid origin and asymmetrical genome evolution of the common carp Cyprinus carpio. Nat. Commun., 2019; 10(1): 4625. doi: 10.1038/s41467-019-12644-1.
40. Li JT, Wang Q, Huang Yang MD et al. Parallel subgenome structure and divergent expression evolution of allo-tetraploid common carp and goldfish. Nat. Genet., 2021; 53(10): 1493–1503. doi: 10.1038/s41588-021-00933-9.
41. Marçais G, Delcher AL, Phillippy AM et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol., 2018; 14(1): e1005944. doi: 10.1371/journal.pcbi.1005944.
42. He W. RectChr. GitHub. 2025; https://github.com/BGI-shenzhen/RectChr.
43. Fischer S, Brunk BP, Chen F et al. Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Curr. Protoc. Bioinform., 2011; 35: 6.12.1–6.12.19. doi: 10.1002/0471250953.bi0612s35.
44. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 2004; 32(5): 1792–1797. doi: 10.1093/nar/gkh340.
45. Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol., 2000; 17(4): 540–552. doi: 10.1093/oxfordjournals.molbev.a026334.
46. Guindon S, Dufayard JF, Lefort V et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol., 2010; 59(3): 307–321. doi: 10.1093/sysbio/syq010.
47. Yang Z, Rannala B. Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol. Biol. Evol., 2006; 23(1): 212–226. doi: 10.1093/molbev/msj024.
48. Li JT, Wang Q, Huang Yang MD et al. Parallel subgenome structure and divergent expression evolution of allo-tetraploid common carp and goldfish. Nat. Genet., 2021; 53(10): 1493–1503. doi: 10.1038/s41588-021-00933-9.
49. Bian C, Li R, Ouyang Y et al. Supporting data for “Chromosome-level genome assemblies of five Sinocyclocheilus species”. GigaScience Database, 2025; doi: 10.5524/102703.
50. Bian C, Li R, Ouyang Y et al. Genome assembly of the golden-line barbel Sinocyclocheilus grahami. GigaScience Database, 2025; doi: 10.5524/102704.
51. Bian C, Li R, Ouyang Y et al. Genome assembly of the tetraploid fish Sinocyclocheilus maitianheensis. GigaScience Database, 2025; doi: 10.5524/102706.
52. Bian C, Li R, Ouyang Y et al. Genome assembly of the tetraploid fish Sinocyclocheilus anshuiensis. GigaScience Database, 2025; doi: 10.5524/102707.
53. Bian C, Li R, Ouyang Y et al. Genome assembly of the tetraploid fish Sinocyclocheilus rhinocerous. GigaScience Database, 2025; doi: 10.5524/102708.

Article Submission

Chao Bian
GigaByte.

Assign Handling Editor

Editor: Scott Edmunds
GigaByte.

Editor Assess MS

Editor: Qing Lan
GigaByte.

Curator Assess MS

Editor: Bastien Molcrette
GigaByte.

Review MS

Editor: Jun Wang

Reviewer name and names of any other individuals who aided in the review: Jun Wang
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper? Yes
Additional Comments
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide). Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound? Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes
Additional Comments
Is there sufficient data validation and statistical analyses of data quality? Yes
Additional Comments
Is the validation suitable for this type of data? Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
Additional Comments
Any Additional Overall Comments to the Author The manuscript assembled chromosome-level genomes of five Sinocyclocheilus species and analyzed the allotetraploid origin of these species. The manuscript is meaningful and provides valuable genome resources for the genus Sinocyclocheilus, which will further support evolutionary and functional genomics studies of these species. The analysis was accurate, and the results were solid. My comments are as follows:
1. Please describe in detail how you assembled the four other species by homologous comparison. Did you simply map the assembled scaffolds to the reference genome?
2. The manuscript provides sequencing information only for S. grahami, not for the other four species. What is the sequencing information for the other four species, for example, how many Illumina reads were sequenced?
3. There is no description of the results for Figure 2. Also, why are repeat annotation results given only for S. grahami and not for the other four species?
Recommendation Minor Revision

Review MS

Editor: Fei Li

Reviewer name and names of any other individuals who aided in the review: Fei Li, Shili Li
Do you understand and agree to our policy of having open and named reviews, and having your review included with the published papers. (If no, please inform the editor that you cannot review this manuscript.) Yes
Is the language of sufficient quality? Yes
Please add additional comments on language quality to clarify if needed
Are all data available and do they match the descriptions in the paper? Yes
Additional Comments
Are the data and metadata consistent with relevant minimum information or reporting standards? See GigaDB checklists for examples (http://gigadb.org/site/guide). Yes
Additional Comments
Is the data acquisition clear, complete and methodologically sound? Yes
Additional Comments
Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes
Additional Comments
Is there sufficient data validation and statistical analyses of data quality? Yes
Additional Comments
Is the validation suitable for this type of data? Yes
Additional Comments
Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes
Additional Comments
Any Additional Overall Comments to the Author This paper, entitled “Chromosome-level genome assemblies of five Sinocyclocheilus species”, reports a chromosome-level golden-line barbel genome assembled from a combination of PacBio and Hi-C data. Using this chromosome-level assembly as a reference, the authors also constructed four other pseudo-chromosome-level assemblies, for S. anophthalmus, S. maitianheensis, S. anshuiensis, and S. rhinocerous. These data are an important resource for the conservation of these endangered species. However, some important results are not shown:
1. The protein BUSCO result is not shown.
2. Raw reads were not uploaded to NCBI.
3. What are the detailed numbers for the functional annotation?
Some minor suggestions:
  • Add “,” before “and conservation”.
  • What is the meaning of “R & D”?
  • Line 58, “a good model” should be “good models”.
  • Line 64, remove “at first”.
  • Line 84, change “a” to “the”.
  • Line 90, change “muscle” to “muscle tissue”.
  • Line 105, remove “which was”.
  • Line 112, remove “this study”.
  • Line 122, change “Repeat annotation, gene prediction, and function prediction” to “Annotation of repeat, gene and function”.
  • Line 137, “with” should be “by using”.
  • Line 127, remove “(TEs)”.
  • Line 134, what is the meaning of NCBI GenBank? Remove GenBank.
  • Line 140, “was” should be “were”.
  • Line 178, “Species” should be “species”.
Recommendation Minor Revision

Editor Decision

Editor: Qing Lan

Minor Revision

Chao Bian

Assess Revision

Editor: Qing Lan

Final Data Preparation

Editor: Bastien Molcrette

Editor Decision

Editor: Qing Lan

Accept

Editor: Scott Edmunds

Editor’s Assessment Sinocyclocheilus is a genus of freshwater cavefish endemic to the karst regions of Southwest China. Its members have diverse traits in morphology, behavior, and physiology typical of cavefish, which make them interesting models for studying cave adaptation and phylogenetic evolution. The manuscript assembled chromosome-level genomes of five Sinocyclocheilus species and conducted an allotetraploid origin analysis on these species. For S. grahami (the golden-line barbel), assembled using PacBio and Hi-C sequencing technologies, the final chromosome-level genome assembly was 1.6 Gb in size, with a contig N50 of 738.5 kb and a scaffold N50 of 30.7 Mb; 93.1% of the assembled genome sequences and 93.8% of the predicted genes were anchored onto 48 chromosomes. Subsequently, the authors conducted a homologous comparison to obtain chromosome-level genome assemblies for four other Sinocyclocheilus species (S. maitianheensis, S. rhinocerous, S. anshuiensis, and S. anophthalmus), with over 82% of the genome sequences anchored onto the constructed chromosomes. Peer review prompted clarification of the assembly strategy and additional benchmarking. These data have the potential to contribute to species conservation and to the exploitation of the potential economic and ecological value of diverse Sinocyclocheilus members.

Export to Production

Editor: Scott Edmunds

Associated Data

    Data Availability Statement

    The genome assembly of S. grahami was uploaded to NCBI under BioProject PRJNA1172646, and the genome assembly of S. anophthalmus is available under BioProject PRJNA669129. The PacBio, Hi-C, and transcriptome reads are deposited in NCBI under accession numbers SRR32815372, SRR32815371, and SRR32815370, respectively. All other data, including the repeat and gene annotations, have been shared via the GigaDB repository [49], with separate entries for the individual species genomes [34, 50–53].


    Articles from GigaByte are provided here courtesy of Gigascience Press
