Near complete genome assembly of Yadong trout (Salmo trutta)

Chen Li; Shenglei Han; Shuo Li; Kaiqiang Liu; Yuyan Liu; Hong-yan Wang; Qian Wang; Changlin Liu; Changwei Shao

Published 2025 Jan 15;12:74. doi: 10.1038/s41597-025-04418-w

PMCID: PMC11735641 PMID: 39814780

Abstract

The Yadong trout (Salmo trutta), a species endemic to the Yatung River in Tibet, China, was classified as a second-class protected species in the 20th century. Now, it is considered one of the most important fishery resources in China. In this study, we assembled a near-complete genome of S. trutta, integrating PacBio HiFi, Hi-C, and ONT sequencing technologies. The genome assembly spans 2.49 Gb, with 96.87% of the sequence anchored onto 40 chromosomes. A total of 12 chromosomes were assembled to a gap-free level, with 8 of them reaching the telomere-to-telomere level. BUSCO assessed the completeness of the assembly at 99.50%. The genome contains approximately 63.24% repetitive sequences and is predicted to encode 41,782 protein-coding genes. This is the first near-complete genome assembly of S. trutta, providing an essential resource for molecular breeding and germplasm conservation of this important species.

Subject terms: Eukaryote, Open reading frames

Background & Summary

Salmo trutta, a member of the Salmonidae family, is characterized by a grey upper body with spots distributed above and below the lateral line¹. It is native to Europe, Western Asia, and North Africa², and can be broadly categorized into anadromous and lacustrine populations³. Since the mid-19th century, S. trutta has been introduced to 24 countries, including Russia, the United States, the United Kingdom, Japan, and countries in South America. Its strong migratory abilities and adaptability to diverse environments have enabled it to rapidly establish itself as a global species³,⁴.

In 1866, S. trutta was introduced to the Yadong River in Tibet, China, where it has since become a localized population, colloquially known as Yadong trout⁵. The specific environmental conditions of the Yadong River result in a slow reproductive and growth cycle for the Yadong trout population, which makes it highly vulnerable to overfishing⁶,⁷. This vulnerability has led to a steady decline in population numbers, prompting its designation as a second-class protected aquatic species in the Tibet Autonomous Region in 1992⁸. In recent years, efforts by the Yellow Sea Fisheries Research Institute of the Chinese Academy of Fishery Sciences have led to the implementation of large-scale aquaculture programs to support the conservation and commercial farming of Yadong trout.

Advancements in sequencing technologies have made telomere-to-telomere (T2T) level genome assemblies feasible. As a result, several fish species, including Neosalanx taihuensis⁹, Lateolabrax maculatus¹⁰, and Clarias gariepinus¹¹, have had their genomes published at this level of detail. However, the only available genome (GCA_901001165.1) of S. trutta remains the chromosomal-level assembly published by the Wellcome Sanger Institute in 2019¹², and the number of published chromosomal-level genomes is still extremely limited compared to the global distribution of different populations of S. trutta.

In this study, we integrated PacBio high-fidelity (HiFi), high-throughput chromatin conformation capture (Hi-C) and Oxford Nanopore Technologies (ONT) reads to assemble the S. trutta genome, achieving a near-complete genome sequence (Fig. 1a,b). Compared to the published S. trutta genome¹², our assembly shows substantial improvements in continuity and completeness (contig N50 of 47.99 Mb versus 1.69 Mb, and BUSCO completeness of 99.50% versus 98.93%). This high-quality genome will provide valuable resources for the molecular breeding of S. trutta and facilitate comparative genomic analyses among S. trutta populations from different regions.

Fig. 1 — *S. trutta* genome snail plot and circos plot. (a) The snail plot presents the basic metrics of the genome assembly. (b) The circos plot represents the following metrics from outer to inner layers: (a) chromosomes, (b) gene density, (c) CG content, (d) DNA transposons, (e) LTRs, (f) LINEs, and (g) SINEs. The points on the chromosome backbone indicate detected telomeres, with red points representing chromosomes that are telomere-to-telomere with no gaps.

Methods

Sample collection and sequencing

An adult female S. trutta was sourced from the Yadong County Industrial Park, Shang Yadong Township, Shigatse City, Tibet, China. We employed a rigorously annotated SDS method to obtain sufficient quality and quantity of genomic DNA (gDNA). The sheared gDNA was purified using AMPure PB beads, followed by end repair, adapter ligation, and further purification to construct SMRTbell templates. These templates were then loaded into SMRT cells and sequenced on the PacBio Sequel II platform, yielding 94.96 Gb (~38×) of HiFi data (Table 1).

Table 1.

Statistics of the sequencing data.

Library type	Platform	Tissue	Data size (Gb)	Average depth (×)	Average Length (bp)
ONT ultra-long	PromethION	Fin Ray	72.72	29	84,676
PacBio SMRT	PacBio Sequel II	Muscle	94.96	38	12,037
Hi-C	Illumina Novaseq6000	Muscle	255.69	103	150
WGS	Illumina Novaseq6000	Muscle	99.63	40	150
RNA-Seq	Illumina Novaseq6000	Liver	7.38	—	150
RNA-Seq	Illumina Novaseq6000	Heart	6.42	—	150
RNA-Seq	Illumina Novaseq6000	Intestine	6.62	—	150
RNA-Seq	Illumina Novaseq6000	Gill	7.44	—	150
RNA-Seq	Illumina Novaseq6000	Kidney	6.72	—	150
RNA-Seq	Illumina Novaseq6000	Brain	6.61	—	150
RNA-Seq	Illumina Novaseq6000	Muscle	7.37	—	150
RNA-Seq	Illumina Novaseq6000	Ovary	6.99	—	150
RNA-Seq	Illumina Novaseq6000	Spleen	6.08	—	150
Iso-Seq	PacBio Sequel IIe	Tissue mixture	10.11	—	2,730

For ONT data, DNA was extracted from fin clip tissue using the NEB Monarch® HMW DNA Extraction Kit for Tissue. Libraries were prepared and sequenced on the PromethION platform, resulting in 72.72 Gb (~29×) of ONT data (Table 1).

For Hi-C data, muscle tissue was processed through formaldehyde crosslinking, followed by washing, lysis, enzymatic digestion, DNA end modification, fragment ligation, purification, DNA end repair, biotin labeling, and PCR amplification. Illumina PE150 sequencing was performed, yielding 255.69 Gb (~103×) of Hi-C data (Table 1).

For whole-genome sequencing (WGS), DNA extracted from muscle tissue was converted into Illumina libraries using the NEBNext® Ultra™ DNA Library Prep Kit. Cluster generation for these libraries was carried out on a cBot Cluster Generation System with the Illumina Paired-End Cluster Kit, following the manufacturer's guidelines. We ultimately obtained 99.63 Gb (~40×) of short-read data (Table 1).

For RNA-seq data, we extracted RNA from nine different tissues: muscle, liver, intestine, ovary, brain, spleen, gill, kidney, and heart. RNA-Seq libraries were prepared following the protocol of the library preparation kit. Illumina sequencing yielded an average of 6.83 Gb of data per tissue sample (Table 1). Additionally, we extracted RNA from a mixed tissue sample, and qualified RNA samples underwent reverse transcription, end repair, DNA fragmentation, adapter ligation, and amplification to construct the library. Sequencing was then carried out on the PacBio Sequel IIe platform, resulting in 10.11 Gb of Isoform Sequencing data (Table 1).
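The approximate depths quoted in this section can be reproduced by dividing each data size by the genome length. The following is a minimal sketch, assuming the 2.49 Gb assembly size reported in this study is the denominator; the names `GENOME_SIZE_GB`, `DATA_SIZE_GB`, and `depth` are illustrative and not part of the original pipeline.

```python
# Approximate sequencing depth = total bases sequenced / genome size.
# Assumption: the assembled length reported in this study (2.49 Gb) is
# used as the genome size.
GENOME_SIZE_GB = 2.49

# Data sizes in Gb, taken from Table 1.
DATA_SIZE_GB = {
    "ONT ultra-long": 72.72,
    "PacBio SMRT": 94.96,
    "Hi-C": 255.69,
    "WGS": 99.63,
}


def depth(data_gb: float, genome_gb: float = GENOME_SIZE_GB) -> int:
    """Return sequencing depth rounded to the nearest whole fold."""
    return round(data_gb / genome_gb)


depths = {library: depth(size) for library, size in DATA_SIZE_GB.items()}
for library, fold in depths.items():
    print(f"{library}: ~{fold}x")
```

With these inputs the rounded values agree with the "Average depth" column of Table 1.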

Genome assembly and telomeres identification

To obtain a high-quality genome for S. trutta, we integrated ONT data with HiFi data and used Hifiasm¹³ (v0.19.9) to assemble a draft genome. Subsequently, we employed CRAQ¹⁴ (v1.0.9) to identify chimeric fragments and generated a CRAQ-corrected genome. We then used kmerDedup¹⁵ to remove redundant sequences from the assembly and HapHiC¹⁶ (v1.0.6), together with Hi-C data, to anchor the draft genome to 40 chromosomes, consistent with the chromosome number of the previously published S. trutta genome¹² (Fig. 2a). We used Juicebox¹⁷ (v1.91) for minor manual refinements of the genome and then TGS-GapCloser¹⁸ (v1.2.1) with ONT data to fill gaps within the genome. Thereafter, we polished the genome with NextPolish2¹⁹ (v0.2.1) using HiFi and WGS data, yielding a 2.49 Gb genome with a contig N50 of 47.99 Mb (Fig. 1a). We used the TeloExplorer module of quarTeT²⁰ (v1.2.1) to identify telomeres (TTAGGG) at both ends of each chromosome in the S. trutta genome. Telomeres were detected at both ends of 18 chromosomes and at one end of 17 chromosomes, and 8 chromosomes achieved gap-free telomere-to-telomere status (Fig. 1b and Table 2).
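To illustrate what detecting a telomere at a chromosome end means in practice, the sketch below counts copies of the TTAGGG unit (or its reverse complement CCCTAA) in a window at each end of a sequence. It is a simplified illustration and not the algorithm implemented in quarTeT; the window size, the minimum repeat count, and all function names are hypothetical choices.

```python
# Simplified end-of-sequence telomere check.
# NOT the quarTeT TeloExplorer algorithm: window and min_repeats are
# hypothetical thresholds chosen only for illustration.
TELOMERE_UNIT = "TTAGGG"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_unit(window: str, unit: str = TELOMERE_UNIT) -> int:
    """Count non-overlapping copies of the unit on whichever strand has more."""
    return max(window.count(unit), window.count(reverse_complement(unit)))


def telomere_ends(seq: str, window: int = 1000, min_repeats: int = 10) -> int:
    """Return how many ends (0, 1 or 2) carry a telomere-like repeat array."""
    seq = seq.upper()
    left, right = seq[:window], seq[-window:]
    return sum(count_unit(end) >= min_repeats for end in (left, right))


# Toy sequences: a repeat-free body with telomere arrays added at the ends.
body = "ACGTTGCA" * 500
both_ends = "CCCTAA" * 20 + body + "TTAGGG" * 20
one_end = body + "TTAGGG" * 20

print(telomere_ends(both_ends), telomere_ends(one_end), telomere_ends(body))
```

The return value corresponds to the "Telomere Numbers" column of Table 2, where a chromosome scores 0, 1 or 2.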

Fig. 2 — Hi-C heatmap and collinearity dot plot of the genome. (a) The Hi-C heatmap illustrates the chromosome interaction frequencies of the *S. trutta* genome, with each blue contour representing a chromosome. (b) The dot plot displays the collinearity relationship with the previously published genome assembly of *S. trutta*.

Table 2.

Assembly statistics of chromosomes.

Chromosomes	Length (bp)	Telomere Numbers
Chr 1	90,493,090	1
Chr 2	82,810,130	1
Chr 3	84,936,639	2
Chr 4	85,513,606	1
Chr 5	72,538,494	1
Chr 6	72,262,847	1
Chr 7	84,217,063	1
Chr 8	58,820,683	1
Chr 9	50,963,598	2
Chr 10	48,104,659	2
Chr 11	25,970,806	1
Chr 12	107,239,732	1
Chr 13	101,836,447	1
Chr 14	99,869,877	0
Chr 15	100,024,870	1
Chr 16	62,148,298	1
Chr 17	64,962,358	1
Chr 18	61,029,533	2
Chr 19	58,673,832	0
Chr 20	58,621,627	2
Chr 21	53,977,117	2
Chr 22	54,916,544	0
Chr 23	53,462,562	2
Chr 24	53,486,768	0
Chr 25	50,849,851	1
Chr 26	51,063,025	2
Chr 27	47,770,962	1
Chr 28	51,549,634	2
Chr 29	54,445,351	2
Chr 30	47,725,804	2
Chr 31	47,930,784	2
Chr 32	49,330,108	2
Chr 33	47,087,300	2
Chr 34	47,087,710	2
Chr 35	43,920,889	2
Chr 36	43,421,662	2
Chr 37	48,078,177	2
Chr 38	34,900,096	1
Chr 39	28,802,253	1
Chr 40	29,316,477	0
Unplaced	77,658,758	—
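The summary figures quoted in the text can be cross-checked directly against Table 2. The sketch below transcribes the table and tallies the telomere counts and the anchored fraction; the variable names are illustrative, and treating the anchoring rate as anchored length divided by anchored plus unplaced length is an assumption about how that metric was defined.

```python
from collections import Counter

# (length in bp, telomeres detected) for Chr 1..40, transcribed from Table 2.
CHROMOSOMES = [
    (90_493_090, 1), (82_810_130, 1), (84_936_639, 2), (85_513_606, 1),
    (72_538_494, 1), (72_262_847, 1), (84_217_063, 1), (58_820_683, 1),
    (50_963_598, 2), (48_104_659, 2), (25_970_806, 1), (107_239_732, 1),
    (101_836_447, 1), (99_869_877, 0), (100_024_870, 1), (62_148_298, 1),
    (64_962_358, 1), (61_029_533, 2), (58_673_832, 0), (58_621_627, 2),
    (53_977_117, 2), (54_916_544, 0), (53_462_562, 2), (53_486_768, 0),
    (50_849_851, 1), (51_063_025, 2), (47_770_962, 1), (51_549_634, 2),
    (54_445_351, 2), (47_725_804, 2), (47_930_784, 2), (49_330_108, 2),
    (47_087_300, 2), (47_087_710, 2), (43_920_889, 2), (43_421_662, 2),
    (48_078_177, 2), (34_900_096, 1), (28_802_253, 1), (29_316_477, 0),
]
UNPLACED_BP = 77_658_758

# Number of chromosomes with 0, 1 or 2 detected telomeres.
telomere_tally = Counter(n for _, n in CHROMOSOMES)
total_telomeres = sum(n for _, n in CHROMOSOMES)

# Assumed definition: anchored length / (anchored + unplaced length).
anchored_bp = sum(length for length, _ in CHROMOSOMES)
total_bp = anchored_bp + UNPLACED_BP
anchoring_rate = 100 * anchored_bp / total_bp

print(dict(telomere_tally), total_telomeres)
print(f"total {total_bp / 1e9:.2f} Gb, anchored {anchoring_rate:.2f}%")
```

The tally gives 18 chromosomes with two telomeres, 17 with one, and 53 telomeres in total, matching the text and Table 3.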

Additionally, we conducted a collinearity analysis using Minimap2²¹ (v2.28) with the published S. trutta genome and visualized the results using pafCoordsDotPlotly.R (https://github.com/tpoorten/dotPlotly). The results showed that most chromosomes exhibited good collinearity, although some chromosomes displayed different arrangements (Fig. 2b). We also compared various assembly metrics between the two genomes, and found that our assembled genome demonstrated superior performance, with a total genome length of 2.49 Gb, an anchoring rate of 96.87%, a contig N50 of 47.99 Mb, a BUSCO completion rate of 99.50%, fewer genome gaps, 12 chromosomes being gap-free, and the detection of 53 telomeres (Table 3).
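Contig N50, one of the metrics compared here, is the length of the contig at which the cumulative length of contigs sorted from longest to shortest first reaches half of the total assembly length. A minimal sketch of that calculation follows; the function name `n50` and the toy lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest, and the length at which
    the running total first reaches half of the overall total is returned.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: total length is 32, so the N50 is reached after 8 + 8 = 16.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because the statistic depends only on the length distribution, a higher contig N50 at a similar total length indicates that the assembly is broken into fewer, longer pieces.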

Table 3.

Assembly statistics of S. trutta.

Assembly	fSalTru1.1	This study
Total length	2.37 Gb	2.49 Gb
Anchoring rate	91.45%	96.87%
Contig N50	1.69 Mb	47.99 Mb
BUSCO completion rate	98.93%	99.50%
Number of gaps	2,954	97
Number of gap-free chromosomes	0	12
Number of telomeres	0	53

Repetitive sequence annotation

To identify and characterize repetitive elements, we combined de novo prediction with homology-based annotation. We used LTR-Finder²² (v1.07) to predict LTR retrotransposons in the S. trutta genome and RepeatModeler²³ (v2.0.5) to predict repetitive elements de novo. Based on the results from both tools, we constructed a repeat element database specific to the S. trutta genome in combination with the Repbase database²⁴ (v202101). Homology-based annotation was mainly conducted using the RepeatProteinMask and Repbase modules of RepeatMasker²⁵ (v4.1.0) with default parameters. The predictions from both strategies were then integrated and filtered. The annotation results indicated that repetitive sequences account for approximately 63.24% of the genome. Among the interspersed elements, long interspersed nuclear elements (LINEs) made up the largest portion at 32.34%, followed by DNA transposons at 24.87%, long terminal repeats (LTRs) at 10.80%, and short interspersed nuclear elements (SINEs) at 0.55% (Fig. 1b and Table 4).

Table 4.

Statistics of repeat content.

Type	Length (bp)	Percent of genome (%)
DNA	618,594,869	24.87
LINE	804,466,278	32.34
SINE	13,567,810	0.55
LTR	268,565,114	10.80
Other	3,941,006	0.16
Unknown	21,935,652	0.88
Total (exclude nested/overlapping TEs)	1,573,346,823	63.24
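The percentages in Table 4 can be recomputed from the lengths. In the sketch below the genome length is back-calculated from the Total row, which is an assumption made for illustration rather than a value stated in the table, and the variable names are illustrative. The sketch also shows that the per-class lengths add up to more than the Total row, which is consistent with that row excluding nested and overlapping elements.

```python
# Repeat lengths in bp and reported percentages, transcribed from Table 4.
REPEAT_BP = {
    "DNA": 618_594_869,
    "LINE": 804_466_278,
    "SINE": 13_567_810,
    "LTR": 268_565_114,
    "Other": 3_941_006,
    "Unknown": 21_935_652,
}
REPORTED_PCT = {
    "DNA": 24.87, "LINE": 32.34, "SINE": 0.55,
    "LTR": 10.80, "Other": 0.16, "Unknown": 0.88,
}
TOTAL_NONREDUNDANT_BP = 1_573_346_823
TOTAL_PCT = 63.24

# Assumption: infer the genome length used as denominator from the Total row.
genome_bp = TOTAL_NONREDUNDANT_BP / (TOTAL_PCT / 100)

recomputed_pct = {name: 100 * bp / genome_bp for name, bp in REPEAT_BP.items()}

# Per-class lengths counted independently exceed the non-redundant total.
class_sum_bp = sum(REPEAT_BP.values())
overlap_bp = class_sum_bp - TOTAL_NONREDUNDANT_BP

for name, pct in recomputed_pct.items():
    print(f"{name}: {pct:.2f}% (reported {REPORTED_PCT[name]:.2f}%)")
print(f"inferred genome length: {genome_bp / 1e9:.2f} Gb")
print(f"class sum minus non-redundant total: {overlap_bp / 1e6:.1f} Mb")
```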

Genomic structure and functional annotation

Using the repeat-masked S. trutta genome, we employed Augustus²⁶ (v3.5.0) for de novo gene prediction with default parameters. We used Miniprot²⁷ (v0.13) for homology-based annotation with protein sequences from four species: Salmo salar, S. trutta, Oncorhynchus mykiss, and Oncorhynchus tshawytscha. Concurrently, we aligned RNA-seq data to the genome using HISAT2²⁸ (v2.2.1), followed by transcriptome assembly with StringTie²⁹ (v2.2.3). Open reading frames (ORFs) within the assembled transcripts were identified using TransDecoder (https://github.com/TransDecoder/TransDecoder), and the predicted coding regions were used as evidence for annotation. Additionally, we processed full-length transcriptome data using the IsoSeq3³⁰ (v4.0.0) pipeline and aligned the resulting transcripts to the genome with GMAP (https://github.com/juliangehring/GMAP-GSNAP) to generate annotation evidence. Finally, we integrated the four lines of evidence (de novo, homology, transcript-based, and full-length transcriptome) using EvidenceModeler³¹ (v1.1.1). The final gene set was further refined with Liftoff³² (v1.6.3), using the S. trutta genome annotation as a reference, resulting in the identification of 41,782 genes. To assess the accuracy of gene annotation, we compared the distribution of mRNA lengths and exon counts per mRNA in the S. trutta genome with those of S. salar and O. mykiss. The three species showed highly similar distributions (Fig. 3a).
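As a rough illustration of what ORF identification in an assembled transcript involves, the sketch below scans the three forward reading frames for the longest stretch that begins with ATG and ends at the first in-frame stop codon. It is a deliberately simplified stand-in and not the TransDecoder method, which applies additional criteria; the function name and the toy sequence are hypothetical.

```python
# Simplified forward-strand ORF scan; NOT the TransDecoder algorithm.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> str:
    """Return the longest ATG-to-stop ORF across the three forward frames.

    The returned string includes the stop codon; an empty string means
    that no complete ORF was found.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


# Toy transcript with one complete ORF in the third frame.
print(longest_orf("CCATGAAATTTGGGTAGCC"))
```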

Fig. 3 — Comparative genome plot of closely related species and gene functional annotation upset plot. (a) Comparison of mRNA length distribution and exon count per mRNA among the three closely related species. (b) Upset plot of gene functional annotation using data from EggNOG, Pfam, KEGG, NR, Kofam, and SwissProt.

The predicted gene protein sequences were aligned against various functional databases using Diamond³³ (v2.1.6) with an E-value threshold set at 1e-5. The databases encompassed Kyoto Encyclopedia of Genes and Genomes³⁴ (KEGG), Swiss Institute of Bioinformatics Protein Database³⁵ (Swiss-Prot), Evolutionary Genealogy of Genes: Non-supervised Orthologous Groups³⁶ (EggNOG), Protein Families Database³⁷ (Pfam), KOfam Database³⁸ (Kofam), and the Non-Redundant database³⁹ (NR) to extract potential gene functional information, which was subsequently utilized for statistical analysis. A total of 41,629 genes, which account for 99.63% of the total estimated protein-coding genes, have been effectively annotated by a minimum of one of these databases (Fig. 3b and Table 5).

Table 5.

Statistics of gene annotation.

Type	Number	Percentage (%)
NR	41,619	99.61
EggNOG	40,884	97.85
SwissProt	38,283	91.63
KEGG	33,564	80.33
Pfam	37,890	90.68
Kofam	31,161	74.58
Total	41,629	99.63
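The percentages in Table 5 follow from dividing each count by the 41,782 predicted genes, and the Total row corresponds to genes with a hit in at least one database, that is, the union of the per-database gene sets. A minimal sketch of both calculations follows; the gene identifiers in `toy_hits` are hypothetical and all names are illustrative.

```python
TOTAL_GENES = 41_782  # predicted protein-coding genes reported in this study

# (genes annotated, reported percentage), transcribed from Table 5.
ANNOTATED = {
    "NR": (41_619, 99.61),
    "EggNOG": (40_884, 97.85),
    "SwissProt": (38_283, 91.63),
    "KEGG": (33_564, 80.33),
    "Pfam": (37_890, 90.68),
    "Kofam": (31_161, 74.58),
    "Total": (41_629, 99.63),
}


def percent(count: int, total: int = TOTAL_GENES) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


recomputed = {db: percent(count) for db, (count, _) in ANNOTATED.items()}
unannotated = TOTAL_GENES - ANNOTATED["Total"][0]

# "Annotated by at least one database" is a set union. Toy example with
# hypothetical gene IDs:
toy_hits = {"NR": {"g1", "g2", "g3"}, "KEGG": {"g2"}, "Pfam": {"g3", "g4"}}
annotated_any = set().union(*toy_hits.values())

print(recomputed)
print(f"genes without any annotation: {unannotated}")
print(sorted(annotated_any))
```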

Data Records

The genome assembly data have been uploaded to GenBank under accession number JBICOJ000000000 (PRJNA1170013)⁴⁰ and Figshare⁴¹. The genome annotation file has also been uploaded to Figshare⁴¹. The sequencing raw data have been uploaded to NCBI with the project number PRJNA1171601 (SRP540059)⁴².

Technical Validation

We employed BUSCO⁴³ (v5.3) with the Actinopterygii database (actinopterygii_odb10) to evaluate the genome assembly’s completeness. The BUSCO analysis of the genome revealed an overall completeness of 99.50%, with 56.48% being single-copy and 43.02% being duplicated, leaving only 0.28% as fragmented and 0.22% missing (Fig. 1a and Table 3). We utilized Inspector⁴⁴ (v1.0.1) to map the PacBio HiFi reads against the genome, achieving a quality score, as indicated by the Quality Value (QV), of 44.91 and a read-to-contig alignment rate of 99.95%. Additionally, we applied CRAQ¹⁴ for genome assessment, which determined an Assembly Quality Index (AQI) of 96 (>90), indicating that our assembled genome is highly complete, of good quality, and has achieved a reference-quality standard.
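The BUSCO categories and the QV can be related to more intuitive quantities with a few lines of arithmetic. The sketch below checks that the four BUSCO categories sum to 100% and converts the QV to a per-base error rate, assuming the usual Phred scaling QV = -10 log10(error rate); the function and variable names are illustrative.

```python
import math

# BUSCO categories (percent of actinopterygii_odb10 genes) from the text.
busco = {"single": 56.48, "duplicated": 43.02, "fragmented": 0.28, "missing": 0.22}
complete = round(busco["single"] + busco["duplicated"], 2)
accounted = round(sum(busco.values()), 2)


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(rate: float) -> float:
    """Convert a per-base error rate to a Phred-scaled quality value."""
    return -10 * math.log10(rate)


# Assumption: the reported QV of 44.91 is Phred-scaled.
error_rate = qv_to_error_rate(44.91)
bases_per_error = 1 / error_rate

print(f"complete BUSCOs: {complete}%, all categories: {accounted}%")
print(f"error rate: {error_rate:.2e}, about 1 error per {bases_per_error:,.0f} bases")
```

Under this assumption, a QV of 44.91 corresponds to roughly one erroneous base in every 31,000.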

Acknowledgements

This research was funded by the National Key R&D Program of China (2024YFD2400901 and 2022YFD2400100), the Key Research and Development Project of Shandong Province (2024LZGC005), the AoShan Talents Cultivation Program Supported by Qingdao National Laboratory for Marine Science and Technology (grant number 2017ASTCP-ES06), the Taishan Scholars Program (NO. tstp20221149) to C.S., the National Ten-Thousands Talents Special Support Program to C.S., the Key R&D Program of Hebei Province, China (21326307D) and the Central Public-interest Scientific Institution Basal Research Fund, CAFS (grant number 2023TD19).

Author contributions

C.S. conceived the project. C.L. and S.H. analyzed the data. C.L. drafted the manuscript. S.L., K.L., H.W. and Q.W. revised the manuscript. Y.L. and C.L. prepared the sample materials and extracted the DNA. All authors read and approved the final version of the manuscript.

Code availability

In this study, we did not employ any customized scripts or software for personalized analysis. The analytical tools and parameters used are described in the methods section. For software without specific parameter descriptions, the default parameters were selected.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Chen Li, Shenglei Han.

References

1. Moyle, P. B. Inland Fishes of California. (University of California Press, 1976).
2. Lobry, J., Mourand, L., Rochard, E. & Elie, P. Structure of the Gironde estuarine fish assemblages: a comparison of European estuaries perspective. Aquat. Living Resour. 16, 47–58 (2003).
3. Klemetsen, A. et al. Atlantic salmon Salmo salar L., brown trout Salmo trutta L. and Arctic charr Salvelinus alpinus (L.): a review of aspects of their life histories. Ecol Freshw Fish 12, 1–59 (2003).
4. Elliott, J. M. Quantitative ecology and the brown trout Ch. 9 (Oxford Univ. Press, 1994).
5. Liu, J. et al. Comparative Analysis of Nutrient Composition of Different-Colored Yadong Trout Eggs. Prog Fish Sci 44, 133–141 (2023).
6. Kang, B. et al. Introduction of non-native fish for aquaculture in China: A systematic review. Rev Aquac 15, 676–703 (2023).
7. Zhou, J., Min, Z., Zhang, C., Li, B. & Wang, W. Research progress of fishery resources in Tibet. Anim. Husb. Feed Sci. 8, 246–250 (2016).
8. Tian, H.-F., Hu, Q.-M. & Li, Z. Genome-wide identification of simple sequence repeats and development of polymorphic SSR markers in swamp eel (Monopterus albus). Sci Prog 104, 368504211035597 (2021).
9. Zhou, Y. et al. Gap-free genome assembly of Salangid icefish Neosalanx taihuensis. Sci Data 10, 768 (2023).
10. Sun, Z. et al. Telomere-to-telomere gapless genome assembly of the Chinese sea bass (Lateolabrax maculatus). Sci Data 11, 175 (2024).
11. Nguinkal, J. A. et al. Haplotype-resolved and near-T2T genome assembly of the African catfish (Clarias gariepinus). Sci. Data 11, 1095 (2024).
12. Hansen, T. et al. The genome sequence of the brown trout, Salmo trutta Linnaeus 1758. Wellcome Open Res 6, 108 (2021).
13. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
14. Li, K., Xu, P., Wang, J., Yi, X. & Jiao, Y. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nat. Commun. 14, 6556 (2023).
15. Abalde, S., Tellgren-Roth, C., Heintz, J., Vinnere Pettersson, O. & Jondelius, U. The draft genome of the microscopic Nemertoderma westbladi sheds light on the evolution of Acoelomorpha genomes. Front. Genet. 14 (2023).
16. Zeng, X. et al. Chromosome-level scaffolding of haplotype-resolved assemblies using Hi-C data without reference genomes. Nat. Plants 10, 1184–1200 (2024).
17. Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst 3, 95–98 (2016).
18. Xu, M. et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. GigaScience 9, giaa094 (2020).
19. Hu, J. et al. NextPolish2: A Repeat-aware Polishing Tool for Genomes Assembled Using HiFi Long Reads. Genomics Proteomics Bioinf 22, qzad009 (2024).
20. Lin, Y. et al. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic. Res. 10, uhad127 (2023).
21. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
22. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
23. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117, 9451–9457 (2020).
24. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA 6, 11 (2015).
25. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics Chapter 4, 4.10.1–4.10.14 (2009).
26. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580 (1999).
27. Li, H. Protein-to-genome alignment with miniprot. Bioinformatics 39, btad014 (2023).
28. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12, 357–360 (2015).
29. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290–295 (2015).
30. Guizard, S. et al. nf-core/isoseq: simple gene and isoform annotation with PacBio Iso-Seq long-read sequencing. Bioinformatics 39, btad150 (2023).
31. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol 9, R7 (2008).
32. Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37 (2021).
33. Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods 18, 366–368 (2021).
34. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30 (2000).
35. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28, 45–48 (2000).
36. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res 47, D309–D314 (2019).
37. Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res 49, D412–D419 (2021).
38. Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36, 2251–2252 (2020).
39. Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 33, D501–504 (2005).
40. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_043791845.1 (2024).
41. Li, C. Salmo trutta genome. Figshare. 10.6084/m9.figshare.27282591.v3 (2024).
42. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP540059 (2024).
43. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol Biol Evol 38, 4647–4654 (2021).
44. Chen, Y., Zhang, Y., Wang, A. Y., Gao, M. & Chong, Z. Accurate long-read de novo assembly evaluation with Inspector. Genome Biology 22, 312 (2021).
