A chromosome-level genome of Brachymystax tsinlingensis provides resources and insights into salmonids evolution

Wenbo Zhu; Zhongkai Wang; Haorong Li; Ping Li; Lili Ni; Li Jiao; Yandong Ren; Ping You

doi:10.1093/g3journal/jkac162

2022 Jun 27;12(8):jkac162. doi: 10.1093/g3journal/jkac162

Editor: A Whitehead

PMCID: PMC9339311 PMID: 35758619

Abstract

Brachymystax tsinlingensis Li, 1966 is an endangered freshwater fish of economic, ecological, and scientific value. Study of the B. tsinlingensis genome may be particularly insightful given that it is the only Brachymystax species with a sequenced genome. We present a high-quality chromosome-level genome assembly and protein-coding gene annotation for B. tsinlingensis generated with Illumina short reads, Nanopore long reads, Hi-C sequencing reads, and RNA-seq reads from 5 tissues/organs. The final chromosome-level genome size is 2,031,709,341 bp with 40 chromosomes. We found that the salmonids have a unique GC content and codon usage, have a slower evolutionary rate, and possess specific positively selected genes. We also confirmed that the salmonids have undergone a whole-genome duplication event and a burst of transposon-mediated repeat expansion, and have lost the HoxAbβ Hox cluster. Genes highly expressed in muscle may partially explain the migratory habits of B. tsinlingensis. The high-quality B. tsinlingensis genome assembly could provide a valuable reference for the study of other salmonids and aid the conservation of this endangered species.

Keywords: Brachymystax tsinlingensis, chromosome-level genome, salmonids, evolutionary rate, positively selected genes

Introduction

Whole-genome duplication (WGD) events have shaped the history of many evolutionary lineages. It is widely accepted that the 1R and 2R WGDs are shared by all jawed vertebrates, and that the third, teleost-specific WGD (Ts3R) occurred basally in the teleost radiation ∼320 MYA (Jaillon et al. 2004; Kasahara et al. 2007). In addition, a salmonid-specific WGD (Ss4R) occurred in the common ancestor of salmonids ∼80 MYA, after their divergence from Esociformes ∼125 MYA (Near et al. 2012; Macqueen and Johnston 2014). Previous studies of salmonids have focused on the WGD events, gene duplication (Berthelot et al. 2014; Lien et al. 2016; Robertson et al. 2017), Hox cluster evolution (Mungpakdee et al. 2008), and TE expansions (de Boer et al. 2007; Lien et al. 2016). Our study adds a new salmonid species to these resources and describes its distinctive characteristics.

Brachymystax tsinlingensis (Pallas, 1773) is an endangered freshwater fish (Yang et al. 1999; Ren and Liang 2004) that is distributed in Siberia, Korea, and North China and is endemic to the middle part of the Qinling Mountains, especially the Heihe, Shitouhe, Xushui, and Taibaihe Rivers. As of 2014 it had been described as a subspecies, Brachymystax lenok tsinlingensis (Froese and Pauly 2014). After examining paratypes of B. tsinlingensis, additional specimens, and information from other studies on mitochondrial DNA, Xing et al. (2015) concluded that B. tsinlingensis is a valid species, Brachymystax tsinlingensis Li, 1966.

Study of B. tsinlingensis could aid our understanding of salmonid evolution given that it is the only sequenced Brachymystax species. However, only mitochondrial genome data have been published (Si et al. 2012; Yu and Kwak 2015); nuclear genome and transcriptome data for B. tsinlingensis are still lacking. In this study, we generated the first chromosome-level genome assembly of B. tsinlingensis by combining Nanopore long reads, Hi-C data, and Illumina short reads. Comparative genomic analyses with related species revealed and confirmed that the salmonids have a unique GC content and codon usage, have undergone a common WGD event and a burst of transposon-mediated repeat expansion, have a slower molecular evolution rate than outgroup teleosts, possess specific positively selected genes, and have highly expressed genes in muscle. Overall, these findings provide new insight into the evolution of salmonids and will aid future studies of salmonid evolution.

Materials and methods

Genome data generation

Muscle, heart, kidney, liver, and spleen tissues/organs were collected from 3 adults from the Qinling Mountains, Shaanxi province, China. All samples were stored at −80°C and sent to Beijing Biomarker Technologies Co., Ltd. for DNA/RNA extraction and sequencing. Briefly, genomic DNA from muscle was extracted using a DNeasy Blood & Tissue Kit (Qiagen, Germany). The quality of the extracted DNA was determined using an Agilent 2100 bioanalyzer (Agilent Technologies, USA). RNA from these tissues/organs was extracted using Trizol, and its quality was assessed with an Agilent 2100 bioanalyzer (Agilent Technologies, USA) and on a stained 1.5% agarose gel. Both DNA and RNA were used for the construction of different libraries and further sequencing. For the genomic survey, an Illumina paired-end library (insert size of 300 bp) was constructed using the TruSeq DNA Sample Preparation Kit version 2 with no PCR amplification (PCR-free) following the manufacturer’s protocol (Illumina, USA). The Agilent 2100 bioanalyzer (Agilent Technologies, USA) High Sensitivity Kit was used to check library quality. The library was then sequenced on the NovaSeq platform (Illumina, USA) using the 150-bp paired-end strategy. For Nanopore long-read sequencing, g-TUBEs (Covaris, USA) were used to shear high-quality genomic DNA to the proper size, and the DNA fragments were enriched with magnetic beads. The target DNA fragments were then repaired and adaptors were added at both ends of each fragment. Unaligned ends were removed, and BluePippin (Sage Science, USA) was used to select fragments of the proper size. Finally, the DNA library was checked with an Agilent 2100 Bioanalyzer (Agilent Technologies, USA), and the qualified libraries were sequenced on the PromethION platform (Oxford Nanopore Technologies).
For Hi-C reads, the sample was cross-linked for 10 min with formaldehyde (1% final concentration), after which glycine (0.2 M final concentration) was added for 5 min to stop the cross-linking process; the sample was then stored until further analysis. We then used a Dounce homogenizer to lyse the cells in cold hypotonic buffer supplemented with protease inhibitors to maintain protein-DNA complexes. HindIII was used for digestion. The DNA ends were then marked with biotin, and the captured conformations were prepared for deep sequencing. To obtain enough DNA for deep sequencing, the library of ligated fragments was amplified by PCR and sequenced on the NovaSeq platform (Illumina, USA).

Quality control of raw sequencing reads

Because the sequencing data included Illumina short reads and Nanopore long reads, 2 different strategies were used to filter the reads. For Illumina genomic data, all low-quality reads, adaptor sequences, and duplicated reads produced by PCR were removed. The same filters were applied to the RNA-seq data. For Nanopore reads, all reads with an average quality ≥ 7 were retained for raw genome assembly using a Perl script (https://ftp.cngb.org/pub/gigadb/pub/10.5524/102001_103000/102210/filter_ONT_data_for_7_with_auto_check_quality_position_wzk.pl; last accessed date: 2022/6/30). For contamination detection, 363,212 Illumina reads (∼50 Mb) were aligned to the NCBI Nr database with BLAST (v2.9.0) (Altschul et al. 1990) with the e-value set to 10⁻⁵. The BLAST results were then used for taxonomic classification, and reads clearly belonging to prokaryotes, fungi, or plants were judged to be contamination.
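The long-read quality filter can be illustrated with a minimal Python sketch. It assumes that "average quality" means the arithmetic mean of per-base Phred scores decoded from a Phred+33 FASTQ quality string; the Perl script used in this study may compute the average differently, and the function names below are hypothetical.

```python
def mean_phred(quality_string, offset=33):
    """Arithmetic mean of per-base Phred scores (assumes Phred+33 encoding)."""
    if not quality_string:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_reads(records, min_mean_quality=7.0):
    """Yield (name, sequence, quality) records whose mean quality is >= the threshold.

    `records` is any iterable of (name, sequence, quality) tuples; parsing the
    FASTQ file itself is left out of this sketch.
    """
    for name, seq, qual in records:
        if mean_phred(qual) >= min_mean_quality:
            yield name, seq, qual


# Hypothetical toy records: '(' encodes Q7 and '#' encodes Q2 under Phred+33.
toy = [("read_a", "ACGT", "(((("), ("read_b", "ACGT", "####")]
kept = list(filter_reads(toy))
```

In this toy input only `read_a` passes, because its mean quality equals the threshold of 7.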

Estimation of genome size

To investigate the genome characteristics of B. tsinlingensis, the genome size was estimated with the K-mer method from all filtered Illumina genomic reads, using the kmer_freq_hash program in GCE (v1.0.0) (Liu et al. 2013). The K-mer size was set to 17, and the genome size was estimated as the total number of 17-mers divided by the peak 17-mer frequency: Genome size = total 17-mer number/peak frequency (Liu et al. 2013).
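As a worked illustration of this formula, the sketch below counts the K-mers contributed by a set of reads and divides by the peak depth. The read lengths and the peak depth are hypothetical values chosen for the example; in practice GCE derives both quantities from the K-mer frequency histogram.

```python
def total_kmers(read_lengths, k=17):
    """Number of k-mers contributed by reads of the given lengths (L - k + 1 each)."""
    return sum(length - k + 1 for length in read_lengths if length >= k)


def estimate_genome_size(kmer_total, peak_depth):
    """Genome size = total k-mer number / peak k-mer frequency."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return kmer_total / peak_depth


# Hypothetical example: 1,000 reads of 150 bp and a histogram peak at depth 10.
n_kmers = total_kmers([150] * 1000, k=17)       # 1,000 * 134 = 134,000
size_bp = estimate_genome_size(n_kmers, 10)     # 13,400 bp
```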

De novo genome assembly and Hi-C scaffolding

All of the filtered Nanopore long reads were used for the draft genome assembly in NextDenovo (v2.3.1; https://github.com/Nextomics/NextDenovo) with default parameters. The assembled genome was corrected using the filtered Illumina genomic reads: BWA (BWA-MEM module) (v0.7.12-r1039) (Li and Durbin 2009) was used to map the short reads to the genome, and Pilon (v1.24) (Walker et al. 2014) was used to correct base errors. To obtain the chromosome-level genome, the Hi-C reads were aligned to the assembled genome using Juicer (Durand et al. 2016). 3D-DNA (v180419) (Dudchenko et al. 2017) was used to group scaffolds into clusters; within each cluster, the order of scaffolds was determined by the strength of interactions. Finally, all possible scaffold orientations were evaluated, and finely oriented scaffolds were generated using a weighted directed acyclic graph.

Genome quality evaluation

After genome assembly, several methods were used to evaluate genome quality. We used Benchmarking Universal Single-Copy Orthologs (BUSCO, v5) (https://gvolante.riken.jp/analysis.html) (Simao et al. 2015) to estimate the percentage of conserved orthologs in the assembled genome. The conserved gene sets of Eukaryota, Metazoa, and Actinopterygii were used as databases, and the complete, fragmented, and missing BUSCO genes were identified. BWA (v0.7.12-r1039) (Li and Durbin 2009) and BLAT (Kent 2002) were used to calculate the mapping ratios of the NGS data and the de novo assembled transcripts to the assembled genome, respectively.

Repeats and transposable elements annotation

For repetitive sequence annotation, tandem repeats were annotated using Tandem Repeat Finder (http://tandem.bu.edu/trf/trf.html, V4.10) (Benson 1999) with default parameters. For transposable element (TE) annotation, both RepeatProteinMask (RM-BLASTX) and RepeatMasker (open-4.0.7) (Bedell et al. 2000) were used. RepeatProteinMask was used to search for TEs against its protein database, and RepeatMasker was used to search against the de novo libraries and the Repbase library (zebrafish). The de novo repeat libraries were built with RepeatModeler using default parameters. The insertion time of each TE was calculated as K/2r (SanMiguel et al. 1998; Bowen and McDonald 2001), where K is the Kimura distance extracted from the RepeatMasker analysis and r is the evolutionary rate obtained from the r8s analysis (Sanderson 2003). This Kimura distance-based analysis (Chalopin et al. 2015) was carried out using parseRM (https://github.com/4ureliek/Parsing-RepeatMasker-Outputs).
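The insertion-time formula can be sketched as follows. The example assumes that the Kimura value is supplied as a percentage (as in RepeatMasker divergence summaries) and that r is expressed in substitutions per site per year; the rate used below is a hypothetical placeholder, not the r8s estimate from this study.

```python
def te_insertion_time_mya(kimura_percent, rate_per_site_per_year):
    """Insertion time T = K / (2r), returned in millions of years.

    kimura_percent: Kimura distance as a percentage (assumed input convention).
    rate_per_site_per_year: substitution rate r (assumed units).
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    k = kimura_percent / 100.0
    return k / (2.0 * rate_per_site_per_year) / 1e6


# Hypothetical values: K = 8% and r = 1e-9 give an insertion time near 40 MYA.
t_mya = te_insertion_time_mya(8.0, 1e-9)
```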

Gene structure annotation and functional annotation

All of the TEs in the genome were masked, and the masked genome was used for gene structure annotation. In this step, 3 different strategies were used: de novo prediction, homolog-based prediction, and transcript-based prediction. For de novo prediction, Augustus (v3.3.3) (Stanke and Waack 2003) was used with default parameters. For homolog-based annotation, proteins of 10 species, including Esox lucius (GCF_011004845.1) (Ishiguro et al. 2003), Lepisosteus oculatus (GCF_000242695.1) (Inoue et al. 2003), Danio rerio (GCF_000002035.6) (Howe et al. 2013), Oncorhynchus tshawytscha (GCF_002872995.1) (Christensen et al. 2018), Oncorhynchus keta (GCF_012931545.1), Salmo salar (GCF_000233375.1) (Lien et al. 2016), Salmo trutta (GCF_901001165.1), Oncorhynchus nerka (GCF_006149115.1) (Christensen et al. 2020), Oncorhynchus mykiss (GCF_013265735.2) (Gao et al. 2021), and Oncorhynchus kisutch (GCF_002021735.2), were downloaded from the NCBI database (https://ftp.ncbi.nlm.nih.gov/genomes) and aligned to the repeat-masked genome by tblastn (Altschul et al. 1990) with an e-value of 1e⁻⁵. We then used GeneWise (Birney et al. 2004) to select the longest coding region and/or the highest-scoring model at each gene locus. For transcript-based annotation, the RNA-seq reads were assembled into transcripts using Bridger (Chang et al. 2015), and the transcripts were mapped to the genome with BLAT (v34) (more than 90% identity and coverage) (Kent 2002); PASA (Haas et al. 2003) was then used to link spliced alignments. Finally, EvidenceModeler (v1.1.1) (Haas et al. 2008) was used to integrate these results into the final gene set.

All of the predicted genes were functionally annotated using the following public protein databases. InterProScan (v4.8) (Zdobnov and Apweiler 2001) was used to screen proteins against 5 databases (Pfam, release 24.057; ProDom, 2006.1; SMART, release 6.059; PROSITE, release 20.52; PRINTS, release 40.058). The Kyoto Encyclopedia of Genes and Genomes (KEGG), SwissProt (Release 2011.6), nonredundant (NR), and TrEMBL (Release 2011.6) databases were all searched with BLAST (v2.9.0) (Altschul et al. 1990) with an e-value of 1e⁻⁵.

Genome synteny

To perform the synteny analysis, we used the LAST software (v1066) (“lastal” command in LAST with -P 5 -m100 -E 0.05; v802) (Kiełbasa et al. 2011) for whole-genome alignment, with the B. tsinlingensis assembly as the database. The genomes of O. tshawytscha, O. keta, S. trutta, O. nerka, O. mykiss, and O. kisutch were aligned to the B. tsinlingensis assembly. Circos (v0.69-6) (Krzywinski et al. 2009) was then used to plot the syntenic relationships.

Phylogenetic inference

Reciprocal BLAST (Altschul et al. 1990) and OrthoMCL (Li et al. 2003) were used to determine the homology relationships among the protein sequences of 11 species (B. tsinlingensis, D. rerio, O. mykiss, O. tshawytscha, S. trutta, O. kisutch, O. nerka, S. salar, O. keta, E. lucius, and L. oculatus). All protein sequences were downloaded from NCBI as described earlier, and the longest isoform of each gene was chosen to obtain a nonredundant protein set for each species. The initial grouping of gene sequences into ortholog sets was done by best reciprocal BLAST, and these results were used as input for OrthoMCL (Li et al. 2003). Genes with a 1:1 relationship across species were identified as single-copy genes. All single-copy genes were concatenated into 1 super sequence for each species. MUSCLE (v3.8.31) (Edgar 2004a, 2004b) was then used to align these sequences. Lastly, RAxML (v8.2.9) (Stamatakis 2014) was used to reconstruct the phylogenetic relationships using different models (Protgammaauto/Gtrgamma) with 100 bootstrap replicates; L. oculatus was used as the outgroup species.

Molecular clock analysis

To estimate the divergence times among these 11 species, all of the 4-fold degenerate site (4DTv) sequences were extracted from the previous super sequences using Reseqtools (https://github.com/BGI-shenzhen/Reseqtools). The MCMCtree program in the PAML package (v4.8) (Yang 2007) was used to calculate divergence times. The divergence times were analyzed by Markov chain Monte Carlo sampling, with samples drawn every 2,000 steps for a total of 100,000 samples, and the results were calibrated with the fossil records downloaded from the TIMEtree database (http://www.timetree.org).

Rate of molecular evolution

The molecular evolution rate of these 11 species was calculated using the protein sequences of all single-copy genes; these sequences were concatenated into 1 super sequence per species and aligned using MUSCLE (v3.8.31) (Edgar 2004a, 2004b). Two methods were then used: LINTRE (2-cluster analysis) (Takezaki et al. 1995) and MEGA (v11) (Tajima’s relative rate test) (Kumar et al. 2016). We specified L. oculatus as the outgroup species and tested the relative evolution rate between B. tsinlingensis and each of the other species. For Tajima’s relative rate test, the Chi-square test was used to determine which of the 2 species has the faster evolution rate. For the 2-cluster analysis, the tpcv model in LINTRE was used.
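The logic of Tajima’s relative rate test can be sketched in a few lines of Python. This is an illustrative reimplementation under simple assumptions (pre-aligned sequences of equal length, sites containing gaps skipped), not the MEGA implementation; the function names are hypothetical. The statistic is (m1 − m2)²/(m1 + m2) with 1 degree of freedom, where m1 and m2 count the sites at which only the first or only the second ingroup sequence differs from the outgroup.

```python
import math


def unique_differences(seq1, seq2, outgroup, gap_chars="-X?"):
    """Count sites where exactly one ingroup sequence differs from the outgroup."""
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if a in gap_chars or b in gap_chars or o in gap_chars:
            continue
        if a != b and b == o:
            m1 += 1  # change unique to lineage 1
        elif a != b and a == o:
            m2 += 1  # change unique to lineage 2
    return m1, m2


def tajima_relative_rate(m1, m2):
    """Return (chi-square statistic, P-value) for 1 degree of freedom."""
    if m1 + m2 == 0:
        return 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 unique changes in lineage 1 versus 10 in lineage 2.
chi2, p = tajima_relative_rate(30, 10)
```

The lineage with the larger count of unique differences is the faster-evolving one when the P-value falls below the chosen significance level.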

Positive selection

All sequences of these single-copy genes were aligned using MUSCLE (Edgar 2004a, 2004b) with default parameters. First, the ratio (ω) of nonsynonymous to synonymous nucleotide substitution rates was estimated. A 1-ratio model was used to estimate the average ω across the species tree (ω0). A 2-ratio branch model was then used to estimate the ω of the designated branch (B. tsinlingensis; ω1) and the ω of all other branches (ω_background). A likelihood ratio test was performed to compare the fit of the 2-ratio model with that of the 1-ratio model, and a gene was considered positively selected when ω1 > 1, ω1 > ω0, ω1 > ω_background, and P-value < 0.05.
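The decision rule described above can be sketched as follows, assuming that the log-likelihoods of the 2 models are already available (for example from codeml output) and that the models differ by 1 free parameter, so the test statistic is compared against a Chi-square distribution with 1 degree of freedom. The function names and the numerical values are hypothetical.

```python
import math


def likelihood_ratio_test(lnl_null, lnl_alt):
    """Return (2 * delta lnL, P-value) assuming 1 degree of freedom."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival, 1 df
    return stat, p_value


def is_positively_selected(omega_fg, omega_one_ratio, omega_bg, p_value, alpha=0.05):
    """Apply the criteria: w1 > 1, w1 > w0, w1 > w_background, and P < alpha."""
    return (
        omega_fg > 1.0
        and omega_fg > omega_one_ratio
        and omega_fg > omega_bg
        and p_value < alpha
    )


# Hypothetical log-likelihoods and omega estimates for a single gene.
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-997.0)
selected = is_positively_selected(1.5, 0.30, 0.25, p)
```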

WGD assessment

The distribution of synonymous substitutions per site (Ks) within paralogs was used to examine the WGD event in salmonids. The protein sequences of all 11 species were aligned with BLAST v2.9.0 (e-value 1e⁻¹⁰) (Altschul et al. 1990). When 2 genes were mutual best hits (excluding hits to themselves), they were identified as a paralogous gene pair. Ks was calculated for each paralogous pair using the KaKs_calculator (v2.0) (Wang et al. 2010). For comparison, we also plotted the Ks distribution between B. tsinlingensis and other species.
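The mutual-best-hit rule can be sketched as below. The example assumes that hits have already been filtered by e-value and are supplied as (query, subject, bit score) tuples, with the best hit defined as the highest bit score; the ranking criterion and the function name are assumptions made for illustration.

```python
def mutual_best_hits(hits):
    """Return sorted gene pairs that are each other's best non-self hit.

    hits: iterable of (query, subject, bit_score) tuples from an
    all-against-all protein search within one species.
    """
    best = {}
    for query, subject, score in hits:
        if query == subject:
            continue  # exclude self hits
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)

    pairs = set()
    for query, (subject, _) in best.items():
        if subject in best and best[subject][0] == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


# Hypothetical hits: g1 and g2 are mutual best hits; g3 points to g1 only.
toy_hits = [
    ("g1", "g1", 500.0), ("g1", "g2", 200.0), ("g1", "g3", 100.0),
    ("g2", "g1", 210.0), ("g2", "g3", 50.0),
    ("g3", "g1", 90.0),
]
paralog_pairs = mutual_best_hits(toy_hits)
```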

Hox gene cluster identification

Sequences of the known Hox genes were downloaded from the Swiss-Prot database and used for Hox gene annotation in the other species. Specifically, the downloaded sequences were aligned to each genome assembly using BLAST (tblastn, v2.9.0) (Altschul et al. 1990) with an e-value of 1e⁻¹⁰. For each gene, the hit with the lowest e-value was selected as the final annotation. Hox genes with an aligned length over 150 bp are indicated by boxes with solid lines, and those with an aligned length between 100 and 150 bp are indicated by boxes with dotted lines. Parts of the same Hox gene cluster located on different scaffolds/chromosomes are connected by dotted lines.

Gene expression profile analysis

RNA extracted from heart, kidney, liver, spleen, and muscle of B. tsinlingensis was sequenced on the Illumina sequencing platform. For each tissue/organ, 3 biological replicates were sampled and used for RNA extraction and sequencing. We then performed normalization and pairwise differential gene expression analysis among these 5 tissues/organs using the “DESeq2” R package. As a method for differential analysis of transcriptome count data, DESeq2 improves the interpretability and stability of estimates through shrinkage estimators for fold change (FC) and dispersion. We specified |log2FC| > 1 and false discovery rate < 0.001 as cut-offs to identify differentially expressed genes (DEGs). Principal component analysis (PCA) of all organs/tissues was performed using the RNA-seq data. All FPKM values for the different tissues/organs were passed to the prcomp function in R (pc <- prcomp(data[, -1])), and the result was plotted with the autoplot function in R (autoplot(pc, data = data, color = "tracking_id", label = TRUE, text = TRUE, label.size = 5)).
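The DEG cut-offs amount to a simple filter on the results of each pairwise comparison. The sketch below applies them to a list of result records; the field names (`gene`, `log2FC`, `fdr`) and the example values are hypothetical and do not correspond to a specific DESeq2 output format.

```python
def is_deg(log2_fc, fdr, min_abs_log2_fc=1.0, max_fdr=0.001):
    """True when |log2FC| > 1 and false discovery rate < 0.001."""
    return abs(log2_fc) > min_abs_log2_fc and fdr < max_fdr


def select_degs(results):
    """Return the gene identifiers that pass both cut-offs."""
    return [row["gene"] for row in results if is_deg(row["log2FC"], row["fdr"])]


# Hypothetical results from one pairwise comparison (e.g. muscle vs. liver).
toy_results = [
    {"gene": "geneA", "log2FC": 3.2, "fdr": 1e-6},    # passes both cut-offs
    {"gene": "geneB", "log2FC": -2.1, "fdr": 5e-4},   # passes (down-regulated)
    {"gene": "geneC", "log2FC": 0.8, "fdr": 1e-8},    # fold change too small
    {"gene": "geneD", "log2FC": 4.0, "fdr": 0.01},    # not significant enough
]
degs = select_degs(toy_results)
```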

Results and discussion

Genome sequencing, assembly, and evaluation

We obtained a total of 714,523,240 reads from the Illumina PE library, yielding a total sequence length of 100,539,703,234 bp (Supplementary Table 1). K-mer analysis (K = 17) indicated that the genome size is 2.30 Gb with a modest repeat ratio (Supplementary Fig. 1). After contamination screening and taxonomic grouping of the reads, most of the reads (89.58%) were aligned to Salmoniformes. Other reads were aligned to Cypriniformes (1.41%), Esociformes (0.64%), Eupercaria (0.47%), Cyprinodontiformes (0.31%), Perciformes (0.25%), Cichliformes (0.24%), Pleuronectiformes (0.22%), Tetraodontiformes (0.20%), Characiformes (0.17%), Osteoglossiformes (0.12%), and other species. These results showed that our data were not contaminated by prokaryotes, fungi, or plants. We also generated 257.70 Gb of raw data using the Nanopore platform (111.95× coverage) for genome assembly: a data set of 13,970,063 reads with an N50 of 22,905 bp (Supplementary Table 2). To construct the chromosome-level genome, 373.43 Gb of Hi-C data were generated for scaffolding (Supplementary Table 3). The final assembled B. tsinlingensis genome is summarized in Supplementary Tables 4 and 5. The contig-level genome included 414 scaffolds/chromosomes (>100 bp), with a total length of 2,031,671,487 bp. The final chromosome-level assembly included 67 scaffolds/chromosomes (>100 bp), with a total length of 2,031,709,341 bp. The N50 was 50.15 Mb, and the chromosomes accounted for 99.58% of the total genome length (Fig. 1a). Compared with the published genomes of salmonids, our genome is of high quality (Supplementary Table 6). B. tsinlingensis has 40 chromosomes, ranging in length from 23,778,729 to 111,498,818 bp.
We also used different strategies to evaluate the quality of this assembly, including the mapping ratio of assembled transcripts from the RNA-seq data of 5 different tissues/organs (Supplementary Tables 7–9), BUSCO results based on the Metazoa, Eukaryota, and Actinopterygii model sets (Supplementary Table 10), the Illumina short-read mapping ratio (Supplementary Table 11), and the genomic synteny between B. tsinlingensis and 6 other salmonids (Supplementary Figs. 2–7). All of these results indicated that this genome has a high level of accuracy, continuity, and connectivity.
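For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows how it is computed from a list of sequence lengths: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. The lengths in the example are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (total 20; the two longest sum to 11 >= 10).
example_n50 = n50([2, 3, 4, 5, 6])
```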

Fig. 1. — The assembled *B. tsinlingensis* genome and its codon usage characteristics. a) Overview of the Hi-C heatmap for the assembled chromosomes of *B. tsinlingensis*. The color indicates the frequency of Hi-C interaction links. b) Circos graph of the genome characteristics. Shown from the outer circle to the inner ring are the gene distribution, tandem repeats (TRP), long terminal repeats (LTRs), short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), DNA elements (DNA), and genome GC content. c) Codon usage bias. Values of the codon bias index (CBI) plotted against the frequency of guanine + cytosine at the synonymous third position of codons (GC3s) were determined using the nucleotide sequences of all predicted genes concatenated for individual species. d) The third position of the synonymous codons of these species.

The GC content of the genome and CDS of B. tsinlingensis and related species was assessed. The genome GC content of all salmonids was quite similar, ranging from 43.03% to 43.55%, which is slightly higher than that of the outgroup species D. rerio (36.60%), E. lucius (42.22%), and L. oculatus (39.59%) (Supplementary Fig. 8). The CDS GC content of salmonids was also very similar, ranging from 54.19% to 55.21%, which is slightly higher than that of outgroup species such as D. rerio (49.85%), E. lucius (54.64%), and L. oculatus (53.25%) (Supplementary Fig. 9). These findings suggest that salmonid genomes may have a distinctive GC content pattern.

Genome annotation

Repeat sequences were identified in the assembled genome of B. tsinlingensis. They accounted for 64.48% of the genome, and DNA transposons (20.78%) were the most abundant repeat type (Supplementary Tables 12 and 13). For genome annotation, a total of 55,706 genes were predicted using different annotation methods, and the gene structure was similar to that of other published genomes of related species (Supplementary Fig. 10). The BUSCO results showed that most of the genes were successfully annotated in our gene set (95.29% in the Eukaryota database). The functional annotation results revealed that homologous genes could be found in public databases for 90.14% (50,214) of these 55,706 protein-coding genes, which indicated that the gene structure annotation was robust (Supplementary Table 14). The gene density, all types of repeat sequences, and GC density of the assembly are shown in Fig. 1b. Owing to the Ss4R event, codon usage in salmonids was uniform (Fig. 1c). In addition, the third position of synonymous codons in salmonids was more likely to be G or C, which differs from other teleosts (Fig. 1d).
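The GC statistics discussed in this section can be illustrated with a short sketch. It computes overall GC content and GC3s, taken here as the G + C frequency at the third position of codons, excluding Met (ATG), Trp (TGG), and stop codons, which have no synonymous alternatives under the standard genetic code. This is an illustrative reimplementation, not the tool used in this study, and the input sequence is hypothetical.

```python
NON_SYNONYMOUS_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_content(sequence):
    """Fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        return 0.0
    return sum(base in "GC" for base in counted) / len(counted)


def gc3s(cds):
    """G + C fraction at third positions of codons with synonymous alternatives."""
    seq = cds.upper()
    third_bases = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in NON_SYNONYMOUS_CODONS or any(b not in "ACGT" for b in codon):
            continue
        third_bases.append(codon[2])
    if not third_bases:
        return 0.0
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical CDS: ATG GCC GCA AAG TAA -> third bases counted: C, A, G.
toy_gc3s = gc3s("ATGGCCGCAAAGTAA")
```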

Salmonid-specific WGD

The Ts3R WGD occurred approximately 320 MYA (Jaillon et al. 2004; Kasahara et al. 2007); because of its basal occurrence in the teleost radiation, little is known about this WGD event. In contrast, the salmonid-specific WGD (Ss4R) took place in the common ancestor of salmonids approximately 80 MYA (Near et al. 2012; Macqueen and Johnston 2014), making it a much more recent WGD event in vertebrates. To elucidate the history of Ss4R in salmonids, we screened the paralogs of all species with MCScan (Tang et al. 2008) and calculated the distribution of the rate of transversions at 4-fold degenerate synonymous sites (4DTv). The peaks of all salmonids were concentrated at 4DTv values of 0.02–0.03, corresponding to the Ss4R event that occurred ∼80 MYA in all salmonids (Near et al. 2012; Macqueen and Johnston 2014). We also screened the orthologs in syntenic blocks between B. tsinlingensis and each of the other species separately. Furthermore, we calculated the 4DTv of the homologs in the other species, which showed that the 4DTv peaks of the salmonids were around 0.07–0.09 (Fig. 2a). Taken together, these results indicate that Ss4R occurred after the divergence of salmonids from Esociformes, which is consistent with previous results (Thorgaard et al. 1983). This is the first demonstration that B. tsinlingensis shares the Ss4R event with all other salmonids, consistent with previous studies.
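The 4DTv statistic can be illustrated with the simplified sketch below, which computes the uncorrected proportion of transversions at 4-fold degenerate third positions for a pair of codon-aligned coding sequences under the standard genetic code. Many analyses additionally apply a substitution-model correction, which is omitted here; the function name and input sequences are hypothetical.

```python
# First two bases of the eight 4-fold degenerate codon families
# (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def fourfold_transversion_rate(cds_a, cds_b):
    """Uncorrected 4DTv for two codon-aligned CDS; None if no usable site."""
    a, b = cds_a.upper(), cds_b.upper()
    sites = transversions = 0
    for i in range(0, min(len(a), len(b)) - 2, 3):
        codon_a, codon_b = a[i:i + 3], b[i:i + 3]
        # Both codons must belong to the same 4-fold degenerate family.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        x, y = codon_a[2], codon_b[2]
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if (x in PURINES) != (y in PURINES):
            transversions += 1  # purine <-> pyrimidine change
    return transversions / sites if sites else None


# Hypothetical pair: GCT/GCA (transversion), GGA/GGA (no change),
# ATG/ATG (not 4-fold degenerate), CCC/CCT (transition).
rate = fourfold_transversion_rate("GCTGGAATGCCC", "GCAGGAATGCCT")
```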

Fig. 2. — WGD events and the expanding genome. a) Ss4R event in *B. tsinlingensis* and other species. Solid lines are values for the 4DTv of paralogous genes in these 11 species; dotted lines are 4DTv of orthologs between *B. tsinlingensis* and other species. b) The genome composition of these species, including coding exons, tandem repeats, DNA elements, LINEs, SINEs, LTR elements, and others. c) The TE insertion times of these species.

Because of the Ss4R event, the genome size of all salmonids (1.85 to 2.97 Gb) is much larger than that of the other related species (1.19 to 1.68 Gb). Repeats in salmonids are much more abundant than in other species, especially tandem repeats, DNA elements, and LINE elements (Fig. 2b). These 3 types of repeats may play important roles in WGD and contribute to the larger genome size of salmonids. In addition, the peaks of TE insertion times in salmonids were all around 40–80 MYA (Fig. 2c). The TE insertion pattern of B. tsinlingensis is highly consistent with previous studies of other salmonids (de Boer et al. 2007; Lien et al. 2016).

Hox genes of salmonids

Hox cluster organization provides a valuable marker for studying the effects of WGD in salmonids. Most teleosts, which have undergone 3 rounds of WGD, are expected to have 7 or 8 Hox gene clusters (Stellwag 1999; McArthur et al. 2003). However, we found 13 Hox gene clusters in salmonids: HoxAaα, HoxAaβ, HoxAbα, HoxBaα, HoxBaβ, HoxBbα, HoxBbβ, HoxCaα, HoxCaβ, HoxCbα, HoxCbβ, HoxDaα, and HoxDaβ (Fig. 3). HoxAbβ appears to have been lost in the common ancestor of Brachymystax+Hucho, Oncorhynchus+Salvelinus+Salvethymus, and Salmo; none of the published salmonid species possesses the HoxAbβ Hox cluster. We also examined the genomes of Thymallus thymallus (Thymallus), Coregonus clupeaformis (Coregonus), and Coregonus lavaretus (Coregonus). These species belong to early-diverging salmonid lineages, and the HoxAbβ Hox cluster was also lost in them (Supplementary Fig. 11a). These results indicate that the salmonids may have lost the HoxAbβ Hox cluster at a very early stage (Supplementary Fig. 11b). Apart from differences attributable to assembly quality, the 13 Hox clusters were highly similar across salmonids. Consistent with a previous study (Mungpakdee et al. 2008), these results confirmed that the Hox clusters were likely derived from a single common ancestor.

Fig. 3. — *Hox* clusters in these species. Both the Ts3R event and the Ss4R event are shown in this figure. Broken edges in a *Hox* gene cluster mean that the cluster is not assembled into one complete scaffold/chromosome. *Hox* genes with an aligned length over 150 bp are indicated by boxes with solid lines, and those with an aligned length between 100 and 150 bp are indicated by boxes with dotted lines.

Evolutionary rate of salmonids

B. tsinlingensis and 8 other salmonid fishes formed a Salmonidae cluster, and B. tsinlingensis was one of the most ancient salmonid lineages. The divergence time between E. lucius and salmonid fishes was 132.9 MYA, and the divergence time between B. tsinlingensis and other salmonid fishes was 41.4 MYA (Fig. 4a and Supplementary Figs. 11–13). Studies of mutation frequencies over time have shown that species vary in their evolutionary rates. Thus, the evolutionary rate of B. tsinlingensis and other salmonids may differ considerably from that of other fishes. The evolutionary rate was quite similar across salmonids and, on the whole, slower than that of the outgroup species. However, the evolutionary rate of B. tsinlingensis was the fastest among the salmonids with published genomes (Fig. 4b and Supplementary Tables 15 and 16). These results indicate that after Ss4R, all of the salmonids shared a much slower evolutionary rate.

Fig. 4. — Comparative genomics of *B. tsinlingensis* and other closely related species. a) The phylogenetic relationship and divergence time in these species. *L. oculatus* was used as an outgroup species. b) The relative evolutionary rate of these species; the reference species was *B. tsinlingensis*. c) The Ka/Ks distribution of these species calculated by PAML.

Positively selected genes

Species often face various environmental pressures, which may explain the presence of rapidly evolving, positively selected genes; specific gene loci may also accumulate nucleotide substitutions. We identified positively selected genes in B. tsinlingensis and in the salmonid lineage, with E. lucius as the only outgroup species. B. tsinlingensis had the highest Ka/Ks value among these species, indicating that it possesses more rapidly evolving genes than the other species (Fig. 4c). This result is consistent with the findings above indicating that B. tsinlingensis has the fastest evolutionary rate among salmonids. There were 36 positively selected genes in B. tsinlingensis and 14 positively selected genes in the salmonid lineage. Four positively selected genes in B. tsinlingensis were related to the immune response: c1qb, cebpb, tnfa, and psme1 (Supplementary Table 17). c1qb is associated with the brain immune system and social behavior (Ma et al. 2015), cebpb is a key immune-related gene that may be involved in sepsis (Xu et al. 2020), tnfa plays a vital role in the immune response by regulating several pathways that produce an immediate inflammatory reaction (Holbrook et al. 2019), and psme1 may play an important role in the progression of Fanconi anemia to acute myeloid leukemia (Hou et al. 2020). For the salmonid lineage, 2 positively selected genes were related to cell division and muscle development. btg3 may play an important role in tumor suppression and promotes activation of the checkpoint kinase CHK1, a key effector in the cell cycle checkpoint response (Cheng et al. 2013). rapsn has been shown to be associated with congenital myasthenic syndromes. In addition, cld10 can form paracellular channels with ion selectivity; its variant causes anhidrosis and kidney damage (Joakim et al. 2017) (Supplementary Table 18).
Both muscle function and kidney function are important for the anadromy of salmonids: salmonids must be able to inhabit both freshwater and seawater, which requires adaptation to both environments, and anadromy also requires strong muscles. The positive selection of muscle-related and kidney-related genes in the salmonid lineage may therefore be associated with the migratory habits of these species.
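As an illustration of how a gene is typically called positively selected from PAML output, the sketch below applies a likelihood ratio test to the log-likelihoods of a null and an alternative codon model. It is a minimal sketch under stated assumptions, not the pipeline used in this study: the 1 degree of freedom, the 0.05 threshold, the function names, and the example log-likelihood values are all hypothetical.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a likelihood ratio test with 1 degree of freedom.

    lnl_null and lnl_alt are the log-likelihoods of the nested null model
    and of the alternative model that allows positive selection.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        return 1.0
    # Chi-square survival function for df = 1: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def is_positively_selected(lnl_null: float, lnl_alt: float, alpha: float = 0.05) -> bool:
    """Flag a gene when the alternative model fits significantly better."""
    return lrt_pvalue(lnl_null, lnl_alt) < alpha


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one gene (not values from this study).
    print(lrt_pvalue(-2451.3, -2446.9))
    print(is_positively_selected(-2451.3, -2446.9))
```

In practice the p-values from all tested genes would also be corrected for multiple testing before a gene list is reported.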

Muscle development genes in B. tsinlingensis

PCA based on the expressed genes showed that the samples from each tissue/organ (muscle, liver, spleen, heart, and kidney) clustered tightly, and that the expression patterns of liver and muscle differed from those of the other tissues/organs (Fig. 5a). To explore muscle development and the high nutritional value of the meat of B. tsinlingensis, muscle and the other 4 tissues/organs were used for RNA-seq. Differential gene expression analysis using DESeq2 identified 1,380 genes that were specifically highly expressed in muscle. The GO enrichment results showed that these genes were related to muscle development and movement, including contractile fiber, myofibril, sarcomere, striated muscle thin filament, troponin, cytoskeletal part, actin cytoskeleton, structural constituent of myelin sheath, structural molecule activity, and calcium ion binding (Fig. 5b). The enriched KEGG pathways included protein digestion and absorption, insulin signaling pathway, biosynthesis of amino acids, fructose and mannose metabolism, and regulation of actin cytoskeleton (Fig. 5c). All these pathways are related to muscle development and movement, which may be associated with the migratory habits of B. tsinlingensis.
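The enrichment step can be illustrated with a one-sided hypergeometric (over-representation) test, a common choice for GO and KEGG analyses. This is a sketch only: the test actually used is not stated in this section, the function name is hypothetical, and every number in the example except the 1,380 muscle-specific genes is made up for illustration.

```python
from math import comb


def enrichment_pvalue(n_background: int, n_term_background: int,
                      n_selected: int, n_term_selected: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_term_selected).

    n_background:      annotated genes in the whole gene set
    n_term_background: background genes annotated with the term
    n_selected:        genes in the list of interest
    n_term_selected:   genes in the list annotated with the term
    """
    total = comb(n_background, n_selected)
    upper = min(n_term_background, n_selected)
    tail = sum(
        comb(n_term_background, k)
        * comb(n_background - n_term_background, n_selected - k)
        for k in range(n_term_selected, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # 1,380 muscle-specific genes comes from the text; the background size,
    # term size, and overlap below are hypothetical.
    print(enrichment_pvalue(20000, 150, 1380, 40))
```

A small p-value means the term appears among the muscle-specific genes more often than expected by chance; as with any enrichment analysis, p-values across terms would normally be adjusted for multiple testing.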

Figure 5. — Gene expression profile of *B. tsinlingensis*. a) PCA for these organs/tissues in *B. tsinlingensis*. b) GO enrichment pathway of specifically highly expressed genes in muscle. c) KEGG enrichment pathway of specifically highly expressed genes in muscle.

Summary and conclusions

B. tsinlingensis is the first species in Brachymystax with a sequenced genome. Its genomic resources not only provide new insights into its evolution but also supply basic data for future studies of salmonid evolution. We presented the first chromosome-level genome assembly of B. tsinlingensis, with an N50 of ∼50.15 Mb. Several different methods confirmed the high quality and accuracy of the assembled genome. Comparison of the genome of B. tsinlingensis with those of other salmonids revealed that the basic genome characteristics of salmonids were similar after the Ss4R event. Ss4R also caused an increase in TEs and in the number of genes in salmonids, and several positively selected genes were identified. The evolutionary rate of all salmonids was slower than that of other species. In addition, there is a pressing need to protect this endangered species. Habitat conservation is particularly important for ensuring the long-term survival and viability of B. tsinlingensis populations.
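For readers unfamiliar with the N50 statistic quoted above, the sketch below shows the standard definition: the length of the sequence at which the cumulative length, counted from the longest sequence down, first reaches half of the total assembly length. The function name and the toy lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that brings the running total to at least
        # half of the assembly defines the N50.
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")


if __name__ == "__main__":
    # Toy scaffold lengths in bp (hypothetical).
    print(n50([2, 3, 4, 5, 6]))
```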

Data availability

All the raw genome sequencing data from different platforms, the RNA-seq data, and the assembled genome generated during the current study are available in the National Center for Biotechnology Information (NCBI) under BioProject accession number PRJNA713905.

Supplemental material is available at G3 online.

Acknowledgments

The authors thank Professor Feng Wang (Yellow River Fisheries Research Institute, Chinese Academy of Fishery Science) for assistance with some materials.

Funding

This work was funded by grants from the National Natural Science Foundation of China (31872203) and the Fundamental Research Funds for the Central Universities of Shaanxi Normal University (2020TS051).

Conflicts of interest

None declared.

Author contributions

PY and WZ conceived and designed the investigation. LN and LJ performed field and laboratory work. ZW assembled the genome. HL performed the Hi-C scaffolding. YR, WZ, ZW, and HL analyzed the genome data. PL and PY contributed materials and reagents. PY and WZ wrote the article. All the authors read and approved the final manuscript.

Contributor Information

Wenbo Zhu, College of Life Science, Shaanxi Normal University, Xi’an 710062, P. R. China; School of Ecology and Environment, Northwestern Polytechnical University, Xi’an 710072, P. R. China.

Zhongkai Wang, School of Ecology and Environment, Northwestern Polytechnical University, Xi’an 710072, P. R. China.

Haorong Li, School of Ecology and Environment, Northwestern Polytechnical University, Xi’an 710072, P. R. China.

Ping Li, Centre for Research on Environmental Ecology and Fish Nutrition (CREEFN) of the Ministry of Agriculture and Rural Affairs, Shanghai Ocean University, Shanghai 201306, P. R. China.

Lili Ni, College of Life Science, Shaanxi Normal University, Xi’an 710062, P. R. China.

Li Jiao, College of Life Science, Shaanxi Normal University, Xi’an 710062, P. R. China.

Yandong Ren, School of Ecology and Environment, Northwestern Polytechnical University, Xi’an 710072, P. R. China.

Ping You, College of Life Science, Shaanxi Normal University, Xi’an 710062, P. R. China.

Literature cited

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410.
Bedell JA, Korf I, Gish W. MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics. 2000;16(11):1040–1041.
Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573–580.
Berthelot C, Brunet F, Chalopin D, Juanchich A, Bernard M, Noël B, Bento P, Da Silva C, Labadie K, Alberti A, et al. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nat Commun. 2014;5:3657.
Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome Res. 2004;14(5):988–995.
Bowen NJ, McDonald JF. Drosophila euchromatic LTR retrotransposons are much younger than the host species in which they reside. Genome Res. 2001;11(9):1527–1540.
Chalopin D, Naville M, Plard F, Galiana D, Volff JN. Comparative analysis of transposable elements highlights mobilome diversity and evolution in vertebrates. Genome Biol Evol. 2015;7(2):567–580.
Chang Z, Li G, Liu J, Zhang Y, Ashby C, Liu D, Cramer CL, Huang X. Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol. 2015;16(1):30.
Cheng YC, Lin TY, Shieh SY. Candidate tumor suppressor BTG3 maintains genomic stability by promoting Lys63-linked ubiquitination and activation of the checkpoint kinase CHK1. Proc Natl Acad Sci USA. 2013;110(15):5993–5998.
Christensen KA, Leong JS, Dionne S, Biagi CA, Minkley DR, Withler RE, Rondeau E, Koop BF, Devlin RH, Meador JP. Chinook salmon (Oncorhynchus tshawytscha) genome and transcriptome. PLoS One. 2018;13(4):e0195461.
Christensen KA, Rondeau EB, Minkley DR, Sakhrani D, Biagi CA, Flores A-M, Withler RE, Pavey SA, Beacham TD, Godin T, et al. The sockeye salmon genome, transcriptome, and analyses identifying population defining regions of the genome. PLoS One. 2020;15(10):e0240935.
de Boer JG, Yazawa R, Davidson WS, Koop BF. Bursts and horizontal evolution of DNA transposons in the speciation of pseudotetraploid salmonids. BMC Genomics. 2007;8:422.
Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander ES, Aiden AP, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356(6333):92–95.
Durand N, Shamim M, Machol I, Rao SP, Huntley M, Lander E, Aiden EL. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3(1):95–98.
Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004a;5(5):113.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004b;32(5):1792–1797.
Froese R, Pauly D. 2014. FishBase. World wide web electronic publication. (http://www.fishbase.org).
Gao G, Magadan S, Waldbieser GC, Youngblood RC, Wheeler PA, Scheffler BE, Thorgaard GH, Palti Y. A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals chromosomal rearrangements in rainbow trout. G3 (Bethesda). 2021;11(4):jkab052.
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–5666.
Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9(1):R7.
Holbrook J, Lara-Reyna S, Jarosz-Griffiths H, McDermott M. Tumour necrosis factor signalling in health and disease [version 1; peer review: 2 approved]. F1000Research. 2019;8:111.
Hou H, Li D, Gao J, Gao L, Lu Q, Hu Y, Wu S, Chu X, Yao Y, Wan L, et al. Proteomic profiling and bioinformatics analysis identify key regulators during the process from Fanconi anemia to acute myeloid leukemia. Am J Transl Res. 2020;12(4):1415–1427.
Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S, McLaren K, Matthews L, et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature. 2013;496(7446):498–503.
Inoue JG, Miya M, Tsukamoto K, Nishida M. Basal actinopterygian relationships: a mitogenomic perspective on the phylogeny of the “ancient fish”. Mol Phylogenet Evol. 2003;26(1):110–120.
Ishiguro NB, Miya M, Nishida M. Basal euteleostean relationships: a mitogenomic perspective on the phylogenetic reality of the “Protacanthopterygii”. Mol Phylogenet Evol. 2003;27(3):476–488.
Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004;431(7011):946–957.
Joakim K, Susanne M, Muhammad T, Muhammad J, Tilman B, Jens S, Ambrin F, Maria A, Muhammad S, Mäbert K, et al. Altered paracellular cation permeability due to a rare CLDN10B variant causes anhidrosis and kidney damage. PLoS Genetics. 2017;13(7):e1006897.
Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, Ahsan B, Yamada T, Nagayasu Y, Doi K, Kasai Y, et al. The medaka draft genome and insights into vertebrate genome evolution. Nature. 2007;447(7145):714–719.
Kent WJ. BLAT–the BLAST-like alignment tool. Genome Res. 2002;12(4):656–664.
Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011;21(3):487–493.
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19(9):1639–1645.
Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–1874.
Li L, Stoeckert CJ, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178–2189.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–1760.
Lien S, Koop BF, Sandve SR, Miller JR, Kent MP, Nome T, Hvidsten TR, Leong JS, Minkley DR, Zimin A, et al. The Atlantic salmon genome provides insights into rediploidization. Nature. 2016;533(7602):200–205.
Liu B, Shi Y, Yuan J, Hu X, Zhang H, Li N, Li Z, Chen Y, Mu D, Fa NW. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Quant Biol. 2013;35(s 1–3):62–67.
Ma L, Piirainen S, Kulesskaya N, Rauvala H, Tian L. Association of brain immune genes with social behavior of inbred mouse strains. J Neuroinflam. 2015;12(1):75.
Macqueen DJ, Johnston IA. A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification. Proc Biol Sci. 2014;281(1778):20132881.
McArthur AG, Hegelund T, Cox RL, Stegeman JJ, Liljenberg M, Olsson U, Sundberg P, Celander MC. Phylogenetic analysis of the cytochrome P450 3 (CYP3) gene family. J Mol Evol. 2003;57(2):200–211.
Mungpakdee S, Seo HC, Angotzi AR, Dong X, Akalin A, Chourrout D. Differential evolution of the 13 Atlantic salmon Hox clusters. Mol Biol Evol. 2008;25(7):1333–1343.
Near TJ, Eytan RI, Dornburg A, Kuhn KL, Moore JA, Davis MP, Wainwright PC, Friedman M, Smith WL. Resolution of ray-finned fish phylogeny and timing of diversification. Proc Natl Acad Sci USA. 2012;109(34):13698–13703.
Ren J, Liang G. Resource survey report of Brachymystax lenok tsinlingensi in Qianhe river valleys of Qinling mountains. J Shaanxi Normal Univ Nat Sci Ed. 2004;(S2):165–168.
Robertson FM, Gundappa MK, Grammes F, Hvidsten TR, Redmond AK, Lien S, Martin SAM, Holland PWH, Sandve SR, Macqueen DJ. Lineage-specific rediploidization is a mechanism to explain time-lags between genome duplication and evolutionary diversification. Genome Biol. 2017;18(1):111.
Sanderson MJ. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics. 2003;19(2):301–302.
SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20(1):43–45.
Si S, Wang Y, Xu G, Yang S, Mou Z, Song Z. Complete mitochondrial genomes of two lenoks, Brachymystax lenok and Brachymystax lenok tsinlingensis. Mitochondrial DNA. 2012;23(5):338–340.
Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–1313.
Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19(Suppl 2):ii215–ii225.
Stellwag EJ. Hox gene duplication in fish. Semin Cell Dev Biol. 1999;10(5):531–540.
Takezaki N, Rzhetsky A, Nei M. Phylogenetic test of the molecular clock and linearized trees. Mol Biol Evol. 1995;12(5):823–833.
Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH. Synteny and collinearity in plant genomes. Science. 2008;320(5875):486–488.
Thorgaard GH, Allendorf FW, Knudsen KL. Gene-centromere mapping in rainbow trout: high interference over long map distances. Genetics. 1983;103(4):771–783.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963.
Wang D, Zhang Y, Zhang Z, Zhu J, Yu J. KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinformatics. 2010;8(1):77–80.
Xing YC, Lv BB, Ye EQ, Fan EY, Li SY, Wang LX, Zhang CG, Zhao YH. Revalidation and redescription of Brachymystax tsinlingensis Li, 1966 (Salmoniformes: Salmonidae) from China. Zootaxa. 2015;3962:191–205.
Xu C, Xu J, Lu L, Tian W, Ma J, Wu M. Identification of key genes and novel immune infiltration-associated biomarkers of sepsis. Innate Immun. 2020;26(8):666–682.
Yang D, Li X, Cheng B. The distributing actuality and protecting countermeasure of rare aquatic animals in Xushui river of Qinling mountains. J Fish Sci Chin. 1999;6(3):123–125.
Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–1591.
Yu JN, Kwak M. The complete mitochondrial genome of Brachymystax lenok tsinlingensis (Salmoninae, Salmonidae) and its intraspecific variation. Gene. 2015;573(2):246–253.
Zdobnov EM, Apweiler R. InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17(9):847–848.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

jkac162_Supplementary_Data

Click here for additional data file.^{(4.6MB, docx)}

Data Availability Statement

Supplemental material is available at G3 online.

[jkac162-B1] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410. [DOI] [PubMed] [Google Scholar]

[jkac162-B2] Bedell JA, Korf I, Gish W.. Maskeraid: aperformance enhancement to repeatmasker. Bioinformatics. 2000;16(11):1040–1041. [DOI] [PubMed] [Google Scholar]

[jkac162-B3] Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27(2):573–580. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B4] Berthelot C, Brunet F, Chalopin D, Juanchich A, Bernard M, Noël B, Bento P, Da Silva C, Labadie K, Alberti A, et al. The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nat Commun. 2014;5:3657. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B5] Birney E, Clamp M, Durbin R.. Genewise and genomewise. Genome Res. 2004;14(5):988–995. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B6] Bowen NJ, McDonald JF.. Drosophila euchromatic LTR retrotransposons are much younger than the host species in which they reside. Genome Res. 2001;11(9):1527–1540. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B7] Chalopin D, Naville M, Plard F, Galiana D, Volff JN.. Comparative analysis of transposable elements highlights mobilome diversity and evolution in vertebrates. Genome Biol Evol. 2015;7(2):567–580. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B8] Chang Z, Li G, Liu J, Zhang Y, Ashby C, Liu D, Cramer CL, Huang X.. Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol. 2015;16(1):30. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B9] Cheng YC, Lin TY, Shieh SY.. Candidate tumor suppressor BTG3 maintains genomic stability by promoting Lys63-linked ubiquitination and activation of the checkpoint kinase chk1. Proc Natl Acad Sci USA. 2013;110(15):5993–5998. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B10] Christensen KA, Leong JS, Dionne S, Biagi CA, Minkley DR, Withler RE, Rondeau E, Koop BF, Devlin RH, Meador JP.. Chinook salmon (Oncorhynchus tshawytscha) genome and transcriptome. PLoS One. 2018;13(4):e0195461. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B11] Christensen KA, Rondeau EB, Minkley DR, Sakhrani D, Biagi CA, Flores A-M, Withler RE, Pavey SA, Beacham TD, Godin T, et al. The sockeye salmon genome, transcriptome, and analyses identifying population defining regions of the genome. PLoS One. 2020;15(10):e0240935. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B12] de Boer JG, Yazawa R, Davidson WS, Koop BF.. Bursts and horizontal evolution of DNA transposons in the speciation of pseudotetraploid salmonids. BMC Genomics. 2007;8:422. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B13] Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, Shamim MS, Machol I, Lander ES, Aiden AP, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356(6333):92–95. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B14] Durand N, Shamim M, Machol I, Rao SP, Huntley M, Lander E, Aiden EL.. Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Syst. 2016;3(1):95–98. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B15] Edgar RC. Muscle: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004a;5(5):113. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B16] Edgar RC. Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004b;32(5):1792–1797. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B17] Froese R, Pauly D.. 2014. Fishbase. World wide web electronic publication. (http://www.fishbase.org).

[jkac162-B18] Gao G, Magadan S, Waldbieser GC, Youngblood RC, Wheeler PA, Scheffler BE, Thorgaard GH, Palti Y.. A long reads-based de-novo assembly of the genome of the Arlee homozygous line reveals chromosomal rearrangements in rainbow trout. G3 (Bethesda). 2021;11(4):jkab052. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B19] Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI, Maiti R, Ronning CM, Rusch DB, Town CD, et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31(19):5654–5666. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B20] Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR.. Automated eukaryotic gene structure annotation using evidencemodeler and the program to assemble spliced alignments. Genome Biol. 2008;9(1):R7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B21] Holbrook J, Lara-Reyna S, Jarosz-Griffiths H, McDermott M.. Tumour necrosis factor signalling in health and disease [version 1; peer review: 2 approved]. F1000Research. 2019;8:111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B22] Hou H, Li D, Gao J, Gao L, Lu Q, Hu Y, Wu S, Chu X, Yao Y, Wan L, et al Proteomic profiling and bioinformatics analysis identify key regulators during the process from fanconi anemia to acute myeloid leukemia. Am J Transl Res. 2020;12(4):1415–1427. [PMC free article] [PubMed] [Google Scholar]

[jkac162-B23] Howe K, Clark MD, Torroja CF, Torrance J, Berthelot C, Muffato M, Collins JE, Humphray S, McLaren K, Matthews L, et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature. 2013;496(7446):498–503. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B24] Inoue JG, Miya M, Tsukamoto K, Nishida M.. Basal actinopterygian relationships: a mitogenomic perspective on the phylogeny of the “ancient fish”. Mol Phylogenet Evol. 2003;26(1):110–120. [DOI] [PubMed] [Google Scholar]

[jkac162-B25] Ishiguro NB, Miya M, Nishida M.. Basal euteleostean relationships: a mitogenomic perspective on the phylogenetic reality of the “protacanthopterygii”. Mol Phylogenet Evol. 2003;27(3):476–488. [DOI] [PubMed] [Google Scholar]

[jkac162-B26] Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, et al. Genome duplication in the teleost fish tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004;431(7011):946–957. [DOI] [PubMed] [Google Scholar]

[jkac162-B27] Joakim K, Susanne M, Muhammad T, Muhammad J, Tilman B, Jens S, Ambrin F, Maria A, Muhammad S, Mäbert K, et al. Altered paracellular cation permeability due to a rare CLDN10B variant causes anhidrosis and kidney damage. PLoS Genetics. 2017;13(7):e1006897. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B28] Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, Ahsan B, Yamada T, Nagayasu Y, Doi K, Kasai Y, et al. The medaka draft genome and insights into vertebrate genome evolution. Nature. 2007;447(7145):714–719. [DOI] [PubMed] [Google Scholar]

[jkac162-B29] Kent WJ. Blat–the blast-like alignment tool. Genome Res. 2002;12(4):656–664. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B30] Kiełbasa SM, Wan R, Sato K, Horton P, Frith MC.. Adaptive seeds tame genomic sequence comparison. Genome Res. 2011;21(3):487–493. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B31] Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA.. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19(9):1639–1645. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B32] Kumar S, Stecher G, Tamura K.. Mega7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–1874. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B33] Li L, Stoeckert CJ, Roos DS.. Orthomcl: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13(9):2178–2189. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B34] Li H, Durbin R.. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics. 2009;25(14):1754–1760. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B35] Lien S, Koop BF, Sandve SR, Miller JR, Kent MP, Nome T, Hvidsten TR, Leong JS, Minkley DR, Zimin A, et al. The Atlantic salmon genome provides insights into rediploidization. Nature. 2016;533(7602):200–205. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B36] Liu B, Shi Y, Yuan J, Hu X, Zhang H, Li N, Li Z, Chen Y, Mu D, Fa NW.. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Quant Biol. 2013;35(s 1–3):62–67. [Google Scholar]

[jkac162-B37] Ma L, Piirainen S, Kulesskaya N, Rauvala H, Tian L.. Association of brain immune genes with social behavior of inbred mouse strains. J Neuroinflam. 2015;12(1):75. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B38] Macqueen DJ, Johnston IA.. A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification. Proc Biol Sci. 2014;281(1778):20132881. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B39] McArthur AG, Hegelund T, Cox RL, Stegeman JJ, Liljenberg M, Olsson U, Sundberg P, Celander MC.. Phylogenetic analysis of the cytochrome p450 3 (cyp3) gene family. J Mol Evol. 2003;57(2):200–211. [DOI] [PubMed] [Google Scholar]

[jkac162-B40] Mungpakdee S, Seo HC, Angotzi AR, Dong X, Akalin A, Chourrout D.. Differential evolution of the 13 Atlantic salmon Hox clusters. Mol Biol Evol. 2008;25(7):1333–1343. [DOI] [PubMed] [Google Scholar]

[jkac162-B41] Near TJ, Eytan RI, Dornburg A, Kuhn KL, Moore JA, Davis MP, Wainwright PC, Friedman M, Smith WL.. Resolution of ray-finned fish phylogeny and timing of diversification. Proc Natl Acad Sci USA. 2012;109(34):13698–13703. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B42] Ren J, Liang G.. Resource survey report of Brachymystax lenok tsinlingensi in Qianhe river valleys of Qinling mountains. J Shaanxi Normal Univ Nat Sci Ed. 2004;(S2):165–168. [Google Scholar]

[jkac162-B43] Robertson FM, Gundappa MK, Grammes F, Hvidsten TR, Redmond AK, Lien S, Martin SAM, Holland PWH, Sandve SR, Macqueen DJ.. Lineage-specific rediploidization is a mechanism to explain time-lags between genome duplication and evolutionary diversification. Genome Biol. 2017;18(1):111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B44] Sanderson MJ. R8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics. 2003;19(2):301–302. [DOI] [PubMed] [Google Scholar]

[jkac162-B45] SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL.. The paleontology of intergene retrotransposons of maize. Nat Genet. 1998;20(1):43–45. [DOI] [PubMed] [Google Scholar]

[jkac162-B46] Si S, Wang Y, Xu G, Yang S, Mou Z, Song Z.. Complete mitochondrial genomes of two lenoks, Brachymystax lenok and brachymystax lenok tsinlingensis. Mitochondrial DNA. 2012;23(5):338–340. [DOI] [PubMed] [Google Scholar]

[jkac162-B47] Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM.. Busco: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. [DOI] [PubMed] [Google Scholar]

[jkac162-B48] Stamatakis A. Raxml version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–1313. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B49] Stanke M, Waack S.. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003;19(Suppl 2):ii215–ii225. [DOI] [PubMed] [Google Scholar]

[jkac162-B50] Stellwag EJ. Hox gene duplication in fish. Semin Cell Dev Biol. 1999;10(5):531–540. [DOI] [PubMed] [Google Scholar]

[jkac162-B51] Takezaki N, Rzhetsky A, Nei M.. Phylogenetic test of the molecular clock and linearized trees. Mol Biol Evol. 1995;12(5):823–833. [DOI] [PubMed] [Google Scholar]

[jkac162-B52] Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH.. Synteny and collinearity in plant genomes. Science. 2008;320(5875):486–488. [DOI] [PubMed] [Google Scholar]

[jkac162-B53] Thorgaard GH, Allendorf FW, Knudsen KL.. Gene-centromere mapping in rainbow trout: high interference over long map distances. Genetics. 1983;103(4):771–783. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B54] Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B55] Wang D, Zhang Y, Zhang Z, Zhu J, Yu J.. Kaks_calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genomics Proteomics Bioinformatics. 2010;8(1):77–80. [DOI] [PMC free article] [PubMed] [Google Scholar]

[jkac162-B56] Xing YC, Lv BB, Ye EQ, Fan EY, Li SY, Wang LX, Zhang CG, Zhao YH. Revalidation and redescription of Brachymystax tsinlingensis Li, 1966 (Salmoniformes: Salmonidae) from China. Zootaxa. 2015;3962:191–205.

[jkac162-B57] Xu C, Xu J, Lu L, Tian W, Ma J, Wu M. Identification of key genes and novel immune infiltration-associated biomarkers of sepsis. Innate Immun. 2020;26(8):666–682.

[jkac162-B58] Yang D, Li X, Cheng B. The distributing actuality and protecting countermeasure of rare aquatic animals in Xushui river of Qinling mountains. J Fish Sci Chin. 1999;6(3):123–125.

[jkac162-B59] Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586–1591.

[jkac162-B60] Yu JN, Kwak M. The complete mitochondrial genome of Brachymystax lenok tsinlingensis (Salmoninae, Salmonidae) and its intraspecific variation. Gene. 2015;573(2):246–253.

[jkac162-B61] Zdobnov EM, Apweiler R. InterProScan – an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17(9):847–848.
