Abstract
We present a genome assembly from an individual female Salmo trutta (the brown trout; Chordata; Actinopteri; Salmoniformes; Salmonidae). The genome sequence is 2.37 gigabases in span. The majority of the assembly is scaffolded into 40 chromosomal pseudomolecules. Gene annotation of this assembly on Ensembl has identified 43,935 protein-coding genes.
Keywords: Salmo trutta, brown trout, genome sequence, chromosomal
Species taxonomy
Metazoa; Chordata; Craniata; Actinopterygii; Actinopteri; Neopterygii; Teleostei; Euteleosteomorpha; Salmoniformes; Salmonidae; Salmoninae; Salmo; Salmo trutta Linnaeus 1758 (NCBItxid:8032).
Introduction
The brown trout, Salmo trutta, is native to Europe, western Asia and North Africa; however, the species has been successfully introduced to a multitude of other geographical locations (Klemetsen et al., 2003). Genetically similar S. trutta can be freshwater residents, freshwater migrants or anadromous (migrating to the sea to feed, only returning to freshwater to breed), which initially led taxonomists to believe that these were multiple independent species. This phenotypic difference has a genetic component but is also partly caused by environmental factors, such as food availability, which lead to changes in gene expression and drive migration and adaptation to different environments (Ferguson et al., 2019). S. trutta also exhibit considerable genetic variation within migratory or resident populations; these differences can be seen between populations in different habitats (Ferguson, 1989) or in the same habitat (Andersson et al., 2017). This genetic diversity can allow populations to occupy different environments, such as those with high levels of acidity (Prodöhl et al., 2019).
This reference genome sequence will be useful for researchers who wish to sample and analyse the genetics of S. trutta populations, helping them to understand the genetic drivers behind migration and the reasons why different populations of brown trout are so well adapted to different conditions. As rising atmospheric CO2 continues to increase temperatures and acidify oceans, this information will help conservation of S. trutta and other species by revealing which genetic components allow populations to adapt to warmer and more acidic environments.
Genome sequence report
The genome was sequenced from a single female Salmo trutta bred at the Institute of Marine Research, Bergen, Norway. A total of 52-fold coverage in Pacific Biosciences single-molecule long reads (N50 19 kb) and 70-fold coverage in 10X Genomics read clouds (from molecules with an estimated N50 of 65 kb) were generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data and 67-fold coverage of Bionano optical maps. Manual assembly curation corrected 175 missing joins or misjoins, reducing the scaffold number by 4.8% and the assembly length by 0.5%. The final assembly has a total length of 2.37 Gb in 1,441 sequence scaffolds with a scaffold N50 of 52.21 Mb (Table 1). The majority, 91.5%, of the assembly sequence was assigned to 40 chromosomal-level scaffolds, representing 40 autosomes (numbered by sequence length). No sex chromosomes could be identified (Figure 1; Table 2). The assembly has a BUSCO (Simão et al., 2015) completeness of 97.2% using the actinopterygii_odb10 reference set. Genome assembly metrics, GC coverage, cumulative sequence and the Hi-C contact map are visualised in Figure 1–Figure 4, respectively.
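For readers less familiar with the N50 statistic quoted in this report, the following is a minimal Python sketch of how it can be computed from a list of sequence lengths. The function name `n50` and the toy lengths are illustrative assumptions; this is not the code used to produce the fSalTru1.1 metrics.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together cover at least half of the total span.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_span = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the span is covered.
    for length in ordered:
        running += length
        if running >= half_span:
            return length
    raise AssertionError("unreachable: the loop always returns")


# Toy example (hypothetical lengths, arbitrary units).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

The same function applies to contigs or scaffolds; only the input list changes.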
Figure 1. Genome assembly of Salmo trutta, fSalTru1.1: metrics.
The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Salmo%20trutta/dataset/CAAJIE01/snail.
Figure 4. Genome assembly of Salmo trutta, fSalTru1.1: Hi-C contact map.
Hi-C contact map of the fSalTru1.1 assembly, visualised in HiGlass.
Table 1. Genome data for Salmo trutta, fSalTru1.1.
Project accession data | |
---|---|
Assembly identifier | fSalTru1.1 |
Species | Salmo trutta |
Specimen | fSalTru1 |
NCBI taxonomy ID | txid8032 |
BioProject | PRJEB32115 |
BioSample ID | SAMEA994732 |
Isolate information | Female, muscle |
Raw data accessions | |
Pacific Biosciences SEQUEL I | ERX3245920, ERX3253848-ERX3253850, ERX3279922-ERX3279929, ERX3288373, ERX3311049-ERX3311054, ERX3311066, ERX3318044-ERX3318049, ERX3338928, ERX3338929 |
10X Genomics Illumina | ERX3341615-ERX3341622 |
Hi-C Illumina | ERX4142808-ERX4142812 |
BioNano | ERZ1395486 |
Genome assembly | |
Assembly accession | GCA_901001165.1 |
Span (Mb) | 2,372 |
Number of contigs | 5,378 |
Contig N50 length (Mb) | 1.7 |
Number of scaffolds | 1,441 |
Scaffold N50 length (Mb) | 52.2 |
Longest scaffold (Mb) | 81.5 |
BUSCO* genome score | C:94.7%[S:49.4%,D:45.3%],F:1.8%,M:3.5%,n:4584 |
Genome annotation | |
Number of protein-coding genes | 43,935 |
Average coding sequence length (bp) | 2,058 |
Average number of exons per gene | 13 |
Average exon size (bp) | 210 |
Average intron size (bp) | 2,770 |
*BUSCO scores based on the actinopterygii_odb10 BUSCO set using v5.0.0. C= complete [S= single copy, D=duplicated], F=fragmented, M=missing, n=number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/Salmo%20trutta/dataset/CAAJIE01/busco.
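The compact BUSCO notation in Table 1 can be unpacked programmatically. The sketch below is a minimal Python example that parses the summary string and checks its internal consistency (C = S + D, and C + F + M = 100%). The function name `parse_busco` is a hypothetical helper and is not part of BUSCO or BlobToolKit.

```python
import re
from typing import Dict


def parse_busco(summary: str) -> Dict[str, float]:
    """Parse a BUSCO short summary string into a dict of scores.

    Percent categories (C, S, D, F, M) are returned as floats;
    the number of orthologues searched is stored under "n".
    """
    scores: Dict[str, float] = {
        key: float(value)
        for key, value in re.findall(r"([CSDFM]):([\d.]+)%", summary)
    }
    n_match = re.search(r"n:(\d+)", summary)
    if n_match:
        scores["n"] = int(n_match.group(1))
    return scores


# Summary string as reported in Table 1.
busco = parse_busco("C:94.7%[S:49.4%,D:45.3%],F:1.8%,M:3.5%,n:4584")

# Complete = single copy + duplicated (within rounding).
assert abs(busco["S"] + busco["D"] - busco["C"]) < 0.05
# Complete + fragmented + missing should account for all orthologues.
assert abs(busco["C"] + busco["F"] + busco["M"] - 100.0) < 0.05
print(busco)
```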
Table 2. Chromosomal pseudomolecules in the genome assembly of Salmo trutta, fSalTru1.1.
INSDC accession | Chromosome | Size (Mb) | GC% |
---|---|---|---|
LR584410.1 | 1 | 81.54 | 43.8 |
LR584445.1 | 2 | 75.35 | 43.6 |
LR584416.1 | 3 | 74.75 | 43.6 |
LR584420.1 | 4 | 73.17 | 43.2 |
LR584433.1 | 5 | 67.76 | 43.1 |
LR584406.1 | 6 | 60.1 | 43.5 |
LR584430.1 | 7 | 59.84 | 43.1 |
LR584407.1 | 8 | 51.19 | 43.8 |
LR584409.1 | 9 | 49.36 | 43.5 |
LR584419.1 | 10 | 46.6 | 43.2 |
LR584438.1 | 11 | 22.96 | 43.8 |
LR584441.1 | 12 | 97.53 | 43.8 |
LR584428.1 | 13 | 91.49 | 43.9 |
LR584411.1 | 14 | 86.25 | 43.3 |
LR584415.1 | 15 | 66.9 | 42.9 |
LR584431.1 | 16 | 61.35 | 43.1 |
LR584426.1 | 17 | 59.76 | 43.1 |
LR584435.1 | 18 | 59.14 | 43.1 |
LR584427.1 | 19 | 56.58 | 43.2 |
LR584429.1 | 20 | 55.16 | 43.2 |
LR584437.1 | 21 | 52.73 | 43.4 |
LR584440.1 | 22 | 52.21 | 43.6 |
LR584421.1 | 23 | 51.49 | 43.5 |
LR584412.1 | 24 | 50.33 | 43.2 |
LR584436.1 | 25 | 48.97 | 43.6 |
LR584439.1 | 26 | 48.7 | 44 |
LR584424.1 | 27 | 46.41 | 43.4 |
LR584422.1 | 28 | 46.38 | 43.5 |
LR584418.1 | 29 | 46.06 | 43.7 |
LR584432.1 | 30 | 45.79 | 43.7 |
LR584423.1 | 31 | 45.59 | 43.1 |
LR584408.1 | 32 | 44.95 | 43.9 |
LR584414.1 | 33 | 44.89 | 43.5 |
LR584434.1 | 34 | 42.9 | 43.9 |
LR584444.1 | 35 | 41.92 | 43.5 |
LR584442.1 | 36 | 41.68 | 43.9 |
LR584417.1 | 37 | 35.21 | 43.8 |
LR584425.1 | 38 | 34.89 | 43.3 |
LR584413.1 | 39 | 25.83 | 43.6 |
LR584443.1 | 40 | 25.48 | 44.1 |
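As a consistency check on Table 2, the chromosome sizes can be summed and compared with the assembly span in Table 1. The Python sketch below does this using the values transcribed from the two tables; the variable and function names are illustrative, and because the table values are rounded the result is only approximate.

```python
from typing import List

# Chromosome sizes in Mb, chromosomes 1 to 40, transcribed from Table 2.
CHROMOSOME_SIZES_MB: List[float] = [
    81.54, 75.35, 74.75, 73.17, 67.76, 60.1, 59.84, 51.19, 49.36, 46.6,
    22.96, 97.53, 91.49, 86.25, 66.9, 61.35, 59.76, 59.14, 56.58, 55.16,
    52.73, 52.21, 51.49, 50.33, 48.97, 48.7, 46.41, 46.38, 46.06, 45.79,
    45.59, 44.95, 44.89, 42.9, 41.92, 41.68, 35.21, 34.89, 25.83, 25.48,
]

# Total assembly span in Mb, from Table 1.
ASSEMBLY_SPAN_MB = 2372.0


def chromosomal_fraction(sizes_mb: List[float], span_mb: float) -> float:
    """Return the percentage of the assembly span placed on chromosomes."""
    return 100.0 * sum(sizes_mb) / span_mb


placed_mb = sum(CHROMOSOME_SIZES_MB)
percent = chromosomal_fraction(CHROMOSOME_SIZES_MB, ASSEMBLY_SPAN_MB)
print(f"{len(CHROMOSOME_SIZES_MB)} chromosomes, {placed_mb:.2f} Mb placed, "
      f"{percent:.1f}% of span")
```

The result should land close to the 91.5% reported in the genome sequence report, with any small difference attributable to rounding in the tables.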
Figure 2. Genome assembly of Salmo trutta, fSalTru1.1: GC coverage.
BlobToolKit GC-coverage plot. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Salmo%20trutta/dataset/CAAJIE01/blob?plotShape=circle.
Gene annotation
The Ensembl gene annotation system (Aken et al., 2016) was used to generate annotation for the fSalTru1.1 assembly (GCA_901001165.1) (Table 1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of vertebrate proteins from UniProt (UniProt Consortium, 2019). The resulting Ensembl annotation includes 122,381 transcripts assigned to 43,935 coding and 4,441 non-coding genes (Salmo trutta - Ensembl Rapid Release).
Methods
Owing to the high genetic diversity of brown trout and the variable chromosome numbers (S. trutta have 38-42 chromosomes, with multiple copies of these chromosomes), doubled haploid specimens were bred for sequencing and generation of the assembly. The doubled haploid female used in this study was bred on 26 November 2015 at the Institute of Marine Research using a protocol optimized for Atlantic salmon, Salmo salar (see Hansen et al., 2020). In summary, eggs from one Salmo trutta female from a domestic stock that originated from Lake Tunhovd in eastern Norway were fertilized with UV-irradiated milt (brown trout sperm diluted 1:40 with sperm fluid and irradiated (254 nm) for 8 min at 0.48 mW cm−2), activated and left to hydrate in 8°C freshwater in a polyethylene (PE) container. After 4700 min·°C irradiation, the PE bottle was transferred to a pressure chamber and the eggs were subjected to a hydrostatic pressure of 655 bar for 5 min. The eggs were incubated at approximately 6°C and surviving larvae were fed at 12°C and continuous light until June 2016, when temperature and photoperiod were changed to ambient conditions. On 16 January 2018, one female individual was euthanized (500 mg L−1 Finquel® (MS 222)) and sampled.
The specimen was transferred to the Wellcome Sanger Institute and DNA was extracted using an agarose plug extraction from spleen tissue following the Bionano Prep Animal Tissue DNA Isolation Soft Tissue Protocol.
Sequencing was performed by the Scientific Operations core at the Wellcome Sanger Institute on Pacific Biosciences SEQUEL I and Illumina HiSeq X instruments. Hi-C data were generated using the Arima Hi-C kit v1 by Arima Genomics, San Diego, USA, and sequenced on Illumina HiSeq X. BioNano data were generated in the Rockefeller University Vertebrate Genome laboratory using the Saphyr instrument. Ultra-high molecular weight DNA was extracted using the Bionano Prep Animal Tissue DNA Isolation Fibrous Tissue Protocol and assessed by pulsed field gel and Qubit 3 fluorimetry. DNA was labeled for Bionano Genomics optical mapping following the Bionano Prep Direct Label and Stain (DLS) Protocol and run on one Saphyr instrument chip flowcell. The total yield of tagged molecules ≥ 150 kb with at least 9 sites was 272.3 Gb (N50 0.28 Mb). A CMAP (Bionano assembly consensus genome map) was assembled de novo using Bionano Solve (see Table 3 for software versions and sources), with a total map length of 2.62 Gb and a map N50 of 29.37 Mb.
Table 3. Software tools used.
Software tool | Version | Source |
---|---|---|
Falcon-unzip | falcon-kit 1.2.1 | ( Chin et al., 2016) |
SALSA2 | 2.1 | ( Ghurye et al., 2019) |
scaff10x | 3.0 | https://github.com/wtsi-hpag/Scaff10X |
arrow | GenomicConsensus 2.2.2 | https://github.com/PacificBiosciences/GenomicConsensus |
longranger align | 2.2.2 | https://support.10xgenomics.com/genome-exome/software/pipelines/latest/advanced/other-pipelines |
freebayes | 1.1.0-3-g961e5f3 | ( Garrison & Marth, 2012) |
bcftools consensus | 1.9 | http://samtools.github.io/bcftools/bcftools.html |
Bionano Solve | 3.2.2_08222018 | https://bionanogenomics.com/downloads/bionano-solve/ |
HiGlass | 1.11.6 | ( Kerpedjiev et al., 2018) |
PretextViewer | 0.0.4 | https://github.com/wtsi-hpag/PretextView |
gEVAL | N/A | ( Chow et al., 2016) |
BlobToolKit | 1.2 | ( Challis et al., 2020) |
Assembly was carried out following the Vertebrate Genome Project pipeline v1.0 (Rhie et al., 2020) with Falcon-unzip (Chin et al., 2016) and a first round of scaffolding carried out with 10X Genomics read clouds using scaff10x. Hybrid scaffolding was performed using the BioNano DLE-1 data and BioNano Solve. Scaffolding with Hi-C data (Rao et al., 2014) was carried out with SALSA2 (Ghurye et al., 2019). The Hi-C scaffolded assembly was polished with arrow using the PacBio data, then polished with the 10X Genomics Illumina data by aligning to the assembly with longranger align, calling variants with freebayes (Garrison & Marth, 2012) and applying homozygous non-reference edits using bcftools consensus. Two rounds of the Illumina polishing were applied. The assembly was checked for contamination and corrected. Manual curation was performed as described previously (Howe et al., 2021) using the gEVAL system (Chow et al., 2016), Bionano Access, HiGlass and Pretext. Figure 1–Figure 3 and BUSCO values were generated using BlobToolKit (Challis et al., 2020).
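The short-read polishing step applies only homozygous non-reference calls to the assembly. The Python sketch below illustrates that idea on a toy sequence. It is not the authors' pipeline, which used freebayes and bcftools consensus; the `Variant` type and the helper names are hypothetical, and the sketch assumes biallelic, non-overlapping records with 1-based positions as in VCF.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    pos: int       # 1-based position of the first REF base
    ref: str       # reference (assembly) allele
    alt: str       # alternative allele (biallelic records assumed)
    genotype: str  # VCF-style genotype, e.g. "1/1" or "0/1"


def is_hom_non_ref(genotype: str) -> bool:
    """True if every called allele is the same non-reference allele."""
    alleles = genotype.replace("|", "/").split("/")
    return len(set(alleles)) == 1 and alleles[0] not in ("0", ".")


def apply_hom_edits(seq: str, variants: List[Variant]) -> str:
    """Apply homozygous non-reference edits to a sequence.

    Edits are applied from right to left so that indels do not shift
    the coordinates of edits that are still to be applied.
    """
    edits = [v for v in variants if is_hom_non_ref(v.genotype)]
    for v in sorted(edits, key=lambda e: e.pos, reverse=True):
        start = v.pos - 1
        end = start + len(v.ref)
        if seq[start:end] != v.ref:
            raise ValueError(f"REF mismatch at position {v.pos}")
        seq = seq[:start] + v.alt + seq[end:]
    return seq


# Toy example: one homozygous SNP, one heterozygous SNP (ignored),
# and one homozygous single-base deletion.
calls = [
    Variant(2, "C", "T", "1/1"),
    Variant(5, "A", "G", "0/1"),
    Variant(7, "GT", "G", "1/1"),
]
print(apply_hom_edits("ACGTACGT", calls))
```

Restricting edits to homozygous calls means that only positions where the reads consistently disagree with the assembly are changed, while heterozygous sites are left as assembled.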
Figure 3. Genome assembly of Salmo trutta, fSalTru1.1: cumulative sequence.
BlobToolKit cumulative sequence plot. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Salmo%20trutta/dataset/CAAJIE01/cumulative.
Data availability
Underlying data
BioProject: Salmo trutta RefSeq Genome, Accession number PRJNA550988: https://www.ncbi.nlm.nih.gov/bioproject/550988
The genome sequence is released openly for reuse. The S. trutta genome sequencing initiative is part of the Darwin Tree of Life (DToL) project and the Vertebrate Genome Project (VGP) ordinal references programme. All raw data and the assembly have been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.
Acknowledgements
We thank Mike Stratton and Julia Wilson for their support for the 25 genomes for 25 years project.
Funding Statement
This work was supported by Wellcome through core funding to the Wellcome Sanger Institute (206194) and the Darwin Tree of Life Discretionary Award (218328). SAM and RD are supported by Wellcome (207492).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
[version 1; peer review: 3 approved]
References
- Aken BL, Ayling S, Barrell D, et al.: The Ensembl Gene Annotation System. Database (Oxford). 2016;2016:baw093. 10.1093/database/baw093
- Andersson A, Jansson E, Wennerström L, et al.: Complex Genetic Diversity Patterns of Cryptic, Sympatric Brown Trout (Salmo Trutta) Populations in Tiny Mountain Lakes. Conserv Genet. 2017;18(5):1213–27. 10.1007/s10592-017-0972-4
- Challis R, Richards E, Rajan J, et al.: BlobToolKit - Interactive Quality Assessment of Genome Assemblies. G3 (Bethesda). 2020;10(4):1361–74. 10.1534/g3.119.400908
- Chin CS, Peluso P, Sedlazeck FJ, et al.: Phased Diploid Genome Assembly with Single-Molecule Real-Time Sequencing. Nat Methods. 2016;13(12):1050–54. 10.1038/nmeth.4035
- Chow W, Brugger K, Caccamo M, et al.: gEVAL - a Web-Based Browser for Evaluating Genome Assemblies. Bioinformatics. 2016;32(16):2508–10. 10.1093/bioinformatics/btw159
- Ferguson A: Genetic Differences among Brown Trout, Salmo Trutta, Stocks and Their Importance for the Conservation and Management of the Species. Freshw Biol. 1989;21(1):35–46. 10.1111/j.1365-2427.1989.tb01346.x
- Ferguson A, Reed TE, Cross TF, et al.: Anadromy, Potamodromy and Residency in Brown Trout Salmo Trutta: The Role of Genes and the Environment. J Fish Biol. 2019;95(3):692–718. 10.1111/jfb.14005
- Garrison E, Marth G: Haplotype-Based Variant Detection from Short-Read Sequencing. arXiv:1207.3907. 2012.
- Ghurye J, Rhie A, Walenz BP, et al.: Integrating Hi-C Links with Assembly Graphs for Chromosome-Scale Assembly. PLoS Comput Biol. 2019;15(8):e1007273. 10.1371/journal.pcbi.1007273
- Hansen TJ, Penman D, Glover KA, et al.: Production and Verification of the First Atlantic Salmon (Salmo Salar L.) Clonal Lines. BMC Genet. 2020;21(1):71. 10.1186/s12863-020-00878-8
- Howe K, Chow W, Collins J, et al.: Significantly Improving the Quality of Genome Assemblies through Curation. Gigascience. 2021;10(1):giaa153. 10.1093/gigascience/giaa153
- Kerpedjiev P, Abdennur N, Lekschas F, et al.: HiGlass: Web-Based Visual Exploration and Analysis of Genome Interaction Maps. Genome Biol. 2018;19(1):125. 10.1186/s13059-018-1486-1
- Klemetsen A, Amundsen PA, Dempson JB, et al.: Atlantic Salmon Salmo Salar L., Brown Trout Salmo Trutta L. and Arctic Charr Salvelinus Alpinus (L.): A Review of Aspects of Their Life Histories. Ecol Freshw Fish. 2003;12(1):1–59. 10.1034/j.1600-0633.2003.00010.x
- Prodöhl PA, Ferguson A, Bradley CR, et al.: Impacts of Acidification on Brown Trout Salmo Trutta Populations and the Contribution of Stocking to Population Recovery and Genetic Diversity. J Fish Biol. 2019;95(3):719–42. 10.1111/jfb.14054
- Rao SSP, Huntley MH, Durand NC, et al.: A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell. 2014;159(7):1665–80. 10.1016/j.cell.2014.11.021
- Rhie A, McCarthy SA, Fedrigo O, et al.: Towards Complete and Error-Free Genome Assemblies of All Vertebrate Species. bioRxiv. 2020;2020.05.22.110833. 10.1101/2020.05.22.110833
- Simão FA, Waterhouse RM, Ioannidis P, et al.: BUSCO: Assessing Genome Assembly and Annotation Completeness with Single-Copy Orthologs. Bioinformatics. 2015;31(19):3210–12. 10.1093/bioinformatics/btv351
- UniProt Consortium: UniProt: A Worldwide Hub of Protein Knowledge. Nucleic Acids Res. 2019;47(D1):D506–15. 10.1093/nar/gky1049