Abstract
Background
Salmonids are ray-finned fishes which constitute 11 genera and at least 70 species including Atlantic salmon, whitefishes, graylings, rainbow trout, and char. The common ancestor of all Salmonidae experienced a whole genome duplication (WGD) ~80 million years ago, resulting in an autotetraploid genome. Genomic rediplodization is still going on in salmonid species, providing an unique system for studying evolutionary consequences of whole genome duplication. In recent years, high quality genome sequences of Atlantic salmon and Rainbow trout has been established, due to their scientific and commercial values. In this paper we introduce SalmoBase (http://www.salmobase.org/), a tool for making molecular resources for salmonids public available in a framework of visualizations and analytic tools.
Results
SalmoBase has been developed as a part of the ELIXIR.NO project. Currently, SalmoBase contains molecular resources for Atlantic salmon and Rainbow trout. Data can be accessed through BLAST, Genome Browser (GBrowse), Genetic Variation Browser (GVBrowse) and Gene Expression Browser (GEBrowse).
Conclusions
To the best of our knowledge, SalmoBase is the first database which integrates salmonids data and allow users to study salmonids in an integrated framework. The database and its tools (e.g., comparative genomics tools, synteny browsers) will be expanded as additional public resources describing other Salmonidae genomes become available.
Keywords: Salmobase, Atlantic salmon, Salmonids, Genome browser
Background
Salmonids (e.g. Atlantic salmon (Salmo salar), Rainbow trout (Oncorhynchus mykiss), Brown trout (Salmo trutta)) has considerable socio- and economic importance. From a biological perspective the anadromous migration pattern of salmon is of great interest, and allow investigations of unique physiological traits such as smoltification and flesh pigmentation. The evolutionary history of salmonids is particularly interesting. A whole genome duplication (WDG) event took place in a common ancestor to all salmonids ~80 million years ago [1], which makes it possible to study post duplication phenomena in a recent time frame, in contrast to other polyploid origin vertebrates whose WGDs date back further in time. These phenomena include the effects of WGDs on gene diversity and functional specialization, as well as consequences on evolution and adaptation [2].
A high quality, annotated Atlantic salmon and Rainbow trout genome sequences are now available thanks to the efforts from the International Cooperation to Sequence the Atlantic Salmon (ICSASG) and associated partners [3] and The international collaboration to sequence Rainbow trout genome, and we expect that genome sequences and genomic data for other salmonid species will be available in the near future. SalmoBase (www.salmobase.org) was developed to make these substantial amounts of data accessible through visualizations and analytic tools in a common framework. We expect that genome sequences and genomic data for other salmonid species will be available in the near future and plan to integrate this information with SalmoBase.
As a first step, the genome and genome annotations for Atlantic salmon and Rainbow trout are made available through SalmoBase. For Atlantic salmon, tissue specific gene expression data and single nucleotide polymorphisms (SNP) data are also available. Similar resources for other salmonid species will be added to SalmoBase when they become available.
Construction and content
SalmoBase was developed using HTML, PHP, Javascript, Python and MySQL. The latest version of GBrowse [4] and BLAST [5] are installed. Gene expression data are plotted using Plotly (plot.ly) Javascript.
Atlantic salmon genome reference (fasta file), annotation (gff3 file) and gene expression data (Sequence Read Archive accession: PRJNA260929) were produced as the part of ICSASG. The RefSeq annotation for Atlantic salmon was added later when it became available. New Rainbow trout genome reference (fasta file) and annotations (gff3) were produced by The international collaboration to sequence the Rainbow trout genome.
Utility and discussion
Genome browser (GBrowse)
We have chosen to use GBrowse [4] to visualize the genomic data for salmonids. Atlantic salmon [3] GBrowse contains two sets of genome annotations (Fig. 1), one made in-house during the assembly of the salmon genome and the other presenting the NCBI RefSeq annotation which, providing an option to compare results between the two different annotations. Currently, the Atlantic salmon genome browser contains genes, transcripts and repeat sequences. The transcript tracks are linked to tissue specific gene expression data, homeolog regions and sequence download options. Rainbow trout GBrowse contains the latest reference genome and an in-house genome annotation.
Data tracks can be downloaded in a variety of formats including GFF3, Genbank, and EMBL (European Molecular Biology Laboratory), while gene, protein and transcript sequences can be downloaded in FASTA, Genbank, and EMBL formats etc. Users can upload their own data (custom track option) in a variety of file formats and can customize track displays to visualize their data. Users can easily save and share search results as links, or export the results as PNG, SVG and other file formats for publication purposes. Navigation in the database is eased by clickable questionmarks.
BLAST server
SalmoBase BLAST [5] search allows the users to search their sequences against the entire reference genome, repeat masked genome, predicted protein sequences (CIGENE and/or RefSeq), as well as transcript sequence databases. The five top hits from BLAST are displayed including location of the alignment, size of the alignment and similarity between query and subject (Fig. 2). Search results are connected to the GBrowse so that information about nearby genes and other genomic features can be accessed.
Genetic variation browser (GVBrowse)
A genetic variation browser displays genetic variations in the Atlantic salmon genome. Genetic variations can be shown based on genomic location (e.g., ssa01:1–1000) or by gene symbols (e.g., ttn) (Fig. 3). Resulting SNPs and other DNA variations are displayed in a table format along with location, alleles, annotation (synonymous or non-synonymous), and location with respect to genes (intron, exon, upstream, downstream, or inter-genic).
It is possible to quickly obtain flanking sequence for each variation by following the link from the “SNP ID”. By clicking the genomic view image in the SNP ID link, additional information can also be obtained such as location of the genetic variation in genome sequence, nearby gene annotations and other genomic features. Flanking sequences for multiple genetic variations can be downloaded by selecting the wanted genetic variations and clicking the download button at the bottom of the result table.
Gene expression browser (GEBrowse)
The gene expression browser allows the users to access the tissue-specific gene expression data of Atlantic salmon. Users can search the gene expression browser by genomic coordinates (e.g., ssa02:1–100,000) or by gene symbols (e.g., ttn). Pre-plotted bar graphs (Fig. 4) based on the tissue-specific Fragments per Kilobase of Exon per Million Fragments Mapped (FPKM) values are displayed.
Future plans and intergration of other resources
SalmoBase was developed in close collaboration with ICSASG research groups. Through this collaboration more resources from other projects on salmonids will become available in the near future. As more data becomes available for Atlantic salmon, Rainbow trout and other salmonid species, new tools and resources will be added to SalmoBase. The SalmoBase team is also working closely with Functional Analysis of All Salmonid Genomes (FAASG) [6] and the results from FAASG will be accessible through Salmobase in the future.
Database access and feedback
Data are available for download under the ‘Download’ option in SalmoBase. User support is available through the ‘Contact’ form in SalmoBase. Suggestions for improvements and other comments are welcomed through the ‘Contact’ form. We will consider to include data from users who wish to deposit data into SalmoBase.
Conclusions
To the best of our knowledge SalmoBase is the only online database to access, visualize and download genomics data of salmonids. Due to rapid improvements in high-throughput sequencing technologies we expect a deluge for salmonids’ genomics data. SalmoBase is designed to accommodate the challenges. And, SalmoBase will play a vital role in studying salmonids.
Availability and requirements
SalmoBase can be accessed at www.salmobase.org.
Acknowledgements
We thank International Cooperation to Sequence the Atlantic Salmon Genome (ICSASG) for generating data presented in SalmoBase. Norwegian metacenter for computational science (under project nn4653k).
Funding
This project has received financial support from the Research Council of Norway, project no. 208481 (ELIXIR.NO). The funding source had no role in study design, data collection and interpretation and in writing the manuscript.
Availability of data and materials
This work does not contain additional data.
Abbreviations
- EMBL
European Molecular Biology Laboratory
- FAASG
Functional Analysis of All Salmonid Genomes
- FPKM
Fragments per Kilobase of Exon per Million Fragments Mapped
- GBrowse
Genome Browser
- GEBrowse
Gene Expression Browser
- GVBrowse
Genetic Variation Browser
- ICSASG
International Cooperation to Sequence the Atlantic Salmon
- SNP
single nucleotide polymorphisms
- WGD
Whole Genome Duplicatio
Authors’ contributions
SL, DIV and JKAS conceived the idea of developing SalmoBase. JKAS developed salmobase with the help of TDM and suggestions from TN, SRS, MPK, FG, SL and DIV. JKAS wrote the draft and included comments from co-authors. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Consent for publication
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Jeevan Karloss Antony Samy, Phone: +47-99881889, Email: jeevan.karloss@nmbu.no.
Teshome Dagne Mulugeta, Email: teshome.mulugeta@nmbu.no.
Torfinn Nome, Email: torfinn.nome@nmbu.no.
Simen Rød Sandve, Email: simen.sandve@nmbu.no.
Fabian Grammes, Email: fabian.grammes@nmbu.no.
Matthew Peter Kent, Email: matthew.peter.kent@nmbu.no.
Sigbjørn Lien, Email: sigbjorn.lien@nmbu.no.
Dag Inge Våge, Email: daginge.vage@nmbu.no.
References
- 1.Macqueen DJ, Johnston IA. A well-constrained estimate for the timing of the salmonid whole genome duplication reveals major decoupling from species diversification. Proceedings Biological sciences / The Royal Society. 2014;281(1778):20132881. doi: 10.1098/rspb.2013.2881. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Davidson WS, Koop BF, Jones SJ, Iturra P, Vidal R, Maass A, et al. Sequencing the genome of the Atlantic salmon (Salmo salar). Genome Biol. 2010;11(9):403. [DOI] [PMC free article] [PubMed]
- 3.Lien S, Koop BF, Sandve SR, Miller JR, Kent MP, Nome T, et al. The Atlantic salmon genome provides insights into rediploidization. Nature. 2016;533:200–5. [DOI] [PMC free article] [PubMed]
- 4.Donlin MJ. Using the generic genome browser (GBrowse) Curr Protoc Bioinformatics. 2009;Chapter 9:Unit 9.9. doi: 10.1002/0471250953.bi0909s28. [DOI] [PubMed] [Google Scholar]
- 5.Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410. doi: 10.1016/S0022-2836(05)80360-2. [DOI] [PubMed] [Google Scholar]
- 6.Macqueen DJ, Primmer CR, Houston RD, Nowak BF, Bernatchez L, Bergseth S, Davidson WS, Gallardo-Escarate C, Goldammer T, Guiguen Y, et al. Functional Analysis of All Salmonid Genomes (FAASG): an international initiative supporting future salmonid research, conservation and aquaculture. bioRxiv. 2016. [DOI] [PMC free article] [PubMed]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
This work does not contain additional data.