Wellcome Open Res. 2025 Apr 28;9:189. Originally published 2024 Apr 12. [Version 2] doi: 10.12688/wellcomeopenres.21122.2

The genome sequence of the Atlantic cod, Gadus morhua (Linnaeus, 1758)

Sissel Jentoft 1, Ole K Tørresen 1,a, Ave Tooming-Klunderud 2, Morten Skage 2, Spyridon Kollias 2, Kjetill S Jakobsen 1; Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory team; Wellcome Sanger Institute Scientific Operations: Sequencing Operations; Wellcome Sanger Institute Tree of Life Core Informatics team; Tree of Life Core Informatics collective; Darwin Tree of Life Consortium
PMCID: PMC11367075  PMID: 39224768

Version Changes

Revised. Amendments from Version 1

In response to reviewers' comments, we have expanded and rewritten parts of the Introduction/Background to clarify where we describe previous results and studies conducted using earlier versions of the Atlantic cod genome assembly.

Abstract

We present a genome assembly from an individual male Gadus morhua (the Atlantic cod; Chordata; Actinopteri; Gadiformes; Gadidae). The genome sequence is 669.9 megabases in span. Most of the assembly is scaffolded into 23 chromosomal pseudomolecules. Gene annotation of this assembly on Ensembl identified 23,515 protein-coding genes.

Keywords: Gadus morhua, Atlantic cod, genome sequence, chromosomal, Gadiformes

Species taxonomy

Eukaryota; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Actinopterygii; Actinopteri; Neopterygii; Teleostei; Osteoglossocephalai; Clupeocephala; Euteleosteomorpha; Neoteleostei; Eurypterygia; Ctenosquamata; Acanthomorphata; Paracanthopterygii; Zeiogadaria; Gadariae; Gadiformes; Gadoidei; Gadidae; Gadus; Gadus morhua (Linnaeus, 1758) (NCBI:txid8049).

Background

Atlantic cod (Gadus morhua) (Figure 1) is a highly abundant and ecologically important marine fish species distributed throughout the North Atlantic Ocean. As a top predator, it plays a critical role in maintaining ecosystem functioning and services, i.e. by regulating the abundance of smaller pelagic fish and invertebrates (Bogstad et al., 1994; Holt et al., 2019). It has played a major role in fisheries and trade in both the western and eastern North Atlantic for hundreds of years (Hutchings & Myers, 1994; Star et al., 2017), and is still a highly valued marine resource worldwide. To aid stock assessment and the identification of true management units, and to enable an in-depth characterisation of the genomic makeup of one of the world's most successful marine species, the first version of the Atlantic cod genome was released in 2011 (Star et al., 2011). It was one of the very first vertebrate genomes sequenced using only next-generation sequencing technologies. This original version of the genome demonstrated that Atlantic cod has lost the major histocompatibility complex (MHC) II genes, which were thought to be an essential part of the adaptive immune system in all vertebrates (Star et al., 2011). Furthermore, genome sequencing of a larger selection of Gadiformes and other fish clades has shown that this loss is shared across the entire Gadiformes lineage and was caused by a single evolutionary event around 80–100 Mya (Malmstrøm et al., 2016). Another intriguing discovery, based on both the original draft genome and the improved second version of the genome (gadMor2), is that Atlantic cod has an unusually high proportion of simple tandem repeats (STRs) compared to most other vertebrates (Reinar et al., 2023; Star et al., 2011; Tørresen et al., 2017). Such a high number of STRs may have strong evolutionary implications, and has been linked in teleosts to adaptation to habitat (marine vs. freshwater) as well as to egg production (i.e. fecundity) (Reinar et al., 2023).

Figure 1. Photograph of Gadus morhua (not the specimen used for genome sequencing).

Photograph by Hans-Petter Fjeld.

Moreover, population genome sequencing in combination with the chromosome-anchored reference genome revealed that three large chromosomal inversions largely distinguish the iconic migratory ecotype, the northeast Arctic cod, from the stationary, non-migratory Norwegian coastal cod (Berg et al., 2016; Berg et al., 2017). Extending the geographical range uncovered two genetically distinct, co-existing ecotypes in the southernmost fjord systems of Norway, one fjord type and one more offshore, oceanic type (Barth et al., 2017; Barth et al., 2019; Jorde et al., 2018), with allele frequency differences in a total of four chromosomal inversions as well as differentiation at the genome-wide level (Barth et al., 2017; Barth et al., 2019; Sodeland et al., 2016; Sodeland et al., 2022). Recent studies using the gadMor2 genome have shown that the four large chromosomal inversions most likely arose as separate evolutionary events between 400,000 years ago and more than 1 Mya (Matschiner et al., 2022).

The second version of the genome assembly (gadMor2) was generated using a combination of 454, Illumina and PacBio reads, with scaffolds anchored into chromosomes based on a linkage map (Tørresen et al., 2017), and had a fifty-fold larger contig N50 than the first version. Further improvements in sequencing technologies have since enabled us to generate an even more complete genome assembly (gadMor3) for Atlantic cod, which is presented in this genome note. This assembly will further aid in, for example, the detection of additional structural variants and other genomic reorganisations present in the Atlantic cod genome.

Genome sequence report

The sequenced genome originated from a male Gadus morhua specimen from the Northeast Arctic cod (NEAC) population. It is the same individual (NEAC_001, also referred to as fGadMor1) that was used for the previous genome assemblies, the original draft and gadMor2 (Star et al., 2011; Tørresen et al., 2017). A total of 130-fold coverage in Pacific Biosciences single-molecule long reads, 167-fold coverage in 10X Genomics read clouds and 1416-fold coverage in BioNano reads was generated. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data (76× coverage). Manual assembly curation corrected 429 missing joins or mis-joins and removed 14 haplotypic duplications, reducing the assembly length by 1.74% and the scaffold number by 42.05%, and increasing the scaffold N50 by 23.03%.

The final assembly has a total length of 669.9 Mb in 226 sequence scaffolds with a scaffold N50 of 28.7 Mb ( Table 1). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (97.52%) of the assembly sequence was assigned to 23 chromosomal-level scaffolds ( Figure 5). Chromosome-scale scaffolds were named based on a genetic map provided by the Jakobsen lab ( Table 2). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to an alternate haplotype have also been deposited.
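For readers less familiar with the metric, the scaffold N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly span. The following Python sketch illustrates the definition only; it is not the software used for this note, and the function name `n50` and its `total` parameter are assumptions made for illustration. It uses the 23 chromosome lengths listed in Table 2 together with the reported span of 669.9 Mb. The unplaced scaffolds are omitted, which should not change the result because together they account for less than 17 Mb and so none of them can be as long as the N50 scaffold.

```python
def n50(lengths, total=None):
    """Return the N50 of `lengths`.

    `total` (hypothetical parameter) lets the caller supply the full assembly
    span when some shorter sequences are left out of `lengths`.
    """
    span = sum(lengths) if total is None else total
    running = 0.0
    # Walk from the longest sequence downwards until half the span is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= span / 2:
            return length
    raise ValueError("lengths cover less than half of the stated total")


# Chromosome lengths in Mb, copied from Table 2 (chromosomes 1 to 23).
CHROMOSOME_MB = [
    30.9, 28.7, 31.0, 43.8, 25.3, 27.8, 34.1, 29.7, 26.4, 27.2, 30.7, 31.0,
    28.8, 29.6, 28.7, 34.8, 21.7, 24.9, 22.0, 24.8, 22.4, 23.7, 25.2,
]

print(n50(CHROMOSOME_MB, total=669.9))  # 28.7, matching the reported scaffold N50
```

Passing the span explicitly keeps the sketch short; with a complete list of all 226 scaffold lengths the `total` argument would not be needed.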

Figure 2. Genome assembly of Gadus morhua, gadMor3.0: metrics.

The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 584,119,146 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (2,763,216 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (278,683 and 82,741 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the actinopterygii_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Gadus%20morhua/dataset/CABHMC01.1/snail.

Figure 3. Genome assembly of Gadus morhua, gadMor3.0: BlobToolKit GC-coverage plot.

Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Gadus%20morhua/dataset/CABHMC01.1/blob.

Figure 4. Genome assembly of Gadus morhua, gadMor3.0: BlobToolKit cumulative sequence plot.

The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/Gadus%20morhua/dataset/CABHMC01.1/cumulative.

Figure 5. Genome assembly of Gadus morhua, gadMor3.0: Hi-C contact map of the gadMor3.0 assembly, visualised using HiGlass.

Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=N-q-q64SS4-WKerHMnMklw.

Table 1. Genome data for Gadus morhua, gadMor3.0.

| Item | Value | Benchmark |
|---|---|---|
| Project accession data | | |
| Assembly identifier | gadMor3.0 | |
| Species | Gadus morhua | |
| Specimen | NEAC_001/fGadMor1 | |
| NCBI taxonomy ID | 8049 | |
| BioProject | PRJEB33456 | |
| BioSample ID | SAMEA5574046 | |
| Isolate information | fGadMor1 | |
| Assembly metrics * | | |
| Consensus quality (QV) | 38.6 | ≥ 40 |
| k-mer completeness | 99.56% | ≥ 95% |
| BUSCO ** | C:92.7%[S:91.8%,D:0.9%],F:1.8%,M:5.5%,n:3,640 | C ≥ 95% |
| Percentage of assembly mapped to chromosomes | 97.52% | ≥ 95% |
| Raw data accessions | | |
| PacBio | ERR7254624–ERR7254628 | |
| 10X Genomics Illumina | ERR5528096–ERR5528099 | |
| Genome assembly | | |
| Assembly accession | GCA_902167405.1 | |
| Accession of alternate haplotype | GCA_902167395.1 | |
| Span (Mb) | 669.9 | |
| Number of contigs | 1,441 | |
| Contig N50 length (Mb) | 1.0 | |
| Number of scaffolds | 226 | |
| Scaffold N50 length (Mb) | 28.7 | |
| Longest scaffold (Mb) | 30.9 | |
| Genome annotation | | |
| Number of protein-coding genes | 23,515 | |
| Number of non-coding genes | 5,339 | |
| Number of gene transcripts | 68,853 | |

* Assembly metric benchmarks are adapted from column VGP-2020 of “Table 1: Proposed standards and metrics for defining genome assembly quality” from Rhie et al. (2021).

** BUSCO scores based on the actinopterygii_odb10 BUSCO set using v5.3.2. C = complete [S = single copy, D = duplicated], F = fragmented, M = missing, n = number of orthologues in comparison. A full set of BUSCO scores is available at https://blobtoolkit.genomehubs.org/view/Gadus%20morhua/dataset/CABHMC01.1/busco.
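The compact BUSCO notation used in Table 1 can be unpacked programmatically. The sketch below is an illustration only: `parse_busco` and `BUSCO_PATTERN` are hypothetical names and are not part of BUSCO itself. It parses the summary string with a regular expression and confirms that the categories are internally consistent, i.e. that single-copy plus duplicated equals complete, and that complete, fragmented and missing sum to 100%.

```python
import re

# Pattern for the short BUSCO summary notation, e.g. "C:..%[S:..%,D:..%],F:..%,M:..%,n:.."
BUSCO_PATTERN = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>[\d,]+)"
)


def parse_busco(summary):
    """Parse a BUSCO short summary into percentages plus the orthologue count n."""
    match = BUSCO_PATTERN.search(summary.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a BUSCO short summary: {summary!r}")
    scores = {key: float(match[key]) for key in "CSDFM"}
    scores["n"] = int(match["n"].replace(",", ""))  # drop the thousands separator
    return scores


# Summary string as reported in Table 1.
scores = parse_busco("C:92.7%[S:91.8%,D:0.9%],F:1.8%,M:5.5%,n:3,640")

# Consistency checks on the reported values (tolerance allows for float rounding).
assert abs(scores["S"] + scores["D"] - scores["C"]) < 1e-6
assert abs(scores["C"] + scores["F"] + scores["M"] - 100.0) < 1e-6
print(scores)
```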

Table 2. Chromosomal pseudomolecules in the genome assembly of Gadus morhua, fGadMor1.

| INSDC accession | Chromosome | Length (Mb) | GC% |
|---|---|---|---|
| LR633943.1 | 1 | 30.9 | 45.5 |
| LR633944.1 | 2 | 28.7 | 45.5 |
| LR633945.1 | 3 | 31.0 | 45.5 |
| LR633946.1 | 4 | 43.8 | 45.5 |
| LR633947.1 | 5 | 25.3 | 46.0 |
| LR633948.1 | 6 | 27.8 | 45.5 |
| LR633949.1 | 7 | 34.1 | 45.5 |
| LR633950.1 | 8 | 29.7 | 45.5 |
| LR633951.1 | 9 | 26.4 | 45.5 |
| LR633952.1 | 10 | 27.2 | 45.5 |
| LR633953.1 | 11 | 30.7 | 46.0 |
| LR633954.1 | 12 | 31.0 | 45.5 |
| LR633955.1 | 13 | 28.8 | 45.5 |
| LR633956.1 | 14 | 29.6 | 45.0 |
| LR633957.1 | 15 | 28.7 | 45.5 |
| LR633958.1 | 16 | 34.8 | 45.5 |
| LR633959.1 | 17 | 21.7 | 46.0 |
| LR633960.1 | 18 | 24.9 | 46.0 |
| LR633961.1 | 19 | 22.0 | 45.5 |
| LR633962.1 | 20 | 24.8 | 45.5 |
| LR633963.1 | 21 | 22.4 | 46.0 |
| LR633964.1 | 22 | 23.7 | 46.0 |
| LR633965.1 | 23 | 25.2 | 46.0 |
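
As a quick consistency check, offered as a sketch rather than as the method used to derive the reported figure, the chromosome lengths in Table 2 can be summed and compared with the 669.9 Mb span given in Table 1. The variable names are illustrative. Because the table values are rounded to 0.1 Mb, the computed percentage can be expected to agree with the reported 97.52% only to within rounding.

```python
# Chromosome lengths in Mb, copied from Table 2 (chromosomes 1 to 23).
CHROMOSOME_MB = [
    30.9, 28.7, 31.0, 43.8, 25.3, 27.8, 34.1, 29.7, 26.4, 27.2, 30.7, 31.0,
    28.8, 29.6, 28.7, 34.8, 21.7, 24.9, 22.0, 24.8, 22.4, 23.7, 25.2,
]
SPAN_MB = 669.9          # total assembly span from Table 1
TOTAL_SCAFFOLDS = 226    # number of scaffolds from Table 1

placed_mb = sum(CHROMOSOME_MB)
placed_pct = 100 * placed_mb / SPAN_MB
unplaced_mb = SPAN_MB - placed_mb
unplaced_scaffolds = TOTAL_SCAFFOLDS - len(CHROMOSOME_MB)

print(f"placed on chromosomes: {placed_mb:.1f} Mb ({placed_pct:.2f}% of span)")
print(f"unplaced: {unplaced_mb:.1f} Mb in {unplaced_scaffolds} scaffolds")
```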

The estimated Quality Value (QV) of the final assembly is 38.6 with k-mer completeness (for the combined haplotypes) of 99.56%, and the assembly has a BUSCO v5.3.2 completeness of 92.7% (single = 91.8%, duplicated = 0.9%), using the actinopterygii_odb10 reference set ( n = 3,640).
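
The consensus quality value is expressed on the Phred scale, QV = −10 · log10(p), where p is the estimated per-base error probability. The sketch below only converts the reported number into an error rate and compares it with the VGP-2020 benchmark from Table 1; it does not reproduce the k-mer based estimate made by Merqury, and the function names are illustrative assumptions.

```python
import math


def error_probability(qv):
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def quality_value(p):
    """Inverse conversion: Phred-scaled quality value for error probability p."""
    return -10 * math.log10(p)


# Reported assembly QV and the benchmark listed in Table 1.
for label, qv in [("gadMor3.0", 38.6), ("VGP-2020 benchmark", 40.0)]:
    p = error_probability(qv)
    print(f"{label}: QV {qv} -> about one error per {1 / p:,.0f} bases")
```

On this scale a QV of 40 corresponds to one expected error per 10,000 bases, so the reported value of 38.6 implies a somewhat higher error rate than the benchmark.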

Metadata for specimens, barcode results, spectra estimates, sequencing runs, contaminants and pre-curation assembly statistics are given at https://links.tol.sanger.ac.uk/species/8049.

Genome annotation report

The Gadus morhua genome assembly (GCA_902167405.1) was annotated using the Ensembl rapid annotation pipeline at the European Bioinformatics Institute (EBI). The resulting annotation includes 68,853 transcribed mRNAs from 23,515 protein-coding and 5,339 non-coding genes ( Table 1; https://rapid.ensembl.org/Gadus_morhua_GCA_902167405.1/Info/Index).

Methods

Sample acquisition and nucleic acid extraction

The sequenced cod was a wild-caught male from the NEAC population, estimated to be 8 years of age based on otolith readings. It is the same individual (NEAC_001, also referred to as fGadMor1) that was used for the previously released genome assemblies, the original draft and gadMor2 (Star et al., 2011; Tørresen et al., 2017). High molecular weight DNA was extracted from NEAC_001 from i) flash-frozen blood (at Sanger) and ii) agarose blood plugs (at UiO). DNA was dissolved overnight in 1 ml of TE buffer. Quality and quantity of DNA were checked using NanoDrop (NanoDrop Products), PicoGreen Quant-iT™ (Invitrogen) and FLUOstar Optima (BMG Labtech) and through visual inspection of agarose gels.

Sequencing

PacBio data previously generated on the RSII and Sequel systems by the Jakobsen lab at the University of Oslo, Norway, were combined with data from 5 additional SMRTcells generated at the Wellcome Sanger Institute (WSI). In addition, Chromium 10X Genomics data were generated on the Illumina HiSeqX platform at WSI, and BioNano Saphyr DLE maps were produced for structural variant analysis. Arima Hi-C data were generated from heart and gill tissue at the Jakobsen lab and sequenced on Illumina HiSeq. Raw data can be accessed at GenomeArk.

Genome assembly, curation and evaluation

The assembly process included the following sequence of steps: initial PacBio assembly generation with Falcon-unzip ( Chin et al., 2016), retained haplotig identification with purge_dups ( Guan et al., 2020), 10X based scaffolding with scaff10x, BioNano hybrid-scaffolding, Hi-C based scaffolding with SALSA2 ( Ghurye et al., 2019), Arrow polishing, and two rounds of FreeBayes ( Garrison & Marth, 2012) polishing. The assembly was checked for contamination and corrected using the gEVAL system ( Chow et al., 2016) as described previously ( Howe et al., 2021). Manual curation was performed using gEVAL, HiGlass ( Kerpedjiev et al., 2018) and Pretext ( Harry, 2022). Chromosome-scale scaffolds were named based on a genetic map provided by the Jakobsen lab.

A Hi-C map for the final assembly was produced using bwa-mem2 ( Vasimuddin et al., 2019) in the Cooler file format ( Abdennur & Mirny, 2020). To assess the assembly metrics, the k-mer completeness and QV consensus quality values were calculated in Merqury ( Rhie et al., 2020). This work was done using Nextflow ( Di Tommaso et al., 2017) DSL2 pipelines “sanger-tol/readmapping” ( Surana et al., 2023a) and “sanger-tol/genomenote” ( Surana et al., 2023b). The genome was analysed within the BlobToolKit environment ( Challis et al., 2020) and BUSCO scores ( Manni et al., 2021; Simão et al., 2015) were calculated.

Table 3 contains a list of relevant software tool versions and sources.

Table 3. Software tools: versions and sources.

Genome annotation

The Ensembl Genebuild annotation system at the EBI ( Aken et al., 2016) was used to generate annotation for the Gadus morhua assembly (GCA_902167405.1). Annotation was created primarily through alignment of transcriptomic data to the genome, with gap filling via protein-to-genome alignments of a select set of proteins from UniProt ( UniProt Consortium, 2019).

Wellcome Sanger Institute – Legal and Governance

The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the ‘Darwin Tree of Life Project Sampling Code of Practice’, which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.

Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:

•     Ethical review of provenance and sourcing of the material

•     Legality of collection, transfer and use (national and international)

Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner, Genome Research Limited (operating as the Wellcome Sanger Institute), and in some circumstances other Darwin Tree of Life collaborators.

Funding Statement

This work was supported by Wellcome through core funding to the Wellcome Sanger Institute [206194, https://doi.org/10.35802/206194] and the Darwin Tree of Life Discretionary Award [218328, https://doi.org/10.35802/218328], and by the Research Council of Norway (grant number 221734/O30) to K.S.J.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

[version 2; peer review: 3 approved]

Data availability

European Nucleotide Archive: Gadus morhua (Atlantic cod). Accession number PRJEB33456; https://identifiers.org/ena.embl/PRJEB33456 ( Wellcome Sanger Institute, 2021). The genome sequence is released openly for reuse. The Gadus morhua genome sequencing initiative is part of the Darwin Tree of Life (DToL) project and the Vertebrate Genomes Project (VGP). The assembly has been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.

Author information

Members of the Wellcome Sanger Institute Tree of Life Management, Samples and Laboratory Team are listed here: https://doi.org/10.5281/zenodo.10066175.

Members of Wellcome Sanger Institute Scientific Operations: Sequencing Operations are listed here: https://doi.org/10.5281/zenodo.10043364.

Members of the Wellcome Sanger Institute Tree of Life Core Informatics team are listed here: https://doi.org/10.5281/zenodo.10066637.

Members of the Tree of Life Core Informatics collective are listed here: https://doi.org/10.5281/zenodo.5013541.

Members of the Darwin Tree of Life Consortium are listed here: https://doi.org/10.5281/zenodo.4783558.

References

  1. Abdennur N, Mirny LA: Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics. 2020;36(1):311–316. 10.1093/bioinformatics/btz540
  2. Aken BL, Ayling S, Barrell D, et al.: The Ensembl gene annotation system. Database (Oxford). 2016;2016: baw093. 10.1093/database/baw093
  3. Barth JMI, Berg PR, Jonsson PR, et al.: Genome architecture enables local adaptation of Atlantic cod despite high connectivity. Mol Ecol. 2017;26(17):4452–4466. 10.1111/mec.14207
  4. Barth JMI, Villegas-Ríos D, Freitas C, et al.: Disentangling structural genomic and behavioural barriers in a sea of connectivity. Mol Ecol. 2019;28(6):1394–1411. 10.1111/mec.15010
  5. Berg PR, Star B, Pampoulie C, et al.: Trans-oceanic genomic divergence of Atlantic cod ecotypes is associated with large inversions. Heredity (Edinb). 2017;119(6):418–428. 10.1038/hdy.2017.54
  6. Berg PR, Star B, Pampoulie C, et al.: Three chromosomal rearrangements promote genomic divergence between migratory and stationary ecotypes of Atlantic cod. Sci Rep. 2016;6(1): 23246. 10.1038/srep23246
  7. Bogstad B, Lilly GR, Mehl S, et al.: Cannibalism and year-class strength in Atlantic cod (Gadus morhua) in Arcto-boreal ecosystems (Barents Sea, Iceland, and eastern Newfoundland). ICES J Mar Sci. 1994;198:576–599.
  8. Challis R, Richards E, Rajan J, et al.: BlobToolKit - interactive quality assessment of genome assemblies. G3 (Bethesda). 2020;10(4):1361–1374. 10.1534/g3.119.400908
  9. Chin CS, Peluso P, Sedlazeck FJ, et al.: Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods. 2016;13(12):1050–1054. 10.1038/nmeth.4035
  10. Chow W, Brugger K, Caccamo M, et al.: gEVAL - a web-based browser for evaluating genome assemblies. Bioinformatics. 2016;32(16):2508–2510. 10.1093/bioinformatics/btw159
  11. Di Tommaso P, Chatzou M, Floden EW, et al.: Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017;35(4):316–319. 10.1038/nbt.3820
  12. Garrison E, Marth G: Haplotype-based variant detection from short-read sequencing. 2012.
  13. Ghurye J, Rhie A, Walenz BP, et al.: Integrating Hi-C links with assembly graphs for chromosome-scale assembly. PLoS Comput Biol. 2019;15(8): e1007273. 10.1371/journal.pcbi.1007273
  14. Guan D, McCarthy SA, Wood J, et al.: Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 2020;36(9):2896–2898. 10.1093/bioinformatics/btaa025
  15. Harry E: PretextView (Paired REad TEXTure Viewer): a desktop application for viewing pretext contact maps. 2022.
  16. Holt RE, Bogstad B, Durant JM, et al.: Barents Sea cod (Gadus morhua) diet composition: long-term interannual, seasonal, and ontogenetic patterns. ICES J Mar Sci. 2019;76(6):1641–1652. 10.1093/icesjms/fsz082
  17. Howe K, Chow W, Collins J, et al.: Significantly improving the quality of genome assemblies through curation. GigaScience. Oxford University Press, 2021;10(1): giaa153. 10.1093/gigascience/giaa153
  18. Hutchings JA, Myers RA: What can be learned from the collapse of a renewable resource? Atlantic cod, Gadus morhua, of Newfoundland and Labrador. Can J Fish Aquat Sci. 1994;51(9):2126–2146. 10.1139/f94-214
  19. Jorde PE, Synnes AE, Espeland SH, et al.: Can we rely on selected genetic markers for population identification? Evidence from coastal Atlantic cod. Ecol Evol. 2018;8(24):12547–12558. 10.1002/ece3.4648
  20. Kerpedjiev P, Abdennur N, Lekschas F, et al.: HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 2018;19(1): 125. 10.1186/s13059-018-1486-1
  21. Malmstrøm M, Matschiner M, Tørresen OK, et al.: Evolution of the immune system influences speciation rates in teleost fishes. Nat Genet. 2016;48(10):1204–1210. 10.1038/ng.3645
  22. Manni M, Berkeley MR, Seppey M, et al.: BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38(10):4647–4654. 10.1093/molbev/msab199
  23. Matschiner M, Barth JMI, Tørresen OK, et al.: Supergene origin and maintenance in Atlantic cod. Nat Ecol Evol. 2022;6(4):469–481. 10.1038/s41559-022-01661-x
  24. Reinar WB, Tørresen OK, Nederbragt AJ, et al.: Teleost genomic repeat landscapes in light of diversification rates and ecology. Mob DNA. 2023;14(1): 14. 10.1186/s13100-023-00302-9
  25. Rhie A, McCarthy SA, Fedrigo O, et al.: Towards complete and error-free genome assemblies of all vertebrate species. Nature. 2021;592(7856):737–746. 10.1038/s41586-021-03451-0
  26. Rhie A, Walenz BP, Koren S, et al.: Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21(1): 245. 10.1186/s13059-020-02134-9
  27. Simão FA, Waterhouse RM, Ioannidis P, et al.: BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–3212. 10.1093/bioinformatics/btv351
  28. Sodeland M, Jentoft S, Jorde PE, et al.: Stabilizing selection on Atlantic cod supergenes through a millennium of extensive exploitation. Proc Natl Acad Sci U S A. 2022;119(8): e2114904119. 10.1073/pnas.2114904119
  29. Sodeland M, Jorde PE, Lien S, et al.: "Islands of Divergence" in the Atlantic cod genome represent polymorphic chromosomal rearrangements. Genome Biol Evol. 2016;8(4):1012–22. 10.1093/gbe/evw057
  30. Star B, Boessenkool S, Gondek AT, et al.: Ancient DNA reveals the Arctic origin of Viking Age cod from Haithabu, Germany. Proc Natl Acad Sci U S A. 2017;114(34):9152–9157. 10.1073/pnas.1710186114
  31. Star B, Nederbragt AJ, Jentoft S, et al.: The genome sequence of Atlantic cod reveals a unique immune system. Nature. 2011;477(7363):207–10. 10.1038/nature10342
  32. Surana P, Muffato M, Qi G: sanger-tol/readmapping: sanger-tol/readmapping v1.1.0 - Hebridean Black (1.1.0). Zenodo. 2023a. 10.5281/zenodo.7755669
  33. Surana P, Muffato M, Sadasivan Baby C: sanger-tol/genomenote (v1.0.dev). Zenodo. 2023b. 10.5281/zenodo.6785935
  34. Tørresen OK, Star B, Jentoft S, et al.: An improved genome assembly uncovers prolific tandem repeats in Atlantic cod. BMC Genomics. 2017;18(1): 95. 10.1186/s12864-016-3448-x
  35. UniProt Consortium: UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47(D1):D506–D515. 10.1093/nar/gky1049
  36. Vasimuddin M, Misra S, Li H, et al.: Efficient architecture-aware acceleration of BWA-MEM for multicore systems. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2019;314–324. 10.1109/IPDPS.2019.00041
  37. Wellcome Sanger Institute: The genome sequence of the Atlantic cod, Gadus morhua (Linnaeus, 1758). European Nucleotide Archive, [dataset], accession number PRJEB33456, 2021.
Wellcome Open Res. 2025 May 27. doi: 10.21956/wellcomeopenres.26385.r122748

Reviewer response for version 2

Ruiqi Li 1

The authors have addressed all my concerns.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Partly

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Genomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2025 May 13. doi: 10.21956/wellcomeopenres.26385.r122747

Reviewer response for version 2

Merly Escalona 1

My comments have been addressed. I have no other comments.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Partly

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

genome assembly

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2024 Sep 1. doi: 10.21956/wellcomeopenres.23364.r91124

Reviewer response for version 1

Merly Escalona 1

In this study, the authors present a high-quality genome assembly for a male Atlantic cod. The metadata and genome assembly processes provided are comprehensive. Though the metrics do not reach the benchmark values stated in the document, they match those of a high-quality genome.  

The minor comments presented below agree with the previous reviewer. Please check. 

  • The Background reads more like a discussion than an introduction. It introduces other resources, but the reasoning behind a new effort to assemble a genome, while somewhat trivial, is not explained. In addition, it is not clear if some of the results are from previous papers or a result of the current data note.
    • For example, in the Background Section, 4th sentence:
      • Here, we showed that Atlantic cod has lost major histocompatibility complex (MHC) II genes, which are an essential part of the adaptive immune system (Star et al., 2011)
      • Is that statement a result of the current paper? If so, how was this identified? If not, referring to previous studies, maybe the sentence should be rephrased, “… In Star et al. 2011, it was shown that the Atlantic cod has lost…”
  • It is not explained whether the mitochondrial genome has been generated, and Table 3 references the MitoHiFi pipeline but does not provide information on the results. If it was generated, please introduce some summary metrics for it.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Partly

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

genome assembly

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Wellcome Open Res. 2025 Mar 2.
Tree of Life Team Sanger 1

Thank you for reviewing this data note. We have responded to your points below.

  • The Background reads more like a discussion than an introduction. It introduces other resources, but the reasoning behind a new effort to assemble a genome, while somewhat trivial, is not explained. In addition, it is not clear if some of the results are from previous papers or a result of the current data note. For example, in the Background Section, 4th sentence: "Here, we showed that Atlantic cod has lost major histocompatibility complex (MHC) II genes, which are an essential part of the adaptive immune system (Star et al., 2011)." Is that statement a result of the current paper? If so, how was this identified? If not, referring to previous studies, maybe the sentence should be rephrased, “… In Star et al. 2011, it was shown that the Atlantic cod has lost…”

Response: Thank you. We have tried to adjust this to make it clearer that this is referring to the 2011 paper.

  • It is not explained whether the mitochondrial genome has been generated, and Table 3 references the MitoHiFi pipeline but does not provide information on the results. If it was generated, please introduce some summary metrics for it.

Response: The mitochondrial genome was not assembled for this project, and we have removed the reference to the MitoHiFi pipeline in the table.

Wellcome Open Res. 2024 Aug 31. doi: 10.21956/wellcomeopenres.23364.r91121

Reviewer response for version 1

Leif Andersson 1

This brief report documents the release of a high-quality assembly of the cod genome. This will be a very valuable resource for future work on cod biology. 

I have only a few comments for minor improvements:

  • Background line 11. The citation to Reinar et al. is a bit misleading since the reader may get the impression that this is a cod paper, but Reinar et al. is a paper on Arabidopsis; this should be rephrased.

  • Your statement that one heterozygous male was sequenced is confusing, since all cod are heterozygous at some loci. What do you refer to here? A non-inbred male or a male heterozygous for a particularly important locus?

  • Shouldn’t the specimen name (fGadMor1) be the same as the Id number given in the text (NEAC_001)? Please clarify if this is a different individual compared with those used for the previous cod assemblies.

  • It would be useful if the authors could comment on why the QV score (38.6) does not reach the benchmark (>50) despite the extensive amount of sequence data.

  • Table 2. I recommend that a consistent number of decimal places is used for Length in this table.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Population and Evolutionary Genomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2025 Mar 2.
Tree of Life Team Sanger 1

Thank you for reviewing this data note. We have responded to your comments below. 

  • Background line 11. The citation to Reinar et al. is a bit misleading since the reader may get the impression that this is a cod paper, but Reinar et al. is a paper on Arabidopsis; this should be rephrased.

Response: Thanks for notifying us. We now refer to a study on teleosts that discusses the role of STRs.

  • Your statement that one heterozygous male was sequenced is confusing, since all cod are heterozygous at some loci. What do you refer to here? A non-inbred male or a male heterozygous for a particularly important locus?

Response: We have removed the reference to a ‘heterozygous male’. It was meant to refer to the male-heterogametic sex determination found in cod, but we see that this was misleading.

  • Shouldn’t the specimen name (fGadMor1) be the same as the Id number given in the text (NEAC_001)? Please clarify if this is a different individual compared with those used for the previous cod assemblies.

Response: This individual is the same one that was sequenced and published in both 2011 and 2017, when it was called NEAC_001. When it entered the Sanger/VGP system, it was given the specimen name fGadMor1. We have clarified this in the text.

 

  • It would be useful if the authors could comment on why the QV score (38.6) does not reach the benchmark (>50) despite the extensive amount of sequence data.

Response: Many assemblies reach the benchmark of a QV above 50 by using ample amounts of PacBio HiFi data and comparing the assembly to a k-mer database built from the same data. When the assembly is instead compared to a k-mer database of Illumina data (for instance Hi-C), the QV score is lower. In this project, the cod was sequenced with the older PacBio CLR technology rather than the newer HiFi/CCS. The older technology might not reach the same consensus quality as the newer one, which explains the discrepancy. The EBP recommended standard for QV is 40, rather than 50, which we have also corrected here.
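To put these thresholds in perspective, a QV is a Phred-scaled value and can be converted to an approximate per-base error rate. The following is a minimal sketch that assumes the standard Phred relationship (QV = -10 log10 of the error probability); the function names are illustrative and are not taken from the assembly pipeline.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled consensus quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability back to a Phred-scaled quality value."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in the interval (0, 1]")
    return -10 * math.log10(error_rate)


# Values discussed in the review: the reported QV and the two thresholds mentioned.
for label, qv in [("reported", 38.6), ("QV 40 threshold", 40.0), ("QV 50 threshold", 50.0)]:
    rate = qv_to_error_rate(qv)
    print(f"{label}: QV {qv} is about 1 error per {1 / rate:,.0f} bases")
```

On this scale, QV 40 corresponds to roughly one error per 10,000 bases and QV 50 to one per 100,000, so the reported QV of 38.6 (about one error per 7,200 bases) sits just below the lower threshold.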

 

  • Table 2. I recommend that a consistent number of decimal places is used for Length in this table.

Response: We have reduced this to one decimal place.

Wellcome Open Res. 2024 Jun 10. doi: 10.21956/wellcomeopenres.23364.r85495

Reviewer response for version 1

Ruiqi Li 1

Jentoft et al. presented a high-quality genome of the Atlantic cod. I only have a few minor comments.

1. Introduction: The introduction reads more like results. While it introduces findings from previous versions of the genome, it should also discuss their limitations and explain how the new version improves upon them.

2. “Here, we showed that Atlantic cod has lost major histocompatibility complex (MHC) II genes,”: Clarify whether this finding is original to this study or based on previous research.

3. Biological/Ecological Background: Expand on the biological and ecological significance of the species in the introduction.

4. Mitochondrial Genome: Please double check if the mitochondrial genome was assembled and whether MitoHiFi was used.

5. Specimen Photo: Include a photo of the specimen used for sequencing.

Are sufficient details of methods and materials provided to allow replication by others?

Yes

Is the rationale for creating the dataset(s) clearly described?

Partly

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Genomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Wellcome Open Res. 2025 Mar 2.
Tree of Life Team Sanger 1

Thank you for reviewing this data note. We have responded to your comments below. 

1. Introduction: The introduction reads more like results. While it introduces findings from previous versions of the genome, it should also discuss their limitations and explain how the new version improves upon them.

Response: Thanks for the comment. We have now expanded and rewritten parts of the Introduction/Background, to clarify that we are describing previous results and studies conducted using earlier versions of the genome assemblies of Atlantic cod.

2. “Here, we showed that Atlantic cod has lost major histocompatibility complex (MHC) II genes,”: Clarify whether this finding is original to this study or based on previous research.

Response: As described above, we have now clarified this part of the Introduction, and we have changed the wording to reflect that we are referring to the 2011 study.

3. Biological/Ecological Background: Expand on the biological and ecological significance of the species in the introduction.

Response: We agree, and we have now expanded this section.

4. Mitochondrial Genome: Please double check if the mitochondrial genome was assembled and whether MitoHiFi was used.

Response: The mitochondrial genome was not assembled in this project. MitoHiFi could not be used since this assembly is based on the older PacBio Continuous Long Read (CLR) sequencing technology, and not the newer PacBio HiFi/CCS (Circular Consensus Sequencing). We have omitted MitoHiFi from the software table.

5. Specimen Photo: Include a photo of the specimen used for sequencing.

Response: We do not have a photo of the particular specimen used for sequencing that is suitable in this context. What we have is a poor-quality photo of a partly dissected individual. We have instead used a photo of a representative individual of the same species.

Associated Data

    Data Citations

    1. Wellcome Sanger Institute: The genome sequence of the Atlantic cod, Gadus morhua (Linnaeus, 1758). European Nucleotide Archive, [dataset], accession number PRJEB33456, 2021.

    Data Availability Statement

    European Nucleotide Archive: Gadus morhua (Atlantic cod). Accession number PRJEB33456; https://identifiers.org/ena.embl/PRJEB33456 ( Wellcome Sanger Institute, 2021). The genome sequence is released openly for reuse. The Gadus morhua genome sequencing initiative is part of the Darwin Tree of Life (DToL) project and the Vertebrate Genomes Project (VGP). The assembly has been deposited in INSDC databases. Raw data and assembly accession identifiers are reported in Table 1.


    Articles from Wellcome Open Research are provided here courtesy of The Wellcome Trust
