PeerJ. 2021 Apr 2;9:e11112. doi: 10.7717/peerj.11112

A total of 219 metagenome-assembled genomes of microorganisms from Icelandic marine waters

Clara Jégousse 1,2, Pauline Vannier 2, René Groben 2, Frank Oliver Glöckner 3,4, Viggó Marteinsson 1,2
Editor: Michael Rappe
PMCID: PMC8020865  PMID: 33859876

Abstract

Marine microorganisms contribute to the health of the global ocean by supporting the marine food web and regulating biogeochemical cycles. Assessing marine microbial diversity is a crucial step towards understanding the global ocean. The waters surrounding Iceland are a complex environment where relatively warm, salty waters from the Atlantic cool and sink to depth. Microbial studies in this area have focused on photosynthetic micro- and nanoplankton, mainly using microscopy and chlorophyll measurements. However, the diversity and function of the bacterial and archaeal picoplankton remain unknown. Here, we used a co-assembly approach supported by a marine mock community to reconstruct metagenome-assembled genomes (MAGs) from 31 metagenomes from the sea surface and seafloor of four oceanographic sampling stations sampled between 2015 and 2018. The resulting 219 MAGs include 191 bacterial, 26 archaeal and two eukaryotic MAGs and help bridge the gap in our current knowledge of the global marine microbiome.

Keywords: Metagenomics, Metagenome-assembled genomes, Iceland, Bacteria, Archaea

Introduction

Marine microorganisms are crucial to the global ecosystem as they regulate the carbon cycle (Azam, 1998; Falkowski, Fenchel & Delong, 2008) and support the marine food web (Pomeroy, 1974; Azam et al., 1983). The study of microorganisms within complex environments, such as the ocean, was accelerated by the emergence of sequencing technologies. In particular, metagenomics (the study of the total genetic material recovered from an environmental sample) has provided previously unavailable information on the functional diversity and ecology of microbial communities within their environments (Hugenholtz & Tyson, 2008; Quince et al., 2017).

Large-scale metagenomics projects, such as the Global Ocean Sampling (Venter et al., 2004; Rusch et al., 2007), Ocean Sampling Day (Kopf et al., 2015) and Tara Oceans (Sunagawa et al., 2015; Sunagawa et al., 2020), have provided fascinating new insights, but also revealed the gaps in our knowledge of marine microbial species, their geographical distribution, and their organisation in complex and dynamic communities. These and other large-scale initiatives have so far not covered the oceanic regions around Iceland, a complex marine environment that is characterized by distinct water masses and powerful currents: the cold Polar Water of the East Greenland Current and the Arctic Water of the East Icelandic Current from the north and the warm North Atlantic Water of the Irminger Current from the south (Malmberg, Valdimarsson & Mortensen, 1995; Valdimarsson & Malmberg, 1999). Most microbial studies in Icelandic waters have so far been conducted with traditional methods, like chlorophyll measurements or microscopy, and were therefore mainly focused on larger heterotrophs and photosynthetic microorganisms (Thórdardóttir, 1986; Gudmundsson, 1998; Astthorsson, Gislason & Jonsson, 2007). To establish the baseline knowledge of microbial ecology in Icelandic marine waters, we assembled metagenomic sequence data into draft microbial genomes often called metagenome-assembled genomes (MAGs).

The recovery of MAGs opens the route to further analyses, such as comparative genomics, to understand the roles of these microorganisms within their community and ecosystem (Sangwan, Xia & Gilbert, 2016). MAGs are particularly valuable for yet uncultured marine lineages as they reveal the metabolic potential and environmental adaptation of these microorganisms and give clues about trophic interactions and ecology within the environment. Several metagenomic studies have recovered MAGs from marine environments, including 136 MAGs from the Red Sea (Haroon et al., 2016), 290 from the Mediterranean Sea (Tully et al., 2017), and 2,631 from the global oceans with data harvested by Tara Oceans (Tully, Graham & Heidelberg, 2018).

Here, we report 219 MAGs from 31 samples collected in the Arctic Ocean north of Iceland and in the warmer Atlantic waters south of Iceland. The samples were collected between 2015 and 2018 at four established oceanographic sampling stations visited during six research cruises with two depths sampled at each station. A set of metadata is available for these samples following the best practices recommended by Ten Hoopen et al. (2017), offering an opportunity to further understand the environmental conditions that shape the microbial communities in the waters off the Icelandic coasts.

Materials & Methods

Sampling

Seawater samples were collected between May 2015 and May 2018 from four stations, two in the North Atlantic Ocean, Selvogsbanki 2 and 5 (SB2 and SB5), and two in the Arctic Ocean, Siglunes 3 and 8 (SI3 and SI8) (Fig. 1A and Table 1). Sampling was conducted on board the oceanographic research vessel Bjarni Sæmundsson RE 30, operated by the Icelandic Marine Research Institute (MRI), by collecting 5 L of seawater from the surface and the seafloor of the ocean using Niskin bottles on a CTD rosette sampler. Seawater samples were directly filtered onto 0.22 µm Sterivex filter units (Merck Millipore) and immediately flash frozen in liquid nitrogen before being stored at −80°C until further processing (full workflow in Fig. 1B).

Figure 1. (A) Sampling stations location and coordinates. (B) Workflow of bio-molecular processes and downstream analysis.

Table 1. Sampling dates and locations with corresponding seawater temperature and salinity.

Sampling date Station ID Latitude (dd.mm) Longitude (dd.mm) Depth (m) Temperature (°C) Salinity (PSU)
23.05.2015 SI8 67.9993 −18.8313 1,045 −0.481 34.913
30.05.2015 SB5 62.9822 −21.4737 0 7.632 35.195
30.05.2015 SB5 62.9822 −21.4737 1,004 4.391 34.998
23.05.2016 SI8 68.0100 −18.8247 0 1.632 34.869
23.05.2016 SI8 68.0100 −18.8247 1,045 −0.431 34.914
31.05.2016 SB5 62.9936 −21.4839 0 8.147 35.113
31.05.2016 SB5 62.9936 −21.4839 1,004 4.722 35.017
21.05.2017 SI8 68.0094 −18.8325 1,045 2.700 34.852
21.05.2017 SI8 68.0094 −18.8325 0 −0.381 34.914
22.05.2017 SI3 66.5342 −18.8378 470 5.517 34.492
22.05.2017 SI3 66.5342 −18.8378 0 0.151 34.906
30.05.2017 SB5 62.9878 −21.4800 1,004 8.477 34.761
30.05.2017 SB5 62.9878 −21.4800 0 4.801 35.009
09.08.2017 SI3 66.5344 −18.8419 0 9.980 34.310
09.08.2017 SI3 66.5344 −18.8419 470 0.190 34.900
09.08.2017 SI8 68.0006 −18.8375 1,045 7.640 34.650
09.08.2017 SI8 68.0006 −18.8375 0 −0.370 34.910
18.08.2017 SB2 63.4933 −20.9569 0 12.000 33.700
18.08.2017 SB2 63.4933 −20.9569 90 8.470 34.940
18.08.2017 SB5 62.9883 −21.4867 0 12.200 34.980
18.08.2017 SB5 62.9883 −21.4867 1,004 4.730 35.010
16.02.2018 SI3 66.5442 −18.8400 470 0.044 34.901
16.02.2018 SI8 68.0000 −18.8386 0 0.533 34.640
16.02.2018 SI8 68.0000 −18.8386 1,045 −0.410 34.914
18.05.2018 SI8 68.0058 −18.8256 0 1.355 34.727
18.05.2018 SI8 68.0058 −18.8256 1,045 −0.428 34.914
20.05.2018 SI3 66.5439 −18.8406 0 5.108 34.894
29.05.2018 SB2 63.4942 −20.9008 0 7.625 34.913
29.05.2018 SB2 63.4942 −20.9008 90 7.298 35.031
29.05.2018 SB5 62.9858 −21.4731 0 7.740 35.042
29.05.2018 SB5 62.9858 −21.4731 1,004 4.488 34.978

Mock community

A marine mock community consisting of 21 bacterial and two archaeal species (Table 2) was included in the analysis for quality control. Strains were cultivated according to Table 2. After 12 to 24 h of growth (to obtain 10^6 to 10^8 cells/ml), cells were counted in a Thoma counting chamber (BRAND, ref. 718020; 0.100 mm depth) and diluted to achieve a final concentration of 1.29 × 10^9 cells/L. Synthetic seawater was prepared by adding 150 g of sea salts (Sigma-Aldrich, S9883) and 17.25 g of PIPES (Sigma-Aldrich, P1851) to 5 L of autoclaved MilliQ water. The mock community was immediately treated in the same manner as the seawater samples and filtered onto Sterivex filters for DNA extraction.
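
The dilution step described above follows the usual C1 × V1 = C2 × V2 relationship. As a minimal sketch, assuming a hypothetical culture counted at 10^8 cells/ml and assuming that the 1.29 × 10^9 cells/L target applies to the full 5 L of synthetic seawater (the text does not say whether the target is per strain or for the whole community), the volume of culture to add could be estimated as follows. The function name and the example values are illustrative and are not taken from the authors' protocol.

```python
def culture_volume_ml(stock_cells_per_ml, target_cells_per_l, final_volume_l):
    """Volume of culture (ml) needed so that final_volume_l reaches the target.

    Uses C1 * V1 = C2 * V2 and ignores the small volume added by the inoculum.
    """
    if stock_cells_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    total_cells_needed = target_cells_per_l * final_volume_l
    return total_cells_needed / stock_cells_per_ml


# Hypothetical stock at 1e8 cells/ml; target concentration and volume from the text.
volume = culture_volume_ml(1e8, 1.29e9, 5.0)
print(f"{volume:.1f} ml of culture")
```

A denser culture needs proportionally less volume, which is why counting each culture before dilution matters.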

Table 2. List of bacterial and archaeal species in the mock community.

Strains were obtained from the Icelandic Strain Collection and Records (ISCAR) or the German Collection of Microorganisms and Cell Cultures (DSMZ: https://www.dsmz.de/). Recipes for growth media can be found at if not otherwise indicated.

Domain Species name % identity Collection number Growth parameters Successfully reassembled
Bacteria Alteromonas naphthalenivorans 99.66% ISCAR-05201 Marine Broth, 22°C, pH 6.8, aerobic condition Yes
Bacteria Jeotgalibacillus marinus 100% ISCAR-03118 Marine Broth, 22°C, pH 6.8, aerobic condition No
Bacteria Geobacillus thermoleovorans 100% ISCAR-00004 162 media, 65°C, pH 7.0, aerobic condition No
Bacteria Colwellia psychrerythraea 99% ISCAR-05175 Marine Broth, 22°C, pH 6.8, aerobic condition Yes
Bacteria Dietzia psychralcaliphila 99.52% ISCAR-05191 92 media, 22°C, pH 6.8, aerobic condition No
Bacteria Escherichia coli 100% ISCAR-02961 LB media, 37°C, pH 7.0, aerobic condition Yes
Bacteria Pseudomonas salina 99.83% ISCAR-05249 Marine Broth media, 22°C, pH 6.8, aerobic condition No
Bacteria Marinobacter psychrophilus 99.84% ISCAR-05186 Marine Broth media, 22°C, pH 6.8, aerobic condition Yes
Bacteria Photobacterium indicum 100% ISCAR-05002 Marine Broth media, 22°C, pH 6.8, aerobic condition Yes
Bacteria Pseudoalteromonas neustonica 98.58% ISCAR-05312 172 media, 22°C, pH 6.8, aerobic condition Yes
Bacteria Reinekea aestuarii 100% DSM 29881 Marine Broth media, 22°C, pH 6.8, aerobic condition No
Bacteria Reinekea marinisedimentorum 100% DSM 15388 Marine Broth media, 30°C, pH 6.8, aerobic condition Yes
Bacteria Rhodococcus kyotonensis 99.23% ISCAR-05221 Marine Broth media, 22°C, pH 6.8, aerobic condition No
Bacteria Reinekea sp. 84 97.75% with Reinekea marina ISCAR-05258 Marine Broth media, 22°C, pH 6.8, aerobic condition No
Bacteria Sulfitobacter sp. 87 97.73% with Sulfitobacter donghicola ISCAR-05261 Marine Broth media, 22°C, pH 6.8, aerobic condition No
Bacteria Sulfitobacter donghicola 100% DSM 23563 Marine Broth media, 22°C, pH 6.8, aerobic condition Yes
Bacteria Sulfitobacter guttiformis 100% DSM 11544 Marine Broth media, 22°C, pH 6.8, aerobic condition Yes
Bacteria Sulfitobacter pontiacus 100% DSM 10014 Marine Broth media, 22°C, pH 6.8, aerobic condition Yes
Bacteria Sulfitobacter undariae 100% DSM 102234 Marine Broth media, 22°C, pH 6.8, aerobic condition No
Bacteria Thermus thermophilus 100% ISCAR-03915 166 media, 65°C, pH 7.0, aerobic condition No
Bacteria Vibrio cyclitrophicus 100% ISCAR-06209 Marine Broth media, 22°C, pH 6.8, aerobic condition No
Archaea Pyrococcus abyssi 100% DSM 25543 YPS1 media, 90°C, pH 7, anaerobic condition, elemental sulfur Yes
Archaea Thermococcus barophilus 100% DSM 11836 TRM2, 85°C, pH 6.5, anaerobic condition, elemental sulfur Yes

DNA extractions

DNA was extracted from all samples using the QIAGEN AllPrep kit according to the manufacturer’s instructions with modifications. Sterivex filters were aseptically removed from their plastic casing as described by Cruaud et al. (2017). Filters were transferred to tubes containing 600 µl of RLT buffer from the kit and 0.2 g of 0.1 mm zirconia/silica beads (BioSpec, cat. 11079101z) for mechanical disruption of the cells (bead-beating) using a Retsch Mixer Mill MM400 with the program P9 (300 Hz) three times for 10 s each, cooling the tubes in ice water between bead-beating steps. DNA quality was assessed with a NanoDrop 1000 Spectrophotometer (ThermoFisher) and DNA was quantified with a Qubit fluorometer (Qubit DNA BR assay, Invitrogen).

Library preparation and sequencing

High-throughput sequencing of the samples was performed by Genome Quebec using the HiSeq system (Illumina). Libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs), followed by sequencing on two lanes of an Illumina HiSeq 4000 PE150 system (Illumina), allocating 1/20 or 1/25 of a lane to each sample. Demultiplexing and conversion to FASTQ files were performed using bcl2fastq Conversion Software v1.8.4 (Illumina), resulting in 32 metagenomic datasets.

Co-assembly and binning

The quality of the raw sequencing reads was assessed using FastQC v0.11.8 (Andrews et al., 2012) (Fig. S1). Quality control of the raw reads was performed with Sunbeam v2.0.2 (Clarke et al., 2019), which includes trimming with Trimmomatic v0.36 (Bolger, Lohse & Usadel, 2014), adapter removal with Cutadapt v2.6 (Martin, 2011) (Trimmomatic parameters: PE -phred33 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36), removal of low complexity sequences using Sunbeam Komplexity (default parameters) and removal of contaminating human sequences using the Genome Reference Consortium Human Build 38 patch release 13, GRCh38.p13 (Lander et al., 2001; Schneider et al., 2017). The resulting quality-filtered metagenomic data were divided into surface and seafloor datasets, as the surface of the ocean can be considered a different environment from the seafloor (Fig. S2). Both datasets also included the mock community. After quality filtering, each dataset was co-assembled with MEGAHIT v1.2.9 (Li et al., 2015; Li et al., 2016) (parameters: --min-contig-len 1000 -m 0.85), that is, with a minimum contig length of 1,000 bp, resulting in two FASTA files of community contigs. Quality-filtered short reads from each sample were mapped back to the contigs of the corresponding co-assembly using Bowtie 2 (default parameters and the --no-unal flag) (Langmead & Salzberg, 2012). The resulting SAM files were converted to BAM files and indexed with SAMTOOLS v0.3.3 (parameters: view -F 4 -bS) (Li et al., 2009). For both co-assemblies, the FASTA files containing the contigs were formatted with the script anvi-script-reformat-fasta from Anvi’o v6.2 (Eren et al., 2015). The two contigs databases (the surface and the seafloor databases) were generated with Anvi’o, and the BAM files were profiled and merged into the respective databases.
Automated binning was performed using the Anvi’o script anvi-cluster-contigs with default parameters and three binning algorithms: CONCOCT v1.1.0 (Alneberg et al., 2013), MaxBin2 v2.2.6 (Wu, Simmons & Singer, 2016), and MetaBAT 2 v2.15 (Kang et al., 2019). For all binning results, completeness and redundancy of the bins were estimated with the Anvi’o script anvi-estimate-genome-completeness, which relies on CheckM v1.1.3 (Parks et al., 2015). Based on the comparison of the three binning algorithms, we selected the “good quality bins” from MetaBAT 2, defined as bins with an estimated completion above 50% and an estimated redundancy below 10% according to the standards suggested by Bowers et al. (2017). Differences in the proportion of good quality bins among the total number of bins were assessed with a chi-squared (χ2) test.
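
The quality filter described above reduces to two thresholds. A minimal sketch in Python, assuming each bin has already been summarised as a record with estimated completion and redundancy percentages; the record layout, bin names and values below are hypothetical and do not reflect the Anvi’o output format.

```python
from dataclasses import dataclass

MIN_COMPLETION = 50.0   # percent, threshold stated in the text
MAX_REDUNDANCY = 10.0   # percent, threshold stated in the text


@dataclass
class Bin:
    name: str
    completion: float   # estimated completion (%)
    redundancy: float   # estimated redundancy, also called contamination (%)


def is_good_quality(b: Bin) -> bool:
    """True when the bin passes both thresholds (strict inequalities)."""
    return b.completion > MIN_COMPLETION and b.redundancy < MAX_REDUNDANCY


# Made-up example bins, for illustration only.
bins = [
    Bin("bin_001", 92.3, 1.4),
    Bin("bin_002", 48.0, 0.7),
    Bin("bin_003", 75.5, 22.1),
    Bin("bin_004", 51.2, 9.8),
]
good = [b.name for b in bins if is_good_quality(b)]
print(good)
```

A bin fails the filter if it misses either threshold, so a highly complete but heavily redundant bin (such as the hypothetical bin_003) is still rejected.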

Functional assignment, taxonomy and phylogenomic trees

We used Prodigal v2.6.3 (Hyatt et al., 2010) to identify open reading frames (ORFs) within the contigs. The resulting ORFs were processed with Kaiju v1.7.3 (Menzel, Ng & Krogh, 2016) and the NCBI nr+euk database (nr_euk 2019-06-25, 46 GB) for taxonomic assignment. In addition to the contig-based taxonomic assignment, we used GTDB-Tk v1.3.0 (Genome Taxonomy Database Toolkit) (Chaumeil et al., 2019) to construct two bacterial and two archaeal phylogenomic trees containing good quality MAGs (completeness ≥50%; contamination ≤10%) and Genome Taxonomy Database (GTDB) R95 (released in July 2020) reference genomes to confirm the taxonomic assignments of the MAGs (Parks et al., 2018). The trees were reconstructed using ARB (Ludwig et al., 2004) for comprehensive visualisation.

Data availability

The raw Illumina sequencing paired-end reads are available in the ENA under project accession number PRJEB41565 (ERP125360). MAGs are available under accession numbers ERS5621908 to ERS5622126. Code is available at https://github.com/clarajegousse/.

Results

Co-assemblies

The co-assembly of the 16 samples of the surface of the ocean yielded 445,328 contigs, with a minimal length of 1,000 bp, representing a total length of 1.06 Gb (1,060,942,783 nucleotides) with N50 of 2,627 bp and 1,271,859 gene calls (Table 3).

Table 3. Statistics summary of co-assemblies.

Surface Seafloor
Total nucleotides 1.06 Gb 1.23 Gb
N50 2,382 bp 2,327 bp
L50 83,272 contigs 114,549 contigs
Number of contigs 445,328 554,104
Longest contig 864,343 bp 1,302,516 bp
Shortest contig 1,000 bp 1,000 bp
Number of contigs >10 kb 8,521 8,306
Number of genes (Prodigal) 1,271,859 1,532,800

The co-assembly of the 17 samples of the seafloor of the ocean yielded 554,104 contigs, with a minimal length of 1,000 bp, representing a total length of 1.23 Gb (1,233,390,295 nucleotides) with N50 of 2,327 bp and 1,532,800 gene calls (Table 3).
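
For readers unfamiliar with the contiguity statistics in Table 3: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly, and L50 is the number of contigs in that set. A minimal sketch in Python, using a toy list of contig lengths rather than the actual co-assemblies (the function name is illustrative):

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return length, count


toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)
```

In this toy case the two longest contigs already cover half of the 32 bp total, so N50 is a length while L50 is a count of contigs.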

Binning

A comparison of the three binning algorithms (CONCOCT, MaxBin2 and MetaBAT 2) was conducted on the surface and seafloor co-assemblies based on the number of good quality bins (Fig. 2). Good quality bins have an estimated completion above 50% and an estimated redundancy (also called estimated contamination) below 10% (Bowers et al., 2017). The relative proportions of good quality bins differ significantly among the three binning methods (χ2 = 135.23, df = 2, p-value < 2.2e−16). In the surface co-assembly, MetaBAT 2 produced fewer bins than CONCOCT and MaxBin2, whereas in the seafloor co-assembly it produced fewer bins than MaxBin2 but more than CONCOCT. In both co-assemblies, however, the number of good quality bins was much higher with MetaBAT 2 than with CONCOCT or MaxBin2 (Table 4).

Figure 2. Binning comparison. Numbers of contigs binned and numbers of bad and good quality bins obtained with CONCOCT, MaxBin2 and MetaBAT 2 from the surface co-assembly (A) and the seafloor co-assembly (B).

The number of contigs binned is represented by the size of the pie plots. Numbers and percentages of bad quality bins and good quality bins are shown within the grey and coloured slices of the chart, respectively. Good quality bins have an estimated completion above 50% and an estimated redundancy (also called estimated contamination) below 10% (Bowers et al., 2017).

Table 4. Statistics summary of binning results.

Co-assembly Binning method Number of bins Number of MAGs Average completeness (%) Average contamination (%)
Surface CONCOCT 319 43 45.15 49.23
Surface MaxBin2 302 17 25.77 13.30
Surface MetaBAT 2 279 118 44.12 3.46
Seafloor CONCOCT 259 28 51.26 90.39
Seafloor MaxBin2 358 18 34.59 18.63
Seafloor MetaBAT 2 299 134 49.90 7.13

MetaBAT 2 gave the best results, which were used for further analysis and are shown in more detail in Fig. 3. Out of the 279 bins identified by MetaBAT 2 for the surface samples, 42.3% (118) are good quality bins that can be considered draft MAGs according to Bowers et al. (2017). Within the 118 good quality MAGs (Fig. 3B), 16 represent genomes of organisms from the mock community and 102 are assembled from the surface seawater. In the same manner, out of the 299 bins identified by MetaBAT 2 for the seafloor samples, 44.8% (134) can be considered good draft MAGs. Within the 134 good quality MAGs (Fig. 3D), 17 represent genomes of organisms from the mock community and 117 are assembled from the seawater at the seafloor. The proportion of MAGs among the total number of bins does not differ significantly between the two co-assemblies (χ2 = 0.27784, df = 1, p-value = 0.5981), which suggests that the environment does not significantly affect the proportion of bins recovered as MAGs. Likewise, the proportion of MAGs associated with the mock community among the total number of MAGs does not differ significantly between the two co-assemblies (χ2 = 0.0003, df = 1, p-value = 0.9858).
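
The two comparisons of proportions reported above can be recomputed from the counts given in the text. The sketch below implements a chi-squared test on a 2 × 2 table in plain Python. It assumes a Yates continuity correction, which the text does not state; under that assumption, the results come out close to the reported statistics and p-values. The function name is illustrative.

```python
import math


def chi2_2x2_yates(a, b, c, d):
    """Chi-squared test (df = 1) with Yates correction for the table [[a, b], [c, d]].

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction, clamped at zero so it never inflates the statistic.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Good quality bins vs remaining bins (MetaBAT 2): surface 118/279, seafloor 134/299.
stat_bins, p_bins = chi2_2x2_yates(118, 279 - 118, 134, 299 - 134)
# Mock community MAGs vs environmental MAGs: surface 16/118, seafloor 17/134.
stat_mock, p_mock = chi2_2x2_yates(16, 118 - 16, 17, 134 - 17)
print(f"bins: chi2 = {stat_bins:.5f}, p = {p_bins:.4f}")
print(f"mock: chi2 = {stat_mock:.5f}, p = {p_mock:.4f}")
```

The closed-form p-value used here only holds for one degree of freedom, so the three-way comparison of binning methods (df = 2) would need a general chi-squared distribution function instead.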

Figure 3. Assessment of bin quality with the estimated completeness as a function of the redundancy.

Bad quality bins (completeness below 50% or redundancy above 10%) are shown in grey while good quality bins are in colour (green for surface, blue for seafloor samples). (A) A total of 279 bins obtained with MetaBAT 2 from the surface co-assembly with 118 good quality bins. (B) Good quality bins from the surface co-assembly with the identification of the bins corresponding to members of the mock community. (C) A total of 299 bins obtained with MetaBAT 2 from the seafloor co-assembly with 134 good quality bins. (D) Good quality bins from the seafloor co-assembly with the identification of the bins corresponding to members of the mock community.

Taxonomy

When excluding members of the mock community based on taxonomic assignment and differential coverage, we identified 102 MAGs reconstructed from the surface co-assembly and 117 MAGs from the seafloor co-assembly. The surface MAGs include two eukaryotes (Bathycoccus and Micromonas), 92 bacteria, and eight archaea while the seafloor MAGs include 99 bacteria, 18 archaea and no eukaryotes.

The surface co-assembly yielded a total of 92 bacterial MAGs (Fig. 4). These MAGs are members of seven phyla (number of MAGs in brackets): Proteobacteria (52), Bacteroidota (31), Actinobacteriota (2), Verrucomicrobiota (2), Planctomycetota (2), SAR324 (1) and Cyanobacteria (1). The MAG within the Cyanobacteria phylum belongs to the genus Synechococcus. Within the phylum Actinobacteriota, we retrieved two MAGs: one from a member of the genus Aquiluna and one from the genus Pontimonas. We reconstructed two MAGs within the phylum Planctomycetota. The two MAGs within the Verrucomicrobiota belong to the family Akkermansiaceae. The Bacteroidota phylum includes 31 MAGs reconstructed from the sea surface co-assembly. Most of these Bacteroidota MAGs belong to the Flavobacteriaceae family (18), including one representative of the genus Polaribacter. Many MAGs within the Flavobacteriaceae family are related to MAGs revealed by the Tara Oceans Consortium, such as Cryomorphaceae bacterium and Flavobacteriales bacterium (CFB group bacteria). We also reconstructed 52 MAGs belonging to the phylum Proteobacteria, including nine Rhodobacteraceae, ten SAR86 and ten Porticoccaceae. Of the three MAGs in the Burkholderiales order, one is within the Burkholderia genus, and the other two belong to the Methylophilaceae family according to GTDB.

Figure 4. Bacterial phylogenomic tree.

Distribution of the Marine Icelandic MAGs across 76 bacterial phyla from GTDB. The maximum likelihood tree was inferred from the concatenation of 120 proteins spanning a dereplicated set of 191,527 bacterial genomes (GTDB 05-RS95, released on 17 July 2020) and the Marine Icelandic MAGs. Phyla containing MAGs from the surface seawater, the seafloor or both are shown in green, blue or teal, respectively. The numbers of Marine Icelandic MAGs from the surface and the seafloor in each phylum are indicated in parentheses in green and blue, respectively.

The seafloor co-assembly yielded a total of 99 bacterial MAGs spanning 12 phyla: Proteobacteria (46), Verrucomicrobiota (9), Bacteroidota (9), Marinisomatota (8), Actinobacteriota (5), Planctomycetota (5), Gemmatimonadota (4), Nitrospinota (3), Chloroflexota (2), SAR324 (2), Myxococcota (1) and Latescibacterota (1). Six of these phyla include exclusively MAGs from the seafloor (Nitrospinota, Myxococcota, Gemmatimonadota, Marinisomatota, Chloroflexota, Latescibacterota). Within the Proteobacteria, most of the MAGs belong to the Gammaproteobacteria class with 32 MAGs, while the remaining 14 are part of the Alphaproteobacteria. Five orders within the Proteobacteria include only MAGs reconstructed from the seafloor co-assembly and none from the surface co-assembly (Rhizobiales, Rhodospirillales, TMED109, UBA10353, UBA4486).

Out of the 21 bacterial species of the mock community, 12 of them were re-assembled and given the correct taxonomic assignment down to species level (if available for the strain used) for Alteromonas sp., Geobacillus marinus, Colwellia sp., Escherichia coli, Marinobacter sp., Photobacterium sp., Pseudoalteromonas sp., Reinekea marinisedimentorum, Sulfitobacter donghicola, Sulfitobacter guttiformis, Sulfitobacter pontiacus and Thermus thermophilus. However, some distinct species of the mock community that belong to the same genus do not match any specific MAGs but seem to have been reassembled as one single MAG within the genus in question, such as Reinekea aestuarii and Reinekea sp. 84 as well as Sulfitobacter undariae and Sulfitobacter sp. 87. The genomes of Bacillus thermoleovorans, Dietzia sp., Halomonas sp. and Vibrio cyclitrophicus were not reassembled.

The surface co-assembly yielded only eight archaeal MAGs (Fig. 5), all within the Thermoplasmatota phylum, including three MAGs within the genus MGIIb-O2 of the Thalassarchaeaceae family and five within the Poseidoniaceae family. The seafloor co-assembly resulted in 18 archaeal MAGs, including one representative of the Thermoproteota phylum: this MAG belongs to the UBA57 lineage within the order Nitrososphaerales. The 17 other archaeal MAGs all belong to the Thermoplasmatota phylum, within the class Poseidoniia, including representatives of the Poseidoniaceae and Thalassarchaeaceae families. The two archaeal members of the mock community (Pyrococcus abyssi and Thermococcus barophilus) were successfully reconstructed in both co-assemblies.

Figure 5. Archaeal phylogenomic tree.

Distribution of the Marine Icelandic MAGs across 18 archaeal phyla from GTDB. The maximum likelihood tree was inferred from the concatenation of 122 proteins spanning a dereplicated set of 3,073 archaeal genomes (GTDB 05-RS95, released on 17 July 2020) and the Marine Icelandic MAGs. Phyla containing MAGs from the surface seawater, the seafloor or both are shown in green, blue or teal, respectively. The numbers of Marine Icelandic MAGs from the surface and the seafloor in each phylum are indicated in parentheses in green and blue, respectively.

Discussion

Mock communities are used to quantify and characterise biases introduced in the sample processing pipeline (Brooks et al., 2015) and are indispensable to benchmark sequencing methods and downstream analysis (Singer et al., 2016; Sevim et al., 2019). Mock communities can also be used as a positive control for metagenomic studies. Our mock community confirmed that MetaBAT 2 was able to resolve genomes of species within the same genus, making it the most suitable of the three binning algorithms tested in this study (CONCOCT, MaxBin2 and MetaBAT 2). This result is consistent with previous studies (Yue et al., 2020).

The ocean is a vast continuum, and the samples were taken within a relatively small section of the North Atlantic Ocean at several sampling depths: the surface and the seafloor (90 m, 470 m, 1,006 m, and 1,060 m depending on the station). The differences in sampling depth imply differences in light, pressure and temperature compared to the surface of the ocean. While the surface of the ocean is subjected to seasonal variations in daylight and temperature, the seafloor remains darker and colder than the surface, and such parameters drive microbial community structure and function. Therefore, we considered the surface and the seafloor of the ocean as two different types of environments, which justifies our approach of two co-assemblies rather than assembling all 32 samples together. The fact that a number of MAGs were found exclusively in one of the two environments supports this choice.

Conclusions

The goal of this study was to reconstruct MAGs from 31 samples from Icelandic sea waters. The 219 MAGs span 13 bacterial and two archaeal phyla and contribute to a more defined picture of the global marine microbiome. Moreover, thanks to the inclusion of a mock community in the analysis, this study confirms that the combination of co-assembly and binning with MetaBAT 2 allows, despite a relatively shallow sequencing depth, the recovery of quality MAGs that are a precious resource for further ecological and environmental studies.

Supplemental Information

Supplemental Information 1. Number of raw reads of 32 metagenomic datasets.

Metagenomic datasets from 32 samples (31 seawater samples and mock community). Number of reads displayed depending on the sampling locations and times.

DOI: 10.7717/peerj.11112/supp-1
Supplemental Information 2. Principal Coordinates Analysis (PCoA) based on Bray–Curtis dissimilarity computed by SimkaMin (Benoit et al., 2020).

(A) Experimental variable. (B) Environmental and geographic variables.

DOI: 10.7717/peerj.11112/supp-2

Acknowledgments

The authors would like to thank Kristinn Gudmundsson and Bjarni Saemundsson’s crew from the Marine Research Institute, and Pauline Bergsten and Mia Cerfonteyn from the University of Iceland & Matís for sampling, Antonio Fernandez Guerra from the Max Planck Institute for Marine Microbiology and Arnar Pálsson from the University of Iceland for advice, and Elvar Örn Jónsson from the University of Iceland for technical support. The analyses presented in the study were performed using the resources provided by the Icelandic High Performance Computing Centre at the University of Iceland.

Funding Statement

The work is part of the Microbes in the Icelandic Marine Environment (MIME) project which was funded by the Grant of Excellence (No. 163266-051) of the Icelandic Research Fund (Rannís). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Additional Information and Declarations

Competing Interests

Clara Jégousse, Pauline Vannier, René Groben and Viggó Marteinsson are employees of Matís ohf.

Author Contributions

Clara Jégousse conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Pauline Vannier conceived and designed the experiments, performed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

René Groben, Frank Oliver Glöckner and Viggó Marteinsson conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

DNA Deposition

The following information was supplied regarding the deposition of DNA sequences:

Data are available at the ENA under project number PRJEB41565: all MAGs: ERS5621908 to ERS5622126; the surface and seafloor co-assemblies: ERS5565811 and ERS5565812.

Data Availability

The following information was supplied regarding data availability:

Code is available at Github:

https://github.com/clarajegousse/mime.

The following data are available at ENA:

- Raw data, co-assemblies and MAGs: PRJEB41565.

- Raw sequence data for the mock community: ERS5472810 to ERS5472840, and ERS5475418.

- The surface and seafloor co-assemblies: ERS5565811 and ERS5565812 respectively.

- MAGs: ERS5621908 to ERS5622126.

References

  • Alneberg et al. (2013).Alneberg J, Bjarnason BS, De Bruijn I, Schirmer M, Quick J, Ijaz UZ, Loman NJ, Andersson AF, Quince C. CONCOCT: clustering contigs on coverage and composition. 2013 doi: 10.1038/nmeth.3103.1312.4038 [DOI] [PubMed]
  • Andrews et al. (2012).Andrews S, Krueger F, Segonds-Pichon A, Biggins L, Krueger C, Wingett S. FastQC. Babraham: Babraham Institute; 2012. [Google Scholar]
  • Astthorsson, Gislason & Jonsson (2007).Astthorsson OS, Gislason A, Jonsson S. Climate variability and the Icelandic marine ecosystem. Deep Sea Research Part II: Topical Studies in Oceanography. 2007;54(23–26):2456–2477. doi: 10.1016/j.dsr2.2007.07.030. [DOI] [Google Scholar]
  • Azam (1998).Azam F. Microbial control of oceanic carbon flux: the plot thickens. Science. 1998;280(5364):694–696. doi: 10.1126/science.280.5364.694. [DOI] [Google Scholar]
  • Azam et al. (1983).Azam F, Fenchel T, Field JG, Gray J, Meyer-Reil L, Thingstad F. The ecological role of water-column microbes in the sea. Marine Ecology Progress Series. 1983;10:257–263. [Google Scholar]
  • Benoit et al. (2020).Benoit G, Mariadassou M, Robin S, Schbath S, Peterlongo P, Lemaitre C. SimkaMin: fast and resource frugal de novo comparative metagenomics. Bioinformatics. 2020;36(4):1275–1276. doi: 10.1093/bioinformatics/btz685. [DOI] [PubMed] [Google Scholar]
  • Bolger, Lohse & Usadel (2014).Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Bowers et al. (2017).Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA, Tringe SG, Ivanova NN, Copeland A, Clum A, Becraft ED, Malmstrom RR, Birren B, Podar M, Bork P, Weinstock GM, Garrity GM, Dodsworth JA, Yooseph S, Sutton G, Glöckner FO, Gilbert JA, Nelson WC, Hallam SJ, Jungbluth SP, Ettema TJG, Tighe S, Konstantinidis KT, Liu W-T, Baker BJ, Rattei T, Eisen JA, Hedlund B, McMahon KD, Fierer N, Knight R, Finn R, Cochrane G, Karsch-Mizrachi I, Tyson GW, Rinke C, Schriml L, Hugenholtz P, Yilmaz P, Meyer F, Lapidus A, Parks DH, Eren AM, Banfield JF, Woyke T, The Genome Standards Consortium. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology. 2017;35(8):725–731. doi: 10.1038/nbt.3893. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Brooks et al. (2015).Brooks JP, Edwards DJ, Harwich MD, Rivera MC, Fettweis JM, Serrano MG, Reris RA, Sheth NU, Huang B, Girerd P, Strauss JF, Jefferson KK, Buck GA, Vaginal Microbiome Consortium (additional members). The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies. BMC Microbiology. 2015;15(1):66. doi: 10.1186/s12866-015-0351-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Chaumeil et al. (2019).Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019;36(6):1925–1927. doi: 10.1093/bioinformatics/btz848. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Clarke et al. (2019).Clarke EL, Taylor LJ, Zhao C, Connell A, Lee J-J, Fett B, Bushman FD, Bittinger K. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome. 2019;7(1):46. doi: 10.1186/s40168-019-0658-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Cruaud et al. (2017).Cruaud P, Vigneron A, Fradette M-S, Charette SJ, Rodriguez MJ, Dorea CC, Culley AI. Open the Sterivex casing: an easy and effective way to improve DNA extraction yields. Limnology and Oceanography: Methods. 2017;15(12):1015–1020. doi: 10.1002/lom3.10221. [DOI] [Google Scholar]
  • Erauso et al. (1993).Erauso G, Reysenbach A-L, Godfroy A, Meunier J-R, Crump B, Partensky F, Baross JA, Marteinsson V, Barbier G, Pace NR, Prieur D. Pyrococcus abyssi sp. nov., a new hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent. Archives of Microbiology. 1993;160(5):338–349. doi: 10.1007/BF00252219. [DOI] [Google Scholar]
  • Eren et al. (2015).Eren AM, Esen ÖC, Quince C, Vineis JH, Morrison HG, Sogin ML, Delmont TO. Anvi’o: an advanced analysis and visualization platform for ’omics data. PeerJ. 2015;3:e1319. doi: 10.7717/peerj.1319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Falkowski, Fenchel & Delong (2008).Falkowski PG, Fenchel T, Delong EF. The microbial engines that drive Earth’s biogeochemical cycles. Science. 2008;320(5879):1034–1039. doi: 10.1126/science.1153213. [DOI] [PubMed] [Google Scholar]
  • Gudmundsson (1998).Gudmundsson K. Long-term variation in phytoplankton productivity during spring in Icelandic waters. ICES Journal of Marine Science. 1998;55(4):635–643. doi: 10.1006/jmsc.1998.0391. [DOI] [Google Scholar]
  • Haroon et al. (2016).Haroon MF, Thompson LR, Parks DH, Hugenholtz P, Stingl U. A catalogue of 136 microbial draft genomes from Red Sea metagenomes. Scientific Data. 2016;3(1):1–6. doi: 10.1038/sdata.2016.50. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Hugenholtz & Tyson (2008).Hugenholtz P, Tyson GW. Metagenomics. Nature. 2008;455(7212):481–483. doi: 10.1038/455481a. [DOI] [PubMed] [Google Scholar]
  • Hyatt et al. (2010).Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11(1):119. doi: 10.1186/1471-2105-11-119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Kang et al. (2019).Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, Wang Z. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 2019;7:e7359. doi: 10.7717/peerj.7359. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Kopf et al. (2015).Kopf A, Bicak M, Kottmann R, Schnetzer J, Kostadinov I, Lehmann K, Fernandez-Guerra A, Jeanthon C, Rahav E, Ullrich M, Wichels A, Gerdts G, Polymenakou P, Kotoulas G, Siam R, Abdallah RZ, Sonnenschein EC, Cariou T, O’Gara F, Jackson S, Orlic S, Steinke M, Busch J, Duarte B, Caçador I, Canning-Clode J, Bobrova O, Marteinsson V, Reynisson E, Loureiro CM, Luna GM, Quero GM, Löscher CR, Kremp A, DeLorenzo ME, Øvreås L, Tolman J, LaRoche J, Penna A, Frischer M, Davis T, Katherine B, Meyer CP, Ramos S, Magalhães C, Jude-Lemeilleur F, Aguirre-Macedo ML, Wang S, Poulton N, Jones S, Collin R, Fuhrman JA, Conan P, Alonso C, Stambler N, Goodwin K, Yakimov MM, Baltar F, Bodrossy L, Van De Kamp J, Frampton DM, Ostrowski M, Van Ruth P, Malthouse P, Claus S, Deneudt K, Mortelmans J, Pitois S, Wallom D, Salter I, Costa R, Schroeder DC, Kandil MM, Amaral V, Biancalana F, Santana R, Pedrotti ML, Yoshida T, Ogata H, Ingleton T, Munnik K, Rodriguez-Ezpeleta N, Berteaux-Lecellier V, Wecker P, Cancio I, Vaulot D, Bienhold C, Ghazal H, Chaouni B, Essayeh S, Ettamimi S, Zaid EH, Boukhatem N, Bouali A, Chahboune R, Barrijal S, Timinouni M, El Otmani F, Bennani M, Mea M, Todorova N, Karamfilov V, ten Hoopen P, Cochrane G, L’Haridon S, Bizsel KC, Vezzi A, Lauro FM, Martin P, Jensen RM, Hinks J, Gebbels S, Rosselli R, De Pascale F, Schiavon R, dos Santos A, Villar E, Pesant S, Cataletto B, Malfatti F, Edirisinghe R, Silveira J. AH, Barbier M, Turk V, Tinta T, Fuller WJ, Salihoglu I, Serakinci N, Ergoren MC, Bresnan E, Iriberri J, Nyhus P. AF, Bente E, Karlsen HE, Golyshin PN, Gasol JM, Moncheva S, Dzhembekova N, Johnson Z, Sinigalliano CD, Gidley ML, Zingone A, Danovaro R, Tsiamis G, Clark MS, Costa AC, El Bour M, Martins AM, Collins RE, Ducluzeau A-L, Martinez J, Costello MJ, Amaral-Zettler LA, Gilbert JA, Davies N, Field D, Glöckner FO. The ocean sampling day consortium. GigaScience. 2015;4(1):27. doi: 10.1186/s13742-015-0066-5. 
  • Lander et al. (2001).Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng J-F, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen H-C, Church D, Clamp M, Copley RR, 
Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JGR, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AFA, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang S-P, Yeh R-F, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Patrinos A, Morgan MJ, International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921. doi: 10.1038/35057062. [DOI] [PubMed] [Google Scholar]
  • Langmead & Salzberg (2012).Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Li et al. (2015).Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–1676. doi: 10.1093/bioinformatics/btv033. [DOI] [PubMed] [Google Scholar]
  • Li et al. (2016).Li D, Luo R, Liu C-M, Leung C-M, Ting H-F, Sadakane K, Yamashita H, Lam T-W. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016;102:3–11. doi: 10.1016/j.ymeth.2016.02.020. [DOI] [PubMed] [Google Scholar]
  • Li et al. (2009).Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Ludwig et al. (2004).Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, Förster W, Brettske I, Gerber S, Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, König A, Liss T, Lüßmann R, May M, Nonhoff B, Reichel B, Strehlow R, Stamatakis A, Stuckmann N, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer K. ARB: a software environment for sequence data. Nucleic Acids Research. 2004;32(4):1363–1371. doi: 10.1093/nar/gkh293. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Malmberg, Valdimarsson & Mortensen (1995).Malmberg S-A, Valdimarsson H, Mortensen J. Long time series in Icelandic Waters, in relation to physical variability in the northern north Atlantic. Ocean Challenge. 1995;6:48–51. [Google Scholar]
  • Marteinsson et al. (1999).Marteinsson VT, Birrien J-L, Reysenbach A-L, Vernet M, Marie D, Gambacorta A, Messner P, Sleytr UB, Prieur D. Thermococcus barophilus sp. nov., a new barophilic and hyperthermophilic archaeon isolated under high hydrostatic pressure from a deep-sea hydrothermal vent. International Journal of Systematic and Evolutionary Microbiology. 1999;49(2):351–359. doi: 10.1099/00207713-49-2-351. [DOI] [PubMed] [Google Scholar]
  • Martin (2011).Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10–12. [Google Scholar]
  • Menzel, Ng & Krogh (2016).Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications. 2016;7(1):11257. doi: 10.1038/ncomms11257. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Parks et al. (2018).Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, Hugenholtz P. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology. 2018;36(10):996–1004. doi: 10.1038/nbt.4229. [DOI] [PubMed] [Google Scholar]
  • Parks et al. (2015).Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research. 2015;25(7):1043–1055. doi: 10.1101/gr.186072.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Pomeroy (1974).Pomeroy LR. The ocean’s food web, a changing paradigm. Bioscience. 1974;24(9):499–504. doi: 10.2307/1296885. [DOI] [Google Scholar]
  • Quince et al. (2017).Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nature Biotechnology. 2017;35(9):833–844. doi: 10.1038/nbt.3935. [DOI] [PubMed] [Google Scholar]
  • Rusch et al. (2007).Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers Y-H, Falcón LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, Platt T, Bermingham E, Gallardo V, Tamayo-Castillo G, Ferrari MR, Strausberg RL, Nealson K, Friedman R, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLOS Biology. 2007;5(3):e77. doi: 10.1371/journal.pbio.0050077. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Sangwan, Xia & Gilbert (2016).Sangwan N, Xia F, Gilbert JA. Recovering complete and draft population genomes from metagenome datasets. Microbiome. 2016;4(1):8. doi: 10.1186/s40168-016-0154-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Schneider et al. (2017).Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen H-C, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS, Kremitzki M, Magrini V, Markovic C, McGrath S, Steinberg KM, Auger K, Chow W, Collins J, Harden G, Hubbard T, Pelan S, Simpson JT, Threadgold G, Torrance J, Wood JM, Clarke L, Koren S, Boitano M, Peluso P, Li H, Chin C-S, Phillippy AM, Durbin R, Wilson RK, Flicek P, Eichler EE, Church DM. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research. 2017;27(5):849–864. doi: 10.1101/gr.213611.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Sevim et al. (2019).Sevim V, Lee J, Egan R, Clum A, Hundley H, Lee J, Everroad RC, Detweiler AM, Bebout BM, Pett-Ridge J, Göker M, Murray AE, Lindemann SR, Klenk H-P, O’Malley R, Zane M, Cheng J-F, Copeland A, Daum C, Singer E, Woyke T. Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies. Scientific Data. 2019;6(1):285. doi: 10.1038/s41597-019-0287-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Singer et al. (2016).Singer E, Andreopoulos B, Bowers RM, Lee J, Deshpande S, Chiniquy J, Ciobanu D, Klenk H-P, Zane M, Daum C, Clum A, Cheng J-F, Copeland A, Woyke T. Next generation sequencing data of a defined microbial mock community. Scientific Data. 2016;3(1):160081. doi: 10.1038/sdata.2016.81. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Sunagawa et al. (2020).Sunagawa S, Acinas SG, Bork P, Bowler C, Acinas SG, Babin M, Boss E, Cochrane G, De Vargas C, Follows M, Gorsky G, Grimsley N, Guidi L, Hingamp P, Iudicone D, Jaillon O, Kandels S, Karp-Boss L, Karsenti E, Lescot M, Not F, Ogata H, Pesant S, Poulton N, Raes J, Sardet C, Sieracki M, Speich S, Stemmann L, Sullivan MB, Wincker P, Eveillard D, Lombard F, Pesant S, Sullivan MB, Tara Oceans Coordinators. Tara Oceans: towards global ocean ecosystems biology. Nature Reviews Microbiology. 2020;18(8):428–445. doi: 10.1038/s41579-020-0364-5. [DOI] [PubMed] [Google Scholar]
  • Sunagawa et al. (2015).Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B, Zeller G, Mende DR, Alberti A, Cornejo-Castillo FM, Costea PI, Cruaud C, d’Ovidio F, Engelen S, Ferrera I, Gasol JM, Guidi L, Hildebrand F, Kokoszka F, Lepoivre C, Lima-Mendez G, Poulain J, Poulos BT, Royo-Llonch M, Sarmento H, Vieira-Silva S, Dimier C, Picheral M, Searson S, Kandels-Lewis S, Bowler C, de Vargas C, Gorsky G, Grimsley N, Hingamp P, Iudicone D, Jaillon O, Not F, Ogata H, Pesant S, Speich S, Stemmann L, Sullivan MB, Weissenbach J, Wincker P, Karsenti E, Raes J, Acinas SG, Bork P. Structure and function of the global ocean microbiome. Science. 2015;348(6237):1261359. doi: 10.1126/science.1261359. [DOI] [PubMed] [Google Scholar]
  • Ten Hoopen et al. (2017).Ten Hoopen P, Finn RD, Bongo LA, Corre E, Fosso B, Meyer F, Mitchell A, Pelletier E, Pesole G, Santamaria M, Willassen NP, Cochrane G. The metagenomic data life-cycle: standards and best practices. Gigascience. 2017;6(8):1–11. doi: 10.1093/gigascience/gix047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Thórdardóttir (1986).Thórdardóttir T. Timing and duration of spring blooming south and southwest of Iceland. In: The role of freshwater outflow in coastal marine ecosystems. Berlin, Heidelberg: Springer-Verlag; 1986. pp. 345–360. [DOI] [Google Scholar]
  • Tully, Graham & Heidelberg (2018).Tully BJ, Graham ED, Heidelberg JF. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Scientific Data. 2018;5:170203. doi: 10.1038/sdata.2017.203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Tully et al. (2017).Tully BJ, Sachdeva R, Graham ED, Heidelberg JF. 290 metagenome-assembled genomes from the Mediterranean Sea: a resource for marine microbiology. PeerJ. 2017;5:e3558. doi: 10.7717/peerj.3558. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • Valdimarsson & Malmberg (1999).Valdimarsson H, Malmberg S-A. Near-surface circulation in Icelandic waters derived from satellite tracked drifters. Rit Fiskideild. 1999;16:23–40. [Google Scholar]
  • Venter et al. (2004).Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers Y-H, Smith HO. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304(5667):66–74. doi: 10.1126/science.1093857. [DOI] [PubMed] [Google Scholar]
  • Wu, Simmons & Singer (2016).Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32(4):605–607. doi: 10.1093/bioinformatics/btv638. [DOI] [PubMed] [Google Scholar]
  • Yue et al. (2020).Yue Y, Huang H, Qi Z, Dou H-M, Liu X-Y, Han T-F, Chen Y, Song X-J, Zhang Y-H, Tu J. Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets. BMC Bioinformatics. 2020;21(1):1–15. doi: 10.1186/s12859-019-3325-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

Supplemental Information 1. Number of raw reads of 32 metagenomic datasets.

Metagenomic datasets from 32 samples (31 seawater samples and one mock community). Read counts are shown by sampling location and time.

DOI: 10.7717/peerj.11112/supp-1

Supplemental Information 2. Principal Coordinates Analysis (PCoA) based on Bray–Curtis dissimilarity computed by SimkaMin (Benoit et al., 2020).

(A) Experimental variable. (B) Environmental and geographic variables.

DOI: 10.7717/peerj.11112/supp-2

Data Availability Statement

The raw Illumina sequencing paired-end reads are available in the ENA under project accession number PRJEB41565 (ERP125360). MAGs are available under accession numbers ERS5621908 to ERS5622126. Code is available at https://github.com/clarajegousse/.


Articles from PeerJ are provided here courtesy of PeerJ, Inc
