A total of 219 metagenome-assembled genomes of microorganisms from Icelandic marine waters

Clara Jégousse; Pauline Vannier; René Groben; Frank Oliver Glöckner; Viggó Marteinsson

doi:10.7717/peerj.11112

PeerJ. 2021 Apr 2;9:e11112. doi: 10.7717/peerj.11112

Editor: Michael Rappe

PMCID: PMC8020865 PMID: 33859876

Abstract

Marine microorganisms contribute to the health of the global ocean by supporting the marine food web and regulating biogeochemical cycles. Assessing marine microbial diversity is a crucial step towards understanding the global ocean. The waters surrounding Iceland are a complex environment where relatively warm, salty waters from the Atlantic cool and sink to the deep. Microbial studies in this area have focused on photosynthetic micro- and nanoplankton, mainly using microscopy and chlorophyll measurements. However, the diversity and function of the bacterial and archaeal picoplankton remain unknown. Here, we used a co-assembly approach supported by a marine mock community to reconstruct metagenome-assembled genomes (MAGs) from 31 metagenomes from the sea surface and seafloor of four oceanographic sampling stations sampled between 2015 and 2018. The resulting 219 MAGs include 191 bacterial, 26 archaeal and two eukaryotic MAGs, and help bridge the gap in our current knowledge of the global marine microbiome.

Keywords: Metagenomics, Metagenome-assembled genomes, Iceland, Bacteria, Archaea

Introduction

Marine microorganisms are crucial to the global ecosystem as they regulate the carbon cycle (Azam, 1998; Falkowski, Fenchel & Delong, 2008) and support the marine food web (Pomeroy, 1974; Azam et al., 1983). The study of microorganisms within complex environments, such as the ocean, was accelerated by the emergence of sequencing technologies. In particular, metagenomics (the study of the total genetic material recovered from an environmental sample) has provided previously unavailable information on the functional diversity and ecology of microbial communities within their environments (Hugenholtz & Tyson, 2008; Quince et al., 2017).

Large-scale metagenomics projects, such as the Global Ocean Sampling (Venter et al., 2004; Rusch et al., 2007), Ocean Sampling Day (Kopf et al., 2015) and Tara Oceans (Sunagawa et al., 2015; Sunagawa et al., 2020), have provided fascinating new insights, but also revealed the gaps in our knowledge of marine microbial species, their geographical distribution, and their organisation in complex and dynamic communities. These and other large-scale initiatives have so far not covered the oceanic regions around Iceland, a complex marine environment that is characterized by distinct water masses and powerful currents: the cold Polar Water of the East Greenland Current and the Arctic Water of the East Icelandic Current from the north and the warm North Atlantic Water of the Irminger Current from the south (Malmberg, Valdimarsson & Mortensen, 1995; Valdimarsson & Malmberg, 1999). Most microbial studies in Icelandic waters have so far been conducted with traditional methods, like chlorophyll measurements or microscopy, and were therefore mainly focused on larger heterotrophs and photosynthetic microorganisms (Thórdardóttir, 1986; Gudmundsson, 1998; Astthorsson, Gislason & Jonsson, 2007). To establish the baseline knowledge of microbial ecology in Icelandic marine waters, we assembled metagenomic sequence data into draft microbial genomes often called metagenome-assembled genomes (MAGs).

The recovery of MAGs opens the route to further analysis such as comparative genomics to understand the roles of these microorganisms within their community and ecosystem (Sangwan, Xia & Gilbert, 2016). MAGs are particularly valuable for yet uncultured marine lineages as they reveal the metabolic potential and environmental adaptation of these microorganisms and give clues about trophic interactions and ecology within the environment. Several marine metagenomic studies recovered MAGs from marine environments with—among others—136 MAGs from the Red Sea (Haroon et al., 2016), 290 from the Mediterranean Sea (Tully et al., 2017), and 2,631 from the global oceans with data harvested by Tara Oceans (Tully, Graham & Heidelberg, 2018).

Here, we report 219 MAGs from 31 samples collected in the Arctic Ocean north of Iceland and in the warmer Atlantic waters south of Iceland. The samples were collected between 2015 and 2018 at four established oceanographic sampling stations visited during six research cruises with two depths sampled at each station. A set of metadata is available for these samples following the best practices recommended by Ten Hoopen et al. (2017), offering an opportunity to further understand the environmental conditions that shape the microbial communities in the waters off the Icelandic coasts.

Materials & Methods

Sampling

Seawater samples were collected between May 2015 and May 2018 from four stations, two in the North Atlantic Ocean, Selvogsbanki 2 and 5 (SB2 and SB5), and two in the Arctic Ocean, Siglunes 3 and 8 (SI3 and SI8) (Fig. 1A and Table 1). Sampling was conducted on board the oceanographic research vessel Bjarni Sæmundsson RE 30, operated by the Icelandic Marine Research Institute (MRI), by collecting 5 L of seawater from the surface and the seafloor of the ocean using Niskin bottles on a CTD rosette sampler. Seawater samples were directly filtered onto 0.22 µm Sterivex filter units (Merck Millipore) and immediately flash frozen in liquid nitrogen before being stored at −80°C until further processing (full workflow in Fig. 1B).

Table 1. Sampling dates and locations with corresponding seawater temperature and salinity.

Sampling date (dd.mm.yyyy)	Station ID	Latitude (decimal degrees)	Longitude (decimal degrees)	Depth (m)	Temperature (°C)	Salinity (PSU)
23.05.2015	SI8	67.9993	−18.8313	1,045	−0.481	34.913
30.05.2015	SB5	62.9822	−21.4737	0	7.632	35.195
30.05.2015	SB5	62.9822	−21.4737	1,004	4.391	34.998
23.05.2016	SI8	68.0100	−18.8247	0	1.632	34.869
23.05.2016	SI8	68.0100	−18.8247	1,045	−0.431	34.914
31.05.2016	SB5	62.9936	−21.4839	0	8.147	35.113
31.05.2016	SB5	62.9936	−21.4839	1,004	4.722	35.017
21.05.2017	SI8	68.0094	−18.8325	1,045	2.700	34.852
21.05.2017	SI8	68.0094	−18.8325	0	−0.381	34.914
22.05.2017	SI3	66.5342	−18.8378	470	5.517	34.492
22.05.2017	SI3	66.5342	−18.8378	0	0.151	34.906
30.05.2017	SB5	62.9878	−21.4800	1,004	8.477	34.761
30.05.2017	SB5	62.9878	−21.4800	0	4.801	35.009
09.08.2017	SI3	66.5344	−18.8419	0	9.980	34.310
09.08.2017	SI3	66.5344	−18.8419	470	0.190	34.900
09.08.2017	SI8	68.0006	−18.8375	1,045	7.640	34.650
09.08.2017	SI8	68.0006	−18.8375	0	−0.370	34.910
18.08.2017	SB2	63.4933	−20.9569	0	12.000	33.700
18.08.2017	SB2	63.4933	−20.9569	90	8.470	34.940
18.08.2017	SB5	62.9883	−21.4867	0	12.200	34.980
18.08.2017	SB5	62.9883	−21.4867	1,004	4.730	35.010
16.02.2018	SI3	66.5442	−18.8400	470	0.044	34.901
16.02.2018	SI8	68.0000	−18.8386	0	0.533	34.640
16.02.2018	SI8	68.0000	−18.8386	1,045	−0.410	34.914
18.05.2018	SI8	68.0058	−18.8256	0	1.355	34.727
18.05.2018	SI8	68.0058	−18.8256	1,045	−0.428	34.914
20.05.2018	SI3	66.5439	−18.8406	0	5.108	34.894
29.05.2018	SB2	63.4942	−20.9008	0	7.625	34.913
29.05.2018	SB2	63.4942	−20.9008	90	7.298	35.031
29.05.2018	SB5	62.9858	−21.4731	0	7.740	35.042
29.05.2018	SB5	62.9858	−21.4731	1,004	4.488	34.978

Mock community

A marine mock community, consisting of 21 bacterial and two archaeal species, was included in the analysis for quality control. Strains were cultivated according to Table 2. After 12 to 24 h of growth (to obtain 10^6 to 10^8 cells/ml), cells were counted on a Thoma counting chamber (BRAND, ref. 718020; 0.100 mm depth) and diluted to a final concentration of 1.29 × 10^9 cells/L. Synthetic seawater was prepared by adding 150 g of sea salts (Sigma-Aldrich, S9883) and 17.25 g of PIPES (Sigma-Aldrich, P1851) to 5 L of autoclaved MilliQ water. The mock community was immediately treated in the same manner as the seawater samples and filtered onto Sterivex filters for DNA extraction.

Table 2. List of bacterial and archaeal species in the mock community.

Strains were obtained from the Icelandic Strain Collection and Records (ISCAR) or the German Collection of Microorganisms and Cell Cultures (DSMZ: https://www.dsmz.de/). Recipes for growth media can be found at if not otherwise indicated.

Domain	Species name	% identity	Collection number	Growth parameters	Successfully reassembled
Bacteria	Alteromonas naphthalenivorans	99.66%	ISCAR-05201	Marine Broth, 22°C, pH 6.8, aerobic condition	Yes
Bacteria	Jeotgalibacillus marinus	100%	ISCAR-03118	Marine Broth, 22°C, pH 6.8, aerobic condition	No
Bacteria	Geobacillus thermoleovorans	100%	ISCAR-00004	162 media, 65°C, pH 7.0, aerobic condition	No
Bacteria	Colwellia psychrerythraea	99%	ISCAR-05175	Marine Broth, 22°C, pH 6.8, aerobic condition	Yes
Bacteria	Dietzia psychralcaliphila	99.52%	ISCAR-05191	92 media, 22°C, pH 6.8, aerobic condition	No
Bacteria	Escherichia coli	100%	ISCAR-02961	LB media, 37°C, pH 7.0, aerobic condition	Yes
Bacteria	Pseudomonas salina	99.83%	ISCAR-05249	Marine Broth media, 22°C, pH 6.8, aerobic condition	No
Bacteria	Marinobacter psychrophilus	99.84%	ISCAR-05186	Marine Broth media, 22°C, pH 6.8, aerobic condition	Yes
Bacteria	Photobacterium indicum	100%	ISCAR-05002	Marine Broth media, 22°C, pH 6.8, aerobic condition	Yes
Bacteria	Pseudoalteromonas neustonica	98.58%	ISCAR-05312	172 media, 22°C, pH 6.8, aerobic condition	Yes
Bacteria	Reinekea aestuarii	100%	DSM 29881	Marine Broth media, 22°C, pH 6.8, aerobic condition	No
Bacteria	Reinekea marinisedimentorum	100%	DSM 15388	Marine Broth media, 30°C, pH 6.8, aerobic condition	Yes
Bacteria	Rhodococcus kyotonensis	99.23%	ISCAR-05221	Marine Broth media, 22°C, pH 6.8, aerobic condition	No
Bacteria	Reinekea sp. 84	97.75% with Reinekea marina	ISCAR-05258	Marine Broth media, 22°C, pH 6.8, aerobic condition	No
Bacteria	Sulfitobacter sp. 87	97.73% with Sulfitobacter donghicola	ISCAR-05261	Marine Broth media, 22°C, pH 6.8, aerobic condition	No
Bacteria	Sulfitobacter donghicola	100%	DSM 23563	Marine Broth media, 22°C, pH 6.8, aerobic condition	Yes
Bacteria	Sulfitobacter guttiformis	100%	DSM 11544	Marine Broth media, 22°C, pH 6.8, aerobic condition	Yes
Bacteria	Sulfitobacter pontiacus	100%	DSM 10014	Marine Broth media, 22°C, pH 6.8, aerobic condition	Yes
Bacteria	Sulfitobacter undariae	100%	DSM 102234	Marine Broth media, 22°C, pH 6.8, aerobic condition	No
Bacteria	Thermus thermophilus	100%	ISCAR-03915	166 media, 65°C, pH 7.0, aerobic condition	No
Bacteria	Vibrio cyclitrophicus	100%	ISCAR-06209	Marine Broth media, 22°C, pH 6.8, aerobic condition	No
Archaea	Pyrococcus abyssi	100%	DSM 25543	YPS¹ media, 90°C, pH 7, anaerobic condition, elemental sulfur	Yes
Archaea	Thermococcus barophilus	100%	DSM 11836	TRM², 85°C, pH 6.5, anaerobic condition, elemental sulfur	Yes

Notes.

Growth media recipes in: ¹Erauso et al. (1993) ²Marteinsson et al. (1999).

DNA extractions

DNA was extracted from all samples using the QIAGEN AllPrep kit according to the manufacturer’s instructions, with modifications. Sterivex filters were aseptically removed from their plastic casing as described by Cruaud et al. (2017). Filters were transferred to tubes containing 600 µl of RLT buffer from the kit and 0.2 g of 0.1 mm zirconia/silica beads (BioSpec, cat. 11079101z) for mechanical disruption of the cells (bead-beating) using a Disrupt MixerMill MM400 by Retsch with the program P9 (300 Hz) three times for 10 s each, cooling the tubes in icy water between bead-beating steps. DNA quality was assessed with a NanoDrop 1000 Spectrophotometer (ThermoFisher) and DNA was quantified with a Qubit fluorometer (Qubit DNA BR assay, Invitrogen).

Library preparation and sequencing

High-throughput sequencing of the samples was performed by Genome Quebec. Libraries were prepared using the NEBNext Ultra™ II DNA Library Prep Kit for Illumina (New England Biolabs), followed by sequencing on two lanes of an Illumina HiSeq 4000 PE150 system (Illumina), allocating 1/20 and 1/25 of a lane for each sample. Demultiplexing and conversion to FASTQ files were performed using bcl2fastq Conversion Software v1.8.4 (Illumina), resulting in 32 metagenomic datasets.

Co-assembly and binning

The quality of the raw sequencing reads was assessed using FastQC v0.11.8 (Andrews et al., 2012) (Fig. S1). Quality control of the raw reads was performed with Sunbeam v2.0.2 (Clarke et al., 2019), which includes trimming with Trimmomatic v0.36 (Bolger, Lohse & Usadel, 2014) (parameters: PE -phred33 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36), adapter removal with Cutadapt v2.6 (Martin, 2011), removal of low-complexity sequences using Sunbeam Komplexity (default parameters) and removal of contaminating human sequences using the Genome Reference Consortium Human Build 38 patch release 13 (GRCh38.p13) (Lander et al., 2001; Schneider et al., 2017). The quality-filtered metagenomic data were divided into a surface dataset and a seafloor dataset, as the surface of the ocean can be considered a different environment from the seafloor (Fig. S2). Both datasets also included the mock community. Each dataset was then co-assembled with MEGAHIT v1.2.9 (Li et al., 2015; Li et al., 2016) (parameters: --min-contig-len 1000 -m 0.85), that is, with a minimum contig length of 1,000 bp, resulting in two FASTA files of community contigs. Quality-filtered short reads from each sample were mapped back to the contigs of the corresponding co-assembly using Bowtie 2 (default parameters and the --no-unal flag) (Langmead & Salzberg, 2012). The resulting SAM files were converted to BAM files and indexed with SAMTOOLS v0.3.3 (parameters: view -F 4 -bS) (Li et al., 2009). For both co-assemblies, the FASTA files containing the contigs were formatted with the script reformat-fasta from Anvi’o v6.2 (Eren et al., 2015). Two contigs databases (surface and seafloor) were generated with Anvi’o, and the BAM files were profiled and merged into the respective databases.

Automated binning was performed using the Anvi’o script anvi-cluster-contigs with default parameters and three binning algorithms: CONCOCT v1.1.0 (Alneberg et al., 2013), MaxBin2 v2.2.6 (Wu, Simmons & Singer, 2016), and MetaBAT 2 v2:2.15 (Kang et al., 2019). For all binning results, completeness and redundancy of the bins were estimated with the Anvi’o script anvi-estimate-genome-completeness, which relies on CheckM v1.1.3 (Parks et al., 2015). Based on the comparison of the three binning algorithms, we selected the “good quality bins” from MetaBAT 2, defined as bins with an estimated completion above 50% and an estimated redundancy below 10% according to the standards suggested by Bowers et al. (2017). Differences in the proportion of good quality bins among the total number of bins were assessed with a chi² test.
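
As a rough illustration of the command-line steps described above, the following Python sketch builds the argument lists for the co-assembly, read mapping and SAM-to-BAM conversion without running them. Only the quoted parameters (--min-contig-len 1000 -m 0.85, --no-unal, view -F 4 -bS) come from the text. The function names, file names and index prefix are hypothetical, the Bowtie 2 index is assumed to have been built beforehand with bowtie2-build, and the Sunbeam and Anvi’o steps are left out.

```python
import shlex
from typing import List


def megahit_cmd(r1: List[str], r2: List[str], out_dir: str) -> List[str]:
    """Co-assembly: MEGAHIT takes comma-separated lists of paired read files."""
    return [
        "megahit",
        "-1", ",".join(r1),
        "-2", ",".join(r2),
        "--min-contig-len", "1000",  # minimum contig length reported in the text
        "-m", "0.85",                # memory setting reported in the text
        "-o", out_dir,
    ]


def bowtie2_cmd(index_prefix: str, r1: str, r2: str, sam_out: str) -> List[str]:
    """Map one sample to a co-assembly; --no-unal omits unaligned reads."""
    return ["bowtie2", "-x", index_prefix, "-1", r1, "-2", r2,
            "--no-unal", "-S", sam_out]


def samtools_view_cmd(sam_in: str, bam_out: str) -> List[str]:
    """Convert SAM to BAM while excluding unmapped reads (-F 4)."""
    return ["samtools", "view", "-F", "4", "-bS", "-o", bam_out, sam_in]


if __name__ == "__main__":
    # Hypothetical sample names, for illustration only.
    samples = ["sampleA", "sampleB"]
    r1 = [f"{s}_1.fastq.gz" for s in samples]
    r2 = [f"{s}_2.fastq.gz" for s in samples]
    print(shlex.join(megahit_cmd(r1, r2, "surface_coassembly")))
    print(shlex.join(bowtie2_cmd("surface_contigs", r1[0], r2[0], "sampleA.sam")))
    print(shlex.join(samtools_view_cmd("sampleA.sam", "sampleA.bam")))
```

Keeping each command as a list of arguments makes it easy to pass to subprocess.run once the real file paths are known, and avoids shell quoting issues.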

Functional assignment, taxonomy and phylogenomic trees

We used Prodigal v2.6.3 (Hyatt et al., 2010) to identify open reading frames (ORFs) within the contigs. The resulting ORFs were processed with Kaiju v1.7.3 (Menzel, Ng & Krogh, 2016) and the NCBI nr+euk database (nr_euk 2019-06-25, 46 GB, available for download) for taxonomic assignment. Besides the contig-based taxonomic assignment, we used GTDB-Tk v1.3.0 (Genome Taxonomy Database Toolkit) (Chaumeil et al., 2019) to construct two bacterial and two archaeal phylogenomic trees containing good quality MAGs (completeness ≥50%; contamination ≤10%) and Genome Taxonomy Database (GTDB) R95 (released in July 2020) reference genomes to confirm the taxonomic assignments of the MAGs (Parks et al., 2018). The trees were reconstructed using ARB (Ludwig et al., 2004) for comprehensive visualisation.

Data availability

The raw Illumina sequencing paired-end reads are available in the ENA under project accession number PRJEB41565 (ERP125360). MAGs are available under accession numbers ERS5621908 to ERS5622126. Code is available at https://github.com/clarajegousse/.

Results

Co-assemblies

The co-assembly of the 16 samples of the surface of the ocean yielded 445,328 contigs, with a minimal length of 1,000 bp, representing a total length of 1.06 Gb (1,060,942,783 nucleotides) with N50 of 2,627 bp and 1,271,859 gene calls (Table 3).

Table 3. Statistics summary of co-assemblies.

	Surface	Seafloor
Total nucleotides	1.06 Gb	1.23 Gb
N50	2,382 bp	2,327 bp
L50	83,272 contigs	114,549 contigs
Number of contigs	445,328	554,104
Longest contig	864,343 bp	1,302,516 bp
Shortest contig	1,000 bp	1,000 bp
Number of contigs >10 kb	8,521	8,306
Number of genes (Prodigal)	1,271,859	1,532,800

The co-assembly of the 17 samples of the seafloor of the ocean yielded 554,104 contigs, with a minimal length of 1,000 bp, representing a total length of 1.23 Gb (1,233,390,295 nucleotides) with N50 of 2,327 bp and 1,532,800 gene calls (Table 3).
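
For readers unfamiliar with the assembly statistics in Table 3, the sketch below shows one common way to compute them. N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. L50 is the number of contigs needed to reach that point. The function name and the toy contig lengths are illustrative only and are not taken from the study.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50 (in bp); 'count' is the L50 (a contig count).
            return length, count
    raise ValueError("at least one contig length is required")


if __name__ == "__main__":
    # Toy assembly of 20,000 bp: the two longest contigs cover half of it.
    toy_contigs = [8000, 5000, 3000, 2000, 1000, 1000]
    n50, l50 = n50_l50(toy_contigs)
    print(f"N50 = {n50} bp, L50 = {l50} contigs")
```

In the toy example, the cumulative length reaches 13,000 bp after the second contig, so N50 is 5,000 bp and L50 is 2 contigs.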

Binning

A comparison of the three binning algorithms (CONCOCT, MaxBin2 and MetaBAT 2) was conducted on the surface and seafloor co-assemblies based on the number of good quality bins (Fig. 2). Good quality bins have an estimated completion above 50% and an estimated redundancy (also called estimated contamination) below 10% (Bowers et al., 2017). The proportion of good quality bins differs significantly among the three binning methods (χ² = 135.23, df = 2, p-value < 2.2e−16). MetaBAT 2 produced no more bins overall than CONCOCT or MaxBin2, yet its number of good quality bins was much higher (Table 4).
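
The quality criterion used throughout (estimated completion above 50% and estimated redundancy below 10%) can be expressed as a small filter. The sketch below is only an illustration: the Bin class, the function names and the three example bins are hypothetical, and in practice the completion and redundancy estimates would come from the Anvi’o output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bin:
    name: str
    completion: float  # estimated completion, in percent
    redundancy: float  # estimated redundancy (contamination), in percent


def is_good_quality(b: Bin, min_completion: float = 50.0,
                    max_redundancy: float = 10.0) -> bool:
    """Apply the thresholds quoted in the text (after Bowers et al., 2017)."""
    return b.completion > min_completion and b.redundancy < max_redundancy


def good_quality_percentage(bins: List[Bin]) -> float:
    """Percentage of bins that pass the quality filter."""
    if not bins:
        raise ValueError("no bins provided")
    return 100.0 * sum(is_good_quality(b) for b in bins) / len(bins)


if __name__ == "__main__":
    # Hypothetical bins, for illustration only.
    example = [
        Bin("bin_1", completion=92.3, redundancy=1.4),   # passes
        Bin("bin_2", completion=48.0, redundancy=0.5),   # too incomplete
        Bin("bin_3", completion=75.0, redundancy=22.0),  # too redundant
    ]
    kept = [b.name for b in example if is_good_quality(b)]
    print(kept, f"{good_quality_percentage(example):.1f}% good quality")
```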

The number of contigs binned is represented by the size of the pie plots. The numbers and percentages of bad quality bins and good quality bins are shown within the grey and coloured slices of the chart, respectively. Good quality bins have an estimated completion above 50% and an estimated redundancy (also called estimated contamination) below 10% (Bowers et al., 2017).

Table 4. Statistics summary of binning results.

Co-assembly	Binning method	Number of bins	Number of MAGs	Average completeness (%)	Average contamination (%)
Surface	CONCOCT	319	43	45.15	49.23
Surface	MaxBin2	302	17	25.77	13.30
Surface	MetaBAT 2	279	118	44.12	3.46
Seafloor	CONCOCT	259	28	51.26	90.39
Seafloor	MaxBin2	358	18	34.59	18.63
Seafloor	MetaBAT 2	299	134	49.90	7.13

MetaBAT 2 gave the best results, which were used for further analysis and are shown in more detail in Fig. 3. Out of the 279 bins identified by MetaBAT 2 for the surface samples, 42.29% (118) are good quality bins that can be considered draft MAGs according to Bowers et al. (2017). Within the 118 good quality MAGs (Fig. 3B), 16 represent genomes of organisms from the mock community and 102 are assembled from the surface seawater. In the same manner, out of the 299 bins identified by MetaBAT 2 for the seafloor samples, 44.82% (134) can be considered good draft MAGs. Within the 134 good quality MAGs (Fig. 3D), 17 represent genomes of organisms from the mock community and 117 are assembled from the seawater at the seafloor. The proportion of MAGs among the total number of bins does not differ significantly between the two co-assemblies (χ² = 0.27784, df = 1, p-value = 0.5981), which suggests that the environment does not significantly affect the proportion of bins recovered as MAGs. Likewise, the proportion of MAGs associated with the mock community among the total number of MAGs does not differ significantly between the two co-assemblies (χ² = 0.0003, df = 1, p-value = 0.9858).
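
The two comparisons above can be checked with a short calculation. The sketch below assumes a Pearson chi-square test on a 2 × 2 table with Yates’ continuity correction. The text does not name the correction, so this is an assumption, although it reproduces the reported statistics from the counts given above. The function name is illustrative, and only the Python standard library is used.

```python
import math
from typing import Tuple


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Chi-square test with Yates' correction for the table [[a, b], [c, d]].

    Returns (chi2, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    # Yates' correction subtracts n/2 from |ad - bc| (never going below zero).
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with df = 1.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # MAGs vs. other bins: surface 118 of 279, seafloor 134 of 299.
    chi2, p = chi2_yates_2x2(118, 279 - 118, 134, 299 - 134)
    print(f"MAGs among bins: chi2 = {chi2:.5f}, p = {p:.4f}")

    # Mock-community MAGs vs. other MAGs: surface 16 of 118, seafloor 17 of 134.
    chi2, p = chi2_yates_2x2(16, 118 - 16, 17, 134 - 17)
    print(f"Mock MAGs among MAGs: chi2 = {chi2:.4f}, p = {p:.4f}")
```

Under this assumption, the first comparison gives χ² of about 0.2778 with a p-value of about 0.598, and the second gives χ² of about 0.0003 with a p-value of about 0.986, in line with the values reported in the text.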

Bad quality bins (completeness below 50% or redundancy above 10%) are shown in grey while good quality bins are in colour (green for surface, blue for seafloor samples). (A) A total of 279 bins obtained with MetaBAT 2 from the surface co-assembly, with 118 good quality bins. (B) Good quality bins from the surface co-assembly with the identification of the bins corresponding to members of the mock community. (C) A total of 299 bins obtained with MetaBAT 2 from the seafloor co-assembly, with 134 good quality bins. (D) Good quality bins from the seafloor co-assembly with the identification of the bins corresponding to members of the mock community.

Taxonomy

When excluding members of the mock community based on taxonomic assignment and differential coverage, we identified 102 MAGs reconstructed from the surface co-assembly and 117 MAGs from the seafloor co-assembly. The surface MAGs include two eukaryotes (Bathycoccus and Micromonas), 92 bacteria, and eight archaea while the seafloor MAGs include 99 bacteria, 18 archaea and no eukaryotes.

The surface co-assembly yielded a total of 92 bacterial MAGs (Fig. 4). These MAGs are members of seven phyla (number of MAGs in brackets): Proteobacteria (52), Bacteroidota (31), Actinobacteriota (2), Verrucomicrobiota (2), Planctomycetota (2), SAR324 (1) and Cyanobacteria (1). The MAG within the Cyanobacteria phylum belongs to the genus Synechococcus. Within the phylum Actinobacteriota, we retrieved two MAGs: one from a member of the genus Aquiluna and one from the genus Pontimonas. We reconstructed two MAGs within the phylum Planctomycetota. The two MAGs within the Verrucomicrobiota belong to the family Akkermansiaceae. The Bacteroidota phylum includes 31 MAGs reconstructed from the sea surface co-assembly. Most of these Bacteroidota MAGs belong to the Flavobacteriaceae family (18), including one representative of the genus Polaribacter. Many MAGs within the Flavobacteriaceae family are related to MAGs revealed by the Tara Oceans Consortium, such as Cryomorphaceae bacterium and Flavobacteriales bacterium (CFB group bacteria). We also reconstructed 52 MAGs belonging to the phylum Proteobacteria, including nine Rhodobacteraceae, ten SAR86 and ten Porticoccaceae. Of the three MAGs of the Burkholderiales order, one is within the Burkholderia genus, and the other two belong to the Methylophilaceae family according to GTDB.

Distribution of the Marine Icelandic MAGs across 76 bacterial phyla from GTDB. The maximum likelihood tree was inferred from the concatenation of 120 proteins spanning a dereplicated set of 191,527 bacterial genomes (GTDB 05-RS95 released on 17 July 2020) and the Marine Icelandic MAGs. Phyla containing MAGs from the surface seawater, the seafloor or both are shown in green, blue or teal, respectively. The numbers of Marine Icelandic MAGs from the surface and the seafloor in each phylum are indicated in parentheses in green and blue, respectively.

The seafloor co-assembly yielded a total of 99 bacterial MAGs spanning 12 phyla: Proteobacteria (46), Verrucomicrobiota (9), Bacteroidota (9), Marinisomatota (8), Actinobacteriota (5), Planctomycetota (5), Gemmatimonadota (4), Nitrospinota (3), Chloroflexota (2), SAR324 (2), Myxococcota (1) and Latescibacterota (1). Six of these phyla include exclusively MAGs from the seafloor (Nitrospinota, Myxococcota, Gemmatimonadota, Marinisomatota, Chloroflexota and Latescibacterota). Within the Proteobacteria, most of the MAGs belong to the Gammaproteobacteria class with 32 MAGs, while the remaining 14 are part of the Alphaproteobacteria. Five orders within the Proteobacteria include only MAGs reconstructed from the seafloor co-assembly (Rhizobiales, Rhodospirillales, TMED109, UBA10353 and UBA4486).

Out of the 21 bacterial species of the mock community, 12 of them were re-assembled and given the correct taxonomic assignment down to species level (if available for the strain used) for Alteromonas sp., Geobacillus marinus, Colwellia sp., Escherichia coli, Marinobacter sp., Photobacterium sp., Pseudoalteromonas sp., Reinekea marinisedimentorum, Sulfitobacter donghicola, Sulfitobacter guttiformis, Sulfitobacter pontiacus and Thermus thermophilus. However, some distinct species of the mock community that belong to the same genus do not match any specific MAGs but seem to have been reassembled as one single MAG within the genus in question, such as Reinekea aestuarii and Reinekea sp. 84 as well as Sulfitobacter undariae and Sulfitobacter sp. 87. The genomes of Bacillus thermoleovorans, Dietzia sp., Halomonas sp. and Vibrio cyclitrophicus were not reassembled.

The surface co-assembly yielded only eight archaeal MAGs (Fig. 5), all within the Thermoplasmatota phylum, including three MAGs within the genus MGIIb-O2 of the Thalassarchaeaceae family and five within the Poseidoniaceae family. The seafloor co-assembly resulted in 18 archaeal MAGs, including one representative of the Thermoproteota phylum: this MAG belongs to the UBA57 lineage within the order Nitrososphaerales. The 17 other archaeal MAGs all belong to the Thermoplasmatota phylum, within the class Poseidoniia, including representatives of the Poseidoniaceae and Thalassarchaeaceae families. The two archaeal members of the mock community (Pyrococcus abyssi and Thermococcus barophilus) were successfully reconstructed in both co-assemblies.

Distribution of the Marine Icelandic MAGs across 18 archaeal phyla from GTDB. The maximum likelihood tree was inferred from the concatenation of 122 proteins spanning a dereplicated set of 3,073 archaeal genomes (GTDB 05-RS95 released on 17 July 2020) and the Marine Icelandic MAGs. Phyla containing MAGs from the surface seawater, the seafloor or both are shown in green, blue or teal, respectively. The numbers of Marine Icelandic MAGs from the surface and the seafloor in each phylum are indicated in parentheses in green and blue, respectively.

Discussion

Mock communities are used to quantify and characterise biases introduced in the sample processing pipeline (Brooks et al., 2015) and are indispensable to benchmark sequencing methods and downstream analysis (Singer et al., 2016; Sevim et al., 2019). Mock communities can also be used as a positive control for metagenomic studies. Our mock community confirmed that MetaBAT 2 was able to resolve genomes of species within the same genus, thus making it the most suitable binning algorithm of the three tested in this study: CONCOCT, MaxBin2 and MetaBAT 2. This result is consistent with previous studies (Yue et al., 2020).

The ocean is a vast continuum and the samples were taken within a relatively small section of the North Atlantic Ocean at two sampling depths: the surface and the seafloor (90 m, 470 m, 1,004 m, and 1,045 m depending on the station; Table 1). The differences in sampling depth imply differences in light, pressure and temperature compared to the surface of the ocean. While the surface of the ocean is subjected to seasonal variations in daylight and temperature, the seafloor remains darker and colder than the surface, and such parameters drive microbial community structure and function. Therefore, we considered the surface and the seafloor of the ocean as two different types of environments, which justifies our approach of two co-assemblies rather than assembling all 32 samples together. The fact that a number of MAGs were found exclusively in one of the two environments confirmed this.

Conclusions

The goal of this study was to reconstruct MAGs from 31 samples from Icelandic sea waters. The 219 MAGs span 13 bacterial and two archaeal phyla and contribute to a more defined picture of the global marine microbiome. Moreover, thanks to the inclusion of a mock community in the analysis, this study confirms that the combination of co-assembly and binning with MetaBAT 2 allows, despite a relatively shallow sequencing depth, the recovery of quality MAGs that are a precious resource for further ecological and environmental studies.

Supplemental Information

Supplemental Information 1. Number of raw reads of 32 metagenomic datasets.

Metagenomic datasets from 32 samples (31 seawater samples and mock community). Number of reads displayed depending on the sampling locations and times.

DOI: 10.7717/peerj.11112/supp-1

Supplemental Information 2. Principal Coordinates Analysis (PCoA) based on Bray–Curtis dissimilarity computed by SimkaMin (Benoit et al., 2020).

(A) Experimental variable. (B) Environmental and geographic variables.

DOI: 10.7717/peerj.11112/supp-2

Acknowledgments

The authors would like to thank Kristinn Gudmundsson and Bjarni Saemundsson’s crew from the Marine Research Institute, and Pauline Bergsten and Mia Cerfonteyn from the University of Iceland & Matís for sampling, Antonio Fernandez Guerra from the Max Planck Institute for Marine Microbiology and Arnar Pálsson from the University of Iceland for advice, and Elvar Örn Jónsson from the University of Iceland for technical support. The analyses presented in the study were performed using the resources provided by the Icelandic High Performance Computing Centre at the University of Iceland.

Funding Statement

The work is part of the Microbes in the Icelandic Marine Environment (MIME) project which was funded by the Grant of Excellence (No. 163266-051) of the Icelandic Research Fund (Rannís). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Additional Information and Declarations

Competing Interests

Clara Jégousse, Pauline Vannier, René Groben and Viggó Marteinsson are employees of Matís ohf.

Author Contributions

Clara Jégousse conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Pauline Vannier conceived and designed the experiments, performed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

René Groben, Frank Oliver Glöckner and Viggó Marteinsson conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

DNA Deposition

The following information was supplied regarding the deposition of DNA sequences:

Data are available at the ENA under project number PRJEB41565: all MAGs: ERS5621908 to ERS5622126; the surface and seafloor co-assemblies: ERS5565811 and ERS5565812.

Data Availability

The following information was supplied regarding data availability:

Code is available at Github:

https://github.com/clarajegousse/mime.

The following data are available at ENA:

- Raw data, co-assemblies and MAGs: PRJEB41565.

- Raw sequence data for the mock community: ERS5472810 to ERS5472840, and ERS5475418.

- The surface and seafloor co-assemblies: ERS5565811 and ERS5565812 respectively.

- MAGs: ERS5621908 to ERS5622126.

References

Alneberg et al. (2013).Alneberg J, Bjarnason BS, De Bruijn I, Schirmer M, Quick J, Ijaz UZ, Loman NJ, Andersson AF, Quince C. CONCOCT: clustering contigs on coverage and composition. 2013 doi: 10.1038/nmeth.3103.1312.4038 [DOI] [PubMed]
Andrews et al. (2012).Andrews S, Krueger F, Segonds-Pichon A, Biggins L, Krueger C, Wingett S. FastQC. Babraham: Babraham Institute; 2012. [Google Scholar]
Astthorsson, Gislason & Jonsson (2007).Astthorsson OS, Gislason A, Jonsson S. Climate variability and the Icelandic marine ecosystem. Deep Sea Research Part II: Topical Studies in Oceanography. 2007;54(23–26):2456–2477. doi: 10.1016/j.dsr2.2007.07.030. [DOI] [Google Scholar]
Azam (1998).Azam F. Microbial control of oceanic carbon flux: the plot thickens. Science. 1998;280(5364):694–696. doi: 10.1126/science.280.5364.694. [DOI] [Google Scholar]
Azam et al. (1983).Azam F, Fenchel T, Field JG, Gray J, Meyer-Reil L, Thingstad F. The ecological role of water-column microbes in the sea. Marine Ecology Progress Series. 1983;10:257–263. [Google Scholar]
Benoit et al. (2020).Benoit G, Mariadassou M, Robin S, Schbath S, Peterlongo P, Lemaitre C. SimkaMin: fast and resource frugal de novo comparative metagenomics. Bioinformatics. 2020;36(4):1275–1276. doi: 10.1093/bioinformatics/btz685. [DOI] [PubMed] [Google Scholar]
Bolger, Lohse & Usadel (2014).Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
Bowers et al. (2017).Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA, Tringe SG, Ivanova NN, Copeland A, Clum A, Becraft ED, Malmstrom RR, Birren B, Podar M, Bork P, Weinstock GM, Garrity GM, Dodsworth JA, Yooseph S, Sutton G, Glöckner FO, Gilbert JA, Nelson WC, Hallam SJ, Jungbluth SP, Ettema TJG, Tighe S, Konstantinidis KT, Liu W-T, Baker BJ, Rattei T, Eisen JA, Hedlund B, McMahon KD, Fierer N, Knight R, Finn R, Cochrane G, Karsch-Mizrachi I, Tyson GW, Rinke C, Schriml L, Hugenholtz P, Yilmaz P, Meyer F, Lapidus A, Parks DH, Eren AM, Banfield JF, Woyke T, The Genome Standards Consortium. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology. 2017;35(8):725–731. doi: 10.1038/nbt.3893. [DOI] [PMC free article] [PubMed] [Google Scholar]
Brooks et al. (2015).Brooks JP, Edwards DJ, Harwich MD, Rivera MC, Fettweis JM, Serrano MG, Reris RA, Sheth NU, Huang B, Girerd P, Strauss JF, Jefferson KK, Buck GA, Vaginal Microbiome Consortium (additional members). The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies. BMC Microbiology. 2015;15(1):66. doi: 10.1186/s12866-015-0351-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Chaumeil et al. (2019).Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019;36(6):1925–1927. doi: 10.1093/bioinformatics/btz848. [DOI] [PMC free article] [PubMed] [Google Scholar]
Clarke et al. (2019).Clarke EL, Taylor LJ, Zhao C, Connell A, Lee J-J, Fett B, Bushman FD, Bittinger K. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome. 2019;7(1):46. doi: 10.1186/s40168-019-0658-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cruaud et al. (2017).Cruaud P, Vigneron A, Fradette M-S, Charette SJ, Rodriguez MJ, Dorea CC, Culley AI. Open the Sterivex casing: an easy and effective way to improve DNA extraction yields. Limnology and Oceanography: Methods. 2017;15(12):1015–1020. doi: 10.1002/lom3.10221. [DOI] [Google Scholar]
Erauso et al. (1993).Erauso G, Reysenbach A-L, Godfroy A, Meunier J-R, Crump B, Partensky F, Baross JA, Marteinsson V, Barbier G, Pace NR, Prieur D. Pyrococcus abyssi sp. nov., a new hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent. Archives of Microbiology. 1993;160(5):338–349. doi: 10.1007/BF00252219. [DOI] [Google Scholar]
Eren et al. (2015).Eren AM, Esen ÖC, Quince C, Vineis JH, Morrison HG, Sogin ML, Delmont TO. Anvi’o: an advanced analysis and visualization platform for ’omics data. PeerJ. 2015;3:e1319. doi: 10.7717/peerj.1319. [DOI] [PMC free article] [PubMed] [Google Scholar]
Falkowski, Fenchel & Delong (2008).Falkowski PG, Fenchel T, Delong EF. The microbial engines that drive Earth’s biogeochemical cycles. Science. 2008;320(5879):1034–1039. doi: 10.1126/science.1153213. [DOI] [PubMed] [Google Scholar]
Gudmundsson (1998).Gudmundsson K. Long-term variation in phytoplankton productivity during spring in Icelandic waters. ICES Journal of Marine Science. 1998;55(4):635–643. doi: 10.1006/jmsc.1998.0391. [DOI] [Google Scholar]
Haroon et al. (2016).Haroon MF, Thompson LR, Parks DH, Hugenholtz P, Stingl U. A catalogue of 136 microbial draft genomes from Red Sea metagenomes. Scientific Data. 2016;3(1):1–6. doi: 10.1038/sdata.2016.50. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hugenholtz & Tyson (2008).Hugenholtz P, Tyson GW. Metagenomics. Nature. 2008;455(7212):481–483. doi: 10.1038/455481a. [DOI] [PubMed] [Google Scholar]
Hyatt et al. (2010).Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11(1):119. doi: 10.1186/1471-2105-11-119. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kang et al. (2019).Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, Wang Z. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 2019;7:e7359. doi: 10.7717/peerj.7359. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kopf et al. (2015).Kopf A, Bicak M, Kottmann R, Schnetzer J, Kostadinov I, Lehmann K, Fernandez-Guerra A, Jeanthon C, Rahav E, Ullrich M, Wichels A, Gerdts G, Polymenakou P, Kotoulas G, Siam R, Abdallah RZ, Sonnenschein EC, Cariou T, O’Gara F, Jackson S, Orlic S, Steinke M, Busch J, Duarte B, Caçador I, Canning-Clode J, Bobrova O, Marteinsson V, Reynisson E, Loureiro CM, Luna GM, Quero GM, Löscher CR, Kremp A, DeLorenzo ME, Øvreås L, Tolman J, LaRoche J, Penna A, Frischer M, Davis T, Katherine B, Meyer CP, Ramos S, Magalhães C, Jude-Lemeilleur F, Aguirre-Macedo ML, Wang S, Poulton N, Jones S, Collin R, Fuhrman JA, Conan P, Alonso C, Stambler N, Goodwin K, Yakimov MM, Baltar F, Bodrossy L, Van De Kamp J, Frampton DM, Ostrowski M, Van Ruth P, Malthouse P, Claus S, Deneudt K, Mortelmans J, Pitois S, Wallom D, Salter I, Costa R, Schroeder DC, Kandil MM, Amaral V, Biancalana F, Santana R, Pedrotti ML, Yoshida T, Ogata H, Ingleton T, Munnik K, Rodriguez-Ezpeleta N, Berteaux-Lecellier V, Wecker P, Cancio I, Vaulot D, Bienhold C, Ghazal H, Chaouni B, Essayeh S, Ettamimi S, Zaid EH, Boukhatem N, Bouali A, Chahboune R, Barrijal S, Timinouni M, El Otmani F, Bennani M, Mea M, Todorova N, Karamfilov V, ten Hoopen P, Cochrane G, L’Haridon S, Bizsel KC, Vezzi A, Lauro FM, Martin P, Jensen RM, Hinks J, Gebbels S, Rosselli R, De Pascale F, Schiavon R, dos Santos A, Villar E, Pesant S, Cataletto B, Malfatti F, Edirisinghe R, Silveira J. AH, Barbier M, Turk V, Tinta T, Fuller WJ, Salihoglu I, Serakinci N, Ergoren MC, Bresnan E, Iriberri J, Nyhus P. AF, Bente E, Karlsen HE, Golyshin PN, Gasol JM, Moncheva S, Dzhembekova N, Johnson Z, Sinigalliano CD, Gidley ML, Zingone A, Danovaro R, Tsiamis G, Clark MS, Costa AC, El Bour M, Martins AM, Collins RE, Ducluzeau A-L, Martinez J, Costello MJ, Amaral-Zettler LA, Gilbert JA, Davies N, Field D, Glöckner FO. The ocean sampling day consortium. GigaScience. 2015;4(1):27. doi: 10.1186/s13742-015-0066-5. 
Lander et al. (2001).Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng J-F, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen H-C, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JGR, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AFA, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang S-P, Yeh R-F, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Patrinos A, Morgan MJ, International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921. doi: 10.1038/35057062. [DOI] [PubMed] [Google Scholar]
Langmead & Salzberg (2012).Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li et al. (2015).Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–1676. doi: 10.1093/bioinformatics/btv033. [DOI] [PubMed] [Google Scholar]
Li et al. (2016).Li D, Luo R, Liu C-M, Leung C-M, Ting H-F, Sadakane K, Yamashita H, Lam T-W. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016;102:3–11. doi: 10.1016/j.ymeth.2016.02.020. [DOI] [PubMed] [Google Scholar]
Li et al. (2009).Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
Ludwig et al. (2004).Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, Förster W, Brettske I, Gerber S, Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, König A, Liss T, Lüßmann R, May M, Nonhoff B, Reichel B, Strehlow R, Stamatakis A, Stuckmann N, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer K. ARB: a software environment for sequence data. Nucleic Acids Research. 2004;32(4):1363–1371. doi: 10.1093/nar/gkh293. [DOI] [PMC free article] [PubMed] [Google Scholar]
Malmberg, Valdimarsson & Mortensen (1995).Malmberg S-A, Valdimarsson H, Mortensen J. Long time series in Icelandic Waters, in relation to physical variability in the northern north Atlantic. Ocean Challenge. 1995;6:48–51. [Google Scholar]
Marteinsson et al. (1999).Marteinsson VT, Birrien J-L, Reysenbach A-L, Vernet M, Marie D, Gambacorta A, Messner P, Sleytr UB, Prieur D. Thermococcus barophilus sp. nov., a new barophilic and hyperthermophilic archaeon isolated under high hydrostatic pressure from a deep-sea hydrothermal vent. International Journal of Systematic and Evolutionary Microbiology. 1999;49(2):351–359. doi: 10.1099/00207713-49-2-351. [DOI] [PubMed] [Google Scholar]
Martin (2011).Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10–12. [Google Scholar]
Menzel, Ng & Krogh (2016).Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications. 2016;7(1):11257. doi: 10.1038/ncomms11257. [DOI] [PMC free article] [PubMed] [Google Scholar]
Parks et al. (2018).Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, Hugenholtz P. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology. 2018;36(10):996–1004. doi: 10.1038/nbt.4229. [DOI] [PubMed] [Google Scholar]
Parks et al. (2015).Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research. 2015;25(7):1043–1055. doi: 10.1101/gr.186072.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
Pomeroy (1974).Pomeroy LR. The ocean’s food web, a changing paradigm. Bioscience. 1974;24(9):499–504. doi: 10.2307/1296885. [DOI] [Google Scholar]
Quince et al. (2017).Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nature Biotechnology. 2017;35(9):833–844. doi: 10.1038/nbt.3935. [DOI] [PubMed] [Google Scholar]
Rusch et al. (2007).Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers Y-H, Falcón LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, Platt T, Bermingham E, Gallardo V, Tamayo-Castillo G, Ferrari MR, Strausberg RL, Nealson K, Friedman R, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLOS Biology. 2007;5(3):e77. doi: 10.1371/journal.pbio.0050077. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sangwan, Xia & Gilbert (2016).Sangwan N, Xia F, Gilbert JA. Recovering complete and draft population genomes from metagenome datasets. Microbiome. 2016;4(1):8. doi: 10.1186/s40168-016-0154-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schneider et al. (2017).Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen H-C, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS, Kremitzki M, Magrini V, Markovic C, McGrath S, Steinberg KM, Auger K, Chow W, Collins J, Harden G, Hubbard T, Pelan S, Simpson JT, Threadgold G, Torrance J, Wood JM, Clarke L, Koren S, Boitano M, Peluso P, Li H, Chin C-S, Phillippy AM, Durbin R, Wilson RK, Flicek P, Eichler EE, Church DM. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research. 2017;27(5):849–864. doi: 10.1101/gr.213611.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sevim et al. (2019).Sevim V, Lee J, Egan R, Clum A, Hundley H, Lee J, Everroad RC, Detweiler AM, Bebout BM, Pett-Ridge J, Göker M, Murray AE, Lindemann SR, Klenk H-P, O’Malley R, Zane M, Cheng J-F, Copeland A, Daum C, Singer E, Woyke T. Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies. Scientific Data. 2019;6(1):285. doi: 10.1038/s41597-019-0287-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
Singer et al. (2016).Singer E, Andreopoulos B, Bowers RM, Lee J, Deshpande S, Chiniquy J, Ciobanu D, Klenk H-P, Zane M, Daum C, Clum A, Cheng J-F, Copeland A, Woyke T. Next generation sequencing data of a defined microbial mock community. Scientific Data. 2016;3(1):160081. doi: 10.1038/sdata.2016.81. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sunagawa et al. (2020).Sunagawa S, Acinas SG, Bork P, Bowler C, Acinas SG, Babin M, Boss E, Cochrane G, De Vargas C, Follows M, Gorsky G, Grimsley N, Guidi L, Hingamp P, Iudicone D, Jaillon O, Kandels S, Karp-Boss L, Karsenti E, Lescot M, Not F, Ogata H, Pesant S, Poulton N, Raes J, Sardet C, Sieracki M, Speich S, Stemmann L, Sullivan MB, Wincker P, Eveillard D, Lombard F, Pesant S, Sullivan MB, Tara Oceans Coordinators. Tara Oceans: towards global ocean ecosystems biology. Nature Reviews Microbiology. 2020;18(8):428–445. doi: 10.1038/s41579-020-0364-5. [DOI] [PubMed] [Google Scholar]
Sunagawa et al. (2015).Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B, Zeller G, Mende DR, Alberti A, Cornejo-Castillo FM, Costea PI, Cruaud C, d’Ovidio F, Engelen S, Ferrera I, Gasol JM, Guidi L, Hildebrand F, Kokoszka F, Lepoivre C, Lima-Mendez G, Poulain J, Poulos BT, Royo-Llonch M, Sarmento H, Vieira-Silva S, Dimier C, Picheral M, Searson S, Kandels-Lewis S, Bowler C, de Vargas C, Gorsky G, Grimsley N, Hingamp P, Iudicone D, Jaillon O, Not F, Ogata H, Pesant S, Speich S, Stemmann L, Sullivan MB, Weissenbach J, Wincker P, Karsenti E, Raes J, Acinas SG, Bork P. Structure and function of the global ocean microbiome. Science. 2015;348(6237):1261359. doi: 10.1126/science.1261359. [DOI] [PubMed] [Google Scholar]
Ten Hoopen et al. (2017).Ten Hoopen P, Finn RD, Bongo LA, Corre E, Fosso B, Meyer F, Mitchell A, Pelletier E, Pesole G, Santamaria M, Willassen NP, Cochrane G. The metagenomic data life-cycle: standards and best practices. Gigascience. 2017;6(8):1–11. doi: 10.1093/gigascience/gix047. [DOI] [PMC free article] [PubMed] [Google Scholar]
Thórdardóttir (1986).Thórdardóttir T. The role of freshwater outflow in coastal marine ecosystems. Springer-Verlag; Berlin Heidelberg: 1986. Timing and duration of spring blooming south and southwest of Iceland; pp. 345–360. [DOI] [Google Scholar]
Tully, Graham & Heidelberg (2018).Tully BJ, Graham ED, Heidelberg JF. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Scientific Data. 2018;5:170203. doi: 10.1038/sdata.2017.203. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tully et al. (2017).Tully BJ, Sachdeva R, Graham ED, Heidelberg JF. 290 metagenome-assembled genomes from the Mediterranean Sea: a resource for marine microbiology. PeerJ. 2017;5:e3558. doi: 10.7717/peerj.3558. [DOI] [PMC free article] [PubMed] [Google Scholar]
Valdimarsson & Malmberg (1999).Valdimarsson H, Malmberg S-A. Near-surface circulation in Icelandic waters derived from satellite tracked drifters. Rit Fiskideild. 1999;16:23–40. [Google Scholar]
Venter et al. (2004).Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers Y-H, Smith HO. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304(5667):66–74. doi: 10.1126/science.1093857. [DOI] [PubMed] [Google Scholar]
Wu, Simmons & Singer (2016).Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32(4):605–607. doi: 10.1093/bioinformatics/btv638. [DOI] [PubMed] [Google Scholar]
Yue et al. (2020).Yue Y, Huang H, Qi Z, Dou H-M, Liu X-Y, Han T-F, Chen Y, Song X-J, Zhang Y-H, Tu J. Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets. BMC Bioinformatics. 2020;21(1):1–15. doi: 10.1186/s12859-019-3325-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

Supplemental Information 1. Number of raw reads of 32 metagenomic datasets.

Metagenomic datasets from 32 samples (31 seawater samples and one mock community), with the number of reads shown by sampling location and time.


DOI: 10.7717/peerj.11112/supp-1

Supplemental Information 2. Principal Coordinates Analysis (PCoA) based on Bray–Curtis dissimilarity computed by SimkaMin (Benoit et al., 2020).

(A) Experimental variable. (B) Environmental and geographic variables.


DOI: 10.7717/peerj.11112/supp-2
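For readers unfamiliar with the Bray–Curtis dissimilarity underlying Supplemental Information 2, the sketch below computes it on k-mer counts from two toy sequences. It is an illustration only and does not reproduce SimkaMin's own algorithm; the function names, the sequences and the k-mer length are assumptions made for the example.

```python
from collections import Counter

def kmer_counts(sequence: str, k: int) -> Counter:
    """Count all overlapping k-mers in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def bray_curtis(a: Counter, b: Counter) -> float:
    """Bray-Curtis dissimilarity: 1 - 2 * sum(min counts) / (total_a + total_b)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    # Counter returns 0 for missing keys, so absent k-mers contribute nothing.
    shared = sum(min(a[key], b[key]) for key in set(a) | set(b))
    return 1.0 - 2.0 * shared / total

# Toy "samples" (hypothetical); real analyses use k-mers from millions of reads.
sample_a = kmer_counts("ACGTACGT", 2)
sample_b = kmer_counts("ACGTTTTT", 2)

dissimilarity = bray_curtis(sample_a, sample_b)
print(round(dissimilarity, 4))
```

A value of 0 means the two samples have identical k-mer profiles and a value of 1 means they share no k-mers. A PCoA then embeds the full matrix of pairwise dissimilarities in a few dimensions for plotting.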

[ref-1] Alneberg et al. (2013).Alneberg J, Bjarnason BS, De Bruijn I, Schirmer M, Quick J, Ijaz UZ, Loman NJ, Andersson AF, Quince C. CONCOCT: clustering contigs on coverage and composition. 2013 doi: 10.1038/nmeth.3103.1312.4038 [DOI] [PubMed]

[ref-2] Andrews et al. (2012).Andrews S, Krueger F, Segonds-Pichon A, Biggins L, Krueger C, Wingett S. FastQC. Babraham: Babraham Institute; 2012. [Google Scholar]

[ref-3] Astthorsson, Gislason & Jonsson (2007).Astthorsson OS, Gislason A, Jonsson S. Climate variability and the Icelandic marine ecosystem. Deep Sea Research Part II: Topical Studies in Oceanography. 2007;54(23–26):2456–2477. doi: 10.1016/j.dsr2.2007.07.030. [DOI] [Google Scholar]

[ref-4] Azam (1998).Azam F. Microbial control of oceanic carbon flux: the plot thickens. Science. 1998;280(5364):694–696. doi: 10.1126/science.280.5364.694. [DOI] [Google Scholar]

[ref-5] Azam et al. (1983).Azam F, Fenchel T, Field JG, Gray J, Meyer-Reil L, Thingstad F. The ecological role of water-column microbes in the sea. Marine Ecology Progress Series. 1983;10:257–263. [Google Scholar]

[ref-6] Benoit et al. (2020).Benoit G, Mariadassou M, Robin S, Schbath S, Peterlongo P, Lemaitre C. SimkaMin: fast and resource frugal de novo comparative metagenomics. Bioinformatics. 2020;36(4):1275–1276. doi: 10.1093/bioinformatics/btz685. [DOI] [PubMed] [Google Scholar]

[ref-7] Bolger, Lohse & Usadel (2014).Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-8] Bowers et al. (2017).Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy T. BK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA, Tringe SG, Ivanova NN, Copeland A, Clum A, Becraft ED, Malmstrom RR, Birren B, Podar M, Bork P, Weinstock GM, Garrity GM, Dodsworth JA, Yooseph S, Sutton G, Glöckner FO, Gilbert JA, Nelson WC, Hallam SJ, Jungbluth SP, Ettema TJG, Tighe S, Konstantinidis KT, Liu W-T, Baker BJ, Rattei T, Eisen JA, Hedlund B, McMahon KD, Fierer N, Knight R, Finn R, Cochrane G, Karsch-Mizrachi I, Tyson GW, Rinke C, Schriml L, Hugenholtz P, Yilmaz P, Meyer F, Lapidus A, Parks DH, Murat Eren A, Banfield JF, Woyke T, TGS Consortium Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology. 2017;35(8):725–731. doi: 10.1038/nbt.3893. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-9] Brooks et al. (2015).Brooks JP, Edwards DJ, Harwich MD, Rivera MC, Fettweis JM, Serrano MG, Reris RA, Sheth NU, Huang B, Girerd P, Strauss JF, Jefferson KK, Buck GA, (additional members), VMC The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies. BMC Microbiology. 2015;15(1):66. doi: 10.1186/s12866-015-0351-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-10] Chaumeil et al. (2019).Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019;36(6):1925–1927. doi: 10.1093/bioinformatics/btz848. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-11] Clarke et al. (2019).Clarke EL, Taylor LJ, Zhao C, Connell A, Lee J-J, Fett B, Bushman FD, Bittinger K. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome. 2019;7(1):46. doi: 10.1186/s40168-019-0658-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-12] Cruaud et al. (2017).Cruaud P, Vigneron A, Fradette M-S, Charette SJ, Rodriguez MJ, Dorea CC, Culley AI. Open the Sterivex casing: an easy and effective way to improve DNA extraction yields. Limnology and Oceanography: Methods. 2017;15(12):1015–1020. doi: 10.1002/lom3.10221. [DOI] [Google Scholar]

[ref-13] Erauso et al. (1993).Erauso G, Reysenbach A-L, Godfroy A, Meunier J-R, Crump B, Partensky F, Baross JA, Marteinsson V, Barbier G, Pace NR, Prieur D. Pyrococcus abyssi sp. nov., a new hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent. Archives of Microbiology. 1993;160(5):338–349. doi: 10.1007/BF00252219. [DOI] [Google Scholar]

[ref-14] Eren et al. (2015).Eren AM, Esen ÖC, Quince C, Vineis JH, Morrison HG, Sogin ML, Delmont TO. Anvio: an advanced analysis and visualization platform for omics data. PeerJ. 2015;3:e1319. doi: 10.7717/peerj.1319. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-15] Falkowski, Fenchel & Delong (2008).Falkowski PG, Fenchel T, Delong EF. The microbial engines that drive Earth’s biogeochemical cycles. Science. 2008;320(5879):1034–1039. doi: 10.1126/science.1153213. [DOI] [PubMed] [Google Scholar]

[ref-16] Gudmundsson (1998).Gudmundsson K. Long-term variation in phytoplankton productivity during spring in Icelandic waters. ICES Journal of Marine Science. 1998;55(4):635–643. doi: 10.1006/jmsc.1998.0391. [DOI] [Google Scholar]

[ref-17] Haroon et al. (2016).Haroon MF, Thompson LR, Parks DH, Hugenholtz P, Stingl U. A catalogue of 136 microbial draft genomes from Red Sea metagenomes. Scientific Data. 2016;3(1):1–6. doi: 10.1038/sdata.2016.50. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-18] Hugenholtz & Tyson (2008).Hugenholtz P, Tyson GW. Metagenomics. Nature. 2008;455(7212):481–483. doi: 10.1038/455481a. [DOI] [PubMed] [Google Scholar]

[ref-19] Hyatt et al. (2010).Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11(1):119. doi: 10.1186/1471-2105-11-119. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-20] Kang et al. (2019).Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, Wang Z. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 2019;7:e7359. doi: 10.7717/peerj.7359. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-23] Langmead & Salzberg (2012).Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-24] Li et al. (2015).Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–1676. doi: 10.1093/bioinformatics/btv033. [DOI] [PubMed] [Google Scholar]

[ref-25] Li et al. (2016).Li D, Luo R, Liu C-M, Leung C-M, Ting H-F, Sadakane K, Yamashita H, Lam T-W. MEGAHIT v1. 0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016;102:3–11. doi: 10.1016/j.ymeth.2016.02.020. [DOI] [PubMed] [Google Scholar]

[ref-26] Li et al. (2009).Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-27] Ludwig et al. (2004).Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, Frster W, Brettske I, Gerber S, Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, Knig A, Liss T, Lmann R, May M, Nonhoff B, Reichel B, Strehlow R, Stamatakis A, Stuckmann N, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer K. ARB: a software environment for sequence data. Nucleic Acids Research. 2004;32(4):1363–1371. doi: 10.1093/nar/gkh293. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-28] Malmberg, Valdimarsson & Mortensen (1995).Malmberg S-A, Valdimarsson H, Mortensen J. Long time series in Icelandic Waters, in relation to physical variability in the northern north Atlantic. Ocean Challenge. 1995;6:48–51. [Google Scholar]

[ref-29] Marteinsson et al. (1999).Marteinsson VT, Birrien J-L, Reysenbach A-L, Vernet M, Marie D, Gambacorta A, Messner P, Sleytr UB, Prieur D. Thermococcus barophilus sp. nov., a new barophilic and hyperthermophilic archaeon isolated under high hydrostatic pressure from a deep-sea hydrothermal vent. International Journal of Systematic and Evolutionary Microbiology. 1999;49(2):351–359. doi: 10.1099/00207713-49-2-351. [DOI] [PubMed] [Google Scholar]

[ref-30] Martin (2011).Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. Journal. 2011;17(1):10–12. [Google Scholar]

[ref-31] Menzel, Ng & Krogh (2016).Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications. 2016;7(1):11257. doi: 10.1038/ncomms11257. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-32] Parks et al. (2018).Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, Hugenholtz P. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology. 2018;36(10):996–1004. doi: 10.1038/nbt.4229. [DOI] [PubMed] [Google Scholar]

[ref-33] Parks et al. (2015).Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research. 2015;25(7):1043–1055. doi: 10.1101/gr.186072.114. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-34] Pomeroy (1974).Pomeroy LR. The ocean’s food web, a changing paradigm. Bioscience. 1974;24(9):499–504. doi: 10.2307/1296885. [DOI] [Google Scholar]

[ref-35] Quince et al. (2017).Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nature Biotechnology. 2017;35(9):833–844. doi: 10.1038/nbt.3935. [DOI] [PubMed] [Google Scholar]

[ref-36] Rusch et al. (2007).Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers Y-H, Falcn LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, Platt T, Bermingham E, Gallardo V, Tamayo-Castillo G, Ferrari MR, Strausberg RL, Nealson K, Friedman R, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLOS Biology. 2007;5(3):e77. doi: 10.1371/journal.pbio.0050077. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-37] Sangwan, Xia & Gilbert (2016).Sangwan N, Xia F, Gilbert JA. Recovering complete and draft population genomes from metagenome datasets. Microbiome. 2016;4(1):8. doi: 10.1186/s40168-016-0154-5. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-38] Schneider et al. (2017).Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen H-C, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS, Kremitzki M, Magrini V, Markovic C, McGrath S, Steinberg KM, Auger K, Chow W, Collins J, Harden G, Hubbard T, Pelan S, Simpson JT, Threadgold G, Torrance J, Wood JM, Clarke L, Koren S, Boitano M, Peluso P, Li H, Chin C-S, Phillippy AM, Durbin R, Wilson RK, Flicek P, Eichler EE, Church DM. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research. 2017;27(5):849–864. doi: 10.1101/gr.213611.116. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref-39] Sevim et al. (2019). Sevim V, Lee J, Egan R, Clum A, Hundley H, Lee J, Everroad RC, Detweiler AM, Bebout BM, Pett-Ridge J, Göker M, Murray AE, Lindemann SR, Klenk H-P, O’Malley R, Zane M, Cheng J-F, Copeland A, Daum C, Singer E, Woyke T. Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies. Scientific Data. 2019;6(1):285. doi: 10.1038/s41597-019-0287-z.

[ref-40] Singer et al. (2016). Singer E, Andreopoulos B, Bowers RM, Lee J, Deshpande S, Chiniquy J, Ciobanu D, Klenk H-P, Zane M, Daum C, Clum A, Cheng J-F, Copeland A, Woyke T. Next generation sequencing data of a defined microbial mock community. Scientific Data. 2016;3(1):160081. doi: 10.1038/sdata.2016.81.

[ref-41] Sunagawa et al. (2020). Sunagawa S, Acinas SG, Bork P, Bowler C, Acinas SG, Babin M, Boss E, Cochrane G, De Vargas C, Follows M, Gorsky G, Grimsley N, Guidi L, Hingamp P, Iudicone D, Jaillon O, Kandels S, Karp-Boss L, Karsenti E, Lescot M, Not F, Ogata H, Pesant S, Poulton N, Raes J, Sardet C, Sieracki M, Speich S, Stemmann L, Sullivan MB, Wincker P, Eveillard D, Lombard F, Pesant S, Sullivan MB, Tara Oceans Coordinators. Tara Oceans: towards global ocean ecosystems biology. Nature Reviews Microbiology. 2020;18(8):428–445. doi: 10.1038/s41579-020-0364-5.

[ref-42] Sunagawa et al. (2015). Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B, Zeller G, Mende DR, Alberti A, Cornejo-Castillo FM, Costea PI, Cruaud C, d’Ovidio F, Engelen S, Ferrera I, Gasol JM, Guidi L, Hildebrand F, Kokoszka F, Lepoivre C, Lima-Mendez G, Poulain J, Poulos BT, Royo-Llonch M, Sarmento H, Vieira-Silva S, Dimier C, Picheral M, Searson S, Kandels-Lewis S, Bowler C, de Vargas C, Gorsky G, Grimsley N, Hingamp P, Iudicone D, Jaillon O, Not F, Ogata H, Pesant S, Speich S, Stemmann L, Sullivan MB, Weissenbach J, Wincker P, Karsenti E, Raes J, Acinas SG, Bork P. Structure and function of the global ocean microbiome. Science. 2015;348(6237):1261359. doi: 10.1126/science.1261359.

[ref-43] Ten Hoopen et al. (2017). Ten Hoopen P, Finn RD, Bongo LA, Corre E, Fosso B, Meyer F, Mitchell A, Pelletier E, Pesole G, Santamaria M, Willassen NP, Cochrane G. The metagenomic data life-cycle: standards and best practices. Gigascience. 2017;6(8):1–11. doi: 10.1093/gigascience/gix047.

[ref-44] Thórdardóttir (1986). Thórdardóttir T. The role of freshwater outflow in coastal marine ecosystems. Springer-Verlag; Berlin Heidelberg: 1986. Timing and duration of spring blooming south and southwest of Iceland; pp. 345–360.

[ref-45] Tully, Graham & Heidelberg (2018). Tully BJ, Graham ED, Heidelberg JF. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Scientific Data. 2018;5:170203. doi: 10.1038/sdata.2017.203.

[ref-46] Tully et al. (2017). Tully BJ, Sachdeva R, Graham ED, Heidelberg JF. 290 metagenome-assembled genomes from the Mediterranean Sea: a resource for marine microbiology. PeerJ. 2017;5:e3558. doi: 10.7717/peerj.3558.

[ref-47] Valdimarsson & Malmberg (1999). Valdimarsson H, Malmberg S-A. Near-surface circulation in Icelandic waters derived from satellite tracked drifters. Rit Fiskideild. 1999;16:23–40.

[ref-48] Venter et al. (2004). Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers Y-H, Smith HO. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304(5667):66–74. doi: 10.1126/science.1093857.

[ref-49] Wu, Simmons & Singer (2016). Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32(4):605–607. doi: 10.1093/bioinformatics/btv638.

[ref-50] Yue et al. (2020). Yue Y, Huang H, Qi Z, Dou H-M, Liu X-Y, Han T-F, Chen Y, Song X-J, Zhang Y-H, Tu J. Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets. BMC Bioinformatics. 2020;21(1):1–15. doi: 10.1186/s12859-019-3325-0.
