Abstract
Marine microorganisms contribute to the health of the global ocean by supporting the marine food web and regulating biogeochemical cycles. Assessing marine microbial diversity is a crucial step towards understanding the global ocean. The waters surrounding Iceland are a complex environment where relatively warm, salty waters from the Atlantic cool and sink to the deep. Microbial studies in this area have focused on photosynthetic micro- and nanoplankton, mainly using microscopy and chlorophyll measurements. However, the diversity and function of the bacterial and archaeal picoplankton remain unknown. Here, we used a co-assembly approach supported by a marine mock community to reconstruct metagenome-assembled genomes (MAGs) from 31 metagenomes from the sea surface and seafloor of four oceanographic sampling stations sampled between 2015 and 2018. The resulting 219 MAGs, comprising 191 bacterial, 26 archaeal and two eukaryotic MAGs, help bridge the gap in our current knowledge of the global marine microbiome.
Keywords: Metagenomics, Metagenome-assembled genomes, Iceland, Bacteria, Archaea
Introduction
Marine microorganisms are crucial to the global ecosystem as they regulate the carbon cycle (Azam, 1998; Falkowski, Fenchel & Delong, 2008) and support the marine food web (Pomeroy, 1974; Azam et al., 1983). The study of microorganisms within complex environments, such as the ocean, was accelerated by the emergence of sequencing technologies. In particular, metagenomics (the study of the total genetic material recovered from an environmental sample) has provided previously unavailable information on the functional diversity and ecology of microbial communities within their environments (Hugenholtz & Tyson, 2008; Quince et al., 2017).
Large-scale metagenomics projects, such as the Global Ocean Sampling (Venter et al., 2004; Rusch et al., 2007), Ocean Sampling Day (Kopf et al., 2015) and Tara Oceans (Sunagawa et al., 2015; Sunagawa et al., 2020), have provided fascinating new insights, but have also revealed gaps in our knowledge of marine microbial species, their geographical distribution, and their organisation in complex and dynamic communities. These and other large-scale initiatives have so far not covered the oceanic regions around Iceland, a complex marine environment characterised by distinct water masses and powerful currents: the cold Polar Water of the East Greenland Current and the Arctic Water of the East Icelandic Current from the north, and the warm North Atlantic Water of the Irminger Current from the south (Malmberg, Valdimarsson & Mortensen, 1995; Valdimarsson & Malmberg, 1999). Most microbial studies in Icelandic waters have so far been conducted with traditional methods, such as chlorophyll measurements or microscopy, and have therefore mainly focused on larger heterotrophs and photosynthetic microorganisms (Thórdardóttir, 1986; Gudmundsson, 1998; Astthorsson, Gislason & Jonsson, 2007). To establish baseline knowledge of microbial ecology in Icelandic marine waters, we assembled metagenomic sequence data into draft microbial genomes, known as metagenome-assembled genomes (MAGs).
The recovery of MAGs opens the way to further analyses, such as comparative genomics, to understand the roles of these microorganisms within their community and ecosystem (Sangwan, Xia & Gilbert, 2016). MAGs are particularly valuable for as-yet uncultured marine lineages, as they reveal the metabolic potential and environmental adaptations of these microorganisms and give clues about trophic interactions and ecology within the environment. Several metagenomic studies have recovered MAGs from marine environments, including 136 MAGs from the Red Sea (Haroon et al., 2016), 290 from the Mediterranean Sea (Tully et al., 2017), and 2,631 from the global ocean using data collected by Tara Oceans (Tully, Graham & Heidelberg, 2018).
Here, we report 219 MAGs from 31 samples collected in the Arctic Ocean north of Iceland and in the warmer Atlantic waters south of Iceland. The samples were collected between 2015 and 2018 at four established oceanographic sampling stations visited during six research cruises with two depths sampled at each station. A set of metadata is available for these samples following the best practices recommended by Ten Hoopen et al. (2017), offering an opportunity to further understand the environmental conditions that shape the microbial communities in the waters off the Icelandic coasts.
Materials & Methods
Sampling
Seawater samples were collected between May 2015 and May 2018 from four stations, two in the North Atlantic Ocean, Selvogsbanki 2 and 5 (SB2 and SB5), and two in the Arctic Ocean, Siglunes 3 and 8 (SI3 and SI8) (Fig. 1A and Table 1). Sampling was conducted on board the oceanographic research vessel Bjarni Sæmundsson RE 30, operated by the Icelandic Marine Research Institute (MRI), by collecting 5 L of seawater from the sea surface and the seafloor using Niskin bottles mounted on a CTD rosette sampler. Seawater samples were filtered directly onto 0.22 µm Sterivex filter units (Merck Millipore), immediately flash-frozen in liquid nitrogen and stored at −80°C until further processing (full workflow in Fig. 1B).
Table 1. Sampling dates and locations with corresponding seawater temperature and salinity.
Sampling date (dd.mm.yyyy) | Station ID | Latitude (decimal degrees) | Longitude (decimal degrees) | Depth (m) | Temperature (°C) | Salinity (PSU) |
---|---|---|---|---|---|---|
23.05.2015 | SI8 | 67.9993 | −18.8313 | 1,045 | −0.481 | 34.913 |
30.05.2015 | SB5 | 62.9822 | −21.4737 | 0 | 7.632 | 35.195 |
30.05.2015 | SB5 | 62.9822 | −21.4737 | 1,004 | 4.391 | 34.998 |
23.05.2016 | SI8 | 68.0100 | −18.8247 | 0 | 1.632 | 34.869 |
23.05.2016 | SI8 | 68.0100 | −18.8247 | 1,045 | −0.431 | 34.914 |
31.05.2016 | SB5 | 62.9936 | −21.4839 | 0 | 8.147 | 35.113 |
31.05.2016 | SB5 | 62.9936 | −21.4839 | 1,004 | 4.722 | 35.017 |
21.05.2017 | SI8 | 68.0094 | −18.8325 | 1,045 | 2.700 | 34.852 |
21.05.2017 | SI8 | 68.0094 | −18.8325 | 0 | −0.381 | 34.914 |
22.05.2017 | SI3 | 66.5342 | −18.8378 | 470 | 5.517 | 34.492 |
22.05.2017 | SI3 | 66.5342 | −18.8378 | 0 | 0.151 | 34.906 |
30.05.2017 | SB5 | 62.9878 | −21.4800 | 1,004 | 8.477 | 34.761 |
30.05.2017 | SB5 | 62.9878 | −21.4800 | 0 | 4.801 | 35.009 |
09.08.2017 | SI3 | 66.5344 | −18.8419 | 0 | 9.980 | 34.310 |
09.08.2017 | SI3 | 66.5344 | −18.8419 | 470 | 0.190 | 34.900 |
09.08.2017 | SI8 | 68.0006 | −18.8375 | 1,045 | 7.640 | 34.650 |
09.08.2017 | SI8 | 68.0006 | −18.8375 | 0 | −0.370 | 34.910 |
18.08.2017 | SB2 | 63.4933 | −20.9569 | 0 | 12.000 | 33.700 |
18.08.2017 | SB2 | 63.4933 | −20.9569 | 90 | 8.470 | 34.940 |
18.08.2017 | SB5 | 62.9883 | −21.4867 | 0 | 12.200 | 34.980 |
18.08.2017 | SB5 | 62.9883 | −21.4867 | 1,004 | 4.730 | 35.010 |
16.02.2018 | SI3 | 66.5442 | −18.8400 | 470 | 0.044 | 34.901 |
16.02.2018 | SI8 | 68.0000 | −18.8386 | 0 | 0.533 | 34.640 |
16.02.2018 | SI8 | 68.0000 | −18.8386 | 1,045 | −0.410 | 34.914 |
18.05.2018 | SI8 | 68.0058 | −18.8256 | 0 | 1.355 | 34.727 |
18.05.2018 | SI8 | 68.0058 | −18.8256 | 1,045 | −0.428 | 34.914 |
20.05.2018 | SI3 | 66.5439 | −18.8406 | 0 | 5.108 | 34.894 |
29.05.2018 | SB2 | 63.4942 | −20.9008 | 0 | 7.625 | 34.913 |
29.05.2018 | SB2 | 63.4942 | −20.9008 | 90 | 7.298 | 35.031 |
29.05.2018 | SB5 | 62.9858 | −21.4731 | 0 | 7.740 | 35.042 |
29.05.2018 | SB5 | 62.9858 | −21.4731 | 1,004 | 4.488 | 34.978 |
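Table 1 gives positions in decimal degrees (for example, 66.5342 for SI3 corresponds to 66°32.05′N). Station positions are often quoted in degrees and decimal minutes instead, so the small hypothetical helper below, a sketch not taken from the authors' code, converts between the two; the caller supplies the hemisphere letters.

```python
def to_degrees_minutes(value: float, positive: str, negative: str) -> str:
    """Format a decimal-degree coordinate as degrees and decimal minutes.

    `positive`/`negative` are the hemisphere letters, e.g. "N"/"S" or "E"/"W".
    """
    hemisphere = positive if value >= 0 else negative
    # Round the total minutes first so that e.g. 59.9999' cannot print as 60.000'.
    total_minutes = round(abs(value) * 60, 3)
    degrees, minutes = divmod(total_minutes, 60)
    return f"{int(degrees)}°{minutes:06.3f}'{hemisphere}"


# SI3 position from Table 1 (latitude north, negative longitude = west).
print(to_degrees_minutes(66.5342, "N", "S"), to_degrees_minutes(-18.8378, "E", "W"))
```

Rounding the total number of minutes before splitting off whole degrees avoids the classic formatting bug in which a value just below a whole degree prints as "60.000" minutes.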
Mock community
A marine mock community consisting of 21 bacterial and two archaeal species (Table 2) was included in the analysis for quality control. Strains were cultivated under the conditions listed in Table 2. After 12 to 24 h of growth (reaching 10⁶ to 10⁸ cells/mL), cells were counted in a Thoma counting chamber (BRAND, ref. 718020; 0.100 mm depth) and diluted to a final concentration of 1.29 × 10⁹ cells/L. Synthetic seawater was prepared by adding 150 g of sea salts (Sigma-Aldrich, S9883) and 17.25 g of PIPES (Sigma-Aldrich, P1851) to 5 L of autoclaved Milli-Q water. The mock community was then immediately processed in the same way as the seawater samples and filtered onto Sterivex filters for DNA extraction.
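The counting and dilution arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the protocol used in the study: it assumes the standard Thoma layout (small squares of 0.05 mm × 0.05 mm) with the 0.100 mm depth given above, so that one small square holds 2.5 × 10⁻⁷ mL, and the count of 25 cells per square, the absence of a dilution and the reading of the 1.29 × 10⁹ cells/L target as a per-strain value are all hypothetical.

```python
# Thoma chamber geometry (assumed standard layout): small squares of
# 0.05 mm x 0.05 mm; chamber depth 0.100 mm as stated in the text.
SMALL_SQUARE_SIDE_MM = 0.05
CHAMBER_DEPTH_MM = 0.100
# Volume above one small square, converted from mm^3 to mL (1 mm^3 = 0.001 mL).
SMALL_SQUARE_VOLUME_ML = SMALL_SQUARE_SIDE_MM ** 2 * CHAMBER_DEPTH_MM / 1000


def cells_per_ml(mean_count_per_square: float, dilution_factor: float = 1.0) -> float:
    """Cell density of the undiluted culture from the mean count per small square."""
    return mean_count_per_square / SMALL_SQUARE_VOLUME_ML * dilution_factor


def culture_volume_ml(stock_cells_per_ml: float, target_cells_per_l: float,
                      final_volume_l: float) -> float:
    """C1 * V1 = C2 * V2: mL of culture giving `target_cells_per_l` in `final_volume_l` litres."""
    cells_needed = target_cells_per_l * final_volume_l
    return cells_needed / stock_cells_per_ml


# Hypothetical example: 25 cells per small square in an undiluted culture,
# diluted into 5 L of synthetic seawater to 1.29e9 cells/L (treated as per strain).
stock = cells_per_ml(25)
volume = culture_volume_ml(stock, 1.29e9, 5.0)
print(f"{stock:.2e} cells/mL -> add {volume:.1f} mL of culture")
```

With these made-up numbers the culture density falls inside the 10⁶ to 10⁸ cells/mL range quoted above; if the target concentration instead refers to the whole community, the per-strain target would simply be divided by the number of strains.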
Table 2. List of bacterial and archaeal species in the mock community.
Domain | Species name | % identity | Collection number | Growth parameters | Successfully reassembled |
---|---|---|---|---|---|
Bacteria | Alteromonas naphthalenivorans | 99.66% | ISCAR-05201 | Marine Broth, 22°C, pH 6.8, aerobic condition | Yes |
Bacteria | Jeotgalibacillus marinus | 100% | ISCAR-03118 | Marine Broth, 22°C, pH 6.8, aerobic condition | No |
Bacteria | Geobacillus thermoleovorans | 100% | ISCAR-00004 | 162 media, 65°C, pH 7.0, aerobic condition | No |
Bacteria | Colwellia psychrerythraea | 99% | ISCAR-05175 | Marine Broth, 22°C, pH 6.8, aerobic condition | Yes |
Bacteria | Dietzia psychralcaliphila | 99.52% | ISCAR-05191 | 92 media, 22°C, pH 6.8, aerobic condition | No |
Bacteria | Escherichia coli | 100% | ISCAR-02961 | LB media, 37°C, pH 7.0, aerobic condition | Yes |
Bacteria | Pseudomonas salina | 99.83% | ISCAR-05249 | Marine Broth media, 22°C, pH 6.8, aerobic condition | No |
Bacteria | Marinobacter psychrophilus | 99.84% | ISCAR-05186 | Marine Broth media, 22°C, pH 6.8, aerobic condition | Yes |
Bacteria | Photobacterium indicum | 100% | ISCAR-05002 | Marine Broth media, 22°C, pH 6.8, aerobic condition | Yes |
Bacteria | Pseudoalteromonas neustonica | 98.58% | ISCAR-05312 | 172 media, 22°C, pH 6.8, aerobic condition | Yes |
Bacteria | Reinekea aestuarii | 100% | DSM 29881 | Marine Broth media, 22°C, pH 6.8, aerobic condition | No |
Bacteria | Reinekea marinisedimentorum | 100% | DSM 15388 | Marine Broth media, 30°C, pH 6.8, aerobic condition | Yes |
Bacteria | Rhodococcus kyotonensis | 99.23% | ISCAR-05221 | Marine Broth media, 22°C, pH 6.8, aerobic condition | No |
Bacteria | Reinekea sp. 84 | 97.75% with Reinekea marina | ISCAR-05258 | Marine Broth media, 22°C, pH 6.8, aerobic condition | No |
Bacteria | Sulfitobacter sp. 87 | 97.73% with Sulfitobacter donghicola | ISCAR-05261 | Marine Broth media, 22°C, pH 6.8, aerobic condition | No |
Bacteria | Sulfitobacter donghicola | 100% | DSM 23563 | Marine Broth media, 22°C, pH 6.8, aerobic condition | Yes |
Bacteria | Sulfitobacter guttiformis | 100% | DSM 11544 | Marine Broth media, 22°C, pH 6.8, aerobic condition | Yes |
Bacteria | Sulfitobacter pontiacus | 100% | DSM 10014 | Marine Broth media, 22°C, pH 6.8, aerobic condition | Yes |
Bacteria | Sulfitobacter undariae | 100% | DSM 102234 | Marine Broth media, 22°C, pH 6.8, aerobic condition | No |
Bacteria | Thermus thermophilus | 100% | ISCAR-03915 | 166 media, 65°C, pH 7.0, aerobic condition | No |
Bacteria | Vibrio cyclitrophicus | 100% | ISCAR-06209 | Marine Broth media, 22°C, pH 6.8, aerobic condition | No |
Archaea | Pyrococcus abyssi | 100% | DSM 25543 | YPS medium¹, 90°C, pH 7.0, anaerobic condition, elemental sulfur | Yes |
Archaea | Thermococcus barophilus | 100% | DSM 11836 | TRM medium², 85°C, pH 6.5, anaerobic condition, elemental sulfur | Yes |
Notes.
Growth media recipes: ¹Erauso et al. (1993); ²Marteinsson et al. (1999).
DNA extractions
DNA was extracted from all samples using the QIAGEN AllPrep kit according to the manufacturer’s instructions, with modifications. Sterivex filters were aseptically removed from their plastic casing as described by Cruaud et al. (2017). Filters were transferred to tubes containing 600 µl of RTL buffer from the kit and 0.2 g of 0.1 mm zirconia/silica beads (BioSpec, cat. 11079101z) for mechanical disruption of the cells (bead-beating) in a Retsch MixerMill MM400 using program P9 (300 Hz), three times for 10 s each, with the tubes cooled in ice water between bead-beating steps. DNA quality was assessed with a NanoDrop 1000 spectrophotometer (ThermoFisher) and DNA was quantified with a Qubit fluorometer (Qubit DNA BR assay, Invitrogen).
Library preparation and sequencing
High-throughput sequencing of the samples was performed by Genome Quebec. Libraries were prepared using the NEBNext Ultra™ II DNA Library Prep Kit for Illumina (New England Biolabs) and sequenced on two lanes of an Illumina HiSeq 4000 (PE150), with 1/20 or 1/25 of a lane allocated to each sample. Demultiplexing and conversion to FASTQ files were performed using bcl2fastq Conversion Software v1.8.4 (Illumina), resulting in 32 metagenomic datasets (31 seawater samples and the mock community).
Co-assembly and binning
The quality of the raw sequencing reads was assessed using FastQC v0.11.8 (Andrews et al., 2012) (Fig. S1). Quality control of the raw reads was performed with Sunbeam v2.0.2 (Clarke et al., 2019), which includes trimming with Trimmomatic v0.36 (Bolger, Lohse & Usadel, 2014) (parameters: PE -phred33 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36), adapter removal with Cutadapt v2.6 (Martin, 2011), removal of low-complexity sequences using Sunbeam Komplexity (default parameters), and removal of contaminating human sequences using the Genome Reference Consortium Human Build 38 patch release 13 (GRCh38.p13) (Lander et al., 2001; Schneider et al., 2017). The quality-filtered metagenomic data were divided into a surface and a seafloor dataset, as the sea surface can be considered a different environment from the seafloor (Fig. S2). The mock community was included in both datasets. Each dataset was then co-assembled separately with MEGAHIT v1.2.9 (Li et al., 2015; Li et al., 2016) (parameters: --min-contig-len 1000 -m 0.85), retaining contigs of at least 1,000 bp and resulting in two FASTA files of community contigs. Quality-filtered short reads from each sample were mapped back to the contigs of the corresponding co-assembly using Bowtie2 (default parameters with the --no-unal flag) (Langmead & Salzberg, 2012). The resulting SAM files were converted to BAM files and indexed with SAMtools v0.3.3 (parameters: view -F 4 -bS) (Li et al., 2009). For both co-assemblies, the FASTA files containing the contigs were formatted with the script anvi-script-reformat-fasta from Anvi’o v6.2 (Eren et al., 2015). The two contigs databases (surface and seafloor) were then generated with Anvi’o, and the BAM files were profiled and merged into the respective database.
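The assembly and mapping steps above can be laid out as a sequence of command lines. The sketch below is a hypothetical reconstruction, not the authors' script (their code is on GitHub): it only builds the commands for one co-assembly and does not run them. The file names, the thread count, the extra `samtools sort` step (anvi'o expects sorted, indexed BAM files) and the `-l 1000 --simplify-names` options of `anvi-script-reformat-fasta` are assumptions.

```python
from pathlib import Path


def coassembly_commands(samples: dict[str, tuple[str, str]], outdir: str,
                        threads: int = 8) -> list[list[str]]:
    """Build, without running, the commands for one co-assembly (e.g. "surface").

    `samples` maps each sample name to its quality-filtered (R1, R2) FASTQ files.
    """
    out = Path(outdir)
    contigs = str(out / "contigs-fixed.fa")
    contigs_db = str(out / "CONTIGS.db")
    index = str(out / "contigs_index")
    r1 = ",".join(fq1 for fq1, _ in samples.values())
    r2 = ",".join(fq2 for _, fq2 in samples.values())
    cmds = [
        # Co-assemble all samples of one environment, discarding contigs < 1,000 bp.
        ["megahit", "-1", r1, "-2", r2, "--min-contig-len", "1000", "-m", "0.85",
         "-t", str(threads), "-o", str(out / "megahit")],
        # Simplify contig names so that anvi'o accepts them.
        ["anvi-script-reformat-fasta", str(out / "megahit" / "final.contigs.fa"),
         "-o", contigs, "-l", "1000", "--simplify-names"],
        ["anvi-gen-contigs-database", "-f", contigs, "-o", contigs_db, "-n", out.name],
        ["bowtie2-build", contigs, index],
    ]
    profiles = []
    for name, (fq1, fq2) in samples.items():
        sam = str(out / f"{name}.sam")
        raw_bam = str(out / f"{name}.raw.bam")
        bam = str(out / f"{name}.bam")
        profile_dir = out / f"{name}_profile"
        cmds += [
            # Map the sample back to the co-assembly; --no-unal omits unaligned reads.
            ["bowtie2", "-x", index, "-1", fq1, "-2", fq2, "--no-unal",
             "-p", str(threads), "-S", sam],
            # Keep mapped reads only (-F 4) and convert SAM to BAM.
            ["samtools", "view", "-F", "4", "-bS", "-o", raw_bam, sam],
            ["samtools", "sort", "-o", bam, raw_bam],
            ["samtools", "index", bam],
            ["anvi-profile", "-i", bam, "-c", contigs_db, "-o", str(profile_dir)],
        ]
        profiles.append(str(profile_dir / "PROFILE.db"))
    # Merge the per-sample profiles into one profile database used for binning.
    cmds.append(["anvi-merge", *profiles, "-o", str(out / "MERGED"), "-c", contigs_db])
    return cmds
```

Each command could then be executed in order with `subprocess.run(cmd, check=True)`. Returning lists of arguments rather than shell strings keeps paths with unusual characters safe and makes the workflow easy to inspect before anything is run.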
Automated binning was performed using the Anvi’o program anvi-cluster-contigs with default parameters and three binning algorithms: CONCOCT v1.1.0 (Alneberg et al., 2013), MaxBin2 v2.2.6 (Wu, Simmons & Singer, 2016) and MetaBAT 2 v2.15 (Kang et al., 2019). For all binning results, the completeness and redundancy of the bins were estimated with the Anvi’o program anvi-estimate-genome-completeness, which relies on CheckM v1.1.3 (Parks et al., 2015). Based on the comparison of the three binning algorithms, we selected the good quality bins from MetaBAT 2, defined as bins with an estimated completeness above 50% and an estimated redundancy below 10%, following the standards suggested by Bowers et al. (2017). The proportions of good quality bins among all bins were compared between binning algorithms with a χ² test.
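The selection rule for good quality bins is a simple threshold filter. The sketch below illustrates it with a hypothetical `Bin` record; in practice the completeness and redundancy values would come from the estimates described above, and reading them from a results file is left out. Strict inequalities are used to follow the "above 50%" and "below 10%" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """One bin with its estimated completeness and redundancy, both in percent."""
    name: str
    completeness: float
    redundancy: float


def is_good_quality(b: Bin, min_completeness: float = 50.0,
                    max_redundancy: float = 10.0) -> bool:
    # Strict inequalities, following the "above 50%" / "below 10%" wording.
    return b.completeness > min_completeness and b.redundancy < max_redundancy


def good_quality_fraction(bins: list[Bin]) -> tuple[list[Bin], float]:
    """Return the good-quality bins and their share of all bins."""
    good = [b for b in bins if is_good_quality(b)]
    return good, len(good) / len(bins)


# Invented example bins.
bins = [Bin("bin_1", 92.1, 1.4), Bin("bin_2", 55.0, 12.0),
        Bin("bin_3", 48.0, 2.0), Bin("bin_4", 50.0, 5.0)]
good, share = good_quality_fraction(bins)
print([b.name for b in good], share)
```

Applied per binning method, the returned share is exactly the proportion compared by the χ² test (for instance 118 of 279 bins for MetaBAT 2 on the surface co-assembly).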
Functional assignment, taxonomy and phylogenomic trees
We used Prodigal v2.6.3 (Hyatt et al., 2010) to identify open reading frames (ORFs) within the contigs. The resulting ORFs were taxonomically assigned with Kaiju v1.7.3 (Menzel, Ng & Krogh, 2016) using the NCBI nr+euk database (nr_euk 2019-06-25, 46 GB). Besides this contig-based taxonomic assignment, we used GTDB-Tk v1.3.0 (Genome Taxonomy Database Toolkit) (Chaumeil et al., 2019) to construct two bacterial and two archaeal phylogenomic trees containing the good quality MAGs (completeness above 50%; redundancy below 10%) and reference genomes from the Genome Taxonomy Database (GTDB) release R95 (July 2020), in order to confirm the taxonomic assignments of the MAGs (Parks et al., 2018). The trees were then imported into ARB (Ludwig et al., 2004) for comprehensive visualisation.
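GTDB-Tk reports its assignments in tab-separated summary files (in GTDB-Tk 1.x typically `gtdbtk.bac120.summary.tsv` and `gtdbtk.ar122.summary.tsv`) whose `classification` column holds a semicolon-separated lineage with rank prefixes such as `p__` for phylum. Per-phylum tallies like those given in the Results can be derived from it; the sketch below is an illustration, not the authors' pipeline, and assumes these file and column names.

```python
import csv
from collections import Counter
from typing import Iterable

# GTDB rank prefixes used in GTDB-Tk lineage strings.
RANK_PREFIXES = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
                 "f": "family", "g": "genus", "s": "species"}


def parse_lineage(classification: str) -> dict[str, str]:
    """Turn 'd__Bacteria;p__Proteobacteria;...' into {'domain': 'Bacteria', ...}.

    Empty ranks (e.g. 's__') and strings such as 'Unclassified Bacteria' yield no entry.
    """
    lineage = {}
    for field in classification.split(";"):
        prefix, _, name = field.strip().partition("__")
        if name and prefix in RANK_PREFIXES:
            lineage[RANK_PREFIXES[prefix]] = name
    return lineage


def count_by_rank(lines: Iterable[str], rank: str = "phylum") -> Counter:
    """Count MAGs per taxon at `rank` from the lines of a GTDB-Tk summary TSV."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(parse_lineage(row["classification"]).get(rank, "unclassified")
                   for row in reader)
```

Usage would look like `with open("gtdbtk.bac120.summary.tsv", newline="") as fh: print(count_by_rank(fh).most_common())`. MAGs without an assignment at the requested rank are counted as "unclassified" rather than silently dropped.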
Data availability
The raw Illumina sequencing paired-end reads are available in the ENA under project accession number PRJEB41565 (ERP125360). MAGs are available under accession numbers ERS5621908 to ERS5622126. Code is available at https://github.com/clarajegousse/.
Results
Co-assemblies
The co-assembly of the 16 surface datasets (15 surface seawater samples and the mock community) yielded 445,328 contigs with a minimum length of 1,000 bp, representing a total length of 1.06 Gb (1,060,942,783 nucleotides), with an N50 of 2,627 bp and 1,271,859 gene calls (Table 3).
Table 3. Statistics summary of co-assemblies.
Statistic | Surface | Seafloor |
---|---|---|
Total nucleotides | 1.06 Gb | 1.23 Gb |
N50 | 2,382 bp | 2,327 bp |
L50 (number of contigs) | 83,272 | 114,549 |
Number of contigs | 445,328 | 554,104 |
Longest contig | 864,343 bp | 1,302,516 bp |
Shortest contig | 1,000 bp | 1,000 bp |
Number of contigs >10 kb | 8,521 | 8,306 |
Number of genes (Prodigal) | 1,271,859 | 1,532,800 |
The co-assembly of the 17 seafloor datasets (16 seafloor seawater samples and the mock community) yielded 554,104 contigs with a minimum length of 1,000 bp, representing a total length of 1.23 Gb (1,233,390,295 nucleotides), with an N50 of 2,327 bp and 1,532,800 gene calls (Table 3).
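Table 3 reports N50 and L50 for each co-assembly. N50 is a length (the length of the shortest contig among the longest contigs that together cover at least half of the assembly), while L50 is a count (how many such contigs are needed). A minimal sketch computing both from a list of contig lengths follows; the toy lengths are invented.

```python
def n50_l50(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths.

    Contigs are sorted from longest to shortest and accumulated until at least
    half of the total assembly length is covered.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("at least one contig is required")


# Toy assembly: 15,000 bp in total; the two longest contigs cover 9,000 bp.
print(n50_l50([1000, 5000, 3000, 2000, 4000]))
```

Here the two longest contigs (5,000 and 4,000 bp) already exceed half of the 15,000 bp total, so N50 is 4,000 bp and L50 is 2.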
Binning
The three binning algorithms (CONCOCT, MaxBin2 and MetaBAT 2) were compared on the surface and seafloor co-assemblies based on the number of good quality bins they produced (Fig. 2). Good quality bins have an estimated completeness above 50% and an estimated redundancy (also called estimated contamination) below 10% (Bowers et al., 2017). The proportion of good quality bins differed significantly among the three binning methods (χ² = 135.23, df = 2, p-value < 2.2e−16). MetaBAT 2 did not produce more bins than the other methods (it produced the fewest bins for the surface co-assembly, and fewer than MaxBin2 but more than CONCOCT for the seafloor co-assembly), yet it yielded a much higher number of good quality bins than CONCOCT and MaxBin2 (Table 4).
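The χ² statistic above can be recomputed from the counts in Table 4. The sketch below, an illustration rather than the authors' script, computes Pearson's χ² (without continuity correction) for a table of good quality versus remaining bins per binning method. With the surface rows of Table 4 (43 of 319, 17 of 302 and 118 of 279 bins) it gives χ² ≈ 135.23 with 2 degrees of freedom, which suggests that the reported value refers to the surface co-assembly. For 2 degrees of freedom the χ² survival function has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def pearson_chi2(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Surface co-assembly, Table 4: [good quality bins, remaining bins] per method.
surface = {
    "CONCOCT": [43, 319 - 43],
    "MaxBin2": [17, 302 - 17],
    "MetaBAT 2": [118, 279 - 118],
}
chi2, dof = pearson_chi2(list(surface.values()))
# Survival function of the chi-square distribution with 2 df: P(X > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p_value:.1e}")
```

The same function applied to the seafloor rows would give the corresponding statistic for the second co-assembly.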
Table 4. Binning results for the surface and seafloor co-assemblies.
Co-assembly | Binning method | Number of bins | Number of good quality bins (MAGs) | Average completeness (%) | Average contamination (%) |
---|---|---|---|---|---|
Surface | CONCOCT | 319 | 43 | 45.15 | 49.23 |
Surface | MaxBin2 | 302 | 17 | 25.77 | 13.30 |
Surface | MetaBAT 2 | 279 | 118 | 44.12 | 3.46 |
Seafloor | CONCOCT | 259 | 28 | 51.26 | 90.39 |
Seafloor | MaxBin2 | 358 | 18 | 34.59 | 18.63 |
Seafloor | MetaBAT 2 | 299 | 134 | 49.90 | 7.13 |
MetaBAT 2 gave the best results, which were used for further analysis and are shown in more detail in Fig. 3. Of the 279 bins identified by MetaBAT 2 for the surface samples, 42.3% (118) are good quality bins that can be considered draft MAGs according to Bowers et al. (2017). Of these 118 good quality MAGs (Fig. 3B), 16 represent genomes of organisms from the mock community and 102 were assembled from surface seawater. Likewise, of the 299 bins identified by MetaBAT 2 for the seafloor samples, 44.8% (134) can be considered good quality draft MAGs. Of these 134 MAGs (Fig. 3D), 17 represent genomes of organisms from the mock community and 117 were assembled from seawater collected at the seafloor. The proportion of MAGs among all bins did not differ significantly between the two co-assemblies (χ² = 0.27784, df = 1, p-value = 0.5981), indicating that the environment did not detectably affect the proportion of bins reaching good quality. Similarly, the proportion of MAGs derived from the mock community did not differ significantly between the two co-assemblies (χ² = 0.0003, df = 1, p-value = 0.9858).
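Both 2 × 2 comparisons above are matched by a χ² test with Yates' continuity correction, the default for 2 × 2 tables in, for example, R's `chisq.test`. The sketch below is a standard-library illustration, not the authors' code; for one degree of freedom the p-value is erfc(√(x/2)). The inputs are the MetaBAT 2 counts given in the text: 118 MAGs and 161 other bins (surface) against 134 MAGs and 165 other bins (seafloor), and 16 of 118 versus 17 of 134 MAGs derived from the mock community.

```python
import math


def chi2_2x2_yates(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Chi-square test with Yates' continuity correction for [[a, b], [c, d]].

    Returns (statistic, p-value); with 1 degree of freedom P(X > x) = erfc(sqrt(x / 2)).
    """
    table = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    n = a + b + c + d
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            # Shrink |O - E| by 0.5 (but never below zero) before squaring.
            stat += max(abs(table[i][j] - expected) - 0.5, 0.0) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


# MAGs vs. remaining bins, surface vs. seafloor (MetaBAT 2).
print(chi2_2x2_yates(118, 279 - 118, 134, 299 - 134))
# Mock-community MAGs vs. seawater MAGs, surface vs. seafloor.
print(chi2_2x2_yates(16, 118 - 16, 17, 134 - 17))
```

Without the continuity correction the first statistic would be somewhat larger; the corrected version is the one consistent with the values reported above.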
Taxonomy
When excluding members of the mock community based on taxonomic assignment and differential coverage, we identified 102 MAGs reconstructed from the surface co-assembly and 117 MAGs from the seafloor co-assembly. The surface MAGs include two eukaryotes (Bathycoccus and Micromonas), 92 bacteria, and eight archaea while the seafloor MAGs include 99 bacteria, 18 archaea and no eukaryotes.
The surface co-assembly yielded a total of 92 bacterial MAGs (Fig. 4). These MAGs are members of seven phyla (number of MAGs in brackets): Proteobacteria (52), Bacteroidota (31), Actinobacteriota (2), Verrucomicrobiota (2), Planctomycetota (2), SAR324 (1) and Cyanobacteria (1). The MAG within the Cyanobacteria belongs to the genus Synechococcus. Within the phylum Actinobacteriota, we retrieved two MAGs: one from a member of the genus Aquiluna and one from the genus Pontimonas. We reconstructed two MAGs within the phylum Planctomycetota. The two MAGs within the Verrucomicrobiota belong to the family Akkermansiaceae. The Bacteroidota phylum includes 31 MAGs reconstructed from the sea surface co-assembly. Most of these Bacteroidota MAGs belong to the family Flavobacteriaceae (18), including one representative of the genus Polaribacter. Many MAGs within the Flavobacteriaceae are related to MAGs reported by the Tara Oceans consortium, such as Cryomorphaceae bacterium and Flavobacteriales bacterium (CFB group bacteria). We also reconstructed 52 MAGs belonging to the phylum Proteobacteria, including nine Rhodobacteraceae, ten SAR86 and ten Porticoccaceae. Of the three MAGs in the order Burkholderiales, one belongs to the genus Burkholderia and the other two to the family Methylophilaceae according to GTDB.
The seafloor co-assembly yielded a total of 99 bacterial MAGs spanning 12 phyla: Proteobacteria (46), Verrucomicrobiota (9), Bacteroidota (9), Marinisomatota (8), Actinobacteriota (5), Planctomycetota (5), Gemmatimonadota (4), Nitrospinota (3), Chloroflexota (2), SAR324 (2), Myxococcota (1) and Lactescibacterota (1). Six of these phyla include only MAGs from the seafloor (Nitrospinota, Myxococcota, Gemmatimonadota, Marinisomatota, Chloroflexota and Lactescibacterota). Within the Proteobacteria, most of the MAGs belong to the class Gammaproteobacteria (32 MAGs), while the remaining 14 belong to the Alphaproteobacteria. Five orders within the Proteobacteria (Rhizobiales, Rhodospirillales, TMED109, UBA10353 and UBA4486) include only MAGs reconstructed from the seafloor co-assembly and none from the surface co-assembly.
Of the 21 bacterial species in the mock community, 12 were reassembled and given the correct taxonomic assignment, down to species level where available for the strain used: Alteromonas sp., Geobacillus marinus, Colwellia sp., Escherichia coli, Marinobacter sp., Photobacterium sp., Pseudoalteromonas sp., Reinekea marinisedimentorum, Sulfitobacter donghicola, Sulfitobacter guttiformis, Sulfitobacter pontiacus and Thermus thermophilus. However, some distinct species of the mock community belonging to the same genus did not match individual MAGs and appear to have been reassembled as a single MAG of the genus in question, such as Reinekea aestuarii and Reinekea sp. 84, and Sulfitobacter undariae and Sulfitobacter sp. 87. The genomes of Geobacillus thermoleovorans, Dietzia sp., Halomonas sp. and Vibrio cyclitrophicus were not reassembled.
The surface co-assembly yielded only eight archaeal MAGs (Fig. 5), all within the phylum Thermoplasmatota, including three MAGs within the genus MGIIb-O2 of the family Thalassarchaeaceae and five within the family Poseidoniaceae. The seafloor co-assembly yielded 18 archaeal MAGs, including one representative of the phylum Thermoproteota; this MAG belongs to the UBA57 lineage within the order Nitrososphaerales. The 17 other archaeal MAGs all belong to the phylum Thermoplasmatota, within the class Poseidoniia, and include representatives of the families Poseidoniaceae and Thalassarchaeaceae. The two archaeal members of the mock community (Pyrococcus abyssi and Thermococcus barophilus) were successfully reconstructed in both co-assemblies.
Discussion
Mock communities are used to quantify and characterise biases introduced in the sample processing pipeline (Brooks et al., 2015) and are indispensable for benchmarking sequencing methods and downstream analyses (Singer et al., 2016; Sevim et al., 2019). Mock communities can also be used as a positive control in metagenomic studies. Our mock community showed that MetaBAT 2 was able to resolve the genomes of several species within the same genus (e.g., Sulfitobacter donghicola, S. guttiformis and S. pontiacus), making it the most suitable binning algorithm of the three tested in this study (CONCOCT, MaxBin2 and MetaBAT 2). This result is consistent with previous studies (Yue et al., 2020).
The ocean is a vast continuum, and the samples were taken within a relatively small part of the North Atlantic and Arctic waters around Iceland at two sampling depths: the surface and the seafloor (90 m, 470 m, 1,004 m or 1,045 m depending on the station; Table 1). These differences in sampling depth imply differences in light, pressure and temperature. While the sea surface is subject to seasonal variations in daylight and temperature, the seafloor remains darker and colder, and such parameters drive microbial community structure and function. We therefore considered the surface and the seafloor as two different types of environment, which justifies performing two co-assemblies rather than assembling all 32 datasets together. The fact that some taxa, such as six bacterial phyla, were recovered from only one of the two environments is consistent with this choice.
Conclusions
The goal of this study was to reconstruct MAGs from 31 samples of Icelandic seawater. The 219 MAGs span 13 bacterial and two archaeal phyla and contribute to a more defined picture of the global marine microbiome. Moreover, thanks to the inclusion of a mock community in the analysis, this study confirms that combining co-assembly with binning by MetaBAT 2 allows the recovery of good quality MAGs despite a relatively shallow sequencing depth; these MAGs are a valuable resource for further ecological and environmental studies.
Supplemental Information
Acknowledgments
The authors would like to thank Kristinn Gudmundsson and the crew of the Bjarni Sæmundsson from the Marine Research Institute, and Pauline Bergsten and Mia Cerfonteyn from the University of Iceland & Matís for sampling, Antonio Fernandez Guerra from the Max Planck Institute for Marine Microbiology and Arnar Pálsson from the University of Iceland for advice, and Elvar Örn Jónsson from the University of Iceland for technical support. The analyses presented in the study were performed using the resources provided by the Icelandic High Performance Computing Centre at the University of Iceland.
Funding Statement
The work is part of the Microbes in the Icelandic Marine Environment (MIME) project which was funded by the Grant of Excellence (No. 163266-051) of the Icelandic Research Fund (Rannís). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional Information and Declarations
Competing Interests
Clara Jégousse, Pauline Vannier, René Groben and Viggó Marteinsson are employees of Matís ohf.
Author Contributions
Clara Jégousse conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
Pauline Vannier conceived and designed the experiments, performed the experiments, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
René Groben, Frank Oliver Glöckner and Viggó Marteinsson conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
DNA Deposition
The following information was supplied regarding the deposition of DNA sequences:
Data are available at the ENA under project number PRJEB41565: all MAGs: ERS5621908 to ERS5622126; the surface and seafloor co-assemblies: ERS5565811 and ERS5565812.
Data Availability
The following information was supplied regarding data availability:
Code is available at Github:
https://github.com/clarajegousse/mime.
The following data are available at ENA:
- Raw data, co-assemblies and MAGs: PRJEB41565.
- Raw sequence data for the mock community: ERS5472810 to ERS5472840, and ERS5475418.
- The surface and seafloor co-assemblies: ERS5565811 and ERS5565812 respectively.
- MAGs: ERS5621908 to ERS5622126.
References
- Alneberg et al. (2013).Alneberg J, Bjarnason BS, De Bruijn I, Schirmer M, Quick J, Ijaz UZ, Loman NJ, Andersson AF, Quince C. CONCOCT: clustering contigs on coverage and composition. 2013 doi: 10.1038/nmeth.3103.1312.4038 [DOI] [PubMed]
- Andrews et al. (2012).Andrews S, Krueger F, Segonds-Pichon A, Biggins L, Krueger C, Wingett S. FastQC. Babraham: Babraham Institute; 2012. [Google Scholar]
- Astthorsson, Gislason & Jonsson (2007).Astthorsson OS, Gislason A, Jonsson S. Climate variability and the Icelandic marine ecosystem. Deep Sea Research Part II: Topical Studies in Oceanography. 2007;54(23–26):2456–2477. doi: 10.1016/j.dsr2.2007.07.030. [DOI] [Google Scholar]
- Azam (1998).Azam F. Microbial control of oceanic carbon flux: the plot thickens. Science. 1998;280(5364):694–696. doi: 10.1126/science.280.5364.694. [DOI] [Google Scholar]
- Azam et al. (1983).Azam F, Fenchel T, Field JG, Gray J, Meyer-Reil L, Thingstad F. The ecological role of water-column microbes in the sea. Marine Ecology Progress Series. 1983;10:257–263. [Google Scholar]
- Benoit et al. (2020).Benoit G, Mariadassou M, Robin S, Schbath S, Peterlongo P, Lemaitre C. SimkaMin: fast and resource frugal de novo comparative metagenomics. Bioinformatics. 2020;36(4):1275–1276. doi: 10.1093/bioinformatics/btz685. [DOI] [PubMed] [Google Scholar]
- Bolger, Lohse & Usadel (2014).Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–2120. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bowers et al. (2017).Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy T. BK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA, Tringe SG, Ivanova NN, Copeland A, Clum A, Becraft ED, Malmstrom RR, Birren B, Podar M, Bork P, Weinstock GM, Garrity GM, Dodsworth JA, Yooseph S, Sutton G, Glöckner FO, Gilbert JA, Nelson WC, Hallam SJ, Jungbluth SP, Ettema TJG, Tighe S, Konstantinidis KT, Liu W-T, Baker BJ, Rattei T, Eisen JA, Hedlund B, McMahon KD, Fierer N, Knight R, Finn R, Cochrane G, Karsch-Mizrachi I, Tyson GW, Rinke C, Schriml L, Hugenholtz P, Yilmaz P, Meyer F, Lapidus A, Parks DH, Murat Eren A, Banfield JF, Woyke T, TGS Consortium Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology. 2017;35(8):725–731. doi: 10.1038/nbt.3893. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Brooks et al. (2015).Brooks JP, Edwards DJ, Harwich MD, Rivera MC, Fettweis JM, Serrano MG, Reris RA, Sheth NU, Huang B, Girerd P, Strauss JF, Jefferson KK, Buck GA, (additional members), VMC The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies. BMC Microbiology. 2015;15(1):66. doi: 10.1186/s12866-015-0351-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chaumeil et al. (2019).Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019;36(6):1925–1927. doi: 10.1093/bioinformatics/btz848. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Clarke et al. (2019).Clarke EL, Taylor LJ, Zhao C, Connell A, Lee J-J, Fett B, Bushman FD, Bittinger K. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome. 2019;7(1):46. doi: 10.1186/s40168-019-0658-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cruaud et al. (2017).Cruaud P, Vigneron A, Fradette M-S, Charette SJ, Rodriguez MJ, Dorea CC, Culley AI. Open the Sterivex casing: an easy and effective way to improve DNA extraction yields. Limnology and Oceanography: Methods. 2017;15(12):1015–1020. doi: 10.1002/lom3.10221. [DOI] [Google Scholar]
- Erauso et al. (1993).Erauso G, Reysenbach A-L, Godfroy A, Meunier J-R, Crump B, Partensky F, Baross JA, Marteinsson V, Barbier G, Pace NR, Prieur D. Pyrococcus abyssi sp. nov., a new hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent. Archives of Microbiology. 1993;160(5):338–349. doi: 10.1007/BF00252219. [DOI] [Google Scholar]
- Eren AM, Esen ÖC, Quince C, Vineis JH, Morrison HG, Sogin ML, Delmont TO. Anvi’o: an advanced analysis and visualization platform for ’omics data. PeerJ. 2015;3:e1319. doi: 10.7717/peerj.1319.
- Falkowski PG, Fenchel T, Delong EF. The microbial engines that drive Earth’s biogeochemical cycles. Science. 2008;320(5879):1034–1039. doi: 10.1126/science.1153213.
- Gudmundsson K. Long-term variation in phytoplankton productivity during spring in Icelandic waters. ICES Journal of Marine Science. 1998;55(4):635–643. doi: 10.1006/jmsc.1998.0391.
- Haroon MF, Thompson LR, Parks DH, Hugenholtz P, Stingl U. A catalogue of 136 microbial draft genomes from Red Sea metagenomes. Scientific Data. 2016;3(1):1–6. doi: 10.1038/sdata.2016.50.
- Hugenholtz P, Tyson GW. Metagenomics. Nature. 2008;455(7212):481–483. doi: 10.1038/455481a.
- Hyatt D, Chen G-L, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11(1):119. doi: 10.1186/1471-2105-11-119.
- Kang DD, Li F, Kirton E, Thomas A, Egan R, An H, Wang Z. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ. 2019;7:e7359. doi: 10.7717/peerj.7359.
- Kopf A, Bicak M, Kottmann R, Schnetzer J, Kostadinov I, Lehmann K, Fernandez-Guerra A, Jeanthon C, Rahav E, Ullrich M, Wichels A, Gerdts G, Polymenakou P, Kotoulas G, Siam R, Abdallah RZ, Sonnenschein EC, Cariou T, O’Gara F, Jackson S, Orlic S, Steinke M, Busch J, Duarte B, Caçador I, Canning-Clode J, Bobrova O, Marteinsson V, Reynisson E, Loureiro CM, Luna GM, Quero GM, Löscher CR, Kremp A, DeLorenzo ME, Øvreås L, Tolman J, LaRoche J, Penna A, Frischer M, Davis T, Katherine B, Meyer CP, Ramos S, Magalhães C, Jude-Lemeilleur F, Aguirre-Macedo ML, Wang S, Poulton N, Jones S, Collin R, Fuhrman JA, Conan P, Alonso C, Stambler N, Goodwin K, Yakimov MM, Baltar F, Bodrossy L, Van De Kamp J, Frampton DM, Ostrowski M, Van Ruth P, Malthouse P, Claus S, Deneudt K, Mortelmans J, Pitois S, Wallom D, Salter I, Costa R, Schroeder DC, Kandil MM, Amaral V, Biancalana F, Santana R, Pedrotti ML, Yoshida T, Ogata H, Ingleton T, Munnik K, Rodriguez-Ezpeleta N, Berteaux-Lecellier V, Wecker P, Cancio I, Vaulot D, Bienhold C, Ghazal H, Chaouni B, Essayeh S, Ettamimi S, Zaid EH, Boukhatem N, Bouali A, Chahboune R, Barrijal S, Timinouni M, El Otmani F, Bennani M, Mea M, Todorova N, Karamfilov V, ten Hoopen P, Cochrane G, L’Haridon S, Bizsel KC, Vezzi A, Lauro FM, Martin P, Jensen RM, Hinks J, Gebbels S, Rosselli R, De Pascale F, Schiavon R, dos Santos A, Villar E, Pesant S, Cataletto B, Malfatti F, Edirisinghe R, Silveira JAH, Barbier M, Turk V, Tinta T, Fuller WJ, Salihoglu I, Serakinci N, Ergoren MC, Bresnan E, Iriberri J, Nyhus PAF, Bente E, Karlsen HE, Golyshin PN, Gasol JM, Moncheva S, Dzhembekova N, Johnson Z, Sinigalliano CD, Gidley ML, Zingone A, Danovaro R, Tsiamis G, Clark MS, Costa AC, El Bour M, Martins AM, Collins RE, Ducluzeau A-L, Martinez J, Costello MJ, Amaral-Zettler LA, Gilbert JA, Davies N, Field D, Glöckner FO. The ocean sampling day consortium. GigaScience. 2015;4(1):27. doi: 10.1186/s13742-015-0066-5.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng J-F, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen H-C, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JGR, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AFA, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang S-P, Yeh R-F, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Patrinos A, Morgan MJ; International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860–921. doi: 10.1038/35057062.
- Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923.
- Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31(10):1674–1676. doi: 10.1093/bioinformatics/btv033.
- Li D, Luo R, Liu C-M, Leung C-M, Ting H-F, Sadakane K, Yamashita H, Lam T-W. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016;102:3–11. doi: 10.1016/j.ymeth.2016.02.020.
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.
- Ludwig W, Strunk O, Westram R, Richter L, Meier H, Yadhukumar, Buchner A, Lai T, Steppi S, Jobb G, Förster W, Brettske I, Gerber S, Ginhart AW, Gross O, Grumann S, Hermann S, Jost R, König A, Liss T, Lüßmann R, May M, Nonhoff B, Reichel B, Strehlow R, Stamatakis A, Stuckmann N, Vilbig A, Lenke M, Ludwig T, Bode A, Schleifer K. ARB: a software environment for sequence data. Nucleic Acids Research. 2004;32(4):1363–1371. doi: 10.1093/nar/gkh293.
- Malmberg S-A, Valdimarsson H, Mortensen J. Long time series in Icelandic Waters, in relation to physical variability in the northern North Atlantic. Ocean Challenge. 1995;6:48–51.
- Marteinsson VT, Birrien J-L, Reysenbach A-L, Vernet M, Marie D, Gambacorta A, Messner P, Sleytr UB, Prieur D. Thermococcus barophilus sp. nov., a new barophilic and hyperthermophilic archaeon isolated under high hydrostatic pressure from a deep-sea hydrothermal vent. International Journal of Systematic and Evolutionary Microbiology. 1999;49(2):351–359. doi: 10.1099/00207713-49-2-351.
- Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10–12.
- Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications. 2016;7(1):11257. doi: 10.1038/ncomms11257.
- Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, Hugenholtz P. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology. 2018;36(10):996–1004. doi: 10.1038/nbt.4229.
- Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research. 2015;25(7):1043–1055. doi: 10.1101/gr.186072.114.
- Pomeroy LR. The ocean’s food web, a changing paradigm. BioScience. 1974;24(9):499–504. doi: 10.2307/1296885.
- Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun metagenomics, from sampling to analysis. Nature Biotechnology. 2017;35(9):833–844. doi: 10.1038/nbt.3935.
- Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers Y-H, Falcón LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, Platt T, Bermingham E, Gallardo V, Tamayo-Castillo G, Ferrari MR, Strausberg RL, Nealson K, Friedman R, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. PLOS Biology. 2007;5(3):e77. doi: 10.1371/journal.pbio.0050077.
- Sangwan N, Xia F, Gilbert JA. Recovering complete and draft population genomes from metagenome datasets. Microbiome. 2016;4(1):8. doi: 10.1186/s40168-016-0154-5.
- Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen H-C, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS, Kremitzki M, Magrini V, Markovic C, McGrath S, Steinberg KM, Auger K, Chow W, Collins J, Harden G, Hubbard T, Pelan S, Simpson JT, Threadgold G, Torrance J, Wood JM, Clarke L, Koren S, Boitano M, Peluso P, Li H, Chin C-S, Phillippy AM, Durbin R, Wilson RK, Flicek P, Eichler EE, Church DM. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research. 2017;27(5):849–864. doi: 10.1101/gr.213611.116.
- Sevim V, Lee J, Egan R, Clum A, Hundley H, Lee J, Everroad RC, Detweiler AM, Bebout BM, Pett-Ridge J, Göker M, Murray AE, Lindemann SR, Klenk H-P, O’Malley R, Zane M, Cheng J-F, Copeland A, Daum C, Singer E, Woyke T. Shotgun metagenome data of a defined mock community using Oxford Nanopore, PacBio and Illumina technologies. Scientific Data. 2019;6(1):285. doi: 10.1038/s41597-019-0287-z.
- Singer E, Andreopoulos B, Bowers RM, Lee J, Deshpande S, Chiniquy J, Ciobanu D, Klenk H-P, Zane M, Daum C, Clum A, Cheng J-F, Copeland A, Woyke T. Next generation sequencing data of a defined microbial mock community. Scientific Data. 2016;3(1):160081. doi: 10.1038/sdata.2016.81.
- Sunagawa S, Acinas SG, Bork P, Bowler C, Acinas SG, Babin M, Boss E, Cochrane G, De Vargas C, Follows M, Gorsky G, Grimsley N, Guidi L, Hingamp P, Iudicone D, Jaillon O, Kandels S, Karp-Boss L, Karsenti E, Lescot M, Not F, Ogata H, Pesant S, Poulton N, Raes J, Sardet C, Sieracki M, Speich S, Stemmann L, Sullivan MB, Wincker P, Eveillard D, Lombard F, Pesant S, Sullivan MB; Tara Oceans Coordinators. Tara Oceans: towards global ocean ecosystems biology. Nature Reviews Microbiology. 2020;18(8):428–445. doi: 10.1038/s41579-020-0364-5.
- Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B, Zeller G, Mende DR, Alberti A, Cornejo-Castillo FM, Costea PI, Cruaud C, d’Ovidio F, Engelen S, Ferrera I, Gasol JM, Guidi L, Hildebrand F, Kokoszka F, Lepoivre C, Lima-Mendez G, Poulain J, Poulos BT, Royo-Llonch M, Sarmento H, Vieira-Silva S, Dimier C, Picheral M, Searson S, Kandels-Lewis S, Bowler C, de Vargas C, Gorsky G, Grimsley N, Hingamp P, Iudicone D, Jaillon O, Not F, Ogata H, Pesant S, Speich S, Stemmann L, Sullivan MB, Weissenbach J, Wincker P, Karsenti E, Raes J, Acinas SG, Bork P. Structure and function of the global ocean microbiome. Science. 2015;348(6237):1261359. doi: 10.1126/science.1261359.
- Ten Hoopen P, Finn RD, Bongo LA, Corre E, Fosso B, Meyer F, Mitchell A, Pelletier E, Pesole G, Santamaria M, Willassen NP, Cochrane G. The metagenomic data life-cycle: standards and best practices. GigaScience. 2017;6(8):1–11. doi: 10.1093/gigascience/gix047.
- Thórdardóttir T. Timing and duration of spring blooming south and southwest of Iceland. In: The role of freshwater outflow in coastal marine ecosystems. Berlin, Heidelberg: Springer-Verlag; 1986. pp. 345–360.
- Tully BJ, Graham ED, Heidelberg JF. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Scientific Data. 2018;5:170203. doi: 10.1038/sdata.2017.203.
- Tully BJ, Sachdeva R, Graham ED, Heidelberg JF. 290 metagenome-assembled genomes from the Mediterranean Sea: a resource for marine microbiology. PeerJ. 2017;5:e3558. doi: 10.7717/peerj.3558.
- Valdimarsson H, Malmberg S-A. Near-surface circulation in Icelandic waters derived from satellite tracked drifters. Rit Fiskideild. 1999;16:23–40.
- Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers Y-H, Smith HO. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304(5667):66–74. doi: 10.1126/science.1093857.
- Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32(4):605–607. doi: 10.1093/bioinformatics/btv638.
- Yue Y, Huang H, Qi Z, Dou H-M, Liu X-Y, Han T-F, Chen Y, Song X-J, Zhang Y-H, Tu J. Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets. BMC Bioinformatics. 2020;21(1):1–15. doi: 10.1186/s12859-019-3325-0.
Data Availability Statement
The raw Illumina sequencing paired-end reads are available in the ENA under project accession number PRJEB41565 (ERP125360). MAGs are available under accession numbers ERS5621908 to ERS5622126. Code is available at https://github.com/clarajegousse/.
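As a minimal sketch, assuming the public ENA Portal API `filereport` endpoint and only Python's standard library, the run-level FASTQ locations under project PRJEB41565 could be listed as shown below. The helper names `filereport_url`, `parse_filereport` and `fetch_runs` are hypothetical, and the requested fields (`run_accession`, `fastq_ftp`) are one reasonable choice, not the authors' documented retrieval procedure.

```python
import csv
import io
import urllib.parse
import urllib.request

# Public ENA Portal API endpoint that reports files attached to an accession.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession, fields=("run_accession", "fastq_ftp")):
    """Build a filereport query returning one TSV row per sequencing run."""
    params = {
        "accession": accession,
        "result": "read_run",  # one row per sequencing run
        "fields": ",".join(fields),
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urllib.parse.urlencode(params)


def parse_filereport(tsv_text):
    """Map each run accession to its FASTQ paths.

    ENA separates multiple files (e.g. the two mates of a paired-end run)
    with ';' inside the fastq_ftp column; empty entries are dropped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    runs = {}
    for row in reader:
        paths = (row.get("fastq_ftp") or "").split(";")
        runs[row["run_accession"]] = [p for p in paths if p]
    return runs


def fetch_runs(accession):
    """Download and parse the filereport (requires network access)."""
    with urllib.request.urlopen(filereport_url(accession)) as response:
        return parse_filereport(response.read().decode("utf-8"))
```

Calling `fetch_runs("PRJEB41565")` needs network access; for paired-end runs, each entry would be expected to hold two FASTQ paths. Keeping URL construction and parsing separate from the download makes both parts easy to check offline.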
The following information was supplied regarding data availability:
Code is available at GitHub: https://github.com/clarajegousse/mime.
The following data are available at ENA:
- Raw data, co-assemblies and MAGs: PRJEB41565.
- Raw sequence data for the mock community: ERS5472810 to ERS5472840, and ERS5475418.
- The surface and seafloor co-assemblies: ERS5565811 and ERS5565812 respectively.
- MAGs: ERS5621908 to ERS5622126.
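The MAG accessions form a contiguous range, so a small helper can expand them into individual identifiers, for example to drive a batch download. This is a minimal sketch assuming that every accession in each stated inclusive range belongs to this project; `expand_accession_range` is a hypothetical helper, not part of any ENA tool. Expanding ERS5621908 to ERS5622126 yields 219 accessions, consistent with the 219 MAGs reported in the Abstract.

```python
import re

# Assumed accession shape: a letter prefix (e.g. "ERS") followed by digits.
ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_accession_range(first, last):
    """Expand an inclusive accession range such as ERS5621908 to ERS5622126."""
    m_first, m_last = ACCESSION.fullmatch(first), ACCESSION.fullmatch(last)
    if m_first is None or m_last is None:
        raise ValueError("expected a letter prefix followed by digits")
    prefix, start_digits = m_first.groups()
    if m_last.group(1) != prefix:
        raise ValueError("range endpoints must share the same prefix")
    start, end = int(start_digits), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    width = len(start_digits)  # preserve any zero-padding in the numeric part
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


mags = expand_accession_range("ERS5621908", "ERS5622126")
print(len(mags), mags[0], mags[-1])  # 219 ERS5621908 ERS5622126

# Mock community: a contiguous range plus one additional accession.
mock = expand_accession_range("ERS5472810", "ERS5472840") + ["ERS5475418"]
print(len(mock))  # 32
```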