Abstract
The South China Sea (SCS) is a marginal sea characterized by strong land-sea biogeochemical interactions. The SCS has a distinctive landscape with a multitude of seamounts in its basin. Seamounts create “seamount effects” that influence the diversity and distribution of planktonic microorganisms in the surrounding oligotrophic waters. Although the vertical distribution and community structure of marine microorganisms have been explored in certain regions of the global ocean, comprehensive genomic surveys of uncultured microorganisms are lacking in the SCS, particularly in the seamount regions. Here, we employed a metagenomic approach to study the uncultured microbial communities sampled from the Xianbei seamount region to the North Coast waters of the SCS. A total of 1887 non-redundant prokaryotic metagenome-assembled genomes (MAGs) were reconstructed, of which 153 were classified as high-quality MAGs based on the MIMAG standards. The community structure and genomic information provided by this dataset can be used to analyze microbial distribution and metabolism in the SCS.
Subject terms: Marine biology, Microbial biooceanography, Microbial ecology
Background & Summary
The South China Sea (SCS) is the largest marginal sea in the western Pacific Ocean. It is characterized by a tropical and subtropical climate1 with complex physical and chemical gradients over spatial scales2,3. The SCS encompasses a multitude of underwater seamounts rising from the seafloor4,5, which are unique topographic features that could alter the local hydrodynamics of the surrounding waters6–8. These seamounts cause “seamount effects” in the oligotrophic oceans, leading to intensified vertical movements and rapid exchanges of shallow and deep waters7–10. These vertical movements, both upwelling and downwelling, have a fundamental influence on the primary production and phytoplankton diversity8–12. The differential distribution patterns of diverse marine phytoplankton may further affect the assemblage of heterotrophic microbial communities as a result of substrate-constrained partition and succession13. For instance, it was found that the vertically distributed phytoplankton had a significant influence on the bacterioplankton community structure at different water layers surrounding seamounts in the western Pacific Ocean8.
The Xianbei seamount is a shallow underwater mountain situated in the central basin of the SCS, with its summit lying approximately 208 meters below the sea surface12,14. The deep seawater in the SCS is mainly transported from the western Pacific Ocean through the Luzon Strait4,5. This transportation process results in a rapid basin-scale cyclonic circulation pattern and creates deep upwelling events in the seamount regions along the way4,5. Mount Xianbei is one of the largest seamounts close to the euphotic zone, making it a natural laboratory for studying seamount effects on microbial diversity and distribution. In addition, how the microbial communities in seamount regions differ from those in the continental shelf or coastal waters has not been fully understood.
In this study, we collected 61 seawater samples from the Xianbei seamount region (XB, n = 43), as well as the Dongsha (DS, n = 11) and Xisha (XS, n = 7) areas, to survey the microbial diversity and metabolic potential in the SCS (Fig. 1). Sample metadata, sequencing strategy and environmental factors can be found in Table S1. The 16S rRNA gene amplicon sequencing data revealed that Alphaproteobacteria and Gammaproteobacteria were the most abundant bacterial groups in all surface (5 m) samples. The cumulative relative abundance of Alphaproteobacteria Amplicon Sequence Variants (ASVs) ranged from 31.66% to 55.08%, while that of Gammaproteobacteria ASVs ranged from 6.98% to 37.62%. As expected, Cyanobacteria were prevalent in samples from the top 150 m (Fig. 2a,b). In the Xianbei seamount region, the cumulative relative abundance of Alphaproteobacteria and Cyanobacteria ASVs decreased with depth, whereas that of other taxonomic groups, such as Gammaproteobacteria, Thermoproteota, SAR324 clade, and Marinimicrobia (SAR406 clade), increased with depth (Fig. 2b, Table S2).
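The cumulative relative abundances quoted above are sums of ASV read counts within a taxonomic class, divided by the total reads of a sample. A minimal sketch of that calculation follows; the function name, the toy ASV identifiers and the toy counts are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

def class_relative_abundance(asv_counts, asv_class):
    """Return {class: percent of reads} for a single sample.

    asv_counts: {asv_id: read count}; asv_class: {asv_id: class name}.
    Both inputs are illustrative structures, not the authors' file format.
    """
    total = sum(asv_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    per_class = defaultdict(float)
    for asv, count in asv_counts.items():
        # ASVs without a class label are pooled as "Unclassified"
        per_class[asv_class.get(asv, "Unclassified")] += count
    return {cls: 100.0 * n / total for cls, n in per_class.items()}

# Toy sample with three hypothetical ASVs
counts = {"ASV1": 50, "ASV2": 30, "ASV3": 20}
classes = {
    "ASV1": "Alphaproteobacteria",
    "ASV2": "Alphaproteobacteria",
    "ASV3": "Gammaproteobacteria",
}
abund = class_relative_abundance(counts, classes)
```

Applied per sample, the minimum and maximum of such values across the surface samples would give ranges of the kind reported in the text.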
Upon metagenomic sequencing and binning, a total of 1887 dereplicated metagenome-assembled genomes (MAGs) were reconstructed with completeness ≥50% and contamination <10%. Of these, 1260, 325, and 302 representative MAGs originated from XB, DS, and XS metagenomes, respectively (Table S3a). Notably, 153 of them (8.1%) were classified as high-quality MAGs based on the MIMAG (Minimum Information about a Metagenome-Assembled Genome) standards15. These MAGs were taxonomically assigned to 4 archaeal and 24 bacterial phyla based on the Genome Taxonomy Database (GTDB)16, with a total of 240 archaeal and 1647 bacterial MAGs. Archaeal MAGs were affiliated with the Thermoplasmatota (219), Thermoproteota (18), Nanoarchaeota (2), and Asgardarchaeota (1) phyla (Fig. 3, Table S3b). Bacterial MAGs were mainly from the Pseudomonadota (757), Bacteroidota (157), Actinomycetota (156), Planctomycetota (127), Verrucomicrobiota (93), Chloroflexota (73), Marinisomatota (67) and SAR324 (65) phyla. Within the Pseudomonadota phylum, MAGs were assigned to either the Alphaproteobacteria (362) or the Gammaproteobacteria (395) class. Comparative analysis of the MAGs recovered here with those recovered from diverse SCS habitats17–19, OceanDNA20 and Tara Oceans21 revealed that 19.34% of the MAGs (366 MAGs) recovered in this study were not present in any of these datasets at a 95% average nucleotide identity (ANI) threshold (Table S3c).
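The MAG tallies in this paragraph can be cross-checked with simple arithmetic. The short sketch below uses only counts stated in the text; the variable names are illustrative.

```python
TOTAL_MAGS = 1887

by_region = {"XB": 1260, "DS": 325, "XS": 302}
by_domain = {"Archaea": 240, "Bacteria": 1647}
archaeal_phyla = {
    "Thermoplasmatota": 219,
    "Thermoproteota": 18,
    "Nanoarchaeota": 2,
    "Asgardarchaeota": 1,
}
pseudomonadota_classes = {"Alphaproteobacteria": 362, "Gammaproteobacteria": 395}
high_quality = 153

# Each partition of the dataset should add up to its stated total
region_total = sum(by_region.values())
domain_total = sum(by_domain.values())
archaea_total = sum(archaeal_phyla.values())
pseudomonadota_total = sum(pseudomonadota_classes.values())

# Share of high-quality MAGs, rounded to one decimal place
hq_percent = round(100 * high_quality / TOTAL_MAGS, 1)
```

The regional and domain-level counts each sum to 1887, the archaeal phyla sum to 240, the two Pseudomonadota classes sum to 757, and the high-quality share rounds to the reported 8.1%.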
Genes were called at the contig level and deduplicated in order to generate a non-redundant reference gene catalog, as a supplement to the MAG-based analysis. In total, 10,551,413 unique genes were predicted, and their functions were annotated with KEGG Orthology (KO) groups.
Materials and Methods
Sample collection and environmental variable characterization
Seawater samples were collected from the South China Sea (16°32′–16°46′ N, 116°41′–116°47′ E) between August and September 2021. Details of sampling sites and depths can be found in Fig. 1 and Table S1. Following the methodology of a previous study on harmful algal species12, seawater samples were collected at a depth of 5 meters from XS3.1 to XS9.1, DS6.1 to DS17.1, and XB1.1 to XB20.1. Additionally, in the XB2, XB3, XB4, and XB5 regions, seawater samples were collected across multiple depths including 5, 25, 100, 150, 200, 300, 500, 800, 1000, and 1500 meters. At each sampling site, 2 L of seawater was processed by size-fractionated filtration to remove mesozooplankton and suspended particles, and microbial cells within the size range of 0.2–200 μm were collected on polycarbonate membrane filters (Millipore, USA). Filters were then snap-frozen in liquid nitrogen and stored at −80 °C until DNA extraction. Temperature (°C) and density (kg/m³) were measured on board using a SeaBird CTD system (Ocean Test Equipment, Florida, USA).
DNA extraction, amplicon and metagenomic library construction and sequencing
Total DNA was extracted and quantified as documented in the previous study12. All DNA samples were preserved at −80 °C until amplicon and metagenomic library preparation and sequencing. The detailed amplicon library preparation and sequencing have been documented previously12,22. Briefly, the V4-V5 regions of 16S rRNA genes were amplified using the universal primer set 515Y/926R (5′-GTGYCAGCMGCCGCGGTAA-3′/5′-CCGYCAATTYMTTTRAGTTT-3′)23, with thermal cycling parameters following the previously described protocol23,24. PCR products were used for library construction and subsequent sequencing on an Illumina NovaSeq platform at Novogene (Novogene, Beijing, China) using PE250 chemistries. For metagenomic sequencing, DNA was sheared into ~500 bp fragments using the Covaris Ultrasonicator M220 (Covaris, USA), then libraries were prepared using the NovaSeq Reagent Kit (Illumina, USA) according to the manufacturer’s instructions. Metagenomic sequencing was performed on the NovaSeq 6000 sequencing platform at Novogene (Beijing, China) using the Illumina PE150 chemistries.
Sequence quality control
As previously described12, the raw reads of amplicon sequencing were first trimmed using cutadapt v3.525 to remove adaptors and PCR primers with an error rate of 0.2, and the clean reads were subjected to further analysis using the Fuhrman lab pipeline26,27 with detailed parameters described previously by Huang et al.12. Briefly, clean reads were further split into 16S and 18S rRNA pools using custom 16S/18S databases derived from the SILVA 138 ribosomal RNA database28 and the Protist Ribosomal Reference database (PR2)29. The concatenated 16S rRNA reads were denoised using the DADA230 denoise-paired command to reconstruct ASVs, which were then taxonomically classified against the SILVA v138 database28. ASV sequences of chloroplasts and mitochondria were removed before subsequent analyses. For metagenomic sequencing, raw reads were first trimmed using fastp v0.19.531, followed by the removal of human contaminants using bbmap.sh with specific parameters (minid=0.95, maxindel=3, bwr=0.16, bw=12, quickmatch, fast) and the recommended reference sequence file: hg19_main_mask_ribo_animal_allplant_allfungus.fa (http://sourceforge.net/projects/bbmap). Clean reads were used for metagenomic assembly and binning.
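The removal of chloroplast and mitochondrial ASVs can be illustrated with a small filter over taxonomy strings. This is a sketch only: it assumes SILVA-style lineages in which organelle sequences carry the labels "Chloroplast" or "Mitochondria", and the function name and toy lineages are hypothetical rather than part of the pipeline cited above.

```python
def drop_organelle_asvs(taxonomy):
    """Return the ASVs whose lineage does not mention an organelle label.

    taxonomy: {asv_id: semicolon-separated lineage string}.
    Matching is case-insensitive on "chloroplast" and "mitochondria".
    """
    organelle_tags = ("chloroplast", "mitochondria")
    return {
        asv: lineage
        for asv, lineage in taxonomy.items()
        if not any(tag in lineage.lower() for tag in organelle_tags)
    }

# Toy taxonomy table with hypothetical ASV identifiers
taxonomy = {
    "ASV1": "Bacteria;Proteobacteria;Alphaproteobacteria;SAR11 clade",
    "ASV2": "Bacteria;Cyanobacteria;Cyanobacteriia;Chloroplast",
    "ASV3": "Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;Mitochondria",
    "ASV4": "Bacteria;Cyanobacteria;Cyanobacteriia;Synechococcales",
}
kept = drop_organelle_asvs(taxonomy)
```

In this toy table the free-living cyanobacterial ASV is retained while the two organelle-derived ASVs are dropped.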
Metagenomic assembly, gene prediction, MAG generation, refinement, and quality assessment
For each sample, high-quality reads were assembled into contigs using MEGAHIT v1.2.932,33 with the k-mer parameter --k-list 21,33,55,77,99,127. Samples from XS, DS and XB were also co-assembled using the same k-mer set and assembler. Coding sequences were predicted from the assembled contigs using Prodigal v2.6.334 in “meta” mode. To generate a gene catalog of non-redundant sequences, all the coding sequences were clustered into representative sequences at 95% identity using CD-HIT v4.6.135. Functions of the non-redundant genes were predicted by KofamScan36 using the prokaryotic, eukaryotic and viral KEGG gene database (Release 106.1) with default settings.
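The assembly, gene prediction and clustering steps could be expressed as command lines along the following lines. This sketch only builds argument lists and does not run anything. The file names, thread count and output locations are hypothetical, the use of the nucleotide-level `cd-hit-est` program is an assumption (the text names only CD-HIT), and clustering is shown per sample for brevity whereas the catalog described above pools all coding sequences.

```python
def assembly_commands(sample, r1, r2, threads=16):
    """Build (but do not execute) argument lists for the three steps."""
    out_dir = f"{sample}_megahit"
    # MEGAHIT writes its contigs to final.contigs.fa inside the output directory
    contigs = f"{out_dir}/final.contigs.fa"
    megahit = [
        "megahit", "-1", r1, "-2", r2,
        "--k-list", "21,33,55,77,99,127",  # k-mer set stated in the text
        "-t", str(threads), "-o", out_dir,
    ]
    prodigal = [
        "prodigal", "-p", "meta",          # "meta" mode as stated in the text
        "-i", contigs,
        "-d", f"{sample}.genes.fna",       # nucleotide coding sequences
        "-a", f"{sample}.proteins.faa",    # translated proteins
        "-o", f"{sample}.genes.gbk",
    ]
    cdhit = [
        "cd-hit-est",                      # assumption: nucleotide clustering
        "-i", f"{sample}.genes.fna",
        "-o", f"{sample}.nr.fna",
        "-c", "0.95",                      # 95% identity threshold
    ]
    return [megahit, prodigal, cdhit]

# Hypothetical sample and read file names
cmds = assembly_commands("XB2_5m", "XB2_5m_R1.fq.gz", "XB2_5m_R2.fq.gz")
```

Each list could be passed to `subprocess.run` once the corresponding tool is installed and the input files exist.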
Contigs longer than 1 kb were selected for metagenomic binning. To recover high-quality MAGs, each sample assembly or co-assembly was binned using a combination of tools, including BASALT (via MetaBAT2 v2.12.1, MaxBin2 v.2.2.4, and CONCOCT v1.1.0 with more-sensitivity parameter)37–40, metaWRAP (via MetaBAT2 v2.12.1 and CONCOCT v1.1.0)41, MetaBinner v1.4.442, MetaCoAG v1.143, SemiBin v1.5.1 (single_easy_bin, --self-supervised)44, Vamb v4.1.045 and MetaDecoder v1.0.1846 with default parameters. The resulting bins were then pre-assessed and quality-filtered using MDMcleaner v0.8.747, retaining only bins with completeness ≥50% and contamination ≤10%. All these bins were further dereplicated into unique MAGs using dRep v3.4.048 (-comp 50 -con 10 options) at 99% ANI. Completeness and contamination were estimated using CheckM v.1.2.149, and on this basis the MAGs were classified into high- and medium-quality classes according to the MIMAG criteria15.
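The quality tiers referred to above can be written as a small decision function. The thresholds follow the published MIMAG definitions (high quality: completeness >90%, contamination <5%, presence of 23S, 16S and 5S rRNA genes and at least 18 tRNAs; medium quality: completeness ≥50% and contamination <10%). The function and argument names are illustrative, the "excluded" label for bins above the contamination limit is our own shorthand, and the text does not specify how the rRNA and tRNA criteria were evaluated in this study, so that part is an assumption.

```python
def mimag_quality(completeness, contamination, has_all_rrnas=False, trna_count=0):
    """Assign a MIMAG-style quality tier to one genome bin.

    completeness, contamination: percentages (e.g. from CheckM).
    has_all_rrnas: True if 23S, 16S and 5S rRNA genes were all detected.
    trna_count: number of distinct tRNAs detected.
    """
    if (completeness > 90 and contamination < 5
            and has_all_rrnas and trna_count >= 18):
        return "high"
    if completeness >= 50 and contamination < 10:
        return "medium"
    if contamination < 10:
        return "low"
    return "excluded"  # hypothetical label for bins failing the contamination limit

# Toy examples with made-up CheckM-like values
tiers = [
    mimag_quality(95.2, 1.3, has_all_rrnas=True, trna_count=20),
    mimag_quality(95.2, 1.3, has_all_rrnas=False, trna_count=20),
    mimag_quality(62.0, 7.5),
    mimag_quality(35.0, 2.0),
    mimag_quality(80.0, 12.0),
]
```

The second example shows why a nearly complete genome can still fall into the medium tier: without the full rRNA complement it does not meet the high-quality definition.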
Taxonomic annotation and phylogenomic analysis
The final 1887 MAGs were taxonomically classified using GTDB-Tk v2.1.1 with the reference GTDB release 21416. The archaeal and bacterial phylogenomic trees were constructed using protein sequences of 41 single-copy marker genes extracted from these MAGs50,51. Sequences were aligned using MAFFT v7.52052 and automatically trimmed using trimAl v1.4.1 (-automated1)53. The alignments were concatenated using catfasta2phyml v1.1.0 (https://github.com/nylander/catfasta2phyml) and missing data were filled with gaps. The maximum-likelihood (ML) phylogenomic trees were constructed using IQ-TREE v2.0.3 with 1000 ultrafast bootstrap replicates (-m LG+R10 -B 1000)54, and were visualized and annotated using the Interactive Tree of Life (iTOL) web tool55.
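The concatenation step, in which a genome lacking a given marker receives a run of gaps of that marker's alignment length, can be sketched in a few lines. This is an illustration of the idea rather than the catfasta2phyml implementation; the function name, taxon labels and toy sequences are hypothetical.

```python
def concatenate_alignments(alignments):
    """Join per-marker alignments into one supermatrix.

    alignments: list of {taxon: aligned sequence}, one dict per marker gene.
    Taxa missing from a marker are padded with '-' for that marker's width.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must share a length")
        width = widths.pop()
        for taxon in taxa:
            # Fill missing data with gaps so every row keeps the same length
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Two toy marker alignments; MAG_B lacks the second marker
marker1 = {"MAG_A": "MK-", "MAG_B": "MKL"}
marker2 = {"MAG_A": "GG"}
supermatrix = concatenate_alignments([marker1, marker2])
```

Every row of the resulting supermatrix has the same length, which is what tree-building programs expect of a concatenated alignment.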
Data Records
Raw reads generated in this study have been deposited at the NCBI Sequence Read Archive (SRA) database under the BioProject number PRJNA880762 (ref. 56), including accession numbers for both amplicon and metagenomic sequencing reads. MAGs have been deposited at GenBank under the same NCBI BioProject56. ASVs, metagenomic assemblies and MAGs generated in this study have been deposited at Figshare57. The functional annotations of both contigs and MAGs have also been deposited into the same Figshare repository57.
Technical Validation
All raw data processing steps, including the software and parameters used in this study, are described in the Methods section. The quality of clean reads was assessed using FastQC v0.11.8, and the quality of the MAGs was assessed using CheckM v.1.2.149. Gene annotation of MAGs was performed using Prokka v1.14.558. To investigate the novelty of the MAGs recovered in this study, they were compared with MAGs from diverse SCS habitats, including cold seeps17, deep-sea sediments18 and subtropical estuaries19, as well as with OceanDNA20 and Tara Oceans21, using dRep v3.4.048 (-comp 50 -con 10 options) at 95% average nucleotide identity.
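The novelty criterion can be illustrated with a simplified stand-in for the dRep comparison: a MAG is counted as novel when its best ANI against every reference genome falls below 95%. dRep itself clusters genomes rather than tabulating best hits, so the data structure, function name and toy values below are assumptions made for illustration only.

```python
def novel_mags(best_ani, threshold=95.0):
    """Return the MAGs with no reference match at or above the ANI threshold.

    best_ani: {mag_id: highest ANI (%) to any reference genome, or None
    when no comparable reference was found}.
    """
    return sorted(
        mag for mag, ani in best_ani.items()
        if ani is None or ani < threshold
    )

# Toy best-hit table with hypothetical MAG identifiers
best_hits = {
    "MAG_001": 99.1,   # matches a reference at species level
    "MAG_002": 94.9,   # just below the threshold
    "MAG_003": None,   # no comparable reference genome
    "MAG_004": 95.0,   # exactly at the threshold, treated as a match
}
novel = novel_mags(best_hits)
novel_fraction = len(novel) / len(best_hits)
```

Dividing the number of novel MAGs by the total gives a novelty percentage of the kind reported for this dataset.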
Acknowledgements
This work was supported by the National Natural Science Foundation of China (grant nos. 42276163, 92051117, 32170108, 42188102), by Shenzhen Science, Technology and Innovation Commission Program (JCYJ20220530115401003), by the MEL Visiting Fellowship of Xiamen University (MELRS2210), and by Guangdong Basic and Applied Basic Research Foundation (2021B1515120080). The sequencing and logistics were supported by the Science and Technology Innovation 2025 Major Project of Ningbo City (grant no. 2022Z189), the Independent Research Projects of Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (grant No. SML2021SP204), and the National Science and Technology Basic Resources Investigation Program of China (2018FY100206). We thank Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) for the cruise support (SML2020SI1001). We are also grateful to colleagues from the “seamount team” for their help in field sampling.
Author contributions
S.H. and H.J. conceived this study. S.X. and H.H. conducted field sampling and DNA extraction. S.X., H.H. and S.C. analyzed the amplicon data, assembled the metagenomes, generated the MAGs and produced all figures under the supervision of S.H. and H.J. S.X., H.H. and S.C. interpreted the results and wrote the first draft. S.H. and MZA revised the draft. W.X., MZA and H.J. reviewed and edited the draft. All authors reviewed and contributed to the final version of the manuscript.
Code availability
All versions of third-party software and scripts used in this study are described and referenced accordingly in the Methods section.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Shuaishuai Xu, Hailong Huang, Songze Chen.
Contributor Information
Haibo Jiang, Email: jianghaibo@nbu.edu.cn.
Shengwei Hou, Email: housw@sustech.edu.cn.
Supplementary information
The online version contains supplementary material available at 10.1038/s41597-024-03050-4.
References
- 1. Zhang Y, et al. Community differentiation of bacterioplankton in the epipelagic layer in the South China Sea. Ecol. Evol. 2018;8:4932–4948. doi: 10.1002/ece3.4064.
- 2. Zhang Y, Zhao Z, Dai M, Jiao N, Herndl GJ. Drivers shaping the diversity and biogeography of total and active bacterial communities in the South China Sea. Mol. Ecol. 2014;23:2260–2274. doi: 10.1111/mec.12739.
- 3. Ning, X. et al. Physical-biological oceanographic coupling influencing phytoplankton and primary production in the South China Sea. J. Geophys. Res. Oceans 109, (2004).
- 4. Tian J, Qu T. Advances in research on the deep South China Sea circulation. Chin. Sci. Bull. 2012;57:3115–3120. doi: 10.1007/s11434-012-5269-x.
- 5. Li H, Zhou H, Yang S, Dai X. Stochastic and Deterministic Assembly Processes in Seamount Microbial Communities. Appl. Environ. Microbiol. 2023;0:e00701–23. doi: 10.1128/aem.00701-23.
- 6. Becker, J. W. et al. Closely related phytoplankton species produce similar suites of dissolved organic matter. Front. Microbiol. 5, (2014).
- 7. Ma J, et al. Control factors of DIC in the Y3 seamount waters of the Western Pacific Ocean. J. Oceanol. Limnol. 2020;38:1215–1224. doi: 10.1007/s00343-020-9314-3.
- 8. Zhao H, et al. Vertically Exported Phytoplankton (<20 µm) and Their Correlation Network With Bacterioplankton Along a Deep-Sea Seamount. Front. Mar. Sci. 2022;9:862494. doi: 10.3389/fmars.2022.862494.
- 9. Mendonça A, et al. Is There a Seamount Effect on Microbial Community Structure and Biomass? The Case Study of Seine and Sedlo Seamounts (Northeast Atlantic). PLoS ONE. 2012;7:e29526. doi: 10.1371/journal.pone.0029526.
- 10. Clark MR, et al. The Ecology of Seamounts: Structure, Function, and Human Impacts. Annu. Rev. Mar. Sci. 2010;2:253–278. doi: 10.1146/annurev-marine-120308-081109.
- 11. Mohn C, et al. Dynamics of currents and biological scattering layers around Senghor Seamount, a shallow seamount inside a tropical Northeast Atlantic eddy corridor. Deep Sea Res. Part Oceanogr. Res. Pap. 2021;171:103497. doi: 10.1016/j.dsr.2021.103497.
- 12. Huang H, et al. Diversity and Distribution of Harmful Algal Bloom Species from Seamount to Coastal Waters in the South China Sea. Microbiol. Spectr. 2023;11:e04169–22. doi: 10.1128/spectrum.04169-22.
- 13. Teeling H, et al. Substrate-Controlled Succession of Marine Bacterioplankton Populations Induced by a Phytoplankton Bloom. Science. 2012;336:608–611. doi: 10.1126/science.1218344.
- 14. Ding W, Chen Y, Sun Z, Cheng Z. Chemical compositions and precipitation timing of basement calcium carbonate veins from the South China Sea. Mar. Geol. 2017;394:116–124. doi: 10.1016/j.margeo.2017.11.012.
- 15. Bowers RM, et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 2017;35:725–731. doi: 10.1038/nbt.3893.
- 16. Rinke C, et al. A standardized archaeal taxonomy for the Genome Taxonomy Database. Nat. Microbiol. 2021;6:946–959. doi: 10.1038/s41564-021-00918-8.
- 17. Zhang H, et al. Metagenome sequencing and 768 microbial genomes from cold seep in South China Sea. Sci. Data. 2022;9:480. doi: 10.1038/s41597-022-01586-x.
- 18. Huang J-M, Baker BJ, Li J-T, Wang Y. New Microbial Lineages Capable of Carbon Fixation and Nutrient Cycling in Deep-Sea Sediments of the Northern South China Sea. Appl. Environ. Microbiol. 2019;85:e00523–19. doi: 10.1128/AEM.00523-19.
- 19. Zhou L, Huang S, Gong J, Xu P, Huang X. 500 metagenome-assembled microbial genomes from 30 subtropical estuaries in South China. Sci. Data. 2022;9:310. doi: 10.1038/s41597-022-01433-z.
- 20. Nishimura Y, Yoshizawa S. The OceanDNA MAG catalog contains over 50,000 prokaryotic genomes originated from various marine environments. Sci. Data. 2022;9:305. doi: 10.1038/s41597-022-01392-5.
- 21. Paoli L, et al. Biosynthetic potential of the global ocean microbiome. Nature. 2022;607:111–118. doi: 10.1038/s41586-022-04862-3.
- 22. Huang H, Xu Q, Gibson K, Chen Y, Chen N. Molecular characterization of harmful algal blooms in the Bohai Sea using metabarcoding analysis. Harmful Algae. 2021;106:102066. doi: 10.1016/j.hal.2021.102066.
- 23. Parada AE, Needham DM, Fuhrman JA. Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples: Primers for marine microbiome studies. Environ. Microbiol. 2016;18:1403–1414. doi: 10.1111/1462-2920.13023.
- 24. Needham DM, Fuhrman JA. Pronounced daily succession of phytoplankton, archaea and bacteria following a spring bloom. Nat. Microbiol. 2016;1:1–7. doi: 10.1038/nmicrobiol.2016.5.
- 25. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17:10–12. doi: 10.14806/ej.17.1.200.
- 26. McNichol J, Berube PM, Biller SJ, Fuhrman JA. Evaluating and Improving Small Subunit rRNA PCR Primer Coverage for Bacteria, Archaea, and Eukaryotes Using Metagenomes from Global Ocean Surveys. mSystems. 2021;6:e00565–21. doi: 10.1128/mSystems.00565-21.
- 27. Yeh Y-C, Fuhrman JA. Contrasting diversity patterns of prokaryotes and protists over time and depth at the San-Pedro Ocean Time series. ISME Commun. 2022;2:1–12. doi: 10.1038/s43705-022-00121-8.
- 28. Quast C, et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41:D590–D596. doi: 10.1093/nar/gks1219.
- 29. Guillou L, et al. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy. Nucleic Acids Res. 2013;41:D597–D604. doi: 10.1093/nar/gks1160.
- 30. Callahan BJ, et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods. 2016;13:581–583. doi: 10.1038/nmeth.3869.
- 31. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
- 32. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31:1674–1676. doi: 10.1093/bioinformatics/btv033.
- 33. Li D, et al. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods San Diego Calif. 2016;102:3–11. doi: 10.1016/j.ymeth.2016.02.020.
- 34. Hyatt D, et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010;11:119. doi: 10.1186/1471-2105-11-119.
- 35. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28:3150–3152. doi: 10.1093/bioinformatics/bts565.
- 36. Aramaki T, et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics. 2020;36:2251–2252. doi: 10.1093/bioinformatics/btz859.
- 37. Yu, K. et al. Recovery of high-qualitied genomes from a deep-inland salt lake using BASALT. BioRxiv Prepr. Serv. Biol. 10.1101/2021.03.05.434042 (2021).
- 38. Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ. 2015;3:e1165. doi: 10.7717/peerj.1165.
- 39. Alneberg J, et al. Binning metagenomic contigs by coverage and composition. Nat. Methods. 2014;11:1144–1146. doi: 10.1038/nmeth.3103.
- 40. Wu Y-W, Simmons BA, Singer SW. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics. 2016;32:605–607. doi: 10.1093/bioinformatics/btv638.
- 41. Uritskiy GV, DiRuggiero J, Taylor J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. 2018;6:158. doi: 10.1186/s40168-018-0541-1.
- 42. Wang Z, Huang P, You R, Sun F, Zhu S. MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities. Genome Biol. 2023;24:1. doi: 10.1186/s13059-022-02832-6.
- 43. Mallawaarachchi, V. & Lin, Y. MetaCoAG: Binning Metagenomic Contigs via Composition, Coverage and Assembly Graphs. in Research in Computational Molecular Biology (ed. Pe’er, I.) vol. 13278 70–85 (Springer International Publishing, Cham, 2022).
- 44. Pan S, Zhu C, Zhao X-M, Coelho LP. A deep siamese neural network improves metagenome-assembled genomes in microbiome datasets across different environments. Nat. Commun. 2022;13:2326. doi: 10.1038/s41467-022-29843-y.
- 45. Líndez PP, et al. Adversarial and variational autoencoders improve metagenomic binning. Commun. Biol. 2023;6:1073. doi: 10.1038/s42003-023-05452-3.
- 46. Liu C-C, et al. MetaDecoder: a novel method for clustering metagenomic contigs. Microbiome. 2022;10:46. doi: 10.1186/s40168-022-01237-8.
- 47. Vollmers J, Wiegand S, Lenk F, Kaster A-K. How clear is our current view on microbial dark matter? (Re-)assessing public MAG & SAG datasets with MDMcleaner. Nucleic Acids Res. 2022;50:e76–e76. doi: 10.1093/nar/gkac294.
- 48. Olm MR, Brown CT, Brooks B, Banfield JF. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 2017;11:2864–2868. doi: 10.1038/ismej.2017.126.
- 49. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043–1055. doi: 10.1101/gr.186072.114.
- 50. Sunagawa S, et al. Metagenomic species profiling using universal phylogenetic marker genes. Nat. Methods. 2013;10:1196–1199. doi: 10.1038/nmeth.2693.
- 51. Martinez-Gutierrez CA, Aylward FO. Phylogenetic Signal, Congruence, and Uncertainty across Bacteria and Archaea. Mol. Biol. Evol. 2021;38:5514–5527. doi: 10.1093/molbev/msab254.
- 52. Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol. 2013;30:772–780. doi: 10.1093/molbev/mst010.
- 53. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25:1972–1973. doi: 10.1093/bioinformatics/btp348.
- 54. Minh BQ, et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol. Biol. Evol. 2020;37:1530–1534. doi: 10.1093/molbev/msaa015.
- 55. Letunic I, Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49:W293–W296. doi: 10.1093/nar/gkab301.
- 56. 2022. NCBI Sequence Read Archive. SRP397785
- 57. Xu S. 2023. The South China Sea metagenomic datasets. Figshare.
- 58. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30:2068–2069. doi: 10.1093/bioinformatics/btu153.