Scientific Data. 2026 Feb 11;13:422. doi: 10.1038/s41597-026-06812-4

35 metagenomic datasets from the northern and southern parts of the Yap trench sediments

Mingyang Niu 1, Lulu Fu 2, Qingyun Yan 1, Zhili He 1, Dong Li 3, Yu Zhen 4, Minxiao Wang 2, Chaolun Li 2,5,6,7
PMCID: PMC13005013  PMID: 41673004

Abstract

The hadal trench is the deepest part of the global ocean and harbors highly abundant microbial cells. However, the diversity and function of the majority of microbial communities in this part of the ocean are still unclear. Here, we collected 35 metagenomes from three push cores across different sites in both the northern and southern Yap trench to construct a comprehensive gene and genome dataset. A total of 32 million non-redundant genes were predicted from the whole metagenome datasets, with 63% assigned to known functional groups based on currently available databases. A total of 404 metagenome-assembled genomes (MAGs) with completeness >50% and contamination <10% were retrieved, and their taxonomy was highly diverse across 26 phyla. Alpha- and Gammaproteobacteria, Phycisphaerae, Nitrospiria, and Dehalococcoidia were dominant classes across all samples. The nonredundant gene and MAG datasets are valuable resources for advancing our understanding of the diversity, composition, and functions of microbiota in the sediment of the hadal trench.

Subject terms: Marine biology, Metagenomics

Background & Summary

The hadal zone is the deepest habitat of the ocean, comprising regions with water depths >6000 meters. It accounts for approximately 1%–2% of the global benthic area but constitutes the deepest 45% of the vertical depth gradient [1]. Tectonically, the hadal zone lies in subduction zones, which create V-shaped depressions that form a unique topographic feature of the deep ocean [2,3]. The geophysical and geochemical features of the hadal zone are distinct from those of other deep-ocean habitats [1,4]. Topography, geographical isolation, and spatio-temporal variation in food supply, together with low temperature and extremely high hydrostatic pressure, create a unique habitat that accommodates a diverse and active microbial community [4]. With advances in deep-sea sampling technologies and high-throughput sequencing, knowledge of the hadal biosphere has improved considerably. Sediments of the hadal zone harbor microbial communities with high abundance and diverse metabolic functions, showing clear shifts in composition and assembly strategies from bathyal and abyssal sediments to deep hadal zones [5–9]. Heterotrophic microorganisms dominate the communities in hadal sediments [10,11]. They are able to degrade various organic compounds, such as aromatic compounds, alkanes, and long-chain hydrocarbons, as revealed by previous metagenome sequencing-based analyses [5,6,8,12]. Further, growing evidence indicates that chemoautotrophic carbon fixation occurs within hadal trenches [13,14]. Despite the increasing number of studies of the hadal biosphere, current microbiome data are not sufficient for a comprehensive investigation of microbial diversity and function in hadal trench sediments. Therefore, knowledge of the diversity, composition, and function of microbial communities in hadal sediments remains limited.

The Yap Trench is located at the southern end of the Philippine Sea Plate, in a region of tectonic convergence among the Philippine Sea, Pacific, and Caroline plates in the southwestern Pacific Ocean. Three trenches, namely the Yap Trench, the Mariana Trench, and the Palau Trench, were created by these plate collisions [15]. The Yap Trench lies between the Mariana Trench and the Palau Trench; it is about 700 km long and about 50 km wide from the trench axis to the island arc. This is much narrower than other arc-trench systems, giving the trench a sharp “V” shape. The Yap Trench is divided into northern and southern sections, with the boundary placed at 8°26′N on the basis of its relation to the Caroline Ridge [16,17]. The geological, geophysical, and geochemical characteristics differ between the two sections [16,17]. For instance, the southern Yap Trench has a gentler trench slope and lower seismic intensity than the northern section [18]. Additionally, the concentration of organic matter in the sediment is higher in the southern section than in the northern section [19,20]. These contrasting characteristics may shape different microbial communities in the sediments of the two sections, and comparing them can broaden our understanding of microbial community functions in the hadal zone.

To better understand the diversity, composition, and function of sediment microbial communities in the Yap Trench, and to compare the microbiomes of its northern and southern parts, we collected three push cores from different water depths, covering abyssal (Sites 1 and 2) and hadal trench (Site 3) regions in the northern and southern parts of the Yap Trench. A total of 35 metagenomes were obtained from the top to bottom layers of the three push cores (Fig. 1A and Supplementary Table S1). Through metagenome assembly and binning (Fig. 1B), we obtained 32 million non-redundant predicted genes and 404 metagenome-assembled genomes (MAGs) with completeness >50% and contamination <10% from the whole dataset. Of these, 142 MAGs had an estimated completeness >70%, accounting for 35% of all MAGs (Supplementary Table S2). Based on the taxonomic classification and relative abundance of these MAGs, Alpha- and Gammaproteobacteria, Phycisphaerae, Nitrospiria, and Dehalococcoidia were the dominant classes across all samples. Gammaproteobacteria and Acidimicrobiia were highly abundant in abyssal sediment, while Alphaproteobacteria and Dehalococcoidia were the dominant classes in the hadal sediments (Fig. 2). The assembled contigs of each sample were pooled, and redundant genes were removed by clustering at a sequence similarity cutoff of 99% (Fig. 1B). After clustering, 3,976,582 non-redundant genes were retrieved from the datasets. We searched these non-redundant genes against the KEGG, Pfam, CAZy, and eggNOG databases to predict their functions, and 63% of them could be assigned to known genes in the databases (Fig. 3). The MAGs with more than 70% completeness and less than 10% contamination were used to construct a phylogenomic tree (Fig. 4). The taxonomy of the MAGs spanned 26 phyla, indicating a highly diverse microbial community in Yap Trench sediment (Fig. 4). Among them, the largest numbers of MAGs were affiliated with Pseudomonadota (n = 50), Acidobacteriota (n = 17), and Chloroflexota (n = 8). The archaeal MAGs belonged to Thermoproteota and Nanoarchaeota (Supplementary Table S2). These datasets will enable further study of the diversity, composition, and function of microbiota in the hadal trench, and highlight their critical roles in the hadal biosphere.

Fig. 1.


The map of sampling sites and the pipeline of metagenome analysis. (A) Sampling location. The red dots show the location of the sampling sites. (B) The pipeline of metagenomic analysis for sediment samples.

Fig. 2.


The relative abundance and distribution of recovered MAGs at the three sites. The stacked bar plot is based on the relative abundance of MAGs obtained in this study.

Fig. 3.


Functional characterization of the non-redundant gene catalog. Non-annotation indicates that these genes were not annotated in at least one of the following databases: eggNOG, Pfam, KEGG, and CAZy. The filled-in cells in the lower panel indicate which databases are part of each intersection. The vertical bars in the top panel represent the number of annotated genes in each intersection, that is, the genes shared between the databases marked in that intersection. The horizontal bars in the left panel indicate the total number of annotated genes in each database.
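The counts behind such an intersection plot can be sketched in Python. This is an illustration only, not the code used to draw Fig. 3: the gene names and hits in `annotations` are hypothetical, `upset_counts` is a made-up helper name, and a gene with no hit in any database is treated as unannotated, which is an assumption about how the figure was built.

```python
from collections import Counter

# Hypothetical input: each gene mapped to the set of databases that annotated it.
annotations = {
    "gene_1": {"KEGG", "Pfam", "eggNOG"},
    "gene_2": {"Pfam", "eggNOG"},
    "gene_3": {"eggNOG"},
    "gene_4": set(),  # no hit in any database
    "gene_5": {"KEGG", "Pfam", "eggNOG", "CAZy"},
    "gene_6": {"Pfam", "eggNOG"},
}


def upset_counts(annotations):
    """Return (exclusive intersection sizes, per-database totals)."""
    intersections = Counter()  # vertical bars: genes with exactly this combination
    totals = Counter()         # horizontal bars: all genes annotated by a database
    for databases in annotations.values():
        intersections[frozenset(databases)] += 1
        totals.update(databases)
    return intersections, totals


intersections, totals = upset_counts(annotations)

# Genes with at least one annotation (non-empty combination).
annotated = sum(n for combo, n in intersections.items() if combo)
print(f"{annotated} of {len(annotations)} genes annotated")
print("Pfam total:", totals["Pfam"])
print("Pfam and eggNOG only:", intersections[frozenset({"Pfam", "eggNOG"})])
```

Keying the intersections on a `frozenset` keeps each gene in exactly one vertical bar, while the per-database totals count a gene once for every database that annotated it.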

Fig. 4.


Phylogenetic tree of MAGs including bacteria and archaea. The tree was constructed from a concatenated alignment of 37 conserved single-copy proteins. The black points on the branches represent bootstrap values >0.7. The MAGs recovered in this study are labeled in red. Phyla are color-coded, and taxonomy follows the Genome Taxonomy Database (GTDB). The gray bars in the outer circle indicate the relative abundance of MAGs.

Methods

Sample collection

Three push cores were retrieved from the western trench slope of the Yap Trench during the R/V Xiangyanghong 10th cruise with the manned submersible Jiaolong (Fig. 1). Each core was subsampled on board with sterilized tools, at 1-cm intervals down to 10 cm and at 2-cm intervals below 10 cm. Only the interior sections of the sediment were used for microbiological study to avoid potential contamination [21]. A total of 35 subsamples from the 3 push cores were analyzed (Supplementary Table S1). Sediments for microbiological analyses were stored at −80 °C until further processing.
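The sectioning scheme can be sketched in a few lines of Python. This is an illustration only: the function name `section_intervals` and the 20-cm core length are hypothetical, and the actual depth of each subsample is given in Supplementary Table S1.

```python
def section_intervals(core_length_cm, fine_limit_cm=10, fine_step_cm=1, coarse_step_cm=2):
    """Return (top, bottom) depth intervals in cm for one push core.

    Assumes 1-cm slices down to 10 cm and 2-cm slices below that,
    following the sectioning described in the text.
    """
    intervals = []
    top = 0
    while top < core_length_cm:
        step = fine_step_cm if top < fine_limit_cm else coarse_step_cm
        bottom = min(top + step, core_length_cm)  # last slice may be shorter
        intervals.append((top, bottom))
        top = bottom
    return intervals


# Hypothetical 20-cm core: ten 1-cm slices followed by five 2-cm slices.
example = section_intervals(20)
print(len(example), example[0], example[-1])
```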

DNA extraction and sequencing

Total DNA was extracted from the sediments with the PowerSoil DNA Isolation Kit (Qiagen, Germany) according to the manufacturer’s instructions. The DNA was purified and concentrated with the Genomic DNA Clean & Concentrator kit (Zymo Research, USA). DNA was fragmented with a Covaris instrument (Covaris, USA), and fragments of 300–500 bp were selected to construct libraries with the Illumina Nextera DNA library kit (Illumina, USA). Libraries were sequenced on the Illumina HiSeq X-Ten platform (Wuhan Onemore-tech Co., Ltd.).

Metagenome assembly and binning

Raw reads were trimmed using Trimmomatic v0.39. The clean reads of each sample were assembled using MEGAHIT v1.2.9 with the parameters ‘--k-min 21 --k-max 144 --k-step 10’ [22]. Contigs longer than 1000 bp were used for downstream analysis. Contig coverage was determined using BWA (v0.7.17; BWA-MEM algorithm) [23]. Binning was performed with the metaWRAP binning module (v1.3.2; parameters: --metabat2, --maxbin2, --concoct, -m 2000) [24] and with VAMB [25] using default parameters. The reconstructed MAGs were refined using the ‘bin_refinement’ module of MetaWRAP v1.3.2, and their quality and taxonomy were determined using CheckM2 v1.0.2 [26] and GTDB-Tk v2.4.0 [27] with the GTDB-Tk reference database (version 220), respectively. MAGs with completeness greater than 50% and contamination less than 10% were used for downstream analysis. A total of 404 representative MAGs were obtained by dereplication at an average nucleotide identity (ANI) cutoff of 95% with dRep v3.5.0 [28]. The coverage of each MAG was calculated using CoverM in genome mode (v0.6.1; https://github.com/wwood/CoverM; parameters: --min-read-percent-identity 0.95, --min-read-aligned-percent 0.75, --trim-min 0.10, --trim-max 0.90, -m relative_abundance).
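The two quality tiers used in this study (completeness >50% and contamination <10% for retention, and completeness >70% for the phylogeny) can be applied to a quality report with a short Python sketch. The table below is made up for illustration, the column names `Name`, `Completeness`, and `Contamination` are assumed to match the quality report format, and `classify_mags` is a hypothetical helper rather than part of any tool named above.

```python
import csv
import io

# Hypothetical excerpt of a tab-separated quality report; values are invented.
REPORT = """Name\tCompleteness\tContamination
bin.1\t92.4\t1.3
bin.2\t68.0\t4.9
bin.3\t55.1\t12.7
bin.4\t43.8\t0.5
"""


def classify_mags(report_text, min_comp=50.0, max_cont=10.0, tree_comp=70.0):
    """Split MAGs into those retained and those also eligible for the tree."""
    retained, for_tree = [], []
    for row in csv.DictReader(io.StringIO(report_text), delimiter="\t"):
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness > min_comp and contamination < max_cont:
            retained.append(row["Name"])
            if completeness > tree_comp:
                for_tree.append(row["Name"])
    return retained, for_tree


retained, for_tree = classify_mags(REPORT)
print("retained:", retained)
print("phylogeny:", for_tree)
```

In this toy table, bin.3 is rejected for contamination and bin.4 for completeness, while bin.2 is retained but falls below the 70% threshold for tree building.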

Functional gene annotation and phylogenetic analysis

The open reading frames (ORFs) of the genomes and contigs were predicted using Prodigal v2.6.3 [29] with the ‘-p meta’ parameter and then annotated against the Kyoto Encyclopedia of Genes and Genomes (KEGG; version of 1 January 2025) using KofamScan v1.3.0 [30] with E-values ≤ 1e-20, and against TIGRFAMs [31] using hmmscan (v3.3.2) [32]. Peptidase- and proteinase-encoding genes were annotated against the MEROPS database 12.4 [33] using DIAMOND blastp v0.9.14 [34] with thresholds of coverage >40% and E-value < 1e-20.
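The hit thresholds for the peptidase search can be written as a small Python filter. This is a sketch under stated assumptions: the `Hit` record and its fields are hypothetical, the example hits are invented, and coverage is taken here to mean the aligned fraction of the query, which the text does not specify.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One hypothetical alignment hit (not a real tool's output format)."""
    query: str
    target: str
    evalue: float
    aligned_len: int
    query_len: int


def passes(hit, min_cov=40.0, max_evalue=1e-20):
    """Keep hits with coverage > 40% and E-value < 1e-20."""
    coverage = 100.0 * hit.aligned_len / hit.query_len  # assumed: query coverage
    return coverage > min_cov and hit.evalue < max_evalue


hits = [
    Hit("gene_1", "family_A", 1e-50, 300, 400),  # 75% coverage, strong E-value
    Hit("gene_2", "family_B", 1e-25, 100, 400),  # 25% coverage, too short
    Hit("gene_3", "family_C", 1e-10, 350, 400),  # E-value too weak
]

kept = [hit.query for hit in hits if passes(hit)]
print(kept)
```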

We used the 142 MAGs with completeness >70% and contamination <10% to construct the phylogenetic tree. A concatenated alignment of 37 conserved single-copy genes, identified with hidden Markov model profiles, was used for phylogenetic analysis with IQ-TREE (v2.2.0.3) [35] under the best-fit model (Q.pfam + I + I + R9) with 1000 ultrafast bootstrap replicates. The tree file was edited using the online tool iTOL (https://itol.embl.de/).

Data Records

The 35 raw metagenome sequences are available in the NCBI Sequence Read Archive (SRA) under BioProject number PRJNA1314173 [36] and accession number SRP617897 [37]. A total of 404 non-redundant MAGs in FASTA format from these metagenomes are available at the European Nucleotide Archive (ENA) under accession codes PRJEB106968 [38], PRJEB106969 [39], and PRJEB106914 [40]. Detailed information on these qualified MAGs, including genomic quality, GTDB taxonomy, accession number, and relative abundance, is given in Supplementary Table S2.
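For convenience, the accessions above can be turned into landing-page links with a few lines of Python. The URL prefixes are those used in the reference list of this article; the dictionary keys are descriptive labels chosen here, and no download is attempted.

```python
# URL prefixes as they appear in this article's data citations.
ENA_BROWSER = "https://www.ebi.ac.uk/ena/browser/view/"
IDENTIFIERS = "https://identifiers.org/ncbi/"

records = {
    "raw reads (BioProject)": IDENTIFIERS + "bioproject:PRJNA1314173",
    "raw reads (SRA study)": IDENTIFIERS + "insdc.sra:SRP617897",
}

# The three ENA projects that hold the MAGs.
for accession in ("PRJEB106968", "PRJEB106969", "PRJEB106914"):
    records[f"MAGs ({accession})"] = ENA_BROWSER + accession

for label, url in records.items():
    print(f"{label}: {url}")
```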

Technical Validation

To avoid contamination of the sediment samples, all sampling tools and containers were sterilized before sampling, and only the interior sections of each sediment core were collected for DNA extraction. After collection, the sediment samples were stored at −80 °C until further processing. All DNA extraction and library construction steps were carried out in an ultra-clean lab. To ensure the quality of gene prediction, we selected assembled contigs longer than 1000 bp. To maximize the number of MAGs, contigs longer than 1000 bp were binned with four different tools, namely CONCOCT, MetaBAT2, MaxBin2, and VAMB. The quality of the MAGs was assessed with CheckM2. MAGs with completeness >50% and contamination <10% were considered high quality. To increase the accuracy of the phylogenetic analysis, we used only MAGs with completeness >70% and contamination <10% to construct the phylogenetic tree.
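The contig length filter is simple enough to show as a self-contained Python sketch. It is illustrative only: `read_fasta` and `filter_contigs` are hypothetical helpers written for this example, the sequences are synthetic, and "longer than 1000 bp" is read as a strict inequality.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file-like object."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []  # keep the ID, drop the description
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(handle, min_len=1000):
    """Return contigs strictly longer than min_len bp."""
    return [(name, seq) for name, seq in read_fasta(handle) if len(seq) > min_len]


# Synthetic contigs: 1200 bp, exactly 1000 bp, and 1200 bp split over two lines.
lines = [
    ">contig_1 hypothetical", "ACGT" * 300,
    ">contig_2", "ACGT" * 250,
    ">contig_3", "AC" * 100, "GT" * 500,
]
fasta = io.StringIO("\n".join(lines) + "\n")

kept = filter_contigs(fasta)
print([(name, len(seq)) for name, seq in kept])
```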

Usage Notes

The biosphere in hadal zone sediments remains enigmatic and only partially explored. This study provides comprehensive metagenomic data from sediments retrieved at different depths of the northern and southern Yap Trench, covering abyssal and trench sediments. The datasets contain 21 and 14 metagenomes from abyssal and trench sediments, respectively. All data were analyzed with a commonly used pipeline, generating high-quality bacterial and archaeal MAGs. The datasets can be used to explore the diversity and potential metabolic functions of microorganisms inhabiting hadal sediments and to compare them with the microbiomes of other hadal trenches.

Supplementary information

Supplementary Table (50.8KB, xlsx)

Acknowledgements

This work was supported by National Natural Science Foundation of China (42030407), National Key Basic Research and Development Project of China (2015CB755904), National Natural Science Foundation of China (42006083, 41906124), Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (SML2023SP220, Dong, SML2024SP002).

Author contributions

M.N. and L.F. designed the study, performed data analysis, interpreted the data, and wrote the manuscript. D.L. collected samples. D.L., Q.Y., M.W., C.L. and Z.H. edited the manuscript.

Data availability

Metagenome sequences are deposited in the NCBI Sequence Read Archive (SRA) under BioProject number PRJNA1314173 [36] and accession number SRP617897 [37]. The 404 high-quality non-redundant MAGs retrieved from the metagenomes of the three cores are deposited in the ENA database under accession codes PRJEB106968 [38], PRJEB106969 [39], and PRJEB106914 [40], respectively. The metadata of the MAGs and the custom code are available at Figshare [41].

Code availability

The custom scripts to generate the datasets are publicly available on Figshare [41]. The parameters and versions of all bioinformatics tools used for the metagenomic analysis are described in the Methods section.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Lulu Fu, Email: fululu@qdio.ac.cn.

Dong Li, Email: lidong@sio.org.cn.

Supplementary information

The online version contains supplementary material available at 10.1038/s41597-026-06812-4.

References

  • 1. Jamieson, A. J., Fujii, T., Mayor, D. J., Solan, M. & Priede, I. G. Hadal trenches: the ecology of the deepest places on Earth. Trends in Ecology & Evolution 25(3), 190–7 (2010).
  • 2. Stewart, H. A. & Jamieson, A. J. Habitat heterogeneity of hadal trenches: Considerations and implications for future studies. Progress in Oceanography 161, 47–65 (2018).
  • 3. Stern, R. J. Subduction zones. Reviews of Geophysics 40(4), 3-1–3-38 (2002).
  • 4. Du, M. et al. Geology, environment, and life in the deepest part of the world’s oceans. The Innovation 2(2), 100109 (2021).
  • 5. Zhou, Y.-L., Mara, P., Cui, G.-J., Edgcomb, V. P. & Wang, Y. Microbiomes in the Challenger Deep slope and bottom-axis sediments. Nature Communications 13(1), 1515 (2022).
  • 6. Liu, R. et al. Novel Chloroflexi genomes from the deepest ocean reveal metabolic strategies for the adaptation to deep-sea habitats. Microbiome 10(1), 75 (2022).
  • 7. Li, Y., Cao, W., Wang, Y. & Ma, Q. Microbial diversity in the sediments of the southern Mariana Trench. Journal of Oceanology and Limnology 37(3), 1024–9 (2019).
  • 8. Xiao, X. et al. Microbial ecosystems and ecological driving forces in the deepest ocean sediments. Cell 188(5), 1363–77.e9 (2025).
  • 9. Fu, L. et al. Characteristics of the archaeal and bacterial communities in core sediments from Southern Yap Trench via in situ sampling by the manned submersible Jiaolong. Science of The Total Environment 703, 134884 (2020).
  • 10. Peoples, L. M. et al. Microbial Community Diversity Within Sediments from Two Geographically Separated Hadal Trenches. Front Microbiol 10, 2019 (2019).
  • 11. Hiraoka, S. et al. Microbial community and geochemical analyses of trans-trench sediments for understanding the roles of hadal environments. ISME J 14(3), 740–56 (2020).
  • 12. Wang, Y. et al. Genomics insights into ecotype formation of ammonia-oxidizing archaea in the deep ocean. Environ Microbiol 21(2), 716–29 (2019).
  • 13. Wenzhöfer, F. et al. Benthic carbon mineralization in hadal trenches: Assessment by in situ O2 microprofile measurements. Deep Sea Research Part I: Oceanographic Research Papers 116, 276–86 (2016).
  • 14. Luo, M., Gieskes, J., Chen, L., Shi, X. & Chen, D. Provenances, distribution, and accumulation of organic matter in the southern Mariana Trench rim and slope: Implication for carbon cycle and burial in hadal trenches. Marine Geology 386, 98–106 (2017).
  • 15. Yang, Y. et al. Geology of the Yap Trench: new observations from a transect near 10°N from manned submersible Jiaolong. International Geology Review 60(16), 1941–53 (2018).
  • 16. Xia, C.-L. et al. Geological and geophysical differences between the north and south sections of the Yap trench-arc system and their relationship with Caroline Ridge subduction. Geological Journal 55(12), 7775–89 (2020).
  • 17. Fujiwara, T. et al. Morphology and tectonics of the Yap Trench. Marine Geophysical Researches 21(1), 69–86 (2000).
  • 18. Jamieson, A. The Hadal Zone: Life in the Deepest Oceans. Cambridge: Cambridge University Press (2015).
  • 19. Li, D. et al. Spatial heterogeneity of organic carbon cycling in sediments of the northern Yap Trench: Implications for organic carbon burial. Mar Chem 223, 103813 (2020).
  • 20. Li, D. et al. Comparison of sedimentary organic carbon loading in the Yap Trench and other marine environments. Journal of Oceanology and Limnology 38(3), 619–33 (2020).
  • 21. Lever, M. A. et al. Life under extreme energy limitation: a synthesis of laboratory- and field-based investigations. FEMS Microbiology Reviews (2015).
  • 22. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31(10), 1674–6 (2015).
  • 23. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357 (2012).
  • 24. Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6(1), 158 (2018).
  • 25. Nissen, J. N. et al. Improved metagenome binning and assembly using deep variational autoencoders. Nature Biotechnology 39(5), 555–60 (2021).
  • 26. Chklovski, A., Parks, D. H., Woodcroft, B. J. & Tyson, G. W. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nature Methods 20(8), 1203–12 (2023).
  • 27. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36(6), 1925–7 (2019).
  • 28. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J 11(12), 2864–8 (2017).
  • 29. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11(1), 119 (2010).
  • 30. Aramaki, T. et al. KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics 36(7), 2251–2 (2019).
  • 31. Haft, D. H. et al. TIGRFAMs and Genome Properties in 2013. Nucleic Acids Research 41(D1), D387–D95 (2012).
  • 32. Eddy, S. R. Accelerated Profile HMM Searches. PLOS Computational Biology 7(10), e1002195 (2011).
  • 33. Rawlings, N. D. & Bateman, A. How to use the database and website to help understand peptidase specificity. Protein Science 30(1), 83–92 (2021).
  • 34. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nature Methods 12, 59 (2014).
  • 35. Minh, B. Q. et al. IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era. Mol Biol Evol 37(5), 1530–4 (2020).
  • 36. NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA1314173 (2025).
  • 37. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP617897 (2025).
  • 38. European Nucleotide Archive https://www.ebi.ac.uk/ena/browser/view/PRJEB106968 (2026).
  • 39. European Nucleotide Archive https://www.ebi.ac.uk/ena/browser/view/PRJEB106969 (2026).
  • 40. European Nucleotide Archive https://www.ebi.ac.uk/ena/browser/view/PRJEB106914 (2026).
  • 41. Mingyang, N. et al. 35 metagenomic datasets from the northern and southern slope of Yap trench sediments. Figshare 10.6084/m9.figshare.29328314 (2025).


Articles from Scientific Data are provided here courtesy of Nature Publishing Group
