Applied and Environmental Microbiology. 2026 Apr 1;92(4):e00392-26. doi: 10.1128/aem.00392-26

Metagenomic mining reveals extensive novelty, enhanced biodegradation potential, and untapped biosynthetic capacity in Chinese oilfield microbiomes

Changhao Zhou 1,2, Shuoliang Wang 1,2, Hui Zhao 3, Shiqi Wang 1, Liangliang Jiang 4, Chunlei Yu 1
Editor: John R. Spear 5
PMCID: PMC13101482  PMID: 41919968

ABSTRACT

Oil reservoir microorganisms represent a vast and largely unexplored reservoir of biological diversity and functional potential, yet comprehensive studies on their genomic and metabolic characteristics remain limited. To address this gap, we collected 101 metagenomic sequencing samples from 13 distinct oilfields across China. Through extensive de novo assembly and binning processes, we successfully reconstructed 3,057 medium and high-quality metagenome-assembled genomes (MAGs), providing an unprecedented genomic resource for reservoir microbiome research. Strikingly, 73.77% of these MAGs correspond to novel taxa at the species level, highlighting the significant unexplored microbial diversity in these environments. Detailed genomic analysis revealed that MAGs classified under the class Planctomycetia exhibited notably larger genome sizes, primarily driven by the expansion of specific gene families, suggesting adaptive evolutionary strategies in hydrocarbon-rich environments. Furthermore, we identified 68 genes implicated in anaerobic alkane biodegradation pathways, with samples from the Shengli oilfield demonstrating particularly enhanced biodegradation potential, indicating site-specific functional adaptations. Beyond biodegradation, our study uncovered three MAGs assigned to the genus Tistrella, which harbored a remarkable abundance of biosynthetic gene clusters (BGCs) for secondary metabolites. Additionally, 14 candidate antimicrobial peptides (cAMPs) were detected, signifying the potential for novel bioactive compound discovery. Critically, both the Tistrella MAGs and cAMPs were identified for the first time within petroleum reservoir ecosystems, underscoring the unique biotechnological value of these environments. 
This research not only expands our understanding of oil reservoir microbial communities but also emphasizes their substantial implications for industrial applications, including bioremediation, antimicrobial development, and sustainable resource management.

IMPORTANCE

This study provides a groundbreaking genomic exploration of oil reservoir microbiomes across 13 Chinese oilfields, reconstructing 3,057 medium and high-quality metagenome-assembled genomes (MAGs). Remarkably, 73.77% of these MAGs represent novel species, revealing vast unexplored microbial diversity. We observed genome expansion in Planctomycetia lineages and identified 68 genes involved in anaerobic alkane degradation, with heightened biodegradation potential in Shengli oilfield samples. Crucially, we discovered three Tistrella MAGs rich in biosynthetic gene clusters (BGCs) for secondary metabolites and 14 candidate antimicrobial peptides (cAMPs), both reported for the first time in petroleum reservoirs. These findings highlight the immense biotechnological potential of reservoir microbiomes, offering new pathways for bioremediation strategies in oil-contaminated environments and novel sources for antimicrobial discovery. This work underscores the critical need for continued investigation into these unique ecosystems to harness their functional capabilities for energy sustainability and pharmaceutical innovation.

KEYWORDS: oil reservoirs, metagenome, AMP, biodegradation

INTRODUCTION

Petroleum is a complex mixture consisting of alkanes, aromatic hydrocarbons, and compounds containing carbon, hydrogen, oxygen, sulfur, and nitrogen. Nevertheless, a large number of microorganisms can survive and exert various functions in reservoir environments. Microbial-enhanced oil recovery (MEOR) is a promising tertiary oil recovery technology that has attracted extensive attention from the global petroleum industry in recent years due to its advantages of environmental friendliness, simple operation, and high cost-effectiveness (1). In the MEOR process, cultured microorganisms or nutrient solutions are injected into underground oil reservoirs, where the microorganisms grow, reproduce, and release specific metabolites to improve oil recovery efficiency (2, 3). Based on different operation modes and injected microbial strains, MEOR technology is mainly classified into five types: microbial huff and puff recovery (MHPR), microbial flooding recovery (MFR), microbial selective plugging recovery (MSPR), biopolymer flooding recovery (BFR), and microbial wax removal and control (MWRC) (4). MHPR and MFR mainly utilize microbial metabolites to improve fluid properties (e.g., reducing crude oil viscosity and oil-water interfacial tension) and thus promote oil migration (5, 6). MWRC enhances oil production efficiency by degrading crystalline wax deposited in oil well bores and gathering pipelines (7).

In the three aforementioned MEOR technologies, microorganisms primarily utilize long-chain alkanes as carbon sources for metabolism; thus, the biodegradation of long-chain n-alkanes is a common phenomenon in petroleum reservoirs (8). Decades of research have confirmed that hydrocarbon-degrading bacteria employ alkylsuccinate synthase (AssA) or benzylsuccinate synthase (BssA) to catalyze the fumarate addition reaction of alkanes, generating alkylsuccinates under anaerobic conditions, with nitrate, iron, or sulfate serving as electron acceptors (9–12). In the absence of exogenous electron acceptors, syntrophic interactions between bacteria and archaea drive the biodegradation process. Bacteria first cleave long-chain alkanes into simple compounds (e.g., acetate and formate), a step in which benzoyl-CoA reductase subunits B (BcrB) and C (BcrC) participate, and methanogenic archaea subsequently convert these compounds into methane via methyl-coenzyme M reductase (Mcr) (13, 14). Recent studies have revealed that archaeal alkyl-coenzyme M reductase (Acr), a variant of Mcr, can directly activate long-chain alkanes (15). Laso-Pérez (16) identified both Acr and Mcr genes in the metagenome-assembled genomes (MAGs) of Methanoliparia. Subsequent research demonstrated that Methanoliparum archaea can independently activate long-chain alkanes (C13+) and alkyl-substituted hydrocarbons by utilizing Acr variants (17).

Beyond MEOR, researchers have also shown great interest in other functions of reservoir microorganisms, such as anaerobic ammonium oxidation (anammox) (18) and post-contamination remediation (19). These studies are mostly focused on solving practical problems in oil reservoir production, while the high biological value of the microorganisms themselves remains to be explored in depth. For instance, with the rapid development of deep learning, an increasing number of bioactive peptides have been identified from microorganisms in various environments. Antimicrobial peptides (AMPs) are a class of short peptides with broad-spectrum biological activity, generally consisting of fewer than 50 amino acids. In microorganisms, they are mainly produced via two secondary metabolic pathways, ribosomally synthesized and post-translationally modified peptides (RiPPs) and nonribosomal peptide synthetases (NRPSs), and exhibit antibacterial, antifungal, antiviral, and even anticancer activities (20). Ma (21) identified 2,349 candidate AMPs (cAMPs) from the human gut microbiome using Attention, LSTM, and BERT models, among which 181 exhibited antibacterial activity. Chen (22) discovered 121 marine-derived cAMPs, with 10 showing potent biological activity. Santos-Júnior (23) constructed a global database containing more than 860,000 peptides from 150,000 microbial genomes and verified the antimicrobial efficacy of 79 out of 100 tested cAMPs. Notably, AMPs derived from petroleum microbiomes remain largely unexplored.

In this study, we collected 101 publicly available metagenomic samples from major Chinese oilfields, conducted genome assembly and binning, investigated the core microbial taxa and interfield differences in microbial community structure, characterized the alkane biodegradation potential of microbial communities, and simultaneously mined potential AMPs from these samples. This work lays a foundation for further exploitation and utilization of petroleum microbial resources.

MATERIALS AND METHODS

Data collection

Using the keywords “Metagenome AND China oil reservoirs,” we searched PubMed for articles published between 2018 and 2025. A total of 101 metagenomic samples from 13 Chinese oilfields were collected (Table S1), and raw sequencing data were downloaded via links provided in the publications (24–29).

Metagenomic assembly, binning, and quality control

Raw sequencing data in SRA format were converted to FASTQ using SRA Toolkit (v3.1.1). Adapters and low-quality sequences were removed with SOAPnuke (v1.6.5) (30). Clean reads were assembled using MEGAHIT (v1.1.3) (31) with the following parameters: --min-count 2 --k-min 33 --k-max 83 --k-step 20 --no-mercy. A total of 101 assemblies were binned via MetaWRAP (v1.3.2) (32) using MetaBAT2, MaxBin2, and CONCOCT. Bins from all three tools were integrated with bin_refinement, retaining those with ≥50% completeness and ≤10% contamination, yielding 5,007 medium- and high-quality bins (33). These bins were clustered at 95% average nucleotide identity (ANI) using dRep (v3.2.2) (34) with the following parameters: -comp 50 -con 10 -pa 0.9 --S_ani 0.95 --cov_thresh 0.3 --strain_heterogeneity_weight 0, resulting in 3,057 representative MAGs for downstream analysis.
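The bin-retention rule described above (≥50% completeness and ≤10% contamination) can be illustrated with a minimal Python sketch. The `Bin` record and the `filter_bins` helper are hypothetical names introduced here for illustration; they are not part of MetaWRAP or dRep, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """Hypothetical record for one refined bin (values in percent)."""
    name: str
    completeness: float
    contamination: float


def filter_bins(bins, min_completeness=50.0, max_contamination=10.0):
    """Keep bins that meet the medium/high-quality thresholds from the text."""
    return [
        b for b in bins
        if b.completeness >= min_completeness
        and b.contamination <= max_contamination
    ]


# Invented example values, for illustration only.
example_bins = [
    Bin("bin.1", 92.4, 1.8),   # passes both thresholds
    Bin("bin.2", 48.0, 0.5),   # fails: completeness below 50%
    Bin("bin.3", 75.0, 12.3),  # fails: contamination above 10%
    Bin("bin.4", 50.0, 10.0),  # passes: both thresholds are inclusive
]
kept = filter_bins(example_bins)
print([b.name for b in kept])
```

In the actual workflow this filtering is performed by the MetaWRAP bin_refinement module and again by the dRep quality options; the sketch only makes the inclusive thresholds explicit.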

Taxonomic annotation and relative abundance calculation

The 3,057 MAGs were taxonomically annotated using GTDB-Tk (v2.4.1) (35) with default parameters (data set: r226). Bacterial and archaeal phylogenetic trees were constructed from aligned protein sequences (generated by GTDB-Tk) using IQ-TREE (v2.1.4) (36) and visualized with iTOL (v6.0) (37).
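As a hedged illustration of how species-level novelty can be derived from this kind of annotation, the Python sketch below treats a MAG as a candidate novel species when the `s__` field of its GTDB-style classification string is empty. The helper names and the two example strings are assumptions made for illustration; the study's own scripts are not described in the text.

```python
def species_name(classification):
    """Return the species name from a GTDB-style lineage string ('' if unassigned)."""
    for field in classification.split(";"):
        field = field.strip()
        if field.startswith("s__"):
            return field[3:].strip()
    return ""


def is_novel_species(classification):
    """A MAG without a species-level assignment is counted as novel."""
    return species_name(classification) == ""


def novel_species_percent(classifications):
    """Percentage of MAGs lacking a species-level assignment."""
    if not classifications:
        return 0.0
    novel = sum(1 for c in classifications if is_novel_species(c))
    return 100.0 * novel / len(classifications)


# Illustrative lineage strings (not taken from Table S3).
examples = [
    "d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;"
    "f__Caulobacteraceae;g__Brevundimonas;s__",
    "d__Bacteria;p__Pseudomonadota;c__Alphaproteobacteria;o__Caulobacterales;"
    "f__Caulobacteraceae;g__Brevundimonas;s__Brevundimonas diminuta",
]
print(novel_species_percent(examples))

# The proportion reported in this study: 2,255 of 3,057 MAGs.
print(round(100.0 * 2255 / 3057, 2))
```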

For relative abundance calculation, MAGs were merged into a reference genome database. Clean reads from all 101 samples were aligned to this database using Bowtie2 (v2.2.5) (38) with the following parameters: -D 10 -R 2 -N 1 -L 22 -i S,0,2.50. SAM files were converted to sorted BAM files using samtools (v1.9) (39), and paired-end reads with at least 97% identity were then retained using BBMap (v39.33) (40) (reformat.sh minidfilter=0.97 primaryonly=t pairedonly=t). Relative abundance was computed with CoverM (v0.7.0) (41), counting only genomes with ≥60% of their length covered by reads (coverm genome --proper-pairs-only --min-read-percent-identity-pair 0.97 --min-covered-fraction 0.6). Alpha diversity analysis and visualization were performed using OmicStudio (42).
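The effect of the covered-fraction cutoff on abundance estimates can be sketched as follows. This is a simplified normalization written for illustration, not CoverM's exact calculation (which also accounts for unmapped reads); the function name, the input dictionaries, and the example values are assumptions.

```python
def relative_abundance(mean_coverage, covered_fraction, min_covered_fraction=0.6):
    """Percent abundance per genome from mean read coverage.

    Genomes whose covered fraction is below the cutoff are set to zero
    before normalization, mirroring the 0.6 threshold used in the text.
    """
    kept = {
        genome: (cov if covered_fraction.get(genome, 0.0) >= min_covered_fraction else 0.0)
        for genome, cov in mean_coverage.items()
    }
    total = sum(kept.values())
    if total == 0:
        return {genome: 0.0 for genome in kept}
    return {genome: 100.0 * cov / total for genome, cov in kept.items()}


# Invented values for three hypothetical MAGs in one sample.
coverage = {"MAG_A": 30.0, "MAG_B": 10.0, "MAG_C": 5.0}
covered = {"MAG_A": 0.95, "MAG_B": 0.80, "MAG_C": 0.40}
abundance = relative_abundance(coverage, covered)
print(abundance)
```

Here MAG_C is dropped because only 40% of its length is covered, so the remaining two genomes share the full 100%.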

Coding gene annotation and orthogroup identification

Open reading frames (ORFs) from the 3,057 MAGs were predicted with Prokka (v1.14.6) (43), identifying 8,514,112 coding genes. Functional annotation was performed using eggNOG-mapper (v2.1.12) (44) against eggNOG DB (v5.0.2). Orthogroups (OGs) for 103 Planctomycetia and Phycisphaerae MAGs were identified with OrthoFinder (v2.5.5) (45) using the following parameters: -a 2 -M msa -S diamond -A muscle -T fasttree -og.

Anaerobic hydrocarbon biodegradation gene mining

Hidden Markov models (HMMs) for AcrA genes were downloaded from PFAM (PF02249 for MCR_alpha; PF02745 for MCR_alpha_N). AssA reference sequences were obtained from the AnHyDeg database (https://github.com/AnaerobesRock/AnHyDeg), and HMM profiles were built using hmmbuild (HMMER v3.1b2) (46). BcrB and BcrC references were identified via AnnoTree (47) (TIGR02260 and TIGR02263) and converted to HMMs. Candidate genes were screened by aligning coding proteins to HMMs using hmmsearch (E-value < 1e−5), followed by deduplication at 99% sequence similarity with CD-HIT (v4.8.1) (48).
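The E-value screening step can be sketched in Python, assuming the hmmsearch results were written in the tabular per-target format (`--tblout`), in which the fifth whitespace-separated column is the full-sequence E-value. The function name, gene identifiers, and hit values below are invented for illustration.

```python
def filter_tblout(text, max_evalue=1e-5):
    """Return (target, query, evalue) for hits with full-sequence E-value below the cutoff."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete hit line
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue < max_evalue:
            hits.append((target, query, evalue))
    return hits


# Invented example in hmmsearch --tblout layout.
EXAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias ...
gene_0001 - MCR_alpha PF02249 3.2e-40 140.1 0.1 4.0e-40 139.8 0.1 1.0 1 0 0 1 1 1 1 -
gene_0002 - MCR_alpha PF02249 2.0e-03 12.4 0.0 3.1e-03 11.9 0.0 1.0 1 0 0 1 1 1 0 -
gene_0003 - MCR_alpha PF02249 1.0e-05 25.0 0.0 1.5e-05 24.6 0.0 1.0 1 0 0 1 1 1 1 -
"""
hits = filter_tblout(EXAMPLE_TBLOUT)
print([h[0] for h in hits])
```

The cutoff is strict, so a hit with an E-value of exactly 1e−5 is excluded, matching the "E-value < 1e−5" criterion.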

AcrA reference sequences were downloaded from KEGG (K00399). Reference sequences were merged with deduplicated candidate gene sequences, aligned with MUSCLE (v3.8.31) (49), and filtered via maximum-likelihood phylogeny using IQ-TREE (-m WAG -bb 1000).

cAMP identification

Biosynthetic gene clusters (BGCs) were predicted using antiSMASH (v7.1.0) (50) with the following parameters: -c 8 --taxon bacteria --genefinding-tool prodigal --cb-general --cb-subclusters --cb-knownclusters --asf --pfam2go. Core peptides from RiPP-class BGCs were screened following Chen (22) and scored using Attention, LSTM, and BERT models. Peptides with scores > 0.5 across all models were classified as cAMPs (21, 22). Briefly, the scripts provided by Chen (22) were first used to extract core peptides from RiPP-class BGCs. Subsequently, these core peptides were converted from FASTA format into a format compatible with the Attention and LSTM models, whereas the BERT model continued to use the FASTA format. Next, the pretrained Attention, LSTM, and BERT models were downloaded, and each of these three models was used to score the core peptides individually. Finally, the results from the three models were integrated, and only those core peptides with a score > 0.5 in all models were defined as cAMPs. All the aforementioned scripts and pretrained models are available in the study by Chen (22).
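The final integration step, in which a core peptide is kept only if all three models score it above 0.5, can be sketched as below. The dictionary layout, model keys, peptide identifiers, and scores are hypothetical; the actual scripts and pretrained models are those distributed with Chen (22).

```python
MODELS = ("attention", "lstm", "bert")


def consensus_camps(scores, threshold=0.5):
    """Return peptide IDs whose score exceeds the threshold in every model."""
    return [
        peptide
        for peptide, model_scores in scores.items()
        if all(model_scores[model] > threshold for model in MODELS)
    ]


# Invented scores for three hypothetical core peptides.
example_scores = {
    "pep_1": {"attention": 0.91, "lstm": 0.77, "bert": 0.83},  # kept
    "pep_2": {"attention": 0.95, "lstm": 0.42, "bert": 0.88},  # LSTM too low
    "pep_3": {"attention": 0.50, "lstm": 0.90, "bert": 0.90},  # 0.50 is not > 0.5
}
camps = consensus_camps(example_scores)
print(camps)
```

Requiring agreement across all models trades sensitivity for specificity, which is consistent with the small number of cAMPs reported.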

Statistics

All statistical significance tests in this study were t-tests performed with the R package “ggpubr,” and all correlation coefficients were Pearson correlation coefficients.
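For readers unfamiliar with the statistic, the Pearson correlation coefficient can be computed as in the following Python sketch. The study itself used R; this function is an illustrative re-implementation written for this example, and the input vectors are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented data: a perfectly linear increasing and decreasing relationship.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3], [3, 2, 1])
print(r_up, r_down)
```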

RESULTS

Overview of genome catalogs in China oil reservoirs

In this study, we collected 101 publicly available samples from 13 oilfields across China (Fig. 1a; Tables S1 and S2). A total of 3,057 medium- and high-quality MAGs (completeness ≥50%; contamination ≤10%) were reconstructed. On average, 30 MAGs were obtained per sample (Fig. 1b). Across the 3,057 MAGs, a total of 8,514,112 coding proteins were functionally annotated, with an average of 2,785 coding genes per MAG. A production water sample from the Xinjiang oilfields (Ref5_s24) yielded 179 MAGs, the highest number among all samples. Conversely, another production water sample from the Xinjiang oilfields (Ref4_s5) produced no MAGs. Its assembly size was 8.34 Mb, the smallest across all samples, which explains the absence of MAGs. While assembly sizes showed no significant differences among Huabei, Shengli, and Xinjiang oilfields, Huabei samples yielded notably more MAGs (Fig. 1b and c).

Fig 1.

Geographic distribution of petroleum reservoir samples across China reveals production water as predominant sample type with varying MAG recovery and assembly quality between Huabei, Shengli, and Xinjiang oilfields.

Information on 101 samples. (a) Sampling site distribution map. (b) Boxplot of MAG numbers obtained from different samples. A t-test was used for significance testing; “ns” means P-value > 0.05, and “*” means 0.01 < P-value ≤ 0.05. (c) Boxplot of assembly size for Huabei, Shengli, and Xinjiang samples. The statistical significance test method was the same as above.

Microbial distribution in China’s oilfields

The 3,057 MAGs were annotated against the GTDB v226 database and spanned 67 phyla and 1,319 genera (Fig. 2a; Fig. S1 and Table S3). Pseudomonadota, Actinomycetota, and Bacteroidota were the top three phyla by MAG count, while Brevundimonas, Phenylobacterium, and Devosia (within Pseudomonadota) ranked highest at the genus level. Among the 122 archaeal MAGs identified, the majority were associated with methane metabolism, primarily represented by the classes Methanobacteria, Methanosarcinia, and Methanomicrobia, with additional members belonging to the phylum Thermoplasmatota. Notably, 54 of these archaeal MAGs originated from the Shengli samples, consistent with previous reports indicating higher archaeal community diversity in Shengli samples (51). Overall, 2,255 MAGs (73.77%) lacked species-level annotations (Fig. 2b). Among the top 20 genera, >70% of MAGs were taxonomically uncharacterized at the species level. For genera UBA9382, Ramlibacter, Hyphomicrobium, and Immundisolibacter, more than 90% of MAGs represented novel species (Fig. 2c), highlighting vast unexplored microbial diversity in oil environments and the potential of this data set for revealing “microbial dark matter.”

Fig 2.

Circular phylogenetic tree of bacterial MAGs showing evolutionary relationships across phyla with sample annotations. Bar charts reveal increasing novel MAG proportions from phylum to species level with significant variation among genera.

Taxonomic annotation information. (a) Phylogenetic tree of bacterial MAGs. Branch colors indicate different phyla; inner ring: sample source; outer ring: sample type. (b) Bar chart of novel MAG counts at different taxonomic levels. (c) Bar chart of novel MAG counts for top 20 genera.

To further investigate the microbial distribution across oilfields, we calculated the relative abundance and alpha diversity (see Methods). Shannon index and richness values indicated higher species diversity in Huabei, Shengli, and Xinjiang samples compared to others, with Huabei exhibiting the most complex and stable community structure (Fig. 3a). Additionally, some microbes showed oilfield-specific distributions (Fig. 3b; Fig. S2 and Table S4). For example, Methanoliparum and Methanothermobacter were most abundant in Shengli’s oily sludge samples. Another methanogen, Methanocalculus, was detected only in Xinjiang samples. Marinobacter was ubiquitous in Changqing oilfields but rare elsewhere. Brevundimonas and Parvibaculum were widespread across Huabei, Shengli, and Xinjiang samples.
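The two alpha diversity measures mentioned above can be sketched in Python as follows. The use of the natural logarithm for the Shannon index is an assumption (the log base used by OmicStudio is not stated in the text), and the abundance vectors are invented.

```python
import math


def shannon_index(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)
    return h


def richness(abundances):
    """Number of taxa detected (non-zero abundance) in a sample."""
    return sum(1 for a in abundances if a > 0)


# Invented relative abundances (percent) for two hypothetical samples.
even_sample = [25.0, 25.0, 25.0, 25.0]   # four equally abundant taxa
skewed_sample = [97.0, 1.0, 1.0, 1.0]    # one dominant taxon

print(shannon_index(even_sample), richness(even_sample))
print(shannon_index(skewed_sample), richness(skewed_sample))
```

Both samples have the same richness, but the evenly distributed sample has the higher Shannon index, which is why the two measures are reported together.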

Fig 3.

Boxplots show varying Shannon index and richness across Huabei, Shengli, Xinjiang and other oilfield sites. Stacked bars reveal distinct bacterial compositions across Changqing, Huabei, Shengli and Xinjiang samples with multiple taxa.

Species abundance information. (a) Alpha diversity results (left: Shannon index boxplot; right: richness boxplot). (b) Bar chart of species composition for four oilfield samples (Cq: Changqing, Hb: Huabei, Sl: Shengli, and Xj: Xinjiang).

Planctomycetia: a lineage with large genomes

Among the 3,057 MAGs, genome sizes ranged from 3.33 Mb to 10.33 Mb. Eight MAGs exceeded genome sizes of 8 Mb, all belonging to the class Planctomycetia (phylum Planctomycetota). Comparative analysis of the top 20 phyla confirmed that Planctomycetota MAGs were significantly larger (Fig. 4a). Further comparison within Planctomycetota revealed that Planctomycetia (59 MAGs) and Phycisphaerae (44 MAGs) were the dominant classes, with Planctomycetia genomes being significantly larger than Phycisphaerae (Fig. 4b).

Fig 4.

Multi-panel genomic analysis comparing Planctomycetia-class MAGs with bacterial phyla and Phycisphaerae class. Reveals significant genome size variations across phyla with Planctomycetota showing distinctive gene counts and coverage patterns.

Planctomycetia-class MAG information. (a) Boxplot of genome size for MAGs of top 20 phyla. t-test was used for significance test, and “****” means a P value ≤ 0.0001. (b) Boxplot of genome size for Planctomycetota phylum MAGs. The statistical significance test method was the same as above. (c) Boxplot of gene counts for Planctomycetia and Phycisphaerae classes. (d) Boxplot of gene coverage for Planctomycetia and Phycisphaerae classes. (e) Heatmap of gene counts in Large_orthogroups. Red represents MAGs of the class Planctomycetia, while blue represents MAGs of the class Phycisphaerae.

To assess whether MAG quality influenced genome size, we analyzed N50, QS (QS = completeness – 5 × contamination), and size relationships (Fig. S3 and S4). Planctomycetia and Phycisphaerae exhibited similar distributions of N50 and QS, indicating comparable MAG quality. At equivalent N50 or QS values, Planctomycetia genomes remained larger, confirming that quality did not drive size differences.
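The quality score defined above is a simple linear penalty on contamination, as the following minimal Python sketch shows. The function name and example values are illustrative only.

```python
def quality_score(completeness, contamination):
    """QS = completeness - 5 x contamination (both in percent)."""
    return completeness - 5.0 * contamination


# Invented examples: a high-quality MAG and one at the retention thresholds.
print(quality_score(90.0, 2.0))
print(quality_score(50.0, 10.0))
```

A MAG sitting exactly at the retention thresholds (50% completeness, 10% contamination) scores 0, so each percentage point of contamination costs as much as five points of completeness.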

We analyzed gene counts in 103 MAGs (59 Planctomycetia + 44 Phycisphaerae) and clustered genes into OGs using OrthoFinder v2.5.5. Planctomycetia MAGs harbored significantly more genes than Phycisphaerae (Fig. 4c), explaining their larger genomes. The 103 MAGs contained 436,764 genes, of which 388,838 (89%) clustered into 21,330 OGs. Over 70% of genes per MAG were assignable to OGs (Fig. 4d; Table S5).

OGs were filtered and categorized as follows: (a) Planctomycetia_Phycisphaerae_common: 248 OGs present in >90% of MAGs from both classes; (b) Planctomycetia_Phycisphaerae_specific: 467 OGs present in >90% of MAGs from one class but not the other; and (c) Large_orthogroups: 16 OGs with >10 gene copies per MAG on average (Table S6). Heatmaps of these OGs revealed that Planctomycetia MAGs consistently contained higher gene copy numbers in specific OGs than Phycisphaerae, but not vice versa (Fig. 4e). Furthermore, genes within these OGs predominantly function in energy and nutrient metabolism, as well as signal transduction. Examples include ATP hydrolysis (ABC transport system), glycogen metabolism, fatty acid synthesis (3-oxoacyl-ACP reductase), carbon metabolism (inositol dehydrogenase), the general secretion pathway (Sec pathway), and two-component systems (OmpR family and NtrC family). These patterns suggest that, compared to Phycisphaerae, species within the class Planctomycetia may exhibit higher metabolic activity and require a greater number of functional genes. Additionally, we found that Planctomycetia possesses a higher abundance of eukaryotic-like serine/threonine protein kinases (STPKs), and the genes encoding these kinases were likely acquired via horizontal gene transfer from eukaryotes. In summary, we propose that the combined effects of functional gene redundancy and horizontal gene transfer contribute to the observed increase in genome size within the Planctomycetia class.
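The three orthogroup categories described above can be expressed as a short Python sketch. Two points are assumptions made here because the text does not specify them: "not the other" is interpreted as the orthogroup being absent from every MAG of the other class, and the mean copy number for large orthogroups is taken over all MAGs of both classes. All names and example counts are hypothetical.

```python
def prevalence(counts, mags):
    """Fraction of the given MAGs carrying at least one gene of the orthogroup."""
    if not mags:
        return 0.0
    return sum(1 for m in mags if counts.get(m, 0) > 0) / len(mags)


def categorize_og(counts, class_a, class_b, min_prev=0.9, large_mean=10.0):
    """Return the category labels that apply to one orthogroup.

    counts maps MAG name -> gene copy number in this orthogroup.
    """
    labels = []
    prev_a = prevalence(counts, class_a)
    prev_b = prevalence(counts, class_b)
    if prev_a > min_prev and prev_b > min_prev:
        labels.append("common")
    elif (prev_a > min_prev and prev_b == 0.0) or (prev_b > min_prev and prev_a == 0.0):
        # Assumption: "specific" means absent from the other class entirely.
        labels.append("specific")
    all_mags = list(class_a) + list(class_b)
    if all_mags:
        mean_copies = sum(counts.get(m, 0) for m in all_mags) / len(all_mags)
        if mean_copies > large_mean:
            labels.append("large")
    return labels


# Invented toy data: two MAGs per class.
planctomycetia = ["P1", "P2"]
phycisphaerae = ["C1", "C2"]
og_common = {"P1": 1, "P2": 2, "C1": 1, "C2": 1}
og_specific = {"P1": 3, "P2": 1}
og_large = {"P1": 30, "P2": 25, "C1": 2, "C2": 3}
og_none = {"P1": 1}

for og in (og_common, og_specific, og_large, og_none):
    print(categorize_og(og, planctomycetia, phycisphaerae))
```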

Alkane metabolism gene mining

Through HMMER alignment and redundancy reduction, we identified 38 candidate AcrA genes, alongside 910 AssA, 478 BcrB, and 395 BcrC genes. Subsequent phylogenetic screening yielded 68 high-confidence alkane metabolism-related genes (3 AcrA, 17 AssA, 21 BcrB, and 27 BcrC; Table S7).

Candidate AcrA genes primarily originated from the phyla Halobacteriota, Methanobacteriota, and Thermoplasmatota. Notably, among four Methanoliparum MAGs in our data sets, three contained candidate AcrA genes (Ref1_s6.bin.45_00858, Ref1_s16.bin.8_00715, and Ref1_s16.bin.69_00673). A phylogenetic tree was constructed using reference genes from the KEGG database (KO: K00399) and the 38 candidate AcrA genes (Fig. 5). Within this tree, Ref1_s6.bin.45_00858 and Ref1_s16.bin.8_00715 formed a distinct cluster. Ref1_s16.bin.69_00673 clustered with reference sequences A0A0P9HYI1 and A0A0P9E1V1, both derived from the phylum Bathyarchaeota (classified as class Bathyarchaeia in GTDB v226). Bathyarchaeota members have been implicated in methane metabolism or interactions with methanogens (52, 53). Within our data set, an MAG assigned to the class Bathyarchaeia (Ref5_s8.bin.147) was identified; however, no candidate AcrA genes were detected in this genome. The remaining candidate AcrA genes did not cluster with Methanoliparum-derived sequences, suggesting they represent conventional McrA homologs.

Fig 5.

Branching diagram displaying evolutionary relationships among AcrA candidate genes and reference genes with distinct clustering patterns showing sequence divergence and phylogenetic groupings between screened and unscreened gene variants.

Phylogenetic tree of AcrA candidate genes and reference genes. Branches with the blue background indicate screened AcrA genes.

Seventeen candidate AssA genes clustered with reference sequences (Fig. S5), 65% of which belonged to the phylum Desulfobacterota, consistent with prior reports (12). Similarly, 21 BcrB and 27 BcrC genes clustered with their respective reference sequences (Fig. S6 and S7). These sequences predominantly originated from the phyla Pseudomonadota and Planctomycetota, aligning with reference sequences from AnnoTree (47).

Secondary metabolite biosynthetic gene clusters and novel antimicrobial peptide discovery

BGCs within MAGs were identified using antiSMASH. A total of 10,168 BGCs were detected across 2,580 MAGs (Table S8). Ribosomally synthesized and post-translationally modified peptides (RiPPs) constituted the most abundant class (3,251), followed by terpenes (2,069), polyketide synthases (PKS; 1,401), and NRPSs (1,208) (Fig. 6a). Analysis of BGCs per MAG revealed that 94% of MAGs contained fewer than 10 BGCs. Two MAGs harbored 30 BGCs each, predominantly RiPPs (Fig. 6a). For MAGs with ≥10 BGCs, no significant correlation was observed between BGC count and total gene count (R = 0.271). Interestingly, certain genera consistently exhibited high BGC numbers. For example, all three MAGs assigned to the genus Tistrella (phylum Pseudomonadota, class Alphaproteobacteria) contained >25 BGCs. Similarly, all three MAGs of the related genus Tistlia contained >10 BGCs (Fig. 6b). Within Tistrella MAGs, RiPPs dominated the BGC profile, whereas Tistlia MAGs primarily contained RiPPs and PKSs (Fig. 6c).

Fig 6.

BGC analysis across MAGs reveals NRPS, PKS, RiPP, terpene distribution with no positive gene-BGC correlation across genera. Tistrella and Tistlia show distinct profiles. Thirteen cAMP-containing BGCs display varied genomic organization.

BGC and cAMP information. (a) Bar chart of BGC type counts per MAG (y-axis: MAGs; top-right bar: total BGC counts per type). (b) Scatter plot of gene counts vs BGC counts for MAGs with abundant BGCs (colors indicate genera). (c) Bar chart of BGC counts for Tistrella and Tistlia. (d) Gene distribution map of 13 BGCs containing cAMPs.

Based on BGC analysis, we identified 14 cAMPs from 13 lanthipeptide BGCs (Table S9). These peptides ranged from 10 to 59 amino acids in length and predominantly derived from the classes Alphaproteobacteria (6 cAMPs) and Gammaproteobacteria (3 cAMPs) within the phylum Pseudomonadota. None of these 14 cAMPs are cataloged in the Antimicrobial Peptide Database 6 (APD6) (54), representing novel cAMPs identified for the first time in petroleum microbiomes. The cAMPs were consistently located adjacent to biosynthetic genes within their BGCs (Fig. 6d).

DISCUSSION

Substantial variation in sequencing depth was observed across samples due to diverse data sources. Sequencing depth, assembly size, and MAG yield demonstrated positive correlations, with a particularly significant association between assembly size and MAG count (Fig. S8). Thus, for petroleum microbiome metagenomic studies, maximizing the sequencing depth is recommended where feasible to ensure robust downstream analyses.

From 15 Huabei samples, we recovered 823 MAGs, compared to 758 MAGs from 32 Shengli samples. Although total assembly sizes showed no significant difference between fields (Fig. 1c), three Huabei samples yielded assemblies >1,000 Mb, each producing >100 MAGs per sample—a pattern absent in Shengli samples. This accounts for the higher MAG recovery from Huabei oilfields. Taxonomic analysis revealed 596 genera in Huabei oilfields versus 533 in Shengli oilfields, with only 106 genera overlapping (Fig. S9), indicating substantial biogeographic divergence. While previously noted, the drivers of this disparity warrant further investigation integrating geochemical parameters.
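The degree of genus-level overlap can be made concrete with a small worked calculation based on the counts reported above (596 and 533 genera, 106 shared). The Jaccard index used here is an illustrative summary added for this example and was not reported in the study; the function name is hypothetical.

```python
def jaccard_from_counts(n_a, n_b, n_shared):
    """Jaccard index computed from two set sizes and the size of their intersection."""
    union = n_a + n_b - n_shared
    if union <= 0:
        raise ValueError("union must be positive")
    return n_shared / union


huabei_genera = 596
shengli_genera = 533
shared_genera = 106

print(huabei_genera - shared_genera)   # genera found only in Huabei
print(shengli_genera - shared_genera)  # genera found only in Shengli
print(round(jaccard_from_counts(huabei_genera, shengli_genera, shared_genera), 4))
```

With 1,023 genera in the union, roughly one genus in ten is shared between the two fields, which supports the interpretation of substantial divergence.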

The phylum Gemmatimonadota was exclusively detected in production water and oily sludge (Fig. 2a), while previous studies reported its presence in soil, freshwater, wastewater, marine, and sediment environments (55–59). Gemmatimonadota was phylogenetically classified into five classes, with the third class, Longimicrobia (corresponding to Longimicrobiales in GTDB v226), primarily found in oil and gas hydrates (60, 61), consistent with our results, suggesting Longimicrobiales may be an oil-specific lineage.

Our data set contained four Methanoliparum MAGs, three originating from Shengli oilfields—consistent with Methanoliparum’s initial isolation from this field (17). Shengli oilfields also harbored higher archaeal abundance, suggesting enhanced biodegradation potential. 16S rRNA studies indicate Methanoliparum’s environmental plasticity, having been detected in diverse settings including a biodegraded Nigerian reservoir (62) and a sulfur-rich Canadian offshore reservoir (63), highlighting its bioremediation utility for petroleum-contaminated sites.

AcrA is a key alkane-degrading gene in Methanoliparum (64), which prompted us to identify two candidate AcrA homologs from Bathyarchaeota in Swiss-Prot. Bathyarchaeota thrive globally in anaerobic niches, including hot springs (65), sediments (66, 67), acid-sulfate springs (68), animal guts (69), and bioreactors (70), and play crucial roles in carbon cycling (17). While Evans (71) reported Mcr (indicating methanogenesis) in two Bathyarchaeota MAGs (BA1 and BA2), recent work found no Mcr in strains ILS200 and ILS300 (72), aligning with the results of our Bathyarchaeia MAG (Ref5_s8.bin.147).

To resolve this discrepancy, we annotated all five MAGs using GTDB v226. BA1 and BA2 clustered within Bathyarchaeaceae, whereas ILS200 belonged to Bathycorpusculaceae, and ILS300 clustered with Ref5_s8.bin.147 to families WUQV01 and TCS64 (Table S10). Phylogenetic analysis revealed closer evolutionary proximity among ILS200, ILS300, and Ref5_s8.bin.147 (Fig. S10), suggesting methane metabolism may be restricted to specific lineages like Bathyarchaeaceae.

Three Tistrella MAGs in our data set contained the highest number of BGCs. Tistrella produces didemnin, a nonribosomal peptide with anticancer and antiviral properties (73–75). Initially isolated from marine environments (76), didemnin-producing Tistrella has since been found in wastewater (77), heavy metal-contaminated soil (78), and the Red Sea (79). Our Tistrella MAGs from Xinjiang, Changqing, and Daqing samples represent its first recovery from oil reservoirs. These MAGs harbored 7–10 NRPS-type BGCs, placing Tistrella among the most NRPS-rich genera observed and indicating potential for synthesizing bioactive peptides.

Three cAMP prediction models, previously validated in human gut and marine microbiomes (21, 22), identified only 14 cAMPs in our data set. This limited yield likely reflects our data set size. We anticipate increased cAMP discovery with expanded sampling. All cAMPs in our data set were located near biosynthetic genes (Fig. 6d). These biosynthetic genes are responsible for post-translational modifications of precursor peptides, enabling functional diversification for antimicrobial, anticancer, and other bioactivities (80).

This study analyzed 101 metagenomes from 13 Chinese oilfields, generating 3,057 medium- and high-quality MAGs. Notably, 73.77% represent novel species. We observed gene family expansion in Planctomycetia, correlating with increased genome size. Additionally, 68 anaerobic hydrocarbon biodegradation genes were identified. This work reports the first reservoir-derived Tistrella MAGs (producers of anticancer/antiviral compounds) and 14 novel cAMPs. Petroleum reservoirs constitute a rich microbial treasure trove. Enhanced research promises advances not only in oil recovery but also in biosynthesis and biomedical applications.

ACKNOWLEDGMENTS

This work was supported by the Joint Fund for Enterprise Innovation and Development of NSFC (Grant No. U24B2037) and the Fund for General Program of NSFC (Grant No. 52374051).

C.Z. designed the project and wrote the manuscript. Shiqi Wang, L.J., and H.Z. performed the analysis. C.Y. and Shuoliang Wang reviewed and revised the manuscript.

Contributor Information

Shuoliang Wang, Email: wangshuoliang@cugb.edu.cn.

Hui Zhao, Email: zhaohui@yangtzeu.edu.cn.

Chunlei Yu, Email: yuchunly@sina.com.

John R. Spear, Colorado School of Mines, Golden, Colorado, USA

DATA AVAILABILITY

All raw data were obtained from public databases, and detailed information is provided in Table S1. The 3,057 MAGs have been uploaded to Figshare and are available for viewing and downloading via https://doi.org/10.6084/m9.figshare.31337758.

SUPPLEMENTAL MATERIAL

The following material is available online at https://doi.org/10.1128/aem.00392-26.

Supplemental figures. aem.00392-26-s0001.docx (1.6 MB). Fig. S1 to S10. DOI: 10.1128/aem.00392-26.SuF1

Supplemental tables. aem.00392-26-s0002.xlsx (686.2 KB). Tables S1 to S11. DOI: 10.1128/aem.00392-26.SuF2

ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.

REFERENCES

  • 1. Patel J, Borgohain S, Kumar M, Rangarajan V, Somasundaran P, Sen R. 2015. Recent developments in microbial enhanced oil recovery. Renew Sustain Energy Rev 52:1539–1558. doi: 10.1016/j.rser.2015.07.135
  • 2. Ke C-Y, Lu G-M, Wei Y-L, Sun W-J, Hui J-F, Zheng X-Y, Zhang Q-Z, Zhang X-L. 2019. Biodegradation of crude oil by Chelatococcus daeguensis HB-4 and its potential for microbial enhanced oil recovery (MEOR) in heavy oil reservoirs. Bioresour Technol 287:121442. doi: 10.1016/j.biortech.2019.121442
  • 3. Niu J, Liu Q, Lv J, Peng B. 2020. Review on microbial enhanced oil recovery: mechanisms, modeling and field trials. J Pet Sci Eng 192:107350. doi: 10.1016/j.petrol.2020.107350
  • 4. Ke C-Y, Sun R, Wei M-X, Yuan X-N, Sun W-J, Wang S-C, Zhang Q-Z, Zhang X-L. 2024. Microbial enhanced oil recovery (MEOR): recent development and future perspectives. Crit Rev Biotechnol 44:1183–1202. doi: 10.1080/07388551.2023.2270578
  • 5. Ke C-Y, Sun W-J, Li Y-B, Hui J-F, Lu G-M, Zheng X-Y, Zhang Q-Z, Zhang X-L. 2018. Polymer-assisted microbial-enhanced oil recovery. Energy Fuels 32:5885–5892. doi: 10.1021/acs.energyfuels.8b00812
  • 6. Jeong MS, Lee JH, Lee KS. 2019. Critical review on the numerical modeling of in-situ microbial enhanced oil recovery processes. Biochem Eng J 150:107294. doi: 10.1016/j.bej.2019.107294
  • 7. Liu J, Chen Y, Xu R, Jia Y, Wang J, Lu Y, Sun H, Yuan D. 2014. Microbial paraffin-removal technology using paraffin-degrading and biosurfactant-producing strain. Asian J Chem 26:2957–2959. doi: 10.14233/ajchem.2014.16118
  • 8. Head IM, Jones DM, Larter SR. 2003. Biological activity in the deep subsurface and the origin of heavy oil. Nature 426:344–352. doi: 10.1038/nature02134
  • 9. Widdel F, Knittel K, Galushko A. 2010. Anaerobic hydrocarbon-degrading microorganisms: an overview, p 1997–2021. In Timmis KN (ed), Handbook of hydrocarbon and lipid microbiology.
  • 10. Callaghan AV. 2013. Enzymes involved in the anaerobic oxidation of n-alkanes: from methane to long-chain paraffins. Front Microbiol 4:89. doi: 10.3389/fmicb.2013.00089
  • 11. Toth CRA, Gieg LM. 2017. Time course-dependent methanogenic crude oil biodegradation: dynamics of fumarate addition metabolites, biodegradative genes, and microbial community composition. Front Microbiol 8:2610. doi: 10.3389/fmicb.2017.02610
  • 12. Ji J-H, Liu Y-F, Zhou L, Mbadinga SM, Pan P, Chen J, Liu J-F, Yang S-Z, Sand W, Gu J-D, Mu B-Z. 2019. Methanogenic degradation of long n-alkanes requires fumarate-dependent activation. Appl Environ Microbiol 85:e00985-19. doi: 10.1128/AEM.00985-19
  • 13. Jones DM, Head IM, Gray ND, Adams JJ, Rowan AK, Aitken CM, Bennett B, Huang H, Brown A, Bowler BFJ, Oldenburg T, Erdmann M, Larter SR. 2008. Crude-oil biodegradation via methanogenesis in subsurface petroleum reservoirs. Nature 451:176–180. doi: 10.1038/nature06484
  • 14. Berdugo-Clavijo C, Gieg LM. 2014. Conversion of crude oil to methane by a microbial consortium enriched from oil reservoir production waters. Front Microbiol 5:197. doi: 10.3389/fmicb.2014.00197
  • 15. Borrel G, Adam PS, McKay LJ, Chen L-X, Sierra-García IN, Sieber CMK, Letourneur Q, Ghozlane A, Andersen GL, Li W-J, Hallam SJ, Muyzer G, de Oliveira VM, Inskeep WP, Banfield JF, Gribaldo S. 2019. Wide diversity of methane and short-chain alkane metabolisms in uncultured archaea. Nat Microbiol 4:603–613. doi: 10.1038/s41564-019-0363-3
  • 16. Laso-Pérez R, Hahn C, van Vliet DM, Tegetmeyer HE, Schubotz F, Smit NT, Pape T, Sahling H, Bohrmann G, Boetius A, Knittel K, Wegener G. 2019. Anaerobic degradation of non-methane alkanes by “Candidatus Methanoliparia” in hydrocarbon seeps of the Gulf of Mexico. mBio 10:e01814-19. doi: 10.1128/mBio.01814-19
  • 17. Zhou Z, Zhang C, Liu P, Fu L, Laso-Pérez R, Yang L, Bai L, Li J, Yang M, Lin J, Wang W, Wegener G, Li M, Cheng L. 2022. Non-syntrophic methanogenic hydrocarbon degradation by an archaeal species. Nature 601:257–262. doi: 10.1038/s41586-021-04235-2
  • 18. Gao P, Gao Y, Wang H, Ma T, Gu JD. 2023. An evaluation of different detection methods for anaerobic ammonium-oxidizing (anammox) bacteria inhabiting the oil reservoir systems. Int Biodeterior Biodegradation 177:105536. doi: 10.1016/j.ibiod.2022.105536
  • 19. Sun X, Ma Y, Pang L, Wei J, Fu H, Li Y, Li Y, Lu J, Bao M. 2025. Enhancement of petroleum degradation by biochar supported Fe3O4 nanoparticles: role of immobilized Rhodococcus hoagii S4 in oil spill remediation. J Hazard Mater 496:139333. doi: 10.1016/j.jhazmat.2025.139333
  • 20. Lazzaro BP, Zasloff M, Rolff J. 2020. Antimicrobial peptides: application informed by evolution. Science 368:eaau5480. doi: 10.1126/science.aau5480
  • 21. Ma Y, Guo Z, Xia B, Zhang Y, Liu X, Yu Y, Tang N, Tong X, Wang M, Ye X, Feng J, Chen Y, Wang J. 2022. Identification of antimicrobial peptides from the human gut microbiome using deep learning. Nat Biotechnol 40:921–931. doi: 10.1038/s41587-022-01226-0
  • 22. Chen J, Jia Y, Sun Y, Liu K, Zhou C, Liu C, Li D, Liu G, Zhang C, Yang T, et al. 2024. Global marine microbial diversity and its potential in bioprospecting. Nature 633:371–379. doi: 10.1038/s41586-024-07891-2
  • 23. Santos-Júnior CD, Torres MDT, Duan Y, Rodríguez Del Río Á, Schmidt TSB, Chong H, Fullam A, Kuhn M, Zhu C, Houseman A, Somborski J, Vines A, Zhao X-M, Bork P, Huerta-Cepas J, de la Fuente-Nunez C, Coelho LP. 2024. Discovery of antimicrobial peptides in the global microbiome with machine learning. Cell 187:3761–3778. doi: 10.1016/j.cell.2024.05.013
  • 24. Zhang C-J, Zhou Z, Cha G, Li L, Fu L, Liu L-Y, Yang L, Wegener G, Cheng L, Li M. 2024. Anaerobic hydrocarbon biodegradation by alkylotrophic methanogens in deep oil reservoirs. ISME J 18:wrae152. doi: 10.1093/ismejo/wrae152
  • 25. Hidalgo KJ, Sierra-Garcia IN, Zafra G, de Oliveira VM. 2021. Genome-resolved meta-analysis of the microbiome in oil reservoirs worldwide. Microorganisms 9:1812. doi: 10.3390/microorganisms9091812
  • 26. Liu Y-F, Galzerani DD, Mbadinga SM, Zaramela LS, Gu J-D, Mu B-Z, Zengler K. 2018. Metabolic capability and in situ activity of microorganisms in an oil reservoir. Microbiome 6:5. doi: 10.1186/s40168-017-0392-1
  • 27. Yun Y, Lv T, Gui Z, Su T, Cao W, Tian X, Chen Y, Wang S, Jia Z, Li G, Ma T. 2024. Composition and metabolic flexibility of hydrocarbon-degrading consortia in oil reservoirs. Bioresour Technol 409:131244. doi: 10.1016/j.biortech.2024.131244
  • 28. An L, Liu X, Wang J, Xu J, Chen X, Liu X, Hu B, Nie Y, Wu X-L. 2024. Global diversity and ecological functions of viruses inhabiting oil reservoirs. Nat Commun 15:6789. doi: 10.1038/s41467-024-51101-6
  • 29. Liu Y-F, Chen J, Liu Z-L, Shou L-B, Lin D-D, Zhou L, Yang S-Z, Liu J-F, Li W, Gu J-D, Mu B-Z. 2020. Anaerobic degradation of paraffins by thermophilic actinobacteria under methanogenic conditions. Environ Sci Technol 54:10610–10620. doi: 10.1021/acs.est.0c02071
  • 30. Chen YX, Chen YS, Shi C, Huang Z, Zhang Y, Li S, Li Y, Ye J, Yu C, Li Z, Zhang X, Wang J, Yang H, Fang L, Chen Q. 2018. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. Gigascience 7. doi: 10.1093/gigascience/gix120
  • 31. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. 2015. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31:1674–1676. doi: 10.1093/bioinformatics/btv033
  • 32. Uritskiy GV, DiRuggiero J, Taylor J. 2018. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6:158. doi: 10.1186/s40168-018-0541-1
  • 33. Bowers RM, Kyrpides NC, Stepanauskas R, Harmon-Smith M, Doud D, Reddy TBK, Schulz F, Jarett J, Rivers AR, Eloe-Fadrosh EA, et al. 2017. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat Biotechnol 35:725–731. doi: 10.1038/nbt.3893
  • 34. Olm MR, Brown CT, Brooks B, Banfield JF. 2017. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J 11:2864–2868. doi: 10.1038/ismej.2017.126
  • 35. Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. 2020. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36:1925–1927. doi: 10.1093/bioinformatics/btz848
  • 36. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32:268–274. doi: 10.1093/molbev/msu300
  • 37. Letunic I, Bork P. 2024. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res 52:W78–W82. doi: 10.1093/nar/gkae268
  • 38. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. doi: 10.1038/nmeth.1923
  • 39. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. doi: 10.1093/bioinformatics/btp352
  • 40. Chaisson MJ, Tesler G. 2012. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics 13:238. doi: 10.1186/1471-2105-13-238
  • 41. Aroney STN, Newell RJP, Nissen JN, Camargo AP, Tyson GW, Woodcroft BJ. 2025. CoverM: read alignment statistics for metagenomics. Bioinformatics 41:btaf147. doi: 10.1093/bioinformatics/btaf147
  • 42. Lyu F, Han F, Ge C, Mao W, Chen L, Hu H, Chen G, Lang Q, Fang C. 2023. OmicStudio: a composable bioinformatics cloud platform with real-time feedback that can generate high-quality graphs for publication. Imeta 2:e85. doi: 10.1002/imt2.85
  • 43. Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30:2068–2069. doi: 10.1093/bioinformatics/btu153
  • 44. Cantalapiedra CP, Hernández-Plaza A, Letunic I, Bork P, Huerta-Cepas J. 2021. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol 38:5825–5829. doi: 10.1093/molbev/msab293
  • 45. Emms DM, Kelly S. 2019. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 20:238. doi: 10.1186/s13059-019-1832-y
  • 46. Eddy SR. 2008. A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput Biol 4:e1000069. doi: 10.1371/journal.pcbi.1000069
  • 47. Mendler K, Chen H, Parks DH, Lobb B, Hug LA, Doxey AC. 2019. AnnoTree: visualization and exploration of a functionally annotated microbial tree of life. Nucleic Acids Res 47:4442–4448. doi: 10.1093/nar/gkz246
  • 48. Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28:3150–3152. doi: 10.1093/bioinformatics/bts565
  • 49. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797. doi: 10.1093/nar/gkh340
  • 50. Blin K, Shaw S, Augustijn HE, Reitz ZL, Biermann F, Alanjary M, Fetter A, Terlouw BR, Metcalf WW, Helfrich EJN, van Wezel GP, Medema MH, Weber T. 2023. antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. Nucleic Acids Res 51:W46–W50. doi: 10.1093/nar/gkad344
  • 51. Zhou L, Wu J, Ji J-H, Gao J, Liu Y-F, Wang B, Yang S-Z, Gu J-D, Mu B-Z. 2023. Characteristics of microbiota, core sulfate-reducing taxa and corrosion rates in production water from five petroleum reservoirs in China. Sci Total Environ 858:159861. doi: 10.1016/j.scitotenv.2022.159861
  • 52. Zhou Z, Pan J, Wang F, Gu J-D, Li M. 2018. Bathyarchaeota: globally distributed metabolic generalists in anoxic environments. FEMS Microbiol Rev 42:639–655. doi: 10.1093/femsre/fuy023
  • 53. Xue S-D, Yi X-Y, Cui H-L, Li M, Peng J-J, Zhu Y-G, Duan G-L. 2023. Global biogeographic distribution of Bathyarchaeota in paddy soils. mSystems 8:e0014323. doi: 10.1128/msystems.00143-23
  • 54. Wang G, Schmidt C, Li X, Wang Z. 2026. APD6: the antimicrobial peptide database is expanded to promote research and development by deploying an unprecedented information pipeline. Nucleic Acids Res 54:D363–D374. doi: 10.1093/nar/gkaf860
  • 55. Zhang H, Sekiguchi Y, Hanada S, Hugenholtz P, Kim H, Kamagata Y, Nakamura K. 2003. Gemmatimonas aurantiaca gen. nov., sp. nov., a gram-negative, aerobic, polyphosphate-accumulating micro-organism, the first cultured representative of the new bacterial phylum Gemmatimonadetes phyl. nov. Int J Syst Evol Microbiol 53:1155–1163. doi: 10.1099/ijs.0.02520-0
  • 56. Gugliandolo C, Michaud L, Lo Giudice A, Lentini V, Rochera C, Camacho A, Maugeri TL. 2016. Prokaryotic community in lacustrine sediments of Byers Peninsula (Livingston Island, Maritime Antarctica). Microb Ecol 71:387–400. doi: 10.1007/s00248-015-0666-8
  • 57. Zeng Y, Baumbach J, Barbosa EGV, Azevedo V, Zhang C, Koblížek M. 2016. Metagenomic evidence for the presence of phototrophic Gemmatimonadetes bacteria in diverse environments. Environ Microbiol Rep 8:139–149. doi: 10.1111/1758-2229.12363
  • 58. Delgado-Baquerizo M, Oliverio AM, Brewer TE, Benavent-González A, Eldridge DJ, Bardgett RD, Maestre FT, Singh BK, Fierer N. 2018. A global atlas of the dominant bacteria found in soil. Science 359:320–325. doi: 10.1126/science.aap9516
  • 59. Mujakić I, Andrei A-Ş, Shabarova T, Fecskeová LK, Salcher MM, Piwosz K, Ghai R, Koblížek M. 2021. Common presence of phototrophic Gemmatimonadota in temperate freshwater lakes. mSystems 6:e01241-20. doi: 10.1128/mSystems.01241-20
  • 60. Pascual J, García-López M, Bills GF, Genilloud O. 2016. Longimicrobium terrae gen. nov., sp. nov., an oligotrophic bacterium of the under-represented phylum Gemmatimonadetes isolated through a system of miniaturized diffusion chambers. Int J Syst Evol Microbiol 66:1976–1985. doi: 10.1099/ijsem.0.000974
  • 61. Mujakić I, Piwosz K, Koblížek M. 2022. Phylum Gemmatimonadota and its role in the environment. Microorganisms 10:151. doi: 10.3390/microorganisms10010151
  • 62. Conlette OC. 2016. Microbial communities of light crude from Nigeria and potential for in situ biodegradation, souring, and corrosion. Pet Sci Technol 34:71–77. doi: 10.1080/10916466.2015.1122622
  • 63. Okpala GN, Chen C, Fida T, Voordouw G. 2017. Effect of thermophilic nitrate reduction on sulfide production in high temperature oil reservoir samples. Front Microbiol 8:1573. doi: 10.3389/fmicb.2017.01573
  • 64. Zhang H, Huang S, Zou X, Shi W, Liang M, Lin Y, Zheng M, Tang X. 2024. Exploring the biosynthetic potential of Tistrella species for producing didemnin antitumor agents. ACS Chem Biol 19:2176–2185. doi: 10.1021/acschembio.4c00384
  • 65. Barns SM, Delwiche CF, Palmer JD, Pace NR. 1996. Perspectives on archaeal diversity, thermophily and monophyly from environmental rRNA sequences. Proc Natl Acad Sci USA 93:9188–9193. doi: 10.1073/pnas.93.17.9188
  • 66. Kubo K, Lloyd KG, F Biddle J, Amann R, Teske A, Knittel K. 2012. Archaea of the miscellaneous crenarchaeotal group are abundant, diverse and widespread in marine sediments. ISME J 6:1949–1965. doi: 10.1038/ismej.2012.37
  • 67. Fan X, Xing P. 2016. Differences in the composition of archaeal communities in sediments from contrasting zones of Lake Taihu. Front Microbiol 7:1510. doi: 10.3389/fmicb.2016.01510
  • 68. McKay LJ, Hatzenpichler R, Inskeep WP, Fields MW. 2017. Occurrence and expression of novel methyl-coenzyme M reductase gene (mcrA) variants in hot spring sediments. Sci Rep 7:7252. doi: 10.1038/s41598-017-07354-x
  • 69. Loh HQ, Hervé V, Brune A. 2020. Metabolic potential for reductive acetogenesis and a novel energy-converting [NiFe] hydrogenase in Bathyarchaeia from termite guts – a genome-centric analysis. Front Microbiol 11:635786. doi: 10.3389/fmicb.2020.635786
  • 70. Liu X, Lee C, Kim JY. 2023. Comparison of mesophilic and thermophilic anaerobic digestions of thermal hydrolysis pretreated swine manure: process performance, microbial communities and energy balance. J Environ Sci (China) 126:222–233. doi: 10.1016/j.jes.2022.03.032
  • 71. Evans PN, Parks DH, Chadwick GL, Robbins SJ, Orphan VJ, Golding SD, Tyson GW. 2015. Methane metabolism in the archaeal phylum Bathyarchaeota revealed by genome-centric metagenomics. Science 350:434–438. doi: 10.1126/science.aac7745
  • 72. Deb S, Das SK. 2022. Phylogenomic analysis of metagenome-assembled genomes deciphered novel acetogenic nitrogen-fixing Bathyarchaeota from hot spring sediments. Microbiol Spectr 10:e0035222. doi: 10.1128/spectrum.00352-22
  • 73. Vera MD, Joullié MM. 2002. Natural products as probes of cell biology: 20 years of didemnin research. Med Res Rev 22:102–145. doi: 10.1002/med.10003
  • 74. Tourneau C, Raymond E, Faivre S. 2007. Aplidine: a paradigm of how to handle the activity and toxicity of a novel marine anticancer poison. Curr Pharm Des 13:3427–3439.
  • 75. Lee J, Currano JN, Carroll PJ, Joullié MM. 2012. Didemnins, tamandarins and related natural products. Nat Prod Rep 29:404–424. doi: 10.1039/c2np00065b
  • 76. Rinehart KL Jr, Gloer JB, Hughes RG Jr, Renis HE, McGovren JP, Swynenberg EB, Stringfellow DA, Kuentzel SL, Li LH. 1981. Didemnins: antiviral and antitumor depsipeptides from a Caribbean tunicate. Science 212:933–935. doi: 10.1126/science.7233187
  • 77. Shi B-H, Arunpairojana V, Palakawong S, Yokota A. 2002. Tistrella mobilis gen nov, sp nov, a novel polyhydroxyalkanoate-producing bacterium belonging to alpha-proteobacteria. J Gen Appl Microbiol 48:335–343. doi: 10.2323/jgam.48.335
  • 78. Zhang D-C, Liu H-C, Zhou Y-G, Schinner F, Margesin R. 2011. Tistrella bauzanensis sp. nov., isolated from soil. Int J Syst Evol Microbiol 61:2227–2230. doi: 10.1099/ijs.0.026930-0
  • 79. Xu Y, Kersten RD, Nam S-J, Lu L, Al-Suwailem AM, Zheng H, Fenical W, Dorrestein PC, Moore BS, Qian P-Y. 2012. Bacterial biosynthesis and maturation of the didemnin anti-cancer agents. J Am Chem Soc 134:8625–8632. doi: 10.1021/ja301735a
  • 80. Russell AH, Truman AW. 2020. Genome mining strategies for ribosomally synthesised and post-translationally modified peptides. Comput Struct Biotechnol J 18:1838–1851. doi: 10.1016/j.csbj.2020.06.032

Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)
