ABSTRACT
Natural microbial communities produce a diverse array of secondary metabolites with ecologically and biotechnologically relevant activities. Some of these metabolites have been used clinically as drugs, and their biosynthetic pathways have been identified in a few culturable microorganisms. However, because the vast majority of microorganisms in nature have not been cultured, identifying the biosynthetic pathways of these metabolites and tracking their hosts remain a challenge. The microbial biosynthetic potential of mangrove swamps remains largely unknown. Here, we examined the diversity and novelty of biosynthetic gene clusters in dominant microbial populations in mangrove wetlands by mining 809 newly reconstructed draft genomes, and we probed the activities and products of these clusters using metatranscriptomic and metabolomic techniques. A total of 3,740 biosynthetic gene clusters were identified in these genomes, including 1,065 polyketide and nonribosomal peptide gene clusters, 86% of which showed no similarity to known clusters in the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) repository. Of these gene clusters, 59% were harbored by new species or lineages of Desulfobacterota-related phyla and Chloroflexota, whose members are highly abundant in mangrove wetlands and for which few natural products have been reported. Metatranscriptomics revealed that most of the identified gene clusters were active in field and microcosm samples. Untargeted metabolomics of the sediment enrichments showed that 98% of the mass spectra generated could not be matched to known compounds, further supporting the novelty of these biosynthetic gene clusters. Our study taps into a corner of the microbial metabolite reservoir in mangrove swamps, providing clues for the discovery of new compounds with valuable activities.
IMPORTANCE At present, the majority of known clinical drugs originate from cultivated species of a few bacterial lineages. Exploring the biosynthetic potential of as-yet-uncultivated microorganisms with new techniques is therefore vital for the development of new pharmaceuticals. Using the large number of genomes reconstructed from mangrove wetlands, we identified abundant and diverse biosynthetic gene clusters in previously unsuspected phylogenetic groups. These gene clusters exhibited a variety of organizational architectures, especially the nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) clusters, implying the presence of new compounds with valuable activities in the mangrove swamp microbiome.
KEYWORDS: biosynthetic gene cluster, mangrove swamps, metabolomic, metagenomics, metatranscriptomic, natural product
INTRODUCTION
Many microorganisms in nature produce a diverse array of secondary metabolites. In microbial communities, these metabolites serve various ecological functions, including interspecies communication and competition (1). Beyond their biological functions, these natural products are important sources of antibiotics, antitumor and antiviral drugs, and other useful compounds such as food additives and cosmeceuticals (2). Among these secondary metabolites, polyketides, nonribosomal peptides, and ribosomally synthesized and posttranslationally modified peptides (RiPPs) have received the most attention, as most clinical drugs belong to these classes (3, 4).
At present, the majority of known clinical drugs originated from cultivated species of a few bacterial lineages, including Actinomycetes, myxobacteria, Pseudomonas, and Bacillus, which constitute only a minority of natural microbial communities (5). Although some bioactive compounds with broad antimicrobial activities have been identified from pure cultures of Ktedonobacteria of the phylum Chloroflexi, they have not been purified for in-depth downstream analyses (6). Microbial surveys of various natural environments have revealed that 70% of all known prokaryotic phyla have not yet been cultivated (7); these phyla represent a largely untapped pool of new compounds. Because the genes encoding a metabolite biosynthetic pathway are usually clustered on the microbial genome, several computational tools, such as antiSMASH (8) and Prediction Informatics for Secondary Metabolomes (PRISM) (9), have been developed to identify these biosynthetic gene clusters (BGCs). Using these tools, Crits-Christoph et al. identified many new BGCs in hundreds of metagenome-assembled genomes (MAGs) reconstructed from grassland soils and belonging to previously understudied phyla (10). Long-read metagenomic sequencing of soil samples from Antarctica revealed a number of highly divergent BGCs, including a new family of RiPPs, in contigs or MAGs belonging to the actinobacterial classes Acidimicrobiia and Thermoleophilia and the gammaproteobacterial order UBA7966 (11). Beyond bacteria, archaea from marine environments (i.e., Heimdallarchaeota and “Candidatus Lokiarchaeota”) were also found to contain potentially novel BGCs (12). Recently, the biosynthetic potential of the open-ocean microbiome was investigated using 10,000 microbial genomes from cultivated isolates and single cells as well as 25,000 MAGs from seawater samples, revealing approximately 40,000 novel BGCs (13). Experimental characterization of two RiPP pathways from that study revealed unusual enzymology and compound structures (13). These studies have opened avenues for tapping into the biosynthetic potential of uncultivated microorganisms.
Mangrove swamps are distributed along tropical and subtropical coasts across the globe and are found in over 100 countries (14). They cover nearly 60 to 70% of the world’s tropical and subtropical coastlines and are considered highly productive ecosystems (15). Mangrove foliage and nutrients transported by runoff and tides make these wetlands a paradise for bacteria, other decomposers, and filter feeders. Although mangrove ecosystems harbor extremely rich microbial communities (15), the biosynthetic potential of their microbiome and its value as a reservoir of natural products remain largely uncharted. Here, we reconstructed 1,803 draft genomes from mangrove swamp metagenomes and identified large numbers of novel BGCs in dominant microbial populations, most of which have not previously been reported to have biosynthetic potential. We found that many of the BGCs were harbored by newly identified species or lineages and were active in in situ sediment or in microcosm samples with nutrient addition. We also identified many unusual nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) gene clusters with organizational architectures that have not been described previously.
RESULTS AND DISCUSSION
We generated 21 metagenomes from mangrove wetland sediments collected from Techeng Island and Tongming Port in Zhanjiang, Guangdong Province, and from Qingmei Port in Sanya, Hainan Province (see Fig. S1a in the supplemental material), with sequencing data ranging from 60 to 120 Gbp per sample (Table S1). De novo assembly and binning of contigs led to the reconstruction of 1,803 draft genomes (≥70% completeness and ≤10% contamination), of which 1,588 were affiliated with bacteria and 215 with archaea. We focused on the biosynthetic potential of the newly reconstructed genomes of Chloroflexota (225), Proteobacteria (284), and five Desulfobacterota-related phyla: Desulfobacterota (213), Desulfobacterota_B (13), and Desulfobacterota_D, Desulfobacterota_E, and Desulfobacterota_F (74 combined) (Table S2). These populations were targeted because they were highly abundant at our sampling sites (Fig. 1a) and in other mangrove swamps worldwide (Fig. S1b and c). Using 95% whole-genome average nucleotide identity (ANI) as a cutoff, we grouped these wetland-derived MAGs into 533 species-level clusters (Fig. 1b), 99% of which were not represented in the Genome Taxonomy Database (GTDB) (Fig. 1b). Using the GTDB Toolkit (GTDB-Tk), we identified 33 new genera and 1 new order among the Chloroflexota MAGs and 5 new genera and 1 new order among the MAGs of Desulfobacterota-related phyla (Fig. S2 and S3).
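The species-level grouping described above can be illustrated with a minimal Python sketch. The text does not name the clustering software, so this sketch assumes pairwise ANI values have already been computed by some ANI tool and applies simple single-linkage grouping at 95%; the function names, genome IDs, and the choice of taking the larger of the two directional ANI values are illustrative assumptions, not the authors' pipeline. A cluster containing no GTDB reference genome is counted as a species not represented in GTDB.

```python
from itertools import combinations


def ani_species_clusters(genomes, ani, cutoff=95.0):
    """Group genomes into species-level clusters by single linkage at an ANI cutoff.

    `ani` maps (genome_a, genome_b) pairs to percent ANI; missing pairs count as 0.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the cluster root, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # ANI can differ slightly by direction; take the larger value (an assumption).
        value = max(ani.get((a, b), 0.0), ani.get((b, a), 0.0))
        if value >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


def novel_species(clusters, reference_ids):
    """Return clusters that contain no reference (e.g., GTDB) genome."""
    return [c for c in clusters if not any(g in reference_ids for g in c)]


genomes = ["mag1", "mag2", "mag3", "ref1"]  # hypothetical IDs
ani = {("mag1", "mag2"): 97.1, ("mag2", "ref1"): 88.0, ("mag3", "ref1"): 96.0}
clusters = ani_species_clusters(genomes, ani)
print(clusters, novel_species(clusters, {"ref1"}))
```

Single linkage can chain distinct species together through intermediate genomes; dereplication tools often use average-linkage or centroid-based clustering instead, so results on real data may differ.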
FIG 1.

Composition of the microbial communities and diversity of their genomes and biosynthetic gene clusters in mangrove swamps. (a) Relative abundances of microbial phyla determined from 16S rRNA genes assembled from metagenomes of mangrove wetlands. (b) UpSet plot of the species represented by the reconstructed genomes. The blue bars on the left show the numbers of distinct species identified from the reconstructed genomes of the three phyla studied and from the corresponding reference genomes in the GTDB, obtained by grouping these genomes into species-level clusters (95% average nucleotide identity). The two charts on the right (black dots and black bars) indicate the number of species unique to each data set (a single black dot, with no intersection with the other three data sets) or shared among data sets (more than one black dot). The data reveal that 99% of the species represented by the reconstructed genomes are not represented in the GTDB. (c) Distribution of the types of biosynthetic gene clusters predicted by antiSMASH. “Other” gene clusters include homoserine lactone, lasso peptide, lanthipeptide, and unclassified clusters. (d) Number and type of biosynthetic gene clusters in each phylum studied. (e) Number and type of NRPS and PKS gene clusters in each phylum studied. The small images in the top right corners of panels d and e indicate the average number of gene clusters per genome for the studied phyla. NAGGN, N-acetylglutaminylglutamine amide.
Diversity and novelty of BGCs in these wetland-related microbial populations.
Using antiSMASH 6.0, we identified 3,740 BGCs within the genomes, 2,041 of which were located on contigs ≥10 kb long (Fig. 1c and Table S3). Clustering these BGCs with BiG-SCAPE (16) yielded 79 gene cluster families (GCFs) containing at least five gene clusters each (Fig. S4). None of these GCFs were represented in the Minimum Information about a Biosynthetic Gene Cluster (MIBiG) repository. The predicted gene clusters are expected to synthesize a variety of secondary metabolites, such as polyketides, aryl polyenes, nonribosomal peptides, betalactones, terpenes, RiPPs (including ranthipeptides and thiopeptides), ectoine, and phosphonates. The most prevalent were terpene gene clusters (18.5%), followed by NRPS (15.1%), PKS (13%), and RiPP-like (11.1%) gene clusters (Fig. 1c). These BGCs were widely distributed across the seven phyla studied (Fig. 1d). NRPS and PKS gene clusters were more common in Desulfobacterota-related phyla (60.3%) and Proteobacteria (67.6%) than in Chloroflexota (33.3%) (Table S3). RiPP-like clusters, including ranthipeptide and thiopeptide clusters, were more abundant in Desulfobacterota-related phyla (34.3%) than in Chloroflexota (27.6%) and Proteobacteria (27.8%). At present, many clinical drugs are produced by a small number of bacterial isolates of Actinobacteria, Proteobacteria, and Bacillus (5). Recently, mining of MAGs revealed large numbers of biosynthetic genes in newly discovered members of the Acidobacteria, Verrucomicrobia, and Gemmatimonadetes (10). In contrast, secondary metabolite production by members of the Chloroflexota and the Desulfobacterota-related phyla has been poorly investigated. Here, we identified large numbers of BGCs in these wetland-associated lineages for the first time, providing a genomic context for more efficient deciphering of their natural products.
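A GCF counts as novel here when none of its members is a MIBiG reference. The following hypothetical sketch applies the two filters from the text (at least five member clusters, no MIBiG member) to a list of (BGC name, family ID) pairs, such as might be parsed from BiG-SCAPE clustering output. The input shape, the regular expression for MIBiG accessions (for example, BGC0000001), and counting only non-MIBiG members toward the size threshold are assumptions.

```python
import re
from collections import defaultdict

# MIBiG accessions look like BGC0000001 ("BGC" followed by seven digits).
MIBIG_ID = re.compile(r"^BGC\d{7}$")


def novel_gcfs(assignments, min_size=5):
    """Return {family: sorted query BGCs} for families with >= min_size
    query clusters and no MIBiG reference member."""
    families = defaultdict(list)
    for bgc, family in assignments:
        families[family].append(bgc)

    result = {}
    for family, members in families.items():
        queries = [m for m in members if not MIBIG_ID.match(m)]
        has_reference = len(queries) < len(members)
        if len(queries) >= min_size and not has_reference:
            result[family] = sorted(queries)
    return result


# Hypothetical input: family 2 contains a MIBiG reference, family 3 is too small.
assignments = (
    [(f"c{i}", 1) for i in range(5)]
    + [(f"d{i}", 2) for i in range(5)] + [("BGC0000001", 2)]
    + [(f"e{i}", 3) for i in range(3)]
)
print(novel_gcfs(assignments))
```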
NRPS and PKS gene clusters are of particular interest because they produce many clinically used drugs, including the erythromycin, tetracycline, penicillin, cephalosporin, and vancomycin families and many others (17). These clusters encode enzymes organized in assembly lines whose coordinated, multistep actions select, activate, and condense building blocks and modify the growing chain, ultimately yielding highly diverse and extensively modified compounds. We identified 1,065 PKS, NRPS, and hybrid PKS-NRPS gene clusters in these wetland-associated genomes, 540 of which were located on contigs >10 kb long (Fig. 1e and Table S3). The average number of NRPS and PKS clusters per genome was highest in Desulfobacterota_B (4.5), followed by Desulfobacterota_D and Desulfobacterota (Fig. 1e). We also identified 178 NRPS, 126 PKS, and 58 hybrid gene clusters on >10-kb contigs from newly identified lineages of these phyla (Table S3). PKSs fall into three types (types I, II, and III) that differ in the organization of their catalytic domains. We did not identify any canonical type II PKS but detected a considerable number of heterocyst glycolipid synthase-like PKS (HglE-KS) clusters, which were originally found in cyanobacteria and are thought to be involved in the synthesis of unusual heterocyst glycolipids (18). The presence of these HglE-KS clusters implies that these wetland microorganisms may be capable of producing glycolipids with complex chemical structures. Although NRPS and PKS gene clusters are diverse in their domain arrangements, the catalytic sites of many NRPSs and type I PKSs are organized in a modular fashion, which allows the core chemical structures of their products to be predicted (19). In this study, we obtained the likely chemical structures of the products of 439 NRPS and PKS clusters using antiSMASH (Table S3). In addition to canonical NRPS and PKS gene clusters, we identified many unusual hybrid gene clusters and NRPS-like clusters (Text S1, Fig. S5 to S7, and Table S4). The core enzymes of most of these were organized in a nonmodular way, highlighting their diverse architectures (Fig. S6).
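For modular assembly lines, the colinearity rule reads the product backbone module by module: each NRPS module contributes the amino acid selected by its adenylation (A) domain, and each type I PKS module contributes an extender unit whose beta-carbon is left as a ketone, hydroxyl, alkene, or fully reduced methylene depending on which ketoreductase (KR), dehydratase (DH), and enoylreductase (ER) domains are present. The sketch below is a simplified illustration of that textbook rule, not antiSMASH's or PRISM's implementation. It assumes A-domain specificities are already predicted, treats every listed domain as active, and cannot handle the nonmodular architectures noted above; the module dictionaries and extender labels ("mal", "mmal") are hypothetical.

```python
def pks_beta_state(domains):
    """Beta-carbon outcome of one PKS module from its reductive domains."""
    if "KR" not in domains:
        return "keto"          # no reduction
    if "DH" not in domains:
        return "hydroxyl"      # KR only
    if "ER" not in domains:
        return "alkene"        # KR + DH
    return "methylene"         # KR + DH + ER: fully reduced


def colinear_backbone(modules):
    """Concatenate per-module building blocks in assembly-line order."""
    units = []
    for module in modules:
        if module["type"] == "NRPS":
            units.append(module["substrate"])  # predicted A-domain specificity
        elif module["type"] == "PKS":
            state = pks_beta_state(module["domains"])
            units.append(f'{module["extender"]}({state})')
        else:
            raise ValueError(f"unknown module type: {module['type']}")
    return "-".join(units)


modules = [
    {"type": "NRPS", "substrate": "Val"},
    {"type": "PKS", "extender": "mal", "domains": {"KS", "AT", "KR", "ACP"}},
    {"type": "PKS", "extender": "mmal", "domains": {"KS", "AT", "DH", "ER", "KR", "ACP"}},
    {"type": "NRPS", "substrate": "Cys"},
]
print(colinear_backbone(modules))  # Val-mal(hydroxyl)-mmal(methylene)-Cys
```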
We queried the proteins of the 1,065 PKS and NRPS gene clusters described above against protein sequences from the MIBiG repository using BLASTP. MIBiG sequences are manually curated and have known functions. Among the best hits covering ≥50% of the alignment length, 63% of the predicted proteins showed <40% amino acid identity to reference sequences, and only 0.4% showed >70% identity. Overall, the predicted proteins in these clusters shared an average of only 37.5% amino acid identity with their top MIBiG hit. Furthermore, the ketosynthase and condensation domains of these gene clusters were compared with protein sequences in the Natural Product Domain Seeker (NaPDoS) database (20) and subjected to phylogenetic analysis. Most of these domains were distantly related to the reference domains in NaPDoS and formed distinct, divergent clusters (Fig. S8). It has been suggested that genetic changes within BGCs lead to the invention of new molecules (21); our analyses therefore suggest that these wetland-associated microorganisms are likely to produce many novel compounds with potential pharmaceutical value. Degenerate PCR primers have been used to investigate the diversity of PKS and NRPS gene clusters in the environment (22, 23); however, they may fail to amplify novel, divergent sequences. We assessed in silico how effectively previously used primers targeting conserved domains of PKS and NRPS gene clusters would amplify the clusters reported in this study and found that only 68 of the 1,065 clusters could be amplified (Table S5).
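In silico primer screening can be sketched as exact matching of degenerate primers, expanded from IUPAC codes, against each cluster sequence. This Python sketch assumes perfect primer matches on one template strand, a hypothetical maximum product size, and no mismatch tolerance; the source does not state which tool or matching rules were used, and real in silico PCR tools usually allow some mismatches. It illustrates why primers designed from known domains can miss divergent sequences: a single base outside a primer's degeneracy prevents a match.

```python
import re

# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of every IUPAC code (e.g., R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_sites(template, primer):
    """Start positions where a degenerate primer matches exactly (overlaps allowed)."""
    pattern = "".join(f"[{IUPAC[base]}]" for base in primer.upper())
    return [m.start() for m in re.finditer(f"(?=({pattern}))", template.upper())]


def amplicon_lengths(template, fwd, rev, max_len=3000):
    """Product sizes for a forward/reverse primer pair on one template strand."""
    fwd_sites = primer_sites(template, fwd)
    # The reverse primer anneals to the opposite strand, so search for its reverse complement.
    rev_sites = primer_sites(template, reverse_complement(rev))
    sizes = []
    for f in fwd_sites:
        for r in rev_sites:
            size = r + len(rev) - f
            if r >= f + len(fwd) and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


template = "AAA" + "GGCTA" + "C" * 8 + "TTGAC" + "AAA"  # hypothetical sequence
print(amplicon_lengths(template, "GGYTA", "GTCRA"))
```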
Unusually abundant BGCs were identified in new lineages of Desulfobacterota.
Three high-quality genomes (h03s_bin53, h03m_bin77, and T15_bin234) belonging to newly identified genera of Desulfobacterota or a novel family of Desulfobacterota_B contained unusually large numbers of BGCs (Table S2). We refer to these three genomes as “Candidatus Desulfoinsularis” (h03s_bin53), “Ca. Desulfopalusris” (h03m_bin77), and “Ca. Desulfoporticola” (T15_bin234). A total of 24 BGCs, with a combined length of 525 kb, were identified in the 5.8-Mb “Ca. Desulfoinsularis” genome: 21 PKS and NRPS clusters, 1 aryl polyene cluster, 1 resorcinol cluster, and 1 acyl amino acid cluster. Six of these PKS and NRPS clusters were particularly large, ranging from 22 to 40 kb (Fig. 2a). Although antiSMASH treats aryl polyene clusters as a separate class, previous studies have shown that aryl polyenes are produced by a novel type II PKS that is widespread among Gram-negative bacteria (24, 25). The aryl polyene cluster of “Ca. Desulfoinsularis” spans approximately 45 kb and contains 50 coding regions, typical of type II PKS gene clusters. The 7.7-Mb “Ca. Desulfopalusris” genome and the 7.1-Mb “Ca. Desulfoporticola” genome each contain 17 BGCs, with combined lengths of 476 kb and 438 kb, respectively. Both genomes carry unusually large PKS and NRPS clusters more than 25 kb long (Fig. 2b and c). In addition, they harbor an aryl polyene cluster, a lanthipeptide cluster, an RiPP recognition element (RRE)-containing cluster, a terpene cluster, a betalactone cluster, and a homoserine lactone cluster. The BGCs of all three genomes were further verified using PRISM (26) (Fig. S9). Many antibiotics inhibit the growth of other bacteria at high concentrations, but it has been suggested that at low concentrations they may mediate microbial interactions and communication (27). Metabolites from the Desulfobacterota-related phyla may thus serve as signals in the natural environment and play important roles in the assembly of wetland microbial communities. Our data reveal that the presence of large numbers of BGCs is not unique to known lineages of the Actinomycetales, Proteobacteria, Cyanobacteria, Bacilli, and Acidobacteria (10).
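To put the cluster lengths reported above in proportion, a short calculation divides the combined BGC length by genome size. The sketch uses only the numbers stated in the text and assumes the reported cluster lengths do not overlap; because these are draft genomes, the true genome sizes, and thus the percentages, are approximate.

```python
# (number of BGCs, combined BGC length in kb, genome size in Mb) as reported above
GENOMES = {
    "Ca. Desulfoinsularis": (24, 525, 5.8),
    "Ca. Desulfopalusris": (17, 476, 7.7),
    "Ca. Desulfoporticola": (17, 438, 7.1),
}


def bgc_genome_percent(bgc_kb, genome_mb):
    """Percentage of a genome occupied by BGCs (1 Mb = 1,000 kb)."""
    return 100 * bgc_kb / (genome_mb * 1000)


def mean_bgc_kb(n_bgcs, bgc_kb):
    """Average BGC length in kb."""
    return bgc_kb / n_bgcs


for name, (n, kb, mb) in GENOMES.items():
    print(f"{name}: {bgc_genome_percent(kb, mb):.1f}% of genome, "
          f"{mean_bgc_kb(n, kb):.1f} kb per BGC on average")
```

By this estimate, BGCs occupy roughly 9% of the “Ca. Desulfoinsularis” genome and about 6% of each of the other two genomes.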
FIG 2.

Large NRPS and PKS clusters identified in three high-quality genomes (“Ca. Desulfoinsularis,” “Ca. Desulfopalusris,” and “Ca. Desulfoporticola”) belonging to newly identified lineages of Desulfobacterota and Desulfobacterota_B. The types of genes and domains predicted in the clusters are shaded in different colors. The positions of NRPS and PKS domains in the genome are indicated for the core biosynthetic genes. The identifiers for the biosynthetic gene clusters are provided in Table S3 in the supplemental material. A, adenylation domain; ACP, acyl carrier protein; AT, acyltransferase; C, glycopeptide condensation; CAL, coenzyme A ligase; cMT, carbon methyltransferase; DH, dehydratase; DHt, dehydratase domain variant more commonly found in trans-acyltransferase PKS clusters; KR, ketoreductase; KS, ketosynthase; NAD, male sterility protein; PCP, peptidyl carrier protein; TD, terminal reductase; PP, phosphopantetheine; TE, thioesterase.
A small subset of BGCs is widespread in mangrove wetlands.
We examined the distribution and abundance of each identified BGC longer than 15 kb across 114 publicly available metagenomes from mangrove wetlands around the world by mapping reads to these BGCs using DIAMOND BLASTX. Of the 968 clusters analyzed, 707 were detected in at least one of these mangrove wetland samples, and 185 were present in >20% of the samples (regarded as a cosmopolitan distribution) (Table S6). These widely distributed clusters fall into 17 BGC types, of which the betalactone, terpene, NRPS, and PKS types are more abundant than the others (Fig. 3a).
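The prevalence criterion can be written as a small function. This sketch assumes that a BGC's detection in each metagenome has already been decided from the DIAMOND BLASTX mapping (the actual abundance-score threshold is described in Materials and Methods and is not reproduced here); function, BGC, and sample names are hypothetical. With 114 metagenomes, “present in >20% of the samples” means detection in at least 23 samples.

```python
def classify_prevalence(presence, n_samples, threshold=0.20):
    """Return (detected, cosmopolitan) BGC ids, each sorted.

    `presence` maps a BGC id to the set of metagenome ids where it was detected.
    """
    detected = sorted(b for b, samples in presence.items() if samples)
    cosmopolitan = sorted(
        b for b, samples in presence.items() if len(samples) / n_samples > threshold
    )
    return detected, cosmopolitan


def min_samples_for_cosmopolitan(n_samples, threshold=0.20):
    """Smallest number of samples whose fraction exceeds the threshold."""
    return next(k for k in range(n_samples + 1) if k / n_samples > threshold)


presence = {
    "bgcA": {"s1", "s2", "s3"},  # 3 of 10 samples: cosmopolitan
    "bgcB": {"s1", "s2"},        # exactly 20%: detected but not > 20%
    "bgcC": set(),               # never detected
}
print(classify_prevalence(presence, n_samples=10))
print(min_samples_for_cosmopolitan(114))
```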
FIG 3.
Widely distributed biosynthetic gene clusters detected in mangrove swamps around the world. (a) Numbers and types of widely distributed gene clusters. hserlactone, homoserine lactone. (b) Heat map showing NRPS and PKS clusters that are present in >20% of the metagenomes. The abundance scores of the gene clusters in the metagenomes are indicated by a color gradient ranging from 3 to 1,000 (see Materials and Methods for details). CDPS, tRNA-dependent cyclodipeptide synthases; LAP, linear azol(in)e-containing peptides.
Nine aryl polyene clusters are widely distributed in mangrove wetlands, two of which show high abundance in most metagenomes (one detected in 97% and the other in 48% of samples) (Fig. 3b). A further survey of the 1,588 bacterial MAGs from mangrove wetlands identified a total of 509 aryl polyene gene clusters distributed across 21 of 57 bacterial phyla (Fig. S10). These findings suggest that aryl polyenes play important roles in the development of microbial communities in mangrove wetlands. We detected 61 cosmopolitan NRPS, PKS, and hybrid clusters (Fig. 3b), 70% of which have no homologs in the MIBiG repository. The wide distribution of these novel clusters should facilitate cloning of their full-length gene clusters from environmental DNA libraries and their further characterization by heterologous expression. A distinctive class of antibiotics, the malacidins, which are prevalent in soil microbiomes, was recently discovered using a culture-independent platform (28). Two NRPS clusters, NRPS122 and NRPS10, share >50% sequence similarity in their core biosynthetic domains with the BGCs encoding anabaenopeptin and vioprolide synthesis, respectively (Fig. S11). Although the additional genes of these BGCs showed no similarity, their products may share similar chemistries. Anabaenopeptins are a family of cyclic hexapeptides produced by cyanobacteria that act as potent protease inhibitors (29), whereas vioprolides, produced by Cystobacter violaceus Cb vi35, are a promising class of anticancer and antifungal lead compounds (U.S. patents US20100028298 and US20110173707) (30). We also found eight ranthipeptide clusters and five thiopeptide clusters that are widely distributed in mangrove wetlands (Fig. 3a). Thiopeptides, which are extensively modified RiPPs, have been shown to exhibit significant antibacterial activity against a number of Gram-positive pathogens (25). Our observations suggest that BGCs producing compounds with ecologically or biotechnologically relevant activities are abundant in mangrove wetland microbiomes.
Metatranscriptomic and metabolomic analyses.
To determine whether the BGCs reported here are active, we performed metatranscriptomic analyses on 10 in situ sediment samples and 8 microcosm samples from a sampling site in the mangrove wetland of Tongming Port. High-quality total RNA is difficult to extract from wetland sediment: after trial and error, RNA from only 1 of the 10 in situ samples (a surface sample) met the requirements for sequencing. This may be due to the highly complex humus in the sediment as well as the high diversity of its microorganisms. To overcome this challenge, we incubated the sediment samples in microcosms amended with glucose or methanol for 15 days. These nutrients cause specific microbial populations to flourish quickly and become dominant in the communities, and we ultimately obtained sufficient high-quality RNA from all microcosm samples for metatranscriptomic sequencing. Using ribosomal protein S3 (rpS3), we estimated the composition of the microbial communities from these metatranscriptomic data and found that members of the seven phyla studied were consistently present in the microcosms (Fig. S12). To probe the expression of each BGC, the metatranscriptomic reads were mapped to full MAGs by pseudoalignment based on exact k-mer matches using Kallisto (31). Overall, we detected expression at some level for 2,689 of the 3,740 BGCs (Tables S3 and S7), suggesting that most of the BGCs are active. Among these expressed gene clusters, 359 are NRPS and/or PKS clusters, 263 are NRPS-like clusters, 143 are aryl polyene clusters, and 357 are RiPP clusters. Genes were expressed in 181 of the 185 widely distributed clusters, including 21 NRPS and/or PKS clusters, 9 aryl polyene clusters, 7 ranthipeptide clusters, and 5 thiopeptide clusters; 74% of these clusters were actively transcribed in the in situ sediment samples (Table S6), suggesting a potential role in mediating microbe-microbe interactions in natural wetlands. Most of these expressed NRPS and/or PKS clusters (52.4%), three ranthipeptide clusters, and all five thiopeptide clusters are harbored by members of the Desulfobacterota (40 to 54% of these members belong to new genera or families).
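Expression in these analyses is expressed as transcripts per million (TPM): each gene's read count is divided by its length, and the resulting rates are rescaled to sum to one million; Fig. 4a plots log10-transformed TPM. The sketch below shows the calculation and one way to call a cluster expressed (any gene with TPM > 0). Kallisto itself reports TPM computed with effective transcript lengths; the plain lengths, the pseudocount of 1 before the log transform, and the any-gene rule are assumptions for illustration.

```python
import math


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [r / total * 1e6 for r in rates]


def log10_tpm(values, pseudocount=1.0):
    """log10(TPM + pseudocount); the pseudocount keeps unexpressed genes finite."""
    return [math.log10(v + pseudocount) for v in values]


def cluster_expressed(gene_tpms):
    """Treat a BGC as expressed if any of its genes has TPM > 0."""
    return any(v > 0 for v in gene_tpms)


# Hypothetical counts and gene lengths (bp) for three genes of one cluster.
values = tpm([10, 20, 0], [1000, 2000, 500])
print(values, log10_tpm(values), cluster_expressed(values))
```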
Genes within 219 clusters, including 22 NRPS and/or PKS clusters and 3 thiopeptide clusters, were transcribed both in situ and in all microcosm samples throughout the 15-day incubation (Table S7). In addition, 170 gene clusters, including 17 PKS clusters, 23 NRPS-like clusters, and 12 RiPP clusters, were cryptic in the in situ sediment samples but expressed in all incubated samples, suggesting that they respond to substrate addition. These expressed gene clusters are distributed across six of the seven phyla studied (all except Desulfobacterota_F), and 32.6% are in Desulfobacterota-related phyla (Table S7). We detected expression of genes within 17 clusters of “Ca. Desulfoporticola” (Desulfobacterota_B), including 19 genes with NRPS and/or PKS domains, and within 10 clusters of h03b_bin158 (Desulfobacterota), including 7 genes from 1 cosmopolitan thiopeptide cluster (Fig. 4a and Table S7). These genes were continuously expressed throughout the 15-day amendment experiments. For each of the two MAGs, we performed coexpression analyses of all genes across the microcosm time points using the WGCNA package (Fig. 4b and c and Table S8). In “Ca. Desulfoporticola,” genes from a PKS cluster and an NRPS-like cluster were coexpressed with various genes involved in environmental stress responses, such as those encoding chaperonins, heat shock proteins, and type II toxin-antitoxin systems, as well as genes involved in macromolecule transport (e.g., tricarboxylate transporters, extracellular solute-binding proteins, and the outer membrane protein assembly factor BamD) and transcriptional regulation. In h03b_bin158, we detected a module in which genes from the cosmopolitan thiopeptide cluster were coexpressed with genes from ranthipeptide, aryl polyene, and betalactone clusters, suggesting a coordinated response to environmental changes. Homologs of genes encoding β-lactamase, the multidrug efflux transporter AcrB, the Salmonella virulence plasmid 65-kDa B protein, an oligopeptide/dipeptide ABC transporter, and a type II/IV secretion system protein were also coexpressed in this module.
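WGCNA builds its networks from pairwise expression correlations raised to a soft-thresholding power. The sketch below computes only that first step, an unsigned adjacency |cor|^beta between gene expression profiles across time points; WGCNA additionally derives a topological overlap matrix and cuts a hierarchical clustering tree into modules, which this sketch does not attempt. The power beta = 6 is WGCNA's default for its adjacency function rather than a value reported in this study, and the gene profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def soft_adjacency(profiles, beta=6):
    """Unsigned WGCNA-style adjacency |cor(i, j)| ** beta for every gene pair."""
    genes = sorted(profiles)
    return {
        (g1, g2): abs(pearson(profiles[g1], profiles[g2])) ** beta
        for g1, g2 in combinations(genes, 2)
    }


# Hypothetical expression profiles over four microcosm time points.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],   # perfectly correlated with geneA
    "geneC": [4, 3, 2, 1],   # perfectly anticorrelated (unsigned: still adjacent)
    "geneD": [1, 3, 2, 4],   # r = 0.8 with geneA, so adjacency = 0.8 ** 6
}
adjacency = soft_adjacency(profiles)
print(adjacency[("geneA", "geneD")])
```

Raising correlations to a power keeps strong links close to 1 while shrinking weak ones toward 0, which is why WGCNA calls this soft rather than hard thresholding.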
FIG 4.

Metatranscriptomics and metabolomics of biosynthetic gene clusters. (a) Heat map of the expression levels of genes from biosynthetic gene clusters in “Ca. Desulfoporticola” of Desulfobacterota_B and h03b_bin158 of Desulfobacterota in the field and microcosm samples. Expression levels are shown as log10-transformed transcripts per million (TPM). (b and c) Representative transcriptional coexpression network modules identified in the genomes of “Ca. Desulfoporticola” and h03b_bin158. Nodes represent gene transcripts, and edges indicate connectivity between transcripts. Gene functions are shaded in colors. Detailed information for the two modules is provided in Table S8 in the supplemental material. ich-P, citramalyl-CoA hydro-lyase; omp, two-component system; chll, protochlorophyllide reductase subunit ChlL; espd, transporter ESBP6-like. (d to f) Three cases of high similarity between compounds identified by metabolomics (red boxes) and products of biosynthetic gene clusters predicted by antiSMASH or PRISM4 (gray boxes). The similarity of topological fingerprints, computed with the RDKit tool, is shown in the blue boxes. The identifiers for the biosynthetic gene clusters are provided in Table S3 in the supplemental material.
To characterize the products of these BGCs, we conducted metabolomic analyses on 12 microcosm samples 7 days after the addition of glucose or methanol. The resulting tandem mass spectrometry (MS/MS) data were searched against the Global Natural Products Social Molecular Networking (GNPS) database (32). Only 1.6% of the spectra (an average of 21,293 spectra were generated per sample) matched known analogs in the library, corresponding to 140 distinct chemical structures (Table S9). Of these unique compounds, 10 harbor peptide bonds devoid of carbonyl groups, and 21 lack peptide bonds but possess carbonyl groups. We compared the products of the BGCs predicted by antiSMASH or PRISM4 (Tables S3 and S10) with the compounds identified by mass spectrometry, using topological fingerprints computed with RDKit (https://github.com/rdkit/rdkit). Of the 467 available predicted BGC products, 20 had >80% fingerprint similarity to compounds identified in the metabolomic data (Fig. 4d to f and Table S11). To infer the products of novel BGCs, we further used CORASON (16) to examine the evolutionary relationships between BGCs within and across GCFs. Using the 3-oxoacyl-acyl carrier protein (ACP) synthase gene of these PKS clusters as a query, CORASON grouped a family of 28 PKS BGCs into a phylogenetic tree (Fig. S13). When all MIBiG reference BGCs containing a 3-oxoacyl-ACP synthase gene were added, CORASON placed within the tree a BGC from the deep-sea isolate Photobacterium profundum SS9 that synthesizes eicosapentaenoic acid (EPA). EPA was also detected in our metabolomic data, suggesting that these novel PKS clusters may produce molecules with chemical structures analogous to that of EPA. Overall, 98% of the mass spectra could not be matched to known compounds, suggesting that mangrove wetlands contain an abundance of unique secondary metabolites.
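The fingerprint comparison rests on the Tanimoto coefficient: the number of fingerprint bits set in both molecules divided by the number set in either. With RDKit, one would typically compute topological fingerprints with Chem.RDKFingerprint and compare them with DataStructs.FingerprintSimilarity, whose default metric is Tanimoto. To stay runnable without RDKit, this sketch represents fingerprints as Python sets of on-bit indices; the product and compound names and their bit sets are hypothetical, and the >0.8 threshold mirrors the 80% cutoff in the text.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return len(fp_a & fp_b) / union


def fingerprint_matches(predicted, observed, threshold=0.8):
    """(BGC product, detected compound, score) for every pair above the threshold."""
    hits = []
    for p_name, p_fp in predicted.items():
        for o_name, o_fp in observed.items():
            score = tanimoto(p_fp, o_fp)
            if score > threshold:
                hits.append((p_name, o_name, round(score, 3)))
    return sorted(hits)


predicted = {"bgc_product_1": set(range(1, 11)), "bgc_product_2": {20, 21}}
observed = {"compound_x": set(range(1, 10)), "compound_y": {20, 30, 40}}
print(fingerprint_matches(predicted, observed))
```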
Here, we have demonstrated the untapped potential for secondary metabolite production in dominant microbial populations of mangrove swamps. The BGCs we identified displayed intricate and varied architectures, and some are widely distributed in mangrove swamps across the world. Many of the BGCs were harbored by newly identified species or lineages and were active in in situ sediment or in microcosm samples with nutrient addition. The predicted products of dozens of BGCs were linked to metabolomic data, further supporting the activities of these BGCs. Heterologous expression of the new BGCs remains a key strategy for uncovering the chemical structures of their products. Nevertheless, our data support the effectiveness of large-scale genome mining based on MAG reconstruction and provide vital clues for the experimental characterization of gene clusters that produce new compounds with valuable activities.
MATERIALS AND METHODS
Sample collection and metagenomic sequencing.
Seven samples from mangrove swamps of Techeng Island were collected in a previous study of the structure and function of microbial communities in mangrove wetlands (33, 34). Fourteen samples were collected from mangrove wetlands at Qingmei Port in Sanya, Hainan Province, and at Tongming Port in Zhanjiang, Guangdong Province, on 7 November 2020 and 12 June 2021, respectively. Sampling protocols have been described in detail previously (33, 34). Genomic DNA was extracted from ~10 g of wet sediment using a PowerSoil DNA isolation kit (MoBio Laboratories, Carlsbad, CA, USA). Metagenomic sequencing was performed on the HiSeq 2500 platform at Suzhou Genewiz Biotechnology Company (Suzhou, China). Each sample generated 80 to 120 Gbp of sequencing data (2 × 150-bp paired-end reads).
Metagenomic assembly, binning, and genome curation.
The raw reads were trimmed using cutadapt (v.1.9.1) (https://cutadapt.readthedocs.io/en/stable/). The resulting clean reads from the 21 samples were individually assembled de novo using MEGAHIT (v.1.2.9) (35) with the --presets meta-large option. The sequencing coverage of each contig was obtained using Bowtie2 (36), SAMtools (37), and the jgi_summarize_bam_contig_depths script in MetaBAT (v.2.15) (38). The contigs were binned with MetaBAT 8 times, using 8 combinations of specificity and sensitivity parameters (-m 1500, --maxP 90 or 60, --minS 90 or 60, and --maxEdges 200 or 500). The resulting bins were dereplicated, aggregated, and scored using DAS Tool (v.1.1.1) (--score_threshold 0.25) (39) to generate accurate bins. RefineM (v.0.1.2) (40) was used to remove outlier contigs within these bins based on GC content, tetranucleotide frequency, or coverage profiles. Finally, the bins were curated manually to remove further contaminating contigs based on multicopy marker genes. Completeness, contamination, and strain heterogeneity were assessed using CheckM (v.1.1.3) (41). A total of 4,029 individual bins were recovered from all samples, 1,803 of which were high-quality genomes (≥70% completeness and ≤10% contamination). Taxonomy was assigned to these bins using the Genome Taxonomy Database Toolkit (GTDB-Tk) (42). Detailed information on the genome bins is shown in Table S2 in the supplemental material.
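The eight MetaBAT runs correspond to the 2 × 2 × 2 combinations of --maxP, --minS, and --maxEdges, with -m 1500 held fixed. The sketch below only builds the command lines (it does not run them) and applies the completeness/contamination cutoffs stated above. It assumes the metabat2 executable with its -i (assembly), -a (depth file from jgi_summarize_bam_contig_depths), and -o (output prefix) options; the file names, output layout, and the dictionary of CheckM values are hypothetical.

```python
import itertools
import shlex
from typing import Dict, List, Tuple


def metabat_commands(contigs: str, depth: str, out_dir: str) -> List[str]:
    """One metabat2 command per combination of --maxP, --minS and --maxEdges."""
    commands = []
    for max_p, min_s, max_edges in itertools.product((90, 60), (90, 60), (200, 500)):
        # Hypothetical output layout: one subdirectory per parameter set.
        prefix = f"{out_dir}/maxP{max_p}_minS{min_s}_maxE{max_edges}/bin"
        args = ["metabat2", "-i", contigs, "-a", depth, "-o", prefix,
                "-m", "1500", "--maxP", str(max_p), "--minS", str(min_s),
                "--maxEdges", str(max_edges)]
        commands.append(shlex.join(args))  # shell-safe quoting (Python 3.8+)
    return commands


def passes_quality(completeness: float, contamination: float) -> bool:
    """Cutoffs from the text: >=70% completeness and <=10% contamination."""
    return completeness >= 70.0 and contamination <= 10.0


def select_bins(checkm: Dict[str, Tuple[float, float]]) -> List[str]:
    """checkm maps bin name -> (completeness, contamination), e.g. parsed from CheckM output."""
    return sorted(b for b, (comp, cont) in checkm.items() if passes_quality(comp, cont))


for cmd in metabat_commands("sample.contigs.fa", "sample.depth.txt", "bins"):
    print(cmd)
```

Enumerating the grid with itertools.product keeps the eight runs explicit and reproducible rather than hand-written.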
Genomic analysis of biosynthetic gene clusters.
BGCs in the genomes from mangrove wetlands were predicted using antiSMASH 6.0 (8) with the default full-featured run parameters. The predicted BGCs are summarized in Table S3. All NRPS and PKS clusters were confirmed using PRISM4 (26), which was also used to analyze the modular composition of the gene clusters and the chemical structures of their products (summarized in Table S10). The BGCs were clustered into gene cluster families (GCFs) using BiG-SCAPE (16) in “hybrids” mode (v.0.31), and the GCFs were then grouped into gene cluster clans (GCCs). To choose an optimal BiG-SCAPE cutoff, a series of values ranging from 0.05 to 1 was tested based on the numbers of vertices and edges in the network statistics (Fig. S4). Reference BGCs from the MIBiG repository (43) were included in the BiG-SCAPE analysis to assess the novelty of the GCFs. The architectures of the BGCs were visualized using the R package, and the domain organization was manually inspected.
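Choosing the BiG-SCAPE cutoff amounts to asking how the similarity network changes as the distance threshold is relaxed. The sketch below, assuming a precomputed table of pairwise BGC distances (hypothetical values and BGC names), links two BGCs when their distance is below the cutoff and reports linked vertices, edges, and connected components for each cutoff from 0.05 to 1. Connected components are only a rough proxy for GCFs, because BiG-SCAPE itself further splits components into families by affinity propagation; its exact boundary handling is not reproduced here.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def network_stats(distances: Dict[Pair, float], cutoff: float) -> Tuple[int, int, int]:
    """Return (linked vertices, edges, connected components) at one distance cutoff."""
    edges = [pair for pair, dist in distances.items() if dist < cutoff]
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components
    vertices = list(parent)
    components = {find(v) for v in vertices}
    return len(vertices), len(edges), len(components)


def sweep(distances: Dict[Pair, float], cutoffs: List[float]) -> List[Tuple[float, int, int, int]]:
    """Network statistics for every cutoff, as (cutoff, vertices, edges, components)."""
    return [(c, *network_stats(distances, c)) for c in cutoffs]


# 0.05, 0.10, ..., 1.00, mirroring the range tested in the text
cutoffs = [round(0.05 * i, 2) for i in range(1, 21)]

# Hypothetical pairwise distances between BGCs
distances = {("a", "b"): 0.2, ("b", "c"): 0.35, ("c", "d"): 0.6, ("e", "f"): 0.1}
for row in sweep(distances, cutoffs):
    print(row)
```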
To annotate the domains of NRPS and PKS clusters, the domain sequences were extracted using a custom Python script and submitted to the NaPDoS Web server (20) for analysis with default parameters. Maximum likelihood phylogenies of these domains were computed with IQ-TREE (44) using the LG+F+G4 model.
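The custom extraction script is not described further, so the following is only a hypothetical sketch of that step. It assumes domain annotations are available as (protein ID, domain type, start, end) tuples with 1-based inclusive coordinates, keeps ketosynthase (KS) and condensation (C) domains (the domain classes NaPDoS classifies), and writes them as FASTA records. The protein IDs, sequences, and coordinates are invented.

```python
from typing import Dict, Iterable, List, Tuple

# (protein_id, domain_type, start, end); 1-based inclusive coordinates (assumed convention)
Domain = Tuple[str, str, int, int]


def extract_domains(proteins: Dict[str, str], domains: Iterable[Domain],
                    keep: Tuple[str, ...] = ("KS", "C"), width: int = 60) -> str:
    """Return a FASTA string with one record per kept domain, wrapped at `width`."""
    records: List[str] = []
    for prot_id, dom_type, start, end in domains:
        if dom_type not in keep:
            continue  # e.g. skip AT, ACP, KR domains
        seq = proteins[prot_id][start - 1:end]  # convert to 0-based half-open slice
        header = f">{prot_id}_{dom_type}_{start}-{end}"
        wrapped = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
        records.append(f"{header}\n{wrapped}")
    return "\n".join(records) + ("\n" if records else "")


# Hypothetical protein and annotation
proteins = {"p1": "MKT" + "A" * 10 + "LLL"}
print(extract_domains(proteins, [("p1", "KS", 4, 13), ("p1", "AT", 1, 3)]), end="")
```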
To assess whether the NRPS and PKS clusters are novel, they were compared with known gene clusters in the MIBiG repository (43), and the top hit was reported. Several sets of degenerate primers used in previous studies of biosynthetic diversity (23, 45) were used to amplify ketosynthase and adenylation domain genes in NRPS and PKS clusters in silico with the EMBOSS program primersearch (http://emboss.sourceforge.net/).
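In silico amplification asks whether a forward primer and the reverse complement of a reverse primer both bind a template in the right orientation and within a plausible product length. EMBOSS primersearch also tolerates a user-defined percentage of mismatches; the pure-Python sketch below is simpler and accepts only exact matches to IUPAC-degenerate primers. The primer and template sequences are hypothetical and are not the published primers of references 23 and 45.

```python
import re
from typing import List, Tuple

# IUPAC degenerate nucleotide codes as regex character classes (I = inosine, treated as N)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]", "I": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVNI", "TGCAYRSWMKVHDBNN")


def revcomp(seq: str) -> str:
    """Reverse complement, including degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str):
    # Zero-width lookahead so that overlapping binding sites are all reported
    return re.compile("(?=" + "".join(IUPAC[b] for b in primer.upper()) + ")")


def in_silico_pcr(template: str, fwd: str, rev: str,
                  max_len: int = 2000) -> List[Tuple[int, int, str]]:
    """Exact-match amplicons (start, end, sequence) on the given strand of `template`."""
    template = template.upper()
    fwd_sites = [m.start() for m in primer_regex(fwd).finditer(template)]
    # The reverse primer anneals to this strand as its reverse complement
    rev_sites = [m.start() for m in primer_regex(revcomp(rev)).finditer(template)]
    products = []
    for f in fwd_sites:
        for r in rev_sites:
            end = r + len(rev)
            if r >= f + len(fwd) and end - f <= max_len:
                products.append((f, end, template[f:end]))
    return products


def in_silico_pcr_both_strands(template: str, fwd: str, rev: str,
                               max_len: int = 2000) -> List[Tuple[str, int, int, str]]:
    """Scan both strands; minus-strand coordinates refer to the reverse complement."""
    plus = [("+",) + p for p in in_silico_pcr(template, fwd, rev, max_len)]
    minus = [("-",) + p for p in in_silico_pcr(revcomp(template), fwd, rev, max_len)]
    return plus + minus


# Hypothetical template and degenerate primers
template = "TTTGCATACCCCCCGGATCTTT"
print(in_silico_pcr_both_strands(template, "GCNTA", "GAYCC"))
```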
Building of species trees.
Bacterial reference genomes from Desulfobacterota-related phyla, Chloroflexota, and Proteobacteria were downloaded from the GTDB (https://gtdb.ecogenomic.org/). Species trees were built from a concatenated alignment of the 120 bacterium-specific marker proteins used by the GTDB. Homologs of these marker proteins were identified using HMMER (46) within GTDB-Tk (42). Maximum likelihood trees were computed with IQ-TREE using the LG+F+I+G4 model. An Acidobacteriota genome (GenBank accession no. GCA_003524335.1) was used as the outgroup for these trees.
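GTDB-Tk can produce the concatenated marker alignment itself; the sketch below only illustrates the concatenation logic behind such a supermatrix. It assumes each marker has already been aligned and is given as a dictionary of genome name to aligned sequence, and that genomes lacking a marker are padded with gaps so every row has the same length. The genome names and sequences are hypothetical.

```python
from typing import Dict, List


def concatenate(alignments: List[Dict[str, str]], taxa: List[str]) -> Dict[str, str]:
    """Join per-marker alignments column-wise; taxa lacking a marker get an all-gap block."""
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("each marker alignment must be non-empty with equal-length rows")
        width = widths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, "-" * width))
    return {t: "".join(blocks) for t, blocks in parts.items()}


def to_fasta(matrix: Dict[str, str]) -> str:
    """Serialize the supermatrix as unwrapped FASTA, e.g. as input for IQ-TREE."""
    return "".join(f">{t}\n{seq}\n" for t, seq in matrix.items())


# Two hypothetical marker alignments over three genomes
markers = [{"A": "MK-", "B": "MKL"}, {"A": "GG", "C": "GA"}]
print(to_fasta(concatenate(markers, ["A", "B", "C"])), end="")
```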
Distribution of biosynthetic gene clusters.
The protein sequences of the BGCs identified in this study were used to construct a database. Reads from 114 publicly available metagenomes from mangrove wetlands around the world were searched against this database using DIAMOND BLASTX (47) (E value of <1e−5, identity of ≥80%, and alignment length of ≥90%), and only the top hit per read was retained. A gene was regarded as present in a metagenomic sample if the reads mapped to it covered at least 50% of its length. To ensure accuracy, nonbiosynthetic genes in BGCs (e.g., transporters, transcriptional regulators, and transposases) were removed as previously described (48). Because each BGC contains multiple genes, a BGC was regarded as present only when ≥50% of its biosynthetic genes were detected in a metagenome. Gene abundance was estimated using a custom Python script as follows: abundance score of a gene = average frequency of all nucleotides matched in the gene × percent coverage of reads recruited to the gene. The abundance of a BGC is the average abundance score of its genes detected in the metagenome. BGC abundance values of >1,000 were capped at 1,000.
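The presence and abundance rules above can be sketched as follows. This is a hypothetical re-implementation, not the authors' script, and it rests on several assumptions: hits are dictionaries parsed from DIAMOND tabular output (E value, percent identity, bit score, and start/end on the gene); "alignment length of ≥90%" is read as the aligned fraction of the read; "average frequency of all nucleotides matched" is read as mean read depth over covered positions of the gene; and coverage is expressed as a fraction (0 to 1) because the text does not say whether a 0-to-100 percentage was used. Read and gene names are invented.

```python
from typing import Dict, Iterable, List, Optional


def best_hits(hits: Iterable[dict], max_evalue: float = 1e-5,
              min_identity: float = 80.0, min_aln_frac: float = 0.90) -> List[dict]:
    """Apply the E-value/identity/alignment-length filters, then keep the top hit per read."""
    best: Dict[str, dict] = {}
    for h in hits:
        if h["evalue"] >= max_evalue or h["identity"] < min_identity or h["aln_frac"] < min_aln_frac:
            continue
        current = best.get(h["read"])
        if current is None or h["bitscore"] > current["bitscore"]:
            best[h["read"]] = h
    return list(best.values())


def gene_scores(hits: Iterable[dict], gene_lengths: Dict[str, int],
                min_coverage: float = 0.5) -> Dict[str, float]:
    """Score = mean depth over covered positions x fraction of the gene covered."""
    depth = {g: [0] * n for g, n in gene_lengths.items()}
    for h in hits:
        d = depth[h["gene"]]
        for pos in range(h["start"] - 1, h["end"]):  # 1-based inclusive coordinates on the gene
            d[pos] += 1
    scores = {}
    for gene, d in depth.items():
        covered = [x for x in d if x > 0]
        coverage = len(covered) / len(d)
        if covered and coverage >= min_coverage:  # gene present only if >=50% is covered
            scores[gene] = (sum(covered) / len(covered)) * coverage
    return scores


def bgc_abundance(biosynthetic_genes: List[str], scores: Dict[str, float],
                  min_fraction: float = 0.5, cap: float = 1000.0) -> Optional[float]:
    """Mean score of detected genes, capped; None if <50% of biosynthetic genes are detected."""
    detected = [scores[g] for g in biosynthetic_genes if g in scores]
    if not biosynthetic_genes or len(detected) < min_fraction * len(biosynthetic_genes):
        return None  # BGC regarded as absent from this metagenome
    return min(sum(detected) / len(detected), cap)


# Hypothetical hits against two 10-residue genes
hits = [
    {"read": "r1", "gene": "g1", "evalue": 1e-20, "identity": 95.0, "aln_frac": 1.0, "bitscore": 80, "start": 1, "end": 6},
    {"read": "r1", "gene": "g2", "evalue": 1e-10, "identity": 85.0, "aln_frac": 1.0, "bitscore": 40, "start": 1, "end": 6},
    {"read": "r2", "gene": "g1", "evalue": 1e-20, "identity": 90.0, "aln_frac": 0.95, "bitscore": 70, "start": 5, "end": 10},
    {"read": "r3", "gene": "g2", "evalue": 1e-8, "identity": 88.0, "aln_frac": 1.0, "bitscore": 30, "start": 1, "end": 3},
    {"read": "r4", "gene": "g2", "evalue": 1e-30, "identity": 70.0, "aln_frac": 1.0, "bitscore": 90, "start": 1, "end": 10},
]
scores = gene_scores(best_hits(hits), {"g1": 10, "g2": 10})
print(scores, bgc_abundance(["g1", "g2"], scores))
```

In this toy input, g1 is fully covered at a mean depth of 1.2 and counts as present, whereas g2 is covered over only 30% of its length and is treated as absent.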
Metatranscriptomic analysis of field and microcosm samples.
Surface and subsurface sediment samples (0- to 10-cm depth) were collected from a mangrove swamp at Tongming Port in Zhanjiang, Guangdong Province, on 19 October 2021. For in situ metatranscriptomic analysis, each sediment core was mixed in a plastic bag, immediately frozen in liquid N2, and stored in a cooler with dry ice until RNA extraction. Samples for microcosm experiments were stored at 4°C after collection. In the laboratory, these sediment samples were mixed with autoclaved artificial seawater at a 1:1 ratio to make a slurry, which was evenly distributed into 12 capped brown culture bottles (25 mL of slurry each). Glucose or methanol was added to a final concentration of 10 mM, and the closed bottles were incubated at 25°C in the dark for 15 days. Samples for RNA extraction were taken 7, 9, 12, and 15 days after substrate addition. RNA was extracted using RNeasy PowerSoil total RNA kits according to the manufacturer’s protocol, and rRNA was removed from the total RNA using an rRNA removal kit. cDNA libraries were constructed and sequenced on the NovaSeq 6000 platform at Suzhou Genewiz Biotechnology Company (Suzhou, China). Each sample generated 20 Gbp of sequencing data (2× 150-bp paired-end reads).
A custom Python script was used to retrieve the antiSMASH 6-predicted gene sequences from all genomes reconstructed in this study. These gene sequences were used to build a Kallisto index (31), and their transcript abundances were quantified by Kallisto pseudoalignment of paired reads with default parameters (31). Weighted gene coexpression network analysis was conducted using the WGCNA package (49) on BGC genes from the two genomes (T15_bin234 and h03b_bin158) that showed continuous expression during the microcosm experiments. Transcripts per million (TPM) values for each gene were log10 transformed. Soft-thresholding powers of 18 and 10 were selected for the two genomes, respectively, because at these powers the scale-free topology fit index exceeded 0.8. Pearson correlations were used to construct the adjacency matrix. Modules were identified using blockwiseModules with a minimum module size of 30 (minModuleSize=30) and a merge cut height of 0.25 (mergeCutHeight=0.25). The resulting modules were exported and visualized using Cytoscape (https://cytoscape.org/). Genes in modules were annotated using InterProScan (50) and KEGG (51).
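The choice of soft-thresholding power can be made concrete with a simplified, pure-Python re-implementation of the idea behind WGCNA's pickSoftThreshold and scaleFreeFitIndex. It is modeled on, but not identical to, the package code. The sketch assumes an unsigned network (adjacency |cor|^β, the WGCNA default), bins connectivity into 10 equal-width bins, regresses log10 p(k) on log10 k, and reports a signed R² that is positive when the slope is negative. It then returns the lowest power whose fit exceeds 0.8; the text does not say whether the lowest qualifying power was chosen, so that rule is an assumption. The expression values are hypothetical, already log10 transformed, and far too few for a meaningful fit.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity(expr: Dict[str, List[float]], power: float) -> List[float]:
    """Whole-network connectivity k_i = sum over j != i of |cor(i, j)|^power."""
    genes = list(expr)
    return [sum(abs(pearson(expr[g], expr[h])) ** power for h in genes if h != g) for g in genes]


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares; returns (slope, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, r2


def scale_free_fit(k: List[float], n_bins: int = 10) -> float:
    """Signed R^2 of log10 p(k) ~ log10 k (positive when the slope is negative)."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0  # guard against identical connectivities
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    for v in k:
        bins[min(int((v - lo) / width), n_bins - 1)].append(v)
    log_k, log_p = [], []
    for i, members in enumerate(bins):
        midpoint = lo + (i + 0.5) * width
        mean_k = sum(members) / len(members) if members else midpoint
        if mean_k <= 0:
            mean_k = midpoint  # fall back to the bin midpoint, as WGCNA does
        log_k.append(math.log10(mean_k))
        log_p.append(math.log10(len(members) / len(k) + 1e-9))
    slope, r2 = linear_fit(log_k, log_p)
    return -math.copysign(r2, slope)


def pick_power(expr: Dict[str, List[float]], powers: List[int],
               threshold: float = 0.8) -> Optional[int]:
    """Lowest power whose signed scale-free fit exceeds the threshold (None if none does)."""
    for power in sorted(powers):
        if scale_free_fit(connectivity(expr, power)) > threshold:
            return power
    return None


# Hypothetical log10 TPM values (genes x samples)
expr = {"g1": [1, 2, 3, 4, 5], "g2": [2, 4, 6, 8, 11], "g3": [5, 3, 4, 1, 2],
        "g4": [1, 3, 2, 5, 4], "g5": [3, 3, 4, 4, 6], "g6": [6, 1, 5, 2, 3]}
print(pick_power(expr, [1, 2, 4, 6, 8, 10, 18]))
```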
Ribosomal protein S3 (rpS3) was used as a single-copy phylogenetic marker to track bacterial abundance at the phylum level during the microcosm experiments. The rpS3 proteins encoded on contigs assembled from the metagenomes were identified using a custom hidden Markov model and confirmed by BLASTP (52) searches against the nonredundant (nr) database (E value of <1e−5). Their taxonomic affiliations were obtained by BLASTP (52) searches against a database of rpS3 sequences identified from all genomes in the GTDB. Reads from the metatranscriptomes were searched against these rpS3 sequences using DIAMOND BLASTX (47) (E value of <1e−5), and the relative abundance of reads mapped to rpS3 from different phyla was reported.
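The final tallying step can be sketched as below. The sketch assumes read hits are given as hypothetical (read ID, rpS3 ID, E value) tuples ordered best first for each read, so that each read is counted once, at its first hit passing the E-value cutoff, and that a lookup table maps each rpS3 sequence to a phylum. rpS3 sequences without a phylum assignment are grouped as "Unassigned", which is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def phylum_relative_abundance(read_hits: Iterable[Tuple[str, str, float]],
                              rps3_phylum: Dict[str, str],
                              max_evalue: float = 1e-5) -> Dict[str, float]:
    """Fraction of rpS3-mapped reads per phylum; each read counted once."""
    counted = set()
    counts: Counter = Counter()
    for read_id, rps3_id, evalue in read_hits:
        if evalue >= max_evalue or read_id in counted:
            continue  # fails the cutoff, or read already assigned via its best hit
        counted.add(read_id)
        counts[rps3_phylum.get(rps3_id, "Unassigned")] += 1
    total = sum(counts.values())
    return {phylum: n / total for phylum, n in counts.items()} if total else {}


# Hypothetical hits and rpS3-to-phylum table
hits = [("r1", "s1", 1e-10), ("r1", "s2", 1e-8), ("r2", "s2", 1e-12),
        ("r3", "s3", 1e-3), ("r4", "s9", 1e-20)]
table = {"s1": "Chloroflexota", "s2": "Desulfobacterota", "s3": "Proteobacteria"}
print(phylum_relative_abundance(hits, table))
```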
Untargeted metabolomic analysis by mass spectrometry.
In the sediment microcosm experiments, samples for metabolomics were taken 7 days after the addition of glucose or methanol, with six replicates per treatment. Metabolites in sediment were profiled using a Vanquish ultrahigh-performance liquid chromatography (UHPLC) system coupled with an Orbitrap Q Exactive HF-X mass spectrometer (Thermo Fisher, Germany). The slurry was lyophilized, passed through a 2-mm mesh screen, and ground in liquid nitrogen. The resulting homogenate was resuspended in an extraction solution containing 80% methanol, shaken, and centrifuged (15,000 × g, 20 min, 4°C). The supernatant was injected onto a Hypersil GOLD column (100 by 2.1 mm), which was eluted at a flow rate of 200 μL/min using a 17-min linear gradient. For positive-polarity mode, the mobile phases were 0.1% formic acid in water (mobile phase A) and 100% methanol (mobile phase B); for negative-polarity mode, they were 5 mM ammonium acetate in water (mobile phase A) and 100% methanol (mobile phase B). The mass spectrometer was operated with electrospray ionization in positive- and negative-polarity modes, acquiring full scans over m/z 100 to 1,500 with the following parameters: ion spray voltage, 3.2 kV; capillary temperature, 320°C; sheath gas, 40; auxiliary gas, 10; funnel radio frequency (RF) level, 40; and auxiliary gas heater temperature, 350°C.
Raw data were uploaded to the Global Natural Products Social Molecular Networking (GNPS) database (https://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp) and processed using default parameters, without analog searching. The top hits were compared with the BGC products predicted by antiSMASH (8) or PRISM4 (26) in this study using RDKit (https://github.com/rdkit/rdkit), which was also used to visualize the chemical structures of the compounds.
Data availability.
All genome bins in this study are archived at the National Microbiology Data Center under BioProject accession no. NMDC10018206. All supplemental Excel tables, genome data sets, and scripts used in this study are available at Figshare (https://figshare.com/s/63effa2eb2cd9664d10a). Raw sequencing reads have been deposited in the SRA database under BioProject accession no. PRJNA958466 with SRA accession no. SRR24270821 to SRR24270841.
ACKNOWLEDGMENTS
H.-P.D. and L.-J.H. were supported by the National Natural Science Foundation of China (41971125 and 42030411), the National Science Foundation for Distinguished Young Scholars (41725002), and the Chinese National Key Programs for Fundamental Research and Development (no. 2016YFA0600904 and 2016YFE0133700).
H.-P.D. and L.-J.H. conceived the study. J.-W.Z. and H.-P.D. recovered genomes from metagenomes and analyzed these genomic data. J.-W.Z. and R.W. performed analyses of phylogenies and environmental distributions of genes. H.-P.D., L.-J.H., X.L., P.H., Y.-L.Z., X.-F.L., D.-Z.G., and M.L. wrote the manuscript and supplemental material. All authors provided comments on the manuscript.
We declare no competing interests.
Footnotes
Supplemental material is available online only.
Contributor Information
Li-Jun Hou, Email: ljhou@sklec.ecnu.edu.cn.
Hong-Po Dong, Email: hpdong@sklec.ecnu.edu.cn.
Knut Rudi, Norwegian University of Life Sciences.
REFERENCES
1. Traxler MF, Kolter R. 2015. Natural products in soil microbe interactions and evolution. Nat Prod Rep 32:956–970. doi: 10.1039/c5np00013k.
2. Bérdy J. 2005. Bioactive microbial metabolites. J Antibiot (Tokyo) 58:1–26. doi: 10.1038/ja.2005.1.
3. Finking R, Marahiel MA. 2004. Biosynthesis of nonribosomal peptides. Annu Rev Microbiol 58:453–488. doi: 10.1146/annurev.micro.58.030603.123615.
4. Weissman KJ, Leadlay PF. 2005. Combinatorial biosynthesis of reduced polyketides. Nat Rev Microbiol 3:925–936. doi: 10.1038/nrmicro1287.
5. Rappe MS, Giovannoni SJ. 2003. The uncultured microbial majority. Annu Rev Microbiol 57:369–394. doi: 10.1146/annurev.micro.57.030502.090759.
6. Zheng Y, Saitou A, Wang C-M, Toyoda A, Minakuchi Y, Sekiguchi Y, Ueda K, Takano H, Sakai Y, Abe K, Yokota A, Yabe S. 2019. Genome features and secondary metabolites biosynthetic potential of the class Ktedonobacteria. Front Microbiol 10:893. doi: 10.3389/fmicb.2019.00893.
7. Achtman M, Wagner M. 2008. Microbial diversity and the genetic nature of microbial species. Nat Rev Microbiol 6:431–440. doi: 10.1038/nrmicro1872.
8. Blin K, Shaw S, Kloosterman AM, Charlop-Powers Z, van Wezel GP, Medema MH, Weber T. 2021. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res 49:W29–W35. doi: 10.1093/nar/gkab335.
9. Skinnider MA, Merwin NJ, Johnston CW, Magarvey NA. 2017. PRISM 3: expanded prediction of natural product chemical structures from microbial genomes. Nucleic Acids Res 45:W49–W54. doi: 10.1093/nar/gkx320.
10. Crits-Christoph A, Diamond S, Butterfield CN, Thomas BC, Banfield JF. 2018. Novel soil bacteria possess diverse genes for secondary metabolite biosynthesis. Nature 558:440–444. doi: 10.1038/s41586-018-0207-y.
11. Waschulin V, Borsetto C, James R, Newsham KK, Donadio S, Corre C, Wellington E. 2022. Biosynthetic potential of uncultured Antarctic soil bacteria revealed through long-read metagenomic sequencing. ISME J 16:101–111. doi: 10.1038/s41396-021-01052-3.
12. Chen R, Wong HL, Kindler GS, MacLeod FI, Benaud N, Ferrari BC, Burns BP. 2020. Discovery of an abundance of biosynthetic gene clusters in Shark Bay microbial mats. Front Microbiol 11:1950. doi: 10.3389/fmicb.2020.01950.
13. Paoli L, Ruscheweyh HJ, Forneris CC, Hubrich F, Kautsar S, Bhushan A, Lotti A, Clayssen Q, Salazar G, Milanese A, Carlstrom CI, Papadopoulou C, Gehrig D, Karasikov M, Mustafa H, Larralde M, Carroll LM, Sanchez P, Zayed AA, Cronin DR, Acinas SG, Bork P, Bowler C, Delmont TO, Gasol JM, Gossert AD, Kahles A, Sullivan MB, Wincker P, Zeller G, Robinson SL, Piel J, Sunagawa S. 2022. Biosynthetic potential of the global ocean microbiome. Nature 607:111–118. doi: 10.1038/s41586-022-04862-3.
14. Giri C, Ochieng E, Tieszen LL, Zhu Z, Singh A, Loveland T, Masek J, Duke N. 2011. Status and distribution of mangrove forests of the world using earth observation satellite data. Glob Ecol Biogeogr 20:154–159. doi: 10.1111/j.1466-8238.2010.00584.x.
15. Thatoi H, Behera BC, Mishra RR, Dutta SK. 2013. Biodiversity and biotechnological potential of microorganisms from mangrove ecosystems: a review. Ann Microbiol 63:1–19. doi: 10.1007/s13213-012-0442-7.
16. Navarro-Munoz JC, Selem-Mojica N, Mullowney MW, Kautsar SA, Tryon JH, Parkinson EI, De Los Santos ELC, Yeong M, Cruz-Morales P, Abubucker S, Roeters A, Lokhorst W, Fernandez-Guerra A, Cappelini LTD, Goering AW, Thomson RJ, Metcalf WW, Kelleher NL, Barona-Gomez F, Medema MH. 2020. A computational framework to explore large-scale biosynthetic diversity. Nat Chem Biol 16:60–68. doi: 10.1038/s41589-019-0400-9.
17. Fischbach MA, Walsh CT. 2006. Assembly-line enzymology for polyketide and nonribosomal peptide antibiotics: logic, machinery, and mechanisms. Chem Rev 106:3468–3496. doi: 10.1021/cr0503097.
18. Campbell EL, Cohen MF, Meeks JC. 1997. A polyketide-synthase-like gene is involved in the synthesis of heterocyst glycolipids in Nostoc punctiforme strain ATCC 29133. Arch Microbiol 167:251–258. doi: 10.1007/s002030050440.
19. Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E, Breitling R. 2011. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res 39:W339–W346. doi: 10.1093/nar/gkr466.
20. Ziemert N, Podell S, Penn K, Badger JH, Allen E, Jensen PR. 2012. The natural product domain seeker NaPDoS: a phylogeny based bioinformatic tool to classify secondary metabolite gene diversity. PLoS One 7:e34064. doi: 10.1371/journal.pone.0034064.
21. Medema MH, Cimermancic P, Sali A, Takano E, Fischbach MA. 2014. A systematic computational analysis of biosynthetic gene cluster evolution: lessons for engineering biosynthesis. PLoS Comput Biol 10:e1004016. doi: 10.1371/journal.pcbi.1004016.
22. Charlop-Powers Z, Owen JG, Reddy BV, Ternei MA, Brady SF. 2014. Chemical-biogeographic survey of secondary metabolism in soil. Proc Natl Acad Sci USA 111:3757–3762. doi: 10.1073/pnas.1318021111.
23. Charlop-Powers Z, Owen JG, Reddy BV, Ternei MA, Guimaraes DO, de Frias UA, Pupo MT, Seepe P, Feng Z, Brady SF. 2015. Global biogeographic sampling of bacterial secondary metabolism. Elife 4:e05048. doi: 10.7554/eLife.05048.
24. Grammbitter GLC, Schmalhofer M, Karimi K, Shi Y-M, Schoner TA, Tobias NJ, Morgner N, Groll M, Bode HB. 2019. An uncommon type II PKS catalyzes biosynthesis of aryl polyene pigments. J Am Chem Soc 141:16615–16623. doi: 10.1021/jacs.8b10776.
25. Cimermancic P, Medema MH, Claesen J, Kurita K, Wieland Brown LC, Mavrommatis K, Pati A, Godfrey PA, Koehrsen M, Clardy J, Birren BW, Takano E, Sali A, Linington RG, Fischbach MA. 2014. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158:412–421. doi: 10.1016/j.cell.2014.06.034.
26. Skinnider MA, Johnston CW, Gunabalasingam M, Merwin NJ, Kieliszek AM, MacLellan RJ, Li H, Ranieri MRM, Webster ALH, Cao MPT, Pfeifle A, Spencer N, To QH, Wallace DP, Dejong CA, Magarvey NA. 2020. Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences. Nat Commun 11:6058. doi: 10.1038/s41467-020-19986-1.
27. Davies J, Ryan KS. 2012. Introducing the parvome: bioactive compounds in the microbial world. ACS Chem Biol 7:252–259. doi: 10.1021/cb200337h.
28. Hover BM, Kim S-H, Katz M, Charlop-Powers Z, Owen JG, Ternei MA, Maniko J, Estrela AB, Molina H, Park S, Perlin DS, Brady SF. 2018. Culture-independent discovery of the malacidins as calcium-dependent antibiotics with activity against multidrug-resistant Gram-positive pathogens. Nat Microbiol 3:415–422. doi: 10.1038/s41564-018-0110-1.
29. Rouhiainen L, Jokela J, Fewer DP, Urmann M, Sivonen K. 2010. Two alternative starter modules for the non-ribosomal biosynthesis of specific anabaenopeptin variants in Anabaena (Cyanobacteria). Chem Biol 17:265–273. doi: 10.1016/j.chembiol.2010.01.017.
30. Schummer D, Höfle G, Forche E, Reichenbach H, Wray V, Domke T. 1996. Vioprolides: new antifungal and cytotoxic peptolides from Cystobacter violaceus. Liebigs Ann/Recl 1996:971–978.
31. Bray NL, Pimentel H, Melsted P, Pachter L. 2016. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 34:525–527. doi: 10.1038/nbt.3519.
32. Wang M, Carver JJ, Phelan VV, Sanchez LM, Garg N, Peng Y, Nguyen DD, Watrous J, Kapono CA, Luzzatto-Knaan T, Porto C, Bouslimani A, Melnik AV, Meehan MJ, Liu W-T, Crusemann M, Boudreau PD, Esquenazi E, Sandoval-Calderon M, Kersten RD, Pace LA, Quinn RA, Duncan KR, Hsu C-C, Floros DJ, Gavilan RG, Kleigrewe K, Northen T, Dutton RJ, Parrot D, Carlson EE, Aigle B, Michelsen CF, Jelsbak L, Sohlenkamp C, Pevzner P, Edlund A, McLean J, Piel J, Murphy BT, Gerwick L, Liaw C-C, Yang Y-L, Humpf H-U, Maansson M, Keyzers RA, Sims AC, Johnson AR, Sidebottom AM, Sedio BE, et al. 2016. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat Biotechnol 34:828–837. doi: 10.1038/nbt.3597.
33. Zhang J-W, Dong H-P, Hou L-J, Liu Y, Ou Y-F, Zheng Y-L, Han P, Liang X, Yin G-Y, Wu D-M, Liu M, Li M. 2021. Newly discovered Asgard archaea Hermodarchaeota potentially degrade alkanes and aromatics via alkyl/benzyl-succinate synthase and benzoyl-CoA pathway. ISME J 15:1826–1843. doi: 10.1038/s41396-020-00890-x.
34. Ou Y-F, Dong H-P, McIlroy SJ, Crowe SA, Hallam SJ, Han P, Kallmeyer J, Simister RL, Vuillemin A, Leu AO, Liu Z, Zheng Y-L, Sun Q-L, Liu M, Tyson GW, Hou L-J. 2022. Expanding the phylogenetic distribution of cytochrome b-containing methanogenic archaea sheds light on the evolution of methanogenesis. ISME J 16:2373–2387. doi: 10.1038/s41396-022-01281-0.
35. Li D, Liu C-M, Luo R, Sadakane K, Lam T-W. 2015. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31:1674–1676. doi: 10.1093/bioinformatics/btv033.
36. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. doi: 10.1038/nmeth.1923.
37. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. doi: 10.1093/bioinformatics/btp352.
38. Kang DD, Froula J, Egan R, Wang Z. 2015. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3:e1165. doi: 10.7717/peerj.1165.
39. Sieber CMK, Probst AJ, Sharrar A, Thomas BC, Hess M, Tringe SG, Banfield JF. 2018. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat Microbiol 3:836–843. doi: 10.1038/s41564-018-0171-1.
40. Parks DH, Rinke C, Chuvochina M, Chaumeil PA, Woodcroft BJ, Evans PN, Hugenholtz P, Tyson GW. 2017. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat Microbiol 2:1533–1542. doi: 10.1038/s41564-017-0012-7.
41. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–1055. doi: 10.1101/gr.186072.114.
42. Chaumeil P-A, Mussig AJ, Hugenholtz P, Parks DH. 2019. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36:1925–1927. doi: 10.1093/bioinformatics/btz848.
43. Kautsar SA, Blin K, Shaw S, Navarro-Munoz JC, Terlouw BR, van der Hooft JJJ, van Santen JA, Tracanna V, Suarez Duran HG, Pascal Andreu V, Selem-Mojica N, Alanjary M, Robinson SL, Lund G, Epstein SC, Sisto AC, Charkoudian LK, Collemare J, Linington RG, Weber T, Medema MH. 2020. MIBiG 2.0: a repository for biosynthetic gene clusters of known function. Nucleic Acids Res 48:D454–D458. doi: 10.1093/nar/gkz882.
44. Nguyen L-T, Schmidt HA, Von Haeseler A, Minh BQ. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32:268–274. doi: 10.1093/molbev/msu300.
45. Charlop-Powers Z, Pregitzer CC, Lemetre C, Ternei MA, Maniko J, Hover BM, Calle PY, McGuire KL, Garbarino J, Forgione HM, Charlop-Powers S, Brady SF. 2016. Urban park soil microbiomes are a rich reservoir of natural product biosynthetic diversity. Proc Natl Acad Sci USA 113:14811–14816. doi: 10.1073/pnas.1615581113.
46. Mistry J, Finn RD, Eddy SR, Bateman A, Punta M. 2013. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res 41:e121. doi: 10.1093/nar/gkt263.
47. Buchfink B, Xie C, Huson DH. 2015. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60. doi: 10.1038/nmeth.3176.
48. Donia MS, Cimermancic P, Schulze CJ, Wieland Brown LC, Martin J, Mitreva M, Clardy J, Linington RG, Fischbach MA. 2014. A systematic analysis of biosynthetic gene clusters in the human microbiome reveals a common family of antibiotics. Cell 158:1402–1414. doi: 10.1016/j.cell.2014.08.032.
49. Langfelder P, Horvath S. 2008. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9:559. doi: 10.1186/1471-2105-9-559.
50. Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, Pesseat S, Quinn AF, Sangrador-Vegas A, Scheremetjew M, Yong S-Y, Lopez R, Hunter S. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240. doi: 10.1093/bioinformatics/btu031.
51. Kanehisa M, Sato Y, Morishima K. 2016. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol 428:726–731. doi: 10.1016/j.jmb.2015.11.006.
52. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402. doi: 10.1093/nar/25.17.3389.
Supplementary Materials
Supplemental material: aem.00102-23-s0001.pdf (PDF file, 6.5 MB).
Supplemental material: aem.00102-23-s0002.xlsx (XLSX file, 4.3 MB).

