Abstract
Acid mine drainage (AMD) is usually acidic (pH < 4) and contains high concentrations of dissolved metals and metalloids, making AMD a typical representative of extreme environments. Recent studies have shown that microbes play a key role in AMD bioremediation, and secondary metabolite biosynthetic gene clusters (smBGCs) from AMD microbes are important resources for the synthesis of antibacterial and anticancer drugs. Here, 179 samples from 13 mineral types were used to analyze the putative novel microorganisms and secondary metabolites in AMD environments. Among 7,007 qualified metagenome-assembled genomes (MAGs) mined from these datasets, 6,340 MAGs could not be assigned to any GTDB species representative. Overall, 11,856 smBGCs in eight categories were obtained from 7,007 qualified MAGs, and 10,899 smBGCs were identified as putative novel smBGCs. We anticipate that these datasets will accelerate research in the field of AMD bioremediation, aid in the discovery of novel secondary metabolites, and facilitate investigation into gene functions, metabolic pathways, and CNPS cycles in AMD.
Subject terms: Water microbiology, Metagenomics
| Property | Value |
|---|---|
| Measurement(s) | Metagenome |
| Technology Type(s) | Next Generation Sequencing |
| Sample Characteristic - Environment | acid mine drainage |
| Sample Characteristic - Location | Guizhou Province • Marsberg • Municipality of Canaa dos Carajas • North Wales • Town of Vershire • State of Washington • Guangdong Province • Germany • County (Sweden) • Commonwealth of Pennsylvania • Province of Ontario • Hunan Province • Jiangxi Province • Guangxi Zhuang Autonomous Region • Anhui Province • Fujian Province |
Background & Summary
Acid mine drainage (AMD) is acidic (pH < 4), metal-enriched water that results from the accelerated oxidative dissolution of exposed minerals, principally sulfides, and is associated with mining1,2. The strong acidity and heavy-metal toxicity of AMD have caused severe pollution of surrounding water systems and soils2–4, making AMD one of the most serious environmental problems arising from the mining of mineral resources5,6. Metabolically active acidophilic microorganisms have been observed in AMD7,8, primarily from the domains Bacteria (such as Proteobacteria, Nitrospirae, Actinobacteria, Firmicutes, and Acidobacteria) and Archaea9.
Microbes in AMD play a key role in the bioremediation of AMD environments10,11. For example, Acidithiobacillus12, one of the most common genera in AMD, includes chemolithotrophic species that oxidize both Fe²⁺ and sulfur compounds (such as Acidithiobacillus ferrooxidans, Acidithiobacillus ferridurans, and Acidithiobacillus ferrivorans)9,13,14 and species that oxidize sulfur compounds alone (such as Acidithiobacillus caldus, Acidithiobacillus thiooxidans, and Acidithiobacillus albertensis)15–17. Sulfate-reducing bacteria (SRB), a diverse group of anaerobic microorganisms that are ubiquitous in natural habitats, have also been used in AMD remediation11.
Secondary metabolite biosynthetic gene clusters (smBGCs) found in AMD microbes are important resources for the synthesis of antibacterial and anticancer drugs18,19. A previous study isolated and cultivated fungi, including Penicillium sp., Penicillium rubrum, Penicillium solitum, Penicillium clavigerum, Chaetomium funicola, and Pithomyces sp., from water and sediment samples of a pit lake formed by the former Berkeley copper mine, and found bioactive secondary metabolites among them20. For example, berkelic acid, a secondary metabolite of Penicillium sp., showed activity against the OVCAR-3 cell line in the NCI-DTP 60 screen; berkeleydione, a terpenoid secondary metabolite of Penicillium rubrum, showed selective activity against the non-small cell lung cancer cell line NCI-H460 in the NCI-DTP 60 screen; and a CHCl3 extract of Penicillium solitum strongly inhibited MMP-3 and caspase-1. In addition, in this study, cyclodipeptide synthases (CDPSs), which synthesize cyclodipeptides, the precursors of 2,5-diketopiperazines, were found to be encoded by 23 metagenome-assembled genomes (MAGs) (LMSG_G000006317.1–LMSG_G000006339.1) belonging to Diplorickettsiaceae21–23. Therefore, mining smBGCs from AMD may reveal valuable secondary metabolites18.
In this study, we collected and sequenced 111 samples from nine mineral types and analyzed the GTDB species representative assignments of the binned MAGs and the putative novel smBGCs. The same workflow was used to reanalyze public metagenomic datasets comprising 68 samples of eight mineral types from seven countries. In total, we analyzed metagenomic datasets covering 179 samples from 13 projects, spanning 13 mineral types and seven countries (Table 1, Supplementary Table 1, Figs. 1a,2). A total of 7,007 MAGs mined from these datasets met at least the medium-quality level of the MIMAG standard24, including 981 high-quality MAGs (Table 2, Supplementary Table 2). Taxonomic analysis with GTDB-Tk showed that 1,394 MAGs were classified into 150 existing genera, whereas 5,613 MAGs were not assigned to existing genera; a total of 667 MAGs were assigned to 154 GTDB species representatives, whereas 6,340 MAGs were not (Fig. 3, Supplementary Table 2). Overall, 11,856 smBGCs in eight categories were obtained from the 7,007 MAGs (Table 3, Supplementary Table 3), and 10,899 of them were identified as putative novel smBGCs by querying each smBGC sequence against the NCBI nucleotide sequence collection (Supplementary Table 3); these are candidates for discovering novel secondary metabolites. Across mineral types, polymetallic mines contained the greatest number of smBGCs, followed by copper mines. The remaining mineral types ranked in descending order of smBGC number as follows: lead-zinc, antimony, pyrite-copper, pyrite, coal, nickel-copper, magnetite, tin-zinc, iron, arsenic, and lignite mines (Fig. 4a).
Table 1.
Data information for each mineral type.
| Mineral type | Sample number | Base number (Gb) | Base number per sample (Gb) | Country | Data source |
|---|---|---|---|---|---|
| Antimony | 8 | 500.66 | 62.58 | China | NODE: OEP001841 |
| Arsenic | 3 | 30.60 | 10.20 | China | NODE: OEP001841 |
| Coal | 13 | 78.31 | 6.02 | China and USA | SRA: SRP218093, SRP226684 |
| Copper | 39 | 1,660.60 | 42.58 | Brazil, China, Germany, United Kingdom, and USA | NODE: OEP001841; SRA: SRP093762, SRP149873, SRP201756, SRP288126 |
| Iron | 17 | 62.81 | 3.69 | USA | SRA: SRP009106 |
| Lead-Zinc | 24 | 1,638.82 | 68.28 | China | NODE: OEP001841; SRA: ERP002170 |
| Lignite | 1 | 5.00 | 5.00 | Germany | SRA: SRP093591 |
| Magnetite | 5 | 598.44 | 119.69 | China | NODE: OEP001841 |
| Nickel-Copper | 15 | 49.15 | 3.28 | Canada | SRA: SRP102076 |
| Polymetallic | 33 | 2,495.94 | 75.62 | China and Sweden | NODE: OEP001841; SRA: SRP132763 |
| Pyrite | 13 | 477.59 | 36.74 | China and Germany | NODE: OEP001841; SRA: SRP096619 |
| Pyrite-Copper | 6 | 454.49 | 75.75 | China | NODE: OEP001841 |
| Tin-Zinc | 2 | 163.63 | 81.81 | China | NODE: OEP001841 |
Gb (gigabase) is a unit of DNA length equal to one billion nucleotides, also written Gbp for gigabase pairs (http://en.wikipedia.org/wiki/Base_pair).
Fig. 1.
Geographic distribution of sampling sites in this study. (a) Geographic distribution of sampling sites for all samples (the latitude and longitude of SRS1810936 were inferred from the reported geographic location of this sample). (b) Geographic distribution of sampling sites for the acid mine drainage (AMD) metagenomic datasets for China.
Fig. 2.
Base number distributions of samples from 13 mineral types. The median base number of samples was similar among lead-zinc, antimony, pyrite-copper, magnetite, tin-zinc, and polymetallic mines. The upper and lower whiskers extend from the hinges to the highest and lowest values within 1.5 × the interquartile range, respectively. Points outside this range (black) are outliers.
Table 2.
Quality control standards and metagenome-assembled genome (MAG) numbers in each quality level.
| Quality level | Completeness | Contamination | Quality score | Number |
|---|---|---|---|---|
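| High quality | ≥90% | ≤5% | >50 | 981 |
| Medium quality | ≥50% | ≤5% | >50 | 6,026 |
High-quality MAGs additionally require the presence of the 23S, 16S, and 5S rRNA genes and at least 18 tRNAs; qualified MAGs that do not meet all high-quality criteria are classified as medium quality.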
Fig. 3.
Maximum-likelihood phylogenetic trees of bacterial and archaeal MAGs at the phylum level. Major lineages are assigned arbitrary colours and named. Lineages assigned to a GTDB species representative are highlighted with red dots, and lineages assigned to existing genera (genera with an NCBI taxonomy ID) are marked with purple triangles. (a) Maximum-likelihood phylogenetic tree of bacterial MAGs inferred from a concatenated alignment of 120 bacterial single-copy marker genes. The tree includes 40 named bacterial phyla. (b) Maximum-likelihood phylogenetic tree of archaeal MAGs inferred from a concatenated alignment of 122 archaeal single-copy marker genes. The tree includes 8 named archaeal phyla.
Table 3.
Numbers and percentages of smBGCs in eight categories classified by BiG-SCAPE.
| Type of smBGCs | Number of smBGCs | Percentage of smBGCs |
|---|---|---|
| Terpene | 3,751 | 31.64% |
| RiPPs | 1,864 | 15.72% |
| NRPS | 1,738 | 14.66% |
| PKSother | 936 | 7.89% |
| PKS I | 250 | 2.11% |
| PKS-NRP_Hybrids | 181 | 1.53% |
| Saccharides | 1 | 0.01% |
| Others | 3,135 | 26.44% |
Fig. 4.
Secondary metabolite biosynthetic gene cluster (smBGC) distributions in 13 types of minerals. (a) The number of smBGCs in different types of minerals. (b) Relative frequency of smBGC types across 13 types of minerals.
Methods
The workflow of data processing is depicted in Supplementary Fig. 1.
Data source
AMD metagenomic datasets of 179 samples (68 public and 111 private) from 13 mineral types in seven countries were used to analyze the GTDB species representative assignments of the binned MAGs and the putative novel smBGCs (Table 1, Supplementary Table 1, Figs. 1a,2). The datasets of the 68 public samples were downloaded from the SRA database (up to November 17, 2020) using two search strategies, (((((Mine AMD) OR acid mine drainage) OR mine tailings) OR acidic stream) AND WGS [Strategy]) AND METAGENOMIC [Source] and (mine drainage metagenome [Organism]) AND WGS [Strategy] AND METAGENOMIC [Source], and only Illumina sequencing data were retained. The 111 private samples, spanning nine mineral types, were collected and sequenced in this study. Of these, 87 samples belonged to four mineral types also represented in the SRA datasets, and 24 samples came from five mineral types not represented there. In total, 122 samples from 10 mineral types constituted the AMD metagenomic datasets for China (Table 1, Fig. 1b).
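The run-level selection described above can be sketched as follows. This is an illustrative sketch, not the authors' script: it assumes the search hits were exported as SRA RunInfo-style rows (dicts with the columns `Run`, `LibraryStrategy`, `LibrarySource`, and `Platform`), and `keep_run` and `filter_runs` are hypothetical helpers that merge hits from both queries and keep only WGS, METAGENOMIC, Illumina runs.

```python
# Hypothetical sketch: select SRA runs matching the Methods criteria.
# Assumes rows shaped like SRA RunInfo records (dicts keyed by column name).

# The two search strings quoted in the text.
QUERY_1 = ("(((((Mine AMD) OR acid mine drainage) OR mine tailings) "
           "OR acidic stream) AND WGS [Strategy]) AND METAGENOMIC [Source]")
QUERY_2 = ("(mine drainage metagenome [Organism]) AND WGS [Strategy] "
           "AND METAGENOMIC [Source]")


def keep_run(row: dict) -> bool:
    """True for whole-genome-shotgun metagenomic runs sequenced on Illumina."""
    return (row.get("LibraryStrategy", "").upper() == "WGS"
            and row.get("LibrarySource", "").upper() == "METAGENOMIC"
            and row.get("Platform", "").upper() == "ILLUMINA")


def filter_runs(rows):
    """Merge hits from both queries (first occurrence wins) and keep matching runs."""
    seen = set()
    kept = []
    for row in rows:
        if row["Run"] in seen:
            continue  # the same run can be returned by both queries
        seen.add(row["Run"])
        if keep_run(row):
            kept.append(row)
    return kept
```

Deduplicating by run accession matters because the two queries overlap; the manual metadata curation described under Technical Validation would then be applied to the retained runs.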
Quality control of raw data and metagenomic assembly
Trimmomatic is a flexible and efficient preprocessing tool for Illumina next-generation sequencing reads, used primarily to remove adapter and low-quality sequences25. Quality control of the raw data for all 179 samples was performed using Trimmomatic (version 0.39) with a Phred quality score cutoff of 20 and a minimum read length of 50 bp to remove low-quality sequences. MEGAHIT and metaSPAdes are both widely used tools for metagenome assembly28–30. metaSPAdes generally outperforms other assemblers and provides high-quality assemblies across diverse datasets, but it is time-consuming and requires very large amounts of memory26,27; MEGAHIT, by contrast, provides acceptable assemblies with lower memory usage and computational time31. Given the large number of AMD samples analyzed and the available computational resources, we chose MEGAHIT28,29 for metagenome assembly. Assembly was performed with MEGAHIT (version 1.2.9) in meta-sensitive mode to generate contigs.
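To make the read-level cutoffs concrete, the sketch below applies a sliding window with a Phred threshold of 20 and then discards reads shorter than 50 bases. The 4-base window and the rule of cutting at the start of the first failing window are assumptions for illustration; Trimmomatic's own SLIDINGWINDOW step may place the cut differently, so this is a simplified model of the filtering logic, not a reimplementation of Trimmomatic. `trim_read` and `phred33` are hypothetical names.

```python
# Simplified model of the read QC thresholds (Phred 20, minimum length 50).
# Assumptions: 4-base window; cut at the start of the first failing window.

def phred33(qual_string: str) -> list:
    """Decode a Phred+33 (Sanger / Illumina 1.8+) quality string."""
    return [ord(c) - 33 for c in qual_string]


def trim_read(seq: str, quals: list, window: int = 4, min_q: float = 20, min_len: int = 50):
    """Return the trimmed (seq, quals), or None if the read falls below min_len."""
    if not quals:
        return None
    w = min(window, len(quals))  # very short reads are treated as one window
    keep = len(quals)
    for start in range(len(quals) - w + 1):
        if sum(quals[start:start + w]) / w < min_q:
            keep = start  # drop the failing window and everything 3' of it
            break
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]
```

For a 60-base read whose last five bases have quality 10 (the rest 30), the first window averaging below 20 starts at position 54, so the read is cut to 54 bases and still passes the 50-base minimum.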
Metagenomic binning
Compared with individual binning tools, automated pipelines that combine multiple binning methods, such as MAGO, MetaWRAP, or DAS Tool, exploit the strengths of a flexible set of established binning algorithms to generate more or better bins32–34. MetaWRAP is widely used for binning both environmental35–41 and host-associated42–44 metagenomes, and it recovered the largest number of high-quality draft genomes on tested datasets with relatively modest computational requirements33,45. Additionally, MAGO uses DAS Tool for bin refinement, and MetaWRAP outperformed DAS Tool on datasets of varied complexity33. We therefore selected MetaWRAP (version 1.3.2) for metagenomic binning. For each assembly, contigs were binned using the binning module (parameters: --maxbin2 --concoct --metabat2), consolidated into a de-replicated bin set using the bin_refinement module (parameters: -c 50 -x 5), and further improved using the reassemble_bins module. In total, 8,035 binned MAGs were obtained from the 179 samples; MetaWRAP required 1,224 hours of wall time on an HPC with multiple 2.10 GHz Intel Xeon E7-4380 CPUs and 2 TB of RAM.
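The three MetaWRAP stages can be expressed as command lines, sketched here as Python argument lists. The module names and the `-c 50 -x 5` refinement thresholds come from the text; the paths, thread count, memory value, and output sub-folder names (which follow the layout used in the MetaWRAP tutorial) are placeholders, and applying the same 50/5 thresholds in `reassemble_bins` is an assumption. `metawrap_commands` is a hypothetical helper.

```python
from pathlib import Path


def metawrap_commands(contigs, reads_1, reads_2, out_dir, threads=48, mem_gb=500):
    """Build argument lists for the three MetaWRAP stages (placeholders throughout)."""
    out = Path(out_dir)
    binning = out / "INITIAL_BINNING"
    refinement = out / "BIN_REFINEMENT"
    reassembly = out / "BIN_REASSEMBLY"
    return [
        # 1) Initial binning of the MEGAHIT contigs with three binners.
        ["metawrap", "binning", "-o", str(binning), "-t", str(threads),
         "-a", str(contigs), "--metabat2", "--maxbin2", "--concoct",
         str(reads_1), str(reads_2)],
        # 2) Consolidate the three bin sets: min completion 50%, max contamination 5%.
        ["metawrap", "bin_refinement", "-o", str(refinement), "-t", str(threads),
         "-A", str(binning / "metabat2_bins"),
         "-B", str(binning / "maxbin2_bins"),
         "-C", str(binning / "concoct_bins"),
         "-c", "50", "-x", "5"],
        # 3) Reassemble the refined bins (same thresholds assumed).
        ["metawrap", "reassemble_bins", "-o", str(reassembly),
         "-1", str(reads_1), "-2", str(reads_2), "-t", str(threads),
         "-m", str(mem_gb), "-c", "50", "-x", "5",
         "-b", str(refinement / "metawrap_50_5_bins")],
    ]
```

Each list can be passed in order to `subprocess.run(cmd, check=True)`; building argument lists rather than shell strings avoids quoting problems with sample paths.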
The completeness and contamination of all MAGs were estimated using CheckM (version 1.1.2) with the lineage-specific workflow46,47. Based on these estimates, we retained 7,007 MAGs with ≥50% completeness, ≤5% contamination, and a quality score >50, following a previous study36. As additional indicators of completeness, we identified tRNA genes using tRNAscan-SE (version 2.0.9)48 and rRNA genes using Infernal (version 1.1.2)49 with models from the Rfam database50. Of the 7,007 MAGs, 981 were classified as high quality according to the MIMAG standard (≥90% completeness, ≤5% contamination, ≥18 of 20 tRNA genes, and presence of the 5S, 16S, and 23S rRNA genes), and the remainder were classified as medium quality (Table 2, Supplementary Table 2).
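The quality tiers can be written as a small classification function. The quality score is assumed here to be completeness − 5 × contamination, the definition commonly paired with a >50 cutoff; the text does not restate the formula. `MagStats` and `classify` are hypothetical names.

```python
from dataclasses import dataclass

REQUIRED_RRNAS = frozenset({"5S", "16S", "23S"})


@dataclass(frozen=True)
class MagStats:
    completeness: float             # CheckM completeness (%)
    contamination: float            # CheckM contamination (%)
    trna_types: int = 0             # distinct tRNA types (of 20) from tRNAscan-SE
    rrnas: frozenset = frozenset()  # rRNA genes detected with Infernal/Rfam


def quality_score(m: MagStats) -> float:
    # Assumed definition: completeness - 5 x contamination.
    return m.completeness - 5 * m.contamination


def classify(m: MagStats) -> str:
    """Return 'high', 'medium' or 'rejected' following the thresholds in Table 2."""
    if m.completeness < 50 or m.contamination > 5 or quality_score(m) <= 50:
        return "rejected"
    if (m.completeness >= 90 and m.trna_types >= 18
            and REQUIRED_RRNAS <= m.rrnas):
        return "high"
    return "medium"
```

Note that the quality-score cutoff is stricter than the completeness cutoff alone: a MAG that is 55% complete with 2% contamination scores 45 and is rejected.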
Taxonomic assignment for bacterial and archaeal genomes
GTDB-Tk is a computationally efficient tool that provides objective taxonomic assignments for bacterial and archaeal genomes based on the Genome Taxonomy Database (GTDB, http://gtdb.ecogenomic.org), and it is widely used to classify draft genomes recovered directly from environmental and human-associated samples51. Each MAG was initially assigned a taxonomy using GTDB-Tk (version 1.4.0) based on the GTDB taxonomy52 (release R05-RS95), yielding 48 phyla (8 archaeal and 40 bacterial). GTDB-Tk analysis of the 7,007 MAGs required 23 hours of wall time on an HPC with multiple 2.10 GHz Intel Xeon E7-4380 CPUs and 2 TB of RAM.
According to the GTDB-Tk results, 1,707 MAGs were assigned to archaeal phyla and 5,300 MAGs to bacterial phyla; the 6,026 medium-quality MAGs were assigned to 7 archaeal and 38 bacterial phyla, whereas the 981 high-quality MAGs were assigned to 4 archaeal and 31 bacterial phyla (Supplementary Table 2). At the genus level, 1,394 MAGs were classified into 150 existing genera, while 5,613 MAGs were not assigned. A total of 667 MAGs were assigned to the GTDB representative genomes of 154 species, while 6,340 MAGs were not assigned to any GTDB species representative; these unassigned MAGs provide a large pool of microbial resources for further research on AMD bioremediation. A. ferrooxidans, A. ferrivorans, and A. thiooxidans have been shown to function in AMD recovery9,14,16. In this study, A. ferrooxidans was found in copper mines, and A. ferrivorans and A. thiooxidans were found in polymetallic mines (Supplementary Table 2).
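Genus- and species-level counts like those above can be tallied from GTDB-Tk classification strings, which list ranks as `d__;p__;c__;o__;f__;g__;s__` and leave unassigned ranks empty (for example `s__`). This sketch is hypothetical: the paper counts only existing genera (genera with an NCBI taxonomy ID, per Fig. 3), which is approximated here by an optional caller-supplied set `named_genera`; without it, GTDB placeholder genera are counted as assigned too.

```python
def parse_gtdb(classification: str) -> dict:
    """Split 'd__Bacteria;p__...;s__' into {'d': 'Bacteria', ..., 's': ''}."""
    taxa = {}
    for part in classification.split(";"):
        rank, _, name = part.strip().partition("__")
        taxa[rank] = name
    return taxa


def count_assignments(classifications: list, named_genera=None) -> dict:
    """Count MAGs with a genus and with a species assigned.

    named_genera: optional set of genus names treated as existing genera
    (a hypothetical stand-in for an NCBI taxonomy ID lookup).
    """
    genus = species = 0
    for c in classifications:
        taxa = parse_gtdb(c)
        g = taxa.get("g", "")
        if g and (named_genera is None or g in named_genera):
            genus += 1
        if taxa.get("s", ""):
            species += 1  # species assigned only when s__ is non-empty
    return {"genus": genus, "species": species, "total": len(classifications)}
```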
Constructing a phylogeny of nonredundant MAGs
dRep reduces the computational time of pairwise genome comparisons by sequentially applying a fast but inaccurate estimate of genome distance and a slow but accurate measure of average nucleotide identity (ANI), achieving a 28-fold increase in speed with perfect recall and precision compared with previously developed algorithms53. All 7,007 qualified MAGs were aggregated and de-replicated at 95% ANI using dRep (version 3.2.0; parameters: -comp 50 -con 5 -sa 0.95 -pa 0.9), yielding 1,992 species-level qualified MAGs54. These 1,992 de-replicated MAGs were placed in maximum-likelihood phylogenies inferred from concatenated alignments of 120 bacterial or 122 archaeal marker genes produced by GTDB-Tk51. Approximately maximum-likelihood trees for Bacteria and Archaea were built using FastTree (version 2.1.10) with the WAG + GAMMA models47,55–57 and visualized with iTOL58.
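The idea of species-level de-replication at 95% ANI can be illustrated with greedy representative-based clustering. This is a deliberate simplification and not dRep's algorithm: dRep performs Mash-based primary clustering followed by hierarchical clustering on ANI and picks representatives with its own genome score. Here the per-genome `scores` (for example completeness − 5 × contamination) and the pairwise ANI values are assumed inputs, and `greedy_dereplicate` is a hypothetical helper.

```python
def greedy_dereplicate(scores: dict, ani: dict, threshold: float = 0.95) -> dict:
    """Greedy representative clustering at a fixed ANI threshold.

    scores: {genome: quality score}; higher is better.
    ani: {(genome_a, genome_b): ANI as a fraction}; either key order is accepted.
    Returns {representative: sorted list of member genomes}.
    """
    def pair_ani(a, b):
        if a == b:
            return 1.0
        return ani.get((a, b), ani.get((b, a), 0.0))  # missing pairs: unrelated

    clusters = {}
    assigned = set()
    # Visit genomes from best to worst score (name breaks ties deterministically).
    for rep in sorted(scores, key=lambda g: (-scores[g], g)):
        if rep in assigned:
            continue
        members = sorted(g for g in scores
                         if g not in assigned and pair_ani(rep, g) >= threshold)
        assigned.update(members)
        clusters[rep] = members
    return clusters
```

The clustering rule matters for chained relationships: with ANI(A, B) = 0.96, ANI(B, C) = 0.96, and ANI(A, C) = 0.93, this greedy scheme keeps C separate from the A cluster, whereas single-linkage clustering would merge all three genomes.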
A striking feature of these trees is the large number of major lineages with no GTDB species representative assigned (Fig. 3)51. In Bacteria, no MAGs in 24 phyla were assigned to a GTDB species representative, and very few MAGs were assigned to GTDB species representatives in the remaining 16 phyla: Proteobacteria, Actinobacteriota, Nitrospirota, Firmicutes_E, Firmicutes, SZUA-79, Bacteroidota, Campylobacterota, Desulfobacterota, Spirochaetota, Firmicutes_B, Patescibacteria, Acidobacteriota, Aquificota, Bdellovibrionota, and Deinococcota (Fig. 3a). In Archaea, no MAGs in Halobacteriota, Methanobacteriota, Thermoproteota, Asgardarchaeota, and Aenigmatarchaeota were assigned to GTDB species representatives, and only a few MAGs in Nanoarchaeota, Micrarchaeota, and Thermoplasmatota were assigned to them (Fig. 3b).
Mining of secondary metabolite biosynthetic gene clusters
The Antibiotics & Secondary Metabolite Analysis Shell (antiSMASH, https://antismash.secondarymetabolites.org) enables rapid identification, annotation, and analysis of smBGCs in genomes59. Since its first release in 2011, it has become the most widely used bioinformatics software for predicting smBGCs and the standard tool for smBGC mining60. A total of 11,856 putative smBGCs were mined from the 7,007 qualified MAGs across 13 mineral types using antiSMASH (version 5.1.2) with the options --cf-create-clusters --cb-general --cb-knownclusters --cb-subclusters --asf --pfam2go --smcog-trees --genefinding-tool prodigal, ignoring contigs shorter than 5 kb. antiSMASH analysis of the 7,007 MAGs required 24 hours of wall time on an HPC with multiple 2.60 GHz Intel Xeon Gold 6126 CPUs and 196 GB of RAM.
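Ignoring contigs shorter than 5 kb amounts to a length filter on each MAG's FASTA file before smBGC prediction. The sketch below shows such a pre-filter in plain Python; whether the authors pre-filtered the files or used antiSMASH's own `--minlength` option is not stated, so treat this as one possible equivalent. The function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def long_contigs(lines, min_len=5000):
    """Keep contigs of at least min_len bp (5 kb, as in the text)."""
    return [(n, s) for n, s in read_fasta(lines) if len(s) >= min_len]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with fixed-width lines."""
    out = []
    for name, seq in records:
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Given an open file handle, `long_contigs(handle)` returns the retained records, and `to_fasta` writes them back out as input for antiSMASH.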
Each putative smBGC was queried against the NCBI nucleotide sequence collection (downloaded 27 Jan 2021) using blastn from the NCBI BLAST+ package (version 2.11)61 with an E-value cutoff of 1 × 10⁻¹. Using a threshold of 75% identity over 80% of the query length, 10,899 (91.93%) of the 11,856 putative smBGCs were identified as putative novel smBGCs. Although many modular clusters were fragmented, we identified more than 154 smBGC regions longer than 50 kb and more than 1,834 longer than 30 kb. These smBGCs were further classified into eight categories using BiG-SCAPE with default parameters62. Among these categories, terpenes were the most numerous, with 3,751 smBGCs (31.64% of the total) (Table 3, Supplementary Table 3).
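The novelty call can be expressed as: an smBGC is putatively novel when none of its BLASTn hits reaches at least 75% identity over at least 80% of the query length. This reading of the threshold, and a tabular output requested with `-outfmt "6 qseqid sseqid pident length qlen evalue"`, are assumptions for this sketch. Coverage is approximated per HSP as alignment length divided by query length, which can undercount coverage for queries split across several HSPs. `novel_smbgcs` is a hypothetical helper.

```python
def novel_smbgcs(blast_lines, query_ids, min_ident=75.0, min_cov=0.80, max_evalue=0.1):
    """Return query IDs with no hit meeting both identity and coverage thresholds.

    blast_lines: tab-separated rows in the assumed column order
    qseqid sseqid pident length qlen evalue.
    """
    matched = set()
    for line in blast_lines:
        if not line.strip():
            continue
        qseqid, _sseqid, pident, length, qlen, evalue = line.rstrip("\n").split("\t")
        if float(evalue) > max_evalue:
            continue
        coverage = int(length) / int(qlen)  # per-HSP approximation of query coverage
        if float(pident) >= min_ident and coverage >= min_cov:
            matched.add(qseqid)
    # Queries with no qualifying hit (including no hit at all) are putatively novel.
    return [q for q in query_ids if q not in matched]
```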
Data Records
The raw data of the 111 private samples were deposited in NODE (https://www.biosino.org/node/project/detail/OEP001841)63, GSA (CRA006735)64, and NCBI SRA (PRJNA666025)65. A total of 7,007 MAGs with completeness ≥50%, contamination ≤5%, and a quality score >50 (at least the medium-quality level of the MIMAG standard) were obtained from 13 mineral types by metagenomic assembly and binning47. Of these, 981 (14.00%) MAGs were classified as high quality according to the MIMAG standard24. All 7,007 MAGs have been deposited in eLMSG (an eLibrary of Microbial Systematics and Genomics, https://www.biosino.org/elmsg/index)66 under accession numbers LMSG_G000004334.1–LMSG_G000011340.1, in NODE (https://www.biosino.org/node/analysis/detail/OEZ008530)67, and in GenBank (PRJNA834572)68.
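For scripted retrieval, the eLMSG accession range can be expanded into individual identifiers. The sketch assumes the accessions are consecutive, share the `.1` version suffix, and use a 9-digit zero-padded number, as the two endpoints quoted above do; under those assumptions the range contains 11,340 − 4,334 + 1 = 7,007 accessions, matching the number of MAGs. `lmsg_accessions` is a hypothetical helper, and how each accession maps to a download URL is not specified here.

```python
PREFIX = "LMSG_G"


def lmsg_accessions(first: str = "LMSG_G000004334.1", last: str = "LMSG_G000011340.1") -> list:
    """Expand an eLMSG accession range (assumed consecutive and zero-padded to 9 digits)."""
    def number(acc: str) -> int:
        return int(acc[len(PREFIX):].split(".")[0])

    version = first.rsplit(".", 1)[1]
    start, end = number(first), number(last)
    return [f"{PREFIX}{n:09d}.{version}" for n in range(start, end + 1)]
```

The same helper expands the range of the 23 Diplorickettsiaceae MAGs with CDPSs mentioned in the Background (`lmsg_accessions("LMSG_G000006317.1", "LMSG_G000006339.1")`).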
All 11,856 putative smBGCs from the 7,007 MAGs of 13 mineral types were deposited in NODE (https://www.biosino.org/node/analysis/detail/OEZ008529)69 and GenBank (KFVK00000000)70. The classes of secondary metabolites synthesized by each smBGC were assigned across the 13 mineral types (Fig. 4b). Non-ribosomal peptide synthetases (NRPS), ribosomally synthesized and post-translationally modified peptides (RiPPs), and terpenes were found in all mineral types. All 13 mineral types had relatively few smBGCs in the remaining categories, including type I polyketide synthases (PKS I), PKSother, and PKS-NRP_Hybrids. Saccharides were found only in pyrite-copper mines.
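The per-mineral relative frequencies plotted in Fig. 4b, and the ranking by total smBGC count in Fig. 4a, reduce to simple counting over (mineral type, BiG-SCAPE class) pairs. A minimal sketch, assuming one record per smBGC supplied as a list; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def class_frequencies(records: list) -> dict:
    """records: list of (mineral_type, smbgc_class). Returns per-mineral class fractions."""
    counts = defaultdict(Counter)
    for mineral, smbgc_class in records:
        counts[mineral][smbgc_class] += 1
    return {
        mineral: {cls: n / sum(c.values()) for cls, n in c.items()}
        for mineral, c in counts.items()
    }


def rank_minerals(records: list) -> list:
    """Mineral types ordered by total smBGC count, largest first (ties broken by name)."""
    totals = Counter(mineral for mineral, _ in records)
    return sorted(totals, key=lambda m: (-totals[m], m))
```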
Technical Validation
To ensure that the datasets from the SRA database contained only AMD metagenomic data, their SRA metadata and the associated scientific literature were manually curated. Only datasets with the library strategy WGS and the library source METAGENOMIC were selected. Because the pH of AMD is usually between 2 and 4, as noted in previous work1, datasets with pH values greater than 4, such as SRS1650501–SRS1650503, SRS872561, SRS962537, SRS963313, SRS963552, SRS963574, SRS963594, SRS963611, and SRS963627, were removed. For datasets without reported pH values, the metadata in the SRA database and the scientific literature were reviewed so that only AMD metagenomic datasets were retained71–75.
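The pH screen can be summarized as a three-way split: keep datasets with pH ≤ 4, drop those above 4, and route those with no recorded pH to manual review. The sketch assumes the pH values have already been extracted from the SRA metadata into a dictionary keyed by sample accession; `screen_by_ph` is a hypothetical helper.

```python
def screen_by_ph(sample_ph: dict, max_ph: float = 4.0):
    """Split samples into (keep, drop, review) lists by recorded pH.

    sample_ph: {sample_accession: pH value, or None if not reported}.
    """
    keep, drop, review = [], [], []
    for accession, ph in sample_ph.items():
        if ph is None:
            review.append(accession)  # no pH: check SRA metadata and literature manually
        elif ph > max_ph:
            drop.append(accession)    # above the AMD pH range used here
        else:
            keep.append(accession)
    return keep, drop, review
```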
The latitude and longitude of SRS1810936 were inferred from the reported geographic location of this sample. The mineral types of SRS5255199, SRS5255198, SRS5255197, and SRS2947527 were determined through manual review of the metadata in the SRA database and the scientific literature76.
The number and type of smBGCs varied even among MAGs within the same dRep cluster (Supplementary Table 4). We therefore used the 7,007 MAGs before de-replication for smBGC prediction. Of the 7,007 MAGs, 6,026 were of medium quality according to the MIMAG standard24. Mining smBGCs from draft genomes with antiSMASH can artificially inflate the number of detected gene clusters, and contigs carrying gene cluster fragments may go undetected77. To obtain more reliable smBGCs, we ignored contigs shorter than 5 kb, increasing the chance that the mined smBGCs have roles in secondary metabolite synthesis78. Although many modular clusters were fragmented, we identified more than 154 BGC regions longer than 50 kb and more than 1,834 longer than 30 kb.
We used linear regression in GraphPad Prism (version 9.3.1) to examine how the number of smBGCs related to sample size. The total number of smBGCs per sample showed a moderate positive correlation (R² = 0.3620) with the total length of quality MAGs per sample (Fig. 5a), suggesting that the number of smBGCs is also affected by other factors.
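As a worked reading of this statistic: in a simple linear regression, R² is the square of Pearson's r, so R² = 0.3620 corresponds to r ≈ 0.60, and total MAG length accounts for about 36% of the variance in per-sample smBGC counts. The ordinary least-squares fit behind such a value can be sketched as follows, with x as the total length of quality MAGs per sample and y as the smBGC count; this is a pure-Python illustration, not the GraphPad Prism implementation.

```python
def linear_fit(x, y):
    """Ordinary least squares y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared
```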
Fig. 5.
The diversity of secondary metabolite biosynthetic gene clusters (smBGCs) in different mineral types and geographic locations. (a) Correlation between the total number of smBGCs in each sample and the total length of quality MAGs in each sample. (b) smBGC counts per gigabase (the total number of smBGCs in each sample divided by the total length of quality MAGs in that sample) by mineral type. (c) smBGC counts per gigabase, calculated as in (b), by geographic location. Data were analyzed using one-way ANOVA followed by Tukey's test (*P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001).
Box plots of smBGC counts per gigabase by geographic location and by mineral type were generated using GraphPad Prism (version 9.3.1), and differences among groups were tested in the same software using one-way ANOVA followed by Tukey's test (P < 0.05). By geographic location, smBGCs were most abundant in samples from Ontario, Canada, and Scalp Level, Pennsylvania, USA (Fig. 5c); by mineral type, coal mines and nickel-copper mines had relatively higher smBGC abundances (Fig. 5b).
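The per-gigabase normalization used in Fig. 5b,c and the F statistic of a one-way ANOVA can be sketched as below. This computes only the F statistic; P-values and Tukey's post hoc comparisons require the F and studentized-range distributions, which libraries such as SciPy provide (`scipy.stats.f_oneway`, and `scipy.stats.tukey_hsd` in recent releases). The function names are hypothetical.

```python
def smbgc_per_gb(n_smbgcs: int, total_mag_bp: float) -> float:
    """smBGC count divided by the total length of quality MAGs, in gigabases."""
    return n_smbgcs / (total_mag_bp / 1e9)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group mean square."""
    k = len(groups)
    values = [v for g in groups for v in g]
    n = len(values)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more values than groups")
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For example, a sample with 30 smBGCs in 0.5 Gb of quality MAGs has 60 smBGCs per gigabase; for the groups [1, 2, 3] and [4, 5, 6], the between-group mean square is 13.5 and the within-group mean square is 1, giving F = 13.5.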
Usage Notes
To our knowledge, the datasets analyzed in this study constitute the largest collection of AMD metagenomes analyzed to date. Of the 68 samples from the SRA database, only 11 (16%) came from AMD in China. By collecting and sequencing 111 AMD samples, we obtained metagenomic data for AMD in southeastern China, complementing the publicly available datasets and providing a more comprehensive overview of the putative novel microorganisms and secondary metabolite resources in AMD environments. These datasets can be used in research on AMD bioremediation, in mining novel secondary metabolites for drug synthesis, and in analyses of gene functions, metabolic pathways, and CNPS cycles in AMD.
Acknowledgements
This work was supported by the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) (2021QZKK0101), National Key R&D Program of China (2018YFA0900704), the International Partnership Program of Chinese Academy of Sciences (Grant NO. 153D31KYSB20170121), Training Program of the Major Research Plan of the National Natural Science Foundation of China (92051116), Biological Resources Programme of Chinese Academy of Sciences (Grant NO. KFJ-BRP-017-79), Biological Resources Service Network Initiative of Chinese Academy of Sciences (Grant NO. KFJ-BRP-009-001), and the Program on Platform and Talent Development of the Department of Science and Technology of Guizhou China ([2019] 5617). We appreciate the strong support from the joint program of Guian Center for Biomedical Big Data of SINH/SIBS of CAS. This publication was made possible by support from Wensheng Shu. We also thank Liang Li for developing the website for data deposition, Shengnan Yuan and Ruixin Zhu for providing the metagenomic analysis pipeline, guiding the analysis, and editing the manuscript, Yufeng Zhang for participating in discussions of this study, and the integrated Microbiome Analysis Cloud platform (iMAC platform) for supporting the analysis process and providing computing power for this project. We thank the Editor and Reviewers for their constructive reviews that helped improve the original manuscript.
Author contributions
Ling Wang assembled and annotated the metagenomes. Wan Liu performed the data analysis and wrote the manuscript. Jieliang Liang provided the newly collected and sequenced AMD datasets. Linna Zhao, Qiang Li, Chenfen Zhou, and Hui Cen were involved in the discussion of this study. Qingbei Weng designed this study. Guoqing Zhang conceived the study, provided material support, and edited the manuscript. All of the authors read, edited, and approved the final manuscript.
Code availability
The version and parameters of all of the bioinformatics tools used in this work are described in the Methods section.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Ling Wang, Wan Liu, Jieliang Liang.
These authors jointly supervised this work: Qingbei Weng, Guoqing Zhang.
Contributor Information
Qingbei Weng, Email: wengqb@126.com.
Guoqing Zhang, Email: gqzhang@picb.ac.cn.
Supplementary information
The online version contains supplementary material available at 10.1038/s41597-022-01866-6.
References
- 1.Nancucheo I, et al. Recent Developments for Remediating Acidic Mine Waters Using Sulfidogenic Bacteria. Biomed Res. Int. 2017;2017:7256582. doi: 10.1155/2017/7256582. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Grimalt JO, Ferrer M, Macpherson E. The mine tailing accident in Aznalcollar. Sci. Total Environ. 1999;242:3–11. doi: 10.1016/S0048-9697(99)00372-1. [DOI] [PubMed] [Google Scholar]
- 3.Glukhova LB, et al. Isolation, Characterization, and Metal Response of Novel, Acid-Tolerant Penicillium spp. from Extremely Metal-Rich Waters at a Mining Site in Transbaikal (Siberia, Russia) Microb. Ecol. 2018;76:911–924. doi: 10.1007/s00248-018-1186-0. [DOI] [PubMed] [Google Scholar]
- 4.Schmidt U. Enhancing phytoextraction: the effect of chemical soil manipulation on mobility, plant accumulation, and leaching of heavy metals. J. Environ. Qual. 2003;32:1939–1954. doi: 10.2134/jeq2003.1939. [DOI] [PubMed] [Google Scholar]
- 5.Johnson DB, Hallberg KB. The microbiology of acidic mine waters. Res. Microbiol. 2003;154:466–473. doi: 10.1016/S0923-2508(03)00114-1. [DOI] [PubMed] [Google Scholar]
- 6.Denef VJ, Mueller RS, Banfield JF. AMD biofilms: using model communities to study microbial evolution and ecological complexity in nature. ISME J. 2010;4:599–610. doi: 10.1038/ismej.2009.158. [DOI] [PubMed] [Google Scholar]
- 7.Kuang JL, et al. Contemporary environmental variation determines microbial diversity patterns in acid mine drainage. ISME J. 2013;7:1038–1050. doi: 10.1038/ismej.2012.139. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Fahy A, et al. 16S rRNA and As-Related Functional Diversity: Contrasting Fingerprints in Arsenic-Rich Sediments from an Acid Mine Drainage. Microb. Ecol. 2015;70:154–167. doi: 10.1007/s00248-014-0558-3. [DOI] [PubMed] [Google Scholar]
- 9. Mendez-Garcia C, et al. Microbial diversity and metabolic networks in acid mine drainage habitats. Front. Microbiol. 2015;6:475. doi: 10.3389/fmicb.2015.00475.
- 10. Abinandan S, Subashchandrabose SR, Venkateswarlu K, Megharaj M. Microalgae-bacteria biofilms: a sustainable synergistic approach in remediation of acid mine drainage. Appl. Microbiol. Biotechnol. 2018;102:1131–1144. doi: 10.1007/s00253-017-8693-7.
- 11. Qian Z, Tianwei H, Mackey HR, van Loosdrecht MCM, Guanghao C. Recent advances in dissimilatory sulfate reduction: From metabolic study to application. Water Res. 2019;150:162–181. doi: 10.1016/j.watres.2018.11.018.
- 12. Williams KP, Kelly DP. Proposal for a new class within the phylum Proteobacteria, Acidithiobacillia classis nov., with the type order Acidithiobacillales, and emended description of the class Gammaproteobacteria. Int. J. Syst. Evol. Microbiol. 2013;63:2901–2906. doi: 10.1099/ijs.0.049270-0.
- 13. Hedrich S, Johnson DB. Acidithiobacillus ferridurans sp. nov., an acidophilic iron-, sulfur- and hydrogen-metabolizing chemolithotrophic gammaproteobacterium. Int. J. Syst. Evol. Microbiol. 2013;63:4018–4025. doi: 10.1099/ijs.0.049759-0.
- 14. Hallberg KB, Gonzalez-Toril E, Johnson DB. Acidithiobacillus ferrivorans, sp. nov.; facultatively anaerobic, psychrotolerant iron-, and sulfur-oxidizing acidophiles isolated from metal mine-impacted environments. Extremophiles. 2010;14:9–19. doi: 10.1007/s00792-009-0282-y.
- 15. Chen L, et al. Acidithiobacillus caldus sulfur oxidation model based on transcriptome analysis between the wild type and sulfur oxygenase reductase defective mutant. PLoS One. 2012;7:e39470. doi: 10.1371/journal.pone.0039470.
- 16. Gupta A, Saha A, Sar P. Thermoplasmata and Nitrososphaeria as dominant archaeal members in acid mine drainage sediment of Malanjkhand Copper Project, India. Arch. Microbiol. 2021;203:1833–1841. doi: 10.1007/s00203-020-02130-4.
- 17. Yang L, et al. Acidithiobacillus thiooxidans and its potential application. Appl. Microbiol. Biotechnol. 2019;103:7819–7833. doi: 10.1007/s00253-019-10098-5.
- 18. Stierle AA, Stierle DB. Bioactive secondary metabolites from acid mine waste extremophiles. Nat. Prod. Commun. 2014;9:1037–1044.
- 19. Keller NP. Fungal secondary metabolism: regulation, function and drug discovery. Nat. Rev. Microbiol. 2019;17:167–180. doi: 10.1038/s41579-018-0121-1.
- 20. Stierle DB, Stierle AA, Hobbs JD, Stokken J, Clardy J. Berkeleydione and berkeleytrione, new bioactive metabolites from an acid mine organism. Org. Lett. 2004;6:1049–1052. doi: 10.1021/ol049852k.
- 21. Moutiez M, Belin P, Gondry M. Aminoacyl-tRNA-Utilizing Enzymes in Natural Product Biosynthesis. Chem. Rev. 2017;117:5578–5618. doi: 10.1021/acs.chemrev.6b00523.
- 22. Gondry M, et al. A Comprehensive Overview of the Cyclodipeptide Synthase Family Enriched with the Characterization of 32 New Enzymes. Front. Microbiol. 2018;9:46. doi: 10.3389/fmicb.2018.00046.
- 23. Borthwick AD. 2,5-Diketopiperazines: synthesis, reactions, medicinal chemistry, and bioactive natural products. Chem. Rev. 2012;112:3641–3716. doi: 10.1021/cr200398y.
- 24. Bowers RM, et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 2017;35:725–731. doi: 10.1038/nbt.3893.
- 25. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
- 26. Forouzan E, Shariati P, Mousavi Maleki MS, Karkhane AA, Yakhchali B. Practical evaluation of 11 de novo assemblers in metagenome assembly. J. Microbiol. Methods. 2018;151:99–105. doi: 10.1016/j.mimet.2018.06.007.
- 27. Pasolli E, et al. Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle. Cell. 2019;176:649–662.e20. doi: 10.1016/j.cell.2019.01.001.
- 28. Li D, et al. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016;102:3–11. doi: 10.1016/j.ymeth.2016.02.020.
- 29. Li D, Liu CM, Luo R, Sadakane K, Lam TW. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics. 2015;31:1674–1676. doi: 10.1093/bioinformatics/btv033.
- 30. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile metagenomic assembler. Genome Res. 2017;27:824–834. doi: 10.1101/gr.213959.116.
- 31. Fritz A, et al. CAMISIM: simulating metagenomes and microbial communities. Microbiome. 2019;7:17. doi: 10.1186/s40168-019-0633-6.
- 32. Sieber CMK, et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat. Microbiol. 2018;3:836–843. doi: 10.1038/s41564-018-0171-1.
- 33. Uritskiy GV, DiRuggiero J, Taylor J. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome. 2018;6:158. doi: 10.1186/s40168-018-0541-1.
- 34. Murovec B, Deutsch L, Stres B. Computational Framework for High-Quality Production and Large-Scale Evolutionary Analysis of Metagenome Assembled Genomes. Mol. Biol. Evol. 2020;37:593–598. doi: 10.1093/molbev/msz237.
- 35. Dong X, et al. Thermogenic hydrocarbon biodegradation by diverse depth-stratified microbial populations at a Scotian Basin cold seep. Nat. Commun. 2020;11:5825. doi: 10.1038/s41467-020-19648-2.
- 36. Xu B, et al. A holistic genome dataset of bacteria, archaea and viruses of the Pearl River estuary. Sci. Data. 2022;9:49. doi: 10.1038/s41597-022-01153-4.
- 37. Zhou L, Huang S, Gong J, Xu P, Huang X. 500 metagenome-assembled microbial genomes from 30 subtropical estuaries in South China. Sci. Data. 2022;9:310. doi: 10.1038/s41597-022-01433-z.
- 38. Zhang H, et al. Metagenome sequencing and 768 microbial genomes from cold seep in South China Sea. Sci. Data. 2022;9:480. doi: 10.1038/s41597-022-01586-x.
- 39. Lee S, et al. Methane-derived carbon flows into host-virus networks at different trophic levels in soil. Proc. Natl. Acad. Sci. U S A. 2021;118:e2105124118. doi: 10.1073/pnas.2105124118.
- 40. Bay SK, et al. Trace gas oxidizers are widespread and active members of soil microbial communities. Nat. Microbiol. 2021;6:246–256. doi: 10.1038/s41564-020-00811-w.
- 41. Li J, et al. Intracellular silicification by early-branching magnetotactic bacteria. Sci. Adv. 2022;8:eabn6045. doi: 10.1126/sciadv.abn6045.
- 42. Yang H, et al. ABO genotype alters the gut microbiota by regulating GalNAc levels in pigs. Nature. 2022;606:358–367. doi: 10.1038/s41586-022-04769-z.
- 43. von Schwartzenberg RJ, et al. Caloric restriction disrupts the microbiota and colonization resistance. Nature. 2021;595:272–277. doi: 10.1038/s41586-021-03663-4.
- 44. Saheb Kashaf S, Almeida A, Segre JA, Finn RD. Recovering prokaryotic genomes from host-associated, short-read shotgun metagenomic sequencing data. Nat. Protoc. 2021;16:2520–2541. doi: 10.1038/s41596-021-00508-2.
- 45. Yang C, et al. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Comput. Struct. Biotechnol. J. 2021;19:6301–6314. doi: 10.1016/j.csbj.2021.11.028.
- 46. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 2015;25:1043–1055. doi: 10.1101/gr.186072.114.
- 47. Nayfach S, et al. Publisher Correction: A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 2021;39:520. doi: 10.1038/s41587-020-00769-4.
- 48. Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res. 2021;49:9077–9096. doi: 10.1093/nar/gkab688.
- 49. Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29:2933–2935. doi: 10.1093/bioinformatics/btt509.
- 50. Kalvari I, et al. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res. 2018;46:D335–D342. doi: 10.1093/nar/gkx1038.
- 51. Chaumeil PA, Mussig AJ, Hugenholtz P, Parks DH. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. 2019;36:1925–1927. doi: 10.1093/bioinformatics/btz848.
- 52. Parks DH, et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 2022;50:D785–D794. doi: 10.1093/nar/gkab776.
- 53. Olm MR, Brown CT, Brooks B, Banfield JF. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 2017;11:2864–2868. doi: 10.1038/ismej.2017.126.
- 54. Nayfach S, et al. Author Correction: A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 2021;39:521. doi: 10.1038/s41587-021-00898-4.
- 55. Price MN, Dehal PS, Arkin AP. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5:e9490. doi: 10.1371/journal.pone.0009490.
- 56. Price MN, Dehal PS, Arkin AP. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 2009;26:1641–1650. doi: 10.1093/molbev/msp077.
- 57. Liu K, Linder CR, Warnow T. RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS One. 2011;6:e27731. doi: 10.1371/journal.pone.0027731.
- 58. Letunic I, Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49:W293–W296. doi: 10.1093/nar/gkab301.
- 59. Blin K, et al. antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline. Nucleic Acids Res. 2019;47:W81–W87. doi: 10.1093/nar/gkz310.
- 60. Blin K, et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 2021;49:W29–W35. doi: 10.1093/nar/gkab335.
- 61. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
- 62. Navarro-Munoz JC, et al. A computational framework to explore large-scale biosynthetic diversity. Nat. Chem. Biol. 2020;16:60–68. doi: 10.1038/s41589-019-0400-9.
- 63. NODE: The National Omics Data Encyclopedia. https://www.biosino.org/node/project/detail/OEP001841 (2021).
- 64. GSA: Genome Sequence Archive. https://ngdc.cncb.ac.cn/gsa/browse/CRA006735 (2022).
- 65. NCBI Sequence Read Archive. SRP121625 (2022).
- 66. eLMSG: an eLibrary of Microbial Systematics and Genomics. https://www.biosino.org/elmsg/amdDetail (2022).
- 67. NODE: The National Omics Data Encyclopedia. https://www.biosino.org/node/analysis/detail/OEZ008530 (2022).
- 68. Zhang Q, Wang L, Liu W. Mining of novel secondary metabolite biosynthetic gene clusters from acid mine drainage. GenBank. KFVK01000000 (2022).
- 69. NODE: The National Omics Data Encyclopedia. https://www.biosino.org/node/analysis/detail/OEZ008529 (2022).
- 70. Zhang Q, Wang L, Liu W. Mining of novel secondary metabolite biosynthetic gene clusters from acid mine drainage. GenBank. KFVK00000000 (2022).
- 71. Giddings LA, et al. Characterization of an acid rock drainage microbiome and transcriptome at the Ely Copper Mine Superfund site. PLoS One. 2020;15:e0237599. doi: 10.1371/journal.pone.0237599.
- 72. Chen LX, et al. Shifts in microbial community composition and function in the acidification of a lead/zinc mine tailings. Environ. Microbiol. 2013;15:2431–2444. doi: 10.1111/1462-2920.12114.
- 73. Krause S, Bremges A, Munch PC, McHardy AC, Gescher J. Characterisation of a stable laboratory co-culture of acidophilic nanoorganisms. Sci. Rep. 2017;7:3289. doi: 10.1038/s41598-017-03315-6.
- 74. Muhling M, et al. Reconstruction of the Metabolic Potential of Acidophilic Sideroxydans Strains from the Metagenome of an Microaerophilic Enrichment Culture of Acidophilic Iron-Oxidizing Bacteria from a Pilot Plant for the Treatment of Acid Mine Drainage Reveals Metabolic Versatility and Adaptation to Life at Low pH. Front. Microbiol. 2016;7:2082. doi: 10.3389/fmicb.2016.02082.
- 75. Arif S, Nacke H, Hoppert M. Metagenome-Assembled Genome Sequences of a Biofilm Derived from Marsberg Copper Mine. Microbiol. Resour. Announc. 2021;10:e01253-20. doi: 10.1128/MRA.01253-20.
- 76. Liljeqvist M, et al. Metagenomic analysis reveals adaptations to a cold-adapted lifestyle in a low-temperature acid mine drainage stream. FEMS Microbiol. Ecol. 2015;91.
- 77. Blin K, et al. antiSMASH 2.0 – a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res. 2013;41:W204–W212. doi: 10.1093/nar/gkt449.
- 78. Wei B, et al. An atlas of bacterial secondary metabolite biosynthesis gene clusters. Environ. Microbiol. 2021;23:6981–6992. doi: 10.1111/1462-2920.15761.
Data Availability Statement
The versions and parameters of all bioinformatics tools used in this work are described in the Methods section.





