Abstract
Summary
Microbial secondary metabolites exhibit potential medicinal value. A large number of secondary metabolite biosynthetic gene clusters (BGCs) in the human gut microbiome, which exhibit essential biological activity in microbe–microbe and microbe–host interactions, have not been adequately characterized, making it difficult to prioritize these BGCs for experimental characterization. Here, we present sBGC-hm, an atlas of secondary metabolite BGCs that allows researchers to explore the potential therapeutic benefits of these natural products. One of its key features is the ability to assist in optimizing the BGC structure by utilizing the gene co-occurrence matrix obtained from Human Microbiome Project data. Results are viewable online and can be downloaded as spreadsheets.
Availability and implementation
The database is openly available at https://www.wzubio.com/sbgc. The website is powered by an Apache 2 server with PHP and MariaDB.
1 Introduction
Secondary metabolites (SMs) are small bioactive molecules with potential medicinal value that are derived from living organisms, with microorganisms serving as a significant resource (Newman and Cragg 2020). For instance, the majority of known antibiotics and biologically active SMs are derived from Streptomyces (de Lima Procópio et al. 2012), while thousands of SM biosynthetic gene clusters (BGCs) have been identified in other bacteria (Skinnider et al. 2020; Wei et al. 2021). IMG-ABC, the largest publicly accessible BGC database, stores hundreds of thousands of predicted clusters (Palaniappan et al. 2020). With this explosion of BGC data, prioritizing biologically active clusters for experimental characterization has become a challenge. Genome mining has identified a subset of human microbiome-derived SM BGCs with essential functional activities (Donia et al. 2014). However, resources dedicated to human-related BGCs remain insufficient.
The recent release of a comprehensive genome collection of the human gut microbiome, including both metagenome-assembled genomes (MAGs) and RefSeq genomes (Hiseni et al. 2021), provides material for the detection of human-related BGCs. As a step towards discovering candidate biologically active SMs from the gut microbiome, we created a database called sBGC-hm, an atlas of SM BGCs predicted from the sequenced human gut microbiome. To support the prioritization of these BGCs for experimental investigation, their associated metadata, such as BGC family, gene abundance, gender differences (Krumsiek et al. 2015), transporter genes (Crits-Christoph et al. 2021), and resistance genes (Yan et al. 2020), are also incorporated. Gene co-occurrence information is another crucial component for enhancing BGC structure prediction (Libis et al. 2019). To obtain this information, we built a gene co-occurrence matrix using the Human Microbiome Project (HMP) data.
2 Data retrieval and processing
HumGut was used to download 10 432 883 genome sequences of the human microbiome (Hiseni et al. 2021). Their protein-coding genes were predicted with Prodigal v2.6.3 and annotated with Prokka v1.14.6, using the parameters --addgenes and --addmrna (Hyatt et al. 2010; Seemann 2014). All genome sequences in GenBank format were processed with the command-line version of antiSMASH v6.1.1 using the following parameters: --taxon bacteria --asf --rre --clusterhmmer --pfam2go (Blin et al. 2021). The putative BGCs were then clustered and classified with BiG-SCAPE using default parameters (Navarro-Muñoz et al. 2020).
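The annotation and cluster-detection calls described above can be written out as command lines. The following is a minimal sketch that only assembles the argument lists from the parameters stated in the text; the file names are hypothetical placeholders, output-location options are omitted, and nothing is executed.

```python
import shlex


def prokka_command(genome_fasta):
    """Argument list for Prokka with the two parameters given in the text."""
    return ["prokka", "--addgenes", "--addmrna", genome_fasta]


def antismash_command(genbank_file):
    """Argument list for antiSMASH with the parameters given in the text."""
    return [
        "antismash",
        "--taxon", "bacteria",
        "--asf",
        "--rre",
        "--clusterhmmer",
        "--pfam2go",
        genbank_file,
    ]


# Hypothetical input names, used for illustration only.
commands = [
    prokka_command("genome_0001.fna"),
    antismash_command("genome_0001.gbk"),
]

for cmd in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Keeping the commands as lists rather than strings makes it straightforward to pass them to `subprocess.run` later without shell-quoting problems.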
To estimate the gene abundance of the predicted BGCs in metagenomic datasets, a local database was constructed from the protein sequences extracted from these BGCs. A total of 552 Illumina-derived metagenomic datasets from 501 individuals (254 females and 247 males) were retrieved from the HMP (Turnbaugh et al. 2007). Each metagenomic dataset was aligned to the local protein database with DIAMOND using the following parameters: -e 0.00001 -b 5 -c 1 -k 200. If a single read matched multiple proteins, only the hits with the lowest E-value were retained, and reads with more than 200 hits were removed to reduce false positives (Donia et al. 2014). The metagenomic reads from the same individual were combined. Finally, each individual's gene abundance was normalized as follows:
| (1) |
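The hit-filtering rule described in this section (keep only the lowest E-value hits per read, discard reads with too many hits) can be sketched as follows. This is an illustrative sketch, not the authors' script: it assumes DIAMOND's default 12-column tabular output, in which the E-value is the 11th column, and it assumes the hit count is checked before the E-value filter is applied. The function names and the `max_hits` parameter are hypothetical.

```python
from collections import defaultdict


def parse_tabular(text):
    """Yield (read_id, protein_id, evalue) from 12-column tabular output."""
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        yield fields[0], fields[1], float(fields[10])


def filter_hits(hits, max_hits=200):
    """Return {read_id: [protein_id, ...]} after applying both filters.

    Reads with more than max_hits hits are dropped entirely (assumption:
    the count is taken before E-value filtering). For the remaining reads,
    only the hits sharing the lowest E-value are kept.
    """
    per_read = defaultdict(list)
    for read_id, protein_id, evalue in hits:
        per_read[read_id].append((protein_id, evalue))

    kept = {}
    for read_id, read_hits in per_read.items():
        if len(read_hits) > max_hits:
            continue
        best = min(evalue for _, evalue in read_hits)
        kept[read_id] = [p for p, evalue in read_hits if evalue == best]
    return kept
```

Ties at the lowest E-value are all retained, which matches the wording "the hits with the lowest E-value"; how such ties are counted towards abundance is not specified here.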
3 Main features
sBGC-hm provides a user-friendly interface with the following key characteristics:
It is the first human gut microbial-derived BGC database to include the most comprehensive set of SM BGCs to date.
Keyword and BLAST searches are provided for acquiring target BGCs. Table 1 lists the items available in sBGC-hm with an example.
The gene co-occurrence matrix derived from HMP data is given to enhance BGC structure prediction (Supplementary Fig. S1).
Additional visualizations, such as the abundance of core biosynthetic genes and the gender differences of these genes, are provided to support prioritization.
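To make the idea of a gene co-occurrence matrix concrete, the sketch below counts, for every pair of genes, the number of individuals in which both genes are detected. The text does not spell out how the sBGC-hm matrix is computed, so the presence threshold, the counting rule, and all names here are assumptions chosen for illustration.

```python
from itertools import combinations


def cooccurrence_matrix(abundance, threshold=0.0):
    """Count individuals in which each pair of genes is jointly present.

    abundance: dict mapping gene name -> list of per-individual abundances
               (all lists in the same individual order).
    A gene counts as present when its abundance exceeds threshold
    (assumed rule, not taken from the source).
    """
    genes = sorted(abundance)
    present = {g: [value > threshold for value in abundance[g]] for g in genes}

    matrix = {g: {h: 0 for h in genes} for g in genes}
    for g in genes:
        # Diagonal: number of individuals carrying the gene at all.
        matrix[g][g] = sum(present[g])
    for g, h in combinations(genes, 2):
        both = sum(1 for a, b in zip(present[g], present[h]) if a and b)
        matrix[g][h] = both
        matrix[h][g] = both
    return matrix
```

Under this convention, genes belonging to the same cluster would be expected to show high pairwise counts, whereas a gene that rarely co-occurs with its neighbours may mark a poorly predicted cluster boundary.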
Table 1.
A description of the data items in sBGC-hm with an example BGC (detailed descriptions are available on the website) a
| Data items | Examples |
|---|---|
| Cluster | 9 |
| Family | FAM_23957 |
| Organism | Bacteroides sp. CAG:545 |
| Taxon ID | 1262742 |
| Start | 1 |
| End | 22 809 |
| Length | 22 809 |
| Type | Arylpolyene b |
| Gene Count | 16 c |
| Core Genes | 1 |
| Transporter Genes | 0 |
| Resistance Genes | 0 |
| Mean | 0.2452 d |
| Gender Differences | 0.04305 e |
| MiBiG | BGC0000838 (8%) |
| Download | GenBank |
a Underline indicates hyperlinks to external information.
b–e The hyperlinks lead to the BGC structure, the gene co-occurrence matrix, the density plot of gene abundance, and the boxplot of the distribution of gene abundance by gender, respectively.
4 Statistics
A total of 36 583 BGCs are detected, with ranthipeptides being the most abundant type at 27.0% (9872). NRPSs and PKSs account for 4.4% (1625) and 3.6% (1324), respectively. Firmicutes account for 73.1% (26 727) of these BGCs, followed by Proteobacteria with 10.5% (3840) and Bacteroidota with 10.0% (3659). At the genus level, Streptococcus contributes the highest number of BGCs, with a total of 2800 BGCs classified into 354 families. The average number of genes in these BGCs is 11.8, while the majority contain 3–5 genes. The BiG-SCAPE analysis identified 8004 families, the largest of which contains 186 BGCs. According to the Wilcoxon rank-sum test of gene abundance in the HMP data, 3406 BGCs are enriched in males. Statistics and figures for these BGCs are available on the website's statistics page.
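For readers unfamiliar with the Wilcoxon rank-sum test used for the gender comparison, the following is a minimal pure-Python sketch based on the normal approximation, without tie or continuity correction. It illustrates the statistic only; the software and settings used for sBGC-hm are not stated in the text, and the function name is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). A negative z means values in x tend to rank lower
    than values in y. Tied values receive the average of their ranks.
    """
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; tied values share the mean rank.
        mean_rank = (i + 1 + j) / 2.0
        rank_sum_x += mean_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j

    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

In this setting, `x` and `y` would hold the normalized abundances of one BGC's genes in male and female individuals; the sign of `z` then indicates the direction of enrichment. When many BGCs are tested, a multiple-testing correction would normally be considered as well.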
5 Discussion
The majority of bacteria in the human microbiome cannot currently be cultured. Sequenced bacterial genomes provide the information necessary to investigate the biosynthetic capability of these microbes. Here, the distribution of BGCs in the human gut microbiome was systematically investigated, and the results support that BGCs are abundant. The atlas of SM BGCs allows researchers to explore the potential therapeutic benefits of these natural products.
Contributor Information
Huixi Zou, National and Local Joint Engineering Research Center of Ecological Treatment Technology for Urban Water Pollution, Wenzhou University, Wenzhou 325035, China; Zhejiang Provincial Key Laboratory for Subtropical Water Environment and Marine Biological Resources Protection, Wenzhou University, Wenzhou 325035, China; College of Life and Environmental Science, Wenzhou University, Wenzhou 325035, China.
Tianli Sun, National and Local Joint Engineering Research Center of Ecological Treatment Technology for Urban Water Pollution, Wenzhou University, Wenzhou 325035, China; Zhejiang Provincial Key Laboratory for Subtropical Water Environment and Marine Biological Resources Protection, Wenzhou University, Wenzhou 325035, China; College of Life and Environmental Science, Wenzhou University, Wenzhou 325035, China.
Bangqun Jin, National and Local Joint Engineering Research Center of Ecological Treatment Technology for Urban Water Pollution, Wenzhou University, Wenzhou 325035, China; Zhejiang Provincial Key Laboratory for Subtropical Water Environment and Marine Biological Resources Protection, Wenzhou University, Wenzhou 325035, China; College of Life and Environmental Science, Wenzhou University, Wenzhou 325035, China.
Shengqin Wang, National and Local Joint Engineering Research Center of Ecological Treatment Technology for Urban Water Pollution, Wenzhou University, Wenzhou 325035, China; Zhejiang Provincial Key Laboratory for Subtropical Water Environment and Marine Biological Resources Protection, Wenzhou University, Wenzhou 325035, China; College of Life and Environmental Science, Wenzhou University, Wenzhou 325035, China.
Supplementary data
Supplementary data is available at Bioinformatics online.
Conflict of interest: None declared.
Funding
This work was supported by the Science and Technology Planning Project of Wenzhou City [Y20190210]; the Special Science and Technology Innovation Project for Seeds and Seedlings of Wenzhou City [N20160016]; the National Natural Science Foundation of China [61601332]; and the Science and Technology Planning Project of Wenzhou City [Y20220134].
References
- Blin K, Shaw S, Kloosterman AM et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res 2021;49:W29–35.
- Crits-Christoph A, Bhattacharya N, Olm MR et al. Transporter genes in biosynthetic gene clusters predict metabolite characteristics and siderophore activity. Genome Res 2021;31:239–50.
- Procópio REdL, Silva IRd, Martins MK et al. Antibiotics produced by Streptomyces. Braz J Infect Dis 2012;16:466–71.
- Donia MS, Cimermancic P, Schulze CJ et al. A systematic analysis of biosynthetic gene clusters in the human microbiome reveals a common family of antibiotics. Cell 2014;158:1402–14.
- Hiseni P, Rudi K, Wilson RC et al. HumGut: a comprehensive human gut prokaryotic genomes collection filtered by metagenome data. Microbiome 2021;9:165.
- Hyatt D, Chen GL, LoCascio PF et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform 2010;11:119.
- Krumsiek J, Mittelstrass K, Do KT et al. Gender-specific pathway differences in the human serum metabolome. Metabolomics 2015;11:1815–33.
- Libis V, Antonovsky N, Zhang M et al. Uncovering the biosynthetic potential of rare metagenomic DNA using co-occurrence network analysis of targeted sequences. Nat Commun 2019;10:3848.
- Navarro-Muñoz JC, Selem-Mojica N, Mullowney MW et al. A computational framework to explore large-scale biosynthetic diversity. Nat Chem Biol 2020;16:60–8.
- Newman DJ, Cragg GM. Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. J Nat Prod 2020;83:770–803.
- Palaniappan K, Chen IMA, Chu K et al. IMG-ABC v.5.0: an update to the IMG/atlas of biosynthetic gene clusters knowledgebase. Nucleic Acids Res 2020;48:D422–30.
- Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014;30:2068–9.
- Skinnider MA, Johnston CW, Gunabalasingam M et al. Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences. Nat Commun 2020;11:6058.
- Turnbaugh PJ, Ley RE, Hamady M et al. The human microbiome project. Nature 2007;449:804–10.
- Wei B, Du A-Q, Zhou Z-Y et al. An atlas of bacterial secondary metabolite biosynthesis gene clusters. Environ Microbiol 2021;23:6981–92.
- Yan Y, Liu N, Tang Y. Recent developments in self-resistance gene directed natural product discovery. Nat Prod Rep 2020;37:879–92.
