antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters

Tilmann Weber; Kai Blin; Srikanth Duddela; Daniel Krug; Hyun Uk Kim; Robert Bruccoleri; Sang Yup Lee; Michael A Fischbach; Rolf Müller; Wolfgang Wohlleben; Rainer Breitling; Eriko Takano; Marnix H Medema

doi:10.1093/nar/gkv437

Nucleic Acids Res. 2015 May 6;43(Web Server issue):W237–W243. doi: 10.1093/nar/gkv437

PMCID: PMC4489286 PMID: 25948579

Abstract

Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software.

INTRODUCTION

The secondary metabolism of bacteria and fungi is a rich source of bioactive chemical compounds with great potential for pharmaceutical, agricultural and nutritional applications. For example, in the field of antiinfectives, almost 70% of the drugs currently in medical use are such secondary metabolites or their derivatives (1). The genes encoding the biosynthetic pathways that are responsible for the production of these secondary metabolites are usually clustered together on the chromosome in biosynthetic gene clusters (BGCs). In recent years, genome mining of such BGCs has become a key methodology to identify new molecules, leading to the discovery of dozens of novel compounds. A variety of computational tools have been developed to support scientists in this field. Most of the available tools are dedicated to the analysis of specific classes of secondary metabolites. For example, ClustScan (2), NP.searcher (3) and SBSPKS (4) focus on non-ribosomal peptide and polyketide biosynthesis pathways, while BAGEL3 (5) focuses on ribosomally synthesized and post-translationally modified peptides (RiPPs). While most tools primarily focus on prokaryotic pathways, SMURF (6) also addresses fungal secondary metabolite producers. For a comprehensive review of tools for the genomic analysis of secondary metabolism, see Weber (7). Since 2011, the antibiotics and Secondary Metabolite Analysis SHell (antiSMASH) has served as a comprehensive web server and a stand-alone tool for the automatic genomic identification and analysis of BGCs of any type, thus facilitating rapid genome mining of a wide range of bacterial and fungal strains (8,9). Here, we report the third version of antiSMASH, which has undergone several key improvements.

NEW FEATURES AND UPDATES

Integration with ClusterFinder

A key limitation to the original antiSMASH BGC detection algorithm was that, despite many major classes of secondary metabolites being covered by its detection logic (see Supplementary Tables S1 and S2), it was still limited to the detection of known types of BGCs. To overcome this limitation, the ClusterFinder algorithm was recently published (10). ClusterFinder uses a hidden Markov model to probabilistically predict BGC-like regions in genomes based on the frequencies of observed PFAM domains inside and outside a comprehensive set of known BGCs. The key assumption of the algorithm is that even the biosynthetic pathways for unknown compound families that are very different from known secondary metabolites utilize the same broad enzyme families (e.g., oxidoreductases and methyltransferases) for the catalysis of key reactions. While ClusterFinder has a somewhat higher false positive rate than the original version of antiSMASH, it has been shown to effectively identify BGCs for entirely new classes of chemicals (10).
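The probabilistic idea behind this approach can be illustrated with a minimal two-state hidden Markov model. The sketch below is not the ClusterFinder implementation: the state names, the three domain labels and all probabilities are made-up assumptions chosen for illustration, whereas the published model is trained on observed PFAM domain frequencies (10). It computes, for each position in a sequence of domains, the posterior probability of being inside a BGC-like region using the forward-backward algorithm.

```python
# Toy two-state HMM over a sequence of protein domain labels.
# All parameters below are hypothetical illustration values.
STATES = ("bgc", "non_bgc")
START = {"bgc": 0.1, "non_bgc": 0.9}
TRANS = {
    "bgc": {"bgc": 0.9, "non_bgc": 0.1},
    "non_bgc": {"bgc": 0.05, "non_bgc": 0.95},
}
EMIT = {
    "bgc": {"oxidoreductase": 0.4, "methyltransferase": 0.4, "ribosomal": 0.2},
    "non_bgc": {"oxidoreductase": 0.15, "methyltransferase": 0.1, "ribosomal": 0.75},
}


def bgc_posteriors(domains):
    """Return P(state = bgc | whole sequence) for every domain position."""
    n = len(domains)
    if n == 0:
        return []

    # Forward pass, normalised at each step to avoid numerical underflow.
    fwd = []
    prev = None
    for i, dom in enumerate(domains):
        cur = {}
        for s in STATES:
            if i == 0:
                prior = START[s]
            else:
                prior = sum(prev[r] * TRANS[r][s] for r in STATES)
            cur[s] = prior * EMIT[s][dom]
        total = sum(cur.values())
        cur = {s: v / total for s, v in cur.items()}
        fwd.append(cur)
        prev = cur

    # Backward pass, also normalised at each step.
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in STATES}
    for i in range(n - 2, -1, -1):
        nxt = domains[i + 1]
        cur = {
            s: sum(TRANS[s][r] * EMIT[r][nxt] * bwd[i + 1][r] for r in STATES)
            for s in STATES
        }
        total = sum(cur.values())
        bwd[i] = {s: v / total for s, v in cur.items()}

    # Combine both passes and renormalise per position.
    posteriors = []
    for i in range(n):
        joint = {s: fwd[i][s] * bwd[i][s] for s in STATES}
        posteriors.append(joint["bgc"] / sum(joint.values()))
    return posteriors


if __name__ == "__main__":
    genome = ["ribosomal"] * 3 + [
        "oxidoreductase", "methyltransferase", "oxidoreductase"
    ] + ["ribosomal"] * 3
    for dom, p in zip(genome, bgc_posteriors(genome)):
        print(f"{dom:18s} {p:.3f}")
```

In such a scheme, a probability cut-off applied to the per-position posteriors would trade sensitivity against specificity, which is the kind of tuning the integrated input variables are meant to expose.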

From a user perspective, the most convenient way to analyze genomes using ClusterFinder is to inspect its results alongside those of antiSMASH, so that all detailed classification and analysis features offered by antiSMASH are included in the workflow. Hence, we fully integrated ClusterFinder into antiSMASH, including detailed input variables to tune the sensitivity and specificity. Also, the clusters detected by ClusterFinder are further categorized into saccharide (‘Cf_saccharide’), fatty acid (‘Cf_fatty_acid’) and putative (‘Cf_putative’) biosynthetic types, according to the classification rules defined by Cimermancic et al. (10). Altogether, the antiSMASH web server now provides access to both a very specific algorithm that can accurately detect BGCs belonging to a large number of known classes and a highly sensitive algorithm that effectively identifies potentially novel types of BGCs.

Dereplication and comparison with known pathways

Another key new feature in antiSMASH is a dedicated module to compare identified BGCs with those encoding the biosynthetic pathways for known compounds. Previously, this was partially possible using the built-in ClusterBlast module, but as the output contains hits against all BGCs in the database, it is often not immediately obvious whether a BGC has been experimentally characterized or not. Hence, we have now included a dedicated KnownClusterBlast module (Figure 1) that compares identified BGCs with the comprehensive dataset of known BGCs (currently 1172 in total) from the ‘Minimum Information about a Biosynthetic Gene cluster’ (MIBiG) community project (Medema et al., submitted for publication; see http://mibig.secondarymetabolites.org for a full overview and additional info on these BGCs). This is a very important feature as (i) dereplication of existing compounds is crucial for effective discovery of novel natural products instead of finding the same molecules repeatedly, and (ii) comparative analysis of unknown and known gene clusters may provide hints concerning the function of certain genes within the cluster, inferred from homology.

Figure 1. Example of KnownClusterBlast output, using the balhimycin gene cluster (GenBank Y16952.3). The significance thresholds used are the same as for the ClusterBlast module (8). Following the balhimycin gene cluster itself, several other BGCs involved in the biosynthesis of similar glycopeptides are shown as next best hits. The percentage of genes in the query cluster that are present in the hit cluster is included as extra information. Also, hyperlinks to the MIBiG repository are available, where users can find additional information on each gene cluster.
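The percentage reported for each hit can be understood as the share of query genes that have at least one significant match in the known cluster. The following is a minimal sketch of that calculation only; the function name, the gene identifiers and the input format (pairs of query and hit genes that already passed the significance thresholds) are assumptions for illustration and do not reflect the internal data structures of the KnownClusterBlast module.

```python
def percent_query_genes_with_hits(query_genes, hit_pairs):
    """Percentage of query-cluster genes with at least one significant hit.

    query_genes: identifiers of the genes in the query cluster.
    hit_pairs:   (query_gene, hit_gene) tuples that passed the
                 significance thresholds (hypothetical input format).
    """
    unique_query = set(query_genes)
    if not unique_query:
        return 0.0
    # A query gene counts once, however many hit genes it matches.
    matched = {q for q, _hit in hit_pairs if q in unique_query}
    return 100.0 * len(matched) / len(unique_query)


if __name__ == "__main__":
    query = ["geneA", "geneB", "geneC", "geneD"]
    hits = [("geneA", "hit1"), ("geneB", "hit2"), ("geneA", "hit3")]
    print(percent_query_genes_with_hits(query, hits))
```

Counting each query gene once keeps the value between 0 and 100 even when a gene matches several paralogues in the hit cluster.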

Identification and analysis of enzyme active sites

With the new ‘Active Site Finder’ module, it is possible to identify and annotate conserved amino acid motifs, for example active sites or product-determining key residues of biosynthetic enzymes. The active sites are annotated in the exported Genbank and/or EMBL files. Identified active sites are also displayed in the ‘Gene’ drop-down windows or in the cluster details view of the antiSMASH results web page. The motifs are defined in an XML file that can easily be extended on local antiSMASH installations. Currently, the following motifs are recognized by the active site finder module: active sites for ketosynthase (KS) domains, acyltransferase (AT) domains, dehydratase (DH) domains, ketoreductase (KR) domains, acyl-carrier protein (ACP) domains, thioesterases and cytochrome P450 oxygenases; and predictions based on key residues: ACP-type beta-branching/non-branching, ketoreductase domain (D-/L-) stereochemistry and enoyl reductase domain (2S/2R) stereochemistry.
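As a rough illustration of motif definitions kept in an XML file and applied to protein sequences, the sketch below loads motifs from an XML string and scans a sequence with regular expressions. Everything specific here is an assumption: the XML layout, the attribute names, the two patterns and the test sequence are invented for the example and are neither the curated antiSMASH motifs nor the actual file format used by the Active Site Finder.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical motif file; the real antiSMASH XML schema may differ.
MOTIF_XML = """
<motifs>
  <motif name="example_KS_site" domain="KS" pattern="C[ST]S[AG]L" />
  <motif name="example_P450_site" domain="P450" pattern="FG.G.H.C.G" />
</motifs>
"""


def load_motifs(xml_text):
    """Parse motif definitions into (name, domain, compiled regex) tuples."""
    root = ET.fromstring(xml_text.strip())
    return [
        (m.get("name"), m.get("domain"), re.compile(m.get("pattern")))
        for m in root.iter("motif")
    ]


def find_active_sites(sequence, motifs):
    """Return all motif matches in a protein sequence (1-based coordinates)."""
    hits = []
    for name, domain, regex in motifs:
        for match in regex.finditer(sequence):
            hits.append({
                "motif": name,
                "domain": domain,
                "start": match.start() + 1,  # convert to 1-based
                "end": match.end(),          # inclusive in 1-based terms
                "residues": match.group(),
            })
    return hits


if __name__ == "__main__":
    motifs = load_motifs(MOTIF_XML)
    for hit in find_active_sites("MKTACSSGLAAFGAGPHRCVGQ", motifs):
        print(hit)
```

Keeping the patterns in a separate file, as in this sketch, means new motifs can be added without touching the scanning code, which matches the extensibility described for local installations.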

Improvements in chemical structure prediction

Another important feature of antiSMASH has been its ability to display chemical structures of secondary metabolites predicted from the annotated BGCs. In this new version, we made two major improvements to generate more precise chemical structures: (i) the prediction now accounts for the ketoreductase, dehydratase and enoylreductase domains that determine the reduction state of keto groups in polyketides, and (ii) trans-AT type I PKS logic has been added. These corrections lay a foundation for further improvement in the display of more sophisticated structures of secondary metabolites. As before, predicted structures for NRPS and PKS clusters assume NRPS/PKS co-linearity and do not yet predict possible cyclizations or other post-NRPS/-PKS tailoring reactions.
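The dependence of the polyketide reduction state on the reductive domains of a module can be sketched as a simple lookup. The example below assumes the canonical order of reductive processing (ketoreductase, then dehydratase, then enoylreductase) and that every annotated domain is active; the function name and the domain abbreviations used as input are illustrative assumptions, not the antiSMASH code.

```python
def beta_carbon_state(module_domains):
    """Infer the beta-carbon state left by one PKS module.

    Assumes canonical reductive processing (KR, then DH, then ER) and
    that every listed domain is catalytically active.
    """
    domains = set(module_domains)
    if "KR" not in domains:
        return "keto"         # no reduction: the beta-keto group remains
    if "DH" not in domains:
        return "hydroxyl"     # KR only: keto reduced to hydroxyl
    if "ER" not in domains:
        return "double bond"  # KR + DH: dehydration gives an enoyl group
    return "saturated"        # KR + DH + ER: fully reduced


if __name__ == "__main__":
    modules = [
        ["KS", "AT", "ACP"],
        ["KS", "AT", "KR", "ACP"],
        ["KS", "AT", "DH", "KR", "ACP"],
        ["KS", "AT", "DH", "ER", "KR", "ACP"],
    ]
    for module in modules:
        print("-".join(module), "->", beta_carbon_state(module))
```

A real predictor would additionally need to handle inactive domains, which this sketch deliberately ignores.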

Future developments toward genome-scale metabolic modeling of secondary metabolite biosynthesis

In the next phase of antiSMASH development, one major focus will be the automated integration of the predicted secondary metabolite biosynthesis pathways into genome-scale models of metabolism (11,12). As a first step in this direction, antiSMASH 3.0 already includes an interface to EFICAz2.5, which predicts Enzyme Commission (EC) numbers for all the metabolic genes in the submitted genome (13). In addition, a prototype functionality has been added for the automated generation of prokaryotic draft genome-scale metabolic models, following the established approach of homology-based modeling (14). This will be the basis for further developments toward the integration of secondary metabolic pathways (Kim H.U. et al., unpublished results).

BiosynML output for offline editing

Researchers usually submit single gene clusters, complete genomes or collections of scaffolds to antiSMASH. Hence, the antiSMASH results are frequently not endpoints of their workflow, but are used as input for additional downstream analyses and may also undergo manual curation. We have added the BiosynML output module to export detailed antiSMASH analysis results in XML format, which serves as a container for interfacing to custom analysis workflows and also enables offline archiving of results. The use of BiosynML output is exemplified by a plugin (available from http://www.biosynml.de/) for the widely used desktop software Geneious, thereby allowing users to import antiSMASH-annotated sequences, to organize their pathway collection and to manually refine gene cluster information (Figure 2) (15). Typical tasks performed during offline editing of pathways include the manual assignment of incorporated building blocks on the basis of experimental evidence, grouping of biosynthetic domains into functional modules (typically for modular PKS and NRPS gene clusters) and additional domain-specific analysis, e.g. by sequence alignments and inspection of signature motifs. In addition, the BiosynML plugin can handle direct antiSMASH job submission and result retrieval. Moreover, it also assists with the deposition of MIBiG-compliant gene clusters in the course of the ongoing MIBiG community initiative (Medema et al., submitted for publication). Finally, the BiosynML format aims to facilitate the prototyping of custom bioinformatic workflows by providing structured access to antiSMASH annotation results.

Figure 2. BiosynML output and Geneious plugin. The schematic shows the interfacing of typical tasks during BGC analysis (including antiSMASH annotation, manual BGC refinement, deposition to in-house databases and submission to the public MIBiG repository) supported by BiosynML functionality.

Improved nomenclature and detection of RiPPs and polyketides

For a number of compound classes, such as polyketides and RiPPs, we have updated the nomenclature and corresponding detection logic (Supplementary Table S1). For lanthipeptides, we have improved the prediction of modification reactions, lanthionine bridge count and finished peptide masses (9). For other RiPPs, the community-agreed nomenclature recently published by van der Donk et al. (16) has been adopted. For polyketides and lipids, specific BGC classes have been added that are responsible for the biosynthesis of (dialkyl)resorcinols (17), aryl polyenes (10), ladderane lipids (18) and polyunsaturated fatty acids (19). Also, new domain types have been added for the domain representation of modular polyketide synthases and non-ribosomal peptide synthetases, including, e.g. branching, crotonase and pyran synthase domains (Supplementary Table S3). These updates ensure that the antiSMASH output pages reflect the latest developments in the field.

Back-end and library updates

The job management and dispatch infrastructure (https://bitbucket.org/antismash/runsmash) of the antiSMASH web server (https://bitbucket.org/antismash/websmash) has been redesigned to increase speed and scalability and to make it easier to add new features to antiSMASH. While the old MySQL-based job queue has reliably served around 90,000 jobs since the antiSMASH 1.0 publication in July 2011, the number of jobs submitted per month is still steadily increasing. In the seven weeks since the start of 2015, the new Redis-based queue has already handled over 6000 jobs. A number of third-party tools, libraries and databases used by antiSMASH were updated to their latest versions: BioPython (20) 1.65, PFAM database (21) 27.0 and NCBI BLAST+ (22) 2.2.30. The ClusterBlast database was updated to include clusters detected from the latest GenBank release. Additionally, the ClusterBlast routines now use DIAMOND (23) to calculate the cluster comparisons more quickly.

CONCLUSIONS AND FUTURE PERSPECTIVES

With the newly introduced features, antiSMASH is now even more comprehensive than it was before (Table 1), and will be useful for the discovery of new secondary metabolites as well as for metabolic engineering (24). Still, there are several important future challenges ahead. For example, chemical structure prediction is at the moment still limited to the ‘core’ peptides and polyketides that are off-loaded from modular assembly-lines, while cyclization and tailoring reactions are difficult to accurately predict. Perhaps a combinatorial strategy will make this possible in the near future, leading to a result consisting of multiple possible end compounds (as previously done to some extent in NP.searcher (3)). Another important remaining challenge is to scale up antiSMASH to allow the simultaneous analysis of large numbers of genomes and metagenomes; the development of automated BGC networking (10,25,26) could be a key technology to make this possible. In the coming years, we will strive to continuously upgrade antiSMASH in order to incorporate the latest insights and technologies, so that natural product researchers will always have access to a state-of-the-art tool for the comprehensive identification and analysis of BGCs. We invite the community to join us in these efforts by contributing new algorithms and analysis tools as antiSMASH plugins. This will ensure that the community as a whole will benefit from an integrated and centralized online usage environment.

Table 1. Overview of analyses integrated into antiSMASH.

Rule-based detection of BGCs^a,b
Aminocoumarins	Melanins
Aminoglycosides/aminocyclitols	Microcins
Aryl polyenes	Microviridins
Bacteriocins	Non-ribosomal peptides
Beta-lactams	Nucleosides
Bottromycins	Oligosaccharide
Butyrolactones	Others
ClusterFinder fatty acids	Phenazines
ClusterFinder saccharides	Phosphoglycolipids
Cyanobactins	Phosphonates
(Dialkyl)resorcinols	Polyunsaturated fatty acids
Ectoines	Trans-AT type I PKS
Furans	Type I PKS
Glycocins	Type II PKS
Head-to-tail cyclized peptides	Type III PKS
Heterocyst glycolipid PKS-like	Proteusins
Homoserine lactones	Sactipeptides
Indoles	Siderophores
Ladderane lipids	Terpenes
Lanthipeptides	Thiopeptides
Linear azol(in)e-containing peptides (LAPs)
Lasso peptide
Linaridins
Rule-independent detection of BGCs
ClusterFinder
Cluster-specific analyses
Domain structure of PKSs and NRPSs^c
NRPS: A-domain specificity prediction
PKS: AT specificity prediction
Identification of conserved active site motifs; stereochemistry-determining motifs
Prediction of core chemical structure (NRPS, PKS, lanthipeptides)
smCOG secondary metabolism-related gene family prediction
Genome-wide analyses
Protein family detection (PFAM) search
EC number prediction
Homology-based metabolic modeling (with template models Escherichia coli /Streptomyces coelicolor)
Genome comparisons
ClusterBlast (identification of similar clusters in sequenced genomes)
SubClusterBlast (identification of conserved operons and multigene modules with known function)
KnownClusterBlast (identification of similar experimentally characterized gene clusters)
Links to other Web-resources
NCBI BLAST+
NaPDoS
Norine
Output file formats
Genbank
EMBL
SBML (for metabolic model files)
BiosynML
XLS (Microsoft Excel)
Tab-delimited text files
Input file formats
FASTA (nucleotide or protein)
Genbank/Genpept
EMBL
Direct download via NCBI accession number

^aFor a list of profile Hidden Markov Models (pHMMs) used to detect the different classes, please see Supplementary Table S1.

^bFor a list of rules, please see Supplementary Table S2.

^cFor a list of detectable domains, please see Supplementary Table S3.

AVAILABILITY

http://antismash.secondarymetabolites.org/. This website is free and open to all users and there is no login requirement. Source code is available from https://bitbucket.org/antismash/antismash/.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

NWO Rubicon fellowship (to M.H.M.); Novo Nordisk Foundation (to S.Y.L.); German Center for Infection Research (DZIF) (to W.W., T.W.); BBSRC, EPSRC [BB/M017702/1 to R.Bre., E.T.]; The Technology Development Program to Solve Climate Changes on Systems Metabolic Engineering for Biorefineries from the Ministry of Science, ICT, and Future Planning (MSIP) through the National Research Foundation (NRF) of Korea [NRF-2012M1A2A2026556 to H.U.K., S.Y.L.]. Funding for open access charge: NWO Incentive Fund Open Access.

Conflict of interest statement. None declared.

REFERENCES

1. Newman D.J., Cragg G.M. Natural products as sources of new drugs over the 30 years from 1981 to 2010. J. Nat. Prod. 2012;75:311–335. doi: 10.1021/np200906s.
2. Starcevic A., Zucko J., Simunkovic J., Long P.F., Cullum J., Hranueli D. ClustScan: an integrated program package for the semi-automatic annotation of modular biosynthetic gene clusters and in silico prediction of novel chemical structures. Nucleic Acids Res. 2008;36:6882–6892. doi: 10.1093/nar/gkn685.
3. Li M.H., Ung P.M., Zajkowski J., Garneau-Tsodikova S., Sherman D.H. Automated genome mining for natural products. BMC Bioinformatics. 2009;10:185. doi: 10.1186/1471-2105-10-185.
4. Anand S., Prasad M.V.R., Yadav G., Kumar N., Shehara J., Ansari M.Z., Mohanty D. SBSPKS: structure based sequence analysis of polyketide synthases. Nucleic Acids Res. 2010;38:W487–W496. doi: 10.1093/nar/gkq340.
5. Van Heel A.J., de Jong A., Montalbán-López M., Kok J., Kuipers O.P. BAGEL3: automated identification of genes encoding bacteriocins and (non-)bactericidal posttranslationally modified peptides. Nucleic Acids Res. 2013;41:W448–W453. doi: 10.1093/nar/gkt391.
6. Khaldi N., Seifuddin F.T., Turner G., Haft D., Nierman W.C., Wolfe K.H., Fedorova N.D. SMURF: genomic mapping of fungal secondary metabolite clusters. Fungal Genet. Biol. 2010;47:736–741. doi: 10.1016/j.fgb.2010.06.003.
7. Weber T. In silico tools for the analysis of antibiotic biosynthetic pathways. Int. J. Med. Microbiol. 2014;304:230–235. doi: 10.1016/j.ijmm.2014.02.001.
8. Medema M.H., Blin K., Cimermancic P., De Jager V., Zakrzewski P., Fischbach M.A., Weber T., Takano E., Breitling R. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 2011;39:W339–W346. doi: 10.1093/nar/gkr466.
9. Blin K., Medema M.H., Kazempour D., Fischbach M., Breitling R., Takano E., Weber T. antiSMASH 2.0 – a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res. 2013;41:W204–W212. doi: 10.1093/nar/gkt449.
10. Cimermancic P., Medema M.H., Claesen J., Kurita K., Wieland Brown L.C., Mavrommatis K., Pati A., Godfrey P.A., Koehrsen M., Clardy J., et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell. 2014;158:412–421. doi: 10.1016/j.cell.2014.06.034.
11. Lewis N.E., Nagarajan H., Palsson B.Ø. Constraining the metabolic genotype–phenotype relationship using a phylogeny of in silico methods. Nat. Rev. Microbiol. 2012;10:291–305. doi: 10.1038/nrmicro2737.
12. Zakrzewski P., Medema M.H., Gevorgyan A., Kierzek A.M., Breitling R., Takano E. MultiMetEval: comparative and multi-objective analysis of genome-scale metabolic models. PLoS One. 2012;7:e51511. doi: 10.1371/journal.pone.0051511.
13. Kumar N., Skolnick J. EFICAz2.5: application of a high-precision enzyme function predictor to 396 proteomes. Bioinformatics. 2012;28:2687–2688. doi: 10.1093/bioinformatics/bts510.
14. Agren R., Liu L., Shoaie S., Vongsangnak W., Nookaew I., Nielsen J. The RAVEN toolbox and its use for generating a genome-scale metabolic model for Penicillium chrysogenum. PLoS Comput. Biol. 2013;9:e1002980. doi: 10.1371/journal.pcbi.1002980.
15. Kearse M., Moir R., Wilson A., Stones-Havas S., Cheung M., Sturrock S., Buxton S., Cooper A., Markowitz S., Duran C., et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28:1647–1649. doi: 10.1093/bioinformatics/bts199.
16. Arnison P.G., Bibb M.J., Bierbaum G., Bowers A.A., Bugni T.S., Bulaj G., Camarero J.A., Campopiano D.J., Challis G.L., Clardy J., et al. Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat. Prod. Rep. 2013;30:108–160. doi: 10.1039/c2np20085f.
17. Fuchs S.W., Bozhüyük K.A.J., Kresovic D., Grundmann F., Dill V., Brachmann A.O., Waterfield N.R., Bode H.B. Formation of 1,3-cyclohexanediones and resorcinols catalyzed by a widely occurring ketosynthase. Angew. Chem. Int. Ed. Engl. 2013;52:4108–4112. doi: 10.1002/anie.201210116.
18. Rattray J.E., Strous M., Op den Camp H.J.M., Schouten S., Jetten M.S., Sinninghe Damsté J.S. A comparative genomics study of genetic products potentially encoding ladderane lipid biosynthesis. Biol. Direct. 2009;4:8. doi: 10.1186/1745-6150-4-8.
19. Allen E.E., Bartlett D.H. Structure and regulation of the omega-3 polyunsaturated fatty acid synthase genes from the deep-sea bacterium Photobacterium profundum strain SS9. Microbiology. 2002;148:1903–1913. doi: 10.1099/00221287-148-6-1903.
20. Cock P.J.A., Antao T., Chang J.T., Chapman B.A., Cox C.J., Dalke A., Friedberg I., Hamelryck T., Kauff F., Wilczynski B., et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163.
21. Finn R.D., Bateman A., Clements J., Coggill P., Eberhardt R.Y., Eddy S.R., Heger A., Hetherington K., Holm L., Mistry J., et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–D230. doi: 10.1093/nar/gkt1223.
22. Camacho C., Coulouris G., Avagyan V., Ma N., Papadopoulos J., Bealer K., Madden T.L. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
23. Buchfink B., Xie C., Huson D.H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 2014;12:59–60. doi: 10.1038/nmeth.3176.
24. Weber T., Charusanti P., Musiol-Kroll E.M., Jiang X., Tong Y., Kim H.U., Lee S.Y. Metabolic engineering of antibiotic factories: new tools for antibiotic production in actinomycetes. Trends Biotechnol. 2015;33:15–26. doi: 10.1016/j.tibtech.2014.10.009.
25. Doroghazi J.R., Albright J.C., Goering A.W., Ju K.-S., Haines R.R., Tchalukov K.A., Labeda D.P., Kelleher N.L., Metcalf W.W. A roadmap for natural product discovery based on large-scale genomics and metabolomics. Nat. Chem. Biol. 2014;10:963–968. doi: 10.1038/nchembio.1659.
26. Nguyen D.D., Wu C.H., Moree W.J., Lamsa A., Medema M.H., Zhao X., Gavilan R.G., Aparicio M., Atencio L., Jackson C., et al. MS/MS networking guided analysis of molecule and gene cluster families. Proc. Natl. Acad. Sci. U. S. A. 2013;110:E2611–E2620. doi: 10.1073/pnas.1303471110.
