antiSMASH 4.0—improvements in chemistry prediction and gene cluster boundary identification

Kai Blin; Thomas Wolf; Marc G Chevrette; Xiaowen Lu; Christopher J Schwalen; Satria A Kautsar; Hernando G Suarez Duran; Emmanuel L C de los Santos; Hyun Uk Kim; Mariana Nave; Jeroen S Dickschat; Douglas A Mitchell; Ekaterina Shelest; Rainer Breitling; Eriko Takano; Sang Yup Lee; Tilmann Weber; Marnix H Medema

doi:10.1093/nar/gkx319

. 2017 Apr 28;45(Web Server issue):W36–W41. doi: 10.1093/nar/gkx319

antiSMASH 4.0—improvements in chemistry prediction and gene cluster boundary identification

Kai Blin ¹, Thomas Wolf ², Marc G Chevrette ³, Xiaowen Lu ⁴, Christopher J Schwalen ⁵, Satria A Kautsar ⁴, Hernando G Suarez Duran ⁴, Emmanuel L C de los Santos ⁶, Hyun Uk Kim ^1,⁷, Mariana Nave ⁸, Jeroen S Dickschat ⁹, Douglas A Mitchell ^5,¹⁰, Ekaterina Shelest ², Rainer Breitling ¹¹, Eriko Takano ¹¹, Sang Yup Lee ^1,⁷, Tilmann Weber ^1,^*, Marnix H Medema ^4,^*

PMCID: PMC5570095 PMID: 28460038

Abstract

Many antibiotics, chemotherapeutics, crop protection agents and food preservatives originate from molecules produced by bacteria, fungi or plants. In recent years, genome mining methodologies have been widely adopted to identify and characterize the biosynthetic gene clusters encoding the production of such compounds. Since 2011, the ‘antibiotics and secondary metabolite analysis shell—antiSMASH’ has assisted researchers in efficiently performing this, both as a web server and a standalone tool. Here, we present the thoroughly updated antiSMASH version 4, which adds several novel features, including prediction of gene cluster boundaries using the ClusterFinder method or the newly integrated CASSIS algorithm, improved substrate specificity prediction for non-ribosomal peptide synthetase adenylation domains based on the new SANDPUMA algorithm, improved predictions for terpene and ribosomally synthesized and post-translationally modified peptides cluster products, reporting of sequence similarity to proteins encoded in experimentally characterized gene clusters on a per-protein basis and a domain-level alignment tool for comparative analysis of trans-AT polyketide synthase assembly line architectures. Additionally, several usability features have been updated and improved. Together, these improvements make antiSMASH up-to-date with the latest developments in natural product research and will further facilitate computational genome mining for the discovery of novel bioactive molecules.

INTRODUCTION

Natural products, also referred to as secondary or specialized metabolites, are the basis of many drugs and are also important molecules for agricultural and nutritional applications; moreover, they play key roles in scientific research as chemical probes to study many aspects of molecular and cellular biology. The observation that the genomes of many microorganisms contain multiple biosynthetic gene clusters (BGCs) that code for the production of such molecules has led to a paradigm shift in natural products research: within the last 10 years, genome mining has been established as an important technology complementing the bioassay- and chemistry-driven classical natural products discovery process (1). This fundamental change was supported by the development and public availability of various genome mining software tools that are usable by wet-lab microbiologists and chemists (as reviewed in (2–4)), such as NP.searcher (5), antiSMASH (6–8), NaPDoS (9) and recently PRISM/GNP (10,11).

The comprehensive open-source BGC mining platform antiSMASH (6–8) was first released in 2011 and has been regularly updated with extended functionality. antiSMASH facilitates the mining of bacterial and fungal genomes and is tightly interconnected with plantiSMASH, a new variant for BGC mining in plants (12), the antiSMASH database (13) and the Minimum Information on Biosynthetic Gene Cluster (MIBiG) repository of experimentally characterized BGCs (14).

Here, we report version 4 of antiSMASH, which includes several major extensions, such as gene cluster boundary prediction for fungal BGCs, improved chemistry predictions for terpene, ribosomal peptide and non-ribosomal peptide BGCs, comparative alignment of trans-AT polyketide synthase (PKS) assembly lines and TTA codon annotation. Moreover, an improved user interface was introduced, along with several other usability and efficiency improvements. The public antiSMASH web server is freely accessible at http://antismash.secondarymetabolites.org.

NEW FEATURES AND UPDATES

Improved prediction of gene cluster boundaries

Estimating the boundaries of BGCs is a continuing challenge for genome mining tools. Traditionally, antiSMASH has opted for a ‘greedy’ approach by design, in order to ensure a greater likelihood of including all pertinent biosynthetic genes. The rationale behind this was that expert users would be better at estimating cluster boundaries than automated algorithms would. However, for certain purposes, it is still highly beneficial for users to review a computer-assisted estimate of where a BGC may start and end. For this reason, antiSMASH has now added two methods to predict the boundaries of BGCs. For fungal genomes, the Cluster Assignment by Islands of Sites (CASSIS) algorithm (15) is used for this purpose, which identifies genes within the BGC that share a common pathway-specific regulatory motif (Figure 1). Additionally, for both bacterial and fungal genomes, the user can now choose to use the ClusterFinder algorithm (16) to estimate cluster boundaries based on frequencies of locally encoded protein domains detected by Pfam (17) (based on these being either more or less BGC-like). If the user selects one of the BGC boundary prediction options (ClusterFinder for bacteria and fungi, CASSIS for fungi only), the extents of the predicted cluster region are displayed as bars above the BGC and also annotated in the GenBank files that can be downloaded.

Figure 1. — Gene cluster border prediction by the Cluster Assignment by Islands of Sites (CASSIS) algorithm. The fourth cluster on chromosome 1 of *Aspergillus nidulans* is shown. The cream-colored bar above the gene arrows spans the genes predicted to be clustered by CASSIS. Further genes in the surrounding are displayed for additional context. Similar functionality is available when using ClusterFinder to predict gene cluster borders.

New algorithms for non-ribosomal peptide and terpene chemistry prediction

Since the first version of antiSMASH, three algorithms have been used within the pipeline to predict the substrate specificities of non-ribosomal peptide synthetase (NRPS) adenylation (A) domains: the support-vector machine (SVM) and active-site motif (ASM) prediction methods from NRPSPredictor2 (18) and the profile HMM (pHMM)-based method from Minowa et al. Since then, several new algorithms have been published to predict A-domain specificity (19–21). More recently, Chevrette et al. (manuscript in review) substantially expanded the training sets for these algorithms, introduced an additional (phylogenetics-based) algorithm (PrediCAT), benchmarked all algorithms systematically and constructed an ensemble prediction method (called SANDPUMA) that outperformed each method individually. To benefit from the latest insights in this field, we have now replaced the previous prediction algorithms with the SANDPUMA predictions; these provide not only the ensemble outputs, but also the individual outputs of the underlying SVM, ASM, PrediCAT and pHMM algorithms. Since the benchmark comparison had shown the Minowa method (22) to be the least reliable of all previously published methods, this algorithm was judged to be uninformative and has been removed from the antiSMASH pipeline.

In addition to the prediction of non-ribosomal peptide chemistry, antiSMASH now also provides chemical structure predictions for the products of bacterial terpene synthases (23). To this end, a terpene cyclase-specific version of PrediCAT (see Supplementary Figure S1 and Table S1) has been included, to predict terpene cyclization patterns (such as 1,6-, 1,10- or 1,11 cyclizations) based on phylogenetic relationships with known enzymes from a documented reference set of terpene cyclases: when a query enzyme forms a monophyletic clade with enzymes with a known cyclization chemistry, this cyclization pattern is assigned to the query as a prediction. These predictions (see Supplementary Figure S1 for accuracy assessment) are then reported alongside the name of and sequence identity to the most closely related experimentally characterized homolog. It should be noted that the predictions are only performed for those terpene BGCs that encode mono-, sesqui- or diterpene cyclases (Pfams PF01397 and/or PF03936) and not for those that (only) encode phytoene synthases, tetraterpene cyclases, oxidosqualene cyclases, tryptophan dimethylallyltransferases, geranylgeranyl diphosphate (GGPP) synthases and/or lycopene cyclases.

Improved RiPP BGC identification and structure prediction

Ribosomally synthesized and Post-translationally modified Peptides (RiPPs) constitute a growing area of natural products research. antiSMASH supports researchers in predicting 15 distinct classes of RiPP BGCs. Previously, antiSMASH predicted only lanthipeptide precursors using a relatively limited pHMM-based approach. The current version of antiSMASH now provides a more sophisticated prediction and classification for class I lanthipeptides as well as lasso peptides, sactipeptides and thiopeptides. Given that RiPPs start as gene-encoded precursor peptides prior to post-translational modification, amino acid sequence prediction provides a wealth of information regarding the structure of the final product. However, the open-reading frames (ORFs) encoding these peptides are often overlooked by automated analysis and can be highly sequence variable, necessitating the need for current precursor identification methods.

To assist in identifying the precursor peptide-encoding gene, antiSMASH now utilizes the algorithm from the genome-mining platform Rapid ORF Description and Evaluation Online (RODEO) (24), which uses a combination of heuristic scoring, SVM and motif analysis to evaluate all candidate precursor peptides in a putative RiPP BGC. To broaden its applicability, the RODEO algorithm was extended to perform precursor prediction not only for lasso peptides, but also for thiopeptides, class I lanthipeptides and sactipeptides (see Supplementary Text 1 and Figures S2–4). When submitting an annotated nucleotide sequence to antiSMASH, the algorithm evaluates small genes that are already part of this annotation, as well as all other small ORFs in intergenic regions across the predicted cluster, in order to mitigate issues with gene prediction.

For the RiPP classes analyzed by the RODEO algorithm, antiSMASH reports: (i) the respective class of RiPP (e.g. lasso peptide or thiopeptide, etc.), (ii) a predicted leader peptide cleavage site and (iii) any potential C-terminal proteolytic processing. Given the post-translational simplicity of lasso peptides, a molecular mass is also calculated, accounting for the number of disulfide bridges. For thiopeptides, the macrocycle size and potential amidation are predicted as well. Molecular weight predictions are not given for the other RiPP subclasses owing to their extensive and variable post-translational modifications.

Trans-AT PKS domain alignments

Several key classes of natural products are produced by multimodular enzymatic assembly lines. Standard similarity searches (as performed in antiSMASH's ClusterBlast module) do not reveal major insights between the natural product structures and the genes for the corresponding multidomain proteins that encode their biosynthetic enzymes. In order to better address this issue, we have now included an assembly line alignment method for trans-AT PKS (E. Helfrich, X. Lu et al. manuscript in preparation), which uses reference phylogenies of ketosynthase (KS) domains to assign KS domains from identified gene clusters into clades that correspond to a certain type of polyketide chemistry. Based on this classification, the encoded assembly line is then aligned to reference assembly lines from known BGCs in MIBiG (14) based on a distance metric that involves the Jaccard index, Goodman–Kruskal gamma function and domain duplication index of KS domain clades at empirically determined weights of 0.5, 0.25 and 0.25, respectively (see also (25)). The assembly lines that are most closely related to the query are then selected and clustered using Unweighted Pair Group Method with arithmetic mean clustering with the same metric and displayed in a visual alignment, in which each KS domain clade is annotated with a distinct color and a text description of the associated chemistry (Figure 2). This analysis allows for a rapid assessment of biochemical relationships between the products of these assembly lines, in order to identify new variants of known molecules or to find novel polyketide scaffolds.

Figure 2. — Visualization of *trans*-AT PKS assembly-line alignments. The top 15 most closely related assembly lines are visualized together with the query sequence (which represents the identified BGC currently in view). When clicking on a domain, its location (amino acid coordinates) within the parent protein are displayed and clicking on the gray connecting edges will trigger a display of the sequence identity between homologous domains based on a MAFFT multiple sequence alignment.

TTA codon annotation

Streptomyces and related genera are important producers of clinically used antibiotics, such as tetracyclines or erythromycin, or drugs to treat parasitic worms such as avermectin. These bacteria have GC-contents of >70% and thus a skew toward higher GC triplets in their codon usage. While genes involved in primary metabolism almost exclusively use CTC codons to code for Leu, key genes in secondary metabolism and cell differentiation often contain TTA codons. As the expression of the TTA-codon specific Leu-tRNA-gene bldA is tightly controlled and the Leu-tRNA only accumulates in later stages of growth, this offers an additional level of regulation (26–28). The expression of the BGCs therefore does not only require activation at the transcriptional level, but also the presence of the TTA-specific Leu-tRNA. This must be considered, for example, for heterologous BGC expression in other streptomycete hosts or metabolic engineering approaches. Therefore, a new feature was included in antiSMASH version 4 to automatically scan all BGCs for the presence of TTA codons and annotate these in the graphical cluster overview and the GenBank/EMBL result files.

Usability and efficiency improvements

antiSMASH comes with an updated, larger ClusterBlast database for comparative gene cluster analysis. In order to keep the runtime of the ClusterBlast analysis at acceptable levels with the much larger database, antiSMASH now uses the BLAST-compatible DIAMOND algorithm (29) to calculate results for ClusterBlast (against all ±220,000 BGCs currently detected in NCBI GenBank) and KnownClusterBlast (against experimentally characterized BGCs from MIBiG (14). ClusterBlast results are now cross-referenced to the antiSMASH database (13), whenever present there, through hyperlinks on the matched clusters; this allows researchers to quickly get a more complete view of these BGCs. Also, for each gene in a predicted gene cluster, an individual BLAST search is now automatically run against all proteins encoded in BGCs deposited in MIBiG (14); this helps researchers to predict functions of individual genes based on similarity of their encoded amino acid sequence to those of experimentally characterized proteins, even when the rest of the surrounding gene clusters are not similar.

In order to simplify selecting the correct input settings, separate submission pages were created for fungal sequences (http://fungismash.secondarymetabolites.org/) and plant sequences (http://plantismash.secondarymetabolites.org/). The main antiSMASH website is now focused on bacterial and archaeal sequences. The metabolic modeling functionality along with an EC number prediction option that were introduced in antiSMASH version 3 were removed again, as they led to extremely long run times and high server load. An updated version with improved reaction rules for secondary metabolite biosynthetic pathways will be released as a separate, but still closely linked program.

In addition to GenBank- and EMBL-formatted files, gene annotations can now also be added to FASTA sequences by also uploading a GFF3-formatted file. To assist job submission and retrieval from third-party tools running upstream or downstream analyses such as the CRISPR single guide RNA finding tool CRISPy-web (30) or the Antibiotics Resistance Target Seeker service (31), the antiSMASH web component now supports a REST-like (32) web API.

CONCLUSIONS AND FUTURE PERSPECTIVES

With the new features now introduced (Table 1), the antiSMASH framework continues to improve through the concerted action of researchers in the natural products community. A number of additional features are still in development, including application of the visual assembly line alignments to NRPSs, detailed gene cluster boundary prediction through phylogenetic profiling and detection of putative resistance genes inside BGCs.

Table 1. Overview of analyzes integrated into antiSMASH.

Open in a new tab

With regard to chemistry prediction of the products of NRPSs and PKSs, we have opted to be conservative for the moment. The recently introduced PRISM pipeline (11) does a great job of automatically predicting a wide range of possible products of each BGC, which facilitates automated matching to large-scale metabolomic data. However, the majority of antiSMASH users still rely on manual comparison of BGCs with smaller-scale experimental data; we feel that this approach benefits more from reliable predictions of substructures and substrate specificities (and refraining from making lower-confidence combinatorial predictions). In this respect, PRISM and antiSMASH offer complementary functionalities and the user can opt to use either pipeline based on the intended research purposes.

We continue to strive for interoperability with other services. For example, antiSMASH predictions are also available through the Joint Genome Institute's IMG-ABC (33) as well as Genoscope's framework MicroScope (34); connections to EFI-EST (35) and other tools are being investigated. Also, we remain committed to collaborating with other researchers worldwide and invite expert feedback as well as technical contributions from the community to improve this important piece of software.

AVAILABILITY

antiSMASH is available from http://antismash.secondarymetabolites.org/. This website is free and open to all users and there is no login requirement. Source code is available from https://bitbucket.org/antismash/antismash/.

Supplementary Material

Supplementary Data

Click here for additional data file.^{(1,018.1KB, zip)}

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Novo Nordisk Foundation (to S.Y.L., T.W.); The Netherlands Organization for Scientific Research (NWO) VENI Grant [863.15.002 to M.H.M.]; Graduate School for Experimental Plant Sciences (EPS) (to M.H.M.); Ministry of Science, ICT and Future Planning through the National Research Foundation (NRF) of Korea [NRF-2012M1A2A2026556 to H.U.K., S.Y.L.]; International Leibniz Research School for Microbial and Molecular Interactions (ILRS), as part of the excellence graduate school Jena School for Microbial Communication (JSMC), supported by the Deutsche Forschungsgemeinschaft (DFG) [to T. Wo.]; Collaborative Research Centre ChemBioSys (CRC 1127 ChemBioSys), by the DFG (to E.S.); NIH National Research Service Award [T32 GM008505 to M.G.C.]; David and Lucile Packard Fellowship for Science and Engineering (to D.A.M.); Department of Chemistry at the University of Illinois at Urbana–Champaign Fellowship (to C.J.S.); NIH Chemical Biology Interface Training Program Fellowship [T32 GM070421 to C.J.S.]; Google Summer of Code grant (to M.N.); Warwick Integrative Synthetic Biology Centre (WISB), and Manchester Synthetic Biology Research Centre (SYNBIOCHEM) funded under the UK Research Councils' ‘Synthetic Biology for Growth’ programme [BB/M017982/1 (WISB), BB/M017702/1 (SYNBIOCHEM)]. Funding for open access charge: Netherlands Organization for Scientific Research (NWO).

Conflict of interest statement. None declared.

REFERENCES

1. Ziemert N., Alanjary M., Weber T.. The evolution of genome mining in microbes - a review. Nat. Prod. Rep. 2016; 33:988–1005. [DOI] [PubMed] [Google Scholar]
2. Weber T. In silico tools for the analysis of antibiotic biosynthetic pathways. Int. J. Med. Microbiol. 2014; 304:230–235. [DOI] [PubMed] [Google Scholar]
3. Weber T., Kim H.U.. The secondary metabolite bioinformatics portal: computational tools to facilitate synthetic biology of secondary metabolite production. Synth. Syst. Biotechnol. 2016; 1:69–79. [DOI] [PMC free article] [PubMed] [Google Scholar]
4. Medema M.H., Fischbach M.A.. Computational approaches to natural product discovery. Nat. Chem. Biol. 2015; 11:639–648. [DOI] [PMC free article] [PubMed] [Google Scholar]
5. Li M.H.T., Ung P.M.U., Zajkowski J., Garneau-Tsodikova S., Sherman D.H.. Automated genome mining for natural products. BMC Bioinformatics. 2009; 10:185. [DOI] [PMC free article] [PubMed] [Google Scholar]
6. Medema M.H., Blin K., Cimermancic P., de Jager V., Zakrzewski P., Fischbach M.A., Weber T., Takano E., Breitling R.. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 2011; 39:W339–W346. [DOI] [PMC free article] [PubMed] [Google Scholar]
7. Blin K., Medema M.H., Kazempour D., Fischbach M.A., Breitling R., Takano E., Weber T.. antiSMASH 2.0–a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res. 2013; 41:W204–W212. [DOI] [PMC free article] [PubMed] [Google Scholar]
8. Weber T., Blin K., Duddela S., Krug D., Kim H.U., Bruccoleri R., Lee S.Y., Fischbach M.A., Müller R., Wohlleben W. et al. antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. 2015; 43:W237–W243. [DOI] [PMC free article] [PubMed] [Google Scholar]
9. Ziemert N., Podell S., Penn K., Badger J.H., Allen E., Jensen P.R.. The natural product domain seeker NaPDoS: a phylogeny based bioinformatic tool to classify secondary metabolite gene diversity. PLoS One. 2012; 7:e34064. [DOI] [PMC free article] [PubMed] [Google Scholar]
10. Johnston C.W., Skinnider M.A., Wyatt M.A., Li X., Ranieri M.R.M., Yang L., Zechel D.L., Ma B., Magarvey N.A.. An automated Genomes-to-Natural Products platform (GNP) for the discovery of modular natural products. Nat. Commun. 2015; 6:8421. [DOI] [PMC free article] [PubMed] [Google Scholar]
11. Skinnider M.A., Dejong C.A., Rees P.N., Johnston C.W., Li H., Webster A.L.H., Wyatt M.A., Magarvey N.A.. Genomes to natural products prediction informatics for secondary metabolomes (PRISM). Nucleic Acids Res. 2015; 43:9645–9662. [DOI] [PMC free article] [PubMed] [Google Scholar]
12. Kautsar S.A., Suarez Duran H.G., Blin K., Osbourn A., Medema M.H.. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Res. 2016; doi:10.1093/nar/gkx305. [DOI] [PMC free article] [PubMed] [Google Scholar]
13. Blin K., Medema M.H., Kottmann R., Lee S.Y., Weber T.. The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters. Nucleic Acids Res. 2017; 45:D555–D559. [DOI] [PMC free article] [PubMed] [Google Scholar]
14. Medema M.H., Kottmann R., Yilmaz P., Cummings M., Biggins J.B., Blin K., de Bruijn I., Chooi Y.H., Claesen J., Coates R.C. et al. Minimum information about a biosynthetic gene cluster. Nat. Chem. Biol. 2015; 11:625–631. [DOI] [PMC free article] [PubMed] [Google Scholar]
15. Wolf T., Shelest V., Nath N., Shelest E.. CASSIS and SMIPS: promoter-based prediction of secondary metabolite gene clusters in eukaryotic genomes. Bioinformatics. 2016; 32:1138–1143. [DOI] [PMC free article] [PubMed] [Google Scholar]
16. Cimermancic P., Medema M.H., Claesen J., Kurita K., Wieland Brown L.C., Mavrommatis K., Pati A., Godfrey P.A., Koehrsen M., Clardy J. et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell. 2014; 158:412–421. [DOI] [PMC free article] [PubMed] [Google Scholar]
17. Finn R.D., Coggill P., Eberhardt R.Y., Eddy S.R., Mistry J., Mitchell A.L., Potter S.C., Punta M., Qureshi M., Sangrador-Vegas A. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016; 44:D279–D285. [DOI] [PMC free article] [PubMed] [Google Scholar]
18. Röttig M., Medema M.H., Blin K., Weber T., Rausch C., Kohlbacher O.. NRPSpredictor2–a web server for predicting NRPS adenylation domain specificity. Nucleic Acids Res. 2011; 39:W362–W367. [DOI] [PMC free article] [PubMed] [Google Scholar]
19. Prieto C., García-Estrada C., Lorenzana D., Martín J.F.. NRPSsp: non-ribosomal peptide synthase substrate predictor. Bioinformatics. 2012; 28:426–427. [DOI] [PubMed] [Google Scholar]
20. Khayatt B.I., Overmars L., Siezen R.J., Francke C.. Classification of the adenylation and acyl-transferase activity of NRPS and PKS systems using ensembles of substrate specific hidden Markov models. PLoS One. 2013; 8:e62136. [DOI] [PMC free article] [PubMed] [Google Scholar]
21. Baranašić D., Zucko J., Diminic J., Gacesa R., Long P.F., Cullum J., Hranueli D., Starcevic A.. Predicting substrate specificity of adenylation domains of nonribosomal peptide synthetases and other protein properties by latent semantic indexing. J. Ind. Microbiol. Biotechnol. 2014; 41:461–467. [DOI] [PubMed] [Google Scholar]
22. Minowa Y., Araki M., Kanehisa M.. Comprehensive analysis of distinctive polyketide and nonribosomal peptide structural motifs encoded in microbial genomes. J. Mol. Biol. 2007; 368:1500–1517. [DOI] [PubMed] [Google Scholar]
23. Dickschat J.S. Bacterial terpene cyclases. Nat. Prod. Rep. 2016; 33:87–110. [DOI] [PubMed] [Google Scholar]
24. Tietz J.I., Schwalen C.J., Patel P.S., Maxson T., Blair P.M., Tai H.C., Zakai U.I., Mitchell D.A.. A new genome-mining tool redefines the lasso peptide biosynthetic landscape. Nat. Chem. Biol. 2017; 13:470–478. [DOI] [PMC free article] [PubMed] [Google Scholar]
25. Nguyen D.D., Melnik A.V., Koyama N., Lu X., Schorn M., Fang J., Aguinaldo K., Lincecum T.L. Jr, Ghequire M.G.K., Carrion V.J. et al. Indexing the Pseudomonas specialized metabolome enabled the discovery of poaeamide B and the bananamides. Nat. Microbiol. 2016; 2:16197. [DOI] [PMC free article] [PubMed] [Google Scholar]
26. Hackl S., Bechthold A.. The Gene bldA, a regulator of morphological differentiation and antibiotic production in Streptomyces. Arch. Pharm. 2015; 348:455–462. [DOI] [PubMed] [Google Scholar]
27. Leskiw B.K., Bibb M.J., Chater K.F.. The use of a rare codon specifically during development. Mol. Microbiol. 1991; 5:2861–2867. [DOI] [PubMed] [Google Scholar]
28. Leskiw B.K., Lawlor E.J., Fernandez-Abalos J.M., Chater K.F.. TTA codons in some genes prevent their expression in a class of developmental, antibiotic-negative, Streptomyces mutants. Proc. Natl. Acad. Sci. U.S.A. 1991; 88:2461–2465. [DOI] [PMC free article] [PubMed] [Google Scholar]
29. Buchfink B., Xie C., Huson D.H.. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 2015; 12:59–60. [DOI] [PubMed] [Google Scholar]
30. Blin K., Pedersen L.E., Weber T., Lee S.Y.. CRISPy-web: an online resource to design sgRNAs for CRISPR applications. Synth. Syst. Biotechnol. 2016; 1:118–121. [DOI] [PMC free article] [PubMed] [Google Scholar]
31. Alanjary M., Kronmiller B., Adamek M., Blin K., Weber T., Huson D.H., Philmus B., Ziemert N.. The Antibiotic Resistant Target Seeker (ARTS), an exploration engine for antibiotic cluster prioritization and novel drug target discovery. Nucleic Acids Res. 2017; doi:10.1093/nar/gkx360. [DOI] [PMC free article] [PubMed] [Google Scholar]
32. Fielding R.T., Taylor R.N.. Principled design of the modern web architecture. ACM Trans. Internet Technol. 2002; 2:115–150. [Google Scholar]
33. Hadjithomas M., Chen I.M.A., Chu K., Ratner A., Palaniappan K., Szeto E., Huang J., Reddy T.B.K., Cimermančič P., Fischbach M.A. et al. IMG-ABC: a knowledge base to fuel discovery of biosynthetic gene clusters and novel secondary metabolites. Mbio. 2015; 6:e00932. [DOI] [PMC free article] [PubMed] [Google Scholar]
34. Vallenet D., Calteau A., Cruveiller S., Gachet M., Lajus A., Josso A., Mercier J., Renaux A., Rollin J., Rouy Z. et al. MicroScope in 2017: an expanding and evolving integrated resource for community expertise of microbial genomes. Nucleic Acids Res. 2017; 45:D517–D528. [DOI] [PMC free article] [PubMed] [Google Scholar]
35. Gerlt J.A., Bouvier J.T., Davidson D.B., Imker H.J., Sadkhin B., Slater D.R., Whalen K.L.. Enzyme function initiative-enzyme similarity tool (EFI-EST): a web tool for generating protein sequence similarity networks. Biochim. Biophys. Acta. 2015; 1854:1019–1037. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Data

Click here for additional data file.^{(1,018.1KB, zip)}

[B1] 1. Ziemert N., Alanjary M., Weber T.. The evolution of genome mining in microbes - a review. Nat. Prod. Rep. 2016; 33:988–1005. [DOI] [PubMed] [Google Scholar]

[B2] 2. Weber T. In silico tools for the analysis of antibiotic biosynthetic pathways. Int. J. Med. Microbiol. 2014; 304:230–235. [DOI] [PubMed] [Google Scholar]

[B3] 3. Weber T., Kim H.U.. The secondary metabolite bioinformatics portal: computational tools to facilitate synthetic biology of secondary metabolite production. Synth. Syst. Biotechnol. 2016; 1:69–79. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B4] 4. Medema M.H., Fischbach M.A.. Computational approaches to natural product discovery. Nat. Chem. Biol. 2015; 11:639–648. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B5] 5. Li M.H.T., Ung P.M.U., Zajkowski J., Garneau-Tsodikova S., Sherman D.H.. Automated genome mining for natural products. BMC Bioinformatics. 2009; 10:185. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6] 6. Medema M.H., Blin K., Cimermancic P., de Jager V., Zakrzewski P., Fischbach M.A., Weber T., Takano E., Breitling R.. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 2011; 39:W339–W346. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B7] 7. Blin K., Medema M.H., Kazempour D., Fischbach M.A., Breitling R., Takano E., Weber T.. antiSMASH 2.0–a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res. 2013; 41:W204–W212. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B8] 8. Weber T., Blin K., Duddela S., Krug D., Kim H.U., Bruccoleri R., Lee S.Y., Fischbach M.A., Müller R., Wohlleben W. et al. antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. 2015; 43:W237–W243. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B9] 9. Ziemert N., Podell S., Penn K., Badger J.H., Allen E., Jensen P.R.. The natural product domain seeker NaPDoS: a phylogeny based bioinformatic tool to classify secondary metabolite gene diversity. PLoS One. 2012; 7:e34064. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10] 10. Johnston C.W., Skinnider M.A., Wyatt M.A., Li X., Ranieri M.R.M., Yang L., Zechel D.L., Ma B., Magarvey N.A.. An automated Genomes-to-Natural Products platform (GNP) for the discovery of modular natural products. Nat. Commun. 2015; 6:8421. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B11] 11. Skinnider M.A., Dejong C.A., Rees P.N., Johnston C.W., Li H., Webster A.L.H., Wyatt M.A., Magarvey N.A.. Genomes to natural products prediction informatics for secondary metabolomes (PRISM). Nucleic Acids Res. 2015; 43:9645–9662. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B12] 12. Kautsar S.A., Suarez Duran H.G., Blin K., Osbourn A., Medema M.H.. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Res. 2016; doi:10.1093/nar/gkx305. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B13] 13. Blin K., Medema M.H., Kottmann R., Lee S.Y., Weber T.. The antiSMASH database, a comprehensive database of microbial secondary metabolite biosynthetic gene clusters. Nucleic Acids Res. 2017; 45:D555–D559. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B14] 14. Medema M.H., Kottmann R., Yilmaz P., Cummings M., Biggins J.B., Blin K., de Bruijn I., Chooi Y.H., Claesen J., Coates R.C. et al. Minimum information about a biosynthetic gene cluster. Nat. Chem. Biol. 2015; 11:625–631. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B15] 15. Wolf T., Shelest V., Nath N., Shelest E.. CASSIS and SMIPS: promoter-based prediction of secondary metabolite gene clusters in eukaryotic genomes. Bioinformatics. 2016; 32:1138–1143. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16] 16. Cimermancic P., Medema M.H., Claesen J., Kurita K., Wieland Brown L.C., Mavrommatis K., Pati A., Godfrey P.A., Koehrsen M., Clardy J. et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell. 2014; 158:412–421. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B17] 17. Finn R.D., Coggill P., Eberhardt R.Y., Eddy S.R., Mistry J., Mitchell A.L., Potter S.C., Punta M., Qureshi M., Sangrador-Vegas A. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016; 44:D279–D285. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B18] 18. Röttig M., Medema M.H., Blin K., Weber T., Rausch C., Kohlbacher O.. NRPSpredictor2–a web server for predicting NRPS adenylation domain specificity. Nucleic Acids Res. 2011; 39:W362–W367. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] 19. Prieto C., García-Estrada C., Lorenzana D., Martín J.F.. NRPSsp: non-ribosomal peptide synthase substrate predictor. Bioinformatics. 2012; 28:426–427. [DOI] [PubMed] [Google Scholar]

[B20] 20. Khayatt B.I., Overmars L., Siezen R.J., Francke C.. Classification of the adenylation and acyl-transferase activity of NRPS and PKS systems using ensembles of substrate specific hidden Markov models. PLoS One. 2013; 8:e62136. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B21] 21. Baranašić D., Zucko J., Diminic J., Gacesa R., Long P.F., Cullum J., Hranueli D., Starcevic A.. Predicting substrate specificity of adenylation domains of nonribosomal peptide synthetases and other protein properties by latent semantic indexing. J. Ind. Microbiol. Biotechnol. 2014; 41:461–467. [DOI] [PubMed] [Google Scholar]

[B22] 22. Minowa Y., Araki M., Kanehisa M.. Comprehensive analysis of distinctive polyketide and nonribosomal peptide structural motifs encoded in microbial genomes. J. Mol. Biol. 2007; 368:1500–1517. [DOI] [PubMed] [Google Scholar]

[B23] 23. Dickschat J.S. Bacterial terpene cyclases. Nat. Prod. Rep. 2016; 33:87–110. [DOI] [PubMed] [Google Scholar]

[B24] 24. Tietz J.I., Schwalen C.J., Patel P.S., Maxson T., Blair P.M., Tai H.C., Zakai U.I., Mitchell D.A.. A new genome-mining tool redefines the lasso peptide biosynthetic landscape. Nat. Chem. Biol. 2017; 13:470–478. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B25] 25. Nguyen D.D., Melnik A.V., Koyama N., Lu X., Schorn M., Fang J., Aguinaldo K., Lincecum T.L. Jr, Ghequire M.G.K., Carrion V.J. et al. Indexing the Pseudomonas specialized metabolome enabled the discovery of poaeamide B and the bananamides. Nat. Microbiol. 2016; 2:16197. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B26] 26. Hackl S., Bechthold A.. The Gene bldA, a regulator of morphological differentiation and antibiotic production in Streptomyces. Arch. Pharm. 2015; 348:455–462. [DOI] [PubMed] [Google Scholar]

[B27] 27. Leskiw B.K., Bibb M.J., Chater K.F.. The use of a rare codon specifically during development. Mol. Microbiol. 1991; 5:2861–2867. [DOI] [PubMed] [Google Scholar]

[B28] 28. Leskiw B.K., Lawlor E.J., Fernandez-Abalos J.M., Chater K.F.. TTA codons in some genes prevent their expression in a class of developmental, antibiotic-negative, Streptomyces mutants. Proc. Natl. Acad. Sci. U.S.A. 1991; 88:2461–2465. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B29] 29. Buchfink B., Xie C., Huson D.H.. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 2015; 12:59–60. [DOI] [PubMed] [Google Scholar]

[B30] 30. Blin K., Pedersen L.E., Weber T., Lee S.Y.. CRISPy-web: an online resource to design sgRNAs for CRISPR applications. Synth. Syst. Biotechnol. 2016; 1:118–121. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B31] 31. Alanjary M., Kronmiller B., Adamek M., Blin K., Weber T., Huson D.H., Philmus B., Ziemert N.. The Antibiotic Resistant Target Seeker (ARTS), an exploration engine for antibiotic cluster prioritization and novel drug target discovery. Nucleic Acids Res. 2017; doi:10.1093/nar/gkx360. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B32] 32. Fielding R.T., Taylor R.N.. Principled design of the modern web architecture. ACM Trans. Internet Technol. 2002; 2:115–150. [Google Scholar]

[B33] 33. Hadjithomas M., Chen I.M.A., Chu K., Ratner A., Palaniappan K., Szeto E., Huang J., Reddy T.B.K., Cimermančič P., Fischbach M.A. et al. IMG-ABC: a knowledge base to fuel discovery of biosynthetic gene clusters and novel secondary metabolites. Mbio. 2015; 6:e00932. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B34] 34. Vallenet D., Calteau A., Cruveiller S., Gachet M., Lajus A., Josso A., Mercier J., Renaux A., Rollin J., Rouy Z. et al. MicroScope in 2017: an expanding and evolving integrated resource for community expertise of microbial genomes. Nucleic Acids Res. 2017; 45:D517–D528. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B35] 35. Gerlt J.A., Bouvier J.T., Davidson D.B., Imker H.J., Sadkhin B., Slater D.R., Whalen K.L.. Enzyme function initiative-enzyme similarity tool (EFI-EST): a web tool for generating protein sequence similarity networks. Biochim. Biophys. Acta. 2015; 1854:1019–1037. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

antiSMASH 4.0—improvements in chemistry prediction and gene cluster boundary identification

Kai Blin

Thomas Wolf

Marc G Chevrette

Xiaowen Lu

Christopher J Schwalen

Satria A Kautsar

Hernando G Suarez Duran

Emmanuel L C de los Santos

Hyun Uk Kim

Mariana Nave

Jeroen S Dickschat

Douglas A Mitchell

Ekaterina Shelest

Rainer Breitling

Eriko Takano

Sang Yup Lee

Tilmann Weber

Marnix H Medema

Abstract

INTRODUCTION

NEW FEATURES AND UPDATES

Improved prediction of gene cluster boundaries

Figure 1.

New algorithms for non-ribosomal peptide and terpene chemistry prediction

Improved RiPP BGC identification and structure prediction

Trans-AT PKS domain alignments

Figure 2.

TTA codon annotation

Usability and efficiency improvements

CONCLUSIONS AND FUTURE PERSPECTIVES

Table 1. Overview of analyzes integrated into antiSMASH.

AVAILABILITY

Supplementary Material

SUPPLEMENTARY DATA

FUNDING

REFERENCES

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases