Nucleic Acids Research. 2013 May 15;41(Web Server issue):W448–W453. doi: 10.1093/nar/gkt391

BAGEL3: automated identification of genes encoding bacteriocins and (non-)bactericidal posttranslationally modified peptides

Auke J van Heel 1,2, Anne de Jong 1,*, Manuel Montalbán-López 1, Jan Kok 1, Oscar P Kuipers 1,2,*
PMCID: PMC3692055  PMID: 23677608

Abstract

Identifying genes encoding bacteriocins and ribosomally synthesized and posttranslationally modified peptides (RiPPs) can be a challenging task. Especially those peptides that do not have strong homology to previously identified peptides can easily be overlooked. Extensive use of BAGEL2 and user feedback has led us to develop BAGEL3. BAGEL3 features genome mining of prokaryotes, which is largely independent of open reading frame (ORF) predictions and has been extended to cover more (novel) classes of posttranslationally modified peptides. BAGEL3 uses an identification approach that combines direct mining for the gene and indirect mining via context genes. Especially for heavily modified peptides like lanthipeptides, sactipeptides, glycocins and others, this genetic context harbors valuable information that is used for mining purposes. The bacteriocin and context protein databases have been updated and it is now easy for users to submit novel bacteriocins or RiPPs. The output has been simplified to allow user-friendly analysis of the results, in particular for large (meta-genomic) datasets. The genetic context of identified candidate genes is fully annotated. As input, BAGEL3 uses FASTA DNA sequences or folders containing multiple FASTA formatted files. BAGEL3 is freely accessible at http://bagel.molgenrug.nl.

INTRODUCTION

Scientific interest in bacterial antimicrobial peptides and other posttranslationally modified peptides is increasing (1,2). Finding new antibiotic compounds from novel sources to fight multi-drug resistant pathogens has become the focus of many researchers. Furthermore, knowledge about the diverse enzymes involved in posttranslational modifications is rapidly advancing (3–5) and can be used to make new-to-nature antimicrobial peptides (6,7) or to stabilize medically relevant peptides (8). The known world of ribosomally synthesized and posttranslationally modified peptides (RiPPs) is constantly expanding. More and more modifications and the enzymes involved are being described (1). With the discovery of each new class, new genome mining efforts are triggered. These efforts have led to valuable information and several high-impact publications (4,9–12). The main challenge in these kinds of genome mining efforts is the small size of the genes encoding the peptides of interest. Small open reading frames (ORFs) are often omitted during automated annotation, especially when their product sequences do not show strong homology with those of already described peptides, which hampers a direct mining approach. Therefore, the large modification enzymes have been used regularly in indirect genome mining efforts. With the design and development of the BActeriocin GEnome mining tooL (BAGEL) since 2005, we aim to facilitate these efforts (13,14). Other useful tools have also been developed, such as the data repository Bactibase (15) and the prediction tool antiSMASH (16), which also supports non-ribosomal peptides but lacks some of the classes supported by the faster BAGEL3. In the current version, BAGEL3, our main goals were to combine direct and indirect mining, to generate simpler, clearer and better-quality output, to make the analysis more independent of ORF predictions and to facilitate the addition of novel classes of peptides that can be mined for.

IMPLEMENTATION

New in BAGEL3

The major improvement in BAGEL3 is the new dual process (Figure 1), i.e. combining two mining strategies in one procedure. Another major advantage of BAGEL3 is its use of DNA sequences as input instead of annotated genomes, making it less dependent on ORF predictions. Furthermore, novel classes of RiPPs have been implemented, extending the genome mining capabilities of BAGEL3 beyond bacteriocins only. For this purpose, new hidden Markov models (HMMs) have been added describing specific genes involved in the biosynthesis of cyanobactins (called CyaG after PatG) (17), sactipeptides (called SacCD after TrnCD) (18) and linaridins (called LinL after CypL) (10).

Figure 1.

Schematic overview of the BAGEL3 genome mining procedure. BAGEL3 uses two different approaches in parallel to find bacteriocins and modified peptides. Both approaches use nucleotide sequences in FASTA format as input. The left (red) box shows the context-based approach. The right (blue) box shows the simpler precursor peptide-based mining. Finally, both methods generate a single summary table with links to detailed graphical reports.

BAGEL3 databases

BAGEL3 uses three different databases containing modified or unmodified bacteriocins and other (non-bactericidal) posttranslationally modified peptides. The databases have been thoroughly updated. Each database contains all the records belonging to one of the three classes of proteins internal to BAGEL3: Class I contains posttranslationally modified peptides <10 kDa, the modification enzymes of which are encoded in the genomic context of the modified peptide and have been described for more than one case; Class II contains posttranslationally modified peptides <10 kDa not fitting the criteria of the first database; Class III contains antimicrobial proteins >10 kDa. This division is based on the procedure used by BAGEL3 to identify these proteins. The databases can be viewed online (http://bagel.molgenrug.nl/index.php/bacteriocin-database/) and have web links to literature, UniProtKB and NCBI. Users are actively encouraged to add new records to these databases via a web form (http://bagel.molgenrug.nl/index.php/submit-a-bacteriocin).
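The three-way division can be written down as a small decision function. The following Python sketch is only an illustration of the criteria as stated above; the record fields (`mass_kda`, `context_enzymes_known`) and the handling of a mass of exactly 10 kDa are assumptions, not part of BAGEL3.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical database record; field names are illustrative only."""
    name: str
    mass_kda: float
    # True when the modification enzymes are encoded in the genomic context
    # and have been described for more than one case.
    context_enzymes_known: bool


def assign_database(rec: Record) -> str:
    """Return the database class suggested by the stated criteria."""
    # Assumption: the text does not say where exactly 10 kDa belongs;
    # here it is grouped with the small peptides.
    if rec.mass_kda > 10:
        return "Class III"
    if rec.context_enzymes_known:
        return "Class I"
    return "Class II"
```

For example, `assign_database(Record("example", 3.4, True))` yields `"Class I"` under these assumptions.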

Description of the software

BAGEL3 uses DNA nucleotide sequences in FASTA format as input; multiple sequence entries per file are allowed. These DNA sequences are analyzed in parallel using two different approaches, one based on finding genes commonly found in the context of bacteriocin or RiPP genes, the other based on finding the gene itself.
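As an illustration of the input format, a multi-entry FASTA file can be read into a dictionary of sequences with a few lines of Python. This is a minimal sketch and not the parser used by BAGEL3; the function name `read_fasta` and the choice to key records by the first word of the header are assumptions.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text (an iterable of lines) into {identifier: sequence}."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            # Assumption: use the first word of the header as identifier.
            header = parts[0] if parts else "entry_%d" % (len(records) + 1)
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Because the function accepts any iterable of lines, it can be called on an open file handle, for example `with open("genome.fasta") as fh: contigs = read_fasta(fh)` (the file name is hypothetical).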

The indirect approach (left red box in Figure 1) starts with a simple ORF call on the DNA. This call looks for ORFs of a certain minimal length that have a start and a stop codon, without taking into account the presence of a possible ribosome-binding site. The products of these ORFs are subsequently screened for the presence of protein domains. Simple and defined rules based on these protein domains are then used to decide which part of the nucleotide sequence should be analyzed in more detail. These DNA sequences are called areas of interest (AOIs). The size of an AOI is set to 20 kb, centered on the identified context gene. The ORFs in the AOI are then called using Glimmer (19). The next essential step is an additional specialized simple ORF call for every AOI to find the small ORFs that encode the targets of the identified modification enzymes. This ORF call takes into account the rule that was used to identify the AOI, so that when BAGEL3 is, for example, looking for a lanthipeptide, it will only call small ORFs encoding cysteine-containing peptides. Next, the context is annotated using the Pfam database (a BLAST search against the UniProt database is also possible in the stand-alone version). The last step is to identify the RiPP gene(s) present in the AOI. This is done using the results of both a BLAST search against the BAGEL3 Class I and Class II databases and a screening for known motifs. If no direct hit is obtained, BAGEL3 predicts a precursor peptide sequence based on sequence properties and genomic organization.
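The first and third steps of this procedure can be sketched in Python. The code below is an illustration only, not the BAGEL3 implementation: the set of start codons, the default minimal length, the coordinate convention (0-based, end-exclusive, reported on the forward strand) and the clipping of the window at contig ends are all assumptions.

```python
STARTS = {"ATG", "GTG", "TTG"}   # assumed start codons
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def simple_orfs(seq, min_codons=20):
    """Yield (start, end, strand) for ORFs with a start and a stop codon.

    No ribosome-binding site is considered. In each frame the first start
    codon after the previous stop is used, which gives the longest ORF.
    """
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif start is not None and codon in STOPS:
                    end = i + 3
                    # Length in codons, not counting the stop codon.
                    if (end - start) // 3 - 1 >= min_codons:
                        if strand == "+":
                            yield (start, end, "+")
                        else:
                            yield (n - end, n - start, "-")
                    start = None


def area_of_interest(gene_start, gene_end, seq_len, size=20000):
    """Return a window of `size` bp centered on the context gene.

    Assumption: the window is centered on the gene midpoint and clipped
    to the ends of the contig.
    """
    mid = (gene_start + gene_end) // 2
    return max(0, mid - size // 2), min(seq_len, mid + size // 2)
```

A context gene at positions 50 000 to 53 000 of a 100 000 bp contig would give the window (41 500, 61 500) with these settings; on a short contig the window simply covers the whole sequence.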

The direct approach (right blue box in Figure 1) starts with a Glimmer ORF call. Next, the ORFs are searched with BLAST against all three databases. The context (20 kb) of BLAST hits is annotated using the Pfam database (a BLAST search against the UniProt database is also possible in the stand-alone version).

Because the same peptide could be identified with both approaches, the results of both are compared and filtered to exclude duplicates. Peptides identified via context genes are classified using this information (Table 1). Unmodified peptides identified via homology are classified according to their best BLAST hit. Finally, an HTML output with graphics is generated from the large basic results table (see Figure 2). The whole process is recorded in a log file. The nucleotide sequences of the identified AOIs can be downloaded.
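The comparison of the two result sets can be pictured as a merge keyed on the location of each candidate gene. The sketch below makes several assumptions that the text does not specify: candidates are plain dictionaries, two hits are considered duplicates when contig, coordinates and strand are identical, and the context-based classification is kept for a peptide found by both approaches.

```python
def merge_candidates(context_hits, homology_hits):
    """Combine hits from both approaches, keeping one entry per gene."""
    def key(hit):
        return (hit["contig"], hit["start"], hit["end"], hit["strand"])

    merged = {}
    for hit in homology_hits:
        merged[key(hit)] = dict(hit, found_by=["homology"])
    for hit in context_hits:
        k = key(hit)
        previous = merged[k]["found_by"] if k in merged else []
        # Assumption: the context-based record replaces the homology-based one.
        merged[k] = dict(hit, found_by=previous + ["context"])
    return list(merged.values())
```

Each merged record carries a `found_by` list, so a summary table could still show which approach or approaches detected a given peptide.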

Table 1.

Currently supported classes of RiPPs and the rules used to identify potential clusters

Name                       Rule
Bottromycin                (PF04055) AND (PF02624)
Cyanobactin                (CyaG)
Glycocin                   (TIGR04195) AND (PF03412)
Lanthipeptide class II     (PF05147) AND (PF13575)
Lanthipeptide class I      (PF04737|PF04738|PF14028) AND (PF05147)
Lanthipeptide class III    (lanKC)
Lanthipeptide class IV     (PF05147) AND (LanL)
LAPs                       (PF02624) AND (PF00881)
Lasso peptide              (PF13471) AND (PF00733)
Linaridin                  (LinL)
Microcin                   (PF02794)
Sactipeptides              (SacCD) AND (PF04055)
Thiopeptide                (PF02624) AND (PF00881) AND (PF14028)

| = or; AND = additional requirement. The rules in this table describe the criteria that have to be matched by a certain stretch of DNA to become an AOI. Some rules might overlap, and therefore they are checked in an ordered fashion. In this way, the more stringent rule is checked after the less stringent.
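The rules of Table 1 lend themselves to a simple data representation: each rule is a list of groups, every group must be satisfied (AND), and a group is satisfied by any one of its alternatives (|). The Python sketch below is an illustration under two assumptions that are not stated in the source: the rules are checked in the order in which they appear in Table 1, and a later match replaces an earlier one, so that the more stringent of two overlapping rules (for example thiopeptide versus LAPs) wins.

```python
# Each rule: (class name, list of groups). All groups must match (AND);
# a group matches if any of its identifiers is present (|).
RULES = [
    ("Bottromycin", [{"PF04055"}, {"PF02624"}]),
    ("Cyanobactin", [{"CyaG"}]),
    ("Glycocin", [{"TIGR04195"}, {"PF03412"}]),
    ("Lanthipeptide class II", [{"PF05147"}, {"PF13575"}]),
    ("Lanthipeptide class I", [{"PF04737", "PF04738", "PF14028"}, {"PF05147"}]),
    ("Lanthipeptide class III", [{"lanKC"}]),
    ("Lanthipeptide class IV", [{"PF05147"}, {"LanL"}]),
    ("LAPs", [{"PF02624"}, {"PF00881"}]),
    ("Lasso peptide", [{"PF13471"}, {"PF00733"}]),
    ("Linaridin", [{"LinL"}]),
    ("Microcin", [{"PF02794"}]),
    ("Sactipeptides", [{"SacCD"}, {"PF04055"}]),
    ("Thiopeptide", [{"PF02624"}, {"PF00881"}, {"PF14028"}]),
]


def classify(domains):
    """Return the name of the last matching rule, or None if none matches.

    `domains` is the set of domain identifiers found in a stretch of DNA;
    version suffixes such as the '.1' in 'PF13471.1' are ignored.
    """
    found = {d.split(".")[0] for d in domains}
    result = None
    for name, groups in RULES:
        if all(found & group for group in groups):
            result = name  # assumption: later (more stringent) rules override
    return result
```

With these assumptions, a region containing PF02624 and PF00881 is labeled as LAPs, while the additional presence of PF14028 changes the label to thiopeptide.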

Figure 2.

Example detailed report of a lantibiotic cluster encoding a nisin variant and its modification enzymes found in Streptococcus suis J14 (NC_017618.1) using BAGEL3. The target gene (smallORF_6) was in this case identified by the specialized small ORF calling procedure.

Availability

The BAGEL3 web server can be accessed and used freely for files up to 20 MB (http://bagel.molgenrug.nl). A stand-alone Linux version is available on request for local installation. The stand-alone version can be easily adapted to personal preferences using a comprehensive configuration file.

System requirements

BAGEL3 runs on an Ubuntu Linux platform (http://www.ubuntu.com) with Apache web server (version 2.2), MySQL server (version 14.14), PHP 5.4 (http://www.php.net/), Perl 5.10 (http://www.perl.org/), BioPerl 1.6.9 (http://www.bioperl.org/) and Joomla. Furthermore, the following software packages are used: BLAST 2.2.27 (20); HMMsearch 3.0 (http://hmmer.janelia.org/); Glimmer 3.02 (http://www.cbcb.umd.edu/software/glimmer/) (19); Pfamscan of the Sanger Institute (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/Tools/); and the UniRef50 database (http://www.ebi.ac.uk/uniref/).

Validation of BAGEL3

The BAGEL3 software was validated using a set of 50 genomes known to encode bacteriocins and other modified peptides. It was checked that no known compounds were missed. Next, to validate if novel clusters could be identified, 200 draft genome sequences from the NCBI server were screened. Both these sets of genomes were also used to check for false positives. A false positive was defined as a cluster that did not have at least a likely core peptide or a gene context that can be associated with RiPP biosynthesis.

RESULTS AND DISCUSSION

Analysis of example genomes

Based on the newly added HMM CyaG, which identifies the serine protease, generally termed G protein, in the cyanobactin biosynthesis pathway (21), we found a new cyanobactin encoded in the genome of the cyanobacterium Lyngbya sp PCC 8106 (see Table 2). In Enterococcus faecalis Fly1, BAGEL3 identified an interesting novel lantibiotic gene cluster. The cluster could code for two lanthipeptides, which is common for two-component lantibiotics, but in this case they are not modified by a LanM-type enzyme but by a single set of LanBC enzymes. Another example of the added value of BAGEL3 is demonstrated when querying the plasmid pTEF2 of E. faecalis V538. Based on the context, BAGEL3 identifies a glycocin-like peptide (Table 2). Glycocins are glycosylated antimicrobial peptides of which Glycocin S and Sublancin 168 are the only two characterized members that also contain disulfide bridges (1). The identified peptide has an exact match with the previously described bacteriocin Enterocin 96 (22). In the article describing Enterocin 96, the authors note that the measured mass is higher than the theoretical peptide mass. This mass difference is in line with the glycosylation predicted by BAGEL3, which has also been suggested by others (23). In the genome of the honey bee pathogen Paenibacillus larvae subsp larvae BRL 230010, BAGEL3 identified the gene for a so-called sactipeptide that shows low homology to sporulation killing factor SkfA of Bacillus subtilis 168. In the genome of the Gram-negative pathogen Burkholderia pseudomallei 354a, a lasso peptide was identified with strong homology to capistruin (24).

Table 2.

A selection of novel RiPPs identified by BAGEL3

DNA screened; Homology (P-value); Identification; Sequence
P. larvae subsp larvae BRL 230010 Ctg01135; Sporulation-killingfactor_skfA [1e-10]; Context: SacCD; Sactipeptide: MSNHNVRNEPAPAWESSAQNNLSKPAGIPLIKSVGCAACWGAKNISLTRACLPPTPINLAL
pTEF2 E. faecalis V538; Enterocin_96 [2e-41] (exact match); Context: TIGR04195 PF03412; Glycocin: MLNKKLLENGVVNAVTIDELDAQFGGMSKRDCNLMKACCAGQAVTYAIHSLLNRLGGDSSDPAGCNDIVRKYCK
E. faecalis Fly1 cont1.76; leader_abc mature_ab PF02052.7 leaderLanBC; Context: PF04737.5 PF04738.5 PF05147.5; Lantibiotic: MPKYDDFDLNLKQTSASNQKDTRVTSVMACTPGTCNNKCPNTNWLCSNVCVTKTCWTCA
E. faecalis Fly1 cont1.76; leader_abc PF02052.7 leaderLanBC; Context: PF04737.5 PF04738.5 PF05147.5; Lantibiotic: MPKYDDFDLNLKQNVSSSNKEPRITSIKWCTPGTCNNTCKGDSTLKSNCCGGSLMCSLGGC
Lyngbya sp PCC 8106; Trunkamide [1e-06]; Context: CyaG; Cyanobactin: MPCYPSYDGVDASVCMPCYPSYDGVDASVCMPCYPSYDDAE
B. pseudomallei 354a Contig0218; Capistruin [5e-23]; Context: PF13471.1 PF00733.16; Lasso peptide: MVRFLAKLLRSTIHGSHGVSLDAVSSTHGTPGFQTPDARVISRFGFN

Simple small ORF calling procedure facilitates genome mining

A major problem when screening large amounts of genomic data using BAGEL2 was its dependence on the quality of the annotation. Multiple different ORF prediction procedures were therefore implemented. Consequently, several results had to be compared, while some of the small ORFs encoding antimicrobial peptides were still missing, creating the need for manual evaluation of some of the identified gene clusters. To remove this dependency, we now use a simple small ORF calling procedure for the AOIs identified by their context, obviating the need for reannotation and simplifying the procedure and the analyses.

BAGEL3 is extensible

The novel mining approach used in BAGEL3 has the big advantage that it can easily be extended to include new classes of modified peptides, the only requirement being that the gene of the small peptide of interest lies in a genomic context that can be recognized. The genomic context has to be translated into a simple rule that describes a certain AOI (for examples see Table 1). The described precursor peptides should then be added to the database. If requirements for the precursor peptides are known (for example, must contain a cysteine), these can be added. The context rule should be tested to check if it is specific enough. Adding a new class of peptides to the system can be done within a few hours if an extensive literature review is available. Users are encouraged to submit novel rules/classes via the online form available on the BAGEL3 web site.
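To make the idea of an extensible class definition concrete, a class can be thought of as a name, a context rule and an optional requirement on the precursor peptide. The Python sketch below is hypothetical and does not reflect the BAGEL3 configuration format; the `PeptideClass` structure and the helper names are assumptions. It uses the standard genetic code, so a GTG or TTG start codon would be translated as V or L rather than M.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf_dna: str) -> str:
    """Translate an ORF; unknown codons become 'X', trailing stop is dropped."""
    codons = (orf_dna[i:i + 3] for i in range(0, len(orf_dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons).rstrip("*")


@dataclass
class PeptideClass:
    """Hypothetical description of one class of modified peptides."""
    name: str
    context_rule: List[Set[str]]                 # AND of OR-groups, as in Table 1
    precursor_ok: Callable[[str], bool] = lambda peptide: True


def candidate_precursors(peptide_class, small_orfs):
    """Translate small ORFs and keep those meeting the class requirement."""
    peptides = (translate(orf) for orf in small_orfs)
    return [p for p in peptides if peptide_class.precursor_ok(p)]


# Example using a rule from Table 1 plus the cysteine requirement.
lanthipeptide_ii = PeptideClass(
    name="Lanthipeptide class II",
    context_rule=[{"PF05147"}, {"PF13575"}],
    precursor_ok=lambda peptide: "C" in peptide,
)
```

Calling `candidate_precursors(lanthipeptide_ii, ["ATGTGTAAATAA", "ATGAAAGGTTAA"])` keeps only the first ORF, because its product (MCK) contains a cysteine while the second (MKG) does not.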

CONCLUSIONS

BAGEL3 is a versatile, fast genome-mining tool valid not only for modified and non-modified bacteriocins but also for non-bactericidal RiPPs. It can handle large data sets like those from metagenome projects. This updated version looks for bacteriocins/RiPPs via two different approaches, which increases the success rate and lowers the need for manual evaluation of results. The new design also allows for easy inclusion of novel classes of peptides, and users are therefore encouraged to propose the addition of novel classes.

ACKNOWLEDGEMENTS

The authors thank Martijn Herber for maintaining the Linux servers and Marnix Medema for his useful suggestion of skipping ORF prediction altogether. Additionally, the authors would like to thank the anonymous reviewers for the suggested features and rules that have been implemented.

FUNDING

NWO-STW programme GenBiotics [project no. 10466 to A.G.v.H.]; ESF-ALW grant within the EuroSynBio programme SynMod (to M.M.-L.). Funding for open access charge: The Molecular Genetics group of the University of Groningen (GBB).

Conflict of interest statement. None declared.

REFERENCES

1. Arnison PG, Bibb MJ, Bierbaum G, Bowers AA, Bugni TS, Bulaj G, Camarero JA, Campopiano DJ, Challis GL, Clardy J, et al. Ribosomally synthesized and post-translationally modified peptide natural products: overview and recommendations for a universal nomenclature. Nat. Prod. Rep. 2013;30:108–160. doi: 10.1039/c2np20085f.
2. Cotter PD, Ross RP, Hill C. Bacteriocins—a viable alternative to antibiotics? Nat. Rev. Microbiol. 2012;11:95–105. doi: 10.1038/nrmicro2937.
3. Zhang Q, Yu Y, Vélasquez JE, van der Donk WA. Evolution of lanthipeptide synthetases. Proc. Natl Acad. Sci. USA. 2012;109:18361–18366. doi: 10.1073/pnas.1210393109.
4. Freeman MF, Gurgui C, Helf MJ, Morinaka BI, Uria AR, Oldham NJ, Sahl H, Matsunaga S, Piel J. Metagenome mining reveals polytheonamides as posttranslationally modified ribosomal peptides. Science. 2012;338:387–390. doi: 10.1126/science.1226121.
5. Oman TJ, Boettcher JM, Wang H, Okalibe XN, van der Donk WA. Sublancin is not a lantibiotic but an S-linked glycopeptide. Nat. Chem. Biol. 2011;7:78–80. doi: 10.1038/nchembio.509.
6. Huo L, Rachid S, Stadler M, Wenzel SC, Müller R. Synthetic biotechnology to study and engineer ribosomal bottromycin biosynthesis. Chem. Biol. 2012;19:1278–1287. doi: 10.1016/j.chembiol.2012.08.013.
7. van Heel AJ, Mu D, Montalbán-López M, Hendriks D, Kuipers OP. Designing and producing modified, new-to-nature, peptides with antimicrobial activity by use of a combination of various lantibiotic modification enzymes. ACS Synth. Biol. 2013. doi: 10.1021/sb3001084.
8. Kluskens LD, Nelemans SA, Rink R, de Vries L, Meter-Arkema A, Wang Y, Walther T, Kuipers A, Moll GN, Haas M. Angiotensin-(1-7) with thioether bridge: an angiotensin-converting enzyme-resistant, potent angiotensin-(1-7) analog. J. Pharmacol. Exp. Ther. 2009;328:849–854. doi: 10.1124/jpet.108.146431.
9. Maksimov MO, Pelczer I, Link AJ. Precursor-centric genome-mining approach for lasso peptide discovery. Proc. Natl Acad. Sci. USA. 2012;109:15223–15228. doi: 10.1073/pnas.1208978109.
10. Claesen J, Bibb M. Genome mining and genetic analysis of cypemycin biosynthesis reveal an unusual class of posttranslationally modified peptides. Proc. Natl Acad. Sci. USA. 2010;107:16297–16302. doi: 10.1073/pnas.1008608107.
11. Velásquez JE, Van Der Donk WA. Genome mining for ribosomally synthesized natural products. Curr. Opin. Chem. Biol. 2011;15:11–21. doi: 10.1016/j.cbpa.2010.10.027.
12. Lee SW, Mitchell DA, Markley AL, Hensler ME, Gonzalez D, Wohlrab A, Dorrestein PC, Nizet V, Dixon JE. Discovery of a widely distributed toxin biosynthetic gene cluster. Proc. Natl Acad. Sci. USA. 2008;105:5879–5884. doi: 10.1073/pnas.0801338105.
13. de Jong A, van Hijum SAFT, Bijlsma JJE, Kok J, Kuipers OP. BAGEL: a web-based bacteriocin genome mining tool. Nucleic Acids Res. 2006;34:W273. doi: 10.1093/nar/gkl237.
14. de Jong A, van Heel AJ, Kok J, Kuipers OP. BAGEL2: mining for bacteriocins in genomic data. Nucleic Acids Res. 2010;38:W647–W651. doi: 10.1093/nar/gkq365.
15. Hammami R, Zouhir A, Le Lay C, Hamida JB, Fliss I. BACTIBASE second release: a database and tool platform for bacteriocin characterization. BMC Microbiol. 2010;10:22. doi: 10.1186/1471-2180-10-22.
16. Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E, Breitling R. antiSMASH: Rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 2011;39:W339–W346. doi: 10.1093/nar/gkr466.
17. Schmidt EW, Nelson JT, Rasko DA, Sudek S, Eisen JA, Haygood MG, Ravel J. Patellamide A and C biosynthesis by a microcin-like pathway in Prochloron didemni, the cyanobacterial symbiont of Lissoclinum patella. Proc. Natl Acad. Sci. USA. 2005;102:7315–7320. doi: 10.1073/pnas.0501424102.
18. Rea MC, Sit CS, Clayton E, O'Connor PM, Whittal RM, Zheng J, Vederas JC, Ross RP, Hill C. Thuricin CD, a posttranslationally modified bacteriocin with a narrow spectrum of activity against Clostridium difficile. Proc. Natl Acad. Sci. USA. 2010;107:9352–9357. doi: 10.1073/pnas.0913554107.
19. Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23:673–679. doi: 10.1093/bioinformatics/btm009.
20. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
21. Schmidt EW, Nelson JT, Rasko DA, Sudek S, Eisen JA, Haygood MG, Ravel J. Patellamide A and C biosynthesis by a microcin-like pathway in Prochloron didemni, the cyanobacterial symbiont of Lissoclinum patella. Proc. Natl Acad. Sci. USA. 2005;102:7315–7320. doi: 10.1073/pnas.0501424102.
22. Izquierdo E, Wagner C, Marchioni E, Aoude-Werner D, Ennahar S. Enterocin 96, a novel class II bacteriocin produced by Enterococcus faecalis WHE 96, isolated from Munster cheese. Appl. Environ. Microbiol. 2009;75:4273–4276. doi: 10.1128/AEM.02772-08.
23. Stepper J, Shastri S, Loo TS, Preston JC, Novak P, Man P, Moore CH, Havlíček V, Patchett ML, Norris GE. Cysteine S-glycosylation, a new post-translational modification found in glycopeptide bacteriocins. FEBS Lett. 2011;585:645–650. doi: 10.1016/j.febslet.2011.01.023.
24. Knappe TA, Linne U, Zirah S, Rebuffat S, Xie X, Marahiel MA. Isolation and structural characterization of capistruin, a lasso peptide predicted from the genome sequence of Burkholderia thailandensis E264. J. Am. Chem. Soc. 2008;130:11446–11454. doi: 10.1021/ja802966g.
