Nucleic Acids Research. 2015 Oct 13;44(Database issue):D1094–D1097. doi: 10.1093/nar/gkv1051

CAMPR3: a database on sequences, structures and signatures of antimicrobial peptides

Faiza Hanif Waghu 1, Ram Shankar Barai 1, Pratima Gurung 1, Susan Idicula-Thomas 1,*
PMCID: PMC4702787  PMID: 26467475

Abstract

Antimicrobial peptides (AMPs) are known to have family-specific sequence composition, which can be mined for discovery and design of AMPs. Here, we present CAMPR3, an update to the existing CAMP database, available online at www.camp3.bicnirrh.res.in. It is a database of sequences, structures and family-specific signatures of prokaryotic and eukaryotic AMPs. Family-specific sequence signatures comprising patterns and Hidden Markov Models were generated for 45 AMP families by analysing 1386 experimentally studied AMPs. These were further used to retrieve AMPs from online sequence databases. More than 4000 AMPs could be identified using these signatures. AMP family signatures provided in CAMPR3 can thus be used to accelerate and expand the discovery of AMPs. CAMPR3 presently holds 10247 sequences, 757 structures and 114 family-specific signatures of AMPs. Users can make use of the sequence optimization algorithm for rational design of AMPs. The database, integrated with tools for AMP sequence and structure analysis, will be a valuable resource for family-based studies on AMPs.

INTRODUCTION

Antimicrobial peptides (AMPs) are host defense molecules produced by a wide range of organisms including bacteria or protozoa as well as animals, where they are produced by the innate immune system (1). AMPs kill microbes via various mechanisms, such as destruction of the microbial membrane and inhibition of macromolecule synthesis (2–4). Due to these multiple mechanisms of action, it is more difficult for microbes to gain resistance against AMPs than against conventional antibiotics. A few naturally occurring AMPs have also been observed to regulate physiological processes such as inflammation, angiogenesis and wound healing, in addition to their antimicrobial activity (5,6).

Developments in sequencing technology have accelerated the availability of genomic and proteomic data of various organisms in public sequence repositories. Annotating AMPs in these large data sets using wet-lab methods is cost- and resource-intensive. AMPs belong to various AMP families. These families exhibit distinctive sequence composition, such as cysteine conservation in defensins (7), abundance of histidines in histatins (8), and conservation of unusual amino acids such as aminoisobutyric acid in peptaibols (9) and lanthionine in bacteriocins (lantibiotics) (10). This family-specific sequence conservation can be exploited to identify AMPs from a large pool of sequence data. Family-based signatures such as patterns and Hidden Markov Models (HMMs) can be powerful tools to retrieve and annotate sequences available in sequence databases.

Sequence signatures (patterns and HMMs) present in 1386 experimentally studied AMPs representing 45 families were generated and used to fetch AMPs from sequence databases. These data have been collated and are presented as an update to the CAMP database. CAMPR3 currently holds 10247 sequences, 757 structures and 114 signatures present in 45 AMP families.

MATERIALS AND METHODS

Data collection and organization

Sequences, structures and family information of AMPs

To update the existing CAMP database (11), protein data available in NCBI (12), UniProtKB (13) and PDB (14) databases post 2013 was queried using appropriate keywords such as ‘antimicrobial’, ‘antibacterial’, ‘antifungal’, ‘antiviral’ and ‘antiparasitic’. The obtained hits were manually curated to extract information on sequence, structure, protein definition, accession numbers, reference literature, activity, taxonomy of the source organism, target organisms with minimum inhibitory concentration (MIC) values, hemolytic activity of the peptide and protein family descriptions. This information is made available in CAMPR3. Links to UniProtKB, PDB, PubMed (12) and other databases dedicated to AMPs are also made available for the benefit of the users.

Signatures of AMPs

Experimentally validated AMPs whose family information is available in CAMP (11) were used to generate family-based signatures. Families containing at least two members were considered for signature creation. In total, 1386 sequences representing 45 AMP families were used to generate patterns and HMMs. The PRATT tool (15) was used to generate patterns. Multiple sequence alignments of each AMP family were created using Clustal Omega (16,17) and used as input to build HMMs with the ‘hmmbuild’ program of the HMMER 3.1b1 package (18). A heuristically determined fitness value of 26 or above was used as the threshold for selecting patterns for sequence retrieval. Since length is an important parameter for sequence alignment, length-based patterns and HMMs were also created. The generated patterns and HMMs were queried against the protein databases of NCBI and UniProtKB using the ScanProsite tool (19) and the jackhmmer tool of the HMMER web server (20), respectively, to retrieve hits. HMM searches were run until convergence or stopped after three iterations. Sequences retrieved using HMMs with an e-value below 0.005 were considered for further screening. The retrieved hits were curated based on their AMP definitions. For each retrieved AMP, information related to sequence, protein definition, accession numbers, activity, source organism, target organisms, protein family descriptions and links to databases like UniProtKB and PubMed along with the generated signatures are provided in CAMPR3.
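The two retrieval thresholds described above (pattern fitness of 26 or above, HMM hit e-value below 0.005) and the idea of scanning sequences with a PROSITE-style pattern can be illustrated with a minimal Python sketch. This is not ScanProsite, PRATT or jackhmmer: the pattern-to-regex translation covers only basic PROSITE syntax, and the function names, record fields, patterns, sequences and scores below are all hypothetical and made up for illustration.

```python
import re

# Thresholds quoted in the text above.
FITNESS_THRESHOLD = 26.0   # patterns with fitness >= 26 are kept
EVALUE_THRESHOLD = 0.005   # HMM hits with e-value < 0.005 are kept


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Handles only basic syntax: '-' separators, 'x' wildcards, [..] sets,
    {..} exclusions, (n) or (n,m) repeats and '<' / '>' terminus anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")    # x(2,3) -> x{2,3}
        element = element.replace("x", ".")                      # x -> any residue
        element = element.replace("<", "^").replace(">", "$")    # terminus anchors
        parts.append(element)
    return "".join(parts)


def select_patterns(patterns):
    """Keep patterns whose fitness meets the threshold."""
    return [p for p in patterns if p["fitness"] >= FITNESS_THRESHOLD]


def select_hits(hits):
    """Keep HMM hits whose e-value is below the threshold."""
    return [h for h in hits if h["evalue"] < EVALUE_THRESHOLD]


def scan(pattern, sequences):
    """Return the names of sequences that contain a match to the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return [name for name, seq in sequences.items() if regex.search(seq)]


# Hypothetical toy data, not taken from CAMPR3.
toy_patterns = [
    {"pattern": "C-x(2,3)-[ST]-{P}", "fitness": 31.2},
    {"pattern": "G-x-G", "fitness": 12.0},
]
toy_sequences = {"toy1": "AACGGSAKK", "toy2": "AACGGSPKK", "toy3": "MKLLV"}
toy_hits = [{"id": "toy1", "evalue": 1e-6}, {"id": "toy2", "evalue": 0.02}]

kept = select_patterns(toy_patterns)
matches = scan(kept[0]["pattern"], toy_sequences)
passing = [h["id"] for h in select_hits(toy_hits)]
print(kept, matches, passing)
```

In this toy run only the first pattern passes the fitness filter, and only the first toy sequence matches it; the second is rejected because the excluded residue follows the serine.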

Protein sequences whose definitions suggested antimicrobial activity and that had at least one supporting literature reference in PubMed demonstrating this activity by wet-lab methods were included in the Experimentally Validated data set. In addition, 590 sequences were retrieved from APD2 (21). These sequences are integrated in the Experimentally Validated or Predicted data set based on the annotation provided by APD2.

AMPs that have annotations indicating their antimicrobial activity but do not have supporting PubMed reference literature were included in the Predicted data set. These sequences are predicted to be antimicrobial either based on their GO (22)/Pfam (23)/InterPro (24)/UniProtKB/NCBI annotations or they were retrieved based on the AMP family signatures.
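The assignment rule described in the two paragraphs above can be summarised as a small decision function. The sketch below is only an illustration of that rule as stated in the text; the function name and its arguments are hypothetical, the actual curation was manual, and the separate handling of APD2-derived sequences is not modelled.

```python
def assign_dataset(definition_suggests_amp, wetlab_pubmed_refs,
                   has_db_annotation=False, signature_hit=False):
    """Return the data set a sequence would fall into, or None.

    definition_suggests_amp: the protein definition suggests antimicrobial activity
    wetlab_pubmed_refs: number of PubMed references with wet-lab evidence
    has_db_annotation: GO/Pfam/InterPro/UniProtKB/NCBI annotation indicates activity
    signature_hit: the sequence was retrieved by an AMP family signature
    """
    if definition_suggests_amp and wetlab_pubmed_refs >= 1:
        return "Experimentally Validated"
    if definition_suggests_amp or has_db_annotation or signature_hit:
        return "Predicted"
    return None


# Hypothetical examples.
print(assign_dataset(True, 2))                       # wet-lab support available
print(assign_dataset(True, 0))                       # annotated, no wet-lab reference
print(assign_dataset(False, 0, signature_hit=True))  # found only via a signature
print(assign_dataset(False, 0))                      # no evidence of AMP activity
```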

Algorithm for rational design of AMPs

An in-house Perl script was created to generate all possible single-residue substitutions of one or more user-defined sequences. These sequences are then run through the prediction models (Support Vector Machines (SVMs), Random Forests (RF) and Discriminant analysis (DA)) that were developed for, and are available in, the previous release of the CAMP database (11).
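The enumeration step can be sketched as follows. The original script is written in Perl and is not reproduced here; this is a minimal Python illustration that assumes substitutions are drawn from the 20 standard amino acids, and the function name and example peptide are hypothetical. Scoring the variants with the SVM, RF and DA models is outside the scope of the sketch.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def single_substitutions(sequence):
    """Yield (position, original, replacement, variant) for every
    single-residue substitution of the input sequence.

    Positions are 1-based. A sequence of length L made of standard
    residues yields 19 * L variants.
    """
    sequence = sequence.upper()
    for i, original in enumerate(sequence):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                variant = sequence[:i] + replacement + sequence[i + 1:]
                yield i + 1, original, replacement, variant


# Hypothetical four-residue peptide used only to show the output size.
variants = list(single_substitutions("GIGK"))
print(len(variants))   # 19 substitutions at each of 4 positions
print(variants[0])     # first variant: position 1, G replaced by A
```

Each variant would then be passed to the prediction models, so that the predicted activity of every substitution can be compared with that of the parent sequence.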

Database architecture

The database is built using MySQL Server 5.1.33 as back-end and the front-end is built using PHP, HTML, JavaScript, Open Flash Chart 2 and Perl. The database is hosted on Apache web server 2.2.11. Statistical software R version 2.9.1 (25) was used for development of the prediction server. JSmol viewer (http://wiki.jmol.org/index.php/JSmol) has been integrated for AMP structure visualization.

A brief description of the user interface of CAMPR3 is provided as follows.

Home: the home page provides information about various features of the database.

Databases: the data is divided into four databases which include sequence, structure, patents and the newly incorporated signature database.

Tools: the database includes the following tools for analysis. The AMP prediction tool has been developed in-house. Access to various tools that are relevant to sequence/structure analysis and available in the public domain has also been provided in CAMPR3 for the benefit of the users.

  1. AMP prediction: users can (i) predict AMPs, (ii) predict antimicrobial regions within peptides and (iii) rationally design AMPs by generating an exhaustive combinatorial library of sequences for a user-defined sequence and predicting the effect of single-residue substitutions on antimicrobial activity using SVMs, RF and DA.

  2. BLAST: users can use BLAST tool (26) to query protein sequence/s against various data sets of CAMPR3 which include the entire database, sequence, structure, patent, experimentally validated, predicted and predicted based on signature data sets to find homologous sequences, structures and other relevant information.

  3. Clustal Omega: users can use Clustal Omega tool of EMBL-EBI to obtain multiple sequence alignment of peptides.

  4. Vector Alignment Search Tool: users can identify similar protein structures and distant homologs that cannot be identified by sequence comparison using VAST of NCBI (27).

  5. PRATT: users can generate AMP family-specific patterns using this tool from ExPASy.

  6. ScanProsite: using this tool from Swiss Institute of Bioinformatics, users can (i) scan proteins against the PROSITE collection of PSSMs/patterns; (ii) scan patterns against protein sequence, structure or user defined database/s and (iii) scan user defined patterns against a set of protein sequences.

  7. PHI-BLAST: users can use PHI-BLAST (28) to find AMPs similar to the query based on a family-specific pattern.

  8. jackhmmer: users can iteratively search a protein sequence/structure database using a set of protein sequences/multiple sequence alignment/HMM as an input to find homologs using this tool from EMBL-EBI.

Search: basic and advanced search options are available for search of AMP families/sequences/structures and signatures.

Links: links to other online AMP databases are provided.

Statistics: information on CAMPR3 statistics can be viewed.

Help: detailed description and use of the various features and tools incorporated in the database is provided for the benefit of the users.

RESULTS AND DISCUSSION

CAMPR3 provides comprehensive information on AMPs and their families as represented by their sequences, structures, activity, signatures, source and target organisms. The unique feature of CAMPR3 compared to other AMP databases is that information on family-specific signatures is provided for a large set of both eukaryotic and prokaryotic AMPs. It presently contains 114 AMP family-specific sequence signatures (36 patterns and 78 HMMs). Using these signatures, a total of 4222 AMPs were identified, of which 2739 were absent in the previous CAMP database.

Signatures are particularly useful for retrieving sequences that would otherwise have to be queried specifically by their definitions. For example, AMPs such as thionin-2.1 (UniProt ID: Q42596) and varv peptide A/kalata-B1 (UniProt ID: Q5USN7) could not be retrieved from the UniProtKB database using search keywords such as ‘antimicrobial’ but could be retrieved using their family signatures.

CAMPR3 currently holds 10247 AMP sequences, of which 4857 are experimentally validated, and 5390 are predicted. Of these, 3491 have been recently identified. The structure database has also been updated to include 757 antimicrobial structures.
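The counts reported in this section can be cross-checked with simple arithmetic. The short Python sketch below uses only figures quoted in the text; the variable names are illustrative.

```python
# Sequence data sets
experimentally_validated = 4857
predicted = 5390
total_sequences = 10247

# Family-specific signatures
patterns = 36
hmms = 78
total_signatures = 114

# AMPs identified using the signatures
identified_by_signatures = 4222
absent_in_previous_camp = 2739

# The two data sets add up to the reported total.
assert experimentally_validated + predicted == total_sequences

# Patterns and HMMs add up to the reported number of signatures.
assert patterns + hmms == total_signatures

# Signature-retrieved AMPs that were already in the previous CAMP release.
already_in_previous_camp = identified_by_signatures - absent_in_previous_camp
print(already_in_previous_camp)
```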

Sequence composition is an important determinant of antimicrobial activity. Antimicrobial assays of AMPs and their analogues have demonstrated that minor variations in a peptide sequence can drastically alter its antimicrobial activity (29). The prediction algorithm for AMPs available in CAMPR3 now includes an additional feature for rational design of AMPs. This feature can be used to predict the effect of single-residue substitutions on antimicrobial activity.

The features incorporated in CAMPR3 will significantly promote AMP family-based studies. AMPs belonging to a particular AMP family can be effortlessly obtained using the family-based search. This feature, along with the family signatures and tools available in CAMPR3 for sequence and structure analysis, will allow users to study the various AMP families independently and effectively.

CONCLUSION

The database is available for retrieval of sequences/structures/patents/signatures and families of AMPs. Comparison of CAMPR3 with the existing databases dedicated to AMPs is presented in Table 1. AMPs that are not easily retrievable using simple keyword search have been identified/retrieved from public sequence databases using family signatures.

Table 1. Comparison of CAMPR3 with a few of the existing AMP databases.

Database | Sequences | Structures | Signatures | Nature of data | Reference
CAMPR3 | 10247 | 757 | 114 (36 Patterns and 78 HMMs) | General | -
APD2 | 2604 | 350 | Absent | General | (21)
AMPer | 1298 | Absent | 186 HMMs | Eukaryotic AMPs | (30)
LAMP | 5548 | Present(a) | Absent | General | (31)
BACTIBASE | 228 | 72 | Present(a) | Bacteriocins | (32)
YADAMP | 2525 | Present(a) | Absent | General | (33)
PhytAMP | 273 | 39 | Present(a) | Plant AMPs | (34)
Peptaibiotics database | 1344 | Absent | Absent | Peptaibols | (35)
Defensins Knowledgebase | 566 | Present(a) | Absent | Defensins | (36)

(a) Difficult to retrieve total count.

The highlights of this updated database are as follows.

  1. Massive update on AMP sequences and structures (10247 AMP sequences and 757 AMP structures).

  2. Family-specific signatures of eukaryotic and prokaryotic AMPs.

  3. Sequence optimisation prediction algorithm for antimicrobial activity.

CAMPR3 has been developed with an objective to expand and accelerate research on AMPs.

Acknowledgments

The authors are grateful to Dr Smita D. Mahale (PI of Biomedical Informatics Centre) for all the assistance and support. They also acknowledge the help provided by Ms Shaini Joseph and Mr Lijin Gopi in data collection and design of the web interface, respectively.

FUNDING

This work [RA/296/09-2015] was supported by grants from Department of Science and Technology, Government of India [SB/S3/CE/028/2013] and Indian Council of Medical Research. The open access publication charge for this paper has been waived by Oxford University Press - NAR.

Conflict of interest statement. None declared.

REFERENCES

  1. Cruz J., Ortiz C., Guzmán F., Fernández-Lafuente R., Torres R. Antimicrobial peptides: promising compounds against pathogenic microorganisms. Curr. Med. Chem. 2014;21:2299–2321. doi: 10.2174/0929867321666140217110155.
  2. Haney E.F., Petersen A.P., Lau C.K., Jing W., Storey D.G., Vogel H.J. Mechanism of action of puroindoline derived tryptophan-rich antimicrobial peptides. Biochim. Biophys. Acta. 2013;1828:1802–1813. doi: 10.1016/j.bbamem.2013.03.023.
  3. Roy R.N., Lomakin I.B., Gagnon M.G., Steitz T.A. The mechanism of inhibition of protein synthesis by the proline-rich peptide oncocin. Nat. Struct. Mol. Biol. 2015;22:466–469. doi: 10.1038/nsmb.3031.
  4. Wang S., Thacker P.A., Watford M., Qiao S. Functions of Antimicrobial Peptides in Gut Homeostasis. Curr. Protein Pept. Sci. 2015;16:582–591. doi: 10.2174/1389203716666150630135847.
  5. Frasca L., Lande R. Role of defensins and cathelicidin LL37 in auto-immune and auto-inflammatory diseases. Curr. Pharm. Biotechnol. 2012;13:1882–1897. doi: 10.2174/138920112802273155.
  6. Guilhelmelli F., Vilela N., Albuquerque P., Derengowski L.da S., Silva-Pereira I., Kyaw C.M. Antibiotic development challenges: the various mechanisms of action of antimicrobial peptides and of bacterial resistance. Front. Microbiol. 2013;4:1–12. doi: 10.3389/fmicb.2013.00353.
  7. Ganz T. Defensins: antimicrobial peptides of innate immunity. Nat. Rev. Immunol. 2003;3:710–720. doi: 10.1038/nri1180.
  8. van Dijk I.A., Nazmi K., Bolscher J.G., Veerman E.C., Stap J. Histatin-1, a histidine-rich peptide in human saliva, promotes cell-substrate and cell-cell adhesion. FASEB J. 2015;29:3124–3132. doi: 10.1096/fj.14-266825.
  9. Duclohier H. Antimicrobial peptides and peptaibols, substitutes for conventional antibiotics. Curr. Pharm. Des. 2010;16:3212–3223. doi: 10.2174/138161210793292500.
  10. Lohans C.T., Vederas J.C. Structural characterization of thioether-bridged bacteriocins. J. Antibiot. (Tokyo). 2014;67:23–30. doi: 10.1038/ja.2013.81.
  11. Waghu F.H., Gopi L., Barai R.S., Ramteke P., Nizami B., Idicula-Thomas S. CAMP: collection of sequences and structures of antimicrobial peptides. Nucleic Acids Res. 2014;42:D1154–D1158. doi: 10.1093/nar/gkt1157.
  12. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2015;43:D6–D17. doi: 10.1093/nar/gku1130.
  13. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43:D204–D212. doi: 10.1093/nar/gku989.
  14. Rose P.W., Prlić A., Bi C., Bluhm W.F., Christie C.H., Dutta S., Green R.K., Goodsell D.S., Westbrook J.D., Woo J., et al. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res. 2015;43:D345–D356. doi: 10.1093/nar/gku1214.
  15. Jonassen I., Collins J.F., Higgins D.G. Finding flexible patterns in unaligned protein sequences. Protein Sci. 1995;4:1587–1595. doi: 10.1002/pro.5560040817.
  16. Sievers F., Wilm A., Dineen D., Gibson T.J., Karplus K., Li W., Lopez R., McWilliam H., Remmert M., Söding J., Thompson J.D., Higgins D.G. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 2011;7:1–6. doi: 10.1038/msb.2011.75.
  17. McWilliam H., Li W., Uludag M., Squizzato S., Park Y.M., Buso N., Cowley A.P., Lopez R. Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res. 2013;41:W597–W600. doi: 10.1093/nar/gkt376.
  18. Eddy S.R. Profile hidden Markov models. Bioinformatics. 1998;14:755–763. doi: 10.1093/bioinformatics/14.9.755.
  19. de Castro E., Sigrist C.J., Gattiker A., Bulliard V., Langendijk-Genevaux P.S., Gasteiger E., Bairoch A., Hulo N. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006;34:W362–W365. doi: 10.1093/nar/gkl124.
  20. Finn R.D., Clements J., Eddy S.R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39:W29–W37. doi: 10.1093/nar/gkr367.
  21. Wang G., Li X., Wang Z. APD2: the updated antimicrobial peptide database and its application in peptide design. Nucleic Acids Res. 2009;37:D933–D937. doi: 10.1093/nar/gkn823.
  22. Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 2015;43:D1049–D1056. doi: 10.1093/nar/gku1179.
  23. Finn R.D., Bateman A., Clements J., Coggill P., Eberhardt R.Y., Eddy S.R., Heger A., Hetherington K., Holm L., Mistry J., et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–D230. doi: 10.1093/nar/gkt1223.
  24. Mitchell A., Chang H.Y., Daugherty L., Fraser M., Hunter S., Lopez R., McAnulla C., McMenamin C., Nuka G., Pesseat S., et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 2015;43:D213–D221. doi: 10.1093/nar/gku1243.
  25. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2009.
  26. Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
  27. Gibrat J.F., Madej T., Bryant S.H. Surprising similarities in structure comparison. Curr. Opin. Struct. Biol. 1996;6:377–385. doi: 10.1016/s0959-440x(96)80058-3.
  28. Zhang Z., Schäffer A.A., Miller W., Madden T.L., Lipman D.J., Koonin E.V., Altschul S.F. Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res. 1998;26:3986–3990. doi: 10.1093/nar/26.17.3986.
  29. Vila-Perelló M., Sánchez-Vallet A., García-Olmedo F., Molina A., Andreu D. Synthetic and structural studies on Pyrularia pubera thionin: a single-residue mutation enhances activity against Gram-negative bacteria. FEBS Lett. 2003;536:215–219. doi: 10.1016/s0014-5793(03)00053-x.
  30. Fjell C.D., Hancock R.E., Cherkasov A. AMPer: a database and an automated discovery tool for antimicrobial peptides. Bioinformatics. 2007;23:1148–1155. doi: 10.1093/bioinformatics/btm068.
  31. Zhao X., Wu H., Lu H., Li G., Huang Q. LAMP: A Database Linking Antimicrobial Peptides. PLoS One. 2013;8:e66557. doi: 10.1371/journal.pone.0066557.
  32. Hammami R., Zouhir A., Le Lay C., Ben Hamida J., Fliss I. BACTIBASE second release: a database and tool platform for bacteriocin characterization. BMC Microbiol. 2010;10:1–5. doi: 10.1186/1471-2180-10-22.
  33. Piotto S.P., Sessa L., Concilio S., Iannelli P. YADAMP: yet another database of antimicrobial peptides. Int. J. Antimicrob. Agents. 2012;39:346–351. doi: 10.1016/j.ijantimicag.2011.12.003.
  34. Hammami R., Ben Hamida J., Vergoten G., Fliss I. PhytAMP: a database dedicated to antimicrobial plant peptides. Nucleic Acids Res. 2009;37:D963–D968. doi: 10.1093/nar/gkn655.
  35. Neumann N.K., Stoppacher N., Zeilinger S., Degenkolb T., Brückner H., Schuhmacher R. The peptaibiotics database–a comprehensive online resource. Chem. Biodivers. 2015;12:743–751. doi: 10.1002/cbdv.201400393.
  36. Seebah S., Suresh A., Zhuo S., Choong Y.H., Chua H., Chuon D., Beuerman R., Verma C. Defensins knowledgebase: a manually curated database and information source focused on the defensins family of antimicrobial peptides. Nucleic Acids Res. 2007;35:D265–D268. doi: 10.1093/nar/gkl866.

