Protein Science: A Publication of the Protein Society. 2025 Mar 18;34(4):e70083. doi: 10.1002/pro.70083

Mining the UniProtKB/Swiss‐Prot database for antimicrobial peptides

Chenkai Li 1,2, Darcy Sutherland 1,3,4, Ali Salehi 1,3, Amelia Richter 1,3, Diana Lin 1, Sambina Islam Aninta 1, Hossein Ebrahimikondori 1,2, Anat Yanai 1,3, Lauren Coombe 1, René L Warren 1, Monica Kotkoff 1, Linda M N Hoang 3,4, Caren C Helbing 5, Inanc Birol 1,3,4,6
PMCID: PMC11917140  PMID: 40100125

Abstract

The ever‐growing global health threat of antibiotic resistance is compelling researchers to explore alternatives to conventional antibiotics. Antimicrobial peptides (AMPs) are emerging as a promising solution to fill this need. Naturally occurring AMPs are produced by all forms of life as part of the innate immune system. High‐throughput bioinformatics tools have enabled fast and large‐scale discovery of AMPs from genomic, transcriptomic, and proteomic resources of selected organisms. Public protein sequence databases, comprising over 200 million records and growing, serve as comprehensive compendia of sequences from a broad range of source organisms. Yet, large‐scale in silico probing of those databases for novel AMP discovery using modern deep learning techniques has rarely been reported. In the present study, we propose an AMP mining workflow to predict novel AMPs from the UniProtKB/Swiss‐Prot database using the AMP prediction tool, AMPlify, as its discovery engine. Using this workflow, we identified 8008 novel putative AMPs from all eukaryotic sequences in the database. Focusing on the practical use of AMPs as suitable antimicrobial agents with applications in the poultry industry, we prioritized 40 of those AMPs based on their similarities to known chicken AMPs in predicted structures. In our tests, 13 out of the 38 successfully synthesized peptides showed antimicrobial activity against Escherichia coli and/or Staphylococcus aureus. AMPlify and the companion scripts supporting the AMP mining workflow presented herein are publicly available at https://github.com/bcgsc/AMPlify.

Keywords: antibiotic resistance, antimicrobial peptide, deep learning, protein sequence database

1. INTRODUCTION

As a consequence of the worldwide overuse of antibiotics, the world is under the threat of entering a “post‐antibiotic era” (Reardon, 2014). The decreasing effectiveness of conventional antibiotics is posing great challenges in the treatment of many infectious diseases (Reardon, 2014), with an estimated 1.27 million people dying due to antibiotic resistance in 2019 (Antimicrobial Resistance Collaborators, 2022). Aside from the extensive use of conventional antibiotics in clinical settings, antibiotics are also widely used in the agriculture industry (Laxminarayan et al., 2013). Certain multidrug‐resistant (MDR) bacteria can be transmitted between humans and other animals, which further exacerbates the problem (Laxminarayan et al., 2013). While the occurrence of antibiotic resistance is increasing, there has been a substantial decline in the discovery of new antimicrobial agents since the 1990s (Koo & Seo, 2019; Terreni et al., 2021). Consequently, there is an urgent need for the discovery of novel and effective substitutes for conventional antibiotics.

Antimicrobial peptides (AMPs), a family of short and often cationic peptides, are regarded as one promising substitute for conventional antibiotics (van der Does et al., 2019). Naturally occurring AMPs are produced by all life forms as part of the innate immunity (Zhang & Gallo, 2016), usually in an inactive precursor form with a signal peptide, an acidic pro‐sequence, and the bioactive mature peptide (Beckloff & Diamond, 2008). The bioactive mature AMPs are released by proteolytic cleavage of their precursors (Zhang & Gallo, 2016). While the majority of known AMPs recorded in public AMP databases have been shown to possess antibacterial activity (Wang et al., 2016), many of them have also been reported to have other types of antimicrobial activities, including antifungal (De Lucca & Walsh, 1999) and antiviral (Klotman & Chang, 2006). Most AMPs exert their effects by directly interacting with bacterial membranes or cell walls, causing non‐enzymatic disruption (Zhang & Gallo, 2016). Additionally, some eukaryotic AMPs also perform modulation of immune responses (Nguyen et al., 2011; Zhang & Gallo, 2016). In comparison with conventional small‐molecule antibiotics, which have specific functional or structural targets, the diverse modes of action of AMPs may hold an advantage in being able to better overcome bacterial resistance (Boman, 2003). Nevertheless, it is still possible to observe resistance to AMPs if bacteria are exposed to AMPs for extended periods of time (Boman, 2003), highlighting a pressing need to augment our peptide‐based therapeutics arsenal.

Traditional approaches of discovering naturally occurring AMPs through wet lab screening are time‐consuming, labor‐intensive, and costly (Wu et al., 2019). In the last decade, a series of machine learning‐based high‐throughput in silico AMP prediction methods have been developed to overcome this problem (Li et al., 2022, 2023; Meher et al., 2017; Sharma et al., 2021; Veltri et al., 2018; Waghu et al., 2014; Wang et al., 2021; Xiao et al., 2013; Yan et al., 2020; Youmans et al., 2017). AMPlify, which incorporates attention‐based deep learning (Vaswani et al., 2017; Yang et al., 2016), has demonstrated strong performance in AMP prediction (Li et al., 2022, 2023). It has also successfully identified over a hundred lab‐validated AMPs from amphibian and insect genomes and transcriptomes (Li et al., 2022; Lin et al., 2022; Richter et al., 2022).

Current AMP mining studies mostly focus on genomic (Amaral et al., 2012; Li et al., 2022; Pérez de la Lastra et al., 2018; Prichula et al., 2021), transcriptomic (Lin et al., 2022; Richter et al., 2022), or proteomic (Tomazou et al., 2019) resources, with specific organisms chosen for analysis. Few studies have leveraged large public protein sequence databases like UniProt (The UniProt Consortium, 2019) (https://www.uniprot.org), where the majority of the sequences already have extensive functional annotations. This is particularly true for approaches utilizing modern deep learning techniques. Previous attempts using UniProt relied on family‐specific signatures with conserved sequence patterns (Waghu et al., 2016), which may miss AMPs with diverse sequence compositions. Conversely, deep learning methods offer an advantage here by capturing complex sequence features (Li et al., 2022).

We note that mining AMPs using comprehensive protein sequence databases offers another valuable strategy for discovering novel AMPs. First, these large protein sequence databases allow researchers to discover a broad source of novel AMPs. Second, there are many uncharacterized sequences awaiting annotation in public databases, which represent an untapped source of AMPs. Further, it would be of interest to uncover possible antimicrobial properties of characterized proteins or peptides, where only non‐antimicrobial functions have been reported to date. Studies have found that some proteins or peptides with other biological functions exhibit antimicrobial properties (Burdukiewicz et al., 2020; Khurshid et al., 2017). Histatins, for example, a series of peptides found in saliva, belong to this category. Histatins harbor multiple functions beneficial to oral health besides antimicrobial activity, including oral hemostasis, development of acquired tooth pellicle, and assistance in the bonding of some metal ions (Khurshid et al., 2017).

Herein, we present an AMP mining workflow with AMPlify as the core AMP prediction module to predict novel AMPs from all eukaryotic sequences in the UniProtKB/Swiss‐Prot database (The UniProt Consortium, 2019), which contains only manually annotated and reviewed records from the larger UniProt database. With this workflow, we identified 8008 novel putative AMPs. Motivated by the growing global concern about the transmission of antibiotic‐resistant bacteria from other animals to humans—a direct consequence of antibiotics overuse in animal agriculture, including poultry farming (Apata, 2009)—we conducted in vitro testing on a total of 38 predicted peptides. The selection of this set of peptides was based on their similarities to known chicken AMPs in predicted three‐dimensional (3D) structures.

We conducted our tests with the synthesized peptides using lab strains of Escherichia coli ATCC 25922 and Staphylococcus aureus ATCC 29213. We observed that 13 of them demonstrated antimicrobial activity against at least one bacterial strain. Chicken AMPs have theoretically evolved to fight against pathogens in their environment (Zhang & Gallo, 2016), hence exogenous AMPs with high structural similarities to known chicken AMPs may help them cope with bacterial challenges in the farm environment.

2. RESULTS

2.1. Integration of AMPlify balanced and imbalanced models

The AMP prediction module of the AMP mining workflow utilizes two AMPlify models, balanced and imbalanced (Li et al., 2022, 2023), to obtain a curated set of putative AMP sequences. The balanced model of AMPlify exhibited superior performance on highly curated candidate sequence sets characterized by lower noise and higher confidence levels, as evidenced by its success in mining diverse genomes or transcriptomes from the bullfrog and others (Li et al., 2022; Lin et al., 2022; Richter et al., 2022). In contrast, running AMPlify using an imbalanced model has been shown to effectively handle large, highly imbalanced candidate sequence sets enriched with non‐AMPs (Li et al., 2023). Based on the performance of the two models in their respective advantageous application scenarios, we incorporated both in the reported AMP mining workflow. As shown in Figure S1, the prediction module (Steps 3 and 4) performed a two‐stage filtering by applying the imbalanced and balanced models in turn. The integration of the two models can also be considered a filtering scheme of only selecting sequences that are predicted by both models as AMPs.
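The two‐stage filtering described above can be illustrated with a minimal sketch. The `Predictor` type, the function name `two_stage_filter`, and the toy predictors are hypothetical stand‐ins introduced for illustration; they are not the AMPlify interface, and the sketch assumes each model simply returns a binary AMP/non‐AMP label.

```python
from typing import Callable, Iterable, List

# Hypothetical predictor type: takes a peptide sequence and returns True when
# the model labels it as an AMP. This is a stand-in, not the AMPlify API.
Predictor = Callable[[str], bool]


def two_stage_filter(
    candidates: Iterable[str],
    imbalanced_model: Predictor,
    balanced_model: Predictor,
) -> List[str]:
    """Keep only candidates that both models label as AMPs."""
    # Stage 1: the imbalanced model prunes the large, non-AMP-rich input.
    stage1 = [seq for seq in candidates if imbalanced_model(seq)]
    # Stage 2: the balanced model re-checks only the stage-1 survivors.
    return [seq for seq in stage1 if balanced_model(seq)]


# Toy predictors for illustration only (they carry no biological meaning).
def toy_imbalanced(seq: str) -> bool:
    return seq.count("K") >= 2


def toy_balanced(seq: str) -> bool:
    return len(seq) <= 10


peptides = ["KKLLKK", "AAAA", "KKAAAAAAAAAAAA", "GKKG"]
print(two_stage_filter(peptides, toy_imbalanced, toy_balanced))
# -> ['KKLLKK', 'GKKG']
```

Because a sequence must pass both predictors, the result is the same as intersecting the two models' positive sets; running the stages in sequence only avoids scoring sequences that the first model has already rejected.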

In our previous work, we built two test sets to evaluate the performance of the balanced and imbalanced models (Li et al., 2023). The balanced test set comprises 835 AMPs and 835 non‐AMPs, while the imbalanced test set comprises 835 AMPs and 25,689 non‐AMPs (Li et al., 2023). Tables S1 and S2 show the performance of the balanced and imbalanced models of AMPlify as well as the integration of the two (noted as “balanced + imbalanced”) on the two test sets with regard to accuracy, sensitivity, specificity, F1 score, and area under the receiver operating characteristic curve (AUROC). The integrated model outputs predicted labels instead of probabilities, hence, the AUROC values were not reported for it.

On the balanced test set, where the AMPlify balanced model performs better than the imbalanced model except for sensitivity (Li et al., 2023), the integrated model does not show much improvement when compared with using the two models independently, with only an increase in specificity of 1.20 percentage points (95.69% vs. 94.49%).

On the imbalanced test set, where the AMPlify imbalanced model performs better than the balanced model in all five metrics (Li et al., 2023), the integrated model achieves improvement in accuracy (99.31% vs. 98.94%), specificity (99.56% vs. 99.09%), and F1 score (89.25% vs. 84.87%). For highly imbalanced candidate sequence sets where non‐AMP sequences far outnumber AMP sequences, it is crucial to minimize the number of false positives to prevent an excessive number of non‐AMP sequences from being forwarded to downstream in vitro validation. The integrated model shows slightly lower sensitivity than the imbalanced model (91.50% vs. 94.37%), but its improved specificity, particularly, provides better practical performance in this scenario—reducing false positives by 121 while missing only 24 true AMPs in the imbalanced test set.
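The false‐positive and missed‐AMP counts quoted above can be recovered from the reported sensitivity and specificity together with the test set sizes (835 AMPs, 25,689 non‐AMPs). The sketch below assumes that the underlying counts can be reconstructed by rounding, since the published percentages are themselves rounded; the helper names are introduced here for illustration only.

```python
def confusion_counts(sensitivity: float, specificity: float,
                     n_pos: int, n_neg: int) -> dict:
    """Reconstruct confusion-matrix counts from rounded rates (an assumption)."""
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return {"tp": tp, "fn": n_pos - tp, "tn": tn, "fp": n_neg - tn}


def f1_score(c: dict) -> float:
    return 2 * c["tp"] / (2 * c["tp"] + c["fp"] + c["fn"])


def accuracy(c: dict) -> float:
    return (c["tp"] + c["tn"]) / (c["tp"] + c["tn"] + c["fp"] + c["fn"])


N_AMP, N_NON_AMP = 835, 25_689  # imbalanced test set sizes from the text

imbalanced = confusion_counts(0.9437, 0.9909, N_AMP, N_NON_AMP)
integrated = confusion_counts(0.9150, 0.9956, N_AMP, N_NON_AMP)

print("false positives avoided:", imbalanced["fp"] - integrated["fp"])  # 121
print("true AMPs missed:", imbalanced["tp"] - integrated["tp"])          # 24
print("F1 (integrated): %.2f%%" % (100 * f1_score(integrated)))          # 89.25%
print("F1 (imbalanced): %.2f%%" % (100 * f1_score(imbalanced)))          # 84.87%
```

Under this reconstruction, the accuracy and F1 values computed from the counts agree with the reported figures to two decimal places.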

Additionally, all AMPlify models (balanced, imbalanced, and their integration) outperform the other AMP prediction methods included for comparison on the two test sets (Tables S1 and S2).

Considering that our source for AMP mining is a large protein sequence database (The UniProt Consortium, 2019), in which most of the sequences are not expected to be AMPs, filtering for putative AMP sequences by integrating balanced and imbalanced models is a sensible way to reduce the number of false positives.

2.2. Predicted AMPs

By applying the AMP mining workflow to all eukaryotic sequences in UniProtKB/Swiss‐Prot, 10,720 distinct candidate mature peptide sequences were predicted as AMPs, of which 8008 (74.70%) were novel putative AMPs (Figure S1). All parent sequences were cleaved by the cleavage module of the AMP discovery pipeline rAMPage v1.0.1 (Lin et al., 2022), which adapts ProP v1.0c (Duckert et al., 2004), for mature peptide sequences. This module was selected due to its demonstrated success in AMP discovery from amphibian and insect transcriptomes, as reported in previous literature (Lin et al., 2022; Richter et al., 2022). Sequences without predicted signal peptides were filtered out in Step 2 of the workflow (Figure S1).

The 8008 novel putative AMPs have an average length of 52.49 amino acids (aa) (standard deviation [SD] = 26.49 aa) and an average net charge of 3.04 (SD = 3.88), while the 4538 known AMP sequences have an average length of 30.21 aa (SD = 20.28 aa) and an average net charge of 3.05 (SD = 3.10) (Figure S2). A one‐sided Welch's t‐test indicates that the larger average length of the novel putative AMPs compared with that of the known AMP sequences is statistically significant (p < 0.05). However, researchers tend to prioritize shorter sequences for validation due to their lower synthesis costs for in vitro validation (Lin et al., 2022), which could explain the difference in average lengths. The novel putative AMP sequences also exhibit a notable level of novelty, sharing a low sequence similarity level of 32.71%, on average, to the known AMP sequences (Figure 1).
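Welch's t statistic depends only on the group means, standard deviations, and sizes, so the length comparison can be approximated from the summary statistics above. The following is a sketch rather than a reproduction of the original analysis: it assumes the group sizes are 8008 and 4538, works from the rounded values, and uses a normal approximation for the one‐sided p‐value (reasonable here only because the degrees of freedom are large).

```python
import math


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error, group 1
    v2 = sd2 ** 2 / n2  # squared standard error, group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Lengths (aa): novel putative AMPs vs. known AMP sequences, from the text.
t_stat, dof = welch_t(52.49, 26.49, 8008, 30.21, 20.28, 4538)

# One-sided p-value (H1: novel AMPs are longer), normal approximation.
p_one_sided = 0.5 * math.erfc(t_stat / math.sqrt(2))

print(f"t = {t_stat:.2f}, df = {dof:.0f}, one-sided p = {p_one_sided:.3g}")
```

With these inputs the statistic is on the order of 50, so the p‐value underflows to zero in double precision, consistent with the reported p < 0.05.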

FIGURE 1.

Sequence similarity distributions of the predicted antimicrobial peptides (AMPs) from the UniProtKB/Swiss‐Prot database to known AMPs. The sequence similarity distribution of all 10,720 predicted AMPs to known AMPs from Antimicrobial Peptide Database (Wang et al., 2016) and Database of Anuran Defense Peptides (Novković et al., 2012) was visualized, along with that of the 8008 novel putative AMPs from all predicted AMPs. The former distribution holds a mean of 38.27% and a standard deviation (SD) of 17.28%, while the latter holds a mean of 32.71% and a SD of 7.21%. The sequence similarity of each predicted AMP to known AMPs was considered as the sequence similarity of that predicted AMP sequence to its most similar sequence in the known AMP set, based on which the distributions were plotted.

Tracing back to the original source entries in the protein sequence database, the 10,720 predicted AMPs corresponded to 8481 parent sequences from a total of 8862 UniProt entries. In contrast, the 8008 novel putative AMPs corresponded to 6349 parent sequences from 6654 UniProt entries (Figure S1). In our analysis, we define parent sequences with AMPs predicted from their cleaved mature peptide sequences to be putative AMP precursor sequences, and UniProt entries of those putative AMP precursor sequences to be putative AMP entries.

Among all 8481 putative AMP precursor sequences identified, 86.84% (7365) only had one distinct predicted AMP sequence, as shown in Figure 2. However, we did notice cases where multiple AMP sequences were predicted from a single putative precursor sequence (Novković et al., 2012). Specifically, 698 putative precursor sequences were predicted with two distinct AMPs from each, 207 predicted with three, 74 predicted with four, and 57 predicted with five. There were 80 putative precursor sequences with more than five distinct AMPs predicted from each.
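As a quick consistency check, the per‐precursor counts listed above can be tallied against the total of 8481 putative AMP precursor sequences. The dictionary below simply transcribes the numbers from the text; the helper `histogram_from_mapping` is a hypothetical illustration of how such a distribution could be derived from a precursor‐to‐AMP mapping.

```python
from collections import Counter
from typing import Dict, Set

# Number of precursor sequences by count of distinct predicted AMPs (from text).
precursors_by_amp_count = {"1": 7365, "2": 698, "3": 207, "4": 74, "5": 57, ">5": 80}

total = sum(precursors_by_amp_count.values())
single_share = 100 * precursors_by_amp_count["1"] / total
print(total, f"{single_share:.2f}%")  # 8481 86.84%


def histogram_from_mapping(amps_by_precursor: Dict[str, Set[str]]) -> Counter:
    """Count precursors by their number of distinct predicted mature AMPs."""
    return Counter(len(amps) for amps in amps_by_precursor.values())


# Toy mapping with made-up identifiers, for illustration only.
toy = {"P1": {"a"}, "P2": {"b", "c"}, "P3": {"d"}}
print(histogram_from_mapping(toy))  # Counter({1: 2, 2: 1})
```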

FIGURE 2.

Distribution for the number of predicted mature antimicrobial peptide (AMP) sequences found within each putative AMP precursor sequence mined from the UniProtKB/Swiss‐Prot database. The bar chart was plotted based on 8481 distinct putative AMP precursor sequences from the 8862 putative AMP entries. Each bar shows the number of putative AMP precursor sequences that were predicted with the corresponding number of distinct mature AMPs by AMPlify.

The doughnut chart in Figure 3 categorizes the 8862 putative AMP entries identified by the AMP mining workflow by source organisms. According to the statistics reported by the Antimicrobial Peptide Database (APD3, https://aps.unmc.edu) web server in January 2023, out of all 3569 sequences on record, amphibians are the largest organism source of AMPs (1196 sequences), followed by bacteria (380), plants (371), insects (367), and mammals (363) (Wang et al., 2016). Based on this, all the putative AMP entries we report herein were classified into the following five categories: amphibian, plant, insect, mammalian, and other entries. The category of bacterial AMPs was not included, as we only considered eukaryotic AMPs in this work.

FIGURE 3.

Categorization of the putative antimicrobial peptide (AMP) entries identified from the UniProtKB/Swiss‐Prot database based on source organisms. The 8862 putative AMP entries with mature AMPs predicted by AMPlify were classified into five categories (mammalian, plant, amphibian, insect, and other entries) based on their source organisms, as shown in the middle doughnut chart. The predicted AMPs were further checked against the known AMP sequences in Antimicrobial Peptide Database (Wang et al., 2016) and Database of Anuran Defense Peptides (Novković et al., 2012) as well as the annotations of the corresponding UniProt entries. Those not found in those two AMP databases and not annotated with the AMP‐related keywords were labeled as novel AMP sequences. The five small pie charts surrounding the middle doughnut chart provide additional information on the percentages of entries in UniProtKB/Swiss‐Prot identified as putative AMP entries by the AMP mining workflow.

The distribution of source organisms for putative AMP entries is highly variable, reflecting the composition of the UniProtKB/Swiss‐Prot database (Figure 3). Mammalian entries constitute the largest source organism category for putative AMP entries, comprising 37.93% (3361/8862) of all. Out of all 3361 putative AMP entries from the mammalian category, 2703 (80.42%) of them had novel AMPs predicted from their cleaved candidate mature peptide sequences, making it the largest source organism category for novel putative AMP entries as well. Putative AMP entries not belonging to any of the four specified source organism categories also make up a large portion, following the mammalian category, and represent 33.78% (2994/8862) of all putative AMP entries. In this category of putative AMP entries, 79.93% (2393/2994) had novel AMPs predicted from their cleaved candidate mature peptide sequences. Plant entries form the third largest group among all putative AMP entries, contributing 17.42% (1544/8862) to the total. Overall, 1170 out of 1544 putative plant AMP entries (75.78%) were considered novel discoveries. Amphibian and insect categories are the two smallest in this analysis, with 509 and 454 putative AMP entries, respectively. While 59.25% (269/454) of the putative insect AMP entries were determined to be novel discoveries, the amphibian category is the only source organism category with more than half of the putative AMP entries (76.62%) already identified with known mature AMP sequences and/or annotated with AMP‐related keywords. We postulate the main reason for this is that amphibians are considered a rich source of AMPs (Helbing et al., 2019); hence, they have become a popular source for AMP mining (Li et al., 2022; Lin et al., 2022; Richter et al., 2022), resulting in more of the amphibian AMPs having been discovered and annotated than those from other organisms. Moreover, amphibians are also the source organism group with the largest proportion of entries (7.64%) in UniProtKB/Swiss‐Prot identified as putative AMP entries by the AMP mining workflow, followed by mammals (4.98%), insects (4.73%), source organisms other than the specified ones (4.27%), and plants (3.74%), as can be seen from the pie charts in Figure 3.
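The category breakdown can be cross‐checked with a few lines of arithmetic. The sketch below transcribes the entry counts from the text; the number of novel amphibian entries is not stated explicitly, so it is derived here (as an assumption) by subtracting the other four categories from the 6654 novel putative AMP entries reported in Section 2.2.

```python
# (putative AMP entries, entries with novel AMPs) per category, from the text.
# The amphibian novel count is not given directly and is derived below.
entries = {
    "mammalian": (3361, 2703),
    "other": (2994, 2393),
    "plant": (1544, 1170),
    "insect": (454, 269),
}
AMPHIBIAN_TOTAL = 509
TOTAL_ENTRIES = 8862
TOTAL_NOVEL_ENTRIES = 6654

# The five categories should account for every putative AMP entry.
category_sum = sum(total for total, _ in entries.values()) + AMPHIBIAN_TOTAL

# Derived: novel amphibian entries, and the share already known or annotated.
amphibian_novel = TOTAL_NOVEL_ENTRIES - sum(novel for _, novel in entries.values())
amphibian_known_pct = 100 * (AMPHIBIAN_TOTAL - amphibian_novel) / AMPHIBIAN_TOTAL

print(category_sum)                   # 8862
print(amphibian_novel)                # 119
print(f"{amphibian_known_pct:.2f}%")  # 76.62%

for name, (total, novel) in entries.items():
    print(f"{name}: {100 * total / TOTAL_ENTRIES:.2f}% of entries, "
          f"{100 * novel / total:.2f}% novel")
```

The derived amphibian share of 76.62% agrees with the value reported in the text, which suggests that the category counts and the overall totals are mutually consistent.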

We did not observe any instances where a single putative AMP precursor sequence had novel putative AMP(s) predicted, while also containing known AMP(s) in its cleaved candidate mature peptides and/or being annotated with AMP‐related keywords (described in Section 4). We also note that UniProt entries annotated with AMP‐related keywords can be proteins or peptides that assist in the defense against microbes rather than possessing antimicrobial activities themselves. However, to ensure a cleaner set of novel putative AMPs, all the predicted AMPs with their parent sequence entries annotated with AMP‐related keywords were not considered novel discoveries.

2.3. In vitro validation results

Considering the large number of putative AMPs identified by the AMP mining workflow, we prioritized a subset of the sequences for synthesis and in vitro validation. In this work, we primarily focused on the applications of AMPs in poultry farming. Since 3D structures of proteins or peptides may be a determining factor for their biological functions (Alberts et al., 2002), we prioritized putative AMPs based on their similarities to known chicken AMPs in 3D structures predicted by ColabFold (Mirdita et al., 2022). The reference chicken AMP for a putative AMP was defined as the most similar known chicken AMP to that putative AMP in predicted 3D structures.

We selected a total of 40 short cationic novel putative AMP sequences with AMPlify scores ≥10 for synthesis based on their similarities to known chicken AMPs in predicted structures, as measured by template modeling scores (TM‐scores) (Zhang & Skolnick, 2004, 2005) (Figure S3). The AMPlify score is an AMP prediction score reported by AMPlify (Li et al., 2022, 2023), and the TM‐score is a protein structural similarity score reported by TM‐align (Zhang & Skolnick, 2005) (see Section 4 for details). Of those selected, 30 had the highest TM‐scores regarding their similarities to known chicken AMPs in predicted structures, while the remaining 10 had the lowest TM‐scores. Tables S3 and S4 list the characteristics of the 40 peptides and their original UniProt entry information, respectively. We note that two of the top 30 sequences were not successfully synthesized, resulting in a final list of 38 putative AMPs for in vitro validation. The three reference chicken AMPs, Chicken CATH‐2 (van Dijk et al., 2005), Chicken CATH‐3 (Xiao et al., 2006), and Ovipin (dos Santos et al., 2022), that matched the top 30 sequences were also tested for comparison (Table S5).
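The prioritization logic can be sketched as follows. The `Candidate` record, the function names, and the toy values are hypothetical and introduced for illustration; the sketch assumes that TM‐scores against each known chicken AMP have already been computed, and it omits the length and charge criteria for "short cationic" peptides because their thresholds are not given in this section.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    name: str
    amplify_score: float
    tm_scores: Dict[str, float]  # TM-score against each known chicken AMP


def reference_and_score(c: Candidate) -> Tuple[str, float]:
    """Reference chicken AMP = the known chicken AMP with the highest TM-score."""
    ref = max(c.tm_scores, key=c.tm_scores.get)
    return ref, c.tm_scores[ref]


def select_for_synthesis(cands: List[Candidate], min_score: float = 10.0,
                         n_top: int = 30, n_bottom: int = 10):
    """Return the most and least chicken-AMP-like candidates (no overlap)."""
    eligible = [c for c in cands if c.amplify_score >= min_score]
    ranked = sorted(eligible, key=lambda c: reference_and_score(c)[1], reverse=True)
    top = ranked[:n_top]
    rest = ranked[n_top:]
    bottom = rest[-n_bottom:] if n_bottom > 0 else []
    return top, bottom


# Toy data with made-up names and scores, for illustration only.
toy = [
    Candidate("a", 12, {"CATH-2": 0.80, "CATH-3": 0.60}),
    Candidate("b", 9, {"CATH-2": 0.90, "CATH-3": 0.90}),   # fails score cutoff
    Candidate("c", 15, {"CATH-2": 0.30, "CATH-3": 0.50}),
    Candidate("d", 11, {"CATH-2": 0.20, "CATH-3": 0.10}),
    Candidate("e", 20, {"CATH-2": 0.70, "CATH-3": 0.75}),
]
top, bottom = select_for_synthesis(toy, n_top=2, n_bottom=1)
print([c.name for c in top], [c.name for c in bottom])  # ['a', 'e'] ['d']
```

Taking the bottom set from the remainder after the top set is removed keeps the two groups disjoint even when few candidates pass the score cutoff.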

All synthesized peptides were tested against two bacterial isolates: the Gram‐negative E. coli ATCC 25922 and the Gram‐positive S. aureus ATCC 29213. Porcine red blood cells (RBCs) were used to assess the hemolytic activity of the peptides. In this study, we used porcine RBCs, as observations in prior work confirmed that porcine and chicken RBCs report similar results in hemolytic activity measurements (data not shown). Out of the 38 putative AMPs synthesized, 13 displayed antimicrobial activity against E. coli ATCC 25922, with three of the 13 additionally active against S. aureus ATCC 29213. Figure 4 summarizes the antimicrobial and hemolytic activities of the 13 active peptides in minimum inhibitory concentration (MIC) and the concentration that lyses 50% of the RBCs (HC50), respectively, with the entire in vitro validation results of all peptides shown in Table S6. The left section of Figure 4 shows the results for the 11 active peptides from the set of 30 putative AMPs with the highest TM‐scores, as well as their reference chicken AMPs for comparison. Results of the two active peptides from the set of 10 putative AMPs with the lowest TM‐scores are shown in the right section of Figure 4. Figure S4 additionally shows the activity of all tested putative AMPs regarding their AMPlify scores and similarities to known chicken AMPs in predicted structures.

FIGURE 4.

Antimicrobial and hemolytic activities of the 13 novel antimicrobial peptides (AMPs) mined from the UniProtKB/Swiss‐Prot database that were active against at least one bacterial strain of Escherichia coli ATCC 25922 and Staphylococcus aureus ATCC 29213 in vitro. Antimicrobial and hemolytic activities were measured by minimum inhibitory concentration (MIC) and concentration that lyses 50% (HC50) of the red blood cells (RBCs), respectively. HC50 was determined using porcine RBCs. Data are presented as the lowest effective peptide concentration range (μg/mL) observed in three independent experiments performed in duplicate, with one maximum data point and one minimum data point dropped for each measurement. The results are divided into two sections, as separated by the solid vertical line. The left section shows results of the 11 active peptides that were most similar to known chicken AMPs in predicted three‐dimensional (3D) structures as measured by TM‐scores, while the right section shows results of the two active peptides that were least similar to known chicken AMPs in predicted 3D structures. These 11 active peptides in the left section are categorized into three sub‐sections, as separated by the dashed vertical lines, according to their reference chicken AMPs (i.e., the most similar known chicken AMP to each putative AMP in predicted 3D structures). The results of the three reference chicken AMPs: Chicken CATH‐2 (van Dijk et al., 2005), Chicken CATH‐3 (Xiao et al., 2006), and Ovipin (dos Santos et al., 2022), are listed for comparison in each sub‐section (dark gray background). We note that hemolysis experiments were not performed for putative AMPs that did not show any antimicrobial activity (MIC >128 μg/mL) in at least two repeats for each bacterial strain tested (i.e., MoIn1, MuMu1, and HoSa7).
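The reporting rule in the caption (three independent experiments in duplicate, with one maximum and one minimum data point dropped, then summarized as a range) can be sketched as below. The function names are hypothetical, the readings are made‐up values rather than measured data, and encoding a result above the highest tested concentration (>128 μg/mL) as infinity is an assumption made here for convenience.

```python
import math
from typing import List, Tuple


def trimmed_range(values: List[float]) -> Tuple[float, float]:
    """Drop one minimum and one maximum reading, then return (low, high)."""
    if len(values) < 3:
        raise ValueError("need at least three readings to trim both ends")
    kept = sorted(values)[1:-1]
    return kept[0], kept[-1]


def format_range(low: float, high: float, top: int = 128) -> str:
    """Format a concentration range in ug/mL; inf stands for '>top'."""
    def fmt(x: float) -> str:
        return f">{top}" if math.isinf(x) else f"{x:g}"
    if low == high:
        return fmt(low)
    if math.isinf(high):
        return f"{fmt(low)} to {fmt(high)}"
    return f"{fmt(low)}–{fmt(high)}"


# Hypothetical readings from 3 experiments x 2 replicates (not measured data).
readings_a = [8, 16, 16, 8, 16, 32]
readings_b = [64, 128, math.inf, math.inf, 64, 128]

print(format_range(*trimmed_range(readings_a)))  # 8–16
print(format_range(*trimmed_range(readings_b)))  # 64 to >128
```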

Among the 10 tested putative AMPs with Chicken CATH‐2 as their reference chicken AMP, five showed antimicrobial activity in our tests. Within this group of five peptides, HoSa1, with its original UniProt entry annotated as an olfactory receptor from humans (The UniProt Consortium, 2019), possessed the strongest antimicrobial activity against E. coli ATCC 25922 (MIC = 16 μg/mL) and was the only peptide that showed additional activity against S. aureus ATCC 29213 (MIC = 128 μg/mL). DiDi2 showed the same antibacterial activity as HoSa1 against E. coli ATCC 25922 with an MIC of 16 μg/mL, with its original UniProt entry annotated as a putative uncharacterized transmembrane protein from social amoebas (The UniProt Consortium, 2019). DaRe1, the parent sequence of which is annotated as a GTP‐binding protein from zebrafish, inhibited the growth of E. coli ATCC 25922 at an MIC of 32–64 μg/mL. MoIn1 and MuMu1 presented the weakest antibacterial activity against E. coli ATCC 25922 in this group, with MICs of 64 to >128 and ≥128 μg/mL, respectively. The parent sequence of MoIn1 is annotated as part of an intermediate translocation complex at the inner envelope membrane of chloroplasts from mulberries, while the parent sequence of MuMu1 is annotated as a mouse surfeit locus protein, a component of the mitochondrial translation regulation assembly intermediate of cytochrome c oxidase complex (MITRAC) (The UniProt Consortium, 2019). The reference chicken AMP, Chicken CATH‐2, possessed stronger antimicrobial activity against E. coli ATCC 25922 (MIC = 8–16 μg/mL) and S. aureus ATCC 29213 (MIC = 32 μg/mL) than all five peptides in this group.

Among the 16 tested putative AMPs with Chicken CATH‐3 as their reference chicken AMP, five showed antimicrobial activity in our tests. Among the five peptides in the group, PlVi1 had the strongest antimicrobial activity against both E. coli ATCC 25922 (MIC = 4–8 μg/mL) and S. aureus ATCC 29213 (MIC = 32 μg/mL), and it was also the most active peptide among all 38 putative AMPs tested. The parent sequence of PlVi1 is annotated as the precursor of an RxLR effector protein that completely suppresses the host cell death induced by cell death‐inducing proteins, which is derived from a type of oomycete that causes the downy mildew disease of grapevines (The UniProt Consortium, 2019). HoSa4, derived from a human IQ domain‐containing protein without a detailed functional annotation (The UniProt Consortium, 2019), was the second most active AMP in this group. HoSa4 was observed to be bioactive against E. coli ATCC 25922 and S. aureus ATCC 29213 with MICs of 32 and 64–128 μg/mL, respectively. We note that PlVi1 and HoSa4 are the only two peptides that were active against both bacterial strains tested in this group. ArTh11 inhibited the growth of E. coli ATCC 25922 at an MIC of 64–128 μg/mL. The parent sequence of ArTh11 is annotated as an F‐box protein from mouse‐ear cress (The UniProt Consortium, 2019), with no specific functional annotation. SuSc1 and HoSa7 were the two least active peptides in the group, with bioactivity against E. coli ATCC 25922 at MICs of 128 and 64 to >128 μg/mL, respectively. SuSc1 was derived from a precursor sequence of adrenomedullin in pigs and cattle, while HoSa7 was cleaved from a human G‐protein‐coupled receptor (The UniProt Consortium, 2019). The reference chicken AMP, Chicken CATH‐3, harbored stronger antimicrobial activity against E. coli ATCC 25922 (MIC = 2–4 μg/mL) and S. aureus ATCC 29213 (MIC = 2 μg/mL) than all five peptides in this group.

Two peptides in the top 30 putative AMP list matched Ovipin as their reference chicken AMP, with one (GaGa2) showing antimicrobial activity in our tests. However, it only showed minimal antibacterial activity against E. coli ATCC 25922 (MIC = 128 μg/mL), with no activity against S. aureus ATCC 29213 (MIC >128 μg/mL). This peptide was derived from a chicken tenascin (The UniProt Consortium, 2019). The reference chicken AMP, Ovipin, did not show any activity against the two bacterial strains tested in the present study (MIC >128 μg/mL) but was previously reported to be bioactive against a Gram‐positive Micrococcus luteus strain (dos Santos et al., 2022).

Among the 10 least similar putative AMPs to known chicken AMPs in predicted structures, only two (GoGo1 and UnBi1) showed activity in our tests. GoGo1 was part of the sequence of a transcriptional repressor found in five different primate species: western lowland gorillas, bonobos, Bornean orangutans, chimpanzees, as well as humans (The UniProt Consortium, 2019). It inhibited the growth of E. coli ATCC 25922 with an MIC of 64 μg/mL. The parent peptide of UnBi1 was annotated as a neurotoxin from sea snails (The UniProt Consortium, 2019). It only showed minimal activity against E. coli ATCC 25922 (MIC = 128 μg/mL). Neither of these two peptides was active against S. aureus ATCC 29213 (MIC >128 μg/mL).

We note that hemolysis experiments were not performed for putative AMPs that did not show any antimicrobial activity (MIC >128 μg/mL) in at least two repeats for each bacterial strain tested, resulting in MoIn1, MuMu1, and HoSa7 not being characterized for hemolytic activity (Table S6). None of the other 10 novel bioactive AMPs were hemolytic to the porcine RBCs (HC50 >128 μg/mL).

3. DISCUSSION

In this work, we applied an AMP mining workflow to discover novel eukaryotic AMPs from the UniProtKB/Swiss‐Prot database. The AMP mining workflow utilizes the state‐of‐the‐art AMP prediction tool AMPlify (Li et al., 2022, 2023), as well as the rAMPage cleavage module (Lin et al., 2022) with ProP (Duckert et al., 2004) for precursor sequence cleavage. The scripts for the AMP mining workflow have been incorporated into AMPlify v2.0.0 and are available at https://github.com/bcgsc/AMPlify. Using this workflow, we identified 8008 distinct novel AMP sequences from all eukaryotic sequences in the UniProtKB/Swiss‐Prot database. We have made all predicted AMPs in the present study publicly available through a Zenodo repository at https://doi.org/10.5281/zenodo.8133088 (Li & Birol, 2023), providing the community with putative AMP sequences to validate in the lab and to extend the current arsenal of peptide‐based therapeutics.

While the presented AMP mining workflow successfully identified a considerable number of putative AMPs, it is essential to acknowledge the limitations inherent in the machine learning‐based tools we employed. Both AMPlify and ProP, although powerful, are not infallible, as neither of them achieves 100% accuracy in their respective tasks. As is common with most machine learning‐based bioinformatics tools, their performance can be limited by the quality as well as the size of available data used for training (Li et al., 2022, 2023). Despite these limitations, the reported performance of AMPlify and ProP (Duckert et al., 2004; Li et al., 2022, 2023) still establishes them as state‐of‐the‐art tools, rendering them highly suitable for integration into an AMP mining workflow. We anticipate that the limitations of these tools will gradually diminish as more training data become available (Li et al., 2022) and machine learning techniques continue to advance (Li et al., 2019).

Herein, we focused on AMPs that have potential utility in poultry farming. As the first step toward a field application, we initially tested our selected putative AMPs against E. coli and S. aureus, and found 13 to be active against at least one of the bacterial strains tested. A total of 11 of these active peptides were relatively similar to chicken AMPs in their predicted 3D structures (TM‐scores ranging from 0.6934 to 0.8352), 10 of which were from organisms other than chickens. Because the biological function of a protein or peptide can be influenced by the 3D structure it adopts (Alberts et al., 2002), and because chicken AMPs have evolved to fight the pathogens that infect their hosts (Zhang & Gallo, 2016), these newly discovered AMPs with high similarity to chicken AMPs in predicted structures may target the same kinds of pathogens that chicken AMPs do. We therefore foresee the potential use of these characterized AMPs as substitutes for conventional antibiotics against pathogens in chicken farming.

Although some of the putative AMPs were not active against the bacterial strains we tested, they may still be active against other pathogens, including those that infect chickens. Further work should test these AMPs on a broader range of bacterial species and strains, particularly MDR strains, to evaluate their potential as novel antibiotics for chickens. Additionally, investigating potential synergistic effects of these AMPs and conventional antibiotics could reveal enhanced therapeutic benefits (Taheri‐Araghi, 2024).

While it is hypothesized that AMPs may not induce antibiotic resistance to the extent of conventional antibiotics (Boman, 2003), cases of bacterial cross‐resistance to multiple AMPs have been reported (Fleitas & Franco, 2016), raising a concern regarding the usage of these newly discovered AMPs. The similarity between a newly discovered AMP and a chicken AMP in predicted 3D structures suggests a potential similarity in their modes of action (Alberts et al., 2002), indicating a potential risk that a newly discovered AMP may lose its effect if some pathogens have already developed resistance to its reference chicken AMP (Kintses et al., 2019). However, the extent to which structural similarity implies similarity in modes of action remains unclear. While 3D structures can suggest mechanistic patterns, other factors like hydrophobic and positively charged residue distributions can also influence AMP activity. Further, our predicted structures, though analytically useful, lack experimental validation. Some AMPs may even adopt different conformations depending on their microenvironment (Cândido et al., 2019). These limitations highlight the need for further experimental investigation.

If cross‐resistance to our newly discovered AMPs and known chicken AMPs does not exist in the pathogens tested, then the novel AMPs with high similarities to known chicken AMPs in predicted structures but originating from other organisms may be prioritized as primary candidates for further translation into applications. These AMPs may have different evolutionary backgrounds from chicken AMPs, suggesting that many chicken pathogens may not have had enough opportunity to develop resistance against them. In the worst case, if cross‐resistance to the newly discovered AMPs and known chicken AMPs does occur in some pathogens, then this work serves as a caution to the scientific community regarding the use of those AMPs in chicken farming.

Although the AMPs discovered in the present study were derived from putative precursor sequences of existing organisms, further investigation is required to determine whether those AMPs occur in nature. Due to the limitations of the in silico cleavage tool used, it is likely that some sequences, which may or may not be real precursors, were incorrectly cleaved, even when the resulting sequence fragments displayed antimicrobial properties in vitro. It has been reported that the degradation products of some non‐antimicrobial proteins with other biological functions exhibit antimicrobial activities (Papareddy et al., 2010), and some of the aforementioned protein sequence fragments may belong to this category of AMPs. Further, we observed instances where the predicted AMPs originated from non‐secreted proteins, such as GoGo1, which is a sequence fragment from a transcriptional repressor. While these AMPs may be less likely to occur naturally, they offer unique advantages, as pathogens may lack resistance mechanisms against them. Investigating the various cleavage patterns and the resulting AMPs predicted from the same precursor sequences may also provide insights into their in vivo processing and potential distinct activities. Lastly, exploring the relationship between these novel AMPs and their parent proteins' functions may provide a better understanding of their broader biological roles.

As the UniProt database is frequently updated with more protein sequence data from additional organisms, we expect the AMP mining workflow we present to continue being a valuable discovery and annotation engine. Future work can also be extended to additional peptide sequence databases (e.g., UniProtKB/TrEMBL, the unreviewed section of the UniProt database (The UniProt Consortium, 2019)). Additional work could also loosen the criteria in the sequence filtering steps to retain more sequences. For example, sequences without any predicted signal peptides, which were all removed in Step 2 of our workflow (Figure S1), can be secreted proteins or peptides reported in mature or incomplete precursor forms. Although they are outside the scope of the current work, such sequences warrant further investigation. The identified AMPs may also have broader applications, including human medicine, by leveraging structural and functional similarities across organisms. We expect bioinformatics workflows like the one reported here to continue discovering novel AMPs to combat MDR bacteria.

4. MATERIALS AND METHODS

4.1. Datasets

The present study involves four main datasets: (1) all eukaryotic entries from UniProtKB/Swiss‐Prot (The UniProt Consortium, 2019), (2) entries annotated with AMP‐related keywords in UniProtKB/Swiss‐Prot (The UniProt Consortium, 2019), (3) known AMP sequences in AMP databases (Novković et al., 2012; Wang et al., 2016), and (4) known chicken AMP sequences (Wang et al., 2016).

All eukaryotic entries from the UniProtKB/Swiss‐Prot database (2022_02 release) were downloaded using the query “(taxonomy_id:2759) AND (reviewed:true)”. This set of sequences includes 186,302 distinct sequences from 195,188 UniProt entries.

The set of entries annotated with AMP‐related keywords in the UniProtKB/Swiss‐Prot database includes 18,470 distinct sequences from 19,845 UniProt entries. UniProt entries with their annotations containing any of the following 16 AMP‐related keywords were downloaded: {antimicrobial, antibiotic, antibacterial, antiviral, antifungal, antimalarial, antiparasitic, anti‐protist, anticancer, defense, defensin, cathelicidin, histatin, bacteriocin, microbicidal, fungicide} (Li et al., 2022). We note that 11,925 sequences from 12,229 UniProt entries in this set belong to the above eukaryotic sequence set.

The known AMP sequence set comprises 4538 distinct AMP sequences. All AMP sequences were downloaded from two curated AMP databases: APD3 (Wang et al., 2016) on July 11, 2022, and Database of Anuran Defense Peptides (DADP, http://split4.pmfst.hr/dadp) (Novković et al., 2012) on December 6, 2018.

The known chicken AMP sequence set includes 22 sequences in total. These sequences were downloaded on October 14, 2022, from the APD3 database (Wang et al., 2016) by searching for the source organism “Gallus gallus.”

4.2. AMP mining workflow

The AMP mining workflow we report in the present study includes five steps, as shown in Figure S1.

In Step 1, all eukaryotic protein sequences from the UniProtKB/Swiss‐Prot database were processed to generate candidate mature peptide sequences using the cleavage module of the AMP discovery pipeline rAMPage v1.0.1 (Lin et al., 2022). The cleavage module of rAMPage is based on ProP v1.0c (Duckert et al., 2004), a machine learning‐based tool for cleavage site prediction in eukaryotic protein sequences. The rAMPage cleavage module also provides post‐processing of the ProP output (Lin et al., 2022). ProP predicts all the cleavage sites but only annotates the corresponding signal peptides—it does not provide labels for mature peptide sequences and pro‐sequences (Duckert et al., 2004; Lin et al., 2022). As a result, for each putative precursor sequence, the rAMPage cleavage module takes all cleaved pieces, excluding the predicted signal peptide, as well as all possible recombinations of non‐adjacent cleaved pieces (with a maximum of three pieces within each recombination) as candidate mature peptide sequences (Lin et al., 2022). Additionally, the rAMPage cleavage module removes candidate mature peptide sequences shorter than 2 aa or longer than 200 aa, matching the typical size range of AMPs (van der Does et al., 2019).
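To make the candidate generation described above concrete, the following Python sketch enumerates candidate mature peptides from an ordered list of cleaved pieces. It is an illustration only and not the rAMPage implementation: the function name `candidate_peptides` is hypothetical, and it assumes that a recombination joins pieces in their original order and that no two selected pieces are neighbours in the precursor. The length bounds are the ones stated in the text.

```python
from itertools import combinations

MIN_LEN, MAX_LEN = 2, 200  # length bounds (aa) stated in the text


def candidate_peptides(pieces, max_pieces=3):
    """Return candidate mature peptides from ordered cleaved pieces.

    `pieces` is the precursor split at the predicted cleavage sites, with
    the predicted signal peptide already removed. Assumption (not confirmed
    by the source): pieces are joined in their original order and no two
    selected pieces are adjacent in the precursor.
    """
    candidates = []
    for k in range(1, max_pieces + 1):
        for idx in combinations(range(len(pieces)), k):
            # Skip any selection that contains two neighbouring pieces.
            if any(b - a == 1 for a, b in zip(idx, idx[1:])):
                continue
            seq = "".join(pieces[i] for i in idx)
            # Drop candidates outside the allowed length range.
            if MIN_LEN <= len(seq) <= MAX_LEN:
                candidates.append(seq)
    return candidates
```

With four toy pieces, for example, the single pieces and the non‐adjacent pairs are returned, while a one‐residue piece on its own is discarded by the length filter.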

The redundancy removal function of the rAMPage cleavage module was turned off in the present study, so that the same putative AMP sequence from different UniProt entries was kept as separate records (Figure S1) for the convenience of future investigations into their source organisms. However, we did treat them as the same putative AMP sequence when calculating the unique sequence counts presented.

In Step 2, sequences predicted to lack signal peptides were removed, as most AMPs are expected to be secreted (Bals, 2000).

In Step 3, we employed AMPlify, a deep learning‐based tool for AMP prediction (Li et al., 2022, 2023). The AMPlify model uses a bidirectional long short‐term memory (Bi‐LSTM) layer (Gers et al., 2000; Hochreiter & Schmidhuber, 1997; Schuster & Paliwal, 1997), followed by multi‐head scaled dot‐product attention for refined sequence representation (Vaswani et al., 2017) and context attention to generate a summary vector (Yang et al., 2016). It utilizes ensemble learning and is trained on known AMPs and non‐AMP sequences for balanced and imbalanced models, where the training set is heavily biased toward non‐AMP sequences in the latter model. Here, the remaining candidate mature peptide sequences from Step 2 were preliminarily filtered using the imbalanced model of AMPlify v2.0.0 (Li et al., 2023).

Candidate mature peptide sequences predicted as AMPs by the AMPlify imbalanced model were passed on to the next step. Sequences with non‐standard amino acids were also filtered out, as AMPlify does not assign those sequences any predictions.

In Step 4, the predicted AMPs from Step 3 were then passed through the balanced model of AMPlify v2.0.0 (Li et al., 2022) for a more precise filtering. Sequences that were predicted as AMPs by both AMPlify imbalanced and balanced models were collected as the final set of predicted AMPs.

In Step 5, all the predicted AMPs from Step 4 were checked against the known AMP sequence set as well as the annotations of the corresponding UniProt entries for novel putative AMPs. Predicted AMPs were only considered novel putative AMPs if they were not in the known AMP sequence set and if their corresponding UniProt entry annotation did not have AMP‐related keywords (see above under Section 4.1).
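The filtering logic of Steps 3 to 5 can be summarized with a short Python sketch. It is a simplified illustration under stated assumptions: `imbalanced_is_amp` and `balanced_is_amp` are hypothetical callables standing in for the two AMPlify models (they are not the AMPlify API), and `known_amps` and `keyword_entries` stand in for the known AMP sequence set and the set of UniProt entries annotated with AMP‐related keywords.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def novel_putative_amps(records, imbalanced_is_amp, balanced_is_amp,
                        known_amps, keyword_entries):
    """Apply the Step 3-5 filters to (entry_id, sequence) records.

    The two predictor arguments are hypothetical stand-ins that return
    True when a sequence is predicted to be an AMP.
    """
    novel = []
    for entry_id, seq in records:
        # Step 3: sequences with non-standard residues get no prediction.
        if not set(seq) <= STANDARD_AA:
            continue
        # Step 3: preliminary filter with the imbalanced model.
        if not imbalanced_is_amp(seq):
            continue
        # Step 4: stricter filter with the balanced model.
        if not balanced_is_amp(seq):
            continue
        # Step 5: discard known AMPs and keyword-annotated entries.
        if seq in known_amps or entry_id in keyword_entries:
            continue
        novel.append((entry_id, seq))
    return novel
```

Records are kept per UniProt entry here, mirroring the choice to retain identical sequences from different entries as separate records.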

4.3. Sequence and structural similarities between peptides

We used two measures of peptide similarity: sequence similarity and structural similarity.

The sequence similarity between two peptides was calculated as $\left(1 - \frac{d_{i,j}}{\max(l_i, l_j)}\right) \times 100\%$, where $d_{i,j}$ is the edit distance and $l_i$, $l_j$ are the lengths of the peptide sequences. The structural similarity between two peptides was measured with TM‐scores normalized by the average peptide sequence length (Zhang & Skolnick, 2004) using TM‐align (Zhang & Skolnick, 2005). TM‐scores are between 0 and 1, with higher TM‐scores indicating higher similarity between the two structures (Zhang & Skolnick, 2004).

The sequence/structural similarity of a single peptide to a set of peptides was defined as the maximum of all sequence/structural similarity values calculated between that peptide and the peptides in the target set for comparison (i.e., the sequence/structural similarity of that peptide to its most similar target set peptide in their sequences/structures).
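As an illustration of these definitions, the sketch below computes the pairwise sequence similarity, (1 − d / max(l_i, l_j)) × 100%, and the similarity of one peptide to a target set as the maximum over that set. It assumes that "edit distance" means the Levenshtein distance with unit costs for substitutions, insertions, and deletions; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs (assumed definition)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def sequence_similarity(a, b):
    """Percent similarity: (1 - d / max(len(a), len(b))) * 100."""
    return (1 - edit_distance(a, b) / max(len(a), len(b))) * 100


def similarity_to_set(peptide, targets):
    """Similarity of one peptide to its most similar target peptide."""
    return max(sequence_similarity(peptide, t) for t in targets)
```

The same maximum‐over‐targets rule applies to structural similarity, with TM‐scores taking the place of the pairwise sequence similarity.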

In this study, in silico predicted 3D structures were utilized for all peptide structural similarity comparisons. The 3D structures of peptides were predicted using ColabFold v1.4.0 (Mirdita et al., 2022). ColabFold improves the prediction speed of AlphaFold2 (Jumper et al., 2021) by combining it with MMseqs2 homology search (Steinegger & Söding, 2017). For each peptide, five structures were generated by five AlphaFold2 models trained using different random seeds, with relaxations applied to the predicted structures utilizing gradient descent in the Amber force field (Hornak et al., 2006; Jumper et al., 2021). The structures were then ranked by their average predicted local distance difference test (pLDDT) scores—a measure of the confidence of AlphaFold2 predictions. We chose the structure with the highest average pLDDT score for further analysis.

4.4. Selecting putative AMPs for validation

Among all the novel putative AMPs discovered by the AMP mining workflow, we prioritized 397 short cationic peptides with AMPlify scores ≥10 (i.e., AMPlify probability scores ≥0.9) for selection, as shorter peptides are more cost‐effective to synthesize (Lin et al., 2022). We define short cationic peptides as those with lengths between 5 and 35 aa and net charge greater than 0. We note that AMPlify score is a prediction score reported by AMPlify, which is a log transformation of the AMPlify probability score $p_{\mathrm{AMPlify}}$ as $-10\log_{10}(1 - p_{\mathrm{AMPlify}})$, and here the AMPlify scores from the balanced model were taken for analysis.
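The relation between the probability score and the AMPlify score, and the short cationic criterion, can be sketched as follows. The score transformation is −10·log10(1 − p), so a probability of 0.9 corresponds to a score of 10. The net charge calculation is an assumption made for illustration (+1 for each K or R, −1 for each D or E, ignoring histidine and the termini); the text does not specify how net charge was computed, and the function names are hypothetical.

```python
import math


def amplify_score(p):
    """Log-transformed score: -10 * log10(1 - p), for 0 <= p < 1."""
    return -10 * math.log10(1 - p)


def net_charge(seq):
    """Simplified net charge (assumption): K/R count +1, D/E count -1."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def is_short_cationic(seq):
    """Length between 5 and 35 aa and net charge greater than 0."""
    return 5 <= len(seq) <= 35 and net_charge(seq) > 0
```

Note that a probability of exactly 1 has no finite score under this transformation, so any implementation has to cap or special‐case that value.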

First, the 3D structures of the 397 putative AMPs, together with the 22 known chicken AMPs, were predicted by ColabFold (Mirdita et al., 2022). Next, the similarity (TM‐score) of each putative AMP to the known chicken AMPs in predicted structures (i.e., similarity of the putative AMP to the most similar known chicken AMP in predicted 3D structures) was calculated using TM‐align (Zhang & Skolnick, 2005). Finally, we ranked the peptides by their similarities to known chicken AMPs in predicted structures. The top 30 putative AMPs with the highest TM‐scores were prioritized for synthesis, as well as the bottom 10 with the lowest TM‐scores for comparison (Tables S3 and S4). Furthermore, the three reference chicken AMPs (i.e., the most similar known chicken AMP to a putative AMP in predicted 3D structures) for the top 30 putative AMPs: Chicken CATH‐2 (van Dijk et al., 2005), Chicken CATH‐3 (Xiao et al., 2006), and Ovipin (dos Santos et al., 2022), were also sent for synthesis for comparison (Table S5). We note that two of the top 30 putative AMPs could not be successfully synthesized, resulting in a final number of 38 putative AMPs in total proceeding to further in vitro validation.
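The ranking and selection described above amounts to taking, for each putative AMP, its highest TM‐score over the known chicken AMPs and then sorting on that value. A minimal Python sketch follows; the function names and the input layout (a dictionary of pairwise TM‐scores per peptide) are hypothetical, and the TM‐scores themselves would come from TM‐align.

```python
def similarity_to_references(pairwise_tm):
    """Return (reference_name, TM-score) for the most similar reference.

    `pairwise_tm` maps each reference peptide name to its TM-score.
    """
    best = max(pairwise_tm, key=pairwise_tm.get)
    return best, pairwise_tm[best]


def select_for_synthesis(tm_by_peptide, n_top=30, n_bottom=10):
    """Rank peptides by similarity to the reference set.

    Returns the n_top most similar and n_bottom least similar peptides.
    """
    best = {name: similarity_to_references(tm)[1]
            for name, tm in tm_by_peptide.items()}
    ranked = sorted(best, key=best.get, reverse=True)
    return ranked[:n_top], ranked[-n_bottom:]
```

The two returned lists overlap if fewer than `n_top + n_bottom` peptides are supplied, which is not the case for the 397 peptides considered here.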

4.5. Antimicrobial susceptibility testing

To measure the antimicrobial activity of our selected putative AMPs in vitro, broth microdilution assays were conducted to determine the minimum inhibitory and minimum bactericidal concentrations (MICs and MBCs, respectively), as outlined by the Clinical and Laboratory Standards Institute [CLSI] (2015). The adaptations for testing cationic AMPs, as described previously (Wiegand et al., 2008), were incorporated. The selected peptides were tested against laboratory isolates of E. coli 25922 and S. aureus 29213, which were purchased from the American Type Culture Collection (ATCC; Manassas, VA, USA). Bacteria from frozen stocks were streaked onto nonselective Columbia blood agar with 5% sheep blood (Oxoid) and incubated for 18–24 h at 37°C. To ensure uniform colony health before the assay, 2–4 colonies were streaked onto a new agar plate on the next day and incubated for another 18–24 h at 37°C. To create a standardized bacterial inoculum, isolated colonies were suspended in Mueller‐Hinton Broth (MHB; Sigma‐Aldrich, St. Louis, MO, USA). The suspension was adjusted to an optical density of 0.08–0.1 at 600 nm, which corresponds to a 0.5 McFarland standard of approximately 1–2 × 10⁸ colony‐forming units (CFU)/mL. The inoculum was then diluted 1:250, resulting in a final concentration of 5 ± 3 × 10⁵ CFU/mL. The target bacterial density was confirmed by measuring the total viability counts from the final tested inoculum.

The selected putative AMPs were purchased from and synthesized by GenScript (Piscataway, NJ, USA) in lyophilized format. The peptides were stored at −20°C and were suspended in sterile ultrapure water prior to testing. A two‐fold serial dilution of 1280 to 2.5 μg/mL was prepared in sterile 96‐well polypropylene microtiter plates (Greiner Bio‐One #650261, Kremsmünster, Austria). Then, 100 μL of the standardized bacterial inoculum was added to each well, resulting in a final AMP testing range of 128 to 0.25 μg/mL. The MIC values were determined as the lowest peptide concentration in which no visible bacterial growth was observed after a 20–24 h incubation at 37°C. To determine the MBC values, well contents of the MIC and the two adjacent wells containing the two‐ and four‐fold higher peptide concentrations were plated onto nonselective nutrient agar. The concentration in which 99.9% of the inoculum was killed after an incubation for 24 h at 37°C was reported as the MBC.
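The dilution arithmetic in this section can be checked with a few lines of Python. The sketch assumes only the figures given in the text: a 1:250 dilution of a suspension at 1–2 × 10^8 CFU/mL, a ten‐step two‐fold series starting at 1280 μg/mL, and final concentrations ten times lower than the prepared series (as implied by the stated 128 to 0.25 μg/mL range). The helper `mic` is a hypothetical illustration of reading an MIC from growth observations.

```python
def twofold_series(start, n_steps):
    """Two-fold serial dilution starting at `start`."""
    return [start / 2 ** i for i in range(n_steps)]


# Inoculum: 1-2 x 10^8 CFU/mL diluted 1:250 gives 4-8 x 10^5 CFU/mL,
# which lies within the stated 5 +/- 3 x 10^5 CFU/mL.
inoculum_low, inoculum_high = 1e8 / 250, 2e8 / 250

# Prepared peptide series (ug/mL) and the final in-well concentrations.
prepared = twofold_series(1280, 10)   # 1280, 640, ..., 2.5
final = [c / 10 for c in prepared]    # 128, 64, ..., 0.25


def mic(growth):
    """Lowest concentration without visible growth, or None if all grew.

    `growth` maps final concentration -> True when growth was visible.
    A result of None corresponds to reporting MIC > highest tested.
    """
    clear = [conc for conc, grew in growth.items() if not grew]
    return min(clear) if clear else None
```

For instance, if growth is visible only at 16 μg/mL and below, `mic` returns 32.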

In our tests, we used Ranatuerin‐4—a known AMP from the American bullfrog (Goraya et al., 1998), and an in‐house peptide [TKPKG]3 (OT15) as the positive and negative control peptides, respectively. We note that OT15 was truncated and derived from a negative control peptide [TKPKG]4 (OT20), not antimicrobial but sharing similar characteristics to AMPs, used in previous studies (Horváti et al., 2017).

4.6. Hemolysis assay

Hemolysis experiments were performed to evaluate the toxicity of the selected peptides to RBCs. Whole blood from healthy donor pigs was purchased from Lampire Biological Laboratories (Pipersville, PA, USA), and RBCs were washed and isolated by centrifugation using Roswell Park Memorial Institute (RPMI) medium (Life Technologies, Grand Island, NY, USA). Peptides were suspended and serially diluted from 1280 to 10 μg/mL using RPMI medium in a 96‐well polypropylene microtiter plate, and then combined with 100 μL of the 1% RBC solution, providing a final AMP testing range of 128 to 1 μg/mL. Plates were centrifuged after an incubation at 37°C for 30–45 min, and 1/2 volume from each supernatant was transferred to a new 96‐well plate. The absorbance of the wells was measured at 415 nm utilizing the Cytation 5 Cell Imaging Multimode Reader (BioTek, CA, USA), and the peptide concentration that lysed 50% of the RBCs (HC50) was used to determine the hemolytic activity. Absorbance readings from wells containing RBCs treated with 11 μL of a 2% Triton‐X100 solution or RPMI medium (AMP solvent‐only) were used to define 100% and 0% hemolysis, respectively. We note that all centrifugation steps were performed at 500× g for 5 min in an Allegra‐6R centrifuge (Beckman Coulter, CA, USA).
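A common way to turn these absorbance readings into hemolysis percentages is to scale each reading between the 0% (RPMI only) and 100% (Triton‐X100) controls. The Python sketch below shows that normalization. The HC50 rule used here, the lowest tested concentration reaching at least 50% hemolysis, is an assumption for illustration; the text does not state how HC50 was estimated from the dilution series, and the function names are hypothetical.

```python
def percent_hemolysis(a_sample, a_zero, a_full):
    """Scale an absorbance reading between the 0% and 100% controls."""
    return (a_sample - a_zero) / (a_full - a_zero) * 100


def hc50(hemolysis_by_conc):
    """Lowest tested concentration with >= 50% hemolysis (assumed rule).

    `hemolysis_by_conc` maps concentration (ug/mL) -> percent hemolysis.
    Returns None when no tested concentration reaches 50%, which
    corresponds to reporting HC50 > highest concentration tested.
    """
    hits = [c for c, pct in hemolysis_by_conc.items() if pct >= 50]
    return min(hits) if hits else None
```

Under this rule, a peptide that never reaches 50% hemolysis across the 1 to 128 μg/mL range is reported as HC50 >128 μg/mL.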

AUTHOR CONTRIBUTIONS

Chenkai Li: Conceptualization; methodology; software; data curation; investigation; validation; formal analysis; visualization; writing – original draft; writing – review and editing. Darcy Sutherland: Methodology; formal analysis; validation; investigation; writing – review and editing. Ali Salehi: Methodology; investigation; validation; formal analysis; writing – review and editing. Amelia Richter: Methodology; investigation; validation; formal analysis; writing – review and editing; software. Diana Lin: Software; methodology; writing – review and editing. Sambina Islam Aninta: Methodology; software; writing – review and editing. Hossein Ebrahimikondori: Methodology; software; writing – review and editing. Anat Yanai: Methodology; investigation; validation; formal analysis; writing – review and editing. Lauren Coombe: Formal analysis; investigation; writing – review and editing. René L. Warren: Conceptualization; formal analysis; investigation; methodology; writing – review and editing. Monica Kotkoff: Project administration; writing – review and editing. Linda M. N. Hoang: Conceptualization; funding acquisition; writing – review and editing; supervision. Caren C. Helbing: Conceptualization; supervision; funding acquisition; writing – review and editing. Inanc Birol: Conceptualization; methodology; software; investigation; formal analysis; supervision; funding acquisition; writing – review and editing.

CONFLICT OF INTEREST STATEMENT

Inanc Birol is a co‐founder of and an executive at Amphoraxe Life Sciences Inc.

Supporting information

Data S1. Supporting Information.


ACKNOWLEDGMENTS

This work was supported by Genome BC and Genome Canada (291PEP). The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of our funding organizations. Additional support was provided by the Canadian Agricultural Partnership, a federal‐provincial‐territorial initiative, under the Canada‐BC Agri‐Innovation Program. The program is delivered by the Investment Agriculture Foundation of BC. Opinions expressed in this document are those of the authors and not necessarily those of the Governments of Canada and British Columbia or the Investment Agriculture Foundation of BC. The Governments of Canada and British Columbia, the Investment Agriculture Foundation of BC, and their directors, agents, employees, or contractors will not be liable for any claims, damages, or losses of any kind whatsoever arising out of the use of, or reliance upon, this information.

Li C, Sutherland D, Salehi A, Richter A, Lin D, Aninta SI, et al. Mining the UniProtKB/Swiss‐Prot database for antimicrobial peptides. Protein Science. 2025;34(4):e70083. 10.1002/pro.70083

Review Editor: Nir Ben‐Tal

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are openly available via Zenodo at https://doi.org/10.5281/zenodo.8133088 (Li & Birol 2023).

REFERENCES

  1. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Analyzing protein structure and function. Molecular biology of the cell. 4th ed. New York: Garland Science; 2002.
  2. Amaral AC, Silva ON, Mundim NCCR, de Carvalho MJA, Migliolo L, Leite JRSA, et al. Predicting antimicrobial peptides from eukaryotic genomes: in silico strategies to develop antibiotics. Peptides. 2012;37(2):301–308. 10.1016/j.peptides.2012.07.021
  3. Antimicrobial Resistance Collaborators. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet. 2022;399(10325):629–655. 10.1016/S0140-6736(21)02724-0
  4. Apata DF. Antibiotic resistance in poultry. Int J Poult Sci. 2009;8(4):404–408. 10.3923/ijps.2009.404.408
  5. Bals R. Epithelial antimicrobial peptides in host defense against infection. Respir Res. 2000;1:5. 10.1186/rr25
  6. Beckloff N, Diamond G. Computational analysis suggests beta‐defensins are processed to mature peptides by signal peptidase. Protein Pept Lett. 2008;15(5):536–540. 10.2174/092986608784567618
  7. Boman HG. Antibacterial peptides: basic facts and emerging concepts. J Intern Med. 2003;254(3):197–215. 10.1046/j.1365-2796.2003.01228.x
  8. Burdukiewicz M, Sidorczuk K, Rafacz D, Pietluch F, Chilimoniuk J, Rödiger S, et al. Proteomic screening for prediction and design of antimicrobial peptides with AmpGram. Int J Mol Sci. 2020;21(12):4310. 10.3390/ijms21124310
  9. Cândido ES, Cardoso MH, Chan LY, Torres MDT, Oshiro KGN, Porto WF, et al. Short cationic peptide derived from archaea with dual antibacterial properties and anti‐infective potential. ACS Infect Dis. 2019;5(7):1081–1086. 10.1021/acsinfecdis.9b00073
  10. Clinical and Laboratory Standards Institute. Methods for dilution antimicrobial susceptibility tests for bacteria that grow aerobically: approved standard. Wayne, PA: Clinical and Laboratory Standards Institute; 2015.
  11. De Lucca AJ, Walsh TJ. Antifungal peptides: novel therapeutic compounds against emerging pathogens. Antimicrob Agents Chemother. 1999;43(1):1–11. 10.1128/AAC.43.1.1
  12. dos Santos SR, Miranda A, da Silva Junior PI. Ovipin: a new antimicrobial peptide from chicken eggs Gallus gallus. bioRxiv. 2022. 10.1101/2021.09.28.462162
  13. Duckert P, Brunak S, Blom N. Prediction of proprotein convertase cleavage sites. Protein Eng Des Sel. 2004;17(1):107–112. 10.1093/protein/gzh013
  14. Fleitas O, Franco OL. Induced bacterial cross‐resistance toward host antimicrobial peptides: a worrying phenomenon. Front Microbiol. 2016;7:381. 10.3389/fmicb.2016.00381
  15. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput. 2000;12(10):2451–2471. 10.1162/089976600300015015
  16. Goraya J, Knoop FC, Conlon JM. Ranatuerins: antimicrobial peptides isolated from the skin of the American bullfrog, Rana catesbeiana. Biochem Biophys Res Commun. 1998;250(3):589–592. 10.1006/bbrc.1998.9362
  17. Helbing CC, Hammond SA, Jackman SH, Houston S, Warren RL, Cameron CE, et al. Antimicrobial peptides from Rana [Lithobates] catesbeiana: gene structure and bioinformatic identification of novel forms from tadpoles. Sci Rep. 2019;9:1529. 10.1038/s41598-018-38442-1
  18. Hochreiter S, Schmidhuber J. Long short‐term memory. Neural Comput. 1997;9(8):1735–1780. 10.1162/neco.1997.9.8.1735
  19. Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, Simmerling C. Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins. 2006;65(3):712–725. 10.1002/prot.21123
  20. Horváti K, Bacsa B, Mlinkó T, Szabó N, Hudecz F, Zsila F, et al. Comparative analysis of internalisation, haemolytic, cytotoxic and antibacterial effect of membrane‐active cationic peptides: aspects of experimental setup. Amino Acids. 2017;49(6):1053–1067. 10.1007/s00726-017-2402-9
  21. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–589. 10.1038/s41586-021-03819-2
  22. Khurshid Z, Najeeb S, Mali M, Moin SF, Raza SQ, Zohaib S, et al. Histatin peptides: pharmacological functions and their applications in dentistry. Saudi Pharm J. 2017;25(1):25–31. 10.1016/j.jsps.2016.04.027
  23. Kintses B, Jangir PK, Fekete G, Számel M, Méhi O, Spohn R, et al. Chemical‐genetic profiling reveals limited cross‐resistance between antimicrobial peptides with different modes of action. Nat Commun. 2019;10(1):5731. 10.1038/s41467-019-13618-z
  24. Klotman ME, Chang TL. Defensins in innate antiviral immunity. Nat Rev Immunol. 2006;6(6):447–456. 10.1038/nri1860
  25. Koo HB, Seo J. Antimicrobial peptides under clinical investigation. Pept Sci. 2019;111(5):24122. 10.1002/pep2.24122
  26. Laxminarayan R, Duse A, Wattal C, Zaidi AKM, Wertheim HFL, Sumpradit N, et al. Antibiotic resistance—the need for global solutions. Lancet Infect Dis. 2013;13(12):1057–1098. 10.1016/S1473-3099(13)70318-9
  27. Li C, Birol I. Candidate antimicrobial peptide sequences mined from the UniProtKB/Swiss‐Prot database using AMPlify. Zenodo. 2023. 10.5281/zenodo.8133088
  28. Li Y, Huang C, Ding L, Li Z, Pan Y, Gao X. Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods. 2019;166:4–21. 10.1016/j.ymeth.2019.04.008
  29. Li C, Sutherland D, Hammond SA, Yang C, Taho F, Bergman L, et al. AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BMC Genomics. 2022;23:77. 10.1186/s12864-022-08310-4
  30. Li C, Warren RL, Birol I. Models and data of AMPlify: a deep learning tool for antimicrobial peptide prediction. BMC Res Notes. 2023;16(1):11. 10.1186/s13104-023-06279-1
  31. Lin D, Sutherland D, Aninta SI, Louie N, Nip KM, Li C, et al. Mining amphibian and insect transcriptomes for antimicrobial peptide sequences with rAMPage. Antibiotics. 2022;11(7):952. 10.3390/antibiotics11070952
  32. Meher PK, Sahu TK, Saini V, Rao AR. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico‐chemical and structural features into Chou's general PseAAC. Sci Rep. 2017;7:42362. 10.1038/srep42362
  33. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. 2022;19(6):679–682. 10.1038/s41592-022-01488-1
  34. Nguyen LT, Haney EF, Vogel HJ. The expanding scope of antimicrobial peptide structures and their modes of action. Trends Biotechnol. 2011;29(9):464–472. 10.1016/j.tibtech.2011.05.001
  35. Novković M, Simunić J, Bojović V, Tossi A, Juretić D. DADP: the database of anuran defense peptides. Bioinformatics. 2012;28(10):1406–1407. 10.1093/bioinformatics/bts141
  36. Papareddy P, Rydengård V, Pasupuleti M, Walse B, Mörgelin M, Chalupka A, et al. Proteolysis of human thrombin generates novel host defense peptides. PLoS Pathog. 2010;6(4):e1000857. 10.1371/journal.ppat.1000857
  37. Pérez de la Lastra JM, Garrido‐Orduña C, Borges AA, Jiménez‐Arias D, García‐Machado FJ, Hernández M, et al. Bioinformatics discovery of vertebrate cathelicidins from the mining of available genomes. In: Bobbarala V, editor. Drug discovery – concepts to market. London: IntechOpen; 2018.
  38. Prichula J, Primon‐Barros M, Luz RCZ, Castro ÍMS, Paim TGS, Tavares M, et al. Genome mining for antimicrobial compounds in wild marine animals‐associated Enterococci. Mar Drugs. 2021;19(6):328. 10.3390/md19060328
  39. Reardon S. Antibiotic resistance sweeping developing world. Nature. 2014;509(7499):141–142. 10.1038/509141a
  40. Richter A, Sutherland D, Ebrahimikondori H, Babcock A, Louie N, Li C, et al. Associating biological activity and predicted structure of antimicrobial peptides from amphibians and insects. Antibiotics. 2022;11(12):1710. 10.3390/antibiotics11121710
  41. Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process. 1997;45(11):2673–2681. 10.1109/78.650093
  42. Sharma R, Shrivastava S, Kumar Singh S, Kumar A, Saxena S, Kumar Singh R. Deep‐ABPpred: identifying antibacterial peptides in protein sequences using bidirectional LSTM with word2vec. Brief Bioinform. 2021;22(5):bbab065. 10.1093/bib/bbab065
  43. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35(11):1026–1028. 10.1038/nbt.3988
  44. Taheri‐Araghi S. Synergistic action of antimicrobial peptides and antibiotics: current understanding and future directions. Front Microbiol. 2024;15:1390765. 10.3389/fmicb.2024.1390765
  45. Terreni M, Taccani M, Pregnolato M. New antibiotics for multidrug‐resistant bacterial strains: latest research developments and future perspectives. Molecules. 2021;26(9):2671. 10.3390/molecules26092671
  46. The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47(D1):D506–D515. 10.1093/nar/gky1049
  47. Tomazou M, Oulas A, Anagnostopoulos AK, Tsangaris GT, Spyrou GM. In silico identification of antimicrobial peptides in the proteomes of goat and sheep milk and feta cheese. Proteomes. 2019;7(4):32. 10.3390/proteomes7040032
  48. van der Does AM, Hiemstra PS, Mookherjee N. Antimicrobial host defence peptides: immunomodulatory functions and translational prospects. In: Matsuzaki K, editor. Antimicrobial peptides. Advances in experimental medicine and biology. Singapore: Springer; 2019. p. 149–171. [DOI] [PubMed] [Google Scholar]
  49. van Dijk A, Veldhuizen EJA, van Asten AJAM, Haagsman HP. CMAP27, a novel chicken cathelicidin‐like antimicrobial protein. Vet Immunol Immunopathol. 2005;106(3–4):321–327. 10.1016/j.vetimm.2005.03.003 [DOI] [PubMed] [Google Scholar]
  50. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Advances in neural information processing systems; Long Beach: NeurIPS; 2017. [Google Scholar]
  51. Veltri D, Kamath U, Shehu A. Deep learning improves antimicrobial peptide recognition. Bioinformatics. 2018;34(16):2740–2747. 10.1093/bioinformatics/bty179 [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Waghu FH, Barai RS, Idicula‐Thomas S. Leveraging family‐specific signatures for AMP discovery and high‐throughput annotation. Sci Rep. 2016;6(1):24684. 10.1038/srep24684 [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Waghu FH, Gopi L, Barai RS, Ramteke P, Nizami B, Idicula‐Thomas S. CAMP: collection of sequences and structures of antimicrobial peptides. Nucleic Acids Res. 2014;42(D1):D1154–D1158. 10.1093/nar/gkt1157 [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Wang C, Garlick S, Zloh M. Deep learning for novel antimicrobial peptide design. Biomolecules. 2021;11(3):471. 10.3390/biom11030471 [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Wang G, Li X, Wang Z. APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res. 2016;44(D1):D1087–D1093. 10.1093/nar/gkv1278 [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Wiegand I, Hilpert K, Hancock REW. Agar and broth dilution methods to determine the minimal inhibitory concentration (MIC) of antimicrobial substances. Nat Protoc. 2008;3(2):163–175. 10.1038/nprot.2007.521 [DOI] [PubMed] [Google Scholar]
  57. Wu Q, Ke H, Li D, Wang Q, Fang J, Zhou J. Recent Progress in machine learning‐based prediction of peptide activity for drug discovery. Curr Top Med Chem. 2019;19(1):4–16. 10.2174/1568026619666190122151634 [DOI] [PubMed] [Google Scholar]
  58. Xiao Y, Cai Y, Bommineni YR, Fernando SC, Prakash O, Gilliland SE, et al. Identification and functional characterization of three chicken cathelicidins with potent antimicrobial activity. J Biol Chem. 2006;281(5):2858–2867. 10.1074/jbc.M507180200 [DOI] [PubMed] [Google Scholar]
  59. Xiao X, Wang P, Lin W‐Z, Jia J‐H, Chou K‐C. iAMP‐2L: a two‐level multi‐label classifier for identifying antimicrobial peptides and their functional types. Anal Biochem. 2013;436(2):168–177. 10.1016/j.ab.2013.01.019 [DOI] [PubMed] [Google Scholar]
  60. Yan J, Bhadra P, Li A, Sethiya P, Qin L, Tai HK, et al. Deep‐AmPEP30: improve short antimicrobial peptides prediction with deep learning. Mol Ther Nucleic Acids. 2020;20:882–894. 10.1016/j.omtn.2020.05.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E. Hierarchical attention networks for document classification. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego: Association for Computational Linguistics; 2016. p. 1480–1489. [Google Scholar]
  62. Youmans M, Spainhour C, Qiu P. Long short‐term memory recurrent neural networks for antibacterial peptide identification. 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Kansas City: IEEE; 2017. p. 498–502. [Google Scholar]
  63. Zhang L‐J, Gallo RL. Antimicrobial peptides. Curr Biol. 2016;26(1):R14–R19. 10.1016/j.cub.2015.11.017 [DOI] [PubMed] [Google Scholar]
  64. Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins. 2004;57(4):702–710. 10.1002/prot.20264 [DOI] [PubMed] [Google Scholar]
  65. Zhang Y, Skolnick J. TM‐align: a protein structure alignment algorithm based on the TM‐score. Nucleic Acids Res. 2005;33(7):2302–2309. 10.1093/nar/gki524 [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Data S1. Supporting Information.

PRO-34-e70083-s001.pdf (699.8KB, pdf)

Data Availability Statement

The data that support the findings of this study are openly available via Zenodo at https://doi.org/10.5281/zenodo.8133088 (Li & Birol 2023).


Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society
