Author manuscript; available in PMC: 2026 Jan 27.
Published in final edited form as: Methods Mol Biol. 2024;2714:329–352. doi: 10.1007/978-1-0716-3441-7_18

Accelerating the Discovery and Design of Antimicrobial Peptides with Artificial Intelligence

Mariana d C Aguilera-Puga, Natalia L Cancelarich, Mariela M Marani, Cesar de la Fuente-Nunez, Fabien Plisson
PMCID: PMC12834174  NIHMSID: NIHMS2131926  PMID: 37676607

Abstract

Peptides modulate many processes of human physiology targeting ion channels, protein receptors, or enzymes. They represent valuable starting points for the development of new biologics against communicable and non-communicable disorders. However, turning native peptide ligands into druggable materials requires high selectivity and efficacy, predictable metabolism, and good safety profiles. Machine learning models have gradually emerged as cost-effective and time-saving solutions to predict and generate new proteins with optimal properties. In this chapter, we will discuss the evolution and applications of predictive modeling and generative modeling to discover and design safe and effective antimicrobial peptides. We will also present their current limitations and suggest future research directions, applicable to peptide drug design campaigns.

Keywords: Antimicrobial peptides, Machine learning, Predictive modeling, Generative modeling, Representation learning, Algorithmic bias, Antimicrobial resistance, Antibiotics

1. Introduction

Most living organisms produce antimicrobial peptides (AMPs) as defense mechanisms in response to infections. These peptides generally protect their host from pathogens by exhibiting broad-spectrum microbicidal activities against bacteria, fungi, eukaryotic parasites, and enveloped viruses [1]. In addition, AMPs can stimulate the host’s immune responses and display anti-inflammatory, wound-healing, and anti-biofilm properties [2–8]. Thus, these peptides are also referred to as host defense peptides or HDPs [9, 10]. Moreover, the evolution of these peptides should also be considered as a mechanism that structures microbial communities [11, 12]. AMPs have received special attention as potential alternatives to conventional antibiotics against drug-resistant infections because they offer several benefits, including potent efficacy, good safety profiles and tolerability, high selectivity, standardized synthesis protocols, and predictable metabolism. They also present opportunities such as the discovery of new peptides, the development of appropriate formulations, and the synthesis of multifunctional peptides and conjugates. Unlike conventional antibiotics, AMPs do not readily select for bacterial resistance mechanisms, which is thought to be due to their non-specific mechanisms of action on the bacterial membrane [13–15]. Designing AMPs less susceptible to evolutionary resistance mechanisms without compromising the host’s innate immune response is an exciting prospect [16, 17].

Research strategies aimed at searching for novel AMPs/HDPs include isolating peptides from biological organisms or tissues across the microbial, vegetal, and animal kingdoms [18]. More recently, computational methods have been developed that identify a myriad of small encrypted domains with the desired characteristics from genomic and proteomic sequences [19, 20] and from larger proteins [21–23]. These advances are beyond the scope of this review. Today, the Antimicrobial Peptide Database lists over 3500 natural peptides from six life kingdoms [18], and the Giant Repository of Antimicrobial Peptides Activities, or GRAMPA, regroups roughly 6200 natural and synthetic peptides [24]. However, very few of these molecules have reached the market. Most AMPs currently in clinical trials are modified from natural peptides, and the majority are administered topically. Academic laboratories and biotechnological and pharmaceutical companies have devoted substantial effort to overcoming limitations such as short half-life and fast elimination (proteolytic degradation, low stability), along with toxicity toward host cells at therapeutic doses, without compromising the antimicrobial potential of the peptides [25–29]. Identifying peptides that limit the development of bacterial resistance while remaining cost-effective in terms of pharmaceutical development has also been a focus.

Computational models have gradually emerged as cost-effective and time-saving solutions to predict and generate new proteins with optimal properties. In silico predictive modeling accelerates the discovery of new peptides with therapeutic potential by learning the relationships between the protein sequences or structures and their functions (e.g., antimicrobial activity, thermal stability, and aggregation). Generative models create peptides that share meaningful representations (e.g., amphipathic character, conserved domain) with the original sequences or scaffolds. The synergy between predictive and generative models provides an informative way to explore the vast peptide chemical space effectively: there are 20^L possible sequences for a peptide of length L made of canonical residues. Several reviews have summarized the progression of AMP models from early probabilistic models based on evolutionary information to those leveraging traditional machine learning (ML) and deep learning (DL) algorithms [30–33].
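To give a sense of scale for the 20^L figure quoted above, the short sketch below simply evaluates it for a few peptide lengths. The function names are illustrative and not taken from any cited tool; only the 20-letter canonical alphabet is assumed.

```python
# Size of the canonical peptide sequence space: 20 ** L sequences of length L.
ALPHABET_SIZE = 20  # the 20 canonical amino acids


def sequence_space(length: int) -> int:
    """Number of distinct peptides of exactly `length` canonical residues."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return ALPHABET_SIZE ** length


def cumulative_space(max_length: int) -> int:
    """Number of distinct peptides of length 1 up to `max_length`."""
    return sum(sequence_space(n) for n in range(1, max_length + 1))


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        print(f"L={n:2d}: {sequence_space(n):.3e} sequences")
```

Even for short peptides the count grows far beyond what can be synthesized and tested, which is the practical argument for guiding the search with models.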

In this chapter, we will describe how artificial intelligence (AI), i.e., ML and DL algorithms, can be used to accelerate the discovery and development of AMPs. First, we will introduce the different AI models that have been developed to predict and identify novel AMPs. Second, we will discuss their evolution and current limitations. Third, we will describe the development of generative AMP models. Finally, we will address current multi-objective strategies, combining predictive and generative models to discover and design safe and effective AMPs.

2. Predicting Peptide Antimicrobial Activity

Predictive ML models typically learn the relationships between the peptide sequences and their biological functions, such as their antimicrobial activity, from medium to large datasets. The power of such models relies on several factors, such as their input datasets, their independent features (descriptors), and the type of algorithms used (i.e., classical ML vs. neural network architectures), as illustrated in Fig. 1. Choosing the right input dataset(s) is the first important step toward model development. Most input datasets come either from in-house experiments or from publicly available databases such as APD3 [18] or DBAASP [34]. Over the years, numerous AMP databases have been created, adding new information (e.g., sequences, taxonomy, structure predictions, biological measurements in different media); however, only a handful provide experimentally validated 3D structures [35, 36]. The selection of descriptors with independent features representative of peptide sequence and structural diversity greatly influences the performance of ML models, an active field of study known as representation learning [37, 38]. These descriptors can be calculated from the peptide sequence (e.g., peptide net charge, hydrophobic moment) or determined empirically (e.g., solubility, chromatography retention time), and they can be derived both for single amino acids and for entire peptides [39]. Finally, AMP predictive models have gradually evolved from evolutionary and genetic algorithms (i.e., hidden Markov and linguistic models) to classical ML algorithms (i.e., random forest, support vector classifier) and DL architectures such as recurrent neural networks (RNNs) [30–33]. Selecting the optimal computational method is critical to significantly accelerate the development of new peptide drug candidates.
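As an illustration of sequence-derived descriptors such as net charge and hydrophobic moment, here is a minimal sketch. It makes several assumptions that are not in the chapter: the Eisenberg consensus hydrophobicity scale, a crude charge count at neutral pH (K/R = +1, D/E = -1, histidine and termini ignored), and an ideal α-helix with 100° per residue. The magainin 2 sequence is used only as a familiar example input.

```python
import math

# Eisenberg consensus hydrophobicity scale (an assumption; other scales exist).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def net_charge(seq: str) -> int:
    """Crude net charge at neutral pH: +1 per K/R, -1 per D/E.

    Histidine and the peptide termini are deliberately ignored.
    """
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment, assuming a fixed rotation per residue.

    100 degrees per residue corresponds to an ideal alpha-helix.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    angle = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * angle) for i, aa in enumerate(seq))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * angle) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


if __name__ == "__main__":
    magainin2 = "GIGKFLHSAKKFGKAFVGEIMNS"
    print("net charge:", net_charge(magainin2))
    print("hydrophobic moment:", round(hydrophobic_moment(magainin2), 3))
```

Descriptors of this kind yield one number per peptide regardless of its length, which is what makes them convenient inputs for classical ML algorithms.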

Fig. 1.

Computational AMP discovery and design workflow. This schematic diagram illustrates the workflow for using AI to accelerate antibiotic discovery. In the first phase (Experimental Information), large amounts of data are collected on potential antimicrobial peptides, such as their primary sequences, secondary structures, and biological activities. In the second phase (Molecular Representation), the data are converted into a digital format that can be read by ML algorithms. For this purpose, relevant features are extracted from the data and presented in a standardized format. In the third phase (Machine Learning Models), different algorithms (ML or DL) are used to build models, which are then validated using a range of techniques to ensure their accuracy and reliability. In the final phase (Predictions and Generations), the best candidates identified by the AI models are selected for further testing and validation. (The figure was designed taking inspiration from Figure 1 of [33])

2.1. Evolution of AMP Predictors

The earliest predictive models for antimicrobial activity can be traced back to the early 2000s when researchers started to use ML algorithms, such as artificial neural networks (ANN) and support vector machines (SVM), to classify antimicrobial activity—see Table 1. Such models are predominantly built as binary classification problems [i.e., whether a peptide sequence is predicted to be antimicrobial (AMP) or not (non-AMP)]. However, these models were trained on relatively small peptide datasets, which limited their accuracy and overall generalization. One of the earliest and most widely used AMP predictors is the AntiBP server, which used an SVM model trained on 1060 peptides from the Antimicrobial Peptide Database (APD) [18] to predict their activity against 10 bacterial species [41]. These models are sometimes referred to as Quantitative Structure–Activity Relationship (QSAR) studies, a term mostly associated with small molecule drug discovery since the 1960s [64].

Table 1.

List of AMP predictors (ML algorithms)

Year | Name | Algorithm(s) | Training size | Feature(s) | References
2007 | AntiBP | SVM, ANN, NNA | 872 | AAC, diAAC | [40]
2010 | AntiBP2 | SVM, QM, ANN | 1998 | AAC, PseAAC | [41]
2011 | Biosino | NNA | 12,766 | AAC, PseAAC, PCP | [42]
2012 | ANFIS | ANFIS | 231 | AAC, diAAC, PCP | [43]
2012 | AMPA | SVM | 100 | AI | [44]
2012 | CS-AMPPred | SVM poly. | 620 | PCP, AAC | [45]
2012 | ClassAMP | RF, SVM | 1362 | AAC, PseAAC, PCP | [46]
2013 | PeptideLocator | BRNN | 4505 | SF | [47]
2013 | iAMP-2L | ML-FKNN | 3175 | PseAAC | [48]
2014 | DBAASP | SVM | 1153 | AAC, PseAAC, PCP | [49]
2015 | ADAM | SVM, HMM | 7000 | SF | [50]
2015 | Ng et al. | SVM-LZ | 12,766 | PSSM | [51]
2016 | CAMPR3 | RF, SVM | 7021 | PCP | [52]
2016 | MLAMP | RF-ML | 3284 | AAC, PCP | [53]
2017 | iAMPPred | SVM | 8913 | PseAAC, NAAC, PCP, SF | [54]
2018 | AmPEP | RF | 170,059 | CTD global | [55]
2018 | Joker | “linguistic” | 303 | Patterns | [56]
2019 | dbAMP | RF | 19,424 | AAC, PCP | [57]
2019 | AMAP | SVM, XGBoost | 11,267 | AAC | [58]
2020 | amPEPpy 1.0 | RF | 170,059 | PseAAC, PCP | [59]
2020 | IAMPE | KNN, SVM, RF, XGB | 5739 | NMR-based clustering, PCP | [60]
2020 | AmpGram | RF | 4926 | Encoded N-Grams | [61]
2020 | Meta-iAVP | RF, KNN, SVM | 1088 | AAC, diAAC, GDC, PseAAC | [62]
2021 | Ensemble-AMPPred | Ensemble learning | 1840 | PseAAC, CTD, PCP | [63]

Algorithms: ANFIS adaptive neuro-fuzzy inference system, ANN artificial neural network, BRNN bidirectional recurrent neural network, HMM hidden Markov model, (ML-F)KNN (multi-level fuzzy) K-nearest neighbor, NNA nearest neighbor algorithm, RF random forest, SVM support vector machine, LZ Lempel-Ziv complexity, QM quantitative matrix, XGB extreme gradient boosting. Features: AA amino acid, (Am)PseAAC (amphiphilic) pseudo amino acid composition, (N)AAC (normalized) amino acid composition, AI antimicrobial indices, CTD composition-transition-distribution, diAAC dipeptide composition, GDC g-gap dipeptide composition, N-Grams amino-acid motifs, PCP physicochemical properties, PSSM position-specific scoring matrix, SF structural features

Because databases contain few experimentally determined AMP structures, most ML models predict antimicrobial activity solely from sequence information and are thus referred to as sequence-first models [65]. These sequence-first approaches rely on the recognition of different sequence motifs and physicochemical properties of existing AMPs to predict new peptides. Sequence-first models are based on classical ML algorithms, which require numerical features encoding different peptide characteristics (e.g., amino acid composition, k-mers, or sequence-derived physicochemical properties) in a vector of finite length [66]. Sequence-first AMP predictors based on classical ML algorithms include DBAASP [49], ADAM [50], and iAMPpred [54] (see Table 1). These predictive models integrate multiple features of AMPs in numerical form, including sequence, structure, and physicochemical properties. The most widely used algorithms are SVM, Decision Tree (CART), Random Forest (RF), and Gradient Boosting (GB). Random Forest is commonly used for predictive modeling tasks due to its ability to handle high-dimensional data and capture complex interactions among features [67]. It works by generating an ensemble of decision trees, where each tree is trained on a random subset of the data and of the features. The final prediction is made by aggregating the predictions of all the trees. In addition to its ability to handle high-dimensional data, Random Forest is a powerful algorithm for dealing with imbalanced datasets, which is often the case in AMP prediction [55]. Thus, it is not surprising that roughly one-third of the 24 models listed in Table 1 used this algorithm.
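The fixed-length feature vectors required by classical ML algorithms can be sketched as follows, using amino acid composition (AAC) and dipeptide composition (diAAC), two of the feature types listed in Table 1. This is a minimal illustration with hypothetical function names, not the featurization of any specific predictor; residues outside the 20 canonical letters are simply not counted.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aac(seq: str) -> list[float]:
    """Amino acid composition: fraction of each canonical residue (20 values)."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def di_aac(seq: str) -> list[float]:
    """Dipeptide composition: fraction of each overlapping residue pair (400 values)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)  # avoid dividing by zero for single residues
    return [pairs[dp] / total for dp in DIPEPTIDES]


def featurize(seq: str) -> list[float]:
    """Concatenate AAC and diAAC into one vector of length 420."""
    return aac(seq) + di_aac(seq)


if __name__ == "__main__":
    vector = featurize("GIGKFLHSAKKFGKAFVGEIMNS")
    print(len(vector))  # 420, whatever the peptide length
```

Because every peptide maps to a vector of the same length, such vectors can be passed directly to an SVM or Random Forest implementation.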

However, classical ML techniques such as SVMs struggle to predict complex interactions between AMPs and bacterial membranes, as well as potential synergistic interactions when different peptides are combined. Recent sequence-first AMP predictors using Deep Learning (DL) architectures, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs), have been explored to improve AMP prediction using large datasets and more complex model structures. These architectures aim to capture the nonlinear interactions between peptide sequence, structure, and antimicrobial activity to increase the performance of AMP predictors. Unlike classical ML models, they predict the antimicrobial activity of AMPs directly from their sequence, without intermediate features. DL approaches are particularly promising because they can learn complex features and relationships within peptide sequences without the need for manual feature engineering. Peptide sequences are converted into numerical vectors using encoding techniques such as one-hot encoding, where each amino acid is represented by a binary vector of length 20. The size of the encoding for an entire peptide therefore varies with the peptide length, so an AMP dataset is represented by inputs of variable lengths. Because some DL algorithms require inputs of fixed length, the modeler must then pad the existing vectors to the length of the longest one. AMP predictors integrating DL architectures include AMP Scanner [68], DeepAMP [69], and Pep-CNN [70] (see Table 2). An exhaustive list of predictive models applied to AMPs and other types of peptides can be consulted at https://biogenies.info/peptide-prediction-list/ [78].
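The one-hot encoding and padding step described above can be sketched in a few lines. This is a minimal, framework-free illustration; the residue ordering and the choice of all-zero rows for padding are assumptions, and real pipelines often use a dedicated padding token or a mask instead.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> list[list[int]]:
    """Encode a peptide as one binary vector of length 20 per residue."""
    rows = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1
        rows.append(row)
    return rows


def pad_batch(seqs: list[str]) -> list[list[list[int]]]:
    """One-hot encode each peptide and pad with zero rows to the longest one."""
    max_len = max(len(s) for s in seqs)
    zero_row = [0] * len(AMINO_ACIDS)
    return [
        one_hot(s) + [zero_row[:] for _ in range(max_len - len(s))]
        for s in seqs
    ]


if __name__ == "__main__":
    batch = pad_batch(["KR", "GIGK"])
    # Every encoded peptide now has shape (4, 20), the length of the longest input.
    print([(len(m), len(m[0])) for m in batch])
```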

Table 2.

List of AMP predictors (DL algorithms)

Year | Name | Algorithm(s) | Training size | Feature(s) | References
2018 | PepCVAE | VAE | 1.6 M | Learned representation with characters | [71]
2018 | AmPEP | CNN | 1400 | N-Grams, Composition | [55]
2018 | AMP Scanner | BiLSTM-RNN | 3818 | PSSM, Composition | [68]
2018 | Nagarajan et al. | LSTM | 1512 | Character sequence | [72]
2020 | PepGAN | GAN | 22,231 | Character sequence | [73]
2021 | iAMP-CA2L | CNN-BiLSTM-SVM | | Composition, PCP | [74]
2021 | AMPlify | CNN | 18,860 | Genomic and biochemical features | [75]
2021 | AI4AMP | BiLSTM | 706 | PCP | [37]
2021 | sAMP-PFPDeep | DNN | | PCP | [76]
2022 | LMPred | CNN | 3556 | PCP | [77]

Algorithms: (Bi)LSTM (bidirectional) long short-term memory, CNN convolutional neural networks, DNN deep neural networks, GAN generative adversarial networks, NLM natural language model, RNN recurrent neural networks, VAE variational autoencoder. Features: N-Grams amino-acid motifs, PCP physicochemical properties, PSSM position-specific scoring matrix

2.2. Limitations

The use of ML can help reduce the cost and time associated with peptide design and discovery. However, biases in the input dataset(s) used to train predictive models may affect their performance and our understanding of how and what they learn. Biological datasets are riddled with biases, and systematic auditing and debiasing before building predictive ML models are essential [79]. Current research efforts to debias AMP datasets during data collection include the early detection of taxonomic bias and structural bias.

Taxonomic bias refers to the high prevalence of homologous peptide sequences that results from sampling data from a limited set of source organisms. The study conducted by Rádai and co-workers (2021) illustrated taxonomic bias in predicting AMPs of invertebrate origins, mainly arthropods [80]. The authors collected a dataset of 4575 invertebrate peptides and compared the performances of 10 AMP predictors: ADAM [50], AMPscanner [68], AmpGram [61], CAMPR3 [52], ClassAMP [46], CS-AMPpred [45], iAMP-2L [48], IAMPE [60], iAMPpred [54], and StM [81]. The results indicated that performance varied depending on the taxon, with some tools performing better for certain taxa. Overall, ADAM showed the best performance across all 19 taxa. In addition, the predictive accuracy was found to be very low for many of these tools, suggesting room for improvement, perhaps through the development of generalized ML models. The authors also identified another bias, namely algorithmic bias, where the hidden Markov model provided the best results in most taxa. Two solutions have been proposed to mitigate taxonomic bias across model predictions: (1) reducing the number of homologous sequences in input dataset(s) with tools like CD-HIT [82] and (2) combining predictions from multiple AMP identifying tools [80].
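Both mitigation strategies can be illustrated with a small sketch. The first function mimics the general idea of greedy redundancy reduction (longest sequences first, keep a sequence only if it is not too similar to an already kept representative); it is not CD-HIT itself, and it uses Python's `difflib` similarity ratio as a stand-in for alignment-based sequence identity. The second function is a plain majority vote over hypothetical tool outputs. The threshold value and all names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_representatives(seqs: list[str], threshold: float = 0.9) -> list[str]:
    """Keep one representative per group of near-identical sequences.

    Sequences are visited from longest to shortest; a sequence is kept only
    if its similarity to every kept representative is below `threshold`.
    """
    representatives: list[str] = []
    for seq in sorted(set(seqs), key=lambda s: (-len(s), s)):
        if all(similarity(seq, rep) < threshold for rep in representatives):
            representatives.append(seq)
    return representatives


def majority_vote(predictions: dict[str, bool]) -> bool:
    """Combine AMP/non-AMP calls from several tools; ties count as non-AMP."""
    votes_for_amp = sum(predictions.values())
    return votes_for_amp * 2 > len(predictions)


if __name__ == "__main__":
    peptides = [
        "GIGKFLHSAKKFGKAFVGEIMNS",
        "GIGKFLHSAKKFGKAFVGEIMNK",  # one substitution away from the first
        "KKKKKK",
    ]
    print(greedy_representatives(peptides))
    # Hypothetical tool outputs for a single peptide.
    print(majority_vote({"tool_a": True, "tool_b": True, "tool_c": False}))
```

For real datasets, a dedicated clustering tool with a proper identity definition should replace the similarity function used here.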

Structural bias refers to the high prevalence of a specific tridimensional structure in datasets. Most predictive ML models rely exclusively on sequence data, and the 3D structures of peptides are largely ignored. If there is a biased distribution of amino acids, dipeptides, or oligopeptides, which are often linked to the physicochemical properties used for model training, then particular secondary structures will be favored. As a result, predictive ML models built from sequences favoring certain folds would predict these folds more accurately. With that concept in mind, Aldas-Bulos and Plisson reported a trustworthy approach to assess the many folds of sizable sequence sets [83]. The authors first illustrated its use for fold discovery and mapped the structural space of GRAMPA, a library of over 5900 AMPs. Helices and random coils represented most of the folds (~83%); the rest included β-stranded and mixed structures. They notably showed that all datasets targeting specific strains such as Escherichia coli, Staphylococcus aureus, and Pseudomonas aeruginosa presented distributions skewed toward α-helices. Consequently, ML models trained on these distributions would predict the antimicrobial activity of α-helical sequences more accurately. A handful of predictive ML models have taken structural information into consideration. Some researchers have excluded peptide sequences based on structural limitations, so that the remaining peptides presumably fold into the same 3D structure(s) and the structural effects upon prediction are reduced. For example, Dean and co-workers have recently developed tree-based regression models capable of predicting the minimum inhibitory concentration (MIC) against the three strains E. coli, S. aureus, and P. aeruginosa for peptides folding uniquely into α-helices [84]. Currently, no methods have been developed to address structural bias. One may encourage researchers to synthesize and test peptides folding into non-helical structures for antimicrobial activity via combinatorial libraries, high-throughput screening, and rational design. The structural information of these active and inactive peptides would be added to existing predictive models in order to generalize over antimicrobial activity regardless of structural diversity. Alternatively, generative algorithms and de novo protein design provide potential solutions to “hallucinate” AMPs from underrepresented folds.

Moreover, a potential bias might exist in the methods used to measure antimicrobial activity, as the biological assays used may not accurately reflect the activity of an AMP in vivo or may use laboratory bacterial strains that are not representative of clinical pathogens of interest [85]. Indeed, in vitro assays are usually performed with bacterial strains that are easy to grow and obtain, whereas studies involving clinical isolates are lacking. Furthermore, experimental variations between laboratories make it difficult to pool biological measurements without introducing noise, which often leads to poorly performing regression models. The development of GRAMPA [24], the Giant Repository of AMP Activities, which compiles peptides from multiple publicly available databases together with their corresponding MIC values against different bacterial strains, provides an excellent opportunity to develop novel regression models. For example, Dean and co-workers have notably curated the original GRAMPA datasets to focus on α-helical AMPs [84].

Since AMPs are produced as part of the innate immune system, they interact with immune cells to eliminate pathogens and prevent infections. The assays used to measure the antimicrobial activity of AMPs do not take these complex interactions into account, so the efficacy of AMPs in vivo may additionally be influenced by complex host immune responses [86]. In addition, the use of positive and negative labels is common practice in prediction models for AMPs, but negative datasets may contain peptides with different activities, which blurs what each predictive model actually learns as antimicrobial activity [87]. To develop accurate ML models, high-quality datasets with representative peptide samples and appropriate features (descriptors) must be developed and used for training and validation, the keywords used for data collection should be reported [18, 78], and the information about MIC values should be consistent.

2.3. Perspectives

Predictive models have only recently been applied to identify the mechanisms of action of AMPs. In 2016, Lee and co-workers reported a seminal support-vector classifier to predict the membrane activity of AMPs rather than their antimicrobial function [88, 89]. Two years later, Brand and co-workers implemented unsupervised ML methods to classify membrane-active peptides from differential scanning calorimetry and circular dichroism experiments [90]. Despite being limited to α-helical AMPs, these studies revealed that physicochemical properties such as amphiphilicity and helical propensity could help predict the peptides’ activity on the membrane.

3. Generating Novel AMPs

Generative models can produce artificial peptides and proteins with desired properties or functions. Such sequences may resemble the original peptides or proteins because the algorithm aims to learn critical representations (e.g., an interactive domain). The design of novel AMPs using in silico generators began in the early 2000s with the artificial substitution of one or more amino acids across sequences, under selective conditions. These algorithms included random sequence generators, genetic algorithms, evolutionary algorithms, and the ant colony optimization algorithm [31]. In the last 5 years, deep neural network architectures have moved from image classification, object recognition, and language translation to support peptide design and protein engineering. The most common algorithms applied to AMPs include recurrent neural networks (RNNs), convolutional neural networks (CNNs), variational autoencoders (VAEs), and generative adversarial networks (GANs) [32, 33]. RNNs are popular for modeling sequences of data, such as sentences or protein sequences. These algorithms capture the long-term dependencies between the amino acids in the sequence, making them well suited for predicting the properties and functions of (mini-)proteins. Widely applied in computer vision, CNNs learn patterns by treating protein sequences as one-dimensional vectors or two-dimensional images. VAEs are neural networks that learn to map input data to a lower-dimensional latent space representation. They generate new protein sequences by sampling from the latent space distribution and then decoding the samples into protein sequences. Finally, GANs consist of two neural networks, a generator and a discriminator, which work together to generate new protein sequences with desired properties. The generator learns to generate new protein sequences by sampling from a latent space distribution, while the discriminator learns to distinguish between real and generated protein sequences. The generator is trained to produce protein sequences that can fool the discriminator into classifying them as real, while the discriminator is trained to identify the generated protein sequences.

3.1. Evolutionary-Based Generators

Conventional peptide design methods have had limited success in producing AMPs with high potency and low toxicity. Early approaches such as de novo design, linguistic models, and pattern insertion methods designed AMPs through a grammar model that treated the primary sequences as a “vocabulary” and the patterns of repetition of amino acid residues as “rules,” whereas genetic algorithms applied successive generations of mutation and deletion events to the target sequence to improve peptide performance and identify patterns associated with antimicrobial activity. This has led to the exploration of alternative approaches, including evolutionary-based generators. Evolutionary algorithms are a powerful optimization method inspired by natural selection: they evolve a molecule toward a desired biological performance through successive generations of mutation, selection, and recombination events. In the context of AMP design, evolutionary algorithms have been used to design new peptides with increased antibacterial activity, stability, and specificity [31, 91, 92]. One advantage of evolution-based generators is that they can search a much larger sequence space than conventional approaches. This is particularly advantageous when designing AMPs with complex biological functions such as broad-spectrum activity and/or high selectivity. Another advantage of evolution-based generators is that they can go beyond what is currently known in sequence space, leading to the identification of previously undiscovered motifs or combinations of motifs. Genetic algorithms can classify virtually any AMP sequence using fitness functions based on antimicrobial activity descriptors and structural information collected from databases [91, 92]. A study using a genetic algorithm [91] led to the design of guavanin 2, a synthetic peptide that displayed anti-infective efficacy in a preclinical mouse model, demonstrating that machines could be used to generate preclinical antimicrobial candidates.
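The mutation, selection, and recombination loop described above can be sketched as a minimal genetic algorithm. Everything here is a toy: the fitness function is hypothetical (it merely rewards a net charge near +4 and roughly half hydrophobic residues) and is not the fitness used in the cited studies, and the rates, population size, and starting sequence are arbitrary assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq: str) -> float:
    """Hypothetical fitness: best (0.0) at net charge +4 and 50% hydrophobic residues."""
    charge = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    hydrophobic = sum(seq.count(a) for a in "AILMFVW")
    return -abs(charge - 4) - abs(hydrophobic - len(seq) / 2)


def mutate(seq: str, rng: random.Random, rate: float = 0.1) -> str:
    """Replace each residue by a random one with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def recombine(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover between two equal-length parents (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(parent: str, generations: int = 30, pop_size: int = 40, seed: int = 0) -> str:
    """Evolve `parent` toward higher toy fitness; survivors are kept (elitism)."""
    rng = random.Random(seed)
    population = [parent] + [mutate(parent, rng, rate=0.3) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=toy_fitness, reverse=True)   # selection
        survivors = population[: pop_size // 4]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(recombine(a, b, rng), rng))  # recombination + mutation
        population = survivors + children
    return max(population, key=toy_fitness)


if __name__ == "__main__":
    start = "GIGKFLHSAKKFGKAFVGEIMNS"
    best = evolve(start)
    print(start, toy_fitness(start))
    print(best, toy_fitness(best))
```

In practice the toy fitness would be replaced by a trained activity or toxicity predictor, which is where predictive and generative components meet.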

3.2. Deep Learning Models

Several computational peptide sequence generators are based on deep learning algorithms that learn the underlying patterns and relationships between the amino acid sequences and their corresponding peptide/protein structures and biological functions. The algorithms include recurrent neural networks (RNNs), convolutional neural networks (CNNs), variational autoencoders (VAEs), and generative adversarial networks (GANs), as shown in Table 3.

Table 3.

List of deep learning AMP generators

Year | Name | Training size | Feature(s) | Algorithm(s) | References
2018 | Müller et al. | 29,142 | One-hot encoding | LSTM-RNN | [93]
2018 | Nagarajan et al. | 1512 | Character sequence | RNN | [72]
2018 | PepCVAE | 1.6 M | One-hot encoding | VAE | [71]
2020 | Caceres-Delpiano et al. | 2 M | Learned representation using structural and evolutionary data | VAE | [94]
2020 | Tucs et al. | 508,000 | Character sequence | GAN | [73]
2020 | AMPAGANv2 | 6238 | Learned representation using character sequence | GAN | [95]
2021 | Wang et al. | 2851 | One-hot encoding | LSTM-RNN | [96]
2021 | Das et al. | 41,566 | Learned representation using character sequence | VAE | [97]
2021 | PepGAN | 22,231 | Learned representation using character sequence | VAE | [73]
2021 | PandoraGAN | 1276 | Character sequence | GAN | [98]
2023 | HydrAMP | 11,131 | Learned representation using character sequence | cVAE | [99]

Algorithms: LSTM long short-term memory, RNN recurrent neural network, (c)VAE (conditional) variational autoencoder, GAN generative adversarial network

Due to their popularity in learning patterns from sequential data, RNNs have been applied to analyze peptide sequences and generate new peptides with similar characteristics. In 2018, Müller and co-workers used the long short-term memory (LSTM) network to create novel antibacterial peptides with low toxicity against Gram-negative and Gram-positive bacteria [93]. The authors used LSTM to overcome the vanishing gradient problem associated with training RNNs on long sequences. The authors proposed an RNN-based model called DeepEyes, which uses a generative approach to identify new AMPs. The model takes the target protein as input and generates a new peptide sequence that binds to the protein’s target site. They showed that their model can generate novel peptides with high binding affinity and antimicrobial activity [93].

Another study by García-Jacas and co-workers compared the performance of different ML models, including DL models, to identify AMPs [100]. They found that DL models, especially RNNs, performed better than other models. In 2019, Hamid and Friedberg proposed a new method that combines word embedding with deep RNNs to identify AMPs from amino acid sequences [101]. Their model, called DeepAMP, uses a bidirectional LSTM network to capture both forward and reverse dependencies in the amino acid sequence. They compared the performance of their model with several other classification methods and showed that DeepAMP performed best at identifying AMPs. The same year, Bolatchiev and co-workers created DeepAMP+, an RNN-based model used to engineer new AMPs that can reduce mortality in experimental sepsis. The model was trained on a large database of known AMPs and used to design new peptides with potent antimicrobial activity and low toxicity [102]. Finally, Sharma et al. (2022) developed Deep-ABPpred, a model that uses bidirectional LSTM and word2vec techniques to capture the complex dependencies and semantic relationships between amino acid residues and thereby identify antibacterial peptides in protein sequences [103].

Other DL algorithms have been applied to the field of AMPs. For example, in 2019, Dean and Walper introduced Variational Autoencoder for Antimicrobial Peptides (VAE-AMP), a VAE trained on a large database of known AMPs to generate peptides with desired properties, such as increased antimicrobial activity or decreased toxicity [104]. In 2022, Ghorbani and co-workers developed a deep attention-based variational autoencoder (DAVAE) for antimicrobial peptide discovery [105]. The DAVAE model consists of an encoder that maps the amino acid sequence of a peptide into a latent space and a decoder that generates a new peptide sequence from a random point in the latent space. The attention mechanism in DAVAE enables the model to focus on the parts of the input sequence that are most relevant to the prediction of antimicrobial activity. CNNs have been trained to distinguish AMPs from non-AMPs by analyzing their amino acid sequences. In 2019, Su and co-workers used multi-scale CNNs for the identification of antimicrobial peptides [106]. The multi-scale CNN architecture allowed the model to effectively capture the spatial and structural features of peptides at different scales. In 2020, Yan and co-workers demonstrated that CNNs were effective in predicting the antimicrobial activity of novel peptides, suggesting their potential for developing new AMPs with improved properties [107]. Finally, GANs are another DL model used for generating unique AMPs, as they can produce new data that is comparable to a given dataset. In 2021, Van Oort and co-workers employed a generative adversarial network to engineer novel AMPs that showed stronger antimicrobial activity than the peptides of the original training dataset [108]. Tucs and co-workers presented a novel method for engineering high-activity AMPs using Activity-Aware Generative Adversarial Networks (AA-GAN) [73]. AA-GAN uses a combination of CNNs and GANs to generate new peptides. The authors showed that their method could generate peptides with activity comparable to that of the conventional antibiotic ampicillin.

3.3. Perspectives

Most generative models applied to AMPs and peptides are based on primary sequences and lack structural information. Alternatively, protein designers have devised robust computational strategies that turn the sequence-structure-function problem upside down, namely by generating sequences that fold into a predefined tridimensional structure derived from native protein folds or generated de novo [109–111]. These approaches, grouped under the umbrella terms inverse protein folding [112], fixed-backbone, or structure-based protein design [113], sample amino acid distributions to find optimal sequences for a given fold. Computational biologists have notably generated new functional proteins with unprecedented structures [114], artificial immunoglobulins [115], or de novo TIM barrels [116, 117]. ProteinMPNN [118] and MutDock [119] are two recently reported tools democratizing fixed-backbone peptide/protein design practices. Finally, reinforcement learning, another branch of artificial intelligence, has moved from board games to protein design [120].

4. Interplay

Peptide drug design is a multi-objective problem, as it requires optimizing multiple criteria simultaneously, including efficacy, safety, bioavailability, and cost-effectiveness. Predictive and generative ML models work together to provide a grounded sequence design that addresses these challenges. Predictive modeling enables efficient screening and selection of promising candidates. Generative modeling, such as molecular design algorithms and DL-based approaches, generates novel peptide/protein sequences that meet desired properties or functions and optimizes multiple objectives simultaneously. By combining these two modeling strategies, computational biologists and drug designers can explore the vast peptide sequence space efficiently and identify candidates that present the desired properties, as summarized in Fig. 2. Here we list a series of examples where the authors combined predictive and generative algorithms to develop AMPs.

Fig. 2.

Streamlining the DMTA cycle for computational peptide design. DMTA stands for design, make, test, and analyze. First, the computational protein designer “designs” the experiment and selects one or more seed peptides. These peptides will be encoded to train ML generators and predictors. Second, the generative model “makes” new peptide sequences based on certain conditions (e.g., maintain amphipathic character, keep 3D structure) similar to a medicinal chemistry campaign. Third, predictive models substitute biological assays and “test” all sequences for potential candidates. Finally, the last step “analyzes” if the generated sequences comply with the desired characteristics. (This figure was adapted from Figure 1 [121])
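
The cycle described in this caption can be sketched in a few lines of code. The example below is a minimal illustration under strong assumptions, not the workflow of any cited study: the "make" step samples sequences from the amino acid frequencies of the seed peptides, the "test" step uses a hand-written heuristic (predict_score) as a stand-in for a trained predictive model, and the "analyze" step keeps candidates above a hypothetical net-charge threshold. All function names, thresholds, and seed sequences are placeholders.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(seeds):
    """Design: estimate residue frequencies from the seed peptides."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for seq in seeds:
        for aa in seq:
            counts[aa] += 1
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def make(freqs, n, length, rng):
    """Make: sample n random sequences from the residue frequencies."""
    residues = list(freqs)
    weights = [freqs[aa] for aa in residues]
    return ["".join(rng.choices(residues, weights=weights, k=length)) for _ in range(n)]


def net_charge(seq):
    """Crude net charge: (K + R) minus (D + E)."""
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")


def hydrophobic_fraction(seq):
    return sum(seq.count(aa) for aa in "AILMFVW") / len(seq)


def predict_score(seq):
    """Test: toy surrogate score standing in for a trained predictor (assumption)."""
    return net_charge(seq) / len(seq) + (1.0 - abs(hydrophobic_fraction(seq) - 0.5))


def analyze(candidates, min_charge=2, top=5):
    """Analyze: keep candidates meeting the charge criterion, ranked by score."""
    kept = [seq for seq in candidates if net_charge(seq) >= min_charge]
    return sorted(kept, key=predict_score, reverse=True)[:top]


def dmta_cycle(seeds, n=200, length=12, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    freqs = amino_acid_frequencies(seeds)
    return analyze(make(freqs, n, length, rng))


# Toy placeholder seeds, not curated AMP data.
shortlist = dmta_cycle(["KKLLKKLLKKLL", "RRWWRRWWRRWW"])
```

In practice, predict_score would be replaced by one or more trained classifiers or regressors, and the shortlist would feed the next design round or experimental validation.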

The first wave of computational strategies combined generative algorithms based on evolution with predictive modeling for antimicrobial activity. In 2013, Maccari and co-workers developed an evolutionary multi-objective optimization approach to design new AMPs with improved biological activity and selectivity. The authors used a genetic algorithm to explore the chemical space of peptides and optimized multiple objectives simultaneously, including antimicrobial activity, hemolytic activity, and physicochemical properties. The potential of their approach was demonstrated by generating new AMPs with potent antimicrobial activity and low toxicity [122]. Another study by Müller and co-workers introduced a sparse neural network model for predicting the antimicrobial activity of peptides from their sequence-derived physicochemical properties. The authors used an evolutionary algorithm to optimize the weights of the neural network [123]. In 2018, Beltran and co-workers reported an evolutionary feature weighting approach for selecting the most relevant molecular descriptors for AMP classification. The authors used a genetic algorithm to optimize the weights of the molecular descriptors and demonstrated the potential of their approach by achieving high accuracy in classifying a large dataset of AMPs [124]. The same year, Porto and co-workers developed a computational approach for designing AMPs using a genetic algorithm. The algorithm was based on a fitness function that incorporated both the antimicrobial activity and the physicochemical properties of guava peptides. The authors tested their approach on a large dataset of natural and synthetic AMPs and found that the algorithm was able to generate new sequences with higher activity than the original peptides [91]. Around the same time, Yoshida and co-workers reported the use of a genetic algorithm and predictive modeling to develop analogues of the 13-mer peptide Temporin-Ali and succeeded in producing highly potent antimicrobial peptides against E. coli [125]. In 2019, Ramos-Martín and co-workers designed the platform ADAPTABLE, which integrates multiple computational approaches, including machine learning and evolutionary algorithms, to provide a comprehensive and user-friendly tool for AMP design [126]. Finally, in 2021, Plisson and co-workers designed ~50 new non-hemolytic AMPs by generating random AMP sequences from measured amino acid frequencies and predicting their non-hemolytic nature or activity with GB algorithms [127].
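
The evolutionary strategies in this paragraph share a common loop: mutate and recombine sequences, score them with a fitness function, and keep the best. The sketch below illustrates that loop under simplifying assumptions. It collapses two objectives into a single weighted sum, which is simpler than the Pareto-based ranking used by dedicated multi-objective evolutionary algorithms, and both objective functions (activity_proxy, hemolysis_proxy) are toy composition-based stand-ins for trained predictors. The weights, rates, and parent sequence are hypothetical placeholders.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")


def activity_proxy(seq):
    """Toy stand-in for predicted antimicrobial activity: cationic fraction."""
    return sum(seq.count(aa) for aa in "KR") / len(seq)


def hemolysis_proxy(seq):
    """Toy stand-in for predicted hemolysis: hydrophobic fraction."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def fitness(seq, w_activity=1.0, w_hemolysis=0.5):
    """Weighted-sum scalarization of the two objectives (an assumption)."""
    return w_activity * activity_proxy(seq) - w_hemolysis * hemolysis_proxy(seq)


def mutate(seq, rng, rate=0.1):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(a, b, rng):
    """Single-point crossover between two equal-length sequences."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(parent, generations=30, pop_size=40, elite=10, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    population = [parent] + [mutate(parent, rng, rate=0.3) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:elite]  # elitism: the best sequences are never lost
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return max(population, key=fitness)


# Hypothetical 13-residue parent, used only as a placeholder.
best = evolve("GLLKALAKLLKAG")
```

Because the elite sequences are carried over unchanged, the best fitness never decreases between generations. With realistic predictors, additional constraints (for example on length, charge, or similarity to the parent) would usually be needed to keep the search within a plausible region of sequence space.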

In the last 5 years, researchers have combined predictive models with generative DL architectures. In 2021, Capecchi and Reymond generated and analyzed a large library of peptides, with the goal of understanding their chemical diversity and distribution in chemical space. This study highlighted multiple distinct regions in peptide chemical space that were enriched in sequences with specific properties, including α-helical structures, cationic charge, or hydrophobicity [128]. The same year, the same authors trained RNNs with data from DBAASP (Database of Antimicrobial Activity and Structure of Peptides) to design small non-hemolytic α-helical antimicrobial peptides [129]. In 2022, Agüero-Chapin and co-workers proposed a computational approach to identify new AMPs combining evolutionary algorithms and machine learning techniques. First, they generated a large library of peptide sequences and then trained an SVM model to predict their antimicrobial activity. The authors tested their approach on a dataset of AMPs with known activity and found that their method was able to identify new peptides with high activity against several bacterial pathogens [130]. In 2022, Röckendorf and co-workers reported a multi-objective optimization approach for designing new membrane-active peptides with improved antimicrobial activity and biocompatibility. The authors used a genetic algorithm to explore the chemical space of peptides and optimized multiple objectives simultaneously, including antimicrobial activity, hemolytic activity, and cell viability [131]. Finally, in 2023, Szymczak and co-workers reported HydrAMP, a variational autoencoder trained to generate peptide sequences conditioned on antimicrobial activities [99].

5. Conclusions

This chapter discusses the importance of predictive and generative machine learning (ML) models in peptide drug design. Predictive models rely on input datasets and descriptors with independent features to learn mathematical relationships between peptide sequences and their biological indices. Designing the right AI models can significantly accelerate the development of new peptide drug candidates. Generative models produce artificial peptides with desired properties or functions, using algorithms such as RNNs, CNNs, VAEs, or GANs. Peptide drug design is a multi-objective problem, and the interplay between predictive and generative ML models provides a viable design strategy to address these challenges. By combining these two modeling strategies, computational biologists and drug designers have the ability to search through the vast peptide sequence space and pinpoint candidates that fulfill the objectives of interest, ultimately leading to antimicrobial candidates.

However, these models are not exempt from biases and errors. The choice of peptide encoders and algorithms affects model performance. To obtain robust models whose predictions hold up experimentally, the coupling of in silico models with in vitro and in vivo validations needs to be improved. It will be essential to continue to enrich the databases with high-quality and readily available data. In addition, there are still some difficulties to overcome regarding the clinical use of synthetic AMPs, such as production costs, proteolytic stability, and unknown toxicity during systemic administration. Nevertheless, the entry of several synthetic AMPs into clinical trials, added to the many peptide mimetics already in the pipeline as antimicrobial agents, suggests an exciting future in this area.

AI could help circumvent these hurdles by taking into account experimental data on peptides that include D-amino acids or non-natural residues, which reduce susceptibility to degradation by proteases, and by helping to reduce toxicity through the use of formulations. Reliable computational toxicology predictions are being developed to improve models that explicitly consider crucial preclinical toxicological end points for AMPs. It will also be essential to think about the development of models to assess the mechanisms of action of peptides. The combination of expertise, through a close collaboration between computational and experimental scientists, will be critical. Finally, federated learning across institutions could facilitate collaborations without the need to share confidential datasets.

Acknowledgments

Mariana del Carmen Aguilera-Puga (M.C.A.-P.) and Fabien Plisson (F.P.) are thankful to the Mexican research council Consejo Nacional de Ciencia y Tecnología (CONACYT), grant number A1-S-32579 and Premio Rosenkranz 2021 (FunSalud - Fundación para la Salud, A.C. and Roche, AG). M.C.A.-P. was the recipient of national CONACYT postgraduate scholarship. F.P. was supported by a Cátedras CONACYT fellowship—2017–2022. Natalia L. Cancelarich (N.L.C.) was awarded a postdoctoral fellowship by the Argentinian research council Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET). Mariela M. Marani (M.M.M.) is a researcher of CONICET. Cesar de la Fuente-Nunez (C.F.N.) holds a Presidential Professorship at the University of Pennsylvania, is a recipient of the Langer Prize by the AIChE Foundation, and acknowledges funding from the IADR Innovation in Oral Care Award, the Procter & Gamble Company, United Therapeutics, a BBRF Young Investigator Grant, the Nemirovsky Prize, Penn Health-Tech Accelerator Award, the Dean’s Innovation Fund from the Perelman School of Medicine at the University of Pennsylvania, the National Institute of General Medical Sciences of the National Institutes of Health under award number R35GM138201, and the Defense Threat Reduction Agency (DTRA; HDTRA11810041, HDTRA1–21-1–0014, and HDTRA1–23-1–0001).

Footnotes

Competing Interests C.F.N. provides consulting services to Invaio Sciences and is a member of the Scientific Advisory Boards of Nowture S.L. and Phare Bio. The remaining authors declare no competing interests.

References

  • 1. Mookherjee N, Anderson MA, Haagsman HP, Davidson DJ (2020) Antimicrobial host defence peptides: functions and clinical potential. Nat Rev Drug Discov 19:311–332. 10.1038/s41573-019-0058-8
  • 2. Bowdish DME, Davidson DJ, Scott MG, Hancock REW (2005) Immunomodulatory activities of small host defense peptides. Antimicrob Agents Chemother 49:1727–1732. 10.1128/AAC.49.5.1727-1732.2005
  • 3. Franco OL (2011) Peptide promiscuity: an evolutionary concept for plant defense. FEBS Lett 585:995–1000. 10.1016/j.febslet.2011.03.008
  • 4. Steinstraesser L, Hirsch T, Schulte M et al. (2012) Innate defense regulator peptide 1018 in wound healing and wound infection. PLoS One 7:e39373. 10.1371/journal.pone.0039373
  • 5. de la Fuente-Núñez C, Reffuveille F, Fernández L, Hancock RE (2013) Bacterial biofilm development as a multicellular adaptation: antibiotic resistance and new therapeutic strategies. Curr Opin Microbiol 16:580–589. 10.1016/j.mib.2013.06.013
  • 6. de la Fuente-Núñez C, Cardoso MH, de Souza Cândido E et al. (2016) Synthetic antibiofilm peptides. Biochim Biophys Acta BBA – Biomembr 1858:1061–1069. 10.1016/j.bbamem.2015.12.015
  • 7. Xhindoli D, Pacor S, Benincasa M et al. (2016) The human cathelicidin LL-37—a pore-forming antibacterial peptide and host-cell modulator. Biochim Biophys Acta BBA – Biomembr 1858:546–566. 10.1016/j.bbamem.2015.11.003
  • 8. Agbale CM, Sarfo JK, Galyuon IK et al. (2019) Antimicrobial and antibiofilm activities of helical antimicrobial peptide sequences incorporating metal-binding motifs. Biochemistry 58:3802–3812. 10.1021/acs.biochem.9b00440
  • 9. Hancock REW, Haney EF, Gill EE (2016) The immunology of host defence peptides: beyond antimicrobial activity. Nat Rev Immunol 16:321–334. 10.1038/nri.2016.29
  • 10. Haney EF, Straus SK, Hancock REW (2019) Reassessing the host defense peptide landscape. Front Chem 7:1–22. 10.3389/fchem.2019.00043
  • 11. Flechas SV, Acosta-González A, Escobar LA et al. (2019) Microbiota and skin defense peptides may facilitate coexistence of two sympatric Andean frog species with a lethal pathogen. ISME J 13:361–373. 10.1038/s41396-018-0284-9
  • 12. Saati-Santamaría Z, Baroncelli R, Rivas R, García-Fraile P (2022) Comparative genomics of the genus pseudomonas reveals host- and environment-specific evolution. Microbiol Spectr 10:e0237022. 10.1128/spectrum.02370-22
  • 13. Kraus D, Peschel A (2006) Molecular mechanisms of bacterial resistance to antimicrobial peptides. In: Shafer WM (ed) Antimicrobial peptides and human disease. Springer, Berlin, Heidelberg, pp 231–250
  • 14. Anaya-López JL, López-Meza JE, Ochoa-Zarzosa A (2013) Bacterial resistance to cationic antimicrobial peptides. Crit Rev Microbiol 39:180–195. 10.3109/1040841X.2012.699025
  • 15. Assoni L, Milani B, Carvalho MR et al. (2020) Resistance mechanisms to antimicrobial peptides in Gram-positive bacteria. Front Microbiol 11:593215. 10.3389/fmicb.2020.593215
  • 16. Magana M, Pushpanathan M, Santos AL et al. (2020) The value of antimicrobial peptides in the age of resistance. Lancet Infect Dis 20:e216–e230. 10.1016/S1473-3099(20)30327-3
  • 17. Jangir PK, Ogunlana L, Szili P et al. (2023) The evolution of colistin resistance increases bacterial resistance to host antimicrobial peptides and virulence. eLife 12:e84395. 10.7554/eLife.84395
  • 18. Wang G, Li X, Wang Z (2016) APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res 44:D1087–D1093. 10.1093/nar/gkv1278
  • 19. Xu J, Li F, Leier A et al. (2021) Comprehensive assessment of machine learning-based methods for predicting antimicrobial peptides. Brief Bioinform 22:bbab083. 10.1093/bib/bbab083
  • 20. Torres MDT, Melo MCR, Flowers L et al. (2022) Mining for encrypted peptide antibiotics in the human proteome. Nat Biomed Eng 6:67–75. 10.1038/s41551-021-00801-1
  • 21. Brand GD, Magalhães MTQ, Tinoco MLP et al. (2012) Probing protein sequences as sources for encrypted antimicrobial peptides. PLoS One 7:e45848. 10.1371/journal.pone.0045848
  • 22. Ramada MHS, Brand GD, Abrão FY et al. (2017) Encrypted antimicrobial peptides from plant proteins. Sci Rep 7:13263. 10.1038/s41598-017-13685-6
  • 23. Brand GD, Ramada MHS, Manickchand JR et al. (2019) Intragenic antimicrobial peptides (IAPs) from human proteins with potent antimicrobial and anti-inflammatory activity. PLoS One 14:e0220656. 10.1371/journal.pone.0220656
  • 24. Witten J, Witten Z (2019) Deep learning regression model for antimicrobial peptide design. 10.1101/692681
  • 25. Haney EF, Hancock REW (2013) Peptide design for antimicrobial and immunomodulatory applications. Pept Sci 100:572–583. 10.1002/bip.22250
  • 26. Bahar AA, Ren D (2013) Antimicrobial peptides. Pharmaceuticals 6:1543–1575. 10.3390/ph6121543
  • 27. de la Fuente-Núñez C, Reffuveille F, Mansour SC et al. (2015) D-enantiomeric peptides that eradicate wild-type and multidrug-resistant biofilms and protect against lethal Pseudomonas aeruginosa infections. Chem Biol 22:196–205. 10.1016/j.chembiol.2015.01.002
  • 28. Li FF, Brimble MA (2019) Using chemical synthesis to optimise antimicrobial peptides in the fight against antimicrobial resistance. Pure Appl Chem 91:181–198. 10.1515/pac-2018-0704
  • 29. Lima PG, Oliveira JTA, Amaral JL et al. (2021) Synthetic antimicrobial peptides: characteristics, design, and potential as alternative molecules to overcome microbial resistance. Life Sci 278:119647. 10.1016/j.lfs.2021.119647
  • 30. Fjell CD, Jenssen H, Hilpert K et al. (2009) Identification of novel antibacterial peptides by chemoinformatics and machine learning. J Med Chem 52:2006–2015. 10.1021/jm8015365
  • 31. Fjell CD, Hiss JA, Hancock REW, Schneider G (2012) Designing antimicrobial peptides: form follows function. Nat Rev Drug Discov 11:37–51. 10.1038/nrd3591
  • 32. Cardoso MH, Orozco RQ, Rezende SB et al. (2020) Computer-aided design of antimicrobial peptides: are we generating effective drug candidates? Front Microbiol 10:1–15. 10.3389/fmicb.2019.03097
  • 33. Melo MCR, Maasch JRMA, de la Fuente-Nunez C (2021) Accelerating antibiotic discovery through artificial intelligence. Commun Biol 4:1–13. 10.1038/s42003-021-02586-0
  • 34. Pirtskhalava M, Gabrielian A, Cruz P et al. (2016) DBAASP v.2: an enhanced database of structure and antimicrobial/cytotoxic activity of natural and synthetic peptides. Nucleic Acids Res 44:D1104–D1112. 10.1093/nar/gkv1174
  • 35. Bin Hafeez A, Jiang X, Bergen PJ, Zhu Y (2021) Antimicrobial peptides: an update on classifications and databases. Int J Mol Sci 22:11691. 10.3390/ijms222111691
  • 36. Ramazi S, Mohammadi N, Allahverdi A et al. (2022) A review on antimicrobial peptides databases and the computational tools. Database (Oxford) 2022:baac011. 10.1093/database/baac011
  • 37. Lin T-T, Yang L-Y, Lu I-H et al. (2021) AI4AMP: an antimicrobial peptide predictor using physicochemical property-based encoding method and deep learning. mSystems 6:e00299–e00221. 10.1128/mSystems.00299-21
  • 38. Erjavac I, Kalafatovic D, Mauša G (2022) Coupled encoding methods for antimicrobial peptide prediction: how sensitive is a highly accurate model? Artif Intell Life Sci 2:100034. 10.1016/j.ailsci.2022.100034
  • 39. Jenssen H (2011) Descriptors for antimicrobial peptides. Expert Opin Drug Discov 6:171–184. 10.1517/17460441.2011.545817
  • 40. Lata S, Sharma B, Raghava G (2007) Analysis and prediction of antibacterial peptides. BMC Bioinformatics 8:263. 10.1186/1471-2105-8-263
  • 41. Lata S, Mishra NK, Raghava GP (2010) AntiBP2: improved version of antibacterial peptide prediction. BMC Bioinformatics 11:S19. 10.1186/1471-2105-11-S1-S19
  • 42. Wang P, Hu L, Liu G et al. (2011) Prediction of antimicrobial peptides based on sequence alignment and feature selection methods. PLoS One 6:e18476. 10.1371/journal.pone.0018476
  • 43. Fernandes FC, Rigden DJ, Franco OL (2012) Prediction of antimicrobial peptides based on the adaptive neuro-fuzzy inference system application. Biopolymers 98:280–287. 10.1002/bip.22066
  • 44. Torrent M, Di Tommaso P, Pulido D et al. (2012) AMPA: an automated web server for prediction of protein antimicrobial regions. Bioinformatics 28:130–131. 10.1093/bioinformatics/btr604
  • 45. Porto WF, Pires ÁS, Franco OL (2012) CS-AMPPred: an updated SVM model for antimicrobial activity prediction in cysteine-stabilized peptides. PLoS One 7:e51444. 10.1371/journal.pone.0051444
  • 46. Joseph S, Karnik S, Nilawe P et al. (2012) ClassAMP: a prediction tool for classification of antimicrobial peptides. IEEE/ACM Trans Comput Biol Bioinform 9:1535–1538. 10.1109/TCBB.2012.89
  • 47. Mooney C, Haslam NJ, Holton TA et al. (2013) PeptideLocator: prediction of bioactive peptides in protein sequences. Bioinformatics 29:1120–1126. 10.1093/bioinformatics/btt103
  • 48. Xiao X, Wang P, Lin W-Z et al. (2013) iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Anal Biochem 436:168–177. 10.1016/j.ab.2013.01.019
  • 49. Gogoladze G, Grigolava M, Vishnepolsky B et al. (2014) dbaasp: database of antimicrobial activity and structure of peptides. FEMS Microbiol Lett 357:63–68. 10.1111/1574-6968.12489
  • 50. Lee H-T, Lee C-C, Yang J-R et al. (2015) A large-scale structural classification of antimicrobial peptides. Biomed Res Int 2015:1–6. 10.1155/2015/475062
  • 51. Ng XY, Rosdi BA, Shahrudin S (2015) Prediction of antimicrobial peptides based on sequence alignment and support vector machine-pairwise algorithm utilizing LZ-complexity. Biomed Res Int 2015:e212715. 10.1155/2015/212715
  • 52. Waghu FH, Barai RS, Gurung P, Idicula-Thomas S (2016) CAMPR3: a database on sequences, structures and signatures of antimicrobial peptides. Nucleic Acids Res 44. 10.1093/nar/gkv1051
  • 53. Lin W, Xu D (2016) Imbalanced multi-label learning for identifying antimicrobial peptides and their functional types. Bioinformatics 32:3745–3752. 10.1093/bioinformatics/btw560
  • 54. Meher PK, Sahu TK, Saini V, Rao AR (2017) Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou’s general PseAAC. Sci Rep 7:42362. 10.1038/srep42362
  • 55. Bhadra P, Yan J, Li J et al. (2018) AmPEP: sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest. Sci Rep 8:1697. 10.1038/s41598-018-19752-w
  • 56. Porto WF, Fensterseifer ICM, Ribeiro SM, Franco OL (2018) Joker: an algorithm to insert patterns into sequences for designing antimicrobial peptides. Biochim Biophys Acta BBA – Gen Subj 1862:2043–2052. 10.1016/j.bbagen.2018.06.011
  • 57. Jhong J-H, Chi Y-H, Li W-C et al. (2019) dbAMP: an integrated resource for exploring antimicrobial peptides with functional activities and physicochemical properties on transcriptome and proteome data. Nucleic Acids Res 47:D285–D297. 10.1093/nar/gky1030
  • 58. Gull S, Shamim N, Minhas F (2019) AMAP: hierarchical multi-label prediction of biologically active and antimicrobial peptides. Comput Biol Med 107:172–181. 10.1016/j.compbiomed.2019.02.018
  • 59. Lawrence TJ, Carper DL, Spangler MK et al. (2021) amPEPpy 1.0: a portable and accurate antimicrobial peptide prediction tool. Bioinformatics 37:2058–2060. 10.1093/bioinformatics/btaa917
  • 60. Kavousi K, Bagheri M, Behrouzi S et al. (2020) IAMPE: NMR-assisted computational prediction of antimicrobial peptides. J Chem Inf Model 60:4691–4701. 10.1021/acs.jcim.0c00841
  • 61. Burdukiewicz M, Sidorczuk K, Rafacz D et al. (2020) Proteomic screening for prediction and design of antimicrobial peptides with AmpGram. Int J Mol Sci 21:4310. 10.3390/ijms21124310
  • 62. Schaduangrat N, Nantasenamat C, Prachayasittikul V, Shoombuatong W (2019) Meta-iAVP: a sequence-based meta-predictor for improving the prediction of antiviral peptides using effective feature representation. Int J Mol Sci 20:5743. 10.3390/ijms20225743
  • 63. Lertampaiporn S, Vorapreeda T, Hongsthong A, Thammarongtham C (2021) Ensemble-AMPPred: robust AMP prediction and recognition using the ensemble learning method with a new hybrid feature for differentiating AMPs. Genes 12:137. 10.3390/genes12020137
  • 64. Tropsha A (2010) Best practices for QSAR model development, validation, and exploitation. Mol Inform 29:476–488. 10.1002/minf.201000061
  • 65. Delaunay M, Ha-Duong T (2022) Computational tools and strategies to develop peptide-based inhibitors of protein-protein interactions. In: Simonson T (ed) Computational peptide science. Springer US, New York, pp 205–230
  • 66. Wang G, Vaisman II, van Hoek ML (2022) Machine learning prediction of antimicrobial peptides. In: Simonson T (ed) Computational peptide science. Springer US, New York, pp 1–37
  • 67. Breiman L (2001) Random Forests. Mach Learn 45:5–32. 10.1023/A:1010933404324
  • 68. Veltri D, Kamath U, Shehu A (2018) Deep learning improves antimicrobial peptide recognition. Bioinformatics 34:2740–2747. 10.1093/bioinformatics/bty179
  • 69. Azim SM, Sharma A, Shatabda S, Dehzangi A (2021) DeepAmp: a convolutional neural network based tool for predicting protein AMPylation sites from binary profile representation. 10.21203/rs.3.rs-1013130/v1
  • 70. Zhang S, Li X (2022) Pep-CNN: an improved convolutional neural network for predicting therapeutic peptides. Chemom Intell Lab Syst 221:104490. 10.1016/j.chemolab.2022.104490
  • 71. Das P, Wadhawan K, Chang O et al. (2018) PepCVAE: semi-supervised targeted design of antimicrobial peptide sequences. 10.48550/ARXIV.1810.07743
  • 72. Nagarajan D, Nagarajan T, Roy N et al. (2018) Computational antimicrobial peptide design and evaluation against multidrug-resistant clinical isolates of bacteria. J Biol Chem 293:3492–3509. 10.1074/jbc.M117.805499
  • 73. Tucs A, Tran DP, Yumoto A et al. (2020) Generating ampicillin-level antimicrobial peptides with activity-aware generative adversarial networks. ACS Omega 5:22847–22851. 10.1021/acsomega.0c02088
  • 74. Xiao X, Shao Y-T, Cheng X, Stamatovic B (2021) iAMP-CA2L: a new CNN-BiLSTM-SVM classifier based on cellular automata image for identifying antimicrobial peptides and their functional types. Brief Bioinform 22:bbab209. 10.1093/bib/bbab209
  • 75. Li C, Sutherland D, Hammond SA et al. (2022) AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BMC Genomics 23:77. 10.1186/s12864-022-08310-4
  • 76.Hussain W (2022) sAMP-PFPDeep: improving accuracy of short antimicrobial peptides prediction using three different sequence encodings and deep neural networks. Brief Bioinform 23:bbab487. 10.1093/bib/bbab487 [DOI] [PubMed] [Google Scholar]
  • 77.Dee W (2022) LMPred: predicting antimicrobial peptides using pre-trained language models and deep learning. Bioinform Adv 2:vbac021. 10.1093/bioadv/vbac021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Bárcenas O, Pintado-Grima C, Sidorczuk K et al. (2022) The dynamic landscape of peptide activity prediction. Comput Struct Biotechnol J 20:6526–6533. 10.1016/j.csbj.2022.11.043 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Eid F-E, Elmarakeby HA, Chan YA et al. (2021) Systematic auditing is essential to debiasing machine learning in biology. Commun Biol 4:183. 10.1038/s42003-021-01674-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Rádai Z, Kiss J, Nagy NA (2021) Taxonomic bias in AMP prediction of invertebrate peptides. Sci Rep 11:17924. 10.1038/s41598-021-97415-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Porto WF, Ferreira KCV, Ribeiro SM, Franco OL (2022) Sense the moment: a highly sensitive antimicrobial activity predictor based on hydrophobic moment. Biochim Biophys Acta BBA – Gen Subj 1866:130070. 10.1016/j.bbagen.2021.130070 [DOI] [PubMed] [Google Scholar]
  • 82.Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659. 10.1093/bioinformatics/btl158 [DOI] [PubMed] [Google Scholar]
  • 83.Aldas-Bulos VD, Plisson F (2023) Benchmarking protein structure predictors to assist machine learning-guided peptide discovery. Digital Discovery, Advanced Article. 10.1039/D3DD00045A [DOI] [Google Scholar]
  • 84.Dean SN, Alvarez JAE, Zabetakis D et al. (2021) PepVAE: variational autoencoder framework for antimicrobial peptide generation and activity prediction. Front Microbiol 12:725727. 10.3389/fmicb.2021.725727 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Zhang Q-Y, Yan Z-B, Meng Y-M et al. (2021) Antimicrobial peptides: mechanism of action, activity and clinical potential. Mil Med Res 8:48. 10.1186/s40779-021-00343-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Huan Y, Kong Q, Mou H, Yi H (2020) Antimicrobial peptides: classification, design, application and research progress in multiple fields. Front Microbiol 11:582779. 10.3389/fmicb.2020.582779 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Sidorczuk K, Gagat P, Pietluch F et al. (2022) Benchmarks in antimicrobial peptide prediction are biased due to the selection of negative data. Brief Bioinform 23:bbac343. 10.1093/bib/bbac343 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Lee EY, Fulan BM, Wong GCL, Ferguson AL (2016) Mapping membrane activity in undiscovered peptide sequence space using machine learning. Proc Natl Acad Sci 113:13588–13593. 10.1073/pnas.1609893113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89. Lee EY, Lee MW, Fulan BM et al. (2017) What can machine learning do for antimicrobial peptides, and what can antimicrobial peptides do for machine learning? Interface Focus 7:20160153. doi: 10.1098/rsfs.2016.0153
  • 90. Brand GD, Ramada MHS, Genaro-Mattos TC, Bloch C (2018) Towards an experimental classification system for membrane active peptides. Sci Rep 8:1–11. doi: 10.1038/s41598-018-19566-w
  • 91. Porto WF, Irazazabal L, Alves ESF et al. (2018) In silico optimization of a guava antimicrobial peptide enables combinatorial exploration for peptide design. Nat Commun 9. doi: 10.1038/s41467-018-03746-3
  • 92. Torres MDT, de la Fuente-Nunez C (2019) Toward computer-made artificial antibiotics. Curr Opin Microbiol 51:30–38. doi: 10.1016/j.mib.2019.03.004
  • 93. Müller AT, Hiss JA, Schneider G (2018) Recurrent neural network model for constructive peptide design. J Chem Inf Model 58:472–479. doi: 10.1021/acs.jcim.7b00414
  • 94. Caceres-Delpiano J, Ibañez R, Alegre P et al. (2020) Deep learning enables the design of functional de novo antimicrobial proteins. doi: 10.1101/2020.08.26.266940
  • 95. Ferrell JB, Remington JM, Van Oort CM et al. (2020) A generative approach toward precision antimicrobial peptide design. doi: 10.1101/2020.10.02.324087
  • 96. Wang C, Garlick S, Zloh M (2021) Deep learning for novel antimicrobial peptide design. Biomolecules 11:471. doi: 10.3390/biom11030471
  • 97. Das P, Sercu T, Wadhawan K et al. (2021) Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations. Nat Biomed Eng 5:613–623. doi: 10.1038/s41551-021-00689-x
  • 98. Surana S, Arora P, Singh D et al. (2021) PandoraGAN: generating antiviral peptides using Generative Adversarial Network. doi: 10.1101/2021.02.15.431193
  • 99. Szymczak P, Możejko M, Grzegorzek T et al. (2023) Discovering highly potent antimicrobial peptides with deep generative model HydrAMP. Nat Commun 14:1453. doi: 10.1038/s41467-023-36994-z
  • 100. García-Jacas CR, Pinacho-Castellanos SA, García-González LA, Brizuela CA (2022) Do deep learning models make a difference in the identification of antimicrobial peptides? Brief Bioinform 23:bbac094. doi: 10.1093/bib/bbac094
  • 101. Hamid M-N, Friedberg I (2019) Identifying antimicrobial peptides using word embedding with deep recurrent neural networks. Bioinformatics 35:2009–2016. doi: 10.1093/bioinformatics/bty937
  • 102. Bolatchiev A, Baturin V, Shchetinin E, Bolatchieva E (2022) Novel antimicrobial peptides designed using a recurrent neural network reduce mortality in experimental sepsis. Antibiotics 11:411. doi: 10.3390/antibiotics11030411
  • 103. Sharma R, Shrivastava S, Kumar Singh S et al. (2021) Deep-ABPpred: identifying antibacterial peptides in protein sequences using bidirectional LSTM with word2vec. Brief Bioinform 22:bbab065. doi: 10.1093/bib/bbab065
  • 104. Dean SN, Walper SA (2020) Variational autoencoder for generation of antimicrobial peptides. ACS Omega 5:20746–20754. doi: 10.1021/acsomega.0c00442
  • 105. Ghorbani M, Prasad S, Brooks BR, Klauda JB (2022) Deep attention based variational autoencoder for antimicrobial peptide discovery. doi: 10.1101/2022.07.08.499340
  • 106. Su X, Xu J, Yin Y et al. (2019) Antimicrobial peptide identification using multi-scale convolutional network. BMC Bioinformatics 20:730. doi: 10.1186/s12859-019-3327-y
  • 107. Yan J, Bhadra P, Li A et al. (2020) Deep-AmPEP30: improve short antimicrobial peptides prediction with deep learning. Mol Ther Nucleic Acids 20:882–894. doi: 10.1016/j.omtn.2020.05.006
  • 108. Van Oort CM, Ferrell JB, Remington JM et al. (2021) AMPGAN v2: machine learning-guided design of antimicrobial peptides. J Chem Inf Model 61:2198–2207. doi: 10.1021/acs.jcim.0c01441
  • 109. Ovchinnikov S, Huang P-S (2021) Structure-based protein design with deep learning. Curr Opin Chem Biol 65:136–144. doi: 10.1016/j.cbpa.2021.08.004
  • 110. Pan X, Kortemme T (2021) Recent advances in de novo protein design: principles, methods, and applications. J Biol Chem 296:100558. doi: 10.1016/j.jbc.2021.100558
  • 111. Ferruz N, Heinzinger M, Akdel M et al. (2023) From sequence to function through structure: deep learning for protein design. Comput Struct Biotechnol J 21:238–250. doi: 10.1016/j.csbj.2022.11.014
  • 112. Yue K, Dill KA (1992) Inverse protein folding problem: designing polymer sequences. Proc Natl Acad Sci 89:4163–4167. doi: 10.1073/pnas.89.9.4163
  • 113. MacDonald JT, Freemont PS (2016) Computational protein design with backbone plasticity. Biochem Soc Trans 44:1523–1529. doi: 10.1042/BST20160155
  • 114. Anishchenko I, Pellock SJ, Chidyausiku TM et al. (2021) De novo protein design by deep network hallucination. Nature 600:547–552. doi: 10.1038/s41586-021-04184-w
  • 115. Eguchi RR, Choe CA, Huang P-S (2022) Ig-VAE: generative modeling of protein structure by direct 3D coordinate generation. PLoS Comput Biol 18:e1010271. doi: 10.1371/journal.pcbi.1010271
  • 116. Huang P-S, Feldmeier K, Parmeggiani F et al. (2016) De novo design of a four-fold symmetric TIM-barrel protein with atomic-level accuracy. Nat Chem Biol 12:29–34. doi: 10.1038/nchembio.1966
  • 117. Anand N, Eguchi R, Mathews II et al. (2022) Protein sequence design with a learned potential. Nat Commun 13:746. doi: 10.1038/s41467-022-28313-9
  • 118. Dauparas J, Anishchenko I, Bennett N et al. (2022) Robust deep learning–based protein sequence design using ProteinMPNN. Science 378:49–56. doi: 10.1126/science.add2187
  • 119. Chauhan VM, Pantazes RJ (2022) MutDock: a computational docking approach for fixed-backbone protein scaffold design. Front Mol Biosci 9:933400. doi: 10.3389/fmolb.2022.933400
  • 120. Lutz ID, Wang S, Norn C et al. (2023) Top-down design of protein architectures with reinforcement learning. Science 380:266–273. doi: 10.1126/science.adf6591
  • 121. Plisson F (2022) Overcoming the challenges in machine learning-guided antimicrobial peptide design. In: Proceedings of the 36th European and the 12th international peptide symposium, Sitges, Spain, pp 207–210. doi: 10.17952/36EPS.2022.207
  • 122. Maccari G, Di Luca M, Nifosí R et al. (2013) Antimicrobial peptides design by evolutionary multiobjective optimization. PLoS Comput Biol 9:e1003212. doi: 10.1371/journal.pcbi.1003212
  • 123. Müller AT, Kaymaz AC, Gabernet G et al. (2016) Sparse neural network models of antimicrobial peptide-activity relationships. Mol Inform 35:606–614. doi: 10.1002/minf.201600029
  • 124. Beltran JA, Aguilera-Mendoza L, Brizuela CA (2018) Optimal selection of molecular descriptors for antimicrobial peptides classification: an evolutionary feature weighting approach. BMC Genomics 19:672. doi: 10.1186/s12864-018-5030-1
  • 125. Yoshida M, Hinkley T, Tsuda S et al. (2018) Using evolutionary algorithms and machine learning to explore sequence space for the discovery of antimicrobial peptides. Chem 4:533–543. doi: 10.1016/j.chempr.2018.01.005
  • 126. Ramos-Martín F, Annaval T, Buchoux S et al. (2019) ADAPTABLE: a comprehensive web platform of antimicrobial peptides tailored to the user’s research. Life Sci Alliance 2:e201900512. doi: 10.26508/lsa.201900512
  • 127. Plisson F, Ramírez-Sánchez O, Martínez-Hernández C (2020) Machine learning-guided discovery and design of non-hemolytic peptides. Sci Rep 10:16581. doi: 10.1038/s41598-020-73644-6
  • 128. Capecchi A, Reymond J-L (2021) Peptides in chemical space. Med Drug Discov 9:100081. doi: 10.1016/j.medidd.2021.100081
  • 129. Capecchi A, Cai X, Personne H et al. (2021) Machine learning designs non-hemolytic antimicrobial peptides. Chem Sci 12:9221–9232. doi: 10.1039/D1SC01713F
  • 130. Agüero-Chapin G, Galpert-Cañizares D, Domínguez-Pérez D et al. (2022) Emerging computational approaches for antimicrobial peptide discovery. Antibiotics 11:936. doi: 10.3390/antibiotics11070936
  • 131. Röckendorf N, Nehls C, Gutsmann T (2022) Design of membrane active peptides considering multi-objective optimization for biomedical application. Membranes 12:180. doi: 10.3390/membranes12020180
