Protein Science: A Publication of the Protein Society
. 2023 Oct 1;32(10):e4758. doi: 10.1002/pro.4758

DeepAFP: An effective computational framework for identifying antifungal peptides based on deep learning

Lantian Yao 1,2, Yuntian Zhang 3, Wenshuo Li 2, Chia‐Ru Chung 4, Jiahui Guan 3, Wenyang Zhang 3, Ying‐Chih Chiang 1,3,, Tzong‐Yi Lee 5,6,
PMCID: PMC10503419  PMID: 37595093

Abstract

Fungal infections have become a significant global health issue, affecting millions worldwide. Antifungal peptides (AFPs) have emerged as a promising alternative to conventional antifungal drugs due to their low toxicity and low propensity for inducing resistance. In this study, we developed a deep learning‐based framework called DeepAFP to efficiently identify AFPs. DeepAFP fully leverages and mines the composition information, evolutionary information, and physicochemical properties of peptides by employing combined kernels from multiple convolutional neural network branches with bi‐directional long short‐term memory layers. In addition, DeepAFP integrates a transfer learning strategy to obtain efficient representations of peptides for improving model performance. DeepAFP demonstrates strong predictive ability on carefully curated datasets, yielding an accuracy of 93.29% and an F1‐score of 93.45% on the DeepAFP‐Main dataset. The experimental results show that DeepAFP outperforms existing AFP prediction tools, achieving state‐of‐the‐art performance. Finally, we provide a downloadable AFP prediction tool to meet the demand for large‐scale prediction and facilitate the use of our framework by the public and other researchers. Our framework can accurately identify AFPs in a short time without requiring significant human and material resources, and can hence accelerate the development of AFPs as well as contribute to the treatment of fungal infections. Furthermore, our method can provide new perspectives for other biological sequence analysis tasks.

Keywords: antifungal peptides, deep learning, drug discovery, sequence analysis

1. INTRODUCTION

Fungal infections, also known as mycoses, are caused by various fungal species and can affect humans and animals (Pang et al., 2004; Seyedmousavi et al., 2018). Fungi are ubiquitous in the environment and are capable of causing a wide range of infections, ranging from superficial skin infections to invasive systemic infections that can be life‐threatening (Baumgardner, 2012). Fungal infections are an increasingly concerning issue worldwide, with an estimated 1 billion individuals affected by fungal diseases (Bongomin et al., 2017; Cornely et al., 2017).

Antifungal agents have been developed to combat these infections, targeting various components of the fungal cell wall and membrane, as well as intracellular processes (Ahmed et al., 2021; Ghannoum & Rice, 1999). Polyenes, azoles, echinocandins, and nucleoside analogues are the main classes of antifungal agents used clinically (Rapp, 2004; Seyedmousavi et al., 2017). Antifungal therapy may be complicated by drug interactions, adverse effects, and emergence of resistance (Campoy & Adrio, 2017; Fisher et al., 2022; Wiederhold, 2017). In certain cases, combination therapy with antifungal agents may be necessary to achieve therapeutic efficacy, which can increase the burden on patients both physically and financially. Therefore, continued research and development of novel antifungal agents are needed to improve the management of fungal infections (Butts & Krysan, 2012; Dhama et al., 2013; Lestrade et al., 2019; Perfect, 2017).

In recent years, peptide‐based therapies have gained attention as a potential alternative to traditional small‐molecule drugs for the treatment of infectious diseases (Browne et al., 2020; Craik et al., 2013). Among these peptides, antimicrobial peptides (AMPs) have emerged as a promising class of molecules due to their broad‐spectrum activity against a range of microorganisms, including bacteria, viruses, and fungi (Jhong et al., 2022). In particular, antifungal peptides (AFPs) are a unique subset of AMPs that have been found to exhibit potent activity against a variety of fungal species (Moravej et al., 2018). AFPs exert their antimicrobial effects by interacting with the fungal cell surface, ultimately leading to cell death (Struyfs et al., 2021). AFPs have emerged as a promising alternative to conventional antifungal agents, owing to their favorable characteristics of low toxicity and high efficacy (Fang et al., n.d.).

The traditional wet‐lab methods for identifying AFPs require a long experimental period and significant manpower and resources. Therefore, in recent years, an increasing number of researchers have turned to artificial intelligence‐based methods for identifying AFPs. Agrawal et al. utilized different descriptors of peptides, such as amino acid composition (AAC) and dipeptide composition (DPC), and developed AFP prediction models using various classifiers (Agrawal et al., 2018). Meher et al. integrated peptide composition features, structural features, and physicochemical properties to develop a support vector machine (SVM)‐based method, iAMPpred, for identifying AFPs (Meher et al., 2017). Chung et al. proposed a two‐stage machine learning framework to identify AMPs including AFPs (Chung et al., 2020).

Although machine learning‐based methods have achieved good performance, they rely on specific domain knowledge. In recent years, the development of deep learning has provided new insights for protein sequence analysis tasks, partially overcoming the limitations of machine learning methods. Protein sequences are very similar to natural language, and many natural language processing algorithms have been applied to protein sequence classification tasks (Zhang, Lin, et al., 2021). The success of deep learning depends on its powerful feature representation ability. Bidirectional Encoder Representations from Transformers (Bert) is a pretrained language model that uses a transformer‐based neural network architecture to generate contextualized representations of words by considering both their left and right context in a given sequence (Devlin et al., 2018). Bert has achieved significant success in many NLP tasks. Zhang et al. proposed a Bert‐based AMP recognition method that improves the predictive ability of the model (Zhang, Lin, et al., 2021). Pang et al. proposed a Bert‐based deep learning algorithm that integrates an imbalanced learning strategy to further enhance the prediction of AMPs (Pang et al., 2022). Researchers have also applied deep learning to identify AFPs. Fang et al. combined a convolutional neural network (CNN) layer and a long short‐term memory (LSTM) layer to construct a network for identifying AFPs (Fang et al., 2019). Sharma et al. employed transfer learning and 1DCNN–bi‐directional long short‐term memory (BiLSTM) networks to predict AFPs (Sharma et al., 2022). However, these methods do not consider the evolutionary information and physicochemical properties of peptides. Although Fang et al. utilized the evolutionary features and physicochemical properties of peptides to predict AFPs, they ignored the binary profile of peptides, which has been shown to benefit peptide classification tasks (Fang et al., n.d.).

In this study, we developed a deep learning‐based AFP recognition framework, DeepAFP, and integrated transfer learning strategies. Compared to other similar works, this study has the following novel aspects. First, we incorporated the Binary Profile, BLOSUM62 matrix, and Z‐Scale matrix to construct the feature matrix of peptides. Then, we employed combined kernels of CNN and BiLSTM networks to further explore the features of peptides. In addition, we integrated a pretrained protein language model to efficiently represent peptide sequences, which further improved the robustness of the model. Extensive experiments show that our model outperforms existing AFP prediction methods and achieves state‐of‐the‐art performance. Finally, to facilitate researchers who are not familiar with deep learning, we developed a downloadable desktop application for Windows systems, making it easy for them to access our framework. DeepAFP is freely available at https://github.com/lantianyao/DeepAFP (datasets and code) and https://awi.cuhk.edu.cn/dbAMP/DeepAFP.html (desktop application).

2. MATERIALS AND METHODS

2.1. Dataset preparation

In this study, we collected peptides from previous work (Agrawal et al., 2018). The original dataset was filtered to retain only peptide sequences containing the 20 standard amino acids, forming three new datasets named DeepAFP‐Main, DeepAFP‐Set 1, and DeepAFP‐Set 2. The positive samples of all three datasets consist of AFPs, whereas the negative samples differ. To enable the model to ignore the effect of peptide length and better mine the intrinsic characteristics of the positives and negatives, the peptide lengths of the positives and negatives were kept in the same distribution during dataset construction. In DeepAFP‐Set 1, the negative samples were composed of AMPs that lack antifungal activity. In contrast, the negative samples in DeepAFP‐Set 2 were randomly generated from SwissProt. DeepAFP‐Main combines the characteristics of both: its negative samples consist of AMPs lacking antifungal activity as well as peptides randomly generated from SwissProt. Finally, DeepAFP‐Set 1 contains 1459 AFPs and 1459 AMPs, DeepAFP‐Set 2 contains 1459 AFPs and 1457 random peptides, and DeepAFP‐Main contains 1459 AFPs as well as 1457 negatives mixed from AMPs and random peptides. We calculated the similarity of peptides in the datasets, as described in Data S1, Calculation of peptide similarity in the datasets.

Each dataset was further divided into a training set and an independent test set at a ratio of 8:2. The training set was used to fit the model, with hyperparameters optimized by 5‐fold cross‐validation, while the test set was used to evaluate the final performance of the model. The sizes of the training and test sets for the three datasets are summarized in Table 1.
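The split described above can be sketched as follows (a minimal NumPy illustration; the shuffling, the handling of the 8:2 ratio, and the fold construction are assumptions rather than the authors' released code):

```python
import numpy as np

def split_dataset(n_samples, test_ratio=0.2, n_folds=5, seed=42):
    """Shuffle sample indices, hold out an independent test set, and build
    cross-validation folds over the remaining training indices (sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Five roughly equal folds for hyperparameter selection on the train set
    folds = np.array_split(train_idx, n_folds)
    return train_idx, test_idx, folds

# 1459 AFPs + 1457 non-AFPs in DeepAFP-Main
train_idx, test_idx, folds = split_dataset(2916)
```

In practice the split would also be stratified so that positives and negatives stay balanced in both sets, as in Table 1.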

TABLE 1.

Overview of datasets in this study.

Dataset Train/test Positive Negative Total
DeepAFP‐Set 1 a Train set 1168 1168 2336
Test set 291 291 582
DeepAFP‐Set 2 b Train set 1168 1166 2334
Test set 291 291 582
DeepAFP‐Main c Train set 1168 1167 2335
Test set 291 290 581
a. The negatives in DeepAFP‐Set 1 are AMPs but do not possess antifungal activity.
b. The negatives in DeepAFP‐Set 2 are random peptides generated from SwissProt.
c. The negatives in DeepAFP‐Main are made up of a mixture of AMPs and random peptides.

To gain a better understanding of the differences between positive and negative samples, the mean amino acid composition for DeepAFP‐Main was plotted, as presented in Figure 1. The 20 amino acids were categorized into five groups based on their physicochemical properties. According to Figure 1, AFPs demonstrated a significantly higher amino acid composition in the polar (alkaline) group, which includes arginine (R), histidine (H), and lysine (K), compared to non‐AFPs (AMPs + random peptides). Conversely, non‐AFPs showed higher abundances of amino acids in the polar (acidic) group, such as aspartic acid (D) and glutamic acid (E). Similarly, non‐AFPs exhibited a relatively greater composition of the hydrophobic amino acids, including alanine (A), isoleucine (I), leucine (L), methionine (M), valine (V), phenylalanine (F), and tryptophan (W), with the exception of tyrosine (Y).

FIGURE 1.

Statistics for the DeepAFP‐Main dataset. (a) Mean AAC of positive samples (antifungal peptides, AFPs) and negative samples (non‐AFPs). The 20 amino acids are categorized based on their physicochemical properties. (b) Length distribution of positive samples (AFPs) and negative samples (non‐AFPs).

2.2. Peptide encoding

2.2.1. Binary profile

The binary profile is a widely used technique for representing amino acid sequences (Qureshi et al., 2015; Yao et al., 2023). This method employs a one‐hot encoding strategy to capture both the composition and order information of a given sequence. Specifically, each amino acid is represented as a 20‐dimensional binary vector in which only the dimension corresponding to the encoded amino acid is set to one, while all other dimensions are set to zero. For instance, the amino acid alanine (A) is encoded as the binary vector [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]. By applying this strategy to a peptide sequence S of length L, we obtain an L×20 binary matrix that represents the input sequence.
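For illustration, the binary profile can be implemented in a few lines (a sketch; the alphabetical residue ordering used here is consistent with the example encoding of alanine, but is otherwise an assumption):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical

def binary_profile(peptide):
    """One-hot encode a peptide into an L x 20 binary matrix."""
    mat = np.zeros((len(peptide), 20), dtype=int)
    for i, aa in enumerate(peptide):
        mat[i, AMINO_ACIDS.index(aa)] = 1
    return mat

m = binary_profile("AKC")
# Each row contains exactly one 1; row 0 encodes alanine (A) in column 0.
```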

2.2.2. BLOSUM62

Previous research has indicated that the utilization of evolutionary information derived from proteins is beneficial in facilitating protein sequence analysis, particularly for proteins possessing low sequence similarities (An et al., 2019; Liu et al., 2020). The BLOSUM62 matrix is a commonly used substitution matrix in bioinformatics to assess the similarity of protein sequences, based on observed frequencies of amino acid substitutions in a database of aligned sequences with less than 62% sequence identity (Trivedi & Nagarajaram, 2020). The BLOSUM62 matrix is a widely employed method for indicating protein evolutionary information and has been utilized in numerous bioinformatics tasks (Chen et al., 2021; Ma et al., 2022; Wei et al., 2022; Yao et al., 2023). In this study, we have employed the BLOSUM62 matrix to encode both AFPs and non‐AFPs. To encode a given protein or peptide sequence S, each residue is represented as a vector possessing 20 dimensions, which corresponds to the relevant row of the BLOSUM62 matrix. As a result, a sequence S of length L is represented as an L×20 matrix, as outlined below.

\mathrm{BLOSUM62} = \begin{pmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,20} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,20} \\ \vdots & \vdots & \ddots & \vdots \\ p_{L,1} & p_{L,2} & \cdots & p_{L,20} \end{pmatrix}   (1)

where p_{n,i} is the BLOSUM62 substitution score between the nth amino acid of the peptide and the ith of the 20 standard amino acids.

2.2.3. Z‐Scale

Previous studies demonstrate that the physicochemical properties of amino acids can enhance the recognition of evolutionarily conserved motifs, which is essential to reinforce our model as it aims to recognize peptides with specific functional activity (Lin et al., 2021). The application of amino acid physicochemical properties has yielded promising results in the identification of AMPs. To represent amino acid physicochemical properties, this study utilized the Z‐Scale matrix (Zsl), which characterizes each amino acid with five numerical values representing distinct physicochemical properties (Sandberg et al., 1998). Specifically, z1 indicates hydrophobicity and hydrophilicity, z2 reflects steric bulk properties and polarizability, z3 denotes polarity, and z4 and z5 signify electronic effects. Therefore, a peptide S of length L can be encoded as an L×5 matrix as follows.

\mathrm{ZScale} = \begin{pmatrix} z_{1,1} & z_{1,2} & \cdots & z_{1,5} \\ z_{2,1} & z_{2,2} & \cdots & z_{2,5} \\ \vdots & \vdots & \ddots & \vdots \\ z_{L,1} & z_{L,2} & \cdots & z_{L,5} \end{pmatrix}   (2)

All three descriptors, namely the binary profile, BLOSUM62, and Z‐Scale, are utilized for encoding a peptide. This results in a feature matrix for a peptide S of length L that is represented as an L×45 matrix. Note that, because peptide sequences vary in length, the feature matrices are zero‐padded to a uniform length across the dataset.
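Assembling the L×45 feature matrix and zero-padding it might look like the following sketch (the maximum length `L_MAX` and the random lookup tables are illustrative stand-ins, not the real BLOSUM62 or Z-scale values, which would be loaded from published tables):

```python
import numpy as np

L_MAX = 100  # assumed maximum peptide length for padding (illustrative)

def encode_peptide(peptide, blosum_rows, zscale_rows):
    """Concatenate the three descriptors into an L x 45 matrix, then
    zero-pad to L_MAX x 45. blosum_rows / zscale_rows map each residue
    to its 20-dim BLOSUM62 row and 5-dim Z-scale vector."""
    aas = "ACDEFGHIKLMNPQRSTVWY"
    L = len(peptide)
    onehot = np.zeros((L, 20))
    for i, aa in enumerate(peptide):
        onehot[i, aas.index(aa)] = 1
    blosum = np.array([blosum_rows[aa] for aa in peptide])   # L x 20
    zscale = np.array([zscale_rows[aa] for aa in peptide])   # L x 5
    feat = np.concatenate([onehot, blosum, zscale], axis=1)  # L x 45
    padded = np.zeros((L_MAX, 45))
    padded[:L] = feat
    return padded

# Stand-in lookup tables (random values, NOT real BLOSUM62/Z-scale entries)
rng = np.random.default_rng(0)
blosum_rows = {aa: rng.normal(size=20) for aa in "ACDEFGHIKLMNPQRSTVWY"}
zscale_rows = {aa: rng.normal(size=5) for aa in "ACDEFGHIKLMNPQRSTVWY"}
x = encode_peptide("GIGKFLHSAK", blosum_rows, zscale_rows)  # shape (100, 45)
```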

2.3. Model architecture

The architecture of DeepAFP is presented in Figure 2 and consists of three modules: Pretrained Bert Module, CNN‐BiLSTM Module, and Feature Fusion Module. The Pretrained Bert Module integrates a large‐scale pretrained language model, which is used to obtain efficient representations of peptide sequences. The CNN‐BiLSTM Module employs CNN and BiLSTM to fully exploit the compositional, evolutionary, and physicochemical properties of peptides. The Feature Fusion Module is designed to integrate the two extracted features and generate a higher‐level representation of peptides, which is then used to output the classification results.

FIGURE 2.

The architecture of DeepAFP. DeepAFP consists of three modules, namely (a) Pre‐trained Bert Module, (b) CNN‐BiLSTM Module, and (c) Feature Fusion Module. The pretrained Bert Module integrates a large‐scale pretrained protein language model to obtain efficient representations of peptide sequences. The CNN‐BiLSTM Module utilizes multi‐branch convolutional neural networks (CNN) and bidirectional long short‐term memory networks (BiLSTM) to fully utilize and mine the compositional, evolutionary, and physicochemical properties of peptides. Note that the size of convolution kernels for each branch is different. The Feature Fusion Module is used to integrate the features extracted from the two modules to generate high‐level peptide representations and output the classification results. Here, “FC” refers to “Fully‐Connected Layer.”

2.3.1. Pretrained protein language model

DeepAFP employs a transfer learning strategy to extract the biological features of peptides. To achieve this, we converted each peptide into a vector representation using Bert, a transformer‐based architecture that enhances representations through a self‐attention mechanism capturing the underlying interrelationships among all possible amino acid pairs within the input peptide.

As shown in Figure 2A, a tokenizer first converts each amino acid into a discrete number, digitizing a peptide S of length L into a token sequence a_1 a_2 … a_L, where a_i represents the token of the ith amino acid. The transformer‐based model uses TAPE as the backbone, which was pretrained on the Pfam dataset containing over 31 million amino acid sequences (Rao et al., 2019). Each token was then embedded into a 768‐dimensional vector by the pretrained transformer‐based TAPE model, which consists of 12 encoding layers, each with 12 self‐attention heads of 64 dimensions. Consequently, the last layer of the pretrained Bert model converts each token into a 768‐dimensional vector representation, which is averaged over the token dimension to produce a sequence‐level Bert encoding of 768 dimensions.

The Bert encoding was subsequently fed into a fully connected layer with 300 neurons, producing a 300‐dimensional feature vector f0 to fit the output of the CNN‐BiLSTM network. In summary, our approach not only leveraged the fundamental characteristics of peptides but also employed transfer learning using a powerful pretrained model to extract their biological features, enabling us to achieve enhanced performance in peptide classification tasks.
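The pooling-plus-projection step can be sketched as below (a NumPy stand-in; the random token embeddings substitute for the actual TAPE output, and the fully connected weights are untrained placeholders):

```python
import numpy as np

def bert_branch(token_embeddings, W, b):
    """Average the per-token 768-dim BERT outputs over the sequence
    dimension, then map to 300 dims with a fully connected layer."""
    pooled = token_embeddings.mean(axis=0)   # (768,) sequence-level encoding
    return W @ pooled + b                    # (300,) feature vector f0

rng = np.random.default_rng(0)
tokens = rng.normal(size=(25, 768))          # stand-in for TAPE output, L = 25
W, b = rng.normal(size=(300, 768)) * 0.01, np.zeros(300)
f0 = bert_branch(tokens, W, b)               # 300-dimensional f0
```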

2.3.2. Multi‐branch CNN‐BiLSTM network

To efficiently extract critical features for classifying AFPs and non‐AFPs from the feature matrix, we propose a novel hybrid CNN‐BiLSTM module that integrates kernels from multiple branches, as shown in Figure 2B. Each branch comprises one‐dimensional CNN (Conv1d) and BiLSTM layers for extracting high‐level latent features from the peptide's feature matrix. The kernel size varies across branches, allowing different convolutional branches to derive distinct essential information from the peptide feature matrices in the shallow network. This process is essential for enhancing the model's performance. We employ a rectified linear unit (ReLU) to activate the output of each Conv1d, and one‐dimensional max‐pooling is applied to each branch after ReLU activation to prevent over‐fitting. The local features extracted by the CNN are fed into the BiLSTM layer to capture long‐range dependencies and sequence‐order information. The LSTM addresses the gradient vanishing and gradient exploding issues commonly associated with traditional RNNs. The last layer of each branch is a fully connected layer with 100 neurons that converts the hybrid features derived from the CNN and BiLSTM into a fixed‐dimension feature vector f_i, where i denotes the ith branch of the CNN‐BiLSTM network.
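A single convolutional branch, reduced to its essentials, might look like this (one output channel per branch and no BiLSTM or fully connected layer, so this is a shape-level sketch rather than the authors' network):

```python
import numpy as np

def conv1d_branch(x, kernel, pool=2):
    """One branch: valid 1-D convolution over the length axis, ReLU,
    then max-pooling (BiLSTM and FC layers omitted for brevity)."""
    L, C = x.shape
    k = kernel.shape[0]                        # kernel shape: (k, C)
    out = np.array([np.sum(x[i:i + k] * kernel) for i in range(L - k + 1)])
    out = np.maximum(out, 0)                   # ReLU activation
    return out.reshape(-1, pool).max(axis=1) if out.size % pool == 0 else out

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 45))                 # padded L x 45 feature matrix
# Branches differ only in kernel size, extracting features at several scales
branch_outputs = [conv1d_branch(x, rng.normal(size=(k, 45)))
                  for k in (3, 5, 7)]
```

The kernel sizes 3, 5, and 7 are illustrative; the point is that each branch sees the same input at a different receptive-field width.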

2.3.3. Feature fusion and output module

In this module, our objective is to integrate the multi‐granularity features extracted from the feature matrix (comprising compositional, evolutionary, and physicochemical features) by the multi‐branch CNN‐BiLSTM, together with the biological features extracted by the pretrained Bert model, as illustrated in Figure 2C. The first step concatenates these features; for ease of description, the merged features are denoted below.

F = \mathrm{Concatenate}(f_0, f_1, f_2, f_3)   (3)

Subsequently, the merged features are input into the output module, which comprises a two‐layer multi‐layer perceptron (MLP) with a hidden dimension of 64. This yields the 64‐dimensional final representation (F) of a peptide sequence.

The final feature vector (F) is then passed through the final fully connected layer with the ReLU activation function, as depicted in the following equation:

y = \mathrm{ReLU}(WF + b)   (4)

where W is the weight matrix and b is the bias vector.

Finally, the softmax function is applied to y, resulting in the final output probability (pi), which can be calculated as follows:

p_i = \frac{\exp(y_i)}{\sum_j \exp(y_j)}   (5)

where j ∈ {1, 0}, indicating the labels AFP and non‐AFP, respectively.
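Equations (3)–(5) amount to the following forward pass (a NumPy sketch with untrained random weights; the layer sizes follow the text, everything else is illustrative):

```python
import numpy as np

def fusion_head(f0, f1, f2, f3, params):
    """Eqs (3)-(5): concatenate branch features, pass through a 2-layer
    MLP (hidden dim 64), a ReLU-activated output layer, then softmax."""
    F = np.concatenate([f0, f1, f2, f3])                  # Eq. (3)
    h = np.maximum(params["W1"] @ F + params["b1"], 0)    # MLP layer 1
    F64 = np.maximum(params["W2"] @ h + params["b2"], 0)  # 64-dim representation
    y = np.maximum(params["Wo"] @ F64 + params["bo"], 0)  # Eq. (4)
    e = np.exp(y - y.max())                               # stable softmax
    return e / e.sum()                                    # Eq. (5), two classes

rng = np.random.default_rng(0)
d = 300 + 3 * 100  # f0 (Bert branch) plus three 100-dim CNN-BiLSTM branches
params = {"W1": rng.normal(size=(64, d)) * 0.01, "b1": np.zeros(64),
          "W2": rng.normal(size=(64, 64)) * 0.1, "b2": np.zeros(64),
          "Wo": rng.normal(size=(2, 64)) * 0.1, "bo": np.zeros(2)}
p = fusion_head(rng.normal(size=300), rng.normal(size=100),
                rng.normal(size=100), rng.normal(size=100), params)
```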

2.4. Model training and experimental settings

To ensure reliable predictive performance, we trained the model for 300 epochs with an initial learning rate of 0.001. Additionally, we employed a learning rate decay strategy in the training process to accelerate the model fitting. Specifically, the learning rate was attenuated to half of the previous rate after every 50 training rounds. We also utilized the Adam optimizer (Kingma & Ba, 2014) for model fitting and implemented an early stop strategy to prevent over‐fitting of the model. This strategy required training to be stopped if the model's accuracy did not increase after a certain number of epochs. Lastly, we conducted the training process with 4 × Nvidia 2080 Ti GPUs.
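The learning-rate schedule described above (halving every 50 epochs from an initial 0.001) can be written as:

```python
def learning_rate(epoch, base_lr=0.001, decay_every=50):
    """Halve the learning rate after every 50 training epochs."""
    return base_lr * (0.5 ** (epoch // decay_every))

lr_epoch_75 = learning_rate(75)  # 0.0005 (one halving, after epoch 50)
```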

2.5. Evaluation metrics

To evaluate the predictive performance of models, we chose several well‐known evaluation metrics for machine learning tasks, that is, accuracy (Acc), recall (Rec), precision (Prec), and F1‐score (F1), which are given by the following equations.

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}   (6)

\mathrm{Precision} = \frac{TP}{TP + FP}   (7)

\mathrm{Recall} = \frac{TP}{TP + FN}   (8)

\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}   (9)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
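Equations (6)–(9) translate directly into code:

```python
def metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score (Eqs 6-9)
    from the four confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

acc, prec, rec, f1 = metrics(tp=8, tn=7, fp=2, fn=3)  # acc = 0.75
```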

3. RESULTS

3.1. Visualization of positives and negatives in different layers

The success of deep learning hinges on its powerful feature representation capabilities, where the neural network's layer‐by‐layer cascade structure plays a crucial role in this process. Each level of the cascade receives features from the previous level, undergoes further processing and manipulation to obtain better feature representations, and delivers them to the next level. To illustrate the effectiveness of our proposed model, we visualized this process as depicted in Figure 3. Specifically, we reduced the dimension of the features generated by different layers based on t‐distributed Stochastic Neighbor Embedding (t‐SNE) (Van der Maaten & Hinton, 2008), and selected several layers, including the BERT Encoding, FC Layer (Bert), Conv1d Layer, BiLSTM Layer, Concatenate Layer, and Latent Feature, to visualize AFPs and non‐AFPs. In Figure 3, blue dots represent non‐AFPs and yellow dots represent AFPs. Our results show that deeper layers in the network demonstrate better feature extraction ability. Notably, in the final layer, the proposed model extracted 64‐dimensional features, which enabled us to easily distinguish AFPs from non‐AFPs.

FIGURE 3.

Visualization of positive and negative samples in different layers of DeepAFP. The visualized layers include (a) BERT encoding, (b) the FC layer after BERT, (c) Conv1D layer, (d) BiLSTM layer, (e) Concatenate layer, and (f) latent feature. These layers correspond to the network architecture in Figure 2. The high‐dimensional features were reduced to two dimensions using t‐SNE. The yellow dots denote AFPs and the blue dots denote non‐AFPs.

3.2. Comparison with other existing AFP prediction tools

To demonstrate the capabilities of our proposed method, we compared the performance of our models with that of other available methods on independent test sets, which included AFPtransferPred (Lobo et al., 2023), Zhang et al.'s work (Zhang, Yang, et al., 2021), TransImbAMP (Pang et al., 2022), iAMPpred (Meher et al., 2017), iAMPCN (Xu et al., 2023), AMPfun (Chung et al., 2020), AntiFP (Agrawal et al., 2018), and AFPDeep (Fang et al., 2019). We also tried to compare our proposed framework with AFP‐MFL (Fang et al., n.d.) and Deep‐AFPpred (Sharma et al., 2022), but their websites or codes are currently not available. These methods utilized different features and algorithms to identify AFPs, and the predictive results are summarized in Table 2.

TABLE 2.

Performance comparison between this study and other existing AFP prediction tools.

Dataset Method Acc (%) Rec (%) Prec (%) F1 (%)
DeepAFP‐Main AFPtransferPred 58.35 49.14 60.34 54.17
Zhang et al. 65.84 64.66 66.55 65.59
TransImbAMP 66.61 91.07 61.20 73.20
iAMPpred 77.45 74.10 84.54 78.97
iAMPCN 79.35 92.44 73.30 81.76
AMPfun 80.55 79.47 82.47 80.94
AntiFP 84.68 85.82 83.16 84.47
AFPDeep 91.05 93.13 89.44 91.25
This study 93.29 95.53 91.45 93.45
DeepAFP‐Set 1 AFPtransferPred 48.97 47.77 48.94 48.35
Zhang et al. 55.99 68.55 54.65 60.82
iAMPpred 61.34 57.82 83.85 68.44
TransImbAMP 62.20 92.10 57.63 70.90
iAMPCN 69.59 91.07 63.70 74.96
AMPfun 75.95 71.39 86.60 78.26
AntiFP 87.29 88.34 85.91 87.11
AFPDeep 90.21 87.29 92.70 89.92
This study 92.44 93.13 91.86 92.49
DeepAFP‐Set 2 AFPtransferPred 66.32 47.42 76.24 58.47
TransImbAMP 70.27 91.75 64.18 75.53
Zhang et al. 76.26 65.34 83.41 73.28
iAMPCN 84.19 90.72 80.24 85.16
AMPfun 85.91 86.67 84.88 85.76
iAMPpred 86.77 89.93 82.82 86.23
AntiFP 90.21 95.00 84.88 89.66
AFPDeep 94.67 94.16 95.14 94.65
This study 96.05 95.19 96.85 96.01

As shown in Table 2, our proposed method demonstrated remarkable performance on all three datasets and outperformed the other available prediction tools. Specifically, DeepAFP achieved an accuracy of 93.29% and an F1‐score of 93.45% on DeepAFP‐Main, exceeding the best‐performing existing method, AFPDeep, by more than 2 percentage points in accuracy. Additionally, DeepAFP also outperformed the existing methods on DeepAFP‐Set 1 and DeepAFP‐Set 2, achieving 92.44% and 96.05% accuracy, respectively, surpassing the best competing approach by 2.23 and 1.38 percentage points. These results suggest that DeepAFP is capable of capturing more effective information associated with the antifungal activity of peptides.

Overall, our model delivers substantial improvement over the existing methods in all measures, demonstrating that DeepAFP can provide more stable performance. There are several reasons why our proposed model exhibits outstanding performance. First, our method employs a wider range of peptide features, incorporating the composition information, physicochemical properties, and evolutionary information of peptides. These features profile the peptide comprehensively from different perspectives, which is essential for model construction. Second, compared to conventional machine learning approaches, our proposed method can automatically capture hidden features by deeply mining peptide information through a layer‐by‐layer cascade architecture. DeepAFP incorporates CNN and BiLSTM to learn the local features of a sequence and capture its contextual dependencies, which is crucial for sequence classification tasks. Third, our proposed method incorporates multiple convolutional branches combined with recurrent neural layers. Kernels of different sizes across the branches receive information from peptides and perform convolution operations in parallel. Compared to traditional single‐branch CNN‐BiLSTM networks, our proposed network yields a richer variety of features by extracting peptide local features at multiple granularities, which is beneficial for sequence analysis tasks. Finally, DeepAFP integrates a transfer learning strategy, using a large‐scale pretrained protein model to extract more effective protein representations, which accelerates peptide classification and enhances the robustness of the model.

3.3. Comparative performance analysis for different amino acid coding approaches

In this study, we employed the Binary Profile, BLOSUM62 matrix, and Z‐Scale matrix to represent the peptide sequences. To investigate the impact of these descriptors on the model performance, we conducted experiments using various combinations of the three feature matrices. The experiment results on DeepAFP‐Main are presented in Table 3.

TABLE 3.

Performance comparison using different amino acid coding approaches on independent test of DeepAFP‐Main. Bold font indicates the highest value of the evaluation indicator.

Feature matrix Acc (%) Rec (%) Prec (%) F1 (%)
Binary 93.12 93.47 92.83 93.15
BLOSUM 92.25 92.44 92.12 92.28
Zsl 92.77 92.44 93.08 92.76
Binary + BLOSUM 92.25 90.03 94.24 92.09
Binary + Zsl 93.29 92.78 93.75 93.26
BLOSUM + Zsl 92.25 90.72 93.62 92.15
Binary + BLOSUM + Zsl 93.29 95.53 91.45 93.45

The results in Table 3 demonstrate that the model trained with only the Binary Profile outperformed those trained with only BLOSUM62 matrix and Zsl matrix, achieving an accuracy of 93.12% and an F1‐score of 93.15%. Additionally, combining these feature matrices led to even better performance, with the model trained using the Binary Profile, BLOSUM62 matrix, and Zsl matrix achieving the highest accuracy of 93.29% and F1‐score of 93.45%. Similar results were observed on DeepAFP‐Set 1 and DeepAFP‐Set 2, as shown in Tables S1 and S2.

These findings indicate that a comprehensive representation of peptide sequences can be achieved by integrating information about their composition, evolution, and physicochemical properties. First, the Binary Profile provides a straightforward encoding of amino acid sequences, reflecting the type and order of amino acids in a peptide. Second, the BLOSUM62 matrix indicates the likelihood of one amino acid being replaced by another within a given cluster. Consequently, the BLOSUM62 matrix conveys valuable evolutionary information about peptide sequences. Previous studies have demonstrated that incorporating such evolutionary information can enhance the performance of protein function prediction tasks (An et al., 2019; Li & Liu, 2020). Finally, the Zsl matrix of amino acids provides rich information on physicochemical properties, which is crucial for building robust models in multi‐omics tasks (Lin et al., 2021; Yao et al., 2023).

3.4. Ablation experiment

To further validate the rationality of the model architecture and investigate the impact of different modules on the prediction outcomes, we conducted ablation experiments. Specifically, we sequentially removed the CNN‐BiLSTM module and the Bert module, and evaluated the performance of the trained models, as displayed in Figure 4 and Table S3.

FIGURE 4.

Performance comparison of ablation experiments on (a) DeepAFP‐Main, (b) DeepAFP‐Set 1, and (c) DeepAFP‐Set 2.

The results of the ablation experiment indicate that the CNN‐BiLSTM module had a greater positive effect on AFP prediction than the Bert module. In DeepAFP‐Main, the CNN‐BiLSTM module achieved an accuracy of 92.77% and an F1‐score of 92.83%, while the Bert module attained an accuracy of 90.19% and an F1‐score of 90.19%. Moreover, the fusion of the two modules further enhanced the prediction of AFP, achieving more precise and balanced prediction and outperforming models with only a single module in all measures. Similar conclusions can be drawn for DeepAFP‐Set 1 and DeepAFP‐Set 2.

These findings demonstrate that the integration of transformer‐based pretrained protein models contributes to a more robust model and improves performance. The improvements can be attributed to the efficient encoding of peptide sequences via the transformer‐based architecture, which allows the model to generate more accurate representations for differentiating AFPs. It is therefore not surprising that integrating a transfer learning strategy enhances the model's performance.

Furthermore, we conducted comparative experiments with a single CNN‐BiLSTM architecture; the results are presented in Table 4. The network employing the combined kernels of CNN‐BiLSTM achieved superior accuracy and F1‐scores across all three datasets, underscoring the capacity of the combined‐kernel architecture to improve the precision of AFP predictions. The improvement can be attributed to the use of distinct kernel sizes, which extract features at varying granularities from the feature matrix and thereby augment the neural network's feature exploration capabilities.

TABLE 4. Performance comparison with the single CNN‐BiLSTM architecture on three datasets. Bold font indicates the highest value of each evaluation metric.

Dataset        Method                                                     Acc (%)  Rec (%)  Prec (%)  F1 (%)
DeepAFP‐Main   Single CNN‐BiLSTM + Bert Encoding                          92.43    91.75    93.03     92.39
DeepAFP‐Main   Combined kernels of CNN‐BiLSTM + Bert Encoding (DeepAFP)   93.29    95.53    91.45     93.45
DeepAFP‐Set 1  Single CNN‐BiLSTM + Bert Encoding                          92.10    91.75    92.39     92.07
DeepAFP‐Set 1  Combined kernels of CNN‐BiLSTM + Bert Encoding (DeepAFP)   92.44    93.13    91.86     92.49
DeepAFP‐Set 2  Single CNN‐BiLSTM + Bert Encoding                          95.36    94.85    95.83     95.34
DeepAFP‐Set 2  Combined kernels of CNN‐BiLSTM + Bert Encoding (DeepAFP)   96.05    95.19    96.85     96.01
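The combined‐kernel idea can be sketched with a toy NumPy stand‑in (random kernels in place of the trained network's learned weights): parallel branches convolve the same feature matrix with kernels of different widths, each feature map is global‑max‑pooled, and the pooled outputs are concatenated, so features of several granularities reach the classifier.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """'Valid' 1-D convolution of an (L, d) feature matrix with a
    (k, d) kernel, yielding one feature map of length L - k + 1."""
    k = kernel.shape[0]
    return np.array([np.sum(x[i:i + k] * kernel)
                     for i in range(x.shape[0] - k + 1)])

def combined_kernel_features(x, kernel_sizes=(2, 4, 8), seed=0):
    """Run parallel branches with different kernel widths over the same
    input, global-max-pool each feature map, and concatenate the pooled
    outputs. Random kernels stand in for learned weights."""
    rng = np.random.default_rng(seed)
    pooled = [conv1d_valid(x, rng.standard_normal((k, x.shape[1]))).max()
              for k in kernel_sizes]
    return np.array(pooled)

# A 30-residue peptide with 23 feature channels per position (illustrative).
feats = combined_kernel_features(np.ones((30, 23)))
```

In the full model each branch would produce many feature maps rather than one, but the principle is the same: wider kernels capture longer sequence motifs, narrower ones capture local composition.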

3.5. ROC curves

To provide a more comprehensive understanding of the proposed method's performance, we have generated ROC curves for models trained using various feature matrices and conducted ablation analyses on three datasets. The ROC curves for the DeepAFP‐Main, DeepAFP‐Set 1, and DeepAFP‐Set 2 are presented in Figures S1–S3, respectively.

Figure S1A displays the ROC curves for models employing different feature matrices on DeepAFP‐Main. The blue curve represents the model trained with the Binary Profile, BLOSUM62 matrix, and Zsl matrix, which has the largest area under the curve (AUC) value of 0.933. This indicates that the binary composition information, evolutionary information, and physicochemical properties of amino acids all contribute favorably to AFP prediction. Figure S1B shows the ROC curves for the ablation analysis, where the blue curve corresponds to the model trained with both the CNN‐BiLSTM and Bert modules, while the orange and green curves correspond to models trained with only the Bert module and only the CNN‐BiLSTM module, respectively. The model trained with both modules achieves a larger AUC, indicating that both the Bert and CNN‐BiLSTM modules have a positive impact on the AFP prediction task.
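AUC values like those above can be reproduced from raw prediction scores; a minimal rank‑based implementation (equivalent to what sklearn.metrics.roc_auc_score computes, with tied scores ignored for brevity) is:

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive is scored above a randomly chosen
    negative. Tied scores are not handled, for brevity."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

auc = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])  # perfect separation
```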

3.6. Feature analysis

DeepAFP employs two modules to automatically learn and extract intrinsic features from AFPs. To evaluate the effectiveness of the features learned by DeepAFP, we compared them with five commonly used manual features: amino acid composition (AAC), dipeptide composition (DPC), composition of k‐spaced amino acid group pairs (CKSAAGP), pseudo‐amino acid composition (PAAC), and physicochemical properties (PHYC); these five peptide descriptors are introduced in Data S1. The features generated by DeepAFP are denoted as "Proposed."

To visually compare the features generated by DeepAFP with the other hand‐crafted features, we visualized all six feature sets on the independent test sets. We employed t‐SNE to reduce each feature set to a two‐dimensional space and visualize the distribution of AFPs and non‐AFPs. The results on DeepAFP‐Main, DeepAFP‐Set 1, and DeepAFP‐Set 2 are shown in Figure 5, Figure S4, and Figure S5, respectively. As Figure 5 shows, compared with the hand‐crafted features, the features extracted by DeepAFP cluster positive and negative samples more cleanly, with fewer samples scattered into the opposite cluster. This demonstrates that the features automatically derived by DeepAFP are more discriminative and robust for AFP identification. In contrast to traditional manual features that rely on prior knowledge, DeepAFP automatically learns deeper representations via its network modules, suggesting that the latent information learned by neural networks can better represent peptides and improve prediction performance. In our model, a pretrained model trained on a large‐scale protein dataset extracts biological features of peptides. Furthermore, the model deeply mines the binary, evolutionary, and physicochemical features of peptides, endowing the derived latent features with richer biological meaning than traditional sequence‐based manual features.
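The visualization step can be reproduced with scikit‑learn's t‑SNE. The snippet below uses random Gaussian clusters as stand‑ins for the 64‑dimensional latent features (the real inputs would be the features DeepAFP extracts from the test sets):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-ins for 64-dimensional latent features: one shifted cluster
# for AFPs and another for non-AFPs (30 samples each).
afp = rng.normal(loc=2.0, size=(30, 64))
non_afp = rng.normal(loc=-2.0, size=(30, 64))
features = np.vstack([afp, non_afp])
labels = np.array([1] * 30 + [0] * 30)

# Reduce to 2-D for plotting; perplexity must be below n_samples.
embedding = TSNE(n_components=2, perplexity=10,
                 random_state=0).fit_transform(features)
```

Each row of `embedding` is then drawn as a scatter point colored by `labels`, which is how the panels of Figure 5 are produced.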

FIGURE 5. Visualization of positive and negative samples with different features on DeepAFP‐Main. Several commonly used peptide descriptors, including AAC, DPC, PAAC, CKSAAGP, and PHYC, were selected for comparison with the latent features generated by DeepAFP. The high‐dimensional features were reduced to two dimensions using t‐SNE. Yellow dots denote AFPs and blue dots denote non‐AFPs.

The latent features learned by DeepAFP are constructed from 64 neurons associated with AFPs and non‐AFPs. To further investigate the discriminative capacity of the learned latent features with respect to antifungal activity, we randomly selected 50 AFP and 50 non‐AFP samples from DeepAFP‐Main and performed clustering analysis on their latent features. The clustering results in Figure 6A clearly show that AFPs and non‐AFPs are grouped into two distinct sub‐trees, further illustrating that the latent features learned by DeepAFP accurately capture the antifungal activity characteristics of peptides. In addition, for each latent feature, peptides of the same type (AFP or non‐AFP) often have similar feature values. To further investigate the relationship between the latent features and the physicochemical properties of peptides, we calculated the Pearson correlation coefficients between each latent feature and several physicochemical properties, as shown in Figure 6B. Notably, Figure 6B exhibits a pattern similar to that of Figure 6A: latent features (e.g., numbers 1, 6, and 9) that are significantly positively correlated with hydrophobicity, Boman index, IEP, and net charge exhibit higher values in AFP samples.
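The analysis behind Figure 6B reduces to computing Pearson coefficients between each latent‑feature column and a property vector. A small sketch with synthetic data (the latent features and the net‑charge values here are simulated, not computed from real peptides):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 4))          # stand-in latent feature matrix
# Simulated net charge, deliberately coupled to latent feature 0.
net_charge = 0.8 * latent[:, 0] + rng.normal(scale=0.5, size=100)

corrs = [pearson(latent[:, j], net_charge) for j in range(latent.shape[1])]
```

Repeating this for each property (hydrophobicity, Boman index, IEP, net charge) and each of the 64 latent features yields the correlation heat map.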

FIGURE 6. The clustering analysis map of latent features generated by DeepAFP. (a) Heat map of latent features for 100 randomly selected samples. (b) Heat map of correlation coefficients between latent features and several physicochemical properties of peptides.

3.7. Tool availability

To meet the demand for large‐scale prediction, a downloadable desktop application for Windows is available at https://awi.cuhk.edu.cn/dbAMP/DeepAFP.html.

After downloading and unzipping the package, click "main.exe" to start the program. Because the Bert module integrates a protein pretraining model, it requires substantial computational resources. Based on the ablation results, the predictive performance of the CNN‐BiLSTM module alone is comparable to that of the full model; therefore, to balance computational cost and predictive performance, the desktop program is based on the CNN‐BiLSTM module. As shown in Figure S6, the user first enters protein sequences in FASTA format and selects a model. Clicking the "Submit" button then initiates the prediction process. Once the progress bar reaches 100%, the "Result" button can be clicked to view the predictions, and the "Download" button exports them. The desktop program not only satisfies the demand for large‐scale prediction but also enables researchers without deep learning expertise to use our framework easily and conveniently.

4. CONCLUSION

Fungal infections are a major public health problem worldwide, affecting millions of people annually. Despite the availability of antifungal drugs, the emergence of drug‐resistant fungal strains has become a significant concern. This highlights the need for the development of new antifungal agents and the optimization of existing drugs to overcome resistance. The potential therapeutic use of AFPs is currently being explored due to their unique properties, such as low toxicity and low likelihood of inducing drug resistance. However, identifying novel AFPs through traditional wet‐lab methods can be time‐consuming, laborious, and expensive. In recent years, advances in artificial intelligence (AI) have enabled the development of computational approaches to accelerate the discovery of novel AFPs.

In this study, we developed a deep learning framework called DeepAFP that fully leverages the compositional, evolutionary, and physicochemical properties of peptides. In addition, our proposed method integrates a transfer learning strategy, utilizing a pretrained language model to obtain efficient representations of peptides and enhance the predictive performance of the model. In the future, we will explore embeddings from multiple protein language models to further enhance the predictive capabilities of our models.

Extensive experimental results demonstrate that our proposed framework outperforms existing AFP prediction tools, achieving state‐of‐the‐art performance. Finally, we made a downloadable desktop application for Windows systems to facilitate the usage of our framework by the public or other researchers. Our proposed framework can accurately and rapidly identify AFPs with minimal manpower and resources. Moreover, our approach provides a new perspective for other biological sequence analysis tasks.

AUTHOR CONTRIBUTIONS

Lantian Yao and Tzong‐Yi Lee presented the idea. Lantian Yao and Yuntian Zhang implemented the framework and built the pipeline. Wenshuo Li and Jiahui Guan collected the data. Wenyang Zhang, Wenshuo Li, and Chia‐Ru Chung analyzed the results. Tzong‐Yi Lee and Ying‐Chih Chiang supervised the research project.

FUNDING INFORMATION

This work was supported by the Guangdong Province Basic and Applied Basic Research Fund (2021A1515012447), National Natural Science Foundation of China (32070659), and the Kobilka Institute of Innovative Drug Discovery, The Chinese University of Hong Kong, Shenzhen, China. This work was also supported by the Center for Intelligent Drug Systems and Smart Bio‐devices, National Yang Ming Chiao Tung University, Taiwan.

CONFLICT OF INTEREST STATEMENT

The authors declare no competing interests.

Supporting information

Data S1: Supplementary Information.

ACKNOWLEDGMENTS

The authors sincerely appreciate Kobilka Institute of Innovative Drug Discovery, The Chinese University of Hong Kong (Shenzhen), and the “Center for Intelligent Drug Systems and Smart Bio‐devices” from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education in Taiwan.

Yao L, Zhang Y, Li W, Chung C‐R, Guan J, Zhang W, et al. DeepAFP: An effective computational framework for identifying antifungal peptides based on deep learning. Protein Science. 2023;32(10):e4758. 10.1002/pro.4758

Review Editor: Nir Ben‐Tal

Lantian Yao and Yuntian Zhang contributed equally to this work.

Contributor Information

Ying‐Chih Chiang, Email: chiangyc@cuhk.edu.cn.

Tzong‐Yi Lee, Email: leetzongyi@nycu.edu.tw.

REFERENCES

1. Agrawal P, Bhalla S, Chaudhary K, Kumar R, Sharma M, Raghava GP. In silico approach for prediction of antifungal peptides. Frontiers in Microbiology. 2018;9:323.
2. Ahmed MZ, Rao T, Saeed A, Mutahir Z, Hameed S, Inayat S, et al. Antifungal drugs: mechanism of action and resistance. Biochemistry of Drug Resistance. 2021;143–165.
3. An J‐Y, Zhou Y, Zhang L, Niu Q, Wang D‐F. Improving self‐interacting proteins prediction accuracy using protein evolutionary information and weighed‐extreme learning machine. Current Bioinformatics. 2019;14:115–122.
4. Baumgardner DJ. Soil‐related bacterial and fungal infections. Journal of the American Board of Family Medicine. 2012;25:734–744.
5. Bongomin F, Gago S, Oladele RO, Denning DW. Global and multi‐national prevalence of fungal diseases: estimate precision. Journal of Fungi. 2017;3:57.
6. Browne K, Chakraborty S, Chen R, Willcox MD, Black DS, Walsh WR, et al. A new era of antibiotics: the clinical potential of antimicrobial peptides. International Journal of Molecular Sciences. 2020;21:7047.
7. Butts A, Krysan DJ. Antifungal drug discovery: something old and something new. PLoS Pathogens. 2012;8(9):e1002870.
8. Campoy S, Adrio JL. Antifungals. Biochemical Pharmacology. 2017;133:86–96.
9. Chen J, Cheong HH, Siu SW. xDeep‐AcPEP: deep learning method for anticancer peptide activity prediction based on convolutional neural network and multitask learning. Journal of Chemical Information and Modeling. 2021;61:3789–3803.
10. Chung C‐R, Kuo T‐R, Wu L‐C, Lee T‐Y, Horng J‐T. Characterization and identification of antimicrobial peptides with different functional activities. Briefings in Bioinformatics. 2020;21:1098–1114.
11. Cornely OA, Lass‐Flörl C, Lagrou K, Arsic‐Arsenijevic V, Hoenigl M. Improving outcome of fungal diseases: guiding experts and patients towards excellence. Mycoses. 2017;60:420–425.
12. Craik DJ, Fairlie DP, Liras S, Price D. The future of peptide‐based drugs. Chemical Biology & Drug Design. 2013;81:136–147.
13. Devlin J, Chang M‐W, Lee K, Toutanova K. BERT: pre‐training of deep bidirectional transformers for language understanding. arXiv preprint, arXiv:1810.04805. 2018.
14. Dhama K, Chakraborty S, Verma AK, Tiwari R, Barathidasan R, Kumar A, et al. Fungal/mycotic diseases of poultry: diagnosis, treatment and control: a review. Pakistan Journal of Biological Sciences: PJBS. 2013;16:1626–1640.
15. Fang C, Moriwaki Y, Li C, Shimizu K. Prediction of antifungal peptides by deep learning with character embedding. IPSJ Transactions on Bioinformatics. 2019;12:21–29.
16. Fang Y, Xu F, Wei L, Jiang Y, Chen J, Wei L, et al. AFP‐MFL: accurate identification of antifungal peptides using multi‐view feature learning. Briefings in Bioinformatics. 2023;24.
17. Fisher MC, Alastruey‐Izquierdo A, Berman J, Bicanic T, Bignell EM, Bowyer P, et al. Tackling the emerging threat of antifungal resistance to human health. Nature Reviews Microbiology. 2022;20:557–571.
18. Ghannoum MA, Rice LB. Antifungal agents: mode of action, mechanisms of resistance, and correlation of these mechanisms with bacterial resistance. Clinical Microbiology Reviews. 1999;12:501–517.
19. Jhong J‐H, Yao L, Pang Y, Li Z, Chung C‐R, Wang R, et al. dbAMP 2.0: updated resource for antimicrobial peptides with an enhanced scanning method for genomic and proteomic data. Nucleic Acids Research. 2022;50:D460–D470.
20. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint, arXiv:1412.6980. 2014.
21. Lestrade PP, Bentvelsen RG, Schauwvlieghe AF, Schalekamp S, van der Velden WJ, Kuiper EJ, et al. Voriconazole resistance and mortality in invasive aspergillosis: a multicenter retrospective cohort study. Clinical Infectious Diseases. 2019;68:1463–1471.
22. Li C‐C, Liu B. MotifCNN‐fold: protein fold recognition based on fold‐specific features extracted by motif‐based convolutional neural networks. Briefings in Bioinformatics. 2020;21:2133–2141.
23. Lin T‐T, Yang L‐Y, Lu I‐H, Cheng W‐C, Hsu Z‐R, Chen S‐H, et al. AI4AMP: an antimicrobial peptide predictor using physicochemical property‐based encoding method and deep learning. mSystems. 2021;6:e00299‐21.
24. Liu B, Li C‐C, Yan K. DeepSVM‐fold: protein fold recognition by combining support vector machines and pairwise sequence similarity scores generated by deep learning networks. Briefings in Bioinformatics. 2020;21:1733–1741.
25. Lobo F, González MS, Boto A, Pérez de la Lastra JM. Prediction of antifungal activity of antimicrobial peptides by transfer learning from protein pretrained models. International Journal of Molecular Sciences. 2023;24:10270.
26. Ma R, Li S, Li W, Yao L, Huang H‐D, Lee T‐Y. KinasePhos 3.0: redesign and expansion of the prediction on kinase‐specific phosphorylation sites. Genomics, Proteomics & Bioinformatics. 2023;21:228–241.
27. Meher PK, Sahu TK, Saini V, Rao AR. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico‐chemical and structural features into Chou's general PseAAC. Scientific Reports. 2017;7:1–12.
28. Moravej H, Moravej Z, Yazdanparast M, Heiat M, Mirhosseini A, Moosazadeh M, et al. Antimicrobial peptides: features, action, and their resistance mechanisms in bacteria. Microbial Drug Resistance. 2018;24:747–767.
29. Pang KR, Wu JJ, Huang DB, Tyring SK. Subcutaneous fungal infections. Dermatologic Therapy. 2004;17:523–531.
30. Pang Y, Yao L, Xu J, Wang Z, Lee T‐Y. Integrating transformer and imbalanced multi‐label learning to identify antimicrobial peptides and their functional activities. Bioinformatics. 2022;38:5368–5374.
31. Perfect JR. The antifungal pipeline: a reality check. Nature Reviews Drug Discovery. 2017;16:603–616.
32. Qureshi A, Tandon H, Kumar M. AVP‐IC50Pred: multiple machine learning techniques‐based prediction of peptide antiviral activity in terms of half maximal inhibitory concentration (IC50). Peptide Science. 2015;104:753–763.
33. Rao R, Bhattacharya N, Thomas N, Duan Y, Chen P, Canny J, et al. Evaluating protein transfer learning with TAPE. Advances in Neural Information Processing Systems. 2019;32:9689–9701.
34. Rapp RP. Changing strategies for the management of invasive fungal infections. Pharmacotherapy: The Journal of Human Pharmacology and Drug Therapy. 2004;24:4S–28S.
35. Sandberg M, Eriksson L, Jonsson J, Sjöström M, Wold S. New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids. Journal of Medicinal Chemistry. 1998;41:2481–2491.
36. Seyedmousavi S, Bosco S d M, de Hoog S, Ebel F, Elad D, Gomes RR, et al. Fungal infections in animals: a patchwork of different situations. Medical Mycology. 2018;56:S165–S187.
37. Seyedmousavi S, Rafati H, Ilkit M, Tolooe A, Hedayati MT, Verweij P. Systemic antifungal agents: current status and projected future developments. In: Lion T, editor. Human fungal pathogen identification: methods and protocols. Volume 1508. New York, NY: Humana Press; 2017. p. 107–139.
38. Sharma R, Shrivastava S, Singh SK, Kumar A, Saxena S, Singh RK. Deep‐AFPpred: identifying novel antifungal peptides using pretrained embeddings from seq2vec with 1DCNN‐BiLSTM. Briefings in Bioinformatics. 2022;23:bbab422.
39. Struyfs C, Cammue BP, Thevissen K. Membrane‐interacting antifungal peptides. Frontiers in Cell and Developmental Biology. 2021;9:649875.
40. Trivedi R, Nagarajaram HA. Substitution scoring matrices for proteins: an overview. Protein Science. 2020;29:2150–2163.
41. van der Maaten L, Hinton G. Visualizing data using t‐SNE. Journal of Machine Learning Research. 2008;9:2579–2605.
42. Wei L, Ye X, Sakurai T, Mu Z, Wei L. ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning. Bioinformatics. 2022;38:1514–1524.
43. Wiederhold NP. Antifungal resistance: current trends and future strategies to combat. Infection and Drug Resistance. 2017;10:249–259.
44. Xu J, Li F, Li C, Guo X, Landersdorfer C, Shen H‐H, et al. iAMPCN: a deep‐learning approach for identifying antimicrobial peptides and their functional activities. Briefings in Bioinformatics. 2023;24:bbad240.
45. Yao L, Li W, Zhang Y, Deng J, Pang Y, Huang Y, et al. Accelerating the discovery of anticancer peptides through deep forest architecture with deep graphical representation. International Journal of Molecular Sciences. 2023;24:4328.
46. Zhang J, Yang L, Tian Z, Zhao W, Sun C, Zhu L, et al. Large‐scale screening of antifungal peptides based on quantitative structure–activity relationship. ACS Medicinal Chemistry Letters. 2021;13:99–104.
47. Zhang Y, Lin J, Zhao L, Zeng X, Liu X. A novel antibacterial peptide recognition algorithm based on BERT. Briefings in Bioinformatics. 2021;22:bbab200.
