Summary
Here, we describe the use of TriNet to predict peptides with anticancer and antimicrobial properties via a tri-fusion neural network. We detail the use of TriNet for both the offline Python script version and the online service, thereby demonstrating its convenience for users. In addition, we provide a detailed explanation of the training process of TriNet to enhance the understanding of researchers seeking to leverage deep learning techniques for peptide classification.
For complete details on the use and execution of this protocol, please refer to Zhou et al.1
Subject areas: Bioinformatics, Sequence Analysis, Cancer, Microbiology, Computer Sciences
Graphical abstract

Highlights
• Steps for software installation and data preparation
• Steps for predicting anticancer and antimicrobial peptides using the TriNet approach
• Details on how to utilize the TriNet online web service
• Steps for training the TriNet model by executing the provided Python code
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
Before you begin
This protocol details the use of TriNet,1 which effectively fuses the information contained in the serial fingerprints, sequence evolution, and physicochemical properties of peptide sequences. TriNet is a method for predicting anticancer and antimicrobial peptides (ACPs and AMPs) trained on peptide sequences from two training sets, ACPalt2 and AMPlify,3 which contain experimentally validated ACPs and AMPs together with non-ACPs and non-AMPs that have no annotations related to biological activities. The method is implemented in Python with the help of several deep learning libraries, such as TensorFlow, Scikit-learn, and NumPy. Below, we introduce the two important classes of peptides (ACPs and AMPs), the minimal hardware requirements, and the download and installation steps.4 The software can be installed and run under both Linux and Windows environments.
Introduction
Antimicrobial peptides (AMPs), also referred to as host defense peptides, are a class of short peptides ubiquitously present in a plethora of organisms, ranging from microorganisms to humans.5 Anticancer peptides (ACPs), a critical subset of AMPs, are selectively cytotoxic to cancer cells.6 There are many challenges associated with conventional anticancer therapies, including drug resistance, adverse reactions, lack of target specificity, and prohibitive costs.7 In contrast, certain AMPs can trigger cancer cell death via apoptotic pathway induction.8 The multifaceted action and reduced drug resistance of ACPs and AMPs render them promising candidates for anticancer therapies.9 Notably, several ACPs, such as LTX-315,10 have undergone clinical validation for the treatment of various malignancies, while AMPs, such as Pexiganan,11 have found application in the treatment of infectious diseases. AMPs differ from small-molecule antibiotics owing to their comparatively lower susceptibility to the development of resistance in pathogenic bacteria. In addition, AMPs possess a robust intracellular evolutionary barrier, inhibiting the horizontal transfer of established resistance mechanisms.12 Given the medical demand for innovative anticancer medications, AMPs may represent a potential source of effective therapeutic agents, either as standalone interventions or in conjunction with other small molecules.13 Despite production challenges, prohibitive costs, and short half-lives,14 AMPs have demonstrated efficacy and safety profiles that substantiate their viability as therapeutic agents in oncological interventions.13 Consequently, continued investigation and application of ACPs and AMPs are of paramount importance to the evolution of pharmacotherapy, particularly in the domains of oncology and infectious diseases.
Downloading and installing the TriNet toolkit
Timing: <30 min
This step describes the installation scripts for required packages and software.
1. Download the TriNet toolkit from https://github.com/hjy23/TriNet.
Note: To download the source code, click the ‘Code’ button and select ‘Download ZIP’ (Figure 1A). After extracting the ZIP file, you will obtain a range of files, including the example datasets used by the software and their related PSSM profiles, the well-trained model files, a YAML file to set up the environment, and a ReadMe file.
Figure 1.
Downloading and Launching TriNet
(A) The TriNet toolkit is available at https://github.com/hjy23/TriNet.
(B) The PowerShell prompt console with the written function to launch TriNet.
Note: TriNet is functional on both the Linux and Windows operating systems with Python 3. We recommend Python version 3.8. A simple way to run Python on your operating system is to use Anaconda.15
2. Download Anaconda from https://www.anaconda.com/products/individual according to your computer specifications.
3. After Anaconda is installed, launch the PowerShell Prompt console from the Anaconda Navigator.
4. Before using TriNet for the first time, create a virtual environment and install the following dependencies.
a. Create an Anaconda environment named “TriNet” and complete the automatic installation of all the required libraries and software packages based on the dependencies stored in the “environment.yml” file.
>cd {downloads_path}/TriNet-master
>conda env create -n TriNet -f environment.yml
Note: This process allows for the isolation of the Python environment and packages from the system and enables seamless replication of identical environments on different machines.
b. Activate the TriNet virtual environment.
>conda activate TriNet
c. To optimize the performance of TriNet, you can configure TensorFlow-GPU if your system is equipped with a supported GPU, as follows.
>conda install -c conda-forge cudatoolkit=11.2
>conda install -c conda-forge cudnn=8.1
>pip install tensorflow-gpu==2.6
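Note: After installation, you can quickly check whether TensorFlow detects the GPU by running the following command inside the activated TriNet environment; the printed list will be empty if the CUDA setup failed. This check is generic TensorFlow usage, not part of TriNet itself.
>python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"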
Key resources table
Step-by-step method details
TriNet is a peptide identification method that can be used directly with its well-trained model. In addition, to enable researchers and developers to effectively apply this framework by training it on their own peptide data, we provide a comprehensive step-by-step guide that details the process of training TriNet on input peptide information supplied in fixed-format files.
Data preparation
Timing: <5 min
This step illustrates the process of preparing and formatting data to satisfy the requirements of TriNet.
Note: TriNet captures relevant information from peptide sequences, including serial fingerprints, sequence evolution, and physicochemical properties. These features are then fed into a well-trained deep learning model for classification. TriNet automatically calculates the serial fingerprints and physicochemical properties directly from the primary peptide sequences in the form of a FASTA file. However, for the sequence evolution information, the standard mode of TriNet requires you to provide the position-specific scoring matrix (PSSM) feature profile generated via a sequence similarity alignment using BLAST (Basic Local Alignment Search Tool). For convenience, we also provide a fast mode of TriNet that skips the PSSM calculation, which may slightly reduce the prediction accuracy of TriNet. The process of calculating the PSSM files using BLAST is detailed below.
1. Provide a FASTA file.
Note: The FASTA format is a widely used text format in bioinformatics for storing nucleic acid or peptide sequences.
a. To utilize TriNet, submit a FASTA format file containing the peptide sequence(s) to be predicted.
Note: If you are providing training data instead of prediction data, you need to include both the peptide sequences and a corresponding label for each peptide.
b. Append a label (1 for positive and 0 for negative) to the end of the corresponding peptide header, separated by a ‘|’.
Note: For example, the first peptide is represented by “>ACP_1” for prediction purposes, whereas it should be “>ACP_1|1” for training purposes (Figure 2A). A small helper for adding such labels is sketched below.
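Note: As an illustration only (this helper script is hypothetical and not part of TriNet), the following minimal Python sketch appends a fixed label to every header in a FASTA file to produce a training file; the file names are placeholders.

# label_fasta.py: append '|1' (positive) or '|0' (negative) to FASTA headers.
def label_fasta(in_path, out_path, label):
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                # '>ACP_1' becomes '>ACP_1|1' when label == 1
                fout.write(line.rstrip("\n") + "|" + str(label) + "\n")
            else:
                fout.write(line)

label_fasta("positives.fasta", "positives_labeled.fasta", 1)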
2. Generate PSSM profiles.
Note: To obtain PSSM profiles as the inputs of the standard mode of TriNet, you should download the BLAST program and perform sequence similarity alignments as follows.
Note: The installers and source code are available from https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ (Figure 2B). Download the installation packages compatible with your operating system. Below are examples of how to install the software on Windows and Linux systems.
a. Download and install the BLAST tool on a Windows system.
i. Download the installation file ncbi-blast-2.13.0+-win64.exe.
ii. Install BLAST to the specified location by double-clicking the ncbi-blast-2.13.0+-win64.exe file.
b. Download and install the BLAST tool on a Linux system.
i. Download the installation package ncbi-blast-2.13.0+-x64-linux.tar.gz.
ii. Extract the contents of the package.
>tar -zxvf ncbi-blast-2.13.0+-x64-linux.tar.gz
iii. Add the absolute path of the BLAST executable directory (bin) to the environment variable $PATH to enable direct calls by program name.
>echo "export PATH={path to Blast}/bin:\$PATH" >> ~/.bashrc
c. Download the UniRef90 protein FASTA file and unzip it.
Note: The UniRef90 FASTA file can be downloaded from https://ftp.ebi.ac.uk/pub/databases/uniprot/uniref/uniref90/ (Figure 2C).
Note: UniRef18 (UniProt Reference Clusters) provides clustered sets of sequences from the UniProt Knowledgebase (UniProtKB), an internationally recognized protein sequence resource, to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. UniRef100 combines identical sequences and sub-fragments with 11 or more residues into a single entry. UniRef90 is built by clustering UniRef100 sequences such that each cluster is composed of sequences that have at least 90% sequence identity to, and 80% overlap with, the longest sequence in the cluster (the seed sequence).
>gunzip uniref90.fasta.gz # To obtain the uniref90.fasta file.
d. Build the UniRef90 database using the makeblastdb command in the BLAST program.
>makeblastdb -in {path to uniref90.fasta file}/uniref90.fasta -dbtype prot -out {path to the database to be stored}/uniref90 -parse_seqids
e. Start the alignment. Run the following command to align the query peptide sequence(s) against the database you built to obtain the PSSM profile (Figure 3).
>psiblast -db {path to the uniref90 database} -evalue 0.001 -num_iterations 3 -num_threads 1 -query {path of the query FASTA file} -out_ascii_pssm {path of the output PSSM profile}
f. Set the parameters of psiblast as follows.
i. -db: The path of the database you built, type = str.
ii. -evalue: The expected number of subject sequences that would be retrieved from the database by chance alone (rather than by homology) with a bit score equal to or greater than the one calculated from the alignment of the query and subject sequences, type = float.
iii. -num_iterations: The number of iterations (usually set to 3), type = int.
iv. -num_threads: The number of threads to use (limited by CPU performance), type = int.
v. -query: The path of the sequence(s) you want to align (must be in FASTA format), type = str.
vi. -out_ascii_pssm: The path of the output profile as a PSSM matrix of your query sequence(s), type = str.
Note: The file names of the PSSM profiles should be numbers corresponding to the order of the peptide sequences. For example, if there are N peptide sequences in the FASTA file, the corresponding N PSSM files should be named 1.pssm, 2.pssm, ..., N.pssm (a batch-generation sketch is given after this step).
g. Display the PSSM matrix.
Note: The PSSM describes the conservation degree of amino acid residues in peptide sequences (Figure 3).
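Note: Because each peptide needs its own numbered PSSM profile, the alignment in step 2e must be run once per sequence. The following Python sketch (the helper and its paths are assumptions, not part of TriNet) splits a multi-sequence FASTA file and calls psiblast per peptide, writing 1.pssm, 2.pssm, ..., N.pssm in input order.

import subprocess
from pathlib import Path

def generate_pssms(fasta_path, db_path, out_dir, num_threads=1):
    # Parse the FASTA file into (header, sequence) records, preserving order.
    records, header, seq = [], None, []
    for line in Path(fasta_path).read_text().splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line, []
        elif line.strip():
            seq.append(line.strip())
    if header is not None:
        records.append((header, "".join(seq)))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, (hdr, s) in enumerate(records, start=1):
        query = out / f"{i}.fasta"
        query.write_text(f"{hdr}\n{s}\n")
        # Same psiblast options as in step 2e; failures are tolerated because
        # very short peptides may yield no PSSM (see troubleshooting, problem 2).
        subprocess.run(
            ["psiblast", "-db", str(db_path),
             "-evalue", "0.001", "-num_iterations", "3",
             "-num_threads", str(num_threads),
             "-query", str(query),
             "-out_ascii_pssm", str(out / f"{i}.pssm")],
            check=False)

generate_pssms("ACP_example.fasta", "{path to the uniref90 database}", "./pssm_out")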
3. Collect and screen peptide sequences.
Note: In AMP/ACP collection, a rigorous set of criteria should be applied, encompassing reliable sources, high-grade purity, verified antimicrobial activity, compliance with relevant safety standards, and commendable stability.19 Here, we provide an example of collecting and screening peptides, and we list several public datasets and other sources for collecting potential AMPs to be predicted.
Note: For training, validation, and testing datasets, it is imperative to ascertain the reliability of the data sources. Here, we describe the process of generating the AMPlify3 dataset to explicate the criteria that should be followed when constructing datasets.
a. Select positive samples.
i. Combine publicly accessible AMP datasets, namely, the Antimicrobial Peptide Database (APD3, accessible at http://aps.unmc.edu/AP) and the Database of Anuran Defense Peptides (DADP).
ii. Remove redundant sequences to yield a non-redundant positive dataset of 4173 AMP sequences.
Note: Each sequence within this positive dataset has a length not exceeding 200 amino acid residues (see Table 1 for examples of selected AMPs).
Table 1.
Examples of 5 AMPs and 5 non-AMPs in the AMPlify dataset

| Peptides | Sequences | Labels |
|---|---|---|
| AMP_1 | GLLDTFKNLALNAAKSAGVSVLNSLSCKLSKTC | 1 |
| AMP_2 | AKKPVAKKAAGGVKKPK | 1 |
| AMP_3 | GIIDIAKKLVGGIRNVLGI | 1 |
| AMP_4 | QDKPFWDPPIYPV | 1 |
| AMP_5 | GEIPCGESCVYLPCFLPNCYCRNHVCYLN | 1 |
| non-AMP_1 | FRGHAKGDKKNQKK | 0 |
| non-AMP_2 | SGLQFAVMDGQGFLPFPRV | 0 |
| non-AMP_3 | IVPFLGPLLGLLT | 0 |
| non-AMP_4 | MRAKWRKKRTRRLKRKRRKVRARSK | 0 |
| non-AMP_5 | YIELAVVADHGIFTKYNSNLNTIR | 0 |
b. Select negative samples.
i. Collect sequences with no annotations related to antimicrobial activities and with lengths not exceeding 200 amino acid residues from the UniProtKB/Swiss-Prot repository.
ii. Remove duplicates and sequences containing residues beyond the 20 canonical amino acids, generating a set Sn of candidate negative samples.
iii. Remove the peptide sequences in Sn that are annotated as potential AMPs in the UniProtKB/Swiss-Prot database or included in the 4173 positive samples.
iv. Randomly select 4173 negative samples from the remaining candidate negatives to match the number and length distribution of the positive samples (see Table 1 for examples of selected non-AMPs; a sampling sketch is given after this list).
Note: DRAMP19 (Data Repository of Antimicrobial Peptides) is a publicly accessible, curated repository of resources pertaining to antimicrobial peptides. It provides many candidate peptides whose antimicrobial activities have not yet been assayed. These peptides can be subjected to predictive analysis using TriNet to assess their antimicrobial potential, followed by experimental validation. You can also refer to the following four publicly available AMP databases: ADAM,17 APD,20 CAMP21 and LAMP.22 In addition to the potential AMPs in public databases, you can collect candidate AMPs from other sources. For example, in a recent study,12 all the microbial genomes of the human gut microbiome were assembled, and many potential AMPs were then extracted via predicted ORFs on the assembled genomes.
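Note: The length-matched random selection in the last sub-step can be approximated as follows. This is a minimal sketch under assumed inputs (Python lists of positive sequences and candidate negative sequences); it is not the AMPlify authors' exact procedure.

import random
from collections import defaultdict

def sample_negatives(positives, candidates, bin_width=10, seed=42):
    # Bin the candidate negatives by sequence length, then draw one negative
    # from the bin of each positive to mirror the positive length distribution.
    random.seed(seed)
    bins = defaultdict(list)
    for seq in candidates:
        bins[len(seq) // bin_width].append(seq)
    selected = []
    for seq in positives:
        bucket = bins[len(seq) // bin_width]
        if bucket:
            selected.append(bucket.pop(random.randrange(len(bucket))))
    return selected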
Figure 2.
Display of FASTA files and the downloading interfaces of BLAST and the UniRef90 database
(A) A FASTA format file.
(B) The downloading interface of BLAST.
(C) The download interface of the UniRef90 database.
Figure 3.
Illustration of a PSSM profile
Running TriNet with its default trained model
Timing: <5 min
This step executes the prediction module, which requires you to submit a FASTA file containing one or more peptide sequences. In addition, you may opt to submit corresponding PSSM profiles, which provide evolutionary information for the input peptide sequence(s) and can potentially improve the prediction accuracy. After running, TriNet returns an output file in CSV format (detailed in the section “expected outcomes”).
Note: The amount of time consumed depends on the number and lengths of the submitted peptide sequences.
4. Access the TriNet directory from the PowerShell Prompt console (as shown in Figure 1B).
Note: For example, if the TriNet.py file is located in the TriNet-master folder, which is in the download folder, you should type the following command into the console prompt.
>cd {downloads_path}/TriNet-master
5. Launch the TriNet tool.
a. Run TriNet with the following command.
>python TriNet.py [-h] [--PSSM_file PSSM_FILE] [--sequence_file SEQUENCE_FILE] [--output OUTPUT] [--operation_mode OPERATION_MODE]
b. Set the required parameters as follows.
i. --sequence_file, -s SEQUENCE_FILE: The path of a FASTA format peptide sequence file, type = str.
ii. --operation_mode, -mode OPERATION_MODE: Four options are offered for the ‘-mode’ item, including ‘sc’, ‘sm’, ‘fc’ and ‘fm’, where ‘s’ represents the standard mode (requiring you to provide PSSM profiles), ‘f’ represents the fast mode (requiring only a FASTA file), ‘c’ means the prediction of anticancer peptides, and ‘m’ means the prediction of antimicrobial peptides, type = str. For example, ‘sc’ represents the standard prediction of anticancer peptides.
c. Set the optional parameters as follows.
i. -h, --help: Print the help message.
ii. --output, -o OUTPUT: The path of the prediction result, defaulting to {current path}/output, type = str.
iii. --PSSM_file, -p PSSM_FILE: The path of the PSSM feature profiles, which is required if you choose the standard mode for prediction, type = str.
Note: The following command performs the standard-mode prediction of anticancer peptides, using the ACP_example.fasta file and the PSSM profiles under the ./pssm_acp_example path, and returns the output file acpout.csv.
>python TriNet.py -mode sc -s ./ACP_example.fasta -p ./pssm_acp_example/ -o ./acpout.csv
Utilizing online web service with its default trained model
Timing: <5 min
In addition to the approach for utilizing TriNet mentioned above, we also provide an online web service that enables the prediction of both anticancer and antimicrobial peptides. This alternative approach provides a convenient and accessible means of accessing the functionality of TriNet.
6. Navigate to the web address http://liulab.top/TriNet/server to access the online service page (see Figure 4).
7. Review the notes before using the TriNet online services.
Note: These notes contain essential information that you should be aware of before proceeding with the service.
Note: The use of the fast mode of TriNet for predicting anticancer or antimicrobial peptides requires only a FASTA file containing the peptide sequences to be predicted.
8. Upload a FASTA file and click the “Submit” button to initiate the fast mode analysis process (see Figure 5A).
Note: If you opt for the standard mode of TriNet to predict anticancer or antimicrobial peptides, additional PSSM profiles are needed.
9. Upload all the required files and click the “Submit” button to initiate the standard mode analysis process (see Figure 5B).
Note: The file names of the PSSM profiles should be numbers corresponding to the order of the peptide sequences. For example, if there are N peptide sequences in the FASTA file, the corresponding N PSSM files should be named 1.pssm, 2.pssm, ..., N.pssm.
Note: Upon submission of the task, the program will execute and generate a result file, which can be downloaded (see Figure 6). Additionally, the interface provides an option for you to view the results online (see Figure 6, detailed in the “expected outcomes” section).
Figure 4.
The online web server of TriNet
(A) The web server homepage (http://liulab.top/TriNet/server) offers a “standard mode” using PSSM features and a “fast mode” without using PSSM features for both ACP and AMP predictions. For the fast mode, you only need to upload a FASTA file as the input.
(B) For the standard mode, you need to upload a FASTA file and corresponding PSSM files as the inputs.
Figure 5.
Usage of the online services by the ACP_example.fasta file
(A) Prediction of the anticancer peptides by using the fast mode.
(B) Prediction of the anticancer peptides by using the standard mode.
Figure 6.
Illustration of the prediction output interface
Retraining TriNet by using a new training dataset
Timing: <20 min
In this step, we detail the process of retraining TriNet using a new dataset.
Note: The TriNet project allows you to adjust hyperparameters when retraining the model on your own data as follows. First, download the training code from https://github.com/hjy23/TriNet-Reproducing. Second, activate the TriNet virtual environment and ensure that all necessary dependencies are available. Third, adjust the desired training parameters as necessary and perform the training process. Here, we use the ACPmain dataset, which is already available in the TriNet-Reproducing project, as an example to demonstrate the training process. If you intend to train TriNet on your own datasets, simply replace the ACPmain dataset with your new datasets. In addition, if you are not interested in evaluating the accuracy of the trained model on an independent test dataset, there is no need to set a test dataset. However, it is recommended to set aside a portion of the dataset as a test set and evaluate the performance of the trained model on it. For the training set, we split it into a training set and a validation set at a fixed 4:1 ratio. TriNet employs our Training and Validation Interaction (TVI) method (i.e., iteratively interacting samples between the training and validation sets) to construct more appropriate training and validation sets for model training.
10. Perform model training as follows.
a. Download the training code from https://github.com/hjy23/TriNet-Reproducing and unzip it.
b. Navigate to the root directory of the TriNet-Reproducing project.
>cd TriNet-Reproducing
c. Activate the TriNet virtual environment.
>conda activate TriNet
d. Run the training script train.py and pass in all necessary parameters to retrain TriNet.
>python train.py -type ACP -use_PSSM True -train_fasta ./data/ACPmain.txt -train_pssm ./data/pssm_acpmain/ -use_test True -test_fasta ./data/ACPmaintest.txt -test_pssm ./data/pssm_acpmaintest/ -use_TVI True
Note: When running the Python script via the command line, you have the option to pass in parameters and modify them as desired. However, if no parameters are passed, the script automatically uses default settings. This flexibility allows for customized execution of the script based on specific requirements or preferences.
e. Set the necessary parameters as follows.
i. --type: The type of model to train, antimicrobial peptide (AMP) or anticancer peptide (ACP), type = str, ‘ACP’ or ‘AMP’.
ii. --use_PSSM: Whether to use the PSSM feature, type = bool, ‘True’ or ‘False’.
iii. --use_TVI: Whether to use the TVI approach to automatically select more appropriate training and validation sets, type = bool, ‘True’ or ‘False’.
iv. --use_test: Whether to use an independent test set to evaluate the trained model, type = bool, ‘True’ or ‘False’.
v. --train_fasta: The path of the training FASTA file.
vi. --train_pssm: The path of the training PSSM files when the parameter --use_PSSM is set to True.
vii. --test_fasta: The path of the test FASTA file when the parameter --use_test is set to True.
viii. --test_pssm: The path of the test PSSM files when the parameters --use_PSSM and --use_test are both set to True.
f. Set the optional parameters as follows.
i. --dense_unit_dcgr: The dimension of features learned from the DCGR feature matrix extracted as serial fingerprint information in the first stage, type = int.
ii. --dense_unit_pssm: The dimension of features learned from the PSSM feature extracted as evolutionary information in the second stage, type = int.
iii. --dense_unit_all: The dimension of the concatenated features containing the fingerprint, evolutionary, and physicochemical property information, type = int.
iv. --dff: The dimension of hidden units in the middle linear layer of the pointwise feed-forward network in the encoder of the third stage, type = int.
v. --d1: The probability of randomly dropping input units during each update in the training process of the first stage, type = float.
vi. --d2: The probability of randomly dropping input units during each update in the training process after the concatenation of the three stages, type = float.
vii. --lr: The initial learning rate used to control the step size of parameter updates, type = float.
viii. --batch_size: The number of samples used in each iteration, type = int.
ix. --decay_rate: The percentage decline of the learning rate in each step of the dynamic learning rate adjustment, type = float.
x. --decay_step: The number of steps between learning rate decays in the dynamic learning rate adjustment, type = int.
xi. --epoch: The number of epochs used for training, type = int.
Note: An example of a training command setting all parameters:
>python train.py -type ACP -use_PSSM True -use_TVI True -use_test True -train_fasta ./data/ACPmain.txt -train_pssm ./data/pssm_acpmain/ -test_fasta ./data/ACPmaintest.txt -test_pssm ./data/pssm_acpmaintest/ -dense_unit_dcgr 350 -dense_unit_pssm 120 -dense_unit_all 200 -dff 6 -d1 0.5 -d2 0.5 -lr 0.0004 -decay_rate 0.95 -decay_step 500 -batch_size 64 -epoch 50
Note: In deep neural networks, it is imperative to recognize the importance of hyperparameters because they strongly influence the performance of a model.
11. Tune hyperparameters.
a. Tune the learning rate.
Note: The learning rate is a hyperparameter that governs the amplitude of weight adjustments during each iteration of the model. For smaller-scale datasets, as well as for relatively simple models, a smaller learning rate can contribute to steadier convergence. In contrast, for larger datasets, a larger learning rate can expedite the training phase while exploiting the statistical properties of the data. It is recommended to initialize the learning rate within the range of 0.1 to 0.0001 and then fine-tune it through various learning rate schedulers while diligently monitoring the loss and accuracy metrics (a scheduler sketch is given after this list).
b. Tune the batch size and epoch.
Note: The batch size defines the number of samples processed in each training iteration, while the epoch determines the total number of traversals across the entire dataset. Employing larger batch sizes can improve training efficiency, especially for larger datasets, but also amplifies the susceptibility to overfitting. Conversely, smaller batch sizes usually foster superior model generalization ability.
c. Tune the dropout rate.
Note: Dropout is a prominent regularization strategy aimed at reducing overfitting.23 Smaller datasets, datasets with a higher degree of sequence similarity, datasets comprising shorter peptide sequences, and more complex models inherently increase the susceptibility of the model to overfitting. In such situations, increasing the dropout rate is usually effective in improving the generalization ability of the model. However, an excessive dropout rate can hinder the ability of the model to discern complex patterns. It is recommended to initialize the dropout rate at approximately 0.5 and then adjust it based on the performance of the model on the validation set.
d. Tune the hidden size.
Note: The hidden size is also a crucial aspect that demands fine-tuning based on the dataset characteristics. For larger datasets or those containing many longer sequences, a higher hidden size can improve the complexity and representational capacity of the model. Conversely, a smaller hidden size is suitable for relatively modest datasets to reduce overfitting and promote more robust generalization by diminishing the complexity of the model.
e. Tune the feature embedding dimensions.
Note: The three hyperparameters dense_unit_dcgr, dense_unit_pssm, and dense_unit_all should also be determined by considering the characteristics of a dataset. For datasets that are smaller in scale, have lower sequence similarity, or contain shorter sequences, it is advisable to select lower-dimensional embeddings. Conversely, for larger datasets, increasing the dimensions of the feature embeddings is an effective approach to obtain an enriched contextual understanding of the data.
Note: TriNet employs binary cross-entropy as the loss function and the Adam optimizer for model optimization. Moreover, the learning rate is managed by a learning rate scheduler, which fine-tunes the learning rate in real time to ensure optimal adaptation to the training data. During training, the model may encounter the problems of overfitting and underfitting.
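Note: As an illustration of how the -lr, -decay_rate, and -decay_step options can be realized, the sketch below builds an exponential-decay schedule for the Adam optimizer in TensorFlow/Keras. The mapping of TriNet's options onto this particular scheduler is an assumption, not the project's exact implementation.

import tensorflow as tf

schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=4e-4,  # corresponds to -lr 0.0004
    decay_steps=500,             # corresponds to -decay_step 500
    decay_rate=0.95,             # corresponds to -decay_rate 0.95
    staircase=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)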
12. Solve possible overfitting and underfitting issues as follows.
a. Apply data augmentation.
Note: Increasing the amount of data and adding more diverse types of data can reduce the risk of overfitting and improve the generalization ability of the model.
b. Apply regularization techniques.
Note: Regularization techniques penalize model parameters that grow too large, thereby preventing overfitting. The batch normalization technique is used in TriNet to avoid overfitting.
c. Add dropout layers.
Note: Adding dropout layers to the network prevents the model from becoming overly reliant on any specific neurons and hence avoids overfitting. The dropout technique is also employed to avoid overfitting in TriNet.
d. Apply early stopping.
Note: Monitoring the validation loss during training and stopping once the validation loss ceases to decrease (indicating potential overfitting) can prevent overfitting. The early stopping technique is also applied in TriNet (a callback sketch is given after this list).
e. Adjust the model complexity.
Note: Opting for a simpler model if overfitting occurs, or conversely a more complex model if underfitting is observed, can address these problems. You can adjust the model complexity by tuning the hidden sizes and the embedding sizes.
f. Increase the number of training epochs.
Note: Adding training iterations is effective in capturing the underlying patterns of the data and hence avoiding underfitting.
Note: Retraining the model with a new dataset has both advantages and disadvantages. The advantages include the potential to enhance its predictive capabilities by leveraging new data to recalibrate the parameters of the model. Furthermore, it may overcome the problem of model overfitting and improve its ability to generalize to unseen data. However, it is imperative to note the disadvantages associated with the retraining process. One such disadvantage is the additional expenditure of time and computational resources. Additionally, the collection and relabeling of new datasets lengthen the development cycle and incur elevated costs. Another potential disadvantage is the risk of performance attrition: if retraining is not implemented with appropriate data management and training strategies, it can cause performance degradation or introduce new errors into the model. Based on these considerations, it is crucial to weigh the trade-offs among time, resources, and performance enhancements when deciding whether to retrain a model.
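Note: A minimal Keras sketch of the early-stopping strategy described above; it illustrates the general technique under assumed model and data names, not TriNet's actual training loop.

import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the validation loss
    patience=5,                  # tolerate 5 epochs without improvement
    restore_best_weights=True)   # roll back to the best weights seen
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=[early_stop])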
13. Utilize the retrained TriNet model.
Note: After retraining, the output is archived in the “checkpoints” folder. To utilize the model for the prediction of ACPs or AMPs, execute the “test.py” script and pass in all required parameters. The parameters “-mode”, “-s”, “-p”, and “-o” are the same as those described in the “running TriNet with its default trained model” section, while the “-mp” parameter is explained as follows.
a. --model_path, -mp: The path of the model to be loaded, type = str.
Note: The command below demonstrates the use of a retrained model for the prediction of anticancer peptides. In this instance, the ACP_example.fasta file and the PSSM profiles located within the ./pssm_acp_example directory serve as the inputs through the standard mode. The output is stored in a file named acpout.csv.
>python test.py -mode sc -s ./ACP_example.fasta -p ./pssm_acp_example/ -o ./acpout.csv -mp ./checkpoints/2023-06-28-19_26_42
Expected outcomes
In this section, we describe the expected outcomes of utilizing TriNet. First, we describe the expected outcomes of using the default trained TriNet model. Second, we demonstrate the expected outcomes of using the TriNet web-based online service. Third, for the case in which you retrain the model on your own dataset, we provide examples of the expected outcomes. Finally, we show the performance of TriNet on the ACPalttest and AMPlifytest datasets, employing the standard versions of the ACP and AMP models, respectively.
The default trained TriNet model
When utilizing the default trained model, TriNet yields an output in the form of a CSV file (Figure 7). This file is structured into three principal parts as follows. 1) The original peptide sequence(s). 2) The computed probability, indicating the likelihood that the input peptide sequence(s) belong to the ACP or AMP category. A predicted probability threshold of 0.5 is adopted as the criterion for classification: sequences meeting or surpassing this threshold are classified as ACPs or AMPs, while those falling below it are classified as non-ACPs or non-AMPs. 3) The definitive classification label(s) assigned by TriNet to the input peptide sequence(s), wherein a label of ‘1’ signifies a predicted ACP or AMP, and ‘0’ denotes a predicted non-ACP or non-AMP.
Figure 7.
Illustration of the output CSV file
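The output file can also be post-processed programmatically. The short sketch below is an illustration only: the column positions are assumed from the three fields described above (pass header=None to pandas if your file has no header row), and it keeps only the peptides predicted as ACPs or AMPs.

import pandas as pd

df = pd.read_csv("acpout.csv")
# Assumed column order: sequence, probability, label.
positives = df[df.iloc[:, 1] >= 0.5]
print(positives.to_string(index=False))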
The web-based online service of TriNet
By employing TriNet via its web-based interface, you can receive not only the aforementioned CSV file but also a web-based result interface (Figure 6), which is delineated into three primary segments as follows. 1) Sequential order of the peptide sequence(s). For instance, an entry labeled as “3” denotes the third peptide sequence present in your submitted FASTA file. 2) The prediction labels for the input peptide sequence(s), wherein a label of ‘1’ signifies a predicted ACP or AMP, and ‘0’ denotes a predicted non-ACP or non-AMP. 3) The predictive probabilities associated with each input peptide sequence, indicative of the likelihood that the input peptide sequence(s) belong to the ACP or AMP category.
The retrained TriNet model
After completing the training process, the trained model is archived within a directory titled “checkpoints”. The “checkpoints” folder contains all training results, each of which is stored in a folder named by the timestamp of the corresponding training session (Figure 8). The training results include output files such as “training.log” (recording the progression of the training process), the final trained model file labeled “Model.h5”, a sub-directory titled “logs” that provides visualization of the training process via TensorBoard (Figure 9), and an additional sub-directory named “test” that contains the predicted outcomes for the test dataset along with an ROC curve plot (Figure 10). In addition, we provide a method for displaying the training results using TensorBoard as follows.
Figure 8.
Display of the "checkpoints" folder that contains all the training results
Figure 9.
Visualization of the training results via TensorBoard
Figure 10.
Visualization of the ROC curve of the ACPmaintest set
Navigate to the training directory and type the following commands to launch TensorBoard. Then, enter the URL http://localhost:6006/ in a web browser to access the visualization interface (Figure 9).
>cd ./checkpoints/2023-05-15-19_26_49/
>tensorboard --logdir="logs"
If you provided a test set, the evaluation of the predicted results is automatically performed in terms of metrics including the accuracy (ACC), precision, sensitivity, specificity, F1_score, and Matthews correlation coefficient (MCC). These results are cataloged within the “training.log” file (Figure 11). The selected metrics are calculated as follows, where TP denotes true positives, FP false positives, TN true negatives, and FN false negatives.
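These are the standard definitions of the six metrics:

$$\mathrm{ACC}=\frac{TP+TN}{TP+TN+FP+FN},\qquad \mathrm{Precision}=\frac{TP}{TP+FP},\qquad \mathrm{Sensitivity}=\frac{TP}{TP+FN},$$
$$\mathrm{Specificity}=\frac{TN}{TN+FP},\qquad \mathrm{F1\_score}=\frac{2\times TP}{2\times TP+FP+FN},$$
$$\mathrm{MCC}=\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}.$$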
Figure 11.
Visualization of the predicted results evaluated on the ACPmaintest set using metrics such as accuracy, precision, sensitivity, specificity, F1_score, and MCC
Performance evaluation of TriNet
In this section, we evaluate the performance of TriNet in predicting ACPs on the ACPalttest dataset and AMPs on the AMPlifytest dataset by using the default TriNet model, under the metrics of ACC, precision, sensitivity, specificity, F1_score, and MCC. The popular ACP predictors ACP-DL24 and AntiCP-2.0,2 and the AMP predictors CAMP-SVM16 and DNN,17 are selected as the baselines for performance comparison. Additionally, ROC curves are plotted for both datasets to showcase the predictive efficiency of TriNet (see Figure 12). The values of the corresponding metrics are shown below (see Tables 2 and 3). Moreover, we show the predictions of the top 10 peptides for both test sets (see Tables 4 and 5). From the prediction results, TriNet demonstrates excellent and reliable performance.
Figure 12.
ROC curves for visualizing the performance of TriNet
(A) ROC curve of TriNet on the ACPalttest.
(B) ROC curve of TriNet on the AMPlifytest.
Table 2.
Comparison with baseline algorithms ACP-DL24 and AntiCP-2.02 on the ACPalttest dataset
| Method | ACC | Precision | Sensitivity | Specificity | F1_score | MCC |
|---|---|---|---|---|---|---|
| ACP-DL | 0.863 | 0.880 | 0.839 | 0.886 | 0.859 | 0.726 |
| AntiCP-2.0 | 0.922 | 0.950 | 0.891 | 0.953 | 0.920 | 0.846 |
| TriNet | 0.940 | 0.962 | 0.917 | 0.964 | 0.939 | 0.882 |
Table 3.
Comparison with baseline algorithms CAMP-SVM16 and DNN17 on the AMPlifytest dataset
| Method | ACC | Precision | Sensitivity | Specificity | F1_score | MCC |
|---|---|---|---|---|---|---|
| CAMP-SVM | 0.793 | 0.753 | 0.873 | 0.714 | 0.809 | 0.594 |
| DNN | 0.911 | 0.914 | 0.908 | 0.915 | 0.911 | 0.823 |
| TriNet | 0.938 | 0.944 | 0.931 | 0.945 | 0.937 | 0.876 |
Table 4.
Predictions of the top 10 peptides in the ACPalttest dataset
| Peptides | Probability | Prediction label | True labels |
|---|---|---|---|
| 1pos|ACP20AltTest|1 | 0.9997075 | 1 | 1 |
| 2pos|ACP20AltTest|1 | 0.99997747 | 1 | 1 |
| 3pos|ACP20AltTest|1 | 1.0 | 1 | 1 |
| 4pos|ACP20AltTest|1 | 0.9999138 | 1 | 1 |
| 5pos|ACP20AltTest|1 | 0.9999987 | 1 | 1 |
| 1neg|ACP20AltTest|0 | 0.00000000038015172 | 0 | 0 |
| 2neg|ACP20AltTest|0 | 0.000010250933 | 0 | 0 |
| 3neg|ACP20AltTest|0 | 0.05416303 | 0 | 0 |
| 4neg|ACP20AltTest|0 | 0.00077758677 | 0 | 0 |
| 5neg|ACP20AltTest|0 | 0.022113279 | 0 | 0 |
Table 5.
Predictions of the top 10 peptides in the AMPlifytest dataset
| Peptides | Probability | Prediction label | True labels |
|---|---|---|---|
| AMP_1 | 1.0 | 1 | 1 |
| AMP_2 | 0.9988477 | 1 | 1 |
| AMP_3 | 0.99997747 | 1 | 1 |
| AMP_4 | 0.99005216 | 1 | 1 |
| AMP_5 | 0.9993668 | 1 | 1 |
| non-AMP_1 | 0.44176498 | 0 | 0 |
| non-AMP_2 | 0.000000053176816 | 0 | 0 |
| non-AMP_3 | 0.000000000020138622 | 0 | 0 |
| non-AMP_4 | 0.0002941991 | 0 | 0 |
| non-AMP_5 | 0.0000000013553494 | 0 | 0 |
Limitations
None of the data used in this protocol were experimentally validated by us; rather, they were taken from previously published articles with sufficient experimental validation. In addition, the current model is not end-to-end, which means that it does not cover the complete workflow from input to output and still takes some time to calculate the corresponding handcrafted features separately. Therefore, the inference time of TriNet may be longer than that of an end-to-end framework. Additionally, we developed a novel training approach termed TVI, which iteratively interacts samples between the training and validation sets to generate more appropriate training and validation sets. However, the current version of TVI may, in certain cases, perform slightly worse than traditional training methods, and more attention should be paid to improving TVI. We propose the following potential solutions and future directions to overcome these limitations.
Optimization of the feature extraction process
One possible approach to optimize the feature extraction process is to perform feature selection. Sometimes, the feature space may be characterized by high dimensionality, leading to great redundancy. In such cases, the incorporation of feature selection techniques can help reduce the prevalence of irrelevant or redundant features, thereby reducing the computational burden without compromising the model performance. For example, DCGR is employed in TriNet for feature extraction of peptide sequences.25 However, DCGR curves are formulated based on the 158 physicochemical properties of amino acids and finally generate 158 features for each peptide sequence. Redundancy may exist within the 158 features. In the future, we will focus on the employment of feature selection techniques to select a subset of the 158 features that are most critical for the prediction of ACPs and AMPs.
An alternative approach to optimize the feature extraction process is to improve the computational time efficiency of feature extraction via parallel computing. For example, in the extraction of serial fingerprints, sequence evolutions, and multiple physicochemical properties of peptide sequences, the three feature extraction processes are inherently uncorrelated. As such, parallel computation can be employed to optimize the computation time. In addition, parallel computing can also be employed in the extraction of the 158 DCGR features.
Optimization of the TVI method
One possible approach to improve the TVI process involves the integration of a more substantial volume of data. The increase in data quantity can enhance the versatility of TVI selection for the training and validation datasets, thereby improving the generalization ability of the model.
Another possible approach to enhance the TVI process is to introduce feedback mechanisms into this process. This can be achieved by instituting more granular recordings and feedback of model predictions at the end of each epoch, including parameters such as the accuracy and loss metrics.
Moreover, fine-tuning the number of interaction samples exchanged between training and validation datasets based on the specific characteristics of the datasets may be effective in optimizing the TVI process.
Troubleshooting
Problem 1
Related to “downloading and installing the TriNet toolkit”. The installation of CUDA and cuDNN in the virtual environment may fail due to an outdated graphics driver or missing dependencies.
Potential solution
It is suggested to install earlier versions of CUDA and cuDNN. However, ensure that the installed versions match the information provided on the official TensorFlow website to guarantee compatibility. Install any missing dependency packages indicated by the error message.
Problem 2
Related to “Data preparation”. When preparing PSSM profiles for the peptide sequences, an error will occur if the names of the PSSM profiles fail to correspond to the peptide sequences in the input FASTA file.
Potential solution
The file names of the PSSM profiles should be numbers corresponding to the order of the peptide sequences. For example, if there are N peptide sequences in the FASTA file, the corresponding N PSSM files should be named 1.pssm, 2.pssm, ..., N.pssm. Additionally, for peptide sequences that are too short for PSSM profiles to be calculated, it is acceptable not to provide PSSM profiles for these sequences.
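A quick sanity check (using the example file names from this protocol) can verify the numbering before submission:

from pathlib import Path

# Count the sequences in the FASTA file and report any missing numbered PSSMs.
n_seqs = sum(1 for line in open("ACP_example.fasta") if line.startswith(">"))
pssm_dir = Path("./pssm_acp_example")
missing = [i for i in range(1, n_seqs + 1)
           if not (pssm_dir / f"{i}.pssm").exists()]
print("missing PSSM files:", missing or "none")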
Problem 3
Related to the “retrained TriNet model” subsection in “expected outcomes”. We solely focused on the local usage of TensorBoard; you may encounter an error if you use TensorBoard remotely.
Potential solution
You can run TensorBoard remotely as follows.
• Use SSH to map the port of the remote server to the local computer with the following command.
>ssh -L 6006:127.0.0.1:6006 username@remote_server_ip
• Start TensorBoard on the remote server with the following command.
>tensorboard --logdir="logs" --port=6006
• Enter the URL http://localhost:6006/ in a web browser to access the visualization interface.
Problem 4
Related to step 5 in the “running TriNet with its default trained model” section. Executing the TriNet script may fail to yield an output file.
Potential solution
Verify the parameters carefully, and in particular ensure that the destination path for the output file has been specified.
Problem 5
Related to step 10 in the “retraining TriNet by using a new training dataset” section. The retraining process for TriNet may be unsuccessful due to the omission of parameters.
Potential solution
To solve this problem, it is important to ensure that all necessary parameters, such as the type of model and whether to use the PSSM feature, are passed prior to initiating the retraining of TriNet.
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Juntao Liu (juntaosdu@126.com).
Materials availability
Not applicable.
Acknowledgments
This work was supported by the National Key R&D Program of China (grant 2020YFA0712400) and the National Natural Science Foundation of China (grants 61801265 and 62272268). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author contributions
Conceptualization, J.H., J.L.; protocol elaboration, J.H., S.Z.; software development, J.H.; writing, review, and editing, J.H., S.Z., J.L.; funding acquisition, J.L.
Declaration of interests
The authors declare no competing interests.
Data and code availability
The software code used in this protocol can be accessed at https://github.com/hjy23/TriNet and is also provided in open access at Zenodo: https://doi.org/10.5281/zenodo.8204494. The source training code and all training and testing sets are available at https://github.com/hjy23/TriNet-Reproducing and are also provided in open access at Zenodo: https://doi.org/10.5281/zenodo.8204486.
References
- 1. Zhou W., Liu Y., Li Y., Kong S., Wang W., Ding B., Han J., Mou C., Gao X., Liu J. TriNet: A tri-fusion neural network for the prediction of anticancer and antimicrobial peptides. Patterns. 2023;4. doi: 10.1016/j.patter.2023.100702.
- 2. Agrawal P., Bhagat D., Mahalwal M., Sharma N., Raghava G.P.S. AntiCP 2.0: an updated model for predicting anticancer peptides. Brief. Bioinform. 2021;22. doi: 10.1093/bib/bbaa153.
- 3. Li C., Sutherland D., Hammond S.A., Yang C., Taho F., Bergman L., Houston S., Warren R.L., Wong T., Hoang L.M.N., et al. AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BMC Genom. 2022;23:77. doi: 10.1186/s12864-022-08310-4.
- 4. Moehlin J., Koshy A., Stüder F., Mendoza-Parra M.A. Protocol for using MULTILAYER to reveal molecular tissue substructures from digitized spatial transcriptomes. STAR Protoc. 2021;2. doi: 10.1016/j.xpro.2021.100823.
- 5. Mahlapuu M., Håkansson J., Ringstad L., Björn C. Antimicrobial Peptides: An Emerging Category of Therapeutic Agents. Front. Cell. Infect. Microbiol. 2016;6:194. doi: 10.3389/fcimb.2016.00194.
- 6. Chiangjong W., Chutipongtanate S., Hongeng S. Anticancer peptide: Physicochemical property, functional aspect and trend in clinical application (Review). Int. J. Oncol. 2020;57:678–696. doi: 10.3892/ijo.2020.5099.
- 7. Advancing cancer therapy. Nat. Cancer. 2021;2:245–246. doi: 10.1038/s43018-021-00192-x.
- 8. Madera L., Hoskin D.W. Protocols for Studying Antimicrobial Peptides (AMPs) as Anticancer Agents. In: Hansen P.R., editor. Antimicrobial Peptides: Methods in Molecular Biology. Springer New York; 2017. pp. 331–343.
- 9. Zhu Y., Hao W., Wang X., Ouyang J., Deng X., Yu H., Wang Y. Antimicrobial peptides, conventional antibiotics, and their synergistic utility for the treatment of drug-resistant infections. Med. Res. Rev. 2022;42:1377–1422. doi: 10.1002/med.21879.
- 10. Tang T., Huang X., Zhang G., Lu M., Hong Z., Wang M., Huang J., Zhi X., Liang T. Oncolytic peptide LTX-315 induces anti-pancreatic cancer immunity by targeting the ATP11B-PD-L1 axis. J. Immunother. Cancer. 2022;10. doi: 10.1136/jitc-2021-004129.
- 11. Gomes D., Santos R., S Soares R., Reis S., Carvalho S., Rego P., C Peleteiro M., Tavares L., Oliveira M. Pexiganan in Combination with Nisin to Control Polymicrobial Diabetic Foot Infections. Antibiotics. 2020;9:128. doi: 10.3390/antibiotics9030128.
- 12. Ma Y., Guo Z., Xia B., Zhang Y., Liu X., Yu Y., Tang N., Tong X., Wang M., Ye X., et al. Identification of antimicrobial peptides from the human gut microbiome using deep learning. Nat. Biotechnol. 2022;40:921–931. doi: 10.1038/s41587-022-01226-0.
- 13. Tornesello A.L., Borrelli A., Buonaguro L., Buonaguro F.M., Tornesello M.L. Antimicrobial Peptides as Anticancer Agents: Functional Properties and Biological Activities. Molecules. 2020;25:2850. doi: 10.3390/molecules25122850.
- 14. Kordi M., Borzouyi Z., Chitsaz S., Asmaei M.H., Salami R., Tabarzad M. Antimicrobial peptides with anticancer activity: Today status, trends and their computational design. Arch. Biochem. Biophys. 2023;733. doi: 10.1016/j.abb.2022.109484.
- 15. Mei L., Wu F., Hao G., Yang G. Protocol for hit-to-lead optimization of compounds by auto in silico ligand directing evolution (AILDE) approach. STAR Protoc. 2021;2. doi: 10.1016/j.xpro.2021.100312.
- 16. Thomas S., Karnik S., Barai R.S., Jayaraman V.K., Idicula-Thomas S. CAMP: a useful resource for research on antimicrobial peptides. Nucleic Acids Res. 2010;38:D774–D780. doi: 10.1093/nar/gkp1021.
- 17. Veltri D., Kamath U., Shehu A. Deep learning improves antimicrobial peptide recognition. Bioinformatics. 2018;34:2740–2747. doi: 10.1093/bioinformatics/bty179.
- 18. Suzek B.E., Huang H., McGarvey P., Mazumder R., Wu C.H. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics. 2007;23:1282–1288. doi: 10.1093/bioinformatics/btm098.
- 19. Shi G., Kang X., Dong F., Liu Y., Zhu N., Hu Y., Xu H., Lao X., Zheng H. DRAMP 3.0: an enhanced comprehensive data repository of antimicrobial peptides. Nucleic Acids Res. 2022;50:D488–D496. doi: 10.1093/nar/gkab651.
- 20. Wang G., Li X., Wang Z. APD3: the antimicrobial peptide database as a tool for research and education. Nucleic Acids Res. 2016;44:D1087–D1093. doi: 10.1093/nar/gkv1278.
- 21. Waghu F.H., Barai R.S., Gurung P., Idicula-Thomas S. CAMPR3: a database on sequences, structures and signatures of antimicrobial peptides. Nucleic Acids Res. 2016;44:D1094–D1097. doi: 10.1093/nar/gkv1051.
- 22. Ye G., Wu H., Huang J., Wang W., Ge K., Li G., Zhong J., Huang Q. LAMP2: A Major Update of the Database Linking Antimicrobial Peptides. Database. 2020;2020:baaa061. doi: 10.1093/database/baaa061.
- 23. Srivastava N., Hinton G.E., Krizhevsky A., Sutskever I., Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014;15:1929–1958.
- 24. Yi H.-C., You Z.-H., Zhou X., Cheng L., Li X., Jiang T.-H., Chen Z.-H. ACP-DL: A Deep Learning Long Short-Term Memory Model to Predict Anticancer Peptides Using High-Efficiency Feature Representation. Mol. Ther. Nucleic Acids. 2019;17:1–9. doi: 10.1016/j.omtn.2019.04.025.
- 25. Mu Z., Yu T., Qi E., Liu J., Li G. DCGR: feature extractions from protein sequences based on CGR via remodeling multiple information. BMC Bioinf. 2019;20:351. doi: 10.1186/s12859-019-2943-x.