Abstract
Vaccine development is one of the most promising fields. Multi-epitope vaccines, which do not require laborious culture processes, are an attractive alternative to classical vaccines because of their safety and efficiency. The rapid development of algorithms and the accumulation of immune data have facilitated the advancement of computer-aided vaccine design. Here we systematically review the in silico data and algorithm resources for the different steps of computational vaccine design, including immunogen selection, epitope prediction, vaccine construction, optimization, and evaluation. The performance of available tools for epitope prediction and immunogenicity evaluation was tested and compared on benchmark datasets. Finally, we discuss future research directions for the construction of multi-epitope vaccines.
Keywords: multi-epitope vaccine, computational design, bioinformatics tools, epitope
Introduction
Vaccination is the most cost-effective intervention for the prevention and treatment of disease, saving an estimated 2.5 million lives or more each year [1], and is regarded as one of the most important medical achievements of the past two centuries [2]. The development and widespread promotion of disease-specific vaccines have greatly decreased morbidity and mortality rates and lightened the social health burden. As typical examples, the mortality and incidence rates of tetanus, measles, mumps, and other similar diseases have been reduced by 99%, or the diseases have even been eradicated, thanks to the wide uptake of the related vaccines [3]. More recently, COVID-19 vaccines helped to quickly control the devastating coronavirus outbreak, demonstrating the power of vaccines. The successful development of therapeutic vaccines for prostate cancer and multivalent vaccines for pneumococcus indicates that the use of vaccines has been extended to the prevention and treatment of cancer and other diseases with high antigen variability [3]. Clinical trial results for mRNA vaccines against melanoma have shown the potential and promise of personalized cancer vaccines [4].
A vaccine is a biological product that must contain antigenic components derived from the pathogen or produced synthetically to represent the pathogen [5]. Vaccines can effectively control the spread of infectious disease outbreaks: the faster a vaccine is deployed, the faster an outbreak can be controlled [6]. However, it takes almost 5-10 years to develop a traditional live-attenuated or inactivated vaccine [6], and the time cost and tedious experimental process make classical vaccines unsuitable for the prevention and control of emergent epidemics. In addition, the attenuated or killed pathogen may revert to its virulent state, in which case the vaccine loses its protective ability and may even harm the human body [7]. Besides classical live-attenuated and inactivated vaccines, several platforms, including subunit vaccines, nucleic acid-based vaccines, viral vectors, and virus-like particles (VLPs), have been developed over the past few decades [5].
Subunit vaccines and the other platforms eliminate the risk of reversion to a virulent state. Subunit vaccines, defined as vaccines that use only the antigenic components of the whole pathogen as antigens [8], are also more controllable. They are mostly composed of one or multiple protein antigens [9]; unlike classical vaccination with whole organisms, this composition does not introduce non-antigenic components, which improves safety. However, recombinant protein-based subunit vaccines face problems such as high cost, low stability, difficult purification, and the risk of autoimmune responses [9, 10]. The use of epitopes for the development of subunit vaccines is considered an attractive solution, since epitope-based vaccines can be produced easily and economically with high effectiveness and minimal side effects. An antigenic epitope is the particular part of an antigen that is recognized by receptors, and it is the basic unit that can elicit an adaptive immune response [11]. The multi-epitope vaccine is a promising strategy for prophylaxis and therapy against tumors and viral infections [12]. An epitope-based strategy also plays an important role in assisting mRNA vaccine design [13], and the mRNA vaccine is one of the technologies leading the development of COVID-19 vaccines. Currently, 102 multi-epitope vaccines targeting various diseases are at different stages of clinical trials, 10 of which are against COVID-19 (https://clinicaltrials.gov/, accessed 20240803).
With the explosion of immune-related information and the development of technology, classical immunology has been overtaken by a high-efficiency in silico approach termed Reverse Vaccinology (RV), i.e. an approach that develops vaccines from genomic information rather than from the pathogen itself [14, 15]. Other in silico approaches, such as subtractive genomics, proteomics, computational vaccinology, and immunoinformatics, have been proposed and integrated with RV, shifting vaccine development from conventional methods to rational design based on knowledge of whole-genome sequences, host-pathogen interactions, immunological data, omics technologies, and computational tools [16–18]. The advancement of computational methodologies has significantly facilitated vaccine development. For instance, RV-related technologies can be employed to modify virulence gene fragments in pathogens for attenuated vaccines. For nucleic acid vaccines, antigen sequences can be screened through data resources and optimized through property calculation. The stability and immunogenicity of VLPs can be assessed through structural modeling and in silico prediction. For subunit vaccines, and especially for multi-epitope vaccines, different algorithms can be applied throughout vaccine development, including rapid screening of candidate antigens, high-throughput prediction of core epitopes, vaccine optimization, and immunogenicity prediction. Most computational tools are easy to use, open source, and user friendly, and have paved the way for the rational development of multi-epitope vaccines.
In this review, we summarize the critical steps in the rational development of multi-epitope vaccines and the advances in relevant computational tools and immune databases. On the basis of the relevant literature, we also discuss the current strategies and widely used bioinformatics tools for each step. Further, we designed three benchmark validations covering T-cell epitope predictors, B-cell epitope predictors, and immunogenicity predictors for real-world vaccine sequences. While there is still room for improvement in prediction performance, the benchmark tests and successful real-world cases illustrate that in silico methods have broad potential for vaccine development.
The pipeline for designing epitope vaccines and related information resources
The development of multi-epitope vaccines is a multi-step process, and the computational design of vaccines focuses mainly on four aspects (Fig. 1). (a) Determining potential vaccine antigen candidates and predicting epitopes with the desired ability to evoke an immune response. The immunogen is the core element of rational multi-epitope vaccines and determines the immune target and specificity. Identification of appropriate epitopes is the key part of epitope vaccine design and influences immune efficiency and coverage [13]. (b) Computational construction of vaccines with the aid of adjuvants, linkers, and other functional peptides, which helps maintain the immunogenicity and stability of vaccines. (c) Optimization of the designed vaccines through in silico codon optimization, which can improve expression efficiency [10]. (d) Immunological testing and validation. A series of evaluations should be carried out to screen for the ideal vaccine, one that induces a robust immune response, is non-allergenic and non-toxic to humans, and provides broad protection. In silico evaluation can help reduce the time needed for experimental validation. The necessity and significance of each step will be discussed in the corresponding section of this review.
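As a minimal sketch of the construction step (b), the snippet below joins the epitopes of each class with a class-specific linker and attaches an adjuvant at the N-terminus. The linker sequences (EAAAK, AAY, GPGPG, KK) are choices often seen in published multi-epitope designs but are assumptions here, as are the function name, the ordering of the blocks, and the toy sequences; none of them is prescribed by this review.

```python
from typing import Sequence

# Illustrative linker choices (assumptions, not values given in this review).
ADJUVANT_LINKER = "EAAAK"  # rigid linker between adjuvant and first epitope block
CTL_LINKER = "AAY"         # between cytotoxic T-cell epitopes
HTL_LINKER = "GPGPG"       # between helper T-cell epitopes
BCE_LINKER = "KK"          # between B-cell epitopes


def build_construct(adjuvant: str,
                    ctl: Sequence[str],
                    htl: Sequence[str],
                    bce: Sequence[str]) -> str:
    """Concatenate adjuvant, CTL, HTL, and B-cell epitope blocks with linkers."""
    construct = adjuvant
    first_block = True
    for epitopes, linker in ((ctl, CTL_LINKER), (htl, HTL_LINKER), (bce, BCE_LINKER)):
        if not epitopes:
            continue  # skip epitope classes that were not selected
        # The first block is attached to the adjuvant with the adjuvant linker;
        # later blocks are attached with their own class-specific linker.
        junction = ADJUVANT_LINKER if first_block else linker
        construct += junction + linker.join(epitopes)
        first_block = False
    return construct


if __name__ == "__main__":
    # Toy placeholder sequences, not real adjuvants or epitopes.
    print(build_construct("MWW", ["LLLL", "VVVV"], ["FFFF"], ["TTTT"]))
```

In practice the order of epitopes and the choice of linkers are themselves design variables that would be compared with the evaluation step (d).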
Figure 1.
Steps in the rational in silico design of a multi-epitope vaccine. The process can be divided into four parts. A. Immunogen design: the designed multi-epitope vaccine is expected to protect humans against most strains of the targeted pathogen, which depends first on the selection of representative antigens and the prediction of epitopes. Appropriate identification of antigens and epitopes is the core step of vaccine development and determines the efficiency and specificity of the induced immune response. B. Vaccine construction: once the antigenic epitopes have been determined, vaccine candidates can be constructed with the aid of adjuvants, linkers, and other functional peptides, which are used to increase the immunogenicity of the vaccine. C. Vaccine optimization: to achieve maximal expression of the designed vaccine in heterologous hosts, codon optimization of the construct should be performed using bioinformatics tools. D. In silico evaluation: the designed vaccine candidates should be verified by a series of computational evaluations before experiments, which reduces the number of experiments and avoids unnecessary losses.
The accurate selection of appropriate antigens and their immunodominant epitopes is the first key step in the development of a multi-epitope vaccine [12]. With the development of relevant technologies and a deeper understanding of immunity, bioinformatics-based methods for protein antigen selection have taken over from classical vaccinology, whose conventional approaches require pathogen cultivation and are laborious and time-consuming. Over the past few decades, researchers have created many immunological databases and analytical tools for vaccine design, and the information they hold is reliable and covers almost all relevant fields. Detailed descriptions of these databases, such as the amount and type of stored data, are listed in Table 1. Integrated approaches have been proposed to accelerate the filtering and determination of antigen proteins: several computational pipelines for the identification of potential vaccine candidates (PVCs) have been developed that take the genomes or proteomes of pathogens as input and output the predicted PVCs, which can be used for further study. Descriptions of typical tools can be found in the ‘Supplementary Pipelines for PVCs’ part.
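The codon optimization step (part C) can be illustrated with its simplest form, a one-amino-acid-one-codon back-translation. The sketch below is written under stated assumptions: the preferred-codon table is an illustrative choice loosely modeled on Escherichia coli usage, and the function names are hypothetical. A real workflow would load a host-specific codon usage table and would typically also check GC content and unwanted sequence motifs.

```python
# Illustrative preferred codon per amino acid (an assumption loosely modeled on
# E. coli usage); replace with a measured codon usage table for the real host.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein: str, stop_codon: str = "TAA") -> str:
    """Back-translate a protein into DNA using one preferred codon per residue."""
    try:
        codons = [PREFERRED_CODON[residue] for residue in protein.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None
    return "".join(codons) + stop_codon


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    dna = back_translate("MWWEAAAK")  # toy construct fragment
    print(dna, round(gc_content(dna), 2))
```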
Table 1.
List of immune-related databases
| Name | Application | External tools | Manual curation | Experiment validation | Data (20240726) | URL | Ref |
|---|---|---|---|---|---|---|---|
| Database for comprehensive information | |||||||
| VIOLIN | Various vaccine-associated research data | >10 relatively independent programs | Y | Y | 4708 vaccines or vaccine candidates for 219 pathogens, 24,345 vaccine-related abstracts, and 10,317 full-text documents | https://violinet.org/ | [19] |
| IEDB | Immune epitope data related to all species and includes antibody, T cell, and MHC binding contexts | Epitope prediction tools, epitope analysis tools | Y | Y | 1,619,740 peptidic epitopes, 3188 non-peptidic epitopes, 537,708 T cell assays, 1,406,296 B cell assays, 4,879,777 MHC ligand assays, 4520 epitope source organisms, 1011 restricting MHC alleles, and 24,982 references | https://www.iedb.org/ | [20] |
| CEDAR | A companion site of IEDB, providing cancer-specific epitope and receptor data | Epitope prediction tools, epitope analysis tools | Y | Y | 1,305,154 peptidic epitopes, 87 non-peptidic epitopes, 151,618 T cell assays, 131,693 B cell assays, 4,057,203 MHC ligand assays, 1663 epitope source organisms, 662 restricting MHC alleles, and 5264 references | https://cedar.iedb.org/home_v3.php | [21] |
| IMGT | Genes, sequence, and structure information of immunoglobulin and T cell receptor of vertebrates. | 17 online tools of genes, sequences and structures | N | N | comprises 7 databases, 17 online tools, and > 20,000 pages of web resources. | https://www.imgt.org/ | [22] |
| AntigenDB | antigens with basic information and additional information | Epitope search; peptide mapping; antigenic blast; | N | Y | an extensive collection of proteins, glycoproteins, and lipoproteins (>500), extracted from 44 important pathogenic species | http://www.imtech.res.in/raghava/antigendb/ | [23] |
| Databases for immunogenic/antigenic peptides and proteins | |||||||
| AntiJen v2.0 | continuous quantitative data on immunological molecular interactions, data from position-specific peptide libraries, and biophysical data | a nucleotide and a peptide BLAST search | N | Y | over 24,000 entries from published experimentally determined data | https://www.ddg-pharmfac.net/antijen/AntiJen/antijenhomepage.htm | [24] |
| PIRD | project information, sample information, raw sequencing data, annotated TCR or BCR repertoires, and TBAdb | Data analysis tools, data visualization tools | N | N | Contain 3 projects: ~3657 samples, ~11,395,649,000 sequences, ~3657 locus information | https://db.cngb.org/pird/ | [25] |
| AgAbDb | structure and derived data of antigen–antibody interactions; can be used as benchmark dataset of B-cell epitopes | Jmol visualized program; a tool for predicting epitopes | N | Y | 427 antigen–antibody complexes contain 289 protein-Ab complexes and 138 peptide-Ab complexes | http://bioinfo.net.in/AgAbDb | [26] |
| Bcipep | Comprehensive information about linear B-cell epitopes, including related monoclonal or polyclonal antibodies, and neutralization potential of anti-peptide antibody | peptide mapping and antigenic BLAST; | N | Y | 3031 entries that include 763 immunodominant, 1797 immunogenic, and 471 null-immunogenic epitopes | http://www.imtech.res.in/raghava/bcipep | [27] |
| EPIMHC | MHC-binding peptides and T-cell epitopes that are observed in real proteins | NA | N | N | 4875 distinct MHC-binding peptides, of which 2224 are T cell epitopes (1267 MHCI-restricted targeted 226 MHCI and 957 MHCII-restricted targeted 226 MHCII), including 84 epitopes derived from tumor-associated antigens | http://bio.med.ucm.es/epimhc/ | [28] |
| TANTIGEN2.0 | Information on human tumor antigens that contain T cell epitopes and HLA ligands, and information on validated TCR molecules | Blast, MAFFT, HLA binding prediction tools, visualization tools | Y | Y | 4296 antigen variants from 403 unique tumor antigens and more than 1500 T cell epitopes and HLA ligands | http://projects.met-hilab.org/tadb/ | [29] |
| Protegen | protective antigens and associated information | BLAST program | Y | Y | 1631 protective antigens | http://www.violinet.org/protegen | [30] |
| Epitome | all known antigen/ antibody complex structures and detailed information of interaction residues | Jmol visualized program; BLAST search | N | Y | 142 antigens from protein–antibody complex structures with a current total of 10,180 antigenic interactions | http://www.rostlab.org/services/epitome/ | [31] |
| CED | conformational epitopes and related information | visualization tool | Y | Y | 225 entries | http://web.kuicr.kyoto-u.ac.jp/~ced | [32] |
| SEDB | 3D structure of epitopes and its interaction with antigens and antibodies; B-cell, T-cell, and MHC binding proteins | Gene-Ontology, MolProbity, epitope visualization tool; blast tool | N | Y | 299 MHC-binding epitopes, 272 B-cell epitopes, 419 linear epitopes, 49 T-cell epitopes, 64 non-peptidic epitopes, and 126 discontinuous epitopes, among them 614 epitopes are determined by X-Ray | http://sedb.bicpu.edu.in/ | [33] |
| Database for immune-related molecules, such as MHC and TCR | |||||||
| ATLAS | information on affinities, structures, and experimental details for TCRs, peptides, and MHCs | Modeling software (Rosetta program) | Y | Y | 694 measured binding affinities of TCR-pMHC complexes | https://zlab.umassmed.edu/atlas/web | [34] |
| STCRDab | TCR structure data with annotations; a resource that automatically collects and curates TCR structure data from PDB | Modeling tool | Y | Y | 618 PDB entries with a TCR structure, 851 αβTCRs, 18 γδ TCRs, 680 TCRs complexed to MHC/MHC-like molecules, also contains 37 CDR clusters cover 6 types | https://opig.stats.ox.ac.uk/webapps/stcrdab-stcrpred/ | [35] |
| TBAdb | deposited in PIRD; Antigen-specific TCRs and BCRs information | NA | Y | N | 52,287 sequences with 71 diseases | https://db.cngb.org/pird/ | [25] |
| TCRdb | human TCR beta chain sequences associated with specific tissue/clinical condition/cell type | NA | Y | N | 131 TCR-Seq projects, 8265 TCR-Seq samples, and 277,439,349 TCR CDR3 sequences | https://guolab.wchscu.cn/TCRdb/#/ | [36] |
| SYFPEITHI | MHC ligands and peptide motifs of humans and other species | Epitope prediction tool | N | N | > 7000 peptide sequences known to bind class I and class II MHC molecules | http://www.syfpeithi.de/ | [37] |
| McPAS-TCR | Detailed information on pathology-associated TCR sequences; | NA | Y | Y | About 40,000 TCR information | http://friedmanlab.weizmann.ac.il/McPAS-TCR/ | [38] |
| VDJdb | Known antigen-specific TCR sequences | an epitope-centric approach to TCR annotation | Y | Y | About over 80,000 TCR records, over 2000 unique epitopes, and over 500 studies | https://vdjdb.cdr3.net/ | [39] |
| MPID-T2 | Sequence–structure–function information on pMHC and TR/pMHC interactions | NA | Y | Y | 415 entries from five MHC sources, spanning 56 alleles; 353 pMHC structures, 62 TR/pMHC complexes; 352 MHC class I complexes, and 63 MHC class II structures | http://biolinfo.org/mpid-t2/ | [40] |
| MHCBN | Sequence and structure data of MHC binding and non-binding peptides | mapping of peptide on query sequence; creation of data sets; blast | N | Y | 19,777 entries including 17,129 MHC binders and 2648 MHC non-binders for >400 MHC molecules | http://crdd.osdd.net/raghava/mhcbn/ | [41] |
| Other immunological repositories | |||||||
| PRRDB | Comprehensive information about pattern-recognition receptors (PRRs) and their ligands | BLAST and Smith-Waterman algorithm | Y | Y | extensive information about 467 unique PRRs and 827 ligands from ~600 research articles | https://webs.iiitd.edu.in/raghava/prrdb2/ | [42] |
| InnateDB | Molecular interactions and pathway annotations of relevance to all mammalian cellular systems | visualization tool; data analysis tools | Y | Y | 18,780 interactions | https://www.innatedb.com/ | [43] |
| AFND | frequency data on the polymorphisms of several immune-related genes | visualization tools; analysis tools | N | N | 1801 population studies, 1785 gene/allele data, 683 haplotype data, and 192 genotype data | http://www.allelefrequencies.net/ | [44] |
| DEG | currently available essential gene records among prokaryotes and eukaryotes | blast tool; essential-gene analysis tools | N | Y | ~35,000 genes records for prokaryotes, ~50,000 genes records for eukaryotes | http://origin.tubic.org/deg/public/index.php | [45] |
| ViPR | information for several human pathogenic viruses; supported by NIAID | analytical and visualization tools | N | N | Information for over 50,000 virus strains from 912 species belonging to 70 genera and 14 families | www.ViPRbrc.org | [46] |
| Database for specific diseases required for designing subunit vaccines. | |||||||
| MycobacRV | known mycobacterial vaccines; epitope information for predicted adhesins; Allergen data; epitope conservation data; | analysis tools | N | N | 25 Mycobacterial strains and species, a list of 742 adhesin and adhesin-like proteins having extracellular and cell surface localization, and a list of 233 non-redundant most probable adhesin vaccine candidates | https://mycobacteriarv.igib.res.in/ | – |
| FungalRV | immunoinformatics data on predicted adhesins and adhesin like proteins; known fungal vaccines; epitope information for predicted adhesins; Allergen data; | Adhesin predictor; analysis tools | N | N | predicted 307 adhesin and adhesin-like proteins and known vaccine candidates | https://fungalrv.igib.res.in/ | [47] |
| HPVdb | antigen entries derived from high/low-risk HPV genotypes, T cell epitopes, and HLA ligands | Basic and specialized analysis tools | Y | Y | 2781 curated antigen entries of antigenic proteins derived from both 18 genotypes of high-risk HPV and low-risk HPV, 191 verified T cell epitopes, and 45 verified HLA ligands | http://cvc.dfci.harvard.edu/hpv/ | [48] |
| HIV DATABASES | data on HIV genetic sequences and immunological epitopes | analysis tool and visualization tool | N | N | 4287 antibodies, >150,000 antibody neutralization assay IC50 values, 2058 unique immunogenic CD8+ CTL epitopes, 725 unique immunogenic CD4+ T helper epitopes | https://www.hiv.lanl.gov/content/index | – |
| dbEBV | EBV genomic variation landscape; global frequency and relationship with human health of each variant | evolutionary tree building; visualization tool | N | N | curated 942 EBV genomes with 109,893 variant loci from different tissues or cell lines in 24 countries | http://dbebv.omicsbio.info/ | [49] |
| EBVdb | Information on EBV antigens, verified T cell epitopes, and HLA ligands | Analysis tools; Visualization tool | Y | Y | 2622 curated EBV antigenic proteins, 610 verified T cell epitopes, 26 verified HLA ligands | http://projects.met-hilab.org/ebv/ | [50] |
| FLAVIdB | information on protein sequences, immunological data, and structural data of flavivirus | Analysis tools; Visualization tool | Y | Y | 12,858 entries of flavivirus antigen sequences, 184 verified T-cell epitopes, 201 verified B-cell epitopes, and 4 representative molecular structures of the dengue virus envelope protein | cvc.dfci.harvard.edu/flavi/ | [51] |
| FluKB | curated immunological data and protein sequence data of influenza | analytical tools; Visualization tool | Y | N | Over 400,000 influenza protein sequences, 357 verified T-cell epitopes, 685 HLA binders, 16 naturally processed MHC ligands, and a collection of 28 influenza antibodies and their structurally defined B-cell epitopes | http://research4.dfci.harvard.edu/cvc/flukb/ | [52] |
| ViralZone | graphics describing virion organization, genome transcription, and translation strategies; fact sheets on all known virus families/genera | ClustalW alignment software | Y | N | 879 Virus description pages: 158 families, 711 genera, 8 individual species, 352 viral molecular biology pages | https://viralzone.expasy.org/ | [53] |
| SDAP | Sequence, structure, IgE epitopes, and binding data of allergenic proteins | Modeling tools; analysis tools | N | N | 1908 allergens and isoallergens, 1628 Protein sequences for allergens and isoallergens, 109 Allergens with PDB structures, 378 3D models for allergens and isoallergens, 30 Allergens with IgE epitope sets, 233 new Pfam allergen classes | http://fermi.utmb.edu/SDAP/ | [54] |
Built-in tools such as searching and browsing are provided by all databases and are therefore not listed in the external tools column.
Key for application and data columns: pMHC = peptide-major histocompatibility complex, TR/TCR = T-cell receptor, CDR = complementarity-determining region (CDR3 = region 3).
Immunogen design
An ideal multi-epitope vaccine should be composed of various B cell epitopes and T cell epitopes that can activate cytotoxic T cells (CTLs), helper T cells (Th), and B cells, and induce an effective immune response against target pathogens [11]. The traditional way to determine epitopes is laboratory-based, which is time-consuming and yields limited results, so attention has turned to computational methods for epitope prediction. Epitopes can be classified as B cell epitopes (BCEs) and T cell epitopes (TCEs); because their generation processes differ, this review discusses BCEs and TCEs separately below.
T-cell epitope prediction
An antigen usually goes through three processes to elicit an adaptive immune response in the host: (i) antigen processing and presentation, (ii) binding of the peptide to an MHC molecule, which presents it, and (iii) recognition of the peptide–MHC complex by a T-cell receptor [55]. Detailed illustrations of the three processes for endogenous and exogenous antigens can be found in the ‘Supplementary Endogenous/exogenous processing of antigens’ part.
The peptide-binding groove of MHC class I molecules is formed by a single chain and is closed, so short peptides of 8-11 amino acids are preferred for binding to MHC class I [56]. The binding groove of MHC class II molecules is open, so MHC II-bound peptides range from 11 to 25 residues in length, although only a core of nine residues sits in the binding groove [57]. This structural difference between MHC class I and class II molecules means that binder prediction differs in difficulty between the two classes. A detailed diagrammatic presentation is shown in Fig. 2. Based on the processing of antigens, epitope prediction tools can be divided into four categories: prediction of antigen processing and presentation, prediction of pMHC binding, prediction of T cell recognition of pMHC complexes, and miscellaneous methods. This paper mainly introduces the characteristics of these four types of methods, widely used tools, and promising recently developed tools. The features, claimed prediction performance, and other information for typical tools in each category are given in Table 2.
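The length constraints described above determine the candidate peptides that a binding predictor has to score. As a minimal sketch (the function names and the toy sequence are assumptions for illustration, not part of any tool listed in this review), the following code enumerates all 8-11-residue windows for MHC class I, all 11-25-residue windows for MHC class II, and the possible nine-residue cores within a class II peptide.

```python
from typing import Iterator, List, Tuple


def sliding_peptides(sequence: str, min_len: int, max_len: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, peptide) for every window of min_len..max_len residues."""
    for length in range(min_len, max_len + 1):
        for start in range(len(sequence) - length + 1):
            yield start, sequence[start:start + length]


def mhc1_candidates(sequence: str) -> List[Tuple[int, str]]:
    """Candidate MHC class I peptides: all windows of 8-11 residues."""
    return list(sliding_peptides(sequence, 8, 11))


def mhc2_candidates(sequence: str) -> List[Tuple[int, str]]:
    """Candidate MHC class II peptides: all windows of 11-25 residues."""
    return list(sliding_peptides(sequence, 11, 25))


def binding_cores(peptide: str, core_len: int = 9) -> List[str]:
    """All possible binding cores of core_len residues within one peptide."""
    return [peptide[i:i + core_len] for i in range(len(peptide) - core_len + 1)]


if __name__ == "__main__":
    toy_antigen = "ACDEFGHIKLMN"  # toy 12-residue sequence, not a real antigen
    print(len(mhc1_candidates(toy_antigen)), "class I candidates")
    print(len(mhc2_candidates(toy_antigen)), "class II candidates")
    print(binding_cores("ACDEFGHIKLM"))
```

Because every class II peptide contains several possible cores, a class II predictor has to identify the core register in addition to scoring the binding, which is one way to see why the two prediction problems differ in difficulty.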
Figure 2.
Diagrammatic presentation of the endogenous and exogenous processing of antigens. A. Endogenous processing of antigens. B. Exogenous processing of antigens. From left to right: antigen processing and presentation, binding of peptides to MHC molecules, and transport of pMHC complexes followed by their recognition by TCR molecules. (i)-(iv) represent different tool types. The differences in processing details and tools are described below.
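The performance measures reported for the tools in Table 2, namely accuracy (Acc), area under the ROC curve (AUC), Matthews correlation coefficient (MCC), sensitivity (Sens), and specificity (Spec), can all be derived from a binary classification of peptides. The sketch below shows their standard definitions; the function names are hypothetical, values are returned as fractions rather than percentages, and the AUC uses the rank-based (pairwise comparison) formulation, which is an implementation choice made here.

```python
import math
from typing import Dict, Sequence


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Acc, Sens, Spec, and MCC from the counts of a binary confusion matrix."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else math.nan  # undefined when the denominator is 0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "acc": ratio(tp + tn, tp + fp + tn + fn),
        "sens": ratio(tp, tp + fn),   # true positive rate
        "spec": ratio(tn, tn + fp),   # true negative rate
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both positive and negative examples are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
    print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

Since the tools were trained and evaluated on different datasets, values computed this way are only comparable across tools when the same benchmark data are used.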
Table 2.
The list of in silico tools for predicting T-cell epitopes
| Related process | Method name | Feature | Training dataset | MHC class | Pan-/specific- | Acc (%) | AUC | MCC | Sens (%) | Spec (%) | Year | Ref | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Prediction of Antigen Processing and Presentation | NetChop v2.0 | NN-based model for predicting proteasome cleavage motifs | MHC I ligand dataset: 458 cleavage sites determined by MHC-I ligands of 188 human protein; In vitro degradation dataset: 184 distinct sites | I | – | 0.53 | 0.80 | 0.88 | 2002 | [58] | |||
| TAPPred | sequence-based SVM or cascade SVM model for predicting TAP ligands | 431 peptides, including 409 bind to TAP with varying affinity, and 22 peptides have negligible or no binding affinity | I | – | – | – | – | – | – | 2003 | [59] | ||
| ProPred1 | matrix-based method for predicting MHC binders with proteasomal cleavage site | – | I | – | 0.80 | – | – | 0.81 | 0.67 | 2003 | [60] | ||
| RANKPEP | Hybrid method using PSSMs or profiles for predicting MHC binders, statistical language models for predicting proteasomal cleavage | Training sets for statistical modeling: C-terminus and flanking regions of 332 antigens restricted by human MHC-I | I, II | – | – | >0.8 | – | – | – | 2004 | [61] | ||
| NetChop v3.1 | NN-based model using new sequence encoding schemes for predicting proteasome cleavage | MHC I ligand dataset: 746 cleavage sites; In vitro degradation dataset: 164 cleavage sites | I | – | – | 0.85 | 0.48 | – | – | 2005 | [62] | ||
| Pcleavage | SVM-based model for predicting constitutive proteasome and immunoproteasome cleavage sites | in vitro digested dataa, 506 MHC ligandsb from over 250 proteins | I | – | 70.0a 76.7b | 0.805a 0.615b | 0.43a 0.54b | 84.6a 84.3b | 55.6a 68.0b | 2005 | [56] | ||
| NetCTL 1.2 | Hybrid model integrates MHC class I binding, proteasomal cleavage, and TAP transport efficiency | 863 epitope-protein pairs, all 9 mer peptides contained in the source protein sequences with no annotated are seen as negative peptides | I | 12 MHC-I supertypes | >0.72 | 2007 | [63] | ||||||
| TAPreg | SVM-based model for predicting affinity of peptides to TAP | 613 nonamer peptides with various TAP affinities, | I | – | – | – | – | – | – | 2009 | [64] | ||
| TAP Hunter | SVM-based model using sequence local description for predicting variable-length TAP ligands | 276 TAP binding and 94 non-binding nonamer peptides | I | – | 0.88 | 0.88 | 2010 | [57] | |||||
| Prediction of Peptide–MHC Binding | Sequence-based | BIMAS | quantitative matrices-based model using linear regression on the measured half-life of HLA-A2 complexes with bound peptide | – | I | – | – | – | – | – | – | 1994 | [65] |
| MOTIF | Motif-based | – | I | – | 1995 | [66] | |||||||
| SYFPEITHI | database for MHC ligands and peptide motifs, also provide prediction of epitope based on motif | – | I, II | – | 1999 | [37] | |||||||
| TEPITOPE | Method combines virtual matrices and DNA microarray technology | – | II | pan | 1999 | [67] | |||||||
| SVMHC | SVM-based model for MHC-I binders, and matrices-based method for MHC-II binders | SYFPEITHI data: 3500 sequences that are natural ligands to T-cell epitopes | I, II | – | 0.85 | 0.90 | 2002 | [68, 69] | |||||
| Udaka et al. | A query learning algorithm based on HMM model | 329 binding peptides | I | – | 0.84 | 2002 | [70] | ||||||
| NetMHC | quantitative ANN model | 400 9mer peptides with various binding affinities, 65 QBC-selected peptides for binding to HLA-A*0204 | I | – | 2003 | [71] | |||||||
| PEPVAC | PSSM-based model combined with a dynamic algorithm, also can predict proteasome cleavage using probabilistic language models | – | I | Supertypes, A2, A3, B7, A24 and B15 | >0.8 | 2005 | [72] | ||||||
| MULTIPRED | Method provides HMM or ANN model for predicting | 9mer peptide sequences: 3050 (664 binders and 2386 non-binders) related to 15 variants of A2 supertype, 2216 (680 binders and 1536 non-binders) related to 8 variants of A3 supertype and 2396 (448 binders and 1948 non-binders) related to 6 DR variants | I, II | Supertypes, class I A2, A3, and class II DR | >0.8 | 2005 | [73] | ||||||
| NetMHCII | SMM-align method | quantitative MHC–peptide binding data: 5147 peptides covering 14 HLA-DR and three H2-IA alleles | II | – | 0.756 | 2007 | [74] | ||||||
| NetMHCpan | ANN-based model | 37,384 unique peptide-HLA interactions, 26,503 interactions covering 24 HLA-A alleles, 10,881 interactions covering 18 HLA-B alleles | I | pan | 0.95 | 0.74 | 2007 | [75] | |||||
| NetMHCIIpan | ANN-based model using SMM-align method | 14,607 unique peptide-HLA interactions, 7839 positive binders, and 6768 negative samples | II | pan | 0.787 | 2008 | [76] | ||||||
| MULTIPRED2 | System using NetMHCpan 2.0 and NetMHCIIpan 1.0 as prediction engines for individual or combination of alleles, or supertypes | – | I, II | supertypes, 13 class I and 13 class II | – | – | – | – | – | 2010 | [77] | ||
| NetMHC4.0 | a sequence alignment method based on pan-length ANN | IEDB dataset covering 118 MHC class I alleles and each allele with over 20 binders as positive, 100 random natural peptides as negatives | I | – | >0.88 | 2016 | [78] | ||||||
| MixMHCpred | MHC-I ligand predictor using an unsupervised way to annotate motifs based on co-occurrence of HLA-I alleles | HLA peptidomics data (58 alleles in total) | I | pan | 0.979 | 2017 | [79] | ||||||
| HLAthena | Model with three single-layer neural networks trained on mass spectrometry data and different peptide encoding. | 186,464 eluted peptides from 95 alleles | I | pan | 2019 | [80] | |||||||
| MixMHC2pred | MHC-II ligand predictor using a motif deconvolution algorithm combines unbiased mass spectrometry to train models | 99,265 unique peptides eluted from HLA-II molecules | II | pan | 0.83 | 2019 | [81] | ||||||
| NetMHCpan-4.1 | ANN-based model using NNAlign_MA framework to train data | 13,245,212 data covering 250 distinct MHC class I molecules | I | pan | >0.985* | 2020 | [82] | ||||||
| NetMHCIIpan-4.0 | ANN-based model using NNAlign_MA framework to train data | 4,086,230 data covering a total of 116 distinct MHC class II molecules | II | pan | >0.975* | 2020 | [83] | ||||||
| MHCflurry 2.0 | Integrated predictor which combined models for MHC I binding and antigen processing | 75,378 entries from 56 samples | I | pan | >0.91 | 2020 | [84] | ||||||
| RBM-MHC | Method combines a Restricted Boltzmann Machine based model and a semi-supervised HLA-I classifier | The data is from mass spectrometry and binding affinity assays available in IEDB | I | pan | 0.97 | 2021 | [85] | ||||||
| TransPHLA | transformer-based model using self-attention | 112 HLA-I alleles, 359,166 positive data (pHLA binders) and 1,795,830 negative data (non-binding pHLAs) | I | pan | >0.9 | >0.9 | >0.8 | – | – | 2022 | [86] | ||
| MixMHC2pred v2.0 | 2 successive blocks of NNs with distinct tasks based on the binding motifs and peptide sequence features | HLA peptidomics dataset of 627,013 unique MHC-II ligands and derive motifs for 88 MHC-II alleles | II | pan | 0.945 | 2023 | [87] | ||||||
| BigMHC | Model comprise an ensemble of seven deep neural networks, offering two models: ELa and IMb | 259,298 EL and 15,065,287 negative instancesa; 1407 antigens and 4778 negativeb | I | pan | 0.9733a 0.7767b | 2023 | [88] | ||||||
| TripHLApan | Model integrates triple coding matrix, BiGRU+Attention models, and transfer learning strategy | 2,788,602 HLA-I peptides (464,767 positive sample); 963,186 HLA-II peptides (160,531 positive samples) | I, II | pan | – | 0.979 | – | – | – | 2024 | [89] | ||
| MixMHCpred 3.0 | 2 blocks of NNs based on the binding motifs and peptide length distributions | HLA peptidomics dataset of 511,553 ligands interacting with 143 MHC-I alleles | I | pan | >0.98 | 2024 | [90] | ||||||
| ConvNeXt-MHC | Method using degenerate coding approach and ConvNeXt model, which combines transfer learning and semi-supervised learning methods | mass spectrometry training seta: 206,515 positive data points and 842,060 negative data points; pMHC affinity training setb: 210,509 positive data points and 38,633 negative data points. | I | pan | 0.964a 0.9048b | – | 0.886a 0.8143b | – | – | 2024 | [91] | ||
| ImmuneApp | An interpretable, attention-based hybrid deep learning framework, offering ImmuneApp-Neo for immunogenicity prediction | 349,650 ligands | I | pan | 0.9650 | 2024 | [92] | ||||||
| Structure-based Method | EpiDock | Method based on homology modeling and docking score-based quantitative matrix | – | II | – | 0.83 | 2013 | [93] | |||||
| MHCfold | Consists of a CNN-based modeling module and a transformer encoder-based specificity module | 390 pMHCI structures; 3,459,753 pMHCI pairs | I | – | 0.94 | 2022 | [94] | ||||||
| Fine-tuned AlphaFold 2 | a simple classification module on top of the AlphaFold network and fine-tuning the combined network parameters for predicting MHC binders | training set consisting of 10,340 pMHC examples, 203 structurally characterized and 5102 modeled pMHC binder examples, and 5035 non-binder examples, distributed across 68 Class I alleles and 39 Class II allele pairs. | Ia, IIb | – | 0.97a 0.93b | 2023 | [95] | ||||||
| Prediction of T cell recognition of pMHC complexes | sequence-based | NetTCR-1.0 | CNN-based model | 9012 positive and 66,102 negative TCR-peptide combinations, spanning 91 peptides and 8920 TCR sequences | I | – | 0.727 | 2018 | [96] | ||||
| ImRex | a CNN with the combined representation of sequences of epitopes and TCR CDR3 | Mixed chain dataset: 19,842 unique TCR CDR3 alpha/beta sequences-epitope pairs, 120 epitopes; alpha chain dataset: 5654 sequences, 60 epitopes; beta chain dataset: 14,188 sequences, 118 epitopes | I | – | 0.68 | 2021 | [97] | ||||||
| structure-based models hybrid approaches | TCRdock | Pipeline consists of a specialized version for TCR-pMHC modeling of AlphaFold and matrix-based score function | 279 total training examples | I | – | 0.82 | 2023 | [98] | |||||
| RACER | a coarse-grained, chemically accurate energy model relies on known TCR–peptide structures and experimental data | consist of three TCRs pre-identified strong-binding peptides and decoy peptides with randomized sequences | I | – | 0.89 | 2021 | [99] | ||||||
| NetTCR 2.2 | CNN-based model combining pan- and peptide-specific training, loss-scaling, and sequence similarity integration | 6353 TCR sequences across 26 peptides as positive set, negatives were generated by swapping the TCRs for a given peptide with TCRs binding to other peptides, IMMREP 2022 dataset | I | pan | 0.8476 | 2023 | [100] | ||||||
| CATCR | system using CNN model extract structure features and a transformer to encode segment-based coded sequence features | 65,069 CDR3β–epitope pairs | I | 0.848 | 0.89 | 2024 | [101] | ||||||
| Miscellaneous methods | CTLpred | Method based on QMa, ANNb, and SVMc model for predicting CTL epitopes, also provides consensusd and combinede approaches | 1137 CTL epitopes, 1134 non-epitopes | I | – | 0.700a 0.722b 0.752c 0.776d 0.758e | 0.652a 0.732b 0.738c 0.669d 0.797e | 0.749a 0.712b 0.770c 0.884d 0.719e | 2004 | [102] | |||
| IL4pred | System provides SVM-based modela and hybrid approachb for predicting IL4 inducing peptides | 904 experimentally validated IL4-inducing and 742 non-inducing MHC class II binders | II | – | 0.6908a 0.7576b | – | 0.38a 0.51b | 0.7058a 0.7876b | 0.6725a 0.721b | 2013 | [103] | ||
| IFNepitope | System provides motif tool, SVM-based modelb, and hybrid approachc for predicting IFN-γ inducing peptides | main dataset: 3705 IFN-γ inducing and 6728 non-IFN-γ inducing MHC class II binders; IFNgOnly dataset: 4483 IFN-γ inducing epitopes and 2160 epitopes that induce other cytokine except IFN-γ | II | – | 0.7954b 0.8210c | 0.55b 0.62c | 0.665b 0.7798c | 0.8671b 0.8436c | 2013 | [104] | |||
| CD4episcore | System provides an ANN-based model for immunogenicity predictionb, 7-allele method for HLA binding predictionsa, and combine methodc | Training dataset consists of two sets: experimental data (1032 positive epitopes and 5739 negative peptides), tetramer data (124 positives and 5319 negative peptides) | II | – | 0.703a 0.702b 0.725c | 2018 | [105] | ||||||
Key for feature: NN = neural network; SVM = support vector machine; HMM = hidden Markov model; PSSM = position-specific scoring matrix; ANN = artificial neural network; SMM-align = stabilization matrix alignment method; CNN = convolutional neural network; QM = quantitative matrix; EL = eluted ligands; BA = binding affinity. Key for performance measure: Acc (%) = accuracy; AUC = area under the ROC curve; MCC = Matthews correlation coefficient; Sens (%) = sensitivity; Spec (%) = specificity.
*: This value represents the averaged AUC value of multiple HLA molecules reported in this paper.
Prediction of antigen processing and presentation
Natural processing by the cellular antigen-processing machinery is a necessary step for peptides to be presented on MHC class I molecules. First, the antigen protein is cleaved by the proteasome to generate potential epitopes [106]. Predicting proteasomal cleavage sites is vital for identifying potentially immunogenic regions in a pathogen protein, and this step also shows some degree of specificity. On this basis, computational methods such as NetChop [58] and Pcleavage [56] have been developed. The proteasomal fragments are then translocated into the endoplasmic reticulum by TAP, and the efficiency of TAP-mediated translocation of a peptide has been shown to be proportional to its TAP binding affinity [107]. Studying the selectivity and specificity of TAP may therefore contribute significantly to predicting CTL epitopes. Several in silico methods have been proposed to test whether peptides can bind TAP, such as TAPPred [59], TAP Hunter [57], and TAPreg [64]. Hybrid approaches have also been proposed: ProPred1 [60] and RANKPEP [61] combine proteasomal cleavage prediction with peptide-MHC binding prediction, and NetCTL 1.2 [63] adds TAP transport efficiency on top of these two processes. Nevertheless, these strategies bring only limited improvement in performance. The reason might be the insufficient specificity of both proteasomal cleavage and TAP transport predictors, which means that most predicted binders cannot be reliably identified as being processed and presented on MHC class I molecules [107]. For MHC-II molecules, antigen processing has been investigated in several studies, but related computational approaches have not yet been widely developed [108]. The approaches mentioned above are summarized in Table 2.
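The hybrid idea described above, in which separate predictions for proteasomal cleavage, TAP transport, and MHC binding are merged into one ranking score, can be sketched as a simple weighted sum. This is only an illustration and not the implementation of any tool named in this section: the class and function names, the default weights, and the toy peptides and scores are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessingScores:
    """Scores for one peptide from three separate predictors (toy values)."""
    mhc_binding: float  # higher = stronger predicted MHC-I binding
    cleavage: float     # higher = more likely C-terminal proteasomal cleavage
    tap: float          # higher = more efficient predicted TAP transport


def combined_score(s: ProcessingScores,
                   w_cleavage: float = 0.15,
                   w_tap: float = 0.05) -> float:
    """Weighted sum of the three scores; the weights are illustrative assumptions."""
    return s.mhc_binding + w_cleavage * s.cleavage + w_tap * s.tap


def rank_candidates(candidates: dict[str, ProcessingScores]) -> list[tuple[str, float]]:
    """Return (peptide, combined score) pairs, best candidate first."""
    scored = [(pep, combined_score(s)) for pep, s in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical candidates: the second binds slightly less well but is
# predicted to be processed and transported far more efficiently.
toy_candidates = {
    "KLAAAAAAV": ProcessingScores(mhc_binding=0.80, cleavage=0.20, tap=0.40),
    "GLAAAAAAL": ProcessingScores(mhc_binding=0.78, cleavage=0.90, tap=1.00),
}

for peptide, score in rank_candidates(toy_candidates):
    print(f"{peptide}\t{score:.3f}")
```

In this toy case the processing terms reorder the two candidates. When the cleavage and TAP predictors are not very specific, such terms add little information, which is consistent with the limited improvement noted above.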
Prediction of peptide–MHC binding
The binding of a peptide to an MHC molecule is the most selective step [109]. Numerous predictors have been developed to address three tasks: (i) distinguishing MHC binders from nonbinders, (ii) predicting the binding affinity of peptides to MHC molecules, and (iii) dealing with the polymorphism of human leukocyte antigens (HLAs; i.e. MHC molecules are known as HLA in humans). The frequency of HLA alleles can vary greatly among ethnicities, and different HLA alleles bind distinct sets of peptides; studying HLA alleles therefore makes it possible to predict epitopes tailored to particular ethnicities and to develop correspondingly specific multi-epitope vaccines [55]. For well-studied HLAs, most prediction tools can identify binders and calculate binding affinity. However, for the less-studied HLAs, which are the majority, the traditional approach of generating large datasets for each HLA allele is infeasible [110]. Handling HLA polymorphism is thus a major difficulty to be considered in the design of epitope prediction tools.
The first approach to the extreme polymorphism of the MHC is to define HLA supertypes. Researchers found that different HLA molecules can bind similar peptides and developed methods to cluster HLA molecules with similar peptide-binding specificity into HLA supertypes [110]. In 1999, Sette et al. defined nine HLA class I supertypes and showed that these nine supertypes cover almost all known binding properties of HLA class I molecules [111]. Thus, epitopes for unseen HLA molecules can be obtained by predicting the affinity of a peptide for a single MHC molecule representing the corresponding supertype, which enables population-wide epitope discovery. Subsequent approaches such as PEPVAC [72], MULTIPRED [73], MULTIPRED2 [77], and others also demonstrated the effectiveness of supertype-based methods for both HLA-I and HLA-II alleles. However, supertype-based methods are oversimplified for some specific therapies or requirements [110]. To address this, pan-specific prediction methods were proposed; their main advantage is that they can predict the binding of any HLA allele to any peptide, even without binding data for the query HLA [108]. TEPITOPE [67] is the first pan-specific method and can predict binding to 51 prevalent HLA-DR alleles. NetMHCpan [75] is the first and most commonly used pan-specific prediction tool for MHC-I molecules and generates a quantitative prediction of the affinity of any peptide-HLA-I interaction.
The bioinformatics tools developed for predicting peptide–MHC binding can be divided into two main categories: sequence-based methods, which rely on sequence information of known binders and HLA molecules, and structure-based methods, which generally rely on information from pMHC structures. Typical methods of the two categories are summarized by year of publication in Table 2.
Sequence-based method relies on sequence information of known binders and HLA molecules
According to their course of development, sequence-based methods can be divided into two types: simple motif-based models and other heuristic approaches, which focus on the contribution of each residue, or of specific residues, to binding; and machine learning-based methods, which were introduced to improve on the predictive performance of motif-based approaches. Machine learning (ML)-based methods follow three fundamental steps: first, collect a comprehensive epitope dataset; then, extract feature vectors from various sources, including propensity scales, protein sequences, and 3D structures; and finally, train a model on the collected dataset using ML algorithms. Progress in algorithms has greatly improved prediction accuracy and promoted the development of in silico epitope prediction; reported average AUC values exceed 0.9, which indicates excellent performance. The application of semi-supervised and unsupervised models has also paved the way for pan-specific methods. A detailed description of both kinds of methods can be found in the ‘Supplementary Sequence-based method of pMHC binding’ part.
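To make the motif-based idea concrete, the sketch below builds a position-specific scoring matrix (PSSM) from a handful of equal-length binders and scores new peptides by summing per-position log-odds. It is a minimal illustration under stated assumptions: the four training peptides are invented toy sequences, the background is assumed uniform over the 20 amino acids, and the pseudocount of 1 is an arbitrary choice. It does not reproduce any tool listed in Table 2.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(binders: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Return one log2-odds column per peptide position (uniform background assumed)."""
    length = len(binders[0])
    if any(len(p) != length for p in binders):
        raise ValueError("all training peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in binders)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score_peptide(pssm: list[dict[str, float]], peptide: str) -> float:
    """Sum the per-position contributions of each residue."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length does not match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Invented toy binders sharing L at position 2 and V at position 9.
toy_binders = ["ALAKDGTEV", "KLSQNPWRV", "GLYTHICFV", "SLMEGRAQV"]
toy_pssm = build_pssm(toy_binders)

print(score_peptide(toy_pssm, "QLDDDDDDV"))  # carries both anchor residues
print(score_peptide(toy_pssm, "QDDDDDDDD"))  # carries neither anchor residue
```

The ML-based methods described above replace this fixed per-residue sum with a trained model, which lets them capture dependencies between positions that a matrix of independent columns cannot represent.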
Structure-based method relies on the information of pMHC structures
Structure-based methods depend on the biochemical properties of the amino acids involved in pMHC interactions. However, only a few molecules have experimentally determined 3D structures, and most still need to be modeled [112]. Most structure-based methods generate various potential docking conformations of a given peptide with a given receptor (i.e. sampling); the generated conformations are then ranked and screened by scoring functions to select suitable binders according to their binding affinity for the receptor [113]. Prediction performance is therefore strongly influenced by the quality of both the sampling algorithms and the scoring functions [113]. Because of the unsatisfactory performance and high computing costs of computational modeling, structure-based methods are applied far less often than sequence-based methods. EpiDock [93] is the first structure-based approach for predicting MHC binders; it uses homology modeling and rigid docking as the sampling algorithm together with a quantitative matrix-based scoring function. The method is applicable to the 23 most frequent HLA class II molecules, covering over 95% of the human population, with an overall accuracy of 83%. Following the remarkable performance of AlphaFold, which can provide atomic-level information for modeling, several AlphaFold-based tools have been proposed, such as Fine-tuned AlphaFold 2 [95]. Deep learning (DL)-based methods such as MHCfold [94] also show great potential. Both methods exhibit impressive performance, comparable to the overall performance of the state-of-the-art NetMHCpan-4.1 and NetMHCIIpan-4.0.
The application of structure-based methods is limited by their long computation times and limited coverage of diverse MHC alleles, and most of them are not as user-friendly as sequence-based methods [114]. Their strong performance, together with these limitations, shows that there is still considerable room for exploration and strong potential for application.
Prediction of T cell recognition of pMHC complexes
The in silico methods above predict MHC class I/II binders, TAP binders, and proteasomal cleavage; in essence, they act as a powerful filter that reduces the number of potential epitope candidates [115]. However, most predicted binders do not actually induce an immune response because they do not evoke TCR-specific recognition by T cells. Therefore, efforts have been devoted to developing methods that predict specific TCR-pMHC interactions in order to study the initiation and effectiveness of the adaptive immune response [116]. Besides tools that consider antigen processing, other interesting ideas have been proposed, such as training prediction models on real epitopes and screening epitopes by determining whether peptides can induce cytokines. A detailed discussion of these two categories can be found in the ‘Supplementary Prediction of TCR-pMHC complexes’ part and the ‘Supplementary Miscellaneous methods’ part.
Benchmark dataset generation for T-cell epitope prediction
The MHC-I and MHC-II peptides were derived from IEDB [20]. For a fair comparison, we only included peptides added to the IEDB database after the year 2024, which can be considered a completely independent testing dataset for model comparison. The benchmark dataset for T-cell epitopes, comprising 1323 experimentally validated positive peptides and 1305 negative peptides, is listed in Supplementary Table S1. Among them, peptides from two major pathogen groups, bacteria and viruses, were extracted for validation at the pathogen level: 53 positive and four negative peptides for bacteria, and 1215 positive and 1278 negative peptides for viruses (Supplementary Table S1).
Moreover, the two most abundant species for MHC-I peptides (SARS-CoV-2 and influenza virus) and for MHC-II peptides (SARS-CoV-2 and hepatitis E virus, HEV) were selected for validation at the species level (Supplementary Table S2). Details on the preparation of the benchmark datasets and the selection of models can be found in the ‘Supplementary Benchmark dataset generation for T cell epitope predictors’.
Model comparison for MHC-I T-cell epitope prediction
For model evaluation, we selected eight MHC-I T-cell epitope prediction tools and four MHC-II T-cell epitope prediction tools for comparison, including both classical and the latest state-of-the-art (SOTA) approaches (Table 3 and Table 7). As this is a typical binary classification problem, we used accuracy, precision, F1-score, sensitivity, specificity, and Matthews correlation coefficient (MCC) for comparison.
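All six indicators can be derived from the four confusion-matrix counts (TP, TN, FP, FN) reported in the tables. The sketch below uses the standard textbook definitions; the function name is our own, and returning NaN for an undefined ratio is an assumed convention rather than something stated in the text. The example counts are those reported for MHCflurry 2.0 in Table 3.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute the six reported indicators from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Undefined ratios (zero denominator) are returned as NaN by assumption.
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "f1_score": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Counts reported for MHCflurry 2.0 in Table 3.
mhcflurry = classification_metrics(tp=1013, tn=655, fp=600, fn=267)
for name, value in mhcflurry.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, so it stays near zero for a predictor that labels almost everything positive or almost everything negative.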
Table 3.
The average performance of each of the MHC-I epitope predictors on the benchmark dataset
| Year | Method name | Availability | Input data | Peptide length (mer) | Output | TP | TN | FP | FN | Accuracy | Precision | F1-score | Sensitivity | Specificity | MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024 | TripHLApan | GitHub codes | peptide, HLA | ≤14 | probability | 893 | 653 | 598 | 386 | 0.6111 | 0.5989 | 0.6448 | 0.6982 | 0.5220 | 0.2238 |
| 2024 | ImmuneApp | webserver | peptide, HLA | ≥8 | score, rank, label, slide window | 907 | 416 | 885 | 412 | 0.5050 | 0.5061 | 0.5831 | 0.6876 | 0.3198 | 0.0080 |
| 2024 | MixMHCpred (v3.0) | GitHub codes | peptide, HLA | 8–14 | score, %rank | 885 | 722 | 525 | 390 | 0.6372 | 0.6277 | 0.6592 | 0.6941 | 0.5790 | 0.2750 |
| 2023 | BigMHC-EL | GitHub codes | peptide, HLA | ≥8 | score | 460 | 1036 | 265 | 859 | 0.5710 | 0.6345 | 0.4501 | 0.3487 | 0.7963 | 0.1621 |
| 2023 | BigMHC-IM | GitHub codes | peptide, HLA | ≥8 | score | 318 | 1128 | 173 | 1001 | 0.5519 | 0.6477 | 0.3514 | 0.2411 | 0.8670 | 0.1385 |
| 2022 | TransPHLA | webserver; GitHub codes | peptide, HLA, HLA sequence | ≥8; sequences longer than 15 are cut | probability, label, slide window | 1147 | 169 | 1132 | 172 | 0.5023 | 0.5033 | 0.6376 | 0.8696 | 0.1299 | −0.0007 |
| 2020 | NetMHCpan-4.1-EL | webserver | peptide, HLA | ≥8 | score, %rank, slide window | 1081 | 322 | 979 | 238 | 0.5355 | 0.5248 | 0.6398 | 0.8196 | 0.2475 | 0.0818 |
| 2020 | NetMHCpan-4.1-BA | webserver | peptide, HLA | ≥8 | score, %rank, slide window | 1039 | 400 | 901 | 280 | 0.5492 | 0.5356 | 0.6376 | 0.7877 | 0.3075 | 0.1085 |
| 2020 | MHCflurry 2.0 | GitHub codes | peptide, HLA | <16 | score, %rank | 1013 | 655 | 600 | 267 | 0.6580 | 0.6280 | 0.7003 | 0.7914 | 0.5219 | 0.3256 |
| 2017 | HLAthena | webserver | peptide, HLA | ≥8 | score, prank, slide window | 251 | 1104 | 197 | 1068 | 0.5172 | 0.5603 | 0.2841 | 0.1903 | 0.8486 | 0.0516 |
Table 7.
The average performance of each of the benchmarked HLA-II epitope predictors on the benchmark dataset
| Year | Method name | Availability | Input data | peptide length(mer) | Output | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024 | TripHLApanII | GitHub codes | Peptide, HLA | – | probability | 86 | 13 | 23 | 78 | 0.4950 | 0.7890 | 0.6300 | 0.5244 | 0.3611 | −0.0883 |
| 2023 | MixMHC2pred-2.0 | GitHub codes | Peptide, HLA | 12 ~ 21 | score, rank, slide window | 24 | 28 | 8 | 140 | 0.2600 | 0.7500 | 0.2449 | 0.1463 | 0.7778 | −0.0795 |
| 2020 | NetMHCIIpan-4.0-EL | webserver | Peptide, HLA | >14 | score, rank, slide window | 60 | 22 | 10 | 99 | 0.4293 | 0.8571 | 0.5240 | 0.3774 | 0.6875 | 0.0503 |
| 2020 | NetMHCIIpan-4.0-BA | webserver | Peptide, HLA | >14 | score, rank | 107 | 18 | 14 | 52 | 0.6545 | 0.8843 | 0.7643 | 0.6730 | 0.5625 | 0.1825 |
| 2018 | CD4episcore | webserver | Peptide | >14 | Peptide over the cut-off | 111 | 17 | 15 | 48 | 0.6702 | 0.8810 | 0.7789 | 0.6981 | 0.5313 | 0.1808 |
The eight tools showed comparable accuracy (0.50 to 0.66) and precision (0.50 to 0.65) but differed on the other indicators, including F1-score (0.28 to 0.70), sensitivity (0.19 to 0.87), specificity (0.13 to 0.87), and MCC (−0.00 to 0.33). It should be noted that the applicable range of each tool is slightly different. For example, several tools restrict the length of input peptides to an interval, commonly 8-mer to 14- or 15-mer, while other tools set only a maximum or a minimum peptide length. Thus, users need to choose tools suited to their research targets based on the application range of each method. We also performed an additional validation on the consistent testing dataset, which showed similar performance (Table 4).
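One way to obtain a test set on which every tool can be run is to keep only peptides whose length falls inside all of the tools' accepted ranges. The sketch below illustrates that idea. It is an assumption about how a consistent dataset could be derived, not a description of the procedure actually used, which is given in the supplementary material. The length bounds are transcribed from the peptide-length column of Table 3; the function names and toy peptides are hypothetical.

```python
from typing import Optional

# (min_length, max_length) per tool, from the peptide-length column of Table 3.
# None means that no bound is stated there.
LENGTH_LIMITS: dict[str, tuple[Optional[int], Optional[int]]] = {
    "TripHLApan": (None, 14),
    "ImmuneApp": (8, None),
    "MixMHCpred (v3.0)": (8, 14),
    "BigMHC": (8, None),
    "TransPHLA": (8, None),  # longer sequences are cut rather than rejected
    "NetMHCpan-4.1": (8, None),
    "MHCflurry 2.0": (None, 15),
    "HLAthena": (8, None),
}


def accepts(tool: str, peptide: str) -> bool:
    """True if the peptide length lies within the stated range of the tool."""
    lower, upper = LENGTH_LIMITS[tool]
    n = len(peptide)
    return (lower is None or n >= lower) and (upper is None or n <= upper)


def consistent_subset(peptides: list[str], tools: Optional[list[str]] = None) -> list[str]:
    """Keep only peptides that every selected tool accepts."""
    selected = list(LENGTH_LIMITS) if tools is None else list(tools)
    return [p for p in peptides if all(accepts(t, p) for t in selected)]


# Hypothetical peptides of length 7, 9, 14, and 15.
toy_peptides = ["A" * 7, "A" * 9, "A" * 14, "A" * 15]
print([len(p) for p in consistent_subset(toy_peptides)])
```

Filtering by the intersection of ranges trades dataset size for comparability, which matches the small drop in counts between Table 3 and Table 4.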
Table 4.
The average performance of each of the MHC-I epitope predictors on the consistent dataset
| Year | Method name | TP | TN | FP | FN | accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024 | TripHLApan | 891 | 651 | 596 | 384 | 0.6114 | 0.5992 | 0.6452 | 0.6988 | 0.5221 | 0.2245 |
| 2024 | ImmuneApp | 888 | 389 | 858 | 387 | 0.5063 | 0.5086 | 0.5879 | 0.6965 | 0.3119 | 0.0091 |
| 2024 | MixMHCpred (v3.0) | 885 | 722 | 525 | 390 | 0.6372 | 0.6277 | 0.6592 | 0.6941 | 0.5790 | 0.2750 |
| 2023 | BigMHC-EL | 460 | 982 | 265 | 815 | 0.5718 | 0.6345 | 0.4600 | 0.3608 | 0.7875 | 0.1638 |
| 2023 | BigMHC-IM | 318 | 1074 | 173 | 957 | 0.5519 | 0.6477 | 0.3601 | 0.2494 | 0.8613 | 0.1398 |
| 2022 | TransPHLA | 1116 | 142 | 1105 | 159 | 0.4988 | 0.5025 | 0.6384 | 0.8753 | 0.1139 | −0.0167 |
| 2020 | NetMHCpan-4.1-EL | 1055 | 285 | 962 | 220 | 0.5313 | 0.5231 | 0.6409 | 0.8275 | 0.2285 | 0.0700 |
| 2020 | NetMHCpan-4.1-BA | 1016 | 360 | 887 | 259 | 0.5456 | 0.5339 | 0.6394 | 0.7969 | 0.2887 | 0.0994 |
| 2020 | MHCflurry 2.0 | 1012 | 648 | 599 | 263 | 0.6582 | 0.6281 | 0.7013 | 0.7937 | 0.5196 | 0.3262 |
| 2017 | HLAthena | 250 | 1050 | 197 | 1025 | 0.5155 | 0.5593 | 0.2904 | 0.1961 | 0.8420 | 0.0499 |
Further, we evaluated the MHC-I T-cell epitope prediction tools on bacteria. TransPHLA achieved the best accuracy (0.8070), with a precision of 0.9200, followed by NetMHCpan-4.1, MHCflurry 2.0, and MixMHCpred (v3.0), which all achieved an accuracy over 0.7 and outperformed the remaining tools (Table 5). Meanwhile, validation on viruses showed that MHCflurry 2.0 achieved the best prediction accuracy of 0.6592, followed by MixMHCpred (v3.0) with 0.6393. The tools thus differed widely in performance on bacteria but performed similarly on viruses. This might be explained by the following reasons: (i) the number of validated bacterial peptides is relatively small and cannot reflect the performance of the different algorithms, and (ii) the virus dataset contains a large number of SARS-CoV-2 peptides, which were rarely included in the training datasets of the above models, leading to a decrease in overall prediction performance.
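The effect of the small bacterial sample can be made concrete with a confidence interval. The sketch below uses the Wilson score interval, a standard method for binomial proportions that we introduce here purely for illustration; it is not part of the original evaluation. It is applied to the specificity of a tool that classifies none of the four bacterial negatives correctly (TN = 0, FP = 4), as reported for most tools in Table 5.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Specificity on bacteria for a tool with TN = 0 out of 4 negatives.
low, high = wilson_interval(successes=0, n=4)
print(f"specificity 0/4: 95% interval {low:.2f} to {high:.2f}")

# Same calculation for a specificity estimated on the 1278 viral negatives
# (651 true negatives, as reported for MHCflurry 2.0 on viruses with 1228
# negatives scored would differ; here n is simply the dataset size).
low_v, high_v = wilson_interval(successes=651, n=1228)
print(f"specificity 651/1228: 95% interval {low_v:.2f} to {high_v:.2f}")
```

With only four negatives the interval spans roughly half of the possible range, so specificity and MCC on the bacterial subset say little about the relative quality of the tools.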
Table 5.
Performance of each HLA-I epitope predictor on the virus dataset and bacteria dataset
| Method name | TP (bacteria) | TN (bacteria) | FP (bacteria) | FN (bacteria) | Accuracy (bacteria) | Precision (bacteria) | F1-score (bacteria) | Sensitivity (bacteria) | Specificity (bacteria) | MCC (bacteria) | TP (virus) | TN (virus) | FP (virus) | FN (virus) | Accuracy (virus) | Precision (virus) | F1-score (virus) | Sensitivity (virus) | Specificity (virus) | MCC (virus) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TripHLApan | 31 | 0 | 4 | 22 | 0.5439 | 0.8857 | 0.7045 | 0.5849 | 0.0000 | −0.2178 | 828 | 648 | 576 | 343 | 0.6163 | 0.5897 | 0.6431 | 0.7071 | 0.5294 | 0.2400 |
| ImmuneApp | 30 | 0 | 4 | 23 | 0.5263 | 0.8824 | 0.6897 | 0.5660 | 0.0000 | −0.2260 | 847 | 410 | 864 | 364 | 0.5058 | 0.4950 | 0.5797 | 0.6994 | 0.3218 | 0.0229 |
| MixMHCpred (v3.0) | 40 | 0 | 4 | 13 | 0.7018 | 0.9091 | 0.8247 | 0.7547 | 0.0000 | −0.1493 | 810 | 716 | 504 | 357 | 0.6393 | 0.6164 | 0.6530 | 0.6941 | 0.5869 | 0.2823 |
| BigMHC-EL | 23 | 0 | 4 | 30 | 0.4035 | 0.8519 | 0.5750 | 0.4340 | 0.0000 | −0.2896 | 420 | 1027 | 247 | 791 | 0.5823 | 0.6297 | 0.4473 | 0.3468 | 0.8061 | 0.1725 |
| BigMHC-IM | 11 | 2 | 2 | 42 | 0.2281 | 0.8462 | 0.3333 | 0.2075 | 0.5000 | −0.1780 | 298 | 1109 | 165 | 913 | 0.5662 | 0.6436 | 0.3560 | 0.2461 | 0.8705 | 0.1496 |
| TransPHLA | 46 | 0 | 4 | 7 | 0.8070 | 0.9200 | 0.8932 | 0.8679 | 0.0000 | −0.1028 | 1058 | 167 | 1107 | 153 | 0.4930 | 0.4887 | 0.6268 | 0.8737 | 0.1311 | 0.0071 |
| NetMHCpan-4.1-EL | 41 | 0 | 4 | 12 | 0.7193 | 0.9111 | 0.8367 | 0.7736 | 0.0000 | −0.1419 | 999 | 319 | 955 | 212 | 0.5304 | 0.5113 | 0.6313 | 0.8249 | 0.2504 | 0.0919 |
| NetMHCpan-4.1-BA | 42 | 0 | 4 | 11 | 0.7368 | 0.9130 | 0.8485 | 0.7925 | 0.0000 | −0.1343 | 958 | 397 | 877 | 253 | 0.5453 | 0.5221 | 0.6290 | 0.7911 | 0.3116 | 0.1168 |
| MHCflurry 2.0 | 41 | 0 | 4 | 12 | 0.7193 | 0.9111 | 0.8367 | 0.7736 | 0.0000 | −0.1419 | 931 | 651 | 577 | 241 | 0.6592 | 0.6174 | 0.6948 | 0.7944 | 0.5301 | 0.3357 |
| HLAthena | 13 | 3 | 1 | 40 | 0.2807 | 0.9286 | 0.3881 | 0.2453 | 0.7500 | −0.0028 | 230 | 1274 | 0 | 981 | 0.6052 | 1.0000 | 0.3192 | 0.1899 | 1.0000 | 0.3276 |
Moreover, two major viruses, SARS-CoV-2 and influenza virus, were evaluated for comparison at the species level. On the influenza virus, the accuracy of the compared tools was around 0.49 to 0.55 with similar precision (0.56 to 0.63) but variable sensitivity (0.17 to 0.93) and specificity (0.08 to 0.88). The results on SARS-CoV-2 showed a different pattern: recent SOTA tools such as MHCflurry 2.0 reached an accuracy of 0.8676 with a precision of 0.7811, whereas the classical NetMHCpan-4.1 (BA) achieved an accuracy of only 0.5077 and a precision of 0.3653. This might be because some of the methods do not incorporate enough SARS-CoV-2 data for training (Table 6).
Table 6.
Performance of each HLA-I epitope predictor on the different virus datasets
| Method name | Source Organism | Species | Negative | Positive | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TripHLApan | Influenza A virus | Influenza A virus | 137 | 169 | 135 | 30 | 107 | 34 | 0.5392 | 0.5579 | 0.6569 | 0.7988 | 0.2190 | 0.0218 |
| TripHLApan | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 544 | 234 | 144 | 438 | 106 | 90 | 0.7481 | 0.5760 | 0.5950 | 0.6154 | 0.8051 | 0.4130 |
| ImmuneApp | Influenza A virus | Influenza A virus | 137 | 169 | 125 | 41 | 96 | 44 | 0.5425 | 0.5656 | 0.6410 | 0.7396 | 0.2993 | 0.0432 |
| ImmuneApp | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 544 | 234 | 175 | 154 | 390 | 59 | 0.4229 | 0.3097 | 0.4380 | 0.7479 | 0.2831 | 0.0318 |
| MixMHCpred (v3.0) | Influenza A virus | Influenza A virus | 137 | 169 | 116 | 50 | 87 | 53 | 0.5425 | 0.5714 | 0.6237 | 0.6864 | 0.3650 | 0.0540 |
| MixMHCpred (v3.0) | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 544 | 234 | 144 | 496 | 48 | 90 | 0.8226 | 0.7500 | 0.6761 | 0.6154 | 0.9118 | 0.5607 |
| BigMHC-EL | Influenza A virus | Influenza A virus | 137 | 169 | 55 | 100 | 37 | 114 | 0.5065 | 0.5978 | 0.4215 | 0.3254 | 0.7299 | 0.0600 |
| BigMHC-EL | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 544 | 234 | 66 | 520 | 24 | 168 | 0.7532 | 0.7333 | 0.4074 | 0.2821 | 0.9559 | 0.3412 |
| BigMHC-IM | Influenza A virus | Influenza A virus | 137 | 169 | 29 | 120 | 17 | 140 | 0.4869 | 0.6304 | 0.2698 | 0.1716 | 0.8759 | 0.0661 |
| BigMHC-IM | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 544 | 234 | 44 | 534 | 10 | 190 | 0.7429 | 0.8148 | 0.3056 | 0.1880 | 0.9816 | 0.3061 |
| TransPHLA | Influenza A virus | Influenza A virus | 137 | 169 | 158 | 11 | 126 | 11 | 0.5523 | 0.5563 | 0.6976 | 0.9349 | 0.0803 | 0.0293 |
| TransPHLA | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 544 | 234 | 215 | 44 | 500 | 19 | 0.3329 | 0.3007 | 0.4531 | 0.9188 | 0.0809 | −0.0005 |
| NetMHCpan-4.1-EL | Influenza A virus | Influenza A virus | 137 | 169 | 144 | 24 | 113 | 25 | 0.5490 | 0.5603 | 0.6761 | 0.8521 | 0.1752 | 0.0370 |
| NetMHCpan-4.1-EL | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 544 | 234 | 220 | 135 | 409 | 14 | 0.4563 | 0.3498 | 0.5098 | 0.9402 | 0.2482 | 0.2195 |
| NetMHCpan-4.1-BA | Influenza A virus | Influenza A virus | 137 | 169 | 143 | 26 | 111 | 26 | 0.5523 | 0.5630 | 0.6761 | 0.8462 | 0.1898 | 0.0476 |
| NetMHCpan-4.1-BA | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 544 | 234 | 202 | 193 | 351 | 32 | 0.5077 | 0.3653 | 0.5133 | 0.8632 | 0.3548 | 0.2205 |
| MHCflurry 2.0 | Influenza A virus | Influenza A virus | 137 | 169 | 134 | 33 | 104 | 35 | 0.5456 | 0.5630 | 0.6585 | 0.7929 | 0.2409 | 0.0404 |
| MHCflurry 2.0 | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 544 | 234 | 182 | 493 | 51 | 52 | 0.8676 | 0.7811 | 0.7794 | 0.7778 | 0.9063 | 0.6849 |
| HLAthena | Influenza A virus | Influenza A virus | 137 | 169 | 39 | 110 | 27 | 130 | 0.4869 | 0.5909 | 0.3319 | 0.2308 | 0.8029 | 0.0407 |
| HLAthena | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 544 | 234 | 43 | 470 | 74 | 191 | 0.6594 | 0.3675 | 0.2450 | 0.1838 | 0.8640 | 0.0612 |
For MHC-II T-cell epitope predictors, we selected four tools for evaluation on the benchmark dataset. Among them, CD4episcore achieved the best accuracy and F1-score on the benchmark testing dataset (Table 7) as well as on the consistent dataset (Table 8). Studies on two major viruses showed that the tools generally performed better on SARS-CoV-2 than on HEV, although NetMHCIIpan-4.0-EL and CD4episcore did not follow this trend in accuracy (Table 9). Because both datasets are relatively small, the performance of these tools on emerging pathogens may require further validation as more data accumulate.
Table 8.
The average performance of each of the benchmarked HLA-II epitope predictors on the consistent dataset
| Year | Method name | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024 | TripHLApan | 82 | 13 | 19 | 77 | 0.4974 | 0.8119 | 0.6308 | 0.5157 | 0.4063 | −0.0584 |
| 2023 | MixMHC2pred-2.0 | 23 | 24 | 8 | 136 | 0.2461 | 0.7419 | 0.2421 | 0.1447 | 0.7500 | −0.1067 |
| 2020 | NetMHCIIpan-4.0-EL | 60 | 22 | 10 | 99 | 0.4293 | 0.8571 | 0.5240 | 0.3774 | 0.6875 | 0.0503 |
| 2020 | NetMHCIIpan-4.0-BA | 107 | 18 | 14 | 52 | 0.6545 | 0.8843 | 0.7643 | 0.6730 | 0.5625 | 0.1825 |
| 2018 | CD4episcore | 111 | 17 | 15 | 48 | 0.6702 | 0.8810 | 0.7789 | 0.6981 | 0.5313 | 0.1808 |
Table 9.
Performance of each HLA-II epitope predictor on the different virus datasets
| Method name | Source Organism | Species | Negative | Positive | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TripHLApan | Hepatitis E virus | Hepatitis E virus | 24 | 76 | 25 | 11 | 13 | 51 | 0.3600 | 0.6759 | 0.4386 | 0.3289 | 0.4583 | −0.2127 |
| TripHLApan | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 4 | 16 | 14 | 1 | 3 | 2 | 0.7500 | 0.8235 | 0.8485 | 0.8750 | 0.2500 | 0.1400 |
| MixMHC2pred-2.0 | Hepatitis E virus | Hepatitis E virus | 24 | 76 | 9 | 23 | 1 | 67 | 0.3200 | 0.9000 | 0.2093 | 0.1184 | 0.9583 | 0.1093 |
| MixMHC2pred-2.0 | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 4 | 16 | 11 | 1 | 3 | 5 | 0.6000 | 0.7857 | 0.7333 | 0.6875 | 0.2500 | −0.0546 |
| NetMHCIIpan-4.0-EL | Hepatitis E virus | Hepatitis E virus | 20 | 72 | 50 | 12 | 8 | 22 | 0.6739 | 0.8621 | 0.7692 | 0.6944 | 0.6000 | 0.2516 |
| NetMHCIIpan-4.0-EL | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 4 | 15 | 8 | 3 | 1 | 7 | 0.5789 | 0.8889 | 0.6667 | 0.5333 | 0.7500 | 0.2313 |
| NetMHCIIpan-4.0-BA | Hepatitis E virus | Hepatitis E virus | 20 | 72 | 24 | 15 | 5 | 48 | 0.4239 | 0.8276 | 0.4752 | 0.3333 | 0.7500 | 0.0740 |
| NetMHCIIpan-4.0-BA | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 4 | 15 | 7 | 4 | 0 | 8 | 0.5789 | 1.0000 | 0.6364 | 0.4667 | 1.0000 | 0.3944 |
| CD4episcore | Hepatitis E virus | Hepatitis E virus | 20 | 72 | 57 | 12 | 8 | 15 | 0.7500 | 0.8769 | 0.8321 | 0.7917 | 0.6000 | 0.3548 |
| CD4episcore | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 4 | 15 | 14 | 0 | 4 | 1 | 0.7368 | 0.7778 | 0.8485 | 0.9333 | 0.0000 | −0.1217 |
Overall, we evaluated the performance of current SOTA T-cell epitope predictors. The results show that the tools have different emphases: some aim to recover as many true positives as possible (high sensitivity), whereas others aim to make the predicted positives as reliable as possible (high precision). Users can therefore choose tools for their research purposes based on the benchmark validation.
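The trade-off between sensitivity and precision can be read directly from the confusion-matrix counts in the benchmark tables. As a minimal sketch (the function name `confusion_metrics` is illustrative and not part of any benchmarked tool), the following Python code recomputes the reported metrics from TP, TN, FP, and FN, using the CD4episcore row of Table 8 as input.

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Compute the metrics used in the benchmark tables from raw counts."""
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "f1_score": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }

# Counts taken from the CD4episcore row of Table 8.
cd4episcore = confusion_metrics(tp=111, tn=17, fp=15, fn=48)
for name, value in cd4episcore.items():
    print(f"{name}: {value:.4f}")
```

A tool tuned for sensitivity shows a low FN count at the cost of a higher FP count, and a tool tuned for precision shows the opposite pattern; MCC summarizes both in a single value.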
B-cell epitope prediction
Another important part of the adaptive immune response is B cell-induced humoral immunity: B cells mediate humoral adaptive immunity by producing and secreting different forms of antibodies [55]. B-cell activation can be classified into two pathways according to whether T cells are involved. The B-cell activation pathways are illustrated in Fig. 3 and described in the ‘Supplementary B cell activation pathways’ part. The parts of an antigen bound by the B-cell receptor (BCR), i.e. B-cell epitopes, can be divided by spatial structure into linear epitopes, which are short continuous peptides, and discontinuous (conformational) epitopes, which are formed by the folded protein structure [117]. Computational approaches for predicting B-cell epitopes are accordingly classified into tools for linear epitopes and tools for conformational epitopes. This review focuses on the features and widely used tools of each class; the input information, features, training dataset, and claimed prediction performance of these and other typical tools are summarized in Table 10.
Figure 3.
Schematic illustration of B cell activation pathways. A. The B cell activation pathways: the right part shows the T cell-dependent pathway, and the left part shows the T cell-independent pathway. B. A simple example of a linear B-cell epitope. C. A simple example of a conformational B-cell epitope.
Table 10.
The list of in silico tools for predicting linear and conformational B-cell epitopes
| Type | Method | Feature | Training dataset | Input | Acc (%) | AUC | MCC | Sens (%) | Spec (%) | Year | Ref | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Linear | PREDITOP | amino acid propensity scales-based | – | propensity scale, query sequence, a journal file | ~0.6 | – | – | – | – | 1993 | [115] | |
| PEOPLE | amino acid propensity scales-based combined method | – | – | – | – | – | – | – | 1999 | [116] | ||
| BcePred | Combination of amino acid propensity scales-based | – | protein sequence, properties, threshold | 0.5870 | 0.56 | 0.61 | 2004 | [118] | ||||
| ABCpred | ANN-based | 700 B-cell epitopes and 700 random peptides as positive and negative dataset | protein sequence, epitope length, threshold | 0.6593 | 0.3187 | 0.6714 | 0.6471 | 2006 | [119] | |||
| BepiPred | HMM model combined with propensity scale method | 127 protein sequences with not fully annotated, 0.08 epitope density | – | 0.671 | 2006 | [120] | ||||||
| AlgPred | model for mapping IgE epitopes and other functions | 178 IgE epitopes | protein sequences | – | – | – | 0.1747 | 0.9814 | 2006 | [121] | ||
| BCPRED | SVM-based model using string kernels | 701 linear B-cell epitopes, 701 non-epitopes as positive and negative dataset | Antigen sequence, predicted method, epitope length, threshold | 0.6790 | 0.758 | 0.7261 | 0.632 | 2008 | [122] | |||
| IgPred | dipeptide composition-based SVM model for predicting antibody class-specific B-cell epitopes | 14,725 B-cell epitopes, including 11,981 IgG (a), 2341 IgE (b), and 403 IgA (c) specific epitopes, and 22,835 non-B-cell epitopes; 80% of sequences used as training dataset | amino acid sequence of peptides | 0.7173 (a), 0.8496 (b), 0.7228 (c) | 0.77 (a), 0.90 (b), 0.78 (c) | 0.44 (a), 0.70 (b), 0.45 (c) | – | – | 2013 | [123] | ||
| BCIgEpred | A dual-layer model for predicting IgE epitopes | 1273 IgE epitopes, 4226 non-IgE epitopes | protein sequences | 0.743 | 0.833 | 0.48 | 0.726 | 0.758 | 2018 | [124] | ||
| iBCE-EL | An ensemble method fusion extremely randomized tree and gradient boosting classifiers | 4440 BCEs and 5485 non-BCEs | protein sequences | 0.732 | 0.789 | 0.463 | 0.742 | 0.724 | 2018 | [125] | ||
| SPADE | pipeline for the localization of cross-reactive IgE-epitopes based on the structural | – | Structures of the query protein and a cross-reactive second allergen | – | – | – | 0.57 | 0.71 | 2019 | [126] | ||
| AlgPred2 | ensemble approach for mapping IgE epitopes and other functions | 10,451 IgE epitopes, 307,866 non-IgE epitopes | protein sequences | – | – | – | 0.7299 | – | 2020 | [127] | ||
| DLBEpitope | feedforward deep neural network-based ensemble model | 25,884 linear B-cell epitopes and 214,679 non-epitopes as positive and negative samples, then extended the dataset to 20,000 positive and 20,000 negative samples for each length | NA | 0.9568 | 2020 | [128] | ||||||
| AbCPE | multi-label method for prediction of antibody class(es) for B-cell epitopes | 10,744 epitope sequences, covering 4 specific antibody classes and 7 classes of combination of different antibodies | epitope sequences | – | – | – | – | – | 2021 | [129] | ||
| BepiPred-3.0 | protein language model | 358 antigens and 5011 epitope residues | protein sequence, threshold, top epitope percentage cutoff | 0.688 | 0.762 | 0.309 | 2022 | [130] | ||||
| DeepLBCEPred | A Bi-LSTM and multi-scale CNN-based DL method | 555 sequences of BCEs and 555 sequences without BCEs | protein sequences | 0.77 | 0.54 | 0.78 | 0.75 | 2023 | [131] | |||
| epitope1D | An explainable ML method leveraging two new descriptors: graph-based signatures and organism identification | 20,638 epitopes and 6-times negative samples | peptides or whole protein sequences, desired window size, organism taxonomy | 0.935 | 0.613 | 2023 | [132] | |||||
| Conformational | Generic antigenic region-based Method | CEP | Method using accessibility of residues and spatial distance cut-off | – | 3D structure of antigens | 0.75 | 2005 | [133] | ||||
| DiscoTope | Method using a combination of amino acid statistics, spatial information, and surface exposure | discontinuous epitopes from 76 X-ray structures of antibody/antigen protein complexes | 3D structure of antigens | 0.155 | 0.95 | 2006 | [134] | |||||
| ElliPro | tool based on the geometrical properties of protein structure | – | protein sequence or structure | 0.840 | 0.732 | – | 0.601 | 2008 | [135] | |||
| SEPPA | Method based on the residual context and the spatial compactness of neighboring residues | 82 Ab-Ag complexes, contained 84 unique epitopes | 3D protein structure | 0.742 | 0.580 | 0.707 | 2009 | [136] | ||||
| CBTOPE | SVM-based model using composition profile of patterns in a protein sequence | 187 antigens, 2261 positive patterns (Ab interacting residues of B-cell epitopes), 107,414 negative patterns (non-Ab interacting residues of B-cell epitopes) | NA | 0.8659 | 0.73 | 0.8313 | 0.9006 | 2010 | [137] | |||
| BEST | a sliding window-based SVM predictor utilizes Ag sequences | 701 B-cell epitopes 20-mers and 701 non-B-cell epitopes 20-mers | amino acid sequence | 0.745 | 0.811 | 0.53 | 0.561 | 0.929 | 2012 | [138] | ||
| CBEP | SVM-based model adopts cost-sensitive ensemble classifiers and spatial clustering | Rubinstein’s bound structure dataset (a); Liang’s unbound structure dataset (b) | amino acid sequence | 0.721 (a), 0.703 (b) | 2014 | [139] | ||||||
| SEPIa | Method based on sequence-based features and a voting algorithm combines two ML classifiers | 85 Ab-Ag complexes, 1667 conformational B-cell epitope residues, and 16,780 other residues | amino acid sequence | – | 0.65 | – | – | – | 2017 | [140] | ||
| SEPPA 3.0 | logistic regression model added two critical glycosylation parameters, and final calibration based on neighboring antigenicity | 767 protein antigens (520 with N-glycosylation sites), including 16,544 epitope residues as positive set and 172,975 non-epitope residues as negative set. | 3D protein structure, subcellular localization of protein antigen, and species of immune host | 0.665 | 0.749 | 2019 | [141] | |||||
| DiscoTope-3.0 | tool that employs inverse folding structure representations and a positive-unlabelled learning strategy, adapted for both AlphaFold-predicted (a) or solved (b) structures | 582 Ab-Ag complexes, covering 1466 antigen chains, epitopes are defined as the set of residues within 4 Å of any antibody heavy atom | 3D structure of antigens | 0.783 (a), 0.795 (b) | 0.227 (a), 0.214 (b) | 2024 | [142] | |||||
| Antibody-specific epitope-based Method | PEASE | RF model predicting Ab-specific epitopes from antibody sequence | 120 Ab-Ag complexes | Ag structure, Ab sequence, Ag sequence (optional, when the 3D structure of Ag is a computational model) | – | – | – | – | – | 2014 | [143] | |
| EpiPred | Global docking-based model combines geometric matching and specific scores of Ab-Ag structures | 148 Ab-Ag complex structures, 30 of which were randomly chosen as test set, the rest as training dataset | structure of an antibody and an antigen | – | – | – | – | – | 2014 | [144] | ||
| SEPPA-mAb | tool for predicting mAb-specific epitopes, consists of a SEPPA 3.0 model and a fingerprint-based patch model | 860 representative Ab-Ag complexes | 3D structure of Ab-Ag pair, subcellular localization of protein antigen, and species of immune host (optional) | 0.873 | 2023 | [145] | ||||||
| Mimotope-based Method | MIMOX | Mimotope-based method for phage display based epitope mapping | – | Sequence and 3D structure of antigen, a set of mimotope sequences | – | – | – | – | – | 2006 | [146] | |
| Pepitope | Mimotope-based ensemble method for epitope mapping based on affinity-selected peptides | – | a set of affinity-selected peptide sequences and a PDB identifier of the antigen | – | – | – | – | – | 2007 | [147] | ||
| Pep-3D-Search | Mimotope-based method using Ant Colony Optimization algorithm | – | 3D structure of an antigen and a set of mimotopes or a motif | 0.1758 | 0.3642 | 2008 | [148] | |||||
| Sun et al | ensemble method that combines structure-based method and mimotope-based method | 150 Ag-Ab complexes with only one antigen chain | antigen structure and mimotopes | 0.79 | 0.14 | 0.58 | 0.81 | 2015 | [149] | |||
Key for feature: ANN = artificial neural network; SMM-align = stabilization matrix alignment method; HMM = hidden Markov model; ML = machine learning; DL = deep learning; RF = random forest; Ab-Ag = antibody–antigen. Key for performance measure: Acc (%) = accuracy; AUC = area under the ROC curve; MCC = Matthews correlation coefficient; Sens (%) = sensitivity; Spec (%) = specificity.
Prediction of linear B-cell epitopes
Linear B-cell epitopes are continuous in sequence, and although they account for only a small percentage of B-cell epitopes, tools for predicting them have received major attention. The early tools were sequence-based and relied on simple amino acid (AA) propensity scales describing the physicochemical features of B-cell epitopes, since properties such as hydrophilicity, accessibility, and mobility can indicate which regions of a protein are likely to provide antigenic peptides. To improve on the poor performance of the propensity scale-based tools, ML models were introduced. Among the common tools, ABCpred [119] is an ANN-based method that achieved a prediction accuracy of 65.93%, and BCPRED [122] is an SVM-based method with an AUC of 0.758. Following the phase of propensity scale-based semiempirical methods and the phase of initial ML methods, the current generation of epitope prediction methods integrates emerging technologies and recent advances in existing knowledge resources. DLBEpitope [128], the first effort to design prediction models for linear B-cell epitopes using DL methods, scored an AUC of 0.95 on the IEDB dataset, significantly outperforming the other methods. In addition, novel techniques such as fuzzy ARTMAP artificial neural networks, feature maps, and advanced protein language models have also been used in developing prediction tools [150]. Whereas the above methods predict the antigenic regions or B-cell epitopes that can induce a B-cell response, other predictive tools work at a further level and identify which class of antibody a predicted epitope could induce [129]. A more detailed description of each phase is given in the ‘Supplementary B linear epitopes’ part.
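The propensity scale idea behind the earliest tools can be illustrated with a sliding-window average. The sketch below is not the algorithm of any tool in Table 10: it assumes the Kyte-Doolittle hydropathy scale, negated to serve as a hydrophilicity proxy, and the window size, threshold, function names, and test sequence are all hypothetical choices made for illustration.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
# Assumption: the negated value is used here as a simple hydrophilicity proxy.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_scores(seq, window=7):
    """Mean hydrophilicity of each full window, keyed by the centre residue index."""
    half = window // 2
    scores = {}
    for centre in range(half, len(seq) - half):
        segment = seq[centre - half:centre + half + 1]
        # A non-standard residue raises KeyError, which is intended in this sketch.
        scores[centre] = -sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
    return scores

def candidate_regions(seq, window=7, threshold=1.5):
    """Merge consecutive centres whose score reaches the threshold into regions."""
    regions, start, previous = [], None, None
    for centre, score in sorted(window_scores(seq, window).items()):
        if score >= threshold:
            if start is None:
                start = centre
            previous = centre
        elif start is not None:
            regions.append((start, previous))
            start = None
    if start is not None:
        regions.append((start, previous))
    return regions  # inclusive, 0-based (first_centre, last_centre) pairs

# Toy sequence: a charged stretch flanked by hydrophobic residues.
toy_sequence = "LLLLLLL" + "KDEKRDE" + "LLLLLLL"
print(candidate_regions(toy_sequence))
```

The ML-based tools replace the fixed scale and threshold with features and decision boundaries learned from epitope and non-epitope peptides.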
Prediction of conformational B-cell epitopes
Approximately 90% of BCEs are discontinuous in nature; however, because available antibody–antigen complex structures are insufficient, the development of discontinuous epitope prediction has been limited [129]. Existing methods for conformational epitope prediction can be broadly classified into three categories: generic antigenic region-based methods, mimotope-based methods, and antibody-specific methods. Depending on whether the input is the antigen primary sequence or tertiary structure, the generic antigenic region-based methods can be divided into sequence-based and structure-based methods. The first prediction method, CEP [133], uses the solvent accessibility of residues and a spatial distance cut-off to predict potential epitope regions from the 3D structure of antigens. CBTOPE [137] was the first attempt to predict B-cell conformational epitopes from the antigen’s primary sequence alone. However, the available structure-based methods depend heavily on the number and quality of available antigen 3D structures [151], and the main downsides of sequence-based methods are that the predicted epitope residues are not grouped into their corresponding epitopes and that prediction performance is far from satisfactory [152]. Antibody-specific epitope predictors are a new trend in the field; they address a more constrained and tractable task than generic epitope prediction, but they are antibody-dependent, and the limited number and knowledge of existing antibodies restrict their use [151]. The remaining category is the mimotope-based approach; peptides that mimic epitopes are called mimotopes [153]. The core idea of this approach is to map mimotope sequences obtained from a random phage display library [154]. Both the 3D structure of the antigen and the antibody affinity-selected peptides are required as input.
The main disadvantage of mimotope-based methods is their high false-positive or false-negative rate, which results from the complexity of the screening process [155]. The size and diversity of the phage display library also greatly affect performance [156]. Typical methods of each category are summarized by year of publication in Table 10, and detailed records of typical tools of each category can be found in the ‘Supplementary B conformation’ part.
The above tools represent pioneering progress in epitope prediction, which has greatly advanced the field and inspired the development of novel, high-performance prediction methods. However, one drawback of most tools is the instability of their predictions [117]. In 2023, Cia and co-workers [157] constructed a large, well-curated dataset containing over 250 antibody–antigen structures to assess the performance of nine commonly used conformational B-cell epitope predictors. According to the assessment, the models differed in prediction performance on the benchmark dataset, which reflects the instability of predictions and the effect of different datasets on predictions. The nine methods tested are generic or antibody-specific methods; other research has indicated that the performance of mimotope-based approaches is also not as good as expected. There is therefore still an urgent need for new methods that predict conformational B-cell epitopes with high performance across most proteins.
Benchmark dataset generation for linear B-cell epitope prediction
The linear B-cell epitopes were retrieved from IEDB [20] using the keywords linear peptide, human host, and positive/negative. We included only peptides added to the IEDB database after 2023, so that the set can be regarded as a completely independent testing dataset for model comparison.
The benchmark dataset for B-cell epitopes, comprising 2158 experimentally validated positive peptides and 2240 negative peptides, is listed in Supplementary Table S1. From it, subsets for two major pathogen groups, bacteria and viruses, were derived for validation at the pathogen level: 233 positive and 767 negative peptides for bacteria, and 837 positive and 708 negative peptides for viruses (Supplementary Table S1).
Moreover, epitopes from the two most abundant species, Norwalk virus and SARS-CoV-2, were selected for validation at the species level (Supplementary Table S2). Details of the benchmark dataset preparation and the selection of models are given in the ‘Supplementary Benchmark dataset generation for B cell epitope predictors’ part.
Model comparison for B-cell epitope predictors
The performance of SOTA spatial B-cell epitope predictors, including BepiPred-2.0 [158], CBTOPE [137], SEPPA 3.0 [141], DiscoTope 2.0 [159], ElliPro [135], EPSVR [160], BEpro [161], epitope3D [162], and EpiPred [144], was systematically reviewed by Cia et al. [157]. Using four specifically designed metrics, their benchmark test showed that DiscoTope 2.0, EPSVR, and BEpro are significantly better than the random procedure and the random patches procedure across all metrics. BepiPred-2.0, SEPPA 3.0, ElliPro, and EpiPred are significantly better for some of the metrics, whereas CBTOPE and epitope3D are no better than random across all metrics [157]. Moreover, their evaluation on the SARS-CoV-2 spike protein showed that EPSVR achieved the best ROC-AUC of 0.75, followed by ElliPro with 0.66 and BepiPred-2.0 with 0.64. The other three tested tools, CBTOPE (ROC-AUC = 0.51), DiscoTope 2.0 (ROC-AUC = 0.60), and epitope3D (ROC-AUC = 0.38), were no better than random [157].
Besides spatial B-cell epitope prediction, we evaluated the performance of linear epitope prediction tools on the benchmark dataset (Table 11). Among the four tools, iBCE-EL achieved the best accuracy of 0.6157, and epitope1D achieved the best precision of 0.6791. For balanced metrics such as balanced accuracy and MCC, iBCE-EL outperformed all others (Table 12). Validation at the pathogen level (Table 13) and the species level (Table 14) gave similar results, with iBCE-EL generally outperforming the other three tools. Note that all tools reached higher precision on the Norwalk virus than on SARS-CoV-2 (Table 14), although the Norwalk virus subset contains only nine negative peptides. As SARS-CoV-2 is a newly emerging pathogen, earlier tools may be expected to miss its real epitope peptides. Performance may improve once enough data for new pathogens have accumulated for model training.
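Balanced accuracy is not listed in the tables, but it can be derived from the reported counts as the mean of sensitivity and specificity. The following sketch (helper names are illustrative) applies it to the bacteria counts of epitope1D and iBCE-EL from Table 13 to show why a balanced metric is informative when positives and negatives are unevenly represented.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all peptides that were labelled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity, so both classes weigh equally."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Bacteria subset counts copied from Table 13 (233 positives, 767 negatives).
bacteria_counts = {
    "epitope1D": dict(tp=1, tn=766, fp=1, fn=232),
    "iBCE-EL": dict(tp=70, tn=649, fp=118, fn=163),
}

for name, counts in bacteria_counts.items():
    print(
        f"{name}: accuracy={accuracy(**counts):.4f}, "
        f"balanced accuracy={balanced_accuracy(**counts):.4f}"
    )
```

On these counts epitope1D has the higher plain accuracy, yet its balanced accuracy is close to 0.5 because it labels almost every peptide as negative, whereas iBCE-EL scores higher once both classes are weighted equally.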
Table 11.
Average performance of each of the benchmarked B cell linear epitope predictors on the benchmark dataset
| Year | Method name | Availability | Input data | Peptide length (mer) | Output | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023 | DeepLBCEPred | webserver | peptide | – | label | 892 | 1357 | 883 | 1266 | 0.5114 | 0.5025 | 0.4539 | 0.4133 | 0.6058 | 0.0195 |
| 2023 | epitope1D | webserver | peptide | >5, over 25 will be cut | Score, label | 91 | 2195 | 43 | 2053 | 0.5217 | 0.6791 | 0.0799 | 0.0424 | 0.9808 | 0.0674 |
| 2022 | BepiPred-3.0 | webserver | peptide | >9 | Analysis result of each sequence, uppercase letters are epitopes | 1650 | 105 | 2110 | 414 | 0.4101 | 0.4388 | 0.5666 | 0.7994 | 0.0474 | −0.2345 |
| 2018 | iBCE-EL | webserver | peptide | – | Probability, label | 1154 | 1554 | 686 | 1004 | 0.6157 | 0.6272 | 0.5773 | 0.5348 | 0.6938 | 0.2316 |
Table 12.
Average performance of each of the benchmarked B cell linear epitope predictors on the consistent dataset
| Year | Method name | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023 | DeepLBCEPred | 805 | 1355 | 860 | 1259 | 0.5048 | 0.4835 | 0.4318 | 0.3900 | 0.6117 | 0.0018 |
| 2023 | epitope1D | 84 | 2177 | 38 | 1980 | 0.5284 | 0.6885 | 0.0769 | 0.0407 | 0.9828 | 0.0707 |
| 2022 | BepiPred-3.0 | 1650 | 105 | 2110 | 414 | 0.4101 | 0.4388 | 0.5666 | 0.7994 | 0.0474 | −0.2345 |
| 2018 | iBCE-EL | 1068 | 1550 | 665 | 996 | 0.6118 | 0.6163 | 0.5625 | 0.5174 | 0.6998 | 0.2211 |
Table 13.
Performance of each B cell linear epitope predictor on the virus dataset and bacteria dataset
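| Method name | TP (bacteria) | TN (bacteria) | FP (bacteria) | FN (bacteria) | Accuracy (bacteria) | precision (bacteria) | f1_score (bacteria) | sensitivity (bacteria) | specificity (bacteria) | mcc (bacteria) | TP (virus) | TN (virus) | FP (virus) | FN (virus) | Accuracy (virus) | precision (virus) | f1_score (virus) | sensitivity (virus) | specificity (virus) | mcc (virus) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|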
| DeepLBCEPred | 109 | 484 | 283 | 124 | 0.5930 | 0.2781 | 0.3488 | 0.4678 | 0.6310 | 0.0856 | 296 | 485 | 223 | 541 | 0.5055 | 0.5703 | 0.4366 | 0.3536 | 0.6850 | 0.0408 |
| epitope1D | 1 | 766 | 1 | 232 | 0.7670 | 0.5000 | 0.0085 | 0.0043 | 0.9987 | 0.0283 | 37 | 698 | 10 | 794 | 0.4776 | 0.7872 | 0.0843 | 0.0445 | 0.9859 | 0.0881 |
| BepiPred-3.0 | 227 | 8 | 751 | 4 | 0.2374 | 0.2321 | 0.3755 | 0.9827 | 0.0105 | −0.0262 | 597 | 35 | 673 | 205 | 0.4185 | 0.4701 | 0.5763 | 0.7444 | 0.0494 | −0.2814 |
| iBCE-EL | 70 | 649 | 118 | 163 | 0.7190 | 0.3723 | 0.3325 | 0.3004 | 0.8462 | 0.1586 | 523 | 490 | 218 | 314 | 0.6557 | 0.7058 | 0.6629 | 0.6249 | 0.6921 | 0.3161 |
Table 14.
Performance of each B cell linear epitope predictor on the different virus datasets
| Method name | Source Organism | Species | Positive | Negative | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepLBCEPred | Norovirus GII | Norwalk virus | 157 | 9 | 10 | 9 | 0 | 147 | 0.1145 | 1.0000 | 0.1198 | 0.0637 | 1.0000 | 0.0606 |
| DeepLBCEPred | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 397 | 540 | 155 | 395 | 145 | 242 | 0.5870 | 0.5167 | 0.4448 | 0.3904 | 0.7315 | 0.1291 |
| epitope1D | Norovirus GII | Norwalk virus | 157 | 9 | 16 | 9 | 0 | 141 | 0.1506 | 1.0000 | 0.1850 | 0.1019 | 1.0000 | 0.0782 |
| epitope1D | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 391 | 540 | 4 | 537 | 3 | 387 | 0.5811 | 0.5714 | 0.0201 | 0.0102 | 0.9944 | 0.0267 |
| BepiPred-3.0 | Norovirus GII | Norwalk virus | 155 | 9 | 96 | 0 | 9 | 59 | 0.5854 | 0.9143 | 0.7385 | 0.6193 | 0.0000 | −0.1806 |
| BepiPred-3.0 | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 370 | 540 | 263 | 22 | 518 | 107 | 0.3132 | 0.3367 | 0.4570 | 0.7108 | 0.0407 | −0.3499 |
| iBCE-EL | Norovirus GII | Norwalk virus | 157 | 9 | 102 | 0 | 9 | 55 | 0.6145 | 0.9189 | 0.7612 | 0.6497 | 0.0000 | −0.1685 |
| iBCE-EL | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 397 | 540 | 247 | 432 | 108 | 150 | 0.7247 | 0.6958 | 0.6569 | 0.6222 | 0.8000 | 0.4300 |
Vaccine construction and optimization
In the general construction of a multi-epitope vaccine, an adjuvant is added at the N-terminus and followed by a suitable linker; after this linker, the PADRE sequence, which can be regarded as an additional adjuvant that increases immunogenicity, is appended and then connected to the epitopes through linkers. There is no fixed order for placing the epitopes, and other functional peptides may optionally be appended to the C-terminus of the vaccine. Codon adaptation is an important step before in silico cloning of the vaccine construct: it adjusts the codons of the vaccine sequence to the codon usage preference of the expression host to achieve a high expression rate. In real-world applications, in silico tools have already shown a strong ability to assist vaccine design. For example, a personalized neoantigen vaccine targeting melanoma was constructed from epitopes identified by NetMHCpan version 2.4; it has passed a phase I clinical trial, and follow-up four years after vaccination confirmed that it still provides durable anti-tumor activity [163]. Similarly, a neoantigen vaccine targeting glioblastoma, also developed from target epitopes identified by NetMHCpan version 2.4, showed strong immunogenicity in a phase Ib trial and generated circulating polyfunctional neoantigen-specific CD4 and CD8 T-cell responses [164]. These successful examples demonstrate the feasibility of the computational design of epitope-based vaccines and the significant contribution of bioinformatics methods, especially epitope prediction tools, to the development of multi-epitope vaccine candidates. They also show the potential for developing personalized precision vaccines.
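The general layout of such a construct can be sketched as simple string assembly. The example below is illustrative only: the assignment of EAAAK, AAY, GPGPG, and KK to particular junctions is an assumption based on common usage of the linkers listed in Table 15, the PADRE sequence is the commonly cited one and should be verified before use, and the adjuvant, epitopes, and tag are toy placeholders rather than real designs.

```python
# Commonly cited PADRE sequence; treat as an assumption and verify before use.
PADRE = "AKFVAAWTLKAAA"

# Assumed linker placement (hypothetical convention for this sketch):
# EAAAK after the adjuvant, AAY before CTL epitopes,
# GPGPG before HTL epitopes, KK before B-cell epitopes.
LINKERS = {"adjuvant": "EAAAK", "ctl": "AAY", "htl": "GPGPG", "bcell": "KK"}

def build_construct(adjuvant, ctl, htl, bcell, c_terminal=""):
    """Assemble adjuvant, PADRE, linked epitopes, and an optional C-terminal peptide."""
    construct = adjuvant + LINKERS["adjuvant"] + PADRE
    for group, epitopes in (("ctl", ctl), ("htl", htl), ("bcell", bcell)):
        for epitope in epitopes:
            # Every epitope is preceded by the linker of its group.
            construct += LINKERS[group] + epitope
    return construct + c_terminal

# Toy placeholder sequences, not real adjuvants or epitopes.
example = build_construct(
    adjuvant="MMMM",
    ctl=["WWW", "FFF"],
    htl=["HHH"],
    bcell=["SSS"],
    c_terminal="HHHHHH",  # e.g. a 6xHis tag, as used in several Table 15 studies
)
print(example)
print(len(example))
```

Because there is no fixed epitope order, a design workflow would typically generate several orderings with such a function and compare them in the downstream evaluation steps.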
The prediction tools, linkers, adjuvants, functional peptides, and codon optimization tools commonly used in the literature are listed in Table 15, which summarizes the main studies from 2019 to 2024 that developed multi-epitope vaccines with bioinformatics tools.
Table 15.
Main studies of multi-epitope vaccine design using bioinformatics tools against different pathogens
| Target pathogen | Pre-PVC | Feature | Epitope prediction tool | Adjuvant | Linker | Functional peptides | codon optimization | Experimental Verification | Years | Ref |
|---|---|---|---|---|---|---|---|---|---|---|
| Acinetobacter baumannii | 32 protein candidates | Selected from a literature review | CTL, HTL, B: EigenBio’s proprietary epitope prediction software | AddaS03 | EAAAK, GPGPG, KK, | 6-His Tag, thioredoxin protein | GenScript | Y | 2024 | [165] |
| Haemophilus parainfluenzae | two proteins that are essential and non-homologous protein with highest antigenicity | entire proteome is retrieved from UniProt, screened by several filters | CTL: CTLPred B: ABCPred (for linear); ElliPro (for discontinuous) | cholera enterotoxin subunit B | EAAAK, AAY, GPGPG, KK | NA | JCat | N | 2024 | [166] |
| Bluetongue virus (BTV) | 3 full-length amino acid sequences of BTV NS proteins | Downloaded from available genomes of 24 serotypes of BTV in the NCBI database | CTL: ‘NetMHCpan BA 4.1’ module of IEDB; IEDB-AR server HTL: ‘MHCIIpan 4.0 BA’ module of IEDB; NetBoLAIIPan 1.0 tool | beta-defensin 2, or 50S ribosomal subunit | EAAAK, AAY, GPGPG | NA | JCat | N | 2024 | [167] |
| Epstein–Barr virus (EBV) | EBVpoly recombinant protein | An engineered recombinant polyprotein that is formed by linking 20 CD8 + T cell epitopes from several EBV proteins | CTL: literature | Amphiphile-modified CpG DNA adjuvant | proteasome liberation amino acid sequence | EBV gp350 protein | experiment | Y | 2023 | [168] |
| SARS-CoV-2 | A surface glycoprotein (ID: QHO62112.1) | retrieved from the NCBI database | CTL: NETCTL_1.2; MHC-NP tool; IEDB MHC-I binding predictions tool; MHC-NP HTL: IEDB MHC-II binding predictions B: Bepipred 2.0, Chou & Fasman Beta-Turn Prediction, Emini Surface Accessibility Prediction, Karplus & Schulz Flexibility Prediction, Kolaskar & Tongaonkar Antigenicity and Parker Hydrophilicity Prediction | NA | NA | NA | NA | N | 2020 | [169] |
| Mycoplasma synoviae | 5 proteins of strain rSC0200 | These proteins have been regarded as promising PVCs in earlier studies | CTL: NetMHCcons 1.1 HTL: NetMHCIIpan 4.0 B: BCpred | β-defensin | EAAK, AAY, GPGPG, KK | NA | Experiment, replacement of TGA by TGG | Y | 2023 | [170] |
| Group B streptococcus (GBS) | 15 proteins | experimentally proven protection against GBS | CTL: TEPITOOL HTL: TEPITOOL B: ABCpred; Bcpred | NA | GPPGPG, LRMKLPKS | NA | JCat | Y | 2022 | [171] |
| monkeypox virus | 176 gene-coding peptide sequences | whole genome and all 176 genome-encoded proteins has been studied | T: IEDB v2.24 webserver B: Bepipred 2.0 | CTxB | EAAAK, AAY, GPGPG, KK | PADRE | NA | N | 2022 | [172] |
| monkeypox virus | 139 epitopes sequences | experimentally determined epitopes, obtained from ViPR database | – | β-defensin 3, 50S ribosomal protein L7/L12, Heparin-binding hemagglutinin | EAAAK, AAY, GPGPG, GGGS | PADRE, TAT | JCat | N | 2022 | [173] |
| Hafnia alvei | 12 proteins | Whole 11 genome sequences downloaded from NCBI, screened by several filters | T: IEDB T-cell antigenic determinants analysis package; B: Bepipred | Cholera toxin B-subunit | EAAAK, GPGPG | NA | JCat | N | 2022 | [174] |
| Lymphocytic choriomeningitis virus | 4 proteins | Whole proteome retrieved from the UniProt database, screened by several filters | CTL: NetCTL 1.2; CTLPred; IEDB SMM method HTL: NetMHCIIpan 4.0; IEDB SMM method; IFNepitope; IL4pred; IL10pred B: BCPred; BepiPred-2.0; ABCpred; Ellipro (for discontinuous) | 50 S ribosomal protein L7/L12 | EAAAK, AAY, GPGPG, KK | PADRE, 6xHis tag | JCat | N | 2022 | [175] |
| Monkeypox virus | 10 proteins with the highest antigenic probability score | entire proteome related to epidemic in 2022, downloaded from UniProt, screened by antigenic and allergic | CTL: NetCTL-1.2 HTL: IEDB MHC-II binding predictions; IFN-epitope B: ABCpred | human β-defensin-2 | EAAAK, AAY, GPGPG, KK | NA | JCat | N | 2023 | [176] |
| Crimean-Congo hemorrhagic fever virus (CCHFV) | 2 consensus sequences of glycoprotein precursors (GPC) and nucleoproteins (NP) | GPC and NP have been proven protection against CCHFV | CTL: NetCTL 1.2; IEDB MHC-I binding predictions; SMMPMBEC HTL: NN-align 2.3 (NetMHCII 2.3) B: Bepipred 2.0, Emini surface accessibility prediction, Kolaskar and Tongaonkar antigenicity and Karplus and Schulz flexibility prediction method | 50S ribosomal protein L7/L12 | GGGGSEAAAKGGGGS, GPGPG, KK, AAY | PADRE, MITD sequence | GenSmart | N | 2022 | [177] |
| Aeromonas hydrophila | An aerolysin protein that conserved and has no similarity with human and normal flora proteome | The role of Aerolysin has been proven by studies, related information obtained from UniProt and PDB database, screened by several filters | CTL: IEDB-recommended method HTL: IEDB-recommended method B: BepiPred-2.0 | Cholera Toxin B | EAAAK, GPGPG | NA | JCat | N | 2024 | [178] |
| Coxsackievirus B (CVB) | complete amino acid sequences | NCBI Protein database | CTL: NetCTL-1.2; NetMHCpan 4.1 HTL: NetMHCII-2.3; NetMHCIIpan 4.0; IFN- epitope B: ABCpred, BCPreds (for linear); ElliPro, DiscoTope (for discontinuous) | β-defensin | EAAAK, AAY, GPGPG, KK | PADRE, TAT | JCat | N | 2022 | [179] |
| Monkeypox Virus | MPXVgp181 virus protein | Protein dataset obtained from NCBI, screened by several filters | CTL: IEDB MHC-I binding predictions tool HTL: IEDB MHC-II binding predictions tool; NN-align 2.3 B: Bepipred 2.0 | 50S ribosomal protein L7/L12 | EAAAK, AAY, GPGPG, KK | 6xHis tag | JCat | N | 2022 | [180] |
| Rotavirus and Norovirus | All structural proteins of two viruses | epidemiological relevance | CTL: IEDB MHC-I binding predictions; NetCTL-1.2 HTL: IEDB MHC-II binding predictions; NETMHCII 2.3 B: ABCpred | conserved enterotoxic sequence of RV’s NSP4, tetanus toxin epitope P2 | EAAAK, AAY, GPGPG | NA | GenSmart | N | 2023 | [181] |
| Visceral leishmaniasis | 10 proteins (6 are parasitically derived, 4 are salivary proteins) | Literature survey, immune ability of salivary proteins has been proven, retrieval from NCBI, screened by several filters | CTL: NetCTL-1.2 HTL: RANKPEP B: ABCpred | TLR-4 agonist | EAAAK, AAY, GPGPG, KK | NA | NA | N | 2020 | [182] |
| SARS-CoV-2 | 9 SARS-CoV-2 proteins | proteome of SARS-CoV-2, screened by length and antigenic | CTL: NetCTL-1.2; EpiToolKit HTL: IEDB MHC-II binding predictions; IFN- epitope B: ABCpred, BepiPred (for linear); ElliPro (for discontinuous) | β-defensin | EAAAK, AAY, GPGPG, KK | PADRE, TAT | JCat | N | 2020 | [183] |
NA: data not given.
Vaccine construction
Based on the construction strategy described above, assembling a multi-epitope vaccine involves the use of linkers, adjuvants, and other functional peptides. The information resources for vaccine construction and optimization are listed in Table 16, which covers databases and analysis tools for adjuvants, linkers, and codon optimization.
Table 16.
A list of tools and databases for adjuvants, linkers, and codon optimization
| Name | usage | Input | Output | URL | Ref |
|---|---|---|---|---|---|
| linker | |||||
| LinkerDB | database of inter-domain linkers | Query types | A list of linker identifiers along with their sequences, the identifier could connect to the 3D structure of linkers | http://mathbio.nimr.mrc.ac.uk | [209] |
| MEROPS | manually curated resources for peptidases, their inhibitors, and substrates with known cleavage sites | Index of Name, MEROPS Identifier, or source organism | A list of peptidase names with detailed information links | https://www.ebi.ac.uk/merops/ | [210] |
| LINKER | an automatic program generates peptide sequences with extended conformations determined by experiments | desired linker sequence length and optional input parameters | A list of peptide sequences with specified criteria | http://astro.temple.edu/~feng/Servers/BioinformaticServers.htm | [211, 212] |
| SynLinker | System compiled 2260 linker sequences containing natural linkers and artificial/empirical linkers | Query criteria | A list of linker candidates with structure information satisfying criteria, fusion protein structure, and hydropathicity plot | http://bioinfo.bti.a-star.edu.sg/synlinker | [213] |
| adjuvant | |||||
| Vaxjo | database and analysis system contains the basic information and usage of vaccine adjuvants | Keyword and their field | A table of related adjuvants with a link of basic adjuvant information and associated vaccine information. | https://violinet.org/vaxjo/ | [214] |
| vaccineDA | model for designing oligodeoxynucleotide-based vaccine adjuvants | Query nucleotide sequence | A list of query sequences with predicted class (Immunomodulatory or not), score, and additional properties | https://webs.iiitd.edu.in/raghava/vaccineda/ | [215] |
| vaxinPAD | webserver for designing or predicting peptide-based vaccine adjuvants. | query peptide sequence | A list of query sequences with predicted class (Immunomodulatory or not), score, and additional properties | https://webs.iiitd.edu.in/raghava/vaxinpad/ | [216] |
| Codon optimization | |||||
| JCat | statistic-based method, a Java-based tool, for most prokaryotic and some eukaryotic organisms | query sequence, class of query sequence, organism to which codon usage should be adapted | The CAI values of query sequence and optimized sequence, graphical representation of the relative adaptiveness | http://www.prodoric.de/JCat | [217] |
| Optipyzer | statistic-based method, a multi-species server with the ability to process large sets of genes | A query sequence, species weights, sequence type | Optimized sequence | https://optipyzer.com | [218] |
| ICOR | An RNN-based method, for Escherichia coli | amino acid sequence | an optimized nucleotide codon sequence | https://github.com/Lattice-Automation/icor-codon-optimizati | [206] |
| Fu et al. | A DL-based method that converts the codon optimization into sequence annotation with codon boxes, for E. coli | NA | NA | https://github.com/Devil625/Codon_Optimization.git | [219] |
| MOABC | A heuristic-based method using a multi-objective adaptation of the Artificial Bee Colony algorithm | NA | NA | NA | [220] |
| LinearDesign | A heuristic-based method for optimized mRNA in both structure stability and codon usage by adapting the lattice parsing concept | Single protein sequence, the class of paste sequence, beam size(optimal) | Two CSV files, one contains predicted values, optimized sequence, and structure, and another contains codon usage frequency table | https://rna.baidu.com/app/vaccine/linear-design/forecast | [221] |
Use of linkers
Linkers are generally short amino acid sequences, often derived from natural proteins, that are inserted into a protein to separate multiple domains and act as spacers [184]. Linkers are indispensable components of rational vaccine design: they preserve the immunogenicity of each individual epitope and minimize junctional immunogenicity, and some linkers can themselves induce an immune response and thus increase the immunogenicity of the vaccine [185]. Natural linkers may serve as good leads for fusing proteins of interest, and empirical linkers have been widely used in various applications [186]. To speed up linker selection for rational vaccine design, several bioinformatics tools and databases have been developed; their inputs, outputs, and usage are listed in Table 16.
Based on sequence and structure, linkers used in vaccine design can be divided into two categories: flexible and rigid linkers [186]. ① Flexible linkers have high flexibility and mobility and are suitable for joining two functional domains, allowing a certain amount of movement or interaction between the protein domains [187]. The most widely used flexible linkers, such as the GPGPG, AAY, HEYGAEALERAG, and KK sequences, are usually applied to join CTL, HTL, and BCL epitopes. The GPGPG linker can induce an HTL immune response and facilitate epitope presentation [188]. The AAY and HEYGAEALERAG sequences provide proteasome cleavage sites and can thus prevent the loss of epitopes during antigen presentation [185, 189, 190]. The KK linker can reduce the junctional immunogenicity that may arise from linearly joined epitopes [185]. Nevertheless, high flexibility and lack of rigidity make flexible linkers less effective at separating functional domains and may cause a loss of biological activity and poor expression yields [187, 191]. ② When a fixed distance between the functional domains and sufficient separation of protein domains are desired, rigid linkers may be the best choice [192]. EAAAK is one of the most commonly used rigid linkers for joining the adjuvant and CTL epitopes [188]; it separates the two protein components to increase efficiency and minimize interference [189].
Researchers have demonstrated that linkers can increase the stability of fusion protein vaccines and that the degree of stability varies with the linker used [193]. Other linker properties also have significant impacts on function and flexibility [194]. Thus, the properties and composition of linkers, together with the requirements and characteristics of the designed vaccine, deserve careful consideration when selecting a suitable linker for rational vaccine development.
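The way these linkers are combined can be illustrated with a minimal Python sketch. The function name `assemble_construct`, the specific linker-to-epitope assignment (EAAAK after the adjuvant, AAY before CTL epitopes, GPGPG before HTL epitopes, KK before B-cell epitopes), and all peptide strings are illustrative assumptions rather than a prescribed design; published constructs differ in order and linker choice.

```python
# Assumed linker assignment; the review lists these linkers, but the exact
# pairing with each epitope class is a convention chosen for illustration.
LINKERS = {"adjuvant": "EAAAK", "CTL": "AAY", "HTL": "GPGPG", "BCL": "KK"}


def assemble_construct(adjuvant, ctl, htl, bcl, linkers=LINKERS):
    """Concatenate an adjuvant and three epitope groups into one sequence.

    The rigid linker follows the adjuvant; every later epitope is preceded
    by the flexible linker assigned to its own class.
    """
    construct = adjuvant
    first_epitope = True
    for name, epitopes in (("CTL", ctl), ("HTL", htl), ("BCL", bcl)):
        for epitope in epitopes:
            linker = linkers["adjuvant"] if first_epitope else linkers[name]
            construct += linker + epitope
            first_epitope = False
    return construct


# Placeholder sequences (hypothetical, not real epitopes or adjuvants).
example = assemble_construct(
    adjuvant="MGSSH",
    ctl=["ACDEFGHIK", "LMNPQRSTV"],
    htl=["WYACDEFGHIKLMNP"],
    bcl=["QRSTVWYACD"],
)
print(example)
```

Keeping the linker assignment in a dictionary makes it easy to generate several variants of the same epitope set and compare them in the downstream evaluation steps.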
Use of adjuvants
One of the limitations of epitope-based subunit vaccines is their weak ability to activate the immune system, as they consist only of the antigenic components of pathogens. Thus, adjuvants are generally co-administered with vaccines to enhance the magnitude and durability of the immune response, either through a delivery system or through chemical conjugation with the peptide [8, 195]. Besides eliciting a robust immune response, adjuvants can also increase the biological half-life of vaccines, induce the production of immunoregulatory cytokines, and induce local inflammation and cellular recruitment [196]. Detailed information about the mechanisms and platforms of adjuvants can be found elsewhere [197].
The two major types of adjuvants are delivery adjuvants and immune agonists. (i) Delivery adjuvants, including aluminum adjuvants, MF59, emulsions, liposomes, etc., can effectively help antigen presentation. Aluminum adjuvants, the earliest and most widely used, can induce an effective humoral immune response but an insufficient cellular immune response [198], and can also cause cell damage [199]. (ii) Immune agonists, such as CpG oligodeoxynucleotides (CpG ODN), enhance the immune response by engaging and triggering toll-like receptors (TLRs); activation of TLRs on the surface of APCs induces the secretion of various cytokines, which promote the Th response [200]. TLR agonists, especially those for TLR3, TLR4, and TLR9, are widely used adjuvants for computationally designed vaccines [201], and their efficacy has been proven in many preclinical and clinical studies [200]. However, they carry a potential risk of systemic toxicity and inflammation, so they may not always be ideal for enhancing the immunity of a vaccine [202, 203].
Several resources are available for the selection and development of vaccine adjuvants; they are described in Table 16 and in the ‘Supplementary Adjuvant and functional peptides’ part. A good adjuvant for a multi-epitope vaccine must be safe, well tolerated, stable, reproducible, robust, scalable, and easy to produce [204]. In current vaccine design, adjuvant selection does not rely on these resources: researchers usually consult the related literature, choose several types of adjuvants to build vaccine models, and keep the one that yields the best immunity. The majority of vaccine candidates use TLR agonists as adjuvants to synergistically activate the immune response [205]. The disadvantages of adjuvants must still be considered in rational vaccine development, including the possibility of adverse reactions, reduced efficacy in older populations, and a weak ability to induce CD8+ T cell-mediated cellular immunity [203]. Adjuvants should therefore be selected with care. The PADRE sequence is also appended to vaccine constructs to increase immunogenicity and can be regarded as an additional adjuvant; the application of the PADRE sequence and other functional peptides is introduced in the ‘Supplementary Adjuvant and functional peptides’ part.
Vaccine construct optimization
Determining whether the constructed protein can be expressed in a heterologous host is a significant step in vaccine development, because successful expression of a vaccine candidate enables its subsequent production and purification [16, 206]. Because codon degeneracy allows multiple codons to encode the same amino acid and codon usage bias differs among expression hosts, codon optimization with dedicated tools is an effective approach to improving protein expression in heterologous hosts [9, 207]. Besides protein expression, codon optimization can also alter the effectiveness of immunization, stability, protein conformation, and protein function [208]. The codon adaptation index (CAI) and GC content are widely used to assess the expression level of a vaccine sequence [173, 174, 180]; it is generally agreed that a candidate with a CAI value between 0.8 and 1.0 and a GC content between 30% and 70% is likely to be transcribed and translated efficiently [174]. Numerous codon optimization tools have been developed, and they can be divided into three categories: statistic-based, ML-based, and heuristic-based methods. The three categories are introduced in the ‘Supplementary Codon optimization’ part, and the related methods are listed in Table 16.
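The two metrics can be made concrete with a short Python sketch. It computes GC content and a CAI in the usual sense (the geometric mean of each codon's relative adaptiveness, i.e. its frequency divided by that of the most frequent synonymous codon in a reference gene set). The function names, the toy reference sequence, and the handling of edge cases (Met, Trp, and stop codons are skipped; codons absent from the reference are ignored) are assumptions made for illustration and may differ from what JCat or other tools in Table 16 implement.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a DNA sequence into complete codons (trailing bases dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def relative_adaptiveness(reference_cds):
    """Weight of each codon = its count / count of the best synonymous codon."""
    counts = Counter(c for cds in reference_cds for c in codons(cds))
    weights = {}
    for aa in set(AMINO):
        synonymous = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(counts[c] for c in synonymous)
        if best:
            for c in synonymous:
                weights[c] = counts[c] / best
    return weights


def cai(seq, weights):
    """Geometric mean of codon weights (simplified: zero weights are skipped)."""
    skipped = {"ATG", "TGG", "TAA", "TAG", "TGA"}
    logs = [math.log(weights[c]) for c in codons(seq)
            if c not in skipped and weights.get(c, 0) > 0]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0


def within_recommended_range(seq, weights):
    """Apply the CAI 0.8-1.0 and GC 30-70% ranges quoted in the text."""
    return 0.8 <= cai(seq, weights) <= 1.0 and 0.30 <= gc_content(seq) <= 0.70


# Toy reference set standing in for highly expressed host genes (hypothetical).
weights = relative_adaptiveness(["ATGGCTGCTGCCAAATAA"])
print(cai("ATGGCTGCTAAA", weights), gc_content("ATGGCTGCTAAA"))
```

In practice the weights would come from a codon usage table of the chosen expression host rather than from a single toy sequence.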
Computational verification of vaccine construct
Prediction of antigenicity
Apart from proteins that have been experimentally demonstrated to be immunogenic, the antigenicity of candidate proteins should be tested, and only proteins predicted to be antigenic should be selected for further study. Several RV programs are available for this purpose and can be categorized into two types, filtering and classifying [222]. Both types take protein sequences as input and report, based on the predicted antigenicity, whether each protein is a potential vaccine candidate. Vaxign [223], NERVE [224], Jenner-predict [225], and VacSol [226] are classical filtering tools, whereas Vacceed [227] and VaxiJen [228] are typical classifying programs.
However, none of the above models achieves a recall over 0.76. The advent and progress of ML/DL algorithms have brought great improvements to both types of immunogenicity prediction. The newest method, Vaxign-DL [222], employs a multi-layer perceptron, a DL algorithm that operates through the sequential layering of nonlinear processing units, and achieves an AUC value of 0.94. Vaxign-DL and VaxiJen v3.0 [229], the latter updated with several ML algorithms and a new dataset, both perform well in bacterial immunogenicity prediction, while VirusImmu [230] was proposed to improve the accuracy of predicting viral protective antigens by adopting a soft voting approach to construct an ensemble model based on eight commonly used ML methods. VirusImmu shows a powerful and stable capability for immunogenicity prediction, with the highest AUC value among commonly used models on its independent external test set.
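Soft voting, as opposed to majority (hard) voting, averages the class probabilities produced by the individual models and thresholds the average. The sketch below shows only this general idea; the function names and the per-model probabilities are hypothetical, and the default cut-off of 0.4 is taken from the VirusImmu entry in Table 19, not from the tool's source code.

```python
def soft_vote(probabilities, weights=None):
    """Weighted average of per-model probabilities for the positive class."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total


def is_predicted_antigen(probabilities, threshold=0.4):
    """Call a protein antigenic when the averaged probability exceeds the cut-off."""
    return soft_vote(probabilities) > threshold


# Hypothetical outputs of three base classifiers for one protein.
model_probabilities = [0.9, 0.2, 0.4]
print(soft_vote(model_probabilities), is_predicted_antigen(model_probabilities))
```

Because the average retains each model's confidence, one strongly confident model can outweigh two weakly negative ones, which a simple majority vote would not allow.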
Prediction of allergenicity and toxicity
The epitopes selected for the construction of multi-epitope vaccines must be non-allergenic and non-toxic to avoid potential dangers to humans.
In silico prediction of peptide allergenicity
The assessment of potential allergenicity is an indispensable step in the development of vaccines [231]. Classical vaccines introduce large proteins or whole organisms as antigens, which increases the unnecessary antigenic load, whereas epitope-based vaccines comprise only the antigenic regions of foreign substances, which decreases the chance of inducing an allergenic response [232]. Both the epitopes selected for the vaccine construct and the final vaccine candidates must be classified as non-allergens by allergenicity prediction tools [232]. WHO/FAO defines two criteria for assessing allergenic potential: an identical stretch of at least six contiguous amino acids, or over 35% identity over a window of 80 amino acids, with known allergens [233]. Information on known allergens can be obtained from databases such as AllergenOnline [234], COMPARE [235], and SDAP [54]. Computational methods for predicting potential allergenicity can be divided into three categories: alignment-based, alignment-free, and hybrid approaches. The classical tools are listed in Table 19, and all three categories are discussed in the ‘Supplementary Prediction of allergenicity’ part.
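The two WHO/FAO criteria can be sketched in a few lines of Python. This is a simplified illustration: the six-residue rule is an exact substring test, while the 80-residue window rule is approximated here by ungapped, position-by-position comparison, whereas the alignment-based tools in Table 19 use proper sequence alignment. All function names and sequences are hypothetical.

```python
def shares_kmer(query, allergen, k=6):
    """True if query and allergen share an identical stretch of k residues."""
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return any(query[i:i + k] in allergen_kmers
               for i in range(len(query) - k + 1))


def max_window_identity(query, allergen, window=80):
    """Best ungapped identity of any query window against the allergen.

    Identity is matches / window, so queries shorter than the window
    cannot reach 100% (a conservative simplification).
    """
    best = 0.0
    for i in range(max(1, len(query) - window + 1)):
        segment = query[i:i + window]
        for j in range(max(1, len(allergen) - len(segment) + 1)):
            target = allergen[j:j + len(segment)]
            matches = sum(a == b for a, b in zip(segment, target))
            best = max(best, matches / window)
    return best


def is_potential_allergen(query, known_allergens, k=6, window=80, cutoff=0.35):
    """Flag the query if either criterion is met for any known allergen."""
    return any(
        shares_kmer(query, allergen, k)
        or max_window_identity(query, allergen, window) > cutoff
        for allergen in known_allergens
    )


# Hypothetical sequences used only to exercise the functions.
toy_allergen = "ACDEFGHIKLMNPQRSTVWY" * 4
print(is_potential_allergen("GGGACDEFGGGG", [toy_allergen]))
```

A real screen would run the query against a curated allergen database such as those named above rather than against a single toy sequence.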
Table 19.
A list of tools for evaluation of vaccine
| Usage | Tool | Feature | Input | Output | Cut off | Platform | Year | Ref |
|---|---|---|---|---|---|---|---|---|
| antigenicity | Vacceed | Classifying class, pipeline based on eukaryotic pathogens resources, including build proteome and ML algorithm. | Proteome | a ranked list of protein candidates | >0.5 | source code | 2014 | [227] |
| | VaxiJen | Classifying class, alignment-independent method based on ACC transformation | Single/multiple protein sequence, target organism | prediction probability, a statement of protective antigen or non-antigen | >0.5 | webserver | 2007 | [228] |
| | VaxiJen v3.0 | Classifying class, a voting approach based on three supervised ML methods | target organism, single/multiple protein/peptide | Predicted probability for immunogenicity | NA | webserver | 2020 | [229] |
| | NERVE | Filtering class, an automated RV system, identify PVCs from bacterial proteomes | Proteome | A ranked table of PVCs with predicted features information and links to corresponding primary data | Non-surface antigens, >2 transmembrane helices, adhesin probability>0.46 or 0.38, and no or low similarity with human proteins | source code | 2006 | [224] |
| | Jenner-predict server | Filtering class, method predicting PVCs from bacterial proteomes based on identifying critical functional domains | Protein sequence or a proteome | A ranked list of PVCs and information of predicted parameters | Non-cytosolic protein, <3 transmembrane helices, Pfam ID is listed in master list | webserver | 2013 | [225] |
| | Vaxign | Filtering class, a system based on genome sequences, including two programs: Vaxign Query, Dynamic Vaxign Analysis | Selected genomes, or single /multiple protein sequence and parameters (optional) | A table of PVCs with information of predicted parameters and similar proteins | Outer membrane proteins, <2 transmembrane helix, >0.51 Adhesin probability, no homology with human and mouse proteins | webserver | 2010 | [223] |
| | VacSol | Filtering class, a highly scalable, multi-mode, and configurable software for identifying PVCs from bacterial proteomes | proteome | a summary report of query proteome with predicted feature information of each sequence, the proteins that meet all feature criteria in report would have an epitope analysis table | no homology with human proteins, <2 transmembrane helix, essential gene, virulent protein, no cytoplasmatic protein | source code | 2017 | [226] |
| | Vaxign-ML | Classifying class, a ML classification RV program, incorporates both biological and physicochemical properties | Single protein sequence, pathogen type | the percentile rank score and basic information | >58% | webserver source code | 2020 | [262] |
| | Vaxign2 | Classifying class, a comprehensive web server consisting of predictive and computational workflow components | Protein sequence, parameters | A table with basic analysis results, a table with immunogenicity and functional profile, contains basic and population coverage information of predicted epitope | Vaxign-ML score > 90, adhesin probability>0.51 | webserver source code | 2021 | [263] |
| | DeepImmuno | A CNN-based model using a beta-binomial distribution approach to determine the immunogenicity potential of query peptide with HLA-I molecules | Peptide and MHC molecule | Immunogenicity score and binding score of query peptide and MHC molecule, extra information of query sequence and MHC. | >0.5 | webserver source code | 2021 | [261] |
| | Vaxign-DL | a three-layer fully connected NN model | NA | NA | NA | NA | 2023 | [222] |
| | VirusImmu | Classifying class, a novel soft-voting ensemble approach based on the top three models | Protein sequences | the predicted antigenicity score | >0.4 | source code | 2023 | [230] |
| allergenicity | Allermatch | sequence similarity-based, three alignment methods: sliding window approach (a), wordmatch (b), and full alignment (c) | amino acid sequence without header | (a, b) A ranked table of similar allergens with alignment scores and detailed information; (c) a bar diagram showing the hit number between the query sequence and allergenic protein database, and a ranked list of similar allergens with alignment scores and detailed information | 35% (a) | webserver | 2003 | [264] |
| | AllerTool | sequence similarity-based, four integrated tools: XR-BLAST (a), XR-Graph (b), ALR-SCAN (c), and ALR-SVM (d) | amino acid sequence | information on allergens that have reported cross-reactivity with the individual matches (a), a possible allergen cross-reactivity relationship graph (b), a list of matches that satisfy either of the rules (c), a list of high-similarity allergen sequences and reported cross-reactivity information (d) | NA | webserver | 2007 | [265] |
| | Bjorklund et al | motif similarity-based, a dataset of allergen-representative peptides (ARPs), a supervised classifier DASARP based on the automated selection of ARPs | Peptide sequence | Predicted score value | 5.51, a higher value indicating a higher risk of allergenicity | NA | 2005 | [266] |
| | Lu et al. | motif similarity-based, a dataset of allergen-specific motifs based on physical and chemical properties (PCP-motifs) for 17 highly populated protein domains, a score model based on PCP-motifs | NA | NA | NA | NA | 2018 | [267] |
| | AllergenFP | An alignment-free descriptor-based fingerprint approach | One protein sequence | a statement of probable allergen or probable non-allergen, a link of protein from the pair with the highest Tanimoto similarity index | NA | webserver | 2014 | [268] |
| | AllerTOP | alignment-free predictor based on the main physicochemical properties of proteins | One protein sequence | a statement of probable allergen or probable non-allergen, a link of the nearest protein in UniProtKB database | NA | webserver | 2013 | [269] |
| | AllerTOP v.2 | alignment-free method using amino acid E-descriptors, ACC transformation, and ML algorithm | One protein sequence | a statement of probable allergen or probable non-allergen, a link of the nearest protein in UniProtKB database | NA | webserver | 2014 | [270] |
| | ProAll-D | alignment-free model using long short-term memory algorithm | protein sequence | a statement of allergen or non-allergen | NA | webserver source code | 2022 | [271] |
| | Kumar et al | alignment-free ensembled approach using three DL models | Protein sequence | the class label (allergen or not) of query protein | NA | python programs | 2023 | [272] |
| | AlgPred | Hybrid approach, contains (i) scanning of IgE epitopes; (ii) motif-based approach; (iii) SVM-based method using amino acid composition; (iv) SVM module based on dipeptide composition, (v) BLAST search on ARPs, and (vi) Hybrid Approach | One protein sequence | comprehensive information about the prediction that includes score, threshold, distance from threshold, precision and negative prediction value | −0.4(iii) −0.2(iv) | webserver | 2006 | [121] |
| | AlgPred 2.0 | Hybrid approach, integrates four major modules: (i) prediction using hybrid or RF model, (ii) IgE epitope mapping, (iii) motif scan, and (iv) BLAST search | one or more protein sequence | a statement of allergen or non-allergen, predicted score of each method (for (i)), the predicted similar proteins or motifs from the database (for (ii) (iii) (iv)) | (i) default threshold is 0.3, while user can change the value | webserver standalone source code python programs | 2020 | [127] |
| | AllerCatPro1.7 | Hybrid model based on similarity of both their amino acid sequences and 3D structures | one or more protein sequence | A table with the predicted result for allergenicity, the identity scores of similar allergens | NA | webserver | 2019 | [273] |
| | AllerCatPro 2.0 | Hybrid model based on similarity of both their amino acid sequences and predicted 3D structures, provide clinical relevance information | one or more protein/nucleotide | A table with the predicted result for allergenicity, the identity scores, functionality and clinical information of similar allergens | NA | webserver | 2022 | [274] |
| toxicity | ToxinPred 1.0 | Model developed for predicting and designing toxic peptides, including several major modules: (i) designing peptide, provides multiple methods to select, (ii) batch submission, (iii) Protein scanning, (iv) QMS calculator, and (v) motif scanning | one or more peptide/protein sequence | A list with a statement of toxic or non-toxic, result score, and predicted physicochemical properties (for (i)(ii)(iii)), the mutant positions of peptides (for (i)(ii)), a table of original sequence and QM score (for (iv)), a quantitative matrix of positions and QM scores (for (v)) | SVM threshold and E-value cut-off for motif-based method should be defined by users, the default status is 0.0 and 10 | webserver | 2013 | [225] |
| | ToxinPred2.0 | a protein toxicity predictor, including four major modules: (i) prediction using two models, (ii) motif scan, (iii) BLAST search and (iv) Download | one or more protein sequences | a statement of toxic or non-toxic, result scores of each method (for (i)), similar sequence information from the database (for (ii)(iii)) | (i) default threshold is 0.6, (iii) default E-value is 10e-6, both values could be changed | webserver stand-alone | 2022 | [236] |
| | ToxinPred3.0 | a peptide toxicity predictor, including five major modules: (i) prediction, using models of ET/DL based or hybrid approaches, (ii) protein scanning, (iii) motif scan, (iv) BLAST search, and (v) Download | one or more protein sequences | a statement of toxic or non-toxic, result scores of each method and precision value (for (i)(ii)(iii)), similar sequence information (for (iv)) | <0.5 means non-toxic of ET/DL based model, <0.38 means non-toxic of hybrid approaches, the E-value is 10e-3, while the values could also be defined by users | webserver standalone pip package | 2023 | [238] |
| | ToxDL | a multi-modal DL-based approach, which could deal with variable-length sequences in input | one or more peptide sequence | The predicted score and status, a proclaimed contribution score for each amino acid, and toxic domains detected by InterProScan | >0.5, non-toxic | webserver source code | 2021 | [239] |
| | CSM-Toxin | an in-silico protein toxicity classifier using natural languages model | one or more protein sequences | a table includes predictions for each protein sequence with general physicochemical details | NA | webserver source code | 2023 | [240] |
| | ATSE | a peptide toxicity predictor based on DL model exploiting structural and evolutionary information | peptide sequences | a toxicity probability of a given peptide | NA | webserver | 2021 | [241] |
| | ToxIBTL | a peptide and protein toxicity predictor based on DL framework by utilizing the information bottleneck principle and transfer learning | Protein/peptide sequence | a toxicity probability | <0.5, non-toxic | webserver source code | 2022 | [242] |
| Protein-peptide docking | pyDockWEB | rigid-body docking program using electrostatics and desolvation scoring | 3D structures of two interacting proteins | a gzip compressed tar archive containing structure files and process files of generated docking model | NA | webserver | 2007 | [249] |
| | ZDOCK | rigid-body docking program using a combination of shape complementarity, electrostatics and statistical potential terms for scoring | two structures to be docked | 3D structures of generated complex models and the center-of-mass positions of ligands, download link of each predicted model | NA | webserver | 2014 | [248] |
| | GalaxyPepDock | templated-based docking method based on interaction similarity and energy optimization | protein structure, peptide sequence | A table contains the best 10 generated models with structure, additional information, download links | NA | webserver | 2015 | [192] |
| | Cluspro | rigid-body docking program using four scoring schemes | Two protein/peptide 3D structures | The calculated scores and 3D structures of top 10/20/30 scoring docking models with download link | NA | webserver | 2020 | [250] |
| Immune Simulation | C-IMMSIM | An agent-based model to simulate the immune system response of mammalian at cellular level after the injection of antigen | one or more protein sequence | Plots relative to the cell count of immune related cells, the predicted outcome of the epitope/peptide | NA | webserver | 2010 | [257, 260] |
Key for feature: ACC: auto cross covariance; ML: machine learning; DL: deep learning; NN: neural network; ET: extra tree. NA: data not given.
In silico prediction of peptide toxicity
Screening for non-toxic peptides is another important step in filtering epitope candidates. Computational prediction methods for chemical toxicity and in silico tools specialized for toxins of certain animal origins have been studied extensively over the years [236], whereas predictive methods for peptide toxicity remain limited; nevertheless, they pave the way for identifying non-toxic peptides and reduce the number of experiments required [237]. The ToxinPred [225] server, which has been extensively adopted by the scientific community for predicting peptide toxicity, is an SVM-based model that uses dipeptide composition and amino acid composition (AAC) features to distinguish toxic from non-toxic peptides. The upgraded ToxinPred2.0 [236] was designed for the prediction of protein toxicity and thus compensates for the limit that ToxinPred 1.0 places on the length of predictable peptides. The newest version, 3.0 [238], was proposed in 2023 for predicting peptide toxicity with an upgraded algorithm based on a new ML model (extra tree-based) or DL model (ANN–LSTM with fixed sequence length); both the ensemble approaches and the standalone ML/DL models achieved remarkable AUC values.
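The two feature types mentioned above are simple to compute. The following Python sketch derives the 20-dimensional amino acid composition and the 400-dimensional dipeptide composition of a peptide; the function names and the example peptide are hypothetical, and the classifier that would consume these vectors (an SVM in the case of ToxinPred) is deliberately left out.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(peptide):
    """Fraction of each of the 20 residues in the peptide (length-20 vector)."""
    peptide = peptide.upper()
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def dipeptide_composition(peptide):
    """Fraction of each of the 400 overlapping residue pairs (length-400 vector)."""
    peptide = peptide.upper()
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    total = max(len(pairs), 1)  # avoid division by zero for 1-residue input
    return [pairs.count(dp) / total for dp in DIPEPTIDES]


def feature_vector(peptide):
    """Concatenate both compositions into one 420-dimensional feature vector."""
    return amino_acid_composition(peptide) + dipeptide_composition(peptide)


# Hypothetical peptide used only to show the vector shape.
features = feature_vector("ACDKKLMAC")
print(len(features))
```

Both compositions are length-independent, which is why they suit peptides of varying size as input to a fixed-dimension classifier.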
DL-based models are receiving increasing attention because of their strong performance and their ability to handle complex tasks. Many DL-based in silico methods have been proposed, including ToxDL [239], CSM-Toxin [240], ATSE [241], and ToxIBTL [242], with the MLP and CNN being the most used algorithms [243]. However, DL models sometimes do not outperform ML models as expected, possibly because insufficiently large datasets and an unclear mechanistic understanding of the predictions have limited model progress [243, 244]. The above methods (with the exception of ToxinPred) have rarely been applied in the construction of epitope-based vaccines because they were developed only recently. As more epitope-based vaccine research emerges, these methods may be applied more widely.
The potential of binding to immune receptors
Innate immune receptors are important sensors of the innate immune system; they recognize and bind pathogen-associated molecules and then act as molecular switches that trigger innate immune activation and subsequent adaptive immune responses [245, 246]. TLRs are the main immune receptors in mammals [191]; they are widely distributed and can detect a variety of ligands derived from both pathogenic and non-pathogenic microbial infections [245]. Molecular docking of TLRs with vaccine candidates is a pivotal step in validating vaccine effectiveness, since successful docking can be regarded as a sign that the vaccine construct has the potential to trigger a human immune response [10]. Several points should be noted when selecting a docking receptor among the ten known TLRs: the receptor should recognize and respond to the pathogen-associated molecular patterns and ligands of the specific pathogen, it should be highly expressed in cells that interact with the pathogen, and its extracellular domain (ectodomain) should be used for ligand binding [187, 247]. Numerous docking tools are available for computationally simulating the binding of vaccine constructs to TLRs, and classical tools such as GalaxyPepDock [192], ZDOCK [248], pyDockWEB [249], and Cluspro [250] perform effectively for protein-TLR binding. Meanwhile, a variety of ML/DL algorithms have been introduced into tool development to improve prediction accuracy [251, 252], and the remarkable performance of AlphaFold3 in predicting the joint structure of complexes has revealed the great potential of AI for molecular docking and structure modeling in the future [253].
Immune response simulation
Computational immune simulation is an important part of the in silico design of vaccines: verifying, accurately and at low computational cost, whether vaccine candidates can induce an immune response would effectively reduce the trial-and-error cost of experimental work. Techniques for modeling the immune system can be divided into two categories: equation-based modeling and agent-based modeling (ABM) [254]. However, the non-linearities of the immune system make it difficult to obtain the right model with equation-based methods [255], so simulation programs are commonly developed with ABM technology. ABMs describe the characteristics of a population by using simple rules to dictate the behavior and interaction patterns of agents at the individual level [256]. C-IMMSIM is the only available simulation program that has been widely used in the development of multi-epitope vaccines [16]. C-IMMSIM simultaneously simulates three compartments found in mammals: the bone marrow, the thymus, and a tertiary lymphatic organ. The tertiary organ is where the interactions among cells and molecules take place, and it is described geometrically [257]. The program takes the antigen protein sequence in FASTA format as input, administers vaccine injections at user-defined intervals, and outputs the immune response profiles of antibody titers and several immune cell populations in the human body. The simulation time and the target population can be changed by adjusting the default parameters [258]. The predicted results have been shown to be highly consistent with real-world animal experimental data, which supports the reliability of the C-IMMSIM server and shows the value of immunoinformatic techniques in vaccine development [259, 260].
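To make the agent-based idea concrete, the following is a minimal toy sketch in Python. It is not C-IMMSIM and does not reproduce its rules or parameters: the bit-string receptors, the affinity rule, the 95% survival probability, and every name (`simulate`, `antigen_dose`, `max_cells`) are illustrative assumptions. It only shows how simple per-agent rules (encounter, stimulation, division, death) can produce a population-level response curve.

```python
import random


def simulate(steps=30, n_cells=200, antigen_dose=500, bits=8,
             max_cells=5000, seed=0):
    """Toy agent-based model: B-cell agents respond to one injected antigen.

    Returns a list of (antigen_count, cell_count) tuples, one per time step.
    All rules and constants are illustrative assumptions.
    """
    rng = random.Random(seed)
    antigen = rng.getrandbits(bits)  # the epitope, encoded as a bit string
    cells = [rng.getrandbits(bits) for _ in range(n_cells)]  # naive receptors
    antigen_count = antigen_dose
    history = []
    for _ in range(steps):
        next_cells = []
        for receptor in cells:
            # Affinity: fraction of receptor bits identical to the antigen bits.
            matching_bits = bits - bin(receptor ^ antigen).count("1")
            affinity = matching_bits / bits
            # The chance of meeting antigen falls as antigen is cleared.
            encounter = rng.random() < antigen_count / antigen_dose
            if encounter and rng.random() < affinity ** 4:
                antigen_count -= 1                      # clear one antigen unit
                next_cells.extend([receptor, receptor])  # stimulated cell divides
            elif rng.random() < 0.95:
                next_cells.append(receptor)             # unstimulated cell survives
            # otherwise the cell dies and is dropped
        cells = next_cells[:max_cells]  # crude cap on the population size
        history.append((antigen_count, len(cells)))
    return history


if __name__ == "__main__":
    for step, (antigen_left, n) in enumerate(simulate(), start=1):
        print(step, antigen_left, n)
```

Because each agent follows local rules, non-linear effects such as clonal expansion emerge without writing an explicit population equation, which is the motivation for ABM given in the text.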
Validation of real-world cases
To explore the performance of antigenicity prediction tools in real-world applications, we collected positive vaccine sequences and constructed negative sequences for validation. The positive multi-epitope vaccines, including the SARS-CoV-2 vaccine [183], CVB vaccine [179], MPXV-1-3 [173], LCMV vaccine [175], ChRNV22 [181], MVC [176], and MPXV vaccines [172, 180], were collected from the published literature, and all of them showed potential immunogenicity through computational or experimental verification in the original studies. The procedure for collecting negative samples is described in the ‘Supplementary Validation of real-world cases’. Here, we selected two widely used and user-friendly tools, VaxiJen [228] and DeepImmuno [261], for real-world vaccine validation. As shown in Table 17, the score predicted by VaxiJen 2.0 is higher in the vaccine group (average of 0.6219) than in the negative control group (average of 0.5125), but the difference is not statistically significant (P = .14). Although the average scores of both groups exceed the threshold of 0.4, every sequence in the vaccine group scores above it, and the highest score (1.1081) is more than twice the threshold. Thus, although the difference between the negative and positive groups is not statistically significant, VaxiJen tends to assign higher scores to real vaccines. The result may not be representative because the vaccine sequence dataset is relatively small, which may prevent the tool from effectively distinguishing negative from positive data.
Table 17.
Statistical analysis result of VaxiJen 2.0 tool
| VaxiJen 2.0 (threshold = 0.4) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Average score | P value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Positive | 0.5308 | 0.5076 | 0.6319 | 0.5517 | 0.5940 | 0.6323 | 0.5923 | 0.5606 | 0.4391 | 1.1081 | 0.6219 | 0.1363 |
| Negative | 0.5687 | 0.4694 | 0.4855 | 0.3673 | 0.7175 | 0.5662 | 0.4920 | 0.6097 | 0.4210 | 0.4273 | 0.5125 | |
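The kind of comparison summarized in Table 17 can be sketched as follows. This is an illustrative sketch only: the source does not state which statistical test produced the reported P value, so a two-sided permutation test on the difference of means is used here as an assumed stand-in, and the function names are hypothetical. Because the inputs are the rounded values printed in the table, recomputed averages and P values need not match the reported ones exactly.

```python
import random

THRESHOLD = 0.4  # VaxiJen 2.0 threshold as given in Table 17

# Scores copied from Table 17.
positive = [0.5308, 0.5076, 0.6319, 0.5517, 0.5940,
            0.6323, 0.5923, 0.5606, 0.4391, 1.1081]
negative = [0.5687, 0.4694, 0.4855, 0.3673, 0.7175,
            0.5662, 0.4920, 0.6097, 0.4210, 0.4273]


def mean(values):
    return sum(values) / len(values)


def fraction_above(scores, threshold=THRESHOLD):
    """Fraction of sequences predicted as antigenic (score above threshold)."""
    return sum(score > threshold for score in scores) / len(scores)


def permutation_p_value(a, b, n_resamples=10000, seed=0):
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of the pooled scores
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    print("fraction above threshold (positive):", fraction_above(positive))
    print("fraction above threshold (negative):", fraction_above(negative))
    print("permutation P value:", permutation_p_value(positive, negative))
```

Reporting the fraction of sequences above the threshold alongside the group means makes it easier to see how well a fixed cut-off separates the two groups on such a small dataset.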
Meanwhile, the DeepImmuno results are shown in Table 18. For most alleles, the average score of the positive group is higher than that of the negative group, and this difference is statistically significant for four of the 10 tested HLA alleles (P < .05), indicating that the tool may be able to distinguish effective immune epitopes (positive results) from ineffective ones (negative results). HLA-C*0401 is also marked as significant in Table 18, but there the negative group has the higher average score. The performance of this tool in recognizing the immunogenicity of CD8+ epitopes reflects the effectiveness and importance of computational tools in vaccine verification, and it provides a useful reference for developing immunogenicity prediction tools for CD4+ epitopes and B cell epitopes. Given the difference in performance between the two tools, computational tools should be used for antigen screening with caution. Vaccine antigen candidates can be identified efficiently by in silico methods, but FP results may still occur, so screening results should be combined with more reliable experimental methods for comprehensive evaluation. In addition, both tools provide useful ideas for future tool development or improvement, and we look forward to more comprehensive and accurate immunogenicity prediction tools.
Table 18.
Statistical analysis result of DeepImmuno tool
| DeepImmuno (threshold = 0.5) | Group | HLA-A*0201 | HLA-A*1101 | HLA-A*2402 | HLA-B*0702 | HLA-B*0801 | HLA-B*3501 | HLA-B*4001 | HLA-C*0102 | HLA-C*0401 | HLA-C*0702 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Average score | Positive | 0.5468 | 0.5387 | 0.4449 | 0.5102 | 0.8133 | 0.7225 | 0.8008 | 0.9558 | 0.8402 | 0.8423 |
| Average score | Negative | 0.5091 | 0.4795 | 0.4191 | 0.4987 | 0.7774 | 0.6778 | 0.7666 | 0.8087 | 0.9379 | 0.8032 |
| P value | | 0.1972 | 0.1625 | 0.2097 | 0.6922 | 0.0136* | 0.0381* | 0.0565 | 3.669E-18* | 8.071E-12* | 0.0364* |
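The per-allele results in Table 18 can be tabulated by significance and direction with a few lines of Python. This is a reading aid only: the values are copied from the table, the significance level of 0.05 follows the text, and the function name `split_significant` is a hypothetical choice.

```python
# Values copied from Table 18 (DeepImmuno, threshold = 0.5).
alleles = ["HLA-A*0201", "HLA-A*1101", "HLA-A*2402", "HLA-B*0702",
           "HLA-B*0801", "HLA-B*3501", "HLA-B*4001", "HLA-C*0102",
           "HLA-C*0401", "HLA-C*0702"]
positive = [0.5468, 0.5387, 0.4449, 0.5102, 0.8133,
            0.7225, 0.8008, 0.9558, 0.8402, 0.8423]
negative = [0.5091, 0.4795, 0.4191, 0.4987, 0.7774,
            0.6778, 0.7666, 0.8087, 0.9379, 0.8032]
p_values = [0.1972, 0.1625, 0.2097, 0.6922, 0.0136,
            0.0381, 0.0565, 3.669e-18, 8.071e-12, 0.0364]


def split_significant(alpha=0.05):
    """Group significant alleles by which group has the higher average score."""
    higher_in_positive, higher_in_negative = [], []
    for allele, pos, neg, p in zip(alleles, positive, negative, p_values):
        if p < alpha:
            target = higher_in_positive if pos > neg else higher_in_negative
            target.append(allele)
    return higher_in_positive, higher_in_negative


if __name__ == "__main__":
    pos_higher, neg_higher = split_significant()
    print("significant, positive group higher:", pos_higher)
    print("significant, negative group higher:", neg_higher)
```

Separating the direction of the difference from its significance matters here, because a small P value alone does not say which group scored higher.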
The role of AI in the development of vaccines
AI technology has greatly advanced vaccine development, and both ML and DL techniques are widely used to build computational tools. AI is becoming increasingly relevant for epitope prediction: DL-based methods including DLBEpitope [128], BigMHC [88], MHCflurry [84], and other epitope predictors have greatly improved the efficiency of epitope screening, while tools such as Fine-tuned AlphaFold2 and MHCfold [94] can accurately model pMHC complexes.
Besides the above epitope predictors, AI has also promoted the construction of pipelines for the design and screening of immunogenic vaccine antigens. NeoDisc [275] is an end-to-end clinical proteogenomic pipeline that integrates various in silico tools for the identification, prediction, and prioritization of immunogenic tumor-specific HLA-I and -II antigens. DeepNovoAA [276] and pTuneos [277] are also DL-based pipelines developed for the identification and design of neoantigens. Neo-intline [278] is an integrated pipeline that simulates the presentation process of peptides in vivo. TransPHLA-AOMP [86] is a transformer-based model developed by the same team as TransPHLA, which is used for pHLA binding prediction. Based on the predictions of TransPHLA, the AOMP program can automatically optimize mutated peptides for peptide vaccine design. These tools demonstrate that AI technology can greatly advance antigen identification and design and show its potential for personalized antigen discovery and neoantigen cancer vaccine design.
Moreover, AI plays an important role in vaccine optimization. LinearDesign [221] borrows the classical concept of lattice parsing from computational linguistics to address two problems: the otherwise prohibitive computational challenge posed by codon usage, and the limited applicability of mRNA vaccines caused by mRNA instability and degradation. Applying the tool would shorten the development cycle and reduce the cost of vaccines. In the computational verification of vaccine candidates, AI-based tools such as DeepImmuno [261], ProAll-D [271], ToxinPred3.0 [238], CSM-Toxin [240], ATSE [241], and others cover antigenicity, allergenicity, toxicity, and other aspects of validation.
Besides design based on currently circulating pathogens or mutations, the development of EveScape [279] has also demonstrated the potential of AI in forecasting emerging mutant strains. EveScape is a flexible framework that can quantify the viral escape potential of mutations at scale and predict probable future mutations. Based on forecasts of emerging viruses with pandemic potential, we can assess the ability of existing vaccines to protect against future viruses and improve or design new vaccines or drugs to achieve early prevention and control. The tools above cover various aspects of vaccine design and demonstrate that AI technology has greatly improved it: AI-based tools have simplified the steps, accelerated the process, and improved the accuracy and efficiency of vaccine design. However, the requirement for large computing resources and long run times [280] and the need for large amounts of high-quality training data [281] have limited the development and application of DL-based tools to some extent. The lack of interpretability of the predictions is another major challenge [282]. Thus, to promote well-performing DL-based methods, high-quality data and the mechanisms underlying the tools need to be explored further.
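The scale of the codon-choice problem mentioned above can be illustrated with a short calculation. The sketch below is not LinearDesign and does not implement lattice parsing; it only counts how many distinct mRNA coding sequences encode a given peptide under the standard genetic code, which is the search space any codon optimizer has to handle. The function name and the example peptides are hypothetical.

```python
from math import prod  # available in Python 3.8+

# Number of synonymous codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_mrnas(peptide):
    """Number of distinct coding sequences that translate to `peptide`."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide.upper())


if __name__ == "__main__":
    print(count_synonymous_mrnas("MKV"))        # 1 * 2 * 4 = 8
    print(count_synonymous_mrnas("LSR" * 10))   # 6 ** 30 candidate sequences
```

Because the count is a product over residues, it grows exponentially with protein length, so exhaustive enumeration quickly becomes infeasible for full-length antigens.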
Discussion and perspective
Revolutionary technological advances over the past few decades have greatly changed vaccine development: effective and reliable bioinformatic tools have replaced tedious and time-consuming experiments, which greatly broadens the application and shortens the development cycle of vaccines. Following the advent of the genome era, the integration of computational methods with biological knowledge is an inevitable trend in developing medical interventions. The multi-epitope vaccine is an attractive new vaccine type whose rational design promises long-term cross-protection and a controlled, robust immune response, and could also reduce side effects to ensure vaccine safety in humans. The rational design of multi-epitope vaccines depends largely on existing databases and immunoinformatics tools and mainly comprises three aspects: 1) immunogen design, 2) vaccine construction and optimization, and 3) computational verification. In this review, we systematically reviewed the pipeline for computationally designing a multi-epitope vaccine and introduced the current development strategies together with the commonly used in silico resources, including databases and tools, for each step. Further, we designed three benchmark validations, on T-cell epitope predictors, B-cell epitope predictors, and immunogenicity predictors for real-world vaccine sequences. The results of these benchmark validations can provide hints to users for in silico vaccine design and optimization.
The development process of multi-epitope vaccines is highly procedural and data-driven [283]. High-quality databases, effective computational tools, and standard procedures will greatly promote vaccine development. Selecting suitable data is the foundation of further analysis and determines the accuracy and reliability of the results. Currently, the related databases are well established, and most of them contain high-quality experimentally validated or manually curated data, such as IEDB [20] and AntiJen v2.0 [24]. However, most are pan-epitope databases, with only a few dedicated to specific pathogens. For example, the numbers of HBV and influenza peptides in IEDB are relatively small and cannot be used to build pathogen-specific models. This is largely due to the rate at which experimental data accumulate and the update rate of each database. Update frequency also affects data quality: IMGT/mAb-DB is updated twice per year [22], InnateDB updates its annotation weekly [43], and STCRDab is automatically updated weekly [35]. Up-to-date data help researchers keep track of the latest developments. In addition, differences in data formats and annotation standards increase the complexity of data processing. As for usability, whether a database is simple to operate, provides query and auxiliary tools, integrates related databases, and is free to access all affect how efficiently it can be used. These gaps may be bridged by developing stricter, standardized criteria for data processing and by applying advanced technology. This is one of the important directions for future development.
Data can be considered the source material of vaccine design, and appropriate bioinformatics methods are the tools that determine the success rate. Although relevant methods have been introduced continuously over the last decades, with several successful cases of in silico vaccine design, the overall success rate is still not satisfactory. This is partly because model performance still needs to be improved but, more importantly, because computational tools designed to solve binary classification, prediction, and regression problems cannot capture the entire complex biological process, especially the response of the immune system to a vaccine. Computational design encompasses antigen screening, epitope prediction, optimization of the vaccine construct, computational verification, and so on, and therefore involves many tools; selecting suitable ones is itself a challenge. To standardize and simplify the overall process of vaccine design, developing AI-based pipelines that integrate well-performing tools is another direction for future study. T cell epitope predictors have mainly focused on the mechanism of antigen processing and presentation, but many peptides that can be processed and presented are still not immunogenic [284]. Thus, the mechanisms that make a peptide an epitope should be clarified, and other aspects, such as the binding stability of the peptide-HLA complex, the secretion of specific cytokines, and pMHC-TCR binding, should be combined to develop new epitope predictors.
Currently, epitope prediction tools are mainly sequence-based. However, a sequence is essentially a one-dimensional abstraction that often fails to capture higher-dimensional information [16]. Meanwhile, time-consuming experiments have limited the accumulation of high-quality 3D structures, so the development of structure-based methods lags behind that of sequence-based methods. The emergence of AlphaFold, which has shown remarkable modeling performance, may accelerate the prediction and application of conformational epitopes, since accurate computational models could greatly reduce the need for experimentally determined structures. Several tools based on AlphaFold have been developed [95, 98], but their number is still small, and more attention should be paid to how AlphaFold can be applied to epitope-based vaccine development.
Despite the great advantages of currently available tools, there is still room for improvement to guide future vaccine design. Current SOTA methods can be divided into webserver-based tools and GitHub code-based tools according to how users access them. New tools such as MixMHCpred [90], BigMHC [88], and ConvNeXt-MHC [91] are available only as GitHub code rather than as user-friendly webservers. Moreover, these tools require substantial computational resources, which makes them inconvenient for researchers who lack the necessary background knowledge and resources. In this regard, webserver-based tools such as ImmuneApp [92], NetMHCpan [75], HLAthena [80], and TransPHLA [86] are friendlier to researchers. However, webserver-based tools cap the size of each submission. For example, ImmuneApp, NetMHCpan-4.1, and HLAthena accept only a limited number of HLA alleles and peptides per submission, which makes large-scale screening difficult.
More importantly, besides tools that predict interactions between specific molecules, such as peptide–MHC, pMHC-TCR, and epitope-BCR, there is an urgent need to develop integrated epitope prediction processes that can bridge the gap between the high prediction performance of in silico models and their low rate of clinical success. In the field of tumor neoantigen vaccine design, pipelines rather than individual tools have begun to accumulate in recent years, helping to accelerate the discovery of neoantigens. Typical works, including Muller’s approach [285] and Neo-intline [278], incorporate multiple or even dozens of in vivo processing steps into a pipeline that ultimately allows direct screening of suitable neoantigens. This idea can also be applied to the development of in silico approaches for pathogen-related vaccine design in the future.
In addition, only a few information resources are available for selecting linkers and adjuvants during the construction of multi-epitope vaccines, and design tools for vaccine adjuvants are still needed. Because current selection usually relies on previous literature, a literature survey of the scope of application, advantages, and disadvantages of commonly used adjuvants could provide comprehensive information for future vaccine design and selection. Nanotechnology can offer a self-adjuvanting delivery system and reduce the toxicity of the experimental adjuvants currently under study, so delivering antigens in nanoparticle form is another area worth discussing [7]. Since advanced technology has revolutionized vaccine development, many studies on the computational design of vaccines appear every year, but only a small portion of the huge number of designed vaccines have practical potential and are worth validating experimentally. An in silico evaluation platform should therefore be constructed and used as the last step to assess the effectiveness of vaccine designs. In addition to the computational verification factors discussed above, others such as glycosylation sites and physicochemical properties have also been examined in some studies. Constructing a complete and rigorous standard evaluation process would therefore ensure the feasibility and efficiency of vaccine evaluation. This review has systematically summarized the strategy for the rational in silico design of multi-epitope vaccines and introduced widely used bioinformatics tools and databases on the basis of the relevant literature. We hope it can help researchers identify the basic steps of developing a multi-epitope vaccine, quickly select suitable tools for each step, and provide some help for future research.
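One practical workaround for per-submission limits is to split a large screen into batches that each fit within a server's limits. The sketch below shows only the batching logic; the limits passed in are placeholders rather than the actual limits of any server named above, and the function names `batched` and `plan_submissions` are hypothetical. Submitting the batches is left out, because each server has its own interface and terms of use.

```python
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def plan_submissions(peptides, alleles, max_peptides, max_alleles):
    """Split a peptide x allele screen into jobs within the given limits.

    Every (peptide, allele) pair is covered by exactly one job.
    """
    return [
        (peptide_chunk, allele_chunk)
        for allele_chunk in batched(alleles, max_alleles)
        for peptide_chunk in batched(peptides, max_peptides)
    ]


if __name__ == "__main__":
    # Placeholder inputs and limits, for illustration only.
    peptides = ["SIINFEKL", "GILGFVFTL", "NLVPMVATV", "KLGGALQAK", "ELAGIGILTV"]
    alleles = ["HLA-A*0201", "HLA-A*1101", "HLA-B*0702"]
    for job in plan_submissions(peptides, alleles, max_peptides=2, max_alleles=2):
        print(job)
```

The number of jobs grows with the product of the peptide and allele chunk counts, which is why locally runnable tools remain attractive for proteome-scale screens.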
Key Points
The multi-epitope vaccine is a promising strategy for prophylaxis and therapy against pathogen infection, and the pipeline for computationally designing a multi-epitope vaccine can be divided into four critical steps.
The current development strategies and the commonly used in silico resources for each step, including databases and tools, have been summarized, which can help researchers quickly grasp the basic points of developing multi-epitope vaccines and the available resources.
Descriptions of the tools used in each process, covering claimed performance, features, datasets, and input/output information, have been summarized and can be used as a reference for researchers selecting suitable tools for each step of designing a multi-epitope vaccine.
This study provides three benchmark validations, on T-cell epitope predictors, B-cell epitope predictors, and immunogenicity predictors for real-world vaccine sequences; the results can provide hints to users for in silico vaccine design and optimization.
Supplementary Material
Acknowledgements
This work is supported by the Medical Science Data Center of Fudan University.
Contributor Information
Yiwen Wei, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.
Tianyi Qiu, Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute; Shanghai Institute of Infectious Disease and Biosecurity, Shanghai Medical College, Fudan University, No. 180, Fenglin Road, Xuhui District, Shanghai 200032, China.
Yisi Ai, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.
Yuxi Zhang, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.
Junting Xie, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.
Dong Zhang, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.
Xiaochuan Luo, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.
Xiulan Sun, State Key Laboratory of Food Science and Technology, School of Food Science and Technology, National Engineering Research Center for Functional Foods, Synergetic Innovation Center of Food Safety and Nutrition, Jiangnan University, Lihu Avenue 1800, Wuxi, Jiangsu 214122, China.
Xin Wang, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China; Shanghai Collaborative Innovation Center of Energy Therapy for Tumors, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.
Jingxuan Qiu, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China; Shanghai Collaborative Innovation Center of Energy Therapy for Tumors, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.
Authors’ contributions
YWW: Conceptualization, Writing—original draft, Investigation, Formal analysis, Data curation. TYQ: Conceptualization, Writing—review & editing, Project administration, Supervision, Resources. YSA, YXZ, JTX, DZ, and XCL: Resources, Data curation, Validation. XLS and XW: Writing—review & editing, Supervision. JXQ: Writing—review & editing, Resources, Conceptualization.
Conflict of interest: None declared.
Funding
This work was supported by grants from the National Key Research and Development Program of China (2022YFF1103101) and the National Natural Science Foundation of China (32370697).
Data availability
For access to any research-related data, kindly reach out to the corresponding author.
References
- 1. Zimmermann P, Curtis N. Factors that influence the immune response to vaccination. Clin Microbiol Rev 2019;32:e00084–18. 10.1128/cmr.00084-18
- 2. van der Kooij RS, Steendam R, Zuidema J. et al. Microfluidic production of polymeric Core-Shell microspheres for the delayed pulsatile release of bovine serum albumin as a model antigen. Pharmaceutics 2021;13:1854. 10.3390/pharmaceutics13111854
- 3. Delany I, Rappuoli R, De Gregorio E. Vaccines for the 21st century. EMBO Mol Med 2014;6:708–20. 10.1002/emmm.201403876
- 4. Dolgin E. How personalized cancer vaccines could keep tumours from coming back. NewsFeature. Nature 2024;630:290–2. 10.1038/d41586-024-01717-x
- 5. Pollard AJ, Bijker EM. A guide to vaccinology: from basic principles to new developments. Nat Rev Immunol 2020;21:83–100. 10.1038/s41577-020-00479-7
- 6. Excler J-L, Saville M, Berkley S. et al. Vaccine development for emerging infectious diseases. Nat Med 2021;27:591–600. 10.1038/s41591-021-01301-0
- 7. Skwarczynski M, Toth I. Peptide-based synthetic vaccines. Chem Sci 2016;7:842–54. 10.1039/C5SC03892H
- 8. Kalita P, Tripathi T. Methodological advances in the design of peptide-based vaccines. Drug Discov Today 2022;27:1367–80. 10.1016/j.drudis.2022.03.004
- 9. Bahrami AA, Payandeh Z, Khalili S. et al. Immunoinformatics: In Silico approaches and computational design of a multi-epitope, immunogenic protein. Int Rev Immunol 2019;38:307–22. 10.1080/08830185.2019.1657426
- 10. Yurina V, Adianingsih OR. Predicting epitopes for vaccine development using bioinformatics tools. Therapeutic Advances in Vaccines and Immunotherapy 2022;10:25151355221100218. 10.1177/25151355221100218
- 11. Zhang L. Multi-epitope vaccines: a promising strategy against tumors and viral infections. Cell Mol Immunol 2018;15:182–4. 10.1038/cmi.2017.92
- 12. Parvizpour S, Pourseif MM, Razmara J. et al. Epitope-based vaccine design: a comprehensive overview of bioinformatics approaches. Drug Discov Today 2020;25:1034–42. 10.1016/j.drudis.2020.03.006
- 13. Cai X, Li JJ, Liu T. et al. Infectious disease mRNA vaccines and a review on epitope prediction for vaccine design. Brief Funct Genomics 2021;20:289–303. 10.1093/bfgp/elab027
- 14. Rappuoli R. Reverse vaccinology. Curr Opin Microbiol 2000;3:445–50. 10.1016/S1369-5274(00)00119-3
- 15. Rappuoli R, Bottomley MJ, D’Oro U. et al. Reverse vaccinology 2.0: human immunology instructs vaccine antigen design. Journal of Experimental Medicine 2016;213:469–81. 10.1084/jem.20151960
- 16. Goodswen SJ, Kennedy PJ, Ellis JT. A guide to current methodology and usage of reverse vaccinology towards in silico vaccine discovery. FEMS Microbiol Rev 2023;47:fuad004. 10.1093/femsre/fuad004
- 17. Woolums AR, Swiderski C. New approaches to vaccinology made possible by advances in next generation sequencing, bioinformatics and protein modeling. Current Issues in Molecular Biology 2021;42:605–34. 10.21775/cimb.042.605
- 18. Hegde NR, Gauthami S, Sampath Kumar HM. et al. The use of databases, data mining and immunoinformatics in vaccinology: Where are we? Expert Opin Drug Discovery 2017;13:117–30. 10.1080/17460441.2018.1413088
- 19. He Y, Racz R, Sayers S. et al. Updates on the web-based VIOLIN vaccine database and analysis system. Nucleic Acids Res 2014;42:D1124–32. 10.1093/nar/gkt1133
- 20. Vita R, Mahajan S, Overton JA. et al. The immune epitope database (IEDB): 2018 update. Nucleic Acids Res 2019;47:D339–43. 10.1093/nar/gky1006
- 21. Koşaloğlu-Yalçın Z, Blazeska N, Vita R. et al. The cancer epitope database and analysis resource (CEDAR). Nucleic Acids Res 2023;51:D845–52. 10.1093/nar/gkac902
- 22. Manso T, Folch G, Giudicelli V. et al. IMGT® databases, related tools and web resources through three main axes of research and development. Nucleic Acids Res 2022;50:D1262–72. 10.1093/nar/gkab1136
- 23. Ansari HR, Flower DR, Raghava GPS. AntigenDB: an immunoinformatics database of pathogen antigens. Nucleic Acids Res 2010;38:D847–53. 10.1093/nar/gkp830
- 24. Toseland CP, Clayton DJ, McSparron H. et al. AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data. Immunome Research 2005;1:4. 10.1186/1745-7580-1-4
- 25. Zhang W, Wang L, Liu K. et al. PIRD: Pan immune repertoire database. Bioinformatics 2020;36:897–903. 10.1093/bioinformatics/btz614
- 26. Kulkarni-Kale U, Raskar-Renuse S, Natekar-Kalantre G. et al. Antigen-Antibody Interaction Database (AgAbDb): a compendium of antigen-antibody interactions. In: De RK, Tomar N (eds.), Immunoinformatics. Methods Mol Biol Humana Press, New York, NY, 2014;1184:149–64. 10.1007/978-1-4939-1115-8_8
- 27. Saha S, Bhasin M, Raghava GPS. Bcipep: a database of B-cell epitopes. BMC Genomics 2005;6:79. 10.1186/1471-2164-6-79
- 28. Reche PA, Zhang H, Glutting J-P. et al. EPIMHC: a curated database of MHC-binding peptides for customized computational vaccinology. Bioinformatics 2005;21:2140–1. 10.1093/bioinformatics/bti269
- 29. Zhang G, Chitkushev L, Olsen LR. et al. TANTIGEN 2.0: a knowledge base of tumor T cell antigens and epitopes. BMC bioinformatics 2021;22:40. 10.1186/s12859-021-03962-7
- 30. Yang B, Sayers S, Xiang Z. et al. Protegen: a web-based protective antigen database and analysis system. Nucleic Acids Res 2010;39:D1073–8. 10.1093/nar/gkq944
- 31. Schlessinger A, Ofran Y, Yachdav G. et al. Epitome: database of structure-inferred antigenic epitopes. Nucleic Acids Res 2006;34:D777–80. 10.1093/nar/gkj053
- 32. Huang J, Honda W. CED: a conformational epitope database. BMC Immunol 2006;7:7. 10.1186/1471-2172-7-7
- 33. Sharma OP, Das AA, Krishna R. et al. Structural epitope database (SEDB): a web-based database for the epitope, and its intermolecular interaction along with the tertiary structure information. Journal of Proteomics & Bioinformatics 2012;5:1–6. 10.4172/jpb.1000217
- 34. Borrman T, Cimons J, Cosiano M. et al. ATLAS: a database linking binding affinities with structures for wild-type and mutant TCR-pMHC complexes. Proteins: Structure, Function, and Bioinformatics 2017;85:908–16. 10.1002/prot.25260
- 35. Leem J, de Oliveira SHP, Krawczyk K. et al. STCRDab: the structural T-cell receptor database. Nucleic Acids Res 2018;46:D406–12. 10.1093/nar/gkx971
- 36. Chen S-Y, Yue T, Lei Q. et al. TCRdb: a comprehensive database for T-cell receptor sequences with powerful search function. Nucleic Acids Res 2020;49:D468–74. 10.1093/nar/gkaa796
- 37. Rammensee H, Bachmann J, Emmerich NP. et al. SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 1999;50:213–9. 10.1007/s002510050595
- 38. Tickotsky N, Sagiv T, Prilusky J. et al. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics 2017;33:2924–9. 10.1093/bioinformatics/btx286
- 39. Shugay M, Bagaev DV, Zvyagin IV. et al. VDJdb: a curated database of T-cell receptor sequences with known antigen specificity. Nucleic Acids Res 2018;46:D419–27. 10.1093/nar/gkx760
- 40. Khan JM, Cheruku HR, Tong JC. et al. MPID-T2: a database for sequence–structure–function analyses of pMHC and TR/pMHC structures. Bioinformatics 2011;27:1192–3. 10.1093/bioinformatics/btr104
- 41. Bhasin M, Singh H, Raghava GPS. MHCBN: a comprehensive database of MHC binding and non-binding peptides. Bioinformatics 2003;19:665–6. 10.1093/bioinformatics/btg055 [DOI] [PubMed] [Google Scholar]
- 42. Kaur D, Patiyal S, Sharma N. et al. PRRDB 2.0: a comprehensive database of pattern-recognition receptors and their ligands. Database 2019;2019:baz076. 10.1093/database/baz076 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Breuer K, Foroushani AK, Laird MR. et al. InnateDB: systems biology of innate immunity and beyond—recent updates and continuing curation. Nucleic Acids Res 2013;41:D1228–33. 10.1093/nar/gks1147 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44. Gonzalez-Galarza FF, McCabe A, Santos EJM. et al. Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools. Nucleic Acids Res 2020;48:D783–8. 10.1093/nar/gkz1029 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45. Luo H, Lin Y, Liu T. et al. DEG 15, an update of the database of essential genes that includes built-in analysis tools. Nucleic Acids Res 2021;49:D677–86. 10.1093/nar/gkaa917 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46. Pickett BE, Sadat EL, Zhang Y. et al. ViPR: an open bioinformatics database and analysis resource for virology research. Nucleic Acids Res 2012;40:D593–8. 10.1093/nar/gkr859 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47. Chaudhuri R, Ansari FA, Raghunandanan MV. et al. FungalRV: adhesin prediction and immunoinformatics portal for human fungal pathogens. BMC Genomics 2011;12:192. 10.1186/1471-2164-12-192 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48. Zhang GL, Riemer AB, Keskin DB. et al. HPVdb: a data mining system for knowledge discovery in human papillomavirus with applications in T cell immunology and vaccinology. Database 2014;2014:bau031. 10.1093/database/bau031 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49. Xie R, Cao B, Wu Z. et al. dbEBV: a database of Epstein-Barr virus variants and their correlations with human health. Comput Struct Biotechnol J 2024;23:2076–82. 10.1016/j.csbj.2024.04.043 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50. Zhang GL, Chitkushev L, Keskin DB. et al. EBVdb: a data mining system for knowledge discovery in Epstein-Barr virus with applications in T cell immunology and vaccinology. In: 2015 International Workshop on Artificial Immune Systems (AIS) 2015;1–8. 10.1109/AISW.2015.7469232 [DOI] [Google Scholar]
- 51. Olsen LR, Zhang GL, Reinherz EL. et al. FLAVIdB: a data mining system for knowledge discovery in flaviviruses with direct applications in immunology and vaccinology. Immunome Research 2011;7:2. [PMC free article] [PubMed] [Google Scholar]
- 52. Simon C, Kudahl UJ, Sun J. et al. FluKB: a knowledge-based system for influenza vaccine target discovery and analysis of the immunological properties of influenza viruses. J Immunol Res 2015;2015:380975. 10.1155/2015/380975 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53. Hulo C, de Castro E, Masson P. et al. ViralZone: a knowledge resource to understand virus diversity. Nucleic Acids Res 2011;39:D576–82. 10.1093/nar/gkq901 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54. Ivanciuc O, Schein CH, Braun W. SDAP: database and computational tools for allergenic proteins. Nucleic Acids Res 2003;31:359–62. 10.1093/nar/gkg010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55. Sánchez-Trincado JL, Gomez-Perosanz M, Reche PA. Fundamentals and methods for T- and B-cell epitope prediction. J Immunol Res 2017;2017:2680160. 10.1155/2017/2680160 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56. Bhasin M, Raghava GPS. Pcleavage: an SVM based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic sequences. Nucleic Acids Res 2005;33:W202–7. 10.1093/nar/gki587 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57. Lam TH, Mamitsuka H, Ren EC. et al. TAP hunter: a SVM-based system for predicting TAP ligands using local description of amino acid sequence. Immunome Research 2010;6:S6. 10.1186/1745-7580-6-S1-S6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58. Keşmir C, Nussbaum AK, Schild H. et al. Prediction of proteasome cleavage motifs by neural networks. Protein Engineering, Design and Selection 2002;15:287–96. 10.1093/protein/15.4.287 [DOI] [PubMed] [Google Scholar]
- 59. Bhasin M, Raghava GPS. Analysis and prediction of affinity of TAP binding peptides using cascade SVM. Protein Sci 2004;13:596–607. 10.1110/ps.03373104 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60. Singh H, Raghava GPS. ProPred1: prediction of promiscuous MHC class-I binding sites. Bioinformatics (Oxford, England) 2003;19:1009–14. 10.1093/bioinformatics/btg108 [DOI] [PubMed] [Google Scholar]
- 61. Reche PA, Glutting J-P, Zhang H. et al. Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles. Immunogenetics 2004;56:405–19. 10.1007/s00251-004-0709-7 [DOI] [PubMed] [Google Scholar]
- 62. Nielsen M, Lundegaard C, Lund O. et al. The role of the proteasome in generating cytotoxic T-cell epitopes: insights obtained from improved predictions of proteasomal cleavage. Immunogenetics 2005;57:33–41. 10.1007/s00251-005-0781-7 [DOI] [PubMed] [Google Scholar]
- 63. Larsen MV, Lundegaard C, Lamberth K. et al. Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics 2007;8:424. 10.1186/1471-2105-8-424 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64. Diez-Rivero CM, Chenlo B, Zuluaga P. et al. Quantitative modeling of peptide binding to TAP using support vector machine. Proteins: Structure, Function, and Bioinformatics 2010;78:63–72. 10.1002/prot.22535 [DOI] [PubMed] [Google Scholar]
- 65. Parker KC, Bednarek MA, Coligan JE. Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. The Journal of Immunology 1994;152:163–75. 10.4049/jimmunol.152.1.163 [DOI] [PubMed] [Google Scholar]
- 66. D'Amaro J, Houbiers JGA, Drijfhout JW. et al. A computer program for predicting possible cytotoxic T lymphocyte epitopes based on HLA class I peptide-binding motifs. Hum Immunol 1995;43:13–8. 10.1016/0198-8859(94)00153-H [DOI] [PubMed] [Google Scholar]
- 67. Sturniolo T, Bono E, Ding J. et al. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat Biotechnol 1999;17:555–61. 10.1038/9858 [DOI] [PubMed] [Google Scholar]
- 68. Dönnes P, Elofsson A. Prediction of MHC class I binding peptides, using SVMHC. BMC Bioinformatics 2002;3:25. 10.1186/1471-2105-3-25 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69. Dönnes P, Kohlbacher O. SVMHC: a server for prediction of MHC-binding peptides. Nucleic Acids Res 2006;34:W194–7. 10.1093/nar/gkl284 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70. Udaka K, Mamitsuka H, Nakaseko Y. et al. Prediction of MHC class I binding peptides by a query learning algorithm based on hidden Markov models. Journal of Biological Physics 2002;28:183–94. 10.1023/A:1019931731519 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71. Buus S, Lauemøller SL, Worning P. et al. Sensitive quantitative predictions of peptide-MHC binding by a ‘query by committee’ artificial neural network approach. Tissue Antigens 2003;62:378–84. 10.1034/j.1399-0039.2003.00112.x [DOI] [PubMed] [Google Scholar]
- 72. Reche PA, Reinherz EL. PEPVAC: a web server for multi-epitope vaccine development based on the prediction of supertypic MHC ligands. Nucleic Acids Res 2005;33:W138–42. 10.1093/nar/gki357 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73. Zhang GL, Khan AM, Srinivasan KN. et al. MULTIPRED: a computational system for prediction of promiscuous HLA binding peptides. Nucleic Acids Res 2005;33:W172–9. 10.1093/nar/gki452 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74. Nielsen M, Lundegaard C, Lund O. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics 2007;8:238. 10.1186/1471-2105-8-238 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75. Nielsen M, Lundegaard C, Blicher T. et al. NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PloS One 2007;2:e796. 10.1371/journal.pone.0000796 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76. Nielsen M, Lundegaard C, Blicher T. et al. Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan. PLoS Comput Biol 2008;4:e1000107. 10.1371/journal.pcbi.1000107 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77. Zhang GL, DeLuca DS, Keskin DB. et al. MULTIPRED2: a computational system for large-scale identification of peptides predicted to bind to HLA supertypes and alleles. J Immunol Methods 2011;374:53–61. 10.1016/j.jim.2010.11.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 78. Andreatta M, Nielsen M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics (Oxford, England) 2016;32:511–7. 10.1093/bioinformatics/btv639 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79. Bassani-Sternberg M, Chong C, Guillaume P. et al. Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity. PLoS Comput Biol 2017;13:e1005725. 10.1371/journal.pcbi.1005725 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80. Sarkizova S, Klaeger S, Le PM. et al. A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nat Biotechnol 2019;38:199–209. 10.1038/s41587-019-0322-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81. Racle J, Michaux J, Rockinger GA. et al. Robust prediction of HLA class II epitopes by deep motif deconvolution of immunopeptidomes. Nat Biotechnol 2019;37:1283–6. 10.1038/s41587-019-0289-6 [DOI] [PubMed] [Google Scholar]
- 82. Reynisson B, Alvarez B, Paul S. et al. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res 2020;48:W449–54. 10.1093/nar/gkaa379 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83. Reynisson B, Barra C, Kaabinejadian S. et al. Improved prediction of MHC II antigen presentation through integration and motif deconvolution of mass spectrometry MHC eluted ligand data. J Proteome Res 2020;19:2304–15. 10.1021/acs.jproteome.9b00874 [DOI] [PubMed] [Google Scholar]
- 84. O’Donnell TJ, Rubinsteyn A, Laserson U. MHCflurry 2.0: improved Pan-Allele prediction of MHC class I-presented peptides by incorporating antigen processing. Cell Systems 2020;11:42–48.e7. 10.1016/j.cels.2020.06.010 [DOI] [PubMed] [Google Scholar]
- 85. Bravi B, Tubiana J, Cocco S. et al. RBM-MHC: a semi-supervised machine-learning method for sample-specific prediction of antigen presentation by HLA-I alleles. Cell Systems 2021;12:195–202.e9. 10.1016/j.cels.2020.11.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86. Chu Y, Zhang Y, Wang Q. et al. A transformer-based model to predict peptide–HLA class I binding and optimize mutated peptides for vaccine design. Nature Machine Intelligence 2022;4:300–11. 10.1038/s42256-022-00459-7 [DOI] [Google Scholar]
- 87. Racle J, Guillaume P, Schmidt J. et al. Machine learning predictions of MHC-II specificities reveal alternative binding mode of class II epitopes. Immunity 2023;56:1359–1375.e13. 10.1016/j.immuni.2023.03.009 [DOI] [PubMed] [Google Scholar]
- 88. Albert BA, Yang Y, Shao XM. et al. Deep neural networks predict class I major histocompatibility complex epitope presentation and transfer learn neoepitope immunogenicity. Nature Machine Intelligence 2023;5:861–72. 10.1038/s42256-023-00694-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89. Wang M, Lei C, Wang J. et al. TripHLApan: predicting HLA molecules binding peptides based on triple coding matrix and transfer learning. Brief Bioinform 2024;25:bbae154. 10.1093/bib/bbae154 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90. Tadros DM, Racle J, Gfeller D. Predicting MHC-I ligands across alleles and species: how far can we go? bioRxiv 2024;593183. 10.1101/2024.05.08.593183 [DOI] [Google Scholar]
- 91. Zhang L, Song W, Zhu T. et al. ConvNeXt-MHC: improving MHC–peptide affinity prediction by structure-derived degenerate coding and the ConvNeXt model. Brief Bioinform 2024;25:bbae133. 10.1093/bib/bbae133 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92. Xu H, Hu R, Dong X. et al. ImmuneApp for HLA-I epitope prediction and immunopeptidome analysis. Nat Commun 2024;15:8926. 10.1038/s41467-024-53296-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93. Atanasova M, Patronov A, Dimitrov I. et al. EpiDOCK: a molecular docking-based tool for MHC class II binding prediction. Protein Engineering, Design and Selection 2013;26:631–4. 10.1093/protein/gzt018 [DOI] [PubMed] [Google Scholar]
- 94. Aronson A, Hochner T, Cohen T. et al. Structure modeling and specificity of peptide-MHC class I interactions using geometric deep learning. bioRxiv 2022;520566. 10.1101/2022.12.15.520566 [DOI] [Google Scholar]
- 95. Motmaen A, Dauparas J, Baek M. et al. Peptide-binding specificity prediction using fine-tuned protein structure prediction networks. Proc Natl Acad Sci U S A 2023;120:e2216697120. 10.1073/pnas.2216697120 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96. Jurtz VI, Jessen LE, Bentzen AK. et al. NetTCR: sequence-based prediction of TCR binding to peptide-MHC complexes using convolutional neural networks. bioRxiv 2018;433706. 10.1101/433706 [DOI] [Google Scholar]
- 97. Moris P, De Pauw J, Postovskaya A. et al. Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification. Brief Bioinform 2021;22:bbaa318. 10.1093/bib/bbaa318 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 98. Bradley P. Structure-based prediction of T cell receptor: peptide-MHC interactions. Elife 2023;12:e82813. 10.7554/eLife.82813 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99. Lin X, George JT, Schafer NP. et al. Rapid assessment of T-cell receptor specificity of the immune repertoire. Nature Computational Science 2021;1:362–73. 10.1038/s43588-021-00076-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 100. Jensen MF, Nielsen M. Enhancing TCR specificity predictions by combined pan- and peptide-specific training, loss-scaling, and sequence similarity integration. Elife 2024;12:RP93934. 10.7554/eLife.93934 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101. Ji H, Wang X-X, Zhang Q. et al. Predicting TCR sequences for unseen antigen epitopes using structural and sequence features. Brief Bioinform 2024;25:bbae210. 10.1093/bib/bbae210 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102. Bhasin M, Raghava GPS. Prediction of CTL epitopes using QM, SVM and ANN techniques. Vaccine 2004;22:3195–204. 10.1016/j.vaccine.2004.02.005 [DOI] [PubMed] [Google Scholar]
- 103. Dhanda SK, Gupta S, Vir P. et al. Prediction of IL4 inducing peptides. J Immunol Res 2013;2013:263952. 10.1155/2013/263952 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 104. Dhanda SK, Vir P, Raghava GPS. Designing of interferon-gamma inducing MHC class-II binders. Biol Direct 2013;8:30. 10.1186/1745-6150-8-30 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105. Dhanda SK, Karosiene E, Edwards L. et al. Predicting HLA CD4 immunogenicity in human populations. Front Immunol 2018;9:1369. 10.3389/fimmu.2018.01369 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 106. Kotsias F, Cebrian I, Alloatti A. Antigen processing and presentation. In: Lhuillier C., Galluzzi L. (eds). International Review of Cell and Molecular Biology. Academic Press, 2019;348:69–121. 10.1016/bs.ircmb.2019.07.005 [DOI] [PubMed] [Google Scholar]
- 107. Backert L, Kohlbacher O. Immunoinformatics and epitope prediction in the age of genomic medicine. Genome Med 2015;7:119. 10.1186/s13073-015-0245-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 108. Peters B, Nielsen M, Sette A. T cell epitope predictions. Annu Rev Immunol 2020;38:123–45. 10.1146/annurev-immunol-082119-124838 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 109. Fisch A, Reynisson B, Benedictus L. et al. Integral use of Immunopeptidomics and Immunoinformatics for the characterization of antigen presentation and rational identification of BoLA-DR–presented peptides and epitopes. The Journal of Immunology 2021;206:2489–97. 10.4049/jimmunol.2001409 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 110. Nielsen M, Andreatta M, Peters B. et al. Immunoinformatics: predicting peptide-MHC binding. Annual Review of Biomedical Data Science 2020;3:191–215. 10.1146/annurev-biodatasci-021920-100259 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 111. Sette A, Sidney J. Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism. Immunogenetics 1999;50:201–12. 10.1007/s002510050594 [DOI] [PubMed] [Google Scholar]
- 112. Perez MAS, Cuendet MA, Röhrig UF. et al. Structural prediction of peptide–MHC binding modes. In: Simonson T (ed.), Computational Peptide Science: Methods and Protocols, pp. 245–82. Springer US: New York, NY, 2022. [DOI] [PubMed] [Google Scholar]
- 113. Antunes AD, Abella RJ, Devaurs D. et al. Structure-based methods for binding mode and binding affinity prediction for peptide-MHC complexes. Curr Top Med Chem 2018;18:2239–55. 10.2174/1568026619666181224101744 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 114. Parizi FM, Marzella DF, Ramakrishnan G. et al. PANDORA v2.0: benchmarking peptide-MHC II models and software improvements. Front Immunol 2023;14:1285899. 10.3389/fimmu.2023.1285899 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 115. Pellequer JL, Westhof E. PREDITOP: a program for antigenicity prediction. J Mol Graph 1993;11:204–10. 10.1016/0263-7855(93)80074-2 [DOI] [PubMed] [Google Scholar]
- 116. Alix AJP. Predictive estimation of protein linear epitopes by using the program PEOPLE. Vaccine 1999;18:311–4. 10.1016/S0264-410X(99)00329-1 [DOI] [PubMed] [Google Scholar]
- 117. Kumar N, Bajiya N, Patiyal S. et al. Multi-perspectives and challenges in identifying B-cell epitopes. Protein Sci 2023;32:e4785. 10.1002/pro.4785 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 118. Saha S, Raghava GPS. BcePred: Prediction of continuous B-cell epitopes in antigenic sequences using Physico-chemical properties. In: Nicosia G, Cutello V, Bentley PJ. et al. (eds.), Artificial Immune Systems, pp. 197–204. Springer Berlin Heidelberg: Berlin, Heidelberg, 2004. [Google Scholar]
- 119. Saha S, Raghava GPS. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins: Structure, Function, and Bioinformatics 2006;65:40–8. 10.1002/prot.21078 [DOI] [PubMed] [Google Scholar]
- 120. Larsen JEP, Lund O, Nielsen M. Improved method for predicting linear B-cell epitopes. Immunome Research 2006;2:2. 10.1186/1745-7580-2-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 121. Saha S, Raghava GPS. AlgPred: prediction of allergenic proteins and mapping of IgE epitopes. Nucleic Acids Res 2006;34:W202–9. 10.1093/nar/gkl343 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 122. El-Manzalawy Y, Dobbs D, Honavar V. Predicting linear B-cell epitopes using string kernels. J Mol Recognit 2008;21:243–55. 10.1002/jmr.893 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 123. Gupta S, Ansari HR, Gautam A. et al. Identification of B-cell epitopes in an antigen for inducing specific class of antibodies. Biol Direct 2013;8:27. 10.1186/1745-6150-8-27 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 124. Saravanan V, Gautham N. BCIgEPRED—a dual-layer approach for predicting linear IgE epitopes. Mol Biol 2018;52:285–93. 10.1134/S0026893318020127 [DOI] [PubMed] [Google Scholar]
- 125. Manavalan B, Govindaraj RG, Shin TH. et al. iBCE-EL: a new ensemble learning framework for improved linear B-cell epitope prediction. Front Immunol 2018;9:1695. 10.3389/fimmu.2018.01695 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 126. Dall’Antonia F, Keller W. SPADE web service for prediction of allergen IgE epitopes. Nucleic Acids Res 2019;47:W496–501. 10.1093/nar/gkz331 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 127. Sharma N, Patiyal S, Dhall A. et al. AlgPred 2.0: an improved method for predicting allergenic proteins and mapping of IgE epitopes. Brief Bioinform 2021;22:bbaa294. 10.1093/bib/bbaa294 [DOI] [PubMed] [Google Scholar]
- 128. Liu T, Shi K, Li W. Deep learning methods improve linear B-cell epitope prediction. BioData Mining 2020;13:1. 10.1186/s13040-020-00211-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 129. Kadam K, Peerzada N, Karbhal R. et al. Antibody class(es) predictor for epitopes (AbCPE): a multi-label classification algorithm. Frontiers in Bioinformatics 2021;1:709951. 10.3389/fbinf.2021.709951 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 130. Clifford JN, Høie MH, Deleuran S. et al. BepiPred-3.0: improved B-cell epitope prediction using protein language models. Protein Sci 2022;31:e4497. 10.1002/pro.4497 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 131. Qi Y, Zheng P, Huang G. DeepLBCEPred: a Bi-LSTM and multi-scale CNN-based deep learning method for predicting linear B-cell epitopes. Front Microbiol 2023;14:1117027. 10.3389/fmicb.2023.1117027 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 132. da Silva BM, Ascher DB, Pires DEV. epitope1D: accurate taxonomy-aware B-cell linear epitope prediction. Brief Bioinform 2023;24:bbad114. 10.1093/bib/bbad114 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 133. Kulkarni-Kale U, Bhosle S, Kolaskar AS. CEP: a conformational epitope prediction server. Nucleic Acids Res 2005;33:W168–71. 10.1093/nar/gki460 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 134. Haste Andersen P, Nielsen M, Lund O. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci 2006;15:2558–67. 10.1110/ps.062405906 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 135. Ponomarenko J, Bui H-H, Li W. et al. ElliPro: a new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics 2008;9:514. 10.1186/1471-2105-9-514 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 136. Sun J, Wu D, Xu T. et al. SEPPA: a computational server for spatial epitope prediction of protein antigens. Nucleic Acids Res 2009;37:W612–6. 10.1093/nar/gkp417 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 137. Ansari HR, Raghava GPS. Identification of conformational B-cell epitopes in an antigen from its primary sequence. Immunome Research 2010;6:6. 10.1186/1745-7580-6-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 138. Gao J, Faraggi E, Zhou Y. et al. BEST: improved prediction of B-cell epitopes from antigen sequences. PloS One 2012;7:e40104. 10.1371/journal.pone.0040104 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 139. Zhang J, Zhao X, Sun P. et al. Conformational B-cell epitopes prediction from sequences using cost-sensitive ensemble classifiers and spatial clustering. Biomed Res Int 2014;2014:689219. 10.1155/2014/689219 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 140. Dalkas GA, Rooman M. SEPIa, a knowledge-driven algorithm for predicting conformational B-cell epitopes from the amino acid sequence. BMC Bioinformatics 2017;18:95. 10.1186/s12859-017-1528-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 141. Zhou C, Chen Z, Zhang L. et al. SEPPA 3.0—enhanced spatial epitope prediction enabling glycoprotein antigens. Nucleic Acids Res 2019;47:W388–94. 10.1093/nar/gkz413 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 142. Høie MH, Gade FS, Johansen JM. et al. DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. Front Immunol 2024;15:1322712. 10.3389/fimmu.2024.1322712 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 143. Sela-Culang I, Ashkenazi S, Peters B. et al. PEASE: predicting B-cell epitopes utilizing antibody sequence. Bioinformatics 2015;31:1313–5. 10.1093/bioinformatics/btu790 [DOI] [PubMed] [Google Scholar]
- 144. Krawczyk K, Liu X, Baker T. et al. Improving B-cell epitope prediction and its application to global antibody-antigen docking. Bioinformatics 2014;30:2288–94. 10.1093/bioinformatics/btu190 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 145. Qiu T, Zhang L, Chen Z. et al. SEPPA-mAb: spatial epitope prediction of protein antigens for mAbs. Nucleic Acids Res 2023;51:W528–34. 10.1093/nar/gkad427 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 146. Huang J, Gutteridge A, Honda W. et al. MIMOX: a web tool for phage display based epitope mapping. BMC Bioinformatics 2006;7:451. 10.1186/1471-2105-7-451 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 147. Mayrose I, Penn O, Erez E. et al. Pepitope: epitope mapping from affinity-selected peptides. Bioinformatics 2007;23:3244–6. 10.1093/bioinformatics/btm493 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 148. Huang YX, Bao YL, Guo SY. et al. Pep-3D-search: a method for B-cell epitope prediction based on mimotope analysis. BMC Bioinformatics 2008;9:538. 10.1186/1471-2105-9-538 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 149. Sun P, Ju H, Zhang B. et al. Conformational B-cell epitope prediction method based on antigen preprocessing and mimotopes analysis. Biomed Res Int 2015;2015:1–8. 10.1155/2015/257030 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 150. Manzoor H, Wani A. Evolution of machine learning methods in linear B-cell epitope prediction. In: 2023 10th International Conference on Computing for Sustainable Global Development (INDIACom), IEEE, New Delhi, India, pp. 979–85, 2023.
- 151. El-Manzalawy Y, Dobbs D, Honavar VG. In Silico prediction of linear B-cell epitopes on proteins. In: Zhou Y, Kloczkowski A, Faraggi E. et al. (eds.), Prediction of Protein Secondary Structure, 255–64. Springer New York: New York, NY, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 152. Zhou J, Chen J, Peng Y. et al. A promising tool in serological diagnosis: current research progress of antigenic epitopes in infectious diseases. Pathogens 2022;11:1095. 10.3390/pathogens11101095 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 153. Wu C-H, Liu IJ, Lu R-M. et al. Advancement and applications of peptide phage display technology in biomedical science. J Biomed Sci 2016;23:8. 10.1186/s12929-016-0223-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 154. Zhang C, Li Y, Tang W. et al. The relationship between B-cell epitope and Mimotope sequences. Protein & Peptide Letters 2016;23:132–41. 10.2174/0929866523666151230124538 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 155. Aghebati-Maleki L, Bakhshinejad B, Baradaran B. et al. Phage display as a promising approach for vaccine development. J Biomed Sci 2016;23:66. 10.1186/s12929-016-0285-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 156. Zhang WUY, Wan Y, Li DG. et al. A mimotope of pre-S2 region of surface antigen of viral hepatitis B screened by phage display. Cell Res 2001;11:203–8. 10.1038/sj.cr.7290087 [DOI] [PubMed] [Google Scholar]
- 157. Cia G, Pucci F, Rooman M. Critical review of conformational B-cell epitope prediction methods. Brief Bioinform 2023;24:bbac567. 10.1093/bib/bbac567 [DOI] [PubMed] [Google Scholar]
- 158. Jespersen MC, Peters B, Nielsen M. et al. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res 2017;45:W24–9. 10.1093/nar/gkx346 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 159. Kringelum JV, Lundegaard C, Lund O. et al. Reliable B cell epitope predictions: impacts of method development and improved benchmarking. PLoS Comput Biol 2012;8:e1002829. 10.1371/journal.pcbi.1002829 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 160. Liang S, Zheng D, Standley DM. et al. EPSVR and EPMeta: prediction of antigenic epitopes using support vector regression and multiple server results. BMC Bioinformatics 2010;11:381. 10.1186/1471-2105-11-381 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 161. Sweredoski MJ, Baldi P. PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics 2008;24:1459–60. 10.1093/bioinformatics/btn199 [DOI] [PubMed] [Google Scholar]
- 162. da Silva BM, Myung Y, Ascher DB. et al. epitope3D: a machine learning method for conformational B-cell epitope prediction. Brief Bioinform 2022;23:bbab423. 10.1093/bib/bbab423 [DOI] [PubMed] [Google Scholar]
- 163. Ott PA, Hu Z, Keskin DB. et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 2017;547:217–21. 10.1038/nature22991 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 164. Keskin DB, Anandappa AJ, Sun J. et al. Neoantigen vaccine generates intratumoral T cell responses in phase Ib glioblastoma trial. Nature 2019;565:234–9. 10.1038/s41586-018-0792-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 165. Jeffreys S, Tompkins MP, Aki J. et al. Development and evaluation of an Immunoinformatics-based multi-peptide vaccine against Acinetobacter baumannii infection. Vaccines 2024;12:358. 10.3390/vaccines12040358 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 166. Ghaffar SA, Tahir H, Muhammad S. et al. Designing of a multi-epitopes based vaccine against Haemophilius parainfluenzae and its validation through integrated computational approaches. Front Immunol 2024;15:1380732. 10.3389/fimmu.2024.1380732 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 167. Kolla HB, Dutt M, Kumar A. et al. Immuno-informatics study identifies conserved T cell epitopes in non-structural proteins of bluetongue virus serotypes: formulation of a computationally optimized next-generation broad-spectrum multi-epitope vaccine. Front Immunol 2024;15:1424307. 10.3389/fimmu.2024.1424307 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 168. Dasari V, McNeil LK, Beckett K. et al. Lymph node targeted multi-epitope subunit vaccine promotes effective immunity to EBV in HLA-expressing mice. Nat Commun 2023;14:4371. 10.1038/s41467-023-39770-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 169. Alam A, Khan A, Imam N. et al. Design of an epitope-based peptide vaccine against the SARS-CoV-2: a vaccine-informatics approach. Brief Bioinform 2021;22:1309–23. 10.1093/bib/bbaa340 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 170. Zhang G, Han L, Zhao Y. et al. Development and evaluation of a multi-epitope subunit vaccine against Mycoplasma synoviae infection. Int J Biol Macromol 2023;253:126685. 10.1016/j.ijbiomac.2023.126685 [DOI] [PubMed] [Google Scholar]
- 171. Zhang Y, Liang S, Zhang S. et al. Development and evaluation of a multi-epitope subunit vaccine against group B Streptococcus infection. Emerging Microbes & Infections 2022;11:2371–82. 10.1080/22221751.2022.2122585 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 172. Bhattacharya M, Chatterjee S, Nag S. et al. Designing, characterization, and immune stimulation of a novel multi-epitopic peptide-based potential vaccine candidate against monkeypox virus through screening its whole genome encoded proteins: an immunoinformatics approach. Travel Med Infect Dis 2022;50:102481. 10.1016/j.tmaid.2022.102481 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 173. Aziz S, Almajhdi FN, Waqas M. et al. Contriving multi-epitope vaccine ensemble for monkeypox disease using an immunoinformatics approach. Front Immunol 2022;13:1004804. 10.3389/fimmu.2022.1004804 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 174. Alshabrmi FM, Alrumaihi F, Alrasheedi SF. et al. An In-Silico investigation to design a multi-epitopes vaccine against multi-drug resistant Hafnia alvei. Vaccines 2022;10:1127. 10.3390/vaccines10071127 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 175. Waqas M, Aziz S, Bushra A. et al. Employing an immunoinformatics approach revealed potent multi-epitope based subunit vaccine for lymphocytic choriomeningitis virus. J Infect Public Health 2023;16:214–32. 10.1016/j.jiph.2022.12.023 [DOI] [PubMed] [Google Scholar]
- 176. Jin Y, Fayyaz A, Liaqat A. et al. Proteomics-based vaccine targets annotation and design of subunit and mRNA-based vaccines for Monkeypox virus (MPXV) against the recent outbreak. Comput Biol Med 2023;159:106893. 10.1016/j.compbiomed.2023.106893 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 177. Shahrear S, Islam ABMMK. Immunoinformatics guided modeling of CCHF_GN728, an mRNA-based universal vaccine against Crimean-Congo hemorrhagic fever virus. Comput Biol Med 2022;140:105098. 10.1016/j.compbiomed.2021.105098 [DOI] [PubMed] [Google Scholar]
- 178. Alawam AS, Alwethaynani MS. Construction of an aerolysin-based multi-epitope vaccine against Aeromonas hydrophila: an in silico machine learning and artificial intelligence-supported approach. Front Immunol 2024;15:1369890. 10.3389/fimmu.2024.1369890 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 179. Huang S, Zhang C, Li J. et al. Designing a multi-epitope vaccine against coxsackievirus B based on immunoinformatics approaches. Front Immunol 2022;13:933594. 10.3389/fimmu.2022.933594 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 180. Bhattacharya K, Shamkh IM, Khan MS. et al. Multi-epitope vaccine design against Monkeypox virus via reverse vaccinology method exploiting Immunoinformatic and Bioinformatic approaches. Vaccine 2022;10:2010. 10.3390/vaccines10122010 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 181. de Oliveira MA, Vilela Rodrigues TC, Tiwari S. et al. Immunoinformatics-guided design of a multi-valent vaccine against rotavirus and norovirus (ChRNV22). Comput Biol Med 2023;159:106941. 10.1016/j.compbiomed.2023.106941 [DOI] [PubMed] [Google Scholar]
- 182. Ojha R, Pandey RK, Prajapati VK. Vaccinomics strategy to concoct a promising subunit vaccine for visceral leishmaniasis targeting sandfly and leishmania antigens. Int J Biol Macromol 2020;156:548–57. 10.1016/j.ijbiomac.2020.04.097 [DOI] [PubMed] [Google Scholar]
- 183. Dong R, Chu Z, Yu F. et al. Contriving multi-epitope subunit of vaccine for COVID-19: Immunoinformatics approaches. Front Immunol 2020;11:1784. 10.3389/fimmu.2020.01784 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 184. Reddy Chichili VP, Kumar V, Sivaraman J. Linkers in the structural biology of protein–protein interactions. Protein Sci 2013;22:153–67. 10.1002/pro.2206 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 185. Ayyagari VS, T C V KAP. et al. Design of a multi-epitope-based vaccine targeting M-protein of SARS-CoV2: an immunoinformatics approach. Journal of Biomolecular Structure and Dynamics 2022; 40:2963–77. 10.1080/07391102.2020.1850357 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 186. Chen X, Zaro J, Shen W-C. Fusion protein linkers: effects on production, bioactivity, and pharmacokinetics. In: Schmidt SR (ed.), Fusion Protein Technologies for Biopharmaceuticals. John Wiley & Sons, Inc., 2013;57–73. 10.1002/9781118354599.ch4 [DOI] [Google Scholar]
- 187. Enosi, Tuipulotu D, Netzler Natalie E, Lun Jennifer H. et al. TLR7 agonists display potent antiviral effects against norovirus infection via innate stimulation. Antimicrob Agents Chemother 2018;62:e02417–17. 10.1128/AAC.02417-17 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 188. Kolla HB, Tirumalasetty C, Sreerama K. et al. An immunoinformatics approach for the design of a multi-epitope vaccine targeting super antigen TSST-1 of Staphylococcus aureus. Journal of Genetic Engineering and Biotechnology 2021;19:69. 10.1186/s43141-021-00160-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- 189. Andongma BT, Huang Y, Chen F. et al. In silico design of a promiscuous chimeric multi-epitope vaccine against mycobacterium tuberculosis. Comput Struct Biotechnol J 2023;21:991–1004. 10.1016/j.csbj.2023.01.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 190. Tarrahimofrad H, Rahimnahal S, Zamani J. et al. Designing a multi-epitope vaccine to provoke the robust immune response against influenza a H7N9. Sci Rep 2021;11:24485. 10.1038/s41598-021-03932-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 191. Takeda K, Akira S. Toll-like receptors. Curr Protoc Immunol 2015;109:14.12.11–0. 10.1002/0471142735.im1412s109 [DOI] [PubMed] [Google Scholar]
- 192. Lee H, Heo L, Lee MS. et al. GalaxyPepDock: a protein–peptide docking tool based on interaction similarity and energy optimization. Nucleic Acids Res 2015;43:W431–5. 10.1093/nar/gkv495 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 193. Zane L, Kraschowetz S, Trentini MM. et al. Peptide linker increased the stability of pneumococcal fusion protein vaccine candidate. Front Bioeng Biotechnol 2023;11:1108300. 10.3389/fbioe.2023.1108300 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 194. Patel DK, Menon DV, Patel DH. et al. Linkers: a synergistic way for the synthesis of chimeric proteins. Protein Expr Purif 2022;191:106012. 10.1016/j.pep.2021.106012 [DOI] [PubMed] [Google Scholar]
- 195. Laupèze B, Hervé C, Di Pasquale A. et al. Adjuvant systems for vaccines: 13 years of post-licensure experience in diverse populations have progressed the way adjuvanted vaccine safety is investigated and understood. Vaccine 2019;37:5670–80. 10.1016/j.vaccine.2019.07.098 [DOI] [PubMed] [Google Scholar]
- 196. Apostólico JS, Lunardelli VAS, Coirada FC. et al. Adjuvants: classification, modus operandi, and licensing. J Immunol Res 2016;2016:1459394. 10.1155/2016/1459394 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 197. Pulendran B, S. Arunachalam P, O’Hagan DT. Emerging concepts in the science of vaccine adjuvants. Nat Rev Drug Discov 2021;20:454–75. 10.1038/s41573-021-00163-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 198. Bajoria S, Kaur K, Kumru OS. et al. Antigen-adjuvant interactions, stability, and immunogenicity profiles of a SARS-CoV-2 receptor-binding domain (RBD) antigen formulated with aluminum salt and CpG adjuvants. Hum Vaccin Immunother 2022;18:2079346. 10.1080/21645515.2022.2079346 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 199. Ćirović A, Ćirović A, Nikolić D. et al. The adjuvant aluminum fate—metabolic tale based on the basics of chemistry and biochemistry. J Trace Elem Med Biol 2021;68:126822. 10.1016/j.jtemb.2021.126822 [DOI] [PubMed] [Google Scholar]
- 200. Dowling JK, Mansell A. Toll-like receptors: the Swiss army knife of immunity and vaccine development. Clinical & Translational Immunology 2016;5:e85. 10.1038/cti.2016.22 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 201. Damas MSF, Mazur FG, CCDM F. et al. A systematic Immuno-Informatic approach to design a multiepitope-based vaccine against emerging multiple drug resistant Serratia marcescens. Front Immunol 2022;13:768569. 10.3389/fimmu.2022.768569 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 202. Wu TYH, Singh M, Miller AT. et al. Rational design of small molecules as vaccine adjuvants. Sci Transl Med 2014;6:263ra160–0. 10.1126/scitranslmed.3009980 [DOI] [PubMed] [Google Scholar]
- 203. Zhao T, Cai Y, Jiang Y. et al. Vaccine adjuvants: mechanisms and platforms. Signal Transduct Target Ther 2023;8:283. 10.1038/s41392-023-01557-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 204. Brito LA, Malyala P, O'Hagan DT. Vaccine adjuvant formulations: a pharmaceutical perspective. Semin Immunol 2013;25:130–45. 10.1016/j.smim.2013.05.007 [DOI] [PubMed] [Google Scholar]
- 205. Gilkes AP, Albin TJ, Manna S. et al. Tuning subunit vaccines with novel TLR Triagonist adjuvants to generate protective immune responses against Coxiella burnetii. The Journal of Immunology 2020;204:611–21. 10.4049/jimmunol.1900991 [DOI] [PubMed] [Google Scholar]
- 206. Jain R, Jain A, Mauro E. et al. ICOR: improving codon optimization with recurrent neural networks. BMC Bioinformatics 2023;24:132. 10.1186/s12859-023-05246-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 207. Paremskaia AI, Kogan AA, Murashkina A. et al. Codon-optimization in gene therapy: promises, prospects and challenges. Front Bioeng Biotechnol 2024;12:1371596. 10.3389/fbioe.2024.1371596 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 208. Mauro VP, Chappell SA. A critical analysis of codon optimization in human therapeutics. Trends Mol Med 2014;20:604–13. 10.1016/j.molmed.2014.09.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 209. George RA, Heringa J. An analysis of protein domain linkers: their classification and role in protein folding. Protein Engineering, Design and Selection 2002;15:871–9. 10.1093/protein/15.11.871 [DOI] [PubMed] [Google Scholar]
- 210. Rawlings ND, Barrett AJ, Thomas PD. et al. The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Res 2018;46:D624–32. 10.1093/nar/gkx1134 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 211. Crasto CJ, Feng J-A. LINKER: a program to generate linker sequences for fusion proteins. Protein Engineering, Design and Selection 2000;13:309–12. 10.1093/protein/13.5.309 [DOI] [PubMed] [Google Scholar]
- 212. Xue F, Gu Z, Feng J-a. LINKER: a web server to generate peptide sequences with extended conformation. Nucleic Acids Res 2004;32:W562–5. 10.1093/nar/gkh422 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 213. Liu C, Chin JX, Lee D-Y. SynLinker: an integrated system for designing linkers and synthetic fusion proteins. Bioinformatics 2015;31:3700–2. 10.1093/bioinformatics/btv447 [DOI] [PubMed] [Google Scholar]
- 214. Sayers S, Ulysse G, Xiang Z. et al. Vaxjo: a web-based vaccine adjuvant database and its application for analysis of vaccine adjuvants and their uses in vaccine development. Biomed Res Int 2012;2012:831486. 10.1155/2012/831486 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 215. Nagpal G, Gupta S, Chaudhary K. et al. VaccineDA: prediction, design and genome-wide screening of oligodeoxynucleotide-based vaccine adjuvants. Sci Rep 2015;5:12478. 10.1038/srep12478 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 216. Nagpal G, Chaudhary K, Agrawal P. et al. Computer-aided prediction of antigen presenting cell modulators for designing peptide-based vaccine adjuvants. J Transl Med 2018;16:181. 10.1186/s12967-018-1560-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 217. Grote A, Hiller K, Scheer M. et al. JCat: a novel tool to adapt codon usage of a target gene to its potential expression host. Nucleic Acids Res 2005;33:W526–31. 10.1093/nar/gki376 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 218. LeRoy N, Roleck C. Optipyzer: a fast and flexible multi-species codon optimization server. bioRxiv 2023;541759. 10.1101/2023.05.22.541759 [DOI] [Google Scholar]
- 219. Fu H, Liang Y, Zhong X. et al. Codon optimization with deep learning to enhance protein expression. Sci Rep 2020;10:17617. 10.1038/s41598-020-74091-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- 220. Gonzalez-Sanchez B, Vega-Rodríguez MA, Santander-Jiménez S. et al. Multi-objective artificial bee colony for designing multiple genes encoding the same protein. Appl Soft Comput 2019;74:90–8. 10.1016/j.asoc.2018.10.023 [DOI] [Google Scholar]
- 221. Zhang H, Zhang L, Lin A. et al. Algorithm for optimized mRNA design improves stability and immunogenicity. Nature 2023;621:396–403. 10.1038/s41586-023-06127-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- 222. Zhang Y, Huffman A, Johnson J. et al. Vaxign-DL: a deep learning-based method for vaccine design and its evaluation. bioRxiv 2023;2023:2011.2029.569096. 10.1101/2023.11.29.569096 [DOI] [Google Scholar]
- 223. He Y, Xiang Z, Mobley HLT. Vaxign: the first web-based vaccine design program for reverse vaccinology and applications for vaccine development. Biomed Res Int 2010;2010:1–15. 10.1155/2010/297505 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 224. Vivona S, Bernante F, Filippini F. NERVE: new enhanced reverse vaccinology environment. BMC Biotechnol 2006;6:35. 10.1186/1472-6750-6-35 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 225. Jaiswal V, Chanumolu SK, Gupta A. et al. Jenner-predict server: prediction of protein vaccine candidates (PVCs) in bacteria based on host-pathogen interactions. BMC Bioinformatics 2013;14:211. 10.1186/1471-2105-14-211 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 226. Rizwan M, Naz A, Ahmad J. et al. VacSol: a high throughput in silico pipeline to predict potential therapeutic targets in prokaryotic pathogens using subtractive reverse vaccinology. BMC Bioinformatics 2017;18:106. 10.1186/s12859-017-1540-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 227. Goodswen SJ, Kennedy PJ, Ellis JT. Vacceed: a high-throughput in silico vaccine candidate discovery pipeline for eukaryotic pathogens based on reverse vaccinology. Bioinformatics 2014;30:2381–3. 10.1093/bioinformatics/btu300 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 228. Doytchinova IA, Flower DR. VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics 2007;8:4. 10.1186/1471-2105-8-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 229. Dimitrov I, Zaharieva N, Doytchinova I. Bacterial immunogenicity prediction by machine learning methods. Vaccine 2020;8:709. 10.3390/vaccines8040709 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 230. Li J, Zhao Z, Tai C. et al. VirusImmu: a novel ensemble machine learning approach for viral immunogenicity prediction. bioRxiv 2023;2023:2011.2023.568426. 10.1101/2023.11.23.568426 [DOI] [Google Scholar]
- 231. Dey J, Mahapatra SR, Raj TK. et al. Designing a novel multi-epitope vaccine to evoke a robust immune response against pathogenic multidrug-resistant enterococcus faecium bacterium. Gut Pathogens 2022;14:21. 10.1186/s13099-022-00495-z [DOI] [PMC free article] [PubMed] [Google Scholar]
- 232. Chauhan V, Rungta T, Goyal K. et al. Designing a multi-epitope based vaccine to combat Kaposi sarcoma utilizing Immunoinformatics approach. Sci Rep 2019;9:2517. 10.1038/s41598-019-39299-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 233. Alimentarius C. Codex principles and guidelines on FOODS derived from BIOTECHNOLOGY. (CAC/GL 44-2003). Available at: https://mobil.bfr.bund.de/cm/343/codex_principles_and_guidelines_on_foods_derived_from_biotechnology.pdf. 2003.
- 234. Goodman RE, Ebisawa M, Ferreira F. et al. AllergenOnline: a peer-reviewed, curated allergen database to assess novel food proteins for potential cross-reactivity. Mol Nutr Food Res 2016;60:1183–98. 10.1002/mnfr.201500769 [DOI] [PubMed] [Google Scholar]
- 235. van Ree R, Sapiter Ballerda D, Berin MC. et al. The COMPARE Database: A Public Resource for Allergen Identification. Adapted for Continuous Improvement. Front Allergy 2021;2:700533. https://www.frontiersin.org/journals/allergy/articles/10.3389/falgy.2021.700533 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 236. Sharma N, Naorem LD, Jain S. et al. ToxinPred2: an improved method for predicting toxicity of proteins. Brief Bioinform 2022;23:bbac174. 10.1093/bib/bbac174 [DOI] [PubMed] [Google Scholar]
- 237. Robles-Loaiza AA, Pinos-Tamayo EA, Mendes B. et al. Traditional and computational screening of non-toxic peptides and approaches to improving selectivity. Pharmaceuticals 2022;15:323. 10.3390/ph15030323 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 238. Rathore AS, Choudhury S, Arora A. et al. ToxinPred 3.0: an improved method for predicting the toxicity of peptides. Comput Biol Med 2024;179:108926. 10.1016/j.compbiomed.2024.108926 [DOI] [PubMed] [Google Scholar]
- 239. Pan X, Zuallaert J, Wang X. et al. ToxDL: deep learning using primary structure and domain embeddings for assessing protein toxicity. Bioinformatics 2020;36:5159–68. 10.1093/bioinformatics/btaa656 [DOI] [PubMed] [Google Scholar]
- 240. Morozov V, Rodrigues CHM, Ascher DB. CSM-toxin: a web-server for predicting protein toxicity. Pharmaceutics 2023;15:431. 10.3390/pharmaceutics15020431 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 241. Wei L, Ye X, Xue Y. et al. ATSE: a peptide toxicity predictor by exploiting structural and evolutionary information based on graph neural network and attention mechanism. Brief Bioinform 2021;22:bbab041. 10.1093/bib/bbab041 [DOI] [PubMed] [Google Scholar]
- 242. Wei L, Ye X, Sakurai T. et al. ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning. Bioinformatics 2022;38:1514–24. 10.1093/bioinformatics/btac006 [DOI] [PubMed] [Google Scholar]
- 243. Guo W, Liu J, Dong F. et al. Review of machine learning and deep learning models for toxicity prediction. Exp Biol Med 2023;248:1952–73. 10.1177/15353702231209421 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 244. Pérez Santín E, Rodríguez Solana R, González García M. et al. Toxicity prediction based on artificial intelligence: a multidisciplinary overview. WIREs Computational Molecular Science 2021;11:e1516. 10.1002/wcms.1516 [DOI] [Google Scholar]
- 245. Ishii KJ, Koyama S, Nakagawa A. et al. Host innate immune receptors and beyond: making sense of microbial infections. Cell Host Microbe 2008;3:352–63. 10.1016/j.chom.2008.05.003 [DOI] [PubMed] [Google Scholar]
- 246. Choe J, Kelker MS, Wilson IA. Crystal structure of human toll-like receptor 3 (TLR3) Ectodomain. Science 2005;309:581–5. 10.1126/science.1115253 [DOI] [PubMed] [Google Scholar]
- 247. Kaur A, Baldwin J, Brar D. et al. Toll-like receptor (TLR) agonists as a driving force behind next-generation vaccine adjuvants and cancer therapeutics. Curr Opin Chem Biol 2022;70:102172. 10.1016/j.cbpa.2022.102172 [DOI] [PubMed] [Google Scholar]
- 248. Pierce BG, Wiehe K, Hwang H. et al. ZDOCK server: interactive docking prediction of protein–protein complexes and symmetric multimers. Bioinformatics 2014;30:1771–3. 10.1093/bioinformatics/btu097 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 249. Jiménez-García B, Pons C, Fernández-Recio J. pyDockWEB: a web server for rigid-body protein–protein docking using electrostatics and desolvation scoring. Bioinformatics 2013;29:1698–9. 10.1093/bioinformatics/btt262 [DOI] [PubMed] [Google Scholar]
- 250. Kozakov D, Hall DR, Xia B. et al. The ClusPro web server for protein–protein docking. Nat Protoc 2017;12:255–78. 10.1038/nprot.2016.169 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 251. Mondal A, Chang L, Perez A. Modelling peptide–protein complexes: docking, simulations and machine learning. QRB Discovery 2022;3:e17. 10.1017/qrd.2022.14 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 252. Vittorio S, Lunghini F, Morerio P. et al. Addressing docking pose selection with structure-based deep learning: recent advances, challenges and opportunities. Comput Struct Biotechnol J 2024;23:2141–51. 10.1016/j.csbj.2024.05.024 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 253. Abramson J, Adler J, Dunger J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024;630:493–500. 10.1038/s41586-024-07487-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- 254. Bianca C, Pennisi M. Immune system modelling by top-down and bottom-up approaches. International Mathematical Forum 2012;7:109–28. 10.1017/qrd.2022.14 [DOI] [Google Scholar]
- 255. Shinde SB, Kurhekar MP. Review of the systems biology of the immune system using agent-based models. IET Syst Biol 2018;12:83–92. 10.1049/iet-syb.2017.0073 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 256. Bauer AL, Beauchemin CAA, Perelson AS. Agent-based modeling of host-pathogen systems: the successes and challenges. Inform Sci 2009;179:1379–89. 10.1016/j.ins.2008.11.012 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 257. Rapin N, Lund O, Bernaschi M. et al. Computational immunology meets bioinformatics: the use of prediction tools for molecular binding in the simulation of the immune system. PloS One 2010;5:e9862. 10.1371/journal.pone.0009862 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 258. Rapin N, Lund O, Castiglione F. Immune system simulation online. Bioinformatics 2011;27:2013–4. 10.1093/bioinformatics/btr335 [DOI] [PubMed] [Google Scholar]
- 259. Cheng P, Xue Y, Wang J. et al. Evaluation of the consistence between the results of Immunoinformatics predictions and real-world animal experiments of a new tuberculosis vaccine MP3RT. Front Cell Infect Microbiol 2022;12:2235–988. 10.3389/fcimb.2022.1047306 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 260. Stolfi P, Castiglione F, Mastrostefano E. et al. In-silico evaluation of adenoviral COVID-19 vaccination protocols: assessment of immunological memory up to 6 months after the third dose. Front Immunol 2022;13:998262. 10.3389/fimmu.2022.998262 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 261. Li G, Iyer B, Prasath VBS. et al. DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity. Brief Bioinform 2021;22:bbab160. 10.1093/bib/bbab160 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 262. Ong E, Wang H, Wong MU. et al. Vaxign-ML: supervised machine learning reverse vaccinology model for improved prediction of bacterial protective antigens. Bioinformatics 2020;36:3185–91. 10.1093/bioinformatics/btaa119 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 263. Ong E, Cooke MF, Huffman A. et al. Vaxign2: the second generation of the first web-based vaccine design program using reverse vaccinology and machine learning. Nucleic Acids Res 2021;49:W671–8. 10.1093/nar/gkab279 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 264. Fiers MWEJ, Kleter GA, Nijland H. et al. Allermatch™, a webtool for the prediction of potential allergenicity according to current FAO/WHO Codex alimentarius guidelines. BMC Bioinformatics 2004;5:133. 10.1186/1471-2105-5-133 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 265. Zhang ZH, Koh JLY, Zhang GL. et al. AllerTool: a web server for predicting allergenicity and allergic cross-reactivity in proteins. Bioinformatics 2007;23:504–6. 10.1093/bioinformatics/btl621 [DOI] [PubMed] [Google Scholar]
- 266. Björklund ÅK, Soeria-Atmadja D, Zorzet A. et al. Supervised identification of allergen-representative peptides for in silico detection of potentially allergenic proteins. Bioinformatics 2005;21:39–50. 10.1093/bioinformatics/bth477 [DOI] [PubMed] [Google Scholar]
- 267. Lu W, Negi SS, Schein CH. et al. Distinguishing allergens from non-allergenic homologues using physical–chemical property (PCP) motifs. Mol Immunol 2018;99:1–8. 10.1016/j.molimm.2018.03.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 268. Dimitrov I, Naneva L, Doytchinova I. et al. AllergenFP: allergenicity prediction by descriptor fingerprints. Bioinformatics 2014;30:846–51. 10.1093/bioinformatics/btt619 [DOI] [PubMed] [Google Scholar]
- 269. Dimitrov I, Flower DR, Doytchinova I. AllerTOP—a server for in silico prediction of allergens. BMC Bioinformatics 2013;14:S4. 10.1186/1471-2105-14-S6-S4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 270. Dimitrov I, Bangov I, Flower DR. et al. AllerTOP v.2—a server for in silico prediction of allergens. J Mol Model 2014;20:2278. 10.1007/s00894-014-2278-5 [DOI] [PubMed] [Google Scholar]
- 271. Shanthappa PM, Kumar R. ProAll-D: protein allergen detection using long short term memory - a deep learning approach. ADMET & DMPK 2022;10:231–40. 10.5599/admet.1335 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 272. Kumar A, Rana PS. A deep learning based ensemble approach for protein allergen classification. PeerJ Computer science 2023;9:e1622. 10.7717/peerj-cs.1622 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 273. Maurer-Stroh S, Maurer-Stroh S, Krutz NL. et al. AllerCatPro—prediction of protein allergenicity potential from the protein sequence. Bioinformatics 2019;35:3020–7. 10.1093/bioinformatics/btz029 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 274. Nguyen MN, Krutz NL, Limviphuvadh V. et al. AllerCatPro 2.0: a web server for predicting protein allergenicity potential. Nucleic Acids Res 2022;50:W36–43. 10.1093/nar/gkac446 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 275. Huber F, Arnaud M, Stevenson BJ. et al. A comprehensive proteogenomic pipeline for neoantigen discovery to advance personalized cancer immunotherapy. Nat Biotechnol 2024. 10.1038/s41587-024-02420-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 276. Tran NH, Qiao R, Xin L. et al. Personalized deep learning of individual immunopeptidomes to identify neoantigens for cancer vaccines. Nature Machine Intelligence 2020;2:764–71. 10.1038/s42256-020-00260-4 [DOI] [Google Scholar]
- 277. Zhou C, Wei Z, Zhang Z. et al. pTuneos: prioritizing tumor neoantigens from next-generation sequencing data. Genome Med 2019;11:67. 10.1186/s13073-019-0679-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 278. Li B, Jing P, Zheng G. et al. Neo-intline: integrated pipeline enables neoantigen design through the in-silico presentation of T-cell epitope. Signal Transduct Target Ther 2023;8:397. 10.1038/s41392-023-01644-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 279. Thadani NN, Gurev S, Notin P. et al. Learning from prepandemic data to forecast viral escape. Nature 2023;622:818–25. 10.1038/s41586-023-06617-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 280. Li Y, Huang C, Ding L. et al. Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods 2019;166:4–21. 10.1016/j.ymeth.2019.04.008 [DOI] [PubMed] [Google Scholar]
- 281. Bravi B. Development and use of machine learning algorithms in vaccine target selection. npj Vaccines 2024;9:15. 10.1038/s41541-023-00795-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 282. Cavasotto CN, Scardino V. Machine learning toxicity prediction: latest advances by toxicity end point. ACS Omega 2022;7:47536–46. 10.1021/acsomega.2c05693 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 283. Li Y, Farhan MHR, Yang X. et al. A review on the development of bacterial multi-epitope recombinant protein vaccines via reverse vaccinology. Int J Biol Macromol 2024;282:136827. 10.1016/j.ijbiomac.2024.136827 [DOI] [PubMed] [Google Scholar]
- 284. Slathia PS, Sharma P. In Silico designing of vaccines: Methods, tools, and their limitations. In: Singh DB (ed.), Computer-Aided Drug Design, pp. 245–77. Singapore: Springer Singapore, 2020. [Google Scholar]
- 285. Müller M, Huber F, Arnaud M. et al. Machine learning methods and harmonized datasets improve immunogenic neoantigen prediction. Immunity 2023;56:2650–2663.e2656. 10.1016/j.immuni.2023.09.002 [DOI] [PubMed] [Google Scholar]
Data Availability Statement
For access to any research-related data, kindly reach out to the corresponding author.



