Briefings in Bioinformatics. 2025 Feb 14;26(1):bbaf055. doi: 10.1093/bib/bbaf055

Advances of computational methods enhance the development of multi-epitope vaccines

Yiwen Wei 1,#, Tianyi Qiu 2,#, Yisi Ai 3, Yuxi Zhang 4, Junting Xie 5, Dong Zhang 6, Xiaochuan Luo 7, Xiulan Sun 8, Xin Wang 9,10, Jingxuan Qiu 11,12,
PMCID: PMC11827616  PMID: 39951549

Abstract

Vaccine development is one of the most promising fields, and the multi-epitope vaccine, which does not need laborious culture processes, is an attractive alternative to classical vaccines, with advantages in safety and efficiency. The rapid development of algorithms and the accumulation of immune data have facilitated the advancement of computer-aided vaccine design. Here we systematically review the in silico data and algorithm resources for the different steps of computational vaccine design, including immunogen selection, epitope prediction, vaccine construction, optimization, and evaluation. The performance of available tools for epitope prediction and immunogenicity evaluation was tested and compared on benchmark datasets. Finally, we discuss future research directions for the construction of multi-epitope vaccines.

Keywords: multi-epitope vaccine, computational design, bioinformatics tools, epitope

Introduction

Vaccination is the most cost-effective intervention for the prevention and treatment of disease, saving an estimated 2.5 million or more lives each year [1], and it is regarded as one of the most important medical achievements of the past two centuries [2]. The development and popularization of vaccines against specific diseases have greatly decreased morbidity and mortality rates and lightened the social health burden. For example, the mortality and incidence of tetanus, measles, mumps, and similar diseases have been reduced by 99%, or the diseases even eradicated, owing to the widespread use of the related vaccines [3]. More recently, COVID-19 vaccines helped to bring the devastating coronavirus outbreak under control quickly, demonstrating the power of vaccines. The successful development of therapeutic vaccines for prostate cancer and multivalent vaccines for pneumococcus indicates that the effect of vaccines has been extended to the prevention and treatment of cancer and other diseases with high antigen variability [3]. The clinical trial results of mRNA vaccines against melanoma showed the potential and promise of personalized cancer vaccines [4].

A vaccine is a biological product that must contain antigenic components derived from the pathogen or produced synthetically to represent the pathogen [5]. Vaccines can effectively control the spread of infectious disease outbreaks: the faster a vaccine is deployed, the faster an outbreak can be controlled [6]. However, it takes almost 5-10 years to develop a traditional live-attenuated or inactivated vaccine [6]; the time cost and tedious experimental process make classical vaccines unsuitable for the prevention and control of emergent epidemics. In addition, the attenuated or killed pathogen may revert to its virulent state, in which case it would lose the ability to protect and may even harm the human body [7]. Besides classical live-attenuated and inactivated vaccines, several platforms, including subunit vaccines, nucleic acid-based vaccines, viral vectors, and virus-like particles, have been developed over the past few decades [5].

Subunit vaccines and the other platforms eliminate the risk of a ‘return to virulent state’. Subunit vaccines, defined as vaccines that use only the antigenic components of the whole pathogen as antigens [8], are also more controllable. They are mostly composed of one or multiple protein antigens [9]; unlike classical vaccination with whole organisms, this composition does not introduce non-antigenic components, which improves safety. However, recombinant protein-based subunit vaccines face problems of high cost, low stability, difficult purification, and the risk of autoimmune responses [9, 10]. The use of epitopes for the development of subunit vaccines is considered an attractive solution, since epitope-based vaccines can be produced easily and economically with high effectiveness and minimal side effects. The antigenic epitope is the particular part of an antigen that is recognized by receptors and is the basic unit that can elicit an adaptive immune response [11]. The multi-epitope vaccine is a promising strategy for prophylaxis and therapy against tumors and viral infections [12]. An epitope-based strategy also plays an important role in assisting the design of mRNA vaccines [13], one of the technologies that led the development of COVID-19 vaccines. Currently, 102 multi-epitope vaccines targeting various diseases are at different stages of clinical trials, 10 of which are against COVID-19 (https://clinicaltrials.gov/, accessed 20240803).

With the explosion of immune-related information and the development of technology, classical immunology has been overtaken by a high-efficiency in silico approach termed Reverse Vaccinology (RV), i.e. developing vaccines from genomic information rather than from the pathogen itself [14, 15]. Other in silico approaches, such as subtractive genomics, proteomics, computational vaccinology, and immunoinformatics, have been proposed and integrated with RV to shift vaccine development from conventional methods to rational design based on knowledge of whole genome sequences, host-pathogen interactions, immunological data, omics technologies, and computational tools [16–18]. The advancement of computational methodologies has significantly facilitated vaccine development. For instance, RV-related technologies can be employed to modify virulence gene fragments in pathogens for attenuated vaccines. For nucleic acid vaccines, antigen sequences can be screened through data resources and optimized by property calculation. The stability and immunogenicity of virus-like particles (VLPs) can be assessed through structural modeling and in silico prediction. For subunit vaccines, especially multi-epitope vaccines, different algorithms can be applied throughout vaccine development, including rapid screening of candidate antigens, high-throughput prediction of core epitopes, vaccine optimization, and immunogenicity prediction. Most computational tools are easy to use, open source, and user friendly, and have paved the way to the rational development of multi-epitope vaccines.

In this review, we summarize the critical steps in the rational development of multi-epitope vaccines and the advances in relevant computational tools and immune databases. We also discuss the current strategies and widely used bioinformatics tools for each step of multi-epitope vaccine development on the basis of the relevant literature. Further, we designed three benchmark validations, on T-cell epitope predictors, B-cell epitope predictors, and immunogenicity predictors for real-world vaccine sequences. While there is still room for improvement in prediction performance, the benchmark tests and successful real-world cases illustrate that in silico methods have broad potential for vaccine development.

The pipeline for designing epitope vaccines and related information resources

The development of multi-epitope vaccines is a multi-step process, and the computational design of vaccines focuses mainly on four aspects (Fig. 1). (a) Determining potential vaccine antigen candidates and predicting epitopes with the desired ability to evoke an immune response. The immunogen is the core element of a rational multi-epitope vaccine and determines the immune target and specificity. Identification of appropriate epitopes is the key part of epitope vaccine design and influences immune efficiency and coverage [13]. (b) Computational construction of the vaccine with the aid of adjuvants, linkers, and other functional peptides, which help maintain the immunogenicity and stability of the vaccine. (c) Optimization of the designed vaccine through in silico codon optimization, which can improve expression efficiency [10]. (d) Immunological testing and validation. A series of evaluations should be carried out to screen for the ideal vaccine, one that induces a robust immune response, is non-allergenic and non-toxic to humans, and provides broad protection. In silico evaluation can help reduce the time needed for experimental validation. The necessity and significance of each step are discussed in the corresponding section of this review.
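Step (b) can be pictured as simple sequence assembly. The following is a minimal sketch under stated assumptions, not the procedure of any tool reviewed here: the function name `build_construct`, the linkers (EAAAK, AAY, and GPGPG, which appear frequently in published multi-epitope designs), and the example peptides are all illustrative choices rather than recommendations from this review.

```python
from typing import List, Sequence, Tuple

# The 20 standard amino acids, used for a basic sanity check.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_construct(
    adjuvant: str,
    epitope_groups: Sequence[Tuple[str, List[str]]],
    adjuvant_linker: str = "EAAAK",  # assumed default, not prescribed by the review
) -> str:
    """Concatenate an adjuvant and epitopes, inserting linkers between them.

    epitope_groups is an ordered list of (linker, epitopes). Each epitope is
    preceded by its group's linker, except the first epitope overall, which is
    preceded by adjuvant_linker (or by nothing when no adjuvant is given).
    """
    construct = adjuvant
    first = True
    for linker, epitopes in epitope_groups:
        for epitope in epitopes:
            if first:
                construct += (adjuvant_linker if adjuvant else "") + epitope
                first = False
            else:
                construct += linker + epitope
    unknown = set(construct) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return construct


# Placeholder sequences for illustration only.
example = build_construct(
    adjuvant="MKV",
    epitope_groups=[
        ("AAY", ["SIINFEKL", "GILGFVFTL"]),  # CTL epitopes (assumed linker)
        ("GPGPG", ["PKYVKQNTLKLAT"]),        # Th epitope (assumed linker)
    ],
)
print(example)
```

In practice the order of epitopes and the choice of linkers are design variables that would be compared through the in silico evaluations of step (d) rather than fixed in advance.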

Figure 1.


Steps of the rational in silico design of a multi-epitope vaccine. The process can be divided into four parts. A. Immunogen design: the designed multi-epitope vaccine is expected to protect humans against most strains of the targeted pathogen, which depends first on the selection of representative antigens and the prediction of epitopes. Appropriate identification of antigens and epitopes is the core step of vaccine development and determines the efficiency and specificity of the induced immune response. B. Vaccine construction: once the antigenic epitopes have been determined, vaccine candidates can be constructed with the aid of adjuvants, linkers, and other functional peptides, which are used to increase the immunogenicity of the vaccine. C. Vaccine optimization: to achieve maximal expression of the designed vaccine in heterologous hosts, codon optimization of the construct should be performed using bioinformatics tools. D. In silico evaluation: before experiments, the designed vaccine candidates should be verified by a series of computational evaluations, which can reduce the number of experiments and avoid unnecessary losses.

The accurate selection of appropriate antigens and their immunodominant epitopes is the first key step in the development of a multi-epitope vaccine [12]. With the development of relevant technologies and a deeper understanding of immunity, bioinformatics-based methods for protein antigen selection have taken over from classical vaccinology, whose conventional approaches are laborious and time-consuming and require pathogen cultivation. Over the past few decades, researchers have created many immunological databases and analytical tools for vaccine design, and the information they hold is reliable and covers almost all relevant fields. Detailed descriptions of these databases, such as the amount and type of stored data, are listed in Table 1. Integrated approaches have been proposed to accelerate the filtering and determination of antigen proteins: several computational pipelines for the identification of potential vaccine candidates (PVCs) have been developed that take the genomes or proteomes of pathogens as input and output the predicted PVCs for further study. Descriptions of typical tools can be found in the ‘Supplementary Pipelines for PVCs’ part.

Table 1.

List of immune-related databases

Name Application External tools Manual curation Experiment validation Data (20240726) URL Ref
Database for comprehensive information
VIOLIN Various vaccine-associated research data >10 relatively independent programs Y Y 4708 vaccines or vaccine candidates for 219 pathogens, 24,345 vaccine-related abstracts, and 10,317 full-text documents https://violinet.org/ [19]
IEDB Immune epitope data related to all species and includes antibody, T cell, and MHC binding contexts Epitope prediction tools, epitope analysis tools Y Y 1,619,740 peptidic epitopes, 3188 non-peptidic epitopes, 537,708 T cell assays, 1,406,296 B cell assays, 4,879,777 MHC ligand assays, 4520 epitope source organisms, 1011 restricting MHC alleles, and 24,982 references https://www.iedb.org/ [20]
CEDAR A companion site of IEDB, providing cancer-specific epitope and receptor data Epitope prediction tools, epitope analysis tools Y Y 1,305,154 peptidic epitopes, 87 non-peptidic epitopes, 151,618 T cell assays, 131,693 B cell assays, 4,057,203 MHC ligand assays, 1663 epitope source organisms, 662 restricting MHC alleles, and 5264 references https://cedar.iedb.org/home_v3.php [21]
IMGT Genes, sequence, and structure information of immunoglobulin and T cell receptor of vertebrates. 17 online tools of genes, sequences and structures N N comprises 7 databases, 17 online tools, and > 20,000 pages of web resources. https://www.imgt.org/ [22]
AntigenDB antigens with basic information and additional information Epitope search; peptide mapping; antigenic blast; N Y an extensive collection of proteins, glycoproteins, and lipoproteins (>500), extracted from 44 important pathogenic species http://www.imtech.res.in/raghava/antigendb/ [23]
Databases for immunogenic/antigenic peptides and proteins
AntiJen v2.0 continuous quantitative data on immunological molecular interactions, data from position-specific peptide libraries, and biophysical data a nucleotide and a peptide BLAST search N Y over 24,000 entries from published experimentally determined data https://www.ddg-pharmfac.net/antijen/AntiJen/antijenhomepage.htm [24]
PIRD project information, sample information, raw sequencing data, annotated TCR or BCR repertoires, and TBAdb Data analysis tools, data visualization tools N N Contain 3 projects: ~3657 samples, ~11,395,649,000 sequences, ~3657 locus information https://db.cngb.org/pird/ [25]
AgAbDb structure and derived data of antigen–antibody interactions; can be used as benchmark dataset of B-cell epitopes Jmol visualized program; a tool for predicting epitopes N Y 427 antigen–antibody complexes contain 289 protein-Ab complexes and 138 peptide-Ab complexes http://bioinfo.net.in/AgAbDb [26]
Bcipep Comprehensive information about linear B-cell epitopes, including related monoclonal or polyclonal antibodies, and neutralization potential of anti-peptide antibody peptide mapping and antigenic BLAST; N Y 3031 entries that include 763 immunodominant, 1797 immunogenic, and 471 null-immunogenic epitopes http://www.imtech.res.in/raghava/bcipep [27]
EPIMHC MHC-binding peptides and T-cell epitopes that are observed in real proteins NA N N 4875 distinct MHC-binding peptides, of which 2224 are T cell epitopes (1267 MHCI-restricted targeted 226 MHCI and 957 MHCII-restricted targeted 226 MHCII), including 84 epitopes derived from tumor-associated antigens http://bio.med.ucm.es/epimhc/ [28]
TANTIGEN2.0 Information on human tumor antigens that contain T cell epitopes and HLA ligands, and information on validated TCR molecules Blast, MAFFT, HLA binding prediction tools, visualization tools Y Y 4296 antigen variants from 403 unique tumor antigens and more than 1500 T cell epitopes and HLA ligands http://projects.met-hilab.org/tadb/ [29]
Protegen protective antigens and associated information BLAST program Y Y 1631 protective antigens http://www.violinet.org/protegen [30]
Epitome all known antigen/ antibody complex structures and detailed information of interaction residues Jmol visualized program; BLAST search N Y 142 antigens from protein–antibody complex structures with a current total of 10,180 antigenic interactions http://www.rostlab.org/services/epitome/ [31]
CED conformational epitopes and related information visualization tool Y Y 225 entries http://web.kuicr.kyoto-u.ac.jp/~ced [32]
SEDB 3D structure of epitopes and its interaction with antigens and antibodies; B-cell, T-cell, and MHC binding proteins Gene-Ontology, MolProbity, epitope visualization tool; blast tool N Y 299 MHC-binding epitopes, 272 B-cell epitopes, 419 linear epitopes, 49 T-cell epitopes, 64 non-peptidic epitopes, and 126 discontinuous epitopes, among them 614 epitopes are determined by X-Ray http://sedb.bicpu.edu.in/ [33]
Database for immune-related molecules, such as MHC and TCR
ATLAS information on affinities, structures, and experimental details for TCRs, peptides, and MHCs Modeling software (Rosetta program) Y Y 694 measured binding affinities of TCR-pMHC complexes https://zlab.umassmed.edu/atlas/web [34]
STCRDab TCR structure data with annotations; a resource that automatically collects and curates TCR structure data from PDB Modeling tool Y Y 618 PDB entries with a TCR structure, 851 αβTCRs, 18 γδ TCRs, 680 TCRs complexed to MHC/MHC-like molecules, also contains 37 CDR clusters cover 6 types https://opig.stats.ox.ac.uk/webapps/stcrdab-stcrpred/ [35]
TBAdb deposited in PIRD; Antigen-specific TCRs and BCRs information NA Y N 52,287 sequences with 71 diseases https://db.cngb.org/pird/ [25]
TCRdb human TCR beta chain sequences associated with specific tissue/clinical condition/cell type NA Y N 131 TCR-Seq projects, 8265 TCR-Seq samples, and 277,439,349 TCR CDR3 sequences https://guolab.wchscu.cn/TCRdb/#/ [36]
SYFPEITHI MHC ligands and peptide motifs of humans and other species Epitope prediction tool N N > 7000 peptide sequences known to bind class I and class II MHC molecules http://www.syfpeithi.de/ [37]
McPAS-TCR Detailed information on pathology-associated TCR sequences; NA Y Y About 40,000 TCR information http://friedmanlab.weizmann.ac.il/McPAS-TCR/ [38]
VDJdb Known antigen-specific TCR sequences an epitope-centric approach to TCR annotation Y Y About over 80,000 TCR records, over 2000 unique epitopes, and over 500 studies https://vdjdb.cdr3.net/ [39]
MPID-T2 Sequence–structure–function information on pMHC and TR/pMHC interactions NA Y Y 415 entries from five MHC sources, spanning 56 alleles; 353 pMHC structures, 62 TR/pMHC complexes; 352 MHC class I complexes, and 63 MHC class II structures http://biolinfo.org/mpid-t2/ [40]
MHCBN Sequence and structure data of MHC binding and non-binding peptides mapping of peptide on query sequence; creation of data sets; blast N Y 19,777 entries including 17,129 MHC binders and 2648 MHC non-binders for >400 MHC molecules http://crdd.osdd.net/raghava/mhcbn/ [41]
Other immunological repositories
PRRDB Comprehensive information about pattern-recognition receptors (PRRs) and their ligands BLAST and Smith-Waterman algorithm Y Y extensive information about 467 unique PRRs and 827 ligands from ~600 research articles https://webs.iiitd.edu.in/raghava/prrdb2/ [42]
InnateDB Molecular interactions and pathway annotations of relevance to all mammalian cellular systems visualization tool; data analysis tools Y Y 18,780 interactions https://www.innatedb.com/ [43]
AFND frequency data on the polymorphisms of several immune-related genes visualization tools; analysis tools N N 1801 population studies, 1785 gene/allele data, 683 haplotype data, and 192 genotype data http://www.allelefrequencies.net/ [44]
DEG currently available essential gene records among prokaryotes and eukaryotes blast tool; essential-gene analysis tools N Y ~35,000 genes records for prokaryotes, ~50,000 genes records for eukaryotes http://origin.tubic.org/deg/public/index.php [45]
ViPR information for several human pathogenic viruses; supported by NIAID analytical and visualization tools N N Information for over 50,000 virus strains from 912 species belonging to 70 genera and 14 families www.ViPRbrc.org [46]
Database for specific diseases required for designing subunit vaccines.
MycobacRV known mycobacterial vaccines; epitope information for predicted adhesins; Allergen data; epitope conservation data; analysis tools N N 25 Mycobacterial strains and species, a list of 742 adhesin and adhesin-like proteins having extracellular and cell surface localization, and a list of 233 non-redundant most probable adhesin vaccine candidates https://mycobacteriarv.igib.res.in/
FungalRV immunoinformatics data on predicted adhesins and adhesin like proteins; known fungal vaccines; epitope information for predicted adhesins; Allergen data; Adhesin predictor; analysis tools N N predicted 307 adhesin and adhesin-like proteins and known vaccine candidates https://fungalrv.igib.res.in/ [47]
HPVdb antigen entries derived from high/low-risk HPV genotypes, T cell epitopes, and HLA ligands Basic and specialized analysis tools Y Y 2781 curated antigen entries of antigenic proteins derived from both 18 genotypes of high-risk HPV and low-risk HPV, 191 verified T cell epitopes, and 45 verified HLA ligands http://cvc.dfci.harvard.edu/hpv/ [48]
HIV DATABASES data on HIV genetic sequences and immunological epitopes analysis tool and visualization tool N N 4287 antibodies, >150,000 antibody neutralization assay IC50 values, 2058 unique immunogenic CD8+ CTL epitopes, 725 unique immunogenic CD4+ T helper epitopes https://www.hiv.lanl.gov/content/index
dbEBV EBV genomic variation landscape; global frequency and relationship with human health of each variant evolutionary tree building; visualization tool N N curated 942 EBV genomes with 109,893 variant loci from different tissues or cell lines in 24 countries http://dbebv.omicsbio.info/ [49]
EBVdb Information on EBV antigens, verified T cell epitopes, and HLA ligands Analysis tools; Visualization tool Y Y 2622 curated EBV antigenic proteins, 610 verified T cell epitopes, 26 verified HLA ligands http://projects.met-hilab.org/ebv/ [50]
FLAVIdB information on protein sequences, immunological data, and structural data of flavivirus Analysis tools; Visualization tool Y Y 12,858 entries of flavivirus antigen sequences, 184 verified T-cell epitopes, 201 verified B-cell epitopes, and 4 representative molecular structures of the dengue virus envelope protein cvc.dfci.harvard.edu/flavi/ [51]
FluKB curated immunological data and protein sequence data of influenza analytical tools; Visualization tool Y N Over 400,000 influenza protein sequences, 357 verified T-cell epitopes, 685 HLA binders, 16 naturally processed MHC ligands, and a collection of 28 influenza antibodies and their structurally defined B-cell epitopes http://research4.dfci.harvard.edu/cvc/flukb/ [52]
ViralZone graphics describing virion organization, genome transcription, and translation strategies; fact sheets on all known virus families/genera ClustalW alignment software Y N 879 Virus description pages: 158 families, 711 genera, 8 individual species, 352 viral molecular biology pages https://viralzone.expasy.org/ [53]
SDAP Sequence, structure, IgE epitopes, and binding data of allergenic proteins Modeling tools; analysis tools N N 1908 allergens and isoallergens, 1628 Protein sequences for allergens and isoallergens, 109 Allergens with PDB structures, 378 3D models for allergens and isoallergens, 30 Allergens with IgE epitope sets, 233 new Pfam allergen classes http://fermi.utmb.edu/SDAP/ [54]

Built-in functions such as searching and browsing are included in all databases and are therefore not listed in the external tools column.

Key for application and data column: pMHC = peptide-major histocompatibility complex molecules, TR/TCR = T-cell receptors, CDR = complementarity-determining region.

Immunogen design

An ideal multi-epitope vaccine should be composed of various B cell epitopes and T cell epitopes that can stimulate cytotoxic T cells (CTLs), helper T cells (Th), and B cells, and induce an effective immune response against target pathogens [11]. The traditional way to determine epitopes is laboratory-based, which is time-consuming and yields limited results, so attention has turned to computational methods for epitope prediction. Epitopes can be classified as B cell epitopes (BCEs) and T cell epitopes (TCEs); because their generation processes differ, this review discusses BCEs and TCEs separately below.

T-cell epitope prediction

An antigen usually goes through three processes to elicit an adaptive immune response in the host: (i) antigen processing and presentation, (ii) binding of the peptide to MHC molecules and presentation by MHC, and (iii) recognition of the peptide–MHC complex by the T-cell receptor [55]. Detailed illustrations of the three processes for endogenous and exogenous antigens can be found in the ‘Supplementary Endogenous/exogenous processing of antigens’ part.

The peptide-binding groove of MHC class I molecules is formed by a single chain and is closed at both ends, so short peptides of 8-11 amino acids are preferred binders of MHC class I [56]. The binding groove of MHC class II molecules is open, so MHC II-bound peptides range from 11 to 25 residues in length, although only a core of nine residues sits in the binding groove [57]. This structural difference between MHC class I and class II molecules makes binder prediction differ in difficulty between the two classes. A detailed diagrammatic presentation can be seen in Fig. 2. Based on the processing of antigens, epitope prediction tools can be divided into four categories: prediction of antigen processing and presentation, prediction of pMHC binding, prediction of T cell recognition of pMHC complexes, and miscellaneous methods. This paper mainly introduces the characteristics of these four types of methods, widely used tools, and promising tools currently under development. The features, claimed prediction performance, and other information for typical tools in each category can be seen in Table 2.
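The length constraints above determine the candidate set that any pMHC binding predictor has to score. As a minimal sketch (the function names and the example sequence are hypothetical, and no real predictor is called), the candidate peptides for each MHC class and the possible nine-residue cores of a class II peptide can be enumerated with sliding windows:

```python
from typing import Iterator, List, Tuple

MHC_I_LENGTHS = (8, 11)    # from the text: 8-11 residues
MHC_II_LENGTHS = (11, 25)  # from the text: 11-25 residues
CORE_LENGTH = 9            # from the text: nine-residue binding core


def candidate_peptides(
    protein: str, min_len: int, max_len: int
) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, peptide) for every window in the length range."""
    for length in range(min_len, max_len + 1):
        # range() is empty when the protein is shorter than the window.
        for start in range(len(protein) - length + 1):
            yield start + 1, protein[start:start + length]


def binding_cores(peptide: str, core_length: int = CORE_LENGTH) -> List[str]:
    """Return every contiguous core of core_length residues within a peptide."""
    return [
        peptide[i:i + core_length]
        for i in range(len(peptide) - core_length + 1)
    ]


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical 22-residue sequence
class_i = list(candidate_peptides(protein, *MHC_I_LENGTHS))
class_ii = list(candidate_peptides(protein, *MHC_II_LENGTHS))
print(len(class_i), len(class_ii))
print(binding_cores(protein[:11]))
```

Because a class II peptide can place any of several nine-residue registers in the groove, a class II predictor has to consider both the peptide and its core, which is one way to see why the two classes differ in prediction difficulty.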

Figure 2.


Diagrammatic presentation of the endogenous and exogenous processing of antigens. A. Endogenous processing of antigens. B. Exogenous processing of antigens. From left to right: antigen processing and presentation, binding of peptides to MHC molecules, and transport of pMHC complexes to the cell surface and their recognition by TCR molecules. (i)-(iv) represent different tool types. The differences in processing details and tools are described below.
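Table 2 reports tool performance as accuracy (Acc), AUC, MCC, sensitivity (Sens), and specificity (Spec). As a reminder of how these measures are conventionally defined for a binary binder/non-binder classifier, a minimal sketch follows. The function names and the example counts and scores are illustrative assumptions and are not derived from any tool in the table; note also that the table gives Acc, Sens, and Spec as percentages, whereas the sketch returns fractions.

```python
import math
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-based measures computed from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "acc": ratio(tp + tn, tp + fp + tn + fn),
        "sens": ratio(tp, tp + fn),   # true positive rate
        "spec": ratio(tn, tn + fp),   # true negative rate
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is quadratic in the number
    of samples and is meant for illustration only.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


metrics = confusion_metrics(tp=40, fp=10, tn=40, fn=10)  # made-up counts
example_auc = auc([0.9, 0.8], [0.7, 0.85])               # made-up scores
print(metrics, example_auc)
```

Unlike the other four measures, AUC does not depend on a score threshold, which is one reason values in different columns of Table 2 are not directly interchangeable.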

Table 2.

The list of in silico tools for predicting T-cell epitopes

Related process   Method name feature Training dataset MHC class Pan-/specific- Performance measure year Ref
              Acc (%) AUC MCC Sens (%) Spec (%)    
Prediction of Antigen Processing and Presentation NetChop v2.0 NN-based model for predicting proteasome cleavage motifs MHC I ligand dataset: 458 cleavage sites determined by MHC-I ligands of 188 human protein; In vitro degradation dataset: 184 distinct sites I 0.53 0.80 0.88 2002 [58]
TAPPred sequence-based SVM or cascade SVM model for predicting TAP ligands 431 peptides, including 409 bind to TAP with varying affinity, and 22 peptides have negligible or no binding affinity I 2003 [59]
ProPred1 matrix-based method for predicting MHC binders with proteasomal cleavage site I 0.80 0.81 0.67 2003 [60]
RANKPEP Hybrid method using PSSMs or profiles for predicting MHC binders, statistical language models for predicting proteasomal cleavage Training sets for statistical modeling: C-terminus and flanking regions of 332 antigens restricted by human MHC-I I, II >0.8 2004 [61]
NetChop v3.1 NN-based model using new sequence encoding schemes for predicting proteasome cleavage MHC I ligand dataset: 746 cleavage sites; In vitro degradation dataset: 164 cleavage sites I 0.85 0.48 2005 [62]
Pcleavage SVM-based model for predicting constitutive proteasome and immunoproteasome cleavage sites in vitro digested dataa, 506 MHC ligandsb from over 250 proteins I 70.0a 76.7b 0.805a 0.615b 0.43a 0.54b 84.6a 84.3b 55.6a 68.0b 2005 [56]
NetCTL 1.2 Hybrid model integrates MHC class I binding, proteasomal cleavage, and TAP transport efficiency 863 epitope-protein pairs, all 9 mer peptides contained in the source protein sequences with no annotated are seen as negative peptides I 12 MHC-I supertypes >0.72 2007 [63]
TAPreg SVM-based model for predicting affinity of peptides to TAP 613 nonamer peptides with various TAP affinities, I 2009 [64]
TAP Hunter SVM-based model using sequence local description for predicting variable-length TAP ligands 276 TAP binding and 94 non-binding nonamer peptides I 0.88 0.88 2010 [57]
Prediction of Peptide–MHC Binding Sequence-based BIMAS quantitative matrices-based model using linear regression on the measured half-life of HLA-A2 complexes with bound peptide I 1994 [65]
MOTIF Motif-based I 1995 [66]
SYFPEITHI database for MHC ligands and peptide motifs, also provide prediction of epitope based on motif I, II 1999 [37]
TEPITOPE Method combines virtual matrices and DNA microarray technology II pan 1999 [67]
SVMHC SVM-based model for MHC-I binders, and matrices-based method for MHC-II binders SYFPEITHI data: 3500 sequences that are natural ligands to T-cell epitopes I, II 0.85 0.90 2002 [68, 69]
Udaka et al. A query learning algorithm based on HMM model 329 binding peptides I 0.84 2002 [70]
NetMHC quantitative ANN model 400 9mer peptides with various binding affinities, 65 QBC-selected peptides for binding to HLA-A*0204 I 2003 [71]
PEPVAC PSSM-based model combined with a dynamic algorithm, also can predict proteasome cleavage using probabilistic language models I Supertypes, A2, A3, B7, A24 and B15 >0.8 2005 [72]
MULTIPRED Method provides HMM or ANN model for predicting 9mer peptide sequences: 3050 (664 binders and 2386 non-binders) related to 15 variants of A2 supertype, 2216 (680 binders and 1536 non-binders) related to 8 variants of A3 supertype and 2396 (448 binders and 1948 non-binders) related to 6 DR variants I, II Supertypes, class I A2, A3, and class II DR >0.8 2005 [73]
NetMHCII SMM-align method quantitative MHC–peptide binding data: 5147 peptides covering 14 HLA-DR and three H2-IA alleles II 0.756 2007 [74]
NetMHCpan ANN-based model 37,384 unique peptide-HLA interactions, 26,503 interactions covering 24 HLA-A alleles, 10,881 interactions covering 18 HLA-B alleles I pan 0.95 0.74 2007 [75]
NetMHCIIpan ANN-based model using SMM-align method 14,607 unique peptide-HLA interactions, 7839 positive binders, and 6768 negative samples II pan 0.787 2008 [76]
MULTIPRED2 System using NetMHCpan 2.0 and NetMHCIIpan 1.0 as prediction engines for individual or combination of alleles, or supertypes I, II supertypes, 13 class I and 13 class II 2010 [77]
NetMHC4.0 a sequence alignment method based on pan-length ANN IEDB dataset covering 118 MHC class I alleles and each allele with over 20 binders as positive, 100 random natural peptides as negatives I >0.88 2016 [78]
MixMHCpred MHC-I ligand predictor using an unsupervised way to annotate motifs based on co-occurrence of HLA-I alleles HLA peptidomics data (58 alleles in total) I pan 0.979 2017 [79]
HLAthena Model with three single-layer neural networks trained on mass spectrometry data and different peptide encoding. 186,464 eluted peptides from 95 alleles I pan 2019 [80]
MixMHC2pred MHC-II ligand predictor using a motif deconvolution algorithm combines unbiased mass spectrometry to train models 99,265 unique peptides eluted from HLA-II molecules II pan 0.83 2019 [81]
NetMHCpan-4.1 ANN-based model using NNAlign_MA framework to train data 13,245,212 data covering 250 distinct MHC class I molecules I pan >0.985* 2020 [82]
NetMHCIIpan-4.0 ANN-based model using NNAlign_MA framework to train data 4,086,230 data covering a total of 116 distinct MHC class II molecules II pan >0.975* 2020 [83]
MHCflurry 2.0 Integrated predictor which combined models for MHC I binding and antigen processing 75,378 entries from 56 samples I pan >0.91 2020 [84]
RBM-MHC Method combines a Restricted Boltzmann Machine based model and a semi-supervised HLA-I classifier The data is from mass spectrometry and binding affinity assays available in IEDB I pan 0.97 2021 [85]
TransPHLA transformer-based model using self-attention 112 HLA-I alleles, 359,166 positive data (pHLA binders) and 1,795,830 negative data (non-binding pHLAs) I pan >0.9 >0.9 >0.8 2022 [86]
MixMHC2pred v2.0 2 successive blocks of NNs with distinct tasks based on the binding motifs and peptide sequence features HLA peptidomics dataset of 627,013 unique MHC-II ligands and derive motifs for 88 MHC-II alleles II pan 0.945 2023 [87]
BigMHC Model comprise an ensemble of seven deep neural networks, offering two models: ELa and IMb 259,298 EL and 15,065,287 negative instancesa; 1407 antigens and 4778 negativeb I pan 0.9733a 0.7767b 2023 [88]
TripHLApan Model integrates triple coding matrix, BiGRU+Attention models, and transfer learning strategy 2,788,602 HLA-I peptides (464,767 positive sample); 963,186 HLA-II peptides (160,531 positive samples) I, II pan 0.979 2024 [89]
MixMHCpred 3.0 2 blocks of NNs based on the binding motifs and peptide length distributions HLA peptidomics dataset of 511,553 ligands interacting with 143 MHC-I alleles I pan >0.98 2024 [90]
ConvNeXt-MHC Method using degenerate coding approach and ConvNeXt model, which combines transfer learning and semi-supervised learning methods mass spectrometry training seta: 206,515 positive data points and 842,060 negative data points; pMHC affinity training setb: 210,509 positive data points and 38,633 negative data points. I pan 0.964a 0.9048b 0.886a 0.8143b 2024 [91]
ImmuneApp An interpretable, attention-based hybrid deep learning framework, offering ImmuneApp-Neo for immunogenicity prediction 349,650 ligands I pan 0.9650 2024 [92]
Structure-based Method EpiDock Method based on homology modeling and docking score-based quantitative matrix II 0.83 2013 [93]
MHCfold Consists of a CNN-based modeling module and a transformer’s encoder-based specificity module 390 pMHCI structures; 3,459,753 pMHCI pairs I 0.94 2022 [94]
Fine-tuned AlphaFold 2 a simple classification module on top of the AlphaFold network and fine-tuning the combined network parameters for predicting MHC binders training set consisting of 10,340 pMHC examples, 203 structurally characterized and 5102 modeled pMHC binder examples, and 5035 non-binder examples, distributed across 68 Class I alleles and 39 Class II allele pairs. Ia, IIb 0.97a 0.93b 2023 [95]
Prediction of T cell recognition of pMHC complexes sequence-based NetTCR-1.0 CNN-based model 9012 positive and 66,102 negative TCR-peptide combinations, spanning 91 peptides and 8920 TCR sequences I 0.727 2018 [96]
ImRex a CNN with the combined representation of sequences of epitopes and TCR CDR3 Mixed chain dataset: 19,842 unique TCR CDR3 alpha/beta sequences-epitope pairs, 120 epitopes; alpha chain dataset: 5654 sequences, 60 epitopes; beta chain dataset: 14,188 sequences, 118 epitopes I 0.68 2021 [97]
structure-based models hybrid approaches TCRdock Pipeline consists of a specialized version for TCR-pMHC modeling of AlphaFold and matrix-based score function 279 total training examples I 0.82 2023 [98]
RACER a coarse-grained, chemically accurate energy model relies on known TCR–peptide structures and experimental data consist of three TCRs pre-identified strong-binding peptides and decoy peptides with randomized sequences I 0.89 2021 [99]
NetTCR 2.2 CNN-based model combining pan- and peptide-specific training, loss-scaling, and sequence similarity integration 6353 TCR sequences across 26 peptides as positive set, negatives were generated by swapping the TCRs for a given peptide with TCRs binding to other peptides, IMMREP 2022 dataset I pan 0.8476 2023 [100]
CATCR system using CNN model extract structure features and a transformer to encode segment-based coded sequence features 65,069 CDR3β–epitope pairs I 0.848 0.89 2024 [101]
Miscellaneous methods CTLpred Method based on QMa, ANNb, and SVMc model for predicting CTL epitopes, also provides consensusd and combinede approaches 1137 CTL epitopes, 1134 non-epitopes I 0.700a 0.722b0.752c 0.776d 0.758e 0.652a 0.732b 0.738c 0.669d 0.797e 0.749a 0.712b 0.770c 0.884d 0.719e 2004 [102]
IL4pred System provides SVM-based modela and hybrid approachb for predicting IL4 inducing peptides 904 experimentally validated IL4-inducing and 742 non-inducing MHC class II binders II 0.6908a 0.7576b 0.38a 0.51b 0.7058a 0.7876b 0.6725a 0.721b 2013 [103]
IFNepitope System provides motif tool, SVM-based modelb, and hybrid approachc for predicting IFN-γ inducing peptides main dataset: 3705 IFN-γ inducing and 6728 non-IFN-γ inducing MHC class II binders; IFNgOnly dataset: 4483 IFN-γ inducing epitopes and 2160 epitopes that induce other cytokine except IFN-γ II 0.7954b 0.8210c 0.55b 0.62c 0.665b 0.7798c 0.8671b 0.8436c 2013 [104]
CD4episcore System provides an ANN-based model for immunogenicity predictionb, 7-allele method for HLA binding predictionsa, and combine methodc Training dataset consists of two sets: experimental data (1032 positive epitopes and 5739 negative peptides), tetramer data (124 positives and 5319 negative peptides) II 0.703a 0.702b 0.725c 2018 [105]

Key for feature: NN = neural network; SVM = support vector machine; HMM = Hidden Markov Models; PSSM = position-specific scoring matrix; ANN = artificial neural network; SMM-align = stabilization matrix alignment method; CNN = Convolutional Neural Networks; QM = quantitative matrix. EL = eluted ligands. BA = binding affinity. Key for Performance measure: Acc (%) = accuracy; AUC = Area Under the ROC Curve; MCC = Matthew’s correlation coefficient; Sens (%) = sensitivity; Spec (%) = specificity.

*: This value represents the averaged AUC value of multiple HLA molecules reported in this paper.

Prediction of antigen processing and presentation

Natural processing by the cellular antigen-processing machinery is a necessary step for peptides to be presented on MHC class I molecules. First, the antigen protein is cleaved by the proteasome, generating potential epitopes [106]. Predicting proteasomal cleavage sites is vital for identifying potentially immunogenic regions in a pathogen protein, and this step also shows some degree of specificity. On this basis, computational methods such as NetChop [58] and Pcleavage [56] have been developed. The proteasomal fragments are then translocated into the endoplasmic reticulum by TAP, and the efficiency of TAP-mediated translocation of a peptide has been shown to be proportional to its TAP binding affinity [107]. Studying the selectivity and specificity of TAP may therefore contribute significantly to predicting CTL epitopes. Several in silico methods have been proposed to test whether peptides can bind TAP, such as TAPPred [59], TAP Hunter [57], and TAPreg [64]. Hybrid approaches have also been proposed: ProPred1 [60] and RANKPEP [61] combine proteasomal cleavage with peptide–MHC binding, and NetCTL 1.2 [63] adds TAP transport efficiency on top of these two processes. Nevertheless, these strategies yield only limited improvement in performance, possibly because both proteasomal cleavage and TAP transport predictors are insufficiently specific, so that most predicted binders are not successfully processed and presented on MHC class I molecules [107]. For MHC-II molecules, antigen processing has been studied in several works, but related computational approaches have not yet been widely developed [108]. The approaches mentioned above are summarized in Table 2.
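The hybrid idea described above, combining an MHC binding score with cleavage and TAP transport scores, can be sketched as a weighted sum used to rank candidate peptides. This is only a minimal illustration under stated assumptions: the peptide sequences, the three per-peptide scores, and the weights `w_cleavage` and `w_tap` are hypothetical placeholders, and the code does not reproduce the actual scoring of ProPred1, RANKPEP, or NetCTL.

```python
from dataclasses import dataclass


@dataclass
class PeptideScores:
    """Per-peptide scores from three separate predictors (all values invented)."""
    peptide: str
    mhc_binding: float  # hypothetical MHC-I binding score in [0, 1]
    cleavage: float     # hypothetical C-terminal proteasomal cleavage score in [0, 1]
    tap: float          # hypothetical TAP transport score in [0, 1]


def combined_score(s, w_cleavage=0.15, w_tap=0.05):
    """Weighted sum of the three scores; the weights are illustrative assumptions."""
    return s.mhc_binding + w_cleavage * s.cleavage + w_tap * s.tap


def rank_candidates(scores, w_cleavage=0.15, w_tap=0.05):
    """Return candidates sorted from highest to lowest combined score."""
    return sorted(
        scores,
        key=lambda s: combined_score(s, w_cleavage, w_tap),
        reverse=True,
    )


# Arbitrary 9-mers with invented scores, for illustration only.
candidates = [
    PeptideScores("ALDKTYSWV", mhc_binding=0.80, cleavage=0.20, tap=0.10),
    PeptideScores("KLQEPFGHI", mhc_binding=0.75, cleavage=0.90, tap=0.60),
    PeptideScores("RMNTDCAYL", mhc_binding=0.40, cleavage=0.95, tap=0.90),
]

ranked = rank_candidates(candidates)
for s in ranked:
    print(s.peptide, round(combined_score(s), 4))
```

In this toy input, the second peptide overtakes the first once processing scores are added, even though its binding score alone is lower. Because the binding term dominates the sum, weak binders are not rescued by good processing scores, which is consistent with the limited gains reported for such hybrid strategies.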

Prediction of peptide–MHC binding

The binding of a peptide to an MHC molecule is the most selective step [109]. Numerous predictors have been developed to address three problems: (i) distinguishing MHC binders from nonbinders, (ii) predicting the binding affinity of peptides to MHC molecules, and (iii) dealing with the polymorphism of human leukocyte antigens (HLAs; MHC molecules are known as HLAs in humans). The frequency of HLA alleles can vary greatly between ethnicities, and different HLA alleles bind distinct sets of peptides; studying HLA alleles therefore allows epitopes relevant to particular ethnicities to be predicted and population-specific multi-epitope vaccines to be developed [55]. For well-studied HLAs, most prediction tools can identify binders and calculate binding affinity. However, for the less-studied HLAs, which are the majority, the traditional approach of generating a large dataset for each HLA allele is infeasible [110]. Handling HLA polymorphism is thus a major difficulty in the design of epitope prediction tools.

The first approach to the extreme polymorphism of the MHC is to define HLA supertypes. Researchers found that different HLA molecules could bind similar peptides and developed methods to cluster HLA molecules with similar peptide-binding specificity into HLA supertypes [110]. In 1999, Sette et al. defined nine HLA class I supertypes and showed that these nine supertypes could cover almost all known binding properties of HLA class I molecules [111]. Thus, epitopes for unseen HLA molecules could be obtained by predicting the affinity of a peptide for a single MHC molecule representing the corresponding supertype, enabling population-wide epitope discovery. Subsequent approaches such as PEPVAC [72], MULTIPRED [73], MULTIPRED2 [77], and others also demonstrated the effectiveness of supertype-based methods for both HLA-I and HLA-II alleles. However, supertype methods are oversimplified for some specific therapies or requirements [110], so pan-specific prediction methods were proposed; their main advantage is that they can predict the binding of any HLA allele to any peptide, even without any binding information for the query HLA [108]. TEPITOPE [67] was the first pan-specific method and could predict binding to 51 prevalent HLA-DR alleles. NetMHCpan [75] is the first and most commonly used pan-specific prediction tool for MHC-I molecules; it generates a quantitative prediction of the affinity of any peptide–HLA-I interaction.

The bioinformatics tools developed for the prediction of peptide–MHC binding can be divided into two main categories: sequence-based methods, which rely on sequence information of known binders and HLA molecules, and structure-based methods, which generally rely on information from pMHC structures. Typical methods of both categories are summarized by year of publication in Table 2.

Sequence-based method relies on sequence information of known binders and HLA molecules

According to their course of development, sequence-based methods can be divided into two types: simple motif-based models and other heuristic approaches, which focus on the contribution of each residue, or of specific residues, to binding; and machine learning-based methods, which were introduced to improve on the predictive performance of motif-based approaches. Machine learning (ML)-based methods follow three basic steps: first, collect a comprehensive epitope dataset; then extract feature vectors from various sources, including propensity scales, protein sequences, and 3D structures; and finally train a model on the collected dataset using ML algorithms. Progress in algorithms has greatly improved prediction accuracy and promoted the development of in silico epitope prediction; reported average AUC values exceed 0.9, which indicates excellent performance. The application of semi-supervised and unsupervised models has also paved the way for pan-specific methods. A detailed description of both kinds of method can be found in the ‘Supplementary Sequence-based method of pMHC binding’ part.
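To make the motif-based idea concrete, the sketch below builds a simple position-specific scoring matrix (PSSM) from a handful of peptides and scores new peptides by summing the per-position contributions of their residues. It is a generic illustration and not the method of any tool listed in Table 2: the training peptides are invented (they merely share L at position 2 and V at position 9), and the uniform background and the pseudocount of 1 are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(binders, pseudocount=1.0):
    """Build a log-odds matrix (one dict per position) against a uniform background."""
    length = len(binders[0])
    if any(len(p) != length for p in binders):
        raise ValueError("all training peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(binders) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in binders)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_peptide(pssm, peptide):
    """Sum the per-position log-odds contributions of each residue."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length does not match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Invented 9-mers sharing L at position 2 and V at position 9 (toy data only).
toy_binders = ["ALDKTYSWV", "KLQEPFGHV", "RLNTDCAYV", "GLMSWRKEV"]
pssm = build_pssm(toy_binders)

motif_like = "SLAAAAAAV"  # carries both anchor residues
non_motif = "SDAAAAAAK"   # carries neither anchor residue
print(score_peptide(pssm, motif_like), score_peptide(pssm, non_motif))
```

Each residue contributes independently to the total, which is the simplification that ML-based methods try to overcome by learning from richer features and larger datasets.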

Structure-based method relies on the information of pMHC structures

Structure-based methods depend on the biochemical properties of the amino acids involved in pMHC interactions. Only a few molecules have experimentally determined 3D structures, and most still need to be modeled [112]. Most structure-based methods generate various potential docking conformations of a given peptide with a given receptor (i.e. sampling); the generated conformations are then ranked by a scoring function and screened to select suitable binders for the given receptor [113]. Prediction performance is therefore strongly influenced by the quality of both the sampling algorithm and the scoring function [113]. Because of the unsatisfactory performance and high computing cost of computational modeling, structure-based methods are applied far less than sequence-based methods. EpiDock [93] is the first structure-based approach for predicting MHC binders; it uses homology modeling and rigid docking as the sampling algorithm and a quantitative matrix-based scoring function. The method applies to the 23 most frequent HLA-II molecules, covering over 95% of the human population, with an overall accuracy of 83%. With the remarkable performance of AlphaFold, which can provide atomic-level information for modeling, several AlphaFold-based tools have been proposed, such as Fine-tuned AlphaFold 2 [95]. Deep learning (DL)-based methods such as MHCfold [94] also show great potential. Both methods exhibit impressive performance, comparable to the overall performance of the state-of-the-art NetMHCpan-4.1 and NetMHCIIpan-4.0. The application of structure-based methods remains limited by long computation times and limited coverage of diverse MHC alleles, and most of them are not as user-friendly as sequence-based methods [114]. Their strong performance, together with these limitations, suggests that considerable room for exploration and application remains.

Prediction of T cell recognition of pMHC complexes

The in silico methods above predict MHC class I/II binders, TAP binders, and protease cleavage; in essence, they act as a powerful filter that reduces the number of potential epitope candidates [115]. However, most predicted binders do not actually induce an immune response because they do not evoke TCR-specific recognition by T cells. Therefore, efforts have been devoted to developing methods that predict specific TCR-pMHC interactions in order to study the initiation and effectiveness of the adaptive immune response [116]. Beyond tools that consider antigen processing, other ideas have been proposed, such as training prediction models on real epitopes and screening epitopes by whether peptides can induce cytokines. Detailed discussion of these two categories can be found in the ‘Supplementary Prediction of TCR-pMHC complexes’ part and the ‘Supplementary Miscellaneous methods’ part.

Benchmark dataset generation for T-cell epitope prediction

The MHC-I and MHC-II peptides were derived from IEDB [20]. For a fair comparison, we included only peptides added to the IEDB database after the year 2024, which can be regarded as a completely independent testing dataset for model comparison. The benchmark dataset for T-cell epitopes, comprising 1323 experimentally validated positive peptides and 1305 negative peptides, is listed in Supplementary Table S1. Among them, peptides from two major pathogen groups, bacteria and viruses, were used for validation at the pathogen level: 53 positive and four negative peptides for bacteria, and 1215 positive and 1278 negative peptides for viruses (Supplementary Table S1).

Moreover, the two most abundant species for MHC-I peptides (SARS-CoV-2 and influenza virus) and for MHC-II peptides (SARS-CoV-2 and hepatitis E virus, HEV) were selected for validation at the species level (Supplementary Table S2). Details on the preparation of the benchmark datasets and the selection of models can be found in the ‘Supplementary Benchmark dataset generation for T cell epitope predictors’.

Model comparison for MHC-I T-cell epitope prediction

For model evaluation, we selected eight MHC-I T-cell epitope prediction tools and four MHC-II T-cell epitope prediction tools for comparison, including both classical and the latest state-of-the-art (SOTA) approaches (Table 3 and Table 7). As this is a typical binary classification problem, we selected accuracy, precision, F1-score, sensitivity, specificity, and Matthews correlation coefficient (MCC) for comparison.
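As a reference for reading the benchmark tables, the six measures can be computed from the four confusion-matrix counts (TP, TN, FP, FN) using their standard definitions. The sketch below is not the authors' evaluation script; the function name `binary_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention. The example counts are those reported for MHCflurry 2.0 in Table 3.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # recall / true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "f1_score": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Counts reported for MHCflurry 2.0 on the MHC-I benchmark dataset (Table 3).
metrics = binary_metrics(tp=1013, tn=655, fp=600, fn=267)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these counts the computed values agree with the reported row to four decimals (accuracy 0.6580, precision 0.6280, F1-score 0.7003, sensitivity 0.7914, specificity 0.5219, MCC 0.3256).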

Table 3.

The average performance of each of the MHC-I epitope predictors on the benchmark dataset

Year Method name Availability Input data peptide length(mer) Output TP TN FP FN Accuracy precision f1_score sensitivity specificity mcc
2024 TripHLApan GitHub codes peptide, HLA <=14 probability 893 653 598 386 0.6111 0.5989 0.6448 0.6982 0.5220 0.2238
ImmuneApp webserver peptide, HLA > = 8 score, rank, label, slide window 907 416 885 412 0.5050 0.5061 0.5831 0.6876 0.3198 0.0080
MixMHCpred (v3.0) GitHub codes peptide, HLA 8 ~ 14 score, %rank 885 722 525 390 0.6372 0.6277 0.6592 0.6941 0.5790 0.2750
2023 BigMHC-EL GitHub codes peptide, HLA > = 8 score 460 1036 265 859 0.5710 0.6345 0.4501 0.3487 0.7963 0.1621
BigMHC-IM GitHub codes peptide, HLA > = 8 score 318 1128 173 1001 0.5519 0.6477 0.3514 0.2411 0.8670 0.1385
2022 TransPHLA webserver; GitHub codes peptide, HLA, HLA sequence > = 8, the sequence over 15 will be cut Probability, label, slide window 1147 169 1132 172 0.5023 0.5033 0.6376 0.8696 0.1299 −0.0007
2020 NetMHCpan-4.1-EL webserver peptide, HLA > = 8 score, %rank, slide window 1081 322 979 238 0.5355 0.5248 0.6398 0.8196 0.2475 0.0818
NetMHCpan-4.1-BA webserver peptide, HLA > = 8 score, %rank, slide window 1039 400 901 280 0.5492 0.5356 0.6376 0.7877 0.3075 0.1085
MHCflurry 2.0 GitHub codes peptide, HLA <16 score, %rank 1013 655 600 267 0.6580 0.6280 0.7003 0.7914 0.5219 0.3256
2017 HLAthena webserver peptide, HLA > = 8 score, prank, slide window 251 1104 197 1068 0.5172 0.5603 0.2841 0.1903 0.8486 0.0516
Table 7.

The average performance of each of the benchmarked HLA-II epitope predictors on the benchmark dataset

Year Method name Availability Input data peptide length(mer) Output TP TN FP FN Accuracy precision f1_score sensitivity specificity mcc
2024 TripHLApanII GitHub codes Peptide, HLA probability 86 13 23 78 0.4950 0.7890 0.6300 0.5244 0.3611 −0.0883
2023 MixMHC2pred-2.0 GitHub codes Peptide, HLA 12 ~ 21 score, rank, slide window 24 28 8 140 0.2600 0.7500 0.2449 0.1463 0.7778 −0.0795
2020 NetMHCIIpan-4.0-EL webserver Peptide, HLA >14 score, rank, slide window 60 22 10 99 0.4293 0.8571 0.5240 0.3774 0.6875 0.0503
NetMHCIIpan-4.0-BA webserver Peptide, HLA >14 score, rank 107 18 14 52 0.6545 0.8843 0.7643 0.6730 0.5625 0.1825
2018 CD4episcore webserver Peptide >14 Peptide over the cut-off 111 17 15 48 0.6702 0.8810 0.7789 0.6981 0.5313 0.1808

The eight tools showed comparable accuracy (0.50 to 0.66) and precision (0.50 to 0.65) but differed on other indicators, including F1-score (0.28 to 0.70), sensitivity (0.19 to 0.87), specificity (0.13 to 0.87), and MCC (−0.00 to 0.33). It should be noted that the applicable range of each tool is slightly different. For example, several tools restrict input peptides to a length interval, commonly 8-mer to 14- or 15-mer, while other tools set only a maximum or a minimum input length. Users therefore need to choose tools suited to their research targets based on the application range of each method. We also performed an additional validation on the consistent testing dataset, which showed similar performance (Table 4).
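The length restrictions can be handled programmatically before running a comparison. The sketch below encodes the peptide-length bounds listed in Table 3 for four of the tools and selects the peptides that every tool accepts. The helper names are hypothetical, a bound not listed in the table is treated as absent, and the assumption that a consistent dataset is the subset accepted by all tools is ours; the exact construction is described in the supplementary material.

```python
# Peptide-length bounds (min, max) as listed in Table 3; None means no bound is listed.
LENGTH_LIMITS = {
    "TripHLApan": (None, 14),      # listed as "<=14"
    "MixMHCpred (v3.0)": (8, 14),  # listed as "8 ~ 14"
    "NetMHCpan-4.1": (8, None),    # listed as ">= 8"
    "MHCflurry 2.0": (None, 15),   # listed as "<16"
}


def accepts(tool, peptide):
    """Check whether a peptide's length lies within the bounds listed for a tool."""
    low, high = LENGTH_LIMITS[tool]
    n = len(peptide)
    return (low is None or n >= low) and (high is None or n <= high)


def tools_for(peptide):
    """List the tools whose listed length bounds admit this peptide."""
    return [tool for tool in LENGTH_LIMITS if accepts(tool, peptide)]


def consistent_subset(peptides):
    """Keep only the peptides that every tool in LENGTH_LIMITS accepts."""
    return [p for p in peptides if all(accepts(t, p) for t in LENGTH_LIMITS)]


# Placeholder sequences of different lengths (7, 9, 14, and 15 residues).
peptides = ["A" * 7, "A" * 9, "A" * 14, "A" * 15]
print(tools_for(peptides[3]))
print([len(p) for p in consistent_subset(peptides)])
```

Filtering to a shared subset keeps the comparison between tools on identical inputs, at the cost of discarding peptides outside the narrowest length interval.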

Table 4.

The average performance of each of the MHC-I epitope predictors on the consistent dataset

Year Method name TP TN FP FN accuracy precision f1_score sensitivity specificity mcc
2024 TripHLApan 891 651 596 384 0.6114 0.5992 0.6452 0.6988 0.5221 0.2245
ImmuneApp 888 389 858 387 0.5063 0.5086 0.5879 0.6965 0.3119 0.0091
MixMHCpred (v3.0) 885 722 525 390 0.6372 0.6277 0.6592 0.6941 0.5790 0.2750
2023 BigMHC-EL 460 982 265 815 0.5718 0.6345 0.4600 0.3608 0.7875 0.1638
BigMHC-IM 318 1074 173 957 0.5519 0.6477 0.3601 0.2494 0.8613 0.1398
2022 TransPHLA 1116 142 1105 159 0.4988 0.5025 0.6384 0.8753 0.1139 −0.0167
2020 NetMHCpan-4.1—EL 1055 285 962 220 0.5313 0.5231 0.6409 0.8275 0.2285 0.0700
NetMHCpan-4.1—BA 1016 360 887 259 0.5456 0.5339 0.6394 0.7969 0.2887 0.0994
MHCflurry 2.0 1012 648 599 263 0.6582 0.6281 0.7013 0.7937 0.5196 0.3262
2017 HLAthena 250 1050 197 1025 0.5155 0.5593 0.2904 0.1961 0.8420 0.0499

Further, we evaluated the performance of the above MHC-I T-cell epitope prediction tools on bacteria. TransPHLA achieved the best accuracy (0.8070), with a precision of 0.9200, followed by NetMHCpan-4.1, MHCflurry 2.0, and MixMHCpred (v3.0), all of which achieved an accuracy over 0.7, higher than the remaining tools (Table 5). Meanwhile, validation on viruses showed that MHCflurry 2.0 achieved the best prediction accuracy of 0.6592, followed by MixMHCpred (v3.0) with 0.6393. The tools thus differed widely in performance on bacteria but performed similarly on viruses. This might be explained by two reasons: (i) the number of validated bacterial peptides is relatively small and cannot fully reflect the performance of different algorithms, and (ii) the virus dataset contains a large amount of SARS-CoV-2 data, which was rarely included in the training datasets of the above models and led to a decrease in overall prediction performance.
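The effect of the small and imbalanced bacterial set (53 positive and four negative peptides) can be illustrated with simple arithmetic. The sketch below compares the accuracy of a hypothetical predictor that labels every peptide as positive with the accuracy obtained from the TransPHLA counts in Table 5, and lists the only specificity values that four negatives allow. The always-positive baseline is our illustrative assumption, not a tool evaluated in the benchmark.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all peptides that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


N_POS, N_NEG = 53, 4  # bacterial peptides in the benchmark dataset

# Hypothetical baseline that predicts "positive" for every peptide.
baseline = accuracy(tp=N_POS, tn=0, fp=N_NEG, fn=0)

# TransPHLA counts from the bacteria columns of Table 5.
transphla = accuracy(tp=46, tn=0, fp=4, fn=7)

# With only four negatives, specificity can take just five distinct values.
specificity_levels = [k / N_NEG for k in range(N_NEG + 1)]

print(f"always-positive baseline accuracy: {baseline:.4f}")
print(f"TransPHLA accuracy: {transphla:.4f}")
print("possible specificity values:", specificity_levels)
```

On this dataset the trivial baseline reaches an accuracy of about 0.93, above the 0.8070 of the best tool, and each negative peptide shifts specificity by 0.25. Accuracy and specificity on the bacterial set should therefore be interpreted with caution.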

Table 5.

Performance of each HLA-I epitope predictors on the virus dataset and bacteria dataset

  bacteria virus
  TP TN FP FN Accuracy precision f1_score sensitivity specificity mcc TP TN FP FN Accuracy precision f1_score sensitivity specificity mcc
TripHLApan 31 0 4 22 0.5439 0.8857 0.7045 0.5849 0.0000 −0.2178 828 648 576 343 0.6163 0.5897 0.6431 0.7071 0.5294 0.2400
ImmuneApp 30 0 4 23 0.5263 0.8824 0.6897 0.5660 0.0000 −0.2260 847 410 864 364 0.5058 0.4950 0.5797 0.6994 0.3218 0.0229
MixMHCpred (v3.0) 40 0 4 13 0.7018 0.9091 0.8247 0.7547 0.0000 −0.1493 810 716 504 357 0.6393 0.6164 0.6530 0.6941 0.5869 0.2823
BigMHC-EL 23 0 4 30 0.4035 0.8519 0.5750 0.4340 0.0000 −0.2896 420 1027 247 791 0.5823 0.6297 0.4473 0.3468 0.8061 0.1725
BigMHC-IM 11 2 2 42 0.2281 0.8462 0.3333 0.2075 0.5000 −0.1780 298 1109 165 913 0.5662 0.6436 0.3560 0.2461 0.8705 0.1496
TransPHLA 46 0 4 7 0.8070 0.9200 0.8932 0.8679 0.0000 −0.1028 1058 167 1107 153 0.4930 0.4887 0.6268 0.8737 0.1311 0.0071
NetMHCpan-4.1—EL 41 0 4 12 0.7193 0.9111 0.8367 0.7736 0.0000 −0.1419 999 319 955 212 0.5304 0.5113 0.6313 0.8249 0.2504 0.0919
NetMHCpan-4.1—BA 42 0 4 11 0.7368 0.9130 0.8485 0.7925 0.0000 −0.1343 958 397 877 253 0.5453 0.5221 0.6290 0.7911 0.3116 0.1168
MHCflurry 2.0 41 0 4 12 0.7193 0.9111 0.8367 0.7736 0.0000 −0.1419 931 651 577 241 0.6592 0.6174 0.6948 0.7944 0.5301 0.3357
HLAthena 13 3 1 40 0.2807 0.9286 0.3881 0.2453 0.7500 −0.0028 230 1274 0 981 0.6052 1.0000 0.3192 0.1899 1.0000 0.3276

Moreover, two major viruses, SARS-CoV-2 and influenza virus, were evaluated for comparison at the species level. The prediction accuracy of the compared tools on influenza virus was around 0.49 to 0.55 with similar precision (0.56 to 0.63), but with variable sensitivity (0.17 to 0.93) and specificity (0.08 to 0.88). The results on SARS-CoV-2 were different: the latest SOTA tools such as MHCflurry 2.0 reached an accuracy of 0.8676 with a precision of 0.7811, whereas the classical NetMHCpan-4.1 achieved an accuracy of only 0.5077 and a precision of 0.3653. This might be because some of the methods did not incorporate enough SARS-CoV-2 data for training (Table 6).

Table 6.

Performance of each HLA-I epitope predictor on the different virus datasets

Method name Source Organism Species Negative Positive TP TN FP FN Accuracy precision f1_score sensitivity specificity mcc
TripHLApan Influenza A virus Influenza A virus 137 169 135 30 107 34 0.5392 0.5579 0.6569 0.7988 0.2190 0.0218
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 544 234 144 438 106 90 0.7481 0.5760 0.5950 0.6154 0.8051 0.4130
ImmuneApp Influenza A virus Influenza A virus 137 169 125 41 96 44 0.5425 0.5656 0.6410 0.7396 0.2993 0.0432
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 544 234 175 154 390 59 0.4229 0.3097 0.4380 0.7479 0.2831 0.0318
MixMHCpred (v3.0) Influenza A virus Influenza A virus 137 169 116 50 87 53 0.5425 0.5714 0.6237 0.6864 0.3650 0.0540
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 544 234 144 496 48 90 0.8226 0.7500 0.6761 0.6154 0.9118 0.5607
BigMHC-EL Influenza A virus Influenza A virus 137 169 55 100 37 114 0.5065 0.5978 0.4215 0.3254 0.7299 0.0600
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 544 234 66 520 24 168 0.7532 0.7333 0.4074 0.2821 0.9559 0.3412
BigMHC-IM Influenza A virus Influenza A virus 137 169 29 120 17 140 0.4869 0.6304 0.2698 0.1716 0.8759 0.0661
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 544 234 44 534 10 190 0.7429 0.8148 0.3056 0.1880 0.9816 0.3061
TransPHLA Influenza A virus Influenza A virus 137 169 158 11 126 11 0.5523 0.5563 0.6976 0.9349 0.0803 0.0293
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 544 234 215 44 500 19 0.3329 0.3007 0.4531 0.9188 0.0809 −0.0005
NetMHCpan-4.1—EL Influenza A virus Influenza A virus 137 169 144 24 113 25 0.5490 0.5603 0.6761 0.8521 0.1752 0.0370
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 544 234 220 135 409 14 0.4563 0.3498 0.5098 0.9402 0.2482 0.2195
NetMHCpan-4.1—BA Influenza A virus Influenza A virus 137 169 143 26 111 26 0.5523 0.5630 0.6761 0.8462 0.1898 0.0476
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 544 234 202 193 351 32 0.5077 0.3653 0.5133 0.8632 0.3548 0.2205
MHCflurry 2.0 Influenza A virus Influenza A virus 137 169 134 33 104 35 0.5456 0.5630 0.6585 0.7929 0.2409 0.0404
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 544 234 182 493 51 52 0.8676 0.7811 0.7794 0.7778 0.9063 0.6849
HLAthena Influenza A virus Influenza A virus 137 169 39 110 27 130 0.4869 0.5909 0.3319 0.2308 0.8029 0.0407
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 544 234 43 470 74 191 0.6594 0.3675 0.2450 0.1838 0.8640 0.0612

For MHC-II T-cell epitope predictors, we selected four tools for evaluation on the benchmark dataset. Among them, CD4episcore achieved the highest accuracy and F1-score on the benchmark testing dataset (Table 7) as well as on the consistent dataset (Table 8). Evaluation on two major viruses showed that most of these tools performed better on SARS-CoV-2 than on HEV (Table 9). Considering that both datasets are relatively small, the performance of these tools on emerging pathogens requires further validation as more data accumulate.

Table 8.

The average performance of each of the benchmarked HLA-II epitope predictors on the consistent dataset

Year Method name TP TN FP FN Accuracy precision f1_score sensitivity specificity mcc
2024 TripHLApan 82 13 19 77 0.4974 0.8119 0.6308 0.5157 0.4063 −0.0584
2023 MixMHC2pred-2.0 23 24 8 136 0.2461 0.7419 0.2421 0.1447 0.7500 −0.1067
2020 NetMHCIIpan-4.0-EL 60 22 10 99 0.4293 0.8571 0.5240 0.3774 0.6875 0.0503
NetMHCIIpan-4.0-BA 107 18 14 52 0.6545 0.8843 0.7643 0.6730 0.5625 0.1825
2018 CD4episcore 111 17 15 48 0.6702 0.8810 0.7789 0.6981 0.5313 0.1808
Table 9.

Performance of each HLA-II epitope predictor on the different virus datasets

Method name Source Organism Species Negative Positive TP TN FP FN Accuracy precision f1_score sensitivity specificity mcc
TripHLApan Hepatitis E virus Hepatitis E virus 24 76 25 11 13 51 0.3600 0.6759 0.4386 0.3289 0.4583 −0.2127
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 4 16 14 1 3 2 0.7500 0.8235 0.8485 0.8750 0.2500 0.1400
MixMHC2pred-2.0 Hepatitis E virus Hepatitis E virus 24 76 9 23 1 67 0.3200 0.9000 0.2093 0.1184 0.9583 0.1093
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 4 16 11 1 3 5 0.6000 0.7857 0.7333 0.6875 0.2500 −0.0546
NetMHCIIpan-4.0-EL Hepatitis E virus Hepatitis E virus 20 72 50 12 8 22 0.6739 0.8621 0.7692 0.6944 0.6000 0.2516
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 4 15 8 3 1 7 0.5789 0.8889 0.6667 0.5333 0.7500 0.2313
NetMHCIIpan-4.0-BA Hepatitis E virus Hepatitis E virus 20 72 24 15 5 48 0.4239 0.8276 0.4752 0.3333 0.7500 0.0740
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 4 15 7 4 0 8 0.5789 1.0000 0.6364 0.4667 1.0000 0.3944
CD4episcore Hepatitis E virus Hepatitis E virus 20 72 57 12 8 15 0.75 0.8769 0.8321 0.7917 0.6000 0.3548
Severe acute respiratory syndrome coronavirus 2 Severe acute respiratory syndrome coronavirus 2 4 15 14 0 4 1 0.7368 0.7778 0.8485 0.9333 0.0000 −0.1217

In general, our evaluation of current SOTA T-cell epitope predictors showed that each tool has a different emphasis: some focus on recovering as many positives as possible, while others concentrate on ensuring that the predicted positives are as reliable as possible. Users can choose tools for their research purposes based on this benchmark validation.

B-cell epitope prediction

Another important part of the adaptive immune response is B cell-induced humoral immunity: B cells mediate humoral adaptive immunity by producing and secreting different forms of antibodies [55]. B-cell activation can be classified into two pathways according to whether T cells are involved. The B cell activation pathways are illustrated in Fig. 3 and described in the ‘Supplementary B cell activation pathways’ part. B-cell epitopes, the parts of antigens bound by the BCR, can be divided by spatial structure into linear epitopes, which are short linear peptides, and discontinuous epitopes, which are formed by the folded protein structure [117]. Computational approaches for predicting B-cell epitopes are accordingly classified as tools for predicting linear epitopes and tools for predicting conformational epitopes. This review focuses on the features and widely used tools of each class; the input information, features, training datasets, and claimed prediction performance of these and other typical tools are summarized in Table 10.

Figure 3.

Schematic illustration of B cell activation pathways. (A) The B cell activation pathways: the right part shows the T cell-dependent pathway and the left part shows the T cell-independent pathway. (B) A simple example of linear B-cell epitopes. (C) A simple example of conformational B-cell epitopes.

Table 10.

The list of in silico tools for predicting linear and conformational B-cell epitopes

Type   Method Feature Training dataset Input Performance measure Year Ref
            Acc (%) AUC MCC Sens (%) Spec (%)    
Linear PREDITOP amino acid propensity scales-based propensity scale, query sequence, a journal file ~0.6 1993 [115]
PEOPLE amino acid propensity scales-based combined method 1999 [116]
BcePred Combination of amino acid propensity scales-based protein sequence, properties, threshold 0.5870 0.56 0.61 2004 [118]
ABCpred ANN-based 700 B-cell epitopes and 700 random peptides as positive and negative dataset protein sequence, epitope length, threshold 0.6593 0.3187 0.6714 0.6471 2006 [119]
BepiPred HMM model combined with propensity scale method 127 protein sequences with not fully annotated, 0.08 epitope density 0.671 2006 [120]
AlgPred model for mapping IgE epitopes and other functions 178 IgE epitopes protein sequences 0.1747 0.9814 2006 [121]
BCPRED SVM-based model using string kernels 701 linear B-cell epitopes, 701 non-epitopes as positive and negative dataset Antigen sequence, predicted method, epitope length, threshold 0.6790 0.758 0.7261 0.632 2008 [122]
IgPred dipeptide composition-based SVM model for predicting antibody class-specific B-cell epitopes 14,725 B-cell epitopes include 11,981 IgGa, 2341 IgEb, 403 IgAc specific epitopes, and 22,835 non-B-cell, 80% sequence as training dataset amino acid sequence of peptides 0.7173a 0.8496b 0.7228c 0.77a 0.90b 0.78c 0.44a 0.70b 0.45c 2013 [123]
BCIgEpred A dual-layer model for predicting IgE epitopes 1273 IgE epitopes, 4226 non-IgE epitopes protein sequences 0.743 0.833 0.48 0.726 0.758 2018 [124]
iBCE-EL An ensemble method fusion extremely randomized tree and gradient boosting classifiers 4440 BCEs and 5485 non-BCEs protein sequences 0.732 0.789 0.463 0.742 0.724 2018 [125]
SPADE pipeline for the localization of cross-reactive IgE-epitopes based on the structural Structures of the query protein and a cross-reactive second allergen 0.57 0.71 2019 [126]
AlgPred2 ensemble approach for mapping IgE epitopes and other functions 10,451 IgE epitopes, 307,866 non-IgE epitopes protein sequences 0.7299 2020 [127]
DLBEpitope feedforward deep neural network-based ensemble model 25,884 linear B-cell epitopes and 214,679 non-epitopes as positive and negative sample, then extent the dataset to 20,000 positive and 20,000 negative samples for each length NA 0.9568 2020 [128]
AbCPE multi-label method for prediction of antibody class(es) for B-cell epitopes 10,744 epitope sequences, covering 4 specific antibody classes and 7 classes of combination of different antibodies epitope sequences 2021 [129]
BepiPred-3.0 protein language model 358 antigens and 5011 epitope residues protein sequence, threshold, top epitope percentage cutoff 0.688 0.762 0.309 2022 [130]
DeepLBCEPred A Bi-LSTM and multi-scale CNN-based DL method 555 sequences of BCEs and 555 sequences without BCEs protein sequences 0.77 0.54 0.78 0.75 2023 [131]
epitope1D An explainable ML method leveraging two new descriptors: graph-based signatures and organism identification 20,638 epitopes and 6-times negative samples peptides or whole protein sequences, desired window size, organism taxonomy 0.935 0.613 2023 [132]
Conformational Generic antigenic region-based Method CEP Method using accessibility of residues and spatial distance cut-off 3D structure of antigens 0.75 2005 [133]
DiscoTope Method using a combination of amino acid statistics, spatial information, and surface exposure discontinuous epitopes from 76 X-ray structures of antibody/antigen protein complexes 3D structure of antigens 0.155 0.95 2006 [134]
ElliPro tool based on the geometrical properties of protein structure protein sequence or structure 0.840 0.732 0.601 2008 [135]
SEPPA Method based on the residual context and the spatial compactness of neighboring residues 82 Ab-Ag complexes, contained 84 unique epitopes 3D protein structure 0.742 0.580 0.707 2009 [136]
CBTOPE SVM-based model using composition profile of patterns in a protein sequence 187 antigens, 2261 positive patterns (Ab interacting residues of B-cell epitopes), 107,414 negative patterns (non-Ab interacting residues of B-cell epitopes) NA 0.8659 0.73 0.8313 0.9006 2010 [137]
BEST a sliding window-based SVM predictor utilizes Ag sequences 701 B-cell epitopes 20-mers and 701 non-B-cell epitopes 20-mers amino acid sequence 0.745 0.811 0.53 0.561 0.929 2012 [138]
CBEP SVM-based model adopts cost-sensitive ensemble classifiers and spatial clustering Rubinstein’s bound structure dataseta; Liang’s unbound structure datasetb; amino acid sequence 0.721a 0.703b 2014 [139]
SEPIa Method based on sequence-based features and a voting algorithm combines two ML classifiers 85 Ab-Ag complexes, 1667 conformational B-cell epitope residues, and 16,780 other residues amino acid sequence 0.65 2017 [140]
SEPPA 3.0 logistic regression model added two critical glycosylation parameters, and final calibration based on neighboring antigenicity 767 protein antigens (520 with N-glycosylation sites), including 16,544 epitope residues as positive set and 172,975 non-epitope residues as negative set. 3D protein structure, subcellular localization of protein antigen, and species of immune host 0.665 0.749 2019 [141]
DiscoTope-3.0 tool that employs inverse folding structure representations and a positive-unlabelled learning strategy, adapted for both AlphaFold predicteda or solvedb structures 582 Ab-Ag complexes, covering 1466 antigen chains, epitopes are defined as the set of residues within 4 Å of any antibody heavy atom 3D structure of antigens 0.783a 0.795b 0.227a 0.214b 2024 [142]
Antibody-specific epitope-based Method PEASE RF model predicting Ab-specific epitopes from antibody sequence 120 Ab-Ag complexes Ag structure, Ab sequence, Ag sequence (optional, when the 3D structure of Ag is a computational model) 2014 [143]
EpiPred Global docking-based model combines geometric matching and specific scores of Ab-Ag structures 148 Ab-Ag complex structures, 30 of which were randomly chosen as test set, the rest as training dataset structure of an antibody and an antigen 2014 [144]
SEPPA-mAb tool for predicting mAb-specific epitopes, consists of a SEPPA 3.0 model and a fingerprint-based patch model 860 representative Ab-Ag complexes 3D structure of Ab-Ag pair, subcellular localization of protein antigen, and species of immune host (optional) 0.873 2023 [145]
Mimotope-based Method MIMOX Mimotope-based method for phage display based epitope mapping Sequence and 3D structure of antigen, a set of mimotope sequences 2006 [146]
Pepitope Mimotope-based ensemble method for epitope mapping based on affinity-selected peptides a set of affinity-selected peptide sequences and a PDB identifier of the antigen 2007 [147]
Pep-3D-Search Mimotope-based method using Ant Colony Optimization algorithm 3D structure of an antigen and a set of mimotopes or a motif 0.1758 0.3642 2008 [148]
Sun et al ensemble method that combines structure-based method and mimotope-based method 150 Ag-Ab complexes with only one antigen chain antigen structure and mimotopes 0.79 0.14 0.58 0.81 2015 [149]

Key for feature: ANN = artificial neural network; SMM-align = stabilization matrix alignment method; HMM = hidden Markov model; ML = machine learning; DL = deep learning; RF = random forest; Ab-Ag = antibody–antigen. Key for performance measure: Acc (%) = accuracy; AUC = area under the ROC curve; MCC = Matthews correlation coefficient; Sens (%) = sensitivity; Spec (%) = specificity.

Prediction of linear B-cell epitopes

Linear B-cell epitopes are continuous in sequence, and although they account for only a small percentage of B-cell epitopes, tools for predicting them have received major attention. The early tools were sequence-based and relied on simple amino acid (AA) propensity scales describing physicochemical features of B-cell epitopes, since properties such as hydrophilicity, accessibility, and mobility can indicate which regions of a protein are likely to provide antigenic peptides. To improve on the poor performance of the propensity scale-based tools, ML models were introduced. Common tools include ABCpred [119], an ANN-based method that achieved a prediction accuracy of 65.93%, and BCPRED [122], an SVM-based method with an AUC value of 0.758. After the phase of propensity scale-based semiempirical methods and the following phase of initial ML methods, the current generation of epitope prediction methods integrates emerging technologies and recent advances in existing knowledge resources. DLBEpitope [128], the first effort to design prediction models for linear B-cell epitopes using DL methods, reported an AUC of 0.95 on the IEDB dataset, significantly outperforming the other methods. In addition, novel techniques such as fuzzy-ARTMAP artificial neural networks, feature maps, and advanced protein language models have also been used in developing prediction tools [150]. The methods above predict antigenic regions or B-cell epitopes that can induce a B-cell response; other predictive tools work at a further level and identify which class of antibody a predicted epitope can induce [129]. A more detailed description of each phase is given in the ‘Supplementary B linear epitopes’ part.
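The propensity scale idea described above can be sketched in a few lines. The example below is a minimal illustration, not the implementation of any tool in Table 10: it averages a per-residue hydrophilicity value over a sliding window and reports windows above a threshold. The scale values are illustrative numbers modeled on a published hydrophilicity scale and should be checked against the primary source before real use; the window size, the threshold, and the toy sequence are assumptions.

```python
# Illustrative per-residue hydrophilicity values (assumed for this sketch;
# verify against the primary literature before any real use).
HYDROPHILICITY = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def window_scores(sequence, window=7):
    """Return (start_index, mean_scale_value) for every full window.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    values = [HYDROPHILICITY[aa] for aa in sequence.upper()]
    scores = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        scores.append((start, sum(chunk) / window))
    return scores


def candidate_regions(sequence, window=7, threshold=1.0):
    """Return windows whose mean score reaches the (arbitrary) threshold."""
    return [
        (start, sequence[start:start + window], round(score, 3))
        for start, score in window_scores(sequence, window)
        if score >= threshold
    ]


if __name__ == "__main__":
    toy_sequence = "MLLAVLYCLKDEEKRSDGNPWFIVAL"  # hypothetical sequence
    for hit in candidate_regions(toy_sequence):
        print(hit)
```

Such a score only ranks regions by one physicochemical property, which is consistent with the modest performance of scale-based tools noted above and with the later move to ML models.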

Prediction of conformational B-cell epitopes

Approximately 90% of BCEs are discontinuous in nature; however, because available antibody–antigen complex structures are insufficient, the development of discontinuous epitope prediction has been limited [129]. Existing methods for conformational epitope prediction can be broadly classified into three categories: generic antigenic region-based methods, mimotope-based methods, and antibody-based methods. According to whether the input is the antigen's primary sequence or its tertiary structure, the generic antigenic region-based methods can be divided into sequence-based and structure-based methods. The first prediction method, CEP [133], uses the solvent accessibility of residues and a spatial distance cut-off to predict potential epitope regions from the 3D structure of antigens. CBTOPE [137] was the first attempt in this area to predict B-cell conformational epitopes from the antigen’s primary sequence alone. However, the available structure-based methods depend heavily on the number and quality of currently available antigen 3D structures [151], and the main downsides of sequence-based methods are that the predicted epitope residues are not grouped into their corresponding epitopes and that prediction performance is far from satisfactory [152]. Antibody-specific epitope predictors are a new trend in the field; they address a more constrained and tractable task than generic epitope prediction, but they are antibody-dependent, and their use is limited by the number of known antibodies and the knowledge available about them [151]. The remaining category is the mimotope-based approach; peptides that can mimic epitopes are called mimotopes [153]. The core idea of the approach is to map mimotope sequences obtained from a random phage display library onto the antigen [154], so both the 3D structure of the antigen and the antibody affinity-selected peptides are required as input. The main disadvantage of mimotope-based methods is their high false-positive or false-negative rate, which stems from the complexity of the screening process [155]. The size and diversity of the phage display library also greatly affect performance [156]. Typical methods of each category are summarized by year of publication in Table 10, and detailed records of typical tools of each category can be found in the ‘Supplementary B conformation’ part.
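Structure-based predictors need a working definition of which antigen residues belong to an epitope. Table 10 records one such definition for DiscoTope-3.0: residues within 4 Å of any antibody heavy atom. The sketch below applies that distance rule to plain coordinate lists. It is a minimal illustration under stated assumptions: the input layout (residue identifier plus coordinates) and the function name are hypothetical, and parsing of real structure files is left out.

```python
from math import dist


def epitope_residues(antigen_atoms, antibody_atoms, cutoff=4.0):
    """Return antigen residue ids with any heavy atom within `cutoff` angstroms.

    antigen_atoms:  iterable of (residue_id, (x, y, z)) for antigen heavy atoms.
    antibody_atoms: iterable of (x, y, z) for antibody heavy atoms.
    """
    antibody_atoms = list(antibody_atoms)
    hits = set()
    for residue_id, coord in antigen_atoms:
        if residue_id in hits:
            continue  # residue already labeled by another of its atoms
        if any(dist(coord, ab) <= cutoff for ab in antibody_atoms):
            hits.add(residue_id)
    return sorted(hits)


if __name__ == "__main__":
    # Toy coordinates in angstroms (hypothetical, one antibody atom).
    antigen = [(1, (0.0, 0.0, 0.0)), (2, (10.0, 0.0, 0.0)), (3, (5.0, 0.0, 0.0))]
    antibody = [(3.0, 0.0, 0.0)]
    print(epitope_residues(antigen, antibody))
```

The all-pairs loop is quadratic in the number of atoms; for full complexes a spatial index would be the usual choice, but the labeling rule stays the same.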

The above tools represent pioneering progress in epitope prediction, which has greatly advanced the field and inspired the development of novel, higher-performance prediction methods. However, one drawback of most tools is the instability of their predictions [117]. In 2023, Cia and co-workers [157] constructed a large and well-curated dataset containing over 250 antibody–antigen structures to assess the performance of nine commonly used conformational B-cell epitope predictors. According to the assessment, the models differ in prediction performance on the benchmark dataset, which reflects the instability of predictions and the effect of different datasets on predictions. The nine methods tested are generic or antibody-specific methods; other research has indicated that the performance of mimotope-based approaches is also not as good as expected. There is therefore still an urgent need for new methods that predict conformational B-cell epitopes with high performance across most proteins.

Benchmark dataset generation for linear B-cell epitope prediction

The linear B-cell epitopes were retrieved from IEDB [20] using the keywords linear peptide, human host, and positive/negative. Only peptides added to the IEDB database after 2023 were included, so the set can be considered a completely independent testing dataset for model comparison.

The benchmark dataset for B-cell epitopes comprises 2158 experimentally validated positive peptides and 2240 negative peptides, listed in Supplementary Table S1. From these, peptides from two major pathogen groups, bacteria and viruses, were extracted for validation at the pathogen level: 233 positive and 767 negative peptides for bacteria, and 837 positive and 708 negative peptides for viruses (Supplementary Table S1).

Moreover, epitopes from the two most abundant species, Norwalk virus and SARS-CoV-2, were selected for validation at the species level (Supplementary Table S2). Details of benchmark dataset preparation and model selection are given in the ‘Supplementary Benchmark dataset generation for B cell epitope predictors’ part.

Model comparison for B-cell epitope predictors

The performance of state-of-the-art (SOTA) spatial B-cell epitope predictors, including BepiPred 2 [158], CBTOPE [137], SEPPA 3.0 [141], DiscoTope 2.0 [159], ElliPro [135], EPSVR [160], BEpro [161], epitope3D [162], and EpiPred [144], was systematically reviewed by Cia et al. [157]. Using four specifically designed metrics, their benchmark showed that DiscoTope 2.0, EPSVR, and BEpro are significantly better than the random procedure and the random patches procedure across all metrics. BepiPred 2, SEPPA 3.0, ElliPro, and EpiPred are significantly better for some of the metrics, while CBTOPE and epitope3D are no better than random across all metrics [157]. Moreover, the same work evaluated performance on the SARS-CoV-2 spike protein and showed that EPSVR achieved the best ROC-AUC of 0.75, followed by ElliPro with 0.66 and BepiPred 2 with 0.64. The other three tested tools, CBTOPE (ROC-AUC = 0.51), DiscoTope 2.0 (ROC-AUC = 0.60), and epitope3D (ROC-AUC = 0.38), were reported as no better than random [157].
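To make the "no better than random" reading of ROC-AUC concrete, the sketch below computes ROC-AUC directly from its rank interpretation: the probability that a randomly chosen epitope residue receives a higher score than a randomly chosen non-epitope residue, with ties counted as one half. A predictor that assigns scores independently of the labels is expected to land near 0.5. This is a generic illustration with toy inputs, not the evaluation code used in [157].

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 1 (epitope) or 0 (non-epitope).
    scores: iterable of predictor scores, higher meaning more epitope-like.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute ROC-AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy per-residue labels and scores (hypothetical).
    print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.5, 0.1]))
```

The pairwise loop is quadratic and is meant for clarity; rank-based formulas give the same value more efficiently on long antigens.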

Besides spatial B-cell epitope prediction, we evaluated the performance of linear epitope prediction tools on the benchmark dataset (Table 11). Among the four tools, iBCE-EL achieves the best accuracy (0.6157) and epitope1D achieves the best precision (0.6791). For balanced metrics such as balanced accuracy and MCC, iBCE-EL outperforms all others (Table 12). Validation at the pathogen level (Table 13) and the species level (Table 14) gave similar results, with iBCE-EL outperforming the other three tools on most metrics. Note that all tools achieved higher precision on the Norwalk virus than on SARS-CoV-2 (Table 14); this is expected, because for a newly emerging pathogen, previously developed tools may fail to predict the real epitope peptides. Performance may improve once enough data on new pathogens have accumulated for model training.
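The metrics reported in Tables 11 to 14 can all be derived from the four confusion-matrix counts (TP, TN, FP, FN). The sketch below uses the standard definitions, which are assumed here to match those used for the tables; returning 0.0 when a denominator is zero is a convention chosen for this example, and the function names are hypothetical.

```python
from math import sqrt


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from raw counts."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "f1_score": _ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Counts taken from the iBCE-EL row of Table 11.
    for name, value in confusion_metrics(tp=1154, tn=1554, fp=686, fn=1004).items():
        print(f"{name}: {value:.4f}")
```

With the iBCE-EL counts from Table 11, these definitions reproduce the reported values to four decimal places (for example, accuracy 0.6157 and MCC 0.2316). The epitope1D rows show why a single metric can mislead: very high specificity and precision coexist with sensitivity below 0.05.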

Table 11.

Average performance of each of the benchmarked B cell linear epitope predictors on the benchmark dataset

| Year | Method name | Availability | Input data | peptide length (mer) | Output | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023 | DeepLBCEPred | webserver | peptide | | label | 892 | 1357 | 883 | 1266 | 0.5114 | 0.5025 | 0.4539 | 0.4133 | 0.6058 | 0.0195 |
| 2023 | epitope1D | webserver | peptide | >5, over 25 will be cut | Score, label | 91 | 2195 | 43 | 2053 | 0.5217 | 0.6791 | 0.0799 | 0.0424 | 0.9808 | 0.0674 |
| 2022 | BepiPred-3.0 | webserver | peptide | >9 | Analysis result of each sequence, uppercase letters are epitopes | 1650 | 105 | 2110 | 414 | 0.4101 | 0.4388 | 0.5666 | 0.7994 | 0.0474 | −0.2345 |
| 2018 | iBCE-EL | webserver | peptide | | Probability, label | 1154 | 1554 | 686 | 1004 | 0.6157 | 0.6272 | 0.5773 | 0.5348 | 0.6938 | 0.2316 |

Table 12.

Average performance of each of the benchmarked B cell linear epitope predictors on the consistent dataset

| Year | Method name | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023 | DeepLBCEPred | 805 | 1355 | 860 | 1259 | 0.5048 | 0.4835 | 0.4318 | 0.3900 | 0.6117 | 0.0018 |
| 2023 | epitope1D | 84 | 2177 | 38 | 1980 | 0.5284 | 0.6885 | 0.0769 | 0.0407 | 0.9828 | 0.0707 |
| 2022 | BepiPred-3.0 | 1650 | 105 | 2110 | 414 | 0.4101 | 0.4388 | 0.5666 | 0.7994 | 0.0474 | −0.2345 |
| 2018 | iBCE-EL | 1068 | 1550 | 665 | 996 | 0.6118 | 0.6163 | 0.5625 | 0.5174 | 0.6998 | 0.2211 |

Table 13.

Performance of each B cell linear epitope predictor on the virus dataset and bacteria dataset

| Method name | Dataset | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepLBCEPred | bacteria | 109 | 484 | 283 | 124 | 0.5930 | 0.2781 | 0.3488 | 0.4678 | 0.6310 | 0.0856 |
| DeepLBCEPred | virus | 296 | 485 | 223 | 541 | 0.5055 | 0.5703 | 0.4366 | 0.3536 | 0.6850 | 0.0408 |
| epitope1D | bacteria | 1 | 766 | 1 | 232 | 0.7670 | 0.5000 | 0.0085 | 0.0043 | 0.9987 | 0.0283 |
| epitope1D | virus | 37 | 698 | 10 | 794 | 0.4776 | 0.7872 | 0.0843 | 0.0445 | 0.9859 | 0.0881 |
| BepiPred-3.0 | bacteria | 227 | 8 | 751 | 4 | 0.2374 | 0.2321 | 0.3755 | 0.9827 | 0.0105 | −0.0262 |
| BepiPred-3.0 | virus | 597 | 35 | 673 | 205 | 0.4185 | 0.4701 | 0.5763 | 0.7444 | 0.0494 | −0.2814 |
| iBCE-EL | bacteria | 70 | 649 | 118 | 163 | 0.7190 | 0.3723 | 0.3325 | 0.3004 | 0.8462 | 0.1586 |
| iBCE-EL | virus | 523 | 490 | 218 | 314 | 0.6557 | 0.7058 | 0.6629 | 0.6249 | 0.6921 | 0.3161 |

Table 14.

Performance of each B cell linear epitope predictor on the different virus datasets

| Method name | Source Organism | Species | Positive | Negative | TP | TN | FP | FN | Accuracy | precision | f1_score | sensitivity | specificity | mcc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepLBCEPred | Norovirus GII | Norwalk virus | 157 | 9 | 10 | 9 | 0 | 147 | 0.1145 | 1.0 | 0.1198 | 0.0637 | 1.0000 | 0.0606 |
| DeepLBCEPred | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 397 | 540 | 155 | 395 | 145 | 242 | 0.5870 | 0.5167 | 0.4448 | 0.3904 | 0.7315 | 0.1291 |
| epitope1D | Norovirus GII | Norwalk virus | 157 | 9 | 16 | 9 | 0 | 141 | 0.1506 | 1.0 | 0.1850 | 0.1019 | 1.0000 | 0.0782 |
| epitope1D | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 391 | 540 | 4 | 537 | 3 | 387 | 0.5811 | 0.5714 | 0.0201 | 0.0102 | 0.9944 | 0.0267 |
| BepiPred-3.0 | Norovirus GII | Norwalk virus | 155 | 9 | 96 | 0 | 9 | 59 | 0.5854 | 0.9143 | 0.7385 | 0.6193 | 0.0000 | −0.1806 |
| BepiPred-3.0 | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 370 | 540 | 263 | 22 | 518 | 107 | 0.3132 | 0.3367 | 0.4570 | 0.7108 | 0.0407 | −0.3499 |
| iBCE-EL | Norovirus GII | Norwalk virus | 157 | 9 | 102 | 0 | 9 | 55 | 0.6145 | 0.9189 | 0.7612 | 0.6497 | 0.0000 | −0.1685 |
| iBCE-EL | Severe acute respiratory syndrome coronavirus 2 | Severe acute respiratory syndrome coronavirus 2 | 397 | 540 | 247 | 432 | 108 | 150 | 0.7247 | 0.6958 | 0.6569 | 0.6222 | 0.8000 | 0.4300 |

Vaccine construction and optimization

In the general construction of a multi-epitope vaccine, an adjuvant is added at the N-terminus with a suitable linker. Following this linker, a PADRE sequence, which can be seen as an additional adjuvant, is appended to increase immunogenicity. The epitopes are then connected with linkers, with no fixed order for placing them, and finally other functional peptides may optionally be appended to the vaccine’s C-terminus. Codon adaptation is an important step before in silico cloning of the vaccine construct; dedicated tools adjust the codons of the vaccine to fit the codon usage preference of the host in order to achieve a high expression rate. In real-world applications, in silico tools have already shown a strong ability to assist vaccine design. For example, a personalized neoantigen vaccine targeting melanoma was constructed from epitopes identified by NetMHCpan version 2.4; it has passed a phase I clinical trial, which confirmed that it still provides durable anti-tumor activity four years after vaccination [163]. Similarly, a neoantigen vaccine targeting glioblastoma, also developed from target epitopes identified by NetMHCpan version 2.4, has shown strong immunogenicity in a phase Ib trial and generated circulating polyfunctional neoantigen-specific CD4 and CD8 T cell responses [164]. These successful paradigms demonstrate the feasibility of the computational design of epitope-based vaccines and the significant contribution of bioinformatics methods, especially epitope prediction tools, to the development of multi-epitope vaccine candidates. These studies also show the potential for developing personalized precision vaccines. The commonly used prediction tools, linkers, adjuvants, functional peptides, and codon optimization tools can be seen in Table 15, which summarizes the main studies on the development of multi-epitope vaccines based on bioinformatics tools in 2019-2024.
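The codon adaptation step described above can be illustrated with the codon adaptation index (CAI), the quantity that statistic-based tools such as JCat report for the query and optimized sequences. The sketch below is a simplified illustration, not the algorithm of any specific tool: each codon receives a relative adaptiveness weight (its usage divided by the usage of the most frequent synonymous codon), CAI is the geometric mean of those weights, and "optimization" naively picks the most frequent codon for each residue. The usage table is hypothetical and covers only three amino acids; real values must come from the expression host.

```python
from math import exp, log

# Hypothetical codon usage (per thousand) for a toy host; not real data.
TOY_USAGE = {
    "GCT": 15.0, "GCC": 26.0, "GCA": 20.0, "GCG": 34.0,  # Ala
    "AAA": 34.0, "AAG": 10.0,                            # Lys
    "GAA": 40.0, "GAG": 18.0,                            # Glu
}
SYNONYMS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
}


def relative_adaptiveness(usage, synonyms):
    """Weight of each codon relative to its most used synonymous codon."""
    weights = {}
    for codons in synonyms.values():
        best = max(usage[c] for c in codons)
        for codon in codons:
            weights[codon] = usage[codon] / best
    return weights


def cai(dna, weights):
    """Geometric mean of codon weights; codons without a weight are skipped."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    logs = [log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(logs) / len(logs))


def optimize(protein, usage, synonyms):
    """Naive back-translation using the most frequent codon per residue."""
    return "".join(max(synonyms[aa], key=lambda c: usage[c]) for aa in protein)


if __name__ == "__main__":
    weights = relative_adaptiveness(TOY_USAGE, SYNONYMS)
    optimized = optimize("EAAAK", TOY_USAGE, SYNONYMS)
    print(optimized, cai(optimized, weights))
```

Always choosing the most frequent codon maximizes CAI by construction, but it ignores other factors such as mRNA structure, which is one reason tools like LinearDesign in Table 16 optimize structure stability and codon usage jointly.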

Table 15.

Main studies of multi-epitope vaccine design using bioinformatics tools against different pathogens

Target pathogen Pre-PVC Feature Epitope prediction tool Adjuvant Linker Functional peptides codon optimization Experimental Verification Years Ref
Acinetobacter baumannii 32 protein candidates Selected from a literature review CTL, HTL, B: EigenBio’s proprietary epitope prediction software AddaS03 EAAAK, GPGPG, KK, 6-His Tag, thioredoxin protein GenScript Y 2024 [165]
Haemophilus parainfluenzae two proteins that are essential and non-homologous protein with highest antigenicity entire proteome is retrieved from UniProt, screened by several filters CTL: CTLPred
B: ABCPred (for linear); ElliPro (for discontinuous)
cholera enterotoxin subunit B EAAAK, AAY, GPGPG, KK NA JCat N 2024 [166]
Bluetongue virus (BTV) 3 full-length amino acid sequences of BTV NS proteins Downloaded from available genomes of 24 serotypes of BTV of NCBI database CTL: ‘NetMHCpan BA 4.1’ module of IEDB; IEDB-AR server
HTL: ‘MHCIIpan 4.0 BA’ module of IEDB; NetBoLAIIPan 1.0 tool
beta-defensin 2, or 50S ribosomal subunit EAAAK, AAY, GPGPG NA JCat N 2024 [167]
Epstein–Barr virus (EBV) EBVpoly recombinant protein An engineered recombinant polyprotein that is formed by linking 20 CD8 + T cell epitopes from several EBV proteins CTL: literature Amphiphile-modified CpG DNA adjuvant proteasome liberation amino acid sequence EBV gp350 protein experiment Y 2023 [168]
SARS-CoV-2 A surface glycoprotein (ID: QHO62112.1) retrieved from the NCBI database CTL: NETCTL_1.2; MHC-NP tool; IEDB MHC-I binding predictions tool; MHC-NP
HTL: IEDB MHC-II binding predictions
B: Bepipred 2.0, Chou & Fasman Beta-Turn Prediction, Emini Surface Accessibility Prediction, Karplus & Schulz Flexibility Prediction, Kolaskar & Tongaonkar Antigenicity and Parker Hydrophilicity Prediction
NA NA NA NA N 2020 [169]
Mycoplasma synoviae 5 proteins of strain rSC0200 These proteins have been regarded as promising PVCs in earlier studies CTL: NetMHCcons 1.1
HTL: NetMHCIIpan 4.0
B: BCpred
β-defensin EAAK, AAY, GPGPG, KK NA Experiment, replacement of TGA by TGG Y 2023 [170]
Group B streptococcus (GBS) 15 proteins experimentally proven protection against GBS CTL: TEPITOOL
HTL: TEPITOOL
B: ABCpred; Bcpred
NA GPPGPG, LRMKLPKS NA JCat Y 2022 [171]
monkeypox virus 176 gene-coding peptide sequences whole genome and all 176 genome-encoded proteins has been studied T: IEDB v2.24 webserver
B: Bepipred 2.0
CTxB EAAAK, AAY, GPGPG, KK PADRE NA N 2022 [172]
monkeypox virus 139 epitopes sequences experimentally determined epitopes, obtained from ViPR database β-defensin 3, 50S ribosomal protein L7/L12, Heparin-binding hemagglutinin EAAAK, AAY, GPGPG, GGGS PADRE, TAT JCat N 2022 [173]
Hafnia alvei 12 proteins Whole 11 genome sequences download from NCBI, screened by several filters T: IEDB T-cell antigenic determinants analysis package;
B: Bepipred
Cholera toxin B-subunit EAAAK, GPGPG, NA JCat N 2022 [174]
Lymphocytic choriomeningitis virus 4 proteins Whole proteome retrieved from the UniProt database, screened by several filters CTL: NetCTL 1.2; CTLPred; IEDB SMM method
HTL: NetMHCIIpan 4.0; IEDB SMM method; IFNepitope; IL4pred; IL10pred
B: BCPred; BepiPred-2.0; ABCpred; Ellipro (for discontinuous)
50 S ribosomal protein L7/L12 EAAAK, AAY, GPGPG, KK PADRE, 6xHis tag JCat N 2022 [175]
Monkeypox virus 10 proteins with the highest antigenic probability score entire proteome related to epidemic in 2022, downloaded from UniProt, screened by antigenic and allergic CTL: NetCTL-1.2
HTL: IEDB MHC-II binding predictions; IFN-epitope
B: ABCpred
humanβ-defensin-2 EAAAK, AAY, GPGPG, KK NA JCat N 2023 [176]
Crimean-Congo hemorrhagic fever virus (CCHFV) 2 consensus sequences of glycoprotein precursors (GPC) and nucleoproteins (NP) GPC and NP have been proven protection against CCHFV CTL: NetCTL 1.2; IEDB MHC-I binding predictions; SMMPMBEC
HTL: NN-align 2.3 (NetMHCII 2.3)
B: Bepipred 2.0, Emini surface accessibility prediction, Kolaskar and Tongaonkar antigenicity and Karplus and Schulz flexibility prediction method
50S ribosomal protein L7/L12 GGGGSEAAAKGGGGS, GPGPG, KK, AAY PADRE, MITD sequence, GenSmart N 2022 [177]
Aeromonas hydrophila An aerolysin protein that conserved and has no similarity with human and normal flora proteome The role of Aerolysin has been proven by studies, related information obtained from UniProt and PDB database, screened by several filters CTL: IEDB-recommended method
HTL: IEDB-recommended method
B: BepiPred-2.0
Cholera Toxin B EAAAK, GPGPG NA JCat N 2024 [178]
Coxsackievirus B (CVB) complete amino acid sequences NCBI Protein database CTL: NetCTL-1.2; NetMHCpan 4.1
HTL: NetMHCII-2.3; NetMHCIIpan 4.0; IFN- epitope
B: ABCpred, BCPreds (for linear);
ElliPro, DiscoTope (for discontinuous)
β-defensin EAAAK, AAY, GPGPG, KK PADRE, TAT JCat N 2022 [179]
Monkeypox Virus MPXVgp181 virus protein Protein dataset obtained from NCBI, screened by several filters CTL: IEDB MHC-I binding predictions tool
HTL: IEDB MHC-II binding predictions tool; NN-align 2.3
B: Bepipred 2.0
50S ribosomal
protein L7/L12
EAAAK, AAY, GPGPG, KK 6xHis tag JCat N 2022 [180]
Rotavirus and Norovirus All structural proteins of two viruses epidemiological relevance CTL: IEDB MHC-I binding predictions; NetCTL-1.2
HTL: IEDB MHC-II binding predictions; NETMHCII 2.3
B: ABCpred
conserved enterotoxic sequence of RV’s NSP4, tetanus toxin epitope P2 EAAAK, AAY, GPGPG NA GenSmart N 2023 [181]
Visceral leishmaniasis 10 proteins (6 are parasitically derived, 4 are salivary proteins) Literature survey, immune ability of salivary proteins has been proven, retrieval from NCBI, screened by several filters CTL: NetCTL-1.2
HTL: RANKPEP
B: ABCpred
TLR-4 agonist EAAAK, AAY, GPGPG, KK NA NA N 2020 [182]
SARS-CoV-2 9 SARS-CoV-2 proteins proteome of SARS-CoV-2, screened by length and antigenic CTL: NetCTL-1.2; EpiToolKit
HTL: IEDB MHC-II binding predictions; IFN- epitope
B: ABCpred, BepiPred (for linear);
ElliPro (for discontinuous)
β-defensin EAAAK, AAY, GPGPG, KK PADRE, TAT JCat N 2020 [183]

NA: data not given.

Vaccine construction

Based on the construction strategy described above, assembling a multi-epitope vaccine involves the use of linkers, adjuvants, and other functional peptides. Information resources for vaccine construction and optimization are listed in Table 16, which covers databases and analysis tools for adjuvants, linkers, and codon optimization.

Table 16.

A list of tools and databases of adjuvants, linker, and codon optimization

Name usage Input Output URL Ref
linker
LinkerDB database of inter-domain linkers Query types A list of linker identifiers along with their sequences, the identifier could connect to the 3D structure of linkers http://mathbio.nimr.mrc.ac.uk [209]
MEROPS manually curated resources for peptidases, their inhibitors, and substrates with known cleavage sites Index of Name, MEROPS Identifier, or source organism A list of peptidase names with detailed information links https://www.ebi.ac.uk/merops/ [210]
LINKER an automatic program generates peptide sequences with extended conformations determined by experiments desired linker sequence length and optional input parameters A list of peptide sequences with specified criteria http://astro.temple.edu/~feng/Servers/BioinformaticServers.htm [211, 212]
SynLinker System compiled 2260 linker sequences containing natural linkers and artificial/empirical linkers Query criteria A list of linker candidates with structure information satisfying criteria, fusion protein structure, and hydropathicity plot http://bioinfo.bti.a-star.edu.sg/synlinker [213]
adjuvant
Vaxjo database and analysis system contains the basic information and usage of vaccine adjuvants Keyword and their field A table of related adjuvants with a link of basic adjuvant information and associated vaccine information. https://violinet.org/vaxjo/ [214]
vaccineDA model of designing oligo deoxy nucleotide-based vaccine adjuvants Query nucleotide sequence A list of query sequences with predicted class (Immunomodulatory or not), score, and additional properties https://webs.iiitd.edu.in/raghava/vaccineda/ [215]
vaxinPAD webserver for designing or predicting peptide-based vaccine adjuvants. query peptide sequence A list of query sequences with predicted class (Immunomodulatory or not), score, and additional properties https://webs.iiitd.edu.in/raghava/vaxinpad/ [216]
Codon optimization
JCAT statistic-based method, a Java-based tool, for most prokaryotic and some eukaryotic organisms query sequence, class of query sequence, organism which codon should be adapted to The CAI values of query sequence and optimized sequence, graphical representation of the relative adaptiveness http://www.prodoric.de/JCat [217]
Optipyzer statistic-based method, a multi-species server with the ability to process large sets of genes A query sequence, species weights, sequence type Optimized sequence https://optipyzer.com [218]
ICOR An RNN-based method, for Escherichia coli amino acid sequence an optimized nucleotide codon sequence https://github.com/Lattice-Automation/icor-codon-optimizati [206]
Fu et al. A DL-based method that converts the codon optimization into sequence annotation with codon boxes, for E. coli NA NA https://github.com/Devil625/Codon_Optimization.git [219]
MOABC A heuristic-based method using a multi-objective adaptation of the Artificial Bee Colony algorithm NA NA NA [220]
LinearDesign A heuristic-based method for optimized mRNA in both structure stability and codon usage by adapting the lattice parsing concept Single protein sequence, the class of paste sequence, beam size(optimal) Two CSV files, one contains predicted values, optimized sequence, and structure, and another contains codon usage frequency table https://rna.baidu.com/app/vaccine/linear-design/forecast [221]

Use of linkers

Linkers are generally short amino acid sequences derived from nature, inserted in a protein to separate multiple domains and act as spacers [184]. Linkers are indispensable components of rational vaccine design: they preserve the immunogenicity of each individual epitope and minimize junctional immunogenicity, and some linkers can themselves induce an immune response and thus increase the immunogenicity of the vaccine [185]. Natural linkers may serve as good leads for fusing proteins of interest, and empirical linkers have been widely used in various applications [186]. To accelerate linker selection for rational vaccine design, several bioinformatics tools and databases have been developed; their input, output, and usage are listed in Table 16.

Based on sequence and structure, linkers used in vaccine design can be divided into two categories: flexible and rigid linkers [186]. (i) Flexible linkers have great flexibility and mobility and are suitable for joining two functional domains, allowing a certain amount of movement or interaction of the protein domains [187]. The most widely used flexible linkers, such as GPGPG, AAY, HEYGAEALERAG, and KK, are usually applied to join CTL, HTL, and BCL epitopes. The GPGPG linker can induce an HTL immune response and facilitate epitope presentation [188]. The AAY and HEYGAEALERAG sequences provide proteasome cleavage sites and thus can prevent the loss of epitopes during antigen presentation [185, 189, 190]. The KK linker can reduce the junctional immunogenicity that might be caused by linearly joined epitopes [185]. Nevertheless, the high flexibility and lack of rigidity mean that flexible linkers separate functional domains less effectively, which may cause a loss of biological activity and poor expression yields [187, 191]. (ii) When a fixed distance between functional domains and sufficient separation of protein domains are desired, rigid linkers may be the best choice [192]. EAAAK is one of the commonly used rigid linkers for joining the adjuvant and CTL epitopes [188]; it can separate the two protein components to increase efficiency and minimize interference [189].
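The general layout described in this section (adjuvant, rigid linker, PADRE, linked epitopes, optional C-terminal peptide) can be sketched as simple string assembly. The example below is one possible arrangement under stated assumptions, not a prescribed design: pairing AAY with CTL epitopes, GPGPG with HTL epitopes, and KK with B-cell epitopes follows common practice in the studies of Table 15; the PADRE sequence shown is the commonly cited one and should be verified before use; the adjuvant, epitopes, and tag in the usage lines are placeholders.

```python
EAAAK, AAY, GPGPG, KK = "EAAAK", "AAY", "GPGPG", "KK"

# Commonly cited PADRE sequence (assumption; verify against the literature).
PADRE = "AKFVAAWTLKAAA"


def build_construct(adjuvant, ctl, htl, bcl, padre=PADRE, c_terminal=""):
    """Assemble a linear multi-epitope construct from N- to C-terminus.

    Layout assumed in this sketch:
    adjuvant - EAAAK - PADRE - (AAY + CTL epitope)* - (GPGPG + HTL epitope)*
    - (KK + B-cell epitope)* - optional C-terminal peptide.
    """
    construct = adjuvant + EAAAK + padre
    for epitopes, linker in ((ctl, AAY), (htl, GPGPG), (bcl, KK)):
        for epitope in epitopes:
            construct += linker + epitope
    return construct + c_terminal


if __name__ == "__main__":
    # Placeholder sequences only; none of these is a real adjuvant or epitope.
    vaccine = build_construct(
        adjuvant="MKTAYIAKQR",
        ctl=["ACDEFGHIK", "LMNPQRSTV"],
        htl=["WYACDEFGHIKLMNP"],
        bcl=["QRSTVWYACD"],
        c_terminal="HHHHHH",
    )
    print(len(vaccine), vaccine)
```

Because there is no fixed order for placing the epitopes, a design workflow would typically generate several orderings and linker choices with a function of this kind and compare them in the downstream evaluation steps.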

Researchers have demonstrated that linkers can increase the stability of fusion protein vaccines and that the degree of stability varies with the linker used [193]. Other linker properties also have significant impacts on function and flexibility [194]. Thus, the properties and composition of linkers, together with the requirements and characteristics of the designed vaccine, deserve careful consideration when selecting a suitable linker for rational vaccine development.

Use of adjuvants

One limitation of epitope-based subunit vaccines is their weak ability to activate the immune system, as they consist only of the antigenic components of pathogens. Adjuvants are therefore typically co-administered with vaccines, or incorporated into them through a delivery system or chemical conjugation with the peptide, to enhance the magnitude and durability of the immune response [8, 195]. Besides eliciting a robust immune response, adjuvants can also increase the biological half-life of vaccines, induce the production of immunoregulatory cytokines, and induce local inflammation and cellular recruitment [196]. Detailed information about the mechanisms and platforms of adjuvants can be found elsewhere [197].

The two major types of adjuvants are delivery adjuvants and immune agonists. (i) Delivery adjuvants, including aluminum adjuvants, MF59, emulsions, and liposomes, can effectively aid antigen presentation. Aluminum adjuvants, the earliest and most widely used, induce an effective humoral immune response but an insufficient cellular immune response [198], and can also cause cell damage [199]. (ii) Immune agonists, such as CpG oligodeoxynucleotides (CpG ODN), enhance the immune response by binding and triggering toll-like receptors (TLRs); activation of TLRs on the surface of APCs induces the secretion of various cytokines that promote the Th response [200]. TLR agonists, especially those for TLR3, TLR4, and TLR9, are widely used adjuvants for computationally designed vaccines [201], and their efficacy has been proven in many preclinical and clinical studies [200]. However, they carry a potential risk of systemic toxicity and inflammation and may not always be ideal for enhancing the immunogenicity of a vaccine [202, 203].

Several resources are available for the selection and development of vaccine adjuvants; they are described in Table 16 and in the ‘Supplementary Adjuvant and functional peptides’ part. A good adjuvant for a multi-epitope vaccine should be safe, well-tolerated, stable, reproducible, robust, scalable, and easy to produce [204]. In current vaccine design, adjuvant selection is resource-independent: researchers usually consult the related literature to choose several types of adjuvants, construct a vaccine model with each, and select the one that yields the best immunity. The majority of vaccine candidates use TLR agonists as adjuvants to synergistically activate the immune response [205]. The disadvantages of adjuvants must still be considered in rational vaccine development, including the possibility of adverse reactions, lower effectiveness in older populations, and a weak ability to induce CD8+ T cell-mediated cellular immunity [203]; adjuvants should therefore be selected with care. The PADRE sequence, which can be regarded as an additional adjuvant, is also appended to vaccine constructs to increase immunogenicity; the application of the PADRE sequence and other functional peptides is introduced in the ‘Supplementary Adjuvant and functional peptides’ part.

Vaccine construct optimization

Determining whether the constructed protein can be expressed in a heterologous host is a significant step in vaccine development, because successful expression of a vaccine candidate allows subsequent production and purification [16, 206]. Because the degeneracy of the genetic code allows multiple codons to encode the same amino acid and codon usage bias differs among expression hosts, codon optimization with dedicated tools is an effective way to improve protein expression in a heterologous host [9, 207]. Besides protein expression, codon optimization can also alter the effectiveness of immunization, stability, protein conformation, and protein function [208]. The codon adaptation index (CAI) and GC content are widely used to assess the expected expression level of a vaccine sequence [173, 174, 180]; it is generally agreed that a CAI value between 0.8 and 1.0 and a GC content between 30% and 70% indicate a good vaccine candidate with high transcription and translation efficiency [174]. Numerous codon optimization tools have been developed, which can be divided into three categories: statistics-based, ML-based, and heuristic-based methods. The three categories are introduced in the ‘Supplementary Codon optimization’ part, and information on the related methods is listed in Table 16.
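The two metrics above can be illustrated with a short sketch. It computes GC content as a percentage and CAI as the geometric mean of each codon's relative adaptiveness (its frequency divided by that of the most frequent synonymous codon), then checks the ranges quoted in the text. The codon usage values are hypothetical and cover only three amino acids; a real analysis would use the full codon usage table of the chosen expression host, and the function names are illustrative.

```python
import math

# Hypothetical codon usage fractions for a toy host (NOT a real host table).
CODON_USAGE = {
    "K": {"AAA": 0.74, "AAG": 0.26},
    "E": {"GAA": 0.68, "GAG": 0.32},
    "A": {"GCG": 0.33, "GCC": 0.26, "GCA": 0.23, "GCT": 0.18},
}


def relative_adaptiveness(usage):
    """w(codon) = frequency / frequency of the best synonymous codon."""
    weights = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / best
    return weights


def cai(dna, usage=CODON_USAGE):
    """Geometric mean of w over the codons that appear in the usage table."""
    weights = relative_adaptiveness(usage)
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon of the sequence is in the usage table")
    return math.exp(sum(logs) / len(logs))


def gc_content(dna):
    """Percentage of G and C bases in the sequence."""
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


def in_reported_range(dna):
    """Ranges quoted in the text: CAI 0.8-1.0 and GC content 30-70%."""
    return 0.8 <= cai(dna) <= 1.0 and 30.0 <= gc_content(dna) <= 70.0


optimized = "AAAGAAGCG"    # uses the most frequent codon for K, E, A
unoptimized = "AAGGAGGCT"  # same peptide (KEA) with rarer codons
print(cai(optimized), gc_content(optimized), in_reported_range(optimized))
print(cai(unoptimized), in_reported_range(unoptimized))
```

Both sequences encode the same peptide; only the codon choice differs, which is exactly what codon optimization tools adjust.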

Computational verification of vaccine construct

Prediction of antigenicity

Apart from proteins that have been experimentally demonstrated to be immunogenic, the antigenicity of candidate proteins should be tested, and only proteins predicted to be antigenic should be selected for further study. Several RV programs are available for this purpose and can be categorized into two types, filtering and classifying [222]. Both types take protein sequences as input and report whether each protein is a potential vaccine candidate based on the predicted antigenicity. Vaxign [223], NERVE [224], Jenner-predict [225], and VacSol [226] are classical filtering tools, whereas Vacceed [227] and VaxiJen [228] are typical classifying programs.

However, none of the above models achieves a recall over 0.76. The advent and progress of ML/DL algorithms have brought great improvements to both types of immunogenicity predictor. The newest method, Vaxign-DL [222], employs a multi-layer perceptron, a DL algorithm that operates through the sequential layering of nonlinear processing units, and achieves an AUC value of 0.94. Vaxign-DL and VaxiJen v3.0 [229], which has been updated with several ML algorithms and a new dataset, both exhibit strong performance in bacterial immunogenicity prediction. The VirusImmu [230] method was proposed to improve the accuracy of viral protective antigen prediction; it adopts a soft-voting approach to construct an ensemble model based on eight commonly used ML methods. On its independent external test set, VirusImmu shows powerful and stable immunogenicity prediction, with the highest AUC value among commonly used models.
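The soft-voting idea mentioned above can be sketched generically: each base model outputs a probability that a protein is a protective antigen, and the ensemble averages these probabilities (optionally with weights) before applying a threshold. This is an illustration of the general technique, not the VirusImmu implementation; the function name, weights, and the default threshold of 0.5 are assumptions.

```python
def soft_vote(probabilities, weights=None, threshold=0.5):
    """Combine per-model probabilities for one protein by (weighted) averaging.

    Returns the ensemble probability and whether it reaches the threshold.
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities) or not probabilities:
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    ensemble = sum(p * w for p, w in zip(probabilities, weights)) / total
    return ensemble, ensemble >= threshold


# Three hypothetical base models disagree; the average decides.
probability, is_antigen = soft_vote([0.9, 0.6, 0.3])
print(probability, is_antigen)
```

Unlike hard voting, which counts class labels, soft voting lets a confident model outweigh several uncertain ones, which is one reason it is often preferred for ensembles of probabilistic classifiers.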

Prediction of allergenicity and toxicity

The epitopes selected for the construction of multi-epitope vaccines must be non-allergenic and non-toxic to avoid potential dangers to humans.

In silico prediction of peptide allergenicity

The assessment of potential allergenicity is an indispensable step in vaccine development [231]. Classical vaccines introduce large proteins or whole organisms as antigens, which adds unnecessary antigenic load, whereas epitope-based vaccines comprise only the antigenic regions of foreign substances, which decreases the chance of inducing an allergenic response [232]. Both the selected epitopes and the final vaccine construct must be classified as non-allergens by allergenicity prediction tools [232]. WHO/FAO define two criteria for assessing allergenic potential: identity of at least six contiguous amino acids, or more than 35% identity over a window of 80 amino acids, with known allergens [233]. Databases such as AllergenOnline [234], COMPARE [235], and SDAP [54] provide information on known allergens. Computational methods for predicting potential allergenicity can be divided into three categories: alignment-based, alignment-free, and hybrid approaches. The classical tools are listed in Table 19, and all three categories are discussed in the ‘Supplementary Prediction of allergenicity’ part.
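The two WHO/FAO criteria can be written down as a small screening sketch. The six-residue rule is implemented as an exact k-mer lookup; the 80-residue window rule is simplified here to an ungapped position-by-position comparison, whereas alignment-based tools use gapped local alignment (for example FASTA or BLAST). The function names are hypothetical, and the list of known allergens is assumed to come from a database such as those named above.

```python
def shares_exact_kmer(query, allergen, k=6):
    """True if any k contiguous residues of the query occur in the allergen."""
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return any(query[i:i + k] in allergen_kmers
               for i in range(len(query) - k + 1))


def max_window_identity(query, allergen, window=80):
    """Best fraction of identical positions for any query window.

    Simplification: windows are compared without gaps at every offset of the
    allergen. Matches are always divided by `window`, so queries shorter than
    the window can never exceed len(query) / window.
    """
    best = 0.0
    for i in range(max(1, len(query) - window + 1)):
        segment = query[i:i + window]
        for j in range(max(1, len(allergen) - len(segment) + 1)):
            target = allergen[j:j + len(segment)]
            matches = sum(x == y for x, y in zip(segment, target))
            best = max(best, matches / window)
    return best


def flag_potential_allergen(query, known_allergens, window=80, cutoff=0.35):
    """Flag the query if either criterion is met for any known allergen."""
    for allergen in known_allergens:
        if shares_exact_kmer(query, allergen):
            return True
        if max_window_identity(query, allergen, window) > cutoff:
            return True
    return False


# Toy data (hypothetical sequences, not real allergens).
toy_allergen = "ACDEFGHIKLMNPQRSTVWY" * 5
print(flag_potential_allergen("GGGGDEFGHIGGGG", [toy_allergen]))  # shares DEFGHI
print(flag_potential_allergen("G" * 30, [toy_allergen]))
```

A sequence flagged by such a rule is only a candidate for cross-reactivity; the alignment-free and hybrid predictors discussed in the supplementary part address cases that simple identity rules miss.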

Table 19.

A list of tools for vaccine evaluation

Usage | Tool | Feature | Input | Output | Cut-off | Platform | Year | Ref
Antigenicity | Vacceed | Classifying class; pipeline based on eukaryotic pathogen resources, including proteome building and an ML algorithm | Proteome | A ranked list of protein candidates | >0.5 | Source code | 2014 | [227]
Antigenicity | VaxiJen | Classifying class; alignment-independent method based on ACC transformation | Single/multiple protein sequence, target organism | Prediction probability, a statement of protective antigen or non-antigen | >0.5 | Webserver | 2007 | [228]
Antigenicity | VaxiJen v3.0 | Classifying class; a voting approach based on three supervised ML methods | Target organism, single/multiple protein/peptide | Predicted probability for immunogenicity | NA | Webserver | 2020 | [229]
Antigenicity | NERVE | Filtering class; an automated RV system that identifies PVCs from bacterial proteomes | Proteome | A ranked table of PVCs with predicted feature information and links to corresponding primary data | Non-surface antigens, >2 transmembrane helices, adhesin probability >0.46 or 0.38, and no or low similarity with human proteins | Source code | 2006 | [224]
Antigenicity | Jenner-predict server | Filtering class; method predicting PVCs from bacterial proteomes based on identifying critical functional domains | Protein sequence or a proteome | A ranked list of PVCs and information on predicted parameters | Non-cytosolic protein, <3 transmembrane helices, Pfam ID listed in master list | Webserver | 2013 | [225]
Antigenicity | Vaxign | Filtering class; a system based on genome sequences, including two programs: Vaxign Query and Dynamic Vaxign Analysis | Selected genomes, or single/multiple protein sequence and parameters (optional) | A table of PVCs with information on predicted parameters and similar proteins | Outer membrane proteins, <2 transmembrane helices, adhesin probability >0.51, no homology with human and mouse proteins | Webserver | 2010 | [223]
Antigenicity | VacSol | Filtering class; a highly scalable, multi-mode, and configurable software for identifying PVCs from bacterial proteomes | Proteome | A summary report of the query proteome with predicted feature information for each sequence; proteins that meet all feature criteria in the report have an epitope analysis table | No homology with human proteins, <2 transmembrane helices, essential gene, virulent protein, no cytoplasmic protein | Source code | 2017 | [226]
Antigenicity | Vaxign-ML | Classifying class; an ML classification RV program that incorporates both biological and physicochemical properties | Single protein sequence, pathogen type | The percentile rank score and basic information | >58% | Webserver, source code | 2020 | [262]
Antigenicity | Vaxign2 | Classifying class; a comprehensive web server consisting of predictive and computational workflow components | Protein sequence, parameters | A table with basic analysis results; a table with immunogenicity and functional profile, containing basic and population coverage information of predicted epitopes | Vaxign-ML score >90, adhesin probability >0.51 | Webserver, source code | 2021 | [263]
Antigenicity | DeepImmuno | A CNN-based model using a beta-binomial distribution approach to determine the immunogenicity potential of a query peptide with HLA-I molecules | Peptide and MHC molecule | Immunogenicity score and binding score of the query peptide and MHC molecule, extra information on the query sequence and MHC | >0.5 | Webserver, source code | 2021 | [261]
Antigenicity | Vaxign-DL | A three-layer fully connected NN model | NA | NA | NA | NA | 2023 | [222]
Antigenicity | VirusImmu | Classifying class; a novel soft-voting ensemble approach based on the top three models | Protein sequences | The predicted antigenicity score | >0.4 | Source code | 2023 | [230]
Allergenicity | Allermatch | Sequence similarity-based; three alignment methods: sliding window approach (a), wordmatch (b), and full alignment (c) | Amino acid sequence without header | (a, b) A ranked table of similar allergens with alignment scores and detailed information; (c) a bar diagram showing the hit number between the query sequence and the allergenic protein database, and a ranked list of similar allergens with alignment scores and detailed information | 35% (a) | Webserver | 2003 | [264]
Allergenicity | AllerTool | Sequence similarity-based; four integrated tools: XR-BLAST (a), XR-Graph (b), ALR-SCAN (c), and ALR-SVM (d) | Amino acid sequence | (a) Information on allergens that have reported cross-reactivity with the individual matches; (b) a possible allergen cross-reactivity relationship graph; (c) a list of matches that satisfy either of the rules; (d) a list of high-similarity allergen sequences and reported cross-reactivity information | NA | Webserver | 2007 | [265]
Allergenicity | Bjorklund et al | Motif similarity-based; a dataset of allergen-representative peptides (ARPs), a supervised classifier DASARP based on the automated selection of ARPs | Peptide sequence | Predicted score value | 5.51, a higher value indicating a higher risk of allergenicity | NA | 2005 | [266]
Allergenicity | Lu et al. | Motif similarity-based; a dataset of allergen-specific motifs based on physical and chemical properties (PCP-motifs) for 17 highly populated protein domains, a score model based on PCP-motifs | NA | NA | NA | NA | 2018 | [267]
Allergenicity | AllergenFP | An alignment-free descriptor-based fingerprint approach | One protein sequence | A statement of probable allergen or probable non-allergen, a link to the protein from the pair with the highest Tanimoto similarity index | NA | Webserver | 2014 | [268]
Allergenicity | AllerTOP | Alignment-free predictor based on the main physicochemical properties of proteins | One protein sequence | A statement of probable allergen or probable non-allergen, a link to the nearest protein in the UniProtKB database | NA | Webserver | 2013 | [269]
Allergenicity | AllerTOP v.2 | Alignment-free method using amino acid E-descriptors, ACC transformation, and an ML algorithm | One protein sequence | A statement of probable allergen or probable non-allergen, a link to the nearest protein in the UniProtKB database | NA | Webserver | 2014 | [270]
Allergenicity | ProAll-D | Alignment-free model using a long short-term memory algorithm | Protein sequence | A statement of allergen or non-allergen | NA | Webserver, source code | 2022 | [271]
Allergenicity | Kumar et al | Alignment-free ensemble approach using three DL models | Protein sequence | The class label (allergen or not) of the query protein | NA | Python programs | 2023 | [272]
Allergenicity | AlgPred | Hybrid approach; contains (i) scanning of IgE epitopes, (ii) motif-based approach, (iii) SVM-based method using amino acid composition, (iv) SVM module based on dipeptide composition, (v) BLAST search on ARPs, and (vi) hybrid approach | One protein sequence | Comprehensive information about the prediction, including score, threshold, distance from threshold, precision, and negative prediction value | −0.4 (iii), −0.2 (iv) | Webserver | 2006 | [121]
Allergenicity | AlgPred 2.0 | Hybrid approach; integrates four major modules: (i) prediction using hybrid or RF model, (ii) IgE epitope mapping, (iii) motif scan, and (iv) BLAST search | One or more protein sequences | A statement of allergen or non-allergen, predicted score of each method (for (i)), the predicted similar proteins or motifs from the database (for (ii), (iii), (iv)) | (i) Default threshold is 0.3; the user can change the value | Webserver, standalone, source code, Python programs | 2020 | [127]
Allergenicity | AllerCatPro1.7 | Hybrid model based on similarity of both amino acid sequences and 3D structures | One or more protein sequences | A table with the predicted result for allergenicity, the identity scores of similar allergens | NA | Webserver | 2019 | [273]
Allergenicity | AllerCatPro 2.0 | Hybrid model based on similarity of both amino acid sequences and predicted 3D structures; provides clinical relevance information | One or more protein/nucleotide sequences | A table with the predicted result for allergenicity, the identity scores, functionality, and clinical information of similar allergens | NA | Webserver | 2022 | [274]
Toxicity | ToxinPred 1.0 | Model developed for predicting and designing toxic peptides, including several major modules: (i) designing peptide, which provides multiple methods to select from, (ii) batch submission, (iii) protein scanning, (iv) QMS calculator, and (v) motif scanning | One or more peptide/protein sequences | A list with a statement of toxic or non-toxic, result score, and predicted physicochemical properties (for (i), (ii), (iii)), the mutant positions of peptides (for (i), (ii)), a table of original sequence and QM score (for (iv)), a quantitative matrix of positions and QM scores (for (v)) | SVM threshold and E-value cut-off for the motif-based method should be defined by users; the defaults are 0.0 and 10 | Webserver | 2013 | [225]
Toxicity | ToxinPred2.0 | A protein toxicity predictor, including four major modules: (i) prediction using two models, (ii) motif scan, (iii) BLAST search, and (iv) download | One or more protein sequences | A statement of toxic or non-toxic, result scores of each method (for (i)), similar sequence information from the database (for (ii), (iii)) | (i) Default threshold is 0.6, (iii) default E-value is 10e-6; both values can be changed | Webserver, standalone | 2022 | [236]
Toxicity | ToxinPred3.0 | A peptide toxicity predictor, including five major modules: (i) prediction using ET/DL-based models or hybrid approaches, (ii) protein scanning, (iii) motif scan, (iv) BLAST search, and (v) download | One or more protein sequences | A statement of toxic or non-toxic, result scores of each method and precision value (for (i), (ii), (iii)), similar sequence information (for (iv)) | <0.5 means non-toxic for the ET/DL-based model, <0.38 means non-toxic for hybrid approaches, the E-value is 10e-3; the values can also be defined by users | Webserver, standalone, pip package | 2023 | [238]
Toxicity | ToxDL | A multi-modal DL-based approach that can deal with variable-length input sequences | One or more peptide sequences | The predicted score and status, a proclaimed contribution score for each amino acid, and toxic domains detected by InterProScan | >0.5, non-toxic | Webserver, source code | 2021 | [239]
Toxicity | CSM-Toxin | An in silico protein toxicity classifier using a natural language model | One or more protein sequences | A table with predictions for each protein sequence and general physicochemical details | NA | Webserver, source code | 2023 | [240]
Toxicity | ATSE | A peptide toxicity predictor based on a DL model exploiting structural and evolutionary information | Peptide sequences | A toxicity probability of a given peptide | NA | Webserver | 2021 | [241]
Toxicity | ToxIBTL | A peptide and protein toxicity predictor based on a DL framework utilizing the information bottleneck principle and transfer learning | Protein/peptide sequence | A toxicity probability | <0.5, non-toxic | Webserver, source code | 2022 | [242]
Protein-peptide docking | pyDockWEB | Rigid-body docking program using electrostatics and desolvation scoring | 3D structures of two interacting proteins | A gzip-compressed tar archive containing structure files and process files of the generated docking model | NA | Webserver | 2007 | [249]
Protein-peptide docking | ZDOCK | Rigid-body docking program using a combination of shape complementarity, electrostatics, and statistical potential terms for scoring | Two structures to be docked | 3D structures of generated complex models and the center-of-mass positions of ligands, download link of each predicted model | NA | Webserver | 2014 | [248]
Protein-peptide docking | GalaxyPepDock | Template-based docking method based on interaction similarity and energy optimization | Protein structure, peptide sequence | A table containing the best 10 generated models with structure, additional information, and download links | NA | Webserver | 2015 | [192]
Protein-peptide docking | Cluspro | Rigid-body docking program using four scoring schemes | Two protein/peptide 3D structures | The calculated scores and 3D structures of the top 10/20/30 scoring docking models with download link | NA | Webserver | 2020 | [250]
Immune simulation | C-IMMSIM | An agent-based model to simulate the mammalian immune system response at the cellular level after the injection of antigen | One or more protein sequences | Plots of the cell counts of immune-related cells, the predicted outcome of the epitope/peptide | NA | Webserver | 2010 | [257, 260]

Key for feature: ACC: auto cross covariance; ML: machine learning; DL: deep learning; NN: neural network; ET: extra tree; NA: data not given.

In silico prediction of peptide toxicity

Screening for non-toxic peptides is another important step in filtering epitope candidates. Computational methods for predicting chemical toxicity, and in silico tools specialized for toxins of certain animal origins, have been studied extensively over the years [236]. By contrast, technologies for predicting peptide toxicity remain limited, but they still pave the way for identifying non-toxic peptides and reducing the number of experiments [237]. The ToxinPred [225] server, an SVM-based model that uses dipeptide composition and amino acid composition (AAC) as features to distinguish toxic from non-toxic peptides, has been extensively adopted by the scientific community. The upgraded version, ToxinPred2.0 [236], was designed for protein toxicity prediction, which compensates for the peptide length limit of ToxinPred 1.0. The newest version, 3.0 [238], was proposed in 2023 for predicting peptide toxicity and upgrades the algorithm with a new ML model (extra tree-based) or DL model (ANN–LSTM with fixed sequence length); both the ensemble approaches and the individual ML/DL models achieved remarkable AUC values.
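The two feature types mentioned for ToxinPred, amino acid composition and dipeptide composition, are simple to compute. The sketch below shows one common formulation that yields a 20-dimensional and a 400-dimensional vector; expressing the values as percentages and the function names are assumptions for illustration, and the resulting vectors would then be passed to a classifier such as an SVM.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Percentage of each of the 20 standard residues (20 features)."""
    return [100.0 * peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def dipeptide_composition(peptide):
    """Percentage of each ordered residue pair among adjacent pairs (400 features)."""
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    if not pairs:
        raise ValueError("peptide must contain at least two residues")
    return [100.0 * pairs.count(a + b) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]


aac = amino_acid_composition("ACAC")
dpc = dipeptide_composition("ACAC")
print(len(aac), len(dpc))
```

Because both vectors have a fixed length regardless of peptide size, peptides of different lengths can be fed to the same model, at the cost of discarding most positional information.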

DL-based models are attracting increasing attention because of their strong performance and ability to handle complex tasks. Many DL-based in silico methods have been proposed, including ToxDL [239], CSM-Toxin [240], ATSE [241], and ToxIBTL [242], with MLP and CNN being the most used algorithms [243]. However, DL models do not always outperform ML models as expected, possibly because insufficiently large datasets and an unclear mechanistic understanding of the predictions have limited their progress [243, 244]. With the exception of ToxinPred, the above methods have rarely been applied to the construction of epitope-based vaccines because they were developed only recently. As more epitope-based vaccine research emerges, these methods may find wider application.

The potential of binding to immune receptors

Innate immune receptors are important sensors of the innate immune system: they recognize and bind pathogen-associated molecules and then act as molecular switches to trigger innate immune activation and subsequent adaptive immune responses [245, 246]. TLRs are the main immune receptors in mammals [191]; they are widely distributed and can detect a variety of ligands derived from both pathogenic and non-pathogenic microbial infections [245]. Molecular docking of TLRs with vaccine candidates is a pivotal step in validating vaccine effectiveness, since successful docking can be seen as a sign that the vaccine construct has the potential to trigger a human immune response [10]. Several points should be noted when selecting a docking receptor among the ten known TLRs: the receptor should recognize and respond to the pathogen-associated molecular patterns and ligands of the specific pathogen, it should be highly expressed in cells that interact with the pathogen, and its extracellular domain (ectodomain) should be used for ligand binding [187, 247]. Numerous docking tools are available for the computational simulation of binding between vaccine constructs and TLRs; classical tools such as GalaxyPepDock [192], ZDOCK [248], pyDockWEB [249], and Cluspro [250] perform effectively for protein-TLR binding. Meanwhile, a variety of ML/DL algorithms have been introduced into tool development to improve prediction accuracy [251, 252], and the remarkable performance of AlphaFold3 in predicting the joint structure of complexes has revealed the great potential of AI for molecular docking and structure modeling in the future [253].

Immune response simulation

Computational immune simulation is highly important in the in silico design of vaccines: verifying the ability of vaccine candidates to induce an immune response with accurate, computationally inexpensive in silico methods effectively reduces the trial-and-error cost of experimental work. Techniques for modeling the immune system can be divided into two categories: equation-based modeling and agent-based modeling (ABM) [254]. The non-linearities of the immune system make it difficult to obtain the right model with equation-based methods [255]; thus, simulation programs are commonly developed with ABM technology. ABMs observe and describe the characteristics of a population by using simple rules to dictate the behavior and interaction patterns of agents at the individual level [256]. C-IMMSIM is the only available simulation program that has been widely used in the development of multi-epitope vaccines [16]. C-IMMSIM simultaneously simulates three compartments found in mammals: the bone marrow, the thymus, and a tertiary lymphatic organ. The tertiary organ is where the interactions among cells and molecules take place, and it is described geometrically [257]. The program takes the antigen protein sequence in FASTA format as input, administers vaccine injections at user-defined intervals, and outputs the immune response profiles in the human body, including antibody titers and several immune cell populations. The simulation time and the target population can be changed by adjusting the default parameters [258]. The predicted results have been shown to be highly consistent with real-world animal experimental data, which indicates the reliability of the C-IMMSIM server and shows the remarkable applicability of immunoinformatic techniques in vaccine development [259, 260].
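To make the agent-based idea concrete, the toy sketch below applies one simple per-agent rule and lets population-level behavior emerge from it. It is not C-IMMSIM and is far simpler than any real immune simulator: each agent is a hypothetical B cell described only by a random affinity, a cell that binds antigen removes one antigen unit and divides, and all parameter values are arbitrary assumptions.

```python
import random


def simulate(steps=30, n_cells=50, antigen_dose=200, seed=0):
    """Toy agent-based model: B cells clear antigen and proliferate.

    Returns a list of (remaining antigen, number of cells) per time step.
    """
    rng = random.Random(seed)
    # Each agent is represented only by its affinity for the antigen (0..1).
    cells = [rng.random() for _ in range(n_cells)]
    antigen = antigen_dose
    history = []
    for _ in range(steps):
        newborn = []
        for affinity in cells:
            # Rule: binding is more likely for high affinity and abundant antigen.
            if antigen > 0 and rng.random() < affinity * antigen / antigen_dose:
                antigen -= 1              # the bound antigen unit is cleared
                newborn.append(affinity)  # the stimulated cell divides
        cells.extend(newborn)
        history.append((antigen, len(cells)))
    return history


for step, (antigen_left, cell_count) in enumerate(simulate(), start=1):
    print(step, antigen_left, cell_count)
```

Even this minimal rule produces clonal expansion biased toward high-affinity cells together with declining antigen, without any equation describing the population as a whole, which is the property that makes ABM attractive for non-linear immune dynamics.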

Validation of real-world cases

To explore the performance of antigenicity prediction tools in real-world applications, we collected positive vaccine sequences and constructed negative sequences for validation. The positive multi-epitope vaccines, including the SARS-CoV-2 vaccine [183], CVB vaccine [179], MPXV-1-3 [173], LCMV vaccine [175], ChRNV22 [181], MVC [176], and MPXV vaccines [172, 180], were collected from the published literature; all of them showed potential immunogenicity through computational or experimental verification in their original studies. The collection of negative samples is described in the ‘Supplementary Validation of real-world cases’. We selected two widely used and user-friendly tools, VaxiJen [228] and DeepImmuno [261], for real-world vaccine validation. As illustrated in Table 17, the VaxiJen 2.0 score is higher in the vaccine group (average 0.6219) than in the negative control group (average 0.5125), but the difference is not statistically significant (P = .14). Although the average scores of both groups exceed the threshold, the scores of the vaccine group are well above it, with the highest score more than twice the threshold. Thus, although the difference between the negative and positive groups is not statistically significant, VaxiJen tends to assign higher scores to real vaccines. The result may not be representative, since the vaccine sequence dataset is relatively small, which may prevent the tool from effectively distinguishing negative from positive data.

Table 17.

Statistical analysis result of VaxiJen 2.0 tool

Group | VaxiJen 2.0 scores (threshold = 0.4) | Average score | P value
Positive | 0.5308, 0.5076, 0.6319, 0.5517, 0.5940, 0.6323, 0.5923, 0.5606, 0.4391, 1.1081 | 0.6219 | 0.1363
Negative | 0.5687, 0.4694, 0.4855, 0.3673, 0.7175, 0.5662, 0.4920, 0.6097, 0.4210, 0.4273 | 0.5125 |
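The group comparison in Table 17 can be recomputed from the listed scores. The sketch below copies the per-sequence values from the table and computes the group means, a Welch t statistic, and a two-sided permutation P value. The statistical test used for the reported P value is not stated here, so both choices are assumptions, and recomputed numbers may differ somewhat from the reported ones (for example through rounding or a different test).

```python
import math
import random
from statistics import variance

# Scores copied from Table 17.
positive = [0.5308, 0.5076, 0.6319, 0.5517, 0.5940,
            0.6323, 0.5923, 0.5606, 0.4391, 1.1081]
negative = [0.5687, 0.4694, 0.4855, 0.3673, 0.7175,
            0.5662, 0.4920, 0.6097, 0.4210, 0.4273]


def mean(values):
    return sum(values) / len(values)


def welch_t(a, b):
    """Welch's t statistic (unequal variances); assumed test, see lead-in."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def permutation_p(a, b, n_iter=10000, seed=0):
    """Two-sided permutation P value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        difference = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if difference >= observed:
            hits += 1
    return hits / n_iter


print("mean positive:", round(mean(positive), 4))
print("mean negative:", round(mean(negative), 4))
print("Welch t:", round(welch_t(positive), negative) if False else round(welch_t(positive, negative), 3))
print("permutation P:", permutation_p(positive, negative))
```

With only ten sequences per group and one unusually high positive score, a non-parametric check such as the permutation test is a reasonable companion to a t-test, since it does not rely on normally distributed scores.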

Meanwhile, the DeepImmuno results in Table 18 show that the average score of the positive group is higher than that of the negative group for most alleles, with statistical significance for four of the 10 tested HLA alleles (P < .05), indicating that this tool may be able to distinguish effective immune epitopes (positive results) from ineffective ones (negative results). The performance of this tool in recognizing the immunogenicity of CD8+ epitopes reflects the effectiveness and importance of computational tools in vaccine verification, and it provides a useful reference for developing immunogenicity prediction tools for CD4+ epitopes and B-cell epitopes. Given the difference in performance between the two tools, computational tools should be used with caution for antigen screening. Vaccine antigen candidates can be identified efficiently by in silico methods, but false-positive (FP) results may still occur, so screening results should be combined with more reliable experimental methods for a comprehensive evaluation. In addition, both tools provide useful ideas for the development or improvement of future tools, and we look forward to more comprehensive and accurate immunogenicity prediction tools.

Table 18.

Statistical analysis result of DeepImmuno tool

DeepImmuno (threshold = 0.5)
Measure | HLA-A*0201 | HLA-A*1101 | HLA-A*2402 | HLA-B*0702 | HLA-B*0801 | HLA-B*3501 | HLA-B*4001 | HLA-C*0102 | HLA-C*0401 | HLA-C*0702
Average score (positive) | 0.5468 | 0.5387 | 0.4449 | 0.5102 | 0.8133 | 0.7225 | 0.8008 | 0.9558 | 0.8402 | 0.8423
Average score (negative) | 0.5091 | 0.4795 | 0.4191 | 0.4987 | 0.7774 | 0.6778 | 0.7666 | 0.8087 | 0.9379 | 0.8032
P value | 0.1972 | 0.1625 | 0.2097 | 0.6922 | 0.0136* | 0.0381* | 0.0565 | 3.669E-18* | 8.071E-12* | 0.0364*
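The per-allele summary in Table 18 can be tallied programmatically. The sketch below copies the averages and P values from the table and counts the alleles with P < .05, separating those where the positive group scores higher from those where it scores lower; the variable names are illustrative.

```python
# Values copied from Table 18.
alleles = ["HLA-A*0201", "HLA-A*1101", "HLA-A*2402", "HLA-B*0702", "HLA-B*0801",
           "HLA-B*3501", "HLA-B*4001", "HLA-C*0102", "HLA-C*0401", "HLA-C*0702"]
positive_mean = [0.5468, 0.5387, 0.4449, 0.5102, 0.8133,
                 0.7225, 0.8008, 0.9558, 0.8402, 0.8423]
negative_mean = [0.5091, 0.4795, 0.4191, 0.4987, 0.7774,
                 0.6778, 0.7666, 0.8087, 0.9379, 0.8032]
p_values = [0.1972, 0.1625, 0.2097, 0.6922, 0.0136,
            0.0381, 0.0565, 3.669e-18, 8.071e-12, 0.0364]

ALPHA = 0.05
rows = list(zip(alleles, positive_mean, negative_mean, p_values))

# Alleles where the positive group has the higher average score.
positive_higher = [a for a, pos, neg, _ in rows if pos > neg]
# Alleles with a significant difference in either direction.
significant = [a for a, _, _, p in rows if p < ALPHA]
# Significant alleles where the positive group scores higher.
significant_positive_higher = [a for a, pos, neg, p in rows
                               if p < ALPHA and pos > neg]

print(len(positive_higher), len(significant), len(significant_positive_higher))
print(sorted(set(significant) - set(significant_positive_higher)))
```

With the Table 18 values, the positive group has the higher average for nine of the 10 alleles, and five alleles have P < .05; four of those five favor the positive group, while HLA-C*0401 is significant in the opposite direction.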

The role of AI in the development of vaccines

The application of AI technology has greatly advanced vaccine development, and both ML and DL techniques are widely used to develop computational tools. AI is becoming increasingly relevant for epitope prediction: DL-based methods, including DLBEpitope [128], BigMHC [88], MHCflurry [84], and other epitope predictors, have greatly improved the efficiency of epitope screening, and tools such as fine-tuned AlphaFold2 and MHCfold [94] can accurately model pMHC complexes.

Besides the above epitope predictors, AI has also promoted the construction of pipelines for the design and screening of immunogenic antigens for vaccines. NeoDisc [275] is an end-to-end clinical proteogenomic pipeline that integrates various in silico tools for the identification, prediction, and prioritization of immunogenic tumor-specific HLA-I and -II antigens. DeepNovoAA [276] and pTuneos [277] are also DL-based pipelines developed for the identification and design of neoantigens. Neo-intline [278] is an integrated pipeline that simulates the presentation process of peptides in vivo. TransPHLA-AOMP [86] is a transformer-based model from the same team as TransPHLA, which is used for pHLA binding prediction; based on the predictions of TransPHLA, the AOMP program can automatically optimize mutated peptides for peptide vaccine design. These tools demonstrate that AI technology can greatly promote progress in antigen identification and design and show its potential for personalized antigen discovery and neoantigen cancer vaccine design.

Moreover, AI plays an important role in vaccine optimization. The LinearDesign [221] tool introduced the classical concept of lattice parsing from computational linguistics to address the otherwise intractable computational challenge posed by codon usage, as well as the limited applicability of mRNA vaccines caused by mRNA instability and degradation. Applying this tool can shorten the development cycle and reduce the cost of vaccine development. In the computational verification of vaccine candidates, AI-based tools such as DeepImmuno [261], ProAll-D [271], ToxinPred3.0 [238], CSM-Toxin [240], ATSE [241], and others cover antigenicity, allergenicity, toxicity, and other aspects of validation.

Beyond design based on currently circulating pathogens or mutations, the development of the EveScape [279] tool has also demonstrated the potential of AI for forecasting future emerging mutant strains. EveScape is a flexible framework that can quantify the viral escape potential of mutations at scale and predict probable future mutations. Based on forecasts of emerging viruses with pandemic potential, we can assess the ability of existing vaccines to protect against future viruses and improve or design new vaccines or drugs to achieve early prevention and control. The above tools cover various aspects of vaccine design and demonstrate that AI technology has greatly improved it: AI-based tools have simplified the steps, accelerated the process, and improved the accuracy and efficiency of vaccine design. However, the requirement for large computing resources and long run times [280] and the need for large amounts of high-quality training data [281] have limited the development and application of DL-based tools to some extent. The lack of interpretability of prediction results is another main challenge [282]. Thus, to promote the development of high-performing DL-based methods, further work on high-quality data and on understanding the mechanisms of these tools is needed.

Discussion and perspective

Revolutionary technological advances over the past few decades have greatly changed the form of vaccine development. Effective and reliable bioinformatic tools have replaced much tedious and time-consuming experimental work, which greatly broadens the application and shortens the development cycle of vaccines. Since the advent of the genome era, integrating computational methods with biological knowledge has become an inevitable trend in the development of medical interventions. The multi-epitope vaccine is an attractive new vaccine type: rational design promises long-term cross-protection and a controlled, robust immune response, and could also reduce side effects to ensure the safety of vaccines in humans. The rational design of multi-epitope vaccines largely depends on existing databases and immunoinformatics tools, and mainly involves three aspects: 1) immunogen design, 2) vaccine construction and optimization, and 3) computational verification. In this review, we systematically reviewed the pipeline for computationally designing a multi-epitope vaccine, and introduced the current development strategies together with the commonly used in silico resources, including databases and tools, for each step. Further, we designed three benchmark validations on T-cell epitope predictors, B-cell epitope predictors, and immunogenicity predictors for real-world vaccine sequences. The results of the benchmark validation can provide hints to users for in-silico vaccine design and optimization.

The development process of multi-epitope vaccines is highly procedural and data-driven [283]. High-quality databases, effective computational tools, and standard procedures will greatly promote the development of vaccines. The selection of suitable data is the foundation of further analysis and determines the accuracy and reliability of the result. Currently, related databases are well established, and most of them contain high-quality experimentally derived or manually curated data, such as IEDB [20] and AntiJen v2.0 [24]. Meanwhile, there are multiple pan-epitope databases, but only a few for specific pathogens. For example, the numbers of HBV and influenza peptides in the IEDB database are relatively small, which is insufficient to train pathogen-specific models. This largely reflects the pace at which experimental data accumulate and the rate at which each database is updated. The update frequency also affects data quality: for example, the IMGT/mAb-DB database is updated twice per year [22], the InnateDB database updates its annotation weekly [43], and STCRDab is automatically updated weekly [35]. Up-to-date data help researchers keep track of the latest developments. Besides, differences in data formats and annotation standards increase the complexity of data processing. In terms of usability, factors such as whether operation is simple, whether query tools and auxiliary tools are provided, whether related databases are integrated, and whether access is free all affect how efficiently users can work. These gaps in databases may be bridged through the development of stricter, standardized criteria for data processing and the application of advanced technology. This is one of the important directions for future development.

Data can be considered the source of vaccine design, and appropriate bioinformatics methods are the tools that influence the success rate. Although relevant methods have been introduced continuously over the last decades, with several successful cases of in-silico vaccine design, the overall success rate is still not particularly satisfactory. This is partly because the performance of models still needs to be improved, but more importantly because computational tools designed for binary classification or regression tasks cannot capture the entire complex biological process, especially the response of the immune system to vaccines. Computational design encompasses the screening of antigens, the prediction of epitopes, the optimization of the vaccine construct, computational verification, and so on, and therefore involves many tools. The selection of suitable tools is also a challenge. To standardize and simplify the overall process of vaccine design, the development of AI-based pipelines that integrate various high-performing tools is another direction for future study. T-cell epitope predictors have been developed mainly with a focus on the mechanisms of processing and presentation, but many peptides that can be processed and presented are still not immunogenic [284]. Thus, we should clarify the mechanisms that make a peptide an epitope and incorporate other aspects, such as the binding stability of the peptide-HLA complex, the secretion of specific cytokines, and pMHC-TCR binding, to develop new epitope predictors.
Currently, tools for predicting epitopes are mainly sequence-based. However, sequences are essentially a one-dimensional abstraction that often fails to capture higher-dimensional information [16]. Meanwhile, time-consuming experiments have limited the accumulation of high-quality 3D structures, so the development of structure-based methods lags behind that of sequence-based methods. The advent of AlphaFold, which has exhibited strong performance in structure modeling, may accelerate the prediction and application of conformational epitopes, since accurate computational models could greatly reduce the need for experimentally determined structures. Several tools based on AlphaFold have been developed [95, 98], but their number is still low, and more attention should be paid to how AlphaFold can be applied to epitope-based vaccine development.

Despite the great advantages of currently available tools, there is still room for improvement to guide future vaccine design. Current state-of-the-art (SOTA) methods can be divided into webserver-based tools and GitHub code-based tools according to how users access them. New tools such as MixMHCpred [90], BigMHC [88], and ConvNeXt-MHC [91] can only be run from GitHub code rather than through a user-friendly webserver. Moreover, those tools require substantial computational resources, which may not be convenient for researchers without sufficient background knowledge and resources. In this regard, webserver-based tools such as ImmuneApp [92], NetMHCpan [75], HLAthena [80], and TransPHLA [86] are friendlier for researchers. However, webserver-based tools limit the size of each submission. For example, ImmuneApp, NetMHCpan-4.1, and HLAthena accept only a limited number of HLA alleles and peptides per submission, which makes large-scale screening difficult.
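One practical workaround for submission limits is to enumerate candidate peptides locally and split them into batches that fit a given server. The following minimal Python sketch illustrates the idea. The peptide lengths (8 to 11 residues), the function names, and the batch size of 20 are illustrative assumptions, not limits or interfaces documented by any of the servers named above; the actual limits should be checked for each tool.

```python
from typing import Iterator, List, Sequence


def sliding_peptides(protein: str, lengths: Sequence[int] = (8, 9, 10, 11)) -> List[str]:
    """Enumerate unique overlapping peptides of the given lengths.

    Peptides are returned in order of first appearance; duplicates
    (e.g. from repeated regions) are reported only once.
    """
    seen = set()
    peptides = []
    for k in lengths:
        # range() is empty when the protein is shorter than k
        for start in range(len(protein) - k + 1):
            pep = protein[start:start + k]
            if pep not in seen:
                seen.add(pep)
                peptides.append(pep)
    return peptides


def batches(items: List[str], max_per_job: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most max_per_job items.

    max_per_job is a hypothetical per-submission limit chosen by the user.
    """
    if max_per_job < 1:
        raise ValueError("max_per_job must be at least 1")
    for i in range(0, len(items), max_per_job):
        yield items[i:i + max_per_job]


if __name__ == "__main__":
    # Toy 20-residue sequence used only for illustration
    toy_protein = "ACDEFGHIKLMNPQRSTVWY"
    candidates = sliding_peptides(toy_protein)
    for n, job in enumerate(batches(candidates, max_per_job=20), start=1):
        print(f"job {n}: {len(job)} peptides")
```

Each batch can then be submitted separately and the results concatenated, at the cost of more submissions; for proteome-scale screening, locally installed tools are likely to remain more practical.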

More importantly, besides tools for predicting the interactions between specific molecules, such as peptide–MHC, pMHC-TCR, and epitope-BCR, there is an urgent need to develop integrated epitope prediction processes that can close the gap between the high prediction performance of in-silico models and their low rate of clinical success. In the field of tumor neoantigen vaccine design, pipelines rather than individual tools have begun to accumulate in recent years, which helps to accelerate the discovery of neoantigens. Typical works, including Muller’s approach [285] and neo-intline [278], incorporate multiple or even dozens of different in vivo processing steps into a pipeline that ultimately allows suitable neoantigens to be screened directly. This idea can also be applied to the development of in-silico approaches for pathogen-related vaccine design in the future.
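The pipeline idea can be sketched as a chain of filters applied in sequence, with the number of surviving candidates recorded after each step. The Python example below is a generic illustration under stated assumptions and does not reproduce the method of any cited pipeline: the score names (`el_rank`, `immunogenicity`, `toxicity`), the thresholds, the peptide labels, and all score values are invented placeholders standing in for the outputs of whichever predictors a user chooses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    peptide: str
    scores: Dict[str, float]  # precomputed outputs of external predictors


# A filter step is a (name, predicate) pair; the predicate returns True to keep
FilterStep = Tuple[str, Callable[[Candidate], bool]]


def run_pipeline(
    candidates: List[Candidate], steps: List[FilterStep]
) -> Tuple[List[Candidate], List[Tuple[str, int]]]:
    """Apply each filter in order and record how many candidates survive."""
    survivors = list(candidates)
    report = []
    for name, keep in steps:
        survivors = [c for c in survivors if keep(c)]
        report.append((name, len(survivors)))
    return survivors, report


# Hypothetical thresholds, for illustration only
steps: List[FilterStep] = [
    ("presentation", lambda c: c.scores["el_rank"] <= 2.0),
    ("immunogenicity", lambda c: c.scores["immunogenicity"] >= 0.5),
    ("toxicity", lambda c: c.scores["toxicity"] < 0.5),
]

# Invented candidates and scores, not real prediction results
demo = [
    Candidate("PEP1", {"el_rank": 0.3, "immunogenicity": 0.8, "toxicity": 0.1}),
    Candidate("PEP2", {"el_rank": 0.5, "immunogenicity": 0.2, "toxicity": 0.1}),
    Candidate("PEP3", {"el_rank": 10.0, "immunogenicity": 0.9, "toxicity": 0.1}),
    Candidate("PEP4", {"el_rank": 1.0, "immunogenicity": 0.7, "toxicity": 0.9}),
]

selected, report = run_pipeline(demo, steps)
```

Keeping the per-step counts makes it easy to see which stage removes most candidates, which is useful when deciding where a threshold may be too strict.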

Secondly, for the selection of linkers and adjuvants during the construction of multi-epitope vaccines, only a few information resources are available, and design tools for vaccine adjuvants are still needed. Because current selections usually rely on previously published studies, a literature survey summarizing the scope of application, advantages, and disadvantages of commonly used adjuvants could provide comprehensive information for future vaccine design and selection. Nanotechnology can offer a self-adjuvanting delivery system and reduce the toxicity of the experimental adjuvants currently under study, so delivering antigens in nanoparticle form is another area worth exploring [7]. Since advanced technology has revolutionized vaccine development, many studies on the computational design of vaccines are published every year, but only a small portion of the large number of designed vaccines has real application potential and is worth validating experimentally. An in-silico evaluation platform should therefore be constructed and used as the final step to assess the effectiveness of designed vaccines. In addition to the computational verification factors discussed above, other factors such as glycosylation sites and physicochemical properties have also been examined in some studies. Therefore, constructing a complete and rigorous standard evaluation process could ensure the feasibility and efficiency of vaccine evaluation. This review has systematically summarized the strategy for the rational in silico design of multi-epitope vaccines and introduced the widely used bioinformatics tools and databases on the basis of the relevant literature. We hope this review helps researchers identify the basic steps of developing a multi-epitope vaccine and quickly select suitable tools for each step, and that it provides some help for future research.

Key Points

  • The multi-epitope vaccine is a promising prophylactic and therapeutic strategy against pathogen infection, and the pipeline for computationally designing a multi-epitope vaccine can be divided into four critical steps.

  • The current development strategies and the commonly used in silico resources, including databases and tools for each step, have been summarized, which can help researchers quickly grasp the basic points of developing multi-epitope vaccines and the available resources.

  • The tools used in each process have been described in terms of claimed performance, features, datasets, and input/output information, which can be used as a reference for researchers selecting suitable tools for each step of designing a multi-epitope vaccine.

  • This study provides three benchmark validations on T-cell epitope predictors, B-cell epitope predictors, and immunogenicity predictors for real-world vaccine sequences; the results can provide hints to users for in-silico vaccine design and optimization.

Supplementary Material

bbaf055-Supplementary
bbaf055-supplementary.docx (118.1KB, docx)

Acknowledgements

This work is supported by the Medical Science Data Center of Fudan University.

Contributor Information

Yiwen Wei, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.

Tianyi Qiu, Institute of Clinical Science, Zhongshan Hospital; Intelligent Medicine Institute; Shanghai Institute of Infectious Disease and Biosecurity, Shanghai Medical College, Fudan University, No. 180, Fenglin Road, Xuhui District, Shanghai 200032, China.

Yisi Ai, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.

Yuxi Zhang, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.

Junting Xie, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.

Dong Zhang, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.

Xiaochuan Luo, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.

Xiulan Sun, State Key Laboratory of Food Science and Technology, School of Food Science and Technology, National Engineering Research Center for Functional Foods, Synergetic Innovation Center of Food Safety and Nutrition, Jiangnan University, Lihu Avenue 1800, Wuxi, Jiangsu 214122, China.

Xin Wang, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China; Shanghai Collaborative Innovation Center of Energy Therapy for Tumors, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.

Jingxuan Qiu, School of Health Science and Engineering, University of Shanghai for Science and Technology, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China; Shanghai Collaborative Innovation Center of Energy Therapy for Tumors, No. 334, Jungong Road, Yangpu District, Shanghai 200093, China.

Authors’ contributions

YWW: Conceptualization, Writing—original draft, Investigation, Formal analysis, Data curation. TYQ: Conceptualization, Writing—review & editing, Project administration, Supervision, Resources. YSA, YXZ, JTX, DZ, and XCL: Resources, Data curation, Validation. XLS and XW: Writing—review & editing, Supervision. JXQ: Writing—review & editing, Resources, Conceptualization.

Conflict of interest: None declared.

Funding

This work was supported by grants from the National Key Research and Development Program of China (2022YFF1103101) and the National Natural Science Foundation of China (32370697).

Data availability

For access to any research-related data, kindly reach out to the corresponding author.

References

  • 1. Zimmermann  P, Curtis  N. Factors that influence the immune response to vaccination. Clin Microbiol Rev  2019;32:e00084–18. 10.1128/cmr.00084-18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. van der  Kooij  RS, Steendam  R, Zuidema  J. et al.  Microfluidic production of polymeric Core-Shell microspheres for the delayed pulsatile release of bovine serum albumin as a model antigen. Pharmaceutics  2021;13:1854. 10.3390/pharmaceutics13111854 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Delany  I, Rappuoli  R, De Gregorio  E. Vaccines for the 21st century. EMBO Mol Med  2014;6:708–20. 10.1002/emmm.201403876 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Dolgin  E. How personalized cancer vaccines could keep tumours from coming back. NewsFeature. Nature  2024;630:290–2. 10.1038/d41586-024-01717-x [DOI] [PubMed] [Google Scholar]
  • 5. Pollard  AJ, Bijker  EM. A guide to vaccinology: from basic principles to new developments. Nat Rev Immunol  2020;21:83–100. 10.1038/s41577-020-00479-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Excler  J-L, Saville  M, Berkley  S. et al.  Vaccine development for emerging infectious diseases. Nat Med  2021;27:591–600. 10.1038/s41591-021-01301-0 [DOI] [PubMed] [Google Scholar]
  • 7. Skwarczynski  M, Toth  I. Peptide-based synthetic vaccines. Chem Sci  2016;7:842–54. 10.1039/C5SC03892H [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Kalita  P, Tripathi  T. Methodological advances in the design of peptide-based vaccines. Drug Discov Today  2022;27:1367–80. 10.1016/j.drudis.2022.03.004 [DOI] [PubMed] [Google Scholar]
  • 9. Bahrami  AA, Payandeh  Z, Khalili  S. et al.  Immunoinformatics: in silico approaches and computational design of a multi-epitope, immunogenic protein. Int Rev Immunol  2019;38:307–22. 10.1080/08830185.2019.1657426 [DOI] [PubMed] [Google Scholar]
  • 10. Yurina  V, Adianingsih  OR. Predicting epitopes for vaccine development using bioinformatics tools. Therapeutic Advances in Vaccines and Immunotherapy  2022;10:25151355221100218. 10.1177/25151355221100218 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Zhang  L. Multi-epitope vaccines: a promising strategy against tumors and viral infections. Cell Mol Immunol  2018;15:182–4. 10.1038/cmi.2017.92 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Parvizpour  S, Pourseif  MM, Razmara  J. et al.  Epitope-based vaccine design: a comprehensive overview of bioinformatics approaches. Drug Discov Today  2020;25:1034–42. 10.1016/j.drudis.2020.03.006 [DOI] [PubMed] [Google Scholar]
  • 13. Cai  X, Li  JJ, Liu  T. et al.  Infectious disease mRNA vaccines and a review on epitope prediction for vaccine design. Brief Funct Genomics  2021;20:289–303. 10.1093/bfgp/elab027 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Rappuoli  R. Reverse vaccinology. Curr Opin Microbiol  2000;3:445–50. 10.1016/S1369-5274(00)00119-3 [DOI] [PubMed] [Google Scholar]
  • 15. Rappuoli  R, Bottomley  MJ, D’Oro  U. et al.  Reverse vaccinology 2.0: human immunology instructs vaccine antigen design. Journal of Experimental Medicine  2016;213:469–81. 10.1084/jem.20151960 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Goodswen  SJ, Kennedy  PJ, Ellis  JT. A guide to current methodology and usage of reverse vaccinology towards in silico vaccine discovery. FEMS Microbiol Rev  2023;47:fuad004. 10.1093/femsre/fuad004 [DOI] [PubMed] [Google Scholar]
  • 17. Woolums  AR, Swiderski  C. New approaches to vaccinology made possible by advances in next generation sequencing, bioinformatics and protein modeling. Current Issues in Molecular Biology  2021;42:605–34. 10.21775/cimb.042.605 [DOI] [PubMed] [Google Scholar]
  • 18. Hegde  NR, Gauthami  S, Sampath Kumar  HM. et al.  The use of databases, data mining and immunoinformatics in vaccinology: Where are we?  Expert Opin Drug Discovery  2017;13:117–30. 10.1080/17460441.2018.1413088 [DOI] [PubMed] [Google Scholar]
  • 19. He  Y, Racz  R, Sayers  S. et al.  Updates on the web-based VIOLIN vaccine database and analysis system. Nucleic Acids Res  2014;42:D1124–32. 10.1093/nar/gkt1133 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Vita  R, Mahajan  S, Overton  JA. et al.  The immune epitope database (IEDB): 2018 update. Nucleic Acids Res  2019;47:D339–43. 10.1093/nar/gky1006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Koşaloğlu-Yalçın  Z, Blazeska  N, Vita  R. et al.  The cancer epitope database and analysis resource (CEDAR). Nucleic Acids Res  2023;51:D845–52. 10.1093/nar/gkac902 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Manso  T, Folch  G, Giudicelli  V. et al.  IMGT® databases, related tools and web resources through three main axes of research and development. Nucleic Acids Res  2022;50:D1262–72. 10.1093/nar/gkab1136 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Ansari  HR, Flower  DR, Raghava  GPS. AntigenDB: an immunoinformatics database of pathogen antigens. Nucleic Acids Res  2010;38:D847–53. 10.1093/nar/gkp830 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Toseland  CP, Clayton  DJ, McSparron  H. et al.  AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data. Immunome Research  2005;1:4. 10.1186/1745-7580-1-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Zhang  W, Wang  L, Liu  K. et al.  PIRD: Pan immune repertoire database. Bioinformatics  2020;36:897–903. 10.1093/bioinformatics/btz614 [DOI] [PubMed] [Google Scholar]
  • 26. Kulkarni-Kale  U, Raskar-Renuse  S, Natekar-Kalantre  G. et al.  Antigen-Antibody Interaction Database (AgAbDb): a compendium of antigen-antibody interactions. In: De RK, Tomar N (eds.), Immunoinformatics. Methods Mol Biol Humana Press, New York, NY, 2014;1184:149–64. 10.1007/978-1-4939-1115-8_8 [DOI] [PubMed] [Google Scholar]
  • 27. Saha  S, Bhasin  M, Raghava  GPS. Bcipep: a database of B-cell epitopes. BMC Genomics  2005;6:79. 10.1186/1471-2164-6-79 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Reche  PA, Zhang  H, Glutting  J-P. et al.  EPIMHC: a curated database of MHC-binding peptides for customized computational vaccinology. Bioinformatics  2005;21:2140–1. 10.1093/bioinformatics/bti269 [DOI] [PubMed] [Google Scholar]
  • 29. Zhang  G, Chitkushev  L, Olsen  LR. et al.  TANTIGEN 2.0: a knowledge base of tumor T cell antigens and epitopes. BMC bioinformatics  2021;22:40. 10.1186/s12859-021-03962-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Yang  B, Sayers  S, Xiang  Z. et al.  Protegen: a web-based protective antigen database and analysis system. Nucleic Acids Res  2010;39:D1073–8. 10.1093/nar/gkq944 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Schlessinger  A, Ofran  Y, Yachdav  G. et al.  Epitome: database of structure-inferred antigenic epitopes. Nucleic Acids Res  2006;34:D777–80. 10.1093/nar/gkj053 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Huang  J, Honda  W. CED: a conformational epitope database. BMC Immunol  2006;7:7. 10.1186/1471-2172-7-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Sharma  OP, Das  AA, Krishna  R. et al.  Structural epitope database (SEDB): a web-based database for the epitope, and its intermolecular interaction along with the tertiary structure information. Journal of Proteomics & Bioinformatics  2012;5:1–6. 10.4172/jpb.1000217 [DOI] [Google Scholar]
  • 34. Borrman  T, Cimons  J, Cosiano  M. et al.  ATLAS: a database linking binding affinities with structures for wild-type and mutant TCR-pMHC complexes. Proteins: Structure, Function, and Bioinformatics  2017;85:908–16. 10.1002/prot.25260 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Leem  J, de  Oliveira  SHP, Krawczyk  K. et al.  STCRDab: the structural T-cell receptor database. Nucleic Acids Res  2018;46:D406–12. 10.1093/nar/gkx971 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Chen  S-Y, Yue  T, Lei  Q. et al.  TCRdb: a comprehensive database for T-cell receptor sequences with powerful search function. Nucleic Acids Res  2020;49:D468–74. 10.1093/nar/gkaa796 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Rammensee  H, Bachmann  J, Emmerich  NP. et al.  SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics  1999;50:213–9. 10.1007/s002510050595 [DOI] [PubMed] [Google Scholar]
  • 38. Tickotsky  N, Sagiv  T, Prilusky  J. et al.  McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics  2017;33:2924–9. 10.1093/bioinformatics/btx286 [DOI] [PubMed] [Google Scholar]
  • 39. Shugay  M, Bagaev  DV, Zvyagin  IV. et al.  VDJdb: a curated database of T-cell receptor sequences with known antigen specificity. Nucleic Acids Res  2018;46:D419–27. 10.1093/nar/gkx760 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Khan  JM, Cheruku  HR, Tong  JC. et al.  MPID-T2: a database for sequence–structure–function analyses of pMHC and TR/pMHC structures. Bioinformatics  2011;27:1192–3. 10.1093/bioinformatics/btr104 [DOI] [PubMed] [Google Scholar]
  • 41. Bhasin  M, Singh  H, Raghava  GPS. MHCBN: a comprehensive database of MHC binding and non-binding peptides. Bioinformatics  2003;19:665–6. 10.1093/bioinformatics/btg055 [DOI] [PubMed] [Google Scholar]
  • 42. Kaur  D, Patiyal  S, Sharma  N. et al.  PRRDB 2.0: a comprehensive database of pattern-recognition receptors and their ligands. Database  2019;2019:baz076. 10.1093/database/baz076 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Breuer  K, Foroushani  AK, Laird  MR. et al.  InnateDB: systems biology of innate immunity and beyond—recent updates and continuing curation. Nucleic Acids Res  2013;41:D1228–33. 10.1093/nar/gks1147 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Gonzalez-Galarza Faviel  F, McCabe  A, Santos Eduardo  JM. et al.  Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools. Nucleic Acids Res  2020;48:D783–8. 10.1093/nar/gkz1029 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Luo  H, Lin  Y, Liu  T. et al.  DEG 15, an update of the database of essential genes that includes built-in analysis tools. Nucleic Acids Res  2021;49:D677–86. 10.1093/nar/gkaa917 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Pickett  BE, Sadat  EL, Zhang  Y. et al.  ViPR: an open bioinformatics database and analysis resource for virology research. Nucleic Acids Res  2012;40:D593–8. 10.1093/nar/gkr859 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Chaudhuri  R, Ansari  FA, Raghunandanan  MV. et al.  FungalRV: adhesin prediction and immunoinformatics portal for human fungal pathogens. BMC Genomics  2011;12:192. 10.1186/1471-2164-12-192 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Zhang  GL, Riemer  AB, Keskin  DB. et al.  HPVdb: a data mining system for knowledge discovery in human papillomavirus with applications in T cell immunology and vaccinology. Database  2014;2014:bau031. 10.1093/database/bau031 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Xie  R, Cao  B, Wu  Z. et al.  dbEBV: a database of Epstein-Barr virus variants and their correlations with human health. Comput Struct Biotechnol J  2024;23:2076–82. 10.1016/j.csbj.2024.04.043 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Zhang  GL, Chitkushev  L, Keskin  DB. et al.  EBVdb: a data mining system for knowledge discovery in Epstein-Barr virus with applications in T cell immunology and vaccinology. In: 2015 International Workshop on Artificial Immune Systems (AIS)  2015;1–8. 10.1109/AISW.2015.7469232 [DOI] [Google Scholar]
  • 51. Olsen  LR, Zhang  GL, Reinherz  EL. et al.  FLAVIdB: a data mining system for knowledge discovery in flaviviruses with direct applications in immunology and vaccinology. Immunome research  2011;7:2. [PMC free article] [PubMed] [Google Scholar]
  • 52. Simon  C, Kudahl  UJ, Sun  J. et al.  FluKB: a knowledge-based system for influenza vaccine target discovery and analysis of the immunological properties of influenza viruses. J Immunol Res  2015;2015:380975. 10.1155/2015/380975 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Hulo  C, de  Castro  E, Masson  P. et al.  ViralZone: a knowledge resource to understand virus diversity. Nucleic Acids Res  2011;39:D576–82. 10.1093/nar/gkq901 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Ivanciuc  O, Schein  CH, Braun  W. SDAP: database and computational tools for allergenic proteins. Nucleic Acids Res  2003;31:359–62. 10.1093/nar/gkg010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Sánchez-Trincado  JL, Gomez-Perosanz  M, Reche  PA. Fundamentals and methods for T- and B-cell epitope prediction. J Immunol Res  2017;2017:2680160. 10.1155/2017/2680160 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Bhasin  M, Raghava  GPS. Pcleavage: an SVM based method for prediction of constitutive proteasome and immunoproteasome cleavage sites in antigenic sequences. Nucleic Acids Res  2005;33:W202–7. 10.1093/nar/gki587 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Lam  TH, Mamitsuka  H, Ren  EC. et al.  TAP hunter: a SVM-based system for predicting TAP ligands using local description of amino acid sequence. Immunome Research  2010;6:S6. 10.1186/1745-7580-6-S1-S6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Keşmir  C, Nussbaum  AK, Schild  H. et al.  Prediction of proteasome cleavage motifs by neural networks. Protein Engineering, Design and Selection  2002;15:287–96. 10.1093/protein/15.4.287 [DOI] [PubMed] [Google Scholar]
  • 59. Bhasin  M, Raghava  GPS. Analysis and prediction of affinity of TAP binding peptides using cascade SVM. Protein Sci  2004;13:596–607. 10.1110/ps.03373104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Singh  H, Raghava  GPS. ProPred1: prediction of promiscuous MHC class-I binding sites. Bioinformatics (Oxford, England)  2003;19:1009–14. 10.1093/bioinformatics/btg108 [DOI] [PubMed] [Google Scholar]
  • 61. Reche  PA, Glutting  J-P, Zhang  H. et al.  Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles. Immunogenetics  2004;56:405–19. 10.1007/s00251-004-0709-7 [DOI] [PubMed] [Google Scholar]
  • 62. Nielsen  M, Lundegaard  C, Lund  O. et al.  The role of the proteasome in generating cytotoxic T-cell epitopes: insights obtained from improved predictions of proteasomal cleavage. Immunogenetics  2005;57:33–41. 10.1007/s00251-005-0781-7 [DOI] [PubMed] [Google Scholar]
  • 63. Larsen  MV, Lundegaard  C, Lamberth  K. et al.  Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics  2007;8:424. 10.1186/1471-2105-8-424 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Diez-Rivero  CM, Chenlo  B, Zuluaga  P. et al.  Quantitative modeling of peptide binding to TAP using support vector machine. Proteins: Structure, Function, and Bioinformatics  2010;78:63–72. 10.1002/prot.22535 [DOI] [PubMed] [Google Scholar]
  • 65. Parker  KC, Bednarek  MA, Coligan  JE. Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. The Journal of Immunology  1994;152:163–75. 10.4049/jimmunol.152.1.163 [DOI] [PubMed] [Google Scholar]
  • 66. D'Amaro  J, Houbiers  JGA, Drijfhout  JW. et al.  A computer program for predicting possible cytotoxic T lymphocyte epitopes based on HLA class I peptide-binding motifs. Hum Immunol  1995;43:13–8. 10.1016/0198-8859(94)00153-H [DOI] [PubMed] [Google Scholar]
  • 67. Sturniolo  T, Bono  E, Ding  J. et al.  Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat Biotechnol  1999;17:555–61. 10.1038/9858 [DOI] [PubMed] [Google Scholar]
  • 68. Dönnes  P, Elofsson  A. Prediction of MHC class I binding peptides, using SVMHC. BMC Bioinformatics  2002;3:25. 10.1186/1471-2105-3-25 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Dönnes  P, Kohlbacher  O. SVMHC: a server for prediction of MHC-binding peptides. Nucleic Acids Res  2006;34:W194–7. 10.1093/nar/gkl284 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Udaka  K, Mamitsuka  H, Nakaseko  Y. et al.  Prediction of MHC class I binding peptides by a query learning algorithm based on hidden Markov models. Journal of Biological Physics  2002;28:183–94. 10.1023/A:1019931731519 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Buus  S, Lauemøller  SL, Worning  P. et al.  Sensitive quantitative predictions of peptide-MHC binding by a ‘query by committee’ artificial neural network approach. Tissue Antigens  2003;62:378–84. 10.1034/j.1399-0039.2003.00112.x [DOI] [PubMed] [Google Scholar]
  • 72. Reche  PA, Reinherz  EL. PEPVAC: a web server for multi-epitope vaccine development based on the prediction of supertypic MHC ligands. Nucleic Acids Res  2005;33:W138–42. 10.1093/nar/gki357 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Zhang  GL, Khan  AM, Srinivasan  KN. et al.  MULTIPRED: a computational system for prediction of promiscuous HLA binding peptides. Nucleic Acids Res  2005;33:W172–9. 10.1093/nar/gki452 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74. Nielsen  M, Lundegaard  C, Lund  O. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics  2007;8:238. 10.1186/1471-2105-8-238 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75. Nielsen  M, Lundegaard  C, Blicher  T. et al.  NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence. PloS One  2007;2:e796. 10.1371/journal.pone.0000796 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76. Nielsen  M, Lundegaard  C, Blicher  T. et al.  Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan. PLoS Comput Biol  2008;4:e1000107. 10.1371/journal.pcbi.1000107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77. Zhang  GL, DeLuca  DS, Keskin  DB. et al.  MULTIPRED2: a computational system for large-scale identification of peptides predicted to bind to HLA supertypes and alleles. J Immunol Methods  2011;374:53–61. 10.1016/j.jim.2010.11.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78. Andreatta  M, Nielsen  M. Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics (Oxford, England)  2016;32:511–7. 10.1093/bioinformatics/btv639 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79. Bassani-Sternberg  M, Chong  C, Guillaume  P. et al.  Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity. PLoS Comput Biol  2017;13:e1005725. 10.1371/journal.pcbi.1005725 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80. Sarkizova  S, Klaeger  S, Le  PM. et al.  A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nat Biotechnol  2019;38:199–209. 10.1038/s41587-019-0322-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81. Racle  J, Michaux  J, Rockinger  GA. et al.  Robust prediction of HLA class II epitopes by deep motif deconvolution of immunopeptidomes. Nat Biotechnol  2019;37:1283–6. 10.1038/s41587-019-0289-6 [DOI] [PubMed] [Google Scholar]
  • 82. Reynisson  B, Alvarez  B, Paul  S. et al.  NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res  2020;48:W449–54. 10.1093/nar/gkaa379 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83. Reynisson  B, Barra  C, Kaabinejadian  S. et al.  Improved prediction of MHC II antigen presentation through integration and motif deconvolution of mass spectrometry MHC eluted ligand data. J Proteome Res  2020;19:2304–15. 10.1021/acs.jproteome.9b00874 [DOI] [PubMed] [Google Scholar]
  • 84. O’Donnell  TJ, Rubinsteyn  A, Laserson  U. MHCflurry 2.0: improved Pan-Allele prediction of MHC class I-presented peptides by incorporating antigen processing. Cell Systems  2020;11: 42–48.e47. 10.1016/j.cels.2020.06.010 [DOI] [PubMed] [Google Scholar]
  • 85. Bravi  B, Tubiana  J, Cocco  S. et al.  RBM-MHC: a semi-supervised machine-learning method for sample-specific prediction of antigen presentation by HLA-I alleles. Cell Systems  2021;12:195–202.e199. 10.1016/j.cels.2020.11.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86. Chu  Y, Zhang  Y, Wang  Q. et al.  A transformer-based model to predict peptide–HLA class I binding and optimize mutated peptides for vaccine design. Nature Machine Intelligence  2022;4:300–11. 10.1038/s42256-022-00459-7 [DOI] [Google Scholar]
  • 87. Racle  J, Guillaume  P, Schmidt  J. et al.  Machine learning predictions of MHC-II specificities reveal alternative binding mode of class II epitopes. Immunity  2023;56:1359–1375.e1313. 10.1016/j.immuni.2023.03.009 [DOI] [PubMed] [Google Scholar]
  • 88. Albert  BA, Yang  Y, Shao  XM. et al.  Deep neural networks predict class I major histocompatibility complex epitope presentation and transfer learn neoepitope immunogenicity. Nature Machine Intelligence  2023;5:861–72. 10.1038/s42256-023-00694-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89. Wang  M, Lei  C, Wang  J. et al.  TripHLApan: predicting HLA molecules binding peptides based on triple coding matrix and transfer learning. Brief Bioinform  2024;25:bbae154. 10.1093/bib/bbae154 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90. Tadros  DM, Racle  J, Gfeller  D. Predicting MHC-I ligands across alleles and species: how far can we go?  bioRxiv  2024;593183. 10.1101/2024.05.08.593183 [DOI] [Google Scholar]
  • 91. Zhang  L, Song  W, Zhu  T. et al.  ConvNeXt-MHC: improving MHC–peptide affinity prediction by structure-derived degenerate coding and the ConvNeXt model. Brief Bioinform  2024;25:bbae133. 10.1093/bib/bbae133 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Xu  H, Hu  R, Dong  X. et al.  ImmuneApp for HLA-I epitope prediction and immunopeptidome analysis. Nat Commun  2024;15:8926. 10.1038/s41467-024-53296-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93. Atanasova  M, Patronov  A, Dimitrov  I. et al.  EpiDOCK: a molecular docking-based tool for MHC class II binding prediction. Protein Engineering, Design and Selection  2013;26:631–4. 10.1093/protein/gzt018 [DOI] [PubMed] [Google Scholar]
  • 94. Aronson  A, Hochner  T, Cohen  T. et al.  Structure modeling and specificity of peptide-MHC class I interactions using geometric deep learning. bioRxiv  2022;520566. 10.1101/2022.12.15.520566 [DOI] [Google Scholar]
  • 95. Motmaen  A, Dauparas  J, Baek  M. et al.  Peptide-binding specificity prediction using fine-tuned protein structure prediction networks. Proc Natl Acad Sci U S A  2023;120:e2216697120. 10.1073/pnas.2216697120 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96. Jurtz  VI, Jessen  LE, Bentzen  AK. et al.  NetTCR: sequence-based prediction of TCR binding to peptide-MHC complexes using convolutional neural networks. bioRxiv  2018;433706. 10.1101/433706 [DOI] [Google Scholar]
  • 97. Moris  P, De Pauw  J, Postovskaya  A. et al.  Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification. Brief Bioinform  2021;22:bbaa318. 10.1093/bib/bbaa318 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98. Bradley  P. Structure-based prediction of T cell receptor: peptide-MHC interactions. Elife  2023;12:e82813. 10.7554/eLife.82813 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99. Lin  X, George  JT, Schafer  NP. et al.  Rapid assessment of T-cell receptor specificity of the immune repertoire. Nature Computational Science  2021;1:362–73. 10.1038/s43588-021-00076-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 100. Jensen  MF, Nielsen  M. Enhancing TCR specificity predictions by combined pan- and peptide-specific training, loss-scaling, and sequence similarity integration. Elife  2024;12:RP93934. 10.7554/eLife.93934 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101. Ji  H, Wang  X-X, Zhang  Q. et al.  Predicting TCR sequences for unseen antigen epitopes using structural and sequence features. Brief Bioinform  2024;25:bbae210. 10.1093/bib/bbae210 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102. Bhasin  M, Raghava  GPS. Prediction of CTL epitopes using QM, SVM and ANN techniques. Vaccine  2004;22:3195–204. 10.1016/j.vaccine.2004.02.005 [DOI] [PubMed] [Google Scholar]
  • 103. Dhanda  SK, Gupta  S, Vir  P. et al.  Prediction of IL4 inducing peptides. J Immunol Res  2013;2013:263952. 10.1155/2013/263952 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104. Dhanda  SK, Vir  P, Raghava  GPS. Designing of interferon-gamma inducing MHC class-II binders. Biol Direct  2013;8:30. 10.1186/1745-6150-8-30 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105. Dhanda  SK, Karosiene  E, Edwards  L. et al.  Predicting HLA CD4 immunogenicity in human populations. Front Immunol  2018;9:1369. 10.3389/fimmu.2018.01369 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106. Kotsias  F, Cebrian  I, Alloatti  A. Antigen processing and presentation. In: Lhuillier  C., Galluzzi  L. (eds). International Review of Cell and Molecular Biology. Academic Press, 2019;348:69–121. 10.1016/bs.ircmb.2019.07.005 [DOI] [PubMed] [Google Scholar]
  • 107. Backert  L, Kohlbacher  O. Immunoinformatics and epitope prediction in the age of genomic medicine. Genome Med  2015;7:119. 10.1186/s13073-015-0245-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108. Peters  B, Nielsen  M, Sette  A. T cell epitope predictions. Annu Rev Immunol  2020;38:123–45. 10.1146/annurev-immunol-082119-124838 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109. Fisch  A, Reynisson  B, Benedictus  L. et al.  Integral use of Immunopeptidomics and Immunoinformatics for the characterization of antigen presentation and rational identification of BoLA-DR–presented peptides and epitopes. The Journal of Immunology  2021;206:2489–97. 10.4049/jimmunol.2001409 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110. Nielsen  M, Andreatta  M, Peters  B. et al.  Immunoinformatics: predicting peptide-MHC binding. Annual review of biomedical data science  2020;3:191–215. 10.1146/annurev-biodatasci-021920-100259 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111. Sette  A, Sidney  J. Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism. Immunogenetics  1999;50:201–12. 10.1007/s002510050594 [DOI] [PubMed] [Google Scholar]
  • 112. Perez  MAS, Cuendet  MA, Röhrig  UF. et al.  Structural prediction of peptide–MHC binding modes. In: Simonson  T (ed.), Computational Peptide Science: Methods and Protocols, pp. 245–82. Springer US: New York, NY, 2022. [DOI] [PubMed] [Google Scholar]
  • 113. Antunes  AD, Abella  RJ, Devaurs  D. et al.  Structure-based methods for binding mode and binding affinity prediction for peptide-MHC complexes. Curr Top Med Chem  2018;18:2239–55. 10.2174/1568026619666181224101744 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114. Parizi  FM, Marzella  DF, Ramakrishnan  G. et al.  PANDORA v2.0: benchmarking peptide-MHC II models and software improvements. Front Immunol  2023;14:1285899. 10.3389/fimmu.2023.1285899 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115. Pellequer†  JL, Westhof  E. PREDITOP: a program for antigenicity prediction. J Mol Graph  1993;11:204–10. 10.1016/0263-7855(93)80074-2 [DOI] [PubMed] [Google Scholar]
  • 116. Alix  AJP. Predictive estimation of protein linear epitopes by using the program PEOPLE. Vaccine  1999;18:311–4. 10.1016/S0264-410X(99)00329-1 [DOI] [PubMed] [Google Scholar]
  • 117. Kumar  N, Bajiya  N, Patiyal  S. et al.  Multi-perspectives and challenges in identifying B-cell epitopes. Protein Sci  2023;32:e4785. 10.1002/pro.4785 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118. Saha  S, Raghava  GPS. BcePred: Prediction of continuous B-cell epitopes in antigenic sequences using Physico-chemical properties. In: Nicosia  G, Cutello  V, Bentley  PJ. et al. (eds.), Artificial Immune Systems, pp. 197–204. Springer Berlin Heidelberg: Berlin, Heidelberg, 2004. [Google Scholar]
  • 119. Saha  S, Raghava  GPS. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network, proteins: structure. Function, and Bioinformatics  2006;65:40–8. 10.1002/prot.21078 [DOI] [PubMed] [Google Scholar]
  • 120. Larsen  JEP, Lund  O, Nielsen  M. Improved method for predicting linear B-cell epitopes. Immunome Research  2006;2:2. 10.1186/1745-7580-2-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121. Saha  S, Raghava  GPS. AlgPred: prediction of allergenic proteins and mapping of IgE epitopes. Nucleic Acids Res  2006;34:W202–9. 10.1093/nar/gkl343 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122. El-Manzalawy  Y, Dobbs  D, Honavar  V. Predicting linear B-cell epitopes using string kernels. J Mol Recognit  2008;21:243–55. 10.1002/jmr.893 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123. Gupta  S, Ansari  HR, Gautam  A. et al.  Identification of B-cell epitopes in an antigen for inducing specific class of antibodies. Biol Direct  2013;8:27. 10.1186/1745-6150-8-27 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124. Saravanan  V, Gautham  N. BCIgEPRED—a dual-layer approach for predicting linear IgE epitopes. Mol Biol  2018;52:285–93. 10.1134/S0026893318020127 [DOI] [PubMed] [Google Scholar]
  • 125. Manavalan  B, Govindaraj  RG, Shin  TH. et al.  iBCE-EL: a new ensemble learning framework for improved linear B-cell epitope prediction. Front Immunol  2018;9:1695. 10.3389/fimmu.2018.01695 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126. Dall’ Antonia  F, Keller  W. SPADE web service for prediction of allergen IgE epitopes. Nucleic Acids Res  2019;47:W496–501. 10.1093/nar/gkz331 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 127. Sharma  N, Patiyal  S, Dhall  A. et al.  AlgPred 2.0: an improved method for predicting allergenic proteins and mapping of IgE epitopes. Brief Bioinform  2021;22:bbaa294. 10.1093/bib/bbaa294 [DOI] [PubMed] [Google Scholar]
  • 128. Liu  T, Shi  K, Li  W. Deep learning methods improve linear B-cell epitope prediction. BioData Mining  2020;13:1. 10.1186/s13040-020-00211-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129. Kadam  K, Peerzada  N, Karbhal  R. et al.  Antibody class(es) predictor for epitopes (AbCPE): a multi-label classification algorithm. Frontiers in Bioinformatics  2021;1:709951. 10.3389/fbinf.2021.709951 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130. Clifford  JN, Høie  MH, Deleuran  S. et al.  BepiPred-3.0: improved B-cell epitope prediction using protein language models. Protein Sci  2022;31:e4497. 10.1002/pro.4497 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 131. Qi  Y, Zheng  P, GJFIM  H. DeepLBCEPred: a Bi-LSTM and multi-scale CNN-based deep learning method for predicting linear B-cell epitopes. Front Microbiol  2023;14:1117027. 10.3389/fmicb.2023.1117027 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132. da  Silva  BM, Ascher  DB, Pires  DEV. epitope1D: accurate taxonomy-aware B-cell linear epitope prediction. Brief Bioinform  2023;24:bbad114. 10.1093/bib/bbad114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133. Kulkarni-Kale  U, Bhosle  S, Kolaskar  AS. CEP: a conformational epitope prediction server. Nucleic Acids Res  2005;33:W168–71. 10.1093/nar/gki460 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134. Haste Andersen  P, Nielsen  M, Lund  O. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci  2006;15:2558–67. 10.1110/ps.062405906 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 135. Ponomarenko  J, Bui  H-H, Li  W. et al.  ElliPro: a new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics  2008;9:514. 10.1186/1471-2105-9-514 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 136. Sun  J, Wu  D, Xu  T. et al.  SEPPA: a computational server for spatial epitope prediction of protein antigens. Nucleic Acids Res  2009;37:W612–6. 10.1093/nar/gkp417 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 137. Ansari  HR, Raghava  GPS. Identification of conformational B-cell epitopes in an antigen from its primary sequence. Immunome Research  2010; 6:6. 10.1186/1745-7580-6-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 138. Gao  J, Faraggi  E, Zhou  Y. et al.  BEST: improved prediction of B-cell epitopes from antigen sequences. PloS One  2012;7:e40104. 10.1371/journal.pone.0040104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 139. Zhang  J, Zhao  X, Sun  P. et al.  Conformational B-cell epitopes prediction from sequences using cost-sensitive ensemble classifiers and spatial clustering. Biomed Res Int  2014;2014:689219. 10.1155/2014/689219 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140. Dalkas  GA, Rooman  M. SEPIa, a knowledge-driven algorithm for predicting conformational B-cell epitopes from the amino acid sequence. BMC Bioinformatics  2017;18:95. 10.1186/s12859-017-1528-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 141. Zhou  C, Chen  Z, Zhang  L. et al.  SEPPA 3.0—enhanced spatial epitope prediction enabling glycoprotein antigens. Nucleic Acids Res  2019;47:W388–94. 10.1093/nar/gkz413 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 142. Høie  MH, Gade  FS, Johansen Julie  M. et al.  DiscoTope-3.0: improved B-cell epitope prediction using inverse folding latent representations. Front Immunol  2024;15:15. 10.3389/fimmu.2024.1322712 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 143. Sela-Culang  I, Ashkenazi  S, Peters  B. et al.  PEASE: predicting B-cell epitopes utilizing antibody sequence. Bioinformatics  2015;31:1313–5. 10.1093/bioinformatics/btu790 [DOI] [PubMed] [Google Scholar]
  • 144. Krawczyk  K, Liu  X, Baker  T. et al.  Improving B-cell epitope prediction and its application to global antibody-antigen docking. Bioinformatics  2014;30:2288–94. 10.1093/bioinformatics/btu190 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 145. Qiu  T, Zhang  L, Chen  Z. et al.  SEPPA-mAb: spatial epitope prediction of protein antigens for mAbs. Nucleic Acids Res  2023;51:W528–34. 10.1093/nar/gkad427 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 146. Huang  J, Gutteridge  A, Honda  W. et al.  MIMOX: a web tool for phage display based epitope mapping. BMC Bioinformatics  2006;7:451. 10.1186/1471-2105-7-451 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147. Mayrose  I, Penn  O, Erez  E. et al.  Pepitope: epitope mapping from affinity-selected peptides. Bioinformatics  2007;23:3244–6. 10.1093/bioinformatics/btm493 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148. Huang  YX, Bao  YL, Guo  SY. et al.  Pep-3D-search: a method for B-cell epitope prediction based on mimotope analysis. BMC Bioinformatics  2008;9:538. 10.1186/1471-2105-9-538 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 149. Sun  P, Ju  H, Zhang  B. et al.  Conformational B-cell epitope prediction method based on antigen preprocessing and mimotopes analysis. Biomed Res Int  2015;2015:1–8. 10.1155/2015/257030 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 150. Manzoor  H, Wani  A. Evolution of machine learning methods in linear B-cell epitope prediction. In: 2023 10th International Conference on Computing for Sustainable Global Development (INDIACom), IEEE, New Delhi, India, pp. 979–85, 2023.
  • 151. El-Manzalawy  Y, Dobbs  D, Honavar  VG. In Silico prediction of linear B-cell epitopes on proteins. In: Zhou  Y, Kloczkowski  A, Faraggi  E. et al. (eds.), Prediction of Protein Secondary Structure, 255–64. Springer New York: New York, NY, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152. Zhou  J, Chen  J, Peng  Y. et al.  A promising tool in serological diagnosis: current research progress of antigenic epitopes in infectious diseases. Pathogens  2022;11:1095. 10.3390/pathogens11101095 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153. Wu  C-H, Liu  IJ, Lu  R-M. et al.  Advancement and applications of peptide phage display technology in biomedical science. J Biomed Sci  2016;23:8. 10.1186/s12929-016-0223-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 154. Zhang  C, Li  Y, Tang  W. et al.  The relationship between B-cell epitope and Mimotope sequences. Protein & Peptide Letters  2016;23:132–41. 10.2174/0929866523666151230124538 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155. Aghebati-Maleki  L, Bakhshinejad  B, Baradaran  B. et al.  Phage display as a promising approach for vaccine development. J Biomed Sci  2016;23:66. 10.1186/s12929-016-0285-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156. Zhang  WUY, Wan  Y, Li  DG. et al.  A mimotope of pre-S2 region of surface antigen of viral hepatitis B screened by phage display. Cell Res  2001;11:203–8. 10.1038/sj.cr.7290087 [DOI] [PubMed] [Google Scholar]
  • 157. Cia  G, Pucci  F, Rooman  M. Critical review of conformational B-cell epitope prediction methods. Brief Bioinform  2023;24:bbac567. 10.1093/bib/bbac567 [DOI] [PubMed] [Google Scholar]
  • 158. Jespersen  MC, Peters  B, Nielsen  M. et al.  BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res  2017;45:W24–9. 10.1093/nar/gkx346 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 159. Kringelum  JV, Lundegaard  C, Lund  O. et al.  Reliable B cell epitope predictions: impacts of method development and improved benchmarking. PLoS Comput Biol  2012;8:e1002829. 10.1371/journal.pcbi.1002829 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 160. Liang  S, Zheng  D, Standley  DM. et al.  EPSVR and EPMeta: prediction of antigenic epitopes using support vector regression and multiple server results. BMC Bioinformatics  2010;11:381. 10.1186/1471-2105-11-381 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 161. Sweredoski  MJ, Baldi  P. PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics  2008;24:1459–60. 10.1093/bioinformatics/btn199 [DOI] [PubMed] [Google Scholar]
  • 162. da  Silva  BM, Myung  Y, Ascher  DB. et al.  epitope3D: a machine learning method for conformational B-cell epitope prediction. Brief Bioinform  2022;23:bbab423. 10.1093/bib/bbab423 [DOI] [PubMed] [Google Scholar]
  • 163. Ott  PA, Hu  Z, Keskin  DB. et al.  An immunogenic personal neoantigen vaccine for patients with melanoma. Nature  2017;547:217–21. 10.1038/nature22991 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 164. Keskin  DB, Anandappa  AJ, Sun  J. et al.  Neoantigen vaccine generates intratumoral T cell responses in phase Ib glioblastoma trial. Nature  2019;565:234–9. 10.1038/s41586-018-0792-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 165. Jeffreys  S, Tompkins  MP, Aki  J. et al.  Development and evaluation of an Immunoinformatics-based multi-peptide vaccine against Acinetobacter baumannii infection. Vaccine  2024;12:358. 10.3390/vaccines12040358 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 166. Ghaffar  SA, Tahir  H, Muhammad  S. et al.  Designing of a multi-epitopes based vaccine against Haemophilius parainfluenzae and its validation through integrated computational approaches. Front Immunol  2024;15:1380732. 10.3389/fimmu.2024.1380732 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 167. Kolla  HB, Dutt  M, Kumar  A. et al.  Immuno-informatics study identifies conserved T cell epitopes in non-structural proteins of bluetongue virus serotypes: formulation of a computationally optimized next-generation broad-spectrum multi-epitope vaccine. Front Immunol  2024;15:1424307. 10.3389/fimmu.2024.1424307 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 168. Dasari  V, McNeil  LK, Beckett  K. et al.  Lymph node targeted multi-epitope subunit vaccine promotes effective immunity to EBV in HLA-expressing mice. Nat Commun  2023;14:4371. 10.1038/s41467-023-39770-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 169. Alam  A, Khan  A, Imam  N. et al.  Design of an epitope-based peptide vaccine against the SARS-CoV-2: a vaccine-informatics approach. Brief Bioinform  2021;22:1309–23. 10.1093/bib/bbaa340 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 170. Zhang  G, Han  L, Zhao  Y. et al.  Development and evaluation of a multi-epitope subunit vaccine against mycoplasma synoviae infection. Int J Biol Macromol  2023;253:126685. 10.1016/j.ijbiomac.2023.126685 [DOI] [PubMed] [Google Scholar]
  • 171. Zhang  Y, Liang  S, Zhang  S. et al.  Development and evaluation of a multi-epitope subunit vaccine against group B streptococcus infection. Emerging Microbes & Infections  2022;11:2371–82. 10.1080/22221751.2022.2122585 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 172. Bhattacharya  M, Chatterjee  S, Nag  S. et al.  Designing, characterization, and immune stimulation of a novel multi-epitopic peptide-based potential vaccine candidate against monkeypox virus through screening its whole genome encoded proteins: an immunoinformatics approach. Travel Med Infect Dis  2022;50:102481. 10.1016/j.tmaid.2022.102481 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 173. Aziz  S, Almajhdi  FN, Waqas  M. et al.  Contriving multi-epitope vaccine ensemble for monkeypox disease using an immunoinformatics approach. Front Immunol  2022;13:1004804. 10.3389/fimmu.2022.1004804 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 174. Alshabrmi  FM, Alrumaihi  F, Alrasheedi  SF. et al.  An In-Silico investigation to design a multi-epitopes vaccine against multi-drug resistant hafnia alvei. Vaccine  2022;10:1127. 10.3390/vaccines10071127 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 175. Waqas  M, Aziz  S, Bushra  A. et al.  Employing an immunoinformatics approach revealed potent multi-epitope based subunit vaccine for lymphocytic choriomeningitis virus. J Infect Public Health  2023;16:214–32. 10.1016/j.jiph.2022.12.023 [DOI] [PubMed] [Google Scholar]
  • 176. Jin  Y, Fayyaz  A, Liaqat  A. et al.  Proteomics-based vaccine targets annotation and design of subunit and mRNA-based vaccines for Monkeypox virus (MPXV) against the recent outbreak. Comput Biol Med  2023;159:106893. 10.1016/j.compbiomed.2023.106893 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 177. Shahrear  S, Islam  ABMMK. Immunoinformatics guided modeling of CCHF_GN728, an mRNA-based universal vaccine against Crimean-Congo hemorrhagic fever virus. Comput Biol Med  2022;140:105098. 10.1016/j.compbiomed.2021.105098 [DOI] [PubMed] [Google Scholar]
  • 178. Alawam  AS, Alwethaynani  MS. Construction of an aerolysin-based multi-epitope vaccine against Aeromonas hydrophila: an in silico machine learning and artificial intelligence-supported approach. Front Immunol  2024;15:1369890. 10.3389/fimmu.2024.1369890 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 179. Huang  S, Zhang  C, Li  J. et al.  Designing a multi-epitope vaccine against coxsackievirus B based on immunoinformatics approaches. Front Immunol  2022;13:933594. 10.3389/fimmu.2022.933594 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 180. Bhattacharya  K, Shamkh  IM, Khan  MS. et al.  Multi-epitope vaccine design against Monkeypox virus via reverse vaccinology method exploiting Immunoinformatic and Bioinformatic approaches. Vaccine  2022;10:2010. 10.3390/vaccines10122010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 181. de  Oliveira  MA, Vilela Rodrigues  TC, Tiwari  S. et al.  Immunoinformatics-guided design of a multi-valent vaccine against rotavirus and norovirus (ChRNV22). Comput Biol Med  2023;159:106941. 10.1016/j.compbiomed.2023.106941 [DOI] [PubMed] [Google Scholar]
  • 182. Ojha  R, Pandey  RK, Prajapati  VK. Vaccinomics strategy to concoct a promising subunit vaccine for visceral leishmaniasis targeting sandfly and leishmania antigens. Int J Biol Macromol  2020;156:548–57. 10.1016/j.ijbiomac.2020.04.097 [DOI] [PubMed] [Google Scholar]
  • 183. Dong  R, Chu  Z, Yu  F. et al.  Contriving multi-epitope subunit of vaccine for COVID-19: Immunoinformatics approaches. Front Immunol  2020;11:1784. 10.3389/fimmu.2020.01784 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 184. Reddy Chichili  VP, Kumar  V, Sivaraman  J. Linkers in the structural biology of protein–protein interactions. Protein Sci  2013;22:153–67. 10.1002/pro.2206 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 185. Ayyagari  VS, T C V  KAP. et al.  Design of a multi-epitope-based vaccine targeting M-protein of SARS-CoV2: an immunoinformatics approach. Journal of Biomolecular Structure and Dynamics  2022; 40:2963–77. 10.1080/07391102.2020.1850357 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 186. Chen  X, Zaro  J, Shen  W-C. Fusion protein linkers: effects on production, bioactivity, and pharmacokinetics. In: Schmidt SR (ed.), Fusion Protein Technologies for Biopharmaceuticals. John Wiley & Sons, Inc., 2013;57–73. 10.1002/9781118354599.ch4 [DOI] [Google Scholar]
  • 187. Enosi, Tuipulotu  D, Netzler Natalie  E, Lun Jennifer  H. et al.  TLR7 agonists display potent antiviral effects against norovirus infection via innate stimulation. Antimicrob Agents Chemother  2018;62:e02417–17. 10.1128/AAC.02417-17 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 188. Kolla  HB, Tirumalasetty  C, Sreerama  K. et al.  An immunoinformatics approach for the design of a multi-epitope vaccine targeting super antigen TSST-1 of Staphylococcus aureus. Journal of Genetic Engineering and Biotechnology  2021;19:69. 10.1186/s43141-021-00160-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 189. Andongma  BT, Huang  Y, Chen  F. et al.  In silico design of a promiscuous chimeric multi-epitope vaccine against mycobacterium tuberculosis. Comput Struct Biotechnol J  2023;21:991–1004. 10.1016/j.csbj.2023.01.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 190. Tarrahimofrad  H, Rahimnahal  S, Zamani  J. et al.  Designing a multi-epitope vaccine to provoke the robust immune response against influenza a H7N9. Sci Rep  2021;11:24485. 10.1038/s41598-021-03932-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 191. Takeda  K, Akira  S. Toll-like receptors. Curr Protoc Immunol  2015;109:14.12.11–0. 10.1002/0471142735.im1412s109 [DOI] [PubMed] [Google Scholar]
  • 192. Lee  H, Heo  L, Lee  MS. et al.  GalaxyPepDock: a protein–peptide docking tool based on interaction similarity and energy optimization. Nucleic Acids Res  2015;43:W431–5. 10.1093/nar/gkv495 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 193. Zane  L, Kraschowetz  S, Trentini  MM. et al.  Peptide linker increased the stability of pneumococcal fusion protein vaccine candidate. Front Bioeng Biotechnol  2023;11:1108300. 10.3389/fbioe.2023.1108300 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 194. Patel  DK, Menon  DV, Patel  DH. et al.  Linkers: a synergistic way for the synthesis of chimeric proteins. Protein Expr Purif  2022;191:106012. 10.1016/j.pep.2021.106012 [DOI] [PubMed] [Google Scholar]
  • 195. Laupèze  B, Hervé  C, Di Pasquale  A. et al.  Adjuvant systems for vaccines: 13 years of post-licensure experience in diverse populations have progressed the way adjuvanted vaccine safety is investigated and understood. Vaccine  2019;37:5670–80. 10.1016/j.vaccine.2019.07.098 [DOI] [PubMed] [Google Scholar]
  • 196. Apostólico  JS, Lunardelli  VAS, Coirada  FC. et al.  Adjuvants: classification, modus operandi, and licensing. J Immunol Res  2016;2016:1459394. 10.1155/2016/1459394 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 197. Pulendran  B, S. Arunachalam P, O’Hagan DT.  Emerging concepts in the science of vaccine adjuvants. Nat Rev Drug Discov  2021;20:454–75. 10.1038/s41573-021-00163-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 198. Bajoria  S, Kaur  K, Kumru  OS. et al.  Antigen-adjuvant interactions, stability, and immunogenicity profiles of a SARS-CoV-2 receptor-binding domain (RBD) antigen formulated with aluminum salt and CpG adjuvants. Hum Vaccin Immunother  2022;18:2079346. 10.1080/21645515.2022.2079346 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 199. Ćirović  A, Ćirović  A, Nikolić  D. et al.  The adjuvant aluminum fate—metabolic tale based on the basics of chemistry and biochemistry. J Trace Elem Med Biol  2021;68:126822. 10.1016/j.jtemb.2021.126822 [DOI] [PubMed] [Google Scholar]
  • 200. Dowling  JK, Mansell  A. Toll-like receptors: the Swiss army knife of immunity and vaccine development. Clinical & Translational Immunology  2016;5:e85. 10.1038/cti.2016.22 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 201. Damas  MSF, Mazur  FG, CCDM  F. et al.  A systematic Immuno-Informatic approach to design a multiepitope-based vaccine against emerging multiple drug resistant Serratia marcescens. Front Immunol  2022;13:768569. 10.3389/fimmu.2022.768569 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 202. Wu  TYH, Singh  M, Miller  AT. et al.  Rational design of small molecules as vaccine adjuvants. Sci Transl Med  2014;6:263ra160–0. 10.1126/scitranslmed.3009980 [DOI] [PubMed] [Google Scholar]
  • 203. Zhao  T, Cai  Y, Jiang  Y. et al.  Vaccine adjuvants: mechanisms and platforms. Signal Transduct Target Ther  2023;8:283. 10.1038/s41392-023-01557-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 204. Brito  LA, Malyala  P, O'Hagan  DT. Vaccine adjuvant formulations: a pharmaceutical perspective. Semin Immunol  2013;25:130–45. 10.1016/j.smim.2013.05.007 [DOI] [PubMed] [Google Scholar]
  • 205. Gilkes  AP, Albin  TJ, Manna  S. et al.  Tuning subunit vaccines with novel TLR Triagonist adjuvants to generate protective immune responses against Coxiella burnetii. The Journal of Immunology  2020;204:611–21. 10.4049/jimmunol.1900991 [DOI] [PubMed] [Google Scholar]
  • 206. Jain  R, Jain  A, Mauro  E. et al.  ICOR: improving codon optimization with recurrent neural networks. BMC Bioinformatics  2023;24:132. 10.1186/s12859-023-05246-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 207. Paremskaia  AI, Kogan  AA, Murashkina  A. et al.  Codon-optimization in gene therapy: promises, prospects and challenges. Front Bioeng Biotechnol  2024;12:1371596. 10.3389/fbioe.2024.1371596 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 208. Mauro  VP, Chappell  SA. A critical analysis of codon optimization in human therapeutics. Trends Mol Med  2014;20:604–13. 10.1016/j.molmed.2014.09.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 209. George  RA, Heringa  J. An analysis of protein domain linkers: their classification and role in protein folding. Protein Engineering, Design and Selection  2002;15:871–9. 10.1093/protein/15.11.871 [DOI] [PubMed] [Google Scholar]
  • 210. Rawlings  ND, Barrett  AJ, Thomas  PD. et al.  The MEROPS database of proteolytic enzymes, their substrates and inhibitors in 2017 and a comparison with peptidases in the PANTHER database. Nucleic Acids Res  2018;46:D624–32. 10.1093/nar/gkx1134 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 211. Crasto  CJ, Feng  J-A. LINKER: a program to generate linker sequences for fusion proteins. Protein Engineering, Design and Selection  2000;13:309–12. 10.1093/protein/13.5.309 [DOI] [PubMed] [Google Scholar]
  • 212. Xue  F, Gu  Z, Feng  J-a. LINKER: a web server to generate peptide sequences with extended conformation. Nucleic Acids Res  2004;32:W562–5. 10.1093/nar/gkh422 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 213. Liu  C, Chin  JX, Lee  D-Y. SynLinker: an integrated system for designing linkers and synthetic fusion proteins. Bioinformatics  2015;31:3700–2. 10.1093/bioinformatics/btv447 [DOI] [PubMed] [Google Scholar]
  • 214. Sayers  S, Ulysse  G, Xiang  Z. et al.  Vaxjo: a web-based vaccine adjuvant database and its application for analysis of vaccine adjuvants and their uses in vaccine development. Biomed Res Int  2012;2012:831486. 10.1155/2012/831486 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 215. Nagpal  G, Gupta  S, Chaudhary  K. et al.  VaccineDA: prediction, design and genome-wide screening of oligodeoxynucleotide-based vaccine adjuvants. Sci Rep  2015;5:12478. 10.1038/srep12478 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 216. Nagpal  G, Chaudhary  K, Agrawal  P. et al.  Computer-aided prediction of antigen presenting cell modulators for designing peptide-based vaccine adjuvants. J Transl Med  2018;16:181. 10.1186/s12967-018-1560-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 217. Grote  A, Hiller  K, Scheer  M. et al.  JCat: a novel tool to adapt codon usage of a target gene to its potential expression host. Nucleic Acids Res  2005;33:W526–31. 10.1093/nar/gki376 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 218. LeRoy  N, Roleck  C. Optipyzer: a fast and flexible multi-species codon optimization server. bioRxiv  2023;541759. 10.1101/2023.05.22.541759 [DOI] [Google Scholar]
  • 219. Fu  H, Liang  Y, Zhong  X. et al.  Codon optimization with deep learning to enhance protein expression. Sci Rep  2020;10:17617. 10.1038/s41598-020-74091-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 220. Gonzalez-Sanchez  B, Vega-Rodríguez  MA, Santander-Jiménez  S. et al.  Multi-objective artificial bee colony for designing multiple genes encoding the same protein. Appl Soft Comput  2019;74:90–8. 10.1016/j.asoc.2018.10.023 [DOI] [Google Scholar]
  • 221. Zhang  H, Zhang  L, Lin  A. et al.  Algorithm for optimized mRNA design improves stability and immunogenicity. Nature  2023;621:396–403. 10.1038/s41586-023-06127-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 222. Zhang  Y, Huffman  A, Johnson  J. et al.  Vaxign-DL: a deep learning-based method for vaccine design and its evaluation. bioRxiv  2023;2023.11.29.569096. 10.1101/2023.11.29.569096 [DOI] [Google Scholar]
  • 223. He  Y, Xiang  Z, Mobley  HLT. Vaxign: the first web-based vaccine design program for reverse vaccinology and applications for vaccine development. Biomed Res Int  2010;2010:1–15. 10.1155/2010/297505 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 224. Vivona  S, Bernante  F, Filippini  F. NERVE: new enhanced reverse vaccinology environment. BMC Biotechnol  2006;6:35. 10.1186/1472-6750-6-35 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 225. Jaiswal  V, Chanumolu  SK, Gupta  A. et al.  Jenner-predict server: prediction of protein vaccine candidates (PVCs) in bacteria based on host-pathogen interactions. BMC Bioinformatics  2013;14:211. 10.1186/1471-2105-14-211 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 226. Rizwan  M, Naz  A, Ahmad  J. et al.  VacSol: a high throughput in silico pipeline to predict potential therapeutic targets in prokaryotic pathogens using subtractive reverse vaccinology. BMC Bioinformatics  2017;18:106. 10.1186/s12859-017-1540-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 227. Goodswen  SJ, Kennedy  PJ, Ellis  JT. Vacceed: a high-throughput in silico vaccine candidate discovery pipeline for eukaryotic pathogens based on reverse vaccinology. Bioinformatics  2014;30:2381–3. 10.1093/bioinformatics/btu300 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 228. Doytchinova  IA, Flower  DR. VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics  2007;8:4. 10.1186/1471-2105-8-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 229. Dimitrov  I, Zaharieva  N, Doytchinova  I. Bacterial immunogenicity prediction by machine learning methods. Vaccines  2020;8:709. 10.3390/vaccines8040709 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 230. Li  J, Zhao  Z, Tai  C. et al.  VirusImmu: a novel ensemble machine learning approach for viral immunogenicity prediction. bioRxiv  2023;2023.11.23.568426. 10.1101/2023.11.23.568426 [DOI] [Google Scholar]
  • 231. Dey  J, Mahapatra  SR, Raj  TK. et al.  Designing a novel multi-epitope vaccine to evoke a robust immune response against pathogenic multidrug-resistant Enterococcus faecium bacterium. Gut Pathogens  2022;14:21. 10.1186/s13099-022-00495-z [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 232. Chauhan  V, Rungta  T, Goyal  K. et al.  Designing a multi-epitope based vaccine to combat Kaposi sarcoma utilizing Immunoinformatics approach. Sci Rep  2019;9:2517. 10.1038/s41598-019-39299-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 233. Codex Alimentarius Commission. Codex principles and guidelines on foods derived from biotechnology (CAC/GL 44-2003). 2003. Available at: https://mobil.bfr.bund.de/cm/343/codex_principles_and_guidelines_on_foods_derived_from_biotechnology.pdf.
  • 234. Goodman  RE, Ebisawa  M, Ferreira  F. et al.  AllergenOnline: a peer-reviewed, curated allergen database to assess novel food proteins for potential cross-reactivity. Mol Nutr Food Res  2016;60:1183–98. 10.1002/mnfr.201500769 [DOI] [PubMed] [Google Scholar]
  • 235. van Ree  R, Sapiter Ballerda  D, Berin  MC. et al.  The COMPARE database: a public resource for allergen identification, adapted for continuous improvement. Front Allergy  2021;2:700533. 10.3389/falgy.2021.700533 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 236. Sharma  N, Naorem  LD, Jain  S. et al.  ToxinPred2: an improved method for predicting toxicity of proteins. Brief Bioinform  2022;23:bbac174. 10.1093/bib/bbac174 [DOI] [PubMed] [Google Scholar]
  • 237. Robles-Loaiza  AA, Pinos-Tamayo  EA, Mendes  B. et al.  Traditional and computational screening of non-toxic peptides and approaches to improving selectivity. Pharmaceuticals  2022;15:323. 10.3390/ph15030323 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 238. Rathore  AS, Choudhury  S, Arora  A. et al.  ToxinPred 3.0: an improved method for predicting the toxicity of peptides. Comput Biol Med  2024;179:108926. 10.1016/j.compbiomed.2024.108926 [DOI] [PubMed] [Google Scholar]
  • 239. Pan  X, Zuallaert  J, Wang  X. et al.  ToxDL: deep learning using primary structure and domain embeddings for assessing protein toxicity. Bioinformatics  2020;36:5159–68. 10.1093/bioinformatics/btaa656 [DOI] [PubMed] [Google Scholar]
  • 240. Morozov  V, Rodrigues  CHM, Ascher  DB. CSM-toxin: a web-server for predicting protein toxicity. Pharmaceutics  2023;15:431. 10.3390/pharmaceutics15020431 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 241. Wei  L, Ye  X, Xue  Y. et al.  ATSE: a peptide toxicity predictor by exploiting structural and evolutionary information based on graph neural network and attention mechanism. Brief Bioinform  2021;22:bbab041. 10.1093/bib/bbab041 [DOI] [PubMed] [Google Scholar]
  • 242. Wei  L, Ye  X, Sakurai  T. et al.  ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning. Bioinformatics  2022;38:1514–24. 10.1093/bioinformatics/btac006 [DOI] [PubMed] [Google Scholar]
  • 243. Guo  W, Liu  J, Dong  F. et al.  Review of machine learning and deep learning models for toxicity prediction. Exp Biol Med  2023;248:1952–73. 10.1177/15353702231209421 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 244. Pérez Santín  E, Rodríguez Solana  R, González García  M. et al.  Toxicity prediction based on artificial intelligence: a multidisciplinary overview. WIREs Computational Molecular Science  2021;11:e1516. 10.1002/wcms.1516 [DOI] [Google Scholar]
  • 245. Ishii  KJ, Koyama  S, Nakagawa  A. et al.  Host innate immune receptors and beyond: making sense of microbial infections. Cell Host Microbe  2008;3:352–63. 10.1016/j.chom.2008.05.003 [DOI] [PubMed] [Google Scholar]
  • 246. Choe  J, Kelker  MS, Wilson  IA. Crystal structure of human toll-like receptor 3 (TLR3) Ectodomain. Science  2005;309:581–5. 10.1126/science.1115253 [DOI] [PubMed] [Google Scholar]
  • 247. Kaur  A, Baldwin  J, Brar  D. et al.  Toll-like receptor (TLR) agonists as a driving force behind next-generation vaccine adjuvants and cancer therapeutics. Curr Opin Chem Biol  2022;70:102172. 10.1016/j.cbpa.2022.102172 [DOI] [PubMed] [Google Scholar]
  • 248. Pierce  BG, Wiehe  K, Hwang  H. et al.  ZDOCK server: interactive docking prediction of protein–protein complexes and symmetric multimers. Bioinformatics  2014;30:1771–3. 10.1093/bioinformatics/btu097 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 249. Jiménez-García  B, Pons  C, Fernández-Recio  J. pyDockWEB: a web server for rigid-body protein–protein docking using electrostatics and desolvation scoring. Bioinformatics  2013;29:1698–9. 10.1093/bioinformatics/btt262 [DOI] [PubMed] [Google Scholar]
  • 250. Kozakov  D, Hall  DR, Xia  B. et al.  The ClusPro web server for protein–protein docking. Nat Protoc  2017;12:255–78. 10.1038/nprot.2016.169 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 251. Mondal  A, Chang  L, Perez  A. Modelling peptide–protein complexes: docking, simulations and machine learning. QRB Discovery  2022;3:e17. 10.1017/qrd.2022.14 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 252. Vittorio  S, Lunghini  F, Morerio  P. et al.  Addressing docking pose selection with structure-based deep learning: recent advances, challenges and opportunities. Comput Struct Biotechnol J  2024;23:2141–51. 10.1016/j.csbj.2024.05.024 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 253. Abramson  J, Adler  J, Dunger  J. et al.  Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature  2024;630:493–500. 10.1038/s41586-024-07487-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 254. Bianca  C, Pennisi  M. Immune system modelling by top-down and bottom-up approaches. International Mathematical Forum  2012;7:109–28. [Google Scholar]
  • 255. Shinde  SB, Kurhekar  MP. Review of the systems biology of the immune system using agent-based models. IET Syst Biol  2018;12:83–92. 10.1049/iet-syb.2017.0073 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 256. Bauer  AL, Beauchemin  CAA, Perelson  AS. Agent-based modeling of host-pathogen systems: the successes and challenges. Inform Sci  2009;179:1379–89. 10.1016/j.ins.2008.11.012 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 257. Rapin  N, Lund  O, Bernaschi  M. et al.  Computational immunology meets bioinformatics: the use of prediction tools for molecular binding in the simulation of the immune system. PLoS One  2010;5:e9862. 10.1371/journal.pone.0009862 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 258. Rapin  N, Lund  O, Castiglione  F. Immune system simulation online. Bioinformatics  2011;27:2013–4. 10.1093/bioinformatics/btr335 [DOI] [PubMed] [Google Scholar]
  • 259. Cheng  P, Xue  Y, Wang  J. et al.  Evaluation of the consistence between the results of Immunoinformatics predictions and real-world animal experiments of a new tuberculosis vaccine MP3RT. Front Cell Infect Microbiol  2022;12:1047306. 10.3389/fcimb.2022.1047306 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 260. Stolfi  P, Castiglione  F, Mastrostefano  E. et al.  In-silico evaluation of adenoviral COVID-19 vaccination protocols: assessment of immunological memory up to 6 months after the third dose. Front Immunol  2022;13:998262. 10.3389/fimmu.2022.998262 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 261. Li  G, Iyer  B, Prasath  VBS. et al.  DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity. Brief Bioinform  2021;22:bbab160. 10.1093/bib/bbab160 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 262. Ong  E, Wang  H, Wong  MU. et al.  Vaxign-ML: supervised machine learning reverse vaccinology model for improved prediction of bacterial protective antigens. Bioinformatics  2020;36:3185–91. 10.1093/bioinformatics/btaa119 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 263. Ong  E, Cooke  MF, Huffman  A. et al.  Vaxign2: the second generation of the first web-based vaccine design program using reverse vaccinology and machine learning. Nucleic Acids Res  2021;49:W671–8. 10.1093/nar/gkab279 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 264. Fiers  MWEJ, Kleter  GA, Nijland  H. et al.  Allermatch™, a webtool for the prediction of potential allergenicity according to current FAO/WHO Codex alimentarius guidelines. BMC Bioinformatics  2004;5:133. 10.1186/1471-2105-5-133 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 265. Zhang  ZH, Koh  JLY, Zhang  GL. et al.  AllerTool: a web server for predicting allergenicity and allergic cross-reactivity in proteins. Bioinformatics  2007;23:504–6. 10.1093/bioinformatics/btl621 [DOI] [PubMed] [Google Scholar]
  • 266. Björklund  ÅK, Soeria-Atmadja  D, Zorzet  A. et al.  Supervised identification of allergen-representative peptides for in silico detection of potentially allergenic proteins. Bioinformatics  2005;21:39–50. 10.1093/bioinformatics/bth477 [DOI] [PubMed] [Google Scholar]
  • 267. Lu  W, Negi  SS, Schein  CH. et al.  Distinguishing allergens from non-allergenic homologues using physical–chemical property (PCP) motifs. Mol Immunol  2018;99:1–8. 10.1016/j.molimm.2018.03.022 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 268. Dimitrov  I, Naneva  L, Doytchinova  I. et al.  AllergenFP: allergenicity prediction by descriptor fingerprints. Bioinformatics  2014;30:846–51. 10.1093/bioinformatics/btt619 [DOI] [PubMed] [Google Scholar]
  • 269. Dimitrov  I, Flower  DR, Doytchinova  I. AllerTOP—a server for in silico prediction of allergens. BMC Bioinformatics  2013;14:S4. 10.1186/1471-2105-14-S6-S4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 270. Dimitrov  I, Bangov  I, Flower  DR. et al.  AllerTOP v.2—a server for in silico prediction of allergens. J Mol Model  2014;20:2278. 10.1007/s00894-014-2278-5 [DOI] [PubMed] [Google Scholar]
  • 271. Shanthappa  PM, Kumar  R. ProAll-D: protein allergen detection using long short term memory - a deep learning approach. ADMET & DMPK  2022;10:231–40. 10.5599/admet.1335 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 272. Kumar  A, Rana  PS. A deep learning based ensemble approach for protein allergen classification. PeerJ Computer Science  2023;9:e1622. 10.7717/peerj-cs.1622 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 273. Maurer-Stroh  S, Krutz  NL, Kern  PS. et al.  AllerCatPro—prediction of protein allergenicity potential from the protein sequence. Bioinformatics  2019;35:3020–7. 10.1093/bioinformatics/btz029 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 274. Nguyen  MN, Krutz  NL, Limviphuvadh  V. et al.  AllerCatPro 2.0: a web server for predicting protein allergenicity potential. Nucleic Acids Res  2022;50:W36–43. 10.1093/nar/gkac446 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 275. Huber  F, Arnaud  M, Stevenson  BJ. et al.  A comprehensive proteogenomic pipeline for neoantigen discovery to advance personalized cancer immunotherapy. Nat Biotechnol  2024. 10.1038/s41587-024-02420-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 276. Tran  NH, Qiao  R, Xin  L. et al.  Personalized deep learning of individual immunopeptidomes to identify neoantigens for cancer vaccines. Nature Machine Intelligence  2020;2:764–71. 10.1038/s42256-020-00260-4 [DOI] [Google Scholar]
  • 277. Zhou  C, Wei  Z, Zhang  Z. et al.  pTuneos: prioritizing tumor neoantigens from next-generation sequencing data. Genome Med  2019;11:67. 10.1186/s13073-019-0679-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 278. Li  B, Jing  P, Zheng  G. et al.  Neo-intline: integrated pipeline enables neoantigen design through the in-silico presentation of T-cell epitope. Signal Transduct Target Ther  2023;8:397. 10.1038/s41392-023-01644-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 279. Thadani  NN, Gurev  S, Notin  P. et al.  Learning from prepandemic data to forecast viral escape. Nature  2023;622:818–25. 10.1038/s41586-023-06617-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 280. Li  Y, Huang  C, Ding  L. et al.  Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods  2019;166:4–21. 10.1016/j.ymeth.2019.04.008 [DOI] [PubMed] [Google Scholar]
  • 281. Bravi  B. Development and use of machine learning algorithms in vaccine target selection. npj Vaccines  2024;9:15. 10.1038/s41541-023-00795-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 282. Cavasotto  CN, Scardino  V. Machine learning toxicity prediction: latest advances by toxicity end point. ACS Omega  2022;7:47536–46. 10.1021/acsomega.2c05693 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 283. Li  Y, Farhan  MHR, Yang  X. et al.  A review on the development of bacterial multi-epitope recombinant protein vaccines via reverse vaccinology. Int J Biol Macromol  2024;282:136827. 10.1016/j.ijbiomac.2024.136827 [DOI] [PubMed] [Google Scholar]
  • 284. Slathia  PS, Sharma  P. In Silico designing of vaccines: Methods, tools, and their limitations. In: Singh  DB (ed.), Computer-Aided Drug Design, pp. 245–77. Singapore: Springer Singapore, 2020. [Google Scholar]
  • 285. Müller  M, Huber  F, Arnaud  M. et al.  Machine learning methods and harmonized datasets improve immunogenic neoantigen prediction. Immunity  2023;56:2650–2663.e6. 10.1016/j.immuni.2023.09.002 [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

bbaf055-Supplementary
bbaf055-supplementary.docx (118.1KB, docx)

Data Availability Statement

For access to any research-related data, kindly reach out to the corresponding author.


Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
