Scientific Reports. 2022 Aug 12;12:13739. doi: 10.1038/s41598-022-18021-1

Prediction of B cell epitopes in proteins using a novel sequence similarity-based method

Alvaro Ras-Carmona 1, Alexander A Lehmann 1,2, Paul V Lehmann 2, Pedro A Reche 1
PMCID: PMC9374694  PMID: 35962028

Abstract

Prediction of B cell epitopes that can replace the antigen for antibody production and detection is of great interest for research and the biotech industry. Here, we developed a novel BLAST-based method to predict linear B cell epitopes. To that end, we generated a BLAST-formatted database upon a dataset of 62,730 known linear B cell epitope sequences and considered as a B cell epitope any peptide sequence producing ungapped BLAST hits to this database with identity ≥ 80% and length ≥ 8. We examined B cell epitope predictions by this method in tenfold cross-validations in which we considered various types of non-B cell epitopes, including 62,730 peptide sequences with verified negative B cell assays. As a result, we obtained values of accuracy, specificity and sensitivity of 72.54 ± 0.27%, 81.59 ± 0.37% and 63.49 ± 0.43%, respectively. In an independent dataset incorporating 503 B cell epitopes, this method reached accuracy, specificity and sensitivity of 74.85%, 99.20% and 50.50%, respectively, outperforming state-of-the-art methods to predict linear B cell epitopes. We implemented this BLAST-based approach to predict B cell epitopes at http://imath.med.ucm.es/bepiblast.

Subject terms: Computational models, Protein analysis, Applied immunology, Humoral immunity, B cells

Introduction

A B cell epitope, also known as an antigenic determinant, is defined as the specific portion of an antigen that is recognized by the B cell receptor or its soluble form (antibodies) secreted after B cell activation1–3. B cell epitopes can be classified as conformational (also known as discontinuous) or linear (also known as continuous). In proteins, conformational B cell epitopes include residues that are not sequential in the primary structure but are close in space in the three-dimensional structure of the antigen3,4. In contrast, linear B cell epitopes consist of sequential amino acid residues. These B cell epitopes can be recognized by antibodies outside the context of the remaining protein and can replace the whole protein for antibody production3,4. There are numerous approaches and methods to predict linear B cell epitopes3–7. Some of them are based on amino acid propensity scales that depict physicochemical properties of B cell epitopes. The first of such scales was introduced by Hopp and Woods8, and many other scales followed later, including those based on flexibility9, hydrophobicity10,11, surface accessibility12 and antigenicity13. More recent approaches to predict B cell epitopes use machine learning algorithms such as neural networks14,15, support vector machines16–18 or random forests19,20, which are trained on features of known B cell epitopes. As shown by various benchmark evaluations21–23, the performance of B cell epitope prediction methods can be quite low and there is still room for improvement.

B cell epitope prediction methods rely on the existence of sequence commonalities among B cell epitopes. Therefore, we developed a sequence similarity-based method to predict linear B cell epitopes in protein sequences using the basic local alignment search tool (BLAST)24. The approach relies on finding BLAST hits to a database including 62,730 known linear B cell epitopes extracted from the Immune Epitope DataBase (IEDB)25,26. For evaluation, any ungapped sequence hit to this database with identity ≥ 80% and length ≥ 8 was considered a B cell epitope. This method reached an accuracy > 70% in tenfold cross-validations and > 65% in various independent datasets including B cell epitopes obtained from the BCIPEP database27 and different types of non-B cell epitopes, outperforming related methods such as those implemented by BepiPred19,20, IBCE-EL28 and LBtope17. BLAST-based B cell epitope predictions are available at BepiBlast (http://imath.med.ucm.es/bepiblast).

Results

B cell epitope BLAST database

We built a BLAST-formatted database upon the amino acid sequences of 62,730 experimentally verified linear B cell epitopes obtained from IEDB25,26, including 940 that are known to be targeted by neutralizing antibodies. Hereafter we will refer to this database as BEPIBD. All B cell epitope sequences in BEPIBD range from 8 to 25 residues. BEPIBD is available as supplementary data at the journal website (Supplementary Dataset 1 online). The mean and median lengths of B cell epitope sequences in BEPIBD are 13.91 ± 2.85 and 12 residues, respectively. Sequences included in BEPIBD did not share more than 90% identity and the average sequence similarity in the dataset was 18.36 ± 6.17%. An amino acid frequency analysis revealed that some amino acids are more frequent than others in B cell epitopes (Fig. 1a). The most frequent amino acids are serine (8.33%), leucine (7.87%), alanine (7.75%) and glycine (7.45%), while cysteine (1.56%), tryptophan (2.24%) and methionine (2.37%) are the least frequent. However, this picture changed when amino acid frequencies in B cell epitopes were compared with those in SWISSPROT29,30. As shown in Fig. 1b, tryptophan, proline and histidine are in this case clearly overrepresented in B cell epitopes.
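The frequency comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the two toy peptides and the uniform 5% background are assumptions made for the example, and a real analysis would use the BEPIBD sequences and the actual SWISSPROT composition.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_frequencies(peptides):
    """Percentage frequency of each of the 20 standard amino acids."""
    counts = Counter(res for pep in peptides for res in pep if res in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}

def log2_enrichment(freqs, background):
    """log2(frequency / background); None where either value is zero."""
    out = {}
    for aa in AMINO_ACIDS:
        f, b = freqs[aa], background[aa]
        out[aa] = math.log2(f / b) if f > 0 and b > 0 else None
    return out

# Toy input: hypothetical peptides, NOT taken from BEPIBD.
peptides = ["SSGALWPH", "ALGSSPHW"]
# Placeholder background: uniform 5% per residue, NOT the real SWISSPROT composition.
background = {aa: 5.0 for aa in AMINO_ACIDS}

freqs = aa_frequencies(peptides)
enrichment = log2_enrichment(freqs, background)
```

Positive values in `enrichment` mark residues that are overrepresented relative to the background, which is the quantity plotted in Fig. 1b.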

Figure 1.

Absolute and relative amino acid frequencies in B-cell epitopes. (a) Figure shows the frequency in percentage (Y axis) of each of the 20 distinct amino acids (X axis) in B cell epitopes included in BEPIBD. (b) Figure represents the same amino acid frequencies but relative to those in SWISSPROT, represented as log2 values.

BLAST discrimination of B cell epitope

We used BEPIBD as a target database for testing the ability of BLAST to discriminate between B cell epitopes and non-B cell epitopes as indicated in “Methods”. Briefly, we considered as a B cell epitope any query peptide with at least one ungapped hit to BEPIBD with length ≥ 8 and identity ≥ 80%. We evaluated this approach under tenfold cross-validation, considering various datasets of non-B cell epitopes and the same dataset of B cell epitopes (details in “Methods”). Two negative datasets of non-B cell epitopes, RANDPEP and IEDBNB, each including 62,730 peptide sequences, were used in these tenfold cross-validations. RANDPEP includes peptides with random amino acid sequences and IEDBNB includes peptides with reported negative B cell epitope assays obtained from IEDB (more details in “Methods”). RANDPEP and IEDBNB are available as supplementary data at the journal’s website (Supplementary Datasets 2 and 3). The performance of the BLAST approach to discriminate B cell epitopes from non-B cell epitopes was determined by computing the sensitivity (SE), specificity (SP), accuracy (ACC) and Matthews correlation coefficient (MCC) during tenfold cross-validations. As summarized in Table 1, the approach yielded an ACC of 72.54 ± 0.27% when considering as non-B cell epitopes those in the IEDBNB dataset and 81.32 ± 0.20% when considering the non-B cell epitopes in the RANDPEP dataset.
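The decision rule above (at least one ungapped hit with length ≥ 8 and identity ≥ 80%) can be sketched as a filter over BLAST tabular output. The sketch assumes the standard 12-column tabular format of BLAST+ (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function names and the example rows are hypothetical, and the paper does not state how its own BLAST output was parsed.

```python
def is_epitope_hit(pident, length, gapopen, min_identity=80.0, min_length=8):
    """True for an ungapped hit that meets the identity and length thresholds."""
    return gapopen == 0 and length >= min_length and pident >= min_identity

def predicted_epitope_queries(blast_tabular, min_identity=80.0, min_length=8):
    """Return the IDs of queries with at least one qualifying hit."""
    positives = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid = fields[0]
        pident, length, gapopen = float(fields[2]), int(fields[3]), int(fields[5])
        if is_epitope_hit(pident, length, gapopen, min_identity, min_length):
            positives.add(qseqid)
    return positives

# Hypothetical rows for illustration only (not real BLAST results).
rows = [
    ["pep1", "epi_A", "87.500", "8", "1", "0", "1", "8", "3", "10", "0.5", "20.0"],    # passes
    ["pep2", "epi_B", "100.000", "6", "0", "0", "1", "6", "1", "6", "1.2", "15.0"],    # too short
    ["pep3", "epi_C", "85.000", "20", "2", "1", "1", "19", "1", "20", "1e-5", "35.0"], # gapped
    ["pep4", "epi_D", "75.000", "12", "3", "0", "1", "12", "1", "12", "0.1", "18.0"],  # low identity
]
example_output = "\n".join("\t".join(r) for r in rows)
positives = predicted_epitope_queries(example_output)
```

Queries absent from `positives` are treated as predicted non-B cell epitopes, which is how true and false negatives arise in the evaluation.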

Table 1.

Performance of BLAST-based discrimination of B and non-B cell epitopes.

Negative dataset | % SE | % SP | % ACC | MCC
RANDPEP | 63.49 ± 0.43 | 99.15 ± 0.15 | 81.32 ± 0.20 | 0.67 ± 0.01
IEDBNB | 63.49 ± 0.43 | 81.59 ± 0.37 | 72.54 ± 0.27 | 0.46 ± 0.01

The table reports the sensitivity (% SE), specificity (% SP), accuracy (% ACC) and Matthews correlation coefficient (MCC) of BLAST-based discrimination of B cell epitopes in BEPIBD from non-B cell epitopes included in the RANDPEP and IEDBNB datasets. Values were obtained in tenfold cross-validation experiments.

We also evaluated the BLAST-based method in an independent dataset of B cell and non-B cell epitopes using BEPIBD as the target for BLAST searches and compared the predictions with those produced by BepiPred19,20, IBCE-EL28 and LBtope17. B cell epitopes in the independent dataset consisted of 503 linear B cell epitopes obtained from the BCIPEP database27 (BECIP dataset). As before, we also considered two negative datasets, each including 503 non-B cell epitopes, consisting of random peptide sequences (IRPEP dataset) and non-B cell epitopes from IEDB (INB dataset), respectively. These were obtained as described previously but did not overlap with any of the datasets previously described (BEPIBD, RANDPEP and IEDBNB). The global sequence similarity between the BECIP dataset and BEPIBD is 17.86 ± 5.66%, while the sequence similarity between the IRPEP and INB datasets and their counterparts, RANDPEP and IEDBNB, is 18.21 ± 4.70% and 18.92 ± 6.10%, respectively. The BECIP, IRPEP and INB datasets are available as supplementary data at the journal’s website (Supplementary Datasets 4, 5 and 6).

The results of the B cell epitope predictions using the BLAST-based method, BepiPred, LBtope and IBCE-EL in the BECIP independent dataset in combination with the two noted negative datasets are shown in Table 2. The performance measures achieved by the BLAST-based method in these independent tests were similar to, but somewhat lower than, those obtained in cross-validation (Table 1). Thus, the ACC achieved when considering random peptides as non-B cell epitopes (IRPEP peptides) was 74.85%, while in cross-validation it was 81.32 ± 0.20%. Similarly, the ACC obtained considering non-B cell epitopes in the INB dataset was 69.48%, lower than that obtained in cross-validation (72.54 ± 0.27%). The BLAST-based method outperformed all the competing methods in terms of ACC and MCC values in these tests, with one exception: IBCE-EL on the INB dataset (non-B cell epitopes from IEDB). In sum, this comparison underlines the ability of the BLAST-based method to predict B cell epitopes.

Table 2.

Comparative performance of B cell epitope prediction methods.

Negative dataset | Method/tool | % SE | % SP | % ACC | MCC
IRPEP | BLAST | 50.50 | 99.20 | 74.85 | 0.57
IRPEP | BepiPred | 37.60 | 65.01 | 51.35 | 0.03
IRPEP | LBtope | 42.21 | 76.34 | 59.32 | 0.20
IRPEP | IBCE-EL | 77.80 | 14.91 | 46.26 | −0.09
INB | BLAST | 50.50 | 88.47 | 69.48 | 0.42
INB | BepiPred | 37.60 | 66.60 | 52.14 | 0.04
INB | LBtope | 42.41 | 77.73 | 60.02 | 0.20
INB | IBCE-EL | 77.80 | 82.11 | 79.96 | 0.60

The table reports the sensitivity (% SE), specificity (% SP), accuracy (% ACC) and Matthews correlation coefficient (MCC) of the BLAST-based method, BepiPred, LBtope and IBCE-EL discriminating B cell epitopes in the BECIP dataset from non-B cell epitopes in two different datasets (IRPEP and INB). B cell epitope predictions with LBtope and IBCE-EL were carried out at the relevant web sites and BepiPred predictions were carried out using the standalone version of BepiPred (details in “Methods”).

BepiBlast web server

We have developed a web-based tool named BepiBlast to enable the prediction of linear B cell epitopes using BLAST. BepiBlast is available for free public use at http://imath.med.ucm.es/bepiblast. The BepiBlast interface, shown in Fig. 2a, has been designed for intuitive and easy use. The input data for BepiBlast can be one or several protein sequences in FASTA format, which can be pasted or uploaded to the server. After submission, BepiBlast runs BLASTP against BEPIBD and processes the BLAST output to identify B cell epitopes as query fragments from ungapped hits with identity ≥ 80% and length ≥ 8. These search criteria to identify B cell epitopes within protein queries (gaps, minimum identity and peptide length) can be modified by the user. Moreover, if the option “Only neutralizing” is selected, BepiBlast will only return B cell epitopes resulting from hits to neutralizing epitopes. The main output of BepiBlast (Fig. 2b) consists of a table listing all non-overlapping B cell epitopes with bit scores and predicted accessibility and flexibility, computed as indicated in “Methods”. Since BLAST searches often detect overlapping B cell epitope hits, BepiBlast uses ZB values, computed as indicated in “Methods”, to simplify the results and return non-overlapping B cell epitope cores. Likewise, BepiBlast uses ZB values to color the sequence residues on an RGB scale and visualize the predicted B cell epitopes in the query sequence. This visualization is shown if the “Graphics” option is selected. The output of BepiBlast also includes information on the BLAST hits that were processed to return the predicted B cell epitopes, including the source or the IEDB ID of the known epitope.

Figure 2.

BepiBlast web server. (a) BepiBlast interface. (b) Representative BepiBlast output obtained with default settings. The results shown were obtained for hemagglutinin from Influenza A virus (UniProt Id: P03437). The main BepiBlast result consists of a table displaying the following information (from left to right): peptide starting position; peptide ending position; predicted B cell epitope; bit score; accessibility value; and flexibility value.

Discussion

Determining the specific regions of a protein that can be recognized by antibodies, B cell epitopes, is of great practical interest. In fact, the primary aim of predicting B cell epitopes in protein sequences is to identify constituent fragments that can substitute for the entire protein to produce specific antibodies. In this sense, predicting linear B cell epitopes is of particular relevance, for they can be formulated as synthetic peptides that are suitable for antigen-specific antibody production. Currently, there are numerous methods and tools to predict linear B cell epitopes3–7. Given the essentially endless diversity of the BCR and antibodies, almost any peptide can be suitable for recognition and hence be a B cell epitope. Therefore, the most complex and recent B cell epitope prediction methods make use of machine learning (ML)-based models that are generated by training ML algorithms on feature data drawn from experimentally determined B cell epitopes and assumed non-B cell epitopes14–20. As a result of training, ML algorithms capture subtle patterns into a single model that serves to distinguish B cell epitopes from non-B cell epitopes. Unfortunately, these approaches suffer from the fact that we do not have bona fide sets of non-B cell epitopes. Consequently, ML algorithms are generally trained on random peptides14,19,31–33 or peptides with reported negative B cell epitope assays17,28,34. However, it is questionable that random peptides, or peptides with reported negative assays, are not or cannot be antigenic. Not surprisingly, independent benchmark comparisons of B cell epitope prediction methods show that ML-based approaches are only marginally better than simple amino acid propensity scales21–23. Given the noted limitations, in this work we explored an alternative B cell epitope prediction approach that only takes into consideration the large wealth of known B cell epitope sequences.

The IEDB, the largest repository of immune epitopes, currently includes more than 200,000 unique B cell epitope sequences (release 201,439). Therefore, we considered that a valid approach to predict B cell epitopes in protein sequences is to detect sequence similarities to individual known B cell epitopes using BLAST. To validate this approach, we gathered from IEDB a dataset of 62,730 known B cell epitopes to generate BLAST-formatted databases and considered as a B cell epitope any BLAST hit with length ≥ 8 and identity ≥ 80%. We showed that in tenfold cross-validation this BLAST-based method could distinguish known B cell epitopes from two types of assumed non-B cell epitopes with remarkable accuracy (Table 1). For example, the accuracy obtained considering non-B cell epitopes with reported negative assays is 72.54 ± 0.27%. We also showed that the accuracy of the BLAST-based method on an independent source of known B cell epitopes and two distinct datasets of non-B cell epitopes is above 69%, higher than that obtained with competing ML-based methods, such as BepiPred, IBCE-EL and LBtope (Table 2). There is, however, one exception. IBCE-EL achieved better accuracy than the BLAST method (79.96% vs 69.48%) when considering as non-B cell epitopes peptides with reported negative B cell epitope assays. However, it is worth noting that IBCE-EL models were trained on precisely such data. In other words, the testing dataset is a valid independent dataset to assess the accuracy of our method but not that of IBCE-EL.

Following the noted results, we developed a web-based tool, BepiBlast, enabling B cell epitope predictions in protein sequences using this BLAST-based method. Given its practical relevance, prediction of linear B cell epitopes has been tackled through numerous approaches, ranging from simple amino acid propensity scales to sophisticated models resulting from combining perturbation theory and machine learning35–37. Moreover, there are a number of tools to predict linear B cell epitopes that are available for free public use online (Table 3). In general, state-of-the-art tools for linear B cell epitope prediction implement alignment-free methods based on ML (Table 3). In fact, to our knowledge, BepiBlast is the only tool that implements an alignment-based module designed and validated for the specific task of predicting linear B cell epitopes. However, it is worth noting that alignment-based approaches, similar to those implemented by BepiBlast, have been used to identify similarity between antigens and to detect antigen cross-reactivity and/or molecular mimicry38,39. Relevant examples of tools that have been released to detect molecular mimicry using alignment-based approaches are EPITOPEDIA40 and CE-BLAST39.

Table 3.

Comparison of available web-based tools for predicting linear B cell epitopes.

Tool | Algorithm | Training dataset: B cell epitopes | Training dataset: non-B cell epitopes | Validation | URL | Reference
BepiBlast | BLAST | 62,730 | | X, I | http://imath.med.ucm.es/bepiblast/ |
Bceps | Support vector machine | 555 | 555 (a) | X, I, E | http://imath.med.ucm.es/bceps/ | 18
BepiPred 2.0^a | Random forest | 3542 | 36,785 | X, I, E | https://services.healthtech.dtu.dk/service.php?BepiPred-2.0 | 20
LBtope^b | Support vector machine | 14,876 | 23,321 (b) | X, I | https://webs.iiitd.edu.in/raghava/lbtope/ | 17
IBCE-EL | Random tree with boosting | 4440 | 5485 (b) | X, I | http://www.thegleelab.org/iBCE-EL/ | 28
DLBEpitope | Deep neural network | 22,012 | 201,563 (b) | X, I | http://ccb1.bmi.ac.cn:81/dlbepitope/index.php? | 15
ILBE | Random Forest | 4440 | 5485 (b) | X, I | http://kurata14.bio.kyutech.ac.jp/iLBE/ | 41
ABCPred | Neural network | 700 | 700 (a) | X, I | https://webs.iiitd.edu.in/raghava/abcpred/ | 14
BCPREDS | Support vector machine | 701 | 701 (a) | X, I, E | http://ailab.ist.psu.edu/bcpred/ | 32
SVMtrip | Support vector machine | 4925 | 4925 (b) | X | http://sysbio.unl.edu/SVMTriP/prediction.php | 16

For each tool, the table reports the underlying algorithm; the number of B cell and non-B cell epitopes used for model building; the method used for validation (X: cross-validation; I: independent dataset; E: case example); the URL of the tool; and the reference. The letter in parentheses indicates the type of non-B cell epitopes in the training dataset: (a) random peptide sequences; (b) peptide sequences with reported negative B cell epitope assays. ^a For BepiPred, B cell and non-B cell epitope figures correspond to antigen residues that, in the tertiary structure of antibody-antigen complexes, contact the antibody or not, respectively. ^b Data for the default model in LBtope.

Compared with other tools, BepiBlast stands out for relying on the largest collection of known B cell epitopes and on no non-B cell epitopes. The absence of non-B cell epitopes may limit the chance of overfitting, which particularly affects ML-based methods42. Unlike competing tools, BepiBlast can also report whether predicted B cell epitopes come from BLAST hits to neutralizing B cell epitopes, as well as the accessibility and flexibility of B cell epitopes. Antibodies generated against predicted linear B cell epitopes often fail to recognize the native protein, but this can be compensated for by selecting B cell epitopes with enhanced flexibility and solvent accessibility18,38,43,44.

Conclusions

We have shown that sequence similarity to available B cell epitope sequences is a valid and advantageous basis for predicting B cell epitopes in nominal antigens. We have enabled such predictions for free public use at BepiBlast.

Methods

B cell epitopes

Linear B cell epitopes were extracted from IEDB25,26. Only experimentally verified B cell epitopes with positive assays were considered and all sources were considered. An independent set of known linear B cell epitopes was downloaded from the BCIPEP database27. Only B cell epitopes with a size between 8 and 25 residues were considered and CD-HIT45 was used to reduce sequence redundancy, discarding sequences with identity ≥ 90%. B cell epitopes obtained from IEDB and BCIPEP were distinct.

Non-B cell epitopes

Two types of non-B cell epitopes were considered in this study. One type consisted of random peptide sequences generated using the amino acid composition of proteins in the SWISSPROT database29,30. The length distribution of these non-B cell epitopes was fixed to resemble that of known B cell epitopes obtained from IEDB. The other type consisted of peptides with negative antibody recognition assays and size between 8 and 25 residues that were obtained from IEDB. All non-B cell epitopes were subjected to sequence redundancy reduction using CD-HIT so that amino acid sequence identity was < 90%.
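The generation of random peptides with a given residue composition and length distribution can be sketched as below. This is an illustrative sketch only: the function name, the uniform placeholder composition and the short list of lengths are assumptions, whereas the study used the SWISSPROT composition and the lengths of the IEDB epitopes.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_peptides(n, composition, lengths, seed=0):
    """Draw n peptides whose residues follow `composition` (relative weights)
    and whose lengths are sampled from the empirical list `lengths`."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    residues = list(composition)
    weights = [composition[aa] for aa in residues]
    peptides = []
    for _ in range(n):
        length = rng.choice(lengths)  # resample an observed epitope length
        peptides.append("".join(rng.choices(residues, weights=weights, k=length)))
    return peptides

# Placeholder composition: uniform weights, NOT the real SWISSPROT frequencies.
composition = {aa: 1.0 for aa in AMINO_ACIDS}
# Hypothetical epitope lengths standing in for the IEDB length distribution.
epitope_lengths = [8, 9, 12, 12, 15, 20, 25]

decoys = random_peptides(5, composition, epitope_lengths)
```

Sampling lengths from the observed list, rather than from a fitted distribution, is one simple way to make the decoy lengths resemble those of the known epitopes.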

Sequence similarity analysis

Sequence similarity was analyzed after pairwise sequence alignments generated using the Needleman–Wunsch global alignment algorithm implemented by the needle application of the Biopython package46. As we described elsewhere47, to obtain a measure of average sequence similarity in a dataset, each sequence was aligned pairwise with every other sequence but not with itself (for a dataset with N sequences there will be N × (N − 1) alignments), identities were obtained for each alignment and the average identity was computed.
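The averaging procedure can be illustrated with a small self-contained Needleman–Wunsch implementation. This sketch does not reproduce the needle settings used in the study: the simple match/mismatch/linear-gap scores and the definition of identity as identical columns over alignment length are assumptions chosen to keep the example short.

```python
from itertools import permutations

def nw_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Percent identity of one optimal global alignment of a and b."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns and all columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns

def average_identity(seqs):
    """Mean identity over all N * (N - 1) ordered pairs of distinct sequences."""
    pairs = list(permutations(range(len(seqs)), 2))
    return sum(nw_identity(seqs[i], seqs[j]) for i, j in pairs) / len(pairs)

# Hypothetical toy sequences for illustration.
toy = ["ACDE", "ACDE", "ACDF"]
avg = average_identity(toy)
```

The quadratic number of alignments makes this approach slow for tens of thousands of sequences, so in practice a compiled aligner would be used.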

Evaluation of BLAST-based predictions of B cell epitopes

B cell epitope predictions were assessed after BLAST24 searches against a BLAST-formatted database consisting of B cell epitopes obtained from IEDB. Under this approach, any peptide sequence query with at least one ungapped hit to the database with length ≥ 8 and identity ≥ 80% was considered as a B cell epitope. BLAST-based predictions of B cell epitopes were evaluated under tenfold cross-validation, considering datasets of B cell and non-B cell epitopes with equal numbers of sequences. For each round of cross-validation, BLAST-formatted databases were generated upon 90% of all B cell epitope sequences and used as a target database for BLAST testing of the remaining 10% of B cell epitopes as well as 10% of non-B cell epitopes. B cell and non-B cell epitopes with at least one hit (ungapped, length ≥ 8 and identity ≥ 80%) to the target database were considered as true and false positives (TP and FP), respectively, while non-B cell and B cell epitopes with no hits were considered as true and false negatives (TN and FN), respectively. Sensitivity (SE), specificity (SP), accuracy (ACC) and the Matthews correlation coefficient (MCC) were computed using Eqs. (1), (2), (3) and (4), respectively.

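$$\mathrm{SE} = \frac{TP}{TP + FN} \times 100, \tag{1}$$
$$\mathrm{SP} = \frac{TN}{TN + FP} \times 100, \tag{2}$$
$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN} \times 100, \tag{3}$$
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TN + FN)(TP + FN)(TN + FP)(TP + FP)}}. \tag{4}$$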

These same criteria and parameters were used to evaluate BLAST-based B cell epitope predictions on independent datasets.

Prediction of linear B cell epitopes with freely available tools

For comparative analysis, linear B cell epitopes were predicted using the web-based tools IBCE-EL28 and LBtope17, available at http://www.thegleelab.org/iBCE-EL/ and https://webs.iiitd.edu.in/raghava/lbtope/peptide.php, respectively. LBtope predictions were carried out selecting the default model labeled as “LBtope_Variable (original dataset)” and peptides with probability higher than 0.6 were considered B cell epitopes, as suggested by the tool. IBCE-EL predictions were also carried out with default settings, considering peptides labeled by the tool as “BCE” as B cell epitopes. B cell epitopes were also predicted using a standalone version of BepiPred (BepiPred 2.0)19,20. BepiPred assigns a B cell epitope propensity to each residue, and the average of these values over a peptide was taken as its B cell epitope score. Peptide sequences with scores ≥ 0.5 were considered as B cell epitopes.
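The peptide-level call derived from per-residue propensities amounts to a mean followed by a threshold, as in this brief sketch. The function name and the score lists are hypothetical and the values are not actual BepiPred output.

```python
def mean_score_call(residue_scores, threshold=0.5):
    """Average per-residue propensities and compare the mean with the threshold."""
    mean = sum(residue_scores) / len(residue_scores)
    return mean, mean >= threshold

# Hypothetical per-residue scores for two peptides (not real BepiPred output).
mean_a, is_epitope_a = mean_score_call([0.4, 0.6, 0.7])
mean_b, is_epitope_b = mean_score_call([0.3, 0.5, 0.4])
```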

BLAST-based detection of B cell epitopes in protein sequences

B cell epitopes in protein query sequences were detected after BLAST searches to a database consisting of known linear B cell epitopes using the standalone version of BLAST with default settings. Protein query fragments from hits with length ≥ 8, identity ≥ 80% and no gaps were considered as B cell epitopes. Since BLAST searches often produce various overlapping hits, we devised a system to simplify the results and identify B cell epitope cores. To that end, the occurrence of protein residues in hits is first tallied up (B). Residues that are not included in any hit are assigned B = 0. Subsequently, normalized B values, ZB, are computed using Eq. (5).

$$Z_B = \frac{B - \mu_B}{\sigma_B}. \tag{5}$$

B is the tallied occurrence of a particular residue in B cell epitope hits, µB is the mean of B over all protein residues and σB is its standard deviation. ZB values were then used to identify B cell epitope cores as stretches of 8 residues or more (those matching overlapping B cell epitope hits) with ZB values ≥ those of flanking residues.
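The tally, the normalization of Eq. (5) and one possible reading of the core rule can be sketched as follows. Several points are assumptions rather than details given in the text: the population standard deviation is used, hit coordinates are taken as 1-based and inclusive, and a core is interpreted as a maximal run of covered residues with constant ZB that is higher than the residues on both sides. The function names and the example hits are hypothetical.

```python
from statistics import mean, pstdev

def tally_hits(protein_length, hits):
    """B: number of qualifying hits covering each residue."""
    b = [0] * protein_length
    for start, end in hits:  # 1-based, inclusive query coordinates
        for pos in range(start - 1, end):
            b[pos] += 1
    return b

def z_scores(b):
    """ZB = (B - mean) / standard deviation, per residue."""
    mu, sigma = mean(b), pstdev(b)
    if sigma == 0:
        return [0.0] * len(b)  # no variation: all residues equally covered
    return [(x - mu) / sigma for x in b]

def epitope_cores(z, b, min_len=8):
    """Plateaus of constant ZB, covered by hits, higher than both flanks."""
    cores, i, n = [], 0, len(z)
    while i < n:
        j = i
        while j + 1 < n and b[j + 1] == b[i]:
            j += 1  # extend the run of equal coverage
        left_ok = i == 0 or z[i - 1] < z[i]
        right_ok = j == n - 1 or z[j + 1] < z[i]
        if b[i] > 0 and left_ok and right_ok and j - i + 1 >= min_len:
            cores.append((i + 1, j + 1))  # report 1-based coordinates
        i = j + 1
    return cores

# Hypothetical overlapping hits on a 30-residue protein.
b = tally_hits(30, [(3, 14), (5, 16), (5, 14)])
z = z_scores(b)
cores = epitope_cores(z, b)
```

In this toy case the three overlapping hits collapse into a single core spanning residues 5 to 14, the stretch covered by all of them.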

Other procedures

B cell epitope scores were computed as bit scores upon BLAST hit alignments using the BLOSUM62 substitution matrix48. Relative solvent accessibility (RSA) and normalized B values—used as a measure of flexibility—per residue were predicted for the entire protein query sequences using NetSurfP49 and profBval50, respectively, and measures of epitope accessibility and epitope flexibility consisted of average values computed from the corresponding epitope residue values.

Web implementation

BLAST-based B cell epitope predictions were implemented for free public use on the Web using a Python CGI (Common Gateway Interface) script that executes BLAST searches on user-provided input data and returns the predicted B cell epitopes to the browser, along with epitope annotation information from IEDB. The front-end web interface was developed using Hyper Text Markup Language (HTML) in combination with Cascading Style Sheets (CSS) and JavaScript. Web page administration is done using Apache HTTP Server (https://httpd.apache.org).
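The step in which the server script launches a BLAST search can be sketched with the standard library. This is a hedged illustration and not the BepiBlast source: the function names, the file name `query.fasta` and the database name `bepibd` are hypothetical, and only the commonly used BLAST+ options `-query`, `-db` and `-outfmt` are assumed.

```python
import subprocess

def blastp_command(query_fasta, database, outfmt="6"):
    """Build the argument list for a blastp search with tabular output."""
    return ["blastp", "-query", query_fasta, "-db", database, "-outfmt", outfmt]

def run_blastp(query_fasta, database):
    """Run blastp and return its tabular output as text.
    Requires the BLAST+ executables to be installed and on PATH."""
    result = subprocess.run(blastp_command(query_fasta, database),
                            capture_output=True, text=True, check=True)
    return result.stdout

# Hypothetical file and database names, shown without running the search.
command = blastp_command("query.fasta", "bepibd")
```

Passing the arguments as a list, rather than as a single shell string, avoids shell interpretation of user-supplied file names, which matters for a public web service.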

Acknowledgements

We wish to thank Dr. Esther Lafuente for critical reading and helpful comments, and the CAM research agency for financial support through grant IND2020/BMD-17364 to PAR.

Author contributions

Conceptualization, P.A.R.; methodology, A.R.C., A.A.L., P.V.L. and P.A.R.; investigation: A.R.C. and P.A.R., writing-original draft: A.R.C. and P.A.R.; final writing and editing: A.R.C., A.A.L., P.V.L. and P.A.R. All authors have read and approved the final manuscript.

Data availability

All data generated or analysed during this study are included in this published article and its Supplementary information files.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-022-18021-1.

References

  • 1. Murphy K, Weaver C. Janeway’s Immunobiology. W.W. Norton & Company; 2016.
  • 2. Van Regenmortel MHV. What is a B-cell epitope? In: Schutkowski M, Reineke U, editors. Epitope Mapping Protocols, 2nd ed. Humana Press; 2009. p. 3–20. doi: 10.1007/978-1-59745-450-6_1.
  • 3. Sanchez-Trincado JL, Gomez-Perosanz M, Reche PA. Fundamentals and methods for T- and B-cell epitope prediction. J. Immunol. Res. 2017;2017:2680160. doi: 10.1155/2017/2680160.
  • 4. Ponomarenko JV, van Regenmortel MHV. B-cell epitope prediction. Struct. Bioinform. 2009;2:1096.
  • 5. Backert L, Kohlbacher O. Immunoinformatics and epitope prediction in the age of genomic medicine. Genome Med. 2015;7:1–12. doi: 10.1186/s13073-015-0245-0.
  • 6. Lundegaard C, Lund O, Keşmir C, Brunak S, Nielsen M. Modeling the adaptive immune system: Predictions and simulations. Bioinformatics. 2007;23:3265–3275. doi: 10.1093/bioinformatics/btm471.
  • 7. Greenbaum JA, et al. Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools. J. Mol. Recognit. 2007;20:75–82. doi: 10.1002/jmr.815.
  • 8. Hopp TP, Woods KR. Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. U. S. A. 1981;78:3824–3828. doi: 10.1073/pnas.78.6.3824.
  • 9. Karplus PA, Schulz GE. Prediction of chain flexibility in proteins. Naturwissenschaften. 1985;72:212–213. doi: 10.1007/BF01195768.
  • 10. Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 1982;157:105–132. doi: 10.1016/0022-2836(82)90515-0.
  • 11. Eisenberg D, Weiss RM, Terwilliger TC. The hydrophobic moment detects periodicity in protein hydrophobicity. Proc. Natl. Acad. Sci. U. S. A. 1984;81:140–144. doi: 10.1073/pnas.81.1.140.
  • 12. Emini EA, Hughes JV, Perlow DS, Boger J. Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J. Virol. 1985;55:836–839. doi: 10.1128/jvi.55.3.836-839.1985.
  • 13. Kolaskar AS, Tongaonkar PC. A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett. 1990;276:172–174. doi: 10.1016/0014-5793(90)80535-Q.
  • 14. Saha S, Raghava GPS. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins Struct. Funct. Bioinform. 2006;65:40–48. doi: 10.1002/prot.21078.
  • 15. Liu T, Shi K, Li W. Deep learning methods improve linear B-cell epitope prediction. BioData Min. 2020;13:1. doi: 10.1186/s13040-020-00211-0.
  • 16. Yao B, Zhang L, Liang S, Zhang C. SVMTriP: A method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity. PLoS One. 2012;7:e45152. doi: 10.1371/journal.pone.0045152.
  • 17. Singh H, Ansari HR, Raghava GPS. Improved method for linear B-cell epitope prediction using antigen’s primary sequence. PLoS One. 2013;8:e62216. doi: 10.1371/journal.pone.0062216.
  • 18. Ras-Carmona A, Pelaez-Prestel HF, Lafuente EM, Reche PA. BCEPS: A web server to predict linear B cell epitopes with enhanced immunogenicity and cross-reactivity. Cells. 2021;10:2744. doi: 10.3390/cells10102744.
  • 19. Larsen JEP, Lund O, Nielsen M. Improved method for predicting linear B-cell epitopes. Immunome Res. 2006;2:2. doi: 10.1186/1745-7580-2-2.
  • 20. Jespersen MC, Peters B, Nielsen M, Marcatili P. BepiPred-2.0: Improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res. 2017;45:W24–W29. doi: 10.1093/nar/gkx346.
  • 21. Blythe MJ, Flower DR. Benchmarking B cell epitope prediction: Underperformance of existing methods. Protein Sci. 2005;14:246–248. doi: 10.1110/ps.041059505.
  • 22. Galanis KA, et al. Linear B-cell epitope prediction for in silico vaccine design: A performance review of methods available via command-line interface. Int. J. Mol. Sci. 2021;22:3210. doi: 10.3390/ijms22063210.
  • 23. Raimondi D, Orlando G, Vranken WF, Moreau Y. Exploring the limitations of biophysical propensity scales coupled with machine learning for protein sequence analysis. Sci. Rep. 2019;9:16932. doi: 10.1038/s41598-019-53324-w.
  • 24. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
  • 25. Vita R, et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 2015;43:D405–D412. doi: 10.1093/nar/gku938.
  • 26. Vita R, et al. The immune epitope database (IEDB): 2018 update. Nucleic Acids Res. 2019;47:D339–D343. doi: 10.1093/nar/gky1006.
  • 27. Saha S, Bhasin M, Raghava GPS. Bcipep: A database of B-cell epitopes. BMC Genom. 2005;6:79. doi: 10.1186/1471-2164-6-79.
  • 28. Manavalan B, Govindaraj RG, Shin TH, Kim MO, Lee G. iBCE-EL: A new ensemble learning framework for improved linear B-cell epitope prediction. Front. Immunol. 2018;9:1695. doi: 10.3389/fimmu.2018.01695.
  • 29. Boutet E, et al. UniProtKB/Swiss-Prot, the manually annotated section of the UniProt knowledge base: How to use the entry view. Methods Mol. Biol. 2016;1374:23–54. doi: 10.1007/978-1-4939-3167-5_2.
  • 30. Poux S, et al. On expert curation and scalability: UniProtKB/Swiss-Prot as a case study. Bioinformatics. 2017;33:3454–3460. doi: 10.1093/bioinformatics/btx439.
  • 31. Sweredoski MJ, Baldi P. COBEpro: A novel system for predicting continuous B-cell epitopes. Protein Eng. Des. Sel. 2009;22:113–120. doi: 10.1093/protein/gzn075.
  • 32.El-Manzalawy Y, Dobbs D, Honavar V. Predicting linear B-cell epitopes using string kernels. J. Mol. Recognit. Interdiscip. J. 2008;21:243–255. doi: 10.1002/jmr.893. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Chen J, Liu H, Yang J, Chou K-C. Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids. 2007;33:423–428. doi: 10.1007/s00726-006-0485-9. [DOI] [PubMed] [Google Scholar]
  • 34.Lian Y, Ge M, Pan X-M. EPMLR: Sequence-based linear B-cell epitope prediction method using multiple linear regression. BMC Bioinform. 2014;15:414. doi: 10.1186/s12859-014-0414-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Martínez-Arzate SG, et al. PTML model for proteome mining of B-cell epitopes and theoretical-experimental study of Bm86 protein sequences from Colima, Mexico. J. Proteome Res. 2017;16:4093–4103. doi: 10.1021/acs.jproteome.7b00477. [DOI] [PubMed] [Google Scholar]
  • 36.González-Díaz H, Pérez-Montoto LG, Ubeira FM. Model for vaccine design by prediction of B-epitopes of IEDB given perturbations in peptide sequence, in vivo process, experimental techniques, and source or host organisms. J. Immunol. Res. 2014;2014:768515. doi: 10.1155/2014/768515. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Vázquez-Prieto S, Paniagua E, Solana H, Ubeira FM, González-Díaz H. A study of the Immune Epitope Database for some fungi species using network topological indices. Mol. Divers. 2017;21:713–718. doi: 10.1007/s11030-017-9749-4. [DOI] [PubMed] [Google Scholar]
  • 38.Reche PA. Potential cross-reactive immunity to SARS-CoV-2 from common human pathogens and vaccines. Front. Immunol. 2020;11:586984. doi: 10.3389/fimmu.2020.586984. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Qiu T, et al. CE-BLAST makes it possible to compute antigenic similarity for newly emerging pathogens. Nat. Commun. 2018;9:1772. doi: 10.1038/s41467-018-04171-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Balbin CA, et al. Epitopedia: Identifying molecular mimicry between pathogens and known immune epitopes. bioRxiv. 2022 doi: 10.1101/2021.08.26.457577. [DOI] [Google Scholar]
  • 41.Hasan MM, Khatun MS, Kurata H. iLBE for computational identification of linear B-cell epitopes by integrating sequence and evolutionary features. Genom. Proteom. Bioinform. 2020;18:593–600. doi: 10.1016/j.gpb.2019.04.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Charilaou P, Battat R. Machine learning models and over-fitting considerations. World J. Gastroenterol. 2022;28:605–607. doi: 10.3748/wjg.v28.i5.605. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Quinzo MJ, Lafuente EM, Zuluaga P, Flower DR, Reche PA. Computational assembly of a human Cytomegalovirus vaccine upon experimental epitope legacy. BMC Bioinform. 2019;20:476. doi: 10.1186/s12859-019-3052-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Alonso-Padilla J, Lafuente EM, Reche PA. Computer-aided design of an epitope-based vaccine against Epstein–Barr virus. J. Immunol. Res. 2017;2017:9363750. doi: 10.1155/2017/9363750. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Li W, Godzik A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–1659. doi: 10.1093/bioinformatics/btl158. [DOI] [PubMed] [Google Scholar]
  • 46.Cock PJA, et al. Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Ras-Carmona A, Gomez-Perosanz M, Reche PA. Prediction of unconventional protein secretion by exosomes. BMC Bioinform. 2021;22:333. doi: 10.1186/s12859-021-04219-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U. S. A. 1992;89:10915–10919. doi: 10.1073/pnas.89.22.10915. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Klausen MS, et al. NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning. Proteins Struct. Funct. Bioinform. 2019;87:520–527. doi: 10.1002/prot.25674. [DOI] [PubMed] [Google Scholar]
  • 50.Schlessinger A, Yachdav G, Rost B. PROFbval: Predict flexible and rigid residues in proteins. Bioinformatics. 2006;22:891–893. doi: 10.1093/bioinformatics/btl032. [DOI] [PubMed] [Google Scholar]

Data Availability Statement

All data generated or analysed during this study are included in this published article and its Supplementary information files.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
