J Mol Recognit. 2008 May 22;21(4):243–255. doi: 10.1002/jmr.893

Predicting linear B‐cell epitopes using string kernels

Yasser EL‐Manzalawy 1,2,4,5, Drena Dobbs 3,4,5, Vasant Honavar 1,2,3,5
PMCID: PMC2683948  NIHMSID: NIHMS67493  PMID: 18496882

Abstract

The identification and characterization of B‐cell epitopes play an important role in vaccine design, immunodiagnostic tests, and antibody production. Therefore, computational tools for reliably predicting linear B‐cell epitopes are highly desirable. We evaluated Support Vector Machine (SVM) classifiers trained with five different kernel methods, using fivefold cross‐validation on a homology‐reduced data set of 701 linear B‐cell epitopes extracted from the Bcipep database and 701 non‐epitopes randomly extracted from SwissProt sequences. Based on the results of our computational experiments, we propose BCPred, a novel method for predicting linear B‐cell epitopes using the subsequence kernel. We show that BCPred (AUC = 0.758) outperforms 11 SVM‐based classifiers developed and evaluated in our experiments, as well as our implementation of AAP (AUC = 0.7), a recently proposed method for predicting linear B‐cell epitopes using amino acid pair antigenicity. Furthermore, we compared BCPred with AAP and ABCPred, a method that uses recurrent neural networks, using two data sets of unique B‐cell epitopes that had been previously used to evaluate ABCPred. Analysis of the data sets used and the results of this comparison show that comparisons of B‐cell epitope prediction methods based on data sets of unique B‐cell epitopes are likely to yield overly optimistic estimates of the performance of the evaluated methods. This argues for the use of carefully homology‐reduced data sets in comparing B‐cell epitope prediction methods, to avoid misleading conclusions about how different methods compare to each other. Our homology‐reduced data set and implementations of BCPred as well as the AAP method are publicly available through our web‐based server, BCPREDS, at: http://ailab.cs.iastate.edu/bcpreds/. Copyright © 2008 John Wiley & Sons, Ltd.

Keywords: linear B‐cell epitope, epitope mapping, epitope prediction
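
As a rough illustration of the subsequence kernel named in the abstract, the sketch below computes the standard gap‐weighted subsequence kernel between two strings by dynamic programming, plus a cosine‐normalized Gram matrix of the kind an SVM with a precomputed kernel could consume. It is not the BCPred implementation: the function names, the toy peptides, and the parameter values `n = 2` and `lam = 0.5` are assumptions chosen for illustration, and the abstract does not state which settings the authors used.

```python
from math import sqrt


def subsequence_kernel(s, t, n, lam):
    """Gap-weighted subsequence kernel of order n with decay lam in (0, 1].

    Each common subsequence of length n contributes lam ** (span in s + span in t),
    where a span runs from the first to the last matched character, inclusive.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if min(len(s), len(t)) < n:
        return 0.0
    rows, cols = len(s) + 1, len(t) + 1  # row 0 and column 0 are padding

    # ending[i][j]: weighted count of common subsequences of the current
    # length whose last characters are s[i-1] and t[j-1].
    ending = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if s[i - 1] == t[j - 1]:
                ending[i][j] = lam * lam

    for _ in range(2, n + 1):
        # prefix[i][j]: sum of ending[k][l] over k <= i, l <= j, with one
        # factor of lam for every position skipped in either string.
        prefix = [[0.0] * cols for _ in range(rows)]
        for i in range(1, rows):
            for j in range(1, cols):
                prefix[i][j] = (ending[i][j]
                                + lam * prefix[i - 1][j]
                                + lam * prefix[i][j - 1]
                                - lam * lam * prefix[i - 1][j - 1])
        # Extend every shorter subsequence by one matching character pair.
        longer = [[0.0] * cols for _ in range(rows)]
        for i in range(1, rows):
            for j in range(1, cols):
                if s[i - 1] == t[j - 1]:
                    longer[i][j] = lam * lam * prefix[i - 1][j - 1]
        ending = longer

    return sum(map(sum, ending))


def normalized_kernel(s, t, n, lam):
    """Cosine-normalized kernel, so every string has self-similarity 1."""
    k_ss = subsequence_kernel(s, s, n, lam)
    k_tt = subsequence_kernel(t, t, n, lam)
    if k_ss == 0.0 or k_tt == 0.0:
        return 0.0
    return subsequence_kernel(s, t, n, lam) / sqrt(k_ss * k_tt)


def gram_matrix(peptides, n, lam):
    """Pairwise normalized kernel values for a list of peptide strings."""
    return [[normalized_kernel(a, b, n, lam) for b in peptides] for a in peptides]


if __name__ == "__main__":
    # Hypothetical toy peptides, not taken from the paper's data set.
    toy_peptides = ["ACDEFG", "ACDQFG", "WYWYWY"]
    for row in gram_matrix(toy_peptides, n=2, lam=0.5):
        print([round(value, 3) for value in row])
```

The resulting matrix could be passed to any SVM implementation that accepts a precomputed kernel. The quadratic‐time dynamic program is used here for clarity; it makes no attempt at the approximations that speed up the kernel on longer sequences.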

Articles from Journal of Molecular Recognition are provided here courtesy of Wiley
