Protein Science : A Publication of the Protein Society. 2000 Jun;9(6):1162–1176. doi: 10.1110/ps.9.6.1162

Cascaded multiple classifiers for secondary structure prediction.

M Ouali, R D King
PMCID: PMC2144653  PMID: 10892809

Abstract

We describe a new classifier for protein secondary structure prediction, formed by cascading different types of classifiers that use neural networks and linear discrimination. The new classifier achieves an accuracy of 76.7% (assessed by a rigorous full jackknife procedure) on a new nonredundant dataset of 496 nonhomologous sequences (obtained from G.J. Barton and J.A. Cuff). This database was designed specifically to train and test protein secondary structure prediction methods, and it uses a more stringent definition of homologous sequence than previous studies. We show that it is possible to design classifiers that discriminate well among the three classes (H, E, C), with an accuracy of up to 78% for beta-strands, using only a local window and resampling techniques. This indicates that the importance of long-range interactions for the prediction of beta-strands has probably been overestimated in the past.
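As an illustration of the evaluation vocabulary used in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that a "local window" is a fixed-length stretch of residues centred on the position being predicted, that a "full jackknife" means holding out one sequence at a time, and that accuracy is measured per residue over the three classes H, E and C. The function names, the padding symbol and the toy strings are hypothetical.

```python
from collections import Counter

PAD = "-"  # hypothetical padding symbol for positions beyond the chain ends


def local_windows(sequence, half_width):
    """Return one fixed-length window of residues centred on each position."""
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def jackknife_splits(n_items):
    """Yield (train_indices, test_index) pairs, holding out one item at a time."""
    for held_out in range(n_items):
        train = [i for i in range(n_items) if i != held_out]
        yield train, held_out


def three_state_accuracy(observed, predicted, classes="HEC"):
    """Return overall per-residue accuracy and the fraction correct per observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    totals = Counter(observed)
    # Count correctly predicted residues, keyed by their observed class.
    correct = Counter(o for o, p in zip(observed, predicted) if o == p)
    overall = sum(correct.values()) / len(observed) if observed else 0.0
    per_class = {c: correct[c] / totals[c] for c in classes if totals[c]}
    return overall, per_class


if __name__ == "__main__":
    print(local_windows("ACDE", 1))
    print(list(jackknife_splits(3)))
    print(three_state_accuracy("HHEEC", "HHECC"))
```

In a real experiment each training split would be used to fit the classifiers before scoring the held-out sequence; the sketch only shows how the windows, the splits and the per-class figures could be organised.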
