Protein Science : A Publication of the Protein Society. 1998 Jun;7(6):1431–1440. doi: 10.1002/pro.5560070620

Fold prediction by a hierarchy of sequence, threading, and modeling methods.

L Jaroszewski, L Rychlewski, B Zhang, A Godzik
PMCID: PMC2144032  PMID: 9655348

Abstract

Several fold recognition algorithms are compared in terms of prediction accuracy and significance. On standard benchmarks, hybrid methods, which combine scoring based on sequence-sequence and sequence-structure matching, surpass both sequence and threading methods in the number of accurate predictions. However, sequence similarity contributes most to the prediction accuracy. This strongly argues that most examples of apparently nonhomologous proteins with similar folds are actually related by evolution. While disappointing from the perspective of the fundamental understanding of protein folding, this adds new significance to fold recognition methods as a possible first step in function prediction. Although hybrid methods are more accurate at fold prediction than either the sequence or threading methods, each method is correct in some cases where the others fail. This partly reflects the different perspectives on the sequence/structure relationship embedded in the various methods. To combine predictions from different methods, the significance of predictions is estimated for every method. With such estimates, it is possible to develop a "jury" method whose accuracy is higher than that of any single method. Finally, building full three-dimensional models for all top predictions helps to eliminate false positives in which alignments that are optimal at the level of one-dimensional sequences lead to unresolvable steric conflicts in the full three-dimensional models.
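The abstract does not specify how the "jury" combines predictions, so the following is only a minimal sketch of one way such a scheme could work. It assumes that each method's raw scores over a fold library can be put on a common significance scale by converting them to z-scores, that higher scores are better, and that the jury simply takes the top hit with the highest z-score. The function names, method names, fold names, and scores are all hypothetical and are not taken from the article.

```python
from statistics import mean, pstdev


def z_scores(raw_scores):
    """Convert one method's raw scores over a fold library into z-scores.

    Assumption: a z-score is used here as a simple stand-in for a
    significance estimate; the article may use a different estimate.
    """
    values = list(raw_scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All folds scored identically: no hit stands out.
        return {fold: 0.0 for fold in raw_scores}
    return {fold: (score - mu) / sigma for fold, score in raw_scores.items()}


def jury_prediction(scores_by_method):
    """Return (fold, method, z) for the most significant top hit.

    Each method nominates its best-scoring fold; the jury keeps the
    nomination whose z-score is highest across all methods.
    """
    best = None
    for method, raw in scores_by_method.items():
        z = z_scores(raw)
        top_fold = max(z, key=z.get)
        if best is None or z[top_fold] > best[2]:
            best = (top_fold, method, z[top_fold])
    return best


# Made-up scores for illustration only (higher = better match).
example = {
    "sequence": {"foldA": 10.0, "foldB": 9.0, "foldC": 8.0, "foldD": 9.5},
    "threading": {"foldA": 1.0, "foldB": 1.2, "foldC": 9.0, "foldD": 0.8},
}

fold, method, z = jury_prediction(example)
print(f"jury picks {fold} (from {method}, z = {z:.2f})")
```

In this toy input the sequence method's best hit barely stands out from its other scores, while the threading method's best hit is a clear outlier, so the jury follows the threading prediction. Comparing raw scores directly would not work, because different methods score on different scales.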

