Abstract
Most recent protein secondary structure prediction methods use sequence alignments to improve the prediction quality. We investigate the relationship between the location of secondary structural elements, gaps, and variable residue positions in multiple sequence alignments. We further investigate how these relationships compare with those found in structurally aligned protein families. We show how such associations may be used to improve the quality of prediction of the secondary structure elements, using the Quadratic-Logistic method with profiles. Furthermore, we analyze the extent to which the number of homologous sequences influences the quality of prediction. The analysis of variable residue positions shows that, surprisingly, helical regions exhibit greater variability than do coil regions, which are generally thought to be the most common secondary structure elements in loops. However, the correlation between variability and the presence of helices does not significantly improve prediction quality. Gaps are a distinct signal for coil regions. Increasing the coil propensity for those residues occurring in gap regions enhances the overall prediction quality. Prediction accuracy increases initially with the number of homologues, but changes negligibly once the number of homologues exceeds about 14. The alignment quality affects the prediction more than other factors; hence, a careful selection and alignment of even a small number of homologues can lead to significant improvements in prediction accuracy.
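As an illustration of the gap signal described above, the following is a minimal sketch, not the Quadratic-Logistic method itself. It assumes that per-column scores for helix (H), strand (E), and coil (C) are already available from some predictor, and it raises the coil score in proportion to the fraction of aligned sequences that have a gap at that column. The function names, the `bonus` weight, and the toy scores are hypothetical choices made only for this example.

```python
from typing import Dict, List

GAP_CHARS = {"-", "."}


def gap_fraction(alignment: List[str]) -> List[float]:
    """Fraction of sequences that have a gap at each alignment column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        sum(row[col] in GAP_CHARS for row in alignment) / len(alignment)
        for col in range(length)
    ]


def boost_coil(
    scores: List[Dict[str, float]],
    gaps: List[float],
    bonus: float = 0.5,  # hypothetical weight, not taken from the article
) -> List[Dict[str, float]]:
    """Return a copy of the scores with the coil score raised at gapped columns."""
    adjusted = []
    for column_scores, frac in zip(scores, gaps):
        updated = dict(column_scores)
        if frac > 0.0:
            updated["C"] += bonus * frac
        adjusted.append(updated)
    return adjusted


def predict(scores: List[Dict[str, float]]) -> str:
    """Pick the highest-scoring state (H, E, or C) at each column."""
    return "".join(max(s, key=s.get) for s in scores)


# Toy alignment: the first row is the query, the others are homologues.
alignment = ["ACDEFG", "AC--FG", "ACD-FG"]
# Toy scores that favour helix everywhere before the gap adjustment.
scores = [{"H": 0.5, "E": 0.1, "C": 0.3} for _ in range(6)]

gaps = gap_fraction(alignment)
before = predict(scores)
after = predict(boost_coil(scores, gaps))
```

In this toy case only the column where two of the three sequences are gapped switches from helix to coil; the column with a single gap does not, because the bonus scales with the gap fraction. How strongly to weight the gap signal is a tuning decision that this sketch leaves open.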
