Protein Science: A Publication of the Protein Society
Protein Sci. 1999 May;8(5):1104–1115. doi: 10.1110/ps.8.5.1104

From fold predictions to function predictions: automation of functional site conservation analysis for functional genome predictions.

B Zhang, L Rychlewski, K Pawłowski, J S Fetrow, J Skolnick, A Godzik
PMCID: PMC2144342  PMID: 10338021

Abstract

A database of functional sites for proteins with known structures, SITE, is constructed and used in conjunction with a simple pattern-matching program, SiteMatch, to evaluate possible function conservation in a recently constructed database of fold predictions for Escherichia coli proteins (Rychlewski L et al., 1999, Protein Sci 8:614-624). In this and other prediction databases, fold predictions are based on algorithms that can recognize weak sequence similarities and putatively assign new proteins to already characterized protein families. It is not clear whether such sequence similarities arise from distant homology or from a general similarity of physicochemical features along the sequence. Leaving aside the important question of the nature of relationships within fold superfamilies, it is possible to assess whether function is conserved by examining the pattern of conservation of crucial functional residues. SITE consists of a multilevel function description based on structure annotations and structure analyses. In particular, active site residues, ligand binding residues, and patterns of hydrophobic residues on the protein surface are used to describe different functional features. SiteMatch is designed to check the conservation of residues involved in protein activity in alignments generated by any alignment method. Here, this procedure is used to study the conservation of functional features in alignments between protein sequences from the E. coli genome and their optimal structural templates. The optimal templates and their alignments were taken from the database of genomic structural predictions described in a previous publication (Rychlewski L et al., 1999, Protein Sci 8:614-624). An automated assessment of function conservation is used to analyze the relation between fold similarity and function similarity for a large number of fold predictions. For instance, it is shown that low-significance predictions with a high level of functional residue conservation can be identified and used to extend the sensitivity of fold prediction methods. Over 100 new fold/function predictions in this class were obtained for the E. coli genome. At the same time, about 30% of our previous fold predictions are not confirmed as function predictions, further highlighting the problem of function divergence within fold superfamilies.
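The kind of check SiteMatch performs can be illustrated with a minimal Python sketch. This is not the authors' program: the `FunctionalSite` record, the function names, the 0-based template numbering, the assumption that a higher significance score means a more confident fold prediction, and both cutoff values are hypothetical. The sketch maps annotated template positions through a pairwise alignment, counts how many functional residues are identical in the query, and then applies a rule in the spirit of the abstract: a low-significance fold prediction is kept when its functional site is conserved, while a confident fold prediction with a poorly conserved site is flagged as "fold only".

```python
from dataclasses import dataclass, field


@dataclass
class FunctionalSite:
    """A SITE-style functional feature (hypothetical record layout).

    residues maps a 0-based position in the ungapped template sequence
    to the one-letter residue expected at that position.
    """
    label: str
    residues: dict[int, str] = field(default_factory=dict)


def map_template_to_query(template_aln: str, query_aln: str) -> dict[int, str]:
    """Return {template position: aligned query residue or '-'} for a pairwise alignment."""
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned sequences must have the same length")
    mapping: dict[int, str] = {}
    t_pos = 0
    for t_char, q_char in zip(template_aln, query_aln):
        if t_char == "-":
            # Column is an insertion in the query; it has no template position.
            continue
        mapping[t_pos] = q_char
        t_pos += 1
    return mapping


def site_conservation(site: FunctionalSite, template_aln: str, query_aln: str) -> float:
    """Fraction of the site's annotated residues that are identical in the aligned query."""
    if not site.residues:
        raise ValueError("site has no annotated residues")
    mapping = map_template_to_query(template_aln, query_aln)
    matches = sum(1 for pos, aa in site.residues.items() if mapping.get(pos) == aa)
    return matches / len(site.residues)


def classify_prediction(significance: float, conservation: float,
                        sig_cutoff: float = 0.5, cons_cutoff: float = 0.8) -> str:
    """Hypothetical decision rule; assumes higher significance = more confident fold prediction."""
    confident_fold = significance >= sig_cutoff
    conserved_site = conservation >= cons_cutoff
    if confident_fold and conserved_site:
        return "fold and function"
    if confident_fold:
        return "fold only"  # fold predicted, but the functional site is not conserved
    if conserved_site:
        return "rescued"  # low-significance fold kept because the site is conserved
    return "rejected"


if __name__ == "__main__":
    # Toy alignment: the query has one insertion (K) and a F->Y substitution.
    site = FunctionalSite("toy active site", {1: "C", 4: "F", 6: "H"})
    cons = site_conservation(site, "ACD-EFGH", "ACDKEYGH")
    print(f"{cons:.2f}", classify_prediction(0.2, cons))
```

Requiring exact identity is the simplest possible criterion. For the hydrophobic surface patterns that SITE also records, a comparison by residue class rather than identity would likely be more appropriate; that extension is left out here.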

Selected References

These references are in PubMed. This may not be the complete list of references from this article.

  1. Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403–410. doi: 10.1016/S0022-2836(05)80360-2.
  2. Altschul S. F., Madden T. L., Schäffer A. A., Zhang J., Zhang Z., Miller W., Lipman D. J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997 Sep 1;25(17):3389–3402. doi: 10.1093/nar/25.17.3389.
  3. Bairoch A., Apweiler R. The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res. 1999 Jan 1;27(1):49–54. doi: 10.1093/nar/27.1.49.
  4. Bernstein F. C., Koetzle T. F., Williams G. J., Meyer E. F., Jr, Brice M. D., Rodgers J. R., Kennard O., Shimanouchi T., Tasumi M. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol. 1977 May 25;112(3):535–542. doi: 10.1016/s0022-2836(77)80200-3.
  5. Bork P., Gibson T. J. Applying motif and profile searches. Methods Enzymol. 1996;266:162–184. doi: 10.1016/s0076-6879(96)66013-3.
  6. Bowie J. U., Lüthy R., Eisenberg D. A method to identify protein sequences that fold into a known three-dimensional structure. Science. 1991 Jul 12;253(5016):164–170. doi: 10.1126/science.1853201.
  7. Fetrow J. S., Godzik A., Skolnick J. Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity. J Mol Biol. 1998 Oct 2;282(4):703–711. doi: 10.1006/jmbi.1998.2061.
  8. Fetrow J. S., Skolnick J. Method for prediction of protein function from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxins/thioredoxins and T1 ribonucleases. J Mol Biol. 1998 Sep 4;281(5):949–968. doi: 10.1006/jmbi.1998.1993.
  9. Fischer D., Eisenberg D. Assigning folds to the proteins encoded by the genome of Mycoplasma genitalium. Proc Natl Acad Sci U S A. 1997 Oct 28;94(22):11929–11934. doi: 10.1073/pnas.94.22.11929.
  10. Frishman D., Mewes H. W. Protein structural classes in five complete genomes. Nat Struct Biol. 1997 Aug;4(8):626–628. doi: 10.1038/nsb0897-626.
  11. Godzik A., Kolinski A., Skolnick J. Topology fingerprint approach to the inverse protein folding problem. J Mol Biol. 1992 Sep 5;227(1):227–238. doi: 10.1016/0022-2836(92)90693-e.
  12. Gribskov M., McLachlan A. D., Eisenberg D. Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci U S A. 1987 Jul;84(13):4355–4358. doi: 10.1073/pnas.84.13.4355.
  13. Hecht H. J., Sobek H., Haag T., Pfeifer O., van Pée K. H. The metal-ion-free oxidoreductase from Streptomyces aureofaciens has an alpha/beta hydrolase fold. Nat Struct Biol. 1994 Aug;1(8):532–537. doi: 10.1038/nsb0894-532.
  14. Hobohm U., Scharf M., Schneider R., Sander C. Selection of representative protein data sets. Protein Sci. 1992 Mar;1(3):409–417. doi: 10.1002/pro.5560010313.
  15. Huynen M., Dandekar T., Bork P. Differential genome analysis applied to the species-specific features of Helicobacter pylori. FEBS Lett. 1998 Apr 10;426(1):1–5. doi: 10.1016/s0014-5793(98)00276-2.
  16. Jaroszewski L., Rychlewski L., Zhang B., Godzik A. Fold prediction by a hierarchy of sequence, threading, and modeling methods. Protein Sci. 1998 Jun;7(6):1431–1440. doi: 10.1002/pro.5560070620.
  17. Jones D. T., Taylor W. R., Thornton J. M. A new approach to protein fold recognition. Nature. 1992 Jul 2;358(6381):86–89. doi: 10.1038/358086a0.
  18. Kanatani A., Masuda T., Shimoda T., Misoka F., Lin X. S., Yoshimoto T., Tsuru D. Protease II from Escherichia coli: sequencing and expression of the enzyme gene and characterization of the expressed enzyme. J Biochem. 1991 Sep;110(3):315–320. doi: 10.1093/oxfordjournals.jbchem.a123577.
  19. Laskowski R. A., Hutchinson E. G., Michie A. D., Wallace A. C., Jones M. L., Thornton J. M. PDBsum: a Web-based database of summaries and analyses of all PDB structures. Trends Biochem Sci. 1997 Dec;22(12):488–490. doi: 10.1016/s0968-0004(97)01140-7.
  20. Medrano F. J., Alonso J., García J. L., Romero A., Bode W., Gomis-Rüth F. X. Structure of proline iminopeptidase from Xanthomonas campestris pv. citri: a prototype for the prolyl oligopeptidase family. EMBO J. 1998 Jan 2;17(1):1–9. doi: 10.1093/emboj/17.1.1.
  21. Murzin A. G. How far divergent evolution goes in proteins. Curr Opin Struct Biol. 1998 Jun;8(3):380–387. doi: 10.1016/s0959-440x(98)80073-0.
  22. Pawłowski K., Bierzyński A., Godzik A. Structural diversity in a family of homologous proteins. J Mol Biol. 1996 May 3;258(2):349–366. doi: 10.1006/jmbi.1996.0255.
  23. Pearson W. R., Miller W. Dynamic programming algorithms for biological sequence comparison. Methods Enzymol. 1992;210:575–601. doi: 10.1016/0076-6879(92)10029-d.
  24. Russell R. B., Copley R. R., Barton G. J. Protein fold recognition by mapping predicted secondary structures. J Mol Biol. 1996 Jun 14;259(3):349–365. doi: 10.1006/jmbi.1996.0325.
  25. Rychlewski L., Zhang B., Godzik A. Fold and function predictions for Mycoplasma genitalium proteins. Fold Des. 1998;3(4):229–238. doi: 10.1016/S1359-0278(98)00034-0.
  26. Rychlewski L., Zhang B., Godzik A. Functional insights from structural predictions: analysis of the Escherichia coli genome. Protein Sci. 1999 Mar;8(3):614–624. doi: 10.1110/ps.8.3.614.
  27. Tesmer J. J., Klem T. J., Deras M. L., Davisson V. J., Smith J. L. The crystal structure of GMP synthetase reveals a novel catalytic triad and is a structural paradigm for two enzyme families. Nat Struct Biol. 1996 Jan;3(1):74–86. doi: 10.1038/nsb0196-74.

Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society