Protein Science : A Publication of the Protein Society. 1999 Mar;8(3):614–624. doi: 10.1110/ps.8.3.614

Functional insights from structural predictions: analysis of the Escherichia coli genome.

L Rychlewski, B Zhang, A Godzik
PMCID: PMC2144289  PMID: 10091664

Abstract

Fold assignments for proteins from the Escherichia coli genome are carried out using two methods: BASIC, a profile-profile alignment algorithm recently tested on fold recognition benchmarks and on the Mycoplasma genitalium genome, and PSI-BLAST, the newest generation of the de facto standard in homology search algorithms. The fold assignments are followed by automated modeling, and the resulting three-dimensional models are analyzed for possible function prediction. Close to 30% of the proteins encoded in the E. coli genome can be recognized as homologous to a protein family with known structure. Most of these homologies (23% of the entire genome) can be recognized by both PSI-BLAST and BASIC, but the latter recognizes an additional 260 homologies. Previous estimates suggested that only 10-15% of E. coli proteins can be characterized this way. This dramatic increase in the number of recognized homologies between E. coli proteins and structurally characterized protein families is due partly to the rapid growth of the database of known protein structures, but mostly to significant improvements in prediction algorithms. Knowing a protein's structure adds a new dimension to our understanding of its function, and the predictions presented here can be used to predict function for uncharacterized proteins. Examples analyzed in more detail in this paper include the DPS protein, which protects DNA from oxidative damage (predicted to be homologous to ferritin, with an iron ion acting as a reducing agent), and the ahpC/tsa family of proteins, which provides resistance to various oxidizing agents (predicted to be homologous to glutathione peroxidase).
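
The coverage figures in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: the abstract does not state the total number of E. coli proteins, so the count of 4,288 protein-coding genes (the figure commonly quoted for the E. coli K-12 genome annotation) is an assumption, and the names `ASSUMED_PROTEIN_COUNT` and `fraction_of_genome` are hypothetical. Under that assumption, the 260 additional BASIC-only assignments correspond to about 6% of the genome, which together with the 23% found by both methods is consistent with the "close to 30%" total.

```python
# Assumption (not stated in the abstract): roughly 4,288 protein-coding
# genes in the E. coli K-12 genome annotation.
ASSUMED_PROTEIN_COUNT = 4288


def fraction_of_genome(n_proteins: int, total: int = ASSUMED_PROTEIN_COUNT) -> float:
    """Return the share of the genome represented by n_proteins."""
    return n_proteins / total


# Figures taken from the abstract.
found_by_both = 0.23      # recognized by both PSI-BLAST and BASIC
basic_only_count = 260    # additional homologies recognized by BASIC alone

basic_only = fraction_of_genome(basic_only_count)
combined = found_by_both + basic_only

print(f"BASIC-only share of genome: {basic_only:.1%}")
print(f"Combined coverage estimate: {combined:.1%}")
```

This estimate ignores any assignments made by PSI-BLAST alone, so it is a lower bound on the combined coverage under the stated assumption rather than a reproduction of the paper's own tally.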


