Protein Science: A Publication of the Protein Society. 1999 Apr;8(4):771–777. doi: 10.1110/ps.8.4.771

Genome analysis: Assigning protein coding regions to three-dimensional structures.

A A Salamov, M Suwa, C A Orengo, M B Swindells
PMCID: PMC2144302  PMID: 10211823

Abstract

We describe the results of a procedure for maximizing the number of sequences that can be reliably linked to a protein of known three-dimensional structure. Unlike other methods, which try to increase sensitivity through the use of fold-recognition software, our procedure uses only conventional sequence alignment tools, but applies them in a manner that significantly increases the number of relationships detected. We analyzed 11 genomes and found that, depending on the genome, between 23 and 32% of the ORFs had significant matches to proteins of known structure. In all cases, the aligned region consisted of either >100 residues or >50% of the smaller sequence. Slightly higher percentages could be attained if smaller motifs were also included. These percentages are significantly higher than those reported for most previous methods, even those that have a fold-recognition component. We survey the biochemical and structural characteristics of the most frequently occurring proteins, and discuss the extent to which alignment methods can realistically assign function to gene products.
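
The coverage criterion stated above (aligned region of >100 residues or >50% of the smaller sequence) can be illustrated with a short sketch. This is not the authors' pipeline. The function names and the input tuple layout are hypothetical, "smaller sequence" is assumed to mean the shorter of the ORF and the structure sequence, and the statistical significance test for each match is not specified in the abstract, so it is assumed to have been applied upstream.

```python
from typing import Iterable, Tuple

# Hypothetical record layout: (orf_id, aligned_len, orf_len, struct_len)
Hit = Tuple[str, int, int, int]


def passes_coverage(aligned_len: int, orf_len: int, struct_len: int) -> bool:
    """True if the aligned region is >100 residues or >50% of the shorter sequence.

    Assumption: "smaller sequence" is the shorter of the ORF and the
    sequence of the protein of known structure.
    """
    if min(aligned_len, orf_len, struct_len) <= 0:
        raise ValueError("all lengths must be positive")
    shorter = min(orf_len, struct_len)
    # Integer comparison avoids floating-point issues at exactly 50%.
    return aligned_len > 100 or 2 * aligned_len > shorter


def fraction_assigned(hits: Iterable[Hit], n_orfs: int) -> float:
    """Fraction of ORFs with at least one hit that meets the coverage criterion.

    Assumption: every hit passed in is already statistically significant.
    """
    if n_orfs <= 0:
        raise ValueError("n_orfs must be positive")
    assigned = {
        orf_id
        for orf_id, aligned_len, orf_len, struct_len in hits
        if passes_coverage(aligned_len, orf_len, struct_len)
    }
    return len(assigned) / n_orfs
```

Under these assumptions, a genome in which 23 to 32% of ORFs are linked to a known structure would give a `fraction_assigned` value between 0.23 and 0.32.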
