Protein Science. 2000 Feb;9(2):232–241. doi: 10.1110/ps.9.2.232

Comparison of sequence profiles. Strategies for structural predictions using sequence information.

L Rychlewski, L Jaroszewski, W Li, A Godzik
PMCID: PMC2144550  PMID: 10716175

Abstract

Distant homologies between proteins are often discovered only after the three-dimensional structures of both proteins are solved. The sequence divergence for such proteins can be so large that simple comparison of their sequences fails to identify any similarity. A new generation of sensitive alignment tools uses averaged sequences of entire homologous families (profiles) to detect such homologies. Several algorithms, including the newest generation of BLAST algorithms and BASIC, an algorithm used in our group to assign fold predictions for proteins from several genomes, are compared to each other on a large set of structurally similar proteins with little sequence similarity. Proteins in the benchmark are classified according to the level of their similarity, which allows us to demonstrate that most of the improvement of the new algorithms is achieved for proteins with strong functional similarities, with almost no progress in recognizing distant fold similarities. It is also shown that the details of profile calculation strongly influence its sensitivity in recognizing distant homologies. The most important choice is how to include information from diverging members of the family so as to avoid generating false predictions while still accounting for the entire sequence divergence within a family. PSI-BLAST takes a conservative approach, deriving a profile from core members of the family and providing a solid improvement with almost no false predictions. BASIC strives for better sensitivity by increasing the weight of divergent family members, paying the price in lower reliability. A new FFAS algorithm introduced here uses a new procedure for profile generation that takes into account all the relations within the family and matches BASIC sensitivity with PSI-BLAST-like reliability.
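
To make the idea of a profile and of member weighting concrete, the following is a minimal Python sketch. It is a generic illustration only, not the PSI-BLAST, BASIC, or FFAS procedure, whose details are not given in the abstract. The function names (`build_profile`, `column_score`), the toy alignment, the weights, and the dot-product column score are all assumptions chosen for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, weights=None):
    """Turn a gapped multiple alignment into per-column residue frequencies.

    alignment: list of equal-length strings (hypothetical toy input).
    weights:   one non-negative weight per sequence; uniform if omitted.
    Returns a list with one {residue: frequency} dict per column.
    """
    if weights is None:
        weights = [1.0] * len(alignment)
    length = len(alignment[0])
    profile = []
    for col in range(length):
        counts = dict.fromkeys(AMINO_ACIDS, 0.0)
        total = 0.0
        for seq, w in zip(alignment, weights):
            residue = seq[col]
            if residue in counts:  # gaps and unknown symbols are skipped
                counts[residue] += w
                total += w
        if total > 0:
            counts = {aa: c / total for aa, c in counts.items()}
        profile.append(counts)
    return profile

def column_score(col_a, col_b):
    """Similarity of two profile columns as a plain dot product (an assumed choice)."""
    return sum(col_a[aa] * col_b[aa] for aa in AMINO_ACIDS)

# Toy family: the third sequence plays the role of a divergent member.
family = ["ACD", "ACE", "GCE"]
uniform = build_profile(family)
upweighted = build_profile(family, weights=[1.0, 1.0, 2.0])
```

With uniform weights the first column is dominated by the two similar sequences; doubling the weight of the third sequence shifts that column toward the divergent residue. This mirrors, in a very simplified way, the trade-off the abstract describes between conservative profiles and profiles that emphasize divergent family members.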


