Protein Science : A Publication of the Protein Society. 1996 Oct;5(10):1991–1999. doi: 10.1002/pro.5560051005

Construction and analysis of a profile library characterizing groups of structurally known proteins.

A Ogiwara, I Uchiyama, T Takagi, M Kanehisa
PMCID: PMC2143267  PMID: 8897599

Abstract

A new sequence motif library, StrProf, was constructed to characterize groups of related proteins in the PDB three-dimensional structure database. A representative member of each protein family was identified by cross-referencing the PDB with the PIR superfamily classification, and for each representative a group of related sequences was collected by a BLAST search against the nonredundant protein sequence database. For every group, motifs were identified automatically, using the conservation and uniqueness of pentapeptide patterns as criteria together with a dual dynamic programming algorithm. In the StrProf library, motifs are represented by profile matrices rather than consensus patterns to allow more flexible searches. Another dynamic programming algorithm was then developed to search this motif library. When the computationally derived StrProf was compared with PROSITE, a manually derived motif library that represents motifs as best consensus patterns, the numbers of identified patterns were comparable. StrProf missed about one third of the PROSITE motifs, but it also contained new motifs that are absent from PROSITE. The new library was incorporated into SMART (Sequence Motif Analysis and Retrieval Tool), a computer tool designed to help search for and annotate biologically important sites in an unknown protein sequence. The client program is available free of charge through the Internet.
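
To illustrate why a profile matrix allows a more flexible search than a consensus pattern, the following is a minimal sketch and not the StrProf implementation. It assumes a set of already aligned pentapeptide occurrences, a uniform amino acid background, and simple pseudocounts, and it scans a sequence with an ungapped sliding window instead of the dynamic programming search described in the abstract. The function names `build_profile` and `best_hit` and the example segments are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_segments, pseudocount=1.0):
    """Build a log-odds profile (one dict per column) from equal-length segments.

    Assumption: uniform background frequencies; real libraries typically use
    observed database frequencies and more elaborate weighting.
    """
    length = len(aligned_segments[0])
    if any(len(seg) != length for seg in aligned_segments):
        raise ValueError("all segments must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_segments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seg[col] for seg in aligned_segments)
        column = {
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        profile.append(column)
    return profile


def best_hit(profile, sequence):
    """Return (score, start) of the best ungapped window, or None if too short."""
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(profile[i].get(aa, 0.0) for i, aa in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical aligned pentapeptide occurrences from one protein group.
segments = ["GKSTL", "GKTTL", "GKSSL", "GKSTI"]
profile = build_profile(segments)

hit = best_hit(profile, "MAAGKSTLVV")      # contains a segment seen in training
variant = best_hit(profile, "MAAGKTSIVV")  # contains an unseen combination
```

In this sketch the second query contains `GKTSI`, which matches none of the input segments exactly but still receives a positive score, because each of its residues was observed in the corresponding column. A strict consensus string derived from the same segments would reject it.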
