Protein Science : A Publication of the Protein Society. 1995 Aug;4(8):1587–1595. doi: 10.1002/pro.5560040817

Finding flexible patterns in unaligned protein sequences.

I Jonassen, J F Collins, D G Higgins
PMCID: PMC2143188  PMID: 8520485

Abstract

We present a new method for identifying conserved patterns in a set of unaligned, related protein sequences. It can discover patterns of a quite general form, allowing both ambiguous positions and variable-length wildcard regions. The user defines a class of patterns (e.g., the degree of ambiguity allowed and the length and number of gaps), and the method is then guaranteed to find the conserved patterns in this class that score highest according to a defined significance measure. Identified patterns may be refined using one of two new algorithms. We also present a new (nonstatistical) significance measure for flexible patterns. The method is shown to recover known motifs for PROSITE families and is also applied to some recently described families from the literature.
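To make the notion of a flexible pattern concrete, the following is a minimal sketch of how such a pattern could be matched against unaligned sequences. It is not the discovery method described in the abstract. It assumes a PROSITE-like notation in which `[DE]` is an ambiguous position and `x(2,4)` is a wildcard region of two to four residues. The function names `pattern_to_regex` and `support`, and the toy sequences, are hypothetical and chosen only for illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a PROSITE-like pattern (assumed notation) into a regex string.

    Elements are separated by '-':
      A         a single fixed residue
      [DE]      an ambiguous position (any one of the listed residues)
      x         a wildcard of length 1
      x(3)      a wildcard of fixed length 3
      x(2,4)    a variable-length wildcard of 2 to 4 residues
    """
    parts = []
    for element in pattern.split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if wildcard:
            low = wildcard.group(1) or "1"
            high = wildcard.group(2) or low
            parts.append(".{%s,%s}" % (low, high))
        elif re.fullmatch(r"[A-Z]|\[[A-Z]+\]", element):
            # Fixed residues and ambiguity groups are already valid regex.
            parts.append(element)
        else:
            raise ValueError("unrecognized pattern element: %r" % element)
    return "".join(parts)


def support(pattern, sequences):
    """Count how many sequences contain at least one match of the pattern."""
    regex = re.compile(pattern_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq))


# Toy, made-up sequences (not real proteins).
toy_sequences = ["MKCAADHL", "CGGGGEHK", "CAH", "ACDHK"]
print(pattern_to_regex("C-x(2,4)-[DE]-H"))
print(support("C-x(2,4)-[DE]-H", toy_sequences))
```

In this sketch, a pattern counts as conserved across a set when its support is high. How candidate patterns are enumerated and scored for significance is the subject of the paper itself and is not reproduced here.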



Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society
