Protein Science : A Publication of the Protein Society. 1994 Jan;3(1):139–146. doi: 10.1002/pro.5560030118

Improving the sensitivity of the sequence profile method.

R Lüthy, I Xenarios, P Bucher
PMCID: PMC2142471  PMID: 7511453

Abstract

The sequence profile method (Gribskov M, McLachlan AD, Eisenberg D, 1987, Proc Natl Acad Sci USA 84:4355-4358) is a powerful tool for detecting distant relationships between amino acid sequences. A profile is a table of position-specific scores and gap penalties that provides a generalized description of a protein motif; it can be used in place of an individual sequence for sequence alignments and database searches. A sequence profile is derived from a multiple sequence alignment. We have found 2 ways to improve the sensitivity of sequence profiles: (1) Sequence weights: Assigning an individual weight to each sequence avoids bias toward closely related sequences. These weights are assigned automatically, based on the distances between the sequences, using a published procedure (Sibbald PR, Argos P, 1990, J Mol Biol 216:813-818). (2) Amino acid substitution table: In addition to the alignment, constructing a profile requires an amino acid substitution table. We have found that in some cases a newer table, the BLOSUM45 table (Henikoff S, Henikoff JG, 1992, Proc Natl Acad Sci USA 89:10915-10919), is more sensitive than the original Dayhoff table or the modified Dayhoff table used in the current implementation. Profiles derived by the improved method are more sensitive and selective in a number of cases where previous methods failed to completely separate true members from false positives.
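
The two ingredients named in the abstract, sequence weights and a substitution table, can be illustrated with a small sketch. It is not the authors' implementation. The weighting function is a simplified Monte Carlo scheme in the spirit of the distance-based procedure the abstract cites: random probe sequences are drawn column by column from the residues observed in the alignment, and each probe credits the aligned sequence nearest to it. The names `voronoi_weights`, `profile_column_scores`, and `toy_subst` are hypothetical, the alignment is invented, and the match/mismatch table is only a stand-in for a real table such as BLOSUM45. Gaps and gap penalties are left out.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def voronoi_weights(alignment, n_samples=2000, seed=0):
    """Monte Carlo sequence weights (simplified sketch, assumes no gaps).

    Each random probe is built by picking, for every column, one of the
    residues observed in that column. The probe gives one unit of credit to
    its nearest aligned sequence, split equally among ties. Weights sum to 1.
    """
    rng = random.Random(seed)
    columns = [sorted(set(col)) for col in zip(*alignment)]
    credit = [0.0] * len(alignment)
    for _ in range(n_samples):
        probe = [rng.choice(col) for col in columns]
        dists = [hamming(probe, seq) for seq in alignment]
        best = min(dists)
        nearest = [i for i, d in enumerate(dists) if d == best]
        for i in nearest:
            credit[i] += 1.0 / len(nearest)
    total = sum(credit)
    return [c / total for c in credit]


def toy_subst(a, b):
    """Hypothetical substitution table: +2 for identity, -1 otherwise."""
    return 2 if a == b else -1


def profile_column_scores(alignment, weights, subst, alphabet):
    """Position-specific scores: for each column, the score of residue a is
    the weighted average of subst(a, b) over the residues b in that column."""
    profile = []
    for col in zip(*alignment):
        freq = Counter()
        for residue, w in zip(col, weights):
            freq[residue] += w  # weighted residue frequency
        profile.append(
            {a: sum(f * subst(a, b) for b, f in freq.items()) for a in alphabet}
        )
    return profile


# Invented toy alignment: three close relatives and one divergent sequence.
alignment = ["ACDE", "ACDE", "ACDF", "GHKL"]
alphabet = sorted(set("".join(alignment)))

weights = voronoi_weights(alignment)
profile = profile_column_scores(alignment, weights, toy_subst, alphabet)

for seq, w in zip(alignment, weights):
    print(seq, round(w, 3))
print({a: round(s, 3) for a, s in profile[0].items()})
```

In this toy case the two identical sequences share their credit, so the divergent sequence contributes more to each column than it would under uniform weighting. Swapping `toy_subst` for a lookup into a different table changes only the scoring step, which is the kind of substitution-table comparison the abstract describes.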

Selected References

These references are in PubMed. This may not be the complete list of references from this article.

  1. Altschul S. F., Carroll R. J., Lipman D. J. Weights for data related by a tree. J Mol Biol. 1989 Jun 20;207(4):647–653. doi: 10.1016/0022-2836(89)90234-9.
  2. Bairoch A., Boeckmann B. The SWISS-PROT protein sequence data bank. Nucleic Acids Res. 1992 May 11;20 (Suppl):2019–2022. doi: 10.1093/nar/20.suppl.2019.
  3. Bairoch A. PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res. 1992 May 11;20 (Suppl):2013–2018. doi: 10.1093/nar/20.suppl.2013.
  4. Bork P. Recognition of functional regions in primary structures using a set of property patterns. FEBS Lett. 1989 Oct 23;257(1):191–195. doi: 10.1016/0014-5793(89)81818-6.
  5. Devereux J., Haeberli P., Smithies O. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):387–395. doi: 10.1093/nar/12.1part1.387.
  6. Gribskov M., McLachlan A. D., Eisenberg D. Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci U S A. 1987 Jul;84(13):4355–4358. doi: 10.1073/pnas.84.13.4355.
  7. Henikoff S., Henikoff J. G. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 1992 Nov 15;89(22):10915–10919. doi: 10.1073/pnas.89.22.10915.
  8. Pawson T., Gish G. D. SH2 and SH3 domains: from structure to function. Cell. 1992 Oct 30;71(3):359–362. doi: 10.1016/0092-8674(92)90504-6.
  9. Plesofsky-Vig N., Vig J., Brambl R. Phylogeny of the alpha-crystallin-related heat-shock proteins. J Mol Evol. 1992 Dec;35(6):537–545. doi: 10.1007/BF00160214.
  10. Sibbald P. R., Argos P. Weighting aligned protein or nucleic acid sequences to correct for unequal representation. J Mol Biol. 1990 Dec 20;216(4):813–818. doi: 10.1016/S0022-2836(99)80003-5.

Articles from Protein Science : A Publication of the Protein Society are provided here courtesy of The Protein Society