Abstract
The sequence profile method (Gribskov M, McLachlan AD, Eisenberg D, 1987, Proc Natl Acad Sci USA 84:4355-4358) is a powerful tool for detecting distant relationships between amino acid sequences. A profile is a table of position-specific scores and gap penalties that provides a generalized description of a protein motif and can be used in place of an individual sequence for sequence alignments and database searches. A sequence profile is derived from a multiple sequence alignment. We have found two ways to improve the sensitivity of sequence profiles: (1) Sequence weights: Assigning an individual weight to each sequence avoids bias toward closely related sequences. The weights are assigned automatically, based on the distances between the sequences, using a published procedure (Sibbald PR, Argos P, 1990, J Mol Biol 216:813-818). (2) Amino acid substitution table: In addition to the alignment, constructing a profile requires an amino acid substitution table. We have found that in some cases a newer table, the BLOSUM45 table (Henikoff S, Henikoff JG, 1992, Proc Natl Acad Sci USA 89:10915-10919), is more sensitive than the original Dayhoff table or the modified Dayhoff table used in the current implementation. Profiles derived by the improved method are more sensitive and selective in a number of cases where previous methods failed to completely separate true members from false positives.
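The role of sequence weights in profile construction can be illustrated with a minimal Python sketch. It assumes the common "average" formulation, in which the score of a residue at a position is the weighted average of its substitution scores against the residues observed in that alignment column. The function name `build_profile`, the two-letter alphabet, the toy substitution table, and the weight values are all hypothetical and chosen for illustration only; they are not BLOSUM45 values, the weights are not computed by the Sibbald and Argos procedure, and gap penalties are omitted.

```python
from collections import defaultdict


def build_profile(alignment, weights, subst, alphabet):
    """Return one {residue: score} dict per alignment column.

    Hypothetical helper: scores are weighted averages of substitution
    scores; gap characters ('-') are skipped and gap penalties are ignored.
    """
    total = sum(weights)
    profile = []
    for col in range(len(alignment[0])):
        # Weighted frequency of each residue observed in this column.
        freq = defaultdict(float)
        for seq, w in zip(alignment, weights):
            if seq[col] != "-":
                freq[seq[col]] += w / total
        # Score of residue a: sum over observed residues b of freq[b] * subst[a][b].
        profile.append(
            {a: sum(f * subst[a][b] for b, f in freq.items()) for a in alphabet}
        )
    return profile


# Toy data (illustrative only): two identical sequences and one divergent one.
alphabet = "AG"
subst = {"A": {"A": 2, "G": -1}, "G": {"A": -1, "G": 2}}  # made-up scores
alignment = ["AA", "AA", "GA"]

# Equal weights let the duplicated sequence dominate column 0.
uniform = build_profile(alignment, [1.0, 1.0, 1.0], subst, alphabet)

# Assumed weights that down-weight the two identical sequences.
weighted = build_profile(alignment, [0.5, 0.5, 1.0], subst, alphabet)

print(uniform[0])
print(weighted[0])
```

With equal weights, the first column favors A because the duplicated sequence is counted twice; with the assumed weights, the two identical sequences together count as much as the divergent one, so A and G score equally in that column.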
Selected References
These references are in PubMed. This may not be the complete list of references from this article.
- Altschul S. F., Carroll R. J., Lipman D. J. Weights for data related by a tree. J Mol Biol. 1989 Jun 20;207(4):647–653. doi: 10.1016/0022-2836(89)90234-9.
- Bairoch A., Boeckmann B. The SWISS-PROT protein sequence data bank. Nucleic Acids Res. 1992 May 11;20 (Suppl):2019–2022. doi: 10.1093/nar/20.suppl.2019.
- Bairoch A. PROSITE: a dictionary of sites and patterns in proteins. Nucleic Acids Res. 1992 May 11;20 (Suppl):2013–2018. doi: 10.1093/nar/20.suppl.2013.
- Bork P. Recognition of functional regions in primary structures using a set of property patterns. FEBS Lett. 1989 Oct 23;257(1):191–195. doi: 10.1016/0014-5793(89)81818-6.
- Devereux J., Haeberli P., Smithies O. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 1984 Jan 11;12(1 Pt 1):387–395. doi: 10.1093/nar/12.1part1.387.
- Gribskov M., McLachlan A. D., Eisenberg D. Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci U S A. 1987 Jul;84(13):4355–4358. doi: 10.1073/pnas.84.13.4355.
- Henikoff S., Henikoff J. G. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 1992 Nov 15;89(22):10915–10919. doi: 10.1073/pnas.89.22.10915.
- Pawson T., Gish G. D. SH2 and SH3 domains: from structure to function. Cell. 1992 Oct 30;71(3):359–362. doi: 10.1016/0092-8674(92)90504-6.
- Plesofsky-Vig N., Vig J., Brambl R. Phylogeny of the alpha-crystallin-related heat-shock proteins. J Mol Evol. 1992 Dec;35(6):537–545. doi: 10.1007/BF00160214.
- Sibbald P. R., Argos P. Weighting aligned protein or nucleic acid sequences to correct for unequal representation. J Mol Biol. 1990 Dec 20;216(4):813–818. doi: 10.1016/S0022-2836(99)80003-5.