Abstract
A general searching method for comparing multiple sequence alignments was developed to detect sequence relationships between conserved protein regions. Multiple alignments are treated as sequences of amino acid distributions and aligned by comparing pairs of such distributions. Four different comparison measures were tested, and the Pearson correlation coefficient was chosen. The method is sensitive, detecting weak sequence relationships between protein families. Relationships are detected beyond the range of conventional sequence database searches, illustrating the potential usefulness of the method. The previously undetected relation between flavoprotein subunits of two oxidoreductase families points to the potential active site in one of the families. The similarity between the bacterial RecA, DnaA and Rad51 protein families reveals a region in DnaA and Rad51 proteins likely to bind and unstack single-stranded DNA. Helix-turn-helix DNA binding domains from diverse proteins are readily detected and shown to be similar to each other. Glycosylasparaginase and gamma-glutamyltransferase enzymes are found to be similar in their proteolytic cleavage sites. The method has been fully implemented on the World Wide Web at URL: http://blocks.fhcrc.org/blocks-bin/LAMAvsearch.
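The core idea, treating each alignment column as an amino acid distribution and scoring pairs of columns with the Pearson correlation coefficient, can be illustrated with a minimal Python sketch. Everything beyond that idea is an assumption made for illustration and is not taken from the abstract: the function names are hypothetical, columns are summarized by raw unweighted frequencies (no sequence weights or pseudocounts), the two alignments are compared without gaps by sliding one along the other, and an alignment is scored by the mean column correlation.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distributions(block):
    """Convert an ungapped alignment block (equal-length strings)
    into one amino acid frequency vector per column."""
    n = len(block)
    columns = []
    for col in zip(*block):
        counts = Counter(col)
        columns.append([counts.get(aa, 0) / n for aa in AMINO_ACIDS])
    return columns


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors.
    Returns 0.0 if either vector has zero variance (an assumed convention)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_ungapped_alignment(block_a, block_b, min_overlap=3):
    """Slide block_b along block_a without gaps and return
    (mean column correlation, offset, overlap width) for the best offset.
    Column i of block_a is paired with column i - offset of block_b.
    The mean-correlation score and min_overlap are illustrative assumptions."""
    da = column_distributions(block_a)
    db = column_distributions(block_b)
    best = None
    for offset in range(-(len(db) - min_overlap), len(da) - min_overlap + 1):
        pairs = [(i, i - offset) for i in range(len(da))
                 if 0 <= i - offset < len(db)]
        if len(pairs) < min_overlap:
            continue
        score = sum(pearson(da[i], db[j]) for i, j in pairs) / len(pairs)
        if best is None or score > best[0]:
            best = (score, offset, len(pairs))
    return best


if __name__ == "__main__":
    # Toy blocks: the first three columns of block_a match the last three of block_b.
    block_a = ["ACDEF", "ACDEF", "ACDQF"]
    block_b = ["WWACD", "YWACD", "WWACD"]
    print(best_ungapped_alignment(block_a, block_b))
```

In the toy example the best offset pairs columns 0 to 2 of the first block with columns 2 to 4 of the second. A real search would also need a way to judge whether a score is significant, which this sketch does not attempt.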