Abstract
A simple procedure is described for finding similarities between proteins using nucleotide sequence databases. The approach is illustrated by several examples of previously unknown correspondences with important biological implications: Drosophila elongation factor Tu is shown to be encoded by two genes that are differently expressed during development; a cluster of three Drosophila genes likely encodes maltases; a flesh-fly fat body protein resembles the hypothesized Drosophila alcohol dehydrogenase ancestral protein; an unknown protein encoded at the multifunctional E. coli hisT locus resembles aspartate beta-semialdehyde dehydrogenase; and the E. coli tyrR protein is related to nitrogen regulatory proteins. These and other matches were discovered using a personal computer of the type available in most laboratories collecting DNA sequence data. As relatively few sequences were sampled to find these matches, it is likely that much of the existing data has not been adequately examined.
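The abstract does not spell out the procedure itself. As an illustration only, one common way to compare a protein against nucleotide entries is to translate each entry in all six reading frames and then look for short words shared with the query protein. The sketch below assumes the standard genetic code; the function names, the word length `k`, and the toy sequence are hypothetical and are not taken from the paper.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels '+1'..'+3' and '-1'..'-3' to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def shared_words(query_protein, dna, k=4):
    """Report, per frame, the exact k-residue words of the query found there.

    Hypothetical helper: exact word matching is a crude stand-in for a
    real similarity score and is used here only to keep the sketch short.
    """
    words = {query_protein[i:i + k] for i in range(len(query_protein) - k + 1)}
    hits = {}
    for name, protein in six_frame_translations(dna).items():
        found = sorted(w for w in words if w in protein)
        if found:
            hits[name] = found
    return hits


if __name__ == "__main__":
    # Toy nucleotide entry; not a sequence from the paper.
    print(shared_words("MAK", "ATGGCCAAGTAA", k=3))
```

In practice a scored alignment would replace the exact word match, but the six-frame translation step is what lets a protein query be compared against nucleotide data.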
