Proceedings of the National Academy of Sciences of the United States of America. 1986 Jul;83(14):5155–5159. doi: 10.1073/pnas.83.14.5155

A measure of the similarity of sets of sequences not requiring sequence alignment.

B E Blaisdell
PMCID: PMC323909  PMID: 3460087

Abstract

Determination of first- and second-order Markov chain homogeneity of sets of nuclear eukaryotic DNA sequences, both coding and noncoding, finds similarities imperceptible to the standard Needleman-Wunsch base matching or dot-matrix algorithms. These measures of the similarities of the distributions of adjacent pairs or triplets are in agreement with accepted evolutionary-tree topologies. Hierarchical clustering of the distributions of doublets of 30 miscellaneous coding sequences gives clusters in reasonable agreement with accepted biological classifications. In addition to similarity by homology, there is also observed similarity of disparate genes in the same organism--for example, all three disparate yeast genes (two enzymes and actin) form a well-distinguished cluster.
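
The idea of comparing sequences through their distributions of adjacent pairs or triplets, with no alignment step, can be illustrated with a minimal sketch. It is not the paper's procedure: the abstract does not give the exact homogeneity statistic, so the code below assumes a plain Pearson chi-square test of homogeneity on overlapping doublet (or triplet) counts. The names `kmer_counts` and `homogeneity_chi2` are hypothetical, chosen only for this example.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def kmer_counts(seq, k=2):
    """Count overlapping k-mers (k=2: doublets, k=3: triplets).

    Returns counts in a fixed order over all 4**k words on A, C, G, T.
    Windows containing any other symbol are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(word)] for word in product(BASES, repeat=k)]

def homogeneity_chi2(counts_a, counts_b):
    """Pearson chi-square statistic for homogeneity of two count vectors.

    Assumption: this stands in for the Markov chain homogeneity measure
    named in the abstract; the paper's exact statistic may differ.
    Smaller values mean more similar k-mer distributions.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each sequence must contribute at least one k-mer")
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        if col == 0:
            continue  # word absent from both sequences contributes nothing
        exp_a = total_a * col / grand  # expected count under homogeneity
        exp_b = total_b * col / grand
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat

# Usage: compare two toy sequences by their doublet distributions.
d1 = kmer_counts("ATGGCGTACGTTAGC", k=2)
d2 = kmer_counts("ATGGCATACGATAGC", k=2)
score = homogeneity_chi2(d1, d2)
```

Computing this statistic for every pair in a set of sequences yields a dissimilarity matrix that a hierarchical clustering routine could take as input, which is the general shape of the analysis the abstract describes.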



