Journal of Bacteriology. 1997 Jun;179(12):3899–3913. doi: 10.1128/jb.179.12.3899-3913.1997

Compositional biases of bacterial genomes and evolutionary implications.

S Karlin, J Mrázek, A M Campbell
PMCID: PMC179198  PMID: 9190805

Abstract

We compare and contrast genome-wide compositional biases and distributions of short oligonucleotides across 15 diverse prokaryotes that have substantial genomic sequence collections. These include seven complete genomes (Escherichia coli, Haemophilus influenzae, Mycoplasma genitalium, Mycoplasma pneumoniae, Synechocystis sp. strain PCC6803, Methanococcus jannaschii, and Pyrobaculum aerophilum). A key observation is that the dinucleotide relative abundance profile stays nearly constant across multiple disjoint 50-kb contigs within the same genome. (The profile is $\rho^*_{XY} = f^*_{XY}/(f^*_X f^*_Y)$ for all dinucleotides XY, where $f^*_X$ denotes the frequency of the nucleotide X and $f^*_{XY}$ denotes the frequency of the dinucleotide XY, both computed from the sequence concatenated with its inverted complementary sequence.) On the basis of this constancy, we refer to the collection $\{\rho^*_{XY}\}$ as the genome signature. We establish that the differences between the $\{\rho^*_{XY}\}$ vectors of 50-kb sample contigs from different genomes virtually always exceed the differences between those from the same genome. Various di- and tetranucleotide biases are identified. In particular, we find that the dinucleotide CpG (CG) is underrepresented in many thermophiles (e.g., M. jannaschii, Sulfolobus sp., and M. thermoautotrophicum) but overrepresented in halobacteria. TA is broadly underrepresented in prokaryotes and eukaryotes, but normal counts appear in Sulfolobus and P. aerophilum sequences. Palindromic tetranucleotides are more strongly underrepresented in H. influenzae than in any other bacterial genome. The M. jannaschii sequence is unprecedented in its extreme underrepresentation of the tetranucleotide CTAG and in the anomalous distribution of CTAG sites around the genome. Comparative analysis of the numbers of long tetranucleotide microsatellites distinguishes H. influenzae. Dinucleotide relative abundance differences between bacterial sequences are also compared. In these assessments, for example, the cyanobacteria Synechocystis, Synechococcus, and Anabaena do not form a coherent group: they are as far from each other as general gram-negative sequences are from general gram-positive sequences. The difference of M. jannaschii from low-G+C gram-positive bacteria is one-half of its difference from gram-negative proteobacteria. Interpretations and hypotheses center on the role of the genome signature in highlighting similarities and dissimilarities across different classes of prokaryotic species, possible mechanisms underlying the genome signature, the form and level of genome compositional flux, the use of the genome signature as a chronometer of molecular phylogeny, and implications for the three putative domains of life (eubacterial, archaeal, and eukaryotic) and for the origin and early evolution of eukaryotes.
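
The genome signature defined above takes only a few lines to compute. The sketch below is a minimal illustration, not the authors' code: it pools mono- and dinucleotide counts from a sequence and its inverted complement, computes $\rho^*_{XY}$ for all 16 dinucleotides, and computes signatures for disjoint 50-kb windows so that within-genome and between-genome differences can be compared. The dissimilarity `delta_star`, the mean absolute difference of the 16 $\rho^*_{XY}$ values, is an assumption here, because the abstract does not define the difference measure. All function names are hypothetical.

```python
from itertools import product
import math
import random

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Inverted complementary strand (non-ACGT symbols are left unchanged)."""
    return seq.translate(_COMPLEMENT)[::-1]


def genome_signature(seq: str) -> dict[str, float]:
    """Symmetrized dinucleotide relative abundances rho*_XY = f*_XY / (f*_X * f*_Y).

    Counts are pooled over the sequence and its inverted complement. Each strand
    is scanned separately, so no dinucleotide is counted across the junction,
    and any pair containing a non-ACGT symbol (e.g. N) is skipped.
    """
    seq = seq.upper()
    mono = dict.fromkeys(BASES, 0)
    di = dict.fromkeys(DINUCLEOTIDES, 0)
    for strand in (seq, reverse_complement(seq)):
        for base in strand:
            if base in mono:
                mono[base] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    f1 = {b: c / n_mono for b, c in mono.items()}
    f2 = {d: c / n_di for d, c in di.items()}
    signature = {}
    for d in DINUCLEOTIDES:
        expected = f1[d[0]] * f1[d[1]]
        # Undefined if one of the two bases never occurs in the sequence.
        signature[d] = f2[d] / expected if expected > 0 else math.nan
    return signature


def delta_star(sig_f: dict[str, float], sig_g: dict[str, float]) -> float:
    """Assumed dissimilarity: mean |rho*_XY(f) - rho*_XY(g)| over the 16 dinucleotides."""
    return sum(abs(sig_f[d] - sig_g[d]) for d in DINUCLEOTIDES) / len(DINUCLEOTIDES)


def window_signatures(seq: str, size: int = 50_000) -> list[dict[str, float]]:
    """Signatures of consecutive, non-overlapping windows (a trailing partial window is dropped)."""
    return [genome_signature(seq[i:i + size])
            for i in range(0, len(seq) - size + 1, size)]


if __name__ == "__main__":
    # Toy example on a random sequence; real use would load genomic contigs.
    random.seed(0)
    genome = "".join(random.choice(BASES) for _ in range(200_000))
    sigs = window_signatures(genome)
    print(len(sigs))  # 4 disjoint 50-kb windows
    print(round(delta_star(sigs[0], sigs[1]), 3))
```

Because counts are pooled with the inverted complement, each value equals that of its complementary dinucleotide (for example $\rho^*_{AC} = \rho^*_{GT}$), so only 10 of the 16 values are independent. For a random sequence with no dinucleotide bias, all values come out close to 1.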

Articles from Journal of Bacteriology are provided here courtesy of American Society for Microbiology (ASM)