Abstract
The complete Haemophilus influenzae genome (1.83 Mb, Rd strain) provides opportunities for characterizing global genomic inhomogeneities and for detecting important sequence signals. Along these lines, new methods for identifying frequent words (oligonucleotides and/or peptides) and their distributions are applied to the H. influenzae genome, with some comparisons and contrasts made with frequent words of other bacterial genomes. Three major classes of frequent oligonucleotides stand out: (i) oligos related to the familiar uptake signal sequences (USSs), AAGTGCGGT (USS+) and its inverted complement (USS-); (ii) multiple tetranucleotide iterations; and (iii) intergenic dyad sequences (IDSs), found as AAGCCCACCCTAC and its dyad form. The USS+ and USS- occur in almost equal counts, are remarkably evenly spaced around the genome, and appear predominantly in the same reading frame of protein coding domains (USS+ translated to Ser-Ala-Val, USS- translated to Thr-Ala-Leu). These observations suggest that USSs contribute to global genomic functions, for example, in replication and/or repair processes, as membrane attachment sites, or as sequences helping to pack DNA. The long tetranucleotide iterations, virtually unique to H. influenzae among prokaryotes, may produce subpopulations expressing alternative proteins through polymerase slippage during replication and/or homologous recombination. The frequent 13 bp IDS words, invariably intergenic, occur mostly in clusters and can potentially form complex secondary structures, suggesting that these sequences may be important signals for regulating the activity of their flanking genes.
The frequent oligopeptides of H. influenzae are principally of two kinds: those induced by oligonucleotide frequent words (USSs, tetranucleotide iterations), and those associated with ATP or GTP binding sites, which are generally composed of three motifs: the A-box, which contributes to delineating the binding pocket; the B-box, which functions in hydrolysis; and the C-box, whose function is unknown. The A-box occurs fairly universally in prokaryotes and eukaryotes. The B- and C-motifs appear to be specialized to various functional groups (e.g., transport, recombination, chaperone activity). Other putative motifs correspond to homologs of Escherichia coli motifs, for example, those associated with proteins of transcriptional processing, aminoacyl-tRNA synthetases, and proteins functioning in electron transfer.
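The reading-frame statement about the uptake signal sequences can be checked directly from the two 9-mers. The following is a minimal Python sketch, not the authors' method: the function names and the toy sequence are hypothetical, the standard genetic code is assumed, and word counting is done by plain overlapping string search rather than by the statistical frequent-word procedure of the paper.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the inverted complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_overlapping(text, word):
    """Count occurrences of word in text, allowing overlaps."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        start = text.find(word, start + 1)
    return count


def translate(seq, offset=0):
    """Translate the complete codons of seq, starting at the given offset."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


USS_PLUS = "AAGTGCGGT"                      # from the abstract
USS_MINUS = reverse_complement(USS_PLUS)    # ACCGCACTT

# USS- read from its first base gives Thr-Ala-Leu.
uss_minus_peptide = translate(USS_MINUS)

# USS+ read from its second base gives Ser-Ala plus a GTN codon, which is
# Val whatever base N follows the 9-mer.
uss_plus_peptides = {translate(USS_PLUS + n, offset=1) for n in "ACGT"}

# Hypothetical toy sequence (not genome data) to illustrate word counting.
toy = "CC" + USS_PLUS + "TTT" + USS_MINUS + "GG" + USS_PLUS
toy_counts = (count_overlapping(toy, USS_PLUS), count_overlapping(toy, USS_MINUS))

print(USS_MINUS, uss_minus_peptide, uss_plus_peptides, toy_counts)
```

On a real genome, the same `count_overlapping` calls would be applied to the full sequence string; judging whether a count is unusually high would still require a background model, which this sketch leaves out.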
