Abstract
Here we compare several methods for predicting oligonucleotide frequencies in 392 kb of yeast DNA. As in previous work on E. coli, a relatively simple equation based on tetranucleotide frequencies can be used to predict the frequencies of longer oligonucleotides. For example, the mean observed/expected abundance ratio for the 4,096 hexamers was 1.00, with a sample standard deviation of 0.18. This simple predictor arises by considering each base on the sense strand of yeast to depend only on the three bases 5' to it (a 3rd order Markov chain), and it is more accurate in estimating oligonucleotide frequencies than the other statistical methods examined. The equation is useful in predicting restriction enzyme fragment sizes, in selecting restriction enzymes that cut preferentially in coding vs. noncoding regions, and in constructing detailed physical maps of whole genomes. When oligomers of a given length (up to 6 bases) are ranked from highest to lowest abundance, the observed frequencies are closely tracked by the abundances predicted by a 3rd or 4th order Markov chain. These ordered abundance curves have a power curve shape, with a broad linear range and a sharp break at the top end of the curve. There is also a strong disparity between the most and the least abundant oligomer; for example, there is a 79-fold variation between the most and the least abundant hexamer. The curves reveal a strong dependence of oligomer frequencies on base composition. Unlike in E. coli, there is no sharp downturn at the low end of the curves and hence no class of oligomers that is rare relative to other oligomers of the same length.
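The abstract does not reproduce the equation itself, so the following is only a minimal sketch of one common form of a 3rd order Markov estimator: the expected frequency of an oligomer is the product of its overlapping tetranucleotide frequencies divided by the product of its interior trinucleotide frequencies. The function names `kmer_freqs` and `markov3_expected` are hypothetical, and the sketch assumes a single-strand sequence given as a plain string; it is not claimed to be the authors' exact procedure.

```python
from collections import Counter


def kmer_freqs(seq, k):
    """Relative frequencies of all overlapping k-mers in seq."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def markov3_expected(oligo, f4, f3):
    """Expected frequency of oligo (length >= 4) under a 3rd order Markov chain.

    Assumed form of the estimator (not taken verbatim from the paper):
      product of overlapping tetranucleotide frequencies
      / product of interior trinucleotide frequencies.
    For a hexamer n1..n6 this is
      f(n1n2n3n4) * f(n2n3n4n5) * f(n3n4n5n6) / (f(n2n3n4) * f(n3n4n5)).
    """
    numerator = 1.0
    for i in range(len(oligo) - 3):
        numerator *= f4.get(oligo[i:i + 4], 0.0)
    denominator = 1.0
    for i in range(1, len(oligo) - 3):
        denominator *= f3.get(oligo[i:i + 3], 0.0)
    # An oligomer containing an unseen trinucleotide gets frequency 0.
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    seq = "ACGT" * 50  # toy sequence, not yeast data
    f3, f4, f6 = kmer_freqs(seq, 3), kmer_freqs(seq, 4), kmer_freqs(seq, 6)
    hexamer = "ACGTAC"
    expected = markov3_expected(hexamer, f4, f3)
    print(hexamer, "observed/expected =", f6[hexamer] / expected)
```

Under this form, the observed/expected ratio for each hexamer is the quantity whose mean and standard deviation the abstract reports. As a rough guide for restriction mapping, the reciprocal of the predicted frequency of a recognition site approximates the mean spacing between sites, and so the mean fragment size.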