Nucleic Acids Research. 1992 Jul 25;20(14):3651–3657. doi: 10.1093/nar/20.14.3651

The application of Markov chain analysis to oligonucleotide frequency prediction and physical mapping of Drosophila melanogaster.

A J Cuticchia, R Ivarie, J Arnold
PMCID: PMC334014  PMID: 1641330

Abstract

Here we compare several methods for predicting oligonucleotide frequencies in 691 kb of Drosophila melanogaster DNA. As in previous work on Escherichia coli and Saccharomyces cerevisiae, a relatively simple equation based on tetranucleotide frequencies can be used to predict the frequencies of higher-order oligonucleotides. For example, the mean of the observed/expected abundances of 4,096 hexamers was 1.07, with a sample standard deviation of 0.55. This simple predictor arises by considering each base on the sense strand of D. melanogaster to depend only on the three bases 5' to it (a 3rd-order Markov chain), and it is more accurate than the random predictor. The equation is useful in predicting restriction enzyme fragment sizes, in selecting restriction enzymes that cut preferentially in coding versus noncoding regions, and in selecting probes to fingerprint clones in contig mapping. Once again, this equation predicts the occurrence of higher-order oligonucleotides well, supporting our hypothesis that the predictor holds in evolutionarily diverse organisms. When oligomers of a given length are ranked from highest to lowest abundance, their observed frequencies are closely tracked by the abundances predicted by a 3rd-order Markov chain. Using the dependence of oligomer frequencies on base composition, we report a list of oligomers that will be useful for completing a cosmid physical map of D. melanogaster. At present, the library is such that it will be possible to construct large contigs using only 30 oligonucleotide probes to fingerprint cosmids.
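
The abstract does not reproduce the predictor itself, so the following is only a minimal sketch of one common way to write a 3rd-order Markov chain estimate. It assumes the predicted frequency of an oligomer is the product of the frequencies of its overlapping tetranucleotides divided by the product of the frequencies of its internal trinucleotides. The function names (kmer_frequencies, markov3_predicted_frequency, expected_fragment_length) are hypothetical and are not taken from the paper, and the fragment-length helper further assumes that sites occur independently at a uniform rate along the sequence.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequencies of overlapping k-mers on one strand.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def markov3_predicted_frequency(oligo, tetra_freq, tri_freq):
    """Predicted frequency of `oligo` under a 3rd-order Markov chain.

    Assumed form (not quoted from the paper):
        f(b1..bk) = prod f(tetramers) / prod f(internal trimers)
    For a hexamer b1..b6 this is
        f(b1b2b3b4) f(b2b3b4b5) f(b3b4b5b6) / (f(b2b3b4) f(b3b4b5)).
    """
    if len(oligo) < 4:
        raise ValueError("oligo must be at least 4 bases long")
    numerator = 1.0
    for i in range(len(oligo) - 3):  # every overlapping tetramer
        numerator *= tetra_freq.get(oligo[i:i + 4], 0.0)
    denominator = 1.0
    for i in range(1, len(oligo) - 3):  # internal trimers only
        denominator *= tri_freq.get(oligo[i:i + 3], 0.0)
    # An unseen internal trimer implies the flanking tetramers are unseen too.
    return numerator / denominator if denominator > 0 else 0.0


def expected_fragment_length(site_frequency):
    """Mean spacing between sites, assuming a uniform rate per position."""
    if site_frequency <= 0:
        raise ValueError("site_frequency must be positive")
    return 1.0 / site_frequency


if __name__ == "__main__":
    # Toy sequence for illustration only; not data from the study.
    toy = "ACGTTGCAAGCTTACGGATCCAAGCTTGGATCC" * 20
    tetra = kmer_frequencies(toy, 4)
    tri = kmer_frequencies(toy, 3)
    observed = kmer_frequencies(toy, 6)
    site = "AAGCTT"  # HindIII recognition sequence
    predicted = markov3_predicted_frequency(site, tetra, tri)
    print("observed/expected:", observed.get(site, 0.0) / predicted)
    print("predicted mean fragment length:", expected_fragment_length(predicted))
```

Computing the observed/expected ratio for every hexamer and summarizing its mean and standard deviation would give statistics of the kind quoted in the abstract, although the values depend entirely on the input sequence.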

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
