Abstract
Recently, the application of two statistical methods (related to Zipf's distribution and Shannon's redundancy), called 'linguistic' tests, to the primary structure of DNA sequences of living organisms has attracted considerable interest. Of particular importance is the claim that noncoding DNA sequences in eukaryotes display specific 'linguistic' features reminiscent of natural languages. This claim has been taken to imply that noncoding regions of DNA may carry new, thus far unknown, biological information that these tests reveal. In this paper these claims are tested quantitatively. Using computer simulations of natural DNA sequences, and applying the same 'linguistic' tests to both natural and artificial sequences, we investigate in detail the reasons for the appearance of the claimed 'linguistic' features and for the associated differences between coding and noncoding DNA. The results show quantitatively that the 'linguistic' tests fail to reveal any new biological information in DNA, whether coding or noncoding.
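To make the two tests concrete, the following is a minimal Python sketch of how a Zipf-type rank-frequency slope and a Shannon redundancy could be computed for overlapping n-tuples ('words') of a sequence, and compared between two artificial sequences. It is an illustration under stated assumptions, not the procedure used in the paper: the function names, the word length n = 3, the sequence length, and the base compositions are all hypothetical choices, and the artificial sequences here are simple independent random draws rather than the simulations the abstract refers to.

```python
import math
import random
from collections import Counter


def ntuple_counts(seq, n):
    """Count overlapping n-tuples ('words') of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_slope(seq, n):
    """Least-squares slope of log(frequency) against log(rank).

    Words are ranked from most to least frequent, so the slope is <= 0.
    """
    freqs = sorted(ntuple_counts(seq, n).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def redundancy(seq, n, alphabet_size=4):
    """Shannon redundancy R = 1 - H_n / (n * log2(alphabet_size)).

    H_n is the entropy (in bits) of the observed n-tuple distribution.
    """
    counts = ntuple_counts(seq, n)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / (n * math.log2(alphabet_size))


def random_sequence(length, weights, seed=0):
    """Artificial sequence: independent draws of A, C, G, T with given weights."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", weights=weights, k=length))


if __name__ == "__main__":
    # Hypothetical compositions: one uniform, one with a strong A/T bias.
    uniform = random_sequence(20000, [1, 1, 1, 1])
    skewed = random_sequence(20000, [4, 1, 1, 4])
    for name, seq in (("uniform", uniform), ("skewed", skewed)):
        print(name, zipf_slope(seq, 3), redundancy(seq, 3))
```

In this sketch the two artificial sequences differ only in base composition, so any difference in slope or redundancy between them comes from composition alone. Running the same functions on a natural sequence and on an artificial sequence of matching composition is one way to ask whether a test reports anything beyond simple statistical properties.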
