Abstract
In this article, we propose two well-defined distance metrics for biological sequences based on a universal complexity profile. To illustrate the metrics, we construct phylogenetic trees of 18 eutherian mammals from their mtDNA sequences and of 24 coronaviruses from their whole genomes. The resulting monophyletic clusters agree well with the established taxonomic groups.
Keywords: Sequence complexity, mtDNA, SARS-CoV, Phylogenetic analysis
