Journal of Biological Physics. 2002 Sep;28(3):439–447. doi: 10.1023/A:1020316706928

Phylogeny Based on Whole Genome as inferred from Complete Information Set Analysis

W Li 1, W Fang 2, L Ling 1, J Wang 1, Z Xuan 1, R Chen 1
PMCID: PMC3456743  PMID: 23345787

Abstract

Previous molecular phylogeny algorithms mainly rely on multi-sequence alignments of cautiously selected characteristic sequences, and are thus not directly appropriate for whole genome phylogeny, where events such as rearrangements make full-length alignments impossible. We introduce here the concept of Complete Information Set (CIS) and its measurement implementation as evolution distance without reference to sizes. As a proof test of the method, the 16S rRNA sequences of 22 completely sequenced Bacteria and Archaea species are used to reconstruct a phylogenetic tree, which is generally consistent with the commonly accepted one. Based on whole genomes, our further efforts yield a highly robust whole genome phylogenetic tree, supporting separate monophyletic clusters of species with similar phenotypes as well as the early evolution of thermophilic Bacteria and the late divergence of Eukarya. The purpose of this work is not to contradict or confirm previous phylogeny standards but rather to bring a brand-new algorithm and tool to the phylogeny research community. The software to estimate the sequence distance and the materials used in this study are available upon request to the corresponding author.

Keywords: comparative genomics, information discrepancy, molecular evolution, sequence analysis
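
As a rough illustration of the alignment-free idea described in the abstract, the sketch below compares two sequences through their k-mer frequency distributions and a symmetric information discrepancy. It is not the authors' CIS software: the function names, the choice of k, and the specific discrepancy (each distribution compared with the mean of the two, which equals twice the Jensen-Shannon divergence) are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def kmer_distribution(seq, k):
    """Relative frequencies of all overlapping k-mers in seq (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def discrepancy(p, q):
    """Symmetric information discrepancy between two k-mer distributions.

    Assumed form: each distribution is compared with the pointwise mean of
    the two, so the value stays finite when a k-mer is missing from one
    sequence. It is 0 for identical distributions and 2 bits for disjoint ones.
    """
    total = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mean = (pi + qi) / 2
        if pi > 0:
            total += pi * log2(pi / mean)
        if qi > 0:
            total += qi * log2(qi / mean)
    return total


def distance_matrix(seqs, k=3):
    """Pairwise discrepancies for a dict mapping names to sequences."""
    dists = {name: kmer_distribution(s, k) for name, s in seqs.items()}
    return {(a, b): discrepancy(dists[a], dists[b]) for a in dists for b in dists}
```

Because only relative frequencies enter the calculation, sequences of very different lengths can be compared directly, which is the sense in which such a distance works "without reference to sizes". A matrix of this kind could then be passed to a distance-based tree builder such as neighbor-joining.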



Articles from Journal of Biological Physics are provided here courtesy of Springer Science+Business Media B.V.
