Genetics. 2001 Jul;158(3):1321–1327. doi: 10.1093/genetics/158.3.1321

Disparity index: a simple statistic to measure and test the homogeneity of substitution patterns between molecular sequences.

S Kumar, S R Gadagkar
PMCID: PMC1461708  PMID: 11454778

Abstract

A common assumption in comparative sequence analysis is that the sequences have evolved with the same pattern of nucleotide substitution (homogeneity of the evolutionary process). Violation of this assumption is known to adversely impact the accuracy of phylogenetic inference and tests of evolutionary hypotheses. Here we propose a disparity index, I_D, which measures the observed difference in evolutionary patterns for a pair of sequences. On the basis of this index, we have developed a Monte Carlo procedure to test the homogeneity of the observed patterns. This test does not require a priori knowledge of the pattern of substitutions, extent of rate heterogeneity among sites, or the evolutionary relationship among sequences. Computer simulations show that the I_D-test is more powerful than the commonly used χ²-test under a variety of biologically realistic models of sequence evolution. An application of this test in an analysis of 3789 pairs of orthologous human and mouse protein-coding genes reveals that the observed evolutionary patterns in neutral sites are not homogeneous in 41% of the genes, apparently due to shifts in G + C content. Thus, the proposed test can be used as a diagnostic tool to identify genes and lineages that have evolved with substantially different evolutionary processes as reflected in the observed patterns of change. Identification of such genes and lineages is an important early step in comparative genomics and molecular phylogenetic studies to discover evolutionary processes that have shaped organismal genomes.
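
The abstract does not give the formula for the disparity index or the details of the Monte Carlo procedure, so the following is only a minimal sketch under stated assumptions. It assumes the index is the composition distance (half the sum of squared differences in base counts between the two aligned sequences) minus the number of sites at which the sequences differ. In place of the authors' Monte Carlo procedure it uses a simple site-swap randomization, which relies on the two sequences being exchangeable at each site when they evolved under the same process. All function names are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"  # other symbols (gaps, N) are ignored in the base counts


def composition_distance(x, y):
    """Half the sum of squared differences in base counts (assumed definition)."""
    cx, cy = Counter(x), Counter(y)
    return 0.5 * sum((cx[b] - cy[b]) ** 2 for b in BASES)


def disparity_index(x, y):
    """Composition distance minus the number of differing sites (assumed definition)."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    n_diff = sum(a != b for a, b in zip(x, y))
    return composition_distance(x, y) - n_diff


def homogeneity_p_value(x, y, replicates=1000, seed=None):
    """Site-swap randomization test; a stand-in, not the paper's procedure.

    Each replicate swaps the two bases at every site with probability 1/2.
    This keeps the number of differences fixed and reshuffles only which
    sequence carries which base. The p-value is the share of replicates
    whose index is at least as large as the observed one, with the usual
    +1 correction so that it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = disparity_index(x, y)
    hits = 0
    for _ in range(replicates):
        xs, ys = [], []
        for a, b in zip(x, y):
            if rng.random() < 0.5:
                a, b = b, a
            xs.append(a)
            ys.append(b)
        if disparity_index(xs, ys) >= observed:
            hits += 1
    return (hits + 1) / (replicates + 1)
```

For example, `homogeneity_p_value(seq1, seq2, replicates=1000, seed=1)` returns a small value when the two aligned sequences differ in base composition by far more than their number of differences would suggest.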



Articles from Genetics are provided here courtesy of Oxford University Press
