Abstract
This article describes a new Markov chain Monte Carlo (MCMC) method for DNA sequence data that treats the mutations on the genealogy as missing data. The method allows inferences about the age and identity of specific mutations while taking the full complexity of the mutational process in DNA sequences into account. We demonstrate its utility in three applications. First, we show how the method can be used to make inferences about population genetic parameters such as θ (the effective population size times the mutation rate). Second, we show how it can be used to estimate the ages of mutations under finite-sites models and to make inferences about the distribution and ages of nonsynonymous and synonymous mutations. Applying the method to two previously published data sets, we show that in one of them the average age of nonsynonymous mutations is significantly lower than that of synonymous mutations, which suggests the presence of slightly deleterious mutations. Third, we show how the method can be used more generally to evaluate the posterior distribution of a function of a mapping of mutations onto a gene genealogy. This application is useful for assessing the uncertainty associated with methods that rely on mapping mutations onto a phylogeny or a gene genealogy.
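The idea of treating mutations as missing data can be illustrated on the smallest possible case: one branch of a genealogy whose end states are known. The sketch below is not the article's MCMC algorithm, which updates mutations jointly with the genealogy under full DNA substitution models. It is a minimal illustration that makes several assumptions. The branch and its end states are fixed. Mutations follow a Jukes-Cantor-like process, meaning uniform changes among the four bases at a constant total rate. Histories conditional on the end states are drawn by simple rejection sampling. The helper names `sample_branch_history`, `mutation_ages`, and `summarize` are hypothetical. Averaging a function of each sampled history (here the number of mutations and their ages) over many draws gives a Monte Carlo estimate of that function's conditional expectation. This is the same kind of quantity that the third application evaluates over the posterior.

```python
import random

BASES = "ACGT"


def sample_branch_history(start, end, length, rate=1.0, rng=random, max_tries=100_000):
    """Draw one mutation history on a branch, conditional on both end states.

    Assumed model (illustrative only): mutations arrive as a Poisson process with
    total rate `rate`, and each mutation changes the current base to one of the
    other three bases uniformly at random (a Jukes-Cantor-like process).
    Rejection sampling: simulate forward from `start` and keep the history only
    if it finishes in `end`.

    Returns a list of (time_below_top_of_branch, from_base, to_base) tuples.
    """
    for _ in range(max_tries):
        t, state, events = 0.0, start, []
        while True:
            t += rng.expovariate(rate)  # waiting time to the next mutation
            if t > length:
                break
            new = rng.choice([b for b in BASES if b != state])
            events.append((t, state, new))
            state = new
        if state == end:  # accept only histories consistent with the observed states
            return events
    raise RuntimeError("rejection sampling failed: end state too unlikely")


def mutation_ages(events, top_age):
    """Convert times measured down the branch into ages (time before present)."""
    return [top_age - t for t, _, _ in events]


def summarize(start, end, top_age, bottom_age, n_samples=2000, seed=1):
    """Monte Carlo estimates of two functions of the mapping on one branch:
    the mean number of mutations per history and the mean age of a mutation."""
    rng = random.Random(seed)
    length = top_age - bottom_age
    counts, ages = [], []
    for _ in range(n_samples):
        events = sample_branch_history(start, end, length, rng=rng)
        counts.append(len(events))
        ages.extend(mutation_ages(events, top_age))
    mean_count = sum(counts) / n_samples
    mean_age = sum(ages) / len(ages) if ages else float("nan")
    return mean_count, mean_age


if __name__ == "__main__":
    # Branch from a node of age 1.0 (carrying A) down to a node of age 0.5 (carrying G).
    print(summarize("A", "G", top_age=1.0, bottom_age=0.5))
```

For example, `summarize("A", "G", top_age=1.0, bottom_age=0.5)` returns two estimates for a branch that runs from age 1.0 down to age 0.5, with A at the top node and G at the bottom node. The first is the mean number of mutations, and the second is the mean mutation age. Rejection sampling is used here only because it is easy to check. It becomes inefficient when the end states are unlikely given the branch length. A real analysis would also have to integrate over the genealogy and the internal node states rather than fixing them.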