Genetics. 2003 Jul;164(3):1229–1236. doi: 10.1093/genetics/164.3.1229

Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites.

Maria Anisimova, Rasmus Nielsen, Ziheng Yang
PMCID: PMC1462615  PMID: 12871927

Abstract

Maximum-likelihood methods based on models of codon substitution that account for heterogeneous selective pressures across sites have proved powerful in detecting positive selection in protein-coding DNA sequences. These methods are phylogeny based and do not account for the effects of recombination. When recombination occurs, as it does in population data, no single tree topology can describe the evolutionary history of the whole sequence. This violation of the model's assumptions raises serious concerns about the likelihood method for detecting positive selection. Here we use computer simulation to evaluate the reliability of the likelihood-ratio test (LRT) for positive selection in the presence of recombination. We examine three tests based on different models of variable selective pressures among sites. Sequences are simulated using a coalescent model with recombination and analyzed using codon-based likelihood models that ignore recombination. We find that the LRT is robust to low levels of recombination (fewer than three recombination events in the history of a sample of 10 sequences). However, at higher levels of recombination the type I error rate can be as high as 90%, especially when the null model in the LRT is unrealistic, and the test often mistakes recombination for evidence of positive selection. The test that compares the more realistic models M7 (beta) and M8 (beta and omega) is more robust to recombination. In this test the null model M7 allows the omega = dN/dS ratio to vary between 0 and 1 (and so does not account for positive selection), and the alternative model M8 adds a discrete class whose omega ratio can be estimated to be >1 (and thus accounts for positive selection). Identification of sites under positive selection by the empirical Bayes method appears to be less affected by recombination than the LRT.
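Once the maximized log-likelihoods under M7 and M8 are available, the comparison described above reduces to a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, the log-likelihood and site-likelihood values are made-up numbers standing in for the output of a codon-model program, and it assumes the conventional chi-square approximation with 2 degrees of freedom for M7 vs. M8 (M8 adds two free parameters: the proportion of sites in the extra class and that class's omega). It also shows how a type I error rate is tallied over datasets simulated without positive selection, and the naive empirical Bayes posterior used to flag sites, assuming the beta component of M8 has already been discretized into classes.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Return 2 * (lnL_alt - lnL_null).

    Nested models should give lnL_alt >= lnL_null; small negative values
    caused by optimizer noise are clamped to 0.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_df2_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square with 2 df, which is exactly exp(-x/2)."""
    return math.exp(-stat / 2.0)


def m7_vs_m8(lnl_m7: float, lnl_m8: float, alpha: float = 0.05):
    """Compare M7 (null) with M8 (alternative); assumes the chi-square(2) approximation."""
    stat = lrt_statistic(lnl_m7, lnl_m8)
    p_value = chi2_df2_pvalue(stat)
    return stat, p_value, p_value < alpha


def type_i_error_rate(null_replicates, alpha: float = 0.05) -> float:
    """Fraction of replicates, simulated WITHOUT positive selection, in which M7 is rejected.

    Each replicate is a (lnL_M7, lnL_M8) pair.
    """
    rejections = sum(1 for lnl0, lnl1 in null_replicates if m7_vs_m8(lnl0, lnl1, alpha)[2])
    return rejections / len(null_replicates)


def neb_posteriors(proportions, site_likelihoods):
    """Naive empirical Bayes: P(class k | site) = p_k * f_k / sum_j p_j * f_j.

    `proportions` are the estimated class weights p_k; `site_likelihoods` are the
    hypothetical per-class likelihoods f_k of the data at one site.
    """
    weighted = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    # Hypothetical log-likelihoods, not results from the paper.
    stat, p_value, reject = m7_vs_m8(-2500.0, -2494.0)
    print(f"2*dlnL = {stat:.1f}, P = {p_value:.4f}, reject M7: {reject}")

    # Hypothetical two-class example: the last class is the omega > 1 class.
    print(neb_posteriors([0.9, 0.1], [1e-5, 4e-5]))
```

With the hypothetical values above, 2Δℓ = 12.0 and P = e^-6 ≈ 0.0025, so M7 would be rejected at the 5% level. Because the upper tail of the chi-square distribution with 2 degrees of freedom has the closed form exp(-x/2), no statistics library is needed here; tests with other degrees of freedom would need a general chi-square survival function. In a simulation study of the kind described in the abstract, a `type_i_error_rate` well above `alpha` on recombining but neutral data is what an inflated false-positive rate looks like.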
