Biology Letters. 2009 Apr 8;5(3):417–420. doi: 10.1098/rsbl.2009.0155

Effective population size and the rate and pattern of nucleotide substitutions

Megan Woolfit
PMCID: PMC2679941  PMID: 19364708

Abstract

Both the overall rate of nucleotide substitution and the relative proportions of synonymous and non-synonymous substitutions are predicted to vary between species that differ in effective population size (Ne). Our understanding of the genetic processes underlying these lineage-specific differences in molecular evolution is still developing. Empirical analyses indicate that variation in substitution rates and patterns caused by differences in Ne is often substantial, however, and must be accounted for in analyses of molecular evolution.

Keywords: effective population size, molecular evolution, substitution rate

1. Introduction

Nucleotide sequence data have been a great boon for the study of evolution. DNA sequences bring all organisms into the fold of comparative analyses, allowing us to jointly reconstruct the evolutionary histories of taxa that differ enormously in morphology and lifestyle. But while DNA is universal, its tempo and mode of evolution are not. It has become increasingly clear that the way in which a species' DNA evolves is affected by numerous aspects of its biology (e.g. Welch et al. 2008). One such aspect is effective population size (Ne), which is predicted to affect species' molecular evolution at many levels, from numbers of segregating nucleotide polymorphisms (Petit & Barbadilla 2008) to genome size and complexity (Lynch & Conery 2003; Hershberg et al. 2007). In this short review, however, I will focus on another level of molecular evolution affected by Ne: nucleotide substitutions. In particular, I will discuss how both the overall rate of nucleotide substitution and the ratio of non-synonymous to synonymous substitutions are likely to vary in lineages that differ in Ne.

2. What is effective population size?

The simplest scenario under which change in allele frequencies can be studied is the Wright–Fisher model, which consists of a population of constant size N diploid individuals, with discrete generations, random mating and binomial distribution of offspring number per parent. In reality, all natural populations will deviate from the Wright–Fisher model in numerous ways. Wright therefore developed the concept of the effective population size, or Ne, which is the size of an idealized population that would experience the same effects of random sampling of alleles as the real population under consideration (Wright 1931; see also Charlesworth (2009) for a comprehensive review of subsequent theoretical developments).
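The Wright–Fisher model described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a single neutral locus with two alleles; the function name `wright_fisher`, its parameters and the values in the usage note are hypothetical choices made for this example only.

```python
import random

def wright_fisher(n_diploid, p0, generations, seed=None):
    """Simulate neutral drift at one biallelic locus.

    Returns the allele-frequency trajectory, starting at p0, for a
    population of constant size n_diploid with discrete generations.
    """
    rng = random.Random(seed)
    n_copies = 2 * n_diploid          # diploid: 2N gene copies per generation
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial sampling: each gene copy in the next generation is drawn
        # independently from the current allele-frequency pool.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory
```

Running `wright_fisher(10, 0.5, 100)` and `wright_fisher(1000, 0.5, 100)` several times should show allele frequencies wandering much further from 0.5 in the smaller population, which is the sampling effect that Ne is designed to summarize.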

The list of demographic or genetic factors expected to reduce Ne relative to N is long, and includes common phenomena such as skewed sex ratios, non-random mating, variance in reproductive success, fluctuations in census population size, some forms of population subdivision, and linkage between loci under selection (Charlesworth 2009). Even closely related species that vary in one or more of these traits may therefore have substantially different effective population sizes.
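Two of the factors listed above have simple textbook approximations that show how far Ne can fall below the census size. The sketch below uses the standard expressions for an unequal sex ratio and for fluctuating population size (the harmonic mean across generations); the function names and all numbers are illustrative assumptions, not values from the studies cited here.

```python
def ne_sex_ratio(n_males, n_females):
    """Approximate Ne for unequal numbers of breeding males and females."""
    return 4 * n_males * n_females / (n_males + n_females)

def ne_fluctuating(sizes):
    """Approximate long-term Ne as the harmonic mean of per-generation sizes."""
    return len(sizes) / sum(1 / n for n in sizes)

# Hypothetical examples: 100 breeders with a 1:9 sex ratio, and a
# population that passes through a single-generation bottleneck.
skewed = ne_sex_ratio(10, 90)                    # well below the census size of 100
bottlenecked = ne_fluctuating([1000, 1000, 10])  # dominated by the smallest size
```

In both cases the result is dominated by the rarer sex or the smallest generation, which is why even closely related species with similar census sizes can differ substantially in Ne.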

3. Why should a species' Ne affect its evolution?

Ne reflects the balance of power between selection and drift: in small populations, drift plays a greater role and selection (both positive and negative) is correspondingly less efficacious. A mutation is effectively neutral when the magnitude of its selective coefficient is less than or equal to the inverse of the effective population size (Kimura 1983), so as Ne decreases, mutations of larger and larger effects behave as neutral. In species with small Ne, therefore, increasing numbers of slightly deleterious mutations may drift to fixation rather than being removed by purifying selection, increasing the substitution rate for this class of mutations. By contrast, more slightly advantageous mutations are likely to be lost due to drift rather than being fixed by positive selection, decreasing the substitution rate for this second class of mutations in species with small Ne.
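The argument above can be made concrete with a commonly used diffusion approximation for the fixation probability of a new mutation. The sketch below is an assumption-laden illustration: it takes genic selection (heterozygote fitness 1 + s), sets census size equal to Ne, and uses hypothetical function names and selection coefficients; the text itself states only the 1/Ne threshold.

```python
import math

def is_effectively_neutral(s, ne):
    """Threshold used in the text: |s| <= 1/Ne."""
    return abs(s) <= 1 / ne

def fixation_probability(s, ne):
    """Diffusion approximation for a single new mutation (initial frequency 1/(2*ne))."""
    if s == 0:
        return 1 / (2 * ne)                      # neutral expectation
    if s > 0:
        # (1 - exp(-2s)) / (1 - exp(-4*Ne*s))
        return -math.expm1(-2 * s) / -math.expm1(-4 * ne * s)
    a = -s
    # Same formula for s < 0, rearranged to avoid overflow when 4*Ne*|s| is large
    return math.expm1(2 * a) * math.exp(-4 * ne * a) / -math.expm1(-4 * ne * a)

def relative_to_neutral(s, ne):
    """Fixation probability divided by the neutral value 1/(2*ne)."""
    return fixation_probability(s, ne) * 2 * ne
```

For a hypothetical slightly deleterious mutation with s = -1e-5, `relative_to_neutral` is close to 1 when Ne = 1,000 but vanishingly small when Ne = 1,000,000; for s = +1e-5 the pattern reverses, with the larger population fixing the mutation far more often than a neutral one.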

If advantageous mutations are rare, while a substantial proportion of mutations are slightly deleterious, then we should be able to detect an increase in overall substitution rate in lineages with small Ne compared with those with larger Ne (all else, including mutation rates, being equal). If we make the further assumption that non-synonymous mutations are more likely to be slightly deleterious than synonymous mutations, many of which are probably neutral (but see Chamary et al. 2006), the ratio of non-synonymous to synonymous substitution rates (ω) should also be greater in lineages with small Ne (Ohta 1992).
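A rough numerical sketch of this prediction follows. It assumes that all selected mutations are deleterious, that their selection coefficients follow an exponential distribution, and that fixation follows the standard diffusion approximation with genic selection; the function names, the mean coefficient and the Ne values are hypothetical. The result is the substitution rate of this class of mutations relative to the rate for strictly neutral mutations, which under the further assumption that synonymous changes are neutral plays the role of ω.

```python
import math

def relative_fixation_rate(a, ne):
    """Fixation probability of a deleterious mutation (s = -a, a >= 0),
    divided by the neutral value 1/(2*ne)."""
    if a == 0:
        return 1.0
    x = 4 * ne * a
    # (exp(2a) - 1) / (exp(x) - 1), rearranged to avoid overflow for large x
    u = math.expm1(2 * a) * math.exp(-x) / -math.expm1(-x)
    return 2 * ne * u

def relative_substitution_rate(ne, mean_a, n_points=20000):
    """Average relative fixation rate over an exponential distribution of
    deleterious effects with mean mean_a (midpoint rule on quantiles)."""
    total = 0.0
    for i in range(n_points):
        q = (i + 0.5) / n_points
        a = -mean_a * math.log1p(-q)   # exponential quantile function
        total += relative_fixation_rate(a, ne)
    return total / n_points

# Hypothetical comparison of two lineages sharing the same mean effect
rate_small_ne = relative_substitution_rate(1e4, 1e-4)
rate_large_ne = relative_substitution_rate(1e6, 1e-4)
```

Under these assumptions `rate_small_ne` exceeds `rate_large_ne`, in line with the expectation that lineages with small Ne accumulate more slightly deleterious substitutions. The size of the difference depends entirely on the assumed distribution of selective effects.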

4. How great should the effect be?

The magnitude of the effect of a change in Ne on nucleotide substitutions is determined by the distribution of selective effects of mutations. To illustrate this, consider two lineages with different effective population sizes, the larger NeL and the smaller NeS. If we assume that advantageous mutations are rare and most of the mutations that go to fixation are slightly deleterious, then the difference in substitution rate between these lineages will be largely determined by the proportion of mutations that have selective coefficients between 1/NeL and 1/NeS (figure 1). This proportion, in turn, is determined by the distribution of selective effects.

Figure 1.

The distributions of fitness effects modelled by Ohta (1977) (exponential or gamma with β=1, dashed curve) and Kimura (1979) (gamma with β=0.5, solid curve). In a small population, with effective population size NeS, mutations with selection coefficients between 1/NeS and zero will be effectively neutral. Fewer mutations, those with selection coefficients between 1/NeL and zero, will be effectively neutral in a larger population with NeL. The proportion of mutations that have selective coefficients between 1/NeS and 1/NeL will be greater under a gamma distribution of fitness effects with β=1 than with β=0.5 for most regions of parameter space.

Ohta (1977) assumed that the distribution of selection coefficients for new mutations was exponential. Under this distribution, and given a realistic mean strength of selection, a substantial proportion of mutations have fitness effects of the order of 1/Ne for many natural populations, and the effect of a change in population size on the rate of molecular evolution is expected to be quite large. This model was modified by Kimura (1979) who proposed that negative selection coefficients followed a more leptokurtic distribution. For a given strength of selection, fewer mutations will typically fall in the range from 1/NeL to 1/NeS under this distribution, and so the difference in substitution rate between lineages with different Ne will also be less, although a negative correlation between Ne and fixation rate is still predicted.
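The comparison between the two models can be sketched by computing the proportion of mutations whose selection coefficients fall between 1/NeL and 1/NeS. The example below assumes that both distributions share the same mean (one reading of "a given strength of selection") and describes the magnitude of deleterious effects; the function names, the mean and the Ne values are hypothetical. Closed-form cumulative distribution functions are available for both gamma shapes, so only the standard library is needed.

```python
import math

def exponential_cdf(x, mean):
    """CDF of an exponential distribution (gamma with shape 1)."""
    return -math.expm1(-x / mean)

def gamma_half_cdf(x, mean):
    """CDF of a gamma distribution with shape 0.5 and the given mean
    (scale = 2 * mean), which reduces to an error function."""
    return math.erf(math.sqrt(x / (2 * mean)))

def proportion_between(cdf, ne_small, ne_large, mean):
    """Proportion of mutations with 1/ne_large < |s| <= 1/ne_small."""
    return cdf(1 / ne_small, mean) - cdf(1 / ne_large, mean)

# Hypothetical values: NeS = 1e4, NeL = 1e6, mean |s| = 1e-4
p_exp = proportion_between(exponential_cdf, 1e4, 1e6, 1e-4)
p_gam = proportion_between(gamma_half_cdf, 1e4, 1e6, 1e-4)
```

With these assumed values the exponential distribution places a slightly larger share of mutations in the interval than the more leptokurtic gamma. The ordering is sensitive to the chosen mean and Ne values and can reverse for other choices, so the numbers should be read as an illustration rather than a general result.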

Neither of these distributions was chosen on the basis of biological data (Gillespie 1991), but a number of empirical estimates of the distribution of fitness effects of deleterious mutations have recently been made. Results vary between datasets and between taxa, with the estimated distributions including normal (Nielsen & Yang 2003), lognormal (Loewe & Charlesworth 2006) and strongly leptokurtic gamma distributions (Keightley & Eyre-Walker 2007) (figure 2). These estimates are based on data from relatively few species, but indicate that the distribution of mutant effects is likely to vary between taxa. Adding further complexity, recent experimental work has suggested that a species' distribution of fitness effects is dynamic, and may change as organismal fitness and/or effective population size change (Silander et al. 2007).

Figure 2.

Example distributions of fitness effects estimated from different datasets, including lognormal for Drosophila miranda and Drosophila pseudoobscura (Loewe & Charlesworth 2006; dotted curve), strongly leptokurtic gamma for Drosophila melanogaster (solid curve) and human (spaced dashed curve) nuclear genes (Keightley & Eyre-Walker 2007), and normal for primate mitochondrial genes (Nielsen & Yang 2003; closed dashed curve).

The prediction of increased rate of evolution in species with small Ne relies on the assumption that advantageous mutations are rare: positive selection is less efficacious in small populations, so fixation of advantageous mutations will be reduced rather than increased in species with low Ne. Slightly advantageous mutations are in fact likely to be relatively common (Charlesworth & Eyre-Walker 2007), but theoretical work that incorporates positive selection on such mutations shows that a negative correlation between overall rate of substitution and effective population size is still predicted (Ohta 1992). More problematically, some studies have suggested that, far from being rare, strongly advantageous mutations may comprise a substantial proportion of those mutations that contribute to substitution in humans and Drosophila (Eyre-Walker 2006), and this may further weaken the inverse relationship between Ne and substitution rate.

5. What do the data say?

An increase in either overall substitution rate or ω in taxa with long-term low Ne has been shown for a broad range of species. For example, island endemic animal species, which are likely to experience a reduction in Ne compared with their mainland relatives due to both the bottleneck during island colonization and long-term restriction in range size, show significantly increased ω values (Woolfit & Bromham 2005). Endosymbiotic bacteria and fungi, which live within invertebrate hosts and undergo severe bottlenecks with each transmission to the next host generation, have higher substitution rates and values of ω than their free-living relatives (Woolfit & Bromham 2003; Moran et al. 2008). Also, hominids have higher values of ω, genome-wide, than other mammalian lineages with larger Ne (Kosiol et al. 2008).

We see the same patterns repeated across genomic regions that differ in Ne. Genes in regions of low recombination have reduced Ne due to Hill–Robertson interference, in which linkage between weakly selected loci reduces the efficacy of selection at any one locus (Hill & Robertson 1966); such genes show increased values of ω (Haddrill et al. 2007) and reduced fixation of beneficial mutations (Presgraves 2005).

By contrast, Charlesworth & Eyre-Walker (2007) have shown that lineages which have undergone an expansion in Ne may experience a transient, though potentially substantial, increase in substitution rate before the rate of evolution decreases to below the level it was before the increase in Ne. This temporary increase in substitution rate is due to the fixation by positive selection of slightly advantageous mutations that had previously been effectively neutral. They tested for such an effect in sequences from taxa that had probably undergone population expansion after colonizing the mainland from an island and found a significant increase in ω, supporting their prediction. Furthermore, Bachtrog (2008) recently analysed divergence data from 91 genes in two species of Drosophila that differ substantially in Ne, and found no evidence that Ne is a major determinant of the rate of adaptive evolution for these data, possibly due to recent changes in Ne or differences in the distribution of fitness effects of mutations between taxa.

6. What next?

It is clear that Ne may have substantial effects on the rates and patterns of nucleotide substitution, but predicting the precise form of those effects is far from simple. Nonetheless, some obvious implications for evolutionary analyses can be extrapolated from these results. For example, as even closely related species may differ substantially in Ne (e.g. Ramos-Onsins et al. 2004), a model that assumes changes in evolutionary rate along lineages are rare is unlikely to be appropriate for estimating divergence dates. Similarly, when performing comparative analyses of selection in different lineages or genes, the possibility that variation in ω is due to differences in Ne must be considered alongside selective explanations.

To move beyond these caveats and begin to incorporate Ne into analyses of molecular evolution more quantitatively, we must obtain better estimates of the effective population sizes and distributions of fitness effects of both deleterious and advantageous mutations for many more taxa. Such analyses require substantial amounts of sequence data. Next-generation sequencing technology is making this increasingly tractable, although the effort involved in both sample collection and computational analysis of the data is likely to remain substantial. The return on investment would be great, however, as estimates of these parameters are essential not only to fully understand this major driver of molecular rate variation, but to answer questions in a host of other evolutionary fields ranging from conservation biology to quantitative genetics (Keightley & Eyre-Walker 2007).

Acknowledgements

I thank the editors and three anonymous reviewers for their perceptive and extremely helpful comments on the manuscript.

Footnotes

One contribution of 11 to a Special Feature on ‘Whole organism perspectives on understanding molecular evolution’.

References

  1. Bachtrog D. Similar rates of protein adaptation in Drosophila miranda and D. melanogaster, two species with different current effective population sizes. BMC Evol. Biol. 2008;8:334. doi:10.1186/1471-2148-8-334
  2. Chamary J.V., Parmley J.L., Hurst L.D. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat. Rev. Genet. 2006;7:98–108. doi:10.1038/nrg1770
  3. Charlesworth B. Effective population size and patterns of molecular evolution and variation. Nat. Rev. Genet. 2009;10:195–205. doi:10.1038/nrg2526
  4. Charlesworth J., Eyre-Walker A. The other side of the nearly neutral theory, evidence of slightly advantageous back-mutations. Proc. Natl Acad. Sci. USA. 2007;104:16992–16997. doi:10.1073/pnas.0705456104
  5. Eyre-Walker A. The genomic rate of adaptive evolution. Trends Ecol. Evol. 2006;21:569–575. doi:10.1016/j.tree.2006.06.015
  6. Gillespie J.H. The causes of molecular evolution. Cambridge University Press; Cambridge, UK: 1991.
  7. Haddrill P.R., Halligan D.L., Tomaras D., Charlesworth B. Reduced efficacy of selection in regions of the Drosophila genome that lack crossing over. Genome Biol. 2007;8:R18. doi:10.1186/gb-2007-8-2-r18
  8. Hershberg R., Tang H., Petrov D.A. Reduced selection leads to accelerated gene loss in Shigella. Genome Biol. 2007;8:R164. doi:10.1186/gb-2007-8-8-r164
  9. Hill W.G., Robertson A. Effect of linkage on limits to artificial selection. Genet. Res. 1966;8:269–294.
  10. Keightley P.D., Eyre-Walker A. Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics. 2007;177:2251–2261. doi:10.1534/genetics.107.080663
  11. Kimura M. Model of effectively neutral mutations in which selective constraint is incorporated. Proc. Natl Acad. Sci. USA. 1979;76:3440–3444. doi:10.1073/pnas.76.7.3440
  12. Kimura M. The neutral theory of molecular evolution. Cambridge University Press; Cambridge, UK: 1983.
  13. Kosiol C., Vinar T., da Fonseca R.R., Hubisz M.J., Bustamante C.D., Nielsen R., Siepel A. Patterns of positive selection in six mammalian genomes. PLoS Genet. 2008;4:e1000144. doi:10.1371/journal.pgen.1000144
  14. Loewe L., Charlesworth B. Inferring the distribution of mutational effects in Drosophila. Biol. Lett. 2006;2:426–430. doi:10.1098/rsbl.2006.0481
  15. Lynch M., Conery J.S. The origins of genome complexity. Science. 2003;302:1401–1404. doi:10.1126/science.1089370
  16. Moran N.A., McCutcheon J.P., Nakabachi A. Genomics and evolution of heritable bacterial symbionts. Annu. Rev. Genet. 2008;42:165–190. doi:10.1146/annurev.genet.41.110306.130119
  17. Nielsen R., Yang Z.H. Estimating the distribution of selection coefficients from phylogenetic data with applications to mitochondrial and viral DNA. Mol. Biol. Evol. 2003;20:1231–1239. doi:10.1093/molbev/msg147
  18. Ohta T. Extension to the neutral mutation random drift hypothesis. In: Kimura M., editor. Molecular evolution and polymorphism. National Institute of Genetics; Mishima, Japan: 1977. pp. 148–167.
  19. Ohta T. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 1992;23:263–286. doi:10.1146/annurev.es.23.110192.001403
  20. Petit N., Barbadilla A. Selection efficiency and effective population size in Drosophila species. J. Evol. Biol. 2008;22:515–526. doi:10.1111/j.1420-9101.2008.01672.x
  21. Presgraves D.C. Recombination enhances protein adaptation in Drosophila melanogaster. Curr. Biol. 2005;15:1651–1656. doi:10.1016/j.cub.2005.07.065
  22. Ramos-Onsins S.E., Stranger B.E., Mitchell-Olds T., Aguade M. Multilocus analysis of variation and speciation in the closely related species Arabidopsis halleri and A. lyrata. Genetics. 2004;166:373–388. doi:10.1534/genetics.166.1.373
  23. Silander O.K., Tenaillon O., Chao L. Understanding the evolutionary fate of finite populations: the dynamics of mutational effects. PLoS Biol. 2007;5:922–931. doi:10.1371/journal.pbio.0050094
  24. Welch J.J., Bininda-Emonds O.R.P., Bromham L. Correlates of substitution rate variation in mammalian protein-coding sequences. BMC Evol. Biol. 2008;8:53. doi:10.1186/1471-2148-8-53
  25. Woolfit M., Bromham L. Increased rates of sequence evolution in endosymbiotic bacteria and fungi with small effective population sizes. Mol. Biol. Evol. 2003;20:1545–1555. doi:10.1093/molbev/msg167
  26. Woolfit M., Bromham L. Population size and molecular evolution on islands. Proc. R. Soc. B. 2005;272:2277–2282. doi:10.1098/rspb.2005.3217
  27. Wright S. Evolution in Mendelian populations. Genetics. 1931;16:97–159. doi:10.1093/genetics/16.2.97

Articles from Biology Letters are provided here courtesy of The Royal Society