Biology Letters. 2016 Jun;12(6):20160211. doi: 10.1098/rsbl.2016.0211

Coalescent inferences in conservation genetics: should the exception become the rule?

Valeria Montano
PMCID: PMC4938050  PMID: 27330172

Abstract

Genetic estimates of effective population size (Ne) are an established means to develop informed conservation policies. Another key goal in the conservation of endangered species is maintaining connectivity across fragmented environments, to which genetic inferences of gene flow and dispersal greatly contribute. Most current statistical tools for estimating such population demographic parameters are based on Kingman's coalescent (KC). However, KC is inappropriate for taxa displaying skewed reproductive variance, a property widely observed in natural species. Coalescent models that consider skewed reproductive success, called multiple merger coalescents (MMCs), have been shown to substantially improve estimates of Ne when the distribution of offspring per capita is highly skewed. MMC predictions of standard population genetic parameters, including the rate of loss of genetic variation and the fixation probability of strongly selected alleles, depart substantially from KC predictions. These extended models also allow gene genealogies to be studied in a spatial continuum, providing a novel theoretical framework to investigate spatial connectivity. Therefore, the development of statistical tools based on MMCs should substantially improve estimates of population demographic parameters with major conservation implications.

Keywords: coalescent theory, conservation recommendations, demographic inferences

1. Recent developments in coalescent theory

Estimates of effective population size (Ne), defined by Wright as the number of reproducing lineages in an idealized population [1], are among the parameters used by the International Union for the Conservation of Nature (IUCN) to classify endangered species and to identify the minimum viable population size preventing extinction [2,3]. It has been suggested that the IUCN thresholds of Ne recommended to avoid inbreeding depression and to maintain evolutionary potential should be revised, as theoretical predictions often fail to match empirical observations [3]. However, a theoretical revision of Ne thresholds will not improve conservation recommendations if it is based on inappropriate evolutionary models.

Most methods applied in molecular ecology to infer demographic parameters from genetic data (e.g. Beast, Splatche, Ima, δaδi, FastSimcoal2 [4–8]) rely on Kingman's coalescent (KC; [9]) or its forward dual, the Wright–Fisher model [10]. Although KC has proven robust to violations of most of its assumptions, it drastically fails to approximate the genealogies of species with high reproductive skew [11], whereby few individuals contribute most of the offspring to the next generation (sweepstakes reproductive success, SRS; [12]). A skewed distribution of per capita reproductive success is widely observed among both marine and terrestrial species, from plants to parasites, but also among social birds and mammals [13]. SRS generally characterizes clonally reproducing organisms as much as species with high fecundity and low investment in parental care and thus applies to many endangered species, for instance, amphibians and commercial fish. Moreover, skewed individual reproductive success is not only due to intrinsic reproductive properties of a species, but can also arise during strong population bottlenecks where only a few individuals survive (e.g. a virus infecting a new host), during rapid population expansions [14], and during non-neutral processes, such as the appearance of a strongly beneficial allele which can drag a genome to replace an important fraction of the population within a few generations [15] (figure 1).

Figure 1.


Examples of haploid genealogies presenting skewed reproductive success in forward time and thus multiple merging in backward time. Red edges indicate the sampled lineages. The yellow arrows represent the generation at which multiple merges occur and the blue arrows represent the generation at which the demographic event occurs. In (a), SRS leads to skewed offspring variance and thus multiple mergers can be observed at each generation, even when population size remains constant. In (b), population expansion happens at the last generation with low reproductive variance and low number of per capita offspring, hence the multiple mergers take place at the previous generation; in (c), the population bottleneck and the multiple merging events occur at the same generation. In (d), a selective sweep drags one genome to replace part of the population, thus the demographic event and the multiple merges co-occur.

The KC model neglects the probability of more than two lineages merging at each coalescent event, but when the offspring of a few individuals replace a large fraction of the population at each reproductive event, the probability of multiple lineages merging in backward time becomes high. Hence, under skewed reproductive success, KC forces lineages involved in multiple and/or simultaneous merges to coalesce pairwise, producing genealogical trees with misleading branch lengths and shape [11,14,15]. KC is a limit case of more complex coalescent processes, called multiple merger coalescents (MMCs), addressed in several recent studies, e.g. [11,12,14–18], and excellently reviewed in [18]. MMCs cover comprehensive scenarios, spanning from multiple lineages merging into one at each coalescent event (Λ-coalescent and its limit cases, the β-coalescent and the Bolthausen–Sznitman coalescent [18]) to simultaneous multiple merging of multiple lineages at each coalescent event (Ξ-coalescent [18]). In MMC models, time-dependent changes in allele frequencies depart from KC predictions; consequently, the probability of and time to fixation of both neutral and beneficial alleles, and thus the expected number of segregating sites, change dramatically [19,20]. All of these measures are important to evaluate the health status of endangered species and their potential for adaptation to challenging environments [3].
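To make the contrast between pairwise and multiple mergers concrete, the following is a minimal sketch of merger rates in a β-coalescent. It assumes the commonly used Beta(2 − α, α) parametrization with 0 < α < 2, which is not specified in the text above; the function names are hypothetical. In this parametrization α = 2 is treated as the Kingman limit (pairwise mergers only) and α = 1 corresponds to the Bolthausen–Sznitman coalescent.

```python
import math


def beta_fn(a, b):
    """Euler beta function B(a, b) for positive a and b."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def merger_rate(b, k, alpha):
    """Rate at which one specific group of k out of b lineages merges.

    Assumes a Beta(2 - alpha, alpha) coalescent with 0 < alpha < 2;
    alpha == 2 is handled separately as the Kingman limit.
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    if alpha == 2.0:
        return 1.0 if k == 2 else 0.0
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    return beta_fn(k - alpha, b - k + alpha) / beta_fn(2.0 - alpha, alpha)


def total_rates(b, alpha):
    """Total rate of k-mergers (over all groups of size k) for k = 2..b."""
    return {k: math.comb(b, k) * merger_rate(b, k, alpha)
            for k in range(2, b + 1)}


def multiple_merger_fraction(b, alpha):
    """Probability that the next merger among b lineages involves more than two."""
    rates = total_rates(b, alpha)
    return 1.0 - rates[2] / sum(rates.values())


if __name__ == "__main__":
    for alpha in (2.0, 1.9, 1.5, 1.0):
        frac = multiple_merger_fraction(10, alpha)
        print(f"alpha={alpha}: fraction of multiple mergers = {frac:.3f}")
```

Under these assumptions, the fraction of merger events that involve three or more lineages is zero in the Kingman limit and grows as α decreases, which is the qualitative behaviour described above.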

When reproduction is highly skewed, few lineages substantially contribute to the next generation, which means that the value of Ne, expressed by the parameter θ (2Neμ), is expected to be very low. However, under MMCs, alleles can persist at the same frequency for a longer time than under KC before changing state, implying a reduced probability of loss or fixation for very low- or high-frequency alleles, respectively [19,20]. By contrast, when offspring variance and Ne are small, alleles at low frequencies are more likely to be lost by drift. Hence, under MMCs, the number of segregating sites and the number of singletons are predicted and empirically observed to assume close values, while under KC the predicted number of singletons is usually much lower than the number of segregating sites [11,16–18,21]. As a consequence, new beneficial mutations also show a higher chance of getting lost under KC than under MMCs [19,20]. When few individuals contribute most of the offspring to the next generation, the frequency of a few genotypes can increase substantially more than predicted by neutral KC. We can think of this scenario in terms of the rapid expansion of single lineages, from which it follows that a high number of singletons can appear as the local genealogies become star-like. However, this scenario does not imply an expansion of the population size, which can remain constant.
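As a point of reference for the KC prediction mentioned above, the standard neutral expectation for the site frequency spectrum can be written out. This is a sketch under assumptions not stated in the text: neutrality, constant population size, infinitely many sites, known ancestral states, a sample of n sequences, and θ scaled so that the expected pairwise diversity equals θ. The symbols ξ_i (number of sites where the derived allele appears in i sequences), S (number of segregating sites) and a_n are notation introduced here.

```latex
\mathbb{E}[\xi_i] = \frac{\theta}{i}, \quad i = 1, \dots, n-1,
\qquad
\mathbb{E}[S] = \sum_{i=1}^{n-1} \mathbb{E}[\xi_i] = \theta \, a_n,
\quad a_n = \sum_{i=1}^{n-1} \frac{1}{i},
\qquad
\frac{\mathbb{E}[\xi_1]}{\mathbb{E}[S]} = \frac{1}{a_n}.
```

For n = 20, a_n ≈ 3.55, so under these assumptions only about 28% of segregating sites are expected to be singletons under KC, and the proportion decreases as n grows.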

These differences between the KC and MMCs predictions explain two important results. First, MMCs estimates of Ne in marine species point to much lower values than KC estimates. In [11], the value of θ calculated for a population of oysters is 50 under KC and 0.031 under MMCs. From a conservation perspective, this result implies that high genetic variability can be generated by a very low number of lineages and thus an actual population might decline substantially without evident loss of genetic variation. At the same time, the ability of a few individuals to quickly regenerate considerable genetic variation and the chance of new beneficial mutations to persist might result in high potential for rapid adaptation. Second, under MMCs and constant population size, a low θ value can recover both the observed number of segregating sites and singletons, while KC estimates fail to do so [11,21]. Therefore, conclusions pointing to population expansion based on excess of singletons—negative values of Tajima's D—should be carefully evaluated in molecular ecology studies.
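The caution about Tajima's D can be illustrated with a short calculation. The sketch below assumes the standard definition of the statistic and a toy data layout (a list of 0/1 haplotypes, 1 marking the derived allele) chosen here for illustration; the function name is hypothetical. It shows that an excess of singletons produces a negative D regardless of what generated the singletons, so the sign alone cannot separate population expansion from skewed reproduction.

```python
import math


def tajimas_d(haplotypes):
    """Tajima's D from a list of equal-length 0/1 haplotypes (toy layout)."""
    n = len(haplotypes)
    if n < 3:
        raise ValueError("need at least three sequences")
    n_sites = len(haplotypes[0])

    # Count of derived alleles at each site.
    counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    seg = [c for c in counts if 0 < c < n]
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")

    # Mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(c * (n - c) for c in seg) / n_pairs

    # Standard constants of the variance term.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    n = 10
    # Star-like sample: every segregating site is a singleton.
    star = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    # Balanced sample: every site splits the sample in half.
    balanced = [[1 if i < n // 2 else 0 for _ in range(n)] for i in range(n)]
    print("star-like D:", round(tajimas_d(star), 3))
    print("balanced D:", round(tajimas_d(balanced), 3))
```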

2. Spatial connectivity and continuous space evolution

Another theoretical advance of MMCs is the possibility of modelling evolution in continuous space while overcoming historical limitations. Indeed, models based on KC fail to control local population growth in continuous space, with the consequence that parts of the space grow without limit and others become completely empty (a dynamic known as pain in the torus; [22,23]). As maintaining connectivity across habitats is indicated as a conservation priority [24], approaches to estimate connectivity in continuous landscapes based on circuit theory were developed as an alternative to coalescent-based models [24,25]. Explicit spatial coalescent simulators based on KC (e.g. [5]) are still hampered by the use of discrete units which force coalescent events in non-contiguous populations [25], thus limiting their usefulness compared with alternative approaches [24,25]. In species with long-distance dispersal ability and skewed reproductive success, local populations show low values of Ne associated with higher pairwise FST between closer than between more distant populations [26]. This pattern can be explained by local bottlenecks due to few individuals reproducing and by long-distance dispersal events.

A forward model based on extinction–recolonization events (Λ-Fleming–Viot) allows evolution to be modelled in a spatial continuum using stochastic regulation of local size by randomly drawing the number of individuals destined for extinction (extinction event) and the number that will repopulate the same area from local or external parental lineages (recolonization event) [27,28]. The multiple merging spatial-Λ-coalescent is the backward dual of the forward Λ-Fleming–Viot process [27,28]. Indeed, when lineages disappear backward in time during a recolonization event, multiple lineages will merge into one or more parental individuals, depending on how many parental lineages are responsible for the recolonization. When a parental lineage immigrates into a new area, the position of the descendant coalescing lineage will be spatially tracked back to a different part of the lattice corresponding to the origin of the parental lineage, such that the coalescing lineage is said to ‘jump’ [27]. Allowing for local bottlenecks and long-distance jumps, the spatial-Λ-coalescent can recover both small local Ne and long-distance correlated genealogies deriving from long-distance dispersal events [27,28]. Without needing to assume discrete demes or a homogeneous population distribution, this new framework has been shown to predict local and global Ne values very well in cases where classic FST measures are largely uncorrelated with observed values [26–29].
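The extinction–recolonization mechanism described above can be caricatured with a few lines of forward simulation. This is a simplified individual-based sketch, not the continuum model of [23,27,28] nor the algorithm of any of the cited tools: it assumes a unit torus, a fixed event radius, a single parent per event, and exact replacement of the dead so that local density stays constant. All names and parameter values are hypothetical.

```python
import math
import random


def torus_dist(a, b):
    """Distance between two points on the unit torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return math.hypot(min(dx, 1.0 - dx), min(dy, 1.0 - dy))


def reproduction_event(pop, radius, death_prob, rng):
    """Apply one extinction-recolonization event.

    pop is a list of ((x, y), type) tuples. A disc of the given radius is
    dropped at a random centre, one individual inside it is chosen as the
    parent, each individual in the disc dies with probability death_prob,
    and every death is replaced by an offspring of the parent placed
    uniformly in the disc.
    """
    centre = (rng.random(), rng.random())
    in_disc = {i for i, (pos, _) in enumerate(pop)
               if torus_dist(pos, centre) <= radius}
    if not in_disc:
        return list(pop)  # empty disc: nothing happens

    parent_type = pop[rng.choice(sorted(in_disc))][1]

    survivors = []
    n_dead = 0
    for i, individual in enumerate(pop):
        if i in in_disc and rng.random() < death_prob:
            n_dead += 1  # extinction part of the event
        else:
            survivors.append(individual)

    for _ in range(n_dead):  # recolonization part of the event
        rho = radius * math.sqrt(rng.random())
        phi = 2.0 * math.pi * rng.random()
        pos = ((centre[0] + rho * math.cos(phi)) % 1.0,
               (centre[1] + rho * math.sin(phi)) % 1.0)
        survivors.append((pos, parent_type))
    return survivors


if __name__ == "__main__":
    rng = random.Random(1)
    pop = [((rng.random(), rng.random()), i) for i in range(200)]
    for _ in range(500):
        pop = reproduction_event(pop, radius=0.1, death_prob=0.5, rng=rng)
    print("individuals:", len(pop))
    print("distinct ancestral types left:", len({t for _, t in pop}))
```

Because several individuals in the same disc can be replaced by offspring of one parent in a single event, tracing the sampled lineages backward through such an event yields a multiple merger, and a large radius plays the role of a long-distance jump.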

3. Available statistical tools based on multiple merger coalescents

Given the wide relevance of MMC models for describing the demographic histories of natural populations (e.g. SRS, bottlenecks, expansions, positive selection), it is important to compare the fit of KC versus MMCs to a population's demographic history before a parameter of interest is estimated from empirical genetic data. While in species with highly skewed reproductive success MMCs can be assumed to outperform KC, in less trivial cases, e.g. rapid human population expansion [14], a model comparison is needed to accept or reject KC.

Currently, some MMC maximum-likelihood estimators have been developed and are available to infer the effective population size and the skewness of the offspring distribution of marine species [11,25,30], such as MetaGeneTree [17] (table 1). A recent program based on the spatial-Λ-coalescent (PhyREX) by Guindon et al. [29] estimates global Ne values in continuous space as an alternative to classic FST estimates. Moreover, two MMC simulators are currently available: the algorithms by Kelleher et al. for continuous space evolution [28] and Hybrid-Lambda for species evolution [31], which could be used to fit evolutionary hypotheses to observations using simulation approaches (table 1). Indeed, Joseph et al. [32] developed an ABC pipeline based on the simulator presented in [28] (table 1). At the same time, empirical conservation biologists will benefit from being aware of the biological relevance of MMCs and of when and why they should be applied.

Table 1.

Available statistical tools based on MMC models.

| name | type | model | spatially explicit | reference | source |
|---|---|---|---|---|---|
| Eldon & Wakeley | estimator | Λ-coalescent | no | Eldon & Wakeley [11] | available from the authors on request |
| MetaGeneTree | estimator | Λ-coalescent | no | Birkner et al. [17] | http://metagenetree.sourceforge.net/ |
| PhyREX | estimator | spatial-Λ-coalescent | yes | Guindon et al. [29] | https://github.com/stephaneguindon/phyml |
| Hybrid-Lambda | simulator | β- and Λ-coalescent | no | Zhu et al. [31] | https://github.com/hybridLambda/hybrid-Lambda |
| ABC-Discsim | simulator and estimator | spatial-Λ-coalescent | yes | Kelleher et al. [28]; Joseph et al. [32] | https://github.com/tyjo/ABC-Discsim |

Acknowledgements

I am grateful to Mauricio Gonzalez-Forero, Jeffrey Jensen, Sebastian Matuszewski, Stefan Laurent, Oscar Gaggiotti, Chiara Batini and two anonymous reviewers for helpful comments.

Competing interests

The author declares no competing interest.

Funding

No funding was used for this article.

References

1. Wright S. 1931. Evolution in Mendelian populations. Genetics 16, 97–159.
2. Luikart G, Ryman N, Tallmon DA, Schwartz MK, Allendorf FW. 2010. Estimation of census and effective population sizes: the increasing usefulness of DNA-based approaches. Conserv. Genet. 11, 355–373. (doi:10.1007/s10592-010-0050-7)
3. Frankham R, Bradshaw CJA, Brook BW. 2014. Genetics in conservation management: revised recommendations for the 50/500 rules, Red List criteria and population viability analyses. Biol. Conserv. 170, 56–63. (doi:10.1016/j.biocon.2013.12.036)
4. Drummond AJ, Rambaut A, Shapiro B, Pybus OG. 2005. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22, 1185–1192. (doi:10.1093/molbev/msi103)
5. Currat M, Ray N, Excoffier L. 2004. SPLATCHE: a program to simulate genetic diversity taking into account environmental heterogeneity. Mol. Ecol. Notes 4, 139–142. (doi:10.1046/j.1471-8286.2003.00582.x)
6. Hey J. 2010. Isolation with migration models for more than two populations. Mol. Biol. Evol. 27, 905–920. (doi:10.1093/molbev/msp296)
7. Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. 2009. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695. (doi:10.1371/journal.pgen.1000695)
8. Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M. 2013. Robust demographic inference from genomic and SNP data. PLoS Genet. 9, e1003905. (doi:10.1371/journal.pgen.1003905)
9. Kingman JFC. 1982. The coalescent. Stoch. Process. Appl. 13, 235–248. (doi:10.1016/0304-4149(82)90011-4)
10. Ewens WJ. 2004. Mathematical population genetics. New York, NY: Springer.
11. Eldon B, Wakeley J. 2006. Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics 172, 2621–2633. (doi:10.1534/genetics.105.052175)
12. Hedgecock D, Pudovkin AI. 2011. Sweepstakes reproductive success in highly fecund marine fish and shellfish: a review and commentary. Bull. Mar. Sci. 87, 971–1002. (doi:10.5343/bms.2010.1051)
13. Rubenstein DR, Lovette IJ. 2009. Reproductive skew and selection on female ornamentation in social species. Nature 462, 786–789. (doi:10.1038/nature08614)
14. Bhaskar A, Clark AG, Song YS. 2014. Distortion of genealogical properties when the sample is very large. Proc. Natl Acad. Sci. USA 111, 2385–2390. (doi:10.1073/pnas.1322709111)
15. Neher RA, Hallatschek O. 2013. Genealogies of rapidly adapting populations. Proc. Natl Acad. Sci. USA 110, 437–442. (doi:10.1073/pnas.1213113110)
16. Eldon B. 2011. Estimation of parameters in large offspring number models and ratios of coalescence times. Theor. Popul. Biol. 80, 16–28. (doi:10.1016/j.tpb.2011.04.002)
17. Birkner M, Blath J, Steinrücken M. 2011. Importance sampling for Lambda-coalescents in the infinitely many sites model. Theor. Popul. Biol. 79, 155–173. (doi:10.1016/j.tpb.2011.01.005)
18. Tellier A, Lemaire C. 2014. Coalescence 2.0: a multiple branching of recent theoretical developments and their applications. Mol. Ecol. 23, 2637–2652. (doi:10.1111/mec.12755)
19. Der R, Epstein CL, Plotkin JB. 2011. Generalized population models and the nature of genetic drift. Theor. Popul. Biol. 80, 80–99. (doi:10.1016/j.tpb.2011.06.004)
20. Der R, Epstein C, Plotkin JB. 2012. Dynamics of neutral and selected alleles when the offspring distribution is skewed. Genetics 191, 1331–1344. (doi:10.1534/genetics.112.140038)
21. Sargsyan O, Wakeley J. 2008. A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms. Theor. Popul. Biol. 74, 104–114. (doi:10.1016/j.tpb.2008.04.009)
22. Felsenstein J. 1975. A pain in the torus: some difficulties with models of isolation by distance. Am. Nat. 109, 359–368. (doi:10.1086/283003)
23. Barton NH, Etheridge AM, Véber A. 2010. A new model for evolution in a spatial continuum. Electron. J. Probab. 15, 162–216. (doi:10.1214/EJP.v15-741)
24. McRae BH. 2006. Isolation by resistance. Evol. Int. J. Org. Evol. 60, 1551–1561. (doi:10.1111/j.0014-3820.2006.tb00500.x)
25. Dupas S, et al. 2014. Phylogeography in continuous space: coupling species distribution models and circuit theory to assess the effect of contiguous migration at different climatic periods on genetic differentiation in Busseola fusca (Lepidoptera: Noctuidae). Mol. Ecol. 23, 2313–2325. (doi:10.1111/mec.12730)
26. Eldon B, Wakeley J. 2009. Coalescence times and FST under a skewed offspring distribution among individuals in a population. Genetics 181, 615–629. (doi:10.1534/genetics.108.094342)
27. Barton NH, Etheridge AM, Kelleher J, Véber A. 2013. Inference in two dimensions: allele frequencies versus lengths of shared sequence blocks. Theor. Popul. Biol. 87, 105–119. (doi:10.1016/j.tpb.2013.03.001)
28. Kelleher J, Barton NH, Etheridge AM. 2014. Coalescent simulation in continuous space. Theor. Popul. Biol. 95, 13–23. (doi:10.1016/j.tpb.2014.05.001)
29. Guindon S, Guo H, Welch D. In press. Demographic inference under the coalescent in a spatial continuum. Theor. Popul. Biol. (doi:10.1016/j.tpb.2016.05.002)
30. Árnason E, Halldórsdóttir K. 2015. Nucleotide variation and balancing selection at the Ckma gene in Atlantic cod: analysis with multiple merger coalescent models. PeerJ 3, e786. (doi:10.7717/peerj.786)
31. Zhu S, Degnan JH, Goldstien SJ, Eldon B. 2015. Hybrid-Lambda: simulation of multiple merger and Kingman gene genealogies in species networks and species trees. BMC Bioinform. 16, 292. (doi:10.1186/s12859-015-0721-y)
32. Joseph TA, Hickerson MJ, Alvarado-Serrano DF. In press. Demographic inference under a spatially continuous coalescent model. Heredity. (doi:10.1038/hdy.2016.28)

Articles from Biology Letters are provided here courtesy of The Royal Society
