Soft Selective Sweeps in Complex Demographic Scenarios

Benjamin A Wilson; Dmitri A Petrov; Philipp W Messer

doi:10.1534/genetics.114.165571

Genetics. 2014 Jul 24;198(2):669–684.


PMCID: PMC4266194 PMID: 25060100

Abstract

Adaptation from de novo mutation can produce so-called soft selective sweeps, where adaptive alleles of independent mutational origin sweep through the population at the same time. Population genetic theory predicts that such soft sweeps should be likely if the product of the population size and the mutation rate toward the adaptive allele is sufficiently large, such that multiple adaptive mutations can establish before one has reached fixation; however, it remains unclear how demographic processes affect the probability of observing soft sweeps. Here we extend the theory of soft selective sweeps to realistic demographic scenarios that allow for changes in population size over time. We first show that population bottlenecks can lead to the removal of all but one adaptive lineage from an initially soft selective sweep. The parameter regime under which such “hardening” of soft selective sweeps is likely is determined by a simple heuristic condition. We further develop a generalized analytical framework, based on an extension of the coalescent process, for calculating the probability of soft sweeps under arbitrary demographic scenarios. Two important limits emerge within this analytical framework: In the limit where population-size fluctuations are fast compared to the duration of the sweep, the likelihood of soft sweeps is determined by the harmonic mean of the variance effective population size estimated over the duration of the sweep; in the opposing slow fluctuation limit, the likelihood of soft sweeps is determined by the instantaneous variance effective population size at the onset of the sweep. We show that as a consequence of this finding the probability of observing soft sweeps becomes a function of the strength of selection. Specifically, in species with sharply fluctuating population size, strong selection is more likely to produce soft sweeps than weak selection. 
Our results highlight the importance of accurate demographic estimates over short evolutionary timescales for understanding the population genetics of adaptation from de novo mutation.

Keywords: adaptation, mutation, coalescent theory

ADAPTATION can proceed from standing genetic variation or mutations that are not initially present in the population. When adaptation requires de novo mutations, the waiting time until adaptation occurs depends on the product of the mutation rate toward adaptive alleles and the population size. In large populations, or when the mutation rate toward adaptive alleles is high, adaptation can be fast, whereas in small populations the speed of adaptation will often be limited by the availability of adaptive mutations.

Whether adaptation is mutation limited or not has important implications for the dynamics of adaptive alleles. In a mutation-limited scenario, only a single adaptive mutation typically sweeps through the population and all individuals in a population sample that carry the adaptive allele coalesce into a single ancestor with the adaptive mutation (Figure 1A). This process is referred to as a “hard” selective sweep (Hermisson and Pennings 2005). Hard selective sweeps leave characteristic signatures in population genomic data, such as a reduction in genetic diversity around the adaptive site (Maynard Smith and Haigh 1974; Kaplan et al. 1989; Kim and Stephan 2002) and the presence of a single, long haplotype (Hudson et al. 1994; Sabeti et al. 2002; Voight et al. 2006). In non-mutation-limited scenarios, by contrast, several adaptive mutations of independent origin can sweep through the population at the same time, producing so-called “soft” selective sweeps (Pennings and Hermisson 2006a). In a soft sweep, individuals that carry the adaptive allele collapse into distinct clusters in the genealogy and several haplotypes can be frequent in the population (Figure 1A). As a result, soft sweeps leave more subtle signatures in population genomic data than hard sweeps and are thus more difficult to detect. For example, diversity is not necessarily reduced in the vicinity of the adaptive locus in a soft sweep because a larger proportion of the ancestral variation present prior to the onset of selection is preserved (Innan and Kim 2004; Przeworski et al. 2005; Pennings and Hermisson 2006b; Burke 2012; Peter et al. 2012).

Figure 1. Hard and soft sweeps in populations of constant size and under recurrent population bottlenecks. (A) Allele-frequency trajectories and corresponding coalescent genealogies for a hard selective sweep (left) and a soft selective sweep (right). In the soft sweep scenario, a second beneficial mutation establishes τ_est generations after the first mutation but before the beneficial allele has fixed. The distinguishing feature between a hard and a soft sweep can be seen in the genealogy of a population sample of individuals with the adaptive allele: In a hard sweep, the sample coalesces into a single ancestor, whereas in a soft sweep the sample coalesces into multiple ancestors with independently arisen adaptive mutations. (B) Illustration of our simplified model used to explore the hardening phenomenon. Population bottlenecks occur every ΔT generations wherein the population size is reduced from N₁ to N₂ for a single generation. The average waiting time between independently establishing beneficial mutations is τ_est. From establishment, it takes τ₂ generations for the second mutation to reach frequency 1/N₂, after which it is unlikely to be lost during the bottleneck. The hardening phenomenon is illustrated by the loss of the dark blue allele during the bottleneck. The dashed blue line indicates the threshold trajectory required for the mutation to successfully survive the bottleneck.

There is mounting evidence that adaptation is not mutation limited in many species, even when it requires a specific nucleotide mutation in the genome (Messer and Petrov 2013). Recent case studies have revealed many examples where, at the same locus, several adaptive mutations of independent mutational origin swept through the population at the same time, producing soft selective sweeps. For instance, soft sweeps have been observed during the evolution of drug resistance in HIV (Fischer et al. 2010; Messer and Neher 2012; Pennings et al. 2014) and malaria (Nair et al. 2007), pesticide and viral resistance in fruit flies (Catania et al. 2004; Aminetzach et al. 2005; Chung et al. 2007; Karasov et al. 2010; Schmidt et al. 2010), warfarin resistance in rats (Pelz et al. 2005), and color patterns in beach mice (Hoekstra et al. 2006; Domingues et al. 2012). Even in the global human population, adaptation has produced soft selective sweeps, as evidenced by the parallel evolution of lactase persistence in Eurasia and Africa through recurrent mutations in the lactase enhancer (Bersaglieri et al. 2004; Tishkoff et al. 2007; Enattah et al. 2008; Jones et al. 2013) and the mutations in the gene G6PD that evolved independently in response to malaria (Louicharoen et al. 2009). Some of these sweeps arose from standing genetic variation while others involved recurrent de novo mutation. For the remainder of our study, we focus on the latter scenario of adaptation arising from de novo mutation.

The population genetics of adaptation by soft selective sweeps was first investigated in a series of articles by Hermisson and Pennings (Hermisson and Pennings 2005; Pennings and Hermisson 2006a,b). They found that in a haploid population of constant size the key evolutionary parameter that determines whether adaptation from de novo mutations is more likely to produce hard or soft sweeps is the population-scale mutation rate Θ = 2N_eU_A, where N_e is the variance effective population size in a Wright–Fisher model and U_A is the rate at which the adaptive allele arises per individual per generation. When Θ ≪ 1, adaptation typically involves only a single adaptive mutation and produces a hard sweep, whereas when Θ becomes on the order of one or larger, soft sweeps predominate (Pennings and Hermisson 2006a).

The strong dependence of the likelihood of soft sweeps on Θ can be understood from an analysis of the involved timescales. An adaptive mutation with selection coefficient s that successfully escapes early stochastic loss requires τ_fix ∼ log(N_es)/s generations until it eventually fixes in the population (Hermisson and Pennings 2005; Desai and Fisher 2007). The expected number of independent adaptive mutations that arise during this time is on the order of N_eU_A log(N_es)/s, i.e., the product of the population-scale mutation rate toward the adaptive allele and its fixation time. Yet only a fraction of approximately 2s of these mutations escapes early stochastic loss and successfully establishes in the population (Haldane 1927; Kimura 1962). Thus, the expected number of independently originated adaptive mutations that successfully establish before the first one has reached fixation is of order (2s)N_eU_A log(N_es)/s = Θ log(N_es) and, therefore, depends only logarithmically on the selection coefficient of the adaptive allele.
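The timescale argument above can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the function name `sweep_counts` and the example values (N_e = 10⁶, U_A = 10⁻⁷, so Θ = 0.2) are our own choices, and "log" is taken to be the natural logarithm.

```python
import math


def sweep_counts(N_e, U_A, s):
    """Order-of-magnitude quantities from the timescale argument.

    Returns (tau_fix, n_arising, n_establishing). n_establishing should equal
    Theta * log(N_e * s) with Theta = 2 * N_e * U_A.
    """
    tau_fix = math.log(N_e * s) / s      # generations until fixation (order of magnitude)
    n_arising = N_e * U_A * tau_fix      # adaptive mutations arising during that time
    n_establishing = 2 * s * n_arising   # only a fraction ~2s escapes early stochastic loss
    return tau_fix, n_arising, n_establishing


# Illustrative values: N_e = 1e6, U_A = 1e-7, hence Theta = 0.2.
for s in (0.01, 0.1):
    tau_fix, arising, established = sweep_counts(1e6, 1e-7, s)
    print(f"s={s}: tau_fix~{tau_fix:.0f}, arising~{arising:.1f}, established~{established:.2f}")
```

A tenfold increase in s raises the expected number of establishing mutations only from about 1.8 to about 2.3, because the factor s cancels and only log(N_es) remains.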

Our current understanding of the likelihood of soft sweeps relies on the assumption of a Wright–Fisher model with fixed population size, where Θ remains constant over time. This assumption is clearly violated in many species, given that population sizes often change dramatically throughout the evolutionary history of a species. To assess what type of sweeps to expect in a realistic population, we must understand how the likelihood of soft sweeps is affected by demographic processes.

In many organisms, population sizes can fluctuate continuously and over timescales that are not necessarily long compared to those over which adaptation occurs. For example, many pathogens undergo severe bottlenecks during host-to-host transmission (Artenstein and Miller 1966; Gerone et al. 1966; Wolfs et al. 1992; Wang et al. 2010), insects can experience extreme, seasonal boom–bust cycles (Wright et al. 1942; Ives 1970; Baltensweiler and Fischlin 1988; Nelson et al. 2013), and even some mammals experience dramatic, cyclical changes in abundance (Krebs and Myers 1974; Myers 1998). Extensive work has been devoted to the question of how such fluctuations affect the fixation probabilities of adaptive mutations (Ewens 1967; Otto and Whitlock 1997; Pollak 2000; Patwa and Wahl 2008; Engen et al. 2009; Parsons et al. 2010; Uecker and Hermisson 2011; Waxman 2011), but it remains unclear how they affect the likelihood of observing soft sweeps.

In this study we investigate the effects of demographic processes on adaptation from de novo mutations. We show that recurrent population bottlenecks can give rise to a phenomenon that we call the hardening of soft selective sweeps. Hardening occurs when only one beneficial lineage in an initially soft sweep persists through a population bottleneck. We then develop a generalized analytical framework for calculating the likelihood of soft sweeps under arbitrary demographic scenarios, based on the coalescent with “killings” process. We find that when population size varies over time, two important symmetries of the constant population-size scenario are broken: first, the probability of observing soft sweeps becomes a function of the starting time of the sweep and, second, it becomes a function of the strength of selection. In particular, we show that strong selection is often more likely to produce soft sweeps than weak selection when population size fluctuates.

Methods

Forward simulations of adaptation under recurrent population bottlenecks

We simulated adaptation from de novo mutation in a modified Wright–Fisher model with selection. Each simulation run was started from a population that was initially monomorphic for the wild-type allele, a. New adaptive mutations entered the population by a Poisson process with rate N₁U_A[1 − x(t)], where 1 − x(t) is the frequency of the wild-type allele. The population in each generation was produced by multinomial sampling from the previous generation, with sampling probabilities being proportional to the difference in fitness of each lineage and the mean population fitness. Population bottlenecks were simulated through a single-generation downsampling to size N₂ (without selection) every ΔT generations. We did not require that the first beneficial mutation arise in the first generation. Each simulation run started ΔT generations before the first bottleneck. All adaptive lineages were tracked in the population until the adaptive allele had reached fixation. One thousand simulations were run for each parameter combination. Empirical probabilities of observing a soft sweep in a given simulation run were obtained by calculating the expected probability that two randomly drawn adaptive lineages are not identical by descent, based on the population frequencies of all adaptive lineages in the population at the time of sampling.
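A compact NumPy sketch of this kind of simulation is shown below. It is not the authors' code, which was written in Python and C++ and is available on request. The sketch makes several assumptions of its own: each class is sampled in proportion to its frequency times its fitness (1 for the wild type, 1 + s for every adaptive lineage), which is the standard Wright–Fisher weighting; new mutations are applied after reproduction; and the function names and parameter values are hypothetical. The soft-sweep probability is computed from lineage frequencies as 1 − Σᵢ fᵢ², which is the chance that two copies drawn with replacement are not identical by descent.

```python
import numpy as np


def p_soft_from_freqs(lineage_counts):
    """Probability that two adaptive copies drawn at random (with replacement)
    stem from different mutational origins: 1 - sum_i f_i**2."""
    f = np.asarray(lineage_counts, dtype=float)
    f = f / f.sum()
    return 1.0 - float(np.sum(f ** 2))


def simulate_sweep(N1, N2, U_A, s, dT, rng, max_gen=1_000_000):
    """Haploid Wright-Fisher run with a one-generation bottleneck to N2 every
    dT generations; returns the soft-sweep probability (sample of two) at
    fixation of the adaptive allele."""
    wt = N1                                   # wild-type count
    lineages = np.zeros(0, dtype=np.int64)    # one count per adaptive lineage
    for gen in range(1, max_gen + 1):
        freqs = np.concatenate(([wt], lineages)) / (wt + lineages.sum())
        bottleneck = gen % dT == 0
        if bottleneck:
            # Neutral downsampling: no selection, no mutation.
            new = rng.multinomial(N2, freqs)
        else:
            # Selection: weight each class by frequency x fitness (assumed form).
            fitness = np.concatenate(([1.0], np.full(lineages.size, 1.0 + s)))
            w = freqs * fitness
            new = rng.multinomial(N1, w / w.sum())
        wt, lineages = int(new[0]), new[1:]
        lineages = lineages[lineages > 0]     # forget extinct lineages
        if not bottleneck:
            # New origins: Poisson with rate N1 * U_A * (1 - x) = U_A * wt.
            m = min(int(rng.poisson(U_A * wt)), wt)
            wt -= m
            lineages = np.concatenate((lineages, np.ones(m, dtype=np.int64)))
        if wt == 0:
            return p_soft_from_freqs(lineages)
    raise RuntimeError("adaptive allele did not fix within max_gen generations")


rng = np.random.default_rng(1)
runs = [simulate_sweep(10_000, 100, 1e-4, 0.1, 100, rng) for _ in range(20)]
print(sum(runs) / len(runs))
```

Averaging the returned values over many runs gives an empirical P_soft for one parameter combination. A production version would need far more runs and larger population sizes than this toy example uses.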

Numerical Monte Carlo integration

Analytical predictions for P_soft,2(t, s) and P_soft,10(t, s) in Figure 4 and Figure 5 were obtained by the following procedure. For the given demographic model, selection coefficient, and starting time of the sweep, we first calculated the fixation probability of the adaptive allele via Equation 9 using Monte Carlo integration routines from the GNU Scientific Library (Galassi et al. 2009). This fixation probability was then used in Equation 8 to obtain the deterministic trajectory x*(t). Solving x*(t_n) = 1/2 yielded the sampling time t_n. We then recursively estimated the lower bound $\hat{t}_k$ of the integral in Equation 10 for each k such that the expected number of events occurring between $\hat{t}_k$ and $\hat{t}_{k+1}$ converged to 1 ± 10⁻⁴. Finally, we integrated the coalescence rate from Equation 4 over the interval $[\hat{t}_k, \hat{t}_{k+1}]$ to determine the probability that the event occurring at $\hat{t}_k$ was a coalescent event, yielding P_coal,k = 1 − P_soft,k. These probabilities were calculated for k = 1, …, n − 1 and used in Equation 7 to obtain P_soft,n(t, s). Note that this approach can easily be adjusted for any other sampling time or adaptive allele frequency at sampling.
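The recursive "one expected event per interval" procedure can be sketched as follows. Equations 4 and 7 to 10 are not reproduced in this section, so the sketch relies on several assumptions that should not be read as the paper's exact expressions:

- With k adaptive lineages, the coalescence rate per generation is taken as C(k,2)/(N(t)x(t)), and the killing (mutation) rate as kU_A(1 − x(t))/x(t).
- Events are combined as P_soft,n = 1 − Π P_coal.
- The trajectory x(t) and size N(t) are supplied by the caller instead of being derived from Equations 8 and 9.
- A fixed-step Riemann sum replaces the GSL Monte Carlo routines.

The function name `p_soft_n` is hypothetical.

```python
import math


def p_soft_n(n, N, x, U_A, t_sample, t0=-math.inf, dt=0.1):
    """Approximate P_soft,n by stepping backward in time from t_sample.

    Each interval is extended until one event is expected in it; the share of
    that expectation due to coalescence is used as the probability that the
    event was a coalescence (assumed rate forms, see lead-in).
    """
    t = t_sample
    p_hard = 1.0
    for k in range(n, 1, -1):                 # k adaptive lineages remain
        coal = total = 0.0
        while total < 1.0:                    # extend until one event is expected
            t -= dt
            if t < t0:
                return 1.0                    # >= 2 lineages left at the sweep's origin
            xt = x(t)
            rate_coal = k * (k - 1) / 2 / (N(t) * xt)   # assumed coalescence rate
            rate_kill = k * U_A * (1.0 - xt) / xt      # assumed killing (mutation) rate
            coal += rate_coal * dt
            total += (rate_coal + rate_kill) * dt
        p_hard *= coal / total                # chance this event was a coalescence
    return 1.0 - p_hard


# Constant size N = 1e6 and Theta = 2 * N * U_A = 1, with x(0) = 1/2 at sampling.
def N_const(t):
    return 1e6


def x_logistic(t, s=0.05):
    return 1.0 / (1.0 + math.exp(-s * t))


print(p_soft_n(2, N_const, x_logistic, 5e-7, 0.0), p_soft_n(10, N_const, x_logistic, 5e-7, 0.0))
```

In this constant-size check, the two printed values should come out close to the Equation 1 values for Θ = 1 (0.5 for n = 2 and 0.9 for n = 10). This is a useful sanity test before N(t) is made time dependent.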

Figure 4. Weak and strong selection limits. (A) In the cyclical population example, N(t) cycles between a maximum size N_max = 10⁸ and a minimum size N_min = 10⁶ over a period of ΔT = 500 generations. Adaptive mutations occur at a *de novo* rate of U_A = 10⁻⁸ per individual, per generation. We condition selective sweeps on four different starting times: T₁, T₂, T₃, and T₄. (B) Comparison of our analytical predictions for the probabilities P_soft,2(t) of observing a soft sweep in a sample of two adaptive alleles (colored lines) with empirical probabilities observed in Wright–Fisher simulations (colored circles, see *Methods*). The two alleles are drawn randomly at the time when the adaptive allele has reached a population frequency of 50%. For weak selection, the probabilities converge to the harmonic mean expectation, E(Θ_H). For strong selection, they converge to the instantaneous population-size expectations, E(Θ₁) and E(Θ₂). The orange and green lines are also expected to converge in the strong selection limit, because they share the same instantaneous population size at t₀. (C) Weak selection/fast fluctuation and strong selection/slow fluctuation limits in our recurrent bottleneck model from Figure 2: The observed probabilities of soft sweeps in the recurrent bottleneck simulations transition from the harmonic mean expectations (dashed black lines) to the instantaneous population-size expectations (solid black lines). The dotted vertical lines indicate the position of our heuristic boundary $\Delta T = \tau'_{\text{est}}$ for selection coefficients that meet the criterion s > N₂/N₁, under which our heuristic is valid.
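For the recurrent bottleneck model, the two limiting reference values in panel C can be illustrated numerically. This sketch makes two assumptions. First, one cycle consists of ΔT − 1 generations at N₁ and one generation at N₂ = N₁/ratio, and the harmonic mean is taken over those generations. Second, the expectations are obtained by evaluating Equation 1 with n = 2 at the corresponding Θ. The paper's exact expression for E(Θ_H) is not given in this section, and the helper names are hypothetical.

```python
def theta_harmonic(theta1, ratio, dT):
    """Harmonic mean of Theta over one cycle of the recurrent bottleneck model:
    dT - 1 generations at Theta1 and one generation at Theta1 / ratio."""
    return dT / ((dT - 1) / theta1 + ratio / theta1)


def p_soft_2(theta):
    """Equation 1 with n = 2: 1 - 1/(1 + Theta) = Theta/(1 + Theta)."""
    return theta / (1.0 + theta)


# Parameters of the Figure 2 example discussed in the Results.
theta_h = theta_harmonic(2.0, 1e4, 100)
print(theta_h, p_soft_2(theta_h), p_soft_2(2.0))
```

With Θ = 2 in the large phases, a 10⁴-fold bottleneck, and ΔT = 100, the harmonic-mean Θ drops to about 0.02. The weak-selection reference value is therefore about 0.02, while the instantaneous large-phase value is 2/3. These are the two limits between which the simulated probabilities move; neither is a prediction for a specific s.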

Figure 5. Probability of observing soft sweeps in two demographic scenarios. (A) Example inspired by data from the extreme fluctuations observed in multiple species of moths (Baltensweiler and Fischlin 1988; Nelson *et al.* 2013). We assume that the adaptive population-scale mutation rate varies between Θ_min = 10⁻³ and Θ_max = 1 over a period of ΔT = 5 generations. (B) Predictions for the probability of observing a soft sweep in a sample of two adaptive lineages drawn randomly at the time point when the adaptive allele has reached a population frequency of 50%, conditional on four different starting times of the sweep (T₁ to T₄). The noise stems from the numerical Monte Carlo integrations. The probability of observing a soft sweep is close to the harmonic mean expectation, E(Θ_H), for virtually all starting times and selection strengths, except when selection is extremely strong. (C) Demographic model proposed for the European human population (Gazave *et al.* 2014). The ancestral population size is N_anc = 10⁴. Starting at 113 generations in the past, the population expands exponentially at a constant rate of r = 0.0554, until it reaches its current size of N_cur ≈ 520,000. Population size is assumed to remain constant thereafter. Note that the y-axis is plotted logarithmically. We set the beneficial mutation rate in this example at U_A = 5 × 10⁻⁸. (D) Analytic predictions for the probability of observing a soft sweep in a sample of size n = 10 (solid lines) when the sweep starts at present (T₄), midway during the expansion (T₃ = 50 generations ago), at the beginning of the expansion (T₂ = 113 generations ago), and prior to the expansion (T₁ = 500 generations ago). Sweeps that start prior to the expansion are almost exclusively hard, whereas sweeps starting today will be primarily soft, regardless of the strength of selection.
Sweeps starting at the beginning of or during the expansion show an interesting crossover behavior: Smaller selection coefficients are more likely than larger selection coefficients to produce soft sweeps, because weaker sweeps take longer to complete and thus spend more time at larger population sizes. Note that all sweeps have selection coefficients > 1/(2N_anc), below which drift stochasticity would prevent a meaningful deterministic approximation to the frequency trajectory. Open circles and dotted lines show results for a sample size of n = 2 for comparison.

Forward simulations in cycling and expanding populations

We simulated adaptation from de novo mutation in cycling populations and an expanding population using the Wright–Fisher models specified above. Each simulated population was initially monomorphic for the wild-type allele. We began our simulations at four different time points (t₀) along the population-size trajectory and ran each simulation on the condition that the first beneficial allele that arose in generation t₀ did not go extinct during the simulation. Simulations were run until the adaptive allele had reached a frequency >50%. Ten thousand simulations were run for each combination of parameters in the cycling population example, and one thousand simulations were run for each combination of parameters in the human population expansion example. All code was written in Python and C++ and is available upon request.

Results

We study a single locus with two alleles, a and A, in a haploid Wright–Fisher population (random mating, discrete generations) (Ewens 2004). The population is initially monomorphic for the wild-type allele a. The derived allele A has a selective advantage s over the wild-type and arises at a rate U_A per individual, per generation. We ignore back mutations and consider the dynamics of the two alleles at this locus in isolation; i.e., there is no interaction with other alleles elsewhere in the genome.

In a classical hard sweep scenario, a single adaptive allele arises, successfully escapes early stochastic loss, and ultimately sweeps to fixation in the population. In a soft sweep, several adaptive mutations establish independently in the population and rise in frequency before the adaptive allele has fixed in the population. After fixation of the adaptive allele, individuals in a population sample do not coalesce into a single ancestor with the adaptive allele but fall into two or more clusters, reflecting the independent mutational origins of the different adaptive lineages (Figure 1A). Note that the distinction between a hard and a soft sweep is based on the genealogy of adaptive alleles in a population sample. It is therefore possible that the same adaptive event yields a soft sweep in one sample but remains hard in another, depending on which individuals are sampled.

Soft sweeps in populations of constant size

The likelihood of soft sweeps during adaptation from de novo mutation has been calculated by Pennings and Hermisson (2006a) for a Wright–Fisher model of constant population size N. Using coalescent theory, they showed that in a population sample of size n, drawn right after fixation of the adaptive allele, the probability of observing at least two independently originated adaptive lineages is given by

$$P_{\text{soft},n}(\Theta) \approx 1 - \prod_{k=1}^{n-1} \frac{k}{k+\Theta}, \tag{1}$$

where Θ = 2NU_A is the population-scale mutation rate—twice the number of adaptive alleles that enter the population per generation. Thus, the probability of a soft sweep is primarily determined by Θ and is nearly independent of the strength of selection.
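Equation 1 is easy to evaluate directly. The short Python sketch below does so; the function name `p_soft_constant` is ours. It shows how strongly the result depends on Θ and on the sample size n.

```python
from math import prod


def p_soft_constant(n, theta):
    """Equation 1: probability of >= 2 independent origins in a sample of n
    adaptive alleles, constant population size, Theta = 2 * N * U_A."""
    return 1.0 - prod(k / (k + theta) for k in range(1, n))


for theta in (0.01, 0.1, 1.0, 10.0):
    print(theta, round(p_soft_constant(2, theta), 3), round(p_soft_constant(10, theta), 3))
```

For Θ = 1, for example, the product telescopes to 1/n, so P_soft,10 = 0.9 while P_soft,2 = 0.5. Larger samples detect softness more readily.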

The transition between the regimes where hard and where soft sweeps predominate occurs when Θ becomes on the order of 1 in the constant population-size scenario. When Θ ≪ 1, adaptive mutations are not readily available in the population and adaptation is impeded by the waiting time until the first successful adaptive mutation arises. This regime is referred to as the mutation-limited regime. Adaptation from de novo mutation typically produces hard sweeps in this case. When Θ ≥ 1, by contrast, on the order of one or more adaptive mutations enter the population per generation. In this non-mutation-limited regime, soft sweeps predominate.

Soft sweeps under recurrent bottlenecks: Heuristic predictions

The standard Wright–Fisher model assumes a population of constant size N. To study the effects of population-size changes on the probability of soft sweeps, we relax this condition and model a population that alternates between two sizes. Every ΔT generations the population size is reduced from N₁ to N₂ ≪ N₁ for a single generation and then returns to its initial size in the following generation (Figure 1B). We define Θ = 2N₁U_A as the population-scale mutation rate during the large population phases.

We assume instantaneous population-size changes and do not explicitly consider a continuous population decline at the beginning of the bottleneck or growth during the recovery phase. This assumption should be appropriate for sharp, punctuated bottlenecks and allows us to specify the “severity” of a bottleneck in terms of a single parameter, N₂/N₁. We also assume that mutation and selection operate only during the phases when the population is large, whereas the two alleles, a and A, are neutral with respect to each other and no new mutations occur during a bottleneck. This assumption is justified for severe bottlenecks with N₂ ≪ N₁ and when bottlenecks are neutral demographic events. Note that many effects of a population bottleneck depend primarily on the ratio of its duration to its severity. In principle, most of the results we derive below should therefore be readily applicable to more complex bottleneck scenarios by mapping the real bottleneck onto an effective single-generation bottleneck, provided that the real bottleneck is not so long that beneficial mutations appear during it.

Adaptive mutations arise in the large population at rate N₁U_A, but only a fraction 2s of these mutations successfully establishes in the large population; i.e., these mutations stochastically reach a frequency ≈1/(N₁s) whereupon they are no longer likely to become lost by random genetic drift (assuming that the amount of drift remains constant over time). Thus, adaptive mutations establish during the large phases at an approximate rate Θs. We assume that successfully establishing mutations reach their establishment frequency fast compared to the timescale ΔT between bottlenecks, in which case establishment can be effectively modeled by a Poisson process. This assumption is reasonable when selection is strong and the establishment frequency low. Note that those adaptive mutations that do reach establishment frequency typically achieve this quickly in ∼γ/s generations, where γ ≈ 0.577 is the Euler–Mascheroni constant (Desai and Fisher 2007; Eriksson et al. 2008).

Under the Poisson assumption, the expected waiting time until a successful adaptive mutation arises in the large population phase is given by τ_est = 1/(Θs). After establishment, its population frequency is modeled deterministically by logistic growth: x(t) = 1/[1 + (N₁s)exp(−st)]. Fixation would occur τ_fix ∼ log(N₁s)/s generations after establishment if the population size remained constant.
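These deterministic expressions can be written down directly. The following Python sketch uses function names and example values of our own choosing. It confirms two properties of the logistic trajectory: it starts at roughly the establishment frequency 1/(N₁s), and it crosses 1/2 after exactly log(N₁s)/s generations, which is the τ_fix timescale quoted above.

```python
import math


def tau_est(theta, s):
    """Mean waiting time until an establishing mutation arises (rate Theta*s)."""
    return 1.0 / (theta * s)


def logistic_x(t, N1, s):
    """Deterministic frequency t generations after establishment:
    x(t) = 1 / (1 + N1*s*exp(-s*t))."""
    return 1.0 / (1.0 + N1 * s * math.exp(-s * t))


def tau_fix(N1, s):
    """log(N1*s)/s; this is exactly when the logistic curve crosses 1/2."""
    return math.log(N1 * s) / s


N1, s, theta = 1e6, 0.05, 2.0          # illustrative values
print(tau_est(theta, s), tau_fix(N1, s), logistic_x(0, N1, s))
```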

If an adaptive mutation establishes during the large phase but has not yet fixed at the time the next bottleneck occurs, its fate will depend on its frequency at the onset of the bottleneck. In our model, the bottleneck is a single generation of random downsampling of the population to a size N₂ ≪ N₁. Any mutation present at the onset of the bottleneck will likely survive the bottleneck only when it was previously present at a frequency >1/N₂, i.e., when at least one copy is expected to be present during the bottleneck. Less frequent mutations will typically be lost (Figure 1B). To reach frequency 1/N₂ in the population, an adaptive mutation needs to grow for approximately another τ₂ = log(N₁s/N₂)/s generations after establishment. We can therefore define the bottleneck establishment time as the sum of the initial establishment time (assuming instantaneous establishment), τ_est, and the waiting time until the mutation has subsequently reached a high-enough frequency to likely survive a bottleneck, τ₂:

$$\tau'_{\text{est}} = \frac{1}{\Theta s} + \frac{\log(N_1 s / N_2)}{s}. \tag{2}$$

We show below that the comparison between the bottleneck establishment time, $\tau'_{\text{est}}$, and the bottleneck recurrence time, ΔT, distinguishes the qualitatively different regimes in our model.

Mutation-limited adaptation:

It is clear that bottlenecks can only decrease the probability of a soft sweep in our model relative to the probability in the constant population-size scenario, as they systematically remove variation from the population by increasing the variance in allele frequencies between generations. Consequently, when Θ ≪ 1, sweeps are hard because adaptation is already mutation limited during the large phases. Note that mutation limitation does not necessarily imply that adaptation is unlikely in general; it may just take longer until an adaptive mutation successfully establishes in the population. When the recurrence time, ΔT, is much larger than the establishment time, τ_est, adaptation is still expected to occur between two bottlenecks.

Non-mutation-limited adaptation:

If Θ ≥ 1, adaptation is not mutation limited during the large population phases. In the absence of bottlenecks (or when bottlenecks are very weak), adaptation from de novo mutation often produces soft selective sweeps. A strong population bottleneck, however, can potentially remove all but one adaptive lineage and result in a scenario in which only this one lineage ultimately fixes. In this case, we say that the bottleneck has hardened the initially soft selective sweep.

We can identify the conditions that make hardening likely from a simple comparison of timescales: Hardening should occur whenever Θ ≥ 1 and at the same time

$$\Delta T < \tau'_{\text{est}}, \tag{3}$$

such that a second de novo mutation typically does not have enough time to reach a safe frequency that assures its survival before the next bottleneck sets in (Figure 1B).

The argument that the second adaptive mutation needs to grow for τ₂ generations after its establishment to reach a safe frequency 1/N₂ makes sense only when the mutation is actually at a lower frequency than 1/N₂ at establishment, which requires that bottlenecks are sufficiently severe (N₂/N₁ < s). For weaker bottlenecks, most established mutations should typically survive the bottleneck and hardening will generally be unlikely. Note that the condition N₂/N₁ > s alone does not imply that soft sweeps should predominate—this still depends on the value of Θ. In the other limit, where bottleneck severity increases until N₂ → 1, all sweeps become hardened. This imposes the requirement that τ₂ ≪ τ_fix or correspondingly that N₂ ≫ 1 for our bottleneck establishment time to be valid.
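The heuristic can be packaged as a small classifier. This Python sketch is our own reading of the text. It evaluates Equation 2 with ratio = N₁/N₂ and applies condition (3) only when Θ ≥ 1 and the bottleneck is severe enough (N₂/N₁ < s). Treating every Θ < 1 as "mutation-limited" is a simplification, because the text speaks of Θ ≪ 1. The function names and return labels are hypothetical.

```python
import math


def tau_est_bottleneck(theta, s, ratio):
    """Equation 2, with ratio = N1/N2."""
    return 1.0 / (theta * s) + math.log(ratio * s) / s


def hardening_regime(theta, s, ratio, dT):
    """Classify a parameter set with the heuristic of condition (3)."""
    if theta < 1.0:
        return "mutation-limited"            # sweeps are mostly hard to begin with
    if 1.0 / ratio >= s:
        return "heuristic not applicable"    # bottleneck too weak: N2/N1 >= s
    if dT < tau_est_bottleneck(theta, s, ratio):
        return "hardening likely"
    return "hardening unlikely"


# Parameters of the example reported in the forward simulations.
for s in (0.056, 0.1):
    print(s, round(tau_est_bottleneck(2.0, s, 1e4), 1), hardening_regime(2.0, s, 1e4, 100))
```

For Θ = 2, N₁/N₂ = 10⁴, and ΔT = 100, the bottleneck establishment time is about 122 generations for s = 0.056 (hardening predicted) and about 74 generations for s = 0.1 (hardening not predicted). This matches the simulated outcome reported in the Results: 90% hard sweeps versus 57% soft sweeps.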

The heuristic argument invokes a number of strong simplifications, including that allele-frequency trajectories are deterministic once the adaptive allele has reached its establishment frequency, that alleles at frequencies <1/N₂ have no chance of surviving a bottleneck, and that establishment occurs instantaneously during a large population phase. In reality, however, an adaptive mutation spends time in the population before establishment. If this time becomes comparable to ΔT, adaptive mutations encounter bottlenecks during the process of establishment. In this case, the establishment frequency will be >1/(N₁s) and the establishment time will be longer than 1/(Θs) due to the increased drift during bottlenecks. We address these issues more thoroughly below when we analyze general demographic scenarios.

Our condition relating the bottleneck recurrence time and the bottleneck establishment time (3) makes the interesting prediction that for fixed values of Θ, ΔT, and N₂/N₁, there should be a threshold selection strength for hardening. Sweeps involving weaker selection than this threshold are likely to be hardened, whereas stronger sweeps are not. Thus, both hard and soft sweeps can occur in the same demographic scenario, depending on the strength of selection. This is in stark contrast to the constant population-size scenario, where primarily the value of Θ determines whether adaptation produces hard or soft sweeps while the strength of selection enters only logarithmically.
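The threshold selection strength implied by condition (3) can be found numerically by solving ΔT = τ'_est for s. The Python sketch below does this with a bisection in log space; the function name is ours. The bisection relies on a property derived from Equation 2 rather than stated in the text: τ'_est decreases monotonically in s once s ≥ e·N₂/N₁, so the search starts there.

```python
import math


def threshold_s(theta, ratio, dT, hi=1.0, iters=100):
    """Selection coefficient at which dT equals tau'_est (Equation 2).

    ratio = N1/N2. Sweeps weaker than the returned value fall in the
    predicted hardening regime; stronger sweeps do not.
    """
    def excess(s):
        return 1.0 / (theta * s) + math.log(ratio * s) / s - dT

    lo = math.e / ratio                  # tau'_est is decreasing in s from here on
    if excess(lo) <= 0 or excess(hi) >= 0:
        raise ValueError("no threshold between e/ratio and hi")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)         # geometric midpoint: bisection in log space
        if excess(mid) > 0:              # tau'_est > dT: still in the hardening regime
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


print(round(threshold_s(2.0, 1e4, 100), 4))
```

For Θ = 2, N₁/N₂ = 10⁴, and ΔT = 100, the threshold is about s ≈ 0.071. This value lies between the two selection coefficients (0.056 and 0.1) contrasted in the forward simulations.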

Soft sweeps under recurrent bottlenecks: Forward simulations

We performed extensive forward simulations of adaptation from de novo mutation under recurrent population bottlenecks to measure the likelihood of soft sweeps in our model and to assess the accuracy of condition (3) under a broad range of parameter values. In our simulations we modeled the dynamics of adaptive lineages at a single locus in a modified Wright–Fisher model with selection (Methods). To estimate the empirical probability of observing a soft sweep in a given simulation run, we calculated the probability that two randomly sampled individuals are not identical by descent at the time of fixation of the adaptive allele, i.e., that their alleles arose from independent mutational origins.

Figure 2 shows phase diagrams of the empirical probabilities of soft sweeps in our simulations over a wide range of parameter values. We investigated three Θ-regimes that differ in the relative proportions at which hard and soft sweeps arise during the large phases before they experience a bottleneck: (i) Mostly hard sweeps arise during the large phase (Θ = 0.2), (ii) mostly soft sweeps arise during the large phase (Θ = 2), and (iii) practically only soft sweeps arise during the large phase (Θ = 20). For each value of Θ, we investigated three different bottleneck severities: N₁/N₂ = 10², N₁/N₂ = 10³, and N₁/N₂ = 10⁴.

Figure 2. Hardening of soft selective sweeps under recurrent population bottlenecks. Rows show bottleneck severities from weaker to stronger (top to bottom), and columns show the population-scale mutation rate, Θ, during the large population phases (left to right). The coloring of the squares specifies the proportion of soft sweeps observed in samples of two individuals at the time of fixation for 1000 simulation runs (*Methods*) with selection coefficient (s) and bottleneck recurrence time (ΔT) at the center of each square. The red lines indicate the boundary condition $\Delta T = \tau'_{\text{est}}$ between the regime where hardening is predicted to be likely (left of line) and unlikely (right of line) according to our heuristic condition (3). The dashed black line indicates the boundary condition N₂/N₁ = s on the severity of the bottleneck; below the line, bottlenecks are not severe enough for the hardening condition to be applicable. Note that for the low population-scale mutation rate Θ = 0.2 (left), only very few sweeps are soft initially during the large population phase, and hardening therefore is unlikely from the outset. In contrast, the top right shows very little hardening because mutations establish so frequently that weaker bottlenecks are unlikely to remove all but one of the mutations that establish during the sweep.

Our simulations confirm that hardening is common in populations that experience sharp, recurrent bottlenecks. The evolutionary parameters under which hardening is likely are qualitatively distinguished by the heuristic condition (3). Hardening becomes more likely with increasing severity of the population bottlenecks. For a fixed value of Θ and a fixed severity of the bottlenecks, hardening also becomes more likely the weaker the strength of positive selection and the shorter the recurrence time between bottlenecks, as predicted. For the scenarios with Θ = 0.2, most sweeps are already hard when they arise. Thus, only a few soft sweeps could be subject to hardening, leading to systematically lower values of P_soft than in the scenarios with higher values of Θ. Note that the transition between the regimes where hardening is common and where it is uncommon can be quite abrupt. For example, in the scenario where Θ = 2, N₁/N₂ = 10⁴, and ΔT = 100 generations, an adaptive allele with s = 0.056 almost always (90%) produced a hard sweep in our simulations, whereas an allele with s = 0.1 mostly (57%) produced a soft sweep.

Probability of soft sweeps in complex demographic scenarios

In this section we describe an approach for calculating the probability of observing soft sweeps from recurrent de novo mutation that can be applied to complex demographic scenarios. We assume that the population is initially monomorphic for the wild-type allele, a, and that the adaptive allele, A, has selection coefficient s and arises through mutation of the wild-type allele at rate U_A per individual, per generation. Let P_soft,n(t₀, s) denote the probability that a sweep arising at time t₀ is soft in a sample of n adaptive alleles. Generally, P_soft,n(t₀, s) is also a function of the trajectory, x(t ≥ t₀), of the adaptive allele, the specific demographic scenario, N(t ≥ t₀), and the sampling time, t_n.

We can calculate P_soft,n(t₀, s) given x(t), N(t), and t_n using a straightforward extension of the approach employed by Pennings and Hermisson (2006a) in deriving P_soft,n(Θ) for a population of constant size, which resulted in Equation 1. In particular, we can model the genealogy of adaptive alleles in a population sample by a coalescent process with “killings” (Durrett 2008). In this process, two different types of events can occur in the genealogy of adaptive alleles when going backward in time from the point of sampling: two branches can coalesce, or a branch can mutate from the wild-type allele to the adaptive allele (Figure 3). In the latter case, the branch in which the mutation occurred is stopped (referred to as killing). Thus, each pairwise coalescence event and each mutation event reduces the number of ancestors in the genealogy by one. The process stops when the last branch is stopped by a mutation (which cannot occur further back in the past than time t₀, the time when the adaptive allele first arose in the population).

Modeling the genealogy of adaptive alleles by a coalescent process with killings. Population size N(t) can vary arbitrarily over time in our model (top). An adaptive allele arises in the population (indicated by x) in generation t₀ and subsequently sweeps through the population (red frequency trajectory x(t), middle). Before fixation, a second adaptive lineage arises by mutation (indicated by second x) and also sweeps through the population (blue frequency trajectory, middle). Bottom: possible genealogy of a population sample of n = 6 adaptive alleles, taken at the time t₆. When tracing the lineages back in time, a pair of branches can coalesce (events t₁, t₂, t₄, and t₅) or a branch can mutate (events t₀ and t₃), indicating *de novo* mutational origin of the adaptive allele. In the latter case the lineage is killed. The example shown is a soft sweep because a second *de novo* mutation occurs before all individuals have coalesced into a single ancestor.

Hard and soft sweeps have straightforward interpretations in this framework: In a hard sweep, all individuals in the sample carry the adaptive allele from the same mutational origin and therefore coalesce into a single ancestor before the process finally stops. In a soft sweep, on the other hand, at least one additional mutation occurs before the process stops (Figure 3).
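The following Python sketch simulates this process for the special case of constant population size early in a sweep. There, Equation 4 with x ≪ 1 makes the coalescence and mutation rates with k ancestors proportional to k(k − 1)/2 and kΘ/2, with Θ = 2N_eU_A. The sketch races the two exponential clocks, declares the sweep soft as soon as a mutation occurs while more than one ancestor remains, and compares the Monte Carlo estimate with Equation 7 for constant Θ_k = Θ. Function names, seed, and parameter values are illustrative assumptions.

```python
import random


def simulate_is_soft(n, theta, rng):
    """Trace one genealogy of n adaptive alleles backward in time.

    Assumes constant population size and the early-sweep approximation, so
    that with k ancestors the coalescence and mutation (killing) rates are
    proportional to k(k-1)/2 and k*theta/2. The shared factor 1/(N_e x)
    rescales time only and does not change which event occurs first.
    """
    k = n
    while k > 1:
        t_coal = rng.expovariate(k * (k - 1) / 2)
        t_mut = rng.expovariate(k * theta / 2)
        if t_mut < t_coal:
            return True  # second de novo origin before full coalescence
        k -= 1  # pairwise coalescence
    return False  # all lineages coalesced into one ancestor: hard sweep


def analytical_p_soft(n, theta):
    """Equation 7 with every Theta_k equal to theta."""
    p_hard = 1.0
    for k in range(1, n):
        p_hard *= k / (k + theta)
    return 1.0 - p_hard


rng = random.Random(1)
n, theta, reps = 5, 1.0, 20000
estimate = sum(simulate_is_soft(n, theta, rng) for _ in range(reps)) / reps
print(estimate, analytical_p_soft(n, theta))  # both should be close to 0.8
```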

We depart from the Wright–Fisher framework here and instead model this coalescent as a continuous-time Markov process. The instantaneous rates of coalescence (λ_coal) and mutation (λ_mut) at time t, assuming that k ancestors are present in the genealogy at this time, are then given by

λ_{coal} (t, k) \approx \frac{k (k - 1)}{2 N_{e} (t) x (t)} and λ_{mut} (t, k) \approx \frac{k U_{A} [1 - x (t)]}{x (t)},

(4)

where N_e(t) is the single-generation variance effective population size in generation t. Note that these are the same rates that are derived and used by Pennings and Hermisson (2006a), with the only difference being that in our case the population size N_e(t) can vary over time.

Let us assume for now that we know the times t₁, …, t_n₋₁ at which coalescence or mutation events happen in the genealogy, where t_k for k = 1, …, n − 1 specifies the time of the coalescence or mutation event that reduces the number of ancestors from k + 1 to k, and t_n specifies the time of sampling (Figure 3). Note that we make no assumptions about when the sample is taken; we require only that n copies of the adaptive allele are present in the sample. Given a pair of successive time points, t_k and t_k₊₁, we can calculate the probability P_coal(t_k) that the event at t_k is a coalescence event, rather than a mutation event, using the theory of competing Poisson processes:

\begin{matrix} P_{coal} (t_{k}) = \frac{\int_{t_{k}}^{t_{k + 1}} λ_{coal} (t, k + 1) d t}{\int_{t_{k}}^{t_{k + 1}} [λ_{coal} (t, k + 1) + λ_{mut} (t, k + 1)] d t} \\ = \frac{k}{k + Θ_{k}} . \end{matrix}

(5)

The last equation holds if we define an effective Θ_k as

Θ_{k} = 2 U_{A} A_{k} ((1 - x) / x) H_{k} (N_{e} x),

(6)

where $H_k(y) = (t_{k+1} - t_k) / \int_{t_k}^{t_{k+1}} y(t)^{-1}\,dt$ denotes the harmonic mean and $A_k(y) = \int_{t_k}^{t_{k+1}} y(t)\,dt / (t_{k+1} - t_k)$ the arithmetic mean, estimated over the interval [t_k, t_k₊₁]. This effective Θ_k recovers the original result Θ_k = 2N_eU_A of Pennings and Hermisson (2006a) for the special case of constant population size: there $H_k(N_e x) = N_e A_k^{-1}(1/x)$, and if mutation and coalescence events are likely only during the early phase of a sweep, when x ≪ 1, then $A_k((1 - x)/x) \approx A_k(1/x)$, so the two means cancel and only the factor N_e remains.
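To make Equations 5 and 6 concrete, the sketch below computes Θ_k by numerically integrating the arithmetic and harmonic means over one interval and checks that k/(k + Θ_k) agrees with the ratio of integrated rates from Equation 4. The trapezoidal rule, the fluctuating N_e(t), and the logistic x(t) are hypothetical choices made only for illustration.

```python
import math


def trapezoid(f, a, b, steps=20000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * (f(a) + f(b)) + inner)


def effective_theta_k(u_a, n_e, x, t_k, t_k1):
    """Equation 6: Theta_k = 2 U_A * A_k((1 - x)/x) * H_k(N_e x) on [t_k, t_k1]."""
    width = t_k1 - t_k
    arith = trapezoid(lambda t: (1 - x(t)) / x(t), t_k, t_k1) / width
    harm = width / trapezoid(lambda t: 1 / (n_e(t) * x(t)), t_k, t_k1)
    return 2 * u_a * arith * harm


def p_coal_from_rates(u_a, n_e, x, t_k, t_k1, k):
    """First line of Equation 5: integrated Equation 4 rates, k + 1 ancestors."""
    m = k + 1
    coal = trapezoid(lambda t: m * (m - 1) / (2 * n_e(t) * x(t)), t_k, t_k1)
    mut = trapezoid(lambda t: m * u_a * (1 - x(t)) / x(t), t_k, t_k1)
    return coal / (coal + mut)


# Hypothetical inputs, chosen only for illustration
def n_e(t):
    return 1e6 * (1 + 0.9 * math.sin(2 * math.pi * t / 50))


def x(t):
    return 1 / (1 + 1e3 * math.exp(-0.05 * t))


u_a = 1e-8
theta_1 = effective_theta_k(u_a, n_e, x, 0.0, 40.0)
p_rates = p_coal_from_rates(u_a, n_e, x, 0.0, 40.0, k=1)
print(theta_1, p_rates, 1 / (1 + theta_1))  # last two agree (Equation 5)
```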

The effective Θ_k from Equation 6 describes the product of two specific means estimated during the time interval between events at t_k and t_k₊₁: (i) the arithmetic mean of twice the rate at which mutations toward the adaptive allele occur per individual and (ii) the harmonic mean of N_ex, the effective number of individuals that carry the adaptive allele at time t. The first mean is independent of demography and is largest during the early phase of a sweep when x(t) is small. The second mean depends on the product of both the trajectory, x(t), and the demography, N_e(t). Importantly, as a harmonic mean, it is dominated by the smallest values of N_ex during the estimation interval. Thus, even if the estimation interval lies in a later stage of the sweep, when x(t) is larger than it was early in the sweep, the harmonic mean could nevertheless be small if N_e(t) is small at some point during this interval. In general, when population size varies over time, it is not always true that most coalescence occurs during the early phase of a sweep, and we therefore do not adopt this assumption here. For instance, if a strong bottleneck is encountered late during the sweep, most coalescence can occur within this bottleneck.

Given an arbitrary demographic scenario, N_e(t), and trajectory x(t) of the adaptive allele, Equation 6 allows us to calculate each effective Θ_k if we know the time points t_k and t_k₊₁. Given the sequence {Θ_k} for all k = 1, …, n − 1, we can then calculate the probability that the sweep in our sample is hard, as this is only the case if all individual events in the genealogy happen to be coalescence events. The probability that this happens is the product of all P_coal(t_k). Hence, the probability that the sweep is soft in our sample is

\begin{matrix} P_{soft, n} (\{Θ_{k}\}) = 1 - P_{hard, n} (\{Θ_{k}\}) \\ = 1 - \prod_{k = 1}^{n - 1} \frac{k}{k + Θ_{k}} . \end{matrix}

(7)
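Equation 7 is straightforward to evaluate once the Θ_k are known. In this minimal Python sketch (the function name is hypothetical), list index k − 1 holds Θ_k; with all Θ_k equal to a constant Θ it reduces to the constant-size product formula, so n = 2 gives Θ/(1 + Θ).

```python
def p_soft_n(thetas):
    """Equation 7: P_soft,n = 1 - prod_{k=1}^{n-1} k / (k + Theta_k).

    thetas[k - 1] holds Theta_k, so a sample of size n needs n - 1 values.
    """
    p_hard = 1.0
    for k, theta_k in enumerate(thetas, start=1):
        p_hard *= k / (k + theta_k)  # P(the k+1 -> k event is a coalescence)
    return 1.0 - p_hard


print(p_soft_n([1.0]))        # n = 2, constant Theta = 1: 0.5
print(p_soft_n([0.02, 2.0]))  # n = 3 with time-varying Theta_k
```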

Calculating Θ_k for a given demographic scenario

The above calculation of P_soft,n based on Equations 6 and 7 presupposes that we actually know the trajectory of the adaptive allele and the times t_k at which coalescence or mutation events occur in the genealogy. This assumption is unrealistic in practice. A full treatment of the problem in the absence of such information would require integrating over all possible trajectories and all possible event times, weighting each particular path x(t) and sequence of event times t₁, …, t_n₋₁ by its probability.

Instead of performing such a complicated ensemble average, we use a deterministic approximation for the trajectory x(t) and then estimate the event times t_k numerically from the expected rates of coalescence and mutation. Specifically, we model the frequency trajectory of an adaptive allele in the population by

x^{*} (t > t_{0}) = \frac{e^{s (t - t_{0})}}{N (t_{0}) P_{fix} (t_{0}, s) - 1 + e^{s (t - t_{0})}},

(8)

where P_fix(t₀, s) is the fixation probability of a new mutation of selection coefficient s that arises in the population at time t₀ in a single copy (Uecker and Hermisson 2011). Calculating such fixation probabilities when population size varies over time has been the subject of several studies and is well understood (Ewens 1967; Otto and Whitlock 1997; Pollak 2000; Patwa and Wahl 2008; Engen et al. 2009; Parsons et al. 2010; Uecker and Hermisson 2011; Waxman 2011). For example, Uecker and Hermisson (2011) have derived the following general formula for calculating P_fix(t₀, s) under arbitrary demographic scenarios:

P_{fix} (t_{0}, s) = \frac{2}{1 + N (t_{0}) \int_{t_{0}}^{∞} [e^{- s (t - t_{0})} / N_{e} (t)] d t} .

(9)

Here N_e(t) again specifies the single-generation variance effective population size in generation t. This approximation works well as long as the number of beneficial mutations that enter the population during the sweep is not extremely high; when Θ ≫ 1, one would need to explicitly include the contribution from mutation in the formulation of the birth–death process.
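A sketch of Equations 8 and 9 in Python follows. The infinite upper limit of Equation 9 is truncated where e^{−s(t − t₀)} is negligible, and the integral is approximated with the trapezoidal rule; both are numerical assumptions rather than the authors' procedure. For constant N = N_e, Equation 9 evaluates to 2s/(1 + s), which serves as a check.

```python
import math


def p_fix(t0, s, n, n_e, horizon_factor=50.0, steps=100000):
    """Equation 9, integrated with the trapezoidal rule.

    n(t) and n_e(t) return the census and variance effective size in
    generation t. The infinite upper limit is truncated at
    t0 + horizon_factor / s, where exp(-s (t - t0)) is negligible.
    """
    horizon = horizon_factor / s
    h = horizon / steps

    def f(t):
        return math.exp(-s * (t - t0)) / n_e(t)

    inner = sum(f(t0 + i * h) for i in range(1, steps))
    integral = h * (0.5 * (f(t0) + f(t0 + horizon)) + inner)
    return 2.0 / (1.0 + n(t0) * integral)


def x_star(t, t0, s, n, pfix):
    """Equation 8: deterministic trajectory of an allele that arose at t0."""
    growth = math.exp(s * (t - t0))
    return growth / (n(t0) * pfix - 1.0 + growth)


# Constant population as a sanity check: Equation 9 then gives 2s / (1 + s)
def const_size(t):
    return 1e4


s = 0.05
pf = p_fix(0.0, s, const_size, const_size)
print(pf, 2 * s / (1 + s))
print(x_star(0.0, 0.0, s, const_size, pf))  # initial frequency 1 / (N P_fix)
```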

Assuming that the adaptive allele follows the deterministic trajectory, x^∗(t), from Equation 8, we can calculate the expected rates of coalescence, $λ_{coal}^{*} (t, k)$ , and mutation, $λ_{mut}^{*} (t, k)$ , in the genealogy of adaptive alleles in a population sample. Let us assume the sample of size n is taken at t_n. We can estimate the times t_k (k = 1, …, n − 1) at which the number of ancestors goes from k + 1 to k using the relation

n - k = \sum_{j = k}^{n - 1} \int_{{\hat{t}}_{j}}^{{\hat{t}}_{j + 1}} [λ_{coal}^{*} (t, j + 1) + λ_{mut}^{*} (t, j + 1)] d t .

(10)

In other words, the time estimates ${\hat{t}}_{k}$ can be calculated recursively, going backward in time event by event from the point of sampling until n − k events have occurred in the genealogy. Given the time estimates ${\hat{t}}_{k}$, one can then calculate the estimate for Θ_k via Equation 6 and estimate P_soft,n(t₀, s) via Equation 7. See Methods for a more detailed explanation of how this is done in practice.
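The recursion of Equation 10 can be sketched as below, chained with Equations 6 and 7. This is an illustration under stated assumptions, not the procedure described in Methods: integrals use a midpoint rule with a fixed step, any overshoot of the accumulated expected event count is carried into the next interval, and the example uses a hypothetical constant-size population (N = 10⁵, U_A = 10⁻⁶, s = 0.05) with the trajectory from Equation 8 and P_fix = 2s/(1 + s) from Equation 9. Because Equation 6 then gives Θ_k = 2U_AN[1 − H_k(x)], the result should lie slightly below Θ/(1 + Θ).

```python
import math


def p_soft_expected_times(n, t0, t_n, n_e, x, u_a, dt=0.01):
    """Estimate P_soft,n via Equations 10, 6 and 7 for a sample taken at t_n.

    n_e(t) and x(t) are callables; x is a deterministic trajectory such as
    Equation 8. Integrals use a midpoint rule with step dt (illustrative).
    Returns (P_soft_n, [t_hat_1, ..., t_hat_{n-1}]).
    """
    def total_rate(t, m):
        # Equation 4 with m ancestors: coalescence plus mutation (killing)
        xt = x(t)
        return m * (m - 1) / (2 * n_e(t) * xt) + m * u_a * (1 - xt) / xt

    # Equation 10: walk backward from t_n; with m = k + 1 ancestors, record
    # t_hat_k once one expected event has accumulated since the last event.
    t_hat = {n: t_n}
    m, acc, t = n, 0.0, t_n
    while m > 1:
        if t - dt <= t0:
            raise ValueError("reached the origin t0 before all n - 1 events")
        acc += total_rate(t - dt / 2, m) * dt
        t -= dt
        if acc >= 1.0:
            t_hat[m - 1] = t
            m, acc = m - 1, acc - 1.0  # carry any overshoot forward

    # Equation 6 on each interval [t_hat_k, t_hat_{k+1}], then Equation 7
    p_hard = 1.0
    for k in range(1, n):
        a, b = t_hat[k], t_hat[k + 1]
        steps = max(1, round((b - a) / dt))
        h = (b - a) / steps
        mids = [a + (i + 0.5) * h for i in range(steps)]
        i_mut = h * sum((1 - x(u)) / x(u) for u in mids)
        i_coal = h * sum(1 / (n_e(u) * x(u)) for u in mids)
        theta_k = 2 * u_a * i_mut / i_coal  # = 2 U_A A_k((1-x)/x) H_k(N_e x)
        p_hard *= k / (k + theta_k)
    return 1.0 - p_hard, [t_hat[k] for k in range(1, n)]


# Hypothetical constant-size example (Theta = 2 N U_A = 0.2)
N, U_A, s, t0 = 1e5, 1e-6, 0.05, 0.0
pfix = 2 * s / (1 + s)  # Equation 9 evaluated for constant N_e = N


def n_e(t):
    return N


def x(t):  # Equation 8
    growth = math.exp(s * (t - t0))
    return growth / (N * pfix - 1 + growth)


t2 = t0 + math.log(N * pfix) / s  # Equation 12 with N_e = N: x(t2) is ~1/2
p_soft_2, times = p_soft_expected_times(2, t0, t2, n_e, x, U_A)
theta = 2 * N * U_A
print(p_soft_2, theta / (1 + theta))  # close to, and slightly below, Eq. 1
```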

Application for cycling populations

To illustrate and verify our approach for calculating P_soft,n(t₀, s), we examine selective sweeps in a population that undergoes cyclical population-size changes. In particular, we model a haploid Wright–Fisher population with a time-dependent population size given by

N (t) = \frac{N_{min} + N_{max}}{2} + \frac{N_{max} - N_{min}}{2} \sin (\frac{2 π t}{Δ T}) .

(11)

As illustrated in Figure 4A, this specifies a population that cycles between a minimal size, N_min, and a maximal size, N_max, over a period of ΔT generations. We investigate selective sweeps with four different starting times (t₀) at which the successfully sweeping allele first arises within a cycle: t₀ = 0, t₀ = 0.25ΔT, t₀ = 0.5ΔT, and t₀ = 0.75ΔT. These four cases describe, in order, a starting time of the sweep midway through a growth phase, at the end of a growth phase, midway through a decline phase, and at the end of a decline phase (Figure 4A). For each starting time we calculate the expected probability P_soft,2(t₀, s) of observing a soft sweep in a sample of size 2 as a function of the selection coefficient, s, of the adaptive allele, assuming that the population is sampled when the adaptive allele has reached population frequency x = 1/2. In contrast to sampling at the time of fixation, this criterion does not depend on the actual population size (e.g., in a growing population fixation can take a very long time). Note that P_soft,2(t₀, s) is the probability that two adaptive alleles in a random population sample are not identical by descent.

We derived our analytical predictions for P_soft,2(t₀, s) by first calculating P_fix(t₀, s) for the given N(t), t₀, and s via numerical integration of Equation 9 and then inserting the result into Equation 8 to obtain the trajectory x^∗(t), using the scaling N_e(t) = N(t)/(1 + s) for concordance between the generalized birth–death model used by Uecker and Hermisson (2011) and the Wright–Fisher model. We then estimated ${\hat{t}}_{1}$ via numerical integration of Equation 10 (Methods), assuming that the adaptive allele reaches frequency x = 1/2 at

t_{2} = t_{0} + \frac{\log [N_{e} (t_{0}) P_{fix} (t_{0}, s)]}{s} .

(12)
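The cycling model of Equation 11 and the sampling time of Equation 12 can be sketched as follows, using the Figure 4B parameters (N_min = 10⁶, N_max = 10⁸, ΔT = 500) and the scaling N_e(t) = N(t)/(1 + s) described above. The midpoint-rule integration of Equation 9, its truncation 40/s generations after t₀, and the step size are our own numerical assumptions.

```python
import math

N_MIN, N_MAX, PERIOD = 1e6, 1e8, 500.0  # Figure 4B parameters


def census_size(t):
    """Equation 11: N(t) cycles sinusoidally between N_MIN and N_MAX."""
    mean, amp = (N_MIN + N_MAX) / 2, (N_MAX - N_MIN) / 2
    return mean + amp * math.sin(2 * math.pi * t / PERIOD)


def p_fix(t0, s, dt=0.1, horizon_factor=40.0):
    """Equation 9 with N_e(t) = N(t) / (1 + s), via the midpoint rule,
    truncated at t0 + horizon_factor / s."""
    steps = int(horizon_factor / s / dt)
    integral = dt * sum(
        math.exp(-s * (i + 0.5) * dt) * (1 + s) / census_size(t0 + (i + 0.5) * dt)
        for i in range(steps)
    )
    return 2.0 / (1.0 + census_size(t0) * integral)


def sampling_time(t0, s):
    """Equation 12: time at which x*(t) reaches 1/2."""
    n_e0 = census_size(t0) / (1 + s)
    return t0 + math.log(n_e0 * p_fix(t0, s)) / s


s = 0.05
for frac in (0.0, 0.25, 0.5, 0.75):
    t0 = frac * PERIOD
    print(frac, p_fix(t0, s), sampling_time(t0, s) - t0)
```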

Figure 4B shows the comparison between our analytical predictions for P_soft,2(t₀, s) and the observed frequencies of soft sweeps in Wright–Fisher simulations for a scenario with population sizes N_min = 10⁶ = 0.01N_max, cycle period ΔT = 500, and adaptive mutation rate U_A = 10⁻⁸, as a function of the strength of positive selection and the starting time of the sweep within a cycle. Simulation results are in good agreement with analytical predictions over the whole range of investigated parameters.

We observe two characteristic limits in our cyclical population-size model, specified by the relation between the duration of the sweeps (which scales inversely with the selection strength) and the timescale over which demographic processes occur:

Weak selection/fast fluctuation limit: When the duration of a sweep becomes much longer than the period of population-size fluctuations, the probability of observing a soft sweep converges to that expected in a population of constant size equal to the harmonic mean of N_e(t) estimated over a population cycle (dash–dotted line in Figure 4B). The starting time of the sweep becomes irrelevant in this case. To show this, we partition the integral $\int 1/(N_e x)\,dt$ that enters H_k(N_ex) in Equation 6 into consecutive intervals, each extending over one population cycle. Because x(t) changes slowly compared with the timescale of a population cycle, we can assume that x(t) is approximately constant over each such interval. The harmonic mean then factorizes into H_k(N_ex) = H_k(N_e)H_k(x), and, since A_k((1 − x)/x) = A_k(1/x) − 1 = 1/H_k(x) − 1, Equation 6 reduces to

Θ_{k} = 2 U_{A} H_{k} (N_{e}) [1 - H_{k} (x)] \approx 2 U_{A} H_{k} (N_{e}) .

(13)

The last approximation holds as long as k is not too large: the corresponding interval [t_k, t_k₊₁] then still reaches back into the early phase of the sweep, where x(t) is small, and because the harmonic mean is dominated by the smallest values, H_k(x) remains small as well.
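For the sinusoidal cycle of Equation 11, the harmonic mean over one period has the closed form √(N_min N_max), a standard integral identity. The sketch below checks this numerically for the Figure 4B values and inserts the result into Equation 13 for n = 2. It treats N_e(t) as N(t), ignoring the 1/(1 + s) scaling, which is a simplifying assumption.

```python
import math


def harmonic_mean_over_cycle(n_min, n_max, steps=100_000):
    """Harmonic mean of the Equation 11 cycle over one full period, via the
    midpoint rule in units of the period (so Delta T itself cancels)."""
    mean, amp = (n_min + n_max) / 2, (n_max - n_min) / 2
    inv_mean = sum(
        1.0 / (mean + amp * math.sin(2 * math.pi * (i + 0.5) / steps))
        for i in range(steps)
    ) / steps
    return 1.0 / inv_mean


h = harmonic_mean_over_cycle(1e6, 1e8)
print(h, math.sqrt(1e6 * 1e8))  # numerical vs closed form, both about 1e7

theta_weak = 2 * 1e-8 * h  # Equation 13 with U_A = 1e-8, H_k(x) neglected
print(theta_weak / (1 + theta_weak))  # P_soft,2 in this limit, about 0.167
```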

Note that the above argument applies more broadly and is not necessarily limited to scenarios where population-size fluctuations are exactly cyclical. In general, a sufficient condition for the factorization in Equation 13 is the existence of a timescale ξ that is much shorter than the duration of the sweep, where harmonic averages of N(t) estimated over time intervals of length ξ are already approximately constant for every interval lying within the duration of the sweep. In other words, factorization works for all demographic models that have fast fluctuation modes that we can effectively average out but no slow fluctuation modes occurring over timescales comparable to the duration of the sweep.

Examples of demographic models where the weak selection/fast fluctuation limit becomes applicable include those where N(t) is any periodic function with a period much shorter than the duration of the sweep. Another example would be a model in which population sizes are drawn randomly from a fixed distribution, where the number of draws over the duration of the sweep is large enough that harmonic averages already converge to the harmonic mean of that distribution over timescales much shorter than the duration of the sweep.

Strong selection/slow fluctuation limit: When the duration of a sweep becomes much shorter than the timescale over which population size changes, the probability of observing a soft sweep in the cyclical population model converges to that which is expected in a population of constant size N_e(t₀), the effective population size at the starting time of the sweep. In this case the effective Θ_k from Equation 6 reduces to

Θ_{k} = 2 U_{A} N_{e} (t_{0}) [1 - H_{k} (x)] \approx 2 U_{A} N_{e} (t_{0}) .

(14)
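In this limit, Equation 14 inserted into Equation 7 with n = 2 gives P_soft,2 ≈ Θ/(1 + Θ) with Θ = 2U_AN_e(t₀). The sketch below evaluates this at the four starting times for the Figure 4B parameters, treating N_e(t₀) as N(t₀) and neglecting H_k(x), both simplifying assumptions. With U_A = 10⁻⁸, a sweep starting at the population maximum (t₀ = 0.25ΔT) has Θ = 2 and P_soft,2 ≈ 2/3, whereas one starting at the minimum (t₀ = 0.75ΔT) has Θ = 0.02 and P_soft,2 ≈ 0.02.

```python
import math

N_MIN, N_MAX, PERIOD, U_A = 1e6, 1e8, 500.0, 1e-8  # Figure 4B parameters


def census_size(t):
    """Equation 11: sinusoidal cycle between N_MIN and N_MAX."""
    mean, amp = (N_MIN + N_MAX) / 2, (N_MAX - N_MIN) / 2
    return mean + amp * math.sin(2 * math.pi * t / PERIOD)


def p_soft_2_strong_limit(t0):
    """Equation 14 (H_k(x) neglected, N_e(t0) taken as N(t0)) in Equation 7, n = 2."""
    theta = 2 * U_A * census_size(t0)
    return theta / (1 + theta)


for frac in (0.0, 0.25, 0.5, 0.75):
    print(frac, p_soft_2_strong_limit(frac * PERIOD))
```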

We can also recover these weak and strong selection limits for our earlier simulations of the recurrent bottleneck scenario. Figure 4C shows the transition from the expectation for a constant population whose size equals the harmonic mean population size over one bottleneck cycle, H(N_e), to the expectation for a constant population at the instantaneous population size, N_e(t₀) ≈ N₁. The expectations in the limits were calculated using Equation 1 while substituting the appropriate effective population size. Again we see that even for the same demographic scenario, the probability of observing a soft sweep can vary dramatically with the selection coefficient. This implies that there is generally no single effective population size that determines the expected selective sweep signature. Note also that while the transition between the two regimes in our hardening model is monotonic, the transition is not guaranteed to be monotonic in more complex demographic scenarios, as seen for some of the transitions in our cycling population model.

Discussion

In this study we investigated the population parameters that determine the probability of observing soft selective sweeps when adaptation arises from de novo mutations. Our understanding of soft sweeps has hitherto been limited to the special case in which population size remains constant over time. In this special case, the probability of soft sweeps from recurrent de novo mutation depends primarily on the population-scale mutation rate toward the adaptive allele, Θ = 2N_eU_A, and is largely independent of the strength of selection (Pennings and Hermisson 2006a). We devised a unified framework for calculating the probability of observing soft sweeps when population size changes over time and found that the strength of selection becomes a key factor for determining the likelihood of observing soft sweeps in many demographic scenarios.

The hardening phenomenon

We first demonstrated that population bottlenecks can give rise to a phenomenon that we term the hardening of soft selective sweeps. Hardening describes a situation where several adaptive mutations of independent origin—initially destined to produce a soft sweep in a constant population—establish in the population, but only one adaptive lineage ultimately survives a subsequent bottleneck, resulting in a hard selective sweep.

Using a simple heuristic approach that models the trajectories of adaptive alleles forward in time, we showed that in populations that experience recurrent, sharp bottlenecks, the likelihood of such hardening depends on the comparison of two characteristic timescales: (i) the recurrence time (ΔT) between bottlenecks and (ii) the bottleneck establishment time ($τ'_{est}$), which specifies the waiting time until a de novo adaptive mutation reaches a high enough frequency that it is virtually guaranteed to survive a bottleneck. We derived a simple heuristic approximation, $τ'_{est} = [Θ^{-1} + \log(N_1 s / N_2)] / s$, that applies when bottlenecks are severe enough (N₁s > N₂ with N₂ ≫ 1). If soft sweeps are expected to arise between bottlenecks (i.e., if Θ is on the order of one or larger during those phases), then hardening is common when $ΔT < τ'_{est}$, whereas it is unlikely when $ΔT > τ'_{est}$. The bottleneck establishment time increases only logarithmically with the severity of the bottleneck and scales inversely with the selection coefficient of the adaptive mutation. In stark contrast to a population of constant size, the probability of observing soft sweeps can therefore strongly depend on the strength of selection in the recurrent bottleneck scenario.
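The heuristic comparison fits in a few lines of Python. This sketch assumes that Θ refers to the population-scale mutation rate during the large phase and that N₁/N₂ is the bottleneck severity; the function names are hypothetical. For the recurrent-bottleneck scenario in the Results (Θ = 2, N₁/N₂ = 10⁴, ΔT = 100), it gives τ'_est ≈ 122 generations for s = 0.056 (hardening predicted) and ≈ 74 generations for s = 0.1 (hardening not predicted), in line with the simulated outcomes of 90% hard and 57% soft sweeps.

```python
import math


def bottleneck_establishment_time(theta, s, n1_over_n2):
    """Heuristic tau'_est = [1/Theta + log(N1 s / N2)] / s, valid when
    N1 s > N2 (severe bottlenecks) and N2 >> 1."""
    if n1_over_n2 * s <= 1:
        raise ValueError("bottleneck not severe enough: need N1 * s > N2")
    return (1 / theta + math.log(n1_over_n2 * s)) / s


def hardening_likely(delta_t, theta, s, n1_over_n2):
    """Heuristic condition: hardening is expected when Delta T < tau'_est."""
    return delta_t < bottleneck_establishment_time(theta, s, n1_over_n2)


# Theta = 2, N1/N2 = 1e4, Delta T = 100 generations
for s in (0.056, 0.1):
    tau = bottleneck_establishment_time(2.0, s, 1e4)
    print(s, round(tau, 1), hardening_likely(100, 2.0, s, 1e4))
```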

Generalized analytical framework for complex demographies

The heuristic condition $ΔT < τ'_{est}$ provides a rough estimate of whether hardening is expected in a recurrent bottleneck scenario, but it lacks generality for more complex demographic scenarios and does not provide the actual probabilities of observing soft sweeps. We showed that such probabilities can be approximated analytically for a wide range of demographic models by mapping the problem onto a coalescent with killings process (Durrett 2008). Our approach is very similar to that employed by Pennings and Hermisson (2006a) for the constant-size model, with the primary difference being that we allow coalescence and mutation rates to vary over time as population size changes.

In the coalescent with killings framework (Figure 3), the probability of a soft sweep is determined by the competition between two processes: coalescence in the fraction x(t) of the population that carries the adaptive allele, and emergence of new adaptive lineages through de novo mutation (referred to as killings when going backward in time) in the fraction 1 − x(t) of the population that does not yet carry the adaptive allele. A sweep is hard in a population sample if all individuals in that sample coalesce before a second adaptive mutation arises and soft otherwise. In our analytical approach, we assume that the trajectory x(t) can be described by a logistic function. The probability of observing a soft sweep can then be calculated through numerical integration of the expected rates of coalescence and mutation in the genealogy, which are simple functions of x(t) and N_e(t), the variance effective population size in generation t.

Note that by adjusting the endpoint of the integration interval to the time at which the adaptive allele reaches a given frequency, our approach can easily be extended to the analysis of partial selective sweeps. Similarly, by extending the time interval beyond the fixation of the adaptive allele, one can study the loss of adaptive lineages due to random genetic drift after the completion of a soft sweep. Moreover, since our model requires only an estimate of the frequency trajectory of the adaptive allele, x(t), it should be easily extendable to other, more complex scenarios, including time-varying selection coefficients (Uecker and Hermisson 2011), as long as one can still model x(t) in the particular scenario. We leave these possible extensions for future exploration.

Even though the results presented in this article were derived for haploid populations, it is straightforward to extend them to other levels of ploidy. The key prerequisite is again that we still have an estimate for the frequency trajectory of the adaptive allele, which can be complicated by dominance effects when ploidy increases. Given the trajectory, the population size N(t) simply needs to be multiplied by the ploidy level to adjust for the changed rate of coalescence in the genealogy. For example, in a diploid population with codominance, the population-scale mutation rate needs to be defined as Θ = 4N_eU_A, twice the value for a haploid population of the same size.

Weak and strong selection limits

Our approach reveals interesting analogies to Kingman’s coalescent (Kingman 1982) with respect to our ability to map the dynamics onto an effective model of constant population size. Sjödin et al. (2005) showed that genealogies at neutral loci can be described by a linear rescaling of Kingman’s coalescent with a corresponding coalescent effective population size, as long as demographic processes and coalescence events operate on very different timescales. Specifically, when population-size fluctuations occur much faster than coalescence, the coalescent effective population size is given by the harmonic mean of the variance effective population size, N_e(t), estimated over the timescale of coalescence. In the opposite limit, where population-size fluctuations occur much more slowly than neutral coalescence, the variance effective population size is approximately constant over the relevant time interval and directly corresponds to the instantaneous coalescent effective population size.

Analogously, in our analytical framework for determining the likelihood of soft sweeps, we can again map demography onto an effective model with constant effective population size in the two limits where population-size fluctuations are either very fast or very slow. The relevant timescale for comparison here is the duration of the selective sweep, τ_fix ∼ log(Ns)/s, which scales inversely with the selection coefficient of the sweep (up to a logarithmic factor). Hence, the fast fluctuation limit corresponds to a weak selection limit, and the slow fluctuation limit corresponds to a strong selection limit. In the strong selection/slow fluctuation limit, the relevant effective population size is the instantaneous effective population size at the start of the sweep; in the weak selection/fast fluctuation limit, it is the harmonic mean of the variance effective population size estimated over the duration of the sweep.

One important consequence of this finding is that, even in the same demographic scenario, the probability of observing soft sweeps can differ substantially for weakly and strongly selected alleles. This is because the harmonic mean that determines the effective population size in the weak selection/fast fluctuation limit is dominated by the phases where population size is small. For a weakly selected allele in a population that fluctuates much faster than the duration of the sweep, this harmonic mean will be close to the minimum size encountered during the sweep, resulting in a low effective population size and, correspondingly, a low probability of observing a soft sweep. A strongly selected allele, on the other hand, can arise and sweep to fixation between collapses of the population. The effective population size remains large in this case, increasing the probability of observing a soft sweep. Hence, the stronger the selective sweep, the higher the chance that it will be soft in a population that fluctuates in size.

Similar behavior is observed for the fixation probabilities of adaptive alleles in fluctuating populations. In particular, Otto and Whitlock (1997) showed that the fixation process of an adaptive allele depends on the timescale of the fixation itself. Only short-term demographic changes encountered during the fixation event matter for strongly selected alleles, whereas slower changes affect only weakly selected alleles. Otto and Whitlock (1997) therefore concluded that “there is no single effective population size that can be used to determine the probability of fixation for all new beneficial mutations in a population of changing size” (p. 728).

Hard vs. soft selective sweeps in natural populations

How relevant is our finding that the likelihood of observing soft sweeps can strongly depend on the strength of selection for understanding adaptation in realistic populations? We know that both necessary ingredients for this effect (strong temporal fluctuations in population size and differences in the fitness effects of de novo adaptive mutations) are common in nature.

Population-size fluctuations over several orders of magnitude are observed in various animal species, ranging from parasitic worms to insects and even small mammals (Berryman 2002). Unicellular organisms often undergo even more dramatic changes in population size. For instance, during malaria infection only 10 to 100 sporozoites are typically ejected by a feeding mosquito, and the number of sporozoites that successfully enter the human bloodstream is even smaller; yet this population grows to many billions of parasites within an infected individual (Rosenberg et al. 1990). Similarly, acute HIV infection was found to result from a single virus in the majority of cases (Keele et al. 2008). Severe population bottlenecks resulting from serial dilution are also commonly encountered in evolution experiments with bacteria and yeast (Wahl et al. 2002). Even our own species has likely experienced population-size changes over more than three orders of magnitude within the past 1000 generations (Gazave et al. 2014).

It is also well established that fitness effects of de novo adaptive mutations can vary over many orders of magnitude within the same species. For example, codon bias is typically associated with only weak selective advantages, whereas the fitness advantage during the evolution of drug resistance in pathogens or pesticide resistance in insects can be on the order of 10% or larger.

Taken together, we predict that the likelihood of hardening should depend strongly on the strength of selection in natural populations that experience a demographic phase where adaptation is not mutation limited. The likelihood of observing soft sweeps will depend on the types of natural population fluctuations that occur and whether they can be characterized by the weak selection/fast fluctuation limit or the strong selection/slow fluctuation limit.

To demonstrate this possibility, consider the cycling population illustrated in Figure 5A, which is based on data from the extreme fluctuations observed in multiple species of moths, including the tea tortrix, Adoxophyes honmai, and the larch budmoth, Zeiraphera diniana. These diploid moth species have been observed to undergo changes in population size spanning many orders of magnitude over short periods of just four to five generations (Baltensweiler and Fischlin 1988; Nelson et al. 2013). Let us further assume that these changes result in a change in the adaptive population-scale mutation rate between Θ_min = 10⁻³ and Θ_max = 1. In this case, adaptation is not mutation limited during population maxima and is mutation limited during population minima. Consequently, hardening of soft selective sweeps could be common.

Figure 5B shows the likelihood of soft sweeps in this scenario according to Equation 7, as a function of the strength of selection and the starting time of the sweep. The probability of observing soft sweeps generally remains low in this scenario, except for cases of extremely strong selection. We can understand this result from the fact that the timescale of population-size fluctuations is so fast that all but the most strongly selected alleles still fall within the weak selection limit, described by the harmonic mean effective population size.

This result has important consequences for the study of other populations that fluctuate over similarly short timescales, such as the fruit fly Drosophila melanogaster. Natural populations of D. melanogaster undergo ∼10–20 generations over a seasonal cycle, often experiencing enormous population sizes during the summer that collapse again each winter (Ives 1970). Our result then suggests that only the most strongly selected alleles, which can arise and sweep over a single season, may actually fall within the strong selection limit. All other sweeps should still be governed by the harmonic mean of the population size averaged over a yearly cycle, which will be dominated by the small winter population sizes. Note that this also could mean that some of the strongest adaptations would be missed by genome scans unless they incorporate recent methodologies that are capable of detecting signatures associated with soft selective sweeps (Garud et al. 2013; Ferrer-Admetlla et al. 2014).

Let us consider another example, motivated by the proposed recent demographic history of the European human population (Coventry et al. 2010; Nelson et al. 2012; Tennessen et al. 2012; Gazave et al. 2014). Specifically, we consider a population that was small throughout most of its history and has recently experienced a dramatic expansion. We assume demographic parameters similar to those estimated by Gazave et al. (2014): an ancestral population size of N_anc = 10⁴, followed by exponential growth over a period of 113 generations, reaching a current size of N_cur ≈ 520,000 individuals (Figure 5C). We further assume that exponential growth halts at present and that the population size remains constant thereafter. Note that this scenario is qualitatively different from the previously discussed models in that the population-size changes are nonrecurring. As a result, the weak selection/fast fluctuation limit does not exist in this case. The starting time of a sweep is therefore crucial for determining whether it is likely to be hard or soft in this model.
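
The growth scenario can be written down explicitly. The sketch below assumes constant size N_anc before growth, pure exponential growth for 113 generations, and constant size N_cur afterward, as described above. Time t is measured in generations since the onset of growth, and the function name `population_size` is our own.

```python
import math

N_ANC = 1e4          # ancestral population size
N_CUR = 520_000      # population size when growth halts
T_GROWTH = 113       # duration of exponential growth (generations)

# Per-generation exponential growth rate implied by the three parameters above.
GROWTH_RATE = math.log(N_CUR / N_ANC) / T_GROWTH


def population_size(t):
    """Population size t generations after growth began (constant before and after)."""
    if t <= 0:
        return N_ANC
    if t >= T_GROWTH:
        return N_CUR
    return N_ANC * math.exp(GROWTH_RATE * t)


print(f"growth rate r = {GROWTH_RATE:.4f} per generation")
print(f"doubling time = {math.log(2) / GROWTH_RATE:.1f} generations")
```

The implied growth rate is r = ln(52)/113 ≈ 0.035 per generation, so the population doubles roughly every 20 generations during the expansion.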

We assume an adaptive mutation rate of U_A = 5 × 10⁻⁷ in this example to illustrate the transition from mutation-limited behavior in the ancestral population, where Θ_anc = 4N_ancU_A ≈ 0.02, to non-mutation-limited behavior in the current population, where Θ_cur = 4N_curU_A ≈ 1.0. This adaptive mutation rate is higher than the single-nucleotide mutation rate in humans. It may nevertheless be appropriate for adaptations with a larger mutational target size, such as loss-of-function mutations or changes in the expression level of a gene. Moreover, if the current effective population size of the European human population were in fact N_cur ≈ 2 × 10⁷ (still over an order of magnitude smaller than its census size), we would already be in the non-mutation-limited regime for U_A ≈ 10⁻⁸, the current estimate of the single-nucleotide mutation rate in humans (Kong et al. 2012).
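
The Θ values quoted above follow directly from Θ = 4NU_A for a diploid population. The short sketch below reproduces them. The helper name `theta` is ours, and the inputs are exactly the parameters given in the text.

```python
def theta(n, u_a):
    """Population-scale adaptive mutation rate of a diploid population, Theta = 4 N U_A."""
    return 4 * n * u_a


theta_anc = theta(1e4, 5e-7)       # ancestral population: mutation limited
theta_cur = theta(520_000, 5e-7)   # current population: of order one
theta_alt = theta(2e7, 1e-8)       # larger N_cur with the single-nucleotide rate

scenarios = {
    "ancestral (N = 1e4, U_A = 5e-7)": theta_anc,
    "current (N = 520,000, U_A = 5e-7)": theta_cur,
    "alternative (N = 2e7, U_A = 1e-8)": theta_alt,
}
for label, value in scenarios.items():
    print(f"{label}: Theta = {value:.2f}")
```

This gives Θ_anc = 0.02, Θ_cur = 1.04 (rounded to 1.0 in the text), and Θ = 0.8 for the alternative scenario. The last two values are of order one, the range in which multiple adaptive origins become likely.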

Figure 5D shows the probability of soft sweeps in this scenario predicted by our approach, as a function of the strength of selection and the starting time of the sweep. The results confirm our intuition. In a sample of size 10, almost all sweeps that start prior to the expansion are hard, as expected for adaptation by de novo mutation in a mutation-limited scenario. By contrast, sweeps starting in the current, non-mutation-limited regime are almost entirely soft, regardless of the strength of selection. Sweeps starting during the expansion phase show an interesting crossover between hard and soft sweeps, and here the strength of selection matters. Specifically, sweeps that start during the expansion are more likely to be soft when driven by weaker selection than when driven by stronger selection. This is because stronger sweeps go to fixation faster than weaker ones. Hence, in a growing population, a weaker sweep experiences larger population sizes during its course than a stronger sweep starting at the same time, which increases its probability of becoming soft.
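
The argument that weaker sweeps experience larger population sizes can be illustrated numerically. The sketch below combines the growth model described above with a crude deterministic sweep duration, T ≈ 2 ln(2N_0)/s, evaluated at the population size N_0 when the sweep starts. It then reports the harmonic mean of N over the generations the sweep spans. Both the duration formula and the use of the harmonic mean as a summary are simplifying assumptions for illustration only, not Equation 7. The start time t_0 = 50 generations into the expansion and all function names are hypothetical.

```python
import math
from statistics import harmonic_mean

N_ANC, N_CUR, T_GROWTH = 1e4, 520_000, 113
GROWTH_RATE = math.log(N_CUR / N_ANC) / T_GROWTH


def population_size(t):
    """Constant N_ANC before growth, exponential growth for T_GROWTH generations, then N_CUR."""
    if t <= 0:
        return N_ANC
    if t >= T_GROWTH:
        return N_CUR
    return N_ANC * math.exp(GROWTH_RATE * t)


def sweep_duration(s, n0):
    """Crude deterministic sweep time 2 ln(2 N_0) / s (assumption: ignores drift and growth)."""
    return 2.0 * math.log(2.0 * n0) / s


def harmonic_mean_size_during_sweep(s, t0):
    """Harmonic mean of N over the generations spanned by a sweep starting at generation t0."""
    duration = max(1, round(sweep_duration(s, population_size(t0))))
    return harmonic_mean([population_size(t0 + k) for k in range(duration)])


for s in (0.05, 0.5):   # weaker versus stronger selection, both starting mid-expansion
    print(f"s = {s}: harmonic mean N during sweep ~ {harmonic_mean_size_during_sweep(s, t0=50):,.0f}")
```

Under these assumptions, the weaker sweep (s = 0.05) lasts several hundred generations and spends most of them at N_cur, giving a harmonic-mean size of roughly 380,000. The stronger sweep (s = 0.5) is over within about 50 generations and averages roughly 115,000.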

When extending the intuition from our single-locus model to whole genomes, we must bear in mind that the effective Θ determining the probability of soft sweeps differs among loci, because mutational target sizes, and thus adaptive mutation rates, vary across the genome. For example, adaptive loss-of-function mutations likely have a much higher U_A than adaptive single-nucleotide mutations. Therefore, no single value of Θ will describe the entire adaptive dynamics of a population. Adaptation across the genome can be simultaneously mutation limited and non-mutation limited in the same population, depending on population-size fluctuations, mutation rate, target size, and the strength of selection. Furthermore, we should be very cautious about assuming that estimators of Θ based on genetic diversity tell us whether recent adaptation produced hard or soft sweeps. Estimators based on levels of neutral diversity in a population, such as Θ_π and Watterson’s Θ_W (Ewens 2004), can be strongly biased downward by ancient bottlenecks and recurrent linked selection.
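
For reference, the two diversity-based estimators named above can be computed from a sample of aligned haplotypes as follows. This is a minimal sketch of the standard textbook forms (see, e.g., Ewens 2004). It assumes that haplotypes are equal-length strings with one character per site, returns per-region rather than per-site values, and uses function names of our own. Both quantities summarize accumulated neutral diversity, so they reflect long-term demographic history rather than necessarily the Θ that governs a particular recent adaptive event.

```python
from itertools import combinations


def segregating_sites(haplotypes):
    """Number of alignment columns at which the sample is polymorphic."""
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)


def watterson_theta(haplotypes):
    """Watterson's estimator: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(haplotypes)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(haplotypes) / a_n


def theta_pi(haplotypes):
    """Mean number of pairwise differences between sampled haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    differences = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return differences / len(pairs)


sample = ["0011", "0101", "0110", "0000"]   # hypothetical 4-haplotype, 4-site sample
print(watterson_theta(sample), theta_pi(sample))
```

For this sample S = 3 and a_4 = 11/6, so Θ_W = 18/11 ≈ 1.64. Every pair of haplotypes differs at two sites, so Θ_π = 2.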

Finally, the overall prevalence of soft sweeps should depend on when adaptation and directional selection are common. If adaptation is limited by mutational input, then most adaptive mutations should arise during population booms, biasing us toward observing more soft sweeps. On the other hand, it is also possible, and perhaps even more likely, that adaptation will be common during periods of population decline. This would be the case, for example, when the decline is caused by a strong selective agent such as a new pathogen, competitor, or predator, or by a food shortage. If adaptation is more common during population busts, we should observe more hard sweeps.

These considerations highlight one of the key limitations of the current analysis: we have considered only scenarios in which population size and selection coefficients are independent of each other. We believe that future models that treat population size and fitness in a unified framework will be necessary to fully understand the signatures that adaptation leaves in populations of variable size.

Acknowledgments

We thank Pleuni Pennings, Jamie Blundell, and Hildegard Uecker for useful discussions leading to the formulation of our primary results. We thank Nandita Garud, Joachim Hermisson, Marc Feldman, Daniel Fisher, and members of the Petrov lab for comments and suggestions made prior to and during the formulation of this manuscript. B.A.W. is supported by the National Science Foundation Graduate Research Fellowship and National Institutes of Health (NIH)/National Human Genome Research Institute (NHGRI) T32 HG000044. This work was supported by the NIH under grants GM089926 and HG002568 to D.A.P.

Footnotes

Available freely online through the author-supported open access option.

Communicating editor: N. H. Barton

Literature Cited

Aminetzach Y. T., Macpherson J. M., Petrov D. A., 2005. Pesticide resistance via transposition-mediated adaptive gene truncation in Drosophila. Science 309: 764–767.
Artenstein M. S., Miller W. S., 1966. Air sampling for respiratory disease agents in army recruits. Bacteriol. Rev. 30: 571.
Baltensweiler W., Fischlin A., 1988. The larch budmoth in the Alps, pp. 331–351 in Dynamics of Forest Insect Populations, edited by A. A. Berryman. Springer, New York.
Berryman A., 2002. Population Cycles: The Case for Trophic Interactions. Oxford University Press, Oxford.
Bersaglieri T., Sabeti P. C., Patterson N., Vanderploeg T., Schaffner S. F., et al., 2004. Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 74: 1111–1120.
Burke M. K., 2012. How does adaptation sweep through the genome? Insights from long-term selection experiments. Proc. Biol. Sci. 279: 5029–5038.
Catania F., Kauer M., Daborn P., Yen J., Ffrench-Constant R., et al., 2004. World-wide survey of an Accord insertion and its association with DDT resistance in Drosophila melanogaster. Mol. Ecol. 13: 2491–2504.
Chung H., Bogwitz M. R., McCart C., Andrianopoulos A., Ffrench-Constant R. H., et al., 2007. Cis-regulatory elements in the Accord retrotransposon result in tissue-specific expression of the Drosophila melanogaster insecticide resistance gene Cyp6g1. Genetics 175: 1071–1077.
Coventry A., Bull-Otterson L. M., Liu X., Clark A. G., Maxwell T. J., et al., 2010. Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nat. Commun. 1: 131.
Desai M. M., Fisher D. S., 2007. Beneficial mutation selection balance and the effect of linkage on positive selection. Genetics 176: 1759–1798.
Domingues V. S., Poh Y. P., Peterson B. K., Pennings P. S., Jensen J. D., et al., 2012. Evidence of adaptation from ancestral variation in young populations of beach mice. Evolution 66: 3209–3223.
Durrett R., 2008. Probability Models for DNA Sequence Evolution. Springer, New York.
Enattah N. S., Jensen T. G., Nielsen M., Lewinski R., Kuokkanen M., et al., 2008. Independent introduction of two lactase-persistence alleles into human populations reflects different history of adaptation to milk culture. Am. J. Hum. Genet. 82: 57–72.
Engen S., Lande R., Sæther B.-E., 2009. Fixation probability of beneficial mutations in a fluctuating population. Genet. Res. 91: 73–82.
Eriksson A., Fernstrom P., Mehlig B., Sagitov S., 2008. An accurate model for genetic hitchhiking. Genetics 178: 439–451.
Ewens W. J., 1967. The probability of survival of a new mutant in a fluctuating environment. Heredity 22: 438–443.
Ewens W. J., 2004. Mathematical Population Genetics, Ed. 2. Springer, New York.
Ferrer-Admetlla A., Liang M., Korneliussen T., Nielsen R., 2014. On detecting incomplete soft or hard selective sweeps using haplotype structure. Mol. Biol. Evol. 31: 1275–1291.
Fischer W., Ganusov V. V., Giorgi E. E., Hraber P. T., Keele B. F., et al., 2010. Transmission of single HIV-1 genomes and dynamics of early immune escape revealed by ultra-deep sequencing. PLoS ONE 5: e12303.
Galassi M., Davies J., Theiler J., Gough B., Jungman G., et al., 2009. GNU Scientific Library: Reference Manual, Ed. 3. Network Theory, Bristol, UK.
Garud N. R., Messer P. W., Buzbas E. O., Petrov D. A., 2013. Soft selective sweeps are the primary mode of recent adaptation in Drosophila melanogaster. arXiv: 1303.0906.
Gazave E., Ma L., Chang D., Coventry A., Gao F., et al., 2014. Neutral genomic regions refine models of recent rapid human population growth. Proc. Natl. Acad. Sci. USA 111: 757–762.
Gerone P. J., Couch R. B., Keefer G. V., Douglas R., Derrenbacher E. B., et al., 1966. Assessment of experimental and natural viral aerosols. Bacteriol. Rev. 30: 576.
Haldane J. B. S., 1927. A mathematical theory of natural and artificial selection. V. Selection and mutation. Math. Proc. Camb. Philos. Soc. 23: 838–844.
Hermisson J., Pennings P. S., 2005. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169: 2335–2352.
Hoekstra H. E., Hirschmann R. J., Bundey R. A., Insel P. A., Crossland J. P., 2006. A single amino acid mutation contributes to adaptive beach mouse color pattern. Science 313: 101–104.
Hudson R. R., Bailey K., Skarecky D., Kwiatowski J., Ayala F. J., 1994. Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster. Genetics 136: 1329–1340.
Innan H., Kim Y., 2004. Pattern of polymorphism after strong artificial selection in a domestication event. Proc. Natl. Acad. Sci. USA 101: 10667–10672.
Ives P. T., 1970. Further genetic studies of the South Amherst population of Drosophila melanogaster. Evolution 24: 507–518.
Jones B. L., Raga T. O., Liebert A., Zmarz P., Bekele E., et al., 2013. Diversity of lactase persistence alleles in Ethiopia: signature of a soft selective sweep. Am. J. Hum. Genet. 93: 538–544.
Kaplan N. L., Hudson R. R., Langley C. H., 1989. The “hitchhiking effect” revisited. Genetics 123: 887–899.
Karasov T., Messer P. W., Petrov D. A., 2010. Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genet. 6: e1000924.
Keele B. F., Giorgi E. E., Salazar-Gonzalez J. F., Decker J. M., Pham K. T., et al., 2008. Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proc. Natl. Acad. Sci. USA 105: 7552–7557.
Kim Y., Stephan W., 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160: 765–777.
Kimura M., 1962. On the probability of fixation of mutant genes in a population. Genetics 47: 713–719.
Kingman J., 1982. The coalescent. Stochastic Process. Appl. 13: 235–248.
Kong A., Frigge M. L., Masson G., Besenbacher S., Sulem P., et al., 2012. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488: 471–475.
Krebs C. J., Myers J. H., 1974. Population cycles in small mammals. Adv. Ecol. Res. 8: 267–399.
Louicharoen C., Patin E., Paul R., Nuchprayoon I., Witoonpanich B., et al., 2009. Positively selected G6PD-Mahidol mutation reduces Plasmodium vivax density in Southeast Asians. Science 326: 1546–1549.
Maynard Smith J., Haigh J., 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23–35.
Messer P. W., Neher R. A., 2012. Estimating the strength of selective sweeps from deep population diversity data. Genetics 191: 593–605.
Messer P. W., Petrov D. A., 2013. Population genomics of rapid adaptation by soft selective sweeps. Trends Ecol. Evol. 28: 659–669.
Myers J. H., 1998. Synchrony in outbreaks of forest Lepidoptera: a possible example of the Moran effect. Ecology 79: 1111–1117.
Nair S., Nash D., Sudimack D., Jaidee A., Barends M., et al., 2007. Recurrent gene amplification and soft selective sweeps during evolution of multidrug resistance in malaria parasites. Mol. Biol. Evol. 24: 562–573.
Nelson M. R., Wegmann D., Ehm M. G., Kessner D., Jean P. S., et al., 2012. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337: 100–104.
Nelson W. A., Bjørnstad O. N., Yamanaka T., 2013. Recurrent insect outbreaks caused by temperature-driven changes in system stability. Science 341: 796–799.
Otto S. P., Whitlock M. C., 1997. The probability of fixation in populations of changing size. Genetics 146: 723–733.
Parsons T. L., Quince C., Plotkin J. B., 2010. Some consequences of demographic stochasticity in population genetics. Genetics 185: 1345–1354.
Patwa Z., Wahl L., 2008. The fixation probability of beneficial mutations. J. R. Soc. Interface 5: 1279–1289.
Pelz H. J., Rost S., Hunerberg M., Fregin A., Heiberg A. C., et al., 2005. The genetic basis of resistance to anticoagulants in rodents. Genetics 170: 1839–1847.
Pennings P. S., Hermisson J., 2006a. Soft sweeps II: molecular population genetics of adaptation from recurrent mutation or migration. Mol. Biol. Evol. 23: 1076–1084.
Pennings P. S., Hermisson J., 2006b. Soft sweeps III: the signature of positive selection from recurrent mutation. PLoS Genet. 2: e186.
Pennings P. S., Kryazhimskiy S., Wakeley J., 2014. Loss and recovery of genetic diversity in adapting populations of HIV. PLoS Genet. 10: e1004000.
Peter B. M., Huerta-Sanchez E., Nielsen R., 2012. Distinguishing between selective sweeps from standing variation and from a de novo mutation. PLoS Genet. 8: e1003011.
Pollak E., 2000. Fixation probabilities when the population size undergoes cyclic fluctuations. Theor. Popul. Biol. 57: 51–58.
Przeworski M., Coop G., Wall J. D., 2005. The signature of positive selection on standing genetic variation. Evolution 59: 2312–2323.
Rosenberg R., Wirtz R. A., Schneider I., Burge R., 1990. An estimation of the number of malaria sporozoites ejected by a feeding mosquito. Trans. R. Soc. Trop. Med. Hyg. 84: 209–212.
Sabeti P. C., Reich D. E., Higgins J. M., Levine H. Z., Richter D. J., et al., 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832–837.
Schmidt J. M., Good R. T., Appleton B., Sherrard J., Raymant G. C., et al., 2010. Copy number variation and transposable elements feature in recent, ongoing adaptation at the Cyp6g1 locus. PLoS Genet. 6: e1000998.
Sjödin P., Kaj I., Krone S., Lascoux M., Nordborg M., 2005. On the meaning and existence of an effective population size. Genetics 169: 1061–1070.
Tennessen J. A., Bigham A. W., O’Connor T. D., Fu W., Kenny E. E., et al., 2012. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337: 64–69.
Tishkoff S. A., Reed F. A., Ranciaro A., Voight B. F., Babbitt C. C., et al., 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat. Genet. 39: 31–40.
Uecker H., Hermisson J., 2011. On the fixation process of a beneficial mutation in a variable environment. Genetics 188: 915–930.
Voight B. F., Kudaravalli S., Wen X., Pritchard J. K., 2006. A map of recent positive selection in the human genome. PLoS Biol. 4: e72.
Wahl L. M., Gerrish P. J., Saika-Voivod I., 2002. Evaluating the impact of population bottlenecks in experimental evolution. Genetics 162: 961–971.
Wang G. P., Sherrill-Mix S. A., Chang K.-M., Quince C., Bushman F. D., 2010. Hepatitis C virus transmission bottlenecks analyzed by deep sequencing. J. Virol. 84: 6218–6228.
Waxman D., 2011. A unified treatment of the probability of fixation when population size and the strength of selection change over time. Genetics 188: 907–913.
Wolfs T. F., Zwart G., Bakker M., Goudsmit J., 1992. HIV-1 genomic RNA diversification following sexual and parenteral virus transmission. Virology 189: 103–110.
Wright S., Dobzhansky T., Hovanitz W., 1942. Genetics of natural populations. VII. The allelism of lethals in the third chromosome of Drosophila pseudoobscura. Genetics 27: 363.

[bib1] Aminetzach Y. T., Macpherson J. M., Petrov D. A., 2005. Pesticide resistance via transposition-mediated adaptive gene truncation in Drosophila. Science 309: 764–767 [DOI] [PubMed] [Google Scholar]

[bib2] Artenstein M. S., Miller W. S., 1966. Air sampling for respiratory disease agents in army recruits. Bacteriol. Rev. 30: 571. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib3] Baltensweiler, W., and A. Fischlin, 1988 The larch budmoth in the Alps, pp. 331–351 in Dynamics of Forest Insect Populations, edited by A. A. Berryman. Springer, New York. [Google Scholar]

[bib4] Berryman A., 2002. Population Cycles: The Case for Trophic Interactions. Oxford University Press, Oxford [Google Scholar]

[bib5] Bersaglieri T., Sabeti P. C., Patterson N., Vanderploeg T., Schaffner S. F., et al. , 2004. Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 74: 1111–1120 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib6] Burke M. K., 2012. How does adaptation sweep through the genome?: insights from long-term selection experiments. Proc. Biol. Sci. 279: 5029–5038 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib7] Catania F., Kauer M., Daborn P., Yen J., Ffrench-Constant R., et al. , 2004. World-wide survey of an Accord insertion and its association with DDT resistance in Drosophila melanogaster. Mol. Ecol. 13: 2491–2504 [DOI] [PubMed] [Google Scholar]

[bib8] Chung H., Bogwitz M. R., McCart C., Andrianopoulos A., Ffrench-Constant R. H., et al. , 2007. Cis-regulatory elements in the Accord retrotransposon result in tissue-specific expression of the Drosophila melanogaster insecticide resistance gene Cyp6g1. Genetics 175: 1071–1077 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] Coventry A., Bull-Otterson L. M., Liu X., Clark A. G., Maxwell T. J., et al. , 2010. Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nat. Commun. 1: 131. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] Desai M. M., Fisher D. S., 2007. Beneficial mutation selection balance and the effect of linkage on positive selection. Genetics 176: 1759–1798 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib11] Domingues V. S., Poh Y. P., Peterson B. K., Pennings P. S., Jensen J. D., et al. , 2012. Evidence of adaptation from ancestral variation in young populations of beach mice. Evolution 66: 3209–3223 [DOI] [PubMed] [Google Scholar]

[bib12] Durrett R., 2008. Probability Models for DNA Sequence Evolution. Springer, New York [Google Scholar]

[bib13] Enattah N. S., Jensen T. G., Nielsen M., Lewinski R., Kuokkanen M., et al. , 2008. Independent introduction of two lactase-persistence alleles into human populations reflects different history of adaptation to milk culture. Am. J. Hum. Genet. 82: 57–72 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib14] Engen S., Lande R., Sæther B.-E., 2009. Fixation probability of beneficial mutations in a fluctuating population. Genet. Res. 91: 73–82 [DOI] [PubMed] [Google Scholar]

[bib15] Eriksson A., Fernstrom P., Mehlig B., Sagitov S., 2008. An accurate model for genetic hitchhiking. Genetics 178: 439–451 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib16] Ewens W. J., 1967. The probability of survival of a new mutant in a fluctuating environment. Heredity 22: 438–443 [DOI] [PubMed] [Google Scholar]

[bib17] Ewens W. J., 2004. Mathematical Population Genetics, Ed. 2 Springer, New York [Google Scholar]

[bib18] Ferrer-Admetlla A., Liang M., Korneliussen T., Nielsen R., 2014. On detecting incomplete soft or hard selective sweeps using haplotype structure. Mol. Biol. Evol. 31: 1275–1291 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib19] Fischer W., Ganusov V. V., Giorgi E. E., Hraber P. T., Keele B. F., et al. , 2010. Transmission of single HIV-1 genomes and dynamics of early immune escape revealed by ultra-deep sequencing. PLoS ONE 5: e12303. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib20] Galassi M., Davies J., Theiler J., Gough B., Jungman G., et al. , 2009. GNU Scientific Library: Reference Manual, Ed. 3 Network Theory, Bristol, UK [Google Scholar]

[bib21] Garud, N. R., P. W. Messer, E. O. Buzbas, and D. A. Petrov, 2013 Soft selective sweeps are the primary mode of recent adaptation in Drosophila melanogaster ArXiv: 1303.0906.

[bib22] Gazave E., Ma L., Chang D., Coventry A., Gao F., et al. , 2014. Neutral genomic regions refine models of recent rapid human population growth. Proc. Natl. Acad. Sci. USA 111: 757–762 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib23] Gerone P. J., Couch R. B., Keefer G. V., Douglas R., Derrenbacher E. B., et al. , 1966. Assessment of experimental and natural viral aerosols. Bacteriol. Rev. 30: 576. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib24] Haldane J. B. S., 1927. A mathematical theory of natural and artificial selection. V. selection and mutation. Math. Proc. Camb. Philos. Soc. 23: 838–844 [Google Scholar]

[bib25] Hermisson J., Pennings P. S., 2005. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics 169: 2335–2352 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib26] Hoekstra H. E., Hirschmann R. J., Bundey R. A., Insel P. A., Crossland J. P., 2006. A single amino acid mutation contributes to adaptive beach mouse color pattern. Science 313: 101–104 [DOI] [PubMed] [Google Scholar]

[bib27] Hudson R. R., Bailey K., Skarecky D., Kwiatowski J., Ayala F. J., 1994. Evidence for positive selection in the superoxide dismutase (Sod) region of Drosophila melanogaster. Genetics 136: 1329–1340 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib28] Innan H., Kim Y., 2004. Pattern of polymorphism after strong artificial selection in a domestication event. Proc. Natl. Acad. Sci. USA 101: 10667–10672 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib29] Ives P. T., 1970. Further genetic studies of the South Amherst population of Drosophila melanogaster. Evolution 24: 507–518 [DOI] [PubMed] [Google Scholar]

[bib30] Jones B. L., Raga T. O., Liebert A., Zmarz P., Bekele E., et al. , 2013. Diversity of lactase persistence alleles in Ethiopia: signature of a soft selective sweep. Am. J. Hum. Genet. 93: 538–544 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib31] Kaplan N. L., Hudson R. R., Langley C. H., 1989. The “hitchhiking effect” revisited. Genetics 123: 887–899 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib32] Karasov T., Messer P. W., Petrov D. A., 2010. Evidence that adaptation in Drosophila is not limited by mutation at single sites. PLoS Genet. 6: e1000924. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib33] Keele B. F., Giorgi E. E., Salazar-Gonzalez J. F., Decker J. M., Pham K. T., et al. , 2008. Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proc. Natl. Acad. Sci. USA 105: 7552–7557 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib34] Kim Y., Stephan W., 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160: 765–777 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib35] Kimura M., 1962. On the probability of fixation of mutant genes in a population. Genetics 47: 713–719 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib36] Kingman J., 1982. The coalescent. Stochastic Process. Appl. 13: 235–248 [Google Scholar]

[bib37] Kong A., Frigge M. L., Masson G., Besenbacher S., Sulem P., et al. , 2012. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488: 471–475 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib38] Krebs C. J., Myers J. H., 1974. Population cycles in small mammals. Adv. Ecol. Res 8: 267–399 [Google Scholar]

[bib39] Louicharoen C., Patin E., Paul R., Nuchprayoon I., Witoonpanich B., et al. , 2009. Positively selected G6PD-Mahidol mutation reduces Plasmodium vivax density in Southeast Asians. Science 326: 1546–1549 [DOI] [PubMed] [Google Scholar]

[bib40] Maynard Smith J., Haigh J., 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23: 23–35 [PubMed] [Google Scholar]

[bib41] Messer P. W., Neher R. A., 2012. Estimating the strength of selective sweeps from deep population diversity data. Genetics 191: 593–605 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib42] Messer P. W., Petrov D. A., 2013. Population genomics of rapid adaptation by soft selective sweeps. Trends Ecol. Evol. 28: 659–669 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib43] Myers J. H., 1998. Synchrony in outbreaks of forest Lepidoptera: a possible example of the Moran effect. Ecology 79: 1111–1117 [Google Scholar]

[bib44] Nair S., Nash D., Sudimack D., Jaidee A., Barends M., et al. , 2007. Recurrent gene amplification and soft selective sweeps during evolution of multidrug resistance in malaria parasites. Mol. Biol. Evol. 24: 562–573 [DOI] [PubMed] [Google Scholar]

[bib45] Nelson M. R., Wegmann D., Ehm M. G., Kessner D., Jean P. S., et al. , 2012. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337: 100–104 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib46] Nelson W. A., Bjrnstad O. N., Yamanaka T., 2013. Recurrent insect outbreaks caused by temperature-driven changes in system stability. Science 341: 796–799 [DOI] [PubMed] [Google Scholar]

[bib47] Otto S. P., Whitlock M. C., 1997. The probability of fixation in populations of changing size. Genetics 146: 723–733 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib48] Parsons T. L., Quince C., Plotkin J. B., 2010. Some consequences of demographic stochasticity in population genetics. Genetics 185: 1345–1354 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib49] Patwa Z., Wahl L., 2008. The fixation probability of beneficial mutations. J. R. Soc. Interface 5: 1279–1289 [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib50] Pelz H. J., Rost S., Hunerberg M., Fregin A., Heiberg A. C., et al., 2005. The genetic basis of resistance to anticoagulants in rodents. Genetics 170: 1839–1847.

[bib51] Pennings P. S., Hermisson J., 2006a. Soft sweeps II: molecular population genetics of adaptation from recurrent mutation or migration. Mol. Biol. Evol. 23: 1076–1084.

[bib52] Pennings P. S., Hermisson J., 2006b. Soft sweeps III: the signature of positive selection from recurrent mutation. PLoS Genet. 2: e186.

[bib53] Pennings P. S., Kryazhimskiy S., Wakeley J., 2014. Loss and recovery of genetic diversity in adapting populations of HIV. PLoS Genet. 10: e1004000.

[bib54] Peter B. M., Huerta-Sanchez E., Nielsen R., 2012. Distinguishing between selective sweeps from standing variation and from a de novo mutation. PLoS Genet. 8: e1003011.

[bib55] Pollak E., 2000. Fixation probabilities when the population size undergoes cyclic fluctuations. Theor. Popul. Biol. 57: 51–58.

[bib56] Przeworski M., Coop G., Wall J. D., 2005. The signature of positive selection on standing genetic variation. Evolution 59: 2312–2323.

[bib57] Rosenberg R., Wirtz R. A., Schneider I., Burge R., 1990. An estimation of the number of malaria sporozoites ejected by a feeding mosquito. Trans. R. Soc. Trop. Med. Hyg. 84: 209–212.

[bib58] Sabeti P. C., Reich D. E., Higgins J. M., Levine H. Z., Richter D. J., et al., 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419: 832–837.

[bib59] Schmidt J. M., Good R. T., Appleton B., Sherrard J., Raymant G. C., et al., 2010. Copy number variation and transposable elements feature in recent, ongoing adaptation at the Cyp6g1 locus. PLoS Genet. 6: e1000998.

[bib60] Sjödin P., Kaj I., Krone S., Lascoux M., Nordborg M., 2005. On the meaning and existence of an effective population size. Genetics 169: 1061–1070.

[bib61] Tennessen J. A., Bigham A. W., O'Connor T. D., Fu W., Kenny E. E., et al., 2012. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337: 64–69.

[bib62] Tishkoff S. A., Reed F. A., Ranciaro A., Voight B. F., Babbitt C. C., et al., 2007. Convergent adaptation of human lactase persistence in Africa and Europe. Nat. Genet. 39: 31–40.

[bib63] Uecker H., Hermisson J., 2011. On the fixation process of a beneficial mutation in a variable environment. Genetics 188: 915–930.

[bib64] Voight B. F., Kudaravalli S., Wen X., Pritchard J. K., 2006. A map of recent positive selection in the human genome. PLoS Biol. 4: e72.

[bib65] Wahl L. M., Gerrish P. J., Saika-Voivod I., 2002. Evaluating the impact of population bottlenecks in experimental evolution. Genetics 162: 961–971.

[bib66] Wang G. P., Sherrill-Mix S. A., Chang K.-M., Quince C., Bushman F. D., 2010. Hepatitis C virus transmission bottlenecks analyzed by deep sequencing. J. Virol. 84: 6218–6228.

[bib67] Waxman D., 2011. A unified treatment of the probability of fixation when population size and the strength of selection change over time. Genetics 188: 907–913.

[bib68] Wolfs T. F., Zwart G., Bakker M., Goudsmit J., 1992. HIV-1 genomic RNA diversification following sexual and parenteral virus transmission. Virology 189: 103–110.

[bib69] Wright S., Dobzhansky T., Hovanitz W., 1942. Genetics of natural populations. VII. The allelism of lethals in the third chromosome of Drosophila pseudoobscura. Genetics 27: 363.

