Genetics. 2021 Sep 3;219(4):iyab135. doi: 10.1093/genetics/iyab135

Dynamic sampling bias and overdispersion induced by skewed offspring distributions

Takashi Okada 1,2, Oskar Hallatschek 1
Editor: K Jain
PMCID: PMC8664600  PMID: 34718557

Abstract

Natural populations often show enhanced genetic drift consistent with a strong skew in their offspring number distribution. The skew arises because the variability of family sizes is either inherently strong or amplified by population expansions. The resulting allele-frequency fluctuations are large and, therefore, challenge standard models of population genetics, which assume sufficiently narrow offspring distributions. While the neutral dynamics backward in time can be readily analyzed using coalescent approaches, we still know little about the effect of broad offspring distributions on the forward-in-time dynamics, especially with selection. Here, we employ an asymptotic analysis combined with a scaling hypothesis to demonstrate that over-dispersed frequency trajectories emerge from the competition of conventional forces, such as selection or mutations, with an emerging time-dependent sampling bias against the minor allele. The sampling bias arises from the characteristic time-dependence of the largest sampled family size within each allelic type. Using this insight, we establish simple scaling relations for allele-frequency fluctuations, fixation probabilities, extinction times, and the site frequency spectra that arise when offspring numbers are distributed according to a power law.

Keywords: natural selection, site-frequency spectrum, fixation probability, stationary distribution, traveling waves, Λ-coalescent, jackpot events, multiple mergers


Interpreting the genetic differences between and within populations that we observe today requires a robust understanding of how allele frequencies change over time. Most theoretical and statistical advances have been based on the Wright–Fisher model (Fisher 1930; Wright 1931), which has shaped the intuition of generations of population geneticists for how evolutionary dynamics work (Crow and Kimura 1970). The Wright–Fisher model assumes that the genetic makeup of a generation results from resampling the gene pool of the previous generation, with biases introduced to account for the most relevant evolutionary forces, such as selection, migration, or variable population sizes. For large populations, the resulting dynamics can be approximated by a biased diffusion process, which simplifies the statistical modeling of genetic diversity. More importantly, the Wright–Fisher diffusion is the limiting allele-frequency process of a wide variety of microscopic models, as long as they satisfy seemingly mild assumptions (see below). This flexibility has made the Wright–Fisher diffusion the standard model for inferring the demographic history of a species, loci under selection, or the strength of polygenic selection (Bollback et al. 2008; Berg and Coop 2014; Feder et al. 2014; Foll et al. 2015; Schraiber et al. 2016; Tataru et al. 2017).

Despite its versatility, the Wright–Fisher diffusion can be a poor approximation when the population dynamics are driven by rare but strong number fluctuations. It is increasingly recognized that number fluctuations can be inflated for very different reasons. First, the species considered may have a broad offspring distribution, as occurs for marine species and plants with a Type III survivorship curve (Hedgecock 1994; Eldon and Wakeley 2006) as well as for viruses and fungi [reviewed in Tellier and Lemaire (2014)]. Broad offspring distributions also arise in infectious diseases, when relatively few super-spreaders are responsible for the majority of transmissions (Lloyd-Smith et al. 2005). In the recent SARS-CoV-2 pandemic, for example, strongly skewed offspring distributions were consistently inferred from both contact-tracing data and infection-cluster size distributions (Adam et al. 2020; Laxminarayan et al. 2020). Understanding allele-frequency trajectories in these systems is extremely challenging, as statistical inference based on the Wright–Fisher model is often misleading (see, e.g., Sackman et al. 2019).

A second mechanism for strong number fluctuations is the occurrence of so-called jackpot events, which can happen in any species regardless of the actual offspring distribution. Jackpot events are population bottlenecks that arise when the earliest, the fittest, or the most advanced individuals have an unusually large number of descendants. Temporal jackpot events (“earliest”) were first discovered by Luria and Delbrück (1943) and studied as a signal of spontaneous mutations in an expanding population. They observed that a phage-resistant mutant clone can grow exceptionally large if the resistance mutation happens to occur early in an expansion. Despite being rare, these jackpot events are easily detectable in large populations because they strongly inflate the variance of the number of mutants and lead to power-law descendant distributions.

The very same descendant distribution arises in models of rampant adaptation and background selection. In these models, mutations generate jackpot events when they arise within the few fittest individuals (Neher and Hallatschek 2013). Jackpot events also arise in range expansions, where the most advanced individuals at the front of the population have a good chance of leaving many descendants over the next few generations. This phenomenon of gene surfing can produce a wide range of scale-free descendant distributions (Hallatschek and Nelson 2008; Fusco et al. 2016; Birzu et al. 2018, 2021).

To account for skewed offspring distributions, a number of theoretical studies have been conducted within the coalescent framework. In this backward-in-time picture, a striking feature of broad offspring distributions is the simultaneous merging of multiple lineages. One of the most widely studied models is the beta-coalescent (Schweinsberg 2003a), which is a subclass of the Λ-coalescent and corresponds to population dynamics with a power-law offspring number distribution $\propto u^{-(1+\alpha)}$. The case α = 1, called the Bolthausen–Sznitman coalescent (Bolthausen and Sznitman 1998), has been shown to be the limiting coalescent in models of so-called “pulled” traveling waves, which describe the most basic scenarios of range expansions (Brunet et al. 2007) and of rampant adaptation (Desai et al. 2013; Kosheleva and Desai 2013; Neher and Hallatschek 2013; Schweinsberg 2017). Moreover, so-called “semi-pushed” traveling waves that contain some level of cooperativity, induced, e.g., by an Allee effect, generate power-law offspring distributions with 1 < α < 2 (Birzu et al. 2018), indicating that their coalescent is intermediate between the Bolthausen–Sznitman and Kingman coalescents.

The tractability of coalescent approaches makes them particularly useful for inferring demographic histories and detecting outlier behaviors (Basdevant and Goldschmidt 2008; Eldon 2009, 2011). However, as it is notoriously difficult to integrate selection into coalescent frameworks, there is also a strong need for forward-in-time approaches that capture the competition between genetic drift and selection. While for α ≥ 2 the limiting allele-frequency dynamics is given by the well-understood Wright–Fisher process, much less is known for the case α < 2. This is unfortunate because, as mentioned above, any exponent 1 ≤ α ≤ 2 can arise dynamically.

Recently, the forward-in-time dynamics of the special case α = 1 was studied by one of the authors (Hallatschek 2018), who found that an emergent sampling bias generates strong deviations from the Wright–Fisher dynamics. The sampling bias arises because, in each generation, an allele with high frequency samples the offspring distribution more often and, hence, reaches deeper into its tail than an allele with low frequency. The major allele of a biallelic site, therefore, has with high probability a greater number of offspring per individual than the minority type. This sampling bias acts like a selective advantage of the major allele, but its average effect is compensated by rare frequency hikes of the minor allele, so that the expected frequency changes only in the presence of genuine selection.

Here, we focus on the understudied case 1 < α < 2, intermediate between the known cases α ≥ 2, corresponding to the Wright–Fisher diffusion, and α = 1, described by jumps and sampling bias but vanishing diffusion. Similarly to the borderline case α = 1, we find that a minor-allele-suppressing sampling bias arises, but that it fades over time as the offspring distribution is sampled more and more thoroughly. This time-dependent sampling bias determines the scaling of the fixation probability, extinction time, stationary distribution, and site frequency spectrum (SFS). The combination of jumps and bias generates a so-called Lévy flight, which controls the variability of allele-frequency trajectories, for instance between unlinked genes or between populations. The flexibility of our model should make it possible to fit a wide range of cases that deviate from the Wright–Fisher diffusion.

Model and methods

Model

To study the impact of broad offspring number distributions, we consider an idealized, panmictic, haploid population of constant size N that produces non-overlapping generations in the following way. First, we associate with each individual $i$ a “reproductive value” $U_i$ (Fisher 1930; Barton and Etheridge 2011), which represents its expected contribution to the population of the next generation. The random numbers $U_i$ are drawn from a specified distribution $P_U$. In a second step, we sample individuals in proportion to their reproductive values until we have obtained N new individuals representing the next generation.

Our model belongs to the general class of Cannings models (Cannings 1974). The Wright–Fisher model is obtained if we choose $P_U$ to be a Dirac delta function, such that all individuals have the same reproductive value.

We focus most of our analysis on the dynamics of two mutually exclusive alleles, a wild type and a mutant allele. The dynamics of the two alleles is captured by the time-dependent frequency $X(t)$ ($0 \le X \le 1$) of mutants. The wild-type frequency is given by $1 - X(t)$. The total reproductive values M and W of the mutant population and the wild-type population, respectively, are given by

$$M = \sum_{i=1}^{NX} U_i^{(M)}, \qquad W = \sum_{i=1}^{N(1-X)} U_i^{(W)}. \tag{1}$$

Here, $U_i^{(M)}$ and $U_i^{(W)}$ are the individual reproductive values of mutants and wild types, sampled from the distribution $P_U$. The population of the next generation is generated by binomially sampling N individuals with success probability $M/(M+W)$ (namely, the probability that the parent of a randomly chosen individual is a mutant). Mutations and selection are included as in the Wright–Fisher model. If the fitness of the mutant relative to the wild type is $1+s$, where s is the selection coefficient, and the forward and back mutation rates are $\mu_1$ and $\mu_2$, respectively, then the success probability is given by $\frac{(1-\mu_2)(1+s)M + \mu_1 W}{(1+s)M + W}$.

For the offspring distribution $P_U$, we consider a family of fat-tailed distributions that asymptotically behave as $P_U \propto 1/u^{\alpha+1}$, with α a positive constant. To make our presentation concrete, we choose $P_U(u) = \alpha/u^{\alpha+1}$ ($u \ge 1$), which is known as the Pareto distribution. In the large-population-size limit, the neutral allele-frequency dynamics are known to depend only on the asymptotic power-law exponent α, provided we measure time in units of the coalescence time (Schweinsberg 2003b). For different but closely related models of broad offspring distributions, see Bah and Pardoux (2015).
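The two sampling steps of the model can be sketched as a single-generation update. This is a minimal illustration, not the authors' simulation code: the function name `next_generation` is hypothetical, and it relies only on Python's standard library, where `random.paretovariate(alpha)` draws from the Pareto density $\alpha/u^{\alpha+1}$ on $u \ge 1$.

```python
import random

def next_generation(n_mut, N, alpha, s=0.0, mu1=0.0, mu2=0.0, rng=random):
    """Return the number of mutants in the next generation (sketch of the model above).

    n_mut    : current number of mutants (0 <= n_mut <= N)
    alpha    : tail exponent of the Pareto reproductive-value distribution
    s        : selection coefficient (mutant fitness 1 + s)
    mu1, mu2 : forward (wild type -> mutant) and back (mutant -> wild type) mutation rates
    """
    # Eq. 1: total reproductive values of mutants (M) and wild types (W),
    # each individual drawing U_i from P_U(u) = alpha / u**(alpha + 1), u >= 1.
    M = sum(rng.paretovariate(alpha) for _ in range(n_mut))
    W = sum(rng.paretovariate(alpha) for _ in range(N - n_mut))
    # Probability that a sampled offspring is a mutant, including selection and mutation.
    p = ((1 - mu2) * (1 + s) * M + mu1 * W) / ((1 + s) * M + W)
    # Binomial sampling of N offspring, written as N Bernoulli trials.
    return sum(1 for _ in range(N) if rng.random() < p)

rng = random.Random(1)
n = 10
for _ in range(20):
    n = next_generation(n, 1000, alpha=1.5, s=0.01, rng=rng)
print(n / 1000)  # mutant frequency after 20 generations (random)
```

Writing the binomial draw as N Bernoulli trials keeps the sketch dependency-free; for large N, a dedicated binomial sampler (e.g., `random.binomialvariate` in Python 3.12+ or NumPy's `Generator.binomial`) would be much faster.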

Methods

Our goal is to understand the asymptotic dynamics of our model for large N, where the allele-frequency process admits a continuum description (Kimura 1955; Gardiner 2009), provided that α ≥ 1 (Schweinsberg 2003b). We first present simulation results for the relevant population-genetic measures. Later, we provide a heuristic argument to explain them. Many separate observations (the fixation probability, extinction time, allele-frequency fluctuations, stationary distribution, and SFS) can be matched up with a unifying scaling picture.

Below, $t$ and $\tau = t/T_c$ denote time in units of generations and time normalized by the characteristic (coalescent) timescale $T_c$, respectively. $T_c$ depends on the population size and the exponent α as follows: $T_c = N$ when α > 2, $T_c = N/\log N$ when α = 2, $T_c = N^{\alpha-1}$ when 1 < α < 2, and $T_c = \log N$ when α = 1. These timescales were originally derived in the coalescent framework (Schweinsberg 2003b). Later, we explain how they can be rationalized within the forward-in-time approach.
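The piecewise definition of $T_c$ translates directly into code. This sketch (hypothetical function names) keeps only the leading N-dependence quoted above and drops constant prefactors, such as the $C_\alpha$ that appears in the time rescaling of Equation 20; it rejects α < 1, for which the condition α ≥ 1 stated above for a continuum description does not hold.

```python
import math

def coalescent_timescale(N, alpha):
    """Coalescent timescale T_c in generations, up to constant prefactors."""
    if alpha > 2:
        return float(N)
    if alpha == 2:
        return N / math.log(N)
    if 1 < alpha < 2:
        return N ** (alpha - 1)
    if alpha == 1:
        return math.log(N)
    raise ValueError("these scalings assume alpha >= 1")

def to_coalescent_units(t, N, alpha):
    """Rescaled time tau = t / T_c."""
    return t / coalescent_timescale(N, alpha)

for a in (1.0, 1.5, 2.0, 2.5):
    print(a, coalescent_timescale(10**5, a))
```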

To understand the frequency dynamics when 1 ≤ α < 2, it is essential to distinguish between average and typical trajectories. Throughout this paper, we use the median of the frequencies, denoted by $X_{\mathrm{med}}(\tau)$, as a proxy for typical trajectories.

Results

Neutral dynamics: typical trajectories and extinction time

First, we characterize the allele-frequency dynamics in the absence of selection (s = 0). In this neutral limit, the expected allele frequency does not change over time, i.e., $\langle X(t)\rangle = X(0)$. Yet, despite the overall neutrality, a typical trajectory experiences a bias against the minority allele. This can be seen in Figure 1, where the mean and median across many realizations that start from the same frequency $X(0) = 0.01$ are plotted. While the mean does not change over time, as required by neutrality, the median decays to zero in an α-dependent manner. By symmetry, the median increases toward fixation if the starting frequency is larger than 50%. Thus, the median experiences a bias against the minor allele. Note also that, when 1 < α < 2, the speed at which the median approaches extinction decreases as it nears the extinction boundary (see the red curve in Figure 1). As we show later, an uptick of the SFS at the boundaries originates from this slowing.
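To see the difference between mean and median trajectories directly, one can run an ensemble of neutral replicates and record both statistics per generation. The sketch below reflects our own choices (standard-library random numbers, absorbing boundaries at 0 and 1, hypothetical helper names `neutral_step` and `mean_and_median`) and is not the simulation pipeline behind Figure 1; with enough replicates, the mean should stay close to X(0) on average while the median drifts toward zero.

```python
import random
import statistics

def neutral_step(n_mut, N, alpha, rng):
    """One neutral generation: Pareto reproductive values, then binomial resampling."""
    M = sum(rng.paretovariate(alpha) for _ in range(n_mut))
    W = sum(rng.paretovariate(alpha) for _ in range(N - n_mut))
    p = M / (M + W)  # probability that the parent of a sampled offspring is a mutant
    return sum(1 for _ in range(N) if rng.random() < p)

def mean_and_median(x0, N, alpha, generations, replicates, seed=0):
    """Per-generation mean and median mutant frequency over independent replicates."""
    rng = random.Random(seed)
    n0 = round(x0 * N)
    counts = [n0] * replicates
    means, medians = [n0 / N], [n0 / N]
    for _ in range(generations):
        # Frequencies 0 and 1 are absorbing, so those replicates are left unchanged.
        counts = [neutral_step(n, N, alpha, rng) if 0 < n < N else n for n in counts]
        freqs = [n / N for n in counts]
        means.append(statistics.fmean(freqs))
        medians.append(statistics.median(freqs))
    return means, medians

means, medians = mean_and_median(x0=0.01, N=500, alpha=1.5, generations=30, replicates=100)
print(means[-1], medians[-1])
```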

Figure 1.


The mean (blue dashed curve) and median (red solid curve) of allele-frequency trajectories for α = 0.8, 1, 1.5, 2, and 2.5. For each α, $10^4$ trajectories are generated with the initial frequency $x_0 = 0.01$. For ease of viewing, only 50 trajectories are shown in gray in each panel. The horizontal axes show the time $t$ in units of generations and the time $\tau = t/T_c$ rescaled by the coalescent timescale $T_c$. The dependence of the coalescent time on the population size N is written below each panel. The population size is $N = 10^5$.

Numerical simulations of the early part of the trajectories show that the time-dependent median displacement follows a simple power law,

$$\Delta X_{\mathrm{med}} \equiv X_{\mathrm{med}}(\tau) - X(0) \propto \tau^{1/\alpha}, \tag{2}$$

up to an X(0)-dependent prefactor. Figure 2 shows the numerical result for α = 1.5. The red curve represents the median of the trajectories, which agrees well with $\Delta X_{\mathrm{med}} \propto \tau^{2/3}$.

Figure 2.


Trajectories of the median (red thick line) and the mean (blue dashed line) of the allele frequency when α = 1.5 and $N = 10^8$. Inset: the trajectory (red points) of $|\Delta X_{\mathrm{med}}| = X(0) - X_{\mathrm{med}}(\tau)$ in a log–log plot. The median agrees well with the expectation from the scaling argument, $\Delta X_{\mathrm{med}} \propto \tau^{1/\alpha}$ (black solid line).

Next, we quantify the time to extinction, which turns out to be driven by the above minor-allele-suppressing bias. Numerical results for the mean extinction time are consistent with

$$\tau_{\mathrm{ext}} \propto X(0)^{\alpha-1}, \tag{3}$$

as shown in Figure 3. Hence, in units of the coalescence time, the mean extinction time $\tau_{\mathrm{ext}}$ becomes larger as α decreases (i.e., for a broader offspring distribution). Note, however, that if one measures time in units of generations, Equation 3 can be rewritten as $t_{\mathrm{ext}} = \tau_{\mathrm{ext}} T_c \propto (NX(0))^{\alpha-1}$, which becomes smaller as α decreases, since $NX(0) \gg 1$. As we show later, Equation 2 can be derived analytically from a short-time approximation of the dynamics (Equation 25), and Equation 3 can be explained by an effective sampling bias (Equation 35).
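Equation 3 follows if, at small frequencies, a typical trajectory obeys a deterministic bias of the form $dX/d\tau = -C X^{2-\alpha}$ with some positive constant C, which is the form of the effective bias sketched in Figure 13. Taking that form as an assumption here (C is an unspecified constant, set to 1 as a placeholder), the substitution $y = X^{\alpha-1}$ gives $dy/d\tau = -(\alpha-1)C$, so the frequency reaches zero at $\tau_{\mathrm{ext}} = X(0)^{\alpha-1}/[(\alpha-1)C]$. The sketch below (hypothetical function names) checks this closed form against a forward-Euler integration and converts it to generations using $T_c = N^{\alpha-1}$.

```python
def tau_ext_closed_form(x0, alpha, C=1.0):
    """Hitting time of X = 0 for dX/dtau = -C X**(2 - alpha), with 1 < alpha < 2."""
    return x0 ** (alpha - 1) / (C * (alpha - 1))

def tau_ext_euler(x0, alpha, C=1.0, dt=1e-5):
    """Forward-Euler integration of the same ODE until X first drops to zero or below."""
    x, tau = x0, 0.0
    while x > 0:
        x -= dt * C * x ** (2 - alpha)
        tau += dt
    return tau

def t_ext_generations(x0, N, alpha, C=1.0):
    """Convert to generations with T_c = N**(alpha - 1); depends on N and x0 only via N * x0."""
    return tau_ext_closed_form(x0, alpha, C) * N ** (alpha - 1)

print(tau_ext_closed_form(0.01, 1.5), tau_ext_euler(0.01, 1.5))
```

Because the finite-time collapse of $X^{\alpha-1}$ is linear in τ, the Euler estimate only needs a step much smaller than $\tau_{\mathrm{ext}}$ itself.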

Figure 3.


The mean extinction time $t_{\mathrm{ext}}$ (in units of generations) as a function of the initial allele frequency X(0) for α = 1.4, 1.5, 1.6. Each straight line has slope α − 1. The initial-frequency dependence of $t_{\mathrm{ext}}$ is fitted well by Equation 3. The population size is $N = 10^8$.

Allele frequency fluctuations as a signature of broad offspring distributions

Next, we explore to what extent the spectrum of allele-frequency fluctuations can provide a clue for identifying the exponent α of the offspring distribution. A deviation from the Wright–Fisher diffusion is most clearly revealed by measuring the median square displacement (median SD),

$$\text{Median SD} \equiv \mathcal{M}\left[(X(t) - X(t=0))^2\right], \tag{4}$$

where $\mathcal{M}[\cdot]$ denotes taking the median (e.g., $\mathcal{M}[X(t)] = X_{\mathrm{med}}(t)$). To measure the median SD, we simulate 1000 neutral allele-frequency trajectories with initial condition X(0) = 0.5 for α = 1, 1.5 and the Wright–Fisher model (Figure 4A). As shown in Figure 4B, the median SD computed from this data set is consistent with the scaling

$$\text{Median SD} \propto t^{2/\alpha}, \tag{5}$$

when $t/T_c \ll 1$. Since 1 ≤ α < 2, this scaling means that typical fluctuations, as characterized by the median SD, are superdiffusive. Later, we derive the superdiffusive exponent 2/α in Equation 5 analytically (Equation 32).

Figure 4.


(A) Fluctuations of neutral allele frequencies when α = 1.5 and $N = 10^5$. For X(0) = 0.5, the median $X_{\mathrm{med}}(t)$ (red) is constant, as is the mean $\langle X(t)\rangle$ (blue). (B) The median square displacement computed from a data set of 1000 trajectories. For α = 1, 1.5, and the Wright–Fisher model, $N = 10^8$, $10^4$, and $10^3$ are used, respectively. The straight lines represent the scaling in Equation 5. For α = 1, the fit for t ≳ 5 is not perfect, since $t/T_c = t/\ln N \ll 1$ is not satisfied. (C) The mean square displacement (mean SD) for different values of α. Solid lines represent linear scaling, which is expected for a regular diffusion process. (D) Data-size dependence of the estimated diffusion exponent $\kappa_{\mathrm{estimated}}$ for the mean SD (blue circles) and for the median SD (orange triangles). See the main text for a detailed explanation. The horizontal lines show κ = 2/α and κ = 1. The bars represent the standard deviations of $\kappa_{\mathrm{estimated}}$; α = 1.3 and $N = 10^8$ are used.

Usually, allele-frequency fluctuations are quantified by the mean SD, $\langle (X(t) - X(0))^2\rangle$, rather than the median SD. For the Wright–Fisher diffusion, the distinction between these two measures is irrelevant, since both increase linearly with time, albeit with different prefactors. However, for 1 ≤ α < 2, the α-dependence in Equation 5 can be detected by measuring the median SD. As shown in Figure 4C, the mean SD (computed from a large data set) grows linearly in time even when α is less than 2, as if the underlying process were diffusive.

That the dynamics are not diffusive also affects the mean SD, but more subtly: its value depends on the size of the data set (i.e., the number of frequency trajectories) used to measure it. This is because rare large jumps contribute to the mean SD in a large data set, whereas these jumps are (with high probability) not observed in a small data set. To demonstrate this data-size dependence, we prepare an ensemble of data sets, each consisting of a given number of allele-frequency trajectories. Then, for each data set, we measure the diffusion exponent κ, defined by

$$\text{Mean SD} \propto t^{\kappa}. \tag{6}$$

In Figure 4D, the ensemble-averaged exponent is shown by blue circles. As the data size increases, fluctuations characterized by the mean SD exhibit a crossover from super-diffusion (κ = 2/α) to normal diffusion (κ = 1). For the median SD, by contrast, the diffusion exponent κ can be computed reliably, without any significant dependence on the size of the data set (orange triangles in Figure 4D). For example, under the parameter setting of Figure 4D, given a data set of 320 trajectories, the diffusion exponent $\kappa_{\mathrm{estimated}}$ of the median SD falls within the interval [1.45, 1.57] with probability 68%. This in turn predicts $\alpha_{\mathrm{estimated}} := 2/\kappa_{\mathrm{estimated}} \approx 1.32 \pm 0.05$, which is close to the actual value α = 1.3.
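The estimation procedure above can be sketched in a few lines: compute the median SD at each time point from a set of trajectories, fit κ as the slope of a log–log regression over early times (where $t/T_c \ll 1$), and invert Equation 5 as $\alpha = 2/\kappa$. This is a minimal sketch with hypothetical function names; the ordinary least-squares fit on log-transformed values is our assumption, not necessarily the procedure used for Figure 4D. Time points with zero displacement (such as t = 0) must be excluded before taking logarithms.

```python
import math
import statistics

def median_square_displacement(trajectories):
    """Eq. 4: median over trajectories of (X(t) - X(0))**2, one value per time index."""
    n_times = len(trajectories[0])
    return [statistics.median((traj[t] - traj[0]) ** 2 for traj in trajectories)
            for t in range(n_times)]

def fit_exponent(times, values):
    """Least-squares slope of log(values) against log(times), i.e. kappa in values ~ t**kappa."""
    lx = [math.log(t) for t in times]
    ly = [math.log(v) for v in values]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def alpha_from_kappa(kappa):
    """Invert Median SD ~ t**(2/alpha) (Eq. 5): alpha = 2 / kappa."""
    return 2.0 / kappa

# Worked numbers from the text: kappa in [1.45, 1.57] maps to alpha in about [1.27, 1.38].
print(alpha_from_kappa(1.57), alpha_from_kappa(1.51), alpha_from_kappa(1.45))
```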

Fixation probability

Next, we examine the effect of natural selection on the fixation probability of beneficial mutations. We consider a mutant with positive selective advantage s > 0 arising in a monoclonal population. The fixation probability $P_{\mathrm{fix}}(s)$ of a single mutant depends on the parameter α of the offspring distribution. In the Wright–Fisher model (or equivalently, α ≥ 2), the fixation probability can be obtained using a diffusion approximation and is given by $P_{\mathrm{fix}}(s) = \frac{1-e^{-2s}}{1-e^{-2Ns}}$, which becomes $P_{\mathrm{fix}} \approx 2s$ when $Ns \gg 1$ and s is small. When α = 1, an analytic result has recently been obtained (Hallatschek 2018), which can be approximated as $P_{\mathrm{fix}}(s) \approx 1/N^{1-s}$. For the intermediate case 1 < α < 2, we find that the fixation probability is given by

$$P_{\mathrm{fix}}(s) \propto s^{1/(\alpha-1)}. \tag{7}$$

See Figure 5 for the numerical results. Note that since $P_{\mathrm{fix}}(s) \approx 1/N$ in the neutral limit independently of α, these results hold for sufficiently strong selection, $1/N \ll s^{1/(\alpha-1)}$. Equation 7 can be deduced from the balance between the selection force and an emergent sampling bias (Equation 38 and Figure 13A).
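The limiting results quoted above, and the practical meaning of the validity condition, can be checked with a short sketch. It evaluates the Wright–Fisher diffusion formula, computes the ratio $P_{\mathrm{fix}}(2s)/P_{\mathrm{fix}}(s) = 2^{1/(\alpha-1)}$ implied by Equation 7 (so the unknown prefactor cancels), and rearranges $1/N \ll s^{1/(\alpha-1)}$ into $s \gg N^{1-\alpha}$, i.e., selection must be strong compared with $1/T_c$. Function names are hypothetical.

```python
import math

def pfix_wright_fisher(s, N):
    """Diffusion result (1 - e^{-2s}) / (1 - e^{-2Ns}); equals 1/N at s = 0."""
    if s == 0:
        return 1.0 / N
    # expm1 keeps the formula accurate when s or N*s is small.
    return math.expm1(-2 * s) / math.expm1(-2 * N * s)

def pfix_ratio(alpha, factor=2.0):
    """Pfix(factor * s) / Pfix(s) implied by Pfix ~ s**(1/(alpha-1)), for 1 < alpha < 2."""
    return factor ** (1.0 / (alpha - 1.0))

def selection_threshold(N, alpha):
    """s at which s**(1/(alpha-1)) = 1/N; Eq. 7 needs s well above this value."""
    return N ** (1.0 - alpha)

print(pfix_wright_fisher(0.01, 10**4))   # close to 2s = 0.02
print(pfix_ratio(1.5), pfix_ratio(1.9))  # doubling s multiplies Pfix by 4 and by about 2.16
print(selection_threshold(10**8, 1.5))   # about 1e-4 for the N of Figure 5
```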

Figure 5.


The fixation probability $P_{\mathrm{fix}}$ as a function of the selective advantage s. The lines are the expectations from the scaling argument in Equation 40. The population size is $N = 10^8$.

Figure 13.


(A) The crossover from the effective bias to genuine selection. $V(X) \approx -CX^{2-\alpha} + \sigma X$ is plotted, where C is a positive coefficient and σ > 0. Deterministically, an unstable point exists at $X \approx X_c$. (B) The balance between the effective bias and mutation. $V(X) \approx -CX^{2-\alpha} + \theta$ is plotted. Deterministically, a stable point exists at $X \approx \theta^{1/(2-\alpha)}$.

As Equation 7 shows, for a fixed population size and selective advantage, the fixation probability becomes smaller as α decreases. Intuitively, this is because, for smaller α, a mutant's success in reaching fixation depends more on catching a ride on a jackpot event, i.e., on luck, than on fitness differences.

Site frequency spectrum

We return to the neutral case and present the scaling behaviors of the neutral SFS. The SFS is often used as a convenient summary of the genetic diversity within a population. Theoretically, the SFS is defined in the infinite sites model (Kimura 1969) as the density $f_{\mathrm{SFS}}(x)$ of neutral derived alleles in the population (namely, $f_{\mathrm{SFS}}(x)\,dx$ is the number of derived alleles in the frequency window $[x - dx/2,\ x + dx/2]$).

Figure 6 shows numerical plots of the neutral SFS for α = 1, 1.5, and the Wright–Fisher model. In the standard Wright–Fisher model, the SFS is proportional to 1/x, which decreases monotonically as x increases. By contrast, when offspring numbers are broadly distributed (α < 2), the SFS is non-monotonic, with a somewhat surprising uptick toward the fixation boundary. When α = 1, the asymptotic behaviors near both boundaries are analytically well understood: $f_{\mathrm{SFS}}(x)$ is proportional to $1/(x\log x)^2$ near $x \to 0$ and to $1/[(1-x)\log(1-x)]$ near $x \to 1$ (Kosheleva and Desai 2013; Neher and Hallatschek 2013) (see also Appendix A).

Figure 6.


The neutral site frequency spectrum for different values of α and fixed population size $N = 10^5$. When 1 < α < 2, the rare-end spectrum and the frequent-end spectrum scale as $1/x^{3-\alpha}$ and $1/(1-x)^{2-\alpha}$, respectively (see also Figure 7).

For the intermediate case 1<α<2, the rare-end behavior of the SFS has been analytically studied. From a backward approach (the Λ-coalescent), the authors in Berestycki et al. (2014) showed

$$\lim_{n\to\infty} \frac{\zeta_i(n)}{n^{2-\alpha}} \propto \frac{\Gamma(i+\alpha-2)}{i!}. \tag{8}$$

Here, $n$ is the sample size and $\zeta_i(n)$ is the number of sites at which variants appear $i$ times in the sample (see Berestycki et al. 2014 for the proportionality constant on the right-hand side of Equation 8). Using Stirling's approximation in Equation 8, we obtain

$$f_{\mathrm{SFS}}(x) \propto \frac{1}{x^{3-\alpha}} \quad \text{when } x \ll 1. \tag{9}$$
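The step from Equation 8 to Equation 9 uses $\Gamma(i+\alpha-2)/i! \sim i^{\alpha-3}$ for large $i$; identifying $i$ with $nx$ for a large sample then gives $f_{\mathrm{SFS}}(x) \propto x^{\alpha-3}$ for $x \ll 1$. A quick numerical check of the asymptotic exponent, computing the ratio in log space with `math.lgamma` to avoid overflow (function names are hypothetical):

```python
import math

def sfs_weight(i, alpha):
    """i-dependence of Eq. 8, Gamma(i + alpha - 2) / i!, evaluated in log space."""
    return math.exp(math.lgamma(i + alpha - 2) - math.lgamma(i + 1))

def local_slope(i, alpha, factor=2):
    """Log-log slope of sfs_weight between i and factor * i; Stirling predicts alpha - 3."""
    return math.log(sfs_weight(factor * i, alpha) / sfs_weight(i, alpha)) / math.log(factor)

for i in (10, 100, 1000):
    print(i, local_slope(i, alpha=1.5))  # approaches 1.5 - 3 = -1.5 as i grows
```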

Equation 8 cannot be used for high-frequency variants, because the number of times the variants appear ($i$ in Equation 8) is kept finite when taking the limit of large sample size $n$. To the best of our knowledge, the precise behavior at the high-frequency end for 1 < α < 2 has not been reported. As shown in Figure 7, we find that the asymptotic form of the uptick of $f_{\mathrm{SFS}}(x)$ is given by

$$f_{\mathrm{SFS}}(x) \propto \frac{1}{(1-x)^{2-\alpha}} \quad (\text{for } 1 - x \ll 1). \tag{10}$$

Figure 7.


The SFS near x = 1 for α = 1.3, 1.4, 1.5, 1.6 (circles, squares). The horizontal axis is $1 - x$. The solid lines are drawn assuming $f_{\mathrm{SFS}}(x) \propto 1/(1-x)^{2-\alpha}$. $N = 10^6$ is used.

We will show that the uptick arises due to the fact that an effective sampling bias decreases as an allele-frequency trajectory approaches the fixation boundary (Equation 42).

Mutation-drift balance

A broad offspring distribution also affects the stationary distribution of the allele frequency when mutation and genetic drift balance one another. For simplicity, we consider symmetric reversible mutations between two neutral allele types. We denote the scaled mutation rate (per unit time in the continuous description) as $\theta = T_c\mu$, where μ is the mutation rate per generation. In the Wright–Fisher model, the stationary distribution is known to be (Kimura 1955)

$$P_{\mathrm{WF}}(x) \propto x^{2\theta-1}(1-x)^{2\theta-1}. \tag{11}$$

There is a critical value $\theta_c^{\mathrm{WF}} = 1/2$: when $\theta > \theta_c^{\mathrm{WF}}$, the distribution in Equation 11 has a single peak at the center $x = 1/2$; when $\theta < \theta_c^{\mathrm{WF}}$, it is U-shaped, with the density increasing monotonically from the center toward the boundaries.
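Since Equation 11 is symmetric, it is a Beta distribution with both shape parameters equal to $2\theta$, normalized by the Beta function $B(2\theta, 2\theta)$; the exponent $2\theta - 1$ changes sign at $\theta = 1/2$, which is where the critical value comes from. A small sketch (function name hypothetical):

```python
import math

def wf_stationary_pdf(x, theta):
    """Normalized Eq. 11: a Beta(2*theta, 2*theta) density on 0 < x < 1."""
    a = 2.0 * theta
    log_norm = 2.0 * math.lgamma(a) - math.lgamma(2.0 * a)   # log B(a, a)
    return math.exp((a - 1.0) * math.log(x * (1.0 - x)) - log_norm)

# theta above 1/2: single central peak; theta below 1/2: U-shape rising toward x = 0 and x = 1.
for theta in (0.1, 1.0):
    print(theta, [round(wf_stationary_pdf(x, theta), 3) for x in (0.05, 0.25, 0.5)])
```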

Figures 8A and 8B show the numerical results for the stationary distributions of the Wright–Fisher model and of α = 1.5, respectively. When 1 ≤ α < 2, a critical mutation rate $\theta_c$ exists as in the Wright–Fisher model, but there is a qualitatively different feature: for a small mutation rate $\theta < \theta_c$, the stationary distribution is not U-shaped but M-shaped, with two peaks near the boundaries. Note that the M-shaped distribution indicates stochastic switching behavior, as illustrated in Figure 8C (the blue curve). As shown in Figure 8D, the peak positions are approximately given, up to prefactors, by

$$x_{\mathrm{peak}},\ 1 - x_{\mathrm{peak}} \propto \theta^{1/(2-\alpha)}. \tag{12}$$
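Figure 13B attributes the peaks to a balance between the effective bias against the minor allele and mutation. Assuming a small-frequency drift of the form $V(X) \approx -C X^{2-\alpha} + \theta$ with an unspecified positive constant C (set to 1 below as a placeholder), the stable fixed point is $X^* = (\theta/C)^{1/(2-\alpha)}$, which reproduces the scaling of Equation 12; by the symmetry of the neutral model, the same argument applies to $1 - x_{\mathrm{peak}}$. A minimal sketch with hypothetical function names:

```python
def effective_drift(x, theta, alpha, C=1.0):
    """Assumed small-x drift: effective bias -C x**(2 - alpha) plus mutation inflow theta."""
    return -C * x ** (2.0 - alpha) + theta

def peak_position(theta, alpha, C=1.0):
    """Root of effective_drift, i.e. the stable balance point x ~ theta**(1/(2 - alpha))."""
    return (theta / C) ** (1.0 / (2.0 - alpha))

alpha = 1.5
for theta in (0.02, 0.05, 0.1):
    print(theta, peak_position(theta, alpha))   # scales as theta**2 for alpha = 1.5
```

The drift is positive below the root and negative above it, so the root is stable, matching the peak near the boundary rather than a U-shaped pile-up at it.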

Figure 8.


(A) Stationary distribution of the allele frequency in the Wright–Fisher model when the mutation rate is small (θ = 0.1) and large (θ = 1.0). (B) Stationary distribution for α = 1.5 when the mutation rate is small (θ = 0.1) and large (θ = 1.0). (C) Time series of the allele frequency for α = 1.5 when the stationary distribution is bimodal (θ = 0.1) and unimodal (θ = 1.0). (D) The position of the peak of the stationary distribution near x = 0 versus the mutation rate μ. $N = 10^4$ is used.

In Appendix B, we show that the M-shaped stationary distribution persists even in the presence of natural selection, provided that selection is weaker than the sampling bias at the peaks of the distribution.

A similar M-shaped distribution was observed for the Eldon–Wakeley (EW) process by Der and Plotkin (2014), who studied the moments of the stationary distribution extensively. However, the origin of the M-shaped distribution remained unclear. Below, using scaling arguments, we explain why the bimodal distribution arises in our case (see the argument above Equation 44).

Analytical arguments

Limiting process, transition density, and time-dependent effective bias

We now provide analytical arguments for the observations made in the simulations described in the first part of this paper. Our discussion starts with an exact but somewhat unwieldy description of the allele frequency dynamics. We then show how exact short-time and intermediate time asymptotics can be derived and used to rationalize the sampling bias and the scaling laws discovered above.

The allele-frequency dynamics can be fully characterized by the transition probability density $w_N(y|x)$ that the mutant frequency changes from x to y in one generation. Since one generation consists of random offspring contributions to the seed pool and binomial sampling from the seed pool, we have

$$w_N(y|x) = \int dM \int dW\, P_{\mathrm{MUT}}(M; xN)\, P_{\mathrm{WT}}(W; (1-x)N) \times \Pr_{\mathrm{binom}}\!\left(yN; N, \frac{M}{M+W}\right). \tag{13}$$

Here, $P_{\mathrm{MUT}}(M; xN)$ is the probability density that the sum of $xN$ random mutant offspring numbers takes the value M, $P_{\mathrm{WT}}(W; (1-x)N)$ is that for the wild type, and $\Pr_{\mathrm{binom}}$ is the probability of getting $yN$ successes in N trials with success probability $M/(M+W)$. First, we will focus on the neutral case, for which $P_{\mathrm{MUT}}$ and $P_{\mathrm{WT}}$ are the same function, i.e., $P_{\mathrm{MUT}}(\cdot) = P_{\mathrm{WT}}(\cdot)$.

While the resampling distribution $w_N$ may in general behave in complex ways, it has only a few options in the large-N limit. These constraints emerge from two asymptotic simplifications. First, since M and W are sums of many random variables, $P_{\mathrm{MUT}}$ and $P_{\mathrm{WT}}$ tend to stable distributions, as described by the generalized central limit theorem (Gnedenko and Kolmogorov 1968; Uchaikin and Zolotarev 2011) (see also Appendix C for a brief description of the theorem). Second, the fluctuations associated with binomial sampling become negligible compared with those induced by offspring contributions to the seed pool, provided that the offspring distribution is sufficiently broad, i.e., α < 2. Thus, we can replace $\Pr_{\mathrm{binom}}(yN; N, M/(M+W))$ with a Dirac delta function, $\delta\big(y - M/(M+W)\big)$. Using these facts and evaluating the integral in Equation 13 (see Appendix D for details), we obtain a simple analytical expression for $w_N(y|x)$ that is valid in the large-N limit. When α = 1 (Hallatschek 2018),

$$w_N(y|x) = \frac{1}{\log N}\,\frac{x(1-x)}{(x-y)^2}. \tag{14}$$

When 1<α<2,

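$$w_N(y|x) = \begin{cases} N^{1-\alpha} C_\alpha \dfrac{x(1-x)(1-y)^{\alpha-1}}{(y-x)^{\alpha+1}} & \text{when } x < y,\\[2mm] N^{1-\alpha} C_\alpha \dfrac{x(1-x)\,y^{\alpha-1}}{(x-y)^{\alpha+1}} & \text{when } x > y, \end{cases} \tag{15}$$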

where $C_\alpha \equiv \alpha\left(\frac{\alpha-1}{\alpha}\right)^{\alpha}$.
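Equations 14 and 15 are simple enough to evaluate directly. The sketch below (hypothetical function names) codes them up, including $C_\alpha$; for 1 < α < 2 the jump rate carries the factor $N^{1-\alpha}$, which is why time is rescaled as $\tau = C_\alpha t/N^{\alpha-1}$ in the continuum limit. It also illustrates the asymmetry discussed below Equation 22: from x < 1/2, an upward jump of a given size is more likely than a downward one.

```python
import math

def c_alpha(alpha):
    """C_alpha = alpha * ((alpha - 1) / alpha) ** alpha, as defined below Eq. 15."""
    return alpha * ((alpha - 1.0) / alpha) ** alpha

def w_N(y, x, N, alpha):
    """Large-N one-generation jump density w_N(y|x): Eq. 14 (alpha = 1) or Eq. 15 (1 < alpha < 2)."""
    if y == x:
        raise ValueError("w_N(y|x) is singular at y = x")
    if alpha == 1:
        return x * (1 - x) / (math.log(N) * (x - y) ** 2)
    if not 1 < alpha < 2:
        raise ValueError("Eq. 15 applies to 1 < alpha < 2")
    prefactor = N ** (1 - alpha) * c_alpha(alpha) * x * (1 - x)
    if y > x:  # upward jump, suppressed as y approaches 1
        return prefactor * (1 - y) ** (alpha - 1) / (y - x) ** (alpha + 1)
    return prefactor * y ** (alpha - 1) / (x - y) ** (alpha + 1)  # downward jump

x, d = 0.2, 0.1
print(w_N(x + d, x, 10**6, 1.5) / w_N(x - d, x, 10**6, 1.5))  # about 2.65: upward jumps favored
```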

To obtain the continuum description, we must scale the time t appropriately with the population size N (Gardiner 2009). The characteristic timescale (coalescent timescale) $T_c$ can be read off from the dependence of the transition density on N. Hallatschek (2018) showed that, when α = 1, the resulting limiting process is described by

$$\partial_\tau P(x,\tau) = -\partial_x\left[V(x)P(x,\tau)\right] + \int dx'\left[w(x|x')P(x',\tau) - w(x'|x)P(x,\tau)\right], \tag{16}$$

where the jump kernel w(x|x) is given by

$$w(x'|x) = \frac{x(1-x)}{(x'-x)^2}, \tag{17}$$

and the advection (bias) term V(x) is given by

$$V(x) = -\,\mathrm{P.V.}\!\int dx'\,(x'-x)\,w(x'|x) = x(1-x)\log\frac{x}{1-x}, \tag{18}$$

where P.V. denotes the Cauchy principal value. It is easy to check that Equation 16 satisfies the neutrality condition $\partial_\tau \langle X\rangle = 0$ (see Hallatschek 2018 for the calculation). Equations of the form of Equation 16 are sometimes called differential Chapman–Kolmogorov equations (Gardiner 2009).

To develop intuition, it is useful to interpret the different terms in Equation 16. First, V(x) has the form of frequency-dependent selection that favors the major allele (with frequency > 50%) and suppresses the minor allele. The apparent fitness difference between the mutant and the wild type is given by the log-ratio of their frequencies. Such a selection-like effect arises because the major allele samples the offspring distribution $P_U(u)$ more deeply than the minor allele (see Hallatschek 2018). Second, in spite of this apparent bias, the neutrality of the whole process is maintained by rare large jumps, characterized by $w(y|x)$. This also means that neutrality does not hold if we focus on “typical” trajectories (see Figure 1). In fact, as we show in Appendix A, the median $X_{\mathrm{med}}$ of the mutant frequency, which is a proxy for “typical” trajectories, evolves according to

$$\frac{d}{d\tau}X_{\mathrm{med}}(\tau) = V\big(X_{\mathrm{med}}(\tau)\big) \quad (\text{when } \alpha = 1). \tag{19}$$
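For α = 1, Equation 19 can be solved in closed form. With the logit $u = \log[X_{\mathrm{med}}/(1-X_{\mathrm{med}})]$, one has $du/d\tau = \dot X_{\mathrm{med}}/[X_{\mathrm{med}}(1-X_{\mathrm{med}})] = u$, so $u(\tau) = u(0)\,e^{\tau}$: the median moves away from 1/2 with an exponentially growing logit. This is our own rearrangement of Equations 18 and 19, checked below against a forward-Euler integration (function names hypothetical).

```python
import math

def v_alpha1(x):
    """Eq. 18: V(x) = x (1 - x) log(x / (1 - x))."""
    return x * (1 - x) * math.log(x / (1 - x))

def median_euler(x0, tau, dt=1e-4):
    """Forward-Euler integration of Eq. 19, dX_med/dtau = V(X_med)."""
    x = x0
    for _ in range(round(tau / dt)):
        x += dt * v_alpha1(x)
    return x

def median_closed_form(x0, tau):
    """Exact solution: logit(X_med) grows as logit(x0) * exp(tau).

    For large tau, math.exp(-u) overflows once u < -709 or so.
    """
    u = math.log(x0 / (1 - x0)) * math.exp(tau)
    return 1.0 / (1.0 + math.exp(-u))

for x0 in (0.3, 0.5, 0.7):
    print(x0, median_euler(x0, 1.0), median_closed_form(x0, 1.0))
```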

When 1 < α < 2, using the same reasoning as in the derivation of Equation 14 and choosing $\tau = C_\alpha t/N^{\alpha-1}$, we obtain the following differential Chapman–Kolmogorov equation,

$$\partial_\tau P(x,\tau) = -\partial_x\left[V(x)P(x,\tau)\right] + \int_{|x'-x|>\epsilon} dx'\left[w(x|x')P(x',\tau) - w(x'|x)P(x,\tau)\right], \tag{20}$$

where

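$$w(x'|x) = \begin{cases} \dfrac{x(1-x)(1-x')^{\alpha-1}}{(x'-x)^{\alpha+1}} & \text{when } x < x',\\[2mm] \dfrac{x(1-x)\,x'^{\alpha-1}}{(x-x')^{\alpha+1}} & \text{when } x > x', \end{cases} \tag{21}$$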

and

$$V(x) = -\int_{|x'-x|>\epsilon} dx'\,(x'-x)\,w(x'|x). \tag{22}$$

As in Equation 16, the advection term guarantees neutrality. Equation 21 means that, when x < 1/2, rightward jumps occur more frequently than leftward ones, and this tendency reverses when x > 1/2. Noting the overall minus sign in Equation 22, this in turn means that V(x) is a bias against the minor allele (see Figure 1), as in the case α = 1. We will later show that, when x ≪ 1, the median trajectory initially decays like $\Delta X_{\mathrm{med}} \propto \tau^{1/\alpha}$ (Equation 2).
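The sign of the bias can be checked numerically by evaluating Equation 22 at a finite cutoff ε, splitting the integral into upward jumps of size δ = x′ − x and downward jumps of size δ = x − x′, with the kernel of Equation 21. The sketch below (hypothetical names, a simple midpoint rule on a logarithmic grid in δ) shows V(x) < 0 for x < 1/2, V(1/2) = 0, the antisymmetry V(1 − x) = −V(x), and a magnitude that grows as ε shrinks for x ≠ 1/2, consistent with the divergence of the advection integral as ε → 0 noted in the text.

```python
import math

def _jump_moment(upper, g, alpha, eps, n):
    """Integral of delta * g(delta) / delta**(alpha + 1) over eps < delta < upper.

    Uses delta = exp(s), so d(delta) = delta ds and the integrand becomes g(delta) * delta**(1 - alpha).
    """
    if upper <= eps:
        return 0.0
    a, b = math.log(eps), math.log(upper)
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        d = math.exp(a + (k + 0.5) * h)
        total += g(d) * d ** (1 - alpha)
    return total * h

def v_cutoff(x, alpha, eps, n=20000):
    """Eq. 22 with the kernel of Eq. 21, for 1 < alpha < 2 and a finite cutoff eps."""
    up = _jump_moment(1 - x, lambda d: (1 - x - d) ** (alpha - 1), alpha, eps, n)
    down = _jump_moment(x, lambda d: (x - d) ** (alpha - 1), alpha, eps, n)
    # Upward jumps contribute +delta, downward jumps -delta; the overall minus sign is from Eq. 22.
    return -x * (1 - x) * (up - down)

for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(x, v_cutoff(x, alpha=1.5, eps=1e-3))
```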

We note that the short-time superdiffusive behavior in Equation 5 implies that Equation 20 cannot be simplified to a Fokker–Planck equation. We also note that, in the limit ε → 0, two divergences arise in Equation 20: one in the integral for the advection velocity in Equation 22 and one in the jump integral of Equation 20 itself. However, since both divergences cancel exactly, the entire right-hand side of Equation 20 is well defined. As shown in Appendix E, Equations 16 and 20 can also be derived as the dual of the Λ-Fleming-Viot process, namely as the adjoint of the backward generator (e.g., Etheridge et al. 2010; Griffiths 2014).

Although it is difficult to study Equation 20 analytically, it is possible to derive exact short-time and long-time asymptotics that, combined with scaling arguments, paint a fairly comprehensive picture of the ensuing statistical genetics.

Short-time dynamics and fluctuations

First, we describe the transition density $P(x,\tau|x_0,\tau=0)$ of Equation 20 for small times. When $1<\alpha<2$, the allele frequency changes due to the deterministic bias $V(x)$ and the random occurrence of jumps, sampled from the broad distribution in Equation 21. Since the number of jump events is enormous ($\sim\tau\epsilon^{-\alpha}$) even for small $\tau$, the generalized central limit theorem applies, and $X(\tau)$ is asymptotically distributed according to a stable distribution (Gnedenko and Kolmogorov 1968). A general stable distribution has no closed-form density; only its characteristic function can be expressed analytically. As we show in Appendix F, the random displacement $\Delta X(\tau)=X(\tau)-x_0$ can be expressed as

$$\Delta X(\tau)=X(\tau)-x_0=\gamma(\tau,x_0)\,Z.\tag{23}$$

Here $Z$ is sampled from the stable distribution $p(z)$ whose characteristic function $\langle e^{ikZ}\rangle\equiv\int dz\,e^{ikz}p(z)$ is given by

$$\langle e^{ikZ}\rangle=\exp\Bigl[-|k|^\alpha\Bigl(1-i\beta(x_0)\tan\frac{\pi\alpha}{2}\,\mathrm{sign}(k)\Bigr)\Bigr],\tag{24}$$

and the scale parameter γ(τ,x) and the skewness parameter β(x) are respectively given by

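$$\gamma(\tau,x)\equiv\tau^{1/\alpha}\left(\frac{\pi\bigl(x(1-x)^\alpha+(1-x)x^\alpha\bigr)}{2\Gamma(\alpha+1)\sin\frac{\pi\alpha}{2}}\right)^{1/\alpha},\tag{25}$$

$$\beta(x)\equiv\frac{x(1-x)^\alpha-x^\alpha(1-x)}{x^\alpha(1-x)+x(1-x)^\alpha}.\tag{26}$$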

Note that the statistical properties of $Z$ are independent of $\tau$, and $\Delta X(\tau)$ depends on $\tau$ only via the scale parameter $\gamma(\tau,x_0)$. As shown in Figure 9A, for small times, the transition density $P(x,\tau|x_0,\tau=0)$ computed from the stable distribution agrees precisely with numerical simulation results in the discrete-time model. Our result can be regarded as a counterpart of the Gaussian approximation often employed for the Wright–Fisher diffusion (see Tataru et al. 2017 and the references therein).
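Equations 25 and 26 are simple to evaluate. The minimal sketch below uses function names of our own. It computes the scale and skewness parameters and illustrates three properties used in the text: β vanishes at x = 1/2, β approaches +1 for a rare allele, and γ grows like τ^{1/α}.

```python
import math


def stable_scale(tau, x, alpha):
    """Scale parameter gamma(tau, x) of Equation 25."""
    weight = x * (1 - x) ** alpha + (1 - x) * x ** alpha
    denom = 2 * math.gamma(alpha + 1) * math.sin(math.pi * alpha / 2)
    return tau ** (1 / alpha) * (math.pi * weight / denom) ** (1 / alpha)


def stable_skewness(x, alpha):
    """Skewness parameter beta(x) of Equation 26."""
    up = x * (1 - x) ** alpha    # weight of rightward jumps
    down = x ** alpha * (1 - x)  # weight of leftward jumps
    return (up - down) / (up + down)


alpha = 1.5
print(stable_skewness(0.5, alpha))   # 0.0: symmetric displacements at x = 1/2
print(stable_skewness(1e-6, alpha))  # about 0.998: strongly right-skewed for a rare allele
print(stable_scale(8.0, 0.1, alpha) / stable_scale(1.0, 0.1, alpha))  # about 4 = 8**(1/alpha)
```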

Figure 9.

(A) The allele frequency distribution $p(x,t|x_0=0.005)$ at generations $t=5, 10, 35$, for $\alpha=1.5$. The solid lines denote the short-time transition densities given by Equations 23 and 24, and the open markers denote those computed from 10,000 allele frequency trajectories in the discrete-time model. (B) The initial dynamics of the median of the allele frequency (black). The red and blue lines denote the short-time solution in Equation 27 and the long-time solution in Equation 37, where constants of integration and the prefactor of Equation 38 are determined by fitting to the discrete-time model (black line) between $40<t<800$. (C) The overall trajectory to extinction. The color scheme is the same as that in (B). In (A–C), $\alpha=1.5$, $N=10^7$, and $x_0=0.005$ are used.

Now, we study the mean and median of the allele frequency using the short-time expression. The mean does not change in time since $\langle\Delta X(\tau)\rangle=\gamma(\tau,x_0)\langle Z\rangle=0$, which is consistent with neutrality. On the other hand, the median changes as

$$\Delta X_{\mathrm{med}}(\tau)=\gamma(\tau,x_0)\,Z_{\mathrm{med}}(x_0),\tag{27}$$

where $Z_{\mathrm{med}}(x_0)$ denotes the median of $Z$. $Z_{\mathrm{med}}(x_0)$ depends on $x_0$ via $\beta(x_0)$ (see Equation 24), and $Z_{\mathrm{med}}(x_0)\le0$ for $x_0\le1/2$. Equation 27 agrees with numerical simulations in the discrete-time model as long as $X(\tau)$ remains close to the initial frequency $x_0$ (see the red and black curves in Figure 9B).
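Because $p(z)$ has no closed form, $Z_{\mathrm{med}}(x_0)$ is easiest to estimate by sampling. The sketch below uses the Chambers–Mallows–Stuck method. For α ≠ 1 it draws from a stable law whose characteristic function has exactly the form of Equation 24. The sample size, the seed, and the function names are arbitrary illustrative choices. For a rare allele (β close to 1) the estimated median should come out negative, it should be positive for a common allele, and it should be close to zero at $x_0=1/2$.

```python
import math
import random
import statistics


def skewness(x0, alpha):
    """beta(x0) of Equation 26."""
    up = x0 * (1 - x0) ** alpha
    down = x0 ** alpha * (1 - x0)
    return (up - down) / (up + down)


def sample_stable(alpha, beta, rng):
    """One draw with <exp(ikZ)> = exp[-|k|^a (1 - i b tan(pi a/2) sign k)], alpha != 1."""
    t = beta * math.tan(math.pi * alpha / 2)
    b = math.atan(t) / alpha
    s = (1 + t * t) ** (1 / (2 * alpha))
    v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                     # unit-mean exponential
    return (s * math.sin(alpha * (v + b)) / math.cos(v) ** (1 / alpha)
            * (math.cos(v - alpha * (v + b)) / w) ** ((1 - alpha) / alpha))


def median_z(x0, alpha, n=20000, seed=1):
    """Monte Carlo estimate of Z_med(x0) entering Equation 27."""
    rng = random.Random(seed)
    beta = skewness(x0, alpha)
    return statistics.median(sample_stable(alpha, beta, rng) for _ in range(n))
```

With a fixed seed the estimates are reproducible. The statistical error of the sample median shrinks roughly like $1/\sqrt{n}$.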

The scaling property $\Delta X(\tau)\sim\tau^{1/\alpha}$ in Equation 2 immediately follows from Equation 27, since $\gamma\propto\tau^{1/\alpha}$. This scaling implies that there is a time-dependent bias driving the median of the allele frequency. Differentiating Equation 27 with respect to time gives

$$\frac{d}{d\tau}X_{\mathrm{med}}(\tau)=V_{\mathrm{eff}}(\tau),\tag{28}$$

where the effective time-dependent bias $V_{\mathrm{eff}}(\tau)$ is given by

$$V_{\mathrm{eff}}(\tau)\equiv\frac{\partial\gamma(\tau,x_0)}{\partial\tau}\,Z_{\mathrm{med}}(x_0).\tag{29}$$

Near the boundaries $x=0$ and $x=1$, $V_{\mathrm{eff}}(\tau)$ is approximately given by

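$$V_{\mathrm{eff}}(\tau)\approx\begin{cases}-k\,x_0^{1/\alpha}\,\tau^{1/\alpha-1} & (x\ll1),\\ +k\,(1-x_0)^{1/\alpha}\,\tau^{1/\alpha-1} & (1-x\ll1),\end{cases}\tag{30}$$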

where $k\equiv\dfrac{|Z_{\mathrm{med}}(x_0=0)|}{\alpha}\left(\dfrac{\pi}{2\Gamma(\alpha+1)\sin\frac{\pi\alpha}{2}}\right)^{1/\alpha}$ is a positive constant.
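Because $\gamma\propto\tau^{1/\alpha}$, Equation 29 reduces to $V_{\mathrm{eff}}=Z_{\mathrm{med}}\,\gamma/(\alpha\tau)$. The sketch below compares this with the first line of Equation 30 for a rare allele. The argument `z_med` is a placeholder for a value of $Z_{\mathrm{med}}$, for instance a Monte Carlo estimate, and the same number is used in both expressions. The comparison therefore only checks that the boundary form of $\gamma$ becomes exact as $x_0\to0$ and that both expressions scale as $\tau^{1/\alpha-1}$.

```python
import math


def scale_param(tau, x, alpha):
    """gamma(tau, x) of Equation 25."""
    weight = x * (1 - x) ** alpha + (1 - x) * x ** alpha
    denom = 2 * math.gamma(alpha + 1) * math.sin(math.pi * alpha / 2)
    return (tau * math.pi * weight / denom) ** (1 / alpha)


def v_eff_exact(tau, x0, alpha, z_med):
    """Equation 29: d(gamma)/d(tau) * Z_med, using gamma proportional to tau**(1/alpha)."""
    return scale_param(tau, x0, alpha) / (alpha * tau) * z_med


def v_eff_boundary(tau, x0, alpha, z_med):
    """First line of Equation 30 (x0 << 1), with k built from |z_med|."""
    k = abs(z_med) / alpha * (
        math.pi / (2 * math.gamma(alpha + 1) * math.sin(math.pi * alpha / 2))
    ) ** (1 / alpha)
    return -k * x0 ** (1 / alpha) * tau ** (1 / alpha - 1)
```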

The advection term arises from a sampling bias

Intuitively, the time-dependent bias $V_{\mathrm{eff}}(\tau)$ arises from a time-dependence of the largest sampled offspring number (Figure 10). To see this, consider a typical trajectory of the allele frequency starting from $x$. Up to a short time $\tau$, only jumps from $x$ to $y\in[y_-(\tau),y_+(\tau)]$ are likely to occur, where $y_-(\tau)$ and $y_+(\tau)$ can be estimated by the extremal criterion (Krapivsky et al. 2010),

$$\tau\times\int_0^{y_-(\tau)}w(y|x)\,dy\approx1,\qquad\tau\times\int_{y_+(\tau)}^1w(y|x)\,dy\approx1.\tag{31}$$

These conditions give

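$$y_-(\tau)\approx\frac{x}{1+\bigl(\tau(1-x)/\alpha\bigr)^{1/\alpha}},\qquad y_+(\tau)\approx\frac{x+(\tau x/\alpha)^{1/\alpha}}{1+(\tau x/\alpha)^{1/\alpha}}.\tag{32}$$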

Because these small jumps cancel part of the bias $V(x)$ in Equation 22, the typical trajectory is then driven by the uncanceled residual part of the bias $V(x)$,

$$V_{\mathrm{eff}}\approx-\int_{y\in[0,\,y_-(\tau)]\cup[y_+(\tau),\,1]}(y-x)\,w(y|x)\,dy.\tag{33}$$

When $x\ll1$, the dominant contribution to this integral is from $y\approx y_+(\tau)$. Using $w(y|x)\approx x/(y-x)^{\alpha+1}$ from the first line of Equation 21 and $y_+-x\sim(\tau x)^{1/\alpha}$ from Equation 32, the above integral can be evaluated as $V_{\mathrm{eff}}\approx-x(y_+-x)^{1-\alpha}\sim-x^{1/\alpha}\tau^{(1-\alpha)/\alpha}$, which agrees with $V_{\mathrm{eff}}$ in Equation 30 for $x\ll1$ (up to the factor $k$). When $1-x\ll1$, the dominant contribution to $V_{\mathrm{eff}}$ is from $y\approx y_-(\tau)$ and can be evaluated in a similar way, reproducing $V_{\mathrm{eff}}$ in Equation 30 for $1-x\ll1$.
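Both steps of this argument can be checked numerically. The sketch below uses log-spaced midpoint quadrature, with parameter values and function names chosen for illustration. First, it confirms that $y_\pm$ from Equation 32 satisfy the extremal criterion of Equation 31, so that τ times each tail rate equals one. Second, it evaluates the residual bias of Equation 33 for a rare allele. Its log–log slope in τ should be close to $1/\alpha-1$, as in Equation 30.

```python
import math


def log_midpoint(f, lo, hi, n=20000):
    """Integrate f(u) over [lo, hi] (0 < lo < hi) with u = exp(s) and a midpoint rule in s."""
    a, b = math.log(lo), math.log(hi)
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        u = math.exp(a + (i + 0.5) * h)
        total += f(u) * u  # du = u ds
    return total * h


def window(tau, x, alpha):
    """Distances y_+ - x and x - y_- of the typical-jump window, from Equation 32."""
    a = (tau * x / alpha) ** (1 / alpha)
    b = (tau * (1 - x) / alpha) ** (1 / alpha)
    return a * (1 - x) / (1 + a), x * b / (1 + b)


def tail_moments(tau, x, alpha, power):
    """Integrals of |y - x|**power * w(y|x) over the tails y > y_+ and y < y_-."""
    du, dv = window(tau, x, alpha)
    c = x * (1 - x)
    # rightward jumps, u = y - x: w = c (1 - x - u)^(alpha-1) / u^(alpha+1)
    right = log_midpoint(
        lambda u: u**power * c * (1 - x - u) ** (alpha - 1) / u ** (alpha + 1), du, 1 - x)
    # leftward jumps, v = x - y: w = c (x - v)^(alpha-1) / v^(alpha+1)
    left = log_midpoint(
        lambda v: v**power * c * (x - v) ** (alpha - 1) / v ** (alpha + 1), dv, x)
    return right, left


def residual_bias(tau, x, alpha):
    """Equation 33: minus the net displacement rate carried by jumps outside [y_-, y_+]."""
    right, left = tail_moments(tau, x, alpha, power=1)
    return -(right - left)
```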

Figure 10.

Schematic explanation of the effective time-dependent bias $V_{\mathrm{eff}}(\tau)$. The black curve shows the jump rate $w(y|x)$ in Equation 21 when $x\ll1$. In a time $\tau$, small jumps within the region $[y_-(\tau),y_+(\tau)]$ are likely to occur, offsetting part of the original bias $V(x)$. $V_{\mathrm{eff}}(\tau)$ is the residual part of the bias.

One interpretation of Equation 33 is that the bias V(x) in Equation 22 is mitigated by small jumps in a short time, and therefore, the integration over small jumps is excluded in Equation 33. Another interpretation is that, for typical short-time dynamics, small jumps and the bias V(x) are relevant, and, from the overall neutrality, the change caused by these two is equal to the negative of that caused by large jumps, thus resulting in Equation 33.

Allele frequency fluctuations are inconsistent with the Wright–Fisher diffusion

In the simulations, we found that, for $1\le\alpha<2$, allele frequency fluctuations are inconsistent with the Wright–Fisher diffusion and are characterized by super-diffusion with diffusion exponent $2/\alpha$ (see Equation 5). This finding is readily explained by the short-time asymptotics in Equation 23. Recalling that $\gamma\propto\tau^{1/\alpha}$ (Equation 25) and that the statistical properties of $Z$ are independent of $\tau$, we obtain

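$$\text{Median SD}=\gamma(\tau,x_0)^2\,\mathrm{M}[Z^2]\propto\tau^{2/\alpha},\tag{34}$$

where $\mathrm{M}[\cdot]$ denotes the median.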

This scaling can also be justified heuristically by noting that, for $1<\alpha<2$, the square displacement is dominated by large jumps. During time $\tau$, an allele frequency $X(\tau)$ around $x$ typically jumps to $y_\pm$ given in Equation 32. When $\tau\ll1$, it is easy to see that $|y_\pm-x|\propto\tau^{1/\alpha}$ with $x$-dependent prefactors. Because the median SD is dominated by the largest displacements, it can be evaluated as

$$\text{Median SD}\sim\bigl(y_\pm(\tau)-x\bigr)^2\propto\tau^{2/\alpha},\tag{35}$$

where $\tau=t/T_c\ll1$ is assumed.

Long-time dynamics and extinction time

Above, we saw that at short times, allele frequencies carry out an unconstrained Lévy flight. This random search process, however, gets distorted as soon as the allele frequency comes within reach of one of the absorbing boundaries. Interestingly, the dynamics then enters a universal intermediate asymptotic regime that controls both the characteristic extinction time and the establishment times and fixation probabilities.

To see this, let us consider the extinction dynamics of a trajectory starting from a small frequency $x_0\ll1$ (Figure 4). At short times, we can apply the short-time asymptotics in Equations 28 and 30. We expect Equations 28 and 30 to break down when the displacement $\Delta X_{\mathrm{med}}(\tau)$ computed from Equation 28 becomes comparable to $x_0$, which occurs at $\tau\sim\tau_{\mathrm{short}}\equiv x_0^{\alpha-1}$. Taking a coarse-grained view, the rate of the frequency change over $\tau_{\mathrm{short}}$ is roughly given by

$$\frac{\Delta X_{\mathrm{med}}}{\tau_{\mathrm{short}}}\sim-x_0^{2-\alpha}.\tag{36}$$

This suggests that, on long timescales ($\tau\gg\tau_{\mathrm{short}}$), the median frequency decreases as

$$\frac{d}{d\tau}X_{\mathrm{med}}(\tau)=\tilde V_{\mathrm{eff}}\bigl(X_{\mathrm{med}}(\tau)\bigr)\quad(\text{for }X\ll1),\tag{37}$$

where, up to a prefactor, the frequency-dependent bias $\tilde V_{\mathrm{eff}}(X)$ is given by

$$\tilde V_{\mathrm{eff}}(X)\sim-X^{2-\alpha}.\tag{38}$$

Figure 9C shows numerically that the long-time trajectory $X_{\mathrm{med}}(\tau>\tau_{\mathrm{short}})$ is consistent with Equation 37. Solving Equation 37 shows that the median trajectory goes extinct at $\tau_{\mathrm{ext}}\sim x_0^{\alpha-1}$ (Equation 3), in agreement with our simulations (Figure 4). Note that, for $1-X\ll1$, the bias in Equation 38 is replaced by $\tilde V_{\mathrm{eff}}(X)\sim(1-X)^{2-\alpha}$.
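If the prefactor in Equation 38 is set to one (an assumption, since the text fixes it only up to a constant), Equation 37 can be integrated in closed form. The quantity $X_{\mathrm{med}}^{\alpha-1}$ then decreases linearly, so $X_{\mathrm{med}}(\tau)=[x_0^{\alpha-1}-(\alpha-1)\tau]^{1/(\alpha-1)}$ and the median reaches zero at $\tau_{\mathrm{ext}}=x_0^{\alpha-1}/(\alpha-1)$. The sketch below, with names of our own, checks this against forward-Euler integration and confirms the $x_0^{\alpha-1}$ scaling of Equation 3.

```python
def median_closed_form(tau, x0, alpha):
    """Solution of dX/dtau = -X**(2 - alpha) with X(0) = x0 (prefactor set to 1)."""
    y = x0 ** (alpha - 1) - (alpha - 1) * tau  # y = X**(alpha-1) decreases linearly
    return y ** (1 / (alpha - 1)) if y > 0 else 0.0


def extinction_time(x0, alpha):
    """Time at which the closed-form median reaches zero."""
    return x0 ** (alpha - 1) / (alpha - 1)


def euler_extinction_time(x0, alpha, dtau=1e-5):
    """Forward-Euler integration of the same ODE until X <= 0."""
    x, tau = x0, 0.0
    while x > 0:
        x -= x ** (2 - alpha) * dtau
        tau += dtau
    return tau
```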

Importantly, Equations 37 and 38 can also be rigorously justified from a scaling ansatz for the transition density. After some time, $P(x,\tau|x_0)$ spreads broadly over the region $x\ll1$ with a peak at $x=0$ (Figure 11A). As shown in Figure 11B, $P(x,\tau)$ is consistent with the following scaling ansatz:

$$P(x,\tau)\approx\tau^{-2\eta}\,g(\xi)\quad(\text{for }x\ll1),\tag{39}$$

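where $\eta\equiv1/(\alpha-1)$ and $g(\xi)$ is a function of $\xi\equiv x/\tau^{\eta}$. Up to an overall constant, $g(\xi)$ can be determined analytically and expressed as an infinite series (see Appendix F). Note that the $\tau$-dependent factor in Equation 39 is motivated by the fact that the extent over which the distribution spreads increases like $\tau^\eta$. Equation 39 implies that, conditional on establishment at $\tau$, the median frequency increases as $X_{\mathrm{med}}(\tau)|_{\mathrm{establish}}\sim\tau^{\eta}$. Then, Equation 38 follows by evaluating the bias in Equation 28 at $\tau\sim(X_{\mathrm{med}})^{1/\eta}$ and at $X_{\mathrm{med}}$, instead of at $x_0$: substituting $\tau\sim X^{\alpha-1}$ into $x^{1/\alpha}\tau^{1/\alpha-1}$ with $x\to X$ gives $X^{1/\alpha-(\alpha-1)^2/\alpha}=X^{2-\alpha}$.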

Figure 11.

(A) Log plot of $P(x,\tau|x_0=0.01)$ at generations $t=700, 1100, 1500, 1900$ computed from the discrete-time model. $N=10^7$ and $\alpha=1.5$. (B) Log–log plot of $\tau^{2\eta}P(x,\tau|x_0=0.01)$ versus $\xi=x/\tau^{\eta}$, where $\eta=(\alpha-1)^{-1}$, at $t=700, 1100, 1500, 1900$ (solid curves). The dashed curve represents the analytic result for $g(\xi)$ (see Appendix F). The curves $\tau^{2\eta}P(x,\tau|x_0=0.01)$ at the different time points collapse onto $g(\xi)$, supporting the scaling ansatz in Equation 39.

As a consistency check of the exponent $\alpha-1$ in Equation 3, we consider two solvable, extreme cases. First, in the limit $\alpha\to2$, the dependence on $x_0$ in Equation 3 becomes linear. In the Wright–Fisher model, the mean extinction time can be obtained analytically by solving the backward equation $-1=x(1-x)\,\partial^2\tau_{\mathrm{ext}}(x)/\partial x^2$ (see, e.g., Karlin and Taylor 1981). The solution is proportional to $x_0$ with a logarithmic correction, $\tau_{\mathrm{ext}}\sim-x_0\log x_0$. Second, when $\alpha\to1$, the mean extinction time no longer depends on $x_0$. We can obtain this explicitly by solving Equation 17: using $V(X)\approx X\log X$ when $X\ll1$, the solution is given by $\log\bigl(\log X(\tau)/\log x_0\bigr)=\tau$. Therefore, if we approximately define the mean extinction time $\tau_{\mathrm{ext}}$ by $X(\tau_{\mathrm{ext}})=1/N$, we obtain $\tau_{\mathrm{ext}}\approx\log\bigl(\log N/(-\log x_0)\bigr)\approx\log\log N$, which is to leading order independent of $x_0$ if $x_0$ is taken to be of order one.
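Both limiting cases can be checked in a few lines. The sketch below does two things. (i) It verifies by central finite differences that $\tau_{\mathrm{ext}}(x)=-x\log x-(1-x)\log(1-x)$, the solution with $\tau_{\mathrm{ext}}(0)=\tau_{\mathrm{ext}}(1)=0$, satisfies the backward equation quoted above and behaves like $-x_0\log x_0$ for small $x_0$. (ii) It integrates $dX/d\tau=X\log X$ for α → 1 until $X=1/N$ and compares the elapsed time with $\log(\log N/(-\log x_0))$. Step sizes, $N$, and function names are illustrative choices.

```python
import math


def wf_absorption_time(x):
    """Solution of -1 = x(1-x) T''(x) with T(0) = T(1) = 0."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


def backward_residual(x, h=1e-5):
    """x(1-x) T''(x) + 1 with T'' from a central finite difference (should be close to 0)."""
    t2 = (wf_absorption_time(x + h) - 2 * wf_absorption_time(x) + wf_absorption_time(x - h)) / h**2
    return x * (1 - x) * t2 + 1


def time_to_reach(x0, n_pop, dtau=1e-4):
    """Integrate dX/dtau = X log X (alpha -> 1, X << 1) until X <= 1/N."""
    # In terms of L = log X the ODE reads dL/dtau = L, i.e. |L| grows exponentially.
    log_x, tau = math.log(x0), 0.0
    while log_x > -math.log(n_pop):
        log_x += log_x * dtau
        tau += dtau
    return tau
```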

Natural selection and fixation probability

One important advantage of the forward-time perspective is that we can account for natural selection by introducing an appropriate bias favoring the beneficial variant. Suppose that the mutant type has a selective advantage $s>0$, such that the average offspring number of mutants is increased by a factor of $1+s$ relative to the wild type. In the time-rescaled Chapman–Kolmogorov equation, this adds the term $\sigma x(1-x)$, where $\sigma=T_cs$, to the advection $V(x)$ of Equation 20.

The key observation underlying the argument below is that when $X$ is sufficiently small, the selection force $\sigma x(1-x)\approx\sigma X$ is negligible compared with the bias $|\tilde V_{\mathrm{eff}}(X)|\sim X^{2-\alpha}$ in Equation 38 because, while the former is approximately linear in $X$, the latter is sublinear. If the frequency happens to grow and reach a certain value $X_c$, genuine selection begins to dominate over the bias, and the trajectory fixes with high probability (see Figure 12 for example trajectories and Figure 13A). Using Equation 38, the crossover point $X_c$ can be estimated from the balance between the selection force and the sampling bias $\tilde V_{\mathrm{eff}}(X)$,

$$\sigma X=|\tilde V_{\mathrm{eff}}(X)|\sim X^{2-\alpha},\tag{40}$$

which gives

$$X_c\sim\sigma^{-1/(\alpha-1)}=\frac{1}{N}\,s^{-1/(\alpha-1)}.\tag{41}$$

Figure 12.

Example trajectories of the frequency of the beneficial allele, starting from $x_0=0.05$, with $\alpha=1.5$, $s=0.03$, and $N=5000$. Fixed trajectories are colored blue and extinct ones gray. Here, the crossover point can be estimated as $X_c\approx0.2$, assuming that the proportionality constant in Equation 41 is one. Once a trajectory reaches the crossover point, it becomes fixed with high probability.

For $X\ll X_c$, the dynamics are essentially neutral (described by Equation 20), while, for $X>X_c$, the trajectory grows almost deterministically. Therefore, the fixation probability $P_{\mathrm{fix}}$ of a beneficial mutation can be estimated by using the neutral fixation probability in a population of size $NX_c$. Although the full dynamics in Equation 20 is difficult to analyze, the neutral fixation probability of a single individual is simply the inverse of the population size. Therefore, we have

$$P_{\mathrm{fix}}\sim\frac{1}{NX_c}\sim s^{1/(\alpha-1)},\tag{42}$$

which is valid for $1/N\ll s^{1/(\alpha-1)}$. Equation 42 reproduces our simulation results in Figure 5 for $1<\alpha<2$ and, as $\alpha\to2$, also reproduces the known result of the Wright–Fisher model, $P_{\mathrm{fix}}^{\mathrm{WF}}\approx2s$ (up to a prefactor).
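Plugging numbers into Equations 41 and 42 shows the orders of magnitude involved. The sketch below sets all proportionality constants to one and identifies σ with $N^{\alpha-1}s$, i.e., it takes $T_c=N^{\alpha-1}$. Both are simplifying assumptions, and the function names are ours. With the parameters of Figure 12 it gives $X_c\approx0.22$, consistent with the value $X_c\approx0.2$ quoted in the caption.

```python
def crossover_frequency(n_pop, s, alpha):
    """X_c ~ sigma**(-1/(alpha-1)) with sigma = N**(alpha-1) * s (Equation 41)."""
    sigma = n_pop ** (alpha - 1) * s
    return sigma ** (-1 / (alpha - 1))


def fixation_probability(n_pop, s, alpha):
    """P_fix ~ 1/(N X_c) ~ s**(1/(alpha-1)) (Equation 42); needs 1/N << s**(1/(alpha-1))."""
    return 1 / (n_pop * crossover_frequency(n_pop, s, alpha))


print(round(crossover_frequency(5000, 0.03, 1.5), 3))  # 0.222 for the Figure 12 parameters
print(fixation_probability(5000, 0.03, 1.5))            # about 9e-4, i.e. s**2 for alpha = 1.5
```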

Site frequency spectrum

By using the time-dependent effective bias, we can also estimate the behavior of the SFS $f_{\mathrm{SFS}}(x)$ for frequent and rare variants. While the SFS is theoretically defined in the infinite alleles model, it can be computed from our biallelic framework (Ewens 1963): $f_{\mathrm{SFS}}(x)\Delta x$ is defined as the expected number of neutral derived alleles in the frequency interval $[x-\Delta x/2,\,x+\Delta x/2]$ in a sampled population (here, the whole population). Because new mutations are assumed to arise uniformly in time, the SFS for unlinked neutral loci is given by the product of the total mutation rate $\mu N$ and the mean sojourn time, namely, the average time an allele spends in the frequency interval $[x-\Delta x/2,\,x+\Delta x/2]$ until fixation or extinction.

First, we consider the low-frequency end, $x\ll1$, of the SFS (see Cvijović et al. 2018 for a similar argument). Since the SFS is proportional to the sojourn time, trajectories whose maximum frequencies are $x$ or slightly larger than $x$ contribute dominantly to the SFS $f_{\mathrm{SFS}}(x)$ at $x$. These trajectories typically go extinct due to the bias, and we can roughly estimate their sojourn times at $x$ as the inverse of the "velocity" $|\tilde V_{\mathrm{eff}}(x)|\sim x^{2-\alpha}$ in Equation 38. Since the probability that a trajectory grows above a frequency $x$ is roughly given by $1/(Nx)$, the SFS is proportional to

$$\frac{1}{Nx}\,\frac{1}{|\tilde V_{\mathrm{eff}}(x)|}\sim\frac{1}{x^{3-\alpha}}\quad(\text{for }x\ll1).\tag{43}$$

Similarly, for the high-frequency end of the SFS, only the trajectories that grow above $x$ can contribute to $f_{\mathrm{SFS}}(x)$. Typically, these trajectories go to fixation due to the bias $\tilde V_{\mathrm{eff}}(x)\sim(1-x)^{2-\alpha}$. Therefore, the SFS is proportional to

$$\frac{1}{Nx}\,\frac{1}{\tilde V_{\mathrm{eff}}(x)}\sim\frac{1}{(1-x)^{2-\alpha}}\quad(\text{for }1-x\ll1).\tag{44}$$

The effect of the genuine selection on the SFS can also be studied by using the effective bias. See Appendix G.

Bimodality of stationary distribution

Now, we turn to explaining the bimodality observed at mutation-drift balance. We found that, when the mutation rates are small, the stationary allele frequency distribution is not U-shaped, as expected from the Wright–Fisher dynamics, but M-shaped, as shown in Figure 8. The M-shaped distribution arises from the balance between the mutational force and the effective bias (see Figure 13B). In the Chapman–Kolmogorov equation, the mutational force is given by

$$-\theta x+\theta(1-x)\approx\begin{cases}+\theta & (x\ll1),\\ -\theta & (1-x\ll1),\end{cases}\tag{45}$$

which pushes the frequency toward the center $x=1/2$. On the other hand, the effective bias, $\tilde V_{\mathrm{eff}}(x)\sim-x^{2-\alpha}$ for $x\ll1$ and $\tilde V_{\mathrm{eff}}(x)\sim(1-x)^{2-\alpha}$ for $1-x\ll1$, pushes a trajectory toward the closer boundary. Therefore, the positions where these two forces balance are approximately given by

$$x_{\mathrm{peak}}\approx c\,\theta^{1/(2-\alpha)},\quad1-c\,\theta^{1/(2-\alpha)},\tag{46}$$

where c is a positive constant. If θ is sufficiently small, we can always find the balancing points. The presence of these two balancing points means that we can think of the allele frequency dynamics as a two-state system, essentially analogous to a super-diffusing particle in a double-well potential (see Figure 8C for a realization of trajectories). This explains the bimodal shape of the stationary distribution.
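Equation 46 can be illustrated with a toy drift. Purely for illustration, we assume the interpolation $\tilde V_{\mathrm{eff}}(x)=(2x-1)[x(1-x)]^{2-\alpha}$. This form is not from the text; it is chosen because it reproduces $-x^{2-\alpha}$ near 0 and $+(1-x)^{2-\alpha}$ near 1 and vanishes at 1/2, with prefactors set to one. The total drift then factorizes as $(1-2x)(\theta-[x(1-x)]^{2-\alpha})$. Its off-centre zeros satisfy $x(1-x)=\theta^{1/(2-\alpha)}$, which gives $c=1$ in this toy model. The sketch finds these zeros and checks that the outer two are stable and the central one unstable, which is the double-well picture described above.

```python
import math


def toy_drift(x, theta, alpha):
    """Mutation term theta*(1-2x) plus a hypothetical interpolated effective bias."""
    v_eff = (2 * x - 1) * (x * (1 - x)) ** (2 - alpha)  # illustrative interpolation only
    return theta * (1 - 2 * x) + v_eff


def balance_points(theta, alpha):
    """Zeros of toy_drift other than x = 1/2, from x(1-x) = theta**(1/(2-alpha))."""
    q = theta ** (1 / (2 - alpha))
    if q >= 0.25:
        return []  # no off-centre balance point: unimodal in this toy model
    r = math.sqrt(1 - 4 * q)
    return [(1 - r) / 2, (1 + r) / 2]
```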

Finally, we remark that, even in the presence of natural selection, the balancing positions are still determined by the balance between mutation and the effective bias provided that $\theta\ll1$: while the effective bias and the mutational term are sublinear and constant, respectively, the selection term $\sigma x(1-x)$ is linear in $x$ when $x\ll1$. Thus, when $\theta$ is sufficiently small, the magnitude of the selection term around $x=c\theta^{1/(2-\alpha)}$ and $x=1-c\theta^{1/(2-\alpha)}$ is negligible, and the peak positions are given by Equation 46.

Discussion

In this study, we analyzed the effect of power law offspring distributions on the competition of two mutually exclusive alleles. Our main reason to consider such broad offspring distributions is that they often emerge in evolutionary scenarios that inflate the reproductive value (Barton and Etheridge 2011) of a small set of founders. For example, range expansions blow up the descendant numbers of the most advanced individuals in the front of the population, an effect that has been called gene surfing (Hallatschek and Nelson 2008). Likewise, continual rampant adaptation boosts the descendant numbers of the most fit individuals. The resulting allele frequency dynamics becomes asymptotically similar to that of a population with scale-free offspring distributions.

In the case of narrow offspring distributions, which is the predominant assumption in population genetics, it is usually an excellent approximation to describe the allele frequency dynamics by a biased diffusion process, which forms the basis of powerful inference frameworks (Tataru et al. 2017). If the offspring distribution is broad, however, allele frequency trajectories are disrupted by discontinuous jumps, resulting from so-called jackpot events: exceptionally large family sizes drawn by chance from the offspring distribution. Our goal was to find an analytical and intuitive framework within which we can understand the main features of these unusual dynamics.

We found that the main counter-intuitive features can be understood and well-approximated from a competition of selection and mutations with a time-dependent emergent sampling bias, $V_{\mathrm{eff}}(\tau)$. The sampling bias favors the major allele and arises because the sub-population carrying the major allele typically samples deeper into the tail of the offspring distribution than the minor allele fraction.

In the remainder, we first summarize the unusual population genetic patterns that can be explained by the action of these effective forces. We then discuss how broad offspring dynamics could be detected in natural populations and what its implications are for the dynamics of adaptation. Finally, we demonstrate that these dynamics are also ubiquitous in populations with narrow offspring distributions, when mutational jackpots are possible. Therefore, we believe our theoretical framework may be taken as a general null model for populations far from equilibrium.

Unusual dynamics

We found that the sampling bias effectively acts like time- and frequency-dependent selection. In the absence of true selection, $V_{\mathrm{eff}}(x,\tau)$ drives the major allele to fixation, first rapidly and then gradually slowing down with time and proximity to fixation. The slowing down of the sampling bias near fixation also leads to an excess of high-frequency alleles, given a continual influx of neutral mutations. This generates a high-frequency uptick in the SFS, which is characteristic of the tail of the offspring distribution. In mutation-drift balance, the allele frequency distribution is M-shaped, in contrast to the U-shape expected from the Wright–Fisher dynamics. The peaks reflect the balance of the mutational and sampling bias.

Non-neutral dynamics depend on whether the genuine selection force dominates over the sampling bias. The sampling bias tends to dominate near extinction or fixation, and wanes near 50% frequency. A de novo beneficial allele will not be able to fix unless it reaches, by chance, the switch-point frequency at which genuine selection becomes stronger than the sampling bias. Finally, fluctuations in typical trajectories grow stronger over time. As a consequence, allele frequencies super-diffuse: fluctuations grow with time more rapidly than under the Wright–Fisher diffusion.

Detecting dynamics driven by broad offspring distributions

The time-dependent over-dispersion is most readily detected by plotting the median square displacement as a function of time (see Figure 4B). Testing deviations in this statistic is an attractive avenue for detecting deviations from the Wright–Fisher diffusion because the signal is strong for intermediate allele frequencies, which can be accurately measured by population sequencing. By contrast, the time-dependent bias vanishes when an allele has 50% frequency. So, the detection of the sampling bias requires accurate time series data of low frequency variants, which is difficult to obtain given sequencing errors.
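One way to carry out this test is to fit the log–log slope of the median square displacement against time. A diffusive process gives a slope near 1 at short times, whereas Equation 34 predicts $2/\alpha$. The sketch below is a plain least-squares fit. It assumes the input is a list of frequency time series sampled at unit time steps. The Gaussian random walks are a hypothetical diffusive reference for comparison, not a model of any data set.

```python
import math
import random
import statistics


def median_sd_exponent(trajectories, lags):
    """Least-squares slope of log(median over trajectories of (X(t) - X(0))^2) vs log t."""
    xs, ys = [], []
    for t in lags:
        msd = statistics.median((traj[t] - traj[0]) ** 2 for traj in trajectories)
        xs.append(math.log(t))
        ys.append(math.log(msd))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def gaussian_walks(n_traj, n_steps, step_sd, x0=0.5, seed=0):
    """Diffusive reference trajectories with Gaussian increments (for comparison only)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(n_traj):
        x, path = x0, [x0]
        for _ in range(n_steps):
            x += rng.gauss(0.0, step_sd)
            path.append(x)
        walks.append(path)
    return walks
```

Applied to the diffusive reference, the fitted exponent comes out close to 1. A clearly larger value, close to $2/\alpha$, would point to the super-diffusion discussed above.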

It is clear that a single super-diffusing but neutral allele would not abide by the diffusive Wright–Fisher null model and thus might be falsely considered as an allele under selection. But importantly, allele super-diffusion has an impact even on statistics that sum over many unlinked loci. This is significant for inference methods, for instance to detect polygenic selection, which argue that trait values follow a diffusion process, if not for an underlying Wright–Fisher dynamics of the allele frequencies then because they sum over many independent allele frequencies (Berg and Coop 2014). However, α<2 dynamics breaks both of these arguments. In particular, sums of many unlinked loci tend to non-Gaussian distributions (so-called alpha-stable distributions). Hence, for traditional inference methods based on the Wright–Fisher diffusion or standard central limit theorem (Tataru et al. 2017), an underlying super-diffusion process should be ruled out.

If time series are not available, broad offspring numbers can also be detected from the SFS (Neher and Hallatschek 2013). A telltale sign of the sampling bias is a characteristic uptick at the high-frequency tail of the SFS, which is difficult to generate by demographic variation (Neher and Hallatschek 2013). As we have shown, the shape of the uptick is characteristic of the tail of the offspring distribution (the parameter α).

Implications for the dynamics of adaptation

We found that the fixation probabilities depend quite sensitively on the broadness α of the offspring distribution (Equation 42). Accordingly, the dynamics of adaptation, which ultimately depend on the fixation of beneficial variants, should change quantitatively. To estimate these modifications, we consider an asexual population of constant size $N$ with a broad offspring distribution with $1<\alpha<2$, wherein beneficial mutations occur at the rate $\mu_B$. For low mutation rates, mutations sweep one after the other, but when mutation rates are sufficiently high, multiple mutations are present at the same time and most mutations are outcompeted by fitter mutations. Such a situation is known as clonal interference.

We can study the effect of the exponent α on the adaptation dynamics quantitatively by repeating the argument in Desai and Fisher (2007), wherein the variance of offspring numbers is assumed to be narrow. As discussed in Appendix H, clonal interference should occur if

$$\mu_BN\,s^{(2-\alpha)/(\alpha-1)}\ln\bigl(Ns^{1/(\alpha-1)}\bigr)\gg1\quad(\text{clonal interference}),\tag{47}$$

where s >0 is the fitness effect of a mutation, which we assume to be constant. The rate R of adaptation is given by

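$$R\approx\begin{cases}\mu_BN\,s^{\alpha/(\alpha-1)} & (\text{successive selective sweeps}),\\[1ex]\dfrac{2s^2\ln\bigl(Ns^{1/(\alpha-1)}\bigr)}{\bigl(\ln(s/\mu_B)\bigr)^2} & (\text{clonal interference}).\end{cases}\tag{48}$$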

Note that the second line in Equation 48 reproduces Equation 5 of Desai and Fisher (2007) in the limit $\alpha\to2$. Thus, the rate of adaptation depends only weakly (logarithmically) on α in the clonal interference regime, even though the condition for clonal interference in Equation 47 depends on α quite sensitively.
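Equations 47 and 48 are easy to tabulate. In the sketch below, the clonal-interference criterion is applied as a hard switch where the left-hand side of Equation 47 equals one; this is our simplifying assumption. As in the text, O(1) prefactors are dropped. The checks confirm that the sweep regime equals s times the mutation supply times $P_{\mathrm{fix}}\sim s^{1/(\alpha-1)}$, and that α = 2 recovers the Desai–Fisher form $2s^2\ln(Ns)/\ln^2(s/\mu_B)$.

```python
import math


def interference_parameter(mu_b, n_pop, s, alpha):
    """Left-hand side of Equation 47; values >> 1 indicate clonal interference."""
    return (mu_b * n_pop * s ** ((2 - alpha) / (alpha - 1))
            * math.log(n_pop * s ** (1 / (alpha - 1))))


def adaptation_rate(mu_b, n_pop, s, alpha):
    """Rate of adaptation R from Equation 48 (prefactors as in the text)."""
    if interference_parameter(mu_b, n_pop, s, alpha) < 1:
        return mu_b * n_pop * s ** (alpha / (alpha - 1))  # successive selective sweeps
    return 2 * s**2 * math.log(n_pop * s ** (1 / (alpha - 1))) / math.log(s / mu_b) ** 2
```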

Emergence of skewed offspring distributions in models of range expansions

Our study can be regarded as an analysis of the population genetics induced by power-law offspring distributions. The main reason to consider these scale-free offspring distributions is that they quite generally emerge in models of stochastic traveling waves (Birzu et al. 2018). Such models are ubiquitous in population genetics because they describe a wide range of evolutionary scenarios, including range expansions, rampant asexual and sexual adaptation as well as Muller’s ratchet (Brunet et al. 2007; Desai et al. 2013; Kosheleva and Desai 2013; Neher and Hallatschek 2013; Schweinsberg 2017; Birzu et al. 2018). Our analysis should therefore apply most directly to these evolutionary scenarios, which we now demonstrate using a simple model of a range expansion. We end by discussing the question of whether some of our results may also arise in scale-rich offspring distributions.

Birzu et al. (2018) argued that any exponent $1\le\alpha\le2$ can emerge in a simple model of range expansions that incorporates a tunable level of cooperativity between individuals (Figure 14A). The model can be described by a generalized stochastic Fisher–Kolmogorov equation

$$\frac{\partial n}{\partial t}=D\frac{\partial^2n}{\partial x^2}+r(n)\,n+\text{noise},\tag{49}$$

for the time-dependent population density n(x, t) at position x in a linear habitat and time t. The growth rate r(n) is assumed to be density-dependent, with

$$r(n)=r_0\Bigl(1-\frac{n}{K}\Bigr)\Bigl(1+B\frac{n}{K}\Bigr),\tag{50}$$

where the parameter $B\ge0$ accounts for cooperativity among individuals, which is also called an Allee effect. As discussed in Hallatschek (2018), lineages in the region of the wave tip are diffusively mixed within the timescale $\tau_{\mathrm{mix}}\sim r^{-1}\ln^2\bigl(K\sqrt{D/r}\bigr)$. This implies that, in this microscopic model, resampling from an offspring distribution roughly occurs every $\tau_{\mathrm{mix}}$ generations. In Birzu et al. (2018, 2021), it was argued that, depending on the strength of the Allee effect, offspring distributions corresponding to any of the three distinct classes of the beta coalescent process can arise: namely, the Bolthausen–Sznitman coalescent when $B<2$, the beta coalescent with $1<\alpha<2$ when $2<B<4$, and the Kingman coalescent when $B>4$.
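For orientation, the sketch below writes down the cooperative growth rate of Equation 50 together with the regime boundaries quoted from Birzu et al. It returns the class of coalescent expected for a given Allee parameter B. Leaving the boundary values B = 2 and B = 4 unclassified is our choice, since the text states strict inequalities. The function names are ours.

```python
def growth_rate(n, r0, K, B):
    """Per-capita growth rate r(n) of Equation 50."""
    return r0 * (1 - n / K) * (1 + B * n / K)


def coalescent_class(B):
    """Coalescent class expected for Allee parameter B (regimes as quoted from Birzu et al.)."""
    if B < 0:
        raise ValueError("B >= 0 is assumed")
    if B < 2:
        return "Bolthausen-Sznitman"
    if 2 < B < 4:
        return "beta coalescent, 1 < alpha < 2"
    if B > 4:
        return "Kingman"
    return "boundary case (not classified here)"
```

For B > 1 the per-capita growth rate initially increases with density, which is the cooperative (Allee) regime.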

Figure 14.

(A) The model of a range expanding population with two neutral alleles (green and gray). A broad offspring distribution arises dynamically in the front region. (B) Stationary distributions of the allele frequency when mutation rate θ is small (blue) and when θ is large (orange). The wiggling lines (blue/orange) are the numerical results in the traveling wave model, while the dotted lines (black) are those in the macroscopic model. The parameters of the Allee effect B are B =1 (left), 3 (middle), and 8 (right). See Appendix I for the details of the implementation of the simulation and other parameter values.

To demonstrate clearly that our present study can serve as a macroscopic analysis of the traveling wave model, we introduced reversible mutations in the traveling wave model and measured the mutant frequency of the first $N\equiv K/k$ individuals from the edge of the front. Here, $k$ is the spatial decay rate, i.e., $n\propto e^{-k\tilde x}$, where $\tilde x$ is the coordinate comoving with the expansion. This definition of the mutant frequency is reasonable because only the wave front has a skewed offspring distribution due to the founder effect. In Figure 14B, for $B=1$ (left), 3 (middle), and 8 (right), the frequency distributions in the traveling wave model are shown when the mutation rate is small (orange jagged line) and when it is large (blue jagged line). The corresponding distributions in the macroscopic model are shown by black dotted lines. The stationary distributions in the traveling wave model agree well with those in the macroscopic model. In particular, the transition from the M-shaped or U-shaped distribution to the monomodal distribution is consistently reproduced in the traveling wave model. These results underscore the correspondence between the traveling wave with the Allee effect and the beta coalescent process.

The above-described correspondence suggests that the spatial area occupied by one allele type in a range expansion should behave statistically like the time-integral over the allele frequency in the Cannings model. In the context of adapting (non-spatial) populations, this quantity describes the total number of mutational opportunities of a mutant lineage (Desai and Fisher 2007; Weissman et al. 2009; Neher and Shraiman 2011). As presented in Appendix J, the distribution of the time-integrated frequency exhibits a scaling behavior that depends sensitively on the offspring distribution. While a full discussion is beyond the scope of this paper, we expect that the distribution of areas serves as a useful observable to distinguish different prototypes of traveling waves (Birzu et al. 2018).

Broad offspring distributions with a scale: While scale-free offspring distributions often emerge over an intermediate time scale (τmix in the above traveling wave model), there are also species that over single generations show broad offspring numbers and violate the Wright–Fisher diffusion. For such species, it may be more natural to consider offspring distribution with a characteristic scale. In ‘sweepstake’ reproduction (Eldon and Wakeley 2006), a fixed and finite fraction of the population is replaced at every sweepstake event (specified by the parameter Ψ in Eldon and Wakeley (2006)). Because Ψ sets a characteristic scale in offspring numbers, power law relationships for the median of allele frequencies as well as frequency fluctuations cannot be expected, which we confirm in Appendix K. Nevertheless, the qualitative features of a sampling bias can be recognized quite clearly for sweepstake reproduction as well.

Either type of model is ultimately an approximation to the true offspring distribution, and which one to use depends on the situation. As we argued, the beta coalescent, along with the forward-in-time model described in this article, is the natural choice for range expansions, rapid adaptive processes, or other scenarios where the reproductive value of a chosen few is highly inflated.

Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

Funding

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award R01GM115851, a National Science Foundation CAREER Award (#1555330), a Simons Investigator award from the Simons Foundation (#327934), RIKEN iTHEMS Program, and JSPS KAKENHI (JP19K03663).

Conflict of interest

The authors declare that there is no conflict of interest.

Acknowledgments

The authors thank Benjamin H. Good, Daniel B. Weissman, Jiseon Min, Joao Ascensao, Michael M. Desai, and Stephen Martis for their helpful discussions and comments.

Appendix A: Analytic results in the marginal case α = 1

Although the main target of our present study is the case of 1<α<2, we here provide analytical results for α = 1, which have not been derived before.

Figure A1.

The SFS $f_{\mathrm{SFS}}(x)/\mu$ when $\alpha=1$ for the selective advantage $\sigma=-2, 0, 2$. $f_{\mathrm{SFS}}(x)$ is obtained by numerically evaluating the exact expression of $t(x)$ in the first line of Equation A.5. As $x\to0$, $f_{\mathrm{SFS}}(x)$ becomes independent of $\sigma$. Near $x=1$, while the magnitude of $f_{\mathrm{SFS}}(x)$ depends on $\sigma$, the scaling behavior (slope in the log–log plot) does not. See Equation A.14.

Site frequency spectrum in the presence of genuine selection

The transition density for $\alpha=1$ in the presence of natural selection is derived in Hallatschek (2018) (see Kosheleva and Desai 2013 for the neutral case). In $x$ space, it is given by

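$$G_\sigma(x,\tau|x_0)=\frac{1}{2\pi x(1-x)}\,\frac{\sin\pi\eta}{\cos\pi\eta+\cosh\Bigl[\eta\log\dfrac{e^\sigma x}{1-x}-\log\dfrac{e^\sigma x_0}{1-x_0}\Bigr]},\tag{A.1}$$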

where $\eta\equiv e^{-\tau}$ and $\sigma$ is the selective advantage [there is an erratum in Equation 38 in Hallatschek (2018)].

For the purpose of computing the SFS (or, equivalently, the mean sojourn time), we set $x_0=1/N$. Since we are considering the large-$N$ limit, the denominator of Equation A.1 can be rewritten as

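$$\cos\pi\eta+\cosh\Bigl[\eta\log\frac{e^\sigma x}{1-x}-\log\frac{e^\sigma x_0}{1-x_0}\Bigr]\approx\frac12\exp\Bigl[\eta\log\frac{x}{1-x}-\sigma(1-\eta)+\log N\Bigr]=\frac{N}{2}\Bigl(\frac{x}{1-x}\Bigr)^{\eta}e^{-\sigma(1-\eta)}.\tag{A.2}$$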

Thus, the transition density for $x_0=1/N$ can be written as

$$G_\sigma(x,\eta|x_0=1/N)=\frac{e^\sigma}{\pi Nx(1-x)}\,\sin\pi\eta\,\Bigl(\frac{x\,e^\sigma}{1-x}\Bigr)^{-\eta}.\tag{A.3}$$

Near the boundaries, this can be approximated as

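$$G_\sigma(x,\eta|x_0=1/N)=\begin{cases}\dfrac{e^\sigma}{\pi Nx}\,\sin\pi\eta\,(xe^\sigma)^{-\eta} & (x\ll1),\\[2ex]\dfrac{e^\sigma}{\pi N(1-x)}\bigl((1-x)e^{-\sigma}\bigr)^{\eta}\sin\pi\eta & (1-x\ll1).\end{cases}\tag{A.4}$$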

The SFS is given by $f_{\mathrm{SFS}}(x)=N\mu\times t(x)$, where $\mu$ is the mutation rate per generation, and $t(x)$ is the mean sojourn time density, which is given by

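$$t(x)=\int_0^\infty d\tau\,G_\sigma(x,\tau|x_0=1/N)=\int_0^1\frac{d\eta}{\eta}\,G_\sigma(x,\eta|x_0)=\begin{cases}\dfrac{e^\sigma}{\pi Nx}\displaystyle\int_0^1\frac{d\eta}{\eta}\sin(\pi\eta)\,(xe^\sigma)^{-\eta} & (x\ll1),\\[3ex]\dfrac{e^\sigma}{\pi N(1-x)}\displaystyle\int_0^1\frac{d\eta}{\eta}\sin(\pi\eta)\,\bigl((1-x)e^{-\sigma}\bigr)^{\eta} & (1-x\ll1).\end{cases}\tag{A.5}$$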

Next, we compute the integrals in Equation A.5 asymptotically close to the absorbing boundaries (see Equation A.14 for the final results). To evaluate Equation A.5 for $x\ll1$, we first consider the integral

Iϵ=01dηexpf(η). (A.6)

When f(η) has a sharp peak at η=η*, we approximate this integral as

$$I_\epsilon\simeq e^{f(\eta^*)}\sqrt{\frac{2\pi}{|f''(\eta^*)|}}. \tag{A.7}$$

In our case,

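$$f(\eta)=-\log\eta-\eta\log\epsilon+\log\sin\pi\eta, \tag{A.8}$$

where $\epsilon=xe^{\sigma}$. $f(\eta)$ takes its maximum value at $\eta=\eta^*\simeq1+1/\log\epsilon$.¹ At $\eta=\eta^*$, $f(\eta^*)\simeq-\log\epsilon-1+\log\frac{\pi}{-\log\epsilon}$,² and $f''(\eta^*)\simeq1-\frac{\pi^2}{\sin^2(\pi/\log\epsilon)}\simeq-\log^2\epsilon$. The saddle-point evaluation in Equation A.7 is accurate when $\epsilon\ll1$. Using these expressions, $I_\epsilon$ can be evaluated as

$$I_\epsilon\simeq\frac{\sqrt{2\pi}}{-\log\epsilon}\cdot\frac{e^{-1}\pi}{\epsilon(-\log\epsilon)}=\frac{\sqrt{2\pi}\,\pi e^{-1}}{\epsilon\log^2\epsilon}. \tag{A.9}$$

By setting $\epsilon=xe^{\sigma}$, we find

$$f_{\mathrm{SFS}}(x\to0)\simeq\mu\frac{\sqrt{2\pi}e^{-1}}{\big(x(\log x+\sigma)\big)^2}\simeq\mu\frac{\sqrt{2\pi}e^{-1}}{(x\log x)^2}\propto\mu\frac{1}{(x\log x)^2}. \tag{A.10}$$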

Next, to evaluate Equation A.5 for the high-frequency end, we consider the following integral

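$$I_\epsilon=\int_0^1\frac{d\eta}{\eta}\sin(\pi\eta)\,\epsilon^{\eta}. \tag{A.11}$$

When $\epsilon\ll1$, the integrand is largest at the boundary $\eta=0$, where $\sin(\pi\eta)/\eta\to\pi$. Thus,

$$I_\epsilon\simeq\int_0^1d\eta\,\pi\epsilon^{\eta}=\frac{\pi(-1+\epsilon)}{\log\epsilon}\simeq-\frac{\pi}{\log\epsilon}. \tag{A.12}$$

By setting $\epsilon=(1-x)e^{-\sigma}$, we find

$$f_{\mathrm{SFS}}(x\to1)\simeq-\frac{\mu e^{\sigma}}{(1-x)\big(\log(1-x)-\sigma\big)}\simeq-\frac{\mu e^{\sigma}}{(1-x)\log(1-x)}. \tag{A.13}$$

In summary, the SFS in Equation A.5 is given by

$$f_{\mathrm{SFS}}(x)\sim\begin{cases}\mu\dfrac{1}{(x\log x)^2} & (x\ll1,\ e^{-\sigma})\\[2ex]-\mu\dfrac{e^{\sigma}}{(1-x)\log(1-x)} & (1-x\ll1,\ e^{\sigma})\end{cases} \tag{A.14}$$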

Note that the dependence on σ disappears when $x\ll1$. Figure A1 shows the plots of the SFS.

For comparison, we write the SFS for the Wright–Fisher model ($\alpha\geq2$) (see, e.g., Crow and Kimura 1970; Evans et al. 2007):

$$f^{\mathrm{WF}}_{\mathrm{SFS}}(x)=\theta\,\frac{e^{2\sigma}\big(1-e^{-2\sigma(1-x)}\big)}{(e^{2\sigma}-1)\,x(1-x)}. \tag{A.15}$$

The asymptotic forms near the boundaries are given by

$$f^{\mathrm{WF}}_{\mathrm{SFS}}(x)\simeq\begin{cases}\theta\dfrac{1}{x} & (x\ll1)\\[2ex]\theta\sigma(1+\coth\sigma)\big(1+(\sigma-1)(x-1)\big) & (1-x\ll1,\ |\sigma|(1-x)\ll1)\end{cases}$$

where we have expanded the SFS around $x=1$ up to the subleading order. For sufficiently strong selection ($\sigma>1$), the SFS increases with $x$ at the high-frequency end. However, unlike the case α < 2, the increase is weak, and the SFS approaches the constant $\theta\sigma(1+\coth\sigma)$ as $x\to1$.
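As a quick numerical check of the Wright–Fisher expressions above, the following minimal Python sketch evaluates Equation A.15 and compares it with its two asymptotic forms. The function names are our own and do not come from the original study. `math.expm1` is used so that $1-e^{-2\sigma(1-x)}$ stays accurate close to $x=1$.

```python
import math


def sfs_wf(x, sigma, theta=1.0):
    """Wright-Fisher SFS of Equation A.15 (requires sigma != 0)."""
    # 1 - exp(-2 sigma (1 - x)) via expm1 keeps precision close to x = 1
    numerator = math.exp(2 * sigma) * (-math.expm1(-2 * sigma * (1 - x)))
    return theta * numerator / (math.expm1(2 * sigma) * x * (1 - x))


def sfs_wf_high_freq(x, sigma, theta=1.0):
    """Two-term expansion of Equation A.15 around x = 1."""
    coth = 1.0 / math.tanh(sigma)
    return theta * sigma * (1 + coth) * (1 + (sigma - 1) * (x - 1))


sigma = 2.0
for one_minus_x in (1e-2, 1e-3, 1e-4):
    x = 1 - one_minus_x
    # ratio of exact SFS to its high-frequency expansion; tends to 1
    print(one_minus_x, sfs_wf(x, sigma) / sfs_wf_high_freq(x, sigma))
# Low-frequency end: x * f_WF(x) should approach theta = 1
print(1e-6 * sfs_wf(1e-6, sigma))
```

For σ = 2 the exact SFS at $x=0.99$ exceeds its value at $x=0.9$. This is the mild increase toward $x=1$ described above. It levels off at the finite constant $\theta\sigma(1+\coth\sigma)$ instead of diverging.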

Dynamics of the median of allele frequencies

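When α = 1, we can derive a simple differential equation that describes the median of the trajectories. In logit space, the transition density is given by

$$G(\psi,\rho\mid\psi_0)=\frac{\sin\pi\rho}{2\pi\left\{\cos\pi\rho+\cosh\left[\rho(\psi+\sigma)-(\psi_0+\sigma)\right]\right\}} \tag{A.16}$$

where $\rho=e^{-\tau}$. The median $\Psi_{\mathrm{med}}$ (at a given time point ρ) is characterized by

$$\int_{-\infty}^{\Psi_{\mathrm{med}}}G(\psi,\rho\mid\psi_0)\,d\psi=\frac12. \tag{A.17}$$

Because cosh is an even function, the transition density is symmetric about its peak, so the median coincides with the peak:

$$\Psi_{\mathrm{med}}=-\sigma+\frac1\rho(\psi_0+\sigma). \tag{A.18}$$

By differentiating Equation A.18 with respect to ρ and eliminating $\psi_0$, we obtain

$$\frac{d}{d\rho}\Psi_{\mathrm{med}}=-\frac{1}{\rho^2}(\psi_0+\sigma)=-\frac1\rho(\Psi_{\mathrm{med}}+\sigma). \tag{A.19}$$

Noting that $\frac{d}{d\tau}=-\rho\frac{d}{d\rho}$, we find

$$\frac{d}{d\tau}\Psi_{\mathrm{med}}=\Psi_{\mathrm{med}}+\sigma. \tag{A.20}$$

Since the median is invariant under monotonic coordinate transformations, the median $X_{\mathrm{med}}$ in $x$ space is related to $\Psi_{\mathrm{med}}$ via the logit transformation, $\log\frac{X_{\mathrm{med}}}{1-X_{\mathrm{med}}}=\Psi_{\mathrm{med}}$. By differentiating this with respect to time and using Equation A.20, we obtain

$$\frac{d}{d\tau}X_{\mathrm{med}}=X_{\mathrm{med}}(1-X_{\mathrm{med}})\left(\log\frac{X_{\mathrm{med}}}{1-X_{\mathrm{med}}}+\sigma\right). \tag{A.21}$$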
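The closed-form median (Equation A.18 mapped through the logistic function) can be checked against direct integration of Equation A.21. The sketch below is a minimal Python illustration. It uses a classical fourth-order Runge–Kutta step, and the initial frequencies and σ values are arbitrary illustrative choices.

```python
import math


def median_closed_form(tau, x0, sigma):
    """X_med(tau) from Equation A.18, mapped back through the logistic function."""
    psi0 = math.log(x0 / (1 - x0))
    psi = -sigma + math.exp(tau) * (psi0 + sigma)   # 1 / rho = e^{tau}
    return 1 / (1 + math.exp(-psi))


def median_ode(tau_end, x0, sigma, steps=10_000):
    """Integrate Equation A.21 with a fourth-order Runge-Kutta scheme."""
    def rhs(x):
        return x * (1 - x) * (math.log(x / (1 - x)) + sigma)

    h = tau_end / steps
    x = x0
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h * k2)
        k4 = rhs(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


print(median_ode(1.0, 0.2, 1.0), median_closed_form(1.0, 0.2, 1.0))
```

Equation A.20 has a single fixed point, $\Psi_{\mathrm{med}}=-\sigma$, which corresponds to $X^*=1/(1+e^{\sigma})$. This fixed point is unstable. Medians that start above $X^*$ move toward 1, and medians that start below it move toward 0.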
Allele frequency dynamics conditioned on fixation

By using Bayes’ theorem, the probability distribution of the allele frequency conditioned on fixation can be written as

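$$P(x,\tau\mid x_0,\mathrm{fixation})=P(x,\tau,\mathrm{fixation}\mid x_0)\times\frac{1}{P(\mathrm{fixation}\mid x_0)} \tag{A.22}$$
$$=P(x,\tau\mid x_0)\times\frac{P(\mathrm{fixation}\mid x)}{P(\mathrm{fixation}\mid x_0)}. \tag{A.23}$$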

The fixation probability for the initial frequency x0 is given by (see Hallatschek 2018)

$$P(\mathrm{fixation}\mid x_0)=\frac{x_0e^{\sigma}}{1+x_0(e^{\sigma}-1)}. \tag{A.24}$$

In particular, the fixation probability of a single mutant is given by

$$P\left(\mathrm{fixation}\,\middle|\,x_0=\tfrac1N\right)\simeq\frac{1}{N^{1-s}}. \tag{A.25}$$

By using Equation A.24, the conditioned probability in Equation A.23 is computed as

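$$\begin{aligned}P(x,\tau\mid x_0,\mathrm{fixation})&=\frac{1}{2\pi x(1-x)}\times\frac{xe^{\sigma}}{1+x(e^{\sigma}-1)}\times\frac{1+x_0(e^{\sigma}-1)}{x_0e^{\sigma}}\times\frac{\sin\pi\rho}{\cos\pi\rho+\cosh\left[\rho\log\frac{e^{\sigma}x}{1-x}-\log\frac{e^{\sigma}x_0}{1-x_0}\right]}\\&=\frac{1}{2\pi x_0(1-x)}\,\frac{1+x_0(e^{\sigma}-1)}{1+x(e^{\sigma}-1)}\,\frac{\sin\pi\rho}{\cos\pi\rho+\cosh\left[\rho\log\frac{e^{\sigma}x}{1-x}-\log\frac{e^{\sigma}x_0}{1-x_0}\right]}.\end{aligned} \tag{A.26}$$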

Appendix B: Stationary distributions of traveling wave model in the presence of natural selection

In Figure 14 of the main text, the mutant allele is assumed to be neutral. Here, we provide the results for the case in which mutants have a fitness advantage σ (Figure B1). As in the main text, symmetric reversible mutations are assumed.

Figure B1.

The stationary distributions of the mutant frequency for θ = 0.1, 1, 5 and σ = 0, 1, 5. σ is the selection coefficient in the time-continuous description, $\sigma=sT_c$.

Appendix C: Generalized central limit theorem

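Here, we briefly summarize the generalized central limit theorem (Gnedenko and Kolmogorov 1968; Uchaikin and Zolotarev 1999). Suppose that each random number $u_i$ is sampled from the Pareto distribution $P(u)=\alpha/u^{\alpha+1}$ ($u\geq1$), and consider the shifted and rescaled random variable ζ:

$$\zeta=\frac{\sum_{i=1}^{n}u_i-a_n}{b_n}, \tag{C.1}$$

where $a_n$ and $b_n$ are

$$\begin{array}{lll}a_n=0, & b_n=\left(\dfrac{\pi n}{2\Gamma(\alpha)\sin\frac{\pi\alpha}{2}}\right)^{1/\alpha} & \text{for } 0<\alpha<1,\\[2ex]a_n=n\log n, & b_n=\dfrac{\pi}{2}n & \text{for } \alpha=1,\\[2ex]a_n=\dfrac{\alpha}{\alpha-1}n, & b_n=\left(\dfrac{\pi n}{2\Gamma(\alpha)\sin\frac{\pi\alpha}{2}}\right)^{1/\alpha} & \text{for } 1<\alpha<2,\\[2ex]a_n=\dfrac{\alpha}{\alpha-1}n=2n, & b_n=(n\log n)^{1/2} & \text{for } \alpha=2.\end{array} \tag{C.2}$$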

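It is well known that, as $n\to\infty$, the distribution of ζ converges to the α-stable distribution, which we denote by $P_\alpha(\zeta)$. While an explicit expression for $P_\alpha(\zeta)$ is not available in general, its characteristic function is given by

$$\langle e^{is\zeta}\rangle=\int d\zeta\,e^{is\zeta}P_\alpha(\zeta)\to\begin{cases}\exp\left[-|s|^{\alpha}\left(1+i\,\mathrm{sgn}(s)\frac{2}{\pi}\log|s|\right)\right] & (\alpha=1)\\\exp\left[-|s|^{\alpha}\left(1-i\,\mathrm{sgn}(s)\tan\frac{\pi\alpha}{2}\right)\right] & (\alpha\neq1)\end{cases}\quad\text{for } n\to\infty. \tag{C.3}$$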
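The theorem can be illustrated by Monte Carlo, assuming 1 < α < 2. The sketch below sums Pareto numbers drawn by inverse-transform sampling and standardizes them with $a_n$ and $b_n$ from Equation C.2. It then compares the empirical characteristic function of ζ with Equation C.3. The sample sizes and seed are arbitrary choices. For finite $n$, small corrections remain, which shrink as $n$ grows.

```python
import math
import numpy as np


def pareto_zeta(alpha, n, reps, rng):
    """Samples of zeta in Equation C.1 for Pareto(alpha) numbers, 1 < alpha < 2."""
    a_n = alpha / (alpha - 1) * n                                   # centering, Eq. C.2
    b_n = (math.pi * n / (2 * math.gamma(alpha) * math.sin(math.pi * alpha / 2))) ** (1 / alpha)
    zeta = np.empty(reps)
    for k in range(reps):
        u = (1.0 - rng.random(n)) ** (-1.0 / alpha)   # inverse transform, u >= 1
        zeta[k] = (u.sum() - a_n) / b_n
    return zeta


rng = np.random.default_rng(1)
alpha, s = 1.5, 1.0
zeta = pareto_zeta(alpha, n=2000, reps=10_000, rng=rng)
empirical = np.mean(np.exp(1j * s * zeta))
predicted = np.exp(-abs(s) ** alpha * (1 - 1j * np.sign(s) * math.tan(math.pi * alpha / 2)))
print(empirical, predicted)
```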

Appendix D: The transition density of an allele frequency wN(y|x) and the asymptotic dynamics for large N

Allele-frequency change in a generation is characterized by the transition density $w_N(y\mid x)$, which is the probability distribution of the allele frequency y in the next generation, given the current allele frequency x. When N is large, the asymptotic dynamics can be described by a time-continuous differential Chapman–Kolmogorov equation, which is defined by an advection velocity V(x), a diffusion coefficient D(x), and a jump kernel $w(y\mid x)$ (Gardiner 2009). The triplet is obtained from the transition density $w_N(y\mid x)$ as follows:

$$w(y\mid x)=\lim_{N\to\infty}\frac{w_N(y\mid x)}{\delta t_N},\qquad V(x)=\lim_{N\to\infty}\frac{1}{\delta t_N}\int_{|y-x|<\epsilon}(y-x)\,w_N(y\mid x)\,dy,\qquad D(x)=\lim_{N\to\infty}\frac{1}{\delta t_N}\int_{|y-x|<\epsilon}(y-x)^2\,w_N(y\mid x)\,dy, \tag{D.1}$$

where $\delta t_N$ is an N-dependent timescale corresponding to one generation measured in units of the coalescent timescale. In the following, we derive the transition density $w_N(y\mid x)$ and the asymptotic dynamics for general α, using a computational technique similar to that of Hallatschek (2018), where the case α = 1 is studied extensively.

As mentioned in the main text, when $\alpha\leq2$, the binomial sampling error is negligible for large N compared with the stochasticity coming from broad offspring-number fluctuations, and we can replace the binomial distribution in Equation 13 of the main text with the Dirac delta function:

$$w_N(y\mid x)=\left\langle\delta\left(y-\frac{M}{M+W}\right)\right\rangle_{M,W}=\left\langle\int_{-\infty}^{+\infty}\frac{d\sigma}{2\pi}\,e^{i\left(y-\frac{M}{M+W}\right)\sigma}\right\rangle_{M,W}. \tag{D.2}$$

Here, $\langle\cdot\rangle_{M,W}$ denotes the average over $M=\sum_{i=1}^{xN}u_i$ and $W=\sum_{i=1}^{(1-x)N}v_i$. Using the variable $s=\sigma/(M+W)$, we can rewrite $w_N$ as

$$w_N(y\mid x)=\left\langle\int_{-\infty}^{+\infty}\frac{ds}{2\pi}(M+W)\,e^{is(M-y(M+W))}\right\rangle_{M,W}=-\frac{\partial}{\partial y}\left\langle\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\,e^{isM(1-y)-isWy}\right\rangle_{M,W}=-\frac{\partial}{\partial y}W_N(y\mid x). \tag{D.3}$$

Here,

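$$W_N(y\mid x)=\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\,\Phi\big(s(1-y);xN\big)\,\Phi\big(-sy;(1-x)N\big) \tag{D.4}$$

with

$$\Phi(s;n)=\left\langle e^{is\sum_{i=1}^{n}u_i}\right\rangle. \tag{D.5}$$

To use the properties of the α-stable distributions in Appendix C, we further rewrite $W_N(y\mid x)$ as follows:

$$\begin{aligned}W_N(y\mid x)&=\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\,\Phi\big(s(1-y);xN\big)\,\Phi\big(-sy;(1-x)N\big)\\&=\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\left\langle e^{is(1-y)\sum_{i=1}^{xN}u_i}\right\rangle\left\langle e^{-isy\sum_{i=1}^{(1-x)N}v_i}\right\rangle\\&=\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\left\langle e^{is(1-y)(b_{xN}\zeta+a_{xN})}\right\rangle_\zeta\left\langle e^{-isy(b_{(1-x)N}\zeta+a_{(1-x)N})}\right\rangle_\zeta\\&=\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\,e^{is(1-y)a_{xN}-isy\,a_{(1-x)N}}\left\langle e^{is(1-y)b_{xN}\zeta}\right\rangle_\zeta\left\langle e^{-isy\,b_{(1-x)N}\zeta}\right\rangle_\zeta.\end{aligned} \tag{D.6}$$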

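When N is large, the two averages in the last line can be approximated by the characteristic function of the α-stable distribution, Equation C.3, with $s\to s(1-y)b_{xN}$ and $s\to-sy\,b_{(1-x)N}$, respectively. Thus, when $\alpha\neq1$, Equation D.6 can be computed as

$$\begin{aligned}W_N(y\mid x)&=\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\,e^{is(1-y)a_{xN}-isy\,a_{(1-x)N}}\times e^{-|s|^{\alpha}(1-y)^{\alpha}b_{xN}^{\alpha}\left(1-i\,\mathrm{sgn}(s)\tan\frac{\pi\alpha}{2}\right)}\,e^{-|s|^{\alpha}y^{\alpha}b_{(1-x)N}^{\alpha}\left(1+i\,\mathrm{sgn}(s)\tan\frac{\pi\alpha}{2}\right)}\\&=\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\,e^{-|s|^{\alpha}\left\{(1-y)^{\alpha}b_{xN}^{\alpha}+y^{\alpha}b_{(1-x)N}^{\alpha}\right\}}\times e^{is(1-y)a_{xN}-isy\,a_{(1-x)N}}\,e^{i|s|^{\alpha}\mathrm{sgn}(s)\tan\frac{\pi\alpha}{2}\left((1-y)^{\alpha}b_{xN}^{\alpha}-y^{\alpha}b_{(1-x)N}^{\alpha}\right)}\\&=\int_0^{\infty}\frac{ds}{\pi s}\,e^{-s^{\alpha}\left\{(1-y)^{\alpha}b_{xN}^{\alpha}+y^{\alpha}b_{(1-x)N}^{\alpha}\right\}}\times\sin\left[s\big((1-y)a_{xN}-y\,a_{(1-x)N}\big)+s^{\alpha}\tan\frac{\pi\alpha}{2}\left((1-y)^{\alpha}b_{xN}^{\alpha}-y^{\alpha}b_{(1-x)N}^{\alpha}\right)\right].\end{aligned} \tag{D.7}$$

In the following, we evaluate the integral expression for $W_N(y\mid x)$ and compute the transition density $w_N(y\mid x)$ from Equation D.3.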

When α<1

By using Equation C.2,

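$$a_n=0,\qquad b_n^{\alpha}=\frac{\pi}{2\Gamma(\alpha)\sin\frac{\pi\alpha}{2}}\,n\equiv c_\alpha n, \tag{D.8}$$

we have

$$W_N(y\mid x)=\int_0^\infty\frac{ds}{\pi s}\,e^{-s^{\alpha}c_\alpha N\left((1-y)^{\alpha}x+y^{\alpha}(1-x)\right)}\times\sin\left[s^{\alpha}c_\alpha N\tan\frac{\pi\alpha}{2}\left((1-y)^{\alpha}x-y^{\alpha}(1-x)\right)\right]. \tag{D.9}$$

By setting $Nc_\alpha s^{\alpha}=\sigma$, $W_N(y\mid x)$ becomes

$$\begin{aligned}W_N(y\mid x)&=\frac1\alpha\int_0^\infty\frac{d\sigma}{\pi\sigma}\,e^{-\sigma\left((1-y)^{\alpha}x+y^{\alpha}(1-x)\right)}\sin\left[\sigma\tan\frac{\pi\alpha}{2}\left((1-y)^{\alpha}x-y^{\alpha}(1-x)\right)\right]\\&=\frac{1}{\pi\alpha}\tan^{-1}\left(\frac{\tan\left(\frac{\pi\alpha}{2}\right)\left(x(1-y)^{\alpha}-(1-x)y^{\alpha}\right)}{(1-x)y^{\alpha}+x(1-y)^{\alpha}}\right).\end{aligned} \tag{D.10}$$

By differentiating it with respect to y, we obtain

$$w_N(y\mid x)=\frac{x(1-x)\sin(\pi\alpha)\,\big((1-y)y\big)^{\alpha-1}}{\pi\left(x^2(1-y)^{2\alpha}+(1-x)^2y^{2\alpha}+2x(1-x)\cos(\pi\alpha)\big((1-y)y\big)^{\alpha}\right)}. \tag{D.11}$$

Note that this does not depend on N, which is consistent with the fact that the coalescent time is $O(N^0)$ when α < 1.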
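The relation $w_N=-\partial_yW_N$ between Equations D.10 and D.11 can be checked with a central finite difference, together with the normalization $W_N(0\mid x)-W_N(1\mid x)=1$. This is a minimal Python sketch. The values of x, α, y and the step size h are arbitrary choices for illustration.

```python
import math


def W_small_alpha(y, x, alpha):
    """W_N(y|x) of Equation D.10 (0 < alpha < 1); independent of N."""
    t = math.tan(math.pi * alpha / 2)
    a = x * (1 - y) ** alpha
    b = (1 - x) * y ** alpha
    return math.atan(t * (a - b) / (a + b)) / (math.pi * alpha)


def w_small_alpha(y, x, alpha):
    """One-generation transition density w_N(y|x) of Equation D.11."""
    q = ((1 - y) * y) ** alpha
    num = x * (1 - x) * math.sin(math.pi * alpha) * ((1 - y) * y) ** (alpha - 1)
    den = math.pi * (x**2 * (1 - y) ** (2 * alpha) + (1 - x) ** 2 * y ** (2 * alpha)
                     + 2 * x * (1 - x) * math.cos(math.pi * alpha) * q)
    return num / den


x, alpha, y, h = 0.3, 0.5, 0.6, 1e-6
finite_diff = -(W_small_alpha(y + h, x, alpha) - W_small_alpha(y - h, x, alpha)) / (2 * h)
print(finite_diff, w_small_alpha(y, x, alpha))
# Total probability: W_N(0|x) - W_N(1|x) should equal 1
print(W_small_alpha(0.0, x, alpha) - W_small_alpha(1.0, x, alpha))
```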

When 1<α<2

By using Equation C.2,

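$$a_n=\frac{\alpha}{\alpha-1}n,\qquad b_n^{\alpha}=B_\alpha n,\quad\text{where } B_\alpha\equiv\frac{\pi}{2\Gamma(\alpha)\sin\frac{\pi\alpha}{2}}, \tag{D.12}$$

Equation D.7 becomes

$$W_N(y\mid x)=\int_0^\infty\frac{ds}{\pi s}\,e^{-B_\alpha Ns^{\alpha}\left\{(1-y)^{\alpha}x+y^{\alpha}(1-x)\right\}}\times\sin\left[s\frac{\alpha}{\alpha-1}N(x-y)+s^{\alpha}\tan\frac{\pi\alpha}{2}B_\alpha N\left((1-y)^{\alpha}x-y^{\alpha}(1-x)\right)\right]. \tag{D.13}$$

By changing the variable of integration to $\sigma=N^{1/\alpha}s$, we have

$$W_N(y\mid x)=\int_0^\infty\frac{d\sigma}{\pi\sigma}\,e^{-B_\alpha\sigma^{\alpha}\left\{(1-y)^{\alpha}x+y^{\alpha}(1-x)\right\}}\times\sin\left[\sigma\frac{\alpha}{\alpha-1}N^{1-\frac1\alpha}(x-y)+\sigma^{\alpha}\tan\frac{\pi\alpha}{2}B_\alpha\left((1-y)^{\alpha}x-y^{\alpha}(1-x)\right)\right]. \tag{D.14}$$

By changing the variable of integration to $\sigma'=\frac{\alpha}{\alpha-1}|x-y|\sigma$ and then renaming σ′ as σ, we have

$$W_N(y\mid x)=\int_0^\infty\frac{d\sigma}{\pi\sigma}\,e^{-\mu_1\sigma^{\alpha}}\sin\left(\mathrm{sgn}(x-y)N^{1-\frac1\alpha}\sigma+\mu_2\sigma^{\alpha}\right), \tag{D.15}$$

where

$$\mu_1=B_\alpha\left(\frac{\alpha-1}{\alpha}\right)^{\alpha}\frac{(1-y)^{\alpha}x+y^{\alpha}(1-x)}{|x-y|^{\alpha}},\qquad\mu_2=\tan\frac{\pi\alpha}{2}B_\alpha\left(\frac{\alpha-1}{\alpha}\right)^{\alpha}\frac{(1-y)^{\alpha}x-y^{\alpha}(1-x)}{|x-y|^{\alpha}}. \tag{D.16}$$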

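The transition probability $w_N(y\mid x)$ is given by

$$w_N(y\mid x)=-\partial_yW_N(y\mid x)=\mathrm{sgn}(x-y)\frac{\partial_y\mu_1}{\pi}\int_0^\infty d\sigma\,\sigma^{\alpha-1}e^{-\mu_1\sigma^{\alpha}}\sin\left(N^{1-\frac1\alpha}\sigma+\mathrm{sgn}(x-y)\mu_2\sigma^{\alpha}\right)-\frac{\partial_y\mu_2}{\pi}\int_0^\infty d\sigma\,\sigma^{\alpha-1}e^{-\mu_1\sigma^{\alpha}}\cos\left(N^{1-\frac1\alpha}\sigma+\mathrm{sgn}(x-y)\mu_2\sigma^{\alpha}\right). \tag{D.17}$$

Consider the integral

$$J_\alpha=\int_0^\infty d\sigma\,\sigma^{\alpha-1}e^{-\mu\sigma^{\alpha}}e^{iN^{1-\frac1\alpha}\sigma} \tag{D.18}$$

where $\mu=\mu_1-i\,\mathrm{sgn}(x-y)\mu_2$. Then, the transition probability can be written as

$$w_N(y\mid x)=\mathrm{sgn}(x-y)\frac{\partial_y\mu_1}{\pi}\,\mathrm{Im}\,J_\alpha-\frac{\partial_y\mu_2}{\pi}\,\mathrm{Re}\,J_\alpha. \tag{D.19}$$

From Watson's lemma, the integral $J_\alpha$ can be expressed as a series expansion:

$$J_\alpha=\sum_{m=1}^{\infty}\frac{1}{N^{m(\alpha-1)}}e^{i\frac{\pi}{2}m\alpha}(-\mu)^{m-1}\frac{\Gamma(m\alpha)}{\Gamma(m)}. \tag{D.20}$$

By substituting Equation D.20 into Equation D.19 and writing $\mu=|\mu|e^{i\theta}$, we obtain

$$w_N(y\mid x)=\sum_{m=1}^{\infty}\frac{(-|\mu|)^{m-1}}{N^{m(\alpha-1)}}\frac{\Gamma(m\alpha)}{\pi\Gamma(m)}\left[\mathrm{sgn}(x-y)\,\partial_y\mu_1\sin\left(\frac{\pi}{2}m\alpha+(m-1)\theta\right)-\partial_y\mu_2\cos\left(\frac{\pi}{2}m\alpha+(m-1)\theta\right)\right]. \tag{D.21}$$

The leading order ($m=1$) is given by

$$w_N(y\mid x)=\frac{\Gamma(\alpha)}{N^{\alpha-1}}\left(\mathrm{sgn}(x-y)\frac{\partial_y\mu_1}{\pi}\sin\frac{\pi\alpha}{2}-\frac{\partial_y\mu_2}{\pi}\cos\frac{\pi\alpha}{2}\right)=\begin{cases}N^{1-\alpha}\alpha\left(\frac{\alpha-1}{\alpha}\right)^{\alpha}\dfrac{x(1-x)(1-y)^{\alpha-1}}{(y-x)^{\alpha+1}} & \text{when } x<y\\[2ex]N^{1-\alpha}\alpha\left(\frac{\alpha-1}{\alpha}\right)^{\alpha}\dfrac{x(1-x)y^{\alpha-1}}{(x-y)^{\alpha+1}} & \text{when } x>y\end{cases} \tag{D.22}$$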

Equation 21 in the main text is obtained by introducing the continuous time $\tau\equiv C_\alpha t/N^{\alpha-1}$, where $C_\alpha\equiv\alpha\left(\frac{\alpha-1}{\alpha}\right)^{\alpha}$ is the prefactor in Equation D.22. Equation 22 follows from neutrality, $\frac{d}{dt}\langle x\rangle=0$. Note that the expansion in Equation D.20 is valid only when $|x-y|$ is finite, i.e., when $|x-y|>\epsilon$, where ε is an N-independent positive constant. Although $w_N(y\mid x)$ in Equation D.22 diverges as $|x-y|\to0$, this divergence is not a problem, because the jump term of the asymptotic dynamics in Equation 20 can be obtained from $w_N(y\mid x)$ for $|x-y|>\epsilon$ (see Gardiner 2009).

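When α = 2

$a_n$ and $b_n$ are given by

$$a_n=\frac{\alpha}{\alpha-1}n=2n,\qquad b_n=(n\log n)^{1/2}. \tag{D.23}$$

Equation D.7 then becomes

$$\begin{aligned}W_N(y\mid x)&=\int_0^\infty\frac{ds}{\pi s}\,e^{-s^2\left\{(1-y)^2xN\log xN+y^2(1-x)N\log(1-x)N\right\}}\times\sin\big(2sN(x-y)\big)\\&=\int_0^\infty\frac{ds}{\pi s}\,e^{-s^2\left\{(-2xy+x+y^2)N\log N+\left((1-y)^2x\log x+y^2(1-x)\log(1-x)\right)N\right\}}\times\sin\big(2sN(x-y)\big).\end{aligned}$$

By changing the variable of integration to $\sigma=(N\log N)^{1/2}s$,

$$\begin{aligned}W_N(y\mid x)&=\int_0^\infty\frac{d\sigma}{\pi\sigma}\,e^{-\sigma^2\left\{(-2xy+x+y^2)+\left((1-y)^2x\log x+y^2(1-x)\log(1-x)\right)(\log N)^{-1}\right\}}\times\sin\left(2\left(\frac{N}{\log N}\right)^{1/2}\sigma(x-y)\right)\\&\simeq\int_0^\infty\frac{d\sigma}{\pi\sigma}\,e^{-\sigma^2(-2xy+x+y^2)}\times\sin\left(2\left(\frac{N}{\log N}\right)^{1/2}\sigma(x-y)\right)=\frac12\,\mathrm{erf}\left((x-y)\sqrt{\frac{N}{\log(N)\,(-2xy+x+y^2)}}\right),\end{aligned} \tag{D.24}$$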

where erf(x) is the Gauss error function

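$$\mathrm{erf}(x)=\frac{1}{\sqrt\pi}\int_{-x}^{x}e^{-t^2}dt. \tag{D.25}$$

By differentiating $W_N(y\mid x)$ with respect to y, we have

$$w_N(y\mid x)=\left(\frac{N}{\log N}\right)^{1/2}\frac{1}{\sqrt\pi}\frac{(1-x)x}{(-2xy+x+y^2)^{3/2}}\,e^{-\frac{N(x-y)^2}{\log N\,(-2xy+x+y^2)}}. \tag{D.26}$$

Suppose that ε is a sufficiently small but finite constant. For $|x-y|<\epsilon$, $w_N$ can be approximated as

$$w_N(y\mid x)=\left(\frac{N}{\log N}\right)^{1/2}\frac{1}{\sqrt\pi}\frac{1}{\big(x(1-x)\big)^{1/2}}\,e^{-\frac{N(x-y)^2}{\log N\,x(1-x)}}=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-y)^2}{2\sigma^2}}, \tag{D.27}$$

where $2\sigma^2=\frac{\log N}{N}x(1-x)$. From the symmetry $y-x\leftrightarrow-(y-x)$ of $w_N(y\mid x)$, the advection term is zero. The diffusivity D is given by

$$D=\frac{1}{\delta t_N}\int_{|x-y|<\epsilon}dy\,(x-y)^2w_N(y\mid x)=\frac{1}{\delta t_N}\sigma^2=\frac{1}{\delta t_N}\frac{\log N}{N}\frac12x(1-x)=\frac12x(1-x), \tag{D.28}$$

where we have introduced the natural timescale $\delta t_N=\frac{\log N}{N}$ and used the integral approximation

$$\int_{-\epsilon}^{\epsilon}d\Delta\,\Delta^2\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\Delta^2}{2\sigma^2}\right)=\frac{\sigma}{\sqrt{2\pi}}\left(\sqrt{2\pi}\,\sigma\,\mathrm{erf}\left(\frac{\epsilon}{\sqrt2\sigma}\right)-2\epsilon e^{-\frac{\epsilon^2}{2\sigma^2}}\right)\simeq\sigma^2. \tag{D.29}$$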
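The closed form in Equation D.29 can be checked against direct quadrature. The minimal Python sketch below does this and also shows how close the truncated moment is to $\sigma^2$ for the Gaussian width of Equation D.27. The values N = 10⁶, x = 0.3 and ε = 0.05 are arbitrary illustrative choices.

```python
import math


def truncated_second_moment(eps, sigma):
    """Closed form of Equation D.29."""
    return sigma / math.sqrt(2 * math.pi) * (
        math.sqrt(2 * math.pi) * sigma * math.erf(eps / (math.sqrt(2) * sigma))
        - 2 * eps * math.exp(-eps**2 / (2 * sigma**2)))


def truncated_second_moment_numeric(eps, sigma, n=200_000):
    """Midpoint-rule quadrature of the same integral, for comparison."""
    h = 2 * eps / n
    total = 0.0
    for k in range(n):
        d = -eps + (k + 0.5) * h
        total += d * d * math.exp(-d * d / (2 * sigma**2))
    return total * h / math.sqrt(2 * math.pi * sigma**2)


N, x, eps = 10**6, 0.3, 0.05
sigma = math.sqrt(math.log(N) / N * x * (1 - x) / 2)   # 2 sigma^2 = (log N / N) x (1 - x)
print(truncated_second_moment(eps, sigma) / sigma**2)
# A stricter check, with the cutoff equal to the width
print(truncated_second_moment_numeric(1e-3, 1e-3), truncated_second_moment(1e-3, 1e-3))
```

When ε is many widths σ away, the error function saturates and the exponential term underflows, so the truncated moment equals $\sigma^2$. This is the regime used in Equation D.28.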

Finally, the jump kernel vanishes asymptotically on the timescale $\delta t_N$,

$$w(y\mid x)=\lim_{N\to\infty}\frac{w_N(y\mid x)}{\delta t_N}=0, \tag{D.30}$$

because, for fixed x and y with $|x-y|>\epsilon$, $w_N(y\mid x)$ becomes exponentially small as N becomes large.

Thus, in the large-N limit, α = 2 corresponds to the Wright–Fisher diffusion for a population of effective size

$$N_e=\frac{N}{\log(N)}. \tag{D.31}$$

When α>2

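In this case, the Pareto distribution $P(u)=\alpha/u^{\alpha+1}$ ($u\geq1$) has finite mean $a=\frac{\alpha}{\alpha-1}$ and finite variance $b^2=\frac{\alpha}{(\alpha-1)^2(\alpha-2)}$, so the large-N limit of the allele-frequency dynamics should be described by the Wright–Fisher diffusion process. To confirm this more generally, we consider a general distribution with finite mean and variance. Namely, we assume that each individual's offspring number $u_i$ is sampled from a distribution with mean a and variance $b^2$. Then, from the central limit theorem, the shifted and rescaled variable

$$\zeta=\frac{\sum_{i=1}^{n}u_i-a_n}{b_n},\quad\text{where } a_n=an,\ b_n=\sqrt n\,b, \tag{D.32}$$

obeys the normal distribution $\mathcal N(0,1)$. Its characteristic function is given by $f(s)=\exp\left(-\frac12s^2\right)$. Thus, we have

$$\begin{aligned}W_N(y\mid x)&\simeq\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\,e^{is(1-y)axN-isya(1-x)N}f\big(s(1-y)b_{xN}\big)f\big(-sy\,b_{(1-x)N}\big)\\&=\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\,e^{is(1-y)axN-isya(1-x)N}f\big(s(1-y)\sqrt{xN}\,b\big)f\big(-sy\sqrt{(1-x)N}\,b\big)\\&=\int_{-\infty}^{+\infty}\frac{ds}{2\pi is}\,e^{is(1-y)axN-isya(1-x)N}e^{-\frac12s^2(1-y)^2xNb^2}e^{-\frac12s^2y^2(1-x)Nb^2}.\end{aligned} \tag{D.33}$$

By setting $\sigma=N^{1/2}s$,

$$\begin{aligned}W_N(y\mid x)&=\int_{-\infty}^{+\infty}\frac{d\sigma}{2\pi i\sigma}\,e^{ia(x-y)N^{1/2}\sigma}e^{-\frac12b^2\sigma^2\left((1-y)^2x+y^2(1-x)\right)}\\&=\int_0^{+\infty}\frac{d\sigma}{\pi\sigma}\sin\big(a(x-y)N^{1/2}\sigma\big)\,e^{-\frac12b^2\sigma^2\left((1-y)^2x+y^2(1-x)\right)}=\frac12\,\mathrm{erf}\left(\frac{a\sqrt N(x-y)}{\sqrt2\,b\sqrt{-2xy+x+y^2}}\right).\end{aligned} \tag{D.34}$$

Thus, we obtain

$$w_N(y\mid x)=-\partial_yW_N(y\mid x)=\sqrt{\frac{N}{2\pi}}\,\gamma\,x(1-x)\,\frac{\exp\left(-\frac{\gamma^2N(x-y)^2}{2(-2xy+x+y^2)}\right)}{(-2xy+x+y^2)^{3/2}}, \tag{D.35}$$

where $\gamma\equiv a/b$. For the Pareto distribution, $\gamma=\frac{\alpha}{\alpha-1}\Big/\sqrt{\frac{\alpha}{(\alpha-1)^2(\alpha-2)}}=\sqrt{\alpha(\alpha-2)}$.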
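The following minimal Python sketch, with function names of our own choosing, checks two statements from this subsection. The first is that Equation D.35 equals $-\partial_yW_N$ of Equation D.34. The second is that $\gamma^2=\alpha(\alpha-2)$ for Pareto offspring numbers. The parameter values are illustrative assumptions.

```python
import math


def W_finite_variance(y, x, N, gamma):
    """W_N(y|x) of Equation D.34, written in terms of gamma = a / b."""
    c = -2 * x * y + x + y * y          # equals (1-y)^2 x + y^2 (1-x)
    return 0.5 * math.erf(gamma * math.sqrt(N) * (x - y) / math.sqrt(2 * c))


def w_finite_variance(y, x, N, gamma):
    """Transition density w_N(y|x) of Equation D.35."""
    c = -2 * x * y + x + y * y
    return (math.sqrt(N / (2 * math.pi)) * gamma * x * (1 - x)
            * math.exp(-gamma**2 * N * (x - y) ** 2 / (2 * c)) / c**1.5)


def pareto_gamma(alpha):
    """gamma = mean / standard deviation of a Pareto(alpha) offspring number, alpha > 2."""
    mean = alpha / (alpha - 1)
    var = alpha / ((alpha - 1) ** 2 * (alpha - 2))
    return mean / math.sqrt(var)


alpha, N, x, y, h = 3.0, 1000, 0.4, 0.41, 1e-7
g = pareto_gamma(alpha)
fd = -(W_finite_variance(y + h, x, N, g) - W_finite_variance(y - h, x, N, g)) / (2 * h)
print(g**2, alpha * (alpha - 2))
print(fd, w_finite_variance(y, x, N, g))
print("N_e =", N * g**2)   # effective size N * alpha * (alpha - 2)
```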

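For $|x-y|>\epsilon$, $w_N(y\mid x)$ becomes exponentially small as N becomes large, so the jump term does not appear in the asymptotic dynamics: $w(y\mid x)=0$. For $|x-y|<\epsilon$, we can approximate $w_N(y\mid x)$ as

$$w_N(y\mid x)=\sqrt{\frac{N\gamma^2}{2\pi x(1-x)}}\exp\left(-\frac{\gamma^2N(x-y)^2}{2x(1-x)}\right)=\frac{1}{\sqrt{2\pi\Sigma^2}}e^{-\frac{(x-y)^2}{2\Sigma^2}}, \tag{D.36}$$

where $\Sigma^2=\frac{x(1-x)}{\gamma^2N}$. From the symmetry $y-x\leftrightarrow-(y-x)$ of $w_N(y\mid x)$, the advection is zero. Finally, the diffusion is evaluated as

$$\int_{|x-y|<\epsilon}dy\,(x-y)^2w_N(y\mid x)=\Sigma\left(\Sigma\,\mathrm{erf}\left(\frac{\epsilon}{\sqrt2\Sigma}\right)-\sqrt{\frac2\pi}\,\epsilon\,e^{-\frac{\epsilon^2}{2\Sigma^2}}\right)\simeq\Sigma^2=\frac{x(1-x)}{\gamma^2N}. \tag{D.37}$$

Thus, by rescaling time as $\tau=t/(\gamma^2N)$, we obtain

$$D=x(1-x), \tag{D.38}$$

which corresponds to the Wright–Fisher diffusion of a population of effective size $N_e=N\gamma^2=N\alpha(\alpha-2)$. Notice that $N_e\to0$ as $\alpha\to2$, indicating that the concept of the effective population size breaks down when the variance of the offspring distribution diverges.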

Appendix E: From Lambda-Fleming-Viot Generator to differential Chapman–Kolmogorov equation

In Appendix D, the jump density w(y|x) is derived from the generalized Wright–Fisher sampling, Equation 13 in the main text. Here, we present another more formal derivation of the jump density w(y|x) for 1<α<2. See Hallatschek (2018) for the case α = 1.

Jump density for general Λ measure

The backward generator of the Λ coalescent process for the biallelic model (see, e.g., Etheridge et al. 2010; Griffiths 2014) is given by

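$$\mathcal{L}G_\tau(x\mid x_0)=\int_0^1\Big(x_0G_\tau\big(x\mid x_0+(1-x_0)\lambda\big)-G_\tau(x\mid x_0)+(1-x_0)G_\tau\big(x\mid x_0-x_0\lambda\big)\Big)\frac{\Lambda(d\lambda)}{\lambda^2}. \tag{E.1}$$

This can be rewritten as a sum of two terms:

$$\mathcal{L}G_\tau(x\mid x_0)=A+B, \tag{E.2}$$

where

$$A=x_0\int_0^1\Big(G_\tau\big(x\mid x_0+(1-x_0)\lambda\big)-G_\tau(x\mid x_0)-(1-x_0)\lambda\,\partial_{x_0}G_\tau(x\mid x_0)\Big)\frac{\Lambda(d\lambda)}{\lambda^2}, \tag{E.3}$$
$$B=(1-x_0)\int_0^1\Big(G_\tau\big(x\mid x_0-x_0\lambda\big)-G_\tau(x\mid x_0)+x_0\lambda\,\partial_{x_0}G_\tau(x\mid x_0)\Big)\frac{\Lambda(d\lambda)}{\lambda^2}. \tag{E.4}$$

We introduce the integration variable $x'\equiv x_0+(1-x_0)\lambda$ for A and $x'\equiv x_0-x_0\lambda$ for B. By writing

$$\frac{\Lambda(d\lambda)}{\lambda^2}=\frac{l(\lambda)}{\lambda^2}d\lambda, \tag{E.5}$$

A and B become

$$A=x_0(1-x_0)\int_{x_0}^1\Big(G_\tau(x\mid x')-G_\tau(x\mid x_0)-(x'-x_0)\partial_{x_0}G_\tau(x\mid x_0)\Big)\frac{l\left(\frac{x'-x_0}{1-x_0}\right)}{(x'-x_0)^2}dx', \tag{E.6}$$
$$B=x_0(1-x_0)\int_0^{x_0}\Big(G_\tau(x\mid x')-G_\tau(x\mid x_0)-(x'-x_0)\partial_{x_0}G_\tau(x\mid x_0)\Big)\frac{l\left(\frac{x_0-x'}{x_0}\right)}{(x'-x_0)^2}dx'. \tag{E.7}$$

Defining the jump kernel $w(x'\mid x_0)$ as

$$w(x'\mid x_0)=\begin{cases}\dfrac{x_0(1-x_0)}{(x'-x_0)^2}\,l\left(\dfrac{x'-x_0}{1-x_0}\right) & (x'>x_0),\\[2ex]\dfrac{x_0(1-x_0)}{(x'-x_0)^2}\,l\left(\dfrac{x_0-x'}{x_0}\right) & (x'<x_0),\end{cases} \tag{E.8}$$

we can formally rewrite the generator as

$$\mathcal{L}G_\tau(x\mid x_0)=V(x_0)\,\partial_{x_0}G_\tau(x\mid x_0)+\mathrm{PV}\int_0^1w(x'\mid x_0)\big[G_\tau(x\mid x')-G_\tau(x\mid x_0)\big]dx', \tag{E.9}$$

where

$$V(x_0)=-\mathrm{PV}\int_0^1dx'\,w(x'\mid x_0)(x'-x_0). \tag{E.10}$$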

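When the measure is the Beta distribution Beta(2 − α, α)

We take the Beta(2 − α, α) distribution as the Λ measure, which corresponds to the descendant distribution $1/u^{1+\alpha}$ considered in this study:

$$\frac{\Lambda(d\lambda)}{\lambda^2}=\frac{l(\lambda)\,d\lambda}{\lambda^2}=\frac{\lambda^{1-\alpha}(1-\lambda)^{\alpha-1}}{B(\alpha,2-\alpha)}\frac{d\lambda}{\lambda^2}=\frac{\lambda^{-1-\alpha}(1-\lambda)^{\alpha-1}}{B(\alpha,2-\alpha)}d\lambda. \tag{E.11}$$

With this measure, A and B become

$$A=\frac{x_0(1-x_0)}{B(\alpha,2-\alpha)}\int_{x_0}^1dx'\Big[G_\tau(x\mid x')-G_\tau(x\mid x_0)-(x'-x_0)\partial_{x_0}G_\tau(x\mid x_0)\Big](x'-x_0)^{-1-\alpha}(1-x')^{\alpha-1}, \tag{E.12}$$
$$B=\frac{x_0(1-x_0)}{B(\alpha,2-\alpha)}\int_0^{x_0}dx'\Big[G_\tau(x\mid x')-G_\tau(x\mid x_0)-(x'-x_0)\partial_{x_0}G_\tau(x\mid x_0)\Big](x_0-x')^{-1-\alpha}x'^{\alpha-1}. \tag{E.13}$$

Note that the integrals A and B are convergent for $\alpha\in(0,2)$ because, as $x'\to x_0$, the terms inside the brackets are $O\big((x'-x_0)^2\big)$, so the integrands are $O\big(|x'-x_0|^{1-\alpha}\big)$. The jump kernel is given by

$$w(x'\mid x_0)=\begin{cases}\dfrac{x_0(1-x_0)}{B(\alpha,2-\alpha)}(x'-x_0)^{-1-\alpha}(1-x')^{\alpha-1} & (x'>x_0)\\[2ex]\dfrac{x_0(1-x_0)}{B(\alpha,2-\alpha)}(x_0-x')^{-1-\alpha}(x')^{\alpha-1} & (x'<x_0).\end{cases} \tag{E.14}$$

When 1 < α < 2, this density agrees with Equation 21 of the main text (up to a proportionality constant). The advection is given by

$$V(x_0)=-\mathrm{PV}\int_0^1dx'\,w(x'\mid x_0)(x'-x_0)=\frac{x_0(1-x_0)}{B(\alpha,2-\alpha)}\left(\int_0^{x_0-0}dx'\,(x_0-x')^{-\alpha}x'^{\alpha-1}-\int_{x_0+0}^1dx'\,(x'-x_0)^{-\alpha}(1-x')^{\alpha-1}\right). \tag{E.15}$$

Note that, when α > 1, the limit $\lim_{\epsilon\to0}\left(\int_0^{x_0-\epsilon}+\int_{x_0+\epsilon}^1\right)$ in Equation E.15 does not exist. This divergence is rather formal, however, since a finite-size population has a natural cutoff $\epsilon\sim1/N$.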
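The statement that Equation E.14 agrees with the leading-order kernel of Equation D.22 up to a constant can be checked numerically. The Python sketch below evaluates both kernels on a few (origin, destination) pairs on either side of the origin and compares their ratios. The function names are hypothetical. The pairs, α and N are arbitrary choices.

```python
import math


def beta_fn(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def kernel_lambda(xp, x0, alpha):
    """Jump kernel of Equation E.14 for the Beta(2 - alpha, alpha) Lambda measure."""
    pref = x0 * (1 - x0) / beta_fn(alpha, 2 - alpha)
    if xp > x0:
        return pref * (xp - x0) ** (-1 - alpha) * (1 - xp) ** (alpha - 1)
    return pref * (x0 - xp) ** (-1 - alpha) * xp ** (alpha - 1)


def kernel_wf(y, x, alpha, N):
    """Leading-order kernel of Equation D.22 from generalized Wright-Fisher sampling."""
    c_alpha = alpha * ((alpha - 1) / alpha) ** alpha
    if y > x:
        return N ** (1 - alpha) * c_alpha * x * (1 - x) * (1 - y) ** (alpha - 1) / (y - x) ** (alpha + 1)
    return N ** (1 - alpha) * c_alpha * x * (1 - x) * y ** (alpha - 1) / (x - y) ** (alpha + 1)


alpha, N = 1.5, 10**4
pairs = [(0.2, 0.5), (0.5, 0.2), (0.05, 0.9), (0.7, 0.69)]   # (origin x, destination y)
ratios = [kernel_lambda(y, x, alpha) / kernel_wf(y, x, alpha, N) for x, y in pairs]
print(ratios)
```

All ratios coincide. The constant is $1/\big(B(\alpha,2-\alpha)\,C_\alpha N^{1-\alpha}\big)$, which is absorbed into the choice of time unit.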

Appendix F: The transition density for the differential Chapman–Kolmogorov equation for 1<α<2

Here we derive the short-time transition density given in Equations 23 and 24 and determine g(ξ) in the scaling ansatz given in Equation 39.

The short-time transition density

Before discussing the CK equation in Equation 18, it is instructive to start from the simple diffusion equation,

$$\partial_\tau P(x,\tau)=D\,\partial_x^2P(x,\tau), \tag{F.1}$$

with the initial condition $P(x,\tau=0)=\delta(x-x_0)$. The solution of this initial value problem is given by

$$P(\Delta x,\tau)=\frac{1}{\sqrt{2\pi(2D\tau)}}\exp\left(-\frac{\Delta x^2}{2(2D\tau)}\right), \tag{F.2}$$

which is usually derived using a Laplace–Fourier transformation. However, this solution can also be obtained from the central limit theorem. Equation F.1 is equivalent to a random walk in which jumps $X\to X\pm a$ each occur with rate $m/2$, where a and m are related to D via $D=a^2m/2$. Since $n\simeq m\tau$ jumps occur in time τ, the displacement is approximately $\Delta X(\tau)\simeq\sum_{i=1}^{n}l_i$, where $l_i=\pm a$. Then, from the central limit theorem, $\Delta X(\tau)$ is normally distributed with mean $n\langle l_i\rangle=0$ and variance $n\langle l_i^2\rangle=(m\tau)a^2=2D\tau$, namely, Equation F.2. Note that, even if the diffusion constant depends on x, the solution in Equation F.2 (with $D\to D(x_0)$) is valid at short times.

Essentially the same argument can be applied to the CK dynamics. The difference is that the generalized central limit theorem must be employed, because the variance of jump sizes diverges in the CK dynamics. Suppose that the initial density is given by $P(x',\tau=0)=\delta(x'-x)$ (for notational simplicity, the subscript 0 on x is dropped). In the CK dynamics, the frequency change $\Delta X(\tau)=X(\tau)-x$ is caused by the bias V(x) in Equation 20 and by stochastic jumps. The rates of frequency-increasing and frequency-decreasing jumps are given by

$$W_+(x)=\int_{x+\epsilon}^1w(x'\mid x)\,dx'=\frac{x}{\alpha}\left(\frac{1-x-\epsilon}{\epsilon}\right)^{\alpha}, \tag{F.3}$$
$$W_-(x)=\int_0^{x-\epsilon}w(x'\mid x)\,dx'=\frac{1-x}{\alpha}\left(\frac{x-\epsilon}{\epsilon}\right)^{\alpha}, \tag{F.4}$$

respectively. Therefore, the expected number n of jump events in time τ is given by

$$n=(W_-+W_+)\tau. \tag{F.5}$$

Because randomness in the number of jump events is negligible compared to that in jump sizes, it can be assumed that exactly n jumps occur in time τ. Then, the displacement ΔX(τ)=X(τ)−x can be written as

$$\Delta X(\tau)=V(x)\tau+\sum_{i=1}^{n}l_i, \tag{F.6}$$

where $l_i\in[-x,-\epsilon]\cup[\epsilon,1-x]$ denotes the displacement due to the i-th jump. For small τ, $w(y\mid x(\tau'))\simeq w(y\mid x)$ for $0<\tau'<\tau$, which means that $l_1,\ldots,l_n$ are independent and identically distributed. From Equation 19, each $l_i$ is approximately sampled from the following power-law distribution:

$$P(l)=\begin{cases}\dfrac{W_+}{W_++W_-}\dfrac{\alpha\epsilon^{\alpha}}{l^{\alpha+1}} & \big(l\in[+\epsilon,+\infty)\big)\\[1ex]0 & \big(l\in(-\epsilon,+\epsilon)\big)\\[1ex]\dfrac{W_-}{W_++W_-}\dfrac{\alpha\epsilon^{\alpha}}{|l|^{\alpha+1}} & \big(l\in(-\infty,-\epsilon]\big)\end{cases} \tag{F.7}$$

where the factor $\frac{W_+}{W_++W_-}$ (resp. $\frac{W_-}{W_++W_-}$) represents the probability that a given jump is frequency-increasing (resp. frequency-decreasing). P(l) is normalized as $\int P(l)\,dl=1$. Note that, in Equation F.7, the original range $[-x,-\epsilon]\cup[\epsilon,1-x]$ of l has been extended to $(-\infty,-\epsilon]\cup[\epsilon,\infty)$. Under this modification, the variance $\langle x(\tau)^2\rangle$ is no longer well defined. However, this modification does not alter the short-time properties of typical events, because the boundaries at x = 0, 1 are not important for them.

Note that P(l) has a divergent variance, and the number of jumps is $n\propto\tau\epsilon^{-\alpha}\gg1$ even for small τ (as $\epsilon\to+0$). The generalized central limit theorem therefore states that the sum $\sum_{i=1}^{n}l_i$ in Equation F.6 obeys an α-stable distribution. The stable distribution is characterized by $\langle l\rangle$, β, and γ given below (see, e.g., Uchaikin and Zolotarev 2011). The mean $\langle l\rangle$ is

$$\langle l\rangle=\frac{W_+-W_-}{W_++W_-}\frac{\alpha}{\alpha-1}\epsilon=\frac{x(1-x)^{\alpha}-x^{\alpha}(1-x)}{x^{\alpha}(1-x)+x(1-x)^{\alpha}}\frac{\alpha}{\alpha-1}\epsilon. \tag{F.8}$$

Asymptotically, P(l) satisfies

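$$\int_l^{\infty}P(l')\,dl'=\frac{W_+}{W_++W_-}\frac{\epsilon^{\alpha}}{l^{\alpha}}\equiv\frac{c_+}{l^{\alpha}}\ \ (l\to\infty),\qquad\int_{-\infty}^{l}P(l')\,dl'=\frac{W_-}{W_++W_-}\frac{\epsilon^{\alpha}}{|l|^{\alpha}}\equiv\frac{c_-}{|l|^{\alpha}}\ \ (l\to-\infty). \tag{F.9}$$

Note that $c_++c_-=\epsilon^{\alpha}$. The parameters γ and β are determined from $c_\pm$:

$$\gamma\equiv\left(\frac{\pi(c_++c_-)n}{2\Gamma(\alpha)\sin\frac{\pi\alpha}{2}}\right)^{1/\alpha}=\epsilon\left(\frac{\pi n}{2\Gamma(\alpha)\sin\frac{\pi\alpha}{2}}\right)^{1/\alpha}=\left(\frac{\tau\pi\left(x(1-x)^{\alpha}+(1-x)x^{\alpha}\right)}{2\Gamma(\alpha+1)\sin\frac{\pi\alpha}{2}}\right)^{1/\alpha}, \tag{F.10}$$
$$\beta\equiv\frac{c_+-c_-}{c_++c_-}=\frac{W_+-W_-}{W_++W_-}=\frac{x(1-x)^{\alpha}-x^{\alpha}(1-x)}{x^{\alpha}(1-x)+x(1-x)^{\alpha}}. \tag{F.11}$$

Then, from the generalized central limit theorem, the random variable

$$Z\equiv\frac{\sum_{i=1}^{n}l_i-n\langle l\rangle}{\gamma}, \tag{F.12}$$

has the following characteristic function:

$$\langle e^{ikZ}\rangle=\int e^{ikz}P_Z(z)\,dz\xrightarrow{\epsilon\to+0}\exp\left[-|k|^{\alpha}\left(1-i\beta\tan\frac{\pi\alpha}{2}\,\mathrm{sign}\,k\right)\right]. \tag{F.13}$$

We can determine the characteristic function for ΔX using Equation F.13 and the relation

$$\Delta X(\tau)=V(x)\tau+\gamma Z+n\langle l\rangle, \tag{F.14}$$

which follows from Equations F.6 and F.12. While V(x) and ⟨l⟩ diverge in the limit $\epsilon\to+0$, these divergent terms exactly cancel each other. This follows from Equation F.8 together with $V(x)=-\int_{|x'-x|>\epsilon}dx'\,(x'-x)\,w(x'\mid x)\simeq\frac{1}{\epsilon^{\alpha-1}}\frac{1}{\alpha-1}\left(x^{\alpha}(1-x)-x(1-x)^{\alpha}\right)$. Therefore, the displacement simplifies to

$$\Delta X(\tau)=\gamma Z. \tag{F.15}$$

Equations 24 and 23 in the main text are the same as Equations F.15 and F.13 (with the replacement $x\to x_0$). By substituting Equation F.15 into Equation F.13, we obtain the characteristic function of the allele frequency X(τ):

$$\langle e^{ikX(\tau)}\rangle=\int e^{ikx}P(x,\tau\mid x_0)\,dx=\exp\left[ikx_0-|\gamma(x_0)k|^{\alpha}\left(1-i\beta(x_0)\tan\frac{\pi\alpha}{2}\,\mathrm{sign}\,k\right)\right]. \tag{F.16}$$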
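The short-time result depends on only two functions of the starting frequency: the scale γ of Equation F.10 and the skewness β of Equation F.11. The minimal Python sketch below evaluates both, together with the characteristic function F.16, for 1 < α < 2. The names are our own, and the sample values of $x_0$ and τ are illustrative.

```python
import cmath
import math


def stable_params(x0, tau, alpha):
    """Scale gamma (Eq. F.10) and skewness beta (Eq. F.11), 1 < alpha < 2."""
    up = x0 * (1 - x0) ** alpha      # proportional to W_+
    down = (1 - x0) * x0 ** alpha    # proportional to W_-
    gamma = (tau * math.pi * (up + down)
             / (2 * math.gamma(alpha + 1) * math.sin(math.pi * alpha / 2))) ** (1 / alpha)
    beta = (up - down) / (up + down)
    return gamma, beta


def char_fn(k, x0, tau, alpha):
    """Short-time characteristic function of X(tau), Equation F.16."""
    gamma, beta = stable_params(x0, tau, alpha)
    sgn = (k > 0) - (k < 0)
    return cmath.exp(1j * k * x0 - abs(gamma * k) ** alpha
                     * (1 - 1j * beta * math.tan(math.pi * alpha / 2) * sgn))


alpha = 1.5
print(stable_params(0.5, 1e-3, alpha))     # symmetric jumps at x0 = 1/2
print(stable_params(0.001, 1e-3, alpha))   # rare allele: upward jumps dominate
```

Two features follow directly from the formulas. β vanishes at $x_0=1/2$ and approaches +1 as $x_0\to0$, so a rare allele is mainly displaced by upward jumps. The width also grows as $\gamma\propto\tau^{1/\alpha}$, instead of the diffusive $\tau^{1/2}$.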

The scaling ansatz for the long-time transition density in Equation 39

Consider the initial distribution $P(x,\tau=0)=\delta(x-x_0)$ with $x_0\ll1$. After some time, the distribution spreads over the region $x\ll1$ with a peak at the extinction boundary x = 0. As presented in Equation 37 of the main text, up to a constant prefactor, P(x,τ) takes the following form:

$$P(x,\tau)\simeq\tau^{-2\eta}g(\xi),$$

where $\eta=(\alpha-1)^{-1}$ and $\xi=x\tau^{-\eta}$. Here, we present an analytic argument to determine g(ξ).

Equation 20 can be rewritten as

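$$\frac{\partial P}{\partial\tau}=\int_{|\Delta|>\epsilon}d\Delta\,\big(f_\Delta(x-\Delta)P(x-\Delta,\tau)-f_\Delta(x)P(x,\tau)\big)+\partial_x\int_{|\Delta|>\epsilon}d\Delta\,\big(\Delta f_\Delta(x)P(x,\tau)\big), \tag{F.17}$$

where $f_\Delta(x)\equiv w(x+\Delta\mid x)$ is given by Equation 21. For $x\ll1$, $f_\Delta(x)$ is approximately given by

$$f_\Delta(x)\simeq\begin{cases}\dfrac{x}{\Delta^{\alpha+1}} & (\Delta>0)\\[2ex]\dfrac{x(x+\Delta)^{\alpha-1}}{|\Delta|^{\alpha+1}} & (\Delta<0)\end{cases} \tag{F.18}$$

We substitute the ansatz $P(x,\tau)\simeq\tau^{-2\eta}g(\xi)$ into the above CK equation. The left-hand side of the CK equation becomes

$$\mathrm{LHS}=-2\eta\tau^{-2\eta-1}g(\xi)-\eta\tau^{-2\eta-1}g'(\xi)\,\xi, \tag{F.19}$$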

which is proportional to $\tau^{-2\eta-1}=\tau^{-\frac{\alpha+1}{\alpha-1}}$. The right-hand side is decomposed into the integrals over Δ > 0 and those over Δ < 0. We can show that the former is proportional to $\tau^{-\frac{\alpha+1}{\alpha-1}}$, while the latter is proportional to $\tau^{-\frac{2}{\alpha-1}}$.³ Since the extinction time for an initial frequency $x_0\ll1$ is much shorter than the coalescent timescale, we can assume τ ≪ 1. This implies that the integrals over Δ < 0 are negligible compared with those over Δ > 0. By evaluating the integrals over Δ > 0 using the scaling form of P(x,τ) and comparing them with Equation F.19, we have

$$-\eta\big(2g(\xi)+\xi g'(\xi)\big)=\int_0^{\infty}\frac{d\delta}{\delta^{\alpha+1}}\left((\xi-\delta)g(\xi-\delta)\Theta(\xi-\delta)-\xi g(\xi)+\delta\frac{d}{d\xi}\big(\xi g(\xi)\big)\right), \tag{F.20}$$

where Θ(·) is the Heaviside step function. Note that the variable of integration has been changed from Δ to $\delta=\Delta\tau^{-\eta}$, and the upper bound of the integral has been extended to +∞ to make the equation analytically tractable. It is convenient to express Equation F.20 in terms of $\Phi(\xi)\equiv\xi g(\xi)$:

$$-\eta\left(\frac{\Phi(\xi)}{\xi}+\Phi'(\xi)\right)=\int_0^{\infty}\frac{d\delta}{\delta^{\alpha+1}}\big(\Phi(\xi-\delta)\Theta(\xi-\delta)-\Phi(\xi)+\delta\Phi'(\xi)\big). \tag{F.21}$$

The solution of the integro-differential equation in Equation F.21 can be obtained as a series expansion. Assume, for small ξ,

$$\Phi(\xi)=c_1\xi^{\beta}+\cdots, \tag{F.22}$$

where c1 is a normalization and the exponent of the leading term is denoted by β(0,1). Here, β<1 is required since we are considering the situation where P(x,τ) is monotonically decreasing in x, while β>0 is required to make P(x,τ) normalizable. By substituting Equation F.22 into Equation F.21, we have

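$$-\frac{\beta+1}{\alpha-1}\frac{1}{\xi^{1-\beta}}+\cdots=\frac{\Gamma(-\alpha)\Gamma(1+\beta)}{\Gamma(1-\alpha+\beta)}\frac{1}{\xi^{\alpha-\beta}}+\cdots. \tag{F.23}$$

Since $\frac{1}{\xi^{1-\beta}}\ll\frac{1}{\xi^{\alpha-\beta}}$ for ξ ≪ 1, the two sides can balance only if the coefficient $\frac{\Gamma(-\alpha)\Gamma(1+\beta)}{\Gamma(1-\alpha+\beta)}$ vanishes, which is possible only when $\Gamma(1-\alpha+\beta)$ diverges. Since 1 < α < 2 and 0 < β < 1, we conclude that $\beta=\alpha-1$. Therefore, the leading term of g(ξ) is given by

$$g(\xi)=\frac{c_1}{\xi^{2-\alpha}}+\cdots\quad(\xi\ll1). \tag{F.24}$$

More generally, by starting from the ansatz,

$$\Phi(\xi)=\sum_{m=1}^{\infty}c_m\xi^{(\alpha-1)m}, \tag{F.25}$$

the coefficients $c_2,c_3,\ldots$ can be determined iteratively:

$$c_{m+1}=-\frac{1+(\alpha-1)m}{\alpha-1}\,\frac{\Gamma\big(m(\alpha-1)\big)}{\Gamma(-\alpha)\,\Gamma\big(m(\alpha-1)+\alpha\big)}\,c_m\qquad(m=1,2,\ldots). \tag{F.26}$$

By using this iteratively, we can express Φ(ξ) as

$$\Phi(\xi)=c_1\sum_{m=1}^{\infty}\frac{(-1)^{m+1}\,\Gamma(\alpha+1)\left(\frac{\alpha}{\alpha-1}\right)_{m-1}}{\alpha(\alpha-1)^m\,\Gamma(-\alpha)^{m-1}\,\Gamma(m+1)\,\Gamma\big(m(\alpha-1)\big)}\,\xi^{(\alpha-1)m}, \tag{F.27}$$

where $\left(\frac{\alpha}{\alpha-1}\right)_{m-1}$ is the Pochhammer symbol, $(q)_n=\Gamma(q+n)/\Gamma(q)$. The analytic expression for g(ξ) can be obtained from this using $g(\xi)=\Phi(\xi)/\xi$.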
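The recursion F.26 and the closed-form coefficients of F.27 should agree term by term. The truncated series should also reduce to $c_1\xi^{\alpha-1}$ as ξ → 0. The following minimal Python sketch checks both, with α = 1.7 as in Figure F1 and $c_1=1$. It evaluates the series only at small ξ. The van Wijngaarden acceleration mentioned in the Figure F1 caption is not reproduced here, so no claim is made about large ξ.

```python
import math


def coeffs_recursive(alpha, m_max, c1=1.0):
    """c_1 ... c_{m_max} from the recursion in Equation F.26."""
    c = [c1]
    for m in range(1, m_max):
        factor = -(1 + (alpha - 1) * m) / (alpha - 1) * math.gamma(m * (alpha - 1)) / (
            math.gamma(-alpha) * math.gamma(m * (alpha - 1) + alpha))
        c.append(c[-1] * factor)
    return c


def coeff_closed(alpha, m, c1=1.0):
    """c_m read off from the closed-form series, Equation F.27."""
    q = alpha / (alpha - 1)
    pochhammer = math.gamma(q + m - 1) / math.gamma(q)     # (q)_{m-1}
    return c1 * (-1) ** (m + 1) * math.gamma(alpha + 1) * pochhammer / (
        alpha * (alpha - 1) ** m * math.gamma(-alpha) ** (m - 1)
        * math.gamma(m + 1) * math.gamma(m * (alpha - 1)))


def phi(xi, alpha, m_max=40):
    """Truncated series Phi(xi) of Equation F.25 with c_1 = 1 (small xi only)."""
    c = coeffs_recursive(alpha, m_max)
    return sum(cm * xi ** ((alpha - 1) * (k + 1)) for k, cm in enumerate(c))


alpha = 1.7
rec = coeffs_recursive(alpha, 10)
closed = [coeff_closed(alpha, m) for m in range(1, 11)]
print(max(abs(r - c) / abs(c) for r, c in zip(rec, closed)))
print(phi(1e-4, alpha) / 1e-4 ** (alpha - 1))   # tends to c_1 = 1 as xi -> 0
```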

On the other hand, for ξ1, we expect that g(ξ) decreases in the same way as the offspring distribution does;

$$g(\xi)\propto\frac{1}{\xi^{\alpha+1}}+\cdots\quad(\xi\gg1). \tag{F.28}$$

Therefore, we expect a crossover point $\xi_c$ such that $g(\xi)\propto1/\xi^{2-\alpha}$ for $\xi\ll\xi_c$ and $g(\xi)\propto1/\xi^{\alpha+1}$ for $\xi\gg\xi_c$. The scaling form for $\xi\gg\xi_c$ can indeed be confirmed by considering the following ansatz for Φ(ξ):

$$\Phi(\xi)=\begin{cases}c_1\xi^{\alpha-1}+\cdots & (\xi<\xi_c)\\ c'\xi^{-\alpha'}+\cdots & (\xi>\xi_c)\end{cases} \tag{F.29}$$

where c′ is a normalization constant and α′ is an exponent to be determined. Substituting this ansatz into Equation F.21, we can show that α′ = α, leading to $g(\xi)\propto1/\xi^{\alpha+1}$ for $\xi>\xi_c$.

Finally, we remark that, although Equation F.27 is derived assuming ξ ≪ 1, the series converges for any ξ > 0. This indicates that the scaling form $g(\xi)\propto1/\xi^{\alpha+1}$ for large ξ should follow directly from a resummation of the infinite series in Equation F.27. Indeed, numerical evaluation of a finite truncation of the series reproduces the crossover behavior of Equation F.29 (see Figure F1).

Figure F1.

The infinite series in Equation F.27 is evaluated numerically by truncating at m = 150 and using the van Wijngaarden transformation (solid line), with α = 1.7. The dashed blue and red lines represent the asymptotic behaviors given in Equation F.29.

Appendix G: Site frequency spectra in presence of selection

Here, we discuss the effect of genuine selection on the SFS using the effective bias when 1 < α < 2. As discussed in the main text, there is a crossover point $x_c$, given in Equation 39, below which selection is negligible compared with the effective bias (see Figure 13). Thus, we expect the SFS to become independent of the selective advantage σ for sufficiently small frequencies x. Similarly, at the high-frequency end $1-x\ll1$, selection is negligible compared with the effective bias. Therefore, we expect that $f_{\mathrm{SFS}}(x)\propto1/V_{\mathrm{eff}}(x)\propto(1-x)^{\alpha-2}$ even in the presence of natural selection. In particular, the exponent is independent of σ. Figure G1 shows the numerical results for α = 1.5. As x approaches 0, the SFS becomes independent of the selective advantage σ. For frequent variants ($1-x\ll1$), the SFS is well fitted by $(1-x)^{\alpha-2}$, while its magnitude increases with σ. A similar result can be obtained analytically when α = 1 (see Appendix A).

Figure G1.

Left: The SFS under positive selection, s = 0, 0.005, 0.01, 0.02, with α = 1.5 and $N=10^6$. Right: The SFS near x = 1. The straight lines are drawn assuming $\mathrm{SFS}(x)\propto1/(1-x)^{2-\alpha}$. The slope is almost independent of s.

Appendix H: Derivation of the rate of adaptation in Equation 46 of the main text

Here, we conjecture the rate of adaptation for an asexual population with a broad offspring distribution (1<α<2) in the clonal-interference regime, using a self-consistency condition argument described in Desai and Fisher (2007).

We assume that mutations have a fixed effect s much larger than the mutation rate μB at which they arise. First, we consider the dynamics of the fittest sub-population that becomes established at the nose of the fitness wave. We can estimate the size of the sub-population when established from the establishment probability of a single fittest mutant;

$$N_{\mathrm{est}}\simeq\frac{1}{P_{\mathrm{fix}}(qs)}, \tag{H.1}$$

where qs ($q\in\mathbb{N}$) is the fitness lead of the subpopulation relative to the mean of the whole population. The fixation probability is given by Equation 42, $P_{\mathrm{fix}}(qs)\propto(qs)^{\frac{1}{\alpha-1}}$. In the time it takes this subpopulation to be seeded and become established, the mean fitness should increase by s. This implies that, after its establishment, this subpopulation will initially grow exponentially at rate $(q-1)s$. The growth rate slows down to 0 when it fixes. Therefore, the time from establishment to fixation can be estimated as

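$$t_{\mathrm{fix}}\simeq\frac{1}{(q-1)s/2}\ln\frac{N}{N_{\mathrm{est}}}=\frac{1}{(q-1)s/2}\ln\big(NP_{\mathrm{fix}}(qs)\big) \tag{H.2}$$

where $(q-1)s/2$ is its average growth rate between establishment and fixation. Thus, the rate of adaptation is given by

$$R=\frac{(q-1)s}{t_{\mathrm{fix}}}\simeq\frac{\big((q-1)s\big)^2}{2\ln\big(NP_{\mathrm{fix}}(qs)\big)}. \tag{H.3}$$

Second, we focus on successive establishment events at the edge of the fitness wave. We define $t_{\mathrm{est}}$ as the mean time interval between two successive establishments. An established subpopulation grows as $n(t)\simeq N_{\mathrm{est}}e^{(q-1)st}$, and it produces the next establishment event at rate $n(t)\mu_BP_{\mathrm{fix}}(qs)$. Therefore, $t_{\mathrm{est}}$ can be estimated from

$$\mu_BP_{\mathrm{fix}}(qs)\int_0^{t_{\mathrm{est}}}n(t)\,dt\simeq1, \tag{H.4}$$

which leads to $t_{\mathrm{est}}\simeq\frac{1}{(q-1)s}\ln\left[\frac{s}{\mu_B}\right]$. Since the nose of the fitness wave advances at speed $R=s/t_{\mathrm{est}}$, we have

$$R=\frac{s}{t_{\mathrm{est}}}\simeq\frac{(q-1)s^2}{\ln\frac{s}{\mu_B}}. \tag{H.5}$$

By comparing Equations H.3 and H.5, we obtain

$$q\simeq1+\frac{2\ln\big(NP_{\mathrm{fix}}(qs)\big)}{\ln\frac{s}{\mu_B}},\qquad R\simeq\frac{2s^2\ln\big(NP_{\mathrm{fix}}(qs)\big)}{\left(\ln\frac{s}{\mu_B}\right)^2}. \tag{H.6}$$

By substituting $P_{\mathrm{fix}}(qs)\propto(qs)^{\frac{1}{\alpha-1}}$ into Equation H.6, we obtain

$$q\simeq1+\frac{2\ln\left(Ns^{\frac{1}{\alpha-1}}\right)}{\ln\frac{s}{\mu_B}},\qquad R\simeq\frac{2s^2\ln\left(Ns^{\frac{1}{\alpha-1}}\right)}{\left(\ln\frac{s}{\mu_B}\right)^2}, \tag{H.7}$$

where we used $\ln\left(Nq^{\frac{1}{\alpha-1}}\right)\simeq\ln N$. In the limit α → 2, these results reproduce those of Desai and Fisher (2007).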
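Equation H.7 can be evaluated directly. The minimal Python sketch below drops the O(1) prefactors of $P_{\mathrm{fix}}$, as the scaling argument does. The parameter values are illustrative assumptions, not the values used in Figure H1.

```python
import math


def clonal_interference(N, s, mu_b, alpha):
    """Lead q and adaptation rate R from Equation H.7 (1 < alpha <= 2).

    O(1) prefactors of P_fix are ignored, as in the scaling argument.
    """
    log_est = math.log(N * s ** (1 / (alpha - 1)))   # ln(N P_fix) up to O(1) terms
    log_mut = math.log(s / mu_b)
    q = 1 + 2 * log_est / log_mut
    R = 2 * s**2 * log_est / log_mut**2
    return q, R


q, R = clonal_interference(N=1e10, s=0.01, mu_b=1e-4, alpha=1.5)
print(q, R)
```

With these numbers, $Ns^{2}=10^{6}$ and $s/\mu_B=10^{2}$, so $q=1+2\cdot3=7$ and $R=6s^2/\ln100\approx1.3\times10^{-4}$. Setting α = 2 turns the logarithm into $\ln(Ns)$, which is the Desai–Fisher form.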

The case α = 1 can be treated in a similar way. Suppose that the population is monoclonal. The fixation probability of a mutant is given by $P_{\mathrm{fix}}\simeq N^{-1+s}$ (see Equation A.25), which implies that the establishment size is roughly $N_{\mathrm{est}}\simeq N^{1-s}$. The timescale for the establishment of a mutant is $(\mu_BNP_{\mathrm{fix}})^{-1}=(\mu_BN^s)^{-1}$, while the timescale of fixation is $t_{\mathrm{fix}}\simeq\frac1s\log\frac{N}{N_{\mathrm{est}}}\simeq\log N$. Thus, successive selective sweeps occur if $(\mu_BN^s)^{-1}\gg\log N$, or equivalently,

$$\mu_BN^s\log N\ll1\quad(\text{successive selective sweeps}). \tag{H.8}$$

By substituting $P_{\mathrm{fix}}\simeq N^{-1+s}$ into Equation H.6, the rate of adaptation in the clonal-interference regime is given by

$$R\simeq\frac{2s^3\ln N}{\left(\ln\frac{s}{\mu_B}\right)^2}. \tag{H.9}$$

In the successive-sweeps regime, the adaptation rate is given by

$$R=s\mu_BN\times P_{\mathrm{fix}}(s)\simeq s\mu_BN^s. \tag{H.10}$$

Note that clonal interference becomes less likely as the offspring distribution becomes broader. For example, when α = 1, the population size needs to be $N\gtrsim10^{41}$ for $\mu_B=10^{-4}$ and s = 0.05 to satisfy $\mu_BN^s\log N\gtrsim1$.
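The threshold quoted above can be reproduced by solving $\mu_BN^s\ln N=1$ for $\log_{10}N$. The minimal Python sketch below uses bisection, assuming the natural logarithm in Equation H.8 and working in $\log_{10}N$ to avoid overflow. The function names are hypothetical.

```python
import math


def sweep_parameter(log10_N, s, mu_b):
    """mu_B * N^s * ln N, the combination appearing in Equation H.8 (alpha = 1)."""
    ln_N = log10_N * math.log(10)
    return mu_b * math.exp(s * ln_N) * ln_N


def clonal_interference_threshold(s, mu_b, lo=1.0, hi=400.0, tol=1e-10):
    """Bisection for the log10 N at which mu_B N^s ln N = 1 (monotonic in N)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sweep_parameter(mid, s, mu_b) < 1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(clonal_interference_threshold(s=0.05, mu_b=1e-4))   # about 40.6, i.e. N ~ 1e41
```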

Figure H1 shows the numerical results for the adaptation rate R versus the selection coefficient s. The parameters used in the simulations are in the clonal-interference regime. When α > 1, R is approximately proportional to $s^2$, whereas when α = 1, R is approximately proportional to $s^3$. Both trends are consistent with Equations H.6 and H.9. However, when α = 1, the quantitative agreement between the numerical results and the theoretical prediction is poor, and further investigation is needed to validate Equation H.9.

Figure H1.

The open markers show the numerical results for R as a function of s, while the curves show the theoretical predictions based on the heuristic argument. The rate of beneficial mutations is $\mu=10^{-4}$. The population size is $N=10^{100}$ for α = 1, $N=10^{10}$ for α = 1.5, and $N=10^8$ for the Wright–Fisher model.

Appendix I: Numerical simulations

Simulations are implemented in C++ with the GNU Scientific Library's random number generators. Results obtained from the simulations are analyzed with Mathematica. The code is freely available upon request.

Numerical synthesis of Pareto random variables and α-stable distribution

In order to generate the mutant frequency of the gamete pool, we need to compute the sums of random Pareto variables,

$$M=\sum_{i=1}^{Nx}u_i,\qquad W=\sum_{i=1}^{N(1-x)}v_i, \tag{I.1}$$

where $u_i$, $v_i$ are drawn from the Pareto distribution $P_U(u)=\alpha/u^{\alpha+1}$ ($u\geq1$). One simple way to synthesize $u_i$, $v_i$ is to sample a number r from the uniform distribution on (0, 1) and compute $r^{-1/\alpha}$.

To generate the sums M and W efficiently for large N (e.g., $N\sim10^6$), we can use the generalized central limit theorem when xN and (1 − x)N are large. In the simulations, when xN < 100, M is generated directly by synthesizing xN random variables $\{u_i\}$. When xN ≥ 100, M is generated by sampling a random number ζ from the α-stable distribution and then determining $M=\sum_iu_i$ from Equation C.1. W is generated in a similar way.
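The hybrid scheme just described can be sketched in Python for 1 < α < 2. This is not the authors' C++/GSL code. We substitute the Chambers–Mallows–Stuck (CMS) method for drawing totally skewed (β = 1) stable numbers, in the parametrization of Equation C.3. The text does not state which stable-law routine was actually used. The threshold of 100 follows the text. The function names are our own.

```python
import math
import numpy as np


def pareto_sum_direct(k, alpha, rng):
    """Sum of k Pareto(alpha) numbers via inverse transform, u = r**(-1/alpha)."""
    r = 1.0 - rng.random(k)           # uniform on (0, 1], avoids r = 0
    return float(np.sum(r ** (-1.0 / alpha)))


def stable_skewed(alpha, rng, size=None):
    """Totally skewed (beta = 1) alpha-stable draws via Chambers-Mallows-Stuck.

    Characteristic function exp[-|s|^alpha (1 - i sgn(s) tan(pi alpha / 2))],
    i.e. the alpha != 1 case of Equation C.3.
    """
    t = math.tan(math.pi * alpha / 2)
    b = math.atan(t) / alpha
    scale = (1 + t * t) ** (1 / (2 * alpha))
    v = rng.uniform(-math.pi / 2, math.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (scale * np.sin(alpha * (v + b)) / np.cos(v) ** (1 / alpha)
            * (np.cos(v - alpha * (v + b)) / w) ** ((1 - alpha) / alpha))


def pareto_sum(k, alpha, rng, threshold=100):
    """Hybrid scheme for 1 < alpha < 2: direct sum below threshold, stable law above."""
    if k < threshold:
        return pareto_sum_direct(k, alpha, rng)
    a_k = alpha / (alpha - 1) * k                                   # Equation C.2
    b_k = (math.pi * k / (2 * math.gamma(alpha) * math.sin(math.pi * alpha / 2))) ** (1 / alpha)
    return a_k + b_k * float(stable_skewed(alpha, rng))            # invert Equation C.1


rng = np.random.default_rng(0)
print(pareto_sum(50, 1.5, rng), pareto_sum(10**5, 1.5, rng))
```

The cost of each draw no longer grows with the number of summands once that number exceeds the threshold. This is what makes population sizes around $10^6$ or larger practical.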

After generating M and W, the population is updated by binomial sampling with the success probability $p=\frac{M}{M+W}$. This sampling step can be omitted when α ≤ 2, because the fluctuations associated with binomial sampling are negligible compared with those associated with M and W. Natural selection and mutations are implemented by modifying the success probability $p=\frac{M}{M+W}$ as

$$p\to\frac{p(1+s)}{p(1+s)+(1-p)}(1-\mu_{M\to W})+\frac{1-p}{p(1+s)+(1-p)}\mu_{W\to M}, \tag{I.2}$$

where $\mu_{W\to M}$ is the mutation rate from the wild-type to the mutant allele, and $\mu_{M\to W}$ is the mutation rate in the reverse direction.
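Equation I.2 applies selection to the gamete pool first and mutation second. A minimal Python sketch of this update, followed by the optional binomial resampling, is shown below. The function and keyword names are our own.

```python
import numpy as np


def sampling_probability(M, W, s=0.0, mu_wm=0.0, mu_mw=0.0):
    """Mutant success probability after selection and mutation, Equation I.2."""
    p = M / (M + W)
    mean_fitness = p * (1 + s) + (1 - p)
    p_sel = p * (1 + s) / mean_fitness                   # selection on the gamete pool
    return p_sel * (1 - mu_mw) + (1 - p_sel) * mu_wm     # then mutation in both directions


def next_generation(M, W, N, rng, **kwargs):
    """Binomial resampling of N offspring (optional when alpha <= 2, per the text)."""
    p = sampling_probability(M, W, **kwargs)
    return rng.binomial(N, p) / N


rng = np.random.default_rng(0)
print(sampling_probability(1.0, 1.0, s=0.1), next_generation(3.0, 1.0, 1000, rng))
```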

Site frequency spectrum

Since the SFS is proportional to the mean sojourn time, the SFS can be computed numerically by generating trajectories starting from $x_0=1/N$ until fixation or extinction, and measuring how many times, on average, a trajectory visits a given frequency interval.

Numerical simulation of the model of range expansion in the main text

We first review the numerical implementation of the range expansion model with two neutral alleles without mutations (Birzu et al. 2018). The per capita growth rate r(n) with an Allee effect is given by

r(n)=r0(1nK)(1+BnK), (I.3)

where n=n1+n2 is the sum of the two population densities, and B is the strength of cooperativity. In each deme, there are three types; allele 1, allele 2, and “empty.” At each time step, the configuration of deme x is updated by the trinomial sampling process with

pi=n˜iK(1r(n˜)τ)fori=1,2andpempty=1p1p2, (I.4)

where $\tilde n_i$ is the population density after migration,

$$\tilde n_i(t,x)=\frac{m}{2}\,n_i(t,x-a)+(1-m)\,n_i(t,x)+\frac{m}{2}\,n_i(t,x+a), \tag{I.5}$$

and $\tilde n$ in the denominator of Equation I.4 is the sum of these densities, $\tilde n=\tilde n_1+\tilde n_2$; $a$ denotes the width of a deme. The expected total density $n$ after one time step is

$$K\sum_{i=1,2}p_i=\frac{\tilde n}{1-r(\tilde n)\tau}\approx\tilde n\,\bigl(1+r(\tilde n)\tau\bigr), \tag{I.6}$$

which explains the denominator of Equation I.4. In the simulation, a = 1 and τ = 1 are used.
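One generation of the deme update in Equations I.3–I.5 can be sketched as below, assuming NumPy, $a=\tau=1$, integer counts $n_i$ per deme, and reflecting boundaries at the two ends of a fixed lattice (the comoving-frame bookkeeping used to follow the front is omitted). `growth_rate`, `migrate`, and `expansion_step` are hypothetical names.

```python
import numpy as np


def growth_rate(n, r0, big_k, b):
    """Per capita growth rate with an Allee effect (Equation I.3)."""
    return r0 * (1.0 - n / big_k) * (1.0 + b * n / big_k)


def migrate(n, m):
    """Nearest-neighbour migration (Equation I.5); reflecting ends are an assumption."""
    left = np.concatenate(([n[0]], n[:-1]))          # n(t, x - a)
    right = np.concatenate((n[1:], [n[-1]]))         # n(t, x + a)
    return 0.5 * m * left + (1.0 - m) * n + 0.5 * m * right


def expansion_step(n1, n2, big_k, r0, b, m, rng, tau=1.0):
    """Migration followed by trinomial sampling of K slots per deme (Equation I.4)."""
    t1, t2 = migrate(n1.astype(float), m), migrate(n2.astype(float), m)
    # p1 + p2 <= 1 requires r0*tau*(1 + B) <= 1, which holds for r0*tau = 0.01
    scale = big_k * (1.0 - growth_rate(t1 + t2, r0, big_k, b) * tau)
    p1, p2 = t1 / scale, t2 / scale
    new1, new2 = np.empty_like(n1), np.empty_like(n2)
    for i in range(len(n1)):
        rest = max(0.0, 1.0 - p1[i] - p2[i])         # probability of an empty slot
        c = rng.multinomial(big_k, [p1[i], p2[i], rest])
        new1[i], new2[i] = c[0], c[1]
    return new1, new2
```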

As in the standard Wright–Fisher model, a mutation process can be introduced by using the success probabilities $\mathbf{p}'=(p_1',p_2')^T$ given by

$$\mathbf{p}'=U\mathbf{p}, \tag{I.7}$$

where $\mathbf{p}=(p_1,p_2)^T$ and $U$ is a matrix representing mutational transitions. For the symmetric mutations considered in the main text, $U$ is given by

$$U=\begin{pmatrix}1-\mu & \mu\\ \mu & 1-\mu\end{pmatrix}. \tag{I.8}$$
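Applied to the success probabilities of a deme, Equations I.7–I.8 amount to a single matrix–vector product. A minimal sketch assuming NumPy (`apply_mutation` is a hypothetical name); because each column of $U$ sums to one, the total $p_1+p_2$, and hence $p_{\text{empty}}$, is unchanged.

```python
import numpy as np


def apply_mutation(p1, p2, mu):
    """Symmetric mutation p' = U p (Equations I.7-I.8); p1, p2 may be arrays over demes."""
    u = np.array([[1.0 - mu, mu], [mu, 1.0 - mu]])
    return u @ np.array([p1, p2])


print(apply_mutation(0.3, 0.5, 0.1))   # [0.32 0.48]
```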

This model serves as a microscopic description of our (non-spatial) macroscopic model of a population with a broad offspring distribution $p(U=u)\propto 1/u^{\alpha+1}$. We can relate the parameters of the two models by comparing their coalescent timescales. As established in Birzu et al. (2018), for a semi-pushed wave ($2<B<4$), the coalescent timescale is given by

$$T_c^{\text{micro}}\propto N^{\frac{2\sqrt{1-\gamma(B)^2}}{1-\sqrt{1-\gamma(B)^2}}}, \tag{I.9}$$

where $\gamma(B)=v_F/v=2\left(\sqrt{B/2}+\sqrt{2/B}\right)^{-1}$ is the ratio of the Fisher velocity $v_F=2\sqrt{Dr_0}$ to the wave velocity $v=\sqrt{r_0D}\left(\sqrt{B/2}+\sqrt{2/B}\right)$. On the other hand, the coalescent timescale $T_c^{\text{macro}}$ in the macroscopic description for $1<\alpha<2$ is proportional to $N^{\alpha-1}$ (see Equation 15). By comparing the exponents, a semi-pushed wave with cooperativity $B$ corresponds to the macroscopic model with⁴

$$\alpha=\frac{2\sqrt{1-\gamma(B)^2}}{1-\sqrt{1-\gamma(B)^2}}+1. \tag{I.10}$$

For example, $B=3$ corresponds to $\alpha=1.5$. In addition, the mutation rate $\mu_{\text{micro}}$ per generation in the microscopic model and the mutation rate $\mu_{\text{macro}}$ per generation in the macroscopic model should be related by $\mu_{\text{micro}}\times T_c^{\text{micro}}\approx\mu_{\text{macro}}\times T_c^{\text{macro}}$.
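As an arithmetic check of Equation I.10, the mapping from the cooperativity $B$ to the matched exponent $\alpha$ can be evaluated directly (plain Python; the function names are hypothetical). It gives $\alpha=1$ at the pulled/pushed boundary $B=2$, $\alpha=1.5$ at $B=3$ (where $\gamma^2=24/25$), and $\alpha=2$ at $B=4$ (where $\gamma^2=8/9$).

```python
import math


def gamma_ratio(b):
    """gamma(B) = v_F / v = 2 / (sqrt(B/2) + sqrt(2/B))."""
    return 2.0 / (math.sqrt(b / 2.0) + math.sqrt(2.0 / b))


def matched_alpha(b):
    """Offspring exponent alpha matched to a semi-pushed wave (Equation I.10), 2 <= B <= 4."""
    root = math.sqrt(max(0.0, 1.0 - gamma_ratio(b) ** 2))  # guard tiny negative round-off
    return 2.0 * root / (1.0 - root) + 1.0


for b in (2.0, 3.0, 4.0):
    print(b, round(matched_alpha(b), 6))
```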

For the three panels (Left, Center, Right) of Figure 14B in the main text, the following parameters are used:

  • Left: $B=1$, $\mu=(5\times10^{-4},\,5\times10^{-5})$, $K=28000$ for the microscopic model, and $\alpha=1$, $\theta=(1.5,\,0.15)$ for the macroscopic model.

  • Center: $B=3$, $\mu=(2\times10^{-4},\,2\times10^{-5})$, $K=35000$ for the microscopic model, and $\alpha=1.5$, $\theta=(1.6,\,0.16)$ for the macroscopic model.

  • Right: $B=8$, $\mu=(1\times10^{-5},\,1\times10^{-6})$, $K=57000$ for the microscopic model, and the Wright–Fisher model with $\theta=(2.4,\,0.24)$ for the macroscopic model.

In all three cases, the growth rate $r_0\tau=0.01$ and the migration probability $m=2D\tau/a^2=0.125$ are used in the microscopic model, and the population size $N=10^5$ is used in the macroscopic model. To compare the two models, the carrying capacity $K$ in each case is chosen such that the size of the front population, $K/k$, where $k$ is the spatial decay rate of the population density,⁵ approximately agrees with the population size $N=10^5$ of the macroscopic model.
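The front-size criterion can be checked with a few lines of arithmetic (plain Python; `front_size` is a hypothetical name). With $a=\tau=1$, $m=2D\tau/a^2=0.125$ gives $D=0.0625$, and the decay rate $k$ follows footnote 5. Under these assumptions, $K/k$ comes out near $7\times10^4$ for all three parameter sets, the same order of magnitude as $N=10^5$.

```python
import math


def front_size(big_k, b, r0_tau=0.01, m=0.125, a=1.0, tau=1.0):
    """Front population K/k; D from m = 2 D tau / a**2, k from footnote 5."""
    d = m * a * a / (2.0 * tau)          # diffusion constant
    r0 = r0_tau / tau
    k = math.sqrt(r0 / d) if b < 2 else math.sqrt(r0 * b / (2.0 * d))
    return big_k / k


for big_k, b in [(28000, 1.0), (35000, 3.0), (57000, 8.0)]:
    print(f"B = {b:g}: K/k = {front_size(big_k, b):.0f}")
```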

Appendix J: Areas swept by trajectories

J-1: A scaling argument on area distributions

Consider frequency trajectories that start from a single mutant, $x_0=1/N$, and are eventually absorbed either at $x=0$ or at $x=1$. For each such trajectory, we can define the area in $(x,\tau)$-space swept by the trajectory (see Figure J1),

$$A=\int_0^{\tau_{\text{abs}}}x(\tau)\,d\tau, \tag{J.1}$$

where $\tau_{\text{abs}}$ is the absorption time of the trajectory. Although this quantity is defined for a population without spatial structure, we expect it to have a natural interpretation in a model of range expansion as a spatial integral of the mutant frequency (i.e., the abundance of the mutant type), since $\tau$ in Equation J.1 is related to the spatial position along the traveling wave in the comoving frame.
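For a discrete-generation model, Equation J.1 becomes a Riemann sum over generations. A minimal sketch for the neutral Wright–Fisher case, assuming NumPy and measuring $\tau$ in units of $N$ generations so that $d\tau=1/N$ per generation (the convention used at the end of Appendix J-2); `wf_trajectory` and `swept_area` are hypothetical names. The smallest possible area, for a mutant lost after one generation, is then $1/N^2$.

```python
import numpy as np


def wf_trajectory(n_pop, rng):
    """Neutral Wright-Fisher frequencies from x0 = 1/N until loss or fixation."""
    k, freqs = 1, []
    while 0 < k < n_pop:
        freqs.append(k / n_pop)
        k = int(rng.binomial(n_pop, k / n_pop))
    return np.array(freqs), k == n_pop


def swept_area(freqs, n_pop):
    """Riemann sum for Equation J.1 with d(tau) = 1/N per generation."""
    return float(np.sum(freqs)) / n_pop


rng = np.random.default_rng(4)
freqs, fixed = wf_trajectory(50, rng)
print(fixed, swept_area(freqs, 50))
```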

Figure J1.

An area $A$ swept by a trajectory that eventually goes extinct and an area $A'$ swept by a trajectory that eventually fixes are illustrated. $\tau_{\text{abs}}$ and $\tau'_{\text{abs}}$ are the extinction time and the fixation time, respectively.

Here, we examine how the area $A$ defined in Equation J.1 depends on the exponent $\alpha$ of the offspring distribution. The left panel of Figure J2 shows numerical results for the area distribution $p(A)$ for $\alpha=1, 1.5$, and the Wright–Fisher model (corresponding to $\alpha\ge 2$). Over a wide range of $A$, areas are distributed according to $p(A)\sim 1/\bigl(NA^{1+1/\alpha}\bigr)$.

Figure J2.

Left: The area distribution $p(A)$ for $\alpha=1, 1.5$ and the Wright–Fisher model. The straight lines show the scaling-argument predictions, $p(A)\propto 1/A^{1+1/\alpha}$. $N=10^6$. Right: The tail of $p(A)$ in the large-$A$ region.

Focusing on small areas, which correspond to extinct trajectories, we can again rationalize this power-law behavior with a scaling argument. First, by Equation 3, a trajectory whose maximum frequency is $x^*\ll 1$ sweeps an area roughly given by $A\sim x^*\times\tau_{\text{ext}}\sim (x^*)^{\alpha}$ (see Figure J1), i.e., $x^*\sim A^{1/\alpha}$. Second, by neutrality, the cumulative probability $\Pr(X^*>x^*)$ that a single mutant reaches a frequency larger than $x^*$ before absorption is estimated as $\Pr(X^*>x^*)\sim 1/(Nx^*)$. Hence, the density $p(x^*)$ is given by $p(x^*)=-\frac{d}{dx^*}\Pr(X^*>x^*)\sim 1/\bigl(N{x^*}^2\bigr)$. Combining these two results, we can estimate the area distribution $p(A)$ as

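$$p(A)\sim p(x^*)\left.\frac{dx^*}{dA}\right|_{x^*=A^{1/\alpha}}\sim\left.\frac{1}{N{x^*}^2}\,(x^*)^{-\alpha+1}\right|_{x^*=A^{1/\alpha}}\sim\frac{1}{N}A^{-1-\frac{1}{\alpha}}. \tag{J.2}$$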

When $\alpha\to 2^-$ (the Wright–Fisher limit), the distribution becomes $p(A)\sim 1/\bigl(NA^{3/2}\bigr)$, which can be confirmed analytically by solving a backward diffusion equation for the Wright–Fisher diffusion (see Appendix J-2).
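The change of variables in Equation J.2 can be checked numerically: build $p(x^*)\sim 1/(N{x^*}^2)$ on a grid of $x^*\ll 1$, map it to $A=(x^*)^\alpha$ with the Jacobian $dx^*/dA$, and fit the log–log slope. A sketch assuming NumPy (prefactors of order one are dropped; `area_exponent` is a hypothetical name); the fitted slope should match $-(1+1/\alpha)$, e.g., $-3/2$ in the Wright–Fisher limit.

```python
import numpy as np


def area_exponent(alpha, n_pop=1e6):
    """Log-log slope of p(A) = p(x*) dx*/dA with p(x*) ~ 1/(N x*^2) and A ~ x*^alpha."""
    x_star = np.logspace(-4, -1, 200)              # maximum frequencies x* << 1
    area = x_star ** alpha                         # A ~ x* tau_ext ~ x*^alpha
    p_x = 1.0 / (n_pop * x_star ** 2)              # p(x*) = -d/dx* Pr(X* > x*)
    p_area = p_x * np.gradient(x_star, area)       # Jacobian dx*/dA by finite differences
    return np.polyfit(np.log(area), np.log(p_area), 1)[0]


for alpha in (1.0, 1.5, 2.0):
    print(alpha, area_exponent(alpha), -(1.0 + 1.0 / alpha))
```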

The numerical results indicate that, when $1\le\alpha<2$, the area distribution $p(A)$ shows an uptick at large $A$, which comes from fixed trajectories (see the case $\alpha=1$ in the right panel of Figure J2). The uptick becomes less pronounced as $\alpha$ increases. For the Wright–Fisher model, we can prove analytically that $p(A)$ decreases monotonically with $A$.

J-2: Area distribution in the Wright–Fisher model

Here, we derive the distribution of the area defined in Equation J.1 analytically for the Wright–Fisher diffusion process.

Consider a Langevin equation

$$\frac{dX}{d\tau}=v(X)+\xi(\tau), \tag{J.3}$$

with $\langle\xi(\tau)\xi(\tau')\rangle=2D(X)\,\delta(\tau-\tau')$. Assume the initial value $X(\tau=0)=x_0\in(0,1)$ and absorbing boundaries at $X=0$ and $X=1$. For a given trajectory starting from $x_0$ and ending at either boundary, we consider the “area” defined by

$$A=\int_0^{\tau_{\text{abs}}}X(\tau)\,d\tau, \tag{J.4}$$

where τabs is the absorption time.
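The discretization that follows (Equation J.5) also suggests a direct Monte Carlo estimate of the area distribution. A minimal Euler–Maruyama sketch for the neutral case $v=0$, $D(x)=x(1-x)$, assuming NumPy; clipping at 0 and 1 to implement the absorbing boundaries is a simplification of ours, and `simulate_areas` is a hypothetical name.

```python
import numpy as np


def simulate_areas(x0, n_runs, h, rng, max_steps=1_000_000):
    """Euler-Maruyama for dX/dtau = xi(tau) with <xi xi'> = 2 D(X) delta, D(x) = x(1 - x).

    Returns the area of Equation J.4 for each run and whether the run fixed at X = 1.
    """
    x = np.full(n_runs, float(x0))
    area = np.zeros(n_runs)
    alive = np.ones(n_runs, dtype=bool)
    for _ in range(max_steps):
        if not alive.any():
            break
        xa = x[alive]
        area[alive] += xa * h                                   # left Riemann sum of X d(tau)
        step = rng.standard_normal(xa.size) * np.sqrt(2.0 * xa * (1.0 - xa) * h)
        x[alive] = np.clip(xa + step, 0.0, 1.0)                 # absorb at the boundaries
        alive &= (x > 0.0) & (x < 1.0)
    return area, x >= 1.0


rng = np.random.default_rng(5)
areas, fixed = simulate_areas(0.2, 2000, 1e-3, rng)
print(fixed.mean())    # fixation probability; the neutral value is x0 = 0.2
```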

The area distribution $\Pi(A;x_0)$ for a given initial condition $X(0)=x_0$ obeys a backward equation. To show this, we discretize the dynamics:

$$\Delta X=v\,h+W, \tag{J.5}$$

where $h$ denotes a short time interval and $\langle W_iW_j\rangle=2D\,h\,\delta_{i,j}$. The transition density is given by

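$$T(x_0+\Delta x\,|\,x_0)=\frac{1}{\sqrt{2\pi\,\bigl(2D(x_0)h\bigr)}}\exp\left(-\frac{\bigl(\Delta x-v(x_0)h\bigr)^2}{2\,\bigl(2D(x_0)h\bigr)}\right). \tag{J.6}$$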

Note that

$$\langle\Delta x\rangle_{x_0}=v(x_0)h,\qquad\langle(\Delta x)^2\rangle_{x_0}=\bigl(v(x_0)h\bigr)^2+2D(x_0)h. \tag{J.7}$$

By separating a trajectory into the initial step and the remaining part, we have

$$\Pi(A;x_0)=\int d(\Delta x)\,T(x_0+\Delta x\,|\,x_0)\,\Pi(A-x_0h;\,x_0+\Delta x)+o(h). \tag{J.8}$$

Taylor-expanding $\Pi(A-x_0h;\,x_0+\Delta x)$, we have

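$$\begin{aligned}\Pi(A-x_0h;\,x_0+\Delta x)&=\Pi(A;x_0)-\frac{\partial\Pi}{\partial A}x_0h+\frac{\partial\Pi}{\partial x_0}\Delta x+\frac{1}{2}\frac{\partial^2\Pi}{\partial A^2}x_0^2h^2-\frac{\partial^2\Pi}{\partial A\,\partial x_0}x_0h\,\Delta x+\frac{1}{2}\frac{\partial^2\Pi}{\partial x_0^2}(\Delta x)^2+\cdots\\&=\Pi(A;x_0)-\frac{\partial\Pi}{\partial A}x_0h+\frac{\partial\Pi}{\partial x_0}\Delta x+\frac{1}{2}\frac{\partial^2\Pi}{\partial x_0^2}(\Delta x)^2+o(h).\end{aligned}\tag{J.9}$$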

Therefore, Equation J.8 becomes

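$$\Pi(A;x_0)=\Pi(A;x_0)-\frac{\partial\Pi}{\partial A}x_0h+\frac{\partial\Pi}{\partial x_0}\langle\Delta x\rangle_{x_0}+\frac{1}{2}\frac{\partial^2\Pi}{\partial x_0^2}\langle(\Delta x)^2\rangle_{x_0}+o(h). \tag{J.10}$$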

By using Equation J.7, we obtain

$$x_0\frac{\partial\Pi}{\partial A}=v(x_0)\frac{\partial\Pi}{\partial x_0}+D(x_0)\frac{\partial^2\Pi}{\partial x_0^2}. \tag{J.11}$$

More generally, it can be shown that, for the following integral,

$$\tilde A=\int_0^{T^*}dt\,f(X), \tag{J.12}$$

the distribution $\Pi(\tilde A;x_0)$ satisfies

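$$f(x_0)\frac{\partial\Pi}{\partial\tilde A}=v(x_0)\frac{\partial\Pi}{\partial x_0}+D(x_0)\frac{\partial^2\Pi}{\partial x_0^2}. \tag{J.13}$$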

In the neutral Wright–Fisher model, $v(x_0)=0$ and $D(x_0)=x_0(1-x_0)$, so the backward equation J.11 reduces to

$$\frac{\partial\Pi}{\partial A}=(1-x_0)\frac{\partial^2\Pi}{\partial x_0^2}. \tag{J.14}$$

From this equation, it follows that $\Pi(A;x_0)$ decreases monotonically with $A\ge 0$, because the spectrum of the operator $\partial_{x_0}^2$, which becomes $(ik)^2=-k^2$ in Fourier space, is non-positive.

We can determine the area distribution $p(A)$ analytically, at least for small $A$. We are interested in invasion by a single mutant, $x_0=1/N\ll 1$. Furthermore, to determine the behavior for small areas, we expect that we can ignore the upper boundary $x=1$ and solve the problem on the semi-infinite line $x_0\in(0,\infty)$, where $1-x_0\approx 1$. Therefore, we consider the following problem:

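$$\frac{\partial\Pi}{\partial A}=\frac{\partial^2\Pi}{\partial x_0^2},\qquad\Pi(A;x_0=0)=g(A),\qquad\Pi(A=0;x_0)=0\ \text{ for } x_0>0,\qquad\lim_{x_0\to\infty}\Pi(A;x_0)=0. \tag{J.15}$$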

In our case, $g(A)=\delta(A)$, because a trajectory starting from $x_0=0$ has $A=0$.

For a function $f(A)$ of $A$, we write the Laplace transform as

$$\hat f(s)=\mathcal{L}[f(A)]=\int_0^\infty dA\,f(A)\,e^{-sA}. \tag{J.16}$$

Taking the Laplace transform with respect to $A$ and using $\Pi(A=0;x_0)=0$ for $x_0>0$, we have

$$s\hat\Pi(s;x_0)=\frac{\partial^2\hat\Pi(s;x_0)}{\partial x_0^2},\qquad\hat\Pi(s;0)=\hat g(s). \tag{J.17}$$

The solution that vanishes as $x_0\to\infty$ is

$$\hat\Pi(s;x_0)=e^{-x_0\sqrt{s}}\,\hat g(s). \tag{J.18}$$

Taking the inverse Laplace transform, we obtain

$$\Pi(A;x_0)=\mathcal{L}^{-1}\left[e^{-x_0\sqrt{s}}\,\hat g(s)\right]. \tag{J.19}$$

By the convolution theorem, this is the convolution of $\mathcal{L}^{-1}\bigl[e^{-x_0\sqrt{s}}\bigr]=\frac{x_0}{2\sqrt{\pi}}A^{-3/2}e^{-x_0^2/(4A)}$ and $g(A)$:

$$\Pi(A;x_0)=\int_0^A dA'\,\frac{x_0}{2\sqrt{\pi}}\,A'^{-3/2}e^{-\frac{x_0^2}{4A'}}\,g(A-A'). \tag{J.20}$$

When $g(A)=\delta(A)$, we have

$$\Pi(A;x_0)=\frac{x_0}{2\sqrt{\pi}}\,A^{-3/2}e^{-\frac{x_0^2}{4A}}. \tag{J.21}$$

In particular, for $x_0=1/N$, we have

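$$\Pi\left(A;x_0=\tfrac{1}{N}\right)=\frac{1}{2\sqrt{\pi}N}\,A^{-3/2}e^{-\frac{1}{4AN^2}}\approx\frac{1}{2\sqrt{\pi}N}\,A^{-3/2}, \tag{J.22}$$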

where we have used $e^{-1/(4AN^2)}\approx 1$, since only areas larger than $x_0\times d\tau\sim\frac{1}{N}\times\frac{1}{N}=\frac{1}{N^2}$ are meaningful for a finite population.
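Equations J.18–J.21 can be cross-checked numerically: integrating $\Pi(A;x_0)$ from Equation J.21 against $e^{-sA}$ should reproduce $e^{-x_0\sqrt{s}}$, and $s=0$ should give total probability one. A sketch assuming NumPy, using the trapezoidal rule on a logarithmic grid $A=e^y$ (our choice of quadrature; `area_density` and `laplace_numeric` are hypothetical names).

```python
import numpy as np


def area_density(a, x0):
    """Equation J.21: x0 / (2 sqrt(pi)) * A**(-3/2) * exp(-x0**2 / (4 A))."""
    return x0 / (2.0 * np.sqrt(np.pi)) * a ** -1.5 * np.exp(-x0 ** 2 / (4.0 * a))


def laplace_numeric(x0, s, y_min=-40.0, y_max=40.0, n=200_001):
    """Trapezoidal Laplace transform of Equation J.21 on the grid A = exp(y)."""
    y = np.linspace(y_min, y_max, n)
    a = np.exp(y)
    f = area_density(a, x0) * np.exp(-s * a) * a     # dA = A dy
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y)))


for s in (0.0, 0.5, 2.0):
    print(s, laplace_numeric(1.0, s), np.exp(-1.0 * np.sqrt(s)))
```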

Appendix K: Forward-in-time behaviors of the Eldon–Wakeley model

Here, we present simulation results for the median allele frequency and the median and mean square displacements in the Eldon–Wakeley model (Eldon and Wakeley 2006; see also Der et al. 2012). As shown below, unlike in our model, these quantities do not exhibit sustained power-law behavior, because the offspring distribution has a characteristic size $\psi$.

We consider the neutral Eldon–Wakeley model, whose offspring distribution $P_U(u)$ is given by [see Equation (7) in Eldon and Wakeley (2006)]

$$P_U(u)=\bigl(1-N^{-\gamma}\bigr)\,\delta_{u,2}+N^{-\gamma}\,\delta_{u,\psi N}, \tag{K.1}$$

where $\delta_{a,b}$ is the Kronecker delta, and $\psi\in(0,1)$ and $\gamma$ are the parameters characterizing how large and how frequent the ‘sweepstakes’ events are.

The limiting process as $N\to\infty$ depends on $\gamma$ [see Equation (9) in Der et al. (2012)]. For $\gamma>2$, the process is the same as the Wright–Fisher diffusion, whereas for $\gamma<2$ it is described by a jump process whose backward-time generator is given by

$$\mathcal{L}P(x,\tau)=x\,P\bigl(x+\psi(1-x),\tau\bigr)-P(x,\tau)+(1-x)\,P(x-\psi x,\tau), \tag{K.2}$$

where the continuous time $\tau$ is related to the number of generations $t$ by $\tau=t/N^{\gamma}$. The first term of the generator represents a frequency-increasing jump $x\to x+\psi(1-x)$ with rate $x$, while the last one represents a frequency-decreasing jump $x\to x-\psi x$ with rate $1-x$.

Figure K1 shows numerical simulation results for the median allele frequency and the median and mean square displacements. The median frequency for a small initial frequency $x_0\ll 1$ is well described by $X_{\text{med}}(t)=x_0e^{-\psi N^{-\gamma}t}$ (Figure K1A). This exponential decay can be expected from the generator in Equation K.2: for $x\ll 1$, frequency-increasing jumps (with rate $x$) are unlikely to occur, and the allele frequency typically decreases by $\psi x$ at rate $1-x\approx 1$. Thus, the median frequency in the Eldon–Wakeley model does not exhibit power-law behavior.
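This argument can be illustrated with a direct simulation of the jump process generated by Equation K.2. Because the two jump rates sum to $x+(1-x)=1$, events arrive as a unit-rate Poisson process in $\tau$, and each event is upward with probability $x$. A sketch assuming NumPy, with time measured in the rescaled units $\tau=t/N^{\gamma}$ (`ew_frequency` is a hypothetical name); the usage below takes $x_0=0.005$ so that $x\ll 1$ holds throughout, and compares the sampled median with $x_0e^{-\psi\tau}$.

```python
import numpy as np


def ew_frequency(x0, psi, tau_end, rng):
    """Frequency at time tau_end under the generator of Equation K.2."""
    x = x0
    t = rng.exponential(1.0)                 # waiting time to the first event
    while t <= tau_end:
        if rng.random() < x:                 # upward jump with probability x
            x = x + psi * (1.0 - x)
        else:                                # downward jump with probability 1 - x
            x = x - psi * x
        t += rng.exponential(1.0)
    return x


rng = np.random.default_rng(6)
x0, psi, tau_end = 0.005, 0.1, 10.0
median = np.median([ew_frequency(x0, psi, tau_end, rng) for _ in range(4000)])
print(median / (x0 * np.exp(-psi * tau_end)))   # ratio of sampled median to x0 exp(-psi tau)
```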

Figure K1.

Simulation results for the Eldon–Wakeley model. (A) The median frequency of the Eldon–Wakeley model (red solid) and $X_{\text{med}}(t)=x_0e^{-\psi N^{-\gamma}t}$ (black dashed); $N=10^3$, $\gamma=1$, $\psi=0.1$, $x_0=0.05$. (B) The mean and median square displacements (blue and red curves, respectively). The black dashed line 1/t indicates the expectation from the Wright–Fisher (or Moran) model; $N=10^3$, $\gamma=1$, $\psi=0.2$, $x_0=0.5$.

As for frequency fluctuations, while the mean SD exhibits normal diffusion as in the Moran (or Wright–Fisher) model, i.e., $\text{Mean SD}\propto t$, the median SD does not exhibit sustained power-law behavior (Figure K1B): on short and long timescales, the median SD exhibits normal diffusion ($\text{Median SD}\propto t$), but on an intermediate timescale ($t\sim 500$–$1000$ generations in the figure), it increases more rapidly than expected from normal diffusion.

Footnotes

1

η* is obtained from 0=f(η*)=1η*log(ϵ)+πtanπη*log(ϵ)+πtanπη*log(ϵ)+11η*.

2

Although the magnitudes of –1 and logπlogϵ are small compared to logϵ, we need to retain these two terms because f(η*) contributes to Iϵ through ef(η*).

3

For example, one of the integrals over Δ>0 is

Δ>0dΔfΔ(x)P(x)=Δ>0dΔxΔα+1τ2ηg(ξ)=τα+1α1δ>0dδξδα+1g(ξ),

while one of the integrals over Δ<0 is

Δ<0dΔfΔ(x)P(x)=Δ<0dΔx(x+Δ)α1Δα+1τ2ηg(ξ)=τ2α1δ>0dδξδα+1g(ξ),

where we have changed the integration variable from Δ to δ=Δτη.

4

Note that the definition of the parameter $\alpha_H$ in Birzu et al. (2018) differs from our definition of $\alpha$. For $1<\alpha<2$, which corresponds to the semi-pushed wave region $-1<\alpha_H<0$, the two definitions are related by $\alpha_H=1-\alpha$.

5

Specifically, the rate $k$ is given by $k=\sqrt{r_0/D}$ for $0<B<2$ and by $k=\sqrt{r_0B/(2D)}$ for $B\ge 2$ (Birzu et al. 2018).

Literature cited

  1. Adam DC, Wu P, Wong JY, Lau EH, Tsang TK, et al. 2020. Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong. Nat Med. 26:1714–1719.
  2. Bah B, Pardoux E. 2015. The Λ-lookdown model with selection. Stoch Process Appl. 125:1089–1126.
  3. Barton NH, Etheridge AM. 2011. The relation between reproductive value and genetic contribution. Genetics. 188:953–973.
  4. Basdevant A, Goldschmidt C. 2008. Asymptotics of the allele frequency spectrum associated with the Bolthausen–Sznitman coalescent. Electron J Probab. 13:486–512.
  5. Berestycki J, Berestycki N, Limic V. 2014. Asymptotic sampling formulae for Λ-coalescents. Ann IHP Prob Stat. 50:715–731.
  6. Berg JJ, Coop G. 2014. A population genetic signal of polygenic adaptation. PLoS Genet. 10:e1004412.
  7. Birzu G, Hallatschek O, Korolev KS. 2018. Fluctuations uncover a distinct class of traveling waves. Proc Natl Acad Sci USA. 115:E3645–E3654.
  8. Birzu G, Hallatschek O, Korolev KS. 2021. Genealogical structure changes as range expansions transition from pushed to pulled. Proc Natl Acad Sci. 118:34.
  9. Bollback JP, York TL, Nielsen R. 2008. Estimation of 2Nes from temporal allele frequency data. Genetics. 179:497–502.
  10. Bolthausen E, Sznitman A-S. 1998. On Ruelle’s probability cascades and an abstract cavity method. Commun Math Phys. 197:247–276.
  11. Brunet É, Derrida B, Mueller AH, Munier S. 2007. Effect of selection on ancestry: an exactly soluble case and its phenomenological generalization. Phys Rev E Stat Nonlin Soft Matter Phys. 76:041104.
  12. Cannings C. 1974. The latent roots of certain Markov chains arising in genetics: a new approach, I. Haploid models. Adv Appl Prob. 6:260–290.
  13. Crow JF, Kimura M. 1970. An Introduction to Population Genetics Theory. New York, Evanston and London: Harper & Row, Publishers.
  14. Cvijović I, Good BH, Desai MM. 2018. The effect of strong purifying selection on genetic diversity. Genetics. 209:1235–1278.
  15. Der R, Epstein C, Plotkin JB. 2012. Dynamics of neutral and selected alleles when the offspring distribution is skewed. Genetics. 191:1331–1344.
  16. Der R, Plotkin JB. 2014. The equilibrium allele frequency distribution for a population with reproductive skew. Genetics. 196:1199–1216.
  17. Desai MM, Fisher DS. 2007. Beneficial mutation–selection balance and the effect of linkage on positive selection. Genetics. 176:1759–1798.
  18. Desai MM, Walczak AM, Fisher DS. 2013. Genetic diversity and the structure of genealogies in rapidly adapting populations. Genetics. 193:565–585.
  19. Eldon B. 2009. Structured coalescent processes from a modified Moran model with large offspring numbers. Theor Popul Biol. 76:92–104.
  20. Eldon B. 2011. Estimation of parameters in large offspring number models and ratios of coalescence times. Theor Popul Biol. 80:16–28.
  21. Eldon B, Wakeley J. 2006. Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics. 172:2621–2633.
  22. Etheridge AM, Griffiths RC, Taylor JE. 2010. A coalescent dual process in a Moran model with genic selection, and the lambda coalescent limit. Theor Popul Biol. 78:77–92.
  23. Evans SN, Shvets Y, Slatkin M. 2007. Non-equilibrium theory of the allele frequency spectrum. Theor Popul Biol. 71:109–119.
  24. Ewens WJ. 1963. The diffusion equation and a pseudo-distribution in genetics. J R Stat Soc Series B Methodol. 25:405–412.
  25. Feder AF, Kryazhimskiy S, Plotkin JB. 2014. Identifying signatures of selection in genetic time series. Genetics. 196:509–522.
  26. Fisher R. 1930. The Genetical Theory of Natural Selection. London, UK: Oxford University Press.
  27. Foll M, Shim H, Jensen JD. 2015. WFABC: a Wright–Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Mol Ecol Resour. 15:87–98.
  28. Fusco D, Gralka M, Kayser J, Anderson A, Hallatschek O. 2016. Excess of mutational jackpot events in expanding populations revealed by spatial Luria–Delbrück experiments. Nat Commun. 7:12760.
  29. Gardiner C. 2009. Stochastic Methods, Vol. 4. Berlin: Springer.
  30. Gnedenko BV, Kolmogorov A. 1968. Limit Distributions for Sums of Independent Random Variables, Vol. 233. Reading, MA: Addison-Wesley.
  31. Griffiths RC. 2014. The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion. Adv Appl Prob. 46:1009–1035.
  32. Hallatschek O. 2018. Selection-like biases emerge in population models with recurrent jackpot events. Genetics. 210:1053–1073.
  33. Hallatschek O, Nelson DR. 2008. Gene surfing in expanding populations. Theor Popul Biol. 73:158–170.
  34. Hedgecock D. 1994. Does variance in reproductive success limit effective population sizes of marine organisms. Genet Evol Aquat Organ. 122:122–134.
  35. Karlin S, Taylor HE. 1981. A Second Course in Stochastic Processes. New York: Academic Press.
  36. Kimura M. 1955. Stochastic Processes and Distribution of Gene Frequencies under Natural Selection. Cold Spring Harbor Symp Quant Biol. 20:57–66.
  37. Kimura M. 1969. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics. 61:893–903.
  38. Kosheleva K, Desai MM. 2013. The dynamics of genetic draft in rapidly adapting populations. Genetics. 195:1007–1025.
  39. Krapivsky PL, Redner S, Ben-Naim E. 2010. A Kinetic View of Statistical Physics. New York: Cambridge University Press.
  40. Laxminarayan R, Wahl B, Dudala SR, Gopal K, Neelima S, Reddy KJ, et al. 2020. Epidemiology and transmission dynamics of COVID-19 in two Indian states. Science. 370:691–697.
  41. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. 2005. Superspreading and the effect of individual variation on disease emergence. Nature. 438:355–359.
  42. Luria SE, Delbrück M. 1943. Mutations of bacteria from virus sensitivity to virus resistance. Genetics. 28:491–511.
  43. Neher RA, Hallatschek O. 2013. Genealogies of rapidly adapting populations. Proc Natl Acad Sci USA. 110:437–442.
  44. Neher RA, Shraiman BI. 2011. Genetic draft and quasi-neutrality in large facultatively sexual populations. Genetics. 188:975–996.
  45. Sackman AM, Harris RB, Jensen JD. 2019. Inferring demography and selection in organisms characterized by skewed offspring distributions. Genetics. 211:1019–301684.
  46. Schraiber JG, Evans SN, Slatkin M. 2016. Bayesian inference of natural selection from allele frequency time series. Genetics. 203:493–511.
  47. Schweinsberg J. 2003a. Coalescent processes obtained from supercritical Galton–Watson processes. Stoch Process Appl. 106:107–139.
  48. Schweinsberg J. 2003b. Coalescent processes obtained from supercritical Galton–Watson processes. Stoch Process Appl. 106:107–139.
  49. Schweinsberg J. 2017. Rigorous results for a population model with selection II: genealogy of the population. Electron J Probab. 22:1–54.
  50. Tataru P, Simonsen M, Bataillon T, Hobolth A. 2017. Statistical inference in the Wright–Fisher model using allele frequency data. Syst Biol. 66:e30–e46.
  51. Tellier A, Lemaire C. 2014. Coalescence 2.0: a multiple branching of recent theoretical developments and their applications. Mol Ecol. 23:2637–2652.
  52. Uchaikin VV, Zolotarev VM. 1999. Chance and Stability: Stable Distributions and Their Applications. Utrecht: VSP.
  53. Weissman DB, Desai MM, Fisher DS, Feldman MW. 2009. The rate at which asexual populations cross fitness valleys. Theor Popul Biol. 75:286–300.
  54. Wright S. 1931. Evolution in Mendelian populations. Genetics. 16:97–159.

Data Availability Statement

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.


Articles from Genetics are provided here courtesy of Oxford University Press
