Dynamic sampling bias and overdispersion induced by skewed offspring distributions

Takashi Okada; Oskar Hallatschek

doi:10.1093/genetics/iyab135

Genetics. 2021 Sep 3;219(4):iyab135.

Editor: K Jain

PMCID: PMC8664600 PMID: 34718557

Abstract

Natural populations often show enhanced genetic drift consistent with a strong skew in their offspring number distribution. The skew arises because the variability of family sizes is either inherently strong or amplified by population expansions. The resulting allele-frequency fluctuations are large and, therefore, challenge standard models of population genetics, which assume sufficiently narrow offspring distributions. While the neutral dynamics backward in time can be readily analyzed using coalescent approaches, we still know little about the effect of broad offspring distributions on the forward-in-time dynamics, especially with selection. Here, we employ an asymptotic analysis combined with a scaling hypothesis to demonstrate that over-dispersed frequency trajectories emerge from the competition of conventional forces, such as selection or mutations, with an emerging time-dependent sampling bias against the minor allele. The sampling bias arises from the characteristic time-dependence of the largest sampled family size within each allelic type. Using this insight, we establish simple scaling relations for allele-frequency fluctuations, fixation probabilities, extinction times, and the site frequency spectra that arise when offspring numbers are distributed according to a power law.

Keywords: natural selection, site-frequency spectrum, fixation probability, stationary distribution, traveling waves, Λ-coalescent, jackpot events, multiple mergers

Interpreting the genetic differences between and within populations that we observe today requires a robust understanding of how allele frequencies change over time. Most theoretical and statistical advances have been based on the Wright–Fisher model (Fisher 1930; Wright 1931), which has shaped the intuition of generations of population geneticists about how evolutionary dynamics works (Crow and Kimura 1970). The Wright–Fisher model assumes that the genetic makeup of a generation results from resampling the gene pool of the previous generation, whereby biases are introduced to account for the most relevant evolutionary forces, such as selection, migration, or variable population sizes. For large populations, the resulting dynamics can be approximated by a biased diffusion process, which simplifies the statistical modeling of genetic diversity. More importantly, the Wright–Fisher diffusion is the limiting allele frequency process of a wide variety of microscopic models, as long as they satisfy seemingly mild assumptions (see below). This flexibility has made the Wright–Fisher diffusion the standard model of choice to infer the demographic history of a species, loci under selection, or the strength of polygenic selection (Bollback et al. 2008; Berg and Coop 2014; Feder et al. 2014; Foll et al. 2015; Schraiber et al. 2016; Tataru et al. 2017).

Despite its versatility, the Wright–Fisher diffusion can be a poor approximation when the population dynamics is driven by rare but strong number fluctuations. It is increasingly recognized that number fluctuations can be inflated for very different reasons. First, the species under consideration may have a broad offspring distribution, which occurs for marine species and plants with a Type III survivorship curve (Hedgecock 1994; Eldon and Wakeley 2006) as well as for viruses and fungi [reviewed in Tellier and Lemaire (2014)]. Broad offspring distributions also arise in infectious diseases, when relatively few super-spreaders are responsible for the majority of disease transmissions (Lloyd-Smith et al. 2005). In the recent SARS-CoV-2 pandemic, for example, strongly skewed offspring distributions were consistently inferred from both contact tracing data and infection cluster size distributions (Adam et al. 2020; Laxminarayan et al. 2020). Understanding allele frequency trajectories in these systems is extremely challenging, as statistical inference based on the Wright–Fisher model is often misleading (see, e.g., Sackman et al. 2019).

A second mechanism for strong number fluctuations consists of so-called jackpot events, which can arise in any species regardless of the actual offspring distribution. Jackpot events are population bottlenecks that arise when the earliest, the fittest, or the most advanced individuals have an unusually large number of descendants. Temporal jackpot events ("earliest") were first discovered by Luria and Delbrück (1943) and studied as a signature of spontaneous mutations in an expanding population. They observed that a phage-resistant mutant clone can grow exceptionally large if the resistance mutation by chance occurs early in an expansion. Despite being rare, these jackpot events are easily detectable in large populations because they strongly inflate the variance of the number of mutants and lead to power-law descendant distributions.

The very same descendant distribution arises in models of rampant adaptation and background selection. In these models, mutations generate jackpot events when they arise within the few fittest individuals (Neher and Hallatschek 2013). Jackpot events also arise in range expansions, where the most advanced individuals in the front of the population have a good chance to leave many descendants over the next few generations. This phenomenon of gene surfing can produce a wide range of scale-free descendant distributions (Hallatschek and Nelson 2008; Fusco et al. 2016; Birzu et al. 2018, 2021).

To account for skewed offspring distributions, a number of theoretical studies have been conducted within the coalescent framework. In this backward-in-time framework, a striking feature of broad offspring distributions is the simultaneous merging of multiple lineages. One of the most widely studied models is the beta-coalescent (Schweinsberg 2003a), which is a subclass of the Λ-coalescent and corresponds to population dynamics with a power-law offspring number distribution $\propto u^{-(1 + α)}$. The case α = 1, called the Bolthausen–Sznitman coalescent (Bolthausen and Sznitman 1998), has been shown to be the limiting coalescent in models of so-called “pulled” traveling waves, which describe the most basic scenarios of range expansions (Brunet et al. 2007) and of rampant adaptation (Desai et al. 2013; Kosheleva and Desai 2013; Neher and Hallatschek 2013; Schweinsberg 2017). Moreover, so-called “semi-pushed” traveling waves that contain some level of cooperativity, induced e.g. by an Allee effect, generate power-law offspring distributions with $1 < α < 2$ (Birzu et al. 2018), indicating that their coalescent is intermediate between the Bolthausen–Sznitman and Kingman coalescents.

The tractability of coalescent approaches makes them particularly useful for inferring demographic histories and detecting outlier behaviors (Basdevant and Goldschmidt 2008; Eldon 2009, 2011). However, as it is notoriously difficult to integrate selection into coalescent frameworks, there is also a strong need for forward-in-time approaches that capture the competition between genetic drift and selection. While for $α \geq 2$ the limiting allele frequency dynamics is given by the well-understood Wright–Fisher process, much less is known for the case $α < 2$. This is unfortunate because, as mentioned above, any exponent $1 \leq α \leq 2$ can arise dynamically.

Recently, the forward-in-time dynamics of the special case α = 1 was studied by one of the authors (Hallatschek 2018), who found that an emergent sampling bias generates strong deviations from the Wright–Fisher dynamics. The sampling bias arises because, in each generation, an allele at high frequency samples the offspring distribution more often and, hence, deeper into its tail than an allele at low frequency. The major allele of a biallelic site, therefore, has with high probability a greater number of offspring per individual than the minority type. This sampling bias acts like a selective advantage of the major allele, but its average effect is compensated by rare frequency hikes of the minor allele, so that the expected frequency changes only in the presence of genuine selection.

Here, we focus on the understudied case $1 < α < 2$, intermediate between the known cases $α \geq 2$, corresponding to the Wright–Fisher diffusion, and α = 1, described by jumps and sampling bias but vanishing diffusion. As in the borderline case α = 1, we find that a minor-allele-suppressing sampling bias arises, but that it fades over time as the offspring distributions are sampled more and more thoroughly. This time-dependent sampling bias determines the scaling of the fixation probability, extinction time, stationary distribution, and site frequency spectrum (SFS). The combination of jumps and bias generates a so-called Lévy flight, which controls the variability of allele frequency trajectories, for instance between unlinked genes or between populations. The flexibility of our model should make it possible to fit a wide range of cases that deviate from the Wright–Fisher diffusion.

Model and methods

Model

To study the impact of broad offspring numbers, we consider an idealized, panmictic, haploid population of constant size N that produces non-overlapping generations in the following way. First, we associate with each individual i a “reproductive value” (Fisher 1930; Barton and Etheridge 2011) U_i, which represents its expected contribution to the population of the next generation. The random numbers U_i are drawn from a specified distribution P_U. In a second step, we sample each individual in proportion to its reproductive value until we have obtained N new individuals representing the next generation.

Our model belongs to the general class of Cannings models (Cannings 1974). The Wright–Fisher model is obtained if we choose P_U to be a Dirac delta function, such that all individuals have the same reproductive value.

We focus most of our analysis on the dynamics of two mutually exclusive alleles, a wild type and a mutant allele. The dynamics of the two alleles is captured by the time-dependent frequency X(t) ( $0 \leq X \leq 1$ ) of mutants. The wild type frequency is given by $1 - X (t)$ . The total reproductive values M and W of the mutant population and the wild type population, respectively, are given by

M \equiv \sum_{i = 1}^{N X} U_{i}^{(M)}, W \equiv \sum_{i = 1}^{N (1 - X)} U_{i}^{(W)} .

(1)

Here, $U_{i}^{(M)}$ and $U_{i}^{(W)}$ are the individual reproductive values of mutants and wild types and sampled from the distribution P_U. The population at the next generation is generated by binomially sampling N individuals with success probability (namely, the probability that the parent of a randomly chosen individual is a mutant) $\frac{M}{M + W}$ . Mutations and selection are included as in the Wright–Fisher model. If the fitness of the mutant relative to the wild-type is $1 + s$ , where s is the selection coefficient, and the forward- and back- mutation rates are μ₁ and μ₂, respectively, then the success probability is given by $\frac{(1 - μ_{2}) (1 + s) M + μ_{1} W}{(1 + s) M + W}$ .
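One generation of the model described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' simulation code: the helper name `next_generation` and its argument names are hypothetical, and the reproductive values use the Pareto choice $P_{U} (u) = α / u^{α + 1}$ ($u \geq 1$) introduced below. NumPy's `Generator.pareto` draws the Lomax form supported on $[0, \infty)$, so 1 is added to shift it to the classical Pareto distribution with minimum 1.

```python
import numpy as np


def next_generation(n_mut, N, alpha, s=0.0, mu1=0.0, mu2=0.0, rng=None):
    """Hypothetical helper: number of mutants in the next generation.

    n_mut : current number of mutants (0 <= n_mut <= N)
    alpha : Pareto exponent of the reproductive values U_i (u >= 1)
    s     : selection coefficient of the mutant
    mu1   : forward mutation rate (wild type -> mutant)
    mu2   : back mutation rate (mutant -> wild type)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Reproductive values U_i with density alpha / u^(alpha + 1) on u >= 1;
    # numpy's pareto() is the Lomax form on [0, inf), hence the +1 shift.
    U = rng.pareto(alpha, size=N) + 1.0
    M = U[:n_mut].sum()  # total reproductive value of mutants (Eq. 1)
    W = U[n_mut:].sum()  # total reproductive value of wild types (Eq. 1)
    # Success probability with selection and mutation, as in the text
    p = ((1.0 - mu2) * (1.0 + s) * M + mu1 * W) / ((1.0 + s) * M + W)
    # Binomial resampling of N individuals for the next generation
    return int(rng.binomial(N, p))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 10  # start with 10 mutants in a population of N = 1000
    for t in range(5):
        n = next_generation(n, N=1000, alpha=1.5, s=0.01, rng=rng)
        print(t + 1, n)
```

Because every $U_{i} \geq 1$, the denominator $(1 + s) M + W$ is always positive, so no special case is needed when one allele is absent.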

For the offspring distribution P_U, we consider a family of fat-tailed distributions, which asymptotically behave as $P_{U} \sim \frac{1}{u^{α + 1}}$ with α being a positive constant. To make our presentation concrete, we choose $P_{U} (u) = α / u^{α + 1} (u \geq 1)$ , which is known as the Pareto distribution. In the large population size limit, the neutral allele-frequency dynamics is known to only depend on the asymptotic power law exponent α provided we measure time in units of the coalescence time (Schweinsberg 2003b). For different but closely related modeling of broad offspring distributions, see Bah and Pardoux (2015).

Methods

Our goal is to understand the asymptotic dynamics of our model for large N, where a continuous-time description of the frequency becomes valid (Kimura 1955; Gardiner 2009) provided that $α \geq 1$ (Schweinsberg 2003b). We first present simulation results for relevant quantities in population genetics. Later, we provide a heuristic argument to explain them. Many separate observations (the fixation probability, extinction time, allele frequency fluctuations, stationary distribution, and SFS) can be explained by a unifying scaling picture.

Below, t denotes time in units of generations, and $τ = t / T_{c}$ denotes time normalized by the characteristic (coalescent) timescale $T_{c}$. Up to α-dependent prefactors, $T_{c}$ depends on the population size and the exponent α as follows: $T_{c} = N$ when $α > 2$, $T_{c} = N / \log N$ when $α = 2$, $T_{c} = N^{α - 1}$ when $1 < α < 2$, and $T_{c} = \log N$ when $α = 1$. These timescales were originally derived in the coalescent framework (Schweinsberg 2003b). Later, we explain how they can be rationalized within the forward-in-time approach.
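The piecewise scaling of $T_{c}$ can be written as a small helper for converting between generations and rescaled time τ. This sketch, with the hypothetical function name `coalescent_timescale`, drops all α-dependent prefactors (an assumption; the text later uses the prefactor $C_{α}$ for $1 < α < 2$) and rejects α < 1, for which no continuous-time limit exists.

```python
import math


def coalescent_timescale(N, alpha):
    """Hypothetical helper: coalescent timescale T_c in generations,
    following the scaling listed in the text, with prefactors set to 1."""
    if alpha > 2:
        return float(N)
    if alpha == 2:
        return N / math.log(N)
    if 1 < alpha < 2:
        return N ** (alpha - 1)
    if alpha == 1:
        return math.log(N)
    raise ValueError("a continuous-time limit requires alpha >= 1")


if __name__ == "__main__":
    N = 10**5
    for alpha in (1, 1.5, 2, 2.5):
        print(alpha, coalescent_timescale(N, alpha))
```

For $N = 10^{5}$, one unit of τ corresponds to roughly 316 generations when α = 1.5 but only about 11.5 generations when α = 1, which is why the same number of generations covers very different portions of the dynamics for different α.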

To understand the frequency dynamics when $1 \leq α < 2$ , it is essential to distinguish between average and typical trajectories. As a proxy for typical trajectories, we use the median of the frequencies, denoted by $X^{med} (τ)$ , throughout this paper.

Results

Neutral dynamics: typical trajectories and extinction time

First, we characterize the allele frequency dynamics in the absence of selection (s = 0). In this neutral limit, the expected allele frequency does not change over time, i.e., $〈 X (t) 〉 = X (0)$. Yet, despite the overall neutrality, a typical trajectory experiences a bias against the minority allele. This can be seen in Figure 1, where the mean and median are plotted across many realizations that start from the same frequency $X (0) = 0.01$. While the mean does not change over time, as required by neutrality, the median decays to zero in an α-dependent manner. By symmetry, the median increases toward fixation if the starting frequency is larger than 50%. Thus, the median experiences a bias against the minor allele. Note also that, when $1 < α < 2$, the median approaches the extinction boundary more and more slowly as it gets closer to it (see the red curve in Figure 1). As we will show later, an uptick of the SFS at the boundaries originates from this slowing down.

The mean (blue dashed curve) and median (red solid curve) of allele frequency trajectories for $α = 0.8, 1, 1.5, 2,$ and 2.5. For each α, 10⁴ trajectories are generated with the initial frequency $x_{0} = 0.01$. For ease of viewing, only 50 trajectories are shown in gray in each panel. The horizontal axes show the time t in units of generations and the rescaled time $τ = \frac{t}{T_{c}}$, where T_c is the coalescent timescale. The dependence of the coalescent time on the population size N is written below each panel. The population size is $N = 10^{5}$.
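A reduced-size version of the experiment in Figure 1 can be sketched as follows. It assumes the Pareto model with s = 0 and no mutation, and uses a much smaller population ($N = 500$) and fewer replicates than the figure so that it runs in about a second; the function name `simulate_ensemble` is hypothetical. Because the U_i are exchangeable, $〈 M / (M + W) 〉 = X (t)$ exactly, so the ensemble mean stays near X(0), while the median collapses toward zero.

```python
import numpy as np


def simulate_ensemble(x0, N, alpha, generations, replicates, seed=0):
    """Hypothetical helper: neutral trajectories (s = 0, no mutation) of the
    Pareto model of Eq. 1. Returns X(t) with shape (generations + 1, replicates)."""
    rng = np.random.default_rng(seed)
    k = np.full(replicates, int(round(x0 * N)))  # mutant count per replicate
    idx = np.arange(N)
    freqs = [k / N]
    for _ in range(generations):
        # Reproductive values U_i >= 1 with density alpha / u^(alpha + 1)
        U = rng.pareto(alpha, size=(replicates, N)) + 1.0
        # Label the first k individuals of each replicate as mutants
        is_mutant = idx[None, :] < k[:, None]
        M = np.where(is_mutant, U, 0.0).sum(axis=1)
        W = np.where(is_mutant, 0.0, U).sum(axis=1)
        # Binomial resampling with success probability M / (M + W)
        k = rng.binomial(N, M / (M + W))
        freqs.append(k / N)
    return np.array(freqs)


if __name__ == "__main__":
    X = simulate_ensemble(x0=0.01, N=500, alpha=1.5, generations=20, replicates=2000)
    for t in (0, 5, 10, 20):
        print(t, X[t].mean(), np.median(X[t]))
```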

Numerical simulations of the early part of the trajectories show that the time-dependent median displacement follows a simple power law,

Δ X^{med} \equiv X^{med} (τ) - X (0) \sim - τ^{\frac{1}{α}},

(2)

up to an X(0)-dependent prefactor. Figure 2 shows the numerical result for $α = 1.5$ . The red curve represents the median of trajectories, which agrees well with $Δ X^{med} \sim - τ^{\frac{2}{3}}$ .

Trajectories of the median (red thick line) and the mean (blue dashed) of allele frequency when $α = 1.5$ and $N = 10^{8}$ . Inset: The trajectory (red points) of $| Δ X^{med} | = X (0) - X^{med} (τ)$ is shown in log–log plot. The median agrees well with the expectation from the scaling argument $Δ X^{med} \sim - τ^{\frac{1}{α}}$ (black solid line).

Next we quantify the time to extinction, which turns out to be driven by the above minor-allele suppressing bias. Numerical results of the mean extinction time are consistent with

τ_{ext} \sim X {(0)}^{α - 1},

(3)

as shown in Figure 3. Hence, in units of the coalescence time, the mean extinction time $τ_{ext}$ becomes larger as α decreases (namely, for a broader offspring distribution). Note, however, that if one measures time in units of generations, Equation 3 can be rewritten as $t_{ext} = τ_{ext} T_{c} \sim {(N X (0))}^{α - 1}$ , which becomes smaller as α decreases since $N X (0) \geq 1$ . As we show later, Equation 2 can be analytically derived from a short-time approximation of the dynamics (Equation 25). Equation 3 can be explained from an effective sampling bias (Equation 35).

The mean extinction time $t_{ext}$ (in units of generations) as a function of initial allele frequency X(0) is plotted for $α = 1.4, 1.5, 1.6$ . Each of the straight lines has the slope $α - 1$ . The initial-frequency dependence of $t_{ext}$ can be fitted well by Equation 3. The population size is $N = 10^{8}$ .

Allele frequency fluctuations as a signature of broad offspring distributions

Next, we explore to what extent the spectrum of allele frequency fluctuations can provide a clue for identifying the exponent α of the offspring distribution. A deviation from the Wright–Fisher diffusion is most clearly revealed by measuring the median square displacement (median SD),

Median SD \equiv M [{(X (t) - X (t = 0))}^{2}],

(4)

where $M [\cdot]$ denotes taking the median (e.g., $M [X (t)] = X^{med} (t)$ ). To measure the median SD, we simulate 1000 neutral allele frequency trajectories with initial condition $X (0) = 0.5$ , for $α = 1, 1.5$ and the Wright–Fisher model (Figure 4A). As shown in Figure 4B, the median SD computed from this data set is consistent with the scaling,

Median SD \sim t^{\frac{2}{α}},

(5)

when $t / T_{c} ≪ 1$. Since $1 \leq α < 2$, the exponent $\frac{2}{α}$ exceeds 1, so typical fluctuations characterized by the median SD are superdiffusive. Later, we derive the superdiffusive exponent $\frac{2}{α}$ in Equation 5 analytically (Equation 32).

(A) Fluctuations of neutral allele frequencies when $α = 1.5$ and $N = 10^{5}$. For $X (0) = 0.5$, both the median $X^{med} (t)$ (red) and the mean $〈 X (t) 〉$ (blue) are constant. (B) The median square displacement computed from a data set of 1000 trajectories. For $α = 1, 1.5$ and the Wright–Fisher model, $N = 10^{8}, 10^{4},$ and 10³ are used, respectively. The straight lines represent the scaling in Equation 5. For α = 1, the fit for $t ≳ 5$ is not perfect, since $t / T_{c} = t / \ln N ≪ 1$ is not satisfied. (C) The mean square displacement (mean SD) for different values of α. Solid lines represent linear scaling, which is expected for a regular diffusion process. (D) Data-size dependence of the estimated diffusion exponent $κ_{estimated}$ for the mean SD (blue circles) and for the median SD (orange triangles). See the main text for a detailed explanation. The horizontal lines show $κ = \frac{2}{α}$ and κ = 1. The bars represent the standard deviations of $κ_{estimated}$. Parameters: $α = 1.3$ and $N = 10^{8}$.

Usually, allele frequency fluctuations are quantified using the mean SD $\equiv 〈 {(X (t) - X (0))}^{2} 〉$, rather than the median SD. For the Wright–Fisher diffusion, the distinction between these two measures is irrelevant, since both increase linearly with time, only with different prefactors. However, for $1 \leq α < 2$, the α-dependence in Equation 5 can be detected by measuring the median SD. As shown in Figure 4C, the mean SD (computed from a large data set) grows linearly in time even when α is less than 2, as if the underlying process were diffusive.

That the dynamics is not diffusive also affects the mean SD, but more subtly: its value depends on the size of the data set (i.e., the number of frequency trajectories) used to measure it. This is because rare large jumps contribute to the mean SD in a large data set, but with high probability they are not observed in a small data set. To demonstrate this data-size dependence, we prepare an ensemble of data sets, where each data set consists of a given number of allele-frequency trajectories. Then, for each data set, we measure the diffusion exponent κ, which is defined by

Mean SD \propto t^{κ} .

(6)

In Figure 4D, the ensemble-averaged exponent is shown by blue circles. As the data size increases, fluctuations characterized by the mean SD exhibit a crossover from superdiffusion ($κ = \frac{2}{α}$) to normal diffusion (κ = 1). For the median SD, by contrast, the diffusion exponent κ can be estimated reliably without any significant dependence on the size of the data set (orange triangles in Figure 4D). For example, under the parameter setting of Figure 4D, given a data set of 320 trajectories, the diffusion exponent $κ_{estimated}$ of the median SD falls within the interval $[1.45, 1.57]$ with probability $\sim 68 %$. Inverting $κ = \frac{2}{α}$, this in turn predicts $α_{estimated} := \frac{2}{κ_{estimated}} \approx 1.32 \pm 0.05$, which is close to the actual value $α = 1.3$.
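The estimation procedure can be sketched in a few lines: compute the median SD across trajectories, fit the slope of its logarithm against $\log t$, and invert $κ = 2 / α$. The helper names (`median_sd`, `diffusion_exponent`, `alpha_from_kappa`) are hypothetical, and an ordinary least-squares fit in log–log space via `numpy.polyfit` is assumed; the paper does not specify its fitting routine.

```python
import numpy as np


def median_sd(X):
    """Median square displacement M[(X(t) - X(0))^2] (Eq. 4) for trajectories
    stored as an array of shape (T + 1, R): rows are times, columns replicates."""
    return np.median((X - X[0]) ** 2, axis=1)


def diffusion_exponent(t, sd):
    """Least-squares slope kappa of log(sd) versus log(t), i.e. sd ∝ t^kappa."""
    slope, _intercept = np.polyfit(np.log(t), np.log(sd), 1)
    return slope


def alpha_from_kappa(kappa):
    """Invert the median-SD scaling kappa = 2 / alpha (Eq. 5)."""
    return 2.0 / kappa


if __name__ == "__main__":
    # The 68% interval for kappa quoted in the text maps to this alpha range:
    print(alpha_from_kappa(1.57), alpha_from_kappa(1.45))
```

Applied to the interval $κ_{estimated} \in [1.45, 1.57]$, the inversion gives α between about 1.27 and 1.38, consistent with $1.32 \pm 0.05$.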

Fixation probability

Next, we examine the effect of natural selection on the fixation probability of beneficial mutations. We consider a mutant with positive selective advantage s > 0 arising in a monoclonal population. The fixation probability $P_{fix} (s)$ of a single mutant depends on the parameter α of the offspring distribution. In the Wright–Fisher model (or equivalently, $α \geq 2$ ), the fixation probability can be obtained using a diffusion approximation and is given by $P_{fix} (s) = \frac{1 - e^{- 2 s}}{1 - e^{- 2 N s}}$ , which becomes $P_{fix} \approx 2 s$ when $N s ≫ 1$ and s is small. When α = 1, an analytic result has been recently obtained in (Hallatschek 2018), which can be approximated as $P_{fix} (s) \sim \frac{1}{N^{1 - s}}$ . For the intermediate case, $1 < α < 2$ , we find that the fixation probability is given by

P_{fix} (s) \sim s^{\frac{1}{α - 1}} .

(7)

See Figure 5 for the numerical results. Note that since $P_{fix} (s) \to \frac{1}{N}$ in the neutral limit independently of α, these results hold for sufficiently strong selection, $1 ≪ N s^{\frac{1}{α - 1}}$ . Equation 7 can be deduced from the balance between the selection force and an emergent sampling bias (Equation 38 and Figure 13A).
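To see how strongly Equation 7 departs from the Wright–Fisher result, the two expressions can be compared numerically. In this sketch the prefactor of Equation 7 is set to 1 (an assumption, since Equation 7 is a scaling relation), and the function names are hypothetical. `math.expm1` is used so that $1 - e^{-2 s}$ stays accurate for small s.

```python
import math


def pfix_wright_fisher(s, N):
    """Diffusion result (1 - e^{-2s}) / (1 - e^{-2Ns}) for a single mutant."""
    return -math.expm1(-2 * s) / -math.expm1(-2 * N * s)


def pfix_scaling(s, alpha):
    """Scaling form of Eq. 7 for 1 < alpha < 2, prefactor set to 1 (assumption)."""
    return s ** (1.0 / (alpha - 1.0))


if __name__ == "__main__":
    N = 10**8
    for s in (0.001, 0.01, 0.1):
        print(s, pfix_wright_fisher(s, N), pfix_scaling(s, 1.8), pfix_scaling(s, 1.5))
```

For s = 0.01, the Wright–Fisher value is close to 2s ≈ 0.02, whereas the scaling form gives about $3 \times 10^{-3}$ for α = 1.8 and $10^{-4}$ for α = 1.5. The comparison is only meaningful when $N s^{1 / (α - 1)} ≫ 1$, which holds for these parameters with $N = 10^{8}$.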

The fixation probability $P_{fix}$ as a function of selective advantage s. The lines are the expectations from the scaling argument in Equation 40. The population size is $N = 10^{8}$ .

(A) The crossover from the effective bias to genuine selection. $V (X) \sim - C X^{2 - α} + σ X$ is plotted, where C is a positive coefficient and $σ > 0$. Deterministically, an unstable point exists at $X \sim X_{c}$. (B) The balance between the effective bias and mutation. $V (X) \sim - C X^{2 - α} + θ$ is plotted. Deterministically, a stable point exists at $X \sim θ^{1 / (2 - α)}$.

As Equation 7 shows, for a fixed population size and selective advantage, the fixation probability becomes smaller as α decreases. Intuitively, this is because, for smaller α, whether a mutant fixes by catching a ride on a jackpot event depends more strongly on luck than on fitness differences.

Site frequency spectrum

We return to the neutral case and present the scaling behavior of the neutral SFS. The SFS is often used as a convenient summary of the genetic diversity within a population. Theoretically, the SFS is defined in the infinite sites model (Kimura 1969) as the density $f_{SFS} (x)$ of neutral derived alleles in the population (namely, $f_{SFS} (x) d x$ is the number of derived alleles in the frequency window $[x - \frac{d x}{2}, x + \frac{d x}{2}]$).

Figure 6 shows numerical plots of the neutral SFS for $α = 1, 1.5$, and the Wright–Fisher model. In the standard Wright–Fisher model, the SFS is proportional to $1 / x$, which decreases monotonically as x increases. By contrast, when offspring numbers are broadly distributed ($α < 2$), the SFS is non-monotonic, with a somewhat surprising uptick toward the fixation boundary. When α = 1, the asymptotic behaviors near both boundaries are analytically well established: $f_{SFS} (x)$ is proportional to $\frac{1}{{(x \log x)}^{2}}$ near $x \sim 0$ and to $- \frac{1}{(1 - x) \log (1 - x)}$ near $x \sim 1$ (Kosheleva and Desai 2013; Neher and Hallatschek 2013) (see also Appendix A).

The neutral site frequency spectrum for different values of α and fixed population size $N = 10^{5}$ . When $1 < α < 2$ , the rare-end spectrum and the frequent-end spectrum are $\propto \frac{1}{x^{3 - α}}$ and $\propto \frac{1}{{(1 - x)}^{2 - α}}$ , respectively (see also Figure 7).

For the intermediate case $1 < α < 2$, the rare-end behavior of the SFS has been studied analytically. Using a backward-in-time approach (the Λ-coalescent), Berestycki et al. (2014) showed

\lim_{n \to \infty} \frac{ζ_{i}^{(n)}}{n^{2 - α}} \propto \frac{Γ (i + α - 2)}{i!} .

(8)

Here, n is the sample size and $ζ_{i}^{(n)}$ is the number of sites at which variants appear i times in the sample (see Berestycki et al. 2014 for the proportionality constant on the right-hand side of Equation 8). Applying Stirling's approximation to Equation 8 gives $Γ (i + α - 2) / i! \sim i^{α - 3}$ for large i; identifying the sample frequency as $x = i / n$, we have

f_{SFS} (x) \propto \frac{1}{x^{3 - α}} when x ≪ 1.

(9)
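The Stirling step behind Equation 9 can be checked numerically: the local log–log slope of $Γ (i + α - 2) / i!$ between i and 2i should approach $α - 3$ as i grows. The sketch below evaluates the ratio in log space with `math.lgamma` to avoid overflow; the helper names are hypothetical.

```python
import math


def gamma_ratio(i, alpha):
    """Gamma(i + alpha - 2) / i!, computed in log space to avoid overflow."""
    return math.exp(math.lgamma(i + alpha - 2) - math.lgamma(i + 1))


def local_exponent(i, alpha):
    """Discrete log-log slope of gamma_ratio between i and 2i."""
    return math.log(gamma_ratio(2 * i, alpha) / gamma_ratio(i, alpha)) / math.log(2)


if __name__ == "__main__":
    for i in (10, 100, 10_000):
        print(i, local_exponent(i, 1.5))  # approaches alpha - 3 = -1.5
```

The correction to the slope decays like 1/i, so the power law $x^{α - 3}$ is reached only once the variant count i is large, consistent with Equation 9 being a statement about $x ≪ 1$ in the limit of large samples.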

Equation 8 cannot be used for high-frequency variants, because the number of times the variants appear (i in Equation 8) is kept finite when taking the limit of large sample size n. To the best of our knowledge, the precise behavior at the high-frequency end for $1 < α < 2$ has not been reported. As shown in Figure 7, we find that the asymptotic form of the uptick of $f_{SFS} (x)$ is given by

f_{SFS} (x) \propto \frac{1}{{(1 - x)}^{2 - α}} (for 1 - x ≪ 1) .

(10)

The SFS near x = 1 for $α = 1.3, 1.4, 1.5, 1.6$ (circle, squares). The horizontal axis is $1 - x$ . The solid lines are drawn assuming $f_{SFS} (x) \propto \frac{1}{{(1 - x)}^{2 - α}}$ . $N = 10^{6}$ is used.

We will show that the uptick arises due to the fact that an effective sampling bias decreases as an allele-frequency trajectory approaches the fixation boundary (Equation 42).

Mutation-drift balance

A broad offspring distribution also affects the stationary distribution of the allele frequency when mutations and genetic drift balance one another. For simplicity, we consider symmetric reversible mutations between two neutral allele types. We denote the scaled mutation rate (per unit time in the continuous description) by $θ = T_{c} μ$, where μ denotes the mutation rate per generation. In the Wright–Fisher model, the stationary distribution is known to be (Kimura 1955)

P_{WF} (x) \propto x^{2 θ - 1} {(1 - x)}^{2 θ - 1} .

(11)

There is a critical value $θ_{c}^{WF} = \frac{1}{2}$: when $θ > θ_{c}^{WF}$, the distribution in Equation 11 has a single peak at the center $x = \frac{1}{2}$; when $θ < θ_{c}^{WF}$, it is U-shaped, with the density increasing monotonically from the center toward the boundaries.
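The transition at $θ_{c}^{WF} = \frac{1}{2}$ follows from the exponent $2 θ - 1$ in Equation 11 changing sign. The sketch below evaluates the unnormalized density on the left half of the unit interval (the right half is its mirror image) and classifies its shape; the function names and the grid are illustrative choices.

```python
import numpy as np


def wf_stationary(x, theta):
    """Unnormalized Wright-Fisher stationary density of Eq. 11."""
    return x ** (2 * theta - 1) * (1 - x) ** (2 * theta - 1)


def stationary_shape(theta):
    """Classify Eq. 11 from its slope on (0, 1/2]; by symmetry this fixes the
    whole shape. Eq. 11 admits only these three cases."""
    x = np.linspace(0.01, 0.5, 50)
    slope = np.diff(wf_stationary(x, theta))
    if np.all(slope > 0):
        return "single peak at x = 1/2"
    if np.all(slope < 0):
        return "U-shaped"
    return "flat (critical)"


if __name__ == "__main__":
    for theta in (0.1, 0.5, 1.0):
        print(theta, stationary_shape(theta))
```

No value of θ produces an M-shaped density in Equation 11, so the two interior peaks reported below for $α < 2$ cannot be reproduced by the Wright–Fisher diffusion.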

Figure 8, A and B, show the numerical results for the stationary distributions of the Wright–Fisher model and of $α = 1.5$, respectively. When $1 \leq α < 2$, a critical mutation rate θ_c exists as in the Wright–Fisher model, but there is a qualitatively different feature: for a small mutation rate $θ < θ_{c}$, the stationary distribution is not U-shaped but M-shaped, with two peaks near the boundaries. Note that the M-shaped distribution indicates stochastic switching behavior, as illustrated in Figure 8C (the blue curve). As shown in Figure 8D, the peak positions are approximately given, up to prefactors, by

x_{peak}, 1 - x_{peak} \approx θ^{\frac{1}{2 - α}} .

(12)

(A) Stationary distribution of the allele frequency in the Wright–Fisher model, when the mutation rate is small ( $θ = 0.1$ ) and large ( $θ = 1.0$ ). (B) Stationary distribution for $α = 1.5$ , when the mutation rate is small ( $θ = 0.1$ ) and large ( $θ = 1.0$ ). (C) The time-series of the allele frequency in the case of $α = 1.5$ , when the stationary distribution is bimodal ( $θ = 0.1$ ) and unimodal ( $θ = 1.0$ ). (D) The position of the peak near x = 0 of the stationary distribution versus the mutation rate μ. $N = 10^{4}$ is used.

In Appendix B, we show that the M-shaped stationary distribution persists even in the presence of natural selection, provided that selection is weaker than the sampling bias at the peaks of the distribution.

A similar M-shaped distribution was observed for the Eldon–Wakeley (EW) process by Der and Plotkin (2014), who studied the moments of the stationary distribution extensively. However, the origin of the M-shaped distribution remained unclear. Below, using scaling arguments, we explain why the bimodal distribution arises in our case (see the argument above Equation 44).

Analytical arguments

Limiting process, transition density, and time-dependent effective bias

We now provide analytical arguments for the observations made in the simulations described in the first part of this paper. Our discussion starts with an exact but somewhat unwieldy description of the allele frequency dynamics. We then show how exact short-time and intermediate-time asymptotics can be derived and used to rationalize the sampling bias and the scaling laws discovered above.

The allele frequency dynamics can be fully characterized by the transition probability density $w_{N} (y | x)$ that the mutant frequency changes from x to y in one generation. Since one generation consists of random offspring contributions to the seed pool and binomial sampling from the seed pool, we have

w_{N} (y | x) = \int d M \int d W \, P^{MUT} (M; x N) \, P^{WT} (W; (1 - x) N) \times \Pr_{binom} (y N, N, \frac{M}{M + W}) .

(13)

Here, $P^{MUT} (M; x N)$ is the probability density that the sum of $x N$ random mutant offspring numbers takes the value M, $P^{WT} (W; (1 - x) N)$ is that for the wild type, and $\Pr_{binom}$ is the probability of getting $y N$ successes in N trials with success probability $\frac{M}{M + W}$. We first focus on the neutral case, for which $P^{MUT}$ and $P^{WT}$ are the same function, i.e., $P^{MUT} (\cdot) = P^{WT} (\cdot)$.

While the resampling distribution w_N may in general behave in complex ways, it has few options in the large N limit. These constraints emerge from two asymptotic simplifications. First, since M and W are sums of many random variables, $P^{MUT}$ and $P^{WT}$ tend to stable distributions, as described by the generalized central limit theorem (Gnedenko and Kolmogorov 1968; Uchaikin and Zolotarev 2011) (see also Appendix C for a brief description of the theorem). Second, the fluctuations associated with binomial sampling become negligible compared with those induced by offspring number contributions to the seed pool, provided that the offspring distribution is sufficiently broad, i.e., $α \leq 2$. Thus, we can replace $\Pr_{binom} (y N, N, \frac{M}{M + W})$ with a Dirac delta function, $δ (y - \frac{M}{M + W})$. Using these facts and evaluating the integral in Equation 13 (see Appendix D for details), we obtain a simple analytical expression for $w_{N} (y | x)$ that is valid in the large N limit. When α = 1 (Hallatschek 2018),

w_{N} (y | x) = \frac{1}{log N} \frac{x (1 - x)}{{(x - y)}^{2}} .

(14)

When $1 < α < 2$ ,

w_{N} (y | x) = \begin{cases} N^{1 - α} C_{α} \, x (1 - x) \frac{(1 - y)^{α - 1}}{(y - x)^{α + 1}}, & x < y, \\ N^{1 - α} C_{α} \, x (1 - x) \frac{y^{α - 1}}{(x - y)^{α + 1}}, & x > y, \end{cases}

(15)

where $C_{α} \equiv α {(\frac{α - 1}{α})}^{α}$ .
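Equation 15 can be evaluated directly to inspect the heavy-tailed jump structure. The sketch below uses hypothetical names `c_alpha` and `jump_density`. It illustrates two properties visible in the formula: the density is symmetric under relabeling the alleles ($x \to 1 - x$, $y \to 1 - y$), and it decays only as a power $(y - x)^{-(α + 1)}$ away from the diagonal, which is what makes large jumps non-negligible.

```python
import math


def c_alpha(alpha):
    """C_alpha = alpha * ((alpha - 1) / alpha) ** alpha, defined below Eq. 15."""
    return alpha * ((alpha - 1) / alpha) ** alpha


def jump_density(y, x, N, alpha):
    """One-generation transition density w_N(y|x) of Eq. 15, for 1 < alpha < 2
    and y != x (the density is singular on the diagonal)."""
    pref = N ** (1 - alpha) * c_alpha(alpha) * x * (1 - x)
    if y > x:
        return pref * (1 - y) ** (alpha - 1) / (y - x) ** (alpha + 1)
    if y < x:
        return pref * y ** (alpha - 1) / (x - y) ** (alpha + 1)
    raise ValueError("w_N(y|x) is singular at y == x")


if __name__ == "__main__":
    print(c_alpha(1.5))
    for y in (0.25, 0.5, 0.9):
        print(y, jump_density(y, 0.2, 1e4, 1.5))
```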

To obtain the continuum description, we must appropriately scale the time t with the population size N (Gardiner 2009). The characteristic timescale (coalescent timescale) T_c can be read from the dependence of the transition density on N. Hallatschek (2018) showed that, when α = 1, the resulting limiting process is described by,

\frac{\partial}{\partial τ} P (x, τ) = - \frac{\partial}{\partial x} [V (x) P (x, τ)] + \int d x' \, [w (x | x') P (x', τ) - w (x' | x) P (x, τ)],

(16)

where the jump kernel $w (x' | x)$ is given by

w (x' | x) = \frac{x (1 - x)}{{(x - x^{'})}^{2}},

(17)

and the advection (bias) term V(x) is given by

V(x) = -\mathrm{P.V.} \int dx' \, (x' - x) \, w(x' | x) = x(1 - x) \log \frac{x}{1 - x},

(18)

where P.V. denotes the Cauchy principal value. It is easy to check that Equation 16 satisfies the neutrality condition $\frac{\partial}{\partial τ} 〈 X 〉 = 0$ (see Hallatschek 2018 for the calculation). Equations of the form of Equation 16 are sometimes called differential Chapman–Kolmogorov equations (Gardiner 2009).
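Equation 18 can be checked numerically. For the α = 1 kernel of Equation 17, $(x' - x) w(x' | x) = x(1 - x)/(x' - x)$, so the principal value in Equation 18 reduces to a Cauchy-weighted integral, which `scipy.integrate.quad` evaluates when called with `weight="cauchy"`. The sketch below assumes NumPy and SciPy are available; both function names are hypothetical and chosen for illustration.

```python
import numpy as np
from scipy.integrate import quad


def advection_alpha1_numeric(x):
    """V(x) = -P.V. int_0^1 (x' - x) w(x'|x) dx' for the alpha = 1 kernel (Equation 17).

    With w(x'|x) = x(1-x)/(x-x')^2 the integrand (x'-x) w(x'|x) equals
    x(1-x)/(x'-x). quad(..., weight="cauchy", wvar=x) returns the Cauchy
    principal value of int f(x')/(x'-x) dx', here with f(x') = 1.
    """
    pv, _ = quad(lambda xp: 1.0, 0.0, 1.0, weight="cauchy", wvar=x)
    return -x * (1.0 - x) * pv


def advection_alpha1_closed_form(x):
    """Closed form of Equation 18: V(x) = x(1-x) log(x/(1-x))."""
    return x * (1.0 - x) * np.log(x / (1.0 - x))


if __name__ == "__main__":
    for x in (0.05, 0.3, 0.5, 0.9):
        print(f"x={x:4.2f}  numeric={advection_alpha1_numeric(x):+.6f}  "
              f"closed form={advection_alpha1_closed_form(x):+.6f}")
```

Both columns should agree. The sign pattern (negative below 1/2, zero at 1/2, positive above) is the bias against the minor allele discussed in the next paragraph.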

To develop intuition, it is useful to interpret the different terms in Equation 16. First, V(x) has the form of frequency-dependent selection that favors the major allele (frequency $> 50\%$) and suppresses the minor allele. The apparent fitness difference between the mutant and the wild type is given by the log-ratio of their frequencies. This selection-like effect arises because the major allele samples the offspring number from $P_{U}(u)$ more deeply than the minor allele (see Hallatschek 2018). Second, despite this apparent bias, the neutrality of the whole process is maintained by rare large jumps, characterized by $w(y | x)$. This also means that neutrality does not hold if we focus on “typical” trajectories (see Figure 1). In fact, as we show in Appendix A, the median $X^{med}$ of the mutant frequency, which is a proxy for “typical” trajectories, evolves according to

\frac{d}{d τ} X^{med} (τ) = V (X^{med} (τ)) (when α = 1) .

(19)

When $1 < α < 2$ , using the same reasoning as the derivation of Equation 14 and choosing $τ = \frac{t}{C_{α} N^{α - 1}}$ , we can obtain the following differential Chapman–Kolmogorov equation,

\frac{\partial}{\partial τ} P(x, τ) = -\frac{\partial}{\partial x} [V(x) P(x, τ)] + \int_{|x' - x| > ϵ} dx' \, [w(x | x') P(x', τ) - w(x' | x) P(x, τ)]

(20)

where

w(x' | x) = \begin{cases} x(1 - x) \frac{(1 - x')^{α - 1}}{(x' - x)^{α + 1}} & \text{when } x < x' \\ x(1 - x) \frac{x'^{α - 1}}{(x - x')^{α + 1}} & \text{when } x > x' \end{cases}

(21)

and

V (x) = - \int_{| x' - x | > ϵ} d x' (x' - x) w (x' | x) .

(22)

As in Equation 16, the advection term guarantees neutrality. Equation 21 means that, when $x < \frac{1}{2}$, rightward jumps occur more frequently than leftward ones, and this tendency reverses when $x > \frac{1}{2}$. Given the overall minus sign in Equation 22, this in turn means that V(x) is a bias against the minor allele (see Figure 1), as in the case α = 1. We will later show that, when $x ≪ 1$, the median trajectory initially decays like $Δ X^{med} \sim -τ^{\frac{1}{α}}$ (Equation 2).
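The asymmetry of the jump rates, and the resulting sign of the cut-off advection, can be checked numerically. The sketch below integrates the kernel of Equation 21 beyond a cut-off ε, compares the rates with closed forms obtained from the substitutions r = (x' − x)/(1 − x') and r = (x − x')/x' (a step we add here for illustration, not taken from the text), and evaluates Equation 22 at finite ε. It assumes 1 < α < 2 and ε < min(x, 1 − x); the function names are hypothetical.

```python
from scipy.integrate import quad


def w_kernel(xp, x, alpha):
    """Jump kernel w(x'|x) of Equation 21 (1 < alpha < 2), for x' != x."""
    if xp > x:
        return x * (1 - x) * (1 - xp) ** (alpha - 1) / (xp - x) ** (alpha + 1)
    return x * (1 - x) * xp ** (alpha - 1) / (x - xp) ** (alpha + 1)


def jump_rates(x, alpha, eps):
    """Total rates of rightward and leftward jumps larger than eps (requires eps < x, 1-x)."""
    right, _ = quad(w_kernel, x + eps, 1.0, args=(x, alpha), limit=200)
    left, _ = quad(w_kernel, 0.0, x - eps, args=(x, alpha), limit=200)
    return right, left


def jump_rates_closed_form(x, alpha, eps):
    """Same rates in closed form: w dx' becomes x r^(-alpha-1) dr (right) or (1-x) r^(-alpha-1) dr (left)."""
    right = x * (eps / (1 - x - eps)) ** (-alpha) / alpha
    left = (1 - x) * (eps / (x - eps)) ** (-alpha) / alpha
    return right, left


def advection(x, alpha, eps):
    """Cut-off advection of Equation 22: -int_{|x'-x|>eps} (x'-x) w(x'|x) dx'."""
    g = lambda xp: -(xp - x) * w_kernel(xp, x, alpha)
    upper, _ = quad(g, x + eps, 1.0, limit=200)
    lower, _ = quad(g, 0.0, x - eps, limit=200)
    return upper + lower


if __name__ == "__main__":
    alpha, eps = 1.5, 0.01
    for x in (0.1, 0.5, 0.9):
        right, left = jump_rates(x, alpha, eps)
        print(f"x={x}: right={right:.3f} left={left:.3f} V_eps={advection(x, alpha, eps):+.3f}")
```

For x = 0.1 the rightward rate exceeds the leftward one and the cut-off advection is negative; the picture is mirrored at x = 0.9 and balanced at x = 0.5. Shrinking ε makes both the rates and |V| grow without bound, which is the divergence addressed in the next paragraph.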

We note that the short-time superdiffusive behavior in Equation 5 implies that Equation 20 cannot be reduced to a Fokker–Planck equation. We also note that, in the limit $ϵ \to 0$, two divergences arise in Equation 20: one in the integral for the advection velocity in Equation 22 and one in the jump integral of Equation 20 itself. However, since the two divergences cancel exactly, the entire right-hand side of Equation 20 is well-defined. As shown in Appendix E, Equations 16 and 20 can also be derived as the dual of the Λ-Fleming–Viot process, namely as the adjoint operator of the backward generator (e.g., Etheridge et al. 2010; Griffiths 2014).

Although it is difficult to study Equation 20 analytically, it is possible to derive exact short-time and long-time asymptotics that, combined with scaling arguments, paint a fairly comprehensive picture of the ensuing statistical genetics.

Short-time dynamics and fluctuations

First, we describe the transition density $P(x, τ | x_{0}, τ = 0)$ of Equation 20 for small times. When $1 < α < 2$, the allele frequency changes due to the deterministic bias V(x) and the random occurrence of jumps, sampled from the broad distribution in Equation 21. Since the number of jump events is enormous $(\sim \frac{τ}{ϵ^{α}})$ even for small τ, the generalized central limit theorem applies, and $X(τ)$ is asymptotically distributed according to a stable distribution (Gnedenko and Kolmogorov 1968). For a general stable distribution, no closed-form expression for the density is available; only its characteristic function can be written down analytically. As we show in Appendix F, the random displacement $Δ X(τ) = X(τ) - x_{0}$ can be expressed as

Δ X (τ) = X (τ) - x_{0} = γ (τ, x_{0}) Z .

(23)

Here Z is sampled from the stable distribution p(z) whose characteristic function $〈 e^{ikZ} 〉 \equiv \int d z e^{ikz} p (z)$ is given by

〈 e^{ikZ} 〉 = exp [- | k |^{α} (1 - i β (x_{0}) tan \frac{π α}{2} sign (k))],

(24)

and the scale parameter $γ (τ, x)$ and the skewness parameter $β (x)$ are respectively given by

γ (τ, x) \equiv τ^{\frac{1}{α}} {(\frac{π (x {(1 - x)}^{α} + (1 - x) x^{α})}{2 Γ (α + 1) sin \frac{π α}{2}})}^{\frac{1}{α}},

(25)

β (x) \equiv \frac{x {(1 - x)}^{α} - x^{α} (1 - x)}{x^{α} (1 - x) + x {(1 - x)}^{α}} .

(26)
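Equations 23–26 are straightforward to evaluate numerically. The sketch below computes γ and β and draws displacements ΔX(τ) = γZ. It assumes that SciPy's `levy_stable` in its "S1" parameterization, with unit scale and zero location, has exactly the characteristic function of Equation 24; note that assigning `levy_stable.parameterization` changes a setting on the shared `levy_stable` object. Function names are hypothetical.

```python
import numpy as np
from math import gamma, pi, sin
from scipy.stats import levy_stable


def scale_gamma(tau, x, alpha):
    """Scale parameter gamma(tau, x) of Equation 25."""
    num = pi * (x * (1 - x) ** alpha + (1 - x) * x ** alpha)
    den = 2 * gamma(alpha + 1) * sin(pi * alpha / 2)
    return tau ** (1 / alpha) * (num / den) ** (1 / alpha)


def skewness_beta(x, alpha):
    """Skewness parameter beta(x) of Equation 26; positive for x < 1/2."""
    right = x * (1 - x) ** alpha   # weight of rightward jumps
    left = x ** alpha * (1 - x)    # weight of leftward jumps
    return (right - left) / (right + left)


def sample_displacement(tau, x0, alpha, size, seed=0):
    """Draw Delta X(tau) = gamma(tau, x0) * Z (Equation 23).

    Assumes scipy's "S1" characteristic function
    exp[-|k|^alpha (1 - i beta tan(pi alpha / 2) sign k)] (loc = 0, scale = 1),
    i.e. Equation 24. This overwrites levy_stable's parameterization attribute.
    """
    levy_stable.parameterization = "S1"
    rng = np.random.default_rng(seed)
    z = levy_stable.rvs(alpha, skewness_beta(x0, alpha), size=size, random_state=rng)
    return scale_gamma(tau, x0, alpha) * z


if __name__ == "__main__":
    alpha, x0, tau = 1.5, 0.005, 1e-3
    dx = sample_displacement(tau, x0, alpha, size=200_000)
    print("beta(x0) =", skewness_beta(x0, alpha))
    print("median of Delta X:", np.median(dx))  # negative: the minor allele typically shrinks
```

For a rare allele, β(x₀) is close to 1, so Z is strongly right-skewed. Because its mean is zero, its median is negative, which is the origin of the downward drift of the median trajectory.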

Note that the statistical properties of Z are independent of τ, and $Δ X(τ)$ depends on τ only via the scale parameter $γ(τ, x_{0})$. As shown in Figure 9A, for small times, the transition density $P(x, τ | x_{0}, τ = 0)$ computed from the stable distribution agrees closely with numerical simulation results in the discrete-time model. Our result can be regarded as a counterpart of the Gaussian approximation often employed for the Wright–Fisher diffusion (see Tataru et al. 2017 and the references therein).

(A) The allele frequency distribution $p(x, t | x_{0} = 0.005)$ at generations $t = 5, 10, 35$, for $α = 1.5$. The solid lines denote the short-time transition densities given by Equations 23–26, and the open markers denote those computed from 10,000 allele frequency trajectories in the discrete-time model. (B) The initial dynamics of the median of the allele frequency (black). The red and blue lines denote the short-time solution in Equation 27 and the long-time solution in Equation 37, where constants of integration and the prefactor of Equation 38 are determined by fitting to the discrete-time model (black line) between $40 < t < 800$. (C) The overall trajectory to extinction. The color scheme is the same as in (B). In (A–C), $α = 1.5$, $N = 10^{7}$, and $x_{0} = 0.005$ are used.

Now, we study the mean and median of the allele frequency using the short-time expression. The mean does not change in time since $〈 Δ X (τ) 〉 = γ (τ, x_{0}) 〈 Z 〉 = 0$ , which is consistent with the neutrality. On the other hand, the median changes as

Δ X^{med} (τ) = γ (τ, x_{0}) Z^{med} (x_{0}),

(27)

where $Z^{med}(x_{0})$ denotes the median of Z. $Z^{med}(x_{0})$ depends on x₀ via $β(x_{0})$ (see Equation 24), and $Z^{med}(x_{0}) ≶ 0$ for $x_{0} ≶ \frac{1}{2}$. Equation 27 agrees with numerical simulations in the discrete-time model as long as $X(τ)$ remains close to the initial frequency x₀ (see the red and black curves in Figure 9B).

The scaling property $Δ X^{med}(τ) \propto τ^{\frac{1}{α}}$ in Equation 2 follows immediately from Equation 27, since $γ \propto τ^{\frac{1}{α}}$. This scaling implies that a time-dependent bias drives the median of the allele frequency. Differentiating Equation 27 with respect to time gives

\frac{d}{d τ} X^{med} (τ) = V_{eff} (τ)

(28)

where the effective time-dependent bias $V_{eff} (τ)$ is given by

V_{eff} (τ) \equiv \frac{\partial γ (τ, x_{0})}{\partial τ} Z^{med} (x_{0}) .

(29)

Near the boundaries x = 0 and x = 1, $V_{eff} (τ)$ is approximately given by

V_{eff}(τ) \approx \begin{cases} -k \frac{x_{0}^{1/α}}{τ^{1 - 1/α}} & (x_{0} ≪ 1) \\ +k \frac{(1 - x_{0})^{1/α}}{τ^{1 - 1/α}} & (1 - x_{0} ≪ 1) \end{cases},

(30)

where $k \equiv \frac{| Z^{med} (x_{0} = 0) |}{α} {(\frac{π}{2 Γ (α + 1) sin \frac{π α}{2}})}^{\frac{1}{α}}$ is a positive constant.

The advection term arises from a sampling bias

Intuitively, the time-dependent bias $V_{eff} (τ)$ arises from a time-dependence of the largest sampled offspring number (Figure 10). To see this, consider a typical trajectory of the allele frequency starting from x. Up to a short time τ, only jumps from x to $y \in [y_{-} (τ), y_{+} (τ)]$ are likely to occur, where $y_{-} (τ)$ and $y_{+} (τ)$ can be estimated by the extremal criterion (Krapivsky et al. 2010),

τ \times \int_{0}^{y_{-} (τ)} w (y | x) d y \sim 1, τ \times \int_{y_{+} (τ)}^{1} w (y | x) d y \sim 1.

(31)

These conditions give

y_{-} (τ) \sim \frac{x}{1 + {(\frac{τ (1 - x)}{α})}^{1 / α}}, y_{+} (τ) \sim \frac{x + {(\frac{τ x}{α})}^{1 / α}}{1 + {(\frac{τ x}{α})}^{1 / α}} .

(32)
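If the extremal criterion of Equation 31 is taken as an equality, the window of Equation 32 satisfies it exactly: on y > x, the substitution r = (y − x)/(1 − y) turns w(y | x) dy into x r^{−α−1} dr, and on y < x, r = (x − y)/y gives (1 − x) r^{−α−1} dr (a derivation step we add here for illustration). The sketch below checks this numerically, assuming SciPy and illustrative values α = 1.5 and x = 0.2; all function names are hypothetical.

```python
from scipy.integrate import quad


def w_kernel(y, x, alpha):
    """Jump kernel w(y|x) of Equation 21 (1 < alpha < 2), for y != x."""
    if y > x:
        return x * (1 - x) * (1 - y) ** (alpha - 1) / (y - x) ** (alpha + 1)
    return x * (1 - x) * y ** (alpha - 1) / (x - y) ** (alpha + 1)


def jump_window(tau, x, alpha):
    """Typical jump window [y_-(tau), y_+(tau)] of Equation 32."""
    r_minus = (tau * (1 - x) / alpha) ** (1 / alpha)
    r_plus = (tau * x / alpha) ** (1 / alpha)
    return x / (1 + r_minus), (x + r_plus) / (1 + r_plus)


def expected_jumps_outside(tau, x, alpha):
    """Left-hand sides of Equation 31: tau times the rate of jumps below y_- and above y_+."""
    y_minus, y_plus = jump_window(tau, x, alpha)
    below, _ = quad(w_kernel, 0.0, y_minus, args=(x, alpha), limit=200)
    above, _ = quad(w_kernel, y_plus, 1.0, args=(x, alpha), limit=200)
    return tau * below, tau * above


if __name__ == "__main__":
    alpha, x = 1.5, 0.2
    for tau in (1e-3, 1e-2, 1e-1):
        y_minus, y_plus = jump_window(tau, x, alpha)
        print(tau, y_minus, y_plus, expected_jumps_outside(tau, x, alpha))
```

Both expected counts should equal one for every τ, while the window [y₋, y₊] widens like τ^{1/α} as τ grows.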

Because these small jumps cancel a part of the bias V(x) in Equation 22, the typical trajectory is then driven by the uncanceled residual part of the bias V(x),

V'_{eff} \equiv - \int_{y \in [0, y_{-} (τ)] \cup [y_{+} (τ), 1]} (y - x) w (y | x) d y .

(33)

When $x ≪ 1$, the dominant contribution to this integral comes from $y \approx y_{+}(τ)$. Using $w(y | x) \sim \frac{x}{(y - x)^{α + 1}}$ from the first line of Equation 21 and $y_{+} - x \sim (τ x)^{\frac{1}{α}}$ from Equation 32, the above integral can be evaluated as $V'_{eff} \sim -\frac{x}{(y_{+} - x)^{α - 1}} \sim -\frac{x^{\frac{1}{α}}}{τ^{\frac{α - 1}{α}}}$, which agrees with $V_{eff}$ in Equation 30 for $x ≪ 1$ (up to the factor k). When $1 - x ≪ 1$, the dominant contribution to $V'_{eff}$ comes from $y \approx y_{-}(τ)$ and can be evaluated in a similar way, reproducing $V_{eff}$ in Equation 30 for $1 - x ≪ 1$.

Schematic explanation of the effective time-dependent bias $V_{eff}(τ)$. The black curve shows the jump rate $w(y | x)$ in Equation 21 when $x ≪ 1$. Within a time τ, small jumps within the region $[y_{-}(τ), y_{+}(τ)]$ are likely to occur, offsetting part of the original bias V(x). $V_{eff}(τ)$ is the residual part of the bias.

One interpretation of Equation 33 is that the bias V(x) in Equation 22 is mitigated by small jumps in a short time, and therefore, the integration over small jumps is excluded in Equation 33. Another interpretation is that, for typical short-time dynamics, small jumps and the bias V(x) are relevant, and, from the overall neutrality, the change caused by these two is equal to the negative of that caused by large jumps, thus resulting in Equation 33.

Allele frequency fluctuations are inconsistent with the Wright–Fisher diffusion

In the simulations, we found that, for $1 \leq α < 2$, allele frequency fluctuations are inconsistent with the Wright–Fisher diffusion and are characterized by super-diffusion with diffusion exponent $\frac{2}{α}$ (see Equation 5). This finding is readily explained by the short-time asymptotics in Equation 23. Recalling that $γ \propto τ^{\frac{1}{α}}$ (Equation 25) and that the statistical properties of Z are independent of τ, we obtain for the median square displacement (Median SD), with M[·] denoting the median,

\mathrm{Median\ SD} = γ(τ, x_{0})^{2} \, \mathrm{M}[Z^{2}] \propto τ^{\frac{2}{α}} .

(34)

This scaling can also be justified heuristically by noting that, for $1 < α < 2$ , the square displacement is dominated by large jumps. During time τ, an allele frequency $X (τ)$ around x typically jumps to $y_{\pm}$ given in Equation 32. When $τ ≪ 1$ , it is easy to see $| y_{\pm} - x | \sim τ^{\frac{1}{α}}$ with x-dependent prefactors. Because the median SD is dominated by the largest displacements, it can be evaluated as

\mathrm{Median\ SD} \sim (y_{\pm}(τ) - x)^{2} \sim τ^{\frac{2}{α}},

(35)

where $τ = \frac{t}{T_{c}} ≪ 1$ is assumed.

Long-time dynamics and extinction time

Above, we saw that, at short times, allele frequencies carry out an unconstrained Lévy flight. This random search process, however, becomes distorted as soon as the allele frequency comes within reach of one of the absorbing boundaries. Interestingly, the dynamics then enter a universal intermediate asymptotic regime that controls the characteristic extinction time as well as establishment times and fixation probabilities.

To see this, let us consider the extinction dynamics of a trajectory starting from a small frequency $x_{0} ≪ 1$ (Figure 4). At short times, we can apply the short-time asymptotics in Equations 28 and 30. We expect Equations 28 and 30 to break down when the displacement $Δ X^{med}(τ)$ computed from Equation 28 becomes comparable to x₀, which occurs at $τ \sim τ_{short} \equiv x_{0}^{α - 1}$. Taking a coarse-grained view, the rate of frequency change over the time $τ_{short}$ is roughly given by

\frac{Δ X^{med}}{τ_{short}} \sim - x_{0}^{2 - α} .

(36)

This suggests that, in a long timescale ( $τ ≳ τ_{short}$ ), the median frequency decreases as

\frac{d}{d τ} X^{med} (τ) = {\tilde{V}}_{eff} (X^{med} (τ)) (for X ≪ 1),

(37)

where, up to a prefactor, the frequency-dependent bias ${\tilde{V}}_{eff} (X)$ is given by

{\tilde{V}}_{eff} (X) \sim - X^{2 - α} .

(38)

In Figure 9C, we show numerically that the long-time trajectory $X^{med}(τ > τ_{short})$ is consistent with Equation 37. Solving Equation 37 shows that the median trajectory goes extinct at $τ_{ext} \sim x_{0}^{α - 1}$ (Equation 3), in agreement with our simulations (Figure 4). Note that, for $1 - X ≪ 1$, the bias in Equation 38 is replaced by $\tilde{V}_{eff}(X) \sim (1 - X)^{2 - α}$.
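Equation 37 with the bias of Equation 38 is separable: setting the unknown prefactor to one (our assumption), it integrates to $X(τ)^{α - 1} = x_{0}^{α - 1} - (α - 1) τ$, so the median reaches zero at $τ_{ext} = x_{0}^{α - 1}/(α - 1)$. The sketch below compares this closed form with a direct numerical integration using SciPy's `solve_ivp`; function names are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp


def median_rhs(tau, X, alpha):
    """Equations 37-38 with unit prefactor (assumption): dX/dtau = -X^(2-alpha)."""
    return -np.maximum(X, 0.0) ** (2 - alpha)


def median_closed_form(tau, x0, alpha):
    """X(tau) = [x0^(alpha-1) - (alpha-1) tau]^(1/(alpha-1)), zero after extinction."""
    base = np.maximum(x0 ** (alpha - 1) - (alpha - 1) * tau, 0.0)
    return base ** (1 / (alpha - 1))


def extinction_time(x0, alpha):
    """tau_ext = x0^(alpha-1) / (alpha-1), i.e. tau_ext ~ x0^(alpha-1) as in Equation 3."""
    return x0 ** (alpha - 1) / (alpha - 1)


def median_numeric(taus, x0, alpha):
    """Integrate Equation 37 numerically and report X at the requested times."""
    sol = solve_ivp(median_rhs, (0.0, taus[-1]), [x0], t_eval=taus,
                    args=(alpha,), rtol=1e-10, atol=1e-14)
    return sol.y[0]


if __name__ == "__main__":
    alpha, x0 = 1.5, 0.01
    taus = np.linspace(0.0, 0.5 * extinction_time(x0, alpha), 6)
    print(median_numeric(taus, x0, alpha))
    print(median_closed_form(taus, x0, alpha))
```

Reducing x₀ tenfold shortens τ_ext by the factor 10^{−(α−1)}, which is the x₀-dependence stated in Equation 3.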

Importantly, Equations 37 and 38 can also be rigorously justified from a scaling ansatz for the transition density. After some time, $P(x, τ | x_{0})$ spreads broadly over the region $x ≪ 1$ with a peak at x = 0 (Figure 11A). As shown in Figure 11B, $P(x, τ)$ is consistent with the following scaling ansatz:

P (x, τ) \sim τ^{- 2 η} g (ξ) (for x ≪ 1),

(39)

where $η \equiv \frac{1}{α - 1}$ and $g(ξ)$ is a function of $ξ \equiv \frac{x}{τ^{η}}$. Up to an overall constant, $g(ξ)$ can be determined analytically and expressed as an infinite series (see Appendix F). Note that the τ-dependent factor in Equation 39 is motivated by the fact that the extent over which the distribution spreads increases like $τ^{η}$. Equation 39 implies that, conditional on establishment at τ, the median frequency increases as $X^{med}(τ)|_{establish} \sim τ^{η}$. Equation 38 then follows by evaluating the bias in Equation 28 at $τ \sim (X^{med})^{\frac{1}{η}}$ and at $X^{med}$ instead of at x₀.
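The exponent bookkeeping in this last step can be written out explicitly. Substituting $x_{0} \to X$ and $τ \to X^{1/η} = X^{α - 1}$ into the near-boundary form of the bias in Equation 30, and dropping the constant k, gives

V_{eff} \sim -\frac{X^{1/α}}{(X^{α - 1})^{1 - 1/α}} = -X^{\frac{1 - (α - 1)^{2}}{α}} = -X^{\frac{α (2 - α)}{α}} = -X^{2 - α},

which is the frequency-dependent bias of Equation 38.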

(A) Log plot of $P(x, τ | x_{0} = 0.01)$ at generations $t = 700, 1100, 1500, 1900$ computed from the discrete-time model, with $N = 10^{7}$ and $α = 1.5$. (B) Log–log plot of $τ^{2η} P(x, τ | x_{0} = 0.01)$ versus $ξ = x / τ^{η}$, where $η = (α - 1)^{-1}$, at $t = 700, 1100, 1500, 1900$ (solid curves). The dashed curve represents the analytic result for $g(ξ)$ (see Appendix F). The curves $τ^{2η} P(x, τ | x_{0} = 0.01)$ at the different time points collapse onto $g(ξ)$, supporting the scaling ansatz in Equation 39.

As a consistency check of the exponent $α - 1$ in Equation 3, we consider two solvable extreme cases. First, in the limit $α \to 2$, the dependence on x₀ in Equation 3 becomes linear. In the Wright–Fisher model, the mean extinction time can be obtained analytically by solving the backward equation $-1 = x(1 - x) \frac{\partial^{2} τ_{ext}(x)}{\partial x^{2}}$ (see, e.g., Karlin and Taylor 1981). The solution is proportional to x₀ with a logarithmic correction, $τ_{ext} \approx -x_{0} \log x_{0}$. Second, when $α \to 1$, the mean extinction time no longer depends on x₀. We can show this explicitly by solving Equation 19: using $V(X) \simeq X \log X$ for $X ≪ 1$, the solution is $\log \frac{\log X(τ)}{\log x_{0}} = τ$. Therefore, if we approximately define the mean extinction time $τ_{ext}$ by $X(τ_{ext}) \approx \frac{1}{N}$, we obtain $τ_{ext} \approx \log \frac{\log N}{-\log x_{0}} \approx \log \log N$, which to leading order is independent of x₀ if x₀ is of order one.
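Both limiting cases can be verified in a few lines. The sketch below assumes that $T(x) = -[x \log x + (1 - x) \log(1 - x)]$ is the solution of the backward equation with absorbing boundaries (our explicit choice, which can be checked by differentiating twice), verifies it by finite differences, and integrates the α → 1 median equation $dX/dτ = X \log X$ until X reaches 1/N. Function names are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp


def wf_absorption_time(x):
    """T(x) = -[x log x + (1-x) log(1-x)], solving -1 = x(1-x) T''(x) with T(0) = T(1) = 0."""
    return -(x * np.log(x) + (1 - x) * np.log(1 - x))


def backward_residual(x, h=1e-4):
    """x(1-x) T''(x) by central differences; should be close to -1."""
    T = wf_absorption_time
    return x * (1 - x) * (T(x + h) - 2 * T(x) + T(x - h)) / h**2


def alpha1_extinction_time(x0, N):
    """Closed form tau_ext = log(log N / (-log x0)) from dX/dtau = X log X."""
    return np.log(np.log(N) / -np.log(x0))


def alpha1_extinction_time_numeric(x0, N):
    """Integrate dX/dtau = X log X and return the time at which X first reaches 1/N."""
    def hit_threshold(tau, X):
        return X[0] - 1.0 / N
    hit_threshold.terminal = True
    hit_threshold.direction = -1
    rhs = lambda tau, X: X * np.log(np.maximum(X, 1e-300))
    sol = solve_ivp(rhs, (0.0, 50.0), [x0], events=hit_threshold, rtol=1e-10, atol=1e-16)
    return sol.t_events[0][0]


if __name__ == "__main__":
    print("x(1-x) T'' at x = 0.3:", backward_residual(0.3))
    x0 = 1e-4
    print("T(x0) / (-x0 log x0):", wf_absorption_time(x0) / (-x0 * np.log(x0)))
    print("alpha -> 1:", alpha1_extinction_time(0.5, 1e7), alpha1_extinction_time_numeric(0.5, 1e7))
```

For small x₀ the ratio $T(x_{0})/(-x_{0} \log x_{0})$ approaches one only logarithmically slowly, reflecting the correction term $+x_{0}$ in the expansion of T.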

Natural selection and fixation probability

One important advantage of the forward-time perspective is that natural selection can be accounted for by introducing an appropriate bias favoring the beneficial variant. Suppose that the mutant type has a selective advantage s > 0, such that the average offspring number of mutants is increased by a factor of $1 + s$ relative to the wild type. In the time-rescaled Chapman–Kolmogorov equation, this adds the term $σ x(1 - x)$, where $σ = T_{c} s$, to the advection V(x) of Equation 20.

The key observation underlying the argument below is that, when X is sufficiently small, the selection force $σ X(1 - X) \approx σ X$ is negligible compared to the bias $\tilde{V}_{eff}(X) \sim -X^{2 - α}$ in Equation 38, because the former is approximately linear in X while the latter is sublinear. If the frequency happens to grow and reach a certain value X_c, genuine selection begins to dominate over the bias, and the trajectory fixes with high probability (see Figure 12 for example trajectories and Figure 13A). Using Equation 38, the crossover point X_c can be estimated from the balance between the selection force and the sampling bias $\tilde{V}_{eff}(X)$,

σ X = - {\tilde{V}}_{eff} (X) \sim X^{2 - α},

(40)

which gives

X_{c} \sim σ^{- \frac{1}{α - 1}} = \frac{1}{N} s^{- \frac{1}{α - 1}} .

(41)

Examples of trajectories of the frequency of the beneficial allele, starting from $x_{0} = 0.05$, with $α = 1.5$, $s = 0.03$, and N = 5000. Fixed trajectories are colored in blue and extinct ones in gray. Here, the crossover point $X_{c}$ can be estimated as $X_{c} \sim 0.2$, assuming that the proportionality constant in Equation 41 is one. Once a trajectory reaches the crossover point, it becomes fixed with high probability.

For $X ≪ X_{c}$, the dynamics are essentially neutral (described by Equation 20), while, for $X > X_{c}$, the trajectory grows almost deterministically. Therefore, the fixation probability $P_{fix}$ of a beneficial mutation can be estimated by using the neutral fixation probability in a population of size $\approx N X_{c}$. Although the full dynamics in Equation 20 are difficult to analyze, the neutral fixation probability of a single copy is simply the inverse of the population size: exactly one of the initial lineages eventually fixes, and by exchangeability each is equally likely to be that one. Therefore, we have

P_{fix} \sim \frac{1}{N X_{c}} \sim s^{\frac{1}{α - 1}},

(42)

which is valid for $\frac{1}{N} ≪ s^{\frac{1}{α - 1}}$ . Equation 42 reproduces our simulation results in Figure 5 for $1 < α < 2$ and, as $α \to 2$ , also reproduces the known result of the Wright–Fisher model, $P_{fix}^{WF} \sim 2 s$ (up to a prefactor).
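Equations 41 and 42 are simple enough to tabulate directly. The sketch below sets the unspecified order-one prefactor to one, as in the estimate X_c ∼ 0.2 quoted for the example trajectories, and refuses parameter values for which X_c ≥ 1. Function names are hypothetical.

```python
import math


def crossover_frequency(alpha, s, N, prefactor=1.0):
    """X_c ~ s^(-1/(alpha-1)) / N (Equation 41).

    The O(1) prefactor is not fixed by the scaling argument; 1 is an assumption.
    """
    return prefactor * s ** (-1.0 / (alpha - 1.0)) / N


def fixation_probability(alpha, s, N, prefactor=1.0):
    """P_fix ~ 1 / (N X_c) ~ s^(1/(alpha-1)) (Equation 42), valid when X_c << 1."""
    x_c = crossover_frequency(alpha, s, N, prefactor)
    if x_c >= 1.0:
        raise ValueError("X_c >= 1: outside the validity range of Equation 42")
    return 1.0 / (N * x_c)


if __name__ == "__main__":
    # Parameters of the example trajectories: alpha = 1.5, s = 0.03, N = 5000
    print("X_c   ~", crossover_frequency(1.5, 0.03, 5000))   # about 0.22
    print("P_fix ~", fixation_probability(1.5, 0.03, 5000))  # s**2, i.e. about 9e-4
    print("P_fix ~", fixation_probability(2.0, 0.03, 5000))  # s, Wright-Fisher-like scaling
```

The same selective advantage yields a fixation probability roughly s^{(2−α)/(α−1)} times smaller for α < 2 than in the Wright–Fisher-like limit α = 2.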

Site frequency spectrum

By using the time-dependent effective bias, we can also estimate the behavior of the SFS $f_{SFS}(x)$ for frequent and rare variants. While the SFS is theoretically defined in the infinite alleles model, it can be computed from our biallelic framework (Ewens 1963): $f_{SFS}(x) Δx$ is defined as the expected number of neutral derived alleles in the frequency interval $[x - \frac{Δx}{2}, x + \frac{Δx}{2}]$ in a sampled population (here, the whole population). Because new mutations are assumed to arise uniformly in time, the SFS for unlinked neutral loci is given by the product of the total mutation rate μN and the mean sojourn time, namely, the average time an allele spends in the frequency interval $[x - \frac{Δx}{2}, x + \frac{Δx}{2}]$ until fixation or extinction.

First, we consider the low-frequency end, $x ≪ 1$, of the SFS (see Cvijović et al. 2018 for a similar argument). Since the SFS is proportional to the sojourn time, trajectories whose maximum frequencies are x or slightly larger than x contribute dominantly to the SFS $f_{SFS}(x)$ at x. These trajectories typically go extinct due to the bias, so we can roughly estimate their sojourn time at x as the inverse of the “velocity” $|\tilde{V}_{eff}(x)| \sim x^{2 - α}$ in Equation 38. Since the probability that a trajectory grows above a frequency x is roughly $1/(Nx)$, the SFS is proportional to

\frac{1}{N x |\tilde{V}_{eff}(x)|} \propto \frac{1}{x^{3 - α}} \quad (\text{for } x ≪ 1) .

(43)

Similarly, for the high-frequency end of the SFS, only the trajectories that grow above x can contribute to $f_{SFS} (x)$ . Typically, these trajectories go to fixation due to the bias ${\tilde{V}}_{eff} (x) \sim {(1 - x)}^{2 - α}$ . Therefore, the SFS is proportional to

\frac{1}{N x {\tilde{V}}_{eff} (x)} \propto \frac{1}{{(1 - x)}^{2 - α}} (for 1 - x ≪ 1) .

(44)

The effect of the genuine selection on the SFS can also be studied by using the effective bias. See Appendix G.

Bimodality of stationary distribution

Now, we turn to explaining the bimodality observed at mutation–drift balance. We found that, when the mutation rates are small, the stationary allele frequency distribution is not U-shaped, as expected from the Wright–Fisher dynamics, but M-shaped, as shown in Figure 8. The M-shaped distribution arises from the balance between the mutational force and the effective bias (see Figure 13B). In the Chapman–Kolmogorov equation, the mutational force is given by

-θ x + θ(1 - x) \approx \begin{cases} +θ & (x ≪ 1) \\ -θ & (1 - x ≪ 1), \end{cases}

(45)

which pushes the frequency toward the center $x = \frac{1}{2}$ . On the other hand, the effective bias, ${\tilde{V}}_{eff} (x) \approx - x^{2 - α}$ for $x ≪ 1$ and ${\tilde{V}}_{eff} (x) \approx {(1 - x)}^{2 - α}$ for $1 - x ≪ 1$ , pushes a trajectory toward the closer boundary. Therefore, the positions where these two forces balance are approximately given by

x_{peak} \approx c θ^{\frac{1}{2 - α}}, 1 - c θ^{\frac{1}{2 - α}},

(46)

where c is a positive constant. If θ is sufficiently small, we can always find the balancing points. The presence of these two balancing points means that we can think of the allele frequency dynamics as a two-state system, essentially analogous to a super-diffusing particle in a double-well potential (see Figure 8C for a realization of trajectories). This explains the bimodal shape of the stationary distribution.
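The balance condition behind Equation 46 can be solved numerically. The sketch below balances the full mutational term θ(1 − 2x) of Equation 45 against the near-boundary effective bias −x^{2−α}, with the unknown prefactor set to one (our assumption, i.e., c = 1), and locates the lower peak with SciPy's `brentq`; the upper peak follows by symmetry as 1 − x. The function name is hypothetical.

```python
from scipy.optimize import brentq


def lower_peak(theta, alpha):
    """Root of theta*(1 - 2x) - x^(2 - alpha) on (0, 1/2).

    Mutational force theta*(1 - 2x) (Equation 45) balanced against the
    near-boundary effective bias -x^(2 - alpha) with unit prefactor (assumption).
    The balance function is +theta at x = 0 and negative at x = 1/2.
    """
    balance = lambda x: theta * (1 - 2 * x) - x ** (2 - alpha)
    return brentq(balance, 0.0, 0.5, xtol=1e-300)


if __name__ == "__main__":
    alpha = 1.5
    for theta in (1e-2, 1e-3, 1e-4):
        x_peak = lower_peak(theta, alpha)
        print(theta, x_peak, theta ** (1 / (2 - alpha)), 1 - x_peak)
```

For small θ the root approaches θ^{1/(2−α)}, so a tenfold change in θ moves the peaks by a factor of 10^{1/(2−α)}, which is 100 for α = 1.5.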

Finally, we remark that, even in the presence of natural selection, the balancing positions are still determined from the mutation-effective bias balance provided that $θ ≪ 1$ : while the effective bias and the mutational term are sub-linear and constant respectively, the selection term $σ x (1 - x)$ is linear in x when $x ≪ 1$ . Thus, when θ is sufficiently small, the magnitude of the selection term around $x = c θ^{\frac{1}{2 - α}}, 1 - c θ^{\frac{1}{2 - α}}$ is negligible, and the peak positions are given by Equation 46.

Discussion

In this study, we analyzed the effect of power-law offspring distributions on the competition of two mutually exclusive alleles. Our main reason to consider such broad offspring distributions is that they often emerge in evolutionary scenarios that inflate the reproductive value (Barton and Etheridge 2011) of a small set of founders. For example, range expansions blow up the descendant numbers of the most advanced individuals at the front of the population, an effect that has been called gene surfing (Hallatschek and Nelson 2008). Likewise, continual rampant adaptation boosts the descendant numbers of the fittest individuals. The resulting allele frequency dynamics becomes asymptotically similar to that of a population with scale-free offspring distributions.

In the case of narrow offspring distributions, which is the predominant assumption in population genetics, it is usually an excellent approximation to describe the allele frequency dynamics by a biased diffusion process, which forms the basis of powerful inference frameworks (Tataru et al. 2017). If the offspring distribution is broad, however, allele frequency trajectories are disrupted by discontinuous jumps resulting from so-called jackpot events, i.e., exceptionally large family sizes drawn by chance from the offspring distribution. Our goal was to find an analytical and intuitive framework within which we can understand the main features of these unusual dynamics.

We found that the main counter-intuitive features can be understood and well approximated by a competition of selection and mutations with a time-dependent emergent sampling bias, $V_{eff}(τ)$. The sampling bias favors the major allele and arises because the sub-population carrying the major allele typically samples deeper into the tail of the offspring distribution than the minor allele fraction.

In the remainder, we first summarize the unusual population genetic patterns that can be explained by the action of these effective forces. We then discuss how broad offspring dynamics could be detected in natural populations and what its implications are for the dynamics of adaptation. Finally, we demonstrate that these dynamics are also ubiquitous in populations with narrow offspring distributions, when mutational jackpots are possible. Therefore, we believe our theoretical framework may be taken as a general null model for populations far from equilibrium.

Unusual dynamics

We found that the sampling bias effectively acts like time- and frequency-dependent selection. In the absence of true selection, $V_{eff}(x, τ)$ drives the major allele to fixation, first rapidly and then gradually more slowly with time and proximity to fixation. The slowing down of the sampling bias near fixation also leads to an excess of high-frequency alleles, given a continual influx of neutral mutations. This generates a high-frequency uptick in the SFS, which is characteristic of the tail of the offspring distribution. In mutation–drift balance, the allele frequency distribution is M-shaped, in contrast to the U-shape expected from the Wright–Fisher dynamics. The peaks reflect the balance between mutation and the sampling bias.

Non-neutral dynamics depend on whether genuine selection dominates over the sampling bias. The sampling bias tends to dominate near extinction or fixation and wanes near 50% frequency. A de novo beneficial allele will not be able to fix unless it reaches, by chance, the switch-point frequency at which genuine selection becomes stronger than the sampling bias. Finally, fluctuations in typical trajectories grow stronger over time. As a consequence, allele frequencies super-diffuse: fluctuations grow with time more rapidly than under the Wright–Fisher diffusion.

Detecting dynamics driven by broad offspring distributions

The time-dependent over-dispersion is most readily detected by plotting the median square displacement as a function of time (see Figure 4B). Testing deviations in this statistic is an attractive avenue for detecting deviations from the Wright–Fisher diffusion because the signal is strong for intermediate allele frequencies, which can be accurately measured by population sequencing. By contrast, the time-dependent bias vanishes when an allele has 50% frequency. So, the detection of the sampling bias requires accurate time series data of low frequency variants, which is difficult to obtain given sequencing errors.

It is clear that a single super-diffusing but neutral allele would not abide by the diffusive Wright–Fisher null model and thus might falsely be considered an allele under selection. Importantly, however, allele super-diffusion has an impact even on statistics that sum over many unlinked loci. This is significant for inference methods, for instance those used to detect polygenic selection, which argue that trait values follow a diffusion process, if not because of an underlying Wright–Fisher dynamics of the allele frequencies then because they sum over many independent allele frequencies (Berg and Coop 2014). However, $α < 2$ dynamics break both of these arguments. In particular, sums over many unlinked loci tend to non-Gaussian distributions (so-called alpha-stable distributions). Hence, before applying traditional inference methods based on the Wright–Fisher diffusion or the standard central limit theorem (Tataru et al. 2017), an underlying super-diffusion process should be ruled out.
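The breakdown of the standard central limit theorem invoked here can be illustrated with a toy model that is not the simulation used in this study: sum n independent, symmetric increments with power-law tails $P(|X| > u) = u^{-α}$, a hypothetical stand-in for per-locus contributions, and ask how the typical size of the sum grows with n. For α < 2 the median of |sum| should grow like n^{1/α}, faster than the n^{1/2} growth of finite-variance (Gaussian) sums. Function names are hypothetical.

```python
import numpy as np


def symmetric_pareto(rng, alpha, shape):
    """Symmetric heavy-tailed variables with P(|X| > u) = u^(-alpha) for u >= 1."""
    magnitude = (1.0 - rng.random(shape)) ** (-1.0 / alpha)  # inverse-CDF sampling
    sign = np.where(rng.random(shape) < 0.5, -1.0, 1.0)
    return sign * magnitude


def median_abs_sum(n_terms, alpha, n_rep=1000, seed=1):
    """Median of |X_1 + ... + X_n| over n_rep independent replicates."""
    rng = np.random.default_rng(seed)
    sums = symmetric_pareto(rng, alpha, (n_rep, n_terms)).sum(axis=1)
    return np.median(np.abs(sums))


if __name__ == "__main__":
    for alpha in (1.5, 3.0):
        ratio = median_abs_sum(2000, alpha, seed=2) / median_abs_sum(125, alpha, seed=1)
        print(f"alpha={alpha}: ratio for 16x more terms = {ratio:.2f}, "
              f"predicted 16^(1/min(alpha, 2)) = {16 ** (1 / min(alpha, 2)):.2f}")
```

Increasing the number of terms sixteenfold should enlarge the typical sum by about 16^{1/1.5} ≈ 6.3 for α = 1.5, but only by about 4 for the finite-variance case α = 3, where the ordinary central limit theorem applies.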

If time series are not available, broad offspring number distributions can also be detected from the SFS (Neher and Hallatschek 2013). A telltale sign of the sampling bias is a characteristic uptick at the high-frequency tail of the SFS, which is difficult to generate by demographic variation (Neher and Hallatschek 2013). As we have shown, the shape of the uptick is characteristic of the tail of the offspring distribution (the parameter α).

Implications for the dynamics of adaptation

We found that fixation probabilities depend quite sensitively on the broadness α of the offspring distribution (Equation 42). Accordingly, the dynamics of adaptation, which ultimately depend on the fixation of beneficial variants, should change quantitatively. To estimate these modifications, we consider an asexual population of constant size N with a broad offspring distribution with $1 < α < 2$, in which beneficial mutations occur at rate $μ_{B}$. For low mutation rates, mutations sweep one after the other, but when mutation rates are sufficiently high, multiple beneficial mutations segregate simultaneously and most of them are outcompeted by fitter mutations. Such a situation is known as clonal interference.

We can study the effect of the exponent α on the adaptation dynamics quantitatively by repeating the argument of Desai and Fisher (2007), which assumes a narrow offspring distribution (finite variance of offspring numbers). As discussed in Appendix H, clonal interference should occur if

μ_{B} N s^{\frac{2 - α}{α - 1}} \ln (N s^{\frac{1}{α - 1}}) ≳ 1 (clonal interference),

(47)

where s > 0 is the fitness effect of a mutation, which we assume to be constant. The rate R of adaptation is given by

R \sim \begin{cases} μ_{B} N s^{\frac{α}{α - 1}} & (\text{successive selective sweeps}) \\ \frac{2 s^{2} \ln(N s^{\frac{1}{α - 1}})}{(\ln \frac{s}{μ_{B}})^{2}} & (\text{clonal interference}) \end{cases} .

(48)

Note that the second line in Equation 48 reproduces Equation 5 of Desai and Fisher (2007) in the limit $α \to 2$ . Thus, the rate of adaptation depends only weakly (logarithmically) on α in the clonal interference regime, even though the condition for clonal interference in Equation 47 depends on α quite sensitively.
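Equations 47 and 48 can be combined into a single order-of-magnitude estimate. In the sketch below, the regime switch at a sharp threshold of one is a hypothetical simplification (Equation 47 locates the crossover only up to order-one factors), and the prefactors are taken as printed. Function names are hypothetical.

```python
import math


def interference_parameter(mu_b, N, s, alpha):
    """Left-hand side of Equation 47; values of order one or larger signal clonal interference."""
    return mu_b * N * s ** ((2 - alpha) / (alpha - 1)) * math.log(N * s ** (1 / (alpha - 1)))


def adaptation_rate(mu_b, N, s, alpha, threshold=1.0):
    """Order-of-magnitude rate of adaptation R from Equation 48.

    The sharp switch at `threshold` is an assumption; the text gives the
    crossover only up to O(1) factors.
    """
    if interference_parameter(mu_b, N, s, alpha) < threshold:
        return mu_b * N * s ** (alpha / (alpha - 1))  # successive selective sweeps
    return 2 * s**2 * math.log(N * s ** (1 / (alpha - 1))) / math.log(s / mu_b) ** 2


if __name__ == "__main__":
    mu_b, N, s = 1e-6, 1e6, 0.01
    for alpha in (1.5, 1.8, 2.0):
        print(alpha, interference_parameter(mu_b, N, s, alpha), adaptation_rate(mu_b, N, s, alpha))
```

With these illustrative parameters, the α = 1.5 population is still in the successive-sweep regime while α = 2 is already interference-limited, which illustrates how sensitively the crossover in Equation 47 depends on α.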

Emergence of skewed offspring distributions in models of range expansions

Our study can be regarded as an analysis of the population genetics induced by power-law offspring distributions. The main reason to consider these scale-free offspring distributions is that they quite generally emerge in models of stochastic traveling waves (Birzu et al. 2018). Such models are ubiquitous in population genetics because they describe a wide range of evolutionary scenarios, including range expansions, rampant asexual and sexual adaptation as well as Muller’s ratchet (Brunet et al. 2007; Desai et al. 2013; Kosheleva and Desai 2013; Neher and Hallatschek 2013; Schweinsberg 2017; Birzu et al. 2018). Our analysis should therefore apply most directly to these evolutionary scenarios, which we now demonstrate using a simple model of a range expansion. We end by discussing the question of whether some of our results may also arise in scale-rich offspring distributions.

Birzu et al. (2018) argued that any exponent $1 \leq α \leq 2$ can emerge in a simple model of range expansions that incorporates a tunable level of cooperativity between individuals (Figure 14A). The model can be described by a generalized stochastic Fisher–Kolmogorov equation

\frac{\partial n}{\partial t} = D \frac{\partial^{2} n}{\partial x^{2}} + r(n) n + \text{noise},

(49)

for the time-dependent population density n(x, t) at position x in a linear habitat and time t. The growth rate r(n) is assumed to be density-dependent, with

r (n) = r_{0} (1 - \frac{n}{K}) (1 + B \frac{n}{K}),

(50)

where the parameter $B \geq 0$ accounts for co-operativity among individuals, which is also called an Allee effect. As discussed in Hallatschek (2018), lineages in the region of the wave tip are diffusively mixed within the timescale $τ_{mix} \sim \frac{1}{r} \ln^{2} K \sqrt{\frac{D}{r}}$ . This implies that, in this microscopic model, resampling from an offspring distribution roughly occurs every τ_mix generations. In Birzu et al. (2018, 2021), it was argued that depending on the strength of the Allee effect, the offspring distributions corresponding to any of the three distinct classes of the beta coalescent process can arise; namely, the Bolthausen–Sznitman coalescent when B < 2, the beta coalescent with $1 < α < 2$ when $2 < B < 4$ , and the Kingman coalescent when B > 4.
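The growth law of Equation 50 and the classification by Allee strength summarized above can be encoded directly. The sketch below is only a lookup of the stated regimes, not a simulation of Equation 49. It leaves the boundary values B = 2 and B = 4 unclassified because the text does not assign them. Function names are hypothetical.

```python
def growth_rate(n, r0, K, B):
    """Per-capita growth rate r(n) of Equation 50; B >= 0 sets the strength of the Allee effect."""
    return r0 * (1 - n / K) * (1 + B * n / K)


def genealogy_class(B):
    """Coalescent class of the expansion front as a function of B (regimes as stated in the text)."""
    if B < 0:
        raise ValueError("B must be non-negative")
    if B < 2:
        return "Bolthausen-Sznitman"
    if 2 < B < 4:
        return "beta coalescent with 1 < alpha < 2"
    if B > 4:
        return "Kingman"
    return "boundary case (B = 2 or B = 4), not classified here"


if __name__ == "__main__":
    for B in (1, 3, 8):  # the three values used in the range-expansion comparison
        print(B, genealogy_class(B), growth_rate(0.5, r0=1.0, K=1.0, B=B))
```

At low density the per-capita growth rate equals r₀ for every B, while larger B raises growth at intermediate densities. This cooperative boost at the bulk of the wave, rather than at its tip, moves the genealogy from the Bolthausen–Sznitman class toward the Kingman class.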

Figure 14 — (A) The model of a range-expanding population with two neutral alleles (green and gray). A broad offspring distribution arises dynamically in the front region. (B) Stationary distributions of the allele frequency when the mutation rate θ is small (blue) and when θ is large (orange). The jagged lines (blue/orange) are the numerical results for the traveling wave model, while the dotted lines (black) are those for the macroscopic model. The Allee-effect parameter is B = 1 (left), 3 (middle), and 8 (right). See Appendix I for details of the simulation and other parameter values.

To demonstrate that our present study can serve as a macroscopic description of the traveling wave model, we introduce reversible mutations in the traveling wave model and measure the mutant frequency among the first $N \sim \frac{K}{k}$ individuals from the edge of the front. Here, k is the spatial decay rate of the front, i.e., $n \sim e^{- k \tilde{x}}$ , where $\tilde{x}$ is the coordinate comoving with the expansion. This definition of the mutant frequency is reasonable because only the wave front has a skewed offspring distribution, owing to the founder effect. For B = 1 (left), 3 (middle), and 8 (right), Figure 14B shows the frequency distributions in the traveling wave model for a small and a large mutation rate (jagged lines), together with the corresponding distributions in the macroscopic model (black dotted lines). The stationary distributions in the traveling wave model agree well with those in the macroscopic model. In particular, the transition from an M-shaped or U-shaped distribution to a unimodal distribution is consistently reproduced in the traveling wave model. These results underscore the correspondence between the traveling wave with an Allee effect and the beta coalescent.
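The frequency measurement described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the simulation code of Appendix I: the front is taken to be discretized into well-mixed demes listed from the leading edge backwards, and the function name `front_mutant_frequency` and its arguments are hypothetical.

```python
from typing import Sequence


def front_mutant_frequency(total: Sequence[float], mutant: Sequence[float],
                           n_front: float) -> float:
    """Mutant frequency among the n_front individuals closest to the front edge.

    total[i] and mutant[i] are the total and mutant densities of deme i, with
    demes ordered from the leading edge backwards (a hypothetical layout).
    Demes are assumed well mixed; the last deme used contributes only the
    fraction needed to reach n_front individuals.
    """
    counted = 0.0
    mutants = 0.0
    for n_i, m_i in zip(total, mutant):
        if n_i <= 0:
            continue                          # empty deme ahead of the front
        take = min(n_i, n_front - counted)    # individuals used from this deme
        mutants += take * (m_i / n_i)
        counted += take
        if counted >= n_front:
            break
    if counted == 0:
        raise ValueError("no individuals in the front")
    return mutants / counted
```

Calling it with `n_front` set to K/k on one snapshot of the front gives one sample of the frequency whose stationary distribution is compared with the macroscopic model in Figure 14B.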

The above-described correspondence suggests that the spatial area occupied by one allele type in a range expansion should behave statistically like the time integral of the allele frequency in the Cannings model. In the context of adapting (non-spatial) populations, this quantity describes the total number of mutational opportunities of a mutant lineage (Desai and Fisher 2007; Weissman et al. 2009; Neher and Shraiman 2011). As presented in Appendix J, the distribution of the time-integrated frequency exhibits a scaling behavior that depends sensitively on the offspring distribution. While a full discussion is beyond the scope of this paper, we expect the distribution of areas to serve as a useful observable for distinguishing different prototypes of traveling waves (Birzu et al. 2018).

Broad offspring distributions with a scale: While scale-free offspring distributions often emerge over an intermediate timescale (τ_mix in the above traveling wave model), there are also species whose offspring numbers are broadly distributed within a single generation, in violation of the assumptions underlying the Wright–Fisher diffusion. For such species, it may be more natural to consider offspring distributions with a characteristic scale. In ‘sweepstake’ reproduction (Eldon and Wakeley 2006), a fixed and finite fraction of the population (specified by the parameter $Ψ$ in Eldon and Wakeley (2006)) is replaced at every sweepstake event. Because $Ψ$ sets a characteristic scale in offspring numbers, power-law relationships for the median of allele frequencies and for frequency fluctuations cannot be expected, as we confirm in Appendix K. Nevertheless, the qualitative features of a sampling bias are clearly recognizable for sweepstake reproduction as well.
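A single sweepstake event can be sketched as a frequency jump of fixed relative size. The Python sketch below assumes, as a simplification, that one parent chosen uniformly at random replaces a fraction Ψ of the population, and it omits the ordinary reproduction steps between sweepstake events; the function name is hypothetical. The two possible jumps have the same form as $x_{0} + (1 - x_{0}) λ$ and $x_{0} - x_{0} λ$ in the Λ-coalescent generator of Appendix E, with λ fixed at Ψ.

```python
from typing import List, Tuple


def sweepstake_step(x: float, psi: float) -> List[Tuple[float, float]]:
    """Possible mutant frequencies after one sweepstake event (simplified).

    Assumes a single parent, chosen uniformly at random, replaces a fraction
    psi of the population. Returns (probability, new frequency) pairs.
    """
    if not (0.0 <= x <= 1.0 and 0.0 < psi <= 1.0):
        raise ValueError("need 0 <= x <= 1 and 0 < psi <= 1")
    up = x + (1.0 - x) * psi     # the parent carries the mutant (probability x)
    down = x - x * psi           # the parent is wild type (probability 1 - x)
    return [(x, up), (1.0 - x, down)]
```

Averaging over the two outcomes returns x, so each event is neutral on average, while every jump has a size proportional to Ψ rather than being drawn from a scale-free distribution.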

Either type of model is ultimately an approximation to the true offspring distribution, and which one to use depends on the situation. As we have argued, the beta coalescent, together with the forward-in-time model described in this article, is the natural choice for range expansions, rapid adaptation, and other scenarios in which the reproductive value of a chosen few is highly inflated.

Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

Funding

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award R01GM115851, a National Science Foundation CAREER Award (#1555330), a Simons Investigator award from the Simons Foundation (#327934), RIKEN iTHEMS Program, and JSPS KAKENHI (JP19K03663).

Conflict of interest

The authors declare that there is no conflict of interest.

Acknowledgments

The authors thank Benjamin H. Good, Daniel B. Weissman, Jiseon Min, Joao Ascensao, Michael M. Desai, and Stephen Martis for their helpful discussions and comments.

Appendix A: Analytic results in the marginal case α = 1

Although the main target of our present study is the case of $1 < α < 2$ , we here provide analytical results for α = 1, which have not been derived before.

Figure A1 — The SFS $f_{SFS} (x) / μ$ when α = 1 for the selective advantages $σ = - 2, 0, 2$ . $f_{SFS} (x)$ is obtained by numerically evaluating the exact expression for t(x) in the first line of Equation A.5. As $x \to 0$ , $f_{SFS} (x)$ becomes independent of σ. Near x = 1, the magnitude of $f_{SFS} (x)$ depends on σ, but its scaling behavior (the slope in the log–log plot) does not. See Equation A.14.

Site frequency spectrum in the presence of genuine selection

The transition density for α = 1 in the presence of natural selection is derived in Hallatschek (2018) (see Kosheleva and Desai 2013 for the neutral case). In x space, it is given by

G^{σ} (x, τ | x_{0}) = \frac{1}{2 π x (1 - x)} \frac{sin π η}{cos π η + \cosh [η log \frac{e^{σ} x}{1 - x} - log \frac{e^{σ} x_{0}}{1 - x_{0}}]},

(A.1)

where $η \equiv e^{- τ}$ and σ is the selective advantage [this corrects an erratum in Equation 38 of Hallatschek (2018)].

For the purpose of computing the SFS (or, equivalently, the mean sojourn time), we set $x_{0} = 1 / N$ . Since we are considering the large-N limit, the denominator of Equation A.1 can be approximated as

\begin{matrix} cos π η + \cosh [η log \frac{e^{σ} x}{1 - x} - log \frac{e^{σ} x_{0}}{1 - x_{0}}] \approx \frac{1}{2} exp [η log \frac{x}{1 - x} - σ (1 - η) + log N] \\ = \frac{1}{2} N {(\frac{x}{1 - x})}^{η} e^{- σ (1 - η)} . \end{matrix}

(A.2)

Thus, the transition density for $x_{0} = \frac{1}{N}$ can be written as

G^{σ} (x, η | x_{0} = \frac{1}{N}) = \frac{e^{σ}}{π N x (1 - x)} \frac{sin π η}{{(\frac{x}{1 - x} e^{σ})}^{η}} .

(A.3)
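The large-N approximation leading to Equation A.3 is easy to check numerically. The following Python sketch (our own check; the parameter values are arbitrary) evaluates Equation A.1 exactly at $x_{0} = 1/N$ and compares it with Equation A.3; the relative deviation shrinks as N grows.

```python
import math


def g_exact(x, eta, x0, sigma):
    """Transition density of Equation A.1 (alpha = 1), with eta = exp(-tau)."""
    arg = (eta * math.log(math.exp(sigma) * x / (1 - x))
           - math.log(math.exp(sigma) * x0 / (1 - x0)))
    denom = math.cos(math.pi * eta) + math.cosh(arg)
    return math.sin(math.pi * eta) / (2 * math.pi * x * (1 - x) * denom)


def g_large_n(x, eta, n, sigma):
    """Large-N form of Equation A.3, valid for x0 = 1/N."""
    base = x / (1 - x) * math.exp(sigma)
    return (math.exp(sigma) / (math.pi * n * x * (1 - x))
            * math.sin(math.pi * eta) / base ** eta)


# Relative deviation of Equation A.3 from Equation A.1 for increasing N.
for n in (1e4, 1e8):
    ratio = g_exact(0.3, 0.3, 1 / n, sigma=1.0) / g_large_n(0.3, 0.3, n, sigma=1.0)
    print(f"N = {n:.0e}: relative deviation {ratio - 1:.1e}")
```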

Near the boundaries, this can be approximated as

G^{σ} (x, η | x_{0} = \frac{1}{N}) \approx \begin{cases} \frac{e^{σ}}{π N x} \frac{\sin π η}{{(x e^{σ})}^{η}} & (x ≪ 1) \\ \frac{e^{σ}}{π N (1 - x)} {((1 - x) e^{- σ})}^{η} \sin π η & (1 - x ≪ 1) . \end{cases}

(A.4)

The SFS is given by $f_{SFS} (x) = N μ \times t (x)$ , where μ is the mutation rate per generation, and t(x) is the mean sojourn time density, which is given by

\begin{aligned} t (x) &= \int_{0}^{\infty} d τ \, G^{σ} (x, τ | x_{0} = \tfrac{1}{N}) = \int_{0}^{1} \frac{d η}{η} G^{σ} (x, η | x_{0}) \\ &\approx \begin{cases} \frac{e^{σ}}{π N x} \int_{0}^{1} \frac{d η}{η} \frac{\sin π η}{{(x e^{σ})}^{η}} & (x ≪ 1) \\ \frac{e^{σ}}{π N (1 - x)} \int_{0}^{1} \frac{d η}{η} \sin (π η) {((1 - x) e^{- σ})}^{η} & (1 - x ≪ 1) . \end{cases} \end{aligned}

(A.5)
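Figure A1 is based on a numerical evaluation of the first line of Equation A.5. A minimal Python sketch of such an evaluation, assuming the large-N density of Equation A.3 and a plain midpoint rule in η (our choice, not necessarily the one used for the figure), is:

```python
import math


def sfs_over_mu(x, sigma, n_eta=200_000):
    """f_SFS(x)/mu = N t(x) for alpha = 1 (first line of Equation A.5).

    Uses the large-N transition density of Equation A.3, so the factor N
    cancels, and a midpoint rule for the integral over eta in (0, 1).
    """
    base = x / (1 - x) * math.exp(sigma)     # raised to the power -eta in A.3
    h = 1.0 / n_eta
    total = 0.0
    for k in range(n_eta):
        eta = (k + 0.5) * h
        total += math.sin(math.pi * eta) / eta * base ** (-eta)
    return math.exp(sigma) / (math.pi * x * (1 - x)) * total * h
```

For σ = 0 and x close to 1, the η-integral equals arctan(π/L) with L = log[x/(1 − x)], up to exponentially small terms, so the product $f_{SFS}(x)(1 - x)L/μ$ equals $L \arctan(π/L)/(π x)$ and approaches 1 only slowly, consistent with Equation A.13.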

Next, we compute the integrals in Equation A.5, asymptotically close to the absorbing boundaries (see Equation A.14 for the final results). To evaluate Equation A.5 for $x ≪ 1$ , we first consider the integral,

I_{ϵ} = \int_{0}^{1} d η exp f (η) .

(A.6)

When $f (η)$ has a sharp peak at $η = η^{*}$ , we approximate this integral as

I_{ϵ} \approx e^{f (η^{*})} \sqrt{\frac{2 π}{| f ″ (η^{*}) |}} .

(A.7)

In our case,

f (η) = - log η - log (ϵ) η + log sin π η

(A.8)

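where $ϵ = x e^{σ}$ . $f (η)$ takes its maximum value at $η = η^{*} \approx 1 + \frac{1}{\log ϵ}$ .¹ At $η = η^{*}$ , $f (η^{*}) \approx - \log ϵ - 1 + \log \frac{- π}{\log ϵ}$ ,² and $f ″ (η^{*}) \approx 1 - \frac{π^{2}}{{\sin}^{2} (\frac{π}{\log ϵ})} \approx - {\log}^{2} ϵ$ , so that $| f ″ (η^{*}) | \approx {\log}^{2} ϵ$ . The saddle-point evaluation in Equation A.7 captures the leading scaling when $ϵ ≪ 1$ ; because the width of the peak, $\sim 1 / | \log ϵ |$ , is comparable to its distance from the boundary η = 1, the numerical prefactor it yields is only approximate. By using these expressions, $I_{ϵ}$ can be evaluated as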

I_{ϵ} \approx \frac{\sqrt{2 π}}{- log ϵ} e^{- 1} \frac{- π}{ϵ log ϵ} = \frac{\sqrt{2 π} π e^{- 1}}{ϵ {log}^{2} ϵ} .

(A.9)

By setting $ϵ = x e^{σ}$ , we find

f_{SFS} (x \approx 0) \sim μ \frac{\sqrt{2 π} e^{- 1}}{{(x (log x + σ))}^{2}} \sim μ \frac{\sqrt{2 π} e^{- 1}}{{(x log x)}^{2}} \propto μ \frac{1}{{(x log x)}^{2}} .

(A.10)
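As a numerical check of Equations A.6 to A.10 (our own sketch, with an arbitrarily chosen ϵ), one can compare a direct quadrature of $I_{ϵ}$ with the saddle-point result. Expanding the integrand around η = 1, where $\sin π η \approx π (1 - η)$ and $ϵ^{- η} = ϵ^{- 1} e^{- | \log ϵ | (1 - η)}$ , suggests $I_{ϵ} \approx π / (ϵ \log^{2} ϵ)$ at leading order, so the saddle-point prefactor $\sqrt{2 π} e^{- 1} \approx 0.92$ differs from the exact one by a constant factor. This does not affect the scaling in Equation A.10; note also that the approach to the leading-order form is slow, with relative corrections of order $2 / | \log ϵ |$ .

```python
import math


def i_eps_numeric(eps, n=200_000):
    """Midpoint-rule value of Equation A.6 with f(eta) from Equation A.8."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        eta = (k + 0.5) * h
        total += math.sin(math.pi * eta) / eta * eps ** (-eta)
    return total * h


def i_eps_saddle(eps):
    """Saddle-point approximation, Equation A.9."""
    log_eps = math.log(eps)
    return math.sqrt(2 * math.pi) * math.pi * math.exp(-1) / (eps * log_eps**2)


eps = 1e-12
leading = math.pi / (eps * math.log(eps) ** 2)   # leading-order form pi/(eps log^2 eps)
print(i_eps_numeric(eps) / leading, i_eps_saddle(eps) / leading)
```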

Next, to evaluate Equation A.5 for the high-frequency end, we consider the following integral

I'_{ϵ} = \int_{0}^{1} \frac{d η}{η} sin (π η) ϵ^{η} .

(A.11)

When $ϵ ≪ 1$ , the integrand takes the maximum value at the boundary η = 0. Thus,

I'_{ϵ} \approx \int_{0}^{1} d η π ϵ^{η} = \frac{π (- 1 + ϵ)}{log ϵ} \approx \frac{- π}{log ϵ} .

(A.12)

By setting $ϵ = (1 - x) e^{- σ}$ , we find

f_{SFS} (x \approx 1) \sim - μ \frac{e^{σ}}{(1 - x) (log (1 - x) - σ)} \sim - μ \frac{e^{σ}}{(1 - x) log (1 - x)} .

(A.13)

In summary, the SFS in Equation A.5 is given by

f_{SFS} (x) \sim \begin{cases} μ \frac{1}{{(x \log x)}^{2}} & (x ≪ \min (1, e^{- σ})) \\ - μ \frac{e^{σ}}{(1 - x) \log (1 - x)} & (1 - x ≪ \min (1, e^{σ})) . \end{cases}

(A.14)

Note that the dependence on σ disappears when $x ≪ 1$ . Figure A1 shows the plots of the SFS.

For comparison, we write the SFS for the Wright–Fisher model ( $α \geq 2$ ) (see, e.g., Crow and Kimura 1970; Evans et al. 2007);

f_{SFS}^{WF} (x) = θ \frac{e^{2 σ} (1 - e^{- 2 σ (1 - x)})}{(e^{2 σ} - 1) x (1 - x)} .

(A.15)

The asymptotic forms near the boundaries are given by

f_{SFS}^{WF} (x) \approx \begin{cases} θ \frac{1}{x} & (x ≪ 1) \\ θ σ (1 + \coth σ) [1 + (σ - 1) (x - 1)] & (1 - x ≪ 1, \ | σ | (1 - x) ≪ 1) , \end{cases}

where we have expanded the SFS around x = 1 up to subleading order. For sufficiently strong selection ( $σ > 1$ ), the SFS increases with x at the high-frequency end. However, unlike the case $α < 2$ , the increase is weak, and the SFS approaches the constant $θ σ (1 + \coth σ)$ as $x \to 1$ .
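The two asymptotic forms of Equation A.15 can be verified directly. The Python sketch below (our own illustration; the function names are hypothetical) uses `math.expm1` to avoid cancellation for small arguments and assumes σ ≠ 0, where Equation A.15 is well defined as written.

```python
import math


def sfs_wf(x, theta, sigma):
    """Wright–Fisher SFS with selection, Equation A.15 (requires sigma != 0)."""
    num = math.exp(2 * sigma) * (-math.expm1(-2 * sigma * (1 - x)))
    den = math.expm1(2 * sigma) * x * (1 - x)
    return theta * num / den


def sfs_wf_high_end(theta, sigma):
    """Limit x -> 1 of the asymptotic form: theta * sigma * (1 + coth sigma)."""
    return theta * sigma * (1 + 1 / math.tanh(sigma))
```

For σ = 2, the SFS at x = 0.99 lies slightly below its value at x = 0.999, which in turn lies below the limiting constant, illustrating the weak increase at the high-frequency end for σ > 1.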

Dynamics of the median of allele frequencies

When α = 1, we can derive a simple differential equation that describes the median of trajectories. In logit space, the transition density is given by

G (ψ, ρ | ψ_{0}) = \frac{\sin π ρ}{2 π (\cos π ρ + \cosh [ρ (ψ + σ) - (ψ_{0} + σ)])}

(A.16)

where $ρ = e^{- τ}$ . The median $Ψ^{med}$ (at a given time point ρ) is characterized by

\int_{- \infty}^{Ψ^{med}} G (ψ, ρ | ψ_{0}) d ψ = \frac{1}{2} .

(A.17)

From the symmetry of cosh, the median is given by the peak of the transition density;

Ψ^{med} = - σ + \frac{1}{ρ} (ψ_{0} + σ) .

(A.18)

By differentiating Equation A.18 with respect to ρ and eliminating ψ₀, we obtain

\frac{d}{d ρ} Ψ^{med} = - \frac{1}{ρ^{2}} (ψ_{0} + σ) = - \frac{1}{ρ} (Ψ^{med} + σ) .

(A.19)

Noting that $\frac{d}{d τ} = - ρ \frac{d}{d ρ}$ , we find

\frac{d}{d τ} Ψ^{med} = Ψ^{med} + σ .

(A.20)

Since the median is invariant under monotonic coordinate transformations, the median $X^{med}$ in x space is related to $Ψ^{med}$ by the logit transformation, $\log \frac{X^{med}}{1 - X^{med}} = Ψ^{med}$ . Differentiating this with respect to time and using Equation A.20, we obtain

\frac{d}{d τ} X^{med} = X^{med} (1 - X^{med}) (log \frac{X^{med}}{1 - X^{med}} + σ) .

(A.21)
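Equation A.21 can be integrated numerically and compared with the closed-form solution implied by Equation A.18, $X^{med}(τ) = 1 / (1 + e^{- Ψ^{med}})$ with $Ψ^{med} = - σ + e^{τ} (ψ_{0} + σ)$ . The Python sketch below (classical fourth-order Runge–Kutta, our choice; function names are hypothetical) does so. Because Equation A.20 has an unstable fixed point at $Ψ = - σ$ , the median moves toward 0 whenever $\log [x_{0} / (1 - x_{0})] + σ < 0$ , even for σ > 0; the check below uses such a starting point.

```python
import math


def median_closed_form(tau, x0, sigma):
    """X_med(tau) from Equation A.18 with rho = exp(-tau), mapped back from logit space."""
    psi0 = math.log(x0 / (1 - x0))
    psi = -sigma + math.exp(tau) * (psi0 + sigma)
    return 1 / (1 + math.exp(-psi))


def median_rk4(tau_end, x0, sigma, steps=10_000):
    """Integrate Equation A.21 with a classical fourth-order Runge–Kutta scheme."""
    def rhs(x):
        return x * (1 - x) * (math.log(x / (1 - x)) + sigma)

    x, h = x0, tau_end / steps
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h * k2)
        k4 = rhs(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x
```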

Allele frequency dynamics conditioned on fixation

By using Bayes’ theorem, the probability distribution of the allele frequency conditioned on fixation can be written as

P (x, τ | x_{0}, fixation) = P (x, τ, fixation | x_{0}) \times \frac{1}{P (fixation | x_{0})}

(A.22)

= P (x, τ | x_{0}) \times \frac{P (fixation | x)}{P (fixation | x_{0})} .

(A.23)

The fixation probability for the initial frequency x₀ is given by (see Hallatschek 2018)

P (fixation | x_{0}) = \frac{x_{0} e^{σ}}{1 + x_{0} (e^{σ} - 1)} .

(A.24)

In particular, the fixation probability of a single mutant is given by

P (fixation | x_{0} = \frac{1}{N}) \sim \frac{1}{N^{1 - s}} .

(A.25)

By using Equation A.24, the conditioned probability in Equation A.23 is computed as

\begin{aligned} P (x, τ | x_{0}, \text{fixation}) &= \frac{1}{2 π x (1 - x)} \times \frac{x e^{σ}}{1 + x (e^{σ} - 1)} \times \frac{1 + x_{0} (e^{σ} - 1)}{x_{0} e^{σ}} \times \frac{\sin π ρ}{\cos π ρ + \cosh [ρ \log \frac{e^{σ} x}{1 - x} - \log \frac{e^{σ} x_{0}}{1 - x_{0}}]} \\ &= \frac{1}{2 π x_{0} (1 - x)} \frac{1 + x_{0} (e^{σ} - 1)}{1 + x (e^{σ} - 1)} \frac{\sin π ρ}{\cos π ρ + \cosh [ρ \log \frac{e^{σ} x}{1 - x} - \log \frac{e^{σ} x_{0}}{1 - x_{0}}]} , \qquad ρ = e^{- τ} . \end{aligned}

(A.26)
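Because $P (\text{fixation} | X_{τ})$ is a martingale, the conditioned density in Equation A.26 should integrate to one at every time. A quick numerical check is sketched below (our own illustration; the integration is carried out in logit space, ψ = log[x/(1 − x)], to tame the boundary singularities, and the parameter values and truncation of the ψ range are arbitrary choices).

```python
import math


def conditioned_density_logit(psi, rho, x0, sigma):
    """Equation A.26 multiplied by dx/dpsi = x(1 - x): the density of the
    logit psi = log[x/(1 - x)], conditioned on fixation, at rho = exp(-tau)."""
    x = 1 / (1 + math.exp(-psi))
    psi0 = math.log(x0 / (1 - x0))
    arg = rho * (psi + sigma) - (psi0 + sigma)
    kernel = math.sin(math.pi * rho) / (math.cos(math.pi * rho) + math.cosh(arg))
    weight = (1 + x0 * math.expm1(sigma)) / (1 + x * math.expm1(sigma))
    return x / (2 * math.pi * x0) * weight * kernel


def total_mass(rho, x0, sigma, half_width=120.0, h=1e-3):
    """Midpoint-rule integral of the conditioned density over psi."""
    n = round(2 * half_width / h)
    return h * sum(conditioned_density_logit(-half_width + (k + 0.5) * h, rho, x0, sigma)
                   for k in range(n))


print(total_mass(0.5, 0.1, 1.0))   # expected to be very close to 1
```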

Appendix B: Stationary distributions of traveling wave model in the presence of natural selection

In Figure 14 of the main text, the mutant allele is assumed to be neutral. Here, we provide the results for the case in which mutants have a fitness advantage σ (Figure B1). As in the main text, symmetric reversible mutations are assumed.

Figure B1 — Stationary distributions of the mutant frequency for $θ = 0.1, 1, 5$ and $σ = 0, 1, 5$ . Here σ is the selection coefficient in the time-continuous description, $σ = s T_{c}$ .

Appendix C: Generalized central limit theorem

Here, we briefly summarize the generalized central limit theorem (Gnedenko and Kolmogorov 1968; Uchaikin and Zolotarev 1999). Suppose that each random number u_i is sampled from the Pareto distribution $P (u) = \frac{α}{u^{α + 1}} (u \geq 1)$ and consider the shifted and rescaled random variable ζ;

ζ = \frac{\sum_{i = 1}^{n} u_{i} - a_{n}}{b_{n}},

(C.1)

where a_n and b_n are

\begin{array}{l} a_{n} = 0, b_{n} = {(\frac{π n}{2 Γ (α) sin \frac{π α}{2}})}^{1 / α} for 0 < α < 1, \\ a_{n} = n log n, b_{n} = \frac{π}{2} n for α = 1, \\ a_{n} = \frac{α}{α - 1} n, b_{n} = {(\frac{π n}{2 Γ (α) sin \frac{π α}{2}})}^{1 / α} for 1 < α < 2, \\ a_{n} = \frac{α}{α - 1} n = 2 n, b_{n} = {(n log n)}^{1 / 2} for α = 2. \end{array}

(C.2)

It is well known that, as n → ∞, the distribution of ζ converges to an α-stable distribution, which we denote as $P_{α} (ζ)$ . While an explicit expression for $P_{α} (ζ)$ is not available in general, its characteristic function is given by

\langle e^{i s ζ} \rangle = \int d ζ \, e^{i s ζ} P_{α} (ζ) \sim \begin{cases} \exp [- | s | (1 + i \, \mathrm{sgn} (s) \frac{2}{π} \log | s |)] & (α = 1) \\ \exp [- | s |^{α} (1 - i \, \mathrm{sgn} (s) \tan \frac{π α}{2})] & (α \neq 1) \end{cases} \quad \text{for } n \to \infty .

(C.3)

Appendix D: The transition density of an allele frequency $w_{N} (y | x)$ and the asymptotic dynamics for large N

Allele-frequency change in a generation is characterized by the transition density $w_{N} (y | x)$ , which is the probability distribution of the allele frequency y at the next generation given the current allele frequency x. When N is large, the asymptotic dynamics can be described by a time-continuous differential Chapman–Kolmogorov equation, which is defined by an advection velocity V(x), diffusion coefficient D(x), and jump kernel $w (y | x)$ (Gardiner 2009). The triplet is obtained from the transition density $w_{N} (y | x)$ as follows:

\begin{array}{l} w (y | x) = \lim_{N \to \infty} \frac{w_{N} (y | x)}{δ t_{N}} \\ V (x) = \lim_{N \to \infty} \frac{1}{δ t_{N}} \int_{| y - x | < ϵ} (y - x) w_{N} (y | x) d y \\ D (x) = \lim_{N \to \infty} \frac{1}{δ t_{N}} \int_{| y - x | < ϵ} {(y - x)}^{2} w_{N} (y | x) d y, \end{array}

(D.1)

where $δ t_{N}$ is an N-dependent timescale corresponding to one generation measured in units of the coalescent timescale. In the following, we derive the transition density $w_{N} (y | x)$ and the asymptotic dynamics for general α, using a computational technique similar to that of Hallatschek (2018), where the case α = 1 is studied extensively.

As mentioned in the main text, when $α \leq 2$ , the binomial sampling error is negligible for large N compared to the stochasticity coming from broad offspring number fluctuations, and we can replace the binomial distribution in Equation 13 of the main text with the Dirac delta function;

w_{N} (y | x) = {〈 δ (y - \frac{M}{M + W}) 〉}_{M, W} = {〈 \int_{- \infty}^{+ \infty} \frac{d σ}{2 π} e^{i (y - \frac{M}{M + W}) σ} 〉}_{M, W} .

(D.2)

Here ${〈 \cdot 〉}_{M, W}$ means the average over $M = \sum_{i = 1}^{x N} u_{i}$ and $W = \sum_{i = 1}^{(1 - x) N} v_{i}$ . Using the variable $s = \frac{σ}{M + W}$ , we can rewrite w_N as

\begin{aligned} w_{N} (y | x) &= {\langle \int_{- \infty}^{+ \infty} \frac{d s}{2 π} (M + W) e^{- i s (M - y (M + W))} \rangle}_{M, W} \\ &= \partial_{y} \int \frac{d s}{2 π i s} {\langle e^{- i s M (1 - y) + i s W y} \rangle}_{M, W} \\ &= \partial_{y} W_{N} (y | x), \end{aligned}

(D.3)

Here,

W_{N} (y | x) = \int_{- \infty}^{+ \infty} \frac{d s}{2 π i s} Φ (- s (1 - y); x N) Φ (s y; (1 - x) N)

(D.4)

with

Φ (s; n) = 〈 e^{i s \sum_{i = 1}^{n} u_{i}} 〉 .

(D.5)
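The generalized Wright–Fisher sampling underlying Equation D.2 is straightforward to simulate. The sketch below is our own illustration: it uses Python's `random.paretovariate`, whose samples satisfy P(u > z) = z^{−α} for z ≥ 1 and thus match the Pareto law of Appendix C, draws M and W, and returns y = M/(M + W), with binomial sampling omitted as in Equation D.2. The function name is hypothetical. By exchangeability of the weights, the expected value of y equals the initial frequency, which provides a simple sanity check.

```python
import random


def wf_pareto_step(x, n, alpha, rng):
    """One generation of generalized Wright–Fisher sampling (sketch of Eq. D.2).

    The x*n mutant and (1-x)*n wild-type individuals draw independent Pareto
    weights u_i, v_i with P(u > z) = z**(-alpha) for z >= 1; the new mutant
    frequency is y = M / (M + W). Binomial resampling is omitted, as in D.2.
    """
    k = round(x * n)                                       # number of mutants
    m = sum(rng.paretovariate(alpha) for _ in range(k))        # M
    w = sum(rng.paretovariate(alpha) for _ in range(n - k))    # W
    return m / (m + w)


rng = random.Random(1)
ys = [wf_pareto_step(0.3, 200, 1.5, rng) for _ in range(2000)]
print(sum(ys) / len(ys))   # sample mean, close to 0.3
```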

To use the properties of the α-stable distributions in Appendix C, we further rewrite $W_{N} (y | x)$ as follows:

\begin{aligned} W_{N} (y | x) &= \int_{- \infty}^{+ \infty} \frac{d s}{2 π i s} Φ (- s (1 - y); x N) Φ (s y; (1 - x) N) \\ &= \int_{- \infty}^{+ \infty} \frac{d s}{2 π i s} \langle e^{- i s (1 - y) \sum_{i = 1}^{x N} u_{i}} \rangle \langle e^{i s y \sum_{i = 1}^{(1 - x) N} v_{i}} \rangle \\ &= \int_{- \infty}^{+ \infty} \frac{d s}{2 π i s} {\langle e^{- i s (1 - y) (b_{x N} ζ + a_{x N})} \rangle}_{ζ} {\langle e^{i s y (b_{(1 - x) N} ζ' + a_{(1 - x) N})} \rangle}_{ζ'} \\ &= \int_{- \infty}^{+ \infty} \frac{d s}{2 π i s} e^{- i s (1 - y) a_{x N} + i s y a_{(1 - x) N}} {\langle e^{- i s (1 - y) b_{x N} ζ} \rangle}_{ζ} {\langle e^{i s y b_{(1 - x) N} ζ'} \rangle}_{ζ'} . \end{aligned}

(D.6)

When N is large, the quantities in the two brackets in the last line can be approximated by the characteristic functions of α-stable distribution, Equation C.3, with $s \to - s (1 - y) b_{x N}$ and $s \to s y b_{(1 - x) N}$ , respectively. Thus, when $α \neq 1$ , Equation D.6 can be computed as

\begin{aligned} W_{N} (y | x) &= \int_{- \infty}^{+ \infty} \frac{d s}{2 π i s} e^{- i s (1 - y) a_{x N} + i s y a_{(1 - x) N}} \\ &\quad \times e^{- | s |^{α} {(1 - y)}^{α} b_{x N}^{α} (1 + i \, \mathrm{sgn} (s) \tan \frac{π α}{2})} e^{- | s |^{α} y^{α} b_{(1 - x) N}^{α} (1 - i \, \mathrm{sgn} (s) \tan \frac{π α}{2})} \\ &= \int_{- \infty}^{+ \infty} \frac{d s}{2 π i s} e^{- | s |^{α} ({(1 - y)}^{α} b_{x N}^{α} + y^{α} b_{(1 - x) N}^{α})} \\ &\quad \times e^{- i s (1 - y) a_{x N} + i s y a_{(1 - x) N}} e^{- i | s |^{α} \mathrm{sgn} (s) \tan \frac{π α}{2} ({(1 - y)}^{α} b_{x N}^{α} - y^{α} b_{(1 - x) N}^{α})} \\ &= \int_{0}^{\infty} \frac{d s}{π s} e^{- s^{α} ({(1 - y)}^{α} b_{x N}^{α} + y^{α} b_{(1 - x) N}^{α})} \\ &\quad \times \sin [- s ((1 - y) a_{x N} - y a_{(1 - x) N}) - s^{α} \tan \frac{π α}{2} ({(1 - y)}^{α} b_{x N}^{α} - y^{α} b_{(1 - x) N}^{α})] . \end{aligned}

(D.7)

In the following, we evaluate the integral expression of $W_{N} (y | x)$ and compute the transition density $w_{N} (y | x)$ from Equation D.3.

When $α < 1$

By using Equation C.2,

a_{n} = 0, \quad b_{n}^{α} = \frac{π}{2 Γ (α) \sin \frac{π α}{2}} n \equiv c_{α} n,

(D.8)

we have

\begin{matrix} W_{N} (y | x) = \int_{0}^{\infty} \frac{d s}{π s} e^{- s^{α} c_{α} N ({(1 - y)}^{α} x + y^{α} (1 - x))} \\ \times sin [- s^{α} c_{α} N tan \frac{π α}{2} ({(1 - y)}^{α} x - y^{α} (1 - x))] . \end{matrix}

(D.9)

By setting $σ = N c_{α} s^{α}$ , $W_{N} (y | x)$ becomes

W_{N} (y | x) = \frac{1}{α} \int_{0}^{\infty} \frac{d σ}{π σ} e^{- σ ({(1 - y)}^{α} x + y^{α} (1 - x))} sin [- σ tan \frac{π α}{2} ({(1 - y)}^{α} x - y^{α} (1 - x))]

= - \frac{{tan}^{- 1} (\frac{tan (\frac{π α}{2}) (x {(1 - y)}^{α} - (1 - x) y^{α})}{(1 - x) y^{α} + x {(1 - y)}^{α}})}{π α} .

(D.10)

By differentiating it with respect to y, we obtain

w_{N} (y | x) = \frac{x (1 - x) sin (π α) {((1 - y) y)}^{α - 1}}{π (x^{2} {(1 - y)}^{2 α} + {(1 - x)}^{2} y^{2 α} + 2 x (1 - x) cos (π α) {((1 - y) y)}^{α})} .

(D.11)

Note that this does not depend on N, which is consistent with the fact that the coalescent time is $O (N^{0})$ when $α < 1$ .
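Equation D.11 can be checked against Equation D.10 numerically. In the Python sketch below (our own check; function names are hypothetical), a central difference of $W_{N}$ reproduces $w_{N}$, and $W_{N} (1 | x) - W_{N} (0 | x) = \frac{2}{π α} \arctan (\tan \frac{π α}{2}) = 1$ for α < 1, so the density in Equation D.11 is normalized.

```python
import math


def big_w(y, x, alpha):
    """W_N(y|x) for alpha < 1, Equation D.10 (independent of N)."""
    t = math.tan(math.pi * alpha / 2)
    p, q = x * (1 - y) ** alpha, (1 - x) * y ** alpha
    return -math.atan(t * (p - q) / (p + q)) / (math.pi * alpha)


def small_w(y, x, alpha):
    """Transition density w_N(y|x) for alpha < 1, Equation D.11."""
    num = x * (1 - x) * math.sin(math.pi * alpha) * ((1 - y) * y) ** (alpha - 1)
    den = math.pi * (x**2 * (1 - y) ** (2 * alpha) + (1 - x) ** 2 * y ** (2 * alpha)
                     + 2 * x * (1 - x) * math.cos(math.pi * alpha) * ((1 - y) * y) ** alpha)
    return num / den
```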

When $1 < α < 2$

By using Equation C.2,

a_{n} = \frac{α}{α - 1} n

b_{n}^{α} = B_{α} n, where B_{α} \equiv \frac{π}{2 Γ (α) sin \frac{π α}{2}},

(D.12)

Equation D.7 becomes

\begin{aligned} W_{N} (y | x) &= \int_{0}^{\infty} \frac{d s}{π s} e^{- B_{α} N s^{α} ({(1 - y)}^{α} x + y^{α} (1 - x))} \\ &\quad \times \sin [- s \frac{α}{α - 1} N (x - y) - s^{α} \tan \frac{π α}{2} B_{α} N ({(1 - y)}^{α} x - y^{α} (1 - x))] . \end{aligned}

(D.13)

By changing the variable of integration as $σ = N^{1 / α} s$ , we have

\begin{aligned} W_{N} (y | x) &= \int_{0}^{\infty} \frac{d σ}{π σ} e^{- B_{α} σ^{α} ({(1 - y)}^{α} x + y^{α} (1 - x))} \\ &\quad \times \sin [- σ \frac{α}{α - 1} N^{1 - \frac{1}{α}} (x - y) - σ^{α} \tan \frac{π α}{2} B_{α} ({(1 - y)}^{α} x - y^{α} (1 - x))] . \end{aligned}

(D.14)

By changing the variable of integration as $σ' = \frac{α}{α - 1} | x - y | σ$ and redefining $σ'$ as σ, we have

W_{N} (y | x) = \int_{0}^{\infty} \frac{d σ}{π σ} e^{- μ_{1} σ^{α}} sin (- sgn (x - y) N^{1 - \frac{1}{α}} σ - μ_{2} σ^{α}),

(D.15)

where

\begin{array}{l} μ_{1} = B_{α} {(\frac{α - 1}{α})}^{α} \frac{{(1 - y)}^{α} x + y^{α} (1 - x)}{| x - y |^{α}} \\ μ_{2} = tan \frac{π α}{2} B_{α} {(\frac{α - 1}{α})}^{α} \frac{{(1 - y)}^{α} x - y^{α} (1 - x)}{| x - y |^{α}} . \end{array}

(D.16)

The transition probability $w_{N} (y | x)$ is given by

\begin{matrix} w_{N} (y | x) = \partial_{y} W_{N} (y | x) \\ = sgn (x - y) \frac{\partial_{y} μ_{1}}{π} \int_{0}^{\infty} d σ σ^{α - 1} e^{- μ_{1} σ^{α}} sin (N^{1 - \frac{1}{α}} σ + sgn (x - y) μ_{2} σ^{α}) \\ - \frac{\partial_{y} μ_{2}}{π} \int_{0}^{\infty} d σ σ^{α - 1} e^{- μ_{1} σ^{α}} cos (N^{1 - \frac{1}{α}} σ + sgn (x - y) μ_{2} σ^{α}) . \end{matrix}

(D.17)

Consider the integral

J_{α} = \int_{0}^{\infty} d σ σ^{α - 1} e^{- μ σ^{α}} e^{i N^{1 - \frac{1}{α}} σ}

(D.18)

where $μ = μ_{1} - i sgn (x - y) μ_{2}$ . Then, the transition probability can be written as

w_{N} (y | x) = sgn (x - y) \frac{\partial_{y} μ_{1}}{π} Im J_{α} - \frac{\partial_{y} μ_{2}}{π} Re J_{α} .

(D.19)

From Watson’s lemma, the integral $J_{α}$ can be expressed as an asymptotic series in $N^{- (α - 1)}$ :

J_{α} = \sum_{m = 1}^{\infty} \frac{1}{N^{m (α - 1)}} e^{i \frac{π}{2} m α} {(- μ)}^{m - 1} \frac{Γ (m α)}{Γ (m)} .

(D.20)

By substituting Equation D.20 into Equation D.19 and writing $μ = | μ | e^{i θ}$ , we obtain

\begin{matrix} w_{N} (y | x) = \sum_{m = 1}^{\infty} \frac{{(- | μ |)}^{m - 1}}{N^{m (α - 1)}} \frac{Γ (m α)}{π Γ (m)} \\ [sgn (x - y) \partial_{y} μ_{1} sin (\frac{π}{2} m α + (m - 1) θ) - \partial_{y} μ_{2} cos (\frac{π}{2} m α + (m - 1) θ)] . \end{matrix}

(D.21)

The leading order (m = 1) is given by

\begin{aligned} w_{N} (y | x) &= \frac{Γ (α)}{N^{α - 1}} \left(\mathrm{sgn} (x - y) \frac{\partial_{y} μ_{1}}{π} \sin \frac{π α}{2} - \frac{\partial_{y} μ_{2}}{π} \cos \frac{π α}{2}\right) \\ &= \begin{cases} N^{1 - α} α {(\frac{α - 1}{α})}^{α} x (1 - x) \frac{{(1 - y)}^{α - 1}}{{(y - x)}^{α + 1}} & \text{when } x < y \\ N^{1 - α} α {(\frac{α - 1}{α})}^{α} x (1 - x) \frac{y^{α - 1}}{{(x - y)}^{α + 1}} & \text{when } x > y . \end{cases} \end{aligned}

(D.22)

Equation 21 in the main text can be obtained by introducing the continuous time $τ \equiv t / (C_{α} N^{α - 1})$ where $C_{α} \equiv α {(\frac{α - 1}{α})}^{α}$ . Equation 22 follows from the neutrality $\frac{d}{d t} 〈 x 〉 = 0$ . Note that the expansion of Equation D.20 is possible only when $| x - y |$ is finite, i.e., when $| x - y | > ϵ$ where ϵ is an N-independent positive constant. Although $w_{N} (y | x)$ in D.22 diverges as $| x - y | \to 0$ , this divergence is not a problem, because the jump term of the asymptotic dynamics in Equation 20 can be obtained from $w_{N} (y | x)$ for $| x - y | > ϵ$ (see Gardiner 2009).
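The leading-order kernel of Equation D.22 has the same functional form as the Beta-coalescent jump kernel of Equation E.14 in Appendix E. The short Python sketch below (our own check; function names are hypothetical) confirms that the ratio of the two kernels does not depend on x or y and equals $N^{1 - α} α {(\frac{α - 1}{α})}^{α} B (α, 2 - α)$ .

```python
import math


def w_leading(y, x, alpha, n):
    """Leading-order jump kernel of Equation D.22 (1 < alpha < 2, |x - y| finite)."""
    c = n ** (1 - alpha) * alpha * ((alpha - 1) / alpha) ** alpha * x * (1 - x)
    if y > x:
        return c * (1 - y) ** (alpha - 1) / (y - x) ** (alpha + 1)
    return c * y ** (alpha - 1) / (x - y) ** (alpha + 1)


def w_beta(y, x, alpha):
    """Beta-coalescent jump kernel of Equation E.14."""
    beta_fn = math.gamma(2 - alpha) * math.gamma(alpha)   # B(alpha, 2 - alpha); Gamma(2) = 1
    if y > x:
        return x * (1 - x) / beta_fn * (y - x) ** (-1 - alpha) * (1 - y) ** (alpha - 1)
    return x * (1 - x) / beta_fn * (x - y) ** (-1 - alpha) * y ** (alpha - 1)
```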

When α = 2

a_n and b_n are given by

a_{n} = \frac{α}{α - 1} n = 2 n, b_{n} = {(n log n)}^{1 / 2} .

(D.23)

Equation D.7 then becomes

\begin{aligned} W_{N} (y | x) &= \int_{0}^{\infty} \frac{d s}{π s} e^{- s^{2} ({(1 - y)}^{2} x N \log x N + y^{2} (1 - x) N \log (1 - x) N)} \sin (- 2 s N (x - y)) \\ &= \int_{0}^{\infty} \frac{d s}{π s} e^{- s^{2} [(- 2 x y + x + y^{2}) N \log N + ({(1 - y)}^{2} x \log x + y^{2} (1 - x) \log (1 - x)) N]} \sin (- 2 s N (x - y)) . \end{aligned}

By changing the variable of integration as $σ = {(N log N)}^{\frac{1}{2}} s$ ,

\begin{aligned} W_{N} (y | x) &= \int_{0}^{\infty} \frac{d σ}{π σ} e^{- σ^{2} [(- 2 x y + x + y^{2}) + ({(1 - y)}^{2} x \log x + y^{2} (1 - x) \log (1 - x)) {(\log N)}^{- 1}]} \sin \left(- 2 {\left(\frac{N}{\log N}\right)}^{\frac{1}{2}} σ (x - y)\right) \\ &\approx \int_{0}^{\infty} \frac{d σ}{π σ} e^{- σ^{2} (- 2 x y + x + y^{2})} \sin \left(- 2 {\left(\frac{N}{\log N}\right)}^{\frac{1}{2}} σ (x - y)\right) \\ &= - \frac{1}{2} \mathrm{erf} \left((x - y) \sqrt{\frac{N}{\log (N) (- 2 x y + x + y^{2})}}\right), \end{aligned}

(D.24)

where $erf (x)$ is the Gauss error function

erf (x) = \frac{1}{\sqrt{π}} \int_{- x}^{x} e^{- t^{2}} d t .

(D.25)

By differentiating $W_{N} (y | x)$ with respect to y, we have

w_{N} (y | x) = {(\frac{N}{log N})}^{\frac{1}{2}} \frac{1}{\sqrt{π}} \frac{(1 - x) x}{{(- 2 x y + x + y^{2})}^{3 / 2}} e^{- \frac{N {(x - y)}^{2}}{log N (- 2 x y + x + y^{2})}} .

(D.26)

Suppose that ϵ is a sufficiently small but finite constant. For $| x - y | < ϵ$ , w_N can be approximated as

\begin{matrix} w_{N} (y | x) = {(\frac{N}{log N})}^{\frac{1}{2}} \frac{1}{\sqrt{π}} \frac{1}{{(x (1 - x))}^{1 / 2}} e^{- \frac{N {(x - y)}^{2}}{log N (x (1 - x))}} \\ = \frac{1}{\sqrt{2 π σ^{2}}} e^{- \frac{{(x - y)}^{2}}{2 σ^{2}}} . \end{matrix}

(D.27)

where $2 σ^{2} = \frac{log N}{N} x (1 - x)$ . From the symmetry $y - x \to - (y - x)$ of $w_{N} (y | x)$ , the advection term is zero. The diffusivity D is given by

D = \frac{1}{δ t_{N}} \int_{| x - y | < ϵ} d y {(x - y)}^{2} w_{N} (y | x) = \frac{1}{δ t_{N}} σ^{2} = \frac{1}{δ t_{N}} \frac{log N}{N} \frac{1}{2} x (1 - x) = \frac{1}{2} x (1 - x),

(D.28)

where we have introduced the natural timescale as $δ t_{N} = \frac{log N}{N}$ and used the integral approximation

\int_{- ϵ}^{ϵ} d Δ Δ^{2} \frac{1}{\sqrt{2 π σ^{2}}} exp (- \frac{Δ^{2}}{2 σ^{2}}) = \frac{1}{\sqrt{2 π}} σ (\sqrt{2 π} σ erf (\frac{ϵ}{\sqrt{2} σ}) - 2 ϵ e^{- \frac{ϵ^{2}}{2 σ^{2}}}) \approx σ^{2} .

(D.29)
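The closed form in Equation D.29 and its approximation by σ² can be checked against a direct quadrature. In the Python sketch below (our own check), `s` plays the role of σ in Equations D.27 to D.29 (not the selection coefficient), and the cutoff ϵ is chosen a few standard deviations wide; function names are hypothetical.

```python
import math


def truncated_second_moment(eps, s):
    """Closed form of Equation D.29: second moment of N(0, s^2) restricted to |D| < eps."""
    return (s / math.sqrt(2 * math.pi)) * (
        math.sqrt(2 * math.pi) * s * math.erf(eps / (math.sqrt(2) * s))
        - 2 * eps * math.exp(-eps**2 / (2 * s**2)))


def truncated_second_moment_numeric(eps, s, n=100_000):
    """Midpoint-rule evaluation of the same integral."""
    h = 2 * eps / n
    total = 0.0
    for k in range(n):
        d = -eps + (k + 0.5) * h
        total += d * d * math.exp(-d * d / (2 * s * s))
    return total * h / math.sqrt(2 * math.pi * s * s)
```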

Finally, the jump kernel asymptotically vanishes on the timescale $δ t_{N}$ ,

w (y | x) = \lim_{N \to \infty} \frac{w_{N} (y | x)}{δ t_{N}} = 0,

(D.30)

because, for fixed x and y with $| x - y | > ϵ$ , $w_{N} (y | x)$ becomes exponentially small as N becomes large.

Thus, in the large-N limit, α = 2 corresponds to the Wright–Fisher diffusion with one unit of time equal to $δ t_{N}^{- 1} = N / \log N$ generations, i.e., to a population whose effective size scales as

N_{e} \sim N / \log (N) .

(D.31)

When $α > 2$

In this case, the Pareto distribution $P (u) = \frac{α}{u^{α + 1}} (u \geq 1)$ has a finite mean $a = \frac{α}{α - 1}$ and a finite variance $b^{2} = \frac{α}{{(α - 1)}^{2} (α - 2)}$ , so the large-N limit of the allele-frequency dynamics should be described by the Wright–Fisher diffusion process. To confirm this more generally, we consider a general distribution with finite mean and variance; namely, each individual’s offspring number u_i is sampled from a distribution with mean a and variance b². Then, by the central limit theorem, the shifted and rescaled variable

ζ = \frac{\sum_{i = 1}^{n} u_{i} - a_{n}}{b_{n}}, \quad \text{where } a_{n} = a n, \ b_{n} = \sqrt{n} b,

(D.32)

converges in distribution to the standard normal distribution $N (0, 1)$ . Its characteristic function is $f (s) = \exp (- \frac{1}{2} s^{2})$ . Thus, we have

\begin{aligned} W_{N} (y | x) &\approx \int_{- \infty}^{+ \infty} \frac{d s}{2 π i s} e^{- i s (1 - y) a_{x N} + i s y a_{(1 - x) N}} f (- s (1 - y) b_{x N}) f (s y b_{(1 - x) N}) \\ &= \int_{- \infty}^{+ \infty} \frac{d s}{2 π i s} e^{- i s (1 - y) a x N + i s y a (1 - x) N} f (- s (1 - y) \sqrt{x N} b) f (s y \sqrt{(1 - x) N} b) \\ &= \int_{- \infty}^{+ \infty} \frac{d s}{2 π i s} e^{- i s (1 - y) a x N + i s y a (1 - x) N} e^{- \frac{1}{2} s^{2} {(1 - y)}^{2} x N b^{2}} e^{- \frac{1}{2} s^{2} y^{2} (1 - x) N b^{2}} . \end{aligned}

(D.33)

By setting $σ = N^{1 / 2} s$ ,

\begin{matrix} W_{N} (y | x) = \int_{- \infty}^{+ \infty} \frac{d σ}{2 π i σ} e^{- i a (x - y) N^{1 / 2} σ} e^{- \frac{1}{2} b^{2} σ^{2} ({(1 - y)}^{2} x + y^{2} (1 - x))} \\ = \int_{0}^{+ \infty} \frac{d σ}{π σ} sin (- a (x - y) N^{1 / 2} σ) e^{- \frac{1}{2} b^{2} σ^{2} ({(1 - y)}^{2} x + y^{2} (1 - x))} \\ = - \frac{1}{2} erf (\frac{a \sqrt{N} (x - y)}{\sqrt{2} b \sqrt{(- 2 x y + x + y^{2})}}) . \end{matrix}

(D.34)

Thus, we obtain

w_{N} (y | x) = \partial_{y} W_{N} (y | x) = \sqrt{N} \sqrt{\frac{1}{2 π}} γ x (1 - x) \frac{exp (- \frac{γ^{2} N {(x - y)}^{2}}{2 (- 2 x y + x + y^{2})})}{{(- 2 x y + x + y^{2})}^{3 / 2}} .

(D.35)

where $γ \equiv a / b$ . For the Pareto distribution, $γ = (\frac{α}{α - 1}) / \sqrt{\frac{α}{{(α - 1)}^{2} (α - 2)}} = \sqrt{α (α - 2)}$ .

For $| x - y | > ϵ$ , $w_{N} (y | x)$ becomes exponentially small as N becomes large, so the jump term is absent from the asymptotic dynamics: $w (y | x) = 0$ . For $| x - y | < ϵ$ , we can approximate $w_{N} (y | x)$ as

\begin{matrix} w_{N} (y | x) = \sqrt{\frac{N γ^{2}}{2 π x (1 - x)}} exp (- \frac{γ^{2} N {(x - y)}^{2}}{2 x (1 - x)}) \\ = \frac{1}{\sqrt{2 π Σ^{2}}} e^{- \frac{{(x - y)}^{2}}{2 Σ^{2}}}, \end{matrix}

(D.36)

where $Σ^{2} = \frac{x (1 - x)}{γ^{2} N}$ . From the symmetry $y - x \to - (y - x)$ of $w_{N} (y | x)$ , the advection is zero. Finally, the diffusion is evaluated as

\int_{| x - y | < ϵ} d y {(x - y)}^{2} w_{N} (y | x) = Σ (Σ erf (\frac{ϵ}{\sqrt{2} Σ}) - \sqrt{\frac{2}{π}} ϵ e^{- \frac{ϵ^{2}}{2 Σ^{2}}}) \approx Σ^{2} = \frac{x (1 - x)}{γ^{2} N} .

(D.37)

Thus, by re-scaling time as $τ = \frac{t}{γ^{2} N}$ , we obtain

D = x (1 - x),

(D.38)

which corresponds to the Wright–Fisher diffusion of a population of effective size $N_{e} = N γ^{2} = N α (α - 2)$ . Notice $N_{e} \to 0$ as $α \to 2$ , indicating that the concept of the effective population size breaks down when the variance of the offspring distribution diverges.
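For the Pareto offspring distribution, the ratio γ = a/b and the resulting effective size $N_{e} = N γ^{2}$ can be tabulated with a few lines of Python (an illustrative sketch; the function names are ours). Evaluating it for α close to 2 shows $N_{e}$ dropping toward 0, as stated above.

```python
import math


def pareto_gamma(alpha):
    """gamma = mean/std of the Pareto law P(u) = alpha/u**(alpha+1), u >= 1 (alpha > 2)."""
    mean = alpha / (alpha - 1)
    var = alpha / ((alpha - 1) ** 2 * (alpha - 2))
    return mean / math.sqrt(var)


def effective_size(n, alpha):
    """N_e = N * gamma**2, as obtained from Equation D.38."""
    return n * pareto_gamma(alpha) ** 2


for a in (2.01, 2.5, 3.0, 5.0):
    print(a, effective_size(1000, a))
```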

Appendix E: From Lambda-Fleming-Viot Generator to differential Chapman–Kolmogorov equation

In Appendix D, the jump density $w (y | x)$ is derived from the generalized Wright–Fisher sampling, Equation 13 in the main text. Here, we present another, more formal derivation of the jump density $w (y | x)$ for $1 < α < 2$ . See Hallatschek (2018) for the case α = 1.

Jump density for general Λ measure

The backward generator of the Λ coalescent process for the biallelic model (see, e.g., Etheridge et al. 2010; Griffiths 2014) is given by

L G_{τ} (x | x_{0}) = \int_{0}^{1} (x_{0} G_{τ} (x | x_{0} + (1 - x_{0}) λ) - G_{τ} (x | x_{0}) + (1 - x_{0}) G_{τ} (x | x_{0} - x_{0} λ)) \frac{Λ (d λ)}{λ^{2}} .

(E.1)

This can be rewritten as a sum of two terms:

L G_{τ} (x | x_{0}) = A + B,

(E.2)

where

A = x_{0} \int_{0}^{1} (G_{τ} (x | x_{0} + (1 - x_{0}) λ) - G_{τ} (x | x_{0}) - (1 - x_{0}) λ \partial_{x_{0}} G_{τ} (x | x_{0})) \frac{Λ (d λ)}{λ^{2}},

(E.3)

B = (1 - x_{0}) \int_{0}^{1} (G_{τ} (x | x_{0} - x_{0} λ) - G_{τ} (x | x_{0}) + x_{0} λ \partial_{x_{0}} G_{τ} (x | x_{0})) \frac{Λ (d λ)}{λ^{2}} .

(E.4)

We introduce the integration variable $x' \equiv x_{0} + (1 - x_{0}) λ$ for A and $x' \equiv x_{0} - x_{0} λ$ for B, respectively. By writing

\frac{Λ (d λ)}{λ^{2}} = \frac{l (λ)}{λ^{2}} d λ,

(E.5)

A and B become

A = x_{0} (1 - x_{0}) \int_{x_{0}}^{1} (G_{τ} (x | x') - G_{τ} (x | x_{0}) - (x' - x_{0}) \partial_{x_{0}} G_{τ} (x | x_{0})) \frac{l (\frac{x' - x_{0}}{1 - x_{0}})}{{(x^{'} - x_{0})}^{2}} d x',

(E.6)

B = x_{0} (1 - x_{0}) \int_{0}^{x_{0}} (G_{τ} (x | x') - G_{τ} (x | x_{0}) - (x' - x_{0}) \partial_{x_{0}} G_{τ} (x | x_{0})) \frac{l (\frac{x_{0} - x'}{x_{0}})}{{(x^{'} - x_{0})}^{2}} d x' .

(E.7)

Defining the jump kernel $w (x' | x_{0})$ as

w (x' | x_{0}) = \begin{cases} \frac{x_{0} (1 - x_{0})}{{(x' - x_{0})}^{2}} l (\frac{x' - x_{0}}{1 - x_{0}}) & (x' > x_{0}), \\ \frac{x_{0} (1 - x_{0})}{{(x' - x_{0})}^{2}} l (\frac{x_{0} - x'}{x_{0}}) & (x' < x_{0}), \end{cases}

(E.8)

we can formally rewrite the generator as

L G_{τ} (x | x_{0}) = V (x_{0}) \partial_{x_{0}} G_{τ} (x | x_{0}) + PV \int_{0}^{1} w (x' | x_{0}) [G_{τ} (x | x') - G_{τ} (x | x_{0})] d x',

(E.9)

where

V (x_{0}) = - PV \int_{0}^{1} d x' w (x' | x_{0}) (x' - x_{0}) .

(E.10)

When the measure is the Beta distribution Beta $(2 - α, α)$

We take the Beta $(2 - α, α)$ distribution as the Λ measure, which corresponds to the offspring distribution $\sim 1 / u^{1 + α}$ considered in this study (its normalization constant is $B (2 - α, α) = B (α, 2 - α)$ ):

\frac{Λ (d λ)}{λ^{2}} = \frac{l (λ) d λ}{λ^{2}} = \frac{λ^{1 - α} (1 - λ)^{α - 1}}{B (α, 2 - α)} \frac{d λ}{λ^{2}} = \frac{λ^{- 1 - α} (1 - λ)^{α - 1}}{B (α, 2 - α)} d λ .

(E.11)

With this measure, A and B become

A = \frac{x_{0} (1 - x_{0})}{B (α, 2 - α)} \int_{x_{0}}^{1} d x' [G_{τ} (x | x') - G_{τ} (x | x_{0}) - (x' - x_{0}) \partial_{x_{0}} G_{τ} (x | x_{0})] (x' - x_{0})^{- 1 - α} (1 - x')^{α - 1},

(E.12)

B = \frac{x_{0} (1 - x_{0})}{B (α, 2 - α)} \int_{0}^{x_{0}} d x' [G_{τ} (x | x') - G_{τ} (x | x_{0}) - (x' - x_{0}) \partial_{x_{0}} G_{τ} (x | x_{0})] (x_{0} - x')^{- 1 - α} (x')^{α - 1} .

(E.13)

Note that the integrals A and B converge for $α \in (0, 2)$ : near $x' = x_{0}$ , the terms inside $[\dots]$ are $O ((x' - x_{0})^{2})$ , so the integrands are $O (| x' - x_{0} |^{1 - α})$ , which is integrable for $α < 2$ , while the endpoint factors $(1 - x')^{α - 1}$ and $(x')^{α - 1}$ are integrable for $α > 0$ . The jump kernel is given by

w (x' | x_{0}) = \begin{cases} \frac{x_{0} (1 - x_{0})}{B (α, 2 - α)} (x' - x_{0})^{- 1 - α} (1 - x')^{α - 1}, & x' > x_{0}, \\ \frac{x_{0} (1 - x_{0})}{B (α, 2 - α)} (x_{0} - x')^{- 1 - α} (x')^{α - 1}, & x' < x_{0} . \end{cases}

(E.14)
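As a quick consistency check, the sketch below evaluates the general kernel of Equation E.8 with the density $l (λ)$ of Equation E.11 and compares it with the closed form of Equation E.14 on a grid of $x'$ values. The function names (`beta_fn`, `l_density`, `w_general`, `w_beta`) are hypothetical helpers introduced for illustration, not part of the authors' code.

```python
import math
import numpy as np

def beta_fn(a, b):
    """Euler Beta function B(a, b), computed from log-gamma values."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def l_density(lam, alpha):
    """Density l(λ) of the Λ measure in Equation E.11."""
    return lam ** (1 - alpha) * (1 - lam) ** (alpha - 1) / beta_fn(alpha, 2 - alpha)

def w_general(xp, x0, alpha):
    """Jump kernel from the general formula, Equation E.8 (requires xp != x0)."""
    # Jump fraction λ: relative to 1 - x0 for upward jumps, relative to x0 for downward jumps.
    lam = np.where(xp > x0, (xp - x0) / (1 - x0), (x0 - xp) / x0)
    return x0 * (1 - x0) / (xp - x0) ** 2 * l_density(lam, alpha)

def w_beta(xp, x0, alpha):
    """Closed-form jump kernel for the Beta measure, Equation E.14 (requires xp != x0)."""
    pref = x0 * (1 - x0) / beta_fn(alpha, 2 - alpha)
    dist = np.abs(xp - x0) ** (-1 - alpha)
    return np.where(xp > x0, pref * dist * (1 - xp) ** (alpha - 1),
                    pref * dist * xp ** (alpha - 1))

xs = np.linspace(0.01, 0.99, 99)   # grid chosen so that no point coincides with x0 below
print(np.max(np.abs(w_general(xs, 0.3137, 1.5) / w_beta(xs, 0.3137, 1.5) - 1)))
```

The two expressions agree up to floating-point rounding, which confirms the change of variables leading from Equation E.8 to Equation E.14.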

When $1 < α < 2$ , this density agrees with Equation 21 of the main text (up to a proportionality constant). The advection is given by

\begin{matrix} V (x_{0}) = - PV \int_{0}^{1} d x' w (x' | x_{0}) (x' - x_{0}) \\ = \frac{x_{0} (1 - x_{0})}{B (α, 2 - α)} (\int_{0}^{x_{0} - 0} d x' {(x_{0} - x')}^{- α} x'^{α - 1} - \int_{x_{0} + 0}^{1} d x' {(x' - x_{0})}^{- α} {(1 - x')}^{α - 1}) . \end{matrix}

(E.15)

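Note that, when $α > 1$ , the limit $\lim_{ϵ \to 0} (\int_{0}^{x_{0} - ϵ} + \int_{x_{0} + ϵ}^{1})$ in Equation E.15 does not exist for $x_{0} \neq 1/2$ : the two integrals diverge as $x_{0}^{α - 1} ϵ^{1 - α} / (α - 1)$ and $(1 - x_{0})^{α - 1} ϵ^{1 - α} / (α - 1)$ , respectively, and these divergences cancel only at $x_{0} = 1/2$ . This divergence is, however, rather formal, since a finite-size population provides a natural cutoff $ϵ \sim \frac{1}{N}$ .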
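The cutoff dependence can be checked numerically. The sketch below (hypothetical helper names) rescales both integrals in Equation E.15 onto $I (c) = \int_{c}^{1} v^{- α} (1 - v)^{α - 1} d v$ with $c = ϵ / x_{0}$ and $c = ϵ / (1 - x_{0})$, evaluates $I (c)$ with a trapezoidal rule on a logarithmic grid (grid size chosen arbitrarily), and compares the result with the leading term $ϵ^{1 - α} [x_{0}^{α} (1 - x_{0}) - x_{0} (1 - x_{0})^{α}] / ((α - 1) B (α, 2 - α))$ .

```python
import math
import numpy as np

def beta_fn(a, b):
    """Euler Beta function B(a, b)."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def tail_integral(c, alpha, n=200_001):
    """I(c) = ∫_c^1 v^{-α} (1 - v)^{α - 1} dv via the trapezoidal rule in t = ln v."""
    t = np.linspace(math.log(c), 0.0, n)
    v = np.exp(t)
    f = v ** (1 - alpha) * (1 - v) ** (alpha - 1)   # the extra factor v comes from dv = v dt
    dt = t[1] - t[0]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))

def advection_cutoff(x0, alpha, eps):
    """V(x0) of Equation E.15 with both integrals stopped at |x' - x0| = eps."""
    pref = x0 * (1 - x0) / beta_fn(alpha, 2 - alpha)
    # x' = x0 (1 - v) maps the left integral, and x' = x0 + (1 - x0) v the right one, onto I(c).
    return pref * (tail_integral(eps / x0, alpha) - tail_integral(eps / (1 - x0), alpha))

def advection_leading(x0, alpha, eps):
    """Leading divergent term of V(x0) as eps -> 0."""
    num = x0 ** alpha * (1 - x0) - x0 * (1 - x0) ** alpha
    return eps ** (1 - alpha) * num / ((alpha - 1) * beta_fn(alpha, 2 - alpha))

for eps in (1e-3, 1e-4, 1e-5, 1e-6):
    print(eps, advection_cutoff(0.2, 1.5, eps), advection_leading(0.2, 1.5, eps))
```

Reducing ϵ by a factor of 100 increases $V$ by about $100^{α - 1}$ (a factor of 10 for α = 1.5), while at $x_{0} = 1/2$ the two pieces cancel identically.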

Appendix F: The transition density for the differential Chapman–Kolmogorov equation for $1 < α < 2$

Here we derive the short-time transition density given in Equations 23 and 24 and determine $g (ξ)$ in the scaling ansatz given in Equation 39.

The short-time transition density

Before discussing the CK equation in Equation 18, it is instructive to start from the simple diffusion equation,

\partial_{τ} P (x, τ) = D \partial_{x}^{2} P (x, τ),

(F.1)

with the initial condition $P (x, τ = 0) = δ (x - x_{0})$ . The solution of this initial value problem is given by

P (Δ x, τ) = \frac{1}{\sqrt{2 π (2 D τ)}} exp (- \frac{Δ x^{2}}{2 (2 D τ)}),

(F.2)

which is usually derived by Fourier or Laplace transformation. However, this solution can also be obtained from the central limit theorem: Equation F.1 describes the continuum limit of a random walk in which jumps $X \to X \pm a$ each occur with rate $\frac{m}{2}$ , where a and m are related to D via $D = \frac{a^{2} m}{2}$ . Since $n \approx m τ$ jumps occur in time τ, the displacement is approximately given by $Δ X (τ) \approx \sum_{i = 1}^{n} l_{i}$ where $l_{i} = \pm a$ . Then, by the central limit theorem, $Δ X (τ)$ is normally distributed with mean $n 〈 l_{i} 〉 = 0$ and variance $n 〈 l_{i}^{2} 〉 = (m τ) a^{2} = 2 D τ$ , which is Equation F.2. Note that, even if the diffusion constant depends on x, Equation F.2 (with $D \to D (x_{0})$ ) remains valid at short times.
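The random-walk picture behind Equation F.2 can be illustrated with a short simulation. In the sketch below, the step size a, the rate m, and the time τ are arbitrary illustrative values, and the function name is hypothetical. It draws a Poisson number of jumps with total rate m, gives each jump the sign ±a with probability 1/2, and compares the sample variance of the displacement with 2Dτ.

```python
import numpy as np

def random_walk_displacements(a, m, tau, n_samples, seed=0):
    """Displacements after time tau for jumps ±a, each direction occurring at rate m/2."""
    rng = np.random.default_rng(seed)
    n_jumps = rng.poisson(m * tau, size=n_samples)   # total jump rate is m
    n_up = rng.binomial(n_jumps, 0.5)                 # each jump is +a or -a with probability 1/2
    return a * (2 * n_up - n_jumps)

a, m, tau = 0.01, 1.0e4, 0.5
D = a ** 2 * m / 2                                    # D = a^2 m / 2
dx = random_walk_displacements(a, m, tau, 200_000)
print(dx.mean(), dx.var(), 2 * D * tau)
```

With about $m τ = 5000$ jumps per sample, the displacement distribution is already very close to the Gaussian of Equation F.2.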

Essentially the same argument can be applied to the CK dynamics, except that the generalized central limit theorem should be employed since the variance of jump sizes is divergent in the case of the CK dynamics. Suppose that the initial density is given by $P (x', τ = 0) = δ (x' - x)$ (for notational simplicity, the subscript 0 on x is dropped). In the CK dynamics, the frequency change $Δ X (τ) = X (τ) - x$ is caused by the bias V(x) in Equation 20 and by stochastic jumps. The rate of a frequency-increasing jump and that of a frequency-decreasing jump are given by

W_{+} (x) = \int_{x + ϵ}^{1} w (x' | x) d x' = \frac{x}{α} {(\frac{1 - x}{ϵ})}^{α},

(F.3)

W_{-} (x) = \int_{0}^{x - ϵ} w (x' | x) d x' = \frac{1 - x}{α} {(\frac{x}{ϵ})}^{α},

(F.4)

respectively. Therefore, the expected number n of jump events in time τ is given by

n = (W_{-} + W_{+}) τ .

(F.5)
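The leading-order rates in Equations F.3 and F.4 can be checked by direct quadrature. The sketch below assumes that the kernel of Equation 21 coincides with Equation E.14 without the factor $1 / B (α, 2 - α)$ (this is the normalization that reproduces F.3 and F.4). After the rescalings $x' = x + (1 - x) u$ and $x' = x - x u$ , both rates reduce to $J (c) = \int_{c}^{1} u^{- 1 - α} (1 - u)^{α - 1} d u$ . The helper names and the value of τ are hypothetical.

```python
import math
import numpy as np

def upper_integral(c, alpha, n=200_001):
    """J(c) = ∫_c^1 u^{-1-α} (1 - u)^{α - 1} du via the trapezoidal rule in t = ln u."""
    t = np.linspace(math.log(c), 0.0, n)
    u = np.exp(t)
    f = u ** (-alpha) * (1 - u) ** (alpha - 1)   # u^{-1-α} du = u^{-α} dt
    dt = t[1] - t[0]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))

def jump_rates(x, alpha, eps):
    """W_+ and W_- for the kernel of Equation E.14 without the factor 1/B(α, 2-α)."""
    w_plus = x * upper_integral(eps / (1 - x), alpha)    # x' = x + (1 - x) u
    w_minus = (1 - x) * upper_integral(eps / x, alpha)   # x' = x - x u
    return w_plus, w_minus

x, alpha, eps, tau = 0.3, 1.5, 1e-5, 1e-3
w_plus, w_minus = jump_rates(x, alpha, eps)
print(w_plus, x * (1 - x) ** alpha / (alpha * eps ** alpha))     # Equation F.3
print(w_minus, (1 - x) * x ** alpha / (alpha * eps ** alpha))    # Equation F.4
print("expected number of jumps:", (w_plus + w_minus) * tau)    # Equation F.5
```

The relative corrections to F.3 and F.4 are of order $ϵ / (1 - x)$ and $ϵ / x$ , so they vanish as $ϵ \to 0$ while the jump count n grows as $ϵ^{- α}$ .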

Because randomness in the number of jump events is negligible compared to that in jump sizes, it can be assumed that exactly n jumps occur in time τ. Then, the displacement $Δ X (τ) = X (τ) - x$ can be written as

Δ X (τ) = V (x) τ + \sum_{i = 1}^{n} l_{i},

(F.6)

where $l_{i} \in [- x, - ϵ] \cup [ϵ, 1 - x]$ denotes the displacement due to the i-th jump. For small τ, $w (y | x (τ')) \approx w (y | x)$ for $0 < τ' < τ$ , which means that $l_{1}, \dots, l_{n}$ are independent and identically distributed. From Equation 19, each l_i is approximately sampled from the following power-law distribution,

P (l) = \begin{cases} \frac{W_{+}}{W_{+} + W_{-}} \frac{α ϵ^{α}}{l^{α + 1}}, & l \in [ϵ, + \infty), \\ 0, & l \in (- ϵ, + ϵ), \\ \frac{W_{-}}{W_{+} + W_{-}} \frac{α ϵ^{α}}{| l |^{α + 1}}, & l \in (- \infty, - ϵ], \end{cases}

(F.7)

where the factor $\frac{W_{+}}{W_{-} + W_{+}}$ (resp. $\frac{W_{-}}{W_{-} + W_{+}}$ ) is the probability that a given jump is frequency-increasing (resp. frequency-decreasing). P(l) is normalized as $\int_{- \infty}^{\infty} P (l) d l = 1$ . Note that, in Equation F.7, the original range $[- x, - ϵ] \cup [ϵ, 1 - x]$ of l has been extended to $(- \infty, - ϵ] \cup [ϵ, \infty)$ . Under this modification, the second moment $〈 x {(τ)}^{2} 〉$ is no longer well defined. However, the modification does not alter the short-time properties of typical events, because typical short-time displacements are small and are not affected by the boundaries at x = 0 and x = 1.

By noting that P(l) has a divergent variance and that the number of jumps is $n \approx \frac{τ}{ϵ^{α}} ≫ 1$ even for small τ (as $ϵ \to + 0$ ), the generalized central limit theorem states that the sum $\sum_{i = 1}^{n} l_{i}$ in Equation F.6 obeys an α-stable distribution. The stable distribution is characterized by $〈 l 〉, β, γ$ given below (see, e.g., Uchaikin and Zolotarev 2011): The mean $〈 l 〉$ is

〈 l 〉 = \frac{W_{+} - W_{-}}{W_{+} + W_{-}} \frac{α}{α - 1} ϵ = \frac{x {(1 - x)}^{α} - x^{α} (1 - x)}{x^{α} (1 - x) + x {(1 - x)}^{α}} \frac{α}{α - 1} ϵ .

(F.8)

Asymptotically, P(l) satisfies

\begin{matrix} \int_{l}^{\infty} P (l') d l' = \frac{W_{+}}{W_{-} + W_{+}} \frac{ϵ^{α}}{l^{α}} \equiv \frac{c_{+}}{l^{α}} (l \to \infty), \\ \int_{- \infty}^{l} P (l') d l' = \frac{W_{-}}{W_{-} + W_{+}} \frac{ϵ^{α}}{| l |^{α}} \equiv \frac{c_{-}}{| l |^{α}} (l \to - \infty) . \end{matrix}

(F.9)

Note that $c_{-} + c_{+} = ϵ^{α}$ . The parameters $γ$ and β are determined from $c_{\pm}$ :

γ \equiv {(\frac{π (c_{+} + c_{-}) n}{2 Γ (α) sin \frac{π α}{2}})}^{\frac{1}{α}} = ϵ {(\frac{π n}{2 Γ (α) sin \frac{π α}{2}})}^{\frac{1}{α}} = {(τ \frac{π (x {(1 - x)}^{α} + (1 - x) x^{α})}{2 Γ (α + 1) sin \frac{π α}{2}})}^{\frac{1}{α}}

(F.10)

β \equiv \frac{c_{+} - c_{-}}{c_{+} + c_{-}} = \frac{W_{+} - W_{-}}{W_{+} + W_{-}} = \frac{x {(1 - x)}^{α} - x^{α} (1 - x)}{x^{α} (1 - x) + x {(1 - x)}^{α}} .

(F.11)

Then, from the generalized central limit theorem, the random variable,

Z \equiv \frac{\sum_{i = 1}^{n} l_{i} - n 〈 l 〉}{γ},

(F.12)

has the following characteristic function,

〈 e^{ikZ} 〉 = \int e^{ikz} P_{Z} (z) d z \overset{ϵ \to + 0}{=} exp [- | k |^{α} (1 - i β tan \frac{π α}{2} sign k)] .

(F.13)
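The generalized central limit theorem step can be checked by Monte Carlo. The sketch below (hypothetical helper names; x, α, ϵ, n, and the sample size are illustrative) draws n jumps from Equation F.7, centers their sum by $n 〈 l 〉$ from Equation F.8, rescales it by γ from Equation F.10, and compares the empirical characteristic function of Z with Equation F.13. Because Z does not depend on ϵ, any positive ϵ can be used; convergence in n is slow for α close to 2, so the agreement is only approximate.

```python
import math
import numpy as np

def sample_normalized_sums(x, alpha, eps, n, n_samples, seed=0, batch=500):
    """Draw samples of Z (Equation F.12) with the jumps l_i drawn from Equation F.7."""
    rng = np.random.default_rng(seed)
    w_plus = x * (1 - x) ** alpha      # Equation F.3 without the common factor 1/(α ε^α)
    w_minus = (1 - x) * x ** alpha     # Equation F.4 without the same factor
    p_plus = w_plus / (w_plus + w_minus)
    beta = 2 * p_plus - 1                                         # Equation F.11
    mean_l = beta * alpha / (alpha - 1) * eps                     # Equation F.8
    gamma = eps * (math.pi * n / (2 * math.gamma(alpha)
                   * math.sin(math.pi * alpha / 2))) ** (1 / alpha)   # Equation F.10
    out = []
    for start in range(0, n_samples, batch):
        size = min(batch, n_samples - start)
        mags = eps * (1.0 - rng.random((size, n))) ** (-1 / alpha)    # |l| >= eps, tail ∝ |l|^{-α}
        signs = np.where(rng.random((size, n)) < p_plus, 1.0, -1.0)
        out.append((np.sum(signs * mags, axis=1) - n * mean_l) / gamma)
    return np.concatenate(out), beta

def stable_cf(k, alpha, beta):
    """Characteristic function of Z in the limit ε -> 0, Equation F.13."""
    return np.exp(-abs(k) ** alpha * (1 - 1j * beta * math.tan(math.pi * alpha / 2) * np.sign(k)))

z, beta = sample_normalized_sums(x=0.2, alpha=1.5, eps=1e-3, n=2000, n_samples=20_000)
for k in (0.5, -0.5):
    print(k, np.mean(np.exp(1j * k * z)), stable_cf(k, 1.5, beta))
```

For x = 0.2 and α = 1.5, the skewness is $β = 1/3$ , and the empirical and stable characteristic functions agree to within a few percent.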

We can determine the characteristic function for $Δ x$ , using Equation F.13 and the relation

Δ X (τ) = V (x) τ + γ Z + n 〈 l 〉,

(F.14)

which follows from Equations F.6 and F.12. While $V (x) τ$ and $n 〈 l 〉$ both diverge in the limit $ϵ \to + 0$ , Equations F.3–F.5 and F.8, together with $V (x) = - \int_{| x - x' | > ϵ} d x' (x' - x) w (x' | x) \approx \frac{1}{ϵ^{α - 1}} \frac{1}{α - 1} (x^{α} (1 - x) - x {(1 - x)}^{α})$ , show that these divergent terms exactly cancel each other. Therefore, the displacement simplifies to

Δ X (τ) = γ Z .

(F.15)

Equations 24 and 23 in the main text are the same as Equations F.15 and F.13 (with the replacement $x \to x_{0}$ ). Combining Equation F.15 with Equation F.13, we obtain the characteristic function of the allele frequency $X (τ)$ :

〈 e^{ikX (τ)} 〉 = \int e^{ikx'} P (x', τ | x_{0}) d x' = exp [i k x_{0} - | γ (x_{0}) k |^{α} (1 - i β (x_{0}) tan \frac{π α}{2} sign k)] .

(F.16)
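For reference, the short-time parameters can be coded directly. The sketch below (hypothetical helper names) evaluates γ( $x_{0}$ ) and β( $x_{0}$ ) from Equations F.10 and F.11 and the characteristic function of Equation F.16. It also makes the skew explicit: $β (x_{0}) > 0$ for $x_{0} < 1/2$ , $β (1/2) = 0$ , and $β (1 - x_{0}) = - β (x_{0})$ .

```python
import cmath
import math

def stable_params(x0, alpha, tau):
    """Scale γ(x0) and skewness β(x0) from Equations F.10 and F.11."""
    a = x0 * (1 - x0) ** alpha
    b = (1 - x0) * x0 ** alpha
    gamma = (tau * math.pi * (a + b)
             / (2 * math.gamma(alpha + 1) * math.sin(math.pi * alpha / 2))) ** (1 / alpha)
    beta = (a - b) / (a + b)
    return gamma, beta

def short_time_cf(k, x0, alpha, tau):
    """Characteristic function <exp(i k X(τ))> of Equation F.16."""
    gamma, beta = stable_params(x0, alpha, tau)
    sign_k = (k > 0) - (k < 0)
    return cmath.exp(1j * k * x0 - abs(gamma * k) ** alpha
                     * (1 - 1j * beta * math.tan(math.pi * alpha / 2) * sign_k))

for x0 in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(x0, stable_params(x0, alpha=1.5, tau=0.01))
```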

The scaling ansatz for the long-time transition density in Equation 39

Consider the initial distribution $P (x, τ = 0) = δ (x - x_{0})$ with $x_{0} ≪ 1$ . After some time, the distribution spreads over the region $x ≪ 1$ with a peak at the extinction boundary x = 0. As presented in Equation 37 of the main text, up to a constant prefactor, $P (x, τ)$ takes the following form

P (x, τ) \sim τ^{- 2 η} g (ξ),

where $η = {(α - 1)}^{- 1}$ and $ξ = \frac{x}{τ^{η}}$ . Here, we present an analytic argument to determine $g (ξ)$ .

Equation 20 can be rewritten as

\frac{\partial P}{\partial τ} = \int_{| Δ | > ϵ} d Δ (f_{Δ} (x - Δ) P (x - Δ, τ) - f_{Δ} (x) P (x, τ)) + \frac{\partial}{\partial x} \int_{| Δ | > ϵ} d Δ Δ f_{Δ} (x) P (x, τ),

(F.17)

where $f_{Δ} (x) \equiv w (x + Δ | x)$ is given by Equation 21. For $x ≪ 1$ , $f_{Δ} (x)$ is approximately given by

f_{Δ} (x) = \begin{cases} \frac{x}{Δ^{α + 1}}, & Δ > 0, \\ \frac{x (x + Δ)^{α - 1}}{| Δ |^{α + 1}}, & Δ < 0 . \end{cases}

(F.18)

We substitute the ansatz $P (x, τ) \sim τ^{- 2 η} g (ξ)$ into the above CK equation. The left-hand side of the CK equation becomes

LHS = - 2 η τ^{- 2 η - 1} g (ξ) - η τ^{- 2 η - 1} g' (ξ) ξ,

(F.19)

which is proportional to $τ^{- 2 η - 1} = τ^{- \frac{α + 1}{α - 1}}$ . The right-hand side can be decomposed into integrals over $Δ > 0$ and integrals over $Δ < 0$ . We can show that the former are proportional to $τ^{- \frac{α + 1}{α - 1}}$ , while the latter are proportional to $τ^{- \frac{2}{α - 1}}$ , i.e., they carry an extra factor of τ.³ Since the extinction time for the initial frequency $x_{0} ≪ 1$ is much shorter than the coalescent timescale, we can assume $τ ≪ 1$ , which implies that the integrals over $Δ < 0$ are negligible compared with those over $Δ > 0$ . By evaluating the integrals over $Δ > 0$ using the scaling form of $P (x, τ)$ and comparing them with Equation F.19, we have

- η (2 g (ξ) + ξ g' (ξ)) = \int_{0}^{\infty} \frac{d δ}{δ^{α + 1}} ((ξ - δ) g (ξ - δ) Θ (ξ - δ) - ξ g (ξ) + δ \frac{d}{d ξ} (ξ g (ξ))),

(F.20)

where $Θ (\cdot)$ is the Heaviside step function. Note that the variable of integration has been changed from Δ to $δ = \frac{Δ}{τ^{η}}$ , and the upper bound of the integral has been extended to $+ \infty$ to make the equation analytically tractable. It is convenient to express Equation F.20 in terms of $Φ (ξ) \equiv ξ g (ξ)$ :

- η (\frac{Φ (ξ)}{ξ} + Φ' (ξ)) = \int_{0}^{\infty} \frac{d δ}{δ^{α + 1}} (Φ (ξ - δ) Θ (ξ - δ) - Φ (ξ) + δ Φ' (ξ)) .

(F.21)

The solution of the integro-differential equation in Equation F.21 can be obtained as a series expansion. Assume, for small ξ,

Φ (ξ) = c_{1} ξ^{β} + \dots,

(F.22)

where c₁ is a normalization and the exponent of the leading term is denoted by $β \in (0, 1)$ . Here, $β < 1$ is required since we are considering the situation where $P (x, τ)$ is monotonically decreasing in x, while $β > 0$ is required to make $P (x, τ)$ normalizable. By substituting Equation F.22 into Equation F.21, we have

- \frac{β + 1}{α - 1} \frac{1}{ξ^{1 - β}} + \dots = \frac{Γ (- α) Γ (1 + β)}{Γ (1 - α + β)} \frac{1}{ξ^{α - β}} + \dots .

(F.23)

Since $\frac{1}{ξ^{1 - β}} ≪ \frac{1}{ξ^{α - β}}$ for $ξ ≪ 1$ , in order for the two sides to be balanced, the coefficient $\frac{Γ (- α) Γ (1 + β)}{Γ (1 - α + β)}$ needs to be zero, which is possible only when $Γ (1 - α + β)$ diverges. Since $1 < α < 2$ and $0 < β < 1$ , we can conclude $β = α - 1$ . Therefore, the leading term of $g (ξ)$ is given by

g (ξ) = \frac{c_{1}}{ξ^{2 - α}} + \dots (ξ ≪ 1) .

(F.24)

More generally, by starting from the ansatz,

Φ (ξ) = \sum_{m = 1}^{\infty} c_{m} ξ^{(α - 1) m},

(F.25)

the coefficients $c_{2}, c_{3}, \dots$ can be determined iteratively:

c_{m + 1} = - \frac{1 + (α - 1) m}{α - 1} \frac{Γ (m (α - 1))}{Γ (- α) Γ (m (α - 1) + α)} c_{m} (m = 1, 2, \dots) .

(F.26)

By using this iteratively, we can express $Φ (ξ)$ as

Φ (ξ) = c_{1} \sum_{m = 1}^{\infty} (- 1)^{m + 1} \frac{Γ (α + 1) (\frac{α}{α - 1})_{m - 1}}{α (α - 1)^{m} Γ (- α)^{m - 1} Γ (m + 1) Γ (m (α - 1))} ξ^{(α - 1) m},

(F.27)

where ${(\frac{α}{α - 1})}_{m - 1}$ is the Pochhammer symbol, ${(q)}_{n} = Γ (q + n) / Γ (q)$ . The analytic expression of $g (ξ)$ can be obtained from this using $g (ξ) = \frac{Φ (ξ)}{ξ}$ .
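The coefficients of Equation F.27 can be evaluated stably in log space, since $Γ (- α) > 0$ for $1 < α < 2$ , and checked against the recursion of Equation F.26. The sketch below uses plain truncated summation (no van Wijngaarden acceleration), so it is only meant for small to moderate ξ; the function names and the truncation order are our own choices.

```python
import math

def series_coefficients(alpha, m_max, c1=1.0):
    """Coefficients c_1..c_{m_max} of Equation F.25, from the closed form in Equation F.27."""
    q = alpha / (alpha - 1)
    log_gamma_neg_alpha = math.lgamma(-alpha)   # log Γ(-α); Γ(-α) > 0 for 1 < α < 2
    coeffs = []
    for m in range(1, m_max + 1):
        log_mag = (math.lgamma(alpha + 1) + math.lgamma(q + m - 1) - math.lgamma(q)
                   - math.log(alpha) - m * math.log(alpha - 1)
                   - (m - 1) * log_gamma_neg_alpha
                   - math.lgamma(m + 1) - math.lgamma(m * (alpha - 1)))
        coeffs.append(c1 * (-1) ** (m + 1) * math.exp(log_mag))
    return coeffs

def recursion_ratio(alpha, m):
    """c_{m+1} / c_m according to the recursion in Equation F.26."""
    return (-(1 + (alpha - 1) * m) / (alpha - 1) * math.gamma(m * (alpha - 1))
            / (math.gamma(-alpha) * math.gamma(m * (alpha - 1) + alpha)))

def phi(xi, alpha, m_max=60):
    """Truncated series Φ(ξ) with c_1 = 1 (plain summation; small to moderate ξ only)."""
    coeffs = series_coefficients(alpha, m_max)
    return sum(c * xi ** ((alpha - 1) * m) for m, c in enumerate(coeffs, start=1))

coeffs = series_coefficients(1.5, 12)
print([coeffs[m] / coeffs[m - 1] - recursion_ratio(1.5, m) for m in range(1, 6)])
print(phi(1e-8, 1.5) / 1e-8 ** 0.5)   # close to c_1 = 1 for small ξ, cf. Equation F.24
```

The closed form reproduces $c_{1} = 1$ exactly, and the ratios of successive coefficients match Equation F.26 to rounding accuracy.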

On the other hand, for $ξ ≫ 1$ , we expect that $g (ξ)$ decreases in the same way as the offspring distribution does;

g (ξ) \sim \frac{1}{ξ^{α + 1}} + \dots (ξ ≫ 1) .

(F.28)

Therefore, we expect there is a crossover point $ξ_{c}$ such that $g (ξ) \sim \frac{1}{ξ^{2 - α}} + \dots$ for $ξ ≪ ξ_{c}$ and $g (ξ) \sim \frac{1}{ξ^{α + 1}} + \dots$ for $ξ ≫ ξ_{c}$ . The scaling form for $ξ ≫ ξ_{c}$ can indeed be confirmed by considering the following ansatz for $Φ (ξ)$ ,

Φ (ξ) = \begin{cases} c_{1} ξ^{α - 1} + \dots, & ξ < ξ_{c}, \\ c' ξ^{- α'} + \dots, & ξ > ξ_{c}, \end{cases}

(F.29)

where $c'$ is a normalization and $α'$ is an exponent to be determined. Substituting this ansatz into Equation F.21, we can show $α' = α$ , leading to $g (ξ) \sim \frac{1}{ξ^{α + 1}} + \dots$ for $ξ > ξ_{c}$ .

Finally, we remark that, although Equation F.27 was derived assuming $ξ ≪ 1$ , the series converges for any $ξ > 0$ . This suggests that the scaling form $g (ξ) \sim \frac{1}{ξ^{α + 1}} + \dots$ for large ξ should follow directly from a resummation of the infinite series in Equation F.27. Indeed, numerical evaluation of a finite truncation of the series exhibits the crossover behavior of Equation F.29 (see Figure F1).

Figure F1 — The infinite series in Equation F.27 is evaluated numerically by truncating at m = 150 and using the van Wijngaarden transformation (solid line). $α = 1.7$ is used. The dashed blue and red lines represent the asymptotic behaviors given in Equation F.29.

Appendix G: Site frequency spectra in presence of selection

Here, we discuss the effect of genuine selection on the SFS in terms of the effective bias when $1 < α < 2$ . As discussed in the main text, there is a crossover point x_c, shown in Equation 39, below which selection is negligible compared with the effective bias (see Figure 13). Thus, we expect the SFS to become independent of the selective advantage σ at sufficiently small frequencies x. Similarly, at the high-frequency end $1 - x ≪ 1$ , selection is negligible compared with the effective bias. Therefore, we expect $f_{SFS} (x) \sim \frac{1}{V_{eff} (x)} \propto {(1 - x)}^{- α + 2}$ even in the presence of natural selection; in particular, the exponent is independent of σ. Figure G1 shows the numerical results for $α = 1.5$ . As x approaches 0, the SFS becomes independent of σ. For frequent variants, $1 - x ≪ 1$ , the SFS is fitted well by ${(1 - x)}^{- α + 2}$ , while its magnitude increases with σ. A similar result can be obtained analytically when α = 1 (see Appendix A).

Figure G1 — Left: The SFS under positive selection $s = 0, 0.005, 0.01, 0.02$ , $α = 1.5$ , and $N = 10^{6}$ . Right: The SFS near x = 1. The straight lines are drawn assuming $SFS (x) \propto 1 / {(1 - x)}^{2 - α}$ . The slope is almost independent of s.

Appendix H: Derivation of the rate of adaptation in Equation 46 of the main text

Here, we conjecture the rate of adaptation for an asexual population with a broad offspring distribution ( $1 < α < 2$ ) in the clonal-interference regime, using a self-consistency condition argument described in Desai and Fisher (2007).

We assume that mutations have a fixed effect s much larger than the mutation rate $μ_{B}$ at which they arise. First, we consider the dynamics of the fittest sub-population that becomes established at the nose of the fitness wave. We can estimate the size of the sub-population when established from the establishment probability of a single fittest mutant;

N_{est} \sim \frac{1}{P_{fix} (q s)},

(H.1)

where $q s$ ( $q \in N$ ) is the fitness lead of the sub-population compared with the mean of the whole population, and the fixation probability is given by Equation 42, $P_{fix} \sim {(q s)}^{\frac{1}{α - 1}}$ . In the time this sub-population is seeded and becomes established, the mean fitness should increase by s. This implies that, after its establishment, this sub-population will initially grow exponentially at rate $(q - 1) s$ . The growth rate will slow down to 0 when it fixes. Therefore, the time from establishment to fixation can be estimated as

t_{fix} \sim \frac{1}{(q - 1) s / 2} \ln \frac{N}{N_{est}} = \frac{1}{(q - 1) s / 2} \ln N P_{fix} (q s)

(H.2)

where $(q - 1) s / 2$ is its average growth rate between the establishment and fixation. Thus, the rate of adaptation is given by

R = \frac{(q - 1) s}{t_{fix}} \sim \frac{{((q - 1) s)}^{2}}{2 \ln N P_{fix} (q s)} .

(H.3)

Second, we focus on successive events of establishments at the edge of the fitness wave. We define t_est as the mean time interval between two successive establishments. An established sub-population grows like $n (t) \sim N_{est} e^{(q - 1) s t}$ , from which the next event of establishment is produced with rate $n (t) μ_{B} P_{fix} (q s)$ . Therefore, t_est can be estimated from

μ_{B} P_{fix} (q s) \int_{0}^{t_{est}} n (t) d t \approx 1,

(H.4)

which leads to $t_{est} \sim \frac{1}{(q - 1) s} \ln [\frac{s}{μ_{B}}]$ . Since the nose of the fitness wave advances at a speed $R = \frac{s}{t_{est}}$ , we have

R = \frac{s}{t_{est}} \sim \frac{(q - 1) s^{2}}{\ln \frac{s}{μ_{B}}} .

(H.5)

By comparing Equations H.3 and H.5, we obtain

q \sim 1 + \frac{2 \ln (N P_{fix} (q s))}{\ln \frac{s}{μ_{B}}}, R \sim \frac{2 s^{2} \ln (N P_{fix} (q s))}{{(\ln \frac{s}{μ_{B}})}^{2}} .

(H.6)

By substituting $P_{fix} \sim {(q s)}^{\frac{1}{α - 1}}$ into Equation H.6, we obtain

q \sim 1 + \frac{2 \ln (N s^{\frac{1}{α - 1}})}{\ln \frac{s}{μ_{B}}}, R \sim \frac{2 s^{2} \ln (N s^{\frac{1}{α - 1}})}{{(\ln \frac{s}{μ_{B}})}^{2}},

(H.7)

where we neglected the term $\frac{1}{α - 1} \ln q$ compared with $\ln (N s^{\frac{1}{α - 1}})$ . In the limit $α \to 2$ , the above results reproduce those of Desai and Fisher (2007).
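The closed-form estimates in Equation H.7 are straightforward to evaluate. The sketch below (hypothetical function name) drops all O(1) prefactors exactly as the scaling argument does, so it gives orders of magnitude only; it also requires $N s^{1/(α - 1)} > 1$ for the logarithm to be positive.

```python
import math

def clonal_interference_rate(N, s, mu_b, alpha):
    """Return (q, R) from Equation H.7 for 1 < α <= 2, up to O(1) factors."""
    log_n_pfix = math.log(N) + math.log(s) / (alpha - 1)   # ln(N s^{1/(α-1)})
    if log_n_pfix <= 0:
        raise ValueError("requires N * s**(1/(alpha-1)) > 1")
    log_ratio = math.log(s / mu_b)                          # ln(s / μ_B)
    q = 1 + 2 * log_n_pfix / log_ratio
    R = 2 * s ** 2 * log_n_pfix / log_ratio ** 2
    return q, R

q, R = clonal_interference_rate(N=1e6, s=0.01, mu_b=1e-5, alpha=1.5)
print(q, R)
```

For these illustrative values, $N s^{2} = 100$ and $s / μ_{B} = 1000$ , so $q = 1 + 2 \ln 100 / \ln 1000 = 7/3$ ; setting α = 2 recovers the Desai–Fisher form $2 s^{2} \ln (N s) / \ln^{2} (s / μ_{B})$ .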

The case of α = 1 can be treated in a similar way. Suppose that the population is monoclonal. The fixation probability of a mutant is given by $P_{fix} \sim N^{- 1 + s}$ (see Equation A.25), which implies that the establishment size is roughly $N_{est} \sim N^{1 - s}$ . While the establishment timescale of a mutant is ${(μ_{B} N P_{fix})}^{- 1} = {(μ_{B} N^{s})}^{- 1}$ , the fixation timescale is $t_{fix} \sim \frac{1}{s} log \frac{N}{N_{est}} \sim log N$ . Thus, successive selective sweeps occur if ${(μ_{B} N^{s})}^{- 1} ≫ log N$ , or equivalently,

μ_{B} N^{s} log N ≪ 1 (successive selective sweeps) .

(H.8)

By substituting $P_{fix} \sim N^{- 1 + s}$ into Equation H.6, the rate of adaptation in the clonal-interference regime is given by

R \sim \frac{2 s^{3} \ln N}{{(\ln \frac{s}{μ_{B}})}^{2}} .

(H.9)

In the successive-sweeps regime, the adaptation rate is given by

R = s μ_{B} N \times P_{fix} (s) \sim s μ_{B} N^{s} .

(H.10)

Note that clonal interference becomes unlikely to occur as the offspring distribution becomes broader. For example, when α = 1, the population size needs to be $N ≫ 10^{41}$ for $μ_{B} = 10^{- 4}, s = 0.05$ to satisfy $μ_{B} N^{s} log N ≫ 1$ .
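The threshold quoted above can be reproduced by solving $μ_{B} N^{s} \ln N = 1$ for $\log_{10} N$ . The sketch below (hypothetical function names) does this by bisection, treating "log" in Equation H.8 as the natural logarithm (an assumption), and also evaluates Equations H.9 and H.10. For $μ_{B} = 10^{- 4}$ and $s = 0.05$ , the crossover lies near $N \approx 10^{40.6}$ , so clonal interference requires N well above this value.

```python
import math

def regime_parameter(N, s, mu_b):
    """μ_B N^s ln N from Equation H.8 (<< 1: successive sweeps; >> 1: clonal interference)."""
    return mu_b * N ** s * math.log(N)

def crossover_log10_N(s, mu_b, lo=1.0, hi=1000.0, iters=200):
    """Solve μ_B N^s ln N = 1 for log10 N by bisection (the left side increases with N)."""
    def f(k):   # log10 of the regime parameter at N = 10**k
        return math.log10(mu_b) + s * k + math.log10(k * math.log(10))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def alpha_one_rates(N, s, mu_b):
    """Adaptation rates for α = 1: clonal interference (H.9) and successive sweeps (H.10)."""
    r_ci = 2 * s ** 3 * math.log(N) / math.log(s / mu_b) ** 2
    r_ss = s * mu_b * N ** s
    return r_ci, r_ss

k = crossover_log10_N(s=0.05, mu_b=1e-4)
print(k, regime_parameter(10 ** k, 0.05, 1e-4))
```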

Figure H1 shows the numerical results for the adaptation rate R versus the selection coefficient s. The parameters used in the simulation lie in the clonal-interference regime. For $α > 1$ , R is approximately proportional to s², while for α = 1, R is approximately proportional to s³; both observations are consistent with Equations H.6 and H.9. However, for α = 1, the quantitative agreement between the numerical result and the theoretical prediction is poor, and further investigation is needed to validate Equation H.9.

Appendix I: Numerical simulations

Simulations are implemented in C++ using the random number generators of the GNU Scientific Library. Results obtained from the simulations are analyzed with Mathematica. The code is freely available upon request.

Numerical synthesis of Pareto random variables and α-stable distribution

In order to generate the mutant frequency of the gamete pool, we need to compute the sums of random Pareto variables,

M = \sum_{i = 1}^{N x} u_{i}, W = \sum_{i = 1}^{N (1 - x)} v_{i},

(I.1)

where u_i, v_i are drawn from the Pareto distribution $P_{U} (u) = α / u^{α + 1} (u \geq 1)$ . One simple way to synthesize u_i, v_i is to sample a number r from the uniform distribution on (0, 1) and compute $r^{- \frac{1}{α}}$ .
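A minimal NumPy version of this inverse-transform sampler is sketched below (the function name is hypothetical). Using $1 - r$ with r drawn from NumPy's half-open interval [0, 1) avoids r = 0, and the empirical tail $P (U > 2)$ should be close to $2^{- α}$ .

```python
import numpy as np

def pareto_samples(alpha, size, rng):
    """Inverse-transform draws from P_U(u) = α / u^{α+1}, u >= 1."""
    r = 1.0 - rng.random(size)   # uniform on (0, 1]
    return r ** (-1.0 / alpha)

rng = np.random.default_rng(1)
u = pareto_samples(1.5, 1_000_000, rng)
print(u.min(), np.mean(u > 2.0), 2.0 ** -1.5)
```

As an alternative, NumPy's `Generator.pareto(a)` draws from the Lomax (Pareto II) form with support starting at 0, so `1 + rng.pareto(alpha, size)` yields the same distribution.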

To generate the sums M and W efficiently for large N (e.g., $N \sim 10^{6}$ ), we can use the generalized central limit theorem when xN and $(1 - x) N$ are large. In the simulations, when xN < 100, M is generated directly by synthesizing $x N$ random variables $\{u_{i}\}$ , while, when $x N \geq 100$ , M is generated by sampling a random number ζ from the α-stable distribution and then determining $M = \sum_{i} u_{i}$ from Equation C.1. W is generated in a similar way.
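The hybrid scheme can be sketched as follows. Equation C.1 is not reproduced here; instead, as an assumption, the sketch applies the generalized central limit theorem in the form of Equations F.8 to F.12 with ϵ = 1, $c_{+} = 1$ , and $c_{-} = 0$ (so β = 1): $M \approx n α / (α - 1) + ζ (π n / (2 Γ (α) \sin (π α / 2)))^{1 / α}$ , with ζ drawn from the stable law of Equation F.13. The stable variate is generated with the Chambers–Mallows–Stuck method, a standard choice that is not necessarily the one used in the original C++ code. The threshold of 100 follows the text, the sketch assumes $1 < α < 2$ , and all names are hypothetical.

```python
import math
import numpy as np

def stable_cms(alpha, beta, size, rng):
    """Chambers–Mallows–Stuck sampler for the standard stable law of Equation F.13 (α != 1)."""
    v = rng.uniform(-math.pi / 2, math.pi / 2, size)
    w = rng.exponential(1.0, size)
    t = beta * math.tan(math.pi * alpha / 2)
    b = math.atan(t) / alpha
    s = (1 + t * t) ** (1 / (2 * alpha))
    return (s * np.sin(alpha * (v + b)) / np.cos(v) ** (1 / alpha)
            * (np.cos(v - alpha * (v + b)) / w) ** ((1 - alpha) / alpha))

def pareto_sum(n, alpha, rng, threshold=100):
    """Sum of n Pareto(α) variables with u >= 1, for 1 < α < 2."""
    if n < threshold:
        # Direct synthesis by inverse transform.
        return float(np.sum((1.0 - rng.random(n)) ** (-1.0 / alpha)))
    # Generalized CLT: centre n<u> = nα/(α-1), scale from Equation F.10 with c_+ = 1, β = 1.
    mean = n * alpha / (alpha - 1)
    scale = (math.pi * n / (2 * math.gamma(alpha) * math.sin(math.pi * alpha / 2))) ** (1 / alpha)
    return mean + scale * float(stable_cms(alpha, 1.0, 1, rng)[0])

rng = np.random.default_rng(7)
print(pareto_sum(50, 1.5, rng), pareto_sum(10_000, 1.5, rng))
```

Because the Pareto variables are one-sided, only the totally skewed case β = 1 is needed here.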

After generating M and W, the population is updated by binomial sampling with success probability $p = \frac{M}{M + W}$ (although this sampling step can be omitted when $α < 2$ , since the fluctuations associated with binomial sampling are negligible compared with those associated with M and W). Natural selection and mutations are implemented by modifying the success probability $p = \frac{M}{M + W}$ as

\frac{p (1 + s)}{p (1 + s) + (1 - p)} (1 - μ_{M \to W}) + \frac{(1 - p)}{p (1 + s) + (1 - p)} μ_{W \to M},

(I.2)

where $μ_{W \to M}$ is the mutation rate from the wild-type to the mutant allele, and $μ_{M \to W}$ is the mutation rate in the reverse direction.
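A sketch of one generation of this update is given below (hypothetical function names). It applies Equation I.2 to $p = M / (M + W)$ and then draws the mutant count binomially; M and W would come from the Pareto sums, and the numerical values in the usage line are placeholders.

```python
import numpy as np

def biased_probability(p, s, mu_mw, mu_wm):
    """Success probability after selection and mutation, Equation I.2."""
    denom = p * (1 + s) + (1 - p)
    return p * (1 + s) / denom * (1 - mu_mw) + (1 - p) / denom * mu_wm

def next_mutant_count(M, W, N, s, mu_mw, mu_wm, rng):
    """Draw the next mutant count from the gamete pool (M, W) by binomial sampling."""
    p = M / (M + W)
    return int(rng.binomial(N, biased_probability(p, s, mu_mw, mu_wm)))

rng = np.random.default_rng(2)
print(next_mutant_count(M=3.2e4, W=2.9e5, N=10**5, s=0.01, mu_mw=1e-6, mu_wm=1e-6, rng=rng))
```

Without selection and mutation, the success probability reduces to p itself; at p = 0, only back-mutation from the wild type ( $μ_{W \to M}$ ) can produce mutants.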

Site frequency spectrum

Since the SFS is proportional to the mean sojourn time, it can be computed numerically by generating trajectories starting at $x_{0} = \frac{1}{N}$ until fixation or extinction and measuring how often, on average, a trajectory visits a given frequency interval.
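The sojourn-time estimator can be sketched as follows. For brevity, and as a simplification, the sketch uses plain binomial Wright–Fisher resampling (the $α \geq 2$ case) instead of the Pareto-weighted update; the skewed-offspring step could be substituted at the marked line. Trajectories are advanced in parallel and removed once absorbed, and the hypothetical function returns the mean number of generations spent at each count from 1 to N − 1. In the neutral diffusion limit, the mean sojourn at count i is approximately 2/i, so the values at counts 4 and 8 should differ by a factor of about 2.

```python
import numpy as np

def sojourn_sfs(N, n_traj, rng):
    """Mean generations spent at each count 1..N-1 by trajectories started at count 1."""
    visits = np.zeros(N + 1)
    counts = np.ones(n_traj, dtype=np.int64)
    while counts.size > 0:
        visits += np.bincount(counts, minlength=N + 1)
        # Neutral binomial Wright–Fisher step (simplification; swap in a skewed-offspring update here).
        counts = rng.binomial(N, counts / N)
        counts = counts[(counts > 0) & (counts < N)]   # drop extinct and fixed trajectories
    return visits[1:N] / n_traj

rng = np.random.default_rng(3)
sfs = sojourn_sfs(50, 200_000, rng)
print(sfs[3] / sfs[7])   # counts 4 and 8
```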

Numerical simulation of the model of range expansion in the main text

We first review the numerical implementation of the range expansion model with two neutral alleles without mutations (Birzu et al. 2018). The per capita growth rate r(n) with an Allee effect is given by

r (n) = r_{0} (1 - \frac{n}{K}) (1 + B \frac{n}{K}),

(I.3)

where $n = n_{1} + n_{2}$ is the sum of the two population densities, and B is the strength of cooperativity. Each deme contains three types: allele 1, allele 2, and “empty.” At each time step, the configuration of deme x is updated by a trinomial sampling process with

p_{i} = \frac{{\tilde{n}}_{i}}{K (1 - r (\tilde{n}) τ)} for i = 1, 2 and p_{empty} = 1 - p_{1} - p_{2},

(I.4)

where ${\tilde{n}}_{i}$ is the population density after migration,

{\tilde{n}}_{i} (t, x) = \frac{m}{2} n_{i} (t, x - a) + (1 - m) n_{i} (t, x) + \frac{m}{2} n_{i} (t, x + a),

(I.5)

and $\tilde{n}$ in the denominator of Equation I.4 is the sum of these densities, $\tilde{n} = {\tilde{n}}_{1} + {\tilde{n}}_{2}$ , and a denotes the width of a deme. The expectation value of the total density n after one time step is given by

K \sum_{i = 1, 2} p_{i} = \frac{\tilde{n}}{1 - r (\tilde{n}) τ} \approx \tilde{n} (1 + r (\tilde{n}) τ),

(I.6)

which explains the denominator of Equation I.4. In the simulation, a = 1 and τ = 1 are used.

As in the standard Wright–Fisher model, a mutation process can be introduced by using the success probabilities $p' = {(p'_{1}, p'_{2})}^{T}$ given by

p' = U p,

(I.7)

where $p = {(p_{1}, p_{2})}^{T}$ and U is a matrix representing mutational transitions. In the case of symmetrical mutations in the main text, U is given by

U = (\begin{matrix} 1 - μ & μ \\ μ & 1 - μ \end{matrix}),

(I.8)
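One update step of this deme model can be sketched in NumPy as follows (hypothetical function names). Assumptions beyond the text: demes outside the simulated domain are treated as empty, a = τ = 1 by default, and the three probabilities are clipped at zero and renormalized in case rounding or strong cooperativity pushes $p_{1} + p_{2}$ slightly above 1.

```python
import numpy as np

def growth_rate(n, r0, K, B):
    """Per capita growth rate with an Allee effect, Equation I.3."""
    return r0 * (1 - n / K) * (1 + B * n / K)

def migrate(n, m):
    """Nearest-neighbour migration of Equation I.5 (a = 1); demes beyond the ends are empty."""
    padded = np.pad(n.astype(float), 1)
    return 0.5 * m * padded[:-2] + (1 - m) * padded[1:-1] + 0.5 * m * padded[2:]

def range_expansion_step(n1, n2, r0, K, B, m, mu, rng, tau=1.0):
    """One time step: migration, growth-weighted trinomial sampling, symmetric mutation."""
    t1, t2 = migrate(n1, m), migrate(n2, m)
    denom = K * (1 - growth_rate(t1 + t2, r0, K, B) * tau)
    p = np.vstack([t1 / denom, t2 / denom])          # Equation I.4, shape (2, demes)
    U = np.array([[1 - mu, mu], [mu, 1 - mu]])       # Equation I.8
    p = U @ p                                        # Equation I.7
    probs = np.column_stack([p[0], p[1], 1 - p[0] - p[1]])
    probs = np.clip(probs, 0.0, None)
    probs /= probs.sum(axis=1, keepdims=True)        # guard against rounding (assumption)
    counts = np.array([rng.multinomial(K, row) for row in probs])
    return counts[:, 0], counts[:, 1]

rng = np.random.default_rng(0)
K = 1000
n1 = np.array([K // 2] * 5 + [0] * 5)
n2 = np.array([K // 2] * 5 + [0] * 5)
n1, n2 = range_expansion_step(n1, n2, r0=0.01, K=K, B=3.0, m=0.125, mu=1e-3, rng=rng)
print(n1, n2)
```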

This model serves as a microscopic description of our (non-spatial) macroscopic model of a population with a broad offspring distribution $p (U = u) \sim \frac{1}{u^{α + 1}}$ . We can relate the parameters of the two models by comparing their coalescent timescales. As established in Birzu et al. (2018), for a semi-pushed wave ( $2 < B < 4$ ), the coalescent timescale is given by

T_{c}^{micro} \sim N^{2 \frac{\sqrt{1 - γ {(B)}^{2}}}{1 - \sqrt{1 - γ {(B)}^{2}}}} .

(I.9)

where $γ (B) = \frac{v_{F}}{v} = 2 {(\sqrt{\frac{B}{2}} + \sqrt{\frac{2}{B}})}^{- 1}$ is the ratio of the Fisher velocity $v_{F} = 2 \sqrt{D r_{0}}$ to the wave velocity $v = \sqrt{r_{0} D} (\sqrt{\frac{B}{2}} + \sqrt{\frac{2}{B}})$ . On the other hand, the coalescent timescale $T_{c}^{macro}$ in the macroscopic description for $1 < α < 2$ is proportional to $N^{α - 1}$ (see Equation 15). By comparing the exponents, a semi-pushed wave with B corresponds to the macroscopic model with⁴

α = 2 \frac{\sqrt{1 - γ {(B)}^{2}}}{1 - \sqrt{1 - γ {(B)}^{2}}} + 1.

(I.10)

For example, B = 3 corresponds to $α = 1.5$ . In addition, the mutation rate $μ_{micro}$ per generation in the microscopic model and the mutation rate $μ_{macro}$ per generation in the macroscopic model should be related by $μ_{micro} \times T_{c}^{micro} \sim μ_{macro} \times T_{c}^{macro}$ .
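The mapping of Equation I.10 is easy to tabulate. The sketch below (hypothetical names) reproduces α = 1.5 at B = 3 and the endpoints α = 1 at B = 2 and α = 2 at B = 4. Since γ(B) = γ(4/B), the formula is only meaningful in the semi-pushed range 2 ≤ B ≤ 4, which the function enforces.

```python
import math

def gamma_ratio(B):
    """γ(B) = v_F / v, the ratio of the Fisher velocity to the wave velocity."""
    return 2.0 / (math.sqrt(B / 2) + math.sqrt(2 / B))

def alpha_from_B(B):
    """Offspring exponent α matched to a semi-pushed wave via Equation I.10."""
    if not 2.0 <= B <= 4.0:
        raise ValueError("Equation I.10 applies to 2 <= B <= 4")
    root = math.sqrt(max(0.0, 1 - gamma_ratio(B) ** 2))
    return 2 * root / (1 - root) + 1

for B in (2.0, 2.5, 3.0, 3.5, 4.0):
    print(B, alpha_from_B(B))
```

For B = 3, $γ^{2} = 0.96$ , so $\sqrt{1 - γ^{2}} = 0.2$ and $α = 2 \times 0.2 / 0.8 + 1 = 1.5$ .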

The following parameters are used in the three panels (Left, Center, Right) of Figure 14B in the main text:

Left: $B = 1, μ = (5 \times 10^{- 4}, 5 \times 10^{- 5}), K = 28000$ for the microscopic model, and $α = 1, θ = (1.5, 0.15)$ for the macroscopic model.
Center: $B = 3, μ = (2 \times 10^{- 4}, 2 \times 10^{- 5}), K = 35000$ for the microscopic model, and $α = 1.5, θ = (1.6, 0.16)$ for the macroscopic model.
Right: $B = 8, μ = (1 \times 10^{- 5}, 1 \times 10^{- 6}), K = 57000$ for the microscopic model, and the Wright–Fisher model with $θ = (2.4, 0.24)$ as the macroscopic model.

In all of the three cases, the growth rate $\frac{r_{0}}{τ} = 0.01$ and the migration probability $m = \frac{2 D τ}{a^{2}} = 0.125$ are used in the microscopic model, and the population size $N = 10^{5}$ is used in the macroscopic model. Note that, to compare the microscopic model with the macroscopic model, the value of the carrying capacity K for each case is chosen such that the size of the front population $\frac{K}{k}$ , where k is the spatial decay rate of the population density,⁵ approximately agrees with the population size $N = 10^{5}$ in the macroscopic model.

Appendix J: Areas swept by trajectories

J-1: A scaling argument on area distributions

Consider frequency trajectories that start from a single mutant, $x_{0} = \frac{1}{N}$ , and are eventually absorbed either at x = 0 or at x = 1. For each such trajectory, we define the area in the $(x, τ)$ plane swept by the trajectory (see Figure J1),

A = \int_{0}^{τ_{abs}} x (τ) d τ,

(J.1)

where $τ_{abs}$ is the absorption time of the trajectory. While this quantity is defined for a population without spatial structure, we expect it to have a natural interpretation in a model of range expansion as a spatial integral of the mutant frequency (i.e., the abundance of the mutant type), since τ in Equation J.1 is related to the spatial position of the traveling wave in the comoving frame.

Figure J1 — An area A swept by a trajectory that eventually goes extinct and an area $A'$ swept by a trajectory that eventually fixes are illustrated. $τ_{abs}$ and $τ'_{abs}$ are the extinction time and the fixation time, respectively.

Here, we examine how the area A defined in Equation J.1 depends on the exponent α of the offspring distribution. The left panel of Figure J2 shows numerical results for the area distribution p(A) for $α = 1, 1.5$ , and the Wright–Fisher model (corresponding to $α \geq 2$ ). Over a wide range of A, the areas are distributed according to $p (A) \sim \frac{1}{N A^{1 + \frac{1}{α}}}$ .

Focusing on small areas, which correspond to extinct trajectories, this power-law behavior can again be rationalized by a scaling argument. First, by Equation 3, a trajectory whose maximum frequency is $x_{*} ≪ 1$ sweeps an area of roughly $A \sim x_{*} \times τ_{ext} \sim x_{*}^{α}$ (see Figure J1), i.e., $x_{*} \sim A^{\frac{1}{α}}$ . Second, by neutrality, the cumulative probability $Pr (X_{*} > x_{*})$ that a single mutant reaches a frequency larger than $x_{*}$ before absorption is $Pr (X_{*} > x_{*}) \sim \frac{1}{N x_{*}}$ . Hence, the density $p (x_{*})$ is given by $p (x_{*}) = - \frac{d}{d x_{*}} Pr (X_{*} > x_{*}) \sim \frac{1}{N x_{*}^{2}}$ . Combining these two results, we can estimate the area distribution p(A) as

p (A) \sim p (x_{*}) \frac{d x_{*}}{d A} |_{x_{*} = A^{\frac{1}{α}}} \sim \frac{1}{N x_{*}^{2}} x_{*}^{1 - α} |_{x_{*} = A^{\frac{1}{α}}} \sim \frac{1}{N} A^{- 1 - \frac{1}{α}} .

(J.2)

When $α \to 2 - 0$ (the Wright–Fisher limit), the distribution becomes $p (A) \sim \frac{1}{N} A^{- \frac{3}{2}}$ , which can be analytically confirmed by solving a backward diffusion equation of the Wright–Fisher diffusion (see Appendix J-2).

The numerical results indicate that, when $1 \leq α < 2$ , there is an uptick in the area distribution p(A), which comes from fixed trajectories (see the case of α = 1 in the right panel of Figure J2). The uptick becomes less pronounced as α increases. For the Wright–Fisher model, we can analytically prove that p(A) monotonically decreases with A.

J-2: Area distribution in the Wright–Fisher model

Here, we derive the distribution of the area defined in Equation J.1 analytically for the Wright–Fisher diffusion process.

Consider a Langevin equation

\frac{d X}{d τ} = v (X) + ξ (τ),

(J.3)

with $〈 ξ (τ) ξ (τ') 〉 = 2 D (x) δ (τ - τ')$ . Assume the initial value $X (τ = 0) = x_{0} \in (0, 1)$ and the absorbing boundaries at X = 0, 1. For a given trajectory departing from x₀ and ending at either one of the boundaries, we consider the “area” defined by

A = \int_{0}^{τ_{abs}} X (τ) d τ .

(J.4)

where $τ_{abs}$ is the absorption time.

The area distribution $Π (A; x_{0})$ for a given initial condition $X (0) = x_{0}$ obeys a backward equation. To show this, we discretize the dynamics;

Δ X = v h + W

(J.5)

where h denotes a short time interval and $〈 W_{i} W_{j} 〉 = 2 D h δ_{i, j}$ . The transition density is given by

T (x_{0} + Δ x | x_{0}) = \frac{1}{\sqrt{2 π (2 D (x_{0}) h)}} exp (- \frac{{(Δ x - v (x_{0}) h)}^{2}}{2 (2 D (x_{0}) h)}) .

(J.6)

Note that

\begin{matrix} {〈 Δ x 〉}_{x_{0}} = v (x_{0}) h, \\ {〈 {(Δ x)}^{2} 〉}_{x_{0}} = 2 D (x_{0}) h + O (h^{2}) . \end{matrix}

(J.7)

By separating a trajectory into the initial step and the remaining part, we have

Π (A; x_{0}) = \int d (Δ x) T (x_{0} + Δ x | x_{0}) Π (A - x_{0} h; x_{0} + Δ x) + o (h),

(J.8)

By Taylor-expanding $Π (A - x_{0} h; x_{0} + Δ x)$ , we have

\begin{matrix} Π (A - x_{0} h; x_{0} + Δ x) = Π (A; x_{0}) - \frac{\partial Π}{\partial A} x_{0} h + \frac{\partial Π}{\partial x_{0}} Δ x \\ + \frac{1}{2} \frac{\partial^{2} Π}{\partial A^{2}} x_{0}^{2} h^{2} - \frac{\partial^{2} Π}{\partial A \partial x_{0}} x_{0} h Δ x + \frac{1}{2} \frac{\partial^{2} Π}{\partial x_{0}^{2}} Δ x^{2} + \dots \\ = Π (A; x_{0}) - \frac{\partial Π}{\partial A} x_{0} h + \frac{\partial Π}{\partial x_{0}} Δ x + \frac{1}{2} \frac{\partial^{2} Π}{\partial x_{0}^{2}} Δ x^{2} + o (h) . \end{matrix}

(J.9)

Therefore, Equation J.8 becomes

Π (A; x_{0}) = Π (A; x_{0}) - \frac{\partial Π}{\partial A} x_{0} h + \frac{\partial Π}{\partial x_{0}} {〈 Δ x 〉}_{x_{0}} + \frac{1}{2} \frac{\partial^{2} Π}{\partial x_{0}^{2}} {〈 Δ x^{2} 〉}_{x_{0}} + o (h) .

(J.10)

By using Equation J.7, we obtain

x_{0} \frac{\partial Π}{\partial A} = v (x_{0}) \frac{\partial Π}{\partial x_{0}} + D (x_{0}) \frac{\partial^{2} Π}{\partial x_{0}^{2}} .

(J.11)

More generally, the same argument shows that for the integral

\tilde{A} = \int_{0}^{τ_{abs}} d τ \, f (X (τ)),

(J.12)

the distribution $Π (\tilde{A}; x_{0})$ satisfies

f (x_{0}) \frac{\partial Π}{\partial \tilde{A}} = v (x_{0}) \frac{\partial Π}{\partial x_{0}} + D (x_{0}) \frac{\partial^{2} Π}{\partial x_{0}^{2}} .

(J.13)

In the neutral Wright–Fisher model, $v (x_{0}) = 0$ and $D (x_{0}) = x_{0} (1 - x_{0})$. Dividing Equation J.11 by $x_{0}$, the backward equation becomes

\frac{\partial Π}{\partial A} = (1 - x_{0}) \frac{\partial^{2} Π}{\partial x_{0}^{2}} .

(J.14)

From this equation, it follows that $Π (A; x_{0})$ decreases monotonically with A. The reason is that $1 - x_{0} \geq 0$ and that the spectrum of the operator $\partial_{x_{0}}^{2}$ is non-positive: its eigenvalues for Fourier modes $e^{i k x_{0}}$ are $(i k)^{2} = - k^{2}$.
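The following worked consequence of Equation J.14 is added here as an illustration and is not part of the original derivation. Multiply J.14 by A and integrate over A, assuming Π is normalized and $A Π \to 0$ as $A \to \infty$. This gives an ordinary differential equation for the mean area $m (x_{0}) = \int_{0}^{\infty} A \, Π (A; x_{0}) \, d A$: $(1 - x_{0}) m'' (x_{0}) = -1$, with $m (0) = m (1) = 0$. Its solution is $m (x_{0}) = -(1 - x_{0}) \log (1 - x_{0})$. The Python sketch below (function names hypothetical) solves this boundary-value problem with central finite differences and compares the result with the closed form.

```python
import numpy as np


def mean_area_fd(n):
    """Central finite differences for (1 - x) m''(x) = -1 with m(0) = m(1) = 0,
    i.e. the first moment in A of Eq. J.14 (neutral case, D(x) = x (1 - x))."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)[1:-1]            # interior grid points
    lap = (np.diag(-2.0 * np.ones(n - 1))
           + np.diag(np.ones(n - 2), 1)
           + np.diag(np.ones(n - 2), -1)) / h**2      # Laplacian, zero Dirichlet data
    m = np.linalg.solve(lap, -1.0 / (1.0 - x))
    return x, m


def mean_area_exact(x):
    """Closed-form mean area m(x0) = -(1 - x0) log(1 - x0)."""
    return -(1.0 - x) * np.log1p(-x)
```

A dense solver is used only for brevity. A tridiagonal solver would be the natural choice on finer grids.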

We can determine the area distribution $Π (A; x_{0})$ analytically, at least for small A. We are interested in invasion by a single mutant, $x_{0} = \frac{1}{N} ≪ 1$. To determine the behavior at small areas, we expect that we can ignore the high-frequency boundary at x = 1. We can then replace the factor $1 - x_{0}$ in Equation J.14 by 1 and solve the problem on the semi-infinite line $x_{0} \in (0, \infty)$. We therefore consider the following problem:

\begin{matrix} \frac{\partial Π}{\partial A} = \frac{\partial^{2} Π}{\partial x_{0}^{2}}, \\ Π (A; x_{0} = 0) = g (A), \\ Π (A = 0; x_{0}) = 0 \quad \text{for } x_{0} > 0, \\ \lim_{x_{0} \to \infty} Π (A; x_{0}) = 0 . \end{matrix}

(J.15)

In our case, $g (A) = δ (A)$ , because the trajectory starting from $x_{0} = 0$ has A = 0.

For a function f(A) of A, we write the Laplace transformation as

\hat{f} (s) = ℓ [f (A)] = \int_{0}^{\infty} d A \, f (A) e^{- s A} .

(J.16)

By taking the Laplace transform with respect to A, we have

s \hat{Π} (s; x_{0}) = \frac{\partial^{2} \hat{Π} (s; x_{0})}{\partial x_{0}^{2}}, \quad \hat{Π} (s; 0) = \hat{g} (s), \quad \lim_{x_{0} \to \infty} \hat{Π} (s; x_{0}) = 0 .

(J.17)

The solution that vanishes as $x_{0} \to \infty$ is

\hat{Π} (s; x_{0}) = e^{- x_{0} \sqrt{s}} \hat{g} (s) .

(J.18)

We take the inverse of the Laplace transformation,

Π (A; x_{0}) = ℓ^{- 1} (e^{- x_{0} \sqrt{s}} \hat{g} (s)) .

(J.19)

By the convolution theorem, this is the convolution of $ℓ^{- 1} (e^{- x_{0} \sqrt{s}}) = \frac{x_{0}}{2 \sqrt{π} A^{\frac{3}{2}}} e^{- \frac{x_{0}^{2}}{4 A}}$ with g(A):

Π (A; x_{0}) = \int_{0}^{A} d A' \frac{x_{0}}{2 \sqrt{π} A'^{\frac{3}{2}}} e^{- \frac{x_{0}^{2}}{4 A'}} g (A - A') .

(J.20)

When $g (A) = δ (A)$ , we have

Π (A; x_{0}) = \frac{x_{0}}{2 \sqrt{π} A^{\frac{3}{2}}} e^{- \frac{x_{0}^{2}}{4 A}} .

(J.21)
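As a numerical sanity check of the inversion step, the Python sketch below (function names hypothetical) evaluates the Laplace transform of the density in Equation J.21. It compares the result with $e^{- x_{0} \sqrt{s}}$, which is Equation J.18 for $g (A) = δ (A)$. The substitution $A = e^{u}$ is our own choice. It makes the integrand decay rapidly at both ends, so a plain trapezoidal rule on a uniform grid in u is sufficient.

```python
import math


def area_density(a, x0):
    """Eq. J.21: x0 / (2 sqrt(pi) a^{3/2}) * exp(-x0^2 / (4 a))."""
    return x0 / (2.0 * math.sqrt(math.pi) * a**1.5) * math.exp(-x0**2 / (4.0 * a))


def laplace_transform(s, x0, u_min=-30.0, u_max=10.0, n=4000):
    """Trapezoidal rule for int_0^inf e^{-s A} Pi(A) dA in the variable u = log A,
    using dA = A du."""
    du = (u_max - u_min) / n
    total = 0.0
    for k in range(n + 1):
        a = math.exp(u_min + k * du)
        weight = 0.5 if k in (0, n) else 1.0      # trapezoid end-point weights
        total += weight * area_density(a, x0) * math.exp(-s * a) * a
    return total * du
```

The check is restricted to s > 0. At s = 0 the $A^{-3/2}$ tail decays too slowly for the fixed upper cutoff `u_max`.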

In particular, for $x_{0} = 1 / N$, we have

\begin{matrix} Π (A; x_{0} = \frac{1}{N}) = \frac{1}{2 \sqrt{π} N A^{\frac{3}{2}}} e^{- \frac{1}{4 A N^{2}}} \\ \approx \frac{1}{2 \sqrt{π} N A^{\frac{3}{2}}}, \end{matrix}

(J.22)

where we have used $e^{- \frac{1}{4 A N^{2}}} \approx 1$, which holds for $A ≫ 1 / (4 N^{2})$. This is not restrictive. In a finite population, only areas larger than $x_{0} \times d τ \sim \frac{1}{N} \times \frac{1}{N} = \frac{1}{N^{2}}$ are meaningful.
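The small-area result can also be probed by direct simulation. The sketch below is a rough Monte Carlo check under stated assumptions, not the authors' procedure. It integrates the Langevin dynamics with $v = 0$ and $D (x) = x$ using an Euler–Maruyama scheme. This is the small-frequency limit in which $1 - x$ is replaced by 1, as in Equation J.15. The scheme accumulates the area until the trajectory first becomes non-positive. It then compares the empirical distribution with the cumulative distribution obtained by integrating Equation J.21, $P (A \le a) = \operatorname{erfc} (x_{0} / (2 \sqrt{a}))$. The step size, time horizon, initial value `x0 = 0.1`, and function names are illustrative, and the discrete time step introduces a small bias.

```python
import math

import numpy as np


def simulate_areas(x0, dt, t_max, n_paths, seed=0):
    """Euler-Maruyama sketch of dX = sqrt(2 X) dW (v = 0, D(x) = x).

    Returns, for each path, the area A = int X dtau accumulated until X first
    becomes non-positive; paths still alive at t_max keep their partial area."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0, dtype=float)
    area = np.zeros(n_paths)
    alive = np.ones(n_paths, dtype=bool)
    for _ in range(int(round(t_max / dt))):
        idx = np.flatnonzero(alive)
        if idx.size == 0:
            break
        xa = x[idx]
        area[idx] += xa * dt                       # left-point rule for the area
        x_new = xa + np.sqrt(2.0 * xa * dt) * rng.standard_normal(idx.size)
        alive[idx] = x_new > 0.0                   # absorbing boundary at x = 0
        x[idx] = np.maximum(x_new, 0.0)
    return area


def area_cdf(a, x0):
    """P(A <= a) from integrating Eq. J.21: erfc(x0 / (2 sqrt(a)))."""
    return math.erfc(x0 / (2.0 * math.sqrt(a)))
```

With `dt = 1e-3`, `t_max = 2.0`, and 20,000 paths, we expect the simulated fraction `np.mean(areas <= a)` to agree with `area_cdf(a, 0.1)` to within a few hundredths for small thresholds such as `a = 0.01` and `a = 0.05`.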

Appendix K: Forward-in-time behaviors of the Eldon–Wakeley model

Here, we present simulation results for the median allele frequency and for the median and mean square displacements in the Eldon–Wakeley model (Eldon and Wakeley 2006; see also Der et al. 2012). As shown below, and unlike in our model, these quantities do not exhibit sustained power-law behaviors, because the offspring distribution has a characteristic size ψ.

We consider the neutral Eldon–Wakeley model, in which the offspring distribution $P_{U} (u)$ is given by [see Equation (7) in Eldon and Wakeley (2006)]:

P_{U} (u) = (1 - N^{- γ}) δ_{u, 2} + N^{- γ} δ_{u, ψ N},

(K.1)

where $δ_{a, b}$ is the Kronecker delta. The parameters $ψ \in (0, 1)$ and γ characterize how large and how frequent the ‘sweepstakes’ events are, respectively.

The limiting process as $N \to \infty$ depends on γ [see Equation (9) in Der et al. (2012)]. For $γ > 2$, the process reduces to the Wright–Fisher diffusion. For $γ < 2$, it is described by a jump process whose backward-time generator $L^{†}$ is given by

L^{†} P (x, τ) = x P (x + ψ (1 - x), τ) - P (x, τ) + (1 - x) P (x - ψ x, τ),

(K.2)

where the continuous time τ is related to the number of generations t by $τ = t / N^{γ}$. The first term of the generator represents a frequency-increasing jump $x \to x + ψ (1 - x)$ at rate x, while the last term represents a frequency-decreasing jump $x \to x - ψ x$ at rate $1 - x$.

Figure K1 shows numerical simulation results for the median allele frequency and for the median and mean square displacements. For a small initial frequency $x_{0} ≪ 1$, the median frequency is well described by $X^{med} (t) = x_{0} e^{- ψ N^{- γ} t}$ (Figure K1A). This exponential decay can be expected from the generator in Equation K.2. For $x ≪ 1$, frequency-increasing jumps (at rate x) are unlikely, and a typical allele frequency decreases by ψx at rate $1 - x \approx 1$, so that $d X^{med} / d τ \approx - ψ X^{med}$. Thus, the median frequency in the Eldon–Wakeley model does not exhibit power-law behavior.
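The exponential decay of the median can be illustrated by simulating the limiting jump process in Equation K.2 directly. The Python sketch below is not the simulation behind Figure K1, which concerns the finite-N model. The function name and parameter values are hypothetical. The total jump rate is $x + (1 - x) = 1$, so the number of jumps up to time τ is Poisson distributed with mean τ. At each jump, the frequency moves up to $x + ψ (1 - x)$ with probability x and down to $x - ψ x$ otherwise.

```python
import numpy as np


def simulate_jump_process(x0, psi, tau, n_paths, seed=0):
    """Sketch of the limiting jump process of Eq. K.2, run for a time tau."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0, dtype=float)
    # The total jump rate x + (1 - x) equals 1, so the jump count is Poisson(tau).
    n_jumps = rng.poisson(tau, size=n_paths)
    for k in range(int(n_jumps.max())):
        active = n_jumps > k                      # paths with a k-th jump
        up = rng.random(n_paths) < x              # upward jump with probability x
        step = np.where(up, psi * (1.0 - x), -psi * x)
        x = np.where(active, x + step, x)
    return x
```

For $x_{0} ≪ 1$, almost all trajectories experience only downward jumps. The median is therefore close to $x_{0} (1 - ψ)^{τ}$, which reduces to $x_{0} e^{- ψ τ} = x_{0} e^{- ψ N^{- γ} t}$ for small ψ. The sample mean stays close to $x_{0}$, because the expected jump, $x ψ (1 - x) - (1 - x) ψ x$, vanishes.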

As for frequency fluctuations, the mean square displacement (mean SD) exhibits normal diffusion, $\text{Mean SD} \propto t$, as in the Moran (or the Wright–Fisher) model. In contrast, the median square displacement (median SD) does not exhibit a sustained power-law behavior (Figure K1B). On short and long timescales, the median SD grows diffusively ($\text{Median SD} \propto t$). On intermediate timescales ($t \sim 500$–$1000$ generations in the figure), however, it increases more rapidly than expected from normal diffusion.
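The linear growth of the mean SD is consistent with a short moment calculation for the limiting process. This calculation is added here as an illustrative derivation, not as a result of the original study. It applies the generator in Equation K.2 to the test functions $x$, $(x - x_{0})^{2}$ and $x (1 - x)$. In the second line, the cross terms $\pm 2 (X - x_{0}) ψ X (1 - X)$ cancel.

```latex
\begin{aligned}
\frac{d}{dτ} \langle X \rangle &= \langle X ψ (1 - X) - (1 - X) ψ X \rangle = 0, \\
\frac{d}{dτ} \langle (X - x_{0})^{2} \rangle &= \langle X ψ^{2} (1 - X)^{2} + (1 - X) ψ^{2} X^{2} \rangle = ψ^{2} \langle X (1 - X) \rangle, \\
\frac{d}{dτ} \langle X (1 - X) \rangle &= - ψ^{2} \langle X (1 - X) \rangle, \\
\Rightarrow \quad \langle (X - x_{0})^{2} \rangle &= x_{0} (1 - x_{0}) \left(1 - e^{- ψ^{2} τ}\right) \approx ψ^{2} x_{0} (1 - x_{0}) \, τ .
\end{aligned}
```

In this limit, the mean SD therefore grows linearly for $τ ≪ ψ^{-2}$, that is, for $t ≪ N^{γ} / ψ^{2}$ generations.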

Footnotes

$η^{*}$ is obtained from $0 = f' (η^{*}) = - \frac{1}{η^{*}} - \log ϵ + \frac{π}{\tan π η^{*}} \approx - \log ϵ + \frac{π}{\tan π η^{*}} \approx - \log ϵ - \frac{1}{1 - η^{*}}$, where the last step uses $\frac{π}{\tan π η} \approx - \frac{1}{1 - η}$ for η close to 1.

Although the magnitudes of $-1$ and $\log \frac{- π}{\log ϵ}$ are small compared to $- \log ϵ$, we need to retain these two terms because $f (η^{*})$ contributes to $I_{ϵ}$ through $e^{f (η^{*})}$.

For example, one of the integrals over $Δ > 0$ is

$\int_{Δ > 0} d Δ f_{Δ} (x) P (x) = \int_{Δ > 0} d Δ \frac{x}{Δ^{α + 1}} τ^{- 2 η} g (ξ) = τ^{- \frac{α + 1}{α - 1}} \int_{δ > 0} d δ \frac{ξ}{δ^{α + 1}} g (ξ),$

while one of the integrals over $Δ < 0$ is

$\int_{Δ < 0} d Δ f_{Δ} (x) P (x) = \int_{Δ < 0} d Δ \frac{x {(x + Δ)}^{α - 1}}{Δ^{α + 1}} τ^{- 2 η} g (ξ) = τ^{- \frac{2}{α - 1}} \int_{δ < 0} d δ \frac{ξ {(ξ + δ)}^{α - 1}}{δ^{α + 1}} g (ξ),$

where we have changed the integration variable from Δ to $δ = \frac{Δ}{τ^{η}}$ and used $x = ξ τ^{η}$ with $η = \frac{1}{α - 1}$.

⁴

Note that the definition of the parameter $α_{H}$ in Birzu et al. (2018) differs from our definition of α. For $1 < α < 2$, which corresponds to the semi-pushed wave region $- 1 < α_{H} < 0$, the two definitions are related by $- α_{H} = α - 1$.

⁵

Specifically, the rate k is given by $k = \sqrt{\frac{r_{0}}{D}}$ for $0 < B < 2$ and by $k = \sqrt{\frac{r_{0} B}{2 D}}$ for $B \geq 2$ (Birzu et al. 2018).

Literature cited

Adam DC, Wu P, Wong JY, Lau EH, Tsang TK, et al. 2020. Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong. Nat Med. 26:1714–1719.
Bah B, Pardoux E. 2015. The Λ-lookdown model with selection. Stoch Process Appl. 125:1089–1126.
Barton NH, Etheridge AM. 2011. The relation between reproductive value and genetic contribution. Genetics. 188:953–973.
Basdevant A, Goldschmidt C. 2008. Asymptotics of the allele frequency spectrum associated with the Bolthausen–Sznitman coalescent. Electron J Probab. 13:486–512.
Berestycki J, Berestycki N, Limic V. 2014. Asymptotic sampling formulae for Λ-coalescents. Ann IHP Prob Stat. 50:715–731.
Berg JJ, Coop G. 2014. A population genetic signal of polygenic adaptation. PLoS Genet. 10:e1004412.
Birzu G, Hallatschek O, Korolev KS. 2018. Fluctuations uncover a distinct class of traveling waves. Proc Natl Acad Sci USA. 115:E3645–E3654.
Birzu G, Hallatschek O, Korolev KS. 2021. Genealogical structure changes as range expansions transition from pushed to pulled. Proc Natl Acad Sci USA. 118:34.
Bollback JP, York TL, Nielsen R. 2008. Estimation of 2Nes from temporal allele frequency data. Genetics. 179:497–502.
Bolthausen E, Sznitman A-S. 1998. On Ruelle’s probability cascades and an abstract cavity method. Commun Math Phys. 197:247–276.
Brunet É, Derrida B, Mueller AH, Munier S. 2007. Effect of selection on ancestry: an exactly soluble case and its phenomenological generalization. Phys Rev E Stat Nonlin Soft Matter Phys. 76:041104.
Cannings C. 1974. The latent roots of certain Markov chains arising in genetics: a new approach, I. Haploid models. Adv Appl Prob. 6:260–290.
Crow JF, Kimura M. 1970. An Introduction to Population Genetics Theory. New York, Evanston and London: Harper & Row, Publishers.
Cvijović I, Good BH, Desai MM. 2018. The effect of strong purifying selection on genetic diversity. Genetics. 209:1235–1278.
Der R, Epstein C, Plotkin JB. 2012. Dynamics of neutral and selected alleles when the offspring distribution is skewed. Genetics. 191:1331–1344.
Der R, Plotkin JB. 2014. The equilibrium allele frequency distribution for a population with reproductive skew. Genetics. 196:1199–1216.
Desai MM, Fisher DS. 2007. Beneficial mutation–selection balance and the effect of linkage on positive selection. Genetics. 176:1759–1798.
Desai MM, Walczak AM, Fisher DS. 2013. Genetic diversity and the structure of genealogies in rapidly adapting populations. Genetics. 193:565–585.
Eldon B. 2009. Structured coalescent processes from a modified Moran model with large offspring numbers. Theor Popul Biol. 76:92–104.
Eldon B. 2011. Estimation of parameters in large offspring number models and ratios of coalescence times. Theor Popul Biol. 80:16–28.
Eldon B, Wakeley J. 2006. Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics. 172:2621–2633.
Etheridge AM, Griffiths RC, Taylor JE. 2010. A coalescent dual process in a Moran model with genic selection, and the lambda coalescent limit. Theor Popul Biol. 78:77–92.
Evans SN, Shvets Y, Slatkin M. 2007. Non-equilibrium theory of the allele frequency spectrum. Theor Popul Biol. 71:109–119.
Ewens WJ. 1963. The diffusion equation and a pseudo-distribution in genetics. J R Stat Soc Series B Methodol. 25:405–412.
Feder AF, Kryazhimskiy S, Plotkin JB. 2014. Identifying signatures of selection in genetic time series. Genetics. 196:509–522.
Fisher R. 1930. The Genetical Theory of Natural Selection. London, UK: Oxford University Press.
Foll M, Shim H, Jensen JD. 2015. WFABC: a Wright–Fisher ABC-based approach for inferring effective population sizes and selection coefficients from time-sampled data. Mol Ecol Resour. 15:87–98.
Fusco D, Gralka M, Kayser J, Anderson A, Hallatschek O. 2016. Excess of mutational jackpot events in expanding populations revealed by spatial Luria–Delbrück experiments. Nat Commun. 7:12760.
Gardiner C. 2009. Stochastic Methods, Vol. 4. Berlin: Springer.
Gnedenko BV, Kolmogorov A. 1968. Limit Distributions for Sums of Independent Random Variables, Vol. 233. Reading, MA: Addison-Wesley.
Griffiths RC. 2014. The Λ-Fleming-Viot process and a connection with Wright-Fisher diffusion. Adv Appl Prob. 46:1009–1035.
Hallatschek O. 2018. Selection-like biases emerge in population models with recurrent jackpot events. Genetics. 210:1053–1073.
Hallatschek O, Nelson DR. 2008. Gene surfing in expanding populations. Theor Popul Biol. 73:158–170.
Hedgecock D. 1994. Does variance in reproductive success limit effective population sizes of marine organisms. Genet Evol Aquat Organ. 122:122–134.
Karlin S, Taylor HM. 1981. A Second Course in Stochastic Processes. New York: Academic Press.
Kimura M. 1955. Stochastic Processes and Distribution of Gene Frequencies under Natural Selection. Cold Spring Harbor Symp Quant Biol. 20:57–66.
Kimura M. 1969. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics. 61:893–903.
Kosheleva K, Desai MM. 2013. The dynamics of genetic draft in rapidly adapting populations. Genetics. 195:1007–1025.
Krapivsky PL, Redner S, Ben-Naim E. 2010. A Kinetic View of Statistical Physics. New York: Cambridge University Press.
Laxminarayan R, Wahl B, Dudala SR, Gopal K, Neelima S, Reddy KJ, et al. 2020. Epidemiology and transmission dynamics of COVID-19 in two Indian states. Science. 370:691–697.
Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. 2005. Superspreading and the effect of individual variation on disease emergence. Nature. 438:355–359.
Luria SE, Delbrück M. 1943. Mutations of bacteria from virus sensitivity to virus resistance. Genetics. 28:491–511.
Neher RA, Hallatschek O. 2013. Genealogies of rapidly adapting populations. Proc Natl Acad Sci USA. 110:437–442.
Neher RA, Shraiman BI. 2011. Genetic draft and quasi-neutrality in large facultatively sexual populations. Genetics. 188:975–996.
Sackman AM, Harris RB, Jensen JD. 2019. Inferring demography and selection in organisms characterized by skewed offspring distributions. Genetics. 211:1019–301684.
Schraiber JG, Evans SN, Slatkin M. 2016. Bayesian inference of natural selection from allele frequency time series. Genetics. 203:493–511.
Schweinsberg J. 2003a. Coalescent processes obtained from supercritical Galton–Watson processes. Stoch Process Appl. 106:107–139.
Schweinsberg J. 2003b. Coalescent processes obtained from supercritical Galton–Watson processes. Stoch Process Appl. 106:107–139.
Schweinsberg J. 2017. Rigorous results for a population model with selection II: genealogy of the population. Electron J Probab. 22:1–54.
Tataru P, Simonsen M, Bataillon T, Hobolth A. 2017. Statistical inference in the Wright–Fisher model using allele frequency data. Syst Biol. 66:e30–e46.
Tellier A, Lemaire C. 2014. Coalescence 2.0: a multiple branching of recent theoretical developments and their applications. Mol Ecol. 23:2637–2652.
Uchaikin VV, Zolotarev VM. 1999. Chance and Stability: Stable Distributions and Their Applications. Utrecht: VSP.
Weissman DB, Desai MM, Fisher DS, Feldman MW. 2009. The rate at which asexual populations cross fitness valleys. Theor Popul Biol. 75:286–300.
Wright S. 1931. Evolution in Mendelian populations. Genetics. 16:97–159.

Data Availability Statement

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.
