
This is a preprint.

It has not yet been peer reviewed by a journal.


[Preprint]. 2023 May 23:arXiv:2305.14488v1. [Version 1]

Looking forwards and backwards: dynamics and genealogies of locally regulated populations

Alison M Etheridge 1, Ian Letter 2, Terence Tsui Ho Lung 3, Thomas G Kurtz 4, Peter L Ralph 5
PMCID: PMC10246084  PMID: 37292478

Abstract

We introduce a broad class of mechanistic spatial models to describe how spatially heterogeneous populations live, die, and reproduce. Individuals are represented by points of a point measure, whose birth and death rates can depend both on spatial position and local population density, defined at a location to be the convolution of the point measure with a suitable non-negative integrable kernel centred on that location. We pass to three different scaling limits: an interacting superprocess, a nonlocal partial differential equation (PDE), and a classical PDE. The classical PDE is obtained both by a two-step convergence argument, in which we first scale time and population size and pass to the nonlocal PDE, and then scale the kernel that determines local population density; and in the important special case in which the limit is a reaction-diffusion equation, directly by simultaneously scaling the kernel width, timescale and population size in our individual based model.

A novelty of our model is that we explicitly model a juvenile phase. The number of juveniles produced by an individual depends on local population density at the location of the parent; these juvenile offspring are thrown off in a (possibly heterogeneous, anisotropic) Gaussian distribution around the location of the parent; they then reach (instant) maturity with a probability that can depend on the local population density at the location at which they land. Although we only record mature individuals, a trace of this two-step description remains in our population models, resulting in novel limits in which the spatial dynamics are governed by a nonlinear diffusion.

Using a lookdown representation, we are able to retain information about genealogies relating individuals in our population and, in the case of deterministic limiting models, we use this to deduce the backwards in time motion of the ancestral lineage of an individual sampled from the population. We observe that knowing the history of the population density is not enough to determine the motion of ancestral lineages in our model. We also investigate (and contrast) the behaviour of lineages for three different deterministic models of a population expanding its range as a travelling wave: the Fisher-KPP equation, the Allen-Cahn equation, and a porous medium equation with logistic growth.

Keywords: population model, interacting superprocess, lookdown construction, porous medium equation, reaction-diffusion equation, travelling waves, genealogies, Fisher-KPP equation

1. Introduction

As one takes a journey, long or short, the landscape changes: forests thicken or thin or change their composition; even in flat plains springtime grasslands host intergrading mosaics of different types of flowers. The aim of this paper is to introduce and study a broad class of mechanistic spatial models that might describe how spatially heterogeneous populations live, die, and reproduce. Questions that we (start to) address include: How does population density change across space and time? How might we learn about the underlying dynamics from genealogical or genetic data? And, how does genetic ancestry spread across geography when looking back through time in these populations?

Reproduction of individuals naturally leads to spatial branching process models, including branching random walk, branching Brownian motion, and the Dawson-Watanabe superprocesses. However, as a result of the branching assumption (once born, individuals behave independently of one another), a population evolving according to any of these models will either die out or grow without bound and, in so doing, can develop clumps of arbitrarily large density and extent. Our starting point here is an individual-based model of a single species in continuous space in which birth, death, and establishment may all depend on local population density as well as on spatial location, allowing for stable populations through density-dependent feedback.

Although it is often mathematically convenient to assume that individuals follow Brownian motion during their lifetime, in our model, offspring are thrown off according to some spatial distribution centred on the location of the parent and do not subsequently move. This is particularly appropriate for modelling plant populations, in which this dispersal of offspring around the parent is the only source of spatial motion.

Often models do not distinguish between juveniles and adults, so, for example, the number of adults produced by a single parent is determined only by the degree of crowding at the location of the parent. Although we shall similarly only follow the adult population, in formulating the dynamics of the models we shall distinguish between production of juveniles, which will depend upon the location of the adult, and their successful establishment, which will depend on the location in which a juvenile lands. The result is that not only the absolute number, but also the spatial distribution around their parent, of those offspring that survive to adulthood will depend upon the local population density.

We shall consider three different classes of scaling limits for our model. The first yields a class of (generalised) superprocesses in which coefficients governing both the spatial motion and the branching components of the process can depend on local population density; the second is a corresponding class of deterministic non-local differential equations; and the third consists of classical PDEs. We measure local population density around a point by convolving with a smooth kernel $\rho(\cdot)$, which may differ for the two stages of reproduction. When the limiting population process is deterministic, it is a (weak) solution of an equation of the form

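$$\partial_t \varphi_t(x) = r(x,\varphi_t)\,\mathcal{B}^*\big(\varphi_t(\cdot)\,\gamma(\cdot,\varphi_t)\big)(x) + \varphi_t(x)\,F(x,\varphi_t), \tag{1.1}$$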

where $\varphi_t(x)$ can be thought of as the population density at $x$ (although the limit may be a measure without a density), and $\mathcal{B}^*$ is (the adjoint of) a strictly uniformly elliptic second order differential operator, typically the Laplacian. The dependence of each of the terms $r$, $\gamma$, and $F$ on $\varphi$ is only through the local density at $x$, e.g., $F(x,\varphi) = F(x, \rho*\varphi(x))$. We shall be more specific about the parameters below.

By replacing $\rho$ by $\rho_\epsilon(\cdot) = \rho(\cdot/\epsilon)/\epsilon^d$, we can also scale the “width” of the region over which we measure local population density. When the population follows (1.1), we expect that if we take a second limit of $\epsilon \to 0$, thus scaling the kernels appearing in $r$, $\gamma$, and $F$ and making interactions pointwise, we should recover a nonlinear PDE. We verify that this is indeed the case in two important examples: a special case of the porous medium equation with a logistic growth term, in which the limiting equation takes the form

$$\partial_t \varphi = \Delta(\varphi^2) + \varphi(1-\varphi); \tag{1.2}$$

and a wide class of semi-linear PDEs of the form

$$\partial_t \varphi = \mathcal{B}^*\varphi + \varphi F(\varphi), \tag{1.3}$$

which includes the Fisher-KPP equation and the Allen-Cahn equation. Equations of this form have been studied extensively in the context of spatial ecology (see for instance Lam and Lou [2023] and Cantrell and Cosner [2004]) and in many other fields; for instance, Ghosh and Good [2022] derive a stochastic version of (1.3) to describe abundances of mutant bacteria strains along the human gut, while Li et al. [2022] study the effects of nonlinear diffusion on long-term survival of a lattice-based interacting particle system. However, we do not study the effect of movement of adults, which can additionally affect the limiting equations: see for instance Holmes et al. [1994] or Potts and Börger [2023].

It is of interest to understand under what conditions we can replace the two-step limiting process described above by one in which we simultaneously scale the kernels and the other parameters in our population model to arrive at the PDE limit. This is mathematically much more challenging, but we establish such one-step convergence in cases for which the limit is a classical reaction-diffusion equation of the form (1.3) with $\mathcal{B} = \Delta$, and $\rho$ is a Gaussian density. We allow a wide class of reaction terms, $F$, so that the Fisher-KPP equation (that is, equation (1.3) with $\mathcal{B} = \Delta$ and $F(\varphi) = 1-\varphi$) emerges as a special case.

Such results on (one-step) convergence to reaction-diffusion equation limits have been achieved for a variety of interacting particle systems. Following the now classical contributions of De Masi et al. [1986], DeMasi and Presutti [2006], Oelschläger [1985], much of this work has focused on lattice based models with one particle per site, or on systems with a fixed number, $N$, of interacting diffusions as $N \to \infty$. For systems of proliferating particles, as considered for example by Oelschläger [1989], Flandoli et al. [2019], Flandoli and Huang [2021], an additional challenge (also apparent in our models) is the control of concentration of particles. We follow Oelschläger [1989], Flandoli et al. [2019] in considering ‘moderate interactions’, meaning that the number of individuals in the neighbourhood over which we measure local population density tends to infinity, whereas Flandoli and Huang [2021] also consider the situation in which that number remains finite. We refer to Flandoli and Huang [2021] for a more thorough literature review, but note that both our model and scaling differ from those considered in the body of work discussed there: whereas in those settings, the only scalings are the number of particles in the system and the size of the neighbourhood over which individuals interact with one another, in keeping with the vast literature on continuous state branching models, we also scale time and so must ensure that births are adequately compensated by deaths to prevent the population from exploding.

The history of a natural population is often only accessible indirectly, through patterns of genetic diversity that have been laid down; from genetic data, one can try to infer the genealogical trees that relate individuals in a sample from the population, and these have been shaped by its history [see e.g., Neigel and Avise, 1993, Kelleher et al., 2019]. It is therefore of interest to establish information about the distribution of genealogical trees under our population model, which we do with a lookdown construction. Lookdown constructions were first introduced in Donnelly and Kurtz [1996] to provide a mechanism for retaining information about genealogical relationships between individuals sampled from a population evolving according to the Moran model when passing to the infinite population limit. Since then, they have been extended to a wide range of models. Of particular relevance to our work here are the papers of Kurtz and Rodrigues [2011] and Etheridge and Kurtz [2019], in which lookdown constructions are provided for a wide variety of population models, including spatially structured branching processes.

In general, even armed with a lookdown construction, calculation of relevant statistics of the genealogy remains a difficult question. However, in special circumstances, some progress can be made. As an illustration, we shall consider a scenario that has received a great deal of attention in recent years, in which a population is expanding into new territory as a travelling wave. In Section 3.2 we shall describe the motion of a single ancestral lineage relative to three different (deterministic) wavefronts across $\mathbb{R}^1$.

Most work on the topic of “waves” of expanding populations has focused on models that caricature the classical Fisher-KPP equation with a stochastic term, i.e.

$$dw = \big(\Delta w + s\,w(1-w)\big)\,dt + \sqrt{\frac{\alpha(w)}{N}}\,W(dt,dx),$$

where W is space-time white noise, and N is a measure of the local population density. The coefficient α(w) is generally taken to be either w, corresponding to a superprocess limit, or w(1-w) giving a spatial analogue of a Wright-Fisher diffusion. Starting with the pioneering work of Brunet et al. [2006], a considerable body of evidence has been amassed to underpin the conjecture that for this, and a wide class of related models, genealogies converge on suitable timescales in the infinite density limit to a Bolthausen-Sznitman coalescent. This reflects the fact that, for this equation, ancestral lineages become trapped in the wavefront, where the growth rate of the population is highest. Once there, they will experience rapid periods of coalescence corresponding to significant proportions of individuals in the front being descended from particularly reproductively successful ancestors.

If one replaces the logistic growth term of the classical Fisher-KPP equation with a nonlinearity that reflects cooperative behaviour in the population, such as

$$wF(w) = w(1-w)(Cw-1), \tag{1.4}$$

then, for sufficiently large C (strong cooperation), the nature of the deterministic wave changes from “pulled” to “pushed”, [Birzu et al., 2018, 2021], and so the genealogies will be quite different from the Fisher-KPP case. For example, Etheridge and Penington [2022] show that for a discrete space model corresponding to this nonlinearity with C>2, after suitable scaling, the genealogy of a sample converges not to a Bolthausen-Sznitman coalescent, but to a Kingman coalescent. The reason, roughly, is that ancestral lineages settle to a stationary distribution relative to the position of the wavefront which puts very little weight close to the ‘tip’ of the wave, so that when ancestral lineages meet it is typically at a location in which population density is high, where no single ancestor produces a disproportionately large number of descendants in a short space of time.

The shape of the wave is not determined solely by the reaction term. For example, as a result of the nonlinear diffusion, for suitable initial conditions, the solution to the one-dimensional porous medium equation with logistic growth (1.2) converges to a travelling wave with a sharp cut-off; i.e., in contrast to the classical Fisher KPP equation, the solution at time t vanishes beyond x=x0+ct for some constant wavespeed c>0 [Kamin and Rosenau, 2004]. As a first step towards understanding what we should expect in models with nonlinear diffusion, one can ask about the position of an ancestral lineage relative to the wavefront in the deterministic models. In Section 3.2 we shall see that in our framework, even with logistic growth, the nonlinear diffusion corresponding to the porous medium equation results in a stationary distribution for the ancestral lineage that is concentrated behind the wavefront, leading us to conjecture that in the stochastic equation the cooperative behaviour captured by the nonlinear diffusion will also result in a qualitatively different pattern of coalescence to that seen under the stochastic Fisher-KPP equation. Indeed, we believe that it should be feasible to show that in an appropriate limit one recovers a Kingman coalescent.
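The contrast between the two wave shapes can be explored numerically. The following is a minimal sketch, not the authors' method: it uses a standard explicit finite-difference scheme (an assumption of this illustration) for (1.2) and for the Fisher-KPP case of (1.3) with $\mathcal{B} = \Delta$ and $F(\varphi) = 1 - \varphi$, in one dimension, started from a step profile. The function names, grid spacing, time step and domain length are all hypothetical choices.

```python
def step(phi, dt, dx, porous):
    """One explicit Euler step of d_t phi = d_xx D(phi) + phi * (1 - phi).

    D(phi) = phi**2 gives the porous medium equation (1.2);
    D(phi) = phi gives Fisher-KPP, i.e. (1.3) with B = Laplacian, F = 1 - phi.
    """
    D = [p * p for p in phi] if porous else phi
    lam = dt / (dx * dx)
    new = list(phi)
    for i in range(1, len(phi) - 1):
        diffusion = lam * (D[i + 1] - 2.0 * D[i] + D[i - 1])
        growth = dt * phi[i] * (1.0 - phi[i])
        new[i] = phi[i] + diffusion + growth
    new[0], new[-1] = new[1], new[-2]  # crude no-flux boundaries
    return new


def front_position(phi, dx, level=0.5):
    """Leftmost grid point at which the profile drops below `level`."""
    for i, p in enumerate(phi):
        if p < level:
            return i * dx
    return len(phi) * dx


def run(porous, t_end=20.0, length=80.0, dx=0.25, x0=5.0):
    """Evolve a step initial condition (density 1 to the left of x0)."""
    dt = 0.2 * dx * dx  # small enough for stability of both schemes
    n = int(length / dx)
    phi = [1.0 if i * dx < x0 else 0.0 for i in range(n)]
    for _ in range(int(t_end / dt)):
        phi = step(phi, dt, dx, porous)
    return phi


dx = 0.25
kpp = run(porous=False, dx=dx)
pme = run(porous=True, dx=dx)
print("Fisher-KPP front:", front_position(kpp, dx))
print("porous medium front:", front_position(pme, dx))
print("values at x = 70:", kpp[int(70 / dx)], pme[int(70 / dx)])
```

Comparing the two profiles well ahead of the front is one way to see the difference between an exponentially decaying tail and a sharp cut-off, although a scheme this simple should only be read qualitatively.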

Structure of the paper

In this paper we study scaling limits of spatial population models, obtaining convergence of both the population process (i.e., the population density as a function of time, although strictly speaking it is a measure that may not have a density) and of lineages traced back through such a population. We retain information about lineages as we pass to the scaling limit by means of a lookdown construction.

In what follows we first study various scaling limits of the spatial population process, and then turn our attention to lineages traced back through these populations. First, in Section 2, we describe the model and the main results, Theorems 2.10, 2.20, and 2.23. Next, in Section 3, we discuss a few striking consequences of these results regarding the behavior of genealogies in traveling waves, the appearance of periodic “clumps” in seemingly homogeneous population models, and identifiability of the underlying dynamics from a stationary population profile. In Section 4, we provide heuristic explanations of why the theorems ought to be true, and some key ideas behind them, and in Section 5 we define and discuss the lookdown construction. Proofs of the results begin in Section 6, which proves results for population models with nonlocal interactions, while Section 7 gives the more difficult proof for the case when interaction distances also go to zero in the limit. Finally, Section 8 gives proofs for convergence of the lookdown process and the associated results for the motion of lineages. The Appendix contains a few more technical and less central lemmas. The results are illustrated in a few places with individual-based simulations, made using SLiM [Haller and Messer, 2019], but these are provided for visualization and we do not embark on numerical study.

2. Model and main results

Our model is one of individuals distributed across a continuous space which we shall take to be $\mathbb{R}^d$. For applications, $d=1$ or $d=2$ (or even $d=3$ for cells within the body), but our main results apply more generally. At time zero, the population is distributed over a bounded region, with $\mathcal{O}(N)$ individuals per unit area in that region, so the total number of individuals will also be $\mathcal{O}(N)$. The population changes in continuous time, and we encode the state of the population at time $t$ by a counting measure $X(t)$, which assigns one unit of mass to the location of each individual.

Population dynamics are controlled by three quantities, birth ($\gamma$), establishment ($r$), and death ($\mu$), each of which can depend on spatial location and local population density in a way specified below. Each individual gives birth at rate $\gamma$ to a single (juvenile) offspring, which is dispersed according to a kernel $q(x,\cdot)$ away from the location $x$ of the parent. We assume that $q$ is the density of a multivariate Gaussian, allowing a nonzero mean and anisotropic variance. Both the mean and covariance of $q$ can change across space, but do not depend on population density. The offspring does not necessarily survive to be counted in the population: it “establishes” with probability $r$, or else it dies immediately. Independently, each individual dies at rate $\mu$.

We aim to capture universal behaviour by passing to a scaling limit. Specifically, we shall take the “density”, $N$, to infinity, and also scale time by a factor of $\theta = \theta(N)$, in such a way that, defining $\eta^N(t) = X(\theta t)/N$, the process $(\eta^N(t))_{t \ge 0}$ will converge to a suitable measure-valued process as $N$ and $\theta$ tend to infinity, with the nature of the limit depending on how they tend to infinity together. Evidently, we also need to scale the dispersal kernel if we are to obtain a nontrivial limit, for which we use $q_\theta(x,\cdot)$, the density of the multivariate Gaussian obtained by multiplying the mean and variance components of $q(x,\cdot)$ by $1/\theta$.

Birth, establishment, and death can depend on the location of the individual and the local population density. Since we would like the population density to scale with $N$, these are functions of $X/N$, i.e., the counting measure with mass $1/N$ placed at the location of each individual. First consider birth rates, defined by a nonnegative function $\gamma(x,m) : \mathbb{R}^d \times \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ of location $x$ and local population density $m$. Local population density is defined as the convolution of $X/N$ with a smooth (non-negative, integrable) kernel $\rho_\gamma(\cdot)$. We write this convolution as $\rho_\gamma * X/N$. Then, when the state of the population is $X$, an individual at location $x$ gives birth to a single juvenile offspring at rate $\gamma(x, \rho_\gamma * X(x)/N)$. Similarly, the establishment probability of an offspring at location $y$ is $r(y, \rho_r * X(y)/N)$, where $r(y,m) : \mathbb{R}^d \times \mathbb{R}_{\ge 0} \to [0,1]$ and again $\rho_r * X/N$ is the convolution of $X/N$ with the smooth kernel $\rho_r$.
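As an illustration of how local density is measured, here is a minimal sketch in one dimension. It assumes a Gaussian kernel as a stand-in for $\rho_\gamma$ (the text only requires a smooth, non-negative, integrable kernel); the function names and the example positions are hypothetical.

```python
import math


def gaussian_kernel(u, sigma):
    """1-D Gaussian density with standard deviation sigma, evaluated at u."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def local_density(x, positions, N, sigma):
    """(rho * X / N)(x): each individual carries mass 1/N, smoothed by the kernel."""
    return sum(gaussian_kernel(x - p, sigma) for p in positions) / N


# Hypothetical example: three individuals, nominal density N = 10 per unit length.
positions = [0.0, 0.1, 2.0]
m_near = local_density(0.05, positions, N=10, sigma=0.5)
m_far = local_density(5.0, positions, N=10, sigma=0.5)
print(m_near, m_far)
```

A birth rate such as $\gamma(x, m)$ would then be evaluated at `m = local_density(x, positions, N, sigma)`, with a possibly different kernel width for each of the demographic parameters.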

We shall write $\mu_\theta(x, X/N)$ for the per-capita death rate of mature individuals in the population. In order for the population density to change over timescales of order $\theta$, we should like the net per capita reproductive rate to scale as $1/\theta$. In classical models, in which $r$, $\gamma$, and $\mu$ are constant, this quantity is simply $r\gamma - \mu$. Here, because production of juveniles and their establishment are mediated by population density measured relative to different points, the net reproductive rate will take a more complicated form. In particular, the total rate of production of mature offspring by an individual at $x$ will be

$$\gamma(x, \rho_\gamma * X(x)/N) \int r(y, \rho_r * X(y)/N)\, q_\theta(x,dy). \tag{2.1}$$

Nonetheless, it will be convenient to define the death rate μθ in terms of its deviation from rγ. To this end, we define the death rate of an individual at x to be

$$\mu_\theta(x) = \max\Big\{0,\ r(x, \rho_r * X(x)/N)\,\gamma(x,\rho_\gamma * X(x)/N) - \frac{1}{\theta}\,F(x, \rho_F * X(x)/N)\Big\}, \tag{2.2}$$

where this equation defines $F(x,m) : \mathbb{R}^d \times \mathbb{R}_{\ge 0} \to \mathbb{R}$, and $\rho_F$ is again a smooth kernel. The function $F$ is nearly the net per capita reproductive rate, scaled by $\theta$, and would be equal to it in a nonspatial model; but, as can be seen from (2.1), it differs because an offspring’s establishment probability is measured at its new location rather than that of its parent. For the most part, we work with $F$ instead of $\mu_\theta$.
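The relationship between $\mu_\theta$ and $F$ in (2.2) can be made concrete with a small sketch. The helper name and the logistic-type choice $r = \gamma = 1$, $F(m) = 1 - m$ are assumptions made here for illustration only; the arguments are the already-evaluated values of $r$, $\gamma$ and $F$ at the individual's location.

```python
def death_rate(r_val, gamma_val, F_val, theta):
    """Per-capita death rate mu_theta of (2.2): max(0, r * gamma - F / theta)."""
    return max(0.0, r_val * gamma_val - F_val / theta)


# Hypothetical logistic-type choice: r = gamma = 1 and F(m) = 1 - m.
theta = 10.0
for m in (0.0, 1.0, 2.0):
    mu = death_rate(1.0, 1.0, 1.0 - m, theta)
    # In a nonspatial model, theta * (r * gamma - mu) recovers F(m).
    print(m, mu, theta * (1.0 - mu))
```

When $F/\theta$ exceeds $r\gamma$ the maximum with zero takes effect, which is why $F$ equals the scaled net reproductive rate of a nonspatial model only when the expression inside the maximum is non-negative.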

So, each of the three demographic parameters $r$, $\gamma$, and $F$ depends on local density, measured by convolution with a smooth kernel, each of which can be different. As a result, the death rate depends (in principle) on population densities measured in three different ways, so that we could write $\mu_\theta(x) = \mu_\theta(x, \rho_\gamma * X(x)/N, \rho_r * X(x)/N, \rho_F * X(x)/N)$. This may seem unnecessarily complex. However, not only is it natural from a biological perspective, it also turns out to be convenient for capturing nontrivial examples in the scaling limit.

Remark 2.1

Although this model allows fairly general birth and death mechanisms, there are a number of limitations. Perhaps most obviously, to simplify the notation individuals give birth to only one offspring at a time, although this restriction could be easily lifted [as in Section 3.4 of Etheridge and Kurtz, 2019]. Furthermore, individuals do not move during their lifetime, and the age of an individual does not affect its fecundity or death rate. Finally, there is no notion of mating (although limitations on reproduction due to availability of mates can be incorporated into the birth rate, γ), so the lineages we follow will be uniparental. For these reasons, the model is most obviously applicable to bacterial populations or selfing plants, although we do not anticipate that incorporation of these complications will change the general picture.

For each N and θ, we study primarily the process with mass scaled by N and time scaled by θ,

$$(\eta^N_t)_{t\ge 0} := \big(X(\theta t)/N\big)_{t \ge 0},$$

which takes values in the space of càdlàg paths in $\mathcal{M}_F(\mathbb{R}^d)$ (the space of finite measures on $\mathbb{R}^d$ endowed with the weak topology). In fact $\eta^N_t$ will be a purely atomic measure comprised of atoms of mass $1/N$.

Notation 2.2

Expressions like $\gamma(x, \rho_\gamma * \eta(x))$ will appear repeatedly in what follows. To make formulae more readable, we overload notation to define

$$\gamma(x,\eta) := \gamma(x, \rho_\gamma * \eta(x)),$$

and similarly write $r(x,\eta)$ for $r(x, \rho_r*\eta(x))$, $F(x,\eta)$ for $F(x,\rho_F*\eta(x))$, and $\mu_\theta(x,\eta)$ for the expression of equation (2.2). When convenient, we may also suppress the arguments completely, writing simply $\gamma$, $r$, $F$, and $\mu_\theta$ for these quantities.

Terminology 2.3

In our prelimiting model, the population is represented by a point measure in which each individual is assigned a mass 1/N. We use the term “population density” for this process, as it is supposed to measure population size relative to a nominal occupancy of N individuals per unit area. There is no implication that the measure representing the population is absolutely continuous with respect to Lebesgue measure; indeed in the prelimit it is certainly not.

In summary, at each time $t$, $\eta^N_t$ is purely atomic, consisting of atoms of mass $1/N$ (which are the individuals). At instantaneous rate $\theta\,\gamma(x,\eta^N_t)\,N\eta^N_t(dx)$ an offspring of mass $1/N$ is produced at location $x$; it disperses to a location $y$ offset from $x$ by an independent Gaussian random variable with mean $b(x)/\theta$ and covariance matrix $C(x)/\theta$, and once there establishes instantaneously with probability $r(y,\eta^N_t)$, or else dies. At instantaneous rate $\theta\,\mu_\theta(x,\eta^N_t)\,N\eta^N_t(dx)$ an individual at location $x$ dies. Note that the process $(\eta^N_t)_{t\ge 0}$, which records numbers and locations of adult individuals, is just a scaled spatial birth and death process. If, for example, we insist that $\gamma(x,m)$ is bounded, then existence (and in particular non-explosion) is guaranteed by comparison with a pure birth process. We do not dwell on this, as we shall require more stringent conditions if we are to pass to the limit as $\theta$ and $N$ tend to infinity.
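The dynamics summarised above can be sketched as a small event-driven simulation. This is a minimal illustration under stated assumptions, not the SLiM code used for the paper's figures: space is one-dimensional, $b$ and $C$ are constants, a single Gaussian kernel of width `sigma` stands in for all of $\rho_\gamma$, $\rho_r$ and $\rho_F$, and events are chosen with the standard Gillespie algorithm. All function and parameter names are hypothetical.

```python
import math
import random


def kernel(u, sigma):
    """1-D Gaussian density with standard deviation sigma (stand-in for rho)."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def density(x, pts, N, sigma):
    """Local density (rho * eta)(x), where eta puts mass 1/N on each individual."""
    return sum(kernel(x - p, sigma) for p in pts) / N


def simulate(pts, N, theta, t_end, gamma, r, F, b=0.0, C=1.0, sigma=1.0, seed=1):
    """Gillespie simulation in scaled time; returns adult positions at t_end."""
    rng = random.Random(seed)
    pts = list(pts)
    t = 0.0
    while pts:
        m = [density(x, pts, N, sigma) for x in pts]
        births = [theta * gamma(x, mi) for x, mi in zip(pts, m)]
        deaths = [theta * max(0.0, r(x, mi) * gamma(x, mi) - F(x, mi) / theta)
                  for x, mi in zip(pts, m)]
        rates = births + deaths
        total = sum(rates)
        if total <= 0.0:
            break
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick one event with probability proportional to its rate.
        u = rng.random() * total
        for i, rate in enumerate(rates):
            u -= rate
            if u < 0.0:
                break
        if i < len(pts):
            # Birth: the juvenile lands at a Gaussian offset, then may establish.
            y = pts[i] + rng.gauss(b / theta, math.sqrt(C / theta))
            if rng.random() < r(y, density(y, pts, N, sigma)):
                pts.append(y)
        else:
            # Death of an adult.
            pts.pop(i - len(pts))
    return pts


# Hypothetical Fisher-KPP-like choice: constant birth and establishment, F = 1 - m.
rng0 = random.Random(0)
start = [rng0.uniform(-1.0, 1.0) for _ in range(20)]
final = simulate(start, N=10, theta=10.0, t_end=1.0,
                 gamma=lambda x, m: 1.0, r=lambda x, m: 0.9, F=lambda x, m: 1.0 - m)
print(len(start), "->", len(final))
```

Recomputing every local density after each event costs a number of kernel evaluations that grows quadratically in the population size, so this form is only practical for small populations; it is meant to make the order of operations (birth, dispersal, establishment, death) explicit rather than to be efficient.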

It is convenient to characterise the process as a solution to a martingale problem. We write $C_b^\infty(\mathbb{R}^d)$ for the space of bounded smooth functions on $\mathbb{R}^d$, and, where convenient, we write $\langle f, \eta\rangle = \int_{\mathbb{R}^d} f(x)\,\eta(dx)$.

Definition 2.4 (Martingale Problem Characterisation)

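For each value of $N$ and $\theta$, and each purely atomic $\eta^N_0 \in \mathcal{M}_F(\mathbb{R}^d)$ with atoms of mass $1/N$, $(\eta^N_t)_{t\ge 0}$ is the (scaled) empirical measure of a birth-death process with càdlàg paths in $\mathcal{M}_F(\mathbb{R}^d)$ for which, for all $f \in C_b^\infty(\mathbb{R}^d)$, writing $q_\theta(x,dy)$ for the Gaussian kernel with mean $x + b(x)/\theta$ and covariance $C(x)/\theta$,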

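$$M^N_t(f) := \langle f, \eta^N_t\rangle - \langle f, \eta^N_0\rangle - \int_0^t \Big\{ \Big\langle \Big(\theta \int \big(f(z)\, r(z,\eta^N_s) - f(x)\, r(x,\eta^N_s)\big)\, q_\theta(x,dz)\Big)\gamma(x,\eta^N_s),\ \eta^N_s(dx)\Big\rangle + \big\langle f(x)\, F(x,\eta^N_s),\ \eta^N_s(dx)\big\rangle \Big\}\, ds \tag{2.3}$$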

is a martingale (with respect to the natural filtration), with angle bracket process

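$$\langle M^N(f)\rangle_t = \frac{\theta}{N}\int_0^t \Big\{ \Big\langle \gamma(x,\eta^N_s)\int f^2(z)\, r(z,\eta^N_s)\, q_\theta(x,dz),\ \eta^N_s(dx)\Big\rangle + \big\langle \mu_\theta(x,\eta^N_s)\, f^2(x),\ \eta^N_s(dx)\big\rangle \Big\}\, ds. \tag{2.4}$$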

The angle bracket process (or, “conditional quadratic variation”) is the unique previsible process making $M^N(f)_t^2 - \langle M^N(f)\rangle_t$ a martingale with respect to the natural filtration. It differs from the usual quadratic variation (usually denoted $[M^N(f)]_t$) because the process has jumps; for the (continuous) limit the two notions will coincide. The use of angle brackets for both integrals and this process is unfortunately standard but should not cause confusion, since the angle bracket process always carries a subscript for time.

The form of (2.3) and (2.4) is explained in Section 4. Note that since (juvenile) individuals are produced at rate $N\gamma\eta$, but each has mass $1/N$, these factors of $N$ cancel in (2.3). Under our scaling, $N$ and $\theta = \theta(N)$ will tend to infinity in such a way that $\alpha := \lim_{N\to\infty}\theta(N)/N$ exists and is finite. From the expression (2.4) it is easy to guess that whether the limiting processes will be deterministic or stochastic is determined by whether $\alpha$ is zero or nonzero.
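The guess can be made slightly more explicit with a heuristic calculation. This is a sketch only, assuming enough regularity that $q_\theta(x,dz)$ concentrates at $x$ and that $\mu_\theta = r\gamma - F/\theta$ for large $\theta$:

```latex
\begin{align*}
\langle M^N(f)\rangle_t
 &= \frac{\theta}{N}\int_0^t \Big\{ \Big\langle \gamma(x,\eta^N_s)\int f^2(z)\, r(z,\eta^N_s)\, q_\theta(x,dz),\ \eta^N_s(dx)\Big\rangle
   + \big\langle \mu_\theta(x,\eta^N_s)\, f^2(x),\ \eta^N_s(dx)\big\rangle \Big\}\, ds \\
 &\approx \frac{\theta}{N}\int_0^t \big\langle 2\,\gamma(x,\eta^N_s)\, r(x,\eta^N_s)\, f^2(x),\ \eta^N_s(dx)\big\rangle\, ds ,
\end{align*}
```

since $\int f^2(z)\, r(z,\eta)\, q_\theta(x,dz) \to f^2(x)\, r(x,\eta)$ and $\mu_\theta \to r\gamma$ as $\theta \to \infty$. The prefactor $\theta/N$ tends to $\alpha$, so the martingale term is expected to vanish in the limit exactly when $\alpha = 0$.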

It is convenient to record some notation for the generator of the diffusion limit of a random walk with jump distribution determined by $q_\theta(x,dy)$.

Definition 2.5 (Dispersal generator)

As above, we define the dispersal kernel, $q_\theta(x,dy)$, to be the density of a multivariate Gaussian with mean $x + b(x)/\theta$ and covariance matrix $C(x)/\theta$ (although often we omit the dependence of $b$ and $C$ on $x$). Furthermore, we define for $f \in C_b^\infty(\mathbb{R}^d)$,

$$\mathcal{B}f(x) = \sum_{ij} C(x)_{ij}\,\partial_{x_i}\partial_{x_j} f(x) + \sum_i b(x)_i\,\partial_{x_i} f(x) \tag{2.5}$$

and denote the adjoint of $\mathcal{B}$ by

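$$\mathcal{B}^* f(x) = \sum_{ij}\partial_{x_i}\partial_{x_j}\big(C(x)_{ij}\, f(x)\big) - \sum_i \partial_{x_i}\big(f(x)\, b(x)_i\big) = \sum_{ij} C_{ij}(x)\,\partial_{x_i}\partial_{x_j} f(x) + \sum_i\Big(2\sum_j \partial_{x_j} C_{ij}(x) - b_i(x)\Big)\partial_{x_i} f(x) + \Big(\sum_{ij}\partial_{x_i}\partial_{x_j} C_{ij}(x) - \sum_i \partial_{x_i} b_i(x)\Big) f(x),$$

where the factor of 2 in the first-order term comes from expanding the product with $C$ symmetric.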

Remark 2.6

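$\mathcal{B}$ is defined so that

$$\theta\int \big(f(y) - f(x)\big)\, q_\theta(x,dy) \to \mathcal{B}f(x) \quad \text{as } \theta\to\infty.$$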

Remark 2.7

An equivalent way to describe the model would be to say that when the state of the population is η, an individual at x gives birth at rate

$$\theta\,\gamma(x,\eta)\int r(y,\eta)\, q_\theta(x,dy),$$

and that offspring disperse according to the kernel

$$q^m_\theta(x,\eta,dy) := \frac{r(y,\eta)\, q_\theta(x,dy)}{\int r(z,\eta)\, q_\theta(x,dz)}.$$

Clearly, the random walk driven by this dispersal kernel is biased towards regions of higher establishment probability. For comparison with future results, it is interesting to write down the limiting generator:

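$$\lim_{\theta\to\infty}\theta\int\big(f(y)-f(x)\big)\, q^m_\theta(x,\eta,dy) = \frac{\mathcal{B}\big[f(\cdot)\, r(\cdot,\eta)\big](x) - f(x)\,\mathcal{B}\big[r(\cdot,\eta)\big](x)}{r(x,\eta)}. \tag{2.6}$$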

In the simplest case of unbiased isotropic dispersal (i.e., $b=0$ and $C=I$), $\mathcal{B}=\Delta$, and so (2.6) is equal to

$$\Delta f(x) + 2\,\nabla f(x)\cdot\nabla\log r\big(\cdot, \rho_r*\eta(\cdot)\big)(x).$$

One might guess that the spatial motion described by following the ancestral lineage of an individual back through time would be described (in the limit) by the adjoint of this generator. However, we will see in Section 2.2 that this is not in fact the case.
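For completeness, the reduction of (2.6) in the isotropic case of Remark 2.7 follows from the product rule. Writing $r$ for $r(\cdot,\eta)$ and assuming $r > 0$ is twice differentiable:

```latex
\begin{align*}
\frac{\Delta(f r) - f\,\Delta r}{r}
 &= \frac{r\,\Delta f + 2\,\nabla f\cdot\nabla r + f\,\Delta r - f\,\Delta r}{r} \\
 &= \Delta f + 2\,\nabla f\cdot\frac{\nabla r}{r}
  = \Delta f + 2\,\nabla f\cdot\nabla \log r .
\end{align*}
```

The first-order term is a drift in the direction of $\nabla \log r$, which is the sense in which the walk is biased towards regions of higher establishment probability.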

In order to pass to a scaling limit, we will need to impose some conditions on the parameters of our model.

Assumptions 2.8

We shall make the following assumptions on the parameters of our model.

Dispersal generator:

We assume that

  • 1

    $b(x)$ and $C(x)$ are $\alpha$-Hölder continuous for some $\alpha\in(0,1]$ and uniformly bounded in each component, and

  • 2

    the operator $\mathcal{B}$ is uniformly strictly elliptic, i.e., $\inf_x \inf_{y:\|y\|=1}\sum_{ij} y_i\, C(x)_{ij}\, y_j > 0$.

Reproduction parameters:

We assume that

  • 3

    The function F(x,m) satisfies

    1. F(x,m) is locally Lipschitz in m;

    2. F(x,m) is uniformly bounded above (but not necessarily below);

    3. for each fixed $m$, $\sup_{x\in\mathbb{R}^d}\sup_{k\le m}|F(x,k)| < \infty$;

  • 4

    The functions r(x,m), γ(x,m) have bounded first and second derivatives in both arguments;

  • 5

    γ(x,m) is uniformly bounded;

  • 6
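    For each $f \in C_b^2(\mathbb{R}^d)$, there is a $C_f$ such that
    $$\gamma(x,\eta)\,\theta\int\big(r(y,\eta)f(y) - r(x,\eta)f(x)\big)\,q_\theta(x,dy) \le C_f\,(1+|f(x)|)$$
    for all $x\in\mathbb{R}^d$ and $\eta\in\mathcal{M}_F(\mathbb{R}^d)$. Furthermore, $C_f$ only depends on the norm of the first two derivatives of $f$, i.e.,
    $$C_f = C\Big(\sup_x\sup_{\|z\|=1}\max\Big(\sum_i z_i\,\partial_{x_i}f(x),\ \sum_{ij} z_i z_j\,\partial_{x_i}\partial_{x_j} f(x)\Big)\Big).$$
    To keep expressions manageable, we shall also assume that
    $$\mu_\theta(x) = r(x,\eta)\,\gamma(x,\eta) - \frac{1}{\theta}\,F(x,\eta),$$
    that is, this expression is non-negative so that there is no need to take the maximum with zero in (2.2). (This is anyway true for sufficiently large $\theta$.)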

Since we take bounded f, for most situations the bound Cf(1+|f(x)|) in Condition 6 above can be safely replaced simply by Cf; however, this will be useful in certain situations where we consider a sequence of f with increasing upper bounds. We now give two concrete situations in which Condition 6 is satisfied. The proof is in Section 6.1.

Lemma 2.9

Assume that Assumptions 2.8 are satisfied, except for Condition 6. If either

  1. $\partial_{x_i} r(x,\eta)$ and $\partial_{x_i}\partial_{x_j} r(x,\eta)$ are uniformly bounded for $x\in\mathbb{R}^d$, $\eta\in\mathcal{M}_F(\mathbb{R}^d)$;

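  2. or, $m^2\gamma(x,m)$ is uniformly bounded and there exists $C<\infty$ such that for $\theta$ sufficiently large, and all $x\in\mathbb{R}^d$, $\eta\in\mathcal{M}_F(\mathbb{R}^d)$,
    $$\theta\int\big(\rho_r*\eta(y) - \rho_r*\eta(x)\big)\,q_\theta(x,dy) \le C\,\rho_\gamma*\eta(x),$$
    and
    $$\theta\int\big(\rho_r*\eta(y) - \rho_r*\eta(x)\big)^2\,q_\theta(x,dy) \le C\,\big(\rho_\gamma*\eta(x)\big)^2,$$
    then Condition 6 is also satisfied.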

The purpose of the conditions that we have placed on the reproduction parameters is to ensure that the net per capita reproduction rate (before time scaling) is of order $1/\theta$. As remarked above, because of the non-local reproduction mechanism, it no longer suffices to assume that $r(x,\eta)\gamma(x,\eta) - \mu_\theta(x)$ is of order $1/\theta$. Perhaps the simplest example in which we can see this is the case where $\gamma\equiv 1$ and $F\equiv 0$, so that $\mu_\theta = r$, and $\eta = \delta_x$ (i.e., the population has all individuals at a single location), so that $\rho_r*\eta(y) = \rho_r(y-x)$. In this case, the mean rate of change of the total population size is $\int\big(r(y,\rho_r(y-x)) - r(x,\rho_r(0))\big)\,q_\theta(x,dy)$; the first condition of Lemma 2.9 would ensure this is of order $1/\theta$.

If $r(x,m)$ is independent of $m$, then the conditions are easy to satisfy; they just require some regularity of $r$ as a function of $x$. Condition 1 of Lemma 2.9 is also satisfied if for example $\|\nabla\rho_r\| \le C\rho_r$ and $m\,\partial_m r(x,m)$, $m^2\,\partial_{mm} r(x,m)$ are bounded. This is the case, for instance, if $\rho_r$ decays exponentially. On the other hand, it might seem more natural to take $\rho_r$ to be a Gaussian density with parameter $\sigma_r$, say. Then, as we check in Lemma B.1, Condition 2 of Lemma 2.9 is satisfied if $\rho_\gamma$ is also Gaussian with parameter $\sigma_\gamma$ and $\sigma_\gamma > \sigma_r$. For large enough $\theta$, this condition guarantees that $\sigma_r + 1/\theta < \sigma_\gamma$, so that the establishment probability of a juvenile is controlled by individuals that are already ‘felt’ by the fecundity-regulating kernel $\rho_\gamma$ at the location of their parent.

2.1. Scaling limits of the population process

Our main results depend on two dichotomies: Is the limiting process deterministic or a (generalized) superprocess? And, are interactions pointwise in the limit or nonlocal? See Figure 1 for snapshots of the population from direct simulation of the process using SLiM [Haller and Messer, 2019] illustrating this first dichotomy. Below we have results for deterministic limits with pointwise and nonlocal interactions, and for superprocess limits with nonlocal interactions.

Figure 1:


Snapshots of two simulations, with small $\alpha=\theta/N$ (left) and large $\alpha=\theta/N$ (right). Simulations are run with a Fisher-KPP-like parameterization: birth and establishment are constant, while death increases linearly with density, at slope $1/\theta$. Left: $\alpha=0.1$. Right: $\alpha=10$. Other parameters were the same: dispersal and interaction distances were set to 1, and the equilibrium density is 10 individuals per unit area.

Scaling limits with nonlocal interactions:

Recall that the process $(\eta^N_t)_{t\ge0}$ takes its values in the space $\mathcal{D}_{[0,\infty)}(\mathcal{M}_F(\mathbb{R}^d))$ of càdlàg paths on $\mathcal{M}_F(\mathbb{R}^d)$. We endow $\mathcal{M}_F(\mathbb{R}^d)$ with the topology of weak convergence and $\mathcal{D}_{[0,\infty)}(\mathcal{M}_F(\mathbb{R}^d))$ with the Skorohod topology. A sequence of processes taking values in $\mathcal{D}_{[0,\infty)}(\mathcal{M}_F(\mathbb{R}^d))$ is said to be tight if the corresponding sequence of distributions is tight, i.e., if any infinite subsequence has a weakly convergent subsubsequence. Our first main result establishes tightness of our rescaled population processes in the case in which interactions remain nonlocal under the scaling, and characterises limit points as solutions to a martingale problem.

Theorem 2.10

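Let $(\eta^N_t)_{t\ge0}$ be as defined in Definition 2.4 and assume that as $N\to\infty$, $\theta(N)\to\infty$ in such a way that $\theta(N)/N\to\alpha$. (However, the kernels $\rho_r$, $\rho_\gamma$, and $\rho_F$ remain fixed.) Suppose that Assumptions 2.8 hold and, further, that $(\eta^N_0)_{N\ge1}$ is a sequence of purely atomic measures, with $\eta^N_0$ comprised of atoms of mass $1/N$, which is tight in $\mathcal{M}_F(\mathbb{R}^d)$. Also assume there exists a nonnegative $f_0\in C(\mathbb{R}^d)$ with uniformly bounded first and second derivatives (i.e., with $\sup_x\sup_{\|z\|=1}\left|\sum_i\partial_{x_i}f_0(x)z_i\right|$ and $\sup_x\sup_{\|z\|=1}\left|\sum_{ij}\partial_{x_ix_j}f_0(x)z_iz_j\right|$ both finite) and $f_0(x)\to\infty$ as $|x|\to\infty$, for which $\langle f_0(x),\eta^N_0(dx)\rangle<C<\infty$ for some $C$ independent of $N$. Then the sequence of processes $(\eta^N_t)_{t\ge0}$ is tight, and for any limit point $(\eta_t)_{t\ge0}$, for every $f\in C_b^\infty(\mathbb{R}^d)$,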

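$$M_t(f):=\langle f(x),\eta_t(dx)\rangle-\langle f(x),\eta_0(dx)\rangle-\int_0^t\Big\langle\gamma(x,\eta_s)\,\mathcal{B}\big(f(\cdot)r(\cdot,\eta_s)\big)(x)+f(x)F(x,\eta_s),\,\eta_s(dx)\Big\rangle\,ds\tag{2.7}$$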

is a martingale (with respect to the natural filtration), with angle bracket process

$$\langle M(f)\rangle_t=\alpha\int_0^t\big\langle2\gamma(x,\eta_s)r(x,\eta_s)f^2(x),\,\eta_s(dx)\big\rangle\,ds.\tag{2.8}$$

If $\alpha=0$ the limit is deterministic.

Recall when interpreting (2.7) that, for instance, $r(x,\eta_s)=r(x,\rho_r*\eta_s(x))$, and so $\mathcal{B}(fr)(x)=\mathcal{B}\big(f(\cdot)r(\cdot,\rho_r*\eta_s(\cdot))\big)(x)$. The proof of this theorem appears in Section 6.2.

Theorem 2.10 provides tightness of the rescaled processes. If the limit points are unique, then this is enough to guarantee convergence.

Corollary 2.11

Under the assumptions of Theorem 2.10, if the martingale problem defined by equations (2.7) and (2.8) has a unique solution, then $(\eta^N_t)_{t\ge0}$ converges weakly to that solution as $N\to\infty$.

When $\alpha>0$, the limit points can be thought of as interacting superprocesses. For example, when $r$ and $\gamma$ are constant, and $F(x,\eta_s)=1-\rho_F*\eta_s(x)$, we recover a superprocess with nonlinear death rates corresponding to logistic growth [Etheridge, 2004] that is a continuous limit of the Bolker-Pacala model [Bolker and Pacala, 1997, 1999]. We are not aware of a general result to determine when we will have uniqueness of solutions to the martingale problem of Theorem 2.10 when $\alpha>0$. However, the Dawson–Girsanov transform tells us that we have uniqueness in this special case of the superprocess with nonlinear death rates, and Perkins stochastic calculus (and its adaptation to a lookdown setting) provides uniqueness for cases with interactions in the dispersal mechanism of the superprocess. We refer to Dawson [1993], Perkins [1992], and Donnelly and Kurtz [1999] for approaches to showing that these sorts of martingale problems are well-posed.

For the deterministic case of α=0, the limiting process is a weak solution to a nonlocal PDE. We next describe some situations in which more is known about uniqueness and whether the solution is close to the corresponding local PDE. First, recall the following notion of solution to a PDE.

Definition 2.12 (Weak solutions)

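We say that $(\eta_t)_{t\ge0}$, with $\eta_t\in\mathcal{M}_F(\mathbb{R}^d)$, is a weak solution to the PDE

$$\partial_t\varphi=r\,\mathcal{B}^*(\gamma\varphi)+\varphi F\tag{2.9}$$

(where $r$, $\gamma$ and $F$ can all be functions of $\varphi$) if, for all $f\in C_b^\infty(\mathbb{R}^d)$,

$$\frac{d}{dt}\langle f,\eta_t\rangle=\big\langle\gamma\,\mathcal{B}(rf)+fF,\,\eta_t\big\rangle.$$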

The notation φ is meant to be suggestive of a density, and recall that equation (2.9) has made dependencies on x and φ implicit; written out more explicitly, (2.9) is

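$$\partial_t\varphi_t(x)=r\big(x,\rho_r*\varphi_t(x)\big)\,\mathcal{B}^*\big[\varphi_t(\cdot)\,\gamma(\cdot,\rho_\gamma*\varphi_t(\cdot))\big](x)+\varphi_t(x)\,F\big(x,\rho_F*\varphi_t(x)\big).$$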

Because Theorem 2.10 only tells us about weak convergence, in the case α=0 we can only deduce that any limit point ηt is a weak solution to this nonlocal PDE.

Specialising the results of Kurtz and Xiong [1999] to the deterministic setting provides general conditions under which we have existence and uniqueness of solutions to (2.9) which have an $L^2$-density with respect to Lebesgue measure. Recall that the Wasserstein metric, defined by

$$\rho(\nu_1,\nu_2)=\sup\left\{\left|\int f\,d\nu_1-\int f\,d\nu_2\right|:\ \sup_x|f(x)|\le1,\ |f(x)-f(y)|\le\|x-y\|\right\},$$

determines the topology of weak convergence on $\mathcal{M}_F(\mathbb{R}^d)$. We write $r(x,\eta)\gamma(x,\eta)C(x)=J(x,\eta)J(x,\eta)^T$, and $\beta(x,\eta)=r(x,\eta)\gamma(x,\eta)\big(b(x)+2C(x)\nabla\log r(x,\eta)\big)$ (quantities that will appear in Proposition 5.6). If $J$, $\beta$, and $F$ are bounded and Lipschitz in the sense that

$$\|J(x_1,\nu_1)-J(x_2,\nu_2)\|,\ \|\beta(x_1,\nu_1)-\beta(x_2,\nu_2)\|,\ |F(x_1,\nu_1)-F(x_2,\nu_2)|\le C\big(\|x_1-x_2\|+\rho(\nu_1,\nu_2)\big)\tag{2.10}$$

for some $C>0$, the methods of Kurtz and Xiong [1999] show that if the initial condition $\eta_0$ for our population process has an $L^2$ density, then so does $\eta_t$ for $t>0$. Although the necessary estimates (for which we refer to the original paper) are highly nontrivial, the idea of the proof is simple. Take a solution to the equation and use it to calculate the coefficients $r$, $\gamma$ and $F$ that depend on local population density. Then $\eta$ solves the linear equation obtained by regarding those values of $r$, $\gamma$ and $F$ as given. It remains to prove that the solution to the linear equation has a density, which is achieved by obtaining $L^2$ bounds on its convolution with the heat semigroup at time $\delta$ and letting $\delta\to0$. We also have the following uniqueness result.

Theorem 2.13 (Special case of Kurtz and Xiong [1999], Theorem 3.5)

Suppose $J$, $\beta$, and $F$ are bounded and Lipschitz in the sense of (2.10). If $\eta_0$ has an $L^2(\mathbb{R}^d)$-density, then there exists a unique $L^2(\mathbb{R}^d)$-valued solution of (2.9) in the sense of Definition 2.12.

Remark 2.14

Kurtz and Xiong [1999] considers an infinite system of stochastic differential equations for the locations and weights of a collection of particles that interact through their weighted empirical measure, which is shown to be the unique solution to a stochastic PDE. As we shall see through our lookdown representation in Section 5, the solution to our deterministic equation can be seen as the empirical measure of a countable number of particles (all with the same weight) which, in the notation above, evolve according to

$$X(t)=X(0)+\int_0^t\beta\big(X(s),\eta_s\big)\,ds+\int_0^tJ\big(X(s),\eta_s\big)\,dW(s)$$

(with an independent Brownian motion W for each particle).

Two-step convergence to PDE:

Although the coefficients at x in (2.9) are nonlocal, we can choose our kernels ργ,ρr, and ρF in such a way that they depend only on the population in a region close to x, and so we expect that under rather general conditions solutions of the nonlocal PDE will be close to the corresponding classical PDE. The following propositions provide two concrete situations in which this is true. In the first, the PDE is a reaction-diffusion equation, and in the proof in Section 6.3.1 we borrow an idea from Penington [2017] to express the solutions to both the nonlocal equation and the classical PDE through a Feynman-Kac formula.

Proposition 2.15

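Let $\rho_F^\epsilon(x)=\rho_F(x/\epsilon)/\epsilon^d$. Assume $\varphi_0\in L^2(\mathbb{R}^d)$ is a positive, uniformly Lipschitz, and uniformly bounded function. Suppose that $\varphi^\epsilon\in L^2(\mathbb{R}^d)$ is a weak solution to the equation

$$\partial_t\varphi^\epsilon=\mathcal{B}^*\varphi^\epsilon+\varphi^\epsilon F\big(\rho_F^\epsilon*\varphi^\epsilon\big),\qquad x\in\mathbb{R}^d,\ t>0,\tag{2.11}$$

with initial condition $\varphi_0(\cdot)$, and that $\varphi$ is a weak solution to the equation

$$\partial_t\varphi=\mathcal{B}^*\varphi+\varphi F(\varphi),\qquad x\in\mathbb{R}^d,\ t>0,\tag{2.12}$$

also with initial condition $\varphi_0(\cdot)$. Suppose further that $F$ is a Lipschitz function which is bounded above, and that $b(x)$ and $C(x)$, the drift and covariance matrix of $\mathcal{B}$, satisfy the conditions of Assumptions 2.8. Then, for all $T>0$ there exists a constant $K=K(T,\varphi_0)<\infty$ and a function $\delta(\epsilon)$ (dependent on $\rho_F$) with $\delta(\epsilon)\to0$ as $\epsilon\to0$, such that, for all $0\le t\le T$, and $\epsilon$ small enough,

$$\big\|\varphi_t(\cdot)-\varphi^\epsilon_t(\cdot)\big\|_\infty\le K\delta(\epsilon).$$

In particular, as $\epsilon\to0$, we have that $\varphi^\epsilon$ converges uniformly in compact intervals of time to $\varphi$.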

Remark 2.16

Note that Theorem 2.13 guarantees uniqueness of solutions to equation (2.11).

Our second example in which we know solutions to the nonlocal PDE converge to solutions of the local PDE as interaction distances go to zero is a nonlocal version of a porous medium equation with logistic growth. That is, we consider non-negative solutions to the equation

$$\partial_t\psi^\epsilon=\Delta\big(\psi^\epsilon\,\rho_\gamma^\epsilon*\psi^\epsilon\big)+\psi^\epsilon\big(1-\rho_\gamma^\epsilon*\psi^\epsilon\big).\tag{2.13}$$

The case without the reaction term (and with $\mathbb{R}^d$ replaced by a torus) is considered by Lions and Mas-Gallic [2001], who use it as a basis for a particle method for numerical solution of the porous medium equation. Of course this does not quite fit into our framework, since in the notation of our population models this would necessitate $\gamma(x,m)=\rho^\epsilon*m$, which is not bounded. However, this can be overcome by an additional layer of approximation (cf. our numerical experiments of Section 3.1) and we do not allow this to detain us here. Existence and uniqueness of solutions to (2.13) can be obtained using the approach of Lions and Mas-Gallic [2001], so we should like to prove that as $\epsilon\to0$ we have convergence to the solution to the porous medium equation with logistic growth:

$$\partial_t\psi=\Delta(\psi^2)+\psi(1-\psi).\tag{2.14}$$

Notation 2.17

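We use $\rightharpoonup$ to denote weak convergence in the sense of analysts; that is, $\psi^\epsilon\rightharpoonup\psi$ in $L^1$ means $\int\psi^\epsilon v\,dx\to\int\psi v\,dx$ for all $v\in L^\infty$.

We write $L^2_t(H^1)$ for functions for which the $H^1$ norm in space is in $L^2$ with respect to time, i.e.

$$\int_0^T\!\!\int\big\{\psi_t(x)^2+\|\nabla\psi_t(x)\|^2\big\}\,dx\,dt<\infty,$$

and $C_t(L^1)$ will denote functions for which the $L^1$ norm in space is continuous in time.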

Proposition 2.18

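Suppose that we can write $\rho_\gamma=\zeta*\check\zeta$, where $\check\zeta(x)=\zeta(-x)$ and $\zeta\in\mathcal{S}(\mathbb{R}^d)$ (the Schwartz space of rapidly decreasing functions). Furthermore, suppose that $\psi_0^\epsilon\ge0$ is such that there exist $\lambda\in(0,1)$ and $C\in(0,\infty)$ (independent of $\epsilon$) such that

$$\int\exp(\lambda\|x\|)\,\psi_0^\epsilon(x)\,dx<C,\qquad\text{and}\qquad\sup_\epsilon\int\psi_0^\epsilon\log\psi_0^\epsilon\,dx<\infty,$$

with $\psi_0^\epsilon\rightharpoonup\psi_0$ as $\epsilon\to0$. Then, writing $\psi^\epsilon$ for the solution to (2.13) on $[0,T]\times\mathbb{R}^d$ with initial condition $\psi_0^\epsilon$, $\psi^\epsilon\rightharpoonup\psi$ as $\epsilon\to0$, where $\psi\in L^2_t(H^1)\cap C_t(L^1)$, $\int\psi|\log\psi|\,dx<\infty$, and $\psi$ solves (2.14) on $[0,T]\times\mathbb{R}^d$.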

The example that we have in mind for the kernel ργ is a Gaussian kernel. For the proof, see Section 6.3.2.

Remark 2.19

Although it seems hard to formulate an all-encompassing result, Propositions 2.15 and 2.18 are by no means exhaustive. When the scaling limit is deterministic, one can expect analogous results under rather general conditions. However, when the limit points are stochastic, they resemble “nonlinear superprocesses” and so one cannot expect a density with respect to Lebesgue measure in $d\ge2$. It is then not reasonable to expect to be able to make sense of the limit if we scale the kernels in this way. Moreover, in one dimension, where the classical superprocess does have a density with respect to Lebesgue measure, the form of (2.7) suggests that even if one can remove the local averaging from $\gamma$, it will be necessary to retain averaging of $r$ in order to obtain a well-defined limit.

One-step convergence to PDE:

Theorem 2.10, combined with Proposition 2.15 or 2.18, implies that we can take the limit $N\to\infty$ followed by the limit $\epsilon\to0$ to obtain solutions to the PDE (2.12). However, it is of substantial interest to know whether we can take those two limits simultaneously. The general case seems difficult, but we prove such “diagonal” convergence in the following situation. The proof is provided in Section 7.

Theorem 2.20 (Convergence to a PDE)

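Let $(\eta^N_t)_{t\ge0}$ be as defined in Definition 2.4 with $r(x,m)\equiv1\equiv\gamma(x,m)$, $F(x,m)\equiv F(m)$, $\rho_F^\epsilon$ a symmetric Gaussian density with variance parameter $\epsilon^2$, and $\mathcal{B}=\Delta/2$. Further suppose that $F(m)$ is a polynomial with $F(m)\mathbf{1}_{m\ge0}$ bounded above. Assume that $\langle1,\eta^N_0\rangle$ is uniformly bounded, and that for all $x\in\mathbb{R}^d$ and $k\in\mathbb{N}$,

$$\limsup_{\epsilon\to0}\mathbb{E}\big[\rho_F^\epsilon*\eta_0(x)^k\big]<\infty,$$

and

$$\limsup_{\epsilon\to0}\int\mathbb{E}\big[\rho_F^\epsilon*\eta_0(x)^k\big]\,dx<\infty.$$

Finally assume that $N\to\infty$, $\theta\to\infty$ and $\epsilon\to0$ in such a way that

$$\frac{1}{\theta\epsilon^2}+\frac{\theta}{N\epsilon^d}\to0.\tag{2.15}$$

Then the sequence of $\mathcal{D}_{[0,\infty)}(\mathcal{M}_F(\mathbb{R}^d))$-valued stochastic processes $\big(\rho_F^\epsilon*\eta^N_t(x)\,dx\big)_{t\ge0}$ converges weakly to a measure-valued process with a density $\varphi(t,x)$ that solves

$$\partial_t\varphi(t,x)=\tfrac12\Delta\varphi(t,x)+\varphi(t,x)F\big(\varphi(t,x)\big).\tag{2.16}$$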
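
To get a feel for condition (2.15), the following minimal sketch assumes a power-law parameterisation $\theta=N^a$ and $\epsilon=N^{-b}$. That parameterisation, and the function names, are illustrative assumptions and are not part of the theorem. Under it, the two terms of (2.15) are $N^{2b-a}$ and $N^{a+bd-1}$, so both vanish exactly when $2b<a$ and $a+bd<1$.

```python
def scaling_terms(N, a, b, d):
    """Two terms of (2.15) under the assumed choice theta = N**a, eps = N**(-b)."""
    theta = N ** a
    eps = N ** (-b)
    return 1.0 / (theta * eps ** 2), theta / (N * eps ** d)


def satisfies_condition(a, b, d):
    """Exponent check: N**(2b - a) -> 0 and N**(a + b*d - 1) -> 0 as N grows."""
    return 2 * b < a and a + b * d < 1


if __name__ == "__main__":
    # Both terms shrink as N grows for this (hypothetical) choice of exponents.
    for N in (1e3, 1e6, 1e9):
        print(N, scaling_terms(N, a=0.5, b=0.1, d=2))
```

Roughly speaking, the first term limits how quickly the kernel may shrink relative to the timescale, and the second limits how quickly it may shrink relative to the local number of individuals.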

Remark 2.21

In fact, our proof goes through without significant change under the conditions that $F(m)\mathbf{1}_{m\ge0}$ is bounded above (but not necessarily below), and that for all $m,n\in[0,\infty)$

$$|F(m)|\le\sum_{j=1}^ka_jm^j,\qquad\text{and}\qquad|F(n)-F(m)|\le|n-m|\sum_{j=1}^kb_j\big(n^j+m^j\big),$$

for some non-negative constants $(a_j)_{j=0}^k$, $(b_j)_{j=0}^k$. We take $F$ to be polynomial to somewhat simplify notation in the proof.

2.2. Ancestral lineages in the scaling limit

Now that we have established what we can say about how population density changes with time, we turn to results on ancestral lineages, i.e., how genealogical ancestry can be traced back across the landscape. Informally, a lineage $(L^N_t)_{t\ge0}$, begun at a spatial location $L^N_0=x$ where there is a focal individual in the present day, can be obtained by, for each time $t$, setting $L^N_t$ to be the spatial location of the individual alive at time $t$ before the present from whom the focal individual is descended. Since in our model individuals have only one parent, this is unambiguous. Although we did not explicitly retain such information, it is clear that for finite $N$, since individuals are born one at a time, one could construct the lineage $(L^N_t)_{t=0}^T$ given the history of the population $(\eta^N_t)_{t=0}^T$, for each starting location to which $\eta^N_T$ assigns positive mass. It is less clear, however, how to formally retain such information when we pass to the scaling limit.

The lookdown construction in Section 5 will enable us to recover information about ancestry in the infinite population limit. Roughly speaking, each particle is assigned a unique “level” from $[0,\infty)$ that functions as a label and thus allows reconstruction of lineages. The key to the approach is that levels are assigned in such a way as to be exchangeable, so that sampling a finite number, $k$ say, of individuals from a given region is equivalent to looking at the individuals in that region with the $k$ lowest levels. Moreover, as we pass to the infinite population limit, the collection of (individual, level) pairs converges, as we show in Theorem 5.4. See Etheridge and Kurtz [2019] for an introduction to these ideas. In particular, even in the infinite population limit, we can sample an individual from a region (it will be the individual in that region with the lowest level) and trace its line of descent. This will allow us to calculate, for each $x$ and $y\in\mathbb{R}^d$, the proportion of the population at location $x$ in the present day population that are descended from a parent who was at location $y$ at time $t$ in the past. To make sense of this in our framework, in Section 8.2, we justify a weak reformulation of this idea.

We are interested in two questions. First, when is the motion of an ancestral lineage, given complete knowledge of the population process, a well-defined process? In other words, is knowledge of the process $(\eta_t)_{t=0}^T$ that records numbers of individuals but not their ancestry sufficient to define the distribution of $(L_t)_{t=0}^T$? Second, does the process have a tractable description?

We focus on the simplest situation, that in which the population process is deterministic. However, the results here apply when the population process solves either a nonlocal or a classical PDE. There will be no coalescence of ancestral lineages in the deterministic limit, but understanding motion of single lineages is useful in practice, and our results can be seen as a first step towards understanding genealogies for high population densities.

Proofs of results in this section are found in Section 8.

Definition 2.22 (Ancestral lineage)

Let $(\varphi_t(x))_{0\le t\le T}$ denote the density of the scaling limit of our population model, solving (2.9), and let $y$ be a point with $\varphi_T(y)>0$. We define $(L_s)_{s=0}^T$, the ancestral lineage of an individual sampled from the population at $y$ at time $T$, by setting $L_0=y$ and $L_s$ to be the position of the unique ancestor of that individual at time $T-s$. We define $(Q_s)_{s\ge0}$ to be the time inhomogeneous semigroup satisfying

$$Q_sf(y):=\mathbb{E}_y\big[f(L_s)\big].$$

Our next result identifies the ancestral lineage as a diffusion by characterizing its generator.

Theorem 2.23

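For $\varphi:\mathbb{R}^d\to\mathbb{R}$, define

$$\mathcal{L}_\varphi f=\frac{r}{\varphi}\Big[\mathcal{B}^*(\gamma\varphi f)-f\,\mathcal{B}^*(\gamma\varphi)\Big]\tag{2.17}$$
$$\phantom{\mathcal{L}_\varphi f}=r\gamma\Big[\sum_{ij}C_{ij}\,\partial_{x_ix_j}f+\sum_jm_j\,\partial_{x_j}f\Big],\tag{2.18}$$

where $m$ is the vector

$$m_j=2\sum_iC_{ij}\,\partial_{x_i}\log(\gamma\varphi)+2\sum_i\partial_{x_i}C_{ij}-b_j.$$

Then the generator of the semigroup $Q_s$ of Definition 2.22 is given by $\partial_sQ_sf(y)=\mathcal{L}_{\varphi_{T-s}}Q_sf(y)$.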

Remark 2.24

As usual, to make the generator readable, we’ve written it in concise notation, omitting the dependencies on location and population density, which itself changes with time. When interpreting this, remember that everything depends on location and density at that location and time; for instance, “$r$” is actually $r(x,\varphi(x))$ (in the classical case), or $r(x,\rho_r*\eta(x))$ (in the nonlocal case).

Moreover, we haven’t proved any regularity of the population density process $\varphi$, so, as written, the generator (2.17) may not make sense. Instead, it should be interpreted in a weak sense which is made precise in Section 8.2.

Corollary 2.25

In addition to the assumptions of Theorem 2.23, if the covariance of the dispersal process is isotropic (i.e., $C=\sigma^2I$), then

$$\mathcal{L}_\varphi f=r\gamma\Big(\sigma^2\Delta f+\big(2\sigma^2\nabla\log(\gamma\varphi)-b\big)\cdot\nabla f\Big).\tag{2.19}$$

(However, $b$ can still depend on location.)

In other words, the lineage behaves as a diffusion driven by Brownian motion run at speed $\sigma^2$ multiplied by the local per-capita production of mature offspring ($r\gamma$) in a potential tilted by migration bias ($(\varphi_s\gamma)^2\exp(-b\cdot x/\sigma^2)$, whose gradient appears in the drift term of the generator). In particular, lineages are drawn to regions of high fecundity (production of juveniles), but their speed is determined by the rate of production of mature offspring. This can be compared to Remark 2.7.

Corollary 2.26

In addition to the assumptions of Corollary 2.25, if the population process is stationary (so $\varphi_t\equiv\varphi$), and $b(x)=\nabla h(x)$ for some function $h$, then $Y$ is reversible with respect to

$$\pi(x)=\frac{\gamma}{r}\,\varphi(x)^2\,e^{-h(x)/\sigma^2}.\tag{2.20}$$

Long-term fitness of an individual is proportional to the fraction of lineages from the distant future that pass through the individual, and hence the total long-term fitness at a location is proportional to the stationary distribution of Y there, if it exists. Therefore, if π is integrable then the per-capita long-term fitness of an individual at x is proportional to π(x)/φ(x).

Corollary 2.27

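In addition to the assumptions of Corollary 2.25, suppose that the population process is described by a travelling wave with velocity $c$, i.e., the population has density $\varphi(t,x)=w(x-tc)$ where $w$ solves

$$r\,\mathcal{B}^*(\gamma w)+wF+c\cdot\nabla w=r\big(\sigma^2\Delta(\gamma w)-b\cdot\nabla(\gamma w)\big)+wF+c\cdot\nabla w=0.$$

Then the semigroup $Q_s$ of the motion of a lineage in the frame that is moving at speed $c$ is time-homogeneous with generator

$$\mathcal{L}f=\sigma^2r\gamma\big(\Delta f+2\nabla\log(\gamma w)\cdot\nabla f\big)+(c-r\gamma b)\cdot\nabla f.\tag{2.21}$$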

3. Examples and applications

We now discuss some consequences of these results.

3.1. Beyond linear diffusion

Equation (2.9) is a nonlocal version of a reaction-diffusion equation; the diffusion is nonlinear if γ depends on population density: in other words, if the diffusivity of the population depends on the population density. Passing to the classical limit, we recover equations like (2.14). Such equations are widely used in a number of contexts in biology in which motility within a population varies with population density. For example, density dependent dispersal is a common feature in spatial models in ecology, eukaryotic cell biology, and avascular tumour growth; see Sherratt [2010] and references therein for further discussion. In particular, it has been suggested as a model for the expansion of a certain type of bacteria on a thin layer of agar in a Petri dish [Cohen et al., 1999]. We shall pay particular attention to the case in which the equation can be thought of as modelling the density of an expanding population. We focus on the monostable reaction of (2.14).

Comparing with (2.9), we see that to set up a limit in which the population density φ follows the porous medium equation with logistic growth of (2.14), we need r=1,γ=φ, and F=1-φ. Consulting equation (2.2), this implies that μθ=max(0,(1+1/θ)φ-1/θ). In other words, establishment is certain and birth rates increase linearly with population density, but to compensate, death rates increase slightly faster (also linearly). Alert readers will notice that the condition from Assumptions 2.8 that γ(x,m) be uniformly bounded is violated. This can be corrected by use of a cut-off, and in fact the downwards drift provided by the logistic control of the population size prevents m from getting too big. In practice the simulations shown in Figure 2 take discrete time steps of length dt (with dt suitably small), and have each individual reproduce and die with probabilities, respectively,

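$$p_{\mathrm{birth}}(m)=1-e^{-m\,dt},\qquad p_{\mathrm{death}}(m)=1-e^{-(m(1+1/\theta)-1/\theta)\,dt},$$

where $m$ is the local density at their location. This makes $\gamma(x,m)=p_{\mathrm{birth}}(m)/dt\approx m$ and

$$F(x,m)=\theta\big(\gamma(x,m)-\mu(x,m)\big)/dt\approx1-m.$$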

Birth and death rates are equal at density m=1, corresponding to an unscaled density of N individuals per unit area.
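
As a numerical check on these approximations, the sketch below evaluates the per-step probabilities and the effective rates they imply. It reads the effective growth term as $\theta(p_{\mathrm{birth}}-p_{\mathrm{death}})/dt$, and it clips the death rate at zero (mirroring the $\max(0,\cdot)$ in $\mu_\theta$); both are assumptions of this sketch, as are the function names, and this is not the simulation code used for Figure 2.

```python
import math


def p_birth(m, dt):
    """Per-step birth probability at local density m."""
    return 1.0 - math.exp(-m * dt)


def p_death(m, theta, dt):
    """Per-step death probability; the rate is clipped at 0 (an assumption)."""
    rate = max(0.0, m * (1.0 + 1.0 / theta) - 1.0 / theta)
    return 1.0 - math.exp(-rate * dt)


def effective_gamma(m, dt):
    """Effective per-capita birth rate, approximately m for small dt."""
    return p_birth(m, dt) / dt


def effective_F(m, theta, dt):
    """Effective scaled net growth, approximately 1 - m for small dt."""
    return theta * (p_birth(m, dt) - p_death(m, theta, dt)) / dt


if __name__ == "__main__":
    for m in (0.25, 0.5, 1.0, 1.5):
        print(m, effective_gamma(m, 1e-3), effective_F(m, theta=10.0, dt=1e-3))
```

For small `dt` the printed values should be close to $m$ and $1-m$ respectively, with the growth term vanishing at $m=1$.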

Figure 2:


Simulated populations under a porous medium equation with logistic growth (2.14) in d=1,θ/N small on the left; large on the right. Values of θ in top and bottom figures are 1 and 100, respectively, and both have N set so that the density is roughly 100 individuals per unit of habitat (as displayed on the vertical axis). See text for details of the simulations.

In one dimension, equation (2.14) has an explicit travelling wave solution

$$w_P(t,x):=\Big(1-e^{\frac12(x-x_0-t)}\Big)_+.\tag{3.1}$$

Notice that the wave profile has a sharp boundary at x=x0+t. There are also travelling wave solutions with c>1 [Gilding and Kersner, 2005], which lack this property. However, for initial conditions that decay sufficiently rapidly at infinity, such as one might use in modelling a population invading new territory, the solution converges to (3.1) [Kamin and Rosenau, 2004]. In Figure 2 we show simulations of the individual based model described above, which display travelling wave solutions qualitatively similar to solutions of (2.14), with better agreement for smaller θ/N (but in both cases, N is reasonably large).
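
The claim that (3.1) solves (2.14) can be checked numerically. Writing $\xi=x-x_0-t$ and $w(\xi)=(1-e^{\xi/2})_+$, a travelling wave of speed $c$ must satisfy $(w^2)''+w(1-w)+cw'=0$ behind the front. The sketch below evaluates this residual with central finite differences; the step size `h` and the function names are choices made for this illustration.

```python
import math


def w_P(xi):
    """Wave profile of (3.1) in the moving frame, xi = x - x0 - t."""
    return max(0.0, 1.0 - math.exp(0.5 * xi))


def wave_residual(xi, c=1.0, h=1e-4):
    """Finite-difference residual of (w^2)'' + w(1 - w) + c w' at xi < 0."""
    w0, wp, wm = w_P(xi), w_P(xi + h), w_P(xi - h)
    dw = (wp - wm) / (2.0 * h)
    d2_w2 = (wp ** 2 - 2.0 * w0 ** 2 + wm ** 2) / h ** 2
    return d2_w2 + w0 * (1.0 - w0) + c * dw


if __name__ == "__main__":
    for xi in (-5.0, -2.0, -0.5):
        print(xi, wave_residual(xi, c=1.0), wave_residual(xi, c=2.0))
```

The residual is numerically zero for $c=1$ and clearly nonzero for other speeds, consistent with this profile travelling at speed one.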

3.2. Ancestry in different types of travelling waves

Although it remains challenging to establish the distribution of genealogical trees relating individuals sampled from our population model, as described in the introduction, we can gain some insight by investigating the motion of a single ancestral lineage. Here we do that in the context of a one-dimensional population expanding into new territory as a travelling wave. We focus on three cases in which we have explicit information about the shape of the travelling wave profile: the Fisher-KPP equation, a special case of the Allen-Cahn equation with a bistable nonlinearity, and the porous medium equation with logistic growth, equation (2.14). We work here in one dimension, and take $\sigma^2=1$ and $b=0$.

Fisher-KPP equation:

Consider the classical Fisher-KPP equation,

$$\partial_t\varphi=\partial_{xx}\varphi+\varphi(1-\varphi).\tag{3.2}$$

Even though we do not have an explicit formula for the wave shape in this case, our methods provide information about ancestral lineages. The equation has non-negative travelling wave solutions of speed $c$ for all $c\ge2$, but, started from any compact perturbation of a Heaviside function, the solution will converge to the profile $w_F$ with the minimal wavespeed, $c=2$ [Kolmogorov et al., 1937, Fife and McLeod, 1977, Bramson, 1983]. No matter what the initial condition, for any $t>0$ the support of the solution will be the whole real line. In this case, we must have $r=\gamma=1$, and $F(x,m)=1-m$, so $\mu_\theta(x,m)=1+(m-1)/\theta$. By Corollary 2.27, the generator of the motion of an ancestral lineage is

$$\mathcal{L}^Ff=\partial_{xx}f+2\,\frac{\partial_xw_F}{w_F}\,\partial_xf+2\,\partial_xf.\tag{3.3}$$

Near the tip of the wave (for $x$ large), $w_F(x)\sim e^{-x}$, so (3.3) implies that the motion of a lineage is close to unbiased Brownian motion. On the other hand, in the “bulk”, a lineage behaves approximately as Brownian motion with drift at rate two to the right. This implies that ancestral lineages are pushed into the tip of the wave, and there is no stationary distribution, so that long-term dynamics of genetic inheritance depend on the part of the wave not well-approximated by a smooth profile, in agreement with the previous results referred to in the Introduction.

Allen-Cahn equation:

Now take the Allen-Cahn equation:

$$\partial_t\varphi=\partial_{xx}\varphi+\varphi(1-\varphi)(2\varphi-1+s),\tag{3.4}$$

for a given $s\in(0,2)$. Once again we have taken $r=\gamma=1$, but now the reaction term $F(x,m)=(1-m)(2m-1+s)$ is bistable. This equation can be used to model the motion of so-called hybrid zones in population genetics; see, for example, Barton [1979], Gooding [2018], and Etheridge et al. [2022]. This equation has an explicit travelling wave solution with speed $s$ and shape

$$w_A(x)=\big(1+e^x\big)^{-1},$$

i.e., $\varphi_t(x)=w_A(x-st)$ solves (3.4). Substituting $w_A$ in place of $w_F$ in (3.3), and the speed $s$ in place of 2, we find that the generator of an ancestral lineage relative to the wavefront is now

$$\mathcal{L}^Af=\partial_{xx}f+2\,\frac{\partial_xw_A}{w_A}\,\partial_xf+s\,\partial_xf=\partial_{xx}f-\frac{2e^x}{1+e^x}\,\partial_xf+s\,\partial_xf,$$

so lineages in the tip are pushed leftwards into the bulk of the wave at a rate $s-2e^x/(1+e^x)$. The density of the speed measure for this diffusion is

$$m_A(x)\propto e^{sx}\big(1+e^x\big)^{-2},$$

which is integrable, and so determines the unique stationary distribution. Thus the position of the ancestral lineage relative to the wavefront will converge to a stationary distribution which is maximised away from the extreme tip of the wave. This is consistent with Etheridge and Penington [2022], who consider an analogous stochastic population model, although the stronger result there (that the genealogy of a sample from behind the wavefront is approximately a Kingman coalescent) requires the stronger condition $s<1$.
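
The stationary density in the Allen-Cahn case is explicit enough to explore numerically. The sketch below locates its mode, which solves $s-2e^x/(1+e^x)=0$, i.e. $x=\log(s/(2-s))$, and computes the normalising constant by a trapezoid rule on a truncated interval. The truncation bounds, grid size and function names are assumptions of this illustration.

```python
import math


def log_density_A(x, s):
    """Log of the unnormalised speed density e^{sx} (1 + e^x)^{-2}."""
    return s * x - 2.0 * math.log1p(math.exp(x))


def drift_A(x, s):
    """Drift of the lineage relative to the wavefront."""
    return s - 2.0 * math.exp(x) / (1.0 + math.exp(x))


def mode_A(s):
    """Location where the drift vanishes, for 0 < s < 2."""
    return math.log(s / (2.0 - s))


def normaliser_A(s, lo=-60.0, hi=60.0, n=12000):
    """Trapezoid approximation to the integral of the unnormalised density."""
    h = (hi - lo) / n
    f = lambda x: math.exp(log_density_A(x, s))
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, n))
    return total * h


if __name__ == "__main__":
    for s in (0.5, 1.0, 1.5):
        print(s, mode_A(s), normaliser_A(s))
```

For $s=1$ the density is the logistic density $e^x/(1+e^x)^2$, which is centred on the wavefront; smaller $s$ moves the mode further behind the front.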

Porous Medium equation with logistic growth:

Finally, consider equation (2.14). Setting $x_0=0$ (for definiteness) and substituting the form of $w_P$ from equation (3.1) into Corollary 2.27, with $c=1$, $\gamma(x,m)=m$, $r(x,m)=1$, and $F(x,m)=(1-m)$, the generator of the diffusion governing the position of the ancestral lineage relative to the wavefront is, for $x<0$,

$$\mathcal{L}^Pf=w_P\Big(\partial_{xx}f+2\,\frac{\partial_x(w_P^2)}{w_P^2}\,\partial_xf\Big)+\partial_xf=\big(1-e^{\frac12x}\big)\partial_{xx}f-2e^{\frac12x}\,\partial_xf+\partial_xf.$$

The speed measure corresponding to this diffusion has density

$$m_P(\xi)\propto\frac{1}{2\big(1-e^{\xi/2}\big)}\exp\left(\int_\eta^\xi\Big(1-\frac{e^{x/2}}{1-e^{x/2}}\Big)dx\right)\propto e^\xi\big(1-e^{\xi/2}\big),\qquad\text{for }\xi<0,$$

and $m_P(\xi)=0$ for $\xi\ge0$, which is integrable and so when suitably normalised gives the unique stationary distribution. Notice that even though we have the same reaction term as in the Fisher-KPP equation, with this form of nonlinear diffusion, at stationarity the lineage will typically be significantly behind the front, suggesting a different genealogy.
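
The closed form of this speed density can be checked against its integral definition. For a generator $a(x)\partial_{xx}f+b(x)\partial_xf$ with $a(x)=1-e^{x/2}$ and $b(x)=1-2e^{x/2}$, the sketch below evaluates $a(\xi)^{-1}\exp\big(\int_\eta^\xi b/a\,dx\big)$ by a trapezoid rule and compares it with $e^\xi(1-e^{\xi/2})$; the two should differ only by a constant factor that depends on the arbitrary base point $\eta$. The base point, grid size and function names are assumptions of this illustration.

```python
import math


def a_P(x):
    """Diffusion coefficient of the lineage generator, for x < 0."""
    return 1.0 - math.exp(0.5 * x)


def b_P(x):
    """Drift coefficient of the lineage generator, for x < 0."""
    return 1.0 - 2.0 * math.exp(0.5 * x)


def speed_density_numeric(xi, eta=-1.0, n=20000):
    """(1/a(xi)) * exp(integral of b/a from eta to xi), by the trapezoid rule."""
    h = (xi - eta) / n
    total = 0.5 * (b_P(eta) / a_P(eta) + b_P(xi) / a_P(xi))
    for k in range(1, n):
        x = eta + k * h
        total += b_P(x) / a_P(x)
    return math.exp(total * h) / a_P(xi)


def speed_density_closed(xi):
    """Unnormalised closed form e^xi (1 - e^{xi/2}) for xi < 0."""
    return math.exp(xi) * (1.0 - math.exp(0.5 * xi))


if __name__ == "__main__":
    for xi in (-3.0, -1.5, -0.5):
        print(xi, speed_density_numeric(xi) / speed_density_closed(xi))
```

The printed ratio is the same at every $\xi<0$. The closed form is maximised at $\xi=2\log(2/3)\approx-0.81$, so the lineage is typically found a finite distance behind the sharp front.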

It is interesting to compare the stationary distribution we have obtained here to the expression that we’d get by setting $b=-c$ and using Corollary 2.26, i.e., by giving each offspring a mean displacement that offsets the motion of the wave. In the Fisher-KPP and Allen-Cahn cases above, we get the same expressions, but this is only the case because $r\gamma\equiv1$ in both. For the PME with $b=1$ we have $\mathcal{B}^*=\Delta+\nabla$ and so the equation solved by the population density is

$$\partial_t\varphi=\Delta\varphi+2\varphi\nabla\varphi+\varphi(1-\varphi),$$

which has a travelling wave solution of the same shape but moving at half the speed, $\varphi(x,t)=w_P(x-t/2)$, and a stationary distribution of the lineage relative to the wavefront of $\pi(\xi)\propto e^{3\xi/2}\big(1-e^{\xi/2}\big)^3$.

3.3. Clumping from nonlocal interactions

Simulating these processes and exploring parameter space, one sooner or later comes upon a strange observation: with certain parameter combinations, the population spontaneously forms a regular grid of stable, more or less discrete patches, separated by areas with nearly no individuals, as shown in Figure 3. The phenomenon is discussed in Section 16.10 of Haller and Messer [2022], and has been described in similar models, e.g., by Britton [1990], Sasaki [1997], Hernández-García and López [2004], Young et al. [2001], and Berestycki et al. [2009]. For example, if the density-dependent effects of individuals extend farther (but not too much farther) than the typical dispersal distance, then depending on the interaction kernel new offspring landing between two clumps can effectively find themselves in competition with both neighbouring clumps, while individuals within a clump compete with only one.

Figure 3:


Left: A snapshot of individual locations in a two-dimensional simulation in which the constant density is unstable and a stable, periodic pattern forms. Right: Population density in an expanding wave in a one-dimensional simulation forming a periodic pattern; each panel shows the wavefront in three periods of time; within each period of time the wavefront at earlier times is shown in blue and later times in pink. In both cases, $\gamma(m)=3/(1+m)$, $\mu\equiv0.3$, and $r\equiv1$; dispersal is Gaussian with $\sigma=0.2$ and density is measured with $\rho_\gamma(x)=p_9(x)$, i.e., using a Gaussian kernel with standard deviation 3.

More mathematically, consider the case in which $\mathcal{B}=\sigma^2\Delta$ and all parameters are spatially homogeneous, so that $r(x,\eta)=r(\rho_r*\eta(x))$, and similarly for $\gamma$ and $F$. If $\varphi_0$ is such that $F(\varphi_0)=0$ and $F'(\varphi_0)<0$, then the constant solution $\varphi\equiv\varphi_0$ is a nontrivial equilibrium of (1.1). However, this constant solution may not be unique, it may be unstable, and a stable solution may have oscillations on a scale determined by the interaction distance.

To understand the stability of the constant solution $\varphi\equiv\varphi_0$, we linearise (1.1) around $\varphi_0$: let $\varphi_t(x)=\varphi_0+\psi_t(x)$, and (informally) $r(x)\approx r(\varphi_0)+r'(\varphi_0)\,\rho_r*\psi(x)$. Writing $r_0=r(\varphi_0)$ and $r_0'=r'(\varphi_0)$, with analogous expressions for $\gamma$ and $F$,

$$\partial_t\psi\approx\sigma^2\varphi_0r_0\gamma_0'\,\Delta(\rho_\gamma*\psi)+\sigma^2r_0\gamma_0\,\Delta\psi+\varphi_0F_0'\,\rho_F*\psi.$$

Letting $\hat f(u)=\int e^{2\pi iu\cdot x}f(x)\,dx$ denote the Fourier transform,

$$\partial_t\hat\psi(u)\approx\Big[-\|u\|^2\sigma^2\varphi_0r_0\gamma_0'\,\hat\rho_\gamma(u)-\|u\|^2\sigma^2r_0\gamma_0+\varphi_0F_0'\,\hat\rho_F(u)\Big]\hat\psi(u).\tag{3.5}$$

In the simplest case, in which $\gamma$ is constant, so $\gamma_0'=0$, this reduces to

$$\partial_t\hat\psi(u)\approx\Big[-\|u\|^2\sigma^2r_0\gamma_0+\varphi_0F_0'\,\hat\rho_F(u)\Big]\hat\psi(u).\tag{3.6}$$

If we take $\rho_F=p_{\epsilon^2}$, then $\hat\rho_F(u)=\exp\big(-2\pi^2\epsilon^2\|u\|^2\big)$ and (recalling that $F_0'<0$) the term in brackets is always negative, and we recover the well-known fact that in this case the constant solution is stable. If, on the other hand, $\hat\rho_F$ changes sign, there may be values of $u$ for which the corresponding quantity is positive. For example, if $d=1$ and $\rho_F(x)=\mathbf{1}_{[-\epsilon,\epsilon]}(x)/2\epsilon$, then $\hat\rho_F(u)=\sin(2\pi\epsilon u)/(2\pi\epsilon u)$, which is negative for $u\in(1/(2\epsilon),1/\epsilon)$ (and periodically repeating intervals). Setting $v=\epsilon u$, the bracketed term on the right hand side of (3.6) becomes

$$\varphi_0F_0'\,\frac{1}{2\pi v}\sin(2\pi v)-\frac{\sigma^2}{\epsilon^2}\,v^2r_0\gamma_0,$$

and we see that if $\sigma^2/\epsilon^2$ is sufficiently small, there are values of $v$ for which this is positive. In other words, in keeping with our heuristic above, if dispersal is sufficiently short range relative to the range over which individuals interact, there are unstable frequencies that scale with the interaction distance $\epsilon$. In two dimensions, replacing the indicator of an interval by that of a ball of radius $\epsilon$, a similar analysis applies, except that the sine function is replaced by a Bessel function.
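
The threshold behaviour for the top-hat kernel can be seen by scanning the bracketed term over rescaled frequencies $v$. The sketch below does this on a grid; the parameter values ($\varphi_0=r_0=\gamma_0=1$, $F_0'=-1$), the grid, and the function names are hypothetical choices for illustration only.

```python
import math


def growth_rate_tophat(v, ratio, phi0=1.0, dF0=-1.0, r0=1.0, gamma0=1.0):
    """Bracketed term for the top-hat kernel; ratio = sigma^2 / eps^2, v = eps * u."""
    sinc = math.sin(2.0 * math.pi * v) / (2.0 * math.pi * v)
    return phi0 * dF0 * sinc - ratio * v ** 2 * r0 * gamma0


def max_growth_rate_tophat(ratio, v_max=3.0, n=3000):
    """Largest growth rate over the grid v = v_max/n, 2*v_max/n, ..., v_max."""
    return max(growth_rate_tophat(k * v_max / n, ratio) for k in range(1, n + 1))


if __name__ == "__main__":
    for ratio in (0.05, 0.1, 0.5, 1.0):
        print(ratio, max_growth_rate_tophat(ratio))
```

With these values the maximum is positive for small `ratio` (short-range dispersal relative to interaction distance) and negative for larger `ratio`, with the most unstable $v$ lying in $(1/2,1)$ where the sine term is negative.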

Now suppose that $\gamma$ is not constant. Then, from (3.5), if we take $\rho_\gamma=\rho_F=p_{\epsilon^2}$,

$$\partial_t\hat\psi(u)=e^{-2\pi^2\epsilon^2\|u\|^2}\Big[-\sigma^2\varphi_0r_0\gamma_0'\|u\|^2-\sigma^2r_0\gamma_0\|u\|^2e^{2\pi^2\epsilon^2\|u\|^2}+\varphi_0F_0'\Big]\hat\psi(u).$$

If we make the (reasonable) assumption that $\gamma_0'<0$, then we see that even when the Fourier transform of $\rho$ does not change sign, there may be parameter values for which the constant solution is unstable. As before, we set $v=\epsilon u$. The term in brackets becomes

$$\frac{\sigma^2}{\epsilon^2}\,v^2r_0\Big(-\varphi_0\gamma_0'-\gamma_0e^{2\pi^2v^2}\Big)+\varphi_0F_0',$$

and, provided $-\varphi_0\gamma_0'/\gamma_0>1$, for sufficiently small $v$ the term in round brackets is positive. We now see that if $\sigma^2/\epsilon^2$ is sufficiently large, the equilibrium state $\varphi\equiv\varphi_0$ is unstable. As before, the unstable frequencies will scale with $\epsilon$ and for given $F$, $r$ and $\gamma$, whether or not such unstable frequencies exist will be determined by $\sigma^2/\epsilon^2$, but in this case of Gaussian kernels, it is interaction distance being sufficiently small relative to dispersal that will lead to instability.
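
The opposite dependence on $\sigma^2/\epsilon^2$ in the Gaussian case can be illustrated in the same way. The sketch below scans the bracketed term over $v$ for hypothetical values $\varphi_0=\gamma_0=r_0=1$, $\gamma_0'=-2$ and $F_0'=-1$, chosen only so that $-\varphi_0\gamma_0'/\gamma_0=2>1$; the function names and the grid are likewise assumptions.

```python
import math


def growth_rate_gaussian(v, ratio, phi0=1.0, gamma0=1.0, dgamma0=-2.0,
                         r0=1.0, dF0=-1.0):
    """Bracketed term for Gaussian kernels; ratio = sigma^2 / eps^2, v = eps * u."""
    round_bracket = -phi0 * dgamma0 - gamma0 * math.exp(2.0 * math.pi ** 2 * v ** 2)
    return ratio * v ** 2 * r0 * round_bracket + phi0 * dF0


def max_growth_rate_gaussian(ratio, v_max=1.0, n=1000):
    """Largest growth rate over the grid v = v_max/n, 2*v_max/n, ..., v_max."""
    return max(growth_rate_gaussian(k * v_max / n, ratio) for k in range(1, n + 1))


if __name__ == "__main__":
    for ratio in (10.0, 50.0, 100.0, 200.0):
        print(ratio, max_growth_rate_gaussian(ratio))
```

For these values the maximum changes sign at a ratio a little below 100: the equilibrium is linearly stable for smaller $\sigma^2/\epsilon^2$ and unstable for larger, the reverse of the top-hat example.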

3.4. Lineage motion is not uniquely determined by population density

It is natural for applications to wonder about identifiability: when can the observed quantities like population density or certain summaries of lineage movement uniquely determine the underlying demographic parameters? Consider a deterministic, continuous population generated by parameters γ,r, and F, with b=0 and C=I. Suppose it has a stationary profile w(x), that must satisfy

rΔ(γw)+Fw=0.

It is easy to see that w does not uniquely specify γ,F, and r: let λ(x) be a smooth, nonnegative function on Rd, and let r~(x,m)=λ(x)r(x,m) and F~(x,m)=λ(x)F(x,m) (and, let γ~=γ). Since μ=rγ-F/θ, this corresponds to multiplying both establishment probabilities and death rates by λ. Then the population with parameters γ~,r~, and F~ has the same stationary profile(s) as the original population.

Can these two situations be distinguished from summaries of lineage movement? The first has lineage generator

\[
f\mapsto\mathcal{L}f=r\gamma\left(\Delta f+2\nabla\log(\gamma w)\cdot\nabla f\right),
\]

while the second has lineage generator $f\mapsto\lambda(x)\mathcal{L}f(x)$. In other words, although the stationary profile of the population is unchanged when we scale local establishment and death by $\lambda$, the motion of lineages is sped up locally by $\lambda$. This corresponds to making areas with $\lambda>1$ more “sink-like” and $\lambda<1$ more “source-like”: if $\lambda(x)>1$, then at $x$ both the death rate and probability of establishment of new individuals are higher. As a result, lineages in the second model spend more time in areas with $\lambda<1$, i.e., those areas have higher reproductive value, something that is, in principle, discernible from genetic data (because, for instance, making reproductive value less evenly distributed reduces long-term genetic diversity).
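Both claims can be checked in one line each. The following is a short worked calculation using only the definitions in this subsection ($\tilde r=\lambda r$, $\tilde F=\lambda F$, $\tilde\gamma=\gamma$, all evaluated at the same profile $w$); it is a sketch of the reasoning rather than a new result.

```latex
\begin{align*}
\tilde r\,\Delta(\tilde\gamma w)+\tilde F w
  &= \lambda\left(r\,\Delta(\gamma w)+F w\right) = 0,\\
\tilde r\,\tilde\gamma\left(\Delta f+2\nabla\log(\tilde\gamma w)\cdot\nabla f\right)
  &= \lambda\, r\gamma\left(\Delta f+2\nabla\log(\gamma w)\cdot\nabla f\right).
\end{align*}
```

The first line shows that $w$ remains stationary; the second shows that the lineage generator is multiplied pointwise by $\lambda(x)$.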

4. Heuristics

In this section we perform some preliminary calculations and use them to provide heuristic arguments for our main results, to build intuition before the formal proofs.

4.1. The population density

We reiterate that in our prelimiting model, the population is represented by a point measure $\eta^N$ in which each individual is assigned a mass $1/N$. We use the term “population density” for this process, as it is supposed to measure population size relative to a nominal occupancy of $N$ individuals per unit area, but it is not absolutely continuous with respect to Lebesgue measure.

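We write $\mathcal{P}^N$ for the generator of the scaled population process $\eta^N$ of Definition 2.4 acting on test functions of the form $G(\langle f,\eta\rangle)$, where $f\ge 0$ is smooth and bounded on $\mathbb{R}^d$ and $G\in C^\infty([0,\infty))$. Recall that $\theta=\theta(N)\to\infty$ as $N\to\infty$ in such a way that $\theta(N)/N\to\alpha$.

A Taylor expansion allows us to write

\[
\begin{aligned}
\mathcal{P}^NG(\langle f,\eta\rangle)={}&G'(\langle f,\eta\rangle)\lim_{\delta t\to0}\frac{1}{\delta t}\mathbb{E}\left[\langle f,\eta_{\delta t}\rangle-\langle f,\eta\rangle\,\middle|\,\eta_0=\eta\right]\\
&+\frac12G''(\langle f,\eta\rangle)\lim_{\delta t\to0}\frac{1}{\delta t}\mathbb{E}\left[\left(\langle f,\eta_{\delta t}\rangle-\langle f,\eta\rangle\right)^2\,\middle|\,\eta_0=\eta\right]+\epsilon_N(f,G,\eta),
\end{aligned}
\tag{4.1}
\]

where the terms that make up $\epsilon_N(f,G,\eta)$ will be negligible in our scaling limit (at least if $\|G'''\|_\infty<\infty$).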

Mean measure

Recall that in our parameterization only death rates $\mu_\theta$ and the dispersal kernel $q_\theta$ depend on $\theta$. For a suitable test function $f$, we find

\[
\begin{aligned}
\mathcal{P}^N\langle f,\eta\rangle&=\lim_{\delta t\to0}\frac{1}{\delta t}\mathbb{E}\left[\langle f,\eta_{\delta t}\rangle-\langle f,\eta\rangle\,\middle|\,\eta_0=\eta\right]\\
&=\theta\int\!\!\int f(z)r(z,\eta)\,q_\theta(x,dz)\,\gamma(x,\eta)\,\eta(dx)-\theta\int f(x)\mu_\theta(x,\eta)\,\eta(dx).
\end{aligned}
\tag{4.2}
\]

The first term is the increment in $\langle f,\eta\rangle$ resulting from a birth event (recalling that we don’t kill the parent) integrated against the rate of such events, and the second reflects death events. The factor of $\theta$ appears from the time rescaling. In both terms, the rate of events has a factor of $N$ (because events happen at a rate proportional to the number of individuals, whereas $\eta$ has mass $1/N$ for each individual) which is offset by the fact that the birth or loss of a single individual at the point $y$, say, changes $\langle f,\eta\rangle$ by $f(y)/N$.

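We use the fact that $\int q_\theta(x,dz)=1$ to rewrite (4.2) as

\[
\int\left(\theta\int\left(f(z)r(z,\eta)-f(x)r(x,\eta)\right)q_\theta(x,dz)\right)\gamma(x,\eta)\,\eta(dx)+\int f(x)\,\theta\left(r(x,\eta)\gamma(x,\eta)-\mu_\theta(x,\eta)\right)\eta(dx). \tag{4.3}
\]

We have defined $\mu_\theta$ so that the second term is simple:

\[
\theta\left(r(x,\eta)\gamma(x,\eta)-\mu_\theta(x,\eta)\right)=F(x,\eta).
\]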

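Furthermore, recall from Remark 2.6 that

\[
\theta\int\left(r(z,\eta)f(z)-r(x,\eta)f(x)\right)q_\theta(x,dz)\xrightarrow{\theta\to\infty}\mathcal{B}\left(r(\cdot,\eta)f(\cdot)\right)(x). \tag{4.4}
\]

In particular, if dispersal is determined by a standard multivariate Gaussian with mean zero and covariance $\sigma^2I/\theta$, then $\mathcal{B}=\sigma^2\Delta$, where $\Delta$ denotes the Laplacian.

In summary, equation (4.3) converges to

\[
\int\gamma(x,\eta)\,\mathcal{B}\left(f(\cdot)r(\cdot,\eta)\right)(x)\,\eta(dx)+\int f(x)F(x,\eta)\,\eta(dx), \tag{4.5}
\]

which explains the form of the martingale of Theorem 2.10.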

Quadratic variation

We now look at the second order term in (4.1), which will converge to the quadratic variation of the limiting process. An individual at location $x$ gives birth to a surviving offspring at $y$ at rate

\[
\gamma(x,\eta)\,r(y,\eta)\,q_\theta(x,dy),
\]

and since this increments $\langle f,\eta\rangle$ by $f(y)/N$, the contribution to the quadratic variation from birth events, which occur at rate $\theta$ per individual (so, rate $N\theta|\eta|$ overall), is

\[
N\theta\int\gamma(x,\eta)\int\frac{1}{N^2}f^2(y)\,r(y,\eta)\,q_\theta(x,dy)\,\eta(dx).
\]

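Similarly, the increment in $\langle f,\eta\rangle$ resulting from the death of an individual at $x$ is $-f(x)/N$, and so combining with the above, the second order term in the generator takes the form

\[
\begin{aligned}
&G''(\langle f,\eta\rangle)\frac12N\theta\left\{\int\gamma(x,\eta)\int\frac{1}{N^2}f^2(y)r(y,\eta)\,q_\theta(x,dy)\,\eta(dx)+\int\mu_\theta(x,\eta)\frac{1}{N^2}f^2(x)\,\eta(dx)\right\}\\
&\qquad=\frac12G''(\langle f,\eta\rangle)\frac{\theta}{N}\int\left\{\gamma(x,\eta)\int f^2(y)r(y,\eta)\,q_\theta(x,dy)+f^2(x)\mu_\theta(x,\eta)\right\}\eta(dx).
\end{aligned}
\]

Since $\int f^2(y)r(y,\eta)\,q_\theta(x,dy)\to f^2(x)r(x,\eta)$ and $r\gamma+\mu_\theta=2r\gamma-F/\theta\to2r\gamma$ as $\theta\to\infty$, this converges to

\[
\frac{\alpha}{2}G''(\langle f,\eta\rangle)\left\langle2r(x,\eta)\gamma(x,\eta)f^2(x),\eta(dx)\right\rangle.
\]

An entirely analogous argument shows that if $G'''$ is bounded, then the term $\epsilon_{\theta,N}(f,G,\eta)$ in (4.1) will be $\mathcal{O}(\theta/N^2)$.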

If we hold $\rho_\gamma$, $\rho_r$, $\rho_F$ fixed, then by taking $\theta/N\to0$, the second order term in the generator will vanish and we expect a deterministic limit, for which $\partial_t\langle f,\eta_t\rangle$ is equal to (4.5). In other words, the limit is a weak solution to the deterministic equation

\[
\partial_t\varphi_t(x)=r(x,\varphi_t)\,\mathcal{B}^*\left(\gamma(\cdot,\varphi_t)\varphi_t(\cdot)\right)(x)+F(x,\varphi_t)\varphi_t(x) \tag{4.6}
\]

in the sense of Definition 2.12, where $\varphi_t$ is the density of $\eta_t$, if it has a density. On the other hand, if $\theta/N\to\alpha$ for some $\alpha>0$, the second order term remains, and we expect a “generalised superprocess” limit. The limiting quadratic variation is exactly as seen in Theorem 2.10.

One-step convergence:

In order to pass directly to a classical PDE limit in Theorem 2.20 we impose the stronger condition that $\theta/(N\epsilon^d)\to0$ and also require that $\theta\epsilon^2\to\infty$. Recall that in this case, we take $\rho_F^\epsilon$ to be a symmetric Gaussian density with variance $\epsilon^2$. The condition $\theta\epsilon^2\to\infty$ ensures that $\epsilon^2$ is large enough relative to $1/\theta$ that the regularity gained by smoothing our population density by convolution with $\rho^\epsilon$ is preserved under the dynamics dictated by $q_\theta$. To understand the first condition, note that we are aiming to obtain a deterministic expression for the limiting population density. It is helpful to think about a classical Wright-Fisher model (with no spatial structure and just two types, say). We know then that if the timescale $\theta$ is on the same order as population size $N$, we see stochastic fluctuations in the frequencies of the two types in the limit as $N\to\infty$; to obtain a deterministic limit, we look over timescales that are short relative to population size. In our setting, the total population size is replaced by the local population size, as measured by convolution with $\rho^\epsilon$, which we expect to be of order $N\epsilon^d$, and so in order to ensure a deterministic limit we take $\theta/(N\epsilon^d)\to0$.
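The Wright-Fisher comparison can be made concrete with the standard variance formula for a neutral haploid Wright-Fisher model. The sketch below is an illustration only (the function name and the parameter values in the checks are assumptions): it shows that the variance of the type frequency after $\theta$ generations is negligible when $\theta/N$ is small and of order one when $\theta$ is comparable to $N$.

```python
def wright_fisher_variance(p0, n, generations):
    """Variance of the type frequency after `generations` steps of a neutral
    haploid Wright-Fisher model with n individuals, started from frequency p0.

    Uses the classical identity Var(p_t) = p0 (1 - p0) (1 - (1 - 1/n)^t).
    """
    return p0 * (1.0 - p0) * (1.0 - (1.0 - 1.0 / n) ** generations)
```

For example, comparing `wright_fisher_variance(0.5, 10**6, 10**3)` with `wright_fisher_variance(0.5, 10**3, 10**3)` contrasts a timescale short relative to population size with one of the same order.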

4.2. Motion of ancestral lineages

Although our proof of Theorem 2.23 uses an explicit representation in terms of the lookdown process, the result can be understood through informal calculations. Suppose that we have traced a lineage back to an individual at location $y$ at time $t$. Looking further back through time, at the time of the birth of that individual, the lineage will jump to the location of the parent of the individual. Now, the rate at which new individuals are born to parents at $x$ and establish at $y$ is

\[
\theta N\,\eta_t^N(dx)\,\gamma(x,\eta_t^N)\,q_\theta(x,dy)\,r(y,\eta_t^N).
\]

Suppose that $\eta^N$ did have a density (in the prelimit it does not), say $\eta_t^N(dx)=\varphi_t^N(x)\,dx$. Informally, since the number of individuals near $y$ is $N\varphi_t^N(y)\,dy$, the probability that a randomly chosen individual near $y$ is a new offspring from a parent at $x$ in $[t,t+dt)$ is

\[
\theta\,\frac{\varphi_t^N(x)\gamma(x,\eta_t^N)r(y,\eta_t^N)}{\varphi_t^N(y)}\,\frac{q_\theta(x,dy)}{dy}\,dx\,dt. \tag{4.7}
\]

Leaving aside questions of whether a lineage can be treated as a randomly chosen individual, we define a continuous-time jump process whose transition rates, conditional on $(\varphi_t^N)_{t=0}^T$, are given by (4.7). Because we are tracing the lineage backwards in time we make the substitution $s=T-t$ and write $(L_s^N)_{s=0}^T$ for the location of a lineage that moves according to these jump rates. Then, abusing notation to write $q_\theta(x,y)$ for the density of $q_\theta(x,dy)$,

\[
\mathbb{E}\left[f(L_{s+ds}^N)-f(y)\,\middle|\,L_s^N=y\right]=ds\,\theta\int\left(f(x)-f(y)\right)\frac{\varphi_{T-s}^N(x)\gamma(x,\eta_{T-s}^N)r(y,\eta_{T-s}^N)}{\varphi_{T-s}^N(y)}\,q_\theta(x,y)\,dx. \tag{4.8}
\]

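(Note that this integral is with respect to $x$.) Referring back to Remark 2.6, a quick calculation shows that as $N\to\infty$,

\[
\begin{aligned}
\theta\int\left(f(x)-f(y)\right)g(x)\,q_\theta(x,y)\,dx&=\theta\int\left\{\left(f(x)g(x)-f(y)g(y)\right)-f(y)\left(g(x)-g(y)\right)\right\}q_\theta(x,y)\,dx\\
&\to\mathcal{B}^*(fg)(y)-f(y)\,\mathcal{B}^*g(y).
\end{aligned}
\]

Applying this to (4.8) with $g=\varphi_{T-s}\gamma$, this suggests that the generator of the limiting process is

\[
\mathcal{L}_sf=\frac{r}{\varphi_{T-s}}\left[\mathcal{B}^*\left(\gamma\varphi_{T-s}f\right)-f\,\mathcal{B}^*\left(\gamma\varphi_{T-s}\right)\right]. \tag{4.9}
\]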

This agrees with Theorem 2.23.
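As a worked special case (a sketch under the assumption $b=0$ and $C=I$ used in Section 3.4, so that the dispersal operator and its adjoint both reduce to the Laplacian), the generator in (4.9) can be expanded with the product rule. Writing $\varphi$ for $\varphi_{T-s}$:

```latex
\begin{align*}
\frac{r}{\varphi}\left[\Delta(\gamma\varphi f)-f\,\Delta(\gamma\varphi)\right]
  &= \frac{r}{\varphi}\left[\gamma\varphi\,\Delta f+2\nabla(\gamma\varphi)\cdot\nabla f\right]\\
  &= r\gamma\left(\Delta f+2\nabla\log(\gamma\varphi)\cdot\nabla f\right).
\end{align*}
```

With $\varphi$ replaced by a stationary profile $w$, this is the lineage generator quoted in Section 3.4.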

5. The lookdown process

Our characterisation of the motion of lines of descent (from which we establish that of ancestral lineages) when we pass to the scaling limit in our model will be justified via a lookdown construction. In this section we present such a construction for the general population model of Definition 2.4. It will be in the spirit of Kurtz and Rodrigues [2011]. The general set-up is as follows. Each individual will be labelled with a “level”, a number in $[0,N]$. We will still encode the process embellished by these levels as a point measure: if the $i$th individual’s spatial location is $x_i$ and level is $u_i$, then we will write

\[
\xi^N=\sum_i\delta_{(x_i,u_i)},
\]

which is a measure on $\mathbb{R}^d\times[0,N]$. Note that each individual contributes mass 1 to the measure, not $1/N$ as above. If we assign mass $1/N$ to each individual and ignore the levels we will recover our population model. Moreover, at any time, the levels of individuals in a given spatial region will be exchangeable and conditionally uniform on $[0,N]$: in particular, choosing the $k$ individuals with the lowest levels in that region is equivalent to taking a uniform random sample of size $k$ from the population in the region. However, this exchangeability is only as regards the past: an individual’s level encodes information about their future reproductive output, since individuals with lower levels tend to live longer, and have more offspring. For more explanation of the set-up and how this is possible, see Kurtz and Rodrigues [2011] and Etheridge and Kurtz [2019] (and note that our $N$ corresponds to the $\lambda$ of those papers). The power of this approach is that we can pass to a limit under the same scalings as described in Theorem 2.10, and the limiting “spatial-level” process will still be a point measure, and so we explicitly retain the notion of individuals and lineages in the infinite-population limit.

5.1. Lookdown representation of the model of Definition 2.4

For the remainder of this section, when there is no risk of ambiguity we shall suppress the superscript $N$ on the processes $\eta$ and $\xi$.

In this subsection, we’ll define the process $(\xi_t)_{t\ge0}$ in terms of the dynamics of labelled particles, and write down its generator. The dynamics depend on the spatial locations of particles, and in this section $\eta_t$ is the corresponding spatial measure, i.e.,

\[
\eta_t(\cdot)=\frac1N\xi_t(\cdot\times[0,N]).
\]

A nontrivial consequence of the way we define $\xi_t$ will be that this process has the same distribution as the process $(\eta_t)_{t\ge0}$ of Definition 2.4. (This provides our justification for using the same notation for both.)

Following Etheridge and Kurtz [2019], we build the generator step by step from its component parts. Suppose that the initial population is composed of $O(N)$ particles with levels uniformly distributed on $[0,N]$, and that the current state of the population is $\xi$, with spatial projection $\eta$.

An individual at spatial location $x$ with level $u$ produces one juvenile offspring at rate

\[
2\theta\left(1-\frac{u}{N}\right)\gamma(x,\eta),
\]

which disperses to a location relative to $x$ drawn from the kernel $q_\theta(x,\cdot)$. Averaging over the uniform distribution of the level $u$, we recover the birth rate $\theta\gamma(x,\eta)$. This juvenile (suppose its location is $y$) either survives, with probability $r(y,\eta)$, or immediately dies. (As before, “maturity” is instantaneous.) If it survives, a new level $u_1$ is sampled independently and uniformly from $[u,N]$, and the parent and the offspring are assigned in random order to the levels $\{u,u_1\}$. This random assignment of levels to parent and offspring will ensure that assignment of individuals to levels remains exchangeable.

Evidently this mechanism increases the proportion of individuals with higher levels. To restore the property that the distribution of levels is conditionally uniform given $\eta$, we impose that the level $v$ of an individual at location $x$ evolves according to the differential equation

\[
\dot v=-\theta\frac{v}{N}(N-v)\gamma(x,\eta)\int_{\mathbb{R}^d}r(y,\eta)\,q_\theta(x,dy).
\]

Since $v\in[0,N]$, this moves levels down; see Etheridge and Kurtz [2019], Section 3.4 for a detailed explanation.

Levels never cross below 0, while particles whose levels move above $N$ are regarded as dead (and are removed from the population). Therefore, in order to incorporate death, the level of the individual at location $x$ with level $u$ moves upwards at an additional rate $\theta\mu_\theta(x,\eta)u$. Since levels are uniform, it is easy to check that if $\mu_\theta$ were constant, this would imply an exponential lifetime for each individual; see Etheridge and Kurtz [2019], Section 3.1 for more general justification.
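Two of the claims above are easy to verify numerically. The following sketch is illustrative only and its function names are assumptions. It averages the level-dependent birth rate over uniform levels, and it checks the exponential lifetime in the simplified setting where only the death term acts on the level (constant $\mu_\theta$, no downward drift from births), so that $\dot u=\theta\mu_\theta u$.

```python
import math

def mean_birth_rate(theta, gamma, n_levels, grid=10000):
    """Average of 2*theta*(1 - u/N)*gamma over u uniform on [0, N] (midpoint rule)."""
    total = 0.0
    for k in range(grid):
        u = (k + 0.5) * n_levels / grid
        total += 2.0 * theta * (1.0 - u / n_levels) * gamma
    return total / grid

def lifetime(u0, n_levels, theta, mu):
    """Time for a level obeying du/dt = theta*mu*u to reach N, starting from u0."""
    return math.log(n_levels / u0) / (theta * mu)

def survival_fraction(t, n_levels, theta, mu, grid=100000):
    """Fraction of evenly spaced initial levels in (0, N) still below N at time t."""
    alive = sum(
        1 for k in range(grid)
        if lifetime((k + 0.5) * n_levels / grid, n_levels, theta, mu) > t
    )
    return alive / grid
```

The survival fraction should match $e^{-\theta\mu_\theta t}$, the survival function of an exponential lifetime with rate $\theta\mu_\theta$.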

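Putting these together, the level $u$ of an individual at $x$ evolves according to:

\[
\dot u=-\theta\frac{u}{N}(N-u)\gamma(x,\eta)\int_{\mathbb{R}^d}r(y,\eta)\,q_\theta(x,dy)+\theta\mu_\theta(x,\eta)u. \tag{5.1}
\]

We shall write

\[
b_\theta(x,\eta):=\theta\left(\gamma(x,\eta)\int_{\mathbb{R}^d}r(y,\eta)\,q_\theta(x,dy)-\mu_\theta(x,\eta)\right),
\]

which captures the local net difference between reproduction and death, and

\[
c_\theta(x,\eta):=\frac{\theta}{N}\gamma(x,\eta)\int_{\mathbb{R}^d}r(y,\eta)\,q_\theta(x,dy), \tag{5.2}
\]

which captures the local rate of production of successful offspring. Recall from equation (2.2) that $F(x,\eta)=\theta\left(r(x,\eta)\gamma(x,\eta)-\mu_\theta(x,\eta)\right)$, and so

\[
b_\theta(x,\eta)=\theta\gamma(x,\eta)\int_{\mathbb{R}^d}\left(r(y,\eta)-r(x,\eta)\right)q_\theta(x,dy)+F(x,\eta). \tag{5.3}
\]

Under Assumptions 2.8, as $\theta\to\infty$, $c_\theta(x,\eta)$ will tend to $\alpha\gamma(x,\eta)r(x,\eta)$ and

\[
b_\theta(x,\eta)\to\gamma(x,\eta)\,\mathcal{B}r(x,\eta)+F(x,\eta). \tag{5.4}
\]

We can then rewrite the differential equation governing the dynamics of the level of each individual as

\[
\dot u=\theta\gamma(x,\eta)\int_{\mathbb{R}^d}r(y,\eta)\,q_\theta(x,dy)\left\{-\frac{u}{N}(N-u)+u\right\}-b_\theta(x,\eta)u=c_\theta(x,\eta)u^2-b_\theta(x,\eta)u. \tag{5.5}
\]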
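To build intuition for the level dynamics $\dot u=c_\theta u^2-b_\theta u$, it can help to freeze the coefficients. The sketch below assumes constant $b\neq0$ and $c$ (in the model they depend on $x$ and $\eta$, so this is only a local caricature) and uses the closed-form solution obtained by substituting $w=1/u$. The function names are hypothetical.

```python
import math

def level_solution(t, u0, b, c):
    """Solution of du/dt = c*u**2 - b*u with u(0) = u0, for constant b != 0 and c.

    Valid for t before the blow-up time when the level escapes to infinity.
    """
    return b * u0 / (c * u0 + (b - c * u0) * math.exp(b * t))

def blow_up_time(u0, b, c):
    """Time at which the level reaches infinity, or None if it never does."""
    if c <= 0 or c * u0 <= b:
        return None
    return math.log(c * u0 / (c * u0 - b)) / b
```

Levels that start below $b/c$ decay towards zero, while levels that start above $b/c$ increase and escape in finite time, which is the threshold behaviour discussed for the constant-coefficient case.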

Now, we can write down the generator for $(\xi_t)_{t\ge0}$, the lookdown process. In what follows, we will write sums (and products) over “$(x,u)\in\xi$” to mean a sum over the (location, level) pairs of each individual in the population. Test functions for $\xi$ will take the form

\[
f(\xi)=\prod_{(x,u)\in\xi}g(x,u)=\exp\left(\int\log g(x,u)\,\xi(dx,du)\right), \tag{5.6}
\]

where $g(x,u)$ is differentiable in $u$ and smooth in $x$. We will also assume that $0\le g(x,u)\le1$ for all $u\in[0,N]$, and $g(x,u)\equiv1$ for $u\ge N$. In the expressions that follow, we shall often see one or more factors of $1/g(x,u)$; it should be understood that if $g(x,u)=0$, then it simply cancels the corresponding factor in $f(\xi)$.

First consider the terms in the generator that come from birth events. When a birth successfully establishes, a new level is generated above the parent’s level, and this new level is assigned to either the offspring or the parent. Since the probability of each is 1/2, the contribution of birth to the generator maps $f(\xi)$ to

\[
\begin{aligned}
&f(\xi)\sum_{(x,u)\in\xi}2\frac{\theta}{N}\gamma(x,\eta)\int_u^N\!\!\int_{\mathbb{R}^d}\left(\frac12\left\{g(y,u_1)+\frac{g(y,u)g(x,u_1)}{g(x,u)}\right\}-1\right)r(y,\eta)\,q_\theta(x,dy)\,du_1 \qquad (5.7)\\
&\quad=f(\xi)\sum_{(x,u)\in\xi}2\gamma(x,\eta)\Bigg\{\frac{1}{2N}\int_u^Ng(x,u_1)\,du_1\,\frac{\theta\int_{\mathbb{R}^d}\left(g(y,u)-g(x,u)\right)r(y,\eta)\,q_\theta(x,dy)}{g(x,u)}\\
&\qquad\qquad+\frac{\theta}{N}\int_u^N\!\!\int_{\mathbb{R}^d}\left(\frac{g(y,u_1)+g(x,u_1)}{2}-1\right)r(y,\eta)\,q_\theta(x,dy)\,du_1\Bigg\}. \qquad (5.8)
\end{aligned}
\]

In (5.7), $u_1$ is the new level and $y$ is the offspring’s location, and so the two terms in the integral correspond to the two situations: in the first, we have added an individual at $(y,u_1)$, while in the second, we replace an individual at $(x,u)$ by one at $(x,u_1)$ and another at $(y,u)$. We’ve rewritten it in the form (5.8) because each of the two pieces naturally converges to a separate term in the limit.
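The step from (5.7) to (5.8) rests on a pointwise rearrangement of the integrand. As a short worked check (a sketch of the algebra, introducing no new notation), for fixed $x$, $y$, $u$ and $u_1$:

```latex
\begin{equation*}
\frac12\left\{g(y,u_1)+\frac{g(y,u)\,g(x,u_1)}{g(x,u)}\right\}-1
 = \frac{g(x,u_1)}{2}\cdot\frac{g(y,u)-g(x,u)}{g(x,u)}
 + \left(\frac{g(y,u_1)+g(x,u_1)}{2}-1\right).
\end{equation*}
```

Multiplying by $2\theta N^{-1}\gamma(x,\eta)\,r(y,\eta)$ and integrating over $y$ and $u_1\in[u,N]$ gives the two pieces of (5.8).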

The remaining term in the generator is due to the motion of particles’ levels. Reading off from (5.5), it takes the form

\[
f(\xi)\sum_{(x,u)\in\xi}\left(c_\theta(x,\eta)u^2-b_\theta(x,\eta)u\right)\frac{\partial_ug(x,u)}{g(x,u)}. \tag{5.9}
\]

We can now define the spatial-level process explicitly as a solution to a martingale problem, whose generator is just the sum of (5.8) and (5.9). We need some notation. Write $\mathcal{C}=\mathcal{C}(\mathbb{R}^d\times[0,\infty))$ for the counting measures on $\mathbb{R}^d\times[0,\infty)$ and $\mathcal{C}_N$ for the subset consisting of counting measures on $\mathbb{R}^d\times[0,N]$.

Definition 5.1 (Martingale Problem Characterisation)

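For given positive values of $N$ and $\theta$, define the generator $A^N$ by

\[
\begin{aligned}
A^Nf(\xi)={}&f(\xi)\sum_{(x,u)\in\xi}2\gamma(x,\eta)\Bigg\{\frac{1}{2N}\int_u^Ng(x,u_1)\,du_1\,\frac{\theta\int_{\mathbb{R}^d}\left(g(y,u)-g(x,u)\right)r(y,\eta)\,q_\theta(x,dy)}{g(x,u)}\\
&\qquad+\frac{\theta}{N}\int_u^N\!\!\int_{\mathbb{R}^d}\left(\frac{g(y,u_1)+g(x,u_1)}{2}-1\right)r(y,\eta)\,q_\theta(x,dy)\,du_1\Bigg\}\\
&+f(\xi)\sum_{(x,u)\in\xi}\left(c_\theta(x,\eta)u^2-b_\theta(x,\eta)u\right)\frac{\partial_ug(x,u)}{g(x,u)},
\end{aligned}
\tag{5.10}
\]

where $f(\xi)=\prod_{(x,u)\in\xi}g(x,u)$ is as defined in (5.6), and $\eta(\cdot)=\xi(\cdot\times[0,N])/N$ as before. Given $\xi_0\in\mathcal{C}_N$, we say that a $\mathcal{D}_{[0,\infty)}(\mathcal{C}_N)$-valued process $(\xi_t)_{t\ge0}$ is a solution to the $(A^N,\xi_0)$ martingale problem if $f(\xi_t)-f(\xi_0)-\int_0^tA^Nf(\xi_s)\,ds$ is a martingale (with respect to the natural filtration) for all test functions $f$ as defined above.

The martingale problem for finite $N$ has a unique solution. Next we state the limiting martingale problem, for which we do not necessarily have uniqueness. As before, the parameter $\alpha$ will correspond to $\lim_{N\to\infty}\theta(N)/N$. Whereas for finite $N$, conditional on the population process $\eta_t^N$, the levels of particles are independent and uniformly distributed on $[0,N]$, in the infinite population limit, conditional on $\eta_t$, the process $\xi_t$ is Poisson distributed on $\mathbb{R}^d\times[0,\infty)$ with mean measure $\eta_t\times\lambda$, where $\lambda$ is Lebesgue measure.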

Definition 5.2 (Martingale Problem Characterisation, scaling limit)

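Fix $\alpha\in[0,\infty)$, and define test functions $f$ by $f(\xi)=\prod_{(x,u)\in\xi}g(x,u)$ with $g$ differentiable in $u$, smooth in $x$, satisfying $0\le g(x,u)\le1$ and such that there exists a $u_0$ with $g(x,u)=1$ for all $u>u_0$. Then, define the operator $A$ on such test functions by

\[
\begin{aligned}
Af(\xi)={}&f(\xi)\sum_{(x,u)\in\xi}\gamma(x,\eta)\frac{\mathcal{B}\left(g(\cdot,u)r(\cdot,\eta)\right)(x)-g(x,u)\,\mathcal{B}r(x,\eta)}{g(x,u)}\\
&+f(\xi)\sum_{(x,u)\in\xi}2\alpha\gamma(x,\eta)r(x,\eta)\int_u^\infty\left(g(x,u_1)-1\right)du_1\\
&+f(\xi)\sum_{(x,u)\in\xi}\left(\alpha\gamma(x,\eta)r(x,\eta)u^2-\left\{\gamma(x,\eta)\,\mathcal{B}r(x,\eta)+F(x,\eta)\right\}u\right)\frac{\partial_ug(x,u)}{g(x,u)}.
\end{aligned}
\tag{5.11}
\]

We say that a $\mathcal{D}_{[0,\infty)}(\mathcal{C})$-valued process $(\xi_t)_{t\ge0}$ is a solution to the $(A,\xi_0)$ martingale problem if it has initial distribution $\xi_0$ and $f(\xi_t)-f(\xi_0)-\int_0^tAf(\xi_s)\,ds$ is a martingale (with respect to the natural filtration) for all test functions $f$ as defined above.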

The lookdown processes have been carefully constructed so that observations about the past spatial positions of individuals in the population do not give us any information about the assignment of individuals to levels. In other words, the dynamics of the lookdown process preserve the conditionally uniform (or in the limit, conditionally Poisson) structure – if started with uniform levels, levels are uniform at all future times. Moreover, if we average over levels in the expression for the generator (equation (5.10) or (5.11)) we recover the generator for the population process. Once this is verified (along with some boundedness conditions) the Markov Mapping Theorem (Theorem A.1; also see Etheridge and Kurtz [2019]) tells us that by “removing labels” from the lookdown process ξ we recover the population process η.

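To make this precise, define the spatial projection maps $\kappa^N:\mathcal{M}(\mathbb{R}^d\times[0,N])\to\mathcal{M}(\mathbb{R}^d)$ by $\kappa^N(\xi^N)(\cdot)=\xi^N(\cdot\times[0,N])/N$, and $\kappa:\mathcal{M}(\mathbb{R}^d\times[0,\infty))\to\mathcal{M}(\mathbb{R}^d)$ by $\kappa(\xi)(\cdot)=\lim_{u_0\to\infty}\xi(\cdot\times[0,u_0])/u_0$. We will also need an inverse notion: for a measure $\xi^N$ on $\mathbb{R}^d\times[0,N]$ and a $\sigma$-field $\mathcal{F}$, we say that $\xi^N$ is conditionally uniform given $\mathcal{F}$ if $\kappa^N(\xi^N)$ is $\mathcal{F}$-measurable and for all compactly supported $f$,

\[
\mathbb{E}\left[e^{-\langle f,\xi^N\rangle}\,\middle|\,\mathcal{F}\right]=e^{-\langle H_f^N,\kappa^N(\xi^N)\rangle}, \tag{5.12}
\]

where

\[
H_f^N(x)=-N\log\left(\frac1N\int_0^Ne^{-f(x,u)}\,du\right).
\]

In other words, the $[0,N]$ components of $\xi^N$ are independent, uniformly distributed on $[0,N]$, and independent of $\kappa^N(\xi^N)$. Similarly, for a measure $\xi$ on $\mathbb{R}^d\times[0,\infty)$ we say that $\xi$ is a conditionally Poisson random measure given $\mathcal{F}$ if $\kappa(\xi)$ is $\mathcal{F}$-measurable and for all compactly supported $f$,

\[
\mathbb{E}\left[e^{-\langle f,\xi\rangle}\,\middle|\,\mathcal{F}\right]=e^{-\left\langle\int_0^\infty\left(1-e^{-f(x,u)}\right)du,\,\kappa(\xi)(dx)\right\rangle}. \tag{5.13}
\]

In other words, $\xi$ is conditionally Poisson with Cox measure $\kappa(\xi)\times\lambda$, where $\lambda$ is Lebesgue measure.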
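The exponents in (5.12) and (5.13) are consistent in the large-$N$ limit. The sketch below illustrates this for a hypothetical step test function $f(x,u)=c\,\mathbf{1}_{\{u<u_0\}}$ (an assumption made so that both integrals are available in closed form); the function names are likewise illustrative.

```python
import math

def h_uniform(c, u0, n):
    """H_f^N(x) = -N log((1/N) * integral_0^N exp(-f(x,u)) du) for the step
    function f(x, u) = c if u < u0 else 0, assuming u0 <= N."""
    return -n * math.log1p(-(u0 / n) * (1.0 - math.exp(-c)))

def h_poisson(c, u0):
    """Poisson exponent integral_0^infinity (1 - exp(-f(x,u))) du for the same f."""
    return u0 * (1.0 - math.exp(-c))
```

As $N$ grows, `h_uniform` decreases towards `h_poisson`, matching the passage from conditionally uniform levels on $[0,N]$ to a conditionally Poisson configuration of levels on $[0,\infty)$.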

Proposition 5.3

If $\tilde\eta^N$ is a solution of the martingale problem of Definition 2.4 with initial distribution $\eta_0^N$ then there exists a solution $\xi^N$ of the $(A^N,\xi_0^N)$-martingale problem of Definition 5.1 such that $\eta^N=\kappa^N\circ\xi^N$ has the same distribution on $D_{\mathcal{M}_F(\mathbb{R}^d)}[0,\infty)$ as $\tilde\eta^N$. Furthermore, for each $t$, $\xi_t^N$ is conditionally uniform given $\mathcal{F}_t^{\eta^N}$ in the sense of (5.12).

Similarly, if $\tilde\eta$ is a solution of the limiting martingale problem of Theorem 2.10 with initial distribution $\eta_0$ then there exists a solution $\xi$ of the martingale problem of Definition 5.2 such that $\eta=\kappa\circ\xi$ has the same distribution on $D_{\mathcal{M}_F(\mathbb{R}^d)}[0,\infty)$ as $\tilde\eta$. Furthermore, $\xi_t$ is conditionally Poisson given $\mathcal{F}_t^\eta$ in the sense of (5.13).

Now we can present the main convergence theorem that is analogous to Theorem 2.10 for the population process.

Theorem 5.4

Let $\xi_t^N$ satisfy Definition 5.1 and assume that as $N\to\infty$, $\theta\to\infty$ in such a way that $\theta/N\to\alpha$. Let $\eta_0^N=\kappa^N\circ\xi_0^N$ and suppose also that $\eta_0^N\to\eta_0$ in $\mathcal{M}_F(\mathbb{R}^d)$, and that for each $N$, $\xi_0^N$ is conditionally uniform given $\eta_0^N$ in the sense of (5.12). Then, $(\xi_t^N)_{t\ge0}$ has a subsequence which converges in distribution as $N\to\infty$ to a measure-valued process $(\xi_t)_{t\ge0}$ with $\xi_t$ conditionally Poisson given $\eta_t=\kappa\circ\xi_t$ for each $t$ in the sense of (5.13), that is a solution to the martingale problem of Definition 5.2.

Both results are proved in Section 8.

5.2. Explicit construction of lines of descent

The main interest in using a lookdown construction for our population processes is that it allows us to retain information about the relatedness of individuals as we pass to the infinite population limit. In order to exploit this, in this section we write down stochastic equations for the locations and levels of individuals in the prelimiting lookdown model. We will then be able to pass to the scaling limit. This provides an explicit description of the solution to the limiting martingale problem of Definition 5.2 which will enable us to identify all individuals in the current population that are descendants of a given ancestor at time zero. In theory at least, this allows us to recover all the information about genealogies relating individuals sampled from the present day population. This idea draws on the notion of “tracers”, popular in statistical physics and used in population genetics by a number of authors including Hallatschek and Nelson [2008], Durrett and Fan [2016], and Biswas et al. [2021].

We will construct the process using a Ulam-Harris indexing scheme. First, we assign each individual alive at time 0 a unique label from $\mathbb{N}$. Suppose an individual with label $a$ and level $u$ reproduces, and as a result there are two individuals, one with level $u$ and one with a new level $u_1>u$. The parent individual, previously labeled $a$, might be assigned either level. We will track chains of descendant individuals forwards through time by following levels, rather than individuals, and will call this a line of descent. So, after reproduction, we give a new label to only the individual that is given the new level $u_1$, retaining the label $a$ for the individual with the old level $u$. In this way, at each birth event, a unique label is assigned to the resulting individual with the higher level, and the label of an individual may change throughout its lifetime.

Concretely, then: for each label $a$ in $\bigcup_{k\ge1}\mathbb{N}^k$, let $\Pi_a$ be an independent Poisson process on $[0,\infty)^2\times\mathbb{R}^d\times\{0,1\}$. The mean measure of each $\Pi_a$ is a product of Lebesgue measure on $[0,\infty)^2$, the density of the standard Gaussian on $\mathbb{R}^d$, and $(\delta_0+\delta_1)/2$ on $\{0,1\}$. It will also be convenient to suppose that for each label $a$ we have an enumeration of the points in $\Pi_a$, so we may refer to “the $j$th point in $\Pi_a$”, although the precise order of this enumeration is irrelevant. If $(\tau,v,z,\kappa)$ is the $j$th point in $\Pi_a$, then $\tau$ will determine a possible birth time, $v$ will determine the level of the offspring, $z$ will determine the spatial displacement of the offspring relative to the parent, $\kappa$ will be used to determine whether parent or offspring is assigned the new level, and the new label produced will be $aj$, i.e., the label $a$ with $j$ appended (so, if $a=(a_1,\ldots,a_k)$ then $aj=(a_1,\ldots,a_k,j)$). Each label $a$ has a birth time $\tau_a$, when it is first assigned, and a (possibly infinite) death time $\sigma_a$, when its level first hits $N$. For any $\tau_a\le t\le\sigma_a$ we denote by $X_a(t)$ and $U_a(t)$ the spatial location and level of the individual carrying label $a$ at time $t$, respectively. Furthermore, define

\[
\eta_t^N=\frac1N\sum_{a:\,\tau_a\le t<\sigma_a}\delta_{X_a(t)}\qquad\text{and}\qquad\xi_t^N=\sum_{a:\,\tau_a\le t<\sigma_a}\delta_{(X_a(t),U_a(t))}.
\]
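The Ulam-Harris labels used here are finite sequences of natural numbers, which map naturally onto tuples. The following is a minimal sketch of that bookkeeping; the function names are hypothetical, and labels track levels rather than individuals, as described above.

```python
def child_label(a, j):
    """Label a with j appended: the label created at the j-th point of Pi_a."""
    return a + (j,)

def ancestor_labels(a):
    """All labels along the chain from the founding label down to a itself."""
    return [a[:k] for k in range(1, len(a) + 1)]

def is_descendant(a, ancestor):
    """True if label a was created along the line started by `ancestor`
    (a label counts as a descendant of itself)."""
    return len(a) >= len(ancestor) and a[:len(ancestor)] == ancestor
```

For instance, the label `(3, 1, 2)` was created at the second point of the Poisson process attached to `(3, 1)`, which in turn was created at the first point attached to the founding label `(3,)`.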

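Now, since we have defined labels so that the level does not jump, $U_a$ satisfies (5.5) for $\tau_a\le t\le\sigma_a$, i.e.,

\[
U_a(t)=U_a(\tau_a)+\int_{\tau_a}^t\left(c_\theta(X_a(s),\eta_s)U_a(s)^2-b_\theta(X_a(s),\eta_s)U_a(s)\right)ds, \tag{5.14}
\]

and, of course, $\sigma_a=\inf\{t\ge\tau_a:U_a(t)>N\}$.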

Potential reproduction events occur at times $\tau$ for each point $(\tau,v,z,\kappa)\in\Pi_a$ with $\tau_a\le\tau<\sigma_a$. (We say “potential” since if the level of the resulting offspring is greater than $N$, the event does not happen.) If this is the $j$th point in $\Pi_a$, the potential new label is $aj$, the birth time is $\tau_{aj}=\tau$, and the spatial displacement of the potential offspring is $y(X_a(\tau-),z)$, where

\[
y(x,z):=\frac1\theta b(x)+\frac{1}{\sqrt\theta}K(x)z,
\]

and $K(x)K^T(x)=C(x)$.
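The displacement map can be sketched in two dimensions as follows. This is an illustration under stated assumptions: $K$ is taken to be the lower-triangular Cholesky factor of $C$ (any $K$ with $KK^T=C$ would do), $b$ and $C$ are passed in already evaluated at $x$, and the function names are hypothetical.

```python
import math

def cholesky_2x2(c):
    """Lower-triangular K with K K^T = c, for a symmetric positive-definite 2x2 matrix."""
    k11 = math.sqrt(c[0][0])
    k21 = c[1][0] / k11
    k22 = math.sqrt(c[1][1] - k21 * k21)
    return [[k11, 0.0], [k21, k22]]

def displacement(b, c, z, theta):
    """y(x, z) = b(x)/theta + K(x) z / sqrt(theta), with b and c evaluated at x."""
    k = cholesky_2x2(c)
    kz = [k[0][0] * z[0] + k[0][1] * z[1], k[1][0] * z[0] + k[1][1] * z[1]]
    return [b[i] / theta + kz[i] / math.sqrt(theta) for i in range(2)]
```

Feeding in a standard Gaussian vector `z` (for example, two calls to `random.gauss(0.0, 1.0)`) yields a displacement with mean $b/\theta$ and covariance $C/\theta$.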

Next we must choose the new level created at the birth event. We would like an individual with level $u$ and at spatial position $x$ to produce offspring at $y$ at instantaneous rate

\[
2\left(1-\frac{u}{N}\right)\theta\gamma(x,\eta)r(x+y,\eta). \tag{5.15}
\]

To do this we will associate the point $(\tau,v,z,\kappa)\in\Pi_a$ with level $u+v\ell$, where $\ell$ is chosen so that the rate of appearance of points in $\Pi_a$ with level below $N$, that is points with $v\ell<N-u$, is given by (5.15). Since the mean measure of $\Pi_a$ is Lebesgue measure in the $t$ and $v$ directions, we must take

\[
\ell(x,y,\eta)=\frac{N-u}{2(1-u/N)\theta\gamma(x,\eta)r(x+y,\eta)}=\frac{1}{2N^{-1}\theta\gamma(x,\eta)r(x+y,\eta)}, \tag{5.16}
\]

and, using this, the (potential) new level is

\[
U_{aj}(\tau)=U_a(\tau)+v\,\ell\left(X_a(\tau-),y(X_a(\tau-),z),\eta_{\tau-}\right).
\]

If $U_{aj}(\tau)<N$, the new individual labeled $aj$ is produced, and $\kappa$ determines which label, $a$ or $aj$, is associated with the new location, so

\[
X_{aj}(\tau)=X_a(\tau-)+(1-\kappa)\,y(X_a(\tau-),z).
\]

On the other hand if $U_{aj}(\tau)\ge N$, then $X_a$ is unchanged and $X_{aj}$ is undefined, so

\[
X_a(\tau)=X_a(\tau-)+\kappa\,y(X_a(\tau-),z)\,\mathbf{1}_{\{U_{aj}(\tau)<N\}}. \tag{5.17}
\]

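Recall that the parental individual always retains their spatial location, so that, consistent with (5.17), $\kappa=1$ corresponds to the parent being assigned a new level, and our line of descent switching to the offspring. Combining these observations, $X_a$, for $\tau_a\le t<\sigma_a$, solves the equation

\[
X_a(t)=X_a(\tau_a)+\int_{[\tau_a,t]\times[0,\infty)\times\mathbb{R}^d\times\{0,1\}}y(X_a(\tau-),z)\,\kappa\,\mathbf{1}_{\left\{U_a(\tau)+v\,\ell\left(X_a(\tau-),y(X_a(\tau-),z),\eta_{\tau-}\right)<N\right\}}\,d\Pi_a(\tau,v,z,\kappa),
\]

where $\ell$ is the scaling factor of (5.16).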

Although we have described the evolution of a line of descent only for a given label (i.e., for $\tau_a\le t<\sigma_a$), we can extend the definition to times $0\le t<\sigma_a$ by setting $X_a(t)$ equal to $X_{[a]_t}(t)$, where $[a]_t$ is the label of the ancestor of label $a$ alive at time $t$, and similarly for $U_a(t)$. It is then straightforward, albeit tedious, to write down the time evolution of $(X_a(t),U_a(t))$ for all time back to $t=0$ in terms of the driving Poisson processes.

Remark 5.5

Although we have a single construction that couples the processes across all $N$, unlike in Kurtz and Rodrigues [2011] the actual trajectories, $X_a(\cdot)$, do not necessarily coincide for different values of $N$, since they are affected by the whole population process. However, this does suggest approximating the genealogies in the infinite density limit by simulating up until a sufficiently high level that we have a good approximation to the population process.

5.3. Limiting processes for lines of descent

The previous section constructed the lookdown process using the same underlying Poisson processes $\{\Pi_a\}_a$ for different values of $N$. As a result, if the spatial projections $\eta$ converge, then individual lines of descent converge pointwise as $N\to\infty$. To see this, first note that if the Poisson processes are fixed then the set of events with which a given label $a$ is associated is also fixed: this is the sequence $\{(\tau_k,v_k,z_k,\kappa_k)\}$ associated with the label $a$. To conclude that the lines of descent converge, first, we clearly need that the spatial projections $\eta$ converge. Supposing that they do, consider how a line of descent $(X_a(t),U_a(t))$ evolves. It throws off a new line of descent at a higher level when there is a point $(\tau,v,z,\kappa)$ in $\Pi_a$ with $\tau>\tau_a$ and

\[
v<2\,\frac{N-U_a(\tau)}{N}\,\theta\,\gamma\left(X_a(\tau-),\eta_{\tau-}\right)r\left(X_a(\tau-)+y(X_a(\tau-),z),\eta_{\tau-}\right). \tag{5.18}
\]

Since the mean measure of the $v$ coordinate is Lebesgue measure, $\theta/N\to\alpha$, and $q_\theta(x,dy)\to\delta_x(dy)$, this corresponds in the limit to new lines of descent being thrown off according to a Poisson process with intensity

\[
2\alpha\gamma\left(X_a(t),\eta_t\right)r\left(X_a(t),\eta_t\right)dt\times du.
\]

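Now consider the location of the line of descent: at each birth event, with probability one half the line of descent jumps to $X_a(t)+y$. Taking $g$ to be a suitable test function on $\mathbb{R}^d$, and rewriting (5.18), when the level is $u$ and the state of the population is $\eta$, the generator of the spatial motion of the line of descent applied to $g(x)$ is

\[
\begin{aligned}
&\left(1-\frac{u}{N}\right)\gamma(x,\eta)\,\theta\int_{\mathbb{R}^d}r(x+y,\eta)\left(g(x+y)-g(x)\right)q_\theta(x,dy)\\
&\quad=\left(1-\frac{u}{N}\right)\gamma(x,\eta)\left\{\theta\int_{\mathbb{R}^d}\left(r(x+y,\eta)g(x+y)-r(x,\eta)g(x)\right)q_\theta(x,dy)-\theta\int_{\mathbb{R}^d}\left(r(x+y,\eta)-r(x,\eta)\right)g(x)\,q_\theta(x,dy)\right\}\\
&\quad\to\gamma(x,\eta)\left(\mathcal{B}(rg)(x)-g(x)\,\mathcal{B}r(x)\right),\qquad\text{as }N,\theta\to\infty.
\end{aligned}
\]

Notice that the factors of 2 have cancelled, and that the result is independent of $u$. Also recall that $r(x,\eta)$ depends on $\eta$ only through $\rho_r*\eta(x)$, which is guaranteed to be smooth, so that $\mathcal{B}(r)$ and $\mathcal{B}(gr)$ are well-defined.

We write out the differential operator above in more detail. Recall that $\mathcal{B}g(x)=\sum_ib_i\partial_ig(x)+\sum_{ij}C_{ij}\partial_{ij}g(x)$, and for the moment write $r(x)$ for $r(x,\eta)$, $b(x)=b$, and $C(x)=C$ so that

\[
\begin{aligned}
\mathcal{B}(rg)(x)-g(x)\,\mathcal{B}(r)(x)&=r(x)\sum_ib_i\partial_ig(x)+2\sum_{ij}\partial_ir(x)C_{ij}\partial_jg(x)+r(x)\sum_{ij}C_{ij}\partial_{ij}g(x)\\
&=r(x)\left\{\left(b+2C\nabla\log r(x)\right)\cdot\nabla g(x)+\sum_{ij}C_{ij}\partial_{ij}g(x)\right\}.
\end{aligned}
\tag{5.19}
\]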

The only thing that remains is to describe how the levels change, but this is immediate from applying limit (5.4) to equation (5.5).
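The identity (5.19) can be sanity-checked numerically in one dimension, where the operator reads $\mathcal{B}h=b\,h'+C\,h''$. The sketch below is an illustration only: the function names, the finite-difference step, and the particular choices of $r$ and $g$ in the usage note are assumptions. It compares $\mathcal{B}(rg)-g\,\mathcal{B}r$, computed by central differences, with the right-hand side computed from exact derivatives.

```python
import math

def b_op(h, x, b, c, step=1e-4):
    """One-dimensional B h(x) = b h'(x) + c h''(x), by central differences."""
    d1 = (h(x + step) - h(x - step)) / (2 * step)
    d2 = (h(x + step) - 2 * h(x) + h(x - step)) / step**2
    return b * d1 + c * d2

def lhs(x, b, c, r, g):
    """B(rg)(x) - g(x) B(r)(x), with both applications of B approximated numerically."""
    return b_op(lambda s: r(s) * g(s), x, b, c) - g(x) * b_op(r, x, b, c)

def rhs(x, b, c, r, dr, dg, d2g):
    """r(x) { (b + 2 c (log r)'(x)) g'(x) + c g''(x) }, using exact derivatives."""
    return r(x) * ((b + 2 * c * dr(x) / r(x)) * dg(x) + c * d2g(x))
```

For example, with $r(x)=1+x^2$ and $g(x)=\sin x$, the two sides agree up to finite-difference error at any point $x$.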

We summarize the results in a proposition.

Proposition 5.6 (Line of descent construction)

Define $J(x,\eta)$ and $\beta(x,\eta)$ by

\[
r(x,\eta)\gamma(x,\eta)C(x)=J(x,\eta)J(x,\eta)^T,
\]
\[
\beta(x,\eta)=r(x,\eta)\gamma(x,\eta)\left(b(x)+2C(x)\nabla\log r(x,\eta)\right).
\]

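Associate with each label $a\in\bigcup_{k\ge1}\mathbb{N}^k$ an independent $d$-dimensional Brownian motion $W_a$ and an independent Poisson process $R_a$ on $[0,\infty)^2$ with Lebesgue mean measure, and with points ordered in some way. Given $\eta_0\in\mathcal{M}_F(\mathbb{R}^d)$, let $\{(x_i,u_i)\}$ be the points of a Poisson process on $\mathbb{R}^d\times[0,\infty)$ with mean measure $\eta_0\times\lambda$ (the product of $\eta_0$ and Lebesgue measure). For each $i$, begin a line of descent with label $i$, location $X_i(0)=x_i$, level $U_i(0)=u_i$, and birth time $\tau_i=0$.

Write $\tau_a$ for the birth time of the label $a$ and $\sigma_a=\lim_{u_0\to\infty}\inf\{t\ge0:U_a(t)>u_0\}$ for the time the level hits $\infty$. Suppose that the spatial locations and level of each line of descent $a$ solve, for $\tau_a\le t<\sigma_a$,

\[
\begin{aligned}
X_a(t)&=X_a(\tau_a)+\int_{\tau_a}^t\beta\left(X_a(s),\eta_s\right)ds+\int_{\tau_a}^tJ\left(X_a(s),\eta_s\right)dW_a(s),\\
U_a(t)&=U_a(\tau_a)+\int_{\tau_a}^t\left(\alpha\gamma\left(X_a(s),\eta_s\right)r\left(X_a(s),\eta_s\right)U_a(s)^2-\left\{\gamma\left(X_a(s),\eta_s\right)\mathcal{B}r\left(X_a(s),\eta_s\right)+F\left(X_a(s),\eta_s\right)\right\}U_a(s)\right)ds.
\end{aligned}
\tag{5.20}
\]

Each point in each $R_a$ denotes a potential birth time for $a$: if the $j$th point in $R_a$ is $(\tau,v)$, with $\tau_a\le\tau<\sigma_a$, then a new line of descent with label $aj$ is produced, with birth time $\tau_{aj}=\tau$, location $X_{aj}(\tau)=X_a(\tau)$, and level

\[
U_{aj}(\tau)=U_a(\tau)+\frac{v}{2\alpha\gamma\left(X_a(\tau),\eta_\tau\right)r\left(X_a(\tau),\eta_\tau\right)},
\]

if this is finite. For any solution to the equations above, the processes defined by

\[
\eta_t=\lim_{u_0\to\infty}\frac{1}{u_0}\sum_{a:\,\tau_a\le t<\sigma_a;\,U_a(t)<u_0}\delta_{X_a(t)}\qquad\text{and}\qquad\xi_t=\sum_{a:\,\tau_a\le t<\sigma_a}\delta_{(X_a(t),U_a(t))}
\]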

are solutions to the martingale problems of Theorems 2.10 and 5.4, respectively.

In particular, note that if α=0, no new lines of descent are produced. More precisely, comparing with (5.16), they are produced, but “at infinity”, and their trace is seen in the spatial motion of the line of descent which results from the production of these lineages.

Proof [of Proposition 5.6]

The fact that a solution to the system of equations (5.20) is a solution to the martingale problem of Theorem 5.4 is an application of Itô’s formula. Furthermore, in Proposition 5.3 we showed that the conditional Poisson property of $\xi_0$ is preserved (i.e., holds for $\xi_t$ for all $t$), and so $(\eta_t)_{t\ge0}$ is well-defined, and furthermore that $\eta_t$ is a solution to the martingale problem of Theorem 2.10. □

Proofs of the remainder of these results are in Section 8.

Remark 5.7

The process we consider is similar to the state-dependent branching processes of Kurtz and Rodrigues [2011], so one might expect that the proofs there would carry over with little change. However, there is an important difference: Recall that the level $U_a(t)$ of a line of descent evolves as

\[
\dot u=c_\theta(x,\eta)u^2-b_\theta(x,\eta)u, \tag{5.21}
\]

where $b_\theta(x,\eta)$ and $c_\theta(x,\eta)$ are defined in (5.3) and (5.2) respectively. Note that $c_\theta(x,\eta)\ge0$, while $b_\theta(x,\eta)$ may take either sign. Assumptions 2.8 imply that $c_\theta(x,\eta)$ is bounded, while $b_\theta(x,\eta)$, because of $F(x,\eta)$, is bounded above but not necessarily below. In Kurtz and Rodrigues [2011], $b_\theta$ was bounded above and $c_\theta$ was bounded away from zero, so they noted that if $U_a(t)\ge b_\theta/c_\theta$ for some label $a$, that line of descent would only move upwards from that time onwards. Furthermore, coefficients did not depend on the state of the process (i.e., on $\eta$), thus allowing the processes to be jointly and simultaneously constructed for all values of $N$, with a pointwise embedding of $(\xi_t^N)_{t>0}$ within $(\xi_t^M)_{t>0}$ for $b_\theta/c_\theta<N<M$. In other words, individuals with levels above $N>b_\theta/c_\theta$ at time $t_0$ do not affect $(\xi_t^N)_{t\ge t_0}$, thus allowing a comparison of the number of lines of descent below level $u_0$ to a branching process. Although we have provided a joint construction of $\xi^N$ for all $N$ in Section 5.2, it does not have this monotonicity: for one thing, $b_\theta$ and $c_\theta$ depend on the population process $\eta$ and so all individuals can affect all other ones (even those with lower levels). Furthermore, in the deterministic case $\theta/N$, and hence $c_\theta$, converges to zero, and so lines of descent with arbitrarily high level may drift back downwards. Indeed, this must be the case if the population persists, since in the deterministic case there is no branching.

6. Proofs of convergence for nonlocal models

In this section we present formal proofs of the first two of our three scaling limits. In Subsection 6.2 we prove Theorem 2.10, to obtain (both stochastic and deterministic) limits in which interactions between individuals in the population are nonlocal. In Subsection 6.3 we show how, in two important examples in which the nonlocal limit is respectively a deterministic solution to a non-local equation of reaction-diffusion type and a deterministic solution to a nonlocal porous medium equation with an additional logistic growth term, one can pass to a further limit to obtain a classical PDE.

6.1. Preliminaries

Below we will have frequent use for the quantity

$$B_f^\theta(x,\eta)=\theta\int_{\mathbb{R}^d}\big(f(y)r(y,\eta)-f(x)r(x,\eta)\big)\,q_\theta(x,dy). \tag{6.1}$$

First, we prove Lemma 2.9.

Proof [of Lemma 2.9]

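Here, we need to prove that $\gamma(x,\eta)B_f^\theta(x,\eta)$ is bounded, uniformly over $x$ and $\eta$. First suppose that Condition 1 of Lemma 2.9 is satisfied. We write

$$\begin{aligned}
r(y,\eta)f(y)-r(x,\eta)f(x)
&=r(y,\eta)\big(f(y)-f(x)\big)+\big(r(y,\eta)-r(x,\eta)\big)f(x)\\
&=r(y,\eta)\Big(\sum_i(y-x)_i\,\partial_{x_i}f(x)+\sum_{ij}(y-x)_i(y-x)_j\,\partial_{x_ix_j}f(z_1)\Big)\\
&\quad+f(x)\Big(\sum_i(y-x)_i\,\partial_{x_i}r(x,\eta)+\sum_{ij}(y-x)_i(y-x)_j\,\partial_{x_ix_j}r(z_2,\eta)\Big)\\
&=\Big(r(x,\eta)+\sum_j(y-x)_j\,\partial_{x_j}r(z_3,\eta)\Big)\sum_i(y-x)_i\,\partial_{x_i}f(x)
+r(y,\eta)\sum_{ij}(y-x)_i(y-x)_j\,\partial_{x_ix_j}f(z_1)\\
&\quad+f(x)\Big(\sum_i(y-x)_i\,\partial_{x_i}r(x,\eta)+\sum_{ij}(y-x)_i(y-x)_j\,\partial_{x_ix_j}r(z_2,\eta)\Big),
\end{aligned}$$

for some $z_i=\kappa_ix+(1-\kappa_i)y$ with $\kappa_i\in[0,1]$. Integrating this against $q_\theta(x,dy)$, we get that

$$\begin{aligned}
\Big|\theta\int\big(r(y,\eta)f(y)-r(x,\eta)f(x)\big)q_\theta(x,dy)\Big|
&\le\Big|\sum_i\big(r(x,\eta)\,\partial_{x_i}f(x)+f(x)\,\partial_{x_i}r(x,\eta)\big)\,\theta\int(y-x)_i\,q_\theta(x,dy)\Big|\\
&\quad+|f(x)|\,\Big|\theta\int\sum_{ij}\partial_{x_ix_j}r(z_2,\eta)\,(y-x)_i(y-x)_j\,q_\theta(x,dy)\Big|\\
&\quad+\Big|\theta\int\sum_{ij}(y-x)_i(y-x)_j\big(\partial_{x_i}f(x)\,\partial_{x_j}r(z_3,\eta)+r(y,\eta)\,\partial_{x_ix_j}f(z_1)\big)q_\theta(x,dy)\Big|.
\end{aligned}$$

Since $q_\theta(x,dy)$ is the density of a Gaussian with mean $b(x)/\theta$ and covariance $C(x)/\theta$, and both $b(x)$ and $C(x)$ are uniformly bounded, $\theta\int(y-x)_i\,q_\theta(x,dy)$ is bounded as well. Furthermore, a change of variables that diagonalizes $C(x)$ shows, for any $g:\mathbb{R}^d\to\mathbb{R}^{d\times d}$, that if $C_g=\sup_y\sup_{\|z\|=1}\sum_{ij}g(y)_{ij}z_iz_j$ and $\lambda^*=\sup_y\sup_{\|z\|=1}\sum_{ij}C(y)_{ij}z_iz_j$ then

$$\theta\int\sum_{ij}g(y)_{ij}(y-x)_i(y-x)_j\,q_\theta(x,dy)\le C_g\lambda^*.$$

Condition 1 gives uniform bounds on the derivatives of $r(x,\eta)=r(x,\rho_r*\eta(x))$ in this expression and so, provided $f$ also has uniformly bounded first and second derivatives, we have a bound of the form

$$\big|B_f^\theta\big|\le K_1+K_2|f(x)|,$$

for suitable constants $K_1,K_2$ that depend only on the derivatives of $f$.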

Now suppose instead that Condition 2 of Lemma 2.9 is satisfied. First note that

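$$\begin{aligned}
\big|B_f^\theta\big|&=\Big|\theta\int_{\mathbb{R}^d}\big(f(y)r(y,\rho_r*\eta(y))-f(x)r(x,\rho_r*\eta(x))\big)q_\theta(x,dy)\Big|\\
&\le\Big|\theta\int_{\mathbb{R}^d}\big(f(y)r(y,\rho_r*\eta(y))-f(x)r(x,\rho_r*\eta(y))\big)q_\theta(x,dy)\Big|\\
&\quad+\Big|\theta\int_{\mathbb{R}^d}\big(f(x)r(x,\rho_r*\eta(y))-f(x)r(x,\rho_r*\eta(x))\big)q_\theta(x,dy)\Big|.
\end{aligned}\tag{6.2}$$

(Note that the extra term introduced here, $r(x,\rho_r*\eta(y))$, has the two arguments to $r$ “at different locations”, contrary to the usual pattern.)

Writing $K_3=\sup_{x,m}\max_i\big|\partial_{x_i}\big(f(x)r(x,m)\big)\big|$ and $K_4=\sup_{x,m}\max_{i,j}\big|\partial_{x_ix_j}\big(f(x)r(x,m)\big)\big|$, the first term is bounded exactly as above. For the second,

$$r(x,\rho_r*\eta(y))-r(x,\rho_r*\eta(x))=\big(\rho_r*\eta(y)-\rho_r*\eta(x)\big)\,r'(x,\rho_r*\eta(x))+\tfrac12\big(\rho_r*\eta(y)-\rho_r*\eta(x)\big)^2\,r''(x,m),$$

where $m=\kappa\,\rho_r*\eta(x)+(1-\kappa)\,\rho_r*\eta(y)$ for some $0\le\kappa\le1$, and we have used $r'$ and $r''$ to denote the first and second derivatives of $r(x,m)$ with respect to the second argument. So, writing $K_5=\|r'\|_\infty$ and $K_6=\|r''\|_\infty$, the second term in (6.2) is bounded by $|f(x)|$ multiplied by

$$K_5\Big|\theta\int_{\mathbb{R}^d}\big(\rho_r*\eta(y)-\rho_r*\eta(x)\big)q_\theta(x,dy)\Big|+K_6\,\theta\int_{\mathbb{R}^d}\big(\rho_r*\eta(y)-\rho_r*\eta(x)\big)^2q_\theta(x,dy).$$

Under Condition 2 of Assumptions 2.8, this is bounded by a constant times $\rho_\gamma*\eta(x)+(\rho_\gamma*\eta(x))^2$, and $\sup_xm^2\gamma(x,m)$ is bounded. Therefore $|\gamma(x,\eta)B_f^\theta(x,\eta)|\le K_7+K_8|f(x)|$, where $K_7$ comes from $K_3$, $K_4$, and the supremum of $\gamma$, while $K_8$ comes from $K_5$, $K_6$, and the supremum of $m^2\gamma(x,m)$. □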

6.2. Proof of Theorem 2.10: convergence for the nonlocal process

In this section we prove Theorem 2.10. This would be implied by convergence of the lookdown process (see Kurtz and Rodrigues [2011] and Etheridge and Kurtz [2019]); however in our setting, because the parameters in the lookdown process depend on the empirical distribution, we actually use tightness of the sequence of population processes in the proofs of tightness for the corresponding lookdown processes.

Proof [of Theorem 2.10]

The proof follows a familiar pattern. First we extend $\mathbb{R}^d$ to its one-point compactification $\overline{\mathbb{R}}^d$ and establish, in Lemma 6.2, compact containment of the sequence of scaled population processes in $\mathcal{M}_F(\overline{\mathbb{R}}^d)$ (for which, since we have compactified $\mathbb{R}^d$, it suffices to consider the sequence of total masses). Armed with this, tightness of the population processes in $\mathcal{D}_{[0,\infty)}(\mathcal{M}_F(\overline{\mathbb{R}}^d))$ follows from tightness of the real-valued processes $(H(\eta_t))_{t\ge0}$ for a sufficiently large class of test functions $H$, which we establish through an application of the Aldous-Rebolledo criterion in Lemma 6.3. These ingredients are gathered together in Proposition 6.4 to deduce tightness of the scaled population processes in the larger space $\mathcal{D}_{[0,\infty)}(\mathcal{M}_F(\overline{\mathbb{R}}^d))$.

We then characterise limit points as solutions to a martingale problem in Lemma 6.6; finally in Lemma 6.7 we check that in the process of passing to the limit, no mass ‘escaped to infinity’, so that in fact the limit points take values in $\mathcal{D}_{[0,\infty)}(\mathcal{M}_F(\mathbb{R}^d))$. □

As advertised, we work with the one-point compactification of $\mathbb{R}^d$ and consider $(\eta^N_t)_{t\ge0}$ as a sequence of $\mathcal{M}_F(\overline{\mathbb{R}}^d)$-valued processes. Since, for each $K>0$, $\{\eta:\langle1,\eta\rangle\le K\}$ is a compact set in $\mathcal{M}_F(\overline{\mathbb{R}}^d)$, we shall focus on controlling $(\langle1,\eta^N_t\rangle)_{t\ge0}$. The key is that Assumptions 2.8 are precisely chosen to guarantee boundedness of the net per-capita reproduction rate.

Lemma 6.1

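Under Assumptions 2.8, for all $f\in C_b^2(\mathbb{R}^d)$ with uniformly bounded first and second derivatives, and all $T>0$, there exists a $C=C(f,T)<\infty$, independent of $N$, such that

$$\mathbb{E}\big[\langle f,\eta^N_t\rangle\big]\le C\,\mathbb{E}\big[\langle1,\eta^N_0\rangle\big]\tag{6.3}$$

for all $N\ge1$.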

Proof

Consider the semimartingale decomposition from equation (2.3):

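$$\langle f,\eta^N_t\rangle=\langle f,\eta^N_0\rangle+\int_0^t\int_{\mathbb{R}^d}\Big(\gamma(x,\eta^N_s)B_f^\theta(x,\eta^N_s)+f(x)F(x,\eta^N_s)\Big)\eta^N_s(dx)\,ds+M^N_t(f),\tag{6.4}$$

where $M^N_t(f)$ is a martingale and $B_f^\theta$ is defined in (6.1). First note that Condition 6 of Assumptions 2.8 stipulates that $\gamma B_f^\theta$ is uniformly bounded by a constant times $1+f$, and so, recalling that $F$ is bounded above, we conclude that under Assumptions 2.8 $\gamma(x,\eta)B_f^\theta(x,\eta)+f(x)F(x,\eta)\le C_f(1+f(x))$ for some $C_f$.

Now, taking expectations in (6.4),

$$\mathbb{E}\big[\langle f,\eta^N_t\rangle\big]\le\mathbb{E}\big[\langle f,\eta^N_0\rangle\big]+C_f\int_0^t\mathbb{E}\big[\langle1+f,\eta^N_s\rangle\big]\,ds.\tag{6.5}$$

The bound (6.3) then follows by first applying Gronwall’s inequality in the case $f=1$, which yields

$$\mathbb{E}\big[\langle1,\eta^N_t\rangle\big]\le e^{Ct}\,\mathbb{E}\big[\langle1,\eta^N_0\rangle\big],$$

with $C$ independent of $N$, and then substituting the resulting bound on $\mathbb{E}[\langle1,\eta^N_s\rangle]$ into the expression above. □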
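The Gronwall step used here can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it iterates the equality case $u(t)=a+c\int_0^tu(s)\,ds$ on a grid using left Riemann sums and compares the result with the bound $a\,e^{ct}$. The function name `gronwall_check` and the parameter values are illustrative assumptions.

```python
import math


def gronwall_check(a, c, t_max, n_steps):
    """Iterate u(t) = a + c * int_0^t u(s) ds with left Riemann sums
    and compare against the Gronwall bound a * exp(c * t).

    Returns (u(t_max), a * exp(c * t_max), worst ratio u / bound on the grid).
    All names and values are hypothetical; this only illustrates the inequality.
    """
    h = t_max / n_steps
    u = a            # u(0) = a
    integral = 0.0   # running left Riemann sum of u
    worst_ratio = 0.0
    for n in range(1, n_steps + 1):
        integral += h * u          # add u at the left endpoint of the step
        u = a + c * integral       # equality case of the integral inequality
        bound = a * math.exp(c * n * h)
        worst_ratio = max(worst_ratio, u / bound)
    return u, a * math.exp(c * t_max), worst_ratio


if __name__ == "__main__":
    u_end, bound_end, worst = gronwall_check(a=1.0, c=2.0, t_max=1.0, n_steps=1000)
    print(u_end, bound_end, worst)
```

On the grid the iterate equals $a(1+ch)^n$, which never exceeds $a\,e^{cnh}$, so the reported ratio stays below one and approaches one as the grid is refined.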

With a bound on per-capita net growth rate in hand, bounds on the expectation of the supremum of the total population size over a finite time interval also follow easily.

Lemma 6.2 (Compact containment for the population process)

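Under the assumptions of Theorem 2.10, for each $T>0$, there exists some constant $C_T$, independent of $N$, such that

$$\mathbb{E}\Big[\sup_{0\le t\le T}\langle1,\eta^N_t\rangle\Big]\le C_T\,\mathbb{E}\big[\langle1,\eta_0\rangle\big].\tag{6.6}$$

In particular, for any $\delta>0$, there exists $K_\delta>0$ such that

$$\limsup_{N\to\infty}\mathbb{P}\Big\{\sup_{s\in[0,T]}\langle1,\eta^N_s\rangle>K_\delta\Big\}\le\frac{C_T}{K_\delta}<\delta.\tag{6.7}$$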

Proof

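First note that by Lemma 6.1, $\mathbb{E}[\langle1,\eta^N_t\rangle]\le\mathbb{E}[\langle1,\eta^N_0\rangle]e^{Ct}$ for some $C$ (independent of $N$). Now, let $M^{N*}_t(f)=\sup_{0\le s\le t}M^N_s(f)$, and as before let $\langle M^N(f)\rangle_t$ be the angle bracket process of $M^N_t(f)$. The Burkholder-Davis-Gundy inequality says that there is a $K$ for which $\mathbb{E}[M^{N*}_t(1)]\le K\,\mathbb{E}\big[\sqrt{[M^N(1)]_t}\big]$, where $[M^N(1)]_t$ is the quadratic variation of $M^N(1)$. Furthermore, as discussed by Hernández-Hernández and Jacka [2022], the expectation of the quadratic variation of a local martingale is bounded by a (universal) constant multiple of the expectation of its angle bracket process [Barlow et al., 1986, Item (4.b’), Table 4.1, p. 162]. Now, since $\sqrt{x}\le1+x$, in the notation of Lemma 6.1, there is a $C$ such that

$$\begin{aligned}
\mathbb{E}\big[M^{N*}_t(1)\big]&\le C\Big(1+\mathbb{E}\big[\langle M^N(1)\rangle_t\big]\Big)\\
&=C\Big(1+\frac{\theta}{N}\,\mathbb{E}\Big[\int_0^t\Big\langle\gamma(x,\eta^N_s)\int_{\mathbb{R}^d}r(y,\eta^N_s)q_\theta(x,dy)+\mu_\theta(x,\eta^N_s),\eta^N_s(dx)\Big\rangle ds\Big]\Big)\\
&=C\Big(1+\mathbb{E}\Big[\int_0^t\Big\langle\frac{2\theta}{N}\gamma(x,\eta^N_s)r(x,\eta^N_s)+\frac{\gamma(x,\eta^N_s)}{N}B_1^\theta(x,\eta^N_s)-\frac1NF(x,\eta^N_s),\eta^N_s(dx)\Big\rangle ds\Big]\Big).
\end{aligned}$$

We have not assumed that $F$ is bounded below, but to see that the term involving $-F$ does not cause us problems, we rearrange equation (6.4) with $f=1$ to see that

$$\mathbb{E}\Big[\int_0^t\big\langle-F(x,\eta^N_s),\eta^N_s(dx)\big\rangle ds\Big]=\mathbb{E}\big[\langle1,\eta^N_0\rangle\big]-\mathbb{E}\big[\langle1,\eta^N_t\rangle\big]+\mathbb{E}\Big[\int_0^t\big\langle\gamma(x,\eta^N_s)B_1^\theta(x,\eta^N_s),\eta^N_s(dx)\big\rangle ds\Big],\tag{6.8}$$

which is bounded since $\gamma(x,\eta)$ and $B_1^\theta(x,\eta)$ are both bounded and $\langle1,\eta_t\rangle\ge0$. Since $\theta/N\to\alpha<\infty$, combining constants, we obtain that for some $C$,

$$\mathbb{E}\big[M^{N*}_t(1)\big]\le C+C\,\mathbb{E}\big[\langle1,\eta^N_0\rangle\big]e^{tC}.$$

Taking suprema and expectations on both sides of equation (6.4), then again using the fact that $\gamma(x,\eta)B_1^\theta(x,\eta)+F(x,\eta)\le C$,

$$\begin{aligned}
\mathbb{E}\Big[\sup_{0\le s\le T}\langle1,\eta^N_s\rangle\Big]
&\le\mathbb{E}\big[\langle1,\eta^N_0\rangle\big]+\mathbb{E}\Big[\sup_{0\le t\le T}\int_0^t\big\langle\gamma(x,\eta^N_s)B_1^\theta(x,\eta^N_s)+F(x,\eta^N_s),\eta^N_s(dx)\big\rangle ds\Big]+\mathbb{E}\big[M^{N*}_T(1)\big]\\
&\le\mathbb{E}\big[\langle1,\eta^N_0\rangle\big]+C\,\mathbb{E}\Big[\int_0^T\sup_{0\le s\le t}\langle1,\eta^N_s\rangle\,dt\Big]+C+C\,\mathbb{E}\big[\langle1,\eta^N_0\rangle\big]e^{TC}.
\end{aligned}$$

Once again applying Gronwall’s inequality,

$$\mathbb{E}\Big[\sup_{0\le s\le T}\langle1,\eta^N_s\rangle\Big]\le C\Big(1+\mathbb{E}\big[\langle1,\eta^N_0\rangle\big]\Big)e^{2TC}.$$

For any $T$, the quantity on the right is bounded above by a constant $C(T)$ independent of $N$. As a result, by Markov’s inequality, for any $K>0$,

$$\limsup_{N\to\infty}\mathbb{P}\Big\{\sup_{0\le s\le T}\langle1,\eta^N_s\rangle\ge K\Big\}\le\frac{C(T)}{K}.$$

□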

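Our next task is to show tightness of $(\langle f,\eta^N_t\rangle)_{t\ge0}$ for $f\in C_b(\overline{\mathbb{R}}^d)$.

Lemma 6.3 (Tightness of $(\langle f,\eta^N_t\rangle)_{t\ge0}$)

For each $f\in C_b(\overline{\mathbb{R}}^d)$, the collection of processes $(\langle f,\eta^N_t\rangle)_{t\ge0}$ for $N=1,2,\ldots$ is tight as a sequence of càdlàg, real-valued processes.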

Proof

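The Aldous-Rebolledo criterion (Theorem B.2), applied to the semimartingale representation of $\langle f,\eta^N_t\rangle$ in equation (6.4), tells us that it suffices to show that for each $T>0$, (a) for each fixed $0\le t\le T$, the sequence $\{\langle f,\eta^N_t\rangle\}_{N\ge1}$ is tight, and (b) for any sequence of stopping times $\tau_N$ bounded by $T$ (abbreviated to $\tau$ below), and for each $\nu>0$, there exist $\delta>0$ and $N_0>0$ such that

$$\sup_{N>N_0}\sup_{t\in[0,\delta]}\mathbb{P}\Big\{\Big|\int_\tau^{\tau+t}\int_{\mathbb{R}^d}\Big(\gamma(x,\eta^N_s)B_f^\theta(x,\eta^N_s)+f(x)F(x,\eta^N_s)\Big)\eta^N_s(dx)\,ds\Big|>\nu\Big\}<\nu,\tag{6.9}$$

and

$$\sup_{N>N_0}\sup_{t\in[0,\delta]}\mathbb{P}\Big\{\big|\langle M^N(f)\rangle_{\tau+t}-\langle M^N(f)\rangle_\tau\big|>\nu\Big\}<\nu.\tag{6.10}$$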

Tightness of f,ηtN for fixed t follows from Lemma 6.1 and Markov’s inequality, so we focus on the remaining conditions.

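The proof of Lemma 6.1 provides a uniform bound on $\gamma B_f^\theta$, but we only know that $F$ is bounded above. However, by assumption, for each fixed value of $m$, $\sup_{k\le m}|F(x,k)|$ is uniformly bounded as a function of $x$. Noting that $\rho_F*\eta\le\langle1,\eta\rangle\|\rho_F\|_\infty$, we can use Lemma 6.2 to choose $N_0$ and $K$ such that if $N>N_0$, then

$$\mathbb{P}\Big\{\sup_{0\le s\le T}\langle1,\eta^N_s\rangle\ge K\Big\}<\nu/2.$$

We now choose $\delta_1$ so that

$$\delta_1\|f\|_\infty\sup\Big\{\sup_x|F(x,k)|:k\le K\|\rho_F\|_\infty\Big\}<\nu/4,\qquad\sup_{x,\eta}\big|\gamma(x,\eta)B_f^\theta(x,\eta)\big|\,\delta_1<\nu/4,$$

so that (6.9) is satisfied with $\delta=\delta_1$.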

Similarly,

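$$\begin{aligned}
\langle M^N(f)\rangle_{\tau+t}-\langle M^N(f)\rangle_\tau
&=\int_\tau^{\tau+t}\frac{\theta}{N}\int_{\mathbb{R}^d}\Big\{\gamma(x,\eta^N_s)\int_{\mathbb{R}^d}f^2(y)r(y,\eta^N_s)q_\theta(x,dy)+\mu_\theta(x,\eta^N_s)f^2(x)\Big\}\eta^N_s(dx)\,ds\\
&=\int_\tau^{\tau+t}\frac{\theta}{N}\int_{\mathbb{R}^d}\Big\{2\gamma(x,\eta^N_s)f^2(x)r(x,\eta^N_s)+\frac{\gamma(x,\eta^N_s)B_{f^2}^\theta(x,\eta^N_s)-f^2(x)F(x,\eta^N_s)}{\theta}\Big\}\eta^N_s(dx)\,ds,
\end{aligned}$$

and so using the fact that $\theta/N\to\alpha<\infty$, an argument entirely analogous to that for (6.9) yields a $\delta_2$ for which (6.10) is satisfied. Taking $\delta=\min\{\delta_1,\delta_2\}$, the result follows. □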

We collect the implications of the last two lemmas into a proposition.

Proposition 6.4 (Tightness of $(\eta^N_t)_{t\ge0}$)

The collection of measure-valued processes $\{(\eta^N_t)_{t\ge0}:N\ge1\}$ is tight in $\mathcal{D}_{[0,\infty)}(\mathcal{M}_F(\overline{\mathbb{R}}^d))$.

Proof

Theorem 3.9.1 in Ethier and Kurtz [1986] says that if the collection of $E$-valued processes satisfies a compact containment condition (for any $\epsilon>0$ and $T>0$, there is a compact set such that the processes stay within that set up to time $T$ with probability at least $1-\epsilon$), then the collection is relatively compact (which is equivalent to tightness since we are working on a Polish space) if and only if $\{(f(\eta^N_t))_{t\ge0}:N\ge1\}$ is relatively compact for all $f$ in a dense subset of $C_b(E)$ under the topology of uniform convergence on compact sets.

Since $\{\nu:\langle1,\nu\rangle\le K\}$ is compact in $\mathcal{M}_F(\overline{\mathbb{R}}^d)$, Lemma 6.2 gives compact containment. Lemma 6.3 shows that the real-valued processes $\langle f,\eta^N_t\rangle$ are relatively compact for all $f\in C_b(\overline{\mathbb{R}}^d)$. Since by the Stone-Weierstrass theorem, the algebra of finite sums and products of terms of this form is dense in the space of bounded continuous functions on $\mathcal{M}_F(\overline{\mathbb{R}}^d)$, and tightness of $\langle f,\eta^N_t\rangle$ extends to sums and products of this form by Lemma B.3, we have relative compactness in $\mathcal{D}_{[0,\infty)}(\mathcal{M}_F(\overline{\mathbb{R}}^d))$. □

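We wish to characterise the limit points of $\{(\eta^N_t)_{t\ge0}\}_{N\ge1}$ as solutions to a martingale problem with generator $\mathcal{P}^\infty$, which we now identify. Most of the work was done in Section 4. First, we record an equivalent formulation of the martingale problems, which were essentially laid out in Subsection 4.1.

Lemma 6.5

For $G\in C^\infty(\mathbb{R})$ with $\|G'''\|_\infty<\infty$, and $f\in C_b(\overline{\mathbb{R}}^d)$, define the function $G_f$ by $G_f(\eta):=G(\langle f,\eta\rangle)$. Let $\mathcal{P}^N$ be the generator given by

$$\begin{aligned}
\mathcal{P}^NG_f(\eta):=\theta N\Big\langle&\gamma(x,\eta)\int\Big(G\big(\langle f,\eta\rangle+f(z)/N\big)-G\big(\langle f,\eta\rangle\big)\Big)r(z,\eta)\,q_\theta(x,dz)\\
&+\Big(G\big(\langle f,\eta\rangle-f(x)/N\big)-G\big(\langle f,\eta\rangle\big)\Big)\mu_\theta(x,\eta),\ \eta(dx)\Big\rangle.
\end{aligned}\tag{6.11}$$

The process $(\eta^N_t)_{t\ge0}$ of Definition 2.4 is the unique solution to the $(\mathcal{P}^N,\eta_0)$-martingale problem, i.e., for all such test functions,

$$M_t:=G_f(\eta^N_t)-G_f(\eta^N_0)-\int_0^t\mathcal{P}^NG_f(\eta^N_s)\,ds$$

is a martingale (with respect to the natural filtration).

Furthermore, let $\mathcal{P}^\infty$ be the generator given by

$$\mathcal{P}^\infty G_f(\eta):=G'(\langle f,\eta\rangle)\Big\langle\gamma(x,\eta)\,\mathcal{B}\big(f(\cdot)r(\cdot,\eta)\big)(x)+f(x)F(x,\eta),\ \eta(dx)\Big\rangle+\alpha\,G''(\langle f,\eta\rangle)\Big\langle\gamma(x,\eta)r(x,\eta)f^2(x),\ \eta(dx)\Big\rangle.\tag{6.12}$$

A process $(\eta_t)_{t\ge0}$ satisfies the martingale characterization of equations (2.7) and (2.8) if and only if it is a solution to the $(\mathcal{P}^\infty,\eta_0)$-martingale problem, i.e., if for all such test functions

$$M_t:=G_f(\eta_t)-G_f(\eta_0)-\int_0^t\mathcal{P}^\infty G_f(\eta_s)\,ds$$

is a martingale (with respect to the natural filtration).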

Lemma 6.6 (Characterisation of limit points)

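Suppose that $(\eta^N_0)_{N\ge1}$ converges weakly to $\eta_0$ as $N\to\infty$. Then any limit point of $\{(\eta^N_t)_{t\ge0}\}_{N\ge1}$ in $\mathcal{D}_{[0,\infty)}(\mathcal{M}_F(\overline{\mathbb{R}}^d))$ is a solution to the martingale problem for $(\mathcal{P}^\infty,\eta_0)$.

Proof

We use Theorem 4.8.2 in Ethier and Kurtz [1986]. First observe that the set of functions $\{G_f(\eta):=G(\langle f,\eta\rangle):G\in C^\infty(\mathbb{R}),\ \|G'''\|_\infty<\infty,\ f\in C_b(\overline{\mathbb{R}}^d)\}$ is separating on $\mathcal{M}_F(\overline{\mathbb{R}}^d)$. Therefore, it suffices to show that for any $t>0$ and $\tau>0$,

$$\lim_{N\to\infty}\mathbb{E}\Big[\Big(G_f(\eta^N_{t+\tau})-G_f(\eta^N_t)-\int_t^{t+\tau}\mathcal{P}^\infty G_f(\eta^N_s)\,ds\Big)\prod_{i=1}^kh_i(\eta^N_{t_i})\Big]=0\tag{6.13}$$

for all $k\ge0$, $0\le t_1<t_2<\cdots<t_k\le t<t+\tau$, and bounded continuous functions $h_1,\ldots,h_k$ on $\mathcal{M}_F(\overline{\mathbb{R}}^d)$. Since $(\eta^N_t)_{t\ge0}$ is Markov, the tower property gives that, for each $N$,

$$\mathbb{E}\Big[\Big(G_f(\eta^N_{t+\tau})-G_f(\eta^N_t)-\int_t^{t+\tau}\mathcal{P}^NG_f(\eta^N_s)\,ds\Big)\prod_{i=1}^kh_i(\eta^N_{t_i})\Big]=0.\tag{6.14}$$

Therefore, it suffices to show that

$$\lim_{N\to\infty}\mathbb{E}\Big[\int_t^{t+\tau}\Big(\mathcal{P}^NG_f(\eta^N_s)-\mathcal{P}^\infty G_f(\eta^N_s)\Big)ds\ \prod_{i=1}^kh_i(\eta^N_{t_i})\Big]=0,\tag{6.15}$$

and, again using the tower property, since the functions $h_i$ are bounded, this will follow if

$$\lim_{N\to\infty}\mathbb{E}\Big[\Big|\int_t^{t+\tau}\Big(\mathcal{P}^NG_f(\eta^N_s)-\mathcal{P}^\infty G_f(\eta^N_s)\Big)ds\Big|\ \Big|\ \mathcal{F}_t\Big]=0\tag{6.16}$$

(where $(\mathcal{F}_t)_{t\ge0}$ is the natural filtration).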

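We rewrite $\mathcal{P}^NG_f(\eta^N_s)$ using a Taylor series expansion up to third order for $G(\langle f,\eta\rangle\pm f(y)/N)$ around $G(\langle f,\eta\rangle)$. As in Section 4 (except that now we are more explicit about the error term), we find

$$\begin{aligned}
\mathcal{P}^NG_f(\eta)&=G'(\langle f,\eta\rangle)\Big\langle\theta\Big(\gamma(x,\eta)\int_{\mathbb{R}^d}f(y)r(y,\eta)q_\theta(x,dy)-f(x)\mu_\theta(x,\eta)\Big),\ \eta(dx)\Big\rangle\\
&\quad+\frac12\frac{\theta}{N}G''(\langle f,\eta\rangle)\Big\langle\gamma(x,\eta)\int_{\mathbb{R}^d}f^2(y)r(y,\eta)q_\theta(x,dy)+f^2(x)\mu_\theta(x,\eta),\ \eta(dx)\Big\rangle\\
&\quad+\frac16\frac{\theta}{N^2}\Big\langle G'''(w)\gamma(x,\eta)\int_{\mathbb{R}^d}f^3(y)r(y,\eta)q_\theta(x,dy)-G'''(v)f^3(x)\mu_\theta(x,\eta),\ \eta(dx)\Big\rangle
\end{aligned}\tag{6.17}$$

for some $w,v\in\big[\langle f,\eta\rangle-\|f\|_\infty/N,\ \langle f,\eta\rangle+\|f\|_\infty/N\big]$.

Combining with equation (4.5), and the fact that $\mu_\theta(x,\eta)\to r(x,\eta)\gamma(x,\eta)$ as $\theta\to\infty$, we have pointwise convergence:

$$\lim_{N\to\infty}\big|\mathcal{P}^NG_f(\eta)-\mathcal{P}^\infty G_f(\eta)\big|=0.\tag{6.18}$$

To conclude convergence of the expectation, we would like to apply the Dominated Convergence Theorem in (6.15). Recall that $f$ and $G$ and their derivatives are bounded, and so $\gamma(x,\eta)$ is bounded independent of $\theta$. Since $\theta/N^2\to0$, rearranging as in (4.3) and using the convergence of (4.4), we deduce that we can dominate $|\mathcal{P}^NG_f(\eta^N_s)-\mathcal{P}^\infty G_f(\eta^N_s)|$ by a constant multiple of $\langle1+|F(x,\eta^N_s)|,\eta^N_s(dx)\rangle$. Since $F$ is bounded above, there is a constant $K$ such that $|F|\le K-F$ so that, exactly as in equation (6.8), we can check that

$$\mathbb{E}\Big[\int_t^{t+\tau}\big\langle|F(x,\eta^N_s)|,\eta^N_s(dx)\big\rangle ds\ \Big|\ \mathcal{F}_t\Big]<\infty,$$

which concludes our proof. □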

The last step in the proof of Theorem 2.10 is to check that any limit point ηtt0 of {ηtNt0}N1 actually takes its values in FRd, that is, “no mass has escaped to infinity”.

Lemma 6.7

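Under the assumptions of Theorem 2.10, let $(\eta_t)_{t\ge0}$ be a limit point of $\{(\eta^N_t)_{t\ge0}\}_{N\ge1}$. For any $\delta>0$,

$$\mathbb{P}\big\{\eta_t(\{\|x\|>R\})>\delta\big\}\to0\quad\text{as }R\to\infty.$$

Proof [Sketch]

Take $f_0(x)$ as in the statement of Theorem 2.10, i.e., $f_0$ is nonnegative, grows to infinity as $\|x\|\to\infty$, has uniformly bounded first and second derivatives, and has $\langle f_0,\eta^N_0\rangle$ uniformly bounded in $N$. We take a sequence of test functions $f_n$ that increase to the function $f_0$ and have uniformly bounded first and second derivatives, so that there is a (single) $C$ from Condition 6 of Assumptions 2.8 such that $\gamma(x,\eta)B_{f_n}^\theta(x,\eta)\le C(1+f_n(x))$ for all $x$, $\eta$, and $f_n$. Then, just as we arrived at equation (6.5),

$$\mathbb{E}\big[\langle f_n(x),\eta^N_t(dx)\rangle\big]\le\mathbb{E}\big[\langle f_n(x),\eta^N_0(dx)\rangle\big]+C\int_0^t\mathbb{E}\big[\langle f_n(x),\eta^N_s(dx)\rangle\big]\,ds,$$

with the same constant for all $n$ and all $N$. Gronwall’s inequality then implies that $\mathbb{E}[\langle f_n,\eta^N_t\rangle]\le C$ for some $C$ independent of $n$, $N$, and $t\in[0,T]$. By first taking $N\to\infty$ and then $n\to\infty$, we find that $\mathbb{E}[\langle f_0,\eta_t(dx)\rangle]\le C$ for $t\in[0,T]$, and since $f_0\to\infty$ as $|x|\to\infty$, an application of Markov’s inequality tells us that for any $\delta>0$,

$$\mathbb{P}\big\{\eta_t(\{x:\|x\|>R\})>\delta\big\}\le\frac1\delta\,\frac{\mathbb{E}[\langle f_0,\eta_t\rangle]}{\inf_{\{x:\|x\|\ge R\}}f_0(x)}\to0\quad\text{as }R\to\infty.$$

□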

6.3. Convergence of some nonlocal equations to classical PDEs

It is natural to conjecture that when the limit of the rescaled population process that we obtained in the previous section solves a nonlocal PDE, if we further scale the kernels $\rho_r$, $\rho_\gamma$, and $\rho_F$ by setting $\rho^\epsilon(\cdot)=\rho(\cdot/\epsilon)/\epsilon^d$, then as $\epsilon\to0$ the corresponding solutions should converge to a limiting population density that solves the corresponding “classical” PDE. We verify this in two examples; in the first the nonlocal equation is a reaction-diffusion equation with the “nonlocality” only appearing in the reaction term; in the second the nonlocal PDE is a special case of a nonlinear porous medium equation. These, in particular, capture the examples that we explored in Section 3.2.

6.3.1. Reaction–diffusion equation limits

In this subsection we prove Proposition 2.15. The conditions of the proposition are in force throughout this subsection. The proof rests on a Feynman-Kac representation. We write $(Z_t)_{t\ge0}$ for a diffusion with generator $\mathcal{B}^*$ and denote its transition density by $f_t(x,y)$. The first step is a regularity result for this density.

Lemma 6.8

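Fix $T>0$. There exists a constant $K=K(T)>0$ such that, for any $x,y\in\mathbb{R}^d$ and $t\in[0,T]$,

$$\int\big|f_t(x,z)-f_t(y,z)\big|\,dz\le\frac{\|x-y\|}{\sqrt t}K.\tag{6.19}$$
Proof

We first use the Mean Value Theorem to obtain the bound

$$\int\big|f_t(x,z)-f_t(y,z)\big|\,dz\le\|x-y\|\int\big|\nabla f_t(w,z)\big|\,dz,$$

where $\nabla$ acts on the first coordinate only and $w$ is in the line segment $[x,y]$ joining $x$ to $y$. Under our assumptions on $b$ and $C$, equation (1.3) of Sheu [1991] gives existence of constants $\lambda=\lambda(T)>0$ and $K$ such that

$$\big|\nabla f_t(w,z)\big|\le\frac{K}{\sqrt t}\,p_{\lambda t}(w,z),$$

where $p_s(x,y)$ is the Brownian transition density. Hence,

$$\int\big|f_t(x,z)-f_t(y,z)\big|\,dz\le K\frac{\|x-y\|}{\sqrt t}\int p_{\lambda t}(w,z)\,dz=K\frac{\|x-y\|}{\sqrt t}.$$

□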

Lemma 6.9

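Fix $T>0$. Let $x,y\in\mathbb{R}^d$, $t\in[0,T]$, and denote by $(Z^y_t)_{t\ge0}$ and $(Z^x_t)_{t\ge0}$ independent copies of the diffusion $(Z_t)_{t\ge0}$ starting from $y$ and $x$ respectively. There exists a constant $K=K(T)>0$ such that

$$\mathbb{E}\big[\|Z^y_t-Z^x_t\|\big]\le K\big(\sqrt t+\|y-x\|\big).$$
Proof

First we write

$$\mathbb{E}\big[\|Z^y_t-Z^x_t\|\big]=\int\!\!\int\|u-v\|\,f_t(y,u)f_t(x,v)\,du\,dv.$$

Under our regularity assumptions on $C$, $b$, using equation (1.2) of Sheu [1991], there exist constants $K,\lambda=\lambda(T)>0$ for which

$$f_t(y,u)\le K\,p_{\lambda t}(y,u).$$

It then follows that

$$\mathbb{E}\big[\|Z^y_t-Z^x_t\|\big]\le\int\!\!\int\|u-v\|\,K^2p_{\lambda t}(y,u)p_{\lambda t}(x,v)\,dv\,du=K^2\,\mathbb{E}\big[\|B^y_{\lambda t}-B^x_{\lambda t}\|\big],\tag{6.20}$$

where $(B^y_t)_{t\ge0}$ and $(B^x_t)_{t\ge0}$ are independent Brownian motions starting at $y$ and $x$ respectively. Using the triangle inequality, and writing $(B^0_t)_{t\ge0}$ for a Brownian motion started from the origin,

$$\mathbb{E}\big[\|B^y_{\lambda t}-B^x_{\lambda t}\|\big]\le\|y-x\|+\mathbb{E}\big[\|B^0_{2\lambda t}\|\big]\le\|y-x\|+C\sqrt t.\tag{6.21}$$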

Substituting (6.21) in (6.20) gives the result. □
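The Brownian estimate (6.21) can be checked in closed form in one dimension. The sketch below is an illustration under the assumption $d=1$ (the paper works in $\mathbb{R}^d$): the difference of two independent Brownian motions at time $\lambda t$ is then $N(y-x,2\lambda t)$, and its mean absolute value is given by the standard folded-normal formula. The function names are hypothetical.

```python
import math


def mean_abs_normal(m, var):
    """E|X| for X ~ N(m, var), var > 0 (mean of a folded normal)."""
    s = math.sqrt(var)
    return (s * math.sqrt(2.0 / math.pi) * math.exp(-m * m / (2.0 * var))
            + m * math.erf(m / (s * math.sqrt(2.0))))


def brownian_gap_sides(x, y, lam, t):
    """Return (lhs, rhs) of (6.21) in d = 1 (an illustrative assumption).

    lhs = E|B^y_{lam t} - B^x_{lam t}|, where the difference is N(y - x, 2 lam t);
    rhs = |y - x| + E|B^0_{2 lam t}| = |y - x| + 2 sqrt(lam t / pi).
    """
    lhs = mean_abs_normal(y - x, 2.0 * lam * t)
    rhs = abs(y - x) + mean_abs_normal(0.0, 2.0 * lam * t)
    return lhs, rhs


if __name__ == "__main__":
    for x, y, lam, t in [(0.0, 0.0, 1.0, 0.5), (0.0, 1.0, 2.0, 0.1), (-3.0, 2.0, 0.5, 4.0)]:
        print(brownian_gap_sides(x, y, lam, t))
```

In this one-dimensional setting the constant in (6.21) can be taken to be $C=2\sqrt{\lambda/\pi}$, since $\mathbb{E}|B^0_{2\lambda t}|=\sqrt{4\lambda t/\pi}$.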

We use the representations of the solutions to equations (2.12) and (2.11) respectively:

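$$\varphi_t(x)=\mathbb{E}_x\Big[\varphi_0(Z_t)+\int_0^t\varphi_s(Z_s)F\big(\varphi_s(Z_s)\big)\,ds\Big],\tag{6.22}$$
$$\varphi^\epsilon_t(x)=\mathbb{E}_x\Big[\varphi_0(Z_t)+\int_0^t\varphi^\epsilon_s(Z_s)F\big(\rho^\epsilon_F*\varphi^\epsilon_s(Z_s)\big)\,ds\Big],\tag{6.23}$$

from which

$$\varphi_t(x)-\varphi^\epsilon_t(x)=\mathbb{E}_x\Big[\int_0^t\Big(\varphi_s(Z_s)F\big(\varphi_s(Z_s)\big)-\varphi^\epsilon_s(Z_s)F\big(\rho^\epsilon_F*\varphi^\epsilon_s(Z_s)\big)\Big)ds\Big],\tag{6.24}$$

where $\mathbb{E}_x$ denotes expectation for $Z$ with $Z_0=x$. The key to our proof of Proposition 2.15 will be to replace $F(\varphi_s(Z_s))$ by $F(\rho^\epsilon_F*\varphi_s(Z_s))$ in this expression. We achieve this through three lemmas.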

First we need a uniform bound on φ and φϵ.

Lemma 6.10

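For any $T>0$ there exists $M=M(T,\|\varphi_0\|_\infty)>0$ such that, for all $0\le t\le T$:

$$\max\big\{\|\varphi_t(\cdot)\|_\infty,\ \|\varphi^\epsilon_t(\cdot)\|_\infty\big\}<M.$$
Proof

Using that $\varphi_0$ and $F$ are bounded above, from the representation (6.22), we have

$$\varphi_t(x)\le\|\varphi_0\|_\infty+K\,\mathbb{E}\Big[\int_0^t\varphi_s(Z_s)\,ds\Big].$$

In particular,

$$\|\varphi_t(\cdot)\|_\infty\le\|\varphi_0\|_\infty+K\int_0^t\|\varphi_s(\cdot)\|_\infty\,ds,$$

so, by Gronwall’s inequality,

$$\|\varphi_t(\cdot)\|_\infty\le\|\varphi_0\|_\infty\exp(KT).$$

Similarly, $\|\varphi^\epsilon_t(\cdot)\|_\infty\le\|\varphi_0\|_\infty\exp(KT)$. □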

We also need a continuity estimate for φ.

Lemma 6.11

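Let $T>0$. There exist constants $K=K(T,\varphi_0)>0$ and $\delta_0=\delta_0(T,\varphi_0)>0$ such that for all $0<\delta<\delta_0$ and $0\le t\le T$,

$$\|x-y\|<\delta^3\ \Longrightarrow\ |\varphi_t(x)-\varphi_t(y)|<K\delta.$$
Proof

First we need some notation. Fix $T>0$ and write $M$ for the corresponding constant from Lemma 6.10. Let $\|F\|_M=\sup_{m\in[0,M]}|F(m)|$. We reserve $\hat K$ for the constant on the right hand side of equation (6.19) and $\tilde K$ for the constant in Lemma 6.9, and write $K_{\varphi_0}$ for the Lipschitz constant of $\varphi_0$. Set

$$\delta_0=\min\Big\{\frac{1}{\|F\|_M^2},\ \frac{1}{Me\big(2\|F\|_M+\hat K\big)},\ \frac{1}{\tilde KK_{\varphi_0}+2\|F\|_MM},\ 1\Big\}.$$

In what follows we take $0<\delta<\delta_0$.

We first prove that the result holds if $t<\delta^2$. As before let $Z^x_t$ and $Z^y_t$ be independent copies of the diffusion $Z_t$ starting at $x$ and $y$ respectively. From our representation (6.22) and Lemma 6.10, we can write:

$$\begin{aligned}
|\varphi_t(x)-\varphi_t(y)|&\le\big|\mathbb{E}_x[\varphi_0(Z_t)]-\mathbb{E}_y[\varphi_0(Z_t)]\big|+2\|F\|_MMt\\
&\le\mathbb{E}\big[|\varphi_0(Z^x_t)-\varphi_0(Z^y_t)|\big]+2\|F\|_MMt\\
&\le K_{\varphi_0}\mathbb{E}\big[\|Z^x_t-Z^y_t\|\big]+2\|F\|_MMt\\
&\le\tilde KK_{\varphi_0}\big(\sqrt t+\|y-x\|\big)+2\|F\|_MMt\\
&\le\tilde KK_{\varphi_0}\big(\delta+\delta^3\big)+2\|F\|_MM\delta^2\\
&\le\big(\tilde KK_{\varphi_0}+1\big)\delta,
\end{aligned}$$

where we have used Lemma 6.9 in the fourth inequality and the definition of $\delta_0$ in the last inequality.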

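Suppose now that $\delta^2<t$. We will follow the pattern in Lemma 2.2 of Penington [2017]. First, note that by the Feynman-Kac formula we have an alternative representation for $\varphi_t(x)$: for any $t'<t$,

$$\varphi_t(x)=\mathbb{E}_x\Big[\varphi_{t-t'}(Z_{t'})\exp\Big(\int_0^{t'}F\big(\varphi_{t-s}(Z_s)\big)\,ds\Big)\Big].$$

Therefore, setting $t'=\delta^2$ and using Lemma 6.10, for all $z$,

$$e^{-\delta^2\|F\|_M}\,\mathbb{E}_z\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big]\le\varphi_t(z)\le e^{\delta^2\|F\|_M}\,\mathbb{E}_z\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big].$$

We can then deduce that

$$\begin{aligned}
\varphi_t(x)-\varphi_t(y)&\le e^{\delta^2\|F\|_M}\mathbb{E}_x\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big]-e^{-\delta^2\|F\|_M}\mathbb{E}_y\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big]\\
&=e^{\delta^2\|F\|_M}\Big(\mathbb{E}_x\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big]-\mathbb{E}_y\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big]\Big)+\big(e^{\delta^2\|F\|_M}-e^{-\delta^2\|F\|_M}\big)\mathbb{E}_y\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big]\\
&\le e^{\delta^2\|F\|_M}\Big(\mathbb{E}_x\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big]-\mathbb{E}_y\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big]\Big)+M\big(e^{\delta^2\|F\|_M}-e^{-\delta^2\|F\|_M}\big).
\end{aligned}\tag{6.25}$$

To bound the differences of the expected values in the last equation note that, using Lemma 6.10 again,

$$\begin{aligned}
\mathbb{E}_x\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big]-\mathbb{E}_y\big[\varphi_{t-\delta^2}(Z_{\delta^2})\big]
&=\int\varphi_{t-\delta^2}(z)\big(f_{\delta^2}(x,z)-f_{\delta^2}(y,z)\big)\,dz\\
&\le M\int\big|f_{\delta^2}(x,z)-f_{\delta^2}(y,z)\big|\,dz\le M\hat K\frac{\|x-y\|}{\delta}\le M\hat K\delta^2,
\end{aligned}$$

where we have used Lemma 6.8 and that $\|x-y\|<\delta^3$. Substituting in (6.25),

$$\begin{aligned}
\varphi_t(x)-\varphi_t(y)&\le e^{\delta^2\|F\|_M}\Big(M\hat K\delta^2+M-Me^{-2\delta^2\|F\|_M}\Big)\\
&\le e^{\delta^2\|F\|_M}\Big(M\hat K\delta^2+2M\delta^2\|F\|_M\Big)\\
&\le e\big(M\hat K+2M\|F\|_M\big)\delta^2\le\delta,
\end{aligned}$$

where the last two inequalities follow from the definition of $\delta_0$. Interchanging $x$ and $y$ yields the same bound for $\varphi_t(y)-\varphi_t(x)$, and the result follows. □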

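We proceed to control the difference between $F(\varphi)$ and $F(\rho^\epsilon_F*\varphi)$. Note first that since $\rho_F\in L^1$,

$$I(\epsilon):=\int_{\{\|y\|>\epsilon^{3/4}\}}\rho^\epsilon_F(y)\,dy=\int_{\{\|y\|>\epsilon^{-1/4}\}}\rho_F(y)\,dy\to0\quad\text{as }\epsilon\to0.$$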
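To make the decay of $I(\epsilon)$ concrete, here is a minimal sketch assuming, purely for illustration, that $d=1$ and that $\rho_F$ is the standard Gaussian density (the paper only requires $\rho_F\in L^1$ here). Both expressions for $I(\epsilon)$ then reduce to the complementary error function, and the two function names below are hypothetical.

```python
import math


def tail_mass_scaled(eps):
    """Mass of rho^eps = N(0, eps^2) outside [-eps^(3/4), eps^(3/4)].

    Illustrative assumption: d = 1 and rho_F is the standard Gaussian density.
    """
    return math.erfc(eps ** 0.75 / (eps * math.sqrt(2.0)))


def tail_mass_unscaled(eps):
    """Mass of rho = N(0, 1) outside [-eps^(-1/4), eps^(-1/4)]."""
    return math.erfc(eps ** -0.25 / math.sqrt(2.0))


if __name__ == "__main__":
    for eps in (1.0, 1e-1, 1e-2, 1e-3, 1e-4):
        # The two columns agree (change of variables y -> y / eps) and shrink with eps.
        print(eps, tail_mass_scaled(eps), tail_mass_unscaled(eps))
```

The two functions agree by the change of variables $y\mapsto y/\epsilon$, and in this Gaussian case $I(\epsilon)$ decays much faster than the $\epsilon^{1/4}$ term that appears alongside it in Lemma 6.12.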
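Lemma 6.12

Let $T>0$. There exists a constant $C=C(T,\varphi_0)>0$ such that, for all $0\le t\le T$, for all $\epsilon$ small enough,

$$\big\|\varphi_t(\cdot)-\rho^\epsilon_F*\varphi_t(\cdot)\big\|_\infty\le C\big(I(\epsilon)+\epsilon^{1/4}\big).\tag{6.26}$$

Furthermore, there is a constant $\tilde C=\tilde C(T,\varphi_0)$ such that, for all $0\le t\le T$,

$$\big\|F(\varphi_t(\cdot))-F(\rho^\epsilon_F*\varphi_t(\cdot))\big\|_\infty\le\tilde C\big(I(\epsilon)+\epsilon^{1/4}\big).\tag{6.27}$$
Proof

Let $\epsilon<\delta_0^4$, with $\delta_0$ from Lemma 6.11. Then,

$$\begin{aligned}
\big|\varphi_t(x)-\rho^\epsilon_F*\varphi_t(x)\big|
&\le\Big|\int_{\{\|x-y\|>\epsilon^{3/4}\}}\rho^\epsilon_F(x-y)\big(\varphi_t(y)-\varphi_t(x)\big)\,dy\Big|+\Big|\int_{\{\|x-y\|\le\epsilon^{3/4}\}}\rho^\epsilon_F(x-y)\big(\varphi_t(y)-\varphi_t(x)\big)\,dy\Big|\\
&\le2M\int_{\{\|x-y\|>\epsilon^{3/4}\}}\rho^\epsilon_F(x-y)\,dy+\int_{\{\|x-y\|\le\epsilon^{3/4}\}}\rho^\epsilon_F(x-y)K\epsilon^{1/4}\,dy\\
&\le2MI(\epsilon)+K\epsilon^{1/4},
\end{aligned}$$

where we used the estimates of Lemma 6.10 and Lemma 6.11 (the latter with $\delta=\epsilon^{1/4}$). This proves (6.26). For (6.27), let $L_M$ be the (uniform) Lipschitz constant of $F$ on $[0,M]$, with $M$ still taken from Lemma 6.10. Then,

$$\big\|F(\varphi_t(\cdot))-F(\rho^\epsilon_F*\varphi_t(\cdot))\big\|_\infty\le L_M\big\|\varphi_t(\cdot)-\rho^\epsilon_F*\varphi_t(\cdot)\big\|_\infty\le L_M\big(2MI(\epsilon)+K\epsilon^{1/4}\big),$$

which proves (6.27). □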

Proof [of Proposition 2.15]

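Let $\epsilon$ be small enough that Lemma 6.12 holds. We use the notation $\hat\delta(\epsilon)$ for the quantity on the right hand side of (6.27). Then from the representation (6.24) and Lemma 6.12 we can write,

$$\begin{aligned}
|\varphi_t(x)-\varphi^\epsilon_t(x)|
&\le\Big|\mathbb{E}_x\Big[\int_0^t\Big(\varphi_s(Z_s)F\big(\rho^\epsilon_F*\varphi_s(Z_s)\big)-\varphi^\epsilon_s(Z_s)F\big(\rho^\epsilon_F*\varphi^\epsilon_s(Z_s)\big)\Big)ds\Big]\Big|+Mt\hat\delta(\epsilon)\\
&\le\Big|\mathbb{E}_x\Big[\int_0^tF\big(\rho^\epsilon_F*\varphi^\epsilon_s(Z_s)\big)\big(\varphi^\epsilon_s(Z_s)-\varphi_s(Z_s)\big)\,ds\Big]\Big|\\
&\quad+\Big|\mathbb{E}_x\Big[\int_0^t\varphi_s(Z_s)\Big(F\big(\rho^\epsilon_F*\varphi^\epsilon_s(Z_s)\big)-F\big(\rho^\epsilon_F*\varphi_s(Z_s)\big)\Big)ds\Big]\Big|+Mt\hat\delta(\epsilon)\\
&\le\|F\|_M\int_0^t\|\varphi^\epsilon_s(\cdot)-\varphi_s(\cdot)\|_\infty\,ds+ML_M\int_0^t\big\|\rho^\epsilon_F*\varphi^\epsilon_s(\cdot)-\rho^\epsilon_F*\varphi_s(\cdot)\big\|_\infty\,ds+Mt\hat\delta(\epsilon)\\
&\le\big(\|F\|_M+ML_M\big)\int_0^t\|\varphi^\epsilon_s(\cdot)-\varphi_s(\cdot)\|_\infty\,ds+Mt\hat\delta(\epsilon),
\end{aligned}$$

where the second inequality is the triangle inequality, and the third is Lemma 6.10. An application of Gronwall’s inequality then yields,

$$\|\varphi^\epsilon_t(\cdot)-\varphi_t(\cdot)\|_\infty\le Mt\hat\delta(\epsilon)\exp\big(t(\|F\|_M+ML_M)\big)\le MT\hat\delta(\epsilon)\exp\big(T(\|F\|_M+ML_M)\big),$$

giving the result, since $\hat\delta\to0$ as $\epsilon\to0$. □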

6.3.2. Porous Medium Equation

In this subsection we prove Proposition 2.18. To ease notation, we present the proof in $d=1$ (although we retain the notation $\nabla$); it should be clear that it extends almost without change to higher dimensions. To recall the dependence on $\epsilon$ we write $\rho_\epsilon$ for $\rho_\gamma$.

Recall that we are concerned with non-negative solutions to the equation (2.13):

$$\partial_t\psi^\epsilon_t(x)=\Delta\big(\psi^\epsilon_t\,\rho_\epsilon*\psi^\epsilon_t\big)(x)+\psi^\epsilon_t(x)\big(1-\rho_\epsilon*\psi^\epsilon_t(x)\big),$$

and we assume that $\rho=\zeta*\check\zeta$ with $\zeta$ a rapidly decreasing function and $\check\zeta(x)=\zeta(-x)$. The example we have in mind is $\zeta$ (and therefore $\rho$) being the density of a mean zero Gaussian random variable. We shall prove that under the assumptions of Proposition 2.18, as $\epsilon\to0$, we have convergence to the solution to the porous medium equation with logistic growth, equation (1.2):

$$\partial_t\psi_t(x)=\Delta\big(\psi_t^2\big)(x)+\psi_t(x)\big(1-\psi_t(x)\big).$$

We work on the time interval $[0,T]$. We will require a lower bound on $\int\psi^\epsilon_t(x)\log\psi^\epsilon_t(x)\,dx$ which we record as a lemma.

Lemma 6.13

Suppose that there exist $\lambda\in(0,1)$ and $C<\infty$, both independent of $\epsilon$, such that $\int\exp(\lambda|x|)\psi^\epsilon_0(x)\,dx<C$. Then there exists a constant $K<\infty$, independent of $\epsilon$, such that $\int\psi^\epsilon_t(x)\log\psi^\epsilon_t(x)\,dx>-K$ for all $t\in[0,T]$.

Proof

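First observe that, since $x\log x$ is bounded below, $\int_{-1}^1\psi^\epsilon_t(x)\log\psi^\epsilon_t(x)\,dx$ is bounded below, and recall that $\psi^\epsilon_t(x)\ge0$.

Now consider

$$\begin{aligned}
\frac{d}{dt}\int\exp(\lambda x)\psi^\epsilon_t(x)\,dx
&=\int\exp(\lambda x)\Delta\big(\psi^\epsilon_t\,\rho_\epsilon*\psi^\epsilon_t\big)(x)\,dx+\int\exp(\lambda x)\psi^\epsilon_t(x)\big(1-\rho_\epsilon*\psi^\epsilon_t(x)\big)\,dx\\
&=\big(\lambda^2-1\big)\int\exp(\lambda x)\psi^\epsilon_t(x)\,\rho_\epsilon*\psi^\epsilon_t(x)\,dx+\int\exp(\lambda x)\psi^\epsilon_t(x)\,dx\\
&\le\int\exp(\lambda x)\psi^\epsilon_t(x)\,dx,
\end{aligned}\tag{6.28}$$

and so, by Gronwall’s inequality, $\int\exp(\lambda x)\psi^\epsilon_t(x)\,dx$ is uniformly bounded on $[0,T]$. In particular, combining with the Mean Value Theorem, we find

$$\int_x^{x+1}\psi^\epsilon_t(y)\,dy\le C\exp(-\lambda x),$$

where the constant $C$ is independent of $x\ge1$. A fortiori,

$$\int_x^{x+1}\psi^\epsilon_t(y)\,\mathbf{1}_{\{\psi^\epsilon_t(y)\le1\}}\,dy\le C\exp(-\lambda x).\tag{6.29}$$

Now the function $\psi\mapsto\mathbf{1}_{\{0\le\psi\le1\}}\,\psi|\log\psi|$ is concave, and so using Jensen’s inequality and (6.29),

$$\int_x^{x+1}\psi^\epsilon_t(y)\,\big|\log\psi^\epsilon_t(y)\big|\,\mathbf{1}_{\{\psi^\epsilon_t(y)\le1\}}\,dy\le C'x\exp(-\lambda x).$$

Evidently a symmetric argument applies for $x\le-1$. Summing over $x$, and using that $\psi\log\psi\ge-\psi|\log\psi|\mathbf{1}_{\{\psi\le1\}}$, we find

$$\int\psi^\epsilon_t(x)\log\psi^\epsilon_t(x)\,dx\ge-C\sum_{x=1}^\infty x\exp(-\lambda x)>-K>-\infty,$$

as required. □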

as required. □

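Proof [of Proposition 2.18]

First observe that

$$\begin{aligned}
\int\psi^\epsilon_t(x)\,\rho_\epsilon*\psi^\epsilon_t(x)\,dx
&=\int\!\!\int\!\!\int\psi^\epsilon_t(x)\psi^\epsilon_t(x-y)\zeta_\epsilon(y-z)\check\zeta_\epsilon(z)\,dz\,dy\,dx\\
&=\int\!\!\int\!\!\int\psi^\epsilon_t(\tilde x-\tilde z)\psi^\epsilon_t(\tilde x-\tilde y)\zeta_\epsilon(\tilde y)\zeta_\epsilon(\tilde z)\,d\tilde z\,d\tilde y\,d\tilde x\\
&=\int\big(\zeta_\epsilon*\psi^\epsilon_t(x)\big)^2\,dx,
\end{aligned}$$

where we have set $\tilde x=x-z$, $\tilde y=y-z$, $\tilde z=-z$.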
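The identity $\int\psi\,(\rho*\psi)\,dx=\int(\zeta*\psi)^2\,dx$ for $\rho=\zeta*\check\zeta$ has an exact discrete analogue, which gives a quick sanity check. The sketch below replaces integrals by sums over finitely supported sequences; the helper names `convolve` and `check_identity` and the sample values are illustrative assumptions, not part of the paper.

```python
def convolve(a, b):
    """Full discrete convolution of two finite sequences (pure Python)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def check_identity(zeta, psi):
    """Return (lhs, rhs) with
    lhs = sum_x psi(x) * (rho * psi)(x),  rho = zeta * zeta_check,
    rhs = sum_x ((zeta * psi)(x))^2.
    """
    m = len(zeta)
    # zeta_check(x) = zeta(-x): reversing the list reflects the sequence,
    # so rho is the autocorrelation of zeta, with lag 0 at index m - 1.
    rho = convolve(zeta, zeta[::-1])
    rho_psi = convolve(rho, psi)
    # Shift by m - 1 so that rho_psi is evaluated at the same sites as psi.
    lhs = sum(p * rho_psi[x + m - 1] for x, p in enumerate(psi))
    rhs = sum(v * v for v in convolve(zeta, psi))
    return lhs, rhs


if __name__ == "__main__":
    print(check_identity([0.5, 0.3, 0.2], [1.0, 0.0, 2.0, 3.0]))
```

Note that $\zeta$ need not be symmetric for the identity to hold; only the structure $\rho=\zeta*\check\zeta$ matters, which is why the quadratic form $\int\psi\,\rho*\psi$ is non-negative.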

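Now note that

$$\frac{d}{dt}\int\psi^\epsilon_t(x)\,dx=\int\Delta\big(\psi^\epsilon_t\,\rho_\epsilon*\psi^\epsilon_t\big)(x)\,dx+\int\psi^\epsilon_t(x)\big(1-\rho_\epsilon*\psi^\epsilon_t(x)\big)\,dx=\int\psi^\epsilon_t(x)\,dx-\int\big(\zeta_\epsilon*\psi^\epsilon_t(x)\big)^2\,dx.$$

Thus, Gronwall’s inequality implies that $\int\psi^\epsilon_t(x)\,dx$ is uniformly bounded above in $\epsilon$ and $t\in[0,T]$. Note that this also then gives a uniform bound on the rate of change of $\int\psi^\epsilon_t(x)\,dx$, and since we are working on $[0,T]$ this will be enough to give continuity in time of the $L^1$ norm of the limit when we pass to a convergent subsequence.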

Now consider

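$$\begin{aligned}
\frac{d}{dt}\int\psi^\epsilon_t\log\psi^\epsilon_t\,dx
&=\int\big(1+\log\psi^\epsilon_t\big)\Big[\Delta\big(\psi^\epsilon_t\,\rho_\epsilon*\psi^\epsilon_t\big)+\psi^\epsilon_t\big(1-\rho_\epsilon*\psi^\epsilon_t\big)\Big]dx\\
&=\int\big(1+\log\psi^\epsilon_t\big)\Big[\nabla\cdot\big(\psi^\epsilon_t\,\nabla\rho_\epsilon*\psi^\epsilon_t+\nabla\psi^\epsilon_t\,\rho_\epsilon*\psi^\epsilon_t\big)+\psi^\epsilon_t\big(1-\rho_\epsilon*\psi^\epsilon_t\big)\Big]dx\\
&=\int\Big[-\frac{\nabla\psi^\epsilon_t}{\psi^\epsilon_t}\big(\psi^\epsilon_t\,\nabla\rho_\epsilon*\psi^\epsilon_t+\nabla\psi^\epsilon_t\,\rho_\epsilon*\psi^\epsilon_t\big)+\big(1+\log\psi^\epsilon_t\big)\psi^\epsilon_t\big(1-\rho_\epsilon*\psi^\epsilon_t\big)\Big]dx\\
&=-\int\big|\nabla\zeta_\epsilon*\psi^\epsilon_t\big|^2dx-\int\big|\nabla\psi^\epsilon_t\big|^2\frac{\rho_\epsilon*\psi^\epsilon_t}{\psi^\epsilon_t}\,dx+\int\Big[\psi^\epsilon_t+\psi^\epsilon_t\log\psi^\epsilon_t\big(1-\rho_\epsilon*\psi^\epsilon_t\big)-\psi^\epsilon_t\,\rho_\epsilon*\psi^\epsilon_t\Big]dx\\
&=-\int\big|\nabla\zeta_\epsilon*\psi^\epsilon_t\big|^2dx-\int\big|\nabla\psi^\epsilon_t\big|^2\frac{\rho_\epsilon*\psi^\epsilon_t}{\psi^\epsilon_t}\,dx-\int\big(\zeta_\epsilon*\psi^\epsilon_t\big)^2dx+\int\Big[\psi^\epsilon_t+\psi^\epsilon_t\log\psi^\epsilon_t\big(1-\rho_\epsilon*\psi^\epsilon_t\big)\Big]dx.
\end{aligned}\tag{6.30}$$

The first three terms are negative; and we already saw that the $L^1$ norm of $\psi^\epsilon_t$ is uniformly bounded. Moreover, since $\psi^\epsilon_t\log\psi^\epsilon_t$ is uniformly bounded below and $\int\rho_\epsilon(x)\,dx=1$,

$$-\int\psi^\epsilon_t\log\psi^\epsilon_t\ \rho_\epsilon*\psi^\epsilon_t\,dx\le C\int\rho_\epsilon*\psi^\epsilon_t\,dx=C\int\psi^\epsilon_t\,dx.$$

From this and (6.30), we see immediately that $\int\psi^\epsilon_t\log\psi^\epsilon_t\,dx$ is uniformly bounded above in $\epsilon$ and $t\in[0,T]$. Combining with Lemma 6.13, we deduce that we have a uniform bound on $\int\psi^\epsilon_s(x)\log\psi^\epsilon_s(x)\,dx$. From (6.30), this in turn means that both $\int_0^t\int|\nabla\zeta_\epsilon*\psi^\epsilon_s(x)|^2\,dx\,ds$ and $\int_0^t\int(\zeta_\epsilon*\psi^\epsilon_s(x))^2\,dx\,ds$ are uniformly bounded in $\epsilon$ and $t\in[0,T]$.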

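We shall next show that $\zeta_\epsilon*\psi^\epsilon_t$ solves (1.2) up to a remainder of order $\epsilon$. First observe that

$$\int\Delta\big(\rho_\epsilon*\psi^\epsilon_t\,\psi^\epsilon_t\big)\phi\,dx=-\int\nabla\big(\rho_\epsilon*\psi^\epsilon_t\big)\,\psi^\epsilon_t\,\nabla\phi\,dx-\int\rho_\epsilon*\psi^\epsilon_t\,\nabla\psi^\epsilon_t\,\nabla\phi\,dx.\tag{6.31}$$

We would like to show that this is close to $\int(\zeta_\epsilon*\psi^\epsilon_t)^2\Delta\phi\,dx$. For the first term

$$\begin{aligned}
\int\nabla\big(\rho_\epsilon*\psi^\epsilon_t\big)\,\psi^\epsilon_t\,\nabla\phi\,dx
&=\int\!\!\int\!\!\int\nabla\psi^\epsilon_t(x-y)\zeta_\epsilon(y-z)\check\zeta_\epsilon(z)\psi^\epsilon_t(x)\nabla\phi(x)\,dz\,dy\,dx\\
&=\int\!\!\int\!\!\int\nabla\psi^\epsilon_t(\tilde x-\tilde y)\zeta_\epsilon(\tilde y)\zeta_\epsilon(\tilde z)\psi^\epsilon_t(\tilde x-\tilde z)\nabla\phi(\tilde x-\tilde z)\,d\tilde z\,d\tilde y\,d\tilde x\\
&=\int\nabla\big(\zeta_\epsilon*\psi^\epsilon_t\big)\,\zeta_\epsilon*\big(\psi^\epsilon_t\nabla\phi\big)\,dx\\
&=\frac12\int\nabla\big(\zeta_\epsilon*\psi^\epsilon_t\big)^2\,\nabla\phi\,dx+\int\nabla\big(\zeta_\epsilon*\psi^\epsilon_t\big)\Big(\zeta_\epsilon*\big(\psi^\epsilon_t\nabla\phi\big)-\nabla\phi\ \zeta_\epsilon*\psi^\epsilon_t\Big)dx,
\end{aligned}\tag{6.32}$$

where, as before, we have substituted $\tilde x=x-z$, $\tilde y=y-z$, $\tilde z=-z$. To control the last term in (6.32) we use the Mean Value Theorem to see that

$$\Big|\int\Big(\psi^\epsilon_t(x-y)\nabla\phi(x-y)\zeta_\epsilon(y)-\nabla\phi(x)\psi^\epsilon_t(x-y)\zeta_\epsilon(y)\Big)dy\Big|\le C\|\Delta\phi\|_\infty\int\psi^\epsilon_t(x-y)\,|y|\,\zeta_\epsilon(y)\,dy.$$

Since $\zeta\in\mathcal{S}(\mathbb{R})$, the integral in this expression is $\mathcal{O}(\epsilon)$.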

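Similarly, for the second term in (6.31),

$$\begin{aligned}
\int\rho_\epsilon*\psi^\epsilon_t\,\nabla\psi^\epsilon_t\,\nabla\phi\,dx
&=\int\!\!\int\!\!\int\psi^\epsilon_t(x-y)\zeta_\epsilon(y-z)\check\zeta_\epsilon(z)\nabla\psi^\epsilon_t(x)\nabla\phi(x)\,dz\,dy\,dx\\
&=\int\!\!\int\!\!\int\psi^\epsilon_t(\tilde x-\tilde y)\zeta_\epsilon(\tilde y)\zeta_\epsilon(\tilde z)\nabla\psi^\epsilon_t(\tilde x-\tilde z)\nabla\phi(\tilde x-\tilde z)\,d\tilde z\,d\tilde y\,d\tilde x\\
&=\int\zeta_\epsilon*\psi^\epsilon_t\ \zeta_\epsilon*\big(\nabla\psi^\epsilon_t\nabla\phi\big)\,dx\\
&=\frac12\int\nabla\big(\zeta_\epsilon*\psi^\epsilon_t\big)^2\,\nabla\phi\,dx+\int\zeta_\epsilon*\psi^\epsilon_t\Big(\zeta_\epsilon*\big(\nabla\psi^\epsilon_t\nabla\phi\big)-\nabla\phi\ \nabla\big(\zeta_\epsilon*\psi^\epsilon_t\big)\Big)dx,
\end{aligned}\tag{6.33}$$

and the last term in (6.33) is controlled in the same way as that in (6.32): using the Mean Value Theorem,

$$\Big|\int\Big(\nabla\psi^\epsilon_t(x-y)\nabla\phi(x-y)\zeta_\epsilon(y)-\nabla\phi(x)\nabla\psi^\epsilon_t(x-y)\zeta_\epsilon(y)\Big)dy\Big|\le C\|\Delta\phi\|_\infty\Big|\int\nabla\psi^\epsilon_t(x-y)\,y\,\zeta_\epsilon(y)\,dy\Big|,$$

which again is $\mathcal{O}(\epsilon)$.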

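We now have the ingredients that we need. The calculations above yield both a uniform (in $\epsilon$) bound on $\zeta_\epsilon*\psi^\epsilon_t$ in $L^1\cap L^2([0,T]\times\mathbb{R})$, and that

$$\int\psi^\epsilon_t(x)\phi(x)\,dx-\int\psi^\epsilon_0(x)\phi(x)\,dx=\int_0^t\!\!\int\big(\zeta_\epsilon*\psi^\epsilon_s(x)\big)^2\Delta\phi(x)\,dx\,ds+\int_0^t\!\!\int\zeta_\epsilon*\psi^\epsilon_s(x)\big(1-\zeta_\epsilon*\psi^\epsilon_s(x)\big)\phi(x)\,dx\,ds+\mathcal{O}(\epsilon)\tag{6.34}$$

(for sufficiently regular $\phi$). Since $\int\psi^\epsilon_t(x)\phi(x)\,dx-\int\zeta_\epsilon*\psi^\epsilon_t(x)\phi(x)\,dx$ is of order $\epsilon$, if we replace $\psi^\epsilon$ by $\zeta_\epsilon*\psi^\epsilon$ on the left hand side, then (6.34) says that $\zeta_\epsilon*\psi^\epsilon$ solves (2.14) weakly up to order $\epsilon$. Therefore, $\zeta_\epsilon*\psi^\epsilon$ converges weakly to $\psi$ in $L^1$, where $\psi$ is the (unique) solution to equation (2.14), and hence so does $\psi^\epsilon$. In fact, strong convergence, that is $\int|\psi^\epsilon-\psi|\phi\,dx\to0$, follows from the uniform integrability of $\psi^\epsilon$ that we can deduce from the uniform control of $\int\psi^\epsilon\log\psi^\epsilon\,dx$ that we proved above. □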

7. Simultaneous scaling with interaction distance

In this section we prove Theorem 2.20, which establishes convergence when the width of the interaction kernel $\rho_F$ is scaled simultaneously with the parameters $\theta$ and $N$, in the special case in which $r\equiv1\equiv\gamma$, $q_\theta(x,dy)$ is isotropic with zero mean, the kernel $\rho_F$ is Gaussian, and the scaling limit is a reaction-diffusion equation.

To simplify notation, in this section we shall write

$$\rho_\epsilon*\eta(x)=\rho^\epsilon_F*\eta(x)=\big\langle p_{\epsilon^2}(x,y),\eta(dy)\big\rangle,$$

where $p_t(x,y)$ denotes the heat semigroup. The assumptions of Theorem 2.20 will be in force throughout, in particular,

$$\epsilon^2\theta\to\infty,\quad\text{and}\quad\frac{\theta}{N\epsilon^d}\to0.\tag{7.1}$$

That $N,\theta\to\infty$ and $\epsilon\to0$ simultaneously will be implicit, so for example if we write $\lim_{\epsilon\to0}$, it should be understood that $\theta,N\to\infty$ in such a way that (7.1) is satisfied. Moreover, where there is no risk of confusion, except where it is helpful for emphasis, we suppress dependence of $\eta$ on $N$.

The first part of the proof mirrors that of Theorem 2.10: in Subsection 7.1 we establish bounds on the moments of ρϵ*ηt(x) that are sufficient to imply tightness and then apply standard results on convergence of Markov processes from Ethier and Kurtz [1986]. The challenge comes in identifying the limit points. This is much more intricate than the case in which we do not scale the interaction kernel, as weak convergence will no longer be sufficient to guarantee the form of the nonlinear terms in the limiting equation. Identification of the limit will rest on regularity inherited from continuity estimates for a random walk with Gaussian jumps which we prove in Subsection 7.2, before identifying the limit points in Subsection 7.3.

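7.1. Moment bounds for $\rho_\epsilon*\eta$

Let us write $\mathcal{L}^\theta f(x):=\theta\int(f(y)-f(x))q_\theta(x,y)\,dy$ where $q_\theta$ is a Gaussian kernel of mean $0$ and variance $1/\theta$. We note that $\mathcal{L}^\theta$ is the generator of a continuous-time, continuous-space random walk, which makes jumps of mean $0$ and variance $1/\theta$ at rate $\theta$. In what follows we write $\psi^{\epsilon,x}_t(y)$ for the solution of

$$\partial_t\psi^{\epsilon,x}_t=\mathcal{L}^\theta\psi^{\epsilon,x}_t,\tag{7.2}$$

with initial condition $\psi^{\epsilon,x}_0(y)=\rho_\epsilon(y-x)=p_{\epsilon^2}(x,y)$.

To see why $\psi^{\epsilon,x}_t$ is useful, first note that for any time-dependent function $\phi_t(x)$ with time derivative $\dot\phi_t(x)=\partial_t\phi_t(x)$,

$$\big\langle\phi_t(x),\eta_t(dx)\big\rangle=\big\langle\phi_0(x),\eta_0(dx)\big\rangle+M_t(\phi)+\int_0^t\big\langle\mathcal{L}^\theta\phi_s(x)+\dot\phi_s(x),\eta_s(dx)\big\rangle\,ds+\int_0^t\big\langle\phi_s(x)F(x,\eta_s),\eta_s(dx)\big\rangle\,ds,\tag{7.3}$$

where $M_t(\phi)$ is a martingale (with respect to the natural filtration) with angle bracket process given by (2.4) with $f$ replaced by $\phi_s(\cdot)$. So, taking $\phi_s(\cdot)=\psi^{\epsilon,x}_{t-s}(\cdot)$ for $0\le s\le t$,

$$\rho_\epsilon*\eta_t(x)=\big\langle\psi^{\epsilon,x}_0(y),\eta_t(dy)\big\rangle=\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle+\int_0^t\big\langle\psi^{\epsilon,x}_{t-s}(y)F\big(\rho_\epsilon*\eta_s(y)\big),\eta_s(dy)\big\rangle\,ds+M_t(x),\tag{7.4}$$

where $M_t(x)$ has mean zero and a second moment we can easily write down.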

Lemma 7.1

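Fix $t>0$, let $(\Pi(s))_{s\ge0}$ be a rate one Poisson process, and let $T(t)=\Pi(\theta t)/\theta$. Then

$$\psi^{\epsilon,x}_t(y)=\mathbb{E}\big[p_{\epsilon^2+T(t)}(x,y)\big],$$

and, moreover, since under our assumptions $\theta\epsilon^2$ is bounded below, there is a $C$ independent of $\epsilon$ or $t$ such that

$$\big\|\psi^{\epsilon,x}_t\big\|_\infty\le\frac{C}{(\epsilon^2+t)^{d/2}}.$$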

Proof

The first claim is immediate from the definition of the random walk with generator θ.

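For the second claim, first define $\tau(t)=T(t)-t$. If $\tau(t)\ge-(\epsilon^2+t)/2$, then $1/(\epsilon^2+T(t))\le2/(\epsilon^2+t)$, while $\epsilon^2+T(t)\ge\epsilon^2$ always. Hence, partitioning over $\{\tau(t)\ge-(\epsilon^2+t)/2\}$ and its complement,

$$\big\|\psi^{\epsilon,x}_t\big\|_\infty=\mathbb{E}\Big[\frac{1}{\big(2\pi(\epsilon^2+T(t))\big)^{d/2}}\Big]\le\frac{C}{(\epsilon^2+t)^{d/2}}+\frac{C}{\epsilon^d}\,\mathbb{P}\big\{\tau(t)<-(\epsilon^2+t)/2\big\}.\tag{7.5}$$

Now, observe that since $\mathbb{E}[e^{-\Pi(\theta t)}]=\exp(-\theta t(1-e^{-1}))$, by Markov’s inequality,

$$\begin{aligned}
\mathbb{P}\Big\{\tau(t)<-\frac{\epsilon^2+t}{2}\Big\}&=\mathbb{P}\Big\{e^{-\Pi(\theta t)}>e^{-\theta(t-\epsilon^2)/2}\Big\}\le\frac{\mathbb{E}[\exp(-\Pi(\theta t))]}{\exp(-\theta(t-\epsilon^2)/2)}\\
&=\frac{\exp\big(-\theta t(1-e^{-1})\big)}{\exp\big(-\theta(t-\epsilon^2)/2\big)}=\exp\Big(-\chi\theta t-\frac{\theta\epsilon^2}{2}\Big),
\end{aligned}\tag{7.6}$$

where $\chi=1/2-e^{-1}>0$. The second term in (7.5) is therefore bounded by

$$C\Big(1+\frac{t}{\epsilon^2}\Big)^{d/2}e^{-\chi\theta t}\,\frac{1}{(\epsilon^2+t)^{d/2}}\,e^{-\epsilon^2\theta/2}.$$

Now observe that the derivative (with respect to $t$) of $e^{-\chi\theta t}(1+t/\epsilon^2)^{d/2}$ is

$$\Big(\frac{d}{2\epsilon^2}-\Big(1+\frac{t}{\epsilon^2}\Big)\chi\theta\Big)\Big(1+\frac{t}{\epsilon^2}\Big)^{d/2-1}e^{-\chi\theta t},$$

which is negative if $\theta(\epsilon^2+t)>d/(2\chi)$. At the maximum, $1+t/\epsilon^2=d/(2\chi\theta\epsilon^2)$, and so this quantity is bounded uniformly over not only $t$ but also $\epsilon$ (since we have assumed that $\theta\epsilon^2$ is bounded below). Therefore, we have the bound

$$\frac{1}{\epsilon^d}\,\mathbb{P}\big\{\tau(t)<-(\epsilon^2+t)/2\big\}\le\frac{C}{(\epsilon^2+t)^{d/2}}\,e^{-\epsilon^2\theta/2}.\tag{7.7}$$

Substituting this into (7.5) yields the result. □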
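The bound of Lemma 7.1 can be explored numerically by summing over the Poisson distribution of $\Pi(\theta t)$. The sketch below is illustrative only: it computes the ratio $\mathbb{E}[(\epsilon^2+T(t))^{-d/2}]\,(\epsilon^2+t)^{d/2}$ (the factor $(2\pi)^{-d/2}$ cancels), and the function names and parameter values are assumptions. In the special case $d=2$ and $\theta\epsilon^2=1$ the ratio has the closed form $(1+\mu)(1-e^{-\mu})/\mu$ with $\mu=\theta t$, because $\mathbb{E}[1/(1+\Pi)]=(1-e^{-\mu})/\mu$ for a Poisson($\mu$) variable $\Pi$.

```python
import math


def expected_inverse(eps2, theta, t, d):
    """E[(eps2 + Pi(theta t)/theta)^(-d/2)] for Pi a rate-one Poisson process.

    Requires t > 0. The Poisson sum is truncated far in the upper tail.
    """
    mu = theta * t
    k_max = int(mu + 20.0 * math.sqrt(mu) + 50)
    total = 0.0
    for k in range(k_max + 1):
        # log of the Poisson(mu) probability mass at k
        log_pmf = -mu + k * math.log(mu) - math.lgamma(k + 1)
        total += math.exp(log_pmf) * (eps2 + k / theta) ** (-d / 2.0)
    return total


def kernel_sup_ratio(eps2, theta, t, d):
    """Ratio of sup-norm of psi_t^{eps,x} to the heat-kernel value at time eps2 + t."""
    return expected_inverse(eps2, theta, t, d) * (eps2 + t) ** (d / 2.0)


if __name__ == "__main__":
    theta, eps2 = 100.0, 0.01  # theta * eps2 = 1 (illustrative choice)
    for mu in (0.1, 1.0, 2.0, 10.0, 100.0):
        t = mu / theta
        closed_form = (1.0 + mu) * (1.0 - math.exp(-mu)) / mu
        print(mu, kernel_sup_ratio(eps2, theta, t, d=2), closed_form)
```

In this example the ratio stays between 1 and roughly 1.3 for all $t$, consistent with the existence of a constant $C$ independent of $\epsilon$ and $t$ when $\theta\epsilon^2$ is bounded below.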

Lemma 7.2

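Let $(\mathcal{F}_t)_{t\ge0}$ denote the natural filtration. Under the assumptions of Theorem 2.20, for each $T\in[0,\infty)$, and $k\in\mathbb{N}$, there exist constants $C=C(k,T)$ and $\tilde C=\tilde C(k,T)$, independent of $\epsilon$, such that for all $x\in\mathbb{R}^d$ and all $u,t\in[0,T]$ with $u<t$,

$$\mathbb{E}\Big[\rho_\epsilon*\eta_t(x)^k\ \Big|\ \mathcal{F}_u\Big]\le C\big\langle\psi^{\epsilon,x}_{t-u}(z),\eta_u(dz)\big\rangle^k+C\frac{\theta}{N\epsilon^d}\big\langle\psi^{\epsilon,x}_{t-u}(z),\eta_u(dz)\big\rangle;\tag{7.8}$$

and

$$\mathbb{E}\Big[\int_u^t\big\langle\psi^{\epsilon,x}_{t-s}(z),\eta_s(dz)\big\rangle^{k-1}\big\langle\psi^{\epsilon,x}_{t-s}(z)\big|F\big(\rho_\epsilon*\eta_s(z)\big)\big|,\eta_s(dz)\big\rangle\,ds\ \Big|\ \mathcal{F}_u\Big]\le\tilde C\big\langle\psi^{\epsilon,x}_{t-u}(z),\eta_u(dz)\big\rangle^k+\tilde C\frac{\theta}{N\epsilon^d}\big\langle\psi^{\epsilon,x}_{t-u}(z),\eta_u(dz)\big\rangle;\tag{7.9}$$

where the function $\psi^{\epsilon,x}_t(\cdot)$ was defined in (7.2). In particular, under the assumptions of Theorem 2.20, the expected values of the quantities on the right hand side of (7.8) and (7.9) are both integrable with respect to Lebesgue measure.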

Proof

To simplify our expressions, we shall consider the case u=0, but the proof goes through unchanged for other values of u.

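We proceed by induction. Taking expectations in (7.4), using that $F$ is bounded above, and applying Gronwall’s inequality to $\langle\psi^{\epsilon,x}_{t-s},\eta_s\rangle$ we obtain $\mathbb{E}[\langle\psi^{\epsilon,x}_0,\eta_t\rangle]\le C\,\mathbb{E}[\langle\psi^{\epsilon,x}_t,\eta_0\rangle]$, which implies (7.8) in the case $k=1$. Moreover, rearranging (7.4) we find

$$-\int_0^t\big\langle\psi^{\epsilon,x}_{t-s}(y)F\big(\rho_\epsilon*\eta_s(y)\big),\eta_s(dy)\big\rangle\,ds=\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle-\big\langle\psi^{\epsilon,x}_0(y),\eta_t(dy)\big\rangle+M_t(x),\tag{7.10}$$

and taking expectations again, since $\langle\psi^{\epsilon,x}_0,\eta_t\rangle>0$, and $M_0(x)=0$, this yields

$$\mathbb{E}\Big[-\int_0^t\big\langle\psi^{\epsilon,x}_{t-s}(y)F\big(\rho_\epsilon*\eta_s(y)\big),\eta_s(dy)\big\rangle\,ds\ \Big|\ \mathcal{F}_0\Big]\le\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle.$$

Since $F$ is bounded above, there exists a constant $K$ such that $|F|\le K-F$ and so combined with the bound on $\mathbb{E}[\langle\psi^{\epsilon,x}_0(y),\eta_t(dy)\rangle]$ just obtained, this in turn yields

$$\mathbb{E}\Big[\int_0^t\big\langle\psi^{\epsilon,x}_{t-s}(y)\big|F\big(\rho_\epsilon*\eta_s(y)\big)\big|,\eta_s(dy)\big\rangle\,ds\ \Big|\ \mathcal{F}_0\Big]\le\tilde C\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle,$$

which is (7.9) in the case $k=1$.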

Now suppose that we have established (7.8) and (7.9) for all exponents j<k. First we apply the generator 𝒫N of our scaled population process to functions of the form f,ηk. Recalling that each jump of the process involves the birth or death of a single individual, and so increments f,η by ±f/N at the location of that individual and that rγ1, we find

𝒫N(f,ηk)=θNj=1k(kj)f(y)jNjf,ηkjqθ(x,dy),η(dx)+θN(1F(ρϵ*η(x))θ)j=1k(kj)(1)jf(x)jNjf,ηkj,η(dx). (7.11)

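Mimicking what we did above, we set $f(\cdot)=\psi^{\epsilon,x}_t(\cdot)$ and write

$$\mathbb{E}\big[\langle\psi^{\epsilon,x}_0,\eta_t\rangle^k\,\big|\,\mathcal{F}_0\big]=\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle^k+\mathbb{E}\Big[\int_0^t\mathcal{P}^N\big\langle\psi^{\epsilon,x}_{t-s}(y),\eta_s(dy)\big\rangle^k\,ds-\int_0^tk\big\langle\dot{\psi}^{\epsilon,x}_{t-s}(y),\eta_s(dy)\big\rangle\big\langle\psi^{\epsilon,x}_{t-s}(y),\eta_s(dy)\big\rangle^{k-1}ds\,\Big|\,\mathcal{F}_0\Big]. \tag{7.12}$$

Since $\dot{\psi}^{\epsilon,x}_s=\mathcal{L}^\theta\psi^{\epsilon,x}_s$, the $j=1$ terms from $\mathcal{P}^N\langle\psi^{\epsilon,x}_{t-s}(y),\eta_s(dy)\rangle^k$ combine with the last term in (7.12) to yield

$$\int_0^tk\big\langle\psi^{\epsilon,x}_{t-s},\eta_s\big\rangle^{k-1}\big\langle F(\rho_\epsilon*\eta_s(y))\psi^{\epsilon,x}_{t-s}(y),\eta_s(dy)\big\rangle\,ds.$$

As for the remaining terms, using (from Lemma 7.1) that $\sup_s\|\psi^{\epsilon,x}_s(\cdot)\|_\infty=C/\epsilon^d$, $N\epsilon^d>1$, and our inductive hypothesis, we find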

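$$\begin{aligned}
&\mathbb{E}\Big[\int_0^t\theta N\Big\langle\int\sum_{j=2}^k\binom{k}{j}\frac{\psi^{\epsilon,x}_{t-s}(z)^j}{N^j}\langle\psi^{\epsilon,x}_{t-s},\eta_s\rangle^{k-j}q_\theta(y,dz),\eta_s(dy)\Big\rangle ds\\
&\qquad+\int_0^t\theta N\Big\langle\sum_{j=2}^k\binom{k}{j}\frac{\psi^{\epsilon,x}_{t-s}(y)^j}{N^j}\langle\psi^{\epsilon,x}_{t-s},\eta_s\rangle^{k-j}(-1)^j\Big(1-\frac{F(\rho_\epsilon*\eta_s(y))}{\theta}\Big),\eta_s(dy)\Big\rangle ds\,\Big|\,\mathcal{F}_0\Big]\\
&\quad\le C\,\mathbb{E}\Big[\int_0^t\sum_{j=2}^k\frac{\theta}{N\epsilon^d}\Big(\frac{1}{N\epsilon^d}\Big)^{j-2}\Big\langle\psi^{\epsilon,x}_{t-s}(y)\Big(2+\frac{|F(\rho_\epsilon*\eta_s(y))|}{\theta}\Big),\eta_s(dy)\Big\rangle\langle\psi^{\epsilon,x}_{t-s},\eta_s\rangle^{k-j}ds\,\Big|\,\mathcal{F}_0\Big]\\
&\quad\le C'\frac{\theta}{N\epsilon^d}\sum_{j=1}^{k-1}\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle^j\\
&\quad\le C''\frac{\theta}{N\epsilon^d}\Big(\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle^k+\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle\Big).
\end{aligned}$$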

Combining this with (7.11) and (7.12), using once again the fact that F is bounded above, we find

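$$\mathbb{E}\big[\langle\psi^{\epsilon,x}_0,\eta_t\rangle^k\,\big|\,\mathcal{F}_0\big]\le\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle^k+\widetilde{C}\,\mathbb{E}\Big[\int_0^t\big\langle\psi^{\epsilon,x}_{t-s}(y),\eta_s(dy)\big\rangle^kds\,\Big|\,\mathcal{F}_0\Big]+C''\frac{\theta}{N\epsilon^d}\Big(\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle^k+\big\langle\psi^{\epsilon,x}_t(y),\eta_0(dy)\big\rangle\Big),$$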

and (7.8) follows from Gronwall’s inequality. Rearranging exactly as in the case k=1, we recover (7.9) and the inductive step is complete. □

We shall also need the following consequence of the bounds that we obtained in Lemma 7.2:

Corollary 7.3

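Under the assumptions of Theorem 2.20, for each $k\ge 1$, $T>0$, there is a $C(k,T)$ such that

$$\mathbb{E}\big[\big\langle(\rho_\epsilon*\eta_t)^k,\eta_t\big\rangle\big]<C(k,T)<\infty,\qquad\text{for all }t\in[0,T]. \tag{7.13}$$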

Proof [Sketch]

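First observe that if $A\in(0,1)$, then

$$p_{A\epsilon^2}(x,y)=\frac{1}{A^{d/2}}\,p_{\epsilon^2}(x,y)\exp\Big(-\frac{\|x-y\|^2}{2\epsilon^2}\Big(\frac{1}{A}-1\Big)\Big)\le\frac{1}{A^{d/2}}\,p_{\epsilon^2}(x,y). \tag{7.14}$$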
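For completeness, the comparison (7.14) can be checked in two lines. The sketch below assumes the normalisation $p_t(x,y)=(2\pi t)^{-d/2}\exp(-\|x-y\|^2/2t)$ for the heat kernel, which is the one used elsewhere in this section; the closing example with $A=1/2$ is our illustration of how the bound is applied below.

```latex
\begin{align*}
p_{A\epsilon^2}(x,y)
 &= \frac{1}{(2\pi A\epsilon^2)^{d/2}}\exp\Big(-\frac{\|x-y\|^2}{2A\epsilon^2}\Big)\\
 &= \frac{1}{A^{d/2}}\cdot\frac{1}{(2\pi\epsilon^2)^{d/2}}
    \exp\Big(-\frac{\|x-y\|^2}{2\epsilon^2}\Big)
    \exp\Big(-\frac{\|x-y\|^2}{2\epsilon^2}\Big(\frac{1}{A}-1\Big)\Big)\\
 &\le \frac{1}{A^{d/2}}\,p_{\epsilon^2}(x,y),
 \qquad\text{since } \tfrac{1}{A}-1>0 \text{ for } A\in(0,1).
\end{align*}
% Example: A = 1/2 gives p_{\epsilon^2/2}(x,y) \le 2^{d/2}\, p_{\epsilon^2}(x,y).
```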

Now consider

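$$\begin{aligned}
\mathbb{E}\big[\langle\rho_\epsilon*\eta_t(x),\eta_t(dx)\rangle\big]
&=\mathbb{E}\Big[\iint p_{\epsilon^2}(x,z)\,\eta_t(dz)\,\eta_t(dx)\Big]\\
&=\mathbb{E}\Big[\iiint p_{\epsilon^2/2}(x,y)\,p_{\epsilon^2/2}(y,z)\,dy\,\eta_t(dz)\,\eta_t(dx)\Big]\\
&=\mathbb{E}\Big[\int\big(p_{\epsilon^2/2}*\eta_t(y)\big)^2dy\Big]\le C\,\mathbb{E}\Big[\int\big(\rho_\epsilon*\eta_t(x)\big)^2dx\Big],
\end{aligned}$$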

where we used (7.14) in the last line. Using Lemma 7.2 and our assumptions on η0, this quantity is finite.

To illustrate the inductive step, now consider

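$$\begin{aligned}
\mathbb{E}\big[\langle\rho_\epsilon*\eta_t(x)^2,\eta_t(dx)\rangle\big]
&=\mathbb{E}\Big[\iiint p_{\epsilon^2}(x,z_1)\,p_{\epsilon^2}(x,z_2)\,\eta_t(dz_1)\,\eta_t(dz_2)\,\eta_t(dx)\Big]\\
&=\mathbb{E}\Big[\int\!\cdots\!\int p_{\epsilon^2/2}(x,y_1)\,p_{\epsilon^2/2}(x,y_2)\,p_{\epsilon^2/2}(y_1,z_1)\,p_{\epsilon^2/2}(y_2,z_2)\,\eta_t(dz_1)\,\eta_t(dz_2)\,dy_1\,dy_2\,\eta_t(dx)\Big].
\end{aligned} \tag{7.15}$$

We use the identity

$$p_{\epsilon^2/2}(x,y_1)\,p_{\epsilon^2/2}(x,y_2)=p_{\epsilon^2}(y_1,y_2)\,p_{\epsilon^2/4}\Big(x,\frac{y_1+y_2}{2}\Big)$$

to rewrite (7.15) as

$$\begin{aligned}
&\mathbb{E}\Big[\iint p_{\epsilon^2/2}*\eta_t(y_1)\;p_{\epsilon^2/2}*\eta_t(y_2)\;p_{\epsilon^2/4}*\eta_t\Big(\frac{y_1+y_2}{2}\Big)\,p_{\epsilon^2}(y_1,y_2)\,dy_1\,dy_2\Big]\\
&\quad\le\mathbb{E}\Big[\iint\Big\{\big(p_{\epsilon^2/2}*\eta_t(y_1)\big)^3+\big(p_{\epsilon^2/2}*\eta_t(y_2)\big)^3+\Big(p_{\epsilon^2/4}*\eta_t\Big(\frac{y_1+y_2}{2}\Big)\Big)^3\Big\}\,p_{\epsilon^2}(y_1,y_2)\,dy_1\,dy_2\Big],
\end{aligned}$$

where we have used that for any non-negative real numbers $\beta_1,\beta_2,\beta_3$, $\beta_1\beta_2\beta_3\le\beta_1^3+\beta_2^3+\beta_3^3$. For the first two terms in the sum we integrate with respect to $y_2$ and $y_1$ respectively to reduce to an expression of the form considered in Lemma 7.2. For the final term, the change of variables $z_1=y_1+y_2$, $z_2=y_1-y_2$ in the integral similarly allows us to integrate out the heat kernel, and we conclude that the result holds for $k=2$.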
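The two elementary facts used in the case $k=2$ can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the heat kernel normalisation $p_t(x,y)=(2\pi t)^{-d/2}\exp(-\|x-y\|^2/2t)$, and the function names (`heat_kernel`, `product_identity_gap`, `cubic_bound_holds`) are ours. It compares both sides of the product identity at random points in dimensions one to three and tests the cubic bound on random non-negative triples.

```python
import math
import random


def heat_kernel(t, x, y):
    """Gaussian density p_t(x, y) on R^d with covariance t times the identity."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * t)) / (2.0 * math.pi * t) ** (d / 2.0)


def product_identity_gap(eps, x, y1, y2):
    """Relative difference between the two sides of the product identity."""
    lhs = heat_kernel(eps**2 / 2, x, y1) * heat_kernel(eps**2 / 2, x, y2)
    midpoint = [(a + b) / 2.0 for a, b in zip(y1, y2)]
    rhs = heat_kernel(eps**2, y1, y2) * heat_kernel(eps**2 / 4, x, midpoint)
    return abs(lhs - rhs) / max(lhs, rhs)


def cubic_bound_holds(b1, b2, b3):
    """Check b1*b2*b3 <= b1^3 + b2^3 + b3^3 for non-negative inputs."""
    return b1 * b2 * b3 <= b1**3 + b2**3 + b3**3


rng = random.Random(0)  # fixed seed so the check is reproducible
max_gap = 0.0
all_cubic_ok = True
for _ in range(2000):
    d = rng.choice([1, 2, 3])
    eps = rng.uniform(0.5, 2.0)
    x, y1, y2 = ([rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(3))
    max_gap = max(max_gap, product_identity_gap(eps, x, y1, y2))
    betas = [rng.uniform(0.0, 10.0) for _ in range(3)]
    all_cubic_ok = all_cubic_ok and cubic_bound_holds(*betas)

print(f"largest relative gap in the identity: {max_gap:.2e}")
print(f"cubic bound held in every trial: {all_cubic_ok}")
```

The gap should be at the level of floating-point rounding, since the identity is exact: it is completing the square in $x$.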

We can proceed in the same way for larger values of k, using repeatedly that

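$$p_{t_1}(x,y_1)\,p_{t_2}(x,y_2)=p_{\frac{t_1t_2}{t_1+t_2}}\Big(x,\frac{t_2y_1+t_1y_2}{t_1+t_2}\Big)\,p_{t_1+t_2}(y_1,y_2)$$

to write

$$\prod_{j=1}^kp_\tau(y,y_j)=\prod_{j=2}^kp_{\frac{j\tau}{j-1}}(y_j,Y_{j-1})\;p_{\frac{\tau}{k}}(y,Y_k)$$

where

$$Y_1=y_1,\qquad Y_j=\frac{j-1}{j}Y_{j-1}+\frac{1}{j}y_j,\quad\text{for }j\ge 2.$$

Writing $p_{\epsilon^2}(x,z_j)=\int p_{\epsilon^2/2}(x,y_j)\,p_{\epsilon^2/2}(y_j,z_j)\,dy_j$ and using the above with $\tau=\epsilon^2/2$, this yields

$$\begin{aligned}
\big\langle\rho_\epsilon*\eta_t(x)^k,\eta_t(dx)\big\rangle
&=\int\!\cdots\!\int\prod_{j=2}^kp_{\frac{\epsilon^2j}{2(j-1)}}(y_j,Y_{j-1})\prod_{i=1}^kp_{\epsilon^2/2}*\eta_t(y_i)\;p_{\frac{\epsilon^2}{2k}}*\eta_t(Y_k)\,dy_1\cdots dy_k\\
&\le\int\!\cdots\!\int\prod_{j=2}^kp_{\frac{\epsilon^2j}{2(j-1)}}(y_j,Y_{j-1})\Big\{\sum_{i=1}^k\big(p_{\epsilon^2/2}*\eta_t(y_i)\big)^{k+1}+\big(p_{\frac{\epsilon^2}{2k}}*\eta_t(Y_k)\big)^{k+1}\Big\}\,dy_1\cdots dy_k,
\end{aligned}$$

and once again we can change variables in the integrals and use (7.14) to bound this by a constant multiple of $\mathbb{E}\big[\int\rho_\epsilon*\eta_t(x)^{k+1}dx\big]$, and the inductive step is complete. □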
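The factorisation of a product of $k$ heat kernels into kernels between successive running means can also be checked numerically. The sketch below is an illustration under our own assumptions: it works in one dimension only, takes $p_t(x,y)=(2\pi t)^{-1/2}\exp(-(x-y)^2/2t)$, and uses hypothetical helper names (`direct_product`, `factorised_product`). It builds $Y_j$ by the recursion $Y_j=\frac{j-1}{j}Y_{j-1}+\frac{1}{j}y_j$ and compares the two sides for $k$ between 1 and 6.

```python
import math
import random


def p(t, x, y):
    """One-dimensional heat kernel p_t(x, y) with variance t."""
    return math.exp(-((x - y) ** 2) / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def direct_product(tau, y, ys):
    """Left-hand side: the product of p_tau(y, y_j) over j = 1..k."""
    out = 1.0
    for yj in ys:
        out *= p(tau, y, yj)
    return out


def factorised_product(tau, y, ys):
    """Right-hand side: kernels between y_j and Y_{j-1}, then one kernel at Y_k."""
    k = len(ys)
    running_mean = ys[0]  # Y_1 = y_1
    out = 1.0
    for j in range(2, k + 1):
        yj = ys[j - 1]
        out *= p(j * tau / (j - 1), yj, running_mean)  # uses Y_{j-1}
        running_mean = (j - 1) / j * running_mean + yj / j  # update to Y_j
    return out * p(tau / k, y, running_mean)


rng = random.Random(1)  # fixed seed so the check is reproducible
worst_gap = 0.0
for _ in range(2000):
    k = rng.randint(1, 6)
    tau = rng.uniform(0.2, 2.0)
    y = rng.uniform(-1.0, 1.0)
    ys = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    a = direct_product(tau, y, ys)
    b = factorised_product(tau, y, ys)
    worst_gap = max(worst_gap, abs(a - b) / max(a, b))

print(f"largest relative gap over all trials: {worst_gap:.2e}")
```

Each pass through the loop is one application of the two-kernel identity with $t_1=\tau/(j-1)$ and $t_2=\tau$, which is why $Y_j$ is the arithmetic mean of $y_1,\ldots,y_j$.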

Corollary 7.4 (Tightness of $(\rho_\epsilon*\eta^N_t(x)\,dx)_{t\ge 0}$)

Under the assumptions of Theorem 2.20, the sequence of measure-valued processes $(\rho_\epsilon*\eta^N_t(x)\,dx)_{t\ge 0}$ (taking values in $\mathcal{D}_{[0,T]}(\mathcal{M}_F(\mathbb{R}^d))$) is tight.

Proof

First observe that the proof, from Lemma 6.2, that Esup0tT1,ηtN is bounded goes through unchanged, and since 1,ρϵ*ηtN(x)dx=1,ηtN, compact containment follows.

As in the nonlocal case, it suffices to prove that for T>0, and any fCbRd with bounded second derivatives and |f(x)|dx<, the sequence of real-valued processes f(x)ρϵ*ηtN(x)dxt0N1 is tight. Let us temporarily write XfN(t) for f(x)ρϵ*ηtN(x)dx and set

w'XfN,δ,T=inftimaxisups,tti-1,tiXfN(t)-XfN(s),

where ti ranges over all partitions of the form 0=t0<t1<<tn-1<Ttn with min1inti-ti-1>δ and n1. Using Corollary 3.7.4 of Ethier and Kurtz [1986], to prove tightness of the sequence of real-valued processes XfN it suffices to check compact containment of the sequence f(x)ρϵ*ηtN(x)dxN1 at any rational time t and that for every ν>0 and T>0, there exists δ>0 such that

limsupNPw'XfN,δ,T>ν<ν.

Evidently this will follow if we can show that this condition is satisfied when we replace the minimum over all partitions with mesh at least δ in the definition of w', by the partition into intervals of length exactly δ.

We have

ρϵ*f,ηtN-ρϵ*f,ηsNstθρϵ*f(y)-ρϵ*f(x)qθ(x,dy),ηuN(dx)du+stFρϵ*ηuN(x)ρϵ*|f|(x),ηuN(dx)du+2sup0uTM^N(f)u, (7.16)

where M^N(f) is the martingale of (6.4) with the test function f replaced by ρϵ*f. We control each of the three terms on the right hand side separately.

By the Mean Value Theorem, using $T_t$ to denote the heat semigroup, there exists $s\in(0,1/\theta)$ such that

θρϵ*f(y)-ρϵ*f(x)qθ(x,dy)=θTϵ2+1/θf(x)-Tϵ2f(x)=sTϵ2+sf(x)=Tϵ2+sΔf(x)Δf.

The first term in (7.16) is therefore bounded by

Δf|t-s|sup0uT1,ηuN.

We follow the approach of Lemma 6.2. Consulting (2.4), the angle bracket process of M^N(f) satisfies E[M^fNT]C(θ/N)0TE1,ηsdsC'θ/N for some constants C and C'. Now, using the Burkholder-Davis-Gundy inequality and Barlow et al. [1986], E[sup0uT|M^N(f)u|2]C''E[M^N(f)u], and so using Markov’s inequality,

limsupNP2sup0uT|M^N(f)u|>ν3limsupN36ν2C''EM^NfTlimsupN36ν2C'C''θNϵd=0. (7.17)

Now consider

Estρϵ*|f|(x)Fρϵ*ηuN(x),ηuN(dx)du2=2Estρϵ*|f|(x)Fρϵ*ηuN(x),ηuN(dx)utρϵ*|f|(x)Fρϵ*ηrN(x),ηrN(dx)drdu. (7.18)

Since F is polynomial, we use the approach of Corollary 7.3, the tower property, and Lemma 7.2, to bound this in terms of sums of terms of the form

Est(t-u)ρϵ*|f|(x)ρϵ*ηuN(x)jdxρϵ*|f|(y)ρϵ*ηuN(y)kdydu.

Now observe that, again using Lemma 7.2, since for nonnegative a and b,ajbkaj+k+bj+k,

Eρϵ*|f|(x)ρϵ*ηuN(x)jρϵ*|f|(y)ρϵ*ηuN(y)kdxdyEfρϵ*ηuN(x)j+kρϵ*|f|(y)dxdy+ρϵ*|f|(x)fρϵ*ηuN(y)j+kdxdyC|f|(x)dx.

Thus the quantity (7.18) is bounded by C(t-s)2 for a new constant C which we can take to be independent of s,t and ϵ. Markov’s inequality then gives

PfstFρϵ*ηuN(x),ηuN(dx)duν3C(t-s)2ν2.

A union bound gives that

Pmaxifti-1tiFρϵ*ηuN(x),ηuN(dx)duν3CTδν2. (7.19)

Now using Markov’s inequality, we can choose K so that

PΔfsup0tT1,ηtN>K<ν3,

and so choosing δ so that Kδ<ν/3 in this expression and Cδ<ν3/3T in (7.19), combining with (7.17), the result follows. □

7.2. Continuity estimates for ρϵ*η

To identify the limit point of any convergent subsequence of $\rho_\epsilon*\eta^N(x)$, we will require some control on the spatial continuity of the functions $\rho_\epsilon*\eta^N(x)$. This will be inherited from the regularity of the transition density of the Gaussian random walk with generator $\mathcal{L}^\theta$, which in turn follows from its representation as that of a Brownian motion evaluated at the random time $T(t)$ defined in Lemma 7.1. Our approach will be to approximate $\psi^{\epsilon,x}_t(\cdot)$ by $p_{\epsilon^2+t}(x,\cdot)$, and to control the error that this introduces we need to control $T(t)-t$.

Lemma 7.5

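In the notation of Lemma 7.1, for any $A>1$,

$$\mathbb{P}\big\{T(t)-t>A(\epsilon^2+t)\big\}\le\exp\Big(-\frac{\theta A}{4}(\epsilon^2+t)\Big).$$

Proof

This is just a Chernoff bound. With $\Pi$ a rate one Poisson process as in Lemma 7.1, for any $A>1$ and any $\alpha>0$,

$$\begin{aligned}
\mathbb{P}\big\{T(t)-t>A(\epsilon^2+t)\big\}
&=\mathbb{P}\big\{\Pi(\theta t)>\theta\big(t+A(\epsilon^2+t)\big)\big\}\\
&\le\mathbb{E}\big[\exp(\alpha\Pi(\theta t))\big]\exp\big(-\alpha\theta\big(t+A(\epsilon^2+t)\big)\big)\\
&=\exp\big(\theta t(e^\alpha-1)-\alpha\theta\big(t+A(\epsilon^2+t)\big)\big)\\
&\le\exp\Big(\theta t\Big(e^\alpha-\alpha-1-\frac{A\alpha}{2}\Big)-\frac{A\alpha}{2}\theta(\epsilon^2+t)\Big).
\end{aligned}$$

Now set $\alpha=1/2$. Since $A>1$, $e^\alpha-\alpha-1-A\alpha/2<0$ and the result follows. □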
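The bound of Lemma 7.5 can be compared with the exact Poisson tail. The sketch below rests on one assumption that is read off from the first equality in the proof rather than from Lemma 7.1 itself (which is not reproduced here): that $T(t)=\Pi(\theta t)/\theta$, so the event is $\{\Pi(\theta t)>\theta(t+A(\epsilon^2+t))\}$. The helper names and the parameter grid are ours.

```python
import math


def poisson_upper_tail(lam, threshold):
    """P(Pi > threshold) for Pi ~ Poisson(lam), summing the pmf directly."""
    n = math.floor(threshold) + 1  # smallest integer strictly above threshold
    total = 0.0
    while True:
        # pmf evaluated in log space to avoid overflow in lam**n and n!
        term = math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))
        total += term
        # Past the mode the terms decay; stop once they no longer matter.
        if n > lam and term <= 1e-17 * total:
            break
        n += 1
    return total


def chernoff_check(theta, t, eps, A):
    """Return (exact tail, bound of Lemma 7.5), assuming T(t) = Pi(theta*t)/theta."""
    excess = A * (eps**2 + t)
    tail = poisson_upper_tail(theta * t, theta * (t + excess))
    bound = math.exp(-theta * A * (eps**2 + t) / 4.0)
    return tail, bound


results = [
    chernoff_check(theta, t, eps, A)
    for theta in (1.0, 10.0, 100.0)
    for t in (0.1, 1.0, 5.0)
    for eps in (0.1, 0.5)
    for A in (1.01, 2.0, 5.0)
]
bound_always_holds = all(tail <= bound for tail, bound in results)
print(f"bound held on the whole grid: {bound_always_holds}")
```

The exact tail is typically many orders of magnitude below the bound, which reflects that $\alpha=1/2$ is a convenient rather than an optimal choice in the Chernoff argument.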

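As advertised, we wish to control the difference between $\psi^{\epsilon,x}_t(y)$ and $p_{\epsilon^2+t}(x,y)$.

Lemma 7.6

In the notation of Lemma 7.1, there exists a $C<\infty$ such that

$$\big|\psi^{\epsilon,x}_t(y)-p_{\epsilon^2+t}(x,y)\big|\le\frac{C}{(\epsilon^2\theta)^{1/2}}\,p_{6(\epsilon^2+t)}(x,y)+\frac{C}{(\epsilon^2+t)^{d/2}}\exp\big(-\epsilon^2\theta/2\big). \tag{7.20}$$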

Proof

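Still using the notation of Lemma 7.1, we partition into three events according to the value of $\tau(t)$. Let $A_1=\{\tau(t)<-(\epsilon^2+t)/2\}$, $A_2=\{\tau(t)>2(\epsilon^2+t)\}$, and $A_3$ the remaining event, $\{-(\epsilon^2+t)/2\le\tau(t)\le 2(\epsilon^2+t)\}$. Then,

$$\big|\psi^{\epsilon,x}_t(y)-p_{\epsilon^2+t}(x,y)\big|=\big|\mathbb{E}\big[p_{\epsilon^2+t+\tau(t)}(x,y)-p_{\epsilon^2+t}(x,y)\big]\big|\le\mathbb{E}\big[\big(\mathbf{1}_{A_1}+\mathbf{1}_{A_2}+\mathbf{1}_{A_3}\big)\big|p_{\epsilon^2+t+\tau(t)}(x,y)-p_{\epsilon^2+t}(x,y)\big|\big].$$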

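For the first term, note that if $a<b$ then

$$\begin{aligned}
\big|p_a(x,y)-p_b(x,y)\big|
&=\frac{1}{(2\pi)^{d/2}}\Big|\frac{1}{a^{d/2}}e^{-\|x-y\|^2/2a}-\frac{1}{b^{d/2}}e^{-\|x-y\|^2/2b}\Big|\\
&=\frac{1}{(2\pi a)^{d/2}}\,e^{-\|x-y\|^2/2b}\Big|e^{-\|x-y\|^2\left(\frac{1}{2a}-\frac{1}{2b}\right)}-\Big(\frac{a}{b}\Big)^{d/2}\Big|\le C\Big(\frac{b}{a}\Big)^{d/2}p_b(x,y),
\end{aligned}$$

where the inequality follows because both terms under the absolute value are less than 1. Since, on the event $A_1$, $\tau(t)<0$, we can apply this with $a=\epsilon^2+t+\tau(t)$ and $b=\epsilon^2+t$, and, using the bound (7.6),

$$\mathbb{E}\big[\mathbf{1}_{A_1}\big|p_a(x,y)-p_b(x,y)\big|\big]\le C\Big(\frac{\epsilon^2+t}{\epsilon^2}\Big)^{d/2}p_{\epsilon^2+t}(x,y)\,\mathbb{P}\Big\{\tau(t)<-\frac{\epsilon^2+t}{2}\Big\}\le C\frac{1}{\epsilon^d}\,\mathbb{P}\Big\{\tau(t)<-\frac{\epsilon^2+t}{2}\Big\}\le\frac{C}{(\epsilon^2+t)^{d/2}}\exp\Big(-\frac{\theta\epsilon^2}{2}\Big).$$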

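For the third term, we will first collect some facts. Observe that on the event $A_3$, $\epsilon^2+t+\tau(t)$ is between $(\epsilon^2+t)/2$ and $3(\epsilon^2+t)$, and for any $s$ in this interval,

$$p_{2s}(x,y)\le\Big(\frac{6(\epsilon^2+t)}{\epsilon^2+t}\Big)^{d/2}p_{6(\epsilon^2+t)}(x,y)=6^{d/2}\,p_{6(\epsilon^2+t)}(x,y). \tag{7.21}$$

Moreover, since $ue^{-u}\le e^{-1}$ for all $u\ge 0$,

$$\frac{\|x-y\|^2}{s}\,p_s(x,y)=\frac{4}{(2\pi s)^{d/2}}\,e^{-\frac{\|x-y\|^2}{4s}}\,\frac{\|x-y\|^2}{4s}\,e^{-\frac{\|x-y\|^2}{4s}}\le C\,p_{2s}(x,y). \tag{7.22}$$

Now, by the Mean Value Theorem,

$$\big|p_{\epsilon^2+t+\tau(t)}(x,y)-p_{\epsilon^2+t}(x,y)\big|=|\tau(t)|\,\Big|\frac{\partial p_s(x,y)}{\partial s}\Big| \tag{7.23}$$

for some $s$ between $\epsilon^2+t+\tau(t)$ and $\epsilon^2+t$. Since

$$\frac{\partial}{\partial s}p_s(x,y)=\frac{\partial}{\partial s}\Big(\frac{1}{(2\pi s)^{d/2}}\exp\Big(-\frac{\|x-y\|^2}{2s}\Big)\Big)=-\frac{d}{2s}\,p_s(x,y)+\frac{\|x-y\|^2}{2s^2}\,p_s(x,y),$$

applying the inequality (7.22), using the fact that $p_s(x,y)\le 2^{d/2}p_{2s}(x,y)$, and then (7.21), we have that for any $s\in[(\epsilon^2+t)/2,\,3(\epsilon^2+t)]$,

$$\Big|\frac{\partial}{\partial s}p_s(x,y)\Big|\le\frac{C}{s}\,p_{2s}(x,y)\le\frac{C}{\epsilon^2+t}\,p_{6(\epsilon^2+t)}(x,y).$$

Therefore, recalling that $\mathbb{E}[\tau(t)^2]=t/\theta$, substituting into (7.23),

$$\begin{aligned}
\mathbb{E}\big[\mathbf{1}_{A_3}\big|p_{\epsilon^2+t+\tau(t)}(x,y)-p_{\epsilon^2+t}(x,y)\big|\big]
&\le\frac{C}{\epsilon^2+t}\,p_{6(\epsilon^2+t)}(x,y)\,\mathbb{E}[|\tau(t)|]\le\frac{C}{\epsilon^2+t}\,p_{6(\epsilon^2+t)}(x,y)\,\mathbb{E}[\tau(t)^2]^{1/2}\\
&=C\Big(\frac{t}{\theta(\epsilon^2+t)^2}\Big)^{1/2}p_{6(\epsilon^2+t)}(x,y)\le\frac{C}{(\theta\epsilon^2)^{1/2}}\,p_{6(\epsilon^2+t)}(x,y),
\end{aligned}$$

where the last inequality follows from $2\epsilon^2t\le(\epsilon^2+t)^2$.

Finally, on the event $A_2=\{\tau(t)>2(\epsilon^2+t)\}$, we simply use

$$\big|p_{\epsilon^2+t+\tau(t)}(x,y)-p_{\epsilon^2+t}(x,y)\big|\le\frac{C}{(\epsilon^2+t)^{d/2}},$$

so that

$$\mathbb{E}\big[\mathbf{1}_{A_2}\big|p_{\epsilon^2+t+\tau(t)}(x,y)-p_{\epsilon^2+t}(x,y)\big|\big]\le\frac{C}{(\epsilon^2+t)^{d/2}}\,\mathbb{P}\big\{\tau(t)>2(\epsilon^2+t)\big\},$$

and apply Lemma 7.5 with $A=2$. □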

The last result will be useful when combined with the next bound for the heat kernel.

Lemma 7.7

Let $s>0$, and $x,y,z\in\mathbb{R}^d$. The following estimate holds:

$$\big|p_s(x,z)-p_s(y,z)\big|\le C\,\frac{\|x-y\|}{\sqrt{s}}\big(p_{2s}(x,z)+p_{2s}(y,z)\big),$$

where the constant $C$ does not depend on $x,y,z$ or $s$.

Proof

Expanding the difference of two squares,

$$e^{-\frac{\|y-z\|^2}{2s}}-e^{-\frac{\|x-z\|^2}{2s}}=\Big(e^{-\frac{\|y-z\|^2}{4s}}-e^{-\frac{\|x-z\|^2}{4s}}\Big)\Big(e^{-\frac{\|y-z\|^2}{4s}}+e^{-\frac{\|x-z\|^2}{4s}}\Big).$$

Now, thinking of the first term in brackets as a function of a single variable $x$ on the line segment $[y,z]$ connecting $y$ to $z$, we can apply the Mean Value Theorem and take the modulus to bound this expression by

$$\Big(\|y-x\|\,\frac{2\|w-z\|}{4s}\exp\Big(-\frac{\|w-z\|^2}{4s}\Big)\Big)(4\pi s)^{d/2}\big(p_{2s}(y,z)+p_{2s}(x,z)\big)$$

for some $w\in[y,z]$. Using the fact that $xe^{-x^2}$ is uniformly bounded, we can bound the first bracket in the last equation by $C\|y-x\|/\sqrt{s}$, and the result follows. □
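The estimate of Lemma 7.7 is easy to probe numerically in one dimension. The sketch below is our own illustration, not part of the proof. It assumes $p_t(x,y)=(2\pi t)^{-1/2}\exp(-(x-y)^2/2t)$ and compares the observed ratio with the candidate constant $e^{-1/2}$, which is what the argument in the proof yields when $d=1$ (using $\sup_v ve^{-v^2}=e^{-1/2}/\sqrt{2}$); the lemma itself does not state a value for $C$.

```python
import math
import random


def p(t, x, y):
    """One-dimensional heat kernel p_t(x, y) with variance t."""
    return math.exp(-((x - y) ** 2) / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def lipschitz_ratio(s, x, y, z):
    """|p_s(x,z) - p_s(y,z)| over (|x-y|/sqrt(s)) * (p_2s(x,z) + p_2s(y,z))."""
    numerator = abs(p(s, x, z) - p(s, y, z))
    denominator = abs(x - y) / math.sqrt(s) * (p(2 * s, x, z) + p(2 * s, y, z))
    return numerator / denominator


rng = random.Random(2)  # fixed seed so the check is reproducible
largest_ratio = 0.0
for _ in range(20000):
    s = rng.uniform(0.05, 5.0)
    x, y, z = (rng.uniform(-3.0, 3.0) for _ in range(3))
    if abs(x - y) < 1e-6:
        continue  # skip near-coincident points, where rounding dominates
    largest_ratio = max(largest_ratio, lipschitz_ratio(s, x, y, z))

candidate_constant = math.exp(-0.5)  # our d = 1 value, an assumption of this sketch
print(f"largest observed ratio: {largest_ratio:.4f}")
print(f"candidate constant:     {candidate_constant:.4f}")
```

The ratio is scale invariant (replacing $(x,y,z,s)$ by $(\lambda x,\lambda y,\lambda z,\lambda^2 s)$ leaves it unchanged), which is the reason the constant in the lemma cannot depend on $s$.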

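We now have the ingredients that we need to write down a continuity estimate for $\rho_\epsilon*\eta$. We fix $\delta>0$ and suppose that $s>\delta$. Let us write

$$\hat{\epsilon}(\delta,\epsilon,\theta):=\frac{1}{(\epsilon^2+\delta)^{d/2}}\,e^{-\epsilon^2\theta/2},$$

and note that under the assumption that $\epsilon^2\theta\to\infty$, for each fixed $\delta>0$, $\lim_{\epsilon\to 0,\,\theta\to\infty}\hat{\epsilon}(\delta,\epsilon,\theta)=0$. Using the semimartingale decomposition (7.4), and Lemma 7.6, we have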

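$$\begin{aligned}
\big|\rho_\epsilon*\eta_s(y)-\rho_\epsilon*\eta_s(w)\big|
&=\big|\big\langle p_{\epsilon^2}(y,z)-p_{\epsilon^2}(w,z),\eta_s(dz)\big\rangle\big|\\
&\le\big\langle\big|p_{\epsilon^2+s}(y,z)-p_{\epsilon^2+s}(w,z)\big|,\eta_0(dz)\big\rangle\\
&\quad+\int_0^{s-\delta}\big\langle\big|p_{s-r+\epsilon^2}(y,z)-p_{s-r+\epsilon^2}(w,z)\big|\,\big|F(\rho_\epsilon*\eta_r(z))\big|,\eta_r(dz)\big\rangle\,dr\\
&\quad+\Big\langle\frac{C}{(\theta\epsilon^2)^{1/2}}\big(p_{6(\epsilon^2+s)}(y,z)+p_{6(\epsilon^2+s)}(w,z)\big)+C\hat{\epsilon}(\delta,\epsilon,\theta),\eta_0(dz)\Big\rangle\\
&\quad+\int_0^{s-\delta}\Big\langle\Big\{\frac{C}{(\epsilon^2\theta)^{1/2}}\big(p_{6(s-r+\epsilon^2)}(y,z)+p_{6(s-r+\epsilon^2)}(w,z)\big)+\hat{\epsilon}(\delta,\epsilon,\theta)\Big\}\big|F(\rho_\epsilon*\eta_r(z))\big|,\eta_r(dz)\Big\rangle\,dr\\
&\quad+\int_{s-\delta}^s\big\langle\big(\psi^{\epsilon,y}_{s-r}(z)+\psi^{\epsilon,w}_{s-r}(z)\big)\big|F(\rho_\epsilon*\eta_r(z))\big|,\eta_r(dz)\big\rangle\,dr+|M_s(y)|+|M_s(w)|\\
&\le\frac{\|y-w\|}{\sqrt{s+\epsilon^2}}\big\langle p_{2(s+\epsilon^2)}(y,z)+p_{2(s+\epsilon^2)}(w,z),\eta_0(dz)\big\rangle\\
&\quad+\int_0^{s-\delta}\frac{\|y-w\|}{\sqrt{s-r+\epsilon^2}}\big\langle\big(p_{2(s-r+\epsilon^2)}(y,z)+p_{2(s-r+\epsilon^2)}(w,z)\big)\big|F(\rho_\epsilon*\eta_r(z))\big|,\eta_r(dz)\big\rangle\,dr\\
&\quad+\Big\langle\frac{C}{(\theta\epsilon^2)^{1/2}}\big(p_{6(\epsilon^2+s)}(y,z)+p_{6(\epsilon^2+s)}(w,z)\big)+C\hat{\epsilon}(\delta,\epsilon,\theta),\eta_0(dz)\Big\rangle\\
&\quad+\int_0^{s-\delta}\Big\langle\Big\{\frac{C}{(\epsilon^2\theta)^{1/2}}\big(p_{6(s-r+\epsilon^2)}(y,z)+p_{6(s-r+\epsilon^2)}(w,z)\big)+\hat{\epsilon}(\delta,\epsilon,\theta)\Big\}\big|F(\rho_\epsilon*\eta_r(z))\big|,\eta_r(dz)\Big\rangle\,dr\\
&\quad+\int_{s-\delta}^s\big\langle\big(\psi^{\epsilon,y}_{s-r}(z)+\psi^{\epsilon,w}_{s-r}(z)\big)\big|F(\rho_\epsilon*\eta_r(z))\big|,\eta_r(dz)\big\rangle\,dr+|M_s(y)|+|M_s(w)|.
\end{aligned} \tag{7.24}$$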

Although this expression is lengthy, we have successfully isolated the terms involving $\|y-w\|$, which will control the regularity as we pass to the limit. Asymptotically, we don’t expect the martingale terms to contribute, since their quadratic variation scales with $\theta/(N\epsilon^d)$; under the assumption that $\epsilon^2\theta\to\infty$, for any fixed $\delta>0$, the terms arising from approximating the transition density $\psi^{\epsilon,\cdot}_{s-r}(\cdot)$ of the Gaussian walk by $p_{s-r+\epsilon^2}(\cdot,\cdot)$ at times with $s-r>\delta$ will tend to zero; and the moment bounds of Lemma 7.2 will allow us to control the integral over $[s-\delta,s]$. There is some technical work to be done to rigorously identify the limit points of $\rho_\epsilon*\eta^N$, but it really amounts to applying the tower property and our moment bounds from Lemma 7.2 and Corollary 7.3.

7.3. Identification of the limit

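We now turn to the identification of the limit points of the sequence $\{(\rho_\epsilon*\eta^N_t(x)\,dx)_{t\ge 0}\}_{N\ge 1}$. We would like to show that any limit point solves (2.16) in the limit, i.e.,

$$\big\langle f(x),\varphi(t,x)\,dx\big\rangle=\int_0^t\Big\langle\frac{1}{2}\Delta f(x)+f(x)F(\varphi(s,x)),\varphi(s,x)\,dx\Big\rangle\,ds. \tag{7.25}$$

Since $\langle f,\rho_\epsilon*\eta^N_t(x)\,dx\rangle=\langle\rho_\epsilon*f(x),\eta^N_t(dx)\rangle$, and the limit is deterministic, this will follow if we can show that each of the terms in the semimartingale decomposition (2.3), with the test function $f$ replaced by $\rho_\epsilon*f(\cdot)$, converges to the corresponding term in (7.25).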

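The linear term is straightforward. Write $T_\cdot$ for the heat semigroup, so that $\rho_\epsilon*f(x)=T_{\epsilon^2}f(x)$. By a Taylor expansion,

$$\int_0^t\big\langle\mathcal{L}^\theta T_{\epsilon^2}f,\eta^N_s(dx)\big\rangle\,ds=\int_0^t\Big\langle\frac{1}{2}\Delta T_{\epsilon^2}f(x),\eta^N_s(dx)\Big\rangle\,ds+\mathcal{O}\Big(\frac{1}{\theta}\Big)=\int_0^t\Big\langle\frac{1}{2}\Delta f(x),\rho_\epsilon*\eta^N_s(x)\,dx\Big\rangle\,ds+\mathcal{O}\Big(\frac{1}{\theta}\Big).$$

Thus, from weak convergence we can deduce that under our scaling, for any (weakly) convergent subsequence $\{\rho_\epsilon*\eta^N(x)\,dx\}_{N\ge 1}$,

$$\int_0^t\big\langle\mathcal{L}^\theta T_{\epsilon^2}f,\eta^N_s(dx)\big\rangle\,ds\to\int_0^t\Big\langle\frac{1}{2}\Delta f(x),\varphi(s,x)\,dx\Big\rangle\,ds.$$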

The nonlinear term in the semimartingale decomposition is more intricate. It takes the form

E0tTϵ2f(y)Fρϵ*ηs(y),ηs(dy)ds

and we should like to show that this converges to

0tfyFφs,yφs,ydyds.

We proceed in stages. First we should like to transfer the heat semigroup from Tϵ2f onto ηs. Since f is smooth, this will follow easily if we can show that

E0tTϵ2f(y)Fρϵ*ηs(y),ηs(dy)dsE0tTϵ2fyFρϵ*ηsy,ρϵ*ηsydyds.

This is the content of Proposition 7.8.

Proposition 7.8

Under the conditions of Theorem 2.20,

limϵ0E0tTϵ2f(y)Fρϵ*ηs(y),ηs(y)-Tϵ2f(y)Fρϵ*ηs(y),ρϵ*ηs(y)dyds=0. (7.26)

Proof

In fact we are going to fix δ>0, with t>δ, and show that the expression on the left hand side of (7.26) is less than a constant times δ, with a constant independent of δ,N, and ϵ. Since δ is arbitrary, the result will follow.

We first note that,

Tϵ2f(y)Fρϵ*ηs(y),ρϵ*ηs(y)dy-Tϵ2f(y)Fρϵ*ηs(dy),ηs(dy)=Tϵ2f(y)Fρϵ*ηs(y)ρϵ(y-w)dy,ηs(dw)-Tϵ2f(w)Fρϵ*ηs(w),ηs(dw)=Tϵ2f(y)Fρϵ*ηs(y)-Tϵ2f(w)Fρϵ*ηs(w)ρϵ(w-y)dy,ηs(dw).

Let us denote the integral against dy in the last expression by I, that is

I:=Tϵ2fyFρϵ*ηsy-Tϵ2fwFρϵ*ηswρϵw-ydy,

and note that |I| is bounded by

Fρϵ*ηs(y)-Fρϵ*ηs(w)Tϵ2f(y)+Fρϵ*ηs(w)Tϵ2f(y)-Tϵ2f(w)ρϵ(w-y)dyfFρϵ*ηsy-Fρϵ*ηswρϵw-ydy+Cϵf'Fρϵ*ηsw, (7.27)

where we have used that

Tϵ2fy-Tϵ2fwpϵ2w,ydyf'y-wpϵ2w,ydy.

Now recall that F is a polynomial of degree n, and so there exist real numbers bk such that F(a)-F(b)=(a-b)k=1n-1bkakbn-1-k and so

|F(ρϵ*ηs(y))F(ρϵ*ηs(w))||ρϵ*ηs(y)ρϵ*ηs(w)|k=1n1|bk|(ρϵ*ηs(y)n1+ρϵ*ηs(w)n1).

Combining the above, we have reduced the problem to showing that for any k0,

limϵ0E[0tρϵ*ηs(y)-ρϵ*ηs(w)ρϵ*ηs(y)k+ρϵ*ηs(w)kpϵ2(w,y)dy,ηs(dw)ds]=0. (7.28)

We are going to use the estimate (7.24). First note that by Lemma 7.2 the contribution to (7.28) from the integral over the time interval [0,δ] is 𝒪(δ). We focus instead on the interval (δ,t].

The first term in (7.24) gives

δtE[y-ws+ϵ2p2s+ϵ2(y,z)+p2s+ϵ2(w,z),η0(dz)(ρϵ*ηs(y)k+ρϵ*ηs(w)k)pϵ2(w,y)dy,ηs(dw)]ds.

We “borrow” from the exponential term to see that y-wpϵ2(w,y)Cϵp2ϵ2(w,y) and so bound this by

Cδtϵs+ϵ2Ep2s+ϵ2(y,z)+p2s+ϵ2(w,z),η0(dz)ρϵ*ηs(y)k+ρϵ*ηs(w)kp2ϵ2(w,y)dy,ηs(dw)ds. (7.29)

The four terms in the product are taken separately, according to the combinations of w and y appearing. First,

δtϵs+ϵ2Ep2s+ϵ2(y,z),η0(dz)ρϵ*ηs(y)kp2ϵ2(w,y)dy,ηs(dw)ds

can be rewritten as

δtϵs+ϵ2Ep2s+ϵ2(y,z),η0(dz)ρϵ*ηs(y)kpϵ2(x,y)ρϵ*ηs(x)dydxdsδtϵs+ϵ2Ep2s+ϵ2(y,z),η0(dz)ρϵ*ηs(y)k+1+ρϵ*ηs(x)k+1pϵ2(x,y)dydxds,

and using Lemma 7.2 and the tower property, and integrating with respect to s, under our assumptions on η0, this is bounded by

Cϵδt1s+ϵ2Ep2s+ϵ2(y,z),η0(dz)ρϵ*η0(y)+ρϵ*η0(y)k+1+ρϵ*η0(x)+ρϵ*η0(x)k+1pϵ2(x,y)dydxdsC'ϵδt1s+ϵ2d/2ds.

For fixed δ, this bound tends to zero as ϵ0. The term involving p2s+ϵ2(w,z),η0(dz)ρϵ*ηs(w)k is handled similarly.

On the other hand

p2s+ϵ2(y,z),η0(dz)ρϵ*ηs(w)kp2ϵ2(w,y)dy,ηs(dw)Cs+ϵ2d/21,η0ρϵ*ηs(w)k,ηs(dw),

and since 1,η0 is uniformly bounded we apply Corollary 7.3 to obtain a bound on the contribution to (7.29) from this term of the same form as the others.

Now consider the contribution to the left hand side of (7.28) from the second term in (7.24). Since F is a polynomial, it is bounded by a sum of terms of the form

δt0s-δy-ws-r+ϵ2p2s-r+ϵ2(y,z)+p2s-r+ϵ2(w,z)ρϵ*ηr(z)j,ηr(dz)ρϵ*ηs(y)kpϵ2(y,w)dy,ηs(dw)drdsCϵδt0s-δ1s-r+ϵ2p2s-r+ϵ2(y,z)+p2s-r+ϵ2(w,z)ρϵ*ηr(z)j,ηr(dz)ρϵ*ηs(y)kp2ϵ2(y,w)dy,ηs(dw)drds,

where as usual we have “borrowed” from the exponential term in pϵ2(y,w) to replace y-w by a constant times ϵ.

Once again, our approach is to rearrange terms so that we can apply Lemma 7.2 or Corollary 7.3 to obtain a bound on the contribution to (7.28) from these terms of the form Cϵ (where C may depend on δ but not ϵ).

For example, using the Chapman-Kolmogorov equation to rewrite

p2s-r+ϵ2(y,z)ρϵ*ηr(z)j,ηr(dz)ρϵ*ηs(y)kp2ϵ2(y,w),ηs(dw)dy

as

p2s-r+ϵ2(y,z)ρϵ*ηr(z)j,ηr(dz)ρϵ*ηs(y)kpϵ2y,xρϵ*ηsxdxdy,

and using Lemma 7.2 and the tower property, we are led to control terms of the form

Ep2s-r+ϵ2(y,z)ρϵ*ηr(z)j,ηr(dz)ρϵ*ηr(y)k+1dy.

This, in turn, is at most

Eρϵ*ηr(z)j+k+1,ηr(dz)+Ep2(s-r)(y,x)ρϵ*ηr(x)ρϵ*ηr(y)j+k+1dydxEρϵ*ηr(z)j+k+1,ηr(dz)+2Eρϵ*ηr(x)j+k+2dx,

which is bounded by Lemma 7.2.

We now turn to the contribution arising from the martingale terms in (7.24):

EδtMs(y)+Ms(w)ρϵ*ηs(y)k+ρϵ*ηs(w)kpϵ2(w,y)dy,ηs(dw)ds.

Since $\psi^{\epsilon,x}_{t-s}(y)=\mathbb{E}[p_{T(t-s)+\epsilon^2}(x,y)]$, rearranging (7.4) we see that we can pull a convolution with $p_{\epsilon^2/2}$ out of our expressions for $M_s(y)$ and $M_s(w)$ and so all the manipulations that we used to control terms above will still be valid. To deal with the two terms in the product involving $M_s(y)$, we write the first as $\int M_s(y)\,\rho_\epsilon*\eta_s(y)^{k+1}dy$ and then use Hölder’s inequality, Lemma 7.2, and the fact that $\mathbb{E}[M_s(y)^2]$ is $\mathcal{O}(\theta/(N\epsilon^d))$ to see that the contribution from this term tends to zero in the limit. For the second, we use the idea of the proof of Corollary 7.3 to reduce to a form to which we can apply Hölder’s inequality.

Control of the terms arising from approximating ψϵ,x by the heat kernel follows in an entirely analogous way.

Combining the above, we see that given δ>0,

limϵ0E0tTϵ2f(y)Fρϵ*ηs(y),ηs(y)ds-0tTϵ2f(y)Fρϵ*ηs(y),ρϵ*ηs(y)dyds<Cδ,

where the constant C is independent of δ. Since δ was arbitrary, the proof is complete. □

Since $f$ is smooth, $T_{\epsilon^2}f-f$ is $\mathcal{O}(\epsilon)$, and so, with an application of the triangle inequality,

limϵ0E0tTϵ2f(y)Fρϵ*ηs(y),ηs(y)ds-0tf(y)Fρϵ*ηs(y),ρϵ*ηs(y)dyds<Cδ,

now follows immediately. Thus to complete the characterisation of the limit, it remains to show that if we take a convergent subsequence ρϵ*ηtN(dx)t0 converging to a limit point (φ(t,x)dx)t0, then

0tfxρϵ*ηsxFρϵ*ηsNxdxds0tfxφs,xFφs,xdxds.

Since F is a polynomial, we consider powers of ρϵ*η. To illustrate the approach, we first prove that

0tf(x)ρϵ*ηsN(x)2dxds0tf(x)φ(s,x)2dxds. (7.30)

The convergence of higher powers will follow in an entirely analogous manner, but with more complex expressions.

The approach is standard. We fix τ>0 and, in keeping with our notation ρϵ, in this subsection, use ρτ to denote the symmetric Gaussian kernel with variance parameter τ2. Our strategy is to show that, up to an error that tends to zero as τ0,

0tf(z)ρϵ*ηs(z)2dzds0tfzρϵ*ηszρτz-yρϵ*ηsydzdyds. (7.31)

Analogously, also up to an error that vanishes as τ0,

0tf(z)φ(s,z)2dzds0tfzφs,zρτz-yφs,ydzdyds. (7.32)

On the other hand, weak convergence of $\rho_\epsilon*\eta$ (plus continuity of the mapping $(z,y)\mapsto f(z)\rho_\tau(z-y)$) gives that

0tf(z)(ρϵ*ηs)(z)ρτ(zy)(ρϵ*ηs)(y)dzdyds0tf(z)φ(s,z)ρτ(zy)φ(s,y)dzdyds. (7.33)

Since τ is arbitrary, the convergence (7.30) will follow.

Proposition 7.9

Under the conditions of Theorem 2.20, we have that along any convergent subsequence,

limsupϵ0E0tf(y)ρϵ*ηs(y)2dyds-0tf(z)ρϵ*ηs(z)ρτ(z-y)ρϵ*ηs(y)dzdydsCτ, (7.34)

where C is independent of τ.

Proof

First note,

0tE[|f(y),ρϵ*ηs(y)2dy-f(y)ρϵ*ηs(z)ρτ(z-y)ρϵ*ηs(y)dzdy]dsf0tEρϵ*ηs(y)-ρϵ*ηs(z)ρτ(z-y)dzρϵ*ηs(y)dsdy. (7.35)

Now proceed exactly as in the proof of Proposition 7.8. The only distinction is that pϵ2(y,z)-pϵ2(w,z) is replaced by pτ(y,z)-pτ(w,z) and the estimate y-wpτ2(y,w)Cτp2τ2(y,w) replaces the corresponding statement with ϵ2 replacing τ2 in our previous argument. □

The extension of Proposition 7.9 to higher moments is straightforward, if notationally messy. For fixed (but arbitrary) τ, one shows that

limsupϵ0E0tf(y)ρϵ*ηsN(y)kdyds-0tfy1ρϵ*ηsNy1i=2kρτyi-yi-1ρϵ*ηsNyidykdy1dsCτ,

as well as a corresponding statement with ρϵ*ηsN(x) replaced by φ(s,x) and then use weak convergence to see that, up to an error of order τ, any limit point of the sequence ρϵ*ηN(x)dx solves (the weak form of) equation (2.16). Since τ was arbitrary, the proof of Theorem 2.20 is complete.

8. Proofs of results for the lookdown process and ancestral lineages

Now we turn to results about the lookdown process, first establishing the basic connection between the population process ηN and the lookdown process ξN, Proposition 5.3, and then in the next section, convergence of the lookdown process itself.

Proof [of Proposition 5.3]

This proposition is the content of the Markov Mapping Theorem, reproduced from Etheridge and Kurtz [2019] as Theorem A.1, applied to our situation. The function $\gamma$ of that theorem is what we have called $\kappa$ above, and the kernel $\alpha$ of that theorem is the transition function that assigns levels uniformly on $[0,N]$ (in the first case) or as a Poisson process with Lebesgue intensity (in the limiting case). We need a continuous $\psi^N(\xi)\ge 1$ such that $|A^Nf(\xi)|\le c_f\psi^N(\xi)$ for all $f$ in the domain of $A^N$ (and similarly a function $\psi$ for $A$). We also need that applying the lookdown generator to a function and averaging over levels is equivalent to applying the population process generator to the function whose dependence on levels has been averaged out, a condition which we precisely state, and verify, in Lemmas A.2 and A.3 of the Appendix.

For finite N, taking f(ξ) of the form (5.6), we can use ψN(ξ)=C(1+u|F(x,η)|),ξ(dx,du) for an appropriate constant C. For the scaling limit, recall that the test functions f are of the form f(ξ)=(x,u)ξg(x,u) with g(x,u)=1 for uu0, and consulting (5.11), we see that most terms in Af(ξ) can be bounded as above by constant multiples of 1,η. However, the term involving F is, as usual, more troublesome. Since 0f(ξ)/g(x,u)1 for any x,uξ,

|f(ξ)(x,u)ξF(x,η)uug(x,u)g(x,u)|ug(x,u)ξF(x,η)u1uu0ugeu0(x,u)ξF(x,η)ue-u.

The first line would be just what we want, except that ψ(ξ) cannot depend on f, and hence neither on u0. So, the second line provides us with the required bound: we absorb ugeu0 into cf and take ψ(ξ)=1+1+F(x,η)ue-u,ξ(dx,du). □

8.1. Tightness of the Lookdown Process

Now we turn to the main theorem on convergence of the lookdown process, Theorem 5.4, whose proof follows a similar pattern to that of convergence for the population processes in Section 6.2.

We first give a description of the lookdown process ξN in terms of the lines of descent introduced in Section 5.2. Each line of descent gives birth to lines at higher levels at rate 2(N-u)cθ(x,η), and each such new line chooses a level uniformly from [u,N], a spatial location y from the kernel

qm(x,dy,η)=r(y,η)q(x,dy)/Rdr(y,η)q(x,dy), (8.1)

and the two lines swap spatial locations with probability 1/2; the level of each line of descent evolves according to equation (5.21).

It is evident from the description of the process (or, by differentiating in Definition 5.1) that

f,ξtN=f,ξ0N+Mtf+0tcθx,ηsNuNRdfy,u1+fx,u1+f(y,u)-f(x,u)qmx,dy,ηsNdu1+cθx,ηsNu2-bθx,ηsNudduf(x,u),ξsN(dx,du)ds, (8.2)

where Mf is a martingale with angle bracket process

Mft=0tcθ(x,ηsN)uNd[f(y,u1)2+(f(x,u1)+f(y,u)f(x,u))2]du1qm(x,dy,ηsN),ξsN(dx,du)ds. (8.3)

Remark 8.1

In addition to tightness of the measure-valued processes $\xi^N$, the bounds used in the proofs below also imply tightness of the number of lines of descent and the number of births below a fixed level, and of the motion of individual lines of descent. In other words, the limiting “line of descent” construction of Section 5.2 holds.

Proof [Proof of Theorem 5.4]

As in Section 6.2, the theorem will follow from tightness and characterization of the limit points. This time, the processes ξN take values in (Rd-×[0,)), the space of locally finite measures on space × levels. (They will in fact be point measures, including the limit, but that is a consequence of this theorem.) Again, tightness follows from a compact containment condition, tightness of one-dimensional distributions, and an application of Ethier and Kurtz [1986] Theorem 3.9.1.

Lines of descent can escape to infinite level in finite time, and so we endow $\mathcal{M}(\overline{\mathbb{R}^d}\times[0,\infty))$ with the vague topology “in the level coordinate”, induced by test functions on $\overline{\mathbb{R}^d}\times[0,\infty)$ of the form $g(x)h(u)$, where $g\in C_b(\overline{\mathbb{R}^d})$ is bounded and continuous and $h\in C_c([0,\infty))$ is compactly supported (following, e.g., Etheridge and Kurtz [2019], Condition 2.1). In several places below we require a dense subset of $C_b(\mathcal{M}(\overline{\mathbb{R}^d}\times[0,\infty)))$, the bounded, continuous functions on $\mathcal{M}(\overline{\mathbb{R}^d}\times[0,\infty))$. The functions $\xi\mapsto\langle f,\xi\rangle$ do not form a dense subset of $C_b(\mathcal{M}(\overline{\mathbb{R}^d}\times[0,\infty)))$, but they do separate points and vanish nowhere, i.e., for any $\xi_1$ and $\xi_2$ there is an $f$ with $\langle f,\xi_1\rangle\ne\langle f,\xi_2\rangle$, and a $g$ such that $\langle g,\xi_1\rangle\ne 0$. Therefore, by the Stone-Weierstrass theorem, the algebra they generate is dense in $C_b(\mathcal{M}(\overline{\mathbb{R}^d}\times[0,\infty)))$ (with respect to uniform convergence on compact subsets). Topologized in this way, the space $\mathcal{M}(\overline{\mathbb{R}^d}\times[0,\infty))$ is completely metrizable, and we may choose a countable set of nonnegative $f_k$, each supported on $\mathbb{R}^d\times[0,u_k]$ for some $u_k<\infty$, such that a subset $K\subset\mathcal{M}(\overline{\mathbb{R}^d}\times[0,\infty))$ is relatively compact if and only if $\sup_{\xi\in K}\langle f_k,\xi\rangle<\infty$ for each $k$. (To see this, use Theorem A.2.3 of Kallenberg [1997] and the first argument made in the proof of Lemma B.3.) Below, Lemma 8.4 proves exactly this, and therefore compact containment. Here we have compactified $\mathbb{R}^d$ for convenience (since it turned out to be straightforward to show that mass does not escape to infinity in space); however, we need to use the vague topology “in the level direction” because levels may escape to infinity in finite time in the limit.

In order to apply Ethier and Kurtz [1986] Theorem 3.9.1 we require that {FξtNt0}N is tight as a sequence of real-valued càdlàg processes, for all F in a subset of Cb((Rd-×[0,))) that is dense with respect to uniform convergence on compact subsets. Lemma 8.5 shows that f,ξtNN is a tight sequence for any f:Rd-×[0,)R with compact support in the level direction. Since as above the algebra generated by the functions ξf,ξ is dense in Cb((Rd-×[0,))), it suffices to show that tightness for the processes f,ξtN extends to finite sums and products of these processes, which is shown in Lemma B.3. The fact that martingale properties are preserved under passage to the limit is straightforward, and can be proved in a way analogous to Lemma 6.6; we omit the proof. Finally, we must show that the limiting lookdown process ξ projects to the limiting process η, i.e., a solution of the martingale problem in Theorem 2.10. Let Nk be a sequence along which ξNk converges. By Theorem 2.10, there is a subsequence Nk(j) along which the projected population processes ηNk(j) converge, and the limit solves the martingale problem. Thus any limit point of ξN projects to a population process η solving the martingale problem of Theorem 2.10. □

What we need for compact containment will come from the following Lemma. The generality is unimportant – for concreteness one may take h(u)=e-u.

Lemma 8.2

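Let $h$ be a positive, continuous, nonincreasing, differentiable function on $[0,\infty)$ such that $\int_0^\infty\int_u^\infty h(v)\,dv\,du$, $\int_0^\infty u^2h'(u)\,du$, and $\int_0^\infty h(u)^2\,du$ are all finite. Suppose that Assumptions 2.8 hold, and that $\theta/N\to\alpha$ and $\xi^N_0\to\xi_0$ weakly as $N\to\infty$, where each $\xi^N_0$ is conditionally uniform given $\eta^N_0$ in the sense of (5.12) and $\xi_0$ is conditionally Poisson given $\eta_0$ in the sense of (5.13). Then for any $T$ there exists a constant $K(T)$ such that for all $M>0$,

$$\limsup_{N\to\infty}\mathbb{P}\Big\{\sup_{0\le t\le T}\langle h,\xi^N_t\rangle>M\Big\}<\frac{K(T)}{M}.$$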
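As a concrete check, the choice $h(u)=e^{-u}$ suggested above satisfies all three integrability conditions of the lemma. The computation below is a worked example added for illustration; it uses only $h'(u)=-e^{-u}$ and the Gamma integral $\int_0^\infty u^2e^{-u}\,du=2$.

```latex
\begin{align*}
\int_0^\infty\!\!\int_u^\infty e^{-v}\,dv\,du &= \int_0^\infty e^{-u}\,du = 1,\\
\int_0^\infty u^2 h'(u)\,du &= -\int_0^\infty u^2 e^{-u}\,du = -2,\\
\int_0^\infty h(u)^2\,du &= \int_0^\infty e^{-2u}\,du = \tfrac{1}{2}.
\end{align*}
```

This $h$ is also positive, continuous, nonincreasing and differentiable on $[0,\infty)$, so it meets every hypothesis placed on $h$.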

We postpone the proof of this Lemma until we have shown how it yields compact containment. First, we show that this implies compact containment of the processes f,ξtN0tT for arbitrary compactly supported f.

Lemma 8.3

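Suppose $f\in C(\overline{\mathbb{R}^d}\times[0,\infty))$ and there is a $u_f$ such that if $u\ge u_f$ then $\sup_xf(x,u)=0$. Under the assumptions of Lemma 8.2, for any $T$ there exists a constant $K(f,T)$ such that for all $M>0$,

$$\limsup_{N\to\infty}\mathbb{P}\Big\{\sup_{0\le t\le T}\langle f,\xi^N_t\rangle>M\Big\}<\frac{K(f,T)}{M}.$$

Proof [of Lemma 8.3]

Let $h$ be as in Lemma 8.2, so there is a $c_f<\infty$ such that $f(x,u)\le c_fh(u)$ for all $x$ and $u$. Therefore, $\langle f,\xi\rangle\le c_f\langle h,\xi\rangle$, and so by Lemma 8.2,

$$\limsup_{N\to\infty}\mathbb{P}\Big\{\sup_{0\le t\le T}\langle f,\xi^N_t\rangle>M\Big\}\le\limsup_{N\to\infty}\mathbb{P}\Big\{\sup_{0\le t\le T}\langle h,\xi^N_t\rangle>M/c_f\Big\}<\frac{K(T)c_f}{M}.$$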

Lemma 8.4 (Compact containment for ξ)

Let f1,f2, be a sequence of functions each satisfying the conditions of Lemma 8.3. Under the assumptions of Lemma 8.2, for any T and δ>0 there exists a sequence C1,C2, of finite constants such that

limsupNPsup0tTfk,ξtN>Ckforsomek1<δ. (8.4)

In other words, the processes ξN stay in the set

{ξ(Rd-×[0,)):fk,ξCkforallk1},

for all 0tT with uniformly high probability, a set which (as discussed in the proof of Theorem 5.4) is relatively compact for an appropriate choice of fkk1.

Proof [of Lemma 8.4]

By a union bound,

{sup0tTfk,ξtN>Ckforsomek1}k1{sup0tTfk,ξtN>Ck},

so (8.4) follows by taking Ck=2k-1Kfk,T/δ and using Lemma 8.3. □

Finally, we prove the key lemma.

Proof [of Lemma 8.2]

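Applied to $f(x,u)=h(u)$, the martingale representation (8.2) is

$$\langle h,\xi^N_t\rangle=\langle h,\xi^N_0\rangle+M^h_t+\int_0^t\Big\langle 2c_\theta(x,\eta^N_s)\int_u^Nh(v)\,dv,\xi^N_s(dx,du)\Big\rangle\,ds+\int_0^t\Big\langle\big(c_\theta(x,\eta^N_s)u^2-b_\theta(x,\eta^N_s)u\big)h'(u),\xi^N_s(dx,du)\Big\rangle\,ds,$$

where $M^h_t$ is a martingale with angle bracket process

$$\langle M^h\rangle_t=\int_0^t\Big\langle 2c_\theta(x,\eta^N_s)\int_u^Nh(v)^2\,dv,\xi^N_s(dx,du)\Big\rangle\,ds.$$

Now, note that $0\le c_\theta(x,\eta^N_s)\le C_a<\infty$ and $|b_\theta(x,\eta^N_s)|\le C_b<\infty$, and we have assumed that $h'(u)\le 0$ (since $h$ is nonincreasing), so we may bound

$$\langle h,\xi^N_t\rangle\le\langle h,\xi^N_0\rangle+M^h_t+\int_0^t\Big\langle 2C_a\int_u^\infty h(v)\,dv+\big(C_au^2+C_bu\big)|h'(u)|,\xi^N_s(dx,du)\Big\rangle\,ds. \tag{8.5}$$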

Now, since ξtN is conditionally uniform given ηtN in the sense of (5.12), we know that for compactly supported f,Ef,ξtN=Ef~N,ηtN, where f~N(x)=0Nf(x,u)du. By our assumptions on h, we know that

02Cauh(v)dv+Cau2+Cbuh'(u)du<C

for some C<, and so (by dominated convergence)

Eh,ξtNEh,ξ0N+C0tE1,ηsNds,

which we know by Lemma 6.1 is bounded by C0eC1t for some other constants C0 and C1.

Now consider the maximum. By (8.5), using that the integrand is nonnegative,

sup0tTh,ξtNh,ξ0N+sup0tTMth+0T2Cauh(v)dv+Cau2+Cbuh'(u),ξsN(dx,du)ds.

Since x1+x for x0, the Burkholder-Davis-Gundy inequality tells us that there is a C' such that

Esup0tTMthC'1+EMhTC'1+0TE2cθx,ηsNuh(v)2dv,ξsN(dx,du)dsC'1+2Ca0h(v)2dv0TE1,ξsN(dx,du)dsC2eC1T,

for a constant C2 which is finite by our assumption that 0h(v)2dv<

Therefore,

Esup0tTh,ξtNEh,ξ0N+C2+C0/C1eC1T,

and so

Psup0tTh,ξtN>KEh,ξ0N+C2+C0/C1eC1TK.

Lemma 8.5

Let f be a bounded, continuous real-valued function on Rd×[0,) with uniformly bounded first and second derivatives for which there exists a u0 such that if u>u0 then f(x,u)=0. Then, the sequence of real-valued processes f,ξtNt0 for N1 is tight in 𝒟[0,)(R).

Proof [of Lemma 8.5]

Again, we use the Aldous-Rebolledo criterion. Tightness of f,ξt for a fixed t follows from Lemma 8.3, so we need only prove conditions analogous to (6.9) and (6.10) applied to the martingale representation of equations (8.2) and (8.3). Rewriting (8.2) with cθ=cθx,ηs,

f,ξt=f,ξ0+Mtf+0tcθuNfy,u1+fx,u1qm(x,dy,η)du1+cθ(N-u)0t(f(y,u)-f(x,u))qm(x,dy,η)+cθu2-bθudduf,ξsds.

The bounds analogous to (6.9) and (6.10) follow as in the proof of Lemma 6.3: for instance, observe that using that cθCa for some Ca, the predictable part of this semimartingale decomposition is bounded by

2Cafuf+(1-u/N)γBfθ+Cau2-bθudduf,ξs,

the last term of which is bounded by

Cauf2+supxbθx,ηsufdduf,

which can be bounded as we did for (6.9). □

8.2. Motion of ancestral lineages

In this section we prove Theorem 2.23. The argument follows directly from the discussion in Section 5.3.

Proof [of Theorem 2.23]

For brevity, in the proof we write γ(x) or γ for γ(x,η).

Here we have taken the high-density, deterministic limit (so, θ,N and θ/N0). We first proceed informally, as if the limiting process has a density φt(x) at location x and time t (which it may not), and follow this with an integration against test functions to make the argument rigorous. Let Y denote the spatial motion followed by a single line of descent. Above equation (5.19), we showed that Y is a diffusion with generator at time s

sYg(x)=γx,ηsr,ηsg()(x)-g(x)rx,ηs.

The diffusion is time-inhomogeneous if the density is not constant in time. Let φt(x) be the limiting density, which is a weak solution to (1.1), tφt=r*φtγ+φtF. Formally, the intensity of individuals at y at time t that are descended from individuals that were at x at time s (with s<t) is

φsxEs,xexpstF+γrYudu1Yt=ydy, (8.6)

where the subscript s,x in the expectation indicates that Ys=x. To see why this should be true, suppose that an ancestor at time s has level v. Conditional on its spatial motion Yusut, its level at time t will be vexp-st(F+γr)Yudu. This will be less than a given level λ if v<λexpst(F+γr)Yudu. The intensity of levels at y that are descended from individuals at x can therefore be obtained as the limit as λ of 1/λ times the number of levels at x at time s with u<λexpst(F+γr)Yudu and for which the corresponding individual is at y at time t, which is precisely the quantity in (8.6).

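By our construction in Section 5.3, when we integrate (8.6) with respect to $x$ we recover $\varphi_t(y)\,dy$. Consider an individual sampled at location $y$ at time $t$, and write $p(t,s,y,x)$ for the probability density that their ancestor at time $s$ was at $x$. As a consequence of (8.6), still formally,

\[
p(t,s,y,x) = \frac{\varphi_s(x)}{\varphi_t(y)}\,\mathbb{E}_{s,x}\!\left[\exp\!\left(\int_s^t (F+\gamma\,\mathcal{B}r)(Y_u)\,du\right)\mathbb{1}_{Y_t=y}\right] \quad\text{for } s<t. \tag{8.7}
\]

To make (8.7) meaningful, we multiply by suitable test functions $f$ and $g$ and integrate:

\[
\iint f(y)\,\varphi_t(y)\,p(t,s,y,x)\,g(x)\,dy\,dx = \int g(x)\,\varphi_s(x)\,\mathbb{E}_{s,x}\!\left[\exp\!\left(\int_s^t (F+\gamma\,\mathcal{B}r)(Y_u)\,du\right)f(Y_t)\right]dx.
\]

Writing $\hat T_{t,s}$ for the time-inhomogeneous semigroup corresponding to the motion of ancestral lineages backwards in time (that is, $\hat T_{t,s}f(y)=\int p(t,s,y,x)\,f(x)\,dx$), we can write this as

\[
\int f(y)\,\varphi_t(y)\,\hat T_{t,s}g(y)\,dy = \int g(x)\,\varphi_s(x)\,\mathbb{E}_{s,x}\!\left[\exp\!\left(\int_s^t (F+\gamma\,\mathcal{B}r)(Y_u)\,du\right)f(Y_t)\right]dx. \tag{8.8}
\]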

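Next, we will differentiate this equation with respect to $t$. There are two terms in the product on the left-hand side that depend on $t$, so if we use that $\partial_t\varphi_t = r\,\mathcal{B}^*(\varphi_t\gamma)+\varphi_t F$ (in a weak sense), and write $\mathcal{L}_u$ for the generator of $\hat T_{t,s}$ at time $t=u$ so that $\partial_t\hat T_{t,s}g(y)\big|_{t=s}=\mathcal{L}_s g(y)$, then

\[
\frac{d}{dt}\int f(y)\,\varphi_t(y)\,\hat T_{t,s}g(y)\,dy\,\Big|_{t=s} = \int f(y)\Big[\varphi_s(y)\,\mathcal{L}_s g(y) + \big(r(y)\,\mathcal{B}^*(\gamma\varphi_s)(y) + \varphi_s(y)F(y)\big)g(y)\Big]dy.
\]

As for the right-hand side, since $Y_s=x$ under $\mathbb{E}_{s,x}$,

\[
\frac{d}{dt}\,\mathbb{E}_{s,x}\!\left[\exp\!\left(\int_s^t (F+\gamma\,\mathcal{B}r)(Y_u)\,du\right)f(Y_t)\right]\Big|_{t=s} = \big[F(x)+\gamma(x)\,\mathcal{B}r(x)\big]f(x) + \mathcal{L}^Y_s f(x).
\]

Therefore, the derivative of (8.8) (with respect to $t$, evaluated at $t=s$) is

\[
\begin{aligned}
\int f(y)\Big[\varphi_s(y)\,\mathcal{L}_s g(y) &+ \big(r(y)\,\mathcal{B}^*(\gamma\varphi_s)(y)+\varphi_s(y)F(y)\big)g(y)\Big]dy \\
&= \int g(x)\,\varphi_s(x)\Big[\mathcal{L}^Y_s f(x) + \big(F(x)+\gamma(x)\,\mathcal{B}r(x)\big)f(x)\Big]dx \\
&= \int f(x)\Big[(\mathcal{L}^Y_s)^*(\varphi_s g)(x) + \big(F(x)+\gamma(x)\,\mathcal{B}r(x)\big)\varphi_s(x)\,g(x)\Big]dx,
\end{aligned}
\]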

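where $(\mathcal{L}^Y_s)^*$ is the adjoint of $\mathcal{L}^Y_s$. Since $f$ was arbitrary,

\[
\mathcal{L}_s g = \frac{1}{\varphi_s}\Big[(\mathcal{L}^Y_s)^*(\varphi_s g) + \gamma\varphi_s\,g\,\mathcal{B}r - r\,g\,\mathcal{B}^*(\gamma\varphi_s)\Big].
\]

(Note that the $\varphi_s F g$ terms have cancelled.) Since the adjoint of $\mathcal{L}^Y_s$ is

\[
(\mathcal{L}^Y_s)^* f = r\,\mathcal{B}^*(\gamma f) - \gamma f\,\mathcal{B}r,
\]

we can rewrite the generator of a lineage as

\[
\mathcal{L}_s g = \frac{r}{\varphi_s}\Big[\mathcal{B}^*(\gamma\varphi_s g) - g\,\mathcal{B}^*(\gamma\varphi_s)\Big].
\]

This is equation (2.17).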

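To simplify to equation (2.18), first define $\mathcal{D}f(x)=\sum_{ij}C_{ij}\,\partial_{ij}f(x)$, and so the adjoint of $\mathcal{D}$ is

\[
\mathcal{D}^*f(x)=\sum_{ij}\partial_{ij}\big(C_{ij}f\big)(x).
\]

Note that $\mathcal{D}^*$ satisfies the following identity:

\[
\begin{aligned}
\mathcal{D}^*(fg) &= \sum_{ij}\Big[g\,\partial_{ij}(C_{ij}f) + 2f\,\partial_i(C_{ij})\,\partial_j g + 2C_{ij}\,\partial_i f\,\partial_j g + C_{ij}f\,\partial_{ij}g\Big] \\
&= g\,\mathcal{D}^*f + 2f\,c\cdot\nabla g + 2(C\nabla f)\cdot\nabla g + f\,\mathcal{D}g,
\end{aligned}
\]

where $c_j=\sum_i\partial_i C_{ij}$. So, using $\mathcal{B}^*f=\mathcal{D}^*f-\nabla\cdot(fb)$ and applying the identity with $f=\gamma\varphi_s$,

\[
\begin{aligned}
\mathcal{L}_s g &= \frac{r}{\varphi_s}\Big[\mathcal{D}^*(\gamma\varphi_s g) - \nabla\cdot(\gamma\varphi_s g\,b) - g\,\mathcal{D}^*(\gamma\varphi_s) + g\,\nabla\cdot(\gamma\varphi_s b)\Big] \\
&= \frac{r}{\varphi_s}\Big[\gamma\varphi_s\,\mathcal{D}g + 2\gamma\varphi_s\,c\cdot\nabla g + 2\big(C\nabla(\gamma\varphi_s)\big)\cdot\nabla g - \gamma\varphi_s\,b\cdot\nabla g\Big] \\
&= r\gamma\Big[\mathcal{D}g + 2c\cdot\nabla g + 2\big(C\nabla\log(\gamma\varphi_s)\big)\cdot\nabla g - b\cdot\nabla g\Big],
\end{aligned}
\]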

which is equation (2.18). □
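As a sanity check on the product identity for $\mathcal{D}^*$ used above, the following sketch verifies its one-dimensional analogue, $(Cfg)'' = g\,(Cf)'' + 2f\,C'g' + 2C f'g' + f\,C g''$, exactly on polynomials. The particular polynomials `C`, `f` and `g` are arbitrary choices made for illustration only; they do not come from the paper.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical one-dimensional inputs: C plays the role of the diffusion
# coefficient, f and g are arbitrary smooth functions.
C = Polynomial([1.0, 0.5, 0.25])
f = Polynomial([2.0, -1.0, 0.0, 0.3])
g = Polynomial([0.5, 1.0, -0.7])

# Left-hand side: D*(f g) = (C f g)'' in one dimension.
lhs = (C * f * g).deriv(2)

# Right-hand side: g D*f + 2 f c g' + 2 (C f') g' + f D g, with c = C'.
rhs = (g * (C * f).deriv(2)
       + 2.0 * f * C.deriv() * g.deriv()
       + 2.0 * C * f.deriv() * g.deriv()
       + f * C * g.deriv(2))

# Largest coefficient of the difference; zero up to rounding error.
max_residual = float(np.max(np.abs((lhs - rhs).coef)))
print(max_residual)
```

Because polynomial differentiation is exact, the residual is pure floating-point rounding, so this checks the algebra rather than a numerical approximation.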

Proof [of Corollary 2.26]

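For the moment, we will write $r(x)$ for $r(x,\eta)$ and $\gamma(x)$ for $\gamma(x,\eta)$. First note that since in this case the semigroup does not depend on time, we can write $\mathcal{L}=\mathcal{L}_s$, and

\[
\mathcal{L}f = \sigma^2 r\gamma\Big(\Delta f + \nabla\big(2\log(\gamma\varphi) - h/\sigma^2\big)\cdot\nabla f\Big).
\]

Now, observe that

\[
\int_{\mathbb{R}^d} e^{H(x)} f(x)\,\big(\Delta + \nabla H(x)\cdot\nabla\big)g(x)\,dx = -\int_{\mathbb{R}^d} e^{H(x)}\,\nabla f(x)\cdot\nabla g(x)\,dx,
\]

so that by choosing $H(x)=2\log(\gamma(x)\varphi(x))-h(x)/\sigma^2$ and

\[
\pi(x) = \frac{e^{H(x)}}{\sigma^2 r(x)\gamma(x)} = \frac{\gamma(x)\,\varphi(x)^2\,e^{-h(x)/\sigma^2}}{\sigma^2 r(x)},
\]

we have that

\[
\int_{\mathbb{R}^d}\pi(x)\,f(x)\,\mathcal{L}g(x)\,dx = -\int_{\mathbb{R}^d} e^{H(x)}\,\nabla f(x)\cdot\nabla g(x)\,dx.
\]

Since this is symmetric in $f$ and $g$, the process $Y$ is reversible with respect to $\pi$; the constant factor of $\sigma^2$ does not affect the result. □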
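The integration-by-parts identity behind this reversibility argument can be checked numerically in one dimension. The sketch below is only an illustration under assumed inputs: the functions `H`, `f` and `g` are hypothetical smooth, rapidly decaying choices (not the $H$ of the corollary), derivatives are taken by finite differences, and integrals are approximated by Riemann sums on a wide grid.

```python
import numpy as np

def dirichlet_form_sides(H, f, g, x):
    """Return (lhs, rhs) of
         int e^H f (g'' + H' g') dx  =  - int e^H f' g' dx
    approximated on the uniform grid x."""
    dx = x[1] - x[0]
    Hx, fx, gx = H(x), f(x), g(x)
    dH = np.gradient(Hx, dx)
    df = np.gradient(fx, dx)
    dg = np.gradient(gx, dx)
    d2g = np.gradient(dg, dx)
    weight = np.exp(Hx)
    lhs = np.sum(weight * fx * (d2g + dH * dg)) * dx
    rhs = -np.sum(weight * df * dg) * dx
    return lhs, rhs

# Hypothetical test inputs; e^H decays fast enough that boundary terms vanish.
x = np.linspace(-10.0, 10.0, 20001)
H = lambda z: -0.5 * z**2 + np.sin(z)
f = lambda z: (1.0 + z) * np.exp(-z**2 / 4)
g = lambda z: np.cos(2.0 * z) * np.exp(-z**2 / 8)

lhs_fg, rhs_fg = dirichlet_form_sides(H, f, g, x)
lhs_gf, rhs_gf = dirichlet_form_sides(H, g, f, x)
print(lhs_fg, rhs_fg, lhs_gf)
```

The two sides should agree up to discretisation error, and swapping the roles of `f` and `g` should give the same value, which is the symmetry that the proof uses.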

Acknowledgements:

Thanks go to Gilia Patterson for identifying the “clumping” phenomenon, and to Marcin Bownick and David Levin for useful discussions. AME thanks everyone in MAPS at Université Paris Cité for their hospitality during the period in which much of this research took place. AME and PLR also thank the Kavli Institute for Theoretical Physics for their hospitality and birdwatching opportunities. PLR was supported by the NIH NHGRI (grant #HG011395), IL by the ANID/Doctorado en el extranjero doctoral scholarship, grant #2018–72190055, and TTHL by the EPSRC Centre for Doctoral Training in Mathematics of Random Systems: Analysis, Modelling and Simulation (EP/S023925/1), the Deutsche Forschungsgemeinschaft under Germany’s Excellence Strategy, EXC-2047/1–390685813, the Rhodes Trust and St. John’s College, Oxford.

A. Markov Mapping Theorem

The following appears as Theorem A.2 in Etheridge and Kurtz [2019], specialized slightly here to the case that the processes are càdlàg and have no fixed points of discontinuity. For an $S_0$-valued, measurable process $Y$, $\hat{\mathcal{F}}^Y_t$ denotes the completion of the $\sigma$-algebra generated by $Y(0)$ and $\big\{\int_0^r h(Y(s))\,ds : r\le t,\ h\in B(S_0)\big\}$. Also, let $D_S[0,\infty)$ denote the space of càdlàg, $S$-valued functions with the Skorohod topology, and $M_S[0,\infty)$ the space of Borel measurable functions from $[0,\infty)$ to $S$, topologized by convergence in Lebesgue measure. For other definitions see Etheridge and Kurtz [2019].

Theorem A.1 (Markov Mapping Theorem)

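Let $(S,d)$ and $(S_0,d_0)$ be complete, separable metric spaces. Let $A\subset C_b(S)\times C(S)$ and $\psi\in C(S)$, $\psi\ge1$. Suppose that for each $f\in\mathcal{D}(A)$ there exists $c_f$ such that

\[
|Af(x)|\le c_f\,\psi(x),\qquad x\in S,
\]

and define $A_0f(x)=Af(x)/\psi(x)$.

Suppose that $A_0$ is a countably determined pre-generator, and suppose that $\mathcal{D}(A)=\mathcal{D}(A_0)$ is closed under multiplication and is separating. Let $\gamma:S\to S_0$ be Borel measurable, and let $\alpha$ be a transition function from $S_0$ into $S$ (that is, $y\in S_0\mapsto\alpha(y,\cdot)\in\mathcal{P}(S)$ is Borel measurable) satisfying $\int h\circ\gamma(x)\,\alpha(y,dx)=h(y)$ for $y\in S_0$ and $h\in B(S_0)$, that is, $\alpha\big(y,\gamma^{-1}(y)\big)=1$. Assume that $\tilde\psi(y)\equiv\int_S\psi(z)\,\alpha(y,dz)<\infty$ for each $y\in S_0$ and define

\[
C=\left\{\left(\int_S f(z)\,\alpha(\cdot,dz),\ \int_S Af(z)\,\alpha(\cdot,dz)\right): f\in\mathcal{D}(A)\right\}. \tag{A.1}
\]

Let $\mu_0\in\mathcal{P}(S_0)$ and define $\nu_0=\int\alpha(y,\cdot)\,\mu_0(dy)$.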

  1. If $\tilde Y$ satisfies $\int_0^t\mathbb{E}[\tilde\psi(\tilde Y(s))]\,ds<\infty$ for all $t\ge0$ and $\tilde Y$ is a solution of the martingale problem for $(C,\mu_0)$, then there exists a solution $X$ of the martingale problem for $(A,\nu_0)$ such that $\tilde Y$ has the same distribution on $M_{S_0}[0,\infty)$ as $Y=\gamma\circ X$. If $Y$ and $\tilde Y$ are càdlàg, then $Y$ and $\tilde Y$ have the same distribution on $D_{S_0}[0,\infty)$.

  2. For $t\ge0$,
    \[ P\big\{X(t)\in\Gamma \,\big|\, \hat{\mathcal{F}}^Y_t\big\}=\alpha(Y(t),\Gamma),\qquad\text{for }\Gamma\in\mathcal{B}(S). \]

  3. If, in addition, uniqueness holds for the martingale problem for $(A,\nu_0)$, then uniqueness holds for the $M_{S_0}[0,\infty)$-martingale problem for $(C,\mu_0)$. If $\tilde Y$ has sample paths in $D_{S_0}[0,\infty)$ then uniqueness holds for the $D_{S_0}[0,\infty)$-martingale problem for $(C,\mu_0)$.

  4. If uniqueness holds for the martingale problem for $(A,\nu_0)$ then $Y$ is a Markov process.

In our application, we have taken $S$ to be the space of locally finite counting measures on $\mathbb{R}^d\times[0,N)$ or on $\mathbb{R}^d\times[0,\infty)$, and $S_0$ the space of finite measures on $\mathbb{R}^d$. Then, $A$ corresponds to the generator for the lookdown process (i.e., either $A^N$ or $A$), and $C$ corresponds to the generator for the spatial population process (i.e., either $\mathcal{P}^N$ or $\mathcal{P}$). The “$\gamma$” of the theorem is our spatial projection operator that we have called $\kappa^N$ or $\kappa$, and the “$\alpha$” of the theorem will be named $\Gamma^N$ or $\Gamma$ below. Finally, “$X$” of the theorem is our lookdown process, $\xi$, and “$Y$” is our spatial process, $\eta$.

A.1. Lookdown Generators

In this section we verify one of the conditions of the Markov Mapping Theorem, namely, that by “integrating out levels” in the generator of the lookdown process we obtain the generator of the projected process. In the notation of the theorem, we are verifying that $C$ defined in (A.1) is in fact $\mathcal{P}^N$ (if defined with $A^N$) or $\mathcal{P}$ (if defined with $A$). We will work with test functions of the form

\[
f(\xi)=\prod_{(x,u)\in\xi}g(x,u)=\exp\big(\langle\log g,\xi\rangle\big), \tag{A.2}
\]

where $0\le g\le1$ and $g(x,u)=1$ for all $u\ge u_g$ for some $u_g<\infty$. Furthermore, recall that $\kappa^N(\xi)(\cdot)=\xi(\cdot\times[0,N))/N$ is the “spatial projection operator”, and define the transition function $\Gamma^N:\mathcal{M}_F(\mathbb{R}^d)\to\mathcal{M}(\mathbb{R}^d\times[0,N))$ so that for $\eta\in\mathcal{M}_F(\mathbb{R}^d)$, if $\hat g_N(x)=\int_0^N g(x,u)\,du/N$, then

\[
F^N_g(\eta):=\int f(\xi)\,\Gamma^N(\eta,d\xi)=\exp\!\left(N\Big\langle\log\Big(\frac1N\int_0^N g(x,u)\,du\Big),\eta(dx)\Big\rangle\right)=\exp\!\big(N\langle\log\hat g_N(x),\eta(dx)\rangle\big),
\]

i.e., $\Gamma^N$ assigns independent uniform labels on $[0,N]$ to each of the points in $\eta$. It follows from Lemma 6.5 that for test functions of this form the generator of $\eta^N_t$ is

\[
\mathcal{P}^NF^N_g(\eta)=F^N_g(\eta)\,N\theta\left\langle\gamma(x,\eta)\int r(z,\eta)\big(\hat g_N(z)-1\big)\,q_\theta(x,dz)+\mu_\theta(x,\eta)\Big(\frac{1}{\hat g_N(x)}-1\Big),\ \eta(dx)\right\rangle. \tag{A.3}
\]

(Note that $f$ here differs from the $f$ used in Lemma 6.5 so as to agree with standard usage in the literature on lookdown processes.) The generator of $\xi^N_t$ is $A^N$, defined in equation (5.10).

Lemma A.2

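For all finite counting measures $\eta$ on $\mathbb{R}^d$, if $f$ is of the form (A.2), then

\[
\int A^Nf(\xi)\,\Gamma^N(\eta,d\xi)=\mathcal{P}^NF^N_g(\eta). \tag{A.4}
\]

For the limiting process, recall that $\kappa(\xi)(\cdot)=\lim_{u\to\infty}\xi(\cdot\times[0,u))/u$ is the “spatial projection operator”, and define the probability kernel $\Gamma:\mathcal{M}_F(\mathbb{R}^d)\to\mathcal{M}(\mathbb{R}^d\times[0,\infty))$ so that for $\eta\in\mathcal{M}_F(\mathbb{R}^d)$, defining $\tilde g(x)=\int_0^\infty(g(x,u)-1)\,du$,

\[
F_g(\eta):=\int f(\xi)\,\Gamma(\eta,d\xi)=\exp\!\left(\Big\langle\int_0^\infty(g(x,u)-1)\,du,\ \eta(dx)\Big\rangle\right)=e^{\langle\tilde g(x),\eta(dx)\rangle},
\]

i.e., $\Gamma(\eta,\cdot)$ is the distribution of a conditionally Poisson process with intensity a product of $\eta$ and Lebesgue measure. It again follows from Lemma 6.5 that for test functions of this form the generator of $\eta_t$ is

\[
\mathcal{P}F_g(\eta)=F_g(\eta)\Big\langle\gamma(x,\eta)\,\mathcal{B}(\tilde g\,r)(x)+F(x,\eta)\,\tilde g(x)+\alpha\gamma(x,\eta)\,r(x,\eta)\,\tilde g^2(x),\ \eta(dx)\Big\rangle. \tag{A.5}
\]

The generator of $\xi_t$ is $A$, defined in equation (5.11).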

Lemma A.3

For all $\eta\in\mathcal{M}_F(\mathbb{R}^d)$, if $f$ is of the form (A.2), then

\[
\int Af(\xi)\,\Gamma(\eta,d\xi)=\mathcal{P}F_g(\eta). \tag{A.6}
\]

Proof [of Lemma A.2]

First, break the generator AN into three parts,

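\[
\begin{aligned}
A^N_1f(\xi)&=f(\xi)\sum_{(x,u)\in\xi}2c_\theta(x,\eta)\int_u^N\frac12\,\frac{g(x,v_1)}{g(x,u)}\int_{\mathbb{R}^d}\big(g(y,u)-g(x,u)\big)\,q^m_\theta(x,dy,\eta)\,dv_1,\\
A^N_2f(\xi)&=f(\xi)\sum_{(x,u)\in\xi}2c_\theta(x,\eta)\int_u^N\int_{\mathbb{R}^d}\Big(\frac{g(y,v_1)+g(x,v_1)}{2}-1\Big)\,q^m_\theta(x,dy,\eta)\,dv_1,\\
A^N_3f(\xi)&=f(\xi)\sum_{(x,u)\in\xi}\big(c_\theta(x,\eta)u^2-b_\theta(x,\eta)u\big)\frac{\partial_ug(x,u)}{g(x,u)},
\end{aligned}
\]

where $q^m$ was defined in equation (8.1), so that

\[
A^Nf(\xi)=A^N_1f(\xi)+A^N_2f(\xi)+A^N_3f(\xi).
\]

We now integrate each piece against $\Gamma^N$. First note that by the product form of $f$, for a function $\ell(x,u)$,

\[
\int f(\xi)\sum_{(x,u)\in\xi}\frac{\ell(x,u)}{g(x,u)}\,\Gamma^N(\eta,d\xi)=F^N_g(\eta)\left\langle\frac{1}{N\hat g_N(x)}\int_0^N\ell(x,u)\,du,\ N\eta(dx)\right\rangle.
\]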

Therefore,

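\[
\begin{aligned}
\int A^N_1f(\xi)\,\Gamma^N(\eta,d\xi)&=F^N_g(\eta)\left\langle\frac{2c_\theta(x,\eta)}{\hat g_N(x)}\int_0^N\!\!\int_u^N\frac12\,g(x,v_1)\int_{\mathbb{R}^d}\big(g(y,u)-g(x,u)\big)\,q^m_\theta(x,dy,\eta)\,dv_1\,du,\ \eta(dx)\right\rangle\\
&=F^N_g(\eta)\left\langle\frac{c_\theta(x,\eta)}{\hat g_N(x)}\int_{\mathbb{R}^d}\int_0^N\!\!\int_u^Ng(x,v_1)\big(g(y,u)-g(x,u)\big)\,dv_1\,du\,q^m_\theta(x,dy,\eta),\ \eta(dx)\right\rangle.
\end{aligned}
\]

For the second generator, we have

\[
\begin{aligned}
\int A^N_2f(\xi)\,\Gamma^N(\eta,d\xi)&=F^N_g(\eta)\left\langle\frac{2c_\theta(x,\eta)}{\hat g_N(x)}\int_0^Ng(x,u)\int_u^N\!\!\int_{\mathbb{R}^d}\Big(\frac{g(y,v_1)+g(x,v_1)}{2}-1\Big)\,q^m_\theta(x,dy,\eta)\,dv_1\,du,\ \eta(dx)\right\rangle\\
&=F^N_g(\eta)\left\langle\frac{c_\theta(x,\eta)}{\hat g_N(x)}\int_{\mathbb{R}^d}\int_0^N\!\!\int_u^Ng(x,u)\big(g(y,v_1)+g(x,v_1)-2\big)\,dv_1\,du\,q^m_\theta(x,dy,\eta),\ \eta(dx)\right\rangle.
\end{aligned}
\]

For the third generator, integrating by parts in $u$ (the boundary terms vanish because $g(x,N)=1$), we have that

\[
\begin{aligned}
\int A^N_3f(\xi)\,\Gamma^N(\eta,d\xi)&=F^N_g(\eta)\left\langle\frac{1}{\hat g_N(x)}\int_0^N\big(c_\theta(x,\eta)u^2-b_\theta(x,\eta)u\big)\,\partial_ug(x,u)\,du,\ \eta(dx)\right\rangle\\
&=F^N_g(\eta)\left\langle\frac{1}{\hat g_N(x)}\int_0^N\big(b_\theta(x,\eta)-2c_\theta(x,\eta)u\big)\big(g(x,u)-1\big)\,du,\ \eta(dx)\right\rangle.
\end{aligned}
\]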

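Note that $\int_0^N\!\int_u^N\big[g(x,v_1)g(y,u)+g(x,u)g(y,v_1)\big]\,dv_1\,du=N^2\hat g_N(x)\hat g_N(y)$, since the two terms together cover the whole square $[0,N]^2$, and so

\[
\int_0^N\!\!\int_u^Ng(x,v_1)\big(g(y,u)-g(x,u)\big)\,dv_1\,du+\int_0^N\!\!\int_u^Ng(x,u)\big(g(y,v_1)+g(x,v_1)-2\big)\,dv_1\,du=N^2\hat g_N(x)\big(\hat g_N(y)-2\big)+2\int_0^Nu\,g(x,u)\,du.
\]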
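The double-integral identity above is easy to check numerically. The following sketch is an illustration only: the level profiles `gx` and `gy` (standing in for $g(x,\cdot)$ and $g(y,\cdot)$) and the value of `N` are hypothetical, and the integrals over the region $\{v_1>u\}$ are approximated on a midpoint grid, with cells on the diagonal given weight one half.

```python
import numpy as np

def both_sides(gx, gy, N, n=800):
    """Return (lhs, rhs) of the level identity for profiles gx = g(x, .), gy = g(y, .)."""
    h = N / n
    u = (np.arange(n) + 0.5) * h            # midpoints of the level cells
    a, b = gx(u), gy(u)
    # W[k, l] weights the cell (u_k, v_l): 1 above the diagonal, 1/2 on it, 0 below.
    W = np.triu(np.ones((n, n)), k=1) + 0.5 * np.eye(n)
    # Rows index u, columns index v_1.
    first = (b - a)[:, None] * a[None, :]            # g(x,v1) (g(y,u) - g(x,u))
    second = a[:, None] * (b + a - 2.0)[None, :]     # g(x,u) (g(y,v1) + g(x,v1) - 2)
    lhs = np.sum(W * (first + second)) * h * h
    g_hat_x = np.sum(a) * h / N
    g_hat_y = np.sum(b) * h / N
    rhs = N**2 * g_hat_x * (g_hat_y - 2.0) + 2.0 * np.sum(u * a) * h
    return lhs, rhs

# Hypothetical profiles with values in [0, 1].
gx = lambda u: 0.5 * np.exp(-u)
gy = lambda u: 0.2 + 0.1 * np.cos(u)
lhs, rhs = both_sides(gx, gy, N=2.0)
print(lhs, rhs)
```

With this choice of quadrature the discrete sums satisfy the identity exactly (up to rounding), which mirrors the symmetrisation argument used in the proof.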

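Combining the last equations, and using the fact that $Nc_\theta(x,\eta)-b_\theta(x,\eta)=\theta\mu_\theta(x,\eta)$, we have

\[
\begin{aligned}
\int\big(A^N_1+A^N_2+A^N_3\big)f(\xi)\,\Gamma^N(\eta,d\xi)
&=F^N_g(\eta)\left\langle c_\theta(x,\eta)N^2\int_{\mathbb{R}^d}\big(\hat g_N(y)-2\big)\,q^m(x,dy,\eta)+\frac{c_\theta(x,\eta)}{\hat g_N(x)}\int_0^N2u\,du+\frac{b_\theta(x,\eta)}{\hat g_N(x)}\int_0^N\big(g(x,u)-1\big)\,du,\ \eta(dx)\right\rangle\\
&=F^N_g(\eta)\left\langle c_\theta(x,\eta)N^2\int_{\mathbb{R}^d}\big(\hat g_N(y)-1\big)\,q^m(x,dy,\eta)+N^2c_\theta(x,\eta)\Big(\frac{1}{\hat g_N(x)}-1\Big)+Nb_\theta(x,\eta)\Big(1-\frac{1}{\hat g_N(x)}\Big),\ \eta(dx)\right\rangle\\
&=F^N_g(\eta)\,N\left\langle Nc_\theta(x,\eta)\int_{\mathbb{R}^d}\big(\hat g_N(y)-1\big)\,q^m(x,dy,\eta)+\theta\mu_\theta(x,\eta)\Big(\frac{1}{\hat g_N(x)}-1\Big),\ \eta(dx)\right\rangle.
\end{aligned}
\]

This matches equation (A.3), as desired, because $Nc_\theta(x,\eta)\,q^m(x,dy,\eta)=\theta\gamma(x,\eta)\,r(y,\eta)\,q_\theta(x,dy)$. □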

Before proving Lemma A.3, we recall an important equality for conditionally Poisson point processes (Kurtz and Rodrigues [2011] Lemma A.3).

Lemma A.4

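If $\xi=\sum_i\delta_{Z_i}$ is a Poisson random measure with mean measure $\nu$, then for $\ell\in L^1(\nu)$ and $g\ge0$ with $\log g\in L^1(\nu)$,

\[
\mathbb{E}\Big[\sum_j\ell(Z_j)\prod_ig(Z_i)\Big]=\int\ell\,g\,d\nu\;e^{\int(g-1)\,d\nu}. \tag{A.7}
\]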
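A quick way to build confidence in (A.7) is to evaluate both sides when the mean measure $\nu$ is supported on finitely many atoms, so that the point counts at the atoms are independent Poisson variables. The sketch below does this by direct enumeration of truncated counts; the atom masses `nu` and the values `ell` and `g` are hypothetical choices for illustration.

```python
import itertools
import math

def poisson_pmf(n, mean):
    return math.exp(-mean) * mean**n / math.factorial(n)

def lhs_by_enumeration(nu, ell, g, n_max=25):
    """E[ sum_j ell(Z_j) prod_i g(Z_i) ] for a Poisson random measure whose mean
    measure puts mass nu[k] on atom k; counts are truncated at n_max."""
    total = 0.0
    for counts in itertools.product(range(n_max), repeat=len(nu)):
        prob = math.prod(poisson_pmf(k, m) for k, m in zip(counts, nu))
        product_term = math.prod(gi**k for gi, k in zip(g, counts))
        sum_term = sum(li * k for li, k in zip(ell, counts))
        total += prob * sum_term * product_term
    return total

def rhs_formula(nu, ell, g):
    """int ell g d(nu) * exp( int (g - 1) d(nu) ) for the same atomic measure."""
    integral = sum(li * gi * m for li, gi, m in zip(ell, g, nu))
    exponent = sum((gi - 1.0) * m for gi, m in zip(g, nu))
    return integral * math.exp(exponent)

# Hypothetical atomic mean measure and test functions.
nu = [0.7, 1.5, 2.0]
ell = [1.0, -2.0, 0.5]
g = [0.9, 0.4, 0.75]

lhs_value = lhs_by_enumeration(nu, ell, g)
rhs_value = rhs_formula(nu, ell, g)
print(lhs_value, rhs_value)
```

The truncation at `n_max` discards only a negligible Poisson tail for these masses, so the two printed numbers should agree to many digits.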

Proof [of Lemma A.3]

By Lemma A.4,

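\[
\int f(\xi)\sum_{(x,u)\in\xi}\frac{\ell(x,u)}{g(x,u)}\,\Gamma(\eta,d\xi)=F_g(\eta)\left\langle\int_0^\infty\ell(x,u)\,du,\ \eta(dx)\right\rangle.
\]

Comparing this to the definition of $A$ (equation (5.11)), we see that

\[
\int Af(\xi)\,\Gamma(\eta,d\xi)=F_g(\eta)\left\langle\int_0^\infty\big(\ell_1(x,u)+\ell_2(x,u)+\ell_3(x,u)\big)\,du,\ \eta(dx)\right\rangle,
\]

where

\[
\begin{aligned}
\ell_1(x,u)&=\gamma(x,\eta)\Big(\mathcal{B}\big(g(\cdot,u)\,r(\cdot,\eta)\big)(x)-g(x,u)\,\mathcal{B}r(x,\eta)\Big)\\
&=\gamma(x,\eta)\Big(\mathcal{B}\big((g(\cdot,u)-1)\,r(\cdot,\eta)\big)(x)-(g(x,u)-1)\,\mathcal{B}r(x,\eta)\Big)
\end{aligned}
\]

and

\[
\ell_2(x,u)=2g(x,u)\,\alpha\gamma(x,\eta)\,r(x,\eta)\int_u^\infty\big(g(x,v)-1\big)\,dv
\]

and

\[
\ell_3(x,u)=\Big(\alpha\gamma(x,\eta)\,r(x,\eta)\,u^2-\big\{\gamma(x,\eta)\,\mathcal{B}r(x,\eta)+F(x,\eta)\big\}u\Big)\,\partial_ug(x,u).
\]

First note that since $\mathcal{B}$ acts on space, it commutes with the integral over levels, and so

\[
\int_0^\infty\ell_1(x,u)\,du=\gamma(x,\eta)\Big(\mathcal{B}\big(\tilde g\,r(\cdot,\eta)\big)(x)-\tilde g(x)\,\mathcal{B}r(x,\eta)\Big),
\]

since $\tilde g(x)=\int_0^\infty(g(x,u)-1)\,du$. Next,

\[
\int_0^\infty\ell_2(x,u)\,du=\alpha\gamma(x,\eta)\,r(x,\eta)\;2\int_0^\infty g(x,u)\int_u^\infty\big(g(x,v)-1\big)\,dv\,du.
\]

Finally, integrating by parts,

\[
\int_0^\infty\ell_3(x,u)\,du=-\alpha\gamma(x,\eta)\,r(x,\eta)\int_0^\infty2u\big(g(x,u)-1\big)\,du+\big\{\gamma(x,\eta)\,\mathcal{B}r(x,\eta)+F(x,\eta)\big\}\tilde g(x).
\]

Now, note that

\[
\begin{aligned}
\int_0^\infty g(x,u)\int_u^\infty\big(g(x,v)-1\big)\,dv\,du-\int_0^\infty u\big(g(x,u)-1\big)\,du
&=\int_0^\infty g(x,u)\int_u^\infty\big(g(x,v)-1\big)\,dv\,du-\int_0^\infty\!\!\int_v^\infty\big(g(x,u)-1\big)\,du\,dv\\
&=\int_0^\infty\big(g(x,u)-1\big)\int_u^\infty\big(g(x,v)-1\big)\,dv\,du\\
&=\tilde g(x)^2/2.
\end{aligned}
\]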
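The last identity can also be checked numerically for a concrete level profile. The sketch below is illustrative only: the profile `g` (standing in for $u\mapsto g(x,u)$ at a fixed $x$) is a hypothetical choice equal to 1 beyond $u_g=3$, and the integrals are approximated on a midpoint grid.

```python
import numpy as np

def check_level_identity(g, u_max, n=2000):
    """Return (lhs, rhs) where
         lhs = int_0^inf g(u) int_u^inf (g(v) - 1) dv du - int_0^inf u (g(u) - 1) du
         rhs = (int_0^inf (g(u) - 1) du)^2 / 2,
    assuming g(u) = 1 for u >= u_max."""
    h = u_max / n
    u = (np.arange(n) + 0.5) * h
    d = g(u) - 1.0
    # tail[k] approximates int_{u_k}^inf (g(v) - 1) dv, counting half of cell k.
    tail = (np.cumsum(d[::-1])[::-1] - 0.5 * d) * h
    lhs = np.sum((d + 1.0) * tail) * h - np.sum(u * d) * h
    g_tilde = np.sum(d) * h
    return lhs, 0.5 * g_tilde**2

# Hypothetical profile: 0 <= g <= 1 and g = 1 for u >= 3, so g_tilde = -0.6.
g = lambda u: 1.0 - 0.6 * np.clip(1.0 - u / 3.0, 0.0, None) ** 2
lhs, rhs = check_level_identity(g, u_max=4.0)
print(lhs, rhs)
```

For this profile $\tilde g=-0.6$, so both sides should be close to $0.18$.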

Adding these together, we get that

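\[
\int_0^\infty\big(\ell_1(x,u)+\ell_2(x,u)+\ell_3(x,u)\big)\,du=\gamma(x,\eta)\,\mathcal{B}\big(\tilde g(\cdot)\,r(\cdot,\eta)\big)(x)+F(x,\eta)\,\tilde g(x)+\alpha\gamma(x,\eta)\,r(x,\eta)\,\tilde g(x)^2,
\]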

which agrees with (A.5), as desired. □

B. Technical Lemmas

B.1. Constraints on kernel widths

Lemma B.1

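Suppose the first three conditions of Assumptions 2.8 hold, and furthermore the kernels $\rho_r=p_{\epsilon_r^2}$ and $\rho_\gamma=p_{\epsilon_\gamma^2}$ are each Gaussian with standard deviations $\epsilon_r$ and $\epsilon_\gamma$ respectively. Let $\lambda=\sup_x\sup_{y:\|y\|=1}y^TC(x)y$ be the largest eigenvalue of $C(x)$ across all $x$. If $\epsilon_r^2+2\lambda/\theta<\epsilon_\gamma^2$, then there is a $C<\infty$ such that for all $x\in\mathbb{R}^d$, $\eta\in\mathcal{M}_F(\mathbb{R}^d)$,

\[
\theta\left|\int_{\mathbb{R}^d}\big(\rho_r*\eta(y)-\rho_r*\eta(x)\big)\,q_\theta(x,dy)\right|\le C\,\rho_\gamma*\eta(x) \tag{B.1}
\]

and

\[
\theta\int_{\mathbb{R}^d}\big(\rho_r*\eta(y)-\rho_r*\eta(x)\big)^2\,q_\theta(x,dy)\le C\,\big(\rho_\gamma*\eta(x)\big)^2. \tag{B.2}
\]

Note that the right hand side of each is the average density over a wider region (since $\epsilon_\gamma>\epsilon_r$). The key assumption here is that the spatial scale over which local density affects birth rate is larger than the scale over which it affects establishment. In the simple case of $b=0$ and $C=\sigma^2I$, the condition is simply that $\epsilon_r^2+2\sigma^2/\theta<\epsilon_\gamma^2$. This gives a yet more concrete situation in which Condition 2 of Lemma 2.9 holds.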

Proof [of Lemma B.1]

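First we prove (B.1). Recall that $\rho_r*\eta(x)=\int p_{\epsilon_r^2}(x-w)\,\eta(dw)$, where $p_t$ is the density of a Gaussian with mean 0 and variance $t$, so that applying Fubini, the integral on the left-hand side of (B.1) is

\[
\int_{\mathbb{R}^d}\theta\int_{\mathbb{R}^d}\big(p_{\epsilon_r^2}(y-w)-p_{\epsilon_r^2}(x-w)\big)\,q_\theta(x,dy)\,\eta(dw).
\]

Write $p_{s,x}(\cdot)$ for the density of a Gaussian with mean $sb(x)$ and covariance $\epsilon_r^2I+sC(x)$, so that $\int p_{\epsilon_r^2}(y-w)\,q_\theta(x,dy)=p_{1/\theta,x}(w-x)$. It therefore suffices to show that there exists $K$ such that for all $x$ and $w\in\mathbb{R}^d$,

\[
\theta\left|\int_{\mathbb{R}^d}\big(p_{\epsilon_r^2}(y-w)-p_{\epsilon_r^2}(x-w)\big)\,q_\theta(x,dy)\right|=\theta\,\big|p_{1/\theta,x}(w-x)-p_{0,x}(w-x)\big|\le K\,p_{\epsilon_\gamma^2}(w-x).
\]

However, by the mean value theorem, $\theta\big(p_{1/\theta,x}(z)-p_{0,x}(z)\big)=\partial_sp_{s,x}(z)$ for some $0\le s\le1/\theta$. Write $\Gamma(s,x)=\epsilon_r^2I+sC(x)$, so that

\[
p_{s,x}(z)=\frac{1}{(2\pi)^{d/2}\,|\Gamma(s,x)|^{1/2}}\exp\!\Big(-\tfrac12(z-sb(x))^T\,\Gamma(s,x)^{-1}\,(z-sb(x))\Big),
\]

and note that if $\lambda_i$ are the eigenvalues of $C(x)$ then $|\Gamma(s,x)|=\prod_i(\epsilon_r^2+s\lambda_i)$, and $\partial_s|\Gamma(s,x)|=\sum_i\lambda_i\,|\Gamma(s,x)|/(\epsilon_r^2+s\lambda_i)$. Therefore,

\[
\partial_sp_{s,x}(z)=\Big[b(x)^T\Gamma(s,x)^{-1}(z-sb(x))+\tfrac12(z-sb(x))^T\Gamma(s,x)^{-1}C(x)\Gamma(s,x)^{-1}(z-sb(x))-\tfrac12\sum_i\frac{\lambda_i}{\epsilon_r^2+s\lambda_i}\Big]\,p_{s,x}(z),
\]

where $z^T$ is the transpose of $z$. This implies that

\[
\frac{\theta\int_{\mathbb{R}^d}\big(p_{\epsilon_r^2}(y-w)-p_{\epsilon_r^2}(x-w)\big)\,q_\theta(x,dy)}{p_{\epsilon_\gamma^2}(x-w)}=h(x-w)\,e^{k(x-w)},
\]

where $h(z)$ and $k(z)$ are quadratic polynomials in $z$ whose coefficients depend on $s$ and $x$ but are uniformly bounded, and

\[
k(z)=\frac{1}{2\epsilon_\gamma^2}\|z\|^2-\frac12(z-sb(x))^T\,\Gamma(s,x)^{-1}\,(z-sb(x)).
\]

Since $\inf_zz^T\Gamma(s,x)^{-1}z/\|z\|^2=1/(s\lambda(x)+\epsilon_r^2)$, where $\lambda(x)=\sup_zz^TC(x)z/\|z\|^2$ is the largest eigenvalue of $C(x)$, this is negative for all $z$ outside a bounded region, and so equation (B.1) follows from the assumption that $\epsilon_r^2+2\sup_x\lambda(x)/\theta<\epsilon_\gamma^2$. (Note that we do not yet need the factor of 2.)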
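To make the role of the kernel widths concrete, the following one-dimensional sketch checks the Gaussian smoothing identity used above and compares the tail behaviour of the ratio in the two regimes. It assumes, as the identity for $p_{1/\theta,x}$ suggests, that $q_\theta(x,\cdot)$ is Gaussian with mean $x+b/\theta$ and variance $\sigma^2/\theta$; the function names and all numerical parameter values are hypothetical.

```python
import numpy as np

def normal_pdf(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def smoothed_kernel(w, x, eps_r2, b, sigma2, theta, n=40001, half_width=12.0):
    """Quadrature for  int p_{eps_r^2}(y - w) q_theta(x, dy)  in one dimension,
    with q_theta(x, .) assumed Gaussian(x + b/theta, sigma2/theta)."""
    mean, var = x + b / theta, sigma2 / theta
    y = np.linspace(mean - half_width, mean + half_width, n)
    dy = y[1] - y[0]
    return np.sum(normal_pdf(y - w, 0.0, eps_r2) * normal_pdf(y, mean, var)) * dy

def ratio(z, eps_r2, eps_gamma2, b, sigma2, theta):
    """theta |p_{1/theta,x}(z) - p_{0,x}(z)| / p_{eps_gamma^2}(z) in one dimension."""
    p1 = normal_pdf(z, b / theta, eps_r2 + sigma2 / theta)
    p0 = normal_pdf(z, 0.0, eps_r2)
    return theta * abs(p1 - p0) / normal_pdf(z, 0.0, eps_gamma2)

# Smoothing identity: the quadrature should match p_{1/theta,x}(w - x).
val = smoothed_kernel(w=1.0, x=0.3, eps_r2=1.0, b=0.5, sigma2=1.0, theta=10.0)
expected = normal_pdf(1.0 - 0.3, 0.5 / 10.0, 1.0 + 1.0 / 10.0)

# Wide gamma-kernel (1 + 2/10 < 1.5): the ratio decays in the tails.
wide = [ratio(z, 1.0, 1.5, 0.0, 1.0, 10.0) for z in (5.0, 10.0)]
# Narrow gamma-kernel (1.05 < 1 + 1/10): the ratio grows without bound.
narrow = [ratio(z, 1.0, 1.05, 0.0, 1.0, 10.0) for z in (10.0, 20.0)]
print(val, expected, wide, narrow)
```

The second comparison illustrates why some separation between $\epsilon_\gamma$ and $\epsilon_r$ is needed: when $\epsilon_\gamma^2$ is smaller than the variance of the smoothed kernel, no uniform constant can bound the ratio.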

Next we prove equation (B.2), in a similar way. Again applying Fubini,

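\[
\theta\int_{\mathbb{R}^d}\big(\rho_r*\eta(y)-\rho_r*\eta(x)\big)^2\,q_\theta(x,dy)=\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\theta\int_{\mathbb{R}^d}\big(p_{\epsilon_r^2}(y-w)-p_{\epsilon_r^2}(x-w)\big)\big(p_{\epsilon_r^2}(y-v)-p_{\epsilon_r^2}(x-v)\big)\,q_\theta(x,dy)\,\eta(dv)\,\eta(dw),
\]

and so as before, equation (B.2) will follow if the integrand is bounded by $K\,p_{\epsilon_\gamma^2}(x-w)\,p_{\epsilon_\gamma^2}(x-v)$. Now, let $Y_1$, $Y_2$, and $Z$ be independent $d$-dimensional Gaussians with mean zero, where $Y_1$ and $Y_2$ have covariance $\epsilon_r^2I$, and $Z$ has covariance $C(x)$. Write $p_{s,t,x}(\cdot,\cdot)$ for the joint density of $Y_1+\sqrt{s}\,Z+sb(x)$ and $Y_2+\sqrt{t}\,Z+tb(x)$. Then, observe that

\[
\begin{aligned}
\theta\int_{\mathbb{R}^d}&\big(p_{\epsilon_r^2}(y-w)-p_{\epsilon_r^2}(x-w)\big)\big(p_{\epsilon_r^2}(y-v)-p_{\epsilon_r^2}(x-v)\big)\,q_\theta(x,dy)\\
&=\theta\Big[p_{1/\theta,1/\theta,x}(x-w,x-v)-p_{0,1/\theta,x}(x-w,x-v)-p_{1/\theta,0,x}(x-w,x-v)+p_{0,0,x}(x-w,x-v)\Big]\\
&=\partial_sp_{s,1/\theta,x}(x-w,x-v)-\partial_tp_{0,t,x}(x-w,x-v),
\end{aligned}
\]

for some $0\le s,t\le1/\theta$. As before,

\[
\frac{\theta\int_{\mathbb{R}^d}\big(p_{\epsilon_r^2}(y-w)-p_{\epsilon_r^2}(x-w)\big)\big(p_{\epsilon_r^2}(y-v)-p_{\epsilon_r^2}(x-v)\big)\,q_\theta(x,dy)}{p_{\epsilon_\gamma^2}(x-w)\,p_{\epsilon_\gamma^2}(x-v)}=h(x-w,x-v)\,e^{k(x-w,x-v)},
\]

where $h(z_1,z_2)$ is a polynomial with uniformly bounded coefficients and

\[
k(z_1,z_2)=\frac{\|z_1\|^2+\|z_2\|^2}{2\epsilon_\gamma^2}-\frac12[z_1,z_2]^T\,\Gamma(s,t,x)^{-1}\,[z_1,z_2],
\]

where $[z_1,z_2]$ is the $\mathbb{R}^{2d}$ vector formed by concatenating $z_1$ and $z_2$, and $\Gamma(s,t,x)$ is the block matrix

\[
\Gamma(s,t,x)=\begin{pmatrix}\epsilon_r^2I+sC(x)&\sqrt{st}\,C(x)\\ \sqrt{st}\,C(x)&\epsilon_r^2I+tC(x)\end{pmatrix}.
\]

If $C(x)u=au$ for some $a\in\mathbb{R}$, then $[u\sqrt{s},u\sqrt{t}]$ is an eigenvector of $\Gamma(s,t,x)$ with eigenvalue $\epsilon_r^2+(s+t)a$, and $[u\sqrt{t},-u\sqrt{s}]$ is an eigenvector of $\Gamma(s,t,x)$ with eigenvalue $\epsilon_r^2$. This implies the largest eigenvalue of $\Gamma(s,t,x)$ is equal to $\epsilon_r^2+(s+t)\lambda(x)$, where $\lambda(x)$ is again the largest eigenvalue of $C(x)$. Therefore, if $s+t\le2/\theta$,

\[
\frac{\|z_1\|^2+\|z_2\|^2}{\epsilon_\gamma^2}-[z_1,z_2]^T\,\Gamma(s,t,x)^{-1}\,[z_1,z_2]\le\big(\|z_1\|^2+\|z_2\|^2\big)\left(\frac{1}{\epsilon_\gamma^2}-\frac{1}{\epsilon_r^2+2\lambda(x)/\theta}\right),
\]

which is negative by assumption. Therefore, there is a $K$ such that

\[
\frac{\big|\partial_sp_{s,1/\theta,x}(x-w,x-v)-\partial_tp_{0,t,x}(x-w,x-v)\big|}{p_{\epsilon_\gamma^2}(x-w)\,p_{\epsilon_\gamma^2}(x-v)}\le K
\]

for all $\theta>1$ and all $x$, $v$, and $w\in\mathbb{R}^d$, proving equation (B.2) and hence the lemma. □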
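The claim about the largest eigenvalue of the block matrix $\Gamma(s,t,x)$, which is what produces the factor of 2 in the condition of the lemma, can be checked numerically. In the sketch below the covariance matrix `C` and the values of `eps_r2`, `s` and `t` are hypothetical; the off-diagonal blocks are taken to be $\sqrt{st}\,C$, the cross-covariance of $Y_1+\sqrt{s}Z$ and $Y_2+\sqrt{t}Z$.

```python
import numpy as np

def gamma_block(eps_r2, s, t, C):
    """Covariance of (Y1 + sqrt(s) Z, Y2 + sqrt(t) Z) with Y1, Y2 ~ N(0, eps_r2 I), Z ~ N(0, C)."""
    d = C.shape[0]
    identity = np.eye(d)
    off_diagonal = np.sqrt(s * t) * C
    return np.block([[eps_r2 * identity + s * C, off_diagonal],
                     [off_diagonal, eps_r2 * identity + t * C]])

# Hypothetical symmetric positive definite C and parameters.
C = np.array([[2.0, 0.5], [0.5, 1.0]])
eps_r2, s, t = 0.8, 0.3, 0.1

eig_C = np.linalg.eigvalsh(C)
eig_Gamma = np.sort(np.linalg.eigvalsh(gamma_block(eps_r2, s, t, C)))

# Expected spectrum under this construction: eps_r2 + (s + t) a for each
# eigenvalue a of C, and eps_r2 for the remaining d directions.
expected = np.sort(np.concatenate([eps_r2 + (s + t) * eig_C,
                                   np.full(C.shape[0], eps_r2)]))
print(eig_Gamma, expected)
```

In particular the largest eigenvalue is $\epsilon_r^2+(s+t)\lambda(x)$, so with $s+t\le2/\theta$ it is at most $\epsilon_r^2+2\lambda(x)/\theta$.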

B.2. Tightness of processes

Here we record, for completeness, the fact used above that tightness for a family of processes, if determined by the Aldous-Rebolledo criterion, extends to sums and products of those processes. We first record for reference one version of the Aldous-Rebolledo criteria for tightness of a sequence of real-valued processes (as it appears in Etheridge [2000]):

Theorem B.2 (Rebolledo [1980])

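Let $\{Y^{(n)}\}_{n\ge1}$ be a sequence of real-valued processes with càdlàg paths. Suppose that the following conditions are satisfied.

  1. For each fixed $t\in[0,T]$, $\{Y^{(n)}_t\}_{n\ge1}$ is tight.

  2. Given a sequence of stopping times $\tau_n$, bounded by $T$, for each $\epsilon>0$ there exists $\delta>0$ and $n_0$ such that
    \[ \sup_{n\ge n_0}\ \sup_{\theta\in[0,\min(\delta,T-\tau_n)]}P\Big\{\big|Y^{(n)}_{\tau_n+\theta}-Y^{(n)}_{\tau_n}\big|>\epsilon\Big\}\le\epsilon. \]

Then the sequence $\{(Y^{(n)}_t)_{0\le t\le T}\}_{n\ge1}$ is tight.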

Lemma B.3

Let $\{X^{(n)}\}_{n\ge1}$ and $\{Y^{(n)}\}_{n\ge1}$ be sequences of jointly defined real-valued processes with càdlàg paths satisfying the conditions of Theorem B.2. Then $\{X^{(n)}Y^{(n)}\}_{n\ge1}$ and $\{X^{(n)}+Y^{(n)}\}_{n\ge1}$ also satisfy the conditions of Theorem B.2.

By “jointly defined” we mean that $X^{(n)}$ and $Y^{(n)}$ are defined on the same probability space, so that the products and sums make sense.

Proof [of Lemma B.3]

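The proof for $X^{(n)}+Y^{(n)}$ is similar to but more straightforward than for $X^{(n)}Y^{(n)}$, so we only prove the lemma for the latter. First, note that for any $\epsilon>0$, by tightness of $X^{(n)}_t$ and $Y^{(n)}_t$ there is a $K$ such that $P\{|X^{(n)}_t|>K\}$ and $P\{|Y^{(n)}_t|>K\}$ are both less than $\epsilon/2$, and hence

\[
P\big\{|X^{(n)}_tY^{(n)}_t|>K^2\big\}\le P\big\{|X^{(n)}_t|>K\big\}+P\big\{|Y^{(n)}_t|>K\big\}\le\epsilon.
\]

Therefore, $X^{(n)}_tY^{(n)}_t$ is tight.

Next, note that for $0\le\tau_n\le T$,

\[
\begin{aligned}
\sup_{0\le\theta\le\min(\delta,T-\tau_n)}&\big|X^{(n)}_{\tau_n+\theta}Y^{(n)}_{\tau_n+\theta}-X^{(n)}_{\tau_n}Y^{(n)}_{\tau_n}\big|\\
&\le\sup_{0\le\theta\le\min(\delta,T-\tau_n)}\Big|X^{(n)}_{\tau_n+\theta}\big(Y^{(n)}_{\tau_n+\theta}-Y^{(n)}_{\tau_n}\big)+\big(X^{(n)}_{\tau_n+\theta}-X^{(n)}_{\tau_n}\big)Y^{(n)}_{\tau_n}\Big|\\
&\le\sup_{0\le t\le T}\big|X^{(n)}_t\big|\sup_{0\le\theta\le\min(\delta,T-\tau_n)}\big|Y^{(n)}_{\tau_n+\theta}-Y^{(n)}_{\tau_n}\big|+\sup_{0\le\theta\le\min(\delta,T-\tau_n)}\big|X^{(n)}_{\tau_n+\theta}-X^{(n)}_{\tau_n}\big|\sup_{0\le t\le T}\big|Y^{(n)}_t\big|,
\end{aligned}
\]

so that for any $C$,

\[
\begin{aligned}
P\Big\{\sup_{0\le\theta\le\min(\delta,T-\tau_n)}\big|X^{(n)}_{\tau_n+\theta}Y^{(n)}_{\tau_n+\theta}-X^{(n)}_{\tau_n}Y^{(n)}_{\tau_n}\big|>\epsilon\Big\}
&\le P\Big\{\sup_{0\le t\le T}\big|X^{(n)}_t\big|>C\Big\}+P\Big\{\sup_{0\le\theta\le\min(\delta,T-\tau_n)}\big|Y^{(n)}_{\tau_n+\theta}-Y^{(n)}_{\tau_n}\big|>\epsilon/C\Big\}\\
&\quad+P\Big\{\sup_{0\le\theta\le\min(\delta,T-\tau_n)}\big|X^{(n)}_{\tau_n+\theta}-X^{(n)}_{\tau_n}\big|>\epsilon/C\Big\}+P\Big\{\sup_{0\le t\le T}\big|Y^{(n)}_t\big|>C\Big\}.
\end{aligned} \tag{B.3}
\]

Now, since $\max_{0\le t\le T}|X^{(n)}_t|$ is tight (and likewise for $Y$) (see, e.g., Remark 3.7.3 in Ethier and Kurtz [1986]), we may choose a $C\ge4$ for which

\[
P\Big\{\sup_{0\le t\le T}\big|X^{(n)}_t\big|>C\Big\}\le\frac{\epsilon}{4}.
\]

Similarly, by assumption we can choose a $\delta$ for which

\[
P\Big\{\sup_{0\le\theta\le\min(\delta,T-\tau_n)}\big|X^{(n)}_{\tau_n+\theta}-X^{(n)}_{\tau_n}\big|>\epsilon/C\Big\}\le\frac{\epsilon}{C}.
\]

If we choose $C$ and $\delta$ that do this for both $X^{(n)}$ and $Y^{(n)}$, then each of the terms in equation (B.3) is bounded by $\epsilon/4$, and condition (2) is satisfied for the product process. □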

Contributor Information

Alison M. Etheridge, Department of Statistics, Oxford University, 24-29 St Giles, Oxford OX1 3LB, UK

Ian Letter, Department of Statistics, Oxford University, 24-29 St Giles, Oxford OX1 3LB, UK.

Terence Tsui Ho Lung, Department of Statistics, Oxford University, 24-29 St Giles, Oxford OX1 3LB, UK.

Thomas G. Kurtz, Departments of Mathematics and Statistics, University of Wisconsin - Madison, 480 Lincoln Drive, Madison, WI 53706-1388, USA

Peter L. Ralph, Departments of Mathematics and Biology, University of Oregon, Fenton Hall, Eugene, OR 97403-1222, USA

References

  1. Barlow M. T., Jacka S. D., and Yor M. Inequalities for a pair of processes stopped at a random time. Proceedings of the London Mathematical Society, s3-52(1):142–172, January 1986. ISSN 0024-6115. doi: 10.1112/plms/s3-52.1.142.
  2. Barton N H. The dynamics of hybrid zones. Heredity, 43(3):341–359, 1979.
  3. Berestycki Henri, Nadin Gregoire, Perthame Benoit, and Ryzhik Lenya. The non-local Fisher-KPP equation: travelling waves and steady states. Nonlinearity, 22(12):2813–2844, 2009. URL http://stacks.iop.org/0951-7715/22/2813.
  4. Birzu Gabriel, Hallatschek Oskar, and Korolev Kirill S. Fluctuations uncover a distinct class of traveling waves. Proc. Natl. Acad. Sci. (USA), 115(16):3645–3654, 2018.
  5. Birzu Gabriel, Hallatschek Oskar, and Korolev Kirill S. Genealogical structure changes as range expansions transition from pushed to pulled. Proc. Natl. Acad. Sci. (USA), 118(34):e2026746118, 2021.
  6. Biswas Niloy, Etheridge Alison, and Klimek Aleksander. The spatial Lambda-Fleming-Viot process with fluctuating selection. Electron. J. Probab., 26:1–51, 2021. doi: 10.1214/21-EJP593.
  7. Bolker B M and Pacala S W. Using moment equations to understand stochastically driven spatial pattern formation in ecological systems. Theor. Pop. Biol., 52(3):179–197, 1997.
  8. Bolker B M and Pacala S W. Spatial moment equations for plant competition: Understanding spatial strategies and the advantages of short dispersal. American Naturalist, 153(6):575–602, 1999.
  9. Bramson M. Convergence of solutions of the Kolmogorov equation to travelling waves. Mem. Amer. Math. Soc., 44(285), 1983.
  10. Britton N. F. Spatial structures and periodic travelling waves in an integro-differential reaction-diffusion population model. SIAM Journal on Applied Mathematics, 50(6):1663–1688, 1990. doi: 10.1137/0150099.
  11. Brunet E, Derrida B, Mueller A H, and Munier S. Noisy travelling waves: effect of selection on genealogies. Europhys. Lett., 76:1–7, 2006.
  12. Cantrell Robert Stephen and Cosner Chris. Spatial ecology via reaction-diffusion equations. John Wiley & Sons, 2004.
  13. Cohen I, Golding I, Kozlovsky Y, Ben-Jacob E, and Ron. Continuous and discrete models of cooperation in complex bacterial colonies. Fractals, 7:235–247, 1999.
  14. Dawson D A. Measure-valued Markov processes. In École d’été de probabilités de Saint Flour, volume 1541. Springer-Verlag, 1993.
  15. De Masi A., Ferrari P. A., and Lebowitz J. L. Reaction-diffusion equations for interacting particle system. J. Stat. Phys., 44:589–644, 1986. doi: 10.1007/BF01011311.
  16. DeMasi Anna and Presutti Errico. Mathematical methods for hydrodynamic limits. Springer, 2006.
  17. Donnelly P J and Kurtz T G. A countable representation of the Fleming-Viot measure-valued diffusion. Ann. Probab., 24:698–742, 1996.
  18. Donnelly P J and Kurtz T G. Particle representations for measure-valued population models. Ann. Probab., 27:166–205, 1999.
  19. Durrett R and Fan W-T. Genealogies in expanding populations. Ann. Appl. Probab., 26:3456–3490, 2016.
  20. Etheridge A. and Penington S. Genealogies in bistable waves. Electron. J. Probab., 27(121):1–99, 2022. doi: 10.1214/22-EJP845.
  21. Etheridge A, Gooding M, and Letter I. On the effects of a wide opening in the domain of the (stochastic) Allen-Cahn equation and the motion of hybrid zones. Electron. J. Probab., 27:1–52, 2022. doi: 10.1214/22-EJP888.
  22. Etheridge A M. Survival and extinction in a locally regulated population. Ann. Appl. Probab., 14(1):188–214, 2004.
  23. Etheridge Alison. An introduction to superprocesses. Number 20. American Mathematical Society, 2000.
  24. Etheridge Alison M. and Kurtz Thomas G. Genealogical constructions of population models. Ann. Probab., 47(4):1827–1910, 2019. doi: 10.1214/18-AOP1266.
  25. Ethier Stewart N. and Kurtz Thomas G. Markov processes – characterization and convergence. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, 1986. ISBN 0-471-08186-8.
  26. Fife Paul C. and McLeod J. B. The approach of solutions of nonlinear diffusion equations to travelling front solutions. Archive for Rational Mechanics and Analysis, 65(4):335–361, December 1977. ISSN 1432-0673. doi: 10.1007/BF00250432.
  27. Flandoli Franco and Huang Ruojun. The KPP equation as a scaling limit of locally interacting Brownian particles. J. Differential Equations, 303:608–644, 2021.
  28. Flandoli Franco, Leimbach Matti, and Olivera Christian. Uniform convergence of proliferating particles to the FKPP equation. Journal of Mathematical Analysis and Applications, 473(1):27–52, 2019.
  29. Ghosh Olivia M. and Good Benjamin H. Emergent evolutionary forces in spatial models of microbial growth in the human gut microbiota. Proc. Natl. Acad. Sci. (USA), 119(28):e2114931119, Jul 2022.
  30. Gilding B H and Kersner R. A Fisher/KPP-type equation with density dependent diffusion and convection: travelling wave solutions. J. Phys. A: Math. Gen., 38:3367–3379, 2005.
  31. Gooding M. Long term behaviour of spatial population models with heterozygous or asymmetric homozygous selection. DPhil thesis, Oxford University, 2018. URL https://ora.ox.ac.uk/objects/uuid:ef35b918-35ad-4f90-98ea-325fc692f1eb.
  32. Hallatschek O and Nelson D. Gene surfing in expanding populations. Theor. Pop. Biol., 73:158–170, 2008.
  33. Haller Benjamin C. and Messer Philipp W. SLiM 3: Forward genetic simulations beyond the Wright-Fisher model. Mol. Biol. Evol., 36(3):632–637, 2019. ISSN 1537-1719. doi: 10.1093/molbev/msy228.
  34. Haller Benjamin C. and Messer Philipp W. SLiM: An Evolutionary Simulation Framework, 2022. URL https://messerlab.org/SLiM.
  35. Hernández-García Emilio and López Cristóbal. Clustering, advection, and patterns in a model of population dynamics with neighborhood-dependent rates. Phys. Rev. E, 70(1):016216, July 2004. doi: 10.1103/PhysRevE.70.016216.
  36. Hernández-Hernández Ma. Elena and Jacka Saul D. A generalisation of the Burkholder-Davis-Gundy inequalities. Electronic Communications in Probability, 27, January 2022. doi: 10.1214/22-ecp493.
  37. Holmes E. E., Lewis M. A., Banks J. E., and Veit R. R. Partial differential equations in ecology: Spatial interactions and population dynamics. Ecology, 75(1):17–29, 1994. ISSN 00129658, 19399170. URL http://www.jstor.org/stable/1939378.
  38. Kallenberg Olav. Foundations of modern probability, volume 2. Springer, 1997.
  39. Kamin S and Rosenau P. Emergence of waves in a nonlinear convection-reaction-diffusion equation. Adv. Nonlinear Stud., 4:251–272, 2004.
  40. Kelleher Jerome, Wong Yan, Wohns Anthony W., Fadil Chaimaa, Albers Patrick K., and McVean Gil. Inferring whole-genome histories in large population datasets. Nature Genetics, 51(9):1330–1338, 2019. ISSN 15461718. doi: 10.1038/s41588-019-0483-y.
  41. Kolmogorov A, Petrovsky I, and Piscounov N. Étude de l’equation de la diffusion avec croissance de la quantité de matière et son application à un problème biologique. Moscow Univ. Math. Bull., 1:1–25, 1937.
  42. Kurtz Thomas G. and Rodrigues Eliane R. Poisson representations of branching Markov and measure-valued branching processes. Ann. Probab., 39(3):939–984, May 2011. doi: 10.1214/10-AOP574.
  43. Kurtz Thomas G. and Xiong Jie. Particle representations for a class of nonlinear SPDEs. Stoch. Proc. Appl., 83(1):103–126, 1999. ISSN 0304-4149. doi: 10.1016/S0304-4149(99)00024-1. URL https://www.sciencedirect.com/science/article/pii/S0304414999000241.
  44. Lam King-Yeung and Lou Yuan. Introduction to Reaction-Diffusion Equations: Theory and Applications to Spatial Ecology and Evolutionary Biology. Springer International Publishing AG, 2023.
  45. Li Y, Buenzli P, and Simpson M. Interpreting how nonlinear diffusion affects the fate of bistable populations using a discrete modelling framework. Proc. Roy. Soc. A, 478, 2022. doi: 10.1098/rspa.2022.0013.
  46. Lions P-L and Mas-Gallic S. Une méthode particulaire déterministe pour des équations diffusives non linéaires. C. R. Acad. Sci. Paris, 332, Série I:369–376, 2001.
  47. Neigel J E and Avise J C. Application of a random walk model to geographic distributions of animal mitochondrial DNA variation. Genetics, 135(4):1209–1220, December 1993.
  48. Oelschläger Karl. A law of large numbers for moderately interacting diffusion processes. Z. Wahrsch. verw. Geb., 69(2):279–322, 1985.
  49. Oelschläger Karl. On the derivation of reaction-diffusion equations as limit dynamics of systems of moderately interacting stochastic processes. Prob. Theor. Rel. Fields, 82(4):565–586, 1989.
  50. Penington S. The spreading speed of solutions of the non-local Fisher-KPP equation. J. Functional Anal., 275(12):3259–3302, 2017.
  51. Perkins E A. Measure-valued branching diffusions with spatial interactions. Prob. Th. Rel. Fields, 94:189–245, 1992.
  52. Potts Jonathan R. and Börger Luca. How to scale up from animal movement decisions to spatiotemporal patterns: An approach via step selection. Journal of Animal Ecology, 92(1):16–29, 2023. doi: 10.1111/1365-2656.13832.
  53. Rebolledo R. Sur l’existence de solutions à certains problèmes de semimartingales. C. R. Acad. Sci. Paris, 290, 1980.
  54. Sasaki Akira. Clumped distribution by neighbourhood competition. J. Theor. Biol., 186(4):415–430, June 1997. doi: 10.1006/jtbi.1996.0370.
  55. Sherratt J A. On the form of smooth-front travelling waves in a reaction-diffusion equation with degenerate nonlinear diffusion. Math. Model. Nat. Phenom., 5(5):64–79, 2010.
  56. Sheu S J. Some estimates of the transition density of a nondegenerate diffusion Markov process. Ann. Probab., 19(2):538–561, 1991.
  57. Young W. R., Roberts A. J., and Stuhne G. Reproductive pair correlations and the clustering of organisms. Nature, 412(6844):328–331, 2001. ISSN 14764687. doi: 10.1038/35085561.
