A general multi-scale description of metastable adaptive motion across fitness valleys

Manuel Esser; Anna Kraut

doi:10.1007/s00285-024-02143-3

2024 Oct 1;89(4):46.



PMCID: PMC11445367 PMID: 39354121

Abstract

We consider a stochastic individual-based model of adaptive dynamics on a finite trait graph $G = (V, E)$. The evolution is driven by a linear birth rate, a density-dependent logistic death rate and the possibility of mutations along the directed edges in E. We study the limit of small mutation rates combined with a simultaneously diverging population size. Closing the gap between Bovier et al. (Ann Appl Probab 29(6):3541–358, 2019) and Coquille et al. (Electron J Probab 26:1–37, 2021), we give a precise description of transitions between evolutionary stable conditions (ESC), where multiple mutations are needed to cross a valley in the fitness landscape. The system exhibits metastable behaviour on several divergent time scales, corresponding to the widths of these fitness valleys. We develop the framework of a meta graph that is constituted of ESCs and possible metastable transitions between them. This allows for a concise description of the multi-scale jump chain arising from concatenating several jumps. Finally, for each of the various time scales, we prove the convergence of the population process to a Markov jump process visiting only ESCs of sufficiently high stability.

Keywords: Adaptive dynamics, Stochastic individual-based models, Birth death processes with immigration, Metastability, Multi-scale limits

Introduction

The theory of evolution aims to understand the adaptation of biological populations to their environment through mutation and selection. Following the principles originally proposed by Darwin, it associates to each individual a fitness, which characterises their ability to survive and produce a growing population. The path of evolution, tracing the types of individuals that were able to fixate in the population, usually follows a sequence of types of increasing fitness. However, in many cases the mutational path has to pass through a number of deleterious or neutral intermediate types in order to reach a type of higher fitness. This can for example be seen in cancer initiation, where multiple driver mutations need to be accumulated to induce an outgrowing population (Martincorena et al. 2017). Other examples are the formation of complex mechanisms like flagella in bacteria, where only partially functional intermediate stages of flagella yield an evolutionary disadvantage but fully functional apparatuses lead to increased fitness (Pallen 2006). See also De Visser and Krug (2014) for a review of empirical fitness landscapes arising in nature.

When the population needs to cross types of lower fitness in order to reach a fitter type, many such attempts will be unsuccessful. This is because the intermediate unfit types are destined to go extinct within a short time and might not produce a new mutant type before this happens. As a result, the waiting time to cross a valley in the fitness landscape is much longer than the invasion time of fit mutant types that are directly accessible. Once a fit type is attained, however, it rapidly fixates in the population. These dynamics, which can also be analysed in the framework of metastability, as illustrated below, were already studied heuristically by Gillespie in the 1980s (Gillespie 1984). Since then, fitness valleys have been studied in a variety of mathematical models, ranging from Moran models (Komarova 2007; Gokhale et al. 2009) to multi-type branching processes (Nicholson and Antal 2019).

The model that we want to focus on in this paper is a stochastic individual-based model of adaptive dynamics, for which Bovier, Coquille and Smadi have studied fitness valleys in the simple case of a linear trait space (Bovier et al. 2019). This type of model tracks the sizes of different subpopulations and, in contrast to many others such as the Moran model, does not assume a constant overall population size. In this respect, it is closer to branching processes, where the population size varies over time. However, unbounded growth is prevented by competitive interactions. Moreover, selective advantages of certain traits are not prescribed by a fixed parameter but arise through these interactions. This is particularly important for the long-term evolution of the population, since the fitness landscape depends on the current composition of the dominant population and changes over time.

This study of the interplay of ecology and evolution goes back to ideas of Metz and Geritz (among others) in the early 1990s (Metz et al. 1992). Shortly after, an individual-based approach was proposed by Bolker and Pacala (1997), and a rigorous construction was first presented by Fournier and Méléard almost 20 years ago (Fournier and Méléard 2004). Since then, scaling limits of these models have been studied in a variety of parameter regimes and for extensions of the base model (e.g. Champagnat 2006; Champagnat and Méléard 2011; Baar et al. 2017; Smadi 2017; Bovier et al. 2018; Kraut and Bovier 2019; Champagnat et al. 2021; Coquille et al. 2021). We refer to Bovier (2021) for a comprehensive overview of various scaling limits.

To study the typical long-term behaviour of the population, two scaling parameters are introduced: the carrying capacity K, which scales the order of the population size, and the mutation probability $μ_{K}$, which scales the frequency of mutation events. For large populations ($K \to \infty$) and rare mutations ($μ_{K} \to 0$), different mechanisms that change the state of the population (such as mutations introducing a new type, or interactions between individuals that lead to a new equilibrium state of resident traits) act on different time scales. There are three important time scales in this setting. Ecological interactions between well-established subpopulations, like the competition for resources, can change the composition of the overall population within a short time of order 1. This is related to classical Lotka–Volterra dynamics and leads to equilibrium states of the traits with large subpopulations. Short-range mutations and the initial exponential growth of small mutant populations can be witnessed on a logarithmic time scale of order $\ln K$. Finally, long-range mutations, in particular those that need to traverse a large fitness valley of width L, are quite rare and occur on a time scale of order $1 / (K μ_{K}^{L})$. The distinction between long- and short-range mutations depends on the choice of the mutation probability $μ_{K}$, where long ranges L satisfy $K μ_{K}^{L} ≪ 1$. To obtain a non-trivial limit as $K \to \infty$, the population size is usually rescaled by K. As a result, only the established resident traits are visible. Since the ecological changes of these traits happen very fast in comparison with the other time scales, the limit of the population process yields a jump process that transitions between different equilibrium states.

The effects of short-range mutations on the $\ln K$-time scale have been studied extensively by Coquille et al. (2021). The authors give a full description of the limiting dynamics for the scenario of a general finite graph as a trait space. As mentioned above, the crossing of fitness valleys through long-range mutations (on the $1 / (K μ_{K}^{L})$-time scale) has been analysed for a simple linear trait space in Bovier et al. (2019). Moreover, the case of very rare mutations, where even neighbouring traits are regarded as long-range mutations, has already been studied in Champagnat (2006) and Champagnat and Méléard (2011), where convergence to the trait substitution sequence or the polymorphic evolution sequence was shown.

The present paper finally closes the gap between the previous works and gives a full description of the jump processes resulting from long-range mutations on general finite trait graphs, thus extending the results of Bovier et al. (2019) to the more general setting of Coquille et al. (2021). This general setting entails that, for a given equilibrium state, there might be several paths to cross the surrounding fitness valley. Concentrating on the decisive, shortest paths, we calculate the rate of a transition to the next evolutionary stable condition and give the precise asymptotics in Theorem 7 and Corollary 8. The length of the shortest paths determines the time scale to cross the valley. Based on this, we introduce the notion of a stability degree L to classify the equilibrium states. Combining several of these steps gives rise to a jump chain on a so-called metastability graph, stated in Corollary 10. This graph typically consists of fitness valleys of different widths, which can be crossed on different time scales of the form $1 / (K μ_{K}^{L})$. Depending on the choice of time scale, only some of these transitions are possible (valleys of width strictly larger than L cannot be crossed) or visible (transitions across valleys of width strictly smaller than L are immediate). This leads to different limiting jump processes in Theorem 11.

When long-range mutations are necessary to cross a large fitness valley, the system displays an almost stable behaviour on shorter time scales but can change its state when waiting a long time. This type of phenomenon is also known as metastability. It has been studied mathematically mostly in the context of physics and statistical mechanics (e.g. Cirillo and Nardi 2013). However, the concept is very versatile and can be applied to many dynamical systems, including models for biological processes. This has for example been mentioned in Bovier et al. (2019) for models of adaptive dynamics, and in Dawson and Greven (2014) for population dynamics.

In the former case, as well as in this paper, the role of the traditional physical energy (landscape) is taken over by the fitness (landscape). Instead of passing a critical state of high energy, the process has to cross a valley of negative fitness through a sequence of deleterious mutations. Similarly to the fast dynamics after passing a high-energy state, once a fit mutant is reached, the adaptive dynamics system quickly attains a new metastable equilibrium due to fast exponential growth. The results of Bovier et al. (2019) and of this paper even confirm classical definitions of the mean time for a metastable transition (e.g. Bovier and den Hollander 2015) by proving that the waiting times for jumps between equilibrium states are exponentially distributed when considering the correct time scale.

While single jumps across a fitness valley can be regarded as metastable transitions, the limiting jump chain can be related to the concept of adaptive walks or flights. Those are stochastic processes that directly study the motion of the macroscopic population on the trait space, focussing on successful invasions and omitting the microscopic dynamics (see Krug 2021 for an overview). There are two sources of randomness in adaptive walks: A random fitness landscape and a random motion towards neighbours of higher fitness, according to some transition law. Based on these, properties of interest are the distribution and accessibility of fitness maxima (Schmiegelt and Krug 2014; Nowak and Krug 2015; Berestycki et al. 2016, 2017), as well as the time or path length to reach those maxima (Orr 2003). In adaptive flights, transitions are not just possible between neighbouring traits but from one local fitness maximum to another (Jain and Krug 2005, 2007; Jain 2007; Neidhart and Krug 2011). This relates back to the limiting processes derived in this paper, where the population jumps between equilibrium states that are surrounded by valleys of traits of lower fitness.

A major difference between the models of adaptive walks/flights and adaptive dynamics is that the former assume a fitness landscape that is random but fixed in time, while in the latter case the fitness landscape is dynamic and depends on the current resident traits. As mentioned before, the notion of local fitness maxima can nevertheless be translated. Moreover, if equal competition between all traits is assumed in the adaptive dynamics model, the fitness landscape can again be regarded as fixed. We study this special case in a number of examples. Overall, the results of this paper can be seen as a validation of certain types of adaptive walks or flights, deriving their macroscopic dynamics from a microscopic, individual-based model.

The remainder of this paper is structured as follows: In Sect. 2, we rigorously define the individual-based model of adaptive dynamics, for which we derive our limit theorems. We introduce key quantities, like the fitness of a trait, and recapitulate the most important results of Coquille et al. (2021) that lead to a metastable state on the $\ln K$-time scale. Finally, we heuristically derive the limit behaviour on longer time scales and present the formal convergence results, starting with a single metastable transition in Sect. 2.3 and treating the full jump process in Sect. 2.4. Section 3 is devoted to a number of examples that highlight different aspects of the complicated limiting dynamics in simple settings. The proofs of the main results of this paper can be found in Sect. 4. A combinatorial result on excursions of subcritical birth death processes and the complete version of the results from Coquille et al. (2021) are stated in Appendix A, for the convenience of the reader.

Model and main results

In this section we introduce the individual-based model of adaptive dynamics and develop the main results of this paper. After a rigorous definition of the population process and its driving parameters, we give a short overview of the behaviour on the time scales of order 1 and $\ln K$ in Sect. 2.2, where we also derive the key quantities that lead to the notion of an evolutionary stable condition. Our main results on the transition out of an ESC are stated in Sect. 2.3, together with a heuristic explanation. Finally, Sect. 2.4 is devoted to our results on multi-scale jump chains and the convergence of the population process. For the convenience of the reader, we provide a preview of the different time scales and the main results of this paper at the end of Sect. 2.1.

Individual-based model

To study the evolution of a heterogeneous population, we consider a classical stochastic individual-based model of adaptive dynamics. Each individual of our haploid population is characterised by its trait, which can be interpreted as its genotype or phenotype. Note that we assume a one-to-one correspondence between trait and physical properties. In this paper we consider a finite trait space that is given by a directed graph $G = (V, E)$. Here, the set of vertices V represents the possible traits that individuals can carry. The set of edges E marks the possibility of mutation between traits.

To each trait we associate a number of parameters that describe the dynamics of the system. For $v, w \in V$ and $K \in \mathbb{N}$, denote by

$b(v) \in \mathbb{R}_{+}$, the birth rate of an individual of trait v,
$d(v) \in \mathbb{R}_{+}$, the (natural) death rate of an individual of trait v,
$c^{K}(v, w) = c(v, w) / K \in \mathbb{R}_{+}$, the competition imposed by an individual of trait w onto an individual of trait v,
$μ_{K} \in [0, 1]$, the probability of mutation at a birth event,
$m(v, \cdot) \in \mathcal{M}_{p}(V)$, the law of the trait of a mutant offspring produced by an individual of trait v.

Here, $\mathcal{M}_{p}(V)$ denotes the set of probability measures on V. The parameter K scales the competitive interaction between individuals. It is called carrying capacity and can be interpreted as the environment’s capacity to support life, e.g. through the supply of nutrients or space. The way in which the mutation probability $μ_{K}$ may depend on K is discussed below.

To ensure a limited population size and to establish the connection between the possibility of mutation and the edges of our trait graph, we make the following assumptions on our parameters.

Assumption 1

For every $v \in V$ , $c (v, v) > 0$ . Moreover, $m (v, v) = 0$ , for all $v \in V$ , and $(v, w) \in E$ if and only if $m (v, w) > 0$ .

The evolution of the population over time is described by the Markov process $N^{K}$ with values in $D(\mathbb{R}_{+}, \mathbb{N}^{V})$, where $N_{v}^{K}(t)$ denotes the number of individuals of trait $v \in V$ that are alive at time $t \geq 0$. The process is characterised by its infinitesimal generator:

\begin{aligned} L^{K} ϕ(N) = {} & \sum_{v \in V} \big(ϕ(N + δ_{v}) - ϕ(N)\big) \Big(N_{v} b(v) (1 - μ_{K}) + \sum_{w \in V} N_{w} b(w) μ_{K} m(w, v)\Big) \\ & + \sum_{v \in V} \big(ϕ(N - δ_{v}) - ϕ(N)\big) N_{v} \Big(d(v) + \sum_{w \in V} c^{K}(v, w) N_{w}\Big), \end{aligned}

where $ϕ : \mathbb{N}^{V} \to \mathbb{R}$ is measurable and bounded and $δ_{v}$ denotes the unit vector at $v \in V$. The process can be constructed algorithmically following a Gillespie algorithm (Gillespie 1976). Alternatively, the process can be represented via Poisson measures [see Fournier and Méléard (2004)], a representation that is used in the proofs of our results. Throughout this paper, we assume that all processes $N^{K}$, $K \in \mathbb{N}$, are defined on a common probability space. We give an example of a joint construction in the proof of Lemma 14. However, we emphasize that our results do not require any specific dependence or independence between the different processes.
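The Gillespie construction mentioned above can be sketched in a few lines of Python. This is an illustrative implementation under our own conventions, not the authors' code: traits are indexed $0, \dots, n-1$, parameters are passed as plain lists, and the function name `simulate` and its arguments are hypothetical. For each trait $v$ there are two event types, a birth of a $v$-individual (clonal or by mutation from some $w$) and a death of a $v$-individual, with exactly the rates appearing in the generator.

```python
import random

def simulate(N0, b, d, c, mu, m, K, t_max, seed=0):
    """Gillespie simulation of the birth-death-mutation process N^K (sketch).

    N0   -- initial counts per trait, traits indexed 0..n-1
    b, d -- birth and natural death rates per trait
    c    -- competition matrix, c[v][w] = c(v, w) (divided by K below)
    mu   -- mutation probability mu_K at a birth event
    m    -- mutation kernel, m[w][v] = m(w, v)
    Returns (time, counts) at t_max, or earlier if the population dies out.
    """
    rng = random.Random(seed)
    N = list(N0)
    n = len(N)
    t = 0.0
    while True:
        rates, moves = [], []
        for v in range(n):
            # clonal births of v plus mutant births from any w landing on v
            birth = N[v] * b[v] * (1 - mu) + sum(
                N[w] * b[w] * mu * m[w][v] for w in range(n))
            # natural death plus logistic competition c(v, w) / K
            death = N[v] * (d[v] + sum(c[v][w] / K * N[w] for w in range(n)))
            rates += [birth, death]
            moves += [(v, +1), (v, -1)]
        total = sum(rates)
        if total == 0.0:
            return t, N  # no individuals left, hence no further events
        t += rng.expovariate(total)  # exponential waiting time to next event
        if t > t_max:
            return t_max, N
        v, delta = rng.choices(moves, weights=rates)[0]
        N[v] += delta
```

With $μ_K = 0$ and a single living trait $v$, the count fluctuates around $K (b(v) - d(v)) / c(v, v)$. The rates are recomputed for all traits at every event, which costs $O(|V|^2)$ per step and is only meant for small graphs.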

We want to study the typical behaviour of this process for large populations and moderately rare mutations. The population size is not fixed; however, due to our scaling of $c^{K}(v, w)$, the equilibrium size of the population is always of order K. In this paper, we therefore consider the limit of the processes $(N^{K} / K, K \in \mathbb{N})$ as $K \to \infty$ and $μ_{K} \to 0$ simultaneously.

Outlook: In the following sections, we develop the theory to describe the system's behaviour on various time scales. Since the description on each longer time scale builds on the behaviour on the preceding shorter time scales, we go through these step by step, introducing the relevant notation as well as previous and new results along the way. To give the reader some orientation, we provide a brief overview of the time scales and preview the main results:

During times of order 1, the limiting rescaled stochastic process can be approximated by the solution of deterministic differential equations of Lotka–Volterra type. These describe how the larger subpopulations attain an equilibrium state (if it exists). Since we consider the regime of $μ_{K} \to 0$, mutations cannot be observed on this time scale.
For moderately rare mutations $μ_{K} = K^{-1/α}$, mutations occur on the time scale $1 / (K μ_{K})$ and mutant subpopulations grow from a single individual to a size of order K on the time scale $\ln K ≫ 1 / (K μ_{K})$. The limiting dynamics on the $\ln K$-time scale have been described in Coquille et al. (2021). We provide the heuristics of this result in Sect. 2.2 and give the precise statement in Sect. A.2. On this time scale, the system evolves until it reaches an equilibrium state where there are no fit mutant traits at (graph) distance at most $α$ from the resident traits. This state is what we call an evolutionary stable condition (ESC).
In Sect. 2.3, we discuss how the process can escape an ESC on a longer time scale $1 / (K μ_{K}^{L})$, which corresponds to the distance $L > α$ of the closest fit mutant. Our first result, Theorem 7, states that the time to produce a new fit mutant outside of the ESC is of order $1 / (K μ_{K}^{L})$ and approximately exponentially distributed with a rate that can be calculated precisely. It moreover states the probabilities to produce specific mutant types. Corollary 8 deduces that the time to reach a new ESC has the same distribution as the time of leaving the old ESC and calculates transition probabilities to reach specific new ESCs. These single transitions between ESCs, which can be regarded as metastable transitions, are used to define the (directed) metastability graph $G_{ESC}$ in Definition 9, at the beginning of Sect. 2.4. It consists of subsets of V that allow for an ESC and the possible transitions between them.
Since the time scales on which transitions on the metastability graph occur depend on the distances L between fit mutants and current resident traits, the corresponding jump chain (characterised in Corollary 10) cannot be obtained as a limiting process on a single time scale. Instead, if we fix a time scale $1 / (K μ_{K}^{L})$, only transitions of this precise distance L are visible in the limit of $N^{K} / K$ as $K \to \infty$. Shorter jumps occur immediately and longer jumps cannot be observed. To describe these dynamics, we introduce an L-scale graph $G^{L}$, consisting of all ESCs that are not left immediately on the time scale $1 / (K μ_{K}^{L})$, and characterise the limiting jump process on this graph in Theorem 11.

Short-term dynamics and frequent mutations

A law of large numbers result by Ethier and Kurtz (1986) states that, for $μ_{K} \equiv 0$, the rescaled processes $N^{K} / K$ converge to the solution of a system of Lotka–Volterra equations. The study of these equations is central to determining the short-term evolution, i.e. the evolution on a finite time scale, of the process $N^{K}$.

Definition 1

(Lotka–Volterra system, equilibrium states, invasion fitness) For a subset $\mathbf{v} \subset V$ we denote by $\mathrm{LVS}(\mathbf{v})$ the system of Lotka–Volterra equations given by

\begin{matrix} \frac{d}{dt} n_{v}(t) = \Big(b(v) - d(v) - \sum_{w \in \mathbf{v}} c(v, w)\, n_{w}(t)\Big) n_{v}(t), \quad v \in \mathbf{v},\ t \geq 0. \end{matrix}

By $\mathrm{LVE}(\mathbf{v})$, we denote the set of all equilibrium states $\bar{n} \in \mathbb{R}_{\geq 0}^{\mathbf{v}}$ such that

\begin{matrix} \Big(b(v) - d(v) - \sum_{w \in \mathbf{v}} c(v, w)\, \bar{n}_{w}\Big) \bar{n}_{v} = 0, \quad v \in \mathbf{v}, \end{matrix}

and by $\mathrm{LVE}_{+}(\mathbf{v}) := \mathrm{LVE}(\mathbf{v}) \cap \mathbb{R}_{>0}^{\mathbf{v}}$ the subset of positive equilibrium states. If $\mathrm{LVE}_{+}(\mathbf{v})$ consists of a single globally asymptotically stable element, we denote it by $\bar{n}(\mathbf{v})$ and call it the coexistence equilibrium.

For a trait $w \in V$ and a coexistence equilibrium $\bar{n}(\mathbf{v})$, we denote by

\begin{matrix} f(w, \mathbf{v}) = b(w) - d(w) - \sum_{v \in \mathbf{v}} c(w, v)\, \bar{n}_{v}(\mathbf{v}) \end{matrix}

the invasion fitness of $w$. For a given equilibrium $\bar{n}(\mathbf{v})$, we call a trait $w$ fit if $f(w, \mathbf{v}) > 0$ and unfit if $f(w, \mathbf{v}) < 0$.

Note that the invasion fitness $f(w, \mathbf{v})$ describes the approximate growth rate of a small population of trait $w$ in a bulk population of coexisting traits $\mathbf{v}$ in the mutation-free system. To simplify notation, in the case of monomorphic equilibria, i.e. $\mathbf{v} = \{v\}$, we write

\begin{matrix} \bar{n}(v) := \bar{n}_{v}(\{v\}) \quad \text{and} \quad f(w, v) := f(w, \{v\}). \end{matrix}
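For small trait sets, $\bar{n}(\mathbf{v})$ and $f(w, \mathbf{v})$ can be computed directly: a positive solution of the linear system $\sum_{w \in \mathbf{v}} c(v, w) \bar{n}_{w} = b(v) - d(v)$, $v \in \mathbf{v}$, is an element of $\mathrm{LVE}_{+}(\mathbf{v})$. The Python sketch below (helper names hypothetical) solves this system and evaluates the invasion fitness. It assumes, and does not check, that the resulting interior equilibrium is unique and globally asymptotically stable, as the definition above requires.

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [[float(a) for a in row] + [float(r)] for row, r in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular competition matrix")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def coexistence_equilibrium(vs, b, d, c):
    """Positive solution of sum_w c(v, w) n_w = b(v) - d(v), v in vs, or None.

    Uniqueness and global asymptotic stability are assumed, not checked.
    """
    vs = list(vs)
    try:
        n = solve([[c[v][w] for w in vs] for v in vs], [b[v] - d[v] for v in vs])
    except ValueError:
        return None
    if any(x <= 0 for x in n):
        return None  # no positive equilibrium supported on exactly these traits
    return dict(zip(vs, n))


def invasion_fitness(w, nbar, b, d, c):
    """f(w, vs) = b(w) - d(w) - sum over v in vs of c(w, v) * nbar_v."""
    return b[w] - d[w] - sum(c[w][v] * n_v for v, n_v in nbar.items())
```

For instance, with $b - d \equiv 1$, $c(0, 1) = c(1, 0) = 1/2$ and $c(v, v) = 1$, the monomorphic equilibrium of trait 0 is $\bar{n}(0) = 1$, trait 1 is fit against it with $f(1, 0) = 1/2$, and the pair $\{0, 1\}$ coexists at $\bar{n}_{0} = \bar{n}_{1} = 2/3$.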

Going back to the stochastic process $N^{K}$, it is of interest to study the logarithm of the population size as $K \to \infty$. Only subpopulations with a size of order K are visible in the rescaled limit of $N^{K} / K$, and exponential growth of the absolute population size translates to linear growth of the K-exponent on a logarithmic time scale, via $e^{f t \ln K} = K^{f t}$. This makes it easier to describe the limiting dynamics. We therefore define $β^{K} = (β_{v}^{K})_{v \in V}$, where

\begin{matrix} β_{v}^{K}(t) := \frac{\ln(1 + N_{v}^{K}(t))}{\ln K}, \end{matrix}

which is equivalent to $N_{v}^{K}(t) = K^{β_{v}^{K}(t)} - 1$. Adding 1 inside the logarithm (and, correspondingly, subtracting 1 in the inverse relation) ensures that $β_{v}^{K}(t) = 0$ if and only if $N_{v}^{K}(t) = 0$. As $K \to \infty$, $β_{v}^{K}$ ranges between 0 and 1.
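The change of variables is elementary; the two hypothetical Python helpers below only make the convention "$β = 0$ exactly when $N = 0$" explicit.

```python
import math

def exponent(N, K):
    """beta = ln(1 + N) / ln K, so that beta == 0 exactly when N == 0."""
    return math.log1p(N) / math.log(K)

def size_from_exponent(beta, K):
    """Inverse transformation N = K**beta - 1."""
    return K ** beta - 1
```

For example, with $K = 10^{6}$ a subpopulation of 999 individuals has exponent $1/2$, whereas a subpopulation of size $K/2$ has exponent close to $1 - \ln 2 / \ln K$, which tends to 1 as $K \to \infty$.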

Remark 1

In contrast to Champagnat et al. (2021), Coquille et al. (2021), we do not rescale the time by $ln K$ in this definition of $β^{K}$ since we are studying a variety of different time scales.

Based on this definition, we introduce the following subsets of traits.

Definition 2

(macroscopic, microscopic, living and resident traits)

(i)
A trait $v \in V$ with exponent $β_{v}^{K}$ is called macroscopic if $\liminf_{K \to \infty} β_{v}^{K} = 1$.
(ii)
A trait that is not macroscopic is called microscopic.
(iii)
The set of living traits is the set $V_{\mathrm{living}}^{K} := \{v \in V : β_{v}^{K} > 0\}$.
(iv)
A subset of traits $\mathbf{v} \subseteq V$ is called resident if all $v \in \mathbf{v}$ are macroscopic and have a population size close to the coexistence equilibrium $\bar{n}(\mathbf{v})$.

Remark 2

Note that these definitions are time dependent when considering an evolving population. The macroscopic traits change according to $β^{K} (t)$ and the varying subset of living traits is denoted by $V_{living}^{K} (t)$ . Most of the time macroscopic and resident traits coincide. A non-resident macroscopic trait is either unfit and will shrink to an order lower than K within a short time, or it is fit and will therefore induce a change in resident traits according to the short-term Lotka–Volterra dynamics.

To study multi-step mutations we consider paths on the trait graph $G = (V, E)$ .

Definition 3

(paths and distances) We denote a (finite) path on $G = (V, E)$ by $γ = (γ_{0}, \dots, γ_{ℓ})$ such that $γ_{i} \in V$ , $0 \leq i \leq ℓ$ , and $(γ_{i}, γ_{i + 1}) \in E$ , $0 \leq i \leq ℓ - 1$ .

The length of a path $γ = (γ_{0}, \dots, γ_{ℓ})$ is defined as $|γ| = ℓ$. We write $γ : \mathbf{v} \to \mathbf{v}'$ as a short notation for all paths $γ$ that connect $\mathbf{v} \subset V$ to $\mathbf{v}' \subset V$, i.e. that satisfy $γ_{0} \in \mathbf{v}$ and $γ_{|γ|} \in \mathbf{v}'$.

We introduce the graph distance between two vertices $v, w \in V$ as the length of the shortest connecting path

\begin{matrix} d(v, w) := \min_{γ : v \to w} |γ|, \end{matrix}

where the minimum over an empty set is taken to be $\infty$. For two subsets $\mathbf{v}, \mathbf{v}' \subset V$ we define

\begin{matrix} d(\mathbf{v}, \mathbf{v}') := \min_{v \in \mathbf{v},\, v' \in \mathbf{v}'} d(v, v'). \end{matrix}

Remark 3

Note that $d(v, w)$ is not a metric in the classical sense, since it need not be symmetric for a directed graph.
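Because E consists of directed edges, $d(v, w)$ is obtained by a breadth-first search along outgoing edges from $v$. The following Python sketch (function names hypothetical, the graph given as a list of edge pairs) also illustrates the asymmetry: on the cycle $0 \to 1 \to 2 \to 0$ one has $d(0, 2) = 2$ but $d(2, 0) = 1$.

```python
import math
from collections import deque

def trait_distance(edges, v, w):
    """Graph distance d(v, w) on a directed trait graph via BFS.

    Returns math.inf if there is no directed path from v to w.
    """
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    dist = {v: 0}  # the path of length 0 from v to itself
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for x in succ.get(u, []):
            if x not in dist:  # first visit yields the shortest path length
                dist[x] = dist[u] + 1
                queue.append(x)
    return dist.get(w, math.inf)

def set_distance(edges, vs, ws):
    """d(vs, ws): minimum of d(v, w) over v in vs and w in ws."""
    return min(trait_distance(edges, v, w) for v in vs for w in ws)
```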

Along these paths $γ$, mutants can be produced. A macroscopic trait produces subpopulations of a size of order $K μ_{K}$ of its neighbouring traits, which then produce subpopulations of a size of order $K μ_{K}^{2}$ of the second-order neighbours, and so on. These subpopulations, which are produced along a path $γ$ of length $ℓ$, can survive as long as $K μ_{K}^{ℓ} ≫ 1$. This motivates the study of mutation probabilities $μ_{K} = K^{-1/α}$, $α > 0$, for which mutants can survive within a radius $α$ of the resident traits.
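The survival radius follows from a one-line computation; the value $α = 2.5$ used afterwards is chosen only for illustration.

\begin{aligned} K μ_{K}^{ℓ} = K \cdot K^{-ℓ/α} = K^{1 - ℓ/α} \xrightarrow[K \to \infty]{} \begin{cases} \infty, & ℓ < α, \\ 0, & ℓ > α. \end{cases} \end{aligned}

For $α = 2.5$, the subpopulations along a path are of order $K^{0.6}$, $K^{0.2}$ and $K^{-0.2}$ for $ℓ = 1, 2, 3$: second-order neighbours are still present in large numbers, whereas third-order neighbours are typically absent.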

Remark 4

We could also study mutation probabilities $μ_{K} = f(K) K^{-1/α}$ such that $|\ln f(K)| \in o(\ln K)$. This would not change the following results. However, we restrict ourselves to the case $f(K) \equiv 1$ to simplify notation.

To avoid mutant subpopulations with a size of order $K^{0} = 1$ and to ensure that non-resident traits are always either fit or unfit we make the following assumptions.

Assumption 2

(i)
The mutation probability satisfies $μ_{K} = K^{-1/α}$ for some $α \in \mathbb{R}_{+} \setminus \mathbb{N}$.
(ii)
For each $\mathbf{v} \subset V$ such that $\mathrm{LVE}_{+}(\mathbf{v}) = \{\bar{n}(\mathbf{v})\}$, it holds that $f(w, \mathbf{v}) \neq 0$ for all $w \notin \mathbf{v}$.

Remark 5

Both of these assumptions are purely technical. The first one prevents the case where a fit mutant population of size of order 1 can die out due to stochastic fluctuations, so that fixation in the population becomes random. The second one allows us to approximate non-resident subpopulations by branching processes that are either super- or subcritical, but not critical. Note that the second assumption is only required for subsets $\mathbf{v}$ that allow for a unique positive equilibrium state (i.e. such that $\mathrm{LVE}_{+}(\mathbf{v})$ contains exactly one element).

Under these assumptions, the evolution of the population on the time scale $\ln K$ has been studied in Coquille et al. (2021). The authors give an algorithmic description of the limiting evolution of $β^{K}(t \ln K)$ as long as there always exists a unique asymptotically stable equilibrium of the Lotka–Volterra system (2) involving all macroscopic traits. In the following, we give the heuristics of this description. For the precise result we refer to Sect. A.2.

Roughly speaking, for a given set of resident traits $\mathbf{v}$ at their (coexistence) equilibrium $\bar{n}(\mathbf{v})$, every living microscopic trait $w \in V_{\mathrm{living}}$ grows (or shrinks) at a rate of at least $f(w, \mathbf{v})$. This is due to the fact that the competitive interaction with all microscopic traits can be neglected in comparison with this rate. If there were no mutation (i.e. $μ_{K} = 0$), $f(w, \mathbf{v})$ would be the exact growth rate of $w$. However, due to incoming mutants from neighbouring traits, the population size of $w$ is also at least as large as a $μ_{K}$-fraction of the population sizes of its (incoming) neighbours. Since we only consider the order of the population size $β_{w}^{K}$, the largest of these influences dominates the asymptotics, and a sum of population sizes (coming from different mutation sources) yields a maximum in the exponent. Overall, we obtain the relation

\begin{matrix} β_{w}^{K}(t \ln K) \approx \big(β_{w}^{K}(0) + t f(w, \mathbf{v})\big) \lor \max_{u \in V : d(u, w) = 1} \Big(β_{u}^{K}(t \ln K) - \frac{1}{α}\Big). \end{matrix}

Iterating this argument for traits at increasing distance to w yields that, as long as the resident traits remain unchanged (i.e. traits $\mathbf{v}$ stay close to their equilibrium $\bar{n}(\mathbf{v})$ and no new traits become macroscopic), $β^{K}(t \ln K)$ converges to $β(t)$ such that

\begin{matrix} β_{w}(t) = \max_{u \in V} \Big[β_{u}(0) + (t - t_{u}) f(u, \mathbf{v}) - \frac{d(u, w)}{α}\Big]_{+}. \end{matrix}

Here,

\begin{matrix} t_{u} := \begin{cases} \inf\{s \geq 0 : \exists u' \in V : β_{u'}(s) = \frac{1}{α},\ (u', u) \in E\} & \text{if } β_{u}(0) = 0, \\ 0 & \text{if } β_{u}(0) > 0. \end{cases} \end{matrix}

Once a formerly microscopic trait $w^{*}$ becomes macroscopic, the population sizes of $\mathbf{v} \cup \{w^{*}\}$ follow the Lotka–Volterra dynamics of (2) to reach a new equilibrium associated to the resident traits $\mathbf{v}' \subset \mathbf{v} \cup \{w^{*}\}$ within a time of order 1 (if such a new unique equilibrium does not exist, or in a number of other technical special cases, the algorithm terminates as described in Sect. A.2). During this phase, the orders of population sizes $β_{w}$ do not change significantly. After the change of resident traits, the population sizes again follow (10), now with the changed fitnesses $f(u, \mathbf{v}')$.

This algorithmic description yields a series of successive sets of resident traits. The macroscopically visible evolution stops as soon as a resident set $\mathbf{v}$ is reached such that $f(w, \mathbf{v}) < 0$ for all $w \in V_{\mathrm{living}} \setminus \mathbf{v}$. All traits $w \in V$ such that $d(\mathbf{v}, w) < α$ stay alive due to incoming mutations, but all other traits eventually go extinct according to (10) on the $\ln K$-time scale.
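The relation for $β_{w}^{K}(t \ln K)$ above suggests a simple explicit time-stepping scheme for the exponents while the resident set $\mathbf{v}$ is fixed: each living trait grows or shrinks at rate $f(w, \mathbf{v})$, and each mutation step costs $1/α$ in the exponent. The Python sketch below is our own discretisation (step size `dt`, all names hypothetical), not the algorithm of Coquille et al. (2021). It assumes consistent initial exponents, i.e. $β_{w}(0) \geq β_{u}(0) - 1/α$ for $(u, w) \in E$, and stops as soon as a non-resident exponent reaches 1, i.e. when a new trait becomes macroscopic and the Lotka–Volterra phase takes over.

```python
def heuristic_exponents(in_nbrs, fitness, residents, beta0, alpha,
                        dt=1e-3, t_max=10.0):
    """Explicit Euler sketch of the exponent dynamics for fixed residents.

    in_nbrs[w] -- traits u with (u, w) in E (incoming mutation sources)
    fitness[w] -- invasion fitness f(w, v) of every non-resident trait w
    residents  -- the fixed resident set v (exponents pinned to 1)
    beta0      -- initial exponents of all traits (assumed consistent)
    Time is measured in units of ln K. Returns (t, beta) at the first time
    a non-resident exponent reaches 1, or (None, beta) at t_max.
    """
    beta = dict(beta0)
    for r in residents:
        beta[r] = 1.0
    t = 0.0
    while t < t_max:
        new = {}
        for w, b_w in beta.items():
            if w in residents:
                new[w] = 1.0
                continue
            # own exponential growth or decay, only for currently living traits
            own = b_w + dt * fitness[w] if b_w > 0 else 0.0
            # incoming mutants: each mutation step costs 1/alpha in the exponent
            mut = max((beta[u] - 1.0 / alpha for u in in_nbrs.get(w, [])),
                      default=0.0)
            new[w] = min(1.0, max(own, mut, 0.0))
        beta = new
        t += dt
        if any(beta[w] >= 1.0 for w in beta if w not in residents):
            return t, beta
    return None, beta
```

On the path $0 \to 1 \to 2$ with resident 0, $α = 2.5$ and initial exponents $(1, 0.6, 0.2)$, a fit trait 2 with $f(2, 0) = 1/2$ reaches exponent 1 after a time of about $(1 - 0.2)/0.5 = 1.6$ (in units of $\ln K$). If instead all non-resident traits are unfit, exponents starting above $(1 - d(0, w)/α)_{+}$ relax to these values.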

This observation leads us to the following definitions (visualised in Fig. 1).

Definition 4

(mutation spreading neighbourhood) For a subset $\mathbf{v} \subset V$, we denote by $V_{α}(\mathbf{v}) := \{w \in V : d(\mathbf{v}, w) < α\}$ the mutation spreading neighbourhood of $\mathbf{v}$. The traits at the boundary of $V_{α}(\mathbf{v})$ are denoted by $\partial V_{α}(\mathbf{v}) := \{w \in V : d(\mathbf{v}, w) = \lfloor α \rfloor\}$.
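Since $d(\mathbf{v}, w)$ is a minimum over all starting points in $\mathbf{v}$, both sets in Definition 4 follow from a single breadth-first search started simultaneously from all traits of $\mathbf{v}$. A Python sketch, with hypothetical function names and the graph given as a list of directed edges:

```python
import math
from collections import deque

def distances_from_set(edges, vs):
    """Multi-source BFS on a directed graph: w -> d(vs, w) for reachable w."""
    succ = {}
    for u, w in edges:
        succ.setdefault(u, []).append(w)
    dist = {v: 0 for v in vs}
    queue = deque(vs)
    while queue:
        u = queue.popleft()
        for w in succ.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def spreading_neighbourhood(edges, vs, alpha):
    """Return (V_alpha(vs), boundary of V_alpha(vs)) as in Definition 4."""
    dist = distances_from_set(edges, vs)
    inner = {w for w, k in dist.items() if k < alpha}
    boundary = {w for w, k in dist.items() if k == math.floor(alpha)}
    return inner, boundary
```

Since $α \notin \mathbb{N}$, we have $\lfloor α \rfloor < α$, so the boundary is always contained in $V_{α}(\mathbf{v})$: on the path $0 \to 1 \to 2 \to 3$ with $α = 2.5$, $V_{α}(\{0\}) = \{0, 1, 2\}$ and $\partial V_{α}(\{0\}) = \{2\}$.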

Definition 5

((asymptotic) evolutionary stable condition)

(i)
A subset $v \subset V$ and (orders of) population sizes $β$ are called an evolutionary stable condition (ESC) if the traits $v$ can coexist at a unique globally asympotically stable equilibrium $\bar{n} (v)$ ,
$\begin{matrix} f (w, v) < 0, \forall w \in V_{α} (v) \ v, \end{matrix}$ 12
and
$\begin{matrix} β_{w} = {(1 - \frac{d (v, w)}{α})}_{+}, \forall w \in V . \end{matrix}$ 13
(ii)
A subset $v \subset V$ and population sizes $(β^{K})_{K \geq 0}$ are called an asymptotic evolutionary stable condition if the traits $v$ can coexist at equilibrium $\bar{n}(v)$, (12) is satisfied,
$$\big|β_{w}^{K} - (1 - d(v, w)/α)\big| \in O\Big(\frac{1}{\ln K}\Big) \quad \forall w \in V_{α}(v), \tag{14}$$
and there exists a $K_{0} < \infty$ such that $β_{w}^{K} = 0$ for all $K > K_{0}$ and $w \in V \setminus V_{α}(v)$.
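As an illustration of (12) and (13), the sketch below takes the distances $d(v, w)$ as a precomputed dictionary (unreachable traits are simply absent) together with the invasion fitnesses $f(w, v)$, and returns the orders $β_{w}$ and whether (12) holds. It deliberately does not check the coexistence and stability requirement on $\bar{n}(v)$; the names and numbers are hypothetical.

```python
def esc_orders(dist, alpha, traits):
    """Orders beta_w = (1 - d(v, w)/alpha)_+ from (13); unreachable traits get 0."""
    return {w: max(0.0, 1 - dist[w] / alpha) if w in dist else 0.0 for w in traits}


def satisfies_condition_12(residents, dist, alpha, invasion_fitness):
    """Condition (12): f(w, v) < 0 for every non-resident w with d(v, w) < alpha."""
    return all(invasion_fitness[w] < 0
               for w, k in dist.items()
               if k < alpha and w not in residents)


# Hypothetical data for v = {0}: distances d(v, w) and invasion fitnesses f(w, v).
dist = {0: 0, 1: 1, 2: 2, 3: 3}
fitness = {1: -1.0, 2: -0.5, 3: 2.0}
beta = esc_orders(dist, alpha=2.5, traits=[0, 1, 2, 3, 5])
ok = satisfies_condition_12({0}, dist, 2.5, fitness)
# trait 3 is fit but lies outside V_alpha(v), so it does not violate (12)
```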

Remark 6

(i)
Note that (12) is only a necessary condition for a subset $v \subset V$ to attain an ESC during the evolution of a population. The values in (13) are the orders of population sizes at which unfit traits stabilise purely due to (multi-step) mutations from $v$, and (12) guarantees that these orders are reached for $w \in V_{α}(v)$. To attain an ESC $(v, β)$, all other traits $w \in V_{living}(τ_{v})$ that are alive at the time $τ_{v}$ when the new equilibrium $\bar{n}(v)$ is reached must in addition satisfy $f(w, v) < 0$. If this is the case, all traits outside of $V_{α}(v)$ die out within a time of order $\ln K$ and (13) is reached. Otherwise, if there is a $w \in V_{living}(τ_{v}) \setminus V_{α}(v)$ such that $f(w, v) > 0$ (the case $f(w, v) = 0$ is excluded by Assumption 2), its subpopulation is able to grow and does not die out, so that (13) is not satisfied. The characterisation of ESCs therefore depends strongly on the state of the whole system.
(ii)
Note that the definition of an asymptotic ESC forces the population process to be in an ESC up to a multiplicative error of order one, that is,
$$N_{w}^{K} = \big(K^{(1 - d(v, w)/α)_{+}} - 1\big) \cdot O(1). \tag{15}$$
The reason for introducing this error is that, for finite $K$, $N_{w}^{K}$ might never reach exactly $K^{(1 - d(v, w)/α)_{+}}$. This is for example the case if $\bar{n}_{u}(v) < 1$ for some $u \in v$.
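To see why a multiplicative error of order one in (15) corresponds to the additive error $O(1/\ln K)$ in (14), assume (this convention is an assumption here, as it is not restated in this section) that orders of population size are measured by $β_{w}^{K} = \ln(1 + N_{w}^{K}) / \ln K$, and write $N_{w}^{K} = C (K^{β_{w}} - 1)$ with $C$ bounded away from $0$ and $\infty$. Then

$$\begin{aligned} 1 + N_{w}^{K} &= C K^{β_{w}} + (1 - C), \\ β_{w}^{K} = \frac{\ln(1 + N_{w}^{K})}{\ln K} &= β_{w} + \frac{\ln\big(C + (1 - C) K^{-β_{w}}\big)}{\ln K}. \end{aligned}$$

For $β_{w} > 0$ the term $K^{-β_{w}}$ vanishes as $K \to \infty$, so the numerator of the last fraction tends to $\ln C$ and stays bounded; hence $|β_{w}^{K} - β_{w}| \in O(1/\ln K)$.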

By definition, an evolutionary stable condition is surrounded by unfit traits, at least within an $α$ -radius. This form of a fitness landscape is referred to as a fitness valley and has been studied in a special case in Bovier et al. (2019). Based on this, we introduce a measure for the stability of a coexistence equilibrium, connected to the width of the surrounding fitness valley.

Definition 6

(Stability degree) For a subset $v \subseteq V$ we define its stability degree $L (v)$ by

$$L(v) := \begin{cases} \min_{w \in V :\, f(w, v) > 0} d(v, w) & \text{if } v \text{ can coexist}, \\ 0 & \text{otherwise}, \end{cases}$$
with the convention $\min \emptyset := \infty$.

Remark 7

A subset $v$ associated with an ESC satisfies $L(v) > α$ by definition. The evolution of the population process reaches a final state, independently of the time scale, once the resident traits satisfy $L(v) = \infty$, i.e. once no fit trait can be reached from them.

Transitioning out of an ESC and first convergence result

Once an ESC is attained, there is no further evolution on the $\ln K$-time scale. However, as long as there is a fitter trait that is connected to the resident traits, i.e. that can be reached along a finite path in $G$, we can observe metastable transitions on an even more accelerated time scale. On this time scale, under certain assumptions on the $\ln K$-dynamics, we observe a direct transition from one ESC to another.

In the following, we consider one such transition, starting from an arbitrary initial asymptotic ESC. We split the transition into two phases. In the first phase, a new fit mutant beyond the fitness valley fixates in the population within a time of order $1 / K μ_{K}^{L(v)}$. In the second phase, starting from these new initial conditions, a new ESC is attained within a time of order $\ln K$. We assume that $v$ and $(β^{K}(0))_{K \geq 0}$ are an asymptotic ESC. We could also consider more general initial conditions that lead to an ESC within finitely many steps of the $\ln K$-algorithm of Coquille et al. (2021), see Remark 11. For the sake of simpler notation, we stick with the assumption of starting in an (asymptotic) ESC here.

To consider the first phase of the transition, we introduce the set

$$V_{mut}(v) := \mathop{\mathrm{arg\,min}}_{w \in V :\, f(w, v) > 0} d(v, w) = \{w \in V : f(w, v) > 0,\ d(v, w) = L(v)\}.$$

This consists of all fit mutant traits that are closest to $v$ (visualised in Fig. 1).
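The quantities $L(v)$ and $V_{mut}(v)$ only require shortest directed distances and the signs of the invasion fitnesses, so a single breadth-first search suffices. The Python sketch below assumes that $d(v, w)$ is the shortest directed path length from $v$ to $w$ and that $v$ can coexist (the case $L(v) = 0$ is not handled). The toy graph only mimics the shape of Example 1; the individual fitnesses are made up and, under equal competition, give $f(w, \{0\}) = r(w) - r(0)$.

```python
from collections import deque
from math import inf


def stability_degree(residents, edges, invasion_fitness):
    """Return (L(v), V_mut(v)) for a resident set v that can coexist.

    invasion_fitness maps each trait w to f(w, v); residents have f = 0.
    Traits that cannot be reached from v have d(v, w) = inf and never count.
    """
    successors = {}
    for a, c in edges:
        successors.setdefault(a, []).append(c)
    dist = {u: 0 for u in residents}
    queue = deque(dist)
    while queue:  # breadth-first search gives shortest directed distances
        a = queue.popleft()
        for c in successors.get(a, []):
            if c not in dist:
                dist[c] = dist[a] + 1
                queue.append(c)
    fit = {w: k for w, k in dist.items() if invasion_fitness.get(w, 0.0) > 0}
    if not fit:
        return inf, set()  # min over the empty set: no fit trait is connected
    L = min(fit.values())
    return L, {w for w, k in fit.items() if k == L}


# Hypothetical graph and fitnesses, shaped like Example 1 (not the values of Fig. 3).
edges = [('0', '1a'), ('1a', '2a'), ('0', '1b'), ('1b', '2b'), ('1b', '2c'),
         ('2b', '4'), ('2c', '3c'), ('3c', '4')]
r = {'0': 1.0, '1a': 0.5, '1b': 0.2, '2a': 1.5, '2b': 1.2, '2c': 1.1, '3c': 1.3, '4': 2.0}
f = {w: r[w] - r['0'] for w in r}
L, v_mut = stability_degree({'0'}, edges, f)
# L == 2 and v_mut == {'2a', '2b', '2c'}
```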

Note that $V_{mut} (v) \cap V_{α} (v) = \emptyset$ by the definition of an ESC. It turns out that the traits $V_{mut} (v)$ are the only ones that need to be considered for a crossing of the fitness valley since one of them will be the first new trait to fixate in the equilibrium population. If $V_{mut} (v) = \emptyset$ , i.e. $L (v) = \infty$ , there is no fitter trait connected to the resident traits and the equilibrium associated to $v$ is the final state of the population.

For $L (v) < \infty$ , we define the stopping time

$$T_{fix}^{K} := \inf\big\{t \geq 0 : \exists w \in V \setminus V_{α}(v) : β_{w}^{K}(t) \geq \tfrac{1}{α}\big\},$$

the first time at which a new trait reaches a size of order $K^{1/α}$ and can thus (recall $μ_{K} = K^{-1/α}$, so that $K^{1/α} μ_{K} = 1$) produce neighbouring mutants within a time of order 1 and influence the subpopulations of other traits.

Remark 8

Note that the name $T_{fix}^{K}$ might be a little misleading at first glance. Generally, we speak of the fixation of a trait within a population as the event that the subpopulation corresponding to this trait does not go extinct (due to random fluctuations or negative fitness), as long as the fitness landscape stays unchanged. As this event is determined by the future progression of the population, there is no precise time point to pin it to. In particular, whether a trait fixates or goes extinct cannot be foreseen at the time when the first individual of this trait arises. Therefore, we instead choose the time at which the subpopulation has reached a size that guarantees non-extinction with probability 1, asymptotically as $K \to \infty$. We could choose a much smaller size than $K^{1/α}$ for this; however, this would not influence the event of fixation and would only change the stopping time by a time of order $\ln K$, which is negligible compared to the much longer time scale on which mutants arise. We thus pick the first time at which mutants can influence the population sizes of other traits.

Our first result describes the limiting distribution of this stopping time $T_{fix}^{K}$ .

For a path $γ : v \to V_{mut}(v)$ with $|γ| = L(v)$, the rate at which a mutant population of trait $w = γ_{L(v)}$ arises along $γ$ and fixates can be derived as the product of several factors. The rate at which the first trait in $γ$ outside of $V_{α}(v)$ arises can be calculated in terms of the equilibrium population sizes of the traits in $V_{α}(v)$ (see Sect. 4.1). This rate then has to be multiplied by the probabilities that each of the subsequent unfit traits on the path $γ$ produces a mutant of the next trait before going extinct, i.e. during a small subcritical excursion. This yields the rate at which single mutants of trait $w$ arise, which finally has to be multiplied by their probability of fixating in the population, i.e. of non-extinction.

In order to calculate the probability of mutation during a subcritical excursion, we need to introduce some notation. For a subset $v \subset V$ and a trait $w \in V$ we define

$$ρ(w, v) := \frac{b(w)}{b(w) + d(w) + \sum_{u \in v} c(w, u)\, \bar{n}_{u}(v)},$$

which is connected to the probability of a birth event in the branching process approximating the growth of a mutant w in a bulk population of coexisting traits $v$ . Moreover, we let

$$λ(ρ) := \sum_{ℓ = 1}^{\infty} \frac{(2ℓ)!}{(ℓ - 1)!\,(ℓ + 1)!}\, ρ^{ℓ} (1 - ρ)^{ℓ + 1},$$

which is the expected number of birth events before extinction in a subcritical birth death process with birth probability $ρ$ (related to the expected number of positive jumps of a simple random walk on $\mathbb{N}$ before hitting 0, as explained in the proof of this result in Sect. A, Lemma 17). Note that, for $ρ \in [0, 1/2)$, one can explicitly calculate $λ(ρ) = ρ / (1 - 2ρ) < \infty$. Moreover, the symmetry relation $λ(ρ) ρ = λ(1 - ρ)(1 - ρ)$ shows that the series also converges for $ρ \in (1/2, 1]$.
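The closed form and the symmetry relation can be checked numerically. The sketch below sums a truncated version of the series, computing consecutive coefficients $(2ℓ)!/((ℓ-1)!(ℓ+1)!)$ through their ratio to avoid huge factorials; the truncation length is an arbitrary choice that is ample for $ρ$ away from the critical value $1/2$, where the series diverges.

```python
def lam_series(rho, n_terms=5000):
    """Truncated series lambda(rho) = sum_{l>=1} (2l)!/((l-1)!(l+1)!) rho^l (1-rho)^(l+1)."""
    term = rho * (1 - rho) ** 2  # l = 1: the coefficient 2!/(0! 2!) equals 1
    total = 0.0
    for l in range(1, n_terms + 1):
        total += term
        # ratio of consecutive coefficients: (2l+2)(2l+1) / (l (l+2))
        term *= (2 * l + 2) * (2 * l + 1) / (l * (l + 2)) * rho * (1 - rho)
    return total


def lam_closed(rho):
    """Closed form lambda(rho) = rho / (1 - 2 rho), valid for rho in [0, 1/2)."""
    return rho / (1 - 2 * rho)


print(lam_series(0.3), lam_closed(0.3))              # both close to 0.75
print(0.7 * lam_series(0.7), 0.3 * lam_series(0.3))  # equal by the symmetry relation
```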

With these definitions, the overall rate of mutants of trait $w \in V_{mut} (v)$ arising along path $γ$ and fixating in the population is approximately equal to $R (v, γ) K μ_{K}^{L (v)}$ , where

$$\begin{aligned} R(v, γ) := {} & \bar{n}_{γ_{0}}(v) \Bigg(\prod_{i = 1}^{⌊α⌋} \frac{b(γ_{i-1})\, m(γ_{i-1}, γ_{i})}{|f(γ_{i}, v)|}\Bigg)\, b(γ_{⌊α⌋})\, m(γ_{⌊α⌋}, γ_{⌊α⌋+1}) \\ & \cdot \Bigg(\prod_{j = ⌊α⌋+1}^{L(v)-1} λ\big(ρ(γ_{j}, v)\big)\, m(γ_{j}, γ_{j+1})\Bigg) \cdot \frac{f(γ_{L(v)}, v)}{b(γ_{L(v)})}. \end{aligned}$$

Here, the first line is the rate at which the first trait in $γ$ outside of $V_{α} (v)$ arises, which is related to the equilibrium size of trait $γ_{⌊ α ⌋}$ . The first factor in the second line is the probability of producing consecutive mutants during subcritical excursions and the last factor is the fixation probability of trait $w = γ_{L (v)}$ . Note that, as $b (γ_{L (v)})$ increases, so does $f (γ_{L (v)}, v)$ (cf. (4)), and hence this fixation probability is in fact increasing in the birth rate $b (γ_{L (v)})$ .
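The following Python sketch evaluates $R(v, γ)$ factor by factor for a single path. It assumes the logistic invasion fitness $f(w, v) = b(w) - d(w) - \sum_{u \in v} c(w, u) \bar{n}_{u}(v)$, so that the unfit intermediate traits (at distance less than $L(v)$ from $v$) have $ρ(w, v) < 1/2$ and $λ$ can be used in its closed form. The argument names and toy parameters are hypothetical.

```python
from math import floor


def rate_along_path(path, alpha, nbar_start, b, death_eff, m):
    """Evaluate R(v, gamma) for a shortest path gamma = (gamma_0, ..., gamma_L) from v.

    death_eff[w] = d(w) + sum_{u in v} c(w, u) nbar_u(v), so rho(w, v) = b[w] / (b[w] + death_eff[w])
    and, by assumption, f(w, v) = b[w] - death_eff[w].
    nbar_start = nbar_{gamma_0}(v); m[(x, y)] is the mutation kernel along the edge (x, y).
    """
    L, k = len(path) - 1, floor(alpha)
    f = lambda w: b[w] - death_eff[w]
    rho = lambda w: b[w] / (b[w] + death_eff[w])
    rate = nbar_start
    # first line: equilibrium sizes inside V_alpha(v), then the first trait outside it
    for i in range(1, k + 1):
        rate *= b[path[i - 1]] * m[(path[i - 1], path[i])] / abs(f(path[i]))
    rate *= b[path[k]] * m[(path[k], path[k + 1])]
    # second line: mutations during subcritical excursions of unfit traits
    for j in range(k + 1, L):
        r = rho(path[j])
        rate *= r / (1 - 2 * r) * m[(path[j], path[j + 1])]  # lambda(rho) in closed form
    # fixation (non-extinction) probability of the fit mutant gamma_L
    return rate * f(path[L]) / b[path[L]]


# Hypothetical path 0 -> 1 -> 2 -> 3 with alpha = 1.5, so floor(alpha) = 1 and L(v) = 3.
b = {0: 1.0, 1: 1.0, 2: 1.0, 3: 2.0}
death_eff = {1: 1.5, 2: 3.0, 3: 1.0}
m = {(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.5}
R_gamma = rate_along_path((0, 1, 2, 3), 1.5, 2.0, b, death_eff, m)  # 2 * 1 * 0.5 * 0.25 * 0.5
```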

The total rate at which a mutant population of trait $w \in V_{mut}(v)$ arises and fixates is obtained by summing over all shortest paths that end in $w$ and is approximately equal to $R(v, w) K μ_{K}^{L(v)}$, where

$$R(v, w) := \sum_{\substack{γ : v \to w \\ |γ| = L(v)}} R(v, γ).$$

Finally, the total rate at which a mutant population of any trait in $V_{mut}(v)$ arises and fixates, i.e. the rate at which the population exits the ESC associated with $v$, is approximately equal to $R(v) K μ_{K}^{L(v)}$, where

$$R(v) := \sum_{w \in V_{mut}(v)} R(v, w). \tag{23}$$

The probability that this population is of trait $w \in V_{mut}(v)$ is proportional to the rate $R(v, w)$, i.e. it is approximately $R(v, w) / R(v)$.
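Summing over shortest paths is a small graph enumeration. The sketch below lists all directed paths of length exactly $L$ from the resident traits to a given trait $w$; when $d(v, w) = L$ these are precisely the shortest paths, since a path of that length with a repeated trait could be shortened, contradicting $d(v, w) = L$. The per-path rates are passed in as a function, here a made-up lookup table rather than actual values of $R(v, γ)$.

```python
def shortest_paths(residents, edges, target, length):
    """All directed paths (gamma_0, ..., gamma_length) with gamma_0 in v and gamma_length = target."""
    successors = {}
    for a, c in edges:
        successors.setdefault(a, []).append(c)
    found = []

    def extend(path):
        if len(path) - 1 == length:
            if path[-1] == target:
                found.append(tuple(path))
            return
        for c in successors.get(path[-1], []):
            extend(path + [c])

    for u in residents:
        extend([u])
    return found


def exit_rates(residents, edges, v_mut, L, path_rate):
    """R(v, w) summed over shortest paths, the total exit rate R(v), and R(v, w) / R(v)."""
    R_w = {w: sum(path_rate(g) for g in shortest_paths(residents, edges, w, L)) for w in v_mut}
    R_total = sum(R_w.values())
    return R_w, R_total, {w: rate / R_total for w, rate in R_w.items()}


# Hypothetical diamond: two shortest paths to '2' and one to '3', with made-up path rates.
edges = [('0', '1a'), ('0', '1b'), ('1a', '2'), ('1b', '2'), ('1a', '3')]
per_path = {('0', '1a', '2'): 1.0, ('0', '1b', '2'): 3.0, ('0', '1a', '3'): 1.0}
R_w, R_total, probs = exit_rates({'0'}, edges, {'2', '3'}, 2, per_path.get)
# R_w == {'2': 4.0, '3': 1.0}, R_total == 5.0 and probs['2'] is approximately 0.8
```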

With these heuristics, we can now state the first main result of this paper.

Theorem 7

Let $G = (V, E)$ be a finite graph. Suppose that Assumptions 1 and 2 are satisfied and consider the model defined by (1) with $μ_{K} = K^{-1/α}$. Assume that $v \subset V$ and $(β^{K}(0))_{K \geq 0}$ are an asymptotic ESC. Then there exist constants $ε_{0} > 0$ and $0 < c < \infty$ such that, for all $0 < ε < ε_{0}$, there exist exponential random variables $E_{+}^{K}(ε)$ and $E_{-}^{K}(ε)$ with parameters $R(v)(1 - cε)$ and $R(v)(1 + cε)$, respectively, such that

$$\liminf_{K \to \infty} P\big(E_{-}^{K}(ε) \leq T_{fix}^{K} K μ_{K}^{L(v)} \leq E_{+}^{K}(ε)\big) \geq 1 - c ε.$$

Moreover, for all $w \in V$ , the probability of w being the trait to trigger $T_{fix}^{K}$ is

$$\lim_{K \to \infty} P\big(β_{w}^{K}(T_{fix}^{K}) = 1/α\big) = \begin{cases} R(v, w) / R(v) & \text{if } w \in V_{mut}(v), \\ 0 & \text{otherwise}. \end{cases} \tag{25}$$
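Theorem 7 suggests a simple way to sample the exit step approximately for large $K$: the rescaled time is close to an exponential variable with rate $R(v)$, and the invading trait is $w$ with probability $R(v, w)/R(v)$. The sketch below realises both marginals with independent exponential clocks of rates $R(v, w)$; using competing clocks for the joint law is a modelling choice here and is not claimed by the theorem. The helper names and rates are hypothetical.

```python
import random


def time_scale(K, alpha, L):
    """Order 1 / (K mu_K^L) of the exit time when mu_K = K^(-1/alpha), i.e. K^(L/alpha - 1)."""
    return K ** (L / alpha - 1)


def sample_exit(R_w, K, alpha, L, rng):
    """Draw (exit time, invading trait) from independent exponential clocks with rates R(v, w).

    The minimum of the clocks is Exp(R(v)) and trait w wins with probability R(v, w) / R(v).
    """
    clocks = {w: rng.expovariate(rate) for w, rate in R_w.items() if rate > 0}
    winner = min(clocks, key=clocks.get)
    return clocks[winner] * time_scale(K, alpha, L), winner


rng = random.Random(1)
draws = [sample_exit({'a': 1.0, 'b': 3.0}, 10**6, 1.5, 2, rng) for _ in range(20000)]
mean_time = sum(t for t, _ in draws) / len(draws)       # close to 100 / 4 = 25
share_b = sum(w == 'b' for _, w in draws) / len(draws)  # close to 3 / 4
```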

Remark 9

Note that traits $w \in V_{α}(v)$ do not attain $β_{w}^{K} = 1 / α$ before $T_{fix}^{K}$ due to the assumption that $α \notin \mathbb{N}$: the identity $1 - d(v, w)/α = 1/α$ would require $α = d(v, w) + 1 \in \mathbb{N}$. Therefore, the probability in (25) is zero for such traits.

Once some $w \in V_{mut}(v)$ has reached $β_{w}^{K} \geq 1 / α$, the $\ln K$-dynamics evolve as described in Coquille et al. (2021), initiated with $β_{w}^{K} = 1 / α$ and $β_{u}^{K} = (1 - d(v, u) / α)_{+}$ for $u \in V \setminus \{w\}$. These dynamics are deterministic, and if they do not terminate early and lead to a new ESC, we denote the associated set of resident traits by $v_{ESC}(v, w)$.

Observe that there is no general formula to express $v_{ESC} (v, w)$ in terms of $v$ and w and the parameters of the system. An interesting case is illustrated in Example 3.

Under the assumption that all traits $w \in V_{mut} (v)$ lead to asymptotic ESCs $v_{ESC} (v, w)$ , we define the stopping time at which one of these asymptotic ESCs is obtained by

$$\begin{aligned} T_{ESC}^{K} := \inf\Big\{ t \geq T_{fix}^{K} :\ & \exists w \in V_{mut}(v) : \\ & \forall u \in V_{α}(v_{ESC}(v, w)) : \Big|β_{u}^{K}(t) - \Big(1 - \frac{d(v_{ESC}(v, w), u)}{α}\Big)\Big| < ε_{K}, \\ & \forall u \notin V_{α}(v_{ESC}(v, w)) : β_{u}^{K}(t) = 0 \Big\}, \end{aligned} \tag{26}$$

where we choose $ε_{K} = C / \ln K$ for a sufficiently large constant $0 < C < \infty$. With this choice, the definition is precisely in line with the definition of an asymptotic ESC.

Remark 10

The minimal necessary C can be made precise using the prefactors of the population sizes in equilibrium, calculated in Lemma 14. We refrain from doing so here as it is notationally very heavy and does not provide any deeper insight.

Since the time $T_{ESC}^{K} - T_{fix}^{K}$ is of order $\ln K$, which is negligible compared to the time scale $1 / K μ_{K}^{L(v)} = K^{L(v)/α - 1}$ because $L(v) > α$, the asymptotics for $T_{fix}^{K}$ carry over to $T_{ESC}^{K}$. Moreover, the transition probabilities from one ESC to another can be expressed in terms of the probabilities of traits $w \in V_{mut}(v)$ fixating in the population. For $w \subset V$ we define

$$p(v, w) := \sum_{\substack{u \in V_{mut}(v) : \\ v_{ESC}(v, u) = w}} \frac{R(v, u)}{R(v)}. \tag{27}$$

Example 1 treats a case where this probability is indeed a sum over several mutant candidates.
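Computing $p(v, \cdot)$ amounts to grouping the mutant candidates by the ESC their $\ln K$-phase leads to. The sketch below does this for made-up rates that mirror the structure of Example 1, where two candidates end in the same ESC. The map from mutants to new ESCs is taken as given, since computing it requires the $\ln K$-algorithm of Sect. A.2.

```python
from collections import defaultdict


def transition_probabilities(R_w, v_esc):
    """p(v, .) as in (27): add R(v, u)/R(v) over all mutants u whose ln K-phase ends in the same ESC."""
    R_total = sum(R_w.values())
    p = defaultdict(float)
    for u, rate in R_w.items():
        p[v_esc[u]] += rate / R_total
    return dict(p)


# Made-up rates with the structure of Example 1: 2b and 2c both lead to the ESC {4}.
R_w = {'2a': 1.0, '2b': 2.0, '2c': 1.0}
v_esc = {'2a': frozenset({'2a'}), '2b': frozenset({'4'}), '2c': frozenset({'4'})}
p = transition_probabilities(R_w, v_esc)
# p == {frozenset({'2a'}): 0.25, frozenset({'4'}): 0.75}
```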

We can now state the result on transitions between ESCs as a direct corollary of Theorem 7.

Corollary 8

Suppose the same assumptions as in Theorem 7 are satisfied. Moreover, assume that, for every $w \in V_{mut}(v)$, the algorithmic description of the $\ln K$-dynamics in Sect. A.2, initiated with

$$β_{u}(0) = \begin{cases} \frac{1}{α} & \text{if } u = w, \\ \big(1 - \frac{d(v, u)}{α}\big)_{+} & \text{otherwise}, \end{cases}$$

does not stop early due to one of its termination criteria and reaches an ESC associated with some traits $v_{ESC}(v, w)$ after finitely many steps. Then $T_{ESC}^{K} - T_{fix}^{K} \in O(\ln K)$ and therefore, with the same constants $ε_{0}$ and $c$ and the same random variables $E_{+}^{K}(ε)$ and $E_{-}^{K}(ε)$ as in Theorem 7,

$$\liminf_{K \to \infty} P\big(E_{-}^{K}(ε) \leq T_{ESC}^{K} K μ_{K}^{L(v)} \leq E_{+}^{K}(ε)\big) \geq 1 - c ε.$$

Moreover, for all $w \subset V$ ,

$$\lim_{K \to \infty} P\big(\{u \in V : β_{u}^{K}(T_{ESC}^{K}) > 1 - ε_{K}\} = w\big) = p(v, w).$$

Remark 11

(i)
Note that Theorem 7 and Corollary 8 only consider a specific transition from the ESC associated with some $v$ to another ESC. The constants $ε_{0}$ and $c$ can however be chosen uniformly for all ESCs since the trait graph is finite.
(ii)
Both results assume that the system starts in an asymptotic ESC. These are the natural initial conditions, particularly once a first transition between asymptotic ESCs has occurred. We could, however, allow for more general initial conditions, as long as they lead to an asymptotic ESC within finitely many steps of the $\ln K$-algorithm.

Multi-scale jump chain and limiting Markov jump processes

Building on the previous description of a single transition from one ESC to another, we now describe the multi-step transitions between ESCs as a jump chain $(v^{(k)})_{k \geq 0}$ on a meta graph. We first introduce the underlying metastability graph $G_{ESC}$, consisting of all sufficiently stable macroscopic equilibrium configurations, and then describe the dynamics of the jump chain. Finally, we give a convergence result that yields different limiting Markov jump processes, depending on the chosen time scale.

Definition 9

(Metastability graph) As vertices for the general metastability graph $G_{ESC} = (V_{ESC}, E_{ESC})$ we take all sets of resident traits that correspond to an ESC, i.e. that have stability degree strictly bigger than $α$ , and edges represent possible transitions to other ESCs. More precisely,

$$V_{ESC} := \{v \subseteq V : L(v) > α\},$$

$$E_{ESC} := \{(v, w) : \exists u \in V_{mut}(v) \text{ such that } w = v_{ESC}(v, u)\}.$$

Recall that $v_{ESC}(v, w)$ stands for the resident traits of the new ESC that is attained at the end of the $\ln K$-algorithm started with resident set $v$ and invading mutant $w \in V_{mut}(v)$. We have already assigned to each vertex $v \in V_{ESC}$ the exit rate $R(v)$ in (23) and to each edge $(v, w) \in E_{ESC}$ the transition probability $p(v, w)$ in (27).

Using Corollary 8, we can now work out inductively the multi-scale jump chain ${(v^{(k)})}_{k \geq 0}$ on $G_{ESC}$ . To this end, let $v^{(0)} \in V_{ESC}$ be the resident traits of the initial ESC that the process starts in and set $T_{ESC}^{(0, K)} : = 0$ . We describe the $k^{th}$ transition, for $k \geq 1$ , conditioned on the knowledge of $v^{(k - 1)}$ . We denote the set of closest fit mutant traits by $V_{mut}^{(k)} = V_{mut} (v^{(k - 1)})$ , the width of the next fitness valley to cross by $L^{(k)} = L (v^{(k - 1)})$ , and the exit rate by $R^{(k)} = R (v^{(k - 1)})$ . Moreover, we keep track of the time when the first mutant population fixates and when the next ESC is reached by introducing the stopping times

$$T_{fix}^{(k, K)} := \inf\big\{t \geq T_{ESC}^{(k-1, K)} : \exists w \in V \setminus V_{α}(v^{(k-1)}) : β_{w}^{K}(t) \geq \tfrac{1}{α}\big\},$$

$$\begin{aligned} T_{ESC}^{(k, K)} := \inf\Big\{ t \geq T_{fix}^{(k, K)} :\ & \exists w \in V_{mut}^{(k)} : \\ & \forall u \in V_{α}(v_{ESC}(v^{(k-1)}, w)) : \Big|β_{u}^{K}(t) - \Big(1 - \frac{d(v_{ESC}(v^{(k-1)}, w), u)}{α}\Big)\Big| < ε_{K}, \\ & \forall u \notin V_{α}(v_{ESC}(v^{(k-1)}, w)) : β_{u}^{K}(t) = 0 \Big\}, \end{aligned}$$

with $ε_{K}$ as in (26).

With this notation, we can now state the result on the $k^{th}$ transition of the multi-scale jump chain.

Corollary 10

Assume that we have constructed the process up to time $T_{ESC}^{(k-1, K)}$, when the ESC associated with $v^{(k-1)}$ is attained, and suppose the same assumptions as in Theorem 7 are satisfied. Moreover, assume that the $\ln K$-dynamics behave as in Corollary 8 for every $w \in V_{mut}^{(k)}$. Then there exist constants $ε_{0} > 0$ and $0 < c < \infty$ such that, for all $0 < ε < ε_{0}$, there are exponentially distributed random variables $E_{+}^{(k, K)}(ε)$ and $E_{-}^{(k, K)}(ε)$ with parameters $R_{-}^{(k)}(ε)$ and $R_{+}^{(k)}(ε)$, respectively, where $R_{\pm}^{(k)}(ε) := R^{(k)}(1 \pm c ε)$, such that

$$\liminf_{K \to \infty} P\Big(E_{-}^{(k, K)}(ε) \leq \big(T_{ESC}^{(k, K)} - T_{ESC}^{(k-1, K)}\big) K μ_{K}^{L^{(k)}} \leq E_{+}^{(k, K)}(ε) \,\Big|\, v^{(k-1)}\Big) \geq 1 - c ε.$$

Moreover, for all $w \subset V$ ,

$$\lim_{K \to \infty} P\big(\{u \in V : β_{u}^{K}(T_{ESC}^{(k, K)}) > 1 - ε_{K}\} = w \,\big|\, v^{(k-1)}\big) = p(v^{(k-1)}, w).$$

The preceding corollary allows us to construct a limiting random jump chain $(v^{(k)})_{k \geq 0}$ on the metastability graph $G_{ESC}$. To be precise, given the current state $v^{(k-1)}$, the next ESC $v^{(k)}$ is drawn at random from $V_{ESC}$ according to the probability distribution $p(v^{(k-1)}, \cdot)$. However, the jumps take place on varying time scales of type $1 / K μ_{K}^{L^{(k)}}$. The construction is valid until an ESC is reached for which some mutant $w \in V_{mut}^{(k)}$ does not induce a unique new ESC under the deterministic $\ln K$-dynamics. A visualisation of the metastability graph including a particular realisation of the jump chain is given in Fig. 2.

Fig. 2 — Metastability graph $G_{ESC}$ including a jump chain $(v^{(k)})_{k \geq 0}$, where $v_{i}^{(4)} = v_{ESC}(v^{(3)}, w_{i})$, for $V_{mut}(v^{(3)}) = \{w_{1}, w_{2}, w_{3}\}$
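The limiting jump chain can be simulated directly once $G_{ESC}$, the exit rates and the transition probabilities are known. In the hypothetical Python sketch below, the $k$-th holding time is drawn as an exponential variable with rate $R^{(k)}$ and multiplied by the time scale $1 / K μ_{K}^{L^{(k)}} = K^{L^{(k)}/α - 1}$, assuming $μ_{K} = K^{-1/α}$ as in Theorem 7. The finite-$K$ time scales are only indicative, since the corollary describes each step separately as $K \to \infty$.

```python
import random
from math import inf


def simulate_jump_chain(v0, R, p, stab, K, alpha, n_jumps, rng):
    """Hypothetical sketch of the multi-scale jump chain on G_ESC.

    R[v]    exit rate R(v);  p[v] = {next ESC: p(v, next ESC)};
    stab[v] stability degree L(v), with inf marking a final state.
    Holding times are Exp(R(v)) times the scale 1 / (K mu_K^L(v)) = K^(L(v)/alpha - 1).
    """
    v, t = v0, 0.0
    history = [(t, v)]
    for _ in range(n_jumps):
        if stab[v] == inf or R.get(v, 0.0) == 0.0:
            break  # no fitter trait can be reached: the ESC is final
        t += rng.expovariate(R[v]) * K ** (stab[v] / alpha - 1)
        targets, weights = zip(*p[v].items())
        v = rng.choices(targets, weights=weights)[0]
        history.append((t, v))
    return history


# Made-up chain A -> B -> C with valley widths 2 and 3, alpha = 1.5, K = 10**6:
# the first step lives on the scale K^(1/3) = 100, the second on K^1 = 10**6.
R = {'A': 1.0, 'B': 2.0}
p = {'A': {'B': 1.0}, 'B': {'C': 1.0}}
stab = {'A': 2, 'B': 3, 'C': inf}
history = simulate_jump_chain('A', R, p, stab, K=10**6, alpha=1.5, n_jumps=10, rng=random.Random(0))
```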

After this general description of the multi-scale jump chain, we can now derive the limiting Markov jump process on each time scale. To be more precise, for each stability degree $L > α$, we are looking for the limit process of $N_{w}^{K}(t / K μ_{K}^{L}) / K$ as $K \to \infty$. The support of this process jumps between sets of coexisting traits of sufficiently high stability degree, i.e. sets that cannot be left on any faster time scale. In this context, we define the level sets of equal stability degree as

$$S^{L} := \{v \subseteq V : \mathrm{LVE}_{+}(v) = \{\bar{n}(v)\},\ L(v) = L\}.$$

Note that, for $L > α$ , a stability degree of $L (v) = L$ ensures that the coexisting traits $v$ allow for an asymptotic ESC, see Remark 7.

As the state space for the limiting jump process, we introduce the L-scale graph $G^{L}$ , which is a collapsed version of $G_{ESC}$ . The vertex set consists of all ESCs that are stable enough to be visible on the respective time scale. Therefore, we set

$$V^{L} := \bigcup_{L' \geq L} S^{L'}.$$

Note that it is possible that the process jumps into an ESC $v \in S^{L'}$ with $L' > L$ on the $1 / K μ_{K}^{L}$-time scale. However, such ESCs cannot be left on this time scale, so they are absorbing states.

Edges $E^{L}$ in $G^{L}$ represent possible transitions of the limiting process. To construct these, we study the limiting jump chain from Corollary 10.

In order to use the corollary and in particular the process ${(v^{(k)})}_{k \geq 0}$ , we have to ensure that, for fixed $L > α$ , this process always reaches an ESC of stability degree at least L in finitely many steps.

Assumption 3

$$\forall v \in S^{L} : P\big(\exists k \in \mathbb{N}_{>0} : L(v^{(k)}) \geq L \,\big|\, v^{(0)} = v\big) = 1$$

Note that, if this assumption is satisfied for some fixed $L$, this has no implications for its validity for other $L' \neq L$. This is due to the fact that only the initial conditions $v \in S^{L}$ are considered. One can easily construct counterexamples where $G_{ESC}$ is not connected, so that cycles on a lower time scale exist but cannot be reached from $S^{L}$. For a broader discussion of the assumption we refer to Examples 4 and 5.

Remark 12

If the process runs into a cycle or stable cluster on a lower time scale, there are still possibilities to escape from it by accelerating and looking at higher time scales. A detailed description of such behaviour is much more involved, mainly for technical reasons: errors accumulate in the approximation of each transition step. As long as it is ensured that the system reaches a (sufficiently stable) ESC after finitely many steps, these errors can be iteratively bounded to ensure convergence. This however fails if the number of lower time scale transitions between higher time scale jumps is not bounded. Heuristically, if one can observe ergodic behaviour on the $L'$-scale graph, for some $L' < L$, transitions out of the ergodic cluster will occur along one of the shortest fitness valleys of width $L$. Transition rates will be weighted according to the stationary distribution on states in $S^{L'}$, and the transition takes a time of order $1 / K μ_{K}^{L}$. Rather than defining vertices of $G^{L}$ as single sets of coexisting traits in $S^{L}$, one would then choose communication classes of such sets in $S^{L'}$ (for possibly multiple $L' < L$) that support an ergodic stationary distribution. Rigorously justifying this argument is a topic of current and future research of the authors.

When asking for possible jumps in $G^{L}$, we again have to respect the principle that jumps on lower time scales are absorbed into those happening on the $1 / K μ_{K}^{L}$-time scale. This means that the critical event for a transition starting in $v \in S^{L}$ is to escape from $v$, which takes a time of order $1 / K μ_{K}^{L}$. Compared to that, the subsequent transitions in $G_{ESC}$ until a state $w$ of stability degree $L(w) \geq L$ is reached again take place within a very short time. Therefore, we say that the (directed) edge $(v, w)$ is in $E^{L}$ if and only if $L(v) = L$ and there exists a finite path $Γ : v \to w$ in $G_{ESC}$ such that $L(Γ_{i}) < L$ for all $1 \leq i < |Γ|$.

The probability of a possible transition $(v, w) \in E^{L}$ is then the sum over all paths $Γ$ that give rise to this edge, where the probability of taking a particular path is the product of the transition probabilities of its segments in $G_{ESC}$:

$$p^{L}(v, w) := \sum_{\substack{Γ : v \to w \\ L(Γ_{i}) < L,\ \forall 1 \leq i < |Γ|}} \prod_{i = 1}^{|Γ|} p(Γ_{i-1}, Γ_{i}).$$

For explanatory computations of these probabilities we refer to Examples 6 and 7.

The transition rates for the jumps on the $1 / K μ_{K}^{L}$-time scale are then given by the overall rate of escaping from $v$, weighted with the transition probability of ending in $w$:

$$R^{L}(v, w) := R(v)\, p^{L}(v, w). \tag{41}$$
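The path sum defining $p^{L}$ can be evaluated by pushing probability mass through $G_{ESC}$: mass starting in $v$ takes one step, mass landing in a state of stability degree at least $L$ is absorbed, and mass in less stable states keeps moving. Under Assumption 3 the remaining mass tends to zero, so stopping at a small tolerance gives $p^{L}(v, \cdot)$ and then $R^{L}(v, \cdot)$. This is a sketch with hypothetical inputs; solving the corresponding linear system for absorption probabilities would be an equivalent alternative.

```python
from collections import defaultdict


def collapsed_transitions(v, p, stab, L, R_v, tol=1e-12, max_steps=10_000):
    """Return p^L(v, .) and R^L(v, .) = R(v) p^L(v, .) for a state v with L(v) = L.

    p[u] = {next ESC: p(u, next ESC)} on G_ESC; stab[u] = L(u).
    """
    absorbed = defaultdict(float)
    current = dict(p.get(v, {}))  # the first jump out of v is always taken
    for _ in range(max_steps):
        nxt = defaultdict(float)
        for u, mass in current.items():
            if stab[u] >= L:
                absorbed[u] += mass  # the path ends in a state visible on this scale
            else:
                for u2, q in p.get(u, {}).items():
                    nxt[u2] += mass * q  # intermediate state with L(u) < L
        current = nxt
        if sum(current.values()) < tol:
            break
    p_L = dict(absorbed)
    return p_L, {w: R_v * q for w, q in p_L.items()}


# Made-up graph with L(A) = L(C) = 3 and L(B) = 2, viewed on the 3-scale.
p = {'A': {'B': 0.5, 'C': 0.5}, 'B': {'A': 0.4, 'C': 0.6}}
stab = {'A': 3, 'B': 2, 'C': 3}
p_L, R_L = collapsed_transitions('A', p, stab, L=3, R_v=2.0)
# p_L is approximately {'C': 0.8, 'A': 0.2}; the self-loop (A, A) passes through B
```

The toy graph also shows how a self-connecting edge in $G^{L}$ can arise from a detour through a less stable ESC.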

Now we are prepared to formulate the main result, i.e. the convergence to a Markov jump process on different time scales.

Theorem 11

Let $L > α$ be such that Assumption 3 holds and take $v^{L}(0) \in V^{L}$. Suppose the same assumptions as in Theorem 7 are satisfied for $v = v^{L}(0)$ and assume that the $\ln K$-dynamics behave as in Corollary 8 for every $v \in \bigcup_{L' \leq L} S^{L'}$ and all $w \in V_{mut}(v)$. Then, for all $T < \infty$, the rescaled process $(N_{v}^{K}(t / K μ_{K}^{L}) / K,\ v \in V,\ t \in [0, T])$ converges in the sense of finite-dimensional distributions to a jump process of the form

$$N_{v}^{L}(t) = \mathbf{1}_{\{v \in v^{L}(t)\}}\, \bar{n}_{v}(v^{L}(t)), \qquad v \in V,\ t \in [0, T].$$

Here $(v^{L} (t), t \in [0, T])$ is a Markov jump process on the L-scale graph $G^{L} = (V^{L}, E^{L})$ , with transition rates given by (41).
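Given the rates $R^{L}(v, w)$, the limiting process can be simulated with the standard Gillespie scheme, and the population profile at time $t$ is read off from the current state as in the formula for $N_{v}^{L}(t)$. The sketch below assumes that the rates and the equilibria $\bar{n}(v)$ are supplied as dictionaries (hypothetical names and values); traits not listed in the returned profile have $N_{v}^{L}(t) = 0$.

```python
import random


def simulate_limit_process(v0, R_L, T, rng):
    """Gillespie simulation of the Markov jump process v^L(t) on [0, T].

    R_L[v] = {w: R^L(v, w)}; states without outgoing rates are absorbing.
    Returns the list of (jump time, new state), starting with (0.0, v0).
    """
    t, v = 0.0, v0
    path = [(t, v)]
    while True:
        out = R_L.get(v, {})
        total = sum(out.values())
        if total == 0:
            return path
        t += rng.expovariate(total)  # holding time ~ Exp(total exit rate)
        if t > T:
            return path
        targets, rates = zip(*out.items())
        v = rng.choices(targets, weights=rates)[0]  # next state w.p. R^L(v, w) / total
        path.append((t, v))


def population_profile(path, t, nbar):
    """N^L(t): nbar[v] maps each trait u of the current state v to nbar_u(v); other traits are 0."""
    state = [s for time, s in path if time <= t][-1]
    return nbar[state]


# Made-up example: A -> B at rate 1.0, B absorbing.
R_L = {'A': {'B': 1.0}}
nbar = {'A': {'a': 1.0}, 'B': {'b1': 0.5, 'b2': 0.7}}
path = simulate_limit_process('A', R_L, T=1e9, rng=random.Random(3))
```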

Remark 13

(i)
We would like to point out that Assumption 3 does not exclude cases where there are cycles in $G^{L}$, i.e. on the time scale $1 / K μ_{K}^{L}$. It only prevents the process from running into a cycle on a lower time scale. We even allow for self-connecting edges, i.e. edges of the form $(v, v)$.
(ii)
As shown in Champagnat (2006, Prop. 1) it is not possible to get convergence with respect to the Skorohod (J1)-topology since this would imply continuity for the limit of the total mass process, which cannot be true.

Interesting examples

In this section, we present and analyse a variety of examples that highlight different aspects of the complicated dynamics covered by our main results. The first two examples are dedicated to single transition steps from one ESC to another, applying Theorem 7 and Corollary 8. The next three examples focus on the metastability graph $G_{ESC}$ on which the jump chain of Corollary 10 lives, two of them being concerned with Assumption 3. The final two examples focus on applications of Theorem 11, studying the limiting Markov jump processes on different time scales as well as the L-scale graphs $G^{L}$.

In order to give a manageable and clear description of the dynamically changing fitness landscape, we introduce some new notation that simplifies the setup of the examples.

Definition 12

We speak of a regime of equal competition if and only if $c(v, w) \equiv \text{const} > 0$ for all $v, w \in V$.

This is by no means a necessary assumption for producing the studied phenomena; however, it allows us to characterise the fitness landscape in a much simpler way. In the case of equal competition, the invasion fitness of a trait $w$ with respect to a single resident trait $v$ is fully characterised by

$$f(w, v) = r(w) - r(v),$$

where we set $r (v) : = b (v) - d (v)$ as the individual fitness of trait v, i.e. its net growth rate in the absence of competitive interactions. As a consequence, traits w with higher individual fitness than the resident v are able to invade the population. Hence, instead of specifying the invasion fitnesses for all possible resident traits, the fitness landscape is fully described by the individual fitnesses r(v).
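For completeness, here is the short computation behind this identity. It assumes the logistic invasion fitness $f(w, v) = b(w) - d(w) - c(w, v)\bar{n}(v)$ and the monomorphic equilibrium $\bar{n}(v) = (b(v) - d(v))/c(v, v)$ for $r(v) > 0$, and writes $c$ for the common competition constant:

$$\begin{aligned} \bar{n}(v) &= \frac{b(v) - d(v)}{c} = \frac{r(v)}{c}, \\ f(w, v) &= b(w) - d(w) - c\,\bar{n}(v) = r(w) - r(v) = -f(v, w). \end{aligned}$$

In particular, under equal competition two traits can never invade each other, and $r(v) < r(w)$ gives $v ≪ w$ in the sense of Definition 13.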

To specify the fitness relations between different traits, in particular in the case of non-equal competition, we introduce the following notation.

Definition 13

For $v, w \in V$ , we write $v ≪ w$ if and only if $f (w, v) > 0$ and $f (v, w) < 0$ . Moreover, we write $v_{1}, \dots, v_{k} ≪ w_{1}, \dots, w_{l}$ whenever $v_{i} ≪ w_{j}$ , for all $1 \leq i \leq k$ and $1 \leq j \leq l$ .

This reflects the case where the equilibrium of the Lotka–Volterra system involving $v$ and $w$ is the monomorphic equilibrium $\bar{n}(w)$ of $w$. In other words, $w$ can invade the $v$ population and fixate.

Single transition steps

A first example with multiple mutation paths

Example 1

Let us consider the directed graph G depicted in Fig. 3. Assume equal competition and the individual fitness r plotted in Fig. 3. Moreover, let $α \in (1, 2)$ .

In this case, the initial resident trait 0 has stability degree $L(\{0\}) = 2 > α$. This is due to the fact that traits 1a and 1b are unfit in the presence of the resident, while traits 2a, 2b and 2c are fit, with connecting paths $γ^{A} = (0, 1a, 2a)$, $γ^{B} = (0, 1b, 2b)$ and $γ^{C} = (0, 1b, 2c)$, each of length 2. Therefore, the possible mutant candidates are $V_{mut}(\{0\}) = \{2a, 2b, 2c\}$. An application of Theorem 7 shows that a new fixating trait is observed at a rescaled time $T_{fix}^{K} K μ_{K}^{2}$ that is approximately exponentially distributed with rate $R(\{0\}) = R(\{0\}, 2a) + R(\{0\}, 2b) + R(\{0\}, 2c)$. The probability that, say, trait 2b is the trait that fixates in the population and triggers the stopping time is $R(\{0\}, 2b) / R(\{0\})$.

To determine the new ESCs that are reached after fixation, we have to take into account the subsequent evolution on the $\ln K$-time scale. This allows for jumps towards traits of higher fitness that lie in the mutation spreading neighbourhood, i.e. direct neighbours in this case (since $α < 2$). Therefore, we end up with

$$v_{ESC}(\{0\}, 2a) = \{2a\}, \quad v_{ESC}(\{0\}, 2b) = \{4\}, \quad v_{ESC}(\{0\}, 2c) = \{4\}.$$

In particular, note that $\{2b\}$ and $\{2c\}$ are not ESCs and are thus not part of the metastability graph $G_{ESC}$ plotted in Fig. 4.

This puts us into the setting where the sum in (27) becomes relevant. In particular, despite the micro-evolutionary branching from 1b into 2b and 2c in the trait graph $G$, there is no such branching on the macro-evolutionary level in $G_{ESC}$: there, we only observe a transition from $\{0\}$ to $\{4\}$. Note also that the different path lengths of $2b \to 4$ and $2c \to 4$ do not matter for the asymptotics of the time $T_{ESC}$ until the new ESC is reached. This is because this time is dominated by the waiting time $T_{fix}$ for the first fixation of a fit mutant trait. Since $L(\{0\}) = 2$, this time is of order $1 / K μ_{K}^{2}$ and thus absorbs the much faster $\ln K$-evolution.

Note that, since all transitions between ESCs occur on the time scale $1 / K μ_{K}^{2}$ here, the metastability graph $G_{ESC}$ agrees with the 2-scale graph $G^{2}$ .

An ESC with coexistence

Since in this paper we discuss the occurrence of metastable behaviour in a rather general setting, we would like to point out that Definition 5 explicitly allows for ESCs $v$ that consist of several coexisting traits. This enlarges the mutation spreading neighbourhood $V_{α}(v)$ and changes the set of mutant candidates $V_{mut}$ in a non-trivial way.

Example 2

Let us consider the directed graph G depicted in Fig. 5. Let $α \in (1, 2)$ and consider a fitness landscape that satisfies

$$f(0, 3),\ f(3, 0) > 0, \tag{45}$$

$$f(1, \{0, 3\}),\ f(2, \{0, 3\}) < 0,$$

$$f(4, \{0, 3\}),\ f(5, \{0, 3\}) > 0,$$

$$0, 1, 2, 3 ≪ 4, 5,$$

$$1, 2 ≪ 0, 3,$$

$$f(4, 5),\ f(5, 4) < 0,$$

and allows for no polymorphic coexistence equilibria apart from $\{0, 3\}$. Moreover, assume that the unique stable equilibrium of the Lotka–Volterra system involving the traits $\{0, 3, 4\}$ is $\bar{n}(4)$, and that the same holds with 5 in place of 4.

Fig. 5 — Trait graph G and metastability graph $G_{ESC}$ (which agrees with the 2-scale graph $G^{2}$ ) of Example 2

The monomorphic ESCs are given by the traits that do not have any fitter direct neighbour and hence do not allow for transitions on the $\ln K$ time scale; in this case, these are $\{0\}$, $\{3\}$, $\{4\}$ and $\{5\}$. Classical results on Lotka–Volterra systems yield that, under assumption (45), traits 0 and 3 can coexist, i.e. $\bar{n}(\{0, 3\}) \in \mathbb{R}_{>0}^{2}$. The mutation spreading neighbourhood is then given by $V_{\alpha}(\{0, 3\}) = \{0, 1, 2, 3\}$. Apart from the resident traits themselves, these traits are by assumption unfit with respect to $\{0, 3\}$, and thus $\{0, 3\}$ constitutes an ESC.
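To make the coexistence and fitness conditions concrete, the following Python sketch computes the Lotka–Volterra equilibrium of a resident set and the invasion fitness of a mutant against it. It assumes the logistic form used in this line of work, namely that $\bar{n}(\mathbf{v})$ solves $b(u) - d(u) = \sum_{w \in \mathbf{v}} c(u, w) \bar{n}_{w}(\mathbf{v})$ for $u \in \mathbf{v}$ and that $f(w, \mathbf{v}) = b(w) - d(w) - \sum_{u \in \mathbf{v}} c(w, u) \bar{n}_{u}(\mathbf{v})$. The net growth rates `r` and competition rates `c` are hypothetical numbers chosen so that the sign conditions of Example 2 hold; they are not taken from the paper, and the function names are ours.

```python
def solve_linear(A, y):
    """Gauss-Jordan elimination with partial pivoting for a small system A x = y."""
    n = len(y)
    M = [[float(a) for a in row] + [float(rhs)] for row, rhs in zip(A, y)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(n):
            if i != col:
                factor = M[i][col] / M[col][col]
                M[i] = [x - factor * p for x, p in zip(M[i], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lv_equilibrium(residents, r, c):
    """Equilibrium nbar(v) solving r(u) = sum_w c(u, w) nbar_w for all residents u."""
    A = [[c[u][w] for w in residents] for u in residents]
    return dict(zip(residents, solve_linear(A, [r[u] for u in residents])))


def invasion_fitness(w, nbar, r, c):
    """Assumed logistic form f(w, v) = r(w) - sum_{u in v} c(w, u) nbar_u(v)."""
    return r[w] - sum(c[w][u] * n_u for u, n_u in nbar.items())


# Hypothetical net growth rates r = b - d and competition rates c[w][u].
r = {0: 1.0, 1: 0.5, 2: 0.5, 3: 1.0, 4: 2.0, 5: 2.0}
c = {w: {u: 1.0 for u in r} for w in r}
c[0][3] = c[3][0] = 0.5  # weak competition between 0 and 3 enables coexistence
c[4][5] = c[5][4] = 1.5  # makes f(4, 5) and f(5, 4) negative

nbar = lv_equilibrium([0, 3], r, c)
print(nbar)  # both densities are positive, so 0 and 3 coexist
for w in (1, 2, 4, 5):
    print(w, round(invasion_fitness(w, nbar, r, c), 3))  # 1, 2 unfit; 4, 5 fit
```

With these numbers, 0 and 3 can mutually invade each other's monomorphic equilibrium, which is the classical criterion for a positive coexistence equilibrium in a two-type competitive Lotka–Volterra system.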

Looking for the stability degree and possible mutant candidates, the assumptions on the fitness landscape imply that

\begin{matrix} L(\{0, 3\}) = 2 \quad \text{and} \quad V_{mut}(\{0, 3\}) = \{4, 5\}. \end{matrix}

By Theorem 7, we can observe a fixating mutant population of one of the traits $w \in V_{mut}(\{0, 3\})$ on the time scale $1/K\mu_{K}^{2}$. The corresponding rates are given by

\begin{aligned}
R(\{0, 3\}, 4) &= \bar{n}_{0}(\{0, 3\}) \, \frac{b(0) m(0, 1)}{|f(1, \{0, 3\})|} \, b(1) m(1, 4) \, \frac{f(4, \{0, 3\})}{b(4)}, && \text{for } w = 4, \\
R(\{0, 3\}, 5) &= \bar{n}_{3}(\{0, 3\}) \, \frac{b(3) m(3, 2)}{|f(2, \{0, 3\})|} \, b(2) m(2, 5) \, \frac{f(5, \{0, 3\})}{b(5)}, && \text{for } w = 5.
\end{aligned}

Note that, although there are also paths connecting $3 \to 4$ and $0 \to 5$, only the paths of shortest length $|\gamma| = 2$ contribute to the above rates.
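Both rates share a product structure with one factor per mutation step: the resident density, a factor $b m / |f|$ for each unfit intermediate trait (a line started by one mutant of a trait with negative fitness $f$ accumulates on average $1/|f|$ units of individual-time before dying out), the last mutation step, and the survival probability $f(w, \mathbf{v}) / b(w)$ of the fit mutant. The Python sketch below mirrors this bookkeeping and sums over the given shortest paths. All parameter values are hypothetical and the function names are ours; it only reproduces the structure of the two displayed rates, not the general statement of Theorem 7.

```python
def path_rate(path, nbar, b, m, f):
    """Contribution of one shortest mutational path to R(v, w).

    path : (gamma_0, ..., gamma_n), n >= 1, from a resident trait to the fit mutant w
    nbar : resident equilibrium densities nbar_u(v)
    b, m : birth rates b(u) and mutation kernel m(u, u')
    f    : invasion fitness f(u, v) of each trait against the resident ESC v
    """
    rate = nbar[path[0]]
    # unfit intermediate traits: produced at rate b*m, each line lives ~1/|f| on average
    for i in range(1, len(path) - 1):
        prev, cur = path[i - 1], path[i]
        rate *= b[prev] * m[(prev, cur)] / abs(f[cur])
    # final mutation step, times the survival probability f(w, v)/b(w) of the mutant
    prev, w = path[-2], path[-1]
    return rate * b[prev] * m[(prev, w)] * f[w] / b[w]


def transition_rate(paths, nbar, b, m, f):
    """Sum over all shortest paths leading to the same mutant trait."""
    return sum(path_rate(p, nbar, b, m, f) for p in paths)


# Hypothetical parameters for Example 2 (f is taken against the residents {0, 3}).
nbar = {0: 2 / 3, 3: 2 / 3}
b = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 2.0, 5: 2.0}
m = {(0, 1): 0.5, (1, 4): 0.5, (3, 2): 0.5, (2, 5): 0.5}
f = {1: -0.5, 2: -0.25, 4: 0.5, 5: 1.0}

print(transition_rate([(0, 1, 4)], nbar, b, m, f))  # R({0,3}, 4)
print(transition_rate([(3, 2, 5)], nbar, b, m, f))  # R({0,3}, 5)
```

Passing longer paths such as one from 3 to 4 would be a mistake here: only paths of length $|\gamma| = L$ enter the rates, so the caller is expected to supply shortest paths only.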

To conclude this example, both mutant traits 4 and 5 are fit enough to invade the coexisting resident population. Overall, we obtain the metastability graph $G_{ESC}$ pictured in Fig. 5, which in this case agrees with the 2-scale graph $G^{2}$. Note that the traits 0 and 3 appear both as monomorphic ESCs and as part of a polymorphic coexistence ESC.

Successive metastable transitions

Self connection in $G_{ESC}$

By definition of an ESC, the first fixating mutant is at distance at least $\lfloor \alpha \rfloor + 1$ from the corresponding resident traits. Despite this fact, the $\ln K$ mechanism triggered by such a mutant may lead to a new ESC that is closer to the old one than $\lfloor \alpha \rfloor + 1$. The new ESC can even coincide with the old one and thus give rise to a self-connecting edge in $G_{ESC}$.

Example 3

Let us consider the directed graph depicted in Fig. 6 and take $α \in (1, 2)$ . Consider a fitness landscape that satisfies

\begin{aligned}
& 0 \ll 2 \ll 4 \ll 5 \ll 2, \\
& 1 \ll 2, \quad 3 \ll 4, \\
& f(1, 0),\ f(3, 2),\ f(3, 5) < 0
\end{aligned}

and assume that there are no polymorphic coexistence equilibria.

Fig. 6 — Trait graph G and metastability graph $G_{ESC}$ of Example 3

After a first jump from $v^{(0)} = \{0\}$ to $v^{(1)} = \{2\}$ on the time scale $1/K\mu_{K}^{2}$, the next fixating mutant is of trait 4 and arises on the same time scale. The chosen fitness landscape ensures that it grows and invades the population of trait 2 within a time of order $\ln K$. Since $\alpha \in (1, 2)$, a non-vanishing population of trait 5 emerges on the same time scale, which can grow as soon as trait 4 has become the new resident trait. Due to its positive invasion fitness, trait 5 then invades the trait 4 population. Finally, the same argument applies to an invasion by trait 2, at which point the process gets stuck, because $\{2\}$ is an ESC of stability degree $L(\{2\}) = 2 > \alpha$.

Overall, we obtain that

\begin{matrix} v^{(2)} = v_{ESC}(\{2\}, 4) = \{2\}. \end{matrix}

In view of Definition 9, this gives rise to the self-connecting edge $(\{2\}, \{2\}) \in G_{ESC}$, which is illustrated in Fig. 6.

On Assumption 3

Since the assumption that prevents the process from getting stuck on a slower time scale is somewhat involved, we give two examples. First, we illustrate in Example 4 that Assumption 3 may hold true even if there is a cycle in the metastability graph $G_{ESC}$ . Second, we slightly modify the trait graph G and the fitness landscape to get Example 5, where Assumption 3 is not satisfied, and explain why this leads to difficulties.

Example 4

Let us consider the directed graph depicted in Fig. 7. Let $α \in (0, 1)$ and consider a fitness landscape that satisfies

\begin{aligned}
& 0 \ll 2 \ll 3 \ll 4 \ll 6, \\
& 1 \ll 2, \quad 5 \ll 6, \\
& 3 \ll 7 \ll 2, \\
& f(1, 0),\ f(5, 4) < 0
\end{aligned}

and assume that there are no polymorphic coexistence equilibria.

Fig. 7 — Trait graph G, metastability graph $G_{ESC}$ and L-scale graphs $G^{1}$ and $G^{2}$ of Example 4

Let us first remark that, because $\alpha \in (0, 1)$, we are in the regime of the trait substitution sequence (cf. Champagnat 2006). This means that we can neglect the $\ln K$ algorithm. In particular, if $v \ll w$ for some $w \in V_{mut}(\{v\})$, then $v_{ESC}(\{v\}, w) = \{w\}$.

With this knowledge, let us construct the jump chain step by step. The first two jumps are determined easily, noting that

\begin{aligned}
v^{(0)} &= \{0\}, & L(\{0\}) &= 2, & V_{mut}(\{0\}) &= \{2\}, \\
v^{(1)} &= \{2\}, & L(\{2\}) &= 1, & V_{mut}(\{2\}) &= \{3\}, \\
v^{(2)} &= \{3\}, & L(\{3\}) &= 1, & V_{mut}(\{3\}) &= \{4, 7\}.
\end{aligned}

For the third jump, there are two possible triggering mutants. If trait 7 fixates first, the process jumps to the ESC $v^{(3)} = \{7\}$ and then returns to $v^{(4)} = \{2\}$, all on the time scale $1/K\mu_{K}$. If instead trait 4 fixates first, the jump chain continues to $v^{(3)} = \{4\}$ within a time of order $1/K\mu_{K}$ and then to $v^{(4)} = \{6\}$ on the time scale $1/K\mu_{K}^{2}$, since $f(5, 4) < 0$.

Noting, in addition, that $V_{mut}(\{1\}) = \{2\}$ and $V_{mut}(\{5\}) = \{6\}$, we obtain the metastability graph drawn in Fig. 7.

To check whether Assumption 3 is satisfied, we decompose the set of ESCs $V_{ESC}$ according to the stability degree,

\begin{matrix} S^{1} = \{\{1\}, \{2\}, \{3\}, \{5\}, \{7\}\}, & S^{2} = \{\{0\}, \{4\}\}, & S^{\infty} = \{\{6\}\}. \end{matrix}

For all $v \in S^{1}$ , one directly sees that an ESC of the same or a higher stability is reached after one jump with probability one. Thus the assumption is true for $L = 1$ and we can construct the graph $G^{1}$ as drawn.

In the case $L = 2$, for $v^{(0)} = \{4\}$, the process jumps with probability one to $v^{(1)} = \{6\}$, which is of higher stability. Finally, we have to check the most involved case $v^{(0)} = \{0\}$. From the metastability graph we identify $\{4\}$ as the first ESC of degree at least 2 that the process can reach. Due to the branching at $\{3\}$, we have to ensure that the process does not get stuck in the cycle $(\{2\}, \{3\}, \{7\}, \{2\})$ for infinitely many steps. We can see that

\begin{matrix} P(\forall k \in \mathbb{N}_{>0} : v^{(k)} \neq \{4\} \mid v^{(0)} = \{0\}) = 0, \end{matrix}

since the number of cycles completed before exiting towards $\{4\}$ is geometrically distributed with success probability $p(\{3\}, \{4\}) > 0$. Therefore, Assumption 3 also holds for $L = 2$. This yields the L-scale graph $G^{2}$ depicted in Fig. 7.
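The geometric law can be checked with a few lines of Python. The sketch below simulates only the exit decision at $\{3\}$, using a hypothetical value for $p(\{3\}, \{4\})$: each failed attempt corresponds to one completed cycle $(\{3\}, \{7\}, \{2\}, \{3\})$, so the number of cycles has mean $(1 - p)/p$, and the probability of still cycling after $n$ visits to $\{3\}$ is $(1 - p)^{n}$, which vanishes as $n \to \infty$.

```python
import random


def cycles_before_exit(p_exit, rng):
    """Number of completed cycles {3} -> {7} -> {2} -> {3} before the jump to {4}."""
    cycles = 0
    while rng.random() >= p_exit:  # probability 1 - p_exit: trait 7 fixates first
        cycles += 1
    return cycles


p_exit = 0.3  # hypothetical value of p({3}, {4})
rng = random.Random(2024)
samples = [cycles_before_exit(p_exit, rng) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)
print(empirical_mean, (1 - p_exit) / p_exit)  # empirical vs. geometric mean
# P(still cycling after n visits to {3}) = (1 - p_exit)**n tends to 0
print([(1 - p_exit) ** n for n in (10, 50, 100)])
```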

Let us now modify the example by inserting an additional trait 8, which can be viewed as an intermediate unfit mutation between 3 and 4. Moreover, for the sake of clarity, we remove the traits 5 and 6.

Example 5

Let us consider the directed graph depicted in Fig. 8 and let $α \in (0, 1)$ . Consider a fitness landscape that satisfies

\begin{aligned}
& 0 \ll 2 \ll 3 \ll 4, \\
& 3 \ll 7 \ll 2, \\
& 1 \ll 2, \quad 8 \ll 4, \\
& f(1, 0),\ f(8, 3) < 0
\end{aligned}

and assume that there are no polymorphic coexistence equilibria.

Fig. 8 — Trait graph G, metastability graph $G_{ESC}$ and L-scale graphs $G^{1}$ and $G^{2}$ of Example 5

Since we only changed the trait graph $G$ slightly, the metastability graph $G_{ESC}$ also stays almost the same. Apart from the omitted traits 5 and 6, the main difference is that the valley from the ESC $\{3\}$ to the fit mutant 4 now has width 2. Therefore, trait 4 is no longer one of the nearest fit traits of trait 3, and the set of possible mutants reduces to $V_{mut}(\{3\}) = \{7\}$. In particular, there is no longer an edge $(\{3\}, \{4\})$ in the metastability graph.

To check whether Assumption 3 is satisfied, we again separate the stability classes

\begin{matrix} S^{1} = \{\{1\}, \{2\}, \{3\}, \{7\}, \{8\}\}, & S^{2} = \{\{0\}\}, & S^{\infty} = \{\{4\}\}. \end{matrix}

For $L = 1$, it is again easy to see from $G_{ESC}$ that the assumption holds. To check it for $L = 2$, we have to consider how the process can get from the initial ESC $\{0\}$ to some ESC of at least the same stability degree. This is not possible: the only candidate would be $\{4\}$, which cannot be reached from $\{0\}$ because the metastability graph is disconnected. In conclusion, Assumption 3 is not satisfied for $L = 2$, and thus we can neither construct the L-scale graph $G^{2}$ nor apply Theorem 11.
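For a finite metastability graph whose edges carry strictly positive jump probabilities, the almost-sure condition checked in Examples 4 and 5 reduces to a reachability question: starting from an ESC of degree $L$, every ESC that the chain can visit while passing only through ESCs of degree below $L$ must still be able to reach some ESC of degree at least $L$. The Python sketch below implements this graph check. It reflects our reading of Assumption 3 as used in these two examples, not a transcription of its formal statement; the function names are ours, and the integer keys stand for the monomorphic ESCs $\{k\}$.

```python
import math
from collections import deque


def fast_reach(edges, degree, L, starts):
    """ESCs visited after at least one jump, continuing only through degree < L."""
    seen, queue = set(), deque(starts)
    while queue:
        u = queue.popleft()
        if u in seen:
            continue
        seen.add(u)
        if degree[u] < L:  # too unstable for scale L: the chain moves on
            queue.extend(edges.get(u, []))
    return seen


def assumption3_holds(edges, degree, L):
    """Check that from every ESC of degree L the chain a.s. reaches degree >= L."""
    for v, deg in degree.items():
        if deg != L:
            continue
        for u in fast_reach(edges, degree, L, edges.get(v, [])):
            if degree[u] < L and not any(
                degree[x] >= L for x in fast_reach(edges, degree, L, edges.get(u, []))
            ):
                return False  # u only leads into a trap of low-stability ESCs
    return True


# Example 4: edges of G_ESC and stability degrees of the ESCs {k}
edges4 = {0: [2], 1: [2], 2: [3], 3: [4, 7], 7: [2], 4: [6], 5: [6]}
degree4 = {0: 2, 1: 1, 2: 1, 3: 1, 4: 2, 5: 1, 6: math.inf, 7: 1}
# Example 5: the edge ({3}, {4}) is gone, so {4} is unreachable from {0}
edges5 = {0: [2], 1: [2], 2: [3], 3: [7], 7: [2], 8: [4]}
degree5 = {0: 2, 1: 1, 2: 1, 3: 1, 4: math.inf, 7: 1, 8: 1}

print([assumption3_holds(edges4, degree4, L) for L in (1, 2)])
print([assumption3_holds(edges5, degree5, L) for L in (1, 2)])
```

The check needs no numerical probabilities, only the edge structure, which matches how Examples 4 and 5 are argued in the text.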

Remark 14

Although the population process gets stuck forever in a cycle through the ESCs $\{2\}, \{3\}, \{7\}$, we expect that it might eventually escape through the fitness valley $3 \to 8 \to 4$ when looking at the time scale $1/K\mu_{K}^{2}$. This is because, from the microscopic point of view, it is possible to observe mutants of trait 4 in the phases where 3 is the resident trait. These mutants appear at a much smaller rate than those of trait 7, but since these phases occur infinitely often, escaping from this cycle should only be a question of acceleration (cf. Remark 12).

Collapse on higher time scales

In the two final examples, we demonstrate how paths in the metastability graph that pass through ESCs of different stability degrees collapse to a single edge in the L-scale graph when focussing on a particular time scale. To this end, we start with an example of a simple linear trait graph with multiple successive fitness valleys of different lengths. The second example allows for a branching in the metastability graph, which again vanishes in the L-scale graph.

Example 6

Let us consider the directed graph G depicted in Fig. 9. Assume equal competition and the individual fitness r plotted in Fig. 9. Moreover, let $α \in (1, 2)$ .

Due to the linear and directed structure of the trait graph, we can extract the fitness valleys and thus the stability degrees directly from the plotted individual fitness r. The jump chain ${(v^{(k)})}_{k \geq 0}$ is the deterministic sequence

\begin{matrix} v^{(0)} = \{0\}, & v^{(1)} = \{3\}, & v^{(2)} = \{5\}, & v^{(3)} = \{8\}. \end{matrix}

This is reflected in the metastability graph drawn in Fig. 10. Note that $\{6\}$ is also an ESC of stability degree 2, but it cannot be reached starting from $\{0\}$.

Let us now have a look at the L-scale-graphs, i.e. at how the limiting jump process evolves when fixing a particular time scale. To this end we focus on the sets of ESCs of equal stability degree, namely

\begin{matrix} S^{2} = \{\{3\}, \{6\}\}, & S^{3} = \{\{0\}, \{5\}\}, & S^{\infty} = \{\{8\}\}. \end{matrix}

Following our construction in (38), the L-scale graph $G^{2}$ has the vertex set $V^{2} = \{\{0\}, \{3\}, \{5\}, \{6\}, \{8\}\}$. Since all ESCs except $\{3\}$ and $\{6\}$ have a stability degree higher than $L = 2$, the only edges are $E^{2} = \{(\{3\}, \{5\}), (\{6\}, \{8\})\}$.

The construction of the edges of $G^{3}$ is far more interesting. In particular, starting in the initial ESC $v^{(0)} = \{0\}$, we cannot simply take the edge $(\{0\}, \{3\})$ from the metastability graph, since $L(\{3\}) < 3$ and thus $\{3\}$ is not stable enough. Instead, we have to follow the whole path $\Gamma = (\{0\}, \{3\}, \{5\})$ until an ESC of sufficiently high stability is reached. This is because the second jump of $\Gamma$ happens much faster (more precisely, on the time scale $1/K\mu_{K}^{2}$) and is hence absorbed in the slower first jump when the process is rescaled by $1/K\mu_{K}^{3}$. This gives us one edge in $E^{3}$. The second one is given by the jump $(\{5\}, \{8\})$. Since $L(\{8\}) = \infty$, no further evolution is possible there.

Overall, these considerations lead to the pictures of $G^{2}$ and $G^{3}$ in Fig. 10.

Example 7

Let us consider the directed graph G depicted in Fig. 11. Assume equal competition and the individual fitness r plotted in Fig. 11. Moreover, let $α \in (1, 2)$ .

Starting with the resident population in $v^{(0)} = \{0\}$, we can directly extract from the plotted individual fitness r that only the traits 3 and 5 have positive invasion fitness. Moreover, both can be reached via a path of length $|\gamma| = 3$, namely

\begin{matrix} γ^{A} = (0, 1, 2, 3), & γ^{B} = (0, 1, 6, 5) . \end{matrix}

Hence, we associate to this ESC the stability degree $L(\{0\}) = 3$ and the set of mutant candidates $V_{mut}(\{0\}) = \{3, 5\}$.

If trait 5 fixates first, there is no further evolution and we end with $v_{ESC}(\{0\}, 5) = \{5\}$. If trait 3 fixates, it grows and becomes macroscopic. Moreover, since $\alpha \in (1, 2)$, the population of trait 4 grows through frequent incoming mutants. However, due to its negative invasion fitness with respect to the resident $\{0\}$ and later against the macroscopic population $\{3\}$, it cannot invade. Hence $v_{ESC}(\{0\}, 3) = \{3\}$ is the corresponding ESC, and it has stability degree $L(\{3\}) = 2$. From then on, only trait 5 is a fit reachable mutant; it arises after a waiting time of order $O(1/K\mu_{K}^{2})$ and replaces 3 as an ESC. These three jumps form the edges of the metastability graph $G_{ESC}$ drawn in Fig. 12.

Fig. 12 — Metastability graph $G_{ESC}$ and L-scale graphs $G^{2}$ and $G^{3}$ of Example 7

The L-scale graph $G^{2}$ is easy to construct, whereas the really interesting behaviour occurs for $G^{3}$. Since $L(\{0\}) = 3$, the jumps $(\{0\}, \{3\})$ and $(\{0\}, \{5\})$ happen on the visible time scale. The latter is clearly also an edge in $G^{3}$, due to the high stability $L(\{5\}) = \infty$ of the final ESC. In the former case, however, the ESC that the process jumps to has smaller stability, namely $L(\{3\}) = 2$. Therefore, the next jump $(\{3\}, \{5\})$ occurs within a time that vanishes under rescaling. The path $\Gamma = (\{0\}, \{3\}, \{5\})$ in $G_{ESC}$ thus yields an edge $(\{0\}, \{5\})$ for $G^{3}$. This edge already exists, and we do not allow for double edges in $G^{L}$. Instead, the two edges are merged by adding up the transition rates and probabilities as in (40).

Overall, we see that even a branching in the metastability graph can disappear when multiple paths collapse to the same edge on a particular time scale.
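The collapsing of paths in Examples 6 and 7 can be phrased as an absorption computation for the jump chain. The Python sketch below takes hypothetical jump probabilities $p(\mathbf{v}, \mathbf{w})$ of $G_{ESC}$ and, for each ESC of degree exactly $L$, propagates probability mass through ESCs of lower degree (whose jumps are invisible on the time scale $1/K\mu_{K}^{L}$) until it lands in an ESC of degree at least $L$. Mass arriving at the same target along different paths is added up, mirroring the merging of edges described above. It computes edge probabilities only; the corresponding rates from (40) are not reproduced, and the function name is ours. If mass stays trapped, as in Example 5, the function raises an error.

```python
import math
from collections import defaultdict


def collapse_to_scale(p, degree, L, tol=1e-12, max_steps=100_000):
    """Edge probabilities of the L-scale graph G^L obtained by collapsing paths of G_ESC.

    p      : p[v][w], jump probability of the metastable chain from ESC v to ESC w
    degree : stability degree L(v) of each ESC (math.inf if there is no fit mutant)
    Returns {v: {w: probability}} for every ESC v with degree[v] == L.
    """
    collapsed = {}
    for v, deg in degree.items():
        if deg != L:
            continue
        hits = defaultdict(float)  # mass arriving in ESCs of degree >= L
        mass = {v: 1.0}            # the first jump out of v is always taken
        for _ in range(max_steps):
            nxt = defaultdict(float)
            for u, x in mass.items():
                for w, q in p.get(u, {}).items():
                    if degree[w] >= L:
                        hits[w] += x * q  # stable enough to be seen on this scale
                    else:
                        nxt[w] += x * q   # left again within a vanishing time
            mass = nxt
            if sum(mass.values()) < tol:
                break
        else:
            raise ValueError(f"mass trapped below degree {L} when starting from {v}")
        collapsed[v] = dict(hits)
    return collapsed


inf = math.inf
# Example 6: deterministic chain {0} -> {3} -> {5} -> {8}, plus {6} -> {8}
p6 = {0: {3: 1.0}, 3: {5: 1.0}, 5: {8: 1.0}, 6: {8: 1.0}}
deg6 = {0: 3, 3: 2, 5: 3, 6: 2, 8: inf}
print(collapse_to_scale(p6, deg6, 3))  # {0: {5: 1.0}, 5: {8: 1.0}}

# Example 7 with a hypothetical branching probability p({0}, {3}) = 0.4
p7 = {0: {3: 0.4, 5: 0.6}, 3: {5: 1.0}}
deg7 = {0: 3, 3: 2, 5: inf}
print(collapse_to_scale(p7, deg7, 3))  # both paths merge into the edge ({0}, {5})
```

Iterating the mass forward, rather than solving a linear system, keeps the sketch dependency-free and also handles cycles such as the one in Example 4, where the trapped mass decays geometrically.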

Proofs

In this chapter, we discuss the proofs of the results on metastable transitions and limiting jump processes that are presented in Sects. 2.3 and 2.4. These build on the results in Bovier et al. (2019) on the crossing of a fitness valley on a linear trait space and in Coquille et al. (2021) on the faster $\ln K$ dynamics on general finite graphs. The main idea is to extend the techniques from Bovier et al. (2019) to more complex trait spaces by considering sequential mutations along certain paths. Since mutations are very rare outside of the mutation spreading neighbourhood of the resident traits and unfit traits quickly go extinct, mutations along different paths can essentially be regarded as independent. Consequently, the overall rate of transitioning out of an ESC is obtained by summing over the rates of taking specific paths through the surrounding fitness valley.

The remainder of this chapter is structured as follows. In Sect. 4.1, we determine the precise equilibrium sizes of the subpopulations with traits inside the mutation spreading neighbourhood. In Sect. 4.2, we consider the rates at which mutants of any fitness arise along specific paths and combine these into the overall rate at which single mutants are born. Finally, in Sect. 4.3, we combine these rates of producing mutants beyond the fitness valley with the probability of fixation and the faster $\ln K$ dynamics of reaching a new ESC to conclude Theorem 7 and Corollary 8. Section 4.4 is dedicated to the proof of Corollary 10 and Theorem 11, where we concatenate several jumps across fitness valleys to obtain the multi-scale jump chain and carefully study which transitions are visible on the respective time scales to obtain the dynamics of the limiting Markov jump process.

Estimation of the equilibrium size

In this section we discuss the equilibrium population sizes of the living traits once an ESC is attained. The results from Coquille et al. (2021) only characterize the orders of the population sizes $\beta_{w}$ and the actual size $\bar{n}(\mathbf{v})$ of the resident traits associated to an ESC. In order to calculate the precise transition rates from one ESC to another, however, we need a better estimate of the population sizes of the non-resident traits in $V_{\alpha}(\mathbf{v})$.

We prove that, if the initial conditions of our process satisfy the assumptions of an asymptotic ESC, all living traits in $V_{\alpha}(\mathbf{v})$ get arbitrarily close to their equilibrium size within a finite time. This equilibrium size preserves the orders of population sizes and is of the form

\begin{matrix} N_{v}^{K}(t) = a_{v} K \mu_{K}^{d(\mathbf{v}, v)} + o(K \mu_{K}^{d(\mathbf{v}, v)}) \quad \forall v \in V_{\alpha}(\mathbf{v}), \end{matrix}

for some $a_{v} \in \mathbb{R}_{+}$, which can be calculated precisely. The populations of living traits stay close to these equilibrium sizes as long as no new trait arises and reaches a size at which it can influence the population sizes of other traits, i.e. a size of order $K^{1/\alpha}$. To this end, we recall the definition of the stopping time

\begin{matrix} T_{fix}^{K} := \inf\{t \geq 0 : \exists w \in V \setminus V_{\alpha}(\mathbf{v}) : \beta_{w}^{K}(t) \geq 1/\alpha\}. \end{matrix}
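For simulated trajectories, $T_{fix}^{K}$ can be approximated on a time grid. The Python sketch below assumes the convention $\beta_{w}^{K}(t) = \ln(1 + N_{w}^{K}(t)) / \ln K$ for the size exponents, as in the $\ln K$ framework of Coquille et al. (2021); the function names, the snapshot format and all numbers are hypothetical. Sampling on a grid can only detect the crossing at or after the true stopping time.

```python
import math


def beta(n, K):
    """Assumed size exponent ln(1 + N) / ln K, so that N is roughly K**beta."""
    return math.log1p(n) / math.log(K)


def approx_t_fix(times, snapshots, outside, K, alpha):
    """First sampled time at which a trait outside V_alpha(v) has beta >= 1/alpha.

    times     : increasing sample times
    snapshots : one dict {trait: population size} per sample time
    outside   : the traits of V that lie outside V_alpha(v)
    """
    for t, sizes in zip(times, snapshots):
        if any(beta(sizes.get(w, 0), K) >= 1 / alpha for w in outside):
            return t
    return math.inf  # no fixation observed in the sampled window


# Hypothetical data: K**(1/alpha) is about 464 for these parameters.
K, alpha = 10_000, 1.5
times = [0.0, 1.0, 2.0, 3.0]
snapshots = [{6: 0}, {6: 40}, {6: 300}, {6: 900}]
print(approx_t_fix(times, snapshots, outside={6}, K=K, alpha=alpha))
```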

Lemma 14

(Equilibrium size inside the $\alpha$-radius) Let $\mathbf{v} \subset V$ and $(\beta^{K}(0))_{K \geq 0}$ be an asymptotic ESC. Then, for all $\varepsilon > 0$, there exist constants $\tau_{\varepsilon} < \infty$, $U_{\varepsilon} > 0$ and Markov processes $(N_{v}^{(K, \pm)}(t), t \geq 0)_{K \geq 0}$ such that

\begin{matrix} \lim_{K \to \infty} P\big(N_{v}^{(K, -)}(t) \leq N_{v}^{K}(t) \leq N_{v}^{(K, +)}(t) \ \forall t \in (\tau_{\varepsilon}, T_{fix}^{K} \wedge e^{U_{\varepsilon} K}),\ v \in V_{\alpha}(\mathbf{v})\big) = 1 \end{matrix}

and

\begin{matrix} \left|\frac{E[N_{v}^{(K, \pm)}(t)]}{K \mu_{K}^{d(\mathbf{v}, v)}} - a_{v}\right| < \varepsilon \quad \forall t \geq \tau_{\varepsilon}, \end{matrix}

where

\begin{matrix} a_{v} := \sum_{\substack{\gamma : \mathbf{v} \to v \\ |\gamma| = d(\mathbf{v}, v)}} \bar{n}_{\gamma_{0}}(\mathbf{v}) \prod_{i = 1}^{|\gamma|} \frac{b(\gamma_{i - 1}) m(\gamma_{i - 1}, \gamma_{i})}{|f(\gamma_{i}, \mathbf{v})|}. \end{matrix}

Proof

We prove the claim by induction on the distance from the resident traits. For the initialisation, let us start with $v \in \mathbf{v}$. Here we also count a single vertex as a path of length zero, with the convention that an empty product has the value one. In this case $(N_{v}^{K}, v \in \mathbf{v})$ can be coupled with logistic birth–death processes with immigration, by estimating the incoming and outgoing mutants, which are of order $O(K\mu_{K})$ or smaller. Hence we already know from Coquille et al. (2021, Lemma A.6(ii)) that the residents stabilise near their Lotka–Volterra equilibrium within a time of order $O(1)$. To make this more precise, define, for all $\varepsilon > 0$, the stopping time at which the resident populations enter an $\varepsilon$-neighbourhood of their equilibrium size

\begin{matrix} \tau_{\varepsilon}^{K} := \inf\{t \geq 0 : \forall v \in \mathbf{v} : |K^{-1} N_{v}^{K}(t) - \bar{n}_{v}(\mathbf{v})| < \varepsilon C\}. \end{matrix}

Here $C$ is a constant, depending only on the competition rates $c(v, w)$, which compensates for the slight shift of the equilibrium due to small fluctuations of the non-resident traits. Then there exists a constant time $\tilde{\tau}_{\varepsilon} < \infty$ such that $\lim_{K \to \infty} P(\tau_{\varepsilon}^{K} < \tilde{\tau}_{\varepsilon}) = 1$. After the time $\tilde{\tau}_{\varepsilon}$, the environment of competitive pressure stays almost constant, unless the fluctuations of the resident populations become too large or the non-residents reach a macroscopic level. These two events are described by the stopping times

\begin{matrix} S_{\varepsilon}^{K} := \inf\{t \geq \tau_{\varepsilon}^{K} : \exists v \in \mathbf{v} : |K^{-1} N_{v}^{K}(t) - \bar{n}_{v}(\mathbf{v})| > 2 \varepsilon C\} \end{matrix}

and

\begin{matrix} \sigma_{\varepsilon}^{K} := \inf\{t \geq 0 : \sum_{w \in V \setminus \mathbf{v}} N_{w}^{K}(t) \geq \varepsilon K\}. \end{matrix}

We know from Champagnat and Méléard (2011, Proposition A.2) that, for some constant $U_{\varepsilon} > 0$,

\begin{matrix} \lim_{K \to \infty} P(S_{\varepsilon}^{K} > e^{U_{\varepsilon} K} \wedge \sigma_{\varepsilon}^{K}) = 1. \end{matrix}

For the other traits in the $\alpha$-radius, $v \in V_{\alpha}(\mathbf{v}) \setminus \mathbf{v}$, we prove as the induction step that (77) is satisfied with

\begin{matrix} a_{v} = \sum_{\substack{(w, v) \in E \\ d(\mathbf{v}, w) = d(\mathbf{v}, v) - 1}} a_{w} \frac{b(w) m(w, v)}{|f(v, \mathbf{v})|} \end{matrix}

by deriving an upper and a lower bound on the population size through couplings. These bounds then immediately imply the claim.
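The recursion (84) and the path sum (79) are two ways of organising the same computation of $a_{v}$. The Python sketch below works on a small hypothetical trait graph (one resident trait and three unfit traits, all rates invented for illustration): it computes $a_{v}$ layer by layer from breadth-first distances and cross-checks the result against a direct enumeration of shortest paths. The parameter `radius` stands in for the extent of $V_{\alpha}(\mathbf{v})$; the function names are ours.

```python
from collections import defaultdict, deque


def distances(edges, residents):
    """Directed graph distance from the resident set to every reachable trait."""
    dist = {u: 0 for u in residents}
    queue = deque(residents)
    while queue:
        u = queue.popleft()
        for w in edges.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def a_recursive(edges, residents, nbar, b, m, f, radius):
    """a_u via the recursion over predecessors that are one step closer."""
    dist = distances(edges, residents)
    preds = defaultdict(list)
    for u, targets in edges.items():
        for w in targets:
            preds[w].append(u)
    a = {u: nbar[u] for u in residents}
    for u in sorted((x for x in dist if 0 < dist[x] <= radius), key=dist.get):
        a[u] = sum(a[w] * b[w] * m[(w, u)] for w in preds[u]
                   if dist.get(w) == dist[u] - 1) / abs(f[u])
    return a


def a_path_sum(edges, residents, nbar, b, m, f, target):
    """a_target via the sum over all shortest paths from the residents."""
    d = distances(edges, residents)[target]
    total = 0.0

    def extend(u, steps, weight):
        nonlocal total
        if steps == d:
            total += weight if u == target else 0.0
            return
        for w in edges.get(u, []):
            if w in nbar:  # shortest paths never return to the resident set
                continue
            extend(w, steps + 1, weight * b[u] * m[(u, w)] / abs(f[w]))

    for r in residents:
        extend(r, 0, nbar[r])
    return total


# Hypothetical trait graph: resident 0, unfit traits 1, 2 and 3 within the radius.
edges = {0: [1, 2], 1: [3], 2: [3]}
nbar = {0: 2.0}
b = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
m = {(0, 1): 0.5, (0, 2): 0.5, (1, 3): 0.5, (2, 3): 0.5}
f = {1: -1.0, 2: -0.5, 3: -2.0}  # invasion fitness against the resident {0}

a = a_recursive(edges, [0], nbar, b, m, f, radius=2)
print(a[3], a_path_sum(edges, [0], nbar, b, m, f, 3))  # both equal 0.75
```

Processing traits in order of increasing distance guarantees that every $a_{w}$ needed on the right-hand side of the recursion is already available, which is exactly the structure of the induction in the proof.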

Following the notation of Fournier and Méléard (2004), we represent the population processes in terms of Poisson random measures. For this purpose, let $(Q_{v}^{(b)}, Q_{v}^{(d)}, Q_{w, v}^{(m)} ; v, w \in V)$ be independent homogeneous Poisson random measures on $\mathbb{R}_{+}^{2}$ with intensity $ds\, d\theta$. Then we can write

\begin{aligned}
N_{v}^{K}(t) = {} & N_{v}^{K}(0) + \int_{0}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq b(v)(1 - \mu_{K}) N_{v}^{K}(s^{-})} \, Q_{v}^{(b)}(ds, d\theta) \\
& - \int_{0}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq [d(v) + \sum_{w \in V} c^{K}(v, w) N_{w}^{K}(s^{-})] N_{v}^{K}(s^{-})} \, Q_{v}^{(d)}(ds, d\theta) \\
& + \sum_{(w, v) \in E} \int_{0}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq \mu_{K} b(w) m(w, v) N_{w}^{K}(s^{-})} \, Q_{w, v}^{(m)}(ds, d\theta).
\end{aligned}

Note that we use the same Poisson measures to construct the processes for each K here. However, as already pointed out in Sect. 2.1, this is not necessary and we do not use any particular correlation between the processes for different K. We can use a specific joint construction here since we are only considering the convergence of probabilities of certain events, rather than of the processes themselves.

Since we already know from Coquille et al. (2021, Theorem 2.2) that, in the equilibrium state, the non-resident populations $w \in V_{\alpha}(\mathbf{v})$ stay of size $O(K\mu_{K}^{d(\mathbf{v}, w)})$, the leading-order contribution of the mutation term in the last line comes only from traits lying closer to the resident traits than $v$. Thus we can adopt the inductive structure of Bovier et al. (2019, Lemma 7.1) and approximate the population size of $v$ analogously by coupling it, for $K$ large enough, with two processes

\begin{matrix} N_{v}^{(K, -)}(t) \leq N_{v}^{K}(t) \leq N_{v}^{(K, +)}(t) \quad \forall \tilde{\tau}_{\varepsilon} \leq t \leq \sigma_{\varepsilon}^{K} \wedge T_{fix}^{K} \wedge S_{\varepsilon}^{K}. \end{matrix}

To be precise, we take care of the admissible fluctuations of the residents by defining

\begin{matrix} \bar{n}_{v}^{(\pm)}(\mathbf{v}) := \bar{n}_{v}(\mathbf{v}) \pm 2 \varepsilon C. \end{matrix}

Then, for $v \in V \setminus \mathbf{v}$ and $\mu_{K} < \varepsilon$, we set

\begin{aligned}
N_{v}^{(K, -)}(t) = {} & N_{v}^{K}(\tilde{\tau}_{\varepsilon}) + \int_{\tilde{\tau}_{\varepsilon}}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq b(v)(1 - \varepsilon) N_{v}^{(K, -)}(s^{-})} \, Q_{v}^{(b)}(ds, d\theta) \\
& - \int_{\tilde{\tau}_{\varepsilon}}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq [d(v) + \sum_{w \in \mathbf{v}} c(v, w) \bar{n}_{w}^{(+)}(\mathbf{v}) + \varepsilon \max_{\tilde{w} \in V \setminus \mathbf{v}} c(v, \tilde{w})] N_{v}^{(K, -)}(s^{-})} \, Q_{v}^{(d)}(ds, d\theta) \\
& + \sum_{(w, v) \in E} \int_{\tilde{\tau}_{\varepsilon}}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq \mu_{K} b(w) m(w, v) N_{w}^{K}(s^{-})} \, Q_{w, v}^{(m)}(ds, d\theta),
\end{aligned}

\begin{aligned}
N_{v}^{(K, +)}(t) = {} & N_{v}^{K}(\tilde{\tau}_{\varepsilon}) + \int_{\tilde{\tau}_{\varepsilon}}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq b(v) N_{v}^{(K, +)}(s^{-})} \, Q_{v}^{(b)}(ds, d\theta) \\
& - \int_{\tilde{\tau}_{\varepsilon}}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq [d(v) + \sum_{w \in \mathbf{v}} c(v, w) \bar{n}_{w}^{(-)}(\mathbf{v})] N_{v}^{(K, +)}(s^{-})} \, Q_{v}^{(d)}(ds, d\theta) \\
& + \sum_{(w, v) \in E} \int_{\tilde{\tau}_{\varepsilon}}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq \mu_{K} b(w) m(w, v) N_{w}^{K}(s^{-})} \, Q_{w, v}^{(m)}(ds, d\theta),
\end{aligned}

where we use the same Poisson measures as in (85). Note that this coupling satisfies (86) only on the event $\{τ_{ε}^{K} < {\tilde{τ}}_{ε}\}$ . However, as mentioned above, this event’s probability converges to 1 and we can hence restrict our considerations to this case to obtain the desired convergence.

On closer inspection, the approximating processes $N_{v}^{(K, -)}, N_{v}^{(K, +)}$ are nothing but subcritical branching processes with immigration stemming from incoming mutations.
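To see why such a process settles at a level proportional to its immigration rate, consider a linear birth–death process with per-capita birth rate $\lambda$, per-capita death rate $\delta > \lambda$ and constant immigration rate $I$: its stationary mean is $I / (\delta - \lambda)$. In the couplings above, $\delta - \lambda$ plays the role of $|f^{(*)}(v, \mathbf{v})|$ and $I$ that of the incoming mutation flux, which in the paper is random rather than constant. The Python sketch below is a standard Gillespie simulation with hypothetical parameters that compares the long-run time average with $I / (\delta - \lambda)$; the function name is ours.

```python
import random


def bdi_time_average(birth, death, immigration, t_max, rng):
    """Gillespie simulation of a linear birth-death process with immigration.

    Each individual reproduces at rate `birth` and dies at rate `death`; new
    individuals arrive at constant rate `immigration` > 0. Returns the time average
    of the population size over [0, t_max], starting from an empty population.
    """
    t, n, area = 0.0, 0, 0.0
    while True:
        total = immigration + (birth + death) * n
        dt = rng.expovariate(total)
        if t + dt >= t_max:
            area += n * (t_max - t)
            return area / t_max
        area += n * dt
        t += dt
        if rng.random() * total < immigration + birth * n:
            n += 1  # immigration or birth
        else:
            n -= 1  # death (only possible when n > 0)


rng = random.Random(7)
lam, delta, I = 1.0, 2.0, 5.0  # hypothetical rates, subcritical since lam < delta
avg = bdi_time_average(lam, delta, I, t_max=2_000.0, rng=rng)
print(avg, I / (delta - lam))
```

Dividing the stationary level $I / |f|$ by the scale $K\mu_{K}^{d(\mathbf{v}, v)}$ of the immigration is what produces the factor $1/|f(v, \mathbf{v})|$ in the recursion for $a_{v}$.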

Similarly to the proof of Bovier et al. (2019, Equation (7.8) et sqq.), we can use the martingale decomposition of $N_{v}^{(K, +)}$ and $N_{v}^{(K, -)}$ to derive, for $t > \tilde{\tau}_{\varepsilon}$, the differential equation

\begin{aligned}
\frac{d}{dt} E[N_{v}^{(K, *)}(t)]
&= \Big(b(v)(1 - \mathbf{1}_{\{* = -\}} \varepsilon) - d(v) - \sum_{w \in \mathbf{v}} c(v, w) \bar{n}_{w}^{(\bar{*})}(\mathbf{v}) - \mathbf{1}_{\{* = -\}} \varepsilon \sup_{\tilde{w} \in V \setminus \mathbf{v}} c(v, \tilde{w})\Big) E[N_{v}^{(K, *)}(t)] \\
&\quad + \sum_{(w, v) \in E} \mu_{K} b(w) m(w, v) E[N_{w}^{K}(t)] \\
&= f^{(*)}(v, \mathbf{v}) E[N_{v}^{(K, *)}(t)] + \sum_{(w, v) \in E} \mu_{K} b(w) m(w, v) E[N_{w}^{K}(t)],
\end{aligned}

where $\bar{*} = \{+, -\} \setminus \{*\}$ denotes the opposite sign.

Here, we introduce $f^{(*)}(v, \mathbf{v})$ as shorthand notation to point out that this is nothing but a perturbation of the invasion fitness $f(v, \mathbf{v})$. We can then apply our a priori knowledge of the sizes of the subpopulations, i.e.

\begin{matrix} E[N_{w}^{K}(t)] = O(K \mu_{K}^{d(\mathbf{v}, w)}) \quad \forall w \in V_{\alpha}(\mathbf{v}), \end{matrix}

to rewrite the ODE system

\begin{aligned}
\frac{d}{dt} E[N_{v}^{(K, *)}(t)]
&= f^{(*)}(v, \mathbf{v}) E[N_{v}^{(K, *)}(t)] + \sum_{\substack{(w, v) \in E \\ d(\mathbf{v}, w) = d(\mathbf{v}, v) - 1}} \mu_{K} b(w) m(w, v) E[N_{w}^{K}(t)] + O(K \mu_{K}^{d(\mathbf{v}, v) + 1}) \\
&= f^{(*)}(v, \mathbf{v}) E[N_{v}^{(K, *)}(t)] + \sum_{\substack{(w, v) \in E \\ d(\mathbf{v}, w) = d(\mathbf{v}, v) - 1}} b(w) m(w, v) a_{w} K \mu_{K}^{d(\mathbf{v}, v)} + o(K \mu_{K}^{d(\mathbf{v}, v)}).
\end{aligned}

Here we use the induction hypothesis to estimate the populations with traits lying closer to the residents in the latter equality.

Rescaling with $K\mu_{K}^{d(\mathbf{v}, v)}$ and using (84), the equation becomes

\begin{matrix} \frac{d}{dt} E\Big[\frac{N_{v}^{(K, *)}(t)}{K \mu_{K}^{d(\mathbf{v}, v)}}\Big] = f^{(*)}(v, \mathbf{v}) E\Big[\frac{N_{v}^{(K, *)}(t)}{K \mu_{K}^{d(\mathbf{v}, v)}}\Big] + a_{v} |f(v, \mathbf{v})| + o(1). \end{matrix}

By variation of constants the solution is given by

\begin{aligned}
E\Big[\frac{N_{v}^{(K, *)}(t)}{K \mu_{K}^{d(\mathbf{v}, v)}}\Big]
= {} & e^{f^{(*)}(v, \mathbf{v})(t - \tilde{\tau}_{\varepsilon})} \Big(E\Big[\frac{N_{v}^{K}(\tilde{\tau}_{\varepsilon})}{K \mu_{K}^{d(\mathbf{v}, v)}}\Big] - \frac{|f(v, \mathbf{v})|}{|f^{(*)}(v, \mathbf{v})|} a_{v} + o(1)\Big) \\
& + \frac{|f(v, \mathbf{v})|}{|f^{(*)}(v, \mathbf{v})|} a_{v} + o(1).
\end{aligned}

Note that the term in brackets can be bounded uniformly in $K$ and $\varepsilon$, for $\varepsilon$ small enough. Moreover, the fitness ratio $|f(v, \mathbf{v})| / |f^{(*)}(v, \mathbf{v})|$ can be expressed as $(1 \pm \varepsilon \tilde{c}_{\varepsilon})$. Hence (96) becomes

\begin{matrix} E\Big[\frac{N_{v}^{(K, *)}(t)}{K \mu_{K}^{d(\mathbf{v}, v)}}\Big] = e^{f^{(*)}(v, \mathbf{v})(t - \tilde{\tau}_{\varepsilon})} O(1) + (1 \pm \varepsilon \tilde{c}_{\varepsilon}) a_{v} + o(1). \end{matrix}

Taking into account that the perturbed fitness $f^{(*)}(v, \mathbf{v})$ is negative for $v \in V_{\alpha}(\mathbf{v}) \setminus \mathbf{v}$, the first term vanishes as time increases. Hence, for all $\tilde{\varepsilon} > 0$, there exist $\varepsilon > 0$, $\tau_{\tilde{\varepsilon}} \in (\tilde{\tau}_{\varepsilon}, \infty)$ and $K_{0} \in \mathbb{N}$ such that, for all $t > \tau_{\tilde{\varepsilon}}$ and $K > K_{0}$,

\begin{matrix} \left|E\Big[\frac{N_{v}^{(K, *)}(t)}{K \mu_{K}^{d(\mathbf{v}, v)}}\Big] - a_{v}\right| < \tilde{\varepsilon}. \end{matrix}

Finally, we can deduce again from our knowledge on the orders of population sizes that

\begin{matrix} \lim_{K \to \infty} P(\sigma_{\varepsilon}^{K} < T_{fix}^{K} \wedge e^{U_{\varepsilon} K}) = 0, \end{matrix}

which allows us to drop the stopping time $σ_{ε}^{K}$ in the claim. $□$
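The variation-of-constants step in the proof above can be sanity-checked numerically. The Python sketch below uses hypothetical values for $f(v, \mathbf{v})$, the perturbation size $\varepsilon$, $a_{v}$ and the initial value, and compares the closed-form solution of $x' = f^{(*)} x + a_{v} |f|$ with an explicit Euler scheme. For large $t$ both approach $a_{v} |f| / |f^{(*)}|$, which lies within a factor $1 \pm O(\varepsilon)$ of $a_{v}$; the function names are ours.

```python
import math


def closed_form(t, x0, f_star, f, a, tau=0.0):
    """Solution of x' = f_star * x + a * |f| with x(tau) = x0, valid for f_star < 0."""
    limit = abs(f) / abs(f_star) * a  # equals -a * |f| / f_star when f_star < 0
    return math.exp(f_star * (t - tau)) * (x0 - limit) + limit


def euler(t, x0, f_star, f, a, tau=0.0, steps=100_000):
    """Explicit Euler scheme for the same linear ODE, as an independent check."""
    h = (t - tau) / steps
    x = x0
    for _ in range(steps):
        x += h * (f_star * x + a * abs(f))
    return x


f, eps, a, x0 = -0.8, 0.01, 1.5, 0.0  # hypothetical values
f_star = f * (1 + eps)                 # perturbed fitness, still negative
for t in (1.0, 5.0, 20.0):
    print(t, closed_form(t, x0, f_star, f, a), euler(t, x0, f_star, f, a))
print("limit:", abs(f) / abs(f_star) * a, "versus a_v =", a)
```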

Pathwise evolution rates

From the precise description of the population sizes inside the mutation spreading neighbourhood, we can now deduce the rate at which mutants with traits outside of it occur.

To observe a new mutant whose trait is far away from the resident population, a whole sequence of mutation steps is needed. Traits outside the $\alpha$-neighbourhood $V_{\alpha}(\mathbf{v})$ cannot be kept from extinction by incoming mutants alone. Therefore, if such a trait has negative invasion fitness, its mutants only give rise to small excursions, which can be approximated by subcritical branching processes. During each of these excursions there is a small probability that a new mutant is produced before extinction.

To overcome the problem of tracking possible back mutations, we do not only record the sizes of the different mutant populations; in addition, we distinguish mutants by the mutational path along which they arose and thereby keep track of the genealogy. We set

\begin{matrix} N_{v}^{K}(t) = \sum_{\gamma : \partial V_{\alpha} \to v} N_{v, \gamma}^{K}(t) \quad \forall v \in V \setminus V_{\alpha}, \end{matrix} \tag{100}

where the path-indexed subpopulations can be represented by

\begin{aligned}
N_{v, \gamma}^{K}(t) = {} & \int_{0}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq b(v)(1 - \mu_{K}) N_{v, \gamma}^{K}(s^{-})} \, Q_{v, \gamma}^{(b)}(ds, d\theta) \\
& + \int_{0}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq \mu_{K} b(\tilde{v}) m(\tilde{v}, v) N_{\tilde{v}, \gamma \setminus v}^{K}(s^{-})} \, Q_{v, \gamma}^{(m)}(ds, d\theta) \\
& - \int_{0}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq [d(v) + \sum_{w \in V} c^{K}(v, w) N_{w}^{K}(s^{-})] N_{v, \gamma}^{K}(s^{-})} \, Q_{v, \gamma}^{(d)}(ds, d\theta).
\end{aligned} \tag{101}

Here $\tilde{v}$ stands for the next-to-last vertex in $γ$ , which is the progenitor of v in $γ$ , and for $\tilde{v} \in \partial V_{α}$ we set

\begin{matrix} N_{\tilde{v}, (\tilde{v})}(t) := N_{\tilde{v}}(t). \end{matrix} \tag{102}

As before, $(Q_{v, γ}^{(b)}, Q_{v, γ}^{(d)}, Q_{v, γ}^{(m)} ; v \in V, γ : \partial V_{α} \to v)$ are independent homogeneous Poisson random measures with constant intensity one.

Remark 15

It suffices to sum only over the paths starting in $\partial V_{\alpha}$ in the decomposition. By the definition of $T_{ESC}^{K}$, all populations outside of $V_{\alpha}$ are extinct at that time. The probability that a mutant of trait $v \in V \setminus V_{\alpha}$ arises before the finite time $\tau_{\varepsilon}$ from Lemma 14, when the populations in $V_{\alpha}$ reach their equilibrium, goes to zero. After this time we have good bounds on the population sizes of all traits in $V_{\alpha}$, and it is therefore sufficient to trace back the genealogy of new mutants to the last trait in $V_{\alpha}$, i.e. a trait in $\partial V_{\alpha}$.

With this representation at hand, we can now define the cumulative number of mutant individuals of trait $v$ that arose as mutants of the progenitor $\tilde{v}$ along the path $\gamma$,

\begin{matrix} M_{v, \gamma}^{K}(t) = \int_{0}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\theta \leq \mu_{K} b(\tilde{v}) m(\tilde{v}, v) N_{\tilde{v}, \gamma \setminus v}^{K}(s^{-})} \, Q_{v, \gamma}^{(m)}(ds, d\theta), \end{matrix} \tag{103}

as well as the respective occurrence times of these mutants

\begin{matrix} T_{v, \gamma}^{(i, K)} := \inf\{t \geq 0 : M_{v, \gamma}^{K}(t) \geq i\}, \end{matrix} \tag{104}

where we set $T_{v, γ}^{(0, K)} : = 0$ .

Our aim is to show that new mutants outside of $V_{\alpha}$ appear at the end of a mutation path approximately according to a Poisson point process, with a rate that scales with the length of the path.

Lemma 15

Suppose $\mathbf{v}$ and $(\beta^{K}(0))_{K \geq 0}$ are an asymptotic ESC and let $T_{fix}^{K}$ be defined as in (76). Let $v \in V \setminus V_{\alpha}$ and $\gamma : \partial V_{\alpha} \to v$ be such that $|\gamma| \geq L - \lfloor \alpha \rfloor$ and $f(\gamma_{i}, \mathbf{v}) < 0$ for all $i = 0, \dots, |\gamma| - 1$. Then there exist $0 < c, C < \infty$ such that, for each $\varepsilon > 0$, there exist two Poisson point processes $M_{v, \gamma}^{(K, \pm)}$ with rates $\tilde{R}_{v, \gamma}^{(\pm)} K \mu_{K}^{\lfloor \alpha \rfloor + |\gamma|}$ such that

\begin{matrix} \liminf_{K \to \infty} P\big(M_{v, \gamma}^{(K, -)}(t) < M_{v, \gamma}^{K}(t) < M_{v, \gamma}^{(K, +)}(t) \ \forall t < T_{fix}^{K}\big) \geq 1 - c\varepsilon, \end{matrix} \tag{105}

where the rate parameters are defined as

\begin{matrix} \tilde{R}_{v, \gamma} := a_{\gamma_{0}} b(\gamma_{0}) m(\gamma_{0}, \gamma_{1}) \prod_{j = 1}^{|\gamma| - 1} \lambda(\rho(\gamma_{j}, \mathbf{v})) m(\gamma_{j}, \gamma_{j + 1}), & \tilde{R}_{v, \gamma}^{(\pm)} = (1 \pm C\varepsilon) \tilde{R}_{v, \gamma}. \end{matrix} \tag{106}

For the definitions of $\lambda(\rho)$ and $\rho(v, \mathbf{v})$ we refer to (20) and (19), respectively, while $a_{\gamma_{0}}$ is the equilibrium size defined in (79).

Proof

Throughout the proof, we assume that $τ_{ε} < t < T_{fix}^{K} \land e^{U_{ε} K}$, where $τ_{ε}$ and $U_{ε}$ are defined in Lemma 14. This extends to all $0 \leq t < T_{fix}^{K}$ in the limit $K \to \infty$, since $T_{fix}^{K} < e^{U_{ε} K}$ with probability converging to 1 and, since $μ_{K} \to 0$, with probability converging to 1 there is no mutation event during the finite time interval $[0, τ_{ε}]$.

Let $v \in V \setminus V_{α}$ and $γ : \partial V_{α} \to v$ be given as in the lemma. To simplify notation, we refer to the vertices of the path as $γ = (v_{0}, v_{1}, \dots, v_{|γ|})$, so that $v_{0} \in \partial V_{α}$ and $v_{|γ|} = v$. The idea of the proof is to consider the path in isolation from the remaining graph and to adapt the tools from Bovier et al. (2019, Sect. 7.3) to the present situation. Rather than introducing further notation, we handle the far more general structure of our trait graph by translating the notation of the central objects between the two articles.

The first observation is that, for every $t < T_{fix}^{K}$ , we can bound the mutant counting process of trait $v_{1}$ by

$$ M_{v_{1}, γ}^{(K, -)}(t) \leq M_{v_{1}, γ}^{K}(t) \leq M_{v_{1}, γ}^{(K, +)}(t) \quad \text{a.s.}, \tag{107} $$

with the bounding processes being defined as

$$ M_{v_{1}, γ}^{(K, \pm)}(t) = \int_{0}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\{θ \leq μ_{K} b(v_{0}) m(v_{0}, v_{1}) N_{v_{0}}^{(K, \pm)}(s^{-})\}} \, Q_{v_{0}, γ}^{(m)}(ds, dθ). \tag{108} $$

Note that the estimate corresponds to equation (7.42) in Bovier et al. (2019), while the definition is the adapted version of (7.17) therein. In order to make use of Lemma 14, we temporarily continue with the simplified processes

$$ {\bar{M}}_{v_{1}, γ}^{(K, \pm)}(t) = \int_{0}^{t} \int_{\mathbb{R}_{+}} \mathbf{1}_{\{θ \leq μ_{K} b(v_{0}) m(v_{0}, v_{1}) \mathbb{E}[N_{v_{0}}^{(K, \pm)}(s^{-})]\}} \, Q_{v_{0}, γ}^{(m)}(ds, dθ) \tag{109} $$

and

$$ {\bar{T}}_{v_{1}, γ}^{(i, K, \pm)} := \inf\{t \geq 0 : {\bar{M}}_{v_{1}, γ}^{(K, \pm)}(t) \geq i\}. \tag{110} $$

This turns out to be sufficient for our results, since a standard application of Doob’s martingale inequality shows that, with probability converging to 1, the difference between the processes $M_{v_{1}, γ}^{(K, \pm)}$ and ${\bar{M}}_{v_{1}, γ}^{(K, \pm)}$ stays of sufficiently small order during the relevant time interval. To be precise, there exist sequences $N_{1}(K)$ and $N_{2}(K)$ with

$$ N_{1}(K) \gg (K μ_{K}^{L})^{-1} \quad \text{and} \quad N_{2}(K) \ll (μ_{K}^{L - 1 - ⌊α⌋})^{-1} \tag{111} $$

such that

$$ \lim_{K \to \infty} P\left(\sup_{s \leq N_{1}(K)} \left|M_{v_{1}, γ}^{(K, \pm)}(s) - {\bar{M}}_{v_{1}, γ}^{(K, \pm)}(s)\right| > N_{2}(K)\right) = 0. \tag{112} $$

For details, see Bovier et al. (2019, p. 3583). At each time ${\bar{T}}_{v_{1}, γ}^{(i, K, \pm)}$ an individual of trait $v_{1}$ is born. In order to track its descendants until, potentially, an individual of trait $v_{|γ|}$ is born, we proceed as in the previous section and couple the population of trait $v_{k}$ (the $k$-mutant population), for $1 \leq k \leq |γ| - 1$, to birth-death processes with individual birth and death rates

$$ b^{(*)}(v_{k}) = b(v_{k})\left(1 - \mathbf{1}_{\{* = -\}} ε\right), \tag{113} $$

$$ d^{(*)}(v_{k}) = d(v_{k}) + \sum_{w \in \mathbf{v}} c(v_{k}, w) {\bar{n}}_{w}^{(\bar{*})}(\mathbf{v}) + \mathbf{1}_{\{* = -\}} ε \sup_{\tilde{w} \in V \setminus \mathbf{v}} c(v_{k}, \tilde{w}). \tag{114} $$

Note that, in contrast to Sect. 4.1, these subcritical processes receive no immigration and hence go extinct in finite time almost surely. However, during such an excursion of the $k$-mutant population there is a small probability that an individual of trait $v_{k + 1}$ is born. Analogously to Bovier et al. (2019, pp. 3581–3582), we can use Lemma 17 (see Appendix A.1) to derive

$$ P\left(\text{an excursion of trait } v_{k} \text{ produces exactly one mutant of trait } v_{k + 1}\right) = μ_{K} λ(ρ(v_{k}, \mathbf{v})) m(v_{k}, v_{k + 1}) (1 + O(ε)), \tag{115} $$

while on the other hand

$$ P\left(\text{an excursion of trait } v_{k} \text{ produces at least two mutants of trait } v_{k + 1}\right) = O(μ_{K}^{2}). \tag{116} $$
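The orders in (115) and (116) can be checked numerically. The Python sketch below simulates the embedded jump chain of one subcritical excursion started by a single individual: each event is a birth with probability `rho` and a death otherwise, and a birth carries the next trait with probability `p`. Identifying `rho` with $ρ(v_{k}, \mathbf{v})$ and `p` with $μ_{K} m(v_{k}, v_{k + 1})$, and ignoring the $ε$-corrections, is a simplifying assumption of this sketch; the parameter values are hypothetical.

```python
import random

def excursion_mutants(rho, p, rng):
    """Number of mutant offspring produced during one excursion started by one individual.

    Embedded jump chain: each event is a birth with probability rho, otherwise a death.
    A birth carries the next trait with probability p (the offspring leaves the focal
    population); otherwise it is a clonal birth.
    """
    n, mutants = 1, 0
    while n > 0:
        if rng.random() < rho:        # birth event
            if rng.random() < p:
                mutants += 1          # offspring of the next trait on the path
            else:
                n += 1                # clonal offspring
        else:
            n -= 1                    # death event
    return mutants

def mutant_count_frequencies(rho, p, n_excursions, seed=1):
    """Empirical P(exactly one mutant) and P(at least two mutants) per excursion."""
    rng = random.Random(seed)
    counts = [excursion_mutants(rho, p, rng) for _ in range(n_excursions)]
    exactly_one = sum(c == 1 for c in counts) / n_excursions
    two_or_more = sum(c >= 2 for c in counts) / n_excursions
    return exactly_one, two_or_more

# Hypothetical values: rho = 0.25 gives lambda(rho) = 0.5; p stands in for mu_K m(v_k, v_{k+1}).
one, two_plus = mutant_count_frequencies(rho=0.25, p=0.01, n_excursions=20_000)
```

For small `p`, the ratio `one / p` should be close to $λ(ρ) = ρ/(1 - 2ρ)$ and `two_plus` should be of order `p**2`, mirroring (115) and (116).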

Hence, for large $K$, the probability that the $i$-th mutant of trait $v_{1}$ (i.e. the one triggering ${\bar{T}}_{v_{1}, γ}^{(i, K, \pm)}$) produces a $v_{|γ|}$-mutant is

$$ μ_{K}^{|γ| - 1} \left(\prod_{k = 1}^{|γ| - 1} λ(ρ(v_{k}, \mathbf{v})) m(v_{k}, v_{k + 1})\right) (1 + O(ε)). \tag{117} $$

Since Lemma 14 implies that ${\bar{M}}_{v_{1}, γ}^{(K, \pm)}$ can be treated as a Poisson process with intensity

$$ K μ_{K}^{d(\mathbf{v}, v_{0}) + 1} a_{v_{0}} b(v_{0}) m(v_{0}, v_{1}), \tag{118} $$

and since $d(\mathbf{v}, v_{0}) = ⌊α⌋$ for $v_{0} \in \partial V_{α}$, the appearances of $v_{|γ|}$-mutants also form a Poisson process, with thinned intensity

$$ K μ_{K}^{d(\mathbf{v}, v_{0}) + |γ|} a_{v_{0}} b(v_{0}) m(v_{0}, v_{1}) \left(\prod_{k = 1}^{|γ| - 1} λ(ρ(v_{k}, \mathbf{v})) m(v_{k}, v_{k + 1})\right) (1 + O(ε)) \tag{119} $$

$$ = {\tilde{R}}_{v, γ}^{(\pm)} K μ_{K}^{⌊α⌋ + |γ|}. \tag{120} $$

Finally, by (112), the difference between $M_{v_{1}, γ}^{(K, \pm)}$ and ${\bar{M}}_{v_{1}, γ}^{(K, \pm)}$ is of smaller order than $(μ_{K}^{L - 1 - ⌊α⌋})^{-1}$. Multiplying by the thinning probability (117), which is of order $μ_{K}^{|γ| - 1}$, the expected number of additional $v_{|γ|}$-mutants is of order $o(μ_{K}^{|γ| - L + ⌊α⌋})$, which is $o(1)$ because $|γ| \geq L - ⌊α⌋$. Hence the appearance rate of the $v_{|γ|}$-mutants changes only by a vanishing amount. $□$
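The passage from (118) to (119) relies on the thinning property of Poisson processes: if points arrive at rate $r$ and each point is retained independently with probability $q$, the retained points again form a Poisson process, with rate $rq$. Independence of the retention decisions corresponds to the excursions of different incoming mutants not interacting (cf. Remark 16). A minimal Python check of the count distribution, with hypothetical values of $r$, $q$ and the time horizon:

```python
import numpy as np

def thinned_poisson_counts(rate, keep_prob, horizon, n_reps, seed=2):
    """Counts of retained points of a rate-`rate` Poisson process on [0, horizon],
    each point kept independently with probability `keep_prob`."""
    rng = np.random.default_rng(seed)
    arrivals = rng.poisson(rate * horizon, size=n_reps)  # e.g. incoming v_1-mutants
    kept = rng.binomial(arrivals, keep_prob)             # those whose lineage reaches v
    return kept

kept = thinned_poisson_counts(rate=5.0, keep_prob=0.1, horizon=40.0, n_reps=20_000)
# For a Poisson law, mean and variance both equal rate * keep_prob * horizon = 20.
sample_mean, sample_var = kept.mean(), kept.var()
```

Matching mean and variance is only a necessary signature of the Poisson law, but it makes the thinning step concrete.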

Remark 16

Note that, in general, two excursions of $N_{v_{k}, γ}^{K}$ associated to different incoming mutants could overlap. In the limit $K \to \infty$, however, this does not happen with probability tending to one, since the time between incoming mutants diverges while the durations of the excursions stay of order one, i.e. $T_{v_{k}, γ}^{(i + 1, K)} - T_{v_{k}, γ}^{(i, K)} \gg 1$.

As a direct corollary, we deduce the law of the appearance times of new mutants of trait $v \in V \setminus V_{α}$.

Corollary 16

Suppose $\mathbf{v}$ and ${(β^{K}(0))}_{K \geq 0}$ are an asymptotic ESC. Let $v \in V \setminus V_{α}$ be a trait such that all paths $γ : \partial V_{α} \to v$ of shortest length $|γ| = d(V_{α}, v)$ visit only traits with negative invasion fitness, excluding the last trait $v$, i.e. $f(γ_{i}, \mathbf{v}) < 0$ for all $i = 0, \dots, |γ| - 1$. Denote by $T_{v}^{(i, K)}$ the appearance time of the $i$-th mutant of trait $v$ descended from a nearest-neighbour trait. Then there exist $0 < c, C < \infty$ such that, for each $ε > 0$, there exist sequences of i.i.d. exponential random variables $E_{v}^{(i, K, \pm)}$, $i \geq 1$, with rates ${\tilde{R}}_{v}^{(\pm)} = (1 \pm Cε) {\tilde{R}}_{v}$, where

$$ {\tilde{R}}_{v} := \sum_{\substack{γ : \partial V_{α} \to v \\ |γ| = d(V_{α}, v)}} a_{γ_{0}} b(γ_{0}) m(γ_{0}, γ_{1}) \prod_{j = 1}^{|γ| - 1} λ(ρ(γ_{j}, \mathbf{v})) m(γ_{j}, γ_{j + 1}), \tag{121} $$

such that

$$ \liminf_{K \to \infty} P\left(E_{v}^{(i, K, -)} \leq K μ_{K}^{d(\mathbf{v}, v)} \left(T_{v}^{(i, K)} - T_{v}^{(i - 1, K)}\right) \leq E_{v}^{(i, K, +)} \,\middle|\, T_{v}^{(i, K)} < T_{fix}^{K}\right) \geq 1 - cε. \tag{122} $$

Proof

By Lemma 15, we can describe the arrivals of new $v$-mutants approximately as a sum of Poisson point processes. Since the Poisson measures $Q_{\cdot, \cdot}^{(\cdot)}$ in our representation (101) are taken to be independent, the resulting mutation counting processes $M_{v, γ}^{K}$ are also independent. Hence their sum can be approximated by a Poisson process with intensity

$$ \sum_{γ : \partial V_{α} \to v} {\tilde{R}}_{v, γ} K μ_{K}^{|γ| + ⌊α⌋}. \tag{123} $$

Since the summand of a path $γ$ is of order $K μ_{K}^{|γ| + ⌊α⌋}$ and $μ_{K} \to 0$, the leading order of the overall rate is given only by the shortest paths, i.e. those $γ$ with $|γ| = d(V_{α}, v) = d(\mathbf{v}, v) - ⌊α⌋$. As a result, the leading order is (121) multiplied by $K μ_{K}^{d(\mathbf{v}, v)}$. Finally, the waiting times between consecutive points of a homogeneous Poisson point process are i.i.d. exponential random variables with the same rate. $□$
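The dominance of the shortest paths in (123) can be made concrete: the factor $K μ_{K}^{⌊α⌋}$ is common to all summands, so the share carried by the shortest paths depends only on the remaining powers $μ_{K}^{|γ|}$. The Python sketch below computes this share for a hypothetical list of path lengths and prefactors ${\tilde{R}}_{v, γ}$ and shows it approaching 1 as $μ_{K} \to 0$.

```python
def shortest_path_share(paths, mu):
    """Share of sum_gamma R_gamma * mu**|gamma| carried by the shortest paths.

    paths: list of (length |gamma|, prefactor tilde R_{v,gamma}); the common factor
    K * mu**floor(alpha) of (123) cancels in the ratio and is omitted.
    """
    d = min(length for length, _ in paths)
    total = sum(r * mu**length for length, r in paths)
    shortest = sum(r * mu**length for length, r in paths if length == d)
    return shortest / total

# Hypothetical paths: two shortest ones of length 2 and two longer detours.
paths = [(2, 1.0), (2, 0.5), (3, 10.0), (4, 100.0)]
shares = [shortest_path_share(paths, mu) for mu in (1e-2, 1e-3, 1e-6)]
```

Even large prefactors on longer detours are eventually overwhelmed, since each extra step costs a full factor $μ_{K}$.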

Proof of Theorem 7 and Corollary 8

We have now assembled all the tools to finish the proof of Theorem 7 and Corollary 8.

Note that, with the notation from the proof of Lemma 14, all following considerations are only valid up to the stopping time $S_{ε}^{K} \land σ_{ε}^{K}$, for sufficiently small $ε$. Since we have seen previously that $T_{fix}^{K} \leq S_{ε}^{K} \land σ_{ε}^{K}$ with probability converging to one as $K \to \infty$, we do not state this restriction explicitly in the following. Moreover, the constants $c$ and $C$ may vary throughout the proof but always satisfy $0 < c, C < \infty$.

Both results assume that the initial conditions ${(β^{K}(0))}_{K \geq 0}$ form an asymptotic ESC associated to the coexisting traits $\mathbf{v} \subset V$. As a first step, we study the time $T_{fix}^{K}$ until the fixation of the first mutant trait outside of $V_{α} := V_{α}(\mathbf{v})$. Corollary 16 implies that, for all traits $w \in V \setminus V_{α}$ such that all shortest paths $γ : \partial V_{α} \to w$ pass only through unfit traits, new mutants of trait $w$ arise approximately according to a Poisson point process with rate ${\tilde{R}}_{w} K μ_{K}^{d(\mathbf{v}, w)}$. By assumption, $β_{w}^{K}(0) = 0$ for all $K > K_{0}$ and $w \in V \setminus V_{α}$, i.e. all traits outside of $V_{α}$ are initially extinct. As a result, individuals of such traits $w$ are only present due to these incoming mutations.

We now argue why it suffices to consider the traits $w \in V \setminus V_{α}$ with $f(w, \mathbf{v}) > 0$ and $d(\mathbf{v}, w) = L(\mathbf{v})$, i.e. the traits $w \in V_{mut} := V_{mut}(\mathbf{v})$, as candidates to reach $β_{w}^{K} = 1/α$ first and trigger the stopping time $T_{fix}^{K}$.

For all $w$ such that $⌊α⌋ < d(\mathbf{v}, w) < L(\mathbf{v})$, the definition of $L(\mathbf{v})$ yields $f(w, \mathbf{v}) < 0$. Therefore, the descendants of a mutant of such a trait can be bounded from above by a subcritical birth-death process with rates that do not depend on $K$, which dies out in finite time with probability 1. As a result,

$$ \lim_{K \to \infty} P\left(\sup_{t \in [0, T_{fix}^{K} \land e^{U_{ε} K}]} β_{w}^{K}(t) \geq \frac{1}{α}\right) = 0. \tag{124} $$

For $w$ such that $d(\mathbf{v}, w) = L(\mathbf{v})$ and $f(w, \mathbf{v}) < 0$, the same argument applies.

Finally, for all $w$ such that $d(\mathbf{v}, w) > L(\mathbf{v})$ and all $T < \infty$, Corollary 16 implies that the arrival time $T_{w}^{(1)}$ of the first $w$-mutant satisfies

$$ \lim_{K \to \infty} P\left(T_{w}^{(1)} \leq \frac{T}{K μ_{K}^{L(\mathbf{v})}} \land T_{fix}^{K}\right) = 0. \tag{125} $$

Focusing on $w \in V_{mut}$, we can use couplings with supercritical birth-death processes (similar to the arguments in the previous sections) to bound the different mutant populations. Using classical results on branching processes (e.g. Athreya and Ney 1972, Ch. III.4), we can approximate the probability that the descendants of a single mutant of a particular trait $w$ do not go extinct by $(1 \pm Cε) f(w, \mathbf{v}) / b(w)$. Moreover, conditioned on non-extinction, the time such a population needs to grow to a size of $K^{1/α}$ can be bounded by $(1 \pm Cε) \ln K / (α f(w, \mathbf{v}))$. This time is therefore negligible on the time scale $1/(K μ_{K}^{L(\mathbf{v})})$ on which the $w$-mutants arise.
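Both branching-process facts can be illustrated with a linear birth-death process with birth rate $b$ and death rate $d < b$, so that the growth rate is $f = b - d$: its survival probability from one individual is $1 - d/b = f/b$, and on survival it grows roughly like $e^{f t}$, so reaching $K^{1/α}$ takes about $\ln K / (α f)$. The Python sketch estimates the survival probability with the embedded jump chain, using the hitting of a hypothetical threshold `cap` as a proxy for survival. Treating $d$ as the competition-adjusted death rate of trait $w$ against the resident $\mathbf{v}$ is our simplifying assumption, and the rates are hypothetical.

```python
import math
import random

def survival_estimate(b, d, cap, n_families, seed=3):
    """Fraction of embedded birth-death walks from 1 that reach `cap` before 0."""
    rng = random.Random(seed)
    p_birth = b / (b + d)
    survived = 0
    for _ in range(n_families):
        n = 1
        while 0 < n < cap:
            n += 1 if rng.random() < p_birth else -1
        survived += n >= cap
    return survived / n_families

def growth_time(K, alpha, f):
    """Time ln(K) / (alpha f) for e^{f t} to reach K^{1/alpha}."""
    return math.log(K) / (alpha * f)

b, d = 2.0, 1.0                       # hypothetical rates, f = b - d = 1
estimate = survival_estimate(b, d, cap=50, n_families=10_000)
exact_limit = (b - d) / b             # f / b = 0.5 for these values
```

By the gambler's ruin formula, the exact probability of reaching `cap` is $(1 - d/b)/(1 - (d/b)^{cap})$, which is indistinguishable from $f/b$ for the cap used here.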

Overall, we can deduce from Corollary 16 that there exist a constant $0 < c < \infty$ and exponential random variables $E_{w, fix}^{(K, \pm)}$ with parameters $(1 \pm cε) {\tilde{R}}_{w} f(w, \mathbf{v}) / b(w) = (1 \pm cε) R(\mathbf{v}, w)$ such that

$$ \liminf_{K \to \infty} P\left(E_{w, fix}^{(K, -)} \leq K μ_{K}^{L(\mathbf{v})} T_{fix}^{K} \leq E_{w, fix}^{(K, +)} \,\middle|\, β_{w}^{K}(T_{fix}^{K}) = \frac{1}{α}\right) \geq 1 - cε. \tag{126} $$

Since the mutants arising along different paths are independent (see the proof of Corollary 16), the rescaled stopping time $K μ_{K}^{L(\mathbf{v})} T_{fix}^{K}$ (without conditioning on a trait $w$), being the minimum of these competing times, is approximately exponentially distributed with the sum of all rates, $R(\mathbf{v}) = \sum_{w \in V_{mut}} R(\mathbf{v}, w)$. In addition, the probability that a certain trait $w \in V_{mut}$ triggers the stopping time $T_{fix}^{K}$ can be approximated by $R(\mathbf{v}, w) / R(\mathbf{v})$. More precisely, there are exponential random variables $E^{(K, \pm)}(ε)$ such that
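The two statements above are the standard properties of competing independent exponential clocks: their minimum is exponential with the summed rate, and clock $w$ rings first with probability proportional to its rate. A short Python simulation with hypothetical rates $R(\mathbf{v}, w)$ illustrates both:

```python
import numpy as np

def competing_exponentials(rates, n_samples, seed=4):
    """Sample independent Exp(rates[w]) clocks; return the minimum and the index that rings first."""
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    clocks = rng.exponential(scale=1.0 / rates, size=(n_samples, rates.size))
    return clocks.min(axis=1), clocks.argmin(axis=1)

rates = [0.5, 1.5, 2.0]                 # hypothetical R(v, w) for three traits w in V_mut
first_time, winner = competing_exponentials(rates, 200_000)
# mean(first_time) should be close to 1 / sum(rates) = 0.25, and the winner
# frequencies close to rates / sum(rates) = (0.125, 0.375, 0.5).
winner_freq = np.bincount(winner, minlength=len(rates)) / winner.size
```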

$$ \liminf_{K \to \infty} P\left(E^{(K, -)}(ε) \leq K μ_{K}^{L(\mathbf{v})} T_{fix}^{K} \leq E^{(K, +)}(ε)\right) \geq 1 - cε, \tag{127} $$

$$ \frac{R(\mathbf{v}, w)}{R(\mathbf{v})}(1 - cε) \leq \lim_{K \to \infty} P\left(β_{w}^{K}(T_{fix}^{K}) = \frac{1}{α}\right) \leq \frac{R(\mathbf{v}, w)}{R(\mathbf{v})}(1 + cε). \tag{128} $$

Since $ε$ can be picked arbitrarily small, this concludes the proof of Theorem 7.

To deduce Corollary 8, we note that at time $T_{fix}^{K}$ the population sizes satisfy (28) for some $w \in V_{mut}(\mathbf{v})$. Hence the assumption of the corollary and Theorem 18 imply that a new ESC associated to $v_{ESC}(\mathbf{v}, w)$ is reached within a time of order $\ln K$. We emphasise that, although Theorem 18 only implies $β_{u}^{K} \to 0$ for traits $u \notin V_{α}(v_{ESC}(\mathbf{v}, w))$ after this time, these subpopulations can be bounded from above by subcritical branching processes that go extinct within a time of order 1, so that the conditions of $T_{ESC}^{K}$ are truly satisfied. This yields the first claim of Corollary 8. Since this time is again negligible with respect to the $1/(K μ_{K}^{L(\mathbf{v})})$-time scale, the second claim follows directly. For the last claim, we note that a new ESC $\mathbf{w}$ might be reached from multiple $w \in V_{mut}(\mathbf{v})$, and we therefore add up all corresponding probabilities to obtain $p(\mathbf{v}, \mathbf{w})$. This concludes the proof of Corollary 8.

Proof of Corollary 10 and Theorem 11

In order to derive results for the jump chain ${(\mathbf{v}^{(k)})}_{k \geq 0}$ on $G_{ESC}$, we observe that, after a successful transition according to Corollary 8, the final state of the process again satisfies the initial assumptions for another application of the corollary. We only need to recompute the state-dependent quantities ($L(\mathbf{v})$, $V_{mut}(\mathbf{v})$, etc.). As a consequence, the strong Markov property allows us to use Corollary 8 to construct the random sequence ${(\mathbf{v}^{(k)})}_{k \geq 0}$ and to derive the asymptotics of the stopping times $T_{ESC}^{(k, K)}$ inductively. This proves Corollary 10.

To extract the limiting process on the time scale $1/(K μ_{K}^{L})$ for fixed $L > α$, take an initial configuration of this stability degree, i.e. $\mathbf{v} \in S^{L}$. Considering the jump chain ${(\mathbf{v}^{(k)})}_{k \geq 0}$ with $\mathbf{v}^{(0)} = \mathbf{v}$, Assumption 3 implies that, with probability one, ${(\mathbf{v}^{(k)})}_{k \geq 0}$ reaches an ESC of stability degree at least $L$ within finitely many steps. We now consider such a finite path $Γ : \mathbf{v} \to \mathbf{w}$ in $G_{ESC}$, where $L(\mathbf{w}) \geq L$. Without loss of generality, we may assume that the intermediate ESCs are of strictly lower stability degree, i.e. $L(Γ_{i}) < L$ for all $1 \leq i < |Γ|$; otherwise we could shorten the path. The time $T_{Γ}^{K}$ it takes to transition from $\mathbf{v}$ to $\mathbf{w}$ along $Γ$ is simply the sum of the single-step transition times $T^{(i, K)} - T^{(i - 1, K)}$. Writing $L^{(i)} := L(Γ_{i - 1})$, Corollary 10 shows that, on the time scale $1/(K μ_{K}^{L^{(i)}})$, these are well approximated by exponential random variables $E_{\pm}^{(i, K)}$. Since $L = L^{(1)} > L^{(i)}$ for $2 \leq i \leq |Γ|$ and $μ_{K} \to 0$, all later steps take place on strictly shorter time scales, so the rescaled transition time $T_{Γ}^{K} K μ_{K}^{L}$ is dominated by the very first transition and is thus well described by exponential random variables.
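The domination by the first transition can be written out as a short scaling computation. This is a sketch under our reading of Corollary 10: the $i$-th step takes time approximately $E^{(i)} / (R^{(i)} K μ_{K}^{L^{(i)}})$, with $E^{(i)}$ standard exponential and $L^{(i)}$ the stability degree of the ESC left at step $i$, and the $O(\ln K)$ relaxation times between steps are neglected because they vanish after rescaling.

```latex
K\mu_K^{L}\,T_\Gamma^{K}
  \approx \sum_{i=1}^{|\Gamma|} \frac{E^{(i)}}{R^{(i)}}\,\mu_K^{\,L-L^{(i)}}
  = \frac{E^{(1)}}{R^{(1)}} + \sum_{i=2}^{|\Gamma|} \frac{E^{(i)}}{R^{(i)}}\,\mu_K^{\,L-L^{(i)}}
  \;\xrightarrow[K\to\infty]{}\; \frac{E^{(1)}}{R^{(1)}}
```

Every term with $i \geq 2$ carries a positive power $μ_{K}^{L - L^{(i)}}$ and therefore vanishes, so the limit is exponential with rate $R^{(1)} = R(\mathbf{v})$.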

To compute the respective transition rates, note that, by Corollary 10, on the time scale $1/(K μ_{K}^{L})$ the rate to escape from $\mathbf{v} = Γ_{0}$ is given by $R(\mathbf{v}) = R^{(1)}$. Moreover, we have to take into account that we consider the case where the limit process ${(\mathbf{v}^{(i)})}_{i \geq 0}$ follows a particular path, i.e. $\mathbf{v}^{(i)} = Γ_{i}$ for $0 \leq i \leq |Γ|$. The probability of this event is simply the product of the one-step probabilities $p(\mathbf{v}^{(i - 1)}, \mathbf{v}^{(i)})$. As in the previous arguments, there may be several different paths $Γ : \mathbf{v} \to \mathbf{w}$, and hence we add up their probabilities. This yields the rates $R^{L}(\mathbf{v}, \mathbf{w})$ in (41) and therefore the claimed dynamics of the jump process ${(\mathbf{v}^{L}(t))}_{t \in [0, T]}$ on the $L$-scale graph $G^{L}$.
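The sum over paths in $R^{L}(\mathbf{v}, \mathbf{w})$ can be evaluated without enumerating paths: conditioning on the first step, it equals $R(\mathbf{v})$ times the probability that the ESC jump chain started at $\mathbf{v}$ first enters the set of ESCs of stability degree at least $L$ at $\mathbf{w}$, and these first-passage probabilities solve a linear system over the lower-degree ESCs. The Python sketch below follows this first-passage reformulation, which is ours rather than the paper's; it assumes, as Assumption 3 guarantees, that the high-degree set is reached with probability one. The meta graph, escape rates and degrees are hypothetical.

```python
import numpy as np

def scale_L_rates(start, p, R, degree, L):
    """R^L(start, w) = R[start] * sum over paths start -> w of products of p,
    where intermediate ESCs have degree < L and w is the first ESC with degree >= L.

    p: one-step transition matrix of the ESC jump chain, R: escape rates R(v),
    degree: stability degrees L(v). Indices stand for ESCs (hypothetical labels).
    """
    p = np.asarray(p, dtype=float)
    n = p.shape[0]
    stable = np.array([w for w in range(n) if degree[w] >= L], dtype=int)
    low = np.array([u for u in range(n) if degree[u] < L], dtype=int)
    if low.size:
        # h[u, w]: probability that the chain from low-degree u first hits the stable set at w
        h = np.linalg.solve(np.eye(low.size) - p[np.ix_(low, low)], p[np.ix_(low, stable)])
        first_hit = p[start, stable] + p[start, low] @ h
    else:
        first_hit = p[start, stable]
    return {int(w): R[start] * first_hit[k] for k, w in enumerate(stable)}

# Hypothetical meta graph: ESCs 0 and 2 have degree 3, ESC 1 has degree 1.
p = [[0.0, 0.6, 0.4],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
rates = scale_L_rates(start=0, p=p, R=[2.0, 5.0, 1.0], degree=[3, 1, 3], L=3)
# rates[0] = 2 * 0.6 * 0.5 = 0.6 (return via ESC 1), rates[2] = 2 * (0.4 + 0.6 * 0.5) = 1.4
```

A nonzero entry at the starting ESC records excursions through lower-degree ESCs that return to it, which are invisible on the $L$-scale.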

To finally deduce the limit of the rescaled population process $N^{K}/K$, we note that there is no macroscopic evolution during almost the entire waiting time for a transition on $G^{L}$. The set of macroscopic traits $\{v \in V : β_{v}^{K}(t) > 1 - ε_{K}\}$ only changes after a new mutant fixates, which happens at time $T_{fix}^{(1, K)}$. The rest of the transition time, during which the macroscopic traits may change many times, vanishes after rescaling time by $K μ_{K}^{L}$. Therefore, we obtain the limit process of Theorem 11, which jumps between the Lotka–Volterra equilibria associated to the state of ${(\mathbf{v}^{L}(t))}_{t \in [0, T]}$.

Acknowledgements

This work was partially supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy GZ 2047/1, Projekt-ID 390685813 and GZ 2151, Project-ID 390873048 and through Project-ID 211504053 - SFB 1060. The authors thank Anton Bovier for stimulating discussion and feedback on the manuscript. Moreover, the authors are very grateful to the anonymous referees for their numerous comments and questions, which helped to greatly improve the manuscript.

Appendix A Technical results

The aim of this appendix is to collect some results on the behaviour of the population process on the $O(1)$- and $O(\ln K)$-time scales. While Sect. A.1 explains the form of $λ(ρ)$, Sect. A.2 justifies the notation $v_{ESC} (v, v)$. The statements have been derived in Bovier et al. (2019) and Coquille et al. (2021), to which we refer for detailed proofs.

A.1 Excursions of subcritical birth death processes

The first lemma quantifies the mean number of birth events before a subcritical birth-death process goes extinct, which corresponds to $λ(ρ)$. Although we restate an existing result here, we provide a short proof below. This proof differs from the more general argument cited in Bovier et al. (2019) and gives the reader some intuition for the expression.

Lemma 17

(Bovier et al. 2019, Lemma A.3) Consider a subcritical linear birth-death process with individual birth and death rates $0 < b < d$. Denote by $Z$ the total number of birth events during an excursion of this process started with exactly one individual. Then, for $k \in \mathbb{N}_{0}$,

$$ p^{(b, d)}(k) := P(Z = k) = \frac{(2k)!}{k!\,(k + 1)!} \left(\frac{b}{b + d}\right)^{k} \left(\frac{d}{b + d}\right)^{k + 1} \tag{A1} $$

and in particular

$$ e^{(b, d)} := \mathbb{E}[Z] = \sum_{k = 1}^{\infty} \frac{(2k)!}{(k - 1)!\,(k + 1)!} \left(\frac{b}{b + d}\right)^{k} \left(\frac{d}{b + d}\right)^{k + 1}. \tag{A2} $$

Moreover, we have the following continuity result: there exist constants $c, ε_{0} > 0$ such that, for all $0 < ε < ε_{0}$ and $0 < b_{i} < d_{i}$, if $|b_{1} - b_{2}| < ε$ and $|d_{1} - d_{2}| < ε$, then

$$ \left|e^{(b_{1}, d_{1})} - e^{(b_{2}, d_{2})}\right| < cε. $$

Remark 17

Note that (A2) corresponds to (19) via $e^{(b, d)} = λ (ρ)$ , where $ρ = b / (b + d)$ .
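A quick numerical check of (A1) and (A2): the Python sketch below evaluates the probabilities through the Catalan recursion $C_{k} = C_{k - 1} \cdot 2(2k - 1)/(k + 1)$, which avoids huge factorials, and compares the truncated mean with the closed form $e^{(b, d)} = b/(d - b)$, equivalently $λ(ρ) = ρ/(1 - 2ρ)$. This closed form is the standard mean total progeny $d/(d - b)$ of a Galton–Watson tree with geometric offspring of mean $b/d$, minus the founder. The rates used are hypothetical.

```python
from math import comb

def excursion_pmf(b, d, k_max):
    """[P(Z = 0), ..., P(Z = k_max)] from (A1), via the Catalan recursion."""
    rho = b / (b + d)
    probs = [1.0 - rho]                   # k = 0: the founder dies before any birth
    for k in range(1, k_max + 1):
        # C_k / C_{k-1} = 2(2k - 1) / (k + 1); each extra birth adds a factor rho (1 - rho)
        probs.append(probs[-1] * 2 * (2 * k - 1) / (k + 1) * rho * (1 - rho))
    return probs

def excursion_pmf_direct(b, d, k):
    """(A1) evaluated literally with binomial coefficients; fine for small k."""
    rho = b / (b + d)
    return comb(2 * k, k) // (k + 1) * rho**k * (1 - rho) ** (k + 1)

b, d = 1.0, 3.0                           # hypothetical subcritical rates, rho = 1/4
probs = excursion_pmf(b, d, k_max=500)
total = sum(probs)                        # should be 1 up to truncation
mean = sum(k * q for k, q in enumerate(probs))
closed_form = b / (d - b)                 # e^{(b,d)} = 0.5 for these rates
```

The truncation at `k_max = 500` is harmless here because the terms decay like $(4ρ(1 - ρ))^{k} = 0.75^{k}$ up to a polynomial factor.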

Proof

Although the process runs in continuous time, it suffices to consider the birth and death events as a jump chain in discrete time. This is a simple random walk on $\mathbb{N}_{0}$ with transition probabilities

$$ p(x, x + 1) = \frac{b}{b + d}, \qquad p(x, x - 1) = \frac{d}{b + d} \qquad \forall x \geq 1 $$

and absorbing state 0. From this point of view, it only remains to count the paths leading from one individual to extinction that consist of exactly $k$ births and hence $k + 1$ deaths. The last event must be a death, since the population does not vanish before. So the first $2k$ events form a walk from 1 to 1 that stays strictly positive. There are $\binom{2k}{k}$ walks of length $2k$ from 1 to 1, but some of them would lead to extinction earlier. To determine their number, we apply a reflection principle as follows. Let $x = (x_{0}, x_{1}, \dots, x_{2k})$ be a path from 1 to 1 for which there exists $0 < j < 2k$ with $x_{j} = 0$, and let $j$ be the smallest such index. We define the partially reflected path $\tilde{x}$ by

$$ \tilde{x}_{i} := \begin{cases} x_{i} & \text{for } i \leq j, \\ -x_{i} & \text{for } i > j. \end{cases} $$

This gives a unique path from $\tilde{x}_{0} = 1$ to $\tilde{x}_{2k} = -1$ (cf. Fig. 13). Moreover, there is a one-to-one correspondence between prematurely extinct paths and paths leading from 1 to $-1$. The latter consist of only $k - 1$ births, and hence there are $\binom{2k}{k - 1}$ of them. Finally, the total number of admissible paths is

$$ \#\{x = (x_{0}, x_{1}, \dots, x_{2k}) : x_{0} = 1,\ x_{2k} = 1,\ x_{i} > 0 \ \forall i\} = \binom{2k}{k} - \binom{2k}{k - 1} = \frac{(2k)!}{k!\,(k + 1)!}. $$

We obtain (A1) by multiplying with the probability $\left(\frac{b}{b + d}\right)^{k} \left(\frac{d}{b + d}\right)^{k + 1}$ of a given sequence of $k$ births and $k + 1$ deaths. The last statement is a simple consequence of the mean value theorem. $□$

Fig. 13 — Original path x that prematurely goes extinct and its reflection $\tilde{x}$
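The counting step of the proof can be verified by brute force for small $k$. The Python sketch enumerates all $\pm 1$ walks of length $2k$ that start and end at 1, separates those that stay strictly positive from those that touch 0, and compares the two counts with $\binom{2k}{k} - \binom{2k}{k - 1}$ and $\binom{2k}{k - 1}$, respectively.

```python
from itertools import product
from math import comb

def classify_walks(k):
    """Split the walks of length 2k from 1 to 1 into (strictly positive, touching 0)."""
    positive = touching = 0
    for steps in product((1, -1), repeat=2 * k):
        if sum(steps) != 0:               # must end where it started
            continue
        x = 1
        for s in steps:
            x += s
            if x == 0:                    # premature extinction
                touching += 1
                break
        else:                             # loop finished without hitting 0
            positive += 1
    return positive, touching

for k in range(1, 8):
    pos, touch = classify_walks(k)
    assert pos == comb(2 * k, k) - comb(2 * k, k - 1) == comb(2 * k, k) // (k + 1)
    assert touch == comb(2 * k, k - 1)    # reflection: walks from 1 to -1
```

Since a walk can only become negative by passing through 0, checking `x == 0` suffices to detect premature extinction.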

A.2 Fast evolution until ESC

In this subsection we discuss the first phase of evolution, where an ESC is obtained on the $ln K$ -time scale. The convergence of $N^{K} (t ln K) / K$ and $β^{K} (t ln K)$ , as $K \to \infty$ , is studied in Coquille et al. (2021). In the following we cite the respective results in the notation of this paper.

For a finite graph $G = (V, E)$ and under Assumptions 1 and 2, the trajectories $(β_{w} (t), w \in V)$ (which turn out to be the limit of $(β_{w}^{K} (t ln K), w \in V)$ ) are defined by an inductive procedure. The construction is valid until a stopping time $T_{0}$ .

Denote by ${\tilde{v}}^{(ℓ)}$ , $ℓ \geq 0$ , the sequence of consecutive coexisting resident traits. We emphasize that these are not to be confused with the sequence of resident traits $v^{(k)}$ , $k \geq 0$ , that are associated to ESCs. The invasion times, at which the sets of resident traits change due to upcoming mutant traits, are denoted by the increasing sequence ${(s_{ℓ})}_{ℓ \geq 0}$ .

For initial conditions $\tilde{β}(0)$, the support of the unique asymptotically stable equilibrium of the Lotka–Volterra system (2) associated to the traits $\{w \in V : {\tilde{β}}_{w}(0) = 1\}$ (if it exists) is denoted by ${\tilde{v}}^{(0)}$. The equilibrium $\bar{n}({\tilde{v}}^{(0)})$ is reached within a time of order 1, and we set $s_{0} := 0$. Moreover, we define $β_{w}(0) := \max_{u \in V} [{\tilde{β}}_{u}(0) - d(u, w)/α]_{+}$ as the initial condition of the limiting trajectories. This reflects that, within a time of order 1, living traits produce neighbouring mutant populations whose size is a $μ_{K}$-fraction of their own size. This time of order 1 is negligible on the $\ln K$-time scale on which the limit $β$ is defined.
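The definition of $β_{w}(0)$ is easy to evaluate on a concrete graph. The Python sketch computes graph distances by breadth-first search, under our assumption that $d(u, w)$ is the length of a shortest directed path along the edges in $E$, and then takes the maximum over source traits; unreachable pairs contribute 0. The graph, $α$ and the values of $\tilde{β}(0)$ are hypothetical.

```python
from collections import deque

def distances_from(source, edges):
    """Directed graph distances d(source, w) via breadth-first search."""
    adjacency = {}
    for u, w in edges:
        adjacency.setdefault(u, []).append(w)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adjacency.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def initial_beta(vertices, edges, beta_tilde, alpha):
    """beta_w(0) = max_u [beta_tilde_u(0) - d(u, w) / alpha]_+ ; unreachable w contribute 0."""
    beta = {w: 0.0 for w in vertices}     # starting from 0 implements the positive part
    for u in vertices:
        for w, d_uw in distances_from(u, edges).items():
            beta[w] = max(beta[w], beta_tilde[u] - d_uw / alpha)
    return beta

# Hypothetical line graph a -> b -> c -> d with alpha = 2.5.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]
beta0 = initial_beta(vertices, edges, {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}, alpha=2.5)
# beta0 is approximately {"a": 1.0, "b": 0.6, "c": 0.2, "d": 0.0}
```

Each mutational step costs $1/α$ on the logarithmic scale, so a trait at distance greater than $α$ from every living trait starts with $β_{w}(0) = 0$.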

Assuming that $s_{ℓ - 1}$, ${\tilde{v}}^{(ℓ - 1)}$ with $LVE_{+}({\tilde{v}}^{(ℓ - 1)}) = \bar{n}({\tilde{v}}^{(ℓ - 1)})$, and $β(s_{ℓ - 1})$ are known, the next phase can be described as follows. The $ℓ$-th invasion time is set to

$$ s_{ℓ} := \inf\{t > s_{ℓ - 1} : \exists w \notin {\tilde{v}}^{(ℓ - 1)} : β_{w}(t) = 1\}. $$

For $s_{ℓ - 1} \leq t \leq s_{ℓ}$ , for any $w \in V$ , $β_{w} (t)$ is defined by

$$ β_{w}(t) := \max_{u \in V} \left[β_{u}(s_{ℓ - 1}) + (t - t_{u, ℓ} \land t) f(u, {\tilde{v}}^{(ℓ - 1)}) - \frac{d(u, w)}{α}\right] \lor 0, $$

where, for any $w \in V$ ,

$$ t_{w, ℓ} := \begin{cases} \inf\{t \geq s_{ℓ - 1} : \exists u \in V : d(u, w) = 1,\ β_{u}(t) = \frac{1}{α}\} & \text{if } β_{w}(s_{ℓ - 1}) = 0, \\ s_{ℓ - 1} & \text{else} \end{cases} $$

is the first time in $[s_{ℓ - 1}, s_{ℓ}]$ at which this trait arises. If we define $V_{living}(t) := \{w \in V : β_{w}(t) > 0\}$ analogously to $V_{living}^{K}$ (on the $\ln K$-time scale), then this implies $β_{w}(t_{w, ℓ}) \geq 0$ and $β_{w}(t_{w, ℓ} + δ) > 0$ for small $δ > 0$.

The stopping time $T_{0}$, which terminates the inductive construction of the limiting trajectories, is set to $s_{ℓ}$ if one of the following conditions holds:

There is more than one $w \in V \setminus {\tilde{v}}^{(ℓ - 1)}$ such that $β_{w}(s_{ℓ}) = 1$;
The mutation-free Lotka–Volterra system associated to ${\tilde{v}}^{(ℓ - 1)}$ and the unique $w \in V \setminus {\tilde{v}}^{(ℓ - 1)}$ such that $β_{w}(s_{ℓ}) = 1$ does not have a unique globally attractive stable equilibrium (in particular, if such an equilibrium does not exist for $\{w \in V : {\tilde{β}}_{w}(0) = 1\}$, $T_{0}$ is set to 0);
There exists $w \in V \setminus {\tilde{v}}^{(ℓ - 1)}$ such that $β_{w}(s_{ℓ}) = 0$ and $β_{w}(s_{ℓ} - δ) > 0$ for all $δ > 0$ small enough;
There exists $w \in V \setminus {\tilde{v}}^{(ℓ - 1)}$ such that $s_{ℓ} = t_{w, ℓ}$.

These conditions are mostly technical and are discussed in Coquille et al. (2021).

With this construction, the results can be stated as follows:

Theorem 18

(Coquille et al. 2021, Theorem 2.7) Let $G = (V, E)$ be a finite graph. Suppose that Assumptions 1 and 2 hold and consider the model defined by (1) with $μ_{K} = K^{-1/α}$. Let ${\tilde{v}}_{0} \subset V$ and assume that, for every $w \in V$,

$$ β_{w}^{K}(0) \to {\tilde{β}}_{w}(0) \quad (K \to \infty) \text{ in probability}. \tag{A10} $$

Then, for all $T > 0$, as $K \to \infty$, the sequence $((β_{w}^{K}(t \ln K), w \in V), t \in [0, T \land T_{0}])$ converges in probability in $D([0, T \land T_{0}], \mathbb{R}_{+}^{V})$ to the deterministic, piecewise affine, continuous function $((β_{w}(t), w \in V), t \in [0, T \land T_{0}])$ defined in (A8).

Theorem 19

(Coquille et al. 2021, Proposition 2.8) Under the same assumptions as in Theorem 18, for all $T > 0$, as $K \to \infty$, the sequence $((N_{w}^{K}(t \ln K)/K, w \in V), t \in [0, T \land T_{0}])$ converges in the sense of finite-dimensional distributions to a deterministic jump process $((N_{w}(t), w \in V), t \in [0, T \land T_{0}])$, which jumps between different Lotka–Volterra equilibria according to

$$ N_{w}(t) := \sum_{ℓ \in \mathbb{N} : s_{ℓ + 1} \leq T_{0}} \mathbf{1}_{\{s_{ℓ} \leq t < s_{ℓ + 1}\}} \mathbf{1}_{\{w \in {\tilde{v}}^{(ℓ)}\}} {\bar{n}}_{w}({\tilde{v}}^{(ℓ)}). \tag{A11} $$

Moreover, the invasion times $s_{ℓ}$ and the times $t_{w, ℓ}$ at which new mutants arise are calculated precisely in Coquille et al. (2021). These are, however, not needed for the discussion in this paper.

We note that the constructed trajectories $(β_{w}(t), w \in V)$ become constant precisely once an ESC is reached. From then on, there is no more visible evolution on the $\ln K$-time scale.

Funding

Open Access funding enabled and organized by Projekt DEAL.


References

Athreya KB, Ney PE (1972) Branching processes. Die Grundlehren der mathematischen Wissenschaften, vol 196. Springer-Verlag, New York-Heidelberg
Baar M, Bovier A, Champagnat N (2017) From stochastic, individual-based models to the canonical equation of adaptive dynamics in one step. Ann Appl Probab 27(2):1093–1170. 10.1214/16-AAP1227
Berestycki J, Brunet E, Shi Z (2016) The number of accessible paths in the hypercube. Bernoulli 22(2):653–68. 10.3150/14-BEJ641
Berestycki J, Brunet E, Shi Z (2017) Accessibility percolation with backsteps. ALEA Lat Am J Probab Math Stat 14(1):45–62
Bolker B, Pacala SW (1997) Using moment equations to understand stochastically driven spatial pattern formation in ecological systems. Theor Popul Biol 52(3):179–19. 10.1006/tpbi.1997.1331
Bovier A (2021) Stochastic models for adaptive dynamics: scaling limits and diversity. In: Baake E, Wakolbinger A (eds) Probabilistic structures in evolution. EMS series of congress reports, vol 17. EMS Press, Berlin, pp 127–149
Bovier A, den Hollander F (2015) Metastability: a potential-theoretic approach. Grundlehren der Mathematischen Wissenschaften, vol 351. Springer, Cham Heidelberg New York Dordrecht London
Bovier A, Coquille L, Neukirch R (2018) The recovery of a recessive allele in a Mendelian diploid model. J Math Biol 77(4):971–103. 10.1007/s00285-018-1240-z
Bovier A, Coquille L, Smadi C (2019) Crossing a fitness valley as a metastable transition in a stochastic population model. Ann Appl Probab 29(6):3541–358. 10.1214/19-AAP1487
Champagnat N (2006) A microscopic interpretation for adaptive dynamics trait substitution sequence models. Stoch Process Appl 116(8):1127–116. 10.1016/j.spa.2006.01.004
Champagnat N, Méléard S (2011) Polymorphic evolution sequence and evolutionary branching. Probab Theory Relat Fields 151(1–2):45–9. 10.1007/s00440-010-0292-9
Champagnat N, Méléard S, Tran VC (2021) Stochastic analysis of emergence of evolutionary cyclic behavior in population dynamics with transfer. Ann Appl Probab 31(4):1820–1867
Cirillo ENM, Nardi FR (2013) Relaxation height in energy landscapes: an application to multiple metastable states. J Stat Phys 150:1080–1114
Coquille L, Kraut A, Smadi C (2021) Stochastic individual-based models with power law mutation rate on a general finite trait space. Electron J Probab 26:1–37
Dawson DA, Greven A (2014) Spatial Fleming-Viot models with selection and mutation. Lecture notes in mathematics, vol 2092. Springer
De Visser JAG, Krug J (2014) Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet 15(7):480–490
Ethier SN, Kurtz TG (1986) Markov processes. Wiley Ser Probab Math Stat. Wiley, New York. 10.1002/9780470316658
Fournier N, Méléard S (2004) A microscopic probabilistic description of a locally regulated population and macroscopic approximations. Ann Appl Probab 14(4):1880–191. 10.1214/105051604000000882
Gillespie DT (1976) A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 22(4):403–43. 10.1016/0021-9991(76)90041-3
Gillespie JH (1984) Molecular evolution over the mutational landscape. Evolution 38(5):1116–1129
Gokhale CS, Iwasa Y, Nowak MA et al (2009) The pace of evolution across fitness valleys. J Theor Biol 259(3):613–620
Jain K (2007) Evolutionary dynamics of the most populated genotype on rugged fitness landscapes. Phys Rev E Stat Nonlinear Soft Matter Phys 76(3):031922
Jain K, Krug J (2005) Evolutionary trajectories in rugged fitness landscapes. J Stat Mech Theory Exp 4:400. 10.1088/1742-5468/2005/04/p04008
Jain K, Krug J (2007) Deterministic and stochastic regimes of asexual evolution on rugged fitness landscapes. Genetics 175:1275–8. 10.1534/genetics.106.067165
Komarova NL (2007) Loss- and gain-of-function mutations in cancer: mass-action, spatial and hierarchical models. J Stat Phys 128:413–446
Kraut A, Bovier A (2019) From adaptive dynamics to adaptive walks. J Math Biol 79(5):1699–174. 10.1007/s00285-019-01408-6
Krug J (2021) Accessibility percolation in random fitness landscapes. In: Baake E, Wakolbinger A (eds) Probabilistic structures in evolution. EMS series of congress reports, vol 17. EMS Press, Berlin, pp 1–22
Martincorena I, Raine KM, Gerstung M et al (2017) Universal patterns of selection in cancer and somatic tissues. Cell 171(5):1029–1041
Metz JAJ, Nisbet RM, Geritz SAH (1992) How should we define ‘fitness’ for general ecological scenarios? Trends Ecol Evol 7(6):198–202
Neidhart J, Krug J (2011) Adaptive walks and extreme value theory. Phys Rev Lett 107(178):10. 10.1103/PhysRevLett.107.178102
Nicholson M, Antal T (2019) Competing evolutionary paths in growing populations with applications to multidrug resistance. PLoS Comput Biol 15(4):e1006866
Nowak S, Krug J (2015) Analysis of adaptive walks on NK fitness landscapes with different interaction schemes. J Stat Mech Theory Exp 6:06014. 10.1088/1742-5468/2015/06/p06014
Orr HA (2003) A minimum on the mean number of steps taken in adaptive walks. J Theor Biol 220(2):241–24. 10.1006/jtbi.2003.3161
Pallen MNM (2006) From the origin of species to the origin of bacterial flagella. Nat Rev Microbiol 4:784–790
Schmiegelt B, Krug J (2014) Evolutionary accessibility of modular fitness landscapes. J Stat Phys 154(1–2):334–35. 10.1007/s10955-013-0868-8
Smadi C (2017) The effect of recurrent mutations on genetic diversity in a large population of varying size. Acta Appl Math 149:11–5. 10.1007/s10440-016-0086-x

[CR1] Athreya KB, Ney PE (1972) Branching processes. Die Grundlehren der mathematischen Wissenschaften, vol 196, Springer-Verlag, New York-Heidelberg

[CR2] Baar M, Bovier A, Champagnat N (2017) From stochastic, individual-based models to the canonical equation of adaptive dynamics in one step. Ann Appl Probab 27(2):1093–1170. 10.1214/16-AAP1227

[CR3] Berestycki J, Brunet E, Shi Z (2016) The number of accessible paths in the hypercube. Bernoulli 22(2):653–68. 10.3150/14-BEJ641

[CR4] Berestycki J, Brunet E, Shi Z (2017) Accessibility percolation with backsteps. ALEA Lat Am J Probab Math Stat 14(1):45–62

[CR5] Bolker B, Pacala SW (1997) Using moment equations to understand stochastically driven spatial pattern formation in ecological systems. Theor Popul Biol 52(3):179–19. 10.1006/tpbi.1997.1331

[CR6] Bovier A (2021) Stochastic models for adaptive dynamics: scaling limits and diversity. In: Baake E, Wakolbinger A (eds) Probabilistic structures in evolution, EMS series of congress reports, vol 17. EMS Press, Berlin, pp 127–149

[CR7] Bovier A, den Hollander F (2015) Metastability: a potential-theoretic approach. Grundlehren der Mathematischen Wissenschaften, vol 351. Springer, Cham Heidelberg New York Dordrecht London

[CR8] Bovier A, Coquille L, Neukirch R (2018) The recovery of a recessive allele in a Mendelian diploid model. J Math Biol 77(4):971–103. 10.1007/s00285-018-1240-z

[CR9] Bovier A, Coquille L, Smadi C (2019) Crossing a fitness valley as a metastable transition in a stochastic population model. Ann Appl Probab 29(6):3541–358. 10.1214/19-AAP1487

[CR10] Champagnat N (2006) A microscopic interpretation for adaptive dynamics trait substitution sequence models. Stoch Process Appl 116(8):1127–116. 10.1016/j.spa.2006.01.004

[CR11] Champagnat N, Méléard S (2011) Polymorphic evolution sequence and evolutionary branching. Probab Theory Relat Fields 151(1–2):45–9. 10.1007/s00440-010-0292-9

[CR12] Champagnat N, Méléard S, Tran VC (2021) Stochastic analysis of emergence of evolutionary cyclic behavior in population dynamics with transfer. Ann Appl Probab 31(4):1820–1867

[CR13] Cirillo ENM, Nardi FR (2013) Relaxation height in energy landscapes: an application to multiple metastable states. J Stat Phys 150:1080–1114

[CR14] Coquille L, Kraut A, Smadi C (2021) Stochastic individual-based models with power law mutation rate on a general finite trait space. Electron J Probab 26:1–37

[CR15] Dawson DA, Greven A (2014) Spatial Fleming-Viot models with selection and mutation. Lecture notes in mathematics, vol 2092. Springer

[CR16] De Visser JAG, Krug J (2014) Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet 15(7):480–490

[CR17] Ethier SN, Kurtz TG (1986) Markov processes. Wiley Ser Probab Math Stat. Wiley, New York. 10.1002/9780470316658

[CR18] Fournier N, Méléard S (2004) A microscopic probabilistic description of a locally regulated population and macroscopic approximations. Ann Appl Probab 14(4):1880–191. 10.1214/105051604000000882

[CR19] Gillespie DT (1976) A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 22(4):403–43. 10.1016/0021-9991(76)90041-3

[CR20] Gillespie JH (1984) Molecular evolution over the mutational landscape. Evolution 38(5):1116–1129

[CR21] Gokhale CS, Iwasa Y, Nowak MA et al (2009) The pace of evolution across fitness valleys. J Theor Biol 259(3):613–620

[CR22] Jain K (2007) Evolutionary dynamics of the most populated genotype on rugged fitness landscapes. Phys Rev E Stat Nonlinear Soft Matter Phys 76(3):031922

[CR23] Jain K, Krug J (2005) Evolutionary trajectories in rugged fitness landscapes. J Stat Mech Theory Exp 4:400. 10.1088/1742-5468/2005/04/p04008

[CR24] Jain K, Krug J (2007) Deterministic and stochastic regimes of asexual evolution on rugged fitness landscapes. Genetics 175:1275–8. 10.1534/genetics.106.067165

[CR25] Komarova NL (2007) Loss- and gain-of-function mutations in cancer: mass-action, spatial and hierarchical models. J Stat Phys 128:413–446

[CR26] Kraut A, Bovier A (2019) From adaptive dynamics to adaptive walks. J Math Biol 79(5):1699–174. 10.1007/s00285-019-01408-6

[CR27] Krug J (2021) Accessibility percolation in random fitness landscapes. In: Baake E, Wakolbinger A (eds) Probabilistic structures in evolution. EMS series of congress reports, vol 17. EMS Press, Berlin, pp 1–22

[CR28] Martincorena I, Raine KM, Gerstung M et al (2017) Universal patterns of selection in cancer and somatic tissues. Cell 171(5):1029–1041

[CR29] Metz JAJ, Nisbet RM, Geritz SAH (1992) How should we define ‘fitness’ for general ecological scenarios? Trends Ecol Evol 7(6):198–202

[CR30] Neidhart J, Krug J (2011) Adaptive walks and extreme value theory. Phys Rev Lett 107(178):10. 10.1103/PhysRevLett.107.178102

[CR31] Nicholson M, Antal T (2019) Competing evolutionary paths in growing populations with applications to multidrug resistance. PLoS Comput Biol 15(4):e1006866

[CR32] Nowak S, Krug J (2015) Analysis of adaptive walks on NK fitness landscapes with different interaction schemes. J Stat Mech Theory Exp 6:06014. 10.1088/1742-5468/2015/06/p06014

[CR33] Orr HA (2003) A minimum on the mean number of steps taken in adaptive walks. J Theor Biol 220(2):241–24. 10.1006/jtbi.2003.3161

[CR34] Pallen MNM (2006) From the origin of species to the origin of bacterial flagella. Nat Rev Microbiol 4:784–790

[CR35] Schmiegelt B, Krug J (2014) Evolutionary accessibility of modular fitness landscapes. J Stat Phys 154(1–2):334–35. 10.1007/s10955-013-0868-8

[CR36] Smadi C (2017) The effect of recurrent mutations on genetic diversity in a large population of varying size. Acta Appl Math 149:11–5. 10.1007/s10440-016-0086-x

PERMALINK