The Rate at Which Asexual Populations Cross Fitness Valleys

Daniel B Weissman; Michael M Desai; Daniel S Fisher; Marcus W Feldman

doi:10.1016/j.tpb.2009.02.006

Author manuscript; available in PMC: 2010 Nov 26.

Published in final edited form as: Theor Popul Biol. 2009 Mar 13;75(4):286–300. doi: 10.1016/j.tpb.2009.02.006

PMCID: PMC2992471 NIHMSID: NIHMS102575 PMID: 19285994

Abstract

Complex traits often involve interactions between different genetic loci. This can lead to sign epistasis, whereby a set of mutations are individually deleterious or neutral but in combination confer a fitness benefit. In order to acquire the beneficial genotype, an asexual population must cross a fitness valley or plateau by first acquiring the deleterious or neutral intermediates. Here, we present a complete, intuitive theoretical description of the valley-crossing process across the full spectrum of possible parameter regimes. We calculate the rate at which a population crosses a fitness valley or plateau of arbitrary width, as a function of the mutation rates, the population size, and the fitnesses of the intermediates. We find that when intermediates are close to neutral, a large population can cross even wide fitness valleys remarkably quickly, so that valley-crossing dynamics may be common even when mutations that directly increase fitness are also possible. Thus the evolutionary dynamics of large populations can be sensitive to the structure of an extended region of the fitness landscape – the population may be likely to pass up directly uphill paths in favor of paths across valleys and plateaus that lead eventually to fitter genotypes. In smaller populations, we find that below a threshold size which depends on the width of the fitness valley and the strength of selection against intermediate genotypes, valley-crossing is much less likely and hence the evolutionary dynamics are less influenced by distant regions of the fitness landscape.

Keywords: Sign Epistasis, Fitness Valleys, Adaptive Evolution

I. INTRODUCTION

Complex traits derive their complexity, in part, from the interactions between multiple genes. This complicates the quantitative description of the evolution of these traits. In some cases, complex phenotypes may evolve through the accumulation of a number of individually beneficial mutations. In others, however, advantageous traits could require multiple mutations in different genes, each of which may be individually neutral or deleterious in the absence of the other mutations. For example, the evolution of a new function in a signal pathway may require mutations in the genes for both a receptor and the corresponding ligand, or in a series of receptor-ligand pairs involved in the pathway (Goh et al., 2000). Other examples include types of cancer that typically occur only after a series of mutations (Knudson, 2001), pathogens that require multiple mutations in order to escape their hosts’ immune response (Levin et al., 2000; McDonald and Linde, 2002; Shih et al., 2007), and the evolution of citrate usage in E. coli (Blount et al., 2008).

In order for a population to acquire an adaptation involving multiple mutations that are individually neutral or deleterious, at least some individuals must first acquire the neutral or deleterious intermediate mutations. In the language of fitness landscapes, there is no directly uphill path from the current genotype to one of higher fitness that corresponds to this adaptation. The population must cross a “fitness valley,” or in the case of neutral intermediates a “fitness plateau,” to reach the higher-fitness state. In this way a population can escape a local peak in fitness space (i.e. a genotype in which no single mutation confers a fitness advantage) by producing a more distantly related higher-fitness genotype (Weinreich and Chao, 2005). Valley-crossing dynamics may also be important when the population is not at a local fitness peak. We want to understand more generally the dynamics in situations where both individually advantageous mutations as well as valley-crossing processes are simultaneously possible.

In general, these evolutionary dynamics depend on the full range of possible pathways by which a population can accumulate mutations to produce nearby higher-fitness genotypes. All of the mutation rates and selective pressures of intermediate genotypes affect the dynamics. We refer to this set of possibilities and relevant parameters as the local structure of the fitness landscape. Unfortunately, very little is known about what fitness landscapes are typical in nature, so it is impossible to say what sorts of evolutionary dynamics are most common. Instead, we aim to lay out the various qualitatively different types of dynamics, and to understand which aspects of the fitness landscape determine the relative likelihood of different dynamics. As we will see, this provides a new perspective on what could plausibly be typical in evolution. We find, for example, that a population will not necessarily go directly “uphill” in fitness space even if such a change is possible; sometimes valley-crossing will be more likely.

There are two general ways a population can cross a fitness valley. Each of the intermediates can fix in turn through random drift, until eventually the final mutation provides the advantageous effect. We refer to this process as sequential fixation. Alternatively, intermediates can drift at relatively low frequencies, each such intermediate eventually disappearing, until an individual accumulates a combination of mutations that provides a selective advantage. While recombination can bring together such combinations of mutations in a sexual population, in an asexual population they can only occur through multiple mutation events in a single lineage. This latter process in an asexual population has been dubbed “stochastic tunneling” (Iwasa et al., 2004b). Since it is easier for neutral or deleterious mutations to fix through drift in small populations, we expect that in small enough populations sequential fixation will dominate and stochastic tunneling will not occur. In larger populations, on the other hand, neutral and especially deleterious mutations very rarely fix, so we expect that stochastic tunneling will be more important.

The simplest version of the valley-crossing problem in asexuals is when only two mutations, each individually neutral or deleterious, combine to produce a beneficial trait. Kimura (1985) and Carter and Wagner (2002) analyzed this problem in the context of the evolution of pairs of compensatory mutations. Weinreich and Chao (2005) expanded on this work to analyze the valley-crossing problem in both small and large populations for the case of strongly deleterious single-mutant intermediates. This complements the earlier work of Iwasa et al. (2004b), who focused exclusively on the stochastic tunneling process in large populations, but analyzed neutral or arbitrarily deleterious single-mutant intermediates. Durrett and Schmidt (2008) extended this work by also including valley-crossing in small and intermediate-size populations with neutral single-mutants, although without considering the effect of the strength of selection on the double-mutants. For adaptations requiring more than two mutations, Iwasa et al. (2004a) derived the probability of tunneling in large populations with either neutral or strongly deleterious intermediate mutations. Serra and Haccou (2007) extended this work on adaptations requiring more than two mutations to the case of arbitrarily deleterious intermediates. All of this work is also related to the analysis of Barton and Rouhani (1987), who studied a different kind of fitness valley in which there are multiple stable deterministic equilibria.

In this paper, we provide a complete, intuitive description of the valley-crossing problem in asexuals involving any number of intermediates with arbitrary fitness losses. Earlier results are derived as special cases. For the bulk of the paper, we study how an asexual population traverses a particular fitness valley. That is, we imagine that there is one set of mutations that a population must acquire in one specific order to reach one specific beneficial genotype, and that all mutations away from this specific pathway are strongly deleterious. In analyzing this process, we focus primarily on the tunneling process in large populations, but also study the transition to the small-population regime. Our framework allows us to study not only the probability of stochastic tunneling, but also the dynamics of the intermediate mutations, and hence the time required for the beneficial combination of mutations to arise. Our analysis is very much in the spirit of Karlin (1973), as well as Christiansen et al. (1998), though these authors focused on the case where intermediate mutations were also beneficial. In the Discussion, we consider the situation where multiple valley-crossing and possibly directly uphill pathways are possible, leading to the same or different advantageous genotypes. We explain how our analysis of a single pathway can be applied to this more complex situation.

II. MODEL

We consider an asexual population of N haploid individuals, and study the process by which this population acquires a beneficial trait that requires mutations at K loci. We refer to this as a “K-hit” process. We assume that all combinations of fewer than K of these mutations are neutral or deleterious relative to the initial genotype, and that only when an individual has acquired all K of them do they confer a benefit. For the bulk of this paper, we analyze the process by which an asexual population traverses a particular path through this genotype space to acquire the K-hit beneficial mutation. That is, we study one specific order in which the intermediate mutations can be acquired, and implicitly assume that any mutation away from this specific pathway is strongly deleterious. Given this order, the fitness of an individual with k mutations is

w_{k} \equiv \begin{cases} 1 & k = 0 \\ 1 - δ_{k} & 1 \leq k \leq K - 1 \\ 1 + s & k = K \end{cases},

(1)

where δ_k ≥ 0, and we assume that the K-mutant is substantially but not enormously beneficial, 1/N ⪡ s ⪡ 1. We assume that mutations occur from an individual with k mutations to an individual with k + 1 mutations at rate μ_k. The simplest case, K = 2, is illustrated in Fig. 1a. A case with larger K is illustrated in Fig. 1b. In most realistic situations, there will be a variety of different mutational pathways (e.g. different mutation orders) by which the population can acquire the favored genotype; we show how our analysis describes this more complex situation in the Discussion.

FIG. 1 — An illustration of our model. (a) The simplest case, K = 2. Here the wild-type has fitness 1, and there is a possible double-mutant with fitness 1 + s > 1. The single-mutant, however, has fitness 1 – δ₁ < 1. The mutation rate from the wild-type to the single-mutant is μ₀, while the mutation rate of the single-mutant to the double-mutant is μ₁. (b) A slightly more complex example where K = 3. The wild-type again has fitness 1, single-mutants have fitness 1 – δ₁, double-mutants have fitness 1 – δ₂, and the triple-mutant is advantageous with fitness 1 + s.

Our model also applies to asexual diploids, where the fitness of each mutation refers to its fitness given the existing genotype at the homologous portion (e.g. the fitness effect of a mutation from an aa genotype to an Aa genotype is the fitness difference between these two diploid genotypes). In this diploid case, the first mutation of a series could convert a diploid one-locus genotype aa to Aa, and a later mutation to genotype AA; if allele A is a recessive mutation which confers some fitness advantage, mutation in either homologous allele would be individually neutral, but the two mutations together would be advantageous. In this sense our model is related to the earlier work by Karlin and Tavare (1981).

For much of this paper, we will consider the case where the population is large and the frequencies of the intermediate mutations are always small — this is the regime in which stochastic tunneling is important. Under these conditions, we treat the dynamics of the intermediate mutations using a continuous-time branching process, according to which all individuals with k mutations die at rate 1, split into two identical individuals at rate w_k, and split into two individuals, one of which has an additional mutation, at rate μ_k, with all of these processes occurring independently. We assume back-mutations are rare enough to be neglected. This model makes a variety of important quantities easier to calculate analytically, but it is only valid when each individual mutant can be treated independently, which requires that all mutants are at low frequency in the population. We will also consider the case of smaller population sizes and the transition to the regime where the dominant valley-crossing process is sequential fixation of the intermediate genotypes. In this case, we use standard Wright-Fisher dynamics to describe the evolution (Ewens, 2004, pp. 20-21).
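The branching-process description above translates directly into a stochastic simulation. The following Python sketch is only an illustration under stated assumptions, not the code used for this paper: it follows the k-mutant descendants of a single founder, with each individual dying at rate 1 and splitting at rate w_k = 1 − δ_k. Mutation events at rate μ_k leave the number of k-mutants unchanged, so they are omitted. The function name, the `max_size` guard and the parameter values are hypothetical choices made for the example.

```python
import random

def simulate_lineage(delta, rng, max_size=1_000_000):
    """Simulate one lineage founded by a single mutant with fitness 1 - delta.

    Returns (extinction_time, integrated_size), where integrated_size is the
    time integral of the lineage size n(t) up to extinction.
    """
    w = 1.0 - delta              # splitting rate per individual
    n = 1                        # current number of individuals in the lineage
    t = 0.0
    integrated_size = 0.0
    while n > 0:
        if n > max_size:
            # Assumed guard: the low-frequency approximation is not meant
            # to describe lineages this large.
            raise RuntimeError("lineage exceeded max_size")
        total_rate = n * (1.0 + w)          # deaths (rate 1) plus splits (rate w)
        dt = rng.expovariate(total_rate)    # waiting time to the next event
        t += dt
        integrated_size += n * dt
        if rng.random() < w / (1.0 + w):
            n += 1                          # a split
        else:
            n -= 1                          # a death
    return t, integrated_size
```

For δ > 0 the mean lineage size decays as e^{−δt}, so the average integrated size over many runs should approach 1/δ; this gives a simple sanity check on the simulation. Neutral lineages (δ = 0) always go extinct but occasionally take very long to do so, so runs with δ = 0 can be slow.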

III. HEURISTIC ANALYSIS

In this section we lay out a simple intuitive analysis of the valley-crossing problem, which demonstrates the main ideas of our approach. Our analysis follows the general lines of our earlier discussion in Fisher (2007). We first note that when a beneficial mutant arises, it will usually soon go extinct due to random genetic drift. In our haploid model, there is a probability $\frac{1 - e^{- s}}{1 - e^{- N s}} \approx s$ that it will survive this drift, and eventually fix in the population (Ewens, 2004, p. 99). We call the process by which such a lucky beneficial mutant survives drift the establishment of the beneficial mutant; once a beneficial mutation is established (upon reaching a size of order $\frac{1}{s}$ ), its frequency will increase roughly deterministically until the population is dominated by beneficial mutants. We refer to any mutant (beneficial or not) whose descendants will include a beneficial multiple-mutant that will establish as successful. We wish to calculate the time it takes for a beneficial multiple-mutant to first establish.

We begin by considering the simplest case, where a double mutation increases fitness by s (i.e. K = 2), but each of the single mutants is neutral (i.e. δ₁ = 0). We refer to this as a two-hit process. We initially consider a population so large that the single mutations essentially never fix. In this case, double-mutants can be produced in two ways. A wild-type individual could acquire both mutations in a single generation. Alternatively, a wild-type could acquire just one mutation, and this lineage could drift neutrally until a second mutation within the lineage produces a beneficial double-mutant. As we will see, this latter process is much more likely, so we begin by considering the rate at which this occurs. To do so, we must calculate the probability, p₁, that a single-mutant will be successful (i.e. the probability that an individual in its lineage will acquire a second mutation and establish). The essential property of the single-mutant lineage that determines its probability of producing a double-mutant is its time-integrated population size, ∫n(t)dt, because this is the number of mutational opportunities this lineage presents. We call the value of this integral at time t the “weight” at time t of the single-mutant lineage, and denote it by

W (t) \equiv \int_{0}^{t} n (t^{'}) d t^{'} .

(2)

The total number of mutational opportunities before the lineage goes extinct is $W \equiv \lim_{t \to \infty} W (t) = \int_{0}^{\infty} n (t) d t$ , the total weight of the lineage.

To calculate W, we must understand the dynamics of the single-mutant lineages. In a large population these dynamics are quite simple (Desai and Fisher, 2007). Most of the time the lineage will never reach a substantial size, and will die out within a few generations. Its weight will be of order 1, and the probability that a double mutation occurs and establishes from such a lineage is μ₁s times this weight, and is therefore of order μ₁s. But with probability of order 1/T, the lineage will survive for more than T generations. If it does, its population size n(T) will be of order T. This will produce a weight of order T² (a population size of order T for a time of order T). These dynamics are illustrated in Fig. 2, and justified rigorously below and in Appendix B. The probability that such a lineage gives rise to at least one double-mutant that establishes is thus 1 − e^{−Cμ₁sT²}, where C is an unknown constant of order 1 and we have used the fact that the occurrence of successful mutations is a Poisson process (the factor C reflects the fact that our estimate of the probability a lineage will reach a given weight is only roughly correct; see Appendix B for a precise calculation). This means that lineages that survive longer than $T \sim 1 ∕ \sqrt{μ_{1} s}$ generations (and hence reach size $1 ∕ \sqrt{μ_{1} s}$ individuals) are very likely to produce established double-mutants. Thus with probability of order $\sqrt{μ_{1} s}$ a single-mutant lives long enough that it is extremely likely to produce a double-mutant that establishes. Since the probability of a single-mutant lineage having weight at least T² falls off only as 1/T, while the expected number of double-mutants produced by the single-mutant lineage increases as T² for $T < 1 ∕ \sqrt{μ_{1} s}$ , the rate at which double-mutants are produced is dominated by these rare lucky single-mutant lineages that reach this size $1 ∕ \sqrt{μ_{1} s}$ . Thus the overall probability that a single-mutant gives rise to a double-mutant that establishes is simply

p_{1} \sim \sqrt{μ_{1} s} .

(3)
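The scaling in Eq. (3) can be checked numerically from the heuristic ingredients alone. The sketch below assumes, purely for illustration, that the lineage lifetime T has density 1/T² for T ≥ 1 (so that the probability of surviving past T is 1/T), that the weight is exactly T², and that C = 1. None of these numerical choices comes from the rigorous analysis, and the function name and integration grid are hypothetical. Averaging the success probability 1 − e^{−μ₁sT²} over lifetimes then gives a value of order √(μ₁s).

```python
import math

def heuristic_p1(mu1, s, steps=20000):
    """Average 1 - exp(-mu1*s*T^2) over an assumed lifetime density 1/T^2, T >= 1."""
    a = mu1 * s
    # Upper cutoff: far beyond T ~ 1/sqrt(a) the integrand is ~1/T^2,
    # so the neglected tail is about 1/t_max, tiny compared with sqrt(a).
    t_max = 1.0e4 / math.sqrt(a)
    du = math.log(t_max) / steps          # uniform grid in u = log T
    total = 0.0
    for i in range(steps + 1):
        T = math.exp(i * du)
        # density 1/T^2 times the Jacobian dT = T du leaves a factor 1/T
        f = -math.expm1(-a * T * T) / T
        total += f * (0.5 if i in (0, steps) else 1.0)   # trapezoid weights
    return total * du

p1 = heuristic_p1(mu1=1e-6, s=0.01)
print(p1 / math.sqrt(1e-6 * 0.01))        # an order-one number
```

Under these assumptions the integral can also be done by parts, giving approximately √(π μ₁ s); the numerical prefactor is an artifact of the assumed density and of setting C = 1, and only the square-root scaling should be taken seriously.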

FIG. 2 — Sketch of the fate of a mutant lineage which has selective disadvantage δ. Shown is the population size n of the lineage as a function of time t. With probability 1/T, the lineage reaches a size T in roughly T generations, and then drifts to extinction in another T generations. This gives rise to an overall weight W = T². The overall shape of a “typical” such trajectory is shown (in an actual case, the trajectory would be somewhat “jagged” due to stochastic fluctuations in n(t), but the overall shape would be roughly as shown). This is valid for T < 1/δ; the lineage will almost never reach a population size larger than 1/δ.

If the single-mutant intermediates are deleterious, things are only slightly more complicated. In this case, a single-mutant lineage is still effectively neutral while its population size is small compared to 1/δ₁ (Fisher, 2007). On the other hand, it will almost never grow to a size much larger than 1/δ₁. Thus if $δ_{1} < \sqrt{μ_{1} s}$ , single-mutant lineages still reach the size $1 ∕ \sqrt{μ_{1} s}$ needed to produce an established double-mutant with about the same probability as in the neutral case, and all of the neutral results above apply. We have as before $p_{1} \sim \sqrt{μ_{1} s}$ . The single-mutant is effectively neutral for the purposes of producing double-mutants; note that this can be true even if Nδ₁ ⪢ 1 (where the single-mutant is not effectively neutral by conventional definitions). If on the other hand $δ_{1} > \sqrt{μ_{1} s}$ , then the fact that the single-mutant is deleterious matters. In this case, the single-mutant lineage will reach a size of at most order 1/δ₁, have a weight of order $1 ∕ δ_{1}^{2}$ , and give rise to a double-mutant that establishes with probability of order $μ_{1} s ∕ δ_{1}^{2}$ . Since a lineage reaches a size of order 1/δ₁ with probability of order δ₁, the product of these two factors gives

p_{1} \sim \frac{μ_{1} s}{δ_{1}} .

(4)

Note that when $δ_{1} \sim \sqrt{μ_{1} s}$ , this reduces to the neutral result, as it should.

All of our discussion to this point has implicitly assumed that the population size N is large enough that the intermediates can drift to the sizes described above. For the neutral case this means

N ⪢ \frac{1}{\sqrt{μ_{1} s}} .

(5)

When this is true, lineages that typically produce double-mutants that establish can do so while staying small compared to N. When Eq. (5) fails, double-mutants establish primarily after a lucky single-mutant has first drifted to fixation. The probability that the neutral mutation fixes is 1/N, after which it will eventually produce a beneficial double-mutant. So for this small-N case we have

p_{1} \sim 1 ∕ N .

(6)

Note that this approaches our large-N result at the threshold population size, $N \sim 1 ∕ \sqrt{μ_{1} s}$ , as expected. For the case of deleterious mutations, when $δ_{1} < \sqrt{μ_{1} s}$ the condition on N is identical to the neutral case. When $δ_{1} > \sqrt{μ_{1} s}$ , the critical size threshold is instead $N ⪢ \frac{1}{δ_{1}}$ . For N below this threshold, the single-mutant lineages are always effectively neutral (now because the population size is too small for selection to be felt), and hence the small-population neutral-intermediate results apply: double-mutants establish primarily after a lucky single-mutant has first drifted to fixation, and p₁ ~ 1/N. Above this threshold population size, the results above for strongly deleterious intermediates in large populations will generally apply (however, if μ₁s is sufficiently small, even strongly deleterious single-mutants are likely to drift to fixation before producing a double-mutant; we discuss this in more detail in our rigorous analysis below).

These results define the following regimes, which are summarized in Fig. 3. For $δ_{1} < \max (\sqrt{μ_{1} s}, 1 ∕ N)$ , the single mutants are effectively neutral; these are the regimes labeled “Neutral” in Fig. 3 (the numerical factors in the boundaries to the regimes follow from our formal analysis below). In this effectively neutral case, for large (but not enormous) N we have $p_{1} \sim \sqrt{μ_{1} s}$ (the “neutral stochastic tunneling” regime in Fig. 3), and this transitions to p₁ ~ 1/N for smaller populations where $N < 1 ∕ \sqrt{μ_{1} s}$ (the “neutral sequential fixation” regime in Fig. 3). (Note that none of our analysis thus far has addressed the large-N “neutral semi-deterministic” and “neutral deterministic” regimes shown in the figure; we discuss these in section IV.E below). For larger $δ_{1} > \max (\sqrt{μ_{1} s}, 1 ∕ N)$ , we have $p_{1} \sim \frac{μ_{1} s}{δ_{1}}$ , as long as μ₁s is not too small. This is the “deleterious tunneling” regime in Fig. 3. As μ₁s approaches 0, this case becomes slightly more complicated, and is discussed below (this is the “deleterious sequential fixation” regime in Fig. 3). The large-N neutral and deleterious stochastic tunneling regimes were studied earlier by Iwasa et al. (2004b) and Serra and Haccou (2007), while the large-δ₁ regimes were analyzed by Weinreich and Chao (2005), although these earlier studies did not explore the full parameter space or the transitions between regimes.
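The regime boundaries stated in this summary can be collected into a small lookup. The Python sketch below is a hypothetical helper, not part of the analysis: it uses only the order-of-magnitude conditions given in the text, treats each “~” as an equality, and does not cover the “deleterious sequential fixation” regime (whose boundary is not specified here) or the very-large-N regimes deferred to section IV.E.

```python
import math

def two_hit_regime(N, mu1, s, delta1):
    """Return (regime name, heuristic p1) for a two-hit process.

    Order-of-magnitude only: the sharp boundaries used here are a
    simplification of the "much less than / much greater than" conditions.
    """
    root = math.sqrt(mu1 * s)
    if delta1 < max(root, 1.0 / N):
        # Single mutants are effectively neutral.
        if N < 1.0 / root:
            return "neutral sequential fixation", 1.0 / N
        return "neutral stochastic tunneling", root
    # Deleterious intermediates; assumes mu1*s is not too small.
    return "deleterious tunneling", mu1 * s / delta1

print(two_hit_regime(N=1e6, mu1=1e-6, s=0.01, delta1=1e-5))
print(two_hit_regime(N=1e3, mu1=1e-6, s=0.01, delta1=1e-5))
print(two_hit_regime(N=1e6, mu1=1e-6, s=0.01, delta1=1e-2))
```

Near a boundary the two neighbouring expressions for p₁ agree to within a factor of order one, so the abrupt switch in the code stands in for what is really a smooth crossover.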

Thus far we have only considered the probability that a single-mutant lineage will give rise to a double-mutant that establishes. We must also understand the rate at which the single-mutant lineages arise in the first place, and the amount of time that it takes such a lineage to produce a double-mutant, given that it is destined to do so. The former is simple: single-mutants arise as a Poisson process at rate Nμ₀. Since each single-mutant has a probability p₁ of being successful, the expected time until the first successful single-mutant is produced is $\frac{1}{N μ_{0} p_{1}}$ generations. In a large population with neutral or weakly deleterious intermediates, this is $\frac{1}{N μ_{0} \sqrt{μ_{1} s}}$ . We have seen that in this regime the successful single-mutant lineage will typically survive of order $\frac{1}{\sqrt{μ_{1} s}}$ generations, and produce the first successful double-mutant after this time. Thus the total expected time for the successful double-mutant to be produced is $\frac{1}{N μ_{0} \sqrt{μ_{1} s}} + \frac{1}{\sqrt{μ_{1} s}} = \frac{1}{\sqrt{μ_{1} s}} (\frac{1}{N μ_{0}} + 1)$ . Similar calculations apply to the other regimes described above. This expression is only valid for Nμ₀ < 1; if single-mutants are produced more frequently, it is likely that a second successful single-mutant lineage will arise while the first is still drifting, so we can no longer assume that only the first successful lineage matters. We analyze this more carefully in section IV.E below.
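As a worked example of this timing estimate, the Python sketch below evaluates the two contributions separately for the large-population, effectively neutral case. The function name and the parameter values are hypothetical, and the result is an order-of-magnitude estimate only; the function refuses inputs with Nμ₀ ≥ 1, where the expression is stated not to hold.

```python
import math

def two_hit_waiting_time(N, mu0, mu1, s):
    """Heuristic expected time (generations) to the first successful double-mutant.

    Returns (wait_for_successful_single_mutant, drift_time, total).
    Assumes neutral or weakly deleterious intermediates and N >> 1/sqrt(mu1*s).
    """
    if N * mu0 >= 1.0:
        raise ValueError("estimate assumes N*mu0 < 1 (one successful lineage at a time)")
    root = math.sqrt(mu1 * s)
    wait = 1.0 / (N * mu0 * root)   # successful single-mutants arise at rate N*mu0*p1
    drift = 1.0 / root              # time the successful lineage drifts before the second hit
    return wait, drift, wait + drift

wait, drift, total = two_hit_waiting_time(N=1e5, mu0=1e-6, mu1=1e-6, s=0.01)
print(wait, drift, total)
```

With these illustrative values Nμ₀ = 0.1, so the waiting time for a successful single-mutant is ten times longer than the subsequent drift time and dominates the total.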

We can understand more complex multi-hit processes by iterating the above analysis. For example, consider a three-hit beneficial mutation (K = 3) with neutral intermediates. A double-mutant will produce a successful beneficial triple-mutant with probability $p_{2} = \sqrt{μ_{2} s}$ . Thus in a large enough population a single-mutant will produce a double-mutant destined to produce a successful triple-mutant with probability

p_{1}^{(3)} = \sqrt{μ_{1} p_{2}} = \sqrt{μ_{1} \sqrt{μ_{2} s}} .

(7)

The population size must be large for this to hold: we need $N ⪢ 1 ∕ \sqrt{μ_{1} \sqrt{μ_{2} s}}$ . For slightly deleterious mutations the result is the same, although now the single-mutant must be very close to neutral for this to hold. When these conditions are met, however, this three-hit process is only a factor of (μ₂/s)^{1/4} more improbable than the two-hit one, rather than the naive guess that it would be a factor of μ₂ more improbable. We describe these more complex processes in more detail in our rigorous analysis below.
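The iteration described here is easy to carry out numerically. The following Python sketch applies the neutral heuristic p_k = √(μ_k p_{k+1}) backwards from p_K = s, and compares the three-hit and two-hit success probabilities. The function name and parameter values are hypothetical, and the heuristic ignores all order-one prefactors.

```python
import math

def neutral_success_probabilities(mus, s):
    """Heuristic success probabilities [p_1, ..., p_{K-1}] for neutral intermediates.

    mus = [mu_1, ..., mu_{K-1}] are the mutation rates out of each intermediate.
    """
    p_next = s                      # p_K: a beneficial K-mutant establishes with probability ~ s
    probs = []
    for mu in reversed(mus):        # work backwards from k = K-1 to k = 1
        p_next = math.sqrt(mu * p_next)
        probs.append(p_next)
    probs.reverse()
    return probs

mu1 = mu2 = 1e-6
s = 0.01
p1_two_hit = neutral_success_probabilities([mu1], s)[0]
p1_three_hit = neutral_success_probabilities([mu1, mu2], s)[0]
print(p1_three_hit / p1_two_hit, (mu2 / s) ** 0.25)   # the two numbers should agree
print(1.0 / p1_three_hit)                             # N must be much larger than this
```

For these illustrative values the extra hit costs only a factor of (μ₂/s)^{1/4}, far less than a factor of μ₂.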

Finally, we note that although we have drawn a clear line between one-hit and multi-hit processes, the actual distinction is much less sharp. Take for example the case of two-hit processes, K = 2. We showed above that for weakly deleterious intermediates, the behavior is identical to the case of neutral intermediates. In fact, the argument we made there also applies to the case where the intermediates are weakly beneficial. If single-mutants have some advantage σ₁, then when $σ_{1} < \sqrt{μ_{1} s}$ , the fact that the intermediate is advantageous is not felt before the double-mutant arises. Thus our neutral result still applies, and even though each mutation is independently beneficial, the dynamics are still those of a two-hit process with neutral intermediates. This is reflected in the region where δ₁ < 0 in Fig. 3. Similar behavior holds for the more complex multi-hit situation, although the intermediates have to be closer to precisely neutral for the neutral results to apply.

IV. FORMAL ANALYSIS

We now turn to formal analysis, and rigorously derive and extend the results described heuristically above. We first focus on describing the fate of a given k-mutant lineage, using Laplace transforms to calculate the probability that this lineage will be successful for arbitrary selective coefficients and mutation rates. We then calculate the expected time that a successful k-mutant lineage will drift before producing the first successful (k + 1)-mutant. We next consider the entire trajectory of evolution, from the initial wild-type to the eventual fixation of the beneficial mutants, paying special attention to the case of beneficial double-mutants. In doing so, we describe the population sizes for which the beneficial mutants are more likely to establish via a tunneling process than via the sequential fixation through drift of the intermediate mutants.

A. The probability a mutation in a large population is successful

We begin by rigorously calculating p_k, the probability that a k-mutant will be successful. In this section we focus on the case of a large population, where the frequencies of all intermediate mutations are always small compared to the total population size. We also assume that the absolute number of mutants is small enough that we can ignore competition between multiple potentially successful lineages; we will examine this assumption in section IV.E below. We use an argument based on the weights of mutant lineages; for an alternative derivation see Appendix C.

By definition, for k < K, a k-mutant individual will be successful if and only if one of its descendants is a successful (k + 1)-mutant. Thus p_k will depend on p_k₊₁, the probability that a (k + 1)-mutant individual will be successful. p_k₊₁ will in turn depend on p_k₊₂, and so on. We can therefore determine all the probabilities of success recursively, starting with the probability that a (beneficial) K-mutant individual will be successful, $p_{K} = \frac{1 - e^{- s}}{1 - e^{- N s}} \approx s$ .

For a given k-mutant individual, let n(t) be the number of its k-mutant descendants in the population at time t (note that descendants that have accumulated additional mutations are not included in n). Each of these descendants has a probability μ_k dt of producing a (k + 1)-mutant in a time dt, and each of these mutants has a probability p_k₊₁ of being successful. The probability that the chosen lineage will produce a successful mutant in the interval [t, t + dt) is therefore μ_k p_k₊₁ n(t)dt. This is a Poisson process, so the total probability that the lineage will produce a successful mutant before going extinct, given n(t), is 1 − e^{−∫μ_k p_k₊₁ n(t)dt} = 1 − e^{−μ_k p_k₊₁ W}, where W is the weight as defined above, W ≡ ∫n(t)dt. To obtain p_k, we must average this over all possible values of W, giving

\begin{aligned} p_{k} & = \int_{0}^{\infty} d w P (W = w) (1 - e^{- μ_{k} p_{k + 1} w}) \\ & = 1 - E [e^{- μ_{k} p_{k + 1} W}], \end{aligned}

(8)

where P(W = w)dw is the probability that the weight is between w and w + dw.

We see that p_k depends only on the expectation of the exponential of W. This makes it convenient to consider the Laplace transform φ of the probability density function, defined as

φ (y) \equiv \int_{0}^{\infty} d w P (W = w) e^{- y w} = E [e^{- y W}] .

(9)

With this definition, we can rewrite Eq. (8) as

p_{k} = 1 - φ (μ_{k} p_{k + 1}) .

(10)

We calculate φ in Appendix A and find

φ (y) = \frac{2 - δ_{k} + y - \sqrt{{(δ_{k} - y)}^{2} + 4 y}}{2 (1 - δ_{k})} .

(11)

Combining Eqs. (10) and (11), we find that p_k is given by

p_{k} = \frac{- δ_{k} - μ_{k} p_{k + 1} + \sqrt{{(δ_{k} - μ_{k} p_{k + 1})}^{2} + 4 μ_{k} p_{k + 1}}}{2 (1 - δ_{k})} .

(12)

The derivation of Eq. (11) assumes that N is large enough that the probability that k-mutants will drift to a high frequency is very small, so that such lineages contribute negligibly to the integral in Eq. (9). This requirement becomes more stringent as y approaches 0, so for very small μ_kp_k₊₁ the population must be very large for Eq. (12) to hold. We will derive the explicit condition on N in section IV.D below.
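Equations (11) and (12) are straightforward to evaluate numerically. The Python sketch below is a minimal illustration with hypothetical function names. One design choice is an assumption of ours rather than something taken from the derivation: Eq. (12) is coded in the algebraically equivalent form 2y / (δ_k + y + √((δ_k − y)² + 4y)), with y = μ_k p_{k+1}, which follows from multiplying numerator and denominator by the conjugate and avoids subtracting two nearly equal numbers when y is very small.

```python
import math

def phi(y, delta):
    """Laplace transform of the weight distribution, Eq. (11)."""
    root = math.sqrt((delta - y) ** 2 + 4.0 * y)
    return (2.0 - delta + y - root) / (2.0 * (1.0 - delta))

def success_probability(delta, mu, p_next):
    """p_k from Eq. (12), written in a cancellation-free but equivalent form."""
    y = mu * p_next
    root = math.sqrt((delta - y) ** 2 + 4.0 * y)
    return 2.0 * y / (delta + y + root)

# Consistency check of Eq. (10): p_k = 1 - phi(mu_k * p_{k+1})
delta, mu, p_next = 0.05, 1e-3, 0.02
print(success_probability(delta, mu, p_next), 1.0 - phi(mu * p_next, delta))
```

The two printed numbers should agree up to rounding. Note also that φ(0) = 1, as required for the Laplace transform of a normalized probability density.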

For biological values of μ_k and p_k₊₁, we will typically have μ_kp_k₊₁ ⪡ 1, in which case Eq. (12) is well-approximated for any value of δ_k < 1 by

p_{k} \approx \frac{- δ_{k} + \sqrt{δ_{k}^{2} + 4 μ_{k} p_{k + 1}}}{2} .

(13)

In the limits of neutral or strongly deleterious mutations, Eq. (13) simplifies further to

p_{k} \approx \begin{cases} \sqrt{μ_{k} p_{k + 1}} & \text{for } δ_{k} ⪡ 2 \sqrt{μ_{k} p_{k + 1}} \\ μ_{k} p_{k + 1} ∕ δ_{k} & \text{for } δ_{k} ⪢ 2 \sqrt{μ_{k} p_{k + 1}} \end{cases},

(14)

which agrees with the heuristic calculation given above.
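To see how quickly Eq. (13) approaches the two limits in Eq. (14), one can tabulate their ratio as δ_k sweeps through the crossover at 2√(μ_k p_{k+1}). The Python sketch below does this for an illustrative value of μ_k p_{k+1}. The function names are hypothetical, and the sharp switch between the two branches of Eq. (14) at exactly δ_k = 2√(μ_k p_{k+1}) is an assumption made only so that a single limiting value can be printed for every δ_k.

```python
import math

def p_eq13(delta, y):
    """Eq. (13) with y = mu_k * p_{k+1}, in the equivalent form 2y / (delta + sqrt(delta^2 + 4y))."""
    return 2.0 * y / (delta + math.sqrt(delta * delta + 4.0 * y))

def p_eq14(delta, y):
    """Limiting forms of Eq. (14), switched at delta = 2*sqrt(y) for illustration."""
    if delta < 2.0 * math.sqrt(y):
        return math.sqrt(y)     # effectively neutral limit
    return y / delta            # strongly deleterious limit

y = 1e-8                        # illustrative value of mu_k * p_{k+1}
for scaled_delta in (0.01, 0.1, 1.0, 10.0, 100.0):
    delta = scaled_delta * 2.0 * math.sqrt(y)
    print(scaled_delta, p_eq13(delta, y) / p_eq14(delta, y))
```

The ratio is close to 1 far from the crossover on either side and deviates by a factor of order one near it, which is why the regime boundaries should be read as gradual transitions.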

So far, we have written p_k in terms of p_k₊₁, but we can apply Eq. (12) recursively to get an expression for p_k involving only the selection coefficients and the mutation rates. Most important is p₁, the probability that a given single-mutant will eventually give rise to the successful beneficial K-mutant. The general expression that this gives is not illuminating, but in the cases where the intermediate mutants are either all close to neutral or all strongly deleterious, we find the simple expressions

p_{1} \approx \begin{cases} s^{1 ∕ 2^{K - 1}} \prod_{j = 1}^{K - 1} μ_{j}^{1 ∕ 2^{j}} & \text{for } δ_{k} ⪡ 2 \sqrt{μ_{k} p_{k + 1}}, \; k = 1, \dots, K - 1 \\ s \prod_{j = 1}^{K - 1} (μ_{j} ∕ δ_{j}) & \text{for } δ_{k} ⪢ 2 \sqrt{μ_{k} p_{k + 1}}, \; k = 1, \dots, K - 1 . \end{cases}

(15)
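The recursion described above can be sketched numerically. The following is a minimal illustration, not the authors' code: it iterates Eq. (12) from p_K = s down to p₁ and compares the result with the two closed forms in Eq. (15). The function names and the list layout (`deltas[j-1]`, `mus[j-1]` holding δ_j and μ_j for j = 1, …, K − 1) are assumptions made for this example.

```python
import math

def p_step(delta_k, mu_k, p_kp1):
    """Eq. (12): success probability of a k-mutant, given p_{k+1}."""
    y = mu_k * p_kp1
    root = math.sqrt((delta_k - y) ** 2 + 4.0 * y)
    return (-delta_k - y + root) / (2.0 * (1.0 - delta_k))

def p1_recursive(deltas, mus, s):
    """Iterate Eq. (12) from p_K = s down to p_1."""
    p = s
    for delta_k, mu_k in zip(reversed(deltas), reversed(mus)):
        p = p_step(delta_k, mu_k, p)
    return p

def p1_neutral(mus, s):
    """First line of Eq. (15): all intermediates close to neutral."""
    K = len(mus) + 1
    p = s ** (1.0 / 2 ** (K - 1))
    for j, mu_j in enumerate(mus, start=1):
        p *= mu_j ** (1.0 / 2 ** j)
    return p

def p1_deleterious(deltas, mus, s):
    """Second line of Eq. (15): all intermediates strongly deleterious."""
    p = s
    for delta_j, mu_j in zip(deltas, mus):
        p *= mu_j / delta_j
    return p
```

For example, with K = 3, s = 0.1 and μ₁ = μ₂ = 10⁻⁵, `p1_recursive([0.0, 0.0], [1e-5, 1e-5], 0.1)` and `p1_neutral([1e-5, 1e-5], 0.1)` should agree closely, since the neutral condition holds at every step.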

When K is large and all the intermediate mutants are close to neutral, note that the probability p₁ that a single-mutant will be successful depends only weakly on the ultimate selective advantage s of the eventual beneficial mutants. This is because p₁ is essentially the probability that a single-mutant lineage will drift to a very large size and generate many mutants, and it is relatively likely that the first such large lineage will drift to an enormous size, large enough so that at least a few descendants are likely to acquire a very large number of mutations. In contrast, when the intermediate mutants are strongly deleterious, the successful single-mutant lineage is likely to be small and produce just a few lucky mutant offspring, and the same is true for all the other successful intermediate mutant lineages. Since each k-mutant lineage must be roughly equally lucky, the selection coefficients and mutation rates at the end of the valley are just as important to p₁ as δ₁ and μ₁.

Note that these results are only valid when the population size is large enough that intermediate mutants are always at low frequency in the total population. This condition is not particularly restrictive when the intermediate mutations are strongly deleterious. However, it can easily fail when the intermediates are close to neutral; in particular when K is large the population sizes required are often enormous. We calculate the population sizes needed for these results to hold, and describe what happens when they do not, in section IV.D below. These critical population sizes are also illustrated in Fig. 3, as the boundary between the “sequential fixation” and “tunneling” regimes. Further, our results above also require that population sizes not be too large. When they exceed a critical value, we must consider competition between different mutant lineages. This situation is the “neutral semi-deterministic tunneling” and “deterministic” regimes in Fig. 3; we analyze these in section IV.E below.

B. Time to the generation of the next mutant

Thus far we have focused on the probability that a given mutation will be successful. What we are ultimately interested in, however, is the time T_e that it takes for the beneficial multiple mutant to establish. We will focus on calculating the expected time, τ ≡ E[T_e]. To do this, it will be helpful to write T_e as $T_{e} = T_{0} + \sum_{k = 1}^{K - 1} T_{k} + T_{K}$ , where T₀ is the time for the first successful single-mutant to occur, T_k is the time that the first successful k-mutant drifts before producing the first successful k + 1-mutant, and T_K is the establishment time of the first beneficial mutant, once it arises. An example of the dynamics and this division of T_e for the case K = 3 is illustrated in Fig. 4. Defining τ_k ≡ E[T_k], we can write the total expected time as $τ = τ_{0} + \sum_{k = 1}^{K - 1} τ_{k} + τ_{K}$ , and find τ by calculating τ₀, τ_K, and τ_k. Note, however, that this division of T_e is invalid at large population sizes because sometimes the first successful mutant will be overtaken by a later one; we describe this situation in section IV.E below.

FIG. 4 — A typical example of the dynamics by which a beneficial triple-mutant (K = 3) is acquired, from an individual-based computer simulation. Shown in light gray is the population size of single-mutants, in darker gray is the population size of double-mutants, and the darkest gray represents triple mutants. Inset is a magnified view of the last few hundred generations in this simulated dynamics. The waiting time T₀ for the first successful single-mutant to arise is shown along with the time T₁ this single-mutant drifts before giving rise to the first successful double-mutant, the time T₂ this double-mutant drifts before giving rise to the first successful triple-mutant, and the time T₃ for this triple-mutant to establish. We see that the total time until the beneficial mutant establishes (*T_e*) is dominated by T₀. The total population size is N = 10⁵, the fitness parameters are δ₁ = δ₂ = 10⁻³, s = 0.1, and the mutation rates are μ₀ = 10⁻⁶, μ₁ = μ₂ = 10⁻⁴, which corresponds to a neutral stochastic tunneling regime.

We begin by calculating τ₀. Making the assumption that the population is dominated by wild-type individuals until the beneficial mutation arises, new single-mutant lineages are generated at a roughly constant rate Nμ₀ (we discuss the situation when this assumption is invalid in section IV.D below). Since each of these lineages has a probability p₁ of being successful, T₀ will be exponentially distributed with expectation $τ_{0} = \frac{1}{N μ_{0} p_{1}}$ .

The expected time τ_K for a successful beneficial mutant lineage to establish has been analyzed by Barton (1998) and Desai and Fisher (2007), who found $τ_{K} = (γ - \log (1 + s)) ∕ s \approx \frac{γ}{s}$ , where γ = 0.577 … is Euler’s constant.

We now turn to calculating τ_k, the expected time for which a successful k-mutant lineage will drift before producing the successful (k + 1)-mutant. To do this, we calculate the probability p_k(t) that the lineage will be successful by time t (i.e., the probability that a k-mutant lineage will produce a successful (k + 1)-mutant after drifting for no more than time t). Just as p_k depends on the expectation of the exponential of the weight W of the lineage, p_k(t) depends on the expectation of the exponential of the weight accumulated by time t, defined as $W (t) \equiv \int_{0}^{t} d t^{'} n (t^{'})$ . The probability of success by time t is then

p_{k} (t) = 1 - E [e^{- μ_{k} p_{k + 1} W (t)}] .

(16)

Again, we see that it is natural to use a Laplace transform for W(t) to calculate p_k(t). We thus define the Laplace transform ϕ(y, t) ≡ E[e^{−yW(t)}]. In Appendix A, we find that ϕ(y, t) is given by

ϕ (y, t) = \frac{a_{-} (a_{+} - 1) + a_{+} (1 - a_{-}) \exp [- (1 - δ_{k}) (a_{+} - a_{-}) t]}{a_{+} - 1 + (1 - a_{-}) \exp [- (1 - δ_{k}) (a_{+} - a_{-}) t]},

(17)

where the dependence on y is contained in a_±, which are defined as

a_{\pm} \equiv \frac{2 - δ_{k} + y \pm \sqrt{{(δ_{k} - y)}^{2} + 4 y}}{2 (1 - δ_{k})} .

(18)

Combining this with Eq. (16) gives an explicit expression for p_k(t):

p_{k} (t) = \frac{(a_{+} - 1) (1 - a_{-}) (1 - \exp [- (1 - δ_{k}) (a_{+} - a_{-}) t])}{a_{+} - 1 + (1 - a_{-}) \exp [- (1 - δ_{k}) (a_{+} - a_{-}) t]},

(19)

with y = μ_kp_k₊₁ in a_±. Note that as t increases, W(t) approaches W, and ϕ and p_k(t) approach the values from the previous section: lim_t→∞ ϕ(y, t) = φ(y) and lim_t→∞ p_k(t) = p_k.

From p_k(t), it is straightforward to find τ_k. The cumulative distribution function for T_k, the time for a successful k-mutant lineage to produce a successful (k + 1)-mutant, is just p_k(t)/p_k. Since T_k is a continuous positive random variable, its expectation is given by the integral of the complement of this cumulative distribution function:

τ_{k} = \int_{0}^{\infty} d t (1 - \frac{p_{k} (t)}{p_{k}}) .

(20)

Inserting the values from Eq. (12) and Eq. (19) into Eq. (20) and performing the integration gives

τ_{k} = \frac{2 \log (\frac{2 \sqrt{{(δ_{k} - μ_{k} p_{k + 1})}^{2} + 4 μ_{k} p_{k + 1}}}{δ_{k} + μ_{k} p_{k + 1} + \sqrt{{(δ_{k} - μ_{k} p_{k + 1})}^{2} + 4 μ_{k} p_{k + 1}}})}{- δ_{k} - μ_{k} p_{k + 1} + \sqrt{{(δ_{k} - μ_{k} p_{k + 1})}^{2} + 4 μ_{k} p_{k + 1}}} .

(21)
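As a numerical cross-check of this integration, the sketch below evaluates Eq. (20) by the trapezoid rule, using p_k(t) from Eq. (19), and compares it with the closed form of Eq. (21). It assumes a_± are taken so that a_− coincides with φ(y) of Eq. (11), as required by the limit lim_t→∞ ϕ(y, t) = φ(y). The function names, step count, and integration cutoff are choices made for this example only.

```python
import math

def a_pm(delta, y):
    """a_+ and a_-, with a_- equal to phi(y) of Eq. (11)."""
    root = math.sqrt((delta - y) ** 2 + 4.0 * y)
    denom = 2.0 * (1.0 - delta)
    return (2.0 - delta + y + root) / denom, (2.0 - delta + y - root) / denom

def p_k_of_t(t, delta, y):
    """Eq. (19): probability of success by time t, with y = mu_k * p_{k+1}."""
    a_plus, a_minus = a_pm(delta, y)
    decay = math.exp(-(1.0 - delta) * (a_plus - a_minus) * t)
    num = (a_plus - 1.0) * (1.0 - a_minus) * (1.0 - decay)
    den = a_plus - 1.0 + (1.0 - a_minus) * decay
    return num / den

def tau_k_closed(delta, y):
    """Eq. (21)."""
    root = math.sqrt((delta - y) ** 2 + 4.0 * y)
    return 2.0 * math.log(2.0 * root / (delta + y + root)) / (-delta - y + root)

def tau_k_numeric(delta, y, n_steps=20000, span=40.0):
    """Trapezoid rule for Eq. (20), truncated after `span` decay times."""
    a_plus, a_minus = a_pm(delta, y)
    p_k = 1.0 - a_minus                       # long-time limit of Eq. (19)
    rate = (1.0 - delta) * (a_plus - a_minus)
    h = span / rate / n_steps

    def integrand(t):
        return 1.0 - p_k_of_t(t, delta, y) / p_k

    total = 0.5 * (integrand(0.0) + integrand(n_steps * h))
    total += sum(integrand(i * h) for i in range(1, n_steps))
    return total * h
```

Calling `tau_k_numeric(0.05, 1e-3)` and `tau_k_closed(0.05, 1e-3)` should give nearly identical values; the truncation is harmless because the integrand decays exponentially at rate (1 − δ_k)(a₊ − a₋).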

For biological parameter values (μ_kp_k₊₁ ⪡ 1), this is

τ_{k} \approx \frac{2 \log [2 ∕ (1 + δ_{k} ∕ \sqrt{δ_{k}^{2} + 4 μ_{k} p_{k + 1}})]}{- δ_{k} + \sqrt{δ_{k}^{2} + 4 μ_{k} p_{k + 1}}} .

(22)

When all the mutations are either nearly neutral or strongly deleterious, this simplifies to

τ_{k} \approx \begin{cases} \log 2 ∕ \sqrt{μ_{k} p_{k + 1}} & \text{for } δ_{k} ⪡ 2 \sqrt{μ_{k} p_{k + 1}} \\ 1 ∕ δ_{k} & \text{for } δ_{k} ⪢ 2 \sqrt{μ_{k} p_{k + 1}} . \end{cases}

(23)

Note that Eq. (23) agrees with our earlier heuristic arguments that a successful neutral k-mutant lineage should drift for approximately $1 ∕ \sqrt{μ_{k} p_{k + 1}}$ generations, and that a successful deleterious k-mutant lineage is likely to have survived for approximately 1/δ_k generations.

Putting these results together, we find that the total expected time until the beneficial mutants establish is

τ = \frac{1}{N μ_{0} p_{1}} + \sum_{k = 1}^{K - 1} τ_{k} + \frac{γ}{s},

(24)

where τ_k is given by Eq. (21). As we will describe in section IV.E below, Eq. (24) is only valid when τ is dominated by τ₀, the expected waiting time until the first successful single-mutant occurs. This is the situation considered by Weinreich and Chao (2005) and Iwasa et al. (2004b). As can be seen from our expressions above, a sufficient condition for this to be true is Nμ₀ ⪡ 1. Below we will derive alternative expressions for τ valid in larger populations, Nμ₀ ⪢ 1.

C. The case K = 2, beneficial double mutants

Many of the above complex expressions are easier to understand in the simplest possible case, K = 2. In this case, p₂ = s, and the chance that a single mutant individual will be successful is (from Eq. (13))

p_{1} = \frac{- δ_{1} + \sqrt{δ_{1}^{2} + 4 μ_{1} s}}{2}

(25)

\approx \begin{cases} \sqrt{μ_{1} s} & \text{for } δ_{1} ⪡ 2 \sqrt{μ_{1} s} \\ μ_{1} s ∕ δ_{1} & \text{for } δ_{1} ⪢ 2 \sqrt{μ_{1} s} . \end{cases}

(26)

Eq. (25) agrees with the result (10) in Iwasa et al. (2004b), $p_{1} = \frac{- δ_{1} + \sqrt{δ_{1}^{2} + 2 (2 - δ_{1}) μ_{1} s}}{2 - δ_{1}}$ , to leading order in δ₁ and μ₁s. The small- and large-δ₁ approximations thus agree as well.

The expected time until the spread of the beneficial mutants is (from Eqs. (23) and (24))

τ = \frac{1}{N μ_{0} p_{1}} + τ_{1} + \frac{γ}{s}

(27)

\approx \begin{cases} 1 ∕ (N μ_{0} \sqrt{μ_{1} s}) & \text{for } δ_{1} ⪡ 2 \sqrt{μ_{1} s} \\ δ_{1} ∕ (N μ_{0} μ_{1} s) & \text{for } δ_{1} ⪢ 2 \sqrt{μ_{1} s} . \end{cases}

(28)

In Eq. (28), we have used the fact that τ₁ and $\frac{γ}{s}$ can be neglected for Nμ₀ ⪡ 1. Note that the strongly-deleterious approximation reduces to Eq. (2) in Weinreich and Chao (2005), $τ = \frac{δ_{1}}{N μ^{2} s}$ (where we have adjusted for a haploid population) when the mutation rates are equal, μ₀ = μ₁ ≡ μ.

D. Smaller population sizes

So far we have assumed that N is large enough that intermediate mutations never become a substantial fraction of the total population before the beneficial mutants start to spread. This was implicit in our calculation of τ₀, as well as in the branching process approximation we used to calculate the rest of the τ_k. For smaller N, we can no longer treat mutant lineages as independent, so the branching process approximation breaks down. In addition, it becomes likely that intermediate mutants will drift to fixation before the beneficial mutants are generated; the dynamics shift from being dominated by stochastic tunneling to being dominated by sequential fixation. In this section, we calculate the value of N that separates these two regimes, and find the rate at which beneficial mutations are produced in the small-population, sequential-fixation regime.

For simplicity, first consider the case where double-mutants are beneficial, so that the probability p₁ that a single-mutant individual is successful via tunneling is given by Eq. (25). A given single-mutant individual has probability

ρ_{1} = \frac{e^{δ_{1}} - 1}{e^{N δ_{1}} - 1}

(29)

of giving rise to a lineage that will drift to fixation. If p₁ ⪢ ρ₁, then stochastic tunneling will dominate the dynamics, and the assumption we made in deriving Eq. (12) that lineages which drift to fixation make only a negligible contribution to p₁ will necessarily be valid. If ρ₁ ⪢ p₁, then the single-mutant genotype is likely to dominate the population before the first successful beneficial mutant is produced. The transition p₁ = ρ₁ occurs at the population size N_×, where

N_{\times} = \frac{1}{δ_{1}} \log (1 + \frac{e^{δ_{1}} - 1}{p_{1}})

(30)

\approx \begin{cases} 1 ∕ \sqrt{μ_{1} s} & \text{for } δ_{1} ⪡ 2 \sqrt{μ_{1} s} \\ \frac{1}{δ_{1}} \log (1 + \frac{δ_{1} (e^{δ_{1}} - 1)}{μ_{1} s}) & \text{for } δ_{1} ⪢ 2 \sqrt{μ_{1} s} . \end{cases}

(31)
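The threshold can be checked numerically. The sketch below, offered as an illustration under the definitions in Eqs. (25), (29), and (30), computes N_× and confirms that the fixation probability ρ₁ evaluated at N_× equals the tunneling probability p₁. The function names are hypothetical, and the δ₁ = 0 branches use the neutral limits ρ₁ = 1/N and N_× = 1/p₁.

```python
import math

def p1_tunnel(delta1, mu1, s):
    """Eq. (25): probability that a single-mutant succeeds via tunneling."""
    return (-delta1 + math.sqrt(delta1 ** 2 + 4.0 * mu1 * s)) / 2.0

def rho1(N, delta1):
    """Eq. (29): fixation probability of a single-mutant lineage.

    Uses the neutral limit 1/N when delta1 == 0. May overflow for very
    large N * delta1, where the probability is negligibly small anyway.
    """
    if delta1 == 0.0:
        return 1.0 / N
    return math.expm1(delta1) / math.expm1(N * delta1)

def n_cross(delta1, mu1, s):
    """Eq. (30): population size at which p_1 = rho_1."""
    p1 = p1_tunnel(delta1, mu1, s)
    if delta1 == 0.0:
        return 1.0 / p1
    return math.log1p(math.expm1(delta1) / p1) / delta1
```

For instance, with δ₁ = 10⁻³, μ₁ = 10⁻⁵, and s = 0.1, `n_cross` returns a threshold a little below 1/√(μ₁s) = 1000, in line with the remark that the neutral expression is a sufficient condition for tunneling.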

This threshold population size N_× is the boundary between the tunneling and sequential fixation regimes shown in Fig. 3. Note that the second expression for N_× in Eq. (31) is always smaller than the first, so $N ⪢ 1 ∕ \sqrt{μ_{1} s}$ is always a sufficient condition for tunneling to be more likely than fixation of single mutants. Again, this agrees with our intuition that if a lineage drifts to size ~ $1 ∕ \sqrt{μ_{1} s}$ , it will achieve a weight ~ 1/μ₁s, and will therefore be likely to have generated a successful double mutant. If δ₁ ⪡ 1, the second expression in Eq. (31) reduces to $N_{\times} \approx \frac{1}{δ_{1}} \log (\frac{δ_{1}^{2}}{μ_{1} s})$ (this is the boundary between the deleterious sequential fixation and deleterious tunneling regimes in Fig. 3), in agreement with the result (4) of Weinreich and Chao (2005), adjusted for a haploid population. Intuitively, we can understand this result by noting that if N < 1/δ₁, the single-mutant is effectively neutral and can drift to fixation, and that even if N ~ 1/δ₁, so that the single-mutant is slightly deleterious, it can still fix before generating a successful double-mutant individual if the rate of producing successful double mutants, μ₁s, is small enough (this is the “deleterious sequential fixation” regime in Fig. 3).

The expected waiting time τ_seq for the beneficial mutants to establish via fixation of the intermediate mutants is approximately the sum of the expected time for a single-mutant lineage destined for fixation to be produced and the expected time for a successful double-mutant lineage to arise on a background of single-mutants:

τ_{seq} \approx \frac{1}{N μ_{0} ρ_{1}} + \frac{1}{N μ_{1} (s + δ_{1})} .

(32)

Note this equation assumes no back-mutations which would take the population back to the original wild-type, though it would be straightforward to include these (this would increase τ_seq by a factor of roughly two if the forward and back mutation rates were the same). In practice, τ_seq will be dominated by either the first or second term; when the two are comparable we will typically be in a parameter regime where Eq. (32) does not apply. Note that since τ_seq is only relevant in small populations, the time for the single-mutants to drift to fixation after being produced and the time for the successful double mutants to establish after being produced will generally be much smaller than the terms we have included in Eq. (32). It is straightforward to show that as the population size approaches the threshold value N_×, the expected waiting time τ_seq approaches the value τ found for the parameter regime where tunneling is more likely, as long as we assume N_×μ₀ ⪡ 1 so that Eq. (27) for τ is valid. Thus, the two parameter regimes join smoothly together.

For larger values of K, Eq. (30) and Eq. (31) for the threshold population size are still valid, provided we replace s by p₂. However, for K > 2, N_× only describes the population size above which the first successful double-mutant individual is likely to be produced before single-mutant individuals dominate the population, while the dominant valley-crossing dynamics may involve a combination of some intermediate mutants fixing and others succeeding via tunneling. Thus even for N > N_×, it may still be likely that intermediate mutants reach a high frequency or even fix before the first successful K-mutant individual is produced, and even if N < N_× stochastic tunneling may still play an important part in the valley-crossing process.

We will explicitly characterize the dynamics of the simplest case, in which all intermediate mutants are neutral. In this case, a single k-mutant has a probability ρ_k = 1/N of producing a lineage which drifts to fixation. If p_k > 1/N for a given k < K, then the successful k-mutant lineage is likely to produce the successful (k + 1)-mutant lineage before it drifts to fixation, and our derivation of Eq. (12) is valid. Thus there is a sequence of K − 2 threshold population sizes $N_{\times}^{(k)} \equiv 1 ∕ p_{k}$ for 1 < k < K, with $N_{\times}^{(K - 1)} < \dots < N_{\times}^{(2)} < N_{\times}$ . For N between $N_{\times}^{(k + 1)}$ and $N_{\times}^{(k)}$ , the first k mutations are likely to drift to fixation, after which it is likely that the beneficial mutants will establish via stochastic tunneling before the (k + 1)-mutants can drift to fixation. The total expected time $τ_{seq}^{(k)}$ for populations with size between $N_{\times}^{(k + 1)}$ and $N_{\times}^{(k)}$ to cross the valley is approximately the sum of the expected times for each of the first k successful mutant lineages to be produced once the preceding genotype is fixed in the population, along with the expected time for the successful (k + 1)-mutant lineage to be produced after the k-mutants drift to fixation:

τ_{seq}^{(k)} \approx \sum_{i = 0}^{k - 1} \frac{1}{μ_{i}} + \frac{1}{N μ_{k} p_{k + 1}} .

(33)

Here we are ignoring the relatively small times for successful mutant lineages to drift to fixation or produce the next successful mutant lineage by tunneling. The generalization to arbitrarily deleterious intermediate mutants is straightforward, although the effect of the fixation of deleterious mutants on the mean fitness must be taken into account, as well as the possibility that there may be population sizes for which the dominant dynamics involve tunneling through lower-order mutants with higher-order mutants drifting to fixation.

E. Larger population sizes

Eq. (24) for the total expected time for the establishment of the beneficial mutants implicitly assumes that the first successful mutant lineage will be the one that eventually dominates the population. But if the population size is sufficiently large, multiple lineages may compete against each other. In particular, the first lineage that would have been successful in the absence of competition may be superseded by a later lineage that happens to produce beneficial mutants unusually quickly (i.e., in much less than the mean time $\sum_{k = 1}^{K} τ_{k}$ ). This will occur with an appreciable frequency if the expected time for a successful lineage to drift before the beneficial mutation establishes is larger than the expected time for a successful lineage to arise: $\sum_{k = 1}^{K} τ_{k} > τ_{0}$ . In this section, we explore the values of N for which our approximation is valid, and describe what happens when it is not. We restrict our analysis to the case K = 2 for simplicity.

For Nμ₀ ⪡ 1, we have seen above that $τ_{0} ⪢ \sum_{k = 1}^{K} τ_{k}$ , so no more than a few mutant lineages will typically be present in the population at a given time. Thus the first successful lineage will produce beneficial mutants that establish before another successful lineage can arise and overtake it. Our approximation above is valid, and the dynamics of the system are characterized by the time τ derived above.

We now consider what happens for larger values of N. For Nμ₀ ⪢ 1, the total number of single mutants at time t, which we will denote n₁(t), is well-approximated by its expectation (Fisher, 2007). Thus we can solve the deterministic differential equation for n₁(t) to obtain

n_{1} (t) = \frac{N μ_{0}}{δ_{1}} (1 - e^{- δ_{1} t}) \approx \begin{cases} N μ_{0} t & \text{for } δ_{1} t ⪡ 1 \\ N μ_{0} ∕ δ_{1} & \text{for } δ_{1} t ⪢ 1, \end{cases}

(34)

where the first line of Eq. (34) corresponds to effectively neutral single-mutants, and the second to deleterious single-mutants in mutation-selection balance. Double-mutants will be produced at the (time-dependent) rate R = n₁(t)μ₁.

If this rate R remains less than 1 until after the first double-mutant lineage establishes, then typically that first established lineage will dominate the population, and we can find the expected time, τ_sd, for this lineage to arise using an analysis similar to that used for single-mutant lineages above. That is, we define W₁(t) to be the total weight of single-mutants by time t, $W_{1} (t) \equiv \int_{0}^{t} d t^{'} n_{1} (t^{'})$ . Then the probability that a successful double-mutant lineage will have been produced by time t is p₂(t) = 1 − e^{−μ₁sW₁(t)}. The expected time is therefore given by

τ_{sd} = \int_{0}^{\infty} d t \, e^{- μ_{1} s W_{1} (t)} \approx \begin{cases} \sqrt{π ∕ (2 N μ_{0} μ_{1} s)} & \text{for } δ_{1} ⪡ \sqrt{π N μ_{0} μ_{1} s ∕ 2} \\ δ_{1} ∕ (N μ_{0} μ_{1} s) & \text{for } δ_{1} ⪢ \sqrt{π N μ_{0} μ_{1} s ∕ 2}, \end{cases}

(35)

where we have used Eq. (34) to derive the approximations in Eq. (35). Note that the expected establishment time in the case of deleterious intermediates is the same as found for Nμ₀ ⪡ 1 in Eq. (28). However, the expected time for neutral intermediates is different, and the threshold value of δ₁ below which single-mutants are effectively neutral is larger by a factor proportional to $\sqrt{N μ_{0}}$ . We refer to this large-N neutral regime as the “neutral semi-deterministic tunneling” regime in Fig. 3 (the subscript in τ_sd refers to “semi-deterministic”). There is no corresponding deleterious semi-deterministic tunneling regime, since as we have just seen in the deleterious case the establishment time is the same as it is for Nμ₀ ⪡ 1.
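The two limits of Eq. (35) can be checked against a direct numerical evaluation of the integral. The sketch below is an illustration only: it integrates n₁(t) from Eq. (34) in closed form to get W₁(t), then applies the trapezoid rule to e^{−μ₁sW₁(t)}. The function names, step count, and the cutoff (integrating until the exponent exceeds 40) are assumptions of this example.

```python
import math

def weight_single_mutants(t, N, mu0, delta1):
    """W_1(t): time integral of n_1 from Eq. (34)."""
    if delta1 == 0.0:
        return 0.5 * N * mu0 * t * t
    # integral of (1 - exp(-delta1 * t')) from 0 to t
    return (N * mu0 / delta1) * (t + math.expm1(-delta1 * t) / delta1)

def tau_sd_numeric(N, mu0, mu1, s, delta1, n_steps=20000):
    """Trapezoid-rule estimate of the integral in Eq. (35)."""
    def integrand(t):
        return math.exp(-mu1 * s * weight_single_mutants(t, N, mu0, delta1))

    # Double the upper limit until the integrand is negligible there.
    t_max = 1.0
    while mu1 * s * weight_single_mutants(t_max, N, mu0, delta1) < 40.0:
        t_max *= 2.0
    h = t_max / n_steps
    total = 0.5 * (integrand(0.0) + integrand(t_max))
    total += sum(integrand(i * h) for i in range(1, n_steps))
    return total * h
```

With N = 10⁷, μ₀ = μ₁ = 10⁻⁶, and s = 0.1, setting δ₁ = 0 should reproduce the neutral limit √(π/(2Nμ₀μ₁s)), while δ₁ = 0.05 should land close to the deleterious limit δ₁/(Nμ₀μ₁s).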

As mentioned above, Eq. (35) is only valid when double-mutants are produced rarely enough that the first successful double-mutant lineage will dominate the population (i.e. R < 1). From the time when this first successful lineage arises, it takes ~ γ/s generations to establish, after which it has a doubling time of ~ log 2/s, while new successful double-mutant lineages are being produced at a rate ~ n₁(τ_sd)μ₁s. So a second double-mutant lineage will be likely to establish and make a significant contribution to the total double-mutant population if $(n_{1} (τ_{sd}) μ_{1} s) (\frac{1}{s}) ⪢ 1$ . Note that this is the same as the condition for being able to treat the double-mutant population deterministically, R = n₁(τ_sd)μ₁ ⪢ 1. Using our expression for τ_sd above, we see that Eq. (35) is valid for N < N_det, where

N_{\det} μ_{0} = \begin{cases} 2 s ∕ (π μ_{1}) & \text{for } δ_{1} ⪡ \sqrt{π N μ_{0} μ_{1} s ∕ 2} \\ δ_{1} ∕ μ_{1} & \text{for } δ_{1} ⪢ \sqrt{π N μ_{0} μ_{1} s ∕ 2} . \end{cases}

(36)

This N_det is the boundary between the “deterministic” and “neutral semi-deterministic/deleterious tunneling” regimes in Fig. 3.

In the deterministic regime, where N > N_det, multiple lineages of double-mutants are produced very quickly, and the total number of double-mutants is well-approximated by its mean. By solving the deterministic differential equation for the number of double-mutants, we find that this mean is $n_{2} (t) \approx \frac{N μ_{0} μ_{1}}{s (s + δ_{1})} e^{s t}$ . We can use this to calculate the time for the beneficial mutants to establish, τ_d (here the subscript refers to “deterministic”), which is the time at which $n_{2} = \frac{1}{s}$ . We find

τ_{d} = \frac{1}{s} \log (\frac{s + δ_{1}}{N μ_{0} μ_{1}}) .

(37)
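As a sketch of where this result comes from, suppose the deterministic dynamics are dn₁/dt = Nμ₀ − δ₁n₁ and dn₂/dt = μ₁n₁ + sn₂. These explicit forms are an assumption made here, chosen to be consistent with Eq. (34) and with double-mutants growing at rate s. Then, for st ⪢ 1:

```latex
n_2(t) = \mu_1 \int_0^{t} dt'\, n_1(t')\, e^{s (t - t')}
       \approx \frac{N \mu_0 \mu_1}{\delta_1}
         \left( \frac{1}{s} - \frac{1}{s + \delta_1} \right) e^{s t}
       = \frac{N \mu_0 \mu_1}{s (s + \delta_1)}\, e^{s t} ,
\qquad
n_2(\tau_d) = \frac{1}{s}
\;\Longrightarrow\;
\tau_d = \frac{1}{s} \log\!\left( \frac{s + \delta_1}{N \mu_0 \mu_1} \right) .
```

The prefactor 1/s gives τ_d units of generations, and the logarithm changes sign at Nμ₀μ₁ = s + δ₁.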

Note that τ_d < 0 for sufficiently large Nμ₀, reflecting the fact that in extremely large populations there will be even more double-mutants at long times than would be expected from a single successful lineage arising immediately at t = 0 (see Desai and Fisher (2007) for a discussion of these subtleties in the definition of the establishment time).

Combining this expression for τ_d with Eq. (28) for τ, Eq. (32) for τ_seq, and Eq. (35) for τ_sd, we now have a complete description of the typical trajectories of populations with K = 2 for all biological values of N, μ₀, μ₁, δ₁, and s. This is shown in Fig. 3. As a function of N, there are four regimes. For very small values of N, N < N_× (as defined in Eq. (30)), the mutations fix sequentially and the beneficial mutant establishes after a time τ_seq, as given by Eq. (32). These are the neutral and deleterious sequential fixation regimes in Fig. 3. For N_× < N < 1/μ₀, our main analysis applies and the beneficial mutants establish after a time τ as given by Eq. (24). These are the neutral stochastic tunneling and deleterious tunneling regimes in Fig. 3. For 1/μ₀ < N < N_det, we can treat the single-mutants deterministically but still require a stochastic analysis of the beneficial mutants; this is the neutral semi-deterministic tunneling regime in Fig. 3, as well as the large-N part of the deleterious tunneling regime. In this semi-deterministic regime, the beneficial mutants establish after a time τ_sd, as given by Eq. (35). Finally, for N > N_det, the analysis is fully deterministic and the beneficial mutants establish after a time τ_d, as given by Eq. (37). This is the deterministic regime in Fig. 3.
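The four regimes for K = 2 can be summarized in a short lookup, sketched below under several simplifying assumptions that are not part of the analysis above: the boundaries are treated as sharp cutoffs (the text uses ⪡ and ⪢), the tunneling time keeps only the leading term 1/(Nμ₀p₁), and the deterministic time is taken as the time at which an exponentially growing n₂(t) = Nμ₀μ₁e^{st}/(s(s + δ₁)) reaches 1/s. The function name and the regime labels are hypothetical.

```python
import math

def classify_k2(N, mu0, mu1, delta1, s):
    """Return (regime label, approximate expected time) for K = 2."""
    p1 = (-delta1 + math.sqrt(delta1 ** 2 + 4.0 * mu1 * s)) / 2.0   # Eq. (25)
    if delta1 == 0.0:
        n_cross = 1.0 / p1                                          # neutral limit
    else:
        n_cross = math.log1p(math.expm1(delta1) / p1) / delta1      # Eq. (30)

    if N < n_cross:
        if delta1 == 0.0:
            rho1 = 1.0 / N
        else:
            rho1 = math.expm1(delta1) / math.expm1(N * delta1)      # Eq. (29)
        tau_seq = 1.0 / (N * mu0 * rho1) + 1.0 / (N * mu1 * (s + delta1))  # Eq. (32)
        return "sequential fixation", tau_seq

    if N * mu0 < 1.0:
        return "tunneling", 1.0 / (N * mu0 * p1)                    # leading term only

    rate = N * mu0 * mu1 * s
    if delta1 < math.sqrt(math.pi * rate / 2.0):
        tau_sd = math.sqrt(math.pi / (2.0 * rate))                  # Eq. (35), neutral
        n1 = N * mu0 * tau_sd                                       # Eq. (34), neutral
    else:
        tau_sd = delta1 / rate                                      # Eq. (35), deleterious
        n1 = N * mu0 / delta1                                       # Eq. (34), balance
    if n1 * mu1 < 1.0:                                              # R < 1
        return "semi-deterministic", tau_sd
    tau_d = math.log((s + delta1) / (N * mu0 * mu1)) / s
    return "deterministic", tau_d
```

For μ₀ = 10⁻⁶, μ₁ = 10⁻⁵, δ₁ = 10⁻³, and s = 0.1, sweeping N through 10², 10⁴, 10⁷, and 10¹⁰ passes through the four regimes in order. Near each boundary the returned time should be read as an order-of-magnitude guide only.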

For larger values of K, there are yet more possible regimes. In these cases, when Nμ₀ ⪡ 1, the time for the beneficial mutants to establish is given by Eq. (24), the extension of Eq. (32), or Eq. (33) (or its generalization for deleterious intermediate mutants), as described above. When N is larger than this, however, an analysis in the spirit of this section is necessary. In general, there can be a regime where single-mutants are treated deterministically but stochastic analysis is required for the rest, a regime where single- and double-mutants are treated deterministically but stochastic analysis is needed for the rest, and so on. Note that there may be some regimes where the population is large enough that some mutant classes must be treated deterministically, but also small enough that some intermediate mutants are likely to fix. We do not enumerate all the possibilities here, but these cases can all be analyzed using the approach we have developed above.

V. SIMULATIONS

To complement the analytical results described above, we performed stochastic individual-based computer simulations of our model. We focused on the cases K = 2 and K = 3, and verified our results for the time τ it takes for the population to acquire the K-hit adaptation, across a range of population sizes, mutation rates, and fitnesses of the intermediates.

To implement our simulations, we evolved a simulated population using time steps of dt = 10⁻² generations. At the beginning of each time step, the mean fitness $\bar{w}$ of the population was calculated, after which each k-mutant individual divided into two k-mutants with probability $(1 + w_{k} - \bar{w}) d t$ , produced a (k + 1)-mutant with probability μ_kdt, and died with probability dt. These three events occurred simultaneously and independently for each individual, for a total of 3N independent events per time step. If the population size N* at the end of a time step was different from N, then the population size was normalized by multiplying the size of each mutant class by N/N*, rounding to the nearest integer, and then adjusting the number of individuals in the largest class to bring the total number of individuals to exactly N. Each simulation run continued until the population consisted entirely of K-mutants. At the end of the run, the last time at which there were no K-mutants was recorded, and taken to be the time of the production of the first successful K-mutant. We ran 500 independent simulations for each set of parameters, and averaged these results to produce each data point shown in Fig. 5 and Fig. 6.
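A single time step of such a simulation might be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation. It assumes that producing a (k + 1)-mutant adds a new individual while the parent remains, that K-mutants do not mutate further, and that events are drawn per individual with Python's `random` module (practical only for small N). The function name `step` and its argument layout are hypothetical.

```python
import random

def step(counts, w, mu, dt, N, rng):
    """Advance the class counts by one time step of length dt.

    counts[k]: number of k-mutants; w[k]: their fitness;
    mu[k]: mutation rate from class k to k + 1 (last entry unused).
    """
    total = sum(counts)
    w_bar = sum(c * wk for c, wk in zip(counts, w)) / total  # mean fitness
    new = list(counts)
    for k, c in enumerate(counts):
        p_div = (1.0 + w[k] - w_bar) * dt
        p_mut = mu[k] * dt
        for _ in range(c):
            # Three independent events per individual.
            if rng.random() < p_div:
                new[k] += 1                      # division
            if k + 1 < len(counts) and rng.random() < p_mut:
                new[k + 1] += 1                  # mutant offspring (assumed)
            if rng.random() < dt:
                new[k] -= 1                      # death
    # Renormalize to exactly N individuals.
    n_star = sum(new)
    scaled = [round(c * N / n_star) for c in new]
    largest = max(range(len(scaled)), key=scaled.__getitem__)
    scaled[largest] += N - sum(scaled)
    return scaled
```

A run would call `step` repeatedly, starting from `counts = [N, 0, ..., 0]`, until the last class holds all N individuals, while tracking the last time at which that class was empty.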

In Fig. 5, we compare our theoretical predictions for the time, τ, until the advantageous genotype establishes to our simulation results for the case K = 2. Our theoretical results are in excellent agreement with the simulations across a range of population sizes N, fitness costs of the single-mutant δ₁, and mutation rate of the single mutant μ₁. Note in particular that our theory accurately describes the transitions between the sequential fixation, tunneling, and semi-deterministic regimes as a function of N (Fig. 5a-b), and the transition between the neutral tunneling and deleterious tunneling regimes as a function of δ₁ and μ₁ (Fig. 5c-d). However, right at the transition Nμ₀ ≈ 1 between the neutral stochastic tunneling regime and the semi-deterministic regime, our predictions are only accurate to about 30% (Fig. 5a). This is presumably because in this regime, both stochastic fluctuations in the number of single-mutants and competition between mutants are important.

In Fig. 6, we compare our theoretical predictions to the results of our simulations for the case K = 3. Once again, our theoretical results are in good agreement with the simulations, and accurately describe the transitions between the various regimes described by our analysis.

VI. DISCUSSION

Our analysis has provided a complete description of the rate at which an asexual population traverses a specific path through genotype space, involving fitness valleys or plateaus, to a particular fitter genotype. In general, however, there can be several different possible paths to the same final genotype. More interestingly, there could be many different fitter genotypes that are several mutations away from the original wild-type, with different paths leading to each.

In each such complex situation, the probability that evolution proceeds along any particular pathway or finds any particular beneficial genotype depends on the mutation rates and selective pressures involved, that is, on the local structure of the fitness landscape. Since very little is known about this structure, we cannot say which of the various modes of evolution (possibly involving various degrees of valley-crossing) are most important in nature. Instead, we will aim to describe a range of qualitatively different types of evolutionary behavior that are possible, and to understand which aspects of the structure of fitness landscapes are key to determining which of these different modes are important in nature. As we will see, this leads to some surprising insights: for example, valley-crossing processes could be quite routine even when directly uphill pathways are also possible.

In order to address these questions, we must extend our analysis of simple evolutionary trajectories, in which there is only one possible pathway to one possible beneficial genotype, to describe more complex situations with several possible pathways to several possible beneficial genotypes. Fortunately, each of these more complex possibilities can be broken down into multiple possible simpler evolutionary trajectories, of the type we have analyzed above. Thus our analysis provides a toolbox for studying these more complex situations. Note that in principle our earlier results also allow us to consider the case where one of the possible mutations increases the mutation rates at the other loci, as is common, for example, in some forms of cancer (Lengauer et al., 1998), although we will not explicitly discuss this situation here.

It may initially seem that the rate at which a population acquires a particular favorable genotype is simply the sum of the rates at which the population traverses all the possible pathways to that genotype. However, this is not the case. Instead, the overall rate involves a complicated combination of the probabilities of each possible path and the fitnesses of all the different intermediates. The basic types of situations that can arise are illustrated in Fig. 7. The essential rule is that whenever two pathways are entirely disjoint, the overall rate at which the population acquires an adaptation is the sum of the rates for each pathway. The same is true for pathways that overlap (i.e. share intermediate genotypes) without branching, or that branch (i.e. an intermediate genotype can mutate into two or more different further intermediate genotypes) at genotypes that are strongly deleterious.

FIG. 7 — Characteristic situations where there are several different pathways to a final advantageous genotype or genotypes. In each panel, the initial genotype is at the left, the final advantageous genotype or genotypes at the right, and all intermediate genotypes are assumed to be neutral or deleterious. In (a), the population initially has genotype abcd, and there are two entirely disjoint pathways by which evolution can proceed: either mutation A and then mutation B can occur, or mutation C and then mutation D. The overall rate at which a population acquires one of the two fitter genotypes equals the sum of the rates of acquiring each separately, regardless of the fitness of the intermediates. The probability the population acquires genotype AB before genotype CD is given by their relative rates. In (b), the population can acquire mutations A or B in either order, and then mutation C. Again, the overall rate equals the sum of the rates for the two possible pathways independently. In (c), the potential evolutionary trajectory branches after the population acquires mutation A. The population can then acquire mutations B and C, or mutations D and E, to reach an advantageous genotype. If genotype Abcde is strongly deleterious, the overall rate is the sum of the rates for each pathway. If Abcde is effectively neutral, the overall rate is *lower* than the sum of the rates for each pathway. In (d), the three mutations A, B, and C can be acquired in any order to reach advantageous genotype ABC. Because there are intermediate branching points, the overall rate is *lower* than the sum of the rates for each possible pathway if the single-mutants are effectively neutral.

On the other hand, when a branching point is effectively neutral, the behavior is more complex. Imagine for example a given intermediate genotype A, where mutations to genotypes B and C are possible, with rates μ_B and μ_C respectively. Each of these further mutations has some probability of eventually being successful, p_B and p_C respectively. If only mutation B were possible, the intermediate genotype would have to drift to a size $n_{B} = \frac{1}{μ_{B} p_{B}}$ to be likely to give rise to a successful mutant. This would occur with probability $\sqrt{μ_{B} p_{B}}$ . Since both mutations are possible, the intermediate genotype only has to drift to a size $n_{B C} = \frac{1}{μ_{B} p_{B} + μ_{C} p_{C}}$ , which occurs with probability $\sqrt{μ_{B} p_{B} + μ_{C} p_{C}}$ . Note that the difference between the probability of reaching size n_BC and that of reaching size n_B is smaller than the difference in the sizes themselves. Essentially, the intermediate A can drift to a much smaller size and still be successful, but this is only increased in probability by the square root of the ratio of the sizes. Thus the rate of the overall process is only the square root of the sum of the rates of the two processes.
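As a worked illustration of why a neutral branching point combines sub-additively, the two success probabilities above can be compared directly. This is only a sketch using the quantities already defined; the shorthand $r_{B}$ and $r_{C}$ is introduced here for convenience and does not appear elsewhere in the text.

```latex
% Shorthand introduced for this illustration: r_B = \mu_B p_B, \; r_C = \mu_C p_C
\begin{aligned}
p_{BC} &= \sqrt{r_B + r_C} \;\le\; \sqrt{r_B} + \sqrt{r_C}, \\
r_B = r_C = r: \quad p_{BC} &= \sqrt{2 r} = \sqrt{2}\,\sqrt{r} \;<\; 2 \sqrt{r} .
\end{aligned}
```

With two equally good branches, the success probability of the intermediate therefore rises by a factor of $\sqrt{2}$ rather than 2.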

Several specific examples help illustrate the behavior. We begin by considering the situation where there is only one advantageous genotype possible, separated from the original wild-type by K mutational steps. Because the mutations can occur in different orders, there are multiple paths to acquire this fitter genotype — since K mutations are needed, there are a total of K! possibilities. It will prove useful to compare in each case the rate at which the population acquires the advantageous genotype to the rate if only one of the K! pathways were possible (i.e. the mutations had to be acquired in one specific order).

The simplest case is when all the possible intermediates are deleterious with the same sufficiently large δ, and each possible mutation occurs with the same rate μ. Then our earlier analysis applies, with δ_i = δ for all i between 1 and K − 1, and the mutation rate μ_k = (K − k)μ. The result (from Eq. (15)) is that the rate at which the fitter genotype is acquired is increased by a factor of K! relative to the case where each of these mutations has to be acquired in one specific order. In other words, the overall rate is just the sum of the rates of the K! pathways. More generally, as long as all possible intermediates are sufficiently deleterious, the rate at which the fitter genotype is acquired is the sum over that for all the possible paths, even if the fitnesses of the intermediates (and possibly the mutation rates) depend on the order of the mutations.

If some of the intermediates are effectively neutral, the behavior is more subtle. For example, if all the intermediates are neutral with each mutation having the same rate, μ, one again can use our earlier analysis to include the effects of all the possible orderings, as this is equivalent to having $μ_{0} = K μ$, $μ_{1} = (K - 1) μ$, …, $μ_{K - 1} = μ$. This yields an overall enhancement of the fixation probability by a factor of $K (K - 1)^{1 / 2} (K - 2)^{1 / 4} \cdots 2^{1 / 2^{K - 2}}$ (again from Eq. (15)), relative to the case where the mutations can only be acquired in one particular order. This factor is no larger than K(K − 1), and hence much less than the K! we would expect if the rates for each of the possible pathways could be simply added together.
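The size of this enhancement factor is easy to tabulate. The following is a minimal sketch, assuming the all-neutral, equal-rate case described above; the function name is illustrative and not part of the original analysis.

```python
import math

def neutral_enhancement(K):
    """Factor K (K-1)^(1/2) (K-2)^(1/4) ... 2^(1/2^(K-2)) for neutral intermediates."""
    # Step j (j = 0 .. K-2) contributes (K - j) raised to the power 1/2^j.
    return math.prod((K - j) ** (0.5 ** j) for j in range(K - 1))

for K in (2, 3, 5, 10):
    print(K, neutral_enhancement(K), K * (K - 1), math.factorial(K))
```

Each printed row lists K, the neutral enhancement factor, the bound K(K − 1), and the K! that would apply if the pathway rates simply added.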

Although the scenario of a single beneficial genotype with moderately large K may indeed be relevant in many situations involving high mutation rates, more interesting are situations in which many different beneficial genotypes are within reach. If each of these involves completely distinct, non-overlapping sets of mutational changes, then the probabilities of any of them reaching fixation are independent, under the conditions on the population size for which our primary results obtain. The relative likelihood of each possible result is thus given by the ratio of the rates for each given by our earlier analysis. On the other hand, if the mutational paths to the beneficial genotypes overlap, the behavior is more complicated.

Consider first a simple example for which a particular neutral or deleterious first mutation enables many, say M, different conditionally beneficial second mutations. If each of the second mutation rates are the same, μ₁, and the fitnesses of the beneficial double-mutants are also the same, the rate at which the population acquires one of the beneficial genotypes is straightforward: in our earlier calculations this is equivalent to replacing μ₁ by Mμ₁. When the intermediate is sufficiently deleterious, the rate at which the population acquires one of the M possible beneficial genotypes is simply M times that for each alone. But for a neutral or weakly deleterious intermediate, this combinatoric factor is smaller — only $\sqrt{M}$ . In this neutral case, if the second mutations have different rates μ_1,j and selective advantages s_j with j = 1 … M, then the probability a single mutant is successful at generating an established beneficial genotype is $p_{1} \approx \sqrt{Σ_{j} μ_{1, j} s_{j}}$ : note that the sum appears inside the square root because the neutral intermediate only needs to last long enough for one of the many possible second mutations to become established.
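The two scalings can be checked numerically. The sketch below assumes the quadratic solution for p_k given in Appendix C, with the product μ_k p_{k+1} replaced by M μ₁ s (equal rates and advantages); the function names and the parameter values are illustrative only.

```python
import math

def success_probability(delta, m):
    """Success probability of an intermediate, with m = mu_k * p_{k+1} (Appendix C form)."""
    disc = math.sqrt((delta - m) ** 2 + 4.0 * m)
    return (-delta - m + disc) / (2.0 * (1.0 - delta))

def enhancement_from_M_targets(delta, mu1, s, M):
    """Ratio of success probabilities with M second mutations versus one."""
    return success_probability(delta, M * mu1 * s) / success_probability(delta, mu1 * s)

mu1, s, M = 1e-6, 0.01, 100
print(enhancement_from_M_targets(0.0, mu1, s, M))   # neutral intermediate: close to sqrt(M)
print(enhancement_from_M_targets(0.1, mu1, s, M))   # deleterious intermediate: close to M
```

For heterogeneous second mutations in the neutral case, the same function applies with m set to the sum of μ_{1,j} s_j over j.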

When more than two mutations are needed to reach the beneficial genotypes, general results can be obtained iteratively, but are very messy. The important features are encompassed by a few illustrative cases. If all the intermediates are sufficiently deleterious, then again the rates can be summed over all the paths (and over the final advantageous genotypes). But if the intermediates are close to neutral, this is not the case. Consider a situation where the possible evolutionary pathways form a branching tree. In this scenario, each mutation can give rise to M possible next mutations, each with rate μ. Each of these in turn gives rise to M possibilities for the subsequent mutation, and so on until a beneficial genotype is reached. Whether or not this situation is broadly relevant depends on the structure of fitness landscapes, but it is quite plausible that in many situations each genotype has a roughly equal number of “promising” future directions that may lead to beneficial combinations of alleles. If we restrict the analysis to all the beneficial genotypes requiring K mutations from the initial state, there are M^K possible paths from the initial genotype to a beneficial K-mutant. If all the intermediates have the same fitness, this is equivalent to the simple K-step process we have analyzed, with each μ replaced by Mμ. For sufficiently deleterious intermediates, this gives rise to an enhancement of the overall rate at which one of these beneficial genotypes is found by a factor of M^K. For the neutral case, the overall rate is $N (M μ)^{2} (s / M μ)^{1 / 2^{K - 1}}$, which is larger than the single-path rate by a factor that increases only gradually with K: from M for K = 1 to M² in the limit of large K.
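The exponent bookkeeping behind the neutral branching-tree result can be written out explicitly. This is a sketch assuming, as in the neutral branching argument above, that each intermediate succeeds with probability $p_{k} \approx \sqrt{μ_{k} p_{k + 1}}$, that $p_{K} \approx s$, and that every mutation rate is replaced by Mμ; the symbol R for the overall rate is introduced here.

```latex
\begin{aligned}
R &\approx N (M\mu)\, p_1, \qquad p_k \approx \sqrt{M\mu \, p_{k+1}}, \qquad p_K \approx s \\
  &= N (M\mu)^{1 + \frac{1}{2} + \cdots + \frac{1}{2^{K-1}}} \, s^{1/2^{K-1}}
   = N (M\mu)^{2 - 1/2^{K-1}} \, s^{1/2^{K-1}}
   = N (M\mu)^{2} \left( \frac{s}{M\mu} \right)^{1/2^{K-1}}, \\
\frac{R}{R_{M=1}} &= M^{\,2 - 1/2^{K-1}}
   \quad \Longrightarrow \quad M \;\; (K = 1), \qquad M^{3/2} \;\; (K = 2), \qquad \to M^{2} \;\; (K \to \infty) .
\end{aligned}
```

The enhancement over a single path is thus bounded by M² however deep the tree, in contrast to the factor $M^{K}$ for sufficiently deleterious intermediates.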

In situations where multiple advantageous genotypes are possible, we can also ask which is most likely to be acquired first. This is straightforward: the probability that a particular genotype is acquired first is proportional to the rate at which it is established. Thus this seemingly abstract discussion of relative rates has broad implications for the way in which populations adapt. For example, the conventional assumption is that adaptation is far more likely to occur by single mutations which each increase the fitness, since double mutations or more complex processes are far less probable. Yet consider for example a situation in which one adaptation requires two mutations, while another requires only one to confer a benefit. It might intuitively seem that the latter is highly unlikely to fix before the former. Yet our analysis shows that this is not necessarily true. If a single-mutant is beneficial (K = 1, the one-hit process), it has a probability $p_{1}^{(1)} = s$ of establishing given that it occurs. In a large population, if a double-mutant is beneficial with weakly deleterious intermediates (K = 2), we have seen that the double-mutant has a probability $p_{1}^{(2)} = \sqrt{μ_{1} s}$ of arising given that the initial mutation occurs. Thus the ratio of the probabilities of these events is of order $\sqrt{μ_{1} s}$ , rather than the much smaller mutation rate for the second mutation, μ₁. In other words, the two-hit process is not necessarily much more unlikely than the one-hit. In addition, it is crucial to consider the number of possibilities. The total number of possible double (and higher) mutations is enormously larger than the number of possible single mutations. Thus we might expect that there are more available beneficial two-mutation combinations than there are beneficial one-hit mutations, particularly if we are near a local fitness peak. 
Given this, it is entirely plausible that beneficial two-hit mutations arise faster than beneficial one-hit mutations, and hence populations could tend to acquire these more complex adaptations even when simpler one-hit adaptations are also possible. Shih et al. (2007) have found that this indeed seems to be the case for influenza A hemagglutinin evolution.

Unfortunately, very little is known about what fraction of possible double, triple, and higher-hit mutations are likely to be beneficial, and hence what the differences in initial mutation rates are for these different types of events (or in the language of our previous discussion, what the value of M is). As we have seen, which types of events dominate evolution depends on this number of combinatoric possibilities and various other parameters in a complex way — for example, we have just seen that when intermediates are close to neutral, multiple-hit processes are not much less likely than single-hit ones, but at the same time the overall rate of these events does not increase linearly with the number of possibilities M, but only as $\sqrt{M}$ . Since we have very little understanding of any of these parameters, it is not at all clear which types of events dominate evolution. What we have learned instead is what aspects of the structure of fitness landscapes determine the relative likelihoods of different types of evolutionary behavior. Better information about these aspects of the structure of genome space is sorely needed in order to understand how organisms, particularly microbes, adapt; we have discussed some of the open questions along these lines in Fisher (2007).

Given a particular structure of genome space, our results give some insight into how different populations will explore this space. We have seen that in small enough populations (in the sequential fixation or deleterious tunneling regimes), one-hit processes will typically be much faster than multiple-hit ones, even if there are many possible multiple-hit processes. Thus a small population will adapt by “choosing” among the possible single mutations that directly increase fitness. It will choose at random among these mutations (weighted by their establishment probabilities) if it is sufficiently small; if it is somewhat larger, clonal interference processes may allow it to tend to “choose” one of the best possible single mutations. Adaptation will progress by this series of individually beneficial steps, even if by doing so the population “misses” more-fit genotypes separated by neutral or slightly deleterious intermediates.

A larger population, because it is in a stochastic tunneling regime, can “see” further away in genotype space. In such a population, two-hit processes are easily found, and hence single mutations that offer small increases in fitness (or lead to genetic dead ends) will tend not to be fixed, in favor of double mutations that offer larger increases. Still larger populations can explore three-hit mutations, and so on. Thus the threshold population sizes we have calculated for transitions between regimes can be thought of as the characteristic sizes above which a population can “see” a step further in genotype space. Populations in different regimes will adapt and explore genome space in qualitatively different ways.

While this intuition is useful and may provide a basis for further work, it glosses over important subtleties. Most importantly, it envisions a population as inhabiting a “point” in genome space, moving from there to another point, and so on. In actuality, a large population can contain significant genetic diversity, and can spread out substantially among nearby neutral genotypes. This will be particularly true if there are a large number of paths leading to different potential adaptations, with Mμ₀ larger than the rate at which beneficial mutations establish. Understanding these dynamics remains an important challenge, and will be necessary if we are to form a more complete understanding of how asexual populations adapt and explore genome space.

Acknowledgements

This research was supported in part by NIH Grant P50GM071508 to the Lewis-Sigler Institute, by NIH grant GM28016, by an NSF Graduate Research Fellowship, and by a Stanford Graduate Fellowship.

Appendix A: Laplace transforms

We wish to derive an expression for $ϕ (y, t) = E [e^{- y W (t)}]$ , the Laplace transform of the probability density function of W(t). As mentioned in the main text, φ, the Laplace transform of the probability density function of W, can then be found from ϕ by taking the limit as t goes to infinity. However, calculating ϕ(y, t) is difficult to do directly because W(t) is not a Markov random variable. An easier approach is to instead consider the two-dimensional (Markov) random variable (n(t), W(t)), and calculate $Φ (x, y, t) \equiv E [e^{- x n (t) - y W (t)}]$ . Once we have found Φ, we can evaluate it at x = 0 to average over all values of n(t) and recover the Laplace transform for the weight: ϕ(y, t) = Φ(0, y, t).

To derive Φ, we will follow a procedure similar to that used by Kendall (1948). We first need to find an equation for the time evolution of the joint probability density of the lineage size n(t) and the weight W(t), which we denote by p_t(n, w). For an infinitesimal time dt, p_t satisfies

\begin{aligned} p_{t + d t} (n, w) = {} & d t \, (n + 1) p_{t} (n + 1, w) + d t \, (n - 1) (1 - δ) p_{t} (n - 1, w) \\ & + (1 - d t \, (2 - δ) n) \, p_{t} (n, w - n \, d t) + o (d t), \end{aligned}

(38)

where the first term on the right-hand side is the probability that a death occurred in [t, t + dt), the second term is the probability of a birth, and the third term is the probability that neither a birth nor a death occurred. Rearranging terms, and using $p_{t} (n, w - n d t) = p_{t} (n, w) - n d t \frac{\partial}{\partial w} p_{t} (n, w) + o (d t)$ , we can rewrite Eq. (38) as a partial differential equation for p_t:

\frac{\partial}{\partial t} p_{t} (n, w) = (n + 1) p_{t} (n + 1, w) + (n - 1) (1 - δ) p_{t} (n - 1, w) - n (2 - δ) p_{t} (n, w) - n \frac{\partial}{\partial w} p_{t} (n, w) .

(39)

By definition, $Φ (x, y, t) = \sum_{n = - \infty}^{\infty} \int_{- \infty}^{\infty} d w p_{t} (n, w) e^{- x n - y w}$ , where we have defined p_t(n, w) ≡ 0 for all n, w < 0. Differentiating both sides of this equation with respect to time, and using Eq. (39) to replace $\frac{\partial}{\partial t} p_{t}$ , we can find a partial differential equation for Φ:

\begin{aligned} \frac{\partial Φ}{\partial t} & = \sum_{n = - \infty}^{\infty} \int_{- \infty}^{\infty} d w \, e^{- x n - y w} \Big[ (n + 1) p_{t} (n + 1, w) + (n - 1) (1 - δ) p_{t} (n - 1, w) \\ & \qquad - n (2 - δ) p_{t} (n, w) - n \frac{\partial}{\partial w} p_{t} (n, w) \Big] \\ & = \left( - e^{x} - (1 - δ) e^{- x} + (2 - δ) + y \right) \frac{\partial Φ}{\partial x} . \end{aligned}

(40)

In deriving the last term in Eq. (40), we have used integration by parts:

\begin{aligned} \sum_{n = - \infty}^{\infty} n e^{- x n} \int_{- \infty}^{\infty} d w \, e^{- y w} \frac{\partial}{\partial w} p_{t} (n, w) & = \sum_{n = - \infty}^{\infty} n e^{- x n} \left( \left[ e^{- y w} p_{t} (n, w) \right]_{w = - \infty}^{w = \infty} + \int_{- \infty}^{\infty} d w \, y e^{- y w} p_{t} (n, w) \right) \\ & = y \sum_{n = - \infty}^{\infty} n e^{- x n} \int_{- \infty}^{\infty} d w \, e^{- y w} p_{t} (n, w), \end{aligned}

since p_t(n,±∞) = 0 for n ≠ 0.

We can solve Eq. (40) using the method of characteristics. If we write the characteristics as x(y, t), then they must satisfy

\frac{\partial x}{\partial t} = e^{x} + (1 - δ) e^{- x} - 2 + δ - y .

(41)

Solving this differential equation, we find that Φ must depend on x and t only through $\frac{e^{- x} - a_{-}}{a_{+} - e^{- x}} \exp [- (1 - δ) (a_{+} - a_{-}) t]$ , where a_± are the roots in $e^{- x}$ of the right-hand side of Eq. (41):

a_{\pm} (y) = \frac{2 - δ + y \pm \sqrt{{(2 - δ + y)}^{2} - 4 (1 - δ)}}{2 (1 - δ)} .

(42)

Note that 0 < a₋ < 1 < a₊ for y > 0. Since the lineage starts at time t = 0 with one individual ($p_{0} (n, w) = δ_{n, 1} \, δ (w)$ , a Kronecker delta in n times a Dirac delta in w), Φ must satisfy the boundary condition $Φ (x, y, 0) = e^{- x}$ , which gives

Φ (x, y, t) = \frac{a_{-} (a_{+} - e^{- x}) + a_{+} (e^{- x} - a_{-}) \exp [- (1 - δ) (a_{+} - a_{-}) t]}{a_{+} - e^{- x} + (e^{- x} - a_{-}) \exp [- (1 - δ) (a_{+} - a_{-}) t]} .

(43)

From this, the simpler Laplace transform of the probability density of W(t) follows immediately:

\begin{aligned} ϕ (y, t) & = Φ (0, y, t) \\ & = \frac{a_{-} (a_{+} - 1) + a_{+} (1 - a_{-}) \exp [- (1 - δ) (a_{+} - a_{-}) t]}{a_{+} - 1 + (1 - a_{-}) \exp [- (1 - δ) (a_{+} - a_{-}) t]} . \end{aligned}

(44)

The Laplace transform of the probability density function of W follows from this:

\begin{aligned} φ (y) & = \lim_{t \to \infty} ϕ (y, t) = a_{-} \\ & = \frac{2 - δ + y - \sqrt{{(2 - δ + y)}^{2} - 4 (1 - δ)}}{2 (1 - δ)} . \end{aligned}

(45)
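The closed forms in Eqs. (42), (43) and (45) can be checked numerically. The sketch below evaluates the joint transform and compares the long-time limit with a direct simulation of the birth–death lineage (death rate 1 and birth rate 1 − δ per individual, as in Eq. (38)). The function names, parameter values, number of runs and random seed are illustrative choices, not part of the original derivation.

```python
import math
import random

def roots(y, delta):
    """(a_minus, a_plus) from Eq. (42)."""
    b = 2.0 - delta + y
    disc = math.sqrt(b * b - 4.0 * (1.0 - delta))
    denom = 2.0 * (1.0 - delta)
    return (b - disc) / denom, (b + disc) / denom

def joint_transform(x, y, t, delta):
    """E[exp(-x n(t) - y W(t))] from Eq. (43)."""
    a_minus, a_plus = roots(y, delta)
    u = math.exp(-x)
    decay = math.exp(-(1.0 - delta) * (a_plus - a_minus) * t)
    num = a_minus * (a_plus - u) + a_plus * (u - a_minus) * decay
    den = (a_plus - u) + (u - a_minus) * decay
    return num / den

def simulate_weight(delta, rng):
    """Total weight W (time integral of n) of one lineage founded by one individual."""
    n, weight = 1, 0.0
    while n > 0:
        # The waiting time to the next event is Exp(n (2 - delta)); the weight
        # gained, n * dt, is therefore Exp(2 - delta) whatever the value of n.
        weight += rng.expovariate(2.0 - delta)
        if rng.random() < (1.0 - delta) / (2.0 - delta):
            n += 1  # birth
        else:
            n -= 1  # death
    return weight

def estimate_phi(y, delta, runs=20000, seed=1):
    """Monte Carlo estimate of E[exp(-y W)], to compare with a_minus in Eq. (45)."""
    rng = random.Random(seed)
    return sum(math.exp(-y * simulate_weight(delta, rng)) for _ in range(runs)) / runs

a_minus, a_plus = roots(0.5, 0.1)
print(joint_transform(0.3, 0.5, 0.0, 0.1), math.exp(-0.3))  # boundary condition at t = 0
print(joint_transform(0.3, 0.5, 200.0, 0.1), a_minus)       # long-time limit
print(estimate_phi(0.5, 0.1), a_minus)                      # simulation versus Eq. (45)
```

The long-time limit is independent of x because a deleterious lineage (δ > 0) is certain to go extinct, so n(t) tends to 0.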

Appendix B: Probability density of W

Although Eq. (45) is all we need to confirm the results of our original intuitive calculation, it does not by itself show that our argument (using the weights of lineages) underlying that calculation was correct. In order to do this, we must show first that the cumulative probability of a lineage achieving a weight of at least w goes like ~ $1 ∕ \sqrt{w}$ for w ⪡ 1/δ², and falls off rapidly for larger w. Equivalently, we wish to show that the probability density for the weight of a lineage, P(w), goes like $P (w) \sim w^{- 3 ∕ 2}$ before falling off at 1/δ². From this, we will also be able to show that, given a probability μσ per individual per unit time of producing a successful mutant, the typical weight for a successful lineage is ~ 1/(μσ) for $δ ⪡ \sqrt{μ σ}$ , and ~ 1/δ² for $δ ⪢ \sqrt{μ σ}$ . (Here σ, the probability of success for a mutant individual, could represent an actual selective advantage s, or it could be the probability that the individual’s descendants will fix after acquiring additional mutations necessary for a selective advantage.)

We can find P(w) by taking the inverse Laplace transform of φ(y). Applying standard identities of Laplace transforms (see Arfken and Weber (1995), Tables 15.1 and 15.2) to Eq. (45), we obtain

P (w) = \frac{\exp [- (2 - δ) w]}{\sqrt{1 - δ} w} I_{1} [2 \sqrt{1 - δ} w],

(46)

where I₁ is a modified Bessel function of the first kind. This exact result is valid for both positive and negative δ, although for δ < 0 (beneficial mutants), there will be a positive probability that the lineage achieves infinite weight, corresponding to fixation.

For w ⪢ 1, which includes all weights large enough to be relevant for δ ⪡ 1, Eq. (46) is well-approximated by the asymptotic expansion (see Arfken and Weber (1995), Table 11.2)

P (w) \approx \frac{\exp [- (2 - δ - 2 \sqrt{1 - δ}) w]}{\sqrt{4 π {(1 - δ)}^{3 ∕ 2}} \, w^{3 ∕ 2}} \left( 1 - \frac{3}{16 \sqrt{1 - δ} \, w} + O (1 ∕ w^{2}) \right) .

(47)

Assuming that δ ⪡ 1, we can Taylor expand the argument of the exponential to obtain

P (w) \approx \frac{\exp [- δ^{2} w ∕ 4]}{\sqrt{4 π {(1 - δ)}^{3 ∕ 2}} \, w^{3 ∕ 2}} (1 + O (1 ∕ w)),

(48)

which exhibits exactly the behavior that we predicted, behaving like $w^{- 3 ∕ 2}$ until falling off rapidly at w ~ 1/δ².
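The quality of the large-w approximation can be checked against the exact density of Eq. (46). The sketch below evaluates I₁ from its power series and applies the standard large-argument form $I_{1} (z) \approx e^{z} (1 - 3 ∕ (8 z)) ∕ \sqrt{2 π z}$ with $z = 2 \sqrt{1 - δ} \, w$, which gives a prefactor $\sqrt{4 π} \, (1 - δ)^{3 ∕ 4} w^{3 ∕ 2}$ in the denominator. Function names and the test values of w and δ are illustrative.

```python
import math

def bessel_i1(z, tol=1e-16):
    """Modified Bessel function I_1(z) from its power series (adequate for moderate z)."""
    term = z / 2.0              # k = 0 term of sum_k (z/2)^(2k+1) / (k! (k+1)!)
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= (z / 2.0) ** 2 / (k * (k + 1))
        total += term
    return total

def weight_density_exact(w, delta):
    """P(w) from Eq. (46)."""
    root = math.sqrt(1.0 - delta)
    return math.exp(-(2.0 - delta) * w) * bessel_i1(2.0 * root * w) / (root * w)

def weight_density_asymptotic(w, delta):
    """Large-w form of P(w), keeping the first correction term in 1/w."""
    root = math.sqrt(1.0 - delta)
    z = 2.0 * root * w
    decay_rate = 2.0 - delta - 2.0 * root          # approximately delta^2 / 4
    prefactor = math.sqrt(4.0 * math.pi) * (1.0 - delta) ** 0.75 * w ** 1.5
    return math.exp(-decay_rate * w) / prefactor * (1.0 - 3.0 / (8.0 * z))

w, delta = 20.0, 0.05
print(weight_density_exact(w, delta), weight_density_asymptotic(w, delta))
```

The power series is used only to keep the example free of external dependencies; it is not suitable for very large arguments, where the exponential growth of I₁ would overflow.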

To find the typical weight of a successful lineage, we note that the probability density of the weight of a successful lineage is

\begin{aligned} P (w ∣ success) & = P (success ∣ w) \frac{P (w)}{P (success)} \\ & = (1 - e^{- μ σ w}) \frac{P (w)}{p}, \end{aligned}

where the probability of success p is given by Eq. (12). Plugging in our approximate expression for P(w), we see that

P (w ∣ success) \propto (1 - e^{- μ σ w}) e^{- δ^{2} w ∕ 4} w^{- 3 ∕ 2} .

(49)

This expression can then be integrated to give the cumulative distribution function. Although the exact expression is not illuminating, the asymptotics are exactly as predicted by our heuristic argument: for μσ ⪢ δ²/4, the cumulative distribution is dominated by weights w ≲ 1/μσ, while for μσ ⪡ δ²/4, it is dominated by w ≲ 1/δ².
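The two asymptotic regimes can be illustrated by integrating Eq. (49) numerically and locating the median weight of a successful lineage. This is a rough sketch, assuming the approximate density of Eq. (48) (so δ ≪ 1 and w ≫ 1, with the integration started at w = 1); the grid limits, parameter values and function name are illustrative.

```python
import math

def successful_weight_median(mu_sigma, delta, w_min=1.0, w_max=1e10, points=4000):
    """Median of P(w | success) from Eq. (49), via trapezoidal integration in log w."""
    log_lo, log_hi = math.log(w_min), math.log(w_max)
    step = (log_hi - log_lo) / (points - 1)
    ws = [math.exp(log_lo + i * step) for i in range(points)]
    # Integrand per unit log w: density * w
    #   = (1 - exp(-mu sigma w)) * exp(-delta^2 w / 4) * w^(-1/2)
    g = [-math.expm1(-mu_sigma * w) * math.exp(-delta ** 2 * w / 4.0) / math.sqrt(w)
         for w in ws]
    cumulative = [0.0]
    for i in range(1, points):
        cumulative.append(cumulative[-1] + 0.5 * (g[i] + g[i - 1]) * step)
    half = 0.5 * cumulative[-1]
    for w, c in zip(ws, cumulative):
        if c >= half:
            return w
    return ws[-1]

# Effectively neutral regime: mu*sigma much larger than delta^2 / 4.
print(successful_weight_median(1e-4, 1e-3), 1.0 / 1e-4)
# Deleterious regime: mu*sigma much smaller than delta^2 / 4.
print(successful_weight_median(1e-8, 1e-2), 1.0 / 1e-2 ** 2)
```

In each case the median should be of the same order as the second printed value, 1/(μσ) or 1/δ² respectively.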

Appendix C: Alternative derivation of p_k

Eq. (12) for p_k can easily be derived without referring to weights by using a first-step analysis, although this derivation does not provide the same intuitive understanding, and does not provide an expression for τ. To perform the first-step analysis, consider a k-mutant individual, having a probability p_k of success. There are four possibilities for the first event that will occur to this individual: with rate 1, it will die; with rate $1 - δ_{k}$ , it will divide into two k-mutant individuals; with rate $μ_{k} p_{k + 1}$ , it will produce a successful (k + 1)-mutant; and with rate $μ_{k} (1 - p_{k + 1})$ , it will produce an unsuccessful (k + 1)-mutant. In the first case (death), the individual has zero probability of being successful. In the second case (reproduction), each of the offspring has probability p_k of being successful, for a total probability of $1 - (1 - p_{k})^{2}$ of success. In the third case (reproduction producing a successful mutant), the probability of success is, by definition, 1. In the final case (reproduction producing an unsuccessful mutant), we can ignore the unsuccessful (k + 1)-mutant, leaving us in the original situation of a single k-mutant individual with a probability p_k of success.

Equating the original probability of success to the probability of success summed over the four possible first events yields a quadratic equation for p_k:

p_{k} = \frac{(1) (0) + (1 - δ_{k}) (2 p_{k} - p_{k}^{2}) + μ_{k} p_{k + 1} + μ_{k} (1 - p_{k + 1}) p_{k}}{1 + (1 - δ_{k}) + μ_{k}} .

(50)

(The denominator of the right-hand side is the sum of the rates of the different possible first events.) Solving Eq. (50) for p_k gives

p_{k} = \frac{- δ_{k} - μ_{k} p_{k + 1} + \sqrt{{(δ_{k} - μ_{k} p_{k + 1})}^{2} + 4 μ_{k} p_{k + 1}}}{2 (1 - δ_{k})},

the same expression derived via a more complicated calculation in Appendix A.
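The first-step result lends itself to a short numerical check and to a backward recursion along a chain of intermediates. The sketch below assumes that the final beneficial genotype establishes with probability s (so the recursion starts from p_K = s); the function names and parameter values are illustrative.

```python
import math

def success_probability(delta_k, mu_k, p_next):
    """Positive root of the quadratic first-step equation, Eq. (50)."""
    m = mu_k * p_next
    disc = math.sqrt((delta_k - m) ** 2 + 4.0 * m)
    return (-delta_k - m + disc) / (2.0 * (1.0 - delta_k))

def first_step_rhs(p_k, delta_k, mu_k, p_next):
    """Right-hand side of Eq. (50) evaluated at a trial value of p_k."""
    numerator = ((1.0 - delta_k) * (2.0 * p_k - p_k ** 2)
                 + mu_k * p_next
                 + mu_k * (1.0 - p_next) * p_k)
    return numerator / (1.0 + (1.0 - delta_k) + mu_k)

def chain_probabilities(deltas, mus, s):
    """[p_1, ..., p_{K-1}] by backward recursion, assuming p_K = s."""
    p = s
    out = []
    for delta_k, mu_k in zip(reversed(deltas), reversed(mus)):
        p = success_probability(delta_k, mu_k, p)
        out.append(p)
    return out[::-1]

p = success_probability(0.01, 1e-6, 0.05)
print(p, first_step_rhs(p, 0.01, 1e-6, 0.05))          # the two values should agree
print(chain_probabilities([0.0, 0.0], [1e-6, 1e-6], 0.01))
```

For neutral intermediates the recursion reproduces the nested square roots of the main text, for example p₁ ≈ μ^{3/4} s^{1/4} when K = 3 and both intermediates mutate at rate μ.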


Literature Cited

Arfken GB, Weber HJ. Mathematical methods for physicists. Fourth edition. Academic Press; San Diego: 1995.
Barton N. The effect of hitch-hiking on neutral genealogies. Genetical Research. 1998;72:123–133.
Barton NH, Rouhani S. The frequency of shifts between alternative equilibria. Journal of Theoretical Biology. 1987;125:397–418. doi: 10.1016/s0022-5193(87)80210-2.
Blount ZD, Borland CZ, Lenski RE. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. PNAS. 2008;105:7899–7906. doi: 10.1073/pnas.0803151105.
Carter AJR, Wagner GP. Evolution of functionally conserved enhancers can be accelerated in large populations: a population-genetic model. Proceedings of the Royal Society B. 2002;269:953–960. doi: 10.1098/rspb.2002.1968.
Christiansen FB, Otto SP, Bergman A, Feldman MW. Waiting with and without recombination: The time to production of a double mutant. Theoretical Population Biology. 1998;53:199–215. doi: 10.1006/tpbi.1997.1358.
Desai MM, Fisher DS. Beneficial mutation selection balance and the effect of linkage on positive selection. Genetics. 2007;176:1759–1798. doi: 10.1534/genetics.106.067678.
Durrett R, Schmidt D. Waiting for two mutations: With applications to regulatory sequence evolution and the limits of Darwinian evolution. Genetics. 2008;180:1501–1509. doi: 10.1534/genetics.107.082610.
Ewens W. Mathematical population genetics. Second edition. Springer-Verlag; New York: 2004.
Fisher DS. Evolutionary dynamics. In: Bouchaud JP, Mezard M, Dalibard J, editors. Complex Systems. Volume LXXXV. Springer-Verlag; Amsterdam: 2007. (Lecture Notes of the Les Houches Summer School).
Goh C-S, Bogan AA, Joachimiak M, Walther D, Cohen FE. Co-evolution of proteins with their interaction partners. Journal of Molecular Biology. 2000;299:283–293. doi: 10.1006/jmbi.2000.3732.
Iwasa Y, Michor F, Nowak MA. Evolutionary dynamics of invasion and escape. Journal of Theoretical Biology. 2004a;226:205–214. doi: 10.1016/j.jtbi.2003.08.014.
Iwasa Y, Michor F, Nowak MA. Stochastic tunnels in evolutionary dynamics. Genetics. 2004b;166:1571–1579. doi: 10.1534/genetics.166.3.1571.
Karlin S. Sex and infinity: A mathematical analysis of the advantages and disadvantages of genetic recombination. In: Hiorns RW, Hiorns MSB, editors. The Mathematical Theory of the Dynamics of Biological Populations. Academic Press; New York: 1973. pp. 155–194.
Karlin S, Tavare S. The detection of a recessive visible gene in finite populations. Genetical Research Cambridge. 1981;37:33–46.
Kendall DG. On the generalized "birth-and-death" process. Annals of Mathematical Statistics. 1948;19:1–15.
Kimura M. The role of compensatory neutral mutations in molecular evolution. Journal of Genetics. 1985;64:7–19.
Knudson AG. Two genetic hits (more or less) to cancer. Nature Reviews Cancer. 2001;1:157–162. doi: 10.1038/35101031.
Lengauer C, Kinzler KW, Vogelstein B. Genetic instabilities in human cancers. Nature. 1998;396:643–649. doi: 10.1038/25292.
Levin BR, Perrot V, Walker N. Compensatory mutations, antibiotic resistance and the population genetics of adaptive evolution in bacteria. Genetics. 2000;154:985–997. doi: 10.1093/genetics/154.3.985.
McDonald BA, Linde C. Pathogen population genetics, evolutionary potential, and durable resistance. Annual Review of Phytopathology. 2002;40:349–379. doi: 10.1146/annurev.phyto.40.120501.101443.
Serra MC, Haccou P. Dynamics of escape mutants. Theoretical Population Biology. 2007;72:167–168. doi: 10.1016/j.tpb.2007.01.005.
Shih AC-C, Hsiao T-C, Ho M-S, Li W-H. Simultaneous amino acid substitutions at antigenic sites drive influenza A hemagglutinin evolution. PNAS. 2007;104:6283–6288. doi: 10.1073/pnas.0701396104.
Weinreich DM, Chao L. Rapid evolutionary escape by large populations from local fitness peaks is likely in nature. Evolution. 2005;59:1175–1182.

[R1] Arfken GB, Weber HJ. Mathematical methods for physicists. fourth edition Academic Press; San Diego: 1995. [Google Scholar]

[R2] Barton N. The effect of hitch-hiking on neutral genealogies. Genetical Research. 1998;72:123–133. [Google Scholar]

[R3] Barton NH, Rouhani S. The frequency of shifts between alternative equilibria. Journal of Theoretical Biology. 1987;125:397–418. doi: 10.1016/s0022-5193(87)80210-2. [DOI] [PubMed] [Google Scholar]

[R4] Blount ZD, Borland CZ, Lenski RE. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. PNAS. 2008;105:7899–7906. doi: 10.1073/pnas.0803151105. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] Carter AJR, Wagner GP. Evolution of functionally conserved enhancers can be accelerated in large populations: a population-genetic model. Proceedings of the Royal Society B. 2002;269:953–960. doi: 10.1098/rspb.2002.1968. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] Christiansen FB, Otto SP, Bergman A, Feldman MW. Waiting with and without recombination: The time to production of a double mutant. Theoretical Population Biology. 1998;53:199–215. doi: 10.1006/tpbi.1997.1358. [DOI] [PubMed] [Google Scholar]

[R7] Desai MM, Fisher DS. Beneficial mutation selection balance and the effect of linkage on positive selection. Genetics. 2007;176:1759–1798. doi: 10.1534/genetics.106.067678. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] Durrett R, Schmidt D. Waiting for two mutations: With applications to regulatory sequence evolution and the limits of darwinian evolution. Genetics. 2008;180:1501–1509. doi: 10.1534/genetics.107.082610. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] Ewens W. Mathematical population genetics. second edition Springer-Verlag; New York: 2004. [Google Scholar]

[R10] Fisher DS. Evolutionary dynamics. In: Bouchaud JP, Mezard M, Dalibard J, editors. Complex Systems. Volume LXXXV. Springer-Verlag; Amsterdam: 2007. (Lecture Notes of the Les Houches Summer School). [Google Scholar]

[R11] Goh C-S, Bogan AA, Joachimiak M, Walther D, Cohen FE. Co-evolution of proteins with their interaction partners. Journal of Molecular Biology. 2000;299:283–293. doi: 10.1006/jmbi.2000.3732. [DOI] [PubMed] [Google Scholar]

[R12] Iwasa Y, Michor F, Nowak MA. Evolutionary dynamics of invasion and escape. Journal of Theoretical Biology. 2004a;226:205–214. doi: 10.1016/j.jtbi.2003.08.014. [DOI] [PubMed] [Google Scholar]

[R13] Iwasa Y, Michor F, Nowak MA. Stochastic tunnels in evolutionary dynamics. Genetics. 2004b;166:1571–1579. doi: 10.1534/genetics.166.3.1571. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] Karlin S. Sex and infinity: A mathematical analysis of the advantages and disadvantages of genetic recombination. In: Hiorns RW, Hiorns MSB, editors. The Mathematical Theory of the Dynamics of Biological Populations. Academic Press; New York: 1973. pp. 155–194. [Google Scholar]

[R15] Karlin S, Tavare S. The detection of a recessive visible gene in finite populations. Genetical Research Cambridge. 1981;37:33–46. [Google Scholar]

[R16] Kendall DG. On the generalized ”birth-and-death” process. Annals of Mathematical Statistics. 1948;19:1–15. [Google Scholar]

[R17] Kimura M. The role of compensatory neutral mutations in molecular evolution. Journal of Genetics. 1985;64:7–19. [Google Scholar]

[R18] Knudson AG. Two genetic hits (more or less) to cancer. Nature Reviews Cancer. 2001;1:157–162. doi: 10.1038/35101031. [DOI] [PubMed] [Google Scholar]

[R19] Lengauer C, Kinzler KW, Vogelstein B. Genetic instabilities in human cancers. Nature. 1998;396:643–649. doi: 10.1038/25292. [DOI] [PubMed] [Google Scholar]

[R20] Levin BR, Perrot V, Walker N. Compensatory mutations, antibiotic resistance and the population genetics of adaptive evolution in bacteria. Genetics. 2000;154:985–997. doi: 10.1093/genetics/154.3.985. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] McDonald BA, Linde C. Pathogen population genetics, evolutionary potential, and durable resistance. Annual Review of Phytopathology. 2002;40:349–379. doi: 10.1146/annurev.phyto.40.120501.101443. [DOI] [PubMed] [Google Scholar]

[R22] Serra MC, Haccou P. Dynamics of escape mutants. Theoretical Population Biology. 2007;72:167–168. doi: 10.1016/j.tpb.2007.01.005.

[R23] Shih AC-C, Hsiao T-C, Ho M-S, Li W-H. Simultaneous amino acid substitutions at antigenic sites drive influenza A hemagglutinin evolution. PNAS. 2007;104:6283–6288. doi: 10.1073/pnas.0701396104.

[R24] Weinreich DM, Chao L. Rapid evolutionary escape by large populations from local fitness peaks is likely in nature. Evolution. 2005;59:1175–1182.
