Author manuscript; available in PMC: 2015 Jun 24.
Published in final edited form as: Phys Rev E Stat Nonlin Soft Matter Phys. 2013 Feb 13;87(2):022704. doi: 10.1103/PhysRevE.87.022704

Evolutionary Processes in Finite Populations

Dirk M Lorenz 1, Jeong-Man Park 1,2, Michael W Deem 1
PMCID: PMC4479310  NIHMSID: NIHMS700258  PMID: 23496545

Abstract

We consider the evolution of large but finite populations on arbitrary fitness landscapes. We describe the evolutionary process by a Markov, Moran process. We show that to $O(1/N)$, the time-averaged fitness is lower for the finite population than it is for the infinite population. We also show that fluctuations in the number of individuals for a given genotype can be proportional to a power of the inverse of the mutation rate. Finally, we show that the probability for the system to take a given path through the fitness landscape can be non-monotonic in system size.

1 Introduction

Natural populations are characterized by finite sizes. For this reason, it is impossible for biology to sample the entire space of all possible genotypes. Even the number of possible sequences with high fitness is typically much larger than the population size in naturally occurring populations. Effects due to finite population size are particularly pronounced in asexual populations. For example, the reduction of fitness in a finite population without back mutation is termed Muller’s ratchet [1], and the decreased speed of evolution in a finite population without recombination is termed the Hill-Robertson effect [2].

The relative influence of different evolutionary forces changes between small and large populations. While stochastic effects such as genetic drift act more strongly on small populations, natural selection acts more effectively on large populations. Many results in classical population genetics have focused on the limiting cases of small or infinite populations. In sufficiently small populations, beneficial mutations occur but rarely survive long enough to become established in the population. Those mutations that survive, however, can spread through a small population, reaching fixation, before another beneficial mutation arises. This regime is referred to as the successional-mutations regime [3, 4] and is fairly well understood. This theory has been useful, for example, to understand the evolution of transcription factor binding sites [5]. As the population size increases, beneficial mutations arise more frequently. Fixation of individual mutations does not occur before the arrival of another beneficial mutation. In asexual populations this leads to competition between descendants of each of the mutations, an effect referred to as clonal interference [6]. As the population becomes even larger, stochastic effects ultimately become negligible, and the time evolution of the evolving population can be described by a set of ordinary differential equations. This regime has been studied extensively in quasispecies theory, albeit often only for simple fitness functions.

Here we investigate the regime between clonal interference and quasispecies theory. We seek to predict the evolutionary dynamics followed by a large yet finite population and how this dynamics differs from that of an infinite population. The study of finite-population effects requires a stochastic description based on a master equation [7]. We make no assumption about the fitness landscape upon which the population evolves. We show that, averaged over time, the average fitness of a large finite population is lower than that of a population of infinite size. In other words, for large asexual populations evolving on a fixed fitness landscape, an increase in population size is accompanied by an increase in the average fitness. Furthermore, small mutation rates lead to high fluctuations and correlations. In particular, for small mutation rates, fluctuations and correlations in the number of individuals for a given genotype are inversely proportional to a power of the mutation rate. These large correlations enhance finite population effects and make the convergence to infinite-population behavior occur only for extremely large populations.

This article is organized as follows. We describe the stochastic process underlying our studies in section 2. We explain how this dynamic process can be written as a field theory. We derive analytic results for the infinite population evolution from this field theory. We describe finite population effects in section 3. We introduce the fitness landscape that we use to illustrate our results in section 4. In section 5 we investigate fluctuations in this random process. We verify our analytic results using stochastic simulations in section 5. We conclude in section 6.

2 Stochastic Process Mapped to a Field Theory

Throughout this article, we use the Moran process to model evolution of a population [8]. The individuals in the population are identified by their genotype, a sequence of length $l$. In this continuous-time process a constant population size, $N$, is maintained by simultaneous replication and death. The individual to be replicated is chosen randomly from the population with probability proportional to its microscopic fitness, while the individual to be killed is chosen randomly from the population with uniform probability. We further assume that replication and mutation are independent. Thus, there are two classes of events: mutation and replication. Mutation from genotype $i$ to genotype $j$ occurs at a rate of $\mu \Delta_{ij} N_i$, where $\mu$ is the mutation rate per locus, $N_i$ is the number of individuals with genotype $i$, and $\Delta_{ij}$ is equal to one if an individual can mutate from sequence $i$ to sequence $j$ with a single mutation and zero otherwise. This description allows for the incorporation of back-mutations, which are often ignored in the literature. Note that the analytical results in this paper do not depend on this binary form of the matrix $\Delta$. Its elements can be arbitrary non-negative numbers, as would be appropriate if back-mutation rates differed from forward mutation rates. Replication of genotype $i$ and simultaneous death of genotype $j$ occurs at a rate of $\frac{1}{N} r_i N_i N_j$, where $r_i$ is the replication rate of sequence $i$. The stochastic master equation for this process is

$$
\frac{\partial}{\partial t} P(\mathbf{N};t) = \mu \sum_{i,j} \Delta_{ij} \left[ (N_i+1)\, P(\mathbf{N}+\mathbf{e}_i-\mathbf{e}_j;t) - N_i\, P(\mathbf{N};t) \right] + \frac{1}{N} \sum_i r_i \sum_{j \neq i} \left[ (N_i-1)(N_j+1)\, P(\mathbf{N}-\mathbf{e}_i+\mathbf{e}_j;t) - N_i N_j\, P(\mathbf{N};t) \right]. \tag{1}
$$

Here $\mathbf{N}$ is a vector describing the state of the population by the number of individuals of each genotype, $(N_1, N_2, \ldots)$, and $\mathbf{e}_i$ is a unit vector associated with genotype $i$. Note that $\sum_i N_i = N$.
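The two event classes of the master equation can be sampled directly with a Gillespie-type algorithm. The following is a minimal sketch under stated assumptions, not the simulation code used for the figures: the function name `moran_gillespie` and its arguments are hypothetical, and all propensities are recomputed at every step for clarity rather than speed.

```python
import random


def moran_gillespie(counts, rates, delta, mu, t_end, rng):
    """Sample one trajectory of the Moran process with mutation up to t_end.

    counts : initial occupation numbers N_i
    rates  : replication rates r_i
    delta  : mutation matrix, delta[i][j] > 0 if i can mutate to j
    mu     : mutation rate per locus
    rng    : a random.Random instance (hypothetical interface choice)
    """
    counts = list(counts)
    n_types = len(counts)
    n_total = sum(counts)
    t = 0.0
    while True:
        # Each event is (propensity, genotype that gains one, genotype that loses one).
        events = []
        for i in range(n_types):
            if counts[i] == 0:
                continue
            for j in range(n_types):
                if j == i:
                    continue
                if delta[i][j] > 0:
                    # Mutation i -> j at rate mu * Delta_ij * N_i.
                    events.append((mu * delta[i][j] * counts[i], j, i))
                if rates[i] > 0 and counts[j] > 0:
                    # Replication of i with death of j at rate r_i N_i N_j / N.
                    events.append((rates[i] * counts[i] * counts[j] / n_total, i, j))
        total = sum(e[0] for e in events)
        if total == 0.0:
            break
        t += rng.expovariate(total)  # exponential waiting time
        if t > t_end:
            break
        u = rng.random() * total
        acc = 0.0
        for rate, gain, loss in events:
            acc += rate
            if u < acc:
                break
        counts[gain] += 1
        counts[loss] -= 1
    return counts
```

Every event moves one individual between genotypes, so the total population size stays fixed, as the constraint $\sum_i N_i = N$ requires.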

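We obtain analytic expressions for the average occupation numbers and the fluctuations by mapping the stochastic process described above onto a field theory following [9]. To do this we introduce the state vector

$$
|\psi(t)\rangle = \sum_{\mathbf{N}} P(\mathbf{N};t)\, |\mathbf{N}\rangle \tag{2}
$$

whose time evolution is governed by

$$
\frac{\partial}{\partial t} |\psi(t)\rangle = \sum_{\mathbf{N}} \left[ \mu \sum_{i,j} \Delta_{ij} \left[ (N_i+1)\, P(\mathbf{N}+\mathbf{e}_i-\mathbf{e}_j;t) - N_i\, P(\mathbf{N};t) \right] + \frac{1}{N} \sum_i r_i \sum_{j \neq i} \left[ (N_i-1)(N_j+1)\, P(\mathbf{N}-\mathbf{e}_i+\mathbf{e}_j;t) - N_i N_j\, P(\mathbf{N};t) \right] \right] |\mathbf{N}\rangle. \tag{3}
$$

By defining annihilation and creation operators

$$
\hat{a}_i |\mathbf{N}\rangle = N_i\, |\mathbf{N}-\mathbf{e}_i\rangle, \qquad \hat{a}_i^\dagger |\mathbf{N}\rangle = |\mathbf{N}+\mathbf{e}_i\rangle, \qquad \hat{a}_i \hat{a}_j^\dagger - \hat{a}_j^\dagger \hat{a}_i = \delta_{ij}, \tag{4}
$$

we can write the governing equation for the state vector as

$$
\frac{\partial}{\partial t} |\psi(t)\rangle = -\hat{H}\, |\psi(t)\rangle, \tag{5}
$$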

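where

$$
-\hat{H} = \mu \sum_{i,j} \Delta_{ij} \left( \hat{a}_j^\dagger - \hat{a}_i^\dagger \right) \hat{a}_i + \frac{1}{N} \sum_{i,j} r_i\, \hat{a}_i^\dagger \left( \hat{a}_i^\dagger - \hat{a}_j^\dagger \right) \hat{a}_i \hat{a}_j. \tag{6}
$$

This differential equation has the formal solution

$$
|\psi(t)\rangle = e^{-\hat{H} t}\, |\psi(0)\rangle, \tag{7}
$$

where $|\psi(0)\rangle = |\mathbf{N}_0\rangle$ is the initial distribution of individuals in the population. At time $T$, the average of an observable represented by the (normal-ordered) operator $F(\{\hat{a}_i^\dagger, \hat{a}_i\})$ can be obtained [10] by multiplying with the “sum bra” $\langle \cdot | = \langle 0 | \left( \prod_i e^{\hat{a}_i} \right)$:

$$
\langle F \rangle_T = \langle \cdot |\, F(\{\hat{a}_i^\dagger, \hat{a}_i\})\, |\psi(T)\rangle = \langle \cdot |\, F(\{\hat{a}_i^\dagger, \hat{a}_i\})\, e^{-\hat{H} T}\, |\mathbf{N}_0\rangle. \tag{8}
$$

We introduce a Trotter factorization for the evolution operator $e^{-\hat{H} T}$, using a time interval $\epsilon \to 0$, in the basis of coherent states defined by $\hat{a}_i |\mathbf{z}\rangle = z_i |\mathbf{z}\rangle$ and obtain a path integral representation

$$
\langle \cdot |\, F(\{\hat{a}_i^\dagger, \hat{a}_i\})\, e^{-\hat{H} T}\, |\mathbf{N}_0\rangle = \langle \cdot |\, F(\{\hat{a}_i^\dagger, \hat{a}_i\})\, e^{-\epsilon \hat{H}} e^{-\epsilon \hat{H}} \cdots e^{-\epsilon \hat{H}}\, |\mathbf{N}_0\rangle = \int [\mathcal{D}z^* \mathcal{D}z]\, F(\{z(T/\epsilon)\})\, e^{-S(z^*, z)}. \tag{9}
$$

Here, the action in the exponent is, after the change of variables $z_i^* \to 1 + z_i^*$,

$$
S(z^*, z) = \sum_i \left[ \sum_{k=0}^{T/\epsilon} z_i^*(k)\, z_i(k) - \sum_{k=1}^{T/\epsilon} z_i^*(k)\, z_i(k-1) - N_i(0) \ln\!\left( 1 + z_i^*(0) \right) \right] - \mu \epsilon \sum_{k=1}^{T/\epsilon} \sum_{i,j} \left( z_j^*(k) - z_i^*(k) \right) z_i(k-1)\, \Delta_{ij} - \frac{\epsilon}{N} \sum_{k=1}^{T/\epsilon} \sum_{i,j} r_i \left( 1 + z_i^*(k) \right) \left( z_i^*(k) - z_j^*(k) \right) z_i(k-1)\, z_j(k-1). \tag{10}
$$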

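The population dynamics in the limit as the population size, $N$, becomes infinite emerges as a saddle point in the action [9]. Setting $\delta S / \delta z_i(t) |_c = 0$ leads to $z_i^{*c}(t) = 0$. From setting $\delta S / \delta z_i^*(t) |_c = 0$ we obtain $z_i^c(t) = N p_i(t)$, where $p_i(t)$ obeys the differential equation

$$
\frac{dp_i}{dt} = \mu \sum_j \left( \Delta_{ji}\, p_j - \Delta_{ij}\, p_i \right) + r_i p_i - \langle r \rangle p_i. \tag{11}
$$

Here $\langle r \rangle = \sum_j r_j p_j$ is the average fitness of the infinite population. This differential equation has the closed-form solution [11]

$$
p_i(t) = \frac{\sum_j \left( e^{Y t} \right)_{ij} p_j(0)}{\sum_{a,j} \left( e^{Y t} \right)_{aj} p_j(0)}, \tag{12}
$$

where the matrix $Y$ is defined by $Y_{ij} = \mu \Delta_{ji} - \mu \delta_{ij} \sum_k \Delta_{ik} + \delta_{ij} r_i$.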
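The closed-form solution says that the nonlinear quasispecies equation reduces to a linear problem followed by a normalization. The sketch below illustrates this with the standard library only: instead of a matrix exponential it integrates the linear system dq/dt = Yq with a fourth-order Runge-Kutta step and renormalizes, which gives the same ratio as Eq. 12 because the normalization is insensitive to the overall scale of q. The function names and the step count are illustrative assumptions.

```python
def build_y(rates, delta, mu):
    """Y_ij = mu*Delta_ji - mu*delta_ij*sum_k Delta_ik + delta_ij*r_i."""
    n = len(rates)
    y = [[mu * delta[j][i] for j in range(n)] for i in range(n)]
    for i in range(n):
        y[i][i] += rates[i] - mu * sum(delta[i])
    return y


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quasispecies(p0, rates, delta, mu, t, steps=1000):
    """Infinite-population frequencies p_i(t), following the structure of Eq. 12."""
    y = build_y(rates, delta, mu)
    q = list(p0)
    h = t / steps
    for _ in range(steps):
        k1 = matvec(y, q)
        k2 = matvec(y, [a + 0.5 * h * b for a, b in zip(q, k1)])
        k3 = matvec(y, [a + 0.5 * h * b for a, b in zip(q, k2)])
        k4 = matvec(y, [a + h * b for a, b in zip(q, k3)])
        q = [a + h * (b + 2 * c + 2 * d + e) / 6.0
             for a, b, c, d, e in zip(q, k1, k2, k3, k4)]
        # Rescaling is harmless for a linear system and avoids overflow.
        s = sum(q)
        q = [a / s for a in q]
    return q
```

For two genotypes without mutation the result can be compared with the logistic form $p_1(t) = p_1(0) e^{r_1 t} / (p_0(0) e^{r_0 t} + p_1(0) e^{r_1 t})$.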

3 Finite Population Shift to Probability Distribution

We proceed to quantify analytically how finite population effects alter the infinite population dynamics. To do so we expand the action about the saddle point and separate it into a Gaussian and a non-Gaussian part. Introducing $z_i(k) = z_i^c(k) + \delta z_i(k)$ and $z_i^*(k) = \delta z_i^*(k)$ in Eq. 10 we can write $S = S_0 + \Delta S$, where the reference action $S_0$ can be written as

$$
S_0 = \frac{1}{2}\, x^T\, \Pi_0^{-1}\, x \tag{13}
$$

where

$$
x^T = \left( \{\delta z^*(0), \delta z(0)\}, \{\delta z^*(1), \delta z(1)\}, \ldots, \{\delta z^*(T/\epsilon), \delta z(T/\epsilon)\} \right). \tag{14}
$$

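Here, $\Pi_0^{-1}$ is block tridiagonal in the time index,

$$
\Pi_0^{-1} = \begin{pmatrix} (\Pi_0^{-1})_{00} & (\Pi_0^{-1})_{01} & & \\ (\Pi_0^{-1})_{10} & (\Pi_0^{-1})_{11} & (\Pi_0^{-1})_{12} & \\ & (\Pi_0^{-1})_{21} & (\Pi_0^{-1})_{22} & \ddots \\ & & \ddots & \ddots \end{pmatrix} \tag{15}
$$

with

$$
(\Pi_0^{-1})_{00} = \begin{pmatrix} \delta_{ij} N_i(0) & \delta_{ij} \\ \delta_{ij} & 0 \end{pmatrix}, \quad (\Pi_0^{-1})_{kk} = \begin{pmatrix} -\epsilon (B)_{ij} & \delta_{ij} \\ \delta_{ij} & 0 \end{pmatrix}, \quad (\Pi_0^{-1})_{k,k-1} = \begin{pmatrix} 0 & -\left( \delta_{ij} + \epsilon (A)_{ij} \right) \\ 0 & 0 \end{pmatrix}, \quad (\Pi_0^{-1})_{k-1,k} = \begin{pmatrix} 0 & 0 \\ -\left( \delta_{ij} + \epsilon (A)^T_{ij} \right) & 0 \end{pmatrix}. \tag{16}
$$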

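The matrices $A$ and $B$ are

$$
(A)_{ij} = \mu \Delta_{ji} - \mu \delta_{ij} \Big( \sum_m \Delta_{im} \Big) + \frac{1}{N} r_i z_i^c(k-1) + \delta_{ij} r_i - \frac{1}{N} \delta_{ij} \Big( \sum_m r_m z_m^c(k-1) \Big) - \frac{1}{N} r_j z_i^c(k-1), \tag{17}
$$

and

$$
(B)_{ij} = 2 \delta_{ij} r_i z_i^c(k-1) - \frac{1}{N} (r_i + r_j)\, z_i^c(k-1)\, z_j^c(k-1). \tag{18}
$$

The non-Gaussian part of the action is given by

$$
\Delta S = -\sum_i N_i(0) \left[ \ln\!\left( 1 + \delta z_i^*(0) \right) - \delta z_i^*(0) + \frac{1}{2} \left( \delta z_i^*(0) \right)^2 \right] - \frac{\epsilon}{N} \sum_{k=1}^{T/\epsilon} \sum_{i,j} \Big[ r_i \left( \delta z_i^*(k) - \delta z_j^*(k) \right) \delta z_i(k-1)\, \delta z_j(k-1) + r_i\, \delta z_i^*(k) \left( \delta z_i^*(k) - \delta z_j^*(k) \right) z_i^c(k-1)\, \delta z_j(k-1) + r_i\, \delta z_i^*(k) \left( \delta z_i^*(k) - \delta z_j^*(k) \right) \delta z_i(k-1)\, z_j^c(k-1) + r_i\, \delta z_i^*(k) \left( \delta z_i^*(k) - \delta z_j^*(k) \right) \delta z_i(k-1)\, \delta z_j(k-1) \Big]. \tag{19}
$$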

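This formulation allows us to calculate averages using the Gaussian action and thermodynamic perturbation theory, which is equivalent to a cumulant expansion. The average occupation numbers are given by

$$
\langle N_i \rangle_T = \langle \cdot |\, \hat{a}_i^\dagger \hat{a}_i\, e^{-\hat{H} T}\, |\mathbf{N}_0\rangle = \langle \cdot |\, \hat{a}_i\, e^{-\hat{H} T}\, |\mathbf{N}_0\rangle = \int [\mathcal{D}z^* \mathcal{D}z]\, z_i(T/\epsilon)\, e^{-S(z^*, z)} \tag{20}
$$
$$
= \int [\mathcal{D}z^* \mathcal{D}z]\, z_i(T/\epsilon)\, e^{-\Delta S} e^{-S_0} = \left\langle z_i(T/\epsilon)\, e^{-\Delta S} \right\rangle_0 \tag{21}
$$
$$
= \left\langle z_i(T/\epsilon) \right\rangle_0 - \left\langle z_i(T/\epsilon)\, \Delta S \right\rangle_0 + \frac{1}{2} \left\langle z_i(T/\epsilon)\, (\Delta S)^2 \right\rangle_0 + \cdots \tag{22}
$$
$$
= N p_i(T) - \left\langle \delta z_i(T/\epsilon)\, \Delta S \right\rangle_0 + \frac{1}{2} \left\langle \delta z_i(T/\epsilon)\, (\Delta S)^2 \right\rangle_0 + \cdots, \tag{23}
$$

where the last step follows from $\langle (\Delta S)^n \rangle_0 = 0$ for all integers $n \geq 1$. This procedure leads to an asymptotic expansion for the occupation numbers in powers of $1/N$. To first order, we obtain

$$
\frac{1}{N} \langle N_a(T) \rangle \sim p_a(T) + \frac{1}{N^2} \int_0^T dt \sum_{i,j} \Pi_{0ai}^{zz^*}(T,t)\, \Pi_{0ij}^{zz}(t,t)\, (r_i - r_j). \tag{24}
$$

This expansion about infinite size is accurate when the correction term on the right-hand side of Eq. (24) is much smaller than $p_a(T)$. Equation (36) provides an estimate of the magnitude of the correction for a common landscape with $k$ intermediate steps. The second order term is given by Eq. A.1 in the appendix. We derive expressions for the matrices $\Pi_{0ai}^{zz^*}(T,t)$ and $\Pi_{0ij}^{zz}(t,t)$ by inverting $\Pi_0^{-1}$ in Eq. 15. In continuous time for $T > t$, they obey

$$
\frac{\partial \Pi_0^{zz^*}(T,t)}{\partial T} = A(T)\, \Pi_0^{zz^*}(T,t), \tag{25}
$$

with

$$
\Pi_0^{zz^*}(t,t) = I \tag{26}
$$

and

$$
\frac{d \Pi_0^{zz}(t,t)}{dt} = B(t) + A(t)\, \Pi_0^{zz}(t,t) + \Pi_0^{zz}(t,t)\, A^T(t), \tag{27}
$$

with

$$
\Pi_{0ij}^{zz}(0,0) = -\delta_{ij} N_i(0). \tag{28}
$$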

Using the expression for the first-order shift to the occupation numbers due to finite population effects, we calculate the finite population shift in the average fitness of the population. The average fitness correction is

$$
\delta \langle r(T) \rangle = \frac{1}{N^2} \int_0^T dt \sum_{i,j,a} r_a\, \Pi_{0ai}^{zz^*}(T,t)\, \Pi_{0ij}^{zz}(t,t)\, (r_i - r_j) \tag{29}
$$
$$
= -\frac{1}{N^2} \int_0^T dt \sum_{i,j,a} r_a\, \Pi_{0ai}^{zz^*}(T,t) \left( \Pi_{0ij}^{zz}(t,t) + N \delta_{ij}\, p_i(t) \right) r_j. \tag{30}
$$

The second line uses $\sum_j \Pi_{0ij}^{zz}(t,t) = -N p_i(t)$, which follows from the fixed total population size. This result shows that the correction to the mean fitness is smaller by a factor of $O(1/N)$ than the mean fitness in the limit of infinite population. This result can be rewritten in a more revealing form. Let $r(t)$ be a random variable defined as

$$
r(t) \equiv \frac{1}{N} \sum_i r_i \left( N_i(t) - \langle N_i(t) \rangle \right) \tag{31}
$$

in the limit of large population size. The finite population correction to the average fitness can then be written as

$$
\delta \langle r(T) \rangle = -\int_0^T \langle r(T)\, r(t) \rangle\, dt \tag{32}
$$

and its time integral as

$$
\int_0^T \delta \langle r(t) \rangle\, dt = -\int_0^T dt \int_0^t dt'\, \langle r(t)\, r(t') \rangle = -\frac{1}{2} \left\langle \left( \int_0^T r(t)\, dt \right)^2 \right\rangle. \tag{33}
$$

This expression for the average fitness correction, which resembles a fluctuation-dissipation theorem, implies that the time average of the finite-population shift is always negative. In other words, the average fitness of a large finite population is smaller than that of a population of infinite size. Note that this result is perturbative, valid for large population size $N$, and it does not require the average fitness to be a monotonic function of $N$ for small $N$. On complex fitness landscapes, it is possible for small asexual populations to achieve a higher average fitness than larger ones [12]. Nonetheless, for sufficiently large population sizes, the time-integrated average fitness increases monotonically with population size.
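The factor of one half in the time-integrated correction comes from symmetrizing the time-ordered double integral. As a short worked step, using only the symmetry of the two-time correlation of $r$ in its arguments:

```latex
\int_0^T dt \int_0^t dt'\, \langle r(t)\, r(t') \rangle
  = \frac{1}{2} \int_0^T dt \int_0^T dt'\, \langle r(t)\, r(t') \rangle
  = \frac{1}{2} \left\langle \left( \int_0^T r(t)\, dt \right)^{2} \right\rangle \;\geq\; 0 .
```

The integrand is symmetric under exchange of $t$ and $t'$, so the triangle $t' < t$ contributes exactly half of the integral over the full square. Because the right-hand side is the mean of a square, the time-integrated fitness shift, which carries the opposite sign, cannot be positive.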

4 The Landscape

The analytical expressions developed in this paper are applicable to arbitrary fitness landscapes and mutational pathways. However, we now describe in some detail the implications for fitness landscapes [13] defined by a certain number of fitness loci l with two alleles each. Genotypes that differ from each other by exactly one point mutation in one of the loci are connected in the mutation matrix. Each position in sequence space is thus connected by a mutation event to l other genotypes. Figure 1 shows the geometry of the landscape for the case of three loci. Typically in this landscape, the fitness of each state increases upon moving to the right in the figure.
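For concreteness, the mutation matrix of such a two-allele landscape can be built by encoding each genotype as an integer whose bits are the alleles. This is an illustrative sketch: the function name and the `allow_back` switch are assumptions introduced here, with forward mutation taken to mean flipping a bit from 0 to 1.

```python
def hypercube_mutation_matrix(n_loci, allow_back=True):
    """Return Delta for n_loci biallelic loci; genotype g is an n_loci-bit integer.

    Delta[g][h] = 1 when h differs from g by a single point mutation.
    With allow_back=False only 0 -> 1 flips (forward mutations) are connected.
    """
    n_states = 1 << n_loci
    delta = [[0] * n_states for _ in range(n_states)]
    for g in range(n_states):
        for locus in range(n_loci):
            bit = 1 << locus
            if allow_back or not (g & bit):
                delta[g][g ^ bit] = 1
    return delta
```

With back-mutations every genotype has exactly $l$ neighbours. Without them, the number of outgoing connections drops from $l$ at the ancestral genotype to zero at the fully mutated one.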

Figure 1.

Left-hand side: the state-space for a fitness landscape with three forward-mutations and no back-mutations. Each node, i, is a particular genotype. The replication rate of each genotype is ri. Right-hand side (discussed in Section 6): The state-space can be expanded to include mutational histories. Each two-mutation state is split into 2! = 2 states while the three-mutation state is split into 3! = 6 states. The node is now identified by a vector which conveys the mutational history of a particular path through the landscape.

5 Fluctuations around the Mean

The matrices $\Pi_0^{zz}(t,t)$ and $\Pi_0^{zz^*}(T,t)$ can be understood intuitively. In the limit of large $N$, the off-diagonal elements of $\Pi_0^{zz}(t,t)$ describe the covariances between the occupation numbers at time $t$, while the diagonal elements are related to the variances of the occupation numbers at time $t$ by

$$
\frac{1}{N^2} \left\langle \left( \delta N_a(t) \right)^2 \right\rangle \sim \frac{1}{N} \left( p_a(t) + \frac{1}{N} \Pi_{0aa}^{zz}(t,t) \right). \tag{34}
$$

At different times, $\Pi_0^{zz}(T,t)$ gives the cross-covariances between the occupation numbers at times $T$ and $t$. The matrix $\Pi_{0ai}^{zz^*}(T,t)$ relates the correlations at different times to the same-time correlations via

$$
\Pi_0^{zz}(T,t) = \Pi_0^{zz^*}(T,t)\, \Pi_0^{zz}(t,t). \tag{35}
$$

We observe numerically that for small mutation rates, the fluctuations are proportional to a negative power of the mutation rate. Specifically,

$$
\frac{1}{N^2} \left\langle \left( \delta N_a(t) \right)^2 \right\rangle \sim \frac{1}{N} \left( \frac{r}{\mu} \right)^k, \tag{36}
$$

where $k$ is the number of mutational steps as shown in Fig. 2. This dependence can also be shown analytically for sufficiently simple landscapes. See section B in the appendix for one example. Thus the expansion, which naively appears to be in $1/N$, is actually in $1/(N \mu^k)$. The expansion therefore breaks down when $\mu < 1/N^{1/k}$. It is valid for large $N$ and $\mu \gg 1/N^{1/k}$.
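A quick numerical reading of this criterion can help when choosing simulation parameters. The helper functions below are hypothetical conveniences, not part of the theory: the first evaluates the right-hand side of Eq. 36 and the second the mutation rate $1/N^{1/k}$ below which the expansion is expected to fail, with rates measured in units of a typical replication rate $r$.

```python
def relative_variance_estimate(n, mu, r, k):
    """Right-hand side of Eq. 36: (1/N) * (r / mu)**k."""
    return (r / mu) ** k / n


def mutation_rate_threshold(n, k):
    """Mutation rate 1 / N**(1/k); the expansion needs mu well above this."""
    return n ** (-1.0 / k)
```

For example, with $N = 10^6$ and $k = 3$ the threshold is $10^{-2}$, and at $\mu = 10^{-2}$, $r = 1$ the estimate of Eq. 36 is of order one, so the $1/N$ correction is no longer small.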

Figure 2.

The maximal change of the variance with time (+), i.e. $\max_{t,i}\, d\Pi_{0ii}^{zz}(t,t)/dt$ where $\Pi_0^{zz}$ is obtained from Eqs. 27 and 28, depends on the mutation rate as an inverse power law. Shown are calculations for a nonepistatic version of the landscape as described in section 4 with a) two possible mutations ($r_0 = 0$, $\Delta r_1 \approx 0.049$, $\Delta r_2 \approx 0.010$), b) three possible mutations ($r_0 = 0$, $\Delta r_1 \approx 0.049$, $\Delta r_2 \approx 0.010$, $\Delta r_3 \approx 0.002$), and c) four possible mutations ($r_0 = 0$, $\Delta r_1 \approx 0.049$, $\Delta r_2 \approx 0.020$, $\Delta r_3 \approx 0.006$, $\Delta r_4 \approx 0.002$). In this case, the fitness of each state is simply the sum of the contributions from each mutation. The solid lines indicate power law fits using the values for $\mu \leq 10^{-5}$. Their exponents are a) −1.999, b) −2.989, and c) −3.939. The magnitude of the exponent is observed to be equal to the number of mutational steps in the landscape.
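An exponent of this kind can be extracted with an ordinary least-squares fit in log-log coordinates. The sketch below is one generic way to do so and is an assumption about the fitting procedure, which the caption does not specify; the function name is hypothetical.

```python
import math


def power_law_exponent(xs, ys):
    """Fit y = c * x**s by least squares on (log x, log y); return (s, c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    prefactor = math.exp(mean_y - slope * mean_x)
    return slope, prefactor
```

Restricting the input to the smallest mutation rates, as done in the figure for $\mu \leq 10^{-5}$, keeps the fit inside the regime where the power law holds.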

We verify our analytical results by performing stochastic simulations using the Lebowitz/Gillespie algorithm [14, 15]. Rewriting Eq. 24 for the first order shifts to the occupation numbers,

$$
\langle N_a(T) \rangle - N p_a(T) \sim \frac{1}{N} \int_0^T dt \sum_{i,j} \Pi_{0ai}^{zz^*}(T,t)\, \Pi_{0ij}^{zz}(t,t)\, (r_i - r_j), \tag{37}
$$

we observe that the finite population correction converges to a constant value for large $N$. The average replication rate in the population is linear in the occupation numbers. It is equal to $\frac{1}{N} \sum_i r_i N_i(t)$. Therefore, the average replication rate also converges to the quasispecies result in the limit of a large population. That is, the average replication rate is equal to that of the infinite population plus a correction that is of order $1/N$ smaller. Figure 3 shows this convergence for one set of parameters. As a further check on our analytic results, we fit a cubic polynomial in $1/N$ to the simulation data displayed in Fig. 3. For the particular fitness parameters chosen here, the coefficients from this fit are $320.4 \pm 2.5$ for the constant term and $(-5.3 \pm 0.8) \times 10^5$ for the linear term, while our theory predicts $319.0$ and $-5.2 \times 10^5$, respectively. Here, the coefficient of the linear term is obtained from Eq. A.1 in Appendix A. Similarly, we observe that the variances obtained from stochastic simulations agree with the analytic expression given in Eq. 34 as shown in Fig. 4.

Figure 3.

(a) Finite-population correction to the average occupation numbers (left-hand side of Eq. 37) as a function of population size, $N$, on a three-mutation landscape as shown in Fig. 1 including back-mutations. Shown are data for a mutation rate of $\mu = 10^{-5}$ and replication rates of $r_0 = 0$, $r_1 \approx 0.049$, $r_2 \approx 0.010$, $r_3 \approx 0.002$, $r_4 \approx 0.059$, $r_5 \approx 0.051$, $r_6 \approx 0.012$, and $r_7 \approx 0.061$. The time is chosen as $T = 157.5$, which approximately maximizes $\langle N_0 \rangle(T) - N p_0(T)$. As $N$ increases, the corrections obtained from stochastic simulations, $N_0$ (×), $N_1$ (○), $N_2$ (+), $N_3$ (*), $N_4$ (◻), $N_5$ (◇), $N_6$ (▿), $N_7$ (▵), converge to the values predicted by the theory (solid lines). The dashed curves show the second order expansion, given by Eqs. 37 and A.1. The error bars are one standard error. (b) Finite-size correction to the mean population fitness. The average replication rate in the population is linear in the occupation numbers, being equal to $\frac{1}{N} \sum_i r_i N_i(t)$, and so it too converges to the quasispecies result in the limit of a large population.

Figure 4.

Variances divided by population size, $N$, as a function of $N$. The values obtained from stochastic simulations, $N_0$ (×), $N_1$ (○), $N_2$ (+), $N_3$ (*), $N_4$ (◻), $N_5$ (◇), $N_6$ (▿), $N_7$ (▵), agree with the values predicted by Eq. 34 (solid lines). The time and other parameters are the same as in Fig. 3. The error bars are one standard error.

6 Discussion and Conclusion

Although the theory described in this paper was developed to study the time-evolution of the occupation numbers in sequence space, we can immediately apply these results to investigate which mutational paths individuals take. This allows us to predict the large N behavior of the probability that a population will follow a certain mutational trajectory. To do this we simply expand the state space describing the identity of each individual to include not only the possible sequences but also the mutational histories. Figure 1 illustrates this expansion for the case of three mutations. Figure 5 compares the probability of following a given path as obtained from stochastic simulations to the expressions given in Eqs. 24 and A.1. We again observe that the simulation results converge to the values predicted by the theory as the population size increases. Interestingly, we observe numerically that the probability for a population to take a certain mutational path varies with the population size in a non-monotonic fashion. In particular, there is an intermediate population size at which the population is most likely to take the dominant path through the landscape.
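The expansion of the state space to mutational histories can be made explicit by labelling each state with the ordered tuple of loci that have mutated so far. The following sketch enumerates these states and the forward-mutation edges between them; the tuple encoding and function names are illustrative assumptions, chosen so that the history (1, 2, 3) corresponds to the path label $N_{123}$ used in Fig. 5.

```python
from itertools import permutations


def expanded_states(n_loci):
    """All mutational histories: ordered tuples of distinct loci, any length."""
    loci = range(1, n_loci + 1)
    return [h for m in range(n_loci + 1) for h in permutations(loci, m)]


def expanded_edges(n_loci):
    """Forward mutations in the expanded space: append one unmutated locus."""
    return [(h, h + (locus,))
            for h in expanded_states(n_loci)
            for locus in range(1, n_loci + 1)
            if locus not in h]


def genotype(history):
    """The genotype forgets the order in which the mutations occurred."""
    return frozenset(history)
```

For three loci this gives 1 + 3 + 6 + 6 = 16 states: each two-mutation genotype is split into 2! = 2 histories and the three-mutation genotype into 3! = 6, in agreement with the right-hand side of Fig. 1. Every state other than the ancestor has exactly one parent, so the expanded space without back-mutation is a tree.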

Figure 5.

Probability that a population will follow a certain mutational trajectory as a function of population size. Shown are data for the landscape in Fig. 1 excluding back-mutations with a mutation rate of $\mu = 10^{-3}$ and epistatic replication rates of $r_0 = 0$, $r_1 \approx 0.049$, $r_2 \approx 0.010$, $r_3 \approx 0.002$, $r_4 \approx 0.012$, $r_5 \approx 0.051$, $r_6 \approx 0.059$, and $r_7 \approx 0.061$. Equation 37 (solid lines) predicts the asymptotic behavior of the simulation values, $N_{123}$ (×), $N_{213}$ (○), $N_{132}$ (◻), $N_{312}$ (+), $N_{231}$ (*), $N_{321}$ (◇), for large population sizes. The second order expansion (dashed lines) improves the prediction for sufficiently large populations. The error bars are one standard error.

Fluctuations due to finite population can be quite large. As shown in Appendix B, these fluctuations are proportional to an inverse power of the mutation rate. That is, the expansion in 1/N has a coefficient that depends on a power of the inverse of the mutation rate. For this reason, convergence to the infinite population limit can be exceedingly slow. The coefficient in the expansion in 1/N also has a time dependence. As shown in Appendix C, this coefficient can be proportional to t, and so diverge at long times. This divergence occurs when there are multiple final states, with equal replication rates. For example, the fluctuations diverge at long times in the expanded state space due to what may be termed fixation of path probabilities.

In this paper we presented a path-integral formulation of evolution under a Moran-type process on arbitrary fitness landscapes. We derived analytic results that describe the dynamics exactly in the limit of an infinite population size and obtained an asymptotic expansion in the inverse of the population size for finite populations. We showed that the finite population correction to the time-averaged fitness is always negative, which implies that for sufficiently large population sizes the time-averaged fitness increases with population size. We also found that for small mutation rates, the infinite-population variances of the occupation numbers behave as $\mu^{-k}$, where $k$ is the number of mutational steps from the ancestral sequence. Finally, we showed how the formalism described in this paper can also be used to investigate which mutational path a population takes through the fitness landscape by expanding the sequence space to include mutational histories.

Acknowledgments

This research was supported by the US National Institutes of Health (1 R01 GM 100468–01). JMP was also supported by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Education, Science, and Technology (grant number 2010–0009936).

A Second Order Correction

Equation 24 gives the terms up to $O(N^0)$ of an asymptotic expansion for the average occupation numbers in powers of $1/N$. We here determine the second order, $O(N^{-1})$, correction terms. Figure A.1 shows all possible vertices appearing in the diagrams. Unlike the first correction term, which is derived from only the single non-vanishing diagram shown in Fig. A.2, the second order correction term comes from the nine different diagrams shown in Fig. A.3. We obtain

$$
\frac{1}{N} \langle N_a(T) \rangle \sim p_a(T) + \frac{1}{N^2} \int_0^T dt \sum_{i,j} \Pi_{0ai}^{zz^*}(T,t)\, \Pi_{0ij}^{zz}(t,t)\, (r_i - r_j) + \frac{1}{N} \left\langle N_a^{(2)}(T) \right\rangle, \tag{A.1}
$$

where

Na(2)(T)=12N20Tdti,j(Π0aizz(T,t)Π0ajzz(T,t))(rirj)0tdti,jri(Π0iizz(t,t)Π0ijzz(t,t))Π0jizz(t,t)Π0i,jzz(t,t)+14N30Tdti,j(Π0aizz(T,t)Π0ajzz(T,t))(rirj)0tdti,j(Π0iizz(t,t)Π0ijzz(t,t))(rirj)0tdti,j(rirj)[(Π0jizz(t,t)Π0jjzz(t,t))(Π0i,jzz(t,t)Π0ijzz(t,t)+2Π0ijzz(t,t)Π0jizz(t,t))+2(Π0iizz(t,t)Π0ijzz(t,t))(Π0jjzz(t,t)Π0i,jzz(t,t)+2Π0jizz(t,t)Π0jjzz(t,t))]+1N30Tdti,j(Π0aizz(T,t)Π0ajzz(T,t))(rirj)0Tdti,j(Π0iizz(t,t)Π0ijzz(t,t))(rirj)0tdti,jri[Π0jizz(t,t)(Π0j,izz(t,t)Π0jjzz(t,t))(Π0iizz(t,t)zcj(t)+Π0i,jzz(t,t)zci(t))+Π0j,izz(t,t)(Π0jizz(t,t)Π0jjzz(t,t))(Π0iizz(t,t)zcj(t)+Π0jjzz(t,t)zci(t))+Π0j,izz(t,t)(Π0i,izz(t,t)Π0ijzz(t,t))(Π0jizz(t,t)zcj(t)+Π0jjzz(t,t)zci(t))]+1N30Tdti,j(Π0aizz(T,t)Π0ajzz(T,t))(rirj)0tdti,jri0tdti,jΠ0ijzz(t,t)(rirj)Π0iizz(t,t)(Π0jizz(t,t)Π0jjzz(t,t))(Π0iizz(t,t)zcj(t)+Π0jizz(t,t)zci(t))+2N20Tdti,j(Π0aizz(T,t)Π0ajzz(T,t))(rirj)0tdti,jΠ0iizz(t,t)(rirj)iΠ0jizz(t,0)Π0iizz(t,0)Π0jizz(t,0)ni(0) (A.2)

Figure A.1.

Vertices for the diagrammatic expansion. A white circle represents an open time, while black circles stand for times that are integrated over.

Figure A.2.

Diagram for the $O(N^0)$ correction to the average occupation numbers.

Figure A.3.

Diagrams for the $O(N^{-1})$ correction to the average occupation numbers with their multiplicities.

B Fluctuations Proportional to a Negative Power of the Mutation Rate

In this appendix we consider a special case of the model described in section 2, for which we show analytically that for small mutation rates, $\mu$, the variance in the infinite population occupation numbers is proportional to $\mu^{-k}$, where $k$ is the number of mutational steps in the landscape. We work in the limit that $N \to \infty$. We seek to understand when the $1/N$ expansions of Eqs. (24) and (34) break down. We will show that for small $\mu$, the naive expansion in $1/N$ is actually an expansion in $1/(N \mu^k)$. The expansions in Eqs. (24) and (34), therefore, break down when $\mu < 1/N^{1/k}$. In other words, the expansion is valid for large $N$ and $\mu \gg 1/N^{1/k}$. Let there be $k + 1$ positions in sequence space linked by $k$ mutations which occur at equal rate $\mu$ such that $\Delta_{ij} = \delta_{i,j-1}$ for $i < k$, where $\delta_{i,j}$ is the Kronecker delta. The fitness increases in the direction of mutations (all mutations are beneficial) but the fitness increments decrease monotonically. This landscape is commonly encountered when there is a dominant path through a landscape. For example, we encountered this case when applying our theory to long-term experimental studies of bacterial evolution [16]. Fig. B.1 shows a graphical representation of this landscape. We assume that the mutation rate is very small, $\mu \ll r$, and that there is no back mutation. Initially, the entire population is in the starting state, $N_i(t = 0) = N \delta_{i,0}$. For this simple landscape Eq. 11 can be solved explicitly for the infinite population occupation numbers. In the limit as $\mu \to 0$, we have

$$
p = \sum_{i=0}^{k} \sum_{b=0}^{i} \mu^i\, \gamma_b^i\, e^{r_b t} \tag{B.1}
$$
$$
p_i = p^{-1} \sum_{b=0}^{i} \mu^i\, \gamma_b^i\, e^{r_b t}, \tag{B.2}
$$

where

$$
\gamma_b^i = \begin{cases} \dfrac{(-1)^{i-b}}{\prod_{j=b+1}^{i} (r_j - r_b)\, \prod_{j=0}^{b-1} (r_b - r_j)} & b \leq i \\[2ex] 0 & b > i. \end{cases} \tag{B.3}
$$

Substitution into Eq. (11) confirms these solutions in the $\mu \to 0$ limit.

Figure B.1.

A simple landscape in which mutations occur at rate μ, without back mutation, the replication rate at position i is ri, and Ni is the occupation number at position i.

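Let $C_{ij} \equiv \lim_{N \to \infty} \left( \langle N_i N_j \rangle - \langle N_i \rangle \langle N_j \rangle \right) / N$ denote the infinite-population covariance matrix. From section 5 we know that

$$
C_{ij}(t) = \delta_{ij}\, p_i(t) + \frac{1}{N} \Pi_{0ij}^{zz}(t,t). \tag{B.4}
$$

In the limit of infinite $N$, the correlation matrix $C$ converges to a value independent of $N$. We can show that

$$
\frac{dC(t)}{dt} = B(t) + A(t)\, C(t) + C(t)\, A^T(t), \tag{B.5}
$$

with

$$
C_{ij}(0) = 0 \tag{B.6}
$$

and

$$
B_{ij}(t) = -\left( \mu \Delta_{ij}\, p_i(t) + \mu \Delta_{ji}\, p_j(t) + (r_i + r_j)\, p_i(t)\, p_j(t) \right) + \delta_{ij} \Big( \mu \sum_a \Delta_{ai}\, p_a(t) + \mu \sum_a \Delta_{ia}\, p_i(t) + r_i\, p_i(t) + \langle r \rangle\, p_i(t) \Big). \tag{B.7}
$$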
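The covariance equation can be integrated numerically alongside Eq. 11. The sketch below does this with a fixed-step fourth-order Runge-Kutta scheme and the standard library only. It is an illustration under stated assumptions, not the authors' code: the matrix A is taken from Eq. 17 rewritten in terms of $p_i = z_i^c / N$, with its mutation term taken as $\mu \Delta_{ji}$ so that A agrees with the linearization of Eq. 11 (a choice that only matters when $\Delta$ is not symmetric), and the function names are hypothetical.

```python
def rhs(state, rates, delta, mu):
    """Time derivatives of p (Eq. 11) and of C (Eq. B.5), C stored row by row."""
    n = len(rates)
    p = state[:n]
    c = [state[n + i * n: n + (i + 1) * n] for i in range(n)]
    rbar = sum(r * x for r, x in zip(rates, p))
    out_rate = [sum(delta[i]) for i in range(n)]                      # sum_a Delta_ia
    inflow = [sum(delta[a][i] * p[a] for a in range(n)) for i in range(n)]
    dp = [mu * (inflow[i] - out_rate[i] * p[i]) + (rates[i] - rbar) * p[i]
          for i in range(n)]
    # A of Eq. 17 with z^c = N p (assumed mutation term mu * Delta_ji).
    a = [[mu * delta[j][i] + (rates[i] - rates[j]) * p[i] for j in range(n)]
         for i in range(n)]
    # B of Eq. B.7.
    b = [[-(mu * delta[i][j] * p[i] + mu * delta[j][i] * p[j]
            + (rates[i] + rates[j]) * p[i] * p[j]) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        a[i][i] += rates[i] - rbar - mu * out_rate[i]
        b[i][i] += (mu * inflow[i] + mu * out_rate[i] * p[i]
                    + (rates[i] + rbar) * p[i])
    dc = [b[i][j]
          + sum(a[i][m] * c[m][j] for m in range(n))
          + sum(c[i][m] * a[j][m] for m in range(n))
          for i in range(n) for j in range(n)]
    return dp + dc


def integrate_covariance(p0, rates, delta, mu, t_end, steps=2000):
    """Return (p(t_end), C(t_end)) starting from C_ij(0) = 0 (Eq. B.6)."""
    n = len(p0)
    state = list(p0) + [0.0] * (n * n)
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(state, rates, delta, mu)
        k2 = rhs([s + 0.5 * h * k for s, k in zip(state, k1)], rates, delta, mu)
        k3 = rhs([s + 0.5 * h * k for s, k in zip(state, k2)], rates, delta, mu)
        k4 = rhs([s + h * k for s, k in zip(state, k3)], rates, delta, mu)
        state = [s + h * (w + 2 * x + 2 * y + z) / 6.0
                 for s, w, x, y, z in zip(state, k1, k2, k3, k4)]
    p = state[:n]
    c = [state[n + i * n: n + (i + 1) * n] for i in range(n)]
    return p, c
```

A useful consistency check is that every row of $C$ sums to zero at all times, because the total population size does not fluctuate.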

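To compute $B$, one is allowed to use the infinite $N$ values for $p_i(t)$ because finite $N$ corrections to $p_i(t)$ lead to higher order terms in the expansion Eq. (34). Let $t_0 = 0$, $t_a \equiv \ln(\Delta r_a / \mu) / \Delta r_a$, $0 < a \leq k$. We examine Eq. (B.2). We consider $t > t_a$. Expression B.2 for $p_a$ will be dominated by the last term in the series, since the ratio of the magnitude of the last term to the second to last term is $\exp(\Delta r_a t) \prod_{j=0}^{a-2} \frac{r_{a-1} - r_j}{r_a - r_j} = \left( \frac{\Delta r_a}{\mu} \right) \exp[\Delta r_a (t - t_a)] \prod_{j=0}^{a-2} \frac{r_{a-1} - r_j}{r_a - r_j}$, and this is large for small $\mu$ and $t > t_a$. Furthermore, the ratio of $p_a$ to $p_{a-1}$ is $\left( \frac{\mu}{\Delta r_a} \right) \exp(\Delta r_a t) \prod_{j=0}^{a-2} \frac{r_{a-1} - r_j}{r_a - r_j} = \exp[\Delta r_a (t - t_a)] \prod_{j=0}^{a-2} \frac{r_{a-1} - r_j}{r_a - r_j}$, which is also large for $t > t_a$. The time interval from $t_a$ to $t_{a+1}$ gets larger as $\mu$ gets smaller, so that the time period during which $p_{a-1}$ and $p_a$ are of similar magnitude, $t \sim t_a$, becomes less and less significant. Figure B.2 shows this result numerically. Finally, the ratio of $p_{a+1}$ to $p_a$ is $\exp[\Delta r_{a+1} (t - t_{a+1})] \prod_{j=0}^{a-1} \frac{r_a - r_j}{r_{a+1} - r_j}$, which is small for $t < t_{a+1}$. Thus, for small $\mu$, in the time interval $t_a$ to $t_{a+1}$, most of the population is in state $a$. That is,

$$
p_a(t) \gg p_{a'}(t) \quad \forall\, a' \neq a, \qquad t_a < t < t_{a+1}, \quad \mu \to 0. \tag{B.8}
$$

Using this result and keeping the lowest order in $\mu$ in Eq. 17, we find

$$
A_{ij}(t) \sim (r_j - r_a) \left( \delta_{i,j} - \delta_{i,a} \right), \qquad t_a < t < t_{a+1}, \quad \mu \to 0 \tag{B.9}
$$

such that

$$
\frac{dC_{ij}(t)}{dt} \sim B_{ij}(t) + (r_i + r_j - 2 r_a)\, C_{ij}(t) - \sum_n (r_n - r_a) \left( \delta_{j,a}\, C_{i,n} + \delta_{i,a}\, C_{j,n} \right). \tag{B.10}
$$

For this landscape, Eq. B.7 reduces to

$$
B_{ij}(t) = -\left[ \mu \delta_{i,j-1}\, p_i(t) + \mu \delta_{i,j+1}\, p_j(t) + (r_i + r_j)\, p_i(t)\, p_j(t) \right] + \delta_{ij} \left( \mu\, p_{i-1}(t) + \mu (1 - \delta_{i,k})\, p_i(t) + r_i\, p_i(t) + \langle r \rangle\, p_i(t) \right) \tag{B.11}
$$

and, in particular,

$$
B_{kk}(t) = -2 r_k \left( p_k(t) \right)^2 + \mu\, p_{k-1}(t) + r_k\, p_k(t) + \langle r \rangle\, p_k(t). \tag{B.12}
$$

Substituting Eqs. B.1 and B.2 into this expression and keeping only the lowest power of $\mu$, we obtain, for $t < t_1$,

$$
B_{kk}(t < t_1) \sim r_k\, \mu^k \sum_{a=0}^{k} \gamma_a^k\, e^{r_a t}, \qquad \mu \to 0 \tag{B.13}
$$

and thus

$$
\frac{dC_{kk}(t)}{dt} \sim r_k\, \mu^k \sum_{a=0}^{k} \gamma_a^k\, e^{r_a t} + 2 r_k\, C_{kk}(t), \qquad t < t_1, \quad \mu \to 0. \tag{B.14}
$$

Integrating and only keeping terms to lowest order in $\mu$ yields

$$
C_{kk}(t_1) \sim \mu^k \left( \frac{\Delta r_1}{\mu} \right)^{2 r_k / \Delta r_1} \sum_{a=0}^{k} \frac{\gamma_a^k}{2 - r_a / r_k}, \qquad \mu \to 0. \tag{B.15}
$$

For later time periods, the evolution of $C_{kk}(t_1 < t < t_k)$ is dominated by the second term in Eq. B.10 as $\mu \to 0$:

$$
\frac{dC_{kk}(t)}{dt} \sim 2 (r_k - r_a)\, C_{kk}(t), \qquad t_a < t < t_{a+1}, \quad 0 < a < k, \quad \mu \to 0 \tag{B.16}
$$

with solution

$$
C_{kk}(t) \sim \mu^k\, e^{2 (r_k - r_a) t} \prod_{j=1}^{a} \left( \frac{\Delta r_j}{\mu} \right)^2 \sum_{a'=0}^{k} \frac{\gamma_{a'}^k}{2 - r_{a'} / r_k}, \qquad t_a < t < t_{a+1}, \quad 0 < a < k, \quad \mu \to 0. \tag{B.17}
$$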

Fig. B.3 shows the convergence of this approximation to Eq. 34 as μ → 0 for one set of replication rates.

Figure B.2.

Infinite population occupation numbers versus time for $k = 4$, $r_0 = 0$, $r_1 = 1.00$, $r_2 = 1.45$, $r_3 = 1.65$, $r_4 = 1.74$, and three different values for $\mu$: (a) $10^{-5}$, (b) $10^{-8}$, and (c) $10^{-11}$. The occupation numbers, $p_0$ (solid), $p_1$ (dotted), $p_2$ (dash-dotted), $p_3$ (dashed), $p_4$ (solid with circles), are calculated using Eq. 12. Note that as $\mu$ becomes smaller, $p_a$ becomes more and more dominant during the time interval $t_a < t < t_{a+1}$.

Figure B.3.

Infinite population variance of the final state vs. time for $k = 4$, $r_0 = 0$, $r_1 = 1.00$, $r_2 = 1.45$, $r_3 = 1.65$, $r_4 = 1.74$, and three different values for $\mu$: (a) $10^{-3}$, (b) $10^{-5}$, and (c) $10^{-8}$. Exact values calculated using Eq. 34 (solid lines) and the approximation given in Eq. B.17 (dashed lines) are both shown. Note that $C_{kk}(t_k) \propto \mu^{-k}$.

Using Eq. B.17, we find that as $\mu \to 0$

$$
C_{kk}(t_k) \sim \mu^k\, e^{2 \Delta r_k t_k} \sum_{a=0}^{k} \frac{\gamma_a^k}{2 - r_a / r_k} \prod_{j=1}^{k-1} \left( \frac{\Delta r_j}{\mu} \right)^2 = \mu^{-k} \sum_{a=0}^{k} \frac{\gamma_a^k}{2 - r_a / r_k} \prod_{j=1}^{k} (\Delta r_j)^2. \tag{B.18}
$$

The maximum of $C_{kk}(t)$ occurs near $t_k$. This result follows from Eq. (B.10). The first term on the right-hand side of Eq. (B.10) only matters during $0 < t < t_1$. After that, $B_{kk}$ has a larger power of $\mu$ than $C_{kk}$ does. The second term on the right-hand side is zero for $t > t_k$. Thus, for $t > t_k$, only the third term on the right-hand side matters, and it is negative. Thus, for $t > t_k$, $C_{kk}(t)$ decreases. It is for this reason that the dashed curves in Fig. B.3 are shown for $0 < t < t_k$ only.

C Fluctuations in the Expanded State Space at Large Times

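Consider the expanded state space of a landscape as shown in Fig. 1 generalized to an arbitrary number of loci. For any finite population size $N$, the only sinks are the final states in which all mutations have occurred in some order, all of which have the same replication rate. Thus, after a certain time $t_f$, the occupation numbers at positions prior to the final states can be neglected so that the dynamics can be described by Eq. 1 with a single replication rate $r$ and without mutation,

$$
\frac{\partial}{\partial t} P(\mathbf{N};t) = \frac{r}{N} \sum_{i,\, j \neq i} \left[ (N_i - 1)(N_j + 1)\, P(\mathbf{N} - \mathbf{e}_i + \mathbf{e}_j;t) - N_i N_j\, P(\mathbf{N};t) \right].
$$

From this we obtain that the average occupation numbers remain constant

$$
\langle N_a(t) \rangle = \mathrm{const} = \langle N_a(t_f) \rangle \qquad \forall\, t \geq t_f \tag{C.1}
$$

and that the covariances are

$$
\Sigma_{ab}(t) \equiv \langle N_a(t) N_b(t) \rangle - \langle N_a(t) \rangle \langle N_b(t) \rangle = \left( 1 - e^{-2 r (t - t_f)/N} \right) \langle N_a \rangle \left( \delta_{ab} N - \langle N_b \rangle \right) + e^{-2 r (t - t_f)/N}\, \Sigma_{ab}(t_f) \qquad \forall\, t \geq t_f. \tag{C.2}
$$

Expanding this to leading order in $N$ yields

$$
\Sigma_{ab}(t) \sim 2 r (t - t_f) \left( \delta_{ab} \langle N_a \rangle - \frac{1}{N} \langle N_a \rangle \langle N_b \rangle \right) + \Sigma_{ab}(t_f) \qquad \forall\, t \geq t_f. \tag{C.3}
$$

Note that the expansion in $N$ converges only for finite times.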
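The breakdown at long times can be seen by evaluating the exact covariance and its leading-order form side by side. The helper functions below are a hypothetical illustration: `same` plays the role of $\delta_{ab}$, `elapsed` is $t - t_f$, and the means are the constant values of Eq. C.1.

```python
import math


def covariance_exact(mean_a, mean_b, same, n, r, elapsed, sigma_tf=0.0):
    """Exact neutral-drift covariance (Eq. C.2) a time `elapsed` after t_f."""
    decay = math.exp(-2.0 * r * elapsed / n)
    fixed = mean_a * ((n if same else 0.0) - mean_b)   # long-time (fixation) value
    return (1.0 - decay) * fixed + decay * sigma_tf


def covariance_leading_order(mean_a, mean_b, same, n, r, elapsed, sigma_tf=0.0):
    """Leading-order form in N (Eq. C.3), which grows linearly in time."""
    return (2.0 * r * elapsed * ((mean_a if same else 0.0) - mean_a * mean_b / n)
            + sigma_tf)
```

The two expressions agree as long as $r (t - t_f) \ll N$. At later times the exact covariance saturates at its fixation value $\langle N_a \rangle (\delta_{ab} N - \langle N_b \rangle)$ while the leading-order form keeps growing, which is the long-time divergence referred to in the discussion.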

References

  • [1] Muller HJ. The relation of recombination to mutational advance. Mutat. Res. 1964;106:2–9. doi: 10.1016/0027-5107(64)90047-8.
  • [2] Hill WG, Robertson A. The effect of linkage on limits to artificial selection. Genetical Research. 1966;8:265–294.
  • [3] Desai Michael M, Fisher Daniel S. Beneficial mutation selection balance and the effect of linkage on positive selection. Genetics. 2007 Jul;176(3):1759–98. doi: 10.1534/genetics.106.067678.
  • [4] Sella G, Hirsh AE. The application of statistical physics to evolutionary biology. Proc. Natl. Acad. Sci. 2005;102:9541–9546. doi: 10.1073/pnas.0501865102.
  • [5] Berg J, Willmann S, Lässig M. Adaptive evolution of transcription factor binding sites. BMC Evol. Biol. 2004;4:42. doi: 10.1186/1471-2148-4-42.
  • [6] Gerrish Philip, Lenski Richard. The fate of competing beneficial mutations in an asexual population. 1998.
  • [7] Alves D, Fontanari J. Error threshold in finite populations. Physical Review E. 1998 Jun;57(6):7008–7013.
  • [8] Moran PAP. The Statistical Processes of Evolutionary Theory. Clarendon Press; 1962.
  • [9] Park Jeong-Man, Muñoz Enrique, Deem Michael W. Quasispecies theory for finite populations. Physical Review E. 2010;81(1):011902. doi: 10.1103/PhysRevE.81.011902.
  • [10] Peliti L. Path integral approach to birth-death processes on a lattice. Journal de Physique. 1985;46(9):1469–1483.
  • [11] Thompson Colin J, McBride John L. On Eigen’s theory of the self-organization of matter and the evolution of biological macromolecules. Mathematical Biosciences. 1974 Oct;21(1-2):127–142.
  • [12] Jain Kavita, Krug Joachim, Park Su-Chan. Evolutionary advantage of small populations on complex fitness landscapes. Evolution; international journal of organic evolution. 2011 Jul;65(7):1945–55. doi: 10.1111/j.1558-5646.2011.01280.x.
  • [13] Cowperthwaite Matthew C., Meyers Lauren Ancel. How Mutational Networks Shape Evolution: Lessons from RNA Models. Annual Review of Ecology, Evolution, and Systematics. 2007 Dec;38(1):203–230.
  • [14] Bortz AB, Kalos M, Lebowitz J. A new algorithm for Monte Carlo simulation of Ising spin systems. Journal of Computational Physics. 1975 Jan;17(1):10–18.
  • [15] Gillespie D. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. Journal of Computational Physics. 1976 Dec;22(4):403–434.
  • [16] Paixão T, Lorenz DM, Songhurst J, Deem MW, Azencott R, Cooper TF, Azevedo RBR. Clonal interference can lead to evolutionary farsightedness. submitted.
