Author manuscript; available in PMC: 2015 Jun 24.
Published in final edited form as: Phys Rev E Stat Nonlin Soft Matter Phys. 2010 Jan 6;81(1 0 1):011902. doi: 10.1103/PhysRevE.81.011902

Quasispecies theory for finite populations

Jeong-Man Park 1,2, Enrique Muñoz 1, Michael W Deem 1
PMCID: PMC4479305  NIHMSID: NIHMS700260  PMID: 20365394

Abstract

We present stochastic, finite-population formulations of the Crow-Kimura and Eigen models of quasispecies theory, for fitness functions that depend in an arbitrary way on the number of mutations from the wild type. We include back mutations in our description. We show that the fluctuations of the population numbers about their average values are exceedingly large in these physical models of evolution. We further show that horizontal gene transfer reduces by orders of magnitude the fluctuations in the population numbers and reduces the accumulation of deleterious mutations in the finite population due to Muller’s ratchet. Indeed, for smooth fitness functions in the absence of horizontal gene transfer, the population sizes needed to converge to the infinite population limit are often larger than those found in nature. These analytical results are derived for the steady state by means of a field-theoretic representation. Numerical results are presented that indicate horizontal gene transfer speeds up the dynamics of evolution as well.

INTRODUCTION

Biological populations in nature are finite. In particular, it is clear that the number of individuals in a population is much smaller than the number of possible genetic sequences, even for genomes of modest length. For example, the largest populations observed in biological systems, RNA viruses, are on the order of $N = 10^{12}$ viral particles within a single infected organism [1]. These viruses possess a relatively short genome of length $L \sim 10^3 - 10^4$ bases [1], and hence the theoretical size of the sequence space is $4^L \sim 10^{6000} \gg N$. Even the region of phase space for which fitness is high is typically much larger than the biological population size. From this example, it is clear that no real biological population will be able to sample the entire sequence space during evolutionary dynamics [2], and therefore finite population size effects may be important for a realistic description of evolution [3]. Finite populations with asexual reproduction are subject to the “Muller’s ratchet” effect [4], which is the tendency to accumulate deleterious mutations in finite populations [4–6]. It has been suggested that horizontal gene transfer and recombination may provide a way to escape Muller’s ratchet in small populations [7–10], and this mechanism has been proposed as one of the evolutionary advantages of sex, despite the additional mutational load for fitness functions with positive epistasis [4–6, 9, 11–14]. The role of the finite population size in the Muller’s ratchet effect has been previously studied by the traveling-wave approximation [15, 16]. This theoretical approach introduces an approximate treatment, by assuming deterministic dynamics for the bulk of the population, but stochastic dynamics for the edge composed of the class of highest fitness genotypes. The deterministic component of this theory, which considers single point mutations coupled to replication, is similar to traditional quasispecies models for infinite populations.
These previous studies considered only linear fitness functions and analyzed in detail the case of no back mutations, an approximation which changes the dynamics and leads to a different steady-state distribution. We here include back mutations and consider fitness functions that depend in an arbitrary way on the number of mutations from the wild type in our exact description.

Quasispecies models for molecular evolution, represented by the Crow-Kimura model [17] and the Eigen model [18–21], are traditionally formulated in the language of chemical kinetics. That is, they describe the basic processes of mutation and selection in an infinite population of self-replicating, information-encoding molecules such as RNA or DNA, which are assumed to be drawn from a binary alphabet (e.g. purines/pyrimidines). These models exhibit a phase transition in the infinite genome limit [18–26], separating an organized or quasispecies phase from a disordered phase. This phase transition occurs when the mutation rate exceeds a critical value, which depends on the nature of the fitness function [25, 27]. The phase transition is usually of first order for binary alphabets [25, 27], but it is of higher order for smooth fitness functions in larger alphabets [28]. The quasispecies is composed of a collection of nearly neutral mutants, that is, a cloud of closely related individuals sharing similar fitness values, rather than of a single sequence type. Despite its abstract character, the quasispecies model has been successfully applied to interpret experimental studies in RNA viruses [29–32].

FINITE POPULATION EFFECTS IN THE CROW-KIMURA MODEL

In the infinite population limit, the mean field approach that is customary in chemical kinetics is justified, and the evolution of the probability distribution of sequence types can be described by a deterministic system of differential equations. This mean field approach cannot capture the fluctuations in the numbers of individuals with different sequences, which are a consequence of the stochastic dynamics of the process. An accurate description of all aspects of a finite population therefore requires a master equation formulation [3]. We here consider arbitrary fitness functions. The special case of fitness functions that are linear in ξ has been analyzed in [15, 16, 33].

We consider a finite population, composed of N < ∞ binary purine/pyrimidine sequences of length L. The terms in the master equation for the Crow-Kimura, or parallel, model are i) a replication term, whereby each individual of sequence $S_i$ reproduces at a rate $Lf(S_i)$ and the offspring replaces a random member of the population, ii) a mutation term, whereby each base in a sequence mutates at a rate μ per unit time, and iii) a horizontal gene transfer term, whereby bases in a sequence are replaced at rate ν per unit time with bases randomly chosen from the population. We assume that the replication rate, or microscopic fitness, is a function of the Hamming distance from the wild-type genome, and hence of the one-dimensional coordinate 0 ≤ ξ ≤ L representing the alignment of an individual’s sequence with the wild type. The master equation can be exactly projected onto the ξ coordinate and defines the rates at which the sequences of individuals change with time due to replication, mutation, and horizontal gene transfer. We define (1 + u)/2 to be the probability of a wild-type letter in the sequence, so that ρ± = (1 ± u)/2 is the probability of inserting a wild-type or non-wild-type letter by horizontal gene transfer [27, 34], and

$$
u = \frac{1}{N}\sum_{\xi=0}^{L}\left(\frac{2\xi}{L} - 1\right) n_\xi
\tag{1}
$$

is the ‘average base composition,’ where nξ is the number of individuals at coordinate ξ.
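As an illustration of Eq. (1), the following minimal sketch computes u and the insertion probabilities ρ± from a vector of occupation numbers. The function name `base_composition` and the array layout (index ξ = 0, …, L) are assumptions made here for illustration and are not taken from the original work.

```python
import numpy as np

def base_composition(n):
    """Return (u, rho_plus, rho_minus) for occupation numbers n[xi], xi = 0..L.

    Hypothetical helper: n[xi] is the number of individuals whose sequence
    matches the wild type at xi of its L sites.
    """
    n = np.asarray(n, dtype=float)
    L = n.size - 1
    xi = np.arange(L + 1)
    # Eq. (1): population average of (2*xi/L - 1)
    u = np.dot(2.0 * xi / L - 1.0, n) / n.sum()
    # rho_pm = (1 +/- u)/2: probability of inserting a wild-type or
    # non-wild-type letter by horizontal gene transfer
    return u, (1.0 + u) / 2.0, (1.0 - u) / 2.0
```

For a population entirely at the wild type (all individuals at ξ = L) this gives u = 1, so that ρ+ = 1 and ρ− = 0.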

We formulate the master equation for the probability distribution $P(\{n_\xi\}; t)$, as a function of the set of occupation numbers $\{n_\xi\}_{0 \le \xi \le L}$. As in the classical, infinite population Crow-Kimura model [17], we consider point mutation with rate μ, and replication with a rate r(ξ) = Lf(ξ), while preserving the population size N. In addition, we consider horizontal gene transfer of single letters between an individual sequence and the population, with rate ν.

The master equation describing this process is

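$$
\begin{aligned}
\frac{\partial}{\partial t} P(\{n_\xi\}) ={}& \frac{1}{N}\sum_{\xi \neq \xi'} r(\xi)\Big[(n_\xi - 1)(n_{\xi'} + 1)\,P(n_\xi - 1, n_{\xi'} + 1) - n_\xi n_{\xi'}\,P(\{n_\xi\})\Big] \\
&+ \mu \sum_{\xi=0}^{L}\Big[(L-\xi)(n_\xi+1)\,P(n_\xi+1, n_{\xi+1}-1) + \xi\,(n_\xi+1)\,P(n_{\xi-1}-1, n_\xi+1) - L\,n_\xi\,P(\{n_\xi\})\Big] \\
&+ \nu \sum_{\xi=0}^{L}\Big[\rho_+(L-\xi)(n_\xi+1)\,P(n_\xi+1, n_{\xi+1}-1) + \xi\rho_-(n_\xi+1)\,P(n_{\xi-1}-1, n_\xi+1) \\
&\qquad\qquad - n_\xi\{\rho_+(L-\xi)+\rho_-\xi\}\,P(\{n_\xi\})\Big]
\end{aligned}
\tag{2}
$$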

Note that this exact master equation includes the ‘back mutations’ that are often ignored in the literature [15, 16]. Setting back mutations to zero changes both the dynamics and the steady state.

Mapping to a field theory

We seek analytical expressions for the fluctuations in number of individuals with given sequence compositions in the finite population parallel model. We derive these results by means of a field-theoretic method [25, 35, 36]. This approach provides a system of coupled differential equations for the probability distribution and the fluctuation of numbers of individuals with given sequence composition, whose computational solution is essentially instantaneous. These results give us the fluctuation and correlation in population numbers and are an exact expansion in the inverse of the population size. We introduce an exact representation of the classical master equation in terms of a many-body quantum theory [25]. For that purpose, we define the population state vector

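$$
|\Psi(t)\rangle = \sum_{\{n_\xi\}} P(\{n_\xi\}; t)\,|\{n_\xi\}\rangle
\tag{3}
$$

with

$$
|\{n_\xi\}\rangle = |n_0, n_1, \ldots, n_L\rangle = \bigotimes_{\xi=0}^{L} |n_\xi\rangle
\tag{4}
$$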

This population state vector evolves according to a Schrödinger equation in imaginary time,

$$
\frac{d}{dt}|\Psi(t)\rangle = -\hat{H}\,|\Psi(t)\rangle
\tag{5}
$$

which possesses the formal solution

$$
|\Psi(t)\rangle = e^{-\hat{H}t}\,|\Psi(0)\rangle
\tag{6}
$$

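with $|\Psi(0)\rangle = |\{n_\xi^0\}\rangle$ representing the initial configuration of the population. The master equation is written in second quantized form, with a Hamiltonian expressed in terms of boson creation and destruction operators satisfying $[\hat a_\xi, \hat a^\dagger_{\xi'}] = \delta_{\xi,\xi'}$, whose action on the occupation number vectors is defined by $\hat a_\xi|n_\xi\rangle = n_\xi|n_\xi - 1\rangle$ and $\hat a^\dagger_\xi|n_\xi\rangle = |n_\xi + 1\rangle$. The Hamiltonian is given by

$$
\begin{aligned}
-\hat{H} ={}& \frac{1}{N}\sum_{\xi,\xi'=0}^{L} L f(\xi)\,\hat a^\dagger_\xi\big(\hat a^\dagger_\xi - \hat a^\dagger_{\xi'}\big)\hat a_{\xi'}\hat a_\xi \\
&+ \mu\sum_{\xi=0}^{L}\Big[(L-\xi)\big(\hat a^\dagger_{\xi+1} - \hat a^\dagger_\xi\big)\hat a_\xi + \xi\big(\hat a^\dagger_{\xi-1} - \hat a^\dagger_\xi\big)\hat a_\xi\Big] \\
&+ \nu\sum_{\xi=0}^{L}\Big[\rho_+(L-\xi)\big(\hat a^\dagger_{\xi+1} - \hat a^\dagger_\xi\big)\hat a_\xi + \rho_-\xi\big(\hat a^\dagger_{\xi-1} - \hat a^\dagger_\xi\big)\hat a_\xi\Big]
\end{aligned}
\tag{7}
$$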

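The terms proportional to f represent replication, those proportional to μ represent mutation, and those proportional to ν represent horizontal gene transfer. The population average of a (normal-ordered) classical observable, represented by the operator $F(\{\hat a_\xi\})$, is obtained by the inner product with the “sum” [35] bra $\langle\cdot| = \langle 0|\left(\prod_{\xi=0}^{L} e^{\hat a_\xi}\right)$,

$$
\langle F\rangle = \langle\cdot|\,F(\{\hat a_\xi\})\,|\Psi(t)\rangle = \langle\cdot|\,F(\{\hat a_\xi\})\,e^{-\hat{H}t}\,|\{n_\xi^0\}\rangle
\tag{8}
$$

A Trotter factorization is introduced for the evolution operator $e^{-\hat{H}t}$ in a basis of coherent states, defined by $\hat a_\xi|z_\xi\rangle = z_\xi|z_\xi\rangle$. This procedure leads to a path integral representation [25, 27, 28],

$$
\langle F\rangle = \int [\mathcal{D}z^{*}\,\mathcal{D}z]\;F(\{z_\xi(t/\epsilon)\})\;e^{-S[\{z^{*}\},\{z\}]}.
\tag{9}
$$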

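Here $z$ is the coherent state field of the second quantized theory of the parallel model, and $S$ is the corresponding action. After the change of variables $z^{*} = 1 + \bar z$, the action in the exponent of Eq. (9) is given in continuous time by

$$
\begin{aligned}
S[\{\bar z\},\{z\}] = \sum_{\xi=0}^{L}\int_0^T dt\,\Big\{ & \big[\bar z_\xi(t) z_\xi(t) - n_\xi^0 \ln[1 + \bar z_\xi(t)]\big]\delta(t) + \bar z_\xi\frac{\partial z_\xi}{\partial t} \\
& - \mu\big[(L-\xi)\bar z_{\xi+1} + \xi\bar z_{\xi-1} - L\bar z_\xi\big] z_\xi \\
& - \nu\big[(L-\xi)\rho_+\bar z_{\xi+1} + \xi\rho_-\bar z_{\xi-1} - \{(L-\xi)\rho_+ + \xi\rho_-\}\bar z_\xi\big] z_\xi \\
& - \frac{1}{N}\sum_{\xi'=0}^{L} L f(\xi)\,(1 + \bar z_\xi)\big[\bar z_\xi - \bar z_{\xi'}\big] z_{\xi'} z_\xi \Big\}
\end{aligned}
\tag{10}
$$

In the limit of a large population, we look for a saddle point of the action Eq. (10). From the condition $\left.\frac{\delta S}{\delta z_\xi(t)}\right|_c = 0$, we obtain $\bar z_\xi^c(t) = 0$. From the condition $\left.\frac{\delta S}{\delta \bar z_\xi(t)}\right|_c = 0$, we find the saddle-point solution $z_\xi^c(t) = N P_\xi(t)$, where $P_\xi$ satisfies the differential equation for infinite population quasispecies theory, generalized to include horizontal gene transfer [27, 34]:

$$
\begin{aligned}
\frac{d}{dt}P_\xi ={}& \mu\big[(L-\xi+1)P_{\xi-1} + (\xi+1)P_{\xi+1} - L P_\xi\big] \\
&+ \nu\big[\rho_+(L-\xi+1)P_{\xi-1} + \rho_-(\xi+1)P_{\xi+1} - \{(L-\xi)\rho_+ + \xi\rho_-\}P_\xi\big] \\
&+ \Big[r(\xi) - \sum_{\xi'=0}^{L} r(\xi')P_{\xi'}\Big]P_\xi
\end{aligned}
\tag{11}
$$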

Details are given in Appendix 1.
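The mean-field equation, Eq. (11), is straightforward to integrate numerically. The sketch below is one possible implementation, not the authors' code: the function names `kimura_rhs` and `integrate_mean_field`, the use of NumPy, and the explicit Euler scheme are all assumptions made here for illustration. The composition u entering ρ± is evaluated from the current distribution as $u = \sum_\xi (2\xi/L - 1)P_\xi$.

```python
import numpy as np

def kimura_rhs(P, f, mu, nu):
    """Right-hand side of Eq. (11) for class probabilities P[xi], xi = 0..L."""
    L = P.size - 1
    xi = np.arange(L + 1)
    r = L * f                                    # replication rate r(xi) = L f(xi)
    u = np.dot(2.0 * xi / L - 1.0, P)            # average base composition
    rho_p, rho_m = (1.0 + u) / 2.0, (1.0 - u) / 2.0
    up = np.zeros_like(P)
    up[1:] = (L - xi[1:] + 1) * P[:-1]           # (L - xi + 1) P_{xi-1}
    down = np.zeros_like(P)
    down[:-1] = (xi[:-1] + 1) * P[1:]            # (xi + 1) P_{xi+1}
    mutation = mu * (up + down - L * P)
    transfer = nu * (rho_p * up + rho_m * down
                     - ((L - xi) * rho_p + xi * rho_m) * P)
    selection = (r - np.dot(r, P)) * P
    return mutation + transfer + selection

def integrate_mean_field(P0, f, mu, nu, dt, steps):
    """Explicit Euler integration of Eq. (11); dt must resolve rates of order L."""
    P = np.array(P0, dtype=float)
    for _ in range(steps):
        P = P + dt * kimura_rhs(P, f, mu, nu)
    return P
```

Because the largest rates in Eq. (11) scale with L, the time step in an explicit scheme should be chosen well below 1/(L max(μ + ν, f)); a stiff solver would be a reasonable alternative for long genomes.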

Fluctuations

To calculate fluctuations, we expand the action up to second order, to obtain the correlation matrix ⟨δzξ(t)δzξ′ (t)⟩ = Cξ,ξ′ (t), which in continuous time evolves according to the Lyapunov equation

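$$
\frac{d}{dt}C = AC + CA^{T} + B
\tag{12}
$$

subject to the initial condition $C_{\xi,\xi'}(0) = -n_\xi^0\,\delta_{\xi,\xi'}$. Here, the matrices A and B are defined by

$$
\begin{aligned}
[A]_{\xi,\xi'} ={}& \delta_{\xi-1,\xi'}(L-\xi+1)\big[\mu + \nu\rho_+\big] \\
&+ \delta_{\xi,\xi'}\Big[L f(\xi) - \sum_{\xi_1} L f(\xi_1)P_{\xi_1} - \nu\{(L-\xi)\rho_+ + \xi\rho_-\} - L\mu\Big] \\
&+ L\big[f(\xi) - f(\xi')\big]P_\xi + \delta_{\xi+1,\xi'}(\xi+1)\big[\mu + \nu\rho_-\big]
\end{aligned}
\tag{13}
$$

$$
[B]_{\xi,\xi'} = \delta_{\xi,\xi'}\,2L f(\xi) N P_\xi - L\big[f(\xi) + f(\xi')\big] N P_\xi P_{\xi'}
\tag{14}
$$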

See Appendix 2 for details in the derivation.

The fluctuations in the number of individuals with a given sequence composition are obtained from the relation

$$
\frac{\langle(\delta n_\xi)^2\rangle}{N^2} = \frac{1}{N}\left(P_\xi + \frac{1}{N}C_{\xi,\xi}\right)
\tag{15}
$$
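The route from Eqs. (11) to (15) can be sketched in a few lines. The code below is an illustrative implementation under stated assumptions, not the authors' program: the names `kimura_matrices` and `integrate_fluctuations` are hypothetical, the explicit Euler update is a choice made here, and the initial condition C(0) = −diag(n⁰) is the one for which Eq. (15) gives zero variance for a sharply prepared initial population. The update of C is the discrete recursion of Appendix 2, C(k) = [I + εA]C(k−1)[I + εAᵀ] + εB.

```python
import numpy as np

def kimura_matrices(P, f, mu, nu, N):
    """Return A (Eq. 13), B (Eq. 14) and dP/dt (Eq. 11) at distribution P."""
    L = P.size - 1
    xi = np.arange(L + 1)
    r = L * f
    u = np.dot(2.0 * xi / L - 1.0, P)
    rho_p, rho_m = (1.0 + u) / 2.0, (1.0 - u) / 2.0
    M = np.zeros((L + 1, L + 1))                 # mutation + transfer part
    i = np.arange(1, L + 1)
    M[i, i - 1] = (L - i + 1) * (mu + nu * rho_p)
    i = np.arange(0, L)
    M[i, i + 1] = (i + 1) * (mu + nu * rho_m)
    M[xi, xi] = -nu * ((L - xi) * rho_p + xi * rho_m) - L * mu
    mean_r = np.dot(r, P)
    A = M + np.diag(r - mean_r) + (r[:, None] - r[None, :]) * P[:, None]
    B = np.diag(2.0 * r * N * P) - N * (r[:, None] + r[None, :]) * np.outer(P, P)
    dP = M @ P + (r - mean_r) * P
    return A, B, dP

def integrate_fluctuations(n0, f, mu, nu, dt, steps):
    """Euler co-integration of Eqs. (11) and (12); returns P, C and Eq. (15)."""
    n0 = np.asarray(n0, dtype=float)
    N = n0.sum()
    P = n0 / N
    C = -np.diag(n0)                             # C(0) = -N_0
    I = np.eye(P.size)
    for _ in range(steps):
        A, B, dP = kimura_matrices(P, f, mu, nu, N)
        C = (I + dt * A) @ C @ (I + dt * A).T + dt * B
        P = P + dt * dP
    variance = (P + np.diag(C) / N) / N          # Eq. (15)
    return P, C, variance
```

A useful consistency check on such an implementation is conservation of the population size, which implies that each column of C sums to −N P_ξ at all times.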

Continuous and discontinuous fitness functions

We consider two example fitness functions, which exhibit a quasispecies phase transition in the infinite genome length limit L → ∞. The sharp peak represents the extreme case of the wild-type sequence replicating at a high rate and all other sequences replicating at a single lower rate; it therefore represents a very strong selective advantage for the wild type. For the sharp peak, $f(\xi) = A\delta_{\xi,L}$, we find from Eq. (11) at large L that the wild-type probability obeys

$$
\frac{d}{dt}P_L \approx L A P_L(1 - P_L) - L(\mu + \nu\rho_-)P_L
\tag{16}
$$

At steady state, taking into account that $u = 1 - O(L^{-1})$ for the sharp peak, we have $\rho_- = (1 - u)/2 = O(L^{-1})$, and from Eq. (16) we find

$$
P_{\xi=L} = \begin{cases} 0, & \mu/A > 1 \\ 1 - \mu/A + O(L^{-1}), & \mu/A < 1 \end{cases}
\tag{17}
$$

Notice that the steady-state distribution is not affected by horizontal gene transfer (ν > 0). To obtain the fluctuations in the probability distribution, we consider Eq. (12) for the matrix element $C_{L,L}$. The terms $C_{L,L\pm1}$ are $O(L^{-1})$. We also notice that $\sum_{\xi_1=0}^{L} C_{\xi_1,L} = -N P_L$, to find that the stationary solution of Eq. (12) is given by

$$
\begin{aligned}
0 &= L A N P_L(1 - P_L) - \mu L C_{L,L} - \nu\rho_- L C_{L,L} + L A(1 - P_L)C_{L,L} - L A N P_L^2 - L A P_L C_{L,L} \\
\Rightarrow\quad 0 &= A N P_L(1 - 2P_L) + \big[(A - \mu - \nu\rho_-) - 2 A P_L\big] C_{L,L}
\end{aligned}
\tag{18}
$$

From the steady state of Eq. (16), we have $A - \mu - \nu\rho_- = A P_L$, and substituting into Eq. (18) we obtain

$$
C_{L,L} = N(1 - 2P_L)
\tag{19}
$$

Substitution of this result into Eq. (15) shows that the fluctuation is given by

$$
\frac{\langle(\delta n_{\xi=L})^2\rangle}{N^2} = \begin{cases} 0, & \mu/A > 1 \\ \mu/(NA), & \mu/A < 1 \end{cases}
\tag{20}
$$

a result first given in Ref. [37] by a different method.
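For completeness, the substitution leading to Eq. (20) in the ordered phase, μ/A < 1, can be written out. The following worked step uses only Eqs. (15), (17), and (19) and drops the $O(L^{-1})$ corrections:

```latex
\begin{aligned}
\frac{\langle(\delta n_{\xi=L})^2\rangle}{N^2}
  &= \frac{1}{N}\left(P_L + \frac{1}{N}C_{L,L}\right)
   = \frac{1}{N}\left(P_L + 1 - 2P_L\right) \\
  &= \frac{1}{N}\left(1 - P_L\right)
   = \frac{\mu}{NA}.
\end{aligned}
```

The relative fluctuation of the wild-type class therefore scales as $N^{-1/2}$. As a purely illustrative example (the numbers are not taken from the figures), μ/A = 0.5 and $N = 10^4$ would give a standard deviation of $\sqrt{\mu/(NA)} \approx 7 \times 10^{-3}$ in the wild-type fraction.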

The second fitness function we consider is one for which the replication rate decreases continuously as a function of the Hamming distance from the wild type. In particular, we choose a quadratic fitness $f(\xi) = (k/2)(2\xi/L - 1)^2$. The quadratic fitness represents any continuous fitness function, for which mutants reproduce more slowly than the wild type, in a way that depends continuously on the Hamming distance from the wild type. Figure 1 shows that horizontal gene transfer reduces by orders of magnitude the fluctuations in the number of individuals with a given sequence composition, $n_\xi$. Indeed, a small rate of horizontal gene transfer is enough to reduce these fluctuations by several orders of magnitude, as compared to the case without horizontal gene transfer, ν = 0.

FIG. 1.

Fluctuation in the number of individuals with a given sequence composition. The quadratic fitness is used in the parallel model, with L = 200 and k = 4.0. The theory is obtained from Eqs. (11) and (12). Fluctuations decrease by orders of magnitude with increasing horizontal gene transfer rate, ν.

A fitness function linear in ξ was considered in Refs. [15, 16, 33] in the absence of back mutations. The steady state exhibits no phase transition for the linear fitness. We skip this example in favor of the forms considered above.

Stochastic simulations

We performed Lebowitz/Gillespie simulations [38, 39] in which we explicitly simulate a population of size N undergoing the stochastic processes of mutation, horizontal gene transfer, and replication. In Fig. 2 and Fig. 3, we compare our theory with stochastic simulations, at different rates of horizontal gene transfer. The results obtained from stochastic simulations converge toward the theoretical value calculated from Eqs. (11) and (12) as the size of the population, N, increases. Non-zero horizontal gene transfer rates both reduce fluctuations and accelerate convergence towards the infinite-population value of the mean fitness.
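A minimal sketch of such a simulation, in the representation of Hamming distance classes, is given below. It is an illustration written for this text and not the code used for the figures: the function name `gillespie_kimura`, the use of NumPy's random generator, and the merging of mutation and horizontal gene transfer into combined up and down events (they change the class counts in the same way) are assumptions made here.

```python
import numpy as np

def gillespie_kimura(n0, f, mu, nu, t_max, rng):
    """Gillespie simulation of the parallel model in classes xi = 0..L.

    n0[xi] is the initial number of individuals in class xi; the total
    population size N is conserved by every event.
    """
    n = np.array(n0, dtype=np.int64)
    L = n.size - 1
    N = n.sum()
    xi = np.arange(L + 1)
    r = L * np.asarray(f, dtype=float)           # replication rate r(xi)
    t = 0.0
    while True:
        u = np.dot(2.0 * xi / L - 1.0, n) / N    # Eq. (1)
        rho_p, rho_m = (1.0 + u) / 2.0, (1.0 - u) / 2.0
        rates = np.concatenate([
            r * n,                               # replication of class xi
            (mu + nu * rho_p) * (L - xi) * n,    # xi -> xi + 1
            (mu + nu * rho_m) * xi * n,          # xi -> xi - 1
        ])
        total = rates.sum()
        if total <= 0.0:
            break
        t += rng.exponential(1.0 / total)        # waiting time to next event
        if t > t_max:
            break
        event = rng.choice(rates.size, p=rates / total)
        kind, cls = divmod(int(event), L + 1)
        if kind == 0:
            # offspring of class cls replaces a uniformly chosen individual
            victim = rng.choice(L + 1, p=n / N)
            n[victim] -= 1
            n[cls] += 1
        elif kind == 1:
            n[cls] -= 1
            n[cls + 1] += 1
        else:
            n[cls] -= 1
            n[cls - 1] += 1
    return n
```

Averages and variances of $n_\xi$ over many independent runs (with different seeds) can then be compared with the predictions of Eqs. (11), (12), and (15).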

FIG. 2.

Stochastic results obtained by averaging over 50 independent Gillespie simulations, are shown and compared with analytical theory, for ν = 7.0.

FIG. 3.

The average composition as a function of time, averaged over 50 independent Gillespie simulations, with population size $N = 10^4$ (solid curves). Also shown are one standard deviation envelopes ±σ(t) (dotted curves). The steady-state averages $\langle u\rangle \pm \sqrt{\langle(\delta u)^2\rangle}$ are displayed as solid lines for reference.

In Fig. 4, the steady-state probability distribution obtained from the numerical solution of Eq. (11) is compared with the distributions obtained from stochastic simulations, for different sizes, N, of the population. The convergence with N toward the infinite-population limit is more rapid for non-zero ν. Indeed for smooth fitness functions, the infinite population limit is only reached for population sizes larger than those commonly found in nature. For the discontinuous sharp peak fitness function, on the other hand, fluctuations are small, Eq. (20), and the convergence to the infinite population limit is rapid.

FIG. 4.

(Color online) Finite population versus infinite population results for the probability distribution of the parallel model with quadratic fitness. Note that the Muller’s ratchet phenomenon, whereby fitness is reduced for finite populations, is greatly suppressed for ν > 0. Here k = 4 and L = 200, and the stochastic results are obtained by averaging over 50 independent numerical experiments.

Another point from Fig. 3 is that horizontal gene transfer speeds up the rate of evolution. We see that the convergence to the steady state is more rapid for increased horizontal gene transfer rates. Numerical experiments have shown that the effect of horizontal gene transfer on the rate of evolution is especially dramatic for rugged fitness landscapes [40, 41]. At the local scale, biological fitness landscapes may be relatively smooth. At larger genetic distances, however, we expect biological fitness landscapes to be rugged. Correlations exist in the rugged landscape, and horizontal gene transfer couples to those correlations in a way that allows evolution to speed up dramatically [42]. We expect that this speedup of evolution on rugged landscapes is one of the most significant effects of horizontal gene transfer in biology.

Note that when ν = 0 the number fluctuations for the case of fitness functions for which the population is not exponentially localized at ξ = L (i.e. continuous fitness functions) are large in comparison to the fluctuations for a localized population, e.g. sharp peak. Another way to see this effect is shown in Fig. 5, where for ν = 0, the convergence to N → ∞ is slow.

FIG. 5.

Fluctuations in the probability distribution for the Crow-Kimura model, obtained from stochastic simulations using the Gillespie method (dots and diamonds) at different sizes of the population, in the absence of horizontal gene transfer ν = 0. Convergence towards the theoretical curve Eq. (12) (solid line) is observed. Here L = 200, and the quadratic fitness with k = 4.0 and μ = 1 was considered.

As a final remark, we tested the validity of the description of the stochastic process in the language of Hamming distance classes, as used in our theory. For that purpose, we performed numerical experiments with Lebowitz-Gillespie simulations with both a finite population of explicit sequences [27], and the analogous system in the representation of Hamming distance classes. As expected from a simple argument based on permutation invariance of the fitness function that shows the stochastic class dynamics is an exact projection of the stochastic sequence dynamics, both descriptions yield exactly the same statistics, as shown in Fig. 6.

FIG. 6.

Probability distributions for the Crow-Kimura model, obtained from stochastic simulations using the Gillespie method with explicit sequences or alternatively with Hamming distance classes. Clearly both descriptions are statistically identical. Here L = 200, and the quadratic fitness is used with k = 4.0, μ = 1, and for a population of N = 109 individuals.

THE EIGEN MODEL

We now turn to the Eigen model. In contrast to the parallel model, mutation and horizontal gene transfer are assumed to occur only during replication in the Eigen model. That is, multiple mutations occur along each sequence as a consequence of errors in the replication process, and during this process horizontal gene transfer with probability ν/L per letter can also occur. The transfer matrix for mutations from class ξ′ into class ξ is denoted by Qξ,ξ′ [25],

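$$
\begin{aligned}
Q_{\xi,\xi'} = \sum_{\xi_1=0}^{\min\{\xi+\xi',\,2L-(\xi+\xi')\}} & q^{\,L-(2\xi_1+|\xi-\xi'|)}\,(1-q)^{2\xi_1+|\xi-\xi'|} \\
&\times \binom{L-\xi'}{\xi_1 + \frac{|\xi-\xi'| - \xi' + \xi}{2}} \binom{\xi'}{\xi_1 + \frac{|\xi-\xi'| + \xi' - \xi}{2}}
\end{aligned}
\tag{21}
$$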

Here, q ≃ 1 characterizes the fidelity of the replication process, where 1 − q is the probability (per site) that an incorrect letter is placed by the polymerase enzyme. Note that ‘back mutations’, often ignored in the literature, are included in the Eigen model. There is also random degradation of individuals with rate Ld. We again seek to calculate shifts in the average population distribution as well as fluctuations about the average for a finite population of individuals following the dynamics of the Eigen model master equation. In this master equation, the terms proportional to (1 − ν/L) represent the evolutionary processes of replication and multiple mutations in the absence of horizontal gene transfer. The terms proportional to ν/L, on the other hand, represent the coupled sequential processes of replication, horizontal gene transfer, and multiple mutations. Degradation enters through terms proportional to the degradation rate d(ξ).

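$$
\begin{aligned}
\frac{\partial}{\partial t}P(\{n_\xi\}) ={}& \left(1 - \frac{\nu}{L}\right)\Bigg\{ \sum_{\xi=0}^{L} r(\xi) Q_{\xi,\xi}\Big[(n_\xi - 1)\sum_{\xi'\neq\xi}\frac{n_{\xi'}+1}{N}P(n_\xi - 1, n_{\xi'}+1) - n_\xi\sum_{\xi'\neq\xi}\frac{n_{\xi'}}{N}P(n_\xi, n_{\xi'})\Big] \\
&\quad + \sum_{\xi'=0}^{L} r(\xi')\sum_{\xi\neq\xi'} Q_{\xi,\xi'}\Big[n_{\xi'}\frac{n_{\xi'}+1}{N}P(n_{\xi'}+1, n_\xi - 1) - (n_{\xi'}-1)\frac{n_{\xi'}}{N}P(n_{\xi'}, n_\xi)\Big] \\
&\quad + \sum_{\xi'=0}^{L} r(\xi')\sum_{\xi\neq\xi'} Q_{\xi,\xi'}\Big[n_{\xi'}\sum_{\xi''\neq\xi,\,\xi''\neq\xi'}\frac{n_{\xi''}+1}{N}P(n_\xi - 1, n_{\xi''}+1) - n_{\xi'}\sum_{\xi''\neq\xi,\,\xi''\neq\xi'}\frac{n_{\xi''}}{N}P(n_\xi, n_{\xi''})\Big]\Bigg\} \\
&+ \sum_{\xi=0}^{L} d(\xi)\Big[(n_\xi+1)\sum_{\xi'\neq\xi}\frac{n_{\xi'}-1}{N}P(n_\xi+1, n_{\xi'}-1) - n_\xi\sum_{\xi'\neq\xi}\frac{n_{\xi'}}{N}P(n_\xi, n_{\xi'})\Big] \\
&+ \sum_{\xi,\xi'=0}^{L} Q_{\xi,\xi'+1}\frac{\nu}{L}\rho_+(L-\xi')\,r(\xi')\,n_{\xi'}\sum_{\xi''\neq\xi,\,\xi''\neq\xi'}\Big[\frac{n_{\xi''}+1}{N}P(n_\xi - 1, n_{\xi''}+1) - \frac{n_{\xi''}}{N}P(n_\xi, n_{\xi''})\Big] \\
&+ \sum_{\xi,\xi'=0}^{L} Q_{\xi,\xi'-1}\frac{\nu}{L}\rho_-\xi'\,r(\xi')\,n_{\xi'}\sum_{\xi''\neq\xi,\,\xi''\neq\xi'}\Big[\frac{n_{\xi''}+1}{N}P(n_\xi - 1, n_{\xi''}+1) - \frac{n_{\xi''}}{N}P(n_\xi, n_{\xi''})\Big]
\end{aligned}
\tag{22}
$$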
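The transfer matrix of Eq. (21) can be tabulated directly by counting the wild-type letters lost and gained in one replication event. The sketch below is an illustration written for this text (the function name `transfer_matrix` and the use of `math.comb` and NumPy are choices made here). It sums over the number m1 of wild-type letters that mutate away and the number m2 of non-wild-type letters that revert, which is an equivalent reparametrization of the single sum over ξ1 in Eq. (21).

```python
from math import comb
import numpy as np

def transfer_matrix(L, q):
    """Return Q with Q[dst, src] = probability that a parent in class src
    (src wild-type letters) produces an offspring in class dst."""
    Q = np.zeros((L + 1, L + 1))
    for src in range(L + 1):
        for m1 in range(src + 1):                # wild-type letters lost
            for m2 in range(L - src + 1):        # wild-type letters gained
                dst = src - m1 + m2
                flips = m1 + m2                  # total number of miscopied sites
                Q[dst, src] += (comb(src, m1) * comb(L - src, m2)
                                * (1.0 - q) ** flips * q ** (L - flips))
    return Q
```

Each column of Q sums to one, since every offspring lands in some class, and the wild-type row reduces to $Q_{L,\xi'} = q^{\xi'}(1-q)^{L-\xi'}$, the form used for the sharp peak below.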

Mapping to a field theory

By the same method as in the parallel model, we map the master equation into a second quantized formulation, with Hamiltonian

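$$
\begin{aligned}
-\hat{H} ={}& \left(1 - \frac{\nu}{L}\right)\frac{L}{N}\sum_{\xi,\xi',\xi''=0}^{L} Q_{\xi,\xi'} f(\xi')\,\hat a^\dagger_{\xi'}\big(\hat a^\dagger_\xi - \hat a^\dagger_{\xi''}\big)\hat a_{\xi''}\hat a_{\xi'} \\
&+ \frac{L}{N}\sum_{\xi,\xi'=0}^{L} d(\xi')\,\hat a^\dagger_\xi\big(\hat a^\dagger_\xi - \hat a^\dagger_{\xi'}\big)\hat a_{\xi'}\hat a_\xi \\
&+ \frac{L}{N}\sum_{\xi,\xi',\xi''=0}^{L} Q_{\xi,\xi'+1}\frac{\nu}{L}\rho_+(L-\xi') f(\xi')\,\hat a^\dagger_{\xi'}\big(\hat a^\dagger_\xi - \hat a^\dagger_{\xi''}\big)\hat a_{\xi''}\hat a_{\xi'} \\
&+ \frac{L}{N}\sum_{\xi,\xi',\xi''=0}^{L} Q_{\xi,\xi'-1}\frac{\nu}{L}\rho_-\xi' f(\xi')\,\hat a^\dagger_{\xi'}\big(\hat a^\dagger_\xi - \hat a^\dagger_{\xi''}\big)\hat a_{\xi''}\hat a_{\xi'}
\end{aligned}
\tag{23}
$$

As in the parallel model, we introduce coherent states in a Trotter factorization of the evolution operator, as defined in Eq. (8). From this procedure, we derive the field theory for the Eigen model as well. In this case, the action is given by

$$
\begin{aligned}
S[\{\bar z\},\{z\}] = \sum_{\xi=0}^{L}\int_0^T dt\,\Big\{ & \bar z_\xi\frac{\partial z_\xi}{\partial t} + \big(\bar z_\xi(t) z_\xi(t) - n_\xi^0\ln[1 + \bar z_\xi(t)]\big)\delta(t) \\
& - \frac{L}{N}\left(1 - \frac{\nu}{L}\right)\sum_{\xi',\xi''=0}^{L} Q_{\xi,\xi'} f(\xi')\,[1 + \bar z_{\xi'}]\,[\bar z_\xi - \bar z_{\xi''}]\,z_{\xi''} z_{\xi'} \\
& - \frac{L}{N}\sum_{\xi',\xi''=0}^{L}\Big[\delta_{\xi,\xi'}\,d(\xi'') + \frac{\nu}{L}\big[Q_{\xi,\xi'+1}\rho_+(L-\xi') + Q_{\xi,\xi'-1}\rho_-\xi'\big] f(\xi')\Big][1 + \bar z_{\xi'}]\,[\bar z_\xi - \bar z_{\xi''}]\,z_{\xi''} z_{\xi'} \Big\}
\end{aligned}
\tag{24}
$$

In the limit of a large population, we look for a saddle point of the action Eq. (24). From the condition $\left.\frac{\delta S}{\delta z_\xi(t)}\right|_c = 0$, we obtain $\bar z_\xi^c(t) = 0$. From the second condition, $\left.\frac{\delta S}{\delta \bar z_\xi(t)}\right|_c = 0$, we find that $z_\xi^c(t) = N P_\xi(t)$ satisfies the differential equation

$$
\begin{aligned}
\frac{d}{dt}P_\xi(t) ={}& \left(1 - \frac{\nu}{L}\right)\Big[\sum_{\xi'=0}^{L} Q_{\xi,\xi'} r(\xi') P_{\xi'}(t) - P_\xi(t)\sum_{\xi'=0}^{L} r(\xi') P_{\xi'}(t)\Big] - P_\xi(t)\Big[d(\xi) - \sum_{\xi'=0}^{L} P_{\xi'}(t)\,d(\xi')\Big] \\
&+ \frac{\nu}{L}\Big[\sum_{\xi'=0}^{L}\big\{Q_{\xi,\xi'+1}\rho_+(L-\xi') + Q_{\xi,\xi'-1}\rho_-\xi'\big\} r(\xi') P_{\xi'}(t) - P_\xi(t)\sum_{\xi'=0}^{L}\big\{\rho_+(L-\xi') + \rho_-\xi'\big\} r(\xi') P_{\xi'}(t)\Big]
\end{aligned}
\tag{25}
$$

and the initial condition corresponds to $P_\xi(0) = n_\xi^0/N$, as derived in Appendix 3. This is exactly the differential equation for $P_\xi(t)$ from infinite population quasispecies theory [27, 34].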
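The right-hand side of Eq. (25) can be evaluated with a few matrix operations once the transfer matrix of Eq. (21) is tabulated. The sketch below is illustrative only: the function names `transfer_matrix` and `eigen_rhs`, the array layout, and the convention that `r` and `d` are passed as the rates r(ξ) and d(ξ) exactly as they appear in Eq. (25) are assumptions made here, not the authors' implementation.

```python
from math import comb
import numpy as np

def transfer_matrix(L, q):
    """Q[dst, src]: probability that class src yields an offspring in class dst."""
    Q = np.zeros((L + 1, L + 1))
    for src in range(L + 1):
        for m1 in range(src + 1):                # wild-type letters lost
            for m2 in range(L - src + 1):        # wild-type letters gained
                k = m1 + m2
                Q[src - m1 + m2, src] += (comb(src, m1) * comb(L - src, m2)
                                          * (1.0 - q) ** k * q ** (L - k))
    return Q

def eigen_rhs(P, Q, r, d, nu):
    """Right-hand side of Eq. (25) for class probabilities P[xi]."""
    L = P.size - 1
    xi = np.arange(L + 1)
    u = np.dot(2.0 * xi / L - 1.0, P)
    rho_p, rho_m = (1.0 + u) / 2.0, (1.0 - u) / 2.0
    # Column-shifted copies: Q_up[:, x] = Q[:, x+1], Q_dn[:, x] = Q[:, x-1]
    Q_up = np.zeros_like(Q)
    Q_up[:, :-1] = Q[:, 1:]
    Q_dn = np.zeros_like(Q)
    Q_dn[:, 1:] = Q[:, :-1]
    w_plus = rho_p * (L - xi)                    # rho_+ (L - xi')
    w_minus = rho_m * xi                         # rho_- xi'
    rP = r * P
    replication = Q @ rP - P * rP.sum()
    degradation = -P * (d - np.dot(d, P))
    transfer = (Q_up @ (w_plus * rP) + Q_dn @ (w_minus * rP)
                - P * np.dot(w_plus + w_minus, rP))
    return (1.0 - nu / L) * replication + degradation + (nu / L) * transfer
```

Since Eq. (25) conserves total probability, the components of the returned vector should sum to zero for any normalized P, which provides a quick check on an implementation.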

By expanding the action Eq. (24) up to second order to calculate the matrix of correlations, as shown in Appendix 4, we obtain in the continuous time limit the Lyapunov Eq. (12), with matrices A defined by

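$$
\begin{aligned}
L^{-1}[A]_{\xi,\xi'} ={}& \left(1 - \frac{\nu}{L}\right)\Big[\sum_{\xi''=0}^{L} Q_{\xi,\xi''} f(\xi'') P_{\xi''} + Q_{\xi,\xi'} f(\xi') - \delta_{\xi,\xi'}\sum_{\xi''=0}^{L} f(\xi'') P_{\xi''} - f(\xi') P_\xi\Big] \\
&+ \big[d(\xi') - d(\xi)\big]P_\xi + \delta_{\xi,\xi'}\Big[\sum_{\xi_1=0}^{L} d(\xi_1) P_{\xi_1} - d(\xi)\Big] \\
&+ \frac{\nu}{L}\Big[\sum_{\xi''=0}^{L}\big(Q_{\xi,\xi''-1}\rho_-\xi'' + Q_{\xi,\xi''+1}\rho_+(L-\xi'')\big) f(\xi'') P_{\xi''} + \big(Q_{\xi,\xi'-1}\rho_-\xi' + Q_{\xi,\xi'+1}\rho_+(L-\xi')\big) f(\xi') \\
&\qquad - \delta_{\xi,\xi'}\sum_{\xi''=0}^{L}\big(\rho_+(L-\xi'') + \rho_-\xi''\big) f(\xi'') P_{\xi''} - \big(\rho_+(L-\xi') + \rho_-\xi'\big) f(\xi') P_\xi\Big]
\end{aligned}
\tag{26}
$$

and matrices B given by

$$
\begin{aligned}
L^{-1}[B]_{\xi,\xi'} = N\Big\{ & \left(1 - \frac{\nu}{L}\right)\big[Q_{\xi,\xi'} f(\xi') P_{\xi'} + Q_{\xi',\xi} f(\xi) P_\xi - (f(\xi) + f(\xi')) P_\xi P_{\xi'}\big] + 2\Big(\sum_{\xi_1=0}^{L} d(\xi_1) P_{\xi_1}\Big) P_\xi\,\delta_{\xi,\xi'} \\
&+ \frac{\nu}{L}\Big[\big(Q_{\xi,\xi'+1}\rho_+(L-\xi') + Q_{\xi,\xi'-1}\rho_-\xi'\big) f(\xi') P_{\xi'} + \big(Q_{\xi',\xi+1}\rho_+(L-\xi) + Q_{\xi',\xi-1}\rho_-\xi\big) f(\xi) P_\xi \\
&\qquad - \big[(\rho_+(L-\xi) + \rho_-\xi) f(\xi) + (\rho_+(L-\xi') + \rho_-\xi') f(\xi')\big] P_\xi P_{\xi'}\Big] - (d(\xi) + d(\xi')) P_\xi P_{\xi'} \Big\}
\end{aligned}
\tag{27}
$$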

Continuous and discontinuous fitness functions

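For the sharp peak, $f(\xi) = (A - A_0)\delta_{\xi,L} + A_0$, in the Eigen model in the absence of horizontal gene transfer (ν = 0), we find that the steady-state wild-type probability satisfies

$$
\sum_{\xi=0}^{L} q^{\xi}(1-q)^{L-\xi} f(\xi) P_\xi - P_L\Big[A P_L + A_0\sum_{\xi\neq L} P_\xi\Big] = 0
\tag{28}
$$

Since q ≃ 1 (the fidelity of the replication process is very high), we have 1 − q ≪ 1, and Eq. (28) becomes

$$
q^{L} A P_L - P_L\big[(A - A_0) P_L + A_0\big] = 0
\tag{29}
$$

By defining $q^{L} = e^{-\mu}$, we obtain for the probability of the wild type

$$
P_{\xi=L} = \begin{cases} 0, & A < e^{\mu}A_0 \\ \dfrac{e^{-\mu}A - A_0}{A - A_0}, & A > e^{\mu}A_0 \end{cases}
\tag{30}
$$

For the correlation matrix, we define $D_{\xi,\xi'} = \frac{1}{N}C_{\xi,\xi'}$, and find that the stationary solution for $D_{L,L}$ in the absence of degradation, d(ξ) = 0, is given by

$$
0 = \frac{1}{N}B_{L,L} + \sum_{\xi_1=0}^{L}\big[A_{L,\xi_1}D_{\xi_1,L} + A_{L,\xi_1}D_{\xi_1,L}\big]
\tag{31}
$$

From this equation, we find $\sum_{\xi_1} A_{L,\xi_1}D_{\xi_1,L} = -\frac{1}{2N}B_{L,L}$. Hence, expanding the left-hand side explicitly, we find

$$
\sum_{\xi_1=0}^{L}\Big[\sum_{\xi'=0}^{L} Q_{L,\xi'} f(\xi') P_{\xi'} + Q_{L,\xi_1} f(\xi_1) - \Big(\sum_{\xi'} f(\xi') P_{\xi'}\Big)\delta_{L,\xi_1} - f(\xi_1) P_L\Big] D_{\xi_1,L} = -\big[Q_{L,L} f(L) P_L - f(L) P_L^2\big]
\tag{32}
$$

Expanding this equation when L is large and q ≃ 1, we find

$$
\big[q^{L}A - (A - A_0)P_L - A_0 - (A - A_0)P_L\big] D_{L,L} = A P_L(P_L - q^{L}) + q^{L} A P_L^2 - A_0 P_L^2
\tag{33}
$$

Substituting the result $P_L = \dfrac{q^{L}A - A_0}{A - A_0}$ from Eq. (30), we find

$$
D_{L,L} = \frac{1}{(A - A_0)^2}\big[A A_0 - A_0^2 - (q^{L}A)^2 + q^{L} A A_0\big]
\tag{34}
$$

The fluctuation in the number of individuals with the wild-type sequence is obtained from Eq. (15),

$$
\frac{\langle(\delta n_{\xi=L})^2\rangle}{N^2} = \begin{cases} 0, & A < e^{\mu}A_0 \\ \dfrac{e^{-\mu}(1 - e^{-\mu})A^2}{N(A - A_0)^2}, & A > e^{\mu}A_0 \end{cases}
\tag{35}
$$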
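The closed forms above are easy to evaluate and to cross-check against one another. The following short sketch (the function name `eigen_sharp_peak` is hypothetical, introduced here for illustration) computes the wild-type probability of Eq. (30), the correlation $D_{L,L}$ of Eq. (34), and the relative fluctuation obtained by inserting both into Eq. (15) with $D = C/N$; the last quantity should coincide with the ordered-phase branch of Eq. (35).

```python
import math

def eigen_sharp_peak(A, A0, mu, N):
    """Return (P_L, D_LL, relative variance) for the Eigen-model sharp peak,
    with nu = 0 and no degradation, in the large-L limit with q**L = exp(-mu)."""
    qL = math.exp(-mu)
    if A <= math.exp(mu) * A0:
        # Disordered phase: the wild-type class is not macroscopically occupied
        return 0.0, 0.0, 0.0
    P_L = (qL * A - A0) / (A - A0)                                  # Eq. (30)
    D_LL = (A * A0 - A0 ** 2 - (qL * A) ** 2 + qL * A * A0) / (A - A0) ** 2  # Eq. (34)
    variance = (P_L + D_LL) / N                                     # Eq. (15)
    return P_L, D_LL, variance
```

As in the parallel model, the relative fluctuation of the wild-type class in the ordered phase is of order 1/N, which is why convergence to the infinite population limit is rapid for the sharp peak.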

For smooth fitness functions, there are large fluctuations in the population numbers in the absence of horizontal gene transfer. In Fig. 7 we present the fluctuations in the number of individuals with a given sequence for the quadratic fitness, as predicted from our theory, Eqs. (25)–(27). A moderate horizontal gene transfer rate reduces the fluctuations by orders of magnitude. In Fig. 8 inset, we present the equilibrium probability distributions, for different rates of horizontal gene transfer, as obtained from our theory for the quadratic fitness $f(\xi) = (k/2)(2\xi/L - 1)^2/2 + 1$. For this fitness function with negative epistasis, horizontal gene transfer reduces the mean fitness in the infinite population limit [27].

FIG. 7.

Fluctuations in the probability distribution, as predicted from our theory, Eqs. (25)–(26), for the Eigen model and quadratic fitness, at different horizontal gene transfer rates, ν. Here L = 200, k = 4.0, and μ = 1. Fluctuations decrease by orders of magnitude with increasing horizontal gene transfer rate.

FIG. 8.

Probability distributions, as predicted from our theory, for the Eigen model and quadratic fitness, at different recombination rates. Here L = 200, k = 4.0, and μ = 1.

CONCLUSION

For both the parallel and Eigen models, we have found that horizontal gene transfer reduces by orders of magnitude the fluctuations in the number of individuals with a given sequence composition for smooth fitness functions, such as quadratic. Horizontal gene transfer also reduces the variability within and between independent experiments for smooth fitness functions. Finally, horizontal gene transfer substantially reduces the “Muller’s ratchet” phenomenon, whereby fitness is reduced in finite populations relative to the infinite population limit. For the sharp peak fitness, horizontal gene transfer does not modify the steady-state distribution of fluctuations.

The reduction in finite populations by horizontal gene transfer of both the magnitude of the Muller’s ratchet phenomenon [7–9] and the fluctuations in population numbers should be observable in experiments. The fluctuation in population numbers can be measured either at different time points in long experiments or as fluctuations between different experimental replicates. The latter is likely to be more feasible in the laboratory.

ACKNOWLEDGMENTS

Supported by the FunBio program of DARPA. JMP is also supported by a Korea Research Foundation grant funded by the Korean Government (KRF-2008-314-C00123).

APPENDIX 1.

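We present the derivation of the saddle point equations for the Kimura model. We look for a saddle point of the action Eq. (10) in the coherent fields $\bar z_\xi(t)$ and $z_\xi(t)$. The first condition is

$$
\begin{aligned}
\frac{\delta S}{\delta z_\xi(t)} ={}& -\frac{\partial \bar z_\xi}{\partial t} + \delta(t - T)\,\bar z_\xi(T) - \mu\big[(L-\xi)\bar z_{\xi+1}(t) + \xi\bar z_{\xi-1}(t) - L\bar z_\xi(t)\big] \\
&- \nu\big[(L-\xi)\rho_+\bar z_{\xi+1}(t) + \xi\rho_-\bar z_{\xi-1}(t) - \{(L-\xi)\rho_+ + \xi\rho_-\}\bar z_\xi(t)\big] \\
&- \frac{1}{N}\sum_{\xi_1=0}^{L}\sum_{\xi_2=0}^{L} L f(\xi_1)\,(1 + \bar z_{\xi_1}(t))\big[\bar z_{\xi_1}(t) - \bar z_{\xi_2}(t)\big]\big(\delta_{\xi_1,\xi}\,z_{\xi_2}(t) + z_{\xi_1}(t)\,\delta_{\xi_2,\xi}\big) = 0
\end{aligned}
\tag{36}
$$

where T is the final integration time in Eq. (10), which we typically set as T = ∞. The solution which satisfies this saddle-point condition is $\bar z_\xi^c(t) = 0$, for 0 < t < T.

The saddle point condition in the fields $\bar z_\xi(t)$ is

$$
\begin{aligned}
\frac{\delta S}{\delta \bar z_\xi(t)} ={}& \Big[z_\xi(0) - \frac{n_\xi(0)}{1 + \bar z_\xi(0)}\Big]\delta(t) + \frac{\partial z_\xi}{\partial t} - \mu\big[(L-\xi+1) z_{\xi-1}(t) + (\xi+1) z_{\xi+1}(t) - L z_\xi(t)\big] \\
&- \nu\big[(L-\xi+1)\rho_+ z_{\xi-1}(t) + (\xi+1)\rho_- z_{\xi+1}(t) - \{(L-\xi)\rho_+ + \xi\rho_-\} z_\xi(t)\big] \\
&- \frac{1}{N}\sum_{\xi_1=0}^{L}\sum_{\xi_2=0}^{L} L f(\xi_1)\Big\{\delta_{\xi_1,\xi}\big[\bar z_{\xi_1}(t) - \bar z_{\xi_2}(t)\big] + (1 + \bar z_{\xi_1})\big[\delta_{\xi_1,\xi} - \delta_{\xi_2,\xi}\big]\Big\} z_{\xi_1}(t)\,z_{\xi_2}(t) = 0
\end{aligned}
\tag{37}
$$

In combination with the solution $\bar z_\xi^c(t) = 0$ obtained from Eq. (36), Eq. (37) provides the differential equation for the probability distribution $P_\xi(t) = z_\xi^c(t)/N$,

$$
\begin{aligned}
\frac{d}{dt}P_\xi ={}& \mu\big[(L-\xi+1)P_{\xi-1} + (\xi+1)P_{\xi+1} - L P_\xi\big] \\
&+ \nu\big[\rho_+(L-\xi+1)P_{\xi-1} + \rho_-(\xi+1)P_{\xi+1} - \{(L-\xi)\rho_+ + \xi\rho_-\}P_\xi\big] \\
&+ \Big[r(\xi) - \sum_{\xi'=0}^{L} r(\xi')P_{\xi'}\Big]P_\xi
\end{aligned}
\tag{38}
$$

and the initial condition $P_\xi(0) = n_\xi^0/N$. In deriving Eq. (38) from Eq. (37), the property $\sum_{\xi=0}^{L} P_\xi(t) = 1$ was used, and we introduce the notation r(ξ) = Lf(ξ).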

APPENDIX 2.

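We next consider the expansion of the action Eq. (10) near the saddle point $S_c$. For convenience, we define a discrete time label k = t/ϵ, with ϵ → 0. Fluctuations near the saddle-point solution are given by $\delta\bar z_\xi(k) = \bar z_\xi(k) - \bar z_\xi^c(k)$ and $\delta z_\xi(k) = z_\xi(k) - z_\xi^c(k)$. This gives

$$
\begin{aligned}
S - S_c = \sum_{\xi,\xi'=0}^{L}\Big[ & \delta\bar z_\xi(0)\,\delta z_{\xi'}(0)\,\delta_{\xi,\xi'} + \tfrac{1}{2} n_\xi^0\,\delta\bar z_\xi(0)\,\delta\bar z_{\xi'}(0)\,\delta_{\xi,\xi'} \\
&+ \sum_{k=1}^{t/\epsilon}\Big\{\delta\bar z_\xi(k)\,\delta z_{\xi'}(k)\,\delta_{\xi,\xi'} - \epsilon\,\delta\bar z_\xi(k)\,\delta\bar z_{\xi'}(k)\big[\delta_{\xi,\xi'}\,r(\xi) N P_\xi(k-1) - r(\xi) N P_\xi(k-1) P_{\xi'}(k-1)\big]\Big\} \\
&+ \sum_{k=1}^{t/\epsilon}\delta\bar z_\xi(k)\,\delta z_{\xi'}(k-1)\Big\{-\delta_{\xi,\xi'} - \epsilon\mu\big[(L-\xi+1)\delta_{\xi-1,\xi'} + (\xi+1)\delta_{\xi+1,\xi'} - L\delta_{\xi,\xi'}\big] \\
&\qquad - \epsilon\nu\big[(L-\xi+1)\rho_+\delta_{\xi-1,\xi'} + (\xi+1)\rho_-\delta_{\xi+1,\xi'} - \{(L-\xi)\rho_+ + \xi\rho_-\}\delta_{\xi,\xi'}\big] \\
&\qquad - \epsilon\Big[\Big\{r(\xi) - \sum_{\xi_1} r(\xi_1) P_{\xi_1}(k-1)\Big\}\delta_{\xi,\xi'} + \big(r(\xi) - r(\xi')\big) P_\xi(k-1)\Big]\Big\}\Big] \\
={}& \tfrac{1}{2} X^{T}\Pi^{-1}X + O(X^3)
\end{aligned}
\tag{39}
$$

Here, we have defined the vector $X^{T} = \big(\{\delta\bar z(0), \delta z(0)\}, \ldots, \{\delta\bar z(t/\epsilon), \delta z(t/\epsilon)\}\big)$. The matrix $\Pi^{-1}$ is block tridiagonal, with

$$
\Pi^{-1} = \begin{pmatrix}
\Pi^{-1}_{00} & \Pi^{-1}_{01} & 0 & 0 & \cdots \\
\Pi^{-1}_{10} & \Pi^{-1}_{11} & \Pi^{-1}_{12} & 0 & \cdots \\
0 & \Pi^{-1}_{21} & \Pi^{-1}_{22} & \Pi^{-1}_{23} & \cdots \\
\vdots & & \ddots & \ddots & \\
0 & \cdots & & & \Pi^{-1}_{t/\epsilon,\,t/\epsilon}
\end{pmatrix}
\tag{40}
$$

Here,

$$
\begin{aligned}
\Pi^{-1}_{00} &= \begin{pmatrix} N_0 & I \\ I & 0 \end{pmatrix}, \qquad [N_0]_{\xi,\xi'} = n_\xi^0\,\delta_{\xi,\xi'} \\
\Pi^{-1}_{k,k} &= \begin{pmatrix} -\epsilon B(k-1) & I \\ I & 0 \end{pmatrix}, \qquad k \neq 0 \\
\Pi^{-1}_{k,k-1} &= \begin{pmatrix} 0 & -[I + \epsilon A(k-1)] \\ 0 & 0 \end{pmatrix} \\
\Pi^{-1}_{k-1,k} &= \begin{pmatrix} 0 & 0 \\ -[I + \epsilon A^{T}(k-1)] & 0 \end{pmatrix}
\end{aligned}
\tag{41}
$$

The matrices A and B are defined by

$$
\begin{aligned}
[A]_{\xi,\xi'} ={}& \delta_{\xi-1,\xi'}(L-\xi+1)\big[\mu + \nu\rho_+\big] \\
&+ \delta_{\xi,\xi'}\Big[L f(\xi) - \sum_{\xi_1} L f(\xi_1)P_{\xi_1} - \nu\{(L-\xi)\rho_+ + \xi\rho_-\} - L\mu\Big] \\
&+ L\big[f(\xi) - f(\xi')\big]P_\xi + \delta_{\xi+1,\xi'}(\xi+1)\big[\mu + \nu\rho_-\big]
\end{aligned}
\tag{42}
$$

$$
[B]_{\xi,\xi'} = \delta_{\xi,\xi'}\,2L f(\xi) N P_\xi - L\big[f(\xi) + f(\xi')\big] N P_\xi P_{\xi'}
\tag{43}
$$

Here, $A^{T}$ denotes the transpose of A, $[A^{T}(k)]_{\xi,\xi'} = [A(k)]_{\xi',\xi}$. By standard matrix inversion, we obtain

$$
\Pi(t/\epsilon) = \big[\Pi^{-1}(t/\epsilon)\big]^{-1} = \begin{pmatrix}
\big[\Pi^{-1}(t/\epsilon - 1)\big] & \begin{pmatrix} 0 \\ \vdots \\ \Pi^{-1}_{t/\epsilon-1,\,t/\epsilon} \end{pmatrix} \\
\begin{pmatrix} 0 & \cdots & \Pi^{-1}_{t/\epsilon,\,t/\epsilon-1} \end{pmatrix} & \Pi^{-1}_{t/\epsilon,\,t/\epsilon}
\end{pmatrix}^{-1}
\tag{44}
$$

Calculating the inverse in Eq. (44), we obtain

$$
\begin{aligned}
\big[\Pi(t/\epsilon)\big]_{t/\epsilon,\,t/\epsilon} \equiv b_{t/\epsilon,\,t/\epsilon}
&= \Big[\Pi^{-1}_{t/\epsilon,\,t/\epsilon} - \begin{pmatrix} 0 & \cdots & \Pi^{-1}_{t/\epsilon,\,t/\epsilon-1} \end{pmatrix}\big[\Pi(t/\epsilon - 1)\big]\begin{pmatrix} 0 \\ \vdots \\ \Pi^{-1}_{t/\epsilon-1,\,t/\epsilon} \end{pmatrix}\Big]^{-1} \\
&= \Big[\Pi^{-1}_{t/\epsilon,\,t/\epsilon} - \Pi^{-1}_{t/\epsilon,\,t/\epsilon-1}\,b_{t/\epsilon-1,\,t/\epsilon-1}\,\Pi^{-1}_{t/\epsilon-1,\,t/\epsilon}\Big]^{-1}
\end{aligned}
\tag{45}
$$

From this recursive equation, we find

$$
\begin{aligned}
b_{00} &= \big[\Pi^{-1}_{00}\big]^{-1} = \begin{pmatrix} 0 & I \\ I & -N_0 \end{pmatrix} \\
b_{11} &= \big[\Pi^{-1}_{11} - \Pi^{-1}_{10}\,b_{00}\,\Pi^{-1}_{01}\big]^{-1} = \begin{pmatrix} 0 & I \\ I & -\{I + \epsilon A(0)\}[N_0]\{I + \epsilon A^{T}(0)\} + \epsilon B(0) \end{pmatrix}
\end{aligned}
\tag{46}
$$

From Eq. (46), proceeding by induction, we prove that the matrices $b_{k,k}$ possess the structure

$$
b_{k,k} = \begin{pmatrix} 0 & I \\ I & C(k) \end{pmatrix},
\tag{47}
$$

and from the recursion relation

$$
b_{k,k} = \big[\Pi^{-1}_{k,k} - \Pi^{-1}_{k,k-1}\,b_{k-1,k-1}\,\Pi^{-1}_{k-1,k}\big]^{-1}
\tag{48}
$$

we obtain

$$
C(k) = [I + \epsilon A(k-1)]\,C(k-1)\,[I + \epsilon A^{T}(k-1)] + \epsilon B(k-1), \qquad C(0) = -N_0
\tag{49}
$$

In the continuous time limit ϵ → 0, Eq. (49) becomes a Lyapunov equation

$$
\frac{d}{dt}C = B + AC + CA^{T}, \qquad C(0) = -N_0
\tag{50}
$$

with $[N_0]_{\xi,\xi'} = \delta_{\xi,\xi'}\,n_\xi^0$.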

APPENDIX 3.

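Now, we derive the saddle point equations for the Eigen model. We look for a saddle point of the action Eq. (24) in the coherent fields $\bar z_\xi(t)$ and $z_\xi(t)$. The first condition is

$$
\begin{aligned}
\frac{\delta S}{\delta z_\xi(t)} ={}& -\frac{\partial \bar z_\xi}{\partial t} + \delta(t - T)\,\bar z_\xi(T) \\
&- \frac{L}{N}\left(1 - \frac{\nu}{L}\right)\sum_{\xi_1,\xi_2,\xi_3=0}^{L}\Big\{Q_{\xi_2,\xi_1} f(\xi_1)\,[1 + \bar z_{\xi_1}(t)]\,[\bar z_{\xi_2}(t) - \bar z_{\xi_3}(t)]\,\big(\delta_{\xi_1,\xi}\,z_{\xi_3}(t) + \delta_{\xi_3,\xi}\,z_{\xi_1}(t)\big)\Big\} \\
&- \frac{L}{N}\sum_{\xi_1,\xi_2,\xi_3=0}^{L}\Big\{\Big[\delta_{\xi_1,\xi_2}\,d(\xi_3) + \frac{\nu}{L}\big[Q_{\xi_2,\xi_1+1}\rho_+(L-\xi_1) + Q_{\xi_2,\xi_1-1}\rho_-\xi_1\big] f(\xi_1)\Big] \\
&\qquad\times [1 + \bar z_{\xi_1}(t)]\,[\bar z_{\xi_2}(t) - \bar z_{\xi_3}(t)]\,\big(z_{\xi_3}(t)\,\delta_{\xi_1,\xi} + z_{\xi_1}(t)\,\delta_{\xi_3,\xi}\big)\Big\} = 0
\end{aligned}
\tag{51}
$$

where T is the total integration time in Eq. (24), which we typically set as T = ∞. This saddle-point condition is satisfied by the solution $\bar z_\xi^c(t) = 0$, for 0 < t < T.

The saddle-point condition in the fields $\bar z_\xi(t)$ is

$$
\begin{aligned}
\frac{\delta S}{\delta \bar z_\xi(t)} ={}& \frac{\partial z_\xi}{\partial t} + \Big(z_\xi(0) - \frac{n_\xi^0}{1 + \bar z_\xi(0)}\Big)\delta(t) \\
&- \frac{L}{N}\left(1 - \frac{\nu}{L}\right)\sum_{\xi_1,\xi_2,\xi_3=0}^{L}\Big\{Q_{\xi_2,\xi_1} f(\xi_1)\Big(\delta_{\xi_1,\xi}\,[\bar z_{\xi_2}(t) - \bar z_{\xi_3}(t)] + [1 + \bar z_{\xi_1}(t)]\,[\delta_{\xi_2,\xi} - \delta_{\xi_3,\xi}]\Big) z_{\xi_1}(t)\,z_{\xi_3}(t)\Big\} \\
&- \frac{L}{N}\sum_{\xi_1,\xi_2,\xi_3}\Big\{\Big[\delta_{\xi_1,\xi_2}\,d(\xi_3) + \frac{\nu}{L}\big[Q_{\xi_2,\xi_1+1}\rho_+(L-\xi_1) + Q_{\xi_2,\xi_1-1}\rho_-\xi_1\big] f(\xi_1)\Big] \\
&\qquad\times \Big[\delta_{\xi_1,\xi}\,[\bar z_{\xi_2}(t) - \bar z_{\xi_3}(t)] + [1 + \bar z_{\xi_1}(t)]\,(\delta_{\xi_2,\xi} - \delta_{\xi_3,\xi})\Big] z_{\xi_1}(t)\,z_{\xi_3}(t)\Big\} = 0
\end{aligned}
\tag{52}
$$

In combination with the solution $\bar z_\xi^c(t) = 0$ obtained from Eq. (51), from Eq. (52) we obtain the differential equation for the probability distribution $P_\xi(t) = z_\xi^c(t)/N$,

$$
\begin{aligned}
\frac{d}{dt}P_\xi(t) ={}& \left(1 - \frac{\nu}{L}\right)\Big[\sum_{\xi'=0}^{L} Q_{\xi,\xi'} r(\xi') P_{\xi'}(t) - P_\xi(t)\sum_{\xi'=0}^{L} r(\xi') P_{\xi'}(t)\Big] - P_\xi(t)\Big[d(\xi) - \sum_{\xi'=0}^{L} P_{\xi'}(t)\,d(\xi')\Big] \\
&+ \frac{\nu}{L}\Big[\sum_{\xi'=0}^{L}\big\{Q_{\xi,\xi'+1}\rho_+(L-\xi') + Q_{\xi,\xi'-1}\rho_-\xi'\big\} r(\xi') P_{\xi'}(t) - P_\xi(t)\sum_{\xi'=0}^{L}\big\{\rho_+(L-\xi') + \rho_-\xi'\big\} r(\xi') P_{\xi'}(t)\Big]
\end{aligned}
\tag{53}
$$

and the initial condition $P_\xi(0) = n_\xi^0/N$. In deriving Eq. (53) from Eq. (52), we used the properties $\sum_{\xi=0}^{L} P_\xi = 1$ and $\sum_{\xi=0}^{L} Q_{\xi,\xi'} = 1$.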

APPENDIX 4.

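Now, let us consider the expansion of the action Eq. (24) for the Eigen model near the saddle point, with fluctuations near the saddle-point solution given by $\delta\bar z_\xi(k) = \bar z_\xi(k) - \bar z_\xi^c(k)$ and $\delta z_\xi(k) = z_\xi(k) - z_\xi^c(k)$.

$$
\begin{aligned}
S - S_c ={}& \sum_{\xi=0}^{L}\Big[\delta\bar z_\xi(0)\,\delta z_\xi(0) + \tfrac{1}{2} n_\xi^0\,\delta\bar z_\xi(0)\,\delta\bar z_\xi(0) + \sum_{k=1}^{t/\epsilon}\delta\bar z_\xi(k)\big(\delta z_\xi(k) - \delta z_\xi(k-1)\big)\Big] \\
&- \frac{\epsilon}{N}\sum_{k=1}^{t/\epsilon}\Big[\left(1 - \frac{\nu}{L}\right)\sum_{\xi,\xi',\xi''} Q_{\xi,\xi'} r(\xi')\,[\delta\bar z_\xi(k) - \delta\bar z_{\xi''}(k)] \\
&\qquad\qquad\times\big[\delta\bar z_{\xi'}(k)\,N^2 P_{\xi'} P_{\xi''} + N P_{\xi'}\,\delta z_{\xi''}(k-1) + N P_{\xi''}\,\delta z_{\xi'}(k-1)\big] \\
&\qquad + \sum_{\xi,\xi'} d(\xi')\,[\delta\bar z_\xi(k) - \delta\bar z_{\xi'}(k)]\,\big[\delta\bar z_\xi(k)\,N^2 P_\xi P_{\xi'} + N P_\xi\,\delta z_{\xi'}(k-1) + N P_{\xi'}\,\delta z_\xi(k-1)\big]\Big] \\
&- \frac{\nu}{L}\frac{\epsilon}{N}\sum_{k=1}^{t/\epsilon}\sum_{\xi,\xi',\xi''}\big\{Q_{\xi,\xi'+1}\rho_+(L-\xi') + Q_{\xi,\xi'-1}\rho_-\xi'\big\} r(\xi')\,[\delta\bar z_\xi(k) - \delta\bar z_{\xi''}(k)] \\
&\qquad\qquad\times\big[\delta\bar z_{\xi'}(k)\,N^2 P_{\xi'} P_{\xi''} + N P_{\xi'}\,\delta z_{\xi''}(k-1) + N P_{\xi''}\,\delta z_{\xi'}(k-1)\big] + O\big[(\delta\bar z, \delta z)^3\big] \\
={}& \tfrac{1}{2} X^{T}\Pi^{-1}X + O(X^3)
\end{aligned}
\tag{54}
$$

Here, we defined $X^{T} = \big(\{\delta\bar z(0), \delta z(0)\}, \ldots, \{\delta\bar z(t/\epsilon), \delta z(t/\epsilon)\}\big)$. The matrix $\Pi^{-1}$ is tridiagonal by blocks, as in the case of the parallel model. A similar analysis holds for the Eigen model as well, with matrices A and B defined as

$$
\begin{aligned}
L^{-1}[A]_{\xi,\xi'} ={}& \left(1 - \frac{\nu}{L}\right)\Big[\sum_{\xi''=0}^{L} Q_{\xi,\xi''} f(\xi'') P_{\xi''} + Q_{\xi,\xi'} f(\xi') - \delta_{\xi,\xi'}\sum_{\xi''=0}^{L} f(\xi'') P_{\xi''} - f(\xi') P_\xi\Big] \\
&+ \big[d(\xi') - d(\xi)\big]P_\xi + \delta_{\xi,\xi'}\Big[\sum_{\xi_1=0}^{L} d(\xi_1) P_{\xi_1} - d(\xi)\Big] \\
&+ \frac{\nu}{L}\Big[\sum_{\xi''=0}^{L}\big(Q_{\xi,\xi''-1}\rho_-\xi'' + Q_{\xi,\xi''+1}\rho_+(L-\xi'')\big) f(\xi'') P_{\xi''} + \big(Q_{\xi,\xi'-1}\rho_-\xi' + Q_{\xi,\xi'+1}\rho_+(L-\xi')\big) f(\xi') \\
&\qquad - \delta_{\xi,\xi'}\sum_{\xi''=0}^{L}\big(\rho_+(L-\xi'') + \rho_-\xi''\big) f(\xi'') P_{\xi''} - \big(\rho_+(L-\xi') + \rho_-\xi'\big) f(\xi') P_\xi\Big]
\end{aligned}
\tag{55}
$$

$$
\begin{aligned}
L^{-1}[B]_{\xi,\xi'} = N\Big\{ & \left(1 - \frac{\nu}{L}\right)\big[Q_{\xi,\xi'} f(\xi') P_{\xi'} + Q_{\xi',\xi} f(\xi) P_\xi - (f(\xi) + f(\xi')) P_\xi P_{\xi'}\big] + 2\Big(\sum_{\xi_1=0}^{L} d(\xi_1) P_{\xi_1}\Big) P_\xi\,\delta_{\xi,\xi'} \\
&+ \frac{\nu}{L}\Big[\big(Q_{\xi,\xi'+1}\rho_+(L-\xi') + Q_{\xi,\xi'-1}\rho_-\xi'\big) f(\xi') P_{\xi'} + \big(Q_{\xi',\xi+1}\rho_+(L-\xi) + Q_{\xi',\xi-1}\rho_-\xi\big) f(\xi) P_\xi \\
&\qquad - \big[(\rho_+(L-\xi) + \rho_-\xi) f(\xi) + (\rho_+(L-\xi') + \rho_-\xi') f(\xi')\big] P_\xi P_{\xi'}\Big] - (d(\xi) + d(\xi')) P_\xi P_{\xi'} \Big\}
\end{aligned}
\tag{56}
$$

A recursion relation identical to Eq. (49) is obtained, which in the continuous time limit ϵ → 0 yields a Lyapunov equation for the matrix C,

$$
\frac{d}{dt}C = B + AC + CA^{T}
\tag{57}
$$

with initial condition $C_{\xi,\xi'}(0) = -\delta_{\xi,\xi'}\,n_\xi^0$.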

Footnotes

PACS numbers: 87.10.+e, 87.15.Aa, 87.23.Kg, 02.50.-r

References
