Author manuscript; available in PMC: 2015 Jun 24.
Published in final edited form as: Phys Rev E Stat Nonlin Soft Matter Phys. 2008 Dec 23;78(6 Pt 1):061921. doi: 10.1103/PhysRevE.78.061921

Quasispecies theory for Horizontal Gene Transfer and Recombination

Enrique Muñoz 1, Jeong-Man Park 1,2, Michael W Deem 1
PMCID: PMC4478466  NIHMSID: NIHMS700262  PMID: 19256882

Abstract

We introduce a generalization of the parallel, or Crow-Kimura, and Eigen models of molecular evolution to represent the exchange of genetic information between individuals in a population. We study the effect of different schemes of genetic recombination on the steady-state mean fitness and distribution of individuals in the population, through an analytic field theoretic mapping. We investigate both horizontal gene transfer from a population and recombination between pairs of individuals. Somewhat surprisingly, these nonlinear generalizations of quasi-species theory to modern biology are analytically solvable. For two-parent recombination, we find two selected phases, one of which is spectrally rigid. We present exact analytical formulas for the equilibrium mean fitness of the population, in terms of a maximum principle, which are generally applicable to any permutation invariant replication rate function. For smooth fitness landscapes, we show that when positive epistatic interactions are present, recombination or horizontal gene transfer introduces a mild load against selection. Conversely, if the fitness landscape exhibits negative epistasis, horizontal gene transfer or recombination introduce an advantage by enhancing selection towards the fittest genotypes. These results prove that the mutational deterministic hypothesis holds for quasi-species models. For the discontinuous single sharp peak fitness landscape, we show that horizontal gene transfer has no effect on the fitness, while recombination decreases the fitness, for both the parallel and the Eigen models. We present numerical and analytical results as well as phase diagrams for the different cases.

I. INTRODUCTION

It has been argued that genetic recombination provides a mechanism to speed up evolution, at least in finite populations [1]. Moreover, it has been suggested that recombination may provide a way to escape from the phenomenon of “Muller’s ratchet” [2], or suboptimal fitness characteristic of finite populations with asexual reproduction. In bacteria, it has been proposed [3] that horizontal gene transfer allows for the gradual emergence of modularity, through the formation of gene clusters and their eventual organization into operons. In in-vitro systems, protein engineering protocols by directed evolution incorporate genetic recombination in the form of DNA shuffling [4, 5] to speed up the search for desired features such as high binding constants among combinatorial libraries of mutants.

Besides these inherently dynamical effects, it remains a matter of debate whether the exchange of genetic-encoding elements provides a long-term advantage to an infinite population in a nearly static environment. Indeed, it is argued [6] that when advantageous genetic associations have been generated as a result of selection in a given environment, further random recombination is likely to disrupt these associations, thus decreasing the overall fitness. This argument is less cogent if we consider that recombination and horizontal gene transfer preserve the modular structure of the genetic material [3]. That is, entire operational and functional units are recombined, rather than random pieces. It has also been proposed that for recombination to introduce an advantage in infinite populations, negative linkage disequilibrium is required [7–10]. This situation means that particular allele combinations are present in the population at a lower frequency than predicted by chance. Negative linkage disequilibrium can result as a consequence of negative epistasis: alleles with negative contributions to the fitness interact synergistically, increasing their deleterious effect when combined, and alleles with positive contributions to the fitness interact antagonistically [7, 11, 12], see Fig. 1. Under negative epistasis, the mutational deterministic hypothesis [7, 9–11, 13–15] postulates that recombination promotes a more efficient removal of deleterious mutations, by bringing them together into single genomes, and hence facilitating selection [13, 16] to discard those genotypes with low fitness. It has been argued that the negative linkage disequilibrium generated by negative epistatic interactions is a factor to promote the evolution of recombination in nature [7, 15, 17], and conversely that recombination may act as a mechanism to evolve epistasis [18–20]. This latter statement is controversial, since it is intuitive that recombination should contribute to weaken correlations between different genes [21]. Despite these theoretical arguments, experimental studies seem to indicate that negative epistasis is not as common in nature [22, 23] as recombination and, moreover, both negative and positive epistasis may coexist as different fitness components [7] within the same genome in natural organisms.

FIG. 1.


Convention for the sign of epistasis, ε. The figure shows two smooth fitness landscapes as a function of $u = 2l/N - 1$, with $N$ the total length of the (binary) genetic sequences and $0 \le l \le N$ the number of beneficial mutations (number of ’+’ spins) along the sequence. In this representation, positive (synergistic) epistasis ε > 0 corresponds to a positive curvature f″(u) > 0, while negative (antagonistic) epistasis ε < 0 corresponds to a negative curvature f″(u) < 0 [7, 11, 12]. The examples shown are a quadratic fitness landscape $f(u) = ku^2/2$ (dashed line), with positive curvature and ε > 0, and a square-root fitness landscape $f(u) = k\sqrt{u}$ (solid line), with negative curvature and ε < 0. We set k = 4.0 in both examples.
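The sign convention of Fig. 1 can be checked numerically. The following is a minimal sketch, assuming the two example landscapes with k = 4.0 and using a central finite difference for f″(u); the function names and the grid of u values are illustrative choices, not part of the original analysis.

```python
import numpy as np

def quadratic(u, k=4.0):
    """Quadratic landscape f(u) = k u^2 / 2 (positive epistasis in this convention)."""
    return 0.5 * k * u**2

def square_root(u, k=4.0):
    """Square-root landscape f(u) = k sqrt(|u|) (negative epistasis in this convention)."""
    return k * np.sqrt(np.abs(u))

def curvature(f, u, h=1e-4):
    """Central finite-difference estimate of the second derivative f''(u)."""
    return (f(u + h) - 2.0 * f(u) + f(u - h)) / h**2

# Stay away from u = 0, where sqrt(|u|) has a cusp.
u_grid = np.linspace(0.1, 0.9, 9)
print("quadratic  :", np.sign(curvature(quadratic, u_grid)))
print("square root:", np.sign(curvature(square_root, u_grid)))
```

The sign of the estimated curvature is what distinguishes the two cases; the magnitudes depend on k and u.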

To address some of these questions, we study the effect of transferring genetic information between different organisms in an infinite population. We choose the conceptual framework of “quasi-species” theory, represented by two classical models of molecular evolution: the Eigen [24–27] model and the parallel, or Crow-Kimura, model [28, 29]. These classical models include the basic processes of mutation, selection, and replication that occur in biological evolution. Our goal is to solve these two standard models of quasi-species theory, Crow-Kimura and Eigen, when horizontal gene transfer or recombination are included. Since horizontal gene transfer and recombination are essential features of evolutionary biology, our solutions bring quasi-species theory closer to modern biology. An operational definition of fitness is provided in these models by the replication rate, which is considered to be a function of the genotype. In their simplest formulation quasi-species models consider a static environment, with a deterministic mapping between individual genetic sequences and replication rate. Both the Eigen [24, 25] and the parallel, or Crow-Kimura model [28], are formulated in terms of a large system of differential equations, describing the time evolution of the relative frequencies of the different sequence types in an infinite population, a mathematical language that is common in the field of chemical kinetics [24, 25]. Sequences, representing information carrying molecules such as RNA or DNA, are assumed to be drawn from a binary alphabet (e.g. purines/pyrimidines). The most remarkable property of these classical models is that when the mutation rate is below a critical value they exhibit a phase transition in the infinite genome limit [24–27, 30–34], with the emergence of a self-organized phase: the quasi-species [24–26]. This organized phase, characterized by a collection of nearly neutral mutants rather than by a single homogeneous sequence type, is mainly a consequence of the auto-catalytic character of the evolution dynamics, which tends to enrich exponentially the proportion of fittest individuals in the population [24–27]. The quasi-species concept, with its corresponding “error threshold” transition, has been applied in the interpretation of experimental studies in RNA viruses [35–38]. In particular, the error-threshold transition has been proposed as a theoretical motivation for an antiviral strategy [39], termed “lethal mutagenesis”, which drives an infecting population of viruses towards extinction by enhancing their mutation rate [40–42]. It has been argued, however, that the mechanism for lethal mutagenesis possesses a strong ecological component [43], and that perhaps the mean population fitness is simply driven negative, and so the total number of viral particles in an infecting population decreases in time towards extinction, in contrast with error-threshold theories that describe a randomization of the composition of the quasi-species in genotype space.

The existence of the error threshold transition has attracted the attention of theoretical physicists, especially since it was proved that the quasi-species theory can be exactly mapped into a 2D Ising spin system [30, 31], with a phase transition that is first order for a sharp peak fitness, and second or higher order for smooth fitness functions. More recently, exact mappings into a quantum spin chain [44–48] or field theoretic representations [33] have been developed. Analytical and numerical studies of these systems, in the large genome limit, are possible when the fitness function is considered to be permutation invariant [32, 33, 44, 45, 49], or depending on the overlap with several peaks in sequence space [34]. The mapping of the quasi-species models into a physical system allows for the application of the powerful mathematical techniques of statistical mechanics, thus obtaining exact analytical solutions which provide significant insight over numerical studies [33, 34, 46]. Most of the existing analytical solutions correspond to the case when recombination is absent. Recombination and horizontal gene transfer have been studied by computer simulations of artificial gene networks [11] and digital organisms [8], but relatively few analytical approaches have been reported in the context of quasi-species theory [1, 49–51]. A numerical study of a mathematical model for viral super-infection termed uniform crossover, and intermediate between horizontal gene transfer and recombination, has been reported [50], with numerical solutions based on relatively short viral sequences (N = 15). More recently, the effect of incorporating horizontal gene transfer in quasi-species theory has been studied in terms of the dynamics [1], reporting numerical studies and approximate analytical expressions. Exact analytical expressions for the equilibrium properties of the population in the presence of horizontal gene transfer have been derived using the methods of quantum field theory [49].

In this article, we study the effect of introducing different schemes of genetic recombination in quasi-species theory. Extending the results in [49], we present an exact field theoretical mapping of the parallel and Eigen models. We remark that field theoretical methods provide a unique and powerful set of tools for the analytical study of dynamical systems, such as reaction-diffusion [52, 53] or birth-death processes [54]. In this paper, we employ these theoretical tools to obtain exact analytical expressions for the equilibrium mean fitness and average composition of the population, for permutation invariant but otherwise arbitrary replication rate functions.

In Section 2 we consider the parallel model. We consider horizontal gene transfer of non-overlapping blocks, as well as of blocks of random size. We also consider a recombination process producing a daughter sequence symmetrically from two parents, as might occur in viral super- or co-infection. In Section 3, we study the effect of these different genetic recombination schemes in the context of the Eigen model. In both models, recombination leads to two selected phases. Interestingly, beyond a critical recombination rate, the distribution of the population becomes independent of the recombination rate. Also interesting is that the steady-state distribution is independent of the crossover probability.

To study the effect of epistasis, whose sign is determined by the curvature of the fitness landscape (second derivative) when represented as a function of the Hamming distance with respect to the wild-type, we considered two different examples of smooth fitness functions: a quadratic function, representing positive epistasis, and a square-root function representing negative epistasis. We find that, for the quadratic fitness function, horizontal gene transfer and recombination introduce a mild load against selection. The opposite effect is observed for the square-root fitness, that is, horizontal gene transfer and recombination introduce an advantage by enhancing selection towards the fittest genotypes. These results provide support for the mutational deterministic hypothesis, which postulates that recombination should be beneficial for negative epistasis fitness functions, and deleterious for positive epistasis fitness functions. Moreover, we prove analytically in Appendix L that the mutational deterministic hypothesis applies for the parallel model in the presence of horizontal gene transfer. A similar proof is provided in Appendix M for the Eigen model. We also show analytically that the mutational deterministic hypothesis applies for the case of two-parent recombination, as presented in Appendix N for the parallel model, and in Appendix O for the Eigen model.

The effect of recombination becomes negligible for discontinuous fitness landscapes, such as a single sharp peak. For all these cases, we present exact analytical expressions that determine the phase structure of the population at steady state. Results are explicit for any microscopic fitness function: Eqs. (14), (31), and (62–63) for the parallel model and Eqs. (82), (93), and (106–107) for the Eigen model. We evaluate these expressions for three permutation invariant fitness functions: sharp peak, quadratic, and square root for the two common forms of quasi-species theory, parallel and Eigen: Eqs. (22), (23), (33), (34), (68), (71), (85–87), (96–98), (112), and (113). We also present numerical tests supporting our analytical equations.

II. THE PARALLEL MODEL

We consider a generalization [49] of the parallel, or Crow-Kimura [28], model to take into account the transfer of genetic material between pairs of individuals in an infinite population.

$$
\frac{dq_i}{dt} = r_i q_i + \sum_{k=1}^{2^N} \mu_{ik} q_k + \nu N \frac{\sum_{k,l} R_{kl}^{i}\, q_k q_l}{\sum_{k} q_k} - \nu N q_i \tag{1}
$$

Here, $q_i$ represents the (unnormalized) frequency of the sequence type $S_i = (s_1^i, s_2^i, \ldots, s_N^i)$, with $s_j^i = \pm 1$, for $1 \le i \le 2^N$ and $1 \le j \le N$. The normalized frequencies are obtained from $p_i = q_i / \sum_{j=1}^{2^N} q_j$. In Eq. (1), $r_i$ is the replication rate of sequence $S_i$, given by $r_i = N f\left(\frac{1}{N}\sum_{j=1}^{N} s_j^i\right)$. The mutation rate from sequence $S_j$ into $S_i$ is $\mu_{ij} = \mu\,\delta_{d_{ij},1} - N\mu\,\delta_{d_{ij},0}$, where $d_{ij}$ is the Hamming distance between $S_i$ and $S_j$. The Kronecker delta in this expression ensures that mutations involve a single base substitution per unit time (generation). Genetic recombination processes between pairs of sequences in the population are represented by the nonlinear term. They are considered to occur with an overall rate ν, while the coefficient $R_{kl}^{i}$ represents the probability that a pair of parental sequences $S_k$, $S_l$ produces an offspring $S_i$. Depending on the particular recombination mechanism, some of these coefficients will be identically zero. Also, these coefficients must satisfy the condition $\sum_{i=1}^{2^N} R_{kl}^{i} = 1$, $\forall\, 1 \le k, l \le 2^N$.
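To make the linear (replication and mutation) part of Eq. (1) concrete, the following minimal sketch enumerates all $2^N$ binary sequences for a small N and builds the replication rates and the mutation matrix. It assumes that $d_{ij}$ is the Hamming distance; the function name and the bit encoding are illustrative choices, and the recombination term is deliberately left out.

```python
import numpy as np

def crow_kimura_linear_part(N, f, mu):
    """Return (r, M): replication rates r_i and mutation matrix mu_ij for all 2^N sequences."""
    n_seq = 2 ** N
    idx = np.arange(n_seq)
    # Bit b of the index encodes site b: bit 0 -> spin +1, bit 1 -> spin -1.
    bits = (idx[:, None] >> np.arange(N)) & 1
    spins = 1 - 2 * bits
    # r_i = N f(u_i), with u_i the average spin of sequence i.
    r = N * f(spins.mean(axis=1))
    # Hamming distance between every pair of sequences.
    d = (bits[:, None, :] != bits[None, :, :]).sum(axis=2)
    # mu_ij = mu * delta(d_ij, 1) - N * mu * delta(d_ij, 0)
    M = mu * (d == 1) - N * mu * (d == 0)
    return r, M

r, M = crow_kimura_linear_part(N=4, f=lambda u: u**2, mu=1.0)
print(r[0], M.sum(axis=0))
```

Each column of the mutation matrix sums to zero, since a sequence has exactly N single-site neighbors: mutation redistributes the population without changing its total size.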

For this generic process, we will present the analytical solutions for the steady-state mean fitness by considering different schemes of genetic recombination.

A. Horizontal gene transfer of non-overlapping blocks

In this recombination scheme, we consider the exchange of blocks of genetic material between pairs of individuals. We consider these blocks to be non-overlapping in the parental sequences, and of a fixed size $\bar{M}$. Thus, each sequence is made of $N/\bar{M}$ blocks. The recombination coefficients in the differential Eq. (1) are given for this horizontal gene transfer process by

$$
R_{kl}^{i} = \sum_{b=0}^{N/\bar{M}-1}\; \prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)} \left(\frac{1 + s_{j_b}^{l} s_{j_b}^{i}}{2}\right) \prod_{j \notin \{j_b\}}^{N} \left(\frac{1 + s_{j}^{k} s_{j}^{i}}{2}\right). \tag{2}
$$

Here, $0 \le b \le N/\bar{M} - 1$ represents the block index, while $\bar{M}b + 1 \le j_b \le \bar{M}(b + 1)$ represents the site index within block $b$.

Generalizing the method presented in [49], we write the non-linear term as

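$$
\frac{\sum_{l} q_l R_{kl}^{i}}{\sum_{m} q_m} = \sum_{b=0}^{N/\bar{M}-1} \left\langle \prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)} \left(\frac{1 + s_{j_b}^{l} s_{j_b}^{i}}{2}\right) \right\rangle \prod_{j \notin \{j_b\}}^{N} \left(\frac{1 + s_{j}^{k} s_{j}^{i}}{2}\right). \tag{3}
$$

Here, $\langle A_l \rangle = \sum_l q_l A_l / \sum_m q_m$ is a population average. At steady state, this average is independent of the value of $b$, due to the symmetry of the fitness function.

The variance of the composition $u_l = \frac{1}{N}\sum_{j=1}^{N} s_j^l$ is given by $\frac{1}{N^2}\sum_{j,j'=1}^{N} \langle \delta s_j^l\, \delta s_{j'}^l \rangle$. In the absence of recombination or horizontal gene transfer this variance is $\mathcal{O}(N^{-1})$, which implies that correlations along the sequence are $\mathcal{O}(N^{-1})$ [33]. We expect the same scaling of the variance in the presence of recombination or horizontal gene transfer. Therefore, we introduce the factorization

$$
\left\langle \prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)} \frac{1 + s_{j_b}^{l} s_{j_b}^{i}}{2} \right\rangle \sim \prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)} \left\langle \frac{1 + s_{j_b}^{l} s_{j_b}^{i}}{2} \right\rangle + \mathcal{O}(\bar{M}/N) = \prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)} \left( \delta_{s_{j_b}^{i},+1}\, \frac{1 + u(j_b)}{2} + \delta_{s_{j_b}^{i},-1}\, \frac{1 - u(j_b)}{2} \right) \tag{4}
$$

which becomes exact in the N → ∞ limit. Here, $u(j_b) = \sum_l q_l s_{j_b}^{l} / \sum_m q_m$ is the average base composition at site $j_b$.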

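We are interested in the long-time behavior of the system, when the average base composition becomes independent of time and position, $u(j) \sim u$. Thus, in the formalism of spin Boson operators [49], $\hat{\vec{a}}(j) = (\hat{a}_1(j), \hat{a}_2(j))$, we define the recombination operator describing this recombination term by

$$
\hat{R} = \frac{1}{N} \sum_{b=0}^{N/\bar{M}-1} \left[ \prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)} \left[ \rho_{+}\, \hat{a}_1^{\dagger}(j_b) + \rho_{-}\, \hat{a}_2^{\dagger}(j_b) \right] \left[ \hat{a}_1(j_b) + \hat{a}_2(j_b) \right] - \hat{I} \right] \tag{5}
$$

Here, $\hat{I}$ is the identity operator. The coefficients $\rho_{\pm} = (1 \pm u)/2$ represent [49] the steady-state probability (per site) of having a “+1” or a “−1”. Defining the matrix

$$
D = \begin{pmatrix} \rho_{+} & \rho_{+} \\ \rho_{-} & \rho_{-} \end{pmatrix}, \tag{6}
$$

the recombination operator in Eq. (5) can be expressed as

$$
\hat{R} = \frac{1}{N} \sum_{b=0}^{N/\bar{M}-1} \left[ \prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)} \hat{\vec{a}}^{\dagger}(j_b)\, D\, \hat{\vec{a}}(j_b) - \hat{I} \right]. \tag{7}
$$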

1. The Hamiltonian

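Considering the recombination operator in Eq. (7), we formulate the Hamiltonian describing the system

$$
-\hat{H} = N f\left[ \frac{1}{N} \sum_{j=1}^{N} \hat{\vec{a}}^{\dagger}(j)\, \sigma_3\, \hat{\vec{a}}(j) \right] + \mu \sum_{j=1}^{N} \left[ \hat{\vec{a}}^{\dagger}(j)\, \sigma_1\, \hat{\vec{a}}(j) - \hat{I} \right] + \nu \sum_{b=0}^{N/\bar{M}-1} \left[ \prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)} \hat{\vec{a}}^{\dagger}(j_b)\, D\, \hat{\vec{a}}(j_b) - \hat{I} \right]. \tag{8}
$$

Here, $\sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ and $\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ are the Pauli matrices. We introduce a Trotter factorization

$$
e^{-\hat{H} t} = \lim_{M \to \infty} \int \left[ \mathcal{D}\vec{z}^{*} \mathcal{D}\vec{z} \right] |\vec{z}_M\rangle \left( \prod_{k=1}^{M} \langle \vec{z}_k | e^{-\epsilon \hat{H}} | \vec{z}_{k-1} \rangle \right) \langle \vec{z}_0 |. \tag{9}
$$

As shown in Appendix A, the partition function that gives the mean population fitness is

$$
Z = \int \left[ \mathcal{D}\bar{\xi}\, \mathcal{D}\xi\, \mathcal{D}\bar{\phi}\, \mathcal{D}\phi \right] e^{-S[\bar{\xi}, \xi, \bar{\phi}, \phi]} \sim e^{N f_m t}. \tag{10}
$$

Here, the action in the continuous time limit is

$$
S[\bar{\xi}, \xi, \bar{\phi}, \phi] = -N \int_0^t dt' \left[ -\bar{\xi}\xi - \bar{\phi}\phi - \mu - \frac{\nu}{\bar{M}} + f(\xi) + \frac{\nu}{\bar{M}} \phi^{\bar{M}} \right] - N \ln Q. \tag{11}
$$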

2. The saddle point limit

In the N → ∞ limit, the saddle point is exact and we obtain an analytical expression for the partition function Eq. (10). We look for the steady-state solution, when the fields become independent of time, ξc, ξ̄c, ϕc, ϕ̄c. The trace defined by Eq. (A10) in the long time saddle-point limit becomes

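$$
\lim_{t \to \infty} \frac{\ln Q_c}{t} = \frac{\bar{\phi}_c}{2} + \left[ \bar{\xi}_c \left( \bar{\xi}_c + u \bar{\phi}_c \right) + \left( \mu + \bar{\phi}_c/2 \right)^2 \right]^{1/2} \tag{12}
$$

Hence, the saddle-point action is

$$
\lim_{N, t \to \infty} \frac{\ln Z}{N t} = \lim_{t \to \infty} \frac{-S_c}{N t} = f_m = \max_{\xi_c, \bar{\xi}_c, \phi_c, \bar{\phi}_c} \left\{ f(\xi_c) - \bar{\xi}_c \xi_c - \bar{\phi}_c \phi_c - \mu - \frac{\nu}{\bar{M}} + \frac{\nu}{\bar{M}} \phi_c^{\bar{M}} + \frac{\bar{\phi}_c}{2} + \left[ \bar{\xi}_c \left( \bar{\xi}_c + u \bar{\phi}_c \right) + \left( \mu + \bar{\phi}_c/2 \right)^2 \right]^{1/2} \right\}. \tag{13}
$$

As shown in Appendix B, the mean fitness of the population is

$$
f_m = \max_{-1 \le \xi_c \le 1} \left\{ f(\xi_c) - \mu - \frac{\nu}{\bar{M}} + \frac{\nu}{\bar{M}} \left[ \phi_c(\xi_c) \right]^{\bar{M}} + \mu\, \frac{ \sqrt{1 - \xi_c^2}\, \sqrt{1 - u^2} \left( 1 + \frac{\nu}{2\mu} \left[ \phi_c(\xi_c) \right]^{\bar{M}-1} \right) }{ \left[ \left( 1 + \frac{\nu}{2\mu} (1 - u^2) \left[ \phi_c(\xi_c) \right]^{\bar{M}-1} \right)^2 - u^2 \right]^{1/2} } \right\}. \tag{14}
$$

Here, $\phi_c$ is given by Eq. (B7), and the surplus $u$ is obtained through the self-consistency condition $f_m = f(u)$. Equation (14) represents an exact analytical expression for the mean fitness of an infinite population experiencing horizontal gene transfer. This expression is valid for an arbitrary, permutation invariant replication rate $f(u)$.

It is worth noting that Eq. (14) is a natural generalization of the single-site horizontal gene transfer process described in [49]. Indeed, specializing Eqs. (B7) and (14) to the particular case $\bar{M} = 1$, after some algebra, we obtain

$$
f_m(\bar{M} = 1) = \max_{-1 \le \xi_c \le 1} \left\{ f(\xi_c) - \mu - \frac{\nu}{2} + \frac{\nu u}{2} \xi_c + \sqrt{1 - \xi_c^2} \left[ \left( \mu + \frac{\nu}{2} \right)^2 - \left( \frac{u \nu}{2} \right)^2 \right]^{1/2} \right\}, \tag{15}
$$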

which reproduces the analytical result in [49].
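The single-site result, Eq. (15), is explicit enough to evaluate directly. The sketch below assumes the quadratic fitness $f(u) = ku^2/2$, maximizes the right-hand side of Eq. (15) on a grid of $\xi_c$ values, and solves the self-consistency condition $f_m = f(u)$ by bisection. The grid size, the bisection strategy and the function names are illustrative choices rather than the procedure used by the authors.

```python
import numpy as np

XI = np.linspace(-1.0, 1.0, 20001)   # grid for the maximization over xi_c

def mean_fitness_single_site(u, k, mu, nu):
    """Right-hand side of Eq. (15) for f(u) = k u^2 / 2, maximized over the XI grid."""
    amp = np.sqrt((mu + 0.5 * nu) ** 2 - (0.5 * u * nu) ** 2)
    vals = (0.5 * k * XI**2 - mu - 0.5 * nu
            + 0.5 * nu * u * XI + np.sqrt(1.0 - XI**2) * amp)
    return vals.max()

def surplus(k, mu, nu, tol=1e-10):
    """Solve f_m(u) = f(u) for u in [0, 1] by bisection; returns 0 in the unselected phase."""
    def g(u):
        return mean_fitness_single_site(u, k, mu, nu) - 0.5 * k * u**2
    lo, hi = 0.0, 1.0
    if g(lo) <= 0.0:
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for nu in (0.0, 0.5, 1.0):
    print(nu, surplus(k=2.0, mu=1.0, nu=nu))
```

With ν = 0 the maximization can be done by hand and gives u = 1 − μ/k, which provides a simple check of the numerical routine.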

3. Numerical tests and examples

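For numerical calculations, it is convenient to reformulate Eq. (1) in terms of the fraction of the population at a distance $l$ from the wild type, $P_l = \sum_{j \in \mathcal{C}_l} p_j$. Here, $\mathcal{C}_l$ is the class of sequences with $l$ “−1” sites. The number of sequences within this class is $\binom{N}{l}$.

As an example, for the case $\bar{M} = 3$, the differential equation representing the time evolution of the probability distribution of classes within an infinite population of binary sequences is

$$
\begin{aligned}
\frac{dP_l}{dt} ={}& N\left[ f(2l/N - 1) - \sum_{l'=0}^{N} P_{l'}\, f(2l'/N - 1) - \mu \right] P_l + \mu N \left[ \frac{N - l + 1}{N} P_{l-1} + \frac{l + 1}{N} P_{l+1} \right] \\
&+ \frac{\nu}{3} N \Big\{ \rho_{-}^{3}\, g_3(N - l + 3)\, P_{l-3} + \left[ \rho_{-}^{3}\, h(N - l + 2) + 3 \rho_{-}^{2} \rho_{+}\, g_3(N - l + 2) \right] P_{l-2} \\
&\quad + \left[ \rho_{-}^{3}\, h(l - 1) + 3 \rho_{-} \rho_{+}^{2}\, g_3(N - l + 1) + 3 \rho_{-}^{2} \rho_{+}\, h(N - l + 1) \right] P_{l-1} \\
&\quad + \left[ \rho_{+}^{3}\, h(N - l - 1) + 3 \rho_{-}^{2} \rho_{+}\, g_3(l + 1) + 3 \rho_{-} \rho_{+}^{2}\, h(l + 1) \right] P_{l+1} \\
&\quad + \left[ \rho_{+}^{3}\, h(l + 2) + 3 \rho_{-} \rho_{+}^{2}\, g_3(l + 2) \right] P_{l+2} + \rho_{+}^{3}\, g_3(l + 3)\, P_{l+3} \Big\} \\
&- \frac{\nu}{3} N \Big\{ \left( \rho_{-}^{3} + 3 \rho_{-}^{2} \rho_{+} + 3 \rho_{-} \rho_{+}^{2} \right) g_3(N - l) + \left( \rho_{-}^{3} + 3 \rho_{-}^{2} \rho_{+} + \rho_{+}^{3} \right) h(N - l) \\
&\quad + \left( \rho_{-}^{3} + 3 \rho_{-} \rho_{+}^{2} + \rho_{+}^{3} \right) h(l) + \left( \rho_{+}^{3} + 3 \rho_{+} \rho_{-}^{2} + 3 \rho_{-} \rho_{+}^{2} \right) g_3(l) \Big\} P_l
\end{aligned}
\tag{16}
$$

In writing this equation we have used the fact that correlations between sites are only $\mathcal{O}(N^{-1})$, which holds at long times as well as at short times with suitable initial conditions. Here, we defined

$$
\rho_{\pm} = \frac{1 \pm u}{2} \tag{17}
$$

where the average composition is calculated as

$$
u = \sum_{l=0}^{N} \frac{N - 2l}{N} P_l \tag{18}
$$

and the functions

$$
g_3(l) = \frac{l(l-1)(l-2)}{N(N-1)(N-2)}, \qquad
h(l) = \frac{3\, l(l-1)(N-l)}{N(N-1)(N-2)} \tag{19}
$$

A comparison between the analytical expression Eq. (14) and the direct numerical solution of the differential Eq. (16) for N = 1002 is presented in Table I, where the quadratic fitness $f(u) = ku^2/2$ was considered. We notice that the analytical method and the numerical solution provide the same results within $\mathcal{O}(N^{-1})$, as expected from the saddle point limit.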

TABLE I.

Analytical versus numerical results for horizontal gene transfer in the parallel (Kimura) model for the quadratic fitness $f(u) = ku^2/2$, with $\bar{M} = 3$.

k    ν/μ    u_numeric    u_analytic

2.0 0.0 0.4993 0.5000
2.0 0.5 0.4830 0.4838
2.0 1.0 0.4668 0.4677
2.0 1.5 0.4510 0.4519

2.5 0.0 0.5995 0.6000
2.5 0.5 0.5915 0.5920
2.5 1.0 0.5838 0.5844
2.5 1.5 0.5766 0.5772

5.0 0.0 0.7998 0.8000
5.0 0.5 0.7988 0.7990
5.0 1.0 0.7979 0.7981
5.0 1.5 0.7970 0.7972
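Numerical values of the kind reported in Table I can be approximated by integrating the class-level dynamics directly. The sketch below is not the authors' code: it assumes the quadratic fitness, μ = 1 as the unit of rate, a wild-type initial condition and a plain explicit Euler scheme. The gain and loss terms of Eq. (16) are written for a general block size as a hypergeometric probability (how many “−1” sites the replaced block held) times a binomial probability (how many “−1” sites the incoming block carries), which for a block of three sites reduces to the functions $g_3$ and $h$ of Eq. (19). All names are illustrative.

```python
import numpy as np
from math import comb

def steady_state_u(N=120, Mbar=3, k=2.0, mu=1.0, nu=0.5, dt=1e-3, t_max=20.0):
    """Euler integration of the class dynamics; returns the long-time composition u."""
    l = np.arange(N + 1)            # l = number of "-1" sites in the class
    u_l = 1.0 - 2.0 * l / N         # composition of class l (sign is immaterial for even f)
    fit = 0.5 * k * u_l**2          # f(u) = k u^2 / 2
    # H[j, l]: probability that a random block of Mbar sites holds j "-1" sites
    H = np.array([[comb(int(x), j) * comb(N - int(x), Mbar - j) / comb(N, Mbar)
                   for x in l] for j in range(Mbar + 1)])
    P = np.zeros(N + 1)
    P[0] = 1.0                      # start from the wild type
    for _ in range(int(t_max / dt)):
        u = float(np.dot(u_l, P))
        rho_minus, rho_plus = 0.5 * (1.0 - u), 0.5 * (1.0 + u)
        # selection (with mean-fitness subtraction) and mutation loss
        dP = N * (fit - np.dot(fit, P) - mu) * P
        # mutation gain from the neighboring classes l-1 and l+1
        dP[1:] += mu * (N - l[1:] + 1) * P[:-1]
        dP[:-1] += mu * (l[:-1] + 1) * P[1:]
        # block transfer: a block with j "-1" sites is replaced by one with m "-1" sites
        gain = np.zeros(N + 1)
        for j in range(Mbar + 1):
            src = H[j] * P
            for m in range(Mbar + 1):
                w = comb(Mbar, m) * rho_minus**m * rho_plus**(Mbar - m)
                s = m - j
                if s >= 0:
                    gain[s:] += w * src[:N + 1 - s]
                else:
                    gain[:s] += w * src[-s:]
        dP += (nu / Mbar) * N * (gain - P)
        P = np.clip(P + dt * dP, 0.0, None)
        P /= P.sum()
    return float(np.dot(u_l, P))
```

For example, `steady_state_u(N=120, k=2.0, nu=0.5)` should land close to the corresponding analytic entry of Table I, up to the $\mathcal{O}(N^{-1})$ finite-size correction discussed in the text; larger N requires a proportionally smaller time step for the explicit scheme to remain stable.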

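The differential equation representing the horizontal gene transfer of blocks of size $\bar{M} = 4$ within an infinite population of binary sequences is given by

$$
\begin{aligned}
\frac{d}{dt} P_l ={}& N\left[ f(2l/N - 1) - \sum_{l'=0}^{N} P_{l'}\, f(2l'/N - 1) - \mu \right] P_l + \mu N \left[ \frac{N - l + 1}{N} P_{l-1} + \frac{l + 1}{N} P_{l+1} \right] \\
&+ \frac{\nu}{4} N \Big\{ g_4(N - l + 4)\, \rho_{-}^{4}\, P_{l-4} + \left[ \rho_{-}^{4}\, h_3(N - l + 3) + 4 \rho_{-}^{3} \rho_{+}\, g_4(N - l + 3) \right] P_{l-3} \\
&\quad + \left[ \rho_{-}^{4}\, h_2(l - 2) + 4 \rho_{-}^{3} \rho_{+}\, h_3(N - l + 2) + 6 \rho_{-}^{2} \rho_{+}^{2}\, g_4(N - l + 2) \right] P_{l-2} \\
&\quad + \left[ \rho_{-}^{4}\, h_3(l - 1) + 4 \rho_{-}^{3} \rho_{+}\, h_2(l - 1) + 6 \rho_{-}^{2} \rho_{+}^{2}\, h_3(N - l + 1) + 4 \rho_{-} \rho_{+}^{3}\, g_4(N - l + 1) \right] P_{l-1} \\
&\quad + \left[ \rho_{+}^{4}\, h_3(N - l - 1) + 4 \rho_{-} \rho_{+}^{3}\, h_2(l + 1) + 6 \rho_{-}^{2} \rho_{+}^{2}\, h_3(l + 1) + 4 \rho_{-}^{3} \rho_{+}\, g_4(l + 1) \right] P_{l+1} \\
&\quad + \left[ \rho_{+}^{4}\, h_2(l + 2) + 4 \rho_{-} \rho_{+}^{3}\, h_3(l + 2) + 6 \rho_{-}^{2} \rho_{+}^{2}\, g_4(l + 2) \right] P_{l+2} \\
&\quad + \left[ \rho_{+}^{4}\, h_3(l + 3) + 4 \rho_{-} \rho_{+}^{3}\, g_4(l + 3) \right] P_{l+3} + \rho_{+}^{4}\, g_4(l + 4)\, P_{l+4} \Big\} \\
&- \frac{\nu}{4} N \Big\{ \left[ \rho_{-}^{4} + 6 \rho_{-}^{2} \rho_{+}^{2} + 4 \rho_{-}^{3} \rho_{+} + \rho_{+}^{4} \right] h_3(N - l) + \left[ \rho_{-}^{4} + 6 \rho_{-}^{2} \rho_{+}^{2} + 4 \rho_{-} \rho_{+}^{3} + \rho_{+}^{4} \right] h_3(l) \\
&\quad + \left[ 4 \rho_{-}^{3} \rho_{+} + 6 \rho_{-}^{2} \rho_{+}^{2} + 4 \rho_{-} \rho_{+}^{3} + \rho_{-}^{4} \right] g_4(N - l) + \left[ 4 \rho_{-}^{3} \rho_{+} + 6 \rho_{-}^{2} \rho_{+}^{2} + 4 \rho_{-} \rho_{+}^{3} + \rho_{+}^{4} \right] g_4(l) \\
&\quad + \left[ \rho_{-}^{4} + 4 \rho_{-}^{3} \rho_{+} + 4 \rho_{-} \rho_{+}^{3} + \rho_{+}^{4} \right] h_2(l) \Big\} P_l
\end{aligned}
\tag{20}
$$

Here, the parameters $\rho_{\pm}$ and $u$ are defined, as before, by Eq. (17) and Eq. (18), respectively. We also define the functions

$$
g_4(l) = \frac{l(l-1)(l-2)(l-3)}{N(N-1)(N-2)(N-3)}, \qquad
h_3(l) = \frac{4\, l(l-1)(l-2)(N-l)}{N(N-1)(N-2)(N-3)}, \qquad
h_2(l) = \frac{6\, l(l-1)(N-l)(N-l-1)}{N(N-1)(N-2)(N-3)} \tag{21}
$$

A comparison between the analytical expression Eq. (14) and the direct numerical solution of the differential Eq. (20) for N = 1002 is presented in Table II, for the quadratic fitness $f(u) = ku^2/2$. As in the former case, the numerical and analytical results agree to within $\mathcal{O}(N^{-1})$, as expected.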

TABLE II.

Analytical versus numerical results for horizontal gene transfer in the parallel model for the quadratic fitness $f(u) = ku^2/2$, with $\bar{M} = 4$.

k    ν/μ    u_numeric    u_analytic

2.0 0.0 0.4993 0.5000
2.0 0.5 0.4832 0.4840
2.0 1.0 0.4672 0.4680
2.0 1.5 0.4510 0.4519

2.5 0.0 0.5995 0.6000
2.5 0.5 0.5916 0.5921
2.5 1.0 0.5839 0.5845
2.5 1.5 0.5766 0.5773

5.0 0.0 0.7998 0.8000
5.0 0.5 0.7988 0.7990
5.0 1.0 0.7979 0.7981
5.0 1.5 0.7970 0.7973

For the quadratic fitness case in the absence of recombination (ν = 0), the exact analytical result predicts the existence of a “selected” organized phase, or quasi-species, when k > μ. In this phase, the average composition is given by u = 1 − μ/k. For k < μ, a phase transition occurs and the quasi-species disappears in favor of a disordered or “unselected” phase with u = 0. In Figure 3, we display the phase structure in the presence of horizontal gene transfer. In agreement with the numerical results presented in Table I and Table II, the recombination scheme considered in this model introduces a mild mutational load. However, near the critical region k/μ ~ 1, one observes that horizontal gene transfer distorts the phase boundary which defines the error threshold, from the horizontal line k/μ = 1, to a monotonically increasing curve that saturates for large values of ν/μ. We obtain an analytical expression for the phase boundary, by expanding Eqs. (B7) and (14) near the critical region ξc ~ 0, u ~ 0. We find that the boundary is defined by

$$
k_{\mathrm{crit}} = \mu\, \frac{1 + \nu/\mu}{1 + \nu/2\mu}. \tag{22}
$$

We notice from this expression that for small ν, $k_{\mathrm{crit}} \sim \mu + \nu/2$, whereas for large ν the phase boundary becomes asymptotically independent of ν, $k_{\mathrm{crit}} \sim 2\mu$. We also notice from this formula that the phase boundary is independent of the block size $\bar{M}$.
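The two limits of the phase boundary are easy to confirm numerically. A minimal sketch of Eq. (22), with an illustrative function name and μ = 1 as the unit of rate:

```python
def k_crit(mu, nu):
    """Phase boundary of Eq. (22): k_crit = mu (1 + nu/mu) / (1 + nu/(2 mu))."""
    return mu * (1.0 + nu / mu) / (1.0 + nu / (2.0 * mu))

for ratio in (0.0, 0.01, 1.0, 10.0, 1000.0):
    # Columns: nu/mu, k_crit/mu, and the small-nu approximation 1 + nu/(2 mu)
    print(ratio, k_crit(1.0, ratio), 1.0 + 0.5 * ratio)
```

The printed values start at 1, follow 1 + ν/(2μ) for small ν/μ, and approach 2 from below as ν/μ grows.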

FIG. 3.


Phase diagram of the parallel (Kimura) model for the quadratic fitness $f(u) = ku^2/2$, with horizontal gene transfer of non-overlapping blocks of size $\bar{M}$. The phase boundary of the error threshold phase transition is given by the curve, and its shape is independent of the block size $\bar{M}$. In the absence of horizontal gene transfer, the phase transition occurs at k/μ = 1.

As a second example, we consider a square-root fitness function

$$
f(u) = k \sqrt{|u|} \tag{23}
$$

In Table III, we present a comparison of our analytical result, obtained from Eq. (14), with the direct numerical solution of the differential Eq. (16), for $\bar{M} = 3$. As in the quadratic fitness example, the analytical and numerical results agree to order $\mathcal{O}(N^{-1})$, as expected.

TABLE III.

Analytical versus numerical results for horizontal gene transfer in the parallel (Kimura) model for the square-root fitness $f(u) = k\sqrt{|u|}$, with $\bar{M} = 3$, N = 801.

k    ν/μ    u_numeric    u_analytic

2.0 0.0 0.4858 0.4855
2.0 0.5 0.4892 0.4889
2.0 1.0 0.4918 0.4915
2.0 1.5 0.4939 0.4936

2.5 0.0 0.5399 0.5396
2.5 0.5 0.5428 0.5425
2.5 1.0 0.5450 0.5448
2.5 1.5 0.5469 0.5466

4.0 0.0 0.6525 0.6523
4.0 0.5 0.6542 0.6540
4.0 1.0 0.6556 0.6554
4.0 1.5 0.6568 0.6565

From the results presented in Table III, it is remarkable that the average composition $u$, and correspondingly the mean fitness of the population $f_m = k\sqrt{|u|}$, increase when increasing the horizontal gene transfer rate ν.

The mutational deterministic hypothesis states that recombination is beneficial for negative epistasis fitness functions (see Fig. 1), f″(u) < 0, and deleterious for positive epistasis fitness functions, f″(u) > 0 [7, 9–11, 13, 14]. Our results for the quadratic and square-root fitness functions, Eqs. (14)–(22) and Tables I, II, and III, provide support for this hypothesis. In fact, we can prove that the mutational deterministic hypothesis holds for the parallel model in the presence of horizontal gene transfer, Appendix L.

Horizontal gene transfer has less of an effect for the sharp peak fitness, $f(u) = A\,\delta_{u,1}$. For general $\bar{M}$, the maximum in Eq. (14) is achieved for $\xi_c = 1$, with $\phi_c(1) = (1 + u)/2$ from Eq. (B7). Thus, one obtains

$$
f_m = A - \mu - \frac{\nu}{\bar{M}} \left[ 1 - \left( \frac{1 + u}{2} \right)^{\bar{M}} \right]. \tag{24}
$$

The error threshold is given for $u = 0$ by the condition $A > \mu + \frac{\nu}{\bar{M}}\left(1 - 2^{-\bar{M}}\right)$. However, we notice from Eq. (24) that $f_m(u = 1) = A - \mu > f_m(u = 0)$. Therefore, we have $u = 1 - \mathcal{O}(N^{-1})$ in the selected phase, with the effect of horizontal gene transfer being negligible for finite $\bar{M}$. We obtain the fraction of the population located at the peak, $P_0$, from the self-consistency condition $P_0 A = f_m$, which yields $P_0 = 1 - \mu/A$. Thus, the true error threshold is at $A_{\mathrm{crit}} = \mu$, with the condition $A > \mu + \frac{\nu}{\bar{M}}\left(1 - 2^{-\bar{M}}\right)$ defining the limit of metastability for initial conditions with $u \sim 0$. These results are similar to the ones obtained in the absence of horizontal gene transfer [33, 49, 55]. Thus, we conclude that for the sharp peak fitness, horizontal gene transfer does not spread out the population in sequence space. This result differs from the numerical studies presented in [50], where a mathematical model for ’uniform crossover’ recombination between viral strains super-infecting a population of cells was described. We remark that this model studied sequences of finite length (N = 15), where the error threshold transition is not really sharp. Our results correspond to the more realistic limit N → ∞ (typical viral genomes are $10^3$–$10^4$).
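The sharp-peak quantities above are closed-form and can be tabulated directly. The following sketch evaluates Eq. (24), the metastability condition and the peak occupation $P_0 = 1 - \mu/A$; the function names and the parameter values in the usage lines are illustrative assumptions.

```python
def sharp_peak_mean_fitness(A, mu, nu, Mbar, u):
    """Eq. (24): f_m = A - mu - (nu/Mbar) [1 - ((1 + u)/2)^Mbar]."""
    return A - mu - (nu / Mbar) * (1.0 - ((1.0 + u) / 2.0) ** Mbar)

def metastability_threshold(mu, nu, Mbar):
    """Value of A above which Eq. (24) is positive at u = 0."""
    return mu + (nu / Mbar) * (1.0 - 2.0 ** (-Mbar))

def peak_fraction(A, mu):
    """P_0 = 1 - mu/A in the selected phase (A > mu), and 0 otherwise."""
    return max(0.0, 1.0 - mu / A)

A, mu, nu, Mbar = 3.0, 1.0, 0.5, 3
print(sharp_peak_mean_fitness(A, mu, nu, Mbar, u=1.0))   # equals A - mu: no transfer penalty
print(sharp_peak_mean_fitness(A, mu, nu, Mbar, u=0.0))   # smaller, by (nu/Mbar)(1 - 2^-Mbar)
print(metastability_threshold(mu, nu, Mbar), peak_fraction(A, mu))
```

At u = 1 the transfer term cancels exactly, which is the sense in which horizontal gene transfer leaves the sharp-peak mean fitness unchanged.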

In summary, from our exact analytical formula for the mean fitness Eq. (14), which is valid for any permutation invariant replication rate, we developed the explicit solution of three different examples: a quadratic fitness, a square-root fitness and a single sharp peak. For the case of smooth fitness functions, from our exact analytical formulas for the mean fitness fm and average composition u, we conclude that in agreement with the mutational deterministic hypothesis [7, 9, 10, 13, 14], a population whose fitness represents positive epistasis (i.e. quadratic), will experience an additional load against selection due to horizontal gene transfer. On the contrary, when negative epistasis is present (e.g. square-root), horizontal gene transfer is beneficial by enhancing selection. We provided a mathematical proof for this effect, Appendix L. When the fitness is defined by a single sharp peak, the population steady-state distribution behaves more rigidly in response to horizontal gene transfer. This fundamental difference can be attributed to the structure of the quasi-species distribution, which in the smooth fitness case is a Gaussian centered at the mean fitness, while in the sharp peak it is a fast decaying exponential, sharply peaked at the master sequence [33]. While the Gaussian distribution spreads its tails over a wide region of sequence space, thus allowing for horizontal gene transfer effects to propagate over a large diversity of mutants, the sharp exponential distribution concentrates in a narrow neighborhood of the master sequence, acting as a barrier to the propagation of such effects.

B. Horizontal gene transfer for multiple-size blocks

A natural extension to the model of horizontal gene transfer involving blocks of genes of a given size is to consider a process where each site along the sequence may be transferred with probability γ, or left intact with probability 1 − γ. The operator describing this process is

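$$
\hat{R} = \frac{1}{\langle \bar{M} \rangle} \prod_{j=1}^{N} \left[ (1 - \gamma)\, \hat{I}_j + \gamma\, \hat{R}_j \right] - \frac{1}{\langle \bar{M} \rangle}\, \hat{I}. \tag{25}
$$

Here, $\hat{R}_j = \hat{\vec{a}}^{\dagger}(j)\, D\, \hat{\vec{a}}(j)$ is the single-site recombination operator defined in Eq. (5), with the matrix $D$ defined as in Eq. (6). Notice that this operator represents a binomial process, where an average number of sites $\langle \bar{M} \rangle = \gamma N$ is transferred. If we consider, as in the former finite block size case, that $N/\langle \bar{M} \rangle = \mathcal{O}(N)$, then we have $\gamma = \langle \bar{M} \rangle / N$, and for very large N Eq. (25) reduces to

$$
\hat{R} = \frac{1}{\langle \bar{M} \rangle} \prod_{j=1}^{N} \left[ (1 - \gamma)\, \hat{I}_j + \gamma\, \hat{R}_j \right] - \frac{1}{\langle \bar{M} \rangle}\, \hat{I} \sim \frac{1}{\langle \bar{M} \rangle}\, e^{-\langle \bar{M} \rangle + \frac{\langle \bar{M} \rangle}{N} \sum_{j=1}^{N} \hat{\vec{a}}^{\dagger}(j)\, D\, \hat{\vec{a}}(j)} - \frac{1}{\langle \bar{M} \rangle}\, \hat{I}. \tag{26}
$$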
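The step from Eq. (25) to Eq. (26) is the usual binomial-to-exponential limit. The sketch below illustrates only that limit: each single-site operator is replaced by the same scalar stand-in x, which is an assumption made purely for illustration, so that the product over sites becomes $(1 - \gamma + \gamma x)^N$ with $\gamma = \langle \bar{M} \rangle / N$.

```python
import math

def product_form(x, Mbar, N):
    """Scalar stand-in for the product in Eq. (25): (1 - gamma + gamma x)^N, gamma = Mbar/N."""
    gamma = Mbar / N
    return (1.0 - gamma + gamma * x) ** N

def exponential_form(x, Mbar):
    """Scalar stand-in for the exponential in Eq. (26): exp(-Mbar (1 - x))."""
    return math.exp(-Mbar * (1.0 - x))

for N in (10, 100, 10_000, 1_000_000):
    print(N, product_form(0.7, 3.0, N), exponential_form(0.7, 3.0))
```

The gap between the two columns shrinks roughly in proportion to 1/N, in line with the statement that Eq. (26) holds for very large N.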

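Considering the recombination operator defined in Eq. (26), the spin Boson Hamiltonian for the Kimura model becomes

$$
-\hat{H} = N f\left[ \frac{1}{N} \sum_{j=1}^{N} \hat{\vec{a}}^{\dagger}(j)\, \sigma_3\, \hat{\vec{a}}(j) \right] + \mu \sum_{j=1}^{N} \left[ \hat{\vec{a}}^{\dagger}(j)\, \sigma_1\, \hat{\vec{a}}(j) - \hat{I} \right] + \frac{\nu}{\langle \bar{M} \rangle} N\, e^{-\langle \bar{M} \rangle + \frac{\langle \bar{M} \rangle}{N} \sum_{j=1}^{N} \hat{\vec{a}}^{\dagger}(j)\, D\, \hat{\vec{a}}(j)} - \frac{\nu}{\langle \bar{M} \rangle} N\, \hat{I}. \tag{27}
$$

We introduce a Trotter factorization

$$
e^{-\hat{H} t} = \lim_{M \to \infty} \int \left[ \mathcal{D}\vec{z}^{*} \mathcal{D}\vec{z} \right] |\vec{z}_M\rangle \left( \prod_{k=1}^{M} \langle \vec{z}_k | e^{-\epsilon \hat{H}} | \vec{z}_{k-1} \rangle \right) \langle \vec{z}_0 |. \tag{28}
$$

As shown in Appendix C, the partition function becomes

$$
Z = \int \left[ \mathcal{D}\bar{\xi}\, \mathcal{D}\xi\, \mathcal{D}\bar{\phi}\, \mathcal{D}\phi \right] e^{-S[\bar{\xi}, \xi, \bar{\phi}, \phi]} \sim e^{N f_m t}. \tag{29}
$$

Here, the action in the continuous time limit is

$$
S[\bar{\xi}, \xi, \bar{\phi}, \phi] = -N \int_0^t dt' \left[ -\bar{\xi}\xi - \bar{\phi}\phi - \mu - \frac{\nu}{\langle \bar{M} \rangle} + f(\xi) + \frac{\nu}{\langle \bar{M} \rangle}\, e^{-\langle \bar{M} \rangle (1 - \phi)} \right] - N \ln Q \tag{30}
$$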

1. The saddle point limit

As in the previous model, the saddle point limit is exact as N → ∞ in Eq. (30).

After a similar procedure as in section II.A.2, we find the saddle-point equation for the mean fitness

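$$
f_m = \max_{-1 \le \xi_c \le 1} \left\{ f(\xi_c) - \mu - \frac{\nu}{\langle \bar{M} \rangle} + \frac{\nu}{\langle \bar{M} \rangle}\, e^{-\langle \bar{M} \rangle (1 - \phi_c(\xi_c))} + \mu\, \frac{ \sqrt{1 - \xi_c^2}\, \sqrt{1 - u^2} \left( 1 + \frac{\nu}{2\mu}\, e^{-\langle \bar{M} \rangle (1 - \phi_c(\xi_c))} \right) }{ \left[ \left( 1 + \frac{\nu}{2\mu} (1 - u^2)\, e^{-\langle \bar{M} \rangle (1 - \phi_c(\xi_c))} \right)^2 - u^2 \right]^{1/2} } \right\} \tag{31}
$$

Here, $\phi_c(\xi_c)$ is obtained from the equation

$$
\phi_c(\xi_c) = \frac{1 + u \xi_c}{2} + \frac{\sqrt{1 - \xi_c^2}}{2}\, \sqrt{1 - u^2} \left[ 1 - \left( \frac{u}{1 + \frac{\nu}{2\mu} (1 - u^2)\, e^{-\langle \bar{M} \rangle (1 - \phi_c)}} \right)^2 \right]^{-1/2} \tag{32}
$$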

Eq. (31) represents an exact analytical expression for the mean fitness fm of an infinite population experiencing horizontal gene transfer of multiple size sequences. The formula is valid for an arbitrary, permutation invariant replication rate function f(u).
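Equations (31) and (32) can be evaluated with a few lines of numerical code. The sketch below assumes the quadratic fitness $f(u) = ku^2/2$, solves Eq. (32) for $\phi_c$ on a grid of $\xi_c$ values by fixed-point iteration, maximizes Eq. (31) over that grid, and finds the surplus $u$ from $f_m = f(u)$ by bisection. The grid size, iteration count, bisection strategy and function names are illustrative choices, not the authors' procedure.

```python
import numpy as np

XI = np.linspace(-1.0, 1.0, 4001)   # grid for the maximization over xi_c

def phi_c(u, mu, nu, Mbar, n_iter=60):
    """Fixed-point solution of Eq. (32) at every grid point of XI."""
    phi = np.ones_like(XI)
    for _ in range(n_iter):
        a = nu / (2.0 * mu) * np.exp(-Mbar * (1.0 - phi))
        bracket = 1.0 - (u / (1.0 + a * (1.0 - u**2))) ** 2
        phi = (0.5 * (1.0 + u * XI)
               + 0.5 * np.sqrt(1.0 - XI**2) * np.sqrt(1.0 - u**2) / np.sqrt(bracket))
    return phi

def mean_fitness(u, k, mu, nu, Mbar):
    """Right-hand side of Eq. (31) for f(u) = k u^2 / 2, maximized over the XI grid."""
    e = np.exp(-Mbar * (1.0 - phi_c(u, mu, nu, Mbar)))
    a = nu / (2.0 * mu) * e
    last = (mu * np.sqrt(1.0 - XI**2) * np.sqrt(1.0 - u**2) * (1.0 + a)
            / np.sqrt((1.0 + a * (1.0 - u**2)) ** 2 - u**2))
    vals = 0.5 * k * XI**2 - mu - nu / Mbar + (nu / Mbar) * e + last
    return vals.max()

def surplus(k, mu, nu, Mbar, tol=1e-8):
    """Solve f_m(u) = f(u) on [0, 1) by bisection; returns 0 in the unselected phase."""
    def g(u):
        return mean_fitness(u, k, mu, nu, Mbar) - 0.5 * k * u**2
    lo, hi = 0.0, 1.0
    if g(lo) <= 0.0:
        return 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling, for instance, `surplus(k=2.0, mu=1.0, nu=0.5, Mbar=3.0)` gives the average composition for an average block size of three with μ = 1 taken as the unit of rate; setting `nu=0.0` recovers u = 1 − μ/k.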

We notice that recombination introduces an additional mutational load against selection. This load is mild at low values of the fitness constant $k$, and becomes negligibly small at larger values. Numerical evaluation of Eqs. (31) and (32) is presented in Table IV for the quadratic fitness $f(u)=ku^2/2$ and average block size $\langle\bar{M}\rangle=3$.

TABLE IV.

Analytical results for horizontal gene transfer in the parallel model for the quadratic fitness $f(u)=\frac{k}{2}u^2$, with $\langle\bar{M}\rangle=3$.

k ν u_analytic

2.0 0.0 0.50
2.0 0.5 0.4840
2.0 1.0 0.4680
2.0 1.5 0.4522

2.5 0.0 0.6000
2.5 0.5 0.5921
2.5 1.0 0.5845
2.5 1.5 0.5773

4.0 0.0 0.8000
4.0 0.5 0.7990
4.0 1.0 0.7981
4.0 1.5 0.7973
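A numerical evaluation of Eqs. (31) and (32) for the quadratic fitness can be sketched as follows. This is a minimal illustration, not the procedure used for Table IV: it assumes $\mu=1$ and $\langle\bar{M}\rangle=3$, solves Eq. (32) for $\phi_c$ by fixed-point iteration on a grid of $\xi_c$ values, takes the maximum in Eq. (31) over that grid, and then iterates the self-consistency condition $f_m=ku^2/2$. The function names, grid size, and iteration counts are choices made here for illustration.

```python
import numpy as np

def mean_fitness(u, k, nu, mu=1.0, m_avg=3.0, n_grid=4001):
    """Right-hand side of Eq. (31) for f(u) = k u^2 / 2, at a fixed trial value of u."""
    xi = np.linspace(-1.0, 1.0, n_grid)
    root = np.sqrt(1.0 - xi**2)
    phi = np.ones_like(xi)
    for _ in range(100):  # fixed-point iteration of Eq. (32), one phi per grid point
        c = 1.0 + nu / (2.0 * mu) * (1.0 - u**2) * np.exp(-m_avg * (1.0 - phi))
        phi = 0.5 * (1.0 + u * xi) + 0.5 * root * np.sqrt(1.0 - u**2) / np.sqrt(1.0 - (u / c) ** 2)
    e = np.exp(-m_avg * (1.0 - phi))
    c = 1.0 + nu / (2.0 * mu) * (1.0 - u**2) * e
    objective = (0.5 * k * xi**2 - mu - nu / m_avg + nu / m_avg * e
                 + mu * root * np.sqrt(1.0 - u**2) * (1.0 + nu / (2.0 * mu) * e)
                 / np.sqrt(c**2 - u**2))
    return objective.max()

def average_composition(k, nu, mu=1.0, m_avg=3.0):
    """Iterate the self-consistency condition f_m = k u^2 / 2 until u stops changing."""
    u = 0.5
    for _ in range(500):
        fm = mean_fitness(u, k, nu, mu, m_avg)
        u_new = float(np.sqrt(max(2.0 * fm / k, 0.0)))  # u = 0 if no selected phase
        if abs(u_new - u) < 1e-10:
            break
        u = u_new
    return u_new
```

Calling `average_composition(2.0, nu)` for the values of ν listed in Table IV gives numbers that can be compared with the tabulated column; for ν = 0 the scheme reduces to the known result $u=1-\mu/k$.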

An analytical expression for the phase boundary is obtained from Eqs. (31) and (32), near the error threshold $u\sim0$, $\xi_c\sim0$. We find

$$k_{crit}=\mu\,\frac{1+\frac{\nu}{\mu}}{1+\frac{\nu}{2\mu}} \tag{33}$$

We notice that for small $\nu$ the critical value is $k_{crit}\sim\mu+\nu/2$, whereas for large values of $\nu$ it becomes independent of recombination, $k_{crit}\sim2\mu$. This behavior is similar to the one previously observed in Fig. 3 for the case of horizontal gene transfer with blocks of fixed size. The shape of the phase boundary is independent of the block size in the horizontal gene transfer process, assuming that the size of the blocks is finite.
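The two limits quoted for Eq. (33) are easy to check numerically. The helper below is only an illustration; the name `k_crit` and the sample values are chosen here.

```python
def k_crit(mu, nu):
    """Critical fitness constant of Eq. (33) for horizontal gene transfer."""
    return mu * (1.0 + nu / mu) / (1.0 + nu / (2.0 * mu))

# Small nu: k_crit approaches mu + nu / 2.
small = k_crit(1.0, 1e-4)
# Large nu: k_crit saturates at 2 * mu.
large = k_crit(1.0, 1e6)
```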

As a second example, we consider the square root fitness $f(u)=k\sqrt{|u|}$. Analytical results for the average composition, obtained after Eq. (14), are presented in Table V for blocks of average size $\langle\bar{M}\rangle=3$. From the values displayed in Table V, we notice that horizontal gene transfer introduces a mild increase in the average composition and, correspondingly, in the mean fitness of the population $f_m=k\sqrt{|u|}$. This trend, which is opposite to the quadratic fitness case, can be attributed to the negative epistasis represented by the square root fitness, by similar arguments as in the case of fixed block size.

TABLE V.

Analytical results for horizontal gene transfer in the parallel model for the square-root fitness $f(u)=k\sqrt{|u|}$, with $\langle\bar{M}\rangle=3$.

k ν u_analytic

2.0 0.0 0.4855
2.0 0.5 0.4889
2.0 1.0 0.4915
2.0 1.5 0.4936

2.5 0.0 0.5396
2.5 0.5 0.5425
2.5 1.0 0.5448
2.5 1.5 0.5466

5.0 0.0 0.6523
5.0 0.5 0.6540
5.0 1.0 0.6554
5.0 1.5 0.6566

Horizontal gene transfer does not affect the phase boundary for the sharp peak fitness, $f(u)=A\delta_{u,1}$. In this case, Eq. (31) is maximized at $\xi_c=1$, with $\phi_c=(1+u)/2$ from Eq. (32). Thus, the mean fitness becomes

$$f_m=A-\mu-\frac{\nu}{\langle\bar{M}\rangle}\left[1-e^{-\langle\bar{M}\rangle(1-u)/2}\right] \tag{34}$$

The error threshold is given, for $u=0$ in Eq. (34), by the condition $A>\mu+\frac{\nu}{\langle\bar{M}\rangle}\left[1-e^{-\langle\bar{M}\rangle/2}\right]$. However, we notice that $f_m(u=1)=A-\mu>f_m(u=0)$. Hence, in the selected phase $u=1-\mathcal{O}(N^{-1})$, and the recombination effect becomes negligible for infinite $N$. From the self-consistency condition $f_m=P_0A$, we obtain the fraction of the population located at the peak, $P_0=1-\mu/A$. Therefore, the true error threshold is given by $A>\mu$, with $A>\mu+\frac{\nu}{\langle\bar{M}\rangle}\left[1-e^{-\langle\bar{M}\rangle/2}\right]$ the limit of metastability for initial conditions with $u\sim0$.

Therefore, we conclude that horizontal gene transfer for multiple size blocks displays a qualitatively similar behavior to the corresponding process for fixed block size. A population evolving under a smooth fitness function with positive epistasis (e.g. quadratic, see Fig. 1) experiences an additional mutational load due to horizontal gene transfer, which modifies the quasi-species structure, reducing the mean fitness, and hence shifting the error threshold. On the contrary, when epistasis is negative (e.g. square-root, see Fig. 1) a beneficial effect is induced by horizontal gene transfer, in agreement with the mutational deterministic hypothesis, as we demonstrate in Appendix L.

For a discontinuous sharp peak fitness function, horizontal gene transfer does not change the quasi-species distribution or the mean fitness, although it does introduce metastability.

C. The parallel model with two-parent recombination

Biological recombination, as occurs for example in viral super- or co-infection or in sexual reproduction, involves the crossing over of parental strands at random points along the sequence. The copying process is carried out by polymerase enzymes, which move alternately along one or the other parental strand. An approximate representation of this process is to consider that the polymerase enzyme starts, with probability 1/2, on either parental strand, and copies one base at a time. We consider the crossovers to occur because there exists a probability $p_c$ per site that the polymerase “jumps” from its current position to the other parental strand. Otherwise, the enzyme progresses along the current strand with probability $1-p_c$. A pictorial representation is shown in Fig. 4.

FIG. 4.


Pictorial representation of the two-parent genetic recombination process considered in the theory.
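The copying process described above can be sketched in a few lines. This is a minimal illustration under the stated assumptions (starting strand chosen with probability 1/2, one possible jump between each pair of consecutive sites with probability $p_c$); the function name `recombine` and the list-of-±1 representation are choices made here.

```python
import random

def recombine(parent_a, parent_b, p_c, rng=random):
    """Copy a child site by site, switching template strand with probability p_c per site."""
    parents = (parent_a, parent_b)
    current = rng.randrange(2)              # start on either parent with probability 1/2
    child = []
    for j in range(len(parent_a)):
        if j > 0 and rng.random() < p_c:    # the polymerase "jumps" to the other strand
            current = 1 - current
        child.append(parents[current][j])   # copy one base from the current template
    return child
```

With `p_c = 0` the child is a copy of one parent, and `p_c = 1/2` corresponds to uniform crossover, where each site is inherited from either parent independently.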

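For this particular process, representing the wandering path followed by the polymerase enzyme, the recombination coefficients $R^{i}_{kl}$ in Eq. (1) are given by the exact analytical expression

$$\begin{aligned}R^{i}_{kl}=\frac12\sum_{\{\alpha_j=\pm1\}}&\left(\frac{1+s_1^ks_1^i}{2}\right)^{\frac{1+\alpha_1}{2}}\left(\frac{1+s_1^ls_1^i}{2}\right)^{\frac{1-\alpha_1}{2}}\times\left[(1-p_c)^{\frac{1+\alpha_1\alpha_2}{2}}p_c^{\frac{1-\alpha_1\alpha_2}{2}}\right]\left(\frac{1+s_2^ks_2^i}{2}\right)^{\frac{1+\alpha_2}{2}}\left(\frac{1+s_2^ls_2^i}{2}\right)^{\frac{1-\alpha_2}{2}}\\&\times\left[(1-p_c)^{\frac{1+\alpha_2\alpha_3}{2}}p_c^{\frac{1-\alpha_2\alpha_3}{2}}\right]\left(\frac{1+s_3^ks_3^i}{2}\right)^{\frac{1+\alpha_3}{2}}\left(\frac{1+s_3^ls_3^i}{2}\right)^{\frac{1-\alpha_3}{2}}\times\cdots\\&\times\left[(1-p_c)^{\frac{1+\alpha_{N-1}\alpha_N}{2}}p_c^{\frac{1-\alpha_{N-1}\alpha_N}{2}}\right]\left(\frac{1+s_N^ks_N^i}{2}\right)^{\frac{1+\alpha_N}{2}}\left(\frac{1+s_N^ls_N^i}{2}\right)^{\frac{1-\alpha_N}{2}}\end{aligned} \tag{35}$$

Here, the recombining parental sequences are $S_k=(s_1^k,s_2^k,\ldots,s_N^k)$ and $S_l=(s_1^l,\ldots,s_N^l)$, and the offspring sequence is $S_i=(s_1^i,s_2^i,\ldots,s_N^i)$, with $s_j=\pm1$. Using Eq. (35), Eq. (1), which represents the time evolution of an infinite population of binary sequences experiencing replication, point mutations and two-parent recombination, becomes exactly

$$\begin{aligned}\frac{dq_i}{dt}=r_iq_i+\sum_{k=1}^{2^N}\mu_{ik}q_k+\nu N\sum_{k=1}^{2^N}\frac12\sum_{\{\alpha_j=\pm1\}}\Bigg\{&\left[\prod_{j=2}^{N}p_c^{\frac{1-\alpha_{j-1}\alpha_j}{2}}(1-p_c)^{\frac{1+\alpha_{j-1}\alpha_j}{2}}\right]\times\prod_{j=1}^{N}\left(\frac{1+s_j^ks_j^i}{2}\right)^{\frac{1+\alpha_j}{2}}\\&\times\sum_{l=1}^{2^N}p_l\prod_{j=1}^{N}\left(\frac{1+s_j^l}{2}\delta_{s_j^i,+1}+\frac{1-s_j^l}{2}\delta_{s_j^i,-1}\right)^{\frac{1-\alpha_j}{2}}\Bigg\}q_k-\nu Nq_i\end{aligned} \tag{36}$$

where, again, $p_l=q_l/\sum_{l'=1}^{2^N}q_{l'}$ is the normalized probability for sequence $1\le l\le2^N$.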

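From Eq. (36), the recombination operator corresponding to this recombination process in the spin Boson representation is

$$\begin{aligned}\hat{R}=\frac12\sum_{l=1}^{2^N}p_l\sum_{\{\alpha_i=\pm1\}}&\left[\hat{I}_1^{\frac{1+\alpha_1}{2}}\hat{R}_l(1)^{\frac{1-\alpha_1}{2}}\right]\times\left[(1-p_c)^{\frac{1+\alpha_1\alpha_2}{2}}p_c^{\frac{1-\alpha_1\alpha_2}{2}}\right]\times\left[\hat{I}_2^{\frac{1+\alpha_2}{2}}\hat{R}_l(2)^{\frac{1-\alpha_2}{2}}\right]\\&\times\left[(1-p_c)^{\frac{1+\alpha_2\alpha_3}{2}}p_c^{\frac{1-\alpha_2\alpha_3}{2}}\right]\times\left[\hat{I}_3^{\frac{1+\alpha_3}{2}}\hat{R}_l(3)^{\frac{1-\alpha_3}{2}}\right]\times\cdots\\&\times\left[(1-p_c)^{\frac{1+\alpha_{N-1}\alpha_N}{2}}p_c^{\frac{1-\alpha_{N-1}\alpha_N}{2}}\right]\times\left[\hat{I}_N^{\frac{1+\alpha_N}{2}}\hat{R}_l(N)^{\frac{1-\alpha_N}{2}}\right]-\hat{I}\equiv g(\{\hat{R}_l(j)\})-\hat{I}\end{aligned} \tag{37}$$

Here, the local recombination operator is $\hat{R}_l(j)=\hat{a}^{\dagger}(j)D_j^l\hat{a}(j)$, with

$$D_j^l=\begin{pmatrix}\frac{1+s_j^l}{2}&\frac{1+s_j^l}{2}\\[2pt]\frac{1-s_j^l}{2}&\frac{1-s_j^l}{2}\end{pmatrix}. \tag{38}$$

The $\hat{I}_j$ are the identity operators acting on site $1\le j\le N$, whereas $\hat{I}=\prod_{j=1}^{N}\hat{I}_j$ is the identity operator for the entire sequence vector.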

1. The Hamiltonian

The Hamiltonian describing the evolution of this system in the spin Boson representation is given by

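$$-\hat{H}=Nf\left[\frac{1}{N}\sum_{j=1}^{N}\hat{a}^{\dagger}(j)\sigma_3\hat{a}(j)\right]+\mu\sum_{j=1}^{N}\left[\hat{a}^{\dagger}(j)\sigma_1\hat{a}(j)-\hat{I}\right]+\nu N\left(g[\{\hat{R}_l(j)\}]-\hat{I}\right) \tag{39}$$

We introduce a Trotter factorization

$$e^{-\hat{H}t}=\lim_{M\to\infty}\int\left[\mathcal{D}z^{*}\mathcal{D}z\right]|z_M\rangle\left(\prod_{k=1}^{M}\langle z_k|e^{-\epsilon\hat{H}}|z_{k-1}\rangle\right)\langle z_0| \tag{40}$$

As shown in Appendix D, the partition function is

$$Z=\int\left[\mathcal{D}\bar{\xi}\,\mathcal{D}\xi\,\mathcal{D}\bar{\phi}\,\mathcal{D}\phi\right]e^{-S[\bar{\xi},\xi,\bar{\phi},\phi]} \tag{41}$$

Here, the action in the continuous time limit is given by

$$S[\bar{\xi},\xi,\bar{\phi},\phi]=-N\int_0^t dt'\left[-\bar{\xi}\xi-\bar{\phi}\phi-\mu-\nu+f(\xi)+\nu g(\phi)\right]-N\ln Q \tag{42}$$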

As shown in Appendix E, the recombination term can be represented, for 0 ≤ pc ≤ 1/2, by the exact finite series

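$$\begin{aligned}g(\{\psi_j^l\})=\sum_{l=1}^{2^N}p_l\Bigg\{&\prod_{j=1}^{N}\left(\frac{1+\psi_j^l}{2}\right)+\sum_{1\le i<j\le N}(1-2p_c)^{j-i}\frac{1-\psi_j^l}{2}\frac{1-\psi_i^l}{2}\prod_{k\ne i,j}^{N}\left(\frac{1+\psi_k^l}{2}\right)\\&+\sum_{1\le i<j<k<n\le N}(1-2p_c)^{j-i+n-k}\frac{1-\psi_j^l}{2}\frac{1-\psi_i^l}{2}\frac{1-\psi_n^l}{2}\frac{1-\psi_k^l}{2}\times\prod_{m\ne i,j,k,n}^{N}\left(\frac{1+\psi_m^l}{2}\right)\\&+\cdots+(1-2p_c)^{N/2}\prod_{j=1}^{N}\left(\frac{1-\psi_j^l}{2}\right)\Bigg\}\end{aligned} \tag{43}$$

where we used the notation $\psi_j^l=z_k^{*}(j)D_j^lz_{k-1}(j)$, and $D_j^l$ is defined in Eq. (38).

We consider first the case when $p_c=1/2$ in the above expression. Then, we have

$$g(\{\psi_j^l\},p_c=1/2)=\sum_{l=1}^{2^N}p_l\prod_{j=1}^{N}\left(1+\psi_j^l\right)/2 \tag{44}$$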

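We notice that the recombination term in the differential Eq. (36) satisfies $\sum_{l=1}^{2^N}p_lR^{i}_{kl}\le1$, $\forall\,k,i$, because $R^{i}_{kl}\ge0$ and $\sum_{i=1}^{2^N}R^{i}_{kl}=1$. In our field theoretic representation of the model, this condition is equivalent to $\langle g(\{\psi_j^l\})\rangle\le1$ for any physical state. We also have, for example, $\left\langle\prod_{j=1}^{N}\left(\frac{1+\psi_j^l}{2}\right)\right\rangle_z\le1$. If we consider evaluating the $g$ interaction term perturbatively, as in Appendix A, we obtain terms such as

$$\langle g\rangle=\left[(1+\langle\psi\rangle)/2\right]^{N}+\left[(1+\langle\psi\rangle)/2\right]^{N-2}\times(1/8)\sum_lp_l\sum_{i\ne j}\langle\delta\psi_i^l\delta\psi_j^l\rangle_z+\cdots \tag{45}$$

where $\langle\psi\rangle=\sum_lp_l(1/N)\sum_j\langle\psi_j^l\rangle_z$. Since both the correlations of the spins in the $D_i^l$ matrix for typical, likely $l$ and the correlations in the $z$ fields are each $\mathcal{O}(1/N)$, the interaction term $g$ in Eq. (44) contributes nothing, unless $\langle\psi\rangle=1-\mathcal{O}(1/N)$, in which case $\langle g\rangle=\mathcal{O}(N^0)$.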

For the general case of $0<p_c<1/2$, we notice that $0<1-2p_c<1$. Making the ansatz that correlations between $z$ fields and correlations between spins of typical, likely sequences $l$ each remain $\mathcal{O}(1/N)$ at different sites, terms other than the first in Eq. (43) are at least $\mathcal{O}(1/N)$ smaller when $\langle\psi\rangle=1-\mathcal{O}(1/N)$. Thus, when $\langle\psi\rangle\sim1$, the first term dominates the series and the others become arbitrarily small, recovering the same expression as for $p_c=1/2$. On the other hand, when $\langle\psi\rangle\sim-1$, the dominant terms are the last ones. However, those terms are proportional to powers of $1-2p_c$ of order $N$, whereas the number of these terms is only of polynomial order in $N$. Therefore, for very large $N$ these terms become arbitrarily small. Thus, we conclude that in the limit $N\to\infty$, regardless of the value of $p_c$, the function $g$ is represented by Eq. (44).

In the particular case of uniform crossover pc = 1/2, and when the fitness function is permutation invariant, i.e., it depends only on the average composition of the sequence through the average base composition u, it is possible to reformulate the differential equation Eq. (1) for the evolutionary dynamics of an infinite population of binary sequences in terms of the distribution of classes:

$$P_l=\sum_{j\in\mathcal{C}_l}p_j \tag{46}$$

where $\mathcal{C}_l$ represents the class of sequences with $l$ spins equal to −1. Although the sequences in a given class do not all have the same dynamics, we can nonetheless calculate the class dynamics exactly:

$$\frac{dP_l}{dt}=N\left[f(2l/N-1)-\sum_{l'=0}^{N}P_{l'}f(2l'/N-1)\right]P_l+\mu(N-l+1)P_{l-1}+\mu(l+1)P_{l+1}-N\mu P_l+\nu N\sum_{l_1,l_2}R(l|l_1,l_2)P_{l_1}P_{l_2}-N\nu P_l. \tag{47}$$

The coefficients $R(l|l_1,l_2)$ represent the probability that a pair of parental sequences in the classes $\mathcal{C}_{l_1}$, $\mathcal{C}_{l_2}$ generate, by uniform crossover recombination, a child sequence in the class $\mathcal{C}_l$. The number of sequences in these classes is $\binom{N}{l_1}$, $\binom{N}{l_2}$ and $\binom{N}{l}$, respectively. For a given pair of parental sequences, let us consider the variables $n_{++}$, $n_{+-}$, $n_{-+}$ and $n_{--}$, representing the number of pairs of (+1, +1), (+1, −1), (−1, +1) and (−1, −1) spins, respectively. These variables satisfy the equation $N=n_{++}+n_{+-}+n_{-+}+n_{--}$. We further notice that they also satisfy $n_{-+}=l_1-n_{--}$ and $n_{+-}=l_2-n_{--}$. From each pair of (+1, −1) or (−1, +1) spins in the parental sequences, the child sequence inherits a “−1” spin with probability 1/2, while from a pair of the kind (−1, −1) it inherits a “−1” spin with probability 1. We therefore have the explicit analytical expression for these coefficients

$$R(l|l_1,l_2)=\sum_{n_{--}=\max\{0,\,l_1+l_2-N\}}^{\min\{l_1+l_2-l,\,l_1,\,l_2\}}\frac{\binom{N}{n_{--},\;l_1-n_{--},\;l_2-n_{--}}}{\binom{N}{l_1}\binom{N}{l_2}}\binom{l_1+l_2-2n_{--}}{l-n_{--}}2^{-(l_1+l_2-2n_{--})} \tag{48}$$

The first factor is the probability for a configuration with $n_{--}$, given $l_1$, $l_2$ and $l$. The second factor is the number of ways of picking $l-n_{--}$ “−1” spins among $n_{+-}+n_{-+}$. The third factor is just $(1/2)^{n_{-+}}(1/2)^{n_{+-}}(1)^{n_{--}}$. These coefficients are different from zero only if

$$\max\{0,\,l_1+l_2-N\}\le l\le\min\{N,\,l_1+l_2\} \tag{49}$$

They also satisfy the following properties:

$$R(l|l_1,l_2)=R(l|l_2,l_1) \tag{50}$$
$$\sum_{l=0}^{N}R(l|l_1,l_2)=1\quad\forall\,l_1,l_2 \tag{51}$$
$$R(N|N,N)=R(0|0,0)=1 \tag{52}$$
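The coefficients of Eq. (48) and the properties in Eqs. (49) to (52) can be checked with exact rational arithmetic. The sketch below is an illustration, not the authors' code: it reads the multinomial in Eq. (48) as the number of ways of distributing the $N$ sites among the four pair types $(n_{--},\,l_1-n_{--},\,l_2-n_{--},\,N-l_1-l_2+n_{--})$, which is an assumption about the notation, and the function name is chosen here.

```python
from fractions import Fraction
from math import comb, factorial

def crossover_coefficient(l, l1, l2, n_sites):
    """R(l | l1, l2) of Eq. (48) for uniform crossover, as an exact fraction."""
    total = Fraction(0)
    for n in range(max(0, l1 + l2 - n_sites), min(l1, l2) + 1):  # n = n_{--}
        mismatched = l1 + l2 - 2 * n            # n_{+-} + n_{-+}
        if not 0 <= l - n <= mismatched:        # child cannot reach class l from this n
            continue
        ways = factorial(n_sites) // (
            factorial(n) * factorial(l1 - n) * factorial(l2 - n)
            * factorial(n_sites - l1 - l2 + n))
        prob_n = Fraction(ways, comb(n_sites, l1) * comb(n_sites, l2))
        total += prob_n * comb(mismatched, l - n) * Fraction(1, 2 ** mismatched)
    return total
```

For small $N$ one can loop over all $(l_1,l_2)$ and confirm that each row sums to one, is symmetric under $l_1\leftrightarrow l_2$, and vanishes outside the range given in Eq. (49).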

In the limit of large $N$, we find that the recombination coefficients satisfy a Gaussian distribution in the variables $u_1=1-2l_1/N$, $u_2=1-2l_2/N$, and $u=1-2l/N$ (see Appendix F):

$$R^{u}_{u_1,u_2}\sim\frac{e^{-N\left[(u_1+u_2)/2-u\right]^2/(1-u^{*2})}}{\sqrt{\pi(1-u^{*2})/N}} \tag{53}$$

where $f_m=f(u^*)$.

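This form of the recombination operator, Eq. (53), is equivalent to Eq. (44) with $s_j^l$ replaced by $u$ in the $D$ matrix. Alternatively, we notice that when the singular behavior of the function $g$ can be described as a delta function, we have

$$\begin{aligned}g&=\sum_{l=1}^{2^N}p_l\,\delta_{\frac1N\sum_{j=1}^{N}z_k^{*}(j)D_j^lz_{k-1}(j),\,1}=\sum_{l=1}^{2^N}p_l\int_0^{2\pi}\frac{d\lambda}{2\pi}e^{i\lambda\left[\frac1N\sum_{j=1}^{N}z_k^{*}(j)D_j^lz_{k-1}(j)-1\right]}\\&=\sum_{l=1}^{2^N}p_l\int_0^{2\pi}\frac{d\lambda}{2\pi}e^{-i\lambda}\left\{1+\frac{i\lambda}{N}\sum_{j=1}^{N}z_k^{*}(j)D_j^lz_{k-1}(j)+\frac{1}{2!}\left(\frac{i\lambda}{N}\right)^2\sum_{j,m=1}^{N}z_k^{*}(j)D_j^lz_{k-1}(j)\,z_k^{*}(m)D_m^lz_{k-1}(m)+\cdots\right\}\end{aligned} \tag{54}$$

By noticing that correlations between compositions at different sites along the sequence are of order $\mathcal{O}(N^{-1})$, we have that for the second order correlation

$$\langle D_j^lD_m^l\rangle-\langle D_j^l\rangle^2\sim\mathcal{O}(N^{-1}) \tag{55}$$

where $\langle D_j^l\rangle=\sum_{l=1}^{2^N}p_lD_j^l\equiv D_j$ is the population average. A similar analysis for the higher order correlations allows us to factorize order by order the terms in the series Eq. (54), to obtain

$$g\sim\delta_{\frac1N\sum_{j=1}^{N}z_k^{*}(j)D_jz_{k-1}(j),\,1}+\mathcal{O}(N^{-1}) \tag{56}$$

We are interested in the long term, steady state distribution, when the average base composition $u(j)=\langle s_j^l\rangle\sim u$ becomes independent of time. In this limit, the trace defined by Eq. (D9) becomes

$$\lim_{t\to\infty}\frac{\ln Q_c}{t}=\frac{\bar{\phi}_c}{2}+\left[\bar{\xi}_c\left(\bar{\xi}_c+u\bar{\phi}_c\right)+\left(\mu+\bar{\phi}_c/2\right)^2\right]^{1/2} \tag{57}$$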

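Hence, from Eq. (42), the saddle point action is

$$\lim_{N,t\to\infty}\frac{\ln Z}{Nt}=\lim_{t\to\infty}\frac{-S_c}{Nt}=f_m=\max_{\xi_c,\bar{\xi}_c,\phi_c,\bar{\phi}_c}\left\{-\bar{\xi}_c\xi_c-\bar{\phi}_c\phi_c-\mu-\nu+\nu g(\phi_c)+f(\xi_c)+\frac{\bar{\phi}_c}{2}+\left[\bar{\xi}_c\left(\bar{\xi}_c+u\bar{\phi}_c\right)+\left(\mu+\frac{\bar{\phi}_c}{2}\right)^2\right]^{1/2}\right\} \tag{58}$$

As shown in Appendix G, we find

$$\frac{-S_c}{Nt}=\max_{\phi_c,\xi_c}\left\{f(\xi_c)-\mu-\nu+\nu g(\phi_c)+\frac{\mu}{1-u^2}\left(2\phi_c-1-u\xi_c\right)-\frac{\mu|u|}{1-u^2}\left[\left(2\phi_c-1-u\xi_c\right)^2-(1-u^2)(1-\xi_c^2)\right]^{1/2}\right\} \tag{59}$$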

Because of the singular behavior of the function $g(\phi_c)$, to find the saddle point we need to consider three separate cases: $\phi_c<1$, $\phi_c=1$, and $\phi_c=1-\mathcal{O}(1/N)$. The existence of different expressions for the mean fitness suggests the possibility of different selected phases under certain conditions. We also notice that the saddle point analysis may not apply exactly, unless $g(\phi_c)=\delta_{\phi_c,1}$.

Case 1: $\phi_c<1$. For this case, we look for a saddle point in the field $\phi_c$ in the interior of the domain, $\phi_c<1$, where $g(\phi_c)=0$:

$$\frac{\delta}{\delta\phi_c}\left(\frac{-S_c}{Nt}\right)=\frac{2\mu}{1-u^2}-\frac{\mu|u|}{1-u^2}\frac{2\left(2\phi_c-1-u\xi_c\right)}{\left[\left(2\phi_c-1-u\xi_c\right)^2-(1-u^2)(1-\xi_c^2)\right]^{1/2}}=0 \tag{60}$$

From Eq. (60), we solve for $\phi_c$ as a function of $\xi_c$

$$\phi_c(\xi_c)=\frac{1+u\xi_c}{2}+\frac12\sqrt{1-\xi_c^2} \tag{61}$$

Substituting Eq. (61) in the saddle-point action Eq. (59), we obtain

$$f_m^{(1)}=\max_{-1\le\xi_c\le1}\left\{f(\xi_c)-\mu-\nu+\mu\sqrt{1-\xi_c^2}\right\} \tag{62}$$
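As a check on the algebra, the substitution of Eq. (61) into Eq. (59) can be written out explicitly. The sketch below uses only those two equations, with $g(\phi_c)=0$ for $\phi_c<1$, and introduces the shorthand $X$ for convenience:

```latex
\begin{aligned}
X&\equiv2\phi_c-1-u\xi_c=\sqrt{1-\xi_c^2},\\
\left[X^2-(1-u^2)(1-\xi_c^2)\right]^{1/2}&=\left[u^2(1-\xi_c^2)\right]^{1/2}=|u|\sqrt{1-\xi_c^2},\\
\frac{\mu}{1-u^2}X-\frac{\mu|u|}{1-u^2}\,|u|\sqrt{1-\xi_c^2}&=\frac{\mu(1-u^2)}{1-u^2}\sqrt{1-\xi_c^2}=\mu\sqrt{1-\xi_c^2}.
\end{aligned}
```

The dependence on $u$ cancels, which is why the case 1 mean fitness has the same form as in the absence of recombination, shifted by $-\nu$.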

Case 2: $\phi_c=1$. The mean fitness is obtained from Eq. (59) as

$$f_m^{(2)}=\max_{-1\le\xi_c\le1}\left\{f(\xi_c)-\mu+\frac{\mu}{1-u^2}\left(1-u\xi_c-|u\xi_c-u^2|\right)\right\} \tag{63}$$
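The absolute value in the case 2 expression follows from an elementary identity. As a short worked step, setting $\phi_c=1$ and $g(\phi_c)=1$ in Eq. (59) gives:

```latex
\begin{aligned}
-\nu+\nu g(1)&=0,\qquad 2\phi_c-1-u\xi_c=1-u\xi_c,\\
(1-u\xi_c)^2-(1-u^2)(1-\xi_c^2)&=\xi_c^2-2u\xi_c+u^2=(\xi_c-u)^2,\\
\frac{\mu|u|}{1-u^2}\left[(\xi_c-u)^2\right]^{1/2}&=\frac{\mu}{1-u^2}\,|u\xi_c-u^2|.
\end{aligned}
```

The recombination rate $\nu$ therefore drops out of the case 2 mean fitness entirely.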

Case 3: $\phi_c=1-\mathcal{O}(1/N)$. In this case, additional analysis is necessary to calculate the mean fitness, due to the singular behavior of the $g(\phi_c)$ function. For a smooth fitness function, we can argue that this case does not exist. We first consider the Hamiltonian (39) for the case $g=0$. The largest eigenvalue, $f_m$, is shifted by $-\nu$ relative to the $\nu=0$ case. This allows us to calculate the average composition, $u^*$, from the implicit relation $f_m(\nu)=f_m(\nu=0)-\nu=f(u^*)$. Alternatively, if we consider the differential equation for the unnormalized class probabilities, $dQ/dt=LQ$, we see that the differential operator $L$ looks like that in the absence of recombination, save for a shift of $-\nu$ in the fitness function. Thus, the variance of the population is given by [33] $\sigma_u^2/N=2\mu u^*/[Nf'(u^*)]$. Considering the $g$ function more carefully, we find $\int du_1\,du_2\,R^{u}_{u_1u_2}P(u_1)P(u_2)=\exp[-N(u-u^*)^2/(2\sigma^2)]/\sqrt{2\pi\sigma^2/N}$, with $\sigma^2=\sigma_u^2/2+(1-u^{*2})/2$. This term is exponentially negligible compared to the $-\nu P(u)$ term when $\sigma^2<\sigma_u^2$, since $P(u)=\exp[-N(u-u^*)^2/(2\sigma_u^2)]/\sqrt{2\pi\sigma_u^2/N}$. In other words, we must strictly be in case 1 when

$$1-u^{*2}<2\mu u^*/f'(u^*). \tag{64}$$

We denote by $\nu^*$ the value of $\nu$ at which

$$1-u^{*2}=2\mu u^*/f'(u^*)\quad\text{at }\nu=\nu^*. \tag{65}$$

Now, at this value $\nu^*$ we have $\int du_1\,du_2\,R^{u}_{u_1u_2}P(u_1)P(u_2)=P(u)$. Thus, the term proportional to $\nu$ in the Hamiltonian (39), or in the differential equation (47), exactly vanishes, and we have $df_m/d\nu=0$ and $dP(u)/d\nu=0$ at this value of $\nu$. There is spectral rigidity. This implies that for $\nu>\nu^*$ the distribution $P(u)$ is independent of $\nu$, and that the value of $u^*$ is constant. In other words, the value of $f_m$ in case 2 must be constant with $\nu$. Assuming that $f_m$ varies continuously with $\nu$ in case 1, and that the fitness values for case 1 and case 2 are equal at a single value of $\nu$, case 2 is simply case 1 evaluated at $\nu=\nu^*$:

$$f_m(\nu>\nu^*)=f_m(\nu=\nu^*) \tag{66}$$

Eqs. (62), (63) provide an exact analytical solution for the mean fitness of an infinite population, for a general permutation invariant replication rate represented by a continuous, smooth function f(u).

For a non-smooth fitness function, additional analysis is necessary, since f′(u*) is undefined, and P(u) may no longer be Gaussian.

2. Examples and numerical tests

We investigate the phase diagrams, as predicted from our theoretical equations Eqs. (62), (63) for three different fitness functions: A sharp peak, a quadratic fitness landscape and a square-root fitness landscape.

For the sharp peak landscape $f(u)=A\delta_{u,1}$, we notice that the maximum is achieved at $\xi_c=1$, with $u=1-\mathcal{O}(N^{-1})$. From Eqs. (63) and (62), we obtain

$$f_m^{(2)}=A-\mu>f_m^{(1)}=A-\mu-\nu \tag{67}$$

Therefore, for the sharp peak only a single selected phase is observed. In this case, the function $g(\phi_c)$ is not exactly a Kronecker delta $\delta_{\phi_c,1}$; we are in case 3, and thus we find a small correction, approximately linear in $\nu$, to the saddle-point prediction. In the selected phase, where the population is exponentially localized near $u=1$ for large $N$, Eq. (48) becomes $R(l|l_1,l_2)\sim(l_1+l_2)!\,2^{-l_1-l_2}/[l!\,(l_1+l_2-l)!]$. By analyzing the differential equation at zeroth order in $\nu$ for large $N$, we find that the class distribution is given by $P_l^{(0)}=P_0^{(0)}\left(1-P_0^{(0)}\right)^l$. Hence, we find that at first order in $\nu$, the fraction of the population $P_0$ located at the peak is given by

$$P_0=1-\frac{\mu}{A}-\frac{\nu}{A}\left[1-\frac{4\left(1-\frac{\mu}{A}\right)}{\left(2-\frac{\mu}{A}\right)^2}\right]+\mathcal{O}(\nu^2) \tag{68}$$

We note that this value of $f_m=AP_0$ interpolates between $f_m^{(1)}$ for $A/\mu=1$ and $f_m^{(2)}$ for $A/\mu=\infty$. There is no dependence on $p_c$ because the −1 spins are separated by $\mathcal{O}(N)$ sites.
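The first-order expression of Eq. (68) is simple to evaluate. The helper below is an illustrative sketch (the name `peak_fraction` is chosen here) that drops the $\mathcal{O}(\nu^2)$ terms; for $A/\mu=4$ it can be compared with the analytic column of Table VII.

```python
def peak_fraction(a, mu, nu):
    """Fraction of the population at the peak, Eq. (68), to first order in nu."""
    x = mu / a
    return 1.0 - x - (nu / a) * (1.0 - 4.0 * (1.0 - x) / (2.0 - x) ** 2)

# A / mu = 4 with nu / mu = 0, 1, 2
values = [round(peak_fraction(4.0, 1.0, nu), 4) for nu in (0.0, 1.0, 2.0)]
# values == [0.75, 0.7449, 0.7398]
```

The two limits quoted in the text also follow directly: the bracket equals 1 at $A/\mu=1$ and vanishes as $A/\mu\to\infty$.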

As a second example, we consider the quadratic fitness landscape, $f(u)=ku^2/2$. This smooth, continuous fitness function allows for the use of the exact analytical formulas Eqs. (62) and (63). By maximizing Eq. (62) with respect to $\xi_c$, when $\phi_c<1$ and hence $g(\phi_c)=0$, we find

$$f_m^{(1)}=\frac{k}{2}\left[\left(1-\frac{\mu}{k}\right)^2-\frac{2\nu}{k}\right] \tag{69}$$

This mean fitness defines a selective phase S1.

According to our previous analysis, when $\phi_c=1$ and $g(\phi_c)=1$, we maximize Eq. (63) in $\xi_c$. Here, we consider that the order parameters $\xi_c$ and $u$ have the same sign, $u\xi_c\ge0$. We then have $u\xi_c\ge u^2$ in Eq. (63) [56]. Hence, we find

$$f_m^{(2)}=\frac{k}{2}\left(1-\frac{2\mu}{k}\right) \tag{70}$$

which defines a second selective phase S2.

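By applying the self-consistency condition $f_m^{(1,2)}=ku^2/2$, we find the following phases

$$\begin{aligned}\mathrm{S1}:&\quad u=\left[\left(1-\frac{\mu}{k}\right)^2-\frac{2\nu}{k}\right]^{1/2},&\quad\frac{2\nu}{\mu}<\frac{\mu}{k}<1-\left[\frac{2\nu}{k}\right]^{1/2}\\\mathrm{S2}:&\quad u=\sqrt{1-\frac{2\mu}{k}},&\quad\frac{2\nu}{\mu}>\frac{\mu}{k},\ \frac{\mu}{k}<\frac12\\\mathrm{NS}:&\quad u=0,&\quad\text{otherwise}\end{aligned} \tag{71}$$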
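The phase structure of Eq. (71) can be written as a small classifier. This is a sketch for illustration (function name and return convention are chosen here); the boundary case $2\nu/\mu=\mu/k$ is assigned to S2, where both expressions for $u$ coincide.

```python
from math import sqrt

def quadratic_phase(k, mu, nu):
    """Return (phase label, average composition u) for f(u) = k u^2 / 2, following Eq. (71)."""
    nu_star = mu**2 / (2.0 * k)                      # 2 nu / mu = mu / k at nu = nu_star
    if nu < nu_star and mu / k < 1.0 - sqrt(2.0 * nu / k):
        return "S1", sqrt((1.0 - mu / k) ** 2 - 2.0 * nu / k)
    if nu >= nu_star and mu / k < 0.5:
        return "S2", sqrt(1.0 - 2.0 * mu / k)
    return "NS", 0.0

# k / mu = 4: S1 for nu / mu < 1/8, S2 above
table = [(nu, *quadratic_phase(4.0, 1.0, nu)) for nu in (0.0, 0.025, 0.05, 0.1, 0.5, 3.0)]
```

For $k/\mu=4$ the resulting values of $u$ can be compared with the analytic column of Table VIII.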

We note that the phase transition between case 1 and case 2 is exactly as predicted by Eq. (65). We further note that the mean fitness is independent of $\nu$ for $\nu>\nu^*=\mu^2/(2k)$, exactly as predicted by Eq. (66).
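The value of $\nu^*$ for the quadratic landscape can be recovered in a few lines from Eq. (65) and the S1 solution; this is a worked check using $f'(u^*)=ku^*$:

```latex
\begin{aligned}
1-u^{*2}&=\frac{2\mu u^*}{f'(u^*)}=\frac{2\mu}{k},\\
u^{*2}&=\left(1-\frac{\mu}{k}\right)^2-\frac{2\nu^*}{k}
\;\Longrightarrow\;
\frac{2\mu}{k}-\frac{\mu^2}{k^2}+\frac{2\nu^*}{k}=\frac{2\mu}{k}
\;\Longrightarrow\;
\nu^*=\frac{\mu^2}{2k}.
\end{aligned}
```

At this value $u^{*2}=1-2\mu/k$, so the S1 and S2 expressions for $u$ join continuously.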

The system of differential equations (47) provides an exact representation of the evolution dynamics for an infinite population, when the uniform crossover probability $p_c=1/2$ is assumed. On the other hand, our analytical equations Eq. (62) and Eq. (63) for smooth fitness, or Eq. (68) for the discontinuous sharp peak, predict that the equilibrium results should be independent of the crossover probability $p_c$. To test this theory, we performed exact stochastic simulations based on a Lebowitz/Gillespie algorithm [57, 58]. We generate a population of $M=10\,000$ sequences, initially in the wild-type. The size of the finite population represented in the simulation was chosen large enough that the results become independent of $M$. Then, the population is evolved in time by point mutation, recombination and replication, with rates proportional to $\mu$, $\nu$, and $f(u_l)$ respectively, where $u_l=\frac1N\sum_{j=1}^{N}s_j^l$ is the average composition of sequence $S_l$, $1\le l\le M$. For that purpose, a list is generated by defining $\tau_l=\mu+\nu+f(u_l)$ and $\tau=\sum_{l=1}^{M}\tau_l$. With probability $\tau_l/\tau$, a sequence $1\le l\le M$ is chosen from the population to undergo either a single point mutation with probability $\mu/\tau_l$, replication with probability $f(u_l)/\tau_l$, or recombination with another sequence with probability $\nu/\tau_l$, according to the process described in Fig. 4.

To preserve the size M of the population, when replication or recombination is performed, a sequence chosen at random from the population is substituted with the offspring. The time increment after any of these events is performed is calculated as dt = −log(w)/(Nτ), with w ∈ (0, 1] a uniformly distributed random number. The results obtained from this stochastic simulation are compared with the theoretical prediction in Table VII for the sharp peak fitness landscape and uniform crossover pc = 1/2.
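The event loop described in the last two paragraphs can be sketched as follows. This is a minimal, unoptimized illustration and not the simulation code used for the tables: it assumes a quadratic landscape, a much smaller population and sequence length than in the text so that it runs quickly, and a recombination partner drawn uniformly from the population. All function and parameter names are chosen here.

```python
import math
import random

def crossover(parent_a, parent_b, p_c, rng):
    """Two-parent recombination: switch template strand with probability p_c per site."""
    parents = (parent_a, parent_b)
    current = rng.randrange(2)
    child = []
    for j in range(len(parent_a)):
        if j > 0 and rng.random() < p_c:
            current = 1 - current
        child.append(parents[current][j])
    return child

def simulate(n_sites=20, pop_size=100, mu=1.0, nu=1.0, k=4.0,
             p_c=0.5, t_end=0.5, seed=0):
    """Evolve a finite population; returns (population, final time)."""
    rng = random.Random(seed)
    fitness = lambda u: 0.5 * k * u * u                # assumed quadratic landscape
    pop = [[1] * n_sites for _ in range(pop_size)]     # all sequences start as wild type
    comp = [1.0] * pop_size                            # u_l for each sequence
    t = 0.0
    while t < t_end:
        rates = [mu + nu + fitness(u) for u in comp]   # tau_l
        tau = sum(rates)
        l = rng.choices(range(pop_size), weights=rates)[0]  # sequence l with prob tau_l / tau
        x = rng.random() * rates[l]
        if x < mu:                                     # point mutation at a random site
            j = rng.randrange(n_sites)
            pop[l][j] = -pop[l][j]
            comp[l] = sum(pop[l]) / n_sites
        else:
            if x < mu + nu:                            # recombination with a random partner
                partner = pop[rng.randrange(pop_size)]
                child = crossover(pop[l], partner, p_c, rng)
            else:                                      # replication
                child = list(pop[l])
            target = rng.randrange(pop_size)           # offspring replaces a random sequence
            pop[target] = child
            comp[target] = sum(child) / n_sites
        w = 1.0 - rng.random()                         # uniform in (0, 1]
        t += -math.log(w) / (n_sites * tau)            # dt = -log(w) / (N tau)
    return pop, t
```

Averaging `sum(s) / len(s)` over the returned population after a long enough run gives an estimate of the average composition $u$; larger `pop_size` and `n_sites` are needed before such estimates can be compared with the infinite-population theory.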

TABLE VII.

Stochastic process versus differential equation for two-parent recombination in the parallel model for the sharp peak fitness, A/μ = 4.0, N = 400.

ν/μ u_stochastic u_diffeq P0_stochastic(p_c=0.1) P0_stochastic(p_c=0.3) P0_stochastic(p_c=0.5) P0_diffeq P0_analytic

0.0 0.998337 0.998336 0.75017 0.75017 0.75017 0.75016 0.75
1.0 0.998329 0.998326 0.7455 0.7454 0.74591 0.74544 0.7449
2.0 0.998312 0.998317 0.7415 0.7414 0.74085 0.74140 0.7398

In agreement with our theoretical prediction, as shown in Table VI from stochastic simulations in the quadratic fitness landscape, the effect of recombination is independent of the polymerase crossover probability pc. The probability distributions obtained for the systems considered in Table VI are displayed in Fig. 6. Clearly, the distributions are independent of pc, in agreement with the theory.

TABLE VI.

Stochastic process versus analytical theory for two-parent recombination in the parallel model for the quadratic fitness $f(u)=ku^2/2$, with $k/\mu=4.0$, $\nu/\mu=3.0$, and $N=100$.

p_c u_stochastic u_analytic

0.1 0.7065 0.7071
0.3 0.7052 0.7071
0.5 0.7058 0.7071

FIG. 6.

Probability distributions for two-parent recombination in the parallel model for the quadratic fitness $f(u)=ku^2/2$, with $k/\mu=4.0$ and $\nu/\mu=3.0$, obtained from stochastic simulations with $M=10\,000$ sequences of $N=100$ bases and different values of $p_c$.

We obtain a direct numerical solution of the deterministic system of differential equations Eq. (47), which provides an exact representation of the evolution dynamics for an infinite population experiencing uniform crossover recombination pc = 1/2. A comparison between these numerical solutions, and results obtained from the stochastic simulation for a system large enough to eliminate finite size effects, is displayed in Table VII for the sharp peak fitness. The theoretical prediction from the analytical formula Eq. (68) is also shown for comparison. It is evident from this table that the effect of recombination is independent of the polymerase crossover probability pc, in agreement with our theoretical predictions.

From the data presented in Table VII, we notice that the deterministic system of differential equations provides an accurate representation of the underlying stochastic dynamics for the case of uniform crossover, pc = 1/2. Thus, the results obtained from the numerical solution of the deterministic system of differential equations can be fairly compared with the analytical theory.

It is remarkable that the small, but finite, effect introduced by recombination in the structure of the quasi-species distribution for the sharp peak case is not a consequence of the Muller’s ratchet phenomenon [2] characteristic of finite populations. Indeed, the shift in the wild-type probability $P_0$ due to recombination, as predicted from our analytical equation Eq. (68), was derived from the system of differential equations Eq. (47), which describes the time evolution of an infinite population. Moreover, this closed analytical result is in excellent agreement with the numerical solution of the system of differential equations Eq. (47), as displayed in Fig. 8 and Table VIII. Good agreement between our analytical and differential equation results, which correspond to the infinite population case, and the stochastic simulation is expected when the latter is performed in a large enough population. We determined that, for the parameters we consider, $M=10\,000$ sequences provides simulation results that are independent of the population size for the sharp peak fitness function, thus allowing for a comparison with the infinite population theory expressed by the differential equations Eq. (47) and with our analytical solution Eq. (68).

FIG. 8.


Convergence of the numerical results towards the theoretical value for two-parent recombination in the parallel model for the selective phase S2 in Eq. (71). In this example, k/μ = 4.0 and ν/μ > 1/8.

TABLE VIII.

Analytical theory versus numerical solution for two-parent recombination in the parallel model for the quadratic fitness $f(u)=ku^2/2$, with $N=800$ and $k/\mu=4.0$.

ν/μ u_diffeq u_analytic

0.0 0.7499 0.7500
0.025 0.7417 0.7416
0.05 0.7329 0.7331
0.1 0.7202 0.7159
0.5 0.7091 0.7071
1.0 0.7083 0.7071
2.0 0.7075 0.7071
3.0 0.7073 0.7071

Notice that for the quadratic fitness, the analytical theory reproduces the differential equation results within $\mathcal{O}(N^{-1})$. The convergence towards the theoretical value as a function of the inverse system size $1/N$, for parameters within the S1 phase defined in Eq. (71), is displayed in Fig. 7, and for the S2 phase in Fig. 8.

FIG. 7.


Convergence of the numerical results towards the theoretical value for two-parent recombination in the parallel model for the selective phase S1 in Eq. (71). In this example, k/μ = 4.0 and ν/μ < 1/8.

As a final example, we apply our analytical solution Eq. (62) and Eq. (63) to study the square-root fitness, $f(u)=k\sqrt{|u|}$, as displayed in Table IX, where the analytical theory and the direct numerical solution of the differential equation agree to $\mathcal{O}(N^{-1})$.

TABLE IX.

Analytical theory versus numerical solution for two-parent recombination in the parallel model for the square-root fitness $f(u)=k\sqrt{|u|}$, with $N$ = 400, 800, 1000 and $k/\mu=4.0$.

ν/μ u_diffeq(N=400) u_diffeq(N=800) u_diffeq(N=1000) u_analytic

0.0 0.6527 0.6525 0.65249 0.6523
0.1 0.6650 0.6672 0.6678 0.6710
0.3 0.6686 0.6697 0.66993 0.6710
0.5 0.6696 0.6703 0.67043 0.6710
0.8 0.6703 0.6707 0.67073 0.6710
1.0 0.6705 0.6708 0.67083 0.6710

As shown in Table IX, two-parent recombination in the square-root fitness landscape enhances selection towards sequences which are on average more fit, as observed by a slight increase of the average composition u, with respect to the case when recombination is absent. This effect, which was already observed for the square-root landscape in the presence of horizontal gene transfer, can be attributed to the negative (see Fig.1) epistatic interactions introduced by the square-root fitness, in agreement with the mutational deterministic hypothesis, Appendix N.

An additional interesting effect in two-parent recombination, which was observed in the quadratic as well as in the square-root fitness landscapes, is the presence of spectral rigidity: the effect of recombination becomes independent of the recombination rate for $\nu>\nu^*$.

In summary, from our generalization of the parallel or Crow-Kimura model for an infinite population of evolving sequences Eq. (36), we conclude that two-parent recombination introduces a mild mutational load over discontinuous fitness functions, such as a single sharp peak, and thus it can shift the error-threshold transition. For smooth fitness functions, the effect of recombination depends on the sign of epistasis (see Fig. 1), in agreement with the mutational deterministic hypothesis [9, 10, 13, 14]. We show this analytically in Appendix N.

In contrast with horizontal gene transfer, recombination affects the structure of the quasi-species (and the error threshold transition) for a sharp peak fitness. We believe that this fundamental difference between horizontal gene transfer and recombination arises because the latter can generate a much larger diversity in the offspring per recombination event. Hence, the diversity barrier that, as previously discussed in section II, is imposed by the sharp exponential distribution in the sharp peak case can be tunneled through, due to the more radical mixing effects of two-parent recombination. Our analytical theory, which provides explicit expressions for the mean fitness $f_m$ and average composition $u$, is developed in the realistic regime ($N\to\infty$), considering that typical viral genomes have $N\sim10^3$ to $10^4$.

III. THE EIGEN MODEL

In this section, we present a generalization of the classical Eigen model [24-26], including the exchange of genetic material between pairs of individuals in an infinite population [49],

$$\frac{dq_i}{dt}=\sum_{j,k=1}^{2^N}\left[B_{ij}C_{jk}r_k-\delta_{ij}\delta_{ik}D_i\right]q_k \tag{72}$$

Here, recombination as well as mutation are considered to be coupled to the replication process. Recombination is represented by the coefficients Cjk, which in general will be functions of the frequencies qk, Cjk~δjk+lqlC˜kljkqk.

A. Horizontal gene transfer of non-overlapping blocks

In this recombination scheme, we consider the exchange of blocks of genetic material between pairs of individuals in the population. We consider the blocks to be non-overlapping, such that we have $N/\bar{M}$ of them. We define a block index $0\le b\le N/\bar{M}-1$, and a site index within each block, $\bar{M}b+1\le j_b\le\bar{M}(b+1)$. For this process, the nonlinear recombination term in the differential Eq. (72) is

Cjk~(1ν/M¯N/M¯)δj,k+ν/M¯N/M¯×b=0N/M¯1[jb=M¯b+1M¯(b+1)δsjbj,sjbk(δsjb,+11+u(jb)2+δsjb,11u(jb)2)]m{jb}δsmj,smk (73)

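The recombination operator representing this process, assuming the recombination rate per block to be $\nu/\bar{M}$, becomes

$$\hat{R}=\prod_{b=0}^{N/\bar{M}-1}\left[\left(1-\frac{\nu/\bar{M}}{N/\bar{M}}\right)\prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)}\hat{I}_{j_b}+\frac{\nu/\bar{M}}{N/\bar{M}}\prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)}\hat{R}_{j_b}\right] \tag{74}$$

Here, we defined the single-site recombination operator as $\hat{R}_j=\hat{a}^{\dagger}(j)D\hat{a}(j)$, with the matrix $D$ defined in Eq. (6). We consider the large $N$ limit, while keeping $N/\bar{M}\simeq\mathcal{O}(N)$. Then, the recombination operator defined in Eq. (74) becomes, to order $\mathcal{O}(N^{-1})$,

$$\hat{R}=e^{-\frac{\nu}{\bar{M}}}e^{\frac{\nu}{N}\sum_{b=0}^{N/\bar{M}-1}\prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)}\hat{a}^{\dagger}(j_b)D\hat{a}(j_b)} \tag{75}$$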

1. The Hamiltonian

The Hamiltonian operator for the Eigen model, including the horizontal gene transfer process described by the operator Eq. (74) is given by

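$$-\hat{H}=Ne^{-\mu+\frac{\mu}{N}\sum_{j=1}^{N}\hat{a}^{\dagger}(j)\sigma_1\hat{a}(j)}\,e^{-\frac{\nu}{\bar{M}}+\frac{\nu}{N}\sum_{b=0}^{N/\bar{M}-1}\prod_{j_b=\bar{M}b+1}^{\bar{M}(b+1)}\left[\hat{a}^{\dagger}(j_b)D\hat{a}(j_b)\right]}\times f\left[\frac1N\sum_{j=1}^{N}\hat{a}^{\dagger}(j)\sigma_3\hat{a}(j)\right]-Nd\left[\frac1N\sum_{j=1}^{N}\hat{a}^{\dagger}(j)\sigma_3\hat{a}(j)\right] \tag{76}$$

The microscopic fitness function is $f(u)$ and the degradation function is $d(u)$. Here, the matrix $D$ is defined as in Eq. (6). We introduce a Trotter factorization of the evolution operator, in the basis of coherent states

$$e^{-\hat{H}t}=\lim_{M\to\infty}\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^{*}\mathcal{D}z_k\right]|z_M\rangle\left(\prod_{k=1}^{M}\langle z_k|e^{-\epsilon\hat{H}}|z_{k-1}\rangle\right)\langle z_0| \tag{77}$$

As shown in Appendix H, the partition function is

$$Z=\int\left[\mathcal{D}\bar{\xi}\,\mathcal{D}\xi\,\mathcal{D}\bar{\eta}\,\mathcal{D}\eta\,\mathcal{D}\bar{\phi}\,\mathcal{D}\phi\right]e^{-S[\bar{\xi},\xi,\bar{\eta},\eta,\bar{\phi},\phi]} \tag{78}$$

Here, the action is defined by

$$S[\bar{\xi},\xi,\bar{\eta},\eta,\bar{\phi},\phi]=-N\int_0^t dt'\left[-\bar{\xi}\xi-\bar{\eta}\eta-\bar{\phi}\phi+e^{-\mu(1-\eta)-\frac{\nu}{\bar{M}}+\frac{\nu}{\bar{M}}\phi^{\bar{M}}}f(\xi)-d(\xi)\right]-N\ln Q \tag{79}$$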

2. The saddle point limit

We consider the saddle point limit of the action defined by Eq. (79). In the saddle point limit, for long times, the trace defined by Eq. (H11) becomes

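$$\lim_{t\to\infty}\frac{\ln Q_c}{t}=\frac{\bar{\phi}_c}{2}+\left[\bar{\xi}_c\left(\bar{\xi}_c+u\bar{\phi}_c\right)+\left(\bar{\eta}_c+\bar{\phi}_c/2\right)^2\right]^{1/2} \tag{80}$$

In this saddle-point limit, the action is given by

$$\lim_{N,t\to\infty}\frac{\ln Z}{Nt}=\lim_{t\to\infty}\frac{-S_c}{Nt}=\max_{\xi_c,\bar{\xi}_c,\phi_c,\bar{\phi}_c,\eta_c,\bar{\eta}_c}\left\{f(\xi_c)e^{-\mu(1-\eta_c)-\frac{\nu}{\bar{M}}+\frac{\nu}{\bar{M}}\phi_c^{\bar{M}}}-d(\xi_c)-\bar{\xi}_c\xi_c-\bar{\eta}_c\eta_c-\bar{\phi}_c\phi_c+\frac{\bar{\phi}_c}{2}+\left[\bar{\xi}_c\left(\bar{\xi}_c+u\bar{\phi}_c\right)+\left(\bar{\eta}_c+\bar{\phi}_c/2\right)^2\right]^{1/2}\right\} \tag{81}$$

As shown in Appendix I, the mean fitness, defined from the saddle point action $f_m=\lim_{N,t\to\infty}\ln Z/(Nt)=-S_c/(Nt)$, is

$$f_m=\max_{-1\le\xi_c\le1}\left\{e^{-\mu[1-\eta_c(\xi_c)]-\frac{\nu}{\bar{M}}\left\{1-[\phi_c(\xi_c)]^{\bar{M}}\right\}}f(\xi_c)-d(\xi_c)\right\} \tag{82}$$

Here, the expressions $\phi_c(\xi_c)$ and $\eta_c(\xi_c)$ are given by

$$\phi_c(\xi_c)=\frac{1+u\xi_c}{2}+\frac{\sqrt{1-\xi_c^2}}{2}\,\frac{\mu+\frac{\nu}{2}(1-u^2)\phi_c^{\bar{M}-1}}{\left[\left(\mu+\frac{\nu}{2}\phi_c^{\bar{M}-1}\right)^2-\frac{\nu^2u^2}{4}\left[\phi_c^{\bar{M}-1}\right]^2\right]^{1/2}} \tag{83}$$
$$\eta_c(\xi_c)=\sqrt{1-\xi_c^2}\,\frac{\mu+\frac{\nu}{2}\phi_c^{\bar{M}-1}}{\left[\left(\mu+\frac{\nu}{2}\phi_c^{\bar{M}-1}\right)^2-\frac{\nu^2u^2}{4}\left[\phi_c^{\bar{M}-1}\right]^2\right]^{1/2}} \tag{84}$$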

The average composition, u, is obtained from the self-consistency condition fm = f(u) − d(u).

Eq. (82) is an exact analytical expression for the equilibrium mean fitness of an infinite population of evolving sequences. This analytical expression is valid for arbitrary permutation invariant replication rate f(u) and degradation rate d(u).
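A numerical evaluation of Eqs. (82) to (84) can be sketched along the following lines. This is an illustration only: it assumes the square-root fitness $f(u)=k\sqrt{|u|}+1$, no degradation ($d=0$), $\mu=1$ and $\bar{M}=3$, solves Eq. (83) by fixed-point iteration on a grid of $\xi_c$ values, and iterates the self-consistency condition $f_m=f(u)$. Function names, grid size and iteration counts are choices made here.

```python
import numpy as np

def eigen_mean_fitness(u, k, nu, mu=1.0, m=3, n_grid=4001):
    """Right-hand side of Eq. (82) for f(u) = k sqrt(|u|) + 1 and d(u) = 0, at fixed u."""
    xi = np.linspace(-1.0, 1.0, n_grid)
    root = np.sqrt(1.0 - xi**2)
    phi = np.ones_like(xi)
    for _ in range(100):                          # fixed-point iteration of Eq. (83)
        a = 0.5 * nu * phi ** (m - 1)
        denom = np.sqrt((mu + a) ** 2 - (u * a) ** 2)
        phi = 0.5 * (1.0 + u * xi) + 0.5 * root * (mu + (1.0 - u**2) * a) / denom
    a = 0.5 * nu * phi ** (m - 1)
    denom = np.sqrt((mu + a) ** 2 - (u * a) ** 2)
    eta = root * (mu + a) / denom                 # Eq. (84)
    f = k * np.sqrt(np.abs(xi)) + 1.0
    return (np.exp(-mu * (1.0 - eta) - nu / m * (1.0 - phi**m)) * f).max()

def eigen_composition(k, nu, mu=1.0, m=3):
    """Iterate f_m = k sqrt(u) + 1 until the average composition u stops changing."""
    u = 0.3
    for _ in range(500):
        fm = eigen_mean_fitness(u, k, nu, mu, m)
        u_new = (max(fm - 1.0, 0.0) / k) ** 2
        if abs(u_new - u) < 1e-10:
            break
        u = u_new
    return u_new
```

Values returned by `eigen_composition(k, nu)` can be set against the entries of Table X for the same $k$ and $\nu$.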

3. Examples

We consider first the quadratic fitness case, $f(u)=ku^2/2+k_0$. By expanding the formulas Eqs. (82), (83) and (84) near the error threshold $\xi_c\sim0$, $u\sim0$, we obtain the phase boundary from the critical condition

$$k_{crit}=\mu k_0\,\frac{1+\nu/\mu}{1+\nu/2\mu} \tag{85}$$

We notice that the phase boundary is qualitatively similar to that of the horizontal gene transfer process analyzed in section II.A.2, Eq. (12), for the parallel model. As in that case, we notice that horizontal gene transfer introduces a mild mutational load against selection for a smooth fitness (e.g. quadratic).

As a second example, we consider the square-root fitness landscape $f(u)=k\sqrt{|u|}+1$. In Table X, we evaluate our analytical Eqs. (82)-(84) for this particular case.

TABLE X.

Analytical results for horizontal gene transfer in the Eigen model for the square-root fitness $f(u)=k\sqrt{|u|}+1$, with $\bar{M}=3$.

k ν u_analytic

3.0 0.0 0.3346
3.0 0.5 0.3398
3.0 0.8 0.3422
3.0 1.5 0.3466

5.0 0.0 0.3588
5.0 0.5 0.3642
5.0 0.8 0.3667
5.0 1.5 0.3713

8.0 0.0 0.3741
8.0 0.5 0.3796
8.0 0.8 0.3822
8.0 1.5 0.3869

From the results displayed in Table X, we notice that horizontal gene transfer increases the average composition $u$ and therefore the mean fitness of the population. This effect, which is attributed to the negative epistasis introduced by the square-root fitness (see Fig. 1), is in agreement with the previous examples studied in the case of the parallel model, and with the mutational deterministic hypothesis [7, 10-12], as we prove in Appendix M.

As a third example, we consider the sharp peak fitness $f(u) = (A - A_0)\delta_{u,1} + A_0$. In this case, the maximum in Eq. (82) corresponds to $\xi_c = 1$. From Eqs. (83) and (84), we have $\phi_c = (1+u)/2$, $\eta_c = 0$, and hence, from Eq. (82),

$$f_m = A\,e^{-\mu - \frac{\nu}{\bar M}\left[1 - \left(\frac{1+u}{2}\right)^{\bar M}\right]}$$ (86)

The error threshold is given, for u = 0 in Eq. (86), by the condition $A\,e^{-\mu - \frac{\nu}{\bar M}\left[1 - 1/2^{\bar M}\right]} > A_0$. However, we notice that $f_m(u=1) = Ae^{-\mu} > f_m(u=0)$. Hence, in the selected phase we have $u = 1 - \mathcal{O}(N^{-1})$. The fraction of the population located at the peak, $P_0$, is obtained from the self-consistency condition $f_m = AP_0 + A_0(1-P_0)$:

$$P_0 = \frac{Ae^{-\mu} - A_0}{A - A_0}$$ (87)

From Eq. (87), we find the true error threshold at $A_{\rm crit} = A_0e^{\mu}$, while the condition $A\,e^{-\mu - \frac{\nu}{\bar M}\left[1 - 2^{-\bar M}\right]} > A_0$ represents the limit of metastability for initial conditions with $u \sim 0$. We notice that this result is similar to the exact solution in the absence of horizontal gene transfer [33]. Hence, as previously discussed in section I.A. for the parallel model, we conclude that horizontal gene transfer does not affect the structure of the quasi-species for a discontinuous, single sharp peak fitness.
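The sharp peak results of Eqs. (86) and (87) are simple enough to evaluate directly. The following is a minimal sketch, not code from the original work: the function names and the parameter values are illustrative assumptions, and `Mbar` stands for the fixed block size.

```python
import math

def fm_sharp_peak(A, mu, nu, Mbar, u):
    """Eq. (86): mean fitness on the sharp peak at average composition u."""
    return A * math.exp(-mu - (nu / Mbar) * (1.0 - ((1.0 + u) / 2.0) ** Mbar))

def peak_fraction(A, A0, mu):
    """Eq. (87): fraction of the population located at the peak."""
    return (A * math.exp(-mu) - A0) / (A - A0)

def critical_A(A0, mu):
    """True error threshold, A_crit = A0 * exp(mu)."""
    return A0 * math.exp(mu)

# Illustrative (assumed) parameter values, not taken from the paper.
A, A0, mu, nu, Mbar = 5.0, 1.0, 1.0, 0.5, 3
print("fm(u=1) =", fm_sharp_peak(A, mu, nu, Mbar, 1.0))
print("fm(u=0) =", fm_sharp_peak(A, mu, nu, Mbar, 0.0))
print("P0      =", peak_fraction(A, A0, mu))
print("A_crit  =", critical_A(A0, mu))
```

At u = 1 the dependence on ν drops out of `fm_sharp_peak`, which is the numerical counterpart of the statement that horizontal gene transfer leaves the sharp peak quasi-species unchanged.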

B. Horizontal gene transfer for multiple-size blocks

In analogy with the model treated in Section II.B, we consider the natural extension to horizontal gene transfer of blocks of multiple sizes, with average $\langle\bar M\rangle$ and $\langle\bar M\rangle/N = \mathcal{O}(N^{-1})$. Following a similar analysis as in the derivation of Eq. (25), we define the recombination operator for multiple-size blocks as

$$\hat R \sim e^{-\langle\bar M\rangle + \frac{\langle\bar M\rangle}{N}\sum_{j=1}^{N}\hat a^\dagger(j)D\hat a(j)}$$ (88)

1. The Hamiltonian

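We consider horizontal gene transfer to be coupled to the replication process. Moreover, we consider that when replication occurs, a horizontal gene transfer event also occurs with probability $0 \le \nu/\langle\bar M\rangle \le 1$. The Hamiltonian operator for the Eigen model, including the horizontal gene transfer process described by the operator in Eq. (88), is given by

$$-\hat H = N e^{-\mu + \frac{\mu}{N}\sum_{j=1}^{N}\hat a^\dagger(j)\sigma_1\hat a(j)}\left(1 - \frac{\nu}{\langle\bar M\rangle} + \frac{\nu}{\langle\bar M\rangle}e^{-\langle\bar M\rangle + \frac{\langle\bar M\rangle}{N}\sum_{j=1}^{N}\hat a^\dagger(j)D\hat a(j)}\right) \times f\left[\frac1N\sum_{j=1}^{N}\hat a^\dagger(j)\sigma_3\hat a(j)\right] - N d\left[\frac1N\sum_{j=1}^{N}\hat a^\dagger(j)\sigma_3\hat a(j)\right]$$ (89)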

We introduce a Trotter factorization

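$$e^{-\hat Ht} = \lim_{M\to\infty}\int[\mathcal{D}z^*\mathcal{D}z]\,|z_M\rangle\left(\prod_{k=1}^{M}\langle z_k|e^{-\varepsilon\hat H}|z_{k-1}\rangle\right)\langle z_0|$$ (90)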

As shown in Appendix J, the partition function is

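$$Z = \int[\mathcal{D}\bar\xi\,\mathcal{D}\xi\,\mathcal{D}\bar\eta\,\mathcal{D}\eta\,\mathcal{D}\bar\phi\,\mathcal{D}\phi]\,e^{-S[\bar\xi,\xi,\bar\eta,\eta,\bar\phi,\phi]}$$ (91)

Here, the action in the continuous time limit is

$$S[\bar\xi,\xi,\bar\eta,\eta,\bar\phi,\phi] = -N\int_0^t dt'\left\{-\bar\xi\xi - \bar\eta\eta - \bar\phi\phi + e^{-\mu(1-\eta)}\left[1 - \frac{\nu}{\langle\bar M\rangle} + \frac{\nu}{\langle\bar M\rangle}e^{-\langle\bar M\rangle(1-\phi)}\right]f(\xi) - d(\xi)\right\} - N\ln Q$$ (92)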

2. The saddle point limit

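The saddle point limit is exact as N → ∞ in Eq. (92). Following a procedure similar to that of Section III.A.2, we find the saddle point equation for the mean fitness

$$f_m = \max_{-1\le\xi_c\le1}\left\{e^{-\mu(1-\eta_c)}\left[1 - \frac{\nu}{\langle\bar M\rangle} + \frac{\nu}{\langle\bar M\rangle}e^{-\langle\bar M\rangle(1-\phi_c)}\right]f(\xi_c) - d(\xi_c)\right\}$$ (93)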

Here, the fields ηc and ϕc are expressed as functions of ξc

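$$\eta_c(\xi_c) = \sqrt{1-\xi_c^2}\,\frac{\frac{\nu}{\langle\bar M\rangle} + \left[1 - \frac{\nu}{\langle\bar M\rangle}\right]e^{\langle\bar M\rangle(1-\phi_c)} + \frac{\nu}{2\mu}}{\left[\left(\frac{\nu}{\langle\bar M\rangle} + \left[1 - \frac{\nu}{\langle\bar M\rangle}\right]e^{\langle\bar M\rangle(1-\phi_c)} + \frac{\nu}{2\mu}\right)^2 - \frac{u^2\nu^2}{4\mu^2}\right]^{1/2}}$$ (94)
$$\phi_c(\xi_c) = \frac{1+u\xi_c}{2} + \frac{\sqrt{1-\xi_c^2}}{2}\,\frac{\frac{\nu}{\langle\bar M\rangle} + \left[1 - \frac{\nu}{\langle\bar M\rangle}\right]e^{\langle\bar M\rangle(1-\phi_c)} + \frac{\nu(1-u^2)}{2\mu}}{\left[\left(\frac{\nu}{\langle\bar M\rangle} + \left[1 - \frac{\nu}{\langle\bar M\rangle}\right]e^{\langle\bar M\rangle(1-\phi_c)} + \frac{\nu}{2\mu}\right)^2 - \frac{u^2\nu^2}{4\mu^2}\right]^{1/2}}$$ (95)

Equations (93)–(95) represent an exact analytical solution for the equilibrium mean fitness of an infinite population experiencing horizontal gene transfer of variable block size. This expression is valid for arbitrary, permutation invariant replication rate f(u) and degradation rate d(u).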

3. Examples

We consider first the sharp peak fitness $f(u) = (A - A_0)\delta_{u,1} + A_0$. In this case, the maximum in Eq. (93) is at $\xi_c = 1$. From Eqs. (94) and (95), we obtain $\eta_c = 0$ and $\phi_c = (1+u)/2$. Substituting these values in Eq. (93), we obtain for the mean fitness

$$f_m = e^{-\mu}\left[1 - \frac{\nu}{\langle\bar M\rangle} + \frac{\nu}{\langle\bar M\rangle}e^{-\langle\bar M\rangle(1-u)/2}\right]A$$ (96)

The error threshold for u = 0 is obtained from Eq. (96) by the condition $Ae^{-\mu}\left[1 - \frac{\nu}{\langle\bar M\rangle} + \frac{\nu}{\langle\bar M\rangle}e^{-\langle\bar M\rangle/2}\right] > A_0$. However, we notice that $f_m(u=1) = Ae^{-\mu} > f_m(u=0)$. Therefore, in the selected phase the average composition is $u = 1 - \mathcal{O}(N^{-1})$, and the effect of recombination becomes negligible for the sharp peak fitness. The fraction of the population located at the peak, $P_0$, is obtained from the self-consistency condition $f_m = AP_0 + A_0(1-P_0)$:

$$P_0 = \frac{Ae^{-\mu} - A_0}{A - A_0}$$ (97)

From this expression, we find that the true error threshold for the sharp peak fitness is $A_{\rm crit} = e^{\mu}A_0$, with the condition $Ae^{-\mu}\left[1 - \frac{\nu}{\langle\bar M\rangle} + \frac{\nu}{\langle\bar M\rangle}e^{-\langle\bar M\rangle/2}\right] > A_0$ representing the limit of metastability for initial conditions with $u \sim 0$.

As a second example, we consider the quadratic fitness $f(u) = ku^2/2 + k_0$. An analytical expression for the phase boundary is obtained from Eqs. (93), (94), and (95) near the error threshold, $\xi_c \sim 0$, $u \sim 0$. We find

$$k_{\rm crit} = \mu k_0\,\frac{1+\frac{\nu}{\mu}}{1+\frac{\nu}{2\mu}}$$ (98)

For small ν, the critical value is kcrit ~ k0(μ + ν/2).
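The small-ν limit quoted above follows from a first-order expansion of Eq. (98). A short numerical sketch (the function names and test values are assumptions made for illustration) compares the full expression with the linearized one:

```python
def k_crit(mu, nu, k0):
    """Eq. (98): critical k for the quadratic fitness f(u) = k u^2/2 + k0."""
    return mu * k0 * (1.0 + nu / mu) / (1.0 + nu / (2.0 * mu))

def k_crit_small_nu(mu, nu, k0):
    """First-order expansion in nu: k_crit ~ k0 * (mu + nu/2)."""
    return k0 * (mu + nu / 2.0)

# Illustrative (assumed) values of mu and k0.
for nu in (0.0, 1e-3, 1e-2, 1e-1):
    print(nu, k_crit(1.0, nu, 2.0), k_crit_small_nu(1.0, nu, 2.0))
```

The two expressions differ at order ν², and the critical value grows with ν, consistent with the mild load that horizontal gene transfer places on selection for this landscape.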

As a final example, we consider the square-root fitness $f(u) = k\sqrt{|u|} + 1$. Analytical results, as obtained from Eqs. (93)–(95) for this case, are presented in Table XI.

TABLE XI.

Analytical results for horizontal gene transfer in the Eigen model for the square-root fitness $f(u) = k\sqrt{|u|} + 1$, with $\langle\bar M\rangle = 3$.

k ν u_analytic

3.0 0.0 0.3346
3.0 0.5 0.3409
3.0 0.8 0.3450
3.0 1.5 0.3546

5.0 0.0 0.3588
5.0 0.5 0.3654
5.0 0.8 0.3695
5.0 1.5 0.3794

8.0 0.0 0.3741
8.0 0.5 0.3809
8.0 0.8 0.3851
8.0 1.5 0.3950

We notice that the results obtained for the horizontal gene transfer process with variable block size agree with the corresponding ones when the size of the recombination blocks is fixed. We recall that this correspondence was also observed and discussed in the previous section for the parallel model, so similar arguments apply to the Eigen model as well. An analytical proof is provided in Appendix M.

C. The Eigen model with two-parent recombination

For the Eigen model, we introduce the recombination process described in Section II.C and illustrated in Fig. 4, which considers the exchange of genetic material between pairs of sequences due to crossovers governed by the polymerase switching from one parental chromosome to the other with probability $p_c$ per site. For the Eigen model, mutation and recombination are considered to be coupled to the replication process, as stated in the generic differential equation, Eq. (72). We consider that during replication, a sequence can recombine with probability ν ≤ 1, or just replicate without recombining with probability 1 − ν. This process is represented by the coefficients in Eq. (72)

$$C_{jk} = (1-\nu)\delta_{j,k} + \frac{\nu}{2}\sum_{\{\alpha_n=\pm1\}}\left\{\left[\prod_{n=2}^{N}p_c^{\frac{1-\alpha_{n-1}\alpha_n}{2}}(1-p_c)^{\frac{1+\alpha_{n-1}\alpha_n}{2}}\right] \times \sum_{l=1}^{2^N}p_l\prod_{n=1}^{N}\left(\frac{1+s_n^ks_n^j}{2}\right)^{\frac{1+\alpha_n}{2}}\left(\frac{1+s_n^l}{2}\delta_{s_n^j,+1} + \frac{1-s_n^l}{2}\delta_{s_n^j,-1}\right)^{\frac{1-\alpha_n}{2}}\right\}$$ (99)

Here, again, $p_l = q_l/\sum_{l'=1}^{2^N}q_{l'}$ is the normalized probability for the sequence $1 \le l \le 2^N$.

In the spin Boson representation, we express the Eigen model Hamiltonian by the operator

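$$-\hat H = N e^{-\mu + \frac{\mu}{N}\sum_{j=1}^{N}\hat a^\dagger(j)\sigma_1\hat a(j)}\left[(1-\nu)\hat I + \nu\,g\left[\{\hat a^\dagger(j)D_j^l\hat a(j)\}\right]\right] \times f\left[\frac1N\sum_{j=1}^{N}\hat a^\dagger(j)\sigma_3\hat a(j)\right] - N d\left[\frac1N\sum_{j=1}^{N}\hat a^\dagger(j)\sigma_3\hat a(j)\right]$$ (100)

Here, $g[\{\hat R_j^l\}]$ was defined in Eq. (37), and the matrices $D_j^l$ were defined in Eq. (38). We introduce a Trotter factorization

$$e^{-\hat Ht} = \lim_{M\to\infty}\int[\mathcal{D}z^*\mathcal{D}z]\,|z_M\rangle\left(\prod_{k=1}^{M}\langle z_k|e^{-\varepsilon\hat H}|z_{k-1}\rangle\right)\langle z_0|$$ (101)

As shown in Appendix K, the partition function is

$$Z = \int\mathcal{D}\bar\xi\,\mathcal{D}\xi\,\mathcal{D}\bar\eta\,\mathcal{D}\eta\,\mathcal{D}\bar\phi\,\mathcal{D}\phi\;e^{-S[\bar\xi,\xi,\bar\eta,\eta,\bar\phi,\phi]}$$ (102)

Here, the action is defined by

$$S[\bar\xi,\xi,\bar\eta,\eta,\bar\phi,\phi] = -N\int_0^t dt'\left[-\bar\xi\xi - \bar\eta\eta - \bar\phi\phi + e^{-\mu(1-\eta)}\left(1-\nu+\nu g(\phi)\right)f(\xi) - d(\xi)\right] - N\ln Q$$ (103)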

1. The saddle point limit

For long times, a steady state condition is achieved. Then, the fields become time-independent, and we have

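$$\lim_{t\to\infty}\frac{\ln Q_c}{t} = \frac{\bar\phi_c}{2} + \left[\bar\xi_c(\bar\xi_c + u\bar\phi_c) + \left(\bar\eta_c + \frac{\bar\phi_c}{2}\right)^2\right]^{1/2}$$ (104)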
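Eq. (104) is the largest eigenvalue of the 2×2 matrix $\bar\xi_c\sigma_3 + \bar\eta_c\sigma_1 + \bar\phi_cD$ that appears in Q. The sketch below checks this numerically by power iteration. It assumes, for illustration only, that D takes the site-averaged form of Eq. (E3) with $s_j^l$ replaced by the average composition u; the function names and parameter values are likewise hypothetical.

```python
import math

def q_matrix(xi_bar, eta_bar, phi_bar, u):
    """xi_bar*sigma_3 + eta_bar*sigma_1 + phi_bar*D for one site.

    Assumption: D = [[(1+u)/2, (1+u)/2], [(1-u)/2, (1-u)/2]].
    """
    p, m = (1.0 + u) / 2.0, (1.0 - u) / 2.0
    return [[xi_bar + phi_bar * p, eta_bar + phi_bar * p],
            [eta_bar + phi_bar * m, -xi_bar + phi_bar * m]]

def largest_eigenvalue(a, iters=2000):
    """Power iteration on a shifted copy of the matrix.

    The shift makes all entries positive (for the positive parameters used
    here), so the dominant eigenvalue is the algebraically largest one.
    """
    shift = sum(abs(x) for row in a for x in row) + 1.0
    b = [[a[0][0] + shift, a[0][1]], [a[1][0], a[1][1] + shift]]
    v = [1.0, 1.0]
    lam = 0.0
    for _ in range(iters):
        w = [b[0][0] * v[0] + b[0][1] * v[1], b[1][0] * v[0] + b[1][1] * v[1]]
        lam = max(abs(w[0]), abs(w[1]))
        v = [w[0] / lam, w[1] / lam]
    return lam - shift

def eq_104(xi_bar, eta_bar, phi_bar, u):
    """Closed form of Eq. (104)."""
    return phi_bar / 2.0 + math.sqrt(
        xi_bar * (xi_bar + u * phi_bar) + (eta_bar + phi_bar / 2.0) ** 2)

params = (0.3, 0.7, 0.4, 0.5)  # illustrative (assumed) field values and u
print(largest_eigenvalue(q_matrix(*params)), eq_104(*params))
```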

We look for the saddle point solution from the action

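$$\lim_{N,t\to\infty}\frac{\ln Z}{Nt} = \lim_{t\to\infty}\frac{-S_c}{Nt} = \max_{\bar\xi_c,\xi_c,\bar\eta_c,\eta_c,\bar\phi_c,\phi_c}\left\{-\bar\xi_c\xi_c - \bar\eta_c\eta_c - \bar\phi_c\phi_c + e^{-\mu(1-\eta_c)}\left(1-\nu+\nu g(\phi_c)\right)f(\xi_c) - d(\xi_c) + \frac{\bar\phi_c}{2} + \left[\bar\xi_c(\bar\xi_c + u\bar\phi_c) + \left(\bar\eta_c + \frac{\bar\phi_c}{2}\right)^2\right]^{1/2}\right\}$$ (105)

Because of the singular behavior of the function $g(\phi_c)$, to find the saddle point we need to consider three separate cases: $\phi_c < 1$, $\phi_c = 1$, and $\phi_c = 1 - \mathcal{O}(1/N)$. We notice that the saddle point analysis may not apply exactly unless $g(\phi_c) = \delta_{\phi_c,1}$.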

Case 1: ϕc < 1. The mean fitness is given by

$$f_m^{(1)} = \max_{-1\le\xi_c\le1}\left\{(1-\nu)\,e^{-\mu\left[1-\sqrt{1-\xi_c^2}\right]}f(\xi_c) - d(\xi_c)\right\}$$ (106)

We note ϕc is still given by Eq. (61).

Case 2: ϕc = 1. The mean fitness is given by

$$f_m^{(2)} = \max_{-1\le\xi_c\le1}\left\{e^{-\mu\left[1 - \frac{1 - u\xi_c - |u\xi_c - u^2|}{1-u^2}\right]}f(\xi_c) - d(\xi_c)\right\}$$ (107)

Case 3: $\phi_c = 1 - \mathcal{O}(1/N)$. In this case, additional analysis is necessary to calculate the mean fitness, due to the singular behavior of the function $g(\phi_c)$. For a smooth fitness function, we can argue that this case does not exist. We first consider the Hamiltonian (100) for the case g = 0. In this case, the fitness function is simply multiplied by (1 − ν). If the degradation function is zero, the largest eigenvalue, $f_m$, is simply multiplied by (1 − ν) relative to the ν = 0 case. Without degradation, this result allows us to calculate the average composition, u*, from the implicit relation $f_m(\nu) = (1-\nu)f_m(\nu=0) = f(u^*)$. With a non-zero degradation function, the equation for $f_m(\nu)$ is somewhat more involved. Alternatively, if we consider the differential equation for the unnormalized class probabilities, dQ/dt = LQ, we see that the differential operator L looks like that in the absence of recombination, save for a factor of (1 − ν) multiplying the fitness function. Thus, the variance of the population is given by [33] $\sigma_u^2/N = 2\mu u^*(1-\nu)f(u^*)/\{N[(1-\nu)f'(u^*) - d'(u^*)]\}$. Considering the g function more carefully, we find, as before, that this term is exponentially negligible compared to the $-\nu P(u)$ term when $\sigma^2 < \sigma_u^2$. In other words, we must strictly be in case 1 when

$$1 - u^{*2} < \frac{2\mu u^*(1-\nu)f(u^*)}{(1-\nu)f'(u^*) - d'(u^*)}$$ (108)

We denote the value of ν at which

$$1 - u^{*2} = \frac{2\mu u^*(1-\nu)f(u^*)}{(1-\nu)f'(u^*) - d'(u^*)} \quad\text{at } \nu = \nu^*$$ (109)

as ν*. Now, at this value of ν* we have $\int du_1\,du_2\,R^{u}_{u_1u_2}P(u_1)P(u_2) = P(u)$. Thus, the term proportional to ν in Hamiltonian (100) exactly vanishes, and we have $df_m/d\nu = 0$ and $dP(u)/d\nu = 0$ at this value of ν. There is spectral rigidity. This result implies that for ν > ν*, the distribution P(u) is independent of ν, and that the value of u* is constant. In other words, the value of $f_m$ in case 2 must be constant with ν. Assuming $f_m$ varies continuously with ν in case 1, and that the fitness values for case 1 and case 2 are equal at a single value of ν, which mathematically may be negative, case 2 is simply case 1 with the value ν = ν*

$$f_m(\nu > \nu^*) = f_m(\nu = \nu^*)$$ (110)

Equations (106), (107) constitute an exact analytical expression for the equilibrium mean fitness of an infinite population of sequences evolving under the dynamics of the Eigen model, and experiencing two-parent recombination. These equations are exact for a smooth, permutation invariant replication rate f(u) and degradation rate d(u).

For a non-smooth fitness function, additional analysis is necessary, since f′(u*)−d′(u*) is undefined, and P(u) may no longer be Gaussian.

2. Examples

We investigate the phase diagrams, as predicted from our theoretical equations, for two different fitness functions: A sharp peak and a quadratic fitness landscape.

As an example, we consider the sharp peak fitness, $f(u) = (A - A_0)\delta_{u,1} + A_0$. The maximum is obtained at $\xi_c = 1$, $u = 1 - \mathcal{O}(N^{-1})$. From Eqs. (107) and (106) we have

$$f_m^{(2)} = Ae^{-\mu} > f_m^{(1)} = (1-\nu)Ae^{-\mu}$$ (111)

Hence, for the sharp peak fitness a single selective phase is observed. In this case, the function $g(\phi_c)$ is not exactly a Kronecker delta $\delta_{\phi_c,1}$; we are in case 3, and we therefore expect to observe a small correction, approximately linear in ν, to the prediction of the saddle point analysis. By considering the differential equations for the sharp peak case at zeroth order in ν, we find that the class distributions satisfy $e^{-\mu/2}\sum_k(r_k/N)P_k^{(0)}/2^k = f_m^{(0)}\sum_lP_l^{(0)}/2^l$, with $P_0^{(0)} = (Ae^{-\mu} - A_0)/(A - A_0)$ and $f_m^{(0)} = AP_0^{(0)} + A_0(1 - P_0^{(0)}) = Ae^{-\mu}$. Thus we find $S = \sum_lP_l^{(0)}/2^l = (A - A_0)P_0^{(0)}e^{-\mu/2}/(f_m^{(0)} - A_0e^{-\mu/2}) = (Ae^{-\mu} - A_0)e^{-\mu/2}/(Ae^{-\mu} - A_0e^{-\mu/2})$. The recombination term is then $\sum_k(r_k/N)P_k^{(0)}/2^k\sum_lP_l^{(0)}/2^l = Ae^{-\mu/2}S^2$. Hence, we find that at first order in ν, the fraction of the population located at the peak is given by

$$P_0 = \frac{Ae^{-\mu} - A_0}{A - A_0} - \nu e^{-\mu}\left[\frac{A}{A - A_0} - Ae^{-\mu/2}\,\frac{Ae^{-\mu} - A_0}{(Ae^{-\mu/2} - A_0)^2}\right] + \mathcal{O}(\nu^2)$$ (112)

We note that the resulting value of $f_m = AP_0 + A_0(1 - P_0)$ interpolates between $f_m^{(1)}$ for $Ae^{-\mu}/A_0 = 1$ and a value intermediate between $f_m^{(1)}$ and $f_m^{(2)}$ for $Ae^{-\mu}/A_0 = \infty$.
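The two limits quoted for Eq. (112) can be checked numerically. The following is a small sketch under stated assumptions: the function names are hypothetical, the parameter values are arbitrary, and only terms up to first order in ν are kept.

```python
import math

def p0_first_order(A, A0, mu, nu):
    """Eq. (112): peak fraction to first order in nu."""
    e = math.exp(-mu)
    h = math.exp(-mu / 2.0)
    zeroth = (A * e - A0) / (A - A0)
    bracket = A / (A - A0) - A * h * (A * e - A0) / (A * h - A0) ** 2
    return zeroth - nu * e * bracket

def mean_fitness(A, A0, mu, nu):
    """f_m = A*P0 + A0*(1 - P0), using the first-order P0."""
    p0 = p0_first_order(A, A0, mu, nu)
    return A * p0 + A0 * (1.0 - p0)

A0, mu, nu = 1.0, 1.0, 0.1  # illustrative (assumed) values
for A in (math.exp(mu) * A0, 10.0, 1e3, 1e9):
    ratio = mean_fitness(A, A0, mu, nu) / (A * math.exp(-mu))
    print(A, ratio)  # f_m in units of f_m^(2) = A exp(-mu)
```

In units of $Ae^{-\mu}$, the ratio starts at 1 − ν when $Ae^{-\mu} = A_0$ and, in this first-order expression, approaches $1 - \nu(1 - e^{-\mu/2})$ for large A, which lies between the case 1 and case 2 values.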

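As a second example, we consider the quadratic fitness $f(u) = ku^2/2 + k_0$. By maximizing Eq. (107) [59] and Eq. (106), we obtain two selective phases, S1 and S2, and a non-selective phase, NS, defined by the equations

$$\mathrm{S1}:\; u = \left[2(1-\nu)\,e^{-\mu\left[1-\sqrt{1-\xi_c^2}\right]}\left(\xi_c^2/2 + k_0/k\right) - 2k_0/k\right]^{1/2},\quad \nu < \min(\nu^*,\nu_c)$$
$$\mathrm{S2}:\; u = \left[\frac{1 - 2\mu k_0/k}{1+\mu}\right]^{1/2},\quad \nu_c > \nu^* < \nu$$
$$\mathrm{NS}:\; u = 0,\quad \text{otherwise}$$ (113)

where in the S1 phase

$$\xi_c^2 = 2\left[\sqrt{1 + \mu^2(1 + 2k_0/k)} - 1 - \mu^2k_0/k\right]/\mu^2$$ (114)

and we have defined

$$\nu_c = 1 - \frac{k_0}{k}\,e^{\mu\left[1-\sqrt{1-\xi_c^2}\right]}/\left(\xi_c^2/2 + k_0/k\right)$$
$$\nu^* = 1 - \frac{k + 2k_0}{2k(1+\mu)}\,e^{\mu\left[1-\sqrt{1-\xi_c^2}\right]}/\left(\xi_c^2/2 + k_0/k\right)$$ (115)

where $\xi_c^2$ is given by Eq. (114). The phase structure is defined by the following conditions: for $2\mu k_0/k \ge 1$, the system is in S1 if $\nu < \nu_c$, or in NS if $\nu \ge \nu_c$; for $2\mu k_0/k < 1$, the system is in S1 if $\nu \le \nu^*$, or in S2 if $\nu > \nu^*$. From Eq. (115), we notice that at $2\mu k_0/k = 1$, $\nu_c = \nu^*$.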

We note that the phase transition between case 1 and case 2 is exactly as predicted by Eq. (109). We further note that the mean fitness is independent of ν for ν > ν*, exactly as predicted by Eq. (110).
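The phase structure of Eqs. (113)–(115) can be turned into a small classifier. The sketch below is illustrative only: the function names are assumptions, the ratio $k_0/k$ is passed as the single parameter `c`, and $\xi_c^2$ is clamped at zero when Eq. (114) would turn negative (a guard added here, not part of the original text).

```python
import math

def xi_c_squared(mu, c):
    """Eq. (114), with c = k0/k; clamped at zero (guard added here)."""
    x2 = 2.0 * (math.sqrt(1.0 + mu**2 * (1.0 + 2.0 * c)) - 1.0 - mu**2 * c) / mu**2
    return max(0.0, x2)

def _ratio(mu, c):
    """exp(mu*[1 - sqrt(1 - xi_c^2)]) / (xi_c^2/2 + k0/k), common to Eq. (115)."""
    x2 = xi_c_squared(mu, c)
    return math.exp(mu * (1.0 - math.sqrt(1.0 - x2))) / (x2 / 2.0 + c)

def nu_c(mu, c):
    return 1.0 - c * _ratio(mu, c)

def nu_star(mu, c):
    return 1.0 - (1.0 + 2.0 * c) / (2.0 * (1.0 + mu)) * _ratio(mu, c)

def phase(mu, c, nu):
    """Phase label following the conditions stated after Eq. (115)."""
    if 2.0 * mu * c >= 1.0:
        return "S1" if nu < nu_c(mu, c) else "NS"
    return "S1" if nu <= nu_star(mu, c) else "S2"

def composition(mu, c, nu):
    """Average composition u from Eq. (113)."""
    ph = phase(mu, c, nu)
    if ph == "NS":
        return 0.0
    if ph == "S2":
        return math.sqrt((1.0 - 2.0 * mu * c) / (1.0 + mu))
    return math.sqrt(max(0.0, 2.0 * (1.0 - nu) / _ratio(mu, c) - 2.0 * c))

mu, c = 1.0, 0.25  # illustrative (assumed) values
for nu in (0.0, nu_star(mu, c), 0.2, 0.8):
    print(nu, phase(mu, c, nu), composition(mu, c, nu))
```

In the S2 branch the returned composition does not depend on ν, which is the spectral rigidity of Eq. (110) in this example.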

As a final example, we consider the square-root fitness $f(u) = k\sqrt{|u|} + 1$. By maximizing Eq. (107) [59] and Eq. (106) for the square-root fitness landscape, we obtain the results presented in Table XII. From the results displayed in Table XII, we observe a qualitative behavior similar to that of two-parent recombination in the parallel case, Table IX. For the square-root fitness, recombination has a favorable effect on selection, which can be attributed to negative epistasis (see Fig. 1) according to the mutational deterministic hypothesis [7, 10–12], as shown in Appendix O. Spectral rigidity is also observed in this case when ν > 0.

TABLE XII.

Analytical results for two-parent recombination in the Eigen model for the square-root fitness $f(u) = k\sqrt{|u|} + 1$.

k ν/μ u_analytic

4.0 0.0 0.3493
4.0 0.1 0.3892
4.0 0.2 0.3892
4.0 0.5 0.3892

3.0 0.0 0.3346
3.0 0.1 0.3892
3.0 0.2 0.3892
3.0 0.5 0.3892

IV. CONCLUSION

We have generalized two classical models of evolutionary biology, the Crow-Kimura and Eigen models. We have introduced inter-individual transfer of genetic information to these models, bringing them closer to the modern understanding of evolutionary biology. For both models, we showed how to incorporate horizontal gene transfer. We showed that these generalized models may be written in an equivalent field-theoretic formulation. This mapping allows us to apply the powerful mathematical techniques of quantum field theory to obtain exact analytical solutions. For fitness landscapes that depend only on distance from a wild-type genome and for long genome lengths, we are able to solve for the mean population fitness for arbitrary functional forms of the fitness. Horizontal gene transfer of $\bar M$ genetic units was shown to be analogous to horizontal gene transfer of one genetic unit, with a suitably scaled horizontal gene transfer rate.

We also showed how to incorporate recombination to these classical models, as might occur in viral super- or co-infection. This case seems at first glance far more non-linear, since on average half of the genetic material is taken from each parent to make the child, rather than O(1) genes as in horizontal gene transfer. Somewhat surprisingly, we were able to exactly solve the two-parent recombination case for both the Eigen and Crow-Kimura model as well. In the limit of a long genome and for fitness landscapes that depend on the distance from a wild-type genome, we find that the mean population fitness is independent of the average cross-over length in the recombination process. We also find two selected phases. The phase for large recombination rates is spectrally rigid, with the mean fitness and population distribution independent of the rate of recombination.

We proved the mutational deterministic hypothesis holds for horizontal gene transfer or recombination in both the parallel (Kimura) and Eigen models. That is, horizontal gene transfer and recombination reduce the mean fitness in the presence of positive epistasis and increase the fitness in the presence of negative epistasis (see Fig. 1 and Appendices L, M, N, and O).

For a discontinuous, sharp peak fitness landscape, we found that horizontal gene transfer does not affect the structure of the quasi-species distribution or the error threshold transition. For the sharp peak fitness function, the only appreciable effect of horizontal gene transfer is related to the potential emergence of metastability depending on the initial conditions, and we analytically determined the region of parameters space in which this situation may occur. On the other hand, even for the sharp peak fitness function, two-parent recombination induces enough mixing to enhance diversity in systems evolving under a sharp peak replication rate, thus changing the quasi-species distribution and shifting the error threshold transition. We found explicit analytical expressions for this shift.

For smooth fitness landscapes, these genetic transfers affect the steady-state population distribution and mean fitness. Recombination and horizontal gene transfer may, of course, dramatically change the dynamics of the evolution process as well. The most dramatic impact of these exchanges of genetic material is expected for fitness landscapes that have a correlated, biological structure that is conjugate to these exchanges [60]. Analytic investigation of such correlated fitness landscapes is perhaps one of the next steps in the development of modern theories of evolution.

FIG. 2. Pictorial representation of the horizontal gene transfer process considered.

FIG. 5. Convergence of the numerical results towards the theoretical value for two-parent recombination in the parallel (Kimura) model for the sharp peak fitness. In this example, A/μ = 4.0.

Acknowledgments

This work was supported by DARPA under the FunBio program and by the Korea Research Foundation.

APPENDIX A

We consider Eq. (9) for horizontal gene transfer of blocks of fixed length in the parallel model. For ε = t/M and M → ∞, we have

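$$\langle z_k|e^{-\varepsilon\hat H}|z_{k-1}\rangle \approx \langle z_k|z_{k-1}\rangle - \varepsilon\langle z_k|\hat H|z_{k-1}\rangle \approx \langle z_k|z_{k-1}\rangle\,e^{-\varepsilon\frac{\langle z_k|\hat H|z_{k-1}\rangle}{\langle z_k|z_{k-1}\rangle}}.$$ (A1)

For the Hamiltonian matrix elements in the coherent states basis, we obtain to order $\mathcal{O}(N^0)$

$$-\frac{\langle z_k|\hat H|z_{k-1}\rangle}{\langle z_k|z_{k-1}\rangle} = Nf\left[\frac1N\sum_{j=1}^{N}z_k^*(j)\sigma_3z_{k-1}(j)\right] + \mu\sum_{j=1}^{N}\left[z_k^*(j)\sigma_1z_{k-1}(j) - 1\right] + \nu\sum_{b=0}^{N/\bar M-1}\left[\prod_{j_b=\bar Mb+1}^{\bar M(b+1)}z_k^*(j_b)Dz_{k-1}(j_b) - 1\right]$$ (A2)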

We introduce the auxiliary field

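$$\xi_k = \frac1N\sum_{j=1}^{N}z_k^*(j)\sigma_3z_{k-1}(j)$$ (A3)

and the conjugate field $\bar\xi_k$ to enforce the constraint via a Laplace representation of the delta function. Substituting Eq. (A2) into Eq. (9), we obtain

$$e^{-\hat Ht} = \lim_{M\to\infty}\int[\mathcal{D}z^*\mathcal{D}z]\int\left[\prod_{k=0}^{M}\frac{i\varepsilon N\,d\bar\xi_k\,d\xi_k}{2\pi}\right]|z_M\rangle\langle z_0| \times e^{\sum_{k=1}^{M}\sum_{j=1}^{N}\left\{-\frac12\left[z_k^*(j)\cdot z_k(j) + z_{k-1}^*(j)\cdot z_{k-1}(j) - 2z_k^*(j)\cdot z_{k-1}(j)\right] + \varepsilon\left[z_k^*(j)(\bar\xi_k\sigma_3 + \mu\sigma_1)z_{k-1}(j)\right]\right\}} \times e^{-\varepsilon N\sum_{k=1}^{M}\left[\bar\xi_k\xi_k + \mu + \frac{\nu}{\bar M} - f(\xi_k) - \frac{\nu}{N}\sum_{b=0}^{N/\bar M-1}\prod_{j_b=\bar Mb+1}^{\bar M(b+1)}z_k^*(j_b)Dz_{k-1}(j_b)\right]}.$$ (A4)

The contribution of the interaction term $\frac{\nu}{N}\sum_{b=0}^{N/\bar M-1}\prod_{j_b=\bar Mb+1}^{\bar M(b+1)}z_k^*(j_b)Dz_{k-1}(j_b)$ to the partition function can be treated to arbitrary order in perturbation theory using the formula $Z = Z_0\langle e^{-\delta S}\rangle_0$, and its contribution can be shown to be site-independent. Moreover, this reference perturbation theory has $\mathcal{O}(N^{-1})$ fluctuations. Thus, it can be shown that, with an error $\mathcal{O}(\bar M/N)$ at all orders in perturbation theory, we obtain the same partition function when this interaction term is replaced by $\frac{\nu}{\bar M}\left(\frac1N\sum_{j=1}^{N}z_k^*(j)Dz_{k-1}(j)\right)^{\bar M}$. Therefore, we define the auxiliary field

$$\phi_k = \frac1N\sum_{j=1}^{N}z_k^*(j)Dz_{k-1}(j).$$ (A5)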

We obtain the partition function from the trace of the evolution operator, Eq. (A4), projected onto physical states [33]

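$$Z = \mathrm{Tr}\left[e^{-\hat Ht}\hat P\right] = \int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\lim_{M\to\infty}\int\left[\prod_{k=0}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-S[z^*,z]}\Big|_{z_0=e^{i\lambda}z_M}.$$ (A6)

By inserting Eq. (A5), we obtain

$$Z = \lim_{M\to\infty}\int[\mathcal{D}\bar\xi\,\mathcal{D}\xi\,\mathcal{D}\bar\phi\,\mathcal{D}\phi]\,e^{-N\varepsilon\sum_{k=1}^{M}\left[\bar\xi_k\xi_k + \bar\phi_k\phi_k - f(\xi_k) + \mu + \frac{\nu}{\bar M} - \frac{\nu}{\bar M}\phi_k^{\bar M}\right]} \times \int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\int\left[\prod_{k=0}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M}z_k^*(j)S_{kl}(j)z_l(j)}\Big|_{z_M=e^{i\lambda}z_0}$$ (A7)

The matrix S(j) in Eq. (A7) is defined by

$$S(j) = \begin{pmatrix} I & 0 & 0 & \cdots & -e^{i\lambda_j}A_1\\ -A_2 & I & 0 & \cdots & 0\\ 0 & -A_3 & I & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & -A_M & I\end{pmatrix}$$ (A8)

Here $A_k = I + \varepsilon(\bar\xi_k\sigma_3 + \mu\sigma_1 + \bar\phi_kD)$.

After calculating the Gaussian integral over the coherent state fields, we obtain

$$\lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\int\left[\prod_{k=0}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M}z_k^*(j)S_{kl}(j)z_l(j)} = \lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\left[\det S(j)\right]^{-1} = \lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\,e^{-\mathrm{Tr}\ln\left[I - e^{i\lambda_j}\hat T\exp\left(\varepsilon\sum_{k=1}^{M}\bar\xi_k\sigma_3 + \mu\sigma_1 + \bar\phi_kD\right)\right]} = \lim_{M\to\infty}\prod_{j=1}^{N}\mathrm{Tr}\,\hat T e^{\varepsilon\sum_{k=1}^{M}(\bar\xi_k\sigma_3 + \mu\sigma_1 + \bar\phi_kD)} = Q^N,$$ (A9)

where $\hat T$ is the time ordering operator and

$$Q = \mathrm{Tr}\,\hat T e^{\int_0^t dt'(\bar\xi\sigma_3 + \mu\sigma_1 + \bar\phi D)}.$$ (A10)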

With this result the partition function in Eq. (A7) becomes Eq. (10).

APPENDIX B

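From Eq. (13), we obtain the saddle-point equations with respect to the fields $\bar\xi_c$, $\bar\phi_c$ for horizontal gene transfer of blocks of fixed length in the parallel model:

$$\frac{\delta}{\delta\bar\xi_c}\left(\frac{-S_c}{Nt}\right) = -\xi_c + \frac{2\bar\xi_c + u\bar\phi_c}{2\left[\bar\xi_c(\bar\xi_c + u\bar\phi_c) + \left(\mu + \frac{\bar\phi_c}{2}\right)^2\right]^{1/2}} = 0$$ (B1)
$$\frac{\delta}{\delta\bar\phi_c}\left(\frac{-S_c}{Nt}\right) = -\phi_c + \frac12 + \frac{u\bar\xi_c + \mu + \frac{\bar\phi_c}{2}}{2\left[\bar\xi_c(\bar\xi_c + u\bar\phi_c) + \left(\mu + \frac{\bar\phi_c}{2}\right)^2\right]^{1/2}} = 0.$$ (B2)

Then, the system of Eqs. (B1) and (B2) reduces to

$$\xi_c = \frac{\bar\xi_c + \frac{u}{2}\bar\phi_c}{\left[\bar\xi_c(\bar\xi_c + u\bar\phi_c) + \left(\mu + \frac{\bar\phi_c}{2}\right)^2\right]^{1/2}}$$ (B3)
$$\phi_c - \frac12 = \frac{u\bar\xi_c + \mu + \frac{\bar\phi_c}{2}}{2\left[\bar\xi_c(\bar\xi_c + u\bar\phi_c) + \left(\mu + \frac{\bar\phi_c}{2}\right)^2\right]^{1/2}}.$$ (B4)

We eliminate $\bar\xi_c$, $\bar\phi_c$, to obtain

$$\frac{-S_c}{Nt} = \max_{\xi_c,\phi_c}\left\{f(\xi_c) - \mu - \frac{\nu}{\bar M} + \frac{\nu}{\bar M}\phi_c^{\bar M} + \frac{\mu}{1-u^2}(2\phi_c - 1 - u\xi_c) - \frac{\mu|u|}{1-u^2}\left[(2\phi_c - 1 - u\xi_c)^2 - (1-u^2)(1-\xi_c^2)\right]^{1/2}\right\}.$$ (B5)

Finally, we look for an extremum in $\phi_c$,

$$\frac{\delta}{\delta\phi_c}\left(\frac{-S_c}{Nt}\right) = \frac{\nu}{\bar M}\bar M\phi_c^{\bar M-1} + \frac{2\mu}{1-u^2} - \frac{\mu|u|}{1-u^2}\,\frac{2(2\phi_c - 1 - u\xi_c)}{\left[(2\phi_c - 1 - u\xi_c)^2 - (1-u^2)(1-\xi_c^2)\right]^{1/2}} = 0.$$ (B6)

We solve for $\phi_c$ as a function of $\xi_c$ from this equation

$$\phi_c(\xi_c) = \frac{1+u\xi_c}{2} + \frac{\sqrt{1-\xi_c^2}}{2}\sqrt{1-u^2}\left[1 - \left(\frac{u}{1 + \frac{\nu}{2\mu}(1-u^2)\phi_c^{\bar M-1}}\right)^2\right]^{-1/2}.$$ (B7)

Substituting into Eq. (B5), we obtain for the mean fitness or average replication rate Eq. (14).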
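Eq. (B7) defines $\phi_c$ only implicitly, because $\phi_c$ also appears on the right-hand side. One simple way to evaluate it is fixed-point iteration, sketched below. This is an illustrative approach rather than the procedure used in the original work; convergence is assumed rather than proven, |u| < 1 is required, and the function name and parameter values are hypothetical.

```python
import math

def phi_c(xi, u, mu, nu, Mbar, iters=200):
    """Solve Eq. (B7) for phi_c by fixed-point iteration (assumes |u| < 1)."""
    base = (1.0 + u * xi) / 2.0
    phi = base  # starting guess
    for _ in range(iters):
        K = 1.0 + nu / (2.0 * mu) * (1.0 - u * u) * phi ** (Mbar - 1)
        w = (math.sqrt(1.0 - xi * xi) * math.sqrt(1.0 - u * u)
             / math.sqrt(1.0 - (u / K) ** 2))
        phi = base + w / 2.0
    return phi

# Illustrative (assumed) values.
print(phi_c(1.0, 0.3, 1.0, 0.5, 3))  # xi_c = 1 gives (1 + u)/2
print(phi_c(0.6, 0.3, 1.0, 0.0, 3))  # nu = 0 gives (1 + u*xi + sqrt(1 - xi^2))/2
print(phi_c(0.6, 0.3, 1.0, 0.5, 3))
```

For ν = 0 the update no longer depends on $\phi_c$, so the iteration returns the closed form after a single step.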

APPENDIX C

We consider Eq. (28) for horizontal gene transfer of blocks of variable length in the parallel model. For ε = t/M and M → ∞, we have

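$$\langle z_k|e^{-\varepsilon\hat H}|z_{k-1}\rangle \approx \langle z_k|z_{k-1}\rangle - \varepsilon\langle z_k|\hat H|z_{k-1}\rangle \approx \langle z_k|z_{k-1}\rangle\,e^{-\varepsilon\frac{\langle z_k|\hat H|z_{k-1}\rangle}{\langle z_k|z_{k-1}\rangle}}.$$ (C1)

For the Hamiltonian matrix elements in the coherent states basis, we obtain

$$-\frac{\langle z_k|\hat H|z_{k-1}\rangle}{\langle z_k|z_{k-1}\rangle} = Nf\left[\frac1N\sum_{j=1}^{N}z_k^*(j)\sigma_3z_{k-1}(j)\right] + \mu\sum_{j=1}^{N}\left[z_k^*(j)\sigma_1z_{k-1}(j) - 1\right] + \frac{\nu}{\langle\bar M\rangle}N\,e^{-\langle\bar M\rangle + \frac{\langle\bar M\rangle}{N}\sum_{j=1}^{N}z_k^*(j)Dz_{k-1}(j)} - \frac{\nu}{\langle\bar M\rangle}N.$$ (C2)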

We introduce the fields

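$$\xi_k = \frac1N\sum_{j=1}^{N}z_k^*(j)\sigma_3z_{k-1}(j)$$ (C3)
$$\phi_k = \frac1N\sum_{j=1}^{N}z_k^*(j)Dz_{k-1}(j)$$ (C4)

and the conjugate fields $\bar\phi_k$ and $\bar\xi_k$ to enforce the constraints via Laplace representations of the Dirac delta functions. Substituting into Eq. (28), we obtain

$$e^{-\hat Ht} = \lim_{M\to\infty}\int[\mathcal{D}z^*\mathcal{D}z]\int\left[\prod_{k=1}^{M}\frac{i\varepsilon N\,d\bar\xi_k\,d\xi_k}{2\pi}\,\frac{i\varepsilon N\,d\bar\phi_k\,d\phi_k}{2\pi}\right]|z_M\rangle\langle z_0| \times e^{\sum_{k=1}^{M}\sum_{j=1}^{N}\left\{-\frac12\left[z_k^*(j)\cdot z_k(j) + z_{k-1}^*(j)\cdot z_{k-1}(j) - 2z_k^*(j)\cdot z_{k-1}(j)\right] + \varepsilon\left[z_k^*(j)(\bar\xi_k\sigma_3 + \mu\sigma_1 + \bar\phi_kD)z_{k-1}(j)\right]\right\}} \times e^{-\varepsilon N\sum_{k=1}^{M}\left[\bar\xi_k\xi_k + \bar\phi_k\phi_k + \mu + \frac{\nu}{\langle\bar M\rangle} - f(\xi_k) - \frac{\nu}{\langle\bar M\rangle}e^{-\langle\bar M\rangle(1-\phi_k)}\right]}$$ (C5)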

We obtain the partition function from the trace of the evolution operator Eq. (C5)

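$$Z = \mathrm{Tr}\left[e^{-\hat Ht}\hat P\right] = \int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\lim_{M\to\infty}\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-S[z^*,z]}\Big|_{z_0=e^{i\lambda}z_M}$$ (C6)

By inserting Eq. (C5), we obtain

$$Z = \lim_{M\to\infty}\int[\mathcal{D}\bar\xi\,\mathcal{D}\xi\,\mathcal{D}\bar\phi\,\mathcal{D}\phi]\,e^{-N\varepsilon\sum_{k=1}^{M}\left[\bar\xi_k\xi_k + \bar\phi_k\phi_k - f(\xi_k) + \mu + \frac{\nu}{\langle\bar M\rangle} - \frac{\nu}{\langle\bar M\rangle}e^{-\langle\bar M\rangle(1-\phi_k)}\right]} \times \int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M}z_k^*(j)S_{kl}(j)z_l(j)}\Big|_{z_M=e^{i\lambda}z_0}$$ (C7)

The matrix S(j) in Eq. (C7) is defined by

$$S(j) = \begin{pmatrix} I & 0 & 0 & \cdots & -e^{i\lambda_j}A_1\\ -A_2 & I & 0 & \cdots & 0\\ 0 & -A_3 & I & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & -A_M & I\end{pmatrix}$$ (C8)

where $A_k = I + \varepsilon(\bar\xi_k\sigma_3 + \mu\sigma_1 + \bar\phi_kD)$.

After calculating the Gaussian integral over the coherent state fields, we obtain

$$\lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M}z_k^*(j)S_{kl}(j)z_l(j)} = \lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\left[\det S(j)\right]^{-1} = \lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\,e^{-\mathrm{Tr}\ln\left[I - e^{i\lambda_j}\hat T\exp\left(\varepsilon\sum_{k=1}^{M}\bar\xi_k\sigma_3 + \mu\sigma_1 + \bar\phi_kD\right)\right]} = \lim_{M\to\infty}\prod_{j=1}^{N}\mathrm{Tr}\,\hat T e^{\varepsilon\sum_{k=1}^{M}(\bar\xi_k\sigma_3 + \mu\sigma_1 + \bar\phi_kD)} = Q^N$$ (C9)

where

$$Q = \mathrm{Tr}\,\hat T e^{\int_0^t dt'(\bar\xi\sigma_3 + \mu\sigma_1 + \bar\phi D)}$$ (C10)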

With this result, in the limit M → ∞, the partition function in Eq. (C7) becomes Eq. (29).

APPENDIX D

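We consider recombination in the parallel model. For the Hamiltonian matrix elements in the coherent states basis, we obtain to order $\mathcal{O}(N^0)$

$$-\frac{\langle z_k|\hat H|z_{k-1}\rangle}{\langle z_k|z_{k-1}\rangle} = Nf\left[\frac1N\sum_{j=1}^{N}z_k^*(j)\sigma_3z_{k-1}(j)\right] + \mu\sum_{j=1}^{N}\left[z_k^*(j)\sigma_1z_{k-1}(j) - 1\right] + \nu N\left(g\left[\{z_k^*(j)D_j^lz_{k-1}(j)\}\right] - 1\right)$$ (D1)

where the matrices $D_j^l$ are defined by Eq. (38). We introduce the auxiliary fields

$$\xi_k = \frac1N\sum_{j=1}^{N}z_k^*(j)\sigma_3z_{k-1}(j)$$ (D2)

and the conjugate fields $\bar\xi_k$ to enforce the constraints via Laplace representations of the delta functions. Substituting into Eq. (40), we obtain

$$e^{-\hat Ht} = \lim_{M\to\infty}\int[\mathcal{D}z^*\mathcal{D}z]\int\left[\prod_{k=1}^{M}\frac{i\varepsilon N\,d\bar\xi_k\,d\xi_k}{2\pi}\right]|z_M\rangle\langle z_0| \times e^{\sum_{k=1}^{M}\sum_{j=1}^{N}\left\{-\frac12\left[z_k^*(j)\cdot z_k(j) + z_{k-1}^*(j)\cdot z_{k-1}(j) - 2z_k^*(j)\cdot z_{k-1}(j)\right] + \varepsilon\left[z_k^*(j)(\bar\xi_k\sigma_3 + \mu\sigma_1)z_{k-1}(j)\right]\right\}} \times e^{-\varepsilon N\sum_{k=1}^{M}\left[\bar\xi_k\xi_k + \mu + \nu - f(\xi_k) - \nu g\left(\{z_k^*(j)D_j^lz_{k-1}(j)\}\right)\right]}$$ (D3)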

We obtain the partition function from the trace of the evolution operator, Eq. (D3), for recombination in the parallel model

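$$Z = \mathrm{Tr}\left[e^{-\hat Ht}\hat P\right] = \int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\lim_{M\to\infty}\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-S[z^*,z]}\Big|_{z_0=e^{i\lambda}z_M}$$ (D4)

It is convenient to define the auxiliary field

$$\phi_k = \frac1N\sum_{j=1}^{N}z_k^*(j)Dz_{k-1}(j)$$ (D5)

and the corresponding $\bar\phi_k$ to enforce the constraint by a Laplace representation of the Dirac delta function. From Eq. (D4), we have

$$Z = \lim_{M\to\infty}\int[\mathcal{D}\bar\xi\,\mathcal{D}\xi\,\mathcal{D}\bar\phi\,\mathcal{D}\phi]\,e^{-N\varepsilon\sum_{k=1}^{M}\left[\bar\xi_k\xi_k + \bar\phi_k\phi_k - f(\xi_k) + \mu + \nu - \nu g(\phi_k)\right]} \times \int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M}z_k^*(j)S_{kl}(j)z_l(j)}\Big|_{z_M=e^{i\lambda}z_0}$$ (D6)

Here, for large N the function g(ϕ) has the singular behavior g(ϕ) = 0 unless $\phi = 1 - \mathcal{O}(1/N)$. We also notice g(1) = 1. The matrix S(j) in Eq. (D6) is defined by

$$S(j) = \begin{pmatrix} I & 0 & 0 & \cdots & -e^{i\lambda_j}A_1\\ -A_2 & I & 0 & \cdots & 0\\ 0 & -A_3 & I & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & -A_M & I\end{pmatrix}$$ (D7)

Here, $A_k = I + \varepsilon(\bar\xi_k\sigma_3 + \mu\sigma_1 + \bar\phi_kD)$. After calculating the Gaussian integral over the coherent states fields, we obtain

$$\lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M}z_k^*(j)S_{kl}(j)z_l(j)} = \lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\left[\det S(j)\right]^{-1} = \lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\,e^{-\mathrm{Tr}\ln\left[I - e^{i\lambda_j}\hat T\exp\left(\varepsilon\sum_{k=1}^{M}\bar\xi_k\sigma_3 + \mu\sigma_1 + \bar\phi_kD\right)\right]} = \lim_{M\to\infty}\prod_{j=1}^{N}\mathrm{Tr}\,\hat T e^{\varepsilon\sum_{k=1}^{M}(\bar\xi_k\sigma_3 + \mu\sigma_1 + \bar\phi_kD)} = Q^N$$ (D8)

where in the continuous limit

$$Q = \mathrm{Tr}\,\hat T e^{\int_0^t dt'(\bar\xi\sigma_3 + \mu\sigma_1 + \bar\phi D)}$$ (D9)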

With this result, the partition function in Eq. (D6) becomes Eq. (59).

APPENDIX E

The recombination operator

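For the recombination process, we consider that in the first step, the polymerase enzyme starts the copying path in either of the two parental chains with equal probability 1/2. Then, at each site, it can jump to the other chain with probability $0 < p_c \le 1/2$ or continue along the same chain with probability $1 - p_c$.

As presented in Section II.C, this process is represented in the general differential Eq. (1) by the coefficients in Eq. (35)

$$R^{i}_{kl} = \frac12\sum_{\{\alpha_j=\pm1\}}\left(\frac{1+s_1^ks_1^i}{2}\right)^{\frac{1+\alpha_1}{2}}\left(\frac{1+s_1^ls_1^i}{2}\right)^{\frac{1-\alpha_1}{2}} \times \left[(1-p_c)^{\frac{1+\alpha_1\alpha_2}{2}}p_c^{\frac{1-\alpha_1\alpha_2}{2}}\right]\left(\frac{1+s_2^ks_2^i}{2}\right)^{\frac{1+\alpha_2}{2}}\left(\frac{1+s_2^ls_2^i}{2}\right)^{\frac{1-\alpha_2}{2}} \times \left[(1-p_c)^{\frac{1+\alpha_2\alpha_3}{2}}p_c^{\frac{1-\alpha_2\alpha_3}{2}}\right]\left(\frac{1+s_3^ks_3^i}{2}\right)^{\frac{1+\alpha_3}{2}}\left(\frac{1+s_3^ls_3^i}{2}\right)^{\frac{1-\alpha_3}{2}} \times \cdots \times \left[(1-p_c)^{\frac{1+\alpha_{N-1}\alpha_N}{2}}p_c^{\frac{1-\alpha_{N-1}\alpha_N}{2}}\right]\left(\frac{1+s_N^ks_N^i}{2}\right)^{\frac{1+\alpha_N}{2}}\left(\frac{1+s_N^ls_N^i}{2}\right)^{\frac{1-\alpha_N}{2}}$$ (E1)

The operator for this process in the Schwinger-boson representation is presented in Eq. (37)

$$\hat R = \frac12\sum_{l=1}^{2^N}p_l\sum_{\{\alpha_i=\pm1\}}\left[\hat I_1^{\frac{1+\alpha_1}{2}}\hat R_l(1)^{\frac{1-\alpha_1}{2}}\right] \times \left[(1-p_c)^{\frac{1+\alpha_1\alpha_2}{2}}p_c^{\frac{1-\alpha_1\alpha_2}{2}}\right] \times \left[\hat I_2^{\frac{1+\alpha_2}{2}}\hat R_l(2)^{\frac{1-\alpha_2}{2}}\right] \times \left[(1-p_c)^{\frac{1+\alpha_2\alpha_3}{2}}p_c^{\frac{1-\alpha_2\alpha_3}{2}}\right] \times \left[\hat I_3^{\frac{1+\alpha_3}{2}}\hat R_l(3)^{\frac{1-\alpha_3}{2}}\right] \times \cdots \times \left[(1-p_c)^{\frac{1+\alpha_{N-1}\alpha_N}{2}}p_c^{\frac{1-\alpha_{N-1}\alpha_N}{2}}\right] \times \left[\hat I_N^{\frac{1+\alpha_N}{2}}\hat R_l(N)^{\frac{1-\alpha_N}{2}}\right] - \hat I \equiv g(\{\hat R_l(j)\}) - \hat I$$ (E2)

Here, we define the single-site recombination operator as $\hat R_l(j) = \hat a^\dagger(j)D_j^l\hat a(j)$, with

$$D_j^l = \begin{pmatrix}\frac{1+s_j^l}{2} & \frac{1+s_j^l}{2}\\ \frac{1-s_j^l}{2} & \frac{1-s_j^l}{2}\end{pmatrix}$$ (E3)

and $p_l = q_l/\sum_{l'=1}^{2^N}q_{l'}$ is the normalized probability for sequence $1 \le l \le 2^N$.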

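It is possible to group the different terms in the form of Ising-like traces, by using the definition $J = -(1/2)\ln[p_c/(1-p_c)]$,

$$g(\{\hat R_l(j)\}) = \frac12[2\cosh(J)]^{-(N-1)}\sum_{l=1}^{2^N}p_l\sum_{\{\alpha_j=\pm1\}}e^{J\sum_{j=2}^{N}\alpha_j\alpha_{j-1}}\prod_{j=1}^{N}\left[\frac{1+\alpha_j}{2}\hat I_j + \frac{1-\alpha_j}{2}\hat R_l(j)\right]$$ (E4)

After the representation in terms of coherent states fields, we have $\hat R_l(j) \to z_k^*(j)D_j^lz_{k-1}(j) \equiv \psi_j^l$, and correspondingly $g \to g(\{\psi_j^l\})$

$$g(\{\psi_j^l\}) = \frac12[2\cosh(J)]^{-(N-1)}\sum_{l=1}^{2^N}p_l\sum_{\{\alpha_j=\pm1\}}e^{J\sum_{j=2}^{N}\alpha_j\alpha_{j-1}}\prod_{j=1}^{N}\left[\frac{1+\alpha_j}{2} + \frac{1-\alpha_j}{2}\psi_j^l\right]$$ (E5)

It is convenient to reorganize this expression as

$$g(\{\psi_j^l\}) = \frac12[2\cosh(J)]^{-(N-1)}\sum_{l=1}^{2^N}p_l\prod_{j=1}^{N}\left(\frac{1+\psi_j^l}{2}\right)\sum_{\{\alpha_j=\pm1\}}e^{J\sum_{j=2}^{N}\alpha_j\alpha_{j-1}}\prod_{j=1}^{N}\left[1 + \alpha_j\frac{1-\psi_j^l}{1+\psi_j^l}\right]$$ (E6)

We define the transfer matrix

$$T = \begin{pmatrix}e^{J} & e^{-J}\\ e^{-J} & e^{J}\end{pmatrix}$$ (E7)

with eigenvalues $\lambda_+ = 2\cosh(J)$ and $\lambda_- = 2\sinh(J)$.

The Ising trace in Eq. (E6) is given by

$$\sum_{\{\alpha_j=\pm1\}}e^{J\sum_{j=2}^{N}\alpha_j\alpha_{j-1}} = \sum_{\{\alpha_1=\pm1\}}\left(\langle\alpha_1|T^{N-1}|\alpha_1\rangle + \langle\alpha_1|T^{N-1}|-\alpha_1\rangle\right) = \mathrm{Tr}[T^{N-1}] + \mathrm{Tr}[T^{N-1}\sigma_1] = \lambda_+^{N-1} + \lambda_-^{N-1} + \lambda_+^{N-1} - \lambda_-^{N-1} = 2\lambda_+^{N-1} = 2[2\cosh(J)]^{N-1}$$ (E8)

By considering this formula, and expanding the product in Eq. (E6), we obtain

$$g(\{\psi_j^l\}) = \sum_{l=1}^{2^N}p_l\prod_{j=1}^{N}\left(\frac{1+\psi_j^l}{2}\right)\left\{1 + \sum_{j=1}^{N}\langle\alpha_j\rangle\frac{1-\psi_j^l}{1+\psi_j^l} + \sum_{1\le k<m\le N}\langle\alpha_k\alpha_m\rangle\frac{1-\psi_k^l}{1+\psi_k^l}\frac{1-\psi_m^l}{1+\psi_m^l} + \sum_{1\le k<m<n}\langle\alpha_k\alpha_m\alpha_n\rangle\frac{1-\psi_k^l}{1+\psi_k^l}\frac{1-\psi_m^l}{1+\psi_m^l}\frac{1-\psi_n^l}{1+\psi_n^l} + \cdots + \langle\alpha_1\alpha_2\cdots\alpha_N\rangle\prod_{j=1}^{N}\frac{1-\psi_j^l}{1+\psi_j^l}\right\}$$ (E9)

In this notation, we defined the averages

$$\langle\alpha_k\cdots\alpha_l\rangle \equiv \frac{1}{2\lambda_+^{N-1}}\sum_{\{\alpha_j=\pm1\}}e^{J\sum_{j=2}^{N}\alpha_j\alpha_{j-1}}\,\alpha_k\cdots\alpha_l$$ (E10)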

We present the first and second order averages, to illustrate the general technique to obtain the higher orders.

The first order average is

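$$\langle\alpha_k\rangle = \frac{1}{2\lambda_+^{N-1}}\sum_{\{\alpha_j=\pm1\}}e^{J\sum_{j=2}^{N}\alpha_j\alpha_{j-1}}\alpha_k = \frac{1}{2\lambda_+^{N-1}}\mathrm{Tr}\left\{\begin{pmatrix}1&1\\1&1\end{pmatrix}T^{k-1}\sigma_3T^{N-k}\right\} = \frac{1}{2\lambda_+^{N-1}}\mathrm{Tr}\left\{P^{-1}\begin{pmatrix}1&1\\1&1\end{pmatrix}P\,P^{-1}T^{k-1}P\,P^{-1}\sigma_3P\,P^{-1}T^{N-k}P\right\}$$ (E11)

To evaluate the trace, we introduced the matrix P which diagonalizes the transfer matrix T

$$P = \frac{1}{\sqrt2}\begin{pmatrix}1&1\\-1&1\end{pmatrix}$$ (E12)

We use the identities

$$P^{-1}TP = \begin{pmatrix}\lambda_-&0\\0&\lambda_+\end{pmatrix},\quad P^{-1}\begin{pmatrix}1&1\\1&1\end{pmatrix}P = \begin{pmatrix}0&0\\0&2\end{pmatrix},\quad P^{-1}\sigma_3P = \sigma_1$$ (E13)

Substituting into Eq. (E11), we obtain

$$\langle\alpha_k\rangle = \frac{1}{\lambda_+^{N-1}}\mathrm{Tr}\left\{\begin{pmatrix}0&0\\0&1\end{pmatrix}\begin{pmatrix}\lambda_-^{k-1}&0\\0&\lambda_+^{k-1}\end{pmatrix}\sigma_1\begin{pmatrix}\lambda_-^{N-k}&0\\0&\lambda_+^{N-k}\end{pmatrix}\right\} = 0$$ (E14)

a result we expect due to the symmetry of the Hamiltonian in Eq. (E11). Following a similar procedure, we can express the second order correlation in the form

$$\langle\alpha_k\alpha_m\rangle = \frac{1}{2\lambda_+^{N-1}}\mathrm{Tr}\left\{\begin{pmatrix}1&1\\1&1\end{pmatrix}T^{k-1}\sigma_3T^{m-k}\sigma_3T^{N-m}\right\} = \frac{1}{\lambda_+^{N-1}}\mathrm{Tr}\left\{\begin{pmatrix}0&0\\0&1\end{pmatrix}\begin{pmatrix}\lambda_-^{k-1}&0\\0&\lambda_+^{k-1}\end{pmatrix}\sigma_1\begin{pmatrix}\lambda_-^{m-k}&0\\0&\lambda_+^{m-k}\end{pmatrix}\sigma_1\begin{pmatrix}\lambda_-^{N-m}&0\\0&\lambda_+^{N-m}\end{pmatrix}\right\} = \frac{\lambda_+^{k-1+N-m}\lambda_-^{m-k}}{\lambda_+^{N-1}} = \left(\frac{\lambda_-}{\lambda_+}\right)^{m-k} = (\tanh(J))^{m-k} = (1-2p_c)^{m-k}$$ (E15)

From the same analysis, we prove that the correlations for an odd number of α's vanish, whereas those for an even number become

$$\langle\alpha_k\alpha_l\alpha_m\alpha_n\cdots\rangle = \left(\frac{\lambda_-}{\lambda_+}\right)^{l-k+n-m+\cdots} = (\tanh(J))^{l-k+n-m+\cdots} = (1-2p_c)^{l-k+n-m+\cdots}$$ (E16)

Substituting into Eq. (E9), we obtain the finite series representation

$$g(\{\psi_j^l\}) = \sum_{l=1}^{2^N}p_l\prod_{j=1}^{N}\left(\frac{1+\psi_j^l}{2}\right)\left\{1 + \sum_{1\le k<m\le N}(1-2p_c)^{m-k}\frac{1-\psi_k^l}{1+\psi_k^l}\frac{1-\psi_m^l}{1+\psi_m^l} + \sum_{1\le k<m<n<q\le N}(1-2p_c)^{m-k+q-n}\frac{1-\psi_k^l}{1+\psi_k^l}\frac{1-\psi_m^l}{1+\psi_m^l}\frac{1-\psi_n^l}{1+\psi_n^l}\frac{1-\psi_q^l}{1+\psi_q^l} + \cdots + (1-2p_c)^{N/2}\prod_{j=1}^{N}\left(\frac{1-\psi_j^l}{1+\psi_j^l}\right)\right\}$$ (E17)

Finally, we can obtain the alternative representation

$$g(\{\psi_j^l\}) = \sum_{l=1}^{2^N}p_l\left\{\prod_{j=1}^{N}\left(\frac{1+\psi_j^l}{2}\right) + \sum_{1\le k<m}(1-2p_c)^{m-k}\frac{1-\psi_k^l}{2}\frac{1-\psi_m^l}{2}\prod_{j\ne k,m}\frac{1+\psi_j^l}{2} + \sum_{1\le k<m<n<q\le N}(1-2p_c)^{m-k+q-n}\frac{1-\psi_k^l}{2}\frac{1-\psi_m^l}{2}\frac{1-\psi_n^l}{2}\frac{1-\psi_q^l}{2} \times \prod_{j\ne k,m,n,q}^{N}\frac{1+\psi_j^l}{2} + \cdots + (1-2p_c)^{N/2}\prod_{j=1}^{N}\frac{1-\psi_j^l}{2}\right\}$$ (E18)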
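The transfer-matrix results of Eqs. (E8), (E14), (E15), and (E16) can be checked by brute-force enumeration of the crossover variables for a short chain. The sketch below is illustrative: the chain length, the value of $p_c$, and the function names are assumptions chosen to keep the enumeration small.

```python
import math
from itertools import product

def chain_weight(alphas, J):
    """Boltzmann-like weight exp(J * sum_j alpha_j * alpha_{j-1}) of an open chain."""
    return math.exp(J * sum(a * b for a, b in zip(alphas[1:], alphas[:-1])))

def partition_sum(N, J):
    """Left-hand side of Eq. (E8) by direct enumeration."""
    return sum(chain_weight(al, J) for al in product((1, -1), repeat=N))

def correlation(N, J, sites):
    """<alpha_k alpha_m ...> of Eq. (E10); `sites` holds 1-based site indices."""
    total = 0.0
    for al in product((1, -1), repeat=N):
        sign = 1
        for s in sites:
            sign *= al[s - 1]
        total += chain_weight(al, J) * sign
    return total / partition_sum(N, J)

p_c, N = 0.2, 8  # illustrative (assumed) crossover probability and chain length
J = -0.5 * math.log(p_c / (1.0 - p_c))
print(partition_sum(N, J), 2.0 * (2.0 * math.cosh(J)) ** (N - 1))
print(correlation(N, J, (3,)))          # odd correlations vanish
print(correlation(N, J, (2, 5)), (1.0 - 2.0 * p_c) ** 3)
print(correlation(N, J, (1, 3, 4, 7)), (1.0 - 2.0 * p_c) ** 5)
```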

APPENDIX F

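For the case of uniform crossover recombination, $p_c = 1/2$, a simplified analysis can be carried out to obtain the large-N, or Gaussian, limit of the recombination coefficients $R^{u}_{u_1,u_2}$, because permutation symmetry is exactly obeyed. For a child sequence created from parental sequences with $n_1$ and $n_2$ “+1” sites, the number of “+1” sites in the child, n, is given by the expression

$$n = \sum_{i=1}^{N}\left(\frac{1+\alpha_i}{2}\,\frac{1+s_i^1}{2} + \frac{1-\alpha_i}{2}\,\frac{1+s_i^2}{2}\right)$$ (F1)

Here, the path followed by the polymerase while copying from either parental sequence is parametrized by the random variables $\alpha_i = \pm1$, with $\langle\alpha_i\rangle = 0$ and $\langle\alpha_i\alpha_j\rangle = \delta_{ij}$. From Eq. (F1), we obtain the corresponding expression for the average composition of the child sequence, $u = (2n - N)/N$

$$u = \frac1N\sum_{i=1}^{N}\left(\frac{1+\alpha_i}{2}s_i^1 + \frac{1-\alpha_i}{2}s_i^2\right)$$ (F2)

From Eq. (F2), we obtain the average

$$\langle u\rangle_\alpha = \frac1N\sum_{i=1}^{N}\frac{s_i^1 + s_i^2}{2} = \frac{u_1+u_2}{2}$$ (F3)

To obtain the variance, we calculate

$$\langle u^2\rangle_\alpha = \frac{1}{N^2}\sum_{i,j=1}^{N}\left\langle\left(\frac{1+\alpha_i}{2}s_i^1 + \frac{1-\alpha_i}{2}s_i^2\right)\left(\frac{1+\alpha_j}{2}s_j^1 + \frac{1-\alpha_j}{2}s_j^2\right)\right\rangle_\alpha = \frac{1}{4N^2}\sum_{i,j=1}^{N}(s_i^1+s_i^2)(s_j^1+s_j^2) + \frac{1}{4N^2}\sum_{i=1}^{N}\left\langle(\alpha_is_i^1 - \alpha_is_i^2)^2\right\rangle_\alpha = \langle u\rangle_\alpha^2 + \frac{1}{4N^2}\sum_{i=1}^{N}(s_i^1 - s_i^2)^2$$ (F4)

Therefore, we obtain the variance as

$$\langle(\delta u)^2\rangle_\alpha = \frac{1}{4N^2}\,N\cdot4\cdot2\,\frac{1+u}{2}\,\frac{1-u}{2} \approx \frac{1-u^2}{2N}$$ (F5)

Hence, in the large N Gaussian limit, the recombination coefficients are given by the distribution

$$R^{u}_{u_1,u_2} \sim \frac{e^{-N[(u_1+u_2)/2 - u]^2/(1-u^{*2})}}{\sqrt{\pi(1-u^{*2})/N}}$$ (F6)

where $f_m = f(u^*)$.

For $p_c < 1/2$, making the ansatz that correlations between spins at different sites remain $\mathcal{O}(N^{-1})$, the additional contribution to $\langle(\delta u)^2\rangle_\alpha$ is $[1/(4N^2)]\sum_{i\ne j}(1-2p_c)^{|i-j|}(s_i^1 - s_i^2)(s_j^1 - s_j^2) = [1/(2N^2)]\sum_{k>0}\sum_i(1-2p_c)^k(s_i^1 - s_i^2)(s_{i+k}^1 - s_{i+k}^2) \sim [1/(2N)]\sum_{k>0}(1-2p_c)^k[\langle s^1 - s^2\rangle^2 + \mathcal{O}(1/N)] \sim [1/(4p_cN)][\mathcal{O}(1/\sqrt N)^2 + \mathcal{O}(1/N)] \sim \mathrm{const}/N^2$, and the large N limit becomes that of the $p_c = 1/2$ case.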
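For a short genome, the mean and variance in Eqs. (F3) and (F4) can be verified exactly by enumerating all $2^N$ copying paths of uniform crossover. The sketch below uses two arbitrary example parents (an assumption for illustration) and hypothetical function names.

```python
from itertools import product

def child_composition(alphas, s1, s2):
    """Eq. (F2): composition u of the child for one copying path."""
    n_sites = len(s1)
    return sum((1 + a) / 2 * x + (1 - a) / 2 * y
               for a, x, y in zip(alphas, s1, s2)) / n_sites

def crossover_moments(s1, s2):
    """Exact mean and variance of u over all equally likely paths (p_c = 1/2)."""
    us = [child_composition(al, s1, s2)
          for al in product((1, -1), repeat=len(s1))]
    mean = sum(us) / len(us)
    var = sum((x - mean) ** 2 for x in us) / len(us)
    return mean, var

# Two arbitrary example parents with N = 8 sites.
s1 = [1, 1, -1, 1, -1, 1, 1, -1]
s2 = [1, -1, -1, -1, 1, 1, -1, 1]
N = len(s1)
u1, u2 = sum(s1) / N, sum(s2) / N
mean, var = crossover_moments(s1, s2)
print(mean, (u1 + u2) / 2)                                            # Eq. (F3)
print(var, sum((a - b) ** 2 for a, b in zip(s1, s2)) / (4 * N * N))   # Eq. (F4)
```

Only sites at which the parents differ contribute to the variance, which is why it scales as 1/N for typical parents.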

APPENDIX G

We consider the saddle point condition for recombination in the parallel model. First, we look for the saddle-point condition with respect to the fields ξ̄c, ϕ̄c

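$$\frac{\delta}{\delta\bar\xi_c}\left(\frac{-S_c}{Nt}\right) = -\xi_c + \frac{2\bar\xi_c + u\bar\phi_c}{2\left[\bar\xi_c(\bar\xi_c + u\bar\phi_c) + \left(\mu + \frac{\bar\phi_c}{2}\right)^2\right]^{1/2}} = 0$$ (G1)
$$\frac{\delta}{\delta\bar\phi_c}\left(\frac{-S_c}{Nt}\right) = -\phi_c + \frac12 + \frac{u\bar\xi_c + \mu + \frac{\bar\phi_c}{2}}{2\left[\bar\xi_c(\bar\xi_c + u\bar\phi_c) + \left(\mu + \frac{\bar\phi_c}{2}\right)^2\right]^{1/2}} = 0$$ (G2)

Eqs. (G1) and (G2) become

$$\xi_c = \frac{2\bar\xi_c + u\bar\phi_c}{2\left[\bar\xi_c(\bar\xi_c + u\bar\phi_c) + \left(\mu + \frac{\bar\phi_c}{2}\right)^2\right]^{1/2}}$$ (G3)
$$\phi_c = \frac12 + \frac{u\bar\xi_c + \mu + \frac{\bar\phi_c}{2}}{2\left[\bar\xi_c(\bar\xi_c + u\bar\phi_c) + \left(\mu + \frac{\bar\phi_c}{2}\right)^2\right]^{1/2}}$$ (G4)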

By combining Eqs. (G3) and (G4), with the saddle-point action Eq. (58), we obtain Eq. (59).

APPENDIX H

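We consider horizontal gene transfer of blocks of length $\bar M$ in the Eigen model. The matrix elements of the Hamiltonian in the basis of coherent states are given by

$$-\frac{\langle z_k|\hat H|z_{k-1}\rangle}{\langle z_k|z_{k-1}\rangle} = N e^{-\mu + \frac{\mu}{N}\sum_{j=1}^{N}z_k^*(j)\sigma_1z_{k-1}(j)} \times e^{-\frac{\nu}{\bar M} + \frac{\nu}{N}\sum_{b=0}^{N/\bar M-1}\prod_{j_b=\bar Mb+1}^{\bar M(b+1)}z_k^*(j_b)Dz_{k-1}(j_b)}f\left[\frac1N\sum_{j=1}^{N}z_k^*(j)\sigma_3z_{k-1}(j)\right] - N d\left[\frac1N\sum_{j=1}^{N}z_k^*(j)\sigma_3z_{k-1}(j)\right]$$ (H1)

We introduce the auxiliary fields

$$\xi_k = \frac1N\sum_{j=1}^{N}z_k^*(j)\sigma_3z_{k-1}(j)$$ (H2)
$$\eta_k = \frac1N\sum_{j=1}^{N}z_k^*(j)\sigma_1z_{k-1}(j)$$ (H3)

and the corresponding conjugate fields $\bar\xi_k$, $\bar\eta_k$ to enforce the constraints via Laplace representations of the Dirac delta functions. Therefore, Eq. (77) becomes

$$e^{-\hat Ht} = \lim_{M\to\infty}\int[\mathcal{D}z^*\mathcal{D}z]\,|z_M\rangle\langle z_0|\int\left[\prod_{k=1}^{M}\frac{i\varepsilon N\,d\bar\xi_k\,d\xi_k}{2\pi}\,\frac{i\varepsilon N\,d\bar\eta_k\,d\eta_k}{2\pi}\right] \times e^{-\frac12\sum_{k=1}^{M}\sum_{j=1}^{N}\left[z_k^*(j)\cdot z_k(j) + z_{k-1}^*(j)\cdot z_{k-1}(j) - 2z_k^*(j)\cdot z_{k-1}(j)\right]} \times e^{\varepsilon\sum_{k=1}^{M}\sum_{j=1}^{N}z_k^*(j)(\bar\xi_k\sigma_3 + \bar\eta_k\sigma_1)z_{k-1}(j)}\,e^{-\varepsilon N\sum_{k=1}^{M}\left[\bar\xi_k\xi_k + \bar\eta_k\eta_k\right]} \times e^{\varepsilon N\sum_{k=1}^{M}\left[e^{-\mu(1-\eta_k) - \frac{\nu}{\bar M} + \frac{\nu}{N}\sum_{b=0}^{N/\bar M-1}\prod_{j_b=\bar Mb+1}^{\bar M(b+1)}z_k^*(j_b)Dz_{k-1}(j_b)}f(\xi_k) - d(\xi_k)\right]}$$ (H4)

At this point, a perturbation theory analysis similar to the case of the horizontal gene transfer of finite blocks in the Kimura model leads us to conclude that, to within error $\mathcal{O}(\bar M/N)$ at each order in perturbation theory, it is possible to substitute the recombination term by

$$\frac{\nu}{\bar M}\left(\frac1N\sum_{j=1}^{N}z_k^*(j)Dz_{k-1}(j)\right)^{\bar M}$$ (H5)

Then, it is convenient to introduce the auxiliary field

$$\phi_k = \frac1N\sum_{j=1}^{N}z_k^*(j)Dz_{k-1}(j)$$ (H6)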

and the corresponding ϕ̄k field to enforce the constraint through a Laplace representation of the Dirac delta function. The partition function is obtained from the trace of the evolution operator in Eq. (H4)

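$$Z = \mathrm{Tr}\left[e^{-\hat Ht}\hat P\right] = \int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\lim_{M\to\infty}\int\left[\prod_{k=0}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-S[z^*,z]}\Big|_{z_0=e^{i\lambda}z_M}$$ (H7)

Thus, we obtain

$$Z = \lim_{M\to\infty}\int[\mathcal{D}\bar\xi\,\mathcal{D}\xi\,\mathcal{D}\bar\eta\,\mathcal{D}\eta\,\mathcal{D}\bar\phi\,\mathcal{D}\phi]\,e^{-\varepsilon N\sum_{k=1}^{M}\left[\bar\xi_k\xi_k + \bar\eta_k\eta_k + \bar\phi_k\phi_k\right]}\,e^{\varepsilon N\sum_{k=1}^{M}\left[e^{-\mu(1-\eta_k) - \frac{\nu}{\bar M} + \frac{\nu}{\bar M}\phi_k^{\bar M}}f(\xi_k) - d(\xi_k)\right]} \times \int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M}z_k^*(j)S_{kl}(j)z_l(j)}\Big|_{z_M=e^{i\lambda_j}z_0}$$ (H8)

The matrix S(j) in Eq. (H8) is defined by

$$S(j) = \begin{pmatrix} I & 0 & 0 & \cdots & -e^{i\lambda_j}A_1\\ -A_2 & I & 0 & \cdots & 0\\ 0 & -A_3 & I & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & -A_M & I\end{pmatrix}$$ (H9)

Here $A_k = I + \varepsilon(\bar\xi_k\sigma_3 + \bar\eta_k\sigma_1 + \bar\phi_kD)$.

After calculating the Gaussian integral over the coherent states fields, we obtain

$$\lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M}z_k^*(j)S_{kl}(j)z_l(j)} = \lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\left[\det S(j)\right]^{-1} = \lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\,e^{-\mathrm{Tr}\ln\left[I - e^{i\lambda_j}\hat T\exp\left(\varepsilon\sum_{k=1}^{M}\bar\xi_k\sigma_3 + \bar\eta_k\sigma_1 + \bar\phi_kD\right)\right]} = \lim_{M\to\infty}\prod_{j=1}^{N}\mathrm{Tr}\,\hat T e^{\varepsilon\sum_{k=1}^{M}(\bar\xi_k\sigma_3 + \bar\eta_k\sigma_1 + \bar\phi_kD)} = Q^N$$ (H10)

where

$$Q = \mathrm{Tr}\,\hat T e^{\int_0^t dt'(\bar\xi\sigma_3 + \bar\eta\sigma_1 + \bar\phi D)}$$ (H11)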

With this result the partition function in Eq. (H8) becomes Eq. (78).

APPENDIX I

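We consider the saddle-point equations for horizontal gene transfer of blocks of length M̄ in the Eigen model:

$$\frac{\delta}{\delta\bar{\xi}_c}\left(\frac{-S_c}{Nt}\right) = -\xi_c+\frac{\bar{\xi}_c+\frac{u}{2}\bar{\phi}_c}{\left[\bar{\xi}_c(\bar{\xi}_c+u\bar{\phi}_c)+\left(\bar{\eta}_c+\frac{\bar{\phi}_c}{2}\right)^2\right]^{1/2}} = 0 \tag{I1}$$
$$\frac{\delta}{\delta\bar{\phi}_c}\left(\frac{-S_c}{Nt}\right) = -\phi_c+\frac{1}{2}+\frac{u\bar{\xi}_c+\bar{\eta}_c+\frac{\bar{\phi}_c}{2}}{2\left[\bar{\xi}_c(\bar{\xi}_c+u\bar{\phi}_c)+\left(\bar{\eta}_c+\frac{\bar{\phi}_c}{2}\right)^2\right]^{1/2}} = 0 \tag{I2}$$
$$\frac{\delta}{\delta\phi_c}\left(\frac{-S_c}{Nt}\right) = -\bar{\phi}_c+\nu\phi_c^{\bar{M}-1}e^{-\mu(1-\eta_c)-\frac{\nu}{\bar{M}}+\frac{\nu}{\bar{M}}\phi_c^{\bar{M}}}f(\xi_c) = 0 \tag{I3}$$
$$\frac{\delta}{\delta\bar{\eta}_c}\left(\frac{-S_c}{Nt}\right) = -\eta_c+\frac{\bar{\eta}_c+\frac{\bar{\phi}_c}{2}}{\left[\bar{\xi}_c(\bar{\xi}_c+u\bar{\phi}_c)+\left(\bar{\eta}_c+\frac{\bar{\phi}_c}{2}\right)^2\right]^{1/2}} = 0 \tag{I4}$$
$$\frac{\delta}{\delta\eta_c}\left(\frac{-S_c}{Nt}\right) = -\bar{\eta}_c+\mu e^{-\mu(1-\eta_c)-\frac{\nu}{\bar{M}}+\frac{\nu}{\bar{M}}\phi_c^{\bar{M}}}f(\xi_c) = 0 \tag{I5}$$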

We obtain the following identities

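$$\xi_c = \frac{\bar{\xi}_c+u\bar{\phi}_c/2}{\left[\bar{\xi}_c(\bar{\xi}_c+u\bar{\phi}_c)+(\bar{\eta}_c+\bar{\phi}_c/2)^2\right]^{1/2}} \tag{I6}$$
$$\eta_c = \frac{\bar{\eta}_c+\bar{\phi}_c/2}{\left[\bar{\xi}_c(\bar{\xi}_c+u\bar{\phi}_c)+(\bar{\eta}_c+\bar{\phi}_c/2)^2\right]^{1/2}} \tag{I7}$$
$$\bar{\eta}_c = \mu e^{-\mu(1-\eta_c)-\frac{\nu}{\bar{M}}\left(1-\phi_c^{\bar{M}}\right)}f(\xi_c) \tag{I8}$$
$$\phi_c = \frac{1}{2}+\frac{1}{2}\,\frac{u\bar{\xi}_c+\bar{\eta}_c+\bar{\phi}_c/2}{\left[\bar{\xi}_c(\bar{\xi}_c+u\bar{\phi}_c)+(\bar{\eta}_c+\bar{\phi}_c/2)^2\right]^{1/2}} \tag{I9}$$
$$\bar{\phi}_c = \nu\phi_c^{\bar{M}-1}e^{-\mu(1-\eta_c)-\frac{\nu}{\bar{M}}\left(1-\phi_c^{\bar{M}}\right)}f(\xi_c) \tag{I10}$$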

Combining Eq. (I8) and Eq. (I10), we obtain

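$$\nu\bar{\eta}_c\phi_c^{\bar{M}-1} = \mu\bar{\phi}_c \tag{I11}$$

From the system of Eqs. (I6)–(I11), it can be shown that

$$-\bar{\xi}_c\xi_c-\bar{\eta}_c\eta_c-\bar{\phi}_c\phi_c+\frac{\ln Q_c}{t} = 0 \tag{I12}$$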
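The right-hand sides of Eqs. (I6), (I7), and (I9) are the derivatives of ln Q_c/t with respect to the conjugate fields. A minimal numerical sketch of this is given below, under two assumptions that are not stated in this appendix: that for long times ln Q_c/t equals the largest eigenvalue of ξ̄σ3 + η̄σ1 + φ̄D, and that D has the explicit form written in the code (one choice that reproduces the square root in Eqs. (I1)–(I9); D itself is defined elsewhere in the paper). The parameter values are arbitrary.

```python
import numpy as np

SIGMA_1 = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_3 = np.array([[1.0, 0.0], [0.0, -1.0]])

def largest_eigenvalue(xi_bar, eta_bar, phi_bar, u):
    # ASSUMPTION: this explicit form of D is used for illustration only.
    D = 0.5 * np.array([[1.0 + u, 1.0 + u],
                        [1.0 - u, 1.0 - u]])
    A = xi_bar * SIGMA_3 + eta_bar * SIGMA_1 + phi_bar * D
    return np.max(np.linalg.eigvals(A).real)

def closed_form(xi_bar, eta_bar, phi_bar, u):
    # phi_bar/2 plus the square root that appears in Eqs. (I1)-(I9).
    R = xi_bar * (xi_bar + u * phi_bar) + (eta_bar + phi_bar / 2.0) ** 2
    return phi_bar / 2.0 + np.sqrt(R)

xi_bar, eta_bar, phi_bar, u = 0.7, 0.4, 0.3, 0.5
numeric = largest_eigenvalue(xi_bar, eta_bar, phi_bar, u)
analytic = closed_form(xi_bar, eta_bar, phi_bar, u)

# Right-hand side of Eq. (I6) versus a central finite difference in xi_bar.
h = 1e-6
R = xi_bar * (xi_bar + u * phi_bar) + (eta_bar + phi_bar / 2.0) ** 2
xi_c = (xi_bar + u * phi_bar / 2.0) / np.sqrt(R)
xi_c_fd = (closed_form(xi_bar + h, eta_bar, phi_bar, u)
           - closed_form(xi_bar - h, eta_bar, phi_bar, u)) / (2.0 * h)
print(numeric, analytic, xi_c, xi_c_fd)
```

The same finite-difference comparison can be repeated for η̄ and φ̄ to check Eqs. (I7) and (I9).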

APPENDIX J

We consider horizontal gene transfer of blocks of variable length in the Eigen model. The Hamiltonian matrix elements in the coherent states basis are given, to 𝒪(N−1), by

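$$\frac{\langle z_k|-\hat{H}|z_{k-1}\rangle}{\langle z_k|z_{k-1}\rangle} = N e^{-\mu+\frac{\mu}{N}\sum_{j=1}^{N} z_k^*(j)\sigma_1 z_{k-1}(j)} \times \left(1-\frac{\nu}{\langle\bar{M}\rangle}+\frac{\nu}{\langle\bar{M}\rangle}e^{-\langle\bar{M}\rangle+\frac{\langle\bar{M}\rangle}{N}\sum_{j=1}^{N} z_k^*(j)Dz_{k-1}(j)}\right) \times f\!\left[\frac{1}{N}\sum_{j=1}^{N} z_k^*(j)\sigma_3 z_{k-1}(j)\right] - N d\!\left[\frac{1}{N}\sum_{j=1}^{N} z_k^*(j)\sigma_3 z_{k-1}(j)\right] \tag{J1}$$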

We introduce the auxiliary fields

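$$\xi_k = \frac{1}{N}\sum_{j=1}^{N} z_k^*(j)\sigma_3 z_{k-1}(j) \tag{J2}$$
$$\eta_k = \frac{1}{N}\sum_{j=1}^{N} z_k^*(j)\sigma_1 z_{k-1}(j) \tag{J3}$$
$$\phi_k = \frac{1}{N}\sum_{j=1}^{N} z_k^*(j) D z_{k-1}(j) \tag{J4}$$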

and the corresponding ξ̄k, η̄k, ϕ̄k to enforce the constraints via Laplace representations of the Dirac delta functions. From Eq. (90), we obtain

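$$e^{-\hat{H}t} = \lim_{M\to\infty}\int[\mathcal{D}z^*\mathcal{D}z]\int\left[\prod_{k=1}^{M}\frac{i\varepsilon N\,d\bar{\xi}_k\,d\xi_k}{2\pi}\,\frac{i\varepsilon N\,d\bar{\eta}_k\,d\eta_k}{2\pi}\,\frac{i\varepsilon N\,d\bar{\phi}_k\,d\phi_k}{2\pi}\right]|z_M\rangle\langle z_0| \times e^{\sum_{k=1}^{M}\sum_{j=1}^{N}\left\{-\frac{1}{2}\left[z_k^*(j)\cdot z_k(j)+z_{k-1}^*(j)\cdot z_{k-1}(j)-2z_k^*(j)\cdot z_{k-1}(j)\right]+\varepsilon\left[z_k^*(j)(\bar{\xi}_k\sigma_3+\bar{\eta}_k\sigma_1+\bar{\phi}_kD)z_{k-1}(j)\right]\right\}} \times e^{\varepsilon N\sum_{k=1}^{M}\left\{-\bar{\xi}_k\xi_k-\bar{\phi}_k\phi_k-\bar{\eta}_k\eta_k+e^{-\mu(1-\eta_k)}\left[1-\frac{\nu}{\langle\bar{M}\rangle}+\frac{\nu}{\langle\bar{M}\rangle}e^{-\langle\bar{M}\rangle(1-\phi_k)}\right]f(\xi_k)-d(\xi_k)\right\}} \tag{J5}$$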

We obtain the partition function from the trace of the evolution operator Eq. (J5)

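$$Z = \mathrm{Tr}\left[e^{-\hat{H}t}\hat{P}\right] = \lim_{M\to\infty}\int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]\left.e^{-S[z^*,z]}\right|_{z_0=e^{i\lambda}z_M} \tag{J6}$$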

By inserting Eq. (J5), we obtain

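$$Z = \lim_{M\to\infty}\int[\mathcal{D}\bar{\xi}\mathcal{D}\xi\mathcal{D}\bar{\eta}\mathcal{D}\eta\mathcal{D}\bar{\phi}\mathcal{D}\phi]\, e^{-\varepsilon N\sum_{k=1}^{M}(\bar{\xi}_k\xi_k+\bar{\eta}_k\eta_k+\bar{\phi}_k\phi_k)} \times e^{\varepsilon N\sum_{k=1}^{M}\left\{e^{-\mu(1-\eta_k)}\left[1-\frac{\nu}{\langle\bar{M}\rangle}+\frac{\nu}{\langle\bar{M}\rangle}e^{-\langle\bar{M}\rangle(1-\phi_k)}\right]f(\xi_k)-d(\xi_k)\right\}} \times \int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]\left.e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M} z_k^*(j)S_{kl}(j)z_l(j)}\right|_{z_M=e^{i\lambda}z_0} \tag{J7}$$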

The matrix S(j) in Eq. (J7) is defined by

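$$S(j) = \begin{pmatrix} I & 0 & 0 & \cdots & -e^{i\lambda_j}A_1 \\ -A_2 & I & 0 & \cdots & 0 \\ 0 & -A_3 & I & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -A_M & I \end{pmatrix} \tag{J8}$$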

Here Ak = I + ε(ξ̄kσ3 + η̄kσ1 + ϕ̄kD).

After calculating the Gaussian integral over the coherent states fields, we obtain

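$$\lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M} z_k^*(j)S_{kl}(j)z_l(j)} = \lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}[\det S(j)]^{-1} = \lim_{M\to\infty}\int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}e^{-\mathrm{Tr}\ln\left[I-e^{i\lambda_j}\hat{T}\exp\left(\varepsilon\sum_{k=1}^{M}(\bar{\xi}_k\sigma_3+\bar{\eta}_k\sigma_1+\bar{\phi}_kD)\right)\right]} = \lim_{M\to\infty}\prod_{j=1}^{N}\mathrm{Tr}\,\hat{T}e^{\varepsilon\sum_{k=1}^{M}(\bar{\xi}_k\sigma_3+\bar{\eta}_k\sigma_1+\bar{\phi}_kD)} = Q^N \tag{J9}$$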

where

$$Q = \mathrm{Tr}\,\hat{T}e^{\int_0^t dt'\,(\bar{\xi}\sigma_3+\bar{\eta}\sigma_1+\bar{\phi}D)} \tag{J10}$$

With this result the partition function in Eq. (J7) becomes Eq. (91).

APPENDIX K

We consider recombination in the Eigen model. The matrix elements of the Hamiltonian operator in the coherent states basis are given, to order 𝒪(N), by

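$$\frac{\langle z_k|-\hat{H}|z_{k-1}\rangle}{\langle z_k|z_{k-1}\rangle} = N e^{-\mu}e^{\frac{\mu}{N}\sum_{j=1}^{N} z_k^*(j)\sigma_1 z_{k-1}(j)} \times \left[1-\nu+\nu\,g\!\left(\{z_k^*(j)D_{jl}z_{k-1}(j)\}\right)\right] f\!\left[\frac{1}{N}\sum_{j=1}^{N} z_k^*(j)\sigma_3 z_{k-1}(j)\right] - N d\!\left[\frac{1}{N}\sum_{j=1}^{N} z_k^*(j)\sigma_3 z_{k-1}(j)\right] \tag{K1}$$

Here we notice that the function $g(\{z_k^*(j)D_{jl}z_{k-1}(j)\})$ is the same as in Eq. (43). Therefore, the same analysis presented through Eqs. (43)–(45) regarding the singular behavior of the function g applies for the Eigen model as well. Hence, in the large N limit, we have $g\!\left(\frac{1}{N}\sum_{j=1}^{N} z_k^*(j)Dz_{k-1}(j)\right)$, with $D = D_{jl}$ being again the matrix defined in Eq. (42).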

We introduce the auxiliary fields

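$$\xi_k = \frac{1}{N}\sum_{j=1}^{N} z_k^*(j)\sigma_3 z_{k-1}(j) \tag{K2}$$
$$\eta_k = \frac{1}{N}\sum_{j=1}^{N} z_k^*(j)\sigma_1 z_{k-1}(j) \tag{K3}$$
$$\phi_k = \frac{1}{N}\sum_{j=1}^{N} z_k^*(j) D z_{k-1}(j) \tag{K4}$$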

and the corresponding conjugate fields ξ̄k, η̄k and φ̄k to enforce the constraints via Laplace representations of the Dirac delta functions. Thus, we have

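$$e^{-\hat{H}t} = \lim_{M\to\infty}\int[\mathcal{D}z^*\mathcal{D}z]\int\left[\prod_{k=1}^{M}\frac{i\varepsilon N\,d\bar{\xi}_k\,d\xi_k}{2\pi}\,\frac{i\varepsilon N\,d\bar{\eta}_k\,d\eta_k}{2\pi}\,\frac{i\varepsilon N\,d\bar{\phi}_k\,d\phi_k}{2\pi}\right] \times |z_M\rangle\langle z_0|\, e^{-\frac{1}{2}\sum_{k=1}^{M}\sum_{j=1}^{N}\left[z_k^*(j)\cdot z_k(j)+z_{k-1}^*(j)\cdot z_{k-1}(j)-2z_k^*(j)\cdot z_{k-1}(j)\right]} \times e^{\varepsilon\sum_{k=1}^{M}\sum_{j=1}^{N} z_k^*(j)[\bar{\xi}_k\sigma_3+\bar{\eta}_k\sigma_1+\bar{\phi}_kD]z_{k-1}(j)}\, e^{-\varepsilon N\sum_{k=1}^{M}[\bar{\xi}_k\xi_k+\bar{\eta}_k\eta_k+\bar{\phi}_k\phi_k]} \times e^{\varepsilon N\sum_{k=1}^{M}\left[e^{-\mu(1-\eta_k)}\left(1-\nu+\nu g(\phi_k)\right)f(\xi_k)-d(\xi_k)\right]} \tag{K5}$$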

The partition function is expressed by

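$$Z = \mathrm{Tr}\left[e^{-\hat{H}t}\hat{P}\right] = \int_0^{2\pi}\left[\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\right]\lim_{M\to\infty}\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]\left.e^{-S[z^*,z]}\right|_{z_0=e^{i\lambda}z_M} \tag{K6}$$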

By inserting Eq. (K5), we obtain

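$$Z = \lim_{M\to\infty}\int[\mathcal{D}\xi\mathcal{D}\bar{\xi}\mathcal{D}\bar{\eta}\mathcal{D}\eta\mathcal{D}\bar{\phi}\mathcal{D}\phi] \times e^{-\varepsilon N\sum_{k=1}^{M}[\bar{\xi}_k\xi_k+\bar{\eta}_k\eta_k+\bar{\phi}_k\phi_k]}\, e^{\varepsilon N\sum_{k=1}^{M}\left[e^{-\mu(1-\eta_k)}\left(1-\nu+\nu g(\phi_k)\right)f(\xi_k)-d(\xi_k)\right]} \times \int_0^{2\pi}\prod_{j=1}^{N}\frac{d\lambda_j}{2\pi}e^{-i\lambda_j}\int\left[\prod_{k=1}^{M}\mathcal{D}z_k^*\mathcal{D}z_k\right]\left.e^{-\sum_{j=1}^{N}\sum_{k,l=1}^{M} z_k^*(j)S_{kl}(j)z_l(j)}\right|_{z_0=e^{i\lambda}z_M} \tag{K7}$$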

The Gaussian integral can be performed over the coherent state fields, to obtain the representation in Eq. (102). Here, the one-dimensional Ising trace is defined by

$$Q = \mathrm{Tr}\,\hat{T}e^{\int_0^t dt'\,(\bar{\xi}\sigma_3+\bar{\eta}\sigma_1+\bar{\phi}D)} \tag{K8}$$

APPENDIX L

We analyze the effect of introducing different schemes of horizontal gene transfer in the parallel model. For the parallel model in the presence of horizontal gene transfer with blocks of size M̄ = 1, we obtain

$$\left.\frac{du}{d\nu}\right|_{\nu\to 0} = \frac{u_0\xi_0+\sqrt{1-\xi_0^2}-1}{2f'(u_0)} \tag{L1}$$

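Here, (ξ0, u0) represents the solution for ν = 0, i.e., they are obtained from the system

$$\mathcal{F}[\xi] = f(\xi)+\mu\sqrt{1-\xi^2}-\mu \tag{L2}$$
$$\left.\frac{\partial\mathcal{F}}{\partial\xi}\right|_{\xi=\xi_0} = 0 = f'(\xi_0)-\frac{\mu\xi_0}{\sqrt{1-\xi_0^2}} \tag{L3}$$
$$f_m = f(u_0) = \mathcal{F}[\xi_0] = f(\xi_0)+\mu\sqrt{1-\xi_0^2}-\mu \tag{L4}$$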

From Eq. (L4), we obtain u0 from the inverse function

$$u_0 = f^{-1}\left[\mathcal{F}[\xi_0]\right] = f^{-1}\left[f(\xi_0)+\mu\sqrt{1-\xi_0^2}-\mu\right] \tag{L5}$$

Let us Taylor-expand Eq. (L5) near x = f(ξ0),

$$u_0 = f^{-1}[x]+(f^{-1})'[x]\,\delta x+(f^{-1})''[x]\,\frac{(\delta x)^2}{2} \tag{L6}$$

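with $\delta x = \mu\left(\sqrt{1-\xi_0^2}-1\right)$. Here, we use the inverse function theorem to obtain the derivatives

$$(f^{-1})'[x] = \frac{1}{f'(f^{-1}[x])} = \frac{1}{f'(\xi_0)}$$
$$(f^{-1})''[x] = -\frac{f''(f^{-1}[x])}{\left(f'(f^{-1}[x])\right)^3} = -\frac{f''(\xi_0)}{\left(f'(\xi_0)\right)^3} \tag{L7}$$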
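The two derivatives in Eq. (L7) can be checked numerically with finite differences. The sketch below uses an illustrative monotonic function f(u) = u + u³ (an assumption, not a fitness function from the paper) and a bisection inverse; `f_inv` and the step `h` are hypothetical choices.

```python
def f(u):
    # Illustrative strictly increasing function (assumption for this check).
    return u + u**3

def f_inv(x, lo=-10.0, hi=10.0, n_iter=200):
    """Invert f by bisection; valid because f is strictly increasing."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

xi0 = 0.4
x = f(xi0)
h = 1e-3

# Central finite differences of the inverse function at x = f(xi0).
d1_fd = (f_inv(x + h) - f_inv(x - h)) / (2.0 * h)
d2_fd = (f_inv(x + h) - 2.0 * f_inv(x) + f_inv(x - h)) / h**2

# Closed forms from Eq. (L7).
fp = 1.0 + 3.0 * xi0**2      # f'(xi0)
fpp = 6.0 * xi0              # f''(xi0)
d1 = 1.0 / fp
d2 = -fpp / fp**3
print(d1_fd, d1, d2_fd, d2)
```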

Hence, Eq. (L6) becomes

$$u_0 = \xi_0+\frac{\delta x}{f'(\xi_0)}-\frac{f''(\xi_0)}{\left(f'(\xi_0)\right)^3}\frac{(\delta x)^2}{2} \tag{L8}$$

From Eq. (L3), we have

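$$\frac{\delta x}{f'(\xi_0)} = \frac{\mu\left(\sqrt{1-\xi_0^2}-1\right)}{\mu\xi_0/\sqrt{1-\xi_0^2}} = \frac{1-\xi_0^2-\sqrt{1-\xi_0^2}}{\xi_0} \tag{L9}$$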

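Substituting Eq. (L9) into Eq. (L8), after multiplying by ξ0, we have

$$u_0\xi_0 = \xi_0^2+\xi_0\frac{\delta x}{f'(\xi_0)}-\xi_0\frac{f''(\xi_0)}{\left(f'(\xi_0)\right)^3}\frac{(\delta x)^2}{2} = \xi_0^2+\xi_0\left(\frac{1-\xi_0^2-\sqrt{1-\xi_0^2}}{\xi_0}\right)-\frac{\xi_0}{f'(\xi_0)}\frac{f''(\xi_0)}{\left(f'(\xi_0)\right)^2}\frac{(\delta x)^2}{2} = 1-\sqrt{1-\xi_0^2}-\frac{f''(\xi_0)(\delta x)^2}{2\left(f'(\xi_0)\right)^2}\frac{\sqrt{1-\xi_0^2}}{\mu} \tag{L10}$$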

Therefore, we finally obtain

$$u_0\xi_0+\sqrt{1-\xi_0^2}-1 = -\frac{f''(\xi_0)}{2}\frac{(\delta x)^2}{\left(f'(\xi_0)\right)^2}\frac{\sqrt{1-\xi_0^2}}{\mu} \tag{L11}$$

The sign of this expression is clearly determined by −f″ (ξ0), and hence after Eq. (L1) we obtain the condition

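$$\left.\frac{du}{d\nu}\right|_{\nu\to 0} = \begin{cases} >0 & \text{if } f''(\xi_0)<0 \\ <0 & \text{if } f''(\xi_0)>0 \end{cases} \tag{L12}$$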

From Eq. (L12), we conclude that horizontal gene transfer will enhance selection towards the fittest individuals when negative epistasis is present [f″(u) < 0], while it will introduce an additional load against selection, with the corresponding deleterious effect on the mean fitness, when positive epistasis is present [f″(u) > 0]. This result proves that the mutational deterministic hypothesis holds for horizontal gene transfer of blocks of size M̄ = 1 in the parallel model.
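The sign rule in Eq. (L12) can be illustrated numerically. The sketch below evaluates the numerator of Eq. (L1) on a grid for one concave and one convex fitness function; the functions f(u) = √u and f(u) = u², the value μ = 0.5, and the helper name `hgt_slope_numerator` are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def hgt_slope_numerator(f, f_inv, mu, n_grid=200001):
    """Return (xi0, u0, u0*xi0 + sqrt(1 - xi0^2) - 1) for the parallel model at nu = 0."""
    xi = np.linspace(0.0, 1.0, n_grid)
    F = f(xi) + mu * np.sqrt(1.0 - xi**2) - mu      # Eq. (L2)
    i = np.argmax(F)                                 # grid maximum, cf. Eq. (L3)
    xi0 = xi[i]
    u0 = f_inv(F[i])                                 # Eq. (L5)
    return xi0, u0, u0 * xi0 + np.sqrt(1.0 - xi0**2) - 1.0

mu = 0.5
# Negative epistasis (f'' < 0): f(u) = sqrt(u), inverse y -> y^2.
_, _, s_concave = hgt_slope_numerator(np.sqrt, np.square, mu)
# Positive epistasis (f'' > 0): f(u) = u^2, inverse y -> sqrt(y).
_, u0_convex, s_convex = hgt_slope_numerator(np.square, np.sqrt, mu)
print(s_concave, s_convex, u0_convex)
```

Since f′(u0) > 0 in both examples, the sign of the returned numerator is the sign of du/dν at ν → 0. For the quadratic example the grid value of u0 can be compared with 1 − μ/k for f(u) = ku²/2 with k = 2.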

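For the case of horizontal gene transfer of blocks of size M̄ > 1, we obtain the equation

$$\left.\frac{du}{d\nu}\right|_{\nu\to 0} = \frac{\left[1+\frac{u_0\xi_0-1+\sqrt{1-\xi_0^2}}{2}\right]^{\bar{M}}-1}{\bar{M}f'(u_0)} \tag{L13}$$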

We notice by expanding the binomial up to first order, that the leading term in Eq. (L13) is

$$\left.\frac{du}{d\nu}\right|_{\nu\to 0} \sim \frac{u_0\xi_0-1+\sqrt{1-\xi_0^2}}{2f'(u_0)} \tag{L14}$$

which is identical to Eq. (L1), and hence the analysis presented for the case M̄ = 1 also applies for M̄ > 1, in particular Eq. (L12).

For the process of horizontal gene transfer with multiple-size blocks, with average 〈M̄〉, we obtain the equation

$$\left.\frac{du}{d\nu}\right|_{\nu\to 0} = \frac{e^{\frac{\langle\bar{M}\rangle}{2}\left(u_0\xi_0-1+\sqrt{1-\xi_0^2}\right)}-1}{\langle\bar{M}\rangle f'(u_0)} \tag{L15}$$

By expanding the exponential at first order, we obtain that the leading term in this case is also Eq. (L14), which is identical to Eq. (L1). Therefore, the analysis presented for M̄ = 1, and in particular Eq. (L12), applies in this case as well.
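As a quick check of the reduction of Eqs. (L13) and (L15) to Eq. (L14), the sketch below evaluates both block-size factors for a few values of x = u0ξ0 − 1 + √(1 − ξ0²). The function names and the sampled values of x and of the block size are illustrative assumptions.

```python
import numpy as np

def fixed_block_factor(x, M):
    """Numerator of Eq. (L13) divided by the block size M."""
    return ((1.0 + x / 2.0) ** M - 1.0) / M

def poisson_block_factor(x, M_avg):
    """Numerator of Eq. (L15) divided by the average block size."""
    return (np.exp(M_avg * x / 2.0) - 1.0) / M_avg

same_sign = True
for x in (-0.3, -0.01, 0.01, 0.3):
    for M in (1, 2, 5, 10):
        same_sign &= np.sign(fixed_block_factor(x, M)) == np.sign(x)
        same_sign &= np.sign(poisson_block_factor(x, M)) == np.sign(x)

# For small x both factors approach x/2, the numerator of Eq. (L14).
x_small = 1e-4
ratio_fixed = fixed_block_factor(x_small, 5) / (x_small / 2.0)
ratio_poisson = poisson_block_factor(x_small, 5) / (x_small / 2.0)
print(same_sign, ratio_fixed, ratio_poisson)
```

In these samples both factors carry the sign of x, so the sign of du/dν is governed by the same combination as for M̄ = 1.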

In conclusion, we proved that the mutational deterministic hypothesis, expressed in quantitative form by Eq. (L12), holds for the different forms of horizontal gene transfer discussed in our work for the parallel model.

APPENDIX M

We analyze the effect of introducing different schemes of horizontal gene transfer in the Eigen model.

For the Eigen model in the presence of horizontal gene transfer, and for zero degradation rate d(u) = 0, we obtain the equation

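$$\left.\frac{du}{d\nu}\right|_{\nu\to 0} = \frac{u_0\xi_0+\sqrt{1-\xi_0^2}-1}{2f'(u_0)}\,e^{-\mu\left[1-\sqrt{1-\xi_0^2}\right]}f(\xi_0) \tag{M1}$$

The sign of this derivative is determined by the combination $u_0\xi_0+\sqrt{1-\xi_0^2}-1$, where (ξ0, u0) represents the solution for ν = 0, i.e., they are obtained from the system

$$\mathcal{F}[\xi] = f(\xi)\,e^{-\mu\left(1-\sqrt{1-\xi^2}\right)} \tag{M2}$$
$$\left.\frac{\partial\mathcal{F}}{\partial\xi}\right|_{\xi=\xi_0} = 0 = \left(f'(\xi_0)-\frac{\mu\xi_0}{\sqrt{1-\xi_0^2}}f(\xi_0)\right)e^{-\mu\left[1-\sqrt{1-\xi_0^2}\right]} \tag{M3}$$
$$f_m = f(u_0) = \mathcal{F}[\xi_0] = e^{-\mu\left(1-\sqrt{1-\xi_0^2}\right)}f(\xi_0) \tag{M4}$$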

By inverting Eq. (M4), we obtain u0

$$u_0 = f^{-1}\left[\mathcal{F}[\xi_0]\right] = f^{-1}\left[f(\xi_0)\,e^{-\mu\left(1-\sqrt{1-\xi_0^2}\right)}\right] \tag{M5}$$

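We expand Eq. (M5) near x = f(ξ0), by applying the identities in Eqs. (L6)–(L9),

$$u_0 = \xi_0+\frac{\delta x}{f'(\xi_0)}-\frac{f''(\xi_0)}{\left[f'(\xi_0)\right]^3}\frac{(\delta x)^2}{2} \tag{M6}$$

with $\delta x = \left[e^{-\mu\left(1-\sqrt{1-\xi_0^2}\right)}-1\right]f(\xi_0) \sim -\mu\left[1-\sqrt{1-\xi_0^2}\right]f(\xi_0)$. From Eq. (M3), we have

$$\frac{\delta x}{f'(\xi_0)} = \frac{\mu\left[\sqrt{1-\xi_0^2}-1\right]f(\xi_0)}{\frac{\mu\xi_0}{\sqrt{1-\xi_0^2}}f(\xi_0)} = \frac{1-\xi_0^2-\sqrt{1-\xi_0^2}}{\xi_0} \tag{M7}$$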

From Eq. (M7) into Eq. (M6), after multiplying by ξ0 we find

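$$u_0\xi_0 = \xi_0^2+\xi_0\,\frac{1-\xi_0^2-\sqrt{1-\xi_0^2}}{\xi_0}-\xi_0\frac{(\delta x)^2}{2}\frac{f''(\xi_0)}{\left[f'(\xi_0)\right]^3} = 1-\sqrt{1-\xi_0^2}-\frac{f''(\xi_0)\,\xi_0\,(\delta x)^2}{2\left[f'(\xi_0)\right]^3} \tag{M8}$$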

Hence, we obtain

$$u_0\xi_0+\sqrt{1-\xi_0^2}-1 = -\frac{f''(\xi_0)\,\xi_0\,(\delta x)^2}{2\left[f'(\xi_0)\right]^3} \tag{M9}$$

Clearly, the sign of this expression is determined by the sign of −f″(ξ0), and hence after Eq. (M1) we obtain the condition

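$$\left.\frac{du}{d\nu}\right|_{\nu\to 0} = \begin{cases} >0 & \text{if } f''(\xi_0)<0 \\ <0 & \text{if } f''(\xi_0)>0 \end{cases} \tag{M10}$$

which proves that the mutational deterministic hypothesis holds for horizontal gene transfer of blocks of size M̄ = 1 in the Eigen model.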
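The same sign rule can be illustrated for the Eigen model by evaluating the combination u0ξ0 + √(1 − ξ0²) − 1 on a grid. The fitness functions f(u) = 1 + √u and f(u) = 1 + u², the value μ = 0.5, and the helper name `eigen_hgt_numerator` are illustrative assumptions for this sketch, with zero degradation rate as in Eq. (M1).

```python
import numpy as np

def eigen_hgt_numerator(f, f_inv, mu, n_grid=200001):
    """Return (xi0, u0, u0*xi0 + sqrt(1 - xi0^2) - 1) for the Eigen model at nu = 0."""
    xi = np.linspace(0.0, 1.0, n_grid)
    F = f(xi) * np.exp(-mu * (1.0 - np.sqrt(1.0 - xi**2)))   # Eq. (M2)
    i = np.argmax(F)                                          # grid maximum, cf. Eq. (M3)
    xi0 = xi[i]
    u0 = f_inv(F[i])                                          # Eq. (M5)
    return xi0, u0, u0 * xi0 + np.sqrt(1.0 - xi0**2) - 1.0

mu = 0.5
# Negative epistasis (f'' < 0): f(u) = 1 + sqrt(u).
_, _, s_concave = eigen_hgt_numerator(lambda u: 1.0 + np.sqrt(u),
                                      lambda y: (y - 1.0) ** 2, mu)
# Positive epistasis (f'' > 0): f(u) = 1 + u^2.
_, _, s_convex = eigen_hgt_numerator(lambda u: 1.0 + u**2,
                                     lambda y: np.sqrt(y - 1.0), mu)
print(s_concave, s_convex)
```

Because f(ξ0), f′(u0), and the exponential prefactor in Eq. (M1) are positive in both examples, the sign of the printed values is the sign of du/dν at ν → 0.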

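For the case of horizontal gene transfer of blocks of size M̄ > 1, we obtain the equation

$$\left.\frac{du}{d\nu}\right|_{\nu\to 0} = \frac{\left[1+\frac{u_0\xi_0-1+\sqrt{1-\xi_0^2}}{2}\right]^{\bar{M}}-1}{\bar{M}f'(u_0)}\,e^{-\mu\left(1-\sqrt{1-\xi_0^2}\right)}f(\xi_0) \tag{M11}$$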

By expanding the binomial in the numerator of Eq. (M11) up to first order, we notice that the leading term is given by

$$\left.\frac{du}{d\nu}\right|_{\nu\to 0} \sim \frac{u_0\xi_0-1+\sqrt{1-\xi_0^2}}{2f'(u_0)}\,e^{-\mu\left(1-\sqrt{1-\xi_0^2}\right)}f(\xi_0) \tag{M12}$$

which is identical to Eq. (M1). Therefore, the analysis presented for the case M̄ = 1, and in particular Eq. (M10), applies for M̄ > 1 as well.

When considering the process of horizontal gene transfer of blocks of multiple size with average 〈M̄〉, we obtain the equation

$$\left.\frac{du}{d\nu}\right|_{\nu\to 0} = \frac{e^{\frac{\langle\bar{M}\rangle}{2}\left(u_0\xi_0-1+\sqrt{1-\xi_0^2}\right)}-1}{\langle\bar{M}\rangle f'(u_0)}\,f(\xi_0)\,e^{-\mu\left(1-\sqrt{1-\xi_0^2}\right)} \tag{M13}$$

By expanding the exponential in Eq. (M13) up to first order, we notice that the leading term is given by Eq. (M12) in this case as well, which is identical to Eq. (M1). Therefore, the analysis presented for the process with M̄ = 1, and in particular Eq. (M10), applies for the process of horizontal gene transfer of multiple size blocks as well.

Summarizing, we proved that the mutational deterministic hypothesis, expressed quantitatively in Eq. (M10), holds for the different forms of horizontal gene transfer studied in this work for the Eigen model.

APPENDIX N

For the case of two-parent recombination in the parallel model, we find that the phase structure is defined by two fitness functions. A low ν-dependent phase S1, defined as the maximum in ξ of

$$\mathcal{F}_\nu^{(1)}[\xi] = f(\xi)+\mu\left(\sqrt{1-\xi^2}-1\right)-\nu \tag{N1}$$

The maximum of this expression, attained at ξ0, is obtained from the equation

$$\frac{\partial}{\partial\xi}\mathcal{F}_\nu^{(1)}[\xi_0] = f'[\xi_0]-\frac{\mu\xi_0}{\sqrt{1-\xi_0^2}} = 0 \tag{N2}$$

We notice that the value ξ0 is the same as in the absence of recombination, when ν = 0. Therefore, from the self-consistency condition, we obtain for this phase

$$f_m^{(1)} = \mathcal{F}_\nu^{(1)}[\xi_0] = \mathcal{F}_0[\xi_0]-\nu = f(u_\nu) \tag{N3}$$

Here, we have denoted uν as the value of the average composition in phase S1, when the recombination rate is ν. Correspondingly, we also have from Eq. (N3) the exact relation

$$f(u_\nu) = f(u_0)-\nu \tag{N4}$$

with f(u0) = ℱ0[ξ0] and u0 the average composition in the absence of recombination, when ν = 0.

Let us define as u* the value of the average composition at the S2 phase, which is independent of the recombination rate. The value u* is obtained as the solution of the non-linear equation

$$f'(u_*) = \frac{2\mu u_*}{1-u_*^2} \tag{N5}$$

We consider in Eq. (N4) the value ν = ν* at which the average fitness of the S1 and S2 phases are identical, as the condition uν* = u*,

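$$\nu_* = f(u_0)-f(u_*) \tag{N6}$$

In Eq. (N6), let us consider the Taylor expansion of f(u*) near u0, up to first order in ε = u* − u0,

$$\nu_* = -\varepsilon f'(u_0)+\mathcal{O}(\varepsilon^2) \tag{N7}$$

We expand Eq. (N5) near u0 at first order in ε = u* − u0,

$$f'(u_0)+\varepsilon f''(u_0) = \frac{2\mu(u_0+\varepsilon)}{1-(u_0+\varepsilon)^2} \sim \frac{2\mu(u_0+\varepsilon)}{1-u_0^2}\left[1-\frac{2u_0}{1-u_0^2}\varepsilon\right]^{-1} = \frac{2\mu u_0}{1-u_0^2}+2\mu\frac{1+u_0^2}{(1-u_0^2)^2}\varepsilon+\mathcal{O}(\varepsilon^2) \tag{N8}$$

We solve explicitly for ε in Eq. (N8), and combine with Eq. (N7), to obtain an expression for ν*

$$\nu_* = \frac{f'(u_0)\left[f'(u_0)-\frac{2\mu u_0}{1-u_0^2}\right]}{f''(u_0)-2\mu\frac{1+u_0^2}{(1-u_0^2)^2}} \tag{N9}$$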

Let us now analyze the sign of ν* as a function of the sign of the curvature of the fitness function, as defined by f″. We consider the Laurent series of f(u) for small u. That is,

$$f(u) = ku^{\alpha}$$
$$f'(u) = k\alpha u^{\alpha-1}$$
$$f''(u) = k\alpha(\alpha-1)u^{\alpha-2} \tag{N10}$$

where α > 0 to satisfy the monotonically increasing condition. This family of polynomials provides a representation of arbitrary, monotonically increasing functions for small u0.

The case α = 0, corresponding to a constant identical fitness for all sequence types in the population, possesses the trivial solution after Eq. (N2) ξ0 = 0, which implies u0 = 0, and after Eq. (N5) u* = 0. Thus a single non-selective phase is observed for this case, both in the presence and in the absence of recombination.

From Eq. (N10), we have f″ < 0 for α < 1, f″ > 0 for α > 1 and f″ = 0 at α = 1. We analyze these possible cases separately. From Eq. (N10) into Eq. (N9), we have

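$$\nu_* = \frac{k\alpha u_0^{\alpha}\left(k\alpha u_0^{\alpha}-2\mu\frac{u_0^2}{1-u_0^2}\right)}{k\alpha(\alpha-1)u_0^{\alpha}-2\mu u_0^2\frac{1+u_0^2}{(1-u_0^2)^2}} \tag{N11}$$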

Case 1: α < 1, f″ < 0.

The denominator in Eq. (N11) is clearly negative, since α − 1 < 0 in this case.

The numerator, for u0 ≪ 1

$$k\alpha u_0^{\alpha}-2\mu\frac{u_0^2}{1-u_0^2} \sim k\alpha u_0^{\alpha}-2\mu u_0^2 > 0 \tag{N12}$$

Therefore, in this case $\nu_* = \frac{(>0)}{(<0)} < 0$, and hence u* − u0 > 0.

Case 2: 1 < α < 2, f″ > 0.

The denominator in Eq. (N11), for u0 ≪ 1 and α − 1 > 0,

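$$k\alpha(\alpha-1)u_0^{\alpha}-2\mu u_0^2\frac{1+u_0^2}{(1-u_0^2)^2} \sim k\alpha(\alpha-1)u_0^{\alpha}-2\mu u_0^2 > 0 \tag{N13}$$

The numerator is also positive, by the same argument as in Eq. (N12). Therefore, in this case $\nu_* = \frac{(>0)}{(>0)} > 0$, and hence u* < u0.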

Case 3: α > 2, f″ > 0.

The denominator in Eq. (N11), for u0 ≪ 1 and α − 1 > 0,

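$$k\alpha(\alpha-1)u_0^{\alpha}-2\mu u_0^2\frac{1+u_0^2}{(1-u_0^2)^2} \sim k\alpha(\alpha-1)u_0^{\alpha}-2\mu u_0^2 < 0 \tag{N14}$$

The numerator is

$$k\alpha u_0^{\alpha}-2\mu\frac{u_0^2}{1-u_0^2} \sim k\alpha u_0^{\alpha}-2\mu u_0^2 < 0 \tag{N15}$$

Therefore, in this case $\nu_* = \frac{(<0)}{(<0)} > 0$, and hence u* − u0 < 0.

For α = 1, we obtain an exact solution from Eq. (N2), $u_0 = \sqrt{1+\mu^2/k^2}-\mu/k$. This result in Eq. (N11) yields ν* = 0, and thus u* = u0 for this particular case.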

For α = 2, we have the analytical solution presented in Eqs. (71),

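$$u_*-u_0 = \sqrt{1-\frac{2\mu}{k}}-\left(1-\frac{\mu}{k}\right) = \sqrt{\left(1-\frac{\mu}{k}\right)^2-\frac{\mu^2}{k^2}}-\left(1-\frac{\mu}{k}\right) < 0 \tag{N16}$$

with $\nu_* = \frac{\mu^2}{2k} > 0$.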

Summarizing, we proved that

$$u_*-u_0 = \begin{cases} >0, & f''<0 \\ <0, & f''>0 \end{cases} \tag{N17}$$

This result proves the mutational deterministic hypothesis for two-parent recombination in the parallel model.
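The sign pattern in Eq. (N17) can be illustrated numerically for the power-law family of Eq. (N10). The sketch below computes u0 from the ν = 0 maximum of Eq. (N1) on a grid and u* from Eq. (N5) by bisection. The parameter values k = 1 and μ = 0.5, the sampled exponents, and the helper names are illustrative assumptions.

```python
import numpy as np

def bisect(h, lo, hi, n_iter=200):
    """Plain bisection; assumes h(lo) > 0 > h(hi)."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def compositions_parallel(k, alpha, mu, n_grid=200001):
    """Return (u0, u_star) for f(u) = k*u**alpha in the parallel model."""
    xi = np.linspace(0.0, 1.0, n_grid)
    F0 = k * xi**alpha + mu * (np.sqrt(1.0 - xi**2) - 1.0)   # Eq. (N1) at nu = 0
    u0 = (np.max(F0) / k) ** (1.0 / alpha)                    # f(u0) = F0[xi0]
    # Eq. (N5) divided by u: k*alpha*u**(alpha-2)*(1-u**2) = 2*mu
    h = lambda u: k * alpha * u ** (alpha - 2.0) * (1.0 - u**2) - 2.0 * mu
    u_star = bisect(h, 1e-9, 1.0 - 1e-12)
    return u0, u_star

k, mu = 1.0, 0.5
diff = {}
for alpha in (0.5, 1.0, 1.5):
    u0, u_star = compositions_parallel(k, alpha, mu)
    diff[alpha] = u_star - u0
    nu_star = k * u0**alpha - k * u_star**alpha               # Eq. (N6)
    print(alpha, u0, u_star, nu_star)
```

The bisection bracket is valid for 0 < α < 2, where the left-hand side of the rescaled Eq. (N5) decreases monotonically in u; other exponents would need a different bracket.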

APPENDIX O

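For the case of two-parent recombination in the Eigen model, we find that the phase structure is defined by two fitness functions. A low ν-dependent phase S1, defined as the maximum in ξ of

$$\mathcal{F}_\nu^{(1)}[\xi] = (1-\nu)\,e^{-\mu\left[1-\sqrt{1-\xi^2}\right]}f(\xi) \tag{O1}$$

The maximum of this expression, attained at ξ0, is obtained from the equation

$$\frac{\partial}{\partial\xi}\mathcal{F}_\nu^{(1)}[\xi_0] = 0$$
$$f'(\xi_0) = \frac{\mu\xi_0}{\sqrt{1-\xi_0^2}}f(\xi_0)$$
$$[\ln f(\xi_0)]' = \frac{\mu\xi_0}{\sqrt{1-\xi_0^2}} \tag{O2}$$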

We notice that the value ξ0 is the same as in the absence of recombination, when ν = 0. Therefore, from the self-consistency condition, we obtain for this phase

$$f_m^{(1)} = \mathcal{F}_\nu^{(1)}[\xi_0] = (1-\nu)\mathcal{F}_0[\xi_0] = f(u_\nu) \tag{O3}$$

Here, we have denoted uν as the value of the average composition in phase S1, when the recombination rate is ν. Correspondingly, we also have from Eq. (O3) the exact relation

$$f(u_\nu) = (1-\nu)f(u_0) \tag{O4}$$

with f(u0) = ℱ0[ξ0] and u0 the average composition in the absence of recombination, when ν = 0.

Let us define as u* the value of the average composition at the S2 phase, which is independent of the recombination rate. The value u* is obtained as the solution of the non-linear equation

$$f'(u_*) = \frac{2\mu u_*}{1-u_*^2}f(u_*)$$
$$[\ln f(u_*)]' = \frac{2\mu u_*}{1-u_*^2} \tag{O5}$$

We consider in Eq. (O3) the value ν = ν* at which the average fitness of the two phases are equal, as the condition uν* = u*,

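$$1-\nu_* = \frac{f(u_*)}{f(u_0)} \tag{O6}$$

We take the logarithm of this expression, and Taylor expand up to first order in ε = u* − u0,

$$\ln(1-\nu_*) = \ln[f(u_0+\varepsilon)]-\ln[f(u_0)]$$
$$\nu_* = -\varepsilon[\ln f(u_0)]' \tag{O7}$$

We expand Eq. (O5) near u0 at first order in ε = u* − u0,

$$[\ln f(u_0)]'+\varepsilon[\ln f(u_0)]'' = \frac{2\mu(u_0+\varepsilon)}{1-(u_0+\varepsilon)^2} = \frac{2\mu(u_0+\varepsilon)}{1-u_0^2}\left[1-\frac{2u_0}{1-u_0^2}\varepsilon\right]^{-1}+\mathcal{O}(\varepsilon^2) = \frac{2\mu u_0}{1-u_0^2}+2\mu\frac{1+u_0^2}{(1-u_0^2)^2}\varepsilon+\mathcal{O}(\varepsilon^2) \tag{O8}$$

We solve explicitly for ε in Eq. (O8), and combine with Eq. (O7), to obtain an expression for ν*

$$\nu_* = \frac{[\ln f(u_0)]'\left\{[\ln f(u_0)]'-\frac{2\mu u_0}{1-u_0^2}\right\}}{[\ln f(u_0)]''-\frac{2\mu(1+u_0^2)}{(1-u_0^2)^2}} \tag{O9}$$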

The analysis follows the same lines as in the parallel model case. That is, we analyze the sign of ν* after Eq. (O9). We consider a family of polynomials f(u) = kuα + k0, which for u0 ≪ 1

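$$\ln f(u) = \ln\left(1+\frac{k}{k_0}u^{\alpha}\right)+\ln(k_0) \sim \frac{k}{k_0}u^{\alpha}+\ln(k_0)$$
$$[\ln f(u)]' = \alpha\frac{k}{k_0}u^{\alpha-1}$$
$$[\ln f(u)]'' = \alpha(\alpha-1)\frac{k}{k_0}u^{\alpha-2} \tag{O10}$$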

with α > 0 to satisfy the monotonically increasing condition. This family of polynomials provides a representation of smooth and monotonically increasing functions for small u0.

The case α = 0 corresponds to a constant identical fitness for all sequence types in the population, and possesses the trivial solution after Eq. (O2) ξ0 = 0, which implies u0 = 0, and after Eq. (O5) u* = 0. Therefore, a single non-selective phase is observed for this case, both in the presence and in the absence of recombination.

From Eq. (O10), we have f″ < 0 for α < 1, f″ > 0 for α > 1 and f″ = 0 at α = 1. We analyze these possible cases separately. From Eq. (O10) into Eq. (O9), we have

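$$\nu_* = \frac{\frac{k}{k_0}\alpha u_0^{\alpha}\left(\frac{k}{k_0}\alpha u_0^{\alpha}-2\mu\frac{u_0^2}{1-u_0^2}\right)}{\frac{k}{k_0}\alpha(\alpha-1)u_0^{\alpha}-2\mu u_0^2\frac{1+u_0^2}{(1-u_0^2)^2}} \tag{O11}$$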

Case 1: α < 1, f″ < 0.

The denominator in Eq. (O11) is clearly negative, since α − 1 < 0 in this case.

The numerator, for u0 ≪ 1

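$$\frac{k}{k_0}\alpha u_0^{\alpha}-2\mu\frac{u_0^2}{1-u_0^2} \sim \frac{k}{k_0}\alpha u_0^{\alpha}-2\mu u_0^2 > 0 \tag{O12}$$

Therefore, in this case $\nu_* = \frac{(>0)}{(<0)} < 0$, and hence u* − u0 > 0.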

Case 2: 1 < α < 2, f″ > 0.

The denominator in Eq. (O11), for u0 ≪ 1 and α − 1 > 0,

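$$\frac{k}{k_0}\alpha(\alpha-1)u_0^{\alpha}-2\mu u_0^2\frac{1+u_0^2}{(1-u_0^2)^2} \sim \frac{k}{k_0}\alpha(\alpha-1)u_0^{\alpha}-2\mu u_0^2 > 0 \tag{O13}$$

The numerator is also positive, by the same argument as in Eq. (O12). Therefore, in this case $\nu_* = \frac{(>0)}{(>0)} > 0$, and hence u* < u0.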

Case 3: α > 2, f″ > 0.

The denominator in Eq. (O11), for u0 ≪ 1 and α − 1 > 0,

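$$\frac{k}{k_0}\alpha(\alpha-1)u_0^{\alpha}-2\mu u_0^2\frac{1+u_0^2}{(1-u_0^2)^2} \sim \frac{k}{k_0}\alpha(\alpha-1)u_0^{\alpha}-2\mu u_0^2 < 0 \tag{O14}$$

The numerator is

$$\frac{k}{k_0}\alpha u_0^{\alpha}-2\mu\frac{u_0^2}{1-u_0^2} \sim \frac{k}{k_0}\alpha u_0^{\alpha}-2\mu u_0^2 < 0 \tag{O15}$$

Therefore, in this case $\nu_* = \frac{(<0)}{(<0)} > 0$, and hence u* − u0 < 0.

For α = 1, we find that for u* ≪ 1 and u0 ≪ 1, $u_* = \frac{k}{2\mu k_0}+\mathcal{O}\left(\frac{k}{2\mu k_0}\right)^2$, $\xi_0 = \frac{k}{\mu k_0}+\mathcal{O}\left(\frac{k}{2\mu k_0}\right)^2$ and $u_0 = \frac{k}{2\mu k_0}+\mathcal{O}\left(\frac{k}{2\mu k_0}\right)^2$. Therefore, u* − u0 = 0 and ν* = 0 in this case.

For α = 2, we have the exact solution expressed in Eqs. (113), (114). The region of parameter space where phases S1 and S2 intersect is $\frac{2\mu k_0}{k} < 1$. We analyze these formulas considering that u* < 1 and u0 < 1. It is convenient to define in this case the small parameter $\varepsilon = \frac{2\mu k_0}{k} < 1$. From Eq. (O5), we have

$$u_* = \frac{1}{\sqrt{1+\mu}}-\mathcal{O}(\varepsilon) \tag{O16}$$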

Expanding Eq. (113) up to first order in ε, we obtain the result

u0=21+μ21μ2𝒪(ε) (O17)

Therefore, for ε ≪ 1, from Eq. (O17) and Eq. (O16), when α = 2, u* < u0, and hence ν* > 0.

Summarizing, we have shown that

$$u_*-u_0 = \begin{cases} >0, & f''<0 \\ <0, & f''>0 \end{cases} \tag{O18}$$

This result proves the mutational deterministic hypothesis for two-parent recombination in the Eigen model.
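The sign pattern in Eq. (O18) can be illustrated numerically for the family f(u) = kuα + k0. The sketch below computes u0 from the ν = 0 maximum of Eq. (O1) on a grid and u* from Eq. (O5) by bisection. The values k = k0 = 1 and μ = 5 are illustrative assumptions, chosen so that the compositions are small, which is the regime (u0 ≪ 1) in which the argument above is made; the sketch makes no claim outside that regime.

```python
import numpy as np

def bisect(h, lo, hi, n_iter=200):
    """Plain bisection; assumes h(lo) > 0 > h(hi)."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def compositions_eigen(k, k0, alpha, mu, n_grid=200001):
    """Return (u0, u_star) for f(u) = k*u**alpha + k0 in the Eigen model."""
    xi = np.linspace(0.0, 1.0, n_grid)
    # Eq. (O1) at nu = 0.
    F0 = (k * xi**alpha + k0) * np.exp(-mu * (1.0 - np.sqrt(1.0 - xi**2)))
    u0 = ((np.max(F0) - k0) / k) ** (1.0 / alpha)             # f(u0) = F0[xi0]
    # Eq. (O5) with denominators cleared: f'(u)*(1-u^2) - 2*mu*u*f(u) = 0
    h = lambda u: (k * alpha * u ** (alpha - 1.0) * (1.0 - u**2)
                   - 2.0 * mu * u * (k * u**alpha + k0))
    u_star = bisect(h, 1e-9, 1.0 - 1e-12)
    return u0, u_star

k, k0, mu = 1.0, 1.0, 5.0
diff = {}
for alpha in (0.5, 1.5):
    u0, u_star = compositions_eigen(k, k0, alpha, mu)
    diff[alpha] = u_star - u0
    nu_star = 1.0 - (k * u_star**alpha + k0) / (k * u0**alpha + k0)   # Eq. (O6)
    print(alpha, u0, u_star, nu_star)
```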

References

  • 1. Cohen E, Kessler DA, Levine H. Phys. Rev. Lett. 2005;94:098102. doi: 10.1103/PhysRevLett.94.098102.
  • 2. Muller HJ. Mutation Research. 1964;1:2. doi: 10.1016/0027-5107(64)90047-8.
  • 3. Lawrence JG. Trends Microbiol. 1997;5:355. doi: 10.1016/S0966-842X(97)01110-4.
  • 4. Patten PA, Howard RJ, Stemmer WPC. Curr. Opin. Biotechnol. 1997;8:724. doi: 10.1016/s0958-1669(97)80127-9.
  • 5. Lutz S, Benkovic SJ. Curr. Opin. Biotechnol. 2000;11:319. doi: 10.1016/s0958-1669(00)00106-3.
  • 6. Otto SP, Lenormand T. Nature Rev. Genet. 2002;3:252. doi: 10.1038/nrg761.
  • 7. de Visser JAGM, Elena SF. Nature Rev. Genet. 2007;8:139. doi: 10.1038/nrg1985.
  • 8. Misevic D, Ofria C, Lenski RE. Proc. R. Soc. B. 2006;273:457. doi: 10.1098/rspb.2005.3338.
  • 9. Kondrashov AS. Genet. Res. 1982;42:325. doi: 10.1017/s0016672300019194.
  • 10. Kondrashov AS. J. Hered. 1993;84:372. doi: 10.1093/oxfordjournals.jhered.a111358.
  • 11. Azevedo RBR, Lohaus R, Srinivasan S, Dang KK, Burch CL. Nature. 2006;440:87. doi: 10.1038/nature04488.
  • 12. Phillips PC, Otto SP, Whitlock MC. In: Beyond the average. Wolf JB, Brodie ED III, Wade MJ, editors. Oxford University Press; 2000. ISBN-0-19-512806-0, chap. The evolutionary importance of gene interactions and variability of epistatic effects.
  • 13. Kimura M, Maruyama T. Genetics. 1966;54:1337. doi: 10.1093/genetics/54.6.1337.
  • 14. Kondrashov AS. Nature. 1988;336:435. doi: 10.1038/336435a0.
  • 15. Kouyos RD, Silander OK, Bonhoeffer S. Trends Ecol. Evol. 2007;22:310. doi: 10.1016/j.tree.2007.02.014.
  • 16. Rice WR, Chippindale AK. Science. 2001;294:555. doi: 10.1126/science.1061380.
  • 17. Kouyos RD, Otto SP, Bonhoeffer S. Genetics. 2006;173:589. doi: 10.1534/genetics.105.053108.
  • 18. Liberman U, Feldman MW. Theor. Popul. Biol. 2005;67:141. doi: 10.1016/j.tpb.2004.11.001.
  • 19. Liberman U, Puniyani A, Feldman MW. Theor. Popul. Biol. 2007;71:230. doi: 10.1016/j.tpb.2006.10.00.
  • 20. Liberman U, Feldman M. Theor. Popul. Biol. 2008:307–316. doi: 10.1016/j.tpb.2007.11.010.
  • 21. Malmberg RL. Genetics. 1977;86:607. doi: 10.1093/genetics/86.3.607.
  • 22. Bonhoeffer S, Chappey C, Parkin NT, Whitcomb JM, Petropoulos CJ. Science. 2004;306:1547. doi: 10.1126/science.1101786.
  • 23. Wloch DM, Borts RH, Korona R. J. Evol. Biol. 2001;14:310.
  • 24. Eigen M, Schuster P. Naturwissenschaften. 1971;58:465. doi: 10.1007/BF00623322.
  • 25. Eigen M, McCaskill J, Schuster P. J. Phys. Chem. 1988;92:6881.
  • 26. Eigen M, McCaskill J, Schuster P. Adv. Chem. Phys. 1989;75:149.
  • 27. Biebricher CK, Eigen M. Virus Res. 2005;107:117. doi: 10.1016/j.virusres.2004.11.002.
  • 28. Crow JF, Kimura M. An introduction to population genetics theory. New York: Harper and Row; 1970.
  • 29. Baake E, Wagner H. Genet. Res. Camb. 2001;78:93. doi: 10.1017/s0016672301005110.
  • 30. Tarazona P. Phys. Rev. A. 1992;45:6038. doi: 10.1103/physreva.45.6038.
  • 31. Leuthausser I. J. Stat. Phys. 1987;48:343.
  • 32. Franz S, Peliti L. J. Phys. A: Math. Gen. 1997;30:4481.
  • 33. Park J-M, Deem MW. J. Stat. Phys. 2006;123:975.
  • 34. Saakian DB, Muñoz E, Hu C-K, Deem MW. Phys. Rev. E. 2006;73:041913. doi: 10.1103/PhysRevE.73.041913.
  • 35. Domingo E, Sabo D, Taniguchi T, Weissman C. Cell. 1978;13:735. doi: 10.1016/0092-8674(78)90223-4.
  • 36. Domingo E, Escarmis C, Lazaro E, Manrubia SC. Virus Res. 2005;107:129. doi: 10.1016/j.virusres.2004.11.003.
  • 37. Ortin J, Najera R, Lopez C, Davila M, Domingo E. Gene. 1980;11:319. doi: 10.1016/0378-1119(80)90072-4.
  • 38. Domingo E, Martinez-Salas E, Sobrino F, de la Torre JC, Portela A, Ortin J, Lopez-Galindez C, Pérez-Breña P, Villanueva N, Najera R, et al. Gene. 1985;40:1. doi: 10.1016/0378-1119(85)90017-4.
  • 39. Eigen M. Proc. Natl. Acad. Sci. USA. 2002;99:13374. doi: 10.1073/pnas.212514799.
  • 40. Graci JD, Harki DA, Korneeva VS, Edathil JP, Too K, Franco D, Smidansky ED, Paul AV, Peterson BR, Brown DM, et al. J. Virol. 2007;81:11256. doi: 10.1128/JVI.01028-07.
  • 41. Loeb LA, Essigmann JM, Kazazi F, Zhang J, Rose KD. Proc. Natl. Acad. Sci. USA. 1999;96:1492. doi: 10.1073/pnas.96.4.1492.
  • 42. Loeb LA, Mullins JI. AIDS Res. Hum. Retroviruses. 2000;16:1. doi: 10.1089/088922200309539.
  • 43. Bull JJ, Sanjuan R, Wilke CO. J. Virol. 2007;81:2930. doi: 10.1128/JVI.01624-06.
  • 44. Baake E, Baake M, Wagner H. Phys. Rev. Lett. 1997;78:559.
  • 45. Baake E, Baake M, Wagner H. Phys. Rev. E. 1998;57:1191.
  • 46. Saakian DB, Hu C-K. Proc. Natl. Acad. Sci. USA. 2006;103:4935. doi: 10.1073/pnas.0504924103.
  • 47. Saakian DB, Hu C-K, Kachatryan H. Phys. Rev. E. 2004;70:041908. doi: 10.1103/PhysRevE.70.041908.
  • 48. Saakian DB, Hu C-K. Phys. Rev. E. 2004;70:021913.
  • 49. Park J-M, Deem MW. Phys. Rev. Lett. 2007;98:058101. doi: 10.1103/PhysRevLett.98.058101.
  • 50. Boerlijst MC, Bonhoeffer S, Nowak MA. P. Roy. Soc. London B. 1996;263:1577.
  • 51. Jacobi MN, Nordahl M. Theor. Popul. Biol. 2006;70:479. doi: 10.1016/j.tpb.2006.08.002.
  • 52. Lee BP, Cardy J. J. Stat. Phys. 1995;80:971.
  • 53. Mattis DC, Glasser ML. Rev. Mod. Phys. 1998;70:979.
  • 54. Peliti L. J. Physique. 1985;46:1469.
  • 55. There was a typo in [49] for the sharp peak fitness case: the formula reads P0 = 1 − μ/A [instead of u, which is unity to O(1/N)]. Similarly for the Eigen model, P0 = (A e^{−μ} − A0)/(A − A0) (rather than u, which is unity).
  • 56. The alternative choice uξc < u², combined with the self-consistency condition, leads to the equation max_{−1≤ξc≤1} f(ξc) = f(u), whose unique solution for the quadratic fitness is |ξc| = |u| = 1, in contradiction with the assumption uξc < u². Hence, we necessarily have uξc ≥ u².
  • 57. Bortz AB, Kalos MH, Lebowitz JL. J. Comput. Phys. 1975;17:10.
  • 58. Gillespie DT. J. Comput. Phys. 1976;22:403.
  • 59. For the case φc = 1 there is an alternative solution ηc = 1. This implies the equation uξ̄c = u[f′(ξc) − d′(ξc)] = 0, whose solution is u = 0, unless there is an absolute maximum for f(ξc) − d(ξc) in the interior of the region |ξc| < 1, both of which possibilities are contained in Eq. (107).
  • 60. Sun J, Deem MW. Phys. Rev. Lett. 2007;99:228107. doi: 10.1103/PhysRevLett.99.228107.
