Author manuscript; available in PMC: 2015 Jun 23.
Published in final edited form as: Phys Rev E Stat Nonlin Soft Matter Phys. 2015 Jan 28;91(1):012714. doi: 10.1103/PhysRevE.91.012714

Quasispecies Theory for Evolution of Modularity

Jeong-Man Park 1, Liang Ren Niestemski 1, Michael W Deem 1
PMCID: PMC4477872  NIHMSID: NIHMS700252  PMID: 25679649

Abstract

Biological systems are modular, and this modularity evolves over time and in different environments. A number of observations have been made of increased modularity in biological systems under increased environmental pressure. We here develop a quasispecies theory for the dynamics of modularity in populations of these systems. We show how the steady-state fitness in a randomly changing environment can be computed. We derive a fluctuation dissipation relation for the rate of change of modularity and use it to derive a relationship between rate of environmental changes and rate of growth of modularity. We also find a principle of least action for the evolved modularity at steady state. Finally, we compare our predictions to simulations of protein evolution and find them to be consistent.

I. INTRODUCTION

Biological systems have long been recognized to be modular. In 1942 Waddington presented his now classic description of a canalized landscape for development, in which minor perturbations do not disrupt the function of developmental modules [1]. In 1961 H. A. Simon described how biological systems are more efficiently evolved and are more stable if they are modular [2]. A seminal paper by Hartwell et al. firmly established the concept of modularity in cell biology [3]. Systems biology has since provided a wealth of examples of modular cellular circuits, including metabolic circuits [4, 5] and modules on different scales, i.e. modules of modules [6]. Protein-protein interaction networks have been observed to be modular [7–9]. Ecological food webs have been found to be modular [10]. The gene regulatory network of the developmental pathway exhibits modules [11, 12], and the developmental pathway is modular [13]. Modules have even been found in physiology, specifically in spatial correlations of brain activity [14, 15].

The modularity of a biological system can change over time. There are a number of demonstrations of the evolution of modularity in biological systems. For example, the modularity of the protein-protein interaction network significantly increases when yeast is exposed to heat shock [16], and the modularity of the protein-protein networks in both yeast and E. coli appears to have increased over evolutionary time [17]. Additionally, food webs in low-energy, stressful environments are more modular than those in plentiful environments [18], arid ecologies are more modular during droughts [19], and foraging of sea otters is more modular when food is limiting [20]. Other complex dynamical systems exhibit time-dependent modularity as well. The modularity of social networks changes over time: stock brokers' instant messaging networks are more modular under stressful market conditions [21], and socio-economic community overlap decreases with increasing stress [22]. Modularity of financial networks changes over time: the modularity of the world trade network has decreased over the last 40 years, leading to increased susceptibility to recessionary shocks [23], and increased modularity has been suggested as a way to increase the robustness and adaptability of the banking system [24]. Much of the research on modularity has suggested that gene duplication, horizontal gene transfer, and changes in the total number of connections may all play a role in the evolution of modularity [25–27].

In an effort to proceed further with these observations, we here present a quasispecies theory for the evolutionary dynamics of modularity. This analytical theory complements numerical models that have investigated the dynamics of modularity [27–30]. We assume that modularity can be quantified in the system under study. We further assume that modularity is a good order parameter to describe the state of the system. That is, we project the dynamics onto the slow mode of modularity, M. In section II we introduce the quasispecies description for the dynamics of modularity. The details of the sequence level evolutionary dynamics are what, when projected out, define the fitness function f(m) introduced in this section. In section III we show how the steady-state fitness in a randomly changing environment can be computed from the time-dependent average fitness starting from random initial conditions. In section IV we derive a fluctuation dissipation theory for the dynamics of modularity. In section V we derive a relationship between rate of environmental change and rate of growth of modularity. In section VI we find the evolved, steady-state value of modularity by a principle of least action. In section VII we compare some of the predictions to simulations of protein evolution. We conclude in section VIII.

II. THE QUASISPECIES THEORY FOR DYNAMICS OF MODULARITY

Quasispecies theory captures the basic aspects of mutation and evolutionary selection in large, evolving populations [31, 32]. These models have been widely used in the physics literature to describe evolutionary biology [33]. A series of papers showed how these models could be solved in the steady-state limit, first by a mapping to an inhomogeneous Ising model [34–38] and later by solution with functional integral techniques [39–41]. A Hamilton-Jacobi approach has been used to derive dynamical predictions in these models [42]. Quasispecies theory has been extended to larger alphabets [43] and to describe the effects of horizontal gene transfer [44–46] and finite populations [47, 48].

We here develop quasispecies theory for the dynamics of modularity. We consider a population of systems, where each system is characterized by a specific connection matrix, from which the modularity can be calculated. Evolution occurs within each system by mechanisms such as point mutation or horizontal gene transfer. Horizontal gene transfer is not allowed between systems, because such events would violate the assumption that the fitness of each system depends only on the modularity of that system. Competition occurs both within and between systems. The evolutionary dynamics of this population of systems is fully specified by the rate at which each system reproduces, f, termed “fitness,” and the rate at which changes of modularity arise, μ. Since the state of each system is specified by the slow modularity variable, M, the fitness is a function of the modularity, f = f(M). The f(M) function is from a detailed calculation, numerical simulation, or experimental observation of the competitive evolutionary dynamics within each system with a given value of modularity. Thus, the rate at which a system with modularity M replicates, f(M), is an input to the theory to be derived here. The present theory predicts how modularity in the population of systems will evolve, given the replication rates and mutation rates.

The fitness function f(M) fundamentally characterizes an evolving network. With this f(M), the dynamics of modularity can be calculated. For example, the f(M) could be deduced for the evolution of the protein-protein interaction network in E. coli, showing the evolutionary advantage of modularity for this system [17]. The f(M) is the driving force for spontaneous emergence of modularity in a protein network [27]. The f(M) quantifies the benefit of modularity to a system, and we will show that modularity evolves to a finite modularity at steady state in a population of systems.

Modularity is defined on a network of nodes and edges. Thus, the fundamental object describing each system is the connection matrix, with the ij element of the connection matrix representing the value of edge ij. The connection matrix gives the links between the nodes of the network. For example, in the protein-protein interaction network, the nodes are the proteins and the links tell one whether protein i interacts with protein j. Modularity of each system is calculated directly from the connection matrix of that system, and rearrangement of the connections within this matrix changes the modularity of a given system.

The connection matrix, Δij, is a binary matrix that denotes whether nodes i and j interact (Δij = 1) or not (Δij = 0). The detailed dynamics of the system may well have non-trivial couplings between nodes [27], and the connection matrix is the projection of the non-zero couplings. We allow each node to be connected to C other nodes on average. The number of nodes is denoted by L. Rearrangement of the entries within this matrix changes the modularity of the matrix. For simplicity, we assume that the modules which form are of size l. There are two ways to view the fixed partitioning that we consider. First, this partitioning results from modularity that is induced by horizontal gene transfer of segments with fixed length l, as was previously shown [17, 27]. Second, biological modules are often of roughly fixed size, so it is not too much of a simplification to say the module size is constant for all modules. A fixed partitioning is a subset of all possibilities; in this work, we consider only this fixed partitioning. Thus a modular system will have an excess of connections along the l × l block diagonals of the connection matrix. In other words, the probability of a connection is C0/L outside the block diagonals when ⌊i/l⌋ ≠ ⌊j/l⌋ and C1/L inside the block diagonals when ⌊i/l⌋ = ⌊j/l⌋, with C = C0 + (C1 − C0)l/L. Modularity is defined by the excess of connections in the block diagonals, over that observed outside the block diagonals: M = (C1 − C0)l/(LC).
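As an illustration of this definition, the following Python sketch computes the modularity of a binary connection matrix for a fixed partition into l × l diagonal blocks. It uses the equivalent count-based form given in Appendix A, M = [n_in/l − n_out/(L − l)] l/(CL) with CL = n_in + n_out. The function name `modularity` and the small example matrices are hypothetical and chosen only for illustration; the sketch assumes that L is a multiple of l.

```python
def modularity(delta, l):
    """Return M = [n_in/l - n_out/(L - l)] * l / (n_in + n_out) for a 0/1 matrix."""
    L = len(delta)
    if L % l != 0:
        raise ValueError("this sketch assumes L is a multiple of l")
    n_in = n_out = 0
    for i in range(L):
        for j in range(L):
            if delta[i][j]:
                if i // l == j // l:   # same l x l diagonal block
                    n_in += 1
                else:
                    n_out += 1
    total = n_in + n_out               # equals C * L
    if total == 0:
        raise ValueError("matrix has no connections")
    return (n_in / l - n_out / (L - l)) * l / total


# Hypothetical 6 x 6 examples with blocks of size 2.
L, l = 6, 2
block_diagonal = [[1 if i // l == j // l else 0 for j in range(L)] for i in range(L)]
uniform = [[1] * L for _ in range(L)]
off_block = [[0 if i // l == j // l else 1 for j in range(L)] for i in range(L)]

print(modularity(block_diagonal, l))  # all connections inside the blocks
print(modularity(uniform, l))         # same density inside and outside
print(modularity(off_block, l))       # no connections inside the blocks
```

The three matrices give the limiting cases of the definition: M = 1 when every connection lies inside a block, M = 0 when the density is the same inside and outside, and M = −l/(L − l) when no connection lies inside a block. These dense toy matrices are not in the dilute limit considered in the text; they only exercise the formula.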

Modularity changes because the entries in the connection matrix change. There are several possible models for how the connection matrix may reorganize. We here consider the model in which connections may independently reorganize. This model is biologically appropriate when connections between nodes are governed by independent pieces of structure in each node. We are not specifically considering “hub” nodes that connect to a very large number of other nodes. A model of this effect would be hierarchical. We are here considering one level of this hierarchy in the present model. Thus, we here consider a simple model in which each of these connections has a rate μ to rewire. That is, we define μ to be the rate at which any given 1 in the Δ matrix hops to another random location. In a typical biological system there are a finite number of connections per site, even for a large matrix, and so we consider the limit of C finite and L large, i.e. a dilute matrix of connections. Thus, the entries in the connection matrix each have rate μ to independently move to a new position in the connection matrix, and collisions between connections do not significantly affect the dynamics in the dilute limit.

When the population of systems is large, the probability distribution to have a connection matrix with modularity m obeys (see Appendix A)

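$$\frac{dP_m(t)}{dt} = L\left[f(m) - \langle f\rangle\right]P_m(t) + \mu C l\left[(1-m)\left(1-\frac{l}{L}\right) + \frac{1}{LC}\right]P_{m-1/[(L-l)C]}(t) + \mu C (L-l)\left[m + (1-m)\frac{l}{L} + \frac{1}{LC}\right]P_{m+1/[(L-l)C]}(t) - \mu C (L-l)\left[m + 2(1-m)\frac{l}{L}\right]P_m(t) \tag{1}$$

where m takes values −l/(L − l), (−l + 1/C)/(L − l), (−l + 2/C)/(L − l), …, 1. The average fitness is given by

$$\langle f\rangle(t) = \sum_m f(m)\,P_m(t) \tag{2}$$

The average modularity as a function of time is given by M(t) = Σ_m m P_m(t).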
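A minimal numerical sketch of Eq. (1) is given below. It labels states by the number n of connections inside the blocks, so that m = (n/C − l)/(L − l), and uses the hopping rates derived in Appendix A, with the loss term [r_up(m) + r_down(m)]P_m entering with a negative sign. The forward Euler time stepping, the function names, and the small parameter values are assumptions made here for illustration and are not taken from the paper.

```python
import math

def integrate_master_equation(f, L, l, N, mu, n0, t_final, dt):
    """Euler-integrate Eq. (1); return (list of m values, P at t_final).

    States are indexed by n = number of connections inside the blocks,
    n = 0..N, where N = C*L is the total number of connections.
    """
    C = N / L
    m = [(n / C - l) / (L - l) for n in range(N + 1)]
    r_up = [mu * (N - n) * l / L for n in range(N + 1)]      # n -> n + 1
    r_down = [mu * n * (L - l) / L for n in range(N + 1)]    # n -> n - 1
    fit = [f(x) for x in m]
    P = [0.0] * (N + 1)
    P[n0] = 1.0                                              # start in a single state
    for _ in range(int(round(t_final / dt))):
        f_avg = sum(fi * p for fi, p in zip(fit, P))         # Eq. (2)
        new_P = []
        for n in range(N + 1):
            dP = L * (fit[n] - f_avg) * P[n] - (r_up[n] + r_down[n]) * P[n]
            if n > 0:
                dP += r_up[n - 1] * P[n - 1]
            if n < N:
                dP += r_down[n + 1] * P[n + 1]
            new_P.append(P[n] + dt * dP)
        P = new_P
    return m, P


def mean_modularity(m, P):
    """M(t) = sum_m m P_m(t)."""
    return sum(x * p for x, p in zip(m, P))


# Illustrative parameters (hypothetical): start fully modular, n0 = N, i.e. m = 1.
params = dict(L=12, l=3, N=24, mu=0.05, n0=24, t_final=10.0, dt=0.01)
m_vals, P_neutral = integrate_master_equation(lambda x: 0.0, **params)
_, P_selected = integrate_master_equation(lambda x: 0.1 * x, **params)
M_neutral = mean_modularity(m_vals, P_neutral)
M_selected = mean_modularity(m_vals, P_selected)
print(M_neutral, math.exp(-params["mu"] * params["t_final"]))
print(M_selected)
```

Without selection, the mean modularity relaxes as M(t) = M(0) exp(−μt), which the neutral run reproduces up to the Euler discretization error. With a fitness that increases with m, the mean modularity at the same time is larger.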

III. THE STEADY-STATE FITNESS IN A RANDOMLY FLUCTUATING ENVIRONMENT

We here consider how to describe the effect of environmental change on the evolution of modularity. We characterize the environmental changes by their magnitude and frequency. We denote the magnitude of environmental change by p. If p = 0, the environment does not change at all, and if p = 1, the environment is completely different before and after the change. Although the environmental change is random, on average a fraction p of the environment’s effect on the fitness of the system is modified by the change. This model is used to describe evolution of influenza viruses, where p is defined as above [49, 50]. In application to data on influenza vaccines, p is termed p_epitope and serves as an accurate order parameter to characterize how effective a vaccine against one strain will be in protecting against another strain that is distance p_epitope away [51–53]. Here we consider these environmental changes to occur with a frequency, which we denote by 1/T. In particular, we consider that the environmental changes occur every T timesteps. This characterization of environmental change by magnitude and frequency, p and 1/T, has been used extensively in the past [17, 18, 23, 27, 54].

A changing environment will put pressure on the system to have an efficient response function. As the environment changes, the favorable niches for the system change, and the system must adapt to the changing landscape. The more rapidly the environment changes or the more dramatically the environment changes, the more pressure there is on the system to be adaptable. As noted above, it has been widely observed that systems under pressure tend to become more modular. The mean fitness of the systems a time T after an environmental change will depend on the magnitude of the change, p, as well as the modularity. We denote this value by f_{p,T}(M). We can derive this function f_{p,T}(M) for any p and T from the average fitness as a function of time, starting from random initial conditions, which we denote as 〈g〉(t), with 〈g〉(0) = 0. See Fig. 1 for a depiction of the hierarchy of evolutionary timescales. The observable 〈g〉(t), Fig. 1c, is an input to the theory presented here and comes from a detailed calculation, numerical simulation, or experimental observation of the competitive evolutionary dynamics. The change of environment reduces the fitness to a fraction 1 − p of its value before the change, on average [54], and the time of evolution in each environment is T. These two conditions imply f_{p,T}(M) = 〈g〉(t*) where t* is defined by

$$\langle g\rangle(t^* - T) = (1-p)\,\langle g\rangle(t^*) \tag{3}$$

FIG. 1.

a) Shown is the fitness of a single evolving system with a given modularity as a function of time. Positive fitness means growth of the system. The environment is repeatedly changed each T = 300 time steps. Shown in b) is the average of these responses during a time 0 to T after each environmental change, averaged over many environmental changes. Shown in c) is the average response function to p = 1 environmental changes, 〈g〉(t). The response function in b) follows from a master response function curve in c), being the t* − T to t* subset where 〈g〉(t* − T) = (1 − p)〈g〉(t*). Here p = 0.3 and T = 300. The present theory applies once the curve in c) has been determined.

The function f_{p,T}(M) tells us the average, evolved fitness of the system at the end of each environmental change. This function can be considered to be the fitness when the environmental change is integrated out. This f_{p,T}(M) is the fitness function that goes into Eq. (1).
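The construction of f_{p,T}(M) from a response curve can be sketched numerically. The Python example below solves 〈g〉(t* − T) = (1 − p)〈g〉(t*) for t* by bisection and returns 〈g〉(t*). The saturating exponential used for 〈g〉(t), its rate 0.01, and the function name `evolved_fitness` are hypothetical choices made for illustration; in the theory 〈g〉(t) is an input obtained from calculation, simulation, or experiment. The values p = 0.3 and T = 300 are those quoted in the caption of Fig. 1.

```python
import math

def evolved_fitness(g, p, T, t_max=1e6, tol=1e-10):
    """Solve g(t* - T) = (1 - p) g(t*) for t* >= T by bisection; return (t*, g(t*))."""
    def h(t):
        return g(t - T) - (1.0 - p) * g(t)
    lo, hi = T, T + 1.0
    while h(hi) <= 0.0:                # expand until the root is bracketed
        hi = T + 2.0 * (hi - T)
        if hi > t_max:
            raise ValueError("no root found below t_max")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if h(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    t_star = 0.5 * (lo + hi)
    return t_star, g(t_star)


def g(t):
    """Hypothetical response curve with g(0) = 0 and g(infinity) = 1."""
    return 1.0 - math.exp(-0.01 * t)


t_star, f_pT = evolved_fitness(g, p=0.3, T=300.0)
t_star_p1, _ = evolved_fitness(g, p=1.0, T=300.0)
print(t_star, f_pT)
print(t_star_p1)
```

For p = 1 the condition reduces to 〈g〉(t* − T) = 0, so t* = T and the evolved fitness is simply 〈g〉(T). For smaller p the system retains part of its fitness across each change, and t* lies further along the response curve.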

Evolution of modularity depends on how the response function f_{p,T}(M) of the system varies with the parameters of environmental change, p and T. Since systems under stress tend to become more modular, an interpretation is that the average fitness for a modular system is greater than that for a non-modular system, at least for small T or large p where stress is large. This behavior has been observed in a model of systems evolving in a changing environment, when horizontal gene transfer is included [27]. We have recently proved this canonical behavior for a Moran model of population evolution in a glassy, modular fitness landscape [55]. Glassy evolutionary dynamics has been noted a number of times [56, 57]. Conversely, at long time, the less modular system should have a higher fitness, because modularity is a constraint on the optima that can be achieved.

In Eq. (1), we here take this function f(m) as input. We assume only that the population averages for large M and small M look like the dashed and solid curves in Fig. 2a. Putting these points together, the quasispecies theory presented here quantitatively describes the emergence of modularity at small p or large T, as shown in Figs. 3 and 2b.

FIG. 2.

Shown is the fitness of an evolving system. a) The fitness of the non-modular (〈g0〉, solid) and block-diagonal (〈g1〉, dashed) system are shown, starting from a random initial configuration. These 〈g0〉 and 〈g1〉 are inputs to the theory. The modular system is taken to be more fit at short time and less fit at long time. b) The evolved, steady-state fitness of a system predicted by the theory in a changing environment (dot dashed), shown for varying T and p = 1. The fitness follows the high-modularity curve at rapid environmental changes, small T, and the low-modularity curve at slow environmental changes, large T. Since p = 1, the function f_{p=1,T}(M) = 〈g(M)〉(t = T). The function 〈g(M)〉 is here taken for simplicity to be (1 − M)〈g0〉(t) + M〈g1〉(t). Note the modularity tends to 1 and the fitness to 〈g1〉 for rapid environmental change (small T), and the modularity tends to 0 and the fitness to 〈g0〉 for slow environmental change (large T). The modularity calculated from theory, Eq. (13), is shown (dotted). Also shown is the theoretical result for small M, Eq. (15), to first order in l/L (short dashed). In this example L = 120, l = 10, μ = 0.01, and C = 5.77. For these particular 〈g0〉 and 〈g1〉, the modularity emerges only for environmental changes that occur on a timescale T < tc ≈ 285.

FIG. 3.

The phase diagram for emergence of modularity. Below a critical mutation rate, modularity spontaneously emerges. Results are shown for f(M) = kM^2/2 (solid), f(M) = kM^3/2 (long-dashed), f(M) = kM^4/2 (short-dashed), f(M) = kM^10/2 (dotted), and f(M) = e^{kM} − kM − 1 (dot-dashed). Results here are shown for l = 10, L = 120.

IV. A FLUCTUATION DISSIPATION THEOREM

There is a fluctuation dissipation relation for the rate of change of modularity. Multiplying Eq. (1) by m and summing, we find that the rate of change of modularity satisfies

$$\frac{dM}{dt} = L\langle m f(m)\rangle - L M\langle f\rangle - \mu M \tag{4}$$

This equation is a type of continuous-time Price equation [58]. This equation implies a type of useful fluctuation-dissipation theorem. Expanding f(m), we can alternatively write this fluctuation dissipation relation describing the evolution of modularity as

$$\frac{dM}{dt} \approx L\left.\frac{df}{dm}\right|_{M}\sigma_M^2 - \mu M \tag{5}$$

Here M = 〈m〉 is the average modularity of the system, and σ_M^2 = 〈m^2〉 − M^2 is the variance of the modularity, where m is the modularity for any particular system in the population.
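The step from the Price-type equation to the fluctuation dissipation form can be spelled out as follows. This is a sketch of the first-order expansion of f(m) about the mean modularity M, under the assumption that the distribution of m is narrow enough for higher-order terms to be neglected.

```latex
\begin{align}
f(m) &\approx f(M) + \left.\frac{df}{dm}\right|_{M}(m - M), \\
\langle f\rangle &\approx f(M), \qquad
\langle m f(m)\rangle \approx M f(M)
   + \left.\frac{df}{dm}\right|_{M}\left(\langle m^2\rangle - M^2\right), \\
L\langle m f(m)\rangle - L M\langle f\rangle
  &\approx L\left.\frac{df}{dm}\right|_{M}\sigma_M^2 .
\end{align}
```

The selection term is therefore proportional to the variance of the modularity in the population, while the mutation term −μM is unchanged.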

V. ENVIRONMENTAL CHANGE SELECTS FOR MODULARITY

We now derive a relationship between the rate of growth of modularity and the environmental pressure. We investigate the dynamics for small modularity, and we consider a Taylor series expansion of the fitness function: f(m) = f(0) + mΔf + o(m). The function Δf is time independent, depending on p, T, and other parameters of the evolution within each system that have been projected out. We investigate the growth of modularity from an initially non-modular state. We consider how the response function depends on p. If p = 0, the environment is not changing, t* → ∞ in the expression of Eq. (3), and the system will stay in the M = 0 state. This implies Δf = 0 when p = 0, as otherwise a non-zero modularity would emerge, see Eq. (15) below. For small p, the environment is changing only slightly, t* is large, and the system will evolve a small value of M. Expanding in a Taylor series for small p and T/t*, Eq. (3) becomes

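$$\frac{p}{T} = \frac{\langle g_m\rangle'(t_m^*)}{\langle g_m\rangle(t_m^*)} \approx \frac{\langle g_m\rangle'(t_m^*)}{\langle g_m\rangle(\infty)} \approx \frac{\langle g_m\rangle'(t_m^*)}{\langle g_0\rangle(\infty)} \tag{6}$$

where the last two relationships arise because 〈g_m〉′(t_m*) is small and because t_m* is large and m is small. Thus, Δf = lim_{m→0}[f(m) − f(0)]/m = Δf(p/T). Expanding Δf to first order in p/T and taking m small, we find Δf = αp/T. When m is small, equation (4) becomes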

$$\frac{dM}{dt} = L\sigma_M^2\,\Delta f - \mu M \tag{7}$$

Using the result above for Δf, we find dM/dt ≈ αLσ_M^2 p/T, leaving out the small term proportional to M in Eq. (7). We, thus, find

$$p_E \approx \frac{1}{R}\frac{dM}{dt} \tag{8}$$

where p_E = p/T is the environmental pressure, and R = αLσ_M^2. In this equation, R ∝ σ_M^2, which, as experimentalists have anticipated, is related to replicate variability in experiments [59].

Equation (8) follows from the fluctuation dissipation relation in Eq. (4) and the response function of the modular system being greater than that of the non-modular system at short time. Equation (8) may be interpreted as a Taylor series expansion of dM/dt in allowed combinations of p and 1/T. Alternatively, Eq. (8) may be interpreted as the linear response of the modularity to the environmental pressure. The coefficient R is a measure of ruggedness of the evolutionary landscape within each system. This ruggedness slows down the evolutionary dynamics, and the selection for an effective response function provided by a changing environment implicitly selects for modularity when horizontal gene transfer is active [27]. Here, we are able to show that R is proportional to the variance of the modularity, which is expected to be related to the ruggedness of the landscape. It is the ruggedness of the landscape that leads to non-trivial replicate variability.

For what forms of 〈g_m〉(t) will the Δf(p/T) function be analytic in p/T? We first consider an exponential convergence of the fitness function: 〈g_m〉(t) = g(∞) − a_m exp(−β_m t), where we have left out the m dependence of g(∞) because we expect it to be higher order than linear in m. Eq. (6) becomes p g(∞)/T = a_m β_m exp(−β_m t_m*), and we find f_{p,T}(m) = 〈g_m〉(t_m*) = g(∞) − g(∞)p/(Tβ_m). We thus find mΔf = g(∞)(p/T)(1/β_0 − 1/β_m), which is positive because we expect the modular system to converge faster, β_m > β_0. Thus, we find Eq. (8), with α = −g(∞) dβ_m^{−1}/dm|_{m=0}. Conversely, for a power law decay 〈g_m〉(t) = g(∞) − a_m t^{−β}, we find the fitness to be non-linear in p/T: f_{p,T}(m) = g(∞) − a_m[p g(∞)/(T a_m β)]^{β/(β+1)}. In this case, Eq. (8) is modified to be p_E^{β/(β+1)} on the left hand side, with α = −[g(∞)/β]^{β/(β+1)} d a_m^{1/(β+1)}/dm|_{m=0}. Finally, for a logarithmic decay [55] 〈g_m〉(t) = g(∞) − a_m ln^{−2/ν}(t/t_m^0), we find the fitness to be non-analytic in p/T, since (p/T) t_m* g(∞) = (2/ν) a_m ln^{−2/ν−1}(t_m*/t_m^0). This equation can be solved in terms of powers of the product logarithm, or Lambert W_0 function. Performing an asymptotic analysis for small p/T, we find f_{p,T}(m) ~ g(∞) − a_m ln^{−2/ν}(T/p). In this case, Eq. (8) is modified to be 1/(ln^2 p_E)^{1/ν} on the left hand side, with α = −da_m/dm|_{m=0}.
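The exponential case can be checked numerically. The sketch below assumes the specific form 〈g_m〉(t) = g(∞)[1 − exp(−β_m t)], i.e. a_m = g(∞) so that 〈g_m〉(0) = 0, for which Eq. (3) can be solved in closed form: exp(−β_m t*) = p/(exp(β_m T) − 1 + p). It compares this exact result with the first-order estimate g(∞) − g(∞)p/(Tβ_m). The function names and the parameter values are hypothetical and were chosen so that p ≪ β_m T ≪ 1, the regime in which the expansion is expected to hold.

```python
import math

def exact_fitness(p, T, beta, g_inf=1.0):
    """g(t*) for g(t) = g_inf * (1 - exp(-beta * t)), with t* solving Eq. (3)."""
    x = p / (math.exp(beta * T) - 1.0 + p)    # x = exp(-beta * t*)
    return g_inf * (1.0 - x)

def first_order_fitness(p, T, beta, g_inf=1.0):
    """Small p/T estimate g_inf - g_inf * p / (T * beta)."""
    return g_inf - g_inf * p / (T * beta)


# Hypothetical rates: the modular system (beta_m) converges faster than
# the non-modular one (beta_0).
p, T = 0.001, 10.0
beta_0, beta_m = 0.004, 0.005

exact_gain = exact_fitness(p, T, beta_m) - exact_fitness(p, T, beta_0)
approx_gain = first_order_fitness(p, T, beta_m) - first_order_fitness(p, T, beta_0)
print(exact_gain, approx_gain)   # approx_gain equals (p/T) * (1/beta_0 - 1/beta_m)
```

Both estimates of the fitness advantage of the faster-converging system are positive, and for these parameters they agree to within a few percent. The agreement degrades as β_m T grows or as p approaches β_m T.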

Equation (8) is a description of how the evolvability of the system depends on the environmental change. That is, dM/dt is a measure of the evolvability of the system, with larger values indicating a greater rate of change of the measurable order parameter M. This measure of evolvability is greater for greater environmental pressures, pE. The drive for spontaneous emergence of modularity, large dM/dt, is also greater for landscapes that are more rugged, i.e. larger R, which can be estimated from variability of replicate experiments.

Equation (8) says that an increase of environmental pressure should lead to the evolution of systems with increased modularity. A study of 117 species of bacteria showed that the modularity of the bacteria’s metabolic networks increased monotonically with variability of the environment in which the bacteria lived [60]. Metabolic networks of pathogens alternating between hosts were found to be more modular than those of single-host pathogens [61].

VI. STEADY-STATE VALUES OF MODULARITY IN ONE ENVIRONMENT

A. Field Theory for the Dynamics of Modularity

Here we rewrite the dynamical equations of quasispecies theory in the language of field theory. We solve the field theory in the limit of large system sizes to determine the steady-state modularity that emerges at long time. The theory is distinct from traditional quasispecies theory because the replication rate depends on the modularity rather than the Hamming distance from a wild-type strain. Nonetheless, we will show that the theory can still be solved exactly in the limit of a large system size.

For large values of L, for which the changes in M are nearly continuous, we here determine the average fitness implied by Eq. (1) at long time by techniques borrowed from quantum field theory [39, 41]. We write the dynamical equations in Eq. (1) in terms of raising and lowering operators. We then use coherent states to write this second quantization in terms of a Bosonic field theory, with fields z*_ij(t), z_ij(t) representing density at Δij(t) at time t. The action of this field theory is

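$$\begin{aligned} S[\{z^*\},\{z\}] ={}& \int_0^{t_f}\sum_{ij} z^*_{ij}(t)\,\partial_t z_{ij}(t)\,dt + \sum_{ij}\left[z^*_{ij}(0)z_{ij}(0) - z_{ij}(t_f)\right] + LC - \frac{C_1}{L}\sum_{ij\in\mathrm{in}} z^*_{ij}(0) - \frac{C_0}{L}\sum_{ij\in\mathrm{out}} z^*_{ij}(0) \\ &- L\int_0^{t_f} f\!\left[\frac{1}{LC}\left(\sum_{ij\in\mathrm{in}} z^*_{ij}(t)z_{ij}(t) - \frac{1}{L/l-1}\sum_{ij\in\mathrm{out}} z^*_{ij}(t)z_{ij}(t)\right)\right]dt \\ &- \frac{\mu}{L^2}\int_0^{t_f}\sum_{ij}\sum_{mn}\left[z^*_{mn}(t) - z^*_{ij}(t)\right]z_{ij}(t)\,dt \end{aligned} \tag{9}$$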

Note that the fitness depends on the modularity of the connection matrices of each state at each point in time in Eq. (9), just as it did in Eq. (1). Also note that Eqs. (1) and (9) are exact for arbitrary, non-linear fitness functions f(m). Here “in” means in the l × l block diagonals and “out” means outside these block diagonals. The quadratic terms can be integrated out (see Appendix B) [41], and we are left with an action expressed in terms of a modularity field, ξ, and its conjugate, ξ̄:

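$$S = L\int_0^{t_f}\left[C\,\bar\xi(t)\xi(t) - f(\xi(t))\right]dt - LC\ln Q \tag{10}$$

where the determinant is Q = [lC1(tf) + (L − l)C0(tf)]/(LC), where the vector C(t) = (C1(t), C0(t)) satisfies

$$\frac{d\mathbf{C}}{dt} = A(t)\,\mathbf{C}(t) \tag{11}$$

where

$$A(t) = \begin{pmatrix} -\mu(L-l)/L + \bar\xi(t) & \mu(L-l)/L \\ \mu l/L & -\mu l/L - \bar\xi(t)\,l/(L-l) \end{pmatrix} \tag{12}$$

and C(0) = (C1, C0).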

B. The Steady-State, Average Value of Modularity

The average modularity follows a dynamical trajectory away from an initial state to a final steady state value. For large L, this action becomes large, and a saddle point calculation can be used (see Appendix C). The remarkable result from this derivation is that the modularity which emerges at long time obeys a principle of least action:

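$$f_{\mathrm{pop}} = \max_{\xi}\left\{ f(\xi) - \mu C\,\frac{(L-l)\,l}{L^2}\left[2 + \left(\frac{L}{l}-2\right)\xi - 2\sqrt{(1-\xi)\left(1 + \left(\frac{L}{l}-1\right)\xi\right)}\,\right]\right\} \tag{13}$$

The variance of the modularity is small, O(1/L), and the modularity is determined by the solution of the implicit equation

$$f(M) = f_{\mathrm{pop}} \tag{14}$$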

Here fpop is the mean population fitness, i.e. Eq. (2) with t → ∞. Thus, a principle of least action gives the evolved modularity at steady state. Coexistence of populations with different modularity, i.e. bimodality in the distribution of modularity, is possible if the f(m) function is discontinuous [45].

C. Phase Diagrams for The Emergence of Modularity

While Eq. (13) is a general result, we can proceed further in the limit that evolved modularities are small. Expanding for small M, we find

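$$\begin{aligned} \xi_{\max} &\sim \frac{2l\,[df/dM|_{M=0}]}{\mu C(L-l) - 2l\,[d^2f/dM^2|_{M=0}]} \\ f_{\mathrm{pop}} &\sim \frac{l\,[df/dM|_{M=0}]^2}{\mu C(L-l) - 2l\,[d^2f/dM^2|_{M=0}]} + f(0) \\ M &\sim \frac{l\,[df/dM|_{M=0}]}{\mu C(L-l) - 2l\,[d^2f/dM^2|_{M=0}]} \end{aligned} \tag{15}$$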

Thus, as long as a modular system has a higher fitness, df/dM > 0, modularity will spontaneously emerge, M > 0, for large enough system sizes, L. Note also when M is small, that the steady state modularity calculated exactly from Eq. (13) is in agreement with the small M result in Eq. (15), as shown in Fig. 2b. Note that for large L/l, Eq. (7) combined with Eq. (15) implies that at steady state σ_M^2 = l/[L(L − l)C].

For fitness functions for which df/dM|M=0 = 0, more analysis is required. For example, if f(M) = kM^2/2, there is a phase transition at μ*: For μ < μ* modularity emerges, whereas for μ > μ* the population remains in the non-modular phase. This phase transition is analogous to the error catastrophe found in traditional quasispecies theory. Phase diagrams for a number of fitness functions are shown in Fig. 3.
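The two phases can be illustrated with a direct numerical evaluation of Eq. (13) for f(M) = kM^2/2. The sketch below takes the mutational term in Eq. (13) to be μC[(L − l)l/L^2][2 + (L/l − 2)ξ − 2√((1 − ξ)(1 + (L/l − 1)ξ))], maximizes over ξ on a uniform grid (a simple choice made here, since this objective need not be concave), and then inverts Eq. (14). The function names, the grid resolution, k = 1, and the two values of μ are hypothetical; L = 120, l = 10, and C = 5.77 are the values quoted in the figure captions. No attempt is made to locate μ* itself.

```python
import math

def mutation_cost(xi, mu, C, L, l):
    """Mutational term subtracted from f(xi) inside the braces of Eq. (13)."""
    a = L / l
    root = math.sqrt(max(0.0, (1.0 - xi) * (1.0 + (a - 1.0) * xi)))
    return mu * C * (L - l) * l / L**2 * (2.0 + (a - 2.0) * xi - 2.0 * root)

def steady_state_modularity(k, mu, C=5.77, L=120, l=10, grid=20000):
    """Grid-search Eq. (13) for f(M) = k M^2 / 2, then solve Eq. (14) for M."""
    f_pop = max(0.5 * k * (i / grid) ** 2 - mutation_cost(i / grid, mu, C, L, l)
                for i in range(grid + 1))
    return math.sqrt(2.0 * max(0.0, f_pop) / k)   # f(M) = f_pop with f(0) = 0


M_low_mu = steady_state_modularity(k=1.0, mu=1e-4)
M_high_mu = steady_state_modularity(k=1.0, mu=10.0)
print(M_low_mu)    # small mutation rate: modular phase
print(M_high_mu)   # large mutation rate: non-modular phase
```

At the small mutation rate the maximum lies near ξ = 1 and the evolved modularity is close to one. At the large mutation rate the mutational term outweighs the fitness gain for every ξ > 0, the maximum sits at ξ = 0, and the evolved modularity vanishes.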

VII. USING QUASISPECIES THEORY TO EXTRAPOLATE SIMULATION DATA ON SPONTANEOUS EMERGENCE OF MODULARITY

We use Eq. (1) to analyze M(t) data on spontaneous emergence of modularity in a simulation of an evolving protein network [27] to deduce df/dm and to derive f(M) by integration. For this system, we know the mutation rate, as two of the connections change per time step in the upper half of the connection matrix, and so we can use Eq. (1) at short time to determine df/dm. Alternatively we can determine df/dm if we know the variance of the modularity and M(t), cf. Eq. (5). We assume f(M) is quadratic, and integrate the df/dm to determine the f(M). There are ND = 346 total connections in the upper half of the connection matrix and N0 = 22 connections in the upper half of the connection matrix when M = 0 for the parameters of [27]. Thus, we take C = 346 × 2/L = 5.77 and μ = 2/346. When M = 0, the population was prepared by four discrete time iterations of the mutation step, from a single initial configuration [27]. We find f(M) ~ 1.4M reproduces the data at small M. For the initial condition of M = 0.38, the configurations were taken from an ensemble [27], which we take to satisfy Eq. (1). We find f(M) = 1.4M − 1.31M^2 approximately reproduces the data, as shown in Fig. 4. Equation (13) predicts a steady-state value of M = 0.45, toward which the computationally costly simulations appear to be heading.
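The steady-state prediction can be reproduced with a few lines of Python. The sketch below evaluates Eq. (13) with the fitted f(M) = 1.4M − 1.31M^2 and the quoted values C = 346 × 2/L and μ = 2/346, taking L = 120 and l = 10 from the figure captions and taking the mutational term in Eq. (13) to be μC[(L − l)l/L^2][2 + (L/l − 2)ξ − 2√((1 − ξ)(1 + (L/l − 1)ξ))]. The ternary search and the bisection are numerical choices made here for illustration, not the procedure used in the paper.

```python
import math

L, l = 120, 10
C = 346 * 2 / L          # average number of connections per node, as quoted
mu = 2 / 346             # rewiring rate per connection, as quoted

def f(M):
    """Fitted fitness function quoted in the text."""
    return 1.4 * M - 1.31 * M**2

def objective(xi):
    """Expression inside the braces of Eq. (13)."""
    a = L / l
    root = math.sqrt(max(0.0, (1.0 - xi) * (1.0 + (a - 1.0) * xi)))
    cost = mu * C * (L - l) * l / L**2 * (2.0 + (a - 2.0) * xi - 2.0 * root)
    return f(xi) - cost

# Ternary search for the maximizer; adequate here because this f is concave
# and the mutational term is convex, so the objective is concave on [0, 1].
lo, hi = 0.0, 1.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
    if objective(m1) < objective(m2):
        lo = m1
    else:
        hi = m2
xi_max = 0.5 * (lo + hi)
f_pop = objective(xi_max)

# Eq. (14): solve f(M) = f_pop by bisection on [0, xi_max], where f is increasing.
lo, hi = 0.0, xi_max
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if f(mid) < f_pop:
        lo = mid
    else:
        hi = mid
M_steady = 0.5 * (lo + hi)
print(xi_max, f_pop, M_steady)
```

With these inputs the computed modularity should come out close to the value M = 0.45 quoted above. Note that the evolved modularity M is smaller than the maximizer ξ_max, because the mean population fitness is reduced by the mutational term.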

FIG. 4.

Shown is modularity versus time for a population that exhibits spontaneous emergence of modularity. The curves are from theory, Eq. (1), and the data (circles) are from [27]. Two different initial conditions are shown, M(0) = 0 and M(0) = 0.38. In this example the derived underlying fitness function is f(M) = 1.4M − 1.31M^2, the mutation rate is μ = 2/346, and the average number of connections is C = 346 × 2/L = 5.77.

VIII. CONCLUSION

The examples of environmental stress leading to modularity, ranging from metabolic networks of bacteria in different physical environments to simulations of emergence of protein secondary structure, can be quantified by quasispecies theory. The approximate relation RpE = dM/dt relates rate of growth of modularity to the ruggedness of the fitness landscape, R, and environmental pressure, pE, for small values of modularity. The present theory should allow the analysis of complex, evolving populations to go beyond a demonstration of the existence of modularity to a quantitative analysis of the dynamics of modularity. That is, the theory presented here should allow the determination of the f(M) function for these evolving populations, by using the predictions to determine the f(M) that best matches observation. Knowing the f(M) and μ that fundamentally characterize a population would then allow for out-of-sample predictions of dynamical modularity.

Acknowledgments

This research was supported by US National Institutes of Health, 1 R01 GM 100468–01. JMP was also supported by the Catholic University of Korea research fund 2014 and by the National Research Foundation of Korea Grant (NRF-2011-013-C00029 and NRF-2013R1A1A2006983).

IX. APPENDIX A

We here derive Eq. (1). The rate to increase modularity for a matrix with modularity m is r_up = μ n_out (L/l) l^2/L^2. Recall we are in the dilute limit: C is finite, and L is large. Thus, collisions between entries in the connection matrix can be ignored. The rate to decrease modularity for a matrix with modularity m is r_down = μ n_in L(L − l)/L^2. Here the number of connections inside the l × l blocks is given by n_in and the number of connections outside the l × l blocks is given by n_out. We have the constraint n_in + n_out = CL. We also have by the definition of modularity m = [n_in/l − n_out/(L − l)] l/(CL), which shows modularity changes by discrete increments of ±1/[C(L − l)]. Thus, we find r_up(m) = μCl(L − l)(1 − m)/L and r_down(m) = μC(L − l)(Lm − lm + l)/L. For non-zero modularity, to avoid collisions in the Δ matrix, we further require 〈n_in〉 ≪ lL, i.e. C(l − lM + LM) ≪ lL. Alternatively, if this constraint is not satisfied, we can view Eq. (1) as a generalization to the case of integer occupation numbers of the Δ matrix with certain biased hopping probabilities, r_up(m) and r_down(m), given above. The rate of change of P_m(t) due to replication is L[f(m) − 〈f〉]P_m(t), where the second term ensures conservation of probability, Σ_m P_m(t) = 1 ∀ t. This is the first term on the right hand side in Eq. (1). The rate of increasing P_m(t) due to an increase of modularity from m − 1/[C(L − l)] to m is r_up[m − 1/(C(L − l))] P_{m−1/[C(L−l)]}(t), which is the first μ-dependent term in Eq. (1). The rate of increasing P_m(t) due to a decrease of modularity from m + 1/[C(L − l)] to m is r_down[m + 1/(C(L − l))] P_{m+1/[C(L−l)]}(t), which is the second μ-dependent term in Eq. (1). The rate of decreasing P_m(t) due to modularity changing from m to m ± 1/[C(L − l)] is [r_up(m) + r_down(m)]P_m(t), which is the third μ-dependent term in Eq. (1). Thus, we have derived Eq. (1).
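The bookkeeping in this derivation can be verified with a short Python sketch. It checks by enumeration that a fraction l/L of the matrix cells lies inside the blocks, that the rates written in terms of the counts n_in and n_out agree with those written in terms of m, and that the net drift per step, [r_up(m) − r_down(m)]/[C(L − l)], equals −μm, which is the origin of the −μM term in Eq. (4). The function names and the numerical values (C = 5, n_in = 200) are hypothetical and chosen only so that CL is an integer.

```python
def block_fraction(L, l):
    """Fraction of matrix cells that lie inside the l x l diagonal blocks."""
    inside = sum(1 for i in range(L) for j in range(L) if i // l == j // l)
    return inside / L**2

def rates_from_counts(n_in, n_out, mu, L, l):
    """r_up and r_down written in terms of n_out and n_in."""
    r_up = mu * n_out * (L / l) * l**2 / L**2
    r_down = mu * n_in * L * (L - l) / L**2
    return r_up, r_down

def rates_from_modularity(m, mu, C, L, l):
    """r_up(m) and r_down(m) written in terms of the modularity m."""
    r_up = mu * C * l * (L - l) * (1.0 - m) / L
    r_down = mu * C * (L - l) * (L * m - l * m + l) / L
    return r_up, r_down


L, l, C, mu = 120, 10, 5.0, 0.01
n_in = 200
n_out = int(C * L) - n_in
m = (n_in / l - n_out / (L - l)) * l / (C * L)

up_counts, down_counts = rates_from_counts(n_in, n_out, mu, L, l)
up_m, down_m = rates_from_modularity(m, mu, C, L, l)
drift = (up_m - down_m) / (C * (L - l))   # mean change of m per unit time

print(block_fraction(L, l), l / L)
print(up_counts, up_m)
print(down_counts, down_m)
print(drift, -mu * m)
```

Each printed pair should agree to rounding error. The last pair shows that rewiring alone drives the modularity back toward zero at rate μ.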

X. APPENDIX B

We here calculate the determinant that comes from integrating out the z*_ij and z_ij fields in Eq. (9). The numbers of connections inside and outside the blocks have been taken initially to be Poisson distributed in Eq. (9), with average probability of a connection per site of C1/L inside the blocks and C0/L outside the blocks. The overall average number of connections per row is C = C0 + (C1 − C0)l/L. We here project the number of connections onto the constraint that there are LC total connections. As in [41], this constraint is enforced with a projection operator that leads to twisted boundary conditions. A modularity field ξ and conjugate field ξ̄ are defined, with ξ(t) as the argument of the fitness function in Eq. (9). We use a Trotter factorization and define ε = tf/M and will take the limit M → ∞. We define δ = 1 if ⌊i/l⌋ = ⌊j/l⌋ and zero otherwise. The partition function becomes

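$$\begin{aligned} Z ={}& \int [\mathcal{D}\bar\xi\,\mathcal{D}\xi]\; e^{-\varepsilon LC\sum_{k=1}^{M}\bar\xi(k)\xi(k) + \varepsilon L\sum_{k=1}^{M} f[\xi(k)]} \\ &\times \int_0^{2\pi}\frac{d\eta}{2\pi}\, e^{-i\eta - LC} \int [\mathcal{D}z^*\,\mathcal{D}z]\; e^{-\sum_{k=0}^{M}\sum_{ij} z^*_{ij}(k)z_{ij}(k) + \sum_{ij} z_{ij}(M)} \\ &\times e^{\sum_{k=1}^{M}\sum_{ij}\left[z^*_{ij}(k) + (\varepsilon\mu/L^2)\sum_{mn}\left(z^*_{mn}(k) - z^*_{ij}(k)\right) + \varepsilon\bar\xi(k)\frac{L\delta - l}{L-l}\,z^*_{ij}(k)\right]z_{ij}(k-1)} \\ &\times e^{(C_1(0)/L)\,e^{i\eta/(LC)}\sum_{ij\in\mathrm{in}} z^*_{ij}(0) + (C_0(0)/L)\,e^{i\eta/(LC)}\sum_{ij\in\mathrm{out}} z^*_{ij}(0)} \end{aligned} \tag{16}$$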

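Integrating out z*_ij(0) and z_ij(0), the action remains the same except that the lower limits of the sums over k are incremented by one, and the terms C1(0)z*(0) and C0(0)z*(0) become C1(1)z*(1) and C0(1)z*(1) with

$$\begin{aligned} C_1(1) &= C_1(0)\left[1 - \varepsilon\mu\left(1-\frac{l}{L}\right) + \varepsilon\bar\xi(1)\right] + C_0(0)\,\varepsilon\mu\left(1-\frac{l}{L}\right) \\ C_0(1) &= C_0(0)\left[1 - \varepsilon\mu\frac{l}{L} - \varepsilon\bar\xi(1)\frac{l}{L-l}\right] + C_1(0)\,\varepsilon\mu\frac{l}{L} \end{aligned} \tag{17}$$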

Iterating the process of integrating out the z*(k) and z(k), we find that the vector C(t) = (C1(t),C0(t)) renormalizes according to Eq. (11). Finally, integrating out z*(M) and z(M), we find the final contribution to the partition function is

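$$Z = \int [\mathcal{D}\bar\xi\,\mathcal{D}\xi]\; e^{-\varepsilon LC\sum_{k=1}^{M}\bar\xi(k)\xi(k) + \varepsilon L\sum_{k=1}^{M} f[\xi(k)]} \times \int_0^{2\pi}\frac{d\eta}{2\pi}\, e^{-i\eta - LC}\, e^{\left[lC_1(M) + (L-l)C_0(M)\right]e^{i\eta/(LC)}} \tag{18}$$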

Performing the final integration over η, we find the final expression for the partition function to be

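$$Z = \int [\mathcal{D}\bar\xi\,\mathcal{D}\xi]\; e^{-\varepsilon LC\sum_{k=1}^{M}\bar\xi(k)\xi(k) + \varepsilon L\sum_{k=1}^{M} f[\xi(k)]} \left[\frac{lC_1(M) + (L-l)C_0(M)}{LC}\right]^{LC} \tag{19}$$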

Thus, the action in Eq. (10) is derived.

XI. APPENDIX C

Here we calculate the saddle-point solution to the action (10) at large time. For large L, this saddle point solution is exact. For large tf, Eq. (10) becomes

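$$S = L t_f\left[C\bar\xi\xi - f(\xi)\right] - LC\ln Q \tag{20}$$

where

$$Q = \mathrm{Tr}\left[e^{t_f A}\begin{pmatrix} C_1(0)/C \\ C_0(0)/C \end{pmatrix}\left(\frac{l}{L},\ \frac{L-l}{L}\right)\right] \tag{21}$$

The larger eigenvalue of A is given by

$$\lambda_+ = -\frac{1}{2}\left(\mu - \frac{L-2l}{L-l}\,\bar\xi\right) + \frac{1}{2}\left[\left(\mu - \frac{L-2l}{L-l}\,\bar\xi\right)^2 + \frac{4l\,\bar\xi^2}{L-l}\right]^{1/2} \tag{22}$$

Thus, the action tends to

$$-S \sim L t_f\left[-C\bar\xi\xi + f(\xi)\right] + LC\,t_f\,\lambda_+ \tag{23}$$

Maximizing this over ξ̄, we find

$$-S/(L t_f) \sim f(\xi) - \mu C\,\frac{(L-l)\,l}{L^2}\left[2 + \left(\frac{L}{l}-2\right)\xi - 2\sqrt{(1-\xi)\left(1 + \left(\frac{L}{l}-1\right)\xi\right)}\,\right] \tag{24}$$

Maximizing over ξ gives Eq. (13). Using that the partition function Z grows at long time as exp(L f_pop t_f) [41], we find Eq. (14).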
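The eigenvalue in Eq. (22) can be checked against the discrete renormalization of Eq. (17). The sketch below iterates the update of (C1, C0) at a constant value of ξ̄ and measures the late-time growth rate of lC1 + (L − l)C0, which should approach the larger eigenvalue λ+ = −(1/2)(μ − bξ̄) + (1/2)[(μ − bξ̄)^2 + 4lξ̄^2/(L − l)]^{1/2} with b = (L − 2l)/(L − l). The constant ξ̄ = 0.05, the step size ε, the number of steps, and the function names are hypothetical choices made for illustration; μ = 0.01, L = 120, and l = 10 are the values quoted in the caption of Fig. 2.

```python
import math

def lambda_plus(xi_bar, mu, L, l):
    """Larger eigenvalue of the matrix A, Eq. (22)."""
    b = mu - (L - 2 * l) / (L - l) * xi_bar
    return -0.5 * b + 0.5 * math.sqrt(b * b + 4.0 * l * xi_bar**2 / (L - l))

def growth_rate(xi_bar, mu, L, l, eps=0.01, steps=50000):
    """Iterate the update of Eq. (17) at constant xi_bar; return the late-time growth rate."""
    c1, c0 = 1.0, 1.0
    rate = 0.0
    for _ in range(steps):
        n1 = (c1 * (1.0 - eps * mu * (1.0 - l / L) + eps * xi_bar)
              + c0 * eps * mu * (1.0 - l / L))
        n0 = (c0 * (1.0 - eps * mu * l / L - eps * xi_bar * l / (L - l))
              + c1 * eps * mu * l / L)
        q_old = l * c1 + (L - l) * c0
        q_new = l * n1 + (L - l) * n0
        rate = math.log(q_new / q_old) / eps   # growth rate over the last step
        norm = abs(n1) + abs(n0)               # rescale to avoid overflow
        c1, c0 = n1 / norm, n0 / norm
    return rate


mu, L, l, xi_bar = 0.01, 120, 10, 0.05
print(lambda_plus(xi_bar, mu, L, l), growth_rate(xi_bar, mu, L, l))
print(lambda_plus(0.0, mu, L, l))   # vanishes when xi_bar = 0
```

The two numbers on the first line should agree up to a discretization error of order ελ+^2. At ξ̄ = 0 the larger eigenvalue vanishes, reflecting conservation of the number of connections under rewiring alone.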

Footnotes

PACS numbers: 87.10.-e, 87.15.A-, 87.23.Kg
