Author manuscript; available in PMC: 2018 May 29.
Published in final edited form as: Phys Rev E. 2018 Apr;97(4-1):040401. doi: 10.1103/PhysRevE.97.040401

Hidden long evolutionary memory in a model biochemical network

Md Zulfikar Ali 1, Ned S Wingreen 2,*, Ranjan Mukhopadhyay 1
PMCID: PMC5973509  NIHMSID: NIHMS969188  PMID: 29758653

Abstract

We introduce a minimal model for the evolution of functional protein-interaction networks using a sequence-based mutational algorithm, and apply the model to study neutral drift in networks that yield oscillatory dynamics. Starting with a functional core module, random evolutionary drift increases network complexity even in the absence of specific selective pressures. Surprisingly, we uncover a hidden order in sequence space that gives rise to long-term evolutionary memory, implying strong constraints on network evolution due to the topology of accessible sequence space.


Within even the simplest living cells there is a highly complex web of interacting molecules, with biological function typically emerging from the actions of a large number of different factors [1,2]. What is the relationship between the architecture of such interaction networks and the underlying processes of evolution? Much of the theory related to evolution focuses on the evolution of individual phenotypic traits or on population dynamics (see, for example, [3]); however, in general, individual genes do not determine individual traits. Rather, many traits arise from the dynamics of interacting components. With this in mind, we formulated and analyzed a minimal physically based protein-protein interaction model that allows us to map from sequence space to interactions and, consequently, to network dynamics and fitness. Surprisingly, the model reveals a long-term memory of network origins hidden in the space of sequences.

Recently, bottom-up approaches to molecular evolution, typically in the context of the folding properties or thermodynamics of individual proteins or RNAs [4–8], have led to new insights into evolutionary outcomes, for example regarding a power-law distribution of protein family sizes. Here we generalize such bottom-up studies to functional networks. We focus on oscillatory networks of interacting enzymes, both due to the relevance of biological oscillators (e.g., cell cycle, circadian rhythms) [9–11] and due to the simplicity of defining function and fitness. As such a network evolves, are the original nodes still both necessary and sufficient or does the network redistribute function over new nodes? If new nodes do become essential, is there still memory of the original network?

In order to address these questions, we develop a model of protein-protein interaction networks consisting of two classes of enzymes—activators (e.g., kinases) and deactivators (e.g., phosphatases). Each of these can be in either an active state or an inactive state and only function when in the active state. To model cooperativity, we assume that activation or deactivation of a target (either an activator or a deactivator) requires h independent binding or modification events, with partially modified intermediates being short lived. The resulting chemical kinetic processes are

$$ h A_i^* + T_l \xrightarrow{k_{il}} h A_i^* + T_l^*, \qquad h D_j^* + T_l^* \xrightarrow{k'_{jl}} h D_j^* + T_l, \tag{1} $$

where A and A*, D and D*, and T and T* denote activator, deactivator, and target in inactive and active states, respectively. We note here that, in our model, the same protein species act both as enzymes (represented as A* or D* in the equations) as well as targets (represented as T and T*). The corresponding chemical kinetic equation can be approximated as (see Supplemental Material (SM) [12], Sec. I for details)

$$ \frac{d[T_l^*]}{dt} = \sum_{i=1}^{m} k_{il}\,[A_i^*]^h\,[T_l] \;-\; \sum_{j=1}^{n} k'_{jl}\,[D_j^*]^h\,[T_l^*] \;+\; \alpha\,[T_l] \;-\; \alpha'\,[T_l^*], \tag{2} $$

where m and n are the numbers of distinct types of activators and deactivators, respectively. In Eq. (2), α and α′ are background activation and deactivation rates. We further assume that the total concentration of each species is constant, such that $[T_l] = c_0 - [T_l^*]$.
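As a concrete illustration, Eq. (2) together with the conservation condition can be integrated numerically. The sketch below simulates a small network with two activators and one deactivator (m = 2, n = 1); the rate constants here are randomly drawn, hypothetical values chosen for illustration, not rates derived from the paper's evolved sequences.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical rate constants for a 2-activator, 1-deactivator network
# (m = 2, n = 1); values are illustrative, not from evolved sequences.
m, n = 2, 1
h, alpha, alpha_p, c0 = 2, 1.0, 1.0, 1.0
rng = np.random.default_rng(1)
k_act = rng.uniform(0.0, 30.0, size=(m, m + n))    # k_il: activator i on target l
k_deact = rng.uniform(0.0, 30.0, size=(n, m + n))  # k'_jl: deactivator j on target l

def dT_star_dt(t, T_star):
    """Right-hand side of Eq. (2) for all m+n targets, using [T_l] = c0 - [T_l*].

    The same species serve as enzymes and as targets: the first m entries of
    T_star are the active activators, the last n the active deactivators.
    """
    T = c0 - T_star                       # conservation of each species
    A_star, D_star = T_star[:m], T_star[m:]
    activation = (k_act * A_star[:, None] ** h).sum(axis=0) * T
    deactivation = (k_deact * D_star[:, None] ** h).sum(axis=0) * T_star
    return activation - deactivation + alpha * T - alpha_p * T_star

sol = solve_ivp(dT_star_dt, (0.0, 50.0), np.full(m + n, 0.5), max_step=0.1)
```

Whether a given draw of rate constants actually oscillates depends on the values; the paper's evolutionary scheme selects for sequences whose induced rates do.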

Protein-protein interaction strengths are generally determined by amino-acid-residue interactions at specific molecular interfaces. Moreover, it has been estimated that >90% of protein interaction interfaces are planar, with the dominant contribution coming from hydrophobic interactions [13,14]. For simplicity, we therefore assume each protein possesses a pair of interaction interfaces, an in-face and an out-face, and we associate a binary sequence, $\vec{\sigma}_{\mathrm{in/out}}$, of hydrophobic residues (1's) and hydrophilic residues (0's) to each interface (our approach builds on previous studies [15,16]). The interaction strength between an enzyme (denoted by index i) and its target (denoted by index l) is determined by the interaction energy $E_{il} = \varepsilon\, \vec{\sigma}_{\mathrm{out}}^{(i)} \cdot \vec{\sigma}_{\mathrm{in}}^{(l)}$ between the out-face of the enzyme and the in-face of its target. (All energies are expressed in units of the thermal energy kBT.) The effective reaction rate is then given by

$$ k_{il} = k_0 \left(1 + \exp[-(E_{il} - E_0)]\right)^{-h}, \tag{3} $$

where E0 plays the role of a threshold energy, e.g., accounting for the loss of entropy due to binding. The background activation and deactivation rates are set equal and define the unit of time via α = α′ = 1. In our simulations we set k0 = 104, ε = 0.2, cooperativity h = 2, E0 = 5, c0 = 1, and we take the length of each sequence representing an interface to be N = 25. These interaction parameters were chosen to provide a large range for the rate constants kil as a function of sequence and to keep the background rates small compared to the highest enzymatic rates; cooperativity was introduced to allow oscillations in relatively simple biomolecular networks.
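The map from binary interface sequences to rates in Eq. (3) is simple to implement. A minimal sketch using the stated parameters (ε = 0.2, E0 = 5, k0 = 10⁴, h = 2, N = 25); the two example faces (all-1 and all-0) are our illustrative choices:

```python
import numpy as np

eps, E0, k0, h, N = 0.2, 5.0, 1e4, 2, 25  # parameters as given in the text

def interaction_rate(sigma_out, sigma_in):
    """Rate k_il of Eq. (3), from the enzyme out-face and target in-face."""
    E = eps * np.dot(sigma_out, sigma_in)   # E_il = eps * sigma_out . sigma_in
    return k0 * (1.0 + np.exp(-(E - E0))) ** (-h)

# Two fully hydrophobic faces give E = eps * N = 5 = E0, so the sigmoid
# factor is 1/2 per binding event and the rate is k0 / 2**h = 2500.
strong = interaction_rate(np.ones(N), np.ones(N))
# Two fully hydrophilic faces give E = 0, far below threshold: near-zero rate.
weak = interaction_rate(np.zeros(N), np.zeros(N))
```

The steep dependence on E − E0 is what gives the rates their large dynamic range as sequences mutate.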

For our evolutionary scheme, we assume a population sufficiently small that each new mutation is either fixed or entirely lost [17,18]. We consider only point mutations—namely, replacing a randomly chosen hydrophobic residue (1) in the in- or out-face of one enzyme by a hydrophilic residue (0), or vice versa. In this study, mutations are accepted if and only if they satisfy the selection criterion that the network remains oscillatory and moreover that the network exhibits oscillatory dynamics independent of the choice of initial concentrations of the active fractions (global oscillators). For this purpose we identified the fixed points of the chemical dynamics and carried out linear stability analysis (see SM [12], Sec. II).
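The mutation-acceptance loop above can be sketched as follows. The oscillation test itself (fixed points plus linear stability, SM Sec. II) is beyond a short example, so `is_global_oscillator` below is a caller-supplied placeholder predicate, and the dictionary keyed by (enzyme, face) is our illustrative bookkeeping, not the paper's data structure:

```python
import numpy as np

def propose_point_mutation(sequences, rng):
    """Flip one randomly chosen residue (0 <-> 1) on a randomly chosen face.

    `sequences` maps an (enzyme, face) label to a binary integer array;
    this layout is our illustrative choice.
    """
    mutant = {key: seq.copy() for key, seq in sequences.items()}
    keys = list(mutant)
    face = mutant[keys[rng.integers(len(keys))]]
    face[rng.integers(face.size)] ^= 1      # point mutation: 0 <-> 1
    return mutant

def evolve(sequences, is_global_oscillator, n_accepted, rng):
    """Accept a proposed mutation iff the mutant is still a global oscillator.

    `is_global_oscillator` stands in for the fixed-point / linear-stability
    test described in the text (SM Sec. II).
    """
    accepted = 0
    while accepted < n_accepted:
        mutant = propose_point_mutation(sequences, rng)
        if is_global_oscillator(mutant):
            sequences = mutant
            accepted += 1
    return sequences

rng = np.random.default_rng(0)
start = {("A2", "in"): np.zeros(25, dtype=int), ("A2", "out"): np.zeros(25, dtype=int)}
end = evolve(start, lambda s: True, 5, rng)  # permissive stand-in predicate
```

With a permissive predicate every proposal is accepted; in the model, only the subset of sequence space yielding global oscillators is ever visited.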

In order to address the question of network drift—how function could redistribute over new nodes in an evolving network—we construct a three-component oscillator [see Fig. 1(a) for a schematic] by starting with a two-component oscillator, with one activator and one deactivator, and adding a second activator with all 0's for the sequences representing its in- and out-faces (so that initially Activator 2, representing a new node, has minimal interactions with the other two components). We then let the system evolve, accepting only mutations corresponding to global oscillators. To characterize network drift, we studied the time evolution of the essentiality of each activator for a random sample of starting sequences that corresponded to oscillators, as depicted in Fig. 2(a). We characterize a component as "essential" if the system stops oscillating when the component is removed or, equivalently in our model, if we set the total concentration c0 of that component to zero [19]. Initially, Activator 2 is inessential (since the deactivator and Activator 1 generate oscillations); in Fig. 2(b) we show the distribution of the number of accepted mutational steps before it becomes essential, for two distinct starting sequences. While the two distributions peak at very different numbers of mutational steps, the interaction strengths of the two initial states do not differ appreciably [Fig. 2(b), inset], highlighting the importance of the underlying sequence in governing evolutionary dynamics. Returning to Fig. 2(a), we find relatively rapid flips between states where both activators are essential and states where only one activator is essential.
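The knockout definition of essentiality translates directly into code. In the sketch below, `oscillates` is a placeholder for the full dynamical test on a vector of total concentrations; the toy predicate used at the end is purely illustrative:

```python
import numpy as np

def is_essential(component, c0_vec, oscillates):
    """A component is 'essential' if the network oscillates at the full
    concentrations but stops when that component's total concentration is
    set to zero; `oscillates` stands in for the dynamical test.
    """
    knockout = c0_vec.copy()
    knockout[component] = 0.0
    return bool(oscillates(c0_vec)) and not bool(oscillates(knockout))

# Toy stand-in predicate: pretend oscillation requires component 0 present.
needs_first = lambda c: c[0] > 0
```

Per the text's convention, a state where both activators are individually inessential is possible in principle but is very rare in practice [19].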

FIG. 1.

FIG. 1

Oscillatory protein-protein interaction network. (a) Schematic of a three-component network with two activators (A1, A2) and one deactivator (D). The symbols → and ⊣ indicate the chemical processes of activation and deactivation, respectively. (b) Steady-state oscillations of the active fractions of the components of the network in (a). The dashed vertical lines indicate peaks of the activator oscillations, and the horizontal arrow indicates the time shift between these peaks.

FIG. 2.

FIG. 2

Temporal evolution of essentiality of activators in three-component systems. (a) Temporal evolution for two different initial sequences (the two sequences are specified in the SM [12]). On the y axis, +1 indicates that only Activator 1 is essential, −1 indicates that only Activator 2 is essential, and 0 indicates that both activators are essential [19]. (b) Histograms of the number of accepted mutational steps before Activator 2 first becomes essential, for the two distinct initial sequences. Inset: interaction strengths of the two initial states.

Surprisingly, we also note the prevalence of much longer time periods where Activator 1 is always essential or where Activator 2 is always essential. This is true independent of initial conditions. These long evolutionary periods presumably reflect the division of sequence space into two regions or “phases”: Phase 1 where Activator 1 is always essential and Phase 2 where Activator 2 is always essential. The system starts in Phase 1 (Activator 2 is inessential), then when Activator 1 first becomes inessential we infer that the system has entered Phase 2, and so on. These results imply that while, naively, one might have expected that the starting state of the network (e.g., the identity of the solely essential activator) would be effectively forgotten as soon as both activators became essential, the system retains a hidden memory of the starting conditions in terms of persistence in the starting phase (Phase 1, in this case). Thus the long duration in each phase (in comparison to the duration between successive flips in essentiality) constitutes a long-term memory in our evolving network.

Can these two phases be distinguished in terms of measurable dynamical quantities or rate constants? Since the two phases presumably relate to an asymmetry in the roles of the two activators, we quantify this asymmetry via the relative peak-to-valley ratio (PVR) of the oscillations of their active fractions, where the relative PVR is (PVR A1 − PVR A2)/(PVR A1 + PVR A2). The PVR of a component is the ratio of the peak value to the valley (minimum) value of its active concentration during steady-state oscillations [see Fig. 1(b)]. From Fig. 3(a) (top panel) and Fig. 3(b), we see that the relative PVR correlates with the phase, and we display the distribution quantifying this correlation. A corollary is that the probability that an activator is essential also correlates with the relative PVR [Fig. 3(c)]: an activator with a relatively larger PVR is more likely to be essential. Moreover, we find that the phase shift between peaks in the active fractions of the two activators also correlates with the phase [Fig. 3(d)], with Activator 1 typically leading in Phase 1 and Activator 2 in Phase 2. Finally, to determine how these observations relate to the underlying rate constants, we constructed the covariance matrix of the nine rate constants kij and carried out a principal component analysis (see SM [12], Sec. IV). We find that the projection of the rates onto the eigenvector with the largest eigenvalue (PC1 = 94.93%) strongly correlates with the phase [Fig. 3(a), lowest panel, and Fig. 3(e)]; we find no such correlation for projections onto any of the remaining eigenvectors. On examining the top eigenvector, we find that it primarily consists of a linear superposition of the difference in autoactivation rates of the two activators and the difference in their deactivation rates. This suggests that strong autoactivation coupled with strong deactivation produces an activator that peaks first during each oscillation cycle and also has a large PVR (see SM [12], Sec. VIII, for a physical explanation of the correlation). However, the co-occurrence of these features does not by itself explain the observed long intervals of the two distinct phases.
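The PVR and relative-PVR definitions above are one-liners on a sampled trace. The sketch below applies them to synthetic sinusoidal traces of our own construction (illustrative stand-ins, not model output):

```python
import numpy as np

def pvr(trace):
    """Peak-to-valley ratio of a steady-state oscillatory trace."""
    return trace.max() / trace.min()

def relative_pvr(a1, a2):
    """(PVR_A1 - PVR_A2) / (PVR_A1 + PVR_A2), as defined in the text."""
    p1, p2 = pvr(a1), pvr(a2)
    return (p1 - p2) / (p1 + p2)

# Synthetic active-fraction traces: A1 swings more widely, so PVR_A1 > PVR_A2
# and the relative PVR is positive (Phase-1-like, in the text's terms).
t = np.linspace(0.0, 4.0 * np.pi, 2001)
a1 = 0.5 + 0.4 * np.sin(t)   # PVR near 0.9 / 0.1 = 9
a2 = 0.5 + 0.1 * np.sin(t)   # PVR near 0.6 / 0.4 = 1.5
```

For real model output one would apply the same functions to the late-time (steady-state) portion of the integrated trajectories.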

FIG. 3.

FIG. 3

Temporal evolution of phases in a three-component system. (a) Depiction of the temporal evolution where a value of +1 indicates Phase 1 and −1 indicates Phase 2. Along with the phase, the three panels show (i) normalized relative PVR of the two activators (red, top panel), (ii) phase shift between their oscillatory peaks (green, middle panel), and (iii) projected component of the chemical rates on the principal eigenvector from principal component analysis (PCA) (magenta, bottom panel). (b) Distributions of relative PVR of the two activators in Phase 1 and in Phase 2. (c) Probability that each activator is essential as a function of its relative PVR. (d) Distribution of phase shifts between active fraction peaks of the two activators in Phase 1 and Phase 2. (e) Distribution of projected rate constants on the principal eigenvector, obtained from PCA analysis, in Phase 1 and Phase 2.

The question remains: what is the origin of the observed long-term memory? We first quantify the duration of long-term network memory by constructing a histogram of the number of mutational steps that the system spends in each phase before flipping. As shown in Fig. 4(a), we find an approximately exponential distribution, $P(\tau) \propto e^{-\tau/\tau_0}$, where τ0 ≃ 3200 ± 48 mutational steps. An exponential distribution implies a fixed, history-independent rate of flipping between the two phases, which in turn suggests that flipping corresponds to barrier crossing. Since our model treats all oscillatory states as equally fit, the only barriers are entropic, i.e., there must be relatively few boundary points connecting the phases (see SM [12], Sec. V). To check this hypothesis, we studied the neighborhood of states in Phase 1 and Phase 2. In Phase 1, for example, we distinguished between states where only Activator 1 is essential and states where both activators are essential. For states where only Activator 1 is essential, we found no examples of sequences a Hamming distance of 1 away (that is, separated by a single point mutation) for which Activator 1 stops being essential. Of the states in Phase 1 where both activators are essential, for only 3% did the Hamming-distance-1 neighborhood contain one or more states where Activator 1 was inessential. The relative rarity of such boundary states is consistent with our hypothesis that, in sequence space, the two phases touch at a relatively small number of boundary points.
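The boundary-state census described above amounts to enumerating every single-point mutant of a state and testing each for essentiality. A sketch, with `still_essential` as a placeholder predicate and a dict-of-faces layout that is our illustrative choice:

```python
import numpy as np

def hamming_one_neighbors(sequences):
    """Yield every network exactly one point mutation away from `sequences`
    (a dict mapping a face label to a binary integer array)."""
    for key, seq in sequences.items():
        for site in range(seq.size):
            mutant = {k: v.copy() for k, v in sequences.items()}
            mutant[key][site] ^= 1      # flip this one residue
            yield mutant

def is_boundary_state(sequences, still_essential):
    """True if at least one single point mutation makes the focal activator
    inessential; `still_essential` stands in for the essentiality test."""
    return any(not still_essential(m) for m in hamming_one_neighbors(sequences))

# Tiny example: two faces of length 3 give 6 Hamming-distance-1 neighbors.
seqs = {"in": np.zeros(3, dtype=int), "out": np.zeros(3, dtype=int)}
neighbors = list(hamming_one_neighbors(seqs))
```

In the paper's setting the predicate would combine the oscillation test with the knockout criterion, and the fraction of boundary states sets the entropic barrier height.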

FIG. 4.

FIG. 4

Distribution of accepted mutational steps between flips. (a) Distribution of the number of accepted mutational steps between flips from one phase to the other, on a semilogarithmic scale to highlight the exponential distribution (data are binned with bin size 600). Inset: same data on a log-log scale. (b) Distribution of the number of accepted mutational steps over which an activator is essential for the whole duration, on a log-log scale showing a power-law fit $f(x) \sim x^{-2.3 \pm 0.05}$ at short times (bin size 50). Inset: same distribution over longer times on a semilogarithmic scale (bin size 600).

Interestingly, in contrast to flipping between phases, the distribution of the number of mutational steps that an activator remains essential exhibits a power-law distribution for short times, as depicted in Fig. 4(b). For Activator 1, for example, this power-law part of the distribution is dominated by cases where the system is in Phase 2, with Activator 1 switching between being essential and inessential. Thus the power-law distribution is related to the presence of domains within Phase 2 where Activator 1 is also essential (and likewise for Activator 2 in Phase 1). For longer times, the periods of essentiality correspond to the duration of phases, and thus the distribution decays exponentially [Fig. 4(b), inset]. In contrast to exponential decay, a power-law distribution implies a history-dependent switching rate, with the escape rate from a domain proportional (on average) to the inverse of the time elapsed since the system entered the domain (see SM [12], Sec. IX; see also Sec. VI for a toy model exhibiting mixed power-law and exponential distributions).

It is not a priori obvious how the above observations of two phases generalize to more complex networks. We therefore extended our study by starting with a three-component oscillator and adding a fourth component (Activator 3) with all its sequences initially set to 0's. Once again we find that Activator 3 becomes essential relatively rapidly (typically in ~100 mutational steps). If we continue to follow the evolution of essentiality for the activators, we find for each activator long periods (~1000+ mutational steps) where that activator remains essential, separated by similarly long periods where that activator is intermittently essential or inessential [Fig. 5(a)]. This suggests that, for each activator, the sequence space of oscillators divides into two regions: one region where that activator is essential at every point, and a second region consisting of smaller domains where the activator is essential interspersed with domains where it is inessential. Note that time periods where one activator remains essential sometimes overlap with periods where one of the other activators remains essential, implying that the region where one activator is essential at every point has some overlap with the regions where other activators are essential at every point. This contrasts somewhat with the three-component system, where Phase 1, the region in which Activator 1 is essential at every point, is complementary to Phase 2. By contrast, as shown in Fig. 5(b), the distribution of mutational steps over which any one of the activators is essential for the four-component system is quite similar to that of the three-component system, being power law at short times with a similar exponent, and exponential at longer times, albeit with a shorter decay time τ0 ≃ 1750 ± 54 mutational steps. As for three-component systems, we also find a strong correlation between the normalized and relative PVR of oscillation, the phase shift, and essentiality for pairs of activators. We find that when the normalized PVR of an activator is higher, the probability that it is essential is also higher [Figs. 5(c) and 5(d)]; these results generalize to much larger systems of activators and deactivators (see SM [12], Sec. X).

FIG. 5.

FIG. 5

Temporal evolution of essentiality of each activator in four-component systems. (a) Depiction of temporal evolution where, on the y axis, +1 indicates that the activator is essential and 0 indicates that it is not essential. (b) Distribution of the number of accepted mutational steps over which the activator is essential for the whole period, on a log-log scale showing the power-law distribution $f(x) \sim x^{-2.15 \pm 0.02}$ at short times (bin size 20). Inset: same distribution on a semilogarithmic scale. (c) Probability of Activator i being essential as a function of its normalized PVR, defined as PVR Ai /(PVR A1 + PVR A2 + PVR A3). (d) For any pair of activators, the probability that Activator i leads Activator j as a function of their relative PVR.

In this Rapid Communication, we focused on oscillatory networks and introduced a sequence-based evolutionary scheme, in contrast to schemes where mutations are directly implemented by changes in rate constants (see, for example, [20]). We studied how function can become distributed over new nodes due to random network drift. For a three-node network, the typical timescale for the new node to become essential for oscillation is ~100 point accepted mutations, which, given the total of 150 sites, corresponds to around 66% accepted mutations [21]. Surprisingly, our model also revealed a much longer term memory (around 2000 point accepted mutations per 150 amino acids for a three-node system) with exponential decay, indicative of a barrier crossing process in the space of sequences.

We expect our model to be broadly useful for exploring the principles of protein network evolution. While simple and easy to implement, the model is biologically grounded in sequence-based evolution, and also physically grounded insofar as all proteins interact via binding with other proteins. In this approach, any component is allowed to interact with all other components and no specialized topology is introduced by hand. Moreover, there is no fine tuning and the degree of cooperativity utilized for the studies in this Rapid Communication is modest and easily achievable in practice by biochemical networks [22]. The model provides a natural framework to study the interplay between selection pressure and sequence-based designability and accessibility. It can moreover be readily extended to larger networks, networks with other functions, and also to other mutation-selection regimes (for example, the concurrent mutations regime expected for larger populations [23]).

We also believe our results for network drift will apply beyond the context of oscillators studied here. It has been suggested that protein networks evolve primarily by two biological mechanisms: (i) gene duplication and (ii) random mutations in proteins leading to neofunctionalization, that is, the de novo creation of new relationships with other proteins [24]. Our studies illustrate the significance of neofunctionalization in the context of functional networks where protein-protein interactions are physically grounded, i.e., described via quantitative interaction strengths rather than Boolean variables. Our discovery of hidden order in sequence space leading to evolutionary long-term memory could also be quite general, highlighting the strong constraints to network evolution that emerge from the topology of accessible sequence space. It will be interesting to see if the presence of “phases” generalizes to other network types. Future studies may profitably include the evolutionary dynamics of nodes, address other network functions (e.g., signal integration), and explore the role of graded selection in the de novo evolution of new functions.

Acknowledgments

We acknowledge helpful discussions with Yigal Meir and Ammar Tareen. The research was supported in part by DARPA Biochronicity program, Grant No. D12AP00025, National Science Foundation Grant No. PHY-1305525, and National Institutes of Health Grant No. R01 GM082938.

References

  • 1. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Molecular Biology of the Cell. Taylor & Francis; London: 2002.
  • 2. Alon U. An Introduction to Systems Biology: Design Principles of Biological Systems. Chapman and Hall; London: 2009.
  • 3. Maynard Smith J. The Theory of Evolution. Cambridge University Press; Cambridge: 1993.
  • 4. Eigen M. Naturwissenschaften. 1971;58:465.
  • 5. Eigen M, Schuster P. J Mol Evol. 1982;19:47.
  • 6. Bloom JD, Raval A, Wilke CO. Genetics. 2007;175:255.
  • 7. Serohijos AWR, Shakhnovich EI. Curr Opin Struct Biol. 2014;26:84.
  • 8. Zeldovich KB, Shakhnovich EI. Annu Rev Phys Chem. 2008;59:105.
  • 9. Goldbeter A. Biochemical Oscillations and Cellular Rhythms: The Molecular Bases of Periodic and Chaotic Behavior. Cambridge University Press; Cambridge: 1996.
  • 10. Ditty JL, Mackey SR, Johnson CH. Bacterial Circadian Programs. Springer; New York: 2009.
  • 11. Nakajima M, Imai K, Ito H, Nishiwaki T, Murayama Y, Iwasaki H, Oyama T, Kondo T. Science. 2005;308:414.
  • 12. See Supplemental Material at http://link.aps.org/supplemental/10.1103/PhysRevE.97.040401 for detailed chemical kinetics, further analysis, toy model, and robustness of the model.
  • 13. Heo M, Maslov S, Shakhnovich EI. Proc Natl Acad Sci USA. 2011;108:4258.
  • 14. Keskin O, Gursoy A, Ma B, Nussinov R. Chem Rev. 2008;108:1225.
  • 15. Johnson ME, Hummer G. J Phys Chem B. 2013;117:13098.
  • 16. Nooren IM, Thornton JM. EMBO J. 2003;22:3486.
  • 17. Moran PAP. Math Proc Cambridge Philos Soc. 1958;54:60.
  • 18. Nowak MA. Evolutionary Dynamics: Exploring the Equations of Life. Belknap Press; Cambridge, MA: 2006.
  • 19. Since we find that states where both activators are individually inessential are very rare (approximately 0.001% of the total number of oscillatory states), we ignore such states for the purposes of the figure.
  • 20. François P, Despierre N, Siggia ED. PLoS Comput Biol. 2012;8:e1002585.
  • 21. Pevsner J. Bioinformatics and Functional Genomics. 2nd ed. Wiley-Blackwell; Hoboken, NJ: 2009.
  • 22. Ferrell JE. Trends Biochem Sci. 1996;21:460.
  • 23. Desai MM, Fisher DS. Genetics. 2007;176:1759.
  • 24. Peterson GJ, Presse S, Peterson KS, Dill KA. PLoS One. 2012;7:e39052.
