Significance
Simple biophysical models successfully describe the bacterial regulatory code by predicting gene expression from DNA sequences that bind specialized regulatory proteins. Analogous simple models fail in multicellular organisms, where regulatory proteins bind DNA very transiently yet nevertheless exert precise control over gene expression. To date, the more general “nonequilibrium” models have proven difficult to analyze and connect to data. Here, we reduce this complexity theoretically by constructing simple nonequilibrium models that perform optimal gene regulation within known experimental constraints.
Keywords: transcriptional regulation, nonequilibrium models, noise in gene expression, enhancer function, Monod–Wyman–Changeux model
Abstract
In prokaryotes, thermodynamic models of gene regulation provide a highly quantitative mapping from promoter sequences to gene-expression levels that is compatible with in vivo and in vitro biophysical measurements. Such concordance has not been achieved for models of enhancer function in eukaryotes. In equilibrium models, it is difficult to reconcile the reported short transcription factor (TF) residence times on the DNA with the high specificity of regulation. In nonequilibrium models, progress is difficult due to an explosion in the number of parameters. Here, we navigate this complexity by looking for minimal nonequilibrium enhancer models that yield desired regulatory phenotypes: low TF residence time, high specificity, and tunable cooperativity. We find that a single extra parameter, interpretable as the “linking rate,” by which bound TFs interact with Mediator components, enables our models to escape equilibrium bounds and access optimal regulatory phenotypes, while remaining consistent with the reported phenomenology and simple enough to be inferred from upcoming experiments. We further find that high specificity in nonequilibrium models is in a trade-off with gene-expression noise, predicting bursty dynamics—an experimentally observed hallmark of eukaryotic transcription. By drastically reducing the vast parameter space of nonequilibrium enhancer models to a much smaller subspace that optimally realizes biological function, we deliver a rich class of models that could be tractably inferred from data in the near future.
An essential step in the control of eukaryotic gene expression is the interaction between transcription factors (TFs), various necessary cofactors, and TF binding sites (BSs) on the regulatory segments of DNA known as enhancers (1). While we are far from having either a complete parts list for this extraordinarily complex regulatory machine or an insight into the dynamical interactions between its components, experimental observations have established the following constraints on its operation. (i) TFs individually only recognize short, 6- to 10-bp-long BS motifs (2). (ii) TF residence times on the cognate BSs can be as short as a few seconds, much shorter than typical TF residence times on bacterial operators, and only two to three orders of magnitude longer than residence times on nonspecific DNA (3–5). (iii) The order of arrival of TFs to their BSs can affect gene activation (4). (iv) TFs do not activate transcription by RNA polymerase directly, but interact first with various coactivators, essential amongst which is the Mediator complex. (v) Binding of multiple TFs is typically required within the same enhancer for its activation (6), which can lead to very precise downstream gene expression only in the presence of a specific combination of TF concentrations (7). (vi) When activated, gene expression can be highly stochastic and bursty (8–10). (vii) Gene induction curves show varying degrees of steepness, suggesting tunable amounts of cooperativity among TFs (11). Here, we look for biophysical models of enhancer function consistent with these observations.
Mathematical modeling of gene regulation traces its origins to the paradigmatic examples of the bacteriophage switch (12) and the lac operon (13). In prokaryotes, biophysical models have proven very successful (14–16), assuming gene expression to be proportional to the fraction of time RNA polymerase is bound to the promoter in thermodynamic equilibrium; TFs modulate this fraction via steric or energetic interactions with the polymerase. Crucially, these models are very compact: They are fully specified by enumerating all bound configurations and energies of the TFs and the polymerase on the promoter. While some open questions remain (17–19), the thermodynamic framework has provided a quantitative explanation for combinatorial regulation, cooperativity, and regulation by DNA looping (20, 21), while remaining consistent with experiments that also probe the kinetic rates (22, 23).
No such consensus framework exists for eukaryotic transcriptional control. Limited specificity of individual TFs (i) is hard to reconcile with the high specificity of regulation (v) and the suppression of regulatory cross-talk (24), suggesting nonequilibrium kinetic-proofreading schemes (25). Likewise, short TF residence times (ii) and the importance of TF arrival ordering (iii) contradict the conceptual picture where stable enhanceosomes are assembled in equilibrium (4). Kinetic schemes may be required to match the reported characteristics of bursty gene expression (vi) (26) or realize high cooperativity (vii) (27). Thermodynamic models indisputably have statistical power to predict expression from regulatory sequence, even in eukaryotes (28), yet this does not resolve their biophysical inconsistencies or rule out nonequilibrium (“NEQ”) models. Unfortunately, mechanistically detailed NEQ models entail an explosion in the complexity of the corresponding reaction schemes and the number of associated parameters: On the one hand, such models are intractable to infer from data, while on the other, it is difficult to understand which details are essential for the emergence of regulatory function. As a result, existing models that have been confronted with data typically assume or detect nonequilibrium “state transitions” of a promoter without any reference to TF binding (29–32) or only include a phenomenological description of how TFs modulate state transition rates (33). To our knowledge, a class of nonequilibrium gene-expression models that accounts for the chemical kinetics of multiple TF–DNA interactions without losing control over complexity is still lacking.
To deal with the emergent complexity, we systematically simplify the space of enhancer models. We adopt the normative approach, commonly encountered in the applications of optimality ideas in neuroscience and elsewhere (34–36): We theoretically identify those models for which various performance measures of gene regulation, which we call “regulatory phenotypes,” are extremized. Such optimal model classes are our candidates that could subsequently be refined for particular biological systems and confronted with data. Thus, rather than inferring a single model from experimental data or constructing a complex, molecularly detailed model for some specific enhancer, we find the simplest generalizations of the classic equilibrium regulatory schemes to nonequilibrium processes, which drastically improves their regulatory performance, while leaving the models simple to analyze, simulate, and fit.
Results
Model.
Multiple lines of evidence suggest that eukaryotic transcription is a stochastic process which switches between active (ON) and inactive (OFF) states, with rates dependent on the TF concentrations (37–39). We sought to generalize classic regulatory schemes that can describe the balance between ON and OFF transcriptional states in equilibrium. In SI Appendix, section 1.3, we present a generalization of the classic “thermodynamic models” used mainly for prokaryotic gene regulation (20), which give rise to Hill-function-like induction curves in the limit of high TF cooperativity (40). Here, we focus on a generalization of a Monod–Wyman–Changeux (MWC) scheme (41) that has been proposed for eukaryotic gene regulation because it can naturally accommodate TF–nucleosome interactions (42), as well as regulation by different TF species, each binding to multiple BSs (43).
Fig. 1A shows a schematic of the proposed functional enhancer model (SI Appendix, section 1.1 and Fig. S1). A complex of transcriptional cofactors that we refer to as a “Mediator”* can interact with TFs that bind and unbind from their DNA BSs with baseline rates and (Fig. 1 B, i). Mediator—and thus the whole enhancer—can switch between its functional ON/OFF states with baseline rates and (Fig. 1 B, ii). Enhancer ON state and TF bound state are both stabilized (by a factor relative to baseline rates) when a bound TF establishes a “link” with the Mediator (Fig. 1 B, iii). The molecular identity of such links can remain unspecified: It could, for example, correspond to an enzymatic creation of chemical marks (e.g., methylation or phosphorylation) on the TFs or Mediator proteins, conditional on their physical proximity or interaction. Crucially, the links can be established and removed in processes that can break detailed balance and are thus out of equilibrium. Here, we consider that a link is established at a rate between a bound TF and the Mediator complex; for simplicity, we assume that the links break when the TFs dissociate or upon the switch into OFF state (this assumption can be relaxed; SI Appendix, Fig. S2).
Fig. 1.
A nonequilibrium MWC-like model of enhancer function. (A) Schematic representation of TFs (teal circles) interacting with BSs (here, orange slots) and the putative Mediator complex via links (red lines). The Mediator complex can be in two conformational states (OFF or ON), with the ON state enabling productive transcription of the regulated gene. Increasing TF concentration, , facilitates TF binding and the switch into the ON state (left to right). (B) Key reactions and rates of the NEQ model. (B, i) TFs can bind with a concentration-dependent on-rate () and unbind with a basal rate that is, in principle, sequence-dependent. (B, ii) The Mediator state switches between the conformational states with basal rates and . (B, iii) Linking and unlinking of TFs to Mediator can move the system out of equilibrium: Links are established with rate , and each link stabilizes both TF residence and the ON state of the Mediator by a factor per established link. (C) Regulatory phenotypes. Mean TF residence time, , on specific sites in functional enhancers (solid curves) vs. a random site on the DNA (dashed curves) increases with concentration (Top), as does mean expression, (the fraction of time the Mediator is ON; induction curve, Middle, with sensitivity, , defined as the slope of the induction curve at midpoint expression). Specificity, , is defined as the ratio of expression from the specific sites in the enhancer to the expression from a random piece of DNA.
An important thrust of our investigations will concern the role of limited specificity of individual TFs to recognize their cognate sequences on the DNA. If sequence specificity arises primarily through TF binding—a strong, but relatively unchallenged assumption (that can also be relaxed within our framework; SI Appendix, Fig. S3)—then we should ask how likely it is for the Mediator complex to form and activate at specific sites contained within functional enhancers (with low off-rates characteristic of strong eukaryotic TF BSs, ) vs. at random, nonspecific sites on the DNA (with orders-of-magnitude higher individual TF off-rates, ) from which expression should not occur.
Given the number of TF BSs () and the various rate parameters (), the full state of the system—i.e., the probability to observe any number of bound and/or linked TFs jointly with the ON/OFF state of the enhancer—evolves according to a Chemical Master Equation (SI Appendix, section 1.1) that can be solved exactly (44–46) or simulated by using the Stochastic Simulation Algorithm (47). Importantly, we show analytically that our scheme reduces to the true equilibrium MWC model in the limit : In this limit, there can be no distinction between a bound TF and a TF that is both bound and linked, and one can define a free energy that governs the probability of enhancer being ON, which in our model is equal to (a normalized) mean expression level, , with
| [1] |
where , (see also the Fig. 1 legend), and . The parameter thus interpolates between the equilibrium limit in Eq. 1, corresponding to a textbook MWC model, and various nonequilibrium (kinetic) schemes, which we will explore next. A similar generalization with an equilibrium limit exists for thermodynamic Hill-type models, where, furthermore, can be directly identified with cooperativity between DNA-bound TFs (SI Appendix, section 1.3); we will see that this qualitative role of will hold also for the MWC case.
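The equilibrium limit can be illustrated with the textbook two-state MWC form. The sketch below is not a reconstruction of Eq. 1: the symbols `K` (TF dissociation constant in the OFF state), `alpha` (stabilization of the ON state per bound TF), and `L` (OFF/ON weight ratio with no TF bound), as well as all numerical values, are assumptions chosen for illustration. It computes the probability that the enhancer is ON and the ratio of expression from specific vs. random sites.

```python
def mwc_expression(c, K, alpha, L, n):
    """Equilibrium probability that a two-state (ON/OFF) MWC enhancer is ON.

    c     : TF concentration
    K     : TF dissociation constant in the OFF state (assumed symbol)
    alpha : factor by which each bound TF favors the ON state (assumed symbol)
    L     : OFF/ON statistical-weight ratio with no TF bound (assumed symbol)
    n     : number of TF binding sites
    """
    off_weight = L * (1.0 + c / K) ** n
    on_weight = (1.0 + alpha * c / K) ** n
    return on_weight / (on_weight + off_weight)


def specificity(c, K_specific, K_random, alpha, L, n):
    """Ratio of expression from specific sites to expression from random DNA."""
    return (mwc_expression(c, K_specific, alpha, L, n)
            / mwc_expression(c, K_random, alpha, L, n))


if __name__ == "__main__":
    # Hypothetical parameters: random sites bind 1,000-fold more weakly.
    for c in (0.1, 1.0, 1e3, 1e6, 1e9):
        s = specificity(c, K_specific=1.0, K_random=1e3, alpha=100.0, L=1e4, n=3)
        print(f"c = {c:g}: specificity = {s:.3g}")
```

In this textbook form, both induction curves saturate at the same level once all sites are occupied, so the ratio returns to one at very high TF concentration.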
Regulatory Phenotypes.
How does the regulatory performance depend on the enhancer parameters and, in particular, on moving away from the equilibrium limit? To assess this question systematically, we defined a number of “regulatory phenotypes,” enumerated in Table 1 and illustrated in Fig. 1C. As a function of TF concentration, we computed: (i) individual TF residence time, , on specific sites in functional enhancers, as well as on random, nonspecific DNA, because these quantities have been experimentally reported in single-molecule experiments and provide strong constraints on enhancer function; (ii) average expression, , for functional enhancers as well as random, nonspecific DNA; we require to be in the middle () of the wide range reported for functional enhancers; (iii) sensitivity of the induction curve at half-maximal induction, , an observable quantity often interpreted as a signature of cooperativity in equilibrium (“EQ”) models; (iv) specificity, , as the ratio between expression from functional enhancers vs. from nonspecific DNA, which should be as high as possible to prevent deleterious cross-talk or uncontrolled expression (24); and (v) expression noise, , defined more precisely later, originating in stochastic enhancer ON/OFF switching.†
Table 1.
Regulatory phenotypes
Specificity, Residence Time, and Expression.
Fig. 2A explores the relationship between three regulatory phenotypes for a MWC-like enhancer scheme of Fig. 1A: the average TF residence time (), specificity (), and the average expression (), at fixed concentration of the TFs. Each point in this “phase diagram” corresponds to a particular enhancer model; points are accessible by varying and (Fig. 2B) and fall into a compact region that is bounded by intuitive, analytically derivable limits to specificity and the residence time. As tends to large values, approaches one, as it must: Once a TF–Mediator complex forms, large will ensure it never dissociates and expression will tend to one (see also Fig. 2D), irrespective of whether this occurred on a functional enhancer or a random piece of DNA—in this limit, all sequence-discrimination ability is lost, yielding undesirable regulatory phenotypes. In contrast, the EQ MWC limit as (Eq. 1) is functional and, interestingly, corresponds to a nonmonotonic curve in the phase diagram that lower-bounds the specificity of NEQ models accessible at finite values of .
Fig. 2.
Accessible space of regulatory phenotypes. (A) Specificity, , mean TF residence time, (expressed in units of the inverse off-rate for isolated TFs at their specific sites, ), and average expression, (color), for MWC-like models with TF BSs, obtained by varying and at fixed TF concentration, . EQ models fall onto the red line; two models with equal TF residence times, I (EQ) and II (NEQ), are marked for comparison. Dashed gray lines show analytically derived bounds. (B) Phase space of regulatory phenotypes is accessed by varying at fixed values of (grayscale; Upper) or varying at fixed values of (grayscale; Lower). (C) As in A, but the TF concentration at each point in the phase space is adjusted to hold average expression fixed at (green). Plotted is a smaller region of phase space of interest; nearly vertical thin lines are equi-concentration contours (SI Appendix, Fig. S6). Mean TF residence time, , depends on TF concentration and differs from the residence time on an isolated BS () because of TF–Mediator interactions. (D) All models in the phase diagrams in A and C collapse onto nearly one-dimensional manifolds (“fixed c,” left axis, for A; “fixed E,” right axis, for C) when plotted as a function of mean TF residence time, , supporting the choice of this variable as a biologically relevant observable (SI Appendix, Fig. S11). Color on the manifold corresponds to mean expression using the colormap of A. Vertical scales are chosen so that models I and II coincide. (E) Induction curves of EQ model I and NEQ model II for expression from a functional enhancer that contains specific sites (basal TF off-rate ; solid curves) vs. expression from random DNA containing nonspecific sites (basal TF off-rate here; dashed curves).
In a wide intermediate range of TF residence times, the full space of NEQ MWC-like models—which we can exhaustively explore—offers large, orders-of-magnitude improvements in specificity, essentially using a stochastic variant of Hopfield’s proofreading mechanism (25, 50). This observation is generic, even though the precise values of depend on parameters that we explore below, and always remains bounded from above by (in equilibrium, this is related to stochastic, thermal-fluctuation-driven Mediator transitions to the ON state, even in the absence of bound TFs). At the same average TF residence time and TF concentration, the best NEQ model (II in Fig. 2) will suppress expression from noncognate DNA by almost two orders of magnitude relative to the best EQ model (I). These findings remain qualitatively unchanged for enhancers with a larger number of BSs (SI Appendix, Fig. S4).
A comparison of various enhancer operating regimes is perhaps biologically more relevant at fixed mean expression, allowing the TF concentration to adjust accordingly under the cell’s own control, as shown in Fig. 2C for . As TF residence time lengthens with increasing , TFs and the Mediator establish more stable complexes on the DNA, and lower concentrations are needed for all models to reach the desired expression (see also Fig. 2D). Nevertheless, the ability of to increase the specificity in EQ models is limited and saturates at a value substantially below the specificity reachable in NEQ models at much smaller TF residence times. The observations of Fig. 2 A and C underscore an important, yet often overlooked, point: The ability to induce at low TF concentration (that is, high affinity) achieved through “cooperative interactions” at high has either a detrimental or, at best, a marginally beneficial effect on the ability to discriminate between cognate and random DNA sites (that is, high specificity) in equilibrium (24).
Fig. 2E shows induction curves for expression from functional enhancers containing specific sites and from random DNA sites, for EQ (I) and NEQ (II) models. Both yield essentially indistinguishable induction curves for expression from a functional enhancer (which is true generically across our phase diagram; SI Appendix, Fig. S5), suggesting that it would be difficult to discriminate between the models based on induction-curve measurements. In sharp contrast, the behavior of the two models is qualitatively different at nonspecific DNA: With sufficiently high TF concentration (e.g., in an overexpression experiment), the EQ model I will fully induce, even from random DNA, as its BSs get saturated by TFs; on the contrary, the NEQ model II will start inducing at much higher and will never do so fully due to its proofreading capability. Thus, given the relatively weak individual TF preference for cognate vs. noncognate DNA, one should not focus on measuring individual TF binding in search of signatures of EQ vs. NEQ proofreading; rather, the focus should be on measuring gene-expression activity, a behavior generated by collective binding of multiple TFs, from mutated or random enhancer sequences.
Sensitivity.
Intuitively, sensitivity measures the “steepness” of the induction curve. More precisely, is proportional to the logarithmic derivative of the expression with log concentration at the point of half-maximal expression, so that for Hill-like functions, , it corresponds exactly to the Hill coefficient, . Fig. 3A shows that increases monotonically with (and, thus, with ; cf. Fig. 2B), indicating that more stable TF–Mediator complexes indeed lead to higher apparent cooperativity, which is always upper-bounded by the number of TF BSs in the enhancer, . The highly cooperative “enhanceosome” concept (51) would, in our framework, correspond to an equilibrium limit with very high , and, thus, ; yet, the analysis above predicts vanishingly small specificity increases as this limit is approached. In contrast, we observe that the point at which the specificity advantage of NEQ models is maximized—i.e., where is largest—occurs far away from , at much lower values (SI Appendix, Fig. S8). If high specificity is biologically favored, we should, therefore, not expect the “number of known BSs” to equal the “measured Hill coefficient of the induction curve” for well-functioning eukaryotic transcriptional schemes, even on theoretical grounds.
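The definition of sensitivity as a logarithmic slope at half-maximal expression can be checked numerically. For a Hill function with coefficient h and maximum one, the logarithmic derivative of expression with respect to log concentration is h(1 − E), which equals h/2 at E = 0.5, so a prefactor of 2 recovers the Hill coefficient. The sketch below assumes an induction curve that increases monotonically and saturates at one; the function names, search bounds, and step size are illustrative choices, not taken from the paper.

```python
import math


def sensitivity(expression, c_lo=1e-6, c_hi=1e6, eps=1e-4):
    """Apparent Hill coefficient: 2 * d ln E / d ln c at the point where E = 0.5.

    `expression` must be an increasing function of concentration that
    crosses 0.5 between c_lo and c_hi (assumption of this sketch).
    """
    lo, hi = math.log(c_lo), math.log(c_hi)
    # Bisection in log-concentration for the half-maximal point.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expression(math.exp(mid)) < 0.5:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    # Central finite difference of ln E with respect to ln c.
    d_ln_e = (math.log(expression(math.exp(x + eps)))
              - math.log(expression(math.exp(x - eps))))
    return 2.0 * d_ln_e / (2.0 * eps)


def hill(c, h=3.0, K=2.0):
    """Reference Hill function with hypothetical coefficient h and midpoint K."""
    return c ** h / (c ** h + K ** h)


if __name__ == "__main__":
    print(sensitivity(hill))
```

Applied to any other induction curve, the same routine returns an apparent Hill coefficient that can be compared against the number of binding sites.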
Fig. 3.
Limits to sensitivity and specificity. (A) Sensitivity (apparent Hill coefficient) of enhancer models in the phase diagram of Fig. 2C, at fixed mean expression, . All models approximately collapse onto the manifolds shown for different numbers of TF BSs, , as (SI Appendix, Fig. S11). (B) Phase diagram of enhancer models for three different values of mean expression, (columns), shows specificity and fraction of variance in enhancer switching propagated to expression noise (see Noise). Compact blue region for each shows all MWC-like models with BSs accessible by varying and with specificity higher than that of the EQ model with lowest noise (red dot). Increase in noise is monotonically related to increase in enhancer correlation time, , marked with dashed vertical lines. Largest specificity increases over EQ models occur at high and, thus, high noise (upper right corner of the blue region). (C) Maximal gain in enhancer specificity for NEQ vs. EQ models for different (legend as in A), as a function of the intrinsic specificity of individual TF BSs, . Expression is fixed to and mean TF residence time to . The typical value used in Fig. 2 and in A and B is shown by the vertical dashed line. (D) Same as in C, but with the comparison at fixed gene-expression noise, .
Noise.
Lastly, we turn our attention to gene-expression noise. All stochastic two-state models have a steady-state binomial variance of in the enhancer state, where is the probability of the enhancer to be ON. When ON, transcripts are made and subsequently translated into protein, which typically has a long lifetime, , on the order of at least a few hours. Random fluctuations in the enhancer state will cause random steady-state fluctuations in protein copy number around the average, ; these fluctuations can be quantified by noise, . While there can be other contributions to noise (e.g., birth–death fluctuations due to protein production and degradation), we focus here solely on the effects of ON/OFF switching, since only these effects depend on the enhancer architecture (35).
How is noise in gene expression, , related to the binomial variance, ? Based on simple noise-propagation arguments (52, 53), fractional variance in protein should be equal to fractional variance in enhancer state times the noise filtering that depends on the timescales of enhancer switching, , and protein lifetime, (here, we assume h), so that (see SI Appendix, section 1.5 for exact derivation). Thus, if enhancer switches much faster than the protein lifetime, , protein dynamics almost entirely averages out the enhancer-state fluctuations. Since all enhancer models have the same binomial variance, the gene-expression noise in various models will be entirely determined by the mean expression, , and the correlation time, , both of which we can compute analytically for any combination of enhancer-model parameters in the phase diagram of Fig. 2.
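The noise-propagation argument can be made concrete with a small sketch. It assumes the standard result for a two-state (telegraph) switch with a single exponential correlation time, where the filtering factor is τ_E/(τ_E + τ_P); the exact expression used in the paper is derived in SI Appendix, section 1.5 and is not reproduced here. The symbol names and numerical values below are illustrative assumptions.

```python
def noise_squared(E, tau_E, tau_P):
    """Fractional protein variance from enhancer ON/OFF switching alone.

    E     : probability that the enhancer is ON (mean normalized expression)
    tau_E : correlation time of enhancer-state fluctuations
    tau_P : protein lifetime (same time units as tau_E)

    Assumes a single-exponential enhancer autocorrelation, i.e. the
    textbook telegraph-model filter tau_E / (tau_E + tau_P).
    """
    binomial_variance = E * (1.0 - E)
    fractional_variance = binomial_variance / E ** 2   # equals (1 - E) / E
    filtering = tau_E / (tau_E + tau_P)
    return fractional_variance * filtering


if __name__ == "__main__":
    tau_P = 10 * 3600.0  # hypothetical protein lifetime of 10 h, in seconds
    for tau_E in (1.0, 60.0, 3600.0):
        print(f"tau_E = {tau_E:g} s: noise^2 = {noise_squared(0.5, tau_E, tau_P):.3g}")
```

Under this assumption, two models with the same mean expression differ in noise only through their correlation times, and fast switching relative to the protein lifetime is averaged out almost completely.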
Fig. 3B shows the phase diagram of accessible MWC-like regulatory phenotypes for the specificity (), mean expression (), and fraction of enhancer switching noise that propagates to gene expression, , found by varying and . As in Fig. 2, EQ models have the lowest specificity , but also the lowest correlation time and, thus, lowest noise, regardless of the average expression, . There exist NEQ models that achieve higher specificity at a small increase in noise, but the highest-specificity increases always come hand-in-hand with a substantial lengthening of the correlation times in enhancer-state fluctuations, and, thus, with the inevitable increase in noise.
To better elucidate the trade-offs and limits to specificity in NEQ vs. EQ models, we next explore how enhancer-specificity gains depend on the ability of individual TFs to discriminate cognate BSs from random DNA in Fig. 3C. If individual TFs permit very strong discrimination (; prokaryotic TF regime), NEQ models at fixed individual TF residence times, , do not offer appreciable specificity increases in the collective enhancer response; in contrast, for the range around typically reported for eukaryotic TFs, the specificity increase ranges from 10- to 1,000-fold, with the peak depending on the number of TF BSs, , as well as baseline Mediator specificity limit, (as this increases, the peak specificity gain is higher and moves toward lower ; SI Appendix, Fig. S9). If, instead of fixing , as we have done until now, we pick this ratio to maximize the specificity gain () and again explore the noise-specificity trade-off as in Fig. 3B, we find that the extreme specificity gains are only possible when correlation times, diverge (SI Appendix, Fig. S10), implying high noise. SI Appendix, Fig. S12 depicts the global trade-off between noise, specificity, and sensitivity in an alternative fashion.
These observations are summarized in Fig. 3D, showing the specificity gain of NEQ models relative to EQ models, if the comparison is made at fixed noise level rather than at fixed individual TF residence time, as in Fig. 3C. Specificity gains are limited to roughly 10-fold, even when, as we do here, we systematically search for the best NEQ models through the complete phase diagram in Fig. 2C. The specificity–noise trade-off thus appears unavoidable.
Experimentally Observable Signatures of Enhancer Function.
To illustrate how the proposed NEQ MWC-like scheme could function in practice, we simulated it explicitly and compared it to an EQ scheme with the same mean TF residence time in Fig. 4. The two enhancers, composed of TF BSs, responded to a simulated protocol where the TF concentration was first switched from a minimal value that drives essentially no expression to a high value giving rise to , and, after a long stationary period, the concentration was switched back to the low value. The comparative results we report below are representative and qualitatively hold also for other simulation-parameter choices.
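A stochastic simulation of this kind can be sketched with the Stochastic Simulation Algorithm. The code below is one plausible reading of the reactions in Fig. 1B and is not the authors' implementation. In particular, it assumes that links form only while the Mediator is ON, that a linked TF unbinds at the basal off-rate divided by `alpha`, and that the Mediator switches OFF at its basal rate divided by `alpha` per link. All parameter names and values are hypothetical.

```python
import random


def simulate_enhancer(n, c, k_on, k_off, kappa_on, kappa_off, k_link, alpha,
                      t_end, seed=0):
    """Gillespie simulation; returns the fraction of time the Mediator is ON."""
    rng = random.Random(seed)
    bound, linked, mediator_on = 0, 0, False  # `bound` counts unlinked bound TFs
    t, t_on = 0.0, 0.0
    while t < t_end:
        rates = [
            k_on * c * (n - bound - linked),       # 0: TF binds a free site
            k_off * bound,                         # 1: unlinked TF unbinds
            (k_off / alpha) * linked,              # 2: linked TF unbinds
            k_link * bound if mediator_on else 0,  # 3: bound TF links (ON only)
            # 4: Mediator switches; each link slows the OFF switch by alpha
            kappa_off / alpha ** linked if mediator_on else kappa_on,
        ]
        total = sum(rates)
        dt = min(rng.expovariate(total), t_end - t)
        if mediator_on:
            t_on += dt
        t += dt
        if t >= t_end:
            break
        # Pick the reaction with probability proportional to its rate.
        r = rng.random() * total
        event = len(rates) - 1
        for i, rate in enumerate(rates):
            r -= rate
            if r < 0:
                event = i
                break
        if event == 0:
            bound += 1
        elif event == 1:
            bound -= 1
        elif event == 2:
            linked -= 1                 # the link breaks when the TF dissociates
        elif event == 3:
            bound, linked = bound - 1, linked + 1
        elif mediator_on:
            mediator_on = False
            bound, linked = bound + linked, 0   # all links break on switching OFF
        else:
            mediator_on = True
    return t_on / t_end


if __name__ == "__main__":
    for c in (0.0, 0.1, 1.0):
        frac = simulate_enhancer(n=3, c=c, k_on=1.0, k_off=1.0, kappa_on=0.01,
                                 kappa_off=10.0, k_link=1.0, alpha=100.0,
                                 t_end=5000.0, seed=1)
        print(f"c = {c:g}: fraction of time ON = {frac:.3f}")
```

With no TF present, the sketch reduces to a plain two-state switch, which gives a simple sanity check: the ON fraction should approach `kappa_on / (kappa_on + kappa_off)`.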
Fig. 4.
High-specificity NEQ schemes predict bursty gene expression. (A) Stochastic simulation of an EQ and an NEQ enhancer model with TF BSs, responding to a TF concentration step (bottom-most panel). Average TF residence times are matched between EQ and NEQ models at , s, and both induction curves (scaled for half-maximal concentration) are nearly identical, with sensitivity . When TF concentration is high, expression is fixed at . Parameters for NEQ model: , , ; for EQ model: , . Rasters show the occupancy of TF BSs; the orange line above shows the enhancer ON/OFF state; the zoom-in for the EQ model is necessary due to its fast dynamics. (B) Regulatory phenotypes for EQ and NEQ models during steady-state epoch (gray in A). Specificity () and enhancer-state correlation time () are higher for the NEQ model; the Mediator mean ON residence time, , is the same between the models, but the probability density function reveals a long tail in the NEQ scheme and a nearly exponential distribution for the EQ scheme. The bottom two panels show the TF occupancy histogram during a high TF concentration interval, conditional on the enhancer being OFF or ON. (C) Transient behavior of the mean enhancer state (), mean protein number (; assuming deterministic production/degradation protein dynamics given enhancer state), and gene-expression noise () for the NEQ and EQ models, upon a TF concentration low-to-high switch (Left) and high-to-low switch (Right). Traces shown are computed as averages over 1,000 stochastic simulation replicates.
Fig. 4A shows the occupancy of the BSs and the functional ON/OFF state of the enhancer. Even though the two models share the same TF mean residence time and nearly indistinguishable induction curves (with ), their collective behaviors are markedly different: The EQ scheme appears to have significantly faster TF binding/unbinding as well as Mediator switching dynamics, whereas the NEQ scheme undergoes long, “bursty” periods of sustained enhancer activation and TF binding that are punctuated by OFF periods. If the typical residence time of an isolated TF on its specific site were s, an NEQ enhancer could stay active even for hour-long periods ( s), just somewhat shorter than the protein lifetime ( s). Such enhancer-associated stable mediator clusters are consistent with recent experimental reports (54, 55).
The detailed steady-state behavior at high TF concentration is analyzed in Fig. 4B. Consistent with our theoretical expectations, the NEQ scheme enables 10-fold higher specificity, but at the cost of substantial noise in gene expression (), due to strong transcriptional bursting. High noise is a direct consequence of the much longer correlation time of enhancer fluctuations, , for the NEQ scheme, seen in Fig. 4A. Interestingly, the mean residence time of the enhancer ON state, , is nearly unchanged between the EQ and NEQ scheme at s, but, here, the mean turns out to be a highly misleading statistic, as revealed by an in-depth exploration of the full probability density function. The NEQ scheme has a long tail of extended ON events interspersed with an excess of extremely short OFF events (due to the high rate necessary for high specificity) relative to the EQ scheme (which, itself, does not deviate strongly from an exponential density function with a matched mean). The behavior of such an enhancer is highly cooperative, even though the sensitivity () is not maximal: When the enhancer is ON, with very high probability, all TFs are bound, and when OFF, often four out of five TFs are bound, yet the enhancer is not activated. In sum, a well-functioning NEQ regulatory apparatus with its Mediator complex makes many short-lived attempts to switch ON, but only commits to a long, productive ON interval rarely and collectively, after ensuring that activation is happening due to a sequence of valid molecular recognition events between several TFs and their cognate BSs in a functional enhancer.
Transient behavior after a TF concentration change is analyzed in Fig. 4C. The mean response time of the two models to the concentration change is governed by the correlation time of the enhancer state, , and is, thus, much slower for NEQ vs. EQ models, but since the protein lifetime is even longer, the mean protein levels adjust equally quickly in the EQ and NEQ cases. This suggests that the dynamics of the mean protein level is unlikely to discriminate between EQ and NEQ models. In contrast, live imaging of the nascent messenger RNA (mRNA) could put constraints on (1). In that case, the filtering time scale is the elongation time, typically on the order of a few minutes, while the reported transcriptional response times—and, thus, estimates of —would range from minutes to 1 to 2 h (9, 26).
Steady-state noise levels at high induction, as reported already, are considerably higher for the NEQ model due to transcriptional bursting; an intriguing further suggestion of our analyses is a long transient in the noise levels upon a high-to-low TF concentration switch, which finally settles to a high fractional noise level, even at very low induction, due to sporadic transcriptional bursts.
Discussion
In this paper, we took a normative approach to address the complexity of eukaryotic gene-regulatory schemes. We proposed a minimal extension to a well-known MWC model that can be applied to the switching between the active and inactive states of an enhancer. The one-parameter extension is kinetic and accesses NEQ system behaviors. We analyzed the parameter space of the resulting model and visualized the phase diagram of “regulatory phenotypes,” quantities that are either experimentally constrained (such as mean expression, mean TF residence time, or sensitivity), are likely to be optimized by evolutionary pressures (such as noise and specificity), or both. This allowed us to recognize and understand biophysical limits and trade-offs and to identify the optimal operating regime of the proposed enhancer model that is consistent with current observations, as we summarize next.
Our analyses suggest the following. (i) Individual TFs are limited in their ability to discriminate specific from random sites, so high specificity must be a collective enhancer effect in the proofreading regime. (ii) Mean TF residence times in an enhancer are not much higher than the typical TF residence time at an isolated specific site, enabling rapid turnover of bound TFs on the 1- to 10-s timescale. (iii) Typical sensitivities are much lower than the total number of TF BSs, yielding a reasonable specificity/noise balance (SI Appendix, Figs. S7 and S8). (iv) Mediator basal rates should favor rapid deactivation; i.e., the Mediator switches OFF essentially instantaneously if not stabilized by linked TFs. (v) TF concentrations required to activate the enhancer in this regime are substantially higher than expected for the equivalent, but highly cooperative, enhanceosome. (vi) Optimal NEQ models achieve order-of-magnitude improvements in specificity relative to matched EQ models, thereby avoiding cross-talk and spurious gene expression, by suppressing induction from noncognate (random) DNA, while induction curves from functional enhancers bear no clear signatures of NEQ operation. (vii) To permit large increases in specificity, enhancer-state fluctuations will develop long-timescale correlations (but still be bounded by the protein lifetime, to enable noise averaging), leading to substantial observed noise levels. (viii) The enhancer ON residence-time distribution will be nonexponential, with excess probability for very long-lived events, during which an enhancer could trigger a transcriptional burst following an interaction with the promoter. (ix) In our model, a long correlation time in steady state also implies long (minutes to hours) response times when TF concentration changes, which would be observable with live imaging on the transcriptional, but likely not protein-concentration, level.
We find it intriguing that a single-parameter extension of a classic EQ model led to such richness of observed behaviors and to a suggestion that the optimal operating regime is very different from regulation at equilibrium. Central to this qualitative change is the fact that long fluctuation and response timescales of enhancer activation appear necessary to achieve high specificity of regulation through proofreading. Such long timescales are not inconsistent with our current knowledge. Indeed, some developmental enhancers form active clusters (superenhancers) that are rather long-lived (on the order of minutes to hours), perhaps precisely because developmental events need to be guided with extraordinary precision (55, 56).
A key open question concerns the universality of the trade-off between noise and specificity. Can this trade-off be mitigated or avoided in more complex enhancer models? We have numerically explored a generalization of our setup where, upon establishing the TF–Mediator link, the TF residence time is stabilized by a factor that is different from the stabilization factor for the Mediator (SI Appendix, Figs. S13 and S14). We observe that the noise/specificity trade-off can be alleviated, yet not removed, in an optimal regime of operation. Understanding analytically how such trade-offs emerge and the conditions under which they apply is a future theoretical challenge.
An experimental test of our model should proceed in two stages. The first stage is to qualitatively demonstrate the increased specificity in eukaryotic regulation due to kinetic proofreading (25). To that end, the most promising avenue is suggested by Fig. 2E: confront the regulatory apparatus with synthetically mutated TF BSs, while increasing the concentration of the implicated TFs. In EQ models, mismatches in the binding sequence can always be compensated for by a higher TF concentration, thereby maintaining high-output gene expression; in proofreading models, in contrast, expression from “mismatched” regulatory sequences can be suppressed independently of the TF concentration. Only if this first test is passed could our model be further tested quantitatively. This would involve inferring its parameters (including BS dissociation rates) from stationary transcriptional time series and predicting gene-expression response (e.g., mean transcriptional activation and its temporal autocorrelation function) upon a change in the TF concentration, as in Fig. 4. Established statistical methodology, including Bayesian model selection, could be used to compare the suggested model against equilibrium schemes or rigorously select between alternative nonequilibrium formulations. As a complement to statistical evidence, we cannot as of yet suggest a single “smoking gun” experimental test to unambiguously rule in or out kinetic schemes based on a finite rate of link establishment between bound TFs and the Mediator that we propose here.
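The qualitative contrast behind the first-stage test can be sketched with two toy single-site models. The equilibrium case uses the standard occupancy c/(c + K_d). The nonequilibrium case is a hypothetical three-state cycle introduced only for illustration (empty, TF bound, linked/ON, with the ON state lost when the linked TF unbinds); it is not the MWC-based enhancer model analyzed in this paper, and all rate names and values are assumptions.

```python
# Hypothetical rates (arbitrary units); not taken from the paper.
K_ON = 1.0             # TF binding rate per unit concentration
K_LINK = 1.0           # rate at which a bound TF establishes a link
K_OFF_COGNATE = 1.0    # TF unbinding rate at a cognate site
K_OFF_MISMATCH = 100.0 # TF unbinding rate at a mismatched site


def eq_occupancy(c, k_on, k_off):
    """Equilibrium occupancy of a single site, with K_d = k_off / k_on."""
    k_d = k_off / k_on
    return c / (c + k_d)


def neq_on_probability(c, k_on, k_off, k_link, alpha=1.0):
    """Steady-state ON probability of a toy one-way three-state cycle.

    States: 0 = empty, 1 = TF bound, 2 = linked (ON).
    Rates: 0->1 at k_on*c, 1->0 at k_off, 1->2 at k_link,
    2->0 at k_off/alpha (the linked TF unbinds and the ON state is lost).
    The cycle is irreversible, so detailed balance is broken by construction.
    """
    a = k_on * c
    r = k_off / alpha
    return k_link * a / (r * (k_off + k_link) + a * r + k_link * a)


if __name__ == "__main__":
    for c in (1.0, 1e2, 1e4, 1e6):
        eq_mis = eq_occupancy(c, K_ON, K_OFF_MISMATCH)
        neq_cog = neq_on_probability(c, K_ON, K_OFF_COGNATE, K_LINK)
        neq_mis = neq_on_probability(c, K_ON, K_OFF_MISMATCH, K_LINK)
        print(f"c = {c:9.0f}  EQ mismatch: {eq_mis:.4f}  "
              f"NEQ cognate: {neq_cog:.4f}  NEQ mismatch: {neq_mis:.4f}")
```

In this toy setting, the equilibrium occupancy of the mismatched site approaches 1 at high concentration, whereas the nonequilibrium ON probability saturates at k_link / (k_link + k_off/alpha), which stays small for the mismatched site however high the concentration.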
A strong objection to our model could be that it is too simple: After all, we neglected many structural and molecular details, many of which we may not even know yet. This is certainly true and was done, in part, on purpose, to permit exhaustive analysis across the complete parameter space. Such understanding would have been impossible if we explored much richer models or were concerned with quantitative fitting to a particular dataset. These are clearly the next steps, to which we contribute by highlighting the functional importance of breaking the equilibrium link between TF binding and enhancer activation state. Since our model is fully probabilistic, specializing it for a particular experimental setup—e.g., live transcriptional imaging—and doing rigorous inference is technically tractable, but beyond the scope of this paper.
Perhaps a key simplification of our model concerns the link between the enhancer/Mediator ON state and transcriptional activity. We assumed that expression is proportional to the probability of the enhancer state to be ON, yet the enhancer–promoter interaction itself is a matter of vibrant current experimentation and modeling (10, 54, 57–59). For example, long-lived activated enhancers that we predict could interact with promoters only intermittently to trigger transcriptional bursts, as suggested by the “dynamic kissing model” (55), which could substantially impact the experimentally observable quantitative noise signatures of enhancer function at the transcriptional level. Whatever the true nature of enhancer–promoter interactions might be, however, they are unlikely to be able to remove excess enhancer switching noise, due to its slow timescale, suggesting that the trade-offs that we identify should hold generically.
One could also question whether the importance we ascribed to high specificity is really warranted. Evolutionarily, regulatory cross-talk due to lower specificity helps networks evolve during transient bouts of adaptation, even though it could be ultimately selected against (60). Mechanistically, molecular processes, such as chromatin modification or the regulated three-dimensional structure of DNA, decrease the number of possible noncognate targets that could trigger erroneous gene expression (61, 62) and, thus, alleviate the need for high specificity of transcriptional control. Empirically, there is ample evidence for abortive or nonsensical transcriptional activity (63, 64), whose products could be dealt with downstream or simply ignored by the cell. Yet it is also clear that regulatory specificity must be a collective effect, as individual TFs bind pervasively across DNA, even in nonregulatory regions (65), and self-consistent arguments suggest that, in the absence of nonequilibrium mechanisms, cross-talk could be overwhelming in eukaryotes (24). It is also possible that real enhancers are very diverse, with large variation along the specificity axis, thereby navigating the noise–specificity trade-off as appropriate, given the biological context. Where some erroneous induction can be tolerated, expression could be quicker, less noisy, and closer to equilibrium. In contrast, where tight control is needed, enhancers could take a substantial amount of time to commit to expression correctly, perhaps benefitting additionally from extra time-averaging that could further reduce the Berg–Purcell-type noise intrinsic to TF concentration sensing (53, 66–68).
Supplementary Material
Acknowledgments
G.T. was supported by Human Frontiers Science Program Grant RGP0034/2018. R.G. was supported by the Austrian Academy of Sciences DOC Fellowship. R.G. thanks S. Avvakumov for helpful discussions.
Footnotes
The authors declare no competing interest.
This article is a PNAS Direct Submission.
*Our nomenclature is simply a shorthand for all co-factors necessary for eukaryotic transcriptional activation at an enhancer, which can include proteins not strictly a part of the Mediator family.
†Protein noise levels in Table 1 are estimated from reported mRNA noise levels.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2006731117/-/DCSupplemental.
Data Availability.
There are no data underlying this work.
References
- 1. Coulon A., Chow C. C., Singer R. H., Larson D. R., Eukaryotic transcriptional dynamics: From single molecules to cell populations. Nat. Rev. Genet. 14, 572–584 (2013).
- 2. Wunderlich Z., Mirny L. A., Different gene regulation strategies revealed by analysis of binding motifs. Trends Genet. 25, 434–440 (2009).
- 3. Gebhardt C. M., et al., Single-molecule imaging of transcription factor binding to DNA in live mammalian cells. Nat. Methods 10, 421–426 (2013).
- 4. Chen J., et al., Single-molecule dynamics of enhanceosome assembly in embryonic stem cells. Cell 156, 1274–1285 (2014).
- 5. Thomas C., et al., Hit and run versus long-term activation of PARP-1 by its different domains fine-tunes nuclear processes. Proc. Natl. Acad. Sci. U.S.A. 116, 9941–9946 (2019).
- 6. Shlyueva D., Stampfel G., Stark A., Transcriptional enhancers: From properties to genome-wide predictions. Nat. Rev. Genet. 15, 272–286 (2014).
- 7. Petkova M. D., Tkačik G., Bialek W., Wieschaus E. F., Gregor T., Optimal decoding of cellular identities in a genetic network. Cell 176, 844–855.e15 (2019).
- 8. Nicolas D., Zoller B., Suter D. M., Naef F., Modulation of transcriptional burst frequency by histone acetylation. Proc. Natl. Acad. Sci. U.S.A. 115, 7153–7158 (2018).
- 9. Molina N., et al., Stimulus-induced modulation of transcriptional bursting in a single mammalian gene. Proc. Natl. Acad. Sci. U.S.A. 110, 20563–20568 (2013).
- 10. Bartman C. R., Hsu S. C., Hsiung C. C.-S., Raj A., Blobel G. A., Enhancer regulation of transcriptional bursting parameters revealed by forced chromatin looping. Mol. Cell 62, 237–247 (2016).
- 11. Park J., et al., Dissecting the sharp response of a canonical developmental enhancer reveals multiple sources of cooperativity. eLife 8, e41266 (2019).
- 12. Ptashne M., A Genetic Switch: Gene Control and Phage λ (Cell Press, Cambridge, MA, 1986).
- 13. Kuhlman T., Zhang Z., Saier M. H., Hwa T., Combinatorial transcriptional control of the lactose operon of Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 104, 6043–6048 (2007).
- 14. Berg O. G., von Hippel P. H., Selection of DNA binding sites by regulatory proteins: Statistical-mechanical theory and application to operators and promoters. J. Mol. Biol. 193, 723–743 (1987).
- 15. Kinney J. B., Murugan A., Callan C. G., Cox E. C., Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl. Acad. Sci. U.S.A. 107, 9158–9163 (2010).
- 16. Belliveau N. M., Kinney J. B., Phillips R., Systematic approach for dissecting the molecular mechanisms of transcriptional regulation in bacteria. Proc. Natl. Acad. Sci. U.S.A. 115, E4796–E4805 (2018).
- 17. Garcia H. G., et al., Operator sequence alters gene expression independently of transcription factor occupancy in bacteria. Cell Rep. 2, 150–161 (2012).
- 18. Hammar P., et al., Direct measurement of transcription factor dissociation excludes a simple operator occupancy model for gene regulation. Nat. Genet. 46, 405–408 (2014).
- 19. Forcier T. L., et al., Measuring cis-regulatory energetics in living cells using allelic manifolds. eLife 7, e40618 (2018).
- 20. Bintu L., et al., Transcriptional regulation by the numbers: Applications. Curr. Opin. Genet. Dev. 15, 125–135 (2005).
- 21. Bintu L., et al., Transcriptional regulation by the numbers: Models. Curr. Opin. Genet. Dev. 15, 116–124 (2005).
- 22. Maerkl S. J., Quake S. R., A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237 (2007).
- 23. Jones D. L., Brewster R. C., Phillips R., Promoter architecture dictates cell-to-cell variability in gene expression. Science 346, 1533–1536 (2014).
- 24. Friedlander T., et al., Intrinsic limits to gene regulation by global crosstalk. Nat. Commun. 7, 12307 (2016).
- 25. Cepeda-Humerez S. A., Rieckh G., Tkačik G., Stochastic proofreading mechanism alleviates crosstalk in transcriptional regulation. Phys. Rev. Lett. 115, 248101 (2015).
- 26. Donovan B. T., et al., Live-cell imaging reveals the interplay between transcription factors, nucleosomes, and bursting. EMBO J. 38, e100809 (2019).
- 27. Estrada J., Wong F., DePace A., Gunawardena J., Information integration and energy expenditure in gene regulation. Cell 166, 234–244 (2016).
- 28. Gertz J., Siggia E. D., Cohen B. A., Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457, 215–218 (2009).
- 29. Blake W. J., Kærn M., Cantor C. R., Collins J. J., Noise in eukaryotic gene expression. Nature 422, 633–637 (2003).
- 30. Suter D. M., et al., Mammalian genes are transcribed with widely different bursting kinetics. Science 332, 472–474 (2011).
- 31. Zoller B., Nicolas D., Molina N., Naef F., Structure of silent transcription intervals and noise characteristics of mammalian genes. Mol. Syst. Biol. 11, 823 (2015).
- 32. Dunham L. S. S., et al., Asymmetry between activation and deactivation during a transcriptional pulse. Cell Syst. 5, 646–653 (2017).
- 33. Li C., Cesbron F., Oehler M., Brunner M., Höfer T., Frequency modulation of transcriptional bursting enables sensitive and rapid gene regulation. Cell Syst. 6, 409–423 (2018).
- 34. Tkačik G., Walczak A. M., Information transmission in genetic regulatory networks: A review. J. Phys. Condens. Matter 23, 153102 (2011).
- 35. Rieckh G., Tkačik G., Noise and information transmission in promoters with multiple internal states. Biophys. J. 106, 1194–1204 (2014).
- 36. Tkačik G., Bialek W., Information processing in living systems. Annu. Rev. Condens. Matter Phys. 7, 89–117 (2016).
- 37. Larson D. R., et al., Direct observation of frequency modulated transcription in single cells using light activation. eLife 2, e00750 (2013).
- 38. Senecal A., et al., Transcription factors modulate c-Fos transcriptional bursts. Cell Rep. 8, 75–83 (2014).
- 39. Zoller B., Little S. C., Gregor T., Diverse spatial expression patterns emerge from unified kinetics of transcriptional bursting. Cell 175, 835–847.e25 (2018).
- 40. Phillips R., Theriot J., Kondev J., Garcia H., Physical Biology of the Cell (Garland Science, New York, NY, 2012).
- 41. Changeux J.-P., Allostery and the Monod-Wyman-Changeux model after 50 years. Annu. Rev. Biophys. 41, 103–133 (2012).
- 42. Mirny L. A., Nucleosome-mediated cooperativity between transcription factors. Proc. Natl. Acad. Sci. U.S.A. 107, 22534–22539 (2010).
- 43. Walczak A. M., Tkačik G., Bialek W., Optimizing information flow in small genetic networks. II. Feed-forward interactions. Phys. Rev. E 81, 041905 (2010).
- 44. Sanchez A., Kondev J., Transcriptional control of noise in gene expression. Proc. Natl. Acad. Sci. U.S.A. 105, 5081–5086 (2008).
- 45. Lestas I., Paulsson J., Ross N. E., Vinnicombe G., Noise in gene regulatory networks. IEEE Trans. Automat. Contr. 53, 189–200 (2008).
- 46. Walczak A. M., Mugler A., Wiggins C. H., Analytic methods for modeling stochastic regulatory networks. Methods Mol. Biol. 880, 273–322 (2012).
- 47. Gillespie D. T., Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 58, 35–55 (2007).
- 48. Morisaki T., Müller W. G., Golob N., Mazza D., McNally J. G., Single-molecule analysis of transcription factor binding at transcription sites in live cells. Nat. Commun. 5, 4456 (2014).
- 49. Zenklusen D., Larson D. R., Singer R. H., Single-RNA counting reveals alternative modes of gene expression in yeast. Nat. Struct. Mol. Biol. 15, 1263–1271 (2008).
- 50. Hopfield J. J., Kinetic proofreading: A new mechanism for reducing errors in biosynthetic processes requiring high specificity. Proc. Natl. Acad. Sci. U.S.A. 71, 4135–4139 (1974).
- 51. Arnosti D. N., Kulkarni M. M., Transcriptional enhancers: Intelligent enhanceosomes or flexible billboards? J. Cell. Biochem. 94, 890–898 (2005).
- 52. Paulsson J., Summing up the noise in gene networks. Nature 427, 415–418 (2004).
- 53. Tkačik G., Gregor T., Bialek W., The role of input noise in transcriptional regulation. PloS One 3, e2774 (2008).
- 54. Chen H., et al., Dynamic interplay between enhancer–promoter topology and gene activity. Nat. Genet. 50, 1296–1303 (2018).
- 55. Cho W.-K., et al., Mediator and RNA polymerase II clusters associate in transcription-dependent condensates. Science 361, 412–415 (2018).
- 56. Sabari B. R., et al., Coactivator condensation at super-enhancers links phase separation and gene control. Science 361, eaar3958 (2018).
- 57. Ren G., et al., CTCF-mediated enhancer-promoter interaction is a critical regulator of cell-to-cell variation of gene expression. Mol. Cell 67, 1049–1058.e6 (2017).
- 58. Hnisz D., Krishna S., Young R. A., Chakraborty A. K., Sharp P. A., A phase separation model for transcriptional control. Cell 169, 13–23 (2017).
- 59. Bialek W., Gregor T., Tkačik G., Action at a distance in transcriptional regulation. arXiv:1912.08579 (18 December 2019).
- 60. Friedlander T., Prizak R., Barton N. H., Tkačik G., Evolution of new regulatory functions on biophysically realistic fitness landscapes. Nat. Commun. 8, 216 (2017).
- 61. Adam R. C., et al., Pioneer factors govern super-enhancer dynamics in stem cell plasticity and lineage choice. Nature 521, 366–370 (2015).
- 62. Klemm S. L., Zohar S., Greenleaf W. J., Chromatin accessibility and the regulatory epigenome. Nat. Rev. Genet. 20, 207–220 (2019).
- 63. Struhl K., Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat. Struct. Mol. Biol. 14, 103–105 (2007).
- 64. Ehrensberger A. H., Kelly G. P., Svejstrup J. Q., Mechanistic interpretation of promoter-proximal peaks and RNAPII density maps. Cell 154, 713–715 (2013).
- 65. Biggin M. D., Animal transcription networks as highly connected, quantitative continua. Dev. Cell 21, 611–626 (2011).
- 66. Berg H. C., Purcell E. M., Physics of chemoreception. Biophys. J. 20, 193–219 (1977).
- 67. Bialek W., Setayeshgar S., Physical limits to biochemical signaling. Proc. Natl. Acad. Sci. U.S.A. 102, 10040–10045 (2005).
- 68. Kaizu K., et al., The Berg-Purcell limit revisited. Biophys. J. 106, 976–985 (2014).