Author manuscript; available in PMC: 2016 Apr 17.
Published in final edited form as: Phys Rev Lett. 2016 Feb 1;116(5):058101. doi: 10.1103/PhysRevLett.116.058101

Constraints on Fluctuations in Sparsely Characterized Biological Systems

Andreas Hilfinger 1, Thomas M Norman 1, Glenn Vinnicombe 2, Johan Paulsson 1
PMCID: PMC4834202  NIHMSID: NIHMS775095  PMID: 26894735

Abstract

Biochemical processes are inherently stochastic, creating molecular fluctuations in otherwise identical cells. Such “noise” is widespread but has proven difficult to analyze because most systems are sparsely characterized at the single cell level and because nonlinear stochastic models are analytically intractable. Here, we exactly relate average abundances, lifetimes, step sizes, and covariances for any pair of components in complex stochastic reaction systems even when the dynamics of other components are left unspecified. Using basic mathematical inequalities, we then establish bounds for whole classes of systems. These bounds highlight fundamental trade-offs that show how efficient assembly processes must invariably exhibit large fluctuations in subunit levels and how eliminating fluctuations in one cellular component requires creating heterogeneity in another.


Processes that create nongenetic heterogeneity are ubiquitous in cells [1–4]. They are typically analyzed by simulating stochastic models for assumed interactions and parameters or by deriving intuitive results for approximate toy models. However, important properties are often unknown, broad principles are hard to extrapolate from examples, and many different stochastic models can fit the same data. Engineering and physics have met similar challenges by deriving results for families of models in terms of quantities that are more easily interpreted or measured [5,6]. To be broadly applicable in biology, such generalized analytical approaches would need to account for inherently stochastic processes far from thermodynamic equilibrium and allow for nonlinear reaction rates of adding or removing individual components in discrete steps or bursts, without linearizations or Gaussian approximations. They should also be formulated in terms of properties that have clear physical definitions or can be estimated experimentally and, most importantly, be able to make strong statements about sparsely characterized reaction networks without ignoring or guessing the unknown parts.

This may seem impossible, and indeed it is if the goal is to obtain closed-form expressions capturing a system's behavior: most nonlinear stochastic models are analytically intractable, and the question of how a system behaves is not even well posed unless all parts are specified. However, it is possible to take this approach to determine bounds on behavior for classes of systems that share some specified parts but differ arbitrarily in any other parts. Specifically, though each network component is affected by every other indirectly connected component, the differential equations for averages and variances only directly depend on how the corresponding component is made and degraded. Those equations can be combined with basic statistical inequalities to derive general bounds, which, in turn, can be exactly expressed in terms of physical observables that can be experimentally identified without knowing the microscopic details of the system. If the bounds are achievable, this combination of simple mathematical principles identifies broad rules for what is possible in cells. A philosophically similar but mathematically different approach has been used to study the variation in reaction times for single substrate molecules, considering complex transitions between intermediate molecular states [7–9], while we consider complex control networks of interacting components.

General fluctuation constraints in terms of physical observables

We consider the general discrete stochastic process with state vector x = (x0, …, xn) undergoing reactions

$$x \xrightarrow{\,r_k(x)\,} x + d_k, \qquad k = 1, 2, \ldots \tag{1}$$

that change the level of component Xi (lowercase xi denotes its abundance) by a discrete jump of size dik, with a rate rk(x) that can depend on the state of the entire system. The evolution equations for the statistical moments can then be summarized by collecting birth reactions with positive dik and death reactions with negative dik for each component Xi to define total birth ($R_i^{+}$) and death ($R_i^{-}$) fluxes as $R_i^{\pm} = \sum_k r_k |d_{ik}|$, with the sum running over reactions with dik > 0 and dik < 0, respectively. Specifically, the average abundances 〈xi〉 and the (co)variance matrix C with entries Cij = 〈xixj〉 − 〈xi〉 〈xj〉 are described by

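$$\begin{aligned}
\frac{d\langle x_i\rangle}{dt} &= \langle R_i^{+}\rangle - \langle R_i^{-}\rangle =: \langle \Delta R_i\rangle, \\
\frac{dC_{ij}}{dt} &= \mathrm{Cov}(x_i, \Delta R_j) + \mathrm{Cov}(x_j, \Delta R_i) + \sum_k d_{ik} d_{jk} \langle r_k\rangle.
\end{aligned} \tag{2}$$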

These are the usual moment equations for stochastic reaction systems [10]. We will consider them element by element for a small subset of specified variables (without loss of generality denoted X0, …, Xm). The unspecified variables (Xm+1, Xm+2…) are then trivially allowed any network topology and dynamics, and nonlinear feedback loops are allowed to connect the specified and unspecified parts. In fact, the unspecified variables may include nonphysical “mock variables”, and because we allow an unbounded number of such variables, there is virtually no limit to the complexity of features that can be realized by the unspecified parts, including history dependent dynamics. The results below thus also apply to a wide range of non-Markovian systems that technically are not described by a chemical master equation (see Supplemental Material [11]).
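As a minimal illustration of the process in Eq. (1) and of the first moment equation, the sketch below runs a standard Gillespie-type simulation and time-averages the birth and death fluxes, which should balance at stationarity. The function name `gillespie_fluxes`, the layout of the stoichiometry list, and the example reaction system (bursts of size 2 made at a constant rate, first-order degradation) are assumptions made for this example and do not come from the Letter.

```python
import random

def gillespie_fluxes(x_init, rate_fn, d, t_end, seed=0):
    """Simulate x -> x + d[k] at rate rate_fn(x)[k]; return time-averaged fluxes.

    d[k][i] is the jump of component i in reaction k (hypothetical layout).
    Assumes the total rate is always positive.
    """
    rng = random.Random(seed)
    x = list(x_init)
    n = len(x)
    t = 0.0
    birth = [0.0] * n  # time integrals of R_i^+
    death = [0.0] * n  # time integrals of R_i^-
    while True:
        rates = rate_fn(x)
        dt = rng.expovariate(sum(rates))
        last = t + dt >= t_end
        if last:
            dt = t_end - t
        # Accumulate fluxes over the waiting time spent in the current state.
        for k, r in enumerate(rates):
            for i in range(n):
                if d[k][i] > 0:
                    birth[i] += r * d[k][i] * dt
                elif d[k][i] < 0:
                    death[i] -= r * d[k][i] * dt
        if last:
            break
        t += dt
        # Pick the reaction that fires, with probability proportional to its rate.
        k = rng.choices(range(len(rates)), weights=rates)[0]
        for i in range(n):
            x[i] += d[k][i]
    return [b / t_end for b in birth], [m / t_end for m in death]

# Example: bursts of size 2 at constant rate 10, degradation at rate 0.5 * x.
stoich = [[+2], [-1]]
birth_flux, death_flux = gillespie_fluxes(
    [0], lambda x: [10.0, 0.5 * x[0]], stoich, t_end=4000.0
)
```

For a long enough run the two returned fluxes agree up to sampling error, which is the stationary form of the first line of Eq. (2).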

As in most previous studies, we restrict our analyses to systems where the specified variables are wide-sense stationary, which can be experimentally verified and does not exclude oscillatory or multistable systems that eventually decorrelate in the population-level statistics. The results also apply to the time averages of nonstationary systems with coordinated behavior that does not decorrelate (see Supplemental Material [11]).

The problem is that the equation systems cannot be solved if only a few parts are specified. However, random variables are also subject to many universal inequalities, e.g., that the normalized covariance matrices must be positive semidefinite

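$$\det(\eta) \ge 0 \quad \text{for} \quad \eta_{ij} = \frac{\mathrm{Cov}(x_i, x_j)}{\langle x_i\rangle \langle x_j\rangle}. \tag{3}$$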

Combining these with the equations of the specified components constrains the range of possible fluctuations.
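The normalized covariance matrix of Eq. (3) can be estimated directly from single-cell counts. The following sketch assumes NumPy and uses synthetic Poisson data. The helper names `normalized_covariance` and `satisfies_eq3` are hypothetical, and the check uses eigenvalues, which is the positive semidefiniteness condition from which det(η) ≥ 0 follows.

```python
import numpy as np

def normalized_covariance(samples):
    """samples: array of shape (n_cells, n_components) holding molecule counts."""
    samples = np.asarray(samples, dtype=float)
    means = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False, bias=True)
    # eta_ij = Cov(x_i, x_j) / (<x_i> <x_j>)
    return cov / np.outer(means, means)

def satisfies_eq3(eta, tol=1e-12):
    """True if eta is positive semidefinite up to a numerical tolerance."""
    return bool(np.linalg.eigvalsh(eta).min() >= -tol)

# Synthetic data (an assumption for illustration): x1 is correlated with x0.
rng = np.random.default_rng(0)
x0 = rng.poisson(20.0, size=5000)
x1 = x0 + rng.poisson(5.0, size=5000)
eta = normalized_covariance(np.column_stack([x0, x1]))
```

Any matrix estimated from real samples passes this check by construction; the inequality becomes informative when some entries are unknown and must be chosen consistently with the others.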

Another challenge is that each specified covariance equation has a “diffusion” term that depends on the reaction step sizes and the average rates. When the rates are nonlinear functions of the state variables, these are analytically intractable, and even for linear rates, they can quickly become algebraically complicated combinations of kinetic parameters. We therefore exactly map the terms onto general physical descriptors. First, we apply Little's Law [21] from queueing theory, which universally relates arrival and service rates to lengths of queues and exactly applies to the type of stochastic processes above: regardless of the complexity of the system or nonlinearities in degradation rates, at stationarity the average lifetime τi of any component Xi satisfies

$$\tau_i = \langle x_i\rangle / \langle R_i^{-}\rangle. \tag{4}$$
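Little's law can be checked numerically even when degradation is nonlinear. The sketch below tracks individual molecule lifetimes in a birth-death process whose total degradation rate is quadratic in the abundance, and compares the directly measured mean lifetime with the ratio of average abundance to average death flux. The rate law, the parameter values, and the function name `simulate_lifetimes` are assumptions chosen for illustration.

```python
import random

def simulate_lifetimes(k_birth=20.0, beta=0.05, t_end=2000.0, seed=1):
    """Return (mean lifetime measured per molecule, <x> / <death flux>)."""
    rng = random.Random(seed)
    t = 0.0
    alive = []       # birth times of molecules currently present
    lifetimes = []   # lifetimes of molecules that have been degraded
    x_integral = 0.0
    while True:
        x = len(alive)
        death_rate = beta * x * x   # nonlinear total degradation rate (assumed)
        total = k_birth + death_rate
        dt = rng.expovariate(total)
        if t + dt > t_end:
            x_integral += x * (t_end - t)
            break
        t += dt
        x_integral += x * dt
        if rng.random() * total < k_birth:
            alive.append(t)
        else:
            # Every molecule present is equally likely to be the one removed.
            idx = rng.randrange(x)
            born = alive[idx]
            alive[idx] = alive[-1]
            alive.pop()
            lifetimes.append(t - born)
    mean_x = x_integral / t_end
    death_flux = len(lifetimes) / t_end
    tau_direct = sum(lifetimes) / len(lifetimes)
    return tau_direct, mean_x / death_flux

tau_direct, tau_little = simulate_lifetimes()
```

The two estimates differ only through molecules still alive at the end of the run, so they converge as the simulated time grows.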

Next, we identify the average jump sizes 〈sij〉, defined as the average change in the number of Xj molecules as an Xi molecule is made or degraded, where the change is negative if one is made while the other is degraded. These quantities characterize the discreteness of each component's dynamics and are formally defined as

$$\langle s_{ij}\rangle = \sum_k \rho_{ik}\, |d_{jk}|\, \mathrm{sgn}(d_{ik} d_{jk}), \tag{5}$$

where ρik is the fraction of flux of component Xi going through reaction k. They thus have straightforward interpretations regardless of their exact numerical values, but the latter also often follow directly from the stoichiometry of reactions changing Xi levels regardless of the rest of the network. For example, if X2 is made in bursts of a and eliminated in bursts of b then 〈s22〉 = (a + b)/2, and if the degradation bursts coincide with the production of c copies of X7, then 〈s27〉 = −c/2, regardless of how rate functions depend on other components in the network. The values of 〈sij〉 can even be analytically identified for broad ranges of systems with distributed step sizes, nonlinear rates, and incompletely specified dynamics (see Supplemental Material [11]).
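The example above can be reproduced mechanically from a stoichiometry matrix and average reaction rates. This sketch assumes that ρik is the share of the combined birth and death flux of Xi carried by reaction k, which is the reading consistent with 〈s22〉 = (a + b)/2. The function name `average_step_sizes` and the numerical values of a, b, c and the rates are hypothetical.

```python
import numpy as np

def average_step_sizes(d, mean_rates):
    """Eq. (5) for all pairs. d has shape (n_components, n_reactions)."""
    d = np.asarray(d, dtype=float)
    r = np.asarray(mean_rates, dtype=float)
    flux = np.abs(d) * r                           # molecules of X_i moved by reaction k
    rho = flux / flux.sum(axis=1, keepdims=True)   # rho_ik, births and deaths together
    # |d_jk| sgn(d_ik d_jk) equals sgn(d_ik) d_jk, so the sum over k is a matrix product.
    return (rho * np.sign(d)) @ d.T

# Rows: X2, X7. Columns: birth of X2 in bursts of a; loss of X2 in bursts of b
# coupled to production of c copies of X7; first-order loss of X7.
a, b, c = 3, 2, 4
d = [[a, -b, 0],
     [0,  c, -1]]
mean_rates = [2.0, 3.0, 12.0]   # chosen so that births and deaths balance
s = average_step_sizes(d, mean_rates)
```

Here `s[0, 0]` evaluates to (a + b)/2 and `s[0, 1]` to −c/2, independent of how the rates depend on the rest of the network, as long as the birth and death fluxes balance.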

Simple algebraic manipulations then show that:

$$U + U^{T} = D \tag{6}$$

for the matrices with entries

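$$U_{ij} = \frac{1}{\tau_j}\, \frac{\mathrm{Cov}(x_i, R_j^{-} - R_j^{+})}{\langle x_i\rangle \langle R_j^{\pm}\rangle}, \qquad D_{ij} = \frac{1}{\tau_i}\, \frac{\langle s_{ij}\rangle}{\langle x_j\rangle} + \frac{1}{\tau_j}\, \frac{\langle s_{ji}\rangle}{\langle x_i\rangle}.$$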

The “diffusion” matrix D exactly captures randomizing low number effects in terms of average abundances, lifetimes and step sizes—which have distinct physical interpretations and often can be experimentally identified—while U quantifies the correlations between states and net fluxes, which in conventional linearized approximations becomes a product of ηij and a corrective “drift” matrix (see Supplemental Material [11]).
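The algebra behind Eq. (6) can be sketched as follows. This is a reconstruction from Eqs. (2), (4), and (5), not the derivation given in the Supplemental Material, and it assumes that ρik is normalized by the combined birth and death flux of Xi, as the example 〈s22〉 = (a + b)/2 suggests.

```latex
\begin{align*}
% Stationarity: set dC_{ij}/dt = 0 in Eq. (2), with \Delta R_j = R_j^+ - R_j^-.
\mathrm{Cov}(x_i, R_j^- - R_j^+) + \mathrm{Cov}(x_j, R_i^- - R_i^+)
  &= \sum_k d_{ik} d_{jk} \langle r_k \rangle \\
% Divide by <x_i><x_j> and use Eq. (4) in the form <x_j> = \tau_j <R_j^\pm>.
\underbrace{\frac{1}{\tau_j}\frac{\mathrm{Cov}(x_i, R_j^- - R_j^+)}{\langle x_i\rangle\langle R_j^\pm\rangle}}_{U_{ij}}
 + \underbrace{\frac{1}{\tau_i}\frac{\mathrm{Cov}(x_j, R_i^- - R_i^+)}{\langle x_j\rangle\langle R_i^\pm\rangle}}_{U_{ji}}
  &= \frac{\sum_k d_{ik} d_{jk} \langle r_k \rangle}{\langle x_i\rangle\langle x_j\rangle} \\
% Assumed normalization of the flux fractions.
\rho_{ik} = \frac{|d_{ik}|\langle r_k\rangle}{2\langle R_i^\pm\rangle}
  \;\Rightarrow\;
\langle s_{ij}\rangle &= \frac{\sum_k d_{ik} d_{jk}\langle r_k\rangle}{2\langle R_i^\pm\rangle} \\
% Use 1/\tau_i = <R_i^\pm>/<x_i>; the j <-> i term contributes the same amount.
\frac{1}{\tau_i}\frac{\langle s_{ij}\rangle}{\langle x_j\rangle}
 + \frac{1}{\tau_j}\frac{\langle s_{ji}\rangle}{\langle x_i\rangle}
  &= \frac{\sum_k d_{ik} d_{jk}\langle r_k\rangle}{\langle x_i\rangle\langle x_j\rangle} = D_{ij}
\end{align*}
```

Comparing the second and last lines gives U_ij + U_ji = D_ij for every pair of specified components, without any assumption about the form of the rate functions.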

The approach above makes no mathematical approximations for discrete and nonlinear stochastic systems, but the identified constraints could be mathematically conservative, or the fluctuations could be insignificantly constrained by a few specific assumptions if the components of interest can affect and be affected by arbitrarily complex systems. However, in all systems we have considered, the bounds have been surprisingly tight and severe even when specifying very little. Next, we demonstrate how our approach can be used to reveal unavoidable performance trade-offs in two central biological regulatory architectures: complex formation and negative feedback loops.

Bounds for generalized assembly processes

Many structures in cells are made by assembling smaller pieces into larger complexes. Unbalanced production rates will then cause some pieces to accumulate, subjecting them to degradation and dilution. Systems selected to minimize such wasteful turnover can tune the average production rates to match the assembly stoichiometries, as observed for many proteins expressed jointly in operons [22], but the probabilistic nature of individual reaction events will still cause temporary surpluses or deficits. Here, we analyze such operons embedded in complex networks, where subunits X0, X1 are produced at the same rate, degraded individually, and form a stable complex according to

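$$\begin{aligned}
x_0 &\xrightarrow{R(u, x_0, x_1)} x_0 + b, & x_1 &\xrightarrow{R(u, x_0, x_1)} x_1 + b, \\
x_0 &\xrightarrow{\beta x_0} x_0 - 1, & x_1 &\xrightarrow{\beta x_1} x_1 - 1, \\
(x_0, x_1) &\xrightarrow{\gamma x_0 x_1} (x_0 - 1, x_1 - 1).
\end{aligned} \tag{7}$$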

Here, R(u, x0, x1) is an arbitrary function that allows for any direct or indirect feedback control (or randomization) via the unspecified components u(t) := x2, x3, x4, … as indicated by the cloud in Fig. 1, including history dependence. Births are allowed to occur with an arbitrary “burst” size b, which can be straightforwardly generalized to arbitrary distributions of bursts for all applications we consider (see Supplemental Material [11]). We then formulate the results in terms of the assembly efficiency E, i.e., the fraction of molecules of either subunit that eventually end up in complexes:

$$E := \frac{\gamma \langle x_0 x_1\rangle}{\beta \langle x_1\rangle + \gamma \langle x_0 x_1\rangle} = 2\langle s_{01}\rangle = 2\langle s_{10}\rangle. \tag{8}$$

Fig. 1. Efficient complex assembly implies variability.

We consider all reaction systems in which subunits X0 and X1 form stable complexes as defined in Eq. (7), where the cloud indicates that all system components can arbitrarily affect the shared production rate. The inset time traces illustrate the dramatic fluctuations in subunit levels, suggesting that small noise approximations are inaccurate. The solid red line is the exact lower bound on subunit fluctuations Eq. (9) as it applies to the average of the subunit noise levels [11]. As the efficiency approaches 100%, the subunit fluctuations diverge. We plot the average of η00 and η11 normalized by the noise that subunits would exhibit in the absence of feedback and complex formation. Dots indicate simulation results for different realizations of assembly processes across a range of parameters, feedback functions, and embedding networks. Many systems deviate greatly from the isolated linear noise prediction (dashed yellow line), but no system can beat the exact bound. The simulation data should not be interpreted in terms of density but are presented to illustrate what is possible and show by construction that the bound is virtually tight.

The typical approach is to analyze the above reactions as an isolated module, i.e., Eq. (7) for R(u, x0, x1) = const. This leads to a simple toy model of stochastic assembly processes that has been analyzed in many contexts [23,24]. However, not even this toy model can be solved analytically because nonlinear rates prohibit exact moment closure. Analytical approaches have therefore relied on linearizations assuming small noise, which leads to a matrix equation [25] that in terms of the efficiency E approximately predicts that normalized fluctuations of subunit levels diverge as η11 ∼ 1/(1 − E²) (see Supplemental Material [11]). In the biologically interesting regime of high efficiencies, the prediction thus contradicts the approximation that made the prediction possible. Furthermore, many biological systems are subject to feedback loops or other stabilizing network effects, which tend to have a particularly large effect on fluctuations for systems operating near points of neutral stability. This illustrates a common problem when mathematically analyzing biological networks: an interesting result is derived by approximating a specific model, but it is unclear if the principles identified apply to real systems.

To analyze these processes as incompletely specified reaction networks, we instead apply Eq. (6) to the general reaction system of Eq. (7), which leads to a set of three equations whose solutions for η00, η11, η01 are constrained by Eq. (3). The system is not assumed to be symmetric with respect to control, and without approximations, it follows that one of the variances must satisfy

$$\eta_{ii} \ge \frac{\langle s_{ii}\rangle}{\langle x_i\rangle}\, \frac{1 - E/2}{2(1 - E)}, \tag{9}$$

for i = 0 or 1. This hard bound also applies to the average of the two variances and shows that in any system that includes the reaction module of Eq. (7), fluctuations in the pools of free subunits diverge as the efficiency approaches 100% (Fig. 1). For example, with 95% efficiency, variances must be at least 5 times greater than for Poisson processes with the same averages.
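The bound can be compared against a simulation of the isolated module. The sketch below evaluates the right-hand side of Eq. (9) and simulates Eq. (7) with a constant production rate and burst size b = 1, so that 〈sii〉 = 1. The parameter values, which give a moderate efficiency near 50%, and the function names are assumptions made for this example and are not taken from the Letter.

```python
import random

def assembly_bound(E, s_ii, mean_x):
    """Right-hand side of Eq. (9): lower bound on eta_ii."""
    return (s_ii / mean_x) * (1.0 - E / 2.0) / (2.0 * (1.0 - E))

def simulate_assembly(R=10.0, beta=0.2, gamma=0.008, t_end=10000.0, seed=2):
    """Gillespie simulation of Eq. (7) with R = const and b = 1 (assumed values)."""
    rng = random.Random(seed)
    t, x0, x1 = 0.0, 25, 25
    sx0 = sx1 = sx0sq = sx1sq = sx0x1 = 0.0   # time-weighted sums
    while True:
        rates = [R, R, beta * x0, beta * x1, gamma * x0 * x1]
        dt = rng.expovariate(sum(rates))
        last = t + dt >= t_end
        if last:
            dt = t_end - t
        sx0 += x0 * dt
        sx1 += x1 * dt
        sx0sq += x0 * x0 * dt
        sx1sq += x1 * x1 * dt
        sx0x1 += x0 * x1 * dt
        if last:
            break
        t += dt
        k = rng.choices(range(5), weights=rates)[0]
        if k == 0:
            x0 += 1
        elif k == 1:
            x1 += 1
        elif k == 2:
            x0 -= 1
        elif k == 3:
            x1 -= 1
        else:              # complex formation removes one of each subunit
            x0 -= 1
            x1 -= 1
    m0, m1 = sx0 / t_end, sx1 / t_end
    eta00 = (sx0sq / t_end - m0 ** 2) / m0 ** 2
    eta11 = (sx1sq / t_end - m1 ** 2) / m1 ** 2
    complex_flux = gamma * sx0x1 / t_end
    E = complex_flux / (beta * m1 + complex_flux)   # Eq. (8)
    return m0, m1, eta00, eta11, E

m0, m1, eta00, eta11, E = simulate_assembly()
# Noise relative to the bound; Eq. (9) requires at least one ratio to be >= 1.
ratio0 = eta00 / assembly_bound(E, 1.0, m0)
ratio1 = eta11 / assembly_bound(E, 1.0, m1)
```

With 95% efficiency and 〈sii〉 = 1, `assembly_bound(0.95, 1.0, mean_x)` is 5.25 times the Poisson value 1/〈x〉, consistent with the factor quoted in the text.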

Similar effects may be familiar from word games like Scrabble, where letters are drawn randomly and used to form words: because most words consist of both vowels and consonants, the type currently in shortage has faster turnover, which reinforces the shortage. The frustrating fluctuations between having mostly vowels or mostly consonants can be reduced by forfeiting a turn and trading in the letters for new ones or by playing words with more letters of one type, like “crwth” or “eau”. Cells face the same choice between lower efficiency and using the surpluses for other types of complexes. Because the latter may be difficult, many subunits in cells, whether in anabolism [23], translation [26], antisense RNA control [27,28], or protein complexes, may thus appear very noisy simply as a side effect of efficient complex formation. In fact, recent evidence [29] suggests that developmental mechanisms can utilize the fluctuations generated by such processes to create distinct and long-lived behaviors under constant growth conditions.

Variability as a necessity of control

Nongenetic heterogeneity can reflect probabilistic processes involving low numbers of molecules or bursts, but may also be an unavoidable by-product of efficient processes, as highlighted by the stochastic assembly example above. Next, we show that such heterogeneity can similarly be a consequence of effective control, much like the temperature of radiators must vary to maintain a constant room temperature. In particular, we combine Eqs. (6) and (3) to exactly quantify how the rates of self-corrective systems must vary in response to deviations. We consider all possible networks in which the fluctuations of some X1 are controlled via another component X0 defining only the following subset of the reactions,

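$$\begin{aligned}
x_0 &\xrightarrow{R(u, x_0, x_1)} x_0 + a, & x_1 &\xrightarrow{a x_0} x_1 + b, \\
x_0 &\xrightarrow{x_0/\tau_0} x_0 - 1, & x_1 &\xrightarrow{x_1/\tau_1} x_1 - 1.
\end{aligned} \tag{10}$$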

The feedback control function R(u, x0, x1) is again left completely unspecified and can depend in arbitrary ways on X1 and X0 as well as an unspecified set of variables u(t) := x2, x3, x4, … that themselves can depend on X1 and X0 (as indicated by the cloud of components in Fig. 2(a)). Such schemes arise in many contexts, and may, for instance, describe generalized transcriptional feedback control in genetic circuits, where the levels of a protein (X1) are controlled by changing the production rate of its cognate mRNA (X0), whose levels in turn set the translation rate of proteins. Because we left the processes in the cloud unspecified, we allow, for example, for any system in which protein levels affect the transition rates between different promoter states.

Fig. 2. Effective control implies variability.

(a) We consider all possible feedback control systems in which fluctuations in component X1 are controlled via X0 as defined in Eq. (10). The specific type of feedback control is left unspecified (cloud of components). The exact bound of Eq. (11) establishes how much rate variability is needed to reduce X1 fluctuations when X0 and X1 have equal lifetimes (solid red line). We plot here the normalized standard deviation of X1 relative to its uncontrolled noise levels 〈s11〉/〈x1〉. The dashed line indicates a marginally tighter but algebraically more involved bound (see Supplemental Material [11]) that is virtually indistinguishable from Eq. (11) in this range. Dots correspond to numerical solutions to various instantiations of the feedback systems for a range of response functions and parameters. (b) We consider any component X1 undergoing first order degradation while embedded in an arbitrarily complicated control network (cloud). Equation (12) bounds the relative noise suppression in such components in terms of the scaled variation of their production rate R (red line). This bound is provably tight (see Supplemental Material [11]).

Substituting Eq. (10) into Eq. (6) leads to a system of three equations that cannot be solved for the (co)variances of interest. However, applying Eq. (3) to X0, X1, and R shows that $\eta_{00}\eta_{11}\eta_{RR} - \eta_{RR}\eta_{01}^2 - \eta_{00}\eta_{R1}^2 - \eta_{11}\eta_{R0}^2 + 2\eta_{01}\eta_{R0}\eta_{R1} \ge 0$, leading to a constraint on the space of possible solutions. The resulting bound is analytical but algebraically involved (see Supplemental Material [11]). We here present a simpler bound, which is also proven exactly and only marginally more conservative, for the interesting and biologically relevant special case of τ0 = τ1:

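$$\frac{\eta_{RR}}{\langle s_{11}\rangle/\langle x_1\rangle} \ge 4 \left(\frac{\eta_{11}}{\langle s_{11}\rangle/\langle x_1\rangle} - 1\right)^{2} \left(\frac{\eta_{11}}{\langle s_{11}\rangle/\langle x_1\rangle}\right)^{-3}. \tag{11}$$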

This establishes the minimum variation in the rate R required to achieve a desired reduction in X1 fluctuations relative to their uncontrolled noise level 〈s11〉/〈x1〉 (solid red line in Fig. 2(a)). Equation (11) shows that the rates must vary enormously to suppress fluctuations in X1, because the maximum noise suppression asymptotically scales inversely with the cube root of ηRR.
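To get a feel for the numbers, Eq. (11) can be evaluated directly. The helper below is a sketch whose name `min_rate_noise_eq11` and example values are assumptions. It is meant for the noise-suppression regime, where η11 lies below the uncontrolled level 〈s11〉/〈x1〉.

```python
def min_rate_noise_eq11(eta11, s11, mean_x1):
    """Smallest eta_RR compatible with Eq. (11) for equal lifetimes tau0 = tau1."""
    n = s11 / mean_x1          # uncontrolled noise level <s11>/<x1>
    ratio = eta11 / n          # achieved noise relative to the uncontrolled level
    return n * 4.0 * (ratio - 1.0) ** 2 / ratio ** 3

# Hypothetical example: <s11> = 1, <x1> = 100, so the uncontrolled level is 0.01.
halved = min_rate_noise_eq11(0.005, s11=1.0, mean_x1=100.0)     # variance halved
tenfold = min_rate_noise_eq11(0.001, s11=1.0, mean_x1=100.0)    # variance cut tenfold
```

Halving the normalized variance of X1 already requires η_RR of at least 8 times the uncontrolled level, and a tenfold reduction requires 3240 times that level, which reflects the inverse cubic dependence.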

We can broaden the assumptions even further by only specifying that a component of interest X1 undergoes first-order degradation, while allowing for arbitrary feedback control of the production, with rate R = R(u, x1). The above approach then immediately shows that fluctuations in X1 levels are bounded by the variation in the production rate according to

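$$\frac{\eta_{RR}}{\langle s_{11}\rangle/\langle x_1\rangle} \ge \left(\frac{\eta_{11}}{\langle s_{11}\rangle/\langle x_1\rangle} - 1\right)^{2} \left(\frac{\eta_{11}}{\langle s_{11}\rangle/\langle x_1\rangle}\right)^{-1}. \tag{12}$$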
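One way to see where Eq. (12) comes from is sketched below. This is a reconstruction, not the authors' derivation, and it assumes a fixed production burst size so that the birth flux of X1 is proportional to R.

```latex
\begin{align*}
% Diagonal element of Eq. (6) for X_1: 2 U_{11} = D_{11}.
% First-order degradation gives R_1^- = x_1/\tau_1, hence
% Cov(x_1, R_1^-)/(<x_1><R_1^->) = \eta_{11}; the birth flux is proportional to R.
\frac{2}{\tau_1}\left(\eta_{11} - \eta_{1R}\right)
  &= \frac{2}{\tau_1}\frac{\langle s_{11}\rangle}{\langle x_1\rangle}
  \quad\Rightarrow\quad
  \eta_{1R} = \eta_{11} - \frac{\langle s_{11}\rangle}{\langle x_1\rangle} \\
% Eq. (3) applied to the pair (x_1, R):
\eta_{11}\,\eta_{RR} - \eta_{1R}^{2} &\ge 0
  \quad\Rightarrow\quad
  \eta_{RR} \ge \frac{\left(\eta_{11} - \langle s_{11}\rangle/\langle x_1\rangle\right)^{2}}{\eta_{11}}
\end{align*}
```

Dividing the last inequality by 〈s11〉/〈x1〉 gives the normalized form of Eq. (12).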

In the regime where noise suppression is significant, the normalized standard deviations are then asymptotically constrained by σ̃11σ̃RR ≳ 〈s11〉/〈x1〉. Suppressing noise in abundances thus requires particularly great variation in rates when abundances are low or bursts are large. For example, when 〈s11〉 is comparable to 〈x1〉, reducing the relative standard deviation in X1 to 10% requires a relative standard deviation of R close to 1000%.
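The worked example in the preceding paragraph can be reproduced by evaluating Eq. (12) numerically. The function name `min_rate_noise_eq12` is hypothetical, and 〈s11〉 = 〈x1〉 = 1 is used as one instance of the two being "comparable".

```python
import math

def min_rate_noise_eq12(eta11, s11, mean_x1):
    """Smallest eta_RR compatible with Eq. (12) for first-order degradation."""
    n = s11 / mean_x1          # uncontrolled noise level <s11>/<x1>
    ratio = eta11 / n
    return n * (ratio - 1.0) ** 2 / ratio

# Relative standard deviation of X1 reduced to 10%, i.e. eta11 = 0.1**2.
cv_x1 = 0.10
eta_rr_min = min_rate_noise_eq12(cv_x1 ** 2, s11=1.0, mean_x1=1.0)
cv_rate_min = math.sqrt(eta_rr_min)   # minimum relative standard deviation of R
```

The minimum relative standard deviation of R comes out at about 9.9, i.e., close to 1000%, and the product `cv_x1 * cv_rate_min` is just under 〈s11〉/〈x1〉 = 1, approaching it as the suppression becomes stronger.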

Conceptually related trade-offs are reported in control theory, connecting performance to the cost of control [30], but here we derived direct bounds of one in terms of the other, without linearizations or continuity approximations, for arbitrarily complex systems.

Equations (11) and (12) show that heterogeneity in rates, which in turn implies heterogeneity in abundances, is a necessary feature of control. Rates are sometimes modeled as highly nonlinear functions of concentrations, but such models generally condense several elementary reactions into effective events, e.g., modeling protein production as a function of transcription factors while ignoring mRNAs. However, the variation in the fast variables is then still closely connected to the variation in rates. Contrary to the common perspective that mRNA noise necessarily begets protein noise, a wide distribution of mRNA levels across a population of cells is thus required to significantly reduce spontaneous protein fluctuations by any type of feedback mechanism that operates via transcription. This means that molecular networks need “sacrificial” components with large variation to ensure that other components remain constant. Since it is rarely known which components are controllers and which are controlled, and it is rare to measure both in the same individual cells, the components that experimentally appear the noisiest may in fact be the ones that help provide the tightest control.

Conclusions

Systems biology has tried to make sense of the complexity of biological networks by identifying characteristic properties of common kinetic motifs. The challenge is that, in real cells, these are generally embedded in larger networks such that the behavior the parts would display in isolation can become irrelevant. However, here we show that when further accounting for the probabilistic nature of chemical reactions, it is still possible to identify universal features of such motifs regardless of how they affect and are affected by the rest of the system. This can identify broad rules for what biological systems cannot do, and greatly reduce the number of assumptions when testing kinetic models against data.

Acknowledgments

We are grateful to S. Eule, A. Klein, M. Loose, N. Lord, S. Tal, R. Ward, and S. Reuveni for comments on the Letter. The work was supported by Grant No. 1137676 from the Division of Mathematical Sciences at the National Science Foundation and Grant No. GM081563 from the National Institutes of Health.
