Self-Consistent Proteomic Field Theory of Stochastic Gene Switches

Aleksandra M Walczak; Masaki Sasai; Peter G Wolynes

doi:10.1529/biophysj.104.050666

. 2004 Nov 12;88(2):828–850. doi: 10.1529/biophysj.104.050666

Self-Consistent Proteomic Field Theory of Stochastic Gene Switches

Aleksandra M Walczak ^*, Masaki Sasai ^†, Peter G Wolynes ^*,‡

PMCID: PMC1305159 PMID: 15542546

Abstract

We present a self-consistent field approximation approach to the problem of the genetic switch composed of two mutually repressing/activating genes. The protein and DNA state dynamics are treated stochastically and on an equal footing. In this approach the mean influence of the proteomic cloud created by one gene on the action of another is self-consistently computed. Within this approximation a broad range of stochastic genetic switches may be solved exactly in terms of finding the probability distribution and its moments. A much larger class of problems, such as genetic networks and cascades, also remain exactly solvable with this approximation. We discuss, in depth, certain specific types of basic switches used by biological systems and compare their behavior to the expectation for a deterministic switch.

INTRODUCTION

Genetic switch systems are an elementary means of regulatory control present in every living organism. Their complexity and details differ, but the general mechanism, that of the expression of a given gene being regulated by proteins, is believed to be universal (Ptashne and Gann, 2002). They are building blocks of larger regulatory elements: genetic networks and signaling cascades. The pathways by which these systems operate are passed on from generation to generation. Understanding their stability and characteristics is therefore fundamental. A lot of previous work has considered a deterministic description of genetic switches (Ackers et al., 1982; Hasty et al., 2001). The need for a stochastic treatment of genetic switches, due to the single copy of the DNA molecule and multiple protein molecules in the cell, has been largely recognized (Sneppen and Aurell, 2002; Kepler and Elston, 2001).

The most general way of accounting for nondeterministic processes is to write down the master equation for a given system. To define the state of the switch, one must specify the DNA binding states of particular genes and the number of proteins of each type. The probability distribution for even a single switch consisting of two genes, the product proteins of which act as regulator proteins for the system, may not be determined exactly and approximations must be considered (Bialek, 2001; Hasty et al., 2000; Sneppen and Aurell, 2002).

Several approaches to account for the probabilistic nature of chemical reactions have been undertaken, ranging from the Langevin description of single genes (Bialek, 2001), and two interacting gene switches (Hasty et al., 2000), to the master equation reduced to Fokker-Planck equation considerations (Kepler and Elston, 2001; Hasty et al., 2001a). A dynamical action formulation has also been used (Sneppen and Aurell, 2002) to determine the lifetimes of states of the switch. A popular alternative to purely analytical methods, which often need to make approximations or are limited to very simple model systems, has been to conduct stochastic simulations of genetic switches. Two types of simulations are mostly used. In the first, the randomness of the system is introduced by means of a Monte Carlo algorithm with a fixed time step (Paulsson et al., 2000). The second is based on the Gillespie algorithm (Gillespie, 1977) to predict the probability of a given reaction occurring (Arkin et al., 1998). For single-gene systems, stochastic simulations have shown that the stochasticity in the system is responsible for the bimodal probability distributions (Cook et al., 1998) that have been experimentally observed. These methods prove very useful, because they allow us to test the theoretical predictions on model systems that might be hard to build experimentally. However, this approach often does not enable us to gain intuition or insight into the mechanisms behind the functioning of the system. The aim of the present work is to gain a better and deeper understanding of the device physics of genetic switches. We therefore, contrary to many important previous discussions (McAdams and Arkin, 1997; Aurell et al., 2002; Vilar et al., 2003), do not present a specific concrete biological system, but discuss generic behavior and try to understand its sources. Our approximation also allows for an exact solution of a broad class of genetic switch systems without any further assumptions and with little computational effort. Hasty et al. (2001b) present an overview of the existent theoretical approaches.

A popular approximation assumes that the DNA binding state reaches equilibrium much faster than the protein number state. Therefore the adiabatic approximation is often considered (Ackers et al., 1982; Sneppen and Aurell, 2002; Darling et al., 2000), allowing for a thermodynamic treatment (Ackers et al., 1982) of the DNA binding state. The protein number fluctuations are then treated stochastically. Even before the statistical thermodynamics approach of Ackers et al. (1982) using partition functions, much previous work assumed the DNA binding and unbinding can simply be accounted for by an equilibrium constant, since the relaxation timescales for equilibration of the DNA state are much larger than those of the protein numbers, which require protein synthesis and degradation to change. The partition function approach has also been successful for looking at logic gates built from switches (Buchler et al., 2003). The adiabatic approximation is believed to hold true in many cases, judging by the experimental parameters of biological switches (Darling et al., 2000). But as the experiments of, for example, Becskei et al. (2001) show, not all switches need to function within the adiabatic limit and the nonadiabatic limit may result in new phenomena. We therefore consider a wide range of parameter ratios in our discussion.

In this article we explore more fully an approximation, previously used by Sasai and Wolynes (2003) for the variational treatment of the problem, the self-consistent proteomic field (SCPF) approximation. Within this approximation one assumes that the probability of finding the switch in a given state is a product of the probabilities of states of individual genes. One can then solve the steady-state master equation for the probability distribution of many regulatory systems exactly. We discuss the approximation and present a detailed study of different classes of genetic switches, some of which have never been considered theoretically. We consider separately several particular features of such systems that are found in known switches, to be able to characterize their contributions to the behavior of the whole system. To be specific, starting from a symmetric toggle switch, we go on to compare the effects of multimer binding, and of the production of proteins in bursts, on the stability of the switch.

The stochastic effects prove to be modest for symmetric switches without bursts, especially if the genes have a basal production rate. We find the deterministic and stochastic SCPF solutions to have similar probabilities of particular genes to be on, and similar mean numbers of proteins of a given species in the cell. However, in the nonadiabatic limit, when the unbinding rate from the DNA is smaller than the death rate of proteins, the probability distributions have two well-defined peaks, unlike in the deterministic approximation or adiabatic limit of the stochastic SCPF solution.

We also show that the effect of stochasticity on the observables becomes more apparent when proteins are produced in bursts. In these types of switches, the definition of the adiabatic limit, which was clear for the switches in which proteins are produced separately, is no longer simple. Our discussion shows that the properties of genes often analyzed in the deterministic limit, may be strongly influenced by stochasticity in this case. Randomness in a biological reaction system leads to quantitative and, in many examples, even qualitative changes, from predictions of deterministic models.

We also discuss the differences in the behavior of asymmetric and symmetric switches. We point to the mechanisms resulting in different types of bifurcations and show how they are influenced by noise. Within the SCPF approximation, switches that are regulated by binding and unbinding of monomers do not have regions of bistability. This holds true for both symmetric and asymmetric switches. When proteins are produced individually rather than in bursts, fast unbinding from the DNA can effectively minimize the destructive effect of protein number fluctuations on the stability of the DNA binding state. Furthermore, a detailed analysis of the probability distributions shows that they have long tails, and are far from Poissonian in both the adiabatic and nonadiabatic limits. We discuss the properties of the system in terms of clouds of proteins buffering the DNA. We show how fast or slow DNA binding characteristics and protein number fluctuations influence the stability of the buffering clouds, leading to specific emergent behavior of observables. Throughout the article, a comparison is made between results of the exact stochastic solution, to solutions of the deterministic kinetic equations for the system within the self-consistent proteomic field approximation.

We establish a base of potential building blocks of more complicated switches and systems, such as networks and signaling cascades, for which an exact solution within the present approximation can also be obtained. A detailed discussion of these larger systems will be the topic of another article. We also present limitations of the present style of analysis where exact solutions are not possible.

There are two aims of this article. The first is to discuss the self-consistent field approximation and show that it has an exact solution that could be extended to a large class of systems. This approximation lets one deal in a straightforward and computationally inexpensive manner with the effect of random processes on genetic networks. The second is to discuss the many components of biological switches present in nature and in engineered systems, in the necessary stochastic framework.

THE SELF-CONSISTENT PROTEOMIC FIELD APPROXIMATION

The basic mechanism of gene transcription regulation in prokaryotes may be reduced to the binding and unbinding of regulatory proteins, repressors, and activators, to the operator site of the DNA. If we use this simplified treatment, which neglects extra levels of regulation such as the binding of RNA polymerase, effectively each gene can be described as being either in an active (on) state, when the repressor is unbound (activator bound); or in an inactive (off) state, with the repressor bound (activator unbound). The stochastic system of a single gene and its product proteins is described by the joint probability distribution Inline graphic of the number of product proteins in the cell n, and the DNA binding-site state, as on (protein not bound) = 1; and off (protein bound) = 2. To conserve probability,

If one considers two interacting genes, the description in terms of a joint probability vector needs to be extended to four states: both genes may be on, or off; or one of the genes may be on, and the other off. If the two genes do not interact, as would be the case for two self-regulatory proteins, the probability of finding the two-gene system in a given state, defined by both the number of product proteins and the DNA binding-site state, would be the product of the states of particular genes P_jj′(n₁,n₂;t) = P_j(n₁;t)P_j′(n₂;t). This is generally not true for two interacting proteins, as is the case in a genetic switch. However, as a first approximation to the problem, one can ignore correlations between the spaces of the two genes and assume the space of the switch is a sum of spaces of the genes that compose it. Since we are looking for solutions in which the symmetry of the system is broken and different behaviors of the on- and off-state of a gene are possible, we must allow for different probability distribution functions for the on- and off-states. This is analogous to the unrestricted Hartree approximation in quantum mechanics, where allowing different spatial functions for spin-up and spin-down states results in breaking of the symmetry of the bound molecular orbital solution to the dissociated solution of two separate hydrogen atoms with opposite spin-states for large internuclear distances. We therefore allow for multiple solutions for a given set of parameters. The total probability of having a given gene state j and n_i proteins of that type is simply given by P_j(n_i,n_i′) = P_j,j′=0(n_i,n_i′) + P_j,j′=1(n_i,n_i′).

The self-consistent approximation is a crude one, since in the case of the genetic switch, the state of a given gene is often determined by the number of protein products of the other gene. However, within this approximation, one can solve the master equation for the probability distribution exactly, without any further approximations. This yields a powerful computational tool, which simultaneously gives useful insight.

THE TOGGLE SWITCH

For clarity of exposition, we show how the problem may be solved exactly within the self-consistent proteomic field approximation on a well-defined system of the toggle switch. We then expand the method to apply to other systems. The elementary system we use as an example is composed of two genes, labeled 1 and 2, as presented in Fig. 1. Gene 1 produces proteins of type 1, which act as regulatory proteins, i.e., repressors, on gene 2. The product of gene 2, proteins of type 2, in turn repress gene 1. In this simplified model, we assume that protein production occurs instantaneously upon unbinding of the repressor. For now, we assume that repressor proteins bind as dimers, since that is a common scenario in biological systems, but we do not treat dimerization kinetics explicitly. For simplicity, the coupling form between the genes responsible for binding will be taken to be of the form Inline graphic where p is the order of the multimerization of the repressor. This form is a small approximation to the more exact h_in_3−i(n_3−i − 1)…(n_3−i − p + 1). We have checked that using the simpler monomial does not influence the results in any regime discussed. We also do not account for the existence of mRNA molecules and the consequent time delays owing to their synthesis as intermediates. The extensions of the model are discussed later.

Within the self-consistent proteomic field approximation, the set of master equations for the corresponding system is of the form

(1)

for n ≥ 1 where the i = 1,2 refers to the gene label. P₁(n₁) describes the probability of gene 1 being in the on-state and there being n₁ protein molecules of type 1 in the cell. The first term on the right-hand side of Eq. 1 describes the production of proteins of type i with a production rate g_j(i), where j = 1, 2, depending on whether the gene is in the on- or off-state. The second term accounts for the destruction of proteins with rate k_i. The binding of repressor proteins produced by the other gene is proportional to the number of dimer molecules present in the system n_3−i with rate h_i. We assume unbinding occurs with a constant rate f_i. Binding and unbinding contributes to the kinetics of the DNA binding states, as described by the last two terms. This set is supplemented by the P_j(n_i = 0) equations to account for boundary conditions.

(2)

For convenience, let us define Inline graphic the probability of finding the DNA binding site in a given state. One can now sum the P_j(1) equations over the number states of the second protein with P₁(2) + P₂(2), and likewise the P_j(2) equations. Due to the SCPF approximation, the only term affected is the repressor binding term Inline graphic and since the summation results in where is the second moment of the number distributions of type 2 proteins produced when gene 2 is in the j^th state. The equations of motion of the moments of the probability distribution are of the form

(3)

The steady-state equations for the moments of the distributions that follow are closed-form; the Inline graphic order moment equation of motion depends only on the lower moments of the i^th gene and

To analyze the behavior of switches we introduce the following scaled parameters: the adiabaticity parameter ω_i = f_i/k_i, which represents the characteristic rate of change of the DNA state compared to the characteristic rate of change in protein number, Inline graphic measures the tendency for proteins to be unbound from the DNA; the effective production rate; and distinguishes between the two DNA states in terms of protein dynamics. We present a detailed derivation of the moment equations in Appendix A.

The resulting equations for the 0^th moments couple to the higher moments by the interaction function F(i). These lower moments can be solved self-consistently. The resulting solution predetermines all the other moments, which completely describe the probability distribution. Each gene therefore couples to the other gene by the influence of the self-consistently generated proteomic field. One could define the generating function and calculate the probabilities of having a given DNA binding state j for the i^th gene when there are n_i proteins of type i in the cell. In practice, it is easier to go back to the steady-state master equation and solve directly for the probability distributions than sum an infinite number of moments. Rewriting the steady-state master equation (Eq. 1) one gets

(4)

These sets of equations give recursion relations for P_j(n_i) that one can use to express P_j(n_i) as a function of P₁(0) and P₂(0). The normalization condition Inline graphic gives P_j(0) in term of constants and the result is the probability function P_j(n_i) as a series. The SCPF approximation reduces the two-gene problem to a one-gene problem parameterized by the moments of the second gene, which can be worked out independently, as we have already shown, and these are represented by F(3−i)—which is a constant in terms of this calculation.

To see the effect of the stochastic nature of the system, we compare the exact solutions of the self-consistent field approximation equations to the results that would follow from deterministic kinetic rate equations for the number of proteins of each type and the fraction of on/off DNA binding states for each gene,

(5)

where n(i) is the number of proteins of type i present in the cell. The exact SCPF equations reduce to the deterministic kinetic equations in the limit of large ω and X^ad for the case discussed above. The F(3−i) term in the stochastic SCPF equations is replaced by the n²(3−i) term in the deterministic kinetic rate equations. For the toggle switch, where repressors bind as dimers, it is easily shown that the interaction functional may be rewritten in the form

(6)

which in the large ω-limit reduces to F(i) = 〈n(i)〉² + 〈n(i)〉. So for large mean numbers of proteins present in the cell, which corresponds to large effective production rates X^ad, 〈n(i)〉 of the order of hundreds is a small correction to 〈n(i)〉². We therefore reproduce the deterministic kinetics result.

As shown by Sasai and Wolynes (2003), the difference in the probability that gene 1 is active and that gene 2 is active, ΔC = |C₁(1)−C₁(2)|, plays the role of an order parameter. We can now consider a family of switches and discuss their stability, sensitivity of regions of bistability to control parameters, and types of bifurcations.

THE SYMMETRIC TOGGLE SWITCH

For pedagogic purposes we will start by analyzing the single symmetric toggle switch, such as discussed above, in which repressors bind as dimers, with ω₁ = ω₂ = ω, Inline graphic and as it is the most intuitive and shows the most generic behavior. It is an academic example, as even individual genes in switches engineered in the laboratory mostly have different chemical parameters. Yet a lot can be learned from this simple system.

The general mechanism of the phase transition

Fig. 2 shows the phase diagrams for the system, |ΔC|, as a function of reservoir protein number and the adiabaticity parameter for the exact SCPF equations for growing values of the parameter describing the tendency that proteins are unbound from the DNA, X^eq. The deterministic kinetics and exact SCPF approximations give qualitatively similar results. The analogous deterministic kinetic phase diagrams agree with the SCPF solutions in the large ω- and X^ad-limit, hence they become more similar with growing X^eq, as the bifurcation occurs at larger effective production rates for larger X^eq. For large fluctuations and a small unbinding rate, neither gene 1 nor gene 2 is favored and the probability of a given gene to be on is determined solely by the effective production rate of the other gene and decreases in a quadratic manner as the number of repressor proteins grow (Fig. 3). Since the switch is symmetric, the system has one stable state, ΔC = 0, where the probabilities of the genes to be on are equal. As the relative protein number fluctuations get smaller and the DNA unbinding rate grows, a proteomic cloud buffers the repressed gene, keeping it repressed. The symmetry of the system is broken and the solution bifurcates into two separate basins of attraction. For the stochastic SCPF equations the bifurcation takes place for larger effective production rates (larger X^ad), than for the deterministic equations, even in the large ω-limit, which depicts their sensitivity to fluctuations. The critical number of reservoir proteins necessary for the bifurcation of the solution to take place is the same in both approximations and is determined by 〈n〉_c = (X^eq)^½ (Fig. 3). In the discussed example, 〈n〉_c = 32 = 1000^½, for X^eq = 1000. For the deterministic kinetic switch the bifurcation takes place when C₁(i) = (1 + 〈n(3−i)〉²/X^eq)⁻¹ = 0.5, due to the simple form of the interaction function equal to 〈n(3−i)〉² = (2X^adC₁(3−i))². So C₁(i) = 0.5 is equivalent to the 〈n(3−i)〉²/X^eq = 1. In a noisy system larger effective production rates are needed to achieve the critical value of proteins. The interaction function in this case may be written as Inline graphic and always. So at 〈n〉_c, F(3−i)/X^eq > 1 and the probability of the genes to be on is <0.5, therefore The mechanism of the bifurcation requires the two genes to be more likely to be unbound than bound for the phase transition to take place. The curvature of the null clines presented in Fig. 2 can be simply worked out to be of the form Inline graphic with ζ_i,ξ_i constants determined by the specific value of C₁(1), C₁(2).

Phase diagram obtained as an exact solution within the SCPF approximation for the single symmetric switch when repressors bind as dimers with X^eq = 1 (A), 100 (B), and 1000 (C). Contour lines mark values of ΔC.

Probability that genes are in the active state (A), the mean number of proteins of each type present in the cell 〈n(i)〉(B), and the mean number of proteins of each type present in the cell if gene i is in the on-state 〈n₁(i)〉 (C) as a function of X^ad = δX^sw for a symmetric switch. Exact solutions of the SCPF approximation equations compared with deterministic kinetic rate equations solutions, for a single symmetric switch, X^eq = 1000, ω = 0.5.

Adiabaticity parameter dependence

As the adiabaticity parameter decreases, the area of phase space corresponding to multiple solutions decreases (Fig. 2). For very small values of the adiabaticity parameter, there exists only one solution that corresponds to a state in which the two genes are off. The value of ω below which only one solution exists decreases with the tendency for proteins to be bound, but exists for all values of X^eq. Therefore if the two genes have very high repressor binding affinities, the critical number of proteins necessary for the phase transition to take place cannot be formed, even for very high production rates. This region of parameter space where one solution is possible corresponds to a situation in which a buffering proteomic cloud may not form, due to a very fast destruction rate of proteins or a very small unbinding rate from the DNA. The critical number of proteins necessary for the bifurcation to occur grows with the tendency for proteins to be unbound from the DNA (X^eq), as the cloud buffering the genes needs to be bigger and exhibit smaller relative protein number fluctuations, which effectively decrease with the growth of the adiabaticity parameter. This is further discussed in terms of the probability distributions. Therefore a monostable solution exists at all values of the effective growth rate, X^ad, for larger values of ω at large X^eq than at smaller X^eq values. The bifurcation point is a result of competition between the number of reservoir repressor proteins and the tendency for proteins to be unbound from the DNA. This is clear from the dependence of the number of proteins present in the cell at the bifurcation point on the relative values of X^ad and X^eq, but not the adiabaticity parameter ω.

Mean protein numbers

The total number of proteins present in the cell, produced both in the on- and off-states, asymptotically away from the bifurcation points is the same for the deterministic and stochastic approximations, and it is given by 〈n(i)〉 = 2X^ad, when C₁(1) ≈ 1 the probability of the gene to be on is close to unity. The number of proteins of a given type present in the cell, when the gene that produces them is in the on-state, is always considerably smaller in the noisy system than in the deterministic case (Fig. 3 C). Since the production rate in the off-state was assumed zero, in the deterministic case no proteins of a given type are present in the cell if the gene is in the off-state, unlike in the noisy system. Therefore the number of proteins in the deterministic system is nonzero only if the gene is on. But interaction of the DNA binding state with the proteins buffering it results in a residual number of proteins present in the off-state for all values of ω. The region of bistability of the switch in parameter space grows as the binding rate increases with respect to the unbinding rate, stabilizing the DNA binding states. As the susceptibility of the system to fluctuations increases, the deterministic equations prove to be a poor approximation to describe the state of the system.

Gene-buffering proteomic cloud interactions

The stochastic nature of the system also manifests itself at the DNA level (Fig. 2). As the tendency for proteins to be unbound from the DNA grows, the area of parameter space wherein multiple solutions are possible decreases—since a larger number of proteins is needed to reach a state in which two genes are more likely to be repressed (protein bound state) than at small X^eq. For small unbinding rates or large binding rates, regardless of the ratio of the rate of unbinding of repressors from the DNA to protein degradation, bistability requires smaller numbers of proteins, which correspond to larger relative fluctuations, than for large X^eq. Therefore a larger unbinding rate relative to the binding rate makes the system more susceptible to protein number noise. Competition between X^eq and 〈n(i)〉 results in X^eq, for a given null cline, being a parabolic function of X^ad, for the dimer binding case, with coefficients determined by ω and C₁(i). This is easily generalized to higher order functions for higher order (p) oligomers, and results in p-order dependence. The switching region, by which we mean that the region of parameter space between the bifurcation point and ΔC > 0.9 decreases as the binding and unbinding rates become comparable (X^eq decreases). As discussed above, the probability of the genes to be on at the bifurcation point tends to 0.5 as the adiabaticity parameter grows (Fig. 3), therefore the probability to be on has to increase by a smaller ΔC to reach C₁(i) = 1. Therefore the switching region decreases also as the unbinding rate from the DNA grows, since smaller effective production rates are needed to reach ΔC = 1, than for small ω. Small values of ω correspond to large fluctuations in the DNA binding state as well as the protein number state, and result in destabilizing the gene-buffering protein cloud interactions. Hence very large effective production rates are needed for ΔC > 0.9. Therefore the DNA unbinding rate must become considerably faster compared to the protein degradation rate for the switch to have two stable solutions in a large region of parameter space.

The probability distributions

A better understanding of the bifurcation can be gained from examining the probability distributions. Fig. 4, A and B, and Fig. 4, C and D, show the evolution of the probability distributions of gene 1 and gene 2, respectively, to be on and off as functions of X^ad. The peak of the distribution decreases and the width spreads out as the control parameter grows, until it reaches the bifurcation point at X^ad = 44. Then the value of the probability function corresponding to the most probable number of proteins grows again. The spread of the functions grows as the effective production rate in the on-state increases; it narrows, however, with the increase of the adiabaticity parameter, as would be expected, since the DNA state fluctuations become smaller with ω. The average number of proteins in the cell in the on-state (ΔC > 0.9) does not show a dependence on ω. Yet as the unbinding rate from the DNA becomes very fast compared to the protein number fluctuations, the system switches often between the two states, hence a large number of proteins is present even in the off-state. If the DNA unbinding rate is small, the protein number characteristics follow the DNA state having time to reach a steady state within each well, before the DNA binding site switches into the other state, so the number of proteins in the off-state falls to zero (Fig. 5, A and B). This results in a two-peak, bimodal probability distribution (Fig. 4). If ω is large, random fluctuations in the DNA state do not change the effective state of the system, since a residual high mean protein number is present even in the off-state. In such a case, lower effective production rates than for small ω result in higher protein yields, and hence smaller switching regions.

Evolution of probability distributions for the probability of the gene that will be active (on) after the bifurcation to be on (A) and *off* (B) and the gene that will be inactive (*off*) to be on (C) and *off* (D) as a function of the order parameter X^ad for a symmetric switch. The bifurcation occurs at X^ad = 44, X^eq = 1000, ω = 0.5.

Probability distributions for the gene to be in the on-state (A) and off-state (B) for a gene in the active state for different values of the adiabaticity parameter ω = 0.5, 10, 100. X^eq = 100, X^ad = δX^sw = 100. Comparison of probability distributions obtained by exactly solving the steady-state equations in the SCPF approximations with analogous Poissonian distributions (C and D). Symmetric switch, X^ad = 44, X^eq = 1000, ω = 0.5.

For small ω one might expect Poisson distributions of proteins in each of the DNA states, since the unbinding rate from the DNA is smaller than the protein degradation rate, so the proteins may reach a steady state without the DNA state changing. Hence, effectively proteins would feel the effect of only one well and be subject to a birth/death process. This is not true, however. The difference between the exact solution and a solution obtained within a Poissonian approximation to the state of the system is surprisingly large, owing to the skewed tails of these distributions. Fig. 5, C and D, compares these probability distributions with distributions for the same system if one assumes a Poissonian probability function. The distributions obtained as an exact solution within the SCPF approximation are clearly not symmetric, but exhibit long tails toward zero. Therefore, although the most probable values of the two types of distributions are similar, noise has a destructive impact on the system, resulting in a larger probability of having a smaller number of proteins in the cell than expected based on a Poissonian distribution, whose higher moments are equal to the mean. Therefore, a larger production rate is needed for one of the states to be favored as a result of noise, than that predicted from a symmetric probability distribution. The most probable number of proteins in the on-state, if the unbinding from the DNA is slow, is zero, unlike the number predicted by Poissonian distributions. The influence of noise on protein number fluctuations brings the protein-number means down, as can also be seen from Fig. 3 C. Overall, the spread of the probability distributions is large, and their characteristics for small values of the control parameters are different from those predicted by Poissonian distributions, let alone by deterministic kinetic equations; therefore the effects of stochasticity may not be neglected.

The nonzero basal effective production rate case

The above analysis concerns a switch with a zero basal production rate, so proteins were not produced in the off-state. In a number of biological systems (Ptashne and Gann, 2002) a nonzero basal production rate exists and we now turn to consider the effect of this on a symmetric switch. Fig. 6 B shows the dependence of the bifurcation curves for different values of the effective basal production rate g₂/(2k). Values <1, when the death rate is larger than the production rate, show that, for the symmetric switch, assuming the effective production rate to be zero in the off-state is a reasonable approximation. If the on-state has a positive input to the number of reservoir proteins present due to g₂/k > 1, the probability of the active gene to be on, even for very large on-state effective production levels X^ad, is <1. Hence the off-state contributes considerably to the steady-state number of proteins. The solution that corresponds to the more active of the two states may effectively be an off-state, since it has C₁(i) < 0.5, although the effective production rate in the on-state in the bifurcated region of parameter space is much larger than in the off-state (for example, the g₂/(2k) = 20 line in Fig. 6 B). As the effective basal production rate increases, a larger production rate in the on-state than for small g₂/(2k) > 1 is required to reach the critical number of proteins for the bifurcation to take place, which is given by 〈n(i)〉= 2X^adC₁(i) − g₂/k(2C₁(i) − 1). For this reason, even for the deterministic approximation at the bifurcation point, the two genes must be more probable to be off, as can also be seen for the exact SCPF solutions from the probability distributions (Fig. 7, B, C, E, and F). Fig. 6 A shows the dependence of the bifurcation curves on the adiabaticity parameter, which tend to the deterministic case for large ω. A closer analysis of the g₂/k > 1 case, since the g₂/k < 1 is analogous to the zero basal production rate case, which has already been discussed, shows that mean properties of the system are in even better agreement with the deterministic solution than the g₂ = 0 case (Fig. 7, A and D). The genes have a nonzero probability of being in the off-state, with the probability distribution of the off-gene having a long tail toward higher protein numbers (Fig. 7, E and F). In the off-state the effective production rate g₂/(2k) is small and the noise input is small, relative to the large protein numbers present in the system. The small effect of stochasticity results in the observed similar mean characteristics. Yet the form of the probability distributions for the genes to be on before the transition is especially broad, with a far smaller probability than those of the off-state (Fig. 7, B, C, E, and F). These clearly show that the two genes are more probable to be in the off-state before the bifurcation point. Therefore, although the average observables are similar for the deterministic and SCPF stochastic solutions, the predicted distributions are unusual.

Nullclines for a symmetric switch, where proteins bind as dimers, when the effective base production rate is g₂/(2k) ≠ 0. (A) Dependence on the adiabaticity parameter ω = 0.005, 0.05, 0.5, 5, and 50, compared to the deterministic equations solution, g₂/(2k) = 5. (B) Dependence on g₂/(2k) = 0.01, 0.1, 0.5, 1.0, 5, 10, and 20, ω = 0.5. X^eq = 1000.

Probability of genes to be on (A) and mean number of proteins of a given type present in the cell (D) for a symmetric switch with an effective base production rate. Evolution of probability distributions for the probability of the gene that will be active after the bifurcation to be on (B) and *off* (C) and the gene that will be inactive to be on (E) and *off* (F) as a function of the order parameter X^ad for the same system. The bifurcation occurs at X^ad = 61, g₂/(2k) = 5, ω = 0.5, X^eq = 1000.

Summary

The symmetric switch is based on a competition between the accessibility of the repressor site and the number of repressor proteins present in the cell. The bifurcation is solely a result of the nonlinearity of the system and introducing noise simply affects the region in parameter space where given states occur. The protein number fluctuations have a destructive role in determining the stability of the bifurcated solution; however, fast DNA unbinding rates can compensate for the destabilizing effect of protein number fluctuations. In this region the stochastic solution predicts similar means to the deterministic case, but the form of the probability distributions which depends on a large number of higher moments is nontrivial. It is a result of the interplay of the DNA binding and protein degradation kinetics.

THE ASYMMETRIC TOGGLE SWITCH

Most switches found in nature are not symmetric. For asymmetric switches, when proteins bind as dimers, the two genes interact, resulting in probabilities to be on, different from those imposed purely by the equilibrium between binding and unbinding. The steady-state solution is a compromise between the tendency that repressors are unbound from the initially off-gene ( Inline graphic for the forward transition, for the backward in the following discussion) and the effective production rate of the initially on-gene ( forward; backward transition), at least for the deterministic case. This results in the characteristic S-curve bifurcation diagram, as presented in, for example, Fig. 12, with possible forward and backward transitions, hence hysteresis. We refer to the transition that occurs with increasing Inline graphic as the forward transition, and that with decreasing as the backward transition. Since is a well-defined function of the probabilities that the genes are on, the simplicity of the deterministic equations allows for a completely analytic discussion of the asymmetric switch. The more complicated form of the exact SCPF equations makes this approach impossible. However, the deterministic rate solution offers valuable insight into the basic mechanism behind the transition.

Bifurcation diagrams as a function of for C₁(1), with g₂(1)/(2k) = g₂(2)/(2k) = 5 (A) and C₁(2) g₂(1)/(2k) = g₂(1)/(2k) = 0.5 (B) for 50, and 500. Comparison of exact solutions of the SCPF and deterministic kinetic equations for an asymmetric switch. ω₁ = ω₂ = 0.5, and

Inline graphic — Bifurcation diagrams as a function of for C₁(1), with g₂(1)/(2k) = g₂(2)/(2k) = 5 (A) and C₁(2) g₂(1)/(2k) = g₂(1)/(2k) = 0.5 (B) for 50, and 500. Comparison of exact solutions of the SCPF and deterministic kinetic equations for an asymmetric switch. ω₁ = ω₂ = 0.5, and

The general mechanism

By combining the steady-state equations of motion for the probabilities of the two genes to be on and noting that, with a zero basal production rate Inline graphic one can derive the form of the deterministic bifurcation curves as

(7)

as a function of C₁(2), and

(8)

as a function of C₁(1). The transition points are determined as the extrema of Eqs. 7 and 8, which are functions solely of the scaled parameter Inline graphic and are plotted on the bifurcation graphs. It is worth noticing that the bifurcation points C₁(i) do not depend on the value of the parameter describing the gene binding kinetics of the gene that is on initially. This is not true for the exact SCPF solution, which cannot be solved analytically, but the bifurcation curve has the more complex form of

(9)

where C₁(1) is a function of ω₂, Inline graphic C₁(2), and The bifurcation point is therefore determined by the protein () and DNA () characteristics and mutual interactions (ω_i) of their two genes. The deterministic approximation therefore greatly simplifies the mathematical mechanism of the transition. This may lead to large errors when studying more complicated biologically relevant systems, where one considers asymmetric switches with nonzero basal production rates and proteins are produced in bursts. The case of the nonzero basal production rate within the deterministic approximation also cannot be solved analytically.

The general picture behind the transition is seen from the deterministic approach. The larger the tendency for proteins to be unbound from the DNA, the larger the effective production rate Inline graphic must be for the transition, from one gene to be active, to the other to be active, to take place —inasmuch as repressor proteins are less likely to bind to the on-gene (i) at large than at small However, if one considers a noisy system, it is effectively harder for proteins to stay bound to the initially off-gene due to the destabilizing effect of DNA binding noise (Fig. 8). For the stochastic system, apart from very low values of the adiabaticity parameter (ω < 0.1) (Fig. 11), there is a threshold number of reservoir proteins that will cause a rapid transition. If we start with a small effective production rate for one type of protein and increase this rate, keeping the production rate of the other gene fixed at an initially higher value, the proteins produced by the gene with the initially smaller production rate repress it gradually and ineffectively, until they reduce the probability of the gene to be on to one-half, for the exact SCPF solution. The number of proteins present in the on-state decreases much more rapidly with the change of Inline graphic —whether it be an increase for the forward transition or a decrease for the backward transition in the examples presented—than the number of proteins in the off-state grows (Fig. 10). Hence, the probability of the initially active gene to be on shows a larger sensitivity to the change of Inline graphic than does the off-state probability. This leads to a rapid transition of the previously active gene to an inactive state (Fig. 9). Such behavior is described by Ptashne (1992) and Ptashne and Gann (2002) in the λ-phage switch; they point out its role as a “buffer against ordinary fluctuations in repressor concentration.” The observed system switches when the “repression probability” drops to 50%, as in the solutions of this model. Our analysis seconds the hypothesis of Ptashne and Gann, inasmuch as the deterministic system lacks this behavior, the transition is rapid, and for certain values of parameters, takes place when the probability of the initially on-gene drops to 80% (Fig. 8). The buffering capabilities of the stochastic system are clearly seen in the long tails toward n = 0 of the probability distributions of the gene that is switching from the on- to the off-state (Fig. 9, A and B).

Dependence of the probability of genes to be on in an asymmetric switch as a function of increasing parameters of one gene in the forward (*top*) and backward (*bottom*) transition for different values of : 5, 50, and 500. All other parameters fixed at and Comparison of solutions of deterministic and exact SCPF equations.

Bifurcation diagrams for an asymmetric switch, presenting as a function of C₁(2) (A–C), and C₁(1) (D–F) for different values of the adiabaticity parameter: ω₁ = ω₂ (A,D), ω₂, with ω₁ = 0.001 = *const* (B,D), ω₁, with ω₂ = 0.001 = *const* (C,F). and

Mean number of proteins of each type present in the cell, according to exact solutions of the SCPF approximation and deterministic kinetic rate equations for an asymmetric switch, with ω₁ = ω₂ = 0.5, and and 500 during the forward (A) and backward (B) transitions in an asymmetric switch.

Evolution of the probability distributions for the two genes to be active for the forward transition (A and B) and the backward (C and D) as a function of for with ω₁ = ω₂ = 0.5; for an asymmetric switch.

The effect of noise on the bifurcation mechanism

The mean number of proteins at the transition point differs for the deterministic and exact SCPF solution (Fig. 10). More repressors are needed to induce the transition in the deterministic approximation than in the stochastic system, since, due to the form of the interaction function for the exact case, F(i) = 〈n(i)〉²(ω + 1)/(ω + C₁(i)) + 〈n(i)〉 > 〈n(i)〉². A smaller number of proteins is therefore needed for the inactive gene to become competitive with the active gene. The mechanism of the transition is different from the symmetric gene case, where a critical number of proteins needs to be reached. The asymmetric switch is based on the competition between the probability that proteins of one kind will repress the opposing genes and the analogous probability for the other kind of proteins. The repression capability is governed by Inline graphic which might be looked upon as the product of the probability of having a certain number of repressor proteins (3–i) in the cell and the tendency for them to be bound to the opposing gene (i). In fact, the transition point in the deterministic case is purely a function of such ratios, Inline graphic In both the stochastic and deterministic cases, the transition points are set by the interaction function which regulates the on- and off-state probabilities of a given gene Inclusion of noise in the system effectively increases the nonlinearity of the system, which results in the already discussed buffering capabilities of the system. Stochasticity alters the very simple competitive mechanism seen in the deterministic kinetics to allow for more levels of control of the stability of the state of the system against random fluctuations.

Further comparison of solutions of the deterministic and stochastic equations leads to the same conclusions as for a symmetric switch. As the tendency for proteins to be unbound from the DNA grows, the difference in the critical number of reservoir proteins necessary for the transition to take place increases for both approximations. The critical number of proteins produced by a given gene necessary for the transition to take place for both genes is, in most cases (see ω dependence discussion), smaller for the exact solutions of the SCPF equations and the difference between the stochastic and deterministic result grows with both Inline graphic and decreases with ω_i (Fig. 10). It has a value of 15 for ω₁ = ω₂ = 0.5 and 2 for ω₁ = ω₂ = 10.

Consider the forward transition. The initially inactive gene is buffered by a cloud of repressor proteins. As one increases the effective production rate of the proteins produced by the inactive gene ( Inline graphic ), the number of proteins that are able to repress gene 2 grow slowly and linearly where C₁(1) ∼const, and form a buffering proteomic cloud around it. In the results presented in the figures of this article, the tendency that proteins are unbound from gene 2, (), is smaller than Inline graphic so gene 1 is able to produce enough repressors to form a stable buffering cloud around gene 2 and turn it into the inactive state at quite modest values of If gene 1 produces proteins less effectively, as the probability of it being repressed is larger than in the previous case, and larger values of Inline graphic are needed to produce enough repressors to achieve a high effective probability of binding, An example of how grows as is seen by comparing the for (Fig. 8) and for (Fig. 11).