Abstract
We chemically characterize the symmetries underlying the exact solutions of a stochastic negatively self-regulating gene. The breaking of symmetry at a low molecular number causes three effects. Two branches of the solution exist, having high and low switching rates, such that the low switching rate branch approaches deterministic behavior and the high switching rate branch exhibits sub-Fano behavior. The average protein number differs from the deterministically expected value. Bimodal probability distributions appear as the protein number becomes a readout of the ON/OFF state of the gene.
Symmetries, described by Lie algebras, have been central tools for new discoveries in quantum mechanics and quantum field theory.1–3 Applications of these techniques to problems in statistical physics have been limited.4,5 Over the last two decades, statistical physics has been widely applied to biological systems.6 Such studies frequently make use of the chemical master equation (CME), typically solved by Gillespie’s direct simulation method.7 This method requires reconstructing probability distributions experimentally from repeated computational runs, a procedure that can overlook important features of the distributions. In a case where exact solutions were obtained to the CME for a self-repressing gene8 in terms of generating functions with a symmetry described by a Lie algebra,9 physical insight into the general behavior of the system was limited by the dimensionality of the parameter space and the lack of physical interpretation of the symmetry. In this paper, we show that this symmetry has three important physical effects. First, an invariant quantity under the group’s action is a quadratic function of a certain ratio of protein removal rates. The two roots of that quadratic induce two branches of the solution. One branch approaches a deterministic regime as the molecular number increases, while the other represents a novel class of stochastic behavior. Second, symmetry in the form of numerical equivalence between the average number of molecules in the system in the deterministic and stochastic regimes is lost when the molecular number becomes sufficiently small. Finally, actions of the group that leave the system unchanged in the deterministic limit have two completely distinct effects in the stochastic regime depending on whether the system can be defined by a Langevin approximation or not. 
These physical manifestations of the underlying symmetry also provide a systematic characterization of the model’s behavior in the entire parameter space of the exact solutions. This mathematical characterization has been previously reported by use of the Poisson representation.14,15 The analysis of symmetries reveals new phenomenology in a fundamental physical system for investigating gene networks. The use of group theoretical techniques on generating functions represents a new class of applications of this technique for related problems involving the CME10,11 and other applications.12,13 It has provided us with a way to identify the building blocks necessary to understand the working of randomness and invariance in biological systems.
We conceive a deterministic model for a negative self-regulating gene as an ensemble of genes (operators, in the case of a prokaryote) at concentration [OT]. The operators may be in the ON or OFF state if they are, respectively, unbound or bound to the regulatory protein. The concentration of ON (OFF) operators is indicated by [O] ([OP]), with [OT] = [O] + [OP], and the protein concentration is given by [P]. The macroscopic reaction scheme is given by
| (1) |
All macroscopic rate constants are written with hats. k̂, ρ̂, and f̂ each have units of minute⁻¹, while ĥ has units of liter/minute. Equation (1) implies that
We denote the steady state concentrations of [O], [P], and [OP] by , , and . Then
| (2) |
| (3) |
where K, the equilibrium affinity, is given by liters and . Equation (2) indicates that the rate at which operators move from OFF to ON and ON to OFF is, respectively, given by and . Thus, the total rate of operator switching in both directions is . Note that the expected concentration of proteins is given by the product of the ratio between the protein synthesis and degradation rates and the concentration of ON operators. Hence, at the limit of small affinity of the repressor for the operator (K → 0), , the expected number of proteins in the absence of regulation.
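The deterministic steady state described above can be sketched numerically. This is a minimal sketch, not the paper's own formulation: it assumes the binding equilibrium [OP] = K[O][P], the conservation law [OT] = [O] + [OP], and a protein concentration equal to (k̂/ρ̂)[O], which together give a quadratic in the protein concentration. The function name and argument names are hypothetical.

```python
import math


def deterministic_steady_state(k_hat, rho_hat, K, O_T):
    """Return steady-state (ON operators, protein, OFF operators).

    Assumptions (inferred from the text, not quoted from it):
      [OP] = K [O][P], [OT] = [O] + [OP], [P] = (k_hat / rho_hat) [O].
    Substituting [O] = [OT] / (1 + K [P]) gives
      K P^2 + P - (k_hat / rho_hat) [OT] = 0.
    """
    ratio = k_hat / rho_hat
    disc = math.sqrt(1.0 + 4.0 * K * ratio * O_T)
    # Positive root written in a form that stays finite as K -> 0.
    P = 2.0 * ratio * O_T / (1.0 + disc)
    O_on = O_T / (1.0 + K * P)
    return O_on, P, O_T - O_on
```

In the limit K → 0 the sketch returns a protein concentration of (k̂/ρ̂)[OT], the unregulated value mentioned in the text.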
A stochastic model for the negative self-regulating gene has been proposed in terms of two random variables, the protein number, denoted by n, and the operator state, which can be ON or OFF.8 The steady state probability of finding n proteins and the operator ON or OFF is denoted by αn or βn, respectively. In the stochastic model, we replace the reaction rate constants of Eq. (1) by propensities represented by the unhatted symbols k = k̂, ρ = ρ̂, f = f̂, and h = Vĥ, where V is the system volume.7 Note that now we may consider a single gene instead of an ensemble and the proportion of ON operators of the deterministic model becomes the marginal probability of finding the operator ON, Pα = Σn αn. The marginal probability of finding n proteins in the cytoplasm independent of the operator state being ON or OFF is given by ϕn = αn + βn and is computed in terms of the KummerM functions so that
where (x)n denotes the Pochhammer symbol defined by (x)n = x(x + 1) ⋯ (x + n − 1) and (x)0 = 1, and
| N = k/ρ, z0 = ρ/(ρ + h), a = f/ρ, b = [f + hk/(ρ + h)]/(ρ + h) | (4) |
N is the average number of proteins at the steady state regime if the operator is fully ON. z0 gives the proportion of protein removal from the cytoplasm by first order decay. a is the ratio of the OFF to ON transition rate to the protein degradation rate. a ≫ 1 (a ≪ 1) indicates a regime where, on average, the OFF operator switches back to the ON state faster (slower) than the time required for protein degradation. b gives the ratio of the operator switching to the protein removal rates. The operator switching rate is the sum of the average OFF to ON switching rate f and the ON to OFF rate defined in analogy with the deterministic case to be hk/(ρ + h). For b ≫ 1, the operator switches multiple times between the ON and OFF states during the average time for protein removal. In that case, the probability distributions for the protein number are unimodal. On average, for b ≈ 1 or smaller, the operator takes longer to switch from ON to OFF to ON (or vice versa) than the average protein removal time. In that case, and for a ≈ hk/(ρ + h), the probability distributions for the protein number are bimodal because most of the proteins synthesized when the operator is ON decay before the operator switches OFF. The average protein number of the distribution is given by
which can be written as ⟨n⟩ = NPα, with Pα being the stochastic equivalent of the proportion of ON operators in Eq. (3).
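The four dimensionless parameters can be computed from the propensities as in the sketch below. It assumes the readings N = k/ρ, z0 = ρ/(ρ + h), a = f/ρ, and b = [f + hk/(ρ + h)]/(ρ + h), which follow the verbal definitions given above; the function name is hypothetical.

```python
def dimensionless_parameters(k, rho, f, h):
    """Map propensities (k, rho, f, h) to (N, z0, a, b).

    Assumed definitions, taken from the verbal description in the text:
      N  = k / rho            mean protein number with the operator always ON
      z0 = rho / (rho + h)    share of protein removal due to first order decay
      a  = f / rho            OFF-to-ON rate relative to protein degradation
      b  = switching rate / removal rate, with switching rate f + h k / (rho + h)
    """
    removal = rho + h
    N = k / rho
    z0 = rho / removal
    a = f / rho
    b = (f + h * k / removal) / removal
    return N, z0, a, b
```

With these definitions b equals a·z0 + N·z0·(1 − z0) identically, which is the form of the invariant used later in the text.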
This stochastic model is a combination of two stochastic processes and hence approaches equilibrium at the two rates ρ and b(ρ + h), the former related to protein degradation and the latter to operator switching.16 The smaller of these two rates determines when the system reaches equilibrium. The time dependent solutions in terms of generating functions have the form
| (5) |
where j is a non-negative integer and and are confluent Heun functions.16,17 These solutions are obtained applying the separability ansatz whose z component obeys a second-order ordinary differential equation having two regular poles, one around z = 1, giving , and the other around z = z0, giving . It is evident that the only steady state solutions in Eq. (5) have vanishing time dependent exponents, implying the selection of j = 0 and . can then be written as a KummerM function so that
| (6) |
is the generating function of the probabilities ϕn.
These exact solutions of the steady state stochastic model indicated the existence of symmetries.9 The generating functions ϕ in Eq. (6) span irreducible representations of so(2, 1), which in the Cartan basis has its operators denoted by H and E±. The Casimir operator is defined as C = −H² + H + E+E− and the commutation relations are
The action of the algebraic operators on the generating functions ϕb,a is
| (7) |
| (8) |
| (9) |
The invariant of the algebra is determined by the eigenvalue of the Casimir operator and Eq. (7) implies that b is constant. The Cartan operator’s eigenvalue in Eq. (8) determines the OFF to ON switching rate in relation to the protein degradation rate, while the ladder operators change the value of a by one.
We start building the biological interpretation of the symmetries of the model by writing its invariant as b = az0 + Nz0(1 − z0). A fixed b leads to a 3D locus embedded in a 4D space. For fixed values of N, we obtain two possible values for z0, given by
| z0± = [a + N ± √((a + N)² − 4Nb)]/(2N) | (10) |
Figure 1(a) shows the possible values of z0 as functions of a, with z0+ and z0− denoting the larger and smaller roots in Eq. (10). a ≥ b implies z0+ ≥ 1, which is biologically meaningless, so only z0− has acceptable values [Eq. (4)]. For a given a ≤ b, the dynamical regime of the system is degenerate and two values of z0 distinguish those regimes in terms of the ON to OFF operator state transition. The first regime, (z0−), has strong self-repression (high value of h) and a low steady state protein number. The second regime, (z0+), is characterized by a high steady state protein number and weak self-repression (low value of h).
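The two branches can be checked numerically by solving the invariant b = az0 + Nz0(1 − z0) for z0, that is, the quadratic Nz0² − (a + N)z0 + b = 0. The sketch below is illustrative only; the function names are hypothetical, and admissibility is taken to mean 0 < z0 ≤ 1.

```python
import math


def z0_branches(a, b, N):
    """Return (z0_minus, z0_plus), the roots of N z0^2 - (a + N) z0 + b = 0."""
    disc = (a + N) ** 2 - 4.0 * N * b
    if disc < 0.0:
        raise ValueError("no real z0 for these (a, b, N)")
    root = math.sqrt(disc)
    return (a + N - root) / (2.0 * N), (a + N + root) / (2.0 * N)


def is_admissible(z0):
    """A fraction of removal by first order decay must lie in (0, 1]."""
    return 0.0 < z0 <= 1.0


# (N, b) = (150, 25) and a = 8.3, the values quoted for P1 and P2 in Fig. 1
z_minus, z_plus = z0_branches(8.3, 25.0, 150.0)
print(round(z_minus, 2), round(z_plus, 2))  # 0.19 0.86
```

For a = 50 > b the larger root exceeds 1, so only the smaller root passes `is_admissible`.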
FIG. 1.
(a) and (b) show z0 and ⟨n⟩, respectively, as functions of a for fixed values of N as indicated by the keys in (a) and (b). Dashed-dotted (dashed) lines correspond to (). The vertical black line at a = 25 separates the sub-Fano and super-Fano noise regimes of the steady state probability distribution. In graph (a), the tan solid line indicates , and P1 and P2 show the (a, z0) values for two distributions shown in (c). (b) shows the dependence of ⟨n⟩ on a. (c) shows steady state probability distributions with (N, b) = (150, 25). (a, z0) for each distribution are P1 = (8.3, 0.86), P2 = (8.3, 0.19), P3 = (2.2832, 0.81), and P4 = (50, 0.14) with z0 calculated from Eq. (10), where P1, P3 (or P2, P4) were calculated with the larger root z0+ (or the smaller root z0−). The Fano factor of each probability distribution is indicated by F.
Figure 1(b) shows a further consequence of this degeneracy on the average protein number. For sufficiently low ⟨n⟩, one has two possible values of a and , both characterized by the same value of b. Those values indicate two regimes of operator switching, with lower (or higher) values for the switching rates f and h, that is, slow (or fast) switching. For the specific condition when one regime has a > b and the other has a < b, the noise on the protein numbers is characterized, respectively, as sub-Fano and super-Fano.
The stochastic model exhibits splitting between the deterministic and stochastic solutions to the dynamics of the negative self-regulating gene when the average protein numbers are low. Figure 2(a) shows a comparison between the steady state concentration of proteins predicted by the deterministic model in Eq. (3) and the average protein number as given by the stochastic model. For high steady state protein numbers, there is good agreement between the values predicted by the stochastic and deterministic approaches. As the steady state number of proteins decreases, discrepancies between the two approaches start to appear as f → 0. For sufficiently high h, the probability for the gene being OFF increases, and when f becomes very small, the stochastic and deterministic solutions diverge. The correspondence principle breaks down when the molecular number is extremely small. Figure 2(a) shows that for K = 10¹⁹, large values of f cause the protein number to approach 1. This is a consequence of the fact that a protein bound to the operator does not decay in the reaction scheme given in Eq. (1), and we have shown elsewhere that this case can give rise to Fano factors arbitrarily close to zero.18
FIG. 2.
(a) shows a comparison of the expectation of the steady state protein number of the stochastic model (dashed lines) and the protein number from the deterministic model (dashed-dotted lines) as a function of the parameter f. In the deterministic limit, , and we set V = 10⁻¹⁵ liters, the volume of a bacterial cell. K, in units of liters, is shown in the key. Synthesis and degradation rates are k = 500 and ρ = 1. (b) shows the distributions when ⟨n⟩ and are comparable. The relative error E is given by . The key shows the distributions by color and E and the Fano factor F are given for each distribution. The parameters (N, b) = (500, 0.1), and (a, K) are P5 = (0.06, 1.3 × 10¹²), P6 = (0.08, 5 × 10¹¹), P7 = (0.09, 2 × 10¹¹), and P8 = (0.099, 2 × 10¹⁰). The probabilities of finding up to 400 proteins (or more than 400) for curves P5, P6, P7, and P8, are approximately 0.42 (or 0.58), 0.22 (or 0.78), 0.11 (or 0.89), and 0.01 (or 0.99), respectively.
A third type of splitting arises from the following. The parameter a is the eigenvalue of the Cartan operator and gives the OFF to ON transition rate [see Eq. (8)]. The action of the ladder operators on the probability generating function ϕb,a changes the value of a by one [Eq. (9)] and connects probability distributions in which b values are the same and a values differ by an integer. The action of the raising operator changes to , and the action of E− is constructed by analogy. Let us assume that the action of E+ only changes f; hence, ρ′ = ρ and f′ = f + ρ. b remains unchanged under the action of E+; hence, the remaining constants N and z0 change. For a fixed value of z0, one has . For a fixed value of N, we consider that with . The increment of (or ) corresponds to a decrease (or increase) of the value of h that implies an increase of the mean protein number [see Fig. 3(b)].
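The fixed-z0 case can be worked out directly from the invariant b = az0 + Nz0(1 − z0). The following is a sketch derived from that expression alone; the primed symbol N′ for the value of N after the action of E+ is notation introduced here for illustration, and 0 < z0 < 1 is assumed.

```latex
\begin{align}
b &= a z_0 + N z_0 (1 - z_0) = (a + 1) z_0 + N' z_0 (1 - z_0) \\
\Rightarrow\quad N' &= N - \frac{1}{1 - z_0}, \qquad 0 < z_0 < 1 .
\end{align}
```

Under this reading, raising a by one at fixed b and z0 lowers N by 1/(1 − z0), so the shift in N grows without bound as z0 approaches 1.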
FIG. 3.
(a) and (b) show probability distributions obtained with while was used to construct graph (c). Approximate values of were obtained from Eq. (10). Graph (a) has in L1 = (0.07, 0.99), L2 = (0.099, 0.99), and L3 = (0.09, 0.99). Graph (b) has in L4 = (6, 0.71), L5 = (15, 0.86), and L6 = (24, 0.99). Graph (c) has in L7 = (6, 0.35), L8 = (15, 0.29), L9 = (100, 0.13), and L10 = (150, 0.10).
The dynamics of the gene expression process may have two distinct characteristics, depending on the value of b. For b ≫ 1, the dominant decay rate to equilibrium is ρ and the changes of the value of h are not sufficient to cause changes in the time for the system to approach equilibrium, b(ρ + h). This regime coincides with a unimodal probability distribution and the action of the raising operator on the generating functions causes the mode of its probability distribution to be displaced to the right. For the case of b(ρ + h) ≪ ρ (or b ≪ z0), b(ρ + h) is the dominant decay rate and the increase (or decrease) of h corresponds to the system reaching equilibrium earlier (or later). This regime is characterized by probability distributions that may become bimodal and the action of the raising operator corresponds to an increase of the maximum probability of finding n equal to the higher mode [see Figs. 3(a) and 2(b)].
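The comparison of the two relaxation rates can be sketched as follows. This is an illustrative sketch, assuming the operator switching rate is b(ρ + h) = f + hk/(ρ + h) as described earlier; the function names and the returned labels are hypothetical.

```python
def relaxation_rates(k, rho, f, h):
    """Return (protein decay rate, operator switching rate).

    The switching rate f + h k / (rho + h) equals b (rho + h) under the
    assumed definition of b.
    """
    switching = f + h * k / (rho + h)
    return rho, switching


def limiting_process(k, rho, f, h):
    """Name the slower process, which sets the approach to equilibrium."""
    decay, switching = relaxation_rates(k, rho, f, h)
    if decay <= switching:
        return "protein degradation"
    return "operator switching"


# Slow switching (b of about 0.1), close to the parameters quoted for P5 in Fig. 2(b)
print(limiting_process(500.0, 1.0, 0.06, 8e-5))  # operator switching
```

When the function returns "operator switching", the system is in the slow switching regime in which, according to the text, the distributions may become bimodal.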
The regime with bimodal distributions has important experimental and theoretical consequences. In this regime, most of the proteins synthesized by the gene during the ON state are degraded before it switches back to the OFF state, and the remaining proteins degrade before the gene switches ON, giving rise to bimodal distributions of n which have been experimentally observed.19 In that case, the assumptions underlying the Langevin approach fail20 because the number of proteins at the steady state regime of the Langevin equation is governed by distributions that are Gaussian around . The probability distributions in this case and the breakdown of the Langevin regime are shown in Fig. 2(b).
We began our treatment by considering the macroscopic system because the master equation solution applies to cases with any number of molecules. This point is demonstrated in Fig. 2(a) which shows that increasing the equilibrium binding affinity reduces the deterministic equilibrium concentration, as expected. For fixed K, reducing f requires reducing h. Figure 2(a) shows that there is a symmetry breaking as the average protein number in the deterministic model splits from that given by the stochastic model, and moreover that the average number of proteins in the stochastic model is a function of f even when K is held constant, behavior never seen in the deterministic model. Although the deterministic correspondence principle holds for small numbers of molecules (≈10), correspondence is lost at one repressor molecule per cell, as discussed above.
The kinetic symmetries fully manifest themselves in the macroscopic case. The invariant of the algebra, b, is the ratio between the switching rate and the protein removal rate. Since the invariant is quadratic in h, there exist two kinetic regimes for the same value of b. The first has protein removal predominantly because of protein destruction (for example, when ρ ≫ h) while protein binding prevails in the second. These regimes are macroscopically indistinguishable in the presence of a thermodynamically large number of operator sites. This fully macroscopic picture in fact never occurs in a biological system because the molecular number of operator sites per cell is small. In the “semimacroscopic” case of many protein molecules and a small number of operator sites, corresponding to the z0+ branch (the larger root of Eq. (10)) in Figs. 1(a) and 1(b), protein removal takes place primarily by first order decay. This super-Fano regime approaches the solutions of a near equilibrium thermodynamic system as the molecular number increases. In the z0− branch (the smaller root), protein removal takes place primarily by binding, the operator becomes strongly repressed, and sub-Fano behavior results, a situation we have discussed in detail elsewhere.18
A further symmetry breaking manifests itself for certain values of b with respect to the protein number distribution when the number of operators is small. For the case of b < 1, the gene switching is slow in comparison with the protein removal rate; hence, the probability distributions for the protein number when the gene is ON (or OFF) are split and bimodal probability distributions are observed [see Figs. 3(a) and 2(b)]. When the operator number is large, these differences in ON and OFF states would average out in the reaction mixture and become unobservable. Here, the actual biological regime of a small gene number per cell is experimentally significant because it permits direct observation of stochastic switching between ON and OFF in living cells.19 When the gene switching is fast in comparison with the protein removal rate (b > 1), the distributions are unimodal and the existence of the two gene states cannot be established by the measurement of protein numbers, even with a low gene copy number [see Fig. 3(b)].
In conclusion, we have made use of symmetries described by a Lie algebra to fully characterize the behavior of a self-repressing gene. Because the exact solutions represent the behavior of the system for any number of reacting molecules and all values of kinetic constants, we interpret the deviation from deterministic behavior, the splitting of the two branches of z0, and the emergence of bimodal protein distributions as different types of symmetry breaking. The role the symmetries play in this analysis differs from how they are used in quantum problems, where symmetries involve the quantum state directly. Deeper insight into the role of symmetries will be helpful not only to statistical physics but also to other areas involving stochastic processes, including biological evolution.12,13
Acknowledgments
A.F.R. was supported by CAPES. J.R. was supported by Grant No. NIH R01 OD010936. We thank UnJin Lee and David H. Sharp for helpful comments and J. E. M. Hornos for stimulating discussion. A.F.R. thanks the Tinker Foundation and the Tinker Visiting Professor program at the University of Chicago for support.
REFERENCES
- 1. Anderson P. W., “Coherent excited states in the theory of superconductivity: Gauge invariance and the Meissner effect,” Phys. Rev. 110, 827–835 (1958). doi:10.1103/physrev.110.827
- 2. Gell-Mann M. and Ne’eman Y., The Eightfold Way (Benjamin, New York, 1964).
- 3. Higgs P. W., “Broken symmetries and the masses of gauge bosons,” Phys. Rev. Lett. 13, 508 (1964). doi:10.1103/physrevlett.13.508
- 4. Cannavacciuolo L. and Hulliger J., “Polarity formation in molecular crystals as a symmetry breaking effect,” Symmetry 8, 10 (2016). doi:10.3390/sym8030010
- 5. Tamm M., “A combinatorial approach to time asymmetry,” Symmetry 8, 11 (2016). doi:10.3390/sym8030011
- 6. Arkin A., Ross J., and MacAdams H. H., “Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells,” Genetics 149, 1633–1648 (1998).
- 7. Gillespie D. T., “Exact stochastic simulation of coupled chemical reactions,” J. Phys. Chem. 81, 2340–2361 (1977). doi:10.1021/j100540a008
- 8. Hornos J. E. M., Schultz D., Innocentini G. C. P., Wang J. et al., “Self-regulating gene: An exact solution,” Phys. Rev. E 72, 051907 (2005). doi:10.1103/physreve.72.051907
- 9. Ramos A. F. and Hornos J. E. M., “Symmetry and stochastic gene regulation,” Phys. Rev. Lett. 99, 108103 (2007). doi:10.1103/physrevlett.99.108103
- 10. Innocentini G. C. P. and Hornos J. E. M., “Modeling stochastic gene expression under repression,” J. Math. Biol. 55(3), 413–431 (2007). doi:10.1007/s00285-007-0090-x
- 11. Lepzelter D. and Wang J., “Exact probabilistic solution of spatial-dependent stochastics and associated spatial potential landscape for the bicoid protein,” Phys. Rev. E 77, 041917 (2008). doi:10.1103/physreve.77.041917
- 12. Hornos J. E. M. and Hornos Y. M. M., “Algebraic model for the evolution of the genetic code,” Phys. Rev. Lett. 71(26), 4401–4404 (1993). doi:10.1103/physrevlett.71.4401
- 13. Bashford J. D., Jarvis P. D., Sumner J. G., and Steel M. A., “U(1) × U(1) × U(1) symmetry of the Kimura 3ST model and phylogenetic branching processes,” J. Phys. A: Math. Gen. 37, L81–L89 (2004). doi:10.1088/0305-4470/37/8/l01
- 14. Gardiner C. W. and Chaturvedi S., “The Poisson representation. I. A new technique for chemical master equations,” J. Stat. Phys. 17(6), 429–468 (1977). doi:10.1007/bf01014349
- 15. Iyer-Biswas S. and Jayaprakash C., “Mixed Poisson distributions in exact solutions of stochastic autoregulation models,” Phys. Rev. E 90, 052712 (2014). doi:10.1103/physreve.90.052712
- 16. Ramos A. F., Innocentini G. C. P., and Hornos J. E. M., “Exact time-dependent solutions for a self-regulating gene,” Phys. Rev. E 83, 062902 (2011). doi:10.1103/physreve.83.062902
- 17. Ronveaux A., Heun’s Differential Equations (Oxford University Press, 1995).
- 18. Ramos A. F., Hornos J. E. M., and Reinitz J., “Gene regulation and noise reduction by coupling of stochastic processes,” Phys. Rev. E 91, 020701(R) (2015). doi:10.1103/physreve.91.020701
- 19. Suter D. M., Molina N., Garfield D., Schneider K., Schibler U., and Naef F., “Mammalian genes are transcribed with widely different bursting kinetics,” Science 332, 472–474 (2011). doi:10.1126/science.1198817
- 20. Gillespie D. T., “The chemical Langevin equation,” J. Chem. Phys. 113, 297–306 (2000). doi:10.1063/1.481811



