Proceedings of the National Academy of Sciences of the United States of America. 1997 Feb 4;94(3):814–819. doi: 10.1073/pnas.94.3.814

Stochastic mechanisms in gene expression

Harley H McAdams *, Adam Arkin
PMCID: PMC19596  PMID: 9023339

Abstract

In cellular regulatory networks, genetic activity is controlled by molecular signals that determine when and how often a given gene is transcribed. In genetically controlled pathways, the protein product encoded by one gene often regulates expression of other genes. The time delay, after activation of the first promoter, to reach an effective level to control the next promoter depends on the rate of protein accumulation. We have analyzed the chemical reactions controlling transcript initiation and translation termination in a single such “genetically coupled” link as a precursor to modeling networks constructed from many such links. Simulation of the processes of gene expression shows that proteins are produced from an activated promoter in short bursts of variable numbers of proteins that occur at random time intervals. As a result, there can be large differences in the time between successive events in regulatory cascades across a cell population. In addition, the random pattern of expression of competitive effectors can produce probabilistic outcomes in switching mechanisms that select between alternative regulatory paths. The result can be a partitioning of the cell population into different phenotypes as the cells follow different paths. There are numerous unexplained examples of phenotypic variations in isogenic populations of both prokaryotic and eukaryotic cells that may be the result of these stochastic gene expression mechanisms.

Keywords: prokaryotic genetics, transcriptional regulation, simulation of genetic regulation, stochastic behavior


In all organisms, networks of coupled biochemical reactions and feedback signals organize developmental pathways, metabolism, and progression through the cell cycle. For example, overall coordination of the cell cycle results from an overarching set of dependent pathways in which the initiation of late events is dependent on the earlier events and the whole operates as a form of biochemical machine. Within these regulatory networks, genetic activity is controlled by molecular signals that determine when and how often a given gene is transcribed. Additional signals stimulated by environmental influences or by signals from other cells can affect the ongoing reactions to influence the future course of cellular events. Since a regulatory protein may act in combination with other signals to control many other genes, complex branching networks of interactions are possible. In these nets, one regulatory protein can control genes that produce other regulators, that in turn control still other genes.

How long does it take for these messages and controlling influences to move through a regulatory cascade? In biochemical regulatory networks, the time intervals between successive events are determined by the inevitable delays while signal molecule concentrations either accumulate or decline. Genetically coupled links are links where the protein product encoded by one gene regulates expression of other genes. The time delay in genetically coupled links (Fig. 1) depends on the time required for protein concentration growth, after promoter activation, to the concentration range that controls the next level in the cascade. Conversely, the time delay after the controlling promoter turns off depends on the time for the protein concentration to decay below the effective range. Fig. 1B shows a common architecture for such genetically coupled links. In these links, for appropriate combinations of input signals, transcripts are initiated and the protein product accumulates when production exceeds degradation; the increasing protein concentration simply broadcasts the information that the promoter is “on.” The message is “received” or detected by the concentration-dependent response at the protein signal’s site(s) of action, stimulating a response at each site in accord with that site’s chemical behavior. (We use the term “protein signal” to mean the regulatory protein concentration in its effective form at its site of action.)

Figure 1.


(A) A common coupled-reaction architecture for transmission of information or control in one link in a genetically coupled regulatory cascade. The promoter controls the transcript initiation rate. Each transcript leads to a pulse of protein production from downstream genes. Signal concentration at any time is determined by the cumulation over time of protein production and losses. The concentration of the effective form of a signal protein is sensed and responded to at its site(s) of action. The active form of protein signals is commonly a multimer; we assume a dimer here. Duplicate operator sites binding the same protein are also a common motif [true of 43% of 76 repressible promoters known in 1991 (1)]. P, Pi, proteins; PRx, promoter for protein x. (B) A representative autoregulating prokaryotic genetic circuit where the protein product controls its promoter. Autoregulation often serves to stabilize protein concentrations in a range that establishes sustained activation (or repression) of several controlled promoters.

In this paper we examine the properties of a single genetically coupled link as a precursor to modeling networks constructed from many such links. Specifically, we ask what determines the time required for protein concentration to grow to effective signaling levels after a promoter is activated and how statistical variations in this time can affect observed cellular phenomena across a cell population. It has been proposed that the pattern of protein concentration growth is stochastic, exhibiting short bursts of variable numbers of proteins at varying time intervals (2, 3). (Herein the term “stochastic” is used in the statistical sense of resulting from a random process.) We formalize and quantify this notion of randomness in genetic regulatory mechanisms by explicitly characterizing the statistics of the random processes implicit in the chemical reactions (4). By analogy to electrical circuits, we will refer to this time interval between the switching on of the first promoter and activation or repression of the second promoter as a “switching delay.” There is also a switching delay of a different magnitude for the inverse functions when the controlling promoter is switched off. We are neglecting here the case where multiple molecules act combinatorially to determine the controlling action.

Then, as a concrete illustration of switching delays over a genetically coupled link, we simulate a representative link using parameters characteristic of links in bacterial regulatory networks. The simulation results show that short-term fluctuations in protein production can be large relative to signal thresholds that control expression of critical genes. For the same link in different cells of the same genotype, there will be wide random variations both in the time to produce a given protein concentration and in the number of proteins produced when the promoter is transiently activated. Implications of this noisy pattern of gene expression for cellular regulation include: (i) the switching delay for genetically coupled links, hence the time for the cell to execute cascaded functions, can vary widely across isogenic cells in a population; (ii) the overall regulatory circuit design is probably strongly driven by the needed determinism in outcome for circuits constructed from these highly noisy components; and (iii) stochastic simulation techniques must be used to model regulated networks where high noise levels in parts of the network can produce statistical variation in phenotypic outcomes.

There are numerous unexplained observations of phenotypic variation in isogenic or clonal populations. The origin of the randomness is poorly understood; we suggest that it may be a consequence of the stochastic mechanisms in gene expression described here. One example is the distinctive individual chemotactic responses observed in clonal bacterial cells grown in homogeneous conditions that persist over the cell lifetimes (5, 6). A second example is phase variation in Escherichia coli expression of type 1 pili in isogenic bacterial populations (7–10). A third example is the biochemical mechanism leading to the distribution of generation times of cells in growing E. coli cultures. The observed coefficient of variation of generation times is around 0.22 (11–13). One consequence of these differing times between cell divisions is progressive desynchronization of initially synchronized cell populations. Within a single cell, random variations in duration of events in each cell-cycle controlling path will lead to uncoordinated variations in relative timing of equivalent cellular events. Checkpoints that resynchronize cell cycle events periodically are one strategy used by cells to deal with this phenomenon.

Quantitative analysis of the mechanisms underlying all these phenomena requires a statistical description of outcomes and explicit modeling of the stochastic mechanisms in the control logic.

Statistics of Prokaryotic Protein Production Mechanisms

In the following two sections we propose stochastic models for timing of signal protein production in prokaryotes applicable when the transcript initiation reactions are separate from the reactions controlling the number of proteins produced per transcript. These two models are closely based on experimentally characterized mechanisms for these functions, and they determine the statistical probabilities used in the stochastic simulation algorithm described below. The stochastic simulation is used to predict the patterns of signal protein production that determine switching delays.

Statistics of Transcript Initiation Intervals.

For many prokaryotic promoters a two-step reaction scheme, $R + P \rightleftharpoons RP_c \rightleftharpoons RP_o$, describes the formation of an RNA polymerase (RNAP) open complex where R is the RNAP, $RP_c$ is the closed complex, and $RP_o$ is the open complex (14). RNAP initiates transcription only from the open complex. The closed- to open-complex isomerization step is usually rate limiting (14). The subsequent energy-driven elongation reactions are strongly forward-biased, so the transcribing RNAP clears the polymerase binding site within a few seconds. Shea and Ackers (15) have proposed a quantitative physical–chemical model, which includes regulation of the promoter activity by one or more competitively binding effector molecules. A key assumption in the Shea–Ackers model is that there is rapid equilibrium between free RNAP and that bound to the promoter in closed form. Under these conditions, the slowly changing, instantaneous rate for transcript initiation at each promoter is proportional to the product of the fractional saturation of the promoter by RNAP and the rate constant governing the isomerization reaction. Thus, we can consider transcript initiation as a single reaction characterized by a single rate constant, which is unchanging over sufficiently short time intervals. In the stochastic formulation of chemical kinetics a reaction probability per unit time parameter corresponds to the macroscopic rate constant parameter (16, 17). At any instant, each promoter will have a near-constant (i.e., very slowly varying) probability of transcript initiation per unit time and therefore an exponential distribution of the time intervals between successive transcripts. Thus, the probability for a transcript initiation reaction in the small time interval $\Delta t$ is $(1/T_{avg}) \exp(-t/T_{avg})\,\Delta t$, where t is time and $T_{avg}$ is the instantaneous time parameter of the exponential distribution equal to the average transcript initiation interval, as determined by the underlying reactions. (In the results shown below, for transcript initiation reactions we have also included the short interval of blockage, while a newly formed open complex clears the promoter region.) The variance of the exponential distribution is $T_{avg}^2$ and the distribution is highly skewed about the mean. Thus, 63% of the intervals will be shorter than the average; about 1 in 20 intervals will be more than 3 times the average interval.
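The two percentages quoted for the exponential distribution can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name and the 70 s average interval are illustrative assumptions, and the promoter-clearance blockage mentioned above is ignored.

```python
import math
import random

def interval_fractions(t_avg, n_samples=100_000, seed=0):
    """Sample exponential inter-transcript intervals with mean t_avg.

    Returns the fraction of intervals shorter than the mean and the
    fraction longer than three times the mean.
    """
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean) as its argument
    intervals = [rng.expovariate(1.0 / t_avg) for _ in range(n_samples)]
    below_mean = sum(t < t_avg for t in intervals) / n_samples
    above_3x = sum(t > 3 * t_avg for t in intervals) / n_samples
    return below_mean, above_3x

# Analytic values from the exponential distribution, for comparison
ANALYTIC_BELOW_MEAN = 1 - math.exp(-1)  # about 0.632
ANALYTIC_ABOVE_3X = math.exp(-3)        # about 0.050, i.e. roughly 1 in 20

# t_avg = 70 s is a hypothetical value chosen only for illustration
below, above = interval_fractions(t_avg=70.0)
print(f"sampled: {below:.3f} below mean, {above:.3f} above 3x mean")
print(f"analytic: {ANALYTIC_BELOW_MEAN:.3f} and {ANALYTIC_ABOVE_3X:.3f}")
```

Both fractions are independent of the chosen mean, so any positive value of `t_avg` should give the same result up to sampling error.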

Statistics of the Number of Protein Molecules Produced per Transcript.

The coupling between transcription, translation, and mRNA degradation has been examined carefully for the lacZ gene (3). Changes in the level of β-galactosidase expression and the lacZ mRNA level were observed to be highly correlated over a 200-fold range of β-galactosidase expression in E. coli. Additionally, ribosome spacings on bacterial mRNA are observed to be no more than a few ribosome diameters independent of the level of gene expression. Prokaryotic ribosomes bind to the mRNA as soon as it is accessible behind the transcribing RNAP. Multiple ribosomes spaced about 80 nucleotides apart simultaneously translate the emerging transcript, tracking closely behind the RNAP until the transcript is released. After release of the first protein, additional proteins are completed every several seconds as successive ribosomes reach the end of the reading frame.

The explanation for these observations, suggested in ref. 3 and argued therein to be broadly applicable, hinges on the proximity on the mRNA of the ribosome binding site and the binding site for RNase E (3, 18, 19). (The mRNA stability is controlled by RNase E, which ultimately initiates degradation of the mRNA.) Because RNase E cannot bind when its binding site is occluded, a ribosome that binds at the ribosome binding site protects the mRNA from degradation until the site is again exposed as the ribosome translates the mRNA. Most ribosomes that have initiated translation produce a functional protein. Thus, as shown in Fig. 2A, at each exposure of the ribosome binding site and RNase E sites on the mRNA, there is a direct competition between ribosome and RNase E binding. This competition leads either to successful translation and production of a protein or to degradation or inactivation of the transcript. (This model is applicable for the case where expression exceeds a threshold of about one translation event per transcript.) Thus, the rate-limiting step in chemical decay of the mRNA is an RNase E-dependent cleavage in competition with translation initiation.

Figure 2.


Reaction model (A) and binding state model (B) characterizing sequential competitions between ribosomes and RNase E at two closely located sites on the transcript (denoted BS for binding sites). Binding of either occludes the binding site of the other. After ribosome binding leading to initiation of translation, the competition recurs after a delay while the translating ribosome’s footprint clears the two sites. This process repeats until RNase E binds and initiates degradation of the transcript. Each competition is an independent event with a probabilistic outcome. A transcript is initially in state 1 and thereafter in one of the five states shown in B. The number of proteins produced, N, will be the number of times state 4 is traversed before the process terminates in state 5. When the system is in any state i, $a_{ij}\,dt$ is the probability of transition to state j in time interval {t, t + dt}, where i and j each denotes one of the states {1, … , 5}. Observations (see text) suggest that $a_{24}, a_{12} \gg a_{21}$ and $a_{35}, a_{13} \gg a_{31}$. When the system is in state 1, the probability of another protein is approximately $(a_{12} a_{24})/[(a_{12} + a_{13})(a_{21} + a_{24})]$, neglecting higher order transitions such as 1 → 2 → 1 → 2 → 4.

Fig. 2B shows an equivalent representation of this translation control mechanism focused on the transitions between various binding configurations, assuming that the successive ribosome–RNase competitions are effectively independent trials. If we assume independent trials with constant probability p of “success” for ribosome binding, then the distribution for the number of proteins produced will be the same as the distribution of runs of “heads” from a biased coin with P(head) = p. A run of length N requires N “heads” followed by a “tail” so $P(n = N) = p^N(1-p)$, where n is the number of heads, or of proteins from a transcript, in each trial. This is the geometric distribution function. The mean of the geometric distribution is $N_{avg} = p/(1-p)$. Thus, for example, if $N_{avg}$ is 10 proteins, then p ≈ 0.91; i.e., the ribosome binds in about 91% of the opportunities. The geometric distribution is also highly skewed; the variance is $p/(1-p)^2$ and $P(n \geq N) = p^N$. For $N_{avg}$ = 10 proteins, 25 or more proteins will be produced from 9% of the transcripts. Letting $T_D$ be the average time interval between successive competitions, then the number of mRNA messages, $N_{msg}$, surviving in the population versus time after transcription is blocked would be $N_{msg} = N^0_{msg}\, p^{t/T_D}$. This is equivalent to exponential message decay with half life $T_{half} = -(\ln 2/\ln p)\, T_D$.
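The geometric-distribution quantities in this paragraph can be reproduced with a few lines of arithmetic. The sketch below is only illustrative; the function names are hypothetical, and the half-life is expressed in units of the interval between competitions because no value for that interval is given here.

```python
import math

def p_from_mean(n_avg):
    """Ribosome 'success' probability p that gives a mean of n_avg proteins."""
    # Inverts N_avg = p / (1 - p)
    return n_avg / (1.0 + n_avg)

def prob_exactly(n, p):
    """P(n = N): N successful ribosome bindings followed by one RNase E binding."""
    return p ** n * (1 - p)

def prob_at_least(n, p):
    """P(n >= N): the first N competitions are all won by the ribosome."""
    return p ** n

def burst_variance(p):
    """Variance of the number of proteins per transcript."""
    return p / (1 - p) ** 2

def half_life(p, t_d):
    """Message half-life when competitions recur every t_d time units."""
    return -math.log(2) / math.log(p) * t_d

p = p_from_mean(10)
print(round(p, 3))                     # 0.909
print(round(prob_at_least(25, p), 3))  # 0.092
print(round(burst_variance(p), 1))     # 110.0
print(round(half_life(p, 1.0), 2))     # 7.27 (in units of the competition interval)
```

For a mean of 10 proteins per transcript, the message half-life is therefore a little over seven competition intervals.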

The principal assumptions in this translation control model are (i) that successive ribosome–RNase competitions are independent trials and (ii) the binding competition determines the outcome. The justification for i is the observation that the chemical environment of all the successive competitions will be similar, with the possible exception of the first event just as the ribosome binding site is cleared from the RNAP. Justifications for ii are the experimental observations that ribosome binding almost always leads to protein production and RNase E binding almost always leads to degradation.

Other Reactions Contributing to Concentration Fluctuations.

Degradation of the protein product, plus the forward and reverse dimerization reactions shown in Fig. 1A, also contributes to the stochastic noise in the protein dimer signal concentration. Stochastic effects of these conventional reactions are modeled by recognizing that macroscopic rate constants are directly related to molecular-level reaction probabilities and using a proven Monte Carlo simulation algorithm to determine outcomes of the coupled reactions (17).

Switching Delays for Genetically Coupled Links

How can we characterize the statistics of switching delays for the common type of bacterial genetically coupled links (Fig. 1) that result from the stochastic protein production mechanisms described above? Our approach is to estimate switching delays for a stochastic simulation of the coupled chemical reactions with representative parameters for links in bacterial regulatory networks.

Stochastic Simulation Algorithm.

Solutions to the stochastic formulation of coupled chemical reactions can be computed using the Monte Carlo procedure described by Gillespie (17). This algorithm calculates a stochastic description of the temporal behavior of the coupled reactions, which can be shown to have a more rigorous physical basis than the conventional chemical kinetics formulation. The key difference is that the conventional kinetic equation formulation is based on the assumption that changes in the chemical reaction system over time are both continuous and deterministic. This assumption is always invalid at low enough concentrations or slow enough reaction rates and may not apply at higher concentrations and rates if the system exhibits large, rapid, and discrete transitions. In bacterial cells, intracellular concentrations of promoter–operator regions are always low and the stochastic gene expression mechanism described above produces “bursts” of proteins from individual transcripts at random intervals. Consequently, the assumption of continuity and determinism will always be questionable for reactions involving gene expression in the cell at low gene dosages. If the physical model and its assumptions are valid, and parameter estimates are sound, then the stochastic algorithm produces a more realistic and complete description of the time-dependent behavior of such systems than a deterministic calculation (17). The Gillespie simulation algorithm calculates the probabilistic outcome of each discrete chemical event and the resulting changes in the number of each molecular species. By accumulating the results for all reactions over time, the statistics of the inherent fluctuations in the reaction products over time can be estimated. In our link simulation, each run produces a representative pattern for the growth in signal protein concentration and the resulting switching delay for that link in a single cell. 
The distribution of switching delays for that link across a cell population is estimated by performing multiple simulations. Statistical sampling theory can be used to determine how many simulation runs must be included to achieve a target confidence level.
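A stripped-down version of such a simulation can be sketched as follows. This is a minimal illustration in the spirit of the Gillespie procedure, not the simulation used for the results reported here: it assumes all proteins from a transcript appear at the instant of initiation, and it omits dimerization, cell growth, and promoter-clearance blockage. The rate constants are taken from the parameter values quoted in this paper (0.014 initiations per second, 10 proteins per transcript on average, 30 min protein half-life); the function names and the 15-molecule threshold are hypothetical.

```python
import math
import random

K_INIT = 0.014                   # transcript initiations per second per promoter
N_AVG = 10                       # mean proteins per transcript
K_DEG = math.log(2) / (30 * 60)  # per-second decay constant for a 30 min half-life

def geometric_burst(rng, p):
    """Count ribosome 'wins' before RNase E wins, with P(ribosome wins) = p."""
    n = 0
    while rng.random() < p:
        n += 1
    return n

def simulate_link(t_end, dosage=1, seed=None):
    """Return a list of (time in s, protein count) for one simulated cell."""
    rng = random.Random(seed)
    p = N_AVG / (1.0 + N_AVG)
    t, proteins = 0.0, 0
    trajectory = [(t, proteins)]
    while True:
        a_init = K_INIT * dosage         # propensity of transcript initiation
        a_deg = K_DEG * proteins         # propensity of protein degradation
        a_total = a_init + a_deg
        t += rng.expovariate(a_total)    # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * a_total < a_init:
            proteins += geometric_burst(rng, p)  # simplification: instant burst
        else:
            proteins -= 1
        trajectory.append((t, proteins))
    return trajectory

def time_to_reach(trajectory, threshold):
    """First time the protein count is at or above threshold, else None."""
    for t, n in trajectory:
        if n >= threshold:
            return t
    return None

# Each seed stands for a different cell; the threshold is illustrative only.
for seed in range(3):
    run = simulate_link(t_end=45 * 60, seed=seed)
    print(seed, time_to_reach(run, threshold=15))
```

Repeating the run over many seeds and collecting the threshold-crossing times gives an estimate of the switching-delay distribution for this simplified link.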

What Are “Representative” Link Parameters?

The combination of the exponential time distribution of transcripts and the geometrically distributed number of proteins per transcript largely determines the time pattern of protein production initiated at a single promoter. The principal cellular parameters determining these distributions are the inherent strength of the promoter (considering any activating or repressing effectors) and the relative binding strengths of the ribosome and RNase to the mRNA transcript. These parameters can have a wide range of values, and different genes, even in the same operon, can have widely different translation rates (18). In the link simulation, we use gene expression parameters (for PRP1 in Fig. 1) that approximate those determining maximal Cro expression from the PR promoter in phage λ (15): open-complex initiation rate = 0.014 s⁻¹; and Navg = 10 proteins per transcript. Protein concentration growth is affected by additional parameters: rates of degradation and dimerization reactions, initial cell volume, and the cell growth rate.

The simulated switching delay is the time required in each run to accumulate the necessary concentration of proteins in their effective form to activate or repress the controlled promoter (PRP2 and PRP3 in Fig. 3). Most switching in bacterial regulatory networks must be accomplished by a few tens of molecules, since more than 80% of E. coli genes express fewer than 100 copies of their protein product per cell cycle. [The arguments for the low levels of expression of most genes are summarized by Guptasarma (21).] We have assumed in this analysis that the range of 25–50 nM is a representative range over which the controlled promoter is activated or repressed and switching of the controlled promoter is effected. (Our conclusion that genetically coupled links can exhibit wide random variability of switching delays is not sensitive to the specific range assumed.)
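To relate the assumed 25–50 nM range to molecule counts, the conversion below may help. It is a small sketch that assumes a fixed cell volume of 1 × 10⁻¹⁵ liters, the initial E. coli volume used in the simulations; in a growing cell the same number of molecules corresponds to a declining concentration. The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules_from_nanomolar(conc_nM, volume_liters):
    """Number of molecules at a given nanomolar concentration and volume."""
    return conc_nM * 1e-9 * volume_liters * AVOGADRO

# Assumed volume: 1e-15 liters (initial cell volume, held fixed here)
for conc in (25, 50):
    print(conc, "nM ->", round(molecules_from_nanomolar(conc, 1e-15), 1), "molecules")
# 25 nM -> 15.1 molecules
# 50 nM -> 30.1 molecules
```

At this volume the switching range corresponds to roughly 15 to 30 molecules, consistent with switching by a few tens of molecules.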

Figure 3.


(A) Three simulation runs for the onset of P1 dimer production for the regulatory configuration in Fig. 1B. Each run is a different realization of the pattern of the dimer concentration growth in an individual cell. The pattern of protein expression can be quite erratic and thus dramatically different in each cell. Rapid changes in dimer concentration due to forward and reverse dimer transitions contribute to the high frequency noise in the protein dimer signal. The broken lines are the declining concentrations equivalent to 25 and 50 dimer molecules in the growing cell. Parameters: P1 dimerization equilibrium constant = 20 nM; dimerization $k_r$ = 0.5 s⁻¹; P1 half-life, 30 min. Initial cell volume comparable to E. coli of 1 × 10⁻¹⁵ liters, doubling with linear growth (20) in 45 min (12). (B) Mean and ± 1 σ results for 100 runs at gene dosages of 1, 2, and 4. The “σ” values plotted are the 16th and 84th concentration percentiles at each time point. At higher gene dosages, protein P1 is being produced from more genes; the concentration rises more rapidly, and the effective concentration range will be reached quicker. In addition, the dispersion in time to effectiveness (i.e., the switching delay) will be lower for faster growing signals. (C) Activation level of a controlled promoter (e.g., PRP3 in Fig. 1) assuming activation, A, is characterized by the Hill equation with Hill coefficient 2: $A = K_h[P1P1]^2 / (1 + K_h[P1P1]^2)$, where [P1P1] is the P1 dimer concentration and $K_h$ is the Hill association constant, $K_h = K_E^{-2}$. Curves are labeled by N;KE, where N is the gene dosage and KE is the dimer-operator binding constant. Each curve reflects only the mean concentration curve plotted in B. Activation (or repression) of controlled genes in each cell and over the population will differ widely around this mean value as shown in A and B.

Protein Signal Production Patterns.

Fig. 3A shows three simulation results for growth of the P1 dimer (Fig. 1) concentration due to transcripts initiated at promoter PRP1 at a gene dosage of 1 (one PRP1 promoter in the cell). Each of the three runs shown exhibits a substantially different pattern of P1 concentration growth because of random differences in transcript initiation intervals, in transcription time, in the number of proteins produced from each transcript, in protein degradation, and in the dimerization reaction. Abrupt jumps in dimer concentration (e.g., at the arrows) can result from chance occurrence of either high protein output from a single transcript or a closely spaced series of transcripts. Periods with declining concentration are due to dilution and degradation during periods with chance occurrence of some combination of long inter-transcript intervals or low protein output from a series of transcripts. Both long intervals with few proteins and bursts of many proteins in a short time are common occurrences. Consequently, the concentration growth profile in each cell can be quite erratic and distinctive. Significantly, single “bursts” of signal proteins can occasionally be large enough to immediately activate or repress the controlled promoters.

Fig. 3B shows the mean and standard deviation of the number of P1 dimers in the cell at each time for gene dosage equal to 1, 2, and 4 calculated from 100 simulation runs at each gene dosage. The horizontal lines at 25 and 50 nM delineate the assumed critical range over which switching action is effected. Table 1 summarizes, as a function of dosage, the mean and ± 1 σ times required to reach 25 and 50 nM in Fig. 3B. The results show that higher gene dosage produces proportionately quicker time to effectivity and proportionately lower uncertainty in that time. Fig. 3C shows representative fractional activation of the controlled promoter (PRP3, Fig. 1) using a Hill equation with Hill coefficient 2 as a proxy for a moderately cooperative effector–promoter interaction (22). The activation curves correspond to the mean concentrations (bold lines) in Fig. 3B.
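The Hill-equation proxy for promoter activation can be written out directly. The sketch below uses a Hill coefficient of 2 and a Hill association constant equal to the inverse square of the dimer-operator binding constant, as in the caption to Fig. 3; the binding-constant value of 25 nM and the function name are assumptions made only for illustration.

```python
def hill_activation(dimer_nM, k_e_nM):
    """Fractional activation for Hill coefficient 2.

    dimer_nM: dimer concentration; k_e_nM: dimer-operator binding constant.
    """
    k_h = k_e_nM ** -2        # Hill association constant, K_h = K_E^-2
    x = k_h * dimer_nM ** 2
    return x / (1.0 + x)

# Hypothetical binding constant of 25 nM, chosen only for illustration
for conc in (10, 25, 50, 100):
    print(conc, "nM ->", round(hill_activation(conc, 25.0), 3))
# 10 nM -> 0.138
# 25 nM -> 0.5
# 50 nM -> 0.8
# 100 nM -> 0.941
```

Activation is half-maximal when the dimer concentration equals the binding constant, so with this choice most of the switching occurs across the 25–50 nM range.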

Table 1.

Range of switching delays

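| Effective dimer concentration, nM | Dosage | −1σ time, min | Mean time to concentration, min | +1σ time, min |
|---|---|---|---|---|
| 25 | 1 | 10 | 20 | |
| 50 | 1 | | | |
| 25 | 2 | 4 | 6 | 10 |
| 50 | 2 | 9 | 16 | 28 |
| 25 | 4 | 2 | 3 | 6 |
| 50 | 4 | 4 | 6 | 10 |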

Mean and ± 1σ time for dimer count to reach 25 and 50 nM in an E. coli cell for several gene dosages as determined from 200 runs of the stochastic simulation at each dosage. 

The principal observation from Fig. 3 is that switching time in growing cells has considerable uncertainty. The uncertainties shown are conservative in that we have assumed that the controlling gene is switched on instantaneously. In fact, however, it will be controlled by earlier regulatory reactions and will be activated over an interval as its controlling effectors increase. Stronger promoters, higher gene dosage (or equivalently in many cases, multiple promoters per gene), and lower signal thresholds all act to reduce timing uncertainty.

Stochastic Protein Expression and Autoregulatory Loops.

Autoregulatory feedback loops in genetic circuits can lock controlling protein signals on, in turn locking other signals either on or off. Also, since an autoregulatory loop that is locked on has a sustained level of transcript initiation, the loops can establish sustained transcription from additional genes in an operon other than the gene producing the autoregulating protein. The central role of autoregulation in bacterial genetic regulation is evident in statistics derived from a 1991 inventory of 107 σ⁷⁰ promoters then known in E. coli (1). The promoters in that inventory are organized into 31 regulons, each jointly controlled by one or more regulatory proteins. Twenty-one (68%) of the principal 31 regulatory proteins are autoregulating, i.e., they repress their own synthesis. Four (13%) of the 31 are autoactivating, i.e., they activate their own synthesis. Four of the regulatory proteins repress their own synthesis, but are activators in regulating promoters for other genes. One represses its own synthesis from a σ⁷⁰ promoter, but activates it from an overlapping σ⁵⁴ promoter.

The erratic and pulsative character of protein expression will also affect the dynamics of autoregulated protein levels. We also simulated the steady-state behavior of the phage λ CI autoregulating circuit that maintains lysogeny (results not shown). For the particular case examined, the mean dimer concentration was about 140 nM in the cell. However, the simulated dimer concentration exhibited both high frequency, low amplitude variation, and a slow meandering in the range of about ± 20 nM around the 140 nM mean. This fluctuation pattern is caused by the random production and decay reactions affecting the number of autoregulating proteins. The protein concentration fluctuates about the “steady-state” value (where the mean dimer degradation rate equals the mean production rate) since at higher (lower) dimer concentrations, the mean degradation rate increasingly exceeds (is less than) the mean production rate. The level and character of such fluctuations in the autoregulated protein concentration may be important aspects of a regulatory protein’s function in the cell and can be estimated only by stochastic simulation.

Fig. 3B shows that rapid production of initial protein signals reduces switching delay uncertainty. Rapid production can be achieved either with a single strong promoter or with several identical promoters in the cell acting simultaneously (higher dosage). Sustained production at the initial high rate that produces more rapid and definitive switching would have a continuing high energy cost per unit time. If, however, the promoters are autoregulated by a negative feedback loop, the growing protein concentration will rapidly reduce the initial high rate of protein production and establish a steady, but lower, average level of ongoing transcript initiation at greatly reduced ongoing energy cost. Thus, with the autoregulating feedback loop, the cell can achieve rapid initial signal production to reduce variance in switching delays without an energy penalty.

Discussion

The average rate of protein expression from a gene is the product of the average transcript initiation rate and the average number of proteins produced per transcript. Many different combinations of promoter strength and relative RNase and ribosome binding energies could lead to the same average protein production rate. More frequent transcripts and fewer proteins per transcript lead to more even production, but at higher energy cost for transcript synthesis; conversely, less frequent transcripts and more proteins per transcript produce a noisier signal, but at lower energy cost. Thus, there is a selection-driven trade-off between energy cost and noise level in the resulting protein signal.
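The noise side of this trade-off can be made concrete under the two distributions proposed above. The sketch below assumes transcripts are initiated as a Poisson process (exponential intervals) and each yields a geometrically distributed number of proteins, and it ignores protein degradation and promoter clearance; under those assumptions the variance of the number of proteins produced in a time window is the expected number of transcripts multiplied by the second moment of the burst size. The second design (0.14 initiations per second, 1 protein per transcript) and the function name are hypothetical.

```python
def production_stats(init_rate, n_avg, window):
    """Mean and coefficient of variation of proteins made in a time window.

    init_rate: transcript initiations per second
    n_avg: mean proteins per transcript (geometric burst size)
    window: length of the observation window in seconds
    """
    p = n_avg / (1.0 + n_avg)                # from N_avg = p / (1 - p)
    burst_var = p / (1.0 - p) ** 2           # variance of the burst size
    second_moment = burst_var + n_avg ** 2   # E[n^2] for one burst
    expected_transcripts = init_rate * window
    mean = expected_transcripts * n_avg
    variance = expected_transcripts * second_moment
    return mean, variance ** 0.5 / mean

# Two designs with the same mean output over 10 min (84 proteins):
for init_rate, n_avg in ((0.014, 10), (0.14, 1)):
    mean, cv = production_stats(init_rate, n_avg, window=600)
    print(init_rate, n_avg, round(mean, 1), round(cv, 3))
# 0.014 10 84.0 0.5
# 0.14 1 84.0 0.189
```

In this simplified picture, the infrequent-transcript design has about 2.6 times the relative spread of the frequent-transcript design at the same mean output, while requiring one tenth as many transcripts.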

The analysis in this paper has emphasized the mechanisms determining the pattern of protein signal growth and resulting switching delays after a controlling promoter is turned on. The time interval, after the controlling promoter is turned off, for decay of a control protein to the level where it is ineffective will also have a large random variation, but determined by different mechanisms than those discussed above. Autoregulated proteins, for one example, will have a broad concentration distribution across the cell population at any time due to the meandering of autoregulated signal levels discussed earlier. Consequently, the time to decay below the level of effectiveness will vary depending on the starting concentration.

Our observations regarding stochasticity of gene expression and its implications are not strongly dependent on the specific statistical distributions we postulate for transcript initiation intervals and proteins produced per transcript. The erratic time pattern of protein production we postulate will result so long as (i) the statistical distributions of intertranscript intervals and proteins per transcript are skewed and have long tails, (ii) the number of reaction centers (promoters) in each cell is small, and (iii) the mean intertranscript time interval is relatively long. The deeper mathematical and physical implications of skewed, long-tailed distributions widely found in physiology are discussed in (23).

In the introduction we cited several unexplained stochastic phenomena in prokaryotes. Eukaryotic cells also exhibit stochastic differences between isogenic cells. For example, experimental evidence from several eukaryotic systems, including cells infected with HIV-1 or the mouse mammary tumor virus, suggests that transcription of individual genes occurs randomly and infrequently (24, 25). Although the translation control model discussed above is specific to prokaryotes, we expect that eukaryotic translation mechanisms will also exhibit stochastic behavior affecting phenotypic outcomes. The commitment decision in human hematopoiesis is thought to have a stochastic component (26). The cellular decision to express the human CD2 gene in transgenic mice has also been shown to be a stochastic event (27). Analysis of cell cycle times of Swiss 3T3 cells under high and low serum conditions has demonstrated that there are two or more points in the cell cycle where a stochastic mechanism regulates cell cycle progression (28). Finally, evidence from population studies also suggests that stochastic events during early development are responsible for nongenetic and nonenvironmental phenotype variability. After a 30-year systematic effort to reduce genetic variability in laboratory mice, the residual irreducible variation is attributed to random variations effective at or before fertilization (29).

In summary, there is compelling evidence from many directions that outcomes of regulated events in both prokaryotic and eukaryotic organisms are not deterministic. Efforts to produce clonal organisms in identical environments always find an irreducible level of random variability in phenotypic details. Our analysis of experimentally characterized mechanisms of prokaryotic gene expression predicts that the temporal pattern of specific protein production in individual cells can be quite erratic and distinctive for each cell in a population. It is a statistical certainty that occasional bursts of signal proteins will be produced that are sufficiently large to completely activate or suppress controlled genes. Such events could trigger an ensuing cascade with macroscopically observable phenotypic consequences or they could decisively resolve competitively regulated switching mechanisms to probabilistically select one of several alternative paths. The cell can exploit these inherent fluctuations to achieve nongenetic diversity where this makes the population more capable of surviving in a wide range of environments. Alternatively, when the cell requires a deterministic outcome, regulatory circuit design and reaction parameters that favor predictability and stability in outcome will be evolutionarily selected. Also, one would expect to find organization of the chromosome so that genes for links with critical timing are replicated before their time of expression to achieve higher dosage. We predict that the stochastic character of fundamental mechanisms of both gene expression and control of expression is an important source of the observed stochasticity in cellular events.

Verification of this prediction requires detailed simulation of a well-characterized regulatory system with multimodal phenotypic outcomes using the methods described above to compare predicted population statistics to observed statistics. We are performing such an analysis using the proportion of lysogens produced by phage λ as a function of multiplicity of infection as the model system.

Acknowledgments

This work was supported by Office of Naval Research Grant N00014-96-1-0564. A.A. was partially supported by National Science Foundation Grant CHE9109301 awarded to John Ross (Department of Chemistry, Stanford University).

Footnotes

Abbreviation: RNAP, RNA polymerase.

References


Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences