Predicting rates of cell state change caused by stochastic fluctuations using a data-driven landscape model

Daniel R Sisan; Michael Halter; Joseph B Hubbard; Anne L Plant

doi:10.1073/pnas.1207544109

Proc Natl Acad Sci U S A. 2012 Oct 30;109(47):19262–19267. doi: 10.1073/pnas.1207544109

PMCID: PMC3511108 PMID: 23115330

Abstract

We develop a potential landscape approach to quantitatively describe experimental data from a fibroblast cell line that exhibits a wide range of GFP expression levels under the control of the promoter for tenascin-C. Time-lapse live-cell microscopy provides data about short-term fluctuations in promoter activity, and flow cytometry measurements provide data about the long-term kinetics, because isolated subpopulations of cells relax from a relatively narrow distribution of GFP expression back to the original broad distribution of responses. The landscape is obtained from the steady state distribution of GFP expression and connected to a potential-like function using a stochastic differential equation description (Langevin/Fokker–Planck). The range of cell states is constrained by a force that is proportional to the gradient of the potential, and biochemical noise causes movement of cells within the landscape. Analyzing the mean square displacement of GFP intensity changes in live cells indicates that these fluctuations are described by a single diffusion constant in log GFP space. This finding allows application of the Kramers’ model to calculate rates of switching between two attractor states and enables an accurate simulation of the dynamics of relaxation back to the steady state with no adjustable parameters. With this approach, it is possible to use the steady state distribution of phenotypes and a quantitative description of the short-term fluctuations in individual cells to accurately predict the rates at which different phenotypes will arise from an isolated subpopulation of cells.

Keywords: population distribution, dynamical systems, stochastic protein expression, biological noise

Genetically identical cells do not respond identically when exposed to nominally identical environmental conditions. Such nongenetic phenotypic variability has been widely observed in bacteria (1, 2), yeast (3), and mammalian cells (4–8). Population heterogeneity is thought to result from the inherently stochastic nature of intracellular events, which are subject to statistical fluctuations caused by small copy numbers of the constituent molecules, such as transcription factors (9). Many investigations into the origins and effects of stochastic gene expression have used engineered organisms and stochastic gene network models to determine the sources and magnitude of variability (10–12). These fluctuations, although causing continual change at the single-cell level, can lead to stable distributions of phenotypes within a population.

The idea of a stable distribution of states in the presence of random fluctuations is reminiscent of statistical physics, where randomness results from thermal fluctuations and the stable distribution of states reflects a potential energy function. The popular concept of the epigenetic landscape suggested in the work by Waddington (13) (i.e., a surface of branching valleys and ridges on which cells explore phenotypic states) can be thought of as a series of potential energy functions. The epigenetic landscape, in which different phenotypic states arise, despite the fact that cells have identical gene sequences, has been discussed widely, mostly in the context of developmental biology and stem cell fate decisions (14–17); it has been a useful framework, even as a qualitative description. Several studies have focused on developing the landscape concept quantitatively, where the connection between the landscape and an energy-like potential is made explicit and rigorous, but so far, these efforts have been limited to either theoretical treatments (18, 19) or completely specified systems of chemical reactions (20, 21) and have not been tested with experimental data.

Here, we develop an approach that allows us to quantitatively describe a landscape of cell phenotypes based on experimental data, and it is applicable to cell systems where the relevant underlying biochemistry is unknown or difficult to measure. The approach is similar to the landscape models of protein folding (22, 23) in several ways: the complexity of the system precludes a complete description, we focus on the dynamics of the system, and we carefully consider the choice of reaction coordinate (24). Our reaction coordinate is a 1D space (the x axis of the landscape), in which entities move diffusively and are subject to nonrandom forces determined by the gradient of the potential.

In this paper, we examine a fibroblast cell line that is stably transfected to express GFP in response to activation of the promoter for the ECM protein, tenascin-C (TN-C). TN-C, which is controlled by a large promoter sequence with a number of transcription factor binding sites (Fig. S1), is highly regulated both temporally and spatially during development, and in the adult, it is expressed predominantly under conditions of wound healing and tumor growth (25–27) and in hypertensive arteries (28), where it supports vascular smooth muscle cell proliferation, migration, and survival (29, 30). In our experiments, a clonal population of cells is grown under homogeneous conditions but exhibits a wide range of GFP intensities, most likely because of noise in promoter activity. To probe the dynamics underlying this variability, we use two types of kinetic experiments. One type is time-lapse microscopy to quantify fluctuations in GFP intensity in individual living cells. The second type isolates subpopulations of cells by cell sorting according to their GFP intensity and follows the kinetics of relaxation of these populations as they revert from their sorted distribution back to the steady state distribution. We find that the relaxation of a subpopulation back to the steady state distribution can be partially described by a simple two-state switching model, but an accurate analysis of the kinetics of relaxation requires a continuum model. We use a Langevin-type stochastic differential equation, which leads to a 1D quantitative potential landscape. The steady state population distribution of GFP is used to derive the potential. The measured fluctuations in cellular GFP, determined by time-lapse microscopy of individual living cells, are used to determine that the appropriate reaction coordinate is log GFP concentration, in which a single, constant diffusion coefficient characterizes fluctuations in GFP. 
This finding allows application of the classic Kramers’ theory of potential barrier crossing and prediction of the rates of switching between the two states based solely on the shape of the landscape. This landscape approach is tested with computer simulations that quantitatively predict the relaxation dynamics of the sorted subpopulations. We show that, with a steady state distribution and a quantitative description of fluctuations, this approach allows accurate prediction of the rates at which different phenotypes will arise from an isolated subpopulation of cells.

Results

Quantifying Cell-to-Cell Variability.

Cell-to-cell variability in GFP expression in these clonal fibroblasts can be measured reliably by flow cytometric analysis or quantitative imaging. The distribution of TN-C promoter activity (as indicated by the range of GFP expression in individual cells within the population) is very broad [SD/mean ≡ coefficient of variation (CV) = 2], spanning over three orders of magnitude (Fig. S2). Because these cells are genetically identical and reside in homogeneous conditions, the observed variability presumably results from the inherent randomness in cellular reactions. These random fluctuations, although causing continual change at the single-cell level, lead to a stable steady state distribution of GFP intensities across the population. The steady state distribution can be described by a sum of two log normals (Fig. S2A), suggesting that TN-C has two major states of promoter activity: low and high. Random fluctuations presumably are responsible for the switching of cells between these states, and furthermore, fluctuations seem to be important, even within a state: the log-normal distribution of high-activity cells is itself broad, with a CV = 1.5.

Time-lapse fluorescence microscopy of live cells, in which individual cells were segmented and tracked over long times (31), reveals that TN-C promoter activity exhibits random fluctuations as well as cell cycle-dependent trends. On average, cells approximately double their total fluorescence before dividing. Even over several cell cycles (∼50 h), the fractional change in intensity is small compared with the orders of magnitude variation between individuals across the population (Fig. S2B). This finding suggests that population heterogeneity is determined by kinetics that are slow relative to the length of the cell cycle.

To probe the slow dynamics that give rise to the heterogeneity in GFP expression levels, we used FACS to separate subpopulations according to GFP intensity into four dishes that could be cultured and passaged separately (Fig. 1). The number of dishes was chosen arbitrarily, and the intensity boundaries were chosen so that the dishes had approximately equal portions of the population. Fluorescence distributions were then measured approximately every 3 d using flow cytometry. As seen in Fig. 1, the distributions approached the steady state distribution over the course of weeks.

Fig. 1. — Sorted subpopulations and relaxation back to the steady state distribution. Cells were sorted using FACS based on their fluorescence intensity into four dishes as shown, and then, they were cultured separately and regularly monitored using flow cytometry over many weeks. The cells in Dish I required more than 36 d but did eventually relax (Fig. 4).

State-Dependent Rates of Proliferation.

Differences in proliferation rates were obvious during passaging of the different subpopulations of cells. Independent analysis of live-cell data in an unsorted dish confirmed that the dimmest cells (corresponding to the lower log normal in Fig. S2A) proliferated at a rate that was 10% slower than the other cells (31). This observation is consistent with previous observations that expression of TN-C is associated with proliferating cells (28).

Two-State Switching with State-Dependent Rates of Proliferation.

The steady state distribution for TN-C promoter activity is comprised of two log-normal distributions (Fig. S2), which suggests two major states of activity. We, thus, pursued a two-state description (SI Text, section 1), which has also been used by others (8) for similar data involving relaxation of stem cell populations. The two-state model fits reasonably well to the data for relaxation of cells at the extremes of the distribution, but there are notable discrepancies with the data from the other two subpopulations (Fig. S3). We will show next that these deficiencies in the two-state model are addressed by applying a continuum model.
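A two-state description with state-dependent proliferation can be illustrated with a pair of linear rate equations. The sketch below is only an assumption about the general form of such a model; the authors' actual formulation is given in SI Text, section 1, and is not reproduced here. The switching rates are the relaxation-fit values quoted in the Discussion, whereas the growth rates are hypothetical: `g_high` assumes a doubling time of 1 d, and `g_low` is set 10% lower.

```python
import math

def low_fraction(f0, days, k_lh=0.004, k_hl=0.017,
                 g_high=math.log(2.0), g_low=0.9 * math.log(2.0), dt=0.01):
    """Fraction of low-activity cells after `days`, starting from fraction f0.

    Forward-Euler integration of the (assumed) linear rate equations
        dL/dt = (g_low  - k_lh) * L + k_hl * H
        dH/dt = (g_high - k_hl) * H + k_lh * L
    where L and H are the numbers of low- and high-activity cells.
    """
    low, high = f0, 1.0 - f0
    for _ in range(int(round(days / dt))):
        d_low = (g_low - k_lh) * low + k_hl * high
        d_high = (g_high - k_hl) * high + k_lh * low
        low, high = low + d_low * dt, high + d_high * dt
        total = low + high  # only the fraction matters, so renormalize each step
        low, high = low / total, high / total
    return low
```

Calling `low_fraction(1.0, t)` and `low_fraction(0.0, t)` for increasing `t` traces the relaxation of an all-low and an all-high sorted population toward the same steady state fraction. Because the model has only two states, it cannot represent where within a state the sorted cells started.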

Potential Landscape.

Failure of a two-state model to describe all of the details of the relaxing distributions (Fig. S3) suggests the need for a continuum model. Furthermore, the gradual spreading of the shape of the distributions that are observed during relaxation (Fig. 1) suggests a process that is driven by small step fluctuations (i.e., diffusion). To explicitly introduce stochastic fluctuations in a description for TN-C promoter activity, we used a Langevin equation, because it provides a rigorous framework for a continuous, data-driven potential landscape description and can be simulated.

Langevin Equation.

A Langevin equation, a type of stochastic differential equation, can be used to describe the GFP dynamics on the single-cell level (Eq. 1):

$$\frac{dx}{dt} = A(x) + \sqrt{2D(x)}\,\xi(t) \tag{1}$$

where x is the GFP or other protein concentration (i.e., the reaction coordinate), A(x) describes all of the deterministic (nonrandom) dynamics, and the last term describes the random noise caused by fluctuations. D(x) is the diffusion coefficient, which controls the relative magnitude of the noise term, and it is a function of the protein concentration in general and GFP in our system (shown explicitly with the data in Fig. 2); ξ(t) is a random variable with zero mean, unity variance, and no correlation. Eq. 1 can be thought of as describing a random walk in GFP concentration space, which is subject to random fluctuations that arise from small copy numbers of chemical components, like transcription factors, and an average nonrandom force, A(x), which describes the deterministic rules arising from the signaling pathways that control GFP expression levels.

Fig. 2. — Trajectories of cellular GFP intensities are diffusive, with a diffusion coefficient that is constant in log GFP space. (A) GFP intensity over time is shown for 32 representative cells (of 344 total analyzed) from mitosis to mitosis obtained from automated quantitative live-cell segmentation and tracking (*SI Text*, section 3). (B) The average trajectory (synchronized as in Fig. 1 and then normalized). The green curve is a sixth-order polynomial fit used as a smooth approximation to the data. (C) The detrended GFP intensity trajectories from A after dividing each cell’s trajectory by the fit in B. (D) The MSD is calculated for each detrended trajectory, and then, it is grouped into five bins (shown in C) according to GFP intensity and averaged. Only a few data points for groups IV and V are shown in the graph. (E) The slope increases approximately as the square of GFP intensity. *Inset* shows the same plot on log–log axes superimposed with a line of slope 2, the expected form for a square dependence on GFP. (F) The slopes of the MSD are approximately the same when calculated using the log-transformed trajectories in C. The MSD for the lowest intensity bin (blue) has a higher offset from the rest because of a higher relative measurement noise, which we can estimate to be <1% of the average GFP intensity for the population.

Eq. 1 is not well-defined when the diffusion coefficient, D(x), depends on the protein concentration, x. More than a single Fokker–Planck equation can be derived depending on the interpretation of the stochastic process (SI Text, section 4). In cases where D(x) depends on the reaction coordinate, a Langevin equation with a constant diffusion coefficient can often be derived by transforming the reaction coordinate of Eq. 1 to the following (unique) reaction coordinate (32) (Eq. 2):

$$y(x) = \int^{x} \sqrt{\frac{D_y}{D(x')}}\,dx' \tag{2}$$

where y is a function of x, and as a result, the diffusion coefficient, D_y, becomes a new variable that is independent of x. This operation transforms Eq. 1 into the following Langevin equation (Eq. 3):

$$\frac{dy}{dt} = F(y) + \sqrt{2D_y}\,\xi(t) \tag{3}$$

With this transformation, the force, F(y), now depends on the functional forms of A(x) and D(x) and on the choice of interpretation. Therefore, the dilemma of needing to make an interpretation has not been avoided but has shifted from the noise term to F(y).

We now show that the dilemma can be totally eliminated, because we can determine F(y) empirically from the measured steady state distribution of protein concentration as follows. With a constant diffusion coefficient, the Langevin equation can be used to construct the following Fokker–Planck equation for the probability distribution P(y, t) (33) describing a population of cells (Eq. 4):

$$\frac{\partial P(y,t)}{\partial t} = -\frac{\partial}{\partial y}\left[F(y)\,P(y,t)\right] + D_y\,\frac{\partial^2 P(y,t)}{\partial y^2} \tag{4}$$

The steady state probability distribution, P_ss(y), can be solved by setting the left-hand side to zero and integrating, and it is given by (Eq. 5)

$$P_{ss}(y) = C \exp\left[\frac{1}{D_y}\int^{y} F(y')\,dy'\right] \tag{5}$$

where C is a normalization constant. We now define the following function (Eq. 6),

$$U(y) \equiv -\int^{y} F(y')\,dy' \tag{6}$$

which we call a potential, because it puts the steady state distribution P_ss(y) = C exp[−U(y)/D_y] in a form that is analogous to the Boltzmann distribution exp(−E/k_BT). Furthermore, the force is proportional to the gradient of the potential. Importantly, the potential and the force can be obtained from the experimentally accessible steady state probability distribution, P_ss(y), and the diffusion coefficient, D_y (Eq. 7, defined up to an additive constant):

$$U(y) = -D_y \ln P_{ss}(y) \tag{7}$$

and (Eq. 8)

$$F(y) = -\frac{dU(y)}{dy} = D_y\,\frac{d \ln P_{ss}(y)}{dy} \tag{8}$$

This allows construction of a self-consistent, unambiguous Langevin description (and simulation) without direct knowledge of A(x) and without choosing between an Ito or Stratonovich interpretation (although which interpretation is appropriate remains an open question). In approaches that start with A(x) instead of empirical data (20), a choice between interpretations must be made and justified, because each interpretation leads to a different outcome.
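The step from the Fokker–Planck equation to the steady state distribution can be written out explicitly. The following is a sketch of the standard zero-flux argument, not a derivation taken from the paper. It writes F(y) for the force, D_y for the constant diffusion coefficient, and J for the probability flux, and it assumes that the flux vanishes at the boundaries of the reaction coordinate:

$$
\begin{aligned}
0 &= -\frac{d}{dy}\left[F(y)\,P_{ss}(y) - D_y\,\frac{dP_{ss}(y)}{dy}\right] = -\frac{dJ}{dy}
&&\Rightarrow\quad J = \text{const} = 0, \\
F(y)\,P_{ss}(y) &= D_y\,\frac{dP_{ss}(y)}{dy}
&&\Rightarrow\quad \frac{d\ln P_{ss}(y)}{dy} = \frac{F(y)}{D_y}, \\
P_{ss}(y) &= C\exp\left[\frac{1}{D_y}\int^{y}F(y')\,dy'\right].
\end{aligned}
$$

Reading the second line from right to left gives the force directly from a measured distribution, F(y) = D_y d ln P_ss(y)/dy, which is why no separate model of the deterministic dynamics is needed.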

Obtaining the Diffusion Coefficient and Potential Landscape from Cellular Data.

A complete mathematical description of the steady state distribution is required for establishing the potential landscape, and this description requires determining the appropriate reaction coordinate. An analysis of the noise in cellular GFP intensity is used to obtain y, the appropriate reaction coordinate for the landscape, and D_y, the diffusion coefficient with which cells explore the landscape. This analysis is shown in Fig. 2. Representative fluorescence trajectories of total intensity in individual cells from time-lapse live-cell microscopy are shown in Fig. 2A. To approximate the concentration and minimize the cell cycle-dependent trends, each trajectory was detrended using the fit to the average trend in Fig. 2B; the detrended trajectories are shown in Fig. 2C. The mean squared displacement (MSD) was calculated as MSD(Δt) = 〈[x(t + Δt) − x(t)]²〉_t, where x is detrended GFP intensity, Δt is the time lag, and the bracket denotes an average over t. An MSD is determined at all Δt values for each cell, and then, MSD values are binned according to average cell intensity as indicated in Fig. 2C. As seen in Fig. 2D, all curves were found to be linear at short times, indicating that the dynamics of fluctuation in GFP are diffusive and therefore, that the Langevin equation is an appropriate description. The slope of the MSD, from which we derive the diffusion coefficient D(x), depends strongly on the average GFP intensity of the cells (Fig. 2E), indicating GFP-dependent variations in the diffusion coefficient. In particular, the slope and thus, the diffusion coefficient increase as the square of the mean GFP.
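The MSD analysis can be sketched in a few lines. The example below is illustrative only and is not the authors' analysis code: it computes a time-averaged MSD for one trajectory, fits the slope over short lags, and converts it to a diffusion coefficient under the assumed 1D convention MSD = 2DΔt. The trajectory is synthetic (a driftless random walk in log GFP generated with a hypothetical true value of D), and the 15-min frame interval is taken from Materials and Methods.

```python
import math
import random

def msd(traj, max_lag):
    """Time-averaged MSD of one trajectory for lags of 1..max_lag frames."""
    result = []
    for lag in range(1, max_lag + 1):
        sq = [(traj[i + lag] - traj[i]) ** 2 for i in range(len(traj) - lag)]
        result.append(sum(sq) / len(sq))
    return result

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys versus xs (intercept left free)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def estimate_diffusion(traj, frame_dt, max_lag=10):
    """Estimate D from the short-time MSD slope, assuming MSD = 2 * D * lag_time."""
    lag_times = [frame_dt * k for k in range(1, max_lag + 1)]
    return fit_slope(lag_times, msd(traj, max_lag)) / 2.0

def synthetic_log_gfp(n_frames, d_true, frame_dt, seed=0):
    """Hypothetical test data: a driftless random walk in log GFP."""
    rng = random.Random(seed)
    step_sd = math.sqrt(2.0 * d_true * frame_dt)
    y, traj = 0.0, [0.0]
    for _ in range(n_frames - 1):
        y += rng.gauss(0.0, step_sd)
        traj.append(y)
    return traj

FRAME_DT = 15.0 / (60.0 * 24.0)  # 15-min imaging interval, expressed in days
trajectory = synthetic_log_gfp(20000, d_true=0.02, frame_dt=FRAME_DT)
d_est = estimate_diffusion(trajectory, FRAME_DT)  # estimate in log^2(GFP)/d
```

Leaving the intercept free in the fit lets a constant offset from measurement noise be absorbed without biasing the slope, which matches the observation in Fig. 2F that the dimmest bin has a higher offset but a similar slope.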

We can use this relationship to determine a reaction coordinate in which the diffusion coefficient is constant. Substituting D(x) ∝ x² into Eq. 2 gives y ∝ log x, which indicates that the log-transformed GFP concentration should have fluctuations characterized by a constant diffusion coefficient, as verified in Fig. 2F. The slopes of the MSD are approximately the same, indicating a constant diffusion coefficient, D = 0.02 ± 0.002 log²(GFP)/d (a description of error calculations is given in SI Text, section 3). Thus, we choose log GFP concentration as the reaction coordinate.
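The substitution can be made explicit with a short worked step. Here α is a proportionality constant introduced only for illustration, so that the measured dependence reads D(x) = αx², and D_y denotes the constant diffusion coefficient in the transformed coordinate. Assuming the transformation has the standard form y = ∫ √(D_y/D(x′)) dx′, the integral evaluates to a logarithm:

$$
y(x) = \int^{x}\sqrt{\frac{D_y}{\alpha\,x'^{2}}}\;dx' = \sqrt{\frac{D_y}{\alpha}}\,\ln x + \text{const}.
$$

As a consistency check, the noise amplitude transforms as

$$
\sqrt{2D(x)}\;\frac{dy}{dx} = \sqrt{2\alpha x^{2}}\cdot\frac{1}{x}\sqrt{\frac{D_y}{\alpha}} = \sqrt{2D_y},
$$

which no longer depends on x. Choosing D_y = α gives y = ln x exactly; any other choice only rescales the axis.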

The steady state distribution, P_ss(y), can be transformed by Eq. 7 to result in the potential landscape. The steady state distribution shown in Fig. 3A is derived from the observed steady state distribution (Fig. S2A) and has been modified using the switching rates determined from the two-state model (Fig. S3) to deduce the steady state distribution resulting solely from variability of promoter activity decoupled from the effects of population dynamics (SI Text, section 1). The landscape, U(y), which is derived from the steady state distribution, is shown in Fig. 3B. The landscape consists of two minima, which correspond to the two major states (attractors). The stability of the states is determined by the depth of each minimum and the height of the barrier. The diffusion coefficient, D, characterizes how fast individual cells in the population explore the range of phenotypes within each state.
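Constructing the landscape from a distribution can be sketched as follows. The example assumes that the potential takes the Boltzmann-like form U(y) = −D ln P_ss(y) and that the force is its negative gradient. The Gaussian means and SDs are the values listed in Materials and Methods and D is the measured value, but the weights in `WEIGHTS` are hypothetical placeholders, because the weightings used for Fig. 3A are not available in this text.

```python
import math

D = 0.02                   # diffusion coefficient in log^2(GFP)/d (from the MSD analysis)
MU = (1.2, 2.5)            # Gaussian means in log GFP (Materials and Methods)
SIGMA = (0.20, 0.58)       # Gaussian SDs in log GFP (Materials and Methods)
WEIGHTS = (0.8, 0.2)       # HYPOTHETICAL low/high weights, for illustration only

def p_ss(y):
    """Steady state density modeled as a weighted sum of two Gaussians in y = log GFP."""
    total = 0.0
    for w, m, s in zip(WEIGHTS, MU, SIGMA):
        total += w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    return total

def potential(y):
    """Assumed landscape U(y) = -D * ln P_ss(y)."""
    return -D * math.log(p_ss(y))

def force(y, h=1e-5):
    """Force F(y) = -dU/dy, evaluated with a central finite difference."""
    return -(potential(y + h) - potential(y - h)) / (2.0 * h)
```

With these definitions, the force points toward the nearest minimum: it is positive just below a well and negative just above it, so cells are pushed back toward the attractor while noise moves them around within it.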

Fig. 3. — The epigenetic landscape is derived from the measured steady state distribution. (A) The estimated unbiased steady state distribution was obtained by modifying the fit of the distribution in Fig. S2A to eliminate the effect of state-dependent proliferation rates by applying the ratio of low- to high-activity cells as determined from the two-state model (*SI Text*, section 1). (B) The potential is defined by Eq. 7 [U(y) = −D ln P_ss(y)], where P_ss(y) is the probability distribution shown in A, y = log[GFP], and D = 0.02 log²[GFP]/d.

Langevin Simulation.

Stochastic numerical simulations of an ensemble of Langevin equations were performed (SI Text, section 2 and Fig. S4) to simulate the relaxation of the four sorted dishes (Fig. 1 and Fig. S3), with the force given by F(y) = −dU/dy, the negative gradient of the landscape (Eq. 8), and the diffusion coefficient determined from the data in Fig. 2. In Fig. 4, the results of the simulations are compared with the experimental data from Fig. 1 and found to agree well without any tuned parameters. An analysis of the sensitivity of the simulation to different values for D is shown in Fig. S4 B–E, and it indicates that the simulation prediction is rather insensitive to changes in D up to a factor of two. This result suggests that the MSD analysis of the dynamic data, although orthogonal to the distribution measurements by flow cytometry, provides for a D that uniquely predicts the relaxation data. The simulation captures two features that were missed by the two-state description. First, for Dish II, in which the subpopulation of cells had a phenotype that was near the barrier of the two major states, the curve first increases and then decreases. Second, Dish IV takes longer to relax than Dish III, despite both being initially composed entirely of high-producing cells, because of the time needed for cells to diffuse close to the barrier.

Fig. 4. — Results of a Langevin simulation of the relaxation experiment. Data points are the same as in Fig. S3. The simulation uses the experimentally derived diffusion coefficient, D, the landscape, U(y) (Fig. 3B), and state-dependent rates of proliferation. Each time point in the simulation produces a population distribution of GFP concentrations, which is reduced to a scalar: the fraction of cells below a GFP concentration threshold as in Fig. S3 and described in *SI Text*, section 3.

The different rates of proliferation for low- and high-activity cells must be included in the simulation to provide an accurate description for the same reasons as for the two-state model (Fig. S3A). Although consideration of state-dependent rates of proliferation precludes a purely analytical time-dependent solution for the probability distribution from the Fokker–Planck equation (Eq. 4), we show that it is easily incorporated into a simulation.
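An ensemble Langevin simulation of this kind can be sketched with the Euler–Maruyama scheme. This is a minimal illustration, not the authors' simulation (which is described in SI Text, section 2): it deliberately omits state-dependent proliferation, uses a hypothetical time step, and uses hypothetical weights for the two Gaussians. The force is taken as F(y) = D d ln P_ss(y)/dy, with the Gaussian means and SDs from Materials and Methods.

```python
import math
import random

D = 0.02                   # diffusion coefficient in log^2(GFP)/d
MU = (1.2, 2.5)            # Gaussian means in log GFP (Materials and Methods)
SIGMA = (0.20, 0.58)       # Gaussian SDs in log GFP (Materials and Methods)
WEIGHTS = (0.8, 0.2)       # HYPOTHETICAL low/high weights, for illustration only

def force(y):
    """F(y) = D * (dP_ss/dy) / P_ss for a weighted sum of two Gaussians."""
    p = dp = 0.0
    for w, m, s in zip(WEIGHTS, MU, SIGMA):
        g = w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        p += g
        dp -= g * (y - m) / s ** 2
    return D * dp / p

def simulate(y0, days, dt=0.05, seed=0):
    """Euler-Maruyama update y <- y + F(y) dt + sqrt(2 D dt) * N(0, 1) for each cell."""
    rng = random.Random(seed)
    ys = list(y0)
    amp = math.sqrt(2.0 * D * dt)
    for _ in range(int(round(days / dt))):
        ys = [y + force(y) * dt + amp * rng.gauss(0.0, 1.0) for y in ys]
    return ys

def fraction_below(ys, threshold):
    """Reduce a simulated population to a scalar: the fraction below a threshold."""
    return sum(1 for y in ys if y < threshold) / len(ys)

# Example: 500 cells sorted into a narrow window at the low-activity minimum.
cells = simulate([1.2] * 500, days=10.0)
low_frac = fraction_below(cells, 1.85)  # threshold placed near the barrier (assumed)
```

Starting the ensemble at different positions (for example, close to the barrier versus deep inside the high-activity well) shows why sorted subpopulations drawn from the same state can relax at different speeds, which a two-state model cannot represent.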

Predicting Switching Rates Using the Potential Landscape: Kramers’ Theory.

The shape of the landscape shown in Fig. 3B, with its potential minima and barrier between them, allows us to directly compute the rate constants for switching between two stable states (attractors) using Kramers’ classic theory of energy barrier crossing (34). Relying solely on the shape of the landscape, these rates are given by (Eq. 9)

$$k = \frac{\sqrt{U''(x_{min})\,\left|U''(x_{max})\right|}}{2\pi}\,\exp\left[-\frac{U(x_{max}) - U(x_{min})}{D}\right] \tag{9}$$

where U is the potential function, primes denote derivatives [therefore, U″(x_min) is the landscape’s curvature at a local minimum], x_max is the position of the barrier peak, and D is the diffusion coefficient. For the Kramers’ model, D must be constant. As we show above, the only transformation of the reaction coordinate that produces a constant D, in our case, the log transformation, will yield a landscape with the correct curvature for Eq. 9. Furthermore, note that the curvature implicitly includes D. Applying this formula to the potential in Fig. 3B and using D = 0.02 log²(GFP)/d, which was obtained from Fig. 2F, we obtain K_LH = 0.004/d as the rate constant for transition of low-activity cells to high-activity cells and K_HL = 0.025/d for the transition of high-activity cells to low-activity cells. These values agree to within 50% with the values estimated from the fitting of the relaxation data to the two-state model (Fig. S3), and establish the predictive use of the landscape.
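The rate calculation can be sketched numerically. The example assumes the overdamped Kramers form k = √(U″(x_min)|U″(x_max)|)/(2π) · exp{−[U(x_max) − U(x_min)]/D} for dynamics with force −U′ and constant D, and a landscape U(y) = −D ln P_ss(y). The Gaussian means and SDs come from Materials and Methods, but the weights are hypothetical, so the resulting rates are illustrative and are not expected to reproduce the published values exactly.

```python
import math

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

def kramers_rate(U, x_min, x_max, D):
    """Kramers escape rate from the well at x_min over the barrier at x_max."""
    curvature = second_derivative(U, x_min) * abs(second_derivative(U, x_max))
    return math.sqrt(curvature) / (2.0 * math.pi) * math.exp(-(U(x_max) - U(x_min)) / D)

def grid_argmin(f, lo, hi, n=4001):
    """Location of the smallest value of f on a uniform grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(xs, key=f)

D = 0.02                   # diffusion coefficient in log^2(GFP)/d
MU = (1.2, 2.5)            # Gaussian means in log GFP (Materials and Methods)
SIGMA = (0.20, 0.58)       # Gaussian SDs in log GFP (Materials and Methods)
WEIGHTS = (0.8, 0.2)       # HYPOTHETICAL low/high weights, for illustration only

def U(y):
    """Assumed landscape U(y) = -D * ln P_ss(y) for a sum of two Gaussians."""
    p = sum(w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
            for w, m, s in zip(WEIGHTS, MU, SIGMA))
    return -D * math.log(p)

y_low = grid_argmin(U, 0.8, 1.6)                        # low-activity minimum
y_high = grid_argmin(U, 2.0, 3.0)                       # high-activity minimum
y_bar = grid_argmin(lambda y: -U(y), y_low, y_high)     # barrier peak between them

k_lh = kramers_rate(U, y_low, y_bar, D)    # low -> high, per day
k_hl = kramers_rate(U, y_high, y_bar, D)   # high -> low, per day
```

Because U scales with D, both curvatures carry a factor of D, so the prefactor scales linearly with D while the exponent depends only on the ratio of P_ss at the barrier to P_ss at the minimum. Note also that the Kramers approximation is most reliable when the barrier is high compared with D.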

Discussion

We have shown how to rigorously derive the potential landscape for the activity of a gene promoter. The landscape is derived from the steady state distribution of the range of promoter activities within the population as determined by flow cytometry or static imaging measurements, and the kinetics of fluctuations in promoter activity in individual cells are determined from time-dependent fluorescence imaging data. Our approach is based on a Langevin equation, which includes the force that defines the shape of the landscape and a single kinetic constant that describes the fluctuations in promoter activity. The model accurately predicts the complex kinetics with which cells that are sorted according to their promoter activity relax back to the steady state distribution of activities. The time required for a selected subpopulation to transition from its narrow distribution of states back to a broad distribution of phenotypes can take weeks, but it can be predicted using data that can be acquired in about 1 d. A similar slow relaxation was observed for subpopulations of stem cells sorted for Sca-1 activity, and the work by Chang et al. (8) showed the significance of population heterogeneity to differentiation potential. In that report, the data were modeled as a transition between two states (8). In this report, we also explored a two-state model, but in addition, we performed a kinetic analysis of fluctuations in promoter activity using time-lapse microscopy on individual cells. These data established that these fluctuations are diffusive in nature. The transformation of the reaction coordinate to log GFP provides a constant diffusion coefficient that allowed us to use a continuum model in the form of a simple Langevin equation and the Kramers’ theory to predict rates of transition between states.

The TN-C promoter seems to have two minima in its potential landscape. In a microarray gene expression study using hundreds of tissue samples in mice (35), TN-C was reported to be one of the genes that is expressed with a bimodal distribution of levels. It was reported to be off more often than on, consistent with our deduced steady state distribution, suggesting that the activity of the promoter measured in cell culture reflects, at least partially, its physiological activity. The potential landscape that we determine allows the switching rates between these two relatively stable promoter activity states to be predicted by applying Kramers’ theory (34), which takes into account the shape of the landscape and the diffusion coefficient. The calculated Kramers’ switching rates, K_LH = 0.004/d and K_HL = 0.025/d, were found to agree to within 50% with the switching rates obtained from the sorting/relaxation experiment, k_LH = 0.004/d and k_HL = 0.017/d. Based on these results, we can predict that, at the single-cell level (ignoring progeny from division), one would wait, on average, 1/K_HL ≈ 40 d for a high-activity cell to switch and 1/K_LH ≈ 250 d for a low-activity cell to switch. Considering progeny and starting with a single cell that doubles approximately every 1 d, one would wait a much shorter time before observing a cell that had switched, because the number of cells that can switch grows with each division. The predictive value of this approach to modeling cellular dynamics may have important practical implications for applications in stem cell biology and biomanufacturing as well as tumor heterogeneity and drug resistance.
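The waiting-time arithmetic can be sketched as follows. For a single cell with a constant switching rate k, the mean wait is 1/k. For a growing clone, the sketch uses a hypothetical model that is not taken from the paper: the clone grows deterministically as N(t) = 2^(t/T) with doubling time T, and the first switch is treated as the first event of a Poisson process with rate k·N(t). The mean wait is then the integral of the survival probability exp[−k ∫ N dt], evaluated numerically.

```python
import math

def single_cell_wait(k):
    """Mean waiting time (d) for one cell switching at constant rate k (per day)."""
    return 1.0 / k

def clone_wait(k, doubling_time=1.0, dt=0.01):
    """Mean time (d) until the first switch in an exponentially growing clone.

    ASSUMED model: N(t) = 2**(t / doubling_time) cells, each switching at rate k,
    so the probability that no cell has switched by time t is
        S(t) = exp(-k * doubling_time / ln2 * (2**(t / doubling_time) - 1)).
    The mean wait is the integral of S(t), computed with the trapezoid rule.
    """
    ln2 = math.log(2.0)
    t, total, prev = 0.0, 0.0, 1.0
    while prev > 1e-12:
        t += dt
        cur = math.exp(-k * doubling_time / ln2 * math.expm1(ln2 * t / doubling_time))
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total

K_HL = 0.025   # Kramers' rate, high -> low (per day), from the text
K_LH = 0.004   # Kramers' rate, low -> high (per day), from the text

wait_high_single = single_cell_wait(K_HL)   # about 40 d
wait_low_single = single_cell_wait(K_LH)    # about 250 d
wait_high_clone = clone_wait(K_HL)          # shorter, because the clone keeps growing
wait_low_clone = clone_wait(K_LH)
```

Under this assumed model the clone-level wait grows only logarithmically as k decreases, so a tenfold difference in switching rate translates into a difference of only a few days when progeny are counted.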

Although the Kramers’ rate of transition of cells in the population from the low to high states agrees with the values found from the two-state model, it should be noted that the two-state model only provides a reasonable fit to the relaxation responses from the populations of cells with the lowest and highest promoter activities. The intermediate populations show more complex relaxation kinetics, which require the Langevin/Fokker–Planck model to describe adequately. The biological significance of these findings is that switching from a low- to high-activity state for TN-C promoter activity is a diffusive process; therefore, cells with concentrations near the state barrier are closer to switching than cells away from the barrier. Cells that are near the barrier have a finite probability of changing in activity in either direction, thus displaying a more complex response in activity with time of relaxation. An alternate outcome could have been that switching occurs spontaneously through some discrete process without coupling to a slow variation because of noise in response within the population. However, our results (e.g., the faster relaxation of Dish III vs. Dish IV in Fig. 4) show that this result is not the case for TN-C, and the simulation allows us to predict how cells explore state space differently depending on where they reside in the landscape. A practical implication of this finding is that, for cell-sorting experiments, the details of the window size and location used to select subpopulations can have a measurable influence on the apparent stability of the selected subpopulations.

Fluctuations in promoter activity, caused by binding and unbinding of effector molecules, can dominate protein number fluctuations (36). For TN-C, the dependence of the noise in GFP on the intensity of GFP expression suggests a multiplicative process in the activation of the promoter. This finding is consistent with the known characteristics of the TN-C promoter, which is representative of a class of environmentally responsive proteins (SI Text, section 6).

It is important to note that, in a population of cells that exhibits a stationary distribution of phenotypic states, no state is a permanent condition for any cell. A force constrains the state space that can be explored by cells; however, because the entire space is never explored by any individual cell during its lifetime, the Langevin dynamics of the population by necessity require mother/daughter correlations (which we have independently determined). Those multigenerational correlations are epigenetic in nature in the broadest sense; they could be caused by epigenetic modifications to the DNA or perhaps, passage of high concentrations of transcription factors from mothers to daughters.

In this work, we have correctly predicted the time dependence for relaxation of subpopulations of cells as they reestablish the steady state distribution of promoter activities. This approach will likely be applicable to other similar perturbation/relaxation scenarios involving other cellular phenotypic features. For example, the induction of differentiation of a pluripotent population of cells may be thought of as imposing, through chemical treatment, a new potential landscape of gene expression on that population. The rate at which the pluripotent population relaxes to this new potential landscape should be predictable from the landscape that is derived from the final steady state distribution and the rate of fluctuation of the phenotypic feature.
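The relaxation of a sorted subpopulation can be illustrated with a short simulation. This is a sketch under stated assumptions, not the authors' code: it integrates an overdamped Langevin equation in x = log(GFP concentration), dx = D·(d/dx) ln P(x)·dt + sqrt(2D)·dW, with the Euler–Maruyama scheme, so that the two-Gaussian distribution P(x) is the stationary state. The means and widths are the Fig. 3A values from Materials and Methods; the weights `A`, the diffusion constant `D`, the threshold, the starting window, and the time step are all hypothetical choices made for illustration.

```python
import numpy as np

MU = np.array([1.2, 2.5])        # Fig. 3A means
SIGMA = np.array([0.20, 0.58])   # Fig. 3A widths
A = np.array([0.5, 0.5])         # ASSUMPTION: equal weights
D = 0.05                         # ASSUMPTION: hypothetical diffusion constant, per hour
THRESHOLD = 1.75                 # ASSUMPTION: approximate minimum between the peaks


def drift(x):
    """Deterministic term D * d/dx ln P(x) for the two-Gaussian mixture."""
    z = (x[:, None] - MU) / SIGMA
    comp = A / (np.sqrt(2 * np.pi) * SIGMA) * np.exp(-0.5 * z ** 2)
    dp_dx = -(comp * z / SIGMA).sum(axis=1)
    return D * dp_dx / comp.sum(axis=1)


rng = np.random.default_rng(1)
n_cells, dt, n_steps = 2000, 0.05, 2000   # 100 simulated hours

# Start from a narrow window in the low state, mimicking a sorted subpopulation.
x = rng.normal(1.2, 0.05, size=n_cells)
fraction_low = [np.mean(x < THRESHOLD)]

for _ in range(n_steps):
    noise = np.sqrt(2 * D * dt) * rng.standard_normal(n_cells)
    x = x + drift(x) * dt + noise          # Euler-Maruyama update
    fraction_low.append(np.mean(x < THRESHOLD))

print("fraction in low state at start:", fraction_low[0])
print("fraction in low state at end:", fraction_low[-1])
```

Changing the center or width of the starting window changes how quickly `fraction_low` relaxes, which is the kind of dependence on sorting window discussed above.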

Materials and Methods

Cells, Cell Culture, Quantitative Imaging, and Image Analysis.

The preparation of the cell line and all cell culture, imaging, and image analysis were performed as previously published (31). NIH 3T3 mouse fibroblasts (ATCC) were transfected with a construct of a 4.1-kbp fragment of the TN-C promoter (provided by Peter L. Jones, University of Pennsylvania, State College, PA) fused to a destabilized EGFP (Clontech Laboratories). Phase contrast and fluorescence images were acquired every 15 min for >62 h in 36 different fields. Additional details can be found in SI Text, section 5.

Flow Cytometry and FACS.

Flow cytometry was performed with a Beckman Coulter Quanta SC Flow Cytometer. Volume and intensity measurements were performed simultaneously for each cell to obtain the relative GFP concentration (intensity/volume). FACS was performed using a FACSAria II Sorter (BD Biosciences). Cells were first gated on 2D forward and side scatter to exclude debris, followed by four gates based on the FITC (GFP) channel fluorescence to isolate four subpopulations. Additional details can be found in SI Text, section 5.

Steady State Distribution Fitting.

The steady state distribution of GFP concentration for cells in the population (Fig. S2A), as measured by flow cytometry, was fit to a sum of two log normals by log-transforming the data, binning, and then fitting the histogram to a sum of two Gaussians:

$$P(x) = \sum_{i=1}^{2} \frac{a_i}{\sqrt{2\pi}\,\sigma_i} \exp\!\left[-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right],$$

where x = log(intensity/volume), and a₁, a₂, μ₁, μ₂, σ₁, and σ₂ are free parameters. The fitting was performed using the nonlinear curve-fitting function lsqcurvefit in Matlab (Optimization Toolbox). The parameters a₁ and a₂ give the relative weighting of each Gaussian. The best-fit values in Fig. S2A were [values missing in source]. For Fig. 3A, the weightings were changed to be [inline expression missing in source]; therefore, [inline expression missing in source], and the parameters used were μ₁ = 1.2, μ₂ = 2.5, σ₁ = 0.20, and σ₂ = 0.58.
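The fitting procedure described above (log-transform, bin, fit a sum of two Gaussians) can be sketched in Python. This is an illustration only: the authors used lsqcurvefit in Matlab, whereas the sketch swaps in `scipy.optimize.curve_fit`, runs on synthetic data instead of flow cytometry measurements, and assumes the two weights sum to 1 so that only a₁ is fitted. The bin count, starting guess, and bounds are likewise assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit


def two_gaussians(x, a1, mu1, sigma1, mu2, sigma2):
    """Sum of two normalized Gaussians with weights a1 and 1 - a1 (assumed constraint)."""
    g1 = np.exp(-(x - mu1) ** 2 / (2 * sigma1 ** 2)) / (np.sqrt(2 * np.pi) * sigma1)
    g2 = np.exp(-(x - mu2) ** 2 / (2 * sigma2 ** 2)) / (np.sqrt(2 * np.pi) * sigma2)
    return a1 * g1 + (1 - a1) * g2


# Synthetic stand-in for log-transformed GFP concentrations (NOT measured data).
rng = np.random.default_rng(0)
n = 20000
log_gfp = np.concatenate([
    rng.normal(1.2, 0.20, size=n // 2),       # low state
    rng.normal(2.5, 0.58, size=n - n // 2),   # high state
])

# Bin the log-transformed data into a normalized histogram.
density, edges = np.histogram(log_gfp, bins=80, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Nonlinear least-squares fit; bounds keep the weight in [0, 1] and widths positive.
p0 = [0.5, 1.0, 0.3, 2.0, 0.5]
lower = [0.0, -np.inf, 1e-3, -np.inf, 1e-3]
upper = [1.0, np.inf, np.inf, np.inf, np.inf]
popt, pcov = curve_fit(two_gaussians, centers, density, p0=p0, bounds=(lower, upper))

a1_fit, mu1_fit, sigma1_fit, mu2_fit, sigma2_fit = popt
print("a1, mu1, sigma1, mu2, sigma2 =", popt)
```

Fitting a single weight instead of two independent amplitudes is a design choice for the sketch: it keeps the fitted curve normalized, which is appropriate when the histogram is built with `density=True`.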

Quantifying the Portion of Cells in Low State.

For Fig. 4 and Fig. S3, the histograms measured by flow cytometry are reduced to a single scalar: the fraction of cells in the low state, which we determine by a threshold. The threshold was chosen to be the local minimum (between the two peaks) of the distribution of log-transformed GFP concentration. The portions of cells below and above this threshold were found to approximate a₁ and a₂ from fitting well (Fig. S2A).
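The reduction of a histogram to a single low-state fraction can be sketched as follows. The example is hedged in several ways: it locates the threshold on a fitted two-Gaussian curve rather than on a raw histogram, it uses the Fig. 3A means and widths with an assumed equal weighting `a1`, and it draws synthetic samples in place of flow cytometry data.

```python
import numpy as np


def mixture(x, a1, mu1, sigma1, mu2, sigma2):
    """Two-Gaussian density in log(GFP concentration) with weights a1 and 1 - a1."""
    g1 = np.exp(-(x - mu1) ** 2 / (2 * sigma1 ** 2)) / (np.sqrt(2 * np.pi) * sigma1)
    g2 = np.exp(-(x - mu2) ** 2 / (2 * sigma2 ** 2)) / (np.sqrt(2 * np.pi) * sigma2)
    return a1 * g1 + (1 - a1) * g2


# Fig. 3A means and widths; a1 = 0.5 is an ASSUMPTION made for illustration.
a1, mu1, sigma1, mu2, sigma2 = 0.5, 1.2, 0.20, 2.5, 0.58

# Threshold: local minimum of the density between the two peaks.
grid = np.linspace(mu1, mu2, 2001)
threshold = grid[np.argmin(mixture(grid, a1, mu1, sigma1, mu2, sigma2))]

# Synthetic population (NOT measured data) drawn from the same mixture.
rng = np.random.default_rng(2)
n = 20000
n_low = rng.binomial(n, a1)
log_gfp = np.concatenate([
    rng.normal(mu1, sigma1, size=n_low),
    rng.normal(mu2, sigma2, size=n - n_low),
])

# Single scalar summary: fraction of cells below the threshold.
fraction_low = np.mean(log_gfp < threshold)
print("threshold:", threshold)
print("fraction in low state:", fraction_low)
```

With these parameters the fraction below the threshold comes out somewhat larger than a₁, because the wide high-state Gaussian has a tail that extends below the threshold; this is why the threshold count only approximates the fitted weights.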

Acknowledgments

This work was funded, in part, by a National Institute of Standards and Technology National Research Council (NRC) Postdoctoral Fellowship (to D.R.S.). Any mention of commercial products is for information only; it does not imply recommendation or endorsement by NIST.

Footnotes

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1207544109/-/DCSupplemental.

References

1. Novick A, Weiner M. Enzyme induction as an all-or-none phenomenon. Proc Natl Acad Sci USA. 1957;43(7):553–566. doi: 10.1073/pnas.43.7.553.
2. Spudich JL, Koshland DE Jr. Non-genetic individuality: Chance in the single cell. Nature. 1976;262(5568):467–471. doi: 10.1038/262467a0.
3. Newman JR, et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature. 2006;441(7095):840–846. doi: 10.1038/nature04785.
4. Sigal A, et al. Variability and memory of protein levels in human cells. Nature. 2006;444(7119):643–646. doi: 10.1038/nature05316.
5. Langenbach KJ, Elliott JT, Tona A, McDaniel D, Plant AL. Thin films of Type 1 collagen for cell by cell analysis of morphology and tenascin-C promoter activity. BMC Biotechnol. 2006;6:14. doi: 10.1186/1472-6750-6-14.
6. Spencer SL, Gaudet S, Albeck JG, Burke JM, Sorger PK. Non-genetic origins of cell-to-cell variability in TRAIL-induced apoptosis. Nature. 2009;459(7245):428–432. doi: 10.1038/nature08012.
7. Stockholm D, et al. Bistable cell fate specification as a result of stochastic fluctuations and collective spatial cell behaviour. PLoS One. 2010;5(12):e14441. doi: 10.1371/journal.pone.0014441.
8. Chang HH, Hemberg M, Barahona M, Ingber DE, Huang S. Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature. 2008;453(7194):544–547. doi: 10.1038/nature06965.
9. Raj A, van Oudenaarden A. Nature, nurture, or chance: Stochastic gene expression and its consequences. Cell. 2008;135(2):216–226. doi: 10.1016/j.cell.2008.09.050.
10. Blake WJ, Kærn M, Cantor CR, Collins JJ. Noise in eukaryotic gene expression. Nature. 2003;422(6932):633–637. doi: 10.1038/nature01546.
11. Mettetal JT, Muzzey D, Pedraza JM, Ozbudak EM, van Oudenaarden A. Predicting stochastic gene expression dynamics in single cells. Proc Natl Acad Sci USA. 2006;103(19):7304–7309. doi: 10.1073/pnas.0509874103.
12. Thattai M, van Oudenaarden A. Intrinsic noise in gene regulatory networks. Proc Natl Acad Sci USA. 2001;98(15):8614–8619. doi: 10.1073/pnas.151588598.
13. Waddington CH. The Strategy of the Genes; A Discussion of Some Aspects of Theoretical Biology. New York: Macmillan; 1957.
14. Goldberg AD, Allis CD, Bernstein E. Epigenetics: A landscape takes shape. Cell. 2007;128(4):635–638. doi: 10.1016/j.cell.2007.02.006.
15. MacArthur BD, Please CP, Oreffo RO. Stochasticity and the molecular mechanisms of induced pluripotency. PLoS One. 2008;3(8):e3086. doi: 10.1371/journal.pone.0003086.
16. Slack JM. Conrad Hal Waddington: The last Renaissance biologist? Nat Rev Genet. 2002;3(11):889–895. doi: 10.1038/nrg933.
17. Gilbert SF. Epigenetic landscaping: Waddington's use of cell fate bifurcation diagrams. Biol Philos. 1991;6(2):135–154.
18. Wang J, Zhang K, Xu L, Wang E. Quantifying the Waddington landscape and biological paths for development and differentiation. Proc Natl Acad Sci USA. 2011;108(20):8257–8262. doi: 10.1073/pnas.1017017108.
19. Sasai M, Wolynes PG. Stochastic gene expression as a many-body problem. Proc Natl Acad Sci USA. 2003;100(5):2374–2379. doi: 10.1073/pnas.2627987100.
20. Hasty J, Pradines J, Dolnik M, Collins JJ. Noise-based switches and amplifiers for gene expression. Proc Natl Acad Sci USA. 2000;97(5):2075–2080. doi: 10.1073/pnas.040411297.
21. Kim KY, Wang J. Potential energy landscape and robustness of a gene regulatory network: Toggle switch. PLoS Comput Biol. 2007;3(3):e60. doi: 10.1371/journal.pcbi.0030060.
22. Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG. Funnels, pathways, and the energy landscape of protein folding: A synthesis. Proteins. 1995;21(3):167–195. doi: 10.1002/prot.340210302.
23. Onuchic JN, Luthey-Schulten Z, Wolynes PG. Theory of protein folding: The energy landscape perspective. Annu Rev Phys Chem. 1997;48:545–600. doi: 10.1146/annurev.physchem.48.1.545.
24. Best RB, Hummer G. Coordinate-dependent diffusion in protein folding. Proc Natl Acad Sci USA. 2010;107(3):1088–1093. doi: 10.1073/pnas.0910390107.
25. Erickson HP, Bourdon MA. Tenascin: An extracellular matrix protein prominent in specialized embryonic tissues and tumors. Annu Rev Cell Biol. 1989;5:71–92. doi: 10.1146/annurev.cb.05.110189.000443.
26. Chiquet-Ehrismann R, Hagios C, Schenk S. The complexity in regulating the expression of tenascins. Bioessays. 1995;17(10):873–878. doi: 10.1002/bies.950171009.
27. Faissner A. The tenascin gene family in axon growth and guidance. Cell Tissue Res. 1997;290(2):331–341. doi: 10.1007/s004410050938.
28. Jones PL, Cowan KN, Rabinovitch M. Tenascin-C, proliferation and subendothelial fibronectin in progressive pulmonary vascular disease. Am J Pathol. 1997;150(4):1349–1360.
29. Jones FS, Jones PL. The tenascin family of ECM glycoproteins: Structure, function, and regulation during embryonic development and tissue remodeling. Dev Dyn. 2000;218(2):235–259. doi: 10.1002/(SICI)1097-0177(200006)218:2<235::AID-DVDY2>3.0.CO;2-G.
30. Cowan KN, Jones PL, Rabinovitch M. Elastase and matrix metalloproteinase inhibitors induce regression, and tenascin-C antisense prevents progression, of vascular disease. J Clin Invest. 2000;105(1):21–34. doi: 10.1172/JCI6539.
31. Halter M, et al. Cell cycle dependent TN-C promoter activity determined by live cell imaging. Cytometry A. 2011;79(3):192–202. doi: 10.1002/cyto.a.21028.
32. Risken H. The Fokker-Planck Equation: Methods of Solution and Applications. 2nd Ed. Springer, Berlin: Springer Series in Synergetics; 1989.
33. van Kampen NG. Stochastic Processes in Physics and Chemistry. 3rd Ed. Amsterdam: Elsevier; 2007.
34. Mel'nikov VI. The Kramers problem: Fifty years of development. Phys Rep. 1991;209(1–2):1–71.
35. Ertel A, Tozeren A. Switch-like genes populate cell communication pathways and are enriched for extracellular proteins. BMC Genomics. 2008;9:3. doi: 10.1186/1471-2164-9-3.
36. Walczak AM, Onuchic JN, Wolynes PG. Absolute rate theories of epigenetic stability. Proc Natl Acad Sci USA. 2005;102(52):18926–18931. doi: 10.1073/pnas.0509547102.
