Abstract
Learning the underlying details of a gene network is a major challenge in cellular and synthetic biology. We address this challenge by building a chemical kinetic model that utilizes information encoded in the stochastic protein expression trajectories typically measured in experiments. The applicability of the proposed method is demonstrated in an auto-activating genetic circuit, a common motif in natural and synthetic gene networks. Our approach is based on the principle of maximum caliber (MaxCal)—a dynamical analog of the principle of maximum entropy—and builds a minimal model using only three constraints: 1) protein synthesis, 2) protein degradation, and 3) positive feedback. The MaxCal-generated model (described with four parameters) was benchmarked against synthetic data generated using a Gillespie algorithm on a known reaction network (with seven parameters). MaxCal accurately predicts underlying rate parameters of protein synthesis and degradation as well as experimental observables such as protein number and dwell-time distributions. Furthermore, MaxCal yields an effective feedback parameter that can be useful for circuit design. We also extend our methodology and demonstrate how to analyze trajectories that are not in protein numbers but in arbitrary fluorescence units, a more typical condition in experiments. This “top-down” methodology based on minimal information—in contrast to traditional “bottom-up” approaches that require ad hoc knowledge of circuit details—provides a powerful tool to accurately infer underlying details of feedback circuits that are not otherwise visible in experiments and to help guide circuit design.
Introduction
Biological function is largely dictated by gene networks that control protein expression in single cells. Understanding details of these networks and consequently building quantitative models is essential to control gene expression and ultimately regulate cellular dynamics. However, model development has been limited due to the lack of information about the complex web of interactions (including feedback regulation) that defines these networks. Typical experiments only provide partial information by measuring the expression levels of one or two proteins of interest using fluorescent tags, much less than the actual number of entities (mRNAs, promoters, nucleotides, and amino acids) involved in the process of gene expression. This problem of partial information is a key challenge for model building. Although the number of species monitored is limited, experimental read-outs contain crucial information, as they record the entire time trajectory of fluctuating protein expression levels. The stochastic nature of the trajectories is due to small copy numbers of molecules involved in these reactions (1, 2, 3, 4, 5, 6, 7, 8, 9). The details of noise statistics encode the details of network architecture. This provides a potentially useful avenue for inferring details of network architecture by analyzing noisy protein expression levels (10, 11, 12, 13, 14, 15). Despite realizing the power of this approach (10, 12, 14, 15), such efforts are still in their infancy. Existing models are either too simple, with limited single-cell-level predictive power, or too detailed, requiring too many unknown parameters (16). The most common stochastic approaches first define sets of reaction networks to be simulated using a Gillespie algorithm (17) or related methods and then fit different observables to determine the corresponding reaction rate parameters. A major drawback of these methods is that they are “bottom-up” and require detailed knowledge of the underlying reaction network. 
This is particularly challenging when networks involve feedback, a common feature in many natural networks and synthetic biology. It is currently impossible to test many of these ad hoc assumptions independently. Furthermore, these approaches can involve too many parameters that can fit the same data with multiple models, creating additional challenges for efficient parameter estimation (11). The challenge of having too many parameters is also problematic for circuit design (18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30), as it requires ways to efficiently explore parameter space to test different models, thus demanding models with the least possible number of parameters.
To circumvent these obstacles, we propose a “top-down” approach for modeling these networks. We use the principle of maximum caliber (MaxCal) to model stochastic trajectories with minimal information. We show the application of MaxCal on a simple auto-activating circuit, a common motif in many biological circuits (31). MaxCal maximizes path entropy subject to constraints, similar to maximum entropy on state space, and directly works with path trajectories. This makes MaxCal directly applicable to experimentally measured time trajectories of protein numbers. We establish the methodology on synthetic data generated using Gillespie simulations (17) of a known auto-activating circuit. These trajectory data serve as the input data—a proxy for experimental data—for MaxCal. The minimal model of MaxCal is then applied to the raw trajectory statistics in conjunction with maximum likelihood (ML) to determine representative parameters for the model. These parameters can predict other statistics of the data and quantitatively infer several underlying physical variables that are not visible otherwise. In the next section, we first describe the synthetic circuit and generation of in silico data that mimic experimental data. Next, we introduce MaxCal and its specific application to the circuit. We show how MaxCal, along with ML, can be used to infer model parameters and make predictions. Comparing these predictions against the known model allows us to benchmark the predictive capabilities of MaxCal. Finally, we discuss how the methodology can be applied when the input data are not in protein number but in arbitrary fluorescence, a common challenge in interpreting experimental data.
Materials and Methods
Generating synthetic data for an auto-activating circuit
Considering the complexity of natural networks with many unknown or incompletely understood interactions, synthetic biologists are building mimics of frequently occurring parts of bigger networks, called network motifs (32, 33, 34). One natural network motif with important biological function that has inspired the design of many synthetic gene circuits is feedback regulation. Our previous work (35) has demonstrated the application of MaxCal on double-negative (overall positive) feedback circuits, where two genes mutually repress each other, commonly referred to as a toggle-switch circuit (36, 37). Here, we consider a positive feedback circuit where a single gene auto-activates itself. As a proof of concept, we apply MaxCal to synthetic data generated in silico using a model for which the underlying parameters are known. This will serve as a proxy for experimental data and provide us with a gold standard to which we can compare when demonstrating how well MaxCal performs given stochastic trajectories. Among several models of auto-activation in different biological contexts (38, 39, 40, 41, 42, 43, 44, 45), we adopt the one below (Eq. 1), studied by Kepler and Elston (46), to generate stochastic synthetic data that will serve to mimic experimental time traces:
(1) |
In this scheme, a generic protein, A, is created from its corresponding gene, α, at a rate g and degrades at a rate r. Two copies of A dimerize reversibly, with separate forward and backward rates. The dimer can then bind to and unbind from the promoter site of α, each at its own rate, sending α into or out of its activated state. In this activated state, the gene creates protein A at a much faster rate than g, capturing the essentials of a positive-feedback mechanism. Rates are chosen to produce switching times that are representative of experiments (31) while keeping protein synthesis and degradation rates within the range of typical values (47). A Gillespie algorithm (17) was used to generate stochastic trajectories of protein (A) levels, as shown in Fig. 1 A. Three major features are worth noting: 1) two clearly separated high and low states, 2) large fluctuations within each state, and 3) stochastic switching between the two states. In the next section, we first attempt to reproduce these three basic features in MaxCal using as simple a framework as possible.
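The reaction scheme described above can be sketched as a small Gillespie simulation. This is only an illustrative sketch: the rate names (`k_dim`, `k_undim`, `k_on`, `k_off`, `g_star`) and all numerical values are hypothetical placeholders, not the rates used in this work, and the recorded read-out is assumed to be the monomer count of A sampled at fixed frames.

```python
import math
import random

# HYPOTHETICAL rate constants (1/s), chosen only so that the sketch runs quickly.
HYPOTHETICAL_RATES = {
    "g": 0.1, "g_star": 1.0, "r": 0.005,
    "k_dim": 0.001, "k_undim": 0.1, "k_on": 0.01, "k_off": 0.01,
}

def simulate_frames(rates, n_frames, dt, seed=0):
    """Gillespie simulation; returns the number of A monomers at frames 0, dt, 2*dt, ..."""
    rng = random.Random(seed)
    n_a, n_a2, active = 0, 0, False
    t, frames = 0.0, []
    while len(frames) < n_frames:
        props = [
            rates["r"] * n_a,                            # 0: degradation of A
            rates["k_dim"] * n_a * (n_a - 1) / 2.0,      # 1: dimerization
            rates["k_undim"] * n_a2,                     # 2: dimer dissociation
            0.0 if active else rates["k_on"] * n_a2,     # 3: dimer binds promoter
            rates["k_off"] if active else 0.0,           # 4: dimer unbinds promoter
            rates["g_star"] if active else rates["g"],   # 5: synthesis (always > 0)
        ]
        tau = -math.log(1.0 - rng.random()) / sum(props)
        # The state is constant until the next event, so record every frame
        # whose time falls before that event.
        while len(frames) < n_frames and len(frames) * dt < t + tau:
            frames.append(n_a)
        t += tau
        reaction = rng.choices(range(6), weights=props)[0]
        if reaction == 0:
            n_a -= 1
        elif reaction == 1:
            n_a, n_a2 = n_a - 2, n_a2 + 1
        elif reaction == 2:
            n_a, n_a2 = n_a + 2, n_a2 - 1
        elif reaction == 3:
            n_a2, active = n_a2 - 1, True
        elif reaction == 4:
            n_a2, active = n_a2 + 1, False
        else:
            n_a += 1
    return frames
```

Calling `simulate_frames(HYPOTHETICAL_RATES, n_frames=2000, dt=300.0)` would give one trajectory of the length discussed later in the text; whether it is bimodal depends entirely on the placeholder rates.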
MaxCal model for auto-activating circuit
Maximum caliber is a variational principle that gives a prescription for inferring dynamics by maximizing the path entropy (35, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58), or caliber, subject to known constraints enforced via Lagrange multipliers. For the gene circuit of interest, there are three minimal constraints that must be in place: 1) protein synthesis, 2) protein degradation, and 3) auto-activation/positive feedback. We enforce the first two by constraining the average number of proteins that are created in a discrete time interval as well as the average number of proteins that are destroyed (35, 55). To do this, we define a production-state variable, which describes the number of proteins created in the time interval and takes integer values between zero and some predefined maximal value, M. We also define a degradation-state variable, which describes the number of previously existing proteins that still exist at the end of the time interval. Clearly, this variable takes integer values between zero and the number of proteins present at the beginning of the time interval. Each of these two constraints has its own Lagrange multiplier, and the caliber is written in terms of the probability of observing a particular combination of the two state variables. Next, we implement the constraint of positive feedback, the idea that a high number of proteins should positively correlate with the production of A. This is done by introducing a third Lagrange multiplier that enforces a coupling between protein production and the presence of proteins by constraining the average of the product of the two state variables. This is the lowest-order term in the coupling of these two variables that must be imposed to capture the essence of feedback. Similar arguments were used to build models to describe negative feedback in toggle-switch circuitry (35). The four basic ingredients of the model, described above, yield the caliber as
(2) |
and the corresponding caliber-maximized path probabilities are
(3) |
Using this path probability distribution, stochastic trajectories are generated using a Monte Carlo method to select a path for each time point. The system then creates and retains the numbers of proteins given by the production- and degradation-state variables of the selected path, and the time of the system advances by the predetermined time interval. A quick search of the parameter space (the three Lagrange multipliers and M) reveals that even with just these four parameters, bimodal behaviors like the ones seen in the self-promotion circuit of the previous section (characterized with seven parameters in Fig. 1 A) can be reproduced (see Fig. 1 B). For efficient computation, protein number probability distributions are generated using the method of finite-state projection (FSP) of Munsky and Khammash (59). This method is needed to provide a systematic way to truncate the infinite space of possible states, since protein number does not have an upper bound. FSP provides a rigorous self-consistent approach to ensure that the truncation error is within a predetermined error bound (see Supporting Material for exact application).
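The Monte Carlo step described above can be illustrated with a short sketch. It assumes that the caliber-maximized path probability has the usual exponential form in the three Lagrange multipliers, weighted by a binomial multiplicity for which of the existing proteins survive; this functional form is an assumption made for illustration, and the names `h_prod`, `h_deg`, `k_fb`, and `m_max` are hypothetical labels for the four model parameters.

```python
import math
import numpy as np

def path_probabilities(n, h_prod, h_deg, k_fb, m_max):
    """Return P[l_prod, l_surv] for a frame that starts with n proteins.

    ASSUMED form: weight = C(n, l_surv) * exp(h_prod*l_prod + h_deg*l_surv
                                               + k_fb*l_prod*l_surv).
    """
    l_prod = np.arange(m_max + 1)[:, None]   # proteins created, 0..M
    l_surv = np.arange(n + 1)[None, :]       # proteins surviving, 0..n
    log_binom = np.array([
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        for k in range(n + 1)
    ])[None, :]
    log_w = log_binom + h_prod * l_prod + h_deg * l_surv + k_fb * l_prod * l_surv
    log_w -= log_w.max()                     # stabilize the exponentials
    w = np.exp(log_w)
    return w / w.sum()

def maxcal_step(n, h_prod, h_deg, k_fb, m_max, rng):
    """Draw one path and return the protein number at the end of the frame."""
    p = path_probabilities(n, h_prod, h_deg, k_fb, m_max)
    flat = rng.choice(p.size, p=p.ravel())
    l_prod, l_surv = divmod(int(flat), p.shape[1])
    return l_prod + l_surv
```

Repeated calls to `maxcal_step`, feeding each output back in as the next `n`, generate a trajectory whose frame spacing is the chosen time interval.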
Furthermore, the two state variables directly relate to effective protein synthesis and degradation rates analogous to the basal synthesis rate g, the activated synthesis rate, and the degradation rate r in the auto-activation circuit (Eq. 1). Specifically,
(4) |
where the low- and high-state reference values are the peak values of the number of proteins in the low and high states, respectively, and the averages are weighted by the probability of having N proteins within the system (calculated via FSP).
An additional metric that could be of interest in genetic circuit design is the effective feedback metric, F. We define F as the average Pearson correlation coefficient between the production- and degradation-state variables:
(5) |
where the denominator contains the standard deviations of the production- and degradation-state variables. The averages (in the numerator) and standard deviations (in the denominator) are first evaluated for a given N, and then the ratio is further averaged over the protein number distribution to yield the effective feedback, F. This parameter is restricted between −1 and 1, which provides a way to quantify the relative feedback within the system and will help to objectively compare two independent gene circuits. Although in the application presented here we expect F > 0, we anticipate F < 0 when describing negative-feedback circuits.
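As an illustration of the averaging just described, the sketch below computes a Pearson correlation for each protein number N from a joint distribution over the two state variables and then averages it over the protein number distribution. The function names are hypothetical, and returning zero when one variable has no variance (for example at N = 0, where nothing can survive) is an assumed convention.

```python
import numpy as np

def pearson_from_joint(p):
    """Pearson correlation between the row index and column index of a joint pmf p."""
    rows = np.arange(p.shape[0])[:, None]
    cols = np.arange(p.shape[1])[None, :]
    mean_r = (rows * p).sum()
    mean_c = (cols * p).sum()
    cov = ((rows - mean_r) * (cols - mean_c) * p).sum()
    sd_r = np.sqrt((((rows - mean_r) ** 2) * p).sum())
    sd_c = np.sqrt((((cols - mean_c) ** 2) * p).sum())
    if sd_r == 0 or sd_c == 0:
        return 0.0  # ASSUMED convention when the correlation is undefined
    return float(cov / (sd_r * sd_c))

def effective_feedback(joint_by_n, p_n):
    """Average the per-N correlation over the protein number distribution p_n."""
    return sum(p_n[n] * pearson_from_joint(joint_by_n[n]) for n in range(len(p_n)))
```

Here `joint_by_n[n]` would be the path probability table for a frame starting with `n` proteins, and `p_n` the stationary protein number distribution.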
Parameter estimation via maximum likelihood
The exercise above ensures that the minimal model of MaxCal with only four parameters is capable of producing the general features of a bimodal system. Next, we proceed to benchmark the performance of the model quantitatively when given a particular stochastic trajectory to characterize. This will allow us to learn about quantitative details of the underlying network by decoding information hidden in the noisy raw trajectory. For example, we may be interested in inferring the effective synthesis/degradation rates or the degree of feedback (F), quantities that are not directly available from the raw experimental trajectory. Below, we provide the framework to quantitatively infer these specific characteristics of a network from the stochastic trajectory.
Consider an experimentally observed trajectory of sufficiently long time, T, expressed in units of the typical timescale used for sampling the data. In this intrinsic time unit, the trajectory consists of a sequence of frames at which the protein number has been recorded. Now consider a particular transition between two subsequent frames, say t and t + 1, in which the protein number changed from i to j. The probability of this one-step (single-frame) transition can be determined from MaxCal as
(6) |
where δ is the Kronecker delta (the state variables are integers) and the path probabilities are functions of the Lagrange multipliers, described by Eq. 3. The likelihood of observing the experimental trajectory given a specific set of MaxCal parameters (the three Lagrange multipliers and M) can then be calculated as
(7) |
where the first product runs over all one-step transitions in the trajectory, each evaluated at the protein numbers present in consecutive frames, and the second product is over all possible transitions between different values of i and j. As outlined above (Eq. 6), the one-step transition probabilities are determined using MaxCal; hence, the likelihood is a function of the three Lagrange multipliers and M. Thus, we can maximize the likelihood of the trajectory to select these four parameters.
Experiments (and our Gillespie simulations) have no upper limit on production analogous to M in MaxCal. Rare fluctuations leading to unusually large jumps in protein number (>M) in one time step have zero probability under the model and therefore severely penalize the likelihood of parameter values that are otherwise most likely. This discontinuous jump in likelihood will erroneously eliminate the most likely set of parameters. We avoid this problem by calculating transition probabilities over multiple intervals (m frames) for a given set of MaxCal parameters, using the probability of a multi-step (multiple-frame) transition in place of the one-step probability. This slightly modifies our likelihood function as
(8) |
where the number of frames divided by m is rounded down to the nearest integer and gives the total number of transitions over m frames. An objective choice of m can be provided by using the average residence times (in frames) in the high and low states. However, our result is not sensitive to the choice of m and is robust for a range of values around the typical value.
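The multi-frame likelihood can be sketched as follows, assuming the one-step transition probabilities have already been assembled into a square matrix over protein numbers (for instance, truncated as in FSP). The function name and the choice of non-overlapping strides of m frames are assumptions made for illustration.

```python
import numpy as np

def log_likelihood(traj, trans, m=1):
    """Log-likelihood of an integer trajectory under one-step matrix `trans`,
    evaluated on transitions taken every m frames."""
    trans_m = np.linalg.matrix_power(np.asarray(trans, dtype=float), m)
    sub = np.asarray(traj)[::m]              # keep every m-th frame
    probs = trans_m[sub[:-1], sub[1:]]       # m-step probability of each jump
    with np.errstate(divide="ignore"):       # log(0) -> -inf for impossible jumps
        return float(np.log(probs).sum())
```

With m = 1, a single jump that the model assigns zero probability drives the result to negative infinity, which is the discontinuity discussed above; with larger m the same jump can usually be reached through intermediate states.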
Dealing with experimental data
Although the procedure above is applicable to synthetic data in terms of protein number, typical experimental read-outs are in arbitrary fluorescence units. Furthermore, the amount of fluorescence measured per protein is noisy and requires one to deconvolve fluorescence fluctuations from protein number fluctuations. To mimic typical experimental readouts with these challenges, we use the same synthetic data from the auto-activating circuit, but “corrupt” it to create a fluorescence trajectory in silico that is likely to be observed in an experiment. We assume the probability distribution of fluorescence intensity (I) measured per protein to be a Gaussian distribution (60, 61, 62) centered at a with a standard deviation of b, i.e., with mean a and variance b². With this assumption, the fluorescence measured from N proteins follows a probability distribution that is a convolution of N single-protein fluorescence distributions, which is a Gaussian distribution with mean Na and variance Nb². To “corrupt” simulated trajectories of protein numbers, we select a fluorescence for each time point from this distribution, where the mean and variance depend on the protein number, N. Although the procedure described here assumes that the fluorescence per protein follows a Gaussian distribution, we used a similar approach for Γ distributions (63, 64) as well.
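A minimal sketch of this corruption step, assuming the Gaussian case, draws each frame's fluorescence from a normal distribution with mean Na and standard deviation b√N. The function name is hypothetical.

```python
import numpy as np

def corrupt_with_fluorescence(counts, a, b, rng):
    """Turn protein numbers into noisy fluorescence values.

    Each frame with N proteins gets a draw from Normal(mean=N*a, var=N*b**2),
    the sum of N independent per-protein Gaussians.
    """
    counts = np.asarray(counts, dtype=float)
    return rng.normal(loc=a * counts, scale=b * np.sqrt(counts))
```

A frame with N = 0 has zero variance and therefore always reads exactly zero in this sketch; any background signal would have to be modeled separately.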
With this “synthetic fluorescence trajectory” closely mimicking realistic experimental situations, we propose two strategies to infer the underlying model. In the first strategy, we assume the average fluorescence intensity per protein is known, possibly obtained by carrying out low-intensity photobleaching (65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77). We use a as a conversion factor to determine protein number (N) from the fluorescence intensity, f, as
$$N = \mathrm{Int}\left(\frac{f}{a}\right) \qquad (9)$$
where the Int function yields the nearest integer with negative protein numbers being rounded to zero. Parameter estimation then proceeds in the same fashion as before when analyzing trajectories in terms of protein number over time. In this strategy, parameter estimation takes place in two steps serially: first, fluorescence to number conversion (using Eq. 9), and then MaxCal with ML (as described earlier). We call this method serial fluorescence-to-number conversion, or simply SFNC.
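The conversion step of SFNC amounts to dividing by the mean fluorescence per protein, rounding to the nearest integer, and clipping negative values to zero, as in this sketch. The function name is hypothetical; note that `np.rint` rounds exact halves to the nearest even integer, which may differ from other rounding conventions.

```python
import numpy as np

def fluorescence_to_number(f, a):
    """Convert fluorescence to protein number: nearest integer of f/a, floored at zero."""
    n = np.rint(np.asarray(f, dtype=float) / a).astype(int)
    return np.maximum(n, 0)
```

The resulting integer trajectory can then be passed to the same likelihood maximization used for protein-number data.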
We propose a second strategy in which the fluorescence fluctuation is included when calculating the likelihood of a set of MaxCal parameters. In this second approach (termed parallel FNC, or PFNC), we assume that the variance—in addition to the average—in intensity fluctuation per protein is also known, i.e., both a and b are given. This can be obtained using the same photobleaching experiment mentioned above to measure the probability distribution of fluorescence per protein (65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77). With this information, we can incorporate the fluorescence distribution in the likelihood function (Eq. 8), modifying it to
(10) |
where the fluorescence at frame t enters through the conditional probability that a given number of proteins is present given that fluorescence measurement. These conditional probabilities follow from the known mean and variance of the fluorescence per protein. The transition probabilities are determined as above using MaxCal and are functions of the Lagrange multipliers and M. The new likelihood function (Eq. 10) is then maximized to determine the three Lagrange multipliers and M.
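One way to obtain the conditional probability of a protein number given a fluorescence value is Bayes' rule with the Gaussian fluorescence model. The sketch below is only an illustration under stated assumptions: the prior over N is flat unless one is supplied, N is truncated at a hypothetical `n_max`, and N = 0 is given the one-protein variance to avoid a zero-width Gaussian. How these probabilities enter the full likelihood is not shown here.

```python
import numpy as np

def number_posterior(f, a, b, n_max, prior=None):
    """P(N | f) for N = 0..n_max under Normal(N*a, N*b**2) fluorescence."""
    n = np.arange(n_max + 1)
    var = np.maximum(n, 1) * b ** 2          # ASSUMPTION: N = 0 uses variance b**2
    log_like = -0.5 * (f - n * a) ** 2 / var - 0.5 * np.log(2 * np.pi * var)
    if prior is None:
        log_prior = np.zeros(n_max + 1)      # ASSUMPTION: flat prior over N
    else:
        log_prior = np.log(np.asarray(prior, dtype=float))
    log_post = log_like + log_prior
    log_post -= log_post.max()               # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

When b is small relative to a, this distribution is sharply peaked near f/a and the parallel approach reduces in practice to the serial conversion.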
Results and Discussion
MaxCal accurately infers underlying rate parameters
Using the procedures described above, we determine the three Lagrange multipliers and M for a given stochastic trajectory in terms of either protein number or fluorescence readout. These fully specify the minimal MaxCal model and are capable of making multiple predictions, such as of the underlying rate parameters. Effective values for the underlying production and degradation rates can be predicted using the average value of the production- and degradation-state variables, respectively (see Eq. 4). To see how well these inferred rates compare to the true values, we applied our inference method to input trajectories that are ∼2000 frames long and sampled every 5 min, equivalent to trajectories of about 7 days. Furthermore, we used 100 such trajectories, equivalent to tracking protein numbers in 100 cells. These numbers were chosen to closely match typical experimental conditions (31). To quantify the variance of the effective rate estimates, we apply our method to 10 different sets of these simulations and present the average and standard deviation of the 10 sets of predicted rates. Using simulations from the reaction rates listed in Table 1, the predicted values compare well against the “true” values used to generate the synthetic data (see Table 1). The robustness of the prediction was further tested by creating synthetic data using different values of the synthesis and degradation rates, and similar accuracies were produced. In addition, the inference scheme was applied to synthetic data generated from an alternate model of positive feedback (different from Eq. 1), and again, the inferred rates matched well with input values (see Supporting Material for details). However, it is important to realize that Eq. 4 is only an approximation to infer the intrinsic production and degradation rates. Thus, it is possible to have deviations between the inferred and true rates (higher than the ones reported in Table 1) even while MaxCal captures the temporal statistics well (e.g., fluctuations in the high/low states and transitions between states).
Table 1.
Quantity | True Values | Predicted Values |
---|---|---|
(s) | ||
(s) | ||
(bits) | 8.84 | 9.24 ± 0.03 |
(bits) | 9.42 | 9.07 ± 0.02 |
(bits) | 6.20 | 7.66 ± 0.04 |
(bits) | 1.03 | 1.02 ± 0.01 |
The first column reports the “true” underlying protein synthesis and degradation rates used to create synthetic input data, average residence times in the high and low states, and corresponding path informational entropies. Synthetic input data were recorded at a fixed sampling interval. The second column reports the average and standard deviation of the same quantities of interest, but extracted using the MaxCal model on 10 sets of synthetic data, each consisting of 100 trajectories of 7 days.
Distributions predicted from MaxCal agree well with data
For a more detailed demonstration of how well MaxCal describes data, we further compared MaxCal-predicted distributions to that of the input data (generated from the reaction network in Eq. 1). Fig. 2 A shows that the protein number distribution predicted from MaxCal agrees well with the input data in that the locations and widths of the two peaks are comparable between the two approaches. Next, we compare the distribution of dwell times predicted by MaxCal to that obtained from the synthetic data. The agreement for the shape of the distribution and the average dwell times in the low and high states (see Fig. 2, B and C; Table 1) are reasonable.
The comparisons between “true” and predicted values for multiple observables show that the minimal model of MaxCal with only four parameters can make reasonable predictions for data generated with more complex models (with seven parameters). To further quantify the quality of the parameter extraction and performance of our minimal model against those of the actual model with more parameters, we compare the informational content in the “synthetic” Gillespie trajectories and trajectories generated by MaxCal using these parameters. We compute path informational entropy as (78)
(11) |
where the first factor is the probability of having i proteins in the system and the second is the probability of transitioning from i proteins to j proteins after a single frame. If our MaxCal model is too simple and cannot adequately capture the dynamics of the Gillespie trajectories used, its path entropy will be notably different from that of the Gillespie model. We find that the MaxCal model selected by ML has only a 4.5% difference in path informational entropy compared to the “synthetic” input data from Gillespie simulations (see Table 1). This provides quantitative verification that the minimal constraints used in Eq. 2 are sufficient to describe the auto-activating circuit modeled here. The overall path entropy has contributions from three types of fluctuations: 1) within the high state, 2) within the low state, and 3) transitions between the high and low states. To further explore how MaxCal-generated path entropy captures details of these fluctuations, we compute three additional path entropies: one for the high state, one for the low state, and one for switching between the two. The high- and low-state entropies are computed in the same fashion as the overall path entropy, but only consider parts of the trajectory in the high state and low state, respectively (see Supporting Material for high/low state assignment). To measure the switching entropy, the trajectory is first coarse-grained into a binary trajectory between the low state and the high state; the entropy is then calculated in the same manner as Eq. 11. We find that the MaxCal-generated estimates of two of these entropies are in excellent agreement with the input data, whereas the third differs by ∼24% from the input (see Table 1). The analysis above provides a quantitative measure of performance for MaxCal with a given set of constraints. These measures can be further used to determine the need for incorporating higher-order combinations of the state variables into the caliber function (Eq. 2) to develop models of higher complexity (51).
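For a recorded trajectory, a path entropy of this kind can be estimated from empirical transition counts. The sketch below assumes the standard Markov-chain form, minus the sum over i of the occupancy of i times the sum over j of p(i→j) log₂ p(i→j), with base-2 logarithms to give bits; the exact form of Eq. 11 is assumed rather than quoted, and the function name is hypothetical.

```python
import numpy as np

def path_entropy_bits(traj):
    """Estimate -sum_i p_i sum_j p_ij log2 p_ij from one integer trajectory."""
    traj = np.asarray(traj)
    n_states = int(traj.max()) + 1
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (traj[:-1], traj[1:]), 1)      # tally i -> j transitions
    row_tot = counts.sum(axis=1, keepdims=True)
    p_i = row_tot[:, 0] / row_tot.sum()              # occupancy of starting states
    with np.errstate(divide="ignore", invalid="ignore"):
        p_ij = np.where(row_tot > 0, counts / row_tot, 0.0)
        terms = np.where(p_ij > 0, p_ij * np.log2(p_ij), 0.0)
    return float(-(p_i[:, None] * terms).sum())
```

Applying the same function to a trajectory coarse-grained to 0 (low) and 1 (high) gives the corresponding switching entropy.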
MaxCal provides an effective feedback parameter for the circuit
We also extract the effective feedback parameter, F, using Eq. 5. As a demonstration of its usefulness, if we compared the MaxCal parameters extracted from experimental traces with varying concentrations of inducer (31), the effective production and degradation rates might be similar, but F would be expected to vary with different amounts of inducer, representing the degree of coupling between the production of A and the concentration of A. To mimic the effect of varying inducer concentrations, we generated synthetic data with higher or lower promoter binding rates to effectively increase or decrease the amount of self-promotion in the system. Next, we applied our MaxCal framework to these trajectories with different levels of self-promotion. Fig. 3, A–C, shows that MaxCal reproduces comparable protein number distributions regardless of the degree of self-promotion. Table 2 further demonstrates that although MaxCal infers very similar production and degradation rates between the three levels of self-promotion, the effective feedback, F, changes accordingly.
Table 2.
Extracted g | Extracted | Extracted r | F | |
---|---|---|---|---|
Each row reports the average and SD of the extracted production and degradation rates as well as the effective feedback, F, for different values of the promoter binding rate. Similar to Fig. 2, all other rates were the same for all three cases, and 10 sets of synthetic data, each equivalent to 100 trajectories of 7 days, were used to extract predicted values and standard deviations.
Estimating the effective feedback parameter can be important, as it determines the onset of bimodality from unimodality as well as the relative population in the high and low states. Bimodal protein distributions and stochastic switching between the two states often dictate phenotypic variability, a characteristic of bet-hedging strategies used by microbes to evade stress such as antibiotic (31, 79, 80). Consequently, different strains that have evolved under different selection pressures may differentially tune their level of feedback (81). Similarly, it may be interesting to see whether strains using “resistance” or “tolerance” mechanisms to evade antibiotics (82) evolve their feedback parameters differently. Applying MaxCal on experimental trajectories of different strains evolved under different conditions to infer these feedback parameters can give us further insights into evolvability and selection. Similarly, this metric can be useful when describing circuits with negative feedback as well.
The ability to extract an effective feedback parameter is a special feature of MaxCal that provides a coarse-grained description of feedback. This is in contrast to traditional parameterization schemes that invoke auxiliary species and multiple reactions involving many parameters to describe feedback. As a result, MaxCal can provide a model with fewer parameters compared to traditional bottom-up approaches. This is true even when describing circuits with multiple species beyond the single-gene expression circuit used in this study (35, 55). The success of MaxCal presented here motivates the need for future studies on synthetic data generated using more intermediate steps, such as RNA synthesis before protein synthesis. Further research must also be performed on circuits involving more species that mutually regulate each other, possibly leading to oscillatory behaviors as in the repressilator circuit of Elowitz and Leibler (83). It is also important to note that MaxCal is exactly equivalent to the master equation when describing systems without feedback, e.g., biochemical cycles where states interconvert among themselves (52, 55, 84).
MaxCal can be applied when dealing with noisy fluorescence trajectories
The results above illustrate the applicability of MaxCal when experimental trajectories are expressed in protein-number fluctuations. We now proceed to demonstrate the applicability of MaxCal when data are reported in noisy fluorescence trajectories instead of protein-number trajectories. We use both methodologies, SFNC and PFNC, as described earlier, to infer the underlying model from the noisy data. Fig. 4 and Table 3 show the performance of these strategies tested against “corrupted” synthetic data. SFNC, based on only the knowledge of average intensity per protein, performs well when the fluctuation in fluorescence per protein is sufficiently small compared to the average fluorescence (see Table 3). However, SFNC starts to deviate significantly from the “true” values when noise increases, e.g., when noise is >100% of the average fluorescence. In fact, at these levels of noise, it becomes increasingly difficult to identify a unique maximum of the likelihood function, and the corresponding values of rate parameters start to deviate substantially from the true values. Considering this deficiency, SFNC should not be used for noise levels >100%. PFNC, on the other hand, does not suffer from any such issues. PFNC infers rates with reasonable accuracy even when noise is as high as 200% (see Table 3, bottom rows). The success of PFNC is further demonstrated by comparing “true” and predicted distributions of protein numbers and dwell times (see Fig. 4) at this level of noise. PFNC performs better than SFNC due to the incorporation of fluorescence fluctuation within its ML procedure. Although the above results were extracted from data using a Gaussian fluorescence distribution, we carried out similar exercises using a Γ distribution for the fluorescence per protein (63, 64), with similarly accurate results.
This highlights the need for carrying out controlled photobleaching experiments to learn about the average as well as the noise in the fluorescence per protein to faithfully infer underlying dynamics. In summary, the exercise above demonstrates broad applicability of MaxCal, even when experimental data are not in protein number but in fluorescence with high fluctuation.
Table 3.
Noise | ||||
---|---|---|---|---|
True | ||||
0% | MaxCal | |||
50% | SFNC | |||
50% | PFNC | |||
100% | SFNC | |||
100% | PFNC | |||
150% | PFNC | |||
200% | PFNC |
The first row reports the “true” underlying protein synthesis and degradation rates used to create synthetic input data (same rates and conditions as in Table 1). The second row reports the average and standard deviation of MaxCal-inferred rates when trajectories are in protein number. Rows 3–8 report extracted rates for synthetically corrupted trajectories generated using different levels of noise in fluorescence per protein compared to the average (indicated in column 1) and different methods of extraction (SFNC and PFNC, as indicated in column 5).
Conclusions
We use the principle of maximum caliber (MaxCal)—akin to the principle of maximum entropy applied to describe path probabilities—to model protein-number fluctuations as observed in genetic circuits. We demonstrate the application of MaxCal in a positive feedback circuit, a common motif in many naturally occurring and synthetic circuits. Specifically, we consider a single-gene auto-activating circuit where a minimal model based on MaxCal was developed with three physical constraints: protein synthesis, protein degradation, and positive feedback. Through this analysis, we make four key conclusions. First, the minimal model is capable of producing the switch-like behavior of the circuit. Second, the model shows its usefulness to quantitatively infer underlying parameters. To mimic raw data from experiment, synthetic data were generated using a Gillespie algorithm with a known reaction network model to produce trajectories of fluctuating protein numbers. MaxCal correctly infers underlying rates when compared to the “known” values. Furthermore, MaxCal-predicted distributions agree well with the ones derived from the input data. Third, MaxCal provides an effective feedback parameter to characterize these circuits that can be useful for circuit design as well as analysis of differently evolved strains. Finally, we show how similar methods can be applied when the raw trajectory is in fluorescence rather than protein number, a typical attribute of experimental data. We demonstrate this by “corrupting” the same synthetic protein number trajectories with Gaussian fluctuation to create noisy fluorescence trajectories. In the regime of low fluorescence noise, the average fluorescence per protein can be used to convert traces back to protein number, followed by MaxCal to infer the model (SFNC). 
However, higher levels of noise require a more integrated approach (PFNC), where a model’s likelihood is calculated by combining both MaxCal-generated transition probabilities and fluorescence fluctuation. Using fluorescence-corrupted trajectories, we show that PFNC can infer underlying rates and distributions of observables even when the relative noise is fairly high. The method presented here demonstrates the potential application of MaxCal to broader problems in gene networks involving feedback, even when data are presented in fluorescence.
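The fluorescence "corruption" and the simple conversion back to protein number can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each protein contributes an independent Gaussian brightness, so that n proteins give a Gaussian signal with mean n·f_mean and standard deviation sqrt(n)·rel_noise·f_mean; the function names, the parameter values, and the example counts are hypothetical. Only the first step of the low-noise route is shown (dividing by the average fluorescence per protein); the MaxCal inference that follows it, and the likelihood-based PFNC, are not reproduced here.

```python
import random


def corrupt_with_fluorescence(protein_counts, f_mean, rel_noise, rng):
    """Turn a protein-number trajectory into a noisy fluorescence trajectory.

    Assumption (not taken from the article): each protein has an independent
    Gaussian brightness with mean f_mean and standard deviation
    rel_noise * f_mean, so n proteins give a Gaussian total with mean
    n * f_mean and standard deviation sqrt(n) * rel_noise * f_mean.
    """
    trace = []
    for n in protein_counts:
        if n == 0:
            trace.append(0.0)  # no proteins, no signal in this toy model
            continue
        sigma = rel_noise * f_mean * n ** 0.5
        trace.append(rng.gauss(n * f_mean, sigma))
    return trace


def naive_counts(fluorescence, f_mean):
    """Divide by the average fluorescence per protein and round to a count."""
    return [max(0, round(f / f_mean)) for f in fluorescence]


def mismatch_fraction(true_counts, recovered_counts):
    """Fraction of time points at which the recovered count is wrong."""
    wrong = sum(t != r for t, r in zip(true_counts, recovered_counts))
    return wrong / len(true_counts)


rng = random.Random(0)                      # fixed seed for reproducibility
counts = [0, 3, 10, 25, 40, 40, 12, 1]      # hypothetical protein numbers
f_mean = 100.0                              # hypothetical brightness (a.u.)

low = naive_counts(corrupt_with_fluorescence(counts, f_mean, 0.01, rng), f_mean)
high = naive_counts(corrupt_with_fluorescence(counts, f_mean, 0.5, rng), f_mean)

print("low-noise mismatch fraction: ", mismatch_fraction(counts, low))
print("high-noise mismatch fraction:", mismatch_fraction(counts, high))
```

At 1% relative noise the rounding step recovers the original counts, whereas at 50% relative noise the rounded counts can differ from the true ones, which is the situation in which treating the fluorescence noise explicitly in the likelihood becomes necessary.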
Author Contributions
T.F., G.B., and K.G. designed research. T.F. performed research. T.F. and K.G. analyzed data, and T.F., G.B., and K.G. wrote the article.
Acknowledgments
We thank Steve Pressé, Brian Munsky, Tamás Székely, Tristan Bereau, and Ken Dill for many stimulating and helpful discussions. We acknowledge the High Performance Computing facility at the University of Denver for computing assistance.
We acknowledge support from the National Science Foundation (award number 1149992), the Research Corporation for Science Advancement (as a Cottrell Scholar), and the PROF grant from the University of Denver.
G.B. was supported by National Institutes of Health National Institute of General Medical Sciences (NIH-NIGMS) grant R35GM122561 and by a Laufer Center for Physical and Quantitative Biology endowment.
Editor: Nathalie Balaban.
Footnotes
Supporting Materials and Methods, one figure, and one table are available at http://www.biophysj.org/biophysj/supplemental/S0006-3495(17)31016-0.
References
- 1. Ozbudak E.M., Thattai M., van Oudenaarden A. Regulation of noise in the expression of a single gene. Nat. Genet. 2002;31:69–73. doi: 10.1038/ng869.
- 2. Kaern M., Elston T.C., Collins J.J. Stochasticity in gene expression: from theories to phenotypes. Nat. Rev. Genet. 2005;6:451–464. doi: 10.1038/nrg1615.
- 3. Paulsson J. Summing up the noise in gene networks. Nature. 2004;427:415–418. doi: 10.1038/nature02257.
- 4. Samoilov M., Plyasunov S., Arkin A.P. Stochastic amplification and signaling in enzymatic futile cycles through noise-induced bistability with oscillations. Proc. Natl. Acad. Sci. USA. 2005;102:2310–2315. doi: 10.1073/pnas.0406841102.
- 5. Sánchez A., Kondev J. Transcriptional control of noise in gene expression. Proc. Natl. Acad. Sci. USA. 2008;105:5081–5086. doi: 10.1073/pnas.0707904105.
- 6. Shahrezaei V., Swain P.S. The stochastic nature of biochemical networks. Curr. Opin. Biotechnol. 2008;19:369–374. doi: 10.1016/j.copbio.2008.06.011.
- 7. Elowitz M.B., Levine A.J., Swain P.S. Stochastic gene expression in a single cell. Science. 2002;297:1183–1186. doi: 10.1126/science.1070919.
- 8. Tao Y. Intrinsic and external noise in an auto-regulatory genetic network. J. Theor. Biol. 2004;229:147–156. doi: 10.1016/j.jtbi.2004.03.011.
- 9. Beard D.A., Qian H. University Press; Cambridge: 2008. Chemical Biophysics: Quantitative Analysis of Cellular Systems.
- 10. Munsky B., Trinh B., Khammash M. Listening to the noise: random fluctuations reveal gene network parameters. Mol. Syst. Biol. 2009;5:318. doi: 10.1038/msb.2009.75.
- 11. Lillacci G., Khammash M. Parameter estimation and model selection in computational biology. PLOS Comput. Biol. 2010;6:e1000696. doi: 10.1371/journal.pcbi.1000696.
- 12. Zechner C., Ruess J., Koeppl H. Moment-based inference predicts bimodality in transient gene expression. Proc. Natl. Acad. Sci. USA. 2012;109:8340–8345. doi: 10.1073/pnas.1200161109.
- 13. Lillacci G., Khammash M. A distribution-matching method for parameter estimation and model selection in computational biology. Int. J. Robust Nonlinear Control. 2012;22:1065–1081.
- 14. Ruess J., Milias-Argeitis A., Lygeros J. Designing experiments to understand the variability in biochemical reaction networks. J. R. Soc. Interface. 2013;10:20130588. doi: 10.1098/rsif.2013.0588.
- 15. Lillacci G., Khammash M. The signal within the noise: efficient inference of stochastic gene regulation models using fluorescence histograms and stochastic simulations. Bioinformatics. 2013;29:2311–2319. doi: 10.1093/bioinformatics/btt380.
- 16. Kauffman S. A proposal for using the ensemble approach to understand genetic regulatory networks. J. Theor. Biol. 2004;230:581–590. doi: 10.1016/j.jtbi.2003.12.017.
- 17. Gillespie D.T. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 1977;81:2340–2361.
- 18. Guet C.C., Elowitz M.B., Leibler S. Combinatorial synthesis of genetic networks. Science. 2002;296:1466–1470. doi: 10.1126/science.1067407.
- 19. Hasty J., Dolnik M., Collins J.J. Synthetic gene network for entraining and amplifying cellular oscillations. Phys. Rev. Lett. 2002;88:148101. doi: 10.1103/PhysRevLett.88.148101.
- 20. Stricker J., Cookson S., Hasty J. A fast, robust and tunable synthetic gene oscillator. Nature. 2008;456:516–519. doi: 10.1038/nature07389.
- 21. Tsai T.Y., Choi Y.S., Ferrell J.E., Jr. Robust, tunable biological oscillations from interlinked positive and negative feedback loops. Science. 2008;321:126–129. doi: 10.1126/science.1156951.
- 22. Gore J., van Oudenaarden A. Synthetic biology: the yin and yang of nature. Nature. 2009;457:271–272. doi: 10.1038/457271a.
- 23. Mukherji S., van Oudenaarden A. Synthetic biology: understanding biological design from synthetic circuits. Nat. Rev. Genet. 2009;10:859–871. doi: 10.1038/nrg2697.
- 24. Ellis T., Wang X., Collins J.J. Diversity-based, model-guided construction of synthetic gene networks with predicted functions. Nat. Biotechnol. 2009;27:465–471. doi: 10.1038/nbt.1536.
- 25. Kittisopikul M., Süel G.M. Biological role of noise encoded in a genetic network motif. Proc. Natl. Acad. Sci. USA. 2010;107:13300–13305. doi: 10.1073/pnas.1003975107.
- 26. Khalil A.S., Collins J.J. Synthetic biology: applications come of age. Nat. Rev. Genet. 2010;11:367–379. doi: 10.1038/nrg2775.
- 27. Moon T.S., Lou C., Voigt C.A. Genetic programs constructed from layered logic gates in single cells. Nature. 2012;491:249–253. doi: 10.1038/nature11516.
- 28. Wu M., Su R.Q., Wang X. Engineering of regulated stochastic cell fate determination. Proc. Natl. Acad. Sci. USA. 2013;110:10610–10615. doi: 10.1073/pnas.1305423110.
- 29. Wu F., Wang X. Applications of synthetic gene networks. Sci. Prog. 2015;98:244–252. doi: 10.3184/003685015X14368807556441.
- 30. Wu F., Su R.Q., Wang X. Engineering of a synthetic quadrastable gene network to approach Waddington landscape and cell fate determination. eLife. 2017;6:e23702. doi: 10.7554/eLife.23702.
- 31. Nevozhay D., Adams R.M., Balázsi G. Mapping the environmental fitness landscape of a synthetic gene circuit. PLOS Comput. Biol. 2012;8:e1002480. doi: 10.1371/journal.pcbi.1002480.
- 32. Alon U. Network motifs: theory and experimental approaches. Nat. Rev. Genet. 2007;8:450–461. doi: 10.1038/nrg2102.
- 33. Lyons S.M., Xu W., Prasad A. Loads bias genetic and signaling switches in synthetic and natural systems. PLOS Comput. Biol. 2014;10:e1003533. doi: 10.1371/journal.pcbi.1003533.
- 34. Wang L.Z., Wu F., Wang X. Build to understand: synthetic approaches to biology. Integr. Biol. 2016;8:394–408. doi: 10.1039/c5ib00252d.
- 35. Pressé S., Ghosh K., Dill K.A. Modeling stochastic dynamics in biochemical systems with feedback using maximum caliber. J. Phys. Chem. B. 2011;115:6202–6212. doi: 10.1021/jp111112s.
- 36. Gardner T.S., Cantor C.R., Collins J.J. Construction of a genetic toggle switch in Escherichia coli. Nature. 2000;403:339–342. doi: 10.1038/35002131.
- 37. Lipshtat A., Loinger A., Biham O. Genetic toggle switch without cooperative binding. Phys. Rev. Lett. 2006;96:188101. doi: 10.1103/PhysRevLett.96.188101.
- 38. Keller A.D. Model genetic circuits encoding autoregulatory transcription factors. J. Theor. Biol. 1995;172:169–185. doi: 10.1006/jtbi.1995.0014.
- 39. Smolen P., Baxter D.A., Byrne J.H. Frequency selectivity, multistability, and oscillations emerge from models of genetic regulatory systems. Am. J. Physiol. 1998;274:C531–C542. doi: 10.1152/ajpcell.1998.274.2.C531.
- 40. Becskei A., Séraphin B., Serrano L. Positive feedback in eukaryotic gene networks: cell differentiation by graded to binary response conversion. EMBO J. 2001;20:2528–2535. doi: 10.1093/emboj/20.10.2528.
- 41. Tyson J.J., Chen K.C., Novak B. Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell. Curr. Opin. Cell Biol. 2003;15:221–231. doi: 10.1016/s0955-0674(03)00017-6.
- 42. Cheng Z., Liu F., Wang W. Robustness analysis of cellular memory in an autoactivating positive feedback system. FEBS Lett. 2008;582:3776–3782. doi: 10.1016/j.febslet.2008.10.005.
- 43. Bishop L.M., Qian H. Stochastic bistability and bifurcation in a mesoscopic signaling system with autocatalytic kinase. Biophys. J. 2010;98:1–11. doi: 10.1016/j.bpj.2009.09.055.
- 44. Frigola D., Casanellas L., Ibañes M. Asymmetric stochastic switching driven by intrinsic molecular noise. PLoS One. 2012;7:e31407. doi: 10.1371/journal.pone.0031407.
- 45. Faucon P.C., Pardee K., Wang X. Gene networks of fully connected triads with complete auto-activation enable multistability and stepwise stochastic transitions. PLoS One. 2014;9:e102873. doi: 10.1371/journal.pone.0102873.
- 46. Kepler T.B., Elston T.C. Stochasticity in transcriptional regulation: origins, consequences, and mathematical representations. Biophys. J. 2001;81:3116–3136. doi: 10.1016/S0006-3495(01)75949-8.
- 47. Phillips R., Kondev J., Theriot J. Garland Science; New York, NY: 2009. Physical Biology of The Cell.
- 48. Ghosh K., Dill K.A., Phillips R. Teaching the principles of statistical dynamics. Am. J. Phys. 2006;74:123–133. doi: 10.1119/1.2142789.
- 49. Seitaridou E., Inamdar M.M., Dill K. Measuring flux distributions for diffusion in the small-numbers limit. J. Phys. Chem. B. 2007;111:2288–2292. doi: 10.1021/jp067036j.
- 50. Wu D., Ghosh K., Phillips R. Trajectory approach to two-state kinetics of single particles on sculpted energy landscapes. Phys. Rev. Lett. 2009;103:050603. doi: 10.1103/PhysRevLett.103.050603.
- 51. Otten M., Stock G. Maximum caliber inference of nonequilibrium processes. J. Chem. Phys. 2010;133:034119. doi: 10.1063/1.3455333.
- 52. Pressé S., Ghosh K., Dill K.A. Dynamical fluctuations in biochemical reactions and cycles. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 2010;82:031905. doi: 10.1103/PhysRevE.82.031905.
- 53. Ghosh K. Stochastic dynamics of complexation reaction in the limit of small numbers. J. Chem. Phys. 2011;134:195101. doi: 10.1063/1.3590918.
- 54. Pressé S., Peterson J., Dill K. Single molecule conformational memory extraction: p5ab RNA hairpin. J. Phys. Chem. B. 2014;118:6597–6603. doi: 10.1021/jp500611f.
- 55. Pressé S., Ghosh K., Dill K. Principle of maximum entropy and maximum caliber in statistical physics. Rev. Mod. Phys. 2013;85:1115–1141.
- 56. Dixit P.D., Dill K.A. Inferring microscopic kinetic rates from stationary state distributions. J. Chem. Theory Comput. 2014;10:3002–3005. doi: 10.1021/ct5001389.
- 57. Dixit P.D., Jain A., Dill K.A. Inferring transition rates of networks from populations in continuous-time Markov processes. J. Chem. Theory Comput. 2015;11:5464–5472. doi: 10.1021/acs.jctc.5b00537.
- 58. Wan H., Zhou G., Voelz V.A. A maximum-caliber approach to predicting perturbed folding kinetics due to mutations. J. Chem. Theory Comput. 2016;12:5768–5776. doi: 10.1021/acs.jctc.6b00938.
- 59. Munsky B., Khammash M. The finite state projection algorithm for the solution of the chemical master equation. J. Chem. Phys. 2006;124:044104. doi: 10.1063/1.2145882.
- 60. Lawrimore J., Bloom K.S., Salmon E.D. Point centromeres contain more than a single centromere-specific Cse4 (CENP-A) nucleosome. J. Cell Biol. 2011;195:573–582. doi: 10.1083/jcb.201106036.
- 61. Taniguchi Y., Choi P.J., Xie X.S. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010;329:533–538. doi: 10.1126/science.1188308.
- 62. Tsekouras K., Custer T.C., Pressé S. A novel method to accurately locate and count large numbers of steps by photobleaching. Mol. Biol. Cell. 2016;27:3601–3615. doi: 10.1091/mbc.E16-06-0404.
- 63. Friedman N., Cai L., Xie X.S. Linking stochastic dynamics to population distribution: an analytical framework of gene expression. Phys. Rev. Lett. 2006;97:168302. doi: 10.1103/PhysRevLett.97.168302.
- 64. McLean P., Smolke C., Salit M. Characterizing the non-normal distribution of flow cytometry measurements from transiently expressed constructs in mammalian cells. Published online June 9, 2016. doi: 10.1101/057950.
- 65. Coffman V.C., Wu J.Q. Counting protein molecules using quantitative fluorescence microscopy. Trends Biochem. Sci. 2012;37:499–506. doi: 10.1016/j.tibs.2012.08.002.
- 66. Coffman V.C., Wu P., Wu J.Q. CENP-A exceeds microtubule attachment sites in centromere clusters of both budding and fission yeast. J. Cell Biol. 2011;195:563–572. doi: 10.1083/jcb.201106078.
- 67. Engel B.D., Ludington W.B., Marshall W.F. Intraflagellar transport particle size scales inversely with flagellar length: revisiting the balance-point length control model. J. Cell Biol. 2009;187:81–89. doi: 10.1083/jcb.200812084.
- 68. Leake M.C., Chandler J.H., Armitage J.P. Stoichiometry and turnover in single, functioning membrane protein complexes. Nature. 2006;443:355–358. doi: 10.1038/nature05135.
- 69. Ulbrich M.H., Isacoff E.Y. Subunit counting in membrane-bound proteins. Nat. Methods. 2007;4:319–321. doi: 10.1038/NMETH1024.
- 70. Das S.K., Darshi M., Bayley H. Membrane protein stoichiometry determined from the step-wise photobleaching of dye-labelled subunits. ChemBioChem. 2007;8:994–999. doi: 10.1002/cbic.200600474.
- 71. Shu D., Zhang H., Guo P. Counting of six pRNAs of ϕ29 DNA-packaging motor with customized single-molecule dual-view system. EMBO J. 2007;26:527–537. doi: 10.1038/sj.emboj.7601506.
- 72. Delalez N.J., Wadhams G.H., Armitage J.P. Signal-dependent turnover of the bacterial flagellar switch protein FliM. Proc. Natl. Acad. Sci. USA. 2010;107:11347–11351. doi: 10.1073/pnas.1000284107.
- 73. Demuro A., Penna A., Parker I. Subunit stoichiometry of human Orai1 and Orai3 channels in closed and open states. Proc. Natl. Acad. Sci. USA. 2011;108:17832–17837. doi: 10.1073/pnas.1114814108.
- 74. Hastie P., Ulbrich M.H., Chen L. AMPA receptor/TARP stoichiometry visualized by single-molecule subunit counting. Proc. Natl. Acad. Sci. USA. 2013;110:5163–5168. doi: 10.1073/pnas.1218765110.
- 75. Arumugam S.R., Lee T.H., Benkovic S.J. Investigation of stoichiometry of T4 bacteriophage helicase loader protein (gp59). J. Biol. Chem. 2009;284:29283–29289. doi: 10.1074/jbc.M109.029926.
- 76. Pitchiaya S., Androsavich J.R., Walter N.G. Intracellular single molecule microscopy reveals two kinetically distinct pathways for microRNA assembly. EMBO Rep. 2012;13:709–715. doi: 10.1038/embor.2012.85.
- 77. Pitchiaya S., Krishnan V., Walter N.G. Dissecting non-coding RNA mechanisms in cellulo by single-molecule high-resolution localization and counting. Methods. 2013;63:188–199. doi: 10.1016/j.ymeth.2013.05.028.
- 78. Schneidman E., Berry M.J., 2nd, Bialek W. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature. 2006;440:1007–1012. doi: 10.1038/nature04701.
- 79. Balaban N.Q., Merrin J., Leibler S. Bacterial persistence as a phenotypic switch. Science. 2004;305:1622–1625. doi: 10.1126/science.1099390.
- 80. Levy S.F., Ziv N., Siegal M.L. Bet hedging in yeast by heterogeneous, age-correlated expression of a stress protectant. PLoS Biol. 2012;10:e1001325. doi: 10.1371/journal.pbio.1001325.
- 81. González C., Ray J.C., Balázsi G. Stress-response balance drives the evolution of a network module and its host genome. Mol. Syst. Biol. 2015;11:827. doi: 10.15252/msb.20156185.
- 82. Brauner A., Fridman O., Balaban N.Q. Distinguishing between resistance, tolerance and persistence to antibiotic treatment. Nat. Rev. Microbiol. 2016;14:320–330. doi: 10.1038/nrmicro.2016.34.
- 83. Elowitz M.B., Leibler S. A synthetic oscillatory network of transcriptional regulators. Nature. 2000;403:335–338. doi: 10.1038/35002125.
- 84. Ge H., Pressé S., Dill K.A. Markov processes follow from the principle of maximum caliber. J. Chem. Phys. 2012;136:064108. doi: 10.1063/1.3681941.