Abstract
In vivo dynamics of protein levels in bacterial cells depend on both intracellular regulation and relevant population dynamics. Such population dynamics effects, e.g., the interplay between cell and plasmid division rates, are, however, often neglected in modeling gene expression regulation. Including them in a model introduces additional parameters shared by the dynamical equations, which can significantly increase the dimensionality of the parameter inference. Here we analyse the importance of these effects in the case of a bacterial restriction-modification (R-M) system. We redevelop our earlier minimal model of gene expression regulation in this system, based on a thermodynamic and dynamic system modeling framework, to include the population dynamics effects. To resolve the problem of effective coupling of the dynamical equations, we propose a “mean-field-like” procedure, which allows determining only a part of the parameters at a time, by separately fitting them to expression dynamics data of individual molecular species. We show that including the interplay between the kinetics of cell division and plasmid replication is necessary to explain the experimental measurements. Moreover, neglecting population dynamics effects can lead to falsely identifying non-existent regulatory mechanisms. Our results call for advanced methods to reverse-engineer intracellular regulation from dynamical data, which would also take into account the population dynamics effects.
Keywords: restriction-modification systems, bacterial population dynamics, gene expression control, statistical thermodynamics, transcription regulation
1. Introduction
Technological developments in the past few decades have enabled experimental in vivo measurements of protein levels in single cells with high temporal resolution, thus providing a good basis for studying gene expression control by mathematical modeling [1,2]. The ultimate goal of these studies is to provide accurate predictions of gene expression dynamics in a cell, which is of particular importance for the emerging field of synthetic biology. In particular, advances in fluorescence microscopy and microfluidics [1,3,4,5] have allowed measurements of protein expression levels in a clonal culture descending from the same (single) cell, which abolishes the need to synchronize the cell population. The first such measurements for bacterial restriction-modification (R-M) systems have recently become available [6].
R-M systems have two main components: restriction endonuclease (R) recognizes specific DNA sequences and cuts them, while methyltransferase (M) methylates the same sequences and thereby protects them from being cut [7,8]. R-M systems carried on plasmids can spread to new (naïve) bacterial hosts by horizontal gene transfer. Consequently, the synthesis of R and M has to be tightly controlled, to ensure safe and efficient R-M establishment in a naïve bacterial host. Specifically, R has to be synthesized with a delay with respect to M, so that the host genome is protected by M, before R activity appears [7,9]. Furthermore, since the same R-M system can be established in different bacterial species, system regulation must be at least partially independent from the host transcription factors, i.e., encoded by the system itself. This property to a certain extent simplifies identifying all system regulation components and correctly describing regulatory mechanisms.
We previously showed that biophysical and dynamical system modeling can reasonably explain available experimental data on R-M transcription control [10,11,12]. However, in vivo protein expression dynamics reflect not only transcription regulation, but also the (often neglected) effects of global physiological factors, which can change in the culture [13,14,15,16]. It was pointed out early on that, to understand the function of cellular networks, they should be considered in their natural environment [17]. The global physiological factors can change significantly in the environment, which impacts the growth of the cell population. Dilution of molecule concentrations upon cell division, and the changes in gene copy number, which are especially pronounced when the gene of interest is carried on a plasmid, are prominent growth-rate-dependent factors [18]. These effects can significantly modulate gene expression dynamics. For example, it was previously shown in the case of the celebrated lac operon that population dynamics effects have to be included in modeling of its gene expression to explain the measured data [19]. For a review of the interdependence between cell growth and gene expression, studied in other systems, see [18] and references therein.
Therefore, the obtained time profile of protein expression is in reality a result of coupled specific regulatory factors and global physiological factors that modulate the relevant population dynamics (dynamics of cell and plasmid division). The Esp1396I R-M system, which will be analysed here, provides an interesting example of an unexpected protein expression time profile [6]. In this experiment, fluorescently labeled M and R amounts were tracked in time at the level of single cells in a microcolony originating from a single cell transformed with the system on a plasmid. The obtained time profile can only in part be explained by a model of known system regulation mechanisms [6]. The minimal theoretical model (which includes the experimentally established transcription regulation mechanisms, and dilution of molecular species in accordance with empirically observed bacterial population growth) explains the main qualitative characteristics of Esp1396I system expression dynamics: massive initial synthesis of methyltransferase and a delayed start of restriction endonuclease synthesis. However, while the model provides a good description of early-time dynamics, it does not agree with experiments at later times (see Figure 4C in [6]). For example, to explain the observed increase in intracellular amounts of M at later times, one would have to invoke an otherwise non-existent transcription activation mechanism.
On the other hand, the bacterial cell division rate changes during the course of this particular experiment, so two growth regimes can be distinguished, in which the population grows exponentially, first with a higher and later, when the preferred nutrient is exhausted, with a lower cell division rate (see Figure 4A in [6]) [6,20]. In addition, the intracellular plasmid copy number dynamically increased during the experiment (unpublished observation). Consequently, there can be a highly non-trivial (and time dependent) interplay between cell and plasmid division rates, which might significantly affect the observed patterns of R and M dynamics. Such interplay (population dynamics effects) was neglected in our minimal model of Esp1396I expression dynamics, and is commonly neglected in modeling of bacterial intracellular dynamics. Therefore, before assuming that the unexplained properties of the measured dynamics are due to the action of some unknown regulatory mechanism(s) operating in the system, one should consider that they may arise from the already modelled regulation combined with the missing population dynamics effects.
Consequently, a major goal of this research is to analyse the importance of population dynamics effects for intracellular dynamics. Understanding this is necessary to accurately predict gene expression dynamics in a cell, which is also crucial for a number of practical applications, ranging from synthetic biology [21,22] to bacterial antibiotic resistance [18]. This is addressed here using the example of Esp1396I R-M system gene expression control, since: (i) state-of-the-art measurements of protein dynamics for this system are available [6]; (ii) our previous work showed that theoretical modeling can accurately explain R-M experimental measurements [6,10,11,12]; and (iii) R-M systems are both important experimental model systems and sufficiently simple to be realistically theoretically modeled. In particular, the number of parameters is significantly lower than what is often encountered in models of gene expression networks/dynamics [23], and all the effects introduced in the model are supported by direct experimental observations [6].
2. Methods
2.1. Modeling cr Operon and m Gene Transcription Activities
The C protein is a transcription factor, encoded by Esp1396I [9,24], which regulates transcription of both the operon consisting of the c gene itself and the r gene (cr operon), and the methyltransferase (m) gene, as indicated in Figure 1A. In particular, binding of the C protein dimer to a site partially overlapping with the strong PM promoter of the m gene (denoted MBS, for Methyltransferase Binding Site) represses m transcription (Figure 1C) [24].
The mechanism of cr transcription regulation is more complex (Figure 1B): at low C concentrations, C dimer is bound to the strong, distal binding site (DBS), and recruits RNA polymerase (RNAP) to the PCR promoter thereby activating transcription of the cr operon (Figure 1B); at higher C concentrations, another C dimer is recruited to the adjacent weak, proximal binding site (PBS) forming a C tetramer on DNA, which represses transcription of cr [10,24].
Transcription activity of Esp1396I R-M system promoters PCR and PM depicted in Figure 1A is modelled using statistical thermodynamics, i.e., by representing the relevant molecular components in a cell as a system whose probability of different microstates is described by the Boltzmann distribution. Appropriate dimensionless statistical weights are assigned to each equilibrium configuration of RNA polymerase and C protein molecules at PCR (Figure 1B) and PM (Figure 1C), reflecting energetic and entropic costs of their establishment [25,26]. One should note that the same statistical weights can be obtained by applying the law of mass action to appropriate equilibrium binding reactions, where the equilibrium dissociation constants Kd~exp(ΔG) (see [26] for a brief description of both approaches).
Statistical weights of the PCR promoter configurations are given by the following Equations:
(1) |
(2) |
(3) |
where statistical weights in Equations (1)–(3) correspond, respectively, to: (i) only RNA polymerase bound to the promoter, corresponding to basal transcription of the genes encoding C protein and restriction endonuclease (Equation (1)), (ii) RNA polymerase recruited to the promoter by the C dimer bound to the DBS (Equation (2)), and (iii) a second C dimer recruited to the PBS by the C dimer bound to the DBS, forming a C tetramer on DNA which represses transcription (Equation (3)). In the equations above, k is a proportionality constant (with units of one over concentration), concentrations of molecular species are denoted by square brackets, and the protein-DNA and protein-protein interaction energies that enter Equations (1)–(3) (ΔGs, in units of kBT) are shown in Figure 1B and listed in its caption. Constants in Equations (1)–(3) can be absorbed into a few parameters (a, b and c) that do not depend on C concentration, which results in the expressions for statistical weights shown next to the appropriate configurations in Figure 1B.
The standard assumption in thermodynamic modeling is that transcriptional activity of the promoter is proportional to the equilibrium probability that RNA polymerase is bound to that promoter [27]. Accordingly, transcriptional activity of PCR is given by the following equation (1 is the statistical weight corresponding to the empty promoter):
(4) |
where α is a proportionality constant with units of transcript amount over time. Equation (4) can be rewritten by introducing the derived statistical weights from Figure 1B and defining the basal transcription activity of PCR in the absence of C protein:
(5) |
Transcription activity of PM promoter is modelled in the same manner (Figure 1C), assuming that it can be found: (i) empty, (ii) in a transcriptionally active state when it is occupied by RNA polymerase, or (iii) in a repressed state, when a C dimer is bound to the MBS. Statistical weights of the configurations described under (ii) and (iii) read:
(6) |
(7) |
while the PM transcription activity is given by the equation:
(8) |
where β is a proportionality constant with units of transcript amount over time. As in the case of the PCR activity modeling, constants in Equations (6) and (7) can be absorbed into a few parameters (f and g) that do not depend on C concentration, which results in the expressions for statistical weights shown next to the appropriate configurations in Figure 1C. Equation (8) becomes
(9) |
when the basal transcription activity of PM and the constant Ki are introduced. Note that Ki corresponds to the equilibrium association constant for C dimer binding to DNA in the presence of a bound RNAP, which exerts an inhibiting effect on transcription.
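The rule stated above (activity proportional to the equilibrium probability that RNA polymerase is bound, with weight 1 for the empty promoter) can be illustrated with a minimal sketch. The function name `promoter_activity` and all numerical weights are hypothetical and chosen only for illustration; the concentration dependence of the individual weights is deliberately left to the caller, since it is not restated here.

```python
def promoter_activity(rate_constant, active_weights, inactive_weights=()):
    """Return rate_constant * P(RNAP bound) for a thermodynamic promoter model.

    The empty promoter has statistical weight 1. Each other configuration
    contributes its dimensionless weight to the partition function; only
    configurations with RNAP bound count as transcriptionally active.
    """
    active = sum(active_weights)
    partition = 1.0 + active + sum(inactive_weights)
    return rate_constant * active / partition


# Hypothetical weights (assumptions, not values from the paper).
# P_CR: RNAP alone, RNAP recruited by a C dimer at DBS, repressing C tetramer.
z_rnap, z_dimer_rnap, z_tetramer = 0.2, 0.8, 3.0
phi_cr = promoter_activity(1.0, [z_rnap, z_dimer_rnap], [z_tetramer])

# P_M: RNAP bound (active) versus C dimer bound at MBS (repressed).
phi_m = promoter_activity(1.0, [0.5], [1.5])
```

Increasing the weight of the repressing configuration (the tetramer at PCR, or the dimer at MBS for PM) lowers the activity, which is the qualitative behaviour described in the text.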
2.2. Introducing the Interplay of Cell and Plasmid Division Rates
We here develop a model that includes an interplay of cell and plasmid division rates, so that full population dynamics effects due to (time dependent) division of both cells and plasmids are taken into account. Note that R-M systems typically produce thousands of RNAs and proteins of the two enzymes in the cell, so these systems can be reliably modelled deterministically.
The number of R-M system encoding plasmids per cell (np) is introduced as a time dependent variable, which increases due to plasmid replication, and decreases due to dilution after cell division. Specifically, the change of cell numbers (ncell), and the number of plasmids per cell (np) with time is described by the following differential equations:
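$$\frac{dn_{cell}}{dt} = \lambda_{cell}(t)\, n_{cell} \tag{10}$$
$$\frac{dn_{p}}{dt} = \left[\lambda_{p}(t) - \lambda_{cell}(t)\right] n_{p} \tag{11}$$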
where λcell and λp are division rates for cells and plasmids, respectively. The experimentally measured dependence of cell number on time [6] indicates a relatively sharp transition from a faster to a slower cell division rate, happening at ~150 min. To obtain a curve that can satisfactorily describe the measured data, the transition region is interpolated using the hyperbolic tangent function. Note that this interpolation covers a relatively short time interval (~30 min), so it does not significantly affect the predictions, but it is necessary to avoid a discontinuity when solving the differential equations; moreover, it naturally interpolates between the two regimes of cell division. In particular, λcell(t) is determined as follows:
(12) |
where λcell1 and λcell2 are constant parameters denoting the cell division rates in the first and the second time interval, while scell is a fixed parameter defining the smoothness of the interpolation (taken as 120 min). Note that Equation (12) is just an empirical fit (essentially a continuous interpolation) to the experimentally measured data, which is an input to the model of the gene expression dynamics, rather than a model prediction.
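As an illustration of such a smooth switch between two constant rates, the following sketch uses one common tanh parameterization. The exact functional form, the function name `division_rate`, and the width value are assumptions (in particular, how `width` maps onto the smoothness parameter scell is not specified here); the two rates and the transition time are the cell division values listed in Table 1.

```python
import math


def division_rate(t, rate1, rate2, t_switch, width):
    """Smoothly switch from rate1 (early) to rate2 (late) around t_switch.

    `width` sets how broad the transition is (in minutes); this
    parameterization is an assumed form, not necessarily Equation (12).
    """
    step = 0.5 * (1.0 + math.tanh((t - t_switch) / width))  # goes 0 -> 1
    return rate1 + (rate2 - rate1) * step


LAM_CELL1 = 2.3e-2   # 1/min, first interval (Table 1)
LAM_CELL2 = 3.1e-3   # 1/min, second interval (Table 1)
T_CELL = 150.0       # min, transition time (Table 1)
WIDTH = 15.0         # min, hypothetical width giving a ~30 min transition

early = division_rate(0.0, LAM_CELL1, LAM_CELL2, T_CELL, WIDTH)
middle = division_rate(T_CELL, LAM_CELL1, LAM_CELL2, T_CELL, WIDTH)
late = division_rate(600.0, LAM_CELL1, LAM_CELL2, T_CELL, WIDTH)
```

Far from the transition the function returns the two constant rates, and at the transition time it returns their average. The same function could be reused for the plasmid division rate with its own three parameters.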
The plasmid division rate λp(t) is introduced in an analogous manner:
(13) |
with equivalent parameter labelling as in Equation (12). Finally, in the full dynamical model, the number of plasmids is obtained by solving Equations (11) and (13), and then multiplied by promoter transcription activities to obtain the generation rates of the corresponding RNA species. The full model is described with the following set of differential equations:
(14) |
(15) |
(16) |
(17) |
(18) |
Equations (14) and (15) describe how the amounts of transcripts of the cr operon and the m gene change with time, while Equations (16)–(18) describe the same for the amounts of proteins, namely, of C protein (C), restriction endonuclease (R) and methyltransferase (M). For parameter notation, see Table 1. Note that each transcript and protein decay rate (λ’s in the equations above) represents a sum of λcell (Equation (12)) and the corresponding molecule degradation rate (which we mark by ~ signs). Since we take protein decay rates of R and C to be equal, Equation (17) can be excluded from solving, i.e., replaced by an algebraic relation (note that C and R in Equations (16) and (17) can be rescaled with kC and kR, respectively, leading to the same equation).
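The structure described above (transcript production proportional to plasmid copy number times promoter activity, and each decay rate being the sum of the cell division rate and a degradation rate) can be sketched for a single gene. This is an assumed generic form, not a transcription of Equations (14)–(18); the function names and all numerical values are hypothetical, and the promoter activity is held constant for simplicity.

```python
def gene_derivatives(m, p, n_p, phi, k, lam_cell, deg_m, deg_p):
    """Time derivatives of transcript (m) and protein (p) amounts.

    Production of transcript: plasmids per cell times promoter activity.
    Decay of each species: dilution by cell division plus degradation.
    """
    dm_dt = n_p * phi - (lam_cell + deg_m) * m
    dp_dt = k * m - (lam_cell + deg_p) * p
    return dm_dt, dp_dt


def steady_state(n_p, phi, k, lam_cell, deg_m, deg_p):
    """Steady-state transcript and protein amounts for constant inputs."""
    m = n_p * phi / (lam_cell + deg_m)
    p = k * m / (lam_cell + deg_p)
    return m, p


# Illustrative values only (assumptions): a stable protein and a
# comparatively unstable transcript, with fast versus slow cell division.
params = dict(n_p=1.0, phi=46.0, k=1.5, deg_m=0.2, deg_p=8.0e-4)
m_fast, p_fast = steady_state(lam_cell=2.3e-2, **params)
m_slow, p_slow = steady_state(lam_cell=3.1e-3, **params)
```

For a stable protein the effective decay rate is dominated by dilution, so in this sketch a slower cell division rate raises the steady-state protein level substantially, even with unchanged regulation.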
Table 1.
Notation | Value | Description |
---|---|---|
Population dynamics | | |
λcell1 | 2.3 × 10^−2 | Cell division rate in the first time interval |
λcell2 | 3.1 × 10^−3 | Cell division rate in the second time interval |
 | 1.5 × 10^2 | Time of transition between the cell division rates |
λp1 | 2.3 × 10^−2 | Plasmid division rate in the first time interval |
λp2 | 7.5 × 10^−3 | Plasmid division rate in the second time interval |
 | 4.2 × 10^2 | Time of transition between the plasmid division rates |
Restriction endonuclease dynamics | | |
 | 5.3 × 10^−1 | Basal transcription activity of the PCR promoter |
 | 2.7 × 10^−1 | Constant which absorbs the relevant interaction energies and RNA polymerase concentration |
 | 4.3 × 10^−3 | Constant which absorbs the relevant interaction energies and RNA polymerase concentration |
 | 2.7 × 10^−9 | Constant which absorbs the relevant interaction energies and RNA polymerase concentration |
 | 5.1 × 10^−1 | Translation rate for control protein and for restriction endonuclease (shared value) |
 | 3.8 × 10^−2 | Decay rate for the operon transcript |
 | 1.2 × 10^−6 | Decay rate for control protein and for restriction endonuclease (shared value) |
Methyltransferase dynamics | | |
 | 4.6 × 10^1 | Basal transcription activity of the PM promoter |
 | 3.0 × 10^−5 | Constant which absorbs the relevant interaction energies and RNA polymerase concentration |
 | 1.5 | Translation rate for methyltransferase |
 | 2.0 × 10^−1 | Decay rate for the m gene transcript |
 | 8.0 × 10^−4 | Decay rate for methyltransferase |
Time is measured in minutes, while the rates are given in 1/min.
2.3. Numerically Solving and Fitting the Model Equations
If the population dynamics effects, in particular changes of the per-cell plasmid copy number throughout the experiment, were neglected [6], the equations for R (and consequently also C, see above) dynamics could be solved separately from the equation for M dynamics. Once C dynamics are solved, they could be used to determine the M dynamics (since the m gene is regulated by C, see Equations (15) and (18)), as was done with the minimal model provided in [6]. However, the introduction of plasmid dynamics in the model leads to nontrivial coupling of R (Equations (14), (16) and (17)) and M dynamics (Equations (15) and (18)) through the time-dependent term np that enters both Equations (14) and (15). Consequently, these sets of equations have to be solved simultaneously, and their parameters can no longer be separately fitted to the experimental data for R and M dynamics. This significantly increases the dimensionality of the parameter inference problem, i.e., the joint fit requires inferring parameters in a 17-dimensional space, which is computationally demanding, since a very large number of parameter combinations have to be explored to find the best fit of the model to experimental data.
To resolve this problem, which would become even more prominent with a larger number of species in the model, we here propose an iterative, “mean field-like” approach to effectively decouple such equations. The main idea is schematically depicted in Figure 2A, and in essence corresponds to solving for dynamics of only one molecular species (W in Figure 2A), in the “field” obtained by empirically approximating the dynamics of the other species (U and V and arrow 1 in Figure 2A). This then allows inferring the population dynamics parameters (which couple the species dynamics), by fitting the model only to experimental data for W dynamics. Once these population dynamics parameters are fixed, one can then return (arrow 2 in Figure 2A) to other species, and separately solve for their dynamics, by fits to corresponding experimental data. With solved dynamics for U and V, one then goes back (arrow 3) and solves W dynamics: (i) with either fixed (previously inferred) parameters for population dynamics, in the case that the procedure converges, i.e., leads to a satisfactory fit to experimental data or (ii) by refitting (inferring again) the population dynamics parameters, if the convergence has not been achieved—in this case, the whole procedure is further iteratively repeated until convergence.
The application of this procedure to R-M dynamics is depicted in Figure 2B. Note that the time dependence of R directly determines the time dependence of C, as they are proportional to each other (see the previous subsection). The procedure is started by empirically fitting the time dependence of R, as it is simpler compared to M. In fact, it can be well approximated by quadratic dependence on time. On the other hand, a more complex time dependence of M data is more suitable (i.e., likely more sensitive) to inferring the population dynamics parameters, in addition to fit being of smaller dimensionality compared to R. Once C dynamics is empirically fitted, the Equations (15) and (18) for M dynamics are solved and the solution is fitted to corresponding experimental data, from which the plasmid dynamics parameters are inferred. Then, one can use these parameters to solve for R dynamics, and thereby determine the rest of the parameters in R dynamics. One then goes back to M dynamics and solves it by using the newly inferred C(t). If the inferred (fixed) parameters of M dynamics lead to a satisfactory agreement with experimental data (converge) the procedure is stopped. If not, the plasmid dynamics parameters are again estimated (with C(t) from the previous step). Note that after each full cycle, the solutions for both M(t) and R(t) (i.e., C(t)) exactly solve the full system of the dynamical equations, though the parameters inferred after each cycle may not provide an optimal fit to data—thus, the fit is, if needed, improved through iterative cycles.
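The control flow of the iterative procedure described above can be sketched as follows. This is only a schematic of the alternation between the M and R fits: the function `mean_field_like_fit` and the three callables it takes are hypothetical placeholders for the actual fitting routines, and the toy problem at the end uses plain numbers in place of curves and parameter sets.

```python
def mean_field_like_fit(c_initial, fit_m, fit_r, m_misfit, tol, max_cycles=50):
    """Alternate between fitting M and R until the M fit is satisfactory.

    fit_m(c_curve)      -> (pop_params, m_params): fit M data given C(t),
                           inferring the shared population parameters.
    fit_r(pop_params)   -> (r_params, c_curve): fit R data with the
                           population parameters held fixed.
    m_misfit(c_curve, pop_params, m_params) -> float: misfit of the M
                           solution with previously inferred parameters.
    """
    # Step 1: fit M in the "field" of the empirically approximated C(t).
    pop_params, m_params = fit_m(c_initial)
    for cycle in range(1, max_cycles + 1):
        # Step 2: with population parameters fixed, solve and fit R (hence C).
        r_params, c_curve = fit_r(pop_params)
        # Step 3: re-solve M with the new C(t) and the fixed parameters.
        if m_misfit(c_curve, pop_params, m_params) <= tol:
            return pop_params, m_params, r_params, cycle
        # Not converged: re-infer the population parameters and repeat.
        pop_params, m_params = fit_m(c_curve)
    raise RuntimeError("no convergence within max_cycles")


# Toy stand-ins (assumptions): scalars replace curves and parameter sets.
toy_fit_m = lambda c: (c / 2.0, None)
toy_fit_r = lambda pop: (None, pop + 1.0)
toy_misfit = lambda c, pop, m_params: abs(pop - c / 2.0)

pop, _, _, cycles = mean_field_like_fit(0.0, toy_fit_m, toy_fit_r, toy_misfit, tol=1e-3)
```

In the toy problem the two "fits" are mutually consistent only at a single fixed point, and the loop approaches it cycle by cycle, which mirrors how the shared population parameters are refined in the procedure.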
To implement the above procedure, the system of differential Equations (11), (14)–(16) and (18) that represents the full model of the Esp1396I R-M system is solved numerically using the Runge-Kutta method [28]. The initial conditions are set to zero for all species except for plasmids, for which one plasmid per cell is assumed at the beginning of the experiment. The system parameters are varied in the ranges that correspond to biochemically realistic values [25]. The set of parameters that best fits the experimental data was determined by minimizing R2 (the sum of the squared errors). Note that this numerical procedure is considerably more complicated than standard fits to experimental data, as in our case there is no closed-form expression that can be fitted to the data points. That is, the closed-form expressions for transcription activities (Equations (1)–(9)) serve as an input for non-linear differential equations (Equations (14)–(18)), which cannot be integrated in closed form but have to be solved numerically, and these solutions are then compared with experimental data.
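As a minimal sketch of this numerical step, the plasmid copy number per cell can be integrated with a classical fourth-order Runge-Kutta scheme. The sketch assumes that the plasmid equation has the form dnp/dt = (λp − λcell) np, as suggested by the verbal description (growth by replication, loss by dilution), and that both rates switch smoothly via a tanh function; the transition width, step count and end time are hypothetical, while the rates and transition times are those listed in Table 1.

```python
import math

WIDTH = 15.0  # min, hypothetical transition width (assumption)


def smooth_rate(t, rate1, rate2, t_switch, width=WIDTH):
    """Smooth tanh switch from rate1 to rate2 around t_switch (assumed form)."""
    step = 0.5 * (1.0 + math.tanh((t - t_switch) / width))
    return rate1 + (rate2 - rate1) * step


def dnp_dt(t, n_p):
    """Assumed plasmid equation: replication minus dilution by cell division."""
    lam_p = smooth_rate(t, 2.3e-2, 7.5e-3, 420.0)     # plasmid rates, Table 1
    lam_cell = smooth_rate(t, 2.3e-2, 3.1e-3, 150.0)  # cell rates, Table 1
    return (lam_p - lam_cell) * n_p


def rk4(f, y0, t0, t1, n_steps):
    """Classical fourth-order Runge-Kutta integration of dy/dt = f(t, y)."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    trajectory = [(t, y)]
    for i in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2.0, y + h * k1 / 2.0)
        k3 = f(t + h / 2.0, y + h * k2 / 2.0)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t = t0 + (i + 1) * h
        trajectory.append((t, y))
    return trajectory


# One plasmid per cell at t = 0; integrate to a hypothetical end time of 500 min.
trajectory = rk4(dnp_dt, 1.0, 0.0, 500.0, 5000)
n_p_at_100 = trajectory[1000][1]
n_p_at_end = trajectory[-1][1]
```

With these inputs the copy number stays close to one while the two rates coincide, and starts to grow once cell division slows down while plasmid replication continues at the earlier rate.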
To quantify the model comparison with experimental data, adjusted R2 was calculated for each fit. To quantitatively compare different fits (e.g., for “complex” and “simple” models), F values and the corresponding P values were used [29]:
$$F = \frac{(SS_1 - SS_2)/(p_2 - p_1)}{SS_2/(n - p_2)} \tag{19}$$
where SS1 and SS2 are the sums of squared residuals for the first (“simple”) and the second (“complex”) model, p1 and p2 are the corresponding numbers of parameters, and n is the number of data points. From this, we can estimate the corresponding P values from the cumulative distribution function of the F statistic.
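The fit-comparison statistics can be computed directly from the residual sums of squares. The sketch below assumes the standard F test for comparing a simpler and a more complex model, and one common convention for adjusted R2; the function names and the example numbers are hypothetical.

```python
def f_statistic(ss_simple, ss_complex, p_simple, p_complex, n_points):
    """F value for comparing a simple model with a more complex one.

    ss_*: sums of squared residuals; p_*: numbers of fitted parameters;
    n_points: number of data points.
    """
    numerator = (ss_simple - ss_complex) / (p_complex - p_simple)
    denominator = ss_complex / (n_points - p_complex)
    return numerator / denominator


def adjusted_r2(ss_res, ss_tot, n_points, n_params):
    """Adjusted R^2 (one common convention, penalizing extra parameters)."""
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n_points - 1) / (n_points - n_params - 1)


# Hypothetical numbers, for illustration only.
f_value = f_statistic(ss_simple=10.0, ss_complex=5.0,
                      p_simple=2, p_complex=4, n_points=24)
adj = adjusted_r2(ss_res=2.0, ss_tot=10.0, n_points=11, n_params=2)
```

The P value then follows from the F distribution with (p2 − p1, n − p2) degrees of freedom, for instance via the survival function `scipy.stats.f.sf` if SciPy is available.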
3. Results and Discussion
3.1. Including the Population Dynamics Effects
The minimal model from [6] was redeveloped to include the effects of simultaneous cell and plasmid division. However, the associated coupling of R and M dynamics brought a technical challenge. In particular, since the system genes are located together on a plasmid (see Equations (14)–(18)), the corresponding equations for R and M can no longer be solved (and fitted to experimental data) separately. This in turn resulted in a significant increase in the dimensionality of the parameter inference problem (to 17). This technical difficulty would become even more pronounced in a model with a larger number of species, where their dynamics would become coupled due to introducing population dynamics effects. To solve this problem, we here proposed a “mean field-like” iterative approach in which the dynamics of two enzymes is effectively decoupled by first empirically approximating dynamics of species 1 and using it to solve for the parameters related to the dynamics of species 2, including those describing the population effects.
Next, the parameters inferred for species 2 are used to solve the equations for species 1 and estimate the appropriate parameters of its regulation. The new species 1 parameters are then used to solve again for the species 2 parameters, and the process is repeated iteratively until the best fit to both species dynamics data is obtained. Note that this “mean field-like” procedure can be generalized to multiple molecular species, as described in Methods (see Figure 2A). In addition to a much simpler parameter inference, note that the decreased dimensionality of the parameter space also leads to a generically smaller chance of overfitting the data (due to a smaller number of parameters involved in the fit).
Applying this procedure to Esp1396I system data leads to a much better agreement of the model with experiments (Figure 3A), compared to the fit of the minimal model (Figure 4C in [6]). In particular, the adjusted R2 values for the two fits, in Figure 3A and in Figure 4C from [6], are 0.97 and 0.25, respectively, and the F value (160) for the fit comparison is statistically highly significant (P ~ 10^−15). The increase of M later in the experiment, which is now accounted for, is a consequence of slower cell division compared to plasmid replication later in the experiment and not due to overfitting (Figure 3A). These results also demonstrate the suitability of the “mean field-like” procedure for reducing the dimensionality of the parameter inference problem. The inferred parameter values (see Table 1) are also in general agreement with experimental observations: the high stability of R-M proteins predicted by the model is in direct agreement with experimental observations [6]; the inferred decay rates are also in accordance with the standard expectation that mRNAs are more rapidly degraded than proteins [30]; the obtained three times lower translation rate of cr compared to m is consistent with providing a robust delay of R with respect to M, and such a mechanism was also found in the AhdI R-M system [10]. The errors for the fitted parameter values could be estimated through either a Monte Carlo or a bootstrapping procedure. While this is in principle straightforward, it is in practice highly computationally intensive, as the system has to be simulated for all the parameter combinations and for all of the many generated synthetic datasets [31].
3.2. Including Population Dynamics Improves Agreement with the Expression Measurements
We next address whether the number of parameters in the model is minimal, i.e., whether the experimental data could be explained with a less complex model. Namely, plasmid dynamics were modelled with three parameters representing the plasmid division rate in the first time interval (λp1), the plasmid division rate in the second time interval (λp2), and the time of transition between the two intervals. Such dependence is analogous to the experimentally measured cell division rate, where two intervals with almost constant division rates (transition at ~150 min; see Figure 4A) are separated by a transition period. By comparing Figure 4A,B, one can see that the plasmid dynamics providing the best fit of the model to the data comprise a late (at ~400 min; see Figure 4B) transition between the two intervals with constant plasmid division rates. This inferred transition of the plasmid division rate from a higher to a lower value is consistent with the general notion of positive correlation between the cell division rate and gene copy number increase [18], and with the fact that the cell division rate becomes slower later in the experiment (see Figure 4A).
We next test if the data can be explained by a simpler model of plasmid dynamics, with a smaller number of parameters. To that end, we try to describe plasmid division with only one parameter (instead of three), specifically, with a constant plasmid division rate throughout the course of the experiment. In the three parameter model, the plasmid division dynamics are dominated by the first division rate (see the full curve in Figure 4B). Accordingly, we test if the late decrease in the plasmid division rate is dispensable in terms of explaining the data, by assuming that the obtained value of the first plasmid division rate (describing the early plasmid dynamics) remains constant during the whole course of the experiment (see the dashed line in Figure 4B).
This one-parameter assumption clearly leads to a poorer agreement of the model with the data points in the late phase of the experiment (Figure S1A,B in the Supplementary Material), emphasizing that a finer description of plasmid division dynamics is important for explaining the experiment. Further, we model plasmid division dynamics with one parameter which now becomes free, and fit the whole gene expression model to the experimental data. However, such a model also cannot provide a good fit to the data (Figure S1C,D), where the disagreement is particularly pronounced for M expression dynamics (Figure S1C); note that the F (and corresponding P) values for comparison with the fits in Figure 3 are provided in the caption of Figure S1. Overall, neither of the alternative possibilities can provide good agreement of the model with the data, implying that the original (three-parameter) model is indeed necessary (minimal) to explain the data.
To further test to what extent the included population dynamics are necessary to explain the data, we assess their contribution in establishing the final protein expression pattern. To this end, cell and plasmid division rates were set to zero in the full model, so that the model only describes specific gene regulation in the system. Simulation of the model in this case results in notably qualitatively changed protein dynamics for both M and R, which poorly fit the data (Figure 5). As M in the model is very stable, its amount decreases very slowly, governed by transcription repression by the C protein (Figure 5A). As expected, an increase in the amount of M later in the experiment cannot be recovered by the model which includes transcription regulation alone.
If one attempted to explain the increase in the amount of M later in the experiment through transcription regulation alone, a non-existent activation of the PM promoter at higher C protein concentrations would have to be invoked. Furthermore, one may misinterpret the observed rapid M decrease after the peak as an indication of highly cooperative repression of PM, since a trademark of high cooperativity is a sharp transition of the system from the ON to the OFF (or vice versa) state [30]. However, this dynamical property is clearly a result of the population effects, since Figure 5A suggests that without the population dynamics the peak in M dynamics would be much broader. The same (even more drastic) result is obtained if the model without population dynamics is refitted to the experimental data (see Figure S2B).
With regard to the R dynamics, if the population dynamics effects are neglected, R slowly increases to some saturation value (Figure 5B), while the shape of the curve is concave instead of the convex form observed in the experiment. Namely, the rapid increase of R late in the experiment is due to two population dynamics effects, which overcome repression by C exhibited at later times. In particular, (i) the rate of cell division slows down at later times (Figure 4A), lowering the effective R decay rate, and (ii) the number of plasmids per cell keeps increasing, leading to increased generation of R transcripts. Both of these effects clearly promote the increase of R at later times, counteracting the repression by C, which provides an explanation for the experimentally observed time dependence.
To further check whether a model describing only internal system regulation can explain the observed dynamics, it was refitted to the measured protein expression data (Figure S2 in Supplementary Materials). It can be shown analytically that such a model explains the observed R dynamics only if transcripts and proteins are very stable and the PCR promoter is not regulated by C protein (Figure S2A). Clearly, this good fit does not give a correct picture of the experimentally inferred regulation in the system. In addition, this model fits the experimental data for M expression very poorly (Figure S2B): for M the adjusted R² is 0.22, and the F-test comparing this fit with the one in Figure 3A gives P ~ 10⁻¹³. Together, these results strongly suggest that the observed Esp1396I system expression dynamics cannot be explained solely by system-specific regulation, and they provide an additional argument that including the population dynamics effects does not overfit the data.
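For readers who want to reproduce this kind of fit comparison, the following minimal sketch computes an adjusted R² and an F statistic for two nested least-squares fits. It uses the standard textbook definitions; the function names and the convention for counting parameters are assumptions made here, and the text does not specify the exact variant used for the values quoted above.

```python
def residual_sum_of_squares(y, y_fit):
    """Sum of squared residuals between data y and model values y_fit."""
    return sum((obs - fit) ** 2 for obs, fit in zip(y, y_fit))


def adjusted_r2(y, y_fit, n_params):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - n_params - 1).

    n_params counts the fitted parameters excluding a constant offset
    (one common convention; others exist for nonlinear models).
    """
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    r2 = 1.0 - residual_sum_of_squares(y, y_fit) / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)


def nested_f_statistic(rss_simple, p_simple, rss_full, p_full, n):
    """F statistic for comparing a simpler model nested in a fuller one.

    F = [(RSS_simple - RSS_full) / (p_full - p_simple)] / [RSS_full / (n - p_full)],
    with (p_full - p_simple, n - p_full) degrees of freedom.
    """
    numerator = (rss_simple - rss_full) / (p_full - p_simple)
    denominator = rss_full / (n - p_full)
    return numerator / denominator
```

The P value then follows from the upper tail of the F distribution with the stated degrees of freedom, for example via `scipy.stats.f.sf(F, p_full - p_simple, n - p_full)` if SciPy is available.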
Overall, it can be argued that population dynamics, while often neglected, significantly influences the expression dynamics of the system, and is crucial for understanding the system dynamics. Ignoring such dynamics can lead to identification of non-existent regulatory mechanisms of gene transcription in an effort to intuitively interpret experimental data.
3.3. Falsely Identifying Regulation when Plasmid Dynamics is not Included
We next analyse how neglecting only the plasmid division dynamics affects the observed kinetics of protein synthesis. Instead of focusing on quantitative comparison of the model predictions with the data, we will focus on how neglecting the plasmid dynamics can lead to false identification of regulatory mechanisms that in reality do not exist.
We start by observing how modeling agrees with M dynamics data when plasmid dynamics are included (full line), and when they are excluded from the model (dashed line), which is shown in Figure 6. After ~200 min the M amount in the model without plasmid dynamics slowly decreases over time, while the experimental data show that the amount of this enzyme starts to increase.
Consequently, one can speculate that a single, major effect behind the increase of M in the second interval of the experiment is plasmid dynamics, i.e., increase in the plasmid numbers late in the experiment. Neglecting plasmid dynamics in modeling can thus lead to (qualitative) misinterpretation of experimental results. For example, an increase in the amount of M later in the experiment can be interpreted as a consequence of non-existent PM activation at higher concentrations of C protein, while in reality there is only repression [24].
The same holds for R dynamics (Figure 7A): excluding plasmid replication from the model of R expression yields a dynamics curve visibly different from the one observed experimentally. Notably, the experimental data for R dynamics can be modelled empirically with a quadratic equation. This time dependence can be derived analytically in the absence of population dynamics effects, if proteins and transcripts are assumed entirely stable and there is no regulation by C protein (see the pink curve in Figure 7B). Since a specific regulatory mechanism involving C protein was confirmed in a number of previous experiments on the Esp1396I system [24,32,33], such a derivation obviously leads to a wrong conclusion about system expression control. Furthermore, the measured R dynamics naively appear to arise from transcription activation alone by C protein at the PCR promoter (as R amounts increase monotonically with time), but quantitatively, modeling such a case gives a significantly worse fit to the data (purple curve in Figure 7B): the F value for the comparison with the fit in Figure 3A is 25 (P ~ 10⁻⁷). Therefore, intuitive reasoning may disagree with the true situation in a cell.
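The quadratic time dependence mentioned above can be seen from a short calculation. As a sketch under the stated simplifications (no regulation by C, no transcript or protein decay, no dilution, constant gene copy number), and with symbols introduced here only for illustration (transcript amount m, gene copy number n, transcription rate per gene φ, translation rate per transcript k, initial amounts m₀ and R₀):

```latex
\begin{align}
\frac{dm}{dt} &= \varphi n, & m(t) &= m_0 + \varphi n t, \\
\frac{dR}{dt} &= k\, m(t), & R(t) &= R_0 + k m_0 t + \tfrac{1}{2} k \varphi n t^{2}.
\end{align}
```

A constant transcription rate gives a linearly growing transcript amount, and translating from a linearly growing transcript pool gives a protein amount that is quadratic in time, even though none of these simplifications holds in the real system.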
R dynamics are, therefore, a good example of an apparently simple (quadratic) time dependence generated by a complex interplay between very different opposing effects, in particular those arising from intracellular regulation (repression at higher C concentrations) and population dynamics (the increase in plasmid numbers later in the experiment). This interplay could not be inferred intuitively, since an intuitive interpretation would suggest only activation of the cr operon by C protein. Moreover, it could not be inferred even through analytical derivations, since they imply constitutive (non-regulated) gene expression and stable RNA and protein amounts, which is very different from reality. Therefore, an accurate understanding of the system regulation and the resulting dynamics requires taking into account all relevant effects, together with careful computational modeling.
4. Conclusions
An earlier minimal model of the Esp1396I R-M system expression predicted that the methyltransferase amount changes in time in the opposite direction to what was observed experimentally in microcolonies grown from single transformed cells. The disagreement could have been interpreted as a consequence of unknown regulatory mechanism(s) operating in the system that were not accounted for in the model. However, our analysis shows that the reason behind this disagreement is the commonly neglected population dynamics effects, namely the kinetics of cell division and plasmid replication, which, when included, lead to very good agreement with the data. Consequently, neglecting global physiological effects on gene expression can lead to falsely identified regulation, because these effects significantly change the qualitative properties of intracellular protein expression dynamics.
From a computational perspective, we note that including population dynamics in the model, which is necessary to explain the experimental data, is a highly nontrivial task, as it inevitably increases dimensionality of the parameter inference. Still, we showed that this problem can be approached through a procedure in which the shared population dynamics parameters are initially estimated by considering dynamics of only one of the multiple species coupled by the regulation mechanisms, relying on approximated dynamics of the rest of the species. Such a “mean field-like” procedure can become even more necessary when considering larger gene regulatory networks, as dimensionality of the parameter inference problem would further increase from the one considered here. While the procedure is here introduced in the form that can be directly applied to any number of molecular species, it remains to be tested in practice when applied to more complex regulatory networks.
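The staged idea described above can be illustrated on a toy problem. The sketch below is not the procedure or model used in this work: it assumes two hypothetical species A and B that share one dilution rate μ, with dA/dt = a/(1 + B/K) − μA and dB/dt = b − μB, noise-free synthetic data, a known K, and a plain grid search. In stage 1 the shared μ is estimated from A alone, with B replaced by its measured time course instead of being simulated jointly; in stage 2 μ is kept fixed and only the B-specific parameter is fitted.

```python
def simulate_a(a, mu, b_series, k_rep, dt):
    """Euler-integrate dA/dt = a/(1 + B/K) - mu*A, with B supplied as a time series."""
    value, out = 0.0, [0.0]
    for b_value in b_series[:-1]:
        value += dt * (a / (1.0 + b_value / k_rep) - mu * value)
        out.append(value)
    return out


def simulate_b(b, mu, n_points, dt):
    """Euler-integrate dB/dt = b - mu*B from B(0) = 0."""
    value, out = 0.0, [0.0]
    for _ in range(n_points - 1):
        value += dt * (b - mu * value)
        out.append(value)
    return out


def sse(x, y):
    """Sum of squared differences between two equally long series."""
    return sum((u - v) ** 2 for u, v in zip(x, y))


def staged_fit(a_data, b_data, k_rep, dt, a_grid, mu_grid, b_grid):
    # Stage 1: fit (a, mu) to the A data only; B is approximated by its
    # measured time course rather than solved together with A.
    a_fit, mu_fit = min(
        ((a, mu) for a in a_grid for mu in mu_grid),
        key=lambda p: sse(simulate_a(p[0], p[1], b_data, k_rep, dt), a_data),
    )
    # Stage 2: keep the shared parameter mu fixed and fit b to the B data.
    b_fit = min(
        b_grid,
        key=lambda b: sse(simulate_b(b, mu_fit, len(b_data), dt), b_data),
    )
    return a_fit, mu_fit, b_fit


def demo():
    """Generate synthetic data with hypothetical true values and refit them."""
    dt, n_points, k_rep = 1.0, 300, 20.0
    b_data = simulate_b(1.0, 0.02, n_points, dt)        # true b = 1.0, mu = 0.02
    a_data = simulate_a(2.0, 0.02, b_data, k_rep, dt)   # true a = 2.0
    return staged_fit(
        a_data, b_data, k_rep, dt,
        a_grid=[1.0, 1.5, 2.0, 2.5, 3.0],
        mu_grid=[0.01, 0.02, 0.03, 0.04],
        b_grid=[0.5, 1.0, 1.5],
    )
```

Each stage searches a low-dimensional grid instead of the full joint parameter space, which is the practical benefit of the staged approach. With noisy data the stage-1 estimate of μ would carry some error into stage 2, so a final joint refinement may be worth considering.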
As an outlook, we have seen here that population dynamics effects, which are modulated by changes in global physiological factors, can have a significant effect on gene expression dynamics. Consequently, expression patterns of molecular species within a cell can result from a complex interplay of intracellular regulation and population dynamics effects, as demonstrated here for the Esp1396I system. This calls for advanced methods to reverse-engineer intracellular regulation from dynamical data, which would take into account both intracellular regulation and effects due to changing external conditions [13,18,20,34]. One approach used to filter out the effects of the global physiological state of the cell, which directly impact population dynamics parameters, involves experimentally tracking expression from a constitutive and from a regulated promoter in parallel [16]. In addition, variation of gene copy number with time (e.g., due to kinetics of plasmid division) should be accounted for in both experiments and theoretical models. A particular challenge would be to reconstruct regulatory networks from protein dynamics data alone, which are becoming more and more available [35]. Development of advanced theoretical methods for such reconstruction, analogous to those for reverse engineering of gene networks from gene expression data, may thus become a necessity in the future.
Acknowledgments
We thank Anton Sabantsev, Natalia Morozova and Iaroslav Ispolatov for useful discussions, and Bojana Blagojevic for critically reading the manuscript.
Supplementary Materials
The following are available online: Figures S1 and S2.
Author Contributions
Conceptualization, M.D. (Marko Djordjevic), M.D. (Magdalena Djordjevic) and K.S.; Methodology, M.D. (Marko Djordjevic), S.G. and A.R.; Writing code, S.G. and A.R.; Validation, M.D. (Marko Djordjevic), M.D. (Magdalena Djordjevic) and K.S.; Formal Analysis, S.G., A.R. and M.D. (Marko Djordjevic); Investigation, S.G. and A.R.; Writing – Original Draft Preparation, A.R. and S.G.; Writing – Review & Editing, all authors; Visualization, S.G., A.R. and M.D. (Magdalena Djordjevic); Supervision, M.D. (Marko Djordjevic); Project Administration, M.D. (Marko Djordjevic) and M.D. (Magdalena Djordjevic); Funding Acquisition, M.D. (Marko Djordjevic) and M.D. (Magdalena Djordjevic).
Funding
This research was funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia under project number ON173052. KS was funded by National Institutes of Health grant GM10407 and Russian Science Foundation grant 14-14-00988.
Conflicts of Interest
The authors declare no conflict of interest.
Footnotes
Sample Availability: Not applicable.
References
- 1. Longo D., Hasty J. Dynamics of single-cell gene expression. Mol. Syst. Biol. 2006;2:64–73. doi: 10.1038/msb4100110.
- 2. Ohno M., Karagiannis P., Taniguchi Y. Protein expression analyses at the single cell level. Molecules. 2014;19:13932–13947. doi: 10.3390/molecules190913932.
- 3. Elowitz M.B., Leibler S. A synthetic oscillatory network of transcriptional regulators. Nature. 2000;403:335–338. doi: 10.1038/35002125.
- 4. Rosenfeld N., Young J.W., Alon U., Swain P.S., Elowitz M.B. Gene regulation at the single-cell level. Science. 2005;307:1962–1965. doi: 10.1126/science.1106914.
- 5. Young J.W., Locke J.C., Altinok A., Rosenfeld N., Bacarian T., Swain P.S., Mjolsness E., Elowitz M.B. Measuring single-cell gene expression dynamics in bacteria using fluorescence time-lapse microscopy. Nat. Protoc. 2012;7:80–97. doi: 10.1038/nprot.2011.432.
- 6. Morozova N., Sabantsev A., Bogdanova E., Fedorova Y., Maikova A., Vedyaykin A., Rodic A., Djordjevic M., Khodorkovskii M., Severinov K. Temporal dynamics of methyltransferase and restriction endonuclease accumulation in individual cells after introducing a restriction-modification system. Nucleic Acids Res. 2016;44:790–800. doi: 10.1093/nar/gkv1490.
- 7. Kobayashi I. Behavior of restriction–modification systems as selfish mobile elements and their impact on genome evolution. Nucleic Acids Res. 2001;29:3742–3756. doi: 10.1093/nar/29.18.3742.
- 8. Mruk I., Kobayashi I. To be or not to be: Regulation of restriction–modification systems and other toxin–antitoxin systems. Nucleic Acids Res. 2013;42:70–86. doi: 10.1093/nar/gkt711.
- 9. Nagornykh M., Bogdanova E., Protsenko A., Solonin A., Zakharova M., Severinov K. Regulation of gene expression in a type II restriction-modification system. Russ. J. Genet. 2008;44:523–532. doi: 10.1134/S1022795408050037.
- 10. Bogdanova E., Djordjevic M., Papapanagiotou I., Heyduk T., Kneale G., Severinov K. Transcription regulation of the type II restriction-modification system AhdI. Nucleic Acids Res. 2008;36:1429–1442. doi: 10.1093/nar/gkm1116.
- 11. Rodic A., Blagojevic B., Zdobnov E., Djordjevic M. Understanding key features of bacterial restriction-modification systems through quantitative modeling. BMC Syst. Biol. 2017;11:377–391. doi: 10.1186/s12918-016-0377-x.
- 12. Klimuk E., Bogdanova E., Nagornykh M., Rodic A., Djordjevic M., Medvedeva S., Pavlova O., Severinov K. Controller protein of restriction–modification system Kpn2I affects transcription of its gene by acting as a transcription elongation roadblock. Nucleic Acids Res. 2018;46:10810–10826. doi: 10.1093/nar/gky880.
- 13. Stefan D., Pinel C., Pinhal S., Cinquemani E., Geiselmann J., De Jong H. Inference of quantitative models of bacterial promoters from time-series reporter gene data. PLoS Comput. Biol. 2015;11:e1004028. doi: 10.1371/journal.pcbi.1004028.
- 14. Klumpp S., Zhang Z., Hwa T. Growth rate-dependent global effects on gene expression in bacteria. Cell. 2009;139:1366–1375. doi: 10.1016/j.cell.2009.12.001.
- 15. Gerosa L., Kochanowski K., Heinemann M., Sauer U. Dissecting specific and global transcriptional regulation of bacterial gene expression. Mol. Syst. Biol. 2013;9:658–668. doi: 10.1038/msb.2013.14.
- 16. Berthoumieux S., De Jong H., Baptist G., Pinel C., Ranquet C., Ropers D., Geiselmann J. Shared control of gene expression in bacteria by transcription factors and global physiology of the cell. Mol. Syst. Biol. 2013;9:634–644. doi: 10.1038/msb.2012.70.
- 17. Savageau M.A. Design of molecular control mechanisms and the demand for gene expression. Proc. Natl. Acad. Sci. USA. 1977;74:5647–5651. doi: 10.1073/pnas.74.12.5647.
- 18. Klumpp S., Hwa T. Bacterial growth: Global effects on gene expression, growth feedback and proteome partition. Curr. Opin. Biotechnol. 2014;28:96–102. doi: 10.1016/j.copbio.2014.01.001.
- 19. Vilar J.M.G., Guet C.C., Leibler S. Modeling network dynamics: The lac operon, a case study. J. Cell Biol. 2003;161:471–476. doi: 10.1083/jcb.200301125.
- 20. De Jong H., Geiselmann J. Fluorescent Reporter Genes and the Analysis of Bacterial Regulatory Networks. In: Maler O., Halász Á., Dang T., Piazza C., editors. Hybrid Systems Biology. Springer; Cham, Switzerland: 2015. pp. 27–50.
- 21. Wang L.-Z., Wu F., Flores K., Lai Y.-C., Wang X. Build to understand: Synthetic approaches to biology. Integr. Biol. 2015;8:394–408. doi: 10.1039/C5IB00252D.
- 22. Zhang C., Tsoi R., You L. Addressing biological uncertainties in engineering gene circuits. Integr. Biol. 2015;8:456–464. doi: 10.1039/C5IB00275C.
- 23. De Jong H. Modeling and simulation of genetic regulatory systems: A literature review. J. Comput. Biol. 2002;9:67–103. doi: 10.1089/10665270252833208.
- 24. Bogdanova E., Zakharova M., Streeter S., Taylor J., Heyduk T., Kneale G., Severinov K. Transcription regulation of restriction-modification system Esp1396I. Nucleic Acids Res. 2009;37:3354–3366. doi: 10.1093/nar/gkp210.
- 25. Sneppen K., Zocchi G. Physics in Molecular Biology. Cambridge University Press; Cambridge, UK: 2005.
- 26. Rodic A., Blagojevic B., Djordjevic M. Systems Biology of Bacterial Immune Systems: Regulation of Restriction-Modification and CRISPR-Cas Systems. In: Rajewsky N., Jurga S., Barciszewski J., editors. Systems Biology. Springer Nature; Cham, Switzerland: 2018. pp. 37–58.
- 27. Shea M.A., Ackers G.K. The OR control system of bacteriophage lambda: A physical-chemical model for gene regulation. J. Mol. Biol. 1985;181:211–230. doi: 10.1016/0022-2836(85)90086-5.
- 28. Butcher J.C. Numerical Methods for Ordinary Differential Equations. 3rd ed. John Wiley & Sons; Hoboken, NJ, USA: 2016.
- 29. Lomax R.G., Hahs-Vaughn D.L. Statistical Concepts: A Second Course. 4th ed. Routledge; Abingdon, UK: 2013.
- 30. Phillips R., Kondev J., Theriot J., Garcia H. Physical Biology of the Cell. Garland Science; New York, NY, USA: 2012.
- 31. Press W.H., Teukolsky S.A., Vetterling W.T., Flannery B.P. Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge University Press; Cambridge, UK: 2007.
- 32. McGeehan J., Ball N.J., Streeter S., Thresh S.-J., Kneale G. Recognition of dual symmetry by the controller protein C.Esp1396I based on the structure of the transcriptional activation complex. Nucleic Acids Res. 2011;40:4158–4167. doi: 10.1093/nar/gkr1250.
- 33. Martin R.N., McGeehan J.E., Ball N.J., Streeter S.D., Thresh S.-J., Kneale G. Structural analysis of DNA–protein complexes regulating the restriction–modification system Esp1396I. Acta Cryst. 2013;F69:962–966. doi: 10.1107/S174430911302126X.
- 34. Barzel B., Liu Y.-Y., Barabási A.-L. Constructing minimal models for complex system dynamics. Nat. Commun. 2015;6:7186–7193. doi: 10.1038/ncomms8186.
- 35. Porreca R., Cinquemani E., Lygeros J., Ferrari-Trecate G. Structural identification of unate-like genetic network models from time-lapse protein concentration measurements; Proceedings of the 49th IEEE Conference on Decision and Control; Atlanta, GA, USA. 15–17 December 2010; pp. 2529–2534.