Backward simulation for inferring hidden biomolecular kinetic profiles

Junghun Chae; Roktaek Lim; Cheol-Min Ghim; Pan-Jun Kim

doi:10.1016/j.xpro.2021.100958

2021 Nov 15;2(4):100958. doi: 10.1016/j.xpro.2021.100958

PMCID: PMC8605393 PMID: 34841277

Summary

Our backward simulation (BS) is an approach to infer the dynamics of individual components in ordinary differential equation (ODE) models, given the information on relatively downstream components or their sums. Here, we demonstrate the use of BS to infer protein synthesis rates with a given profile of protein concentrations over time in a circadian system. This protocol can also be applied to a wide range of problems with undetermined dynamics at the upstream levels.

For complete details on the use and execution of this protocol, please refer to Lim et al. (2021).

Subject areas: Biophysics, Systems biology, Computer sciences

Highlights

• Inference of upstream dynamics with information on the downstream profile
• Applicable to inferring protein synthesis rates from a given circadian protein profile
• Widely applicable to systems biology models with known downstream profiles

Before you begin

Check whether the backward simulation (BS) is desirable

Timing: 10 min

We have developed the backward simulation (BS) method to discover the “internal” dynamics of the system described by ordinary differential equations (ODEs) with pre-selected model structure and parameter values. When the temporal profiles of relatively downstream components or their sums (but not the profiles of upstream components) are known, BS retrieves the upstream profiles that exactly reproduce the known downstream profiles, through the straightforward ODE calculation with the relevant downstream variables. On the other hand, an existing practice is to assume the plausible forms of these unknown upstream profiles with additional free parameters, and then estimate these parameters to best fit the observed downstream profiles. However, in contrast to the BS, the latter method incurs the computational costs for that parameter estimation and may not even necessarily reproduce the correct downstream profiles.

As a prototype application of this method, we here elaborate the case of the circadian protein degradation model in Lim et al. (2021). Specifically, in contrast to a common practice of simulating the time-course profile of a circadian protein concentration by its upstream elements such as the rhythmic synthesis rate of the protein over time, we intended to maintain the total protein concentration profile at the downstream side as it was and simulate the corresponding rate of the protein synthesis and other upstream kinetic processes in given parameter conditions (Lim et al., 2021). There were three main reasons for this simulation: (i) unlike the mRNA profile, the potentially time-of-day-specific translation rate per mRNA is not commonly known and hence it is difficult to determine the profile of the protein synthesis rate directly from the existing experimental data. In contrast, the experimental profile of the total protein levels is readily available for the incorporation to model simulations. In Lim et al., 2021, given the experimental protein profiles, we performed the BS over randomly-sampled parameter values, and identified the parameter sets for simulation results in quantitative agreement with the empirically observed, rhythmic degradation rates of the proteins. (ii) Another reason was that we wanted to dissect the underpinning mechanism of the rhythmic degradation rate of a circadian protein by rigorously controlling for the effect of the protein profile over a range of parameter values, while avoiding the confounding effect from the changes in the protein profile itself caused by the conventional simulation with given profiles of upstream elements (“forward simulation”). 
(iii) Lastly, we considered an evolutionary viewpoint that the protein profile can be of a more fundamental position than an mRNA or translation-rate profile so that the protein synthesis rate may have adapted to the protein profile—the protein profile is more likely to influence a biological phenotype than other elements in the system such as mRNA profile. Overall, we foresee a variety of applications of the BS, including the cases with data unavailability at upstream sides, the mechanistic studies with strictly-controlled downstream profiles, and the evolutionary modeling with fixed downstream profiles.

Install python and python packages

Timing: 10 min

1.
Download Python 3.7.4 or a higher version from https://www.python.org. The Python version can be checked by the following command:

> python3 --version

2.
To solve ODE models, the scipy python module is needed. Install scipy package:

> pip install scipy

Key resources table

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Software and algorithms

Custom codes for model simulation	This paper	https://git.io/JuAFt
Python 3.7.4 or higher version	Python Software Foundation	https://www.python.org
SciPy v1.3.1 or higher version	Virtanen et al. (2020)	https://www.scipy.org


Step-by-step method details

BS in general cases

Timing: 1 h for step 1 and 1 h for step 2

In this section, we will describe how to apply the BS to a general system of ODEs where the downstream profile is given.

1.
Formulate the dynamics as a set of coupled ODEs.
- a.
  For a dynamical system with n variables $[y_{1} (t), y_{2} (t), \dots, y_{n} (t)] \equiv \vec{y} (t)$ and m parameters $[p_{1} (t), p_{2} (t), \dots, p_{m} (t)] \equiv \vec{p} (t)$ that can be described by a set of coupled ODEs, we divide the rate processes into two parts, i.e.,
  $\frac{d y_{i} (t)}{d t} = F_{i} [\vec{p} (t); \vec{y} (t)] + G_{i} [\vec{p} (t); \vec{y} (t), t], i = 1,2, \dots, n,$ (Equation 1)
  where $F_{i} [\vec{p} (t); \vec{y} (t)]$ describes the intermediate or interconversion processes of components satisfying $\sum_{i = 1}^{n} F_{i} [\vec{p} (t); \vec{y} (t)] = 0$ , and $G_{i} [\vec{p} (t); \vec{y} (t), t]$ describes the source/sink-coupled, nonconservative events responsible for the changes in the total pool of y_i(t) (i=1,2,⋯,n).
- b.
  Summing up the left- and right-hand sides of Equation 1 over i, all the terms responsible for intermediate processes cancel out, leaving only the source/sink-coupled terms:
  $\frac{d}{d t} \sum_{i = 1}^{n} y_{i} (t) = \sum_{i = 1}^{n} G_{i} [\vec{p} (t); \vec{y} (t), t] .$
- c.
  Take extra care of any fundamental conditions of the variables and terms in the model. For example, if the variables represent molecular concentrations, they should be nonnegative, i.e., y_i(t)≥0 for i=1,2,⋯,n. Likewise, a nonnegative net influx of component y_j(t) from an external source indicates $G_{j} [\vec{p} (t); \vec{y} (t), t] \geq 0$ .

2.
Transform the ODEs to utilize the accessible information on relatively downstream variables in the ODE model.
- a.
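  Introduce an observable time-course variable Y(t) as $Y (t) \equiv \sum_{i \in A} y_{i} (t)$ where A is the set of components whose sum is an experimentally available quantity or of theoretical interest. For example, in the circadian protein degradation model below, Y(t) is the total protein concentration x(t) in Equation 13.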
- b.
  For a particular component j selected among the elements of A, we rewrite y_j(t) as
  $y_{j} (t) = Y (t) - \sum_{i \in A \setminus \{j\}} y_{i} (t) .$ (Equation 2)
  
  Then, we define ${\vec{y}}_{Y} (t)$ as an alternative form of $\vec{y} (t)$ where y_j(t) is replaced by the right-hand side of Equation 2 and rewrite Equation 1 for $i \neq j$ :
  $\frac{d y_{i} (t)}{d t} = F_{i} [\vec{p} (t); {\vec{y}}_{Y} (t)] + G_{i} [\vec{p} (t); {\vec{y}}_{Y} (t), t], i \neq j .$ (Equation 3)
- c.
  Combining Equation 2 with Equation 1 for i=j (written in terms of ${\vec{y}}_{Y} (t)$ ), we obtain
  $G_{j} [\vec{p} (t); {\vec{y}}_{Y} (t), t] = \frac{d Y (t)}{d t} - \sum_{k \in A \setminus \{j\}} \frac{d y_{k} (t)}{d t} - F_{j} [\vec{p} (t); {\vec{y}}_{Y} (t)] .$ (Equation 4)
  
  The conventional forward simulation numerically solves Equation 1 to obtain $\vec{y} (t)$ , given the values of $\vec{p} (t)$ and $G_{i} [\vec{p} (t); \vec{y} (t), t]$ and the initial condition of $\vec{y} (t)$ . On the other hand, our BS numerically solves Equations 3 and 4 to obtain $G_{j} [\vec{p} (t); {\vec{y}}_{Y} (t), t]$ and ${\vec{y}}_{Y} (t)$ , given the values of $\vec{p} (t)$ , Y(t), and $G_{i} [\vec{p} (t); {\vec{y}}_{Y} (t), t]$ ( $i \neq j$ ) and the initial condition of ${\vec{y}}_{Y} (t)$ . In other words, using a downstream observable Y(t), BS traces back the upstream processes such as $G_{j} [\vec{p} (t); {\vec{y}}_{Y} (t), t]$ .
- d.
  If the computed $G_{j} [\vec{p} (t); {\vec{y}}_{Y} (t), t]$ or ${\vec{y}}_{Y} (t)$ does not satisfy the fundamental conditions imposed by a modeler, they are treated as infeasible solutions. For example, if y_i(t) in ${\vec{y}}_{Y} (t)$ represents the concentration of each molecular species i, y_i(t) should be non-negative for all i’s. Depending on cases, the infeasible solutions may indicate the incompatibility of a simulated parameter set $\vec{p} (t)$ to an observed profile of Y(t).
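Steps 1 and 2 can be illustrated with a minimal sketch on a hypothetical two-component system, which is not part of the protocol: y1 and y2 interconvert with assumed rates K1 and K2, y1 receives an unknown source G_1(t), and y2 has a first-order sink with assumed rate D. A forward simulation with a known source generates the observable Y(t) = y1(t) + y2(t); the BS then recovers the source from Y(t) alone through Equations 2, 3, and 4. All names and values below are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

K1, K2, D = 0.8, 0.3, 0.5  # hypothetical rate constants (assumptions)

def true_source(t):
    """Source term G_1(t), used only to generate a synthetic Y(t)."""
    return 1.0 + 0.5 * np.sin(t)

def forward_rhs(t, y):
    """Equation 1 for the toy system: interconversion plus source/sink."""
    y1, y2 = y
    return [true_source(t) - K1 * y1 + K2 * y2,
            K1 * y1 - K2 * y2 - D * y2]

# Forward simulation: produces the "observed" downstream total Y(t).
t_eval = np.linspace(0.0, 20.0, 2001)
fwd = solve_ivp(forward_rhs, (0.0, 20.0), [2.0, 1.0], t_eval=t_eval,
                dense_output=True, rtol=1e-9, atol=1e-12)

def Y(t):
    """Observable Y(t) = y1(t) + y2(t), here taken from the forward run."""
    return fwd.sol(t).sum(axis=0)

def backward_rhs(t, y):
    """Equation 3 for i = 2, with y1 eliminated through Equation 2."""
    y2 = y[0]
    y1 = Y(t) - y2
    return [K1 * y1 - K2 * y2 - D * y2]

# Backward simulation: only Y(t), the parameters, and y2(0) are used.
bwd = solve_ivp(backward_rhs, (0.0, 20.0), [1.0], t_eval=t_eval,
                rtol=1e-7, atol=1e-10)
y2_bs = bwd.y[0]

# For this system, Equation 4 reduces to G_1 = dY/dt + D * y2.
dY_dt = np.gradient(Y(t_eval), t_eval)
g1_bs = dY_dt + D * y2_bs

# Compare with the source used in the forward run (end points excluded
# because the finite-difference derivative is less accurate there).
max_error = np.max(np.abs(g1_bs - true_source(t_eval))[1:-1])
print("largest deviation of the recovered source:", max_error)
```

In an actual application, Y(t) would come from interpolated data rather than from a forward run, and the recovered source would still need to pass the feasibility conditions in step 2.d.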

Application of BS to a circadian protein degradation model

Timing: 1 h for step 3, 1 h for step 4, 1 h for step 5, and 2 h for step 6

In this section, we will describe how to apply the BS to a circadian protein degradation model in Lim et al. (2021). Steps 3 and 4 show how to modify the set of coupled ODEs of a circadian protein degradation model for the BS. How to write the code for the BS and how to examine the ODE system are described in steps 5 and 6, respectively.

3.
Formulate the dynamics as a set of coupled ODEs.
- a.
  In our circadian protein degradation model in Lim et al. (2021), the time derivatives of the concentrations of several forms of a circadian protein are described as follows (Figure 1):
  $\frac{d x_{0} (t)}{d t} = g (t) - a_{0} u (t) x_{0} (t) + a_{1} x_{E, 0} (t) + s x_{H, u b} (t),$ (Equation 5)
  
  $\frac{d x_{E, 0} (t)}{d t} = a_{0} u (t) x_{0} (t) - a_{1} x_{E, 0} (t) - q x_{E, 0} (t),$ (Equation 6)
  
  $\frac{d x_{E, u b} (t)}{d t} = q x_{E, 0} (t) + a_{0} u (t) x_{0, u b} (t) - a_{2} x_{E, u b} (t) - r_{0} x_{E, u b} (t),$ (Equation 7)
  
  $\begin{matrix} \frac{d x_{0, u b} (t)}{d t} = a_{2} x_{E, u b} (t) + b_{1} x_{H, u b} (t) - b_{0} v (t) x_{0, u b} (t) \\ - a_{0} u (t) x_{0, u b} (t) - r_{0} x_{0, u b} (t), \end{matrix}$ (Equation 8)
  
  $\frac{d x_{H, u b} (t)}{d t} = b_{0} v (t) x_{0, u b} (t) - b_{1} x_{H, u b} (t) - s x_{H, u b} (t) - r_{0} x_{H, u b} (t)$ (Equation 9)
  
  with the following two quantities:
  $\bar{u} \equiv u (t) + x_{E, 0} (t) + x_{E, u b} (t),$ (Equation 10)
  
  $\bar{v} \equiv v (t) + x_{H, u b} (t) .$ (Equation 11)
  
  The variables and rate parameters in Equations 5, 6, 7, 8, 9, 10, and 11 are defined in Tables 1 and 2.
- b.
  As a sanity check, sum up Equations 5, 6, 7, 8, and 9 and obtain the time derivative of the total protein concentration:
  $\begin{matrix} \frac{d}{d t} [x_{0} (t) + x_{E, 0} (t) + x_{E, u b} (t) + x_{0, u b} (t) + x_{H, u b} (t)] \\ = g (t) - r_{0} [x_{E, u b} (t) + x_{0, u b} (t) + x_{H, u b} (t)] . \end{matrix}$ (Equation 12)
  
  This result makes sense because g(t) represents the ultimate source of protein production and the variables with a common coefficient r₀ are the concentrations of ubiquitinated proteins destined for degradation.
- c.
  Let x(t) denote the total protein concentration:
  $x (t) \equiv x_{0} (t) + x_{E, 0} (t) + x_{E, u b} (t) + x_{0, u b} (t) + x_{H, u b} (t) .$ (Equation 13)
  
  Equation 12 is rewritten as
  $\frac{d x (t)}{d t} = g (t) - r (t) x (t),$ (Equation 14)
  where r(t) is the protein degradation rate given by
  $r (t) \equiv r_{0} [x_{E, u b} (t) + x_{0, u b} (t) + x_{H, u b} (t)] / x (t) .$ (Equation 15)
- d.
  Be cautious about the implicit physical or biological constraints on the variables and parameters. In the example of our model, all the molecular concentrations should be nonnegative. That is,
  $\begin{matrix} x_{0} (t) \geq 0, x_{E, 0} (t) \geq 0, x_{E, u b} (t) \geq 0, x_{0, u b} (t) \geq 0, x_{H, u b} (t) \geq 0, \\ u (t) \geq 0, v (t) \geq 0 . \end{matrix}$ (Equation 16)
  
  Additionally, because the protein synthesis rate cannot be negative, the following inequality should be satisfied:
  $g (t) \geq 0 .$ (Equation 17)
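The sanity check in step 3.b can also be done numerically. The sketch below codes Equations 5, 6, 7, 8, and 9 as a Python function and verifies that their sum equals the right-hand side of Equation 12 at an arbitrary state. The function name, the dictionary keys, and all numerical values are assumptions made for this illustration and are not taken from the deposited code.

```python
import numpy as np

def forward_rhs(t, state, p, g):
    """Time derivatives in Equations 5-9 for a given synthesis rate g(t)."""
    x0, xE0, xEub, x0ub, xHub = state
    u = p["u_bar"] - xE0 - xEub        # Equation 10 solved for u(t)
    v = p["v_bar"] - xHub              # Equation 11 solved for v(t)
    dx0 = g(t) - p["a0"] * u * x0 + p["a1"] * xE0 + p["s"] * xHub
    dxE0 = p["a0"] * u * x0 - p["a1"] * xE0 - p["q"] * xE0
    dxEub = (p["q"] * xE0 + p["a0"] * u * x0ub
             - p["a2"] * xEub - p["r0"] * xEub)
    dx0ub = (p["a2"] * xEub + p["b1"] * xHub - p["b0"] * v * x0ub
             - p["a0"] * u * x0ub - p["r0"] * x0ub)
    dxHub = (p["b0"] * v * x0ub - p["b1"] * xHub
             - p["s"] * xHub - p["r0"] * xHub)
    return np.array([dx0, dxE0, dxEub, dx0ub, dxHub])

# Hypothetical parameter values and state, chosen only for this check.
p = dict(a0=1.2, a1=0.7, a2=0.4, b0=0.9, b1=0.3, q=1.5, s=0.6,
         r0=0.8, u_bar=2.0, v_bar=1.0)
state = np.array([1.0, 0.3, 0.2, 0.4, 0.1])

def g(t):
    """Constant synthesis rate, used only for illustration."""
    return 0.5

# Left- and right-hand sides of Equation 12 at t = 0.
lhs = forward_rhs(0.0, state, p, g).sum()
rhs = g(0.0) - p["r0"] * state[2:].sum()
print(lhs, rhs)
```

Agreement of the two printed numbers to rounding error indicates that the interconversion terms cancel as expected; a mismatch would point to a sign or index error in the coded equations.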

4.
Transform the ODEs to utilize the experimentally available information on relatively downstream variables in the ODE model.
- a.
  In the example of our model, the experimentally available data were the total concentration x(t) of a protein rather than those of the individual sub-forms of the protein or the protein synthesis rate at the upstream level. Although the experimental data usually represent the relative, but not absolute, molecular concentrations, the use of the relative concentration for x(t) in our model only changes the “unit” of concentrations without loss of generality. On the other hand, if multiple datasets of their own relative levels are incorporated into the model variables of the same dimensional quantities, the appropriate proportionality coefficients or conversion factors should be introduced for the sake of a unified scale.
- b.
  Plug the following relation (from Equation 13) in Equations 5 and 6:
  $x_{0} (t) = x (t) - [x_{E, 0} (t) + x_{E, u b} (t) + x_{0, u b} (t) + x_{H, u b} (t)] .$ (Equation 18)
- c.
  g(t) in Equation 5 is rewritten as
  $g (t) = \frac{d x_{0} (t)}{d t} + a_{0} u (t) x_{0} (t) - a_{1} x_{E, 0} (t) - s x_{H, u b} (t),$ (Equation 19)
  where x₀(t) is replaced by the right-hand side in Equation 18. We are now ready for the BS of our model, using Equations 6, 7, 8, 9, 10, 11, 18, and 19. Note that, in usual practice, x(t) and other variables are simulated using the profile of g(t); in our BS, g(t) and other variables are reversely simulated using the profile of x(t), through Equations 6, 7, 8, 9, 10, 11, 18, and 19.
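As a consistency check on Equation 19, the time derivative of x₀(t) can be written out with Equation 18 and Equations 6, 7, 8, and 9. The worked steps below use only symbols already defined in the protocol; they show that Equation 19 is equivalent to Equation 14 solved for g(t).

```latex
\begin{aligned}
\frac{d x_{0}(t)}{d t}
 &= \frac{d x(t)}{d t}
  - \frac{d}{d t}\left[x_{E,0}(t) + x_{E,ub}(t) + x_{0,ub}(t) + x_{H,ub}(t)\right] \\
 &= \frac{d x(t)}{d t} - a_{0} u(t) x_{0}(t) + a_{1} x_{E,0}(t) + s\, x_{H,ub}(t)
  + r_{0}\left[x_{E,ub}(t) + x_{0,ub}(t) + x_{H,ub}(t)\right], \\
g(t)
 &= \frac{d x_{0}(t)}{d t} + a_{0} u(t) x_{0}(t) - a_{1} x_{E,0}(t) - s\, x_{H,ub}(t) \\
 &= \frac{d x(t)}{d t} + r_{0}\left[x_{E,ub}(t) + x_{0,ub}(t) + x_{H,ub}(t)\right]
  = \frac{d x(t)}{d t} + r(t)\, x(t).
\end{aligned}
```

In practice, this means that g(t) can be evaluated after the integration from the derivative of the interpolated x(t) and the simulated ubiquitinated forms, without adding x₀(t) as a separate state variable.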

5.
Write the codes for simulation.
- a.
  Import Python modules (scipy).
- b.
  Read the time-course data of the total protein abundance. If the data do not span more than a single circadian period, expand them by the repetition of the data points to multiple circadian periods. This step is necessary if one wants to simulate the long-term asymptotic behavior of our circadian model, as will be explained later.
- c.
  Interpolate the data points of the total protein abundance. This step is necessary for the construction of a continuous trajectory of x(t) for our model simulation, given the discrete nature of time points with available data.
  - i.
    Import interp1d module from scipy.interpolate.
  - ii.
    Implement a cubic interpolation by setting the option “kind = ‘cubic’”. The result is an interpolated curve of x(t), as exemplified by Figure 2A.
  - iii.
    Build a Python function that uses time, an array of concentrations, an array of kinetic parameters, an interpolated curve of x(t) as input, and returns an array of the model variables and their time derivatives in Equations 6, 7, 8, 9, 10, 11, 18, and 19 as output.
- d.
  Solve the ODEs with the solve_ivp function in scipy.integrate.
  - i.
    Import solve_ivp module from scipy.integrate.
  - ii.
    Set the parameters “fun”, “t_span”, “t_eval”, and “y0”. Here, “fun” is a Python function of which input is ‘t’ (time) and ‘y’ (an array of the values of variables) and output is an array of the time derivatives of the variables. “t_span” is a tuple containing two time points that indicate the beginning and end of the simulation. “t_eval” is an array of time points at which to store the computed solutions. “y0” is an array of the initial states of the variables.
  - iii.
    When solving the system of ODEs numerically, take the following into consideration:
    (1) Selection of time step: The maximum time step for the simulation should be much smaller than the time scale of the dynamics. For the circadian proteins, the time scale is ∼24 h. Therefore, it is safe to select a maximum time step much smaller than an hour. Moreover, checking that the solution does not noticeably change with smaller time steps confirms that the time step is adequate. In solve_ivp, the “max_step” option sets the maximum time step used to solve the ODEs.
    (2) Stiffness of the problem: If the ODEs are stiff, the explicit Runge-Kutta methods (e.g., the default “RK45”) are not recommended. In this case, the “LSODA”, “BDF”, or “Radau” methods are recommended. The method is selected with the “method” option of solve_ivp.
    (3) Numerical error tolerance: The estimated errors of the numerical solution can be controlled. The “atol” and “rtol” options of solve_ivp set the absolute and relative error tolerances, respectively.
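The pieces of step 5 can be assembled as in the following minimal sketch. It is not the deposited code: the synthetic cosine profile stands in for experimental data, and the parameter values, the zero initial conditions, and the names bs_rhs, x_of_t, and p are assumptions made for illustration. The synthesis rate is evaluated after the integration from Equation 14 rearranged as g(t) = dx/dt + r(t)x(t).

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.integrate import solve_ivp

PERIOD = 24.0    # circadian period (h)
N_PERIODS = 10   # simulated length in periods (assumed)

# Synthetic hourly profile standing in for experimental data (step 5.b).
# It spans one period more than the simulation, so that the interpolant
# is never evaluated outside its range.
t_data = np.arange(0.0, (N_PERIODS + 1) * PERIOD + 1.0, 1.0)
x_data = 1.0 + 0.5 * np.cos(2.0 * np.pi * (t_data - 12.0) / PERIOD)
x_of_t = interp1d(t_data, x_data, kind="cubic")           # step 5.c

# Hypothetical parameter values, chosen only for illustration.
p = dict(a0=1.0, a1=1.0, a2=1.0, b0=1.0, b1=1.0, q=1.0, s=1.0,
         r0=1.0, u_bar=0.5, v_bar=0.5)

def bs_rhs(t, y, p, x_of_t):
    """Equations 6-9 with x0(t) eliminated through Equation 18."""
    xE0, xEub, x0ub, xHub = y
    x0 = float(x_of_t(t)) - (xE0 + xEub + x0ub + xHub)    # Equation 18
    u = p["u_bar"] - xE0 - xEub                            # from Equation 10
    v = p["v_bar"] - xHub                                  # from Equation 11
    dxE0 = p["a0"] * u * x0 - p["a1"] * xE0 - p["q"] * xE0
    dxEub = (p["q"] * xE0 + p["a0"] * u * x0ub
             - p["a2"] * xEub - p["r0"] * xEub)
    dx0ub = (p["a2"] * xEub + p["b1"] * xHub - p["b0"] * v * x0ub
             - p["a0"] * u * x0ub - p["r0"] * x0ub)
    dxHub = (p["b0"] * v * x0ub - p["b1"] * xHub
             - p["s"] * xHub - p["r0"] * xHub)
    return [dxE0, dxEub, dx0ub, dxHub]

t_span = (0.0, N_PERIODS * PERIOD)
t_eval = np.linspace(t_span[0], t_span[1], 2401)           # 0.1-h spacing
sol = solve_ivp(lambda t, y: bs_rhs(t, y, p, x_of_t), t_span,
                [0.0, 0.0, 0.0, 0.0], t_eval=t_eval, method="LSODA",
                max_step=0.1, rtol=1e-8, atol=1e-10)       # step 5.d

# Upstream quantities recovered from the solution.
xE0, xEub, x0ub, xHub = sol.y
x = x_of_t(sol.t)
x0 = x - sol.y.sum(axis=0)                                 # Equation 18
r = p["r0"] * (xEub + x0ub + xHub) / x                     # Equation 15
g = np.gradient(x, sol.t) + r * x                          # Equation 14 rearranged
```

The arrays g and r computed here are the quantities that the checks in step 6 are meant to examine; whether they are feasible depends on the parameter values.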

6.
Test the behaviors of the ODE model.
- a.
  Check whether the model output is not too sensitive to the initial conditions, given the profile of the observable (e.g., x(t) in our case) and the parameter values. This step is to determine whether the simulation outcome is essentially unique or not, regardless of particular initial conditions. One method is to randomly sample the initial conditions of each variable within a physiologically-relevant range and then check whether the simulation outcomes converge at similar trajectories. In the case of our model, the code in the following command allows the test of this initial condition dependency:
  > python check_initial_conditions_sensitivity.py
  
  The execution of the code gives the graphs of each variable as the functions of time with different initial conditions. Figures 2B and 2C demonstrate that g(t) and r(t) in our model converge well respectively, regardless of their initial conditions when x(t) (Figure 2A) and parameter values are assigned for the simulation.
- b.
  Determine the minimum simulation length to approach the asymptotic solutions of the ODE model. By running the simulation for a long enough time, i.e., setting the large values for ‘t_span’ and ‘t_eval’ options, check when the ODE solutions exhibit saturated and sustained oscillations in our case. If the simulation does not result in the stable oscillations, try longer simulation lengths. Identify the minimum simulation length to ensure the stable oscillations. The code in the following command allows this test:
  > python check_long_time.py
  
  Executing the code gives the graphs of each variable over a long time. Through these graphs, one can determine the minimum simulation length.
- c.
  Check whether the ODE solutions satisfy all the physical and biological constraints and thus can be considered as feasible solutions. Some ODE solutions may not satisfy these constraints, particularly in the case of BS. The reason is that BS does not run in a natural causal direction from upstream to downstream levels, but traces back the upstream states without the prospect of their compatibility with the parameter values. In other words, only the parameter values with the feasible solutions of BS are compatible with the downstream observable, and hence BS can identify those sensible parameter values. In the example of our model, the solutions should satisfy all the constraints in Equations 16 and 17. The code in the following command allows this feasibility test of the model solutions:
  > python check_physical_constraints.py
  
  The code verifies whether the constraints in Equations 16 and 17 are satisfied or not (Figure 2D).
- d.
  Debug the code of the ODE model. By simulating different parameter values, one can check the potential bugs in the code. In the example of our model, if $\bar{v}$ is set to 0, x_H,ub(t) should be zero. In Equation 8, if a₀u(t) and b₀v(t) are much higher than a₂ and b₁ by setting a₀, b₀, $\bar{u}$ , and $\bar{v}$ as relatively high and setting a₂ and b₁ as relatively low, then x_0,ub(t) should become very small (Figures 2E and 2F). The code in the following command allows these bug tests:
  > python check_debugging.py
  
  Executing this code gives the parameter conditions and the graphs of the relevant simulation results.
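The feasibility test in step 6.c amounts to checking Equations 16 and 17 at every stored time point. The function below is a minimal sketch of such a check and is not the deposited check_physical_constraints.py; the function name, the array layout, and the small tolerance for numerical error are assumptions.

```python
import numpy as np

def is_feasible(states, u, v, g, tol=1e-9):
    """Return True if Equations 16 and 17 hold at all time points.

    states: array of shape (5, n_times) holding x0, xE0, xEub, x0ub, xHub.
    u, v, g: arrays of shape (n_times,).
    tol: values above -tol are treated as nonnegative; this allowance for
         numerical error is an assumption, not part of the protocol.
    """
    arrays = [np.asarray(states), np.asarray(u), np.asarray(v), np.asarray(g)]
    return all(bool(np.all(a >= -tol)) for a in arrays)

# Usage with made-up numbers: the second case has a negative synthesis rate.
states_ok = np.full((5, 4), 0.2)
enzymes = np.full(4, 0.1)
print(is_feasible(states_ok, enzymes, enzymes, np.array([0.3, 0.2, 0.1, 0.0])))
print(is_feasible(states_ok, enzymes, enzymes, np.array([0.3, 0.2, -0.1, 0.0])))
```

A parameter set whose BS solution fails this check would be treated as incompatible with the given profile of x(t).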

Figure 1. The protein degradation model

There are a total of five types of substrate proteins, defined in Table 2. Substrate proteins (rounded rectangles, sky blue) are synthesized from mRNA molecules (blue line, top left) in the ribosome (brown, top left) and ubiquitinated by E3 ubiquitin ligases (orange ovals) with ubiquitins (yellow circles). The ubiquitinated proteins are degraded (gray ovals) or deubiquitinated by deubiquitinating enzymes (light green hexagon). The total protein concentration is represented by x(t) with symbol Σ at the center.

Table 1.

Parameters in the protein degradation model

Parameter	Meaning
a₀	Rate of ubiquitin ligase binding to a substrate protein.
a₁	Dissociation rate of a ubiquitin ligase and a not-ubiquitinated protein.
a₂	Dissociation rate of a ubiquitin ligase and a ubiquitinated protein.
b₀	Rate of deubiquitinating enzyme binding to a ubiquitinated protein.
b₁	Dissociation rate of a deubiquitinating enzyme and a ubiquitinated protein.
q	Ubiquitination rate of a protein binding to a ubiquitin ligase.
s	Deubiquitination rate of a protein binding to a deubiquitinating enzyme, lumped with its subsequent dissociation from the deubiquitinating enzyme.
r₀	Degradation rate of a ubiquitinated protein.
$\bar{u}$	Total ubiquitin ligase concentration.
$\bar{v}$	Total deubiquitinating enzyme concentration.


Table 2.

Variables in the protein degradation model

Variable	Meaning
t	Time.
x₀(t)	Concentration of a free protein without ubiquitination.
x_E,0(t)	Concentration of a not-ubiquitinated protein that is binding to a ubiquitin ligase.
x_E,ub(t)	Concentration of a ubiquitinated protein that is binding to a ubiquitin ligase.
x_0,ub(t)	Concentration of a ubiquitinated protein that is not binding to a ubiquitin ligase.
x_H,ub(t)	Concentration of a ubiquitinated protein that is binding to a deubiquitinating enzyme.
g(t)	Protein synthesis rate.
u(t)	Concentration of a free ubiquitin ligase.
v(t)	Concentration of a free deubiquitinating enzyme.


Figure 2. Examining the behaviors of the protein degradation model

(A) The example, total protein concentration profile used for BS.

(B and C) Simulated g(t) (B) and r(t) (C) with different initial conditions. Given the protein profile in (A), the simulated g(t) or r(t) rapidly converges at the same trajectory, regardless of its initial conditions.

(D) The example output of the code “check_physical_constraints.py” to check the feasibility of the BS solution. If the solution does not satisfy its fundamental conditions, the code returns the message in (D).

(E and F) Simulated profiles of protein sub-states: x₀(t) (purple), x_E,0(t) (yellow), x_E,ub(t) (gray), x_0,ub(t) (blue), and x_H,ub(t) (red). When $\bar{v}$ is set to zero, x_H,ub(t) becomes zero (E). When a₀, b₀, $\bar{u}$ , and $\bar{v}$ are set as relatively high and a₂ and b₁ are set as relatively low, x_0,ub(t) becomes almost zero (F).

Sampling of parameter values

Timing: 1 h to several days (depending on the size of parameter dimension)

In this section, we will describe how to sample a large number of parameter values with multiple CPU cores. Utilizing parallel computing would save much of the simulation time.

7.
Generate multiple parameter values and run the BS with these parameter values.
- a.
  If the number of the parameter values is too large, the Python modules “multiprocessing” and “mpi4py” can save much of the simulation time.
  - i.
    How to use the “multiprocessing” module: construct a wrapper function (“wrapper” below) that takes a single parameter set as input and runs the BS for that parameter set. “parameter_sets” is a list of N parameter sets. “pool.map” then distributes the parameter sets over the given number of CPU cores.
    > import multiprocessing as mp
    > parameter_sets = [parameter_set_1, …, parameter_set_N]
    > with mp.Pool(Number_of_CPU_cores) as pool:
    >     results = pool.map(wrapper, parameter_sets)
  - ii.
    How to use the “mpi4py” module: the “mpi4py” module allows each node to run its own share of the model simulations. An example is shown below.
    > from mpi4py import MPI
    > comm = MPI.COMM_WORLD
    > num_processor = comm.Get_size()
    > rank = comm.Get_rank()
    > simulation_done_by_one = 0
    > simulation_target_by_one = N
    > while simulation_done_by_one < simulation_target_by_one:
    >     …
    >     simulation_done_by_one += 1
    
    Here, N is the target number of the simulations in one node.
    Note: In both cases i and ii above, be careful when writing a file. If two different processes or nodes write to the same file, this file may not be readable at the end.

Expected outcomes

The BS method allows us to identify the valid parameter sets for experimentally available downstream data and to inspect the internal dynamics of the system with the fixed profiles of particular components. The latter would be useful for rigorous mechanistic inspection of a dynamical system. For example, we controlled for the protein profile x(t) in Lim et al. (2021) and found that the degradation rate r(t) tends to be more rhythmic with a lower level of a ubiquitin ligase $\bar{u}$ . In addition, we identified a definite lower bound of $\bar{u}$ for the establishment of a given profile of x(t) itself, along with other interesting phenomena. Without the help of the BS, these clear conclusions may not be drawn, because in the conventional simulation with a fixed profile of the protein synthesis rate g(t), the change of $\bar{u}$ modifies the oscillatory form of x(t) itself and thus does not clearly separate the effect of $\bar{u}$ from that of the x(t) profile on the generation of rhythmic r(t) (Lim et al., 2021).

Limitations

Our BS method is based on ODE models, and its applications beyond ODE models are not yet straightforward. In addition, it should be noted that BS does not aim to infer unknown parameter values; rather, it infers the upstream dynamics with given parameter values, and thereby identifies the parameter values with feasible upstream states, compatible with the downstream observables. If the BS is implemented with randomly-sampled parameters, a large number of parameter sets may need to be sampled as the dimension of the parameter space increases (d^N parameter sets, where N is the number of parameters in the model and d is the number of different values sampled for each parameter). This massive parameter sampling can be computationally demanding.
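To give a sense of the d^N scaling, the sketch below enumerates a regular grid over the ten parameters of Table 1 with d = 3 values each. The grid, the value range, and the helper name parameter_grid are assumptions for illustration; the protocol itself refers to randomly-sampled parameters.

```python
import itertools
import numpy as np

def parameter_grid(names, values_per_parameter):
    """Yield every combination of the sampled values as a dictionary."""
    for combo in itertools.product(values_per_parameter, repeat=len(names)):
        yield dict(zip(names, combo))

names = ["a0", "a1", "a2", "b0", "b1", "q", "s", "r0", "u_bar", "v_bar"]
values = np.logspace(-1, 1, 3)       # d = 3 assumed values: 0.1, 1, 10
n_sets = len(values) ** len(names)   # d^N parameter sets
print(n_sets)

# The generator is lazy, so the first parameter set can be inspected
# without building the whole list in memory.
first_set = next(parameter_grid(names, values))
```

Even this coarse grid requires tens of thousands of simulations, which is why the parallel execution described in step 7 becomes relevant.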

Troubleshooting

Problem 1

When solving an ODE model with the solve_ivp function in scipy.integrate, the computation time may sometimes be very long (step 6 of section “application of BS to a circadian protein degradation model”).

Potential solution

When the interpolated curve of the experimental profile is smooth enough, the options “method = ‘RK45’” and “method = ‘LSODA’” take similar computation times. However, if the interpolated curve is not smooth enough, “method = ‘RK45’” takes a longer computation time. Therefore, in this case, we recommend the use of “method = ‘LSODA’” for a shorter computation time.

Problem 2

If the profile of x(t) in our model is too noisy, most BS results will not give the feasible solutions of the upstream states. These noisy patterns are likely to come from very high temporal resolution of the experimental data. For example, the PER2 profile for our BS in Lim et al. (2021) was obtained from the data of Zhou et al. (2015) with 6-min resolution. (step 2.a of section “BS in general cases” and step 4.a of section “application of BS to a circadian protein degradation model”).

Potential solution

Time window averaging or other denoising techniques can be applied to the noisy profile. However, be cautious of the possibility that such smoothening may distort the original patterns in the profile.
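Time window averaging can be sketched as follows for a profile that covers exactly one circadian period. The periodic padding, the window length, and the synthetic noisy profile are assumptions for illustration; the function name is hypothetical.

```python
import numpy as np

def moving_average_periodic(x, window):
    """Centered moving average of a periodic profile.

    x must cover exactly one period without repeating the first point at
    the end; window is an odd number of samples.
    """
    x = np.asarray(x, dtype=float)
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    # Wrap the profile around its ends so that every point has neighbors.
    padded = np.concatenate([x[-half:], x, x[:half]]) if half else x
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Synthetic example: one 24-h period at 6-min resolution with added noise.
rng = np.random.default_rng(0)
t = np.arange(240) * 0.1
clean = 1.0 + 0.5 * np.cos(2.0 * np.pi * t / 24.0)
noisy = clean + rng.normal(scale=0.1, size=t.size)
smoothed = moving_average_periodic(noisy, window=11)   # about a 1-h window
```

A wider window removes more noise but also flattens the peaks of the profile, so it is worth comparing the smoothed and original curves before running the BS.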

Problem 3

The ODE solutions are sometimes not accurate enough (step 5.d.iii and step 6 of section “application of BS to a circadian protein degradation model”).

Potential solution

The solve_ivp function in scipy.integrate has options to control the error-tolerance level in numerical integration of ODEs. By adjusting the options ‘atol’ and ‘rtol’, one can manage the absolute and relative levels of the tolerance to the numerical errors, respectively. However, too small ‘atol’ and ‘rtol’ values can considerably slow down the computation.
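The effect of the tolerances can be seen on a test equation with a known solution. The sketch below uses dy/dt = −y, which is unrelated to the protein degradation model, and compares the solve_ivp default tolerances (rtol = 1e-3, atol = 1e-6) with tighter ones.

```python
import numpy as np
from scipy.integrate import solve_ivp

def decay(t, y):
    """Test equation dy/dt = -y with exact solution exp(-t)."""
    return -y

t_eval = np.linspace(0.0, 5.0, 51)
exact = np.exp(-t_eval)

errors = {}
for rtol, atol in [(1e-3, 1e-6), (1e-9, 1e-12)]:
    sol = solve_ivp(decay, (0.0, 5.0), [1.0], t_eval=t_eval,
                    rtol=rtol, atol=atol)
    # Largest deviation from the exact solution at the stored time points.
    errors[rtol] = np.max(np.abs(sol.y[0] - exact))

for rtol, err in errors.items():
    print(f"rtol = {rtol:g}: maximum error = {err:.2e}")
```

For the BS itself, the same comparison can be made by rerunning a simulation with tighter tolerances and checking that g(t) and r(t) do not change noticeably.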

Problem 4

When simulating excessively many parameter sets, it is difficult to manually determine whether ODE solutions have reached their attractors or not within the simulation time (step 7 of section “sampling of parameter values”).

Potential solution

Some parameter sets may take long computation time towards the asymptotic states of the model outcome. In the case of our model, if the peak values of the oscillating variables decrease or increase with more than some fold change in the past two circadian periods, these variables may not be considered to reach the stable oscillatory states at that time.
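This peak-based criterion can be automated as in the sketch below. The fold-change threshold of 1.05, the function name, and the test trajectories are assumptions for illustration; the protocol does not specify a threshold.

```python
import numpy as np

def has_stable_oscillation(t, y, period=24.0, max_fold_change=1.05):
    """Compare the peak values of y in the last two periods.

    Returns True if the two peaks differ by no more than max_fold_change.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_end = t[-1]
    last = y[t > t_end - period]
    previous = y[(t > t_end - 2.0 * period) & (t <= t_end - period)]
    peak_last, peak_prev = last.max(), previous.max()
    if peak_last <= 0.0 or peak_prev <= 0.0:
        return False   # fold change is not meaningful for nonpositive peaks
    fold = max(peak_last / peak_prev, peak_prev / peak_last)
    return bool(fold <= max_fold_change)

# Usage with synthetic trajectories over ten periods.
t = np.linspace(0.0, 240.0, 2401)
sustained = 1.0 + 0.5 * np.cos(2.0 * np.pi * t / 24.0)
decaying = sustained * np.exp(-t / 48.0)
print(has_stable_oscillation(t, sustained), has_stable_oscillation(t, decaying))
```

Parameter sets that fail the test can be rerun with a longer simulation length instead of being inspected by eye.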

Problem 5

When solving the system of ODEs with the solve_ivp function in scipy.integrate, the solution is stored only at the time points assigned in the “t_eval” argument. However, a continuous time series of the solution might be needed in some cases (step 5.d of section “application of BS to a circadian protein degradation model”).

Potential solution

If the “dense_output” option of solve_ivp is set to “True”, the returned result contains a callable object (its “sol” attribute) that evaluates the ODE solution at any time point within the simulated interval. However, this option might increase the computation time for the solution, especially when the solution involves a long time series. Alternatively, the use of small time steps for the “t_eval” option and the interpolation of the solution over the last one or two time periods only would save the computation time.
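A minimal sketch of the “dense_output” option is given below on the test equation dy/dt = −y, which is unrelated to the protein degradation model. The query times are arbitrary.

```python
import numpy as np
from scipy.integrate import solve_ivp

def decay(t, y):
    """Test equation dy/dt = -y with exact solution exp(-t)."""
    return -y

sol = solve_ivp(decay, (0.0, 5.0), [1.0], dense_output=True,
                rtol=1e-8, atol=1e-10)

# sol.sol is a callable that returns an array of shape (n_variables, n_times).
t_query = np.array([0.123, 2.5, 4.999])
y_query = sol.sol(t_query)[0]
print(y_query)
```

The callable is valid only within the simulated interval, so query times outside “t_span” should be avoided.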

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Pan-Jun Kim (panjunkim@hkbu.edu.hk).

Materials availability

This study did not generate any unique reagents.

Acknowledgments

This work was supported by Hong Kong Baptist University, Research Committee, Start-up Grant for New Academics (R.L. and P.-J.K.) and the National Research Foundation of Korea Grants NRF-2020R1A4A1019140 and NRF-2020R1F1A1075942 funded by the Ministry of Science and ICT (J.C. and C.-M.G.). This work was partially conducted with the resources of the High Performance Cluster Computing Centre, Hong Kong Baptist University, which receives funding from Research Grant Council, University Grant Committee of the HKSAR and Hong Kong Baptist University. We also acknowledge the support of the UNIST Supercomputing Center for the computing resources.

Author contributions

C.-M.G. and P.-J.K. supervised the research. J.C., R.L., C.-M.G., and P.-J.K. designed the research. J.C. and R.L. performed the research. J.C., R.L., C.-M.G., and P.-J.K. wrote the manuscript.

Declaration of interests

The authors declare no competing interests.

Contributor Information

Cheol-Min Ghim, Email: cmghim@unist.ac.kr.

Pan-Jun Kim, Email: panjunkim@hkbu.edu.hk.

Data and code availability

This study did not generate new experimental data.

Source codes for our model simulation have been deposited to public repository GitHub, and the link is provided in the key resources table.

References

Lim R., Chae J., Somers D.E., Ghim C.-M., Kim P.-J. Cost-effective circadian mechanism: rhythmic degradation of circadian proteins spontaneously emerges without rhythmic post-translational regulation. iScience. 2021;24:102726. doi: 10.1016/j.isci.2021.102726.
Virtanen P., Gommers R., Oliphant T.E., Haberland M., Reddy T., Cournapeau D., Burovski E., Peterson P., Weckesser W., Bright J., et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods. 2020;17:261–272. doi: 10.1038/s41592-019-0686-2.
Zhou M., Kim J.K., Eng G.W., Forger D.B., Virshup D.M. A Period2 phosphoswitch regulates and temperature compensates circadian period. Mol. Cell. 2015;60:77–88. doi: 10.1016/j.molcel.2015.08.022.
