Abstract
The time-ordered product framework of quantum field theory can also be used to understand salient phenomena in stochastic biochemical networks. It is used here to derive Gillespie’s Stochastic Simulation Algorithm (SSA) for chemical reaction networks; consequently, the SSA can be interpreted in terms of Feynman diagrams. It is also used here to derive other, more general simulation and parameter-learning algorithms, including simulation algorithms for networks of stochastic reaction-like processes operating on parameterized objects, and also hybrid stochastic reaction/differential equation models in which systems of ordinary differential equations evolve the parameters of objects that can also undergo stochastic reactions. Thus, the time-ordered product expansion (TOPE) can be used systematically to derive simulation and parameter-fitting algorithms for stochastic systems.
1 Introduction
The master equation for a continuous-time stochastic dynamical system may be expressed as dp/dt = W · p where W is the time-evolution operator, often an infinite-dimensional matrix. Particular choices for W lead to the special case of the “chemical master equation” for stochastic chemical kinetics, often useful in biological applications; we will see this and other applications below. The general master equation has the formal solution p(t) = exp(tW) · p(0). If W can be decomposed as a sum W0 + W1, then there is a perturbation theory for exp(tW) in terms of exp(tW0) and its perturbations by W1. The time-ordered product expansion (which we refer to by the acronym TOPE) gives a formula for the solution of a master equation [1–3] which can be expressed as follows [4]:
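$$\exp(tW)\cdot p(0) = \sum_{k=0}^{\infty}\left[\int_0^t dt_1 \int_{t_1}^t dt_2 \cdots \int_{t_{k-1}}^t dt_k\; \exp((t-t_k)W_0)\,W_1\,\exp((t_k-t_{k-1})W_0)\cdots W_1\,\exp(t_1 W_0)\right]\cdot p(0) \tag{1}$$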
This expression can be derived (as in [4]) by expanding in powers of W1, each expanded to all orders in W0, and using the normalization formula for the Dirichlet distribution to subdivide the time interval [0, t] into k subintervals.
Since W0 and W1 do not generally commute, this expression involves alternation from right to left of W0 and W1 related operations. Using the “time-ordered exponential” of operators [5], this result can be compactly reexpressed as:
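$$\exp(t(W_0+W_1))\cdot p(0) = \exp(tW_0)\left(\exp\left(\int_0^t W_1(\tau)\,d\tau\right)\right)_+ \cdot p(0) \tag{2}$$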
where
$$W_1(\tau) \equiv \exp(-\tau W_0)\,W_1\,\exp(\tau W_0) \tag{3}$$
Here the time-ordered exponential (···)+ is obtained term by term from the Taylor series for the operator exponential, by reordering all monomials containing terms evaluated at different times so that they are indexed by ordered sequences of times (τk, …, τ1) that increase right to left (details reviewed in Section 3.5.1 below). In field theory it is standard to prefer Equation 2 over Equation 1 for theoretical calculations, but for algorithmic concreteness this paper will favor the more explicit expression Equation 1 where possible. Indeed, each summand of Equation 1 already looks like a Markov chain in which the matrix or operator product operation “·”, which sums over states, is supplemented by integration over an extra time variable. This observation will be made precise in Section 3. In general there is a risk that the infinite sum over terms could diverge. However, the master equation must conserve total probability, and this constrains W to have zero column-sums and also constrains the spectrum of W to have nonpositive real parts. In this setting some decompositions W = W0 + W1 converge well enough, as we will see by example below.
One particular specialization of TOPE lets us derive Gillespie’s Stochastic Simulation Algorithm (SSA): take W0 = −D = the diagonal part of W, and W1 = Ŵ = the off-diagonal part of W. Then for chemical reaction networks TOPE generates Feynman-like diagrams. An example is illustrated below for the simple reaction network with just two reactions, the forwards and backwards parts of the generic trivalent reaction A + B ⇌ C, to which others can be reduced.
The TOPE (Equation 1 or Equation 2) can be applied recursively, since it reduces one operator exponential exp(tW) to another one exp(tW0). This fact will be exploited in Section 3 below. But eventually one must get to an operator exponential that is tractable by other means. One way to do this is to let W0 = −D = the diagonal part of W, as in the SSA derivation below.
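As a numerical illustration of the expansion with W0 = −D and W1 = Ŵ, the following sketch sums TOPE terms for a two-state system and compares the result with the closed-form solution. The two-state generator, the rate names `a` and `b`, the trapezoid-rule grid and the truncation order are all assumptions made for this example and do not come from the derivation in [4]. Each order is obtained from the previous one through the recursion U_k(t) = ∫_0^t exp((t − s)W0) W1 U_{k−1}(s) ds, which is the nested integral of the expansion written one level at a time.

```python
import math

def tope_two_state(a, b, t, kmax=25, n=200):
    """Sum TOPE terms of order 0..kmax for W = [[-a, b], [a, -b]], p(0) = (1, 0).

    W0 = -D = diag(-a, -b) and W1 = W_hat (off-diagonal part), so exp(s*W0)
    is diagonal. Integrals use the trapezoid rule on a uniform grid of n steps.
    """
    d = (a, b)                                    # exit rates D_00 and D_11
    h = t / n
    grid = [i * h for i in range(n + 1)]
    # Order 0: no events, the system simply waits in state 0.
    u = [[math.exp(-d[0] * s), 0.0] for s in grid]
    total = list(u[-1])
    for _ in range(kmax):
        new = [[0.0, 0.0]]                        # integral over [0, 0] vanishes
        for i in range(1, n + 1):
            acc0 = acc1 = 0.0
            for j in range(i + 1):
                w = h * (0.5 if j in (0, i) else 1.0)   # trapezoid weights
                dt = grid[i] - grid[j]
                acc0 += w * math.exp(-d[0] * dt) * b * u[j][1]  # event 1 -> 0
                acc1 += w * math.exp(-d[1] * dt) * a * u[j][0]  # event 0 -> 1
            new.append([acc0, acc1])
        u = new
        total[0] += u[-1][0]
        total[1] += u[-1][1]
    return total

def exact_two_state(a, b, t):
    """Closed-form p(t) for the same master equation with p(0) = (1, 0)."""
    p0 = b / (a + b) + (a / (a + b)) * math.exp(-(a + b) * t)
    return [p0, 1.0 - p0]
```

For moderate rates and times the partial sums settle quickly, because the order-k term is bounded by (t · max rate)^k / k!.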
2 Methods
2.1 Creation/annihilation operator notation
We will use operator notation for molecule (or other reactant) creation and annihilation state changes [1–4]. Here we just review the notation as used in [4]. The elementary operators a and â act (respectively) to destroy and create identical particles of a given type. In the particle-number basis their elements have the entirely off-diagonal expressions
$$a_{nm} = m\,\delta_{n,\,m-1}, \qquad \hat{a}_{nm} = \delta_{n,\,m+1} \tag{4}$$
Here δij is the Kronecker delta function. The creation and annihilation operators satisfy the Heisenberg algebra [a, â] = I but are different from those of quantum mechanics because they are not conjugates or transposes of one another. (This is the reason we do not denote the creation operator a†, as it is in quantum mechanics, or a*.) Instead of being conjugate to â, the annihilator a encodes the chemical law of mass action since its nonzero entries are equal to the number of particles available to react or decay. The diagonal “number operator” is N ≡ âa.
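The algebra can be checked on finite matrices. The sketch below builds a and â truncated to particle numbers 0..nmax (the truncation and the helper names are assumptions for illustration) and evaluates the commutator and the number operator. Because of the cutoff, [a, â] = I holds only on the states below nmax.

```python
def annihilation(nmax):
    """Matrix of a, with a|m> = m|m-1>, on particle numbers 0..nmax."""
    return [[m if n == m - 1 else 0 for m in range(nmax + 1)]
            for n in range(nmax + 1)]

def creation(nmax):
    """Matrix of a_hat, with a_hat|m> = |m+1>; the state nmax+1 is cut off."""
    return [[1 if n == m + 1 else 0 for m in range(nmax + 1)]
            for n in range(nmax + 1)]

def matmul(x, y):
    """Plain square-matrix product on nested lists."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def commutator(x, y):
    """[x, y] = x y - y x."""
    xy, yx = matmul(x, y), matmul(y, x)
    size = len(x)
    return [[xy[i][j] - yx[i][j] for j in range(size)] for i in range(size)]
```

The nonzero entries of `annihilation(nmax)` are the particle counts m, which is how mass action enters, and `matmul(creation(nmax), annihilation(nmax))` is the diagonal number operator N.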
The creation and annihilation operators may be represented in terms of their action on probability generating functions g(z) = Σn pn z^n, where pn is the probability that there exist n particles of a given type. In this case:
$$\hat{a} = z, \qquad a = \frac{\partial}{\partial z} \tag{5}$$
In the presence of different types of particles (e.g. molecules or other objects) the creation/annihilation operator notation is generalized, e.g. to aα and âβ for molecule types Aα, in which all operators for unequal types commute: [aα, âβ] = δαβ I and [aα, aβ] = [âα, âβ] = 0.
Operating on an empty “vacuum” state |0〉 with no objects, the monomials in the creation operators âβ span a Fock space. Molecule or object types indexed by α may even be taken to include arbitrary discrete-valued molecular attributes (or attributes of other objects) such as phosphorylation state or integer-valued parameters. Continuous-valued parameters such as position (in quantum field theory it would more naturally be the conserved momentum, unlike the typical viscous-medium dynamics in biology) may be encoded into a real-valued vector argument x which requires a Dirac delta function instead of a Kronecker delta function, so for example:
$$[a_\alpha(x), \hat{a}_\beta(y)] = \delta_{\alpha\beta}\,\delta(x-y)\,I \tag{6}$$
A non-molecular example of such parameterized objects would be: cells of a given real-valued volume and/or lengthscale as in Section 3.5.5 below.
However, for some attributes such as real-valued object positions one may wish to limit the state space to between zero and nmax,α molecules (or other objects) at each unique real value. The resulting commutator is still diagonal as described in [4]. The particular case nmax,α = 1 is not a stochastic version of fermions because particles with different types or values of the attributes still commute rather than anticommuting.
The basic rule for translating chemical reactions into creation/annihilation operator notation is: first, annihilate all objects on the incoming or left hand side of a reaction; then create all the objects on the outgoing or right hand side of a reaction. Thus, the off-diagonal part of the operator for a reaction
that converts an incoming multiset {···}* of numerically parameterized reactants {Aα(p)(xp)|p ∈ lhs(r)}* each with parameter vector xp (reactants can appear multiple times in a multiset) into an outgoing multiset {Aβ(q)(yq)|q ∈ rhs(r)}* each with parameter vector yq, is:
| (7) |
There is one such operator for every possible set of values for the numerical parameters. Since time-evolution operators for different processes just add, a generic operator for all parameter values must sum and/or integrate the operator of Equation 7 over all the parameters, in the Cartesian product of measure spaces in which they take values:
| (8) |
The generalization is conceptually straightforward because we have simply used a function ρr([xa], [yb]) to express the possibly infinite number of different reaction rates that pertain to objects that differ only in their attributes. Because of the algebra of noncommuting basic creation and annihilation operators, reaction operators Ôr and Ôr′ for reactions r ≠ r′ that produce and consume a shared reactant Aα(x) (or Aα for reactants with type α but no other parameters) generally also have nonzero commutators.
Equation 7 or Equation 8 adds probability to the new state of the system, but does not take it away from the old state of the system before a reaction. That job requires a negative diagonal matrix as shown in Equation 10 below. In the case of Equation 7, the corresponding diagonal operator is . Examples are provided in [4] and below.
2.2 Solvable example: An exact solution for SSA behavior
For a few very simple examples, we can not only solve analytically for the behavior of the biochemical system, but we can even add in the behavior of the SSA simulation algorithm and solve for that exactly as well. For example consider the minimal bidirectional reaction A ↔ Ø. This case is analytically solvable, including the complete statistics of its SSA algorithm simulation. It has forward synthesis and backwards decay reactions. The operator expression is therefore:
| (9) |
Here α = 1 is the generating function variable for the total number of reactions, corresponding to off-diagonal matrix elements of W. Power series in α will decompose total probability according to this number.
Translating the master equation for Equation 9 into a PDE in the two variables t and z using representation Equation 5, and solving analytically, this model has the exact solution
As usual z is the generating function variable whose exponent is the total number nA of A molecules or particles, m is the initial number of molecules, and t is continuous time. The * operation is a convolution of probability distributions. A product of generating functions with the same variable is a convolution of distributions [7]. Note the interpretation in terms of Binomials and Poissons with time-varying parameters. The third factor represents a linearly increasing number of canceling forward/backward reaction pairs as a function of time - a kind of random walk.
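The Binomial and Poisson interpretation can be made concrete for the marginal distribution of the molecule number nA alone (that is, at α = 1, ignoring the reaction-counting factor). The sketch below assumes a constant synthesis rate `k_f` and a per-molecule decay rate `k_d`; these names, and the specific time-varying parameters exp(−k_d t) and (k_f/k_d)(1 − exp(−k_d t)), are the standard birth-death result and are stated here as assumptions rather than read off the exact solution in the text. The convolution routine plays the role of multiplying generating functions.

```python
import math

def binomial_pmf(m, q):
    """Distribution of how many of the m initial molecules still survive."""
    return [math.comb(m, n) * q**n * (1.0 - q)**(m - n) for n in range(m + 1)]

def poisson_pmf(lam, nmax):
    """Poisson distribution truncated at nmax (truncation is an approximation)."""
    return [math.exp(-lam) * lam**n / math.factorial(n) for n in range(nmax + 1)]

def convolve(p, q):
    """Convolution of two distributions = product of generating functions."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def birth_death_pmf(m, k_f, k_d, t, nmax=60):
    """Marginal distribution of n_A at time t, starting from m molecules."""
    survive = math.exp(-k_d * t)               # survival probability per molecule
    lam = (k_f / k_d) * (1.0 - survive)        # mean number of newly made survivors
    return convolve(binomial_pmf(m, survive), poisson_pmf(lam, nmax))
```

A quick consistency check is that the mean of the returned distribution equals m·exp(−k_d t) + (k_f/k_d)(1 − exp(−k_d t)), the solution of the corresponding rate equation.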
The full derivation below will generalize this solvable example, again separating the diagonal from the off-diagonal terms in W.
2.3 Notation for SSA rederivation from TOPE
One specialization of TOPE lets us derive SSA for biochemical reaction networks, as follows. First decompose W into nonnegative off-diagonal and non-positive diagonal parts, as must be possible by the conservation and nonnegativity of probability. For example conservation of probability implies ∀p 0 = d(1 · p)/dt = (1 · W) · p ⇒ 1 · W = 0. Then
$$W = \hat{W} - D, \qquad D_{IJ} = \delta_{IJ}\sum_{K}\hat{W}_{KI} \tag{10}$$
where I and J index the possible states of the system. To prevent negative probabilities from evolving under the master equation, all entries of Ŵ and therefore D must be nonnegative. In this circumstance the TOPE becomes:
Since the summands over k and integrands over [τ] are mutually exclusive, exhaustive and nonnegative, we define the conditional probability distribution on k and [τ] by these summand/integrands (where [τ] denotes an ordered contiguous sequence of time intervals):
| (11) |
(where a product over zero terms such as is interpreted as the identity matrix, and products over negative numbers of terms such as should not occur). For DII ≠ 0,
Either way,
The bracket notation for an ordered set of components indexed by q will also be used for state variables . The notation “≜” means “equal by definition”. In what follows, the notation “Θ(Pred)” where Pred is a predicate is the Heaviside step function or indicator function taking the value 1 if the predicate is true and 0 if it is false.
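The decomposition of W into a nonnegative off-diagonal part Ŵ and a diagonal part D fixed by probability conservation can be sketched for a small finite state space as follows. The function names and the example rate matrix are hypothetical; the only inputs taken from the text are the zero column-sum condition 1 · W = 0 and the ratio ŴIK/DK, which later serves as a branching probability.

```python
def split_generator(w_hat):
    """Given off-diagonal rates w_hat[I][J] (rate of J -> I), return (d, w).

    d[J] is the total exit rate of state J (column sum of w_hat) and
    w = w_hat - diag(d) is the full generator with zero column sums.
    """
    size = len(w_hat)
    d = [sum(w_hat[k][j] for k in range(size)) for j in range(size)]
    w = [[w_hat[i][j] - (d[j] if i == j else 0.0) for j in range(size)]
         for i in range(size)]
    return d, w

def branching_probabilities(w_hat, d, state):
    """Probability of landing in each state I given an event out of `state`.

    Requires d[state] > 0, i.e. `state` is not a terminal state.
    """
    return [w_hat[i][state] / d[state] for i in range(len(w_hat))]

# Hypothetical three-state example: entry [I][J] is the rate of J -> I.
EXAMPLE_W_HAT = [
    [0.0, 2.0, 0.5],
    [1.0, 0.0, 0.5],
    [3.0, 1.0, 0.0],
]
```

With `d, w = split_generator(EXAMPLE_W_HAT)`, every column of `w` sums to zero, and `branching_probabilities(EXAMPLE_W_HAT, d, 0)` is a normalized distribution over destination states.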
2.4 Semigroup property
Suppose t = t1 + t2, all nonnegative. Then for any time-evolution equation we must have the semigroup property:
Is there a k-event version of this rule, for k = k1 + k2? In other words, can we add (nonnegative) numbers of reaction events rather than time intervals? We observe (where again )
Then, according to a derivation given in Appendix I, if k = k1 + k2 and for any , there is a semigroup law:
| (12) |
In this result there is an arbitrary choice of from the interval [0, τk1]. However this form does not yet pertain to conditional probabilities of the form Pr(I, t|k, J), as needed to obtain a computable Markov process algorithm.
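The ordinary (time-additive) semigroup property, exp((t1 + t2)W) = exp(t2 W) · exp(t1 W), is easy to confirm numerically for a small generator. The sketch below uses a scaling-and-squaring Taylor approximation of the matrix exponential; the routine, its default parameters and the example generator are assumptions for illustration, and the k-event semigroup law of Equation 12 is not tested here.

```python
def matmul(x, y):
    """Plain square-matrix product on nested lists."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def expm(w, t, terms=20, squarings=8):
    """Approximate exp(t*w): Taylor series on t/2**squarings, then squaring."""
    size = len(w)
    s = t / (2 ** squarings)
    scaled = [[s * w[i][j] for j in range(size)] for i in range(size)]
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes (s*w)^k / k!
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(size)]
                  for i in range(size)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

# Hypothetical generator with zero column sums (entry [I][J] is the rate J -> I).
EXAMPLE_W = [
    [-4.0, 2.0, 0.5],
    [1.0, -3.0, 0.5],
    [3.0, 1.0, -1.0],
]
```

Comparing `expm(EXAMPLE_W, 0.8)` with `matmul(expm(EXAMPLE_W, 0.5), expm(EXAMPLE_W, 0.3))` reproduces the property to rounding error, and each column of the result sums to one.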
3 Results and discussion
Given the foregoing notation, we undertake the derivation of a Markov chain representing the SSA algorithm. We then consider extensions of this result, including parameterized reactants, but focussing mainly on hybrid stochastic event/ordinary differential equation dynamical systems.
3.1 Derivation of a Markov chain
3.1.1 Bayesian recurrence
In Appendix I we argue that the correct Bayesian strategy for moving from Pr(I, k|t, J) to Pr(I, t|k, J), as needed to obtain a simulatable Markov chain, is to consider large stopping times T ≫ t which are overwhelmingly likely to have large reaction numbers n ≫ k; then to marginalize the probability distribution Pr([I], [τ], n|J, T) over all event numbers n > k and to conditionalize it over all event numbers q < k. By that means in Appendix I we derive the recurrence relation
| (13) |
where
| (14) |
and in particular
| (15) |
Note that marginalizes over τk, the time elapsed since the last event k, as well as all later times and events. It is a distribution on histories up to and including the “just-fired” k’th reaction event, within a much longer history.
3.1.2 Markov chain - Summary
From the foregoing Bayesian recurrence equation, and the definition
Appendix I shows is a probability density function and proves the following Markov-like property:
We may reexpress this result as
where we define the Markov chain kernel
| (16) |
In vector/operator notation, for k ≥ 2,
and finally, for the algorithmic Markov chain expression for SSA, now including an initial distribution over J at time t = 0, we have for all k ≥ 0:
| (17) |
which expresses the iteration of a Markov chain of the SSA algorithm. Of course the factor [DK exp(−τDK)Θ(τ ≥ 0)] in Equation 16 is just the SSA exponential distribution of non-negative waiting times τ between reaction events, and ŴIK/DK is just the branching probability for immediately thereafter choosing a reaction that leads to state I.
The foregoing derivation was outlined in far less detail in [8]. A similar equation for SSA was reached by very different methods in [9], Theorem 10.1. To our knowledge this is the first complete derivation of SSA from field theory methods such as TOPE.
This Markov chain expression has also been used as the starting point for the derivation of exact accelerated stochastic simulation algorithms [10,11] that execute many reactions per step (i.e. they “leap” forward) and thus go much faster than SSA, while also sampling from the exact probability distribution given by the just-fired probabilities above. These derivations proceed by algebraic rearrangement of terms to express computationally efficient versions of rejection sampling. The algorithm of [11] has been parallelized, which is often difficult for discrete-event simulations.
3.2 Algorithm: SSA
The SSA algorithm represented by the Markov chain in Equation 16 and Equation 17 above may be written out in pseudocode as follows:
repeat {
  compute propensities k(r)
  compute k(total) := Σr k(r)
  draw waiting time Δt from k(total) exp(−Δt k(total))
  t := t + Δt; // advance the clock by Δt
  draw reaction r from distribution k(r)/k(total) and execute reaction r
} until t ≥ tmax
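The pseudocode above can be sketched in Python for the reaction pair A + B ⇌ C mentioned in the Introduction. The function name, the mass-action propensities `k_f * a * b` and `k_r * c`, and the stopping convention are assumptions for this example. Unlike the pseudocode, the sketch does not execute an event whose time would fall beyond `t_max`, and it also stops if the total propensity is zero.

```python
import random

def ssa_abc(a, b, c, k_f, k_r, t_max, rng):
    """SSA for A + B -> C (rate k_f) and C -> A + B (rate k_r).

    Returns (time of last event, final counts, number of events).
    """
    t, n_events = 0.0, 0
    while True:
        props = [k_f * a * b, k_r * c]        # propensities k(r)
        k_total = sum(props)                  # k(total)
        if k_total <= 0.0:
            break                             # terminal state: nothing can fire
        dt = rng.expovariate(k_total)         # density k_total * exp(-dt * k_total)
        if t + dt > t_max:
            break                             # next event lies beyond the horizon
        t += dt                               # advance the clock
        if rng.random() * k_total < props[0]: # pick r with probability k(r)/k(total)
            a, b, c = a - 1, b - 1, c + 1     # A + B -> C
        else:
            a, b, c = a + 1, b + 1, c - 1     # C -> A + B
        n_events += 1
    return t, (a, b, c), n_events
```

A call such as `ssa_abc(20, 15, 0, 0.05, 1.0, 5.0, random.Random(0))` returns one sampled trajectory endpoint; the quantities A + C and B + C are conserved along every trajectory.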
3.3 Extension: Parameterized rule and graph grammar SSA-like algorithm
For biological modeling, including spatial and mechanical modeling of biological systems, it is important to generalize from pure particles to particles with both discrete and continuous attributes. The complication is that reaction or process rates can then depend on the attributes both of the incoming and outgoing objects. A non-molecular example of such parameterized objects would be cells of a given size, whose propensity to divide may actually depend on their real-valued size parameter (as in Section 3.5.5). More generally, this capability enables agent-based modeling and simulation since it allows interacting objects to have dynamic internal state and even (as explained in Section 3.3.2 below) dynamic relationships.
The time-evolution operator of Equation 7 requires that each attribute or parameter vector consist of constants or variables, each variable appearing just once, and any relationships between variables (such as x2,1 = y1,2, x2,1 = x1,2, and/or y2,1 = y1,2) enforced by the reaction rate ρr([xp], [yq]). Alternatively we can allow repeated appearances of symbolic variables Xc upon which the attributes may depend, through the identity function or otherwise. This is a useful improvement in reaction notation which however may require special-purpose symbolic variable-binding algorithms to support efficiently. Generalizing from Equation 7 and Equation 8, as in [4], we include all possible instantiations of parameters xp[X] and yq[X], allowing for repeated occurrences of some or all of the variables Xc in [X], with the integrated off-diagonal process representation operator
| (18) |
As before, α(p) and β(q) represent the type of a parameterized object, i.e. an object with attributes. Now the symbolic variables Xc each have a type c which has an integration measure μc. Again (as in Equation 7 or 8), summation over all discrete-valued parameters and integration over all continuous-valued parameters generalizes the operator to handle all possible sets of parameter values.
3.3.1 Algorithm: SSA with parametrized reactant objects
The resulting variant of the SSA algorithm for parameterized reactions can be expressed in pseudocode as follows (outlined briefly in [8]):
forall reactions r factor ρ(r)(xin, xout) = k(r)(xin)p(r)(xout|xin);
repeat {
  compute SSA propensities as k(r)(xin);
  compute k(total) := Σr k(r)(xin);
  draw waiting time Δt from k(total) exp(−Δt k(total));
  t := t + Δt; // advance the clock by Δt
  draw reaction r from distribution k(r)(xin)/k(total);
  draw xout from p(r)(xout|xin) and execute reaction r;
} until t ≥ tmax
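A minimal sketch of this variant, for a single hypothetical rule in which an object of size x splits into two objects of sizes αx and (1 − α)x, might look as follows. The propensity k(x) = rate · x, the uniform distribution for α, and all function and parameter names are assumptions chosen to show the factorization ρ(xin, xout) = k(xin) p(xout|xin); they are not taken from [8].

```python
import random

def parameterized_ssa(sizes, rate, t_max, rng, max_events=10000):
    """SSA over objects carrying a real-valued size parameter.

    Each object of size x splits with propensity k(x) = rate * x into two
    objects of sizes alpha*x and (1 - alpha)*x, alpha ~ Uniform(0.3, 0.7).
    Returns (time of last event, list of sizes, number of events).
    """
    sizes = list(sizes)
    t, n_events = 0.0, 0
    while n_events < max_events:
        props = [rate * x for x in sizes]     # k(r)(x_in), one per matching object
        k_total = sum(props)
        if k_total <= 0.0:
            break
        dt = rng.expovariate(k_total)         # waiting time from k_total*exp(-dt*k_total)
        if t + dt > t_max:
            break
        t += dt
        # choose which rule instance fires, with probability k(x_in)/k_total
        i = rng.choices(range(len(sizes)), weights=props)[0]
        alpha = rng.uniform(0.3, 0.7)         # draw x_out from p(x_out | x_in)
        x = sizes.pop(i)
        sizes.extend([alpha * x, (1.0 - alpha) * x])
        n_events += 1
    return t, sizes, n_events
```

In this toy rule the total size is conserved, so the total propensity stays constant even though the set of objects and their parameters change at every event.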
3.3.2 Structural matching
The functions ρ(x, y) appearing in Equation 18 may impose constraints including equality of variables; equivalently we may allow some variables to appear multiple times in object parameter lists. Either way there follows a mechanism to encode structural relations - graphs and labelled graphs - in the input and output variable lists. Object attributes can include Object ID codes which other objects can also include in their parameter lists. (Of course, the numeric values of Object IDs can be globally permuted without changing the structural relationships among extant objects.) In this way, the integrated version of the parameterized reaction operator above encodes structural pattern matching, including variable-binding in logical formulae, among the preconditions that can be enforced before such a generalized reaction or “rule” can fire.
From this point of view, syntactic variable-binding has the semantics of multiple integration [4]. In this way we can entrain pattern-matching systems such as the computer algebra system Mathematica, or logic-based programming languages, to the job of simulating complex process rules. As in rule-based expert systems, when multiple rules might fire the Rete algorithm [12] can be used to speed up the computations required to maintain knowledge of their relative rates.
The resulting systems have the power to model and simulate dynamic labelled graphs including growing multicellular tissues with dynamical cell-neighbor relationships [4] and molecular complexes with dynamical binding structure [13–15]. Thus, the TOPE operator algebra approach also explains why and how structural (graph-) matching computations arise naturally in biochemical and multicellular biological simulation.
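The role of repeated variables in structural matching can be illustrated with objects that store the ID codes of their neighbors. In the sketch below, the `Cell` record and its field names are hypothetical; a pair (C1, C2) matches only when C1's `next_id` equals C2's `curr_id` and C2's `prev_id` equals C1's `curr_id`, which is the kind of equality constraint that a repeated variable imposes on the left hand side of a rule. A dictionary lookup stands in for a more general pattern-matching engine.

```python
from collections import namedtuple

# Hypothetical parameterized object: its own ID, neighbor IDs, and a length.
Cell = namedtuple("Cell", ["curr_id", "prev_id", "next_id", "length"])

def match_neighbor_pairs(cells):
    """Return all (c1, c2) with c1.next_id == c2.curr_id and c2.prev_id == c1.curr_id."""
    by_id = {c.curr_id: c for c in cells}     # index objects by their ID attribute
    pairs = []
    for c1 in cells:
        c2 = by_id.get(c1.next_id)            # bind the shared variable 'next'
        if c2 is not None and c2.prev_id == c1.curr_id:
            pairs.append((c1, c2))            # both equality constraints hold
    return pairs

# A 1D chain of three cells: 1 <-> 2 <-> 3.
EXAMPLE_CELLS = [
    Cell(1, None, 2, 1.0),
    Cell(2, 1, 3, 1.5),
    Cell(3, 2, None, 0.8),
]
```

Each matched pair is one instantiation of a two-object rule, so the number of matches determines how many rule instances contribute propensities to the simulation.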
3.4 Hybrid SSA/ODE setup
As will be shown in Section 3.4.1 below, the operator formulation for a system of ordinary differential equations is [4]:
| (19) |
Here and in the calculations that follow, the Dirac delta function can be considered as a Gaussian with very small variance, which participates in a limiting process by which, at the end of each calculation, the limit of zero variance is taken.
In [4] this operator expression is generalized from ordinary differential equations to stochastic differential equations, for example those pertaining to the diffusion of particles, as equivalently represented by the Fokker-Planck equation.
3.4.1 Computation of matrix elements
From the commutator
we may calculate matrix elements of Ôdrift in Equation 19 such as:
The easiest treatment for the boundary terms is to add the assumptions that boundaries are at infinity in the space of parameters x, y and z, and that initial conditions place zero probability there, and that finite velocities v(x) ensure the probability remains zero at infinity at finite times. In that case boundary terms can be neglected. Alternatively, we can define Odrift = Ôdrift − diag(1·Ôdrift) which in this case subtracts off the boundary term. Then
| (20) |
If we define x(t) as a time-varying version of z, satisfying
then
Next we calculate 〈w|exp(τOdrift)|z〉. To this end, Taylor’s theorem may be written
if τ is a constant. For small τ we have
For larger τ we have
Thus (where “IC” means initial condition)
| (21) |
QED.
As far as we know this detailed derivation has not appeared previously, though our previous work [4] outlined a simplified version. As a corollary, using Equation 21 we may multiply by f(w) and integrate over w to calculate
| (22) |
3.5 Hybrid SSA/ODE: Operator algebra derivation
We now derive a new SSA-like simulation algorithm for hybrid combinations of discrete events and ODE dynamics, using operator algebra. The main idea is to replace the exponential distribution factor exp(−tD) with a time integral [15]:
| (23) |
and to add an extra ODE to the system of ODEs in order to keep track of the integral. We will now use the more compact formulation of TOPE in Equation 2 to derive this method.
3.5.1 Heisenberg picture
Let the operators, rather than the states, evolve in time under W0 according to Equation 3. This is traditionally called the “Heisenberg picture”, in distinction to the “Schrödinger picture”, in quantum mechanics. Recall Equation 2 and Equation 3, where (···)+ is the time-ordering super-operator:
(and likewise for higher order products). Note that if O(ti) and O(tj) commute for all pairs of times ti and tj, then (O(ti)O(tj))+ = O(ti)O(tj), and the time-ordering operator (···)+ can be dropped. Often the notation T(O(ti)O(tj)) is used in place of (O(ti)O(tj))+ to denote the super-operator that time-orders operator products.
3.5.2 Application to ODE + decay clock
The hybrid system consisting of chemical reactions (possibly parameterized) together with ordinary differential equations has the combined operator W = (Ôreact − Dreact) + ODE, which we can regroup as
and then apply TOPE to ODE − Dreact first with W00 = ODE and W01(tk) = − Dreact(tk), and then again to (ODE − Dreact)+ Ôreact with W0 = W00 + W01 = ODE − Dreact and W1 = Ôreact.
In the first application of TOPE to ODE − Dreact with W00 = ODE, the operators W01(tk) = −Dreact(tk) defined at different times are all diagonal in the same (particle number) basis and therefore commute:
In this circumstance, we can simply drop the time-ordering super-operator (···)+ in Equation 2 and write
| (24) |
where, as in Equation 3, Dreact(t′) = exp(−t′ODE)Dreact exp(t′ODE). In our case, Equation 24 specializes to:
| (25) |
This result looks very similar to Equation 22 applied to
and we now aim to understand and exploit this similarity.
3.5.3 Equivalent ODE
Consider the dynamics expressed in Equation 25. Can we obtain the first factor from ODEs alone? Yes, if we introduce a new state variable τ involved in every ODE-related rule. Set τ(0) = 0 as the new variable’s initial condition, and augment the ODE operators as follows
| (26) |
In other words, we have added a differential equation for τ to the ODE system
| (27) |
This equation is solvable in terms of a “warped time” coordinate
| (28) |
(Cf. Equation 23.) There are degenerate cases Dreact = 0 only if there are terminal states in the reaction network.
To see that this is the correct procedure, calculate from Equation 21:
| (29) |
This expression agrees with Equation 25, as required. But how do we ensure the IC on τ? That can be done as follows:
| (30) |
is a projection operator (i.e. one that satisfies P · P = P) that resets the variable τ to zero after each use. In summary,
| (31) |
Clearly this result is equivalent to Equation 25 and is in the correct form for a Markov chain that can represent a computation. Of course, the matrix element calculated is only relevant if τmax as drawn from the exponential is constrained to be equal to the final value of τ in the final state as solved by the ODE system ÕDE. We can implement this constraint with a factor of δ(τ − τmax) in the Markov chain over states and times. Thus a step in the Markov chain in between reactions can be written as:
As in the SSA derivation, the reaction step is given by factors of Ôreact which need to be normalized by Dreact. Using δ(t − tmax(τmax))dt = δ(τ − τmax)dτ and dτ/dt = Dreact(t), we find
| (32) |
where
represents the Markov chain corresponding to the simulation algorithm.
In implementations so far [4,15] we have used instead the equivalent differential operator
with p = exp(−τ), initialized at p0 = 1, and a uniform distribution on pfinal ∈ [0, 1]. This variant of the ODE was reported independently in [16], though the derivation there did not proceed by general field theory techniques.
3.5.4 Algorithm: Hybrid SSA/ODE solver
By Equation 32 above, a Markov chain algorithm for simulating the hybrid system can be represented in the following SSA-like pseudocode:
factor ρ(r)(xin, xout) = k(r)(xin)p(r)(xout|xin);
repeat {
  initialize SSA propensities as k(r)(xin);
  initialize k(total) := Σr k(r)(xin);
  initialize τ := 0;
  draw effective waiting time τmax from exp(−τmax);
  solve ODE system, including an extra ODE updating τ by dτ/dt = k(total)(t),
  until τ = τmax;
  draw reaction r from distribution k(r)(xin)/k(total);
  draw xout from p(r)(xout|xin) and execute reaction r;
} until t ≥ tmax
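The loop above can be sketched in Python for the simplest case of one continuous state variable and one reaction type, so that the "draw reaction r" step is trivial. The forward-Euler integrator, the step size `dt`, the linear interpolation of the firing time within the last step, and all function names are assumptions for illustration; a production solver would use a proper ODE integrator with event detection.

```python
import random

def integrate_to_event(x, velocity, propensity, tau_max, dt, t_limit):
    """Integrate dx/dt = velocity(x) and dtau/dt = propensity(x) from tau = 0.

    Returns (elapsed time, x at firing) when tau reaches tau_max, or None if
    t_limit is reached first. Uses forward Euler with step dt.
    """
    t, tau = 0.0, 0.0
    while t < t_limit:
        rate = propensity(x)
        step = rate * dt
        if rate > 0.0 and tau + step >= tau_max:
            frac = (tau_max - tau) / step     # fraction of the last step needed
            return t + frac * dt, x + frac * dt * velocity(x)
        tau += step                           # extra ODE: dtau/dt = k_total
        x += velocity(x) * dt
        t += dt
    return None

def simulate_hybrid(x0, velocity, propensity, fire, t_max, rng, dt=1e-3):
    """Alternate ODE flow and discrete events; returns a list of (time, state)."""
    t, x, events = 0.0, x0, []
    while t < t_max:
        tau_max = rng.expovariate(1.0)        # effective waiting time ~ exp(-tau_max)
        result = integrate_to_event(x, velocity, propensity, tau_max, dt, t_max - t)
        if result is None:
            break                             # no further event before t_max
        wait, x = result
        t += wait
        x = fire(x, rng)                      # execute the reaction on the state
        events.append((t, x))
    return events
```

As a check of the warped-time idea, with velocity 1, initial value 0 and propensity equal to x, the warped time is τ(t) = t²/2, so a draw τmax = 0.5 fires at t ≈ 1.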
3.5.5 Application: Cell division
As a simplified model of stochastic cell division, we may consider constant growth of a linear dimension l of each cell, dl/dt = v, coupled with a stochastic cell division rule whose propensity depends on the ratio of l to a threshold length l0 for likely division:
with a sigmoidal function such as σ(x) = 1/(1 + exp(−x)). In this model the parameter β varies the sharpness of the threshold, and ρdivision is the maximal propensity for division. Experimental evidence for stochastic dependence of division events on cell size in plant cells is reviewed in [17].
The differential equation for length can also be put in the form of a reaction rule that includes an ODE:
as described in [15]. Clearly this model could be augmented with other parameters such as growth signals with their own dynamics. This was done in models of biological stem cell niches in mouse olfactory epithelium and plant root growth, using the foregoing cell division rules. These systems were studied and simulated using the hybrid SSA/ODE algorithm above, in [15,18].
3.5.6 Application: Time-varying propensity for complete polymerization
Consider the n-step polymerization reaction
There is an n(max). Then
where cn+1 is all zeros except for a “1” entry in the lower right corner. Since b̂ and I are matrices that commute, exp(tW) = exp(tλb̂n+1) exp(−tλIn+1) and we easily compute
This is the distribution on polymer completion times. It is an Erlang distribution (a Gamma distribution with integral values of n). If τ is held fixed and n tends towards infinity, this distribution approaches a delta function δ(t − τ), which can lead to differential-delay equation models for reaction networks involving polymerization processes such as transcription [19]. This probability distribution for termination times also corresponds to the time-varying propensity function
| (33) |
which increases monotonically in time.
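The monotone increase can be checked numerically with the usual hazard-rate construction, propensity(t) = pdf(t) / survival(t), applied to an Erlang distribution. The sketch assumes n sequential exponential steps, each with rate λ (`lam`), and that the time-varying propensity coincides with this hazard rate; the function names are hypothetical.

```python
import math

def erlang_pdf(t, n, lam):
    """Density of the completion time of n sequential steps of rate lam."""
    return lam**n * t**(n - 1) * math.exp(-lam * t) / math.factorial(n - 1)

def erlang_survival(t, n, lam):
    """Probability that fewer than n steps have completed by time t."""
    return math.exp(-lam * t) * sum((lam * t)**j / math.factorial(j)
                                    for j in range(n))

def completion_propensity(t, n, lam):
    """Hazard rate: completion probability per unit time, given no completion yet."""
    return erlang_pdf(t, n, lam) / erlang_survival(t, n, lam)
```

For n ≥ 2 the hazard starts at zero, rises monotonically and stays below λ, approaching it at long times; for n = 1 it reduces to the constant propensity λ of a single reaction.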
As in Equation 32, the resulting time-varying propensity still fits within the framework of a Markov chain
(I, t′|J, t) that advances the time variable by an increment that is a random variable. The method of the previous section can be used to implement an SSA-like algorithm, with differential equations that govern propensities replaced by algebraic equations (Equation 33) or, if differential equations are also present, by differential-algebraic equations.
3.5.7 Extended Application: Tissue-level model of Arabidopsis root growth
A full tissue-level model of a hybrid SSA/ODE system has been presented in [18], which details a mathematical model of auxin growth hormone patterning along the developing root of the plant Arabidopsis thaliana, including the pattern formation system in the root apical meristem (RAM). The model was first formulated using a fixed 1D geometry of cells along the central “stele” of the root, including both passive diffusion of auxin originating in the above ground part of the plant, and more importantly autoregulated active transport of auxin. This much of the model is formulated using ordinary differential equations and spatial discretization at the scale of one cell.
However, the real root involves cell growth, division and possibly biomechanics in an essential way, so the model was reimplemented in the “Plenum” implementation of the “Dynamical Grammars” modeling language. Dynamical Grammars support parameterized rules such as those of Section 3.5 above at multiple scales (e.g. cellular and/or molecular scales), and the Plenum implementation [15] uses the foregoing hybrid SSA/ODE algorithm as an essential part of its simulation engine. It also uses a data structure of pattern-matched objects (somewhat akin to that of the Rete algorithm [12]) for efficient handling of the variable-binding involved when there are many rules in a grammar, some of which include repeated variable names. The resulting root growth and patterning model includes rules for cell growth, cell division, mechanical forces between neighboring cells in 1D, cell death at the tip of the root, auxin influx from the shoot, production of a hypothetical second morphogen “Y” possibly playing a role similar to cytokinin, autoregulated active transport of auxin between neighboring cells, passive transport of auxin and Y between neighboring cells, degradation of auxin and Y, and dilution of auxin and Y due to cell growth. There are a total of 13 grammar rules that specify the foregoing mechanisms, with one or two rules per listed mechanism. As in the previous cell division example, each rule is either of “solving” keyword ODE type or of “with” keyword discrete event type. We now present the first four rules of this model.
In the root model there is just one type of parameterized object, a cell. Each cell carries its own internal state information in the form of the values of an ordered list of parameters, each of which is constrained to be of some type (often integers or real values) associated with a measure that can be summed or integrated over. In the plant root model, the parameter types of a cell object are as follows:
cell[currID : ℕ, mode : ℕ, l : ℝ, r : ℝ, A : ℝ, Y : ℝ, prevID : ℕ, nextID : ℕ]
Here currID is the integer-valued (or “integer-typed”, currID: ℕ) unique identification number (ID) of the current cell, prevID is the integer-valued ID of the previous cell in the 1D line, nextID is the integer-valued ID of the next cell, mode is an integer-valued label specifying the cell’s internal growth state, l is the real-valued (l: ℝ) current cell length, r is its real-valued size or “radius”, A is its real-valued concentration of auxin, Y is its real-valued concentration of hypothetical substance Y. Here A and Y could alternatively be typed as nonnegative integers in a stochastic molecular simulation, but that was not the modeling choice in this investigation. A slightly simplified version of the rules for cell growth, cell division, biomechanics, and passive diffusion of chemical species between neighboring cells is:
grammar root {
/* cell growth: */
cell[curr, mode, x, r, A, Y, prev, next] → cell[curr, mode, x, r + dr, A, Y, prev, next]
solving {dr/dt = 1/τcycle}
/* cell biomechanics (point masses, dissipation dominated) */
C1 = cell[curr, mode, x, r, A, Y, prev, next],
C2 = cell[next, mode′, x′, r′, A′, Y′, curr, nextnext]
→ C1 = cell[curr, mode, x + dx, r, A, Y, prev, next], C2
solving {dx/dt = −∂xVspring(x − x′, r, r′)}
/*plus similar rule exchanging next and prev; dx/dt just adds up over rules */
/* switch from growth mode (mode=1) to division-waiting mode (=2): */
cell[curr, 1, x, r, A, Y, prev, next] → cell[curr, 2, x, r, A, Y, prev, next]
with ρstop/(1 + exp(−(r − rlim)/Tdiv))
/* cell replication, preserving 1D structure: */
cell[curr, 2, x, r, A, Y, prev, next], cell[prev, mode′, x′, r′, A′, Y′, prevprev, curr],
cell[next, mode, x″, r″, A″, Y″, curr, nextnext]
→ cell[new1, 1, x − r + 2rα + r(1 − α), r(1 − α), A, Y, prev, new2],
cell[new2, 1, x − r + rα, rα, A, Y, new1, next],
cell[prev, mode′, x′, r′, A′, Y′, prevprev, new1],
cell[next, mode, x″, r″, A″, Y″, new2, nextnext]
with
/* auxin/Y passive transport between two neighboring cells: */
C1 = cell[curr, mode, x, r, A, Y, prev, next],
C2 = cell[next, mode′, x′, r′, A′, Y′, curr, nextnext]
→ C1 = cell[curr, mode, x, r, A + dA, Y + dY, prev, next],
C2 = cell[next, mode′, x′, r′, A′ + dA′, Y′ + dY′, curr, nextnext]
solving {dA/dt = DA(A′ − A), dA′/dt = DA(A − A′),
dY/dt = DY(Y′ − Y), dY′/dt = DY(Y − Y′)}
}
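The geometry of the cell replication rule can be checked with a small sketch. The Python below is not Plenum code; the `Cell` record, the `divide` function, the ID allocator and the use of ID 0 for "no neighbor" are hypothetical names and conventions introduced only for illustration. It applies the right-hand side of the replication rule as written above, so that the two daughters tile the parent interval [x − r, x + r] and the neighbor pointers are rewired.

```python
from dataclasses import dataclass, replace
from itertools import count


@dataclass(frozen=True)
class Cell:
    curr_id: int   # unique ID of this cell
    mode: int      # 1 = growth, 2 = waiting to divide
    x: float       # position of the cell center
    r: float       # radius (half-length)
    A: float       # auxin concentration
    Y: float       # concentration of substance Y
    prev_id: int   # ID of the previous cell in the 1D line
    next_id: int   # ID of the next cell in the 1D line


_fresh_ids = count(1000)  # hypothetical allocator for new object IDs


def divide(cells, cell_id, alpha):
    """Apply the replication rule to one cell; return a new dict of cells."""
    c = cells[cell_id]
    if c.mode != 2:
        raise ValueError("only cells in division-waiting mode divide")
    if c.prev_id not in cells or c.next_id not in cells:
        raise ValueError("the rule needs both neighbors on its left-hand side")
    new1, new2 = next(_fresh_ids), next(_fresh_ids)
    # Daughter positions and radii copied from the rule's right-hand side.
    d1 = Cell(new1, 1, c.x - c.r + 2 * c.r * alpha + c.r * (1 - alpha),
              c.r * (1 - alpha), c.A, c.Y, c.prev_id, new2)
    d2 = Cell(new2, 1, c.x - c.r + c.r * alpha,
              c.r * alpha, c.A, c.Y, new1, c.next_id)
    out = {k: v for k, v in cells.items() if k != cell_id}
    out[c.prev_id] = replace(cells[c.prev_id], next_id=new1)
    out[c.next_id] = replace(cells[c.next_id], prev_id=new2)
    out[new1], out[new2] = d1, d2
    return out


# Three cells in a line; the middle one is waiting to divide.
cells = {
    1: Cell(1, 1, 4.0, 1.0, 0.0, 0.0, 0, 2),
    2: Cell(2, 2, 2.0, 1.0, 3.0, 5.0, 1, 3),
    3: Cell(3, 1, 0.0, 1.0, 0.0, 0.0, 2, 0),
}
after = divide(cells, 2, alpha=0.25)
```

In this example the daughter radii are 0.75 and 0.25, which sum to the parent radius, and the daughters meet at x = 1.5, so the rule conserves total length for any α in (0, 1).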
The actual code for these rules is given in Appendix III. It is written using the Plenum implementation [15] of the Dynamical Grammars framework [4]. Plenum is embedded in the Mathematica computer algebra problem-solving environment. Thus, ordinary and partial derivatives as used above are actually a part of the language. The full model file is available as Supplementary Data to this paper. Repeated variables on the left-hand side (LHS) must have identical values for the rule to apply. This situation occurs in the cell biomechanics, cell replication and passive transport rules above, where left-right cell neighbor pairs point to one another by sharing ID parameter values like “curr”, “prev” and “next” in the first and last two parameter positions. By contrast, there is no repetition of variables on the LHS of the autonomous cell growth or cell mode-switching rules above, since they have only one object on the LHS of each rule. Algorithmically, such repeated variable matching is achieved by symbolic pattern matching or variable-binding; mathematically, it is expressed by operator integrals such as Equation 18. The coordinate system used in this example may seem “backwards”: by convention roots grow from left to right, yet the quiescent center near the right tip is taken as the origin of coordinates.
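The repeated-variable matching described above can be illustrated with a deliberately simple sketch. It is not the Rete-like structure used by Plenum; it is a plain dictionary lookup on cell IDs, and the names `neighbor_pairs` and `auxin_rates` and the dictionary-based cell records are hypothetical. Each matched pair contributes the passive-transport terms dA/dt = DA(A′ − A) and dA′/dt = DA(A − A′), and contributions to the same cell from different matches simply add up.

```python
from collections import defaultdict


def neighbor_pairs(cells):
    """Yield (c1, c2) such that c1's 'next' equals c2's 'curr'
    and c2's 'prev' equals c1's 'curr' (the repeated variables)."""
    for c1 in cells.values():
        c2 = cells.get(c1["next"])
        if c2 is not None and c2["prev"] == c1["curr"]:
            yield c1, c2


def auxin_rates(cells, D_A):
    """Sum the passive-transport contributions to dA/dt over all matches."""
    dA = defaultdict(float)
    for c1, c2 in neighbor_pairs(cells):
        dA[c1["curr"]] += D_A * (c2["A"] - c1["A"])
        dA[c2["curr"]] += D_A * (c1["A"] - c2["A"])
    return dict(dA)


# A chain of three cells; ID 0 stands for "no neighbor" in this sketch.
chain = {
    1: {"curr": 1, "prev": 0, "next": 2, "A": 1.0},
    2: {"curr": 2, "prev": 1, "next": 3, "A": 0.0},
    3: {"curr": 3, "prev": 2, "next": 0, "A": 0.0},
}
rates = auxin_rates(chain, D_A=0.08)
```

Because every match adds equal and opposite terms to the two cells involved, the summed rates cancel and total auxin is conserved by this rule, which is a convenient sanity check on any matcher.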
The Plenum implementation also performs several symbolic computations including variable-binding for efficient implementation of rules with repeated variables (Equation 18 above and the present extended example), and aggregating the ODE dτ/dt = k(tota)(t) of Section 3.5.4 by adding up the symbolic expressions from the individual rules.
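As a rough illustration of how an auxiliary variable can decide when a discrete rule fires while ODE rules are running, the sketch below integrates the growth rule dr/dt = 1/τcycle together with an accumulated propensity for the growth-to-waiting mode switch. Several things here are assumptions rather than the paper's stated method: the threshold is taken to be an exponentially distributed draw −ln(u), which is the usual construction for time-dependent propensities, the midpoint rule stands in for a real ODE solver, and the function names and the value of τcycle are hypothetical. The exact formulation of Section 3.5.4 may differ.

```python
import math
import random


def sigmoid_propensity(r, rho_stop=1.0, r_lim=1.0, T_div=0.01):
    """Mode-switch propensity rho_stop / (1 + exp(-(r - r_lim) / T_div))."""
    z = (r - r_lim) / T_div
    if z >= 0:
        return rho_stop / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for very negative z
    return rho_stop * e / (1.0 + e)


def time_to_mode_switch(r0, tau_cycle, threshold, dt=1e-3, t_max=100.0,
                        propensity=sigmoid_propensity):
    """Integrate dr/dt = 1/tau_cycle and d(aux)/dt = propensity(r) until
    aux reaches `threshold`. Returns (t_fire, r_fire), or None if no
    firing occurs before t_max."""
    t, r, aux = 0.0, r0, 0.0
    if threshold <= aux:
        return t, r
    while t < t_max:
        # Midpoint rule; r(t) is linear in t for this growth law.
        rate = propensity(r + 0.5 * dt / tau_cycle)
        if aux + rate * dt >= threshold:
            # Linear interpolation locates the crossing inside the step.
            frac = (threshold - aux) / (rate * dt)
            return t + frac * dt, r + frac * dt / tau_cycle
        aux += rate * dt
        r += dt / tau_cycle
        t += dt
    return None


rng = random.Random(0)
theta = -math.log(1.0 - rng.random())  # assumed exponential threshold
fired = time_to_mode_switch(r0=0.5, tau_cycle=1.0, threshold=theta)
```

With the sharp sigmoid used here (Tdiv = 0.01), almost no propensity accumulates before r reaches rlim, so the switch fires roughly a time θ/ρstop after the cell reaches the limiting radius. The only errors relative to an exact draw come from the step size and the interpolation of the crossing.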
Selected pattern formation snapshots are shown in Figure 3. The phenomenology of the resulting simulations, and of the actual root observations with which they largely agree, is discussed in [18]. The root apical meristem is an example of a stem cell niche in plants. A somewhat more complex stem cell niche model for mouse olfactory epithelium, in two dimensions using Plenum, is given in [15].
Figure 3.



Snapshots of the root growth model, showing cell positions along the horizontal axis (root tip to the right), and concentrations of auxin (solid red curve with one or two peaks) and hypothetical substance Y (dashed blue curve with one or two peaks) with increasing time. Cell state (1=idling in preparation for cell division, vs. 0=growth) is shown as a green dotted curve. Parameters are: ρstop = 1, base = .005, ampl = 100, Y0 = 5, rlim = 1, Tdiv = .01, Δ= .2, DA = 0.08, DY = 0.16. Some parameter sets including this one develop extra auxin peaks to the left of the Quiescent Center (~rightmost blue peak), which may specify the location for a new lateral root. Full interpretation of this model is given in [18].
4 Conclusion and outlook
We have shown that the time-ordered product expansion (TOPE) can be used systematically to derive computational simulation and parameter-fitting algorithms for stochastic systems, connecting two seemingly distant areas of research. In doing so we have developed the means to translate formally between field theory language and the language of computable Markov chains in which randomized algorithms can be expressed and derived. By this means we hope to open the door to the use of TOPE and related methods from quantum and statistical field theory in the computational simulation of stochastic biochemical kinetics, with broad applicability in physically based biology. The particular hybrid stochastic process/ordinary differential equation simulation algorithm derived here is very different from interleaving and operator splitting algorithms, which are intrinsically approximate; instead, this algorithm is exact in the same sense that SSA is (that is, it draws from the same distribution of just-fired reactions), except for any errors introduced by the ODE solver and in the solver’s detection of the ODE stopping criterion, which is that an auxiliary variable reaches a threshold value. A future prospect for the field theory approach is application to reaction-diffusion systems in which the propagator for particles between reactions is the heat kernel Green’s function for the diffusion equation. The result may be an alternative avenue for derivation of novel particle-based, off-grid stochastic numerical solvers for reaction-diffusion problems as treated in [2], which, like the algorithms shown here, are also amenable to generalizations to exact “leaping” acceleration and to hybrid stochastic/differential equation solution algorithms.
Supplementary Material
Figure 1.

A time history of the reaction A + B ⇌ C. Time flows left to right. Open circles represent reaction events, with probability factor ×W1. In between reaction events are unimolecular particle propagators exp((tk −tk−1)W0), labelled by arrows and particle names (repeated for clarity). This is a non-spatial version of the Lee model in quantum field theory (cf. for example [6]).
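A time history of the kind drawn in Figure 1 can be sampled with the SSA. The sketch below is a minimal Python version for A + B ⇌ C under mass-action propensities; the function name `ssa_abc`, the rate constants and the initial counts are hypothetical choices for illustration. Each exponentially distributed waiting time plays the role of a propagator segment between events, and each choice of reaction plays the role of an open-circle reaction event, on the assumption that W0 collects the diagonal (no-event) part of W and W1 the off-diagonal part.

```python
import random


def ssa_abc(nA, nB, nC, k_f, k_r, t_end, rng):
    """Gillespie SSA for A + B <-> C. Returns a list of (t, nA, nB, nC)."""
    t = 0.0
    history = [(t, nA, nB, nC)]
    while True:
        a_f = k_f * nA * nB      # propensity of A + B -> C
        a_r = k_r * nC           # propensity of C -> A + B
        a0 = a_f + a_r
        if a0 == 0.0:
            break                # no reaction can fire
        t += rng.expovariate(a0) # waiting time to the next event
        if t > t_end:
            break
        # Choose which reaction fires, with probability a_f / a0 for forward.
        if a_r == 0.0 or rng.random() * a0 < a_f:
            nA, nB, nC = nA - 1, nB - 1, nC + 1
        else:
            nA, nB, nC = nA + 1, nB + 1, nC - 1
        history.append((t, nA, nB, nC))
    return history


trajectory = ssa_abc(20, 15, 0, k_f=0.1, k_r=0.5, t_end=5.0,
                     rng=random.Random(1))
```

Every entry of `trajectory` after the first corresponds to one reaction event, and the quantities nA + nC and nB + nC stay constant along the trajectory, as the stoichiometry requires.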
Figure 2.

Erlang-derived time-dependent propensities for completion of a multistage process τ = 1, n ∈ {1, …, 10}. Horizontal axis: time, t. Vertical axis: propensity, ρ(t|τ, n). Plots for varying n are superimposed. For larger n there is a “maturation” phenomenon whereby completion at small times is very unlikely, and when a process is “overdue” for completion then its propensity becomes very high. By comparison, propensities for very small n increase rapidly at first and are then relatively flat.
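The curves described in this caption can be reproduced with a short calculation, under one explicit assumption: that the n stages are independent exponential steps with a common rate n/τ, so that the mean completion time is τ. The parameterization used in the main text is not restated here and may differ. With that assumption the propensity is the hazard rate of the Erlang distribution, i.e. its density divided by its survival function. The function name `erlang_propensity` is hypothetical.

```python
def erlang_propensity(t, tau=1.0, n=1):
    """Hazard rate of an Erlang(n, rate = n / tau) completion time at time t."""
    lam = n / tau
    x = lam * t
    # Accumulate x**k / k! for k = 0 .. n-1; the last term is x**(n-1)/(n-1)!.
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= x / k
        total += term
    # density / survival = lam * [x**(n-1)/(n-1)!] / sum_k x**k / k!
    return lam * term / total


# Propensity curves for tau = 1 and n = 1 .. 10 on a coarse time grid.
curves = {n: [erlang_propensity(0.1 * i, 1.0, n) for i in range(31)]
          for n in range(1, 11)}
```

For n = 1 the propensity is the constant 1/τ, recovering an ordinary memoryless reaction. For larger n it starts at zero and rises toward n/τ, which matches the "maturation" behavior described in the caption.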
Acknowledgments
Research was supported by NIH grants R01 GM086883 and P50 GM76516 to UC Irvine. I also wish to acknowledge the hospitality, travel support, and research environments provided by the Center for Nonlinear Studies (CNLS) at the Los Alamos National Laboratory, the Sainsbury Laboratory Cambridge University, and the Pauli Center for Theoretical Studies at ETH Zürich and the University of Zürich.
Abbreviations list
- IC: Initial Condition
- ODE: Ordinary Differential Equation
- MC: Markov Chain
- SSA: (Gillespie) Stochastic Simulation Algorithm
- TOPE: Time-Ordered Product Expansion
- LHS: Left Hand Side
- RHS: Right Hand Side
C Appendix I: Bayesian inference derivation
C.1 Semigroup property
Here we provide the omitted details for Section 2.4. For nonnegative times t = t1 + t2, any time-evolution equation must obey the semigroup property:
i.e.
Is there a k-event version of this rule, for k = k1 + k2? We observe (where again )
and we calculate for 0 ≤ k1 ≤ k:
Thus, if k = k1 + k2 and for any , there is a semigroup law:
In this derivation there is an arbitrary choice of from the interval [0, τk1].
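The continuous-time semigroup property invoked at the start of this subsection follows from the formal solution p(t) = exp(tW) · p(0) of the Introduction, and can be checked numerically on a small example. The sketch below does so for a hypothetical two-state rate matrix W whose columns sum to zero; the helper names `matmul` and `expm` and the truncated Taylor series are illustrative choices, adequate only for small matrices with modest norm.

```python
def matmul(X, Y):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def expm(W, t, terms=40):
    """exp(t W) by a truncated Taylor series."""
    n = len(W)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        # term <- term * (t W) / m, so that term = (t W)**m / m!
        term = matmul(term, [[t * w / m for w in row] for row in W])
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# Two-state master equation dp/dt = W p; each column of W sums to zero.
W = [[-2.0, 1.0],
     [2.0, -1.0]]
t1, t2 = 0.3, 0.5
whole = expm(W, t1 + t2)
composed = matmul(expm(W, t2), expm(W, t1))
max_gap = max(abs(whole[i][j] - composed[i][j])
              for i in range(2) for j in range(2))
```

Here `max_gap` is at the level of floating-point roundoff, and each column of `whole` sums to one, so exp(tW) maps probability vectors to probability vectors.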
C.2 Bayesian recurrence relation
Here we provide the omitted details for Section 3.1.1. We seek a version of the semigroup law that pertains to Pr(I, t|k, J) rather than to Pr(I, k|J, t). This is achieved by a somewhat involved application of Bayes’ rule.
We observe (where again by definition )
so we may define (where J = I0 and I = Ik and DIq ≜ DIqIq)
We seek a simple expression for Pr(I, t|k, J), and claim that with suitable caveats it will be determined recursively by
the two factors of which have inverse cancelling normalizations. The obstacle to overcome is that, from the Bayesian point of view, simultaneous knowledge of the simulation end time and final event number can trickle backwards and influence the distribution of likely event firing times at earlier times and event numbers, which is a completely nonphysical artifact. To avoid this effect we must be careful to ask the right questions for Bayesian inference to answer. To begin with we consider simulation ending times T much longer than event times t that we wish to sample. All events after event k at time t are assumed to be of no interest, so we integrate them out. All earlier events are assumed to be known already, so we conditionalize over them. This is the correct Bayesian way to introduce time asymmetry into the global distribution over entire trajectories ( ) above.
The strategy then is to consider large times T ≫ t which are overwhelmingly likely to have large reaction numbers n ≫ k; then to marginalize the probability distribution Pr([I], [τ], n|J, T) over all event numbers n > k and to conditionalize it over all event numbers q < k.
C.2.1 Marginalizing
Define
| (34) |
This is a “just-fired” probability, in which any wait times τ and events after the kth event are integrated out (marginalized).
In the limit T → ∞ only summands with n ≫ k will contribute (assuming terminal states have been formally removed, e.g. by adding an extra, isolated, slow, reversible reaction). First, is this object really a probability distribution? Clearly every value is nonnegative. They also add up to one in the limit T → ∞:
| (35) |
due to the normalization of . So, is also a probability density function.
Next we compute using TOPE:
But now the product [ ] is a common factor and can be moved out of all the integrals and sums. Thus
(the first factor of which is independent of T), where
If we define new dummy variables I′q ≜ Iq+k, τ′q ≜ τq+k, and n′ ≜ n − k, then τ′n′ = τn, I′n′ = In, and
by adding and subtracting the missing n′= 0 summand and using TOPE again. Now we can take limits:
and
| (36) |
As a special case, for k = 1 we find .
C.2.2 Conditionalizing
If 2 ≤ k < n, Bayes’ Rule implies:
| (37) |
The denominator is a new quantity (since the k’s don’t match up the way they do in the numerator) and it is the integral of the numerator that normalizes the left hand side. It can be evaluated in the limit of large T:
from Equation 34 and Equation 36. As before, the second step is justified by the fact that n ≤ k has probability that approaches zero as T → ∞. Defining
we find (from Equation 37)
since for valid nonnegative τs the ratio of limits exists and is finite, as we will shortly see. Thus
The last line is actually independent (in the functional rather than probabilistic sense of “independent”) of the quantities , and k, so we can drop all these arguments from . Restating,
Importantly, this expression is equal to as calculated at the end of the last section. Also the recursive statement of the Bayesian recurrence property Equation 37 becomes:
Since , we find for all k ≥ 2 the Bayesian recurrence relation in terms of alone:
| (38) |
C.3 Markov Chain Derivation
Here we provide the omitted details for Section 3.1.2. Continuing from the foregoing Bayesian recurrence property (Equation 38), we now sum over all Iq except Ik = I and I0 = J, and integrate over all τq, the following equation:
We define and calculate
This is also a probability density:
as shown in Equation 35 above, using the definition of Equation 36. Summarizing its Markov property:
| (39) |
D Appendix II: Maximum likelihood parameter inference
Application of the TOPE to maximum-likelihood parameter learning in stochastic reaction networks has previously been presented [20]. Here, for completeness of presentation for a different audience, we just show the essential gradient calculation step.
Suppose we have observations of the state of a chemical reaction network at times {ts}, and wish to improve the probability P(Data|Model) of a reaction network model for the flow of probability at intermediate times. We will use the TOPE for each time interval in between observation times ts:
| (40) |
We will need to compute the derivatives of this probability with respect to reaction rates:
where we defined the “branching ratio”
for reaction r in state J, assuming each reaction r results in just one output state I per input state J. Here R(I, J) is the random variable denoting the actual reaction chosen in transitioning from state J to state I. Then
This finally is a quantity that is easy to compute as a running average during a simulation of the network with incorrect values of the parameters, thereby contributing to the calculation of an improved set of parameter values in a stochastic gradient descent algorithm. This is the key update equation in a learning algorithm for reaction rates in stochastic biochemical networks (extensible to other process networks). Algorithmic details can be found in [20], noting particularly Equation 2.4 therein. A related stochastic learning algorithm is proposed in [8].
E Appendix III: Dynamical grammar for root growth
Given the following simplified function definitions, among others:
gGrowthModelMult = 1;
growthConst = 1/gCellCyleTime;
yEffectOnDivisionFunc[y_] := Module[{δ, h1, h2},
]
cellGrowthLocFunc[rad_] := Module[{},
gGrowthModelMult * growthConst
];
springXFunc[curPosx_, curRad_, nbrPosx_, nbrRad_] :=
-∂curPosx springPotential[curPosx, curRad, nbrPosx, nbrRad]
The actual grammar for selected rules is shown here:
gRootGrowth := Grammar[rules→
{
(* continuous change in cell c1 radius *)
{c1 Equal cell[cellID1, 1(* growth mode *), loc1, rad1, auxin1, y1, cellIDP, cellIDN]}→ c1,
solving[rad1′ Equal cellGrowthLocFunc[rad1]],
(* continuous change in cell c1 location *)
{c1 Equal cell[cellID, cMode, loc, rad, auxin, y, cellIDPrev, cellIDNext],
c2 Equal cell[cellIDNext, cModeN, locN, radN, auxinN, yN, cellID, cellIDNN]}→ {c1, c2},
solving[loc′ Equal gGrowthModelMult * springXFunc[loc, rad, locN, radN]],
(* continuous change in cell c1 location *)
{c1 Equal cell[cellID, cMode, loc, rad, auxin, y, cellIDPrev, cellIDNext],
c2 Equal cell[cellIDPrev, cModeP, locP, radP, auxinP, yP, cellIDPP, cellID]}→ {c1, c2},
solving[loc′ Equal gGrowthModelMult * springXFunc[loc, rad, locP, radP]],
(* change cell mode from growth to wait, when over a radius threshold *)
cell[cellID, 1, loc, rad, auxin, y, cellIDPrev, cellIDNext]→
cell[cellID, 2, loc, rad, auxin, y, cellIDPrev, cellIDNext],
with[gGrowthModelMult * stopGrowthConst * grammarSigmoid[rad - gLimitCellRad, gDivideTemp]],
(* divide a cell when it is in wait mode *)
cell[cellID, 2, loc, rad, auxin, y, cellIDPrev, cellIDNext]→ {
cell[cellIDPrev, cModeP, locP, radP, auxinP, yP, cellIDPP, cellID]→
cell[cellIDPrev, cModeP, locP, radP, auxinP, yP, cellIDPP, grammarCreateObjectID[1]],
cell[cellIDNext, cModeN, locN, radN, auxinN, yN, cellID, cellIDNN]→
cell[cellIDNext, cModeN, locN, radN, auxinN, yN, grammarCreateObjectID[2], cellIDNN],
cell[grammarCreateObjectID[1], 1, loc - rad + 2rad * cellpart + rad * (1 - cellpart),
rad * (1 - cellpart), auxin, y, cellIDPrev, grammarCreateObjectID[2]],
cell[grammarCreateObjectID[2], 1, loc - rad + rad * cellpart,
rad * cellpart, auxin, y, grammarCreateObjectID[1], cellIDNext]},
with[gGrowthModelMult * yEffectOnDivisionFunc[y] *
grammarPDF[UniformDistribution[{0.5 - gRangeParam, 0.5 + gRangeParam}], cellpart]],
(* … more rules … *)
(* auxin/y passive transport between two neighboring cells *)
{c0 Equal cell[cellID0, cMode0, loc0, rad0, auxin0, y0, cellIDP0, cellID1],
c1 Equal cell[cellID1, cMode1, loc1, rad1, auxin1, y1, cellID0, cellIDNext]}→ {c0, c1},
solving[auxin1′ Equal pt(auxin0 - auxin1), auxin0′ Equal pt(auxin1 - auxin0),
y1′ Equal pty(y0 - y1), y0′ Equal pty(y1 - y0)],
(* … more rules … *)
}];
Note that for efficiency, the symbolic partial derivative is taken out of the grammar (rule 3, biomechanics) and precomputed. Also, the cell division rule above actually has the form of a compound rule, whose right hand side comprises two further rules. This point was simplified out of the notation in the main text. It is an efficiency measure that allows a rule firing to be a multistep process (similar to the subgrammars or macros of [4]) without slowing down the computational identification of cells likely to divide specified by the with clause of the rule. However, its use here relies on the dynamically invariant, domain-specific fact that each cell is the nth neighbor (in this case n=1 or 2) of at most one other cell.
References
- 1. Doi J. Second quantization representation for classical many-particle system. J Phys A: Math Gen. 1976;9:1465.
- 2. Doi J. Stochastic theory of diffusion-controlled reaction. J Phys A: Math Gen. 1976;9:1479.
- 3. Mattis DC, Glasser ML. The uses of quantum field theory in diffusion-limited reactions. Rev Mod Phys. 1998;70:979–1001.
- 4. Mjolsness E, Yosiphon G. Stochastic Process Semantics for Dynamical Grammars. Annals of Mathematics and Artificial Intelligence. 2006;47(3–4).
- 5. Fried HM. Green’s Functions and Ordered Exponentials. Cambridge University Press; 2002.
- 6. Bender CM, Brandt SF, Chen JH, Wang Q. Ghost Busting: PT-Symmetric Interpretation of the Lee Model. Physical Review D. 2005;71:025014.
- 7. Zhang Xueying, DeCock Katrien, Bugallo Mónica F, Djurić Petar M. A general method for the computation of probabilities in systems of first order chemical reactions. J Chem Phys. 2005;122:104101. doi: 10.1063/1.1855311.
- 8. Yosiphon G, Mjolsness E. Towards the Inference of Stochastic Biochemical Network and Parameterized Grammar Models. In: Lawrence N, Girolami M, Rattray M, Sanguinetti G, editors. Learning and Inference in Computational Systems Biology. MIT Press; 2010.
- 9. Wilkinson Darren J. Stochastic Modelling for Systems Biology. Chapman & Hall/CRC Press; Boca Raton, Florida: 2006.
- 10. Mjolsness E, Orendorff D, Chatelain P, Koumoutsakos P. An Exact Accelerated Stochastic Simulation Algorithm. Journal of Chemical Physics. 2009;130:144110. doi: 10.1063/1.3078490.
- 11. Orendorff David. Exact and Hierarchical Reaction Leaping: Asymptotic Improvements to the Stochastic Simulation Algorithm. PhD thesis. UC Irvine Computer Science Department; June 2012. Thesis available at: http://computableplant.ics.uci.edu/~dorendor/thesis.
- 12. Forgy C. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence. 1982;19:17–37.
- 13. Hlavacek WS, Faeder JR, Blinov ML, Posner RG, Hucka M, Fontana W. Rules for modeling signal-transduction systems. Science’s STKE. 2006:re6. doi: 10.1126/stke.3442006re6.
- 14. Danos V, Feret J, Fontana W, Harmer R, Krivine J. Rule-based modelling of cellular signaling. Lect Notes Comput Sci. 2007;4703:17–41.
- 15. Yosiphon G. Stochastic Parameterized Grammars: Formalization, Inference, and Modeling Applications. PhD thesis. UC Irvine Computer Science Department; June 2009. Thesis and software: http://computableplant.ics.uci.edu/~guy/Plenum.html.
- 16. Crudu A, Debussche A, Radulescu O. Hybrid stochastic simplifications for multiscale gene networks. BMC Systems Biology. 2009;3:89. doi: 10.1186/1752-0509-3-89.
- 17. Roeder AHK. When and where plant cells divide: a perspective from computational modeling. Current Opinion in Plant Biology. 2012;15:1–7. doi: 10.1016/j.pbi.2012.08.002.
- 18. Mironova VV, Omelyanchuk Nadya A, Yosiphon Guy, Fadeev Stanislav I, Kolchanov Nikolai A, Mjolsness Eric, Likhoshvai Vitaly A. A plausible mechanism for auxin patterning along the developing root. BMC Systems Biology. 2010;4:98. doi: 10.1186/1752-0509-4-98.
- 19. Likhoshvai VA, Demidenko GV, Fadeev SI. Modeling of Gene Expression by the Delay Equation. Bioinformatics of Genome Regulation and Structure II (2006): Part. 2006;3:421–431. doi: 10.1007/0-387-29455-4_40.
- 20. Wang Y, Christley S, Mjolsness E, Xie X. Parameter inference for discretely observed stochastic kinetic models using stochastic gradient descent. BMC Systems Biology. 2010;4:99. doi: 10.1186/1752-0509-4-99.