Robust Stochastic Chemical Reaction Networks and Bounded Tau-Leaping

David Soloveichik

doi:10.1089/cmb.2008.0063

2009 Mar;16(3):501–522. doi: 10.1089/cmb.2008.0063


PMCID: PMC3203517 PMID: 19254187

Abstract

The behavior of some stochastic chemical reaction networks is largely unaffected by slight inaccuracies in reaction rates. We formalize the robustness of state probabilities to reaction rate deviations, and describe a formal connection between robustness and efficiency of simulation. Without robustness guarantees, stochastic simulation seems to require computational time proportional to the total number of reaction events. Even if the concentration (molecular count per volume) stays bounded, the number of reaction events can be linear in the duration of simulated time and total molecular count. We show that the behavior of robust systems can be predicted such that the computational work scales linearly with the duration of simulated time and concentration, and only polylogarithmically in the total molecular count. Thus our asymptotic analysis captures the dramatic speedup when molecular counts are large, and shows that for bounded concentrations the computation time is essentially invariant with molecular count. Finally, by noticing that even robust stochastic chemical reaction networks are capable of embedding complex computational problems, we argue that the linear dependence on simulated time and concentration is likely optimal.

Key words: tau-leaping, computational complexity, stochastic processes, robustness

1. Introduction

The stochastic chemical reaction network (SCRN) model of chemical kinetics is used in chemistry, physics, and computational biology. It describes interactions involving integer numbers of molecules as Markov jump processes (Érdi and Tóth, 1989; Gillespie, 1992; McQuarrie, 1967; van Kampen, 1997), and is used in domains where the traditional model of deterministic continuous mass action kinetics is invalid due to small molecular counts. Small molecular counts are prevalent in biology: for example, over 80% of the genes in the Escherichia coli chromosome are expressed at fewer than a hundred copies per cell, with some key control factors present in quantities under a dozen (Guptasarma, 1995; Levin, 1999). Indeed, experimental observations and computer simulations have confirmed that stochastic effects can be physiologically significant (Elowitz et al., 2002; McAdams and Arkin, 1997; Suel et al., 2006). Consequently, the stochastic model is widely employed for modeling cellular processes (Arkin et al., 1998) and is included in numerous software packages (Adalsteinsson et al., 2004; Kierzek, 2002; Vasudeva and Bhalla, 2004).¹ The stochastic model becomes equivalent to the classical law of mass action when the molecular counts of all participating species are large (Ethier and Kurtz, 1986; Kurtz, 1972).

Gillespie's stochastic simulation algorithm (SSA) can be used to model the behavior of SCRNs (Gillespie, 1977). However, simulation of systems of interest often requires an unfeasible amount of computational time. Some work has focused on optimizing simulation of large SCRNs—many different species and reaction channels. For example, certain tricks can improve the speed of deciding which reaction occurs next if there are many possible choices (Gibson and Bruck, 2000). However, for the purposes of this paper we suppose that the number of species and reactions is relatively small, and that it is fundamentally the number of reaction occurrences in a given interval of time that presents the difficulty. Because SSA simulates every single reaction event, simulation is slow when the number of reaction events is large.

On the face of it, simulation should be possible without explicitly modeling every reaction occurrence. In the mass action limit, fast simulation is achieved using numerical ODE solvers. The complexity of the simulation does not scale at all with the actual number of reaction occurrences but with overall simulation time and the concentration of the species. If the volume gets larger without a significant increase in concentration, mass action ODE solvers achieve a profound difference in computation time compared to SSA.² Moreover, the maximum concentration is essentially always bounded, because the model is only valid for solutions dilute enough to be well mixed, and ultimately because of the finite density of matter. However, mass action simulation can only be applied if molecular counts of all the species are large. Even one species that maintains a low molecular count and interacts with other species prevents the use of mass action ODE solvers.

Another reason why it should be possible to simulate stochastic chemical systems quickly is that for many systems the behavior of interest does not depend crucially upon details of events. For example, biochemical networks tend to be robust to variations in concentrations and kinetic parameters (Alon, 2007; Morohashi et al., 2002). If these systems are robust to many kinds of perturbations, including sloppiness in simulation, can we take advantage of this to speed up simulation? For example, can we approach the speed of ODEs but allow molecular counts of some species to be small? Indeed, tau-leaping algorithms (Cao et al., 2006; Gillespie, 2001, 2007; Rathinam et al., 2003) are based on the idea that if we allow reaction propensities to remain constant for some amount of time τ, and therefore to deviate slightly from their correct values, we don't have to explicitly simulate every reaction that occurs in this period of time (and can thus “leap” by amount of time τ).

In this paper we formally define robustness of the probability that the system is in a certain state at a certain time to perturbations in reaction propensities. We also provide a method for proving that certain simple systems are robust. We then describe a new approximate stochastic simulation algorithm called bounded tau-leaping (BTL), which naturally follows from our definition of robustness, and provably provides correct answers for robust systems. In contrast to Gillespie's and others' versions of tau-leaping, in each step of our algorithm the leap time, rather than being a function of the current state, is a random variable. This algorithm naturally avoids some pitfalls of tau-leaping: the concentrations cannot become negative, and the algorithm scales to SSA when necessary, in such a way that there is always at least one reaction per leap. However, in the cases when there are “opposing reactions” (canceling or partially canceling each other) other forms of tau-leaping may be significantly faster (Rathinam and El Samad, 2007).

BTL seems more amenable to theoretical analysis than Gillespie's versions (Gillespie, 2001, 2003, Cao et al., 2006), and may thus act as a stand-in for approximate simulation algorithms in analytic investigations. In this paper we use the language and tools of computational complexity theory to formally study how the number of leaps that BTL takes varies with the maximum molecular count m, time span of the simulation t, and volume V. In line with the basic computational complexity paradigm, our analysis is asymptotic and worst-case. “Asymptotic” means that we do not evaluate the exact number of leaps but rather look at the functional form of the dependence of their number on m, t, and V. This is easier to derive and allows for making fundamental distinctions (e.g., an exponential function is fundamentally larger than a polynomial function) without getting lost in the details. “Worst-case” means that we will not study the behavior of our algorithm on any particular chemical system but rather upper-bound the number of leaps our algorithm takes independent of the chemical system. This will allow us to know that no matter what the system we are trying to simulate, it will not be worse than our bound.

In this computational complexity paradigm, we show that indeed robustness helps. We prove an upper bound on the number of steps our algorithm takes that is logarithmic in m, and linear in t and total concentration C = m/V. This can be contrasted with the exact SSA algorithm which, in the worst case, takes a number of steps that is linear in m, t, and C. Since a logarithmic dependence is much smaller than a linear one, BTL is provably “closer” to the speed of ODE solvers for mass action systems which have no dependence on m.³

Finally we ask whether it is possible to improve upon BTL for robust systems, or did we exhaust the speed gains that can be obtained due to robustness? In the last section of the paper, we connect this question to a conjecture in computer science that is believed to be true. With this conjecture we prove that there are robust systems whose behavior cannot be predicted in fewer computational steps than the number of leaps that BTL makes, ignoring multiplicative constant factors and powers of log m. We believe other versions of tau-leaping have similar worst-case complexities as our algorithm, but proving equivalent results for them remains open.

2. Model and Definitions

A Stochastic Chemical Reaction Network (SCRN) S specifies a set of N species S_i (i ∈ {1, …, N}) and M reactions R_j (j ∈ {1, …, M}). The state of S is a vector x ∈ ℕ^N indicating the integral molecular counts of the species.⁴ A reaction R_j specifies a reactants' stoichiometry vector r_j ∈ ℕ^N, a products' stoichiometry vector p_j ∈ ℕ^N, and a real-valued rate constant k_j > 0. We describe reaction stoichiometry using a standard chemical “arrow” notation; for example, if there are three species, each reaction is written with its reactants to the left of the arrow and its products to the right, and its reactants vector and products vector each have three entries, one per species. A reaction R_j is possible in state x if there are enough reactant molecules: (∀i) x_i − r_ij ≥ 0. Then if reaction R_j occurs (or “fires”) in state x, the state changes to x + ν_j, where ν_j is the state change vector for reaction R_j defined as ν_j = p_j − r_j. We follow Gillespie and others and allow unary and bimolecular reactions only. Sometimes the model is extended to higher-order reactions (van Kampen, 1997), but the merit of this is a matter of some controversy.
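The stoichiometry definitions can be made concrete with a small sketch. The three-species reaction S1 + S2 → 2 S3 used here is a hypothetical example chosen for illustration, not one taken from the paper, and the function names are likewise assumptions.

```python
# Hypothetical three-species reaction S1 + S2 -> 2 S3 (illustrative only).
r = (1, 1, 0)  # reactants' stoichiometry vector r_j
p = (0, 0, 2)  # products' stoichiometry vector p_j


def state_change(r, p):
    """State change vector nu_j = p_j - r_j."""
    return tuple(pi - ri for ri, pi in zip(r, p))


def is_possible(x, r):
    """A reaction is possible in state x if x_i - r_ij >= 0 for every species i."""
    return all(xi - ri >= 0 for xi, ri in zip(x, r))


def fire(x, r, p):
    """Return the state reached when the reaction fires once in state x."""
    if not is_possible(x, r):
        raise ValueError("reaction is not possible in this state")
    return tuple(xi + ni for xi, ni in zip(x, state_change(r, p)))
```

In this sketch, firing the reaction in state (3, 1, 0) consumes one S1 and one S2 and produces two S3.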

Let us fix an SCRN S. Given a starting state and a fixed volume V, we can define a continuous-time Markov process we call an SSA process⁵ of S according to the following stochastic kinetics. Given a current state x, the propensity function a_j of reaction R_j is defined so that a_j(x) dt is the probability that one R_j reaction will occur in the next infinitesimal time interval [t, t + dt). If R_j is a unimolecular reaction S_i → … then the propensity is proportional to the number of molecules of S_i currently present since each is equally likely to react in the next time instant; specifically, a_j(x) = k_j x_i for reaction rate constant k_j. If R_j is a bimolecular reaction S_i + S_i′ → … where i ≠ i′, then the reaction propensity is proportional to x_i x_i′, which is the number of ways of choosing a molecule of S_i and a molecule of S_i′, since each pair is equally likely to react in the next time instant. Further, the probability that a particular pair reacts in the next time instant is inversely proportional to the volume, resulting in the propensity function a_j(x) = k_j x_i x_i′ / V. If R_j is a bimolecular reaction 2S_i → … then the number of ways of choosing two molecules of S_i to react is x_i (x_i − 1)/2 and the propensity function is a_j(x) = k_j x_i (x_i − 1) / (2V).
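A minimal sketch of the three propensity cases follows. The function name, the argument order, and the encoding of a reaction by its reactants vector are assumptions made for illustration.

```python
def propensity(x, r, k, V):
    """Propensity of a reaction with reactants vector r and rate constant k
    in state x and volume V (unimolecular and bimolecular reactions only)."""
    # List the reactant species with multiplicity, e.g. (2, 0, 0) -> [0, 0].
    idx = [i for i, c in enumerate(r) for _ in range(c)]
    if len(idx) == 1:                       # S_i -> ...
        return k * x[idx[0]]
    if len(idx) == 2 and idx[0] != idx[1]:  # S_i + S_i' -> ..., i != i'
        return k * x[idx[0]] * x[idx[1]] / V
    if len(idx) == 2:                       # 2 S_i -> ...
        return k * x[idx[0]] * (x[idx[0]] - 1) / (2 * V)
    raise ValueError("only unimolecular and bimolecular reactions are supported")
```

Note that the 2 S_i case evaluates to zero when only one molecule of S_i is present, which matches the requirement that an impossible reaction has zero propensity.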

Since the propensity function a_j of reaction R_j is defined so that a_j(x) dt is the probability that one R_j reaction will occur in the next infinitesimal time interval [t, t + dt), state transitions in the SSA process are equivalently described as follows: If the system is in state x, no further reactions are possible if a_j(x) = 0 for all j. Otherwise, the time until the next reaction occurs is an exponential random variable with rate Σ_j a_j(x). The probability that the next reaction will be a particular R_j* is a_j*(x) / Σ_j a_j(x).
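The transition rule just described can be sketched as a direct simulation loop. This is a minimal illustration using only the Python standard library; the function names and the encoding of a reaction as a `(r, p, k)` triple are assumptions, and no attempt is made at the optimizations discussed in the literature.

```python
import random


def propensity(x, r, k, V):
    """Propensity for a unimolecular or bimolecular reaction (see Section 2)."""
    idx = [i for i, c in enumerate(r) for _ in range(c)]
    if len(idx) == 1:
        return k * x[idx[0]]
    if len(idx) == 2 and idx[0] != idx[1]:
        return k * x[idx[0]] * x[idx[1]] / V
    if len(idx) == 2:
        return k * x[idx[0]] * (x[idx[0]] - 1) / (2 * V)
    raise ValueError("only unimolecular and bimolecular reactions are supported")


def ssa(x0, reactions, V, t_end, rng):
    """Simulate one SSA trajectory; return (state at t_end, number of events).

    reactions is a list of (r, p, k): reactants vector, products vector, rate.
    """
    x, t, n_events = list(x0), 0.0, 0
    while True:
        a = [propensity(x, r, k, V) for r, _, k in reactions]
        a_total = sum(a)
        if a_total == 0:                 # no further reactions are possible
            break
        dt = rng.expovariate(a_total)    # exponential waiting time, rate sum_j a_j
        if t + dt > t_end:               # next event falls after the time of interest
            break
        t += dt
        # Choose reaction j with probability a_j / sum_j a_j.
        j = rng.choices(range(len(reactions)), weights=a)[0]
        r, p, _ = reactions[j]
        for i in range(len(x)):
            x[i] += p[i] - r[i]
        n_events += 1
    return tuple(x), n_events
```

For example, `ssa((50, 0), [((1, 0), (0, 1), 1.0)], 1.0, 1000.0, random.Random(0))` simulates the hypothetical decay S1 → S2 and performs one loop iteration per reaction event, which is the cost that the rest of the paper aims to avoid.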

We are interested in predicting the behavior of SSA processes. While there are potentially many different questions that we could be trying to answer, for simplicity we define the prediction problem as follows. Given an SSA process, a time t, a state x, and δ ≥ 0, predict⁶ whether the process is in x at time t, such that the probability that the prediction is incorrect is at most δ. In other words we are interested in algorithmically generating values of a Bernoulli random variable (1 for the prediction “in x”, 0 for “not in x”) such that the probability that it equals 1 when the process is not in x at time t plus the probability that it equals 0 when the process is in x at time t is at most δ. We assume δ is some small positive constant. We can easily extend the prediction problem to a set of states Γ rather than a single target state x by asking to predict whether the process is in any of the states in Γ at time t. Since Γ is meant to capture some qualitative feature of the SSA process that is of interest to us, it is called an outcome.

By decreasing the volume V (which speeds up all bimolecular reactions), increasing t, or allowing for more molecules (up to some bound m) we are increasing the number of reaction occurrences that we may need to consider. Thus for a fixed SCRN, one can try to upper bound the computational complexity of the prediction problem as a function of V, t, and m. Given a molecular count bound m, we define the bounded-count prediction problem as before, but allowing an arbitrary answer if the molecular count exceeds m within time t. Suppose we are given a bounded-count prediction problem with molecular count bound m and error bound δ, about time t and an SSA process in which the volume is V. We then call it an (m, t, C, δ)-prediction problem, where C = m/V is a bound on the maximum concentration.⁷ Fixing some small δ, we study how the computational complexity of solving (m, t, C, δ)-prediction problems may scale with increasing m, t, and C. If the (m, t, C, δ)-prediction problem is regarding an outcome Γ consisting of multiple states, we require the problem of deciding whether a particular state is in Γ to be easily solvable. Specifically we require it to be solvable in time at most polylogarithmic in m, which is true for any natural problem.

It has been observed that permitting propensities to deviate slightly from their correct values allows for much faster simulation, especially if the molecular counts of some species are large. This idea forms the basis of approximate stochastic simulation algorithms such as tau-leaping (Gillespie, 2001). As opposed to the exact SSA process described above, consider letting the propensity function vary stochastically. Specifically, we define new propensity functions a′_j(x, t) = ξ_j(t) a_j(x), where {ξ_j(t)} are random variables indexed by reaction and time. The value of ξ_j(t) describes the deviation from the correct propensity of reaction R_j at time t, and should be close to 1. For any SSA process we can define a new stochastic process called a perturbation of that process through the choice of the distributions of {ξ_j(t)}. Note that the new process may not be Markov, and may not possess Poisson transition probabilities. If there is a 0 < ρ < 1 such that (1 − ρ) ≤ ξ_j(t) ≤ (1 + ρ) for all j and t, then we call the new process a ρ-perturbation. There may be systems exhibiting behavior such that any slight inexactness in the calculation of propensities quickly gets amplified and results in qualitatively different behavior. However, for some processes, if ρ is a small constant, the ρ-perturbation may be a good approximation of the SSA process. That a ρ-perturbation is bounded multiplicatively (i.e., that ξ_j(t) acts multiplicatively) corresponds to our intuitive notion that proportionally larger deviations are required to have an effect if the affected propensity is large.

We now define our notion of robustness. Intuitively, we want the prediction problem to not be affected even if reaction propensities vary slightly. Formally, we say an SSA process is (ρ, δ)-robust with respect to state x at time t if for any ρ-perturbation of the process, the probability of being in x at time t is within plus or minus δ of the corresponding probability for the original SSA process. This definition can be extended to an outcome Γ in the same way as the definition of the prediction problem. Finally, we say an SSA process is (ρ, δ)-robust with respect to a prediction problem (or bounded-count prediction problem) if it is (ρ, δ)-robust with respect to the same state (or outcome) as specified in the problem, at the same time t as specified in the problem.

For simplicity, we often use asymptotic notation. The notation O(1) is used to denote an unspecified positive constant. This constant is potentially different every time the expression O(1) appears.

3. Robustness Examples

In this section, we elucidate our notion of robustness by considering some examples. In general, the question of whether a given SSA process is (ρ, δ)-robust for a particular outcome seems a difficult one. The problem is especially hard because we have to consider every possible ρ-perturbation—thus, we may not even be able to give an approximate characterization of robustness by simulation with SSA. However, we can characterize the robustness of certain (simple) systems.

For an SSA process or ρ-perturbation and an outcome Γ, let F^Γ(·, t) denote the probability that the process in question is in Γ at time t; we write simply F when Γ and the process are clear from context. Consider the SCRN shown in Figure 1a. We start with 300 molecules of S₁ and S₃ each, and are interested in the outcome Γ of having at least 150 molecules of S₄. The dashed line with circles shows F for the correct SSA process. (All plots of F are estimated from 10³ SSA runs.) The two dashed lines without circles show F for two “extremal” ρ-perturbations: one with constant ξ_j(t) = 1 + ρ, and one with constant ξ_j(t) = 1 − ρ. What can we say about other ρ-perturbations, particularly where the ξ_j(t) have much more complicated distributions? It turns out that for this SCRN and Γ, we can prove that any ρ-perturbation falls within the bounds set by the two extremal ρ-perturbations. Thus, F for any ρ-perturbation falls within the dashed lines. Formally, the SSA process is monotonic with respect to Γ using the definition of monotonicity in Appendix A.3. This is easily proven by Lemma A.5 because every species is a reactant in at most one reaction. Then by Lemma A.4, for any ρ-perturbation, F at each time t lies between the values of F for the extremal ρ-perturbations with ξ_j(t) = 1 − ρ and ξ_j(t) = 1 + ρ.

FIG. 1. Examples of SCRNs exhibiting contrasting degrees of robustness. The SSA process and outcome Γ are defined for the two systems as follows: (a) Rate constants: k₁ = 1, k₂ = 0.001; start state: 300 molecules each of S₁ and S₃; outcome Γ: x₄ ≥ 150. (b) Rate constants: k₁ = 0.01, k₂ = 0.01; start state: 300 molecules of S₁, 10 of S₂, and 10 of S₃; outcome Γ: x₂ ≥ 160. Plots show F^Γ(·, t) for an SSA process or ρ-perturbation estimated from 10³ SSA runs. (**Dashed line with circles**) Original SSA process. (**Dashed lines without circles**) The two extremal ρ-perturbations: one with constant *ξ_j*(t) = 1 + ρ, and one with constant *ξ_j*(t) = 1 − ρ. For SCRN (b), we also plot F^Γ(·, t) for a ρ-perturbation with constant ξ₁(t) = 1 + ρ, ξ₂(t) = 1 − ρ (triangles), or constant ξ₁(t) = 1 − ρ, ξ₂(t) = 1 + ρ (diamonds). Perturbation parameter ρ = 0.1 throughout.

To see how the robustness of this system can be quantified using our definition of (ρ, δ)-robustness, first consider two time points t = 4.5 and t = 6. At t = 4.5, the probability that the correct SSA process has produced at least 150 molecules of S₄ is slightly more than 0.5. The corresponding probability for ρ-perturbations of the process can be no larger than about 0.95 and no smaller than about 0.1. Thus the process is (ρ, δ)-robust with respect to outcome Γ at time t = 4.5 for ρ = 0.1 and δ approximately 0.45, but not for smaller δ. On the other hand at t = 6, the dashed lines are essentially on top of each other, resulting in a tiny δ. In fact, δ is small for all times less than approximately 3.5 or greater than approximately 5.5.

What information did we need to be able to measure (ρ, δ)-robustness? The two extremal ρ-perturbations are simply the original SSA process scaled in time. Thus knowing how F for the original process varies with t allows one to quantify (ρ, δ)-robustness at the various times; this F can be estimated from multiple SSA runs of the original process as in Figure 1. Intuitively, the process is (ρ, δ)-robust for small δ at all times t when F does not change quickly with t (see Appendix A.3). For systems that are not monotonic, knowing how F varies with time may not help with evaluating (ρ, δ)-robustness.
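The time-scaling observation suggests a simple calculation for monotonic systems. Multiplying every propensity by a constant factor c is the same as running the original process with time scaled by c, so the extremal ρ-perturbations have F values F((1 + ρ)t) and F((1 − ρ)t), where F is the curve of the original process. The sketch below assumes that F is available as a function of time; the logistic curve standing in for F is made up for illustration and is not the data of Figure 1.

```python
import math


def monotonic_delta(F, t, rho):
    """Smallest delta for which a monotonic process is (rho, delta)-robust at time t.

    F(t) is the probability that the unperturbed SSA process is in the outcome
    at time t. Valid only when every rho-perturbation is bounded by the two
    extremal ones (the monotonic case).
    """
    upper = F((1 + rho) * t)  # extremal perturbation with constant xi = 1 + rho
    lower = F((1 - rho) * t)  # extremal perturbation with constant xi = 1 - rho
    return max(abs(upper - F(t)), abs(F(t) - lower))


def example_F(t):
    """Made-up stand-in for F: a logistic curve rising steeply around t = 4.5."""
    return 1.0 / (1.0 + math.exp(-4.0 * (t - 4.5)))
```

With this stand-in curve, `monotonic_delta(example_F, 4.5, 0.1)` is large because F changes quickly there, while `monotonic_delta(example_F, 8.0, 0.1)` is close to zero because F is nearly flat.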

Indeed, for a contrasting example, consider the SCRN in Figure 1b. We start with 300 molecules of S₁, 10 molecules of S₂, and 10 molecules of S₃, and we are interested in the outcome of having at least 160 molecules of S₂. Since S₁ is a reactant in both reactions, Lemma A.5 cannot be used. In fact, the figure shows two ρ-perturbations (triangles and diamonds) that clearly escape from the boundaries set by the dashed lines. The triangles show F for the ρ-perturbation where the first reaction is maximally sped up and the second reaction is maximally slowed down. (Vice versa for the diamonds.) For characterization of the robustness of this system via (ρ, δ)-robustness, consider the time point t = 2.5. The probability of having at least 160 molecules of S₂ in the correct SSA process is around 0.5. However, this probability for ρ-perturbations of the process can deviate by at least approximately 0.4 upward and downward as seen by the two ρ-perturbations (triangles and diamonds). Thus at this time the system is not (ρ, δ)-robust for δ smaller than approximately 0.4. What about other ρ-perturbations? It turns out that for this particular system, the two ρ-perturbations corresponding to the triangles and diamonds bound F in the same way that the two extremal ρ-perturbations bounded F in the first example (exercise left to the reader). Nonetheless, for general systems that are not monotonic it is not clear how one can find such bounding ρ-perturbations, and in fact they likely would not exist.

Of course, there are other types of SSA processes that are not like either of the above examples: e.g., systems that are robust at many times but not monotonic. Finding general ways of evaluating the robustness of such systems remains an important open problem.

Finally, it is important to note that quantifying the robustness of SSA processes, even monotonic ones, seems to require computing many SSA runs. This is self-defeating when in practice one wants to show that the given SSA process is (ρ, δ)-robust in order to justify the use of an approximate simulation algorithm to quickly simulate it. In these cases, we have to consider (ρ, δ)-robustness a theoretical notion only. Note, however, that it may be much easier to show that a system is not robust by comparing the simulation runs of different ρ-perturbations, since the runs can be quickly obtained using fast approximate simulation algorithms such as that presented in the next section.

4. Bounded Tau-Leaping

4.1. The algorithm

We argued in the Introduction that sloppiness can allow for faster simulation. In this section we give a new approximate stochastic simulation algorithm called bounded tau-leaping (BTL) that simulates exactly a certain ρ-perturbation rather than the original SSA process. Consequently, the algorithm solves the prediction problem with allowed error δ for (ρ, δ)-robust SSA processes.

The algorithm is a variant of existing tau-leaping algorithms (Gillespie, 2007). However, while other tau-leaping algorithms have an implicit notion of robustness, BTL is formally compatible with our explicit definition. As we'll see below, our algorithm also has certain other advantages over many previous tau-leaping implementations: it naturally disallows negative concentrations and scales to SSA in a manner that there is always at least one reaction per leap. It also seems easier to analyze formally; obtaining a result similar to Theorem 4.1 is an open question for other tau-leaping variants.

BTL has the overall form typical of tau-leaping algorithms. Rather than simulating every reaction occurrence explicitly as per the SSA, BTL divides the simulation into leaps which group multiple reaction events. The propensities of all of the reactions are assumed to be fixed throughout the leap. This is obviously an approximation since each reaction event affects molecular counts and therefore the propensities. However, this approximation is useful because simulating the system with the assumption that propensities are fixed turns out to be much easier. Instead of having to draw random variables for each reaction occurrence, the number of random variables drawn to determine how many reaction firings occurred in a leap is independent of the number of reaction firings. Thus we effectively “leap” over all of the reactions within a leap in few computational steps. If molecular counts do not change by much within a leap then the fixed propensities are close to their correct SSA values and the approximation is good.

Our definition of a ρ-perturbation allows us to formally define “good.” We want to guarantee that the approximate process that tau-leaping actually simulates is a ρ-perturbation of the exact SSA process. We can achieve this as follows. If x is the state on which the leap started, throughout the leap the simulated reaction propensities are fixed at their SSA propensities on x: a_j(x). Then for any state y within the leap we want the correct SSA propensities a_j(y) to satisfy the following ρ-perturbation constraint (0 < ρ < 1): (1 − ρ) a_j(y) ≤ a_j(x) ≤ (1 + ρ) a_j(y). As soon as we reach a state y for which this constraint is violated, we start a new leap at y, which will use simulated reaction propensities fixed at a_j(y). This ensures that at any time in the simulation, there is some (1 − ρ) ≤ ξ_j(t) ≤ (1 + ρ) such that multiplying the correct SSA propensity of reaction R_j by ξ_j(t) yields the propensity of R_j that the simulation algorithm is actually using. Therefore, we actually simulate a ρ-perturbation, and for (ρ, δ)-robust SSA processes, the algorithm can be used to provably solve the prediction problem with error δ.
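The ρ-perturbation constraint can be written as a short check. This sketch assumes the propensities at the leap start and at the current state have already been computed; the function and argument names are hypothetical.

```python
def satisfies_rho_constraint(a_start, a_now, rho):
    """Check (1 - rho) * a_j(y) <= a_j(x) <= (1 + rho) * a_j(y) for every reaction j.

    a_start: propensities a_j(x) at the state x where the leap started.
    a_now:   correct SSA propensities a_j(y) at the current state y.
    """
    return all((1 - rho) * cur <= fixed <= (1 + rho) * cur
               for fixed, cur in zip(a_start, a_now))
```

When the check holds, the factor ξ_j = a_j(x) / a_j(y) lies in [1 − ρ, 1 + ρ] for every reaction with nonzero propensity, which is the multiplicative deviation the definition allows.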

Can we implement this simulation quickly, and, as promised, do little computation per leap? Note that in order to limit the maximum propensity deviation in a leap, we need to make the leap duration be a random variable dependent upon the stochastic events in the leap. If we evaluate a_j(y) after each reaction occurrence in a leap to verify the satisfaction of the ρ-perturbation constraint, we do not save time over SSA. However, we can avoid this by using a stricter constraint we call the {ɛ_ij}-perturbation constraint (0 < ɛ_ij < 1), defined as follows. If the leap starts in state x, reaction R_j is allowed to change the molecular count of species S_i by at most plus or minus ɛ_ij x_i within a leap. Again, as soon as we reach a state y where this constraint is violated, we start a new leap at y.⁸

For any ρ, we can find a set of {ɛ_ij} bounds such that satisfying the {ɛ_ij}-perturbation constraint satisfies the ρ-perturbation constraint. In Appendix A.1 we show that for any SCRN, the ρ-perturbation constraint is satisfied if each ɛ_ij is at most a bound determined by ρ and M, where M is the number of reactions in the SCRN.

Simulating a leap such that it satisfies the {ɛ_ij}-perturbation constraint is easy and only requires drawing M gamma and M − 1 binomial random variables. Suppose the leap starts in state x. For each reaction R_j, let b_j be the number of times R_j needs to fire to cause a violation of the {ɛ_ij} bounds for some species. Thus b_j is the smallest positive integer such that b_j |ν_ij| > ɛ_ij x_i for some S_i. To determine τ, the duration of the leap, we do the following. First we determine when each reaction R_j would occur b_j times, by drawing from a gamma distribution with shape parameter b_j and rate parameter a_j. This generates a time τ_j for each reaction. The leap ends as soon as some reaction R_j occurs b_j times; thus to determine the duration of the leap τ we take the minimum of the τ_j's. At this point, we know that the first-violating reaction R_j* (the one with the minimum τ_j*) occurred b_j* times. But we also need to know how many times the other reactions occur. Consider any other reaction R_j (j ≠ j*). Given that the b_j-th occurrence of reaction R_j would have happened at time τ_j had the leap not ended, we need to distribute the other b_j − 1 occurrences to determine how many happen before time τ. The number of occurrences by time τ is given by the binomial distribution with number of trials b_j − 1 and success probability τ/τ_j. This enables us to define BTL as shown in Figure 2.

FIG. 2. The bounded tau-leaping (BTL) algorithm. The algorithm is given the SCRN, the initial state, the volume V, and a set of perturbation bounds {*ɛ_ij*} > 0. If the state at a specific time *t_f* is desired, the algorithm checks if t + τ > *t_f* in step (3), and if so uses τ = *t_f* − t, and treats all reactions as not first-violating in step (4). Gamma(n, λ) is a gamma distribution with shape parameter n and rate parameter λ. Binomial(n, p) is a binomial distribution with number of trials n and success probability p.
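Since the figure itself is not reproduced here, the following is a minimal sketch of BTL reconstructed from the prose description above, not a transcription of Figure 2. It assumes a single global ɛ, encodes each reaction as a hypothetical `(r, p, k)` triple, skips reactions that leave the state unchanged, and draws binomial variates by summing Bernoulli trials for simplicity (a real implementation would use a fast binomial sampler). It also assumes ɛ is small enough to meet the paper's bound.

```python
import math
import random


def propensity(x, r, k, V):
    """SSA propensity for a unimolecular or bimolecular reaction."""
    idx = [i for i, c in enumerate(r) for _ in range(c)]
    if len(idx) == 1:
        return k * x[idx[0]]
    if len(idx) == 2 and idx[0] != idx[1]:
        return k * x[idx[0]] * x[idx[1]] / V
    if len(idx) == 2:
        return k * x[idx[0]] * (x[idx[0]] - 1) / (2 * V)
    raise ValueError("only unimolecular and bimolecular reactions are supported")


def btl(x0, reactions, V, eps, t_f, rng):
    """Sketch of bounded tau-leaping; returns (state at t_f, number of leaps)."""
    x, t, leaps = list(x0), 0.0, 0
    nus = [[pi - ri for ri, pi in zip(r, p)] for r, p, _ in reactions]
    while t < t_f:
        a = [propensity(x, r, k, V) for r, _, k in reactions]
        # Reactions that can fire and that change the state.
        active = [j for j, aj in enumerate(a) if aj > 0 and any(nus[j])]
        if not active:
            break
        # b_j: smallest positive integer with b_j * |nu_ij| > eps * x_i for some i.
        b = {j: min(math.floor(eps * x[i] / abs(n)) + 1
                    for i, n in enumerate(nus[j]) if n != 0)
             for j in active}
        # tau_j ~ Gamma(shape b_j, rate a_j); gammavariate takes a scale, so use 1/a_j.
        tau_j = {j: rng.gammavariate(b[j], 1.0 / a[j]) for j in active}
        j_star = min(active, key=tau_j.get)   # first-violating reaction
        tau = tau_j[j_star]
        if t + tau > t_f:                     # truncate the final leap at t_f
            tau, j_star = t_f - t, None
        for j in active:
            if j == j_star:
                n_fired = b[j]
            else:
                # Binomial(b_j - 1, tau / tau_j) via Bernoulli trials (simple, not fast).
                q = tau / tau_j[j]
                n_fired = sum(rng.random() < q for _ in range(b[j] - 1))
            for i, n in enumerate(nus[j]):
                x[i] += n_fired * n
        t += tau
        leaps += 1
    return tuple(x), leaps
```

For the hypothetical decay S1 → S2 started from 1000 molecules of S1, `btl((1000, 0), [((1, 0), (0, 1), 1.0)], 1.0, 0.04, 1000.0, random.Random(0))` converts all molecules while taking far fewer leaps than the 1000 events SSA would simulate. Early leaps contain a single firing because S2 starts at zero, which illustrates how the algorithm falls back to SSA-like steps when counts are small.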

The algorithm is called “bounded” tau-leaping because the deviations of reaction propensities within a leap are always bounded according to ρ. This is in contrast with other tau-leaping algorithms, such as Gillespie's (Cao et al., 2006), in which the deviations in reaction propensities are small with high probability, but not always, and in fact can get arbitrarily high if the simulation is long enough. This allows BTL to satisfy our definition of a ρ-perturbation, and permits easier analysis of the behavior of the algorithm.

As any algorithm exactly simulating a ρ-perturbation would, BTL naturally avoids negative concentrations. Negative counts can occur only if an impossible reaction happens, that is, if in some state x a reaction R_j fires for which a_j(x) = 0. But since in a ρ-perturbation propensity deviations are multiplicative, in state x the perturbed propensity ξ_j(t) a_j(x) is also 0, and so R_j cannot occur. Further, no matter how small the {ɛ_ij} bounds are, there is always at least one reaction per leap and thus BTL cannot take more steps than SSA.

On the negative side, in certain cases the BTL algorithm can take many more leaps than Gillespie's tau-leaping (Cao et al., 2006; Gillespie, 2001, 2003) and other versions. Consider the case where there are two fast reactions that partially undo each other's effect (for example the reactions may be reverses of each other). While both reactions may be occurring very rapidly, their propensities may be very similar (Rathinam and El Samad, 2007). Gillespie's tau-leaping will attempt to leap to a point where the molecular counts have changed enough according to the averaged behavior of these reactions. However, our algorithm considers each reaction separately and leaps to the point where the first reaction violates the bound on the change in a species in the absence of the other reactions. Thus in this situation our algorithm would perform unnecessarily many leaps for the desired level of accuracy.

4.2. Upper bound on the number of leaps

Suppose we fix some SCRN of interest, and run BTL on different initial states, volumes, and lengths of simulated time. How does varying these parameters change the number of leaps taken by BTL? In this section, we prove that no matter what the SCRN is, we can upper bound the number of leaps as a function of the total simulated time t, the volume V, and the maximum total molecular count m encountered during the simulation. For simplicity, we assume that all the ɛ_ij are equal to some global ɛ. (Alternatively, the theorem and proof can be easily changed to use min/max {ɛ_ij} values where appropriate.)

Theorem 4.1.

For any SCRN S with M species, any ɛ such that 0 < ɛ < 1/(12M), and any δ > 0, there are constants c₁, c₂, c₃ > 0 such that for any bounds on time t and total molecular count m, for any volume V and any starting state, after c₁ log m + c₂ t (C + c₃) leaps where C = m/V, either the bound on time or the bound on total molecular count will be exceeded with probability at least 1 − δ.

Proof

The proof is presented in Appendix A.2. ▪

Note that the upper bound on ɛ implies that the algorithm is exactly simulating some ρ-perturbation (see previous section).

Intuitively, a key idea in the proof of the theorem is that the propensity of a reaction decreasing a particular species is linear in the amount of that species (since the species must appear as a reactant). This allows us to bound the decrease of any species if a leap is short. In fact, this implies that a short leap probably increases the amount of some species by a lot (some species must cause a violation; if not by a decrease, it must be by an increase). This allows us to argue that if we have a lot of long leaps we exceed our time bound t, and if we have a lot of short leaps we exceed our bound on total molecular count m. Because the effect of leaps is multiplicative, logarithmically many short leaps are enough to exceed m.

It is informative to compare this result with exact SSA, which in the worst case takes O(1) m t (C+O(1)) steps, since each reaction occurrence corresponds to an SSA step and the maximum reaction propensity is k_j m²/ V or k_jm. Since m can be very large, the speed improvement can be profound.
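To get a feel for the gap between the two expressions, the following sketch evaluates both for growing m at a fixed concentration. The constants c1, c2, c3 and the SSA constants are placeholder values chosen purely for illustration; Theorem 4.1 guarantees that suitable constants exist but does not supply these numbers.

```python
import math

def btl_leap_bound(m, t, V, c1=1.0, c2=1.0, c3=1.0):
    """Theorem 4.1 form: c1*log(m) + c2*t*(C + c3), with C = m/V.
    c1, c2, c3 are illustrative placeholders, not values from the paper."""
    C = m / V
    return c1 * math.log(m) + c2 * t * (C + c3)

def ssa_step_bound(m, t, V, c_mult=1.0, c_add=1.0):
    """Worst-case SSA form quoted in the text: O(1)*m*t*(C + O(1)).
    c_mult and c_add stand in for the two O(1) constants (assumed values)."""
    C = m / V
    return c_mult * m * t * (C + c_add)

# Hold the concentration C = m/V fixed at 1 while the molecular count grows.
for m in (10**3, 10**6, 10**9):
    V = float(m)
    print(m, btl_leap_bound(m, t=10.0, V=V), ssa_step_bound(m, t=10.0, V=V))
```

With the concentration held fixed, the first column of results grows only through the log m term, while the second grows in proportion to m.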

We believe, although it remains to be proven, that other versions of tau-leaping (Gillespie, 2007) achieve the same asymptotic worst case number of leaps as our algorithm.

How much computation is required per leap? Each leap involves arithmetic operations on the molecular counts of the species, as well as drawing from gamma and binomial distributions. Since there are fast algorithms for obtaining instances of gamma and binomial random variables (Ahrens and Dieter, 1978; Kachitvichyanukul and Schmeiser, 1988), we do not expect a leap of BTL to require much more computation than a leap of other forms of tau-leaping, nor do we expect these draws to be a major contributor to the total running time. Precise bounds are dependent on the model of computation. (In the next section, we state reasonable asymptotic bounds on the computation time per leap for a randomized Turing machine implementation of BTL.)

5. On The Computational Complexity of the Prediction Problem for Robust SSA Processes

What is the computational complexity inherent in the prediction problem for robust SSA processes, and how close does BTL come to the optimum computation time? In order to be able to consider these questions formally, we specify our model of computation as being randomized Turing machines (see below). Then in terms of maximum total molecular count m, log m computation time is required to simply read in the initial state of the SSA process and target state of the prediction problem. We say that computation time polylogarithmic in m is efficient in m. What about the length of simulated time t and maximum concentration C? We have shown that the number of leaps that BTL takes scales at most linearly with t and C. However, for some systems there are analytic shortcuts to determining the probability of being in Γ at time t. For instance the “exponential decay” SCRN consisting of the single reaction S₁ → S₂ is easily solvable analytically (Malek-Mansour and Nicolis, 1975). The calculation of the probability of being in any given state at any given time t (among other questions) can be solved in time that grows minimally with t and C. In this section we prove that despite such examples, for any algorithm solving prediction problems for robust SSA processes, there are prediction problems about such processes that cannot be solved faster than linear in t and C, assuming a reasonable conjecture in computational complexity theory. We prove this result for any algorithm that is efficient in m. We finally argue, with certain caveats regarding implementing BTL on a Turing machine, that as an algorithm for solving prediction problems for robust SSA processes, BTL is asymptotically optimal among algorithms efficient in m because its computation time scales linearly with t and C.
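As an illustration of such an analytic shortcut, consider S₁ → S₂ with rate constant k, starting from n₀ molecules of S₁. Each molecule decays independently after an exponentially distributed time, so the count of S₁ at time t is binomially distributed with survival probability e^(−kt). The sketch below evaluates this closed form; the function and parameter names are ours, not the paper's.

```python
import math

def decay_state_probability(n0, remaining, k, t):
    """P(exactly `remaining` molecules of S1 are left at time t),
    for S1 -> S2 with rate constant k and n0 initial molecules of S1."""
    if not 0 <= remaining <= n0:
        return 0.0
    p_survive = math.exp(-k * t)  # one molecule is still S1 at time t
    return (math.comb(n0, remaining)
            * p_survive ** remaining
            * (1.0 - p_survive) ** (n0 - remaining))

# The probabilities over all reachable states should add up to 1.
probs = [decay_state_probability(50, r, k=0.3, t=2.0) for r in range(51)]
print(sum(probs))
```

No trajectory is simulated here: the target time enters only through the single factor e^(−kt).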

In order to prove formal lower bounds on the computational complexity of the prediction problem, we must be specific about our computation model. We use the standard model of computation which captures stochastic behavior: randomized Turing machines (TM). A randomized TM is a non-deterministic TM⁹ allowing multiple possible transitions at a point in a computation. The actual transition taken is uniform over the choices (for equivalent formalizations, see Sipser, 1997). We say a given TM on a given input runs in computational time t_tm if there is no set of random choices that makes the machine run longer.

We want to show that for some SCRNs, there is no method of solving the prediction problem fast, no matter how clever we are. We also want these stochastic processes to be robust despite having difficult prediction problems. We use the following two ideas. First, a method based on Angluin et al. (2006) shows that predicting the output of given randomized TMs can be done by solving a prediction problem for certain robust SSA processes. Second, an open conjecture, but one that is strongly compatible with the basic beliefs of computational complexity theory, bounds how quickly the output of randomized TMs can be determined.

Computational complexity theory concerns measuring how the computational resources required to solve a given problem scale with input size n (in bits). The two most prevalent efficiency measures are time and space—the number of TM steps and the length of the TM tape required to perform the computation. We say a Boolean function f (x) is probabilistically computable by a TM M in time t (n) (where n = |x|) and space s (n) if M (x) runs in time t (n) using space at most s (n), and with probability at least 2/3 outputs f (x).¹⁰ A basic tenet of computational complexity is that allowing asymptotically more computation time t (n) always expands the set of problems that can be solved. Thus it is widely believed that for any (reasonable) t (n), there are “t(n)-hard” functions that can be probabilistically computed in t (n) time, but not in asymptotically smaller time.¹¹ For our argument we will need such a t (n)-hard function, but one that does not require too much space. Formally we assume the following hierarchy conjecture:

Conjecture 5.1 ([Probabilistic, Space-Limited] Time Hierarchy)

For any α < 1, and polynomials t (n) and s (n) such that t (n)^α and s(n) are at least linear, there are Boolean functions that can be probabilistically computed within time and space bounds t (n) and s (n), but not in time O(1)t(n)^α (with unrestricted space usage).

Intuitively, we take a Boolean function that requires t (n) time and embed it in a chemical system in such a way that solving the prediction problem is equivalent to probabilistically computing the function. The conjecture implies that we cannot solve the prediction problem fast enough to allow us to solve the computational problem faster than t (n). Further, since the resulting SSA process is robust, the result lower-bounds the computational complexity of the prediction problem for robust processes. Note that we need a time hierarchy conjecture that restricts the space usage and talks about probabilistic computation because it is impossible to embed a TM computation in an SCRN such that its computation is error free (Soloveichik et al., 2008), and further such embedding seems to require more time as the space usage increases.

The following theorem lower-bounds the computational complexity of the prediction problem. The bound holds even if we restrict ourselves to robust processes. It shows that this computational complexity is at least linear in t and C, as long as the dependence on m is at most polylogarithmic. It leaves the possibility that there are algorithms for solving the prediction problem that require computation time more than polylogarithmic in m but less than linear in t or C. Let the prediction problem be specified by giving the SSA process (via the initial state and volume), the target time t, and the target outcome Γ in some standard encoding such that whether a state belongs to Γ can be computed in time polylogarithmic in m.

Theorem 5.1.

Fix any perturbation bound ρ > 0 and δ > 0. Assuming the hierarchy conjecture (Conjecture 5.1), there is an SCRN S such that for any prediction algorithm and constants c₁, c₂, β, η, γ > 0, there is an SSA process of S and an (m, t, C, 1/3)-prediction problem for that process such that the algorithm cannot solve the problem in computational time c₁ (log m)^β t^η (C + c₂)^γ if η < 1 or γ < 1. Further, the SSA process is (ρ, δ)-robust with respect to this prediction problem.

Proof

The proof is presented in Appendix A.5. ▪

With the above theorem demarcating a boundary of what is possible, the natural question is how close to optimal does BTL come? In the previous section, we have derived an upper bound on the number of leaps that our algorithm takes. However, we need to address how the idealized bounded-tau leaping algorithm presented in Section 4.1 can be implemented on a randomized TM which allows only finite precision arithmetic and a restricted model of randomness generation. We have to deal with round-off error and approximate gamma and binomial random number generators, whose effect on the probability of outcome is difficult to track formally. Further, the computational complexity of these operations is a function of the bits of precision and is complicated to rigorously bound.

In Appendix A.6, we argue that BTL on a randomized TM runs in total computation time

(1)

where, in each leap, polylogarithmic time in m is required for arithmetic manipulation of molecular counts, and l captures the extra computation time required for the real number operations and drawing from the gamma and binomial distributions. Here l is potentially a function of m, V, t, and the bits of precision used. Assuming efficient methods for drawing the random variables, l is likely very small compared to the total number of leaps. So insofar as l in Equation (1) can be neglected, and further assuming we can ignore errors introduced due to finite precision arithmetic and approximate random number generation, bounded tau-leaping is asymptotically optimal up to multiplicative constants and powers of log m among algorithms efficient in m.

6. Discussion

The behavior of many stochastic chemical reaction networks does not depend crucially on getting the reaction propensities exactly right, prompting our definition of ρ-perturbations and (ρ, δ)-robustness. A ρ-perturbation of an SSA process is a stochastic process with stochastic deviations of the reaction propensities from their correct SSA values. These deviations are multiplicative and bounded between 1 − ρ and 1 + ρ. If we are concerned with how likely it is that the SSA process is in a given state at a given time, then (ρ, δ)-robustness captures how far these probabilities may deviate for a ρ-perturbation.

We formally showed that predicting the behavior of robust processes does not require simulation of every reaction event. Specifically, we described a new approximate simulation algorithm called bounded tau-leaping (BTL) that simulates a certain ρ-perturbation as opposed to the exact SSA process. The accuracy of the algorithm in making predictions about the state of the system at given times is guaranteed for (ρ, δ)-robust processes. We proved an upper bound on the number of leaps of BTL that helps explain the savings over SSA. The bound is a function of the desired length of simulated time t, volume V, and maximum molecular count encountered m. This bound scales linearly with t and C = m/V, but only logarithmically with m, while the total number of reactions (and therefore SSA steps) may scale linearly with t, C, and m. When total concentration is limited, but the total molecular count is large, this represents a profound improvement over SSA. Because the number of BTL leaps scales only logarithmically with m, BTL asymptotically nears the speed of mass action ODE solvers—which have no dependence on m. We also argue that asymptotically as a function of t and C our algorithm is optimal in as far as no algorithm can achieve sublinear dependence of the number of leaps on t or C. This result is proven based on a reasonable assumption in computational complexity theory. Unlike Gillespie's tau-leaping (Cao et al., 2006), our algorithm seems better suited to theoretical analysis. Thus while we believe other versions of tau-leaping have similar worst-case running times, the results analogous to those we obtain for BTL remain to be proved.

Our results can also be seen to address the following question. If concerned solely with a particular outcome rather than with the entire process trajectory, can one always find certain shortcuts to determine the probability of the outcome without performing a full simulation? Since our lower bound on computation time scales linearly with t, it could be interpreted to mean that, except in problem-specific cases, there is no shorter route to predicting the outcomes of stochastic chemical processes than via simulation. This negative result holds even restricting to the class of robust SSA processes.

While the notion of robustness is a useful theoretical construct, how practical is our definition in deciding whether a given system is suitable to approximate simulation via BTL or not? We prove that for the class of monotonic SSA processes, robustness is guaranteed at all times when in the SSA process the outcome probability is stable over an interval of time determined by ρ. However, it is not clear how this stability can be determined without SSA simulation. Even worse, few systems of interest are monotonic. Consequently, it is compelling to develop techniques to establish robustness for more general classes of systems. A related question is whether it is possible to connect our notion of robustness to previously studied notions in mass action stability analysis (Horn and Jackson, 1972; Sontag, 2007).

7. Appendix

A.1. Enforcing the ρ-perturbation constraint by the {ɛ_ij}-perturbation constraint

Recall that in Section 4.1 we introduced two constraints bounding the number of reaction events within a leap. If x is the state at the beginning of the leap, the ρ-perturbation constraint is satisfied if for every reaction R_j, (1 − ρ) a_j(x′) ≤ a_j(x) ≤ (1 + ρ) a_j(x′) for any state x′ within the leap. The {ɛ_ij}-perturbation constraint is satisfied if no reaction R_j changes the molecular count of species S_i by more than plus or minus ɛ_ij x_i within the leap. For a given ρ, we would like to find an appropriate {ɛ_ij}-perturbation constraint to use in the BTL algorithm such that we satisfy the ρ-perturbation constraint, thereby ensuring that we are exactly simulating some ρ-perturbation. Avoiding an {ɛ_ij}-perturbation constraint that is tighter than necessary requires knowledge of the exact reactions in the given SCRN. Nevertheless, worst-case analysis below shows that setting Inline graphic works for any SCRN of M reactions.

If ɛ_ij = ɛ then, for any SCRN with M reactions, the maximum change of any species S_i within a leap allowed by the {ɛ_ij}-perturbation constraint is plus or minus Mɛx_i. We want to find an ɛ > 0 such that if the changes to all species stay within the Mɛ bounds, then no reaction violates the ρ-perturbation constraint. Consider a bimolecular reaction Inline graphic first. The algorithm simulates its propensity as throughout the leap. If x_i = 0 or 1, then , and as long as Mɛ < 1, then still , satisfying the ρ-perturbation constraint for R_j. Otherwise, if x_i ≥ 2, then the SSA propensity at state within the leap is , and so the left half of the ρ-perturbation constraint Inline graphic is satisfied if . Similarly, the right half of the ρ-perturbation constraint is satisfied if (1 + ρ)(1 − Mɛ)x_i((1 − Mɛ)x_i − 1) ≥ x_i(x_i − 1). These inequalities are satisfied for x_i ≥ 2 when (which also ensures that Mɛ < 1).

Likewise, for a unimolecular reaction Inline graphic , the ρ-perturbation constraint is satisfied if (1 − ρ)(1 + Mɛ)x_i ≤ x_i and (1 + ρ)(1 − Mɛ)x_i ≥ x_i, and for a bimolecular reaction the constraint is satisfied if and . It is easy to see that setting ɛ as above also satisfies the inequalities for these reaction types.
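The inequalities above can be checked numerically for a candidate ɛ. The sketch below tests the two unimolecular conditions and the conditions for a reaction with two identical reactants S_i over a range of x_i ≥ 2. The left-half condition for the identical-reactant case is written as the mirror image of the right-half condition stated in the text, which is our assumption; the helper name and the scan limit `x_max` are also hypothetical.

```python
def eps_satisfies_rho(rho, M, eps, x_max=1000):
    """Return True if a worst-case deviation of M*eps keeps the listed
    inequalities inside the rho-perturbation constraint for 2 <= x <= x_max."""
    u = M * eps
    if not 0 < u < 1:
        return False
    # Unimolecular: (1 - rho)(1 + u) x <= x and (1 + rho)(1 - u) x >= x.
    if (1 - rho) * (1 + u) > 1 or (1 + rho) * (1 - u) < 1:
        return False
    for x in range(2, x_max + 1):
        base = x * (x - 1)
        high = (1 + u) * x * ((1 + u) * x - 1)  # largest factor (assumed mirror form)
        low = (1 - u) * x * ((1 - u) * x - 1)   # smallest factor (as in the text)
        if (1 - rho) * high > base or (1 + rho) * low < base:
            return False
    return True

print(eps_satisfies_rho(rho=0.1, M=3, eps=0.001))
print(eps_satisfies_rho(rho=0.1, M=3, eps=0.1))
```

A scan like this is only a spot check over a finite range of counts; it does not replace the worst-case argument in the text.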

Throughout the paper we assume that ρ, ɛ or {ɛ_ij} are fixed and most of our asymptotic results do not show dependence on these parameters. Nonetheless, we can show that for a fixed SCRN and for small enough ρ, ɛ can be within the range O(1) ρ ≤ ɛ ≤ O(1)ρ and thus scales linearly with ρ. Therefore, in asymptotic results, the dependence on ɛ and ρ can be interchanged. Specifically, the ɛ dependence explored in Appendix A.2 can be equally well expressed as a dependence on ρ.

A.2. Proof of Theorem 4.1: upper bound on the number of leaps

In this section we prove Theorem 4.1 from the text, which upper-bounds the number of leaps BTL takes as a function of m, t, and C:

Theorem

For any SCRN S with M species, any ɛ such that 0 < ɛ < 1/(12M), and any δ > 0, there are constants c₁, c₂, c₃ >0 such that for any bounds on time t and total molecular count m, for any volume V and any starting state, after c₁ log m + c₂ t (C + c₃) leaps where C = m/V, either the bound on time or the bound on total molecular count will be exceeded with probability at least 1 − δ.

We prove a more detailed bound than stated in the theorem above which explicitly shows the dependence on ɛ hidden in the constants. Also, since we introduce the asymptotic results only at the end of the argument, the interested reader may easily investigate the dependence of the constants on other parameters of the SCRN such as N, M, v_ij, and k_j. We also show an approach to probability 1 that occurs exponentially fast as the bound increases: if the bound above evaluates to n, then the probability that the algorithm does not exceed m or t in n leaps is at most 2e^(−O(1)n).

Our argument starts with a couple of lemmas. Looking within a single leap, the first lemma bounds the decrease in the molecular count of a species due to a given reaction as a function of time. The argument is essentially that for a reaction to decrease the molecular count of a species, that species must be a reactant, and therefore the propensity of the reaction is proportional to its molecular count. Thus we see a similarity to an exponential decay process and use this to bound the decrease. Note that a similar result does not hold for the increase in the molecular count of a species, since the molecular count of the increasing species need not be in the propensity function.¹² Then the second lemma uses the upper bound on how fast a species can decrease (the first lemma), together with the fact that in a leap some reaction must change some species by a relatively large amount, to classify leaps into those that either (1) take a long time or (2) increase some species significantly without decreasing any other species by much. Finally we show that this implies that if there are too many leaps we either violate the time bound or the total molecular count bound.

For the following, values f and g will be free parameters to be determined later. It helps to think of them as 0 < f ≪ g ≪ 1. How long does it take for a reaction to decrease x_i by a fraction g of the violation bound ɛx_i? The number of occurrences of R_j needed to decrease x_i by gɛx_i or more is at least gɛx_i/|v_ij|. The following lemma bounds the time required for this many occurrences to happen.

Lemma A.1.

Take any f and g (0 < f, g < 1), any reaction R_j and species S_i such that v_ij < 0, any state x, and any ɛ. Assuming that the propensity of R_j is fixed at a_j(x), with probability at least 1 − f/g, fewer than gɛx_i/|v_ij| occurrences of R_j happen in time fɛ/(|v_ij|k_j) if R_j is unimolecular, or time fɛ/(|v_ij|k_j C) if R_j is bimolecular.

Proof

For reaction R_j to decrease the amount of S_i, it must be that S_i is a reactant, and thus x_i is a factor in the propensity function. Suppose R_j is unimolecular. Then a_j = k_j x_i and the expected number of occurrences of R_j in time fɛ/(|v_ij|k_j) is fɛx_i/|v_ij|. The desired result then follows from Markov's inequality. If R_j is bimolecular, whether the other reactant is a different species S_i′ ≠ S_i or a second copy of S_i, its propensity satisfies a_j ≤ k_j x_i C. So the expected number of occurrences of R_j in time fɛ/(|v_ij|k_j C) is at most fɛx_i/|v_ij|. The desired result follows as before. ▪
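For the unimolecular case the bound can be checked directly. With the propensity held fixed at k_j x_i, the number of firings in a window of length fɛ/(|v_ij|k_j) is Poisson distributed with mean fɛx_i/|v_ij|, and Markov's inequality caps the chance of reaching gɛx_i/|v_ij| firings at f/g. The sketch below computes the exact Poisson tail for comparison; the parameter values are arbitrary illustrations and the function names are ours.

```python
import math

def poisson_tail(mean, threshold):
    """P(N >= threshold) for N ~ Poisson(mean); threshold may be non-integer."""
    n_min = max(0, math.ceil(threshold))
    below = sum(math.exp(-mean) * mean ** k / math.factorial(k)
                for k in range(n_min))
    return 1.0 - below

def lemma_a1_check(x_i, eps, v_abs, k_j, f, g):
    """Return (exact tail probability, Markov bound f/g) for a unimolecular R_j."""
    window = f * eps / (v_abs * k_j)    # time window allowed by the lemma
    mean = k_j * x_i * window           # expected firings at fixed propensity
    threshold = g * eps * x_i / v_abs   # firings needed to remove g*eps*x_i
    return poisson_tail(mean, threshold), f / g

tail, markov_bound = lemma_a1_check(x_i=10, eps=0.5, v_abs=1, k_j=2.0, f=0.2, g=0.5)
print(tail, markov_bound)
```

The exact tail is typically much smaller than f/g, which is consistent with the later remark that the bound is optimized for simplicity of proof rather than tightness.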

Let time Inline graphic be the minimum over all reactions R_j and S_i such that v_ij < 0 of 1/(|v_ij|k_j) if R_j is unimolecular, or 1/(|v_ij|k_j C) if R_j is bimolecular. We can think of setting the units of time for our argument. The above lemma implies that with probability at least 1 − f / g no reaction decreases x_i by gɛx_i or more within time Inline graphic . The following lemma defines typical leaps; they are of two types: long or S_i-increasing. Recall M is the number of reaction channels and N is the number of species.

Lemma A.2 (Typical leaps)

For any f and g (0 < f, g < 1), and for any ɛ, with probability at least 1 − NMf / g one of the following is true of a leap:

1. (long leap)
2. (S_i-increasing leap) , and the leap increases some species S_i at least as , while no species S_i_′ decreases as much as .

Proof

By the union bound over the M reaction channels and the N species, Lemma A.1 implies that the probability that some reaction decreases the amount of some species S_i by gɛx_i or more in time Inline graphic is at most NMf/g. Now suppose this unlucky event does not happen. Then if the leap time is , no decrease is enough to cause a violation of the deviation bounds, and thus it must be that some reaction R_j increases some species S_i by more than ɛx_i. (Since R_j must occur an integer number of times, it actually must increase S_i by ⌈ɛx_i⌉ or more.) Since no reaction decreases S_i by gɛx_i or more, we can be sure that S_i increases at least by ⌈ɛx_i⌉ − gMɛx_i. ▪

Lemma A.3.

For any species S_i, a leap decreases S_i at most as x_i ↦ x_i − M⌊ɛx_i⌋ − 2.

Proof

At most M reactions may be decreasing S_i. A reaction can decrease S_i by as much as ⌊ɛx_i⌋ without causing a violation of the deviation bounds. The last reaction firing that causes the violation of the deviation bounds ending the leap uses up at most 2 molecules of S_i (since reactions are at most bimolecular). ▪

Note that a similar lemma does not hold for Gillespie's tau-leaping algorithms (Cao et al., 2006; Gillespie, 2001, 2003) because the number of reaction firings in a leap can be only bounded probabilistically. With some small probability a leap can result in “catastrophic” changes to some molecular counts. Since with enough time such events are certain to occur, the asymptotic analysis must consider them. Consequently, asymptotic results analogous to those we derive in this section remain to be proved for tau-leaping algorithms other than BTL.

Our goal now is to use the above two lemmas to argue that if we have a lot of leaps, we would either violate the molecular count bound (due to many S_i-increasing leaps for the same S_i), or violate the time bound (due to long leaps). Let n be the total number of leaps. By Hoeffding's inequality, with probability at least Inline graphic (i.e., exponentially approaching 1 with n), the total number of atypical steps is bounded as:

(2)

Further, in order not to violate the time bound t, the number of long steps can be bounded as:

(3)

How can we bound the number of the other leaps (S_i-increasing, for some species S_i)? Our argument will be that having too many of such leaps results in an excessive increase of a certain species, thus violating the bound on the total molecular count. We start by choosing an S_i for which there is the largest number of S_i-increasing steps. Since there are N species, there must be a species S_i for which

(4)

At this point, it helps to develop an alternative bit of notation labeling the different kinds of leaps with respect to the above-chosen species S_i to indicate how much x_i may change in the leap. Since our goal will be to argue that the molecular count of S_i must be large, we would like to lower-bound the increase in S_i and upper-bound the decrease. An atypical leap or a long leap we label “⇊.” By Lemma A.3, these leaps decrease S_i at most as x_i ↦ x_i − M⌊ɛx_i⌋ − 2. An S_i-increasing leap we label “↑.” Finally, an S_i′-increasing leap for i′ ≠ i we label “↓.” By Lemma A.2, ↑ leaps increase S_i at least as x_i ↦ x_i + ⌈ɛx_i⌉ − gMɛx_i, while ↓ leaps decrease S_i by less than gMɛx_i.

We would like to express these operations purely in a multiplicative way so that they become commutative, allowing for bounding their total effect on x_i independent of the order in which these leaps occurred but solely as a function of the number of each type. Further, the multiplicative representation of the leap effects is important because we want to bound the number of leaps logarithmically in the maximum molecular count. Note that ⇊ leaps cause a problem because of the subtractive constant term, and ↑ leaps cause a problem because if x_i drops to 0 multiplicative increases are futile. Nonetheless, for the sake of argument suppose we knew that throughout the simulation x_i ≥ 3. Then assuming ɛ ≤ 1/(12M), we can bound the largest decrease due to a ⇊ leap multiplicatively as Inline graphic . Further, we lower-bound the increase due to a ↑ leap as . Then the lower bound on the final molecular count of S_i and therefore the total molecular count is

(5)

This implies an upper bound on the number of ↑ leaps, that together with (Equations (2)–(4)) provides an upper bound on the total number of leaps, as we will see below.
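The multiplicative treatment of the ⇊ leaps can be sanity-checked numerically. The sketch below uses the counting from the proof of Lemma A.3 (M reactions each removing at most ⌊ɛx_i⌋ molecules, plus at most 2 molecules for the final violating firing) and compares the worst case against a quarter of the starting count for x_i ≥ 3 and ɛ ≤ 1/(12M). The factor 1/4 is our own arithmetic (at most x_i/12 from the M reactions and at most 2x_i/3 from the last firing), offered as an illustration rather than as the constant used in the paper; M = 4 is an arbitrary choice.

```python
import math

def min_count_after_leap(x_i, eps, M):
    """Smallest possible count of S_i after one leap, following the
    counting in the proof of Lemma A.3."""
    return x_i - M * math.floor(eps * x_i) - 2

M = 4                    # hypothetical number of reaction channels
eps = 1.0 / (12 * M)     # largest eps considered in this part of the argument
worst_ratio = min(min_count_after_leap(x, eps, M) / x for x in range(3, 10001))
print(worst_ratio)
```

Because the ratio stays bounded away from 0 whenever x_i ≥ 3, a ⇊ leap can be replaced by multiplication with a constant, which is what makes the leap effects commute.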

However, x_i might dip below 3 (including at the start of the simulation). We can adjust the effective number of ↑ leaps to compensate for these dips. We say a leap is in a dip if it starts at x_i < 3. Observe that the first leap in a dip starts at x_i < 3 while the leap after a dip starts at x_i ≥ 3. Thus, unless we end in a dip, cutting out the leaps in the dips can only decrease our lower bound on the final x_i. We'll make an even looser bound and modify Equation (5) simply by removing the contribution of the ↑ leaps that are in dips.¹³ How many ↑ leaps can be in dips? First, let us ensure g < 1/(3M). Then, since a ↓ leap decreases x_i by less than gMɛx_i < x_i/3, and the decrease amount must be an integer, a ↓ leap cannot bring x_i below 3 starting at x_i ≥ 3. Thus, if we start at x_i ≥ 3 a ⇊ leap must occur before we dip below 3. Thus the largest number of dips is n^⇊ + 1 (adding 1 since we may start the simulation below 3). Let Inline graphic and be the number of ↑ and ⇊ leaps in the dth dip (we don't care about ↓ leaps in a dip since they must leave x_i unchanged). Then and . Therefore, the adjusted bound (5) becomes: . For simplicity, we use the weaker bound

(6)

In order to argue that this bounds the number of ↑ leaps, we need to make sure the ↓ leaps and the ⇊ leaps don't cancel out the effect of the ↑ leaps. By inequality (4) we know that n^↓ < Nn^↑. If we can choose g to be a small enough constant such that more than N ↓ leaps are required to cancel the effect of a ↑ leap we would be certain the bound increases exponentially with n^↑ without caring about ↓ leaps. Specifically, we choose a g small enough such that Inline graphic . For example, we can let .¹⁴ Note that g depends only on constants N and M and is independent of ɛ. The bound then becomes .

Thus, finally we have the following system of inequalities that are satisfied with probability exponentially approaching 1 as n → ∞:

(7)

(8)

(9)

(10)

Solving for n, we obtain¹⁵

if (1 − 24hf / g) > 0 where h = (N + 1)/log(1 + ɛ/2) (also recall Inline graphic). To ensure this we let f ≤ g/(48h N M). Then with probability exponentially approaching 1 as n increases,

Asymptotically as ɛ → 0, m → ∞, t → ∞ with the system of chemical equations being fixed, we have g = O(1), h ≤ O(1)/ɛ, and write the above as Inline graphic . Recall our unit of time was defined to be the minimum over all reactions R_j and species S_i such that v_ij <0 of 1/(|v_ij|k_j) if R_j is unimolecular, or 1/(|v_ij|k_j C) if R_j is bimolecular. No matter what C is, we can say Thus we can write the above as

For any δ, we can find appropriate constants such that the above bound is satisfied with probability at least 1 − δ.

This bound on the number of leaps has been optimized for simplicity of proof rather than tightness. A more sophisticated analysis can likely significantly decrease the numerical constants. Further, we believe the cubic dependence on 1/ɛ in the time term is excessive.¹⁶

A.3. Proving robustness by monotonicity

In this section, we develop a technique that can be used to prove the robustness of certain SSA processes. We use these results to prove the robustness of the example in Section 3 as well as of the construction of Angluin et al. (2006), simulating a Turing machine in Appendix A.4.

Since ρ-perturbations are not Markovian, it is difficult to think about them. Can we use a property of the original SSA process that would allow us to prove robustness without referring to ρ-perturbations at all?

Some systems have the property that every reaction can only bring the system “closer” to the outcome of interest (or at least “no further”). Formally, we say an SSA process is monotonic for outcome Γ if for all reachable states x and y such that there is a reaction taking x to y, and for all t, the probability of reaching Γ within time t starting at y is at least the probability of reaching Γ within time t starting at x. Note that by this definition Γ must be absorbing. Intuitively, perturbations of propensities in monotonic systems only change how fast the system approaches the outcome. Indeed, we can bound the deviations in the outcome probability of any ρ-perturbation at any time by two specific ρ-perturbations, which are the maximally slowed down and sped up versions of the original process. This implies that monotonic SSA processes are robust at all times t when the outcome probability does not change quickly with t, and thus slowing down or speeding up the SSA process does not significantly affect the probability of the outcome.

For an SSA process or ρ-perturbation Inline graphic and set of states Γ, define F^Γ(, t) to be the probability of being in Γ at time t. For SSA process , let be the ρ-perturbation defined by the constant deviations ξ_j(t) = 1 − ρ. Similarly, let be the ρ-perturbation defined by the constant deviations ξ_j(t) = 1 + ρ.

Lemma A.4.

If an SSA process Inline graphic is monotonic for outcome Γ, then for any ρ-perturbation of ,.

Proof

If an SSA process is monotonic, allowing extra “spontaneous” transitions (as long as they are legal according to the SSA process) cannot induce a delay in entering Γ. We can decompose a perturbation with ξ_j (t) ≥ 1 as the SSA process combined with some extra probability of reaction occurrence in the next interval dt. Thus, for a perturbation Inline graphic of a monotonic SSA process in which ξ_j (t) ≥ 1, we have . By a similar argument, if has , then . Now and are themselves monotonic SSA processes ( scaled in time). Then by the above bounds, for any ρ-perturbation of we have . ▪

Since Inline graphic and are simply the original SSA process scaled in time by a factor of 1/(1 + ρ) and 1/(1 − ρ), respectively, we can write the bound of the above lemma as . Rephrasing Lemma A.4:

Corollary A.1.

If an SSA process Inline graphic is monotonic for outcome Γ then it is (ρ, δ)-robust with respect to Γ at time t where .
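As a worked illustration, take n₀ molecules of S₁ decaying via S₁ → S₂ with rate constant k, and let Γ be the set of states with no S₁ left. Each firing can only bring the system closer to Γ, and the probability of being in Γ at time t is (1 − e^(−kt))^n₀. The sketch below evaluates the spread between the sped-up and slowed-down processes, reading the time-scaling remark above as δ = F^Γ(·, (1 + ρ)t) − F^Γ(·, (1 − ρ)t) evaluated on the original process. That reading, the example system, and all names below are our assumptions.

```python
import math

def prob_all_decayed(n0, k, t):
    """F(t): probability that all n0 molecules of S1 have decayed by time t."""
    return (1.0 - math.exp(-k * t)) ** n0

def robustness_delta(n0, k, t, rho):
    """Spread in outcome probability between the process sped up by (1 + rho)
    and the process slowed down by (1 - rho), both viewed at time t."""
    fast = prob_all_decayed(n0, k, (1 + rho) * t)
    slow = prob_all_decayed(n0, k, (1 - rho) * t)
    return fast - slow

for t in (1.0, 5.0, 20.0):
    print(t, robustness_delta(n0=100, k=1.0, t=t, rho=0.05))
```

The spread is small at times when the outcome probability is nearly flat in t (very early or very late) and larger while the probability is still rising.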

For many SSA processes, it may not be obvious whether they are monotonic. We would like a simple “syntactic” property of the SCRN that guarantees monotonicity and can be easily checked. The following lemma makes it easy to prove monotonicity in some simple cases.

Lemma A.5.

Let Inline graphic be an SSA process and Γ an outcome of SCRN . If every species is a reactant in at most one reaction in , and there is a set {n _j} such that outcome Γ occurs as soon as every reaction R_j has fired at least n _j times, then is monotonic with respect to Γ.

Proof

The restriction on Γ allows us to phrase everything in terms of counting reaction occurrences. For every reaction R_j, define F_j(n, t) to be the probability that R_j has fired at least n times within time t. Now suppose we induce some reaction to fire by fiat. The only way this can decrease some F_j(n, t) is if it decreases the count of a reactant of R_j or makes it more likely that some other reaction R_j′ (j′ ≠ j) will decrease the count of a reactant of R_j. Either possibility is avoided if the SCRN has the property that any species is a reactant in at most one reaction. Since Γ is entered as soon as every reaction R_j has fired at least n_j times, the probability of reaching Γ within time t cannot decrease either, and monotonicity follows. ▪
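The “syntactic” condition of Lemma A.5 is easy to check mechanically. The sketch below is a minimal illustration that assumes a hypothetical representation of an SCRN as a list of (reactants, products) pairs of species names; it checks only the first hypothesis of the lemma (every species is a reactant in at most one reaction), not the condition on the outcome Γ.

```python
from collections import Counter

def each_species_in_at_most_one_reaction(reactions):
    """reactions: list of (reactants, products) pairs, each a list of species names."""
    uses = Counter()
    for reactants, _products in reactions:
        for species in set(reactants):   # count a reaction once even for A + A -> ...
            uses[species] += 1
    return all(count <= 1 for count in uses.values())

# Hypothetical example networks.
chain = [(["A"], ["B"]), (["B", "B"], ["C"])]          # A -> B, 2B -> C
competing = [(["A"], ["B"]), (["A", "B"], ["C"])]      # A -> B, A + B -> C

print(each_species_in_at_most_one_reaction(chain))      # True
print(each_species_in_at_most_one_reaction(competing))  # False
```

In the second network, species A is consumed by two different reactions, so forcing one of them to fire can starve the other; this is exactly the situation the lemma's hypothesis excludes.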

A.4. Robust embedding of a TM in an SCRN

Since we are trying to bound how the complexity of the prediction problem scales with increasing bounds on t and C but not with different SCRNs, we need a method of embedding a TM in an SCRN in which the SCRN is independent of the input length. Among such methods available (Angluin et al., 2006; Soloveichik et al., 2008), asymptotically the most efficient and therefore best for our purposes is the construction of Angluin et al. (2006). This result is stated in the language of distributed multi-agent systems rather than molecular systems; the system is a well-mixed set of “agents” that randomly collide and exchange information. Each agent has a finite state. Agents correspond to molecules (the system preserves a constant molecular count m); states of agents correspond to the species, and interactions between agents correspond to reactions in which both molecules are potentially transformed.

Now for the details of the SCRN implementation of the protocol of Angluin et al. (2006). Suppose we construct an SCRN corresponding to the Angluin et al. system as follows: Agent states correspond to species (i.e., for every agent state i there is a unique species S_i). For every pair of species S_i₁, S_i₂ (i₁ ≤ i₂), we add reaction S_i₁ + S_i₂ → S_i₃ + S_i₄ if the population protocol transition function specifies $(i_1, i_2) \mapsto (i_3, i_4)$. Note that we allow null reactions of the form S_i₁ + S_i₂ → S_i₁ + S_i₂, including for i₁ = i₂. For every reaction R_j, we use rate constant k_j = 1. The sum of all reaction propensities is $\lambda = m(m-1)/(2V)$ since every molecule can react with any other molecule.¹⁷ The time until the next reaction is an exponential random variable with rate λ. Note that the transition probabilities between SCRN states are the same as the transition probabilities between the corresponding configurations in the population protocol since in the SCRN every two molecules are equally likely to react next. Thus, the SSA process is just a continuous time version of the population protocol process (where unit “time” expires between transitions). Therefore, the SCRN can simulate a TM in the same way as the population protocol.
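As a sanity check on the total propensity, the following Python sketch sums the propensities over all unordered pairs of species and compares the result with m(m − 1)/(2V). It assumes the usual propensity convention with k_j = 1, namely x_i x_i′/V for two distinct species and x_i(x_i − 1)/(2V) for two molecules of the same species; the molecular counts are hypothetical.

```python
from itertools import combinations

def total_propensity(counts, volume):
    """Sum of propensities over all unordered species pairs, with rate constants k_j = 1.

    Assumed convention: x_i * x_j / V for distinct species and
    x_i * (x_i - 1) / (2 V) for two molecules of the same species.
    """
    same = sum(x * (x - 1) / (2 * volume) for x in counts)
    different = sum(x * y / volume for x, y in combinations(counts, 2))
    return same + different

counts = [5, 0, 12, 3]          # hypothetical molecular counts, one entry per species
m, V = sum(counts), 2.0
print(total_propensity(counts, V), m * (m - 1) / (2 * V))
```

Under this convention the two printed values agree for any split of the m molecules among species, which is why the rate of interactions depends only on m and V and not on the current state.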

But first we need to see how time measured in the number of interactions corresponds to real-valued time in the language of SCRNs.

Lemma A.6.

If the time between population protocol interactions is an exponential random variable with rate λ, then for any positive constants c, c₁, c₂ such that c₁ < 1 < c₂, there is N₀ such that for all N > N₀, N interactions occur between time c₁N/λ and c₂N/λ with probability at least 1 − N^−c.

Proof

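The Chernoff bound for the left tail of a gamma random variable T with shape parameter N and rate λ is $\Pr[T \le t] \le \left(\frac{\lambda t}{N}\right)^{N} e^{N - \lambda t}$ for t < N/λ. Thus $\Pr[T \le c_1 N/\lambda] \le \left(c_1 e^{1 - c_1}\right)^{N}$. Since $c_1 e^{1 - c_1} < 1$ when $c_1 \ne 1$, $\Pr[T \le c_1 N/\lambda] < N^{-c}$ for large enough N. An identical argument applies to the right tail Chernoff bound for t > N/λ. ▪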
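The threshold N₀ in Lemma A.6 can be explored numerically. The sketch below assumes the standard Chernoff tail bound for a gamma random variable with shape N, which at t = cN/λ evaluates to (c e^{1−c})^N, and scans for the last N at which this bound is still not below N^{−c}. The function names and the scan limit are hypothetical choices for illustration.

```python
import math

def tail_bound(c, N):
    """Chernoff bound (c * e^(1 - c))^N; c < 1 gives the left tail, c > 1 the right tail."""
    return math.exp(N * (math.log(c) + 1.0 - c))

def threshold_N0(c_tail, c_exp, N_max=100_000):
    """Largest N <= N_max at which the bound is NOT below N^(-c_exp); 0 if it never fails."""
    a = math.log(c_tail) + 1.0 - c_tail      # negative whenever c_tail != 1
    last_fail = 0
    for N in range(1, N_max + 1):
        if N * a >= -c_exp * math.log(N):    # compare exponents to avoid underflow
            last_fail = N
    return last_fail

N0 = threshold_N0(c_tail=0.5, c_exp=2.0)
print(N0, tail_bound(0.5, N0 + 1), (N0 + 1) ** -2.0)
```

Because the exponent N(ln c + 1 − c) + c_exp ln N is concave in N, the values of N at which the bound fails form a single interval, so once the scan has passed it the bound holds for all larger N.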

The following lemma reiterates that an arbitrary computational problem can be embedded in a chemical system, and also shows that the chemical computation is robust with respect to the outcome of the computation. For a given TM and agent count m, we consider the two SCRN states corresponding to the TM halting with a 0 and a 1 output, respectively.

Lemma A.7.

Fix a perturbation bound ρ > 0, δ > 0, and a randomized TM M with a Boolean output. There is an SCRN implementing Angluin et al.'s population protocol, such that if M(x) halts in no more than t_tm steps using no more than s_tm space, then starting with the encoding of x and using $m = O(1)\,2^{s_{tm}}$ molecules, at any time $t \ge t_{ssa} = O(1)\,V t_{tm} \log^4(m)/m$ the SSA process is in the state corresponding to M halting with output b with probability that is within δ of the probability that M(x) = b. Further, this SSA process is (ρ, δ)-robust with respect to the two halting states at all times t ≥ t_ssa.

The first part of the lemma states that we can embed an arbitrary TM computation in an SCRN, such that the TM computation is performed fast and correctly with arbitrarily high probability. The second part states that this method can be made arbitrarily robust to perturbations of reaction propensities. The first part follows directly from the results of Angluin et al. (2006), while the second part requires some additional arguments on our part.

If we only wanted to prove the first part, fix any randomized TM M with a Boolean output and any constant δ > 0. There is a population protocol of Angluin et al. that can simulate the TM's computation on arbitrary inputs as follows: If on some input x, M uses t_tm computational time and s_tm space, then the protocol uses $m = O(1)\,2^{s_{tm}}$ agents, and the probability that the simulation is incorrect or takes longer than N = O(1) t_tm m log⁴ m interactions is at most δ/2. This is proved by using Theorem 11 of Angluin et al. (2006), combined with the standard way of simulating a TM by a register machine using multiplication by a constant and division by a constant with remainder. The total probability of the computation being incorrect or lasting more than N interactions obtained is at most O(1) t_tm m^−c. Since for any algorithm terminating in t_tm steps, the running time t_tm is at most exponential in the space s_tm, we can make sure this probability is at most δ/2 by using a large enough constant in $m = O(1)\,2^{s_{tm}}$. By Lemma A.6, the probability that O(1)N interactions take longer than O(1)N/λ time to occur is at most δ/2. Thus the total probability of incorrectly simulating M on x or taking longer than time $t_{ssa} = O(1)N/\lambda = O(1)\,V t_{tm} \log^4(m)/m$ is at most δ. The Boolean output of M is indicated by which of the two halting states (output 0 or output 1) we end up in. (If the computation was incorrect or took too long we can be in neither.) This proves the first part of the lemma.

We now sketch how the robustness of the system of Angluin et al. (2006) can be established, completing the proof of Lemma A.7. The whole proof requires retracing the argument in the original paper; here, we just outline how this retracing can be done. First, we convert the key lemmas of their paper to use real-valued SCRN time rather than the number of interactions. The consequences of the lemmas (e.g., that something happens before something else) are preserved, and thus the lemmas can still be used as in the original paper to prove the corresponding result for SCRNs. The monotonicity of the processes analyzed in the key lemmas can be used to argue that the overall construction is robust.

We need the following consequence of Lemma A.4:

Corollary A.2.

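If an SSA process $\mathcal{C}$ is monotonic for outcome Γ, and with probability p it enters Γ after time t₁ but before time t₂, then for any ρ-perturbation $\tilde{\mathcal{C}}$ of $\mathcal{C}$, the probability of entering Γ after time t₁/(1 + ρ) but before time t₂/(1 − ρ) is at least p.

Proof

Let $p_1 = F^{\Gamma}(\mathcal{C}, t_1)$ and $p_2 = F^{\Gamma}(\mathcal{C}, t_2)$, where $F^{\Gamma}(\cdot, t)$ denotes the probability of being in Γ at time t; since Γ is absorbing, p = p₂ − p₁. Using Lemma A.4 we know that $F^{\Gamma}(\mathcal{C}, (1-\rho)t) \le F^{\Gamma}(\tilde{\mathcal{C}}, t) \le F^{\Gamma}(\mathcal{C}, (1+\rho)t)$ for all t. Thus, $F^{\Gamma}(\tilde{\mathcal{C}}, t_1/(1+\rho)) \le F^{\Gamma}(\mathcal{C}, t_1) = p_1$. Similarly we obtain $F^{\Gamma}(\tilde{\mathcal{C}}, t_2/(1-\rho)) \ge F^{\Gamma}(\mathcal{C}, t_2) = p_2$. Thus $F^{\Gamma}(\tilde{\mathcal{C}}, t_2/(1-\rho)) - F^{\Gamma}(\tilde{\mathcal{C}}, t_1/(1+\rho)) \ge p_2 - p_1 = p$. ▪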

As an example let us illustrate the conversion of Lemma 2 of Angluin et al. (2006). The Lemma bounds the number of interactions to infect k agents in a “one-way epidemic” starting with a single infected agent. In the one-way epidemic, a non-infected agent becomes infected when it interacts with a previously infected agent. With our notation, this lemma states:

Let N(k) be the number of interactions before a one-way epidemic starting with a single infected agent infects k agents. Then for any fixed ɛ > 0 and c > 0, there exist positive constants c₁ and c₂ such that for sufficiently large total agent count m and any k > m^ɛ, c₁m ln k ≤ N(k) ≤ c₂m ln k with probability at least 1 − m^−c.

For any m and k we consider the corresponding SSA process and the outcome Γ in which at least k agents are infected. Since the bounds on N(k) scale at least linearly with m, we can use Lemma A.6 to obtain:

Let t(k) be the time before a one-way epidemic starting with a single infected agent infects k agents. Then for any fixed ɛ > 0 and c > 0, there exist positive constants c₁ and c₂ such that for sufficiently large total agent count m and any k > m^ɛ, c₁m ln(k)/λ ≤ t(k) ≤ c₂m ln(k)/λ with probability at least 1 − m^−c.

Finally, consider the SSA process of the one-way epidemic spreading. The possible reactions either do nothing (reactants are either both infected or both non-infected), or a new agent becomes infected. It is clear that for any m and k, this SSA process is monotonic with respect to the outcome Γ in which at least k agents are infected. This allows us to use Corollary A.2 to obtain:

Fix any 0 < ρ < 1, and let t(k) be the time before a one-way epidemic starting with a single infected agent infects k agents in some corresponding ρ-perturbation. Then for any fixed ɛ > 0 and c > 0, there exist positive constants c₁ and c₂ such that for sufficiently large total agent count m and any k > m^ɛ, c₁m ln(k)/(λ(1 + ρ)) ≤ t(k) ≤ c₂m ln(k)/(λ(1 − ρ)) with probability at least 1 − m^−c.

Since ρ is a constant, what we have effectively done is convert the result in terms of interactions to a result in terms of real-valued time that is robust to ρ-perturbations simply by dividing by λ and using different multiplicative constants.
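The effect of the two extreme ρ-perturbations on the one-way epidemic can be illustrated with a small simulation. The sketch below is not the construction of Angluin et al.; it is a minimal model written for illustration, assuming a total interaction rate of m(m − 1)/(2V) and a constant deviation ξ applied to every propensity (ξ = 1 − ρ or ξ = 1 + ρ). Arbitrary time-varying perturbations are not modeled. With a shared seed the perturbed runs reuse the same random draws, so the infection time scales by exactly 1/ξ. All names and parameter values are hypothetical.

```python
import math
import random

def epidemic_time(m, k, volume, xi=1.0, seed=0):
    """Time until k of m agents are infected, all propensities scaled by the constant xi."""
    rng = random.Random(seed)
    lam = m * (m - 1) / (2 * volume)       # assumed total interaction rate
    pairs = m * (m - 1) / 2                # number of unordered agent pairs
    infected, t = 1, 0.0
    while infected < k:
        t += rng.expovariate(xi * lam)     # waiting time until the next interaction
        # The interacting pair is uniform; it infects someone only if mixed.
        if rng.random() < infected * (m - infected) / pairs:
            infected += 1
    return t

m, k, V, rho = 200, 100, 200.0, 0.1
t_orig = epidemic_time(m, k, V, xi=1.0, seed=1)
t_slow = epidemic_time(m, k, V, xi=1.0 - rho, seed=1)
t_fast = epidemic_time(m, k, V, xi=1.0 + rho, seed=1)
scale = m * math.log(k) / (m * (m - 1) / (2 * V))   # m ln(k) / lambda, for comparison
print(t_fast, t_orig, t_slow, scale)
```

The slowed and sped-up runs bracket the unperturbed one, which is the mechanism behind replacing the constants c₁ and c₂ by c₁/(1 + ρ) and c₂/(1 − ρ).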

The same process can be followed for the key lemmas of Angluin et al. (2006) (Lemmas 3–8). This allows us to prove a robust version of Theorem 11 of Angluin et al. (2006) by retracing the argument of their paper using the converted lemmas and the real-valued notion of time throughout. Since the only way that time is used is to argue that something occurs before something else, the new notion of time, obtained by dividing by λ with different constants, can always be used in place of the number of interactions.

A.5. Proof of Theorem 5.1: lower bound on the computational complexity of the prediction problem

In this section, we prove Theorem 5.1 from the text which lower-bounds the computational complexity of the prediction problem as a function of m, t, and C. The bound holds even for arbitrarily robust SSA processes. The theorem shows that this computational complexity is at least linear in t and C, as long as the dependence on m is at most polylogarithmic. The result is a consequence of the robust embedding of a TM in an SCRN (Lemma A.7).

Let the prediction problem be specified by giving the SSA process (via the initial state and volume), the target time t, and the target outcome Γ in some standard encoding such that whether a state belongs to Γ can be computed in time polylogarithmic in m.

Theorem

Fix any perturbation bound ρ > 0 and δ > 0. Assuming the hierarchy conjecture (Conjecture 5.1), there is an SCRN $\mathcal{S}$ such that for any prediction algorithm $\mathcal{A}$ and constants c₁, c₂, β, η, γ > 0, there is an SSA process $\mathcal{C}$ of $\mathcal{S}$ and an (m, t, C, 1/3)-prediction problem $\mathcal{P}$ of $\mathcal{C}$ such that $\mathcal{A}$ cannot solve $\mathcal{P}$ in computational time c₁ (log m)^β t^η (C + c₂)^γ if η < 1 or γ < 1. Further, $\mathcal{C}$ is (ρ, δ)-robust with respect to $\mathcal{P}$.

Suppose someone claims that for any fixed SCRN, they can produce an algorithm for solving (m, t, C, 1/3)-prediction problems for SSA processes of this SCRN assuming the SSA process is (ρ, δ)-robust with respect to the prediction problem for some fixed ρ and δ, and further they claim the algorithm runs in computation time at most

$c_1 (\log m)^{\beta}\, t^{\eta}\, (C + c_2)^{\gamma}$ (11)

for some η < 1 (β, γ > 0). We argue that assuming the hierarchy conjecture is true, such a value of η is impossible.

To achieve a contradiction of the hierarchy conjecture, consider any function probabilistically computable in Inline graphic time and space for . Construct a randomized TM having error at most 1/24 by running the original randomized TM O(1) times and taking the majority vote. Use Lemma A.7 to encode the TM probabilistically computing this function in a (ρ, δ)-robust SSA process such that the error of the TM simulation is at most 1/24. Then predicting whether the process ends up in state Inline graphic or provides a probabilistic algorithm for computing this function. The resulting error is at most 1/24 + 1/24 + 1/3 = 5/12 < 1/2, where the first term 1/24 is the error of the TM, the second term 1/24 is for the additional error of the TM embedding in the SSA process, and the last term 1/3 is for the allowed error of the prediction problem. By repeating O(1) times and taking the majority vote, this error can be reduced below 1/3, thereby satisfying the definition of probabilistic computation. How long does this method take to evaluate the function? We use V = m so that C is a constant, resulting in Inline graphic since m = O(1)2ⁿ. Setting up the prediction problem by specifying the SSA process (via the initial state and volume), target final state and time t_ssa requires O(1)log m = O(1)n time.¹⁸ Then the prediction problem is solved in computation time . Thus, the total computation time is Inline graphic which, by our choice of ζ, is less than , leading to a contradiction of the hierarchy conjecture.
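The error accounting above, and the number of repetitions needed for the majority vote, can be checked with exact rational arithmetic. The sketch below assumes independent repetitions, each wrong with probability exactly 5/12 (the worst case allowed by the budget), and searches over odd repetition counts; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def majority_error(p, n):
    """Probability that more than half of n independent runs are wrong, each wrong w.p. p."""
    q = 1 - p
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(n // 2 + 1, n + 1))

# Error budget from the text: TM error + embedding error + prediction error.
p = Fraction(1, 24) + Fraction(1, 24) + Fraction(1, 3)

n = 1
while majority_error(p, n) >= Fraction(1, 3):
    n += 2            # odd n avoids ties in the vote
print(p, n, float(majority_error(p, n)))
```

Under these assumptions a small constant number of repetitions suffices, consistent with the O(1) repetition count used in the argument.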

Is γ < 1 possible? Observe that if γ < η then the claimed running time of the algorithm solving the prediction problem (expression (11)) with time t_ssa = O(1) V t_tm(n) log⁴(m)/m can be made arbitrarily small by decreasing V. This leads to a contradiction of the hierarchy conjecture. Therefore γ ≥ η ≥ 1.
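The scaling argument for γ can be illustrated numerically. The sketch below evaluates expression (11) at t = t_ssa and C = m/V while V is decreased, with every O(1) factor set to 1 and base-2 logarithms; these normalizations and the parameter values are assumptions made for illustration.

```python
import math

def claimed_runtime(m, V, t_tm, eta, gamma, beta=1.0, c1=1.0, c2=1.0):
    """Expression (11), c1 (log m)^beta t^eta (C + c2)^gamma, at t = t_ssa and C = m / V.

    t_ssa is taken as V * t_tm * log2(m)^4 / m, i.e., the O(1) factor is set to 1.
    """
    log_m = math.log2(m)
    t_ssa = V * t_tm * log_m**4 / m
    C = m / V
    return c1 * log_m**beta * t_ssa**eta * (C + c2)**gamma

m, t_tm = 2**20, 1000.0
for gamma in (0.5, 1.0):
    base = claimed_runtime(m, V=m, t_tm=t_tm, eta=1.0, gamma=gamma)
    small = claimed_runtime(m, V=m * 1e-6, t_tm=t_tm, eta=1.0, gamma=gamma)
    print(gamma, small / base)
```

With γ < η the ratio shrinks roughly like V^(η−γ) as V decreases, whereas with γ = η it stays bounded away from zero; only the former would allow the claimed running time to be driven arbitrarily low.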

A.6. On implementing BTL on a randomized TM

The idealized BTL algorithm presented in Section 4.1 relies on infinite precision real-value arithmetic, while only finite precision floating-point arithmetic is possible on a TM. Further, the basic randomness generating operation available to a randomized TM is choosing one of a fixed number of alternatives uniformly, which forces gamma and binomial draws to be approximated. This complicates estimates of the computation time required per leap, and also requires us to ensure that we can ignore round-off errors in floating-point operations and tolerate approximate sampling in random number draws.

Can we implement gamma and binomial random number generators on a randomized TM, and how much computational time do they require? It is easy to see that arbitrary precision uniform [0, 1] random variates can be drawn on a randomized TM in time linear in precision. It is likely that approximate gamma and binomial random variables can be drawn using methods available in the numerical algorithms literature, which use uniform variate draws as the essential primitive. Since many existing methods for efficiently drawing (approximate) gamma and binomial random variables involve the rejection method, the computation time for these operations is likely to be an expectation. Specifically, it seems reasonable that drawing gamma and binomial random variables can be approximately implemented on a randomized TM such that the expected time of these operations is polynomial in the length of the floating-point representation of the distribution parameters and the resultant random quantity.¹⁹
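The claim that uniform variates cost time linear in the precision can be made concrete with a short sketch: one fair coin flip is consumed per bit of the result. The code below is an illustration rather than a TM implementation; the function name and the use of Python's `random.getrandbits` as the source of fair bits are assumptions.

```python
import random
from fractions import Fraction

def uniform_from_coin_flips(bits, flip=lambda: random.getrandbits(1)):
    """Uniform variate on [0, 1) with `bits` bits of precision, one fair flip per bit."""
    numerator = 0
    for _ in range(bits):                  # work is linear in the requested precision
        numerator = (numerator << 1) | flip()
    return Fraction(numerator, 2**bits)    # exact value k / 2^bits

u = uniform_from_coin_flips(16)
print(u, float(u))
```

Gamma and binomial generators built on top of such uniforms (for example by rejection) would then inherit a cost that depends on how many uniform bits they consume, which is why their running time is naturally stated as an expectation.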

The computational complexity of manipulating integer molecular counts on a TM is polylogarithmic in m. Let l be an upper bound on the expected computational time required for drawing the random variables and real number arithmetic; l is potentially a function of m, V, t, and the bits of precision used. Using Markov's inequality and Theorem 4.1 we can then obtain a bound on the total computation time that is true with arbitrarily high probability. We also make the TM keep track of the total number of computational steps it has taken²⁰ and cut off computation when it exceeds the expectation by some fixed factor. Then we obtain the following bound on the total computation time: Inline graphic .

We have three sources of error. First, since BTL simulates a ρ-perturbation rather than the original SSA process, the probability of the outcome may be off by δ₁, assuming the SSA process was (ρ, δ₁)-robust. Further, since we are using finite precision arithmetic and only approximate random number generation, the deviation from the correct probability of the outcome may increase by another δ₂. Finally, there is a δ₃ probability that the algorithm cuts off computation before it completes. We have assumed a fixed δ₁ < δ, where δ is the allowed error of the prediction problem. We can make δ₃ an arbitrarily small constant by increasing the total computation time by a constant factor (using Markov's inequality). Further, let us assume that δ₂ is small enough to ensure that the total error δ₁ + δ₂ + δ₃ ≤ δ fulfills the requirements of solving the (m, t, C, δ)-prediction problem.²¹
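The role of Markov's inequality in the cutoff can be sketched in a few lines: if the expected number of computational steps is E, aborting after E/δ₃ steps fails with probability at most δ₃. The sketch below also checks the overall error budget; the function names and the numerical values are hypothetical.

```python
import math

def cutoff_steps(expected_steps, delta3):
    """Step budget after which to abort: by Markov's inequality, P[steps >= E / delta3] <= delta3."""
    return math.ceil(expected_steps / delta3)

def within_error_budget(delta1, delta2, delta3, delta):
    """True if robustness, numerical, and cutoff errors together stay within delta."""
    return delta1 + delta2 + delta3 <= delta

print(cutoff_steps(1000, 0.25))                          # a constant-factor increase over E
print(within_error_budget(0.125, 0.0625, 0.125, 1 / 3))
```

Making δ₃ smaller only multiplies the step budget by the constant 1/δ₃, which is why the cutoff does not change the asymptotic computation time.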

Footnotes

^¹

Some stochastic simulation implementations on the web: Systems Biology Workbench: http://sbw.sourceforge.net; BioSpice: http://biospice.lbl.gov; Stochastirator: http://opnsrcbio.molsci.org; STOCKS: http://www.sysbio.pl/stocks; BioNetS: http://x.amath.unc.edu:16080/BioNetS; SimBiology package for MATLAB: http://www.mathworks.com/products/simbiology/index.html.

^²

As an illustrative example, a prokaryotic cell and a eukaryotic cell may have similar concentrations of proteins but vastly different volumes.

^³

Indeed, the total molecular count m can be extremely large compared to its logarithm—e.g., Avogadro's number = 6 × 10²³, while its log₂ is only 79.

^⁴

Inline graphic and .

^⁵

It is exactly the stochastic process simulated by Gillespie's Stochastic Simulation Algorithm (SSA) (Gillespie, 1977).

^⁶

We phrase the prediction problem in terms appropriate for a simulation algorithm. An alternative formulation would be the problem of estimating the probability that the SSA process is in Inline graphic at time t. To be able to solve this problem using a simulation algorithm we can at most require that with probability at least δ₁ the estimate is within δ₂ of the true probability for some constants δ₁, δ₂ > 0. This can be attained by running the simulation algorithm a constant number of times.

^⁷

Maximum concentration C is a more natural measure of complexity compared to V because similar to m and t, computational complexity increases as C increases.

^⁸

An added benefit of providing {ɛij} bounds rather than ρ as a parameter to the BTL algorithm is that it allows flexibility on the part of the user to assign less responsibility for a violation to a reaction that is expected to be fast compared to a reaction that is expected to be slow. This may potentially speed up the simulation, while still preserving the ρ-perturbation constraint.

^⁹

Arbitrary finite number of states and tapes. Without loss of generality, we can assume a binary alphabet.

^¹⁰

Any other constant probability bounded away from 1/2 will do just as well: to achieve a larger constant probability of being correct, we can repeat the computation a constant number of times and take majority vote.

^¹¹

If we do not allow any chance of error, the corresponding statement is proven as the (deterministic) time hierarchy theorem (Sipser, 1997). Also see Barak (2002) and Fortnow and Santhanam (2006) for progress in proving the probabilistic version.

^¹²

If a reaction is converting a populous species to a rare one, the rate of the increase of the rare species can be proportional to m times its molecular count. The rate of decrease, however, is always proportional to the molecular count of the decreasing species, or proportional to C times the molecular count of the decreasing species (as we will see below).

^¹³

We know we cannot end in a dip if the resulting bound evaluates to 3 or more. Thus technically we assume m ≥ 3 for the bound to be always valid.

^¹⁴

Since g ≤ 1/(3M), make the simplification (1 + (1 − gM)ɛ) ≥ (1 + 2 ɛ/3) and solve for g. The solution is minimized when ɛ = 1.

^¹⁵

Logarithms are base-2.

^¹⁶

The cubic dependence on 1/ɛ in the time term is due to having to decrease the probability of an atypical step as ɛ decreases. It may be possible to reduce the cubic dependence to a linear one by moving up the boundary between a dip and the multiplicative regime as a function of ɛ rather than fixing it at 3. The goal is to replace the constant base Inline graphic term with a term. Then the effect of a ⇊ leap would scale with ɛ as does the effect of an ↑ leap.

^¹⁷

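Just to confirm, splitting the reactions between the same species and between different species, the sum of the propensities is $\sum_i \frac{x_i(x_i-1)}{2V} + \sum_{i<i'} \frac{x_i x_{i'}}{V} = \frac{m(m-1)}{2V}$ using the fact that $\sum_i x_i = m$ and $\left(\sum_i x_i\right)^2 = \sum_i x_i^2 + 2\sum_{i<i'} x_i x_{i'}$.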

^¹⁸

By the construction of Angluin et al. (2006), setting up the initial state requires letting the binary expansion of the molecular count of a certain species be equal to the input. Since the input is given in binary and all molecular counts are represented in binary, this is a linear time operation. Setting up the final state (the one corresponding to the TM halting with output 0 or the one corresponding to output 1) is also linear time. Computing the target time for the prediction problem t_ssa is asymptotically negligible.

^¹⁹

The numerical algorithms literature, which assumes that basic floating point operations take unit time, describes a number of algorithms for drawing from an (approximate) standard gamma distribution (Ahrens and Dieter, 1978), and from a binomial distribution (Kachitvichyanukul and Schmeiser, 1988), such that the expected number of floating-point operations does not grow as a function of distribution parameters (however, some restrictions on the parameters may be required). On a TM basic arithmetic operations take polynomial time in the length of the starting numerical values and the calculated result.

^²⁰

Compute the bound and write this many 1's on a work tape, and after each computational step, count off one of the 1's until no more are left.

^²¹

We conjecture that for any fixed δ₂, we can find some fixed amount of numerical precision to not exceed δ₂ for (ρ, δ₁)-robust processes. We would like to show that robustness according to our definition implies robustness to round-off errors and approximate random number generation. While this conjecture has strong intuitive appeal, it seems difficult to prove formally, and represents an area for further study.

Acknowledgments

I thank Erik Winfree and Matthew Cook for providing invaluable support, technical insight, corrections, and suggestions. This work was supported by the NSF (grant 0523761 to Winfree) and the NIMH (training grant MH19138-15 to the Department of CNS).

Disclosure Statement

No competing financial interests exist.

References

Adalsteinsson D. McMillen D. Elston T.C. Biochemical network stochastic simulator (BioNetS): software for stochastic modeling of biochemical networks. BMC Bioinform. 2004;5:24–45. doi: 10.1186/1471-2105-5-24.
Ahrens J. Dieter U. Generating gamma variates by a modified rejection technique. Language. 1978;54:853–882.
Alon U. An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman & Hall/CRC; Boca Raton, FL: 2007.
Angluin D. Aspnes J. Eisenstat D. Fast computation by population protocols with a leader. Technical Report YALEU/DCS/TR-1358. Department of Computer Science, Yale University; New Haven, CT: 2006.
Arkin A.P. Ross J. McAdams H.H. Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells. Genetics. 1998;149:1633–1648.
Barak B. A probabilistic-time hierarchy theorem for ‘slightly non-uniform’ algorithms. Proc. RANDOM. 2002:194–208.
Cao Y. Gillespie D. Petzold L. Efficient step size selection for the tau-leaping simulation method. J. Chem. Phys. 2006;124:044109. doi: 10.1063/1.2159468.
Elowitz M.B. Levine A.J. Siggia E.D., et al. Stochastic gene expression in a single cell. Science. 2002;297:1183–1185. doi: 10.1126/science.1070919.
Érdi P. Tóth J. Mathematical Models of Chemical Reactions: Theory and Applications of Deterministic and Stochastic Models. Manchester University Press; Manchester, UK: 1989.
Ethier S.N. Kurtz T.G. Markov Processes: Characterization and Convergence. John Wiley & Sons; New York: 1986.
Fortnow L. Santhanam R. Recent work on hierarchies for semantic classes. SIGACT News. 2006;37:36–54.
Gibson M. Bruck J. Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A. 2000;104:1876–1889.
Gillespie D.T. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 1977;81:2340–2361.
Gillespie D.T. A rigorous derivation of the chemical master equation. Physica A. 1992;188:404–425.
Gillespie D.T. Approximate accelerated stochastic simulation of chemically reacting systems. J. Chem. Phys. 2001;115:1716–1733.
Gillespie D.T. Improved leap-size selection for accelerated stochastic simulation. J. Chem. Phys. 2003;119:8229–8234.
Gillespie D.T. Stochastic simulation of chemical kinetics. Ann. Rev. Phys. Chem. 2007;58:35–55. doi: 10.1146/annurev.physchem.58.032806.104637.
Guptasarma P. Does replication-induced transcription regulate synthesis of the myriad low copy number proteins of Escherichia coli? Bioessays. 1995;17:987–997. doi: 10.1002/bies.950171112.
Horn F. Jackson R. General mass action kinetics. Arch. Rational Mech. Anal. 1972;47:81–116.
Kachitvichyanukul V. Schmeiser B. Binomial random variate generation. Commun. ACM. 1988;31:216–222.
Kierzek A.M. STOCKS: STOChastic kinetic simulations of biochemical systems with Gillespie algorithm. Bioinformatics. 2002;18:470–481. doi: 10.1093/bioinformatics/18.3.470.
Kurtz T.G. The relationship between stochastic and deterministic models for chemical reactions. J. Chem. Phys. 1972;57:2976–2978.
Levin B. Genes VII. Oxford University Press; New York: 1999.
Malek-Mansour M. Nicolis G. A master equation description of local fluctuations. J. Statist. Phys. 1975;13:197–217.
McAdams H.H. Arkin A.P. Stochastic mechanisms in gene expression. Proc. Natl. Acad. Sci. USA. 1997;94:814–819. doi: 10.1073/pnas.94.3.814.
McQuarrie D.A. Stochastic approach to chemical kinetics. J. Appl. Probab. 1967;4:413–478.
Morohashi M. Winn A.E. Borisuk M.T., et al. Robustness as a measure of plausibility in models of biochemical networks. J. Theoret. Biol. 2002;216:19–30. doi: 10.1006/jtbi.2002.2537.
Rathinam M. El Samad H. Reversible-equivalent-monomolecular tau: a leaping method for “small number and stiff” stochastic chemical systems. J. Comput. Phys. 2007;224:897–923.
Rathinam M. Petzold L. Cao Y., et al. Stiffness in stochastic chemically reacting systems: the implicit tau-leaping method. J. Chem. Phys. 2003;119:12784. doi: 10.1063/1.1763573.
Sipser M. Introduction to the Theory of Computation. PWS Publishing; Boston, MA: 1997.
Soloveichik D. Cook M. Winfree E., et al. Computation with finite stochastic chemical reaction networks. Natural Computing. 2008;7:615–633.
Sontag E. Monotone and near-monotone systems. Lect. Notes Control Inform. Sci. 2007;357:79–122.
Suel G.M. Garcia-Ojalvo J. Liberman L.M., et al. An excitable gene regulatory circuit induces transient cellular differentiation. Nature. 2006;440:545–550. doi: 10.1038/nature04588.
van Kampen N. Stochastic Processes in Physics and Chemistry. revised. Elsevier; New York: 1997.
Vasudeva K. Bhalla U.S. Adaptive stochastic-deterministic chemical kinetic simulations. Bioinformatics. 2004;20:78–84. doi: 10.1093/bioinformatics/btg376.

[B1] Adalsteinsson D. McMillen D. Elston T.C. Biochemical network stochastic simulator (BioNetS): software for stochastic modeling of biochemical networks. BMC Bioinform. 2004;5:24–45. doi: 10.1186/1471-2105-5-24. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B2] Ahrens J. Dieter U. Generating gamma variates by a modified rejection technique. Language. 1978;54:853–882. [Google Scholar]

[B3] Alon U. An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman & Hall/CRC; Boca Raton, FL: 2007. [Google Scholar]

[B4] Angluin D. Aspnes J. Eisenstat D. Technical Report YALEU/DCS/TR-1358. Department of Computer Science. Yale University; New Haven, CT: 2006. Fast computation by population protocols with a leader. [Google Scholar]

[B5] Arkin A.P. Ross J. McAdams H.H. Genetics. Vol. 149. 1633: 1998. Stochastic kinetic analysis of a developmental pathway bifurcation in phage-l Escherichia coli; 1648 pp. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6] Barak B. A probabilistic-time hierarchy theorem for ‘slightly non-uniform’ algorithms. Proc. RANDOM. 2002:194–208. [Google Scholar]

[B7] Cao Y. Gillespie D. Petzold L. Efficient step size selection for the tau-leaping simulation method. J. Chem. Phys. 2006;124:044109. doi: 10.1063/1.2159468. [DOI] [PubMed] [Google Scholar]

[B8] Elowitz M.B. Levine A.J. Siggia E.D., et al. Stochastic gene expression in a single cell. Science. 2002;297:1183–1185. doi: 10.1126/science.1070919. [DOI] [PubMed] [Google Scholar]

[B9] Érdi P. Tóth J. Mathematical Models of Chemical Reactions : Theory and Applications of Deterministic and Stochastic Models. Manchester University Press; Manchester, UK: 1989. [Google Scholar]

[B10] Ethier S.N. Kurtz T.G. Markov Processes: Characterization and Convergence. John Wiley & Sons; New York: 1986. [Google Scholar]

[B11] Fortnow L. Santhanam R. Recent work on hierarchies for semantic classes. SIGACT News. 2006;37:36–54. [Google Scholar]

[B12] Gibson M. Bruck J. Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. 2000;A 104:1876–1889. [Google Scholar]

[B13] Gillespie D.T. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 1977;81:2340–2361.

[B14] Gillespie D.T. A rigorous derivation of the chemical master equation. Physica A. 1992;188:404–425.

[B15] Gillespie D.T. Approximate accelerated stochastic simulation of chemically reacting systems. J. Chem. Phys. 2001;115:1716–1733.

[B16] Gillespie D.T. Improved leap-size selection for accelerated stochastic simulation. J. Chem. Phys. 2003;119:8229–8234.

[B17] Gillespie D.T. Stochastic simulation of chemical kinetics. Ann. Rev. Phys. Chem. 2007;58:35–55. doi: 10.1146/annurev.physchem.58.032806.104637.

[B18] Guptasarma P. Does replication-induced transcription regulate synthesis of the myriad low copy number proteins of Escherichia coli? Bioessays. 1995;17:987–997. doi: 10.1002/bies.950171112.

[B19] Horn F. Jackson R. General mass action kinetics. Arch. Rational Mech. Anal. 1972;47:81–116.

[B20] Kachitvichyanukul V. Schmeiser B. Binomial random variate generation. Commun. ACM. 1988;31:216–222.

[B21] Kierzek A.M. STOCKS: STOChastic kinetic simulations of biochemical systems with Gillespie algorithm. Bioinformatics. 2002;18:470–481. doi: 10.1093/bioinformatics/18.3.470.

[B22] Kurtz T.G. The relationship between stochastic and deterministic models for chemical reactions. J. Chem. Phys. 1972;57:2976–2978.

[B23] Levin B. Genes VII. Oxford University Press; New York: 1999.

[B24] Malek-Mansour M. Nicolis G. A master equation description of local fluctuations. J. Statist. Phys. 1975;13:197–217.

[B25] McAdams H.H. Arkin A.P. Stochastic mechanisms in gene expression. Proc. Natl. Acad. Sci. USA. 1997;94:814–819. doi: 10.1073/pnas.94.3.814.

[B26] McQuarrie D.A. Stochastic approach to chemical kinetics. J. Appl. Probab. 1967;4:413–478.

[B27] Morohashi M. Winn A.E. Borisuk M.T., et al. Robustness as a measure of plausibility in models of biochemical networks. J. Theoret. Biol. 2002;216:19–30. doi: 10.1006/jtbi.2002.2537.

[B28] Rathinam M. El Samad H. Reversible-equivalent-monomolecular tau: a leaping method for “small number and stiff” stochastic chemical systems. J. Comput. Phys. 2007;224:897–923.

[B29] Rathinam M. Petzold L. Cao Y., et al. Stiffness in stochastic chemically reacting systems: the implicit tau-leaping method. J. Chem. Phys. 2003;119:12784. doi: 10.1063/1.1763573.

[B30] Sipser M. Introduction to the Theory of Computation. PWS Publishing; Boston, MA: 1997.

[B31] Soloveichik D. Cook M. Winfree E., et al. Computation with finite stochastic chemical reaction networks. Natural Computing. 2008;7:615–633.

[B32] Sontag E. Monotone and near-monotone systems. Lect. Notes Control Inform. Sci. 2007;357:79–122.

[B33] Suel G.M. Garcia-Ojalvo J. Liberman L.M., et al. An excitable gene regulatory circuit induces transient cellular differentiation. Nature. 2006;440:545–550. doi: 10.1038/nature04588.

[B34] van Kampen N. Stochastic Processes in Physics and Chemistry. Revised ed. Elsevier; New York: 1997.

[B35] Vasudeva K. Bhalla U.S. Adaptive stochastic-deterministic chemical kinetic simulations. Bioinformatics. 2004;20:78–84. doi: 10.1093/bioinformatics/btg376.

A.1. Enforcing the ρ-perturbation constraint by the {ɛ_ij}-perturbation constraint